Benchmarking NAMD workflows for GPU containers


Benchmarking NAMD2 workloads for GPU containers on CCAST

Stephen Szwiec

Rasulev Computational Chemistry Research Group

North Dakota State University

26 July 2022


ApoA1 information

  • simulates a bloodstream lipoprotein particle
  • physical stats
    • 92224 atoms
    • 70660 bonds
    • 74136 angles
    • 74130 dihedrals
    • 1402 impropers
    • 32992 hydrogen groups
    • total mass of 553785 amu
  • energy stats
    • 300 K initial temperature
    • -14 e total charge
  • simulation stats
    • benchmark consists of a startup phase followed by 500 simulation steps
    • GPU workload modified to use CUDA-accelerated FFT and CUDA integration (see the configuration sketch after this list)
    • GPU workload modified to continue the simulation for 10000 steps
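
For concreteness, a minimal sketch of those two modifications follows, assuming the stock apoa1.namd input from the NAMD benchmark set (which sets numsteps 500) and the NAMD 3.0 alpha CUDASOAintegrate directive; the file names and the sed pattern are illustrative, not taken from the original run.

  # Sketch only: derive the GPU-oriented config from the stock ApoA1 input
  cp apoa1.namd apoa1_gpu.namd

  # Extend the benchmark from 500 to 10000 steps
  sed -i 's/^numsteps.*/numsteps          10000/' apoa1_gpu.namd

  # Enable GPU-resident integration (NAMD 3.0 alpha); in this mode the PME FFTs
  # also run on the GPU via cuFFT rather than on the host with FFTW
  echo 'CUDASOAintegrate  on' >> apoa1_gpu.namd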

NAMD non-GPU run information

  • Charmrun used with one node (condo02); see the job-script sketch after this list
  • machine topology: 2 sockets x 64 cores x 1 PU = 128-way SMP possible
  • 20 processes, 20 cores, 1 physical node, and 16 GB memory specified
  • NAMD 2.14 used
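
A job script along the following lines reproduces the resource request described above. This is a sketch, not the script actually used on condo02: the queue name, module name, and walltime are assumptions, while +p sets the charmrun process count and ++local keeps all 20 processes on the allocated node.

  #!/bin/bash
  #PBS -N apoa1_cpu
  #PBS -l select=1:ncpus=20:mem=16gb   # 1 physical node, 20 cores, 16 GB memory
  #PBS -l walltime=01:00:00            # walltime is an assumption
  #PBS -q default                      # queue name is an assumption
  cd "$PBS_O_WORKDIR"
  module load namd/2.14                # module name is an assumption

  # 20 NAMD 2.14 processes on the local node via charmrun
  charmrun +p20 ++local namd2 apoa1.namd > apoa1_cpu.log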

NAMD GPU run information

  • NAMD used within the Singularity container system; see the launch sketch after this list
    • namd:3.0-alpha11 image used to generate the Singularity container
  • machine topology: 2 sockets x 64 cores x 1 PU = 128-way SMP possible
  • 2 CPUs, 2 GPUs, 1 OpenMP thread, 1 physical node, and 16 GB memory specified
    • each GPU bound to one NVIDIA A10 with 22731 MB memory
  • NAMD 3.0-alpha11 used
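
The container launch looks roughly like the sketch below. The registry path and the namd3 binary name are assumptions (the original notes only the namd:3.0-alpha11 image); --nv exposes the host NVIDIA driver and GPUs inside the container, +p2 matches the two CPUs requested, and +devices 0,1 binds the run to the two A10 GPUs.

  #!/bin/bash
  #PBS -N apoa1_gpu
  #PBS -l select=1:ncpus=2:ngpus=2:mem=16gb   # 1 node, 2 CPUs, 2 GPUs, 16 GB memory
  #PBS -l walltime=01:00:00                   # walltime is an assumption
  cd "$PBS_O_WORKDIR"

  # Build the Singularity image once from the namd:3.0-alpha11 container image
  # (registry path shown is illustrative)
  singularity pull namd3.sif docker://nvcr.io/hpc/namd:3.0-alpha11

  # Run the GPU-modified ApoA1 workload inside the container
  singularity exec --nv namd3.sif namd3 +p2 +devices 0,1 apoa1_gpu.namd > apoa1_gpu.log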

Benchmark findings

  Metric                            CPU (NAMD 2.14, 20 cores)   GPU (NAMD 3.0-alpha11, 2 GPUs)
  Startup wall time                 3.4346 s                    0.0963 s
  Simulation wall time              38.6991 s                   5.59571 s
  Wall time per step                0.0918308 s/step            0.0110155 s/step
  Days per nanosecond of simulation 0.999022 days/ns            0.0660301 days/ns


Additional information

  • Extending the ApoA1 workload with CUDA led to step 3500 being reached in roughly the same wall time as step 500 was reached on 20 CPU cores.
  • per-core speedup with GPU acceleration is ~151.297x: the wall-clock speedup is 0.999022 / 0.0660301 ≈ 15.13x, scaled by the 20:2 ratio of CPU cores used in the two runs
    • caveat: one CPU core must supervise and await output from each GPU because of how NAMD is written
  • CUDA run performed 10000 steps and wrote output in 136.315155 s total wall time
  • CPU run performed 500 steps and wrote output in 76.477325 s total wall time