Benchmarking NAMD2 workloads for GPU containers on CCAST
Stephen Szwiec
Rasulev Computational Chemistry Research Group
North Dakota State University
26 July 2022
ApoA1 information
- simulates a bloodstream lipoprotein particle
- physical stats
- 92224 atoms
- 70660 bonds
- 74136 angles
- 74130 dihedrals
- 1402 impropers
- 32992 hydrogen groups
- 553785 amu total mass
- energy stats
- 300 K initial temperature
- -14 e total charge
- simulation stats
- Consists of a startup phase followed by 500 steps of simulation that are timed for the benchmark
- GPU workload modified to use CUDA-accelerated FFT and CUDA-based integration (see the configuration sketch after this list)
- GPU workload modified to continue the simulation for 10000 steps
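The exact input files are not reproduced on this page; the following is a minimal sketch of how the stock ApoA1 benchmark input could be adapted for the GPU run described above, assuming the benchmark's standard apoa1.namd file (which sets its run length with a numsteps line). File names are illustrative; CUDASOAintegrate is the keyword NAMD 3.0 alpha uses to enable GPU-resident integration.
 # start from the stock ApoA1 benchmark input (file name assumed)
 cp apoa1.namd apoa1_gpu.namd
 # extend the run from the default 500 steps to 10000 steps
 sed -i 's/^numsteps.*/numsteps 10000/' apoa1_gpu.namd
 # enable GPU-resident integration in NAMD 3.0 alpha
 echo "CUDASOAintegrate on" >> apoa1_gpu.namd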
NAMD non-GPU run information
- charmrun used on a single node (condo02); see the launch sketch after this list
- machine topology: 2 sockets × 64 cores × 1 PU = 128-way SMP possible
- 20 processes, 20 cores, 1 physical node, and 16 GB memory specified
- NAMD 2.14 used
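The job script itself is not shown on this page; the following is a minimal sketch of a PBS/charmrun launch matching the resources listed above. The walltime, input, and log file names are illustrative, and any module needed to provide NAMD 2.14 on CCAST is omitted.
 #!/bin/bash
 #PBS -l select=1:ncpus=20:mem=16gb
 #PBS -l walltime=01:00:00
 cd "$PBS_O_WORKDIR"
 # 20 Charm++ processes on the local node, matching the 20 cores requested above
 charmrun +p20 ++local namd2 apoa1.namd > apoa1_cpu.log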
NAMD GPU run information
- NAMD run inside a Singularity container (see the launch sketch after this list)
- namd:3.0-alpha11 image used to build the Singularity container
- machine topology: 2 sockets × 64 cores × 1 PU = 128-way SMP possible
- 2 CPUs, 2 GPUs, 1 OpenMP thread, 1 physical node, and 16 GB memory specified
- each GPU bound to one NVIDIA A10 with 22731 MB of memory
- NAMD 3.0 alpha11 used
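Again, the job script is not shown here; the following is a minimal sketch of a containerized launch matching the resources above, assuming the image has been pulled into a local namd_3.0-alpha11.sif file and that the container provides the namd3 binary on its path. The PBS resource line, device indices, and file names are illustrative.
 #!/bin/bash
 #PBS -l select=1:ncpus=2:ngpus=2:mem=16gb
 #PBS -l walltime=01:00:00
 cd "$PBS_O_WORKDIR"
 # --nv exposes the host NVIDIA driver and the requested GPUs inside the container
 singularity exec --nv namd_3.0-alpha11.sif \
     namd3 +p2 +devices 0,1 apoa1_gpu.namd > apoa1_gpu.log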
Benchmark findings
Metric                          CPU (NAMD 2.14)      GPU (NAMD 3.0 alpha11)
Startup wall time               3.4346 s             0.0963 s
Simulation wall time            38.6991 s            5.59571 s
Wall time per step              0.0918308 s/step     0.0110155 s/step
Days per nanosecond simulated   0.999022 days/ns     0.0660301 days/ns
Additional information
- Extending the ApoA1 workload with CUDA led to step 3500 being reached in roughly the same wall time in which step 500 was reached on 20 CPU cores.
- per-core speedup with GPU acceleration is ~151.297 times (see the worked calculation after this list)
- caveat: one CPU core must supervise and await output from each GPU because of how NAMD is written
- the CUDA run performed 10000 steps and wrote its output in 136.315155 s total wall time
- the CPU run performed 500 steps and wrote its output in 76.477325 s total wall time
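As a check on the per-core speedup figure above, the arithmetic can be reproduced from the days/ns values in the benchmark table and the core counts from the run descriptions (20 CPU cores for the CPU run, 2 CPU cores for the GPU run). This is a sketch of the calculation, not output from the benchmark itself.
 # overall speedup from the days/ns figures in the benchmark table
 awk 'BEGIN { printf "overall speedup: %.2fx\n", 0.999022 / 0.0660301 }'
 # per-core speedup: scale by the ratio of CPU cores used (20 vs. 2)
 awk 'BEGIN { printf "per-core speedup: %.2fx\n", (0.999022 / 0.0660301) * (20 / 2) }'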