Benchmarking NAMD2 workloads for GPU containers on CCAST
Stephen Szwiec
Rasulev Computational Chemistry Research Group
North Dakota State University
26 July 2022
ApoA1 information
- simulates a bloodstream lipoprotein particle
- physical stats
  - 92224 atoms
  - 70660 bonds
  - 74136 angles
  - 74130 dihedrals
  - 1402 impropers
  - 32992 hydrogen groups
  - 553785 amu total mass
- energy stats
  - 300 K initial temperature
  - -14 e total charge
- simulation stats
  - consists of a startup process followed by 500 steps of simulation for benchmark timing
  - GPU workload modified to use CUDA FFTW and CUDA integration (see the config sketch below)
  - GPU workload modified to continue the simulation for 10000 steps
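For reference, the GPU-specific modifications listed above amount to a few lines in the NAMD configuration file. The fragment below is a minimal sketch rather than the actual input used: the base file name (apoa1.namd from the standard benchmark distribution) is an assumption, and CUDASOAintegrate is the NAMD 3.0 alpha keyword for GPU-resident integration.

    # sketch of GPU-specific edits to the stock ApoA1 benchmark input (assumed base file: apoa1.namd)
    # extend the benchmark run from 500 to 10000 steps
    numsteps          10000
    # enable GPU-resident (CUDA) integration, available in NAMD 3.0 alpha builds
    CUDASOAintegrate  on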
NAMD non-gpu run information
- Charmrun used with one node (condo02)
- machine topology: 2 sockets x 64 cores x 1 PU = 128-way SMP possible
- 20 processes, 20 cores, 1 physical node, and 16 GB memory specified
- NAMD 2.14 used (see the invocation sketch below)
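The CPU run described above maps onto a standard charmrun launch of namd2. The lines below are an illustrative sketch only: the PBS resource request mirrors the stated 20 cores, 1 node, and 16 GB, while the input and log file names (apoa1.namd, apoa1_cpu.log) are assumptions.

    # hypothetical PBS resource request matching the run description (1 node, 20 cores, 16 GB)
    #PBS -l select=1:ncpus=20:mem=16gb

    # launch NAMD 2.14 with 20 Charm++ processes, all on the local node
    charmrun +p20 ++local namd2 apoa1.namd > apoa1_cpu.log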
NAMD gpu run information
- NAMD used within the Singularity container system
- namd:3.0-alpha11 image used to generate the Singularity container
- machine topology: 2 sockets x 64 cores x 1 PU = 128-way SMP possible
- 2 CPUs, 2 GPUs, 1 OpenMP thread, 1 physical node, and 16 GB memory specified
  - each GPU bound to one Nvidia A10 with 22731 MB memory
- NAMD 3.0-alpha11 used (see the container invocation sketch below)
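The GPU run corresponds to a Singularity launch of the NAMD 3 binary with its CUDA device flags. The sketch below is illustrative only: the registry path, image file name, input file (apoa1_gpu.namd), and device indices are assumptions rather than details recorded from the original job.

    # hypothetical: build a local Singularity image from the namd:3.0-alpha11 Docker image
    singularity pull namd_3.0-alpha11.sif docker://nvcr.io/hpc/namd:3.0-alpha11

    # hypothetical PBS resource request matching the run description (1 node, 2 CPUs, 2 GPUs, 16 GB)
    #PBS -l select=1:ncpus=2:ngpus=2:mem=16gb

    # run the containerized namd3 binary with NVIDIA GPU support (--nv);
    # +p2 gives one CPU worker per GPU, +devices pins the run to GPUs 0 and 1
    singularity exec --nv namd_3.0-alpha11.sif \
        namd3 +p2 +setcpuaffinity +devices 0,1 apoa1_gpu.namd > apoa1_gpu.log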
Benchmark findings
Startup Wall Time
CPU: 3.4346 s; GPU: 0.0963 s
Simulation Wall Time
CPU: 38.6991 s; GPU: 5.59571 s
Wall Time Per Step
CPU: 0.0918308 s/step; GPU: 0.0110155 s/step
Days Per Nanosecond Simulation
CPU: 0.999022 days/ns; GPU: 0.0660301 days/ns
Additional information
- Extending the ApoA1 workload with CUDA led to step 3500 being reached in roughly the same wall time that step 500 was reached on 20 CPU cores.
- per-core speedup with GPU acceleration is ~151.297x (see the arithmetic below)
- caveat: one CPU core must supervise and await output from each GPU because of how NAMD is written
- the CUDA run performed 10000 steps and wrote output in 136.315155 s total wall time
- the CPU run performed 500 steps and wrote output in 76.477325 s total wall time
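The ~151.297x per-core figure follows directly from the days/ns results above and the core counts used (20 CPU cores for the CPU run versus 2 CPU cores driving the GPU run); the arithmetic below appears to be how it was obtained:

    overall speedup  = 0.999022 days/ns / 0.0660301 days/ns ≈ 15.13x
    per-core speedup = 15.13 × (20 cores / 2 cores)         ≈ 151.3x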