Benchmarking NAMD2 workloads for GPU containers on CCAST
Stephen Szwiec
Rasulev Computational Chemistry Research Group
North Dakota State University
26 July 2022
ApoA1 information
- simulates a bloodstream lipoprotein particle
- physical stats
  - 92224 atoms
  - 70660 bonds
  - 74136 angles
  - 74130 dihedrals
  - 1402 impropers
  - 32992 hydrogen groups
  - 553785 amu total mass
- energy stats
  - 300 K initial temperature
  - -14 e total charge
- simulation stats
  - consists of a startup process followed by 500 steps of simulation for benchmark timing
  - GPU workload modified to use CUDA FFTW and CUDA integration (see the config sketch below)
  - GPU workload modified to continue the simulation for 10000 steps
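For reference, the GPU-specific modifications listed above amount to a few lines in the NAMD configuration file. The fragment below is a minimal sketch rather than the actual input used: the base file name (apoa1.namd from the standard benchmark distribution) is an assumption, and CUDASOAintegrate is the NAMD 3.0 alpha keyword for GPU-resident integration.

    # sketch of GPU-specific edits to the stock ApoA1 benchmark input (assumed base file: apoa1.namd)
    # extend the benchmark run from 500 to 10000 steps
    numsteps          10000
    # enable GPU-resident (CUDA) integration, available in NAMD 3.0 alpha builds
    CUDASOAintegrate  on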
NAMD non-gpu run information
- Charmrun used with one node (condo02)
- machine topology: 2 sockets x 64 cores x 1 PU = 128-way SMP possible
- 20 processes, 20 cores, 1 physical node, and 16 GB memory specified
- NAMD 2.14 used (see the invocation sketch below)
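The CPU run described above maps onto a standard charmrun launch of namd2. The lines below are an illustrative sketch only: the PBS resource request mirrors the stated 20 cores, 1 node, and 16 GB, while the input and log file names (apoa1.namd, apoa1_cpu.log) are assumptions.

    # hypothetical PBS resource request matching the run description (1 node, 20 cores, 16 GB)
    #PBS -l select=1:ncpus=20:mem=16gb

    # launch NAMD 2.14 with 20 Charm++ processes, all on the local node
    charmrun +p20 ++local namd2 apoa1.namd > apoa1_cpu.log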
NAMD gpu run information
- NAMD used within the Singularity container system
- namd:3.0-alpha11 image used to generate the Singularity container
- machine topology: 2 sockets x 64 cores x 1 PU = 128-way SMP possible
- 2 CPUs, 2 GPUs, 1 OpenMP thread, 1 physical node, and 16 GB memory specified
  - each GPU bound to one Nvidia A10 with 22731 MB memory
- NAMD 3.0-alpha11 used (see the container invocation sketch below)
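The GPU run corresponds to a Singularity launch of the NAMD 3 binary with its CUDA device flags. The sketch below is illustrative only: the registry path, image file name, input file (apoa1_gpu.namd), and device indices are assumptions rather than details recorded from the original job.

    # hypothetical: build a local Singularity image from the namd:3.0-alpha11 Docker image
    singularity pull namd_3.0-alpha11.sif docker://nvcr.io/hpc/namd:3.0-alpha11

    # hypothetical PBS resource request matching the run description (1 node, 2 CPUs, 2 GPUs, 16 GB)
    #PBS -l select=1:ncpus=2:ngpus=2:mem=16gb

    # run the containerized namd3 binary with NVIDIA GPU support (--nv);
    # +p2 gives one CPU worker per GPU, +devices pins the run to GPUs 0 and 1
    singularity exec --nv namd_3.0-alpha11.sif \
        namd3 +p2 +setcpuaffinity +devices 0,1 apoa1_gpu.namd > apoa1_gpu.log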
Benchmark findings
Startup Wall Time
CPU: 3.4346 s; GPU: 0.0963 s
Simulation Wall Time
CPU: 38.6991 s; GPU: 5.59571 s
Wall Time Per Step
CPU: 0.0918308 s/step; GPU: 0.0110155 s/step
Days Per Nanosecond Simulation
CPU: 0.999022 days/ns; GPU: 0.0660301 days/ns
Additional information
- Extending the ApoA1 workload with CUDA led to step 3500 being reached in roughly the same wall time that step 500 was reached on 20 CPU cores.
- per-core speedup with GPU acceleration is ~151.297x (see the arithmetic below)
- caveat: one CPU core must supervise and await output from each GPU because of how NAMD is written
- the CUDA run performed 10000 steps and wrote output in 136.315155 s total wall time
- the CPU run performed 500 steps and wrote output in 76.477325 s total wall time
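The ~151.297x per-core figure follows directly from the days/ns results above and the core counts used (20 CPU cores for the CPU run versus 2 CPU cores driving the GPU run); the arithmetic below appears to be how it was obtained:

    overall speedup  = 0.999022 days/ns / 0.0660301 days/ns ≈ 15.13x
    per-core speedup = 15.13 × (20 cores / 2 cores)         ≈ 151.3x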