<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.nanobiodata.org/index.php?action=history&amp;feed=atom&amp;title=Benchmarking_NAMD_workflows_for_GPU_containers</id>
	<title>Benchmarking NAMD workflows for GPU containers - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.nanobiodata.org/index.php?action=history&amp;feed=atom&amp;title=Benchmarking_NAMD_workflows_for_GPU_containers"/>
	<link rel="alternate" type="text/html" href="https://wiki.nanobiodata.org/index.php?title=Benchmarking_NAMD_workflows_for_GPU_containers&amp;action=history"/>
	<updated>2026-05-14T13:09:45Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.38.2</generator>
	<entry>
		<id>https://wiki.nanobiodata.org/index.php?title=Benchmarking_NAMD_workflows_for_GPU_containers&amp;diff=137&amp;oldid=prev</id>
		<title>Sysadmin: created and edited page</title>
		<link rel="alternate" type="text/html" href="https://wiki.nanobiodata.org/index.php?title=Benchmarking_NAMD_workflows_for_GPU_containers&amp;diff=137&amp;oldid=prev"/>
		<updated>2022-10-21T18:22:52Z</updated>

		<summary type="html">&lt;p&gt;created and edited page&lt;/p&gt;
&lt;a href=&quot;https://wiki.nanobiodata.org/index.php?title=Benchmarking_NAMD_workflows_for_GPU_containers&amp;amp;diff=137&amp;amp;oldid=136&quot;&gt;Show changes&lt;/a&gt;</summary>
		<author><name>Sysadmin</name></author>
	</entry>
	<entry>
		<id>https://wiki.nanobiodata.org/index.php?title=Benchmarking_NAMD_workflows_for_GPU_containers&amp;diff=136&amp;oldid=prev</id>
		<title>Sysadmin: Created page with &quot;&lt;span id=&quot;benchmarking-namd2-workloads-for-gpu-containers-on-ccast&quot;&gt;&lt;/span&gt; == Benchmarking NAMD2 workloads for GPU containers on CCAST ==  &lt;span id=&quot;stephen-szwiec&quot;&gt;&lt;/span&gt; Stephen Szwiec  Rasulev Computational Chemistry Research Group North Dakota State University 26 July 2022 -----  &lt;span id=&quot;apoa1-information&quot;&gt;&lt;/span&gt; ==== ApoA1 information ====  * simulates a bloodstream lipoprotein particle * &#039;&#039;&#039;physical stats&#039;&#039;&#039; ** 92224 atoms ** 70660 bonds ** 74136 angles ** 741...&quot;</title>
		<link rel="alternate" type="text/html" href="https://wiki.nanobiodata.org/index.php?title=Benchmarking_NAMD_workflows_for_GPU_containers&amp;diff=136&amp;oldid=prev"/>
		<updated>2022-10-21T18:21:54Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;&amp;lt;span id=&amp;quot;benchmarking-namd2-workloads-for-gpu-containers-on-ccast&amp;quot;&amp;gt;&amp;lt;/span&amp;gt; == Benchmarking NAMD2 workloads for GPU containers on CCAST ==  &amp;lt;span id=&amp;quot;stephen-szwiec&amp;quot;&amp;gt;&amp;lt;/span&amp;gt; Stephen Szwiec  Rasulev Computational Chemistry Research Group North Dakota State University 26 July 2022 -----  &amp;lt;span id=&amp;quot;apoa1-information&amp;quot;&amp;gt;&amp;lt;/span&amp;gt; ==== ApoA1 information ====  * simulates a bloodstream lipoprotein particle * &amp;#039;&amp;#039;&amp;#039;physical stats&amp;#039;&amp;#039;&amp;#039; ** 92224 atoms ** 70660 bonds ** 74136 angles ** 741...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&amp;lt;span id=&amp;quot;benchmarking-namd2-workloads-for-gpu-containers-on-ccast&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== Benchmarking NAMD2 workloads for GPU containers on CCAST ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;stephen-szwiec&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
Stephen Szwiec &lt;br /&gt;
Rasulev Computational Chemistry Research Group&lt;br /&gt;
North Dakota State University&lt;br /&gt;
26 July 2022&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;apoa1-information&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== ApoA1 information ====&lt;br /&gt;
&lt;br /&gt;
* simulates a bloodstream lipoprotein particle&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;physical stats&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** 92224 atoms&lt;br /&gt;
** 70660 bonds&lt;br /&gt;
** 74136 angles&lt;br /&gt;
** 74130 dihedrals&lt;br /&gt;
** 1402 impropers&lt;br /&gt;
** 32992 hydrogen groups&lt;br /&gt;
** 553785 amu total mass&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;energy stats&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** 300 K initial temperature&lt;br /&gt;
** -14 e total charge&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;simulation stats&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** consists of a startup phase followed by 500 simulation steps used for benchmark timing&lt;br /&gt;
** GPU workload modified to use CUDA FFT (in place of FFTW) and CUDA integration; a config sketch follows this list&lt;br /&gt;
** GPU workload modified to continue the simulation for 10000 steps&lt;br /&gt;
&lt;br /&gt;
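A minimal sketch of those GPU-side modifications follows. The page does not show its actual config file, so the keyword names are assumptions taken from NAMD 3.0 alpha conventions, and apoa1.namd refers to the stock benchmark input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# assumed edits to a copy of the stock apoa1.namd benchmark config&lt;br /&gt;
# (NAMD 3.0 alpha keyword names; the original config is not shown on this page)&lt;br /&gt;
&lt;br /&gt;
# run the integrator on the GPU (GPU-resident mode in NAMD 3.0 alpha)&lt;br /&gt;
CUDASOAintegrate on&lt;br /&gt;
&lt;br /&gt;
# extend the benchmark from 500 to 10000 steps&lt;br /&gt;
numsteps 10000&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;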
&amp;lt;span id=&amp;quot;namd-non-gpu-run-information&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== NAMD non-GPU run information ====&lt;br /&gt;
&lt;br /&gt;
* Charmrun used with one node (condo02); an example launch line follows this list&lt;br /&gt;
* machine topology: 2 sockets x 64 cores x 1 PU = 128-way SMP possible&lt;br /&gt;
* 20 processes, 20 cores, 1 physical node, and 16 GB memory specified&lt;br /&gt;
* NAMD 2.14 used&lt;br /&gt;
&lt;br /&gt;
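As a concrete illustration, a launch line consistent with the settings above might look like the sketch below. The page does not record the actual command, so the file names are assumptions; charmrun with +p is the standard NAMD 2.14 launch mechanism:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# assumed CPU launch: 20 Charm++ processes on a single node&lt;br /&gt;
charmrun namd2 +p20 apoa1.namd &amp;gt; apoa1_cpu.log&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;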
&amp;lt;span id=&amp;quot;namd-gpu-run-information&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== NAMD GPU run information ====&lt;br /&gt;
&lt;br /&gt;
* NAMD run within a Singularity container; a launch sketch follows this list&lt;br /&gt;
** namd:3.0-alpha11 image used to build the Singularity container&lt;br /&gt;
* machine topology: 2 sockets x 64 cores x 1 PU = 128-way SMP possible&lt;br /&gt;
* 2 CPUs, 2 GPUs, 1 OpenMP thread, 1 physical node, and 16 GB memory specified&lt;br /&gt;
** each GPU bound to one NVIDIA A10 with 22731 MB of memory&lt;br /&gt;
* NAMD 3.0-alpha11 used&lt;br /&gt;
&lt;br /&gt;
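A container launch consistent with these settings might look like the following sketch. The image and file names are assumptions; --nv (GPU pass-through in Singularity) and the +p/+devices flags (NAMD thread count and GPU selection) are real options:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# assumed GPU launch inside the container: 2 CPU threads driving 2 GPUs&lt;br /&gt;
singularity exec --nv namd_3.0-alpha11.sif \&lt;br /&gt;
    namd3 +p2 +devices 0,1 apoa1_gpu.namd &amp;gt; apoa1_gpu.log&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;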
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;benchmark-findings&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Benchmark findings ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;startup-wall-time&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Startup Wall Time ====&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;CPU:&amp;#039;&amp;#039;&amp;#039; 3.4346 s, &amp;#039;&amp;#039;&amp;#039;GPU:&amp;#039;&amp;#039;&amp;#039; 0.0963 s&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;simulation-wall-time&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Simulation Wall Time ====&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;CPU:&amp;#039;&amp;#039;&amp;#039; 38.6991 s, &amp;#039;&amp;#039;&amp;#039;GPU:&amp;#039;&amp;#039;&amp;#039; 5.59571 s&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;wall-time-per-step&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Wall Time Per Step ====&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;CPU:&amp;#039;&amp;#039;&amp;#039; 0.0918308 s/step, &amp;#039;&amp;#039;&amp;#039;GPU:&amp;#039;&amp;#039;&amp;#039; 0.0110155 s/step&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;days-per-nanosecond-simulation&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Days Per Nanosecond Simulation ====&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;CPU:&amp;#039;&amp;#039;&amp;#039; 0.999022 days/ns, &amp;#039;&amp;#039;&amp;#039;GPU:&amp;#039;&amp;#039;&amp;#039; 0.0660301 days/ns&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;additional-information&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Additional information ====&lt;br /&gt;
&lt;br /&gt;
* Extending the ApoA1 workload with CUDA led to step 3500 being reached in roughly the same wall time that step 500 was reached on 20 CPU cores.&lt;br /&gt;
* per-core speedup with GPU acceleration is ~151.297x; a worked check follows this list&lt;br /&gt;
** caveat: one CPU core must supervise and await output from each GPU because of how NAMD is written&lt;br /&gt;
* the CUDA run performed 10000 steps and wrote output in 136.315155 s total wall time&lt;br /&gt;
* the CPU run performed 500 steps and wrote output in 76.477325 s total wall time&lt;br /&gt;
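&lt;br /&gt;
The per-core figure above can be reproduced from the reported days/ns values: the raw speedup is 0.999022 / 0.0660301, about 15.13, and scaling by the ratio of cores used (20 CPU cores vs. the 2 cores supervising the GPUs) gives the stated ~151.297. A one-line check, assuming bc is available:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# raw GPU speedup times (20 cores / 2 cores) gives the per-core speedup&lt;br /&gt;
echo &amp;quot;scale=6; (0.999022/0.0660301)*(20/2)&amp;quot; | bc -l&lt;br /&gt;
# prints approximately 151.298, matching the ~151.297 above&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;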
&lt;/div&gt;</summary>
		<author><name>Sysadmin</name></author>
	</entry>
</feed>