<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.nanobiodata.org/index.php?action=history&amp;feed=atom&amp;title=Benchmarking_NAMD_workflows_for_GPU_containers</id>
	<title>Benchmarking NAMD workflows for GPU containers - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.nanobiodata.org/index.php?action=history&amp;feed=atom&amp;title=Benchmarking_NAMD_workflows_for_GPU_containers"/>
	<link rel="alternate" type="text/html" href="https://wiki.nanobiodata.org/index.php?title=Benchmarking_NAMD_workflows_for_GPU_containers&amp;action=history"/>
	<updated>2026-05-14T13:09:45Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.38.2</generator>
	<entry>
		<id>https://wiki.nanobiodata.org/index.php?title=Benchmarking_NAMD_workflows_for_GPU_containers&amp;diff=137&amp;oldid=prev</id>
		<title>Sysadmin: created and edited page</title>
		<link rel="alternate" type="text/html" href="https://wiki.nanobiodata.org/index.php?title=Benchmarking_NAMD_workflows_for_GPU_containers&amp;diff=137&amp;oldid=prev"/>
		<updated>2022-10-21T18:22:52Z</updated>

		<summary type="html">&lt;p&gt;created and edited page&lt;/p&gt;
&lt;a href=&quot;https://wiki.nanobiodata.org/index.php?title=Benchmarking_NAMD_workflows_for_GPU_containers&amp;amp;diff=137&amp;amp;oldid=136&quot;&gt;Show changes&lt;/a&gt;</summary>
		<author><name>Sysadmin</name></author>
	</entry>
	<entry>
		<id>https://wiki.nanobiodata.org/index.php?title=Benchmarking_NAMD_workflows_for_GPU_containers&amp;diff=136&amp;oldid=prev</id>
		<title>Sysadmin: Created page with &quot;&lt;span id=&quot;benchmarking-namd2-workloads-for-gpu-containers-on-ccast&quot;&gt;&lt;/span&gt; == Benchmarking NAMD2 workloads for GPU containers on CCAST ==  &lt;span id=&quot;stephen-szwiec&quot;&gt;&lt;/span&gt; Stephen Szwiec  Rasulev Computational Chemistry Research Group North Dakota State University 26 July 2022 -----  &lt;span id=&quot;apoa1-information&quot;&gt;&lt;/span&gt; ==== ApoA1 information ====  * simulates a bloodstream lipoprotein particle * &#039;&#039;&#039;physical stats&#039;&#039;&#039; ** 92224 atoms ** 70660 bonds ** 74136 angles ** 741...&quot;</title>
		<link rel="alternate" type="text/html" href="https://wiki.nanobiodata.org/index.php?title=Benchmarking_NAMD_workflows_for_GPU_containers&amp;diff=136&amp;oldid=prev"/>
		<updated>2022-10-21T18:21:54Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;&amp;lt;span id=&amp;quot;benchmarking-namd2-workloads-for-gpu-containers-on-ccast&amp;quot;&amp;gt;&amp;lt;/span&amp;gt; == Benchmarking NAMD2 workloads for GPU containers on CCAST ==  &amp;lt;span id=&amp;quot;stephen-szwiec&amp;quot;&amp;gt;&amp;lt;/span&amp;gt; Stephen Szwiec  Rasulev Computational Chemistry Research Group North Dakota State University 26 July 2022 -----  &amp;lt;span id=&amp;quot;apoa1-information&amp;quot;&amp;gt;&amp;lt;/span&amp;gt; ==== ApoA1 information ====  * simulates a bloodstream lipoprotein particle * &amp;#039;&amp;#039;&amp;#039;physical stats&amp;#039;&amp;#039;&amp;#039; ** 92224 atoms ** 70660 bonds ** 74136 angles ** 741...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&amp;lt;span id=&amp;quot;benchmarking-namd2-workloads-for-gpu-containers-on-ccast&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
== Benchmarking NAMD2 workloads for GPU containers on CCAST ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;stephen-szwiec&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
Stephen Szwiec &lt;br /&gt;
Rasulev Computational Chemistry Research Group&lt;br /&gt;
North Dakota State University&lt;br /&gt;
26 July 2022&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;apoa1-information&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== ApoA1 information ====&lt;br /&gt;
&lt;br /&gt;
* simulates a bloodstream lipoprotein particle&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;physical stats&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** 92224 atoms&lt;br /&gt;
** 70660 bonds&lt;br /&gt;
** 74136 angles&lt;br /&gt;
** 74130 dihedrals&lt;br /&gt;
** 1402 impropers&lt;br /&gt;
** 32992 hydrogen groups&lt;br /&gt;
** 553785 amu total mass&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;energy stats&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** 300 K initial temperature&lt;br /&gt;
** -14 e total charge&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;simulation stats&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** consists of a startup phase followed by 500 simulation steps used for benchmark timing&lt;br /&gt;
** GPU workload modified to use CUDA FFT (in place of FFTW) and CUDA integration; a config sketch follows this list&lt;br /&gt;
** GPU workload modified to continue the simulation for 10000 steps&lt;br /&gt;
&lt;br /&gt;
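A minimal sketch of those GPU-side modifications follows. The page does not show its actual config file, so the keyword names are assumptions taken from NAMD 3.0 alpha conventions, and apoa1.namd refers to the stock benchmark input:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# assumed edits to a copy of the stock apoa1.namd benchmark config&lt;br /&gt;
# (NAMD 3.0 alpha keyword names; the original config is not shown on this page)&lt;br /&gt;
&lt;br /&gt;
# run the integrator on the GPU (GPU-resident mode in NAMD 3.0 alpha)&lt;br /&gt;
CUDASOAintegrate on&lt;br /&gt;
&lt;br /&gt;
# extend the benchmark from 500 to 10000 steps&lt;br /&gt;
numsteps 10000&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;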
&amp;lt;span id=&amp;quot;namd-non-gpu-run-information&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== NAMD non-GPU run information ====&lt;br /&gt;
&lt;br /&gt;
* Charmrun used with one node (condo02); an example launch line follows this list&lt;br /&gt;
* machine topology: 2 sockets x 64 cores x 1 PU = 128-way SMP possible&lt;br /&gt;
* 20 processes, 20 cores, 1 physical node, and 16 GB memory specified&lt;br /&gt;
* NAMD 2.14 used&lt;br /&gt;
&lt;br /&gt;
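As a concrete illustration, a launch line consistent with the settings above might look like the sketch below. The page does not record the actual command, so the file names are assumptions; charmrun with +p is the standard NAMD 2.14 launch mechanism:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# assumed CPU launch: 20 Charm++ processes on a single node&lt;br /&gt;
charmrun namd2 +p20 apoa1.namd &amp;gt; apoa1_cpu.log&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;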
&amp;lt;span id=&amp;quot;namd-gpu-run-information&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== NAMD GPU run information ====&lt;br /&gt;
&lt;br /&gt;
* NAMD run within a Singularity container; a launch sketch follows this list&lt;br /&gt;
** namd:3.0-alpha11 image used to build the Singularity container&lt;br /&gt;
* machine topology: 2 sockets x 64 cores x 1 PU = 128-way SMP possible&lt;br /&gt;
* 2 CPUs, 2 GPUs, 1 OpenMP thread, 1 physical node, and 16 GB memory specified&lt;br /&gt;
** each GPU bound to one NVIDIA A10 with 22731 MB of memory&lt;br /&gt;
* NAMD 3.0-alpha11 used&lt;br /&gt;
&lt;br /&gt;
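A container launch consistent with these settings might look like the following sketch. The image and file names are assumptions; --nv (GPU pass-through in Singularity) and the +p/+devices flags (NAMD thread count and GPU selection) are real options:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# assumed GPU launch inside the container: 2 CPU threads driving 2 GPUs&lt;br /&gt;
singularity exec --nv namd_3.0-alpha11.sif \&lt;br /&gt;
    namd3 +p2 +devices 0,1 apoa1_gpu.namd &amp;gt; apoa1_gpu.log&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;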
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;benchmark-findings&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
=== Benchmark findings ===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;startup-wall-time&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Startup Wall Time ====&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;CPU:&amp;#039;&amp;#039;&amp;#039; 3.4346 s, &amp;#039;&amp;#039;&amp;#039;GPU:&amp;#039;&amp;#039;&amp;#039; 0.0963 s&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;simulation-wall-time&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Simulation Wall Time ====&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;CPU:&amp;#039;&amp;#039;&amp;#039; 38.6991 s, &amp;#039;&amp;#039;&amp;#039;GPU:&amp;#039;&amp;#039;&amp;#039; 5.59571 s&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;wall-time-per-step&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Wall Time Per Step ====&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;CPU:&amp;#039;&amp;#039;&amp;#039; 0.0918308 s/step, &amp;#039;&amp;#039;&amp;#039;GPU:&amp;#039;&amp;#039;&amp;#039; 0.0110155 s/step&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;days-per-nanosecond-simulation&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Days Per Nanosecond Simulation ====&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;CPU:&amp;#039;&amp;#039;&amp;#039; 0.999022 days/ns, &amp;#039;&amp;#039;&amp;#039;GPU:&amp;#039;&amp;#039;&amp;#039; 0.0660301 days/ns&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
-----&lt;br /&gt;
&lt;br /&gt;
&amp;lt;span id=&amp;quot;additional-information&amp;quot;&amp;gt;&amp;lt;/span&amp;gt;&lt;br /&gt;
==== Additional information ====&lt;br /&gt;
&lt;br /&gt;
* Extending the ApoA1 workload with CUDA led to step 3500 being reached in roughly the same wall time that step 500 was reached on 20 CPU cores.&lt;br /&gt;
* per-core speedup with GPU acceleration is ~151.297x; a worked check follows this list&lt;br /&gt;
** caveat: one CPU core must supervise and await output from each GPU because of how NAMD is written&lt;br /&gt;
* the CUDA run performed 10000 steps and wrote output in 136.315155 s total wall time&lt;br /&gt;
* the CPU run performed 500 steps and wrote output in 76.477325 s total wall time&lt;br /&gt;
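&lt;br /&gt;
The per-core figure above can be reproduced from the reported days/ns values: the raw speedup is 0.999022 / 0.0660301, about 15.13, and scaling by the ratio of cores used (20 CPU cores vs. the 2 cores supervising the GPUs) gives the stated ~151.297. A one-line check, assuming bc is available:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
# raw GPU speedup times (20 cores / 2 cores) gives the per-core speedup&lt;br /&gt;
echo &amp;quot;scale=6; (0.999022/0.0660301)*(20/2)&amp;quot; | bc -l&lt;br /&gt;
# prints approximately 151.298, matching the ~151.297 above&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;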
&lt;/div&gt;</summary>
		<author><name>Sysadmin</name></author>
	</entry>
</feed>