Difference between revisions of "Runtime Systems"

From crtc.cs.odu.edu

Revision as of 07:46, 27 March 2018

PREMA 2.0

Synthetic Benchmark

To examine the performance of the runtime system itself, without the difficulties introduced by the complexity of the applications running on top of it, we have developed a synthetic benchmark. The benchmark begins by dispersing work units to the available processors; computation is then invoked via PREMA's messaging mechanism. Once the computations of a data object are complete, a notification is sent back to the root processor. The application terminates once all notifications have been received. The number of subdomains per available core is set to 10, and the work units are divided by weight into two categories, light and heavy. The average time of a heavyweight unit is 2.5 times that of a lightweight one, and 20% of the work units are assigned to the heavy category.
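The workload parameters above can be illustrated with a minimal Python sketch. It is not the PREMA benchmark code: the function names, the fixed (rather than average) unit times, the random placement of heavy units, and the static block distribution are all assumptions made for illustration. It only shows how 10 units per core with a 20% heavy share and a 2.5x weight ratio translate into per-worker load when no balancing is applied.

```python
import random

def make_work_units(num_cores, units_per_core=10, heavy_fraction=0.2,
                    heavy_ratio=2.5, light_time=1.0, seed=0):
    """Return a shuffled list of unit weights in arbitrary time units.

    Hypothetical helper: each heavy unit costs exactly heavy_ratio times a
    light unit, whereas the benchmark only fixes the *average* ratio.
    """
    total = num_cores * units_per_core
    num_heavy = round(total * heavy_fraction)
    weights = ([light_time * heavy_ratio] * num_heavy
               + [light_time] * (total - num_heavy))
    random.Random(seed).shuffle(weights)  # assumed: heavy units land at random
    return weights

def static_loads(weights, num_workers):
    """Per-worker load for a fixed block distribution without load balancing."""
    per_worker = len(weights) // num_workers
    return [sum(weights[w * per_worker:(w + 1) * per_worker])
            for w in range(num_workers)]

weights = make_work_units(num_cores=320)
loads = static_loads(weights, num_workers=320)
mean_load = sum(loads) / len(loads)

print("work units:", len(weights))
print("mean load per worker:", mean_load)
print("max load per worker:", max(loads))
```

With these parameters the mean load is 13 light-unit times per worker (8 light units plus 2 heavy units at 2.5 each), and any worker above that mean under static distribution represents imbalance that a load balancer would aim to remove.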

Figures: RTS_MPI_320.png (MPI), RTS_PREMA_320.png (PREMA)
Per-worker load comparison between MPI and PREMA with ILB on 320 cores. MPI uses all 320 cores as workers, while PREMA uses 310 cores as workers plus 10 for communication.
Figures: RTS_MPI_640.png (MPI), RTS_PREMA_640.png (PREMA)
Per-worker load comparison between MPI and PREMA with ILB on 640 cores. MPI uses all 640 cores as workers, while PREMA uses 620 cores as workers plus 20 for communication.
Figures: RTS_MPI_1280.png (MPI), RTS_PREMA_1280.png (PREMA)
Per-worker load comparison between MPI and PREMA with ILB on 1280 cores. MPI uses all 1280 cores as workers, while PREMA uses 1240 cores as workers plus 40 for communication.
Figures: RTS_MPI_3200.png (MPI), RTS_PREMA_3200.png (PREMA)
Per-worker load comparison between MPI and PREMA with ILB on 3200 cores. MPI uses all 3200 cores as workers, while PREMA uses 3000 cores as workers plus 200 for communication.
Figures: RTS_MPI_4800.png (MPI), RTS_PREMA_4800.png (PREMA)
Per-worker load comparison between MPI and PREMA with ILB on 4800 cores. MPI uses all 4800 cores as workers, while PREMA uses 4500 cores as workers plus 300 for communication.
Figures: RTS_MPI_5600.png (MPI), RTS_PREMA_5600.png (PREMA)
Per-worker load comparison between MPI and PREMA with ILB on 5600 cores. MPI uses all 5600 cores as workers, while PREMA uses 5250 cores as workers plus 350 for communication.
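The core counts quoted in the captions can be tabulated to see what share of the cores PREMA dedicates to communication in each run. The short sketch below only restates the caption numbers; the variable and function names are hypothetical.

```python
# (total cores, PREMA worker cores, PREMA communication cores), from the captions
configs = [
    (320, 310, 10),
    (640, 620, 20),
    (1280, 1240, 40),
    (3200, 3000, 200),
    (4800, 4500, 300),
    (5600, 5250, 350),
]

def comm_share(total, workers, comm):
    """Fraction of cores reserved for communication in a PREMA run."""
    assert workers + comm == total, "caption numbers should add up"
    return comm / total

shares = {total: comm_share(total, workers, comm)
          for total, workers, comm in configs}

for total, share in shares.items():
    print(f"{total:>5} cores: {share:.4%} used for communication")
```

By this arithmetic, the three smaller runs reserve 1 core in 32 for communication and the three larger runs reserve 1 core in 16.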

Decoupled CDT3D