PREMA SW4
From crtc.cs.odu.edu
MPI Times
PREMA Times
Allocation: 300 cores in different configurations
Pure MPI Time: 182.32
Cluster: Wahab
Nodes:
d4-w6420b-[07-12], e1-w6420b-20, e2-w6420b-[02-04,06,08,17], e3-w6420b-[09-12,17,20]
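The node list above appears to use the bracketed range notation common to cluster schedulers. As a rough sketch, assuming that notation (a prefix followed by comma-separated indices or zero-padded `lo-hi` ranges in square brackets), the list can be expanded to count the distinct hosts in the allocation. The helper name `expand_hostlist` is hypothetical and not part of any scheduler API.

```python
import re

def expand_hostlist(hostlist: str) -> list[str]:
    """Expand entries such as 'e2-w6420b-[02-04,06]' into individual host names.

    Assumption: each entry has at most one bracket group, holding
    comma-separated indices or zero-padded 'lo-hi' ranges.
    """
    hosts = []
    # Split on commas that are not inside a [...] group.
    for entry in re.split(r",\s*(?![^\[]*\])", hostlist.strip()):
        match = re.fullmatch(r"(.*)\[(.*)\](.*)", entry)
        if not match:
            hosts.append(entry)  # plain host name, nothing to expand
            continue
        prefix, body, suffix = match.groups()
        for part in body.split(","):
            lo, _, hi = part.partition("-")
            hi = hi or lo          # a single index is a range of length one
            width = len(lo)        # preserve zero padding, e.g. '07'
            for i in range(int(lo), int(hi) + 1):
                hosts.append(f"{prefix}{i:0{width}d}{suffix}")
    return hosts

WAHAB_NODES = (
    "d4-w6420b-[07-12], e1-w6420b-20, "
    "e2-w6420b-[02-04,06,08,17], e3-w6420b-[09-12,17,20]"
)
hosts = expand_hostlist(WAHAB_NODES)
print(len(hosts), "hosts")
```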
#Nodes | #Cores | Total Time | Recv | Send | #m Sent | #m Recvd | App Handlers | Creating Work | P2P-MP | Handlers Executed | LB | ILB | Yieldables | Blocked | Steal | Steal_Succ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
300 | 1 | 170.65 | 0.66 | 0.65 | 12184.75 | 20892.91 | 135.64 | 147.99 | 2.99 | 12126.90 | 11.27 | 0.0 | 35.43 | 103.84 | 1.54 | 0.0 |
150 | 2 | 175.58 | 0.87 | 0.94 | 17895.50 | 30110.82 | 139.69 | 140.11 | 34.10 | 12126.90 | 0.68 | 0.0 | 66.44 | 73.16 | 0.32 | 0.25 |
100 | 3 | 176.58 | 1.08 | 1.28 | 24305.74 | 40606.23 | 143.09 | 143.49 | 31.79 | 12126.90 | 0.68 | 0.0 | 76.11 | 66.93 | 0.27 | 0.20 |
75 | 4 | 176.22 | 1.16 | 1.48 | 29436.79 | 48996.64 | 143.51 | 143.92 | 30.98 | 12126.90 | 0.69 | 0.0 | 75.51 | 67.91 | 0.32 | 0.24 |
60 | 5 | 178.04 | 1.29 | 1.71 | 35129.87 | 58484.55 | 138.08 | 138.57 | 37.84 | 12126.90 | 0.85 | 0.0 | 68.08 | 69.91 | 0.37 | 0.27 |
50 | 6 | 179.93 | 1.49 | 2.10 | 42224.16 | 70137.46 | 141.62 | 142.09 | 36.24 | 12126.90 | 0.83 | 0.0 | 71.45 | 70.12 | 0.32 | 0.21 |
30 | 10 | 187.78 | 1.67 | 3.07 | 64217.67 | 106014.10 | 145.65 | 146.21 | 39.85 | 12126.90 | 0.91 | 0.0 | 73.14 | 72.40 | 0.47 | 0.31 |
25 | 12 | 192.23 | 1.93 | 3.91 | 78517.24 | 129349.92 | 146.19 | 146.82 | 43.58 | 12126.90 | 1.03 | 0.0 | 71.80 | 74.22 | 0.56 | 0.37 |
20 | 15 | 193.54 | 1.93 | 4.53 | 96706.40 | 158971.15 | 143.63 | 144.33 | 47.32 | 12126.90 | 1.12 | 0.0 | 69.72 | 73.74 | 0.60 | 0.38 |
15 | 20 | 195.95 | 2.02 | 5.75 | 123265.27 | 201163.20 | 149.65 | 150.40 | 43.71 | 12126.90 | 1.08 | 0.0 | 72.42 | 76.95 | 0.79 | 0.50 |
12 | 25 | 210.77 | 1.89 | 6.07 | 126917.50 | 205646.75 | 155.94 | 156.87 | 51.74 | 12126.90 | 1.30 | 0.0 | 73.67 | 81.81 | 1.13 | 0.73 |
10 | 30 | 222.58 | 1.74 | 6.01 | 123972.0 | 199357.30 | 160.44 | 161.57 | 58.68 | 12126.90 | 1.48 | 0.0 | 76.66 | 83.16 | 1.45 | 0.91 |
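To make the comparison with the pure MPI run easier to read, the sketch below divides each PREMA "Total Time" from the table above by the reported Pure MPI Time on Wahab (182.32). It assumes that this figure is the baseline for all 300-core configurations; the variable and function names are illustrative only. A ratio below 1.0 means the PREMA configuration finished sooner than the pure MPI run.

```python
PURE_MPI_TIME = 182.32  # "Pure MPI Time" reported for Wahab (same units as the table)

# (first column, second column) -> "Total Time", copied from the table above
PREMA_TOTAL_TIME = {
    (300, 1): 170.65, (150, 2): 175.58, (100, 3): 176.58, (75, 4): 176.22,
    (60, 5): 178.04, (50, 6): 179.93, (30, 10): 187.78, (25, 12): 192.23,
    (20, 15): 193.54, (15, 20): 195.95, (12, 25): 210.77, (10, 30): 222.58,
}

def ratio_to_mpi(total_time: float, baseline: float = PURE_MPI_TIME) -> float:
    """Return PREMA total time relative to the pure MPI baseline."""
    return total_time / baseline

ratios = {cfg: ratio_to_mpi(t) for cfg, t in PREMA_TOTAL_TIME.items()}

# Configurations whose total time is below the pure MPI time.
faster = [cfg for cfg, r in ratios.items() if r < 1.0]

for (n, c), r in ratios.items():
    print(f"{n:>3} x {c:<2}  ratio = {r:.3f}")
```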
Allocation: 520 cores without a dedicated thread for MPI; 1 MPI rank per socket, i.e. 4 MPI ranks per node.
#Nodes | #Cores | Total Time | Recv | Send | #m Sent | #m Recvd | App Handlers | Creating Work | P2P-MP | Handlers Executed | LB | ILB | Yieldables | Blocked | Steal | Steal_Succ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
100 | 10 | 65.69 | 0.75 | 2.47 | 68044.18 | 111721.23 | 49.68 | 52.72 | 6.27 | 12459.53 | 4.55 | 0.0 | 21.10 | 26.85 | 3.58 | 2.45 |
Cluster: Turing
Pure MPI Time: 268.56
Allocation: 310 cores (due to issues with the tools on Turing)
#Nodes | #Cores | Total Time | Recv | Send | #m Sent | #m Recvd | App Handlers | Creating Work | P2P-MP | Handlers Executed | LB | ILB | Yieldables | Blocked | Steal | Steal_Succ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | 31 | 276.08 | 34.63 | 51.60 | 186837.90 | 300642.30 | 208.39 | 212.43 | 59.56 | 12026.52 | 2.06 | 0.0 | 106.82 | 100.66 | 3.34 | 2.39 |
Allocation: 496 cores
#Nodes | #Cores | Total Time | Recv | Send | #m Sent | #m Recvd | App Handlers | Creating Work | P2P-MP | Handlers Executed | LB | ILB | Yieldables | Blocked | Steal | Steal_Succ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
16 | 31 | 182.90 | 33.29 | 47.52 | 192765.06 | 313180.31 | 135.00 | 137.86 | 40.99 | 12267.95 | 1.44 | 0.0 | 66.71 | 67.84 | 2.19 | 1.53 |
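The two Turing runs can be related with a simple strong-scaling calculation. This is only a sketch: it assumes both runs solve the same problem, so that the ratio of total times can be read as a speedup, and the function name `strong_scaling` is made up for illustration.

```python
def strong_scaling(t_small: float, cores_small: int,
                   t_large: float, cores_large: int) -> tuple[float, float]:
    """Return (speedup, parallel efficiency) going from the small to the large run."""
    speedup = t_small / t_large
    ideal = cores_large / cores_small  # speedup expected under perfect scaling
    return speedup, speedup / ideal

# Total Time values from the 310-core and 496-core Turing tables above
speedup, efficiency = strong_scaling(276.08, 310, 182.90, 496)
print(f"speedup = {speedup:.3f}, efficiency = {efficiency:.3f}")
```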