Difference between revisions of "PDR.PODM Distributed Memory"
Pthomadakis (talk | contribs) (→Current Issues) |
Pthomadakis (talk | contribs) (→Latest Results) |
− | === | + | === Issues === |
* No reuse of leaves refined by worker nodes. The picture below shows the issue. Two neighbour leaves (0,1) each refined as the main leaf (0 top, 1 bottom) but not refined as a neighbour. | * No reuse of leaves refined by worker nodes. The picture below shows the issue. Two neighbour leaves (0,1) each refined as the main leaf (0 top, 1 bottom) but not refined as a neighbour. | ||
[[File:PDR_PODM_Leaves_not_refined.png| 700px]] | [[File:PDR_PODM_Leaves_not_refined.png| 700px]] | ||
[[File:PDR_PODM_Cells_not_distributed.png| 700px]] | [[File:PDR_PODM_Cells_not_distributed.png| 700px]] | ||
− | * | + | * During unpacking the incident cell for each vertex is not set correctly. Specifically, in the case that the initial incident cell is not part of the working unit (Leaf + LVL.1 Neighbours) and thus is not local, |
− | working unit ( | + | it is set to the infinite cell. This causes PODM to crash randomly for some cases. |
− | * The function that unpacks the required leaves before refinement does discard duplicate vertices. Duplicate vertices will always be present since each leaf is packed and sent individually, and as a result, | + | * Another issue comes from the way global IDs are updated for each cell's neighbors' IDs. The code that updates the cell's connectivity using global IDs takes the neighbor's pointer, retrieves its global ID and |
− | neighbouring leaves will include the shared vertices. | + | updates the neighborID field. However, when the neighbour is part of another work unit's leaf and is not local this pointer is NULL. In this case the neighborID field is wrongly reset to the infinite cell ID, which |
+ | as a result, deletes the connectivity information forever. | ||
+ | |||
+ | * The function that unpacks the required leaves before refinement does not discard duplicate vertices. Duplicate vertices will always be present since each leaf is packed and sent individually, and as a result, | ||
+ | neighbouring leaves will include the shared vertices. Because duplicate vertices are not handled, multiple vertex objects are created that are in fact the same point geometrically. Thus, two cells that share | ||
+ | a common vertex could have pointers to two different vertex objects and, as a result, each cell views a different state about the same vertex. | ||
+ | |||
+ | === Fixes === | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | [[File:PDR_Fix.png| 700px]] | ||
+ | |||
+ | [[File:Work_Unit_After_Refinement.png| 700px]] | ||
+ | |||
+ | |||
+ | = Interesting Findings = | ||
+ | == Delta 0.880 == | ||
+ | === 15 MPI ranks depth: 3 === | ||
+ | <div class="mw-collapsible mw-collapsed"> | ||
+ | |||
+ | |||
+ | |||
+ | [[File:PDR PODM Histogram Time 15 d3.png| 700px]] | ||
+ | [[File:PDR PODM Histogram Tasks 15 d3.png| 700px]] | ||
+ | [[File:PDR PODM Time Break Down 15 d3.png| 700px]] | ||
+ | |||
+ | [[File:PDR PODM Parallelism.png| 700px]] | ||
+ | [[File:PDR_PODM_Histogram_15.png| 700px]] | ||
+ | </div> | ||
+ | |||
+ | === 15 MPI ranks depth: 4 === | ||
+ | <div class="mw-collapsible mw-collapsed"> | ||
+ | |||
+ | |||
+ | |||
+ | [[File:PDR PODM Histogram Time 15.png| 700px]] | ||
+ | [[File:PDR PODM Histogram Tasks 15.png| 700px]] | ||
+ | [[File:PDR PODM Time Break Down 15.png| 700px]] | ||
+ | |||
+ | </div> | ||
+ | |||
+ | === 40 MPI ranks / 40 cores depth: 4 === | ||
+ | <div class="mw-collapsible mw-collapsed"> | ||
+ | Total Time: 824.29 | ||
+ | |||
+ | Total Tasks: 11413 | ||
+ | |||
+ | [[File:PDR PODM Histogram Time 40.png| 700px]] | ||
+ | [[File:PDR PODM Histrogram Tasks 40.png| 700px]] | ||
+ | [[File:PDR PODM Time Break Down 40.png| 700px]] | ||
+ | </div> | ||
+ | |||
+ | === 160 MPI ranks / 10 cores depth: 4 === | ||
+ | <div class="mw-collapsible mw-collapsed"> | ||
+ | Total Time: 378.32 | ||
+ | |||
+ | Total Tasks: 12652 | ||
+ | |||
+ | [[File:PDR PODM Histogram Time 160 10.png| 700px]] | ||
+ | [[File:PDR PODM Histrogram Tasks 160 10.png| 700px]] | ||
+ | [[File:PDR PODM Time Break Down 160 10.png| 700px]] | ||
+ | </div> | ||
+ | |||
+ | == After Parallel int to pointer == | ||
+ | === 15 MPI ranks depth: 3 === | ||
+ | <div class="mw-collapsible mw-collapsed"> | ||
+ | |||
+ | |||
+ | [[File:PDR PODM Histogram Time 15 d3 par int2ptr.png| 700px]] | ||
+ | [[File:PDR PODM Histogram Tasks 15 d3 par int2ptr.png| 700px]] | ||
+ | |||
+ | |||
+ | {| class="wikitable" style="text-align: center; | ||
+ | ! Sequential | ||
+ | ! Parallel | ||
+ | |- | ||
+ | | [[File:PDR_PODM_Time_Break_Down_15_d3.png| 700px]] | ||
+ | | [[File:PDR PODM Time Break Down 15 d3 par int2ptr.png| 700px]] | ||
+ | |} | ||
+ | |||
+ | </div> | ||
+ | |||
+ | === 15 MPI ranks depth: 4 === | ||
+ | <div class="mw-collapsible mw-collapsed"> | ||
+ | Total time: 569.7 | ||
+ | |||
+ | Total tasks: 8761 | ||
+ | |||
+ | [[File:PDR PODM Histogram Time 15 par int2ptr.png| 700px]] | ||
+ | [[File:PDR PODM Histogram Tasks 15 par int2ptr.png| 700px]] | ||
+ | |||
+ | |||
+ | {| class="wikitable" style="text-align: center; | ||
+ | ! Sequential | ||
+ | ! Parallel | ||
+ | |- | ||
+ | | [[File:PDR_PODM_Time_Break_Down_15.png| 700px]] | ||
+ | | [[File:PDR PODM Time Break Down 15 par int2ptr.png| 700px]] | ||
+ | |} | ||
+ | |||
+ | </div> | ||
+ | |||
+ | === 40 MPI ranks depth: 4 === | ||
+ | <div class="mw-collapsible mw-collapsed"> | ||
+ | Total time: 326.5 | ||
+ | |||
+ | Total tasks: 11201 | ||
+ | |||
+ | [[File:PDR PODM Histogram Time 40 par int2ptr.png| 700px]] | ||
+ | [[File:PDR PODM Histogram Tasks 40 par int2ptr.png| 700px]] | ||
+ | |||
+ | |||
+ | {| class="wikitable" style="text-align: center; | ||
+ | ! Sequential | ||
+ | ! Parallel | ||
+ | |- | ||
+ | | [[File:PDR_PODM_Time_Break_Down_40.png| 700px]] | ||
+ | | [[File:PDR PODM Time Break Down 40 par int2ptr.png| 700px]] | ||
+ | |} | ||
+ | |||
+ | </div> | ||
+ | |||
+ | === 160 MPI ranks 10 cores depth: 4 === | ||
+ | <div class="mw-collapsible mw-collapsed"> | ||
+ | Total time: 264.3 | ||
+ | |||
+ | Total tasks: 12826 | ||
+ | |||
+ | [[File:PDR PODM Histogram Time 160 10 par int2ptr.png| 700px]] | ||
+ | [[File:PDR PODM Histogram Tasks 160 10 par int2ptr.png| 700px]] | ||
+ | |||
+ | |||
+ | {| class="wikitable" style="text-align: center; | ||
+ | ! Sequential | ||
+ | ! Parallel | ||
+ | |- | ||
+ | | [[File:PDR_PODM_Time_Break_Down_160_10.png| 700px]] | ||
+ | | [[File:PDR PODM Time Break Down 160 10 par int2ptr.png| 700px]] | ||
+ | |} | ||
+ | |||
+ | </div> | ||
+ | |||
+ | == After Parallel int to pointer and leaf distribution == | ||
+ | === 15 MPI ranks depth: 4 === | ||
+ | <div class="mw-collapsible mw-collapsed"> | ||
+ | Total time: 452.3 | ||
+ | |||
+ | Total tasks: 8813 | ||
+ | |||
+ | [[File:PDR PODM Histogram Time 15 par int2ptr leaf dist bad elements.png| 700px]] | ||
+ | [[File:PDR PODM Histogram Tasks 15 par int2ptr leaf dist bad elements.png| 700px]] | ||
+ | |||
+ | |||
+ | {| class="wikitable" style="text-align: center; | ||
+ | ! Sequential | ||
+ | ! Parallel(int2ptr) | ||
+ | ! Parallel(leafDist,BadEl) | ||
+ | |- | ||
+ | | [[File:PDR_PODM_Time_Break_Down_15.png| 500px]] | ||
+ | | [[File:PDR PODM Time Break Down 15 par int2ptr.png| 500px]] | ||
+ | | [[File:PDR PODM Time Break Down 15 par int2ptr leaf dist bad elements.png| 500px]] | ||
+ | |} | ||
+ | |||
+ | </div> | ||
+ | |||
+ | === 40 MPI ranks depth: 4 === | ||
+ | <div class="mw-collapsible mw-collapsed"> | ||
+ | Total time: 271.9 | ||
+ | |||
+ | Total tasks: 11455 | ||
+ | |||
+ | [[File:PDR PODM Histogram Time 40 par int2ptr leaf dist bad elements.png| 700px]] | ||
+ | [[File:PDR PODM Histogram Tasks 40 par int2ptr leaf dist bad elements.png| 700px]] | ||
+ | |||
+ | |||
+ | {| class="wikitable" style="text-align: center; | ||
+ | ! Sequential | ||
+ | ! Parallel(int2ptr) | ||
+ | ! Parallel(leafDist,BadEl) | ||
+ | |- | ||
+ | | [[File:PDR_PODM_Time_Break_Down_40.png| 500px]] | ||
+ | | [[File:PDR PODM Time Break Down 40 par int2ptr.png| 500px]] | ||
+ | | [[File:PDR PODM Time Break Down 40 par int2ptr leaf dist bad elements.png| 500px]] | ||
+ | |} | ||
+ | |||
+ | </div> | ||
+ | |||
+ | === 160 MPI ranks 10 cores depth: 4 === | ||
+ | <div class="mw-collapsible mw-collapsed"> | ||
+ | Total time: 218.3 | ||
+ | |||
+ | Total tasks: 12517 | ||
+ | |||
+ | [[File:PDR PODM Histogram Time 160 10 par int2ptr leaf dist bad elements.png| 700px]] | ||
+ | [[File:PDR PODM Histrogram Tasks 160 10 par int2ptr leaf dist bad elements.png| 700px]] | ||
+ | |||
+ | {| class="wikitable" style="text-align: center; | ||
+ | ! Sequential | ||
+ | ! Parallel(int2ptr) | ||
+ | ! Parallel(leafDist,BadEl) | ||
+ | |- | ||
+ | | [[File:PDR PODM Time Break Down 160 10.png| 500px]] | ||
+ | | [[File:PDR PODM Time Break Down 160 10 par int2ptr.png| 500px]] | ||
+ | | [[File:PDR PODM Time Break Down 160 10 par int2ptr leaf dist bad elements.png| 500px]] | ||
+ | |} | ||
+ | </div> | ||
+ | |||
+ | == After Parallel int to pointer, leaf distribution and extra comm thread == | ||
+ | === 15 MPI ranks depth: 3 === | ||
+ | <div class="mw-collapsible mw-collapsed"> | ||
+ | |||
+ | Total time: 1144.7 | ||
+ | |||
+ | Total tasks: 1830 | ||
+ | |||
+ | [[File:PDR PODM Histogram Time 15 d3 par int2ptr leaf dist bad elements comm thread.png| 700px]] | ||
+ | [[File:PDR PODM Histrogram Tasks 15 d3 par int2ptr leaf dist bad elements comm thread.png| 700px]] | ||
+ | |||
+ | |||
+ | {| class="wikitable" style="text-align: center; | ||
+ | ! Sequential | ||
+ | ! Parallel(best) | ||
+ | ! Parallel(best+comm_thread) | ||
+ | |- | ||
+ | | [[File:PDR_PODM_Time_Break_Down_15_d3.png| 500px]] | ||
+ | | [[File:PDR PODM Time Break Down 15 d3 par int2ptr leaf dist bad elements.png| 500px]] | ||
+ | | [[File:PDR PODM Time Break Down 15 d3 par int2ptr leaf dist bad elements comm thread.png| 500px]] | ||
+ | |||
+ | |} | ||
+ | |||
+ | </div> | ||
+ | |||
+ | === 15 MPI ranks depth: 4 === | ||
+ | <div class="mw-collapsible mw-collapsed"> | ||
+ | |||
+ | Total time: 354.6 | ||
+ | |||
+ | Total tasks: 8892 | ||
+ | |||
+ | [[File:PDR PODM Histogram Time 15 par int2ptr leaf dist bad elements comm thread.png| 700px]] | ||
+ | [[File:PDR PODM Histrogram Tasks 15 par int2ptr leaf dist bad elements comm thread.png| 700px]] | ||
+ | |||
+ | |||
+ | {| class="wikitable" style="text-align: center; | ||
+ | ! Sequential | ||
+ | ! Parallel(best) | ||
+ | ! Parallel(best+comm_thread) | ||
+ | |- | ||
+ | | [[File:PDR_PODM_Time_Break_Down_15.png| 500px]] | ||
+ | | [[File:PDR PODM Time Break Down 15 par int2ptr leaf dist bad elements.png| 500px]] | ||
+ | | [[File:PDR PODM Time Break Down 15 par int2ptr leaf dist bad elements comm thread.png| 500px]] | ||
+ | |||
+ | |} | ||
+ | |||
+ | </div> | ||
+ | |||
+ | === 40 MPI ranks depth: 4 === | ||
+ | <div class="mw-collapsible mw-collapsed"> | ||
+ | |||
+ | Total time: 208.3 | ||
+ | |||
+ | Total tasks: 11468 | ||
+ | |||
+ | |||
+ | [[File:PDR PODM Histogram Time 40 par int2ptr leaf dist bad elements comm thread.png| 700px]] | ||
+ | [[File:PDR PODM Histrogram Tasks 40 par int2ptr leaf dist bad elements comm thread.png| 700px]] | ||
+ | |||
+ | |||
+ | {| class="wikitable" style="text-align: center; | ||
+ | ! Sequential | ||
+ | ! Parallel(best) | ||
+ | ! Parallel(best+comm_thread) | ||
+ | |- | ||
+ | | [[File:PDR_PODM_Time_Break_Down_40.png| 500px]] | ||
+ | | [[File:PDR PODM Time Break Down 40 par int2ptr leaf dist bad elements.png| 500px]] | ||
+ | | [[File:PDR PODM Time Break Down 40 par int2ptr leaf dist bad elements comm thread.png| 500px]] | ||
+ | |||
+ | |} | ||
+ | |||
+ | </div> | ||
+ | |||
+ | === 160 MPI ranks depth: 4 === | ||
+ | <div class="mw-collapsible mw-collapsed"> | ||
+ | |||
+ | Total time: 202.2 | ||
+ | |||
+ | Total tasks: 12715 | ||
+ | |||
+ | |||
+ | [[File:PDR PODM Histogram Time 160 10 par int2ptr leaf dist bad elements comm thread.png| 700px]] | ||
+ | [[File:PDR PODM Histrogram Tasks 160 10 par int2ptr leaf dist bad elements comm thread.png| 700px]] | ||
+ | |||
+ | |||
+ | {| class="wikitable" style="text-align: center; | ||
+ | ! Sequential | ||
+ | ! Parallel(best) | ||
+ | ! Parallel(best+comm_thread) | ||
+ | |- | ||
+ | | [[File:PDR_PODM_Time_Break_Down_160_10.png| 500px]] | ||
+ | | [[File:PDR PODM Time Break Down 160 10 par int2ptr leaf dist bad elements.png| 500px]] | ||
+ | | [[File:PDR PODM Time Break Down 160 10 par int2ptr leaf dist bad elements comm thread.png| 500px]] | ||
+ | |||
+ | |} | ||
+ | |||
+ | </div> | ||
+ | |||
+ | == Delta 0.3780 == | ||
+ | === 15 MPI ranks depth: 4 === | ||
+ | <div class="mw-collapsible mw-collapsed"> | ||
+ | |||
+ | |||
+ | |||
+ | [[File:PDR PODM Histogram Time small_delta 15.png| 700px]] | ||
+ | [[File:PDR PODM Histogram Tasks small_delta 15.png| 700px]] | ||
+ | [[File:PDR PODM Time Break Down small_delta 15.png| 700px]] | ||
+ | |||
+ | </div> | ||
+ | |||
+ | |||
+ | === Latest Results === | ||
+ | |||
+ | {| class="wikitable" | ||
+ | |+ Shared MemoryTimes | ||
+ | |- | ||
+ | ! Cores !! Total Time (s) !! Number of Elements | ||
+ | |- | ||
+ | | 40|| 90.8 || 49453719 | ||
+ | |- | ||
+ | |} | ||
+ | |||
+ | |||
+ | |||
+ | {| class="wikitable" | ||
+ | |+ MPI Times | ||
+ | |- | ||
+ | ! !! colspan=2 | MPI !! colspan=2 | PREMA | ||
+ | |- | ||
+ | ! Cores !! Total Time (s) !! Number of Elements !! Total Time (s) !! Number of Elements | ||
+ | |- | ||
+ | | 100 || 1151.472406|| 49352359 || 208.746450 || 49347855 | ||
+ | |- | ||
+ | | 200 || 763.70671 || 49357898 || 121.326012 || 49353442 | ||
+ | |- | ||
+ | | 300 || 537.678638 || 49357092 || 105.812248 || 49351224 | ||
+ | |- | ||
+ | | 400 || 490.365970 || 49357881 || 93.101481 || 49351626 | ||
+ | |- | ||
+ | | 500 || 434.921334 || 49347357 || 93.119187 || 49351049 | ||
+ | |- | ||
+ | | 600 || 466.822803 || 49361647 || 96.243807 || 49345030 | ||
+ | |- | ||
+ | | 700 || 425.273205 || 49360615 || 97.721654 || 49346798 | ||
+ | |- | ||
+ | | 800 || 434.638603 || 49349629 || 96.723318 || 49341034 | ||
+ | |- | ||
+ | |} | ||
+ | |||
+ | [[File:PDR.png]] |
Latest revision as of 14:49, 19 May 2023
Issues
- No reuse of leaves refined by worker nodes. The picture below shows the issue. Two neighbouring leaves (0 and 1) are each refined as the main leaf (0 top, 1 bottom) but are not refined when they act as a neighbour.
- The current algorithm uses neighbour traversal to distribute cells to octree leaves, which leaves some cells out in some cases. This can happen when a cell belongs to an octree leaf based on its circumcenter but has no neighbour in the same leaf.
- During unpacking, the incident cell of each vertex is not set correctly. Specifically, when the initial incident cell is not part of the work unit (leaf + level-1 neighbours), and thus is not local, it is set to the infinite cell. This causes PODM to crash randomly in some cases.
- Another issue comes from the way the global IDs of each cell's neighbours are updated. The code that updates a cell's connectivity takes each neighbour's pointer, retrieves its global ID, and writes it to the neighborID field. However, when the neighbour belongs to another work unit's leaf and is not local, this pointer is NULL. In that case the neighborID field is wrongly reset to the infinite cell ID, which permanently deletes the connectivity information.
- The function that unpacks the required leaves before refinement does not discard duplicate vertices. Duplicates are always present because each leaf is packed and sent individually, so neighbouring leaves both include their shared vertices. Because duplicates are not handled, multiple vertex objects are created for what is geometrically the same point. Two cells that share a vertex can therefore point to two different vertex objects, and each cell sees a different state for the same vertex.
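The last two issues can be illustrated with a minimal sketch. It is not the PODM code: the class names, field names, and the `INFINITE_CELL_ID` sentinel are hypothetical, and it assumes that copies of a shared vertex arrive with bit-identical coordinates, so the coordinates can serve as a lookup key (a global vertex ID would work the same way).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Coords = Tuple[float, float, float]
INFINITE_CELL_ID = -1  # hypothetical sentinel for the infinite cell


@dataclass
class Vertex:
    coords: Coords


@dataclass
class Cell:
    global_id: int
    # Pointers to local neighbours; None when the neighbour is not local.
    neighbors: List[Optional["Cell"]] = field(default_factory=list)
    # Global IDs of the neighbours, meaningful even when the pointer is None.
    neighbor_ids: List[int] = field(default_factory=list)


def unpack_vertex(coords: Coords, table: Dict[Coords, Vertex]) -> Vertex:
    """Return the one shared Vertex for these coordinates, creating it once."""
    vertex = table.get(coords)
    if vertex is None:
        vertex = Vertex(coords)
        table[coords] = vertex
    return vertex


def update_neighbor_ids(cell: Cell) -> None:
    """Refresh IDs from local pointers; keep the stored ID for non-local ones."""
    for i, neighbor in enumerate(cell.neighbors):
        if neighbor is not None:
            cell.neighbor_ids[i] = neighbor.global_id
        # A None pointer means "not local", so the stored ID is left untouched
        # instead of being overwritten with INFINITE_CELL_ID.


# Two leaves that both carry the same shared vertex yield a single object.
table: Dict[Coords, Vertex] = {}
first = unpack_vertex((0.0, 1.0, 2.0), table)
second = unpack_vertex((0.0, 1.0, 2.0), table)

# Neighbour 0 is local (ID refreshed); neighbour 1 is remote (ID 42 preserved).
local = Cell(global_id=7)
cell = Cell(global_id=3, neighbors=[local, None],
            neighbor_ids=[INFINITE_CELL_ID, 42])
update_neighbor_ids(cell)
```

With this approach the lookup table has to live for the whole unpacking of a work unit, so that every leaf in the unit resolves shared vertices to the same object.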
Fixes
Interesting Findings
Delta 0.880
15 MPI ranks depth: 3
15 MPI ranks depth: 4
40 MPI ranks / 40 cores depth: 4
160 MPI ranks / 10 cores depth: 4
After Parallel int to pointer
15 MPI ranks depth: 3
15 MPI ranks depth: 4
40 MPI ranks depth: 4
160 MPI ranks 10 cores depth: 4
After Parallel int to pointer and leaf distribution
15 MPI ranks depth: 4
40 MPI ranks depth: 4
160 MPI ranks 10 cores depth: 4
After Parallel int to pointer, leaf distribution and extra comm thread
15 MPI ranks depth: 3
15 MPI ranks depth: 4
40 MPI ranks depth: 4
160 MPI ranks depth: 4
Delta 0.3780
15 MPI ranks depth: 4
Latest Results
Shared Memory Times

Cores | Total Time (s) | Number of Elements |
---|---|---|
40 | 90.8 | 49453719 |
MPI Times

Cores | MPI Total Time (s) | MPI Number of Elements | PREMA Total Time (s) | PREMA Number of Elements |
---|---|---|---|---|
100 | 1151.472406 | 49352359 | 208.746450 | 49347855 |
200 | 763.70671 | 49357898 | 121.326012 | 49353442 |
300 | 537.678638 | 49357092 | 105.812248 | 49351224 |
400 | 490.365970 | 49357881 | 93.101481 | 49351626 |
500 | 434.921334 | 49347357 | 93.119187 | 49351049 |
600 | 466.822803 | 49361647 | 96.243807 | 49345030 |
700 | 425.273205 | 49360615 | 97.721654 | 49346798 |
800 | 434.638603 | 49349629 | 96.723318 | 49341034 |
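
To make the table easier to compare, the following sketch computes the ratio of MPI time to PREMA time for each core count. The timings are copied from the table above; the function names are illustrative only and are not part of the project.

```python
# (cores, MPI total time in s, PREMA total time in s), copied from the table.
TIMES = [
    (100, 1151.472406, 208.746450),
    (200, 763.70671, 121.326012),
    (300, 537.678638, 105.812248),
    (400, 490.365970, 93.101481),
    (500, 434.921334, 93.119187),
    (600, 466.822803, 96.243807),
    (700, 425.273205, 97.721654),
    (800, 434.638603, 96.723318),
]


def speedup_over_mpi(times):
    """Map each core count to (MPI time / PREMA time)."""
    return {cores: mpi / prema for cores, mpi, prema in times}


def fastest(times, column):
    """Return the row with the smallest time in the given column (1=MPI, 2=PREMA)."""
    return min(times, key=lambda row: row[column])


speedups = speedup_over_mpi(TIMES)
for cores, ratio in speedups.items():
    print(f"{cores:4d} cores: MPI time / PREMA time = {ratio:.2f}")

print("Fastest MPI run:  ", fastest(TIMES, 1)[0], "cores")
print("Fastest PREMA run:", fastest(TIMES, 2)[0], "cores")
```

In these measurements the PREMA time stops improving after about 400 cores, while the MPI time keeps decreasing, with some noise, up to 700 cores.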