Publication Details




Towards Distributed Speculative Adaptive Anisotropic Parallel Mesh Generation


Kevin Garner, Christos Tsolakis, Polykarpos Thomadakis and Nikos Chrisochoides.


Accepted at AIAA Aviation Forum 2024




This paper presents the foundational elements of a distributed memory method for mesh generation that is designed to leverage the concurrency offered by large-scale computing. To achieve this goal, meshing functionality is separated from performance aspects by using a separate entity for each: CDT3D for mesh generation and PREMA for parallel runtime support. Although CDT3D is designed for scalability, it is optimized for execution within a single multicore node; lessons are presented regarding the resulting design oversights, along with the additional measures taken to integrate the code into the distributed memory method as a black box. Whereas CDT3D targets the chip level, the distributed memory method exploits coarse-grain parallelism at the node level. In the presented method, an initial mesh is decomposed into subdomains, which are distributed among the nodes of a high-performance computing (HPC) cluster. Meshing operations within the shared memory code adopt a speculative execution model that restricts adaptation to interior subdomain elements, so that interface elements can be adapted in a separate step that maintains mesh conformity. Interface elements undergo several iterations of shifting so that each is adapted once its data dependencies are resolved. PREMA aids in this endeavor by providing constructs for asynchronous message passing between encapsulations of data, workload balancing, and migration capabilities, all within a globally addressable namespace. PREMA also helps establish data dependencies between subdomains, enabling "neighborhoods" of subdomains to perform interface shifts and adaptation independently of each other. Preliminary results show that after several passes of interface shifts and adaptation, the presented method produces meshes of comparable quality to those generated by the original shared memory CDT3D code.
Given the costly overhead of collective communication observed in existing state-of-the-art software, the relative communication performance of the presented method also shows that its emphasis on avoiding global synchronization is a potentially viable path to scalability when targeting large core counts.
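The interface-shifting idea summarized above can be illustrated with a deliberately simplified, hypothetical sketch. It assumes a one-dimensional "mesh" of n elements split into contiguous subdomains, treats an element as interior when all of its neighbors belong to the same subdomain, and reduces "adaptation" to recording the pass in which an element is processed. The names owner, is_interior, and adapt_with_interface_shifts, the cut-point representation, and the shift width of 2 are assumptions for illustration only; this is not CDT3D or PREMA code.

```python
from bisect import bisect_right


def owner(i: int, cuts: list[int]) -> int:
    """Subdomain index of element i; cuts holds the first element of subdomains 1..p-1."""
    return bisect_right(cuts, i)


def is_interior(i: int, n: int, cuts: list[int]) -> bool:
    """An element is interior if every existing neighbor lies in the same subdomain."""
    me = owner(i, cuts)
    return all(owner(j, cuts) == me for j in (i - 1, i + 1) if 0 <= j < n)


def adapt_with_interface_shifts(n: int, cuts: list[int], shift: int = 2, max_passes: int = 10):
    """Toy model (hypothetical): adapt interior elements, shift interfaces, repeat.

    Returns the pass in which each element was adapted and the number of passes used.
    """
    adapted_in_pass = [None] * n
    for p in range(max_passes):
        # Speculative step: only interior elements are adapted; interface elements
        # are deferred because they depend on data owned by a neighboring subdomain.
        for i in range(n):
            if adapted_in_pass[i] is None and is_interior(i, n, cuts):
                adapted_in_pass[i] = p
        if all(a is not None for a in adapted_in_pass):
            return adapted_in_pass, p + 1
        # Interface shift: move every boundary so previously deferred elements
        # become interior to a single subdomain in the next pass.
        cuts = [c + shift for c in cuts]
    raise RuntimeError("elements remain unadapted after max_passes")


adapted, passes = adapt_with_interface_shifts(12, [4, 8])
print(passes, adapted)
```

With 12 elements split at indices 4 and 8, the interface elements 3, 4, 7, and 8 are deferred in the first pass and adapted in the second, after each boundary moves two elements to the right (a shift of 2 is enough in 1D because each element has only one neighbor per side). For brevity the sketch advances all subdomains in lockstep with a global loop; the method described above instead lets neighborhoods of subdomains proceed independently through asynchronous messaging, which this toy does not model.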
