Difference between revisions of "PDR.AFLR"

From crtc.cs.odu.edu
The goal of this project is to develop a parallel mesh generator based on CRTC’s PDR theory, which mathematically guarantees the following mesh generation requirements:
:# '''Stability''': the quality of the mesh generated in parallel must be comparable to that of a mesh generated sequentially. Quality is defined in terms of the shape of the elements (using a chosen space-dependent metric) and the number of elements (fewer is better for the same shape constraint).
:# '''Robustness''': the software must correctly and efficiently process any input data. Operator intervention in a massively parallel computation is not only highly expensive but most likely infeasible, due to the large number of concurrently processed sub-problems.
:# '''Code re-use''': a modular design of the parallel software that builds upon a previously designed sequential meshing code, such that the sequential code can be replaced and/or updated with minimal effort. Due to the complexity of meshing codes, this is the only practical approach for keeping up with ever-evolving sequential algorithms.
:# '''Scalability''': measured by speedup, the ratio of the time taken by the best sequential implementation to the time taken by the parallel implementation. The speedup is always limited by the inverse of the sequential fraction of the software; therefore, all non-trivial stages of the computation must be parallelized to leverage current architectures with millions of cores.
:# '''Reproducibility''': (weak & strong)
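The bound mentioned in the scalability requirement is Amdahl's law. The following is a minimal sketch of that formula, not code from PDR or AFLR; the function names and the 1% sequential fraction used in the demo are assumptions chosen only for illustration.

```python
def amdahl_speedup(sequential_fraction: float, cores: int) -> float:
    """Best-case speedup on `cores` cores when `sequential_fraction`
    of the single-core runtime cannot be parallelized (Amdahl's law)."""
    if not 0.0 <= sequential_fraction <= 1.0:
        raise ValueError("sequential_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / cores)


def speedup_limit(sequential_fraction: float) -> float:
    """Limit of the speedup as the core count grows without bound:
    the inverse of the sequential fraction."""
    if sequential_fraction == 0.0:
        return float("inf")
    return 1.0 / sequential_fraction


if __name__ == "__main__":
    # Illustrative value only: 1% of the work is assumed to stay sequential.
    for cores in (1, 16, 1024, 1_000_000):
        print(cores, amdahl_speedup(0.01, cores))
    print("limit:", speedup_limit(0.01))
```

Under this assumed 1% sequential fraction, the speedup can never exceed roughly 100 regardless of core count, which is why every non-trivial stage has to be parallelized.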
 
  

Revision as of 10:18, 29 March 2018

Introduction

For the last 30 years, legacy Finite Element (FE) mesh generation methods and software were typically developed with a focus on high performance for single-core architectures, without any thought given to scalability across a large number of cores. These codes are still used in production across several industries and at NASA. However, NASA’s Computational Fluid Dynamics (CFD) 2030 Vision will require those highly functional codes to run on large-scale parallel architectures. The highly optimized sequential design of existing state-of-the-art mesh generation codes, together with the geometric and numerical challenges inherent in mesh generation, makes their parallelization a highly challenging problem. In this project, we focus on one of the top industrial-strength mesh generators, Advancing Front Local Reconnection (AFLR), which is used by NASA, the DoD, and the DoE, as well as by a number of top research groups in the aerospace industry. AFLR has not been fully parallelized to properly utilize large-scale supercomputing hardware.

Overview

AFLR was modified so that it can run within the Parallel Data Refinement (PDR) method and software framework, specifically the non-progressive approach. PDR is a generalization of Parallel Delaunay Refinement that is meant to work with any mesh generator, which enables code re-use. The modifications maintain AFLR’s full functionality and provide stability; that is, the quality of the mesh generated from the individually refined subdomains is comparable to that of a mesh generated sequentially by serial AFLR. Quality is defined in terms of the shape and number of the elements. PDR decomposes a meshing problem using an octree whose leaves, or subdomains, each hold a part of the mesh. The general idea of PDR is to refine the octree leaves concurrently while maintaining mesh conformity. This methodology is proven to generate a conforming mesh after refining the subdomains produced from an input geometry by data decomposition.
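To make the octree decomposition more concrete, the sketch below splits a bounding box into eight children until each leaf holds at most a given number of points. It is only an illustration of the general data structure: the names `OctreeNode`, `build_octree`, and `leaves`, the point-count splitting criterion, and the depth limit are all assumptions, and the sketch does not reproduce how PDR actually sizes its leaves or keeps the mesh conforming across them.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass
class OctreeNode:
    lo: Point                      # minimum corner of the box
    hi: Point                      # maximum corner of the box
    points: List[Point] = field(default_factory=list)
    children: List["OctreeNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def build_octree(lo: Point, hi: Point, points: List[Point],
                 max_points: int = 8, max_depth: int = 6) -> OctreeNode:
    """Recursively split the box [lo, hi] until each leaf holds at most
    `max_points` points or `max_depth` levels have been used."""
    node = OctreeNode(lo, hi, list(points))
    if len(points) <= max_points or max_depth == 0:
        return node
    mid = tuple((a + b) / 2.0 for a, b in zip(lo, hi))
    buckets: List[List[Point]] = [[] for _ in range(8)]
    for p in points:
        # Bit `axis` of the child index is set when p lies in the upper half.
        idx = sum(1 << axis for axis in range(3) if p[axis] >= mid[axis])
        buckets[idx].append(p)
    for idx, bucket in enumerate(buckets):
        child_lo = tuple(mid[a] if (idx >> a) & 1 else lo[a] for a in range(3))
        child_hi = tuple(hi[a] if (idx >> a) & 1 else mid[a] for a in range(3))
        node.children.append(
            build_octree(child_lo, child_hi, bucket, max_points, max_depth - 1))
    node.points = []               # interior nodes hand their data to the leaves
    return node


def leaves(node: OctreeNode) -> List[OctreeNode]:
    """Collect the leaves (including empty ones); in PDR terms, each leaf
    would be one subdomain that can be refined independently."""
    if node.is_leaf():
        return [node]
    result: List[OctreeNode] = []
    for child in node.children:
        result.extend(leaves(child))
    return result
```

In this simplified picture, each leaf returned by `leaves` stands in for a subdomain that would be handed to the sequential mesh generator.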

Reproducibility

AFLR meets the reproducibility requirement of PDR because it maintains weak reproducibility, as can be seen in the example of refinement for the missile geometry below.


Stability

Preliminary results from the initial implementation of the sequential, data-decomposed AFLR show that PDR’s data decomposition does not degrade the quality of the output, as can be seen by comparing the quality statistics below with those of meshes generated by serial AFLR. The integration of AFLR within the PDR framework therefore maintains its stability.

Scalability

The parallelization of AFLR is currently a work in progress. At runtime, the PDR.AFLR method will expose data decomposition information (the number of subdomains waiting to be refined) to our underlying runtime system, PREMA 2.0. In turn, PREMA 2.0 will facilitate workload balancing and guide the program’s execution towards the most efficient utilization of hardware resources. PREMA 2.0 is a parallel runtime system that supports one-sided communication, a global address space, and load balancing for adaptive and irregular applications. It serves as an underlying layer that alleviates the burden of monitoring data and computations in parallel, which makes it an ideal candidate to support the execution of PDR.AFLR.
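As a rough illustration of what workload balancing over pending subdomains can look like, the sketch below greedily assigns each subdomain to the currently least-loaded worker. It does not use or imitate the PREMA 2.0 API; the function `balance`, the subdomain names, and the per-subdomain cost estimates are hypothetical, and a real runtime would rebalance dynamically as refinement proceeds rather than compute a single static assignment.

```python
import heapq
from typing import Dict, List


def balance(pending: Dict[str, float], workers: int) -> List[List[str]]:
    """Assign pending subdomains to workers, largest estimated cost first,
    always choosing the worker with the smallest accumulated load."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    heap = [(0.0, w) for w in range(workers)]   # (accumulated load, worker id)
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(workers)]
    by_cost = sorted(pending.items(), key=lambda item: item[1], reverse=True)
    for name, cost in by_cost:
        load, worker = heapq.heappop(heap)      # least-loaded worker so far
        assignment[worker].append(name)
        heapq.heappush(heap, (load + cost, worker))
    return assignment


if __name__ == "__main__":
    # Hypothetical subdomains with made-up refinement cost estimates.
    pending = {"leaf_a": 5.0, "leaf_b": 4.0, "leaf_c": 3.0,
               "leaf_d": 3.0, "leaf_e": 1.0}
    for worker, names in enumerate(balance(pending, workers=2)):
        print(worker, names, sum(pending[n] for n in names))
```

The greedy largest-first heuristic was chosen here only because it is short and easy to follow; it is not a claim about the policy PREMA 2.0 applies.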