Mapping unstructured mesh codes onto local memory parallel architectures
Jones, Beryl Wyn (1994) Mapping unstructured mesh codes onto local memory parallel architectures. PhD thesis, University of Greenwich.
Beryl_Wyn_Jones_1994.pdf - Published Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.
Initial work on mapping CFD codes onto parallel systems focused upon software which employed structured meshes. Increasingly, many large scale CFD codes are being based upon unstructured meshes. One of the key problems when implementing such large scale unstructured problems on a distributed memory machine is the question of how to partition the underlying computational domain efficiently. It is important that all processors are kept busy for as large a proportion of the time as possible and that the amount, level and frequency of communication should be kept to a minimum.
Proposed techniques for solving the mapping problem have separated out the solution into two distinct phases. The first phase is to partition the computational domain into cohesive sub-regions. The second phase consists of embedding these sub-regions onto the processors. However, it has been shown that performing these two operations in isolation can lead to poor mappings and much less optimal communication time.
In this thesis we develop a technique which simultaneously takes account of the processor topology whilst identifying the cohesive sub-regions. Our approach is based on an unstructured mesh decomposition method that was originally developed by Sadayappan et al [SER90] for a hypercube. This technique forms a basis for a method which enables a decomposition to an arbitrary number of processors on a specified processor network topology. Whilst partitioning the mesh, the optimisation method takes into account the processor topology by minimising the total interprocessor communication.
The problem with this technique is that it is not suitable for dealing with very large meshes since the calculations often require prodigious amounts of computing processing power.
The problem can be overcome by creating clusters of the original elements and using this to create a reduced network which is homomorphic to the original mesh. The technique can now be applied to the image network with comparative ease. The clusters are created using an efficient graph bisection method. The coarseness of the reduced mesh inevitably leads to a degradation of the solution. However, it is possible to refine the resultant partition to recapture some of the richness of the original mesh and hence achieve reasonable partitions.
One of the issues to be addressed is the level of granuality to obtain the best balance between computational efficiency and optimality of the solution. Some progress has been made in trying to find an answer to this important issue.
In this thesis, we show how the above technique can be effectively utilised in large scale computations. Results include testing the above technique on large scale meshes for complex flow domains.
|Item Type:||Thesis (PhD)|
|Additional Information:||uk.bl.ethos.282966 This Research Programme was funded by the Science and Engineering Research Council.|
|Uncontrolled Keywords:||parallel processing, distributed processing, computer software, mathematical statistics, operations research,|
|Subjects:||Q Science > QA Mathematics > QA76 Computer software
Q Science > QC Physics
|Pre-2014 Departments:||School of Computing & Mathematical Sciences
School of Computing & Mathematical Sciences > Centre for Numerical Modelling & Process Analysis
|Last Modified:||14 Oct 2016 09:16|
Actions (login required)
Downloads per month over past year