CSLab Research 
The main objective of this activity is to combine the advantages of distributed memory architectures (scalability) and the advantages of the shared memory programming paradigm (easy and secure programming). Other relevant activities are the automatic extraction of parallelism and the subsequent mapping of algorithms onto different types of parallel architectures.
In this context, our research has focused on applying transformations to nested for-loops, in order to efficiently execute them in Non-Uniform Memory Access (NUMA) machines, such as the SCI-Clusters of the laboratory. In particular, we apply a transformation called tiling, or supernode transformation in order to minimize the communication latency effect on the total parallel execution time of the algorithms. Tiling method groups neighboring computation points of the nested loop into blocks called tiles or supernodes thus increasing the computation grain and decreasing both the communication volume and frequence. Applying the tiling techniques, we have developped a tool, which accepts C-like nested loops and partitions them into groups/tiles with small inter-communication requirements. The tool automatically generates efficient message passing code (using MPI) to be executed on SMPs or clusters. Future work contains comprarisons of certain variations of tiling (shape, size, etc) and code generation tecniques based on experimental results taken from the application of the tool on an SCI-cluster.
In addition, we explore several methods such as overlapping of communication and computations, in order to further reduce the total execution time of the transformed code. The targeted communication platforms include SCI, GM message passing over Myrinet interconnect, etc.
 Research Areas