Difference: HPC (1 vs. 4)

Revision 42008-03-12 - GeorgiosGoumas

  High Performance Computing
-<
<
+ Parallel Systems 
The main objective of this activity is to combine the advantages of distributed memory architectures (scalability) and the advantages of the shared memory programming paradigm (easy and secure programming). Other relevant activities are the automatic extraction of parallelism and the subsequent mapping of algorithms onto different types of parallel architectures.

In this context, our research has focused on applying transformations to nested for-loops, in order to efficiently execute them in Non-Uniform Memory Access (NUMA) machines, such as the SCI-Clusters of the laboratory. In particular, we apply a transformation called tiling, or supernode transformation in order to minimize the communication latency effect on the total parallel execution time of the algorithms. Tiling method groups neighboring computation points of the nested loop into blocks called tiles or supernodes thus increasing the computation grain and decreasing both the communication volume and frequence. Applying the tiling techniques, we have developped a tool, which accepts C-like nested loops and partitions them into groups/tiles with small inter-communication requirements. The tool automatically generates efficient message passing code (using MPI) to be executed on SMPs or clusters. Future work contains comprarisons of certain variations of tiling (shape, size, etc) and code generation tecniques based on experimental results taken from the application of the tool on an SCI-cluster.

In addition, we explore several methods such as overlapping of communication and computations, in order to further reduce the total execution time of the transformed code. The targeted communication platforms include SCI, GM message passing over Myrinet interconnect, etc.
->
>
+The main objective of our group in this research area is to optimize critical software (mainly computationally or memory intensive applications) executing on high performance computing platforms. To achieve this intricate goal we need to delve as deeply as possible into the core characteristics of the applications and the underlying architecture. We focus on stencil computations, Sparse Matrix-Vector Multiplication SPMV, PDE solvers, graph algorithms and others. As far as the computing platforms are concerned, we work on PC clusters, multicore and multithreaded (SMT) microprocessors, and emerging architectures like GPGPUs and the CELL B/E.
  Publications
-<
<
+ G. Goumas, A. Sotiropoulos, N. Koziris "Minimizing Completion Time for Loop Tiling with Computation and Communication Overlapping", Proceedings of the 2001 International Parallel and Distributed Processing Symposium (IPDPS2001), IEEE Press, San Francisco, California, April  2001 (Best paper award) (pdf)
 
This paper proposes a new method for the problem of
minimizing the execution time of nested for-loops using a
tiling transformation. In our approach, we are interested
not only in tile size and shape according to the required
communication to computation ratio, but also in overall
completion time. We select a time hyperplane to execute
different tiles much more efficiently by exploiting the inherent
overlapping between communication and computation
phases among successive, atomic tile executions. We assign
tiles to processors according to the tile space boundaries,
thus considering the iteration space bounds. Our schedule
considerably reduces overall completion time under the assumption
that some part from every communication phase
can be efficiently overlapped with atomic, pure tile computations.
The overall schedule resembles a pipelined datapath
where computations are not anymore interleaved with
sends and receives to non-local processors. Experimental
results in a cluster of Pentiums by using various MPI send
primitives show that the total completion time is significantly
reduced.
 

 N. Koziris, A. Sotiropoulos, G. Goumas "A Pipelined Schedule to Minimize Comple-tion Time for Loop Tiling with Computation and Communication Overlapping", Journal of Parallel and Distributed Computing, Volume 63, Issue 11, November 2003, pp. 1138-1151 (pdf)
  M. Athanasaki, A. Sotiropoulos, G. Tsoukalas, N. Koziris, P. Tsanakas "Hyperplane Grouping and Pipelined Schedules: How to Execute Tiled Loops Fast on Clusters of SMPs", The Journal of Supercomputing, Volume 33, Issue 3, September 2005, pp. 197 - 226 (pdf)
  A. Sotiropoulos, G. Tsoukalas, N. Koziris "Efficient Utilization of Memory Mapped NICs onto Clusters using Pipelined Schedules", Proceedings of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid, Berlin, Germany, May 2002 (pdf)
  A. Sotiropoulos, G. Tsoukalas, N. Koziris "Enhancing the Performance of Tiled Loop Execution onto Clusters using Memory Mapped Interfaces and Pipelined Schedules", Proceedings of the 2002 Workshop on Communication Architecture for Clusters (CAC2002) held in conjunction with Int'l Parallel and Distributed Processing Symposium (IPDPS2002), Fort Lauderdale, Florida, April 2002 (pdf)
  M. Athanasaki, A. Sotiropoulos, G. Tsoukalas, N. Koziris "Pipelined Scheduling of Tiled Nested Loops onto Clusters of SMPs using Memory Mapped Network Interfaces", Proceedings of the ACM/IEEE Supercomputing 2002: High Performance Networking and Computing Conference (SC2002), Baltimore, Maryland, November 2002 (pdf)
  A. Sotiropoulos, G. Tsoukalas, N. Koziris "A Pipelined Execution of Tiled Nested Loops onto a Cluster of PCs using PCI-SCI NICs", Proceedings of the 2001 SCI-Europe Conference, Dublin, Ireland, October 2001 (ps)
  M. Athanasaki, A. Sotiropoulos, G. Tsoukalas, N. Koziris "A Pipelined Execution of Tiled Nested Loops on SMPs with Computation and Communication Overlapping", Proceedings of the Workshop on Compile/Runtime Techniques for Parallel Computing, held in conjunction with 2002 International Conference on Parallel Processing (ICPP-2002), Vancouver, Canada, August 2002 (pdf)
  A. Sotiropoulos, N. Koziris "A Pipelined Schedule for Loop Tiling to Minimize Overall Completion Time", Proceedings of the 8th Panhellenic Conference on Informatics, Nicosia, Cyprus, November 2001 (ps)
->
>
+ G. Goumas, K. Kourtis, N. Anastopoulos, V. Karakasis and N. Koziris, "Understanding the Performance of Sparse Matrix-Vector Multiplication," Proceedings of the 16th Euromicro International Conference on Parallel, Distributed and network-based Processing (PDP2008), Toulouse, France, 13-15 February, 2008
  K. Kourtis, G. Goumas, and N. Koziris, "Optimizing Sparse Matrix-Vector Multiplication Using Index and Value Compression," Proceedings of the ACM International Conference on Computing Frontiers, Ischia, Italy, 4-6 May, 2008
  N. Drosinos and N. Koziris, “Efficient Hybrid Parallelization of Tiled Algorithms on SMP Clusters,” International Journal of Computational Science and Engineering,  2007 (pdf)
  G. Goumas, N. Drosinos, M. Athanasaki and N. Koziris, “Message-Passing Code Generation for Non-rectangular Tiling Transformations,” Journal of Parallel Computing, pp. 711--732, 32(10), November 2006 (pdf)
  N. Drosinos, G. Goumas and N. Koziris, “Selecting the Tile Shape to Reduce the Total Communication Volume,” Proceedings of the 2006 International Parallel and Distributed Processing Symposium (IPDPS2006), Rhodes Island, Greece, 25-29 April, 2006 (pdf)
  M. Athanasaki, A. Sotiropoulos, G. Tsoukalas, N. Koziris, P. Tsanakas, “Hyperplane Grouping and Pipelined Schedules: How to Execute Tiled Loops Fast on Clusters of SMPs,” The Journal of Supercomputing, Volume 33, Issue 3, September 2005, pp. 197 - 226 (pdf)
  N. Drosinos and N. Koziris, “Performance Comparison of Pure MPI vs Hybrid MPI-OpenMP Parallelization Models on SMP Clusters,” Proceedings of the 18th International Parallel & Distributed Processing Symposium (IPDPS 2004), p. 15a (CD-ROM), Santa Fe, New Mexico, April 2004 (pdf)
  M. Athanasaki, E. Koukis, N. Koziris, “Scheduling of Tiled Nested Loops onto a Cluster with a Fixed Number of SMP Nodes,” 12th Euromicro Conference on Parallel, Distributed and Network based Processing (PDP '04), pp.424-433, A Coruna, Spain, February 11-13, 2004 (pdf)
  N. Koziris, A. Sotiropoulos, G. Goumas, “A Pipelined Schedule to Minimize Comple-tion Time for Loop Tiling with Computation and Communication Overlapping,” Journal of Parallel and Distributed Computing, Volume 63, Issue 11, November 2003, pp. 1138-1151 (pdf)
  M. Athanasaki, E. Koukis, N. Koziris, “Efficient Scheduling of Tiled Iteration Spaces onto a Fixed Size Parallel Architecture,” 9th Panhellenic Conference in Informatics, pp.178-192, Thessaloniki, Greece, November 21-23, 2003 (pdf)
  G. Goumas, M. Athanasaki and N. Koziris, “An Efficient Code Generation Technique for Tiled Iteration Spaces,” IEEE Transactions on Parallel and Distributed Systems, Vol. 14, No 10, October 2003 (pdf)
  A. Sotiropoulos, G. Tsoukalas, N. Koziris, “Efficient Utilization of Memory Mapped NICs onto Clusters using Pipelined Schedules,” Proceedings of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid, Berlin, Germany, May 2002 (pdf)
  A. Sotiropoulos, G. Tsoukalas, N. Koziris, “Enhancing the Performance of Tiled Loop Execution onto Clusters using Memory Mapped Interfaces and Pipelined Schedules,” Proceedings of the 2002 Workshop on Communication Architecture for Clusters (CAC2002) held in conjunction with Int'l Parallel and Distributed Processing Symposium (IPDPS2002), Fort Lauderdale, Florida, April 2002 (pdf)
  M. Athanasaki, A. Sotiropoulos, G. Tsoukalas, N. Koziris, “Pipelined Scheduling of Tiled Nested Loops onto Clusters of SMPs using Memory Mapped Network Interfaces,” Proceedings of the ACM/IEEE Supercomputing 2002: High Performance Networking and Computing Conference (SC2002), Baltimore, Maryland, November 2002 (pdf)
  M. Athanasaki, A. Sotiropoulos, G. Tsoukalas, N. Koziris, “A Pipelined Execution of Tiled Nested Loops on SMPs with Computation and Communication Overlapping,” Proceedings of the Workshop on Compile/Runtime Techniques for Parallel Computing, held in conjunction with 2002 International Conference on Parallel Processing (ICPP-2002), Vancouver, Canada, August 2002 (pdf)
  A. Sotiropoulos, G. Tsoukalas, N. Koziris, “A Pipelined Execution of Tiled Nested Loops onto a Cluster of PCs using PCI-SCI NICs,” Proceedings of the 2001 SCI-Europe Conference, Dublin, Ireland, October 2001 (ps)
  G. Goumas, A. Sotiropoulos, N. Koziris, “Minimizing Completion Time for Loop Tiling with Computation and Communication Overlapping,” Proceedings of the 2001 International Parallel and Distributed Processing Symposium (IPDPS2001), IEEE Press, San Francisco, California, April  2001 (Best paper award) (pdf)

Revision 32008-03-11 - VangelisKoukis

Line: 1 to 1

Deleted:

<
<

META TOPICPARENT	name="Parallel"

High Performance Computing

Parallel Systems

Revision 22008-03-06 - GiorgosVerigakis

Line: 1 to 1

META TOPICPARENT	name="Parallel"

High Performance Computing

Line: 40 to 40

A. Sotiropoulos, G. Tsoukalas, N. Koziris "A Pipelined Execution of Tiled Nested Loops onto a Cluster of PCs using PCI-SCI NICs", Proceedings of the 2001 SCI-Europe Conference, Dublin, Ireland, October 2001 (ps)
M. Athanasaki, A. Sotiropoulos, G. Tsoukalas, N. Koziris "A Pipelined Execution of Tiled Nested Loops on SMPs with Computation and Communication Overlapping", Proceedings of the Workshop on Compile/Runtime Techniques for Parallel Computing, held in conjunction with 2002 International Conference on Parallel Processing (ICPP-2002), Vancouver, Canada, August 2002 (pdf)
A. Sotiropoulos, N. Koziris "A Pipelined Schedule for Loop Tiling to Minimize Overall Completion Time", Proceedings of the 8th Panhellenic Conference on Informatics, Nicosia, Cyprus, November 2001 (ps)

Deleted:

<
<

-- ArisSotiropoulos - 06 Mar 2008

Revision 12008-03-06 - ArisSotiropoulos

Line: 1 to 1

Added:

>
>

META TOPICPARENT	name="Parallel"

High Performance Computing

Parallel Systems

The main objective of this activity is to combine the advantages of distributed memory architectures (scalability) and the advantages of the shared memory programming paradigm (easy and secure programming). Other relevant activities are the automatic extraction of parallelism and the subsequent mapping of algorithms onto different types of parallel architectures.

In this context, our research has focused on applying transformations to nested for-loops, in order to efficiently execute them in Non-Uniform Memory Access (NUMA) machines, such as the SCI-Clusters of the laboratory. In particular, we apply a transformation called tiling, or supernode transformation in order to minimize the communication latency effect on the total parallel execution time of the algorithms. Tiling method groups neighboring computation points of the nested loop into blocks called tiles or supernodes thus increasing the computation grain and decreasing both the communication volume and frequence. Applying the tiling techniques, we have developped a tool, which accepts C-like nested loops and partitions them into groups/tiles with small inter-communication requirements. The tool automatically generates efficient message passing code (using MPI) to be executed on SMPs or clusters. Future work contains comprarisons of certain variations of tiling (shape, size, etc) and code generation tecniques based on experimental results taken from the application of the tool on an SCI-cluster.

In addition, we explore several methods such as overlapping of communication and computations, in order to further reduce the total execution time of the transformed code. The targeted communication platforms include SCI, GM message passing over Myrinet interconnect, etc.

Publications

G. Goumas, A. Sotiropoulos, N. Koziris "Minimizing Completion Time for Loop Tiling with Computation and Communication Overlapping", Proceedings of the 2001 International Parallel and Distributed Processing Symposium (IPDPS2001), IEEE Press, San Francisco, California, April 2001 (Best paper award) (pdf)

This paper proposes a new method for the problem of minimizing the execution time of nested for-loops using a tiling transformation. In our approach, we are interested not only in tile size and shape according to the required communication to computation ratio, but also in overall completion time. We select a time hyperplane to execute different tiles much more efficiently by exploiting the inherent overlapping between communication and computation phases among successive, atomic tile executions. We assign tiles to processors according to the tile space boundaries, thus considering the iteration space bounds. Our schedule considerably reduces overall completion time under the assumption that some part from every communication phase can be efficiently overlapped with atomic, pure tile computations. The overall schedule resembles a pipelined datapath where computations are not anymore interleaved with sends and receives to non-local processors. Experimental results in a cluster of Pentiums by using various MPI send primitives show that the total completion time is significantly reduced.

N. Koziris, A. Sotiropoulos, G. Goumas "A Pipelined Schedule to Minimize Comple-tion Time for Loop Tiling with Computation and Communication Overlapping", Journal of Parallel and Distributed Computing, Volume 63, Issue 11, November 2003, pp. 1138-1151 (pdf)
M. Athanasaki, A. Sotiropoulos, G. Tsoukalas, N. Koziris, P. Tsanakas "Hyperplane Grouping and Pipelined Schedules: How to Execute Tiled Loops Fast on Clusters of SMPs", The Journal of Supercomputing, Volume 33, Issue 3, September 2005, pp. 197 - 226 (pdf)
A. Sotiropoulos, G. Tsoukalas, N. Koziris "Efficient Utilization of Memory Mapped NICs onto Clusters using Pipelined Schedules", Proceedings of the 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid, Berlin, Germany, May 2002 (pdf)
A. Sotiropoulos, G. Tsoukalas, N. Koziris "Enhancing the Performance of Tiled Loop Execution onto Clusters using Memory Mapped Interfaces and Pipelined Schedules", Proceedings of the 2002 Workshop on Communication Architecture for Clusters (CAC2002) held in conjunction with Int'l Parallel and Distributed Processing Symposium (IPDPS2002), Fort Lauderdale, Florida, April 2002 (pdf)
M. Athanasaki, A. Sotiropoulos, G. Tsoukalas, N. Koziris "Pipelined Scheduling of Tiled Nested Loops onto Clusters of SMPs using Memory Mapped Network Interfaces", Proceedings of the ACM/IEEE Supercomputing 2002: High Performance Networking and Computing Conference (SC2002), Baltimore, Maryland, November 2002 (pdf)
A. Sotiropoulos, G. Tsoukalas, N. Koziris "A Pipelined Execution of Tiled Nested Loops onto a Cluster of PCs using PCI-SCI NICs", Proceedings of the 2001 SCI-Europe Conference, Dublin, Ireland, October 2001 (ps)
M. Athanasaki, A. Sotiropoulos, G. Tsoukalas, N. Koziris "A Pipelined Execution of Tiled Nested Loops on SMPs with Computation and Communication Overlapping", Proceedings of the Workshop on Compile/Runtime Techniques for Parallel Computing, held in conjunction with 2002 International Conference on Parallel Processing (ICPP-2002), Vancouver, Canada, August 2002 (pdf)
A. Sotiropoulos, N. Koziris "A Pipelined Schedule for Loop Tiling to Minimize Overall Completion Time", Proceedings of the 8th Panhellenic Conference on Informatics, Nicosia, Cyprus, November 2001 (ps)

-- ArisSotiropoulos - 06 Mar 2008

View topic | History: r4 < r3 < r2 < r1 | More topic actions...

No permission to view TWiki.WebBottomBar