
META TOPICPARENT 
name="ActivitiesProjects" 
Sparse MatrixVector Multiplication 
 Another important optimization that has been widely applied to SPMV is blocking. While such optimization was initially intended to increase register reuse and the locality of references to the input vector, the main advantage of blocking is that it can significantly reduce the working set of the algorithm, thus leading to less memory traffic, meaning better performance for the case of the memorybound SPMV. After attacking the memory intensiveness of the algorithm, the computational part of the kernel becomes more significant as a result. This allows for computation optimizations, such as vectorization, which can add a further performance boost to the blocked kernel. However, preliminary experiments have shown that the performance among different block shapes, which lead to the same working set, can have significantly varying performance, especially in the case of vectorization, due to a number of architectural issues raised from modern commodity microprocessors. Considering the multitude of blocked storage formats available for sparse matrices and the fact that block shapes can heavily affect the performance of SPMV, a performance model that could accurately predict the performance of a blocked storage format and then guide through the proper selection of both the right method and the right block shape would be desirable. Towards this direction, we have developed a performance model that accounts for both the memory and computation times of the blocked kernels. Our model is capable to better approximate the real execution time and lead to a more fit selection of method and block shape than earlier models, which solely based their prediction to memory traffic.
Publications 

< < 
 V. Karakasis, G. Goumas, and N. Koziris, "Performance models for blocked Sparse MatrixVector multiplication kernels," International Conference on Parallel Processing (ICPP), Vienna, Austria, September 2225, 2009. (to appear).
 V. Karakasis, G. Goumas, and N. Koziris, "A comparative study of blocking storage methods for sparse matrices on multicore architectures," IEEE International Conference on Computational Science and Engineering (CSE), Vancouver, BC, Canada, August 2931, 2009. (to appear).
 V. Karakasis, G. Goumas, and N. Koziris, "Exploring the effect of block shapes on the performance of sparse kernels," Proceedings of the 23rd International Parallel and Distributed Processing Symposium (IPDPSPDSEC), Rome, Italy, May 2529, 2009.
 K. Kourtis, G. Goumas, and N. Koziris, "Optimizing Sparse MatrixVector Multiplication Using Index and Value Compression," Proceedings of the ACM International Conference on Computing Frontiers, Ischia, Italy, 46 May, 2008
 G. Goumas, K. Kourtis, N. Anastopoulos, V. Karakasis and N. Koziris, "Understanding the Performance of Sparse MatrixVector Multiplication," Proceedings of the 16th Euromicro International Conference on Parallel, Distributed and networkbased Processing (PDP2008), Toulouse, France, 1315 February, 2008

> > 
 V. Karakasis, G. Goumas, and N. Koziris, "Performance models for blocked Sparse MatrixVector multiplication kernels," International Conference on Parallel Processing (ICPP), Vienna, Austria, September 22–25, 2009.
 V. Karakasis, G. Goumas, and N. Koziris, "A comparative study of blocking storage methods for sparse matrices on multicore architectures," IEEE International Conference on Computational Science and Engineering (CSE), Vancouver, BC, Canada, August 29–31, 2009.
 V. Karakasis, G. Goumas, and N. Koziris, "Exploring the effect of block shapes on the performance of sparse kernels," Proceedings of the 23rd International Parallel and Distributed Processing Symposium (IPDPSPDSEC), Rome, Italy, May 25–29, 2009.
 K. Kourtis, G. Goumas, and N. Koziris, "Optimizing Sparse MatrixVector Multiplication Using Index and Value Compression," Proceedings of the ACM International Conference on Computing Frontiers, Ischia, Italy, 4–6 May, 2008
 G. Goumas, K. Kourtis, N. Anastopoulos, V. Karakasis and N. Koziris, "Understanding the Performance of Sparse MatrixVector Multiplication," Proceedings of the 16th Euromicro International Conference on Parallel, Distributed and networkbased Processing (PDP2008), Toulouse, France, 13–15 February, 2008


Diploma Thesis
 Implementation of the Sparse MatrixVector Multiplication computational kernel on massively threaded graphics processors (pdf).
