Sparse Matrix-Vector Multiplication (SPMV) is one of the most important computational kernels, lying at the heart of very effective and popular iterative solution methods like Conjugate Gradient (CG) and Generalized Minimal Residual (GMRES). Sparse matrices are generally stored in special formats (e.g. the Compressed Storage Row format) in order to save memory space and computations. SPMV is a challenging kernel posing various problems to the underlying architecture that need to be well understood in order consolidate an effective optimization process. Our work "Understanding the Performance of Sparse Matrix-Vector Multiplication" (pdf) published in Euromicro/PDP 2008 performs an extensive experimental evaluation of SPMV that leads to a number of optimization guidelines. The major problem that needs to be addressed is the memory bottleneck caused by the data intensity of the algorithm, which is further aggravated by the extra indexing information kept by the sparse matrix storage format. One approach to deal with this problem is to sacrifice CPU cycles in order to compress the data involved in the kernel and consequently alleviate the memory bus from excessive data transfers. A method to compress the indexing information of the matrix structure and the numerical values of the matrix is presented in CF'08 (pdf)

Another important optimization that has been widely applied to SPMV is blocking. While such optimization was initially intended to increase register reuse and the locality of references to the input vector, the main advantage of blocking is that it can significantly reduce the working set of the algorithm, thus leading to less memory traffic, meaning better performance for the case of the memory-bound SPMV. After attacking the memory intensiveness of the algorithm, the computational part of the kernel becomes more significant as a result. This allows for computation optimizations, such as vectorization, which can add a further performance boost to the blocked kernel. However, preliminary experiments have shown that the performance among different block shapes, which lead to the same working set, can have significantly varying performance, especially in the case of vectorization, due to a number of architectural issues raised from modern commodity microprocessors. Considering the multitude of blocked storage formats available for sparse matrices and the fact that block shapes can heavily affect the performance of SPMV, a performance model that could accurately predict the performance of a blocked storage format and then guide through the proper selection of both the right method and the right block shape would be desirable. Towards this direction, we have developed a performance model that accounts for both the memory and computation times of the blocked kernels. Our model is capable to better approximate the real execution time and lead to a more fit selection of method and block shape than earlier models, which solely based their prediction to memory traffic.

- G. Goumas, K. Kourtis, N. Anastopoulos, V. Karakasis and N. Koziris, "Understanding the Performance of Sparse Matrix-Vector Multiplication," Proceedings of the 16th Euromicro International Conference on Parallel, Distributed and network-based Processing (PDP2008), Toulouse, France, 13-15 February, 2008
- K. Kourtis, G. Goumas, and N. Koziris, "Optimizing Sparse Matrix-Vector Multiplication Using Index and Value Compression," Proceedings of the ACM International Conference on Computing Frontiers, Ischia, Italy, 4-6 May, 2008

Topic revision: r13 - 2009-01-07 - VasileiosKarakasis

No permission to view TWiki.WebTopBar