Emerging Architectures

Until recently, microprocessor manufacturers sought to keep increasing the clock frequency of their processors by using superscalar techniques, very deep pipelines and advanced microelectronic technologies. However, deep pipelines can create significant performance bottlenecks, and power consumption issues ruled out further increases in clock speed. As a result, the last few years have seen a shift to multicore architectures, where each core has more "modest" specifications but a cleaner and more efficient microarchitecture. Nevertheless, even multicore architectures in their current form are being called into question, especially in terms of scalability, and the transition to the manycore era is far from straightforward.

Quite recently, architectures have emerged that can deliver performance well beyond that of a classic superscalar multicore architecture while still being positioned at the low end of the high-performance computing market. Among the most promising of these are the STI Cell Broadband Engine and General-Purpose Graphics Processing Units (GP-GPUs). Although these architectures differ significantly, they share a basic concept: instead of spending the majority of the die area on massive data caches, as conventional multicore architectures do, they dedicate it to specialized SIMD (Single Instruction Multiple Data) processing elements, which act as coprocessors to a main or host processor.

Our group is currently conducting research on both of these emerging architectures. Having evaluated the performance and the architecture of the STI Cell through a set of microbenchmarks (diploma thesis), we are now porting SpMV (sparse matrix-vector multiplication) to the Cell. Since the Cell's architecture is quite different from a commodity multicore architecture, it poses a set of challenges concerning the data partitioning of the input matrix (static or dynamic partitioning, load balancing issues), as well as the algorithm itself. Similarly, we are now evaluating the performance of GP-GPUs by developing a set of microbenchmarks, and we are also investigating streaming programming models.
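
To make the partitioning problem more concrete, here is a minimal sketch in plain Python (not Cell code, and not the group's implementation). It assumes the matrix is stored in the common CSR (Compressed Sparse Row) format, which the text above does not specify, and it shows only a static partitioning scheme: rows are split into contiguous chunks with roughly equal numbers of nonzeros, so that each worker (for example, one coprocessor) gets a similar amount of work. The function names `spmv_csr` and `partition_rows_by_nnz` are hypothetical.

```python
from bisect import bisect_left


def spmv_csr(row_ptr, col_idx, values, x, row_begin=0, row_end=None):
    """Compute y[i] = sum_j A[i][j] * x[j] for rows in [row_begin, row_end).

    The matrix A is given in CSR form (an assumption of this sketch):
    the nonzeros of row i are values[row_ptr[i]:row_ptr[i + 1]] and their
    column indices are col_idx[row_ptr[i]:row_ptr[i + 1]].
    """
    if row_end is None:
        row_end = len(row_ptr) - 1
    y = []
    for i in range(row_begin, row_end):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


def partition_rows_by_nnz(row_ptr, num_workers):
    """Static partitioning: contiguous row chunks with roughly equal nonzeros.

    Returns a list of (row_begin, row_end) pairs, one per worker.
    """
    n_rows = len(row_ptr) - 1
    nnz = row_ptr[-1]
    bounds = [0]
    for w in range(1, num_workers):
        target = nnz * w / num_workers
        # First row boundary whose cumulative nonzero count reaches the target.
        cut = bisect_left(row_ptr, target, bounds[-1], n_rows)
        # Step back one row if the previous boundary is closer to the target.
        if cut > bounds[-1] and target - row_ptr[cut - 1] < row_ptr[cut] - target:
            cut -= 1
        bounds.append(cut)
    bounds.append(n_rows)
    return list(zip(bounds[:-1], bounds[1:]))


# Small 4x4 example matrix:
#   [1 0 2 0]
#   [0 3 0 0]
#   [4 5 6 0]
#   [0 0 0 7]
row_ptr = [0, 2, 3, 6, 7]
col_idx = [0, 2, 1, 0, 1, 2, 3]
values = [1, 2, 3, 4, 5, 6, 7]
x = [1, 1, 1, 1]

# Each chunk could be handed to a separate worker; here they run in sequence.
y = []
for begin, end in partition_rows_by_nnz(row_ptr, 2):
    y.extend(spmv_csr(row_ptr, col_idx, values, x, begin, end))
```

Splitting by nonzeros rather than by row count matters when row lengths vary widely, since an even split of rows can leave one worker with most of the arithmetic. A dynamic scheme would instead hand out smaller chunks on demand, trading extra coordination for better balance.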