Difference: Arch (1 vs. 10)

Revision 10 2010-04-26 - KonstantinosNikas

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Computer Architecture

Line: 20 to 20
 
Added:
>
>
 

Revision 9 2008-04-01 - KonstantinosNikas

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Computer Architecture

Line: 13 to 13
 
  • Simultaneous multithreading (SMT)
  • Cell Broadband Engine (Cell)
  • General-purpose computing on graphics processing units (GPGPU)
Added:
>
>
  • Caches for Chip Multiprocessor Architectures (CMPs)
 

Relevant Project Activities

Added:
>
>
 

Revision 8 2008-03-13 - KorniliosKourtis

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Computer Architecture

Added:
>
>
A major focus of our group's research in the field of computer architecture is the exploration and evaluation of modern and emerging architecture designs. Recent developments in microprocessor technology indicate a paradigm shift that is likely to alter present programming methodologies (see DDJ article). We aim to explore these new architectures, focusing especially on multithreaded designs. Some examples of our involvement include:
  • Typical (Intel Core Duo / Opteron) and aggressive (Niagara) multicore designs
  • Simultaneous multithreading (SMT)
  • Cell Broadband Engine (Cell)
  • General-purpose computing on graphics processing units (GPGPU)

Relevant Project Activities

<--

Links

-->
 
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1204649531" name="04227947.pdf" path="04227947.pdf" size="792734" user="Main.ArisSotiropoulos" version="1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1204649356" name="01386058.pdf" path="01386058.pdf" size="312577" user="Main.ArisSotiropoulos" version="1"

Revision 7 2008-03-11 - KorniliosKourtis

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Computer Architecture

Deleted:
<
<

Software Optimization

Previous research work has identified memory bandwidth as the main bottleneck of the ubiquitous Sparse Matrix-Vector Multiplication kernel. To attack this problem, we aim at reducing the overall data volume of the algorithm. Typical sparse matrix representation schemes store only the non-zero elements of the matrix and employ additional indexing information to properly iterate over these elements. In this paper we propose two distinct compression methods targeting index and numerical values respectively. We perform a set of experiments on a large real-world matrix set and demonstrate that the index compression method can be applied successfully to a wide range of matrices. Moreover, the value compression method is able to achieve impressive speedups in a more limited, yet important, class of sparse matrices that contain a small number of distinct values.
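The index-compression idea can be illustrated by delta-encoding the column indices of a CSR matrix, so that each index is stored as a small offset from its predecessor within the row. This is a minimal sketch of the general technique, not the specific compression schemes evaluated in the paper:

```python
# Sketch: delta-encoding CSR column indices to shrink the index data
# that SpMV must stream from memory. Illustrative only -- the paper's
# actual index/value compression methods differ.

def csr_delta_encode(col_idx, row_ptr):
    """Re-encode each row's column indices as deltas from the previous index."""
    deltas = []
    for r in range(len(row_ptr) - 1):
        prev = 0
        for j in range(row_ptr[r], row_ptr[r + 1]):
            deltas.append(col_idx[j] - prev)
            prev = col_idx[j]
    return deltas

def spmv_delta(values, deltas, row_ptr, x):
    """y = A*x over the delta-encoded indices, decoding them on the fly."""
    y = [0.0] * (len(row_ptr) - 1)
    for r in range(len(row_ptr) - 1):
        col = 0
        acc = 0.0
        for j in range(row_ptr[r], row_ptr[r + 1]):
            col += deltas[j]          # recover the absolute column index
            acc += values[j] * x[col]
        y[r] = acc
    return y
```

Since column indices within a row are sorted, the deltas are small non-negative integers and can be packed into fewer bits than the original indices, reducing the memory traffic that dominates SpMV.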

Operating Systems

Efficient sharing of block devices over an interconnection network is an important step in deploying a shared-disk parallel file system on a cluster of SMPs. We present gmblock, a client/server system for network sharing of storage devices over Myrinet, which uses an optimized data path in order to transfer data directly from the storage medium to the NIC, bypassing the host CPU and main memory bus. Its design enhances existing programming abstractions, combining the user-level networking characteristics of Myrinet with Linux's virtual memory infrastructure, in order to construct the datapath in a way that is independent of the type of block device used. Experimental evaluation of a prototype system shows that remote I/O bandwidth can improve up to 36%, compared to an RDMA-based implementation. Moreover, interference on the main memory bus of the host is minimized, leading to an up to 41% improvement in the execution time of memory-intensive applications.
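The key idea above is moving data from storage to the network without staging it in userspace buffers. The closest commodity analogue is kernel-side zero copy via `os.sendfile`; the following is a hedged sketch of that general idea, not gmblock's NIC-firmware mechanism:

```python
# Sketch: serve a range of a block device/file to a client socket using
# kernel-side zero copy (os.sendfile), so data never passes through
# userspace buffers. Loose analogue only -- gmblock's actual path goes
# directly from the storage medium to the Myrinet NIC.
import os

def serve_block(sock, fd, offset, length):
    """Send `length` bytes starting at `offset` of `fd` over `sock`."""
    sent = 0
    while sent < length:
        n = os.sendfile(sock.fileno(), fd, offset + sent, length - sent)
        if n == 0:          # EOF on the source
            break
        sent += n
    return sent
```

Even this kernel-side path still crosses the host memory bus; gmblock's contribution is eliminating that crossing as well.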

Providing scalable clustered storage in a cost-effective way depends on the availability of an efficient network block device (nbd) layer. To overcome the architectural limitation of a low number of outstanding requests in gmblock, we focus on overlapping read and network I/O for a single request, in order to improve throughput. To this end, we introduce the concept of synchronized send operations and present an implementation on Myrinet/GM, based on custom modifications to the NIC firmware and associated userspace library. Compared to a network block sharing system over standard GM and the base version of gmblock, our enhanced implementation supporting synchronized sends delivers 81% and 44% higher throughput for streaming block I/O, respectively.
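The overlap of read and network I/O within a single request can be sketched as a bounded double-buffering pipeline: while one chunk is being sent, the next is already being read. `read_chunk` and `send_chunk` are hypothetical stand-ins, since the real synchronized sends are implemented in the Myrinet NIC firmware:

```python
# Sketch: overlap "disk" reads with "network" sends for one large request,
# in the spirit of synchronized sends. Hypothetical callbacks; the real
# mechanism lives in NIC firmware, not in Python threads.
import queue
import threading

def pipelined_transfer(read_chunk, send_chunk, nchunks, depth=2):
    """Reader and sender run concurrently; `depth` bounds buffered chunks."""
    chunks = queue.Queue(maxsize=depth)

    def reader():
        for i in range(nchunks):
            chunks.put(read_chunk(i))  # next chunk is read while previous sends
        chunks.put(None)               # sentinel: no more data

    threading.Thread(target=reader, daemon=True).start()
    sent = 0
    while (buf := chunks.get()) is not None:
        send_chunk(buf)                # overlaps with the ongoing read
        sent += 1
    return sent
```

With `depth=2` this is classic double buffering: for a single request, read latency of chunk i+1 hides behind the send of chunk i, which is the throughput effect the abstract describes.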

 
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1204649531" name="04227947.pdf" path="04227947.pdf" size="792734" user="Main.ArisSotiropoulos" version="1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1204649356" name="01386058.pdf" path="01386058.pdf" size="312577" user="Main.ArisSotiropoulos" version="1"

Revision 6 2008-03-11 - NikosAnastopoulos

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Computer Architecture

Revision 5 2008-03-08 - AnastasiosNanos

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Computer Architecture

Line: 8 to 8
 

Operating Systems

Changed:
<
<
Providing scalable clustered storage in a cost-effective way depends on the availability of an efficient network block device (nbd) layer. We study the performance of gmblock, an nbd server over Myrinet utilizing a direct disk-to-NIC data path which bypasses the CPU and main memory bus. To overcome the architectural limitation of a low number of outstanding requests, we focus on overlapping read and network I/O for a single request, in order to improve throughput. To this end, we introduce the concept of synchronized send operations and present an implementation on Myrinet/GM, based on custom modifications to the NIC firmware and associated userspace library. Compared to a network block sharing system over standard GM and the base version of gmblock, our enhanced implementation supporting synchronized sends delivers 81% and 44% higher throughput for streaming block I/O, respectively.
>
>
Efficient sharing of block devices over an interconnection network is an important step in deploying a shared-disk parallel file system on a cluster of SMPs. We present gmblock, a client/server system for network sharing of storage devices over Myrinet, which uses an optimized data path in order to transfer data directly from the storage medium to the NIC, bypassing the host CPU and main memory bus. Its design enhances existing programming abstractions, combining the user-level networking characteristics of Myrinet with Linux's virtual memory infrastructure, in order to construct the datapath in a way that is independent of the type of block device used. Experimental evaluation of a prototype system shows that remote I/O bandwidth can improve up to 36%, compared to an RDMA-based implementation. Moreover, interference on the main memory bus of the host is minimized, leading to an up to 41% improvement in the execution time of memory-intensive applications.
 
Added:
>
>
Providing scalable clustered storage in a cost-effective way depends on the availability of an efficient network block device (nbd) layer. To overcome the architectural limitation of a low number of outstanding requests in gmblock, we focus on overlapping read and network I/O for a single request, in order to improve throughput. To this end, we introduce the concept of synchronized send operations and present an implementation on Myrinet/GM, based on custom modifications to the NIC firmware and associated userspace library. Compared to a network block sharing system over standard GM and the base version of gmblock, our enhanced implementation supporting synchronized sends delivers 81% and 44% higher throughput for streaming block I/O, respectively.
 
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1204649531" name="04227947.pdf" path="04227947.pdf" size="792734" user="Main.ArisSotiropoulos" version="1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1204649356" name="01386058.pdf" path="01386058.pdf" size="312577" user="Main.ArisSotiropoulos" version="1"

Revision 4 2008-03-07 - VasileiosKarakasis

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Computer Architecture

Software Optimization

Changed:
<
<
Previous research work has identified memory bandwidth as the main bottleneck of the ubiquitous Sparse Matrix-Vector Multiplication kernel. To attack this problem, we aim at reducing the overall data volume of the algorithm. Typical sparse matrix representation schemes store only the non-zero elements of the matrix and employ additional indexing information to properly iterate over these elements. In this paper we propose two distinct compression methods targeting index and numerical values respectively. We perform a set of experiments on a large real-world matrix set and demonstrate that the index compression method can be applied successfully to a wide range of matrices. Moreover, the value compression method is able to achieve impressive speedups in a more limited yet important class of sparse matrix that contain a small number of distinct values.
>
>
Previous research work has identified memory bandwidth as the main bottleneck of the ubiquitous Sparse Matrix-Vector Multiplication kernel. To attack this problem, we aim at reducing the overall data volume of the algorithm. Typical sparse matrix representation schemes store only the non-zero elements of the matrix and employ additional indexing information to properly iterate over these elements. In this paper we propose two distinct compression methods targeting index and numerical values respectively. We perform a set of experiments on a large real-world matrix set and demonstrate that the index compression method can be applied successfully to a wide range of matrices. Moreover, the value compression method is able to achieve impressive speedups in a more limited, yet important, class of sparse matrices that contain a small number of distinct values.
 

Operating Systems

Revision 3 2008-03-06 - ArisSotiropoulos

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Computer Architecture

Line: 11 to 11
 Providing scalable clustered storage in a cost-effective way depends on the availability of an efficient network block device (nbd) layer. We study the performance of gmblock, an nbd server over Myrinet utilizing a direct disk-to-NIC data path which bypasses the CPU and main memory bus. To overcome the architectural limitation of a low number of outstanding requests, we focus on overlapping read and network I/O for a single request, in order to improve throughput. To this end, we introduce the concept of synchronized send operations and present an implementation on Myrinet/GM, based on custom modifications to the NIC firmware and associated userspace library. Compared to a network block sharing system over standard GM and the base version of gmblock, our enhanced implementation supporting synchronized sends delivers 81% and 44% higher throughput for streaming block I/O, respectively.
Deleted:
<
<

Interconnects

Publications

 
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1204649531" name="04227947.pdf" path="04227947.pdf" size="792734" user="Main.ArisSotiropoulos" version="1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1204649356" name="01386058.pdf" path="01386058.pdf" size="312577" user="Main.ArisSotiropoulos" version="1"

Revision 2 2008-03-04 - ArisSotiropoulos

Line: 1 to 1
 
META TOPICPARENT name="WebHome"

Computer Architecture

Line: 10 to 10
  Providing scalable clustered storage in a cost-effective way depends on the availability of an efficient network block device (nbd) layer. We study the performance of gmblock, an nbd server over Myrinet utilizing a direct disk-to-NIC data path which bypasses the CPU and main memory bus. To overcome the architectural limitation of a low number of outstanding requests, we focus on overlapping read and network I/O for a single request, in order to improve throughput. To this end, we introduce the concept of synchronized send operations and present an implementation on Myrinet/GM, based on custom modifications to the NIC firmware and associated userspace library. Compared to a network block sharing system over standard GM and the base version of gmblock, our enhanced implementation supporting synchronized sends delivers 81% and 44% higher throughput for streaming block I/O, respectively.
Deleted:
<
<

Publications

 
Deleted:
<
<
 \ No newline at end of file
Added:
>
>

Interconnects

Publications

META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1204649531" name="04227947.pdf" path="04227947.pdf" size="792734" user="Main.ArisSotiropoulos" version="1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1204649356" name="01386058.pdf" path="01386058.pdf" size="312577" user="Main.ArisSotiropoulos" version="1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1204649449" name="01655680.pdf" path="01655680.pdf" size="237494" user="Main.ArisSotiropoulos" version="1"
META FILEATTACHMENT attr="h" autoattached="1" comment="" date="1204649651" name="epy2003.pdf" path="epy2003.pdf" size="459614" user="Main.ArisSotiropoulos" version="1"
META FILEATTACHMENT attr="h" autoattached="1" comment="Scheduling" date="1204648880" name="01271475.pdf" path="01271475.pdf" size="510858" user="Main.ArisSotiropoulos" version="1"

Revision 1 2008-03-03 - GiorgosVerigakis

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="WebHome"

Computer Architecture

Software Optimization

Previous research work has identified memory bandwidth as the main bottleneck of the ubiquitous Sparse Matrix-Vector Multiplication kernel. To attack this problem, we aim at reducing the overall data volume of the algorithm. Typical sparse matrix representation schemes store only the non-zero elements of the matrix and employ additional indexing information to properly iterate over these elements. In this paper we propose two distinct compression methods targeting index and numerical values respectively. We perform a set of experiments on a large real-world matrix set and demonstrate that the index compression method can be applied successfully to a wide range of matrices. Moreover, the value compression method is able to achieve impressive speedups in a more limited yet important class of sparse matrix that contain a small number of distinct values.

Operating Systems

Providing scalable clustered storage in a cost-effective way depends on the availability of an efficient network block device (nbd) layer. We study the performance of gmblock, an nbd server over Myrinet utilizing a direct disk-to-NIC data path which bypasses the CPU and main memory bus. To overcome the architectural limitation of a low number of outstanding requests, we focus on overlapping read and network I/O for a single request, in order to improve throughput. To this end, we introduce the concept of synchronized send operations and present an implementation on Myrinet/GM, based on custom modifications to the NIC firmware and associated userspace library. Compared to a network block sharing system over standard GM and the base version of gmblock, our enhanced implementation supporting synchronized sends delivers 81% and 44% higher throughput for streaming block I/O, respectively.

Publications

 