The Memory Bandwidth aware Userspace Scheduler (MemBUS)

Symmetric Multiprocessors (SMPs) are commonly used as the building blocks for scalable clustered systems, often combined with modern, low-latency, high-bandwidth interconnects such as Myrinet. However, their design leads to contention among processors for access to shared resources, which can significantly limit their efficiency.

Such resources include the Front Side Bus (FSB), which in designs based on Intel chipsets interconnects the processors with one another and with the main memory controller (Northbridge); the peripheral bus (commonly PCI/PCI-X); and the node's Network Interface Card (NIC).

In the context of this activity, we first explored [PDP'05] the impact of memory contention on a single cluster node running compute-intensive applications. The experiments showed that contention on the memory bus can limit the degree of parallelism achieved, leading to severe degradation in attainable performance. Moreover, we highlighted that the DMA engines on the Myrinet NIC can induce significant load on the memory subsystem. This means that communication can interfere with computation even when employing a zero-copy, OS-bypass protocol such as Myrinet/GM, which removes the CPU from the critical path of communication completely.

To address the problem, we enhance local scheduling by taking into account run-time information on the memory bandwidth demands of each individual process. Memory behavior is monitored dynamically using the performance monitoring counters provided by modern microprocessors. The proposed scheduling policy tries to increase throughput for multiprogrammed workloads by adjusting the selection of processes that run simultaneously on an SMP node, so as to avoid saturating the memory bus.
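The selection step described above can be illustrated with a minimal sketch. It is not the MemBUS policy itself: the greedy heuristic, the function name `select_coschedule`, and the bandwidth figures are assumptions made for illustration. Given a per-process bandwidth estimate (which MemBUS derives from performance counters), it picks at most one process per CPU while keeping the summed demand under an assumed bus capacity.

```python
from typing import Dict, List


def select_coschedule(demand_mb_s: Dict[int, float],
                      num_cpus: int,
                      bus_capacity_mb_s: float) -> List[int]:
    """Pick up to num_cpus PIDs whose summed bandwidth demand fits the bus.

    Hypothetical greedy heuristic: consider the most demanding processes
    first and add each one only if it still fits in the remaining budget.
    """
    chosen: List[int] = []
    remaining = bus_capacity_mb_s
    ranked = sorted(demand_mb_s.items(), key=lambda kv: kv[1], reverse=True)
    for pid, bandwidth in ranked:
        if len(chosen) == num_cpus:
            break
        if bandwidth <= remaining:
            chosen.append(pid)
            remaining -= bandwidth
    # If no process fits on its own, still run the least demanding one
    # so that the node never sits idle.
    if not chosen and demand_mb_s:
        chosen.append(min(demand_mb_s, key=demand_mb_s.get))
    return chosen


# Illustrative numbers only: four processes, two CPUs, 1200 MB/s budget.
demo = select_coschedule({101: 900.0, 102: 800.0, 103: 200.0, 104: 100.0},
                         num_cpus=2, bus_capacity_mb_s=1200.0)
```

With these made-up figures, the two most demanding processes (101 and 102) would saturate the bus together, so the sketch pairs process 101 with the lighter process 103 instead.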

The policy has been implemented in userspace, as the Memory Bandwidth aware Userspace Scheduler (MemBUS). Scheduling decisions are enforced with a combination of the perfctr performance-monitoring framework, the ptrace() Linux system call and standard UNIX SIGSTOP / SIGCONT signaling.
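As a rough illustration of the signaling part only, the sketch below resumes the selected processes with SIGCONT and suspends the rest with SIGSTOP. It leaves out perfctr and ptrace() entirely, and the function name `enforce_selection` and its injectable `send` parameter are assumptions introduced here so that the logic can be exercised without touching real processes.

```python
import os
import signal
from typing import Callable, Iterable, Set


def enforce_selection(managed: Iterable[int],
                      selected: Set[int],
                      send: Callable[[int, int], None] = os.kill) -> None:
    """Resume the selected PIDs and suspend every other managed PID.

    `send` defaults to os.kill; passing a stub makes a dry run possible.
    """
    for pid in managed:
        sig = signal.SIGCONT if pid in selected else signal.SIGSTOP
        try:
            send(pid, sig)
        except ProcessLookupError:
            # The process exited between monitoring and enforcement.
            pass


# Dry run with a recording stub instead of real signal delivery.
sent = []
enforce_selection([101, 102, 103], {102},
                  send=lambda pid, sig: sent.append((pid, sig)))
```

SIGSTOP cannot be caught or ignored by the target process, which is what makes it usable for enforcing decisions from userspace; delivering it to processes owned by another user requires appropriate privileges.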

Later on, MemBUS was expanded [ICPADS 2006] to support cluster-wide gang scheduling: context switches are coordinated so that all peer processes belonging to the same job are scheduled simultaneously across the cluster, while trying to minimize interference due to contention for access to main memory and to the NIC on each node. Experimental evaluation based on the NAS parallel benchmark suite showed a significant increase in throughput compared to uncoordinated local scheduling.
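The gang-scheduling constraint can be sketched in the same spirit. The code below is a simplified assumption, not the published algorithm: it admits a job into a time slot only if every node hosting one of its peer processes still has a free CPU and enough memory bandwidth left, and it ignores NIC contention. The name `pick_gang` and all figures are hypothetical.

```python
from typing import Dict, List


def pick_gang(jobs: Dict[str, Dict[str, float]],
              cpus_per_node: int,
              capacity: float) -> List[str]:
    """Choose jobs to run together in one cluster-wide time slot.

    `jobs` maps a job name to {node: bandwidth demand of its peer there}.
    A job is admitted only if it fits on all of its nodes at once, so
    that its peer processes are always scheduled simultaneously.
    """
    load: Dict[str, float] = {}   # bandwidth already committed per node
    count: Dict[str, int] = {}    # CPUs already committed per node
    chosen: List[str] = []
    for name, per_node in jobs.items():
        fits = all(
            count.get(node, 0) < cpus_per_node
            and load.get(node, 0.0) + bw <= capacity
            for node, bw in per_node.items()
        )
        if fits:
            chosen.append(name)
            for node, bw in per_node.items():
                load[node] = load.get(node, 0.0) + bw
                count[node] = count.get(node, 0) + 1
    return chosen


# Illustrative numbers only: three jobs on two dual-CPU nodes.
slot = pick_gang(
    {"A": {"n1": 700.0, "n2": 700.0},
     "B": {"n1": 600.0, "n2": 100.0},
     "C": {"n1": 200.0, "n2": 200.0}},
    cpus_per_node=2, capacity=1000.0)
```

In this example job B is rejected because it would overload node n1 alongside job A, even though it would fit on n2; job C fits on both nodes and is admitted.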


Topic revision: r5 - 2008-03-11 - VangelisKoukis
