4.5. Enhancements to NUMA in Red Hat Enterprise Linux 6
Red Hat Enterprise Linux 6 includes a number of enhancements to capitalize on the full potential of today's highly scalable hardware. This section gives a high-level overview of the most important NUMA-related performance enhancements provided by Red Hat Enterprise Linux 6.
4.5.1. Bare-metal and Scalability Optimizations
4.5.1.1. Enhancements in topology-awareness
The following enhancements allow Red Hat Enterprise Linux to detect low-level hardware and architecture details, improving its ability to automatically optimize processing on your system.
- enhanced topology detection
- This allows the operating system to detect low-level hardware details (such as logical CPUs, hyper threads, cores, sockets, NUMA nodes and access times between nodes) at boot time, and optimize processing on your system.
- completely fair scheduler
- This new scheduling mode ensures that runtime is shared evenly between eligible processes. Combining this with topology detection allows processes to be scheduled onto CPUs within the same socket to avoid the need for expensive remote memory access, and ensure that cache content is preserved wherever possible.
malloc
malloc
is now optimized to ensure that the regions of memory that are allocated to a process are as physically close as possible to the core on which the process is executing. This increases memory access speeds.- skbuff I/O buffer allocation
- Similarly to
malloc
, this is now optimized to use memory that is physically close to the CPU handling I/O operations such as device interrupts. - device interrupt affinity
- Information recorded by device drivers about which CPU handles which interrupts can be used to restrict interrupt handling to CPUs within the same physical socket, preserving cache affinity and limiting high-volume cross-socket communication.
4.5.1.2. Enhancements in Multi-processor Synchronization
Coordinating tasks between multiple processors requires frequent, time-consuming operations to ensure that processes executing in parallel do not compromise data integrity. Red Hat Enterprise Linux includes the following enhancements to improve performance in this area:
- Read-Copy-Update (RCU) locks
- Typically, 90% of locks are acquired for read-only purposes. RCU locking removes the need to obtain an exclusive-access lock when the data being accessed is not being modified. This locking mode is now used in page cache memory allocation: locking is now used only for allocation or deallocation operations.
- per-CPU and per-socket algorithms
- Many algorithms have been updated to perform lock coordination among cooperating CPUs on the same socket to allow for more fine-grained locking. Numerous global spinlocks have been replaced with per-socket locking methods, and updated memory allocator zones and related memory page lists allow memory allocation logic to traverse a more efficient subset of the memory mapping data structures when performing allocation or deallocation operations.