+++
title = 'Managing physical memory'
+++

# Managing physical memory

## Basic data structures
Physical memory -- DRAM (Dynamic random access memory)
- memory cells with 1-bit data
- each cell has 1 capacitor + 1 transistor
- charged/discharged capacitor == 1/0 bit value
- capacitors leak charge → need periodic refresh (difference from SRAM)
- organized in: channels, DIMMs, ranks, chips, banks, rows/

![DRAM layout](dram-layout.png)

- NUMA: each CPU package has its own memory and can access the memory of other packages at a higher latency

Managing physical memory on Linux
- logically divided into a number of consecutive physical memory pages (page frames)
- identified by page frame numbers (PFNs)
- frames organized in
    - nodes - "banks"
    - zones - tagged regions for each node
        - when under memory pressure, zone boundaries become "blurry"
        - watermarking strategy to free pages (kswapd): WMARK\_MIN (trigger direct reclaim), WMARK\_LOW (trigger async reclaim), WMARK\_HIGH (stop reclaiming)
    - pages - physical page frames per zone
        - 4 KB units of physical memory

## Allocating memory

![Memory allocation overview](allocating-memory-overview.png)

### Memblock allocator
- early boot-time (low mem) allocator
- replaces old bootmem allocator
- mostly used to initialize buddy allocator, discarded after initialization
- consists of two arrays:
    - memory: all present memory in system
    - reserved: allocated memory ranges
- allocates by finding regions in `memory && !reserved`

Implementation:
- setup:
    - add all available physical memory regions to memory
    - add reserved ones to reserved
    - all regions sorted by base address
- allocation:
    - first-fit in `memory && !reserved`
    - just add the range to reserved
    - merge neighboring regions as necessary
- deallocation:
    - linear scan in reserved for containing region
    - remove the range from reserved
    - split region if necessary

### Buddy/zone allocator

- "power-of-two allocator with free coalescing"
- blocks arranged in 2ᴺ (N=order) pages
- allocations satisfied from a free block of exactly order N
    - if not possible, split larger 2ᴺ⁺¹ block into 2×2ᴺ
        - two smaller blocks are 'buddies'
    - if not possible, split larger 2ᴺ⁺² block twice, etc.
- deallocations return block to allocator
    - if buddy free, coalesce into larger block
    - if larger block's buddy free, coalesce again, etc.

Implementation:
- for each node and zone, array of MAX\_ORDER freelists
- freelist N maintains free blocks of size 2ᴺ
- split/merge moves block(s) to previous/next freelist
- PG_buddy, order in page attributes (set if free)
- buddy operations: reserve (update page flags), lookup (flip 'order' bit in block address)

![Buddy allocator schema](buddy-allocator.png)

#### Fragmentation
External fragmentation:
- can't allocate because free memory is split into too many small blocks
- addressed via
    - free coalescing
    - vmalloc allocator for large allocations
        - first-fit, similar to memblock
        - to allocate size N: allocate N page frames, map page frames into a virtually contiguous buffer

![Vmalloc diagram](vmalloc.png)

Internal fragmentation:
- memory wasted because a large block is assigned to smaller allocation(s)
- addressed via slab allocators
    - for small allocations
    - many implementations
    - memory allocated from per-object-size caches (typical range 8 B to 8 KB)
    - allocated memory physically contiguous

## Memory errors in frame allocation

![Diagram of memory allocation errors](memory-errors.png)

Sanity checks done by Linux (`CONFIG_DEBUG_PAGEALLOC` config option):
- out-of-bounds detection
    - with page guarding (function `page_is_guard`): bracket the allocated page with guard pages. Only detects out-of-bounds accesses that land inside the guard pages. Used by Linux.
    - alternative, page canaries: stick a special value on each side of the allocated page
- use-after-free detection
    - page remapping (function `kernel_map_pages`): when the allocator frees the object, unmap the corresponding page frames from virtual memory
    - alternative, page poisoning (function `kernel_poison_pages`): poison the page with some value when deallocated
- `check_new_page`: semantic checks on page descriptor
- `free_pages`: invalid free detection

Object-level sanitizers: kmemcheck (uninitialized reads), kmemleak (memory leaks), kasan (out-of-bounds and use-after-free)
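
The page poisoning idea from the sanity checks above can be sketched in user space. This is a toy model and not the kernel's code: the class `PoisoningFrameAllocator`, its methods, and the poison value `0xAA` are assumptions made for illustration. Frames are filled with the poison byte when freed and verified when handed out again, so a write through a stale reference is caught at the next allocation. It also includes a simple invalid-free check.

```python
PAGE_SIZE = 4096      # 4 KB page frames, as in the notes above
POISON_BYTE = 0xAA    # illustrative poison value (assumption of this sketch)


class PoisoningFrameAllocator:
    """Toy frame allocator: poison frames on free, verify them on alloc."""

    def __init__(self, num_frames):
        # One flat buffer stands in for physical memory.
        self.memory = bytearray(num_frames * PAGE_SIZE)
        self.free_frames = list(range(num_frames))
        # Poison everything up front so first-time allocations are checked too.
        for pfn in self.free_frames:
            self._poison(pfn)

    def _frame(self, pfn):
        # A memoryview is a writable window onto the frame (no copy).
        start = pfn * PAGE_SIZE
        return memoryview(self.memory)[start:start + PAGE_SIZE]

    def _poison(self, pfn):
        self._frame(pfn)[:] = bytes([POISON_BYTE]) * PAGE_SIZE

    def alloc(self):
        pfn = self.free_frames.pop()
        frame = self._frame(pfn)
        # Any byte that lost its poison value was written while the frame was free.
        if any(b != POISON_BYTE for b in frame):
            raise RuntimeError(f"frame {pfn} modified while free (use-after-free write)")
        frame[:] = bytes(PAGE_SIZE)  # hand out zeroed memory
        return pfn

    def free(self, pfn):
        # Invalid-free detection: the frame must not already be on the free list.
        if pfn in self.free_frames:
            raise RuntimeError(f"invalid free of frame {pfn}")
        self._poison(pfn)
        self.free_frames.append(pfn)
```

Poisoning alone only catches writes to a freed frame, and only at the next allocation. A stale read simply returns the poison pattern, which is why page remapping (faulting on any access) is the stricter check.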