+++
title = 'Page tables'
+++
# Page tables
## Memory access
What happens on a pointer dereference? It depends: an address in virtual memory may not be the same as the address in RAM.

Virtual memory:
- the OS tells the CPU how to map virtual memory addresses to physical memory addresses
- the OS creates a page table, which stores physical addresses and is indexed by virtual addresses
- the physical address of the page table is held in a special register (cr3)
- on memory access, the CPU looks up the virtual address to find the physical address

### Address spaces
- the OS can create multiple page tables
    - each defines a different virt → phys mapping
    - only one is active at a time, selected by cr3
- each page table defines an address space

Why address spaces?
- virtualize physical memory
- flexible memory management
- isolation and protection

### Page tables
Addresses are translated at page granularity:
- page: a fixed-size chunk of memory
- the architecture defines the possible page size(s), typically 4 KB

The Memory Management Unit (MMU) performs the translation.

#### Basic linear page-table
Let's say we dereference virtual address `0010000000000100`.
1. The first 4 bits are the index into the page table. `0010` is 2 in decimal, so the entry at index 2 in the page table is used.
2. Index 2 contains the value `1101`, where the last bit is the 'present/absent' bit (i.e. does the entry map to a physical address). The first 3 bits are used as the first 3 bits of the outgoing physical address.
3. The resulting physical address is `110000000000100`: the 3 bits from the page table entry, followed by the 12 offset bits copied from the virtual address.

![Linear page table](linear-page-table.png)

#### Hierarchical page tables (x86_32)
Use a top-level page table that points to other page tables.

The first 10 bits are an index into the top-level page table, the second 10 bits are an index into the second-level page table, and the last 12 bits are the offset within the page.

The downside is that each translation needs more lookups.
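
The 10/10/12 split can be sketched in C. This is a toy model, not how any particular kernel does it: the type and function names (`page_dir_t`, `page_table_t`, `translate`) are made up for illustration, and the top-level table stores C pointers, whereas real hardware stores physical addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT   12u                        /* 4 KB pages: 12 offset bits */
#define INDEX_BITS   10u                        /* 10 index bits per level */
#define ENTRIES      (1u << INDEX_BITS)         /* 1024 entries per table */
#define OFFSET_MASK  ((1u << PAGE_SHIFT) - 1u)  /* low 12 bits */
#define PTE_PRESENT  0x1u                       /* present bit of an entry */

/* Hypothetical toy types: a real top-level entry holds a physical
 * address plus flag bits, not a C pointer. */
typedef struct { uint32_t entries[ENTRIES]; } page_table_t;
typedef struct { page_table_t *tables[ENTRIES]; } page_dir_t;

/* Split a 32-bit virtual address into its three fields. */
static uint32_t dir_index(uint32_t va)   { return va >> (PAGE_SHIFT + INDEX_BITS); }
static uint32_t table_index(uint32_t va) { return (va >> PAGE_SHIFT) & (ENTRIES - 1u); }
static uint32_t page_offset(uint32_t va) { return va & OFFSET_MASK; }

/* Returns 0 and stores the physical address in *pa on success,
 * or -1 if either level has no mapping for va. */
static int translate(const page_dir_t *dir, uint32_t va, uint32_t *pa)
{
    assert(dir != NULL && pa != NULL);

    const page_table_t *pt = dir->tables[dir_index(va)];  /* lookup 1 */
    if (pt == NULL)
        return -1;

    uint32_t pte = pt->entries[table_index(va)];          /* lookup 2 */
    if (!(pte & PTE_PRESENT))
        return -1;

    /* Frame address from the entry, offset copied from the virtual address. */
    *pa = (pte & ~OFFSET_MASK) | page_offset(va);
    return 0;
}
```

For example, `0x00403004` splits into directory index 1, table index 3 and offset 4. The two array reads in `translate` are the extra lookups mentioned above; in exchange, second-level tables only need to exist for regions that are actually mapped.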

#### Inverted page tables (IA64)
Make a page table sized to physical memory instead of virtual memory.
Then use a hash table, indexed by a hash of the virtual page number, where each bucket holds a list of pages.

#### Four-level page tables (x86_64)
Only the lowest 48 bits of the address are implemented; the rest are sign-extended.

Register cr3 points to the highest-level page table (the PML4, whose entries are PML4Es).
The first 9 of the 48 implemented bits are an index into that table; the selected entry points to a page-directory-pointer table (entries: PDPE).
The next 9 bits select the page-directory table, the next 9 the page table, and the next 9 a physical page, with the last 12 bits being the offset.

![Four level paging structures](four-level-paging-structures.png)

## Page table management
### Static page tables for bootstrapping
OpenLSD maps two locations in virtual memory:
- identity mapping, i.e. the virtual address equals the physical address
- KERNBASE (where the kernel is loaded)
    - programs are later loaded at lower addresses

In AT&T syntax (src, dst):

```asm
    movl $(PAGE_PRESENT | PAGE_WRITE | PAGE_HUGE), %eax  # flags for 2 MiB pages, physical address 0
    movl $page_dir, %edi
    movl $4, %ecx                # we want 4 entries (8 MiB in total)

.map_pages:                      # identity mapping
    movl %eax, (%edi)            # write low 32 bits of the entry
    addl $0x200000, %eax         # next physical address (+2 MiB)
    addl $8, %edi                # next entry (8 bytes each)
    decl %ecx                    # decrement counter
    jnz .map_pages

    # at the next higher level, load a single entry pointing to page_dir
    movl $page_dir, %eax
    orl $(PAGE_PRESENT | PAGE_WRITE), %eax
    movl %eax, pdpt

    # and one level higher again
    movl $pdpt, %eax
    orl $(PAGE_PRESENT | PAGE_WRITE), %eax
    movl %eax, pml4              # entry 0: identity mapping
    movl %eax, pml4 + 256*8      # entry 256: first entry of the upper half

    # load the root into cr3
    movl $pml4, %eax
    movl %eax, %cr3
```

![OpenLSD after boot](openlsd-after-boot.png)

### Dynamic page table management (post boot)
Two issues:
- what should your virtual address spaces look like (policies)
- how would you do these things (mechanisms)

Policies:
- Supervisor (U/S) bit: allow or disallow user-mode access (kernel-only when clear)
- Present bit: 1 if the page is mapped (but the page frame address should also be cleared on unmap)
- page table address bits: point to the physical address (need to flush CPU buffers containing recent data to avoid RIDL)

Mechanisms:
- dynamically update page tables per address space
- the MMU uses the updated page tables to find physical pages for virtual pages

Page table walk on x86_64 with a 4-level page table, 48-bit virtual addresses, and page tables mapped in virtual memory:

1. locate the top-level page table: read the cr3 register (or the process struct)
2. locate the 2nd-level page table
    - get a virtual pointer to the top-level page
    - use bits 39-47 of the virtual address as index
    - if the page table entry is not present, abort
3. locate the 3rd-level page table
    - get a virtual pointer to the 2nd-level page
    - use bits 30-38 of the virtual address as index
    - if the page table entry is not present, abort
4. ...
5. the last page table entry has the physical address of the page

Page table mapping:
- locate the page table entry for the virtual address to be mapped, using a page walk
- if an entry on the way is not present: allocate a new page, store its physical address in the non-present entry, and continue with the next level
- store the physical address to be mapped in the final page table entry and mark it as present

Page table unmapping:
- locate the page table entry for the virtual address to be unmapped, using a page walk
- zero out the final page table entry
- free pages

Permission bits:
- P: present (1) or not (0)
- R/W: read-only (0) or writable (1)
- U/S: supervisor-only (0, i.e. kernel) or user-accessible (1)
    - with SMAP protection, 1 becomes user-only (the kernel can no longer access the page)
- XD: execute allowed (0) or disabled (1)

## Optimizations
### TLB (caching)
The TLB caches the result of page table translation: virtual address x is at physical address y.

Approach for memory access:
- look up x in the TLB
- on a hit, no need to consult the page table
- on a miss, do a page table walk and then cache the result in the TLB

It needs to be very fast, so it has only a small number of sets.

Flush the TLB:
- when switching to a new address space
- when deleting/updating page table entries

TLB scalability:
- translation caches:
    - the TLB is tagged with all bits of the virtual address
    - translation caches are tagged with part of the virtual address
    - managed transparently by hardware
- use bigger pages


## Security
- leave holes in the identity map (i.e. guard pages)
    - no wastage, but more complex management
- kernel exploitation:
    - find known locations in kernel space: physmap (mapping of the physical address space), kernel base address
    - trigger a vulnerability and corrupt data in known locations, or divert control flow

Kernel address space layout randomization (KASLR):
- randomizes sections of the kernel address space
- makes the kernel harder to exploit, but an attacker can still brute-force (though wrong guesses crash), leak kernel pointers (needs a second vulnerability), or use side-channel attacks (but these are complicated)
- implementation:
    - limit entropy to simplify memory management
    - layout remains the same until reboot
    - a random slot is chosen early in boot, and the kernel is mapped there
    - the random mapping translates to different slots in the page table pages
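
The last two implementation points can be made concrete with a small C sketch. All constants here are assumptions chosen for illustration (a window starting at `0xffffffff80000000`, 2 MiB slots, 256 slots), and `kaslr_base` and `indices_of` are hypothetical names; the random value is taken as a parameter because the entropy source depends on the platform.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only, not taken from any specific kernel. */
#define REGION_BASE  0xffffffff80000000ULL  /* start of the randomization window */
#define SLOT_SIZE    0x200000ULL            /* one slot = one 2 MiB page */
#define NUM_SLOTS    256u                   /* limited entropy: 8 bits */

/* All slots fit in a single page-directory page (512 entries),
 * which is what keeps memory management simple in this sketch. */
static_assert(NUM_SLOTS <= 512, "slots must fit in one page directory");

struct pt_indices { unsigned pml4, pdpt, pd, pt; };

/* Pick the kernel base from a random value supplied by the caller. */
static uint64_t kaslr_base(uint64_t random_value)
{
    uint64_t slot = random_value % NUM_SLOTS;
    return REGION_BASE + slot * SLOT_SIZE;
}

/* Split a 48-bit virtual address into its four 9-bit table indices. */
static struct pt_indices indices_of(uint64_t va)
{
    struct pt_indices idx = {
        .pml4 = (unsigned)((va >> 39) & 0x1ff),  /* bits 39-47 */
        .pdpt = (unsigned)((va >> 30) & 0x1ff),  /* bits 30-38 */
        .pd   = (unsigned)((va >> 21) & 0x1ff),  /* bits 21-29 */
        .pt   = (unsigned)((va >> 12) & 0x1ff),  /* bits 12-20 */
    };
    return idx;
}
```

Under these assumptions, every possible base shares PML4 index 511 and PDPT index 510, and only the page-directory index changes with the slot (slot 5 gives index 5). The randomization therefore only moves entries around within one page-table page, at the cost of giving an attacker just 256 candidates to guess.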