lectures.alex.balgavy.eu

Lecture notes from university.
+++
title = 'Page tables'
+++
# Page tables
## Memory access
What happens on a pointer dereference? It depends:
an address in virtual memory may not be the same as the corresponding address in RAM.

Virtual memory:
- the OS tells the CPU how to map virtual memory addresses to physical memory addresses
- the OS creates a page table, which stores physical addresses and is indexed by virtual addresses
- the physical address of the page table is held in a special register (cr3)
- on a memory access, the CPU looks up the virtual address to find the physical address

### Address spaces
- the OS can create multiple page tables
  - each defines a different virt → phys mapping
  - only one is active at a time, selected by cr3
- each page table defines an address space

Why address spaces?
- virtualize physical memory
- flexible memory management
- isolation and protection

### Page tables
Addresses are translated at page granularity:
- page: a fixed-size chunk of memory
- the architecture defines the possible page size(s), typically 4 KB

The Memory Management Unit (MMU) performs the translation.

#### Basic linear page table
Let's say we dereference the 16-bit virtual address `0010000000000100`.
1. The first 4 bits are an index into the page table. `0010` is 2 in decimal, so the entry at index 2 of the page table is used.
2. Entry 2 contains the value `1101`, where the last bit is the 'present/absent' bit (i.e. does the entry map to a physical address). The first 3 bits are used as the first 3 bits of the outgoing physical address.
3. The resulting physical address is `110000000000100`: the 3 bits from the page table entry, followed by the 12 offset bits copied from the virtual address.

![Linear page table](linear-page-table.png)
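
The three steps above can be sketched in a few lines of Python. This is only an illustration of the bit manipulation, not how an MMU is implemented; the names `translate_linear` and `page_table` are made up for this sketch, and the table contains nothing but the single entry from the example.

```python
OFFSET_BITS = 12  # the low 12 bits are copied unchanged into the physical address


def translate_linear(vaddr, page_table):
    """Translate a 16-bit virtual address using a linear page table."""
    index = vaddr >> OFFSET_BITS                 # top 4 bits select the entry
    offset = vaddr & ((1 << OFFSET_BITS) - 1)    # low 12 bits: offset in page
    entry = page_table[index]
    if entry & 1 == 0:                           # last bit: present/absent
        raise LookupError(f"page {index} is not present")
    frame = entry >> 1                           # remaining 3 bits: frame number
    return (frame << OFFSET_BITS) | offset


# Hypothetical table: only entry 2 is filled in, with the value from the example.
page_table = [0] * 16
page_table[2] = 0b1101

paddr = translate_linear(0b0010000000000100, page_table)
print(format(paddr, "015b"))  # 110000000000100
```

Dereferencing an address whose entry has the present bit cleared raises `LookupError` here; on real hardware this would be a page fault.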

#### Hierarchical page tables (x86_32)
Use a top-level page table that points to other page tables.

The first 10 bits of the 32-bit virtual address are an index into the top-level page table, the next 10 bits are an index into the second-level page table, and the last 12 bits are the offset within the page.

The downside is that each translation needs more memory lookups.
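
As a small sketch of the 10/10/12 split described above (the function name `split_x86_32` is made up for this example):

```python
def split_x86_32(vaddr):
    """Split a 32-bit virtual address into its two table indices and offset."""
    directory_index = (vaddr >> 22) & 0x3FF  # first 10 bits: top-level table
    table_index = (vaddr >> 12) & 0x3FF      # next 10 bits: second-level table
    offset = vaddr & 0xFFF                   # last 12 bits: offset in the page
    return directory_index, table_index, offset


print(split_x86_32(0x00403004))  # (1, 3, 4)
```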

#### Inverted page tables (IA64)
Make a page table sized according to physical memory instead of virtual memory.
Then use a hash table, indexed by a hash of the virtual page number, where each bucket holds a list of pages.
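
A toy model of the idea, assuming the simplest possible hash (virtual page number modulo the number of frames) and Python lists as buckets. It does not reflect the actual IA64 table format; `InvertedPageTable` and its methods are hypothetical names for this sketch.

```python
class InvertedPageTable:
    """Hash table with one bucket per physical frame (toy model)."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        # Table size depends on physical memory, not on the virtual address space.
        self.buckets = [[] for _ in range(num_frames)]

    def _bucket(self, vpn):
        # Assumed hash function: virtual page number modulo the table size.
        return self.buckets[vpn % self.num_frames]

    def map(self, vpn, frame):
        if not 0 <= frame < self.num_frames:
            raise ValueError("frame number out of range")
        bucket = self._bucket(vpn)
        for i, (mapped_vpn, _) in enumerate(bucket):
            if mapped_vpn == vpn:            # remap an existing virtual page
                bucket[i] = (vpn, frame)
                return
        bucket.append((vpn, frame))

    def lookup(self, vpn):
        # Walk the list in the bucket until the virtual page number matches.
        for mapped_vpn, frame in self._bucket(vpn):
            if mapped_vpn == vpn:
                return frame
        return None                          # not mapped


ipt = InvertedPageTable(num_frames=4)
ipt.map(1, 3)
ipt.map(5, 0)           # 5 % 4 == 1, so this shares a bucket with page 1
print(ipt.lookup(5))    # 0
```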

#### Four-level page tables (x86_64)
Only the lowest 48 bits of the virtual address are implemented; the remaining upper bits are a sign extension of bit 47.

Register cr3 points to the highest-level page table (PML4).
The first 9 bits (bits 39-47) are an index into this table; the selected entry (PML4E) points to a page-directory-pointer table.
The next 9 bits index into that table, whose entry (PDPE) points to a page-directory table; the following two groups of 9 bits select the page table and then the physical page, with the last 12 bits being the offset.

![Four level paging structures](four-level-paging-structures.png)
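
A sketch of the 9/9/9/9/12 split and of the sign-extension rule, treating addresses as unsigned 64-bit integers. The function names are made up for this example.

```python
def is_canonical(vaddr):
    """Bits 48..63 must all be copies of bit 47."""
    upper = vaddr >> 47                  # bits 47..63 (17 bits)
    return upper == 0 or upper == 0x1FFFF


def split_x86_64(vaddr):
    """Return (PML4, PDPT, PD, PT) indices and the page offset."""
    if not is_canonical(vaddr):
        raise ValueError("non-canonical address")
    pml4_index = (vaddr >> 39) & 0x1FF
    pdpt_index = (vaddr >> 30) & 0x1FF
    pd_index = (vaddr >> 21) & 0x1FF
    pt_index = (vaddr >> 12) & 0x1FF
    offset = vaddr & 0xFFF
    return pml4_index, pdpt_index, pd_index, pt_index, offset


# The lowest address of the upper half lands in PML4 entry 256.
print(split_x86_64(0xFFFF800000000000))  # (256, 0, 0, 0, 0)
```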

## Page table management
### Static page tables for bootstrapping
OpenLSD maps two locations in virtual memory:
- identity mapping, i.e. the virtual address is the same as the physical address
- KERNBASE (where the kernel is loaded)
  - programs are loaded at lower addresses afterwards

In AT&T syntax (src, dst):

```asm
movl $(PAGE_PRESENT | PAGE_WRITE | PAGE_HUGE), %eax /* mapping 2 MiB pages */
movl $page_dir, %edi
movl $4, %ecx           /* we want 4 entries */

.map_pages:             /* identity mapping */
movl %eax, (%edi)       /* store entry: physical address | flags */
addl $0x200000, %eax    /* map next physical address (2 MiB further) */
addl $8, %edi           /* advance to the next 8-byte entry */
decl %ecx               /* decrement counter */
jnz .map_pages

/* at next higher level, load single entry with identity mapping */
movl $page_dir, %eax
orl $(PAGE_PRESENT | PAGE_WRITE), %eax
movl %eax, pdpt

/* and higher level again */
movl $pdpt, %eax
orl $(PAGE_PRESENT | PAGE_WRITE), %eax
movl %eax, pml4           /* entry 0: lowest virtual addresses */
movl %eax, pml4 + 256*8   /* entry 256: first entry of the upper half */

/* Load the root into cr3 */
movl $pml4, %eax
movl %eax, %cr3
```

![OpenLSD after boot](openlsd-after-boot.png)
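
To see what the loop writes, the entries can be computed by hand. The sketch below assumes that the `PAGE_*` macros have the values of the corresponding x86 hardware bits (present = bit 0, writable = bit 1, page size = bit 7); the actual OpenLSD definitions are not shown in these notes. The helper names are made up.

```python
# Assumed values: the x86 hardware bit positions for these flags.
PAGE_PRESENT = 1 << 0
PAGE_WRITE = 1 << 1
PAGE_HUGE = 1 << 7
HUGE_PAGE_SIZE = 0x200000  # 2 MiB, the increment used in the loop


def boot_page_dir(num_entries=4):
    """Entries that the .map_pages loop stores into page_dir."""
    flags = PAGE_PRESENT | PAGE_WRITE | PAGE_HUGE
    return [(i * HUGE_PAGE_SIZE) | flags for i in range(num_entries)]


def pml4_slot_base(index):
    """Lowest virtual address covered by a PML4 entry (sign-extended)."""
    addr = index << 39
    if index >= 256:                     # bit 47 set: extend into bits 48..63
        addr |= 0xFFFF000000000000
    return addr


print([hex(e) for e in boot_page_dir()])
# ['0x83', '0x200083', '0x400083', '0x600083']
print(hex(pml4_slot_base(0)), hex(pml4_slot_base(256)))
# 0x0 0xffff800000000000
```

So the same 8 MiB of physical memory (4 entries of 2 MiB) becomes visible both at virtual address 0 and at the start of the upper half.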

### Dynamic page table management (post boot)
Two issues:
- what should your virtual address spaces look like (policies)
- how would you do these things (mechanisms)

Policies:
- Supervisor bit: restrict a page to kernel access, or also allow user access
- Present bit: 1 if the page is mapped (but the page frame address should also be cleared on unmap)
- Page table address bits: point to a physical address (need to flush CPU buffers containing recent data to avoid RIDL)

Mechanisms:
- dynamically update page tables per address space
- the MMU uses the updated page tables to find physical pages for virtual pages

Page table walk on x86_64 with a 4-level page table, 48-bit virtual addresses, and page tables mapped in virtual memory:

  1. locate the top-level page table: read the cr3 register (or the process struct)
  2. locate the 2nd-level page table
     - get a virtual pointer to the top-level page table
     - use bits 39-47 of the virtual address as index
     - if the page table entry is not present, abort
  3. locate the 3rd-level page table
     - get a virtual pointer to the 2nd-level page table
     - use bits 30-38 of the virtual address as index
     - if the page table entry is not present, abort
  4. ...
  5. the last page table entry holds the physical address of the page
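
The walk can be modelled as a loop over the four index fields. In this sketch, physical memory is a dictionary from a table's physical address to its 512 entries, which stands in for "get a virtual pointer to the table"; huge pages are ignored. The names `walk`, `tables` and the addresses used are invented for the example.

```python
PRESENT = 1 << 0
ADDR_MASK = 0x000FFFFFFFFFF000  # bits 12..51 of an entry hold a physical address


def walk(tables, cr3, vaddr):
    """Return the physical address for vaddr, or None if it is not mapped."""
    table_addr = cr3 & ADDR_MASK                 # step 1: top-level table
    for shift in (39, 30, 21, 12):               # one 9-bit index per level
        index = (vaddr >> shift) & 0x1FF
        entry = tables[table_addr][index]
        if not entry & PRESENT:                  # entry not present: abort
            return None
        table_addr = entry & ADDR_MASK           # next table (or final page)
    return table_addr | (vaddr & 0xFFF)          # add the 12-bit offset


# Hypothetical tables at made-up physical addresses 0x1000..0x4000.
tables = {addr: [0] * 512 for addr in (0x1000, 0x2000, 0x3000, 0x4000)}
tables[0x1000][1] = 0x2000 | PRESENT
tables[0x2000][2] = 0x3000 | PRESENT
tables[0x3000][3] = 0x4000 | PRESENT
tables[0x4000][4] = 0xABC000 | PRESENT

vaddr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x123
print(hex(walk(tables, 0x1000, vaddr)))  # 0xabc123
```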

Page table mapping:
- locate the page table entry for the virtual address to be mapped, using a page walk
- if an entry on the way is not present: allocate a new page, store its physical address in the non-present entry, and continue with the next level
- store the physical address to be mapped in the final page table entry and mark it as present

Page table unmapping:
- locate the page table entry for the virtual address to be unmapped, using a page walk
- zero out the final page table entry
- free pages
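
A minimal sketch of mapping and unmapping on top of such a walk. The `AddressSpace` class, its bump allocator for page-table pages and the addresses used are all assumptions for illustration. Freeing pages and flushing the TLB are left out.

```python
PRESENT = 1 << 0
WRITABLE = 1 << 1
ADDR_MASK = 0x000FFFFFFFFFF000  # bits 12..51 of an entry hold a physical address


class AddressSpace:
    def __init__(self):
        self.tables = {}            # physical address -> list of 512 entries
        self.next_free = 0x1000     # hypothetical bump allocator for table pages
        self.cr3 = self._alloc_table()

    def _alloc_table(self):
        addr = self.next_free
        self.next_free += 0x1000
        self.tables[addr] = [0] * 512
        return addr

    def _find_entry(self, vaddr, create):
        """Walk to the last level; return (table address, index) or None."""
        table = self.cr3
        for shift in (39, 30, 21):
            index = (vaddr >> shift) & 0x1FF
            entry = self.tables[table][index]
            if not entry & PRESENT:
                if not create:
                    return None
                # Not present: allocate a new table and store its address.
                entry = self._alloc_table() | PRESENT | WRITABLE
                self.tables[table][index] = entry
            table = entry & ADDR_MASK
        return table, (vaddr >> 12) & 0x1FF

    def map_page(self, vaddr, paddr, flags=WRITABLE):
        table, index = self._find_entry(vaddr, create=True)
        self.tables[table][index] = (paddr & ADDR_MASK) | flags | PRESENT

    def unmap_page(self, vaddr):
        found = self._find_entry(vaddr, create=False)
        if found is not None:
            table, index = found
            self.tables[table][index] = 0    # zero out the final entry

    def translate(self, vaddr):
        found = self._find_entry(vaddr, create=False)
        if found is None:
            return None
        table, index = found
        entry = self.tables[table][index]
        if not entry & PRESENT:
            return None
        return (entry & ADDR_MASK) | (vaddr & 0xFFF)


space = AddressSpace()
space.map_page(0x7F0000401000, 0x5000)
print(hex(space.translate(0x7F0000401234)))  # 0x5234
space.unmap_page(0x7F0000401000)
print(space.translate(0x7F0000401234))       # None
```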

Permission bits:
- P: present (1) or not (0)
- R/W: read-only (0) or writable (1)
- U/S: supervisor-only (0, i.e. kernel) or user-accessible (1)
  - with SMAP protection: 1 becomes user-only
- XD: execute allowed (0) or disabled (1)
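
A simplified check of these bits on a single entry, using the x86_64 bit positions (P = bit 0, R/W = bit 1, U/S = bit 2, XD = bit 63). The function `access_allowed` is made up for this sketch, and it deliberately ignores SMAP and other control-register settings that change how the kernel is treated.

```python
P = 1 << 0     # present
RW = 1 << 1    # writable
US = 1 << 2    # user-accessible
XD = 1 << 63   # execute disabled


def access_allowed(entry, *, user, write=False, execute=False):
    """Simplified permission check for one page table entry."""
    if not entry & P:
        return False                 # not present: always a fault
    if user and not entry & US:
        return False                 # supervisor-only page, user access
    if write and not entry & RW:
        return False                 # read-only page
    if execute and entry & XD:
        return False                 # execution disabled
    return True


kernel_data = P | RW | XD            # kernel-only, writable, not executable
user_code = P | US                   # user-accessible, read-only, executable

print(access_allowed(kernel_data, user=True))                 # False
print(access_allowed(user_code, user=True, execute=True))     # True
```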

## Optimizations
### TLB (caching)
The TLB caches the result of a page table translation: virtual address x is at physical address y.

Approach for a memory access:
- look up x in the TLB
- on a hit, there is no need to consult the page table
- on a miss, do a page table walk and then cache the result in the TLB

The TLB needs to be very fast, so it has a small number of sets.

Flush the TLB:
- when switching to a new address space
- when deleting/updating page table entries
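
A toy TLB that shows both the hit/miss path and why flushing matters. The LRU eviction, the capacity and the `walk` callback (virtual page number to frame number) are assumptions for this sketch; real TLBs are set-associative hardware structures.

```python
from collections import OrderedDict


class TLB:
    def __init__(self, walk, capacity=4):
        self.walk = walk                  # fallback: page table walk
        self.capacity = capacity
        self.entries = OrderedDict()      # virtual page number -> frame number
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = vaddr >> 12, vaddr & 0xFFF
        if vpn in self.entries:           # hit: no page table access needed
            self.hits += 1
            self.entries.move_to_end(vpn)
            frame = self.entries[vpn]
        else:                             # miss: walk, then cache the result
            self.misses += 1
            frame = self.walk(vpn)
            if frame is None:
                return None
            self.entries[vpn] = frame
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)   # evict least recently used
        return (frame << 12) | offset

    def flush(self):
        self.entries.clear()


page_table = {1: 7}                       # hypothetical mapping: page 1 -> frame 7
tlb = TLB(page_table.get)
print(hex(tlb.translate(0x1004)))         # 0x7004 (miss, then cached)
page_table[1] = 9                         # page table entry updated...
print(hex(tlb.translate(0x1004)))         # 0x7004 (stale hit: no flush yet)
tlb.flush()
print(hex(tlb.translate(0x1004)))         # 0x9004
```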

TLB scalability:
- translation caches:
  - the TLB is tagged with all bits of the virtual address
  - translation caches are tagged with part of the virtual address
  - managed transparently by hardware
- use bigger pages


## Security
- leave holes in the identity map (i.e. guard pages)
  - no wastage, but more complex management
- kernel exploitation:
  - find known locations in kernel space: physmap (mapping of the physical address space), kernel base address
  - trigger a vulnerability and corrupt data in known locations, or divert control flow

Kernel address space layout randomization (KASLR):
- randomizes sections of the kernel address space
  - makes the kernel harder to exploit, but attackers can still brute-force (though this causes crashes), leak kernel pointers (needs a second vulnerability), or use side-channel attacks (but these are complicated)
- implementation:
  - limit entropy to simplify memory management
  - the layout remains the same until reboot
  - a random slot is chosen early in boot, and the kernel is mapped there
  - the random mapping translates to different slots in the page table pages
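
A rough sketch of slot-based randomization. The region start, region size and 2 MiB slot size below are hypothetical values chosen for illustration, not taken from any particular kernel, and `choose_kernel_base` is a made-up helper.

```python
import secrets

# Hypothetical layout: a 1 GiB region divided into 2 MiB-aligned slots.
REGION_START = 0xFFFFFFFF80000000
REGION_SIZE = 1 << 30
SLOT_SIZE = 2 << 20
NUM_SLOTS = REGION_SIZE // SLOT_SIZE     # 512 slots, i.e. at most 9 bits of entropy


def choose_kernel_base(kernel_size, randbelow=secrets.randbelow):
    """Pick a random slot such that the whole kernel still fits in the region."""
    slots_needed = -(-kernel_size // SLOT_SIZE)      # ceiling division
    usable_slots = NUM_SLOTS - slots_needed + 1
    if usable_slots <= 0:
        raise ValueError("kernel does not fit in the region")
    return REGION_START + randbelow(usable_slots) * SLOT_SIZE


def page_directory_index(vaddr):
    """Index of vaddr within its page directory (bits 21..29)."""
    return (vaddr >> 21) & 0x1FF


base = choose_kernel_base(16 << 20)      # chosen once, early in boot
print(hex(base), page_directory_index(base))
```

Because slots are aligned to the 2 MiB page size, a different slot only changes which page directory entries are used, which keeps the memory management simple at the cost of entropy.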