lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git
Log | Files | Refs | Submodules

index.md (8116B)


      1 +++
      2 title = "Lecture 12: advanced exploitation"
      3 +++
      4 # Lecture 12: advanced exploitation
      5 Attacks so far have these steps:
      6 1. vulnerability: overflow, format, UAF, uninitialized read, type confusion
      7     - can be exploited
      8     - defense so far: e.g. stack canary preventing contiguous stack overflow
      9 2. control code pointer
     10     - runs shellcode or starts ROP
     11     - defense so far: e.g. W⊕X fully prevents, ASLR or CFI (control flow integrity) maybe prevents?
     12 3. arbitrary execution
     13 
     14 
     15 ## Exploit mitigations
     16 Examples:
     17 - Buffer overflow:
     18     - address sanitizer:
     19         - detects buffer overflow and use-after-free
     20         - shadow memory (mapping of virt addr space that holds metadata) tracks alloc status, add check before memory access
     21         - red zones between allocations (few bytes that shouldn't be written)
     22         - deallocated memory in quarantine (marked as unusable and placed in queue)
     23         - integrated in gcc and llvm
     24         - drawbacks: incomplete, 73% overhead impact on performance, can still jump over red zone in overflow
     25     - delta pointers:
     26         - fast buffer overflow detection, cannot detect e.g. underflow but performance is much better
     27         - tagged pointers use some pointer bits for metadata
     28         - checks are implicit using MMU
     29 - Format string: static analysis finds non-literal format strings by default, but false positives are possible
     30 - Uninitialized read: safeinit
     31     - automatically initialize to zero: every local variable (compiler), every heap allocation (allocator)
     32     - optimisations: initialize close to first usage, initialize only one byte for strings, rely on OS for zeroing large heap allocations, dead store elimination (don't do assignment that will always be overwritten)
     33     - good performance
     34 - Use-after-free:
     35     - address sanitizer
     36     - dangsan:
     37         - prevents use after free by invalidating dangling pointers (so must keep list of pointers to each object)
     38         - instrumentation: keep track of pointer on pointer assignment, set most significant bit of remaining pointers on free
     39         - complications:
     40             - which object does pointer point to?
     41                 - use shadow memory like address sanitizer
     42             - what if multiple threads copy pointers to same object?
     43                 - avoid locks, observe that most workload is write
     44                 - solution: per-thread append-only log
     45         - 41% overhead, good scalability
     46     - type-after-type:
     47         - allow dangling pointers, but only to same type
     48         - don't need to track all pointers, just types
     49         - separate heap and stack for each type:
     50             - never reuse memory used for one type for another
     51             - dangling pointer keeps pointing to same type
     52         - problem is with type inference (no type specified for malloc), so try to guess from context with static analysis and fall back to fake per-allocation type
     53         - if there are allocation wrappers, inline the wrappers into the caller (replace function call with body of function)
     54         - 4.3% runtime overhead, 17.4% memory overhead
     55 - Type confusion: typesan
     56     - requires knowing runtime type of object on static_cast
     57     - use shadow memory:
     58         - translate each pointer to set of allowable casts
     59         - this set is determined at compile type
     60     - 13.2% overhead
     61 
     62 All approaches have similarities:
     63 - detect undefined behavior (based on C standard)
     64 - built into compiler (well, depends on the compiler, e.g. clang vs gcc)
     65 - static analysis: can we prove properties (e.g. never undefined behavior)
     66     - incomplete because halting problem
     67 - dynamic instrumentation: add runtime checks for undefined behavior
     68     - performance hit, only use where you can't do static analysis
     69     - ensures crash before undefined behavior happens
     70 
     71 ## ASLR-related attacks and defences
     72 Threat model:
     73 - program secure if satisfies security requirements under threat model
     74 - typical assumptions:
     75     - humans mess stuff up so can't avoid bugs
     76     - comprehensive mitigations not used because of overhead/compatibility
     77     - enough code available for full set of ROP gadgets
     78     - exploitations eventually leads to arbitrary read/write
     79     - arbitrary code execution leads to privilege escalation
     80 - so, attacker can do arbitrary read/write, must prevent execution of ROP chain
     81 
     82 ASLR:
     83 - moves base of each code section to random address
     84 - can't do ROP chain because unknown addresses
     85 - but leaking single address reveals all addresses in section
     86 - even arbitrary read breaks it, can read with relative offset like buffer overflow
     87 
     88 Fine-Grained ASLR
     89 - randomise relative addresses
     90     - shuffle around functions, or parts of them
     91     - rewrite functions (change registers, replace instructions, add random NOPs)
     92 - JIT-ROP breaks it
     93     - attacker can
     94         - leak at least one code pointer
     95         - then leak data given absolute address
     96         - then interact with vulnerable code after leak
     97     - use code pointers to read code revealing more pointers, then identify gadgets in code, the "compile" ROP payload on the fly
     98 
     99 XnR:
    100 - JIT-ROP relies on reading code, so prevent code from being read
    101 - problems is any code that's marked executable is also readable
    102 - XnR marks all code pages not-present, so trying to read/execute them results in page fault, whose handler then determines whether to allow it
    103 - to prevent more code becoming readable, unmap first page when number of present code pages > security parameter *n*
    104 ![XnR](a8b8f5828a6544ce9aaf4bb248e1a6b1.png)
    105 - Blind JIT-ROP breaks it
    106     - instead of reading gadgets, construct them
    107     - possible even if W⊕X enabled
    108     - e.g. JS code compiled to predictable gadgets
    109 
    110 Information hiding:
    111 - Code Pointer Integrity (CPI)
    112     - full memory safety is expensive, so only protect most sensitive data
    113     - protect: code pointers, pointers to sensitive data, arrays/structs containing sensitive data
    114     - type-based static analysis identifies sensitive variables
    115     - store sensitive variables in safe region, which uses comprehensive memory safety to access it
    116     - randomize safe region base
    117     - pointers to safe region never stored outside safe region (accessed via segment register)
    118 - Code Pointer Separation (CPS)
    119     - like CPI, but only code pointers stored in safe region
    120     - faster, but not as safe
    121 - can be broken by using allocation oracles
    122     - ephemeral: temporarily alloc buffer of attacker-chosen size
    123     - permanent: permanently alloc buffer of attacker-chosen size
    124     - assumption is that attacker can see if allocation worked.
    125     - find size of largest hole:
    126         1. try allocation size *n* on ephemeral oracle
    127             - if works: hole ≥ *n* bytes
    128             - otherwise: hole < *n* bytes
    129         2. do binary search
    130     - find size of other holes
    131         1. Use persistent alloc to fill largest hole
    132         2. Repeat binary search for next-smaller hole
    133 
    134 
    135 ## Control flow integrity (CFI)
    136 We don't want attacker to run Turing complete program, but a more restrictive set.
    137 Allow only legitimate branches and calls (i.e. those that follow the control flow graph).
    138 Give valid targets a label, check that program doesn't branch anywhere else (fine-grained CFI).
    139 May be combined with shadow stack that can be verified.
    140 
    141 Problems:
    142 - requires precise control flow graph, so need source code or debug info
    143 - performance overhead
    144 
    145 Maybe do loose/coarse-grained CFI, which uses only few labels: common label for all call sites, common label for all entry points.
    146 
    147 This eliminates 98% of gadgets, but can still do:
    148 - entry point gadgets
    149 - call site gadgets
    150 - can link gadgets
    151 
    152 Context-sensitive CFI:
    153 - can call/jump/ret X target Y given the current call stack?
    154 - implementation: PathArmor
    155 - but data guides code through control flow graph, so manipulating data changes the control flow
    156     - data oriented programming: find gadget dispatcher ⇒ identify/classify gadgets ⇒ convert workload into gadget operation sequence ⇒ build sequence of buffers to trigger operations ⇒ send buffers to target machine
    157     - of course, can't do e.g. system calls if not reachable through regular control flow