Update softsec notes - lectures.alex.balgavy.eu - Lecture notes from university.

commit 28dd0f5439643f3ee7adf6768e031aa903e0ae6f
parent 388b07c4f03efe8d50ad20654e2f489b5f868263
Author: Alex Balgavy <alex@balgavy.eu>
Date:   Sat,  9 Oct 2021 20:09:44 +0200

Update softsec notes

Diffstat:
M content/softsec-notes/_index.md  | 8 ++++++--
A content/softsec-notes/defenses-and-bypassing-them/defenses-overview.png  | 0 
A content/softsec-notes/defenses-and-bypassing-them/dop-example.png  | 0 
A content/softsec-notes/defenses-and-bypassing-them/index.md  | 230 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A content/softsec-notes/heap-overflows.md  | 66 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A content/softsec-notes/temporal-errors.md  | 45 +++++++++++++++++++++++++++++++++++++++++++++
A content/softsec-notes/type-confusion-cpp.md  | 17 +++++++++++++++++

7 files changed, 364 insertions(+), 2 deletions(-)
diff --git a/content/softsec-notes/_index.md b/content/softsec-notes/_index.md
@@ -7,5 +7,9 @@ title = 'Software Security'
 3. [Local privilege escalation](local-privilege-escalation)
 4. [Simple attacks](simple-attacks)
 5. [Shellcode](shellcode)
-6. [Integer overflows](integer-overflows)
-7. [Format strings](format-strings)
+6. [Heap overflows](heap-overflows)
+7. [Integer overflows](integer-overflows)
+8. [Format strings](format-strings)
+9. [Temporal errors](temporal-errors)
+10. [Type confusion (C++)](type-confusion-cpp)
+11. [Defenses and bypassing them](defenses-and-bypassing-them)
diff --git a/content/softsec-notes/defenses-and-bypassing-them/defenses-overview.png b/content/softsec-notes/defenses-and-bypassing-them/defenses-overview.png
Binary files differ.
diff --git a/content/softsec-notes/defenses-and-bypassing-them/dop-example.png b/content/softsec-notes/defenses-and-bypassing-them/dop-example.png
Binary files differ.
diff --git a/content/softsec-notes/defenses-and-bypassing-them/index.md b/content/softsec-notes/defenses-and-bypassing-them/index.md
@@ -0,0 +1,230 @@
++++
+title = 'Defenses and bypassing them'
++++
+# Defenses and bypassing them
+
+Techniques that can make attacks harder:
+- protect sensitive data from leakage/corruption
+- make using corruption for code execution harder
+- detect undefined behavior
+
+Static analysis: can we prove code never results in undefined behavior?
+- incomplete due to halting problem
+
+Dynamic instrumentation: runtime checks for undefined behavior
+- negative performance impact
+- false positives mean crashing a correct program
+
+![Defenses overview](defenses-overview.png)
+
+## Stack canaries
+A value placed between the local variables and the return address; the compiler initializes it with a random value in the function prologue.
+On return, the epilogue checks whether the value is unchanged, and aborts the program if it isn't.
+
+Enable with `-fstack-protector`.
+On x86-64 Linux, the reference value is kept in thread-local storage at `%fs:40` and randomized at program start-up; the canary is never left in a register, and is copied onto the stack like a local variable.
+
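+The canary can still be bypassed if we can jump over it. E.g. in this example, overflowing `name` can overwrite `len` (assuming `len` lies right after `name` on the stack), so that `memcpy` writes past the canary, directly onto the return address: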
+
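+```c
+#include <string.h>
+#include <unistd.h>
+
+void echo(int fd) {
+    long len;                      /* target: offset used by memcpy below */
+    char name[64], reply[128];
+    /* ... */
+    read(fd, name, 128);           /* BUG: up to 128 bytes into a 64-byte buffer */
+    memcpy(reply + len, name, 64); /* attacker-controlled len: write relative to reply */
+    /* ... */
+}
+```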
+
+Three approaches:
+- jump over canary: corrupt a pointer/index used to write memory, so that the write targets an address past the canary. need two writes (one to corrupt the pointer/index, one through it): either two vulnerabilities or a loop.
+- leaking canaries: leak the canary, then overwrite it with the same value. need a leak vulnerability, the ability to provide more input after the leak, and a buffer overflow with the new input.
+- brute force canaries: guessing randomly out of 2⁶⁴ possibilities is infeasible. but we can brute force if we can corrupt only part of the canary, the program restarts with the same canary (e.g. a forking server), and we get feedback about crashes.
+    - overflow into the canary by one byte and try all values; the one that doesn't crash is the correct first byte. Repeat for the next bytes until the full canary is found.
+    - at most 8×2⁸ = 2048 attempts needed
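
The byte-by-byte attack can be simulated as below. This is a minimal sketch, assuming a forking server whose children all share one canary; `secret_canary` and `child_survives()` are hypothetical stand-ins for the target and for "send the guess, see whether the child crashed".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CANARY_LEN 8

/* Hypothetical canary: in a forking server every child inherits this value. */
static const uint8_t secret_canary[CANARY_LEN] = {
    0x00, 0x3f, 0xa1, 0x7c, 0x09, 0xee, 0x52, 0xd4
};

/* Hypothetical oracle: overwrite the first `len` canary bytes with `guess`.
 * The child survives only if every overwritten byte matches the real canary. */
int child_survives(const uint8_t *guess, size_t len)
{
    return memcmp(guess, secret_canary, len) == 0;
}

/* Recover the canary one byte at a time; returns the number of attempts. */
unsigned brute_force_canary(uint8_t out[CANARY_LEN])
{
    unsigned attempts = 0;
    for (size_t i = 0; i < CANARY_LEN; i++) {
        for (unsigned v = 0; v < 256; v++) {
            out[i] = (uint8_t)v;
            attempts++;
            if (child_survives(out, i + 1))
                break;  /* byte i is correct: keep it and move to the next one */
        }
    }
    return attempts;    /* at most CANARY_LEN * 256 = 2048 */
}
```

With this example canary, `brute_force_canary()` needs 897 attempts, well under the 2048 upper bound.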
+
+## Data execution prevention (DEP)
+OS marks data pages as non-executable (no-execute bit in page tables).
+- attempt to execute data (like shellcode) gives segfault
+
+W⊕X: no memory is both writable and executable.
+
+We can still exploit this, but no longer by jumping to own shellcode - need to reuse existing code.
+
+Can use shared library functions (return-to-libc), or chain together parts of code (return-oriented programming).
+
+### Ret2libc
+On x86\_32, parameters are passed on the stack, so the attacker can write the function's parameters onto the stack together with the fake return address.
+
+On x86-64, the first parameters are passed in registers (%rdi, %rsi, ...), so this only works if %rdi already holds the desired argument.
+Otherwise, reuse additional code to load the desired %rdi value (e.g. a piece of code that pops into %rdi and returns: `pop %rdi; ret`).
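
A sketch of the resulting x86-64 payload layout for calling `system("/bin/sh")`. All three addresses are hypothetical placeholders (in practice they come from the binary and libc, and with ASLR from a leak); `pad_len` is the assumed distance from the start of the buffer to the saved return address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical addresses; real ones come from the target binary and libc. */
#define GADGET_POP_RDI 0x401233ULL       /* a "pop %rdi; ret" sequence */
#define ADDR_BIN_SH    0x7ffff7f5a5aaULL /* the string "/bin/sh" */
#define ADDR_SYSTEM    0x7ffff7e1a290ULL /* system() in libc */

/* Build the bytes written from the start of the vulnerable buffer:
 *   [ pad_len filler ][ &pop_rdi ][ &"/bin/sh" ][ &system ]
 * The function's RET jumps to the gadget, which pops the string address
 * into %rdi and returns into system(). Returns the payload length. */
size_t build_payload(uint8_t *out, size_t pad_len)
{
    const uint64_t chain[] = { GADGET_POP_RDI, ADDR_BIN_SH, ADDR_SYSTEM };
    memset(out, 'A', pad_len);                   /* buffer + saved %rbp */
    memcpy(out + pad_len, chain, sizeof chain);  /* host byte order (little-endian on x86-64) */
    return pad_len + sizeof chain;
}
```

For example, assuming a 64-byte buffer directly followed by the saved `%rbp`, `pad_len` would be 72.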
+
+### Return oriented programming (ROP)
+Any sequence of instructions ending in RET is a 'gadget', which can be chained together.
+Return oriented programming is a chain of such gadgets.
+
+Useful gadget types:
+- load value to register (pop)
+- read from memory (mov (reg), reg)
+- write to memory (mov reg, (reg))
+- computation (add, sub, etc.)
+- syscall
+
+Given the right gadgets, we can do arbitrary computation without injecting any code.
+Tools help generate ROP chains, e.g. [Ropper](https://github.com/sashs/Ropper).
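
A toy model of how a chain executes, not real machine code: gadget addresses are replaced by small integer ids and the CPU is a C struct (all names hypothetical). It only illustrates that each RET pops the next gadget from the attacker-written stack, so pop/add/store gadgets compose into a computation.

```c
#include <stdint.h>

/* Toy CPU: two registers, a tiny memory, and a stack pointer. */
struct cpu {
    uint64_t rax, rbx;
    uint64_t mem[16];
    const uint64_t *sp;   /* points into the attacker-written stack */
};

/* Gadget "addresses" are replaced by small ids in this model. */
enum gadget { G_POP_RAX, G_POP_RBX, G_ADD_RAX_RBX, G_STORE_RAX_AT_RBX, G_HALT };

/* Each case models a short instruction sequence; the loop head is its RET,
 * which pops the next gadget from the stack. */
void run_chain(struct cpu *c, const uint64_t *stack)
{
    c->sp = stack;
    for (;;) {
        uint64_t g = *c->sp++;                              /* ret */
        switch (g) {
        case G_POP_RAX:     c->rax = *c->sp++; break;       /* pop %rax; ret */
        case G_POP_RBX:     c->rbx = *c->sp++; break;       /* pop %rbx; ret */
        case G_ADD_RAX_RBX: c->rax += c->rbx; break;        /* add %rbx,%rax; ret */
        case G_STORE_RAX_AT_RBX:                            /* mov %rax,(%rbx); ret */
            c->mem[c->rbx % 16] = c->rax;  /* index wrapped to stay in toy memory */
            break;
        default:            return;                         /* G_HALT: end of chain */
        }
    }
}
```

The chain `{ G_POP_RAX, 40, G_POP_RBX, 2, G_ADD_RAX_RBX, G_POP_RBX, 3, G_STORE_RAX_AT_RBX, G_HALT }` loads 40 and 2, adds them, and stores the result at memory word 3.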
+
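+If the overflow doesn't give enough controllable stack space for the chain, build a new stack anywhere in memory (such as attacker-controlled heap memory) and point the stack pointer at it.
+The function epilogue (`leave; ret`) loads the stack pointer from the frame pointer, so by corrupting the saved frame pointer and running the epilogue twice, we can load an attacker-chosen stack pointer.
+This is a "stack pivot".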
+
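+If we can't find the right gadget, look in libraries, especially libc (a large code base).
+On x86, instructions have variable length, so we can also jump into the middle of an instruction to get unintended gadgets.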
+
+## Address space layout randomization (ASLR)
+Randomizes memory addresses of code, data, heap, and stack.
+Prevents attacker from finding code pointer to overwrite, or knowing what to overwrite it with.
+
+Stack and heap randomized by OS alone:
+- stack: determined by %rsp, set in `execve()`
+- heap: brk() and mmap() return values
+
+Code and data require compiler support:
+- absolute code/data references would break with randomized addresses
+- position independent code (PIC) makes code and data references relative to the instruction pointer
+    - libraries already use PIC, so they can be loaded at any address without clashing with each other
+    - executables can also use PIC (compile with `-fPIE`, link with `-pie`)
+
+Can still be attacked:
+- relative pointers don't change
+- if one pointer is leaked, all others can be computed
+- so leak pointers from stack/heap, or use side channels to recover complete address space
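
Worked example of "one leak reveals everything". The offsets are hypothetical (real ones are read from the specific libc build), and the leaked value is assumed to be the runtime address of `puts`.

```c
#include <stdint.h>

/* Hypothetical offsets inside one specific libc build. */
#define OFF_PUTS   0x80e50ULL
#define OFF_SYSTEM 0x50d70ULL

/* ASLR shifts the whole library at once, so distances between its symbols
 * never change: one leaked pointer gives the base, the base gives the rest. */
uint64_t libc_base_from_leak(uint64_t leaked_puts)
{
    return leaked_puts - OFF_PUTS;
}

uint64_t system_from_leak(uint64_t leaked_puts)
{
    return libc_base_from_leak(leaked_puts) + OFF_SYSTEM;
}
```

Since libraries are mapped at page granularity, a correctly computed base ends in `000` in hex, which is a handy sanity check.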
+
+### Information hiding
+ASLR: randomness limited to base:
+- first shared object is loaded at random position
+- next object located right below (lower addresses) the last object
+- ⇒ all libraries located side by side at single random place
+
+Shadow stack:
+- move sensitive data (e.g. return addresses) to a separate stack
+- make sure shadow stack is protected
+- real hiding:
+    - no pointer in memory should refer to secret location of shadow stack
+    - only dedicated register (e.g. `%gs:108`) points to shadow stack
+- how to find it:
+    - address space typically has some holes
+    - don't look for hidden region, look for holes -- one larger and one smaller -- surrounding the region
+    - even if we remove all pointers, we still have the size of the hole left that indicates the hidden region
+    - repeatedly allocate large chunks of memory until we find the "right size": if allocation succeeds, the hole is that size or bigger; if it fails, the hole is smaller.
+    - ephemeral allocation: allocate and free within the request
+    - persistent allocation: allocate 'permanently' (can also use ephemeral and just not complete the request)
+    - steps:
+        1. determine large hole using ephemeral allocation
+        2. allocate large hole using persistent allocation
+        3. run ephemeral allocation algorithm again, giving us small hole
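
The "find the right size" step is essentially a binary search over allocation sizes. A minimal sketch, assuming page-granular sizes and a hypothetical oracle `ephemeral_alloc_succeeds()` that asks the target to allocate and immediately free `size` bytes.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL

/* Hypothetical target state: size of the largest hole in its address space.
 * The attacker never reads this directly. */
static const uint64_t hidden_hole_size = 0x3a5f000ULL;

/* Hypothetical oracle for an ephemeral allocation: the target tries to
 * allocate `size` bytes, frees them again, and reports success or failure. */
int ephemeral_alloc_succeeds(uint64_t size)
{
    return size <= hidden_hole_size;
}

/* Binary search (in pages) for the largest allocatable size, assuming
 * `max` is at least the hole size. Invariant: lo succeeds, hi fails. */
uint64_t probe_hole_size(uint64_t max)
{
    uint64_t lo = 0, hi = max / PAGE_SIZE_BYTES + 1;
    while (hi - lo > 1) {
        uint64_t mid = lo + (hi - lo) / 2;
        if (ephemeral_alloc_succeeds(mid * PAGE_SIZE_BYTES))
            lo = mid;   /* hole is at least this big */
        else
            hi = mid;   /* hole is smaller */
    }
    return lo * PAGE_SIZE_BYTES;
}
```

Running it once gives the large hole (step 1); after filling that hole with a persistent allocation (step 2), running it again measures the small hole (step 3).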
+
+## Control-flow integrity
+Goal: prevent ROP chains (and the Turing-complete computation they allow) by restricting indirect control transfers.
+
+idea:
+- only allow "legitimate branches and calls"
+- i.e. those that follow control flow graph
+- give valid targets a label, and check that we don't branch anywhere else
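
A minimal sketch of the check for a single indirect call site, assuming its set of valid targets has already been computed from the CFG. Real CFI implementations encode labels in the binary rather than scanning an array; the function names here are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

typedef int (*handler_fn)(int);

int double_it(int x)    { return 2 * x; }
int negate(int x)       { return -x; }
int not_a_target(int x) { return x + 1000; }  /* not a legitimate target here */

/* Valid targets ("labels") of this call site, derived from the CFG. */
static const handler_fn valid_targets[] = { double_it, negate };

/* Check the target before the indirect call; abort on a branch that
 * does not follow the CFG. */
int checked_indirect_call(handler_fn f, int arg)
{
    for (size_t i = 0; i < sizeof valid_targets / sizeof valid_targets[0]; i++)
        if (valid_targets[i] == f)
            return f(arg);
    abort();   /* CFI violation */
}
```

`checked_indirect_call(not_a_target, 1)` aborts instead of performing the call.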
+
+May be combined with runtime shadow stack.
+
+in practice:
+- requires a precise CFG (from source code or debug info)
+- large performance overhead
+- so, coarse-grained CFI:
+    - one common label for all call sites
+    - one common label for all entry points
+
+gadgets left after this:
+- entry point gadget - jump to entry point and go up to next indirect call/jump
+- call site gadget - jump to call site, go to the next return
+
+### Data oriented programming
+Perfect CFI means CFG is never violated.
+But, data guides code through CFG, so by manipulating data we change control flow.
+
+Attacker can overwrite data (e.g. using a buffer overflow), and the overwritten data drives a gadget dispatcher loop:
+- loop executes as often as attacker wants
+- loop performs several operations, providing gadgets
+- attacker controls values and pointers used in the operations
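
A toy version of such a dispatcher, assuming a hypothetical server (`struct session`, `serve()`) whose request buffer is followed by two bookkeeping pointers. Control flow never leaves the legitimate loop; the overflowed pointers decide which memory the loop body operates on.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical server state: the request buffer is directly followed by
 * two bookkeeping pointers that an overflow of `buf` can reach. */
struct session {
    char buf[16];
    long *src;
    long *dst;
};

/* Legitimate loop = gadget dispatcher. Every iteration follows the CFG,
 * but the body acts as an "add" gadget on whatever the pointers hold. */
void serve(struct session *s, const char *const reqs[],
           const size_t lens[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* BUG: lens[i] is not checked against sizeof s->buf; buf is the
         * first member, so the copy can run on into src and dst */
        memcpy(s, reqs[i], lens[i]);
        *s->dst += *s->src;   /* gadget: *dst += *src */
    }
}
```

Three crafted requests pointing `src` at `a` and `dst` at `b` make the loop compute `b += a` three times; with more kinds of operations in the loop body, longer request sequences can express more complex computations.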
+
+![Data oriented programming example](dop-example.png)
+
+Approach:
+- find gadget dispatcher (attacker-controlled loop)
+- identify and classify gadgets
+- convert workload into a sequence of gadget operations
+- build a sequence of buffers to trigger those operations
+- send the buffers to the target machine
+
+## Sanitizers
+Fully detect or mostly detect particular vulnerabilities.
+Big performance overhead, so not really used in production settings, but useful for debugging and testing.
+
+Sanitizers in GCC/LLVM
+- ASan: buffer overflow, memory leak, use-after-free
+- Leak sanitizer: memory leaks
+- TSan: race conditions
+- UBSan: undefined behavior, e.g. signed integer overflow
+- MSan: uninitialized read
+
+Address sanitizer (ASan):
+- detects buffer overflow and use-after-free
+- shadow memory tracks allocation status:
+    - add check before memory access
+    - shadow addr = (addr >> 3) + offset
+    - red zones between allocations
+    - deallocated memory in quarantine
+- drawbacks: compatibility (if program depends on memory layout), performance, incomplete (doesn't detect overflows within a struct, misses overflows that jump over the red zone, use-after-free still possible with memory massaging)
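
A toy model of the shadow check on a 64-byte region, following ASan's shadow byte encoding (0: 8-byte granule fully addressable, k in 1..7: only the first k bytes addressable, negative: poisoned). Here the shadow is a plain array indexed by `offset >> 3` instead of living at `(addr >> 3) + offset`; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy shadow for a 64-byte region: one shadow byte per 8-byte granule. */
#define REGION_BYTES 64
static int8_t shadow[REGION_BYTES / 8];

/* Start with the whole region poisoned (red zone). */
void asan_poison_all(void)
{
    for (size_t i = 0; i < REGION_BYTES / 8; i++)
        shadow[i] = -1;
}

/* Mark an allocation of `size` bytes at 8-byte-aligned offset `start`. */
void asan_unpoison(size_t start, size_t size)
{
    for (size_t i = 0; i < size / 8; i++)
        shadow[start / 8 + i] = 0;
    if (size % 8)
        shadow[start / 8 + size / 8] = (int8_t)(size % 8);
}

/* The check inserted before an n-byte access (1 <= n <= 8, within one
 * granule). Returns 1 if valid, 0 if ASan would report an error. */
int asan_access_ok(size_t off, size_t n)
{
    int8_t k = shadow[off >> 3];
    if (k == 0)
        return 1;
    if (k < 0)
        return 0;
    return (off & 7) + n - 1 < (size_t)k;
}
```

For a 13-byte allocation at offset 8, bytes 8 to 20 pass the check, while byte 21 (one past the end) and the surrounding red zones are reported.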
+
+Memory sanitizer (MSan):
+- detects uninitialized reads
+- loading and copying uninitialized values is allowed (not reported)
+    - uninitialized status tracked in shadow memory
+    - bit-precise, 1-to-1 shadow mapping (one shadow bit per application bit)
+    - allocation: memory poisoned
+    - computation: poison propagated based on operation
+- poison checked at: conditional branches, syscalls, pointer derefs
+
+### Research topics
+Delta pointers: fast buffer overflow detection
+- tagged pointers use some pointer bits for metadata
+- checks are implicit, done by the MMU
+- cannot detect all cases (like underflow), but much better performance
+
+SafeInit: automatically initialize to zero
+- compiler: every local variable
+- allocator: every heap allocation
+- optimizations:
+    - only close to first use
+    - only one byte for strings
+    - dead store elimination: remove initializations that are later overwritten in all cases
+    - rely on OS zeroing for large heap allocations
+
+DangSan: prevents use after free
+- invalidates dangling pointers
+- keep list of pointers to each object
+- on pointer assignment: keep track of new pointer to object
+- on free: set the most significant bit of all remaining pointers to the object (so dereferencing them faults)
+- complications:
+    - which object does pointer point to? use shadow memory
+    - what if multiple threads copy pointers to same object? use lock-free data structure (it's a write-mostly workload, so use per-thread append-only log)
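
A toy version of the invalidation step for a single object, assuming instrumentation calls `track_store()` on every pointer store and `track_free()` instead of `free()` (both hypothetical). It only handles exact pointer equality, whereas real DangSan also handles pointers into the middle of objects, shadow-memory lookup, and multiple threads.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_REFS 16
#define DANGLING_BIT ((uintptr_t)1 << (sizeof(uintptr_t) * 8 - 1))

/* Per-object log of locations that were assigned a pointer to the object. */
struct tracked {
    void *obj;
    void **refs[MAX_REFS];
    size_t nrefs;
};

/* Hypothetical instrumented pointer store: *loc = p, and log loc. */
void track_store(struct tracked *t, void **loc, void *p)
{
    *loc = p;
    if (t->nrefs < MAX_REFS)
        t->refs[t->nrefs++] = loc;
}

/* Hypothetical instrumented free: every logged location that still points
 * to the object gets its most significant bit set (non-canonical on x86-64,
 * so a later dereference faults), then the memory is released. */
void track_free(struct tracked *t)
{
    for (size_t i = 0; i < t->nrefs; i++) {
        uintptr_t v = (uintptr_t)*t->refs[i];
        if (v == (uintptr_t)t->obj)   /* skip locations reassigned since */
            *t->refs[i] = (void *)(v | DANGLING_BIT);
    }
    free(t->obj);
    t->obj = NULL;
    t->nrefs = 0;
}
```

The check against the object's address matters because the log is append-only: a location may have been pointed elsewhere after it was logged, and must then be left alone.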
+
+Type-after-type
+- allow dangling pointers, but only to same type
+- separate heap and stack for each type
+    - never reuse memory used for one type for another
+    - dangling pointer keeps pointing to same type
+- challenge: type inference
+    - use static analysis to guess from context
+    - explicit type on stack and with new operator, trace result to pointer cast, or sizeof in malloc size
+
+TypeSan
+- type confusion mitigation requires knowing runtime type of object on static_cast
+- use shadow memory, translate each pointer to set of allowable casts (which is determined at compile time)
diff --git a/content/softsec-notes/heap-overflows.md b/content/softsec-notes/heap-overflows.md
@@ -0,0 +1,66 @@
++++
+title = 'Heap overflows'
++++
+# Heap overflows
+Stack-based buffer overflows are relatively easy to exploit, because the return address is stored on the stack.
+Integer overflow can bypass length checks.
+Heap buffer overflows and format strings can provide arbitrary write.
+
+Use-after-free and type confusion allow corrupting specific data.
+
+Buffer overflows:
+- most common, can exploit locally and remotely, modify both data and control flow
+- typical signs: fixed-length buffers, passing pointer to buffer without size, array access without size check, pointer arithmetic without size/end pointer
+- vulnerable functions:
+    - gets() -- replace with fgets()
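+    - strcpy(), strcat() -- replace with strncpy() or strncat() (careful: strncpy() does not NUL-terminate if the source is too long)
+    - sprintf() etc. -- replace with snprintf() etc.
+    - scanf() etc. -- put a bound on %s formats (e.g. `%63s` for a 64-byte buffer)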
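
A minimal sketch of the bounded replacements (`copy_name()` and `read_word()` are hypothetical helpers): both calls know the destination size, so they truncate instead of overflowing.

```c
#include <stddef.h>
#include <stdio.h>

/* Instead of strcpy(dst, src): snprintf truncates to dst_size - 1
 * characters and always NUL-terminates (for dst_size > 0). */
void copy_name(char *dst, size_t dst_size, const char *src)
{
    snprintf(dst, dst_size, "%s", src);
}

/* Instead of sscanf(input, "%s", out): the field width 15 leaves room
 * for the terminating NUL in a 16-byte buffer. */
int read_word(const char *input, char out[16])
{
    return sscanf(input, "%15s", out);
}
```

Copying `"a-very-long-name"` into an 8-byte buffer yields `"a-very-"` rather than writing past the end.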
+
+Off-by-one:
+- wrong comparison operator (e.g. `<=` instead of `<`), forgetting about the string terminator
+- can only overwrite one element past the end of the array
+
+Pointer storage:
+- pointer is integer storing a memory address
+- in x86_64, only the 48 least significant bits of the 64-bit integer are used (so every user-space pointer has two zero bytes at the top)
+- x86_64 is little endian, so least significant byte is stored first (lowest address)
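
To see the byte order concretely, this copies a 64-bit address into a byte array in memory order. The expected values assume a little-endian host such as x86-64; `address_bytes()` is a hypothetical helper.

```c
#include <stdint.h>
#include <string.h>

/* Copy the in-memory representation of a 64-bit address into `bytes`,
 * lowest address first. */
void address_bytes(uint64_t addr, unsigned char bytes[8])
{
    memcpy(bytes, &addr, sizeof addr);
}
```

For `0x00007fffdeadbeef` the bytes come out as `ef be ad de ff 7f 00 00`: the two zero bytes come last, which matters for overflows through string functions, since those stop copying at the first NUL byte.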
+
+Buffer overread:
+- reading out-of-bounds can be just as harmful, and can also leak pointers (compromising ASLR)
+
+Data/BSS overflows
+- data has initialized global data, BSS uninitialized
+- alternative ways of hijacking control:
+    - overwrite function pointer
+    - overwrite saved frame pointer (fake stack, then return from it)
+    - overwrite C++ object pointer (e.g. hijack virtual functions)
+- data-only alternatives:
+    - changing strings to e.g. bypass authorization
+    - changing integers, e.g. index and size variables to allow further overflows.
+    - changing pointers
+
+Heap overflows:
+- malloc(), new, etc. return memory on heap
+- harder to exploit, can't reach return addresses
+- so, target allocator metadata and/or use heap massaging
+- heap organisation:
+    - grows towards higher memory addresses
+    - memory management through in-band control structures (metadata between buffers), which can be manipulated through heap overflows to execute arbitrary code
+    - attacks depend on implementation of malloc
+    - heap divided in chunks
+        - used chunk contains metadata and buffer returned by malloc
+        - free chunk not currently used, but can be reused later
+        - adjacent free blocks are merged to avoid fragmentation
+- exploiting dlmalloc:
+    - assume we find heap buffer overflow
+    - overwrite the `fd` and `bk` pointers of a free chunk
+    - make the program call unlink, e.g. to merge the chunk when the chunk before it is freed
+    - unlink writes chosen data at chosen location
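
A toy simulation of the unlink write, assuming an old, unchecked dlmalloc-style unlink. Memory is modelled as an array of words with indices as addresses (so the simulation itself has no undefined behavior); the names and word offsets are simplified and hypothetical.

```c
#include <stdint.h>

/* Toy address space: an array of words, "addresses" are indices.
 * Chunk layout (word offsets): prev_size, size, fd, bk. */
enum { OFF_PREV_SIZE = 0, OFF_SIZE = 1, OFF_FD = 2, OFF_BK = 3 };
#define HEAP_WORDS 64
uint64_t heap_mem[HEAP_WORDS];

/* Unchecked unlink of chunk p from its doubly linked free list. */
void unlink_chunk(uint64_t p)
{
    uint64_t fd = heap_mem[p + OFF_FD];
    uint64_t bk = heap_mem[p + OFF_BK];
    heap_mem[fd + OFF_BK] = bk;   /* FD->bk = BK */
    heap_mem[bk + OFF_FD] = fd;   /* BK->fd = FD */
}
```

With `fd = target - OFF_BK` and `bk = value` written by the overflow, the first store puts `value` at `target`. The second store also clobbers `value + OFF_FD`, so `value` must itself point to writable memory. Current allocators add integrity checks to unlink that stop this naive form.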
+
+exploiting arbitrary write:
+- heap overflows/format strings can write to an absolute address
+- target instead of a return address: global offset table (GOT)
+    - used to lazily load lib functions
+    - address looked up and stored on first call
+    - fixed location (in non-PIE binaries)
diff --git a/content/softsec-notes/temporal-errors.md b/content/softsec-notes/temporal-errors.md
@@ -0,0 +1,45 @@
++++
+title = 'Temporal errors'
++++
+# Temporal errors
+Main types:
+- use after free
+- uninitialized variables
+
+## Use after free
+Sometimes, program retains pointer to freed memory location ("dangling pointer")
+- e.g. malloc buffer that was freed, or local variable buffer after function return
+
+Future allocation/function call can re-use memory.
+
+Sometimes, attacker can craft input to overwrite memory with own data:
+1. Program allocs buffer or variable X
+2. Program uses X to store some data
+3. Program frees X
+4. Program allocates buffer/variable Y overlapping with X
+5. Data written to Y also overwrites relevant part of X
+6. Program uses X, causing incorrect result
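
The six steps can be replayed with a hypothetical toy allocator that hands a just-freed slot to the next allocation and does not clear it on free. Only byte accesses are used, so the simulation itself stays well-defined.

```c
#include <stddef.h>

/* Hypothetical toy allocator: fixed 32-byte slots, LIFO reuse of freed
 * slots, no clearing on free. */
#define SLOTS 4
#define SLOT_SIZE 32
static unsigned char pool[SLOTS][SLOT_SIZE];
static int free_slots[SLOTS] = { 3, 2, 1, 0 };
static int free_top = SLOTS;

void *toy_alloc(void)
{
    return free_top > 0 ? pool[free_slots[--free_top]] : NULL;
}

void toy_free(void *p)
{
    free_slots[free_top++] = (int)(((unsigned char *)p - &pool[0][0]) / SLOT_SIZE);
}
```

Usage, following the steps: `x = toy_alloc()` and `x[0] = 0` (an "is_admin" flag); `toy_free(x)` leaves `x` dangling; `y = toy_alloc()` returns the same slot; writing attacker data into `y` means a later read of `x[0]` sees that data instead of 0.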
+
+Useful to:
+- bypass length restrictions for later buffer overflow
+- overwrite fields that shouldn't be attacker-controlled
+- overwrite validated data with incorrect data that will not be validated
+- leak sensitive data from new buffer
+
+## Double free
+Free can't efficiently check block validity
+- detects only some cases of double free
+- undetected cases might corrupt metadata, useful for arbitrary write
+- might free reused memory
+
+## Uninitialized variables
+Local variables and buffers not automatically initialized to zero.
+Instead, contain whatever data happened to be on stack/heap before they were allocated.
+
+Sometimes, an attacker can craft input to control the initial value of a variable:
+1. program allocates buffer/variable X
+2. program uses X to store some data under attacker control
+3. program frees X
+4. program allocs buffer/variable Y overlapping X
+5. program does not initialize (part of) Y, causing attacker's data from X to remain there
+6. program uses Y causing incorrect result
diff --git a/content/softsec-notes/type-confusion-cpp.md b/content/softsec-notes/type-confusion-cpp.md
@@ -0,0 +1,17 @@
++++
+title = 'Type confusion (C++)'
++++
+# Type confusion (C++)
+C++ provides classes, like structs that tie data to functions.
+Class instance known as 'object'.
+
+Stack objects have constructor and destructor automatically called.
+
+Heap objects managed with new/delete which call constructor/destructor.
+
+Typecasts:
+- `reinterpret_cast`: no checks, assumes the programmer knows what they're doing
+- `static_cast`: compile-time check, allows any cast that might be valid (including parent-to-child)
+- `dynamic_cast`: run-time check, ensures the object's actual (runtime) type is compatible with the target type of the cast
+
+Type confusion: object may be cast to wrong type, members read and written according to the wrong type

git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git