lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git

commit 542e38ff36fbcb69c75f574f974cfd269a9b4d7e
parent d0c0ca37ee664496089295a720c036daf9b3944d
Author: Alex Balgavy <alex@balgavy.eu>
Date:   Tue, 30 Mar 2021 23:12:07 +0200

More BAMA notes

Diffstat:
M content/binary-malware-analysis-notes/_index.md | 3 +++
A content/binary-malware-analysis-notes/assembly.md | 103 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A content/binary-malware-analysis-notes/gdb.md | 11 +++++++++++
A content/binary-malware-analysis-notes/what-happens-before-main.md | 42 ++++++++++++++++++++++++++++++++++++++++++
4 files changed, 159 insertions(+), 0 deletions(-)

diff --git a/content/binary-malware-analysis-notes/_index.md b/content/binary-malware-analysis-notes/_index.md
@@ -4,3 +4,6 @@ title = 'Binary and Malware Analysis'
 
 # Binary and Malware Analysis
 1. [Introduction](introduction)
+2. [Assembly](assembly)
+3. [What happens before `main()`](what-happens-before-main)
+4. [GDB](gdb)
diff --git a/content/binary-malware-analysis-notes/assembly.md b/content/binary-malware-analysis-notes/assembly.md
@@ -0,0 +1,103 @@
++++
+title = 'Assembly'
++++
+
+# Assembly
+Low-level processor-specific symbolic language.
+We focus on user-mode x86 64-bit assembly, AT&T syntax.
+
+Program composed of
+- instructions (actual operations)
+- directives: commands for assembler
+  - `.data` is section with variables
+  - `.text` is section with code
+  - `.byte`/`.word`/`.long`/`.quad` defines integer (8/16/32/64 bits)
+  - `.ascii`/`.asciz` defines string (without/with terminator)
+- labels: create symbol at current address
+- comments: everything after `#` is ignored
+
+## Instructions
+Form: `mnemonic source, destination`
+- `mnemonic` is short code telling CPU what to do (`mov`, `add`, `push`, `pop`, `call`, `jmp`, etc.)
+  - operand size specified as suffix to mnemonic (`b` for byte, `w` for 16-bit word, `l` for 32-bit long, `q` for 64-bit quad)
+  - not needed if other operand is register
+- `source` and `destination` are operands
+  - number and type depends on instruction
+  - registers (`%rax`, `%rsp`, `%al`)
+    - memory locations on CPU
+    - dereference pointers
+      - specify as `displacement(base, index, scale)`
+      - means `displacement+base+(index*scale)`
+      - `base`, `index` 64-bit registers
+      - `displacement` 32-bit constant or symbol (default 0)
+      - `scale` is 1, 2, 4, or 8 (default 1)
+    - general-purpose registers `%rax`, `%rbx`, `%rcx`, `%rdx`, `%rsi`, `%rdi`, `%r8`-`%r15`
+      - you can access different parts: for 64-bit `%rax`, low 32 bits is `%eax`, low 16 bits is `%ax`, high 8 bits of `%ax` is `%ah`, low 8 bits is `%al`
+    - stack pointer `%rsp`
+    - frame pointer `%rbp`
+    - instruction pointer `%rip`
+    - flags register
+    - segment registers `%cs`, `%ds`, etc.
+    - system registers (`%crN`, `%drN`, MSRs -- only used in OS kernel)
+    - instruction set registers (`%stN`, `%mmN`, `%xmmN`, `%ymmN` -- only used with special instructions)
+  - memory (`0x401000`, `8(%rbp)`, `(%rdx, %rcx, 4)`)
+  - constants (`$42`, `$0x401000`, only for source)
+
+Intel uses little endian ordering -- the least significant byte of a value is stored at the lowest address, so bytes are laid out from the end of the number.
+
+Signed integers expressed in 2's complement -- sign change by flipping bits and adding one.
+
+Comparisons:
+- `cmp src1, src2` computes `src2 - src1` like `sub`, but only sets flags and discards the result.
+- `test src1, src2` computes `src1 & src2` like `and`, but only sets flags and discards the result
+- `lea src, dst` is `dst = &src` (`src` is a memory operand; only its address is computed, memory is not read)
+
+Conditional jumps
+- form `jcc addr` (or `jncc` for not)
+- jumps to `addr` if `cc` holds, decided using flags register
+  - `e`/`z`: `result == 0`
+  - `b`: `dst` < `src` (unsigned, `a` for above)
+  - `l`: `dst` < `src` (signed, `g` for greater)
+  - `s`: `result` < 0 (signed)
+
+## Data
+Data objects in data segment:
+
+```asm
+.data
+    myvar: .long 0x1234567, 0x23456789
+    bar: .word 0x1234
+    mystr: .asciz "foo"
+```
+
+## Stack frames
+Stack grows downwards (towards lower memory addresses).
+Stack pointer (`%rsp`) points to top of stack
+
+Stack composed of frames, which are pushed on stack during function calls.
+Address of current frame stored in frame pointer register (on Intel, `%rbp`)
+
+Each frame contains
+- function's actual parameters if not in registers (pushed in reverse order)
+- return address to jump to after function
+- pointer to previous frame
+- function's local variables
+
+Parameter passing in caller function for Linux
+- integers, pointers, small structs are passed via registers where possible: `%rdi`, then `%rsi`, `%rdx`, `%rcx`, `%r8`, `%r9`
+- most other parameters pushed onto stack from right to left
+
+Prologue in called function
+- push old base pointer (`%rbp`) on to stack
+- set `%rbp` to current stack pointer
+- push callee-saved registers if you use them (Linux: `%rbx`, `%r12-%r15`)
+- move stack pointer to make room for local variables (e.g. `sub $n, %rsp` with n the size of local vars)
+- `enter` instruction does everything except pushing callee-saved registers
+
+Epilogue in called function
+- save result in `%rax`
+- restore old stack pointer from base pointer
+- restore callee-saved registers
+- restore old base pointer from stack
+- run `ret`
+- `leave` restores stack pointer from `%rbp` and pops old `%rbp`; it does not return, so `ret` still has to follow
diff --git a/content/binary-malware-analysis-notes/gdb.md b/content/binary-malware-analysis-notes/gdb.md
@@ -0,0 +1,11 @@
++++
+title = 'GDB'
++++
+
+# GDB
+- `break` might not work because of anti-debug; set a hardware breakpoint with `hbreak` instead (but only once the program is running)
+- `starti` sets breakpoint on first instruction and runs
+- `bt` prints backtrace of all stack frames
+- `frame [n]` can switch you around frames
+- `watch` sets a watchpoint on something and breaks if the value changes
+- `next` steps over function calls instead of entering them
diff --git a/content/binary-malware-analysis-notes/what-happens-before-main.md b/content/binary-malware-analysis-notes/what-happens-before-main.md
@@ -0,0 +1,42 @@
++++
+title = 'What happens before main()'
++++
+
+# What happens before main()
+Lots of things:
+1. Loader calls `preinit_array`, then `_start`
+2. That calls `__libc_start_main`, which calls `__libc_csu_init`, which calls a bunch of other stuff
+3. After that, `__libc_start_main` calls `main`
+4. And then `exit` happens
+
+## Start at `_start`
+- often if you have `%ebp`, `%esi`, etc. and relatively small addresses, probably a 32-bit binary (stack used for argument passing)
+- `argc` popped into `%esi`
+- `argv` moved from `%esp` to `%ecx`
+- stack pointer aligned to boundary
+- push arguments and call `__libc_start_main`
+  - this calls `__libc_init_first`, retrieving global variable `__environ`
+
+## `__libc_start_main`
+- handles security stuff for setuid/setgid
+- starts threading
+- registers `fini` and `rtld_fini` arguments to run via `atexit` for cleanup
+- calls `init` argument
+- calls `main` with `argc` and `argv`
+- calls `exit` with return value of main
+
+## `__libc_csu_init`
+- the constructor of the program
+- calls `_init()`
+- calls an array of function pointers with `argc`, `argv`, and `envp`
+
+## `_init`
+- does a bunch of stuff, including global constructors (e.g. constructors for static C++ objects)
+
+## `exit`
+- runs functions registered with `atexit()` (in reverse order of registration)
+- runs all functions in `fini_array`
+- runs destructors
+
+
+