commit 542e38ff36fbcb69c75f574f974cfd269a9b4d7e
parent d0c0ca37ee664496089295a720c036daf9b3944d
Author: Alex Balgavy <alex@balgavy.eu>
Date: Tue, 30 Mar 2021 23:12:07 +0200
More BAMA notes
Diffstat:
4 files changed, 159 insertions(+), 0 deletions(-)
diff --git a/content/binary-malware-analysis-notes/_index.md b/content/binary-malware-analysis-notes/_index.md
@@ -4,3 +4,6 @@ title = 'Binary and Malware Analysis'
# Binary and Malware Analysis
1. [Introduction](introduction)
+2. [Assembly](assembly)
+3. [What happens before `main()`](what-happens-before-main)
+4. [GDB](gdb)
diff --git a/content/binary-malware-analysis-notes/assembly.md b/content/binary-malware-analysis-notes/assembly.md
@@ -0,0 +1,103 @@
++++
+title = 'Assembly'
++++
+
+# Assembly
+Low-level processor-specific symbolic language.
+We focus on user-mode x86 64-bit assembly, AT&T syntax.
+
+Program composed of
+- instructions (actual operations)
+- directives: commands for assembler
+ - `.data` is section with variables
+ - `.text` is section with code
+ - `.byte`/`.word`/`.long`/`.quad` defines integer (8/16/32/64 bits)
+ - `.ascii`/`.asciz` defines string (without/with terminator)
+- labels: create symbol at current address
+- comments: everything after `#` is ignored
+
+## Instructions
+Form: `mnemonic source, destination`
+- `mnemonic` is short code telling CPU what to do (`mov`, `add`, `push`, `pop`, `call`, `jmp`, etc.)
+ - operand size specified as suffix to mnemonic (`b` for byte, `w` for 16-bit word, `l` for 32-bit long, `q` for 64-bit quad)
+ - not needed if other operand is register
+- `source` and `destination` are operands
+ - number and type depends on instruction
+ - registers (`%rax`, `%rsp`, `%al`)
+ - memory locations on CPU
+ - dereference pointers
+ - specify as `displacement(base, index, scale)`
+ - means `displacement+base+(index*scale)`
+ - `base`, `index` 64-bit registers
+ - `displacement` 32-bit constant or symbol (default 0)
+ - `scale` is 1, 2, 4, or 8 (default 1)
+ - general-purpose registers `%rax`, `%rbx`, `%rcx`, `%rdx`, `%rsi`, `%rdi`, `%r8`-`%r15`
+ - you can access different parts: for 64-bit `%rax`, low 32 bits is `%eax`, low 16 bits is `%ax`, high byte of `%ax` (bits 8-15) is `%ah`, low 8 bits is `%al`
+ - stack pointer `%rsp`
+ - frame pointer `%rbp`
+ - instruction pointer `%rip`
+ - flags register
+ - segment registers `%cs`, `%ds`, etc.
+ - system registers (`%crN`, `%drN`, MSRs -- only used in OS kernel)
+ - instruction set registers (`%stN`, `%mmN`, `%xmmN`, `%ymmN` -- only used with special instructions)
+ - memory (`0x401000`, `8(%rbp)`, `(%rdx, %rcx, 4)`)
+ - constants (`$42`, `$0x401000`, only for source)
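+
+A minimal sketch of the operand forms above. The labels `table` and `example` and all values are made up for illustration, and the `table(%rip)` form (address relative to the instruction pointer) is an addition not covered in the list:
+
+```asm
+.data
+table:  .long 10, 20, 30, 40       # hypothetical array of four 32-bit integers
+
+.text
+example:                            # hypothetical label
+    movq  $2, %rcx                  # constant as source: rcx = 2
+    leaq  table(%rip), %rdx         # rdx = address of table (RIP-relative)
+    movl  (%rdx, %rcx, 4), %eax     # load 32 bits from rdx + rcx*4, i.e. table[2] = 30
+    movl  %eax, 4(%rdx)             # displacement(base): store eax at rdx + 4, i.e. table[1]
+    movb  $1, (%rdx)                # suffix needed: no register operand to infer the size from
+    ret
+```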
+
+Intel uses little-endian byte ordering: the least significant byte is stored at the lowest address (e.g. the 32-bit value `0x12345678` is laid out in memory as `78 56 34 12`)
+
+Signed integers expressed in 2's complement -- sign change by flipping bits and adding one.
+
+Comparisons:
+- `cmp src1, src2` computes `src2 - src1` but only sets flags (result is discarded)
+- `test src1, src2` computes `src1 & src2` but only sets flags (result is discarded)
+- `lea src, dst` is `dst = &src` (`src` is in memory)
+
+Conditional jumps
+- form `jcc addr` (or `jncc` for not)
+- jumps to `addr` if `cc` holds, decided using flags register
+ - `e`/`z`: `result == 0`
+ - `b`: `dst` < `src` (unsigned, `a` for above)
+ - `l`: `dst` < `src` (signed, `g` for greater)
+ - `s`: `result` < 0 (signed)
+
+## Data
+Data objects in data segment:
+
+```asm
+.data
+ myvar: .long 0x1234567, 0x23456789
+ bar: .word 0x1234
+ mystr: .asciz "foo"
+```
+
+## Stack frames
+Stack grows downwards (towards lower memory addresses).
+Stack pointer (`%rsp`) points to top of stack
+
+Stack composed of frames, which are pushed on stack during function calls.
+Address of current frame stored in frame pointer register (on Intel, `%rbp`)
+
+Each frame contains
+- function's actual parameters if not in registers (pushed in reverse order)
+- return address to jump to after function
+- pointer to previous frame
+- function's local variables
+
+Parameter passing in caller function for Linux
+- integers, pointers, and small structs are passed via registers, in this order: `%rdi`, `%rsi`, `%rdx`, `%rcx`, `%r8`, `%r9`
+- most other parameters pushed onto stack from right to left
+
+Prologue in called function
+- push old base pointer (`%rbp`) on to stack
+- set `%rbp` to current stack pointer
+- push callee-saved registers if you use them (Linux: `%rbx`, `%r12-%r15`)
+- move stack pointer to make room for local variables (e.g. `sub $n, %rsp` with n the size of local vars)
+- `enter` opcode does everything except pushing callee-saved registers
+
+Epilogue in called function
+- save result in `%rax`
+- restore old stack pointer from base pointer
+- restore callee-saved registers
+- restore old base pointer from stack
+- run `ret`
+- `leave` restores stack pointer and old base pointer (same as `mov %rbp, %rsp` then `pop %rbp`); you still need `ret` after it
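+
+A sketch of the prologue and epilogue steps above, for a function that uses no callee-saved registers. The function name `add_one` and the 16-byte local area are assumptions for illustration:
+
+```asm
+# Hypothetical function: long add_one(long x)
+.text
+.globl add_one
+add_one:
+    pushq %rbp              # prologue: push old base pointer
+    movq  %rsp, %rbp        # set base pointer to current stack pointer
+    subq  $16, %rsp         # make room for local variables (16 bytes here)
+    movq  %rdi, -8(%rbp)    # body: store first argument in a local variable
+    movq  -8(%rbp), %rax    # load it back into rax
+    addq  $1, %rax          # epilogue: result is now in rax
+    movq  %rbp, %rsp        # restore old stack pointer from base pointer
+    popq  %rbp              # restore old base pointer from stack
+    ret                     # pop return address and jump to it
+```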
diff --git a/content/binary-malware-analysis-notes/gdb.md b/content/binary-malware-analysis-notes/gdb.md
@@ -0,0 +1,11 @@
++++
+title = 'GDB'
++++
+
+# GDB
+- `break` might not work because of anti-debug; in that case set a hardware breakpoint with `hbreak` (but only once the program is running)
+- `starti` starts the program and stops at its first instruction
+- `bt` prints backtrace of all stack frames
+- `frame [n]` switches to stack frame `n`
+- `watch` sets a watch point on something and breaks if the value changes
+- `next` steps over function calls instead of entering them (`step` enters them)
diff --git a/content/binary-malware-analysis-notes/what-happens-before-main.md b/content/binary-malware-analysis-notes/what-happens-before-main.md
@@ -0,0 +1,42 @@
++++
+title = 'What happens before main()'
++++
+
+# What happens before main()
+Lots of things:
+1. Loader calls `preinit_array`, then `_start`
+2. That calls `__libc_start_main`, which calls `__libc_csu_init`, which calls a bunch of other stuff
+3. After that, `__libc_start_main` calls `main`
+4. And then `exit` happens
+
+## Start at `_start`
+- if you see `%ebp`, `%esi`, etc. and relatively small addresses, it's probably a 32-bit binary (stack used for argument passing)
+- `argc` popped into `%esi`
+- `argv` moved from stack pointer (`%esp`) to `%ecx`
+- stack pointer aligned to 16-byte boundary
+- push arguments and call `__libc_start_main`
+ - this calls `__libc_init_first`, retrieving global variable `__environ`
+
+## `__libc_start_main`
+- handles security stuff for setuid/setgid
+- starts threading
+- registers `fini` and `rtld_fini` arguments to run via `atexit` for cleanup
+- calls `init` argument
+- calls `main` with `argc` and `argv`
+- calls `exit` with return value of main
+
+## `__libc_csu_init`
+- the constructor of the program
+- calls `_init()`
+- calls each function in an array of function pointers, passing `argc`, `argv`, and `envp`
+
+## `_init`
+- does a bunch of stuff, including global constructors (e.g. constructors for static C++ objects)
+
+## `exit`
+- runs functions registered with `atexit()` (in reverse order of registration)
+- runs all functions in `fini_array`
+- runs destructors
+
+
+