lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git

commit dd31f0b32bc7c550a195f94815e55674380614cd
parent eeb0c56f7066780433945829620fd8d1cf425a61
Author: Alex Balgavy <alex@balgavy.eu>
Date:   Sun,  4 Apr 2021 13:36:44 +0200

Update BAMA notes

Diffstat:
M content/binary-malware-analysis-notes/_index.md  |   1 +
A content/binary-malware-analysis-notes/packers.md | 108 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 109 insertions(+), 0 deletions(-)

diff --git a/content/binary-malware-analysis-notes/_index.md b/content/binary-malware-analysis-notes/_index.md
@@ -9,3 +9,4 @@ title = 'Binary and Malware Analysis'
 4. [GDB](gdb)
 5. [Anti-analysis](anti-analysis)
 6. [Disassembly tools](disassembly-tools)
+7. [Packers](packers)
diff --git a/content/binary-malware-analysis-notes/packers.md b/content/binary-malware-analysis-notes/packers.md
@@ -0,0 +1,108 @@
++++
+title = 'Packers'
++++
+
+# Packers
+## Binary packers
+A packer takes a binary program P and makes a new program that contains an unpacker and a packed version of P.
+- the loader loads the new binary (the unpacker), then the unpacker unpacks the original program in memory and runs it
+
+## What's a binary?
+A binary is code in a binary file format (PE for Windows, ELF for Linux, Mach-O for macOS).
+
+The format
+- defines what the file looks like on disk and in memory
+- contains info about the machine to run it on, whether it is an executable or a library, the entry point, and the sections
+
+ELF format:
+- used for executables, libraries, and others, on many architectures and OSes
+- dual nature
+  - view on logical sections: described by the section header table (`.data`, `.text`, `.bss`, etc.)
+  - view on structure in memory: which segments are executable and which are read/write (data), how large they are -- described by the program header table
+- structure
+  - ELF header at beginning: magic number `7F 45 4C 46`, file type, architecture, entry point, program and section header table offsets, index of the section-name string table
+  - program headers divide data into segments, providing easy mapping from file to memory
+    - array of structures: type of segment, position in ELF file, address in memory, physical address, size on disk, size in memory, flags for r/w/x, alignment in memory
+  - section headers define sections
+    - one entry per section: name (index into the section-name string table), what kind of info it holds, flags for write/alloc/exec, base address in memory, location in ELF file, some other info
+- ELF program headers contain everything the kernel needs to load the file
+- sections:
+  - examples:
+    - `.text`: code
+    - `.data`: initialised data
+    - `.bss`: uninitialised data
+    - `.got`/`.plt`: for dynamic linking
+    - `.ctors`/`.dtors`: constructors/destructors
+  - used at link time
+  - do not have a predefined structure, but are described by section headers, which do
+- symbol tables:
+  - SYMTAB: contains all symbols needed to link/debug files, not needed for running
+  - DYNSYM: contains symbols for dynamic linking, loaded in memory at runtime, so kept as small as possible
+
+### Stripped binaries
+Symbol table can be removed with `strip -s <program>`
+- dynamic symbol table has to be preserved for functions imported from shared libraries
+- all other names of functions and variables are gone
+
+### Functions and global symbols
+Addresses of global symbols imported from external libraries are computed when the binary is loaded in memory
+- code is either relocated at load time or PIC (freely relocatable code that adds a level of indirection via the Global Offset Table and Procedure Linkage Table)
+- every time code has to reference a global symbol, it uses the Global Offset Table (GOT, `.got`) in the data section
+- at runtime, GOT entries are modified by the dynamic linker to point to the intended data
+
+If code needs to call a function in a different module, the linker creates (at link time) an array of read-only jump stubs: the Procedure Linkage Table (PLT, `.plt`)
+- stubs use GOT entries to call the right function
+- lazy binding: GOT entries initially point back into the corresponding `.plt` stub, which invokes the resolver; the resolved address is written into the GOT on the first call
+- relocations confined to `.got` and `.got.plt` rather than `.text`
+
+### Process creation in Linux
+- kernel loads segments defined by program headers into memory
+  - if an interpreter is defined (`PT_INTERP`, normally the dynamic linker), load it too
+- kernel sets up stack and starts at interpreter's entry point
+  - if no interpreter, use the program's own entry point
+
+### ELF auxiliary vectors
+Mechanism to transfer kernel-level info to user processes (such as a pointer to the system call entry point in memory).
+
+ELF Loader:
+- parses ELF file
+- maps the program segments into memory
+- sets up entry point, initializes process stack
+- puts ELF auxiliary vectors on process stack, along with argc, argv, envp
+
+## Packers
+Initially used for compression, but convenient for malware to evade antivirus detection, and many packers also include anti-debugging techniques.
+
+We want to run the malware, let it unpack itself, and dump memory at the right moment (when it's completely unpacked).
+- the right moment is when you observe "normal behavior"
+- check system calls, e.g. using `strace`
+- you can dump memory in gdb: `dump binary memory dump_name start end`
+
+## Analysing a binary
+### Static
+`file`: determine file type
+
+`readelf`: display information about contents of ELF files
+- `-h`: file header
+- `-l`: program headers
+- `-S`: section headers
+- `-s`: symbol table
+
+`ldd`: print shared library dependencies
+
+`nm`: list symbols from object files
+- `-D`: list dynamic symbols
+
+`strings`: print sequences of printable characters
+
+### Dynamic methods
+`/proc/<pid>`: general information about process with `<pid>`
+- `/cmdline`: command line
+- `/environ`: environment
+- `/maps`: memory map
+
+`strace`: tracks system calls performed by process
+- can also follow child processes, show signals, decode syscall arguments
+- `-i`: print instruction pointer at time of syscall
+
+`ltrace`: tracks dynamically linked library calls