+++
title = 'Assembly'
+++

# Assembly
Low-level processor-specific symbolic language.
We focus on user-mode x86 64-bit assembly, AT&T syntax.

Program composed of
- instructions (actual operations)
- directives: commands for assembler
    - `.data` is section with variables
    - `.text` is section with code
    - `.byte`/`.word`/`.long`/`.quad` defines integer (8/16/32/64 bits)
    - `.ascii`/`.asciz` defines string (without/with terminator)
- labels: create symbol at current address
- comments: everything after `#` is ignored

## Instructions
Form: `mnemonic source, destination`
- `mnemonic` is short code telling CPU what to do (`mov`, `add`, `push`, `pop`, `call`, `jmp`, etc.)
    - operand size specified as suffix to mnemonic (`b` for byte, `w` for 16-bit word, `l` for 32-bit long, `q` for 64-bit quad)
        - not needed if an operand is a register (assembler infers size from it)
- `source` and `destination` are operands
    - number and type depends on instruction
    - registers (`%rax`, `%rsp`, `%al`)
        - storage locations inside the CPU
        - can be dereferenced as pointers, written `displacement(base, index, scale)`
            - means `displacement + base + (index * scale)`
            - `base`, `index` are 64-bit registers
            - `displacement` is 32-bit constant or symbol (default 0)
            - `scale` is 1, 2, 4, or 8 (default 1)
        - general purpose registers `%rax`, `%rbx`, `%rcx`, `%rdx`, `%rsi`, `%rdi`, `%r8`-`%r15`
            - you can access different parts: for 64-bit `%rax`, low 32 bits is `%eax`, low 16 bits is `%ax`, bits 8-15 is `%ah`, low 8 bits is `%al`
        - stack pointer `%rsp`
        - frame pointer `%rbp`
        - instruction pointer `%rip`
        - flags register
        - segment registers `%cs`, `%ds`, etc.
        - system registers (`%crN`, `%drN`, MSRs; only used in OS kernel)
        - instruction set extension registers (`%stN`, `%mmN`, `%xmmN`, `%ymmN`; only used with special instructions)
    - memory (`0x401000`, `8(%rbp)`, `(%rdx, %rcx, 4)`)
    - constants (`$42`, `$0x401000`, only for source)

Intel uses little endian ordering: the least significant byte is stored at the lowest address, the most significant byte at the highest.

Signed integers expressed in 2's complement: change sign by flipping all bits and adding one.

Comparisons:
- `cmp src1, src2` computes `src2 - src1` but only sets flags (result discarded)
- `test src1, src2` computes `src1 & src2` but only sets flags (result discarded)
- `lea src, dst` is `dst = &src` (`src` is a memory operand; only the address is computed, memory is not read, flags unchanged)

Conditional jumps
- form `jcc addr` (or `jncc` for the negated condition)
- jumps to `addr` if `cc` holds, decided using flags register
    - `e`/`z`: `result == 0`
    - `b`: `dst` < `src` (unsigned, `a` for above)
    - `l`: `dst` < `src` (signed, `g` for greater)
    - `s`: `result` < 0 (signed)

## Data
Data objects in data segment:

```asm
.data
myvar: .long 0x1234567, 0x23456789   # two 32-bit integers
bar:   .word 0x1234                  # one 16-bit integer
mystr: .asciz "foo"                  # bytes 'f', 'o', 'o', 0
```

## Stack frames
Stack grows downwards (towards lower memory addresses).
Stack pointer (`%rsp`) points to top of stack

Stack composed of frames, which are pushed on stack during function calls.
Address of current frame stored in frame pointer register (on Intel, `%rbp`)

Each frame contains
- function's actual parameters if not in registers (pushed in reverse order)
- return address to jump to after function
- pointer to previous frame
- function's local variables

Parameter passing in caller function for Linux
- integers, pointers, small structs passed via registers where possible: `%rdi`, then `%rsi`, `%rdx`, `%rcx`, `%r8`, `%r9`
- most other parameters pushed onto stack from right to left

Prologue in called function
- push old base pointer (`%rbp`) onto stack
- set `%rbp` to current stack pointer
- push callee-saved registers if you use them (Linux: `%rbx`, `%r12`-`%r15`)
- move stack pointer to make room for local variables (e.g. `sub $n, %rsp` with n the size of local vars)
- `enter` opcode does everything except pushing callee-saved registers

Epilogue in called function
- save result in `%rax`
- restore callee-saved registers (before discarding the frame, since they were saved below `%rbp`)
- restore old stack pointer from base pointer
- restore old base pointer from stack
- run `ret`
- `leave` restores stack pointer and base pointer (same as `mov %rbp, %rsp` then `pop %rbp`); `ret` is still needed afterwards
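
A minimal sketch tying together the parameter passing, prologue and epilogue steps above, assuming Linux x86-64 and the GNU toolchain. The names `sum`, `values`, `sum_loop` and `sum_done` are hypothetical. `values(%rip)` is `%rip`-relative addressing, chosen here on the assumption that the program is linked as a position-independent executable. `sum` has no local variables, so the `sub $n, %rsp` step is skipped.

```asm
        .data
values: .long 1, 2, 3, 4              # four 32-bit integers (hypothetical data)

        .text
        .globl main

# sum(pointer in %rdi, element count in %rsi) -> total in %eax
sum:
        push  %rbp                    # prologue: save caller's frame pointer
        mov   %rsp, %rbp              # prologue: start our own frame
        push  %rbx                    # callee-saved, so save it before use
        xor   %eax, %eax              # total = 0
        xor   %ebx, %ebx              # index = 0 (writing %ebx clears all of %rbx)
sum_loop:
        cmp   %rsi, %rbx              # computes %rbx - %rsi, only sets flags
        jae   sum_done                # index >= count (unsigned): stop
        addl  (%rdi, %rbx, 4), %eax   # total += pointer[index], scale 4 = size of .long
        inc   %rbx
        jmp   sum_loop
sum_done:
        pop   %rbx                    # epilogue: restore callee-saved register
        pop   %rbp                    # epilogue: restore caller's frame pointer
        ret                           # pop return address and jump to it

main:
        push  %rbp
        mov   %rsp, %rbp
        lea   values(%rip), %rdi      # first argument: address of the array
        mov   $4, %rsi                # second argument: number of elements
        call  sum                     # result comes back in %eax
        pop   %rbp
        ret                           # main's return value becomes the exit status

        .section .note.GNU-stack,"",@progbits   # mark stack as non-executable
```

Assuming the file is saved as `sum.s`, `gcc sum.s -o sum && ./sum; echo $?` should print `10`, the sum of the four values.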