
Lecture notes from university.

+++
title = 'Assembly'
+++

# Assembly
Low-level processor-specific symbolic language.
We focus on user-mode x86 64-bit assembly, AT&T syntax.

Program composed of
- instructions (actual operations)
- directives: commands for assembler
    - `.data` is section with variables
    - `.text` is section with code
    - `.byte`/`.word`/`.long`/`.quad` defines integer (8/16/32/64 bits)
    - `.ascii`/`.asciz` defines string (without/with terminator)
- labels: create symbol at current address
- comments: everything after `#` is ignored
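
The pieces listed above fit together roughly as in the following minimal sketch. It assumes Linux and the GNU toolchain (`as prog.s -o prog.o && ld prog.o -o prog`); the `_start` entry point and the `exit` system call (number 60) are Linux conventions not covered in these notes, and the name `answer` is made up for illustration.

```asm
# Minimal sketch: one variable in .data, code in .text.
.data
answer: .quad 42                # label `answer` marks the address of a 64-bit integer

.text
.globl _start                   # directive: make the `_start` symbol visible to the linker
_start:                         # label: entry point (Linux convention when linking with ld)
    movq answer(%rip), %rdi     # instruction: load the variable into %rdi
    movq $60, %rax              # 60 is the Linux x86-64 `exit` system call number
    syscall                     # exit with status 42
```

Running the program and then `echo $?` in the shell should print the exit status.
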

## Instructions
Form: `mnemonic source, destination`
- `mnemonic` is short code telling CPU what to do (`mov`, `add`, `push`, `pop`, `call`, `jmp`, etc.)
    - operand size specified as suffix to mnemonic (`b` for byte, `w` for 16-bit word, `l` for 32-bit long, `q` for 64-bit quad)
    - suffix can be omitted if one operand is a register (size is inferred from the register)
- `source` and `destination` are operands
    - number and type depend on instruction
    - registers (`%rax`, `%rsp`, `%al`)
        - storage locations on the CPU itself
        - general purpose `%rax`, `%rbx`, `%rcx`, `%rdx`, `%rsi`, `%rdi`, `%r8`-`%r15`
            - you can access different parts: for 64-bit `%rax`, low 32 bits is `%eax`, low 16 bits is `%ax`, high 8 bits of `%ax` is `%ah`, low 8 bits is `%al`
        - stack pointer `%rsp`
        - frame pointer `%rbp`
        - instruction pointer `%rip`
        - flags register
        - segment registers `%cs`, `%ds`, etc.
        - system registers (`%crN`, `%drN`, MSRs -- only used in OS kernel)
        - instruction set extension registers (`%stN`, `%mmN`, `%xmmN`, `%ymmN` -- only used with special instructions)
    - memory (`0x401000`, `8(%rbp)`, `(%rdx, %rcx, 4)`)
        - dereference pointers
        - specify as `displacement(base, index, scale)`
        - means `displacement+base+(index*scale)`
            - `base`, `index` 64-bit registers
            - `displacement` 32-bit constant or symbol (default 0)
            - `scale` is 1, 2, 4, or 8 (default 1)
    - constants (`$42`, `$0x401000`, only for source)
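
To make the operand kinds concrete, here is a small sketch that uses a constant, registers, and several memory operand forms in one program. It assumes Linux and GNU `as`/`ld`; the `table` symbol and the exit-via-syscall ending are illustrative assumptions, not part of the notes.

```asm
.data
table: .long 10, 20, 30, 40         # four 32-bit integers

.text
.globl _start
_start:
    movq $2, %rcx                   # constant source, register destination
    leaq table(%rip), %rdx          # %rdx = address of table
    movl (%rdx, %rcx, 4), %eax      # memory source: %rdx + %rcx*4, i.e. table[2] = 30
    addl 4(%rdx), %eax              # displacement(base): add table[1], so %eax = 50
    movb $1, 12(%rdx)               # suffix required: neither operand is a register
    movl %eax, %edi                 # register to register; exit status = 50
    movl $60, %eax                  # Linux `exit` system call number
    syscall
```

Note that `movb $1, 12(%rdx)` would be ambiguous without the `b` suffix, because neither the constant nor the memory operand says how many bytes to write.
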

Intel uses little endian ordering -- the least significant byte is stored at the lowest address, so a multi-byte value is laid out in memory starting from its low-order end.
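
A quick way to see the byte order is to store a 32-bit value and read back only the byte at its lowest address. The sketch below assumes Linux and GNU `as`/`ld`; the `value` symbol is made up for illustration.

```asm
.data
value: .long 0x12345678         # stored in memory as bytes 78 56 34 12

.text
.globl _start
_start:
    movzbl value(%rip), %edi    # load the byte at the lowest address, zero-extended: 0x78
    movl $60, %eax              # Linux `exit` system call number
    syscall                     # exit status 0x78 = 120
```
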

Signed integers expressed in 2's complement -- sign change by flipping bits and adding one.
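
The flip-and-add-one rule can be checked against the `neg` instruction, which negates in a single step. This is a sketch assuming Linux and GNU `as`/`ld`; `sete` (set byte if equal) is used only to turn the comparison result into an exit status and is not covered in these notes.

```asm
.text
.globl _start
_start:
    movq $5, %rax
    notq %rax           # flip all bits: %rax = -6
    incq %rax           # add one: %rax = -5
    movq $5, %rbx
    negq %rbx           # neg does the same in one instruction: %rbx = -5
    xorl %edi, %edi     # %edi = 0 (done before cmp, since xor changes the flags)
    cmpq %rbx, %rax     # compare the two results
    sete %dil           # %dil = 1 if they are equal
    movl $60, %eax      # Linux `exit` system call number
    syscall             # exit status 1
```
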
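
Comparisons:
- `cmp src1, src2` is like `src2 - src1` but only sets flags (result is discarded)
- `test src1, src2` is like `src1 & src2` but only sets flags (result is discarded)
- `lea src, dst` is `dst = &src` (`src` is a memory operand; only the address is computed, memory is not accessed and flags are unchanged)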
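
Since `lea` only evaluates the address expression, it serves both for taking pointers and for cheap arithmetic. A sketch, assuming Linux and GNU `as`/`ld`, with a made-up `array` symbol:

```asm
.data
array: .quad 1, 2, 3, 4

.text
.globl _start
_start:
    leaq array(%rip), %rbx          # %rbx = &array (no memory access)
    movq $3, %rcx
    leaq (%rbx, %rcx, 8), %rsi      # %rsi = &array[3]
    movq (%rsi), %rdi               # dereference: %rdi = array[3] = 4
    leaq 1(%rdi, %rdi, 2), %rdi     # lea as arithmetic: 1 + 4 + 4*2 = 13
    movl $60, %eax                  # Linux `exit` system call number
    syscall                         # exit status 13
```
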

Conditional jumps
- form `jcc addr` (or `jncc` for not)
- jumps to `addr` if `cc` holds, decided using flags register (e.g. as set by a preceding `cmp src, dst`)
    - `e`/`z`: `result == 0`
    - `b`: `dst` < `src` (unsigned, `a` for above)
    - `l`: `dst` < `src` (signed, `g` for greater)
    - `s`: `result` < 0 (signed)
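
The difference between the signed and unsigned condition codes shows up when the same bit pattern is compared both ways. The sketch below assumes Linux and GNU `as`/`ld`; the labels `signed_less` and `done` are made up, and the exit status is just a convenient way to report which jumps were taken.

```asm
.text
.globl _start
_start:
    xorl %edi, %edi         # exit status starts at 0
    movq $-1, %rax          # all bits set: -1 signed, 2^64-1 unsigned
    cmpq $1, %rax           # computes %rax - 1 and sets flags (dst = %rax, src = $1)
    jl signed_less          # signed: -1 < 1, so this jump is taken
    jmp done
signed_less:
    orl $1, %edi            # record that the signed test held
    cmpq $1, %rax           # redo the comparison, since orl changed the flags
    jb done                 # unsigned: 2^64-1 < 1 is false, so not taken
    orl $2, %edi            # record that the unsigned "below" test failed
done:
    movl $60, %eax          # Linux `exit` system call number
    syscall                 # exit status 3
```
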

## Data
Data objects in data segment:

```asm
.data
    myvar: .long 0x1234567, 0x23456789  # two 32-bit integers
    bar: .word 0x1234                   # one 16-bit integer
    mystr: .asciz "foo"                 # bytes 'f', 'o', 'o', 0
```
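
As a sketch of how such objects might be read back, the program below repeats the definitions and loads from each of them with a matching operand size. It assumes Linux and GNU `as`/`ld`; the RIP-relative addressing and the exit system call are choices made for this example.

```asm
.data
    myvar: .long 0x1234567, 0x23456789
    bar: .word 0x1234
    mystr: .asciz "foo"

.text
.globl _start
_start:
    movl myvar+4(%rip), %eax    # second element of myvar: 0x23456789
    movzwl bar(%rip), %ebx      # 16-bit load, zero-extended: 0x1234
    movzbl mystr+3(%rip), %edi  # byte after "foo": the 0 terminator added by .asciz
    addb mystr(%rip), %dil      # add first character 'f' (0x66 = 102)
    movl $60, %eax              # Linux `exit` system call number
    syscall                     # exit status 102
```
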

## Stack frames
Stack grows downwards (towards lower memory addresses).
Stack pointer (`%rsp`) points to top of stack (the lowest address in use).

Stack composed of frames, which are pushed on stack during function calls.
Address of current frame stored in frame pointer register (on Intel, `%rbp`).

Each frame contains
- function's actual parameters if not in registers (pushed in reverse order)
- return address to jump to after function
- pointer to previous frame
- function's local variables

Parameter passing in caller function for Linux
- integers, pointers, small structs are passed via registers where possible: `%rdi`, then `%rsi`, `%rdx`, `%rcx`, `%r8`, `%r9`
- most other parameters pushed onto stack from right to left
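
A call with three integer arguments might look like the sketch below. It assumes Linux and GNU `as`/`ld`; `sum3` is a hypothetical function invented for this example, and it returns its result in `%rax`.

```asm
.text
.globl _start
_start:
    movq $10, %rdi              # first argument
    movq $20, %rsi              # second argument
    movq $12, %rdx              # third argument
    call sum3                   # pushes return address, jumps to sum3
    movq %rax, %rdi             # return value becomes the exit status
    movl $60, %eax              # Linux `exit` system call number
    syscall                     # exit status 42

# hypothetical function: long sum3(long a, long b, long c)
sum3:
    leaq (%rdi, %rsi), %rax     # %rax = a + b
    addq %rdx, %rax             # %rax += c
    ret                         # pop return address and jump to it
```

Because `sum3` needs no locals and calls nothing else, it gets away without setting up a frame of its own.
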

Prologue in called function
- push old base pointer (`%rbp`) onto stack
- set `%rbp` to current stack pointer
- push callee-saved registers if you use them (Linux: `%rbx`, `%r12`-`%r15`)
- move stack pointer to make room for local variables (e.g. `sub $n, %rsp` with n the size of local vars)
- `enter` instruction does everything except pushing callee-saved registers
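
Epilogue in called function
- save result in `%rax`
- restore callee-saved registers (they sit below the saved `%rbp`, so do this before resetting `%rsp`)
- restore old stack pointer from base pointer
- restore old base pointer from stack
- run `ret`
- `leave` restores stack pointer and old base pointer (same as `mov %rbp, %rsp` then `pop %rbp`); `ret` is still needed afterwards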
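
Putting prologue and epilogue together, a function with one callee-saved register and one local variable could be sketched as follows. It assumes Linux and GNU `as`/`ld`; `square_plus_one` is a hypothetical function, and the frame layout (saved `%rbx` at `-8(%rbp)`, local at `-16(%rbp)`) follows from the push order chosen here. The callee-saved register is popped before `%rsp` is reset from `%rbp`, because it sits below the saved `%rbp` in this layout.

```asm
.text
.globl _start
_start:
    movq $6, %rdi               # argument x = 6
    call square_plus_one
    movq %rax, %rdi             # return value becomes the exit status
    movl $60, %eax              # Linux `exit` system call number
    syscall                     # exit status 37

# hypothetical: long square_plus_one(long x) { long tmp = x * x; return tmp + 1; }
square_plus_one:
    # prologue
    pushq %rbp                  # save old base pointer
    movq %rsp, %rbp             # new frame starts here
    pushq %rbx                  # callee-saved register that we use below
    subq $8, %rsp               # room for one 8-byte local

    movq %rdi, %rbx
    imulq %rbx, %rbx            # %rbx = x * x
    movq %rbx, -16(%rbp)        # store local `tmp` below the saved %rbx

    # epilogue
    movq -16(%rbp), %rax        # result in %rax
    incq %rax                   # tmp + 1
    addq $8, %rsp               # drop locals
    popq %rbx                   # restore callee-saved register
    movq %rbp, %rsp             # restore old stack pointer (no change here, shown for completeness)
    popq %rbp                   # restore old base pointer
    ret
```

The last two instructions before `ret` could be replaced by a single `leave`.
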