+++
title = 'Assembly'
+++

# Assembly
Low-level processor-specific symbolic language.
We focus on user-mode x86 64-bit assembly, AT&T syntax.

Program composed of
- instructions (actual operations)
- directives: commands for assembler
    - `.data` is section with variables
    - `.text` is section with code
    - `.byte`/`.word`/`.long`/`.quad` defines integer (8/16/32/64 bits)
    - `.ascii`/`.asciz` defines string (without/with terminator)
- labels: create symbol at current address
- comments: everything after `#` is ignored

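The pieces listed above can be seen together in a minimal sketch. It assumes GNU `as` on x86-64 Linux and linking through `gcc`, so that the C runtime calls `main`. The names `counter` and `greeting` and the `.globl` directive are illustrative additions, not part of the notes.

```asm
# Sketch: each kind of program element appears at least once.
.data                           # directive: section with variables
counter:  .quad 5               # label `counter` names a 64-bit integer
greeting: .asciz "hi"           # zero-terminated string

.text                           # directive: section with code
.globl main                     # directive: make the symbol `main` visible to the linker
main:                           # label marking the entry point
    movq counter(%rip), %rax    # instruction: load the 64-bit value at `counter`
    ret                         # instruction: return to the caller with %rax = 5
```
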
## Instructions
Form: `mnemonic source, destination`
- `mnemonic` is short code telling CPU what to do (`mov`, `add`, `push`, `pop`, `call`, `jmp`, etc.)
    - operand size specified as suffix to mnemonic (`b` for byte, `w` for 16-bit word, `l` for 32-bit long, `q` for 64-bit quad)
    - suffix can be omitted if an operand is a register, since the register name implies the size
- `source` and `destination` are operands
    - number and type depends on instruction
    - registers (`%rax`, `%rsp`, `%al`)
        - storage locations on the CPU itself
        - general purpose `%rax`, `%rbx`, `%rcx`, `%rdx`, `%rsi`, `%rdi`, `%r8`-`%r15`
            - you can access different parts: for 64-bit `%rax`, low 32 bits is `%eax`, low 16 bits is `%ax`, high 8 bits of `%ax` is `%ah`, low 8 bits is `%al`
        - stack pointer `%rsp`
        - frame pointer `%rbp`
        - instruction pointer `%rip`
        - flags register
        - segment registers `%cs`, `%ds`, etc.
        - system registers (`%crN`, `%drN`, MSRs; only used in OS kernel)
        - instruction set extension registers (`%stN`, `%mmN`, `%xmmN`, `%ymmN`; only used with special instructions)
    - memory (`0x401000`, `8(%rbp)`, `(%rdx, %rcx, 4)`)
        - dereference pointers
        - specify as `displacement(base, index, scale)`
        - means `displacement+base+(index*scale)`
            - `base`, `index` 64-bit registers
            - `displacement` 32-bit constant or symbol (default 0)
            - `scale` is 1, 2, 4, or 8 (default 1)
    - constants (`$42`, `$0x401000`, only for source)

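As a sketch of the `displacement(base, index, scale)` form, the function below reads one element of an array of 32-bit integers. The names `table` and `third_entry` are hypothetical, and GNU `as` on x86-64 is assumed.

```asm
.data
table: .long 10, 20, 30, 40         # four 32-bit integers, 4 bytes apart

.text
.globl third_entry
third_entry:
    leaq table(%rip), %rdx          # base = address of table
    movq $2, %rcx                   # index = 2
    movl (%rdx, %rcx, 4), %eax      # load 32 bits from %rdx + %rcx*4, i.e. the value 30
    ret                             # result returned in %eax
```
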
x86 uses little-endian byte order: the least significant byte of a multi-byte value is stored at the lowest address, the most significant byte at the highest.

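A small sketch of what this means for a 32-bit value in memory, assuming GNU `as` on x86-64; `value` and `low_byte` are hypothetical names.

```asm
.data
value: .long 0x12345678             # bytes in memory, lowest address first: 78 56 34 12

.text
.globl low_byte
low_byte:
    movzbl value(%rip), %eax        # load the single byte at the lowest address, zero-extended
    ret                             # %eax = 0x78, the least significant byte
```
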
Signed integers expressed in 2's complement: sign change by flipping bits and adding one.

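The flip-and-add-one rule can be written out directly. This is only a sketch with a hypothetical function name `negate`; in practice the single instruction `neg` has the same effect.

```asm
.text
.globl negate
negate:                             # hypothetical helper: returns -x for x in %rdi
    movq %rdi, %rax                 # copy the argument
    notq %rax                       # flip all bits
    addq $1, %rax                   # add one: %rax now holds -x
    ret
```
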
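Comparisons:
- `cmp src1, src2` computes `src2 - src1`, discards the result, and only sets flags
- `test src1, src2` computes `src1 & src2`, discards the result, and only sets flags

Address computation:
- `lea src, dst` is `dst = &src` (`src` is a memory operand; only its address is computed, memory is not accessed)
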
Conditional jumps
- form `jcc addr` (or `jncc` for the negated condition, e.g. `je`/`jne`)
- jumps to `addr` if `cc` holds, decided using flags register (here after `cmp src, dst`)
    - `e`/`z`: `result == 0` (operands equal)
    - `b`: `dst` < `src` (unsigned; `a` for above, `dst` > `src`)
    - `l`: `dst` < `src` (signed; `g` for greater, `dst` > `src`)
    - `s`: `result` < 0 (signed)

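A sketch of `cmp` followed by a conditional jump, assuming GNU `as` on x86-64 and arguments arriving in `%rdi` and `%rsi`. The names `max_signed` and `max_done` are hypothetical.

```asm
.text
.globl max_signed
max_signed:                         # hypothetical helper: returns the larger of two signed integers
    movq %rdi, %rax                 # start by assuming the first argument is the larger
    cmpq %rsi, %rdi                 # sets flags from %rdi - %rsi
    jge max_done                    # signed comparison: %rdi >= %rsi, so keep %rax
    movq %rsi, %rax                 # otherwise the second argument is the larger
max_done:
    ret
```

Replacing `jge` with `jae` would turn this into an unsigned comparison.
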
## Data
Data objects in data segment:

```asm
.data
    myvar: .long 0x1234567, 0x23456789   # two consecutive 32-bit integers
    bar: .word 0x1234                    # one 16-bit integer
    mystr: .asciz "foo"                  # bytes 'f', 'o', 'o', 0
```

## Stack frames
Stack grows downwards (towards lower memory addresses).
Stack pointer (`%rsp`) points to top of stack (the lowest address in use).

Stack composed of frames, one pushed on stack for each function call.
Address of current frame stored in frame pointer register (on x86-64, `%rbp`).

Each frame contains
- function's actual parameters if not in registers (pushed in reverse order)
- return address to jump to after function
- pointer to previous frame
- function's local variables

Parameter passing in caller function for Linux
- integers, pointers, small structs are passed via registers where possible: `%rdi`, then `%rsi`, `%rdx`, `%rcx`, `%r8`, `%r9`
- most other parameters pushed onto stack from right to left

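A sketch of a caller passing three integer arguments in registers, assuming GNU `as` on x86-64 Linux and linking through `gcc`. The callee `add3` is hypothetical.

```asm
.text
.globl add3
add3:                               # hypothetical callee: returns a + b + c
    leaq (%rdi, %rsi), %rax         # a + b (first and second argument)
    addq %rdx, %rax                 # + c (third argument)
    ret                             # result returned in %rax

.globl main
main:
    pushq %rbp                      # save caller's frame pointer (also keeps %rsp 16-byte aligned at the call)
    movq %rsp, %rbp
    movq $1, %rdi                   # first integer argument
    movq $2, %rsi                   # second
    movq $3, %rdx                   # third
    call add3                       # %rax = 1 + 2 + 3 = 6
    popq %rbp                       # restore caller's frame pointer
    ret                             # main returns 6
```
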
Prologue in called function
- push old base pointer (`%rbp`) onto stack
- set `%rbp` to current stack pointer
- push callee-saved registers if you use them (Linux: `%rbx`, `%r12`-`%r15`)
- move stack pointer to make room for local variables (e.g. `sub $n, %rsp` with n the size of local vars)
- `enter $n, $0` does everything except pushing callee-saved registers

Epilogue in called function
- save result in `%rax`
- restore callee-saved registers
- restore old stack pointer from base pointer
- restore old base pointer from stack
- run `ret`
- `leave` restores stack pointer and base pointer (the two steps before `ret`); `ret` must still follow
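
The prologue and epilogue steps can be sketched as one complete function, assuming GNU `as` on x86-64 Linux. The function `sum_squares` and its use of `%rbx` and one local variable are hypothetical, chosen only so that every step has something to do.

```asm
.text
.globl sum_squares
sum_squares:                        # hypothetical function: returns a*a + b*b
    # prologue
    pushq %rbp                      # push old base pointer
    movq %rsp, %rbp                 # %rbp now marks the new frame
    pushq %rbx                      # callee-saved register, stored at -8(%rbp)
    subq $8, %rsp                   # one 8-byte local variable at -16(%rbp)

    # body
    movq %rdi, %rbx
    imulq %rbx, %rbx                # %rbx = a*a
    movq %rbx, -16(%rbp)            # keep a*a in the local variable
    movq %rsi, %rbx
    imulq %rbx, %rbx                # %rbx = b*b
    addq -16(%rbp), %rbx            # %rbx = a*a + b*b

    # epilogue
    movq %rbx, %rax                 # save result in %rax
    movq -8(%rbp), %rbx             # restore callee-saved register
    movq %rbp, %rsp                 # restore old stack pointer from base pointer
    popq %rbp                       # restore old base pointer from stack
    ret                             # the previous two instructions could be a single `leave`
```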