index.md (2201B)
1 +++ 2 title = 'Symbolic execution' 3 +++ 4 5 # Symbolic execution 6 Execute programs with symbols instead of values 7 - program state (memory, registers, etc.) is symbolic and bound by constraints 8 - have set of possible values 9 - subject to constraints, i.e. mathematical restrictions on allowed values 10 - symbolic variables/expressions are immutable 11 - perform all arithmetic & boolean operations symbolically 12 - output symbolic expressions encoding operation performed 13 - fork state when branch depends on symbol 14 - unconditional branch -- direct control flow 15 - concrete conditional branch -- follow condition 16 - symbolic conditional branch -- take both paths 17 - fork program state and set IP to either path 18 - add appropriate branch condition as path constraint 19 - eventually explores all possible paths 20 21 ![Symbolic execution example](symbolic-execution-example.png) 22 23 - solve constraints to get values when needed ('concretization') 24 25 Problems with symbolic execution: 26 - state explosion: number of possible paths in a program is huge, increases with number of branches 27 - dealing with I/O: do nothing? output? 28 - modelling environment and hardware (and simulated hardware isn't the same) 29 - constraint solving: not all constraints are linear, NP-complete 30 31 Basic optimisations: 32 - expression simplification: use algebra to bring down to minimal form 33 - state merging: mitigate state explosion when little difference between states 34 35 Concolic (concrete + symbolic) execution: 36 - repeat: 37 - dynamic analysis to capture trace for concrete run 38 - symbolic execution on traced instructions, yielding path constraints 39 - dynamic analysis to capture trace for concrete run 40 - symbolic execution on traced instructions, yielding path constraints 41 - adjust constraints by flipping last encountered symbolic branch 42 - solve constraints of flipped branch, yielding inputs for new path 43 - advantages: no state explosion so much faster, symbolize only what you need 44 - drawbacks: need good initial input, can only flip branches controlled by symbolic variables, constraint solving is still hard 45 46 SMT solvers: Z3, CVC4, STP 47 48 SymbEx engines: KLEE, S2E, Angr, Triton 49