lectures.alex.balgavy.eu

Lecture notes from university.
git clone git://git.alex.balgavy.eu/lectures.alex.balgavy.eu.git
Log | Files | Refs | Submodules

tracking-control-flow.md (3441B)


      1 +++
      2 title = 'Tracking control flow'
      3 +++
      4 # Tracking control flow
      5 Explore execution paths in the program.
      6 
      7 Control flow graph (CFG) of each function:
      8 - represent potential flow of control inside function
      9 
     10 Call graph
     11 - represent potential flow of control between functions
     12 
     13 Basic block: maximal sequence of instructions that execute one-by-one in order
     14 
     15 Control flow graph:
     16 - nodes are basic blocks with one entry point and one exit point
     17 - directed edges indicate possible control flow
     18 
     19 Call graph
     20 - nodes are functions
     21 - directed edges show potential for one function to invoke another
     22 
     23 Identifying a function
     24 - ideally:
     25     - set a of basic blocks with single entry point
     26     - reached using a call instruction
     27     - ends with ret instructions
     28 - in reality, might be reached using jump, might share blocks with other functions, might have multiple entries
     29 
     30 ## Problems
     31 - Pointer-based control transfer: conditionally set a function pointer, then call function pointer
     32 - non-returning calls: some functions won't return, code following call site may not be valid
     33 - non-contiguous code sections: could have jump tables, data, unparsed code, junk bytes...
     34 - tail calls: `return f(x)`
     35 - gaps in binary: might contain undiscovered functions
     36 - shared code and multiple entry representation
     37 
     38 You should:
     39 - run the program multiple times, observe targets of indirect jumps and calls
     40 - look for function prologue sequences (`push %rbp`, `mov %rsp, %rbp`)
     41 
     42 ## Loops
     43 Loop: subgraph of CFG that is strongly connected (path from any node to any other node), single entry
     44 
     45 Two loops are either disjoint, or one is completely nested in the other.
     46 
     47 Identifying loops:
     48 - dominance: node a dominates b if all paths to b go through a (strictly if a and b are different, immediate if it's right before)
     49 
     50 Natural loops: single head node
     51 
     52 Dominator set D(i) = {i} ∪ (intersection for p ∈ Pred(i) of D(p)), where Pred is set of immediate predecessors
     53 
     54 Dominator tree: each node's children are nodes it immediately dominates
     55 
     56 Computing loops:
     57 1. compute dominator info, store it in a dominator tree
     58 2. identify back edges
     59 3. construct natural loops corresponding to back edges
     60 
     61 ## Data-flow analysis
     62 Flow sensitivity
     63 - flow-sensitive analysis computes for each program point whatever you want to analyze
     64 - flow-insensitive pointer analysis computes what memory locations pointers may refer to, at any time in program execution
     65 
     66 Context-sensitive analysis considers calling context when analyzing target of function call
     67 
     68 ### Reaching definitions
     69 Which data definitions can reach a point in the program.
     70 
     71 - For each basic block determine:
     72     - which definitions the block generates (gen set)
     73     - which it kills (kill set)
     74 - from this local solution build global one. determine:
     75     - which definitions from anywhere can reach the start of a basic block
     76         - in[B] = Union for p ∈ pred[B] of out[p]
     77     - which definitions can still be alive after the basic block
     78         - out[B] = gen[B] ∪ (in[B] - kill[B])
     79 - need to solve iteratively
     80 
     81 Definition-Use chains: definition of variable and all uses, reachable from that definition without other intervening definitions
     82 - compute: for each definition D, compute chain of uses that D may reach
     83 
     84 Use-definitions chains: use of variable and all definitions of that variable, that can reach that use without other intervening definitions
     85 - compute: for each use U, compute chain of definitions that reach U