

+++
title = "State space search"
template = 'page-math.html'
+++
# State space search
How do we find the solutions of previous problems?
* search the state space
* search through explicit tree generation: root is initial state, nodes/leaves generated through successor function
* search generates a graph

state: representation of a physical configuration

node: data structure in the search tree. it contains state, parent node, actions, etc.

## Uninformed search strategies
strategy defines picking order of node expansion

uninformed methods: only use problem definition

informed methods: use heuristic function to estimate cost of solution

evaluating performance:
* does it always find a solution if there is one? (completeness)
* does it find the least-cost solution? (optimality)
* how many nodes are generated/expanded? (time complexity)
* how many nodes are stored in memory during search? (space complexity)
* complexity is measured in terms of:
    * b: max branching factor of the search tree
    * d: depth of the least-cost solution
    * m: max depth of the state space (could be infinite)

![Summary of uninformed search algorithms](search-alg-summary.png)

### Breadth-first (BF) search
* algorithm:
    * expand shallowest unexpanded node
    * implementation: fringe (nodes that have to be explored) is a FIFO queue
* evaluation:
    * completeness: yes, if branching factor b is finite
    * time complexity: if every state has b successors and the solution is at depth d, then $O(b^{d+1})$ nodes are generated
    * space complexity: bad, since every node has to be kept in memory -- $O(b^{d+1})$
    * optimality: in general yes, unless actions have different costs
* memory requirements are a bigger problem than execution time

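As a minimal sketch, BF search with a FIFO fringe on a toy state space (the graph and state names here are hypothetical examples, with the successor function given as a dict):

```python
from collections import deque

def bfs(start, goal, successors):
    """Breadth-first search: fringe is a FIFO queue; returns a shallowest path."""
    fringe = deque([[start]])          # queue of paths to unexpanded nodes
    visited = {start}
    while fringe:
        path = fringe.popleft()        # expand shallowest unexpanded node first
        state = path[-1]
        if state == goal:
            return path
        for s in successors.get(state, []):
            if s not in visited:
                visited.add(s)
                fringe.append(path + [s])
    return None                        # no solution

# hypothetical toy state space: state -> list of successor states
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': ['E']}
```

Here `bfs('A', 'E', graph)` expands level by level and returns `['A', 'B', 'D', 'E']`, a shortest path in number of steps.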
### Depth-first (DF) search
* algorithm:
    * expand deepest unexpanded node
    * implementation: fringe is a stack
* evaluation:
    * completeness: no, unless search space is finite and no loops are possible
    * time complexity: bad if m is much larger than d (depth of optimal solution) -- $O(b^m)$. but if there are many solutions, faster than BF
    * space complexity: $O(bm)$. backtracking search uses even less memory: one successor instead of all b
    * optimality: no, same issues as completeness

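The same toy state space searched depth-first: only the implementation of the fringe changes, from a FIFO queue to a LIFO stack (graph and state names again hypothetical):

```python
def dfs(start, goal, successors):
    """Depth-first search: fringe is a LIFO stack; the path found need not be shortest."""
    fringe = [[start]]                 # stack of paths to unexpanded nodes
    while fringe:
        path = fringe.pop()            # expand deepest unexpanded node first
        state = path[-1]
        if state == goal:
            return path
        for s in successors.get(state, []):
            if s not in path:          # avoid loops along the current path
                fringe.append(path + [s])
    return None

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': ['E']}
```

`dfs('A', 'E', graph)` dives down the last-pushed branch first and returns `['A', 'C', 'D', 'E']`; only the current path and its siblings sit in memory.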
### Depth-limited search
* DF search with depth limit l (nodes at depth l have no successors)
* solves infinite-path problem
* evaluation:
    * time: $O(b^l)$
    * space: $O(bl)$
    * completeness: not if l < d
    * optimality: not if l > d

### Iterative deepening search
* strategy to find best depth limit l
* often used in combination with DF search
* after each iteration, throw away everything and increase depth limit
* combines benefits of DF (space complexity) and BF search (time complexity)
* evaluation:
    * completeness: yes (no infinite paths)
    * time: $O(b^d)$
    * space: $O(bd)$

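The two strategies above can be sketched together: a recursive depth-limited search, rerun with an increasing limit (the toy graph is a hypothetical example; `max_depth` is just a safety cap):

```python
def depth_limited(state, goal, successors, limit, path=None):
    """DF search with depth limit: nodes at the limit get no successors."""
    path = path or [state]
    if state == goal:
        return path
    if limit == 0:
        return None                    # cut off: treat node as having no successors
    for s in successors.get(state, []):
        if s not in path:              # avoid loops along the current path
            result = depth_limited(s, goal, successors, limit - 1, path + [s])
            if result:
                return result
    return None

def iterative_deepening(start, goal, successors, max_depth=50):
    """After each iteration, throw everything away and increase the depth limit."""
    for limit in range(max_depth + 1):
        result = depth_limited(start, goal, successors, limit)
        if result:
            return result
    return None

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': ['E']}
```

`iterative_deepening('A', 'E', graph)` finds `['A', 'B', 'D', 'E']`: each pass costs only DF-search memory, but the first limit that succeeds is the solution depth, as in BF search.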
## Informed search (heuristics)

Heuristic function: "educated guess that reduces/limits search for solutions"

informedness property of heuristics:
* two heuristics $h_1(n),\; h_2(n)$ with $0 \leq h_1(n) \leq h_2(n) \leq h^*(n)$
* then $h_2(n)$ is more informed than $h_1(n)$
* with $h_2$ fewer nodes have to be searched than with $h_1$
* but $h_2$ is often more expensive to calculate
* perfect heuristic: $h(n) = h^*(n)$
* trivial heuristic: $h(n) = 0$

Best-first search
* the general approach of informed search
* node selected for expansion based on _evaluation function f(n)_
* evaluation function measures distance to goal, choose the node which appears best
* fringe is a queue sorted in decreasing order of desirability

### A Search
best-known form of best-first search

avoid expanding paths that are already expensive

evaluation function: $f(n) = g(n) + h(n)$
* g(n): the cost so far to reach node n
* h(n): estimated cost to get from node n to the goal
* f(n): estimated total cost of the path through node n to the goal

### A* Search

A search, but with an admissible heuristic
* a heuristic is admissible if it _never_ overestimates the cost to get to the goal
* admissible heuristics are optimistic
* formally: $h(n) \leq h^*(n)$ where $h^*(n)$ is the true cost from n to the goal

evaluation:
* complete: yes
* time: exponential with path length
* space: all nodes are stored
* optimal: yes

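A* can be sketched with a priority queue ordered by $f(n) = g(n) + h(n)$; the weighted graph and heuristic values below are hypothetical examples (the heuristic is admissible for them):

```python
import heapq

def a_star(start, goal, successors, h):
    """A* search: always expand the fringe node with the lowest f(n) = g(n) + h(n)."""
    fringe = [(h(start), 0, start, [start])]       # (f, g, state, path)
    best_g = {start: 0}                            # cheapest known cost to each state
    while fringe:
        f, g, state, path = heapq.heappop(fringe)
        if state == goal:
            return path, g
        for s, step_cost in successors.get(state, []):
            g2 = g + step_cost
            if g2 < best_g.get(s, float('inf')):   # found a cheaper way to s
                best_g[s] = g2
                heapq.heappush(fringe, (g2 + h(s), g2, s, path + [s]))
    return None, float('inf')

# hypothetical weighted state space: state -> [(successor, step cost), ...]
graph = {'A': [('B', 1), ('C', 4)], 'B': [('C', 1), ('D', 5)], 'C': [('D', 1)]}
h = {'A': 2, 'B': 2, 'C': 1, 'D': 0}.get           # admissible: never overestimates
```

`a_star('A', 'D', graph, h)` returns `(['A', 'B', 'C', 'D'], 3)`: the direct `B -> D` edge of cost 5 is never committed to because its f-value stays worse than the detour through C.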
## Adversarial search
search has no adversary, the solution is a (heuristic) method for finding a goal

games have an adversary, the solution is a strategy. time limits force an _approximate_ solution.

you need a function to evaluate the "goodness" of a game position

types of games:

|                       | deterministic                | chance                               |
|-----------------------|------------------------------|--------------------------------------|
| perfect information   | chess, checkers, go, othello | backgammon, monopoly                 |
| imperfect information |                              | bridge, poker, scrabble, nuclear war |

### Minimax

#### Setup
two players: MAX, MIN

MAX moves first, and they take turns until the game is over. the winner gets an award, the loser gets a penalty.

how does this relate to search?
* initial state: game configuration, e.g. with chess
* successor function: list of <move, state> pairs with legal moves
* terminal test: is the game finished?
* utility function: numerical value of terminal states (win +1, lose -1, draw 0)
* MAX uses the search tree to determine the next move

#### Optimal strategies

find the contingent strategy for MAX assuming an infallible MIN.

assume that both players play optimally.

given a game tree, the optimal strategy can be found with the minimax value of each node:

```
    minimax(n) = utility(n)                           if n is a terminal node
                 max of minimax(s) over successors s  if n is a MAX node
                 min of minimax(s) over successors s  if n is a MIN node
```

#### Evaluation
- complete: yes
- time: $O(b^m)$
- space: $O(bm)$
- optimal: yes

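The minimax recurrence can be sketched directly in Python; the game tree and leaf utilities below are hypothetical example values, with MAX to move at the root:

```python
def minimax(node, is_max, utility, successors):
    """Minimax value of a node: utility at terminals, else max/min over children."""
    children = successors.get(node, [])
    if not children:                       # terminal node
        return utility[node]
    values = [minimax(c, not is_max, utility, successors) for c in children]
    return max(values) if is_max else min(values)

# hypothetical 2-ply game tree: MAX picks a branch, then MIN picks a leaf
tree = {'root': ['a', 'b'], 'a': ['a1', 'a2'], 'b': ['b1', 'b2']}
leaves = {'a1': 3, 'a2': 12, 'b1': 2, 'b2': 8}
```

`minimax('root', True, leaves, tree)` returns 3: MIN would answer branch `a` with 3 and branch `b` with 2, so MAX prefers `a`.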
### Reducing problems of complexity with Minimax

#### Cutting off search
instead of `if TERMINAL(state) then return UTILITY(state)`, do `if CUTOFF-TEST(state, depth) then return EVAL(state)`

this introduces a fixed depth limit. it also loses completeness!

utility: value based on quality of state

heuristics: value based on estimation of quality of state

heuristic EVAL:
* produces an estimate of the expected utility of a game from a given position
* should order terminal nodes in the same way as utility
* computation shouldn't take too long

#### Alpha-Beta pruning (efficient Minimax)

with minimax, the number of states is exponential in the number of moves

so, don't examine every node: prune the subtrees you don't have to examine

Alpha: value of the best MAX choice so far

Beta: value of the best MIN choice so far

you prune the rest of the level if, at any point, beta <= alpha.

pruning doesn't affect the final result, and entire subtrees can be pruned

good move ordering improves the effectiveness of pruning

with 'perfect ordering', time complexity is $O(b^{m/2})$

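The same minimax value computed with alpha-beta pruning, on the same hypothetical example tree: whole subtrees are skipped once beta <= alpha, yet the result is unchanged:

```python
def alphabeta(node, is_max, utility, successors,
              alpha=float('-inf'), beta=float('inf')):
    """Minimax with alpha-beta pruning: stop expanding siblings once beta <= alpha."""
    children = successors.get(node, [])
    if not children:                       # terminal node
        return utility[node]
    if is_max:
        value = float('-inf')
        for c in children:
            value = max(value, alphabeta(c, False, utility, successors, alpha, beta))
            alpha = max(alpha, value)      # best MAX choice so far
            if beta <= alpha:
                break                      # prune remaining siblings
        return value
    value = float('inf')
    for c in children:
        value = min(value, alphabeta(c, True, utility, successors, alpha, beta))
        beta = min(beta, value)            # best MIN choice so far
        if beta <= alpha:
            break                          # prune remaining siblings
    return value

tree = {'root': ['a', 'b'], 'a': ['a1', 'a2'], 'b': ['b1', 'b2']}
leaves = {'a1': 3, 'a2': 12, 'b1': 2, 'b2': 8}
```

`alphabeta('root', True, leaves, tree)` returns 3, agreeing with plain minimax; after seeing `b1 = 2` in branch `b`, the leaf `b2` is never evaluated because MIN can already force a value below alpha = 3.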
### Search with no or partial information

problems:
* contingency: percepts provide new info
* exploration: when states/actions of the environment are unknown
* sensorless/conformant: agent may have no idea where it is

#### Perfect information Monte Carlo sampling (rdeep)
1. rollout:
    * assume a belief state (with perfect info)
    * play a random game in that state.
2. average the rollouts
3. choose the move with the max average

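The three steps above can be sketched as a loop; `sample_worlds` (draws one concrete world consistent with the belief state) and `simulate` (plays one random game from it and returns the score) are hypothetical functions assumed to be supplied by the game:

```python
def pimc_choose(moves, sample_worlds, simulate, n_rollouts=100):
    """Perfect information Monte Carlo: average random rollouts per candidate move,
    each rollout played in a sampled determinization of the belief state."""
    averages = {}
    for move in moves:
        total = 0.0
        for _ in range(n_rollouts):
            world = sample_worlds()            # step 1a: assume one concrete world
            total += simulate(move, world)     # step 1b: play a random game in it
        averages[move] = total / n_rollouts    # step 2: average the rollouts
    return max(averages, key=averages.get)     # step 3: move with max average
```

A toy usage: with `simulate = lambda m, w: 1.0 if m == 'good' else 0.0` and any `sample_worlds`, `pimc_choose(['bad', 'good'], ...)` returns `'good'`.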
### Games with chance
![Games with chance](games-chance.png)

## Summary (Schnapsen)
Phase 2: minimax & alpha-beta pruning

Phase 1: PIMC sampling

what next? give the agent information about the game

## Search direction
Data-driven: start with the initial state (e.g. a maze)

Goal-driven: start with the goal state, but this has a bigger branching factor (TODO confirm this)