= State space search =
How do we find solutions to the problems defined previously?
  * search the state space
  * search through explicit tree generation: the root is the initial state, nodes/leaves are generated through the successor function
  * search generates a graph

state: representation of a physical configuration

node: data structure in the search tree. it contains state, parent node, actions, etc.
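
A minimal sketch of such a node structure in Python (the field names are illustrative, not prescribed by the lecture):

{{{python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Node:
    state: Any                       # the physical configuration this node represents
    parent: Optional["Node"] = None  # node this one was expanded from
    action: Any = None               # action that led from the parent to this state
    path_cost: float = 0.0           # cost of the path from the root so far

def solution_path(node: Node) -> list:
    """Walk the parent links back to the root to recover the action sequence."""
    actions = []
    while node.parent is not None:
        actions.append(node.action)
        node = node.parent
    return list(reversed(actions))
}}}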

== Uninformed search strategies ==
the strategy defines the picking order of node expansion

uninformed methods: only use the problem definition

informed methods: use a heuristic function to estimate the cost of a solution

evaluating performance:
  * does it always find a solution if there is one? (completeness)
  * does it find the least-cost solution? (optimality)
  * how many nodes are generated/expanded? (time complexity)
  * how many nodes are stored in memory during the search? (space complexity)
  * complexity is measured in terms of:
      * b: max branching factor of the search tree
      * d: depth of the least-cost solution
      * m: max depth of the state space (could be infinite)

{{file:img/search-alg-summary.png|Summary of uninformed search algorithms}}

=== Breadth-first (BF) search ===
    * algorithm:
        * expand the shallowest unexpanded node
        * implementation: the fringe (nodes still to be explored) is a FIFO queue (see the sketch below)
    * evaluation:
        * completeness: yes, if branching factor b is finite
        * time complexity: if every state has b successors and the solution is at depth d, then $O(b^{d+1})$ nodes are generated
        * space complexity: also $O(b^{d+1})$, since every generated node has to be kept in memory
        * optimality: in general yes, unless actions have different costs
    * memory requirements are a bigger problem than execution time
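
A sketch of BF search, assuming hypothetical `successors(state)` and `is_goal(state)` functions supplied by the problem definition (tree search, so no duplicate detection):

{{{python
from collections import deque

def breadth_first_search(initial_state, is_goal, successors):
    """Expand the shallowest unexpanded node first: the fringe is a FIFO queue."""
    if is_goal(initial_state):
        return [initial_state]
    fringe = deque([[initial_state]])    # queue of paths, shallowest first
    while fringe:
        path = fringe.popleft()          # FIFO: oldest (shallowest) path out first
        for nxt in successors(path[-1]):
            if is_goal(nxt):
                return path + [nxt]      # goal test on generation
            fringe.append(path + [nxt])
    return None                          # finite space exhausted, no solution
}}}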

=== Depth-first (DF) search ===
    * algorithm:
        * expand the deepest unexpanded node
        * implementation: the fringe is a stack (see the sketch below)
    * evaluation:
        * completeness: no, unless the search space is finite and no loops are possible
        * time complexity: $O(b^m)$ -- bad if m is much larger than d (the depth of the shallowest solution), but if there are many solutions it can be faster than BF
        * space complexity: linear, $O(bm)$; backtracking search uses even less memory, generating one successor at a time instead of all b
        * optimality: no, for the same reasons as completeness
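
The same skeleton becomes DF search by turning the fringe into a stack; `successors` and `is_goal` are again assumed to come from the problem definition:

{{{python
def depth_first_search(initial_state, is_goal, successors):
    """Expand the deepest unexpanded node first: the fringe is a stack."""
    fringe = [[initial_state]]           # Python list used as a LIFO stack of paths
    while fringe:
        path = fringe.pop()              # most recently added (deepest) path first
        if is_goal(path[-1]):
            return path
        for nxt in successors(path[-1]):
            if nxt not in path:          # crude cycle check; DF is still not complete
                fringe.append(path + [nxt])
    return None
}}}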

=== Depth-limited search ===
    * DF-search with depth limit l (nodes at depth l have no successors); see the sketch below
    * solves the infinite-path problem
    * evaluation:
        * time: $O(b^l)$
        * space: $O(bl)$
        * completeness: not if l < d
        * optimality: not if l > d
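
A recursive sketch of depth-limited search; the sentinel string "cutoff" distinguishes "hit the limit" from "provably no solution":

{{{python
def depth_limited_search(state, is_goal, successors, limit):
    """DF search that treats nodes at depth `limit` as having no successors."""
    if is_goal(state):
        return [state]
    if limit == 0:
        return "cutoff"                  # the depth limit pruned this branch
    cutoff_occurred = False
    for nxt in successors(state):
        result = depth_limited_search(nxt, is_goal, successors, limit - 1)
        if result == "cutoff":
            cutoff_occurred = True
        elif result is not None:
            return [state] + result      # prepend current state to the found path
    return "cutoff" if cutoff_occurred else None
}}}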

=== Iterative deepening search ===
    * strategy to find the best depth limit l (see the sketch after this list)
    * often used in combination with DF search
    * after each iteration, throw away everything and increase the depth limit
    * combines the benefits of DF search (space complexity) and BF search (time complexity)
    * evaluation:
        * completeness: yes (no infinite paths)
        * time: $O(b^d)$
        * space: $O(bd)$
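
Iterative deepening is then just a loop around depth-limited search (this sketch reuses the `depth_limited_search` function above):

{{{python
from itertools import count

def iterative_deepening_search(initial_state, is_goal, successors):
    """Re-run depth-limited search with limit 0, 1, 2, ... until a result is found."""
    for limit in count():
        result = depth_limited_search(initial_state, is_goal, successors, limit)
        if result != "cutoff":
            return result                # a path, or None if provably no solution
}}}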

== Informed search (heuristics) ==

Heuristic function: "educated guess that reduces/limits search for solutions"

informedness property of heuristics:
    * two heuristics $h_1(n),\; h_2(n)$ with $0 \leq h_1(n) \leq h_2(n) \leq h^*(n)$
    * then $h_2(n)$ is more informed than $h_1(n)$
    * with $h_2$ fewer nodes have to be searched than with $h_1$
    * but $h_2$ is often more expensive to calculate
    * perfect heuristic: $h(n) = h^*(n)$
    * trivial heuristic: $h(n) = 0$

Best-first search
    * the general approach of informed search
    * node selected for expansion based on an _evaluation function f(n)_
    * the evaluation function measures distance to the goal; choose the node which appears best
    * fringe is a queue sorted in order of decreasing desirability (see the sketch below)
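
A sketch of generic best-first search with the fringe as a priority queue; `f` is the evaluation function, and lower f means more desirable (Python's heapq pops the smallest entry first):

{{{python
import heapq
from itertools import count

def best_first_search(initial_state, is_goal, successors, f):
    """Always expand the node that looks best according to f(state)."""
    tiebreak = count()                      # avoids comparing raw states on f-ties
    fringe = [(f(initial_state), next(tiebreak), [initial_state])]
    while fringe:
        _, _, path = heapq.heappop(fringe)  # path ending in the most desirable node
        if is_goal(path[-1]):
            return path
        for nxt in successors(path[-1]):
            heapq.heappush(fringe, (f(nxt), next(tiebreak), path + [nxt]))
    return None
}}}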

=== A Search ===
best-known form of best-first search

avoid expanding paths that are already expensive

evaluation function: $f(n) = g(n) + h(n)$
    * g(n): the cost so far to reach node n
    * h(n): estimated cost to get from node n to the goal
    * f(n): estimated total cost of the path through node n to the goal

=== A* Search ===

A search, but with an admissible heuristic
    * a heuristic is admissible if it _never_ overestimates the cost to get to the goal
    * admissible heuristics are optimistic
    * formally: $h(n) \leq h^*(n)$, where $h^*(n)$ is the true cost from n to the goal
    * e.g. straight-line distance in route finding is admissible: a straight line can never be longer than the actual route

evaluation:
    * complete: yes
    * time: exponential in path length
    * space: all generated nodes are stored
    * optimal: yes
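
A sketch of A*, assuming `successors(state)` yields `(next_state, step_cost)` pairs, `h` is an admissible heuristic, and states are hashable:

{{{python
import heapq
from itertools import count

def a_star_search(initial_state, is_goal, successors, h):
    """Best-first search with f(n) = g(n) + h(n); optimal when h never overestimates."""
    tiebreak = count()
    # fringe entries: (f, tiebreak, g, path)
    fringe = [(h(initial_state), next(tiebreak), 0, [initial_state])]
    best_g = {}                             # cheapest known cost to reach each state
    while fringe:
        _, _, g, path = heapq.heappop(fringe)
        state = path[-1]
        if is_goal(state):
            return path
        if g > best_g.get(state, float("inf")):
            continue                        # already reached this state more cheaply
        best_g[state] = g
        for nxt, step_cost in successors(state):
            g2 = g + step_cost
            if g2 < best_g.get(nxt, float("inf")):
                heapq.heappush(fringe, (g2 + h(nxt), next(tiebreak), g2, path + [nxt]))
    return None
}}}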

== Adversarial search ==
search has no adversary; the solution is a (heuristic) method for finding a goal

games have an adversary; the solution is a strategy. time limits force an _approximate_ solution.

you need a function to evaluate the "goodness" of a game position

types of games:

|                       | deterministic                | chance                               |
|-----------------------|------------------------------|--------------------------------------|
| perfect information   | chess, checkers, go, othello | backgammon, monopoly                 |
| imperfect information |                              | bridge, poker, scrabble, nuclear war |

=== Minimax ===

==== Setup ====
two players: MAX, MIN

MAX moves first, then they take turns until the game is over. the winner gets a reward, the loser gets a penalty.

how does this relate to search?
    * initial state: game configuration, e.g. a chess position
    * successor function: list of <move, state> pairs with legal moves
    * terminal test: is the game finished?
    * utility function: numerical value of terminal states (win +1, lose -1, draw 0)
    * MAX uses the search tree to determine its next move

==== Optimal strategies ====

find the contingent strategy for MAX assuming an infallible MIN.

assume that both players play optimally.

given a game tree, the optimal strategy can be found from the minimax value of each node:

{{{
    minimax(n) = utility(n)                              if n is a terminal node
                 max over successors s of n: minimax(s)  if n is a MAX node
                 min over successors s of n: minimax(s)  if n is a MIN node
}}}
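
The same recursion as a Python sketch; `is_terminal`, `utility` and `successors` are assumed to describe the game:

{{{python
def minimax(state, is_terminal, utility, successors, maximizing):
    """Minimax value of a state; `maximizing` is True when it's MAX's turn."""
    if is_terminal(state):
        return utility(state)
    values = [minimax(s, is_terminal, utility, successors, not maximizing)
              for s in successors(state)]
    return max(values) if maximizing else min(values)
}}}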

==== Evaluation ====
- complete: yes
- time: $O(b^m)$
- space: $O(bm)$
- optimal: yes

=== Reducing problems of complexity with Minimax ===

==== Cutting off search ====
instead of `if TERMINAL(state) then return UTILITY(state)`, do `if CUTOFF-TEST(state, depth) then return EVAL(state)`

this introduces a fixed depth limit. it also loses completeness!

utility: value based on the quality of the state

heuristic: value based on an estimate of the quality of the state

heuristic EVAL (see the sketch below):
    * produces an estimate of the expected utility of the game from a given position
    * should order terminal nodes in the same way as the true utility
    * computation shouldn't take too long

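A sketch of minimax with a cutoff test: the terminal test becomes a depth check, and the true utility is replaced by a heuristic `eval_fn` (both assumed supplied by the game):

{{{python
def minimax_cutoff(state, depth, cutoff_depth, is_terminal, eval_fn, successors,
                   maximizing):
    """Minimax that stops at a fixed depth and scores cut-off states heuristically."""
    if is_terminal(state) or depth >= cutoff_depth:
        return eval_fn(state)            # estimate instead of true utility
    values = [minimax_cutoff(s, depth + 1, cutoff_depth, is_terminal, eval_fn,
                             successors, not maximizing)
              for s in successors(state)]
    return max(values) if maximizing else min(values)
}}}
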
==== Alpha-Beta pruning (efficient Minimax) ====

with minimax, the number of states is exponential in the number of moves

so, don't examine every node: prune the subtrees you don't have to examine

Alpha: value of the best MAX choice so far

Beta: value of the best MIN choice so far

you prune the rest of the level if, at any point, $\beta \leq \alpha$ (see the sketch below)

pruning doesn't affect the final result; entire subtrees can be pruned

good move ordering improves the effectiveness of pruning

with 'perfect ordering', time complexity is $O(b^{m/2})$
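
A sketch of alpha-beta pruning layered on the minimax recursion above:

{{{python
def alphabeta(state, is_terminal, utility, successors, maximizing,
              alpha=float("-inf"), beta=float("inf")):
    """Minimax with alpha-beta pruning: same result, fewer nodes examined."""
    if is_terminal(state):
        return utility(state)
    if maximizing:
        value = float("-inf")
        for s in successors(state):
            value = max(value, alphabeta(s, is_terminal, utility, successors,
                                         False, alpha, beta))
            alpha = max(alpha, value)    # best MAX choice so far
            if beta <= alpha:
                break                    # MIN would never allow this branch
        return value
    else:
        value = float("inf")
        for s in successors(state):
            value = min(value, alphabeta(s, is_terminal, utility, successors,
                                         True, alpha, beta))
            beta = min(beta, value)      # best MIN choice so far
            if beta <= alpha:
                break                    # MAX would never choose this branch
        return value
}}}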

=== Search with no or partial information ===

problems:
    * contingency: percepts provide new info
    * exploration: when the states/actions of the environment are unknown
    * sensorless/conformant: the agent may have no idea where it is

==== Perfect information Monte Carlo sampling (rdeep) ====
1. rollout:
    * assume a belief state (with perfect info)
    * play a random game in that state
2. average the rollouts
3. choose the move with the max average (see the sketch below)
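
A rough sketch of the idea, with everything game-specific assumed given: `sample_world()` returns a perfect-information state consistent with our beliefs, and `play_random_game(world, move)` plays the move followed by a random game, returning our final score:

{{{python
def pimc_choose_move(legal_moves, sample_world, play_random_game, n_rollouts=100):
    """Perfect-information Monte Carlo: for each move, average the scores of
    random playouts in sampled determinized worlds; pick the best average."""
    def average_score(move):
        total = 0.0
        for _ in range(n_rollouts):
            world = sample_world()           # one belief state, made perfect-info
            total += play_random_game(world, move)
        return total / n_rollouts
    return max(legal_moves, key=average_score)
}}}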

=== Games with chance ===
{{file:img/games-chance.png|Games with chance}}

== Summary (Schnapsen) ==
Phase 2: minimax & alpha-beta pruning

Phase 1: PIMC sampling

what next? give the agent information about the game

== Search direction ==
Data-driven: start with the initial state (e.g. solving a maze)

Goal-driven: start from the goal state, but this can have a bigger branching factor (TODO confirm this)