+++
title = "State space search"
template = 'page-math.html'
+++

# State space search
How do we find the solutions of previous problems?
* search the state space
* search through explicit tree generation: root is initial state, nodes/leaves generated through successor function
* search generates a graph

state: representation of a physical configuration

node: data structure in the search tree. it contains state, parent node, actions, etc.

## Uninformed search strategies
strategy defines picking order of node expansion

uninformed methods: only use problem definition

informed methods: use heuristic function to estimate cost of solution

evaluating performance:
* does it always find a solution if there is one? (completeness)
* does it find the least-cost solution? (optimality)
* how many nodes are generated/expanded? (time complexity)
* how many nodes are stored in memory during search? (space complexity)
* complexity is measured in terms of:
    * b: max branching factor of the search tree
    * d: depth of the least-cost solution
    * m: max depth of the state space, could be infinite

![Summary of uninformed search algorithms](search-alg-summary.png)

### Breadth-first (BF) search
* algorithm:
    * expand shallowest unexpanded node
    * implementation: fringe (nodes that have to be explored) is a FIFO queue
* evaluation:
    * completeness: yes, if branching factor b is finite
    * time complexity: if every state has b successors, and the solution is at depth d, then $O(b^{d+1})$ because of the number of nodes generated
    * space complexity: shitty if every node has to be in memory -- $O(b^{d+1})$
    * optimality: in general yes, unless actions have different costs
* memory requirements are a bigger problem than execution time

### Depth-first (DF) search
* algorithm:
    * expand deepest unexpanded node
    * implementation: fringe is a stack
* evaluation:
    * completeness: no, unless search space is finite and no loops are possible
    * time complexity: shitty if m is larger than d (depth of optimal solution) -- $O(b^m)$. but if there are many solutions, it can be faster than BF
    * space complexity: $O(bm)$ (only bm + 1 nodes stored); backtracking search uses even less memory, one successor instead of all b
    * optimality: no, same issues as completeness
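BF and DF search differ only in how the fringe is ordered. A minimal sketch of that idea, assuming a hypothetical `successors(state)` function and a `goal_test` predicate (these interfaces are not part of the notes):

```python
from collections import deque

def tree_search(start, goal_test, successors, mode="bf"):
    """Generic tree search: BF uses a FIFO fringe, DF uses a LIFO fringe."""
    fringe = deque([(start, [start])])  # (state, path to that state)
    while fringe:
        # FIFO pop -> shallowest node first (BF); LIFO pop -> deepest first (DF)
        state, path = fringe.popleft() if mode == "bf" else fringe.pop()
        if goal_test(state):
            return path
        for s in successors(state):
            fringe.append((s, path + [s]))
    return None  # finite space exhausted without finding a goal
```

The depth-limited and iterative deepening variants below only change when a node may be expanded.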
### Depth-limited search
* DF-search with depth limit l (nodes at depth l have no successors)
* solves the infinite-path problem
* evaluation:
    * time: $O(b^l)$
    * space: $O(bl)$
    * completeness: not if l < d
    * optimality: not if l > d

### Iterative deepening search
* strategy to find the best depth limit l
* often used in combination with DF search
* after each iteration, throw away everything and increase the depth limit
* combines benefits of DF search (space complexity) and BF search (time complexity)
* evaluation:
    * completeness: yes (no infinite paths)
    * time: $O(b^d)$
    * space: $O(bd)$

## Informed search (heuristics)

Heuristic function: "educated guess that reduces/limits search for solutions"

informedness property of heuristics:
* two heuristics $h_1(n),\; h_2(n)$ with $0 \leq h_1(n) \leq h_2(n) \leq h^*(n)$
* then $h_2(n)$ is more informed than $h_1(n)$
* with $h_2$ fewer nodes have to be searched than with $h_1$
* but $h_2$ is often more expensive to calculate
* perfect heuristics: $h(n) = h^*(n)$
* trivial heuristics: $h(n) = 0$

Best-first search
* the general approach of informed search
* node selected for expansion based on an _evaluation function f(n)_
* evaluation function measures distance to the goal, choose the node which appears best
* fringe is a queue in order of decreasing desirability

### A Search
best-known form of best-first search

avoid expanding paths that are already expensive

evaluation function: $f(n) = g(n) + h(n)$
* g(n): the cost so far to reach node n
* h(n): estimated cost to get from node n to the goal
* f(n): estimated total cost of the path through node n to the goal

### A* Search

A search, but with an admissible heuristic
* a heuristic is admissible if it _never_ overestimates the cost to get to the goal
* admissible heuristics are optimistic
* formally: $h(n) \leq h^*(n)$ where $h^*(n)$ is the true cost from n to the goal

evaluation:
* complete: yes
* time: exponential in path length
* space: all nodes are stored
* optimal: yes
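A minimal A* sketch, assuming hashable states and hypothetical interfaces (not from the notes): `successors(state)` yields `(state, step_cost)` pairs and `h` is the heuristic. The fringe is a priority queue ordered by $f(n) = g(n) + h(n)$:

```python
import heapq
from itertools import count

def a_star(start, goal_test, successors, h):
    """A* search: always expand the fringe node with the lowest f(n) = g(n) + h(n)."""
    tie = count()  # tie-breaker so heapq never has to compare states
    fringe = [(h(start), next(tie), 0, start, [start])]  # (f, tie, g, state, path)
    best_g = {start: 0}
    while fringe:
        _, _, g, state, path = heapq.heappop(fringe)
        if goal_test(state):
            return path, g  # with an admissible h, this is a least-cost path
        for s, cost in successors(state):
            g2 = g + cost
            if g2 < best_g.get(s, float("inf")):  # found a cheaper path to s
                best_g[s] = g2
                heapq.heappush(fringe, (g2 + h(s), next(tie), g2, s, path + [s]))
    return None
```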
## Adversarial search
search has no adversary, solution is a (heuristic) method for finding a goal

games have an adversary, solution is a strategy. time limits force an _approximate_ solution.

you need a function to evaluate the "goodness" of a game position

types of games:

|                       | deterministic                | chance                               |
|-----------------------|------------------------------|--------------------------------------|
| perfect information   | chess, checkers, go, othello | backgammon, monopoly                 |
| imperfect information |                              | bridge, poker, scrabble, nuclear war |

### Minimax

#### Setup
two players: MAX, MIN

MAX moves first, they take turns until the game is over. winner gets an award, loser gets a penalty.

how does this relate to search?
* initial state: game configuration, e.g. with chess
* successor function: list of <move, state> pairs with legal moves
* terminal test: is the game finished?
* utility function: numerical value of terminal states (win +1, lose -1, draw 0)
* MAX uses the search tree to determine the next move

#### Optimal strategies

find the contingent strategy for MAX assuming an infallible MIN.

assume that both players play optimally.

given a game tree, the optimal strategy can be found with the minimax value of each node:

```
minimax(n) = utility(n)                              if n is a terminal node
             max(minimax(s) for successors s of n)   if n is a MAX node
             min(minimax(s) for successors s of n)   if n is a MIN node
```

#### Evaluation
- complete: yes
- time: $O(b^m)$
- space: $O(bm)$
- optimal: yes

### Reducing problems of complexity with Minimax

#### Cutting off search
instead of `if TERMINAL(state) then return UTILITY(state)` do `if CUTOFF-TEST(state, depth) then return EVAL(state)`

this introduces a fixed depth limit. it also loses completeness!

utility: value based on quality of state

heuristics: value based on estimation of quality of state

heuristic EVAL:
* produces an estimate of the expected utility of a game from a given position
* should order terminal nodes in the same way as utility
* computation shouldn't take too long

#### Alpha-Beta pruning (efficient Minimax)

with minimax, the number of states is exponential in the number of moves

so, don't examine every node and prune the subtrees you don't have to examine

Alpha: value of the best MAX choice so far

Beta: value of the best MIN choice so far

you prune the rest of the level if, at any point, beta <= alpha (a code sketch follows at the end of these notes)

pruning doesn't affect the final result, and entire subtrees can be pruned

good move ordering improves the effectiveness of pruning

with 'perfect ordering', time complexity is $O(b^{m/2})$

### Search with no or partial information

problems:
* contingency: percepts provide new info
* exploration: when states/actions of the environment are unknown
* sensorless/conformant: agent may have no idea where it is

#### Perfect information Monte Carlo sampling (rdeep)
1. rollout:
    * assume a belief state (with perfect info)
    * play a random game in that state
2. average the rollouts
3. choose the one with the max average

### Games with chance
![Games with chance](games-chance.png)

## Summary (Schnapsen)
Phase 2: minimax & alpha-beta pruning

Phase 1: PIMC sampling

what next? give the agent information about the game

## Search direction
Data-driven: start with initial state (e.g. a maze)

Goal-driven: start with goal state, but has a bigger branching factor (TODO confirm this)
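To make the alpha-beta pruning idea from the adversarial search section concrete, a sketch of minimax with pruning. The `is_terminal`, `utility`, and `successors` interfaces are hypothetical placeholders, not from the notes:

```python
import math

def alphabeta(state, is_terminal, utility, successors,
              alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax value of `state`, skipping subtrees that cannot change the result."""
    if is_terminal(state):
        return utility(state)
    if maximizing:  # MAX: take the highest child value
        value = -math.inf
        for s in successors(state):
            value = max(value, alphabeta(s, is_terminal, utility, successors,
                                         alpha, beta, False))
            alpha = max(alpha, value)
            if beta <= alpha:  # MIN already has a better option elsewhere: prune
                break
        return value
    else:  # MIN: take the lowest child value
        value = math.inf
        for s in successors(state):
            value = min(value, alphabeta(s, is_terminal, utility, successors,
                                         alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:
                break
        return value
```

Pruning never changes the returned value, only how many nodes are visited; with good move ordering the best moves are tried first, so the `beta <= alpha` cutoff fires earlier.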