= State space search =
How do we find the solutions to the problems described earlier?
* search the state space
* search through explicit tree generation: the root is the initial state, nodes/leaves are generated through the successor function
* search generates a graph

state: representation of a physical configuration

node: data structure in the search tree; it contains the state, parent node, actions, etc.

== Uninformed search strategies ==
a strategy defines the order in which nodes are picked for expansion

uninformed methods: only use the problem definition

informed methods: use a heuristic function to estimate the cost of a solution

evaluating performance:
* does it always find a solution if there is one? (completeness)
* does it find a least-cost solution? (optimality)
* how many nodes are generated/expanded? (time complexity)
* how many nodes are stored in memory during search? (space complexity)
* complexity is measured in terms of:
    * b: max branching factor of the search tree
    * d: depth of the least-cost solution
    * m: max depth of the state space (could be infinite)

{{file:img/search-alg-summary.png|Summary of uninformed search algorithms}}

=== Breadth-first (BF) search ===
* algorithm:
    * expand the shallowest unexpanded node
    * implementation: the fringe (nodes that still have to be explored) is a FIFO queue; see the sketch after the DF section
* evaluation:
    * completeness: yes, if the branching factor b is finite
    * time complexity: if every state has b successors and the solution is at depth d, then $O(b^{d+1})$, because of the number of nodes generated
    * space complexity: poor, since every generated node has to be kept in memory -- $O(b^{d+1})$
    * optimality: in general yes, unless actions have different costs
* memory requirements are a bigger problem than execution time

=== Depth-first (DF) search ===
* algorithm:
    * expand the deepest unexpanded node
    * implementation: fringe is a stack (LIFO)
* evaluation:
    * completeness: no, unless the search space is finite and no loops are possible
    * time complexity: poor if m is much larger than d (the depth of the optimal solution) -- $O(b^m)$; but if there are many solutions, it can be faster than BF
    * space complexity: $O(bm)$; a backtracking variant uses even less memory, generating one successor at a time instead of all b
    * optimality: no, same issues as completeness
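The BF/DF distinction above comes down to how the fringe is managed. A minimal sketch in Python, assuming hypothetical `successors(state)` and `is_goal(state)` helpers that are not defined in these notes:

{{{python
from collections import deque

def tree_search(initial_state, successors, is_goal, strategy="bfs"):
    """Generic tree search: BF and DF differ only in the fringe discipline."""
    fringe = deque([(initial_state, [initial_state])])  # (state, path so far)
    while fringe:
        # FIFO queue -> breadth-first; LIFO stack -> depth-first
        state, path = fringe.popleft() if strategy == "bfs" else fringe.pop()
        if is_goal(state):
            return path
        for next_state in successors(state):
            fringe.append((next_state, path + [next_state]))
    return None  # no solution found
}}}

Depth-limited and iterative deepening search (next sections) would add a depth bound to the depth-first case and retry with increasing limits.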
=== Depth-limited search ===
* DF-search with depth limit l (nodes at depth l have no successors)
* solves the infinite-path problem
* evaluation:
    * time: $O(b^l)$
    * space: $O(bl)$
    * completeness: not if l < d
    * optimality: not if l > d

=== Iterative deepening search ===
* strategy to find the best depth limit l
* often used in combination with DF search
* after each iteration, throw away everything and increase the depth limit
* combines the benefits of DF search (space complexity) and BF search (time complexity)
* evaluation:
    * completeness: yes (no infinite paths)
    * time: $O(b^d)$
    * space: $O(bd)$

== Informed search (heuristics) ==

Heuristic function: "educated guess that reduces/limits the search for solutions"

informedness property of heuristics:
* given two heuristics $h_1(n),\; h_2(n)$ with $0 \leq h_1(n) \leq h_2(n) \leq h^*(n)$
* then $h_2(n)$ is more informed than $h_1(n)$
* with $h_2$ fewer nodes have to be searched than with $h_1$
* but $h_2$ is often more expensive to calculate
* perfect heuristic: $h(n) = h^*(n)$
* trivial heuristic: $h(n) = 0$

Best-first search
* the general approach of informed search
* node selected for expansion based on an _evaluation function f(n)_
* the evaluation function measures the distance to the goal; choose the node which appears best
* fringe is a queue sorted in order of decreasing desirability

=== A Search ===
best-known form of best-first search

avoid expanding paths that are already expensive

evaluation function: $f(n) = g(n) + h(n)$
* g(n): the cost so far to reach node n
* h(n): estimated cost to get from node n to the goal
* f(n): estimated total cost of the path through node n to the goal

=== A* Search ===

A search, but with an admissible heuristic (a sketch follows after this section)
* a heuristic is admissible if it _never_ overestimates the cost to get to the goal
* admissible heuristics are optimistic
* formally: $h(n) \leq h^*(n)$, where $h^*(n)$ is the true cost from n to the goal

evaluation:
* complete: yes
* time: exponential in the path length
* space: all nodes are stored
* optimal: yes
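As an illustration of the A* evaluation function above, a minimal sketch in Python; the `successors(state)` helper (returning `(next_state, step_cost)` pairs), `heuristic(state)`, and `is_goal(state)` are assumptions, not part of these notes:

{{{python
import heapq
from itertools import count

def a_star(start, successors, heuristic, is_goal):
    """A* search: always expand the node with the lowest f(n) = g(n) + h(n)."""
    tie = count()  # tiebreaker so the heap never has to compare states directly
    fringe = [(heuristic(start), next(tie), 0, start, [start])]
    best_g = {start: 0}  # cheapest cost found so far to reach each state
    while fringe:
        f, _, g, state, path = heapq.heappop(fringe)
        if is_goal(state):
            return path, g                       # path and its total cost g(n)
        for next_state, step_cost in successors(state):
            new_g = g + step_cost
            if new_g < best_g.get(next_state, float("inf")):
                best_g[next_state] = new_g
                heapq.heappush(fringe, (new_g + heuristic(next_state),
                                        next(tie), new_g, next_state,
                                        path + [next_state]))
    return None, float("inf")                    # no path to a goal state
}}}

With an admissible heuristic (and, for the duplicate check used here, a consistent one), the first goal popped from the fringe is an optimal solution.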
== Adversarial search ==
search has no adversary; the solution is a (heuristic) method for finding a goal

games have an adversary; the solution is a strategy. time limits force an _approximate_ solution.

you need a function to evaluate the "goodness" of a game position

types of games:

|                       | deterministic                | chance                               |
|-----------------------|------------------------------|--------------------------------------|
| perfect information   | chess, checkers, go, othello | backgammon, monopoly                 |
| imperfect information |                              | bridge, poker, scrabble, nuclear war |

=== Minimax ===

==== Setup ====
two players: MAX, MIN

MAX moves first, then they take turns until the game is over. the winner gets a reward, the loser gets a penalty.

how does this relate to search?
* initial state: game configuration, e.g. the board in chess
* successor function: list of <move, state> pairs with legal moves
* terminal test: is the game finished?
* utility function: numerical value of terminal states (win +1, lose -1, draw 0)
* MAX uses a search tree to determine its next move

==== Optimal strategies ====

find the contingent strategy for MAX assuming an infallible MIN.

assume that both players play optimally.

given the game tree, the optimal strategy can be found from the minimax value of each node:

{{{
minimax(n) = utility(n)                                if n is a terminal node
             max(minimax(s) for s in successors(n))    if n is a MAX node
             min(minimax(s) for s in successors(n))    if n is a MIN node
}}}

==== Evaluation ====
* complete: yes
* time: $O(b^m)$
* space: $O(bm)$
* optimal: yes

=== Reducing problems of complexity with Minimax ===

==== Cutting off search ====
instead of `if TERMINAL(state) then return UTILITY(state)`, do `if CUTOFF-TEST(state, depth) then return EVAL(state)`

this introduces a fixed depth limit. it also loses completeness!

utility: value based on the quality of the state

heuristics: value based on an estimate of the quality of the state

heuristic EVAL:
* produces an estimate of the expected utility of the game from a given position
* should order terminal nodes in the same way as the true utility
* computation shouldn't take too long

==== Alpha-Beta pruning (efficient Minimax) ====

with minimax, the number of states is exponential in the number of moves

so, don't examine every node: prune the subtrees you don't have to examine

Alpha: value of the best MAX choice so far

Beta: value of the best MIN choice so far

you prune the rest of the level if, at any point, beta <= alpha.

pruning doesn't affect the final result; entire subtrees can be pruned

good move ordering improves the effectiveness of pruning

with 'perfect ordering', time complexity is $O(b^{m/2})$ (a sketch combining minimax and alpha-beta pruning is at the end of these notes)

=== Search with no or partial information ===

problems:
* contingency: percepts provide new info
* exploration: when the states/actions of the environment are unknown
* sensorless/conformant: the agent may have no idea where it is

==== Perfect information Monte Carlo sampling (rdeep) ====
1. rollout:
    * assume a belief state (with perfect info)
    * play a random game in that state
2. average the rollouts
3. choose the move with the max average

=== Games with chance ===
{{file:img/games-chance.png|Games with chance}}

== Summary (Schnapsen) ==
Phase 2: minimax & alpha-beta pruning

Phase 1: PIMC sampling

what next? give the agent information about the game

== Search direction ==
Data-driven: start with the initial state (e.g. a maze)

Goal-driven: start with the goal state, but this has a bigger branching factor (TODO confirm this)
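Tying together the Minimax and Alpha-Beta pruning sections above, a minimal sketch in Python; `successors(state)`, `is_terminal(state)`, and `utility(state)` are assumed helpers, not defined in these notes:

{{{python
import math

def alphabeta(state, successors, is_terminal, utility,
              alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax value of `state`, computed with alpha-beta pruning."""
    if is_terminal(state):
        return utility(state)           # e.g. win +1, lose -1, draw 0
    if maximizing:                      # MAX node: alpha is the best MAX option so far
        value = -math.inf
        for child in successors(state):
            value = max(value, alphabeta(child, successors, is_terminal,
                                         utility, alpha, beta, False))
            alpha = max(alpha, value)
            if beta <= alpha:           # MIN will never allow this branch
                break                   # prune the remaining children
        return value
    else:                               # MIN node: beta is the best MIN option so far
        value = math.inf
        for child in successors(state):
            value = min(value, alphabeta(child, successors, is_terminal,
                                         utility, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:           # MAX will never allow this branch
                break
        return value
}}}

A cutoff version, as described in the "Cutting off search" section, would replace `is_terminal`/`utility` with a depth-based CUTOFF-TEST and a heuristic EVAL.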