【14】CS188 Midterm Cheatsheet

  1. A search problem consists of: A state space, A successor function (with actions, costs), A start state and a goal test. A solution is a sequence of actions (a plan) which transforms the start state to a goal state.

  2. Uniform-cost orders by path cost, or backward cost g(n) Greedy orders by goal proximity, or forward cost h(n) A* Search orders by the sum: f(n) = g(n) + h(n)

  3. Manhattan distance, Euclidean distance

  4. A heuristic h is admissible if $0 \leqslant h(n) \leqslant h^*(n)$

  5. Graph Search Idea: never expand a state twice Tree search + set of expanded states (“closed set”). Expand the search tree node-by-node, but before expanding a node, check to make sure its state has never expanded before. If not new, skip it, if new add to closed set. Important: store the closed set as a set, not a list

  6. Admissibility: heuristic cost ≤ actual cost to goal, h(A) ≤ actual cost from A to G. Consistency: heuristic “arc” cost ≤ actual cost for each, h(A) – h© ≤ cost(A to C).

  7. Consistency implies admissibility

  8. minimax efficiency: time $O(b^m)$ space $O(bm)$ (b: Branch factor, m: depth)

  9. Alpha-Beta pruning: α: MAX’s best option on path to root, β: MIN’s best option on path to root max节点:if $v \geqslant \beta$ return $v$ , $\alpha = max(\alpha, v)$ ;min节点:if $v \leqslant \alpha$ return $v$ , $\beta = min(\beta, v)$ . With perfect ordering: Time complexity drops to $O(b^{m/2})$

  10. An MDP is defined by: A set of states s $\in$ S, A set of actions a $\in$ A, A transition function P(s’|s,a) or T(s, a, s’), A reward function R(s, a, s’), A start state $s_0$, Maybe a terminal state.

  11. The Bellman Equations: $V^* = max_{a} Q^*(s, a)$ $Q^*(s, a) = \sum_{s'} T(s, a, s') [R(s, a, s') + \gamma V^*(s')]$ $V^*(s) = max_a \sum_{s'} T(s, a, s') [R(s, a, s') + \gamma V^*(s')]$

  12. Value Iteration $V_0(s) = 0$ , $V_{k+1}(s) \gets max_a \sum_{s'} T(s, a, s') [R(s, a, s') + \gamma V_k (s')]$ complexity of each iteration: $O(S^2 A)$

  13. Policy Evaluation relation: $V^\pi(s) = \sum_{s'} T(s, \pi(s), s') [R(s, \pi(s), s') + \gamma V^\pi(s')]$ $V_0^\pi(s) = 0$ , $V_{k+1}^\pi(s) \gets \sum_{s'} T(s, \pi(s), s') [R(s, \pi(s), s') + \gamma V_k^\pi(s')]$ efficiency: $O(S^2)$ policy extraction: $\pi^*(s) = argmax_a \sum_{s'} T(s, a, s') [R(s, a, s') + \gamma V^*(s')]$ $\pi^* (s) = argmax_a Q^*(s, a)$

  14. Policy Iteration Evaluation: $V_{k+1}^{\pi_i}(s) \gets \sum_{s'} T(s, \pi_i (s), s') [R(s, \pi_i (s), s') + \gamma V_k^{\pi_i}(s')]$ Improvement: $\pi_{i+1}(s) = argmax_a \sum_{s'} T(s, a, s') [R(s, a, s') + \gamma V^{\pi_i}(s')]$

  15. Temporal difference learning Sample of V(s): $sample = R(s, \pi(s), s') + \gamma V^\pi(s')$ Update to V(s): $V^\pi(s) \gets (1-\alpha)V^\pi(s) + (\alpha)sample$ Same Update: $V^\pi(s) \gets V^\pi(s) + \alpha(sample - V^\pi(s))$

  16. Q-Learning $Q_{k+1}(s, a) \gets \sum_{s'} T(s, a, s') [R(s, a, s') + \gamma max_{a'}Q_k(s', a')]$ $sample = R(s, a, s') + \gamma max_{a’}Q(s', a')$ $Q(s, a) \gets (1 - \alpha)Q(s, a) + (\alpha) [sample]$

  17. Approximate Q-Learning $Q(s, a) = w_1f_1(s, a) + w_2f_2(s, a) + ... + w_nf_n(s, a)$ $difference = [R(s, a, s') + \gamma max_{a'}Q(s', a')] −Q(s, a)$ $w_i \gets w_i + \alpha \cdot difference \cdot f_i(s)$

  18. CSP Variables, Domains, Constraints Unary / Binary / N-ary

  19. Backtracking Search (DFS with two improvements) Idea 1: One variable at a time Idea 2: Check constraints as you go

  20. Arc consistency An arc X $\to$ Y is consistent iff for every x in the tail there is some y in the head which could be assigned without violating a constraint. Forward Checking: Cross off values that violate a constraint when added to the existing assignment. (Enforcing consistency of arcs pointing to each new assignment) Variable Ordering: Minimum remaining values (MRV): Choose the variable with the fewest legal values left in its domain. Value Ordering: Least Constraining Value: Given a choice of variable, choose the least constraining value.

  21. Tree-Structured CSPs, Cutset Conditioning

  22. Propositional Logic: Knowledge base = set of sentences in a formal language Syntax: What sentences are allowed? Semantics: What are the possible worlds? Which sentences are true in which worlds? (i.e., definition of truth) Entailment: $\alpha \vDash \beta$ iff in every world where $\alpha$ is true, $\beta$ is also true Inference: Method 1: model-checking: For every possible world, if α is true make sure that is β true too. Method 2: theorem-proving: Search for a sequence of proof steps (applications of inference rules) leading from α to β. $A \to B \Leftrightarrow \neg A \vee B$ , $A \leftrightarrow B \Leftrightarrow (A \land B) \lor (\neg A \land \neg B)$ , $A \leftrightarrow B \Leftrightarrow (A \lor \neg B) \land (\neg A \lor B)$ , $A \leftrightarrow B \Leftrightarrow (A \to B) \land (B \to A)$ 合取(Conjunction) 析取(Disjunction)

  23. DPLL algorithm Early termination: stop if all clauses are satisfied, or any clause is falsified. Pure literals: if all occurrences of a symbol in as-yet-unsatisfied clauses have the same sign, then give the symbol that value. Unit clauses: if a clause is left with a single literal, set symbol to satisfy clause.

  24. Successor-state Axiom $X_t \Leftrightarrow [X_{t-1} \land \neg(Some Action_{t-1} Made It False)] \lor [\neg X_{t-1} \land (Some Action_{t-1} Made It True)]$

  25. Baye’s Chain Rule $P(x_1, x_2, x_3) = P(x_1)P(x_2|x_1)P(x_3|x_1, x_2)$ $P(x_1, x_2, ...x_n) = \prod_i P(x_i | x_1...x_{i-1})$

  26. Conditional Independence X is conditionally independent of Y given Z if and only if: $\forall x, y, x: P(x, y|z) = P(x|z)P(y|z)$ or, equivalently, if and only if $\forall x, y, z: P(x|z, y) = P(x|z)$

  27. Bayes net global semantics $P(X_1,..,X_n) = \prod_i P(X_i | Parents(X_i))$


1022 Words

2019-03-21 09:30 +0800