71 Cards in this Set

  • Front
  • Back

Frequentist view of probability

Probabilities come from relative frequencies

Subjectivist view of probability

Probabilities are degrees of belief

Expectation equation

E[X] = Σ_x x P(X = x)

Independence of A and B


(and & or)

P(A and B) = P(A)P(B)




P(A or B) = P(A) + P(B) - P(A)P(B)

Mutual exclusion of A and B


(and & or)

P(A or B) = P(A) + P(B)




P(A and B) = 0

Conditional probability


P(a|b) = P(a, b) / P(b)

Bayes Rule

P(a|b) = P(b|a) P(a) / P(b)


advantage: don't need joint P(a,b)
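Not from the deck: a minimal sketch of the two cards above, computed from a hand-made joint table in plain Python. The event names and every number are invented for illustration.

```python
# Toy joint distribution over (Disease, Test); every number here is made up.
joint = {
    ("sick", "pos"): 0.09, ("sick", "neg"): 0.01,
    ("well", "pos"): 0.10, ("well", "neg"): 0.80,
}

def marginal(index, value):
    """Sum the joint over the other variable."""
    return sum(p for outcome, p in joint.items() if outcome[index] == value)

# Conditional probability straight from the joint: P(sick | pos) = P(sick, pos) / P(pos)
p_sick_given_pos = joint[("sick", "pos")] / marginal(1, "pos")

# Bayes rule: P(sick | pos) = P(pos | sick) P(sick) / P(pos), no joint lookup required
p_pos_given_sick = joint[("sick", "pos")] / marginal(0, "sick")  # would normally come from a CPT
p_sick_given_pos_bayes = p_pos_given_sick * marginal(0, "sick") / marginal(1, "pos")

print(p_sick_given_pos, p_sick_given_pos_bayes)  # both ≈ 0.474
```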


JPD

Joint probability distribution

Joint distributions

P(X1, …, Xn)

From this you can infer P(X1), P(X1|Xs) etc.


What are joint distributions useful for?

Classification


- P(X1 | X2 … Xn)




Co-occurrence


- P(Xa, Xb)




Rare event detection


- P(X1, …, Xn)

Conditional independence of A and B

P(A|B, C) = P(A|C)




P(A, B|C) = P(A|C) P(B|C)

Bayes Nets




Given ____, each RV independent of ____

Given parents, each RV independent of non-descendants

CPT

Conditional probability table


(use in Bayes Nets)

JPD decomposes to...


P(x1, ..., xn) = ∏_i P(xi | parents(Xi))

What does the CPT show? (equation)

P(Xi | parents(Xi)): the probability of each value of Xi for every combination of its parents' values

Naive Bayes

Want to find
P(S|W1 … Wn)

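Not from the deck: a small Naive Bayes sketch in the card's spam framing (S = spam, Wi = words). The corpus, the counts, and the add-alpha smoothing constant are invented for illustration.

```python
import math
from collections import Counter

# Invented tiny labeled corpus: (words, is_spam)
data = [
    (["win", "money", "now"], True),
    (["win", "prize"], True),
    (["meeting", "tomorrow"], False),
    (["lunch", "tomorrow", "now"], False),
]

class_counts = Counter(label for _, label in data)        # estimates of P(S)
word_counts = {True: Counter(), False: Counter()}         # estimates of P(Wi | S)
for words, label in data:
    word_counts[label].update(words)
vocab = {w for counts in word_counts.values() for w in counts}

def predict(words, alpha=1.0):
    """argmax_S P(S) * Π_i P(Wi | S), computed in log space with add-alpha smoothing."""
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for w in words:
            score += math.log((word_counts[label][w] + alpha) / (total + alpha * len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict(["win", "money"]))       # True  (spam)
print(predict(["lunch", "tomorrow"]))  # False (not spam)
```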

Sampling the conditional (2 approaches)

Rejection sampling


- Sample, throw away when mismatch occurs (B != b)




Importance sampling


- Bias the sampling process to get more “hits”


- Use a reweighting trick to unbias the probabilities
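A sketch of the rejection-sampling idea on an invented two-variable model; the importance-sampling variant would instead bias how B is drawn and reweight each sample to undo that bias.

```python
import random

# Invented toy model: sample B, then A given B.
def sample_model():
    b = random.random() < 0.2                     # P(B = true) = 0.2
    a = random.random() < (0.9 if b else 0.3)     # P(A = true | B)
    return a, b

def rejection_sample(n=100_000):
    """Estimate P(A = true | B = true): sample, throw away whenever B != true."""
    kept, hits = 0, 0
    for _ in range(n):
        a, b = sample_model()
        if not b:          # mismatch with the evidence -> reject the sample
            continue
        kept += 1
        hits += a
    return hits / kept if kept else float("nan")

print(rejection_sample())   # ≈ 0.9, but only about 20% of the samples survive
```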

PDF

probability density function

How to use a PDF (equation)

P(a ≤ X ≤ b) = ∫_a^b p(x) dx

Markov assumption

State at t depends only on state at t-1


(Future independent of past given present)




We can write


P(St|St-1, St-2, ..., S0) as


P(St|St-1)

Markov model

A model operating under the Markov assumption

Markov chain

Sequence of events generated by Markov model

Assumptions for state machine (Markov model)

- Markov assumption


- Transition probabilities don’t change with time


- Event space doesn’t change with time


- Time moves in discrete increments
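A tiny state-machine sketch under exactly these assumptions; the two weather states and the transition numbers are invented.

```python
import random

# Invented transition probabilities P(next | current); each row sums to 1.
T = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def sample_chain(start, steps):
    """Generate a Markov chain: each state depends only on the previous one."""
    state, chain = start, [start]
    for _ in range(steps):
        r, cum = random.random(), 0.0
        for nxt, p in T[state].items():
            cum += p
            if r < cum:
                state = nxt
                break
        chain.append(state)
    return chain

print(sample_chain("sunny", 10))
```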

HMM

Hidden Markov Model

Uses for HMM

Monitoring/Filtering


- P(St | E0 … Et)


- Estimate patient disease state


- Where is the robot?




Prediction


- P(St | E0 … Ek), k < t


- Where will the robot be in 3 steps?




Smoothing


- P(St | E0 … Ek), k > t


- Where was the robot at t=3?




Most likely state sequence


- What was the robot's path?
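Not from the deck: a minimal monitoring/filtering sketch, P(St | E0 … Et), on an invented two-room robot HMM (all probabilities made up).

```python
# Invented tiny HMM: hidden state = robot's room, evidence = sensor reading.
states = ["A", "B"]
prior = {"A": 0.5, "B": 0.5}
T = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.3, "B": 0.7}}               # P(S_t | S_t-1)
E = {"A": {"ping": 0.9, "none": 0.1}, "B": {"ping": 0.2, "none": 0.8}}   # P(E_t | S_t)

def filter_belief(observations):
    """Monitoring/filtering: belief_t(s) ∝ P(e_t | s) * Σ_s' belief_t-1(s') P(s | s')."""
    belief = dict(prior)
    for obs in observations:
        unnormalized = {}
        for s in states:
            predicted = sum(belief[sp] * T[sp][s] for sp in states)   # predict step
            unnormalized[s] = E[s][obs] * predicted                   # update on evidence
        z = sum(unnormalized.values())
        belief = {s: p / z for s, p in unnormalized.items()}
    return belief

print(filter_belief(["ping", "ping", "none"]))   # ≈ P(S_t | E_0 .. E_t) for each room
```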

Viterbi Algorithm

P(S0) = P(S0 | O0)




P(S1) = P(S0) P(S1 | S0) P(S1 | O1)


= P(S0 | O0) P(S1 | S0) P(S1 | O1)




...and recurse
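A compact Viterbi sketch on the same kind of invented toy HMM as above; it keeps the max over predecessors plus backpointers rather than a sum.

```python
def viterbi(observations, states, prior, T, E):
    """Most likely state sequence: like filtering, but max over predecessors, with backpointers."""
    # delta[s] = probability of the best path ending in state s after the first observation
    delta = {s: prior[s] * E[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_delta, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda sp: delta[sp] * T[sp][s])
            ptr[s] = best_prev
            new_delta[s] = delta[best_prev] * T[best_prev][s] * E[s][obs]
        backpointers.append(ptr)
        delta = new_delta
    # Backtrack from the most likely final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return list(reversed(path))


# Invented toy HMM, same shape as the filtering sketch.
states = ["A", "B"]
prior = {"A": 0.5, "B": 0.5}
T = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.3, "B": 0.7}}               # P(S_t | S_t-1)
E = {"A": {"ping": 0.9, "none": 0.1}, "B": {"ping": 0.2, "none": 0.8}}   # P(E_t | S_t)
print(viterbi(["ping", "ping", "none"], states, prior, T, E))            # ['A', 'A', 'B']
```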

PDDL

Planning Domain Definition Language

Components of PDDL

Domain - describes a class of tasks


- Predicates


- Operators




Task - an instance of domain


- Objects


- Start and goal states

PDDL


Predicate

Given object(s), returns T/F

PDDL


Operator

Operator has:


- Name


- Parameters


- Preconditions


- Effects




*Note: Markov assumption

PDDL


State

Describes the world at a moment in time




Conjunction of positive literal predicates


Those not mentioned are assumed to be false (closed-world assumption)

PDDL


Goal

Partial state expression




Conjunction of literal predicates


Those not mentioned are don’t-cares

When executing an action in PDDL...

- Check preconditions


- Delete negative effects


- Add positive effects
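Not from the deck: a minimal sketch of that execute-an-action step, with the state as a set of true predicates; the robot/room predicates and the move operator are invented.

```python
# State = set of true ground predicates (closed world: everything else is false).
state = {"at(robot, roomA)", "door_open(roomA, roomB)"}

# Invented grounded operator in the spirit of a PDDL action.
move = {
    "name": "move(robot, roomA, roomB)",
    "preconditions": {"at(robot, roomA)", "door_open(roomA, roomB)"},
    "del_effects": {"at(robot, roomA)"},
    "add_effects": {"at(robot, roomB)"},
}

def apply_operator(state, op):
    """Check preconditions, delete negative effects, add positive effects."""
    if not op["preconditions"] <= state:                 # all preconditions must hold
        raise ValueError(f"preconditions not met for {op['name']}")
    return (state - op["del_effects"]) | op["add_effects"]

print(apply_operator(state, move))
# {'door_open(roomA, roomB)', 'at(robot, roomB)'}
```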

Planning approaches

Forward search


- Need a heuristic or it's hopeless




FFPlan


- Relaxation - make the problem easier




Regression planning


- Start at goal, regress backward, reversing operators

MDP

Markov Decision Process

Components of MDP

S set of states


A set of actions


γ discount factor


R reward function


- R(s, a, s′)


T transition function


- T(s′|s, a)

Absorbing state

Execution stops

Markov property for MDPs (extended)

s_t+1 depends only on s_t and a_t


r_t depends only on s_t and a_t

Goal of MDP is to find a ____ that maximizes ____

Goal of MDP is to find a policy that maximizes return (expected sum of rewards/min sum of costs)

Policy (equation)

π : S → A, i.e. π(s) = a (an action for every state)

Return (equation & def)

Max expected sum of (discounted) rewards: E[Σ_t γ^t r_t]

Policy vs. Plan

Policy: an action for every state


Plan: a sequence of actions

Optimal policy (equation)

π* = argmax_π E[Σ_t γ^t r_t]

VF

Value function

Value function

Bellman's equation:

V(s) = max_a Σ_s' T(s'|s, a) [R(s, a, s') + γ V(s')]
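Not from the deck: a value-iteration sketch that just applies the Bellman equation repeatedly; the two-state MDP (states, actions, transitions, rewards) is invented.

```python
# Invented 2-state, 2-action MDP.
states = ["s0", "s1"]
actions = ["stay", "go"]
gamma = 0.9
T = {  # T[s][a] = list of (next_state, probability)
    "s0": {"stay": [("s0", 1.0)], "go": [("s1", 0.8), ("s0", 0.2)]},
    "s1": {"stay": [("s1", 1.0)], "go": [("s0", 1.0)]},
}
R = {("s0", "go", "s1"): 5.0}   # every transition not listed has reward 0

def value_iteration(iters=200):
    """Repeatedly apply V(s) = max_a Σ_s' T(s'|s,a) [R(s,a,s') + γ V(s')]."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {
            s: max(
                sum(p * (R.get((s, a, sn), 0.0) + gamma * V[sn]) for sn, p in T[s][a])
                for a in actions
            )
            for s in states
        }
    return V

print(value_iteration())   # approximate V*(s) for each state
```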

PPDDL

Probabilistic Planning Domain Definition Language




Now operators have probabilistic outcomes

Machine learning

Using experience to improve performance on some task

Types of learning

Supervised learning


Labeled data




Unsupervised learning


No feedback, just data




Reinforcement learning


Sequential data, weak labels

Supervised learning

Given training data:


X = {x1, …, xn} inputs


Y = {y1, …, yn} labels




Produce decision function


f : X -> Y




That minimizes error:


Σ err(f(xi), yi)

Classification vs. Regression

Set of labels Y




Y is discrete = Classification


- Minimize number of errors




Y is real-valued = Regression


- Minimize sum squared error

Hypothesis space

Class of functions F, from which to find f that minimizes error

Why test/train split?

Do not measure error on the data you train on! May not generalize.




Fit f using training set


Measure error on test set

Decision trees

- If all the labels are the same, we have a leaf node


- Pick an attribute and split data on it


- Recurse on each half
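A sketch of the "pick an attribute and split" step, scoring attributes by information gain (defined in the next card); the toy rows of (attributes, label) are invented.

```python
import math
from collections import Counter

# Invented toy dataset: each row is ({attribute: value}, label).
data = [
    ({"outlook": "sun", "wind": "weak"}, "play"),
    ({"outlook": "sun", "wind": "strong"}, "stay"),
    ({"outlook": "rain", "wind": "weak"}, "play"),
    ({"outlook": "rain", "wind": "strong"}, "stay"),
    ({"outlook": "sun", "wind": "weak"}, "play"),
]

def entropy(rows):
    counts = Counter(label for _, label in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attribute):
    """IG = H(rows) - Σ_v (|rows_v| / |rows|) * H(rows_v)."""
    remainder = 0.0
    for value in {attrs[attribute] for attrs, _ in rows}:
        subset = [r for r in rows if r[0][attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - remainder

best = max(["outlook", "wind"], key=lambda a: information_gain(data, a))
print(best)  # "wind": it perfectly separates the labels here
```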

Information gain

IG(S, A) = H(S) - Σ_v (|S_v| / |S|) H(S_v)


where entropy H(S) = -Σ_c p_c log2 p_c

Perceptron equation

f(x) = sign(w · x + c)




f(x) : the function computed by the perceptron


sign(z) : returns +1 if z ≥ 0, -1 otherwise


x : vector of inputs


w : vector of the perceptron's weights


c : bias

Perceptrons - how they work

For x = [x(1), … , x(n)]: [inputs]




- Create an n-d separating hyperplane


- Slope for each x(i) [weights]


- Constant offset [bias]

Perceptron Algorithm

Set w and c to 0




for each xi:


---predict zi = sign(w.xi + c)


---if zi != yi:


-------w = w + a(yi - zi)xi




a = learning rate




Converges if the data are linearly separable.
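The card's update rule as a runnable sketch; the AND-gate data, the ±1 labels, and the learning rate are chosen for illustration, and the bias c gets the same update (treated as the weight of a constant +1 input).

```python
def sign(z):
    return 1 if z >= 0 else -1

def train_perceptron(X, Y, a=0.1, epochs=20):
    """Perceptron algorithm: w and c start at 0; update only on mistakes."""
    n = len(X[0])
    w, c = [0.0] * n, 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, Y):
            zi = sign(sum(wj * xj for wj, xj in zip(w, xi)) + c)
            if zi != yi:                                   # mistake: nudge the hyperplane
                w = [wj + a * (yi - zi) * xj for wj, xj in zip(w, xi)]
                c = c + a * (yi - zi)                      # bias as weight of a constant +1 input
    return w, c

# Toy linearly separable data: logical AND with ±1 labels.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [-1, -1, -1, 1]
w, c = train_perceptron(X, Y)
print([sign(sum(wj * xj for wj, xj in zip(w, x)) + c) for x in X])  # [-1, -1, -1, 1]
```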

Probabilistic classifier

Logistic regression


P(y = 1 | x) = 1 / (1 + e^-(w · x + c)) = σ(w · x + c)

K nearest neighbors

Distance metric D(xi, xj)




For a new data point x_new:


- Find k nearest points in X (measured via D)


- Set y_new = the label that the majority of those neighbors have
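Not from the deck: a small k-nearest-neighbours sketch using Euclidean distance as D; the 2-D points, the labels, and k = 3 are invented.

```python
import math
from collections import Counter

# Invented labeled points: (x, y) -> class
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
Y = ["red", "red", "red", "blue", "blue", "blue"]

def D(a, b):
    """Distance metric: Euclidean."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(x_new, k=3):
    # Find the k nearest points in X (measured via D)...
    nearest = sorted(range(len(X)), key=lambda i: D(X[i], x_new))[:k]
    # ...and set y_new to the majority label among those neighbours.
    return Counter(Y[i] for i in nearest).most_common(1)[0][0]

print(knn_predict((1.1, 0.9)))  # "red"
print(knn_predict((4.9, 5.2)))  # "blue"
```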

Unsupervised learning

Given inputs:


X = {x1, …, xn}




Try to understand the structure of the data.

Clustering

Split data into discrete clusters (types)




Given:


Data points X = {x1, …, xn}




Find:


- Number of clusters k


- Assignment function f(x) = {1, …, k}






* can answer "which cluster", but not "does this belong"

k-means

- Pick k


- Place k points (“means”) in the data


- Assign new point to ith cluster if nearest to ith “mean”

k-means algorithm (decide where to put the means)

- Place k “means” at random


- Assign each point in the data to closest “mean”


- Move “mean” to mean of assigned data
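The three steps above as a runnable sketch (1-D data and k = 2 are invented); a real implementation repeats the assign/move steps until the assignments stop changing.

```python
import random

# Invented 1-D data with two obvious clusters.
data = [1.0, 1.2, 0.8, 0.9, 8.0, 8.2, 7.9, 8.1]

def k_means(data, k=2, iters=20):
    means = random.sample(data, k)                      # place k "means" at random
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assign each point in the data to the closest "mean"
        clusters = [[] for _ in range(k)]
        for x in data:
            i = min(range(k), key=lambda j: abs(x - means[j]))
            clusters[i].append(x)
        # move each "mean" to the mean of its assigned data
        means = [sum(c) / len(c) if c else means[i] for i, c in enumerate(clusters)]
    return means, clusters

means, clusters = k_means(data)
print(sorted(means))   # roughly [0.975, 8.05]
```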

GMM

Gaussian mixture model

PCA

Principal Component Analysis




Project data into a new space


- Dimensions are linearly uncorrelated


- Measure of importance for each dimension
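Not from the deck: a minimal PCA sketch via SVD, assuming numpy is available; the toy 2-D data is generated only for illustration.

```python
import numpy as np

# Invented 2-D data that mostly varies along one direction.
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, 0.5 * t + 0.05 * rng.normal(size=100)])

# Center the data, then SVD: rows of Vt are the linearly uncorrelated principal directions.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

explained = S**2 / np.sum(S**2)     # measure of importance for each new dimension
projected = Xc @ Vt.T               # data expressed in the new space

print(Vt[0], explained)             # the first component carries nearly all the variance
```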

Reinforcement learning

Learning how to interact with an environment to maximize reward.




Learning counterpart of planning

Goal of reinforcement learning

Find policy π that maximizes expected return (sum of
discounted future rewards)


Reinforcement in relation to MDP

S set of states


A set of actions


γ discount factor


R reward function


- R(s, a, s′)


T transition function


- T(s′|s, a)




*** don't know R and/or T ***

optimal policy (π)

Maximizes the
return from every state


Value function

Value of state s under policy π

- Estimate reward
- Improve policy
