101 Cards in this Set

  • Front
  • Back

Intelligence =?

Rationality: the agent's actions are appropriate for its goals and circumstances, it is flexible to changing environments, it learns from experience, and it makes appropriate choices given its perceptual limitations and limited resources.

What is an agent?

something that acts in an environment (e.g. a person, robot, dog, worm, the wind, gravity)

How many types of artificial agents are there?

2: with physical presence = robots; without physical presence = software agents (e.g. a search algorithm, or really any piece of software). They differ only in how they interact with the environment.

Representation and Reasoning?

Representation = how the environment should be represented so that an agent can reason effectively. Reasoning = how the agent should act given the current state of its environment and its goals.

Defined domain? = appropriate R&R system

Domain = uncertainty × number of actions = (deterministic vs. stochastic domains) × (static vs. sequential domains).

What does 'static' mean?

finding a solution does not involve reasoning into the future (time is ignored); it is a one-step solution


What does 'sequential' mean?

finding a solution requires looking a number of steps into the future.

Uncertainty types?

1. Sensing uncertainty: the agent cannot fully observe the current state of the world. 2. Effect uncertainty: the agent does not know for sure the effects of its actions.

how can we describe the states of the world?

State can be described in terms of features. States can be described in terms of objects and relationships.

what is a directed acyclic graph (DAG)?

a directed graph with no cycles.

Graph terms: neighbor of n1; a path; a cycle

n2 is a neighbor of n1 if there is an arc from n1 to n2, i.e. if ⟨n1, n2⟩ ∈ A. A path is a sequence of nodes connected by arcs; a cycle is a nonempty path that starts and ends at the same node.

forward branching factor =? backward branching factor =?

1 = the # of arcs going out of the node. 2 = the # of arcs going into the node.

Define Search space?

set of states that will be searched for a path from the initial state to the goal, given the available actions. States are nodes and actions are links between them.


setup: states, actions, goal

A search algo is complete =? optimal =?

1 = if whenever there is at least one solution, the algo is guaranteed to find it within a finite amount of time. 2 = optimal if when it finds a solution, it is the best one.

Time complexity of a search algo =?

the worst-case amount of time it will take to run, expressed in terms of: maximum path length m and maximum branching factor b.

what is frontier?

the set of paths which could be explored next!


In the generic search algorithm, when a path ending at node nk is selected from the frontier, we add all paths that extend it to a neighbor n of nk.

what is space complexity?

the worst-case amount of memory the algo will use (number of nodes), in terms of m and b, where m = max path length and b = max branching factor.

Think LCFS?

Mnemonic: zero- or negative-cost arcs (think refunds) can create loops, which is why LCFS needs strictly positive arc costs.

what do you mean by uninformed search?

NOT considering any information about the states and the goal when deciding which path to expand next on the frontier (they are blind to the goal); they do not take into account the specific nature of the problem.

Think BestFS (heuristic)

it chooses the path on the frontier with the smallest h value. (priority queue ordered by h)

What are the uninformed search?

DFS, BFS, IDS, LCFS

What are informed search?

BestFS, A*, BBA, Dynamic programming, IDA*

What are the condition for A* to be complete and optimal?

1. the branching factor is finite.


2. the arc cost > 0.


3. the heuristic function is admissible.
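
A minimal A* sketch in Python (not part of the original cards), assuming hypothetical helpers: neighbors(n) returning (neighbor, arc_cost) pairs, a goal_test(n) predicate, and an admissible heuristic h(n).

```python
# Minimal A* sketch: the frontier is a priority queue ordered by f(p) = cost(p) + h(end of p).
# neighbors, goal_test, and h are assumed helpers, not from the cards.
import heapq, itertools

def a_star(start, goal_test, neighbors, h):
    counter = itertools.count()              # tie-breaker so heapq never compares paths
    frontier = [(h(start), 0, next(counter), [start])]
    while frontier:
        f, cost, _, path = heapq.heappop(frontier)
        node = path[-1]
        if goal_test(node):
            return path, cost                # with an admissible h, the first goal popped is optimal
        for nb, arc_cost in neighbors(node):
            new_cost = cost + arc_cost
            heapq.heappush(frontier, (new_cost + h(nb), new_cost, next(counter), path + [nb]))
    return None                              # frontier exhausted: no solution
```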

What are the condition for LCFS to be complete? and optimal?

1. complete: arc cost > 0.


2. optimal: arc cost >= 0.

Is BestFS optimal?

BestFS is a priority queue ordered by h(n). BestFS might not behave well if h(n) is not admissible, because a bad h(n) can lead the search into a loop.

when to use Cycle check?

Trees have no cycles, so cycle checking should only be used on state graphs. It is useful when we want to avoid infinite loops but still find more than one solution, if more exist.



Cycle checking does not add much in BFS, since BFS is already complete (it avoids the infinite-loop scenario).


Cycle checking is good in DFS.

When to use multiple path pruning? when is it good?

We only want one path to the solution, so we prune paths to a node n that has already been reached via a previous path.



1. Good for BFS, since it stops the search from visiting the same node again and again.

What is cycle checking? and its cost relative to each search methods?

we prune the path that ends in a node already on the path. -> this pruning cannot remove an optimal solution.



1. DFS: constant time, because only one path is being searched at a time



2. Other searching methods: cost is linear in path length (we search multiple paths simultaneously)

Which algo always find the shortest path to nodes on the frontier first?

Only LCFS.

What is Branch-and-Bound search?

a combination of DFS and f(p).


It follows exactly the same search path as DFS. But to ensure optimality, it does not stop at the first solution found.



1. it continues after recording an upper bound on the solution cost (the cost of the best solution found so far)



2. Expand only if LB(p) = f(p) < UB
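
A minimal depth-first branch-and-bound sketch, using the same assumed helpers as the A* sketch above; it assumes an acyclic (or cycle-checked) graph, matching the loop caveat in the next card.

```python
# Minimal depth-first branch-and-bound sketch; prunes a path p when LB(p) = f(p) >= UB.
import math

def branch_and_bound(start, goal_test, neighbors, h):
    best_path, upper_bound = None, math.inf

    def dfs(path, cost):
        nonlocal best_path, upper_bound
        node = path[-1]
        if cost + h(node) >= upper_bound:    # LB(p) = f(p) >= UB: prune
            return
        if goal_test(node):
            best_path, upper_bound = path, cost   # record the solution, tighten UB
            return
        for nb, arc_cost in neighbors(node):
            dfs(path + [nb], cost + arc_cost)

    dfs([start], 0)
    return best_path, upper_bound
```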

Cons of BBA search? pros?

1. to be optimal: h(n) must be admissible or else we would discard the optimal solution path.



2. UB might never be initialized if we get into a loop (cycles, as in plain DFS).



The good thing is that it has O(bm) space complexity, like DFS.

Idea of dynamic programming?

It doesn't repeat the same path: it saves the cost of each path to a table and uses that table to find the costs for other nodes.



-> we build a table of dist(n), the cost from node n to a goal node (dist(n) = 0 when n is a goal node).

If there are no cycles, what can we say about state space and search tree?

they look the same!



if there are cycles, the two look very different!



we don't just eliminate cycles, b/c we sometimes want multiple solution paths.

Multiple-path pruning: what if a subsequent path to n is shorter than the first path to n?

The idea is we will try to avoid that long path via 2 methods:


1. you can remove all paths from the frontier that use the longer path. (as these can't be optimal)



2. you can change the initial segment of the paths on the frontier to use the shorter path.

Goal of dynamic programming? and possible cons?

we try to minimize cost(n,m) + dist(m) over the neighbors m of n



- we need enough space to save the table and the dist function needs to be recomputed for each goal.



- so if we have multiple goal nodes, this might not work as expected (dist(n) can be misleading).

How can we make the table of dist(n) for the dynamic programming?

we run LCFS backward from the goal, with multiple-path pruning.
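
A sketch of how such a dist table could be built: lowest-cost-first search run backward from the goal with multiple-path pruning. reverse_neighbors(n) is an assumed helper returning (predecessor, arc_cost) pairs.

```python
# Build dist(n) = cost of the cheapest path from n to the goal, via Dijkstra-style
# LCFS over reversed arcs with multiple-path pruning.
import heapq, itertools

def build_dist_table(goal, reverse_neighbors):
    dist = {}                                # dist[goal] will be 0
    counter = itertools.count()              # tie-breaker for the heap
    frontier = [(0, next(counter), goal)]
    while frontier:
        d, _, node = heapq.heappop(frontier)
        if node in dist:                     # multiple-path pruning: keep the first (cheapest) path
            continue
        dist[node] = d
        for pred, arc_cost in reverse_neighbors(node):
            heapq.heappush(frontier, (d + arc_cost, next(counter), pred))
    return dist

# A policy then moves from n to the neighbor m minimizing cost(n, m) + dist[m].
```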

Idea of Iterative Deepening A* (IDA*)

B&B + IDS: we basically run a bounded DFS one f-cost bound at a time, as in IDS.

is IDA* complete and optimal?

h is admissible, all arc costs > 0, and the branching factor is finite (the same conditions as A*).
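
A minimal IDA* sketch, with the same assumed helpers as above: a depth-first search bounded by an f-cost threshold, where the threshold is raised to the smallest exceeding f-value after each iteration.

```python
import math

def ida_star(start, goal_test, neighbors, h):
    bound = h(start)                         # initial f-cost bound
    while True:
        next_bound = math.inf

        def dfs(path, cost):
            nonlocal next_bound
            node = path[-1]
            f = cost + h(node)
            if f > bound:
                next_bound = min(next_bound, f)   # remember the smallest overrun
                return None
            if goal_test(node):
                return path, cost
            for nb, arc_cost in neighbors(node):
                found = dfs(path + [nb], cost + arc_cost)
                if found is not None:
                    return found
            return None

        result = dfs([start], 0)
        if result is not None:
            return result
        if next_bound == math.inf:           # nothing exceeded the bound: no solution
            return None
        bound = next_bound                   # deepen the bound, as in IDS
```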

Constraint satisfaction setup:

1. Variables (CAPITAL)


2. dom(V)


3. Constraints: unary or k-ary (often binary)



-> Constraints should be expressed in terms of:


(k-ary, involving multiple variables) variable vs. variable || (unary) variable vs. dom(V)

What is the model/solution of a CSP (Constraint Satisfaction Problem)?

an assignment of values to variables that satisfies all constraints (one possible world).

Define unary constraint. Define k-ary constraint.

1. a restriction involving a single variable (e.g. A = 1)


2. a restriction involving the domains of k different variables (e.g. A = B or A > B + C)

Finding a model of a CSP: how many ways are there?

1. Generate-and-test: we check all combinations of values for the variables, so it can take a very long time just to find one model.



2. Search: we build partial assignments, assigning a value to one variable at a time; the start state is the empty assignment. So we build up candidate solutions incrementally.


- Prune with the DFS search tree: partial assignments fused with path pruning in DFS. We check the constraints over the variables assigned so far; if one fails early, we stop going down that path. -> the algorithm's efficiency depends on the order in which variables are expanded.
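
A sketch of the pruned DFS over partial assignments described above; variables, domains, and constraints (each a (scope, predicate) pair over an assignment dict) are assumed inputs, not anything specific from the cards.

```python
# DFS over partial assignments with early pruning: a constraint is checked as soon
# as every variable in its scope has a value.
def backtracking_search(variables, domains, constraints, assignment=None):
    assignment = assignment or {}
    if len(assignment) == len(variables):
        return assignment                    # all variables assigned: a model
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        ok = all(pred(assignment) for scope, pred in constraints
                 if all(v in assignment for v in scope))
        if ok:
            result = backtracking_search(variables, domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]                  # undo and try the next value
    return None

# Made-up example: A, B in {1, 2, 3} with A < B.
# backtracking_search(["A", "B"], {"A": {1, 2, 3}, "B": {1, 2, 3}},
#                     [(("A", "B"), lambda a: a["A"] < a["B"])])
```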

Should we use h(n) in CSP?

No, because if there are n variables, every solution is at depth n: it must instantiate (give a value to) every one of the n variables.

Best way to solve a CSP?

1. check all variables' domains against the unary constraints -> prune the bad values.



2. Make the graph:


one node for each variable


one node for each constraint (here, the k-ary constraints).



3. Do arc consistency, then apply domain splitting when needed.


a network is arc consistent if all its arcs are arc consistent.
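
A sketch of arc consistency (AC-3 style) for binary constraints; constraints is an assumed dict mapping an ordered variable pair (X, Y) to a predicate over (x, y) values, with both directions of each constraint present.

```python
from collections import deque

def arc_consistency(variables, domains, constraints):
    to_do = deque(constraints.keys())        # every arc starts in the to-do set
    while to_do:
        X, Y = to_do.popleft()
        pred = constraints[(X, Y)]
        # Keep only the values of X that have some supporting value in dom(Y).
        supported = {x for x in domains[X] if any(pred(x, y) for y in domains[Y])}
        if supported != domains[X]:
            domains[X] = supported
            if not supported:
                return domains               # an empty domain: the CSP has no solution
            # Re-check arcs (Z, X) for every other variable Z constrained with X.
            for Z, W in constraints:
                if W == X and Z != Y:
                    to_do.append((Z, W))
    return domains                           # the network is now arc consistent
```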

What is the worst case complexity of Arc consistency algorithm?

O(d^n), where d = the maximum size of a variable domain and n = the number of variables.



With binary constraints -> O(d^3 · n^2)

How many CSPs do we need to keep around at a time?

O(bm): it is DFS.

What is the property of local search?

it finds solutions quickly, but it is not guaranteed to find a solution if one exists (thus it cannot prove that there is no solution).



- it works with complete assignments of values to variables (possible worlds).


- local search does not backtrack!


- only the current node is kept in memory at each step.


- We use Iterative Best Improvement to choose a node to expand.




- choose neighbor node with minimum # of constraint violations.

Types of Iterative Best Improvement?

S1: We need to define h(n) = evaluation function = # of constraint violations in state n.



S2:


Greedy descent: evaluate h(n) for each neighbor, pick the neighbor n with minimal h(n)



Hill Climbing: Maximize # of satisfied constraints.



Hill climbing and greedy descent are essentially the same. -> they get trapped in locally optimal points, which can only be escaped with a random walk.
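
A sketch of iterative best improvement by greedy descent; violations(assignment) and neighbors(assignment) are assumed helpers (e.g. neighbors could change one variable's value at a time).

```python
import random

def greedy_descent(variables, domains, violations, neighbors, max_steps=1000):
    # Start from a random complete assignment (a possible world).
    current = {v: random.choice(list(domains[v])) for v in variables}
    for _ in range(max_steps):
        if violations(current) == 0:
            return current                   # no constraint violated: a model
        best = min(neighbors(current), key=violations)
        if violations(best) >= violations(current):
            return None                      # stuck at a local minimum
        current = best
    return None
```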

What is pro of greedy descent?

good at finding local minima (the neighbour with the best evaluation-function value), because here we are trying to minimize h(n), where h(n) = # of constraint violations. However, there will be multiple local minima, and most of them will be close to the real answer but still wrong.



- to reach the right minimum, we need to escape the wrong local minima with a random walk.

Goal of Stochastic local search?

assignment with 0 unsatisfied constraints.

What does SLS use?

it uses:


1. heuristic function h(n) = # of unsatisfied constraints.

What is SLS?

SLS is just local search, but instead of using pure greedy descent, hill climbing, random walk, or random restart alone, it mixes them.



Note:


- random start = reassigning values to all variables.


- random walk: take some random steps (move to a neighbour with some randomness.)
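
A sketch of how SLS might mix these ingredients: mostly greedy steps, an occasional random-walk step, and a random restart when stuck. The helpers and the probability p_walk are assumptions, not from the cards.

```python
import random

def stochastic_local_search(variables, domains, violations, neighbors,
                            p_walk=0.2, max_steps=10000):
    current = {v: random.choice(list(domains[v])) for v in variables}   # random start
    for _ in range(max_steps):
        if violations(current) == 0:
            return current
        if random.random() < p_walk:
            current = random.choice(list(neighbors(current)))           # random walk step
        else:
            best = min(neighbors(current), key=violations)              # greedy descent step
            if violations(best) >= violations(current):
                # stuck: random restart = reassign values to all variables
                current = {v: random.choice(list(domains[v])) for v in variables}
            else:
                current = best
    return None                              # no model found within the step budget
```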


2 main ways of adding randomness to local search for a CSP? SLS

1. one stage selection



2. two stage selection.

what is stopping criteria?

it means there is no more improvement in the evaluation function h; we are simply stuck.

How many ways are there for selecting neighbors? for one stage and two stage

1. one stage is O(dn)


2. two stage is O(d+n).

How do we evaluate SLS algorithms?

we perform multiple runs (~1000 runs). Note that the time taken to solve a problem is a random variable, and an SLS algo can sometimes fail to terminate at all (stagnation).



- so the best thing to do is to compare runtime distributions.

the cost of using cycle checking is?

1. DFS = constant time ( less space required)



2. Others = linear in path length (space b^m)

Does CSP need a heuristic function?

No, because all solutions are at the same depth n (all variables must be assigned the right value).


-> all solutions are at the same distance from the start.

Can you give me a quick summary of how MAC addresses should be used?

MAC addresses are used along with IP addresses.



MAC address = your SSN, whereas IP address = postal code. It's good to have both for each adapter (host, router). MAC addresses have a flat structure! To send a datagram, we need both the MAC address and the IP address.



ARP (a table containing the mapping between MAC addresses and IP addresses, similar in spirit to a NAT table). However, ARP is valid only in the current subnet, and the info stored there has an expiry time.



MAC addresses are used to send datagrams within the LAN (locally within the subnet).

In STRIPS, an action has two parts:

1. Precondition: a set of assignments to variables that must be satisfied in order for the action to be legal/valid/applicable.



2. Effects: a set of assignments to variables that are caused by the action.
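
A sketch of a STRIPS-style action as a (precondition, effects) pair over feature assignments, with helpers for applicability and for applying effects; the pickup_key action and its features are a made-up illustration.

```python
def applicable(state, action):
    # An action is legal/valid/applicable when every precondition holds in the state.
    return all(state.get(var) == val for var, val in action["precondition"].items())

def apply_action(state, action):
    # Features not mentioned in the effects stay unchanged (the STRIPS assumption).
    new_state = dict(state)
    new_state.update(action["effects"])
    return new_state

pickup_key = {
    "precondition": {"robot_loc": "room1", "key_loc": "room1", "has_key": False},
    "effects":      {"has_key": True, "key_loc": "robot"},
}

state = {"robot_loc": "room1", "key_loc": "room1", "has_key": False}
if applicable(state, pickup_key):
    state = apply_action(state, pickup_key)  # has_key becomes True, key_loc becomes "robot"
```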

What are the rules of STRIPS?

1. all features not explicitly changed by an action stay uncahanged.



2. STRIPS is an action-centric representation

What is forward planning?

1. reasoning into the future.


- to find a plan (a solution): search in the state-space graph.



What are forward planning components and their roles?

1. states = the possible worlds (full assignments of values to features).


2. arcs from a state s = all the actions that are possible in state s


3. plan = a path from the node representing the initial state to a state that satisfies the goal.




solutions = a sequence of actions that gets us from start to goal.



both states and goals are a set of features.

what actions a are possible in a state s?

those whose preconditions are satisfied in s!

all about CSP?

State: assignments of values to a subset of the variables


• Successor function: assign values to a “free” variable


• Goal test: set of constraints


• Solution: possible world that satisfies the constraints


• Heuristic function: none (all solutions at the same distance from start)

all about forward planning?

State: full assignment of values to features


• Successor function: states reachable by applying valid actions


• Goal test: partial assignment of values to features


• Solution: a sequence of actions


• Heuristic function

optimally efficient algo?

given a specific heuristic, no algorithm can do better, in terms of time, than an optimally efficient one.




A* is optimally efficient among the algorithms that extend search paths from the initial state, because it finds the goal with the minimum number of expansions.

when is a node expanded?

when its path is selected (removed) from the frontier and its neighbors are added; graphically, when you cross the frontier line.

How many constraints are there in CSP planning?

1. Precondition


2. Effect


3. action constraints


- mutual exclusion


- specify which actions cannot occur simultaneously.



4. state constraints


- hold between variables at the same time step


- they can capture physical constraints of the system.


- this is handled via the actions' preconditions.


How do we run forward planning as a CSP?

1. we build the CSP


2. define the horizon k (number of time steps) we want to consider


3. run arc consistency on it and check whether a solution exists at that horizon.


4. if yes, we stop; otherwise we increase the horizon k and check again.

What is 'logic' here?

a framework for representation and reasoning.



Representation (representing a domain about which we have only partial but certain information): objects, relationships between objects.



Reasoning:


Sound: only generates correct answers with respect to the semantics • Complete: Guaranteed to find an answer if it exists


How does 'logic' work?

we use 'Query answering'



tell the computer how the world works • tell the computer some facts about the world • ask a yes/no question about whether other facts must be true


tell me all the syntax of the logic:

1. atom = a symbol starting with lowercase letter


2. body = an atom, or something of the form b1 ^ b2 where b1 and b2 are bodies.



3. definite clause: an atom, OR a rule of the form h <- b where h is an atom('head') and b is a body



4. knowledge base(KB) = a set of definite clauses
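
A sketch of the bottom-up proof procedure for a definite-clause KB (sound and complete for this language): keep adding heads whose bodies are already derived until nothing changes. The sample KB is a made-up illustration.

```python
def bottom_up(kb):
    # kb: list of (head, body) pairs; body is a list of atoms (empty for an atomic clause)
    derived = set()
    changed = True
    while changed:
        changed = False
        for head, body in kb:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)            # h <- b1 ^ ... ^ bk fires once its body is derived
                changed = True
    return derived

kb = [("a", ["b", "c"]), ("b", []), ("c", ["b"])]
print(bottom_up(kb))                         # {'a', 'b', 'c'}: every atom that logically follows
```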

tell me about Semantics of 'logic'

1. interpretation I: assigns a truth value to each atom


- there are 2^(number of atoms) interpretations (one truth value per atom)


- we rely on truth tables for the semantics




***KB is true in I if and only if every clause in KB is true in I


- a model of KB is an interpretation in which KB is true.

So the idea of applying logic is =?

1. Tell the system knowledge about a task domain. • This is your KB • which expresses true statements about the world



2. Ask the system whether new statements about the domain are true or false. • We want the system responses to be – Sound: only generates correct answers with respect to the semantics – Complete: Guaranteed to find an answer if it exists

def of logical consequence is =?

If KB is a set of clauses and G is a conjunction of atoms, G is a logical consequence of KB, written KB ⊧ G, if G is true in every model of KB.

Diff btw the user's view of semantics and the computer's

user: intended interpretation


computer: has no access to the intended interpretation.

What are the two major components of proof procedure and how do they work?

1. soundness:


- every atom derived by P follows logically from KB



2. completeness:


- every atom that logically follows from KB is derived by P

Probability:

Dependence: Bayes' rule and the chain rule


- Joint distribution


- Given the joint distribution, we can compute distributions over subsets of the variables through marginalization:




Independence: marginal independence, conditional independence
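
For reference, a hedged restatement of the standard identities this card names (Bayes' rule, the chain rule, marginalization):

```latex
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
\qquad\text{(Bayes' rule)}

P(X_1,\dots,X_n) = \prod_{i=1}^{n} P(X_i \mid X_1,\dots,X_{i-1})
\qquad\text{(chain rule)}

P(X) = \sum_{y \in \mathrm{dom}(Y)} P(X, Y{=}y)
\qquad\text{(marginalization over } Y\text{)}
```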

Give an example of 'conditionally but not marginally independent variables'.

Exam grade is conditionally independent of assignment grade given 'understanding of the material'.

Exploiting Conditional Independence

We can then rewrite P(C | A,B) as P(C|B)



Example 2: Boolean variables A,B,C,D • D is conditionally independent of A given C • D is conditionally independent of B given C • We can then rewrite P(D | A,B,C) as P(D|B,C) • And can further rewrite P(D|B,C) as P(D|C)

Independence and its compactness

Under independence we gain compactness • The chain rule allows us to write the JPD as a product of conditional distributions • Conditional independence allows us to write them compactly

Bayesian Network !

keys:


- Under independence we gain compactness


- Bayesian (Belief) Networks are such a representation


- conditional independence helps even where marginal independence does not hold.


- The parents Pa(Xi ) of a variable Xi are those Xi directly depends on

Tell me about compactness

- depends on the order in which the variables are considered


- A CPT for a Boolean variable Xi with k Boolean parents needs 2^k rows, one for each combination of parent values; each row adds one number to the count of probabilities we must specify.


- the numbers required grow linearly with n, vs. O(2^n) for the full joint distribution


- Bnets are most useful in sparse (or locally structured) domains


- Compactness typically enables more efficient reasoning

tell me about 'sparse' = locally structured domain

- Domains in which each component interacts with (is related to) only a small fraction of the other components


- If the network is highly connected (e.g. k ≈ n), there is not much saving compared to the numbers needed to specify for the full JPD

What does CPT stand for?

a conditional probability table

Key idea of Bnetwork

In a belief network, the JPD of the variables involved is defined as the product of the local conditional distributions: P(X1, …, Xn) = ∏_i P(Xi | X1, …, Xi-1) = ∏_i P(Xi | Parents(Xi))


- Any entry of the JPD can be computed given the CPTs in the network


- this is why Bnet is good: we have less computations to consider.
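
A tiny sketch of "any entry of the JPD can be computed from the CPTs": multiply the local conditional probabilities. The two-node network (Rain -> WetGrass) and its numbers are made up for illustration.

```python
# P(x1, ..., xn) = prod_i P(xi | parents(Xi)), looked up in each node's CPT.
network = {
    "Rain":     {"parents": [],       "cpt": {(): {True: 0.2, False: 0.8}}},
    "WetGrass": {"parents": ["Rain"], "cpt": {(True,):  {True: 0.9, False: 0.1},
                                              (False,): {True: 0.1, False: 0.9}}},
}

def joint_probability(network, world):
    prob = 1.0
    for var, node in network.items():
        parent_values = tuple(world[p] for p in node["parents"])
        prob *= node["cpt"][parent_values][world[var]]
    return prob

print(joint_probability(network, {"Rain": True, "WetGrass": True}))   # 0.2 * 0.9 = 0.18
```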

what is so special about ' causal structure'?

The causal structure typically leads to the most compact network

Since we can use so many different variable orderings to build a Bnet, can we get a wrong order? What's the rule here?


- Given an order of variables, a network with arcs in excess of those required by the direct dependencies implied by that order is still correct; some orders just lead to a larger number of probabilities to specify.


** the fully connected network is always correct but rarely the best choice



* It becomes wrong if and only if:


- it misses directed edges that are required to represent existing dependencies between variables: i.e. if the dependency structure is wrong, the computed probabilities will be wrong.

Tell me about Dependencies in a Bayesian Network

- X and Y become dependent as soon as there is evidence on Z or on any of its descendants.

Turning Dependencies into 'conditional' independence? how?

- by blocking paths for probability propagation. There are three ways in which a path between X and Y can be blocked, given evidence E.



- X and Y are independent if there is no evidence on their common effect

what is 'Variable Elimination (VE)'?

- Variable Elimination (VE) is an algorithm to perform inference in Bayesian networks more efficiently


- It manipulates conditional probabilities in the form of “factors”. • So first we have to introduce factors and the operations we can perform on them.

How do we do inference? What's the setup?


setup:


- Y: subset of variables that are queried • E: subset of variables that are observed, E = e • Z1, …, Zk: the remaining variables in the JPD



Process: compute the conditional probability


- We need to marginalize over all the variables Z1 ,…Zk not involved in the query.



- All we need to compute is the numerator: joint probability of the query variable(s) and the evidence! • Variable Elimination is an algorithm that efficiently performs this operation by casting it as operations between factors
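
A hedged restatement of the query this card describes, in formula form:

```latex
P(Y \mid E{=}e) \;=\; \frac{P(Y,\,E{=}e)}{\sum_{y} P(Y{=}y,\,E{=}e)},
\qquad
P(Y,\,E{=}e) \;=\; \sum_{Z_1}\cdots\sum_{Z_k} P(Y,\,E{=}e,\,Z_1,\dots,Z_k)
```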

Variable Elimination procedure:

1. Factor:


- A factor denotes one or more (possibly partial) distributions over the given tuple of variables,


- Factors do not have to sum to one


- We can make new factors out of an existing factor

If we assign variable A=a in factor f (A,B), what is the correct form for the resulting factor?

f(B). When we assign variable A we remove it from the factor’s domain

If we marginalize variable A out from factor f (A,B), what is the correct form for the resulting factor?

f(B). When we marginalize out variable A we remove it from the factor’s domain

What are all the operations of 'factor'?

1. assignment of value


2. marginalization of variable


3. multiplication of factors: combining factors.



Tell me about 'multiplication' factor

The product of factors f1(A, B) and f2(B, C), where B is the variable in common, is the factor (f1 × f2)(A, B, C) defined by (f1 × f2)(A, B, C) = f1(A, B) × f2(B, C).

If we multiply factors f4 (X,Y) and f6 (Z,Y), what is the correct form for the resulting factor?

f(X,Y,Z) • When multiplying factors, the resulting factor’s domain is the union of the multiplicands’ domains

What is the correct form for B f5 (A,B) × f6 (B,C)?

As usual, product before sum: B ( f5 (A,B) × f6 (B,C) ) • Result of multiplication: f6 (A,B,C). Then marginalize out B: f7 (A,C) 2
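
A sketch of the three factor operations these cards describe (assigning a value, marginalizing out a variable, multiplying factors), with a factor stored as a variable list plus a table from value tuples to numbers; all names and numbers are illustrative.

```python
from itertools import product

class Factor:
    def __init__(self, variables, table):
        self.variables = list(variables)     # e.g. ["A", "B"]
        self.table = dict(table)             # e.g. {(0, 0): 0.1, (0, 1): 0.9, ...}

    def assign(self, var, value):
        # f(A, B) with A = a  ->  f(B): drop var from the factor's domain.
        i = self.variables.index(var)
        rest = self.variables[:i] + self.variables[i+1:]
        table = {k[:i] + k[i+1:]: v for k, v in self.table.items() if k[i] == value}
        return Factor(rest, table)

    def marginalize(self, var):
        # Sum out var: f(A, B) -> f(B).
        i = self.variables.index(var)
        rest = self.variables[:i] + self.variables[i+1:]
        table = {}
        for k, v in self.table.items():
            key = k[:i] + k[i+1:]
            table[key] = table.get(key, 0.0) + v
        return Factor(rest, table)

    def multiply(self, other):
        # The product's domain is the union of the multiplicands' domains.
        out_vars = self.variables + [v for v in other.variables if v not in self.variables]
        domains = {}                         # recover each variable's values from the tables
        for f in (self, other):
            for key in f.table:
                for var, val in zip(f.variables, key):
                    domains.setdefault(var, set()).add(val)
        table = {}
        for combo in product(*(sorted(domains[v]) for v in out_vars)):
            world = dict(zip(out_vars, combo))
            k1 = tuple(world[v] for v in self.variables)
            k2 = tuple(world[v] for v in other.variables)
            if k1 in self.table and k2 in other.table:
                table[combo] = self.table[k1] * other.table[k2]
        return Factor(out_vars, table)

# f4(X, Y) x f6(Z, Y) -> a factor over (X, Y, Z); summing out Y then leaves (X, Z).
f4 = Factor(["X", "Y"], {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.4, (1, 1): 0.6})
f6 = Factor(["Z", "Y"], {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.5})
f7 = f4.multiply(f6)          # variables ["X", "Y", "Z"]
f8 = f7.marginalize("Y")      # variables ["X", "Z"]
```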

what is elimination ordering?

the order in which we marginalize (sum) out the variables that are not part of the query or evidence; the choice of ordering affects how much work the elimination does.