
90 Cards in this Set

  • Front
  • Back
a priori
derived by logic, without observed facts.
abductive reasoning
From “if p then q” and q, conclude p. This is not a valid deductive principle; it is called abduction.
activation level
based on the cumulative strength of its input signals. Each input signal is based on the connection weight.
The sum of the weighted inputs.
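The weighted sum described above can be sketched in a few lines. This is only an illustration; the function name `activation_level` and the sample numbers are assumptions, not taken from the card set.

```python
def activation_level(inputs, weights):
    """Return the sum of each input signal multiplied by its connection weight."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


# Hypothetical example values: three input signals and their connection weights.
level = activation_level([1.0, 0.0, 1.0], [0.5, -0.25, 0.25])
print(level)
```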
analogical reasoning
Assumes that if two situations are similar in some respects they will be similar in others (pg 409)
Believe that we define the meaning of an object in terms of a network of associations with other objects. This can be displayed using a semantic network.
associative memory
Works by recalling information in response to an information cue. Type of memory that when searched returns a particular value. There are three different types:
auto-associative just returns X if it exists
hetero-associative returns the Y corresponding to the closest Xi to the X you typed in
interpolative takes your X and transforms it into the closest Xi. It then looks at the Yi corresponding to that Xi, applies a transformation analogous to the one from X to Xi, and returns this newly transformed Yi.
atomic actions
Actions that cannot be broken down into smaller actions. For example, in robotics, turn the motor one revolution.
attractor networks
networks that employ feedback connections to repeatedly cycle a signal within the network. The network output is considered to be the network state upon reaching equilibrium.
an attractor is a state in the network which other states, that lie within its basin, evolve to over time.
these types of networks are also known as “memories”
a system that can interact with its environment without the direct intervention of any other agents. Thus, it must have control over its own actions and internal state.
another type of agent-oriented problem solver.
a way of placing blame on intermediate steps by splitting up the error based on how much significance each had in creating the answer.
A multilayer feedforward neural net architecture which uses the supervised mode of learning. This is the most widely used type of neural net.
takes some samples from the training instances (and then puts them back). When sampling, we replicate some of these training instances.
This helps us to produce multiple classifiers which will form into a composite classifier by each one voting (all have same vote strength).
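The sampling-with-replacement and equal-vote steps above might look like the following sketch. The learner `train_majority_stub` is a hypothetical stand-in (it just predicts the most common label in its sample), and the data set is invented for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(instances, rng):
    """Draw len(instances) items with replacement, so some items can repeat."""
    return [rng.choice(instances) for _ in instances]


def train_majority_stub(sample):
    """Hypothetical stand-in learner: always predicts the sample's most common label."""
    most_common_label = Counter(label for _, label in sample).most_common(1)[0][0]
    return lambda features: most_common_label


def bagged_predict(classifiers, features):
    """Every component classifier casts one equally weighted vote."""
    votes = Counter(clf(features) for clf in classifiers)
    return votes.most_common(1)[0][0]


rng = random.Random(0)
# Invented training instances: (features, label) pairs.
data = [((0,), "no"), ((1,), "yes"), ((2,), "yes"), ((3,), "yes"), ((4,), "yes")]
classifiers = [train_majority_stub(bootstrap_sample(data, rng)) for _ in range(11)]
prediction = bagged_predict(classifiers, (2,))
print(prediction)
```

An odd number of component classifiers is used here so that a two-label vote cannot tie.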
Described by Bart Kosko, this network consists of two fully connected layers of processing elements. Its vectors are taken from the set of Hamming vectors. The network can be transformed into an autoassociative network by using the same weight initialization formula on the set of associations. It is used to retrieve patterns from memory by initializing the X layer with an input pattern. (pg 496)
Bayesian belief network
focuses only on the things that are relevant. However there is a danger of eliminating something relevant accidentally. Thus you build an inductive bias into your program with these assumptions. Also uses certainty measures.
Offers a computational model for reasoning to the best explanation of a set of data in the context of the expected causal relationships of a problem domain.
we look at all of our training instances at each replication. When looking we assign a weight to each instance that is in the training set. This weight corresponds with that particular set’s (vector’s) importance. As we adjust these weights our different classifiers are produced.
The reason this happens is because the learner will focus on whatever has the most weights. When we change these weights to different instances, the learner looks at different instances and thus creates new classifiers.
This clearly helps us to produce multiple classifiers which will form into a composite classifier by each one voting (the more accurate the component classifiers the stronger their vote).
candidate elimination
Combines two algorithms to reduce the version space (the set of all concept descriptions that go along with the training set). Maintains two sets of candidates: maximally general (G) and maximally specific (S). Then, the algorithm specializes G and generalizes S until they both converge on a single candidate. The combination turns into a bi-directional search.
This has several advantages, such as eliminating the need to save instances, and it does NOT require all training examples to be present before it starts “learning”.
This is a type of supervised learning.
canonical form
When working with conceptual graphs, we can follow these rules that have a “subtle but important property of preserving meaningfulness”. This kind of stuff is important when using conceptual graphs to implement natural language understanding.
NOTE: basically introduces a rule that says “if two sentences have the same meaning, they will be graphed identically, both syntactically and semantically.”
case frame
collecting primitives like agents, time, object and making them into a frame like structure. Then applying situations like “sally fixed her chair with glue” (page 235)
case-based reasoning
A problem-solving system that relies on stored representations of previously solved problems and their solutions. Also allows programs to “learn” from their own experiences.
Does not require extensive analysis of domain knowledge
You can fill empty cases by:
retrieving cases from memory
modifying a case similar to the one you’re missing
making that the rule for the empty case
cellular automaton
families of simple finite-state machines that exhibit interesting, emergent, behaviors through their interactions in a population.
Think of an infinite, regular grid of cells, each in one of a finite number of states. Every cell has the same rule for updating, based on the values in its neighborhood. Each time the rules are applied to the whole grid a new generation is produced.
“Game of Life.”
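As an illustration of one rule applied to every cell to produce a new generation, here is a small sketch of a single step of Conway's Game of Life. Storing only the live cells in a set (so the grid is effectively unbounded) is a design choice of this sketch, and the name `life_step` is hypothetical.

```python
from collections import Counter


def life_step(live_cells):
    """Apply Conway's Game of Life rule once to a set of live (row, col) cells."""
    # Count, for every cell next to a live cell, how many live neighbors it has.
    neighbor_counts = Counter(
        (r + dr, c + dc)
        for r, c in live_cells
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # A cell is live next generation with exactly 3 live neighbors,
    # or with exactly 2 if it is already live.
    return {
        cell
        for cell, count in neighbor_counts.items()
        if count == 3 or (count == 2 and cell in live_cells)
    }


blinker = {(1, 0), (1, 1), (1, 2)}  # a horizontal row of three live cells
next_generation = life_step(blinker)
print(sorted(next_generation))
```

The three-cell row flips between horizontal and vertical on each generation, a simple example of behavior that emerges from a purely local rule.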
refers to a short-term memory mechanism and techniques to exploit it.
closed world assumption
In traditional logic and programming, if you don’t explicitly say that something is true, we make the “closed world assumption” that it is false. These systems are monotonic: you always add conclusions/knowledge and never take any away. Once it’s in there, it’s in there.
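A toy sketch of the idea, assuming a knowledge base that is just a set of stated facts. The `parent` facts and the function name `holds` are invented for illustration.

```python
# Hypothetical knowledge base: only these facts have been stated as true.
facts = {("parent", "tom", "bob"), ("parent", "bob", "ann")}


def holds(fact, knowledge_base):
    """Closed world assumption: a fact that is not stated is taken to be false."""
    return fact in knowledge_base


known = holds(("parent", "tom", "bob"), facts)
unstated = holds(("parent", "ann", "tom"), facts)
print(known, unstated)

# Monotonicity: adding a fact never makes an earlier conclusion false.
bigger = facts | {("parent", "ann", "joe")}
still_known = holds(("parent", "tom", "bob"), bigger)
```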
common super/sub type
Being in the same set as something else. Kind of self explanatory.
Common subtype: if s, t, and u are types with t < s and t < u, then t is a common subtype of both s and u.
conceptual clustering
trying to discover useful categories in unclassified data. You begin with a collection of unclassified objects and a way of measuring the similarity between each of the objects.
The goal is to organize the objects into classes that meet some standard of quality.
This is a machine based learning approach.
conceptual graph
finite, connected graph split into two parts. The nodes of the graph are either concepts or conceptual relations. Each graph represents a single proposition.
Does NOT use labeled arcs. The conceptual relations take the place of these.
conceptual model
When the knowledge engineer talks to the domain expert, they construct a conceptual model in order to illustrate the domain knowledge. This helps to determine the construction of the knowledge base.
lies between human expertise and the implemented program
belief that design does not need structured symbolic sentences.
decision tree
A representation that allows us to determine the classification of an object by testing its values for certain properties. (pg 408)
a way of mixing two chromosomes by splicing pieces of each one into the other one. Two parental programs are selected based on fitness.
knowledge base that can be added and undone. Unlike closed world thinking.
delta rule
generalization of the perceptron learning algorithm that is used in many neural networks, including back-propagation.
Instead of the hard-limiting function used in perceptrons, it uses a smoother, continuous function (also known as sigmoidal). A common sigmoidal activation function is the “logistic function”.
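A minimal sketch of one common form of the delta rule for a single node with a logistic activation: each weight moves by learning rate × (target − output) × output × (1 − output) × input. The function names, learning rate, and training pattern are assumptions made for illustration.

```python
import math


def logistic(net):
    """Logistic (sigmoidal) activation function."""
    return 1.0 / (1.0 + math.exp(-net))


def delta_rule_step(weights, inputs, target, learning_rate):
    """One delta-rule update for a single node with a logistic activation."""
    net = sum(w * x for w, x in zip(weights, inputs))
    output = logistic(net)
    # Error times the derivative of the logistic function, output * (1 - output).
    delta = (target - output) * output * (1.0 - output)
    return [w + learning_rate * delta * x for w, x in zip(weights, inputs)]


# Hypothetical training pattern: drive the output toward 1.0 for input (1, 1).
weights = [0.0, 0.0]
for _ in range(1000):
    weights = delta_rule_step(weights, [1.0, 1.0], target=1.0, learning_rate=0.5)
final_output = logistic(sum(weights))
print(final_output)
```

Because the logistic function is continuous, the update shrinks smoothly as the output approaches the target instead of switching on and off as with a hard limiter.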
portion of a program that is not invoked explicitly, but that lies dormant waiting for some condition(s) to occur.
Invoked as a side effect by some other action in the knowledge base
Dempster-Shafer theory of evidence
Addresses the problem of measuring certainty by making a distinction between the lack of certainty and ignorance.
Great for when it is important to make a decision based on the amount of evidence that has been collected.
a type of representation of intelligence. Used in agent-based computing.
domain knowledge
added by the “system builder”
dynamic programming
a type of reinforcement learning inference algorithm that computes value functions by backing up values from successor states to predecessor states. The methods systematically update one state after another, based on a model of the next state distribution. (pg 447)
evolutionary learning
learning based around methods mapped and inspired from evolution.
expert system
A type of application program that makes decisions or solves problems in a particular field by using knowledge and analytical rules defined by experts in the field.
explanation-based learning
uses an explicitly represented domain theory to construct an explanation of a training example, usually a proof that the example logically follows from the theory. By generalizing from the explanation of the instance, rather than from the instance itself, it filters noise, selects relevant aspects of experience, and organizes training data into a systematic and coherent structure. (pg 424)
family resemblance theory
by Wittgenstein argues that categories are defined based on the relationship and similarities between the members of the group, NOT by some necessary and sufficient conditions.
fitness function
deciding to keep the top certain percentage of generated chromosomes (or something else). For example, only the top 70%, the most fit, make it to the next round.
a static data structure used to represent stereotype situations
A formal method of representing information about a single idea or concept in terms of properties where the information is stored in slots.
Extends semantic networks by making it easier to organize our knowledge hierarchically and allowing complex objects to be represented as a single frame rather than a large network structure
frame problem
the problem of representing the side effects of actions.
fuzzy logic
logic that consists of not just 0’s and 1’s but also values in between, like .5 and .3. For statements like “Matt is tall”, we can’t really say yes or no, but we can give a more relative answer.
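One way to picture a statement like “Matt is tall” is a membership function that returns a degree between 0 and 1. The sketch below assumes a simple linear ramp; the cutoff heights (160 cm and 190 cm) are made up, and min and 1 − x are used as the standard fuzzy AND and NOT.

```python
def tall_membership(height_cm, not_tall_below=160.0, fully_tall_above=190.0):
    """Degree in [0, 1] to which a height counts as tall (assumed linear ramp)."""
    if height_cm <= not_tall_below:
        return 0.0
    if height_cm >= fully_tall_above:
        return 1.0
    return (height_cm - not_tall_below) / (fully_tall_above - not_tall_below)


def fuzzy_and(a, b):
    """Standard fuzzy conjunction: the smaller of the two degrees."""
    return min(a, b)


def fuzzy_not(a):
    """Standard fuzzy negation."""
    return 1.0 - a


matt_is_tall = tall_membership(175.0)  # hypothetical height for Matt
tall_and_not_tall = fuzzy_and(matt_is_tall, fuzzy_not(matt_is_tall))
print(matt_is_tall, tall_and_not_tall)
```

Unlike two-valued logic, “tall and not tall” here comes out partly true rather than simply false.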
genetic algorithm
An algorithm that mimics evolution and natural selection to solve a problem by creating chromosomes based on a few decided strategies (probably conditional statements). These chromosomes are then set up against each other and mutated until you achieve optimal values for each of the strategies.
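The select, recombine, and mutate loop can be sketched on a toy problem. Everything specific here is an assumption for illustration: chromosomes are bit lists, fitness is the number of 1 bits, the top half survives each round, and the names `crossover`, `mutate`, and `evolve` are hypothetical.

```python
import random


def fitness(chromosome):
    """Toy fitness (an assumption): the number of 1 bits."""
    return sum(chromosome)


def crossover(parent_a, parent_b, rng):
    """Splice the head of one parent onto the tail of the other."""
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(chromosome, rng, rate=0.05):
    """Flip each bit with a small probability."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


def evolve(population, rng, generations=30, keep_fraction=0.5):
    """Keep the fittest fraction each round and refill with mutated offspring."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: max(2, int(len(ranked) * keep_fraction))]
        children = [
            mutate(crossover(*rng.sample(survivors, 2), rng), rng)
            for _ in range(len(population) - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness)


rng = random.Random(42)
start = [[rng.randint(0, 1) for _ in range(12)] for _ in range(20)]
best = evolve(start, rng)
print(best, fitness(best))
```

Survivors are carried over unchanged, so the best fitness in the population never drops from one generation to the next.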
goal regression
matches the generalization goal with the root of the proof tree, replacing constants with variables as required for the match. (pg 427)
goal-directed preference
Organizing cases can be difficult. GDP helps us by organizing cases by goal descriptions. Retrieves cases that have the same goal as the current situation.
Hebbian learning
as your brain does the same task over and over again, a “rut” gets created and completes that same task quicker. Based on observations in biology when one neuron contributes to the firing of another neuron, the connection or pathway between the two neurons is strengthened. (pg 484)
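A small sketch of the strengthening described above, using a simple form of the Hebbian rule in which each weight grows by learning rate × input × output. The function name and the numbers are assumptions for illustration.

```python
def hebbian_update(weights, inputs, output, learning_rate=0.1):
    """Strengthen each connection in proportion to input * output."""
    return [w + learning_rate * x * output for w, x in zip(weights, inputs)]


weights = [0.0, 0.0, 0.0]
# Inputs 0 and 2 fire together with the output; input 1 stays silent.
for _ in range(5):
    weights = hebbian_update(weights, [1.0, 0.0, 1.0], output=1.0)
print(weights)
```

Only the connections from inputs that were active when the output fired get stronger; the silent input's weight stays at zero.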
implies a one-to-one correspondence between objects and actions in the world and the computational objects and operations of the programming language
Hopfield nets
a special type of attractor network whose convergence (ending output) can be represented by energy minimization.
Can be used to solve constraint satisfaction problems (like the traveling salesperson problem) by mapping optimization function to energy function (pg 500)
hybrid design
Combining different reasoning models, for example case based and rule based reasoning. It is good because integrated paradigms get a cooperative effect where the strengths of one system compensate for the weakness of another.
using previous knowledge and given facts and making your “best guess” about a conclusion.
inductive bias
choosing to ask particular questions by making assumptions and disregarding certain ideas. Refers to any criteria a learner uses to constrain the concept space.
inference engine
The processing portion of an expert system. With information from the knowledge base, the inference engine provides the reasoning ability that derives inferences (conclusions) on which the expert system acts. A small part of the program that interprets the rules to be used.
knowledge base
the heart of the expert system. Contains all of the knowledge (usually if... then statements) of the application domain.
Contains both general knowledge and case-specific knowledge
knowledge engineer
Their main task is to select the software and hardware tools for the project and help the domain expert transform their knowledge into the knowledge base.
law of the excluded middle
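States that an element must belong to either a set or its complement. (The companion rule, that an element cannot belong to both a set and its complement, is the law of non-contradiction.)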
a unique token contained in a conceptual graph. Instead of a node like “dog” it’s a node like “dog:#668584” which is a specific dog with specific attributes.
Markov model
A graph that is directed where the probability of arriving at any state S1 from the set of states S at a discrete time T is a function of the probability distribution of its being in previous states of S.
Each state si of S corresponds to a physically observable situation.
It seems like the word “observable” goes along with “Markov Model.”
programming languages are the medium whereas data structures are the scheme.
minimum distance classification
A discriminant function evaluates class membership (or importance) based on the distance from some central point. Classification based on this discriminant function is called minimum distance classification.
Classes that are linearly separable can have a minimum distance classification.
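A minimal sketch, assuming each class's central point is the mean of its training points and distance is Euclidean. The two invented classes and the names `centroid` and `classify` are only for illustration.

```python
import math


def centroid(points):
    """Mean point of a class, used here as its central point."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def classify(point, centroids):
    """Return the label whose centroid is nearest (Euclidean distance) to the point."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


# Invented training points for two classes.
training = {
    "A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "B": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
}
centroids = {label: centroid(points) for label, points in training.items()}
label = classify((1.0, 1.0), centroids)
print(label)
```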
Monte Carlo method
A type of reinforcement learning inference algorithm. The method does not require a complete model. It instead samples entire trajectories of states to update the value function based on the episodes' final outcomes. It does require experience, that is, sample sequences of states, actions, and rewards from on-line or simulated interaction with the environment. It solves the reinforcement learning problem by averaging sample returns. To ensure well-defined returns, the methods are defined only for full episodes. (pg 448)
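Averaging sample returns over complete episodes can be sketched as follows. This assumes an every-visit variant, episodes recorded as (state, reward) pairs where the reward is the one received on leaving that state, and no discounting by default; the state names and episodes are invented.

```python
from collections import defaultdict


def monte_carlo_values(episodes, discount=1.0):
    """Estimate state values by averaging the returns observed after each visit."""
    returns = defaultdict(list)
    for episode in episodes:  # each episode is a complete list of (state, reward)
        future_return = 0.0
        # Walk backward so the return from each state accumulates later rewards.
        for state, reward in reversed(episode):
            future_return = reward + discount * future_return
            returns[state].append(future_return)
    return {state: sum(vals) / len(vals) for state, vals in returns.items()}


# Two invented complete episodes.
episodes = [
    [("start", 0.0), ("middle", 0.0), ("end", 1.0)],
    [("start", 0.0), ("end", 0.0)],
]
values = monte_carlo_values(episodes)
print(values)
```

No model of the environment appears anywhere; the estimates come only from the sampled episodes.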
take a random number and apply it to one of the values in the chromosomes.
negation as failure
a way to conclude that something is false by failing to prove that it is true. This works because in certain languages like PROLOG anything we cannot establish as true we assume to be false.
network topology
the pattern of connections between the individual neurons. This ends up being the most significant factor in determining a net’s inductive bias.
neural network
it uses neurons for problem solving. Its topology is its primary source of inductive bias.
Cannot represent an idea like “Matt is taller than Justin” according to connectionists.
A neural network is designed as an interconnected system of processing elements, each with a limited number of inputs and outputs. Rather than being programmed, these systems learn to recognize patterns.
based on how things usually work. Similar to frames.
opportunistic search
whenever rules fire to conclude new information, “control moves to consider” those rules which have that new information as a premise. This makes any newly concluded information the controlling force for finding the next rules to fire.
inputs and outputs of +1 or -1 based on weights and threshold values. Learns based on training examples. Also takes into consideration a learning rate.
Can only solve problems that are linearly separable (thus it can have a minimum distance classification).
Perceptrons minimize the error of only the training set you give them. They only fit the data that you give them.
For better inputs, use a logistic function
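A short sketch of perceptron learning with +1/-1 outputs, a threshold (folded into a bias term), and a learning rate. The training task (logical AND in +1/-1 coding), the learning rate, and the function names are assumptions chosen for illustration.

```python
def predict(weights, bias, inputs):
    """Hard-limiting output: +1 if the weighted sum plus bias is positive, else -1."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if net > 0 else -1


def train_perceptron(examples, learning_rate=0.2, epochs=20):
    """Adjust weights only when the output disagrees with the training target."""
    weights, bias = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - predict(weights, bias, inputs)  # 0, +2, or -2
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Logical AND with +1/-1 coding is linearly separable, so training can succeed.
and_examples = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
weights, bias = train_perceptron(and_examples)
print(weights, bias)
```

The same loop would never settle on a problem such as XOR, which is not linearly separable.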
Finding a sequence of actions that allows a problem solver to accomplish some specific task. Must note things in the environment that are changing as well as things that are not (because we have to keep track of side effects, this coincides with the frame problem).
Where we organize explicit knowledge to control problem solving in complex domain. Examples: robotics.
One of the propositions in a deductive argument. If X then Y, X is the premise.
production rule
conditional statements including high-level if and then statements. Something about “update working memory”.
the individual that we are currently looking at in a conceptual graph. As Tiffy would say “the ID number associated with the concept node.” Examples: Dog; Emma (Emma is the referent) Dog; #123 (#123 is the referent)
reinforcement learning
places an “agent” in an environment and it receives feedback. In this case, learning requires the agent to act and then to interpret the feedback from those actions.
This is different from supervised learning because it does not have a “teacher” per se, but instead the agent must create a policy for interpreting all feedback.
the best way to capture the significant parts of the intelligent activities we go through. Examples: language, use of motor skills.
an algorithm that helps optimize a search for rules in the knowledge base of an expert system by matching rules to the data by using a pointer to the rule. Greatly enhances the speed and execution of the system without destroying the semantic behavior.
salient-feature preference
Organizing cases can be difficult. This preference matches cases based on the most important features or matching the largest number of important features.
data structures are the scheme whereas programming languages are the medium.
takes a sequence of events that often happens on a daily basis and makes a structured representation.
Often used in natural language understanding systems
Broken up in to scenes
semantic network
A graph consisting of nodes that represent physical or conceptual objects and arcs that describe the relationship between the nodes, resulting in something like a data flow
set cover
There is a set cover approach to abduction. It defines an abductive explanation as a covering of predicates describing observations by predicates describing hypotheses.
In other words, the set cover approach to abduction attempts to explain the act of adopting a revocable belief in some explanatory hypothesis on the grounds that it explains an otherwise unexplainable set of facts.
an agent that receives input from the environment that it is active in and can ALSO effect changes within that environment. Examples: internet, game playing.
strong method
a method of problem solving that involves knowledge specifically about the domain of the problem. It’s explicitly encoded into the program.
subsumption architecture
Just a bunch of task-handling behaviors. Each behavior is a finite state machine that continually maps some perception-based input into an action-oriented output. Production rules drive this in a blind sort of way. Basically like a little Lego Mindstorms toy.
Brooks questioned the need for any centralized representational scheme and said that “I will attempt to show how intelligent beings might evolve from lower and supporting forms of intelligence.”
Intelligent behavior emerges from the interactions of simple architectures
supervised learning
Where the program basically has a teacher that is able to tell it whether an instance is a positive or negative example of a target concept.
Organization and training of a neural network by a combination of repeated presentation of patterns, such as alphanumeric characters, and required knowledge. An example of required knowledge is the ability to recognize the difference between two similar characters such as O and Q. Synonym: learning with a teacher. Contrast with self-organized system; unsupervised learning.
rules for combining words into legal phrases and sentences. Also can be involved with numbers or code.
Does not have meaning like semantics.
threshold function
produces an “on” or “off” state by computing the final output state of a neuron based on a threshold value. If the neuron is too far above or below the threshold value, it is “off”, otherwise “on”.
transformational analogy
possibly something to do with interpolative associative memory.
truth maintenance
Representations and search procedures that keep track of conclusions that might later need to be questioned. In defeasible reasoning the TMS preserves the consistency of the knowledge base.
weak method
Examines only the syntactic form of states to try to come up with its solution. Intended to solve a wide variety of problems.
selects the node whose pattern is most like the input vector and adjusts it to make it more like the input vector. It is unsupervised in that “winning” is simply seeing which node has a current weight vector closest to the input vector.
An algorithm that works with the single node in a layer of nodes that responds most strongly to the input pattern. The learning is unsupervised because the winner is determined by a “maximum activation” test. The weight vector of the winner is rewarded by bringing its components closer to those of the input vector. It can be viewed as a competition among a set of network nodes. (pg 474)
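The reward step for the winning node can be sketched like this. It assumes the winner is the node whose weight vector is closest to the input (Euclidean distance) and that its weights move a fixed fraction of the way toward the input; the names and numbers are illustrative only.

```python
import math


def winner_take_all_step(nodes, input_vector, learning_rate=0.5):
    """Move only the closest node's weight vector toward the input; return its index."""
    winner = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], input_vector))
    nodes[winner] = [
        w + learning_rate * (x - w) for w, x in zip(nodes[winner], input_vector)
    ]
    return winner


# Two hypothetical nodes, each with a two-component weight vector.
nodes = [[0.0, 0.0], [10.0, 10.0]]
winner = winner_take_all_step(nodes, [2.0, 2.0])
print(winner, nodes)
```

No target output is supplied anywhere, which is what makes the learning unsupervised: the losing node's weights are left untouched.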