30 Cards in this Set

  • Front
  • Back
Why can't we apply a simple rule like "homogeneous areas belong to the same object" in order to find an object's contours?
Because humans sometimes perceive object contours even in areas of an image where there is no physical difference between the object and its background.
Draw a figure that includes an illusory contour.
An illusory contour is one that is perceived even though it is not present in the physical stimulus. The Kanizsa triangle (left) is one famous example.
What is the guiding philosophy behind Gestalt psychology? How does it contrast with the earlier approach known as structuralism?
The structuralists believed that perception of a complex scene was simply the sum of the basic "atoms" of perception (color, orientation, etc.) in the scene. Gestalt psychologists reacted to this position, arguing that a perceptual whole was much more than the sum of its elemental parts.
What do the Gestalt grouping principles seek to describe?
The grouping principles provide rules for how different individual elements in an image will be combined by the visual system into wholes (i.e., objects).
Why is it important to include the phrase "all else being equal" when stating the Gestalt grouping principles?
Because we can be absolutely sure that a principle will adequately predict how elements will be grouped only if no other principles can also be applied. For example, at right we see a display in which the proximity grouping principle would suggest that we organize the elements into 4 columns, while the similarity principle suggests we should perceive 5 rows. Only one principle can "win" (in this case, most people probably see rows rather than columns).
How are the Gestalt grouping principles related to texture segmentation?
A texture is really just a collection of a lot of perceptual elements that are similar to each other and arranged fairly close together. Therefore, stating that areas of an image with different textures are segmented from each other (the definition of texture segmentation) is really the same thing as saying that areas of an image in which elements are similar to each other and/or close together group together.
How is camouflage related to grouping principles?
To camouflage yourself, you basically have to make your features (that is, the visual elements that are visible to anyone who might observe you) group with the features present in your environment.
What is the basic idea behind the "perception by committee" metaphor?
The visual world is a complicated place, and no one rule for interpreting the world can possibly do an adequate job. But once we introduce multiple rules, conflicts between interpretations will inevitably arise. Various parts of our visual system act like perceptual committees, considering which rules conflict and which agree in a given situation and eventually negotiating a single interpretation for the scene.
What are ambiguous figures, and how do they relate to the perception by committee metaphor?
Ambiguous figures, such as the Necker cube seen at left, have more than one valid interpretation. Our perceptual committees settle on one and only one of these interpretations at a time, but the interpretation may "flip" from time to time.
What are some of the assumptions that perceptual committees make?
First, the committees must "know" something about physics; for example, understanding that opaque objects block light is a prerequisite for perceiving the sides of the triangle in the Kanizsa triangle. Second, the committees assume that we are not viewing a scene from an accidental viewpoint, which would mask the true structure of the objects in the scene.
What is figure-ground assignment?
The process of determining which areas of an image constitute a to-be-recognized object (the "figure") and which areas form the background (the "ground").
Describe and note the implications of the experimental finding that shows that meaning plays a role in figure-ground assignment.
When subjects are shown a display like the one at left and asked which side of the display is the figure, they have a strong tendency to respond with the side that resembles a meaningful shape (the black side in this case, which resembles a lamp). This indicates that high-level object recognition processes are already at work even at the presumably early processing stage of figure-ground assignment.
What do non-accidental features tell us about a scene?
Certain arrangements of edges can be interpreted as providing important information about segmenting objects in a scene, provided we are seeing the edges from a non-accidental viewpoint. For example, a "T-junction" (a place where one edge abuts another straight edge in a T-like fashion; the arrow in the figure at left points to a T-junction) strongly indicates that the two edges are parts of different objects.
What rules do our perceptual committees use to divide objects into parts?
One widely accepted proposal is that we use valleys, rather than bumps, in an object as clues to where to divide the object into parts, cutting the object by connecting pairs of valleys (see figure at left).
What evidence is there that the visual system starts with large objects and then divides them into smaller parts, rather than processing scenes the other way around?
Evidence for this proposition comes from the global superiority effect: in displays like those at the left, it was found that identifying the small (local) letters took longer than identifying the larger (global) letter, indicating that the global information comes "on-line" faster than the local information.
What is the fundamental goal of object recognition?
To match a representation of a perceived visual stimulus to a representation of a previously-encountered object encoded in memory.
What is a naive template theory, and why can such theories be rejected as a complete theory of object recognition?
The formal definition of a template is complicated, but template theories essentially follow a "lock and key" principle: the perceived image is the key, and the template is the lock. The naive template approach says that we store templates for all the images of all the objects we've seen previously. When we perceive an object that we want to recognize, we try to match this perception to all the templates stored in memory until we find a lock in which the key fits exactly. This doesn't strike most people as being a very efficient process. One of the most important problems is that it seems unlikely that we have enough brain capacity to store templates to match every single object that we're likely to encounter in our lives.
What is the basic idea behind a structural description, and how do structural description theories improve on template theories?
Straightforwardly enough, a structural description describes the structure of an object. Different theories propose different sets of building blocks with which to create the descriptions, and most structural description theories also propose some way to describe how parts are related to each other. The advantage over templates is that a single structural description can potentially match a large number of slightly different shapes. For example, if an X is described as two oblique lines that cross near their centers, this description will match all the figures at left; each would require a different template in a naive template theory.
What is a geon?
Geons are the building blocks of structural descriptions in Biederman's Recognition-By-Components theory. The defining quality of geons is that they are discriminable from each other based on non-accidental features, so they should be equally recognizable from any viewpoint.
Describe the essence of the viewpoint invariance vs. viewpoint dependence debate in the object recognition literature.
Many structural description theories, such as Recognition-By-Components, predict that in most circumstances, object recognition should be equally efficient (i.e., equally fast) regardless of what viewpoint you see the object from. Such a pattern of performance, in which recognition time does not vary across changes in viewpoints, is known as viewpoint invariance. However, many empirical studies have revealed that object recognition times are in fact dependent on viewpoint: if subjects study a novel object from a single viewpoint, they are usually slower to recognize the object later when shown from a new viewpoint than when shown from the trained viewpoint. These findings have cast doubt on structural description theories and led to a resurgence of interest in theories that use template-like representations.
What do we mean when we say that objects can be recognized at different levels?
Object recognition is essentially a categorization process: identifying an object means deciding what category the object belongs in. Most objects actually have a number of categories that they could be placed in. The level of recognition refers to the specificity of the category you use when identifying an object.
What are basic, subordinate, and superordinate categories?
These terms are best described in relation to each other. A subordinate level category is one that is quite specific, referring to a relatively small number of objects. A superordinate level category, on the other hand, is much more general; superordinate categories are often defined by functional or conceptual, rather than shape-based, qualities. Basic level categories are in between. Some examples of subordinate-basic-superordinate triplets are: schnauzer-dog-animal; office chair-chair-furniture; and iMac-computer-machine.
What is the difference between an entry level and a basic level category?
The entry level term for an object is operationally defined as the first word that comes to mind when someone is asked to name the object. The formal definition of the basic level is more complicated and somewhat more vague. Usually, an object's entry level term is the same as its basic level term, but exceptions occur for strangely-shaped objects such as penguins and bean bag chairs.
Why is face recognition thought to be accomplished via different mechanisms than the recognition of other objects?
Most objects require considerably more time to recognize at the subordinate than at the basic level. However, recognition of individual faces, which is a subordinate-level task, is a very fast process—so fast that many researchers believe the visual system must use "special" mechanisms to recognize faces.
What is the inversion effect, and how does it relate to the special mechanisms thought to be operating when we recognize faces?
Faces are more difficult than other objects to recognize when inverted (turned upside-down). Researchers have proposed that when faces are inverted, the special processes that are usually brought to bear in recognizing faces cannot operate, so we are forced to rely on our "normal" object recognition processes, which are not as efficient.
What is prosopagnosia, and what does it say about special face recognition processes?
Prosopagnosia is a neuropsychological disorder in which patients cannot recognize people by their faces, although recognition of other objects may remain relatively intact. It is thought that this disorder is brought about by damage to the portion of the brain where special face recognition processes are carried out.
What cortical brain structures does visual information pass through as it is processed?
Information first reaches the cortex in a region called striate cortex, so-called because it has a distinctive striped pattern under the microscope. Early vision processes are carried out here, then information is passed to extrastriate cortex, where the tasks of middle vision are carried out (for example, this is where illusory contours are formed). From here, information travels via two separate pathways, one that ends in the parietal lobe and one that terminates in inferotemporal (IT; lower temporal lobe) cortex. It is in IT cortex that the end-stage processing of face and object recognition is carried out.
What are the receptive field characteristics of cells in IT cortex?
Many neurons in IT have been shown to respond most actively to particular objects or faces. The term grandmother cell was coined to describe these neurons, the implication being that a single cell might be ultimately responsible for deciding whether an image was of one's grandmother's face.
What methods are used to study the function of brain areas such as IT?
Some labs lesion (surgically remove) parts of the brains of non-human subjects to see what functions are impaired following the surgery. The results of such studies are often compared to deficits shown by human patients who have had homologous regions of their brains damaged by accident. Other labs use single-cell recording techniques to determine the responses of individual neurons to different types of stimuli (it was in these labs that grandmother cells were found). Recently, considerable resources have been poured into laboratories employing non-invasive techniques such as functional magnetic resonance imaging (fMRI), which can take snapshots of humans' brains as they perform different tasks.
How can researchers find out about perceptual processes operating in infants, given that these subjects cannot explicitly respond to researchers' questions?
One methodology is to show an infant subject the same stimulus over and over again until he or she becomes bored and stops looking at the stimulus when it is shown. Then the stimulus can be changed in some way and presented again. If the infant acts surprised to see the new stimulus, we can infer that he or she must be able to perceive the difference between the new and old stimuli.
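
Study aid: the cards on naive template theory and structural descriptions can be made concrete with a small sketch. This is only an illustration, not a model from the perception literature. It assumes a shape is given as a pair of line segments with (x, y) endpoints, and every name in it (`naive_template_match`, `matches_x_description`, the 0.25 to 0.75 "near the center" band) is hypothetical. The template matcher accepts a shape only if it is identical to a stored one; the structural description accepts any two oblique lines that cross near their centers.

```python
# Toy contrast between a naive template and a structural description.
# All names and thresholds here are illustrative assumptions.

def naive_template_match(shape, templates):
    """Lock and key: the shape must equal a stored template exactly."""
    return shape in templates

def is_oblique(segment):
    """A segment is oblique if it is neither horizontal nor vertical."""
    (x1, y1), (x2, y2) = segment
    return x1 != x2 and y1 != y2

def crossing_params(s1, s2):
    """Return (t, u): how far along each segment the two lines meet.

    0.0 is the start of a segment, 1.0 its end. Returns None when the
    segments are parallel and therefore never cross.
    """
    (x1, y1), (x2, y2) = s1
    (x3, y3), (x4, y4) = s2
    denom = (x2 - x1) * (y4 - y3) - (y2 - y1) * (x4 - x3)
    if denom == 0:
        return None
    t = ((x3 - x1) * (y4 - y3) - (y3 - y1) * (x4 - x3)) / denom
    u = ((x3 - x1) * (y2 - y1) - (y3 - y1) * (x2 - x1)) / denom
    return t, u

def matches_x_description(shape, low=0.25, high=0.75):
    """Structural description of an X: two oblique lines that cross
    near their centers (the 'near' band is an assumed threshold)."""
    if len(shape) != 2:
        return False
    s1, s2 = shape
    if not (is_oblique(s1) and is_oblique(s2)):
        return False
    params = crossing_params(s1, s2)
    if params is None:
        return False
    t, u = params
    return low <= t <= high and low <= u <= high

# One stored X, plus a skewed X, a plus sign, and a V to test against.
stored_x = (((0, 0), (4, 4)), ((0, 4), (4, 0)))
skewed_x = (((1, 0), (5, 6)), ((0, 5), (6, 1)))
plus_sign = (((2, 0), (2, 4)), ((0, 2), (4, 2)))
v_shape = (((0, 0), (2, 4)), ((2, 4), (4, 0)))

templates = {stored_x}

for name, shape in [("stored X", stored_x), ("skewed X", skewed_x),
                    ("plus sign", plus_sign), ("V shape", v_shape)]:
    print(name,
          "| template:", naive_template_match(shape, templates),
          "| structural:", matches_x_description(shape))
```

The skewed X fails the exact template match but satisfies the structural description, which is the advantage the structural description card describes: one description covers many slightly different shapes, where a naive template theory would need a separate template for each.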