48 Cards in this Set

What are the 4 basic issues in speech perception?
1) Linearity problem
2) Segmentation problem
3) Invariance (acoustic-perceptual) problem
4) Unit of speech problem
Describe the Linearity Problem.
- if speech perception/production were a truly linear phenomenon, then each perceived phoneme in an utterance would be associated with a discrete and non-overlapping stretch of sound
- COARTICULATION produces overlapping features and a kind of 'smearing' of neighbouring phonemes
- recall that lip rounding can begin 5-6 phonemes ahead of the rounded vowel /u/
- the result is a complex (and nonlinear) mapping between the perceived phonemes and the related acoustic cues
Describe the Segmentation Problem. (1 of 2)
- although listeners perceive speech as a series of discrete phonemes and words, the 'physical temporal boundaries' between phonemes are often difficult to define on the acoustic signal
- we can develop reliable segmentation rules for marking points on the acoustic signal, but these acoustic segments may not correspond to the onset and offset of the perceived phoneme
- segmentation problems are particularly apparent in some of the more gradually varying phonemes (e.g. try to define the phoneme boundaries in 'I owe you a yo-yo')
Describe the Segmentation Problem. (2 of 2)
- coarticulation and the nonlinear nature of speech also create problems for segmentation (e.g. if lip rounding or nasal resonance starts during a preceding phoneme, should it be used to mark the onset of a phoneme?)
- the segmentation problem creates significant difficulties for scientists who are trying to develop speech recognition devices
Describe the Acoustic-Perceptual Invariance Problem.
- the invariant units of perception (ie. phonemes) do not correspond to invariant acoustic signals
- the acoustic features for a given phoneme can show a great deal of variation as a function of phonetic context (and other factors such as speech rate)
e.g. F2 transitions for /d/ and /g/ vary considerably across vowel contexts, and VOT for voiceless stops varies with rate of speech
Describe the Unit of Speech Problem. (1 of 2)
- what is the minimal unit of perceptual analysis in our perception of speech?
- how is the complex and information-rich acoustic signal reduced to a fairly small set of phonemes?
- it has been estimated that the conversion from speech sounds to phonemes reduces the information transfer rate of speech from approximately 40,000 bits per second down to about 40 bits per second
- several possible units of analysis have been suggested: (phonetic features, phonemes, syllables, morphemes, etc.)
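Taking the two estimates above at face value, the reduction they imply is a simple ratio:

```latex
\frac{40\,000\ \text{bits/s}}{40\ \text{bits/s}} = 1000
```

i.e. roughly a thousand-fold reduction in information rate.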
Describe the Unit of Speech Problem. (2 of 2)
- the context sensitivity problem has caused a number of speech scientists to reject the phoneme as the basic unit
- instead, more context-sensitive units have been proposed (syllables, context-sensitive spectra, context-sensitive allophones (Wickelphones))
What are the different ways to prepare stimuli for speech perception analysis?
A) Pattern playback device: painted formant patterns on acetate film that are converted into acoustic signals.
B) Waveform editing: splicing to remove/add segments.
C) Formant synthesizer: a computer generates formants (resonances) similar to those of human speech.
-e.g. Klatt synthesizer
D) LPC resynthesis: computer uses natural speech, performs an LPC analysis, then provides a list of variables that can be selectively manipulated (e.g. F0, formant frequencies, segment durations, etc.)
-This has a lot of POTENTIAL FOR DISORDERED SPEECH.
What are some ways to present stimuli?
-discrimination procedures
-rating procedures (equal appearing interval scales, visual analog scales, etc.)
-identification procedures (intelligibility tests)
What are some additional issues in speech perception?
1) Specialization of speech perception
2) Categorical perception
Describe the Specialization of Speech Perception.
-Is speech perception special?
-Is there a specialized speech mode of perception? Or is speech perception simply a reflection of some 'general principles of auditory perception'?
What is the McGurk Effect?
The McGurk effect is a perceptual phenomenon that demonstrates an interaction between hearing and vision in speech perception. The illusion occurs when the auditory component of one sound is paired with the visual component of another sound, leading to the perception of a third sound.
-it is the interaction between VISUAL and AUDITORY aspects of speech.
Describe adult categorical perception.
- when a specific acoustic parameter (e.g. VOT, F2 frequency, rise time) is changed gradually, listeners' identification responses show abrupt categorical changes (instead of gradual or continuous changes).
e.g. - gradual changes in F2 transitions lead to abrupt shifts in /b/, /d/ and /g/ identifications
- gradual changes in VOT lead to an abrupt shift from /d/ to /t/.

- early evidence related to the perception of certain nonspeech stimuli (e.g. gradual changes in pure tones) suggested that nonspeech sounds were perceived in a continuous (not categorical) manner
(seen as evidence of a special speech mode of perception)
-more recent evidence from nonspeech stimuli that are more analogous to speech has demonstrated that some nonspeech stimuli are perceived categorically (e.g. a VOT analog: noise burst + periodic buzz)

-vowels are perceived less categorically than most consonants.

- categorical perception may be related to the length, complexity and/or rate of change of stimuli (and not to whether the stimuli are speech versus nonspeech)
-i.e. the auditory system may deal with certain brief, complex, and/or rapidly changing stimuli in a categorical manner
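One way to picture the abrupt shift described above is a steep logistic identification function along a /d/-/t/ VOT continuum. This is a minimal sketch: the 30 msec boundary and the slope are hypothetical values chosen for illustration, not figures from these cards.

```python
import math

# Hypothetical values for illustration only (the cards give no numbers):
# BOUNDARY_MS is the VOT at which /d/ and /t/ responses are equally likely,
# SLOPE sets how abruptly identification switches between categories.
BOUNDARY_MS = 30.0
SLOPE = 0.5  # per msec; larger values give a sharper, more "categorical" step


def prob_t(vot_ms: float, boundary: float = BOUNDARY_MS, slope: float = SLOPE) -> float:
    """Probability that a stimulus with this VOT is identified as /t/ (logistic model)."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary)))


def identification_curve(vots, boundary: float = BOUNDARY_MS, slope: float = SLOPE):
    """Return (VOT, P(/t/)) pairs along a stimulus continuum."""
    return [(v, prob_t(v, boundary, slope)) for v in vots]


if __name__ == "__main__":
    # Equal 10 msec steps from 0 to 60 msec.
    for vot, p in identification_curve(range(0, 61, 10)):
        print(f"VOT {vot:2d} msec -> P(/t/) = {p:.2f}")
```

With these settings the printed probabilities stay near 0 up to 20 msec and near 1 from 40 msec on, even though every step along the continuum is the same size; a continuous (non-categorical) listener would instead show a gradual ramp.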
Describe infant categorical perception.
- the infant's 'high amplitude sucking' (HAS) response has been used to examine categorical perception in infants at 2-4 months of age
-the HAS response: infants increase their rate of sucking on a pacifier in response to a novel stimulus
- present a gradually changing speech signal continuum and measure the points at which the HAS response occurs
- the abrupt shift in HAS response is believed to reflect the infants' detection of a categorical change in the stimuli
e.g.
VOT (msec):   -20    0      20     40
category:     same   same   same   different
response:     low    low    low    HAS
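The logic of the example can be sketched as a prediction rule. It assumes, hypothetically, that the infant was habituated to the -20 msec stimulus and that the category boundary lies at 30 msec; a real HAS study measures the recovery of sucking rather than computing it.

```python
# Hypothetical category boundary, placed between the 20 and 40 msec steps of the
# example; real boundaries have to be estimated from listeners' responses.
BOUNDARY_MS = 30


def category(vot_ms: float, boundary: float = BOUNDARY_MS) -> str:
    """Assign a VOT value to a voicing category using a hard boundary."""
    return "voiced" if vot_ms < boundary else "voiceless"


def predicted_has(habituated_vot: float, test_vots) -> list:
    """Predict 'low' (no recovery) or 'HAS' (renewed sucking) for each test stimulus.

    Assumes sucking recovers only when a test stimulus falls in a different
    category from the stimulus the infant was habituated to.
    """
    home = category(habituated_vot)
    return ["HAS" if category(v) != home else "low" for v in test_vots]


if __name__ == "__main__":
    print(predicted_has(-20, [-20, 0, 20, 40]))  # ['low', 'low', 'low', 'HAS']
```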
Describe non-human categorical perception.
- chinchillas were studied using speech stimuli in a shock-avoidance procedure
- these and other animals have shown enhanced discrimination of speech stimuli (categorical-like perception) at the same category boundaries that have been reported for adult human listeners

- the implication is that we may not need to hypothesize an innate and uniquely human mechanism of phonetic perception
- the development of phonetic contrasts in human speech perception may have emerged from certain fairly general auditory functions; human speech may have exploited these general auditory functions
What are some examples of intelligibility measures? (1 of 2)
1) Rating scales
-equal appearing interval scales (e.g. 1=normal, 7=intelligibility deficit)
-visual analog scales (10cm line where 0=0% intelligibility and 10=100% intelligibility)
-magnitude estimation procedure (50% intelligibility sample used as reference point)
2) Transcription and multiple choice tests
a) Intelligibility Severity Score Tests
-CAIDS (Computerized Assessment of Intelligibility in Dysarthric Speech)
-50 words, multiple choice or transcription, gives % intelligibility.
-Sentence Intelligibility Test (SIT) 22 sentences, transcribed, gives % intelligibility.
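Both tests report % intelligibility, i.e. the proportion of target words a listener identifies correctly. A minimal sketch of the transcription arithmetic, assuming simple position-by-position word matching (the function name and scoring rule are illustrative, not either test's official procedure):

```python
def percent_intelligibility(targets, transcriptions):
    """Percentage of target words that a listener transcribed correctly.

    Simplified scoring assumed for this sketch: words are compared position by
    position, ignoring case and surrounding punctuation. Published tests have
    their own scoring rules, which this does not reproduce.
    """
    def words(text):
        return [w.strip(".,?!").lower() for w in text.split()]

    correct = total = 0
    for target, heard in zip(targets, transcriptions):
        t_words, h_words = words(target), words(heard)
        total += len(t_words)  # every target word counts, even if not transcribed
        correct += sum(1 for t, h in zip(t_words, h_words) if t == h)
    return 100.0 * correct / total if total else 0.0


if __name__ == "__main__":
    targets = ["The boy ran home.", "She sat down."]
    heard = ["the boy ran", "he sat down"]
    print(f"{percent_intelligibility(targets, heard):.1f}%")
```

Here 5 of the 7 target words are matched, so the sketch prints 71.4%.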
What are some examples of intelligibility measures? (2 of 2)
b) Phonetic Intelligibility Test
-multiple choice test of 70 single words; attempts to give a phonetic explanation for the patient's intelligibility deficit
-examines 19 phonetic errors that commonly occur in dysarthria (voicing, nasality, etc.)
-provides a phonetic error profile that shows the percentage (%) for each phonetic error.
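An error profile of this kind is a set of percentages, one per phonetic contrast. A minimal sketch, assuming each scored item is tagged with the contrast it tests and whether the listener chose the target; the input format is hypothetical, and computing errors per item tested (rather than as a share of all errors) is an assumption about how the percentage is defined:

```python
from collections import Counter


def error_profile(items):
    """Percentage of items answered incorrectly, per phonetic contrast.

    `items` holds (contrast, correct) pairs, one per scored test item, e.g.
    ("voicing", False). The tagging scheme is a hypothetical input format.
    """
    attempts, errors = Counter(), Counter()
    for contrast, correct in items:
        attempts[contrast] += 1
        if not correct:
            errors[contrast] += 1
    return {c: 100.0 * errors[c] / attempts[c] for c in attempts}


if __name__ == "__main__":
    scored = [("voicing", False), ("voicing", True), ("voicing", False), ("nasality", False)]
    for contrast, pct in error_profile(scored).items():
        print(f"{contrast}: {pct:.0f}% errors")
```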
What is important for VOWEL perception?
1) formant frequencies (F1, F2 and F3)
- usually only F1 and F2 are required for adequate perception

2) formant transitions
- better vowel identification for vowels in CVC context than for isolated vowels
What is important for SEMIVOWEL perception?
a) GLIDES (w,j)
- the rate of change of the formant transitions serves to cue glide/stop manner (e.g. transition duration /b/ = 0-50 msec, /w/ = 75-150 msec)
- /w/ and /j/ can be synthesized with only F1 & F2
- recall that the F2 transitions distinguish /w/ from /j/ (i.e. /w/ locus = 800 Hz; /j/ locus = 2200 Hz)

b) LIQUIDS (r,l)
- /r/ & /l/ usually require F1, F2 and F3 for synthesis
- F3 values will distinguish /l/ (2700 Hz) from /r/ (1600 Hz)
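The numeric cues on this card can be read as simple decision rules. This toy sketch assumes one measured value per cue and nearest-target matching; real listeners integrate several cues, and values in the unlisted 50-75 msec gap are simply reported as ambiguous:

```python
def nearest(value: float, targets: dict) -> str:
    """Label whose target value is closest to `value`."""
    return min(targets, key=lambda label: abs(targets[label] - value))


# Target values (Hz) quoted on the card above.
GLIDE_F2_LOCI = {"w": 800, "j": 2200}
LIQUID_F3 = {"l": 2700, "r": 1600}


def glide_or_stop(transition_ms: float) -> str:
    """Manner from formant-transition duration: /b/-like 0-50 msec, /w/-like 75-150 msec."""
    if transition_ms <= 50:
        return "stop"
    if 75 <= transition_ms <= 150:
        return "glide"
    return "ambiguous"  # outside the quoted ranges


if __name__ == "__main__":
    print(glide_or_stop(100), nearest(900, GLIDE_F2_LOCI), nearest(1700, LIQUID_F3))
```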
What is important for NASAL perception?
a) Manner cues (ie. vowel vs nasal)
- weakening of the upper formants' amplitudes (due to damping and antiformants)
- low frequency 'nasal' resonance (300 Hz)

b) Place cues (i.e. m, n, ng)
- formant transitions and loci are the same as for stops at the same place of articulation
/m/ = 800 Hz /n/ = 1800 Hz /ng/ = 3000 Hz
What is important for STOP perception? (1 of 2)
a) Manner cues
i) rate of change in formant transitions (glide versus stop)
ii) silent period (stop gap) distinguishes a stop/affricate from a fricative
iii) duration of turbulent noise (less than 40 msec for stops, 40-90 msec for affricates and +90 msec for fricatives)
iv) rise time: stops 10 msec (5-20 msec), affricates 30 msec (30-50 msec), fricatives +70 msec

b) Place cues (p, t, k)
i) burst frequency
/p/ = 500-1500 Hz, /t/ = +4000 Hz, /k/ = 1500-4000 Hz
ii) F2 transitions and loci
/p/= 800 Hz /t/= 1800 Hz /k/= 3000 Hz
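The two place cues can be checked against each other in a small sketch. It uses the card's burst ranges and F2 loci; assigning the shared edges (1500 Hz to /k/, 4000 Hz to /t/) and flagging disagreements as conflicts are assumptions made for illustration. The same loci are listed above for /m/, /n/ and /ng/.

```python
# Values (Hz) quoted on the card above.
F2_LOCI = {"p": 800, "t": 1800, "k": 3000}
BURST_RANGES = {"p": (500, 1500), "k": (1500, 4000), "t": (4000, float("inf"))}


def place_from_locus(f2_hz: float) -> str:
    """Place whose F2 locus is nearest the measured value."""
    return min(F2_LOCI, key=lambda stop: abs(F2_LOCI[stop] - f2_hz))


def place_from_burst(burst_hz: float):
    """Place whose burst-frequency range contains the value (None if below 500 Hz)."""
    for stop, (low, high) in BURST_RANGES.items():
        if low <= burst_hz < high:
            return stop
    return None


def place_estimate(burst_hz: float, f2_hz: float) -> str:
    """Report the place if both cues agree, otherwise flag the conflict."""
    by_burst, by_locus = place_from_burst(burst_hz), place_from_locus(f2_hz)
    return by_locus if by_burst == by_locus else f"conflict ({by_burst} vs {by_locus})"


if __name__ == "__main__":
    print(place_estimate(5000, 1900), place_estimate(1000, 3000))
```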
What is important for STOP perception? (2 of 2)
c) Voicing cues (ptk vs bdg)
i) voicing during closure
ii) presence of aspiration noise
iii) voice onset time (burst to onset of voicing)
iv) closure duration (e.g. "rabid" with +70 msec of silence before /b/ changes to "rapid")
v) duration of vowel preceding stop (shorter before voiceless than before voiced stops)

'Trading relations' exist between these 5 voicing cues such that changes in one cue can change the categorical boundaries of another cue (e.g. increasing the length of the preceding vowel can shift the categorical boundary observed for VOT)
What is important for FRICATIVE perception? (1 of 2)
A) Manner cues (fricative vs affricate/stop)
- extended period of frication noise (+90 msec)
- slow rise time (+70 msec)

B) Place cues
i) noise frequency (spectrum)
- s and sh have sharply peaked spectra, f and th have flat spectra; s (+4000 Hz) has a higher frequency than sh (2500 Hz)
ii) F2 formant transitions
- very important cues for /f/, /v/, and interdentals (th)
- f/v have low starting frequency 900 Hz
- interdentals have higher starting frequency 1700-2400Hz

iii) relative intensity:
- much lower for f/th than for s/sh
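The place cues above suggest a two-step decision: spectral shape first separates s/sh from f/th, then peak frequency or F2 onset chooses within each pair. In this sketch the 4000 Hz reference for /s/ and the 2050 Hz midpoint of the 1700-2400 Hz interdental range are simplifying assumptions, relative intensity is not modelled, and voicing is ignored:

```python
# Reference values (Hz) from the card above; 4000 Hz for /s/ and the 2050 Hz
# interdental midpoint are simplifying assumptions for this sketch.
SIBILANT_PEAKS = {"s": 4000, "sh": 2500}
NONSIBILANT_F2_ONSETS = {"f/v": 900, "th": 2050}


def nearest(value: float, targets: dict) -> str:
    """Label whose reference value is closest to `value`."""
    return min(targets, key=lambda label: abs(targets[label] - value))


def fricative_place(peaked_spectrum: bool, peak_hz: float = 0.0, f2_onset_hz: float = 0.0) -> str:
    """Two-step place decision; voicing is ignored."""
    if peaked_spectrum:
        # s/sh: sharply peaked noise spectra, separated by peak frequency
        return nearest(peak_hz, SIBILANT_PEAKS)
    # f/th: flat spectra, so rely on the starting frequency of the F2 transition
    return nearest(f2_onset_hz, NONSIBILANT_F2_ONSETS)


if __name__ == "__main__":
    print(fricative_place(True, peak_hz=4500), fricative_place(False, f2_onset_hz=2000))
```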
What is important for FRICATIVE perception? (2 of 2)
C) Voicing cues
i) voicing during frication noise
ii) length of vowel preceding fricative (e.g. /u/ is longer in /juz/ than in /jus/)
iii) voiced fricatives have higher intensity than voiceless
What is important for AFFRICATE perception? (1 of 2)
a) Manner cues (stop versus affricate versus fricative)
i) silent period (preceding frication noise)
- fricative: 0-20 msec ("grey ship")
- affricate: 20-60 msec ("grey chip")
- an inserted stop may be perceived at +100 msec (e.g. "grey chip" becomes "great ship")

ii) duration of frication noise
- less than 40msec for stops
- 40-90msec for affricates
- +90 msec for fricatives

iii) rise time
- stops 10 msec (5-20msec)
- affricates 30 msec (30-50msec)
- fricatives +70 msec
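The three manner cues above can be sketched as threshold rules using the card's own values (boundary points such as exactly 20 msec of silence are assigned arbitrarily). The majority vote at the end is only a crude, assumed stand-in for the trading relations described next; listeners weigh cues against each other rather than counting votes.

```python
from collections import Counter


def manner_from_noise(noise_ms: float) -> str:
    """Frication-noise duration: <40 stop, 40-90 affricate, >90 fricative."""
    if noise_ms < 40:
        return "stop"
    if noise_ms <= 90:
        return "affricate"
    return "fricative"


def manner_from_rise_time(rise_ms: float) -> str:
    """Rise time: up to 20 stop, 30-50 affricate, 70+ fricative; gaps are ambiguous."""
    if rise_ms <= 20:
        return "stop"
    if 30 <= rise_ms <= 50:
        return "affricate"
    if rise_ms >= 70:
        return "fricative"
    return "ambiguous"


def manner_from_silence(silence_ms: float) -> str:
    """Silent period before the noise: <20 fricative, 20-60 affricate, 100+ stop + fricative."""
    if silence_ms < 20:
        return "fricative"
    if silence_ms <= 60:
        return "affricate"
    if silence_ms >= 100:
        return "stop + fricative"
    return "ambiguous"


def manner_vote(noise_ms: float, rise_ms: float, silence_ms: float) -> str:
    """Crude majority vote over the three cues (ambiguous cues are ignored)."""
    votes = [manner_from_noise(noise_ms), manner_from_rise_time(rise_ms), manner_from_silence(silence_ms)]
    counts = Counter(v for v in votes if v != "ambiguous")
    return counts.most_common(1)[0][0] if counts else "ambiguous"


if __name__ == "__main__":
    print(manner_vote(60, 35, 40), manner_vote(120, 80, 5))
```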
What is important for AFFRICATE perception? (2 of 2)
-A trading relation has been observed for these 3 manner cues such that a change in one cue can cause a shift in the categorical boundary for another cue (e.g. inserting a silent period before the 'sh' of 'dish' makes it 'ditch', but extending the noise duration of the 'sh' switches it back to 'dish')

b) Voicing cues
i) voicing during frication noise
ii) longer fricative interval for voiceless
iii) longer silent interval for voiceless
iv) longer vowels preceding voiced affricates
What is Motor Theory?
- speech is perceived by processes that are also involved in its production
- speech perception involves a matching between incoming acoustic-phonetic information and the stored representations for articulatory gestures
- the proposed basic unit of perception is the intended articulatory gesture
- Despite the acoustic variability of speech (context sensitivity problem; e.g. F2 transitions vary across vowels), the underlying motor commands and articulatory gestures are relatively invariant.
What are some criticisms of Motor Theory?
1. Little empirical support for the role of articulatory gestures in perception of speech.
2. May be more efficient to propose that we go directly from acoustics to phoneme identification rather than add an intermediate step of gesture matching.
3. Articulatory data have shown that speech gestures are highly variable and also demonstrate context-sensitivity.
4. Children with severe speech production and speech planning disorders from birth will often demonstrate normal speech perception (same thing for acquired adult speech disorders such as apraxia of speech).
What are Information Processing Theories? (1 of 2)
- these theories generally propose several hierarchically organized levels of processing
- frequently proposed processing levels include:
1. Preliminary (low level) auditory analysis (may operate like a filtering device)
2. Feature analysis/detectors: usually a type of phonetic or acoustic feature detector
3. Auditory memories and/or buffers
- there are usually important interactions between the feature detectors and the memory processes, and these are responsible for the recognition of phonemes or words
What are Information Processing Theories? (2 of 2)
- the output from these 3 processing levels is sent forward and becomes the input to higher level language/cognitive processes.
What is Lexical Access from Spectra (LAS)?
- proposes that speech perception involves direct non-interactive access to lexical entries (stored words) via context sensitive spectral sections (spectral templates for words).
- basic unit = diphone = a 2-sound sequence (e.g. CV, VC or CC)
- associated with each diphone is a prototypical spectral representation
- diphone sequences for each word are stored as templates for comparison and matching
- word recognition is accomplished when a best match is found between an input spectral sequence and a stored lexical (word) diphone sequence
-BOTTOM-UP theory
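The matching step of LAS can be sketched as template comparison. Everything concrete here is assumed for illustration: a three-word lexicon, invented three-band "spectra" for each diphone, summed Euclidean distance as the match score, and comparison only against templates of the same length. What it shows is the bottom-up flow: word, then diphone sequence, then stored spectral template, then best match.

```python
import math

# Invented 3-band "spectra" for each diphone (illustration only).
DIPHONE_SPECTRA = {
    ("b", "a"): (0.9, 0.6, 0.2),
    ("p", "a"): (0.4, 0.6, 0.7),
    ("a", "t"): (0.7, 0.3, 0.8),
    ("a", "d"): (0.8, 0.5, 0.3),
}

# Hypothetical lexicon: word -> phone sequence.
LEXICON = {
    "bat": ["b", "a", "t"],
    "pat": ["p", "a", "t"],
    "bad": ["b", "a", "d"],
}


def diphones(phones):
    """Overlapping 2-phone units, e.g. b-a-t -> (b, a), (a, t)."""
    return list(zip(phones, phones[1:]))


def template(word):
    """Stored spectral template: one prototype spectrum per diphone."""
    return [DIPHONE_SPECTRA[d] for d in diphones(LEXICON[word])]


def mismatch(seq_a, seq_b):
    """Summed Euclidean distance between two equal-length spectral sequences."""
    return sum(math.dist(a, b) for a, b in zip(seq_a, seq_b))


def recognize(input_spectra):
    """Word whose template best matches the input (bottom-up only); None if no candidate."""
    candidates = [w for w in LEXICON if len(template(w)) == len(input_spectra)]
    return min(candidates, key=lambda w: mismatch(template(w), input_spectra), default=None)


if __name__ == "__main__":
    noisy_bat = [(0.85, 0.55, 0.25), (0.72, 0.32, 0.75)]
    print(recognize(noisy_bat))
```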
What is Fuzzy Set Theory? (1 of 2)
Proposes the following operations in phoneme identification:
1) Feature evaluation: an operation that determines which acoustic-phonetic feature has occurred.
- in the fuzzy set model features are assigned continuous values or weightings (range from 0 to 1)
-the weightings indicate the degree of certainty that a particular feature is present in the signal (a kind of probability score)
2) Prototype matching: the probable acoustic features are compared with stored acoustic prototypes for each phoneme
-another probability score is assigned to possible matches and the most likely match is selected.
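The two operations above can be sketched in a few lines. Multiplying the fuzzy feature values and normalising the match scores into probabilities follows the convention of Massaro's fuzzy logical model of perception; the features, prototypes and numbers are hypothetical. No perfect match is required: an input that is only 0.8 'voiced' is still most likely /b/.

```python
from math import prod

# Hypothetical prototypes: 1 = the phoneme should have the feature, 0 = it should not.
PROTOTYPES = {
    "b": {"voiced": 1, "labial": 1},
    "p": {"voiced": 0, "labial": 1},
    "d": {"voiced": 1, "labial": 0},
}


def evaluate_match(features: dict, prototype: dict) -> float:
    """Degree (0-1) to which fuzzy feature values fit one prototype.

    A required feature contributes its value v, an excluded feature
    contributes 1 - v, and the contributions are multiplied.
    """
    return prod(v if prototype[name] else 1.0 - v for name, v in features.items())


def identify(features: dict):
    """Return the most likely phoneme and the normalised match probabilities."""
    scores = {ph: evaluate_match(features, proto) for ph, proto in PROTOTYPES.items()}
    total = sum(scores.values())
    if total == 0:
        return None, scores
    probs = {ph: s / total for ph, s in scores.items()}
    return max(probs, key=probs.get), probs


if __name__ == "__main__":
    # Fairly, but not perfectly, voiced and labial input.
    best, probs = identify({"voiced": 0.8, "labial": 0.9})
    print(best, {ph: round(p, 2) for ph, p in probs.items()})
```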
What is Fuzzy Set Theory? (2 of 2)
- Does not require a perfect match between an acoustic input and an identified phoneme
-This model tolerates a lot of variation in the acoustic signal (addresses the context-sensitivity problem)
What is Logogen Theory?
-A word recognition theory
logogen = a passive sensing device that contains all of the information about a given word (ie. its meaning, syntactic functions, phonetic and orthographic structure, frequency of occurrence, etc.)

- logogens are activated by the presence of a specific word in the acoustic signal
- when a word is presented to the listener, the logogen is activated and then all of the stored knowledge about the word is made available.
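A logogen can be sketched as a passive counter that fires once the evidence for its word reaches a threshold. The evidence rule (one count per phoneme matching in position) and the threshold values are hypothetical; giving the high-frequency word a lower threshold is how the logogen model is usually said to account for word-frequency effects. Returning the word stands in for making its stored knowledge available.

```python
def logogen_recognition(input_phones, lexicon):
    """Return (word, phonemes heard) for the first logogen to reach threshold.

    `lexicon` maps word -> (phonemes, threshold). Each logogen passively counts
    phonemes that match its word in position and fires when the count reaches
    its threshold. Returns (None, len(input_phones)) if no logogen fires.
    """
    counts = {word: 0 for word in lexicon}
    for i, phone in enumerate(input_phones):
        for word, (phones, threshold) in lexicon.items():
            if i < len(phones) and phones[i] == phone:
                counts[word] += 1
            if counts[word] >= threshold:
                return word, i + 1
    return None, len(input_phones)


if __name__ == "__main__":
    # Hypothetical thresholds: a high-frequency 'cat' needs less evidence.
    high_freq = {"cat": (["k", "a", "t"], 2), "cap": (["k", "a", "p"], 3)}
    low_freq = {"cat": (["k", "a", "t"], 3), "cap": (["k", "a", "p"], 3)}
    print(logogen_recognition(["k", "a", "t"], high_freq))  # ('cat', 2)
    print(logogen_recognition(["k", "a", "t"], low_freq))   # ('cat', 3)
```

With the lower threshold, 'cat' fires after /ka/, before the input has ruled out 'cap'.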
What is Cohort Theory? (1 of 2)
- a 2-stage process of word recognition: an autonomous stage and an interactive stage
1) autonomous process
- a completely bottom-up process
- acoustic-phonetic information from the beginning of a word activates a cohort that includes all of the words in memory that have the same word-initial information (e.g. the input 'slave' activates a cohort of all words that begin with 's')
- as more acoustic information becomes available a process of elimination is applied until the cohort is reduced to a single identified word
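The autonomous elimination stage can be sketched as prefix filtering. Letters stand in for phonemes here (an assumption for readability), the lexicon is invented, and the interactive, top-down stage is not modelled:

```python
def cohort_trace(heard, lexicon):
    """Cohort remaining after each successive segment of the input.

    Stops as soon as the cohort has been reduced to one word (or none).
    """
    trace = []
    for n in range(1, len(heard) + 1):
        prefix = heard[:n]
        cohort = sorted(w for w in lexicon if w.startswith(prefix))
        trace.append((prefix, cohort))
        if len(cohort) <= 1:
            break
    return trace


if __name__ == "__main__":
    lexicon = ["slave", "slate", "slim", "sleep", "snow", "trap"]
    for prefix, cohort in cohort_trace("slave", lexicon):
        print(f"{prefix:<5} -> {cohort}")
```

For 'slave' the cohort shrinks from five words to one by 'slav', i.e. the word is identified before its final segment arrives.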
What is Cohort Theory? (2 of 2)
2) interactive stage
- once the cohort is activated and the elimination process is initiated many other sources of information (ie. from high level linguistic/cognitive sources--top-down processes) become incorporated into the elimination process

Strength of model: Priority is given to the beginnings of words (which we know are important)
What is Interactive Activation Theory?
- Proposes multiple levels of representation and both feedforward and feedback connections between the processing units
- connections can involve both activation and inhibition of units
- information is provided to the network via the speech signal (bottom-up) and the higher levels of linguistic/cognitive knowledge (top-down)

- excitatory and inhibitory links

Criticism: because everything is so highly interconnected, the model is hard to test.
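A toy network can make the feedforward, feedback, excitatory and inhibitory links concrete. The update rule, the two-word lexicon and the three parameter values are invented for this sketch (it is loosely in the spirit of interactive activation models such as TRACE, not their published equations). The input has an ambiguous first phoneme, halfway between /g/ and /k/, followed by a clear /ift/:

```python
# Hypothetical two-word lexicon; phoneme units are position-specific (position, phone).
WORDS = {"gift": ["g", "i", "f", "t"], "kiss": ["k", "i", "s", "s"]}
UNITS = {w: {(i, p) for i, p in enumerate(phones)} for w, phones in WORDS.items()}

FEEDBACK = 0.2  # top-down excitation from word units to their phoneme units
INHIBIT = 0.5   # lateral inhibition between competing word units
RATE = 0.5      # how far each word unit moves toward its net input per cycle


def clip(x: float) -> float:
    """Keep activations between 0 and 1."""
    return max(0.0, min(1.0, x))


def run(bottom_up: dict, cycles: int = 10):
    """Iterate phoneme and word activations; return (phonemes, words)."""
    words = {w: 0.0 for w in WORDS}
    all_units = set(bottom_up) | set().union(*UNITS.values())
    phonemes = {}
    for _ in range(cycles):
        # Phoneme units: bottom-up input plus top-down feedback from words.
        phonemes = {
            u: clip(bottom_up.get(u, 0.0) + FEEDBACK * sum(words[w] for w in WORDS if u in UNITS[w]))
            for u in all_units
        }
        # Word units: excitation from their phonemes, inhibition from rival words.
        new_words = {}
        for w in WORDS:
            excitation = sum(phonemes[u] for u in UNITS[w]) / len(UNITS[w])
            inhibition = INHIBIT * sum(words[v] for v in WORDS if v != w)
            new_words[w] = clip(words[w] + RATE * (excitation - inhibition - words[w]))
        words = new_words
    return phonemes, words


if __name__ == "__main__":
    ambiguous = {(0, "g"): 0.5, (0, "k"): 0.5, (1, "i"): 1.0, (2, "f"): 1.0, (3, "t"): 1.0}
    phonemes, words = run(ambiguous)
    print({w: round(a, 2) for w, a in words.items()})
    print("/g/:", round(phonemes[(0, "g")], 2), "/k/:", round(phonemes[(0, "k")], 2))
```

Top-down feedback from the 'gift' unit ends up favouring /g/ over /k/, a lexical effect on phoneme perception. Even this toy has three free parameters whose values change its behaviour, which gives a feel for why the full model is hard to test.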
How is severity of hearing loss related to severity of speech impairment? (1 of 2)
- significant negative relationship between severity of hearing loss and speech intelligibility scores (-.62 and -.65)
- a great deal of individual variation in intelligibility can be seen at each level of hearing loss
Smith (1975) study: 40 children with severe-profound hearing loss had an average speech intelligibility of 20% (range 0-76%)
Boothroyd (1985) study: greater individual variation in intelligibility for those with profound HL
How is severity of hearing loss related to severity of speech impairment? (2 of 2)
- many additional factors influence intelligibility (e.g. hearing level, age of identification/aiding, speech recognition)
- inverse relationship between frequency of segmental errors and intelligibility (-.80 in Smith (1975) and -.77 in Boothroyd (1985))
What respiratory impairments are seen in the hearing impaired?
- usually only seen in the severe-profound
- relationship between intelligibility and abnormal resp. patterns
i) start speech on low lung volumes
ii) use a restricted lung volume range
- shorter breath groups
- inappropriate pauses
iii) speak into abnormally low lung volume levels
What laryngeal impairments are seen in the hearing impaired? (1 of 2)
- both abnormally high and low laryngeal airflow rates have been observed, suggesting both hyperadductory and hypoadductory patterns across deaf speakers
- hyper pattern may be associated with more intelligible speech than hypo pattern
- hyper pattern may be a strategy to increase laryngeal tactile sensation

i) Voice quality: hoarse and strained quality most frequent
- abnormal acoustic measure for vocal jitter
- hoarseness ratings and jitter values (0.7 correlation)
What laryngeal impairments are seen in the hearing impaired? (2 of 2)
ii) Fundamental frequency: abnormally high - may be an important concern in adolescent & adult males

iii) Speech intensity: judging contextually appropriate levels is often a problem
How is resonance affected for the hearing impaired?
i) Cul-de-sac resonance (pharyngeal resonance)
- highly salient and common
- not completely understood but appears to be related to abnormal postures of the posterior tongue and pharynx
- more anterior tongue root and dorsum creating a larger pharyngeal space
- more neutral tongue position across vowels (less differentiated)

ii) Nasality
- abnormal amounts of vowel nasalization
- denasalized nasal Cs and nasalized non-nasal Cs
- relationship between nasality and reduced intelligibility (- .74).
What are some VOWEL errors for the hearing impaired?
-Vowels usually have fewer errors than consonants

i) High-front vs low-back
- high vowels worse than low vowels
- front vowels worse than back vowels

ii) Neutralization (substitutions towards more central vowels)
- it is as if most vowels are drifting towards an undifferentiated central or neutral vowel.
- physiologically a similar tongue shape is used across all vowels.

iii) Other vowel errors
- vowel prolongations (slower speech)
- diphthongization
- diphthongs can be reduced and are produced with a great deal of variability
- nasalization
What are some CONSONANT errors for the hearing impaired? (1 of 2)
i) Omissions
- the most frequent error type
- final C omissions most frequent
- omissions occur in middle and back places of articulation; errors are infrequent for front places

ii) Distortions/Substitutions of fricatives and affricates
- affricates often have one part omitted and become a stop or fricative
- fricatives are often overconstricted to become stops esp. th and sh are produced as /t/ or /d/
- these manner errors predominate over place errors
- when place errors occur they tend to be close to the intended target
What are some CONSONANT errors for the hearing impaired? (2 of 2)
iii) Voicing errors
- a very common error in deaf speech
- both types of voicing errors occur (voiced for voiceless & voiceless for voiced)
- observed for most plosives and fricatives
- acoustically the voice onset time (VOT) values show greater overlap across the voiced and voiceless consonants

iv) Glottalization
- substitution of a glottal stop for a stop, fricative or affricate
- substitution of a glottal fricative /h/ for a stop, fricative or affricate (e.g. /k/ and /g/ often substituted by /h/)
What are some suprasegmental errors in hearing impaired speech? (1 of 2)
a) Slow rate: speakers with severe-profound hearing loss may speak at less than half the normal rate of speech.
- sound prolongations (esp. vowels can be 5x normal length)
- Synthetic correction (by computer) of vowel length and pauses did little to improve the intelligibility of deaf speech, which suggests that rate therapy may be of little benefit to the intelligibility of deaf speakers.

b) Inappropriate F0 and intensity variation
- some speakers can be monotone or monoloud
- some speakers may show excessive and inappropriate pitch/loudness variation
What are some suprasegmental errors in hearing impaired speech? (2 of 2)
c) Equalization of stress patterns
- durations of stressed and unstressed syllables are equivalent.
- Synthetic correction of stressed/unstressed syllables did produce an improvement in speech intelligibility.