49 Cards in this Set

  • Front
  • Back
Three systems that make up the speech apparatus
Respiratory System
Lungs, rib cage, diaphragm, tissues, intercostal muscles.
An air pump providing the aerodynamic energy for the laryngeal and articulatory systems. Elastic process.
Usually passive but can be active. Ex. taking a large breath before a long sentence.
Laryngeal System
Job: turn the airflow from the lungs into voiced sound (voicing).
The larynx sits at the top of the trachea and consists of a number of cartilages and muscles. Just lateral to the vocal ligament is the internal thyroarytenoid muscle. Lateral to that is the external thyroarytenoid muscle.
Cartilages, muscles and bones of the laryngeal system
Cartilages: cricoid, thyroid (largest), arytenoids (on top of cricoid).
Muscles: internal and external thyroarytenoids.
Hyoid bone: the only free-floating bone (it does not articulate with any other bone).
Glottis
The opening between the vocal folds.
Lateral cricoarytenoid muscles
Contract to adduct the vocal ligaments
Transverse arytenoid muscle
The only unpaired intrinsic laryngeal muscle. Adducts the vocal ligaments.
Oblique arytenoid muscles
Crossed behind transverse arytenoid muscle.
Muscles of vocal fold adduction
Lateral cricoarytenoids, transverse arytenoid, oblique arytenoids.
Muscle of vocal fold abduction
Posterior cricoarytenoids
Articulatory system
Tongue, lips, jaw, and velum.
The shape of the system determines the resonance properties.
Vowels
Voiced sounds produced with an open vocal tract, which gives them specific resonances. Steady-state articulatory configuration and acoustic pattern. Vowels have inherent differences in duration.
Fricatives
Produced with a narrow constriction in the vocal tract, which generates broadband noise. Voiced fricatives have extra low-frequency energy.
Stops
A brief closure and a burst of noise, followed by movement toward another vocal tract configuration. Fastest sound in connected speech: about 10-15 ms (50 ms in isolation).
Nasals
Produced with the velopharynx open. Sound passes through both the nasal and oral tracts, or through the nasal tract alone. The formants of the nasal cavity depend on its length from the uvula to the nostrils.
Nasal formant (murmur): a band at 200-300 Hz.
Average man's voice
Fundamental frequency of 120 Hz and has spectral energy at 120, 240, 360, 480 and so on in harmonic steps.
Average female voice
Fundamental frequency of about 225 Hz.
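The harmonic series on these cards is just the integer multiples of F0. A minimal Python sketch (the function name `harmonics` is illustrative, not from the source):

```python
def harmonics(f0_hz: float, count: int = 5) -> list[float]:
    """Return the first `count` harmonics of a fundamental frequency in Hz."""
    # Harmonics sit at integer multiples of F0: F0, 2*F0, 3*F0, ...
    return [f0_hz * n for n in range(1, count + 1)]


print(harmonics(120, 4))  # average male F0: [120, 240, 360, 480]
print(harmonics(225, 4))  # average female F0: [225, 450, 675, 900]
```

The spacing between harmonics equals F0, so a 225 Hz voice samples the vocal tract's resonances more sparsely than a 120 Hz voice.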
How are we able to produce intelligible speech with a variety of energy sources?
The independence of the source and the filter.
Formants
The natural modes of vibration (resonances) of the vocal tract. Together they make up the transfer function (an input-output relation, similar to a filter) of the vocal tract.
Radiation characteristic
The filtering effects when sound escapes the mouth and radiates into space. The amount of energy measured at the lips.
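The card gives no numbers, but a common textbook approximation is that the glottal source spectrum falls off at about -12 dB per octave and the radiation characteristic adds about +6 dB per octave, for a net slope of about -6 dB per octave. A sketch of that arithmetic, with both slopes treated as assumptions:

```python
import math

# Commonly cited approximations (assumed here, not stated on the card):
GLOTTAL_SOURCE_SLOPE_DB_PER_OCT = -12.0  # roll-off of the glottal source spectrum
RADIATION_SLOPE_DB_PER_OCT = 6.0         # boost from radiation at the lips


def level_change_db(f_low_hz: float, f_high_hz: float) -> float:
    """Net spectral level change (dB) between two frequencies, ignoring formant peaks."""
    octaves = math.log2(f_high_hz / f_low_hz)
    net_slope = GLOTTAL_SOURCE_SLOPE_DB_PER_OCT + RADIATION_SLOPE_DB_PER_OCT
    return net_slope * octaves


print(level_change_db(120, 480))  # two octaves above a 120 Hz F0: -12.0
```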
Articulatory-acoustic relationship
Front=high F2
Back=low F2
High=low F1
Low=high F1
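The four rules above can be applied directly to formant measurements. In this sketch the (F1, F2) values are approximate adult-male averages used only for illustration, and `compare_vowels` is a hypothetical helper:

```python
def _pick(a_wins: bool, b_wins: bool) -> str:
    """Return which vowel satisfies a comparison, or 'tie'."""
    if a_wins:
        return "a"
    if b_wins:
        return "b"
    return "tie"


def compare_vowels(a: tuple[float, float], b: tuple[float, float]) -> dict[str, str]:
    """Compare two vowels given as (F1, F2) in Hz using the card's rules."""
    f1_a, f2_a = a
    f1_b, f2_b = b
    return {
        # High = low F1, Low = high F1
        "higher": _pick(f1_a < f1_b, f1_b < f1_a),
        # Front = high F2, Back = low F2
        "fronter": _pick(f2_a > f2_b, f2_b > f2_a),
    }


I_VOWEL = (270.0, 2290.0)  # /i/, high front (approximate)
A_VOWEL = (730.0, 1090.0)  # /ɑ/, low back (approximate)
print(compare_vowels(I_VOWEL, A_VOWEL))  # {'higher': 'a', 'fronter': 'a'}
```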
Lip rounding effect on formants
Lip rounding occurs for some back and central vowels. It extends the vocal tract, lowering all formant frequencies.
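A quick way to see why lengthening the tract lowers every formant is the uniform-tube (quarter-wavelength) approximation of a neutral vocal tract, F_n = (2n - 1)c / 4L. This simplification is not on the card; the speed of sound (35,000 cm/s) and the 1.5 cm extension for rounding are assumptions:

```python
SPEED_OF_SOUND_CM_S = 35_000.0  # approximate value for warm, moist air (assumed)


def tube_formants(length_cm: float, count: int = 3) -> list[float]:
    """Resonances of a uniform tube closed at the glottis and open at the lips."""
    # Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L)
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4 * length_cm)
            for n in range(1, count + 1)]


neutral = tube_formants(17.5)  # [500.0, 1500.0, 2500.0]
rounded = tube_formants(19.0)  # hypothetical 1.5 cm extension from lip protrusion
print([round(f) for f in rounded])
```

Because L is in the denominator of every term, any increase in length lowers all formants together, which is the effect the card attributes to lip rounding.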
Affricates
Have a friction segment that is intermediate in duration between the burst for stops and the friction interval for fricatives. Combination of a stop and a fricative.
Liquids
The lateral liquid /l/ has formants similar to those of nasal consonants. The rhotic /r/ has a very low F3 compared to /l/. Laterals involve a splitting of the vocal tract around a midline constriction.
Diphthongs
Combinations of vowels, produced by movement of the articulators. Like vowels because of the relatively open vocal tract and well-defined formant structure, but they cannot have steady-state acoustic features.
Glides
Show movement. /w/ and /j/.
Vertical striations
Each striation marks one cycle of vocal fold (VF) vibration.
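Because each vertical striation is one glottal cycle, the spacing between striations is the period 1/F0. A sketch using the average F0 values from the voice cards (the function name is illustrative):

```python
def striation_spacing_ms(f0_hz: float) -> float:
    """Time between vertical striations (glottal pulses) on a spectrogram, in ms."""
    # One striation per vocal fold cycle, so the spacing equals the period 1/F0.
    return 1000.0 / f0_hz


print(round(striation_spacing_ms(120), 2))  # 8.33 ms for the average male F0
print(round(striation_spacing_ms(225), 2))  # 4.44 ms for the average female F0
```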
Does bandwidth contribute to intelligibility?
Limitations of Simple Vowel Target Model
Does not account for speaker variations or for temporal or dynamic variations.
Inability to account for target undershoot: F2 in a CVC syllable does not reach the target value determined by the isolated vowel because of coarticulation.
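Target undershoot can be illustrated with a toy model in which F2 moves exponentially from the consonant's value toward the vowel target. The exponential form, the time constant `tau_ms`, and the frequencies below are all hypothetical; the point is only that shorter vowels end farther from the target:

```python
import math


def f2_at(t_ms: float, onset_hz: float, target_hz: float, tau_ms: float = 40.0) -> float:
    """F2 at time t_ms after vowel onset in a toy exponential-approach model."""
    # The remaining distance to the target shrinks by a factor of e every tau_ms
    # (tau_ms = 40 is a hypothetical time constant, not a measured value).
    return target_hz + (onset_hz - target_hz) * math.exp(-t_ms / tau_ms)


def f2_shortfall(duration_ms: float, onset_hz: float, target_hz: float) -> float:
    """Distance (Hz) between F2 at the end of the vowel and the isolated-vowel target."""
    return abs(target_hz - f2_at(duration_ms, onset_hz, target_hz))


# Hypothetical values: F2 leaves the consonant at 1800 Hz, vowel target is 1100 Hz.
for duration in (50, 100, 200):
    print(duration, "ms vowel: misses target by",
          round(f2_shortfall(duration, 1800, 1100)), "Hz")
```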
Elaborated Target Model
The Bark transform is designed to model the normalization of acoustic data performed by the auditory system. Its output must be non-linear because the cochlea is a non-linear structure.
Goal: find out whether there is a key cue that helps us identify vowels.
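The card does not give the transform itself. One widely used approximation of the Bark scale is Traunmüller's formula, z = 26.81f / (1960 + f) - 0.53; the sketch below uses it to show the non-linearity (equal steps in Hz give smaller steps in Bark at higher frequencies):

```python
def hz_to_bark(f_hz: float) -> float:
    """Approximate critical-band rate (Bark) for a frequency in Hz (Traunmüller's formula)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53


# The same 500 Hz step covers fewer Bark at higher frequencies.
low_step = hz_to_bark(1000) - hz_to_bark(500)
high_step = hz_to_bark(2000) - hz_to_bark(1500)
print(f"500->1000 Hz: {low_step:.2f} Bark, 1500->2000 Hz: {high_step:.2f} Bark")
```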
Dynamic Specification Model
Temporal or dynamic information is used to identify vowels. These cues are the formant transitions into and out of a vowel steady state and the duration of the steady state. Timing is not specific to vowels!
Vowel perception
Constructed patterns (multiple speakers).
Templates (single speaker): other speakers' vowels are pulled into that template.
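The template idea can be sketched as a nearest-template classifier: a new speaker's vowel is assigned to whichever stored template it lies closest to in the F1-F2 plane. The template values are approximate adult-male averages and the function name is hypothetical:

```python
import math

# Hypothetical single-speaker templates: (F1, F2) in Hz, roughly adult-male averages.
TEMPLATES = {
    "i": (270.0, 2290.0),
    "a": (730.0, 1090.0),
    "u": (300.0, 870.0),
}


def classify_vowel(f1: float, f2: float, templates=TEMPLATES) -> str:
    """Return the template vowel closest to (F1, F2) by Euclidean distance."""
    return min(templates, key=lambda v: math.dist((f1, f2), templates[v]))


print(classify_vowel(310, 2500))  # another speaker's /i/ is pulled into the "i" template
```

Plain Euclidean distance in Hz is the simplest choice; converting to Bark first would give the low-frequency F1 differences relatively more weight.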
How we differentiate vowels
Formant pattern (only one we can see on a spectrogram)
Fundamental frequency
Optimal octaves for /i/
1250-2500 Hz, 2500-5000 Hz, 5000-10000 Hz
Optimal octaves for /u/
80-160 Hz, 160-315 Hz
Optimal octaves for /a/
630-1250 Hz, 1250-2500 Hz
Does fundamental frequency vary with vowel height?
Yes. Higher vowel=higher F0.
F0 is secondary to F2 for identifying vowels.
Formant bandwidth
Increases with damping and with formant number. Wider bandwidths dull (flatten) the formant peaks in the spectrum.
Relationship between formant bandwidth and amplitude
Increased bandwidth leads to reduction in overall amplitude.
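The bandwidth-amplitude trade-off follows from treating a formant as an idealized second-order resonance, whose gain at the formant frequency is its quality factor Q, approximately F/B for narrow bandwidths. This resonator model is an assumption added for illustration:

```python
import math


def resonance_peak_gain_db(formant_hz: float, bandwidth_hz: float) -> float:
    """Approximate gain (dB) of an idealized formant resonance at its center frequency."""
    # For a second-order resonator the gain at the center frequency equals Q,
    # and for narrow bandwidths Q is approximately F / B.
    q = formant_hz / bandwidth_hz
    return 20.0 * math.log10(q)


for bw in (50, 100, 200):
    print(bw, "Hz bandwidth:", round(resonance_peak_gain_db(500, bw), 1), "dB")
```

Each doubling of the bandwidth lowers the peak by about 6 dB in this model.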
Compare diphthongs and vowels
Similarities: voicing and open vocal tract.
Differences: diphthongs have no steady-state information.
Consonants involve
1. Noise generation
2. A period of complete obstruction
3. A narrowing of the vocal tract
4. Strictly oral
5. Nasal
6. Voiced vs. voiceless
Acoustic properties of stop consonants
1. Stop gap
2. Release burst
3. Formant transitions: out of a stop into a vowel.
4. Voicing
Stop gap
The acoustic interval corresponding to the articulatory occlusion, typically 50-100 msec. For a voiceless stop with nothing before it, you can't tell where the stop starts, because the gap blends into the preceding silence.
The three places of occlusion for stop consonants
Bilabial, alveolar, and velar.
Stop consonant release burst
The transient that is produced on release of the occlusion and is no more than 40 msec in duration. Fastest acoustic event in speech production.
Stop identification using a simplified burst cue and the following vowel.
1. Bursts with a center frequency lower than the vowel F2 were identified as /p/ (bilabial).
2. Bursts with a center frequency close to F2 were identified as /k/ (velar).
3. Bursts with a center frequency higher than the vowel F2 were identified as /t/ (alveolar).
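The three rules above can be written as a small decision procedure. The card does not say how close counts as "close to F2", so `tolerance_hz` is a hypothetical parameter:

```python
def classify_burst(burst_center_hz: float, vowel_f2_hz: float,
                   tolerance_hz: float = 300.0) -> str:
    """Guess stop place from the burst's center frequency relative to the vowel's F2."""
    # tolerance_hz is a hypothetical cutoff for "close to F2"; the card gives none.
    if abs(burst_center_hz - vowel_f2_hz) <= tolerance_hz:
        return "k"  # velar: burst near F2
    if burst_center_hz < vowel_f2_hz:
        return "p"  # bilabial: burst below F2
    return "t"      # alveolar: burst above F2


print(classify_burst(800, 1500))   # p
print(classify_burst(1600, 1500))  # k
print(classify_burst(4000, 1500))  # t
```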
Stop aspiration
Voiceless stops have aspirated releases except when they follow /s/.
Voice onset time (VOT)
The interval between the articulatory release of the stop and the onset of vocal fold vibration. Voiced stops: -20 to +20 msec. Voiceless stops: 25 to 100 msec.
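The VOT ranges on the card translate into a simple classifier. The function name is illustrative, and values between 20 and 25 ms or outside the listed ranges are left unclassified:

```python
def classify_vot(vot_ms: float) -> str:
    """Label a stop as voiced or voiceless from its voice onset time in ms."""
    # Negative VOT: voicing starts before the release (prevoicing).
    if -20 <= vot_ms <= 20:
        return "voiced"      # card: -20 to +20 ms
    if 25 <= vot_ms <= 100:
        return "voiceless"   # card: 25 to 100 ms
    return "unclassified"    # between or outside the card's ranges


for vot in (-15, 10, 22, 60):
    print(vot, classify_vot(vot))
```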
Problems with the simple vowel target
1. Assumes that the vowel is invariant across phonetic contexts and defined by a static vocal tract shape or by a point in the F1-F2 plane.
2. Inability to account for target undershoot of F2 in a CVC syllable.
3. Cannot account for temporal or dynamic variations.