
15 Cards in this Set


What has to be considered when transforming the speech signal to the symbol level?

1. Speech: Language, dialect, speaking style, …
2. Speaker:
– Dependent vs. independent vs. adaptive
– Known vs. unknown
– Cooperative vs. uncooperative
3. Target units: Type, number, complexity
4. Environment: Background noise, transmission channels, etc.


What makes speech recognition difficult?

1. Variances and invariances in the speech signal have to be differentiated
2. Contextual knowledge helps to understand: "to recognize speech" <-> "to wreck a nice beach"
3. Speech is a continuous signal, not a sequence of elementary sounds (even across word boundaries)
4. Articulation depends on the surrounding sounds (co-articulation)


Explain the schematic setup of a speech recognizer (ASR).

1. Feature extraction from signal
2. Acoustic model (features matched with sound/phonetic patterns)
3. Lexicon: which sound patterns can make sequences
4. Language model: what are the probabilities of different sequences calculated by lexicon, based on probability analysis of some language material
5. Algorithms that choose the most probable sound pattern
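The interplay of steps 2-5 can be sketched as picking the word sequence that maximizes the combined acoustic and language model score. The scores below are invented purely for illustration:

```python
import math

# Toy decoder over two competing hypotheses (scores are made up).
acoustic_score = {            # log P(features | word sequence), from the acoustic model
    "to recognize speech": math.log(0.30),
    "to wreck a nice beach": math.log(0.35),
}
language_score = {            # log P(word sequence), from the language model
    "to recognize speech": math.log(0.010),
    "to wreck a nice beach": math.log(0.001),
}

# Step 5: choose the most probable hypothesis by total score.
best = max(acoustic_score, key=lambda s: acoustic_score[s] + language_score[s])
print(best)   # the language model tips the decision toward the plausible sentence
```

Even though the implausible sentence scores slightly better acoustically here, the language model prior flips the decision.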

In which stage of a speech recognizer are HMMs and neural networks usually used?

In the acoustic model and lexicon. The language model usually utilizes HMMs.

Name three ideas of feature extraction.

Feature extraction: extract information which allows differentiating between sounds.


- First idea: (Fourier) Spectrum
- Second idea: Separation of excitation and vocal tract modulation
- Third idea: Further improvements by considering hearing characteristics


Explain how feature extraction of the Mel-scaled cepstrum (MFCC) works.

1. FFT of a signal window
2. Weighting by triangular filters, producing filter-bank coefficients
3. Take the logarithm of the coefficients
4. Apply the inverse Fourier transform (in practice a DCT) to the log coefficients
5. Add time behaviour by calculating the first and second derivatives (delta and delta-delta) of these coefficients
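A minimal sketch of these steps for a single frame. Frame length, filter count, and sample rate are arbitrary choices here; real implementations add pre-emphasis and use library FFT/DCT routines:

```python
import numpy as np

def mfcc_frame(frame, sr=16000, n_filters=26, n_ceps=13):
    """Steps 1-4 for one (already windowed) frame."""
    n_fft = len(frame)
    spectrum = np.abs(np.fft.rfft(frame))                  # 1. FFT
    # 2. triangular filters spaced evenly on the mel scale
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz2mel(0.0), hz2mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    energies = np.zeros(n_filters)
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        for k in range(lo, mid):                           # rising edge
            energies[i] += spectrum[k] * (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):                           # falling edge
            energies[i] += spectrum[k] * (hi - k) / max(hi - mid, 1)
    log_e = np.log(energies + 1e-10)                       # 3. log
    # 4. inverse transform of the log spectrum (a DCT-II in practice)
    n = np.arange(n_filters)
    return np.array([np.sum(log_e * np.cos(np.pi * q * (2 * n + 1)
                                           / (2 * n_filters)))
                     for q in range(n_ceps)])

def deltas(ceps_seq):
    """5. time behaviour: first derivative across frames
    (apply twice for the double derivative)."""
    return np.gradient(ceps_seq, axis=0)
```

Step 5 necessarily operates on a sequence of frames, which is why `deltas` takes a (frames x coefficients) array rather than a single vector.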

Explain the steps of perceptual linear predictive (PLP) coding.

1. Window the signal with a Hamming window
2. Spectral analysis
3. Transform the spectrum to a scale similar to the Bark scale
4. Apply a weighting filter as in the calculation of mel-scaled cepstral coefficients
5. Convolve with an artificial frequency-band masking curve
6. Sample the spectrum in steps of 1 Bark -> smoothing of the spectrum
7. Amplify high frequencies to balance the frequency dependence of loudness
8. Transfer the intensity representation into an (approximate) loudness representation
9. Re-transform the loudness representation into the time domain
10. Calculate the LPC coefficients

Explain what RASTA tries to solve and how it works.

RASTA tries to be robust against interferences, both additive and multiplicative; it works better against multiplicative ones. It band-pass filters the time trajectories of the logarithmic spectral parameters, so that slowly varying (multiplicative) channel distortions are suppressed.
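The core observation can be demonstrated in a few lines: a multiplicative channel becomes an additive constant in the log domain, where filtering removes it. Mean removal stands in for RASTA's actual band-pass filter here, and the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.uniform(0.5, 2.0, size=100)     # spectral energies over time (one band)
channel_gain = 3.0                          # constant multiplicative interference
noisy = clean * channel_gain

# In the log domain the multiplicative gain becomes an additive constant ...
log_clean = np.log(clean)
log_noisy = np.log(noisy)

# ... which removing the slowly varying component (here: the mean, a crude
# stand-in for RASTA's band-pass filter) eliminates from the trajectory.
filtered_clean = log_clean - log_clean.mean()
filtered_noisy = log_noisy - log_noisy.mean()

print(np.allclose(filtered_clean, filtered_noisy))   # True: channel effect removed
```

Additive noise does not turn into a constant offset in the log domain, which is why RASTA helps less there.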

What is the idea of a Markov chain in speech recognition? When is it called hidden?


One state corresponds to one symbol. Transitions to the next states (and back to itself) are determined by transition probabilities.

In speech recognition the chain usually runs only one way, i.e. there is no going back in the chain.

The result is the emitted symbols. One state can emit several symbols, and these symbols have emission probabilities (no one-to-one match).

This means the output sequence doesn't tell which states were visited; that's why it's called a hidden model.

Hidden = two stochastic levels: 1) transitions, 2) emitted symbols.
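A minimal sketch of such a left-to-right HMM with both stochastic levels, stochastic transitions and stochastic emissions (all probabilities are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# Left-to-right HMM with 3 states: each state loops or moves forward only.
A = np.array([[0.6, 0.4, 0.0],    # level 1: transition probabilities
              [0.0, 0.7, 0.3],
              [0.0, 0.0, 1.0]])
B = np.array([[0.9, 0.1],         # level 2: emission probabilities
              [0.2, 0.8],         # (each state can emit 'a' or 'b')
              [0.5, 0.5]])
symbols = ['a', 'b']

state, out, path = 0, [], []
for _ in range(8):
    path.append(state)
    out.append(symbols[rng.choice(2, p=B[state])])   # emit a symbol
    state = rng.choice(3, p=A[state])                # move (or stay)

print(''.join(out))   # an observer sees only this symbol sequence ...
print(path)           # ... while the visited states stay hidden
```

Because every state can emit both symbols, the printed sequence alone cannot be mapped back to a unique state path, which is exactly the "hidden" property.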

What is a phoneme HMM? How does it differ from a normal HMM?

A phoneme HMM doesn't output discrete symbols, but ... It takes co-articulation into account.

What can be achieved with the Viterbi algorithm?

Finding the most likely state path through an HMM quickly (by dynamic programming).
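A compact log-domain implementation of that dynamic program, exercised on a toy two-state HMM (probabilities invented):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely state path for an observation sequence (log-domain DP)."""
    T, N = len(obs), len(pi)
    logA, logB = np.log(A + 1e-300), np.log(B + 1e-300)
    delta = np.log(pi + 1e-300) + logB[:, obs[0]]    # best score ending in each state
    back = np.zeros((T, N), dtype=int)               # best predecessor per state
    for t in range(1, T):
        scores = delta[:, None] + logA               # scores[i, j]: come from i into j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta.argmax())]                     # backtrack from best final state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.0, 1.0]])               # left-to-right, as in ASR
B = np.array([[0.9, 0.1], [0.1, 0.9]])
print(viterbi([0, 0, 1, 1], pi, A, B))               # [0, 0, 1, 1]
```

The max/argmax recursion makes the search linear in the sequence length instead of exponential over all possible state paths.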

What does a perceptron do in speech recognition? What is its basic structure?

It's a two-class classifier. It calculates a weighted sum of the feature vector and outputs either class 1 or class 2. Putting perceptrons in series makes it possible to have more than two classes. Good for single-phoneme classification.
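A sketch of the classic perceptron learning rule on a toy two-class problem; the logical-AND inputs below stand in for real "phoneme vs. not-phoneme" feature vectors:

```python
import numpy as np

def train_perceptron(X, y, epochs=20, lr=1.0):
    """Weighted sum of the feature vector -> class 0 or 1."""
    w = np.zeros(X.shape[1] + 1)                  # weights plus bias
    Xb = np.hstack([X, np.ones((len(X), 1))])     # append constant bias input
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            pred = 1 if w @ xi > 0 else 0         # threshold the weighted sum
            w += lr * (yi - pred) * xi            # update only on mistakes
    return w

def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return (Xb @ w > 0).astype(int)

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0, 0, 0, 1])                        # linearly separable toy labels
w = train_perceptron(X, y)
print(predict(w, X))                              # [0 0 0 1]
```

A single perceptron only draws one linear boundary; combining several of them is what yields multi-class (e.g. full phoneme set) decisions.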

What is an n-gram language model?


The probabilities of n words succeeding each other are estimated from texts.

These probabilities are used in making decisions in speech recognition.

If certain word sequences never occur in the training texts, single-word probabilities are used instead; that is called an n-gram back-off language model.
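A toy bigram model with back-off to unigrams. The fixed back-off weight used here is a crude "stupid backoff"-style simplification; proper back-off models (e.g. Katz) use discounted probability mass instead:

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ate".split()
unigrams = Counter(corpus)                        # single-word counts
bigrams = Counter(zip(corpus, corpus[1:]))        # adjacent word-pair counts
total = len(corpus)

def bigram_backoff(prev, word, alpha=0.4):
    """P(word | prev): bigram estimate if seen, else back off to a
    down-weighted unigram estimate (alpha is an arbitrary choice here)."""
    if (prev, word) in bigrams:
        return bigrams[(prev, word)] / unigrams[prev]
    return alpha * unigrams[word] / total

print(bigram_backoff("the", "cat"))   # seen bigram: 2/3
print(bigram_backoff("mat", "ate"))   # unseen: backs off to the unigram
```

Back-off matters in practice because any real test utterance contains word pairs never seen in the training texts, and a zero probability would veto an otherwise good hypothesis.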

How can you measure speech recognizer error?


1. Two common measures: Word Error Rate (WER) or Word Accuracy (WA). Sentence error rates etc. are also possible.
2. Labelled data is compared with the recognizer output. The two word sequences need to be aligned in the time domain, e.g. with dynamic time warping.
3. Correctly recognized, substituted, deleted and inserted words are counted. WA = 1 - WER = 1 - (substituted + inserted + deleted) / total_words
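Counting substitutions, insertions and deletions amounts to a minimum-edit-distance alignment of the two word sequences, which a direct dynamic program computes:

```python
def wer(reference, hypothesis):
    """Word error rate via minimum edit distance over words
    (substitutions, insertions, deletions all cost 1)."""
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                               # delete all reference words
    for j in range(len(h) + 1):
        d[0][j] = j                               # insert all hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

# 2 substitutions + 2 insertions over 3 reference words -> 4/3
print(wer("to recognize speech", "to wreck a nice beach"))
```

Note that WER can exceed 1 (as here) because insertions are counted against the reference length.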


What can be done to improve speech recognizer results?

1. Training material recorded in real conditions
2. Choosing a good feature set, preprocessing
3. Parallel recognizers (multi-stream speech recognition) for different frequency ranges. It is assumed that interferences only affect some frequency ranges.