6.1 What are the requirements for the transmission?

- Quality as high as possible
- Bitrate as low as possible
- Algorithm complexity as low as possible
- Delay as low as possible

6.1 What is the idea underlying speech and audio coding?

- Compression of the signal
- Reduction of bit-rate
- Improving robustness against distortions

by taking into account knowledge about
- Speech signal production -> Speech coding
- Human auditory perception -> Audio coding

6.1 What are the classes of speech coding?

1. Waveform coding: bandwidth ~64 kbit/s
2. Parametric coding: bandwidth ~2-8 kbit/s
3. Hybrid coding: bandwidth ~8-32 kbit/s

6.1 What is the principle of waveform coding?

The sender already reduces the information by quantization, so that less information needs to be transmitted. The filtering and normalization involved are typically applied adaptively, i.e. they adapt to the changing signal characteristics.

On the receiver side, the signal is reconstructed by inverse filtering or normalization, where the relevant filtering or normalization parameters are obtained by backward adaptation from the transmitted signal.

6.1 What is the principle of parameter coding?

Parameters are transmitted from which the speech signal may be synthesized on the receiving side, e.g. by utilizing the source-filter model of speech production.
Contrary to waveform coding, here the signal channel remains empty and only the parameter channel is used.
This method allows for very low bit rates, but the quality may be severely limited.

6.1 Which type of coding tries to reproduce the speech signal?

Waveform coding

6.1 What are advantages and disadvantages of discrete signal representation?

Advantages:
- Robustness against distortions: In principle lossless transmission
- Universality: Transmission of different types of signals over the same channel
- Simple processing on digital computers

Disadvantages:
- Quantization error due to limited number of amplitude values
- Higher bandwidth requirement due to sampling at a rate of at least twice the maximum signal frequency

6.1 What is aliasing? And how to deal with it?

Aliasing is an effect that causes different signals to become indistinguishable from one another (aliases) when sampled.
It also refers to the distortion or artifact that results when the signal reconstructed from the samples differs from the original continuous signal.
The easiest way to prevent aliasing is to apply a steep-sloped low-pass filter with a cut-off at half the sampling frequency before the conversion.
Aliasing is avoided by keeping f_s ≥ 2·f_max.
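
As a quick numerical illustration of the sampling condition (a sketch, not from the lecture; the frequencies are made up): a 7 kHz cosine sampled at f_s = 8 kHz produces exactly the same sample values as a 1 kHz cosine, so the two are aliases of each other.

```python
import numpy as np

fs = 8000.0                      # sampling frequency in Hz
n = np.arange(64)                # sample indices
x_high = np.cos(2 * np.pi * 7000.0 * n / fs)   # 7 kHz > fs/2 -> violates the sampling theorem
x_low  = np.cos(2 * np.pi * 1000.0 * n / fs)   # 1 kHz alias inside the baseband

print(np.allclose(x_high, x_low))  # True: the sampled values are indistinguishable

# Remedy: low-pass filter the analog signal to fmax <= fs/2 before sampling,
# or choose fs >= 2 * fmax.
```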

6.2 What is Pulse Code Modulation (PCM)?

Pulse Code Modulation uses linear quantization: the amplitude values are collected within equidistant intervals. With a word length of w bits this leads to a range of 2^w possible values.
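
A minimal sketch of such a linear quantizer, assuming a symmetric dynamic range D and mid-interval representatives (the function name and test signal are illustrative):

```python
import numpy as np

def pcm_quantize(x, w=8, D=2.0):
    """Uniform (linear) quantization: 2**w equidistant intervals over the
    dynamic range D (assumed symmetric, -D/2 ... +D/2)."""
    q = D / 2**w                                  # quantization step size Q = D / 2^w
    idx = np.floor((x + D / 2) / q).astype(int)   # interval index of each sample
    idx = np.clip(idx, 0, 2**w - 1)               # clipping outside the dynamic range
    return idx, (idx + 0.5) * q - D / 2           # code words and mid-interval representatives

x = 0.9 * np.sin(2 * np.pi * np.arange(100) / 25)
codes, x_hat = pcm_quantize(x, w=8)
print(np.max(np.abs(x - x_hat)) <= (2.0 / 2**8) / 2)  # maximal error is at most Q/2
```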

6.2 What is the clipping effect about?

Clipping is a form of distortion that limits a signal once it exceeds a threshold.

6.2 How do we calculate quantization steps in PCM? What is the maximum error we can get?

The quantization step size Q follows from the maximal dynamic range D (the range within which the analog signal amplitudes may take on values) and the number of steps 2^w: Q = D / 2^w.
The maximal error that may occur in the quantization of an amplitude value is Q/2.

6.2 What is quantizing noise? How do we calculate it?

Quantizing noise is the power of the error signal e(k). For a uniformly distributed signal x(k) it amounts to σ_e² = Q²/12 (this holds approximately also when the signal is uniformly distributed within each quantization interval).

6.2 What is SNR? What is important to note from the formula?

SNR with linear quantization:

• the signal-to-noise ratio rises by about 6 dB per bit of word length
• the signal-to-noise ratio depends on the degree of saturation D
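
A small experiment, assuming a fully saturated, uniformly distributed test signal and a mid-tread quantizer, that reproduces the roughly 6 dB-per-bit rule empirically (all names and values are illustrative):

```python
import numpy as np

def snr_db(w, n=100000, D=2.0):
    """Empirical SNR of linear quantization at word length w for a fully
    saturated, uniformly distributed signal."""
    rng = np.random.default_rng(0)
    x = rng.uniform(-D / 2, D / 2, n)      # signal uses the full dynamic range
    q = D / 2**w                           # step size Q = D / 2^w
    x_hat = np.round(x / q) * q            # mid-tread uniform quantizer
    e = x - x_hat                          # quantization error
    return 10 * np.log10(np.mean(x**2) / np.mean(e**2))

for w in (6, 7, 8, 9):
    print(w, round(snr_db(w), 1))          # SNR grows by roughly 6 dB per additional bit
```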

6.2 What is the aim of non-linear quantization?

Removing the dependency of the SNR on the degree of saturation.

6.2 What is logarithmic companding and why do we need to use it?

A logarithmic quantization characteristic is difficult to implement directly in an analog-to-digital converter.
Therefore we use logarithmic companding:
1. logarithmic distortion (compression) of the signal
2. linear quantization
3. exponential expansion
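
A sketch of the three companding steps using the μ-law characteristic with μ = 255 (a common choice in log PCM systems); the function names and the test signal are my own:

```python
import numpy as np

MU = 255.0  # mu-law constant

def compress(x):
    """Step 1: logarithmic compression of the normalized signal, |x| <= 1."""
    return np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)

def expand(y):
    """Step 3: exponential expansion, the inverse of the compression."""
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(MU)) / MU

def mu_law_quantize(x, w=8):
    """Compression -> linear quantization with w bits -> expansion."""
    q = 2.0 / 2**w                      # step size on the compressed scale (-1 ... +1)
    y = np.round(compress(x) / q) * q   # step 2: linear quantization
    return expand(y)

x = 0.01 * np.sin(np.linspace(0, 20, 200))       # low-amplitude signal
print(np.max(np.abs(x - mu_law_quantize(x))))    # error << the linear 8-bit step of ~0.0078
```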

6.2 What is the problem with logarithmic quantization? How can we solve it?

6.2 What is the idea of optimal quantization?

Quantizing more frequent values (around zero) with more precision – i.e. with more steps – than rare values (with high amplitudes)
- selection of intervals and representatives in a way that the overall SNR is maximized

"Optimum quantization“ depends on the distribution function of speech

6.2 How do we calculate the quantization interval in the optimal quantization method?

6.2 What is the peculiarity of the optimal quantization method?

• Representatives are the "centers of gravity" of the intervals
• Interval boundaries lie in the middle between the centers of gravity
• Optimum quantization yields, at the optimal working point, an approx. 3 dB improvement of the SNR, but in turn is more dependent on the degree of saturation, and the realization is far more complicated
-> it is only rarely used

6.2 What is the idea of adaptive quantization?

Adaptation of the step height of the quantizer to the momentary degree of saturation -> reduces the dependency on the degree of saturation in a dynamic way

6.2 How do we calculate quantized values in the adaptive quantization method?

6.2 What is the principle of Adaptive Quantization Forward (AQF)?

The step height ∆x(k) is calculated block-wise for signal sections of length N and then retained for this section.
The value of the step height must also be transmitted, because without it the signal cannot be restored correctly on the receiver side. Since this requires additional side information to be transmitted, the block length is chosen relatively long, e.g. N = 128 values.
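
A possible block-wise AQF sketch; the concrete adaptation rule (scaling the step height to the block maximum) and the parameter values are assumptions, not taken from the lecture:

```python
import numpy as np

def aqf_encode(x, w=4, N=128):
    """Adaptive Quantization Forward: the step height is calculated block-wise
    for sections of N samples and transmitted as side information."""
    codes, step_heights = [], []
    for start in range(0, len(x), N):
        block = x[start:start + N]
        delta = np.max(np.abs(block)) / 2**(w - 1) + 1e-12   # assumed adaptation rule
        step_heights.append(delta)                           # side information for the receiver
        codes.append(np.clip(np.round(block / delta), -2**(w - 1), 2**(w - 1) - 1))
    return codes, step_heights

def aqf_decode(codes, step_heights):
    """Receiver: reconstruct each block with its transmitted step height."""
    return np.concatenate([c * d for c, d in zip(codes, step_heights)])

rng = np.random.default_rng(0)
x = np.concatenate([0.05 * rng.standard_normal(256), 0.8 * rng.standard_normal(256)])  # quiet, then loud
x_hat = aqf_decode(*aqf_encode(x))
print(np.mean((x - x_hat)**2))   # quantization noise follows the local signal level
```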

6.2 What is the principle of Adaptive Quantization Backward (AQB)?

Transmitting the step height is no longer necessary, as this information can be retrieved from the transmitted signal Z(k) (as long as the transmission is error-free).
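
One common way to realize backward adaptation is a step-size multiplier that depends only on the previously transmitted code word, so the receiver can track the step height without side information; the 2-bit resolution and the multiplier values below are illustrative assumptions:

```python
import numpy as np

MULTIPLIERS = (0.9, 1.6)   # shrink the step on small codes, grow it on large ones (assumed values)

def aqb_encode(x, delta0=0.1):
    """Adaptive Quantization Backward (sketch): a 2-bit quantizer whose step
    height is updated only from the previously transmitted code word."""
    delta, codes = delta0, []
    for sample in x:
        c = int(np.clip(np.round(sample / delta), -2, 1))    # 2-bit code word
        codes.append(c)
        delta = float(np.clip(delta * MULTIPLIERS[min(abs(c), 1)], 1e-4, 10.0))
    return codes

def aqb_decode(codes, delta0=0.1):
    """Receiver repeats exactly the same adaptation from the received codes."""
    delta, x_hat = delta0, []
    for c in codes:
        x_hat.append(c * delta)
        delta = float(np.clip(delta * MULTIPLIERS[min(abs(c), 1)], 1e-4, 10.0))
    return np.array(x_hat)

x = np.sin(np.linspace(0, 30, 400)) * np.linspace(0.1, 1.0, 400)  # slowly growing amplitude
x_hat = aqb_decode(aqb_encode(x))
```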

6.2 How do we calculate the step height and the estimate of the variance in adaptive quantization?

6.2 Which type of quantization makes the SNR independent of the signal amplitude?

Logarithmic quantization

6.2 What is the idea of vector quantization?

6.2 Why do we use vector quantization?

- If the codebook is known, not the full code vectors but only their indices (addresses) need to be transmitted
- Significant reduction of the bit-rate
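
A minimal sketch of the encode/decode round trip; the codebook here is random purely for illustration (in practice codebooks are trained offline), and all names and sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.standard_normal((256, 8))     # 256 code vectors of dimension 8 (illustrative)

def vq_encode(frame):
    """Transmit only the index of the nearest code vector: 8 bits instead of
    8 full sample values -> significant bit-rate reduction."""
    distances = np.sum((codebook - frame)**2, axis=1)   # full search over the codebook
    return int(np.argmin(distances))

def vq_decode(index):
    """Receiver looks the vector up in the same (known) codebook."""
    return codebook[index]

frame = rng.standard_normal(8)
idx = vq_encode(frame)
print(idx, vq_decode(idx))
```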

6.2 What are disadvantages of vector quantization?

• Computationally intensive, as a full codebook search needs to be performed
-> quick search algorithms and intelligent selection of code vectors
• Representatives still depend on the level of saturation
-> adaptation of an additional gain factor (gain-shape vector quantization)

6.3 What are the possibilities to reduce the bitrate?

• Redundancy reduction: Everything not containing information is omitted in the transmission.
• Irrelevance reduction: All information that is irrelevant for the present application is omitted. A possible goal could e.g. be to transmit intelligible speech without transmitting information which may shed light on the identity of the speaker; in this case, speech characteristics typical for certain speakers may be omitted.

6.3 What is differential PCM (approach 1) about? How can we improve it?

Due to the correlation between successive sampling values it is convenient not to transmit the sampling value itself but the difference to the preceding sampling value. This difference signal modulates the quantizer less heavily and may therefore be transmitted using a smaller word length (bit rate):
d(k) = x(k) − x(k−1)
The coding of the difference can be improved by transmitting a weighted difference between two sampling values: d(k) = x(k) − a⋅x(k−1)

6.3 What is Adaptive Predictive Coding (APC)?

The weighting factor a may also be adjusted adaptively, either block-wise or sequentially.
In Adaptive Predictive Coding (APC) the coefficient is chosen optimally with respect to the mean square error, i.e. as the ratio of the auto-correlation function at a shift of one sampling value to the auto-correlation at shift zero: a = r_xx(1) / r_xx(0).
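
A short sketch computing this MSE-optimal coefficient from the auto-correlation and forming the weighted difference d(k) = x(k) − a·x(k−1); the test signal is artificial:

```python
import numpy as np

def apc_coefficient(x):
    """MSE-optimal weighting factor a: ratio of the auto-correlation at a
    shift of one sample to the auto-correlation at shift zero."""
    r0 = np.dot(x, x)
    r1 = np.dot(x[1:], x[:-1])
    return r1 / r0

rng = np.random.default_rng(0)
x = np.convolve(rng.standard_normal(1000), np.ones(8) / 8, mode="same")  # correlated test signal

a = apc_coefficient(x)
d = x[1:] - a * x[:-1]                      # weighted difference d(k) = x(k) - a*x(k-1)
print(a, np.var(d) / np.var(x))             # the difference signal has much smaller power
```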

6.3 What is the idea and principle of linear prediction?

Transmitting the difference between the current sample and a weighted sum (prediction) of several preceding sample values.
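
A sketch of linear prediction with several coefficients, solving the auto-correlation normal equations directly (in practice a Levinson-Durbin recursion would be used); all names and the test signal are illustrative:

```python
import numpy as np

def lpc_coefficients(x, order=8):
    """Linear prediction: solve the normal equations R a = r so that a weighted
    sum of `order` preceding samples predicts the current sample (MSE-optimal)."""
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])  # Toeplitz auto-correlation matrix
    return np.linalg.solve(R, r[1:order + 1])

def prediction_error(x, a):
    """Residual d(k) = x(k) - sum_i a_i * x(k-i), the signal to be quantized and transmitted."""
    order = len(a)
    pred = np.zeros_like(x)
    for i, ai in enumerate(a, start=1):
        pred[order:] += ai * x[order - i:len(x) - i]
    return x[order:] - pred[order:]

rng = np.random.default_rng(0)
x = np.convolve(rng.standard_normal(2000), np.hanning(16), mode="same")  # correlated signal
a = lpc_coefficients(x, order=8)
e = prediction_error(x, a)
print(np.var(e) / np.var(x))   # prediction gain: residual power << signal power
```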

6.3 Forward prediction

6.3 Backward prediction

6.3 Which type of prediction provides a better SNR?

Backward prediction

6.3 Which type of prediction provides a more speech-like shaped prediction error?

Forward prediction

On the left side (backward prediction) we find white quantizing noise, which exceeds the signal in low-energy signal sections but is notably below the useful signal in sections with high energy.
The inverse case, forward prediction, is shown in the middle part of the illustration.

6.3 How do we do noise shaping?

Between the two extremes, the noise may be shaped optimally with respect to its spectrum by a filter downstream of the forward predictor, so that
- the signal-to-noise ratio is enhanced similarly to backward prediction, while at the same time
- the remaining error is spectrally approximated to the signal spectrum (as in open-loop prediction) and thus becomes inaudible.

6.3 What is Adaptive Differential Pulse Code Modulation (ADPCM) about?

Transmission of the adaptively weighted difference between several preceding sample values with
- Adaptive Quantization (AQB)
- Adaptive Prediction (backward prediction)

The quantization steps are adapted to the respective signal amplitude, and the predictor coefficients a_i are identified for the respective signal section.

6.3 How much can the necessary bandwidth be reduced by ADPCM compared to log PCM?

To about half (e.g. from 64 kbit/s log PCM down to 32 kbit/s ADPCM).

6.4 What is the difference between parametric coding and waveform coding?

- Waveform codecs transmit the speech signal (or residuals thereof)
- Parametric codecs transmit a parametric description of the speech signal

6.4 What is the idea of parametric coding?

• Model-based reproduction of the speech production process
• Description via parameters

Example: Analysis-Synthesis Systems, Vocoder (Voice Coder)

6.4 Which information is contained in the parameters of parametric coders?

1. Vocal tract (a_i or formants)
2. Type of excitation (voiced or voiceless)
3. Amplitude

6.4 Describe the principle of the synthesis part of a vocoder.

6.4 How is vocal tract information represented in a channel vocoder?

Bandpass filters

6.4 Describe the principle of the sending part of a channel vocoder.

The sending part models the vocal tract by means of several parallel bandpass filters.
1. Calculates the energy in the different frequency bands
2. Describes the vocal tract information as the series of bandpass energies
3. Adds information on the excitation signal, i.e. a voiced/unvoiced decision plus, in voiced sections, the fundamental frequency
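
A rough sending-part sketch, with FFT-bin energies standing in for the analog bandpass filters and a simple autocorrelation-based voiced/unvoiced and pitch decision; the band count, lag range and voicing threshold are assumptions:

```python
import numpy as np

def channel_vocoder_analysis(frame, fs=8000, n_bands=16):
    """Sending-part sketch: band energies describe the vocal tract; a crude
    autocorrelation analysis supplies the excitation information."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))**2
    band_energies = np.array([b.sum() for b in np.array_split(spectrum, n_bands)])

    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]   # autocorrelation, lag >= 0
    lo, hi = int(fs / 400), int(fs / 60)                            # plausible pitch lags (60..400 Hz)
    peak_lag = lo + int(np.argmax(ac[lo:hi]))
    voiced = ac[peak_lag] > 0.3 * ac[0]                             # assumed voicing threshold
    f0 = fs / peak_lag if voiced else 0.0
    return band_energies, voiced, f0

fs = 8000
t = np.arange(256) / fs
energies, voiced, f0 = channel_vocoder_analysis(np.sin(2 * np.pi * 120 * t), fs)
print(voiced, round(f0, 1))   # a 120 Hz tone is classified as voiced with f0 near 120 Hz
```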

6.4 Describe the principle of the receiving part of a channel vocoder.

6.4 What are features of the Formant vocoder?

• Similar to the channel vocoder, but representation of the vocal tract via
- Formant center frequencies
- Formant bandwidths
• Realization of the vocal tract filter as a
- parallel arrangement
- sequential arrangement
of a number of formant filters

6.4 What are disadvantages of channel and formant vocoder?

Low necessary bit-rate (0.5...1.2 kbit/s), but the model used to represent the speech production process is quite simplified:
- less natural sound
- little information about the speaker

6.4 What are features of the Prediction vocoder?

- Description of the vocal tract via linear prediction (mostly a simple all-pole model)
- In contrast to ADPCM, no residual signal is transmitted; instead, an excitation signal is generated artificially

6.4 Describe the principle of Prediction vocoder.

The principle of linear prediction can also be used for a parametric description. Instead of a difference signal, which in the case of ideal prediction would recreate the excitation signal of the vocal tract filter, this signal is now generated artificially, namely by
• its amplitude A
• its excitation type: voiced or unvoiced
• in the case of voiced excitation, additionally the fundamental frequency of the excitation
In addition to the coefficients of the prediction filters H(e^jΩ) and T(e^jΩ), respectively, which carry the information on the vocal tract, only these excitation parameters need to be transmitted to the receiver.
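
A receiver-side sketch along these lines: the excitation is built from (A, voiced/unvoiced, f0) and passed through the all-pole prediction filter; the coefficients, frame length and function name are illustrative:

```python
import numpy as np

def lpc_vocoder_synthesize(a, amplitude, voiced, f0, fs=8000, n=160):
    """Receiver side of a prediction (LPC) vocoder: artificial excitation fed
    through the all-pole vocal tract filter defined by the coefficients a_i."""
    if voiced:
        excitation = np.zeros(n)
        excitation[::max(1, round(fs / f0))] = 1.0                # impulse train at the fundamental frequency
    else:
        excitation = np.random.default_rng(0).standard_normal(n)  # noise for unvoiced sounds
    excitation *= amplitude

    y = np.zeros(n)
    for k in range(n):                                            # y(k) = e(k) + sum_i a_i * y(k-i)
        y[k] = excitation[k] + sum(a[i] * y[k - 1 - i]
                                   for i in range(len(a)) if k - 1 - i >= 0)
    return y

# Example frame: voiced excitation at 100 Hz through an illustrative 2nd-order filter
frame = lpc_vocoder_synthesize(a=[1.2, -0.5], amplitude=0.8, voiced=True, f0=100.0)
```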

6.4 Name characteristics of the prediction vocoder

6.5 What is a coding gap? What are ideas for filling it?

The coding gap is the difference in bit-rate between waveform coding and parametric coding. Ideas for filling it:
- Transmission of vocal tract information (as side information)
- Excitation with a (simplified) natural residual signal, which is quantized in an intelligent way
- Usage of short-term and long-term prediction

6.5 Why do we use short-term and long-term prediction?

Most hybrid coders use, in addition to the short-term LPC analysis, a long-term predictor which models the periodicity of the excitation signal in voiced segments.
a) amplitude spectrum of a voiced section; b) long-term predictor; c) short-term predictor; d) cascade of b) and c)

6.5 Which kinds of quantization are used in Hybrid coding?

• Scalar: forward and backward prediction, potentially with noise shaping
• Vector: Quantization of the residual signal with 0.5...1.5 bit/sample
-> Codebook nearly independent of the speaker
-> Gain-shape vector quantization

6.5 What is the idea of scalar quantization?

• In LPC-based coding, the excitation signal should show
- a correct temporal energy sequence
- the correct temporal periodicity in voiced segments
- a noisy character in voiceless segments
• This can be reached by a sub-sampled simplified version of the residual signal -> Baseband RELP (Residual Excited Linear Prediction)

6.5 Which coding principle is used in the GSM fullrate codec?

Baseband-RELP

6.5 Describe the principle of baseband RELP (residual excited linear prediction)

1. Windowing
2. Use a predictor to calculate the difference (residual) signal
3. Transmit a simplified version of the signal: the lower band in the frequency domain
4. Sub-sampling with reduction by the factor r
5. Quantization

6.5 What is the basic principle of CELP (Code-Excited Linear Prediction) coding?

Vector quantization.
Testing all codebook vectors results in a (potentially perceptually weighted) minimum distance, corresponding to an "optimum" codebook vector, whose address is transmitted (computationally demanding!)
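
A toy analysis-by-synthesis search illustrating this principle (gain optimization and the perceptual weighting used in a real CELP coder are omitted); the codebook, coefficients and sizes are assumptions:

```python
import numpy as np

def synthesize(excitation, a):
    """All-pole LPC synthesis filter: y(k) = e(k) + sum_i a_i * y(k-i)."""
    y = np.zeros(len(excitation))
    for k in range(len(excitation)):
        y[k] = excitation[k] + sum(a[i] * y[k - 1 - i]
                                   for i in range(len(a)) if k - 1 - i >= 0)
    return y

def celp_search(target, codebook, a):
    """Pass every codebook vector through the synthesis filter and return the
    address of the vector with minimum error to the target frame
    (analysis-by-synthesis full search -> computationally demanding)."""
    errors = [np.sum((target - synthesize(cv, a))**2) for cv in codebook]
    return int(np.argmin(errors))

rng = np.random.default_rng(0)
codebook = rng.standard_normal((128, 40))   # 128 stochastic excitation vectors of 40 samples
a = [1.2, -0.5]                             # illustrative short-term predictor coefficients
target = rng.standard_normal(40)
index = celp_search(target, codebook, a)
print(index)                                # only this 7-bit address is transmitted
```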

6.5 Explain the flow of CELP coding

6.6 What are the steps to do coding in frequency domain?

1. Spectral analysis
2. Transmission of individual frequency bands
3. Spectral synthesis

Saving bit-rate by making use of a non-constant power spectral density

6.6 How can we do Spectral analysis for coding in frequency domain?

- Via filterbank
- Sub-sampling of the band signals
- Quantization of the band signals according to their degree of saturation

6.6 Explain the flow of Transform coding

6.6 Explain the flow of Sub-Band coding

6.7 Name criteria for codec selection

1. Speech quality
- Contains several perceptual dimensions
- Can only be measured auditorily
- Can now be estimated via algorithms
2. Complexity of the algorithms
3. Signal delay
- Threshold ITU-T: 150 ms (400 ms)
4. Bit-rate