125 Cards in this Set
- Front
- Back
how humans describe sound: loudness
|
related to physical concept of amplitude
|
|
how humans describe sound: pitch
|
Related to Fundamental Frequency for periodic sounds. Some noise-like sounds have different pitch heights depending on where in the spectrum most of the energy is concentrated
|
|
how humans describe sound: chroma
|
musical notes---cultural
|
|
how humans describe sound: timbre
|
Quality or color of a sound; related to the physical concept of spectrum (the amplitudes of the harmonics, for periodic waves). Though noise has energy at all frequencies, some noise has more energy in certain parts of the spectrum
|
|
physics is related to
|
acoustics
|
|
psychoacoustics is related to
|
hearing
|
|
to understand sound in multimedia you need to know
|
how the physics and psychoacoustics of sound work
|
|
sound
|
longitudinal waves travelling through a medium (typically air)
|
|
sound waves
|
pressure variations in back and forth motion, in direction of sound propagation
|
|
sinusoid waves
|
represent circular motion--considered pure waves
|
|
a sound wave's amplitude is related to
|
loudness
|
|
amplitude
|
peak value that a periodic wave achieves---shows how much pressure varies
|
|
period (T)
|
amount of time it takes to complete one cycle
|
|
a soundwave's frequency is related to...
|
pitch
|
|
frequency's formula is
|
1/T (measures how many cycles are completed in a second---Hertz)
|
|
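A minimal sketch of the period/frequency relationship from the card above. The function names are made up for illustration.

```python
def frequency_hz(period_s: float) -> float:
    """Frequency in hertz for a period T given in seconds (f = 1/T)."""
    if period_s <= 0:
        raise ValueError("period must be positive")
    return 1.0 / period_s


def period_seconds(frequency: float) -> float:
    """Period in seconds for a frequency given in hertz (T = 1/f)."""
    if frequency <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency


# A wave that needs half a second per cycle completes 2 cycles per second.
print(frequency_hz(0.5))
```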
real world sounds are
|
composite sounds
|
|
Fourier analysis
|
any periodic waveform (i.e. a musical sound) can be created by adding harmonically related sinusoids together
|
|
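A rough sketch of the idea on the card above: adding sinusoids at integer multiples of a fundamental gives a waveform that repeats once per fundamental period. The function name, the 8000 Hz sample rate, and the amplitude list are assumptions chosen only for illustration.

```python
import math


def additive_wave(fundamental_hz, amplitudes, sample_rate=8000, duration_s=0.02):
    """Sum sinusoids at 1x, 2x, 3x... the fundamental.

    amplitudes[k] is the amplitude of harmonic number k + 1.
    Returns a list of sample values.
    """
    n_samples = round(sample_rate * duration_s)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        value = sum(
            amp * math.sin(2 * math.pi * fundamental_hz * (k + 1) * t)
            for k, amp in enumerate(amplitudes)
        )
        samples.append(value)
    return samples


# 100 Hz fundamental with two weaker harmonics; one period is 80 samples here.
wave = additive_wave(100, [1.0, 0.5, 0.25])
```

Changing the amplitude list changes the shape of the waveform (the timbre) without changing how often it repeats (the pitch).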
fundamental frequency
|
lowest mode of vibration
|
|
overtones
|
additional modes above fundamental frequency
|
|
harmonics
|
overtones that obey a harmonic relationship to the fundamental (at integer multiples to the fundamental frequency)
e.g. for a 220 Hz fundamental, the harmonics are 220, 440, 660, and 880 Hz |
|
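The integer-multiple relationship on the card above can be sketched in a couple of lines. The function name is hypothetical.

```python
def harmonics(fundamental_hz, count):
    """First `count` harmonics: integer multiples of the fundamental."""
    return [fundamental_hz * n for n in range(1, count + 1)]


print(harmonics(220, 4))  # [220, 440, 660, 880]
```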
waveform shows
|
how amplitude of a wave behaves over time.
|
|
periodic waves(repeating waves) have a pitch because
|
they have fundamental frequency
|
|
when you represent noise as a sound wave, it
|
is aperiodic, random
|
|
why do instruments sound different if a note has the same harmonics?
|
the strength of each harmonic's amplitude relative to the fundamental is different for each type of sound (timbre). A tuba could have a really strong 440 Hz harmonic, while a flute could have a really weak 440 Hz harmonic. The waveforms will look different.
|
|
spectrum shows
|
the amplitude of different frequencies. (useful to show how different instruments sound different even when playing same note).
|
|
wave forms are in what domain
|
time domain
|
|
spectrum are in what domain
|
frequency domain
|
|
when you plot a spectrum
|
you can look at frequency and amplitude at a particular moment in time.
|
|
contain all frequencies
|
nonperiodic sounds, aka NOISE
|
|
time-varying spectrums are a property of
|
real life sounds
|
|
loudness increases
|
logarithmically but NOT linearly
|
|
loudness depends on three things
|
mainly amplitude and also frequency and timbre
|
|
our ears are more sensitive to
|
certain frequencies
|
|
loudness is measured in
|
decibels
|
|
threshold of hearing is how many dB
|
0
|
|
to double loudness...
|
you need to make the power about 10x stronger. (more complicated explanation: each doubling of loudness multiplies the power by 10. e.g. going from 1 watt (10 to the 0) to 10 watts (10 to the 1) doubles the loudness; going on to 100 watts (10 to the 2) doubles it again, so it sounds about four times as loud as 1 watt.)
|
|
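A small sketch of the decibel arithmetic, assuming the common rule of thumb that every 10x increase in power (+10 dB) sounds roughly twice as loud. Real loudness perception also depends on frequency and timbre, so treat the second function as an approximation. Both function names are made up for illustration.

```python
import math


def power_ratio_to_db(power, reference_power=1.0):
    """Decibel difference between two powers: 10 * log10(P / P_ref)."""
    return 10 * math.log10(power / reference_power)


def approx_loudness_factor(db_change):
    """Rule-of-thumb loudness multiplier: each +10 dB roughly doubles loudness."""
    return 2 ** (db_change / 10)


for watts in (1, 10, 100):
    db = power_ratio_to_db(watts)
    print(watts, "W ->", db, "dB ->", approx_loudness_factor(db), "x as loud")
```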
Our ears are most sensitive in the
|
2000-4000 Hz range. means we hear these sounds as louder than ones outside the range.
|
|
lowest/highest frequencies we can hear
|
20 hz (lowest)- 20,000 hz (highest)
|
|
timbre affects loudness how?
|
Sounds that have frequency content spread over a wider area are perceived as louder (spread across multiple critical bands)
|
|
Loudness affects pitch how?
|
•High pitches get higher the louder they are
•Low pitches get lower the louder they are |
|
The Spectrogram shows
|
the spectrum over time, so we can see the spectrum of real sounds, which happen in time. Time is on one axis (usually horizontal), frequency is on the other axis (usually vertical), and the color measures amplitude/energy
|
|
to convert analog audio to digital audio, we need to
|
sample the amplitude at a fixed time interval, then convert those amplitude values into a form a computer can understand (quantize)
|
|
aliasing occurs
|
when you sample too slowly
|
|
aliasing is
|
when you've sampled too slowly and created a waveform at the wrong frequency. it sounds awful.
|
|
Nyquist theorem
|
must sample at no less than 2 times the highest frequency we want to hear in order to capture it
|
|
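A hedged sketch of what "the wrong frequency" means for a pure sinusoid: frequencies above half the sample rate fold back into the range below it. The function names are hypothetical, and the folding formula assumes an ideal sampler with no anti-aliasing filter.

```python
def nyquist_rate(max_frequency_hz):
    """Minimum sample rate needed to capture max_frequency_hz."""
    return 2 * max_frequency_hz


def alias_frequency(signal_hz, sample_rate_hz):
    """Frequency that a sampled sinusoid appears to have after sampling."""
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


print(nyquist_rate(20_000))            # 40000
print(alias_frequency(1_000, 44_100))  # 1000: sampled fast enough, unchanged
print(alias_frequency(30_000, 44_100)) # 14100: sampled too slowly, aliased
```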
To be able to store audio in the whole range that humans can perceive, we need to sample
|
more than 40,000 times per second (the highest frequency humans can hear is ~20,000 Hz)
|
|
cd audio's sample rate
|
44,100 Hz
|
|
quantize
|
converts amplitude values into a form a computer can understand, using a certain number of bits/bytes. A bit has 2 values; a byte is 8 bits and can represent 2^8 = 256 unique values. 2 bytes are 16 bits and can represent 2^16 = 65,536 unique values
|
|
quantization noise
|
We can’t perfectly represent any amplitude, so we “round” the amplitude to the closest quantization level--creating an error noise in our sound atop the perfect sound.
|
|
to decrease quantization noise
|
add more bits. We gain 6 dB of signal to noise ratio per bit added.
|
|
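A minimal sketch of uniform quantization, assuming samples are floats in the range -1.0 to 1.0 and using the cards' rule of thumb of about 6 dB per bit. The function names and the level layout are assumptions for illustration, not how any particular audio format does it.

```python
def quantization_levels(bits):
    """Number of distinct values that `bits` bits can represent."""
    return 2 ** bits


def approx_dynamic_range_db(bits):
    """Rule of thumb from the cards: about 6 dB per bit."""
    return 6 * bits


def quantize(sample, bits):
    """Round a sample in [-1.0, 1.0] to the nearest of 2**bits evenly spaced levels."""
    levels = quantization_levels(bits)
    step = 2.0 / (levels - 1)
    index = round((sample + 1.0) / step)
    index = max(0, min(levels - 1, index))  # clamp out-of-range input
    return -1.0 + index * step


original = 0.3
for bits in (4, 8, 16):
    error = abs(quantize(original, bits) - original)
    print(bits, "bits:", quantization_levels(bits), "levels, rounding error", error)
```

The rounding error is the quantization noise; it shrinks as bits are added.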
CD Audio uses how many bytes per sample
|
CD Audio – 16 bits (2 bytes) per sample – 96 dB of dynamic range
|
|
Sample rate controls
|
the highest frequency that can be stored. Sample rate must be twice the highest frequency we wish to store per the Nyquist Theorem
|
|
Quantization bit-depth
|
controls the dynamic range, i.e. How much the signal is above the noise level introduced by the quantizer
|
|
channels
|
1 or 2, mono or stereo
|
|
to calculate the size of a digital audio file
|
multiply sample rate by bit depth (#of bytes) by # of channels by duration (# of seconds).
samples/sec*bytes/sample*sec*# of channels |
|
How many bytes is 5 minute of CD audio?
|
44100 SR x 2 bytes x 2 channels x 300 seconds=52920000 bytes
|
|
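The file-size formula can be checked with a short sketch. The function name is made up for illustration.

```python
def raw_audio_bytes(sample_rate_hz, bytes_per_sample, channels, seconds):
    """Uncompressed size: samples/sec * bytes/sample * channels * seconds."""
    return sample_rate_hz * bytes_per_sample * channels * seconds


# 5 minutes of CD audio: 44,100 Hz, 2 bytes per sample, stereo.
print(raw_audio_bytes(44_100, 2, 2, 5 * 60))  # 52920000
```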
perceptual coding of audio
|
exploits psychoacoustic masking
|
|
masking is
|
phenomenon where 1 sound renders another sound inaudible
|
|
masking threshold is
|
the cone of deaf
|
|
if we reduce sample rate and bit depth to make an audio file smaller
|
We will lose high frequency information
and have a noisier sound |
|
types of masking and their definitions
|
–Simultaneous masking: Two sounds occur at the same time, and one makes the other inaudible
–Forward masking: A sound makes another sound immediately following it inaudible
–Backward masking: A sound makes another sound immediately preceding it inaudible (?!?!) |
|
to save space, digital audio drops
|
sounds outside the threshold of hearing and sounds in the cone of deaf
|
|
compression ratio
|
raw size/compressed size
cd audio / mp3 @ 128kbps= 1411/128= ~11 |
|
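A quick sketch of where the 1411 figure comes from, assuming CD audio at 44,100 Hz, 16 bits, 2 channels. Function names are hypothetical.

```python
def raw_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed bit rate in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratio(raw_size, compressed_size):
    """Raw size divided by compressed size (same units for both)."""
    return raw_size / compressed_size


cd_kbps = raw_bit_rate_kbps(44_100, 16, 2)  # about 1411 kbps
print(cd_kbps, compression_ratio(cd_kbps, 128))
```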
name three lossless encoding schemes
|
run length, dictionary, and entropy encoders
|
|
dictionary encoding
|
Build a dictionary of symbols and sequences of symbols; each occurrence is then simply referenced as an index into the dictionary. Only works well when symbols and sequences are sufficiently repetitive
|
|
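A toy sketch of the dictionary idea, assuming whole words are the "symbols". Real dictionary coders such as LZW build the dictionary from byte sequences as they go; this simplified version only shows the index-into-a-dictionary part, and the function names are made up.

```python
def dictionary_encode(words):
    """Replace each word with its index in a dictionary built on first sight."""
    dictionary = []
    index_of = {}
    indices = []
    for word in words:
        if word not in index_of:
            index_of[word] = len(dictionary)
            dictionary.append(word)
        indices.append(index_of[word])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Look every index back up in the dictionary."""
    return [dictionary[i] for i in indices]


words = "the cat and the hat and the bat".split()
dictionary, indices = dictionary_encode(words)
print(dictionary)  # ['the', 'cat', 'and', 'hat', 'bat']
print(indices)     # [0, 1, 2, 0, 3, 2, 0, 4]
```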
run length encoding
|
Given a sequence of symbols, encode the symbol and how many times it repeats
–AAAABBBCCCCCAABBB = 4A 3B 5C 2A 3B
Only works well w/stuff that has a lot of repetition. |
|
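The example on the card above can be reproduced with a short sketch. The function names are hypothetical; `itertools.groupby` groups consecutive equal symbols.

```python
from itertools import groupby


def run_length_encode(text):
    """Return (count, symbol) pairs for each run of repeated symbols."""
    return [(len(list(run)), symbol) for symbol, run in groupby(text)]


def run_length_decode(pairs):
    """Expand (count, symbol) pairs back into the original string."""
    return "".join(symbol * count for count, symbol in pairs)


encoded = run_length_encode("AAAABBBCCCCCAABBB")
print(" ".join(f"{count}{symbol}" for count, symbol in encoded))  # 4A 3B 5C 2A 3B
```

On input with no repetition, such as "ABCDEF", every run has length 1 and the encoding is longer than the original.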
entropy encoding
|
entropy encoding is to exploit that some symbols occur more frequently than others (like letter e vs letter z)
does not work well when symbols are generally equally likely |
|
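One way to see why uneven symbol frequencies help is to compute the Shannon entropy, which estimates the average bits per symbol an ideal entropy coder would need. This is a sketch with a made-up function name; it measures the potential saving rather than performing the encoding.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(text):
    """Shannon entropy: -sum(p * log2(p)) over the symbol probabilities."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


print(entropy_bits_per_symbol("ABCD"))      # equally likely symbols: 2.0 bits each
print(entropy_bits_per_symbol("AAAAAAAB"))  # skewed: well under 1 bit per symbol
```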
lossy compression pros and cons
|
pro: substantial size shrinkage (high compression ratio)
con: data is permanently lost |
|
What is light?
|
a wave phenomenon w/spectra that usually comes from two sources
–Thermal/black-body radiation
–Emission (electron energy state changes) |
|
what is the visible spectrum
|
400-790 terahertz
|
|
The actual spectrum of a source is
|
its physical color
|
|
regulates color perception in human eye
|
cones
|
|
short, medium, and long cones are most sensitive to
|
Blue, Green, and Red wavelengths, respectively
|
|
a color like orange
|
may excite the red cone most strongly, but also excite the green and blue cones some as well
|
|
we create perceived colors by
|
mixing together the correct amounts of red, green, and blue light--moving from an infinite-dimensional representation of color to a three-dimensional one
|
|
Additive Color
|
Light--mixing all 3 (rgb) creates white
|
|
Subtractive Color
|
Most objects reflect light and do not generate it. A green notebook is green because it absorbs the other light wavelengths but reflects green.
art: RYB; printing: CMYK |
|
how printing with cmyk colors works
|
•Cyan Ink: Absorbs red, reflects blue and green
•Magenta Ink: Absorbs green, reflects red and blue
•Yellow Ink: Absorbs blue, reflects red and green
•To make blue on paper: Apply cyan and magenta so that only blue is reflected |
|
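A toy model of subtractive mixing, assuming idealized inks that either fully reflect (1) or fully absorb (0) each of red, green, and blue. Real inks are not this clean, which is one reason printers add black (K). The names are made up for illustration.

```python
# Each ink is (red, green, blue) reflectance: 1 = reflects, 0 = absorbs.
CYAN = (0, 1, 1)
MAGENTA = (1, 0, 1)
YELLOW = (1, 1, 0)


def mix_inks(*inks):
    """Light reflected after passing through every ink layer (white paper underneath)."""
    reflected = [1, 1, 1]
    for ink in inks:
        reflected = [r * i for r, i in zip(reflected, ink)]
    return tuple(reflected)


print(mix_inks(CYAN, MAGENTA))          # (0, 0, 1): only blue is reflected
print(mix_inks(CYAN, MAGENTA, YELLOW))  # (0, 0, 0): everything absorbed
```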
why don't people use rgb values much when making stuff
|
hard to remember them!
|
|
alternative to rgb values that is easier
|
hue, saturation, value (HSV). maps 1:1 to RGB.
|
|
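Python's standard `colorsys` module can illustrate the RGB/HSV mapping. The wrapper names and the choice of 0-255 channels and hue in degrees are assumptions for readability; `colorsys` itself works with floats from 0 to 1.

```python
import colorsys


def rgb255_to_hsv(red, green, blue):
    """Convert 0-255 RGB to (hue in degrees, saturation 0-1, value 0-1)."""
    h, s, v = colorsys.rgb_to_hsv(red / 255, green / 255, blue / 255)
    return h * 360, s, v


def hsv_to_rgb255(hue_degrees, saturation, value):
    """Convert (hue in degrees, saturation, value) back to 0-255 RGB."""
    r, g, b = colorsys.hsv_to_rgb(hue_degrees / 360, saturation, value)
    return round(r * 255), round(g * 255), round(b * 255)


print(rgb255_to_hsv(255, 0, 0))    # pure red: hue 0, fully saturated, full value
print(rgb255_to_hsv(255, 165, 0))  # an orange: hue a bit under 40 degrees
```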
hue is
|
“Color”
|
|
saturation is
|
"Colorfulness"
|
|
value is
|
“Brightness”
|
|
Color Vision
|
A spectrum analyzer with receptors that analyze how much red, green, and blue light is present
|
|
Color Theory
|
Add together R, G, B light to create colors. Map RGB values to different representations like HSV to be more intuitive
|
|
digital images use which color model
|
additive (rgb)
|
|
what is a raster image?
|
a representation of an image using pixels, where each pixel takes on an RGB value
|
|
how do you calculate a raster image's size?
|
resolution (pixels wide x pixels high) x color depth (bits or bytes per pixel) = raw file size
|
|
1 bit color is
|
black and white (1 and 0) --(2 to the 1 power)
|
|
1 byte (8 bit) color is
|
256 colors (2 to the 8th power)
|
|
3 byte (24 bit) true color is
|
16,777,216 (2 to the 24th power)
|
|
an 800 x 600 true color images file size would be
|
800 pixels x 600 pixels x 3 bytes = 1,440,000 bytes
|
|
the # of color possibilities available per # of bits available per pixel can be calculated like...
|
1 bit = 2 possibilities, 2 bits = 4, 8 bits = 256
(2 to the 1, 2 to the 2....2 to the 8---see the pattern here:) |
|
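The raster-size and color-count arithmetic from the cards above, as a short sketch with hypothetical function names.

```python
def raw_image_bytes(width_px, height_px, bytes_per_pixel):
    """Uncompressed raster size: pixels wide * pixels high * bytes per pixel."""
    return width_px * height_px * bytes_per_pixel


def color_count(bits_per_pixel):
    """Number of distinct colors: 2 to the power of the bit depth."""
    return 2 ** bits_per_pixel


print(raw_image_bytes(800, 600, 3))  # 1440000
for bits in (1, 2, 8, 24):
    print(bits, "bits ->", color_count(bits), "colors")
```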
GIF--color depth, compression type, good for
|
•One of the earliest examples
•Supports only 8-bit color (though each image can have its own 256-color palette)
•Lossless compression using the once-patented LZW algorithm
•Good for logos, etc.
•Supports animation |
|
PNG--color depth, compression type, good for, other features
|
•Supports 24-bit color
•Uses patent-free DEFLATE lossless compression
•Good for logos and text
•Supports alpha channel |
|
JPG-- compression type, good for, other features
|
lossy; good for photos
|
|
JPEG's compression exploits
|
that the human eye is good at noticing slight changes in brightness over large areas (low frequency info) but far less sensitive to sharp transitions, e.g., edges (high frequency info)
|
|
jpeg compression's steps
|
•Break up image into 8x8 pixel blocks
•Transform each block's data from spatial domain to frequency domain
•Quantize the frequency domain coefficients, and possibly remove high frequency content (edges)
•With extreme compression, colors are averaged in the blocks, and each block is clearly visible |
|
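A simplified sketch of the transform-and-quantize steps above. It uses a 1-D discrete cosine transform on a single row of 8 pixel values, whereas JPEG applies a 2-D DCT to whole 8x8 blocks with per-coefficient quantization tables. The function names, the single `step` value, and the `keep` cutoff are assumptions for illustration only.

```python
import math


def dct(values):
    """Orthonormal DCT-II: spatial samples -> frequency coefficients."""
    n = len(values)
    coeffs = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(values))
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs):
    """Inverse of dct(): frequency coefficients -> spatial samples."""
    n = len(coeffs)
    values = []
    for i in range(n):
        total = sum(math.sqrt((1 if k == 0 else 2) / n) * c
                    * math.cos(math.pi * (i + 0.5) * k / n)
                    for k, c in enumerate(coeffs))
        values.append(total)
    return values


def quantize_coefficients(coeffs, step, keep):
    """Round the first `keep` coefficients to multiples of `step`; zero the rest."""
    return [round(c / step) * step if k < keep else 0.0
            for k, c in enumerate(coeffs)]


row = [52, 55, 61, 66, 70, 61, 64, 73]  # one row of grayscale pixel values
lossy = quantize_coefficients(dct(row), step=10, keep=4)
print([round(v) for v in idct(lossy)])  # close to `row`, but not identical
```

The zeroed and rounded coefficients are where the data loss happens; a larger `step` or smaller `keep` loses more detail.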
a sequence of pixels has a measurable
|
spectrum
|
|
Removing too much high frequency information in a jpeg (extreme compression) leads to
|
blocking artifacts
|
|
SVG (vector graphics) is not
|
a raster image
|
|
SVG uses
|
geometric formulas to draw an image
•Great for items you need to scale
•Bad for pictures
•Good for fonts and some drawn images |
|
video is
|
a sequence of still images
|
|
frame rate is
|
number of images per second, unit is FPS (frames per second)
|
|
how do you calculate the size of video
|
time (in seconds) x (resolution x bit depth of still image) x frame rate
|
|
How large would one minute (60 seconds) of a 1080p (1920 x 1080) true-color (24 bits, or 3 bytes per pixel) video at 30 FPS be?
|
BIG
1920 x 1080 x 3 bytes x 30 FPS x 60 seconds = 11,197,440,000 bytes (89,579,520,000 bits) |
|
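The raw video size formula as a short sketch. The function name is made up for illustration.

```python
def raw_video_bytes(width_px, height_px, bytes_per_pixel, fps, seconds):
    """Uncompressed size: bytes per frame * frames per second * seconds."""
    bytes_per_frame = width_px * height_px * bytes_per_pixel
    return bytes_per_frame * fps * seconds


size = raw_video_bytes(1920, 1080, 3, 30, 60)
print(size, "bytes =", size * 8, "bits")
```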
frame rate film and video standards
|
•Film standard is 24 FPS
•Video standard is 30 FPS
•Movie theaters project at 72 FPS, displaying each image 3 times (for flicker and motion reasons) |
|
“Trumotion” technologies do what
|
attempt to “create” frames between existing ones for more realistic motion
|
|
How does intraframe video compression work?
|
works a lot like jpeg.
–Take 8 x 8 pixel blocks
–Transform each block from spatial domain to frequency domain
–Quantize frequency domain coefficients and possibly remove high frequency content |
|
How does interframe video compression work?
|
Exploit similarity among adjacent frames to achieve compression (e.g., if the background is the same, don't recode it over and over)
|
|
when frames are the same in interframe compression
|
code with a short command to copy
|
|
when frames are not the same in interframe compression
|
use motion compensation. (Uses previous frames to predict the current one, and notice/store the difference)
|
|
Motion compensation uses
|
previous frames to predict the current one, and notice/store the difference
|
|
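A much-simplified sketch of predict-and-store-the-difference. It assumes the prediction is simply "the current frame equals the previous frame"; real motion compensation also searches for blocks that have moved and stores motion vectors. Frames are modelled as flat lists of pixel values, and the function names are hypothetical.

```python
def encode_residual(previous_frame, current_frame):
    """Difference between the actual frame and the prediction (the previous frame)."""
    return [cur - prev for prev, cur in zip(previous_frame, current_frame)]


def decode_residual(previous_frame, residual):
    """Rebuild the current frame from the prediction plus the stored difference."""
    return [prev + diff for prev, diff in zip(previous_frame, residual)]


previous = [10, 10, 10, 200, 10]
current = [10, 10, 10, 10, 200]  # the bright pixel moved one step right
residual = encode_residual(previous, current)
print(residual)  # [0, 0, 0, -190, 190]: mostly zeros, so it compresses well
```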
frame types
|
I (intraframe compressed image), P (predictive frame), and B (bi-predictive frame)
|
|
I frame
|
original intraframe compressed image--no motion compensation or prediction
|
|
p frame
|
predictive. A Delta frame that depends on previous I and P frames
|
|
b frame
|
bi-predictive. Uses both previous (past) and subsequent (future) frames
|
|
we can compress motion further by
|
using prediction--If the prediction is decent, all that needs to be stored is the parameters of the prediction and the ERROR
–With good prediction, the error is small
–If the error is small, it can be stored very compactly |
|
Advances in Video Coding
|
Variable pixel block sizes, using more reference frames (frames from way back or forward), and extremely complicated motion compensation systems.
drawback: usually requires high computational cost, and thus better and faster computers |
|
compression scheme is referred to as
|
a codec
|
|
codec stands for
|
coder/decoder
|
|
common codecs
|
•MPEG-2/H.262: DVD standard
•MPEG-4 AVC/H.264: Blu-ray. Widely gaining ground as the dominant standard. Very CPU intensive
•VP6, VP7, VP8: Proprietary format. Was used extensively in Flash.
•WMV: Proprietary Microsoft format. |
|
a video file type is NOT
|
a codec, though some share the same file names (ick)
|
|
video containers
|
•AVI: Microsoft container format. Linked to no specific CODEC
•MOV: Apple Quicktime Format - Supports all MPEG formats, amongst others. Became the basis of the MPEG-4 container format
•MPEG: Container for MPEG-1 and MPEG-2 videos
•MPEG-4/MP4: Based off of newest Quicktime MOV. Main container for H.264 encoded videos
•FLASH: Common web streaming format. Was dominated with VP series codecs, now H.264
•REALMEDIA: Real's container format. Uses p |
|
a video container contains
|
–Video compressed with some CODEC (e.g. H.264)
–Audio compressed with some audio codec (e.g. mp3)
–Perhaps text, menus, etc. |