bimodal distribution

a distribution of data points in which two values occur more frequently than the rest of the values in the data set


boxplot

a graphical EDA technique used to highlight the center and extremes of a data set
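
A basic boxplot is drawn from five numbers: the minimum, first quartile, median, third quartile, and maximum. The Python sketch below computes that five-number summary with the standard-library `statistics.quantiles`. The helper name `five_number_summary` is hypothetical. The `method="inclusive"` quartile rule is an assumption, because textbooks and plotting libraries use slightly different quartile rules. Many boxplots also draw whiskers only out to 1.5 times the interquartile range and plot more distant values as separate points.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum), the values a basic boxplot draws.

    Assumes the "inclusive" quartile rule; other conventions give slightly different Q1/Q3.
    """
    if len(data) < 2:
        raise ValueError("need at least two data points")
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (1, 3.0, 5.0, 7.0, 9)
```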


Chebyshev's Theorem

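no matter what the shape of a distribution, at least 75 percent of the values in the population will fall within 2 standard deviations of the mean and at least 89 percent will fall within 3 standard deviations; in general, at least 1 - 1/k^2 of the values fall within k standard deviations of the mean for any k > 1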
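
The general form of the bound is 1 - 1/k^2 for k standard deviations. The sketch below computes it and compares it with the observed fraction in a small, deliberately skewed, made-up data set. It assumes population statistics (`statistics.pstdev`). The names `chebyshev_bound` and `fraction_within` are hypothetical helpers.

```python
import statistics

def chebyshev_bound(k: float) -> float:
    """Minimum fraction of values within k standard deviations of the mean, for any distribution."""
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the expression 1 - 1/k^2 is zero or negative, so the bound says nothing.
    return max(0.0, 1 - 1 / k**2)

def fraction_within(data, k: float) -> float:
    """Observed fraction of values within k population standard deviations of the mean."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    inside = sum(1 for x in data if abs(x - mu) <= k * sigma)
    return inside / len(data)

# Hypothetical, deliberately skewed data (one large outlier).
data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 30]
for k in (2, 3):
    print(k, chebyshev_bound(k), fraction_within(data, k))
```

The observed fractions can exceed the bound by a wide margin. The theorem guarantees only a floor that holds for every distribution.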


coding

a method of calculating the mean for grouped data by recoding the values of class midpoints to simpler values
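
With coding, the class midpoint chosen as the origin x0 gets code u = 0, and the classes on either side get codes ..., -2, -1, 1, 2, .... The mean is then recovered as x̄ = x0 + w(Σuf)/n. Here w is the class width, f is each class frequency, and n is the total count. The sketch below assumes equally spaced class midpoints and uses illustrative data. `coded_mean` is a hypothetical name.

```python
def coded_mean(midpoints, freqs, origin_index):
    """Mean of grouped data by coding; assumes equally spaced class midpoints."""
    if len(midpoints) != len(freqs) or len(midpoints) < 2:
        raise ValueError("need matching midpoints and frequencies for at least two classes")
    width = midpoints[1] - midpoints[0]                         # class width w
    x0 = midpoints[origin_index]                                # midpoint assigned code u = 0
    codes = [i - origin_index for i in range(len(midpoints))]   # ..., -1, 0, 1, ...
    n = sum(freqs)
    sum_uf = sum(u * f for u, f in zip(codes, freqs))
    return x0 + width * sum_uf / n

# Illustrative frequency distribution: classes 0 to 10, 10 to 20, ..., 40 to 50.
midpoints = [5, 15, 25, 35, 45]
freqs = [2, 5, 8, 4, 1]
print(coded_mean(midpoints, freqs, origin_index=2))  # 23.5, same as sum(m * f) / n
```

Any midpoint can serve as the origin and the result is the same. Choosing one near the middle simply keeps the codes small for hand calculation.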


coefficient of variation

a relative measure of dispersion, comparable across distributions, that expresses the standard deviation as a percentage of the mean
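
As a formula, CV = (σ / μ) × 100. The sketch below assumes the population standard deviation (`statistics.pstdev`) and hypothetical data. `coefficient_of_variation` is an illustrative name. Rescaling every value by the same factor leaves the coefficient unchanged, and that invariance is what makes it comparable across data sets measured on different scales.

```python
import statistics

def coefficient_of_variation(data):
    """Population standard deviation expressed as a percentage of the mean."""
    mean = statistics.fmean(data)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined when the mean is 0")
    return 100 * statistics.pstdev(data) / mean

a = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data in some unit
b = [10 * x for x in a]        # the same data rescaled, e.g. in a different unit
print(coefficient_of_variation(a), coefficient_of_variation(b))  # 40.0 40.0
```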


deciles

fractiles that divide the data into 10 equal parts


dispersion

the spread or variability in a set of data


distance measure

a measure of dispersion in terms of the difference between two values in the data set


exploratory data analysis (EDA)

methods for analyzing data that require very few prior assumptions


fractile

in a frequency distribution, the location of a value at or above a given fraction of the data


geometric mean

a measure of central tendency used to measure the average rate of change or growth for some quantity, computed by taking the Nth root of the product of N values representing change
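
A minimal sketch, assuming each period's change is given as a positive growth factor, so that +10% becomes 1.10. The data are hypothetical and `geometric_mean` is an illustrative name. Python 3.8+ also ships `statistics.geometric_mean`.

```python
import math

def geometric_mean(factors):
    """Nth root of the product of N positive growth factors."""
    if not factors or any(f <= 0 for f in factors):
        raise ValueError("factors must be a non-empty list of positive numbers")
    return math.prod(factors) ** (1 / len(factors))

# Hypothetical growth over three periods: +10%, +50%, -10%.
factors = [1.10, 1.50, 0.90]
g = geometric_mean(factors)
print(f"average growth factor {g:.4f}, about {100 * (g - 1):.2f}% per period")
```

Applying the factor g in each of the three periods reproduces the same total growth as the original factors. The arithmetic mean of the factors, about 1.167, would overstate the average rate.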


interfractile range

a measure of the spread between two fractiles in a distribution, that is, the difference between the values of two fractiles


interquartile range

the difference between the values of the first and the third quartiles; this difference indicates the range of the middle half of the data set
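
A sketch using the standard-library `statistics.quantiles` with the `"inclusive"` quartile rule. The rule is an assumption, since other textbook conventions give slightly different quartiles. `interquartile_range` is a hypothetical name. The loop contrasts the interquartile range with the range on hypothetical data, once with an extreme value and once without.

```python
import statistics

def interquartile_range(data):
    """Q3 - Q1, using the 'inclusive' quartile rule (an assumption; conventions differ)."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 1000]
for d in (data, with_outlier):
    # The range uses only the two extremes; the IQR looks only at the middle half.
    print("range:", max(d) - min(d), "IQR:", interquartile_range(d))
```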


kurtosis

the degree of peakedness of a distribution of points


mean

a central tendency measure representing the arithmetic average of a set of observations


measure of central tendency

a measure indicating the value to be expected of a typical or middle data point


measure of dispersion

a measure describing how the observations in a data set are scattered or spread out


median

the middle point of a data set, a measure of location that divides the data set into two halves
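
A minimal sketch of the usual rule. Sort the data and take the middle value, or average the two middle values when the count is even. `median` here is an illustrative function, and `statistics.median` in the standard library behaves the same way.

```python
def median(data):
    """Middle value of the sorted data; the mean of the two middle values when n is even."""
    if not data:
        raise ValueError("median of empty data")
    s = sorted(data)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([7, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```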


median class

the class in a frequency distribution that contains the median value for a data set


mode

the value most often repeated in the data set, represented by the highest point in the distribution curve of a data set
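
A data set can have more than one mode. When exactly two values tie for the highest frequency, the data are bimodal. The sketch below returns every tied value and uses hypothetical inputs. `modes` is an illustrative name. Python 3.8+ provides the same behavior as `statistics.multimode`.

```python
from collections import Counter

def modes(data):
    """All values tied for the highest frequency, in order of first appearance."""
    if not data:
        raise ValueError("mode of empty data")
    counts = Counter(data)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

print(modes([3, 7, 7, 8, 9]))      # [7]
print(modes([1, 1, 2, 5, 5, 6]))   # [1, 5]  (bimodal)
```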


parameters

numerical values that describe the characteristics of a whole population, commonly represented by Greek letters


percentiles

fractiles that divide the data into 100 equal parts


quartiles

fractiles that divide the data into four equal parts
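
Quartiles, deciles, and percentiles are all fractiles. Dividing data into n equal parts takes n - 1 cut points. The sketch below computes all three with the standard-library `statistics.quantiles` on the hypothetical data 1 to 100. It assumes the `"inclusive"` interpolation rule, and other conventions can shift the cut points slightly.

```python
import statistics

data = list(range(1, 101))  # hypothetical data set: 1, 2, ..., 100

quartiles = statistics.quantiles(data, n=4, method="inclusive")     # 3 cut points
deciles = statistics.quantiles(data, n=10, method="inclusive")      # 9 cut points
percentiles = statistics.quantiles(data, n=100, method="inclusive") # 99 cut points

print(quartiles)  # [25.75, 50.5, 75.25]
# The 2nd quartile, 5th decile, and 50th percentile all equal the median.
print(quartiles[1], deciles[4], percentiles[49])
```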


range

the distance between the highest and the lowest values in a data set


skewness

the extent to which a distribution of data points is concentrated at one end or the other; the lack of symmetry


standard deviation

the positive square root of the variance; a measure of dispersion in the same units as the original data, rather than in the squared units of the variance
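
A short sketch with hypothetical heights in centimeters. It shows that the population standard deviation is the square root of the population variance and is expressed in the data's own units. The sample version, `statistics.stdev`, divides by n - 1 instead of n.

```python
import math
import statistics

heights_cm = [160, 165, 170, 175, 180]   # hypothetical data, in cm

variance = statistics.pvariance(heights_cm)   # population variance, in cm^2
std_dev = statistics.pstdev(heights_cm)       # population standard deviation, in cm
sample_sd = statistics.stdev(heights_cm)      # sample standard deviation (divides by n - 1)

print(variance, round(std_dev, 3), round(sample_sd, 3))
print(math.isclose(std_dev, math.sqrt(variance)))  # True
```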


standard score

expressing an observation in terms of standard deviation units above or below the mean; that is, the transformation of an observation by subtracting the mean and dividing by the standard deviation
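
In symbols, z = (x - μ) / σ. The sketch below assumes population parameters and hypothetical data. `standard_scores` is an illustrative name, and for a sample you would use the sample mean and `statistics.stdev` instead.

```python
import statistics

def standard_scores(data):
    """Transform each observation to (x - mean) / standard deviation, using population sigma."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    if sigma == 0:
        raise ValueError("standard scores are undefined when all values are equal")
    return [(x - mu) / sigma for x in data]

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data: mean 5, standard deviation 2
print(standard_scores(data))      # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```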


statistics

numerical measures describing the characteristics of a sample, represented by Roman letters


stem-and-leaf display

a histogram-like display used in EDA to group data while still displaying all the original values
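
A minimal text sketch for non-negative integers. It assumes the tens digit(s) serve as the stem and the units digit as the leaf, which is one common choice among several. `stem_and_leaf` is a hypothetical name and the data are made up. Empty stems are kept so that gaps in the data remain visible.

```python
def stem_and_leaf(data):
    """Lines of a stem-and-leaf display for non-negative integers (stem = tens, leaf = units)."""
    if not data:
        return []
    leaves = {}
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        leaves.setdefault(stem, []).append(str(leaf))
    lines = []
    for stem in range(min(leaves), max(leaves) + 1):   # include empty stems to show gaps
        row = " ".join(leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines

for line in stem_and_leaf([12, 15, 21, 24, 24, 38, 41, 43]):
    print(line)
```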


summary statistics

single numbers that describe certain characteristics of a data set


symmetrical

a characteristic of a distribution in which each half is the mirror image of the other half


variance

a measure of the average squared distance between the mean and each item in the population
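
The definition translates directly into code. The sketch below averages the squared distances from the mean, dividing by N for a population. It also shows the sample variance, which divides by n - 1. The function names are illustrative, the data are hypothetical, and the standard library's `statistics.pvariance` and `statistics.variance` serve as a cross-check.

```python
import statistics

def population_variance(data):
    """Average squared distance from the population mean (divide by N)."""
    n = len(data)
    if n == 0:
        raise ValueError("variance of empty data")
    mu = sum(data) / n
    return sum((x - mu) ** 2 for x in data) / n

def sample_variance(data):
    """Sample variance: squared distances from the sample mean, divided by n - 1."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    xbar = sum(data) / n
    return sum((x - xbar) ** 2 for x in data) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data), statistics.pvariance(data))  # both 4
print(sample_variance(data), statistics.variance(data))       # both 32/7
```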


weighted mean

an average calculated to take into account the importance of each value to the overall total, that is, an average in which each observation is weighted by some index of its importance
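
In symbols, x̄w = Σ(w·x) / Σw. The sketch below uses hypothetical exam scores weighted by a made-up importance index. `weighted_mean` is an illustrative name. Python 3.11+ also accepts a `weights` argument in `statistics.fmean`.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

scores = [80, 90, 70]   # hypothetical exam scores
weights = [2, 5, 3]     # hypothetical importance of each exam
print(weighted_mean(scores, weights))   # 82.0
print(sum(scores) / len(scores))        # 80.0, the unweighted mean
```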
