67 Cards in this Set

  • Front
  • Back
is a method of summarizing data. Frequency is how often something occurs. For a distribution, you define two or more equivalence classes and count the number of observations in each class. A table showing the equivalence classes and the frequency with which their score values occur is called a frequency distribution.
Frequency distribution
the frequency of an individual score value
ungrouped frequency distribution
each class interval spans 2 or more score values. It has a nominal upper and lower limit
grouped frequency distribution
extend .5 below the nominal lower limit and .5 above the nominal upper limit of each class interval, showing that there are no gaps in a distribution
real limits
real upper limit minus the real lower limit, for both grouped and ungrouped frequency distributions
Class interval size
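A minimal Python sketch of real limits and class interval size, assuming the usual convention that real limits lie half a unit of measurement below the nominal lower limit and half a unit above the nominal upper limit. The function names and the `unit` parameter are illustrative and do not come from the card set.

```python
def real_limits(nominal_lower, nominal_upper, unit=1):
    """Return (real lower limit, real upper limit) of a class interval."""
    half = unit / 2  # half of one unit of measurement (assumed convention)
    return nominal_lower - half, nominal_upper + half


def class_interval_size(nominal_lower, nominal_upper, unit=1):
    """Real upper limit minus real lower limit."""
    lower, upper = real_limits(nominal_lower, nominal_upper, unit)
    return upper - lower


print(real_limits(10, 14))          # grouped interval with nominal limits 10 and 14
print(class_interval_size(10, 14))  # covers the score values 10, 11, 12, 13, 14
print(class_interval_size(7, 7))    # ungrouped: a single score value
```

With whole-number scores, the interval 10-14 has real limits 9.5 and 14.5, so its size is 5 even though 14 minus 10 is 4.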
are distributions that show the proportion or percentage of the total number of scores.
Proportionate frequency (Prop f): Prop f = f/n
Percentage frequency (%f): %f = f/n × 100
Relative frequency distributions
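The two formulas on this card (f/n and f/n × 100) can be sketched in Python as follows. This is only an illustration for an ungrouped distribution; the function name and the scores are made up.

```python
from collections import Counter


def relative_frequencies(scores):
    """Map each score value to (f, Prop f, %f)."""
    n = len(scores)
    counts = Counter(scores)
    return {
        value: (f, f / n, 100 * f / n)  # frequency, proportion, percentage
        for value, f in sorted(counts.items())
    }


scores = [2, 3, 3, 4, 4, 4, 5, 5, 5, 5]  # made-up data, n = 10
for value, (f, prop_f, pct_f) in relative_frequencies(scores).items():
    print(f"{value}: f={f}  Prop f={prop_f:.2f}  %f={pct_f:.0f}")
```

The proportions sum to 1 and the percentages sum to 100, which is a quick check on any relative frequency table.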
show the number, proportion or percentage of scores that occur below the upper limit of each class interval.
Cumulative frequency distributions
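As a rough Python sketch, cumulative frequencies are running totals of the class frequencies, taken from the lowest class upward. The input layout (a list of upper real limit and frequency pairs) and the function name are assumptions made for this example.

```python
from itertools import accumulate


def cumulative_table(class_freqs):
    """class_freqs: (upper real limit, f) pairs, ordered lowest class first.

    Returns rows of (upper real limit, cum f, cum Prop f, cum %f).
    """
    n = sum(f for _, f in class_freqs)
    running = accumulate(f for _, f in class_freqs)  # running total of f
    return [
        (upper, cum_f, cum_f / n, 100 * cum_f / n)
        for (upper, _), cum_f in zip(class_freqs, running)
    ]


# Made-up grouped data: classes 0-9, 10-19, 20-29
for row in cumulative_table([(9.5, 2), (19.5, 5), (29.5, 3)]):
    print(row)
```

The last row always reaches the total n, a proportion of 1 and 100 percent.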
is similar to a bar graph but is used for quantitative variables. It is constructed by erecting vertical bars over the real limits of each class interval with the height of each bar corresponding to the number of scores in the interval. The bars of adjacent class intervals should touch to emphasize the continuous quantitative character of the class intervals.
Histogram
a way to show the information in a frequency table. It looks a little bit like a line graph. Plot one point for each class, above the class midpoint and at a height equal to the class frequency, then join the points with straight lines. The midpoint of a class is the point halfway between its lower and upper limits.
Frequency polygon
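The points of a frequency polygon can be computed as in the sketch below: one point per class, with the class midpoint as x and the class frequency as y. The function name and input layout are hypothetical, and drawing the lines is left to whatever plotting tool is at hand.

```python
def polygon_points(class_freqs):
    """class_freqs: ((lower limit, upper limit), f) pairs.

    Returns (midpoint, f) pairs to be joined by straight lines.
    """
    return [((lower + upper) / 2, f) for (lower, upper), f in class_freqs]


# Made-up classes 10-14, 15-19, 20-24
print(polygon_points([((10, 14), 3), ((15, 19), 6), ((20, 24), 1)]))
```

The midpoint is the same whether nominal or real limits are used, because both sets of limits are centered on the same value.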
a convenient way of graphically depicting groups of numerical data through their five-number summaries: the smallest observation (sample minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and largest observation (sample maximum). May also indicate which observations, if any, might be considered outliers.
Box-and-whisker plot
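The five-number summary behind a box-and-whisker plot can be sketched with Python's standard `statistics` module. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" interpolation method, which may not match the method used in a given course.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # Assumed convention: the "inclusive" method of statistics.quantiles
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)


print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # made-up data
```

The box is drawn from Q1 to Q3 with a line at the median, and the whiskers extend toward the minimum and maximum.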
another way to analyze the frequency distribution table. Unlike a frequency distribution which tells you how many data points are within each class, a cumulative frequency tells you how many are less than or within each of the class limits.
Cumulative frequency distribution (ogive)
resembles a histogram that has been turned on its side. It is a way of summarizing a set of data measured on an interval scale. It is often used in exploratory data analysis to illustrate the major features of the distribution of the data in a convenient and easily drawn form. It is similar to a histogram but is usually a more informative display for relatively small data sets (<100 data points). It provides a table as well as a picture of the data and from it we can readily write down the data in order of magnitude, which is useful for many statistical procedures.
Stem-and-leaf plot
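A small Python sketch of building a stem-and-leaf plot, assuming non-negative whole-number scores where the last digit is the leaf and the remaining digits are the stem. The function name and data are made up.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem to its leaves in ascending order."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 27 -> stem 2, leaf 7
        plot[stem].append(leaf)
    return dict(plot)


scores = [27, 12, 34, 21, 15, 27]  # made-up data
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Reading the rows from top to bottom gives the data back in order of magnitude. Stems with no scores are simply skipped in this sketch, whereas a hand-drawn plot usually lists them with an empty row.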
is the value of a variable below which a certain percent of observations fall.
Percentile
part of statistics that organizes and summarizes data so that it can be more readily comprehended.
Descriptive statistics
the part of statistics on which the concept of statistical significance is based. You are trying to reach conclusions that extend beyond the immediate data alone, for example inferring from the sample data what the population might think.
Inferential statistics
is some specific characteristic of a subject that can assume one or more different values.
Variable
a particular subject’s relative standing on a quantitative variable, or a subject’s classification within a classification variable. In some cases this is a score. Different states in which a variable occurs.
Value
the individual subjects/objects that serve as the source of the data. Usually this is a person.
Observational unit
is the process of assigning numbers or labels to characteristics of people, objects, or events according to a set of rules.
Measurement
is a classification system that places people, objects or other entities into mutually exclusive categories. Not meaningfully ordered.
Nominal scale
represents the rank order of the subjects with respect to the variable that is being assessed. Characteristic is used to order individuals.
Ordinal scale
equal distances between scale values have equal quantitative meaning. This scale does NOT have a true zero point. Characteristic is used to order individuals, and the distances between numbers are equal.
Interval scale
equal distances between scale values have equal quantitative meaning and the scale has a true zero point. Characteristic is used to order individuals, the distances between numbers are equal, and the 0 is meaningful.
Ratio scale
Range consists of an uncountably infinite number of values. Characteristics on which individuals can theoretically take on any value between the lowest and highest points on a scale.
Continuous variable
Range consists of only a finite number of values or an infinite number of values that can be counted. Characteristics on which individuals can take on a limited number of values.
Discrete variable
Information based on characteristics of the entities studied.
Data
a characteristic which is the same for all members studied.
Constant
a characteristic which takes on different values for different individuals studied.
Variable
the score value on which a distribution centers, often called the average. Mode, Mean and Median are all measures of central tendency.
Central tendency
the sum of the scores divided by the number of scores, commonly known as the average
Mean (μ, x̄)
the middle score when scores have been arranged in order of size. If the number of scores is even, the median is the midway point between the two center scores.
Median (Md)
the score or qualitative category that occurs with the greatest frequency
Mode (Mo)
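The three measures of central tendency can be checked with Python's standard `statistics` module. The scores are made up for illustration.

```python
import statistics

scores = [2, 4, 4, 5, 7, 8]  # made-up data with an even number of scores

mean = statistics.mean(scores)      # sum of the scores / number of scores
median = statistics.median(scores)  # even count: midway between the two center scores
mode = statistics.mode(scores)      # score that occurs with the greatest frequency

print(mean, median, mode)
```

Here the sum is 30 and there are 6 scores, so the mean is 5. The two center scores are 4 and 5, so the median is 4.5, and 4 occurs most often, so it is the mode.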
the spread or scatter of scores around the central point, expressed in terms of distance along a distribution's X axis.
the distance between the largest and smallest number.
Range
one half of the distance between the first quartile point (Q1, the 25th percentile) and the third quartile point (Q3, the 75th percentile). The semi-interquartile range is a measure of spread or dispersion. The formula is therefore: (Q3-Q1)/2. With a normal distribution, the interval from one Q below the median to one Q above it will contain half the scores.
Semi-interquartile range (Q)
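A short Python sketch of the range and of Q = (Q3-Q1)/2. The quartiles come from the standard `statistics` module with the "inclusive" method, which is an assumed convention; other quartile methods can give slightly different values.

```python
import statistics


def score_range(scores):
    """Distance between the largest and smallest score."""
    return max(scores) - min(scores)


def semi_interquartile_range(scores):
    """Q = (Q3 - Q1) / 2, using the assumed 'inclusive' quartile method."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return (q3 - q1) / 2


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up data
print(score_range(scores), semi_interquartile_range(scores))
```

For these scores Q1 is 3 and Q3 is 7, so Q is 2, while the range is 8.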
represents the sum of squared differences from the mean and is an extremely important term in statistics.
Sum of squares (SS)
the variance is used as a measure of how far a set of numbers is spread out. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean (expected value). In particular, the variance is one of the moments of a distribution. In that context, it forms part of a systematic approach to distinguishing between probability distributions.
Variance (σ², s²)
the square root of the variance. It shows how much variation or "dispersion" there is from the average (mean, or expected value).
Standard Deviation (σ, s)
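Sum of squares, variance and standard deviation build on one another, which the following Python sketch tries to make visible. It assumes the common convention of dividing SS by n for a population (σ², σ) and by n - 1 for a sample (s², s); the function names and the `sample` flag are illustrative.

```python
import math


def sum_of_squares(scores):
    """SS: sum of squared differences from the mean."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)


def variance(scores, sample=False):
    """SS / n for a population, SS / (n - 1) for a sample (assumed convention)."""
    n = len(scores)
    return sum_of_squares(scores) / (n - 1 if sample else n)


def standard_deviation(scores, sample=False):
    """Square root of the variance."""
    return math.sqrt(variance(scores, sample))


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with mean 5
print(sum_of_squares(scores), variance(scores), standard_deviation(scores))
```

For these scores SS is 32, the population variance is 32/8 = 4 and the population standard deviation is 2.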
a symmetrical curve representing the normal distribution. In statistics, the theoretical curve that shows how often an experiment will produce a particular result. The curve is symmetrical and bell shaped, showing that trials will usually give a result near the average, but will occasionally deviate by large amounts. The width of the “bell” indicates how much confidence one can have in the result of an experiment: the narrower the bell, the higher the confidence.
Normal curve
is a measure of the asymmetry of the probability distribution of a real-valued random variable. The skewness value can be positive or negative, or even undefined. Qualitatively, a negative skew indicates that the tail on the left side of the probability density function is longer than the right side and the bulk of the values (possibly including the median) lie to the right of the mean. A positive skew indicates that the tail on the right side is longer than the left side and the bulk of the values lie to the left of the mean. A zero value indicates that the values are relatively evenly distributed on both sides of the mean, typically but not necessarily implying a symmetric distribution.
Skewness
is a measure of the "peakedness" of the probability distribution of a real-valued random variable, although some sources insist that heavy tails, and not peakedness, are what kurtosis really measures. Higher kurtosis means more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations. A high kurtosis distribution has a sharper peak and longer, fatter tails, while a low kurtosis distribution has a more rounded peak and shorter, thinner tails.
Kurtosis
Distributions with zero excess kurtosis, such as the normal distribution
mesokurtic, or mesokurtotic
A distribution with positive excess kurtosis
leptokurtic, or leptokurtotic. "Lepto-" means "slender".
A distribution with negative excess kurtosis. It has a lower, wider peak around the mean
platykurtic, or platykurtotic. "Platy-" means "broad"
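Skewness and excess kurtosis can be sketched from central moments, as below. This assumes the simple population (moment-based) formulas, skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 - 3; statistical packages often apply sample corrections, so their numbers may differ. All names are illustrative.

```python
def central_moment(scores, k):
    """Average of (score - mean) raised to the power k."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** k for x in scores) / len(scores)


def skewness(scores):
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5


def excess_kurtosis(scores):
    # Subtracting 3 makes the normal distribution come out at zero
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3


def kurtosis_label(excess, tol=1e-9):
    if abs(excess) <= tol:
        return "mesokurtic"
    return "leptokurtic" if excess > 0 else "platykurtic"


flat = [1, 2, 3, 4, 5]         # made-up symmetric, flat data
right_tail = [1, 1, 1, 2, 10]  # made-up data with a long right tail
print(skewness(flat), kurtosis_label(excess_kurtosis(flat)))
print(skewness(right_tail) > 0)  # positive skew: the right tail is longer
```

The flat data set has zero skewness and negative excess kurtosis, so it is labeled platykurtic.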
scores that have been standardized to have a mean of zero and a SD of 1. Each one indicates how many standard deviations a raw score is above or below the mean.
z scores
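Standardizing raw scores, so that each one is expressed as a number of standard deviations above or below the mean, can be sketched as follows. The sketch assumes the population SD (dividing by n); the function name and data are made up.

```python
import math


def z_scores(scores):
    """Convert raw scores to standardized scores: (score - mean) / SD."""
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)  # population SD
    return [(x - mean) / sd for x in scores]


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data: mean 5, SD 2
print(z_scores(scores))
```

A raw score of 9 is two standard deviations above the mean of 5, so its standardized score is 2.0, and a raw score equal to the mean becomes 0.0.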
an observation that is numerically distant from the rest of the data
Outlier
restricting the scores used to only part of their full range, for example by not using outliers
Restriction of range
non linear
is used to describe the degree of agreement between paired data that are in the form of ranks
Spearman Rank Correlation (r_s)
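A minimal Python sketch of a rank correlation, assuming no tied values so that the shortcut formula r_s = 1 - 6Σd² / (n(n² - 1)) applies, where d is the difference between the two ranks of each pair. The formula is the standard textbook one rather than something stated on the card, and the function names are made up.

```python
def ranks(values):
    """Rank from 1 (smallest) to n. Assumes there are no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rs(xs, ys):
    """Rank correlation via the no-ties shortcut formula."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


xs = [10, 20, 30, 40, 50]  # made-up paired data
ys = [1, 3, 2, 5, 4]
print(spearman_rs(xs, ys))
```

Identical rankings give 1 and exactly reversed rankings give -1. With tied values the shortcut no longer holds, and the usual approach is to compute a Pearson correlation on average ranks instead.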
the coefficient that measures the linear relationship between two variables, X and Y. Denoted by r_xy for a sample or ρ for the population.
Pearson Product-Moment Correlation
is a measure of how much two variables change together
Covariance (σ_xy, s_xy)
the product of the two deviations in calculating Pearson’s r: (deviation of each X from the mean of X) × (deviation of each Y from the mean of Y)
Cross-product
the representation of the joint frequency of two variables
Bivariate plot
the number that summarizes the nature of the relationship between two variables
Correlation Coefficient
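The way deviation products, covariance and Pearson's r fit together can be sketched in Python. The sketch assumes the common definitions: covariance is the sum of the deviation products divided by n - 1 for a sample (or n for a population), and r is that sum divided by the square root of SS for X times SS for Y. Function names are illustrative.

```python
import math


def deviation_products(xs, ys):
    """(X - mean of X) * (Y - mean of Y) for each pair."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]


def covariance(xs, ys, sample=True):
    """Sum of deviation products over n - 1 (sample) or n (population)."""
    n = len(xs)
    return sum(deviation_products(xs, ys)) / (n - 1 if sample else n)


def pearson_r(xs, ys):
    """Sum of deviation products scaled by the spread of X and of Y."""
    sp = sum(deviation_products(xs, ys))
    ss_x = sum(deviation_products(xs, xs))  # sum of squares for X
    ss_y = sum(deviation_products(ys, ys))  # sum of squares for Y
    return sp / math.sqrt(ss_x * ss_y)


xs = [1, 2, 3, 4, 5]  # made-up paired data
ys = [2, 4, 5, 4, 5]
print(covariance(xs, ys), round(pearson_r(xs, ys), 3))
```

Covariance depends on the units of X and Y, while r is rescaled to lie between -1 and +1, which is why r is the number usually reported.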
the relationship of two variables
the requirement that the sum of squared residuals be as small as possible
least squares criterion
this line gives us the smallest sum of squared residuals
line of best fit
indicates the change in Y for a one-unit change in X
Slope
the difference between observed and predicted values. The error (e) or unexplainable part of Y.
Residual
linear relationship between an independent variable and a dependent variable
Regression equation
the point where the graph of a function (line) intersects the y-axis
Y-intercept
the value predicted based on the regression equation
is regression SS plus residual SS
Total SS
sum of squares of the differences between the values of y predicted by the regression equation and the actual values of y.
Residual SS
Total SS - Residual SS
Regression SS
the proportion of the variability in the outcomes that is accounted for by the predictor variable
Coefficient of Determination (r squared)
the proportion of the variability in the outcomes that is not explained by the predictor variable
Coefficient of Alienation (1-r squared)
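The regression cards can be tied together in one Python sketch: fit the least-squares line, then split total SS into regression SS and residual SS. It assumes simple regression with one predictor and the standard formulas slope = (sum of deviation products) / (SS for X) and intercept = mean of Y - slope × mean of X. Function names and data are made up.

```python
def fit_line(xs, ys):
    """Least-squares slope and y-intercept for predicting Y from X."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = sp / ss_x                    # change in Y per one-unit change in X
    intercept = mean_y - slope * mean_x  # predicted Y when X is 0
    return slope, intercept


def ss_partition(xs, ys):
    """Return (total SS, regression SS, residual SS)."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    predicted = [intercept + slope * x for x in xs]
    total_ss = sum((y - mean_y) ** 2 for y in ys)
    residual_ss = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    regression_ss = total_ss - residual_ss
    return total_ss, regression_ss, residual_ss


xs = [1, 2, 3, 4, 5]  # made-up paired data
ys = [2, 4, 5, 4, 5]
total_ss, regression_ss, residual_ss = ss_partition(xs, ys)
determination = regression_ss / total_ss  # proportion of variability explained
alienation = 1 - determination            # proportion left unexplained
print(fit_line(xs, ys), round(determination, 3), round(alienation, 3))
```

For these data the line is roughly Y' = 2.2 + 0.6X, with total SS of 6 split into about 3.6 explained and 2.4 unexplained, so about 60 percent of the variability in Y is accounted for by X.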