67 Cards in this Set
 Front
 Back
is a method of summarizing data. Frequency is how often something occurs. For a distribution, you define two or more equivalence classes and count the number of observations in each class. A table showing the equivalence classes and the frequency with which their score values occur is called a frequency distribution.
Frequency distribution


the frequency of an individual score value

ungrouped frequency distribution


each class interval spans 2 or more score values. It has a nominal upper and lower limit

grouped frequency distribution


extend 0.5 below the lower nominal limit and 0.5 above the upper nominal limit, showing that there are no gaps in a distribution

real limits


real upper limit minus the real lower limit, for both grouped and ungrouped frequency distributions

class interval size.
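
As a minimal sketch of real limits and class interval size, assuming scores are recorded in whole units (so the real limits sit 0.5 outside the nominal limits). The function names and the `unit` parameter are hypothetical, chosen only for illustration.

```python
def real_limits(nominal_lower, nominal_upper, unit=1.0):
    """Return (real lower limit, real upper limit) of a class interval.

    Assumption: `unit` is the measurement unit of the scores, so the real
    limits lie half a unit outside the nominal limits.
    """
    half = unit / 2
    return nominal_lower - half, nominal_upper + half


def interval_size(nominal_lower, nominal_upper, unit=1.0):
    """Class interval size: real upper limit minus real lower limit."""
    lower, upper = real_limits(nominal_lower, nominal_upper, unit)
    return upper - lower


# Grouped distribution: the class with nominal limits 10 and 14.
print(real_limits(10, 14))    # (9.5, 14.5)
print(interval_size(10, 14))  # 5.0

# Ungrouped distribution: a single score value of 7.
print(interval_size(7, 7))    # 1.0
```

Note that the size of the class 10 to 14 is 5, not 4, because the interval covers five score values.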


are distributions that show the proportion or percentage of the total number of scores in each class. Proportionate frequency (Prop f): Prop f = f/n. Percentage frequency (%f): %f = (f/n) x 100.
Relative frequency distributions


show the number, proportion or percentage of scores that occur below the upper limit of each class interval.
Cumulative frequency distributions
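
The frequency, relative frequency, and cumulative frequency cards can be illustrated together. This is a small sketch on made-up scores; the variable and column names (`prop_f`, `pct_f`, `cum_f`) are assumptions for illustration only.

```python
from collections import Counter

# Made-up scores for illustration only.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6, 6]
n = len(scores)
freq = Counter(scores)  # ungrouped frequency of each score value

rows = []
cum_f = 0
for value in sorted(freq):
    f = freq[value]
    cum_f += f  # running total of scores at or below this value
    rows.append({
        "score": value,
        "f": f,
        "prop_f": f / n,            # Prop f = f/n
        "pct_f": f / n * 100,       # %f = (f/n) x 100
        "cum_f": cum_f,
        "cum_pct": cum_f / n * 100,
    })

for row in rows:
    print(row)
```

The last row's cumulative frequency always equals n, which is a quick check that no score was dropped.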


is similar to a bar graph but is used for quantitative variables. It is constructed by erecting vertical bars over the real limits of each class interval with the height of each bar corresponding to the number of scores in the interval. The bars of adjacent class intervals should touch to emphasize the continuous quantitative character of the class intervals.

Histogram


a way to show the information in a frequency table. It looks a little bit like a line graph. You just need to plot a few points and then join the points by straight lines. So what points do you need to plot? Well, first you have to find the midpoints of each class. The midpoint of a class is the point in the middle of the class.

Frequency polygon


a convenient way of graphically depicting groups of numerical data through their five-number summaries: the smallest observation (sample minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and largest observation (sample maximum). May also indicate which observations, if any, might be considered outliers.

Box-and-whisker plot
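
A rough sketch of the five-number summary behind a box-and-whisker plot, on made-up data. Two assumptions are not from the card: quartiles are computed with the "inclusive" method of Python's `statistics.quantiles` (textbooks differ on quartile rules), and possible outliers are flagged with the common 1.5 x IQR convention.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(data)
    # Assumption: "inclusive" quartile method; other rules give other values.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


# Made-up data with one unusually large observation.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]
minimum, q1, median, q3, maximum = five_number_summary(data)

# Assumption: the 1.5 x IQR fences are one common convention, not the only one.
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < low_fence or x > high_fence]

print(minimum, q1, median, q3, maximum)  # 1 3.0 5.0 7.0 30
print(outliers)                          # [30]
```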


another way to analyze the frequency distribution table. Unlike a frequency distribution which tells you how many data points are within each class, a cumulative frequency tells you how many are less than or within each of the class limits.

Cumulative frequency distribution (ogive)


resembles a histogram that has been turned on its side. It is a way of summarizing a set of data measured on an interval scale. It is often used in exploratory data analysis to illustrate the major features of the distribution of the data in a convenient and easily drawn form. It is similar to a histogram but is usually a more informative display for relatively small data sets (<100 data points). It provides a table as well as a picture of the data and from it we can readily write down the data in order of magnitude, which is useful for many statistical procedures.

Stem-and-leaf plot


is the value of a variable below which a certain percent of observations fall.

Percentile


part of statistics that organizes and summarizes data so that it can be more readily comprehended.
Descriptive statistics


the part of statistics on which the concept of statistical significance is based. You are trying to reach conclusions that extend beyond the immediate data alone, for example inferring from the sample data what the population might think.

Inferential statistics


is some specific characteristic of a subject that can assume two or more different values.

Variable


a particular subject’s relative standing on a quantitative variable, or a subject’s classification within a classification variable. In some cases this is a score. Values are the different states in which a variable occurs.
Value


the individual subjects/objects that serve as the source of the data. Usually this is a person.
Observational unit


is the process of assigning numbers or labels to characteristics of people, objects, or events according to a set of rules.
Measurement


is a classification system that places people, objects or other entities into mutually exclusive categories. The categories are not meaningfully ordered.
Nominal


represents the rank order of the subjects with respect to the variable that is being assessed. The characteristic is used to order individuals.
Ordinal


equal distances between scale values have equal quantitative meaning. This scale does NOT have a true zero point. The characteristic is used to order individuals, and the distances between numbers are equal.
Interval


equal distances between scale values have equal quantitative meaning, and the scale has a true zero point. The characteristic is used to order individuals, the distances between numbers are equal, and the 0 is meaningful.
Ratio


Range consists of an uncountably infinite number of values. Characteristics on which individuals can theoretically take on any value between the lowest and highest points on a scale.
Continuous


Range consists of only a finite number of values or an infinite number of values that can be counted. Characteristics on which individuals can take on a limited number of values.
Discrete


Information based on characteristics of the entities studied.

Data


a characteristic which is the same for all members studied.

Constant


a characteristic which takes on different values for different individuals studied.

Variable


the score value on which a distribution centers, often called the average. Mode, Mean and Median are all measures of central tendency.

Central tendency


the sum of the scores divided by the number of scores, commonly known as the average

Mean (μ, x̄)


the middle score when scores have been arranged in order of size. If the number of scores is even, the median is the midway point between the two center scores.

Median (Md)


the score or qualitative category that occurs with the greatest frequency

Mode (Mo)
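
The three measures of central tendency can be compared on one small set of made-up scores. This sketch uses Python's standard `statistics` module; the data are illustrative only.

```python
import statistics

# Made-up scores; an even count, so the median falls between two scores.
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)      # sum of scores / number of scores = 30 / 6
median = statistics.median(scores)  # midway between the two center scores, 3 and 5
mode = statistics.mode(scores)      # the score that occurs most often

print(mean)    # 5
print(median)  # 4.0
print(mode)    # 3
```

Here the single large score (10) pulls the mean above the median, which in turn sits above the mode.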


the spread or scatter of scores around the central point, expressed in terms of distance along a distribution's X axis.

Variability/Dispersion


the distance between the largest and smallest number.

Range


one half of the distance between the first quartile point and the third quartile point. The semi-interquartile range is a measure of spread or dispersion. It is computed as one half the difference between the 75th percentile (often called Q3) and the 25th percentile (Q1). The formula for the semi-interquartile range is therefore (Q3 - Q1)/2. With a normal distribution, the interval extending one Q on either side of the median contains half the scores.

Semi-interquartile range (Q)
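
A short sketch of the range and the semi-interquartile range Q = (Q3 - Q1)/2 on made-up scores. The quartile rule is an assumption: the "inclusive" method of `statistics.quantiles` is used here, and other textbook rules can give slightly different quartiles.

```python
import statistics


def semi_interquartile_range(data):
    """Return Q = (Q3 - Q1) / 2."""
    # Assumption: "inclusive" quartile method; quartile conventions vary.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q3 - q1) / 2


# Made-up scores for illustration only.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]

score_range = max(scores) - min(scores)  # largest minus smallest
q = semi_interquartile_range(scores)

print(score_range)  # 8
print(q)            # 2.0
```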


represents the sum of squared differences from the mean and is an extremely important term in statistics.

Sum of squares (SS)


the variance is used as a measure of how far a set of numbers is spread out. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean (expected value). In particular, the variance is one of the moments of a distribution. In that context, it forms part of a systematic approach to distinguishing between probability distributions.

Variance (σ², s²)


It shows how much variation or "dispersion" there is from the average (mean, or expected value).

Standard Deviation (σ, s)
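
Sum of squares, variance, and standard deviation build on one another, as this sketch on made-up scores shows. It assumes the usual convention that the population variance divides SS by n and the sample variance divides SS by n - 1; the variable names are illustrative.

```python
import math

# Made-up scores for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
mean = sum(scores) / n

# Sum of squares (SS): squared deviations from the mean, added up.
ss = sum((x - mean) ** 2 for x in scores)

pop_variance = ss / n             # sigma squared: treats the scores as a population
sample_variance = ss / (n - 1)    # s squared: treats the scores as a sample
pop_sd = math.sqrt(pop_variance)  # sigma
sample_sd = math.sqrt(sample_variance)  # s

print(ss)            # 32.0
print(pop_variance)  # 4.0
print(pop_sd)        # 2.0
print(round(sample_variance, 3), round(sample_sd, 3))
```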


 a symmetrical curve representing the normal distribution. In statistics, the theoretical curve that shows how often an experiment will produce a particular result. The curve is symmetrical and bell shaped, showing that trials will usually give a result near the average, but will occasionally deviate by large amounts. The width of the “bell” indicates how much confidence one can have in the result of an experiment — the narrower the bell, the higher the confidence.

Normal curve


is a measure of the asymmetry of the probability distribution of a real-valued random variable. The skewness value can be positive or negative, or even undefined. Qualitatively, a negative skew indicates that the tail on the left side of the probability density function is longer than the right side and the bulk of the values (possibly including the median) lie to the right of the mean. A positive skew indicates that the tail on the right side is longer than the left side and the bulk of the values lie to the left of the mean. A zero value indicates that the values are relatively evenly distributed on both sides of the mean, typically but not necessarily implying a symmetric distribution.

Skewness


is a measure of the "peakedness" of the probability distribution of a real-valued random variable, although some sources insist that heavy tails, and not peakedness, are what kurtosis really measures. Higher kurtosis means more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations. A high kurtosis distribution has a sharper peak and longer, fatter tails, while a low kurtosis distribution has a more rounded peak and shorter, thinner tails.

Kurtosis


Distributions with zero excess kurtosis, such as the normal distribution

mesokurtic, or mesokurtotic


A distribution with positive excess kurtosis

leptokurtic, or leptokurtotic. "Lepto" means "slender".


A distribution with negative excess kurtosis; it has a lower, wider peak around the mean

platykurtic, or platykurtotic. "Platy" means "broad"
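
The skewness and kurtosis cards can be made concrete with a moment-based sketch. The formulas are an assumption not stated on the cards: skewness is taken as m3 / m2^1.5 and excess kurtosis as m4 / m2^2 - 3, where mk is the k-th central moment of the data treated as a population. Other estimators exist, and the function names and tolerance are hypothetical.

```python
def central_moment(data, k):
    """k-th central moment, treating the data as a whole population."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Moment-based skewness: m3 / m2**1.5."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (zero for a normal curve)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


def kurtosis_label(data, tol=1e-9):
    """Classify by the sign of the excess kurtosis (tol is an arbitrary cutoff)."""
    g2 = excess_kurtosis(data)
    if abs(g2) <= tol:
        return "mesokurtic"
    return "leptokurtic" if g2 > 0 else "platykurtic"


symmetric = [1, 2, 3, 4, 5]      # made-up, evenly spread scores
right_tail = [1, 1, 1, 2, 10]    # made-up scores with a long right tail

print(skewness(symmetric))                   # 0.0
print(round(excess_kurtosis(symmetric), 3))  # -1.3
print(kurtosis_label(symmetric))             # platykurtic
print(skewness(right_tail) > 0)              # True
```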


scores that have been standardized to have a mean of zero and an SD of 1. A z-score indicates how many standard deviations a raw score is above or below the mean.

Z-score
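
As a quick sketch, each raw score can be converted to a z-score with z = (X - mean) / SD. The example assumes the scores are treated as a population (SD computed with n in the denominator); the data are made up.

```python
import math

# Made-up scores for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)
mean = sum(scores) / n
# Assumption: population SD (divide by n), so the z-scores have an SD of exactly 1.
sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)

z_scores = [(x - mean) / sd for x in scores]
print(z_scores)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]

# Check: standardized scores have a mean of 0 and an SD of 1.
z_mean = sum(z_scores) / n
z_sd = math.sqrt(sum((z - z_mean) ** 2 for z in z_scores) / n)
print(z_mean, z_sd)  # 0.0 1.0
```

A raw score of 9 is two standard deviations above the mean, so its z-score is 2.0.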


an observation that is numerically distant from the rest of the data

Outlier


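limiting the range of scores on a variable (for example, by restricting the sample or not using outliers), which tends to lower the observed correlation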

Restriction of range


a nonlinear relationship between two variables

Curvilinearity


is used to describe the degree of agreement between paired data that are in the form of ranks

Spearman Rank Correlation (r_s)
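
A minimal sketch of the Spearman rank correlation, assuming there are no tied scores so that the shortcut formula r_s = 1 - 6 * sum(d^2) / (n(n^2 - 1)) applies, where d is the difference between paired ranks. The formula is not given on the card, and the helper names are hypothetical.

```python
def ranks(values):
    """Rank scores from 1 (smallest) to n (largest). Ties are not handled."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle tied scores")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rs(x, y):
    """Spearman rank correlation via the no-ties shortcut formula."""
    rank_x, rank_y = ranks(x), ranks(y)
    n = len(x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Made-up paired scores for illustration only.
x = [10, 20, 30, 40, 50]
y = [7, 3, 12, 9, 15]

print(ranks(y))                     # [2, 1, 4, 3, 5]
print(round(spearman_rs(x, y), 3))  # 0.8
```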


the coefficient is the measure of the linear relationship between two variables, X and Y. It is denoted by r_xy for a sample or ρ for a population.

Pearson Product-Moment Correlation


is a measure of how much two variables change together

Covariance (σ_xy, s_xy)


the product of the two deviations used in calculating Pearson’s r: (deviation of each X) x (deviation of each Y)

Cross-product
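
Cross-products, covariance, and Pearson's r are closely linked, as this sketch on made-up paired scores shows. It assumes the usual conventions: the sample covariance divides the sum of cross-products by n - 1, and r divides that sum by the square root of SSx times SSy. Variable names are illustrative.

```python
import math

# Made-up paired scores for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Cross-product for each pair: (deviation of X) times (deviation of Y).
cross_products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
sum_cp = sum(cross_products)

ss_x = sum((xi - mean_x) ** 2 for xi in x)
ss_y = sum((yi - mean_y) ** 2 for yi in y)

sample_cov = sum_cp / (n - 1)         # s_xy
r = sum_cp / math.sqrt(ss_x * ss_y)   # Pearson product-moment correlation

print(cross_products)  # [4.0, 0.0, 0.0, 0.0, 2.0]
print(sample_cov)      # 1.5
print(round(r, 3))     # 0.775
```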


the representation of the joint frequency of two variables

Bivariate plot


the number that summarizes the nature of the relationship between two variables

Correlation Coefficient


 the relationship of two variables

Correlation


the requirement that the sum of squared residuals be as small as possible

least squares criterion


this line gives us the smallest sum of squared residuals

line of best fit


indicates the change in Y for a one-unit change in X

slope


the difference between observed and predicted values; the error (e), or unexplained part of Y.

residual


the equation describing the linear relationship between an independent variable and a dependent variable

Regression equation


a point where the graph of a function (line) intersects with the y-axis

y-intercept


the value predicted based on the regression equation

Y'


is regression SS plus residual SS

SStotal


sum of squares of the differences between the values of y predicted by the regression equation and the actual values of y.

Residual SS


Total SS - Residual SS

Regression SS


the proportion of the variability in the outcomes that is accounted for by the predictor variable

Coefficient of Determination (r squared)


the proportion of the variability in the outcomes that is not explained by the predictor variable

Coefficient of Alienation (1 - r squared)
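
The regression cards fit together in one worked sketch: slope, y-intercept, predicted values (Y'), residuals, the three sums of squares, and the two coefficients. It assumes simple least-squares regression with one predictor, where slope = (sum of cross-products) / SSx and intercept = mean of Y minus slope times mean of X. The data and variable names are made up.

```python
# Made-up paired scores for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sum_cp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
ss_x = sum((xi - mean_x) ** 2 for xi in x)

# Least-squares line of best fit: Y' = intercept + slope * X
slope = sum_cp / ss_x
intercept = mean_y - slope * mean_x

predicted = [intercept + slope * xi for xi in x]      # Y'
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # observed minus predicted

ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_residual = sum(e ** 2 for e in residuals)
ss_regression = ss_total - ss_residual  # Total SS - Residual SS

r_squared = ss_regression / ss_total  # coefficient of determination
alienation = 1 - r_squared            # coefficient of alienation, as defined on the card

print(round(slope, 3), round(intercept, 3))             # 0.6 2.2
print(round(ss_residual, 3), round(ss_regression, 3))   # 2.4 3.6
print(round(r_squared, 3), round(alienation, 3))        # 0.6 0.4
```

With these scores, 60% of the variability in Y is accounted for by X and 40% is left unexplained.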
