Statistics Concepts

by avaughan13, Sep. 2011

Subjects: concept statistical statistics

67 Cards in this Set

Front
Back

	is a method of summarizing data. Frequency is how often something occurs. For a distribution, you define two or more equivalence classes and count the number of observations in each class. A table showing the equivalence classes and the frequency with which their score values occur is called a frequency distribution.	Frequency distribution
	the frequency of an individual score value	ungrouped frequency distribution
	each class interval spans 2 or more score values. It has a nominal upper and lower limit	grouped frequency distribution
	extend 0.5 below the nominal lower limit and 0.5 above the nominal upper limit, showing that there are no gaps in a distribution	real limits
	real upper limit minus the real lower limit, for both grouped and ungrouped frequency distributions	class interval size
	are distributions that show the proportion or percentage of the total number of scores. Proportionate frequency: Prop f = f/n. Percentage frequency: %f = (f/n) × 100.	Relative frequency distributions
	show the number, proportion or percentage of scores that occur below the upper limit of each class interval.	Cumulative frequency distributions
	is similar to a bar graph but is used for quantitative variables. It is constructed by erecting vertical bars over the real limits of each class interval with the height of each bar corresponding to the number of scores in the interval. The bars of adjacent class intervals should touch to emphasize the continuous quantitative character of the class intervals.	Histogram
	a way to show the information in a frequency table. It looks a little bit like a line graph. You just need to plot a few points and then join the points by straight lines. So what points do you need to plot? Well, first you have to find the midpoints of each class. The midpoint of a class is the point in the middle of the class.	Frequency polygon
	a convenient way of graphically depicting groups of numerical data through their five-number summaries: the smallest observation (sample minimum), lower quartile (Q1), median (Q2), upper quartile (Q3), and largest observation (sample maximum). May also indicate which observations, if any, might be considered outliers.	Box-and-whisker plot
	another way to analyze the frequency distribution table. Unlike a frequency distribution which tells you how many data points are within each class, a cumulative frequency tells you how many are less than or within each of the class limits.	Cumulative frequency distribution (ogive)
	resembles a histogram that has been turned on its side. It is a way of summarizing a set of data measured on an interval scale. It is often used in exploratory data analysis to illustrate the major features of the distribution of the data in a convenient and easily drawn form. It is similar to a histogram but is usually a more informative display for relatively small data sets (<100 data points). It provides a table as well as a picture of the data and from it we can readily write down the data in order of magnitude, which is useful for many statistical procedures.	Stem-and-leaf plot
	is the value of a variable below which a certain percent of observations fall.	Percentile
	part of statistics that organizes and summarizes data so that it can be more readily comprehended.	Descriptive statistics
	the part of statistics on which the concept of statistical significance is based. You are trying to reach conclusions that extend beyond the immediate data alone, inferring from the sample data what the population might think.	Inferential statistics
	is some specific characteristic of a subject that can assume one or more different values.	Variable
	a particular subject’s relative standing on a quantitative variable, or a subject’s classification within a classification variable. In some cases this is a score. Different states in which a variable occurs.	Value
	the individual subjects/objects that serve as the source of the data. Usually this can be a person.	Observational unit
	is the process of assigning numbers or labels to characteristics of people, objects, or events according to a set of rules.	Measurement
	is a classification system that places people, objects or other entities into mutually exclusive categories. Not meaningfully ordered.	Nominal
	represent the rank order of the subjects with respect to the variable that is being assessed. Characteristic is used to order individuals.	Ordinal
	equal distances between scale values have equal quantitative meaning. This scale does NOT have a true zero point. Characteristic that is used to order individuals, and the distances between numbers are equal.	Interval
	equal distances between scale values have equal quantitative meaning, and the scale has a true zero point. Characteristic that is used to order individuals, the distances between numbers are equal, and the 0 is meaningful.	Ratio
	Range consists of an uncountably infinite number of values. Characteristics on which individuals can theoretically take on any value between the lowest and highest points on a scale.	Continuous
	Range consists of only a finite number of values or an infinite number of values that can be counted. Characteristics on which individuals can take on a limited number of values.	Discrete
	Information based on characteristics of the entities studied.	Data
	a characteristic which is the same for all members studied.	Constant
	a characteristic which takes on a different value for different individuals studied.	Variable
	the score value on which a distribution centers, often called the average. Mode, Mean and Median are all measures of central tendency.	Central tendency
	the sum of the scores divided by the number of scores, commonly known as the average	Mean (μ, x̄)
	the middle score when scores have been arranged in order of size. If the number of scores is even, the median is the midway point between the two center scores.	Median (Md)
	the score or qualitative category that occurs with the greatest frequency	Mode (Mo)
	the spread or scatter of scores around the central point, expressed in terms of distance along a distribution's X axis.	Variability/Dispersion
	the distance between the largest and smallest number.	Range
	one half of the distance between the first quartile point and the third quartile point. The semi-interquartile range is a measure of spread or dispersion. It is computed as one half the difference between the 75th percentile (often called Q3) and the 25th percentile (Q1). The formula is therefore: Q = (Q3 - Q1)/2. With a normal distribution, the interval from one Q below the median to one Q above it contains half the scores.	Semi-interquartile range (Q)
	represents the sum of squared differences from the mean and is an extremely important term in statistics.	Sum of squares (SS)
	the variance is used as a measure of how far a set of numbers are spread out from each other. It is one of several descriptors of a probability distribution, describing how far the numbers lie from the mean (expected value). In particular, the variance is one of the moments of a distribution. In that context, it forms part of a systematic approach to distinguishing between probability distributions.	Variance (σ², s²)
	It shows how much variation or "dispersion" there is from the average (mean, or expected value).	Standard Deviation (σ , s)
	a symmetrical curve representing the normal distribution. In statistics, the theoretical curve that shows how often an experiment will produce a particular result. The curve is symmetrical and bell shaped, showing that trials will usually give a result near the average, but will occasionally deviate by large amounts. The width of the “bell” indicates how much confidence one can have in the result of an experiment: the narrower the bell, the higher the confidence.	Normal curve
	is a measure of the asymmetry of the probability distribution of a real-valued random variable. The skewness value can be positive or negative, or even undefined. Qualitatively, a negative skew indicates that the tail on the left side of the probability density function is longer than the right side and the bulk of the values (possibly including the median) lie to the right of the mean. A positive skew indicates that the tail on the right side is longer than the left side and the bulk of the values lie to the left of the mean. A zero value indicates that the values are relatively evenly distributed on both sides of the mean, typically but not necessarily implying a symmetric distribution.	Skewness
	is a measure of the "peakedness" of the probability distribution of a real-valued random variable, although some sources are insistent that heavy tails, and not peakedness, is what is really being measured by kurtosis. Higher kurtosis means more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations. A high kurtosis distribution has a sharper peak and longer, fatter tails, while a low kurtosis distribution has a more rounded peak and shorter thinner tails.	Kurtosis
	A distribution with zero excess kurtosis, such as the normal distribution	mesokurtic, or mesokurtotic
	A distribution with positive excess kurtosis	leptokurtic, or leptokurtotic. "Lepto-" means "slender".
	A distribution with negative excess kurtosis. It has a lower, wider peak around the mean.	platykurtic, or platykurtotic. "Platy-" means "broad"
	scores that have been standardized to have a mean of zero and a SD of 1. It indicates how many standard deviations a raw score is above or below the mean.	Z-score
	an observation that is numerically distant from the rest of the data	Outlier
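	limiting the scores used to a narrow portion of a variable's possible range (for example, by not using outliers), which tends to lower the observed correlation	Restriction of range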
	a nonlinear relationship between two variables	Curvilinearity
	is used to describe the degree of agreement between paired data that are in the form of ranks	Spearman Rank Correlation (r sub s)
	the coefficient that measures the linear relationship between two variables, X and Y. Denoted by rxy for a sample or ρ for a population.	Pearson Product-Moment Correlation
	is a measure of how much two variables change together	Covariance (σxy, sxy)
	the product of the two deviations in calculating Pearson’s r: (deviation of each X) × (deviation of each Y)	Cross-product
	the representation of the joint frequency of two variables	Bivariate plot
	the number that summarizes the nature of the relationship between two variables	Correlation Coefficient
	the relationship of two variables	Correlation
	the requirement that the sum of squared residuals be as small as possible	least squares criterion
	this line gives us the smallest sum of squared residuals	line of best fit
	indicates the change in y for a one unit change in X	slope
	the difference between observed and predicted values. the error (e) or unexplainable part of Y.	residual
	linear relationship between an independent variable and a dependent variable	Regression equation
	a point where the graph of a function (line) intersects with the y-axis	y-intercept
	the value predicted based on the regression equation	Y'
	is regression SS plus residual SS	SStotal
	sum of squares of the differences between the values of y predicted by the regression equation and the actual values of y.	Residual SS
	Total SS - Residual SS	Regression SS
	the proportion of the variability in the outcomes that is accounted for by the predictor variable	Coefficient of Determination R squared
	the proportion of the variability in the outcomes that is not explained by the predictor variable	Coefficient of Alienation (1-r squared)
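
The descriptive cards above (relative frequency, mean, median, mode, range, sum of squares, variance, standard deviation, z-score) can be tied together with a small worked example. This is only a sketch: the scores are hypothetical, invented for illustration, and the variance is computed the population way (dividing by n) as an assumption; a sample variance would divide by n - 1 instead.

```python
from collections import Counter
from statistics import mean, median, multimode

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 5, 7, 8]
n = len(scores)

# Ungrouped frequency distribution, with relative frequencies.
freq = Counter(scores)                               # f for each score value
prop_f = {x: f / n for x, f in freq.items()}         # Prop f = f/n
pct_f = {x: 100 * f / n for x, f in freq.items()}    # %f = (f/n) x 100

# Measures of central tendency.
m = mean(scores)
md = median(scores)       # even n: midway between the two center scores
mo = multimode(scores)    # list of the most frequent score value(s)

# Measures of variability.
score_range = max(scores) - min(scores)   # largest minus smallest
ss = sum((x - m) ** 2 for x in scores)    # sum of squares (SS)
variance = ss / n                         # population variance (assumption)
sd = variance ** 0.5                      # standard deviation

# z-scores: how many SDs each raw score lies above or below the mean.
z = [(x - m) / sd for x in scores]

print(m, md, mo, score_range, ss, variance, sd)
print(z)
```

With these scores the mean is 5, the median is 4.5, the mode is 4, SS is 24, the variance is 4 and the standard deviation is 2, so a raw score of 8 has a z-score of 1.5.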

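The correlation and regression cards (cross-product, covariance, Pearson r, slope, y-intercept, Y', residual, SStotal, Residual SS, Regression SS, coefficient of determination, coefficient of alienation) can likewise be checked on a few numbers. The paired data below are hypothetical, and the covariance uses the population form (dividing by n) as an assumption.

```python
# Hypothetical paired data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Sums of squares and the sum of cross-products of deviations.
ss_x = sum((x - mean_x) ** 2 for x in xs)
ss_total = sum((y - mean_y) ** 2 for y in ys)   # SStotal for Y
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

cov_xy = sp_xy / n                      # population covariance (assumption)
r = sp_xy / (ss_x * ss_total) ** 0.5    # Pearson product-moment correlation

# Line of best fit under the least squares criterion: Y' = intercept + slope * X
slope = sp_xy / ss_x                    # change in Y for a one unit change in X
intercept = mean_y - slope * mean_x     # where the line crosses the y-axis
predicted = [intercept + slope * x for x in xs]   # Y'

# Residuals and the decomposition SStotal = Regression SS + Residual SS.
residuals = [y - yp for y, yp in zip(ys, predicted)]
ss_residual = sum(e ** 2 for e in residuals)
ss_regression = ss_total - ss_residual

r_squared = ss_regression / ss_total    # coefficient of determination
alienation = 1 - r_squared              # coefficient of alienation

print(round(r, 4), round(slope, 4), round(intercept, 4))
print(round(ss_regression, 4), round(ss_residual, 4), round(r_squared, 4))
```

For these data the fitted line is Y' = 2.2 + 0.6X, SStotal is 6, Residual SS is 2.4 and Regression SS is 3.6, so r squared is 0.6 and the coefficient of alienation is 0.4. Squaring the Pearson r computed directly gives the same 0.6, up to floating-point rounding.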