175 Cards in this Set
- Front
- Back
Approach
|
Suggested way to look at and organize a problem so it can be solved.
**There is usually more than one way... |
|
Bias
|
Occurs when the results of the sample are not representative of the population
|
|
3 Sources of Bias
|
1. Sampling Bias
2. Non-response Bias
3. Response Bias |
|
Blinding
|
Refers to non-disclosure of a treatment an experimental unit is receiving.
|
|
2 Types of Blinding
|
1. Single Blinding
2. Double Blinding |
|
Single Blind
|
Experiment in which the experimental unit (or subject) does not know which treatment (s)he is receiving.
|
|
Double Blind
|
Study/Experiment in which neither the experimental unit nor the researcher administering the treatment knows which treatment the subject is receiving.
|
|
Case Control Studies
|
Retrospective studies that require individuals to look back in time, or researchers to look at existing records.
|
|
Closed Question
|
Question for which the respondent must choose from a list of predetermined responses - i.e., multiple choice
|
|
Cluster Sample
|
Sample obtained by selecting all individuals within a randomly selected collection or group of individuals.
(i.e., All Online students = population; each online class = cluster; obtain a simple random sample of clusters; survey all students within the selected clusters.) |
|
Cohort Studies
|
Identifies a group of individuals to participate (the cohort), who are then observed over a length of time (sometimes very long). Characteristics are recorded; some individuals will be exposed to certain factors (not intentionally) and others will not. At the end of the study, the value of the response variable is recorded for each individual. (i.e., Framingham Heart Study)
|
|
Completely Randomized Design
|
Simplest type of experiment. Design in which each experimental unit is randomly assigned to a treatment.
(i.e., field fertilizer example) |
|
Continuous Variable
|
A quantitative variable that has an infinite number of possible values that are not countable.
(If you measure to get the value of a quantitative variable, it is continuous.) |
|
Confounding
|
Occurs when the effects of two or more explanatory variables are not separated. Therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable(s) not accounted for in the study.
** Major problem with observational studies, often the cause is a "lurking variable". |
|
Control Group
|
Serves as a baseline treatment that can be used to compare to other treatments.
|
|
Convenience Sampling
|
Sample in which the individuals are easily obtained and not based on randomness.
* self-selected - individuals decided to participate (voluntary response), i.e., phone-in polling, internet surveys **Unreliable results because sampling is not random. |
|
Cross-Sectional Studies
|
Observational studies that collect information about individuals at a specific point in time - or over a very short period of time.
|
|
Data
|
Fact or Proposition used to draw a conclusion or make a decision. Can be numerical or non-numerical. (List of observed values for a variable)
i.e., gender is a variable - the observations of Male/Female are data. |
|
Designed Experiment
|
The researcher assigns individuals to groups, intentionally changes the value of an explanatory variable, and records the value of the response variable for each group.
|
|
Steps in Designing an Experiment
|
1. Identify the problem to be solved (be explicit).
2. Determine the factors that affect the response variable (consult a field expert).
3. Determine the number of experimental units.
4. Determine the level of each factor: a) Control - fix the factors at one predetermined level, or set them at predetermined levels; b) Randomize - randomly assign experimental units to the various treatment groups so the effects of factors that cannot be controlled are minimized.
5. Conduct the experiment: a) Experimental units are randomly assigned to the treatments. Replication occurs when each treatment is applied to more than one experimental unit. b) Collect and process the data; measure the value of the response variable for each replication.
6. Test the claim (inferential statistics). |
|
Discrete Variable
|
A quantitative variable that has either a finite number of possible values, or a countable number of possible values. (If you count to get the value of a quantitative variable, it is discrete)
|
|
Experiment
|
A controlled study conducted to determine the effect that varying one or more explanatory variables (factors) has on a response variable.
|
|
Experimental Unit
|
Person, object, or some other well defined item upon which a treatment is applied. (Often referred to as a Subject)
|
|
Explanatory Variable
|
The variable whose effect on the response variable is being studied; in an experiment, the factor that is intentionally varied.
|
|
Functional Status
|
The ability to conduct day-to-day activities.
|
|
Individual
|
Person or object that is a member of the population to be studied.
|
|
Interval Level of Measurement
(Quantitative Variable) |
Has the properties of the ordinal level, and the differences in the values of the variable have meaning. Arithmetic operations can be performed (addition & subtraction).
(i.e., Temperature - arithmetic operations can be performed, but ratios do not represent meaningful results.) |
|
Lurking Variable
|
an explanatory variable that was not considered in a study, but that affects the value of the response variable in the study.
* Typically related to explanatory variables considered in a study. |
|
Matched-Pairs Design
|
Experimental design in which the experimental units are paired up. The pairs are matched so that they are somehow related; there are only 2 levels of treatment.
|
|
Nominal Level of Measurement
(Qualitative Variable) |
Values of the variable - name, label or categorized - does not allow for the values to be arranged in a ranked or specific order.
(i.e. gender) |
|
Non-response Bias
|
Exists when individuals selected to be in the sample who do not respond to the survey have different opinions from those who do participate.
|
|
Non-Sampling Errors
|
Non-response bias, response bias, data entry errors, undercoverage. Can also be present in a Census.
** The errors that result from obtaining and recording the information collected. |
|
Observational Study
|
Measures the value of the response variable without attempting to influence the value of either the response or explanatory variables.
**Observes the behavior without trying to influence the outcome. |
|
Types of Observational Studies
|
1. Cross-Sectional: collect information about individuals - usually short periods of time.
2. Case-Control: collect information about individuals - completed retrospectively.
3. Cohort: individuals studied for longer periods of time - completed prospectively. |
|
Open Question
|
A question for which the respondent is free to choose his or her response: Open line answer.
|
|
Ordinal Level of Measurement
(Qualitative Variables) |
Has the properties of the nominal level and the naming scheme allows for the values to be arranged in a specific order.
(i.e., Letter Grades) Can be ranked, but differences have no meaning. |
|
Parameter
|
Numerical summary of a population
|
|
Placebo
|
An innocuous medication (such as sugar tablets) that looks, tastes, and smells like the experimental medication.
|
|
Population
|
Entire group of individuals to be studied.
|
|
Qualitative Data
|
Observations corresponding to a qualitative variable
|
|
Qualitative Variables
*(Categorical) |
Allow for classification of individuals based on some attribute or characteristic.
|
|
Quantitative Data
|
Observations corresponding to a quantitative variable
*(Discrete/Continuous) |
|
Quantitative Variables
|
Provide numerical measures of individuals. Arithmetic operations - addition and subtraction - can be performed on the values and will provide meaningful results.
|
|
Random Sampling
|
The process of using chance to select individuals from a population to be included in a sample.
|
|
Ratio Level of Measurement
(Quantitative Variables) |
Has the properties of the interval level and the ratios have meaning. Arithmetic operations can be completed - Multiplication and Division.
|
|
Response Bias
|
Exists when the answers on a survey do not reflect the true feelings of the respondent.
(i.e., Interviewer Error, Misrepresented Answers, Wording of Questions) |
|
Reasons for Response Bias
|
1. Interviewer Error: untrained interviewers.
2. Misrepresented Answers: responses that misrepresent facts; flat-out lies.
3. Wording of Questions: unbalanced? vague?
4. Ordering of Questions or Words: questions should be rearranged and asked again.
5. Type of Question: open (free to choose response) vs. closed (multiple choice).
6. Data Entry Error: imperative to perform accuracy checks! |
|
Reliability
|
Represents the ability of different measurements of the same individual to yield the same results.
|
|
Response Variable
|
The variable that is measured in a study - the outcome, or end result; its value depends on the explanatory variable.
|
|
Sample
|
Subset of the population being studied
|
|
Sampling Bias
|
Exists when the technique used to obtain the individuals in the sample tends to favor one part of the population over another.
|
|
Sampling Error
|
The error that results from using a sample to estimate information about a population; occurs because a sample gives incomplete information about population (cannot reveal all)
**Error that results from using a subset of a population to describe characteristics of the population |
|
Sampling With Replacement
|
A certain number of individuals is selected from the population; questions/surveys are sent to the individuals in the sample. Individuals' names are left in the population and could possibly be chosen again.
|
|
Sampling Without Replacement
|
A certain number of individuals is selected from the population; questions/surveys are sent to the individuals in the sample. Those individuals' names are removed and cannot be chosen again.
|
|
Seed
|
In a random number generator, provides an initial point for the generator to start creating random numbers
(dictates the random numbers that are generated) |
|
Simple Random Sampling
|
A sample of size "n" from a population of size "N" is obtained such that every possible sample of size "n" has an equally likely chance of occurring.
|
|
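The seed and simple-random-sampling cards above can be sketched with Python's standard library; the population of 100 IDs and the seed value are illustrative:

```python
import random

# Hypothetical population: 100 ID numbers (values are illustrative)
population = list(range(1, 101))

# The seed provides the generator's starting point, so the same
# "random" sample can be reproduced
random.seed(42)

# Simple random sample of size n = 10, drawn without replacement:
# every possible sample of size 10 is equally likely
sample = random.sample(population, k=10)

print(len(sample))       # 10
print(len(set(sample)))  # 10 -- no individual appears twice
```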
Goal of Sampling
|
Obtain as much information as possible about the population at the least cost.
|
|
Statistic
|
Numerical summary of a sample
|
|
Statistics
|
The science of collecting, organizing, summarizing and analyzing information to draw conclusions or answer questions.
In addition, statistics is about providing a measure of confidence in any conclusions. |
|
Descriptive Statistics
|
Consists of organizing and summarizing data. Describes data through numerical summaries, tables, and graphs.
|
|
Inferential Statistics
|
Uses methods that take a result from a sample, extend it to the population and measure the reliability of the result.
|
|
Process of Statistics
|
1) Identify the research objective (determine the detailed questions).
2) Collect the data needed to answer those questions - important to use appropriate data-collection processes.
3) Describe the data - descriptive statistics allows the researcher to obtain an overview of the data.
4) Perform inference: apply the appropriate techniques to extend the results obtained from the sample to the population, and report a level of reliability. |
|
Stratified Sample
|
Separate population into non-overlapping groups (strata) and then obtaining a simple random sample from each stratum. The individuals within each stratum should be homogenous (similar) in some way.
|
|
Systematic Sampling
|
Obtained by selecting every kth individual from the population. The first individual selected corresponds to a random number between 1 and k.
Formula: k = N/n; random start = p. Sample consists of: p, p + k, p + 2k, …, p + (n − 1)k |
|
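A minimal sketch of systematic sampling, with an illustrative population size N = 1000 and sample size n = 50:

```python
import random

N, n = 1000, 50           # population and sample sizes (illustrative)
k = N // n                # select every k-th individual: k = 20
random.seed(1)
p = random.randint(1, k)  # random starting point between 1 and k

# Sample consists of p, p + k, p + 2k, ..., p + (n - 1)k
indices = [p + i * k for i in range(n)]

print(len(indices))      # 50
print(indices[-1] <= N)  # True -- the last selection stays inside the population
```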
Treatment
|
Any combination of the values of factors used in an experiment
|
|
Validity
|
Represents how close to the true value the measurement is.
|
|
Undercoverage
|
Occurs when the proportion of one segment of the population is lower in a sample than it is in the population.
(can be caused by incomplete/incorrect frame, or not representative of population) |
|
Variables
|
Characteristics of the individuals within the population
|
|
Bar Graph
|
Constructed by labeling each category of data on either the horizontal or vertical axis and the frequency or relative frequency of the category on the other axis. Rectangles of equal width are drawn for each category. The height of each rectangle represents the category's frequency/relative frequency.
|
|
Bell-Shaped Distribution
|
(Symmetric Distribution)
The highest frequency occurs in the middle and frequencies tail off to the left & right |
|
Classes
|
Categories of data; Categories by which data are grouped.
|
|
Class Width
|
The difference between consecutive lower class limits.
i.e., classes 25-34, 35-44: 35 − 25 = 10, so the class width is 10 |
|
Guidelines for Determining the lower Class Limit of the First Class and Class Width
|
Choose the lower class limit of the first class: the smallest observation in the data set, or a convenient number slightly lower than the smallest observation.
Determine the class width: *Decide on the number of classes (generally between 5 & 20). *Compute (largest data value − smallest data value) ÷ number of classes, and round this value up to a convenient number. |
|
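The class-width computation above can be sketched directly; the ten data values are illustrative:

```python
import math

data = [25, 31, 47, 52, 38, 44, 60, 29, 55, 33]  # illustrative observations
num_classes = 5                                   # generally between 5 and 20

# (largest data value - smallest data value) / number of classes,
# rounded UP to a convenient number
class_width = math.ceil((max(data) - min(data)) / num_classes)

# Lower class limit of the first class: the smallest observation
lower_limits = [min(data) + i * class_width for i in range(num_classes)]

print(class_width)   # 7
print(lower_limits)  # [25, 32, 39, 46, 53]
```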
Deceptive Graphs
|
Purposely intended to create an incorrect impression.
|
|
Dot Plot
|
Drawn by placing each observation horizontally in increasing order and placing a dot above the observation each time it occurs.
Limited in usefulness but can be used to quickly visualize data. |
|
Frequency Distribution
|
Lists each category of data and the number of occurrences for each category of data
|
|
Guidelines for Constructing Good Graphs
|
**Title and label the graph's axes clearly; provide explanations if needed.
Include: - Units of measurement - Data source (when appropriate) ** Avoid distortion. Never lie about the data! |
|
Histogram
|
Constructed by drawing rectangles for each class of data. The height is the frequency or relative frequency of the class. The width of each rectangle is the same, and the rectangles touch!
|
|
Lower Class Limit
|
The Lowest (smallest) value of a class
i.e., 25-34: lower class limit = 25 |
|
Misleading Graphs
|
Graphs that unintentionally create an incorrect impression.
Most common: * Manipulation of scale * Misplaced origin (scale not starting at 0) |
|
Pareto Chart
|
Bar graph whose bars are drawn in decreasing order of frequency/relative frequency
*Helps prioritize categories for decision making purposes - QA, HR, Marketing |
|
Relative Frequency
|
The proportion (percentage) of observations within a category and is found using the following formula:
= Frequency/sum of all frequencies |
|
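Relative frequencies follow the formula on the card above; the survey responses are illustrative:

```python
from collections import Counter

responses = ["A", "B", "A", "C", "A", "B", "B", "A"]  # illustrative data

freq = Counter(responses)   # frequency distribution
total = sum(freq.values())  # sum of all frequencies

# Relative frequency = frequency / sum of all frequencies
rel_freq = {category: f / total for category, f in freq.items()}

print(rel_freq["A"])           # 0.5 -- 4 of the 8 observations
print(sum(rel_freq.values()))  # 1.0 -- relative frequencies always sum to 1
```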
Open Ended
|
The first class has no lower class limit; or the last class has no upper class limit.
|
|
Choosing Bar Graphs, Pie Graphs or Pareto Graphs
|
Pie Charts: should be used for showing the division of all possible values of a qualitative variable into its parts. (Not useful for comparing two specific values of the qualitative variable.)
* Bar Graphs: useful when comparing the different parts, but not the parts being compared to the whole. * Pareto Charts: useful when comparing parts to the whole. |
|
Pie Charts
|
Circle divided into sectors, each sector represents a category of data. Area of sector is proportional to frequency of the category.
* Typically used to present relative frequency of qualitative data. * Data is usually nominal, but can also be used for ordinal. |
|
Relative Frequency Distribution
|
Lists each category of data together with the relative frequency
|
|
Side-by-Side Bar Graphs
|
Bar graph that compares 2 data sets. Should be constructed using relative frequency, because different sample/population sizes make comparisons using frequency difficult or misleading.
|
|
Skewed Left Distribution
|
The tail to the left of the peak is longer than the tail to the right of the peak.
|
|
Skewed Right Distribution
|
The tail to the right of the peak is longer than the tail to the left.
|
|
Stem and Leaf Plot
|
another way to represent quantitative data graphically. Use the digits to the left of the right-most digit to form the stem. Each right-most digit forms a leaf. i.e., 147; 14 = Stem, 7 = Leaf
|
|
Construction of a Stem and Leaf Plot
|
Step 1: The stem of a data value consists of the digits to the left of the right-most digit. The leaf is the right-most digit.
Step 2: Write the stems in a vertical column in increasing order. Draw a vertical line to the right of the stems.
Step 3: Write each leaf corresponding to its stem to the right of the vertical line.
Step 4: Within each stem, rearrange the leaves in ascending order; title the plot and provide a legend. i.e., Legend: 5|5 represents 5.5% |
|
Time-Series Data
|
The value of a variable is measured at different points in time.
i.e., the closing price of Cisco Systems each month for the past 12 years. |
|
Time-Series Plot
|
Obtained by plotting the time in which a variable is measured on the horizontal axis and the corresponding value of the variable on the vertical axis.
|
|
Uniform Distribution
|
(Symmetric Distribution)
Shape of distribution - frequency of each value of the variable is evenly spread out across the values of the variable. |
|
Upper Class Limit
|
the largest (highest) value within the class
i.e., 25 - 34 Upper class limit = 34 |
|
Describe the Distribution
|
Describe its shape - skewed left, skewed right, symmetric; its center - mean or median; and its spread - standard deviation or IQR.
|
|
Relationship Between the Mean, Median and Distribution Shape
|
Skewed Left - Mean is substantially smaller than the Median.
Symmetric - Mean is roughly equal to Median. Skewed Right - Mean is substantially larger than Median. |
|
Sample Standard Deviation
|
Obtained by taking the square root of the sample variance.
s = √s² |
|
Population Standard Deviation
|
Obtained by taking the square root of the population variance.
σ = √σ² |
|
Arithmetic Mean
|
*Quantitative Data Only
Computed by determining the sum of all the values of the variables in the data set and dividing by the number of observations. |
|
Sample Arithmetic Mean
|
Computed by using sample data.
x̄ = (ΣXᵢ) / n |
|
Median
|
The value that lies in the middle of the data when arranged in ascending order.
M = median. Odd # of obs: the middle observation, in position (n + 1)/2. Even # of obs: the mean of the middle two obs - i.e., for 1, 2, 3, 4: (2 + 3)/2 = 2.5 |
|
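The odd/even cases on the median card can be sketched as a small function:

```python
def median(values):
    """Middle value once the data are arranged in ascending order."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                    # odd n: observation (n + 1)/2
    return (data[mid - 1] + data[mid]) / 2  # even n: mean of the middle two

print(median([3, 1, 2]))     # 2
print(median([1, 2, 3, 4]))  # 2.5
```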
Resistant
|
A numerical summary of data where extreme values (relative to data) do not affect its value substantially.
|
|
Mode
|
The most frequent observation of the variable that occurs in the data set.
(Tally the number of observations for each value in the data set. The data value that occurs most often = mode) * The only measure of central tendency that can be used for qualitative data. |
|
Bimodal
|
Data set that has two modes
|
|
Multimodal
|
Data set that has 3 or more values that occur with the highest frequency.
|
|
Circumstances for Measure of Central Tendency
|
Mean:
Population: μ = (ΣXᵢ)/N; Sample: x̄ = (ΣXᵢ)/n. The center of gravity. Use for quantitative data when the frequency distribution is roughly symmetric.
Median: arrange the data in ascending order and divide the data set in half (odd # of obs: position (n + 1)/2; even # of obs: mean of the middle two). Divides the data 50%/50%. Use for quantitative data when the frequency distribution is skewed left or right.
Mode: tally the data to determine the most frequent observation. Use when the most frequent observation is the desired measure of central tendency, or when the data are qualitative. |
|
Weighted Mean
|
Certain data values have a higher importance, or "weight," associated with them. Found by multiplying each value of the variable by its corresponding weight, summing the products, and dividing the result by the sum of the weights.
|
|
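A sketch of the weighted mean, using illustrative course grades weighted by credit hours:

```python
# (grade value, credit-hour weight) pairs -- illustrative
grades = [(4.0, 3), (3.0, 4), (2.0, 1)]

# Multiply each value by its weight, sum the products,
# and divide by the sum of the weights
weighted_mean = sum(x * w for x, w in grades) / sum(w for _, w in grades)

print(weighted_mean)  # (12 + 12 + 2) / 8 = 3.25
```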
Quartiles
|
Divides the data set into fourths.
Step 1: Arrange the data in ascending order.
Step 2: Determine the median (2nd quartile, Q2).
Step 3: Determine the 1st & 3rd quartiles by dividing the data set into two halves, then dividing each half in half: the 1st quartile (Q1) is the median of the bottom half; the 3rd quartile (Q3) is the median of the top half. |
|
Interquartile Range - IQR
|
The range of the middle 50% of the observations in a data set.
IQR = Q3 - Q1 The more spread a data set has, the higher the IQR will be. |
|
Outliers
|
Extreme observations.
*Origins must be investigated *Can distort the Mean and standard deviation |
|
Checking for Outliers Using Quartiles
|
1. Determine the 1st and 3rd quartiles of the data.
2. Compute the IQR.
3. Determine the "fences" - they serve as cutoff points for determining outliers. Lower fence: Q1 − 1.5(IQR); Upper fence: Q3 + 1.5(IQR).
4. If a data value is less than the lower fence or greater than the upper fence, it is considered an outlier. |
|
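A sketch of the fence check using `statistics.quantiles` (its "inclusive" method may give slightly different quartiles than the textbook split-the-halves convention; the data values are illustrative):

```python
import statistics

data = [7, 15, 36, 39, 40, 41, 42, 43, 47, 49]  # illustrative, sorted

# n=4 cut points give Q1, Q2 (the median), and Q3
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the fences is flagged as an outlier
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(outliers)  # [7, 15]
```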
Five Number Summary
|
Min, Q1, M (Q2), Q3, Max
Resistant to extreme values. Measures the spread of the data by determining the difference between the 25th and 75th percentiles (the IQR). |
|
Boxplot
|
1. Determine the lower and upper fences: IQR = Q3 − Q1; lower fence: Q1 − 1.5(IQR); upper fence: Q3 + 1.5(IQR).
2. Draw vertical lines at Q1, M, and Q3. Enclose these lines in a box.
3. Label the upper and lower fences.
4. Draw a line (whisker) from Q1 to the smallest data value that is larger than the lower fence, and a line from Q3 to the largest data value that is smaller than the upper fence.
5. Any data values less than the lower fence or greater than the upper fence are outliers, marked with an asterisk. |
|
Dispersion
|
The degree to which the data are spread out
(describes a distribution) |
|
Range
|
The range (R) of a variable is the difference between the largest data value and the smallest data value.
* Uses only 2 values from data set * Affected by extreme values - not resistant |
|
Deviation about the Mean
|
Measures how far, on average, each observation is from the mean. The further an observation is from the mean, the larger the absolute value of its deviation.
*Sum of all deviations about the mean must = 0 |
|
Population Variance
|
The sum of the squared deviations about the population mean divided by the number of observations in the population, (N).
*It is the mean of the squared deviations about the population mean. Population variance: σ². Formula: σ² = Σ(Xᵢ − μ)² / N. Note: ΣXᵢ² = square each observation, then sum the squared values; (ΣXᵢ)² = sum all observations, then square the sum. |
|
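The population-variance formula, computed directly on an illustrative population of five values:

```python
import math

population = [2, 4, 4, 4, 6]  # illustrative population, N = 5
N = len(population)
mu = sum(population) / N      # population mean

# sigma^2 = sum((x_i - mu)^2) / N -- the mean squared deviation about mu
sigma_sq = sum((x - mu) ** 2 for x in population) / N
sigma = math.sqrt(sigma_sq)   # population standard deviation

print(sigma_sq)  # 1.6
# A sample variance would instead divide by n - 1 (degrees of freedom)
```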
Biased
|
When a statistic consistently overestimates or underestimates the parameter it is estimating.
|
|
Sample Variance
|
**Always remember to use n − 1 in the denominator to keep from underestimating the variance.
Dividing by the smaller number (n − 1) produces a slightly larger variance, closer to the actual population variance. |
|
Degrees of Freedom
|
n − 1; the first n − 1 observations have the freedom to be whatever value.
*We have n − 1 degrees of freedom in the computation of s² because an unknown parameter, μ, is estimated with x̄. For each parameter estimated, we lose 1 degree of freedom. |
|
Standard Deviation
|
Used in conjunction with the mean to numerically describe distributions that are bell-shaped and symmetric. The mean measures the center, the St Dev measures the spread
*Loosely described as the typical deviation from the mean. The larger the st dev, the more dispersed the distribution. (Units of measure must be the same!) |
|
Empirical Rule
|
If a distribution is roughly bell shaped:
* Approximately 68% of the data will lie within 1 standard deviation of the mean - i.e., between μ − 1σ and μ + 1σ.
* Approximately 95% of the data will lie within 2 standard deviations of the mean - between μ − 2σ and μ + 2σ.
* Approximately 99.7% of the data will lie within 3 standard deviations of the mean - between μ − 3σ and μ + 3σ.
*The rule can also be used with sample data, with x̄ in place of μ and s in place of σ. |
|
Chebyshev's Inequality
|
For any data set, regardless of the shape of the distribution, at least
(1 − 1/k²) · 100% of the observations will lie within k standard deviations of the mean, where k is any number greater than 1 - i.e., at least (1 − 1/k²) · 100% of the data lie between μ − kσ and μ + kσ for k > 1.
*Can be used for sample data too. |
|
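Chebyshev's bound as a one-line function:

```python
def chebyshev_bound(k):
    """At least (1 - 1/k^2) * 100% of observations lie within k
    standard deviations of the mean, for any k > 1."""
    return (1 - 1 / k ** 2) * 100

print(chebyshev_bound(2))  # 75.0 -- at least 75% within 2 standard deviations
print(chebyshev_bound(3))  # about 88.9% within 3 standard deviations
```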
Correlation Coefficient
|
A measure of the strength and direction of the linear relation between 2 quantitative variables
ρ = population, r = sample |
|
Linear Correlation Coefficient Properties
|
1. Always between −1 and 1, inclusive: −1 ≤ r ≤ 1.
2. If r = +1, a perfect positive linear relation exists between the two variables.
3. If r = −1, a perfect negative linear relation exists between the two variables.
4. The closer r is to +1, the stronger the evidence of positive association.
5. The closer r is to −1, the stronger the evidence of negative association.
6. If r is close to zero, little or no evidence exists of a *linear* relation - that does not imply no relation, just no linear relation.
7. The linear correlation coefficient is a unitless measure of association.
8. The correlation is not resistant. |
|
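A from-scratch sketch of the sample correlation coefficient r; the toy data sets exercise properties 2 and 3 above:

```python
import math

def correlation(xs, ys):
    """Sample linear correlation coefficient r, always in [-1, 1]."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(correlation([1, 2, 3], [2, 4, 6]))  # 1.0  -- perfect positive linear relation
print(correlation([1, 2, 3], [6, 4, 2]))  # -1.0 -- perfect negative linear relation
```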
Confounding
|
Any relation that may exist between two variables may be due to some other variable not accounted for.
|
|
Good Fit
|
The line drawn appears to describe the relation between the two variables well.
*Use the slope formula to find the equation of the line: m = (Y₂ − Y₁)/(X₂ − X₁) = slope |
|
Residual
|
The difference between the observed value of y and the predicted value of y - the "error," or residual.
(The difference between the data point and the fitted line.) |
|
Least-Squares Regression Line
|
The line that minimizes the sum of the squared errors (residuals) - the line that minimizes the sum of the squared distances between the observed values of y and those predicted by the line, ŷ (y-hat).
Minimize Σ(residuals)² |
|
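A minimal least-squares fit on illustrative data; it finds the slope and intercept that minimize the sum of squared residuals:

```python
def least_squares(xs, ys):
    """Slope and intercept of the line minimizing sum((y - y_hat)^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx  # the fitted line passes through (x-bar, y-bar)
    return slope, intercept

xs, ys = [1, 2, 3, 4], [2, 4, 5, 8]  # illustrative data
b1, b0 = least_squares(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(b1, b0)                          # 1.9 0.0
print(round(abs(sum(residuals)), 10))  # 0.0 -- residuals sum to zero
```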
Interpretation of Y Intercept
|
First, ask two questions:
1. Is 0 a reasonable value for the explanatory variable?
2. Do any observations near x = 0 exist?
If the answer to either question is no, no interpretation of the y-intercept is given.
Second: do not use the regression model to make predictions "outside the scope" of the model - values of the explanatory variable much larger or much smaller than those observed. *We cannot be certain of the behavior of the data where we have no observations. |
|
Predictions - No Linear Relation
|
If the linear correlation coefficient indicates no linear relation between explanatory and response variables, then use the mean value of the response variable as the predicted value.
|
|
Coefficient of Determination
|
Measures the proportion of total variation in the response variable that is explained by the least-squares regression line. It is a number between 0 & 1, inclusive: 0 ≤ R² ≤ 1.
If R² = 1, the least-squares regression line explains 100% of the variation in the response variable. R² = r² |
|
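R² can be computed as 1 − (unexplained variation / total variation), which agrees with r² for a least-squares fit; the data values are illustrative:

```python
xs, ys = [1, 2, 3, 4, 5], [2, 3, 5, 4, 6]  # illustrative data
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Least-squares fit
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx
preds = [intercept + slope * x for x in xs]

ss_residual = sum((y - p) ** 2 for y, p in zip(ys, preds))  # unexplained variation
ss_total = sum((y - my) ** 2 for y in ys)                   # total variation

r_squared = 1 - ss_residual / ss_total  # proportion of variation explained
print(round(r_squared, 2))              # 0.81
```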
Unexplained Deviation
|
The difference between the observed value of the response variable (y) and the predicted value of the response variable, ŷ (y-hat): y − ŷ
|
|
Explained Deviation
|
The deviation between the predicted value of the response variable, ŷ (y-hat), and the mean value of the response variable, ȳ (y-bar).
|
|
Total Deviation
|
The deviation between the observed value of the response variable, y, and the mean value of the response variable, ȳ.
|
|
Measure of Central Tendency
|
Numerically describes the average or typical data value: Mean, Median or Mode
|