Use LEFT and RIGHT arrow keys to navigate between flashcards;
Use UP and DOWN arrow keys to flip the card;
H to show hint;
A reads text to speech;
170 Cards in this Set
- Front
- Back
What is statistics?
|
-the practice of analyzing numerical data
-a set of math procedures to organize, summarize, and interpret -telling a story with data |
|
Why is statistics important?
|
-doing and understanding research
-criticize -replicate |
|
What is a population?
|
-the set of all the individuals of interest in a particular study (ie Canadians, women, preteens etc)
|
|
Does a population have to be made up of people?
|
no, can be rats, corporations, parts made in a factory, etc
|
|
What is a sample?
|
a set of individuals selected from a population , usually to represent the population in a research study
|
|
What is the relationship between a sample and a population?
|
-the sample is selected from the population, sample participates in study, results are generalized to the population
|
|
What two general categories can statistical procedures be classified into?
|
-descriptive and inferential
|
|
what are descriptive statistics?
|
-procedures that summarize, organize, and simplify data
-results in tables, graphs, or single numbers that consolidate a large amount of information (ie average) |
|
What is the difference between a parameter and a statistic?
|
-a parameter is a characteristic that describes a POPULATION (ie population average)
-a statistic is a characteristic that describes a SAMPLE |
|
What are inferential statistics?
|
-techniques that use sample data to make generalizations about populations
|
|
we use sample statistics to infer what the ______________________ is likely to be.
|
population parameter
|
|
What is sampling error?
|
-the discrepancy (difference, amount of error) between a sample statistic and its population parameter
-sample statistics vary from one sample to another and typically are different from the corresponding population parameters |
|
What is a common example of sampling error?
|
-sample proportion (ie voting, female/male ratio etc)
|
|
What is sampling error caused by?
|
-chance
-sampling bias |
|
What are the steps to using statistics in research?
|
1: Experiment (ie Group A coffee, Group B water)
2: Descriptive Statistics (results: A=4hrs, B=2hrs) 3: Inferential Statistics (sample data show a 2 hr difference) |
|
we use ___________ statistics to decide whether results are a real difference, or due to sampling error
|
-inferential
|
|
What is a variable?
|
a characteristic or condition that changes or has different values for different individuals
|
|
What are the two main kinds of variables?
|
-characteristics that differ from person to person
-environmental conditions that change |
|
What is a data set?
|
the complete set of data/scores
|
|
What is a datum? More commonly called?
|
-the measurement or observation obtained for each individual
-AKA a score or raw score |
|
Most research is intended to examine the ____________________.
|
relationship between variables
|
|
What is a constant? AKA
|
-a characteristic or condition that does not change across individuals
-AKA controlled variable |
|
What to quantitative and qualitative variables measure differences in?
|
quantitative = amount
qualitative = type |
|
Quantitative variables _______ event, while qualitative variables ______ event.
|
quantitative = quantify
qualitative = categorize |
|
What are constructs? aka?
|
-internal attributes or characteristics (variables) that cannot be directly observed, but are useful for describing and explaining behaviour
-aka “hypothetical constructs” |
|
How can we measure a construct?
|
-by observing and measuring behaviours that are representative of the construct
-the external behaviours can be used to create an operational definition of the construct |
|
What are the two components of an operational definition?
|
-describes a set of operations for measuring a construct
-defines the construct in terms of the resulting measurements |
|
What are “real limits” for continuous variables?
|
-the boundaries of intervals for scores that are represented on a continuous number line
-positioned exactly halfway between adjacent scores |
|
What is one key to determining whether a variable is continuous or discrete?
|
-continuous variables can be divided into any number of fractional parts
-whenever you are free to choose the degree of precision or the number of categories for measuring a variable, it must be continuous |
|
What is an interval width?
|
upper real limit – lower real limit
|
|
What are the four scales of measurement, from simplest to most sophisticated?
|
nominal, ordinal, interval, and ratio
|
|
What is the nominal scale of measurement?
|
-categories with different names (nominal = name)
-qualitative, not quantitative differences between categories -no information on direction or size of difference between categories |
|
What is the ordinal scale of measurement?
|
-named categories in an ordered sequence, ranked by size or magnitude
-know order (rank) of objects, but no info about size of intervals between ranks -ie gold vs silver, we know one is better than the other, but not by how much |
|
What is the interval scale of measurement?
|
-ordered categories with intervals of equal size
-no rational/absolute zero point (if you can have negative numbers, it’s an interval scale ie celcius) -any time you are measuring a construct |
|
Can a construct be measured using a ratio scale?
|
NO!
|
|
What is the ratio scale of measurement?
|
-ordered categories with intervals of equal size and rational (absolute) zero point
-basically interval scale + zero point (and not a construct) -can form meaningful ratios |
|
What are the two distinct data structures used to classify different research methods and statistical techniques?
|
-two different variables in same individual: correlational method
-two (or more) groups of scores: experimental and non-experimental methods |
|
Can we use the correlational method to produce non-numerical scores? How are they evaluated?
|
yes, chi-square test
|
|
What is the correlational method of observing variables?
|
-observe two variables to see whether there is a relationship between them
-looking for things that change together |
|
What is the correlational method incapable of demonstrating?
|
-an explanation of the relationship
-a cause-and-effect relationship |
|
What do the experimental and non-experimental methods compare?
|
-two (or more) groups of scores to find differences between the groups
-one variable defines the groups, the other variable is measured to create scores for each group |
|
What is the goal of the experimental method?
|
-to determine whether there is a cause and effect relationship between variables
-to show that changing the value of one variable will cause changes to occur in the second variable -ie whether manipulating one variable causes another variable to change |
|
What is the difference between the experimental and non-experimental methods?
|
-similar: both study two or more groups of scores to find differences between the scores BUT:
-experimental method includes manipulation and control -non-experimental methods consist of nonequivalent groups and pre-post studies |
|
What two characteristics differentiate experiments from other types of research studies?
|
-manipulation: of independent variable (dependent variable is measured to determine whether manipulation causes changes to occur)
-control: over research situation (to avoid confounding variables) |
|
What is an independent variable?
|
-the variable being manipulated by the experimenter to create conditions or groups
-the treatment conditions to which subjects are exposed -consists of antecedent conditions that were manipulated PRIOR to observing the dependent variable |
|
What is a dependent variable?
|
-the variable being measured or observed (across all conditions or groups) to assess effect of treatment
|
|
Lack of control leads to ____________.
|
confounding variable, confound
|
|
What is a confounding variable?
|
-an uncontrolled variable that is unintentionally allowed to vary systematically with the independent variable
|
|
What is confound?
|
-being unable to tell whether changes in DV are due to IV or to confounding variable
|
|
What are the two types of variables we need to consider for possible confounds?
|
-participant variables: characteristics that are specific to an individual (ie age, intelligence, gender)
-environmental variables: characteristics of the environment (ie temperature, lighting, etc) |
|
What are the three basic techniques used to control other variables (possible confounding)?
|
-random assignment: each participant has equal chance of being assigned to each group
-matching: to ensure equivalent groups or environments (ie equal IQ distribution) -holding the variables constant (ie equal age for everyone) |
|
What is the control condition?
|
-participants in the control condition do not receive the experimental treatment
-may receive no treatment, placebo, or be placed on a waiting list for treatment -provides a baseline for comparison with the experimental condition |
|
When do we use the non-experimental method?
|
-when we want to study the relationship between variables by comparing groups of scores BUT we cannot randomly assign participants to groups (lack of control) or manipulate the independent variable
|
|
What are two common examples of non-experimental methods?
|
non-equivalent groups and pre-post studies
|
|
What are non-equivalent group studies?
|
-non-experimental method comparing preexisting groups
-participants cannot be randomly assigned to groups because the grouping variable is a participant variable (ie gender, age, intelligence) |
|
What are pre-post studies?
|
-non-experimental method comparing before and after scores
-in this case, time is the independent variable: experimenter has no control over the passage of time (or any variable that changes with time) |
|
What is the difference between a pre-post study and a correlational study?
|
-they are similar in that both designs measure two scores for the SAME individual, however:
-correlational: the two scores correspond to two DIFFERENT variables (ie sleep time VS GPA) -pre-post design: two scores correspond to SAME variable twice under two different conditions at different times |
|
What is a quasi-independent variable? Why is it not a TRUE independent variable?
|
-the “independent variable” that is used to create the different groups of scores in a nonexperimental study
-not a true independent variable because it is not manipulated |
|
What do Σ and ΣX mean?
|
Σ = summation, “the sum of” (sigma)
ΣX = “sum of the scores” (add all the scores for variable X) |
|
What are the rules of operation for the math in statistical notation?
|
-brackets
-exponents -division/multiplication -***Σ (summation) using Σ notation*** -any other addition/subtraction |
|
What is the purpose of frequency distributions in organizing raw data?
|
-provides an overview of the entire group of scores, making it easy to present an entire set of scores
|
|
Descriptive statistics are a way of __________ and ___________ data.
|
summarizing and organizing
|
|
What are the three ways we can summarize/organize data?
|
- tabular representation of data (frequency distributions)
- graphical representation of data (histograms, bar graphs, polygons) -numerical representations of data (central tendency, ) |
|
What is relative frequency? How do we calculate it? aka
|
-aka proportion (p)
-what proportion (fraction) of the overall group is associated with each score -proportion = p = f/n |
|
What are the groups in a grouped frequency distribution called?
|
-class intervals
|
|
What does a relative frequency distribution table look like?
|
X f p=f/n %=p(100)
|
|
What are apparent limits?
|
-the limits that seem to denote the intervals in class intervals (in a group frequency distribution)
-they are called “apparent” because a interval of 60-64 would actually have real limits of 59.5-64.5 etc |
|
What are the steps to creating a group frequency distribution?
|
1. Find the range of scores (largest - smallest + 1)
2. Determine the class intervals --use about 10 groups or class intervals --the width of a group or class interval should be a fairly simple number --the bottom score of each class interval should be a multiple of the width --all intervals should be the same width and cover the range of scores with no gaps 3. Create the grouped frequency table |
|
What is the difference between relative and cumulative frequency?
|
-relative is proportion of each individual score, cumulative is added up (percentile ranks)
|
|
Relative frequency indicates the ______________associated with each X value.
|
proportion or percentage
|
|
Percentile rank refers to a ___________ and percentile refers to a _______.
|
-percentage (ie percentile rank of 60%)
-score (ie 60th percentile) |
|
What assumption is interpolation based on? Significance?
|
-there is a constant rate of change from one end of the interval to the other
-values calculated are only estimates |
|
When constructing a frequency distribution table, how should the height of the Y axis compare to the width of the X axis?
|
-the height should be approximately 2/3 to 3/4 of the length
|
|
What graphs would be used for interval or ratio data?
|
-histograms or polygons
|
|
What are histograms?
|
-use the frequency distribution in a simple or grouped frequency distribution table to form a graph
x-axis: extends to the real limits of the category (bars touch) y-axis: frequency for each value |
|
How do we draw a histogram for data that has been grouped into class intervals?
|
-width of the bar extends to the real limits of the interval
|
|
What is a modified histogram?
|
-drawing a stack of blocks above each score, each block representing one individual (blocks correspond to frequency)
-no need for a vertical line (y axis) showing frequencies -provides a sketch, is not a substitute for an accurately drawn histogram |
|
What is the difference between a histogram and a bar graph?
|
-histogram = interval or ratio data
-bar = nominal and ordinal data -bar graphs are essentially the same as a histogram, except that spaces are left between bars |
|
Why do we use a bar graph instead of a histogram for nominal and ordinal data?
|
nominal scale: the space emphasises that the categories are separate and distinct
ordinal: separate bars are needed because you cannot assume that the categories are all the same size |
|
How do you make a distribution polygon?
|
-dots placed at
--exact value for single digit class intervals --midpoint for grouped class intervals (but don’t show grouped intervals on the axis!) -connect the dots -connect the endpoints of the graph to the x axis at the next interval |
|
What are possible graphs to use for quantitative data? Qualitative?
|
-quantitative: histogram, frequency polygon
- qualitative: bar chart, pie chart |
|
What three characteristics describe any frequency distribution?
|
-shape, central tendency, variability
|
|
Describe the three main classifications of distribution shape. (image)
|
|
|
What are the sample and population symbols for Mean?
|
|
|
What are the sample and population symbols for Variance?
|
|
|
What are the sample and population symbols for Standard Deviation?
|
|
|
When do skewed distributions often occur?
|
-negatively skewed: may reflect a ceiling effect (can’t score any higher ie quiz out of 10)
-positively skewed: floor effect (can’t score any lower) |
|
What is the purpose of measuring central tendency?
|
- to identify the “average” or “typical” individual by finding the single score that is most typical or most representative of the entire group
-we can use averages from two different groups to describe them and to measure the differences between them |
|
What are the 3 methods for determining central tendency?
|
-mean, median, mode
|
|
What is the mean?
|
-the sum of all scores divided by the number of scores
-known as the “arithmetic average” -population: represented by greek letter “mu” u -sample: “X bar” (old way) or M |
|
Why is the mean known as the “balance point” of a distribution?
|
-the sum of the negative deviations from the mean EXACTLY equals the sum of positive deviations from the mean
|
|
What are the formulas for the mean?
|
|
|
What is the formula for weighted mean?
|
|
|
What is a weighted mean?
|
- overall group mean (for more than one group)
-the combined sum divided by the combined n --ie if 5 people average 50%, 10 people average 60%.. weighted mean is (250+600)/(5+10) |
|
How would adding, subtracting, multiplying, or dividing each score by a constant change the mean?
|
-adding (or subtracting) a constant from each score changes the mean by THAT constant
-multiplying or dividing by a constant changes the mean in the SAME way |
|
What is the mode?
|
-the category or score that has the greatest frequency
-the mode is the NAME of the category, not the frequency (even if the name is a number) -can have more than one mode: 2 is bimodal, 3+ is multimodal |
|
Can a distribution have no mode?
|
-yes, if either all the scores are the same or sometimes if there are several scores with equally high frequencies
|
|
Can a distribution have two modes that don’t have identical frequencies?
|
-occasionally, if there are two separate and distinct groups (ie up down up)
-taller peak = major mode, shorter peak = minor mode |
|
What is the median?
|
-the score that divides the distribution in half so that 50% of the individuals in a distribution have scores at or below the median
-equivalent to the 50th percentile |
|
What does the median tell us about the shape of the distribution?
|
-NOTHING
|
|
How can mean and median both measure the “middle” of the distribution?
|
-median defines middle in terms of scores (same number of scores below as above)
-mean defines middle in terms of distance (same total distance below as above) |
|
What measures of central tendency can you find for nominal scales?
|
mode
|
|
What measures of central tendency can you find for ordinal scales?
|
mode and median (NOT mean)
|
|
What measures of central tendency can you find for interval scales?
|
all three
|
|
What measures of central tendency can you find for ratio scales?
|
all three
|
|
In a normally shaped distribution (perfectly symmetrical) what measures of central tendency will always be the same?
|
-all three: mean, median and mode
|
|
In a symmetrical distribution, what measures of central tendency will always be the same?
|
-mean and median
|
|
What does a normally shaped distribution look like?
|
|
|
Where would the mean, median, and mode fall (relative to each other) in a skewed distribution?
|
|
|
What measure of central tendency is most affected by skew?
|
-mean, because it accounts for ALL scores
|
|
in a skewed distribution, the mean is ______________.
|
-pulled toward the tail
|
|
What is an outlier?
|
-an extreme score
-lies apart from most of the other distribution -several outliers in a distribution gives a skewed shape -outliers pull some central tendency measures with them |
|
What is usually the preferred measure of central tendency (according to text)? Why?
|
-mean
-uses every score in the distribution (typically gives good representative value) -also closely related to variance and standard deviation |
|
When do we use a mean? (4 points)
|
-you have interval or ratio data
-the distribution is normally shaped -there is no missing data -the distribution is closed-ended (ie no “3 or more” values) |
|
In what situations does the median serve as a valuable alternative to the mean?
|
-extreme scores or skewed distributions (mean can be distorted/displaced)
-undetermined values, ie missing data (mean impossible to determine) -open ended distributions (mean impossible to determine) -ordinal scale (mean not accurate/appropriate because ordinal scales don’t measure distance) |
|
In what situations does the mode serve as a valuable alternative to the mean?
|
-nominal scale (mean and median impossible to determine)
-discrete variables (mean is possible, but seems unrealistic if fractional) -describing shape (easy way to help visualize when given along with median or mode) |
|
What is variability?
|
-measure of the degree to which scores are spread out or clustered together
|
|
What two purposes does a good measure of variability serve?
|
1. Describes the distribution: clustered vs spread out
2. Measures how well an individual score (or group of scores) represents the entire distribution (read: how much error to expect if you are using a sample to represent a population) |
|
What are the three measures of variability (that we talk about)?
|
-range, variance, and standard deviation
|
|
What is range (in words)?
|
-measure of variability
-the distance between the smallest and largest observations (using RLs if continuous) |
|
What kinds of variables can range be calculated for?
|
-interval and ratio (because the concept of distance applies to them)
|
|
What is variance (in words)?
|
-a measure of variability
-the average of all squared deviations (distances) from the mean |
|
what is standard deviation (in words)?
|
-measure of variability
-rough measure of the average amount by which observations deviate from the mean -square root of the average squared deviation (undoes the squaring) |
|
What is the goal of variance and standard deviation?
|
-to measure the typical distance that scores are from the mean
|
|
hat do we calculate the range value of variability?
|
continuous: range = URL Xmax – LRLXmin
discrete: count the categories (or largest – smallest + 1) |
|
The deviation is the ________ and _________ from the mean.
|
distance (#), direction ( + or -)
|
|
What are the steps to finding the standard deviation, in words? (by definition, NOT actual calculation)
|
1. Find the deviation for each individual score
2. Square each deviation score 3. Add the squared deviation scores (SS = sum of squares) 4. Divide sum of squares by number of scores (variance) 5. Find the square root of the variance |
|
Why do we need to square each deviation score to find standard deviation?
|
-we want a single number that represents the variability in the distribution, but if we add up all the deviations from the mean, the sum (and thus the mean deviation) would always equal 0
-squaring each deviation score gets rid of the + and – signs, which cancel each other out -result is the mean SQUARED deviation (aka variance) |
|
if we sum the deviations from the mean for a given distribution, the total will always ______.
|
equal zero
|
|
What is Sum of Squares?
|
-SS
-the sum of the squared deviation scores |
|
What is the definitional formula of the sum of squares?
|
-direct, but can become complicated if scores aren’t whole numbers
-replace u with M for sample |
|
What is the computational formula for the sum of squares?
|
-works with the scores (and not deviations from the mean), so it reduces complications of decimals
-replace N with n for sample |
|
What is the variance formula for a population?
|
|
|
What is the variance formula for a sample?
|
|
|
Why is a sample a biased estimate of a population’s variability? How do we adjust for this bias?
|
-because any sample will be less variable than its parent population
- adjust for this by using n-1 instead of n when calculating sample variance and standard deviation -the effect of this adjustment is to increase the value obtained |
|
Why do we use 1 (and not another number) when adjusting for the bias of the sample’s variability compared to the population?
|
-because a sample of n scores has a “degree of freedom” of 1 for the sample variance, which means that while the rest of the scores are independent and free to vary, ONE score is restricted
-for example, if a sample has 3 scores and we know the mean, two of the scores can be ANY number, but the last number must be the exact number that will result in the correct mean |
|
How does standard deviation describe variability?
|
-measuring the distance from the mean, accounting for the ends of the distribution and clustering of scores
|
|
What is the standard deviation formula for a population?
|
|
|
What is the standard deviation formula for a sample?
|
|
|
What is the standard deviation formula for a sample?
|
|
|
In a graphic representation of the mean and standard deviation, where should the standard deviation line be?
|
-extend approximately halfway from the mean to the most extreme score
|
|
What makes a sample statistic biased or unbiased?
|
-biased: if it tends to under or overestimate the corresponding population parameter
-unbiased: if the average value of the statistic is equal to the population parameter |
|
Is sample mean a biased or unbiased statistic?
|
-unbiased (best estimate of the population mean)
|
|
Is sample variance a biased or unbiased statistic?
|
-unbiased (best estimate of the population variance) ***IF n-1 IS USED***
|
|
Is sample standard deviation a biased or unbiased statistic?
|
-biased (tends to UNDERESTIMATE the population standard deviation)
|
|
What are the most common values used to describe a set of data?
|
-mean and standard deviation
|
|
As a rule of thumb, what percentage of scores will be within 1 standard deviation of the mean? 2 standard deviations?
|
-1: roughly 70%
-2: roughly 95% |
|
What happens to standard deviation when you add a constant to each score? Explain.
|
-does not change, there is still the same distance between scores
|
|
What happens to standard deviation when you multiply or divide each score by a constant?
|
-changes SD in the same way
|
|
Can a sum of squares be negative?
|
No!
|
|
What two considerations determine the value of any statistical measurement?
|
1. The measure should provide a stable and reliable description of the scores (not be greatly affected by minor details in the set of data)
2. The measure should have a consistent and predictable relationship with other statistical measurements |
|
What factors affect variability?
|
1. Extreme scores
-all measures are affected, but particularly range 2. Sample size -range is directly affected, SD and variance are relatively unaffected 3. Stability under sampling -SD and variance are said to be stable under sampling, range is unstable 4.Open-ended distributions -cannot compute range, SD or variance.. can only use semi-interquartile range (which we don’t discuss!) |
|
What is the process of standardizing distributions, in simple terms?
|
-takes different distributions and makes them equivalent (comparable)
|
|
What are the benefits of using the mean and standard deviation together?
|
-efficient way to describe a distribution with just two numbers
-allows a direct comparison between distributions that are on the same scales |
|
The relative position of an X value within the distribution of scores depends on the distribution’s ____ and ____________.
|
mean, standard deviation
|
|
What are raw scores?
|
-the scores that are a direct result of measurement
-they are in the original units of measurement |
|
What is the purpose of a z-score?
|
-to identify and describe the exact location of every score in a distribution
-to standardize a distribution (ie lots of tests for IQ, but mean IQ is always 100) --allows direct comparison with other distributions with z-transformed scores |
|
What is a Z score?
|
-combines a score, the mean, and the SD into a single number that precisely describes its location relative to the other scores in the distribution
-tells us whether a score is above or below the mean, and by how many standard deviations |
|
z-scores are measured in units of ___________.
|
standard deviation
|
|
What is the z score formula?
|
|
|
What is the formula for finding X from a z score? (ie when you know mean, z-score, and SD)
|
|
|
Why does distribution shape stay the same after we standardize (transform to z-scores)?
|
-to find z-scores, we subtract by a constant and divide by a constant
-scores don’t change position, just change units (original become SD) |
|
If we transform every X value in a distribution (raw scores) into a z-score, what happens to the shape, mean, and standard deviation?
|
-shape: no change
-mean: becomes 0 -standard deviation and variance: become 1 |
|
Why is the mean of any distribution of Z scores zero?
|
-any score equal to the mean is 0 standard deviations away from the mean
-the sum of all positive Zs must equal the sum of all negative Zs |
|
The standard deviation of any distribution expressed in z-scores is always ____, which means that the variance is always ____ because...
|
both 1, because the square root of 1 is 1, 1 squared is 1
|
|
What is a “standardized distribution”? Used for?
|
- distribution composed of raw scores that have been transformed to create predetermined (known) values for mean and SD
-used to compare distributions across different measures |
|
-A z-score distribution is an example of a ____________ distribution.
|
-standardized distribution (mean = 0, SD = 1)
|
|
What is the z-score distribution also known as? Why?
|
- the standard normal distribution
-because it is a normal distribution expressed in standard (Z) scores (mean = 0, SD = 1) |
|
How do we transform z-scores to a distribution of our choice?
|
1. transform your raw scores into z scores
2. Transform z scores into the new distribution using the formula Xnew = μ(new) + zσ(new) |
|
What are representative and extreme z scores, according to the textbook?
|
-representative/central = close to the mean (within one SD?)
-extreme = 2 or more SD’s away from the mean |