351 Cards in this Set
- Front
- Back
research steps |
1.) Identify study question 2.) Select study approach 3.) Design study & collect data 4.) Analyze data 5.) Report findings |
|
Most research projects require only the use of |
descriptive and perhaps some comparative statistics |
|
4 Types of statistics |
1.) Data management 2.) Descriptive statistics 3.) Comparative statistics 4.) Advanced health statistics |
|
Data Management |
Refers to the entire process of record keeping, whether tracking articles considered for eligibility in a systematic review, extracting data from patient charts for a case series, entering the responses to a cross-sectional or case-control survey, or recording all the results of clinical assessment conducted during a longitudinal cohort or experimental study. After data are entered, the files need to be cleaned and perhaps recoded before beginning statistical analysis. |
|
Codebook |
describes each variable and specifies how the collected info will be entered into a computer database, this should be done prior to beginning data entry |
|
For quantitative surveys numeric or alphabetical codes can be assigned to the options for |
close-ended answers provided on the questionnaire |
|
For open-ended questions and qualitative surveys |
a code book is even more essential because it provides clear instructions for how to code and enter free-response comments |
|
In addition to providing specific info about how each piece of info should be entered into the computer file, the code book should specify: |
1.) The name of each variable (which usually employs only capital letters or a combination of capital letters and numbers, and avoids starting with a symbol, such as an underscore) 2.) The wording of the question that was asked 3.) The variable type 4.) The options listed on the survey as possible answers to the question 5.) The way answers should be entered into the computer database 6.) What to do with missing numbers |
|
The code book is also the place to describe how |
anticipated data problems will be handled |
|
The code book will also specify for each variable whether |
missing answers should be left blank in the database, indicated with a numeric code (such as entering a 9 if the expected entry code is 0 or 1 for a dichotomous variable), or marked with the word "MISSING". |
|
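A minimal sketch of how one code book entry and its missing-value rule could be written down in Python. The variable name SMOKER, the question wording, and the helper encode are hypothetical, chosen only to illustrate the 0/1 coding with 9 for a missing answer.

```python
# Hypothetical code book entry for one dichotomous survey question.
CODEBOOK = {
    "SMOKER": {
        "question": "Do you currently smoke cigarettes?",  # made-up wording
        "type": "binomial",
        "codes": {"no": 0, "yes": 1},
        "missing": 9,  # numeric code entered when the answer is missing
    }
}


def encode(variable, answer):
    """Return the database code for a raw answer, or the missing code."""
    entry = CODEBOOK[variable]
    if answer is None or answer.strip() == "":
        return entry["missing"]
    return entry["codes"][answer.strip().lower()]


print(encode("SMOKER", "Yes"))  # 1
print(encode("SMOKER", ""))     # 9
```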
Data are usually entered into a |
database program (like Microsoft Access), one of the benefits of these programs is that they can be designed to be visually appealing and to include pre-approved responses to questions and automatic skips between questions, this ensures the consistency of entries and the completeness of the file |
|
Spreadsheet program |
Is an alternative option to enter data directly into (like Microsoft Excel), variable names should be entered in the first row, with one variable per column, each individual's data should be in a new row. The advantage of this entry approach is that it does not require creating a data entry form, defining fields and variable names, and doing other coding and testing of the data entry system. The disadvantage is that it is easy to input inconsistent codes, which makes cleaning the data much more difficult, or to accidentally enter new data over an existing row of data. |
|
Double-entry |
Consists of two individuals entering the same data (or the same person entering the data twice) into two different computer files, then comparing the records in the two files for agreement. |
|
A special software program for checking double-entry of data is called |
the Data Compare utility, which is part of the U.S. CDC's free Epi Info program. Such programs allow the individual records in two files to be linked by a unique ID number or other variable and compared, and they usually provide statistics about the agreement level. If the agreement is not extremely high, that means that double-entry and comparison of all records is probably required to ensure the accuracy of the final data file. |
|
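The idea of comparing two entry files can be sketched in a few lines of Python. This is not the Epi Info utility; compare_entries and the sample records are hypothetical, and each file is assumed to be a dictionary keyed by a unique study ID.

```python
def compare_entries(file_a, file_b):
    """Compare two data-entry passes keyed by study ID.

    Returns (agreement, disputes): the share of compared fields that
    match, and a list of (id, variable, value_a, value_b) mismatches.
    """
    disputes = []
    compared = 0
    for record_id in sorted(set(file_a) & set(file_b)):
        row_a, row_b = file_a[record_id], file_b[record_id]
        for variable in row_a:
            compared += 1
            if row_a[variable] != row_b.get(variable):
                disputes.append(
                    (record_id, variable, row_a[variable], row_b.get(variable))
                )
    agreement = (compared - len(disputes)) / compared if compared else 1.0
    return agreement, disputes


first_pass = {101: {"AGE": 34, "SMOKER": 1}, 102: {"AGE": 58, "SMOKER": 0}}
second_pass = {101: {"AGE": 34, "SMOKER": 1}, 102: {"AGE": 85, "SMOKER": 0}}

agreement, disputes = compare_entries(first_pass, second_pass)
print(agreement)  # 0.75
print(disputes)   # [(102, 'AGE', 58, 85)]
```

Each disputed entry would then be checked against the original survey form before the final file is written.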
File comparison programs usually facilitate the creation of a |
clean final data file, they identify disputed entries and allow the researcher to select the best response for the final clean data file after consulting the original survey forms |
|
Data cleaning |
is the process of correcting any typographical or other errors in data files; it should also ensure that duplicate entries are removed from the database and that the records are complete, with all data from all participants entered |
|
Recoding |
of variables into new categories can be done either prior to or during data analysis, recoding prior to intense analysis is often the easiest approach when the intended new categories are known |
|
Never do any recoding until |
an original version of the cleaned data file is safely backed up elsewhere |
|
Never recode into the same |
variable (that is do not replace the original values with the new recoded values) |
|
Always recode into a |
different (new) variable |
|
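A short sketch of recoding into a new variable while leaving the original untouched. The variable names AGE and AGE_GROUP and the cut points are assumptions made for illustration.

```python
import copy


def recode_age(records):
    """Add AGE_GROUP to a copy of each record, leaving AGE unchanged."""
    recoded = copy.deepcopy(records)  # the cleaned original stays as it was
    for row in recoded:
        age = row["AGE"]
        if age is None:
            row["AGE_GROUP"] = None
        elif age < 40:
            row["AGE_GROUP"] = "18-39"
        elif age < 65:
            row["AGE_GROUP"] = "40-64"
        else:
            row["AGE_GROUP"] = "65+"
    return recoded


cleaned = [{"ID": 1, "AGE": 23}, {"ID": 2, "AGE": 51}, {"ID": 3, "AGE": 70}]
analysis = recode_age(cleaned)
print(analysis[1])  # {'ID': 2, 'AGE': 51, 'AGE_GROUP': '40-64'}
print(cleaned[1])   # {'ID': 2, 'AGE': 51}
```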
One way to maintain confidentiality is to |
safely store paper records, including signed informed consent statements, in a locked and secure room |
|
Another way to maintain confidentiality is to |
destroy individually identifying info once the records are no longer needed (such as after the data have been entered into a computer file and the files have been thoroughly cleaned) and a research ethics committee has approved the secure disposal of consent statements and other documents. |
|
Another way to protect confidentiality is to |
Create secure computerized data files. In general, no individually identifying info (such as name or national identity card number) should be included in an electronic file |
|
If there is a need to link records to individuals |
and there is often no need to do this - then the records should be linked to identifying info by a unique study identification number, the file should be stored in a separate and secure place, not on the same computer as the other participant data |
|
Descriptive statistics |
are used to describe the basic characteristics of study populations and other data sources |
|
Statistics |
when employed properly and accurately, statistics provide essential and useful info for making sense of health research data |
|
For most papers, and especially those written by researchers with limited experience in advanced statistics, the goal of statistics |
should be to use the simplest statistics possible to make the results of the study clear |
|
Studies with no comparison groups like |
case series and cross-sectional surveys may need only univariate analysis; simple statistics like counts (frequencies), proportions, and averages, are likely to provide an adequate description of the study population |
|
For studies that compare two or more populations |
including case-control, cohort, and experimental studies - the description of the study population must be completed before moving on to bivariate analysis, such as the calculation of rate ratios, odds ratios, and other comparative statistical tests |
|
advanced statistical analysis that examines three or more variables at one time is |
rarely required |
|
variable |
is a characteristic that can be assigned more than one value, examples of variables that could be examined during a population health study are age, sex, annual income, languages spoken at home, frequency of alcohol ingestion, history of chicken pox, and use of contact lenses |
|
The value of a variable does not have to |
vary (change) over time, but the response among individuals within a population should be something that might differ |
|
In many statistical and database programs, responses from individual participants are displayed in |
rows with each column representing one variable |
|
ratio variables |
have a numeric response plotted on a scale on which a value of zero stands for nothing; for example, if height is measured in feet, a measurement of 0 feet tall means that there was no height; as a result, the ratio of heights is meaningful; a person who is 6 feet tall is twice as tall as a person who is 3 feet tall, yielding a ratio of 2 to 1 |
|
interval variables |
are also numeric, but they are plotted on a scale on which zero does not stand for "nothing"; an outside temp of 0 degrees C does not mean that there is no heat; if the weather turns colder, the temp may fall to -10 degrees C or lower; a day with a high temp of 40 degrees C is not twice as hot as a day with a maximum temp of 20 degrees C |
|
ordinal variables |
or ranked variables, order responses from first to last or from best to worst or from most favorable to least favorable; the rank order can be assigned a number; for example, the responses to a survey that asks participants to indicate their level of agreement with a statement can be coded with agree as "3", neutral as "2", and disagree as "1"; no matter what the scale is, the order of the responses is indicated by their numeric value |
|
nominal variables |
or categorical variables, have categorical responses with no inherent rank or order; for example, there is no obvious way to numerically rank participants' favorite recreational sports activities or blood types; binomial variables are a subtype of categorical variables with only two possible answers, usually yes or no |
|
continuous variables |
can take on any value within a range; for example, although height is often rounded to the nearest inch when it is measured, a person's height could actually be 64 1/2 inches or 73 3/4 inches or 58.1528 inches; ratio and interval variables can be further classified as either continuous variables or discrete variables |
|
discrete variables |
typically result from counting something, so there are gaps between acceptable values; for example, a family can own 2 egg-laying chickens or 17 chickens, but cannot own 2 1/2 chickens or 5 1/4 chickens; ratio and interval variables can be further classified as either continuous variables or discrete variables |
|
5 Types of variables |
1.) ratio variables 2.) interval variables 3.) ordinal variables or ranked variables 4.) nominal variables or categorical variables 5.) binomial variables |
|
Binomial variables are a subtype of |
categorical variables |
|
Ratio and interval variables can be further classified as either |
continuous variables or discrete variables |
|
variable type: ratio |
definition: numbers on a scale that has a meaningful zero
examples: blood pressure, height, weight (if the weight increases from 10 kg to 20 kg, the weight has doubled; so the ratio of 20 kg to 10 kg is meaningful) |
|
variable type: interval |
definition: numbers on a scale that does not have a meaningful zero
examples: temp (degree F or degree C) (The temp does not double if it increases from 20 degrees to 40 degrees because 0 degrees does not represent the absence of all heat) |
|
variable type: ordinal/ranked |
definition: an ordered series that assigns a rank to responses (from first to last in the series), but for which the numbers assigned to the values are not meaningful
examples: highest educational degree earned, scales for never (1) to always (5), scales for strongly disagree (1) to strongly agree (5) |
|
variable type: nominal/categorical |
definition: categories with no inherent rank or order
examples: employment category, blood type |
|
variable type: binomial |
definition: nominal variables for which only two responses are possible
examples: yes/no, male/female, case/control |
|
case series |
describe the study population (univariate analysis) |
|
cross-sectional survey |
describe the study population (univariate analysis) and sometimes compare groups (bivariate analysis) |
|
case-control study |
describe the study population (univariate analysis) and compare groups (bivariate analysis) and sometimes regression and other advanced analysis (multivariate analysis) |
|
cohort study |
describe the study population (univariate analysis) and compare groups (bivariate analysis) and sometimes regression and other advanced analysis (multivariate analysis) |
|
experimental study |
describe the study population (univariate analysis) and compare groups (bivariate analysis) and sometimes regression and other advanced analysis (multivariate analysis) |
|
descriptive statistics |
are often used to describe the average response to a variable in a population (for numerical variables, the average is often referred to as the central tendency) |
|
for numerical variables, the average is often referred to as this |
central tendency |
|
3 ways to report the average |
1.) mean 2.) median 3.) mode |
|
mean |
is calculated by adding up the values of all responses provided to a question and dividing that sum by the total number of individuals who answered the question |
|
median |
is the middle number when all responses are put in order from least to greatest, half of the responses in a data set will be greater than the median, and half will be less |
|
mode |
is the most common answer given by respondents |
|
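The three averages can be checked with Python's standard statistics module. The list of ages is made up for illustration.

```python
import statistics

ages = [18, 22, 22, 35, 41, 50, 64]  # hypothetical responses

mean_age = statistics.mean(ages)      # sum of responses / number of responses
median_age = statistics.median(ages)  # middle value once sorted
mode_age = statistics.mode(ages)      # most common response

print(mean_age, median_age, mode_age)
```

For these ages the mean is 36 (252 divided by 7), the median is 35, and the mode is 22.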
for ratio and interval variables, the central tendency can be described using |
means, medians, and modes |
|
for ordinal variables |
a median or mode can be reported |
|
a mode can be reported for |
categorical variables |
|
means and medians provide info about the |
center of a data set, but they do not provide info about how much variability exists in the data set; for example, the participants in a study of adults with a mean age of 50 years may all be 50 years old, or they could range from 18 to 104 years old; this info is very important to have when interpreting the meaning of results |
|
measures of spread are also called |
dispersion and are used to describe the variability and range of responses |
|
range |
for a variable is the difference between the responses with the greatest and least numeric values; for example, if the youngest participant in a study is 18 years old and the oldest is 104 years old, the range is 104-18=86 years |
|
median |
marks the value that divides the responses into two halves with equal numbers of observations |
|
quartiles |
mark the three values that divide a data set into four equal parts |
|
tertiles |
divide a data set into three equal parts |
|
quintiles |
divide a data set into five equal parts |
|
deciles |
divide a data set into 10 equal parts |
|
interquartile range (IQR) |
is the range for the 25th to 75th percentiles, which captures the middle 50% of responses |
|
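A sketch of the range, quartiles, and IQR using the standard statistics module. The ages are made up, and the "inclusive" quartile method is one of several conventions; other software may give slightly different cut points.

```python
import statistics

ages = [18, 25, 31, 38, 44, 52, 60, 71, 104]  # hypothetical responses

value_range = max(ages) - min(ages)

# Three cut points that divide the data into four equal parts.
q1, q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1  # spread of the middle 50% of responses

print(value_range, q1, q2, q3, iqr)
```

Here the range is 104 - 18 = 86, the quartiles are 31, 44 (the median), and 60, and the IQR is 29.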
boxplot |
also called a box-and-whisker plot can be used to display this info, such as IQR, median, quartiles, etc.; they can be especially helpful for displaying the distribution of responses when the responses are skewed |
|
the whiskers (inner fences) in a boxplot show the |
highest and lowest values |
|
outliers |
responses more than 1.5 IQRs below the first quartile or above the third quartile |
|
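A sketch of flagging outliers, assuming the common boxplot convention in which the fences sit 1.5 IQRs below the first quartile and 1.5 IQRs above the third quartile. The function name find_outliers and the ages are hypothetical.

```python
import statistics


def find_outliers(values):
    """Return the values that fall outside the 1.5 * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


ages = [18, 25, 31, 38, 44, 52, 60, 71, 104]
print(find_outliers(ages))  # [104]
```

With quartiles of 31 and 60 the fences are -12.5 and 103.5, so only the 104-year-old is flagged.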
skewing |
occurs when the whiskers on the boxplot extend much farther on one side of the median than on the other side |
|
histogram |
is an alternative way to display the responses to a numeric variable like a ratio variable or an interval variable |
|
on a histogram, the x-axis shows |
the value of responses |
|
on a histogram, the y-axis shows |
the count of the number of times each response was given |
|
for a graph to be considered a histogram, each bar must be the |
same width |
|
there should be no gaps between the bars in the |
middle of the distribution, where responses are clumped together (there can be gaps to indicate values of the variable with a count of 0 responses) for a histogram |
|
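A small sketch of how histogram counts are built from equal-width bins. The helper histogram_counts is hypothetical and assumes every value is at or above the chosen start; empty bins are kept so that gaps with a count of 0 remain visible.

```python
from collections import Counter


def histogram_counts(values, bin_width, start):
    """Count responses in equal-width bins beginning at `start`.

    Returns a list of (left_edge, count) pairs, including empty bins.
    """
    counts = Counter((v - start) // bin_width for v in values)
    last_bin = max(counts)
    return [(start + i * bin_width, counts.get(i, 0)) for i in range(last_bin + 1)]


ages = [18, 22, 22, 35, 41, 50, 64]  # hypothetical responses
for left_edge, count in histogram_counts(ages, bin_width=10, start=10):
    # x-axis: value of the response; bar length: number of responses
    print(f"{left_edge}-{left_edge + 9}: {'#' * count}")
```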
a histogram showing a normal distribution (Gaussian distribution) or approximately normal distribution of responses will have a |
bell-shaped curve with one peak in the middle; however, not all numeric variables have a normal distribution. The distribution may be skewed. |
|
the distribution may be skewed with responses that extend farther from the peak on either the |
left (left-skewed) or the right (right-skewed) side of the histogram |
|
the distribution may have a |
bimodal (two-peaked) distribution instead of being unimodal (one peak), or it may be uniform, with equal numbers of people providing each response |
|
for variables with a relatively normal distribution - a reasonably bell-shaped curve - the standard deviation describes |
the narrowness or wideness of the range of responses |
|
When the responses are normal: |
* About 68% of responses fall within one standard deviation above or below the mean
* About 95% of responses are within two standard deviations above or below the mean
* More than 99% of responses are within three standard deviations above or below the mean |
|
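These percentages can be checked against the standard normal distribution with Python's statistics.NormalDist. The helper share_within is a name introduced here for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```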
a small standard deviation indicates that |
most responses were fairly close to the mean |
|
a large standard deviation indicates that |
the range of responses was wide |
|
a z-score indicates |
how many standard deviations away from the sample mean an individual's response is; for example, an individual whose age is exactly the mean age in the population will have a z-score of 0, a person whose age is one standard deviation above the mean in the population will have a z-score of 1, a person whose age is two standard deviations below the population mean will have a z-score of -2 |
|
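The z-score calculation is a single subtraction and division. The sketch below assumes a hypothetical mean age of 50 years and standard deviation of 10 years.

```python
def z_score(value, mean, standard_deviation):
    """Number of standard deviations that `value` lies from the mean."""
    return (value - mean) / standard_deviation


# Hypothetical population: mean age 50 years, standard deviation 10 years.
print(z_score(50, 50, 10))  # 0.0  (exactly at the mean)
print(z_score(60, 50, 10))  # 1.0  (one standard deviation above)
print(z_score(30, 50, 10))  # -2.0 (two standard deviations below)
```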
a histogram or boxplot cannot be used to display the responses to |
categorical variables, the distribution of responses must instead be displayed in a bar chart (or, less often, a pie chart) |
|
like a histogram, the x-axis of a bar chart shows the |
values of responses |
|
like a histogram, the y-axis of a bar chart shows the |
count of the times each response was given; however, for a bar chart the x-axis can display either a number or a word |
|
a histogram requires |
numbered bars to be evenly spaced along a number line |
|
responses on bar charts |
may appear in any order |
|
the bars in bar charts can be displayed |
vertically or horizontally, and there are usually spaces between the bars |
|
the goal of descriptive statistics is to |
describe accurately all the responses to a variable |
|
for ratio and interval variables descriptive statistics |
the mean and standard deviation are typically reported |
|
for ordinal variables (and for ratio and interval variables without a normal distribution like the bell-shaped curve) descriptive statistics |
the median and interquartile range are often reported |
|
for categorical variables descriptive statistics |
the proportion of participants who provided a particular response is usually used to describe the population |
|
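A sketch of describing a categorical variable by the proportion of participants giving each response. The helper proportions and the blood-type data are made up for illustration.

```python
from collections import Counter


def proportions(responses):
    """Return the share of participants who gave each response."""
    counts = Counter(responses)
    total = len(responses)
    return {answer: count / total for answer, count in counts.items()}


blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O"]  # hypothetical data
print(proportions(blood_types))
# {'O': 0.5, 'A': 0.25, 'B': 0.125, 'AB': 0.125}
```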
variable type: ratio descriptive statistics |
common measure of central tendency: mean
common measure of spread: standard deviation
typical means of display: histogram |
|
variable type: interval descriptive statistics |
common measure of central tendency: mean
common measure of spread: standard deviation
typical means of display: histogram |
|
variable type: ordinal/ranked descriptive statistics |
common measure of central tendency: median
common measure of spread: interquartile range
typical means of display: boxplot |
|
variable type: nominal/categorical descriptive statistics |
common measure of central tendency: mode
common measure of spread: none
typical means of display: bar chart, pie chart |
|
variable type: binomial descriptive statistics |
common measure of central tendency: mode
common measure of spread: none
typical means of display: bar chart |
|
three of the most serious forms of research misconduct are |
1.) falsification 2.) fabrication 3.) plagiarism |
|
falsification |
the misrepresentation of results |
|
fabrication |
the creation of fake data |
|
plagiarism |
the use of other people's ideas or words without proper attribution |
|
statistical honesty requires more than merely avoiding outright falsification, fabrication, and plagiarism, it also requires |
adherence to accepted statistical practices |
|
scientific integrity requires researchers to |
follow established statistical practices |
|
ideally the researcher should consult with a statistician during the study design process to ensure that: (3 items) |
1.) The sampling methods and sample size are appropriate 2.) The questionnaire will yield usable data 3.) The analytic strategy is a reasonable one |
|
comparative statistics |
compare groups of participants by sex or age, by exposure or disease status, or by other characteristics. Examples of comparative statistical tests include rate ratios, odds ratios, t-tests, and Chi-square tests. |
|
study approach: case-control study |
first step: show that cases and controls are similar except for disease
key analysis: use odds ratios (ORs) to see whether cases and controls have different exposure histories |
|
study approach: cohort study |
first step: show that the exposed and unexposed are similar except for exposure status
key analysis: use rate ratios (RRs) to see whether the exposed and unexposed have different rates of incident disease |
|
study approach: experimental study |
first step: show that the individuals assigned to the intervention and control groups are similar except for exposure status
key analysis: use rate ratios (RRs) and other measures to see if the intervention and control groups have different outcomes |
|
comparative statistical tests |
categorize study participants into two or more groups and compare the characteristics of the groups; for example, the analysis of a case-control study requires using comparative tests to show that the cases (people with the disease) and controls (people without the disease) in the study were similar in terms of age distribution and other demographic characteristics; then additional comparative tests are applied to determine whether the exposure histories of cases and controls were different |
|
comparative tests can also be used to |
compare before and after characteristics of participants in longitudinal and experimental studies |
|
comparative statistical tests usually are designed to test for |
difference rather than for sameness |
|
comparative statistical test questions are usually phrased in terms of |
differences: Are the means different? Are the proportions different? Are the distributions different? |
|
each comparative statistical question about statistical difference has two possible answers: |
the values are either different or not different |
|
example statistical question: Are the means different? |
Null hypothesis (H sub 0): The means are not different.
Alternative hypothesis (H sub a): The means are different. |
|
Null hypothesis (H sub 0) |
describes the expected result of a statistical test if there is no difference between the two values being compared. (Null means nothing or zero) |
|
Null result |
means that there was no statistically significant difference |
|
Alternative hypothesis (H sub a) |
describes the expected result if there is a difference |
|
Because statistical tests do not ask questions about sameness, the answers provided by statistical tests do not allow the researcher to say conclusively |
whether two values are the same; instead a researcher must make a conclusion about whether the results of a statistical test indicate that values are different or not different |
|
the language used to describe a decision is that the researcher will either |
reject the null hypothesis or fail to reject the null hypothesis |
|
rejecting the null hypothesis |
means concluding that the values are different by rejecting the claim that the values are not different |
|
failing to reject the null hypothesis |
means concluding that there is no evidence that the values are different. Functionally, this is like saying that the values are close enough to be considered similar, but failing to reject the null hypothesis should never be taken as evidence that the values are the same |
|
the decision to reject or fail to reject the null hypothesis is based on the likelihood that the result of a test was due to |
chance |
|
when a sample population is drawn from a source population, the mean age in the sample population is usually |
not exactly the mean age of the source population |
|
the range of expected values for the mean age of sample populations drawn from a source population can be estimated using |
statistics |
|
some sample populations will have mean ages that are very close to the mean in the |
source populations, other sample populations will have mean ages that are quite far from the mean in the source population |
|
no set cutoff defines what will be considered extremely far from the mean age in the source population, but the standard is to say that |
the 5% of sample means farthest from the true mean are extreme, thus, by chance 5% of the samples drawn from a source population will be expected to have an extreme mean |
|
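A simulation sketch of this idea: draw many sample populations from one source population and mark the 5% of sample means farthest from the center. The source population (10,000 ages around 50), the sample size of 100, and the fixed seed are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical source population: 10,000 ages, roughly bell-shaped around 50.
source_ages = [random.gauss(50, 15) for _ in range(10_000)]

# Draw 2,000 sample populations of 100 people and record each sample mean.
sample_means = [
    statistics.fmean(random.sample(source_ages, 100)) for _ in range(2_000)
]

# Cut points for the lowest 2.5% and the highest 2.5% of sample means.
cuts = statistics.quantiles(sample_means, n=40)
low_cut, high_cut = cuts[0], cuts[-1]

extreme = [m for m in sample_means if m < low_cut or m > high_cut]
share_extreme = len(extreme) / len(sample_means)
print(round(low_cut, 1), round(high_cut, 1), share_extreme)
```

By construction about 5% of the sample means fall outside the two cut points, even though every sample came from the same source population.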
if two sample populations are drawn from the same source population, their mean ages will |
not be identical even though they are drawn from the same pool of individuals, comparative statistical tests accommodate this expected difference when testing whether two groups in a study population are different; the test that compares the mean ages of cases and controls in a case-control study adjusts for the fact that there will be some difference between the mean ages of cases and controls, even if the cases and controls are sampled from source populations with identical mean ages; the test will also examine whether the mean ages are so far apart that, if the cases and controls were drawn from source populations with the same mean age, the difference between the mean ages of the cases and the controls would fall in the 5% of most extreme differences expected by chance |
|
when the difference in mean ages is great, the statistical test will show that it is |
highly unlikely that a difference this large would occur by chance if the group means were not different; the researcher will therefore reject the null hypothesis and conclude that the mean ages of the cases and the controls are different |
|
the 5% of most extreme sample means = |
2.5% of the means have the lowest values and 2.5% of the means have the highest values |
|
count |
number of sample populations drawn from the source population that have a particular mean |
|
if the statistical test shows that the mean ages of cases and controls are fairly close, the researcher will |
fail to reject the null hypothesis and will conclude that there is no evidence that the means are different |
|
p-value or probability value |
for a statistical test is used to decide whether the results observed are likely to reflect real differences between groups; the interpretation is similar for all statistical tests: the p-value for the study determines whether the null hypothesis (H sub 0) will be rejected |
|
the standard is to use a significance level of |
α = 0.05 or 5%
any statistical test with a result that is in the 5% of most extreme responses expected by chance will result in the rejection of the null hypothesis |
|
some studies use α = 0.01, which makes it |
harder for a test to find a statistically significant result that would cause the rejection of the null hypothesis |
|
others use α = 0.10, which makes it more |
likely that a test will yield a statistically significant result |
|
some p-values are reported as being |
one-sided or two-sided, based on the alternative hypothesis for the statistical test |
|
when direction is specified, a |
one-sided p-value can be used |
|
when a direction is not specified, a |
two-sided p-value should be used to make the decision about rejecting or failing to reject the null hypothesis |
|
Example:
H sub 0: The means are not different. |
Conclusion When:
p < 0.05 = reject H sub 0: The means are different.
p >= 0.05 = fail to reject H sub 0: The means are not different. |
|
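The decision rule can be written as a small function. The name decide is hypothetical; the significance level defaults to 0.05 and can be changed to 0.01 or 0.10.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when p is below alpha."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


print(decide(0.03))              # reject H0
print(decide(0.05))              # fail to reject H0
print(decide(0.03, alpha=0.01))  # fail to reject H0 (stricter level)
print(decide(0.08, alpha=0.10))  # reject H0 (more lenient level)
```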
Examples of one-sided and two-sided alternative hypotheses
Null hypothesis: The means are not different. |
Two-sided alternative hypothesis (H sub a): The means are different.
One-sided alternative hypothesis (H sub a): The mean of cases is higher than the mean of controls. |
|
confidence intervals (CIs) |
provide info about the expected value of a measure in a source population based on the value of that measure in a study population |
|
the width of the interval is related to the |
sample size of the study |
|
a larger sample size will yield a |
narrower confidence interval |
|
a 95% confidence interval is usually reported, and that corresponds to a significance level of |
α = 0.05 for a statistical test; that means that 5% of the time a 95% confidence interval is expected to miss capturing the true value of a measure in the source population |
|
using a 99% confidence interval (α = 0.01) would make the confidence interval |
wider and make it more likely that the value in the source population would be captured within the confidence interval, but it would also make it more difficult to classify a result as statistically significant because fewer results would be classified as extreme. |
|
a 90% confidence interval (α = 0.10) would make the confidence interval |
narrower and make it easier for a result to be deemed statistically significant because more results would be classified as extreme, however, a 90% confidence interval would be less likely than a 95% confidence interval to capture the true value in the source population |
|
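A sketch showing how sample size and confidence level change the width of a confidence interval for a mean. It assumes a normal approximation (a simplification not stated in the cards), and the mean of 50, standard deviation of 15, and sample sizes are made up.

```python
from statistics import NormalDist


def mean_confidence_interval(mean, standard_deviation, n, confidence=0.95):
    """Normal-approximation confidence interval for a mean (an assumption)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z * standard_deviation / n ** 0.5
    return mean - margin, mean + margin


def width(interval):
    return interval[1] - interval[0]


ci_90 = mean_confidence_interval(50, 15, n=100, confidence=0.90)
ci_95 = mean_confidence_interval(50, 15, n=100, confidence=0.95)
ci_99 = mean_confidence_interval(50, 15, n=100, confidence=0.99)
ci_95_large = mean_confidence_interval(50, 15, n=400, confidence=0.95)

print([round(x, 2) for x in ci_95])  # [47.06, 52.94]
print(width(ci_90) < width(ci_95) < width(ci_99))  # True
print(width(ci_95_large) < width(ci_95))           # True
```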
a 90% confidence interval for an odds ratio (OR) is |
less likely to overlap with OR = 1 than a 99% confidence interval, so although the 90% confidence interval is less likely to capture the true odds ratio, it is also more likely that the OR will be deemed to show a statistically significant association between the exposure and the outcome |
|
some of the most common types of comparative analysis are the measures of association, such as |
the correlation used for ecological studies, the odds ratio (OR) used for case-control studies, and the rate ratio (RR) used for cohort studies. |
|
the OR and RR compare responses to two variables that have each been divided into two levels using what is sometimes called a |
2 X 2 analysis |
|
prior to using a computer to calculate an OR or RR, variables that are not already divided into two categories must be |
recoded into binomial variables (often coded numerically as yes = 1 and no = 0) |
|
The reference group for an odds ratio or rate ratio should be |
well-defined |
|
the 95% confidence interval provides info about the |
statistical significance of the tests |
|
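A sketch of an odds ratio from a 2 X 2 table with a confidence interval. The interval uses the log-based (Woolf) approximation, which is an assumption of this sketch rather than a method named in the cards, and the counts are made up. All four cells must be greater than zero.

```python
import math
from statistics import NormalDist


def odds_ratio(a, b, c, d, confidence=0.95):
    """Odds ratio and confidence interval for a 2 X 2 table.

    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    estimate = (a * d) / (b * c)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Woolf approximation
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: 40 of 100 cases and 20 of 100 controls were exposed.
estimate, lower, upper = odds_ratio(40, 60, 20, 80)
print(round(estimate, 2), round(lower, 2), round(upper, 2))  # 2.67 1.42 5.02

# The interval excludes OR = 1, so the association is statistically significant.
print(lower > 1 or upper < 1)  # True
```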
for statistical comparisons more complex than 2 X 2 analysis, analysts must select a |
test that is appropriate to the goal of the analysis and the types of variables being analyzed |
|
first, the variables to be compared should be |
selected and the goal of the test clearly stated |
|
then select a |
test that is appropriate for the types of variables being examined. Some tests require the variables being examined to have particular distributions or other characteristics; the researcher must confirm that the variables meet these assumptions of the test prior to running it and interpreting the output |
|
plan for hypothesis testing: 6 steps |
1.) select variables to compare 2.) specify the goal of the test 3.) check variable types 4.) choose appropriate test for the variables 5.) confirm that the assumptions of the test are met 6.) run test and interpret results |
|
statistical tests are often classified as being either |
parametric or nonparametric |
|
The basic difference between these two types of tests is that |
parametric tests make more assumptions about the variables being examined than nonparametric tests |
|
parametric tests |
assume that the variables being examined have particular distributions, often requiring the variables to have normal or approximately normal distributions. These tests may also require that the variance for the variable of interest - the spread of observations around the mean - be equal or at least similar in the population groups being compared |
|
nonparametric tests |
do not make assumptions about the distributions of responses |
|
parametric tests are typically |
used for ratio and interval variables with relatively normal (bell-shaped) distributions of responses. Most parametric tests are more statistically powerful than nonparametric tests, so the preference is to use a parametric test whenever the variable being examined fits reasonably well with the assumptions the test makes about sample size, distribution, and the equality of variances |
|
nonparametric tests |
are often used for ranked variables, such as responses to surveys that ask participants to indicate preferences using scales from 1 (strongly disagree) to 5 (strongly agree). They are also used when the distribution of a ratio or interval variable is non-normal. Additionally, nonparametric tests are used for categorical variables, including variables with just two groups (such as cases and controls, males and females, children and adults) |
|
the goal of some statistical tests is to |
compare the value of a statistic in a study population to some set value |
|
independent populations |
are populations in which each individual can be a member of only one of the population groups being compared |
|
statistic being evaluated: mean for ratio/interval variable (parametric tests) |
test for whether the statistic in one population is different from a hypothetical value: one sample t-test
test for whether the statistic differs in two populations: independent samples (two-sample) t-test
test for whether the statistic differs in two or more populations: one-way ANOVA (F-test)
|
|
statistic being evaluated: median for ordinal/rank variable (nonparametric tests) |
test for whether the statistic in one population is different from a hypothetical value: one-sample median test
test for whether the statistic differs in two populations: Mann-Whitney U test (Wilcoxon rank sum test, Wilcoxon-Mann-Whitney test)
test for whether the statistic differs in two or more populations: Kruskal-Wallis test
|
|
statistic being evaluated: proportion for binomial variable |
test for whether the statistic in one population is different from a hypothetical value: binomial test
test for whether the statistic differs in two populations: Fisher's exact test
test for whether the statistic differs in two or more populations: Chi-square (χ²) test
|
|
statistic being evaluated: proportions for nominal categories (variables) |
test for whether the statistic in one population is different from a hypothetical value: Chi-square (χ²) goodness-of-fit test
test for whether the statistic differs in two populations: Chi-square (χ²) test
test for whether the statistic differs in two or more populations: Chi-square (χ²) test |
|
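The four cards above amount to a lookup table for independent populations. The sketch below encodes them as a Python dictionary so that a test can be looked up by the statistic being evaluated and the number of populations compared. The key names and the function name are hypothetical. The lookup does not replace confirming that a test's assumptions are met.

```python
# Tests for independent populations, as listed on the cards above.
# Dictionary keys are hypothetical labels chosen for this sketch.
TESTS_FOR_INDEPENDENT_POPULATIONS = {
    "mean": {
        "one": "one-sample t-test",
        "two": "independent samples (two-sample) t-test",
        "two_or_more": "one-way ANOVA (F-test)",
    },
    "median": {
        "one": "one-sample median test",
        "two": "Mann-Whitney U test",
        "two_or_more": "Kruskal-Wallis test",
    },
    "proportion_binomial": {
        "one": "binomial test",
        "two": "Fisher's exact test",
        "two_or_more": "Chi-square test",
    },
    "proportions_nominal": {
        "one": "Chi-square goodness-of-fit test",
        "two": "Chi-square test",
        "two_or_more": "Chi-square test",
    },
}

def choose_test(statistic, populations):
    """Return the test named on the cards for this statistic and comparison."""
    return TESTS_FOR_INDEPENDENT_POPULATIONS[statistic][populations]

# Comparing mean ages of cases and controls (two independent populations)
print(choose_test("mean", "two"))
```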
The appropriate test to use depends on the type of |
variable being examined |
|
a two-sample (independent samples) t-test could be used to compare the |
mean ages of cases and controls participating in a case-control study |
|
a Fisher's exact test could be used to examine whether the |
proportions of males in the exposed and unexposed groups of a cohort study are similar |
|
A Chi-square test could be used to determine whether |
the distributions of participants by race or ethnicity are similar for the intervention and control groups of an experimental study |
|
when running statistical tests, it is often beneficial to create a |
table of basic info about the variables of interest for each of the comparison groups as well as the result of the statistical tests used to compare those populations |
|
a different set of tests is used when the goal is to |
compare before and after results in the same individuals |
|
if the goal is to see whether on average a participant in a cohort study gained weight between the baseline exam and the 1 year follow up exam a |
matched pairs t-test can be used |
|
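A minimal sketch of the matched pairs t-test for the weight example above: the t statistic is the mean of the within-person differences divided by its standard error. The weights are made up, and the function name is an assumption. The sketch stops at the t statistic and degrees of freedom, since a p-value would normally come from statistical software or a t table.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Return (t statistic, degrees of freedom) for matched pairs."""
    differences = [a - b for a, b in zip(after, before)]
    n = len(differences)
    mean_difference = statistics.mean(differences)
    # Standard error of the mean difference uses the sample standard deviation
    standard_error = statistics.stdev(differences) / math.sqrt(n)
    return mean_difference / standard_error, n - 1

# Hypothetical weights (kg) at baseline and at the 1 year follow up exam
baseline = [70, 82, 65, 90, 77]
follow_up = [72, 85, 64, 93, 80]

t_statistic, degrees_of_freedom = paired_t_statistic(baseline, follow_up)
print(t_statistic, degrees_of_freedom)
```

A positive t statistic here means that weights were, on average, higher at follow up than at baseline.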
if the goal is to see whether a safe driving course improves the pass rates for a driving licensure exam a |
McNemar's test can be used to examine how many participants switched from failing a pretest to passing a post test, how many switched from passing a pretest to failing a post test, and how many had no change in status, McNemar's test can also determine whether the differences indicate that the course had a significant impact on exam pass rates |
|
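The driving course example above can be sketched as follows. McNemar's test uses only the participants who changed status (the discordant pairs). This version computes the chi-square statistic without a continuity correction and gets the p-value from the chi-square distribution with 1 degree of freedom. The counts and the function name are hypothetical. An exact binomial version is often preferred when the number of discordant pairs is small.

```python
import math

def mcnemar_test(fail_to_pass, pass_to_fail):
    """McNemar's chi-square (no continuity correction) and its p-value.

    Only the discordant pairs enter the calculation; participants whose
    status did not change do not affect the statistic.
    """
    chi_square = (fail_to_pass - pass_to_fail) ** 2 / (fail_to_pass + pass_to_fail)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical counts: 25 switched from fail to pass, 10 from pass to fail
chi_square, p_value = mcnemar_test(25, 10)
print(chi_square, p_value)
```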
test for whether the value of the variable is different in one population measured twice (such as before and after in the same population) or in two paired groups: matched pairs (paired) t-test for the ratio/interval variable (parametric tests) |
Test for whether the value of the variable is different in two or more matched groups: one-way repeated measures ANOVA for ratio/interval variable (parametric tests) |
|
test for whether the value of the variable is different in one population measured twice (such as before and after in the same population) or in two paired groups: Wilcoxon (matched pairs) signed rank test or sign test for matched pairs for ordinal/rank variables (nonparametric tests) |
Test for whether the value of the variable is different in two or more matched groups: Friedman test for ordinal/rank variables (nonparametric tests) |
|
test for whether the value of the variable is different in one population measured twice (such as before and after in the same population) or in two paired groups: McNemar's test for binomial variables |
Test for whether the value of the variable is different in two or more matched groups: Cochran's Q test for binomial variables |
|
test for whether the value of the variable is different in one population measured twice (such as before and after in the same population) or in two paired groups: McNemar's test for nominal categories (variables) |
Test for whether the value of the variable is different in two or more matched groups: Cochran's Q test for nominal categories (variables) |
|
only a very limited number of studies require |
regression analysis or any of the other advanced statistics |
|
researchers should not use these advanced statistical tests without first knowing |
when to use them, what conditions have to be met to make their use appropriate, how to run them, and how to interpret them |
|
one of the main reasons researchers use multivariate statistical models, that is, analysis of three or more variables at one time, is to |
examine the interactions that may occur among variables, this may be especially helpful when a third variable (also called an extraneous variable or lurking variable) may be concealing or distorting the true relationship between the two other variables |
|
several different types of third variable effects might occur, including |
confounding and effect modification |
|
confounder |
may make the association between an exposure variable and an outcome variable appear more or less significant than it truly is |
|
when a third variable is shown to be a confounder, an |
adjusted measure of association, such as an age-adjusted odds ratio, should be reported for the association between the exposure and outcome; for example, age might be a confounder |
|
effect modifier sometimes called an interaction term |
is a third variable that often represents biologically distinct groups of individuals who might experience different biological responses to various exposures, for example, menopausal status might be an effect modifier |
|
if a third variable is shown to be an effect modifier, it is usually best to report |
separate stratum specific measures of association for each level of the effect modifier (such as separate results for premenopausal and postmenopausal women), pooling the results for the biologically different groups may hide meaningful differences, so an adjusted or crude measure of association should not be reported when effect modification is occurring |
|
to be a confounder or effect modifier, the |
third variable must be independently associated with both an exposure (or predictor) variable and an outcome variable |
|
confounding example |
OR for females = OR for males, but neither equals the crude OR |
|
effect modification example |
OR for females ≠ OR for males, and both differ from the crude OR |
|
neither confounding nor effect modification example |
OR for females = OR for males = crude OR |
|
when a third variable is associated with both |
the exposure and the outcome of interest, unadjusted analysis may hide the true association between the exposure and the outcome; these two relationships should be confirmed. Then a crude odds ratio or other measure of association for the relationship between the exposure and the outcome should be calculated, along with a separate measure of association for each level of the third variable, such as separate odds ratios for males and females. |
|
how to identify confounding |
1.) confirm that the third variable is significantly associated with the exposure 2.) confirm that the third variable is significantly associated with the outcome 3.) calculate three measures of association (ORs or RRs): a.) the crude OR between the exposure and outcome b.) the OR for stratum 1 of the third variable c.) the OR for stratum 2 of the third variable 4.) interpret results
|
|
crude and stratum specific measures are compared using a |
Breslow-Day test for homogeneity or interaction, a -2 log likelihood test, or another appropriate statistical test |
|
after running a suitable test, the interpretation is as follows |
1.) If the crude and stratum specific odds ratios are all similar, then neither confounding nor effect modification is occurring. Report a crude measure. 2.) If the stratum specific measures of association are equivalent to one another, but different from the crude measure of association, the third variable is a confounder. Report an adjusted measure. 3.) If the stratum specific measures of association are different from one another and different from the crude measure of association, the third variable is an effect modifier. Report stratum specific measures. |
|
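The three interpretation rules above can be sketched as a small decision function. This is only an illustration: it judges whether odds ratios are "similar" with an arbitrary tolerance on the log scale (an assumption made for this sketch), whereas a real analysis would rely on a formal test such as the Breslow-Day test. The stratum counts are made up.

```python
import math

def odds_ratio(a, b, c, d):
    """OR for a 2 x 2 table: a, b = exposed with/without the outcome;
    c, d = unexposed with/without the outcome."""
    return (a * d) / (b * c)

def classify_third_variable(crude_or, stratum_ors, tolerance=0.10):
    """Apply the three interpretation rules using a hypothetical tolerance."""
    def similar(x, y):
        return abs(math.log(x) - math.log(y)) <= tolerance

    strata_agree = all(similar(stratum_ors[0], s) for s in stratum_ors[1:])
    if not strata_agree:
        return "effect modifier"   # report stratum specific measures
    if all(similar(crude_or, s) for s in stratum_ors):
        return "neither"           # report the crude measure
    return "confounder"            # report an adjusted measure

# Made-up counts for two strata of a third variable
stratum_1 = (40, 10, 8, 2)
stratum_2 = (2, 8, 10, 40)
# The crude table pools the two strata cell by cell
crude = tuple(x + y for x, y in zip(stratum_1, stratum_2))

label = classify_third_variable(
    odds_ratio(*crude), [odds_ratio(*stratum_1), odds_ratio(*stratum_2)]
)
print(label)
```

In this made-up example both stratum specific odds ratios equal 1 while the crude odds ratio is well above 1, so the sketch labels the third variable a confounder.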
regression is often the easiest way to |
adjust for one or more confounding variables or interaction terms during analysis |
|
regression models |
seek to understand the relationship between one or more predictor (independent) variables and one outcome (dependent) variable |
|
predictor |
independent variable |
|
outcome |
dependent variable |
|
The models allow the effect of one predictor variable on the outcome to be examined while |
controlling for other predictor variables (keeping them constant) |
|
two most common types of regression |
linear regression logistic regression |
|
some statistical software programs require the analyst to |
1.) select a variety of specifications for the model, such as the particular estimation technique (often ordinary least squares, generalized least squares, or maximum likelihood estimation) 2.) choose the method the computer will use to select variables for inclusion in the model; for example, an enter method will include all predictor variables in the model; a forward stepwise method adds the best predictor variables to the model one at a time until adding an additional variable does not significantly improve the fit of the model; a backward stepwise method deletes variables from the model until deleting a variable significantly reduces the fit of the model 3.) check the fit of the model by examining its residual terms, which measure how well real data match the values predicted by the model, and the results of statistical tests of the goodness-of-fit for the model |
|
steps in fitting a regression model |
1.) select one outcome (dependent) variable 2.) identify the appropriate type of regression (such as a linear or logistic model) for the outcome variable 3.) select one or more predictor (independent) variables 4.) check to make sure that any assumptions required for the model (such as the variable types or distributions of outcome and predictor variables) are met 5.) choose a selection method for helping the computer decide which set of predictor variables will produce the best fit model (the model that the computer determines is the best at explaining the relationship between the predictor variables and the outcome variable) 6.) examine the model for potential problems. For example, examine residuals for possible autocorrelation, check for possible interaction between predictor variables (such as the multi-collinearity that might occur when two predictor variables are highly correlated), and look for other potential problems that might need to be addressed. 7.) interpret the results of the regression model, and consider whether they are logical (for example, that all necessary covariates are included and all illogical ones are excluded) |
|
a linear regression model is used when the |
outcome variable is a ratio or interval variable |
|
simple linear regression models |
examine whether there is a linear relationship between one predictor variable and the outcome variable |
|
the relationship between the predictor and outcome variables can be visually displayed using a |
scatterplot, and the regression model finds the best fit line for those points |
|
the slope of the line is the |
coefficient for the predictor variable (often designated B in the output of statistical software programs) |
|
the y intercept for the line is the |
coefficient for the constant in the regression model |
|
these values can be used to write an equation for the best fit line, and that equation can be used to predict |
the expected value of the outcome variable for various values of the predictor variable |
|
r^2 |
square of the correlation coefficient, provides the info about how well the regression model predicts the variation in the values of the outcome variable, the value of r^2 ranges from 0 to 1, with the larger values indicating a better model fit |
|
if r^2 = 0.79 |
it means that the predictor variable explains 79% of the variation in the values of the outcome variable |
|
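To make the slope, y intercept, and r^2 cards concrete, here is a minimal ordinary least squares sketch for simple linear regression. The data points are made up, and the function name is an assumption. Statistical software would normally report these values along with standard errors and p-values.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept, r_squared) for the best fit line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx                     # coefficient (B) for the predictor
    intercept = mean_y - slope * mean_x   # coefficient for the constant
    r_squared = sxy ** 2 / (sxx * syy)    # square of the correlation coefficient
    return slope, intercept, r_squared

# Hypothetical predictor and outcome values
predictor = [1, 2, 3, 4, 5]
outcome = [2, 4, 5, 4, 5]

slope, intercept, r_squared = fit_simple_linear_regression(predictor, outcome)
print(slope, intercept, r_squared)
```

For these points the predictor explains 60% of the variation in the outcome, since r^2 works out to 0.6.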
if predictor 1 is held constant, a |
1 unit increase in predictor 2 is associated with a 0.6 unit increase in the expected value of the outcome variable |
|
equation for simple linear regression model |
outcome = 3.1 * predictor 1 + 0.9 |
|
equation for a multiple linear regression model with two continuous variables |
outcome = 0.5 * predictor 1 + 0.6 * predictor 2 - 6.2 |
|
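The multiple linear regression equation on the card above can be used directly for prediction. The short sketch below plugs values into that equation and shows what "holding predictor 1 constant" means in practice. The input values are arbitrary, and the function name is an assumption.

```python
def predicted_outcome(predictor_1, predictor_2):
    """Equation from the card: outcome = 0.5 * predictor 1 + 0.6 * predictor 2 - 6.2"""
    return 0.5 * predictor_1 + 0.6 * predictor_2 - 6.2

# Hold predictor 1 constant at 10 and raise predictor 2 by 1 unit
low = predicted_outcome(10, 20)
high = predicted_outcome(10, 21)
print(low, high, high - low)
```

The difference between the two predictions is about 0.6, the coefficient for predictor 2.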
multiple linear regression models |
examine the effects of several predictor variables on the value of the outcome variable |
|
the coefficients (B) for the predictor variables and the constant can be used to write an |
equation for a best fit line; that equation can be used to examine the effect of each predictor variable on the outcome variable while controlling the other predictors by holding their values constant |
|
multiple linear regression models |
can have both continuous and categorical predictor variables, as long as the responses to categorical variables are expressed by numbers |
|
the predictor variables in multiple linear regression models may |
interact; for example, interaction may be occurring when the best fit regression lines for males and females have considerably different slopes |
|
logistic regression models (sometimes called logit regression models) |
are used when the outcome variable is a dichotomous variable; logistic regression is commonly used in case-control studies, for which the outcome variable is usually case status, with case = 1 and control = 0; other dichotomous outcomes, such as yes/no variables, are typically coded yes = 1 and no = 0 |
|
predictor variables for a logistic regression can be |
categorical or continuous |
|
the coefficient for a predictor variable in a logistic regression model is the |
natural log of the odds ratio, ln (OR), so the odds ratio for the association between that predictor variable and the outcome variable can be found by taking the exponential of the coefficient, exp (B) |
|
the odds ratio for each predictor variable represents the |
change in the odds of the outcome, typically the odds of being a case or being classified as a yes, for a 1 unit change in the predictor variable |
|
the confidence interval for the odds ratio can be calculated using the |
value of the coefficient and its standard error |
|
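As a small illustration of the logistic regression cards above, the sketch below converts a coefficient (B) and its standard error into an odds ratio with a 95% confidence interval by exponentiating B and B plus or minus 1.96 standard errors. The coefficient and standard error values are hypothetical, as is the function name.

```python
import math

def odds_ratio_from_coefficient(b, standard_error, z=1.96):
    """Return (OR, lower limit, upper limit) from a logistic coefficient."""
    odds_ratio = math.exp(b)
    lower = math.exp(b - z * standard_error)
    upper = math.exp(b + z * standard_error)
    return odds_ratio, lower, upper

# Hypothetical software output: B = 0.693, standard error = 0.25
estimate, lower_limit, upper_limit = odds_ratio_from_coefficient(0.693, 0.25)
print(estimate, lower_limit, upper_limit)
```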
the predictor variables in regression models can take a variety of forms but must have |
numeric responses |
|
nominal categorical variables have responses that cannot be |
ordered and assigned a rank, but a series of dummy variables that convert categorical responses to a series of dichotomous (0/1) variables can be created; additionally, when fitting a logistic regression model, it might be helpful to convert ratio and interval variables to dummy variables so that a series of odds ratios for the levels of the variable can be estimated |
|
if the original categorical variable has n possible responses, then |
n - 1 dummy variables are required to capture all the responses to the original question; all n - 1 variables should be included in a regression model (even if some may be eliminated during a stepwise selection process) |
|
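The n - 1 rule for dummy variables can be sketched in a few lines. One response is chosen as the reference group, and every other response gets its own 0/1 variable. The blood type responses, the variable naming scheme, and the function name are assumptions made for illustration.

```python
def make_dummy_variables(responses, reference):
    """Convert a nominal variable into n - 1 dichotomous (0/1) variables.

    The reference category gets no variable of its own; it is represented
    by zeros on all of the dummy variables.
    """
    levels = sorted(set(responses) - {reference})
    return [
        {f"is_{level}": int(response == level) for level in levels}
        for response in responses
    ]

# Hypothetical nominal variable with n = 4 possible responses
blood_types = ["A", "B", "O", "AB"]
rows = make_dummy_variables(blood_types, reference="O")
for blood_type, row in zip(blood_types, rows):
    print(blood_type, row)
```

With four possible responses the sketch produces three dummy variables, and a participant in the reference group is coded 0 on all three.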
survival analysis |
examines the distribution of the durations of time that individuals in a study population experience from an initial time point (such as the time of enrollment in a study or the time of diagnosis of a particular condition) until some well-defined event, which can be death or some other outcome |
|
measures of survival include |
1.) median survival time 2.) cumulative survival at set times after diagnosis 3.) life tables that record conditional and cumulative probabilities of survival 4.) Kaplan-Meier plots that display cumulative survival rates |
|
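Cumulative survival and median survival time can be illustrated with a bare-bones Kaplan-Meier calculation. At each time when an event occurs, the running survival estimate is multiplied by the conditional probability of surviving that time among those still at risk. The durations and event indicators below are made up, and the function names are assumptions. Dedicated survival software also provides confidence intervals and plots.

```python
def kaplan_meier(durations, events):
    """Return [(time, cumulative survival)] at each observed event time.

    durations: follow-up time for each participant
    events: 1 if the event (such as death) was observed, 0 if censored
    """
    event_times = sorted({t for t, e in zip(durations, events) if e})
    cumulative_survival = 1.0
    curve = []
    for time in event_times:
        at_risk = sum(1 for d in durations if d >= time)
        event_count = sum(1 for d, e in zip(durations, events) if d == time and e)
        # Conditional probability of surviving past this time
        cumulative_survival *= 1 - event_count / at_risk
        curve.append((time, cumulative_survival))
    return curve

def median_survival_time(curve):
    """First event time at which cumulative survival is 50% or lower."""
    for time, survival in curve:
        if survival <= 0.5:
            return time
    return None  # median not reached during follow-up

# Hypothetical follow-up times in months; 0 marks a censored observation
months = [2, 3, 3, 5, 8, 9]
observed = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(months, observed)
print(curve)
print(median_survival_time(curve))
```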
log rank test can be used to determine |
whether survival is shorter in one population than in another |
|
Cox proportional hazards regression |
which estimates a hazard ratio that compares durations to an event (such as death) in two populations, can also be used for survival |
|
If GPS (global positioning system) coordinates or other geographic data have been collected, then |
spatial software programs may be useful for conducting the geographic portion of the analysis |
|
the geographic data should be incorporated into a |
GIS (geographical information system) |
|
The GIS allows for spatial analysis such as |
1.) the identification of spatial disease clusters (using a statistic like Moran's coefficient or Geary's coefficient) 2.) the determination of associations, if any, between the social or physical environment and disease 3.) the estimation of distances between locations 4.) the ascertainment of the geographic factors that are related to access to health services |
|
research articles almost always have the same structure |
1.) abstract 2.) introduction 3.) methods 4.) results 5.) discussion |
|
abstract |
summary of the article, the most important function is to serve as an advertisement for the manuscript, key words |
|
abstract |
should be accurate, reasonably complete, and compelling; most are limited to 150-250 words; it's usually easier to write the abstract after the rest of the paper has already been written and the focus, key results, and conclusions are already clear |
|
two types of abstracts |
1.) structured abstract uses subheadings, like objectives, methods, results, and conclusions to highlight content 2.) unstructured abstract usually follows the same outline but does not list the section titles |
|
introduction section |
provides the essential background info that a reader must know to understand the methods and results of the article, this section often includes info about the study population, the study site, and the study years; it might include a comparison to previous studies and a discussion of what is novel about the new study, but that content might appear in the discussion section instead; most intros conclude with a statement about the importance or significance of the study and the specific aims, objectives, or hypothesis that the paper will address |
|
methods section |
should begin by clearly identifying the study design used; if person, place, and time characteristics were not provided in the intro, they should be listed in this section; definitions should be provided for the key exposures, outcomes, and other variables; the methods section should provide info about ethical considerations, ethical issues can be included in the endmatter, depending on preference of journal; this section should end with a description of the statistical methods used; it can be written before data collection begins because most of the methods are finalized before data collection starts
for a case-control study, the case definition should be spelled out; for an experimental study, the intervention and control should both be described in detail; for some studies, supplying the exact phrasing and order of questionnaire items, along with the steps taken to validate the survey instrument, might be important |
|
for primary studies |
the methods used to identify, sample, and recruit participants should be described and the inclusion and exclusion criteria listed; the methods for collecting data should also be described, including interview techniques, laboratory methods, physical examination checklists, and measurement methods; study design; key exposures, outcomes, and other variables; setting and dates of study |
|
for secondary analysis |
the report should specify who collected the data originally, how they were collected, how they were acquired for secondary analysis, and the role, if any, that the authors of the new paper had in data collection |
|
results section |
should start with a description of the study population that clearly identifies the sample size and the demographics of the participants; additional results of statistical analysis should then be provided, using tables and figures; the results of a statistical test should not be reported unless the authors fully understand when that test can be used and how it should be interpreted; number of participants at each stage of study |
|
discussion section |
usually begins with a summary of the key findings of the new study; ideally, the key findings should match the aims, objectives, or hypothesis spelled out in the last paragraph in the intro; the ensuing paragraphs should compare the new study to previous studies and include a thorough discussion on the relevant existing lit and an adequate number of citations; every paper needs to include at least one paragraph on the limitations of the study and it should identify potential problems such as types of bias; the final paragraph of the discussion should state the conclusions of the study and might include new theories that emerge from analysis; generalizability of study |
|
endmatter |
1.) the affiliations of the authors and their contact info (if not listed on title page) 2.) contributions of each author to paper 3.) acknowledgments of people who assisted with the study but did not meet authorship criteria 4.) info about some ethical aspects of research (informed consent) 5.) a list of all funding sources 6.) disclosures of the presence or absence of possible conflicts of interest |
|
tables and figures |
many health journals limit the number of tables and figures allowed for each article, often to a max of 4 (tables and figures combined) |
|
tables |
should be used to organize and present statistical results that cannot be easily listed in the text in a sentence or two |
|
graphs and other figures |
should be used when a visual presentation of the material is more effective than words at conveying a result; there is no need to repeat info in the text that is provided in a table or figure, but be sure to have a callout in the text for each table and figure that indicates when the reader should refer to it |
|
a table should provide enough info so that it can be independently interpreted and understood even in the absence of the text: |
1.) the title of the table should provide a brief but clear description of the content 2.) the rows and columns should each have a descriptive label, i.e. units, sample sizes (n) 3.) for each statistic provide a confidence interval, p-value, and measure of uncertainty such as standard deviation or standard error for a mean or interquartile range for a median 4.) a note just below the table or in the title bar should explain the meaning of asterisks and other symbols commonly used to denote statistical significance and other items of interest 5.) consistent fonts, spacing, and number of decimal points should be used for all tables in the manuscript |
|
graph |
should provide info in the title, figure, and/or legend or key for a reader to be able to interpret the graph even without reading the related portion of the text; high-resolution photographs, maps, flowcharts, and other images provided by the authors can also be used as figures |
|
bar graphs |
used to display categorical data |
|
systematic review |
checklist: PRISMA (preferred reporting items for systematic reviews and meta-analyses) |
|
meta-analysis |
checklist: PRISMA, MOOSE (meta-analysis of observational studies in epidemiology) |
|
Cross-sectional survey |
checklist: STROBE - cross-sectional, case-control, cohort (strengthening the reporting of observational studies in epidemiology) |
|
experimental study |
checklist: CONSORT (consolidated standards of reporting trials for randomized controlled trials), TREND (transparent reporting of evaluations with nonrandomized designs) |
|
qualitative studies |
checklist: COREQ (consolidated criteria for reporting qualitative research) |
|
introduction section |
of a manuscript usually provides the background necessary to understand the importance of the new work |
|
discussion section |
typically provides an extensive comparison of the results of the new study to the results of previously published works |
|
a typical article in the health sciences refers to about |
20 or 30 other articles published in peer-reviewed journals |
|
references |
should be carefully selected to support the importance, validity, and conclusions of the study; can also be used to acknowledge the alternative methodological approaches that could have been used, to identify both areas in which the new findings agree with the existing lit and areas where findings contradict previous studies and to provide varying perspectives on the policy and practice implications of the study |
|
citing an article |
is a way of endorsing the work of its authors, except in the rare instances when specific flaws need to be pointed out |
|
do not trust this to be reliable |
abstracts |
|
journal articles are the preferred source of |
evidentiary support for scientific articles, although books, book chapters, and formal reports (such as those published by governmental agencies and international organizations) are also acceptable |
|
formal scientific reports |
1.) are published in a peer-reviewed journal (or sometimes in a report or book), not on a website, in a newspaper, or in a popular magazine 2.) describe the study design and explain why it was appropriate for the objectives of the study 3.) explain how exposures and outcomes were defined and assessed 4.) describe the analytic approaches used and present results using easily interpreted tables and graphs 5.) draw conclusions that are reasonable and based on the study's data 6.) discuss the limitations of the study 7.) compare the new study to previous studies 8.) follow a standard outline and other conventions for scientific writing |
|
informal sources |
1.) website or fact sheet 2.) newspaper or popular magazine
citable? rarely
|
|
formal sources |
1.) statistical database - citable 2.) official report - citable 3.) book or book chapter - citable 4.) abstract - not citable 5.) article - citable |
|
few scientific articles |
quote directly from another source word for word |
|
one of the most important reasons is that borrowing phrases and sentences from other sources can make the writing in a document |
choppy |
|
paraphrasing |
saying the same thing in one's own words, does not remove the requirement to cite an original source, just means quotations don't have to be used; use an in-text citation |
|
direct quote |
entire quote must be in quotation marks or, if long, indented from the left margin |
|
specific knowledge |
such as a statistic or the results of a particular field or lab study must be cited; however, some areas of general knowledge or common knowledge do not require citation |
|
common knowledge |
refers to what a typical person in the discipline would know, it does not refer to what a randomly selected person at the grocery store would know; when in doubt cite |
|
plagiarism |
occurs when someone's wording, thinking, or creative output is repeated in a new document without attribution
copying the exact words, paraphrasing a unique theory or observation, and using an image without permission and acknowledgment are all forms of plagiarism; plagiarism, like redundant publication, falsification of data, and fabrication, is a major violation of scholarly integrity, and the affected article must be retracted with public acknowledgment |
|
helpful habits when using sources |
1.) never cut and paste info into your document, paraphrase and cite it 2.) always include reference in research notes for later citation |
|
citations typically appear in two forms |
1.) as in-text citations where the sources of info are briefly identified in the text 2.) in a reference list at the end of the document that provides full bibliographic info for each source |
|
most medical and public health journals use some version of a citation style |
ICMJE (International Committee of Medical Journal Editors) style, Vancouver style, NLM (National Library of Medicine) style, AMA (American Medical Association) style, or APA (American Psychological Association) style, which is commonly used for social science and nursing journals; some journals use their own preferred style |
|
some journals will convert bracketed citations to |
superscript numbers during the editing and layout process |
|
reference list |
can be either alphabetical or in order of appearance in article |
|
journals that use ICMJE style or a variant typically |
list authors by last name and first initials, then title with capitals only for proper nouns, an abbreviated journal name, pub year, volume, and page numbers, separated by periods or semicolons or commas; some journals expect all authors to be listed no matter how many; some use an abbreviated version for 6 or more; some use full journal name; some list issue numbers, but most do not; key is to be consistent |
|
vast majority of work has been completed when |
1.) study question has been identified and refined 2.) study approach has been selected and a protocol developed 3.) data have been collected and analyzed |
|
three key times to address writer's motivation |
1.) first, writers must overcome barriers to getting started 2.) second, writers must find ways to prolong the period of high productivity that often occurs at the start of the writing project 3.) finally, most writers become fatigued during the writing process and at some point lose all desire to even think about their projects; they must find motivation |
|
if researcher does not know how to begin writing, an easy way to start is |
1.) put working title for paper along with names of authors 2.) add in headers for abstract, intro, methods, results, discussion, acknowledgments, and references 3.) fill in names of people to thank in acknowledgments section 4.) paste in a table or figure that was created during the analysis 5.) paste in some relevant lines about methods from the protocol 6.) then start filling in gaps 7.) write a sentence or two for each key point
content does not need to be added in any particular order |
|
staying motivated |
change habits or scenery, make a timeline, set weekly meeting with advisor, speak content out loud or write informally |
|
many manuscripts for health science journals are limited to a max of about |
3000 words |
|
writer's block |
can last for weeks or months |
|
what authors can do |
1.) fully explain the actual methods used 2.) run all the appropriate analyses 3.) include a helpful set of references that support the results 4.) polish the prose 5.) honestly identify the limitations of the study and explain what was done to address them
no paper is perfect, but few are fatally flawed, and the steps above help strengthen a paper |
|
lead author is responsible for checking |
the manuscript very carefully |
|
every paper should tell a story that has |
1.) a beginning - the intro sets the stage 2.) a middle - the methods and results say what happened 3.) an end - the discussion provides a conclusion that ties all the parts of the story together |
|
the story line should be able to be summarized into a |
sentence or two |
|
some journals require a precis that is |
35 words or less |
|
the abstract of a report should tell the whole story in |
one compelling paragraph |
|
the first step in editing is to make sure that the big picture is |
being clearly communicated |
|
does the paper tell a compelling story? |
1.) does the paper have a clear story line, can plot be summarized in one sentence 2.) does the title of the paper reflect the key aspects of the study 3.) does the abstract tell the key parts of the story 4.) do the opening paragraphs draw the reader into the story 5.) is the goal of the study clearly stated in the intro section 6.) does the methods section make it clear how the methods were helpful in answering the study question 7.) do the results and discussion sections provide the answer to the study questions 8.) is the story missing any parts that need to be added, are there any gaps in logic that need to be addressed 9.) are any parts of the manuscript redundant, any parts peripheral to the main story 10.) are the conclusions fully supported by data 11.) does each paragraph have a theme |
|
once the pieces of the paper's story are clear, the next step is to |
check the structure and content of the manuscript; the paper should be well organized, complete, but concise, and accurate about what was done and what was found; the text, the tables and figures, and the reference list must all meet these same requirements |
|
checklist for structure and content of paper |
1.) paper well organized, content focused 2.) intro provide all essential background info (person, place, and time listed in both abstract and the text) 3.) does the intro make research appear important and necessary 4.) does the intro say why study is novel 5.) are methods described in adequate detail 6.) is enough stat analysis presented, is each stat included necessary 7.) are tables and figures well designed 8.) are all stats presented in figures or tables or in text, but not in both 9.) does discussion provide concise summary of key findings and place new findings in context of previous research, does discussion avoid reiterating results section 10.) does discussion adequately address potential limitations of study 11.) is every claim in discussion supported by citations, should additional references be added 12.) is every reference listed important and necessary 13.) has paper been double checked for plagiarism 14.) is every part of paper truthful |
|
in a final check, look at |
each word, sentence, paragraph, and section and examine for clarity
1.) words must be used carefully 2.) sentences must be concise and clear 3.) voice must be consistent 4.) grammar and spelling must be proper |
|
checklist for style and clarity |
1.) are words used precisely, incidence and prevalence used correctly 2.) unnecessary jargon avoided, are definitions for all key terms provided 3.) are all abbreviations introduced at first use 4.) is the tone of writing appropriate, fact based and not emotion based 5.) does the article consistently use a third person voice or in rare cases first person voice 6.) do all subjects agree with verbs, active verbs rather than passive 7.) verb tense consistent 8.) each sentence clear, words spelled correctly 9.) all punctuation correct 10.) paper adhere to all guidelines for target journal |
|
research results are often publicly shared for the |
first time during an oral presentation or poster session at an academic or professional conference |
|
primary outcome of most professional and academic conferences is |
networking, conferences are a place to exchange ideas and a way to find out what others find interesting about project and to identify weak aspects of study |
|
some conferences are |
annual large events and others are small gatherings |
|
most conferences include a mix of |
1.) plenary sessions where keynote addresses are given 2.) business meetings run by officers of sponsoring organization 3.) concurrent sessions in which multiple panels of oral presentations are held at the same time in different rooms 4.) poster sessions in which attendees can mingle while reviewing research posters |
|
presenters are usually assigned to give either an |
oral presentation or a poster presentation |
|
oral presentation |
requires speaking in front of a potentially large audience and may involve facing an open question and answer period; oral presentations are usually considered more prestigious than posters because there are more slots for posters |
|
poster sessions |
usually held in a less formal venue, require more preparation time than oral presentations usually |
|
researchers are required to submit |
an abstract for consideration by the organizing committee
the committee rates the abstracts, decides who will be invited to present, and selects who will give an oral presentation or a poster
a good abstract has key words and conveys one clear health message that is appropriate to the audience |
|
most conferences require presenters to |
pay a registration fee, often several hundred dollars |
|
when preparing a poster give equal attention to |
content and its design |
|
sample layout for poster |
title, author info, intro, methods, results, conclusions, references, acknowledgments
come prepared with necessary items |
|
a typical oral presentation time slot is about |
15 minutes long; bc of set up and questions, actual presentation time is 10 - 12 minutes
most presenters can cover 1 - 2 slides per minute, that is about 12 - 20 slides for a 10 - 12 minute talk
highlight key message with images in place of words as often as appropriate
|
|
sample slide distribution for a 10 - 12 minute talk |
1.) title slide = 1 slide 2.) research goal/importance = 1 slide (start with key message) 3.) outline or summary = 1 slide 4.) background/specific aims = 1 - 2 slides 5.) methods = 2 - 4 slides 6.) results = 4 - 8 slides 7.) strengths/limitations = 1 slide 8.) conclusions = 1 slide (end with key message) 9.) acknowledgments and/or invitation for questions = 0 -1 slide 10.) total slides = 12 - 20 slides
develop a checklist for presentation slide show for content and layout and formatting
preparing a slide show is only the first step in preparing to make an oral presentation, practice is key
|
|
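The sample slide distribution above can be checked with a little arithmetic. This is a minimal sketch, assuming the per-section ranges listed on the card; the names `SLIDE_BUDGET`, `total_slides`, and `minutes_needed` are hypothetical and chosen only for illustration.

```python
# Slide ranges (minimum, maximum) copied from the sample distribution
# for a 10 - 12 minute talk. The dictionary name is hypothetical.
SLIDE_BUDGET = {
    "title": (1, 1),
    "research goal/importance": (1, 1),
    "outline or summary": (1, 1),
    "background/specific aims": (1, 2),
    "methods": (2, 4),
    "results": (4, 8),
    "strengths/limitations": (1, 1),
    "conclusions": (1, 1),
    "acknowledgments/questions": (0, 1),
}


def total_slides(budget):
    """Return the (minimum, maximum) total slide count for a budget."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high


def minutes_needed(n_slides, slides_per_minute):
    """Estimate speaking time for a deck at a given pace."""
    return n_slides / slides_per_minute


if __name__ == "__main__":
    low, high = total_slides(SLIDE_BUDGET)
    print(f"total slides: {low} - {high}")
    # The shortest deck at the slow pace and the longest deck at the
    # fast pace both fit the 10 - 12 minute speaking window.
    print(minutes_needed(low, 1), minutes_needed(high, 2))
```

The per-section minimums and maximums sum to the card's stated total of 12 - 20 slides, which is a quick way to confirm that a planned deck fits the time slot before practicing.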
a few weeks before conference |
confirm what equipment will be provided in presentation room, some expect presenters to bring their own computer, some require uploading presentation file to a website in advance, some ask for files to be e-mailed to moderators, some expect the file to be on CD or flash drive, make sure you have a back-up |
|
presenters should |
adhere to time limits, arrive 15 minutes early, check in with moderator, set everything up so ready, keep responses to questions in presentation short, acknowledge limitations and highlight strengths, thank everyone, have business cards |
|
the culmination of a well designed and carefully conducted health research project is often the |
dissemination of results through an appropriate publication |
|
target journal |
should be identified early |
|
examination of recent articles published in target journal provides guidance for |
1.) best outline to follow 2.) how to divide commentary between intro and discussion 3.) appropriate voice and writing style 4.) amount of technical detail to include 5.) reference and citation style |
|
choosing a target journal entails many considerations including |
1.) aim and scope of journal 2.) its audience 3.) its impact factor and other characteristics 4.) the possible cost of publication 5.) online access options
# 1 and 2 are most important |
|
determining whether an article is a fit with a specialty or regional journal is often |
straightforward
knowing what topics fall within the scope of a general journal is a little harder |
|
one way to identify journals likely to consider a paper for publication is to |
examine the manuscript's reference list |
|
the target journal should not be selected primarily bc of its |
impact factor, ranking, or reputation, even though these are important factors to consider
impact factor is based on number of times a typical article is cited in its first year or two after publication |
|
journals with an impact factor of 10 or greater |
1.) Science 2.) Nature 3.) JAMA 4.) The Lancet 5.) The New England Journal of Medicine
most journals in health sciences have an impact factor closer to 1 or 2, specialty journals may have one less than 1
impact factors are often listed on journal websites and in Web of Knowledge |
|
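The impact factor cards above can be illustrated with a worked calculation. This is a minimal sketch, assuming the commonly used two-year definition (citations received in one year to items the journal published in the two preceding years, divided by the number of citable items it published in those two years); the cards themselves only say "first year or two". The function name and the example numbers are hypothetical.

```python
def two_year_impact_factor(citations_this_year, citable_items_prev_two_years):
    """Citations received this year to items from the previous two years,
    divided by the number of citable items published in those two years.

    Assumes the common two-year definition; both inputs are plain counts.
    """
    if citable_items_prev_two_years <= 0:
        raise ValueError("number of citable items must be positive")
    return citations_this_year / citable_items_prev_two_years


if __name__ == "__main__":
    # Hypothetical journal: 150 citable items over two years,
    # cited 300 times in the following year.
    print(two_year_impact_factor(300, 150))
```

Under this definition, a journal whose typical recent article is cited once or twice in the following year lands in the range of 1 or 2 that the card describes for most health science journals.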
some journals require short reports of |
1000 - 1500 words, one table or figure, and a limited number of references; this is an appealing option for a case report, small case series, or an update to a previously submitted article |
|
a comprehensive report of a large study will |
exceed the usual 3000 - 3500 word limit or the standard 4 tables and/or figures, and will require a journal that has more flexible limits |
|
many big name journals with low acceptance rates have a turnaround time of |
only a few days or weeks bc they send few manuscripts out for external review
specialty journals with higher acceptance rates may have a turnaround time of several months bc three or more external referees will review manuscripts |
|
most journals have moved to |
online submission systems, some require mail or e-mail submission |
|
an increasing number of journals require |
authors to pay for some publishing costs, such as a submission fee, a processing fee (or processing charge), or a page fee (or page charge); some require authors to pay to become a member of the journal; an open access fee allows the journal to make the article available online immediately, and some authors are given the choice of whether to pay it; sometimes the fees are waived for low-income |
|
some journals publish only |
in print, the vast majority allow online access to subscribers |
|
being indexed in a competitive database like |
MEDLINE, which examines the quality and editorial rigor of all journals |
|
submission to a journal is not the |
end of the writing process, additional revisions will be likely |
|
one journal must |
be selected; submitting to two or more journals at the same time is not permitted in health sciences, and journals require a statement affirming this |
|
author guidelines |
state how manuscripts should be formatted
special attention should be paid to tables, figures, and other images when formatting; they do not need to match the typographic style of the journal
|
|
graphs and maps and other illustrations |
are rarely reworked by a journal's graphic designer
most journals charge fee for printing in color |
|
most submissions are made via |
computer and a cover letter is still expected
cover letter should summarize manuscript and seek to convince the editor that the work is important, valid, original, and a good fit with the aims of the journal, decision could be based on abstract and/or cover letter solely |
|
sample cover letter content |
salutation, basic info, summary, importance, fit, required declarations, thanks, names/signatures |
|
corresponding author |
the coauthor who will communicate with the journal and answer questions from readers after the paper is published; this coauthor needs to register
the corresponding author may be the first author, the senior coauthor, or the coauthor with the most stable e-mail address and affiliation |
|
some journals will ask for |
1.) the type of article 2.) word count 3.) number of tables 4.) number of figures 5.) statements about ethics, funding, conflicts of interest 6.) confirmation that the article is being submitted only to one journal |
|
ad hoc reviewer |
reviewers who are not on the journal's editorial board who are asked to serve as peer reviewers bc of their expertise on the paper's topic or methods |
|
desk rejection |
rejection without review |
|
reviewers provide two sets of comments |
one on the quality of the manuscript for the author, and one with comments for the editor |
|
external review can lead to three possible results |
1.) rejection 2.) opportunity to revise and resubmit 3.) acceptance |
|
minor revision |
may be reviewed by the assistant editor after resubmission
ex: typos, formatting of tables, not enough citations |
|
major revision |
may be sent back to the original reviewers
|
|
provisional acceptance |
final acceptance pending a few minor adjustments |
|
responding to suggestions from editors and reviewers requires an author to |
1.) understand and appreciate different perspectives 2.) balance conflicting sets of advice about what would strengthen a paper 3.) deal with frustration of needing to rethink and rewrite whole portions to make it clearer 4.) recover from harsh criticism |
|
if a project finds no association between an exposure and outcome the results may be |
even more important to publish so that other scientists do not waste their time and resources
publishing a study with null results is often more challenging than publishing a study that finds an unexpected or strong association |
|
the research process does not necessarily end with a |
report
the research process is a cycle in which data analysis and reporting feed back into the formation of new study questions and establishment of a personal research trajectory |
|
publishing enhances the authors' |
CVs and resumes |