351 Cards in this Set

  • Front
  • Back

research steps

1.) Identify study question


2.) Select study approach


3.) Design study & collect data


4.) Analyze data


5.) Report findings

Most research projects require only the use of

descriptive and perhaps some comparative statistics

4 Types of statistics

1.) Data management


2.) Descriptive statistics


3.) Comparative statistics


4.) Advanced health statistics

Data Management

Refers to the entire process of record keeping, whether tracking articles considered for eligibility in a systematic review, extracting data from patient charts for a case series, entering the responses to a cross-sectional or case-control survey, or recording all the results of clinical assessment conducted during a longitudinal cohort or experimental study. After data are entered, the files need to be cleaned and perhaps recoded before beginning statistical analysis.

Codebook

describes each variable and specifies how the collected info will be entered into a computer database; this should be done prior to beginning data entry

For quantitative surveys numeric or alphabetical codes can be assigned to the options for

close-ended answers provided on the questionnaire

For open-ended questions and qualitative surveys

a code book is even more essential because it provides clear instructions for how to code and enter free-response comments

In addition to providing specific info about how each piece of info should be entered into the computer file, the code book should specify:

1.) The name of each variable (which usually employs only capital letters or a combination of capital letters and numbers, and avoids starting with a symbol, such as an underscore)


2.) The wording of the question that was asked


3.) The variable type


4.) The options listed on the survey as possible answers to the question


5.) The way answers should be entered into the computer database


6.) What to do with missing numbers

The code book is also the place to describe how

anticipated data problems will be handled

The code book will also specify for each variable whether

missing answers should be left blank in the database, indicated with a numeric code (such as entering a 9 if the expected entry code is 0 or 1 for a dichotomous variable), or marked with the word "MISSING".
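
The code book entries described above can be sketched as a small data structure. This is a minimal illustration only: the variable names, question wording, and codes below are hypothetical and do not come from a real study, and the missing code 9 follows the example given on this card.

```python
# Hypothetical codebook for two survey variables; every name and code here
# is illustrative, not taken from a real study.
CODEBOOK = {
    "SMOKER": {
        "question": "Do you currently smoke cigarettes?",
        "type": "binomial",
        "codes": {0: "no", 1: "yes"},
        "missing": 9,  # numeric missing code, distinct from the valid codes
    },
    "AGREE1": {
        "question": "Exercise is important to me.",
        "type": "ordinal",
        "codes": {1: "disagree", 2: "neutral", 3: "agree"},
        "missing": 9,
    },
}


def check_entry(variable, value):
    """Return 'valid', 'missing', or 'invalid' for one entered value."""
    spec = CODEBOOK[variable]
    if value == spec["missing"]:
        return "missing"
    if value in spec["codes"]:
        return "valid"
    return "invalid"
```

Checking each entry against the code book at data entry time is one way to catch inconsistent codes before the cleaning step.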

Data are usually entered into a

database program (like Microsoft Access). One of the benefits of these programs is that they can be designed to be visually appealing and to include pre-approved responses to questions and automatic skips between questions; this ensures the consistency of entries and the completeness of the file

Spreadsheet program

Is an alternative option: data can be entered directly into a spreadsheet (like Microsoft Excel). Variable names should be entered in the first row, with one variable per column, and each individual's data should be in a new row. The advantage of this entry approach is that it does not require creating a data entry form, defining fields and variable names, and doing other coding and testing of the data entry system. The disadvantage is that it is easy to input inconsistent codes, which makes cleaning the data much more difficult, or to accidentally enter new data over an existing row of data.

Double-entry

Consists of two individuals entering the same data (or the same person entering the data twice) into two different computer files, then comparing the records in the two files for agreement.

A special software program for checking double-entry of data is called

The Data Compare utility that is part of the U.S. CDC's free Epi Info program allows the individual records in two files to be linked by a unique ID number or other variable and compared. These programs usually provide statistics about the agreement level. If the agreement is not extremely high, that means that double-entry and comparison of all records is probably required to ensure the accuracy of the final data file.

File comparison programs usually facilitate the creation of a

clean final data file; they identify disputed entries and allow the researcher to select the best response for the final clean data file after consulting the original survey forms
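
The double-entry comparison can be sketched in a few lines. This is not how Epi Info's Data Compare utility works internally; it is a hypothetical stand-in that assumes both entry passes contain the same IDs and variables, and the sample records are invented for illustration.

```python
def compare_entries(file_a, file_b):
    """Compare two data-entry passes keyed by participant ID.

    Each argument maps an ID to a dict of variable -> entered value.
    Returns (agreement_proportion, list of disputed (id, variable) pairs).
    Assumes both files contain the same IDs and variables.
    """
    total = 0
    disputed = []
    for pid in sorted(file_a):
        for variable, value_a in file_a[pid].items():
            total += 1
            if file_b[pid][variable] != value_a:
                disputed.append((pid, variable))
    agreement = (total - len(disputed)) / total
    return agreement, disputed


# Invented example: the second pass transposed the digits of one age.
first_pass = {101: {"AGE": 34, "SMOKER": 0}, 102: {"AGE": 51, "SMOKER": 1}}
second_pass = {101: {"AGE": 34, "SMOKER": 0}, 102: {"AGE": 15, "SMOKER": 1}}
agreement, disputed = compare_entries(first_pass, second_pass)
```

Each disputed pair would then be resolved by consulting the original survey form, as the card describes.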

Data cleaning

is the process of correcting any typographical or other errors in data files; it should also ensure that duplicate entries are removed from the database and that the records are complete, with all data from all participants entered into the database

Recoding

of variables into new categories can be done either prior to or during data analysis; recoding prior to intense analysis is often the easiest approach when the intended new categories are known

Never do any recoding until

an original version of the cleaned data file is safely backed up elsewhere

Never recode into the same

variable (that is, do not replace the original values with the new recoded values)

Always recode into a

different (new) variable
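
A minimal sketch of recoding into a new variable while leaving the original untouched. The variable names AGE and AGEGRP and the cut points at 40 and 65 are assumptions chosen for illustration.

```python
def add_age_group(records):
    """Return copies of the records with a new AGEGRP variable added.

    The original AGE values are never overwritten. The cut points are
    illustrative assumptions.
    """
    recoded = []
    for record in records:
        new_record = dict(record)  # copy so the input is not modified
        age = record["AGE"]
        if age < 40:
            new_record["AGEGRP"] = 1
        elif age < 65:
            new_record["AGEGRP"] = 2
        else:
            new_record["AGEGRP"] = 3
        recoded.append(new_record)
    return recoded


cleaned = [{"ID": 1, "AGE": 23}, {"ID": 2, "AGE": 58}, {"ID": 3, "AGE": 71}]
recoded = add_age_group(cleaned)
```

Because the function returns copies, the cleaned records stay available as the backed-up original.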

One way to maintain confidentiality is to

safely store paper records, including signed informed consent statements, in a locked and secure room

Another way to maintain confidentiality is to

destroy individually identifying info once the records are no longer needed (such as after the data have been entered into a computer file and the files have been thoroughly cleaned) and a research ethics committee has approved the secure disposal of consent statements and other documents.

Another way to protect confidentiality is to

Create secure computerized data files. In general, no individually identifying info (such as name or national identity card number) should be included in an electronic file

If there is a need to link records to individuals

and there is often no need to do this - then the records should be linked to identifying info by a unique study identification number; the file that links study identification numbers to identifying info should be stored in a separate and secure place, not on the same computer as the other participant data

Descriptive statistics

are used to describe the basic characteristics of study populations and other data sources

Statistics

when employed properly and accurately, statistics provide essential and useful info for making sense of health research data

For most papers, and especially those written by researchers with limited experience in advanced statistics, the goal of statistics

should be to use the simplest statistics possible to make the results of the study clear

Studies with no comparison groups like

case series and cross-sectional surveys may need only univariate analysis; simple statistics like counts (frequencies), proportions, and averages, are likely to provide an adequate description of the study population

For studies that compare two or more populations

including case-control, cohort, and experimental studies - the description of the study population must be completed before moving on to bivariate analysis, such as the calculation of ratios, odds ratios, and other comparative statistical tests

advanced statistical analysis that examines three or more variables at one time is

rarely required

variable

is a characteristic that can be assigned more than one value; examples of variables that could be examined during a population health study are age, sex, annual income, languages spoken at home, frequency of alcohol ingestion, history of chicken pox, and use of contact lenses

The value of a variable does not have to

vary (change) over time, but the response among individuals within a population should be something that might differ

In many statistical and database programs, responses from individual participants are displayed in

rows with each column representing one variable

ratio variables

have a numeric response plotted on a scale on which a value of zero stands for nothing; for example, if height is measured in feet, a measurement of 0 feet tall means that there was no height; as a result, the ratio of heights is meaningful; a person who is 6 feet tall is twice as tall as a person who is 3 feet tall, yielding a ratio of 2 to 1

interval variables

are also numeric, but they are plotted on a scale on which zero does not stand for "nothing"; an outside temp of 0 degrees C does not mean that there is no heat; if the weather turns colder, the temp may fall to -10 degrees C or lower; a day with a high temp of 40 degrees C is not twice as hot as a day with a maximum temp of 20 degrees C
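
The temperature example can be checked with a short calculation. The conversion to kelvins is an addition not found on the card; it is used here only because the Kelvin scale has a true zero, so ratios on it are meaningful.

```python
def celsius_to_kelvin(temp_c):
    """Convert degrees Celsius to kelvins (a scale with a true zero)."""
    return temp_c + 273.15


# On the Celsius (interval) scale the arithmetic ratio is 2,
# but that ratio has no physical meaning.
ratio_celsius = 40 / 20

# On the Kelvin (ratio) scale the same two days differ by about 7%.
ratio_kelvin = celsius_to_kelvin(40) / celsius_to_kelvin(20)
```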

ordinal variables

or ranked variables, order responses from first to last or from best to worst or from most favorable to least favorable; the rank order can be assigned a number; for example, the responses to a survey that asks participants to indicate their level of agreement with a statement can be coded with agree as "3", neutral as "2", and disagree as "1"; no matter what the scale is, the order of the responses is indicated by their numeric value

nominal variables

or categorical variables, have categorical responses with no inherent rank or order; for example, there is no obvious way to numerically rank participants' favorite recreational sports activities or blood types; binomial variables are a subtype of categorical variables with only two possible answers, usually yes or no

continuous variables

can take on any value within a range; for example, although height is often rounded to the nearest inch when it is measured, a person's height could actually be 64 1/2 inches or 73 3/4 inches or 58.1528 inches; ratio and interval variables can be further classified as either continuous variables or discrete variables

discrete variables

typically result from counting something, so there are gaps between acceptable values; for example, a family can own 2 egg-laying chickens or 17 chickens, but cannot own 2 1/2 chickens or 5 1/4 chickens; ratio and interval variables can be further classified as either continuous variables or discrete variables

5 Types of variables

1.) ratio variables


2.) interval variables


3.) ordinal variables or ranked variables


4.) nominal variables or categorical variables


5.) binomial variables

Binomial variables are a subtype of

categorical variables

Ratio and interval variables can be further classified as either

continuous variables or discrete variables

variable type: ratio

definition: numbers on a scale that has a meaningful zero



examples: blood pressure, height, weight (if the weight increases from 10 kg to 20 kg, the weight has doubled; so the ratio of 20 kg to 10 kg is meaningful)

variable type: interval

definition: numbers on a scale that does not have a meaningful zero



examples: temp (degree F or degree C) (The temp does not double if it increases from 20 degrees to 40 degrees because 0 degrees does not represent the absence of all heat)

variable type: ordinal/ranked

definition: an ordered series that assigns a rank to responses (from first to last in the series), but for which the numbers assigned to the values are not meaningful



examples: highest educational degree earned, scales for never (1) to always (5), scales for strongly disagree (1) to strongly agree (5)

variable type: nominal/categorical

definition: categories with no inherent rank or order



examples: employment category, blood type

variable type: binomial

definition: nominal variables for which only two responses are possible



examples: yes/no, male/female, case/control

case series

describe the study population (univariate analysis)

cross-sectional survey

describe the study population (univariate analysis) and sometimes compare groups (bivariate analysis)

case-control study

describe the study population (univariate analysis) and compare groups (bivariate analysis) and sometimes regression and other advanced analysis (multivariate analysis)

cohort study

describe the study population (univariate analysis) and compare groups (bivariate analysis) and sometimes regression and other advanced analysis (multivariate analysis)

experimental study

describe the study population (univariate analysis) and compare groups (bivariate analysis) and sometimes regression and other advanced analysis (multivariate analysis)

descriptive statistics

are often used to describe the average response to a variable in a population (for numerical variables, the average is often referred to as the central tendency)

for numerical variables, the average is often referred to as this

central tendency

3 ways to report the average

1.) mean


2.) median


3.) mode

mean

is calculated by adding up the values of all responses provided to a question and dividing that sum by the total number of individuals who answered the question

median

is the middle number when all responses are put in order from least to greatest; half of the responses in a data set will be greater than the median, and half will be less

mode

is the most common answer given by respondents
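
The three averages can be computed with Python's standard `statistics` module. The ages below are invented for illustration.

```python
import statistics

ages = [18, 21, 21, 30, 45]  # invented responses

mean_age = statistics.mean(ages)      # sum of responses / number of responses
median_age = statistics.median(ages)  # middle value of the sorted responses
mode_age = statistics.mode(ages)      # most common response
```

Here the single older participant pulls the mean (27) above the median and mode (both 21).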

for ratio and interval variables, the central tendency can be described using

means, medians, and modes

for ordinal variables

a median or mode can be reported

a mode can be reported for

categorical variables

means and medians provide info about the

center of a data set, but they do not provide info about how much variability exists in the data set; for example, the participants in a study of adults with a mean age of 50 years may all be 50 years old, or they could range from 18 to 104 years old; this info is very important to have when interpreting the meaning of results

measures of spread are also called

dispersion and are used to describe the variability and range of responses

range

for a variable is the difference between the responses with the greatest and least numeric values; for example, if the youngest participant in a study is 18 years old and the oldest is 104 years old, the range is 104-18=86 years

median

marks the value that divides the responses into two halves with equal numbers of observations

quartiles

mark the three values that divide a data set into four equal parts

tertiles

divide a data set into three equal parts

quintiles

divide a data set into five equal parts

deciles

divide a data set into 10 equal parts

interquartile range (IQR)

is the range for the 25th to 75th percentiles, which captures the middle 50% of responses
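
A short sketch of the range, quartiles, and IQR using Python's `statistics.quantiles`. The ages are invented, with the youngest and oldest matching the 18 and 104 used in the range example. Statistical packages use slightly different conventions for quartiles; the `"inclusive"` method is an assumption made here, and other methods can give slightly different cut points.

```python
import statistics

ages = [18, 20, 22, 24, 26, 28, 30, 32, 104]  # invented responses

# Range: greatest value minus least value.
data_range = max(ages) - min(ages)

# Quartiles: the three cut points that divide the data into four parts.
q1, q2, q3 = statistics.quantiles(ages, n=4, method="inclusive")

# Interquartile range: spread of the middle 50% of responses.
iqr = q3 - q1
```

The range (86) is dominated by the one 104-year-old, while the IQR (8) describes where most responses fall.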

boxplot

also called a box-and-whisker plot, can be used to display info such as the IQR, median, and quartiles; boxplots can be especially helpful for displaying the distribution of responses when the responses are skewed

the whiskers (inner fences) in a boxplot show the

highest and lowest values that are not outliers

outliers

responses more than 1.5 IQRs below the first quartile or above the third quartile
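
A minimal sketch of flagging outliers. It assumes the common boxplot convention of measuring 1.5 IQRs outward from the first and third quartiles; both that convention and the `"inclusive"` quartile method are assumptions here, and the ages are invented.

```python
import statistics


def find_outliers(values):
    """Return values more than 1.5 IQRs outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


ages = [18, 20, 22, 24, 26, 28, 30, 32, 104]  # invented responses
outliers = find_outliers(ages)
```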

skewing

occurs when the whiskers on the boxplot extend much farther on one side of the median than on the other side

histogram

is an alternative way to display the responses to a numeric variable like a ratio variable or an interval variable

on a histogram, the x-axis shows

the value of responses

on a histogram, the y-axis shows

the count of the number of times each response was given

for a graph to be considered a histogram, each bar must be the

same width

there should be no gaps between the bars in the

middle of the distribution, where responses are clumped together (there can be gaps to indicate values of the variable with a count of 0 responses) for a histogram

a histogram showing a normal distribution (Gaussian distribution) or approximately normal distribution of responses will have a

bell-shaped curve with one peak in the middle; however, not all numeric variables have a normal distribution. The distribution may be skewed.

the distribution may be skewed with responses that extend farther from the peak on either the


left (left-skewed) or the right (right-skewed) side of the histogram

the distribution may have a

bimodal (two-peaked) distribution instead of being unimodal (one peak), or it may be uniform, with equal numbers of people providing each response

for variables with a relatively normal distribution - a reasonably bell-shaped curve - the standard deviation describes

the narrowness or wideness of the range of responses

When the responses are normal:

* 68% of responses fall within one standard deviation above or below the mean


* 95% of responses are within two standard deviations above or below the mean


* More than 99% of responses are within three standard deviations above or below the mean
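
These percentages can be reproduced from the standard normal distribution with `statistics.NormalDist`. This is a sketch for a perfectly normal variable; real data will only approximate these shares.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within_1, within_2, within_3 = (share_within(k) for k in (1, 2, 3))
```

The three values come out near 0.683, 0.954, and 0.997.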

a small standard deviation indicates that

most responses were fairly close to the mean

a large standard deviation indicates that

the range of responses was wide

a z-score indicates

how many standard deviations away from the sample mean an individual's response is; for example, an individual whose age is exactly the mean age in the population will have a z-score of 0, a person whose age is one standard deviation above the mean in the population will have a z-score of 1, a person whose age is two standard deviations below the population mean will have a z-score of -2
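
A z-score is each response minus the sample mean, divided by the sample standard deviation. The sketch below uses invented ages and the sample standard deviation from `statistics.stdev`.

```python
import statistics


def z_scores(values):
    """Return the z-score of each value relative to the sample."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [(v - mean) / sd for v in values]


ages = [40, 45, 50, 55, 60]  # invented responses with a mean of 50
scores = z_scores(ages)
```

The participant aged exactly 50 gets a z-score of 0, and the youngest and oldest get scores of equal size with opposite signs.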

a histogram or boxplot cannot be used to display the responses to

categorical variables, the distribution of responses must instead be displayed in a bar chart (or, less often, a pie chart)

like a histogram, the x-axis of a bar chart shows the

values of responses

like a histogram, the y-axis of a bar chart shows the

count of the times each response was given; however, for a bar chart the x-axis can display either a number or a word

a histogram requires

numbered bars to be evenly spaced along a number line

responses on bar charts

may appear in any order

the bars in bar charts can be displayed

vertically or horizontally, and there are usually spaces between the bars

the goal of descriptive statistics is to

describe accurately all the responses to a variable

for ratio and interval variables


descriptive statistics

the mean and standard deviation are typically reported

for ordinal variables (and for ratio and interval variables without a normal distribution like the bell-shaped curve)


descriptive statistics

the median and interquartile range are often reported

for categorical variables


descriptive statistics

the proportion of participants who provided a particular response is usually used to describe the population

variable type: ratio


descriptive statistics

common measure of central tendency: mean



common measure of spread: standard deviation



typical means of display: histogram

variable type: interval


descriptive statistics

common measure of central tendency: mean



common measure of spread: standard deviation



typical means of display: histogram



variable type: ordinal/ranked


descriptive statistics

common measure of central tendency: median



common measure of spread: interquartile range



typical means of display: boxplot


variable type: nominal/categorical


descriptive statistics

common measure of central tendency: mode



common measure of spread: none



typical means of display: bar chart, pie chart

variable type: binomial


descriptive statistics

common measure of central tendency: mode



common measure of spread: none



typical means of display: bar chart

three of the most serious forms of research misconduct are

1.) falsification


2.) fabrication


3.) plagiarism

falsification

the misrepresentation of results

fabrication

the creation of fake data

plagiarism

the use of other people's ideas or words without proper attribution

statistical honesty requires more than merely avoiding outright falsification, fabrication, and plagiarism; it also requires

adherence to accepted statistical practices

scientific integrity requires researchers to

follow established statistical practices

ideally the researcher should consult with a statistician during the study design process to ensure that: (3 items)

1.) The sampling methods and sample size are appropriate


2.) The questionnaire will yield usable data


3.) The analytic strategy is a reasonable one

comparative statistics

compare groups of participants by sex or age, by exposure or disease status, or by other characteristics. Examples of comparative statistical tests include rate ratios, odds ratios, t-tests, and Chi-square tests.

study approach: case-control study

first step: show that cases and controls are similar except for disease



key analysis: use odds ratios (ORs) to see whether cases and controls have different exposure histories

study approach: cohort study

first step: show that the exposed and unexposed are similar except for exposure status



key analysis: use rate ratios (RRs) to see whether the exposed and unexposed have different rates of incident disease

study approach: experimental study

first step: show that the individuals assigned to the intervention and control groups are similar except for exposure status



key analysis: use rate ratios (RRs) and other measures to see if the intervention and control groups have different outcomes

comparative statistical tests

categorize study participants into two or more groups and compare the characteristics of the groups; for example, the analysis of a case-control study requires using comparative tests to show that the cases (people with the disease) and controls (people without the disease) in the study were similar in terms of age distribution and other demographic characteristics; then additional comparative tests are applied to determine whether the exposure histories of cases and controls were different

comparative tests can also be used to

compare before and after characteristics of participants in longitudinal and experimental studies

comparative statistical tests usually are designed to test for

difference rather than for sameness

comparative statistical test questions are usually phrased in terms of

differences: Are the means different?


Are the proportions different?


Are the distributions different?

each comparative statistical question about statistical difference has two possible answers:

the values are either different or not different

example statistical question: Are the means different?

Null hypothesis (H sub 0): The means are not different.



Alternative hypothesis (H sub a): The means are different.

Null hypothesis (H sub 0)

describes the expected result of a statistical test if there is no difference between the two values being compared. (Null means nothing or zero)

Null result

means that there was no statistically significant difference

Alternative hypotheses (H sub a)

describes the expected result if there is a difference

Because statistical tests do not ask questions about sameness, the answers provided by statistical tests do not allow the researcher to say conclusively

whether two values are the same; instead a researcher must make a conclusion about whether the results of a statistical test indicate that values are different or not different

the language used to describe a decision is that the researcher will either

reject the null hypothesis or


fail to reject the null hypothesis

rejecting the null hypothesis

means concluding that the values are different by rejecting the claim that the values are not different

failing to reject the null hypothesis

means concluding that there is no evidence that the values are different; functionally, this is like saying that the values are close enough to be considered similar, but failing to reject the null hypothesis should never be taken as evidence that the values are the same

the decision to reject or fail to reject the null hypothesis is based on the likelihood that the result of a test was due to

chance

when a sample population is drawn from a source population, the mean age in the sample population is usually

not exactly the mean age of the source population

the range of expected values for the mean age of sample populations drawn from a source population can be estimated using

statistics

some sample populations will have mean ages that are very close to the mean in the

source population; other sample populations will have mean ages that are quite far from the mean in the source population

no set cutoff defines what will be considered extremely far from the mean age in the source population, but the standard is to say that

the 5% of sample means farthest from the true mean are extreme; thus, by chance, 5% of the samples drawn from a source population will be expected to have an extreme mean
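
The idea that about 5% of sample means land in the extreme tails by chance can be illustrated with a small simulation. Everything here is an assumption made for the sketch: a normally distributed source population with a mean age of 50 and a standard deviation of 10, samples of 50 people, and the usual cutoff of 1.96 standard errors for the most extreme 5%.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_MEAN, TRUE_SD, SAMPLE_SIZE, N_SAMPLES = 50, 10, 50, 2000

# Standard error of the mean for samples of this size.
standard_error = TRUE_SD / SAMPLE_SIZE ** 0.5

# Draw many sample populations and record each one's mean age.
sample_means = [
    statistics.fmean(random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(SAMPLE_SIZE))
    for _ in range(N_SAMPLES)
]

# Count sample means more than 1.96 standard errors from the true mean;
# for a normal source population this cutoff marks the most extreme 5%.
extreme = [m for m in sample_means if abs(m - TRUE_MEAN) > 1.96 * standard_error]
share_extreme = len(extreme) / N_SAMPLES
```

The share of extreme means should come out close to 0.05, even though every sample was drawn from the same source population.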

if two sample populations are drawn from the same source population, their mean ages will

not be identical even though they are drawn from the same pool of individuals; comparative statistical tests accommodate this expected difference when testing whether two groups in a study population are different; the test that compares the mean ages of cases and controls in a case-control study adjusts for the fact that there will be some difference between the mean ages of cases and controls, even if the cases and controls are sampled from source populations with identical mean ages; the test will also examine whether the mean ages are so far apart that, if the cases and controls were drawn from source populations with the same mean age, the difference between the mean ages of the cases and the controls would fall in the 5% of most extreme differences expected by chance

when the difference in mean ages is great, the statistical test will show that it is

highly unlikely that a difference this large would arise by chance if the group means were not different; the researcher will therefore reject the null hypothesis and conclude that the mean ages of the cases and the controls are different

the 5% of most extreme sample means =

2.5% of the means have the lowest values and


2.5% of the means have the highest values

count

number of sample populations drawn from the source population that have a particular mean

if the statistical test shows that the mean ages of cases and controls are fairly close, the researcher will

fail to reject the null hypothesis and will conclude that the means are not different

p-value or probability value

for a statistical test is used to decide whether the results observed are likely to reflect real differences between groups; the interpretation is similar for all statistical tests: the p-value for the study determines whether the null hypothesis (H sub 0) will be rejected

the standard is to use a significance level of

α = 0.05 or 5%



any statistical test with a result that is in the 5% of most extreme responses expected by chance will result in the rejection of the null hypothesis

some studies use α = 0.01, which makes it

harder for a test to find a statistically significant result that would cause the rejection of the null hypothesis

others use α = 0.10, which makes it more

likely that a test will yield a statistically significant result

some p-values are reported as being

one-sided or two-sided, based on the alternative hypothesis for the statistical test

when direction is specified, a

one-sided p-value can be used

when a direction is not specified, a

two-sided p-value should be used to make the decision about rejecting or failing to reject the null hypothesis
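
The difference between one-sided and two-sided p-values can be sketched for a test statistic that follows the standard normal distribution (a z statistic). The choice of a z-test and the value z = 1.8 are assumptions made for illustration; the cards do not name a specific test here.

```python
from statistics import NormalDist


def p_values(z):
    """Return (two_sided, one_sided_upper) p-values for a z statistic."""
    upper_tail = 1 - NormalDist().cdf(z)            # H_a: value is higher
    two_sided = 2 * (1 - NormalDist().cdf(abs(z)))  # H_a: value is different
    return two_sided, upper_tail


two_sided, one_sided = p_values(1.8)
```

With z = 1.8 the one-sided p-value is about 0.036 and the two-sided p-value is about 0.072, so the same result falls on opposite sides of 0.05 depending on which alternative hypothesis was specified.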

Example:



H sub 0: The means are not different.

Conclusion When:



p < 0.05 = reject H sub 0: The means are different.



p >= 0.05 = fail to reject H sub 0: The means are not different.
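
The decision rule on this card can be written as a small function. The function name and return strings are hypothetical; the default of 0.05 is the standard significance level described on the cards.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule for a test at significance level alpha."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"
```

For example, `decide(0.03)` rejects the null hypothesis at the 0.05 level, while `decide(0.03, alpha=0.01)` fails to reject it under the stricter level.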

Examples of one-sided and two-sided alternative hypotheses



Null hypothesis: The means are not different.

Two-Sided alternative hypothesis (H sub a):


The means are different.



Example of a one-sided alternative hypothesis (H sub a): The mean of cases is higher than the mean of controls.


confidence intervals (CIs)

provide info about the expected value of a measure in a source population based on the value of that measure in a study population

the width of the interval is related to the

sample size of the study

a larger sample size will yield a

narrower confidence interval
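
The link between sample size and interval width can be sketched with a normal-approximation confidence interval for a mean. The formula (mean plus or minus z times the standard deviation divided by the square root of n) and the numbers used are assumptions for illustration.

```python
from statistics import NormalDist


def mean_ci(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sd / n ** 0.5
    return mean - margin, mean + margin


# Same mean and standard deviation, two different sample sizes.
low_100, high_100 = mean_ci(50, 10, 100)
low_400, high_400 = mean_ci(50, 10, 400)
```

Quadrupling the sample size from 100 to 400 halves the width of the interval, from about 3.92 years to about 1.96 years.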

a 95% confidence interval is usually reported, and that corresponds to a significance level of

α = 0.05 for a statistical test; that means that 5% of the time a 95% confidence interval is expected to miss capturing the true value of a measure in the source population

using a 99% confidence interval (α = 0.01) would make the confidence interval

wider and make it more likely that the value in the source population would be captured within the confidence interval, but it would also make it more difficult to classify a result as statistically significant because fewer results would be classified as extreme.

a 90% confidence interval (α = 0.10) would make the confidence interval

narrower and make it easier for a result to be deemed statistically significant because more results would be classified as extreme; however, a 90% confidence interval would be less likely than a 95% confidence interval to capture the true value in the source population

a 90% confidence interval for an odds ratio (OR) is

less likely to overlap with OR = 1 than a 99% confidence interval, so although the 90% confidence interval is less likely to capture the true odds ratio, it is also more likely that the OR will be deemed to show a statistically significant association between the exposure and the outcome
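
A sketch of how the confidence level changes whether an interval for an odds ratio overlaps 1. It assumes the common log-based (Woolf) approximation for the standard error, which the cards do not specify, and the 2 x 2 counts are invented.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio and confidence interval from a 2 x 2 table.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    Uses the log-based (Woolf) approximation, an assumption of this sketch.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high


# Invented counts: 30 exposed cases, 20 exposed controls,
# 20 unexposed cases, 30 unexposed controls.
or_90 = odds_ratio_ci(30, 20, 20, 30, level=0.90)
or_99 = odds_ratio_ci(30, 20, 20, 30, level=0.99)
```

The odds ratio is 2.25 in both cases, but the 90% interval excludes 1 while the wider 99% interval includes it.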

some of the most common types of comparative analysis are the measures of association, such as

the correlation used for ecological studies, the odds ratio (OR) used for case-control studies, and the rate ratio (RR) used for cohort studies.

the OR and RR compare responses to two variables that have each been divided into two levels using what is sometimes called a

2 X 2 analysis
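
A 2 x 2 analysis for a cohort study can be sketched by counting records coded 1 = yes and 0 = no and then comparing the two groups. The RR is computed here as the ratio of the proportions with incident disease in the exposed and unexposed groups; that formula, the function names, and the counts are assumptions for illustration.

```python
def two_by_two(records):
    """Count (exposed, diseased) pairs coded 1 = yes, 0 = no.

    Returns (a, b, c, d): exposed with disease, exposed without disease,
    unexposed with disease, unexposed without disease.
    """
    a = sum(1 for exposed, diseased in records if exposed == 1 and diseased == 1)
    b = sum(1 for exposed, diseased in records if exposed == 1 and diseased == 0)
    c = sum(1 for exposed, diseased in records if exposed == 0 and diseased == 1)
    d = sum(1 for exposed, diseased in records if exposed == 0 and diseased == 0)
    return a, b, c, d


def rate_ratio(a, b, c, d):
    """Proportion with disease among the exposed divided by the unexposed."""
    return (a / (a + b)) / (c / (c + d))


# Invented cohort: 20 of 100 exposed and 10 of 100 unexposed develop disease.
records = [(1, 1)] * 20 + [(1, 0)] * 80 + [(0, 1)] * 10 + [(0, 0)] * 90
a, b, c, d = two_by_two(records)
rr = rate_ratio(a, b, c, d)
```

Here the unexposed group is the reference group, and the RR of 2 says the exposed group developed disease at twice that group's rate.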

prior to using a computer to calculate an OR or RR, variables that are not already divided into two categories must be

recoded into binomial variables (often coded numerically as yes = 1 and no = 0)

The reference group for an odds ratio or rate ratio should be

well-defined

the 95% confidence interval provides info about the

statistical significance of the tests

for statistical comparisons more complex than 2 X 2 analysis, analysts must select a

test that is appropriate to the goal of the analysis and the types of variables being analyzed

first, the variables to be compared should be

selected and the goal of the test clearly stated

then select a

test that is appropriate for the types of variables being examined; some tests require the variables being examined to have particular distributions or other characteristics, and the researcher must confirm that the variables meet these assumptions of the test prior to running it and interpreting the output

plan for hypothesis testing: 6 steps

1.) select variables to compare


2.) specify the goal of the test


3.) check variable types


4.) choose appropriate test for the variables


5.) confirm that the assumptions of the test are met


6.) run test and interpret results

statistical tests are often classified as being either

parametric or nonparametric

The basic difference between these two types of tests is that

parametric tests make more assumptions about the variables being examined than nonparametric tests

parametric tests

assume that the variables being examined have particular distributions, often requiring the variables to have normal or approximately normal distributions. These tests may also require that the variance for the variable of interest - the spread of observation around the mean - be equal or at least similar in the population groups being compared

nonparametric tests

do not make assumptions about the distributions of responses

parametric tests are typically

used for ratio and interval variables with relatively normal (bell-shaped) distributions of responses. Most parametric tests are more statistically powerful than nonparametric tests, so the preference is to use a parametric test whenever the variable being examined fits reasonably well with the assumptions the test makes about sample size, distribution, and the equality of variances

nonparametric tests

are often used for ranked variables, such as responses to surveys that ask participants to indicate preferences using scales from 1 (strongly disagree) to 5 (strongly agree). They are also used when the distribution of a ratio or interval variable is non-normal. Additionally, nonparametric tests are used for categorical variables, including variables with just two groups (such as cases and controls, males and females, children and adults)

the goal of some statistical tests is to

compare the value of a statistic in a study population to some set value

independent populations

are populations in which each individual can be a member of only one of the population groups being compared

statistic being evaluated: mean for ratio/interval variable (parametric tests)

test for whether the statistic in one population is different from a hypothetical value: one sample t-test



test for whether the statistic differs in two populations: independent samples (two-sample) t-test



test for whether the statistic differs in two or more populations: one-way ANOVA (F-test)


statistic being evaluated: median for ordinal/rank variable (nonparametric tests)

test for whether the statistic in one population is different from a hypothetical value: one-sample median test



test for whether the statistic differs in two populations: Mann-Whitney U test (Wilcoxon rank sum test, Wilcoxon-Mann-Whitney test)



test for whether the statistic differs in two or more populations: Kruskal-Wallis test


statistic being evaluated: proportion for binomial variable

test for whether the statistic in one population is different from a hypothetical value: binomial test



test for whether the statistic differs in two populations: Fisher's exact test



test for whether the statistic differs in two or more populations: Chi-square (χ²) test


statistic being evaluated: proportions for nominal categories (variables)

test for whether the statistic in one population is different from a hypothetical value: Chi-square (χ²) goodness-of-fit test



test for whether the statistic differs in two populations: Chi-square (χ²) test



test for whether the statistic differs in two or more populations: Chi-square (χ²) test

The appropriate test to use depends on the type of

variable being examined
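The test-selection cards for independent populations can be collapsed into a lookup table. This is only a sketch: the key names ("mean", "one", and so on) and the function name choose_test are hypothetical, and the test names are copied from the cards.

```python
# Keys: (statistic being evaluated, number of populations compared).
TESTS_FOR_INDEPENDENT_POPULATIONS = {
    ("mean", "one"): "one-sample t-test",
    ("mean", "two"): "independent samples (two-sample) t-test",
    ("mean", "two_or_more"): "one-way ANOVA (F-test)",
    ("median", "one"): "one-sample median test",
    ("median", "two"): "Mann-Whitney U test",
    ("median", "two_or_more"): "Kruskal-Wallis test",
    ("binomial_proportion", "one"): "binomial test",
    ("binomial_proportion", "two"): "Fisher's exact test",
    ("binomial_proportion", "two_or_more"): "Chi-square test",
    ("nominal_proportions", "one"): "Chi-square goodness-of-fit test",
    ("nominal_proportions", "two"): "Chi-square test",
    ("nominal_proportions", "two_or_more"): "Chi-square test",
}

def choose_test(statistic, populations):
    """Look up the test named on the cards for this statistic and comparison."""
    return TESTS_FOR_INDEPENDENT_POPULATIONS[(statistic, populations)]

# Comparing mean ages of cases and controls (two independent populations).
suggested = choose_test("mean", "two")
```

The lookup covers only steps 3 and 4 of the six-step plan; the assumptions of the suggested test still have to be confirmed before it is run.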

a two-sample (independent samples) t-test could be used to compare the

mean ages of cases and controls participating in a case-control study

a Fisher's exact test could be used to examine whether the

proportions of males in the exposed and unexposed groups of a cohort study are similar
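A sketch of how a two-sided Fisher's exact test can be computed from a 2 X 2 table, assuming the common convention of summing the probabilities of all tables that are no more likely than the observed one. The function name and the counts are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every possible
    table that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    smallest = max(0, col1 - row2)
    largest = min(row1, col1)
    p_value = sum(
        prob(x)
        for x in range(smallest, largest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, p_value)

# Hypothetical counts: males/females in the exposed and unexposed groups.
p = fisher_exact_two_sided(3, 1, 1, 3)
```

A large p-value here would suggest that the proportions of males in the two groups are similar.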

A Chi-square test could be used to determine whether

the distributions of participants by race or ethnicity are similar for the intervention and control groups of an experimental study
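A sketch of the Chi-square statistic for a table of counts, with one row per study group and one column per category. It returns only the statistic and the degrees of freedom; converting them to a p-value needs a Chi-square table or statistical software. The counts are hypothetical.

```python
def chi_square_statistic(observed):
    """Chi-square statistic and degrees of freedom for a table of counts.

    observed is a list of rows (for example, one row per study group),
    each listing the count in every category.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if group and category were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (count - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return chi2, df

# Hypothetical counts in three categories for intervention and control groups.
chi2, df = chi_square_statistic([[20, 30, 50], [30, 30, 40]])
```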

when running statistical tests, it is often beneficial to create a

table of basic info about the variables of interest for each of the comparison groups as well as the result of the statistical tests used to compare those populations

a different set of tests is used when the goal is to

compare before and after results in the same individuals

if the goal is to see whether on average a participant in a cohort study gained weight between the baseline exam and the 1 year follow up exam a

matched pairs t-test can be used
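A sketch of the matched pairs t statistic for before-and-after measurements on the same participants. The weights are hypothetical; the statistic would then be compared with a t distribution with n - 1 degrees of freedom.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic and degrees of freedom for a matched pairs t-test.

    The test works on the within-person differences, so the two lists
    must be in the same participant order.
    """
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / standard_error, n - 1

# Hypothetical weights (kg) at baseline and at the 1 year follow up exam.
baseline = [70, 80, 65, 90, 75]
follow_up = [72, 81, 66, 93, 78]
t_stat, df = paired_t(baseline, follow_up)
```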

if the goal is to see whether a safe driving course improves the pass rates for a driving licensure exam a

McNemar's test can be used to examine how many participants switched from failing a pretest to passing a post test, how many switched from passing a pretest to failing a post test, and how many had no change in status; it can also determine whether the differences indicate that the course had a significant impact on exam pass rates
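A sketch of McNemar's test, assuming the uncorrected Chi-square form (no continuity correction) with 1 degree of freedom. Only the participants who changed status enter the statistic. The counts are hypothetical.

```python
import math

def mcnemar(fail_to_pass, pass_to_fail):
    """McNemar's Chi-square statistic and p-value (1 degree of freedom).

    Participants whose status did not change carry no information
    about the direction of change, so only the two kinds of
    switchers are used.
    """
    b, c = fail_to_pass, pass_to_fail
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a Chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical: 25 switched from fail to pass, 10 from pass to fail.
chi2, p_value = mcnemar(25, 10)
```

With few switchers, an exact binomial version of the test is generally preferred over this approximation.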

test for whether the value of the variable is different in one population measured twice (such as before and after in the same population) or in two paired groups: matched pairs (paired) t-test for the ratio/interval variable (parametric tests)

Test for whether the value of the variable is different in two or more matched groups: one-way repeated measures ANOVA for ratio/interval variable (parametric tests)

test for whether the value of the variable is different in one population measured twice (such as before and after in the same population) or in two paired groups: Wilcoxon (matched pairs) signed rank test or sign test for matched pairs for ordinal/rank variables (nonparametric tests)

Test for whether the value of the variable is different in two or more matched groups: Friedman test for ordinal/rank variables (nonparametric tests)

test for whether the value of the variable is different in one population measured twice (such as before and after in the same population) or in two paired groups: McNemar's test for binomial variables

Test for whether the value of the variable is different in two or more matched groups: Cochran's Q test for binomial variables

test for whether the value of the variable is different in one population measured twice (such as before and after in the same population) or in two paired groups: McNemar's test for nominal categories (variables)

Test for whether the value of the variable is different in two or more matched groups: Cochran's Q test for nominal categories (variables)

only a very limited number of studies require

regression analysis or any of the other advanced statistics

researchers should not use these advanced statistical tests without first knowing

when to use them, what conditions have to be met to make their use appropriate, how to run them, and how to interpret them

one of the main reasons researchers use multivariate statistical models (that is, analysis of three or more variables at one time) is to

examine the interactions that may occur among variables, this may be especially helpful when a third variable (also called an extraneous variable or lurking variable) may be concealing or distorting the true relationship between the two other variables

several different types of third variable effects might occur, including

confounding and effect modification

confounder

may make the association between an exposure variable and an outcome variable appear more or less significant than it truly is

when a third variable is shown to be a confounder, an

adjusted measure of association, such as an age-adjusted odds ratio, should be reported for the association between the exposure and outcome; for example, age might be a confounder

effect modifier (sometimes called an interaction term)

is a third variable that often represents biologically distinct groups of individuals who might experience different biological responses to various exposures, for example, menopausal status might be an effect modifier

if a third variable is shown to be an effect modifier, it is usually best to report

separate stratum specific measures of association for each level of the effect modifier (such as separate results for premenopausal and postmenopausal women), pooling the results for the biologically different groups may hide meaningful differences, so an adjusted or crude measure of association should not be reported when effect modification is occurring

to be a confounder or effect modifier, the

third variable must be independently associated with both an exposure (or predictor) variable and an outcome variable

confounding example

OR for female = OR for male, but ≠ crude OR

effect modification example

OR for female ≠ OR for male ≠ crude OR

neither confounding nor effect modification example

OR for female = OR for male = crude OR

when a third variable is associated with both

the exposure and the outcome of interest, unadjusted analysis may hide the true association between the exposure and the outcome; these two relationships should be confirmed. Then a crude odds ratio or other measure of association for the relationship between the exposure and the outcome should be calculated, along with a separate measure of association for each level of the third variable, such as separate odds ratios for males and females.

how to identify confounding

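1.) confirm that the association between the third variable and the exposure is statistically significant


2.) confirm that the association between the third variable and the outcome is statistically significant


3.) calculate three measures of association (ORs or RRs) for the relationship between the exposure and the outcome:


a.) crude OR between the exposure and outcome


b.) OR for stratum 1 of the third variable


c.) OR for stratum 2 of the third variable


4.) interpret results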


crude and stratum specific measures are compared using a

Breslow-Day test for homogeneity or interaction, a -2 log likelihood test, or another appropriate statistical test

after running a suitable test, the interpretation is as follows

1.) If the crude and stratum specific odds ratios are all similar, then neither confounding nor effect modification is occurring. Report a crude measure.


2.) If the stratum specific measures of association are equivalent to one another, but different from the crude measure of association, the third variable is a confounder. Report an adjusted measure.


3.) If the stratum specific measures of association are different from one another and different from the crude measure of association, the third variable is an effect modifier. Report stratum specific measures.
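The three interpretation rules above can be sketched in code. This is only an illustration: it replaces the formal test (such as Breslow-Day) with a simple relative-difference threshold, tol, which is an arbitrary assumption, and it uses the Mantel-Haenszel estimator as one possible adjusted OR. All counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2 x 2 table: a, b = exposed with/without outcome; c, d = unexposed."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary OR, used here as the adjusted measure."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

def similar(x, y, tol):
    """Crude stand-in for a formal test: relative difference within tol."""
    return abs(x - y) / max(x, y) <= tol

def classify_third_variable(strata, tol=0.10):
    """Apply the three rules; return (label, measure(s) to report)."""
    crude = odds_ratio(*(sum(cell) for cell in zip(*strata)))
    stratum_ors = [odds_ratio(*s) for s in strata]
    strata_agree = all(similar(stratum_ors[0], o, tol) for o in stratum_ors[1:])
    if not strata_agree:
        return "effect modifier", stratum_ors          # report stratum-specific ORs
    if all(similar(crude, o, tol) for o in stratum_ors):
        return "neither", crude                        # report the crude OR
    return "confounder", mantel_haenszel_or(strata)    # report an adjusted OR

# Hypothetical 2 x 2 tables (a, b, c, d) for females and males.
label, reported = classify_third_variable([(40, 10, 20, 5), (5, 20, 10, 40)])
```

In this example both stratum-specific ORs are 1.0 while the crude OR is 2.25, so the third variable behaves as a confounder and the adjusted OR is reported.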

regression is often the easiest way to

adjust for one or more confounding variables or interaction terms during analysis

regression models

seek to understand the relationship between one or more predictor (independent) variables and one outcome (dependent) variable

predictor

independent variable

outcome

dependent variable

The models allow the effect of one predictor variable on the outcome to be examined while

controlling for other predictor variables (keeping them constant)

two most common types of regression

linear regression


logistic regression

some statistical software programs require the analyst to

1.) select a variety of specifications for the model, such as the particular estimation technique (often an ordinary least squares, generalized least squares, or maximum likelihood estimation model)


2.) choose the method the computer will use to select variables for inclusion in the model; for example, an enter method will include all predictor variables in the model; a forward stepwise method adds the best predictor variables to the model one at a time until adding an additional variable does not significantly improve the fit of the model; a backward stepwise method deletes variables from the model until deleting a variable significantly reduces the fit of the model


3.) check the fit of the model by examining its residual terms, which measure how well real data match the values predicted by the model, and the results of statistical tests of the goodness-of-fit for the model

steps in fitting a regression model

1.) select one outcome (dependent) variable


2.) identify the appropriate type of regression (such as a linear or logistic model) for the outcome variable


3.) select one or more predictor (independent) variables


4.) check to make sure that any assumptions required for the model (such as the variable types or distributions of outcome and predictor variables) are met


5.) choose a selection method for helping the computer decide which set of predictor variables will produce the best fit model (the model that the computer determines is the best at explaining the relationship between the predictor variables and the outcome variable)


6.) examine the model for potential problems. For example, examine residuals for possible autocorrelation, check for possible interaction between predictor variables (such as the multi-collinearity that might occur when two predictor variables are highly correlated), and look for other potential problems that might need to be addressed.


7.) interpret the results of the regression model, and consider whether they are logical (for example, that all necessary covariates are included and all illogical ones are excluded)

a linear regression model is used when the

outcome variable is a ratio or interval variable

simple linear regression models

examine whether there is a linear relationship between one predictor variable and the outcome variable

the relationship between the predictor and outcome variables can be visually displayed using a

scatterplot, and the regression model finds the best fit line for those points

the slope of the line is the

coefficient for the predictor variable (often designated B in the output of statistical software programs)

the y intercept for the line is the

coefficient for the constant in the regression model

these values can be used to write an equation for the best fit line, and that equation can be used to predict

the expected value of the outcome variable for various values of the predictor variable

r^2

square of the correlation coefficient, provides the info about how well the regression model predicts the variation in the values of the outcome variable, the value of r^2 ranges from 0 to 1, with the larger values indicating a better model fit

if r^2 = 0.79

it means that the predictor variable explains 79% of the variation in the values of the outcome variable
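A sketch of a simple linear regression fitted by ordinary least squares, returning the slope (B), the y intercept (constant), and r^2. The data points are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept.

    Returns the slope, the intercept, and r^2 (the share of the variation
    in y that is explained by x).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical predictor and outcome values.
slope, intercept, r_squared = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
predicted_at_6 = slope * 6 + intercept  # expected outcome when predictor = 6
```

Here r^2 is 0.6, meaning the predictor explains 60% of the variation in the outcome in this small example.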

if predictor 1 is held constant, a

1 unit increase in predictor 2 is associated with a 0.6 unit increase in the expected value of the outcome variable

equation for simple linear regression model

outcome = 3.1 * predictor 1 + 0.9

equation for a multiple linear regression model with two continuous variables

outcome = 0.5 * predictor 1 + 0.6 * predictor 2 - 6.2
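The multiple linear regression equation on this card can be evaluated directly to show what "holding a predictor constant" means. The input values 10, 20, and 21 are hypothetical.

```python
def predict_outcome(predictor_1, predictor_2):
    """Expected outcome from the example equation on the card."""
    return 0.5 * predictor_1 + 0.6 * predictor_2 - 6.2

# Hold predictor 1 constant at 10 and raise predictor 2 by one unit.
baseline = predict_outcome(10, 20)
one_unit_more = predict_outcome(10, 21)
change = one_unit_more - baseline  # equals the coefficient for predictor 2 (0.6)
```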

multiple linear regression models

examine the effects of several predictor variables on the value of the outcome variable

the coefficients (B) for the predictor variables and the constant can be used to write an

equation for a best fit line; that equation can be used to examine the effect of each predictor variable on the outcome variable while controlling the other predictors by holding their values constant

multiple linear regression models

can have both continuous and categorical predictor variables, as long as the responses to categorical variables are expressed by numbers

the predictor variables in multiple linear regression models may

interact; for example, interaction may be occurring when the best fit regression lines for males and females have considerably different slopes

logistic regression models (sometimes called logit regression models)

are used when the outcome variable is a dichotomous variable; logistic regression is commonly used in case-control studies, for which the outcome variable is usually case status, with case = 1 and control = 0; there are other types of outcome variables, such as yes/no variables, typically yes = 1 and no = 0

predictor variables for a logistic regression can be

categorical or continuous

the coefficient for a predictor variable in a logistic regression model is the

natural log of the odds ratio, ln (OR), so the odds ratio for the association between that predictor variable and the outcome variable can be found by taking the exponential of the coefficient, exp (B)

the odds ratio for each predictor variable represents the

change in the odds of the outcome, typically the odds of being a case or being classified as a yes, for a 1 unit change in the predictor variable

the confidence interval for the odds ratio can be calculated using the

value of the coefficient and its standard error
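A sketch of converting a logistic regression coefficient (B) and its standard error into an odds ratio with an approximate 95% confidence interval, assuming the usual normal approximation on the log scale. The coefficient and standard error are hypothetical.

```python
import math

def odds_ratio_from_coefficient(b, standard_error, z=1.96):
    """OR = exp(B); the interval is exp(B - z * SE) to exp(B + z * SE)."""
    return (
        math.exp(b),
        math.exp(b - z * standard_error),
        math.exp(b + z * standard_error),
    )

# Hypothetical output: B = ln(2) with a standard error of 0.25.
or_value, lower, upper = odds_ratio_from_coefficient(math.log(2), 0.25)
```

The interval is built on the log scale and then exponentiated, so it is not symmetric around the odds ratio.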

the predictor variables in regression models can take a variety of forms but must have

numeric responses

nominal categorical variables have responses that cannot be

ordered and assigned a rank, but a series of dummy variables that convert categorical responses to a series of dichotomous (0/1) variables can be created; additionally, when fitting a logistic regression model, it might be helpful to convert ratio and interval variables to dummy variables so that a series of odds ratios for the levels of the variable can be estimated

if the original categorical variable has n possible responses, then

n - 1 dummy variables are required to capture all the responses to the original question; all n - 1 variables should be included in a regression model (even if some may be eliminated during a stepwise selection process)
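A sketch of building n - 1 dummy variables from a nominal variable. The function name, the "is_" column prefix, and the marital status values are hypothetical; the omitted level serves as the reference category.

```python
def make_dummies(values, reference):
    """Convert a nominal variable with n levels into n - 1 dummy (0/1) variables.

    The reference level gets no column of its own; a row of all zeros
    means the response was the reference level.
    """
    levels = sorted(set(values) - {reference})
    return [
        {f"is_{level}": int(value == level) for level in levels}
        for value in values
    ]

# Hypothetical nominal variable with n = 3 possible responses.
marital_status = ["single", "married", "divorced", "married"]
rows = make_dummies(marital_status, reference="single")
```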

survival analysis

examines the distribution of the durations of time that individuals in a study population experience from an initial time point (such as the time of enrollment in a study or the time of diagnosis of a particular condition) until some well-defined event, which can be death or some other outcome

measures of survival include

1.) median survival time


2.) cumulative survival at set times after diagnosis


3.) life tables that record conditional and cumulative probabilities of survival


4.) Kaplan-Meier plots that display cumulative survival rates
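A sketch of the Kaplan-Meier estimate of cumulative survival, from which the median survival time can be read. The follow-up times and event indicators are hypothetical (1 = event observed, 0 = censored).

```python
def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of cumulative survival.

    durations: time from the initial time point to the event or to censoring.
    events: 1 if the event was observed, 0 if the participant was censored.
    Returns (time, cumulative survival) for each time at which an event occurred.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        time = data[i][0]
        deaths = 0
        leaving = 0
        # Group everyone who has the same follow-up time.
        while i < len(data) and data[i][0] == time:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= (at_risk - deaths) / at_risk
            curve.append((time, survival))
        at_risk -= leaving
    return curve

def median_survival(curve):
    """First time at which cumulative survival drops to 50% or below."""
    return next((time for time, s in curve if s <= 0.5), None)

# Hypothetical follow-up times in months.
curve = kaplan_meier([2, 3, 3, 5, 8, 9], [1, 1, 0, 1, 0, 1])
median = median_survival(curve)
```

Censored participants count toward the number at risk until they leave, but they never lower the survival estimate themselves.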

log rank test can be used to determine

whether survival is shorter in one population than in another

Cox proportional hazards regression

which estimates a hazard ratio that compares durations to an event (such as death) in two populations, can also be used for survival analysis

If GPS (global positioning system) coordinates or other geographic data have been collected, then

spatial software programs may be useful for conducting the geographic portion of the analysis

the geographic data should be incorporated into a

GIS (geographical information system)

The GIS allows for spatial analysis such as

1.) the identification of spatial disease clusters (using a statistic like Moran's coefficient or Geary's coefficient)


2.) the determination of associations, if any, between the social or physical environment and disease


3.) the estimation of distances between locations


4.) the ascertainment of the geographic factors that are related to access to health services
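As a small illustration of item 3 in the list above, the straight-line distance between two GPS coordinates can be estimated with the haversine formula. This is a sketch that assumes a spherical Earth with a mean radius of 6371 km; a GIS would normally handle this, along with road or travel distances.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(delta_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))

# One degree of latitude along a meridian is roughly 111 km.
one_degree = haversine_km(0.0, 0.0, 1.0, 0.0)
```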

research articles almost always have the same structure

1.) abstract


2.) introduction


3.) methods


4.) results


5.) discussion

abstract

summary of the article, the most important function is to serve as an advertisement for the manuscript, key words

abstract

should be accurate, reasonably complete, and compelling; most are limited to 150-250 words; it's usually easier to write the abstract after the rest of the paper has already been written and the focus, key results, and conclusions are already clear

two types of abstracts

1.) structured abstract uses subheadings, like objectives, methods, results, and conclusions to highlight content


2.) unstructured abstract usually follows the same outline but does not list the section titles

introduction section

provides the essential background info that a reader must know to understand the methods and results of the article, this section often includes info about the study population, the study site, and the study years; it might include a comparison to previous studies and a discussion of what is novel about the new study, but that content might appear in the discussion section instead; most intros conclude with a statement about the importance or significance of the study and the specific aims, objectives, or hypothesis that the paper will address

methods section

should begin by clearly identifying the study design used; if person, place, and time characteristics were not provided in the intro, they should be listed in this section; definitions should be provided for the key exposures, outcomes, and other variables; the methods section should provide info about ethical considerations, although ethical issues can instead be included in the endmatter, depending on the preference of the journal; this section should end with a description of the statistical methods used; it can be written before data collection begins because most of the methods are finalized before data collection starts



for a case-control study, the case definition should be spelled out; for an experimental study, the intervention and control should both be described in detail; for some studies, supplying the exact phrasing and order of questionnaire items, along with the steps taken to validate the survey instrument, might be important

for primary studies

the methods used to identify, sample, and recruit participants should be described and the inclusion and exclusion criteria listed; the methods for collecting data should also be described, including interview techniques, laboratory methods, physical examinations checklists, and measurement methods, study design; key exposures, outcomes, other variables, setting and dates of study

for secondary analysis

the report should specify who collected the data originally, how they were collected, how they were acquired for secondary analysis, and the role, if any, that the authors of the new paper had in data collection

results section

should start with a description of the study population that clearly identifies the sample size and the demographics of the participants; additional results of statistical analysis should then be provided, using tables and figures; the results of a statistical test should not be reported unless the authors fully understand when that test can be used and how it should be interpreted; number of participants at each stage of study

discussion section

usually begins with a summary of the key findings of the new study; ideally, the key findings should match the aims, objectives, or hypothesis spelled out in the last paragraph in the intro; the ensuing paragraphs should compare the new study to previous studies and include a thorough discussion on the relevant existing lit and an adequate number of citations; every paper needs to include at least one paragraph on the limitations of the study and it should identify potential problems such as types of bias; the final paragraph of the discussion should state the conclusions of the study and might include new theories that emerge from analysis; generalizability of study

endmatter

1.) the affiliations of the authors and their contact info (if not listed on title page)


2.) contributions of each author to paper


3.) acknowledgments of people who assisted with the study but did not meet authorship criteria


4.) info about some ethical aspects of research (informed consent)


5.) a list of all funding sources


6.) disclosures of the presence or absence of possible conflicts of interest

tables and figures

many health journals limit the number of tables and figures allowed for each article, often to a max of 4 (tables and figures combined)

tables

should be used to organize and present statistical results that cannot be easily listed in the text in a sentence or two

graphs and other figures

should be used when a visual presentation of the material is more effective than words at conveying a result; there is no need to repeat info in the text that is provided in a table or figure, but be sure to have a callout for each table and figure that indicates when the reader should refer to it

a table should provide enough info so that it can be independently interpreted and understood even in the absence of the text:

1.) the title of the table should provide a brief but clear description of the content


2.) the rows and columns should each have a descriptive label, i.e. units, sample sizes (n)


3.) for each statistic provide a confidence interval, p-value, and measure of uncertainty such as standard deviation or standard error for a mean or interquartile range for a median


4.) a note just below the table or in the title bar should explain the meaning of asterisks and other symbols commonly used to denote statistical significance and other items of interest


5.) consistent fonts, spacing, and number of decimal points should be used for all tables in the manuscript

graph

should provide info in the title, figure, and/or legend or key for a reader to be able to interpret the graph even without reading the related portion of the text; high-resolution photographs, maps, flowcharts, and other images provided by the authors can also be used as figures

bar graphs

used to display categorical data

systematic review

checklist: PRISMA (preferred reporting items for systematic reviews and meta-analyses)

meta-analysis

checklist: PRISMA, MOOSE (meta-analysis of observational studies in epidemiology)

Cross-sectional survey

checklist: STROBE - cross-sectional, case-control, cohort (strengthening the reporting of observational studies in epidemiology)

experimental study

checklist: CONSORT (consolidated standards of reporting trials for randomized controlled trials), TREND (transparent reporting of evaluations with nonrandomized designs)

qualitative studies

checklist: COREQ (consolidated criteria for reporting qualitative research)

introduction section

of a manuscript usually provides the background necessary to understand the importance of the new work

discussion section

typically provides an extensive comparison of the results of the new study to the results of previously published works

a typical article in the health sciences refers to about

20 or 30 other articles published in peer-reviewed journals

references

should be carefully selected to support the importance, validity, and conclusions of the study; can also be used to acknowledge the alternative methodological approaches that could have been used, to identify both areas in which the new findings agree with the existing lit and areas where findings contradict previous studies and to provide varying perspectives on the policy and practice implications of the study

citing an article

is a way of endorsing the work of its author, except in the rare instance when specific flaws need to be pointed out

do not trust this to be reliable

abstracts

journal articles are the preferred source of

evidentiary support for scientific articles, although books, book chapters, and formal reports (such as those published by governmental agencies and international organizations) are also acceptable

formal scientific reports

1.) are published in a peer-reviewed journal (sometimes in a report or book), not on a website, in a newspaper, or in a popular magazine


2.) describe the study design and explain why it was appropriate for the objectives of the study


3.) explain how exposures and outcomes were defined and assessed


4.) describe the analytic approaches used and present results using easily interpreted tables and graphs


5.) draw conclusions that are reasonable and based on the study's data


6.) discuss the limitations of the study


7.) compare the new study to previous studies


8.) follow a standard outline and other conventions for scientific writing

informal sources

1.) website or fact sheet


2.) newspaper or popular magazine



citable? rarely


formal sources

1.) statistical database - citable


2.) official report - citable


3.) book or book chapter - citable


4.) abstract - not citable


5.) article - citable

few scientific articles

quote directly from another source word for word

one of the most important reasons is that borrowing phrases and sentences from other sources can make the writing in a document

choppy

paraphrasing

saying the same thing in one's own words, does not remove the requirement to cite an original source, just means quotations don't have to be used; use an in-text citation

direct quote

entire quote must be in quotation marks or, if long, indented from the left margin

specific knowledge

such as a statistic or the results of a particular field or lab study must be cited; however, some areas of general knowledge or common knowledge do not require citation

common knowledge

refers to what a typical person in the discipline would know, it does not refer to what a randomly selected person at the grocery store would know; when in doubt cite

plagiarism

occurs when someone's wording, thinking, or creative output is repeated in a new document without attribution



copying the exact words, paraphrasing a unique theory or observation, and using an image without permission and acknowledgment are all forms of plagiarism; redundant publication, falsification of data, and fabrication are also major violations of scholarly integrity, and the article must be retracted with public acknowledgment

helpful habits when using sources

1.) never cut and paste info into your document, paraphrase and cite it


2.) always include reference in research notes for later citation

citations typically appear in two forms

1.) as in-text citations where the sources of info are briefly identified in the text


2.) in a reference list at the end of the document that provides a full bib info for each source

most medical and public health journals use some version of a citation style

ICMJE (International Committee of Medical Journal Editors) style, Vancouver style, NLM (National Library of Medicine) style, AMA (American Medical Association) style, or APA (American Psychological Association) style, which is commonly used for social sciences as well as nursing journals; alternatively, a journal may use its own preferred style

some journals will convert bracketed citations to

superscript numbers during the editing and layout process

reference list

can be either alphabetical or in order of appearance in article

journals that use ICMJE style or a variant typically

list authors by last name and first initials, then title with capitals only for proper nouns, an abbreviated journal name, pub year, volume, and page numbers, separated by periods or semicolons or commas; some journals expect all authors to be listed no matter how many; some use an abbreviated version for 6 or more; some use full journal name; some list issue numbers, but most do not; key is to be consistent

vast majority of work has been completed when

1.) study question has been identified and refined


2.) study approach has been selected and a protocol developed


3.) data have been collected and analyzed

three key times to address writer's motivation

1.) first, writers must overcome barriers to getting started


2.) second, writers must find ways to prolong the period of high productivity that often occurs at the start of the writing project


3.) finally, most writers become fatigued during the writing process and at some point lose all desire to even think about their projects; they must find motivation

if researcher does not know how to begin writing, an easy way to start is

1.) put working title for paper along with names of authors


2.) add in headers for abstract, intro, methods, results, discussion, acknowledgments, and references


3.) fill in names of people to thank in acknowledgments section


4.) paste in a table or figure that was created during the analysis


5.) paste in some relevant lines about methods from the protocol


6.) then start filling in gaps


7.) write a sentence or two for each key point



content does not need to be added in any particular order

staying motivated

change habits or scenery, make a timeline, set weekly meeting with advisor, speak content out loud or write informally

many manuscripts for health science journals are limited to a max of about

3000 words

writer's block

can last for weeks or months

what authors can do

1.) fully explain the actual methods used


2.) run all the appropriate analyses


3.) include a helpful set of references that support the results


4.) polish the prose


5.) honestly identify the limitations of the study and explain what was done to address them



no paper is perfect, but few are fatally flawed, and the steps above can help

lead author is responsible for checking

the manuscript very carefully

every paper should tell a story that has

1.) a beginning - the intro sets the stage


2.) a middle - the methods and results say what happened


3.) an end - the discussion provides a conclusion that ties all the parts of the story together

the story line should be able to be summarized into a

sentence or two

some journals require a precis that is

35 words or less

the abstract of a report should tell the whole story in

one compelling paragraph

the first step in editing is to make sure that the big picture is

being clearly communicated

does the paper tell a compelling story?

1.) does the paper have a clear story line, can plot be summarized in one sentence


2.) does the title of the paper reflect the key aspects of the study


3.) does the abstract tell the key parts of the story


4.) do the opening paragraphs draw the reader into the story


5.) is the goal of the study clearly stated in the intro section


6.) does the methods section make it clear how the methods were helpful in answering the study question


7.) do the results and discussion sections provide the answer to the study questions


8.) is the story missing any parts that need to be added, are there any gaps in logic that need to be addressed


9.) are any parts of the manuscript redundant, any parts peripheral to the main story


10.) are the conclusions fully supported by data


11.) does each paragraph have a theme

once the pieces of the paper's story are clear, the next step is to

check the structure and content of the manuscript; the paper should be well organized, complete, but concise, and accurate about what was done and what was found; the text, the tables and figures, and the reference list must all meet these same requirements

checklist for structure and content of paper

1.) paper well organized, content focused


2.) intro provide all essential background info (person, place, and time listed in both abstract and the text)


3.) does the intro make research appear important and necessary


4.) does the intro say why study is novel


5.) are methods described in adequate detail


6.) is enough stat analysis presented, is each stat included necessary


7.) are tables and figures well designed


8.) are all stats presented in figures or tables or in text, but not in both


9.) does discussion provide concise summary of key findings and place new findings in context of previous research, does discussion avoid reiterating results section


10.) does discussion adequately address potential limitations of study


11.) is every claim in discussion supported by citations, should additional references be added


12.) is every reference listed important and necessary


13.) has paper been double checked for plagiarism


14.) is every part of paper truthful

in a final check, look at

each word, sentence, paragraph, and section and examine for clarity



1.) words must be used carefully


2.) sentences must be concise and clear


3.) voice must be consistent


4.) grammar and spelling must be proper

checklist for style and clarity

1.) are words used precisely, incidence and prevalence used correctly


2.) unnecessary jargon avoided, are definitions for all key terms provided


3.) are all abbreviations introduced at first use


4.) is the tone of writing appropriate, fact based and not emotion based


5.) does the article consistently use a third person voice or in rare cases first person voice


6.) do all subjects agree with verbs, active verbs rather than passive


7.) verb tense consistent


8.) each sentence clear, words spelled correctly


9.) all punctuation correct


10.) paper adhere to all guidelines for target journal

research results are often publicly shared for the

first time during an oral presentation or poster session at an academic or professional conference

primary outcome of most professional and academic conferences is

networking; conferences are a place to exchange ideas, find out what others find interesting about a project, and identify weak aspects of a study

some conferences are

annual large events and others are small gatherings

most conferences include a mix of

1.) plenary sessions where keynote addresses are given


2.) business meetings run by officers of sponsoring organization


3.) concurrent sessions in which multiple panels of oral presentations are held at the same time in different rooms


4.) poster sessions in which attendees can mingle while reviewing research posters

presenters are usually assigned to give either an

oral presentation or a poster presentation

oral presentation

requires speaking in front of a potentially large audience and may involve facing an open question and answer period; usually considered more prestigious than a poster because there are fewer slots for oral presentations than for posters

poster sessions

usually held in a less formal venue; usually require more preparation time than oral presentations

researchers are required to submit

an abstract for consideration by the organizing committee



the committee rates the abstracts, decides who will be invited to present, and selects who will give an oral presentation and who will give a poster



a good abstract has key words and conveys one clear health message that is appropriate to the audience

most conferences require presenters to

pay a registration fee, often several hundred dollars

when preparing a poster give equal attention to

content and its design

sample layout for poster

title, author info, intro, methods, results, conclusions, references, acknowledgments



come prepared with necessary items

a typical oral presentation time slot is about

15 minutes long; bc of set up and questions, actual presentation time is 10 - 12 minutes



most presenters can cover 1 - 2 slides per minute, that is about 12 - 20 slides for a 10 - 12 minute talk



highlight key message with images in place of words as often as appropriate


sample slide distribution for a 10 - 12 minute talk

1.) title slide = 1 slide


2.) research goal/importance = 1 slide (start with key message)


3.) outline or summary = 1 slide


4.) background/specific aims = 1 - 2 slides


5.) methods = 2 - 4 slides


6.) results = 4 - 8 slides


7.) strengths/limitations = 1 slide


8.) conclusions = 1 slide (end with key message)


9.) acknowledgments and/or invitation for questions = 0 - 1 slide


10.) total slides = 12 - 20 slides



develop a checklist for presentation slide show for content and layout and formatting



preparing a slide show is only the first step in preparing to make an oral presentation, practice is key
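As a quick arithmetic check, the sample slide distribution above can be written out as data and summed. This is a minimal Python sketch; the names `SLIDE_PLAN`, `total_range`, and `minutes_per_slide` are illustrative and not part of the original card.

```python
# (section, minimum slides, maximum slides), copied from the sample distribution
SLIDE_PLAN = [
    ("title slide", 1, 1),
    ("research goal/importance", 1, 1),
    ("outline or summary", 1, 1),
    ("background/specific aims", 1, 2),
    ("methods", 2, 4),
    ("results", 4, 8),
    ("strengths/limitations", 1, 1),
    ("conclusions", 1, 1),
    ("acknowledgments and/or invitation for questions", 0, 1),
]


def total_range(plan):
    """Return the (minimum, maximum) total slide count for a plan."""
    low = sum(min_slides for _, min_slides, _ in plan)
    high = sum(max_slides for _, _, max_slides in plan)
    return low, high


def minutes_per_slide(talk_minutes, n_slides):
    """Average speaking time available per slide."""
    return talk_minutes / n_slides


low, high = total_range(SLIDE_PLAN)
print(low, high)  # 12 20, matching the stated total of 12 - 20 slides
print(minutes_per_slide(10, 20))  # 0.5 minutes per slide at the fastest pace
```

The per-section ranges add up to the stated total, and even the densest case (20 slides in 10 minutes) stays at the 2 slides per minute upper pace mentioned for most presenters.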


a few weeks before conference

confirm what equipment will be provided in presentation room, some expect presenters to bring their own computer, some require uploading presentation file to a website in advance, some ask for files to be e-mailed to moderators, some expect the file to be on CD or flash drive, make sure you have a back-up

presenters should

adhere to time limits, arrive 15 minutes early, check in with moderator, set everything up so ready, keep responses to questions in presentation short, acknowledge limitations and highlight strengths, thank everyone, have business cards

the culmination of a well designed and carefully conducted health research project is often the

dissemination of results through an appropriate publication

target journal

should be identified early

examination of recent articles published in target journal provides guidance for

1.) best outline to follow


2.) how to divide commentary between intro and discussion


3.) appropriate voice and writing style


4.) amount of technical detail to include


5.) reference and citation style

choosing a target journal entails many considerations including

1.) aim and scope of journal


2.) its audience


3.) its impact factor and other characteristics


4.) the possible cost of publication


5.) online access options



# 1 and 2 are most important

determining whether an article is a fit with a specialty or regional journal is often

straightforward



knowing what topics fall within the scope of a general journal is a little harder

one way to identify journals likely to consider a paper for publication is to

examine the manuscript's reference list

the target journal should not be selected primarily bc of its

impact factor, ranking, or reputation, even though these are important factors to consider



impact factor is based on number of times a typical article is cited in its first year or two after publication
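The card describes the impact factor informally. The standard two-year version divides the citations a journal receives in a given year to items it published in the previous two years by the number of citable items it published in those two years. A minimal Python sketch follows; the function name and the sample numbers are hypothetical and chosen only to show the arithmetic.

```python
def two_year_impact_factor(citations_this_year, citable_items_prev_two_years):
    """Two-year impact factor for one journal and one year.

    citations_this_year: citations received this year to items the journal
        published in the previous two years.
    citable_items_prev_two_years: number of citable items the journal
        published in those two years.
    """
    if citable_items_prev_two_years <= 0:
        raise ValueError("the journal must have published at least one citable item")
    return citations_this_year / citable_items_prev_two_years


# Made-up numbers for illustration: 300 citations to 150 citable items.
print(two_year_impact_factor(300, 150))  # 2.0
```

With these made-up inputs the result is 2.0, which is in the range the cards describe as typical for health science journals.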

journals with an impact factor of 10 or greater

1.) Science


2.) Nature


3.) JAMA


4.) The Lancet


5.) The New England Journal of Medicine



most journals in health sciences have an impact factor closer to 1 or 2, specialty journals may have one less than 1



impact factors are often listed on journal websites and in Web of Knowledge

some journals require short reports of

1000 - 1500 words, one table or figure, and a limited number of references; this is an appealing option for a case report, small case series, or an update to a previously submitted article

a comprehensive report of a large study will

exceed the usual 3000 - 3500 word limit or the standard 4 tables and/or figures and will require a journal that has more flexible limits

many big name journals with low acceptance rates have a turnaround time of

only a few days or weeks bc they send few manuscripts out for external review



specialty journals with higher acceptance rates may have a turnaround time of several months bc three or more external referees will review manuscripts

most journals have moved to

online submission systems, some require mail or e-mail submission

an increasing number of journals require

authors to pay some publishing costs, such as a submission fee, a processing fee (processing charge), or a page fee (page charge); some require authors to pay to become a member of the journal; an open access fee allows the journal to make the article available online immediately, and some journals give authors the choice; sometimes the fees are waived for low-income

some journals publish only

in print, the vast majority allow online access to subscribers

being indexed in a competitive database like

MEDLINE, which examines the quality and editorial rigor of all journals

submission to a journal is not the

end of the writing process, additional revisions will be likely

one journal must

be selected; submitting to two or more journals at once is not permitted in the health sciences, and journals require a statement affirming this

author guidelines

state how manuscripts should be formatted



special attention should be paid to tables, figures, and other images when formatting; they do not need to match the typographic style of the journal



graphs and maps and other illustrations

are rarely reworked by a journal's graphic designer



most journals charge a fee for printing in color

most submissions are made via

computer and a cover letter is still expected



cover letter should summarize the manuscript and seek to convince the editor that the work is important, valid, original, and a good fit with the aims of the journal; the decision could be based solely on the abstract and/or cover letter

sample cover letter content

salutation, basic info, summary, importance, fit, required declarations, thanks, names/signatures

corresponding author

the coauthor who will communicate with the journal and answer questions from readers after the paper is published; this coauthor needs to register



the corresponding author may be the first author, the senior coauthor, or the coauthor with the most stable e-mail address and affiliation

some journals will ask for

1.) the type of article


2.) word count


3.) number of tables


4.) number of figures


5.) statements about ethics, funding, conflicts of interest


6.) confirmation that the article is being submitted only to one journal

ad hoc reviewer

reviewers who are not on the journal's editorial board but are asked to serve as peer reviewers bc of their expertise on the paper's topic or methods

desk rejection

rejection without review

reviewers provide two sets of comments

one set of comments for the author on the quality of the manuscript


one set of comments for the editor

external review can lead to three possible results

1.) rejection


2.) opportunity to revise and resubmit


3.) acceptance

minor revision

may be reviewed by the assistant editor after resubmission



ex: typos, formatting of tables, not enough citations

major revision

may be sent back to the original reviewers



provisional acceptance

final acceptance pending a few minor adjustments

responding to suggestions from editors and reviewers requires an author to

1.) understand and appreciate different perspectives


2.) balance conflicting sets of advice about what would strengthen a paper


3.) deal with frustration of needing to rethink and rewrite whole portions to make it clearer


4.) recover from harsh criticism

if a project finds no association between an exposure and outcome the results may be

even more important to publish so that other scientists do not waste their time and resources



publishing a study with null results is often more challenging than publishing a study that finds an unexpected or strong association

the research process does not necessarily end with a

report



the research process is a cycle in which data analysis and reporting feed back into the formation of new study questions and establishment of a personal research trajectory

publishing enhances the authors'

CVs and resumes