Artefact

An artificial pattern caused by deficiencies in the data collection process

Association


(alternative name: relationship)

A pattern that connects two (or more) variables and that would be unlikely to arise by chance alone. Conversely, there is no relationship when learning the value of one variable tells you nothing new about the likely value of the other

Bar Chart


(alternative names: bar graph, bar plot, column chart)

A graph used for categorical variables to display the percentages or frequencies falling into each category

Bimodal

When two peaks are evident in a graph of the distribution of a numeric variable

Box Plot


(alternative name: box and whisker plot)

A graph for displaying the distribution of a numeric variable. It splits the data into quartiles with the box part going from the lower (1st) quartile to the upper (3rd) quartile. A line is drawn at the median

Categorical Variable


(alternative names: qualitative variable, factor, class variable)

A variable whose values are names or codes for different groups (or categories)

Centre


(alternative names: average, location)

The idea of where the "middle" of the set of observations is

Cluster

A distinct grouping of values that is separated from other groupings of values

Dot Plot

A graph for displaying the distribution of a numeric variable. Each dot represents a single observation from a set of data. The form we use is a special case, a stacked dot plot

Entities


(alternative names: individuals, units, cases)

The individual "things" we are recording data about

Estimate

A number calculated from the data; used to estimate an unknown parameter value

False Negative

The individual has the condition but tests negative for the condition

False Positive

The individual does not have the condition but tests positive for the condition
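
The two error types above, together with the two correct outcomes, can be sketched as a small Python function that compares an individual's true condition with their test result. This is only an illustration: the function name `classify_test_result` and the example records are hypothetical, not taken from the source.

```python
from collections import Counter

def classify_test_result(has_condition: bool, tests_positive: bool) -> str:
    """Label one individual's outcome by comparing the truth with the test result."""
    if has_condition and tests_positive:
        return "true positive"
    if has_condition and not tests_positive:
        return "false negative"  # has the condition, but the test missed it
    if not has_condition and tests_positive:
        return "false positive"  # does not have the condition, but the test flagged it
    return "true negative"

# Hypothetical records: (has_condition, tests_positive) for five individuals
records = [(True, True), (True, False), (False, True), (False, False), (False, False)]
outcome_counts = Counter(classify_test_result(c, t) for c, t in records)
print(outcome_counts)
```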

Frequency


(alternative names: count, tally)

The number of times a value of a variable, or a category, occurs
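
As a minimal sketch, frequencies for a categorical variable can be counted with Python's standard `collections.Counter`. The eye-colour data below is hypothetical.

```python
from collections import Counter

# Hypothetical values of a categorical variable recorded for eight entities
eye_colour = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "green"]

# Frequency: the number of times each category occurs
frequencies = Counter(eye_colour)
for category, count in frequencies.most_common():
    print(category, count)
```

The frequencies always add up to the number of entities, which is what lets a bar chart show them either as counts or as percentages.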

Histogram

A graph made up of vertical rectangles that displays the distribution of a numeric variable. The range of the data is divided into class intervals which form the bases of each rectangle. The height of each rectangle is set so that the area of the rectangle represents the relative frequency with which values fall into that class interval
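
The "area represents relative frequency" rule can be sketched in Python: each height is the relative frequency divided by the interval width. The function name `histogram_heights`, the data and the interval edges are hypothetical, and the choice of half-open intervals (with the last interval closed on the right) is an assumption, since the definition does not say which interval a boundary value belongs to.

```python
def histogram_heights(values, edges):
    """Return one rectangle height per class interval so that
    height * width equals the relative frequency of values in that interval.

    Assumption: intervals are [edges[i], edges[i+1]), except that the last
    interval also includes its right edge so the maximum value is counted.
    """
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    heights = []
    for i in range(len(edges) - 1):
        lo, hi = edges[i], edges[i + 1]
        is_last = i == len(edges) - 2
        count = sum(1 for v in values if lo <= v < hi or (is_last and v == hi))
        relative_frequency = count / n
        heights.append(relative_frequency / (hi - lo))  # area = height * width
    return heights

values = [1, 2, 2, 3, 7, 8]
edges = [0, 5, 10]
heights = histogram_heights(values, edges)
areas = [h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights)]
print(areas)  # the relative frequencies, which add up to 1
```

Using area rather than height matters when class intervals have unequal widths: a wide interval would otherwise look more important than it is.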

Interquartile Range (IQR)

A measure of spread for a distribution of a numeric variable. It gives "the length of the middle half of the data". Calculated as the difference between the upper (3rd) and lower (1st) quartiles

Mean

A measure of the centre for a distribution of a numeric variable. The total of all values divided by the total number of values
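
A minimal Python sketch of this definition follows; the function name `mean` and the data are illustrative only (the standard library also offers `statistics.mean`).

```python
def mean(values):
    """Total of all values divided by the number of values."""
    if len(values) == 0:
        raise ValueError("the mean of no values is undefined")
    return sum(values) / len(values)

print(mean([2, 4, 4, 5, 10]))  # 5.0
```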

Median

A measure of the centre for a distribution of a numeric variable. The "middle value". It splits the data in half with half the observations at or above and half at or below
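
The "middle value" can be sketched in Python by sorting the data. When there is an even number of observations there is no single middle value; this sketch assumes the common convention of averaging the two middle values (the same convention used by the standard library's `statistics.median`). The function name and data are illustrative.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of no values is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))      # 3
print(median([7, 1, 3, 10]))  # 5.0
```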

Missing Value

No information has been recorded in this cell; that is, the value of this variable is unknown for this entity
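
To illustrate, missing values can be represented in Python by `None`; this is an assumption made for the sketch, as different software packages use their own markers. The data and the helper name `count_missing` are hypothetical.

```python
# Hypothetical data: each dict is one entity; None marks a missing value
rows = [
    {"name": "A", "height_cm": 172, "weight_kg": 65},
    {"name": "B", "height_cm": None, "weight_kg": 80},
    {"name": "C", "height_cm": 158, "weight_kg": None},
    {"name": "D", "height_cm": 181, "weight_kg": 77},
]

def count_missing(rows, variable):
    """Number of entities with no value recorded for the given variable
    (a value of None, or the variable absent altogether)."""
    return sum(1 for row in rows if row.get(variable) is None)

print(count_missing(rows, "height_cm"))  # 1
```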

Modality

Relating to the modes (peaks) of a distribution, i.e. the values around which observations are most frequent. Modality describes how many peaks the distribution has

(unimodal: 1 peak; bimodal: 2 peaks; multimodal: many peaks)

Nominal Variable

A categorical variable in which the categories have no natural order

Numeric Variable


(alternative name: quantitative variable)

A variable for which all of the values are numbers (e.g. from counting or measuring)

Oddities

Anything in the data that looks strange or odd. Things that make us wonder, "Is that a mistake?"

Ordinal Variable

A categorical variable in which the categories have a natural order

Outlier(s)

Value(s) that lie so far away from the bulk of the data that they look odd and make us wonder, "Is that a mistake?"

Overlap

A visual notion. The degree to which plots extend over common values

Overprinting

A problem with scatter plots when points sit on top of one another so that we are unable to tell how many points are sitting at a given position. This can lead to very misleading impressions of what the data is saying

Pie Chart

A graph for displaying the relative frequencies of a categorical variable. A circle is divided into sectors according to the relative frequency of each category

Proportion

A proportion refers to the fraction of the total that possesses a certain attribute
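
A proportion is a frequency divided by the total number of values. A minimal Python sketch, with a hypothetical function name and hypothetical data:

```python
def proportion(values, attribute):
    """Fraction of all values that equal the given attribute."""
    if len(values) == 0:
        raise ValueError("need at least one value")
    return sum(1 for v in values if v == attribute) / len(values)

smoker = ["no", "yes", "no", "no"]  # hypothetical data for four entities
print(proportion(smoker, "yes"))  # 0.25
```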

Quartiles

The values that separate a numeric distribution into four groups, each containing an equal number of values. The lower (1st) quartile is the middle of the lower half of the data and the upper (3rd) quartile is the middle of the upper half of the data
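
The "middle of each half" rule, together with the interquartile range (IQR), can be sketched in Python. When the number of values is odd, the definition does not say whether the median belongs to either half; this sketch assumes it is left out of both. Other software (for example the standard library's `statistics.quantiles`) may use a different convention and give slightly different values. Function names and data are illustrative.

```python
def median_of_sorted(ordered):
    """Middle value of an already-sorted list (mean of the two middle values if n is even)."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (lower quartile, median, upper quartile).

    Assumption: when n is odd, the median is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    lower_half = ordered[: n // 2]
    upper_half = ordered[(n + 1) // 2 :]
    return median_of_sorted(lower_half), median_of_sorted(ordered), median_of_sorted(upper_half)

q1, q2, q3 = quartiles([1, 3, 5, 7, 9, 11, 13])
print(q1, q2, q3)  # 3 7 11
print("IQR:", q3 - q1)  # IQR: 8
```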

Range

A measure of spread for a distribution of a numeric variable, calculated as largest value - smallest value
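
A one-line Python sketch of the range; it is named `data_range` (a hypothetical name) to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Largest value minus smallest value."""
    if len(values) == 0:
        raise ValueError("need at least one value")
    return max(values) - min(values)

print(data_range([12, 3, 45, 7]))  # 42
```

Because it depends only on the two most extreme values, the range is very sensitive to outliers, unlike the IQR.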

Rectangular Data

Data organised and stored in such a way that each row corresponds to an individual entity and each column corresponds to a property recorded for these entities
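
A minimal Python sketch of rectangular data: a list of rows, one per entity, each holding one value per column (variable). The column names, the data and the helper `get_column` are hypothetical.

```python
# Hypothetical rectangular data set: one row per entity, one column per variable
columns = ("student", "age", "programme")
rows = [
    ("S1", 19, "Science"),
    ("S2", 22, "Arts"),
    ("S3", 20, "Science"),
]

# Every row must record the same variables, in the same order
assert all(len(row) == len(columns) for row in rows)

def get_column(columns, rows, name):
    """All recorded values of one variable, one per entity."""
    j = columns.index(name)
    return [row[j] for row in rows]

print(get_column(columns, rows, "age"))  # [19, 22, 20]
```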

Risk

A way of expressing the chance that something will happen. Risk is the same as probability, but it is usually used to describe the probability of an adverse event

Risk: Absolute Risk

The probability or chance a person in a population will have a specified (medical) event. Usually expressed as a percentage

Risk: Relative Risk

A comparison of the risk of a particular event for two different groups of people, usually expressed as the ratio of the two groups' absolute risks
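
Absolute and relative risk can be sketched together in Python, treating relative risk as the ratio of the two groups' absolute risks. The counts below are hypothetical, and the helper name `absolute_risk` is illustrative.

```python
def absolute_risk(events, group_size):
    """Proportion of a group that experiences the event."""
    if group_size <= 0:
        raise ValueError("group size must be positive")
    return events / group_size

# Hypothetical counts: 20 of 1000 exposed people and 10 of 1000 unexposed people had the event
risk_exposed = absolute_risk(20, 1000)
risk_unexposed = absolute_risk(10, 1000)
relative_risk = risk_exposed / risk_unexposed

print(f"absolute risks: {risk_exposed:.1%} vs {risk_unexposed:.1%}")
print(f"relative risk: {relative_risk:.2f}")
```

Here the relative risk of 2 ("twice the risk") corresponds to an increase in absolute risk of only 1 percentage point, which is why both figures are worth reporting.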

Scatter

In a scatter plot, the extent to which the values of the response variable deviate from the trend

Scatter Plot


(alternative name: scatter graph)

A graph for displaying a pair of numeric variables in which points are plotted on a pair of axes to represent each entity. The coordinates of each point are the values of the two variables for that entity

Shape

Used to talk about the outline (or profile) of a plot of the distribution of a numeric variable

Side-By-Side Bar Chart

A bar chart for investigating the relationship between two categorical variables where, for each response (outcome) category in turn, we put all of the bars for the explanatory (predictor) categories together "side-by-side"

Skewed

The lack of symmetry in a distribution of a numeric variable. Positively skewed is when the data are piled up on the left and the tail extends out to the right. Negatively skewed is when the data are piled up on the right and the tail extends out to the left

Spread


(alternative names: variability, variation)

The idea of the degree to which values of a numeric variable differ from one another (vary) or, visually, are spread out along the axis

Stacked Bar Chart


(alternative name: segmented bar chart)

A graph for displaying the relationship between two categorical variables. Constructed by taking a bar graph for one categorical variable and subdividing each bar according to the percentages of the second categorical variable

Standard Deviation

Approximately measures the average of the differences (distances) between the observations and the mean
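
The usual formula squares each deviation from the mean, averages the squares and takes the square root. The Python sketch below assumes the sample version, which divides by n - 1 (as the standard library's `statistics.stdev` does); the function name and data are illustrative.

```python
import math
import statistics

def sample_standard_deviation(values):
    """Square root of the sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5
print(sample_standard_deviation(data))
print(statistics.stdev(data))  # standard-library version of the same n - 1 formula
```

For this data the plain average distance from the mean is 1.5, while the standard deviation is about 2.14, which is why the definition describes it as only approximately that average.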

Stem-and-Leaf Plot

A graph used to display numeric data. It is similar to a histogram but retains most or all of the numerical information
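
A text stem-and-leaf plot can be sketched in Python. This version assumes non-negative whole numbers, using the last digit as the leaf and the remaining digits as the stem; the function name and data are hypothetical.

```python
def stem_and_leaf(values):
    """Text stem-and-leaf plot for non-negative integers.

    Assumption: stem = all digits except the last, leaf = last digit.
    """
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows.setdefault(stem, []).append(str(leaf))
    lines = []
    for stem in range(min(rows), max(rows) + 1):  # keep empty stems so gaps stay visible
        lines.append(f"{stem} | " + "".join(rows.get(stem, [])))
    return "\n".join(lines)

print(stem_and_leaf([12, 15, 21, 23, 23, 38]))
```

Each original value can be read back from its stem and leaf (for example "2 | 133" holds 21, 23 and 23), which is the sense in which the plot retains the numerical information that a histogram discards.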

Subset

Used in this course in its everyday, nontechnical sense - a collection of things that is part of a larger collection

Subsetting

Dividing the entities in our data set into different groups (subsets) on the basis of their values for one or two subsetting variables. This allows us to make separate graphs of the same type for every subset and present them either as a matrix of tiled graphs, or by playing through them like a movie

Systematic Biases


(alternative name: systematic error)

Consistent biases caused by the way a system or process functions

Tile Density Plot

A tile density plot looks like a crude scatter plot. The plotting region is divided into a set of rectangular tiles. A tile that covers no data points is coloured white; a tile that covers data points is coloured with a depth of colour (darkness) determined by the number of points it covers
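
The counting step behind a tile density plot can be sketched in Python: each point is assigned to a tile by integer division of its coordinates, and the counts are then scaled to a darkness between 0 and 1. The function names, tile sizes and points are hypothetical, and linear scaling of darkness is an assumption; real plotting software may use other colour scales.

```python
import math
from collections import Counter

def tile_counts(points, x_min, y_min, tile_width, tile_height):
    """Count how many (x, y) points fall in each rectangular tile.

    Tiles are keyed by (column, row); a tile absent from the result holds
    no points and would be coloured white.
    """
    counts = Counter()
    for x, y in points:
        col = math.floor((x - x_min) / tile_width)
        row = math.floor((y - y_min) / tile_height)
        counts[(col, row)] += 1
    return counts

def darkness(counts):
    """Scale each tile's count to (0, 1]; the fullest tile gets 1 (darkest)."""
    most = max(counts.values())
    return {tile: n / most for tile, n in counts.items()}

points = [(0.2, 0.3), (0.7, 0.9), (0.4, 0.1), (1.5, 0.5), (2.8, 2.2)]
counts = tile_counts(points, x_min=0, y_min=0, tile_width=1, tile_height=1)
print(counts)
print(darkness(counts))
```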

Transparency

A technique used in scatter plots to deal with overprinting in which we make the symbols semi-transparent. Where there is a lot of overprinting, the symbols will be darker and where there are single or few points overprinted, the symbols will be lighter

Trend

The overall pattern between the two numeric variables displayed in a scatter plot

Variability

The extent to which we get different values for different individuals (or in some contexts, different values at different times)

Variable

A property that we record for each entity, e.g. a measurement, or one of a set of group labels (indicating categories)