209 Cards in this Set

L2: Sampling

.............................

What is sampling?

-Collecting information from individuals (a representative group) to make inferences about the larger population.

Benefits: Cheaper, easier and quicker than sampling the entire population. Trade-off: practicality (sampling) vs certainty (census).

What is a sampling frame?
-Sub-group within the overall population (collection of elements: objects of interest), eg: Otago Uni students
-Source material the sample is taken from (most appropriate for info wanted)
What are some problems associated with sampling?
a) Sampling variation: All individuals are different, so different people give different results (successive groups): variation within the frame.

b) Sample size: Increasing size gives better representation. Large samples are rare: balance what is possible against accuracy and decreased error.

c) Samples: need to be balanced/fair: >30 = representative. Need a statistically viable # of observations to be statistically representative of the population.
What are the 4 main types of statistical sampling methods?

a) Stratified sampling
b) Systematic sampling
c) Random sampling
d) Clustered sampling

Define stratified sampling, and describe its benefits and weaknesses?
Define: Uses prior knowledge of the population: Otago Uni students (60% females, 40% males = population parameters).
-Stratify the sample so its size fits these proportions.
-Each population member must fit into exactly 1 stratum.

BENEFITS: More precise estimates of population parameters.

NEGATIVES: Over-representing a stratum = bad.

eg: Class gender split close to uni average = extrapolate to overall student population according to question asked.
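A minimal sketch of proportional stratified sampling, assuming a made-up population of 60 females and 40 males to mirror the 60%/40% split above. The function name `stratified_sample` and the data are hypothetical, not from the lecture.

```python
import random

def stratified_sample(population, strata_key, sample_size, seed=0):
    """Draw a proportional stratified random sample."""
    rng = random.Random(seed)
    # Sort every member into exactly one stratum.
    strata = {}
    for member in population:
        strata.setdefault(strata_key(member), []).append(member)
    sample = []
    for members in strata.values():
        # Allocate places in proportion to the stratum's share of the population.
        n = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical population: 60 females and 40 males.
population = [("F", i) for i in range(60)] + [("M", i) for i in range(40)]
sample = stratified_sample(population, lambda member: member[0], 10)
counts = {g: sum(1 for m in sample if m[0] == g) for g in ("F", "M")}
print(counts)  # {'F': 6, 'M': 4}
```

Which individuals are picked depends on the seed, but the 6:4 split is fixed by the strata proportions.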
Define systematic sampling, and describe its benefits and weaknesses?
Define: Number individuals 1 to n, then select individuals at regular intervals (every 10th person).

Problems: Can dampen or amplify a regular pattern of variation in the population.
-Renders confidence intervals invalid:
ie: Tides (cyclic pattern gives a bad result); better to apply to something without a pattern.

eg: Question 2 people at end of each row of tables. Compare results to class (error between sample and class data = sample isn't perfect).
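A rough sketch of systematic sampling and of the cyclic-pattern problem. The `tide` values are invented purely for illustration and simply repeat every 10 readings.

```python
def systematic_sample(items, interval, start=0):
    """Select every `interval`-th item, beginning at index `start`."""
    return items[start::interval]

# Individuals numbered 1 to n (n = 50 here); take every 10th person.
people = list(range(1, 51))
every_tenth = systematic_sample(people, 10, start=9)
print(every_tenth)  # [10, 20, 30, 40, 50]

# Hypothetical cyclic data repeating every 10 readings, like a tide.
tide = [i % 10 for i in range(50)]
tide_sample = systematic_sample(tide, 10)
print(tide_sample)            # [0, 0, 0, 0, 0]
print(sum(tide) / len(tide))  # 4.5, the true mean the sample misses
```

Because the sampling interval matches the cycle length, every reading lands on the same point of the cycle, so the sample hides the variation completely.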
Define random sampling, and describe its benefits and weaknesses?
Define: Select members of population at random. Everyone has an equal chance of selection.

Not suitable for:
-Uneven distribution over whole population
-Time and cost (walk whole forest to sample trees)
-Cases where prior knowledge of the population could be used to increase efficiency.

Positives:
-No bias (from prior knowledge)
-Confidence in results
-Able to add random sampling to other methods to increase accuracy.

eg: Random sampling frame:
a) 1st 20 respondents = wearing blue pieces of clothing
b) Compare results to class census.
Define clustered sampling, and describe its benefits and weaknesses?
Define: Divides population into even groups, then randomly selects a # of groups for sampling (Dunedin divided into suburbs: sample a few)

Problems: -Only suitable if groups are heterogeneous (homogeneous = inefficient).

Benefits: Saves time / $ obtaining data.

ie: 1st 3 rows (cluster according to rows). Problem: Similar people sit next to each other: all from Southland etc.
What sampling method would you use for this problem?

- Want average trunk diameter for pocket of pine forest near Lake.
-Pine forest = 10,000m^2

Hypothesis: Sun will create larger diameters compared to areas with less sun.
-Use random and stratified:

Stratify according to areas (randomly select within each area to decrease # sampled).
L3: MINITAB-prelim data analysis
......................................
When doing prelim data analysis, what do we need to look for?
1) Normal distribution
2) Outliers/ unusual values
3) Understand variation and mean
4) Patterns within and between sets.
How on MINITAB do we get most basic form of stats overview for a dataset (must click options wanted in menu)?
Stat>Basic stats> Display descriptive stats
What do dotplots show?
Compare > 1 dataset
What do boxplots show?
Shows distribution of data (medians, quartiles, outliers)
What do histograms show?
Basic frequency data (GRAPH>HISTOGRAM).
So what is difference between bar charts/histograms and scatter X-Y plots?
-Bar charts/ histograms: Frequency data (counts)
-Bars touch in histograms (categories are continuous),
-Scatter X-Y Plots: Ratio data (scales): points can be joined as intermediate values are likely/can be estimated.
L4: Tests on 2 samples: T-test and Mann Whitney:
......................................
What is the central tendency?
Mean and median
What are the steps we must go through to check if parametric or non parametric test to undertake?
If any of answers are no: =Non parametric Mann Whitney test undertaken!

a) Are data measured on interval/ratio scales?
b) Are data normally distributed?
c) Are variances equal?
No=Unequal variance t-test
Yes=Pooled variance t-test.
What are the 3 fundamental types of number definitions?
a) Cardinal: How many: 1, 2, 3 (7 cars race)
b) Ordinal: Position of something (1st place)
c) Nominal: # used as name to i.d. subject (#99 to i.d. car)
What are the 4 types of data?
a) Ratio: 2:1 (50 kg is 2x 25 kg)

b) Interval: Difference between values: difference between 20 and 30 degrees same as 40 and 50 degrees, but 40 not 2 times 20 as no true 0 point.

c) Ordinal: Numerical scale used to rate (how hard something is compared to a lower rating: hard to tell by how much)

d) Nominal data: Categories/types, eg: soil types.
What data types do parametric and non-parametric have?
Parametric: Uses population parameters (mean, std deviation) estimated from sample data.
-Interval /ratio

Non-parametric (not population parameters; instead median, range, quartiles, ranks)
-Ordinal data.
What are the two hypotheses?
Ho: No stat diff between samples (=same)
Ha: Stat difference between samples (=not same)
What determines if data normally distributed?
a) Mean=median=mode
b) Know probability of getting a value a given distance from mean
c) % overlap of graphs.

USE: descriptive stats and dotplot.
What is variance?
-Mean of squared differences from mean
(mean - value 1)^2 + (mean - value 2)^2 etc., then divide by # of values.

-Variances same if within same range and normally distributed.

How do we test for variance? F-test:

F calc = S2^2 (larger variance estimate) / S1^2 (smaller variance estimate)

F crit: table 215-216

Outcomes:
F calc > F crit = variance estimates not equal
F calc < F crit = variance estimates statistically equal
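A small sketch of the F calc step, assuming two invented samples. It uses the sample variance (dividing by n - 1) from Python's `statistics` module. F crit still has to be looked up in the table for the right degrees of freedom.

```python
from statistics import variance

def f_ratio(sample_a, sample_b):
    """Return F calc: the larger sample variance divided by the smaller one."""
    var_a, var_b = variance(sample_a), variance(sample_b)
    return max(var_a, var_b) / min(var_a, var_b)

# Hypothetical samples for illustration only.
a = [2, 4, 6, 8]  # sample variance = 20/3
b = [4, 5, 6, 7]  # sample variance = 5/3
f_calc = f_ratio(a, b)
print(round(f_calc, 2))  # 4.0
```

Putting the larger variance on top means F calc is always 1 or more, which is how the table expects it.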
Give the overall characteristics of t test then Mann Whitney U-test?
T-test: (Parametric)
-Interval/ratio data
-Mean/std deviation

Assumptions: -Normally distributed population
-Random
-No outliers
-Equal variances

Minitab: Stat>Basic stats>2 sample t

Mann-Whitney: (Non-parametric)

-Ordinal data
-Tests whether ranks differ between samples or not
-Ranks entire dataset
-Used when t-test assumptions are not met

Assumptions: -Random samples
-Not affected by outliers
-Unequal variance

Minitab: Stat> non parametrics> Mann Whitney
How can we interpret the output of the test: 2 things look at?

1) Confidence interval:

-If 0 contained within = no evidence of stat difference between samples
-Excludes 0 = difference between sample sets

2) p-value: compared to level of significance (0.05 for a 95% C.I.)

p > 0.05 = fail to reject Ho (no evidence of stat difference between samples)
p < 0.05 = p low, reject Ho: stat difference between sample sets.

L5: Tests on Multiple samples
.............................................
Define the key terms of median, sample, null hypothesis, and parametric stats test?
Median: middle number when values are in order
Sample: Subset of the population
Null: No difference between means / no relationship between samples
Parametric: Assumes normal distribution
Why do we need to assess > 2 samples all at same time?
-Tedious to compare pairs as number of samples increases
-Increases error if we do this: Type 1 error: at p=0.05 there is a 5% chance of wrongly rejecting the null hypothesis in each test, and this accumulates as the number of tests increases.
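A quick sketch of how the Type 1 error accumulates. It assumes the pairwise tests are independent, which is a simplification; the function name is made up.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one Type 1 error across independent tests."""
    return 1 - (1 - alpha) ** n_tests

# Comparing 4 samples in pairs needs 6 separate two-sample tests.
error_6_tests = familywise_error(0.05, 6)
print(round(error_6_tests, 3))  # 0.265
```

So with 4 samples compared pair by pair, the chance of at least one false rejection is roughly 26%, not 5%.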
What checks should we go through before selecting parametric or not?
a) Nature of data: must be actual values, not ranks, for a parametric test

b) Distribution of data: normal

c) Errors/outliers: too swayed = not normal

d) Variances same across all samples

e) Sample size > or = 30 is best.
What is the step by step process to pick different tests ?
Kruskal Wallis/ moods median test (non parametric):
-Not normally distributed
-Variances not equal

ANOVA (parametric)
: Needs both normal distribution and variances equal
What are rules for normal distribution?
Mean=median=mode
-Can state probability of a given value lying a certain distance from mean.
-Can state likelihood, according to overlap, that samples come from same population
Define variance and how to get it, also in MINITAB?
Define: Mean of squared differences from the mean
Data = 1, 2 Mean = 1.5 Variance = ((1.5-1)^2 + (1.5-2)^2)/2 = 0.25

To have equal variances look at spread of data: 0-10 is different from 0-100

MINITAB: Stat>ANOVA> Test for equal variances
What does p value from Bartletts test tell us?
p< 0.05: 2 or more variances differ=no ANOVA
What is the one way analysis of variance (ANOVA)?
Null hypothesis: Samples randomly drawn from same population: no difference; if any, due to chance.

Test stat (F): Ratio of 2 kinds of variability in data: a) Between sample means (differences)
b) Within samples.

If F is small: consistent with null hypothesis: differences between samples < differences within.
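A minimal sketch of the F ratio as between-sample variability over within-sample variability. The three groups are invented, and the function name `one_way_f` is hypothetical (Minitab does this for you).

```python
def one_way_f(groups):
    """F = between-sample mean square / within-sample mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variability of the sample means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variability of the values around their own sample mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical data: the third group sits well away from the other two.
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
f_value = one_way_f(groups)
print(f_value)  # 21.0
```

Here the differences between samples are much larger than the differences within them, so F is large.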
How do we interpret ANOVA results?
p value: probability of getting results at least this extreme if Ho is true
If reject Ho: systematic relationship between numerical (soil moisture) and category variables (land use type)

Look at p value: low = reject Ho, and F value: high = reject Ho, as differences mean samples are not from the same population.
What does Tukey test tell us?
Compares difference in mean between each pair of samples.

ie: If means don't share a letter, they significantly differ.
What does Kruskal Wallis show (non-parametric)?
-Not normally distributed and variances not equal.
- The preferred non parametric test
-Tests observed differences in sample medians

COMPARES MEDIANS , NOT MEANS: RANK VALUES.
How do we interpret Kruskal Wallis output: Z score and H stat?
Z score: Measures relative difference between average rank of a group and overall average rank: used to interpret differences between groups.

Z score: > or = +/- 2 = group differs from others; groups within this range are clustered together.

H stat: accept/reject Ho depending on p threshold.
What is the moods median test?
Non parametric:
-Less preferred
-Calcs overall data set median: counts # values above/below this value in each sample.
-If groups similar, roughly equal numbers above/below in each (groups differ if > or = 1 has markedly more values above or below)
Comparison between moods median and Kruskal Wallis?
Moods median:
-More robust to outliers
-Less powerful
-Less sensitive to subtle patterns: increases Type 2 error.

Kruskal Wallis:
-More powerful, so the preferred test
-More affected by outliers.
L6: Chi-squared:
........................................
What is Chi squared test used for?
-Analyse frequency data (categorical/ordinal, not %)
-Test assoc b/w 2 variables
What are assumptions for chi squared?
-Random sampling
-Data pts independent measurements
What are 2 main uses of chi squared?
-Analyse 2 way tables:

a) Test factor relationships (height vs socioeco status : ie- calories vs persons height)

b) Test similar characteristics-separate samples (flower shape/colour): flower colour abundance at 3 different geographic locations. See if difference between abundance -are they separated or breeding/associated at all.
How do you calculate chi squared value (also overall) if looking at a table?
Have observed value for each cell. Expected frequency per cell: (Row total x column total) / (overall total) = expected count if no pattern.

Then calculate: (Observed-expected)^2 / Expected= Chi square for cell.

Chi-square total: Add up all chi-square cell values.
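The calculation above can be sketched as follows, assuming an invented 2 x 2 table of observed counts. The degrees of freedom use (rows - 1) x (columns - 1).

```python
def chi_square(observed):
    """Return (chi-square total, degrees of freedom) for a 2-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count for this cell if there is no pattern.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical observed counts.
table = [[10, 20],
         [20, 10]]
chi2, df = chi_square(table)
print(round(chi2, 2), df)  # 6.67 1
```

Every expected count here is 15, so each cell contributes 25/15. The total of 6.67 is above the 0.05 critical value for 1 degree of freedom (3.841), so Ho would be rejected for this made-up table.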
What does the difference between observed and expected frequencies mean?
Observed-expected:
-Small difference=NO PATTERN
-Large=PATTERN
What is the point in square difference and dividing by expected frequency?
a) Emphasis (square)
b) Divide: Standardise difference.
What is an example of a conclusion to be made between variables (soil pH and abundance)?
Low abundance associated with less acidic, more abundance at moderate to high acidity: diagonal pattern-does better in more acidic sites.
What is the calculated chi-square value compared to?
The critical value according to table: 95% confidence (significance level 0.05) and the degrees of freedom.

df: (rows-1) x (columns -1)
So what are the main steps in chi-squared?
1) Calculate each cell's chi squared:
Expected = (row total x column total) / overall total, then calc (Observed - expected)^2 / Expected

2) Total chi squared for table by adding cell values

3) Calc df: (rows -1) x (columns -1), look up the critical value and compare calc chi squared with it.

4) If calc > crit: we reject Ho, which says no relationship between species abundance and soil pH: as inspection of frequency tables suggests plant abundance is greater at more acidic sites: accept Ha.
What are some rules about expected value numbers?
-No expected value should be <1 (inflates chi-squared value)
-Prefer that < or = 20% of expected values are < 5.

Best solution: collect more observations.
L7: Time Series Analysis:
.............................
What is the definition of time series data?
-Observed over time (evenly spaced intervals: hrs etc)
-Analyse trends/patterns within data sets
-Forecast future values
What are some uses of time series data?
a) Need more info than provided by summary measures = mean, median, std deviation

b) Change in variable over time: temp over yr, river flow following rain

c) Comparison between data series: time series variation makes it difficult to determine whether 2 samples are part of same population: differences are not fully described by central tendency.

d) Compare between data series: Field camp data: location 6 warmer than location 2.
T-test indicates significant difference at p=0.01.

Distribution may be same but trends differ: time series shows this.
What is smoothing of data used for?
-Discrete data measured at narrow intervals leads to noise (variation) in dataset. Use moving averages method: smooth noise to find long term and proper trends.
-Reduces local variation swaying overall data
How to calculate moving averages?
a) Calc: divide dataset into overlapping subsets

b) Average values of each subset

c) Moving average is the series of these average values

ie: (14+12+16)/3 =14, then move down one in column.
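A short sketch of a 3-point moving average. The first three values come from the example above; the last two (18 and 11) are made up to show the window moving down the column.

```python
def moving_average(values, window=3):
    """Average each run of `window` consecutive values, moving down one at a time."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

temps = [14, 12, 16, 18, 11]  # 18 and 11 are hypothetical extra readings
smoothed = moving_average(temps)
print([round(v, 2) for v in smoothed])  # [14.0, 15.33, 15.0]
```

The smoothed series is shorter than the original by window - 1 values.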
What do we say with trend analysis?
Trend: General direction in which a series of observations increases or decreases: determines if change in the time series is systematic, or irregular variation (random)
-Use bivariate linear regression to test if trends are stat significant.
Give example of a trend analysis:
Temp (y) vs time (x):
Regression eqn:
Regression constant: 19.061
Regression coefficient=-017683 degrees celsius.
L8: Cluster Analysis
..............................
What is cluster analysis?
-Classification of objects /variables using maths. Data matrices used to i.d. groups.
-Assess similarity between individuals (comparisons)
What are 2 main types of cluster analysis?
a) linkage technique:
-Similarity measures-distance (geometric)
-Grp individuals must be similar
-Closest in mathematical space

b) Variance technique:
-Group individuals based on variances (ie: s^2): subsets within data
eg: Join individuals that give smallest increase in variance = most similar.
What are two types of data used in cluster analysis?
a) Ratio (continuous): weights
b) ordinal (coded): likert scale : 1 to 5.
What does 3D scatterplot do when have 3 variables?
-Shows the points plotted so we can see if clustered in groups, ie: group of 15 surveyed on preferences for characteristics of recreation areas in N.Z.: see which group likes:

Activities>Scenery>facilities.
What is the single linkage technique?
-Distance measured to calculate proximity of individuals: co-ordinates used as measure of similarity.
-Pythagoras theorem applied to co-ordinates to calc distance between individuals.
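A rough sketch of the distance step, assuming three invented individuals scored on 3 variables. `math.dist` applies Pythagoras in any number of dimensions; the labels and scores are hypothetical.

```python
import math
from itertools import combinations

def closest_pair(points):
    """Return the two labels whose co-ordinates are closest together."""
    return min(combinations(points, 2),
               key=lambda pair: math.dist(points[pair[0]], points[pair[1]]))

# Hypothetical scores for (activities, scenery, facilities).
points = {"A": (1, 1, 1), "B": (4, 5, 1), "C": (1, 2, 1)}
print(math.dist(points["A"], points["B"]))  # 5.0
nearest = closest_pair(points)
print(nearest)  # ('A', 'C')
```

The closest pair would be joined first; repeating the step on the remaining distances builds up the clusters.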
What is multiple linkage technique in visual form called?
Dendrogram:

-Creation of data matrix for all pairs in dataset.
-Graphed representation of groups (clusters) within data set: visual.
How do we interpret the dendrogram?
-I.D. groups of individuals within dendrogram below a chosen distance measure (above it, all groups are joined).
-2,3,4,5, or more grps: depends on interpretation.
-Individual types defined within groups-understand individual behaviour.

L9: Correlation:

..........................

What is point or reason behind correlation?

Measures strength of relationship between 2 CONTINUOUS variables (x and y).

-Not causal associations

What is the difference between concordant and discordant associations and what is the equation that links them together?

Concordant: X increase, Y increases (Plusses)



Discordant: X increases, Y decreases (minuses)



Equation: S=P (plus) - M (minuses) with S showing the monotonic dependence of Y on X.
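A minimal sketch of counting plusses and minuses over every pair of points, using invented data. The final line converts S to Kendall's tau by dividing by the number of pairs, n(n - 1)/2; that step is the standard tau formula for data without ties and is an addition, not something stated in these notes.

```python
from itertools import combinations

def kendall_s(xs, ys):
    """S = P - M: concordant pairs (plusses) minus discordant pairs (minuses)."""
    p = m = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x2 - x1) * (y2 - y1)
        if product > 0:    # X and Y move in the same direction
            p += 1
        elif product < 0:  # X and Y move in opposite directions
            m += 1
    return p - m

# Hypothetical data with one pair out of order.
xs = [1, 2, 3, 4]
ys = [10, 30, 20, 40]
s = kendall_s(xs, ys)
n = len(xs)
tau = s / (n * (n - 1) / 2)
print(s, round(tau, 3))  # 4 0.667
```

Here P = 5 and M = 1, so S is positive: Y increases more often than it decreases as X increases.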

What is the difference with x and y's if correlation is different?

Positive: Y's increase more often than decrease as X increases.



Negative: Y's decrease more often than increase.

What are the 3 steps in deciding which test for correlation to undertake?

1) Plot scatterplot and see distribution of graphed data.

2) Is the data linear or normally distributed (usually means will be linear association)?

3) Does data have outliers?

ONCE CHOSEN TEST:

4) See if correlation is +, -, or no correlation:

Calc S: S = P (number of plusses: Y increases as X increases) - M (number of minuses: Y decreases as X increases), get correlation coefficient and compare to table in back of book.

What are the conditions that must be met for the different tests for correlation=spearmans?

Spearmans (rho):

- Data normally distributed, few outliers.

or:

-Data not normally distributed without outliers (n>20)

-Uses ranks, not data values (calc ranks for X and Y values)

What are the conditions that must be met for the different tests for correlation= pearsons (r)?

-Normally distributed, no outliers


-Simple linear association



Assume: Norm distribution around 0, data values independent.

What are the conditions that must be met for the different tests for correlation=Kendalls (T)?

-Data not normally distributed and few outliers



or:


Monotonic association not linear.



-Uses ranks

Kendalls measures lower than others (0.7 actually = 0.9 of others).

What are the null and alternate hypotheses for correlation?

Null: No correlation between variables


Alternate: Correlation exists between X and Y variables.

What are Type 1 and 2 errors:

Type 1: Wrongly reject (reject null when true)



Type 2: Wrongly accept (accept null when false).

What analyses of correlation are we looking for?

p value low and R^2:

-Say if positive or negative association (negative: as x increases, y decreases)

-R^2: is strength of it, use judgement: strong or weak

-p value: see if correlation or not (reject null if p value <0.05).

What is significance of using PASW package for this data?

Correlates all 3 methods, whereas minitab only 1 (Pearsons correlation).

What graphs do you use for correlation?

a) Scatterplot of points (x vs Y)


b) Dotplots = see if normally distributed or outliers (beyond mean +/- 2 std deviations = outliers).

L10: Linear Regression

..............................

What is linear regression and give basic equation?

-Measure relationship between x and y



y= a (y intercept) + b( gradient -rise/run) x



or



y=mx + c



y depends on x (independent)


How do we go about creating a linear regression straight line?

-Get x and y coordinates and put line of best fit through them (make R^2 as high as possible)

What does a gradient of 0 mean?

There is no relationship between x and y.

What does R^2 stand for and what makes it higher value-closer to 1?

-Defined as estimate of the fit of the line to the observed points (how well line describes data) = % variability described by line.

-If measured y values are close to the y values predicted by the line, regression line is good fit = low residual values (difference between line-predicted and observed points).
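A small sketch of fitting y = a + bx by least squares and getting R^2 from the residuals. The data are invented and `fit_line` is a hypothetical helper, not the Minitab routine.

```python
def fit_line(xs, ys):
    """Return (a, b, r_squared) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # gradient
    a = mean_y - b * mean_x  # y intercept
    # Residual = observed y minus the y predicted by the line.
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    ss_error = sum(r ** 2 for r in residuals)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_error / ss_total

# Hypothetical data.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 7]
a, b, r_squared = fit_line(xs, ys)
print(round(a, 2), round(b, 2), round(r_squared, 3))  # 0.5 1.6 0.985
```

The residuals are small compared with the spread of y, so R^2 is close to 1.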

What are the null and alternate hypotheses for linear regressions?

Null: No relationship exists between x and y variables: gradient (b) = 0.

Alternate: Relationship exists between x and y variables: gradient (b) does not = 0 (as x increases, response from y).

What outputs do you produce in Minitab for regression?

-Scatterplot (see if relationship between dots)


-Regression-straight line.



How do we interpret residual plots (regression analysis)?

-Look at versus fits graph output



a) Satisfactory = consistent y variance scattered either side of line



b) Variance of y not consistent (widens at end)



c) Diagonal look= errors in calculation



d) Linear model not appropriate: rainbow look (frown).

What requirements must outputs in Minitab meet to use linear regression?

a) Normal probability plot: Points lie on straight line and residuals normally distributed



b) Histogram : Bell shaped (show normal distribution)



What values do we look for in the analysis?

a) T-value: From regression output: compare to t value in table (df = n - 2 for 1 x variable)

b) p value: Probability of results if null true: low = reject Ho: 2 variables likely related if p<0.05.

c) R^2 (coefficient of determination) = % variance in y values accounted for by regression relationship with x variable. As it increases towards 100%, x approaches a perfect indicator of y.

What is the regression sum of squares?

Variance accounted for by regression relationship

Error sum of squares:

Indicates variance unaccounted for by regression relationship.

What is the F-stat?

Decide if significant proportion of y variance explained by regression-if x and y relationship.

What can we do with equation we get from regression?

Insert values into x to find y using line of best fit.

Why do we use bands outside of line of best fit in data?

-95% confidence bands: for a particular x value, we are 95% confident the mean y value corresponding to that x falls within the band limits.

What are the assumptions for running linear regression, therefore making it different to correlation?

a) Residuals (errors) = mean of 0 and constant variance

b) Residuals = independent of each other (value of 1 unaffected by another)

c) Data values are normally distributed

If the p value was <0.05 and R^2 was 73.3%, what analysis would be had?

Strong, significant linear relationship between variables.

What is significance of F-ratio in table?

F ratio well above 1: p value likely to be low.

What is point of standardising residuals?

To ensure that although they have different measurement scales, they fit under normal distribution curve nonetheless and can be compared.

How do we calculate whether 2 straight lines have the same gradient?

t = (b1 - b2) / SQRT(Sb1^2 + Sb2^2)

Gives you t-calc, compare it to the t-crit in back of book (DF = n - 2, or n - # samples compared: usually 2).

b1 is hypothesised value (the one to compare against) and b2 is observed value (treatment).

Sb1 = SE Coefficient of b1 (then square it); Sb2 likewise for b2.

a) If t crit > t calc: null hypothesis true: line gradients (coefficients in front of x) are the same.

b) If t calc > t crit: gradients of lines differ: different rate of change.
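A quick sketch of the t calc step for two gradients. The gradients and SE coefficients below are invented; in practice they come from the two regression outputs.

```python
import math

def slope_t(b1, se_b1, b2, se_b2):
    """t calc = (b1 - b2) / sqrt(Sb1^2 + Sb2^2)."""
    return (b1 - b2) / math.sqrt(se_b1 ** 2 + se_b2 ** 2)

# Hypothetical gradients and their SE coefficients.
t_calc = slope_t(2.0, 0.3, 1.0, 0.4)
print(round(t_calc, 2))  # 2.0
```

The sign of t calc only shows which gradient is larger, so compare its absolute value with t crit.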



L11: Introduction to Excel:

.........................

Why is excel better than PASW/ good in general for modelling?

-Excel: Better as allows input of formulas to cells

-Simple graphs (simple tool for analysis/graphing) = MODELLING.

How do we lock constants/AVERAGE?

Lock constants:$A$2



AVERAGE: =AVERAGE(C2:C21).

What is the IF function in Excel?

Tests a condition = returns one value if condition is TRUE, another if FALSE. eg: =IF(A1>10, "Over 10", "10 or less") returns "Over 10" only when A1 > 10.

What do the RAND and RANDBETWEEN functions do, also VLOOKUP function?

a) RAND: Random number generator

b) RANDBETWEEN: Gets random numbers between values specified.

c) VLOOKUP(E11, $H$5:$I$7, 2, TRUE): Looks at value in E11 and compares it to values in cells $H$5 to $I$7 (locked range); 2 means return from 2nd column (I); eg: if E11 > 1.3, "stable" returned to E12 = WAY OF LINKING DIFFERENT CELLS AND FORMULAS TO GET OUTCOME.

What is point of transpose in excel?

-To turn columned data into rows or vice versa.

What does Replace tool allow us to do?

-Find and Select (Click replace):



Find values and replace all with same code or #.

L12: Modelling with excel:

.....................

Define models:

Simplified representation of reality


-Classifies core components in real world and natural system.

What are the 2 main model types ?

a) Empirical: Stats/math


b) physical: Based on law of physics

What are models used for mainly?

Predictions (extrapolate values into the future) = what if scenarios.

What is the main problem with models?

-If you have bad inputs= bad outputs.

Why is excel useful for modelling?

-Formulas in cells linked: effectively simulates system with component relationships.

eg: slope stability using Infinite Slope Model and Safety Factor value = max slope angle and main stability factors intertwined; outputs show what changing each variable does.

What is sensitivity analysis?

Sees how one variable changes as another changes:



eg: SF decrease as slope angle increases.

L13: Geographical Information Systems:

........................

Define GIS:

Computer assisted systems:



-A cquisition

-S torage

-I ntegration

-A nalysis

-D isplay

Geographical data-spatial analysis tool: mix differing dates and sources into 1 map.

What are the 5 main components of GIS?

H ardware: Computer with GIS

S oftware: With functions/tools to store, analyse, display geographical information: input and manipulate.

D ata: Geographical and tabular: collected within or purchased elsewhere: mix sources and integrate them into 1.

P eople: Users: technical specialists who design and maintain.

M ethods: Successful = well designed plan/ business rules.

What is the difference between attribute and spatial data in GIS (give example):

Attribute:


-Non spatial, descriptive information on characteristics of spatial features



eg: Mountain height



Spatial:


-Information on location/relative position of geographical features.

What is the difference between vector and raster spatial data representation?

Vector: IDRISI


-Series points joined straight lines=feature.


-Points encoded as X and Y coordinates: latitude and longitude


-Info stored in database table




3 main types=


a) Points


b) lines


c) polygons



eg: Distinct boundaries/ spatially discontinuous features (roads)






Raster:


-Map in mesh form (grid cells=pixels)-numerically coded


-Each cell holds a feature identifier, qualitative attribute code (different land use classes) or quantitative attribute (elevation of cells)



-Different colour=different layer feature



eg: Spatially continuous data without distinct boundaries (NZGrid system)

How does GIS integrate data?

-Brings old/new info/ descriptive: 1 location- integrates in depth analysis.

What is the difference between GIS and computer aided mapping?

CAM: Users generate high quality maps/ spatial analysis


GIS: Analytical tool/integrates disparate data types from wide range of sources.



GIS used regional councils: i.d. road/ crashes.

What are general uses of GIS in N.Z.?

- Model complex enviro systems



PROFESSIONAL: Coastal change in dune system (Stewart Island marram grass control project)


What does the GIS system use as inputs?

Data, people, software, hardware, methods.

L14: GIS-Exploring the Map:

.................................

What is the use of the METADATA MODULE?

-Obtain a list and descriptions of the files within the working directory= get coordinates of location on N.Z. grid map system.



-Also allows us to look at available vector files

What does the DISPLAY LAUNCHER MODULE DO?

It allows us to display visually the vector and raster maps within the IDRISI database.

How do we go about changing the visual display of a raster/vector map and how do we add layers to an original map?

USE MAP COMPOSER tool:



a) Map properties: change palette etc=changes display for map



b) Add layer (vector layer usually onto a raster map)=eg: coast fit onto raster or base map =not VICE VERSA!

What is the role of CURSOR INQUIRY TOOL?

We click it so that we can go onto map and see the co-ordinate description of where our mouse or cursor is positioned.

What is the role of the ZOOM IN/OUT MODULE?

In order to zoom in and out of the maps to get closer or further away look at it.

What is the role of the RECLASSIFY MODULE?

Allows us to reclassify areas within the raster map -separate them into 2 categories (1= meets requirements we want, 0= doesn't meet requirements we want):



ie: <500m elevation is 1 (all values of 1 to just less than 501); >500m elevation is 0



When using reclass, must be (1= values between 0 and 2.51, 0 with 2.51 to 999)

What is the use of the AREA MODULE?

It allows us to extract information of the area (m^2) within each category of 1 or 0= output is in tabular form (table form). Gives area within above categories of <500m or >500m elevations.
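A rough sketch of what the RECLASSIFY and AREA steps do to the pixels, written in plain Python rather than IDRISI. The tiny elevation grid and the 25 m cell size (625 m^2 per cell) are assumptions for illustration only.

```python
def reclassify(raster, threshold):
    """Code each cell 1 if it is below the threshold, otherwise 0."""
    return [[1 if cell < threshold else 0 for cell in row] for row in raster]

def area_by_class(boolean_raster, cell_area):
    """Return the total area (m^2) of cells in each category."""
    counts = {0: 0, 1: 0}
    for row in boolean_raster:
        for cell in row:
            counts[cell] += 1
    return {category: n * cell_area for category, n in counts.items()}

# Hypothetical elevations (m) on a 2 x 3 grid of 25 m x 25 m cells.
elevation = [[120, 480, 650],
             [300, 510, 900]]
low_ground = reclassify(elevation, 500)
print(low_ground)  # [[1, 1, 0], [1, 0, 0]]
areas = area_by_class(low_ground, cell_area=625)
print(areas)  # {0: 1875, 1: 1875}
```

The output dictionary plays the role of the tabular AREA output: one area figure per category.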

RANDOM DEFINITIONS

.........................

What is the IMAGE CALCULATOR tool?

Multiply layers together or bring aspects together


What is the BUFFER MODULE used for?

determines the pixels within certain distance of a feature, all else is disregarded.



eg: Create 600m buffer around image/feature (roads vector image):



Assign 0 to target areas (roads), 0 to non buffer zone (outside 600m distance)



Assign 1: To buffer zone (within 600m of roads).




OVERALL: Generally use if description says WITHIN certain distance of vector feature.



Also: If says > or = 200m away from existing building: want to i.d. areas outside buffer zone:


Assign 0: target areas, buffer zones


1: Non-buffer zone.

What is the role of the DISTANCE MODULE?

Assigns a value to each pixel representing its distance from a feature: input image is the feature we want to measure distance from (ie: roads) and output is distance.

Must use EXTRACT MODULE however to get figures from this, such as average distance from roads.

What is a palette?

A palette is simply the colour scheme we choose to use that best represents the information we are attempting to illustrate. Different types of info need to be displayed in different colours.


What is the REGRESS MODULE used for?

-See if there is a relationship between 2 variables = gives output with equation.



eg: Have equation and have values for x so find values of y and apply to a map= eg: extract evaporation using elevation (x) measurements onto a map.

What is the use of the SURFACE MODULE?

Takes a topographical (elevation) raster map and allows us to convert it into slope etc features.



eg: Input is elevation; output is slope (in degrees).

What is the OVERLAY MODULE used for?

Used to join maps in pairs based on Boolean operators (overlay Boolean images onto eachother)

What is the LINERAS MODULE used for?

Converting vector (line) data into raster data.

What is the use of the HISTO tool?

Displays histogram of image pixels in relation to their assigned values and also outputs basic stats. Represents or shows the pixels in boolean image: 0 doesnt meet criteria vs 1's that meet criteria set out.

What is the role of the CROSSTAB TOOL/MODULE?

Puts layers together but keeps original values (ie: F in top left corner of one combined with 1 in top left corner of other = F1 in new map).

What is the role of EXTRACT tool?

-Takes summary statistics

What is the role of the ASSIGN MODULE/tool?

-Assigns value to pixels-similar to reclass but uses data values (1-4)= more than 2 classes.

What is a boolean image?

Image that has had areas within it defined as excluded or included (reclass function etc).

Give example of how to overlay multiple BOOLEAN IMAGES to create a composite suitability (meets requirements outlined) image?

Use IMAGE CALCULATOR MODULE:



Output file (sites filename:-ones can be used once all requirements met)= (slopercl) * (roadbuf)



rcl= reclassed slope map


buf= buffer roads.



Pixels=0: Don't meet criteria so sites not used,


1: Does meet criteria (sites used therefore).
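A minimal Python sketch of the same multiplication, assuming two tiny hypothetical Boolean rasters stored as nested lists (the cell values are made up; slopercl, roadbuf and sites are the names from the card). A cell is 1 in sites only where both inputs are 1.

```python
# Hypothetical 3x3 Boolean images (1 = meets criterion, 0 = does not).
slopercl = [[1, 1, 0],
            [1, 0, 0],
            [1, 1, 1]]
roadbuf = [[1, 0, 0],
           [1, 1, 0],
           [0, 1, 1]]


def multiply(image_a, image_b):
    """Cell-by-cell product of two equally sized rasters."""
    return [[a * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]


sites = multiply(slopercl, roadbuf)
for row in sites:
    print(row)
```

Multiplying works as a Boolean AND here because any 0 in either input forces the product to 0.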

What is the role of the GROUP MODULE?

Allows us to group adjacent pixels into sites (1's) so we can see which meet the size requirement= combines touching pixels, but does not allow diagonal pixels to come together.



NB: Pixels must have met requirement (=1).
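A rough Python sketch of the grouping idea, assuming a small hypothetical Boolean sites raster. It only labels the 1 pixels and only joins neighbours that share an edge (no diagonals); it is an illustration of the concept, not IDRISI's own algorithm.

```python
from collections import deque


def group(sites):
    """Label edge-connected groups of 1 pixels; returns (labels, group count)."""
    rows, cols = len(sites), len(sites[0])
    groups = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if sites[r][c] == 1 and groups[r][c] == 0:
                next_id += 1
                groups[r][c] = next_id
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    # Up, down, left, right only: diagonals do not join.
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and sites[ny][nx] == 1
                                and groups[ny][nx] == 0):
                            groups[ny][nx] = next_id
                            queue.append((ny, nx))
    return groups, next_id


# Hypothetical sites image: the pixel at row 0, col 1 only touches
# the pixel at row 1, col 2 diagonally, so they stay in separate groups.
example = [[1, 1, 0],
           [0, 0, 1],
           [0, 1, 1]]
labels, count = group(example)
```

Counting the pixels carrying each label (times the pixel area) would then give the size of each site.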

What are BOOLEAN OPERATORS?

OR, AND, NOT. 1= matches up with the aims



OR= Meet conditions for 1 or both options


AND= Must contain both options/conditions


NOT= Meets one condition but not the other (first but not second)
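A small Python sketch of the three operators applied to single 0/1 pixel values. The function names are hypothetical, and NOT is read here as "first condition but not the second", which is an assumption about how the card means it.

```python
def bool_and(a, b):
    """1 only when both pixels are 1."""
    return a * b


def bool_or(a, b):
    """1 when either pixel (or both) is 1."""
    return max(a, b)


def bool_not(a, b):
    """1 when the first pixel is 1 and the second is 0."""
    return a * (1 - b)


# Truth table over every combination of two Boolean pixels.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
for a, b in pairs:
    print(a, b, bool_and(a, b), bool_or(a, b), bool_not(a, b))
```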

L17: GIS in Agriculture:

........................

What is the difference between site selection and classification?

Site selection: Use boolean operators and raster overlays/ vector overlays to meet the criteria given at start (code with 1 or 0 dependent on whether met criteria or not).



Classification: All pixels (not just few that meet the criteria)= final image: numeric code signifies combo of moisture and temperature zones falls into (63 unique codes: 7 moisture zones, 9 temperature zones)



eg:


-63 combos for classification


-0 or 1 for site selection.


How can we integrate an equation and maps using IMAGE CALCULATOR to create a new map?

EQN: Moisture availability= mean annual rainfall/ potential evaporation



GOT THE EVAPO MAP BY USING REGRESSION EQN



Therefore: If we have evaporation and annual rainfall raster images- we can use IMAGE CALCULATOR to get map with moisture availability:



IMAGE CALC =NRAIN/EVAPO
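A minimal Python sketch of that cell-by-cell division, assuming two tiny hypothetical rasters of rainfall and evaporation (values are invented, in the same units). NRAIN and EVAPO are the image names from the card; the zero check is an added assumption to avoid dividing by zero.

```python
# Hypothetical 2x2 rasters (same units, eg: mm per year).
nrain = [[800.0, 1200.0],
         [600.0, 1500.0]]
evapo = [[1000.0, 800.0],
         [1200.0, 750.0]]


def divide(numerator, denominator):
    """Cell-by-cell ratio; cells with a zero denominator become None."""
    return [[n / d if d != 0 else None for n, d in zip(n_row, d_row)]
            for n_row, d_row in zip(numerator, denominator)]


moisture = divide(nrain, evapo)
for row in moisture:
    print(row)
```

Values above 1 mean rainfall exceeds potential evaporation in that cell.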

What must we do prior to deriving a statistical relationship between data?

eg: elevation and mean annual temperature:



Say what general relationship is:


-The higher the station is, the lower the mean annual temperature.

What is the REGRESS MODULE used for?

-Describes relationship between two variables statistically= analyses relationship between either two images or two attribute values files (data in columns).



-Must look at R^2: shows strength of relationship. The sign of the slope (or r) shows whether it is negative or positive.
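A short Python sketch of a simple least-squares regression, assuming made-up station data for elevation (x) and mean annual temperature (y). It returns the equation plus r and R^2, and then applies the equation to a hypothetical elevation raster to estimate a temperature map.

```python
import math


def regress(x, y):
    """Least-squares line y = intercept + slope * x, plus r and R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r, r * r


# Hypothetical stations: elevation in m, mean annual temperature in deg C.
elevation = [10, 150, 300, 450, 600]
temperature = [12.1, 11.0, 10.2, 9.1, 8.3]
slope, intercept, r, r2 = regress(elevation, temperature)

# Apply the equation to a hypothetical elevation raster.
dem = [[0, 200], [400, 800]]
temp_map = [[intercept + slope * z for z in row] for row in dem]
```

With these numbers the slope and r are negative (higher stations are colder) and R^2 is close to 1.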

How do we use the RECLASSIFY MODULE to reclassify more than one zone in a raster map?

Assign 1 to the values you want data to fall between, but 0 to all the rest and do this multiple times.

What is the use of the CROSSTAB MODULE?

Creates final map with unique combinations of moisture and temperature (combined: A1, B4 etc- according to the zones classified earlier).



= get cross-classification and tabulation output types- shows # cells that fell into each of 63 possible unique combinations of temp and moisture availability.
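A small Python sketch of cross-classification and tabulation, assuming two tiny hypothetical zone rasters (letters for moisture zones, numbers for temperature zones). It is an illustration of the idea only, not the module's actual output format.

```python
from collections import Counter

# Hypothetical zone rasters of the same size.
moisture_zone = [["A", "A", "B"],
                 ["B", "C", "C"]]
temperature_zone = [[1, 4, 4],
                    [4, 2, 2]]

# Cross-classification: each cell keeps both original values (eg: "B4").
cross = [[f"{m}{t}" for m, t in zip(m_row, t_row)]
         for m_row, t_row in zip(moisture_zone, temperature_zone)]

# Tabulation: number of cells falling into each unique combination.
table = Counter(code for row in cross for code in row)
for code, cells in sorted(table.items()):
    print(code, cells)
```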

L15: Analysis in GIS

.................................

What is the role of the WINDOW MODULE?

Allows us to trim the initial maps so that only the study area is displayed and analysed. THIS IS DIFFERENT TO EDIT MODULE WHEREBY WE WERE ONLY CREATING ONE MAP AND TURNING IT INTO VECTOR MAP- here we take >1 map, selecting the same area from all of them.

How do we create a blank raster map?

Needs to be done before we convert vectors to raster.



-Undertaken using the IDRISI FILE EXPLORER-:



eg: create a new file name called roads (VECTOR FILE)

How do we then convert the roads vector file to a raster file?


HINT: Use the LINERAS MODULE

Use the LINERAS MODULE: roads as input vector and roads as the image file to be updated.

Therefore, what is role of the LINERAS MODULE?

Convert vector files (roads) to raster files

What is the ASSIGN MODULE used for?

Assigns new classification (A1 etc) values to the soil map

What is the purpose of the DISTANCE MODULE?

-Attributes data to each pixel in form of distance= attribute data for each pixel is distance of pixel from any road.



Output is a distance file.
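A rough Python sketch of the distance idea, assuming a tiny hypothetical roads raster (non-zero = road) and a hypothetical 100 m cell size. It uses a brute-force search for the nearest road cell, which is fine for illustration but is not how a GIS would do it on a large image.

```python
import math


def distance(feature, cell_size=1.0):
    """Distance from each cell to the nearest non-zero feature cell.

    Assumes the feature image contains at least one non-zero cell.
    """
    targets = [(r, c)
               for r, row in enumerate(feature)
               for c, value in enumerate(row) if value != 0]
    return [[min(math.hypot(r - tr, c - tc) for tr, tc in targets) * cell_size
             for c in range(len(row))]
            for r, row in enumerate(feature)]


# Hypothetical roads image with a single road cell in the top-left corner.
roads = [[1, 0, 0],
         [0, 0, 0]]
dist = distance(roads, cell_size=100.0)
```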

What is the purpose of the EXTRACT MODULE?

Use feature image (soil reclassified image) to get info about soil orders from other map layers (ie: wdem=image processed-get information from):



-Can get average , min, max of data.


-Can use it to calculate average distance (km) from roads for each of soil orders


-Calculate the number of houses in each of the soil orders.
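A minimal Python sketch of the average-per-zone idea, assuming a hypothetical feature image of soil order codes and a hypothetical distance image of the same size. The function name extract_mean is made up for illustration.

```python
def extract_mean(feature, image):
    """Mean of the image values within each zone of the feature image."""
    totals, counts = {}, {}
    for f_row, i_row in zip(feature, image):
        for zone, value in zip(f_row, i_row):
            totals[zone] = totals.get(zone, 0.0) + value
            counts[zone] = counts.get(zone, 0) + 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


# Hypothetical soil order codes and distances from roads (m).
soil = [[1, 1, 2],
        [2, 2, 1]]
road_distance = [[0.0, 100.0, 200.0],
                 [100.0, 300.0, 200.0]]
mean_distance = extract_mean(soil, road_distance)
print(mean_distance)
```

Swapping the mean for min, max or a sum over a Boolean houses image gives the other summaries listed on the card.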

L16: An Introduction to Cartographic Modelling:.

............................

How would you go about getting the following restrictions for sites onto one composite suitability image and finding areas of grouped sites big enough to meet conditions?



a) Slope <2.5 degrees


b) Within 600m of road (access)


c) Within 400m major river


d) > or = 200m away from existing buildings


e) > or = 0.6 sq km in area


a) SURFACE MODULE (use input elevation model and create slope image). Reclass this to assign values 0 to 2.51 as 1's and 2.51 to 999 as 0's.



b) BUFFER MODULE: Around wroads vector image=


0 to target area/ non buffer


1 to buffer zone.



THEN OVERLAY ALL WITH IMAGE CALCULATOR.


-Also, must use GROUP MODULE= input sites and output groups.


-Use AREA MODULE: Input file= groups, output=sites. Use IMAGE CALC: Multiply areas by sites.


-LAST: RECLASS MODULE: 0 to all values <0.6. See which ones can build on!



HISTO allows us to check roadbuf (counts of 0's and 1's).




c) Reclassify image so only getting features under 12 showing up: (0-12 and 12-1000 as 0's), 12=1. Create 400m buffer: 0 to target, 0 to non buffer, 1 to buffer.



d) BUFFER MODULE: 200m. HOWEVER, here the coding is reversed: 0 to target, 0 to buffer zone, 1 to non buffer (sites must be away from buildings).

L18: Using Remote Sensing data and techniques in GIS:

.........................................

What is job of EXTENDED CURSOR INQUIRY TOOL and how to combine layers to allow this to work using COLLECTION EDITOR?

See pixel values from multiple layers of raster maps.



COLLECTION EDITOR: Combines all map layers together to be analysed by Cursor Inquiry.

What is role of the COMPOSITE MODULE?

-Creates composite false colour image from 3 spectral bands. Allows us to i.d. land use more easily by eye. Different maps= different colours.



j42comp= the 'comp' shows it has been made into a COMPOSITE image.

How does CLUSTER MODULE intertwine with unsupervised classification?

3-d histogram= 3 bands in false colour composite image: searches for distinct peaks and classifies image. Breaks histogram into small-range clusters around histogram peaks. Fine mode creates more clusters than broad.

What does supervised classification need ?

-Needs a vector polygon file of training sites: requires user to i.d. training sites based on ground truth data.



-Need to overlay the training sites as vector polygon files over the original raster image.

What is the role of the MAKESIG MODULE?

Needed to take training sites (available as vector polygons) and teach IDRISI the spectral signature (combined reflectance pattern in all three bands) of each site.



-Creates different signatures (given the numeric codes that are already assigned to the training site polygons or specific soil types).

What is role of the SIGCOMP MODULE?

Choose signature files with means.



-Get graph: mean pixel value of each training site for each image band (brightness of each land use category in each colour range)

What is the MAXLIKE MODULE used for?

Classifies the spot imagery.


-Classifies pixels based on ground truth data provided, use classification algorithm (calc)=maximum likelihood classifier.

What is supervised classification?

User develops spectral signatures for known categories; software assigns each pixel in the image to the category whose signature it is most similar to. Basically requires ground truthing classification within polygons.

What is ground truthing?

Real world: need land-based observations of land cover in the study area to facilitate/verify the classification process.


-Use remote sensing to create prelim classification, then visit a number of random observation points and collect relevant information; also visit boundaries on the prelim map:



SUPERVISED CLASSIFICATION

What is the DIGITIZING TOOL used for?

Process of tracing vector shapes into digital files, done using a digitizer: records the coordinate pairs of every point drawn on the tablet with a precise pointer.

What are training sites for supervised classification?

Go out into the field and gather data to make educated assumptions about trends in an area.

Remote sensing-what is it?

Use a radar device to scan the earth and obtain a graphical image- acquire info about it without touching it.

What is difference between analogue and digital data?

Analogue: Aerial (get analogue in GIS database) scan it.



Digital data: Satellite imagery.



What is a signature file?

Made from training sites available as vector polygons: teaches the GIS program what the pixels within each polygon represent.

L19: Using GIS in Field Research:

.........................

How would you go about finding a specified site on a topo map when given coordinates of the site?

Use CURSOR INQUIRY tool following getting image onto document using DISPLAY LAUNCHER.

How would we go about creating a new map (singular)- ie not WINDOW MODULE for multiple, but create vector points onto new map?

1) Data entry into EDIT MODULE= includes coordinates of site, polygon, NZGRID reference system etc.= save as vector export file.



2) Convert this file using CONVERT MODULE to binary vector file.



3) INITIAL MODULE is used to create a blank map of study area. Integer data output- has grid system to help i.d. points: NZ Map Grid system (output reference info).



4) Need to use SAMPLE MODULE to generate a random set of sample points across the study area (represented as dots, and generated via this MODULE's stratified sampling ability)= vector map output called samples is created.


How would you go about putting this samples map onto the aerial photo?

1) Use MAP COMPOSER and add layer of samples to it.



2) Need to add attribute data to these points or they are meaningless: use EDIT MODULE and type in Cd numbers corresponding to each numbered surveyed site.



3) Use INTERPOL MODULE: using data values file combined with the input vector file of samples= get output image that interpolates (estimates) Cd values outside of the pixels where info is already given.

What is the role of the EDIT MODULE?

Allows us to input data and create either a vector export file or a values file= vector: define boundaries numerically and create vector export file (smaller version of big map, just of study area), which can then be converted to a binary vector file and have attribute data input onto its points.

What does the CONVERT MODULE do?

Takes data files created in vector form by EDIT MODULE and converts to binary vector file= one that can be displayed in IDRISI.



What is the role of the INITIAL MODULE?

Creates a blank map from the file produced by the CONVERT MODULE. Creates the grid system, basically preparing the map for use elsewhere.

What is role of the SAMPLE MODULE?

Uses image created in INITIAL to generate random samples within the grid using stratified random sampling technique-creates file within.

What is the role of the INTERPOL MODULE?

Uses samples (with values input into them using EDIT MODULE) to interpolate or estimate the rest of the Cd values in soil when only 30 specific sites were sampled in the area.
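A rough Python sketch of interpolation between sample points, assuming inverse distance weighting (one common method; the cards do not say which method INTERPOL was run with). The sample coordinates and Cd values are made up.

```python
import math


def idw(samples, x, y, power=2):
    """Inverse-distance-weighted estimate at (x, y).

    samples is a list of (sample_x, sample_y, value) tuples.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return value  # exactly on a sample point: use its value
        weight = 1.0 / d ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical sample sites: (x, y, Cd value).
samples = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0)]
midpoint = idw(samples, 1.0, 0.0)
on_sample = idw(samples, 0.0, 0.0)
```

Nearer samples get larger weights, so a point halfway between two samples takes the average of their values.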

EXTRA QUESTIONS

.......................

What are the 2 types of data that will call for a non-parametric test on 2 samples (non-parametric because we don't assume any particular distribution)?

Logarithmic (pH)



-Ordinal: Almost a rank that has no specific quantifiable numerical value (ranks 1-5 on how good house is, very hard to define how much go up between 1-2 compared to 4-5). Items on scale set into some kind of order by position on scale-superiority. Numbers assigned to show relative position (1st, 3rd, 5th or A,B,C).



-Nominal: Items differentiated by simple naming system (set of countries).

What 2 types of data allow a parametric test on 2 samples (parametric because we assume the data are normally distributed)?

-Interval -scale at which each point is equidistant from one another. =eg: Temperature.



-Ratio: Numbers compared as multiples of one another (one person twice tall as other).



How to work out DF for tests on 2 samples?

n-1 for both sides of F-crit table!

What are the assumptions of non parametric side of two sample t test?

-No effect outliers


-Random


-Unequal variances???

How do we work out DF for two sample t-test (pooled variance or parametric test)?

Combine the sample sizes of both sample lots and subtract the number of sample datasets present: (6+6)=12 -2 =10, and use 0.05 significance to get t-critical value.
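A short Python sketch of the pooled-variance two sample t test and its DF, assuming two made-up samples of 6 values each (so DF = 6 + 6 - 2 = 10, as on the card). The function name pooled_t is hypothetical.

```python
import math
import statistics


def pooled_t(sample_a, sample_b):
    """Pooled-variance two sample t statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    var_a = statistics.variance(sample_a)  # sample variance (n - 1)
    var_b = statistics.variance(sample_b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_error
    return t, df


# Hypothetical data: two sample lots of 6 observations each.
lot_a = [1, 2, 3, 4, 5, 6]
lot_b = [3, 4, 5, 6, 7, 8]
t_value, df = pooled_t(lot_a, lot_b)
print(round(t_value, 3), df)
```

The absolute t value is then compared with the t-critical value for that DF at 0.05 significance.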

What are differences in key word when go to undertake test on 2 samples between mann whitney and two sample t test?

Mann-Whitney: using medians



Two sample t test: using means


How do we calculate DF for chi square?

df= (rows -1) x (columns -1)


Must use the 0.05 significance at top.


SHOULD GIVE US THIS IN OUTPUT.
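A small Python sketch of the DF rule, assuming a made-up 2 x 2 table of observed counts. It also works out the chi square statistic from the usual expected counts (row total x column total / grand total), so the DF can be seen in context.

```python
def chi_square(table):
    """Chi square statistic and DF for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical observed counts: 2 rows x 2 columns.
observed = [[10, 20],
            [20, 10]]
chi2, chi_df = chi_square(observed)
print(round(chi2, 3), chi_df)
```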

How do we get DF for linear regression?

df= n - # variables.
Using it for t-value.

What R^2 value must we use when looking at regression table/ also take data from table, what line do we look at?

R^2 (adj)



-Not the constant line but the other predictor variable.

How do we calculate the DF for when testing if 2 linear slopes have same gradient or not?

n (number of samples from original or observed dataset) - 2 (number of datasets compared). Needed to find t critical.

What does SB1 in the equation to see if two lines have same slope mean (also write eqn)?

SB1 is SE Coefficient, whilst B1 is the coefficient



EQN:



t= (b1 - b2) / SQRT(Sb1^2 + Sb2^2)



b1 is always hypothesised value (NOTE).
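A minimal Python sketch of that equation, with the difference in slopes divided by the square root of the summed squared SE coefficients. The slope and SE values are made up and the function name slope_t is hypothetical.

```python
import math


def slope_t(b1, b2, sb1, sb2):
    """t statistic for comparing two regression slopes.

    b1, b2 are the slope coefficients; sb1, sb2 are their SE coefficients.
    """
    return (b1 - b2) / math.sqrt(sb1 ** 2 + sb2 ** 2)


# Hypothetical slopes and SE coefficients read off two regression tables.
t_slopes = slope_t(b1=2.0, b2=1.4, sb1=0.3, sb2=0.4)
print(round(t_slopes, 3))
```

The brackets matter: the whole difference b1 - b2 is divided by the square root term, not just b2.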

What are the null and alternate hypotheses when comparing gradients of two straight lines?

Ho: Observed (treatment) slope is = to hypothesised slope


HA: Observed slope is significantly different from the hypothesised slope (gradients different).

How can ASSIGN MODULE be used to take data from EDIT MODULE and put onto new raster map=reclassify it essentially?

Type in the feature image as being wsoil1 (has all of the soil types laid out) and have output as soilrcl (reclassified).

What is correlation not?

Correlation is not causation.

What are the key differences in terminology of correlation and regression?

Correlation: association



Regression: Relationship between variables instead.

What is a cartographic model defined as?

Flow diagram that plans steps and sequence of operations to get an outcome in GIS analysis.