135 Cards in this Set

  • Front
  • Back
How can we visualize comparisons between sequences?
dot plots
What are 2 problems with dot plots?
lots of noise if filter not used; time consuming if words not used
How can dot plots be generated quickly?
through the use of identity blocks
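A minimal sketch of a word-based dot plot, assuming a hypothetical helper `dot_plot` and word size `k` (neither comes from the cards). Indexing the words of one sequence first is one way to get the speed-up: only identical k-letter blocks are ever compared, and requiring a whole word to match also filters out single-base noise.

```python
from collections import defaultdict

def dot_plot(seq_a, seq_b, k=3):
    """Return the (i, j) positions where the k-letter word of seq_a starting
    at i is identical to the k-letter word of seq_b starting at j."""
    # Index every word of seq_b once, so each word of seq_a is one lookup.
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    dots = set()
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            dots.add((i, j))
    return dots

def render(seq_a, seq_b, dots):
    """Draw the plot as text: rows follow seq_a, columns follow seq_b."""
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(len(seq_b)))
        for i in range(len(seq_a))
    )
```

Calling `print(render(a, b, dot_plot(a, b)))` shows shared regions as diagonal runs of `*`; the plot itself does not produce an alignment.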
T/F. Dot plots provide alignments between sequences.
F
T/F. For alignments, gaps can be introduced to increase base matches.
T
What is the name of the most basic algorithm to align two sequences?
Needleman-Wunsch
What is the scoring scheme for NW alignments?
match:1, mismatch:0
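A sketch of a global alignment with this scoring scheme. The gap score of 0 and the function name `needleman_wunsch` are assumptions, since the card gives only the match and mismatch values. This version fills the matrix from the top-left corner, so the final score lands in the bottom-right cell; filling in the opposite direction, as the card on the max score implies, puts it at the top left.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, gap=0):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the last cell to the first, rebuilding the alignment.
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return score[len(a)][len(b)], "".join(reversed(top)), "".join(reversed(bottom))
```

With match 1 and everything else 0, the score is simply the largest number of bases that can be matched once gaps are allowed.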
Where is the max score of a NW alignment found?
top left
T/F. The NW algorithm creates a local alignment.
F. global
Which algorithm is likely to miss short and highly similar subsequences?
NW
T/F. SW algorithm creates a local alignment.
T
T/F. All sequences are aligned when using a SW algorithm.
F
How can you turn a NW algorithm into a SW algorithm?
give negative score to mismatches; make 0 the min score recorded; beginning and end of optimal path may be found anywhere in the matrix
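The three changes can be sketched as follows. The specific scores (match 1, mismatch -1, gap -1) and the name `smith_waterman` are illustrative assumptions, not values from the cards.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]   # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Change 2: never record a score below 0.
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            # Change 3: the path may end anywhere, so track the best cell.
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a 0 is reached (the path start).
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

Only the best-scoring region is reported, which is why not all of each sequence ends up aligned.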
T/F. A global alignment should be used for homologous sequences.
T
T/F. Global alignments should be used when homology is distant.
F. local
How can you determine if the alignment of 2 sequences is statistically significant?
permutation test: randomly rearrange one or both seqs, align permuted seqs, record alignment score, repeat many times
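A sketch of the permutation test, assuming a simple global alignment scorer (`global_score`, match 1, mismatch 0, gap 0) as the score function. The function names, the number of permutations, the fixed seed, and the "+1" in the p-value estimate are all assumptions made for illustration.

```python
import random

def global_score(a, b, match=1, mismatch=0, gap=0):
    """Score of the best global alignment (score only, no traceback)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def permutation_test(a, b, n_perm=200, seed=0, score_fn=global_score):
    """Compare the real alignment score with scores of shuffled sequences."""
    rng = random.Random(seed)
    observed = score_fn(a, b)
    letters = list(b)
    null_scores = []
    for _ in range(n_perm):
        rng.shuffle(letters)                      # randomly rearrange one sequence
        null_scores.append(score_fn(a, "".join(letters)))
    # Fraction of permuted scores at least as high as the observed one;
    # the +1 keeps the estimate from being exactly zero.
    p_value = (1 + sum(s >= observed for s in null_scores)) / (n_perm + 1)
    return observed, null_scores, p_value
```

Shuffling keeps the length and base composition of the sequence, so the permuted scores show what composition alone can produce.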
T/F. If an alignment score is higher than that for the permuted sequences, they must be homologous to some extent.
T
For protein sequences, greater than __ % identity suggests homology.
25
T/F. A gap and its length are distinct quantities.
T
When should end gaps not be penalized?
if two sequences have no obvious relationship at ends
T/F. The optimal alignment is always statistically significant.
F
T/F. A NW algorithm can only be done with 2 sequences.
F
Name one of the most popular multiple alignment programs.
Clustal
T/F. Indels are more likely to occur at the ends of a sequence.
T
What is the formula for the Hamming distance?
D = k/n (k = number of sites that differ, n = number of sites compared)
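A short sketch of the formula, taking k as the number of aligned positions that differ and n as the number of positions compared. The function name is hypothetical, and the sequences are assumed to be already aligned and of equal length.

```python
def hamming_distance(a, b):
    """Proportion of aligned sites at which two sequences differ (D = k/n)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    k = sum(x != y for x, y in zip(a, b))   # number of differing sites
    return k / len(a)
```

For example, `hamming_distance("ACGT", "ACGA")` gives 0.25, since one of four sites differs.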
If all mutations occur at the same frequency, which value will the Hamming distance approach as substitutions accumulate over time?
0.75
T/F. For JC, the measure of distance becomes more accurate as time goes on.
F
T/F. The maximum Hamming distance is 0.75.
F
When can you not perform a JC?
when the Hamming distance is 0.75 or greater (the logarithm is undefined)
Which correction corrects for differences in the rates of transitions and transversions?
kimura 2-parameter
In which distance corrections does divergence follow a logarithmic function?
JC, K2P
Describe the Tamura-Nei correction.
separate rates for transitions between purines, transitions between pyrimidines, and transversions
How do we deal with sequences with substitutions unequally spread out?
apply a gamma distribution
T/F. The higher alpha is, the higher the extent of variation in a substitution rate in a gamma distribution.
F. the lower alpha is, the greater the variation in substitution rate among sites
T/F. Gamma distribution cannot be applied to the Hamming distance.
F
What is a synonymous change?
substitutions that do not cause amino acid replacement
T/F. Synonymous substitutions occur at a much faster rate than nonsynonymous substitutions.
T
What is the most common way to assign weights based on the structural similarities and the ease of genetic interchange?
Dayhoff's PAM250 matrix
What does PAM stand for?
point accepted mutation (often expanded as percent accepted mutation)
T/F. A PAM250 matrix does not work well for distant relationships.
F
ODDS MATRIX??
pdf8
What does it mean if a residue pair in a log odds matrix has a score greater than 0?
They replace each other more often as alternatives in related sequences than in random sequences (residues may have similar function)
What is a problem with PAM matrices?
assumes all sites are equally mutable
What does BLOSUM stand for?
BLOcks SUbstitution Matrix
MORE ON BLOSUM
pdf8
GONNET?
pdf8
Why does PHYLIP ignore all gaps and missing data?
there is no accurate way to weight changes due to indels relative to substitutions (all empirical)
T/F. There are more possible topologies for a rooted tree than an unrooted tree for n species.
T
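The counts behind this answer can be sketched with the standard formulas for strictly bifurcating trees: (2n-3)!! rooted and (2n-5)!! unrooted topologies for n species. The formulas are not stated on the card, and the function names are hypothetical.

```python
def double_factorial(m):
    """Product m * (m - 2) * (m - 4) * ... down to 1 (returns 1 for m <= 1)."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def rooted_topologies(n):
    """Number of rooted bifurcating tree topologies for n >= 2 species."""
    if n < 2:
        raise ValueError("need at least 2 species")
    return double_factorial(2 * n - 3)

def unrooted_topologies(n):
    """Number of unrooted bifurcating tree topologies for n >= 3 species."""
    if n < 3:
        raise ValueError("need at least 3 species")
    return double_factorial(2 * n - 5)
```

For 4 species this gives 15 rooted and 3 unrooted topologies: each unrooted tree can be rooted on any of its 2n-3 branches.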
Which type of tree should be used when rates of evolution are variable?
unrooted
What does OTU stand for?
operational taxonomic unit
What can OTUs represent?
groups of organisms, populations, species, families, etc
What is the location where interior branches in a tree meet up called?
internal node
What is the purpose of including an outgroup in a tree?
to find the root of the tree
Which characters typically provide the greatest problems for tree reconstruction algorithms?
homoplasies
Which type of character is used to build phylogenies?
synapomorphies (shared derived characters)
There is a genus of plants in which one species develops red petals (with the ancestral form being white petals). Suppose it underwent speciation such that there are now two red-petalled species and that there still exist five white-petalled species. Then white petals is the __ character, red petals is an ____ character, the white petals among the five species is a __ character, the red petals among the two species is a ___ character. If another species arose with purple petals, this would be a ___ character.
plesiomorphic; apomorphic; symplesiomorphic; synapomorphic; autapomorphic
Describe the phenetic approach.
tree constructed by considering phenotypic similarities, does not reflect evolutionary relationships
Describe the cladistic approach.
tree constructed by considering the various possible pathways of evolution and choosing the best possible tree
T/F. Phenetic approach is often used for taxonomic purposes.
F
T/F. The phenetic approach often has faster algorithms than the cladistic approach.
T
What does UPGMA stand for?
unweighted pair group method using arithmetic averages
T/F. A limitation of cluster methods is that they only permit bifurcating trees.
F. this is true but it is not a limitation since branch lengths can be zero
What does UPGMA assume?
molecular clock
Which distance method attempts to correct the UPGMA method for its assumption that all rates of change are equal?
neighbor joining method
T/F. Neighbor joining method yields a rooted tree.
F
What defines the most parsimonious tree?
tree with the minimum number of evolutionary changes
How is a progressive global alignment done?
pairwise alignments of all seqs, construction of distance matrix, construction of phylogenetic tree using distance matrix, alignment of seqs using tree as a guide, profile-profile alignments
T/F. For a progressive global alignment, the choice of scoring affects the final alignment.
T
T/F. In most cases, a tree does not have a strong influence on the alignment.
T
How is an iterative global alignment done?
repeat alignments of a subgroup of seqs, align in a global alignment
T/F. MUSCLE uses a progressive global alignment.
F. iterative
Which score does Clustal use? How does it treat gaps?
PSP; ignores them
What does MUSCLE stand for?
multiple sequence comparison by log-expectation
What does HMM stand for?
hidden Markov model
What are microsatellites often used for?
distinguishing between individuals
Which models assume that nucleotide substitutions arise at any site with equal frequency?
all
What is the formula for the JC distance?
D_JC = -(3/4) ln(1 - (4/3)D)
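A sketch of the correction as a function of the Hamming distance D. The `n_states` parameter is an added convenience, not something the card states: 4 gives the nucleotide formula above, and 20 gives the amino acid form that appears later in this set. The function name is hypothetical.

```python
import math

def jukes_cantor(d, n_states=4):
    """Jukes-Cantor corrected distance for an observed proportion of differences d."""
    b = (n_states - 1) / n_states        # 3/4 for nucleotides, 19/20 for amino acids
    if not 0 <= d < b:
        # At d >= b the argument of the logarithm is zero or negative.
        raise ValueError("observed distance must be in [0, %.2f)" % b)
    return -b * math.log(1 - d / b)
```

For small D the corrected distance is close to D itself; as D approaches 0.75 the correction grows without bound, which is why the estimate becomes less reliable for old divergences.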
Which model takes into account the different rates of transversions and transitions?
k2p
What is the formula for k2p distance?
D_K2P = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q)
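A sketch that applies this formula to two aligned sequences. It takes P as the proportion of sites showing a transition (A/G or C/T) and Q as the proportion showing a transversion, which is the usual reading of Kimura's model but is an assumption here. The function name is hypothetical, and the input is assumed to be uppercase A, C, G, T with no gaps.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(a, b):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1       # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            transversions += 1     # purine <-> pyrimidine
    p = transitions / len(a)
    q = transversions / len(a)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

Like the JC correction, the function fails with a math domain error once the sequences are so diverged that a logarithm argument is no longer positive.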
For k2p, what does P represent? Q?
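P: proportion of sites with transitional differences; Q: proportion of sites with transversional differences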
Which model takes into account that not all transitions occur at the same rate and not all transversions occur at the same rate?
Tamura and Nei
Why might some sites change more rapidly or more slowly than others?
functional constraints on protein or RNA, different positions within the codon, diff chromosomal location
Which assumption does a gamma distribution correct for?
assumption that rate of substitution is the same for all sites
What is a 4 fold degenerate site?
third codon positions where all changes are synonymous
T/F. PAML makes use of a codon model
T
T/F. If there are more nonsynonymous changes than synonymous changes, positive selection is at play.
T
How does the JC distance formula change when looking at aa seqs?
D_JC = -(19/20) ln(1 - (20/19)D)
What did Dayhoff do?
L14 S26
How do you calculate the relative mutability of an aa?
#substitutions/freq
T/F. PAM250 is generally used for distant comparisons.
T
PAM250 corresponds to ~ how many differences per site?
2.5
How are PAM scoring values shown?
as a symmetric log odds ratio matrix
How do you find the log odds value if p=0.08?
log(0.08/0.92)
What does it mean if a log odds is greater than zero?
the amino acids are found across from each other in an alignment more often than expected by chance
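A small sketch of this interpretation: the score is the logarithm of how often a pair is observed in alignments relative to how often it would be expected by chance. The function name, the base-10 logarithm, and the example frequencies are illustrative assumptions, not values from any published matrix.

```python
import math

def log_odds(observed_freq, expected_freq, base=10):
    """Log of the ratio between observed and chance-expected pair frequencies."""
    if observed_freq <= 0 or expected_freq <= 0:
        raise ValueError("frequencies must be positive")
    return math.log(observed_freq / expected_freq, base)

# Hypothetical pair seen twice as often as chance predicts: positive score.
favoured = log_odds(0.02, 0.01)
# Hypothetical pair seen half as often as chance predicts: negative score.
disfavoured = log_odds(0.005, 0.01)
```

A ratio of exactly 1 gives a score of 0, meaning the pair occurs as often as expected by chance.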
T/F. PAM1 is a scoring matrix.
F. Transition matrix
What are some problems with the PAM matrix?
biased towards globular proteins, assumes all sites equally mutable, only few proteins available at the time
What was an update of the PAM matrix?
JTT matrix
T/F. PAM is more reliable than BLOSUM for distantly related proteins.
F
Which matrix is the default for BLAST?
BLOSUM (BLOSUM62 for protein searches)
T/F. FASTA uses BLOSUM
T
T/F. BLOSUM is more tolerant of hydrophilic aa substitutions than PAM.
F
Which BLOSUM matrix is PAM160 ~equal to?
BLOSUM62
PAM# increases as query length ___. BLOSUM# increases as query length ___.
increases; decreases
Which type of alignments do protein only databases use?
SW
Name 2 algorithms to search sequence databases.
BLAST and FASTA
What is the aim of FASTA?
look for homologs or similar sequences to a query sequence in a database
What is a problem with FASTA?
SW is slow
How can you speed up FASTA?
increase K-tuple/word size
What is a HASH table?
database divided into alphabetized list of words with links to location in db
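A sketch of such a table for a toy database. The input format (a dict of name to sequence), the function name, and the word size are assumptions for illustration; real search programs use their own, more compact layouts.

```python
from collections import defaultdict

def build_word_index(database, k=2):
    """Map every k-letter word to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    # Return the words in alphabetical order, as on the card.
    return dict(sorted(index.items()))

toy_db = {"s1": "ACGA", "s2": "CGA"}
word_index = build_word_index(toy_db, k=2)
```

Looking up a query word is then a single dictionary access, for example `word_index["CG"]`, instead of a scan of every database sequence.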
L16 S1
??
What does So represent?
number of k-tuples on the main diagonal
What does S-n represent?
number of k-tuples below the main diagonal
What are the steps for FASTA?
calculate k-tuples, score them, identify 10 best, rescore, join initial regions with joining penalties, full alignment for seqs with high scores
How do you know that the optimal score for FASTA is unusual?
fit a linear regression line and calculate a z-score
What does UniRef100 do?
non-redundant collection of proteins (like UniProt) but eliminates identical proteins
go over FASTA results L16
????
compare BLAST AND FASTA
????
What does it mean if you have a double exponential?
like a distribution for an extreme value
T/F. E is a probability.
F
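E is an expected number of chance hits, so it can exceed 1. A sketch of the usual conversion to a probability, P = 1 - e^(-E), which assumes chance hits follow a Poisson distribution; the function name is hypothetical.

```python
import math

def evalue_to_pvalue(e_value):
    """Probability of at least one chance hit, given the expected number of hits."""
    if e_value < 0:
        raise ValueError("E must be non-negative")
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for very small E.
    return -math.expm1(-e_value)
```

For small values the two are nearly equal (E = 0.01 gives P of about 0.00995), while E = 10 still gives a probability just under 1.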
What is psi blast used for?
families of proteins
What are orthologs?
homologous seqs separated by a speciation event
What are paralogs?
homologous seqs separated by a gene duplication event
What are xenologs?
homology through lateral gene transfer
What is homoplasy?
independent presence of similar characters between species
What does convergence mean?
non related orgs evolving sim traits independently due to sim env pressures
What is a reversion?
return of character to one of its ancestral states
For scale trees, what are the branch lengths proportional to?
number of changes on that branch
KNOW NEWICK FORMAT
L19 S29
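As a reminder of the notation, a sketch that writes a small tree in Newick format: nested parentheses for clades, commas between siblings, `:length` after each branch, and a closing semicolon. The input representation (a leaf is a string, an internal node is a list of `(subtree, branch_length)` pairs) and the function name are assumptions made for this example.

```python
def to_newick(tree):
    """Serialize a nested tree as a Newick string with branch lengths."""
    def fmt(node):
        if isinstance(node, str):          # leaf: just its label
            return node
        # internal node: children in parentheses, each followed by :length
        return "(" + ",".join(f"{fmt(sub)}:{length}" for sub, length in node) + ")"
    return fmt(tree) + ";"

example = [("A", 0.1), ([("B", 0.2), ("C", 0.3)], 0.4)]
newick = to_newick(example)   # "(A:0.1,(B:0.2,C:0.3):0.4);"
```

Here B and C form a clade that joins A at the outermost node.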
What are the 3 domains of life?
bacteria, archaea, eucarya
Know how to make a UPGMA tree
L20 S9
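A sketch of the UPGMA procedure: repeatedly join the two closest clusters, place the new node at half their distance, and average distances to the remaining clusters weighted by cluster size. The input format (a dict keyed by `frozenset` pairs), the function name, and the Newick output are assumptions for this example; ties are broken arbitrarily.

```python
from itertools import combinations

def upgma(names, dist):
    """Build a UPGMA tree; dist maps frozenset({a, b}) to the distance between a and b."""
    # cluster id -> (newick text, number of OTUs, height of the cluster's node)
    clusters = {name: (name, 1, 0.0) for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Find the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        height = d[frozenset((a, b))] / 2       # molecular clock: equal depth on both sides
        nw_a, size_a, h_a = clusters.pop(a)
        nw_b, size_b, h_b = clusters.pop(b)
        merged = (a, b)
        # Distance to every other cluster is the size-weighted average.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        text = f"({nw_a}:{height - h_a:g},{nw_b}:{height - h_b:g})"
        clusters[merged] = (text, size_a + size_b, height)
    return next(iter(clusters.values()))[0] + ";"

pairs = {("A", "B"): 2, ("A", "C"): 4, ("B", "C"): 4,
         ("A", "D"): 6, ("B", "D"): 6, ("C", "D"): 6}
tree = upgma(["A", "B", "C", "D"], {frozenset(p): v for p, v in pairs.items()})
```

With these toy distances A and B join first at height 1, C joins at height 2, and D joins at height 3, so every tip ends up the same distance from the root.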
T/F. Neighbour method assumes molecular clock.
F
T/F. UPGMA yields rooted trees whereas neighbour yields unrooted
T
Describe Fitch Margoliash.
pairwise clustering algorithm, does not add taxa one at a time, uses least squares method
Which methods are more accurate than distance methods?
parsimony methods
Know diff between informative and non informative sites
L20 S37
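A sketch of the usual parsimony definition, assumed here because the card only points to a slide: a site is informative when at least two different character states each occur in at least two sequences. The function names are hypothetical, and gap characters are treated like any other state.

```python
from collections import Counter

def is_informative(column):
    """True if at least two states each appear in at least two sequences."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2

def informative_sites(alignment):
    """Indices of parsimony-informative columns in a list of aligned sequences."""
    return [i for i, column in enumerate(zip(*alignment)) if is_informative(column)]

alignment = ["AAGA",
             "AGGC",
             "AGAC",
             "AAAT"]
sites = informative_sites(alignment)   # [1, 2]
```

Column 0 is invariant and column 3 has only one state shared by two sequences, so neither helps choose between trees.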