135 Cards in this Set
- Front
- Back
How can we visualize comparisons between sequences?
|
dot plots
|
|
What are 2 problems with dot plots?
|
lots of noise if filter not used; time consuming if words not used
|
|
How can dot plots be generated quickly?
|
through the use of identity blocks
|
|
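A minimal sketch of a word-based dot plot, to show why words (identity blocks) make the plot both faster and less noisy. The word size `k` and the function names `dot_plot` and `render` are assumptions made for illustration; they do not come from the course material.

```python
from collections import defaultdict


def dot_plot(seq_a, seq_b, k=3):
    """Return sorted (i, j) pairs where the k-letter words starting at
    seq_a[i] and seq_b[j] are identical."""
    # Index every word of seq_b once, so each word of seq_a is a dictionary
    # lookup instead of a scan over all of seq_b.
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)

    dots = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            dots.append((i, j))
    return sorted(dots)


def render(seq_a, seq_b, dots):
    """Draw the plot as text: rows follow seq_a, columns follow seq_b."""
    grid = [["." for _ in seq_b] for _ in seq_a]
    for i, j in dots:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)


if __name__ == "__main__":
    hits = dot_plot("GATTACA", "ATTAC", k=3)
    print(hits)
    print(render("GATTACA", "ATTAC", hits))
```

A larger `k` removes more chance matches (less noise) but can also hide short regions of real similarity.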
T/F. Dot plots provide alignments between sequences.
|
F
|
|
T/F. For alignments, gaps can be introduced to increase base matches.
|
T
|
|
What is the name of the most basic algorithm to align two sequences?
|
Needleman Wunsch
|
|
What is the scoring scheme for NW alignments?
|
match:1, mismatch:0
|
|
Where is the max score of a NW alignment found?
|
top left
|
|
T/F. The NW algorithm creates a local alignment.
|
F. global
|
|
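A minimal sketch of a Needleman-Wunsch global alignment using the match:1, mismatch:0 scheme from the card above. The linear gap penalty of -1 is an assumption, since the cards do not give one. This sketch fills the matrix from the top-left corner, so the final score ends up in the bottom-right cell; a formulation that fills from the opposite corner puts it in the top-left instead.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, gap=-1):
    """Return (score, aligned_a, aligned_b) for a global alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the last cell to the first to recover one optimal path.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "ACT"))
```

Because every residue of both sequences must appear in the output, the alignment is global by construction.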
Which algorithm is likely to miss short and highly similar subsequences?
|
NW
|
|
T/F. SW algorithm creates a local alignment.
|
T
|
|
T/F. All sequences are aligned when using a SW algorithm.
|
F
|
|
How can you turn a NW algorithm into a SW algorithm?
|
give negative score to mismatches; make 0 the min score recorded; beginning and end of optimal path may be found anywhere in the matrix
|
|
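The three changes listed above can be sketched as a small Smith-Waterman implementation. The specific values (match 1, mismatch -1, gap -1) are assumptions chosen for illustration, not values from the course.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch  # change 1: negative mismatch
            score[i][j] = max(
                0,                          # change 2: scores never drop below zero
                score[i - 1][j - 1] + sub,
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Change 3: start at the highest cell anywhere in the matrix and
    # stop as soon as a zero is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(smith_waterman("TTTACGTGG", "CCACGTAA"))
```

Only the best-scoring subsequences are reported, which is why not all of each sequence ends up in a local alignment.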
T/F. A global alignment should be used for homologous sequences.
|
T
|
|
T/F. Global alignments should be used when homology is distant.
|
F. local
|
|
How can you determine if the alignment of 2 sequences is statistically significant?
|
permutation test: randomly rearrange one or both seqs, align permuted seqs, record alignment score, repeat many times
|
|
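The permutation test above can be sketched as follows. The scoring function is a plain global alignment score with an assumed gap penalty of -1, and the number of permutations and the random seed are arbitrary choices for illustration.

```python
import random


def global_score(a, b, match=1, mismatch=0, gap=-1):
    """Global alignment score only (no traceback), kept row by row."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def permutation_test(a, b, n_perm=200, seed=0):
    """Return (observed_score, p_value) where p_value estimates how often a
    shuffled copy of b aligns to a at least as well as the real b does."""
    rng = random.Random(seed)
    observed = global_score(a, b)
    as_high = 0
    for _ in range(n_perm):
        shuffled = list(b)
        rng.shuffle(shuffled)  # keeps composition, destroys order
        if global_score(a, "".join(shuffled)) >= observed:
            as_high += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return observed, (as_high + 1) / (n_perm + 1)


if __name__ == "__main__":
    seq = "ACGTTGCAACGTAGCTAGCT"
    print(permutation_test(seq, seq))
```

Shuffling preserves length and base composition, so the permuted scores show what alignment score to expect from composition alone.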
T/F. If an alignment score is higher than that for the permuted sequences, they must be homologous to some extent.
|
T
|
|
For protein sequences, greater than __ % identity suggests homology.
|
25
|
|
T/F. A gap and its length are distinct quantities.
|
T
|
|
When should end gaps not be penalized?
|
if two sequences have no obvious relationship at ends
|
|
T/F. The optimal alignment is always statistically significant.
|
F
|
|
T/F. A NW algorithm can only be done with 2 sequences.
|
F
|
|
Name one of the most popular multiple alignment programs.
|
Clustal
|
|
T/F. Indels are more likely to occur at the ends of a sequence.
|
T
|
|
What is the formula for the hamming distance?
|
D = k/n
|
|
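As a small worked example of D = k/n, where k is the number of differing sites and n is the number of aligned sites compared. The function name `hamming_distance` is chosen here for illustration, and the sketch assumes two already aligned sequences of equal length with no gaps.

```python
def hamming_distance(a, b):
    """Proportion of aligned sites at which the two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    k = sum(x != y for x, y in zip(a, b))  # number of differing sites
    return k / len(a)                      # n = number of sites compared


if __name__ == "__main__":
    print(hamming_distance("ACGT", "ACGA"))
```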
If all mutations occur at the same frequency, which value will the hamming distance approach as substitutions accumulate over time?
|
0.75
|
|
T/F. For JC, the measure of distance becomes more accurate as time goes on.
|
F
|
|
T/F. The maximum hamming distance is 0.75
|
F
|
|
When can you not perform a JC?
|
when the hamming distance is greater than 0.75
|
|
Which correction corrects for differences in the rates of transitions and transversions?
|
kimura 2-parameter
|
|
In which distance corrections does divergence follow a logarithmic function?
|
JC, k2p
|
|
Describe the Tamura-Nei correction.
|
allows different rates for transversions, for transitions between purines, and for transitions between pyrimidines
|
|
How do we deal with sequences with substitutions unequally spread out?
|
apply a gamma distribution
|
|
T/F. The higher alpha is, the higher the extent of variation in a substitution rate in a gamma distribution.
|
F. the lower alpha is, the greater the variation in rates among sites
|
|
T/F. Gamma distribution cannot be applied to the hamming distance.
|
F
|
|
What is a synonymous change?
|
substitutions that do not cause amino acid replacement
|
|
T/F. Synonymous substitutions occur at a much faster rate than nonsynonymous substitutions.
|
T
|
|
What is the most common way to assign weights based on the structural similarities and the ease of genetic interchange?
|
Dayhoff's PAM250 matrix
|
|
What does PAM stand for?
|
point accepted mutation (often also read as percent accepted mutations)
|
|
T/F. A PAM250 matrix does not work well for distant relationships.
|
F
|
|
ODDS MATRIX??
|
pdf8
|
|
What does it mean if a residue pair in a log odds matrix has a score greater than 0?
|
They replace each other more often as alternatives in related sequences than in random sequences (residues may have similar function)
|
|
What is a problem with PAM matrices?
|
assumes all sites are equally mutable
|
|
What does BLOSUM stand for?
|
BLOcks SUbstitution Matrix
|
|
MORE ON BLOSUM
|
pdf8
|
|
GONNET?
|
pdf8
|
|
Why does PHYLIP ignore all gaps and missing data?
|
there is no accurate way to weight changes due to indels relative to substitutions (all empirical)
|
|
T/F. There are more possible topologies for a rooted tree than an unrooted tree for n species.
|
T
|
|
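To make the rooted versus unrooted comparison concrete, here is a short sketch using the standard counting formulas for strictly bifurcating trees: (2n-3)!! rooted and (2n-5)!! unrooted topologies for n species. These formulas are not stated on the cards and are supplied here as the usual textbook result.

```python
def double_factorial(m):
    """Product m * (m - 2) * (m - 4) * ... down to 1 (returns 1 for m <= 1)."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def rooted_topologies(n):
    """Number of rooted bifurcating topologies for n >= 2 species."""
    if n < 2:
        raise ValueError("need at least 2 species")
    return double_factorial(2 * n - 3)


def unrooted_topologies(n):
    """Number of unrooted bifurcating topologies for n >= 3 species."""
    if n < 3:
        raise ValueError("need at least 3 species")
    return double_factorial(2 * n - 5)


if __name__ == "__main__":
    for n in (3, 4, 5, 10):
        print(n, rooted_topologies(n), unrooted_topologies(n))
```

Each unrooted tree with n species has 2n-3 branches on which a root can be placed, which is why the rooted count is always larger.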
Which type of tree should be used when rates of evolution are variable?
|
unrooted
|
|
What does OTU stand for?
|
operational taxonomic unit
|
|
What can OTUs represent?
|
groups of organisms, populations, species, families, etc
|
|
What is the location where interior branches in a tree meet up called?
|
internal node
|
|
What is the purpose of including an outgroup in a tree?
|
to find the root of the tree
|
|
Which characters typically provide the greatest problems for tree reconstruction algorithms?
|
homoplasies
|
|
Which ancestral character is used to build phylogenies?
|
synapomorphies
|
|
There is a genus of plants in which one species develops red petals (with the ancestral form being white petals). Suppose it underwent speciation such that there are now two red-petalled species and that there still exist five white-petalled species. Then white petals is the __ character, red petals is an ____ character, the white petals among the five species is a __ character, the red petals among the two species is a ___ character. If another species arose with purple petals, this would be a ___ character.
|
plesiomorphic; apomorphic; symplesiomorphic; synapomorphic; autapomorphic
|
|
Describe the phenetic approach.
|
tree constructed by considering phenotypic similarities, does not reflect evolutionary relationships
|
|
Describe the cladistic approach.
|
tree constructed by considering the various possible pathways of evolution and choosing the best possible tree
|
|
T/F. Phenetic approach is often used for taxonomic purposes.
|
F
|
|
T/F. The phenetic approach often has faster algorithms than the cladistic approach.
|
T
|
|
What does UPGMA stand for?
|
unweighted pair group method using arithmetic averages
|
|
T/F. A limitation of cluster methods is that they only permit bifurcating trees.
|
F. this is true but it is not a limitation since branch lengths can be zero
|
|
What does UPGMA assume?
|
molecular clock
|
|
Which distance method attempts to correct the UPGMA method for its assumption that all rates of change are equal?
|
neighbor joining method
|
|
T/F. Neighbor joining method yields a rooted tree.
|
F
|
|
What defines the most parsimonious tree?
|
tree with the minimum number of evolutionary changes
|
|
How is a progressive global alignment done?
|
pairwise alignments of all seqs, construction of distance matrix, construction of phylogenetic tree using distance matrix, alignment of seqs using tree as a guide, profile-profile alignments
|
|
T/F. For a progressive global alignment, the choice of scoring affects the final alignment.
|
T
|
|
T/F. In most cases, a tree does not have a strong influence on the alignment.
|
T
|
|
How is an iterative global alignment done?
|
repeatedly realign subgroups of seqs, then align the subgroups into a global alignment
|
|
T/F. MUSCLE uses a progressive global alignment.
|
F. iterative
|
|
Which score does Clustal use? How does it treat gaps?
|
PSP; ignores them
|
|
What does MUSCLE stand for?
|
multiple sequence comparison by log-expectation
|
|
What does HMM stand for?
|
hidden Markov model
|
|
What are microsatellites often used for?
|
distinguishing between individuals
|
|
Which model assumes nucleotide substitutions arise at any site with equal frequency?
|
all
|
|
What is the formula for the JC distance?
|
DJC = -(3/4)ln(1-(4/3)D)
|
|
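A short sketch of the Jukes-Cantor correction applied to an observed hamming distance D. The `n_states` parameter is an addition made here so the same function covers nucleotides (4 states, giving the 3/4 and 4/3 factors) and amino acids (20 states, giving 19/20 and 20/19); the function name is likewise chosen for illustration.

```python
import math


def jukes_cantor(d, n_states=4):
    """Jukes-Cantor corrected distance for an observed proportion of differences d."""
    b = (n_states - 1) / n_states  # 3/4 for nucleotides, 19/20 for amino acids
    if not 0 <= d < b:
        # At or beyond saturation the logarithm is undefined.
        raise ValueError(f"d must satisfy 0 <= d < {b}")
    return -b * math.log(1 - d / b)


if __name__ == "__main__":
    for d in (0.05, 0.3, 0.6, 0.74):
        print(d, round(jukes_cantor(d), 4))
```

The corrected distance is always at least as large as D, and it grows without bound as D approaches 0.75, which is why the correction cannot be applied at or above that value.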
Which model takes into account the different rates of transversions and transitions?
|
k2p
|
|
What is the formula for k2p distance?
|
Dk2p = -(1/2)ln(1-2P-Q)-(1/4)ln(1-2Q)
|
|
For k2p, what does P represent? Q?
|
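P = proportion of sites with transitional differences; Q = proportion of sites with transversional differences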
|
|
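A minimal sketch of the k2p distance computed directly from two aligned sequences. It uses the standard Kimura definitions, with P as the proportion of sites showing a transition (purine to purine or pyrimidine to pyrimidine) and Q as the proportion showing a transversion. It assumes ungapped, equal-length, uppercase A/C/G/T input, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a, b):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        same_class = ({x, y} <= PURINES) or ({x, y} <= PYRIMIDINES)
        if same_class:
            transitions += 1    # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    p = transitions / len(a)
    q = transversions / len(a)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


if __name__ == "__main__":
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAC"), 4))
```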
Which model takes into account that not all transitions occur at the same rate and not all transversions occur at the same rate?
|
Tamura and Nei
|
|
Why might some sites change more rapidly or more slowly than others?
|
functional constraints on protein or RNA, different positions within the codon, diff chromosomal location
|
|
Which assumption does a gamma distribution correct for?
|
assumption that rate of substitution is the same for all sites
|
|
What is a 4 fold degenerate site?
|
third codon positions where all changes are synonymous
|
|
T/F. PAML makes use of a codon model
|
T
|
|
T/F. If there are more nonsynonymous changes than synonymous changes, positive selection is at play.
|
T
|
|
How does the JC distance formula change when looking at aa seqs?
|
DJC = -(19/20)ln(1-(20/19)D)
|
|
What did Dayhoff do?
|
L14 S26
|
|
How do you calculate the relative mutability of an aa?
|
#substitutions/freq
|
|
T/F. PAM250 is generally used for distant comparisons.
|
T
|
|
PAM250 corresponds to ~ how many differences per site?
|
2.5
|
|
How are PAM scoring values shown?
|
as a symmetric log odds ratio matrix
|
|
How do you find the log odds value if p=0.08?
|
log(0.08/0.92)
|
|
What does it mean if a log odds is greater than zero?
|
the amino acids are found across from each other in an alignment more often than expected by chance
|
|
T/F. PAM1 is a scoring matrix.
|
F. Transition matrix
|
|
What are some problems with the PAM matrix?
|
biased towards globular proteins, assumes all sites equally mutable, only few proteins available at the time
|
|
What was an update of the PAM matrix?
|
JTT matrix
|
|
T/F. PAM is more reliable than BLOSUM for distantly related proteins.
|
F
|
|
Which matrix is the default for BLAST?
|
BLOSUM62
|
|
T/F. FASTA uses BLOSUM
|
T
|
|
T/F. BLOSUM is more tolerant of hydrophilic aa substitutions than PAM.
|
F
|
|
Which BLOSUM matrix is PAM160 ~equal to?
|
BLOSUM62
|
|
PAM# increases as query length ___. BLOSUM# increases as query length ___.
|
increases; decreases
|
|
Which type of alignments do protein only databases use?
|
SW
|
|
Name 2 algorithms to search sequence databases.
|
BLAST and FASTA
|
|
What is the aim of FASTA?
|
look for homologs or similar sequences to a query sequence in a database
|
|
What is a problem with FASTA?
|
SW is slow
|
|
How can you speed up FASTA?
|
increase K-tuple/word size
|
|
What is a hash table?
|
database divided into alphabetized list of words with links to location in db
|
|
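A rough sketch of the word index described above: every k-tuple in the database is stored with links back to where it occurs, so a query word is found by lookup rather than by scanning. The tiny database, the sequence names `s1` and `s2`, and the word size are made up for illustration, and a Python dictionary stands in for the hash table.

```python
from collections import defaultdict


def build_word_index(database, k=2):
    """Map each k-letter word to a list of (sequence_name, position) links."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    # Sort by word so the table reads as an alphabetized list.
    return dict(sorted(index.items()))


def lookup(index, query, k=2):
    """Return (query_position, sequence_name, db_position) for every shared word."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for name, pos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, name, pos))
    return hits


if __name__ == "__main__":
    db = {"s1": "ACGA", "s2": "GAC"}
    idx = build_word_index(db, k=2)
    print(idx)
    print(lookup(idx, "GAC", k=2))
```

A larger word size gives fewer, more specific entries per word, which is the speed-up mentioned for FASTA, at the cost of missing weaker matches.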
L16 S1
|
??
|
|
What does S0 represent?
|
number of k-tuples on the main diagonal
|
|
What does S-n represent?
|
number of k-tuples below the main diagonal
|
|
What are the steps for FASTA?
|
calculate k-tuples, score them, identify 10 best, rescore, join initial regions with joining penalties, full alignment for seqs with high scores
|
|
How do you know that the optimal score for FASTA is unusual?
|
fit a linear regression line and calculate a z-score
|
|
What does UniRef100 do?
|
non-redundant collection of proteins (like UniProt) but eliminates identical proteins
|
|
go over FASTA results L16
|
????
|
|
compare BLAST AND FASTA
|
????
|
|
What does it mean if you have a double exponential?
|
like a distribution for an extreme value
|
|
T/F. E is a probability.
|
F
|
|
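E is an expected number of chance hits, so it can exceed 1, which a probability cannot. Under the usual assumption that the number of chance hits follows a Poisson distribution, the probability of at least one such hit is P = 1 - exp(-E). The sketch below shows that conversion; the function name is chosen here for illustration.

```python
import math


def evalue_to_pvalue(e):
    """Probability of at least one chance hit, given an expected count e."""
    if e < 0:
        raise ValueError("E must be non-negative")
    # -expm1(-e) equals 1 - exp(-e) but stays accurate for very small e.
    return -math.expm1(-e)


if __name__ == "__main__":
    for e in (0.001, 0.01, 1, 10):
        print(e, evalue_to_pvalue(e))
```

For small values E and P are nearly equal, which is why the two are often confused.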
What is psi blast used for?
|
families of proteins
|
|
What are orthologs?
|
homologous seqs separated by a speciation event
|
|
What are paralogs??
|
homology by duplication
|
|
What are xenologs?
|
homology through lateral gene transfer
|
|
What is homoplasy?
|
independent presence of similar characters between species
|
|
What does convergence mean?
|
unrelated organisms evolving similar traits independently due to similar environmental pressures
|
|
What is a reversion?
|
return of character to one of its ancestral states
|
|
For scale trees, what are the branch lengths proportional to?
|
number of changes on that branch
|
|
KNOW NEWICK FORMAT
|
L19 S29
|
|
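As a reminder of what Newick looks like: nested parentheses group sister taxa, a colon introduces a branch length, and a semicolon ends the tree. The sketch below writes a small tree in that format. The in-memory representation (a leaf is a string, an internal node is a list of `(child, branch_length)` pairs) is an assumption made for this example and is not taken from the slides.

```python
def _format(node):
    """Format one node: a leaf name, or a list of (child, branch_length) pairs."""
    if isinstance(node, str):
        return node
    parts = [f"{_format(child)}:{length}" for child, length in node]
    return "(" + ",".join(parts) + ")"


def to_newick(tree):
    """Return the Newick string for the whole tree, ending with a semicolon."""
    return _format(tree) + ";"


if __name__ == "__main__":
    # A and B are sisters; their common ancestor joins C at the root.
    tree = [([("A", 0.1), ("B", 0.2)], 0.3), ("C", 0.4)]
    print(to_newick(tree))
```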
What are the 3 domains of life?
|
bacteria, archaea, eucarya
|
|
Know how to make a upgma tree
|
L20 S9
|
|
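A compact sketch of UPGMA clustering, offered as one way to check a hand-built tree. It repeatedly joins the two closest clusters, places the new node at half their distance (the molecular clock assumption), and recomputes distances as averages weighted by cluster size. The input layout (a nested dictionary `dist[a][b]` for each pair, with `a` listed before `b` in `names`) and the Newick output are choices made for this example.

```python
from itertools import combinations


def upgma(names, dist):
    """Return a Newick string for the UPGMA tree of the named taxa."""
    # Active clusters: id -> (newick_text, number_of_leaves, node_height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {
        frozenset((i, j)): dist[names[i]][names[j]]
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)

    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of clusters
        i, j = sorted(pair)
        (sub_i, n_i, h_i), (sub_j, n_j, h_j) = clusters.pop(i), clusters.pop(j)
        height = d[pair] / 2      # new node sits halfway between the two clusters

        # Distance from the merged cluster to every remaining cluster:
        # the average over all leaf pairs, i.e. weighted by cluster size.
        for k in clusters:
            d_ik = d.pop(frozenset((i, k)))
            d_jk = d.pop(frozenset((j, k)))
            d[frozenset((next_id, k))] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        del d[pair]

        newick = f"({sub_i}:{height - h_i:g},{sub_j}:{height - h_j:g})"
        clusters[next_id] = (newick, n_i + n_j, height)
        next_id += 1

    (newick, _, _), = clusters.values()
    return newick + ";"


if __name__ == "__main__":
    distances = {"A": {"B": 2, "C": 6}, "B": {"C": 6}}
    print(upgma(["A", "B", "C"], distances))
```

Every leaf ends up at the same total distance from the root, which is what makes the resulting tree rooted and ultrametric.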
T/F. Neighbour joining method assumes molecular clock.
|
F
|
|
T/F. UPGMA yields rooted trees whereas neighbour joining yields unrooted trees.
|
T
|
|
Describe Fitch Margoliash.
|
pairwise clustering algorithm, does not add taxa one at a time, uses least squares method
|
|
Which methods are more accurate than distance methods?
|
parsimony methods
|
|
Know diff between informative and non informative sites
|
L20 S37
|
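For reference, the usual definition is that a site is parsimony-informative when it has at least two different character states and at least two of those states each occur in at least two sequences; any other site fits every tree equally well. The slide itself is not reproduced here, so the sketch below follows that standard definition, assumes an ungapped alignment given as equal-length strings, and uses hypothetical function names.

```python
from collections import Counter


def is_informative(column):
    """True if at least two states each appear in at least two sequences."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2


def informative_sites(alignment):
    """Return the 0-based column indices that are parsimony-informative."""
    return [i for i, column in enumerate(zip(*alignment)) if is_informative(column)]


if __name__ == "__main__":
    alignment = [
        "AAGA",
        "AGGC",
        "ATAC",
        "ATAT",
    ]
    # Column 0 is invariant, columns 1 and 3 have only one state shared by
    # two or more sequences, and column 2 splits the taxa two against two.
    print(informative_sites(alignment))
```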