Bioinformatics - MT2

by daleb, Nov. 2012

135 Cards in this Set

	Front	Back

	How can we visualize comparisons between sequences?	dot plots
	What are 2 problems with dot plots?	lots of noise if filter not used; time consuming if words not used
	How can dot plots be generated quickly?	through the use of identity blocks
	T/F. Dot plots provide alignments between sequences.	F
	T/F. For alignments, gaps can be introduced to increase base matches.	T
	What is the name of the most basic algorithm to align two sequences?	Needleman Wunsch
	What is the scoring scheme for NW alignments?	match:1, mismatch:0
	Where is the max score of a NW alignment found?	top left
	T/F. The NW algorithm creates a local alignment.	F. global
	Which algorithm is likely to miss short and highly similar subsequences?	NW
	T/F. SW algorithm creates a local alignment.	T
	T/F. All sequences are aligned when using a SW algorithm.	F
	How can you turn a NW algorithm into a SW algorithm?	give negative score to mismatches; make 0 the min score recorded; beginning and end of optimal path may be found anywhere in the matrix
	T/F. A global alignment should be used for homologous sequences.	T
	T/F. Global alignments should be used when homology is distant.	F. local
	How can you determine if the alignment of 2 sequences is statistically significant?	permutation test: randomly rearrange one or both seqs, align permuted seqs, record alignment score, repeat many times
	T/F. If an alignment score is higher than that for the permuted sequences, they must be homologous to some extent.	T
	For protein sequences, greater than __ % identity suggests homology.	25
	T/F. A gap and its length are distinct quantities.	T
	When should end gaps not be penalized?	if two sequences have no obvious relationship at ends
	T/F. The optimal alignment is always statistically significant.	F
	T/F. A NW algorithm can only be done with 2 sequences.	F
	Name one of the most popular multiple alignment programs.	Clustal
	T/F. Indels are more likely to occur at the ends of a sequence.	T
	What is the formula for the hamming distance?	D = k/n (k differing sites out of n aligned sites)
	If all mutations occur at the same frequency, which value will the hamming distance approach as substitutions accumulate over time?	0.75
	T/F. For JC, the measure of distance becomes more accurate as time goes on.	F
	T/F. The maximum hamming distance is 0.75	F
	When can you not perform a JC?	when the hamming distance is 0.75 or greater (the log term is undefined)
	Which correction corrects for differences in the rates of transitions and transversions?	kimura 2-parameter
	In which distance corrections does divergence follow a logarithmic function?	JC, k2p
	Describe the Tamura-Nei correction.	separate rates for transitions between purines, transitions between pyrimidines, and transversions
	How do we deal with sequences with substitutions unequally spread out?	apply a gamma distribution
	T/F. The higher alpha is, the higher the extent of variation in a substitution rate in a gamma distribution.	F. a lower alpha means more rate variation among sites
	T/F. Gamma distribution cannot be applied to the hamming distance.	F
	What is a synonymous change?	substitutions that do not cause amino acid replacement
	T/F. Synonymous substitutions occur at a much faster rate than nonsynonymous substitutions.	T
	What is the most common way to assign weights based on the structural similarities and the ease of genetic interchange?	Dayhoff's PAM250 matrix
	What does PAM stand for?	point accepted mutation (sometimes read as percent accepted mutations)
	T/F. A PAM250 matrix does not work well for distant relationships.	F
	ODDS MATRIX??	pdf8
	What does it mean if a residue pair in a log odds matrix has a score greater than 0?	They replace each other more often as alternatives in related sequences than in random sequences (residues may have similar function)
	What is a problem with PAM matrices?	assumes all sites are equally mutable
	What does BLOSUM stand for?	BLOcks SUbstitution Matrix
	MORE ON BLOSUM	pdf8
	GONNET?	pdf8
	Why does PHYLIP ignore all gaps and missing data?	there is no accurate way to weight changes due to indels relative to substitutions (all empirical)
	T/F. There are more possible topologies for a rooted tree than an unrooted tree for n species.	T
	Which type of tree should be used when rates of evolution are variable?	unrooted
	What does OTU stand for?	operational taxonomic unit
	What can OTUs represent?	groups of organisms, populations, species, families, etc
	What is the location where interior branches in a tree meet up called?	internal node
	What is the purpose of including an outgroup in a tree?	to find the root of the tree
	Which characters typically provide the greatest problems for tree reconstruction algorithms?	homoplasies
	Which type of character (shared and derived) is used to build phylogenies?	synapomorphies
	There is a genus of plants in which one species develops red petals (with the ancestral form being white petals). Suppose it underwent speciation such that there are now two red-petalled species and that there still exist five white-petalled species. Then white petals is the __ character, red petals is an ____ character, the white petals among the five species is a __ character, the red petals among the two species is a ___ character. If another species arose with purple petals, this would be a ___ character.	plesiomorphic; apomorphic; symplesiomorphic; synapomorphic; autapomorphic
	Describe the phenetic approach.	tree constructed by considering phenotypic similarities, does not reflect evolutionary relationships
	Describe the cladistic approach.	tree constructed by considering the various possible pathways of evolution and choosing the best possible tree
	T/F. Phenetic approach is often used for taxonomic purposes.	F
	T/F. The phenetic approach often has faster algorithms than the cladistic approach.	T
	What does UPGMA stand for?	unweighted pair group method using arithmetic averages
	T/F. A limitation of cluster methods is that they only permit bifurcating trees.	F. this is true but it is not a limitation since branch lengths can be zero
	What does UPGMA assume?	molecular clock
	Which distance method attempts to correct the UPGMA method for its assumption that all rates of change are equal?	neighbor joining method
	T/F. Neighbor joining method yields a rooted tree.	F
	What defines the most parsimonious tree?	tree with the minimum number of evolutionary changes
	How is a progressive global alignment done?	pairwise alignments of all seqs, construction of distance matrix, construction of phylogenetic tree using distance matrix, alignment of seqs using tree as a guide, profile-profile alignments
	T/F. For a progressive global alignment, the choice of scoring affects the final alignment.	T
	T/F. In most cases, a tree does not have a strong influence on the alignment.	T
	How is an iterative global alignment done?	repeat alignments of a subgroup of seqs, align in a global alignment
	T/F. MUSCLE uses a progressive global alignment.	F. iterative
	Which score does Clustal use? How does it treat gaps?	PSP; ignores them
	What does MUSCLE stand for?	multiple sequence comparison by log-expectation
	What does HMM stand for?	hidden Markov model
	What are microsatellites often used for?	distinguishing between individuals
	Which models assume nucleotide substitutions arise at any site with equal frequency?	all
	What is the formula for the JC distance?	DJC = -(3/4) ln(1 - (4/3)D)
	Which model takes into account the different rates of transversions and transitions?	k2p
	What is the formula for k2p distance?	Dk2p = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q)
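	For k2p, what does P represent? Q?	P = proportion of sites that differ by a transition; Q = proportion of sites that differ by a transversion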
	Which model takes into account that not all transitions occur at the same rate and not all transversions occur at the same rate?	Tamura and Nei
	Why might some sites change more rapidly or more slowly than others?	functional constraints on protein or RNA, different positions within the codon, diff chromosomal location
	Which assumption does a gamma distribution correct for?	assumption that rate of substitution is the same for all sites
	What is a 4 fold degenerate site?	third codon positions where all changes are synonymous
	T/F. PAML makes use of a codon model	T
	T/F. If there are more nonsynonymous changes than synonymous changes, positive selection is at play.	T
	How does the JC distance formula change when looking at aa seqs?	DJC = -(19/20) ln(1 - (20/19)D)
	What did Dayhoff do?	L14 S26
	How do you calculate the relative mutability of an aa?	#substitutions/freq
	T/F. PAM250 is generally used for distant comparisons.	T
	PAM250 corresponds to ~ how many differences per site?	2.5
	How are PAM scoring values shown?	as a symmetric log odds ratio matrix
	How do you find the log odds value if p=0.08?	log(0.08/0.92)
	What does it mean if a log odds is greater than zero?	the amino acids are found across from each other in an alignment more often than expected by chance
	T/F. PAM1 is a scoring matrix.	F. Transition matrix
	What are some problems with the PAM matrix?	biased towards globular proteins, assumes all sites equally mutable, only few proteins available at the time
	What was an update of the PAM matrix?	JTT matrix
	T/F. PAM is more reliable than BLOSUM for distantly related proteins.	F
	Which matrix is the default for BLAST?	BLOSUM (BLOSUM62 for protein searches)
	T/F. FASTA uses BLOSUM	T
	T/F. BLOSUM is more tolerant of hydrophilic aa substitutions than PAM.	F
	Which BLOSUM matrix is PAM160 ~equal to?	BLOSUM62
	PAM# increases as query length ___. BLOSUM# increases as query length ___.	increases; decreases
	Which type of alignments do protein only databases use?	SW
	Name 2 algorithms to search sequence databases.	BLAST and FASTA
	What is the aim of FASTA?	look for homologs or similar sequences to a query sequence in a database
	What is a problem with FASTA?	SW is slow
	How can you speed up FASTA?	increase K-tuple/word size
	What is a HASH table?	database divided into alphabetized list of words with links to location in db
	L16 S1	??
	What does So represent?	number of k-tuples on the main diagonal
	What does S-n represent?	number of k-tuples below the main diagonal
	What are the steps for FASTA?	calculate k-tuples, score them, identify 10 best, rescore, join initial regions with joining penalties, full alignment for seqs with high scores
	How do you know that the optimal score for FASTA is unusual?	fit a linear regression line and calculate a z-score
	What does UniRef100 do?	non-redundant collection of proteins (like UniProt) but eliminates identical proteins
	go over FASTA results L16	????
	compare BLAST AND FASTA	????
	What does it mean if you have a double exponential?	like a distribution for an extreme value
	T/F. E is a probability.	F
	What is psi blast used for?	families of proteins
	What are orthologs?	homologous seqs separated by a speciation event
	What are paralogs?	homologous seqs separated by a duplication event
	What are xenologs?	homology through lateral gene transfer
	What is homoplasy?	independent presence of similar characters between species
	What does convergence mean?	non related orgs evolving sim traits independently due to sim env pressures
	What is a reversion?	return of character to one of its ancestral states
	For scale trees, what are the branch lengths proportional to?	number of changes on that branch
	KNOW NEWICK FORMAT	L19 S29
	What are the 3 domains of life?	bacteria, archaea, eucarya
	Know how to make a upgma tree	L20 S9
	T/F. Neighbour method assumes molecular clock.	F
	T/F. UPGMA yields rooted trees whereas neighbour yields unrooted	T
	Describe Fitch Margoliash.	pairwise clustering algorithm, does not add taxa one at a time, uses least squares method
	Which methods are more accurate than distance methods?	parsimony methods
	Know diff between informative and non informative sites	L20 S37
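
Worked example for the distance cards (hamming, JC, k2p). This is only a sketch: the function names are made up for illustration, and the sequences are assumed to be already aligned with no gaps. The states argument is an assumption that lets one function cover both the nucleotide (4) and amino acid (20) versions of the JC formula.

```python
import math


def hamming_distance(seq_a, seq_b):
    """D = k/n: proportion of aligned sites that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    k = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return k / len(seq_a)


def jukes_cantor(d, states=4):
    """JC correction of a hamming distance d.

    states=4 gives -(3/4) ln(1 - (4/3)d) for nucleotides,
    states=20 gives -(19/20) ln(1 - (20/19)d) for amino acids.
    """
    b = (states - 1) / states
    if d >= b:
        # the log argument would be zero or negative
        raise ValueError("JC correction is undefined at or above saturation")
    return -b * math.log(1 - d / b)


def kimura_2p(p, q):
    """K2P distance; p = transition proportion, q = transversion proportion."""
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

For example, jukes_cantor(hamming_distance("ACGT", "ACGA")) corrects an observed distance of 0.25 upward to about 0.30, and kimura_2p gives the same value as jukes_cantor when transversions are exactly twice as frequent as transitions.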
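
Worked example for the NW to SW card. A minimal sketch, assuming a linear gap penalty and a default scoring of match +1, mismatch -1, gap -1 (these values and the name align_score are assumptions, not from the lectures). Note that this version fills the matrix from the top left, so the global score ends up in the bottom right cell rather than the top left.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Return the best alignment score of a and b.

    local=False: Needleman-Wunsch style global score.
    local=True:  Smith-Waterman style local score.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # global alignment pays for leading gaps
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # SW change: 0 is the minimum score recorded
                cell = max(cell, 0)
            score[i][j] = cell
            best = max(best, cell)

    # SW change: the optimal path may end anywhere in the matrix
    return best if local else score[-1][-1]
```

The only differences between the two modes are the three changes listed on the card: a negative mismatch score, a floor of 0, and taking the best cell anywhere instead of the final corner.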
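
Worked example for the permutation test card. A rough sketch only: identity_score is a toy stand-in for a real alignment score (it just counts matching positions without gaps), and the names, the default of 1000 permutations, and the +1 correction in the p-value are all assumptions for illustration.

```python
import random


def identity_score(a, b):
    """Toy score: number of positions where the two sequences agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


def permutation_p_value(seq_a, seq_b, score_fn=identity_score, n_perm=1000, seed=0):
    """Estimate how often a shuffled seq_b scores at least as well as the real one."""
    rng = random.Random(seed)
    observed = score_fn(seq_a, seq_b)
    letters = list(seq_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(letters)  # randomly rearrange one sequence
        if score_fn(seq_a, "".join(letters)) >= observed:
            hits += 1         # permuted score matched or beat the real score
    # add 1 to both counts so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)
```

A small value suggests the real score is unusual compared with shuffled sequences of the same composition. Any scoring function that takes two strings can be passed as score_fn.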
