47 Cards in this Set
- Front
- Back
- 3rd side (hint)
Basic goals of genome project
|
1. Sequence the DNA that comprises the major genome units
2. Annotate genome features
3. Analyze frequency and distribution of features, make biological and evolutionary comparisons, and publish
4. Make sequence and annotations publicly available |
|
|
When did the HGP begin and end?
|
1990 to 2003 (2 yrs ahead of schedule)
|
|
|
Who headed the government's HGP?
|
US Department of Energy and the NIH. PI was Francis Collins.
|
|
|
What were the goals of the HGP?
|
>identify all the approximately 20,000-25,000 genes in human DNA,
>determine the sequences of the 3 billion chemical base pairs that make up human DNA,
>store this information in databases,
>improve tools for data analysis,
>transfer related technologies to the private sector, and
>address the ethical, legal, and social issues (ELSI) that may arise from the project. |
|
|
What's a genome consortium?
|
Association of scientists and public and private institutes working together to sequence and understand a genome for public release
|
|
|
How do we use bioinformatics?
|
• Store/retrieve biological information (databases)
• Retrieve/compare gene sequences
• Predict function of unknown genes/proteins
• Search for previously known functions of a gene
• Compare data with other researchers
• Compile/distribute data for other researchers
• Generate hypotheses for testing in the lab |
|
|
What does BLAST stand for?
|
Basic local alignment search tool
|
|
|
What can the linear sequence of letters in the genome reveal about living organisms?
|
1. How info is carried out or expressed over time and space
2. How info is transformed over generations
3. Sequence alone is not the key to life |
|
|
What distinguishes genome level studies from genetic studies?
|
Emergent properties of genome:
• Gene order along chromosome, gene frequency
• Genome characteristics: GC content, codon usage, transposons, can test horizontal gene transfer, changes in genome size
Scale/Speed:
• GWAS – genome-wide association studies to find all genes associated with a specific phenotype
• Complex genetic disease due to many interacting genes, many unknown genes; can quickly identify candidate genes
• Development of many genetic markers at once
• Find all differentially expressed genes related to alternative phenotypic state |
|
|
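GC content, one of the genome characteristics listed on the card above, can be sketched in a few lines. This is a minimal illustration, not a standard tool: the function names `gc_content` and `sliding_gc` are made up for this example, and ambiguous bases such as N are simply ignored.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the A/C/G/T bases in seq (case-insensitive)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0  # no unambiguous bases to count
    return (seq.count("G") + seq.count("C")) / acgt


def sliding_gc(seq: str, window: int, step: int) -> list:
    """GC content in consecutive windows, useful for spotting local shifts."""
    return [
        gc_content(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    print(gc_content("ATGCGC"))
    print(sliding_gc("AAGGCC", window=2, step=2))
```

A region whose windowed GC content differs sharply from the genome average is one common hint of horizontal gene transfer, though it is only a hint.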
Genomic information is differentially expressed over time and space in an individual. This reflects...
|
Biological processes!:
>development (growth and differentiation)
>disease
>responses to environmental stimuli (think phenotypic plasticity) |
|
|
ENCODE project determined
|
That over 80% of the human genome's components have been assigned at least one biological function
|
|
|
What's the take home message with gene regulation?
|
Quite complex
Involves many interacting proteins |
|
|
Examples of differing expression in response to life stage or environmental stimuli
|
1. Fetal vs adult regulation of the globin gene region
2. Daphnia pulex develops curved helmet and elongated tail spine in presence of predator chemical cues (34/107 genes were novel)
3. Frog genome response to Bd infection: immune response not there at early stages. At later stages, differentially expressed genes are more often down-regulated than up-regulated. Fungus winning! Gene suppression :( |
|
|
Epigenome is
|
Modifications to the genome affecting gene regulation without changing nucleotides
>involves DNA methylation or histone modification
>Methylation silences genes at CpG sites |
|
|
Genomic information is transformed over generations! This reflects processes of...
|
1. Mutation
2. Population genetics
>Results in differences among populations and species |
|
|
Sanger sequencing
|
Dideoxy chain termination sequencing
|
|
|
ddNTP stands for ___________, and structurally a ddNTP is ...
|
Dideoxyribonucleoside triphosphate (a dideoxynucleotide)
Missing the 3' -OH group, so it terminates the DNA chain |
|
|
How was the HGP conducted?
|
Hierarchical shotgun sequencing. Explain.
|
|
|
Genomic library
|
Contains entire genome in overlapping segments
>vector determines size of DNA insert that can be carried (100-200kb fragments into BAC vector during HGP) |
|
|
HGSP sequencing strategy
|
Don't sequence everything
1. Take overlapping BACs and draw out tiling set (29k clones as opposed to 354k)
2. Shotgun sequence and computer assembly
3. More shotgun, gap closure, finishing |
|
|
Mapping:
Genetic maps vs physical maps |
Genetic:
>relative order of genetic markers
>Distance expressed in units of recombination frequency
>units = cM
Physical:
>Distances based on length of DNA sequence between genetic elements
>Units = kb, Mb, etc. |
|
|
Why does recombination help make genetic maps?
|
1. Linked genes (alleles) transmitted as a single unit
2. Coinheritance implies two alleles on same chromosome |
|
|
Degree of crossing over (recombination) between two genes on the same chromosome is ...
|
proportional to the distance between them. Map units, aka ____________, are ...
|
Centimorgans, 1 % recombination between 2 genes on a chromosome
|
|
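The centimorgan definition on the card above (1% recombination = 1 map unit) reduces to simple arithmetic. The sketch below assumes you have counted recombinant and total offspring from a cross; the function name `map_distance_cm` is hypothetical. The linear relation is only a good approximation for closely linked genes, because double crossovers go uncounted at larger distances.

```python
def map_distance_cm(recombinant_offspring: int, total_offspring: int) -> float:
    """Map distance in centimorgans = percent of offspring that are recombinant."""
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    if not 0 <= recombinant_offspring <= total_offspring:
        raise ValueError("recombinant_offspring must be between 0 and total_offspring")
    return 100.0 * recombinant_offspring / total_offspring


if __name__ == "__main__":
    # Illustrative counts only: 12 recombinants among 100 offspring.
    print(map_distance_cm(12, 100), "cM")
```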
STSs
|
Sequence tagged sites; known fragments of DNA of known chromosomal location, used to map BAC inserts
|
|
|
HGP mapped BAC inserts by
|
1. Using STSs
2. Digesting library clones with HindIII to get DNA fragment fingerprint |
|
|
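The HindIII fingerprint idea can be illustrated with an in-silico digest. HindIII recognizes AAGCTT and cuts after the first A; everything else here is a simplifying assumption: the clone is treated as a linear sequence (real BACs are circular), and the helper names `digest_fragment_lengths` and `shared_bands` are invented for this sketch. Two clones that share many fragment sizes probably overlap.

```python
from collections import Counter

HINDIII_SITE = "AAGCTT"
HINDIII_CUT_OFFSET = 1  # HindIII cuts between A and AGCTT


def digest_fragment_lengths(seq: str, site: str = HINDIII_SITE,
                            offset: int = HINDIII_CUT_OFFSET) -> list:
    """Sorted fragment lengths from cutting a linear sequence at every site."""
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(end - start for start, end in zip(bounds, bounds[1:]))


def shared_bands(fingerprint_a: list, fingerprint_b: list) -> int:
    """Number of fragment sizes two fingerprints have in common (multiset overlap)."""
    return sum((Counter(fingerprint_a) & Counter(fingerprint_b)).values())


if __name__ == "__main__":
    clone = "GGGG" + "AAGCTT" + "CCCCCC" + "AAGCTT" + "TT"
    print(digest_fragment_lengths(clone))
```

Real gel fingerprints compare band sizes within a measurement tolerance rather than by exact equality.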
What is FISH and what is it used for?
|
Fluorescence in situ hybridization; used to map BAC clones with no STSs
|
|
|
What is a tiling set?
|
Set of fragments (clones) with the minimal amount of overlap that still allows sequences to be stitched back together
|
|
|
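Choosing a tiling set from mapped clones can be sketched as a greedy interval-cover problem. This is an illustration under stated assumptions, not the procedure the HGP used: clones are given as hypothetical `(name, start, end)` tuples on a physical map, and a clone only needs to start at or before the position covered so far (a real tiling path would also require some minimum overlap).

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy cover: repeatedly pick the reachable clone that extends farthest."""
    ordered = sorted(clones, key=lambda clone: clone[1])  # sort by start position
    path = []
    covered_to = region_start
    i = 0
    while covered_to < region_end:
        best = None
        # Consider every clone that starts within the region covered so far.
        while i < len(ordered) and ordered[i][1] <= covered_to:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best[0])
        covered_to = best[2]
    return path


if __name__ == "__main__":
    clones = [("A", 0, 100), ("B", 20, 150), ("C", 90, 220),
              ("D", 140, 300), ("E", 210, 310)]
    print(minimal_tiling_path(clones, 0, 300))
```

Sequencing only the clones on the returned path is the point of "don't sequence everything": redundant clones such as B and D in the example are skipped.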
Library
|
BAC inserts fragmented into ~2 kb pieces; ligate into a vector and sequence from either end (phage vector sequence is known)
>get about 800 bp from each end, so about 1,600 bp of each ~2,000 bp insert |
|
|
Contigs
|
Computer program orders seqs based on overlap to form longest contiguous sequence
|
|
|
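Ordering reads by overlap to build a contig can be shown with a toy greedy assembler. This is a minimal sketch that assumes error-free reads all from the same strand; real assemblers handle sequencing errors, reverse complements, and repeats. The names `overlap` and `greedy_assemble` are illustrative only.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b, or 0 if < min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

Reads with no sufficient overlap stay as separate contigs, which is where scaffolding and gap closure come in.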
Scaffold
|
Overlapping clone sequences eliminate gaps, yield complete seq.; scaffold = merged contigs
|
|
|
Define coverage
|
Average number of times a base was sequenced
|
|
|
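Average coverage follows from three numbers: coverage = (number of reads × read length) / genome size. The sketch below works through that arithmetic; the function names and the 8x target in the usage example are assumptions for illustration, while the 800 bp read length and 3.2 Gb genome size are taken from other cards in this set.

```python
import math


def expected_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is sequenced."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


if __name__ == "__main__":
    # Assumed target of 8x coverage with 800 bp reads over a 3.2 Gb genome.
    print(reads_needed(8, 800, 3_200_000_000))
    print(expected_coverage(10, 100, 500))
```

Coverage is an average, so some bases will be read more often and some not at all, which is why gap closure remains a separate finishing step.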
Name some of the Bermuda Rules and who they applied to
|
1996
-assembled sequence greater than 1000 bp deposited into a public database within 24 hrs
-no patents may be filed
-joint commitment of sequencing centers and funding agencies |
|
|
What did Celera do differently from the HGP?
|
Whole genome shotgun sequencing instead of hierarchical shotgun sequencing; broke whole genome into pieces of different sizes, skipped mapping; advanced computing assembled pieces into correct order
|
|
|
Differences in next-gen sequencing?
|
-Numerous technologies (454, Illumina, SOLiD)
-much shorter fragments and longer runs
-thousands to millions of short fragments sequenced at once rather than 96 reads of 800 bp
-real challenge is assembling these very short reads (75 to 400 bp) |
|
|
Human genome is ~___ Gb but only ~__% codes for proteins;
Genome contains about how many genes? |
3.2; 1.5
25,000 |
|
|
How do we find genes in a DNA sequence?
|
Look for the DNA signatures of genes:
– Start codons
– Stop codons
– “open” sequence in between (ORF)
• These regions are potential genes, i.e. they could produce proteins if proper promoters and translation initiation sequences are present.
• How many ways can you translate a DNA sequence? |
6
|
|
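The answer "6" comes from three reading frames on each strand. A minimal sketch of a six-frame ORF scan follows; it assumes the standard stop codons (TAA, TAG, TGA) and ATG as the only start codon, and the names `six_frames` and `find_orfs` are invented for this example. An ORF found this way is only a candidate gene.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def six_frames(seq: str) -> dict:
    """Three forward frames and three reverse-complement frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    return {"+1": seq, "+2": seq[1:], "+3": seq[2:],
            "-1": rc, "-2": rc[1:], "-3": rc[2:]}


def find_orfs(seq: str, min_codons: int = 2) -> list:
    """Return (frame, ORF) pairs running from an ATG through the next in-frame stop."""
    orfs = []
    for name, frame in six_frames(seq).items():
        codons = [frame[i:i + 3] for i in range(0, len(frame) - 2, 3)]
        start = None
        for idx, codon in enumerate(codons):
            if start is None and codon == "ATG":
                start = idx
            elif start is not None and codon in STOP_CODONS:
                if idx - start >= min_codons:  # codons before the stop
                    orfs.append((name, "".join(codons[start:idx + 1])))
                start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGGG"))
```

Raising `min_codons` filters out the many short ORFs that occur by chance in any sequence.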
Ab initio methods of gene detection rely on...
|
Signal and content:
Signal is the presence of a specific sequence (e.g., Pribnow box or -35 element, polyadenylation signal)
Content is the statistical properties of the protein-coding sequence itself (GC content, ORF, codon frequency, etc.) |
|
|
TATA box
|
a DNA sequence (cis-regulatory element) found in the promoter region of genes in archaea and eukaryotes; approximately 24% of human genes contain a TATA box within the core promoter
|
|
|
Why is ab initio gene prediction in eukaryotes more difficult than in prokaryotes?
|
Less gene-rich
Introns
Complex promoter sequences |
|
|
What is an EST?
|
Expressed sequence tag; partial sequence of a cDNA
|
|
|
What are the difficulties associated with biological evidence?
|
• Parts of different cDNAs show similarity to the same region of a chromosome (alternative splicing).
• BLAST doesn’t distinguish splice sites, so intron boundaries are often incorrect
• More often than not, ESTs come first; they are often incomplete, with the 5’ end missing, so gene boundaries may be incorrect
• Predictions become less accurate as you move away from the species the evidence came from |
|
|
Transposons
|
Move within the genome and can insert themselves into various positions within and between chromosomes;
Usually exist in many similar copies within a genome ("families"); LINEs and SINEs are major families in humans |
|
|
Describe class I transposons
|
Retrotransposons – resemble RNA viruses and move via an RNA intermediate (copy and paste)
• Use the enzyme reverse transcriptase (may or may not encode RT).
• May or may not have long terminal repeats (LTRs) |
|
|
Describe class II transposons
|
DNA transposons – move from one part of the genome to another by cut-and-paste or copy-and-paste mechanisms
• Use the enzyme transposase that allows them to jump (may or may not encode transposase)
• Cause small duplication at target site |
|
|
Retrotransposons are transcribed by _____, RT turns the transcript into _____, which gets ....
|
Retrotransposons are transcribed by RNA pol II; RT turns the RNA transcript into cDNA, which is reinserted into the genome
|
|
|
DNA Transposons move by two methods:
|
nonreplicative (cut and paste) and replicative (copy and paste) transposition
|
|
|
How are transposons identified in a new genome?
|
Discovery: (e.g. homology-based method) BLAST genome using transposon-specific sequences as queries, e.g., sequence for transposase, reverse transcriptase, integrase; scan surrounding regions for flanking repeats to identify boundaries; (e.g. structure-based method) look for long terminal repeats
Detection: (e.g., RepeatMasker/CENSOR) Identify interspersed repeat sequences in genome that match consensus sequences of transposable elements in the Repbase database |
|