47 Cards in this Set

  • Front
  • Back
  • 3rd side (hint)
Basic goals of genome project
1. Sequence the DNA that comprises the major genome units
2. Annotate genome features
3. Analyze frequency and distribution of features, make biological and evolutionary comparisons, and publish
4. Make sequence and annotations publicly available
When did the HGP begin and end?
1990 to 2003 (2 yrs ahead of schedule)
Who headed the government's HGP?
US Department of Energy and the NIH. The PI was Francis Collins.
What were the goals of the HGP?
>identify all the approximately 20,000-25,000 genes in human DNA,
>determine the sequences of the 3 billion chemical base pairs that make up human DNA,
>store this information in databases,
>improve tools for data analysis,
>transfer related technologies to the private sector, and
>address the ethical, legal, and social issues (ELSI) that may arise from the project.
What's a genome consortium?
Association of scientists and public and private institutes working together to sequence and understand a genome for public release
How do we use bioinformatics?
• Store/retrieve biological information (databases)
• Retrieve/compare gene sequences
• Predict function of unknown genes/proteins
• Search for previously known functions of a gene
• Compare data with other researchers
• Compile/distribute data for other researchers
• Generate hypotheses for testing in the lab
What does BLAST stand for?
Basic local alignment search tool
What can the linear sequence of letters in the genome reveal about living organisms?
1. How info is carried out or expressed over time and space
2. How info is transformed over generations
3. Sequence alone is not the key to life
What distinguishes genome level studies from genetic studies?
Emergent properties of genome:
• Gene order along chromosome, gene frequency
• Genome characteristics: GC content, codon usage, transposons, can test horizontal gene transfer, changes in genome size
Scale/ Speed:
• GWAS – Genome Wide Association studies to find all genes associated with specific phenotype
• Complex genetic disease due to many interacting genes, many unknown genes, can quickly identify candidate genes
• Development of many genetic markers at once
• Find all differentially expressed genes related to an alternative phenotypic state
Genomic information is differentially expressed over time and space in an individual. This reflects...
Biological processes!:
>development (growth and differentiation)
>disease
>responses to environmental stimuli (think phenotypic plasticity)
ENCODE project determined
That over 80% of the human genome's components have been assigned at least one biological function
What's the take home message with gene regulation?
Quite complex
Involves many interacting proteins
Examples of differing expression in response to life stage or environmental stimuli
1. Fetal vs adult regulation of the globin gene region
2. Daphnia pulex develops a curved helmet and elongated tail spine in the presence of predator chemical cues (34/107 genes were novel)
3. Frog genome response to Bd infection: no immune response at early stages. At later stages, more genes are down-regulated than up-regulated. The fungus is winning by suppressing host genes.
Epigenome is
Modifications to the genome affecting gene regulation without changing nucleotides
>involves DNA methylation or histone modification
>Methylation silences genes at CpG sites
Genomic information is transformed over generations! This reflects processes of...
1. Mutation
2. Population genetics
>Results in differences among populations and species
Sanger sequencing
Dideoxy chain termination sequencing
ddNTP stands for ___________, and structurally a ddNTP is ...
Dideoxyribonucleotide (dideoxynucleoside triphosphate)
Missing the 3'-OH group, so it terminates the growing DNA chain
How was the HGP conducted?
Hierarchical shotgun sequencing. Explain.
Genomic library
Contains entire genome in overlapping segments
>vector determines size of DNA insert that can be carried (100-200kb fragments into BAC vector during HGP)
HGSP sequencing strategy
Don't sequence everything
1. Take overlapping BACs and draw out tiling set (29k clones as opposed to 354k)
2. Shotgun sequence and computer assembly
3. More shotgun, gap closure, finishing
Mapping:
Genetic maps vs physical maps
Genetic:
>relative order of genetic markers
>Distance expressed in units of recombination frequency
>units = cM
Physical:
>Distances based on length of DNA sequence between genetic elements
>Units = kb, Mb, etc.
Why does recombination help make genetic maps?
1. Linked genes (alleles) transmitted as a single unit
2. Coinheritance implies two alleles on same chromosome
Degree of crossing over (recombination) between two genes on the same chromosome is ...
proportional to the distance between them. Map units, aka ____________, are ...
Centimorgans (cM); 1 cM = 1% recombination between 2 genes on a chromosome
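A minimal sketch of the map-unit arithmetic on this card. The function name and the testcross counts are hypothetical, and the simple proportionality only holds for closely linked genes, since double crossovers make widely separated genes appear closer than they are.

```python
def map_distance_cm(recombinant_offspring, total_offspring):
    """Return recombination frequency expressed in map units (cM).

    1 cM corresponds to 1% recombinant offspring between two genes.
    """
    if total_offspring <= 0:
        raise ValueError("total_offspring must be positive")
    return 100.0 * recombinant_offspring / total_offspring


# Hypothetical testcross: 120 recombinant offspring out of 1,000 scored.
distance = map_distance_cm(120, 1000)
print(f"{distance:.1f} cM")
```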
STSs
Sequence tagged sites; known fragments of DNA of known chromosomal location, used to map BAC inserts
HGP mapped BAC inserts by
1. Using STSs
2. Digesting library clones with HindIII to get a DNA fragment fingerprint
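The fingerprinting idea can be sketched as an in silico digest: find every HindIII site (AAGCTT, cut after the first A) and report the fragment lengths, which is the "fingerprint" that overlapping clones partly share. The function name and example sequence are hypothetical, and the sequence is treated as linear for simplicity.

```python
HINDIII_SITE = "AAGCTT"  # HindIII recognition sequence
CUT_OFFSET = 1           # HindIII cuts after the first A (A^AGCTT)


def digest_fragment_lengths(sequence, site=HINDIII_SITE, cut_offset=CUT_OFFSET):
    """Return sorted fragment lengths from digesting a linear sequence."""
    sequence = sequence.upper()
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    boundaries = [0] + cuts + [len(sequence)]
    # Each fragment spans two neighbouring cut positions.
    return sorted(end - start for start, end in zip(boundaries, boundaries[1:]))


# Hypothetical toy clone with two HindIII sites.
clone = "GGGG" + "AAGCTT" + "CCCCCCCC" + "AAGCTT" + "TT"
print(digest_fragment_lengths(clone))
```

Two clones whose fragment-length lists share many values are likely to overlap, which is how fingerprints help order BACs.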
What is FISH and what is it used for?
Fluorescent in situ hybridization; used to map BAC clones with no STSs
What is a tiling set?
Set of fragments (clones) with the minimal amount of overlap that still allows the sequences to be stitched back together
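One way to picture choosing a tiling set is as a greedy interval-cover problem: once clones are placed on a map, repeatedly pick the clone that starts within the region already covered and reaches furthest. This is only a sketch under that assumption, not the HGP's actual procedure; the clone names and coordinates are hypothetical.

```python
def minimal_tiling_path(clones):
    """Greedily pick a small set of clones that covers the mapped region.

    clones: list of (name, start, end) tuples on a shared coordinate map.
    Note: a clone that merely abuts the covered region (zero overlap) is
    accepted here; a real tiling set needs some overlap for stitching.
    """
    if not clones:
        return []
    remaining = sorted(clones, key=lambda c: c[1])
    region_end = max(c[2] for c in remaining)
    covered_to = remaining[0][1]
    path = []
    i = 0
    while covered_to < region_end:
        best = None
        # Among clones starting inside the covered region, keep the one
        # that extends coverage the furthest.
        while i < len(remaining) and remaining[i][1] <= covered_to:
            if best is None or remaining[i][2] > best[2]:
                best = remaining[i]
            i += 1
        if best is None or best[2] <= covered_to:
            raise ValueError(f"gap in clone coverage at position {covered_to}")
        path.append(best)
        covered_to = best[2]
    return path


# Hypothetical BAC placements (coordinates in kb).
bacs = [
    ("BAC_A", 0, 150),
    ("BAC_B", 50, 180),
    ("BAC_C", 100, 300),
    ("BAC_D", 140, 260),
    ("BAC_E", 280, 400),
]
print([name for name, _, _ in minimal_tiling_path(bacs)])
```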
Library
Fragment BAC inserts into ~2 kb pieces; ligate into a vector and sequence from either end (phage vector sequence is known)
>get about 800 bp from each end, so most of the ~2,000 bp insert is covered
Contigs
Computer program orders seqs based on overlap to form longest contiguous sequence
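A toy illustration of overlap-based ordering: repeatedly merge the two sequences with the longest suffix-to-prefix overlap until nothing overlaps. This is a simplified greedy sketch, not the assembler used by the HGP; it ignores sequencing errors, reverse-complement reads, and repeats, and the reads and `min_overlap` value are hypothetical.

```python
def overlap_length(left, right, min_overlap):
    """Length of the longest suffix of `left` equal to a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if size > 0 and left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Merge reads into contigs by always joining the best-overlapping pair."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    size = overlap_length(left, right, min_overlap)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs  # no remaining overlaps: these are the contigs
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Hypothetical short reads that overlap by 4 bases each.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```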
Scaffold
Overlapping clone sequences eliminate gaps, yield complete seq.; scaffold = merged contigs
Define coverage
Average number of times a base was sequenced
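Average coverage is commonly estimated as (number of reads × read length) / target size. A minimal sketch, with a hypothetical function name and hypothetical numbers for a single BAC:

```python
def average_coverage(read_count, read_length_bp, target_size_bp):
    """Average number of times each base of the target was sequenced."""
    if target_size_bp <= 0:
        raise ValueError("target_size_bp must be positive")
    return read_count * read_length_bp / target_size_bp


# Hypothetical example: 1,500 reads of 800 bp from a 150 kb BAC insert.
print(average_coverage(1500, 800, 150_000))
```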
Name some of the Bermuda Rules and who they applied to
1996
-assembled seq greater than 1000bp deposited into public database every 24hrs
-no patents may be filed
-joint commitment of sequencing centers and funding agencies
What did Celera do differently from the HGP?
Whole genome shotgun sequencing instead of hierarchical shotgun sequencing; broke the whole genome into pieces of different sizes, skipped mapping, and used advanced computing to assemble the pieces into the correct order
Differences in next-gen sequencing?
-Numerous technologies (454, Illumina, SOLiD)
-much shorter fragments and longer runs
-thousands to millions of short frags done at once rather than 96 800bp reads
-real challenge is assembling these very short reads (75 to 400bp)
Human genome is ~___ Gb but only ~__% codes for proteins;
Genome contains about how many genes?
3.2; 1.5
25,000
How do we find genes in a DNA sequence?
Look for the DNA signatures of genes:
– Start codons
– Stop codons
– “open” sequence in between (ORF)
• These regions are potential genes
– i.e. they could produce proteins if proper promoters and translation initiation sequences are present.

How many ways can you translate a DNA sequence?
6 (three reading frames on each of the two strands)
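The ORF search can be sketched as a scan of all six reading frames (three per strand) for an ATG followed in frame by a stop codon. This is a bare-bones illustration, not a real gene finder: the function names, `min_codons` threshold, and example sequence are hypothetical, and it ignores promoters, introns, and alternative start codons.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Return (strand, frame, start, end, orf) for ATG...stop ORFs.

    Coordinates are 0-based on whichever strand was scanned; `min_codons`
    counts codons before the stop codon, including the ATG.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for pos in range(frame, len(s) - 2, 3):
                codon = s[pos:pos + 3]
                if start is None and codon == "ATG":
                    start = pos  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (pos - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, pos + 3, s[start:pos + 3]))
                    start = None  # stop codon closes the ORF
    return orfs


# Hypothetical toy sequence with one ORF on the forward strand.
for orf in find_orfs("CCATGAAATGGTAACC"):
    print(orf)
```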
Ab initio methods of gene detection rely on...
Signal and content:
Signal is the presence of a specific sequence (e.g., Pribnow box or -35 element, polyadenylation signal)
Content is statistical properties of protein-coding sequence itself (GC content, ORF, codon frequency, etc.)
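Two of the "content" statistics named above, GC content and codon frequency, are simple to compute. A minimal sketch with hypothetical function names and a toy coding sequence; real ab initio predictors combine many such statistics in trained models.

```python
from collections import Counter


def gc_content(seq):
    """Fraction of bases in `seq` that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_counts(coding_seq):
    """Count in-frame codons, ignoring any trailing partial codon."""
    coding_seq = coding_seq.upper()
    usable = len(coding_seq) - len(coding_seq) % 3
    return Counter(coding_seq[i:i + 3] for i in range(0, usable, 3))


# Hypothetical toy coding sequence.
toy = "ATGGCCGCCTAA"
print(round(gc_content(toy), 3))
print(codon_counts(toy).most_common(1))
```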
TATA box
A DNA sequence (cis-regulatory element) found in the promoter region of genes in archaea and eukaryotes; approximately 24% of human genes contain a TATA box within the core promoter
Why is ab initio gene prediction in eukaryotes more difficult than in prokaryotes?
Less gene-rich
Introns
Complex promoter sequences
What is an EST?
Expressed sequence tag; partial sequence of a cDNA
What are the difficulties associated with biological evidence?
• Part of different cDNAs show similarity to same region of chromosome (alternative splicing).
• BLAST doesn’t distinguish splice sites, so intron boundaries often incorrect
• More often than not, have ESTs first, are often incomplete, 5’ end missing, so gene boundaries may be incorrect
• Predictions become less accurate as the evidence comes from more distantly related species
Transposons
Move within the genome and can insert themselves into various positions within and between chromosomes;
Usually exist in many similar copies within a genome ("families");
LINEs and SINEs are major families in humans
Describe class I transposons
Retrotransposons – resemble RNA viruses and move via an RNA intermediate (copy and paste)
• Use the enzyme reverse transcriptase (may or may not encode RT).
• May or may not have long terminal repeats (LTRs)
Describe class II transposons
DNA transposons – move from one part of the genome to another by a cut-and-paste or copy and paste mechanisms
• Use the enzyme transposase that allows them to jump (may or may not encode transposase)
• Cause small duplication at target site
Retrotransposons are transcribed by _____, the RT turns the RNA into _____, and it gets ....
Retrotransposons are transcribed by RNA pol II, RT turns the RNA into cDNA, and the cDNA is reinserted into the genome
DNA Transposons move by two methods:
nonreplicative (cut and paste) and replicative (copy and paste) transposition
How are transposons identified in a new genome?
Discovery: (e.g., homology-based method) BLAST the genome using transposon-specific sequences as queries, e.g., sequences for transposase, reverse transcriptase, or integrase; scan surrounding regions for flanking repeats to identify boundaries; (e.g., structure-based method) look for long terminal repeats
Detection: (e.g., RepeatMasker/CENSOR) Identify interspersed repeat sequences in the genome that match transposable element consensus sequences in the Repbase database