114 Cards in this Set
- Front
- Back
- 3rd side (hint)
What is functional genomics?
|
The study of how things with different genotypes can have different functions, i.e. phenotypes!
|
|
|
How big is the genome?
|
3 Gb (haploid). Roughly 2 metres of DNA in each cell!
|
|
|
What different types of proteins are there?
|
Structural, regulatory__Housekeeping (general), Luxury (specific)
|
|
|
Why might RNA not be a good proxy for proteins?
|
asf |
|
|
What are the 4 analyses you can do with RNA-seq?
|
Expression levels, DE, patterns in expression, splicing and isoforms
|
|
|
What is cDNA?
|
DNA strands produced from RNA via reverse transcriptase.
|
|
|
Name 2 ways to measure RNA
|
Microarrays__RNA-seq
|
|
|
Explain how microarrays work
|
Probes attached to a solid support.__Hybridise labelled mRNA samples to them.__Scan with a laser and measure the fluorescence.
|
|
|
Explain how RNA-seq works
|
asd |
|
|
Name elements in transcriptional regulation
|
Gene/protein interactions__DNA methylation__Chromatin (epigenetic structure)__microRNA__Gene expression
|
|
|
Explain 2-color microarrays
|
Experimental culture (red) + control culture (green)__Measure colour
|
|
|
Explain 1-color microarrays and give examples of types
|
BeadArray: Beads coated with copies of probe__23 base address linked to 50-base gene specific probe__30 copies of each bead type per array: 47k genes => 1.3M probes
|
|
|
Explain the general workflow when working with microarrays
|
asf |
|
|
Explain the three cornerstones of experimental design
|
asd |
|
|
How to detect outliers?
|
log-2 normalisation__Positive/negative controls__Number of detected probes per sample__Signal distribution__Clustering / PCA
|
|
|
Explain how control probes work
|
Positive: Housekeeping genes -- should be there!__Negative: Things that should not be there!__Negative also used to determine background.
|
|
|
What do MA-plots show?
|
M = log2 ratio of the two intensities vs. A = mean of their log2 values. After normalisation the cloud should be flat and centred on M = 0.
|
|
|
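A minimal sketch of the two quantities plotted in an MA-plot, assuming two positive intensity values for the same probe (for example the two channels or two samples). The function name `ma_values` is made up for illustration.

```python
import math

def ma_values(x, y):
    """Return (M, A) for two positive intensities of the same probe.

    M is the log2 ratio, A is the mean of the two log2 intensities.
    """
    log_x = math.log2(x)
    log_y = math.log2(y)
    m = log_x - log_y          # log2(x / y)
    a = 0.5 * (log_x + log_y)  # average log2 intensity
    return m, a

# A probe four times brighter in x than in y sits at M = 2.
print(ma_values(8, 2))
```

Plotting M against A for every probe gives the MA-plot; a trend away from M = 0 suggests the data still need normalising.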
Name some normalisation techniques
|
Simple scaling, LOESS, Quantile, robust spline
|
|
|
How does quantile normalisation work?
|
asf |
|
|
What can be done to determine normalisation?
|
MA-plot / PCA__Search for patterns in data
|
|
|
Describe the RNA-seq workflow
|
Align to sequence (genome/transcriptome), get annotation, compare with databases
|
|
|
Describe why and how to filter probes
|
p = 0.01 => lots of finds! 50k probes -> 500 false pos.__Remove probes not present in all samples.__Use multiple probes per gene
|
|
|
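The arithmetic behind the 500 false positives, as a small sketch. It assumes every probe is tested independently at the same p-value cutoff and that no probe is truly differentially expressed; the 20,000-probe figure after filtering is an invented example.

```python
def expected_false_positives(n_probes, p_cutoff):
    """Expected number of probes passing the cutoff by chance alone."""
    return n_probes * p_cutoff

# The numbers from the card: 50k probes tested at p = 0.01.
print(expected_false_positives(50_000, 0.01))

# Filtering probes first shrinks the number of tests and so the
# expected number of chance hits (20k is a hypothetical figure).
print(expected_false_positives(20_000, 0.01))
```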
Describe the three downstream analysis categories
|
Class discovery: clustering, PCA__Class comparison: groups known, genes / pathways associated?__Class prediction: SVM, survival
|
|
|
What is the cross-hybridisation problem and what does it limit?
|
In microarrays, DNA hybridises to the wrong probes.__Limits sensitivity, range, probe coverage
|
|
|
Benefits of RNA-seq
|
Can get isoform data -- not possible with microarray__Gives absolute abundance
|
|
|
Benefit of microarrays
|
Faster, easier, more mature analysis
|
|
|
How does sequencing work?
|
Reversible terminators__Sequencing by ligation__Nanopores
|
|
|
Describe the Solexa sequencing structure
|
15 steps or so
|
|
|
What are some modes of sequencing?
|
Paired-end: each end aligned separately__Multiplexing__Capture sequencing
|
|
|
What are some benefits of paired end sequencing?
|
Disambiguate non-uniquely mapped reads__Detect isoforms__Detect duplications, inversions, chromosomal rearrangements__Calculation of distribution of insert sizes
|
|
|
Describe multiplexing (probably not in it)
|
Look it up
|
|
|
Describe capture sequencing
|
Magnetic strip to specific sequences__Flush away rest.
|
|
|
Discuss problems with sequence alignment
|
Filtered / unfiltered (???)__Unique / non-unique__Duplicates (amplification bias)__Mismatches / indels__Adapters and index sequence
|
|
|
Sequencing trends (probably not)
|
sa |
|
|
What questions are RNA and ChIP seq answering?
|
RNA: 1) WHAT is the sequence? 2) WHAT is the concentration?__ChIP: 1) WHERE does it bind? 2) HOW MUCH is bound?
|
|
|
Why is RNA-seq imperfect?
|
Data consist of 1-2 sequences per fragment__Base call quality varies for each base in each read__RNA-seq data is indirect: reads come from a cDNA library, not the RNA itself__Reference genome is rarely the sample genome (SNPs, indels, structural variants)__Reads are prone to error (about 1 in 1000 bases)
|
|
|
Describe the ChIP-seq protocol
|
1. Crosslink and shear__2. Add protein specific antibody and immunoprecipitate__3. Sequence one end of each fragment__4. Get coverage!
|
|
|
Describe the RNA-seq protocol
|
1. Select RNA of interest (e.g. mRNA)__2. Fragment and reverse-transcribe to dsDNA__3. Size select, denature to ss-cDNA__4. Sequence n bases from one/both ends of fragments (n ~ 50-100)
|
|
|
How to map RNA reads to transcripts?
|
De novo assembly(???)__Alignment + gene model assembly (map to DNA)__Transcriptome alignment (map to RNA)
|
|
|
Explain how de Bruijn graphs work (VERY POSSIBLE EXAM QUESTION)
|
asf |
|
|
What is a kmer?
|
A substring of length k of a read
|
|
|
Explain how to choose k in RNA-seq assembly
|
asf |
|
|
How do SNPs/errors appear in de Bruijn graphs?
|
Bubbles! Take path with highest coverage.
|
|
|
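A toy sketch of how a SNP opens a bubble in a de Bruijn graph. It assumes the usual construction where nodes are (k-1)-mers and each k-mer in a read adds one edge; the reads, k = 3, and the function names are invented for illustration, and real assemblers do much more (error correction, reverse complements, path merging).

```python
from collections import Counter, defaultdict

def kmers(read, k):
    """All substrings of length k of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def de_bruijn_edges(reads, k):
    """Count edges (prefix -> suffix) of a de Bruijn graph.

    Each k-mer contributes an edge from its first k-1 bases to its
    last k-1 bases; the count is the coverage of that edge.
    """
    edges = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges

def branch_points(edges):
    """Nodes with more than one outgoing edge, i.e. where a bubble opens."""
    successors = defaultdict(dict)
    for (src, dst), coverage in edges.items():
        successors[src][dst] = coverage
    return {node: out for node, out in successors.items() if len(out) > 1}

# Three reads of one allele and one read carrying a SNP (T -> A).
reads = ["ACGTTGCA"] * 3 + ["ACGATGCA"]
edges = de_bruijn_edges(reads, 3)
for node, out in branch_points(edges).items():
    best = max(out, key=out.get)  # follow the branch with highest coverage
    print(node, out, "-> take", best)
```

The two paths split at node CG and rejoin at TG; the branch with higher edge coverage is the one a "take the path with highest coverage" rule would keep.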
Explain differences between genome and transcriptome alignment
|
G: Detection of novel genes. Spliced alignment is tricky. Insert sizes harder to interpret.__T: No need for spliced alignment. Simplifies read counting for each isoform. Simplifies discrimination between mappings using insert sizes. Novel genes go undetected.
|
|
|
What is TopHat? Describe workflow.
|
RNA-seq aligner.
|
|
|
What is a gene model?
|
asf |
|
|
How does Cufflinks work?
|
asf |
|
|
RNA-Skim, kallisto, Sailfish -- hash-table aligners
|
asf |
|
|
Describe briefly how to filter alignments
|
Pick part SNP / variant with best coverage__Multiply matching sequences with outlying insert lengths__Take out repeats (sole peaks)
|
|
|
Why are isoforms interesting?
|
They give increased resolution of RNA-sequence (the variants)__We have two versions of each isoform sequence in diploid orgs
|
|
|
What is a Poisson distribution?
|
Independent events occurring at a given rate__Mean = Var = rate. Rates add: pets_rate = dogs_rate + cats_rate
|
|
|
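A small simulation of the two Poisson properties on the card (mean equals variance, and rates of independent counts add). This is a sketch using only the standard library; the sampler is Knuth's multiplication method, and the rates 2 and 3 for dogs and cats are invented.

```python
import math
import random
from statistics import mean, pvariance

def poisson(rate, rng):
    """Draw one Poisson(rate) value using Knuth's method."""
    limit = math.exp(-rate)
    count = 0
    product = 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1

rng = random.Random(0)
dogs = [poisson(2.0, rng) for _ in range(20_000)]
cats = [poisson(3.0, rng) for _ in range(20_000)]
pets = [d + c for d, c in zip(dogs, cats)]

# Both numbers should be close to the summed rate, 2 + 3 = 5.
print(mean(pets), pvariance(pets))
```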
What are three determining factors for how many reads align to a transcript?
|
Total nr reads__Length of transcript__Abundance of transcript
|
|
|
What is a formula for estimated gene reads?
|
r_g ~ Poisson(b * mu_g * l_g)__mu_g: concentration, l_g: effective length, b: normalisation constant
|
|
|
What are some problems with Poisson models?
|
Gene length ambiguous due to isoforms__Cross-linkage__We don't get reads of isoforms
|
|
|
Explain the multinomial distribution
|
>2 categories. Good when there are multiple SNPs and isoforms.
|
|
|
MMSEQ
|
asf |
|
|
Transcript amalgamation
|
asf |
|
|
What is gene imprinting?
|
Genes are expressed in a parent-of-origin specific manner__Gene from father imprinted -- only mother version expressed
|
|
|
What are the aims of read count normalisation?
|
Comparable across features (e.g. genes)__Comparable across samples__Human-friendly scale
|
|
|
What is RPKM normalisation?
|
Set k_ig such that estimates of mu_ig are comparable between genes and samples.__muhat_ig = r_ig / k_ig
|
|
|
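A sketch of the RPKM choice of k_ig, assuming the standard definition (reads per kilobase of transcript per million mapped reads), so k_ig = (N_i / 10^6) * (l_g / 10^3) with N_i the total mapped reads in sample i and l_g the gene length in bases. The example numbers are invented.

```python
def rpkm(reads, total_mapped_reads, gene_length_bp):
    """muhat_ig = r_ig / k_ig with the RPKM scaling factor.

    k_ig = (total mapped reads in millions) * (gene length in kb)
    """
    k = (total_mapped_reads / 1e6) * (gene_length_bp / 1e3)
    return reads / k

# 500 reads on a 2 kb gene in a library of 20 million mapped reads.
print(rpkm(500, 20_000_000, 2_000))
```

Dividing by library size makes values comparable across samples, and dividing by length makes them comparable across genes.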
From Binomial to Poisson
|
awff |
|
|
TMM normalisation
|
asf |
|
|
Median log deviation normalisation
|
asf |
|
|
What is a negative binomial distribution? When do we use it?
|
When rate of Poisson dist not fixed, but varies according to a gamma dist.__Variance is greater than mean.
|
|
|
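A simulation sketch of the gamma-Poisson mixture behind the negative binomial: draw a rate from a gamma distribution, then a count from a Poisson with that rate. It uses only the standard library; the gamma shape 2 and scale 5 are invented, and the Poisson sampler is Knuth's method.

```python
import math
import random
from statistics import mean, pvariance

def poisson(rate, rng):
    """Draw one Poisson(rate) value using Knuth's method."""
    limit = math.exp(-rate)
    count = 0
    product = 1.0
    while True:
        product *= rng.random()
        if product <= limit:
            return count
        count += 1

def gamma_poisson(shape, scale, rng):
    """Poisson count whose rate is itself gamma distributed."""
    rate = rng.gammavariate(shape, scale)
    return poisson(rate, rng)

rng = random.Random(1)
counts = [gamma_poisson(2.0, 5.0, rng) for _ in range(5_000)]

# The mean stays near shape * scale = 10, but the variance is much larger.
print(mean(counts), pvariance(counts))
```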
How does ChIP-seq work?
|
1. Isolate chromatin__2. Cross-link and fragment__3. Add antibodies, immunoprecipitate__4. Reverse cross-links, purify DNA__5. Ligate adaptors, sequence
|
|
|
What is ChIP-seq interested in?
|
Where is the protein bound? 90 % background!__Where are proteins bound differentially?__What do the sites mean biologically?
|
|
|
Describe the ChIP-seq workflow
|
asf |
|
|
Why use a control track in ChIP-seq?
|
Get background. See tissue anomalies. Open chromatin???. Experimental / technical biases. Background distribution irregular.
|
|
|
Name three types of controls in ChIP-seq
|
Input__Vehicle__Non-specific antibody
|
|
|
Explain good experimental design in ChIP-seq
|
Technical replicates: multiple lanes per flowcell__Biological: patient samples, model organisms__Experimental: repeat procedures, (different antibody)
|
|
|
Name the important ChIP-seq parameters in experimental design
|
Single end / paired end, read length (50bp), read depth (20-30M), batches and randomisation, multiplexing (one pool is optimal)
|
|
|
What are blacklists for?
|
Duplicates etc.
|
|
|
How would you perform quality assessment in ChIP-seq?
|
Coverage histogram__Fragment length estimation (cross-correlation, cross coverage, normalised score)
|
|
|
How do you call peaks?
|
Identify maxima. Take region around it.
|
|
|
Name three peak-based metrics
|
1. Reads in peaks (reads that overlap peaks, dist of read density)__2. Peak profiles (mean density at each position relative to summit)__3. Clustering and PCA
|
|
|
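A minimal sketch of the first metric, the fraction of reads that overlap a peak. It assumes reads and peaks on a single chromosome given as half-open (start, end) intervals; the coordinates and the function name are invented, and the all-pairs loop is for clarity rather than speed.

```python
def fraction_reads_in_peaks(reads, peaks):
    """Fraction of reads overlapping at least one peak.

    Intervals are half-open (start, end) on the same chromosome.
    """
    def overlaps(read, peak):
        return read[0] < peak[1] and peak[0] < read[1]

    hits = sum(1 for read in reads if any(overlaps(read, p) for p in peaks))
    return hits / len(reads)

reads = [(0, 50), (100, 150), (140, 190), (300, 350)]
peaks = [(120, 200)]
print(fraction_reads_in_peaks(reads, peaks))
```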
What are the two types of differential binding analysis?
|
Overlap: peak/site occupancy__Quantitative: binding affinity__Binding site count density__Binding profile__Sliding windows
|
|
|
What are consensus peaks?
|
Peaks found in several samples from the same tissue/experiment
|
|
|
Explain the DiffBind workflow
|
1. Read in peaksets__2. Determine occupancy__3. Count reads__4. DBA__5. Plot and report, then re-evaluate.
|
|
|
Name plotting tools used in DBA
|
MA plots__Heatmaps / Clustering / PCA__Boxplots. Peak abundance / density.
|
|
|
Name as many as you can of the 7 regulatory elements:
|
TFs, Histone mods, Nucleosome pos, chromatin domains, Polycomb group (PcG) proteins, DNA methylation, non-coding RNA.
|
|
|
Name four types of TFs
|
Master regulators, General TFs, Pioneer factors, Tissue specific factors
|
|
|
What are the classes of regulatory elements?
|
Core promoters__Enhancers & super enhancers__Locus control regions__Silencers__Insulators
|
|
|
Core promoters
|
asf |
|
|
Enhancers & super enhancers
|
asf |
|
|
Locus control regions
|
asf |
|
|
Silencers & insulators
|
asf |
|
|
How to identify regulatory elements?
|
Motif analysis__Sequence binding sites
|
|
|
MEME
|
MEME-ChIP
|
HOMER
|
|
What is the difference between epigenetics and epigenomics?
|
Epigenetics: gene expression__Epigenomics: overall chromatin state of full genome; organism has single genome, many epigenomes
|
|
|
What is a histone?
|
What the DNA coils around -> gives chromatin
|
|
|
How can histone be modified?
|
5 common ways -- see slide
|
|
|
What are chromatin marks?
|
af |
|
|
Chromatin segmentation
|
asf |
|
|
WHAT ARE CHROMATIN MARKS?
|
asf |
|
|
Explain how Hi-C works
|
asf |
|
|
What are some Hi-C technical and biological biases?
|
awf |
|
|
A-B compartments
|
af |
|
|
How to detect chromatin accessibility?
|
awf |
|
|
ATAC-seq
|
asf |
|
|
What is a pathway?
|
Series of consecutive interactions that give rise to a certain state
|
|
|
What are the three types of pathways?
|
Signalling__Genetic / transcriptional / regulatory__Metabolic (ana / cata / energy transport)
|
|
|
How would you determine enrichment of gene lists?
|
Fisher's / hypergeom__Chi**2__Binomial
|
|
|
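A sketch of the hypergeometric test for a gene list (equivalent to a one-sided Fisher's exact test). It assumes a universe of N genes of which K belong to the pathway, a gene list of size n, and x list genes in the pathway; the example numbers are invented.

```python
from math import comb

def hypergeom_enrichment_p(universe, in_pathway, list_size, overlap):
    """P(X >= overlap) when drawing list_size genes without replacement."""
    total = comb(universe, list_size)
    upper = min(in_pathway, list_size)
    tail = sum(
        comb(in_pathway, i) * comb(universe - in_pathway, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total

# 10 of 200 list genes fall in a 100-gene pathway, out of 20,000 genes.
# By chance we would expect about 200 * 100 / 20000 = 1.
print(hypergeom_enrichment_p(20_000, 100, 200, 10))
```

A small p-value says the overlap is larger than random sampling from the universe would usually give.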
How would you determine enrichment of ranked lists?
|
Kolmogorov-Smirnov__Minimum hypergeometric test__Wilcoxon rank sum
|
|
|
Where do the gene lists in PWA come from?
|
omics data: RNA / ChIP / proteomics
|
|
|
What is Gene Ontology?
|
Contains information on cellular component, molecular function, biological process
|
|
|
What are some causes of SNPs?
|
Replication errors__Repair error__Mutagens__Spontaneous
|
|
|
What are some causes of INDELs?
|
Strand slippage__Aberrant repair__Retrotransposons
|
|
|
What are some causes of Structural Variations?
|
Replication errors__Retrotransposition__Repair errors__Recombination errors
|
|
|
What is a somatic variation?
|
Mutations acquired post conception__SNVs, not SNPs__CNAs, not CNVs__INDELs__SVs
|
|
|
What is the difference between SNVs/SNPs and CNAs/CNVs?
|
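SNPs / CNVs: inherited (germline) variation, polymorphic in the population__SNVs / CNAs: variants / alterations acquired somatically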
|
|
|
Explain the copy number calling workflow
|
Quantify signal (depth / intensity)__Segment chromosome (HMM / smoothing)__Call changes (threshold, cluster, prob. models)
|
|
|
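A toy sketch of the three steps (quantify, segment, call) on read depth in bins along a chromosome. Everything specific here is an assumption for illustration: the running median stands in for the smoothing step (a real pipeline might use an HMM), and the gain/loss thresholds of +/-0.3 on the log2 ratio are invented.

```python
import math
from statistics import median

def log2_ratios(tumour_depth, normal_depth):
    """Step 1, quantify: log2 ratio of tumour to normal depth per bin."""
    return [math.log2(t / n) for t, n in zip(tumour_depth, normal_depth)]

def smooth(values, window=3):
    """Step 2, segment (crudely): running median over neighbouring bins."""
    half = window // 2
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]

def call_changes(values, gain=0.3, loss=-0.3):
    """Step 3, call: threshold the smoothed log2 ratios."""
    return [
        "gain" if v > gain else "loss" if v < loss else "neutral"
        for v in values
    ]

tumour = [100, 100, 150, 150, 150, 50, 50, 50]
normal = [100] * 8
calls = call_changes(smooth(log2_ratios(tumour, normal)))
print(calls)
```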
B-allele frequency
|
af |
|
|
BAF-banding
|
fass |
|
|
Some cancer types are mutation driven, some CNA driven
|
as |
|