114 Cards in this Set

  • Front
  • Back
  • 3rd side (hint)
What is functional genomics?
The study of how things with different genotypes can have different functions, i.e. phenotypes!
How big is the genome?
3 Gb (haploid). About 2 meters of DNA in each cell!
What different types of proteins are there?
Structural, regulatory__Housekeeping (general), Luxury (specific)
Why might RNA not be a good proxy for proteins?

asf

What are the 4 analyses you can do with RNA-seq?
Expression levels, DE, patterns in expression, splicing and isoforms
What is cDNA?
DNA strands produced from RNA via reverse transcriptase.
Name 2 ways to measure RNA
Microarrays__RNA-seq
Explain how microarrays work
Probes with solid support.__Hybridise labeled samples of mRNA to these.__Shoot with lasers.
Explain how RNA-seq works

asd

Name elements in transcriptional regulation
Gene/protein interactions__DNA methylation__Chromatin (epigenetic structure)__microRNA__Gene expression
Explain 2-color microarrays
Experimental culture (red) + control culture (green)__Measure colour
Explain 1-color microarrays and give examples of types
BeadArray: Beads coated with copies of probe__23 base address linked to 50-base gene specific probe__30 copies of each bead type per array: 47k genes => 1.3M probes
Explain the general workflow when working with microarrays

asf

Explain the three cornerstones of experimental design

asd

How to detect outliers?
log-2 normalisation__Positive/negative controls__Number of detected probes per sample__Signal distribution__Clustering / PCA
Explain how control probes work
Positive: Housekeeping genes -- should be there!__Negative: Things that should not be there!__Negative also used to determine background.
What do MA-plots show?
log2 ratio (M) vs. mean of the log2 intensities (A). After normalisation the points should lie along a flat line at M = 0.
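A minimal sketch of how the two axes of an MA-plot could be computed, assuming two vectors of raw, positive intensities for the same probes. The names `ma_values`, `red` and `green` are illustrative and not taken from the course material.

```python
import math

def ma_values(red, green):
    """Return (M, A) lists for paired positive intensities.

    M = log2(red) - log2(green)          (log ratio, y-axis)
    A = (log2(red) + log2(green)) / 2    (mean log intensity, x-axis)
    """
    m = [math.log2(r) - math.log2(g) for r, g in zip(red, green)]
    a = [(math.log2(r) + math.log2(g)) / 2 for r, g in zip(red, green)]
    return m, a

# Hypothetical intensities for three probes.
m, a = ma_values([8.0, 16.0, 4.0], [8.0, 4.0, 4.0])
```

Probes with equal intensity in both channels get M = 0, which is why a well-normalised array scatters around the horizontal axis.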
Name some normalisation techniques
Simple scaling, LOESS, Quantile, robust spline
How does quantile normalisation work?

asf
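Quantile normalisation, one of the techniques named above, can be sketched as follows. This is the textbook procedure (rank each sample, average across samples at each rank, map every value to the average of its rank), not necessarily the variant used in the course; ties are broken by position, which is a simplification. The function name is hypothetical.

```python
def quantile_normalise(samples):
    """Quantile-normalise equal-length samples (one list of values per sample)."""
    n = len(samples[0])
    # For each sample, the indices of its values in ascending order.
    orders = [sorted(range(n), key=lambda i: s[i]) for s in samples]
    # Reference distribution: mean across samples at each rank.
    reference = [
        sum(s[order[rank]] for s, order in zip(samples, orders)) / len(samples)
        for rank in range(n)
    ]
    # Give every value the reference value of its rank.
    normalised = []
    for order in orders:
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalised.append(out)
    return normalised

# Hypothetical intensities: two samples, three probes each.
result = quantile_normalise([[5, 2, 3], [4, 1, 6]])
```

After the call, both samples contain exactly the same set of values, only in different probe order.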

What can be done to determine normalisation?
MA-plot / PCA__Search for patterns in data
Describe the RNA-seq workflow
Align to sequence (genome/transcriptome), get annotation, compare with databases
Describe why and how to filter probes
p = 0.01 => lots of finds! 50k probes -> 500 false pos.__Remove probes not present in all samples.__Use multiple probes per gene
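The arithmetic behind the card above, as a small sketch. The 50k probes and p = 0.01 come from the card; the post-filtering count of 20k probes is a made-up figure used only to show the effect of filtering.

```python
def expected_false_positives(n_probes, alpha):
    """Expected false positives if no probe is truly differentially expressed."""
    return n_probes * alpha

# Figures from the card: 50k probes tested at p = 0.01.
before = expected_false_positives(50_000, 0.01)
# Hypothetical: filtering leaves 20k probes, so fewer tests and fewer false hits.
after = expected_false_positives(20_000, 0.01)
```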
Describe the three downstream analysis categories
Class discovery: clustering, PCA__Class comparison: groups known, genes / pathways associated?__Class prediction: SVM, survival
What is the cross-hybridisation problem and what does it limit?
In microarrays, DNA hybridises to the wrong probes.__Sensitivity, range, probe coverage
Benefits of RNA-seq
Can get isoform data -- not possible with microarray__Gives absolute abundance
Benefit of microarrays
Faster, easier, more mature analysis
How does sequencing work?
Reversible terminators__Sequencing by ligation__Nanopores
Describe the Solexa sequencing structure
15 steps or so
What are some modes of sequencing?
Paired-end: each end aligned separately__Multiplexing__Capture sequencing
What are some benefits of paired end sequencing?
Disambiguate non-uniquely mapped reads__Detect isoforms__Detect duplications, inversions, chromosomal rearrangements__Calculation of distribution of insert sizes
Describe multiplexing (probably not in it)
Look it up
Describe capture sequencing
Magnetic strip to specific sequences__Flush away rest.
Discuss problems with sequence alignment
Filtered / unfiltered (???)__Unique / non-unique__Duplicates (amplification bias)__Mismatches / indels__Adapters and index sequence
Sequencing trends (probably not)

sa

What questions are RNA and ChIP seq answering?
RNA: 1) WHAT is the sequence? 2) WHAT is the concentration?__ChIP: 1) WHERE does it bind? 2) HOW MUCH is bound?
Why is RNA-seq imperfect?
Data consists of 1-2 sequences per fragment__Base call qualities vary for each base in each read__RNA-data is meta-data. Read -> cDNA library__Reference genome is rarely the sample genome: SNPs, indels, structural variants__Reads prone to error (1/1000)
Describe the ChIP-seq protocol
1. Crosslink and shear__2. Add protein specific antibody and immunoprecipitate__3. Sequence one end of each fragment__4. Get coverage!
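The final "get coverage" step can be illustrated with a short sketch. It assumes fragments are already given as 0-based `(start, end)` intervals with an exclusive end and lying inside the track; in practice single-end reads would first be extended to the estimated fragment length. The function name is hypothetical.

```python
def coverage_track(fragments, length):
    """Per-base coverage from (start, end) fragments, end exclusive."""
    # Difference array: +1 where a fragment starts, -1 just past its end.
    diff = [0] * (length + 1)
    for start, end in fragments:
        diff[start] += 1
        diff[end] -= 1
    # A running sum over the differences gives the depth at each base.
    track, depth = [], 0
    for change in diff[:length]:
        depth += change
        track.append(depth)
    return track

# Hypothetical fragments on an 8-base region.
track = coverage_track([(0, 4), (2, 6), (3, 5)], 8)
```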
Describe the RNA-seq protocol
1. Select RNA of interest (e.g. mRNA)__2. Fragment and reverse-transcribe to dsDNA__3. Size select, denature to ss-cDNA__4. Sequence n bases from one/both ends of fragments (n ~ 50-100)
How to map RNA reads to transcripts?
De novo assembly(???)__Alignment + gene model assembly (map to DNA)__Transcriptome alignment (map to RNA)
Explain how de Bruijn graphs work (VERY POSSIBLE EXAM QUESTION)

asf

What is a kmer?
A substring of a read
Explain how to choose k in RNA-seq assembly

asf

How do SNPs/errors appear in de Bruijn graphs?
Bubbles! Take path with highest coverage.
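A minimal sketch of k-mers, a de Bruijn graph and a bubble, assuming the common construction in which nodes are (k-1)-mers and each k-mer is an edge. The reads and all function names are made up for illustration; real assemblers also handle reverse complements, error correction and much more.

```python
from collections import defaultdict

def kmers(read, k):
    """All substrings of length k in a read, left to right."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def de_bruijn(reads, k):
    """Map each (k-1)-mer node to a dict {successor node: k-mer coverage}."""
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph

def branch_nodes(graph):
    """Nodes with more than one outgoing edge, i.e. where a bubble opens."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)

def best_successor(graph, node):
    """Follow the outgoing edge with the highest coverage."""
    successors = graph[node]
    return max(successors, key=successors.get)

# Hypothetical reads: the third differs at one base (A instead of C).
reads = ["ATGCCGTA", "ATGCCGTA", "ATGCAGTA"]
graph = de_bruijn(reads, k=4)
```

Here the graph splits at node `TGC` and both branches rejoin at `GTA`; `best_successor(graph, "TGC")` picks the branch supported by two reads.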
Explain differences between genome and transcriptome alignment
G: Detection of novel genes. Spliced alignment is tricky. Insert sizes harder to interpret.__T: No need for spliced alignment. Simplifies read counting for each isoform. Simplifies discrimination between mappings using insert sizes. Novel genes go undetected.
What is TopHat? Describe workflow.
RNA-seq aligner.
What is a gene model?

asf

How does Cufflinks work?

asf

RNA-Skim, kallisto, Sailfish -- hash-table aligners

asf

Describe briefly how to filter alignments
Pick part SNP / variant with best coverage__Multiply matching sequences with outlying insert lengths__Take out repeats (sole peaks)
Why are isoforms interesting?
They give increased resolution of RNA-sequence (the variants)__We have two versions of each isoform sequence in diploid orgs
What is a Poisson distribution?
Independent events occurring at a given rate__Mean = Var = rate. Rates add: pets_rate = dogs_rate + cats_rate
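Both properties on the card can be checked numerically. The sketch below works from the Poisson probability mass function directly; the truncation at `max_k` is an assumption that only holds for small rates, and the function names are illustrative.

```python
import math

def poisson_pmf(k, rate):
    """P(X = k) for a Poisson variable with the given rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def mean_and_variance(rate, max_k=100):
    """Mean and variance from the PMF, truncated at max_k (tiny tail for small rates)."""
    probs = [poisson_pmf(k, rate) for k in range(max_k + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var

def sum_pmf(k, rate_a, rate_b):
    """P(A + B = k) for independent Poisson A and B, by direct convolution."""
    return sum(poisson_pmf(j, rate_a) * poisson_pmf(k - j, rate_b)
               for j in range(k + 1))

mean, var = mean_and_variance(3.0)
```

`sum_pmf(k, 2.0, 1.0)` agrees with `poisson_pmf(k, 3.0)` up to floating-point error, which is the "dogs plus cats equals pets" statement in code.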
What are three determining factors for how many reads align to a transcript?
Total nr reads__Length of transcript__Abundance of transcript
What is a formula for estimated gene reads?
r_g ~ Poisson(b * mu_g * l_g)__mu_g: concentration, l_g: effective length, b: normalisation constant
What are some problems with Poisson models?
Gene length ambiguous due to isoforms__Cross-linkage__We don't get reads of isoforms
Explain the multinomial distribution
>2 categories. Good when there are multiple SNPs and isoforms.
MMSEQ

asf

Transcript amalgamation

asf

What is gene imprinting?
Genes are expressed in a parent-of-origin specific manner__Gene from father imprinted -- only mother version expressed
What are the aims of read count normalisation?
Comparable across features (e.g. genes)__Comparable across samples__Human-friendly scale
What is RPKM normalisation?
Set k_ig such that estimates of mu_ig are comparable between genes and samples.__muhat_ig = r_ig / k_ig
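As a sketch, assuming the usual RPKM definition (reads per kilobase of transcript per million mapped reads), the scaling factor k_ig is the transcript length in kb times the library size in millions. The function and parameter names are illustrative.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    # k_ig: length in kilobases times library size in millions of reads.
    k = (transcript_length_bp / 1e3) * (total_mapped_reads / 1e6)
    return read_count / k

# Hypothetical gene: 500 reads, 2 kb long, in a library of 10M mapped reads.
value = rpkm(500, 2_000, 10_000_000)
```

A transcript twice as long with twice as many reads in the same library gets the same RPKM, which is the "comparable across genes" aim.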
From Binomial to Poisson

awff

TMM normalisation

asf

Median log deviation normalisation

asf

What is a negative binomial distribution? When do we use it?
When rate of Poisson dist not fixed, but varies according to a gamma dist.__Variance is greater than mean.
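The gamma-Poisson description can be simulated in a few lines. This is a sketch using only the standard library: `sample_poisson` is Knuth's simple multiplication method (adequate for the small rates used here), and the gamma shape and scale are arbitrary illustrative values chosen so that both samples have mean 5.

```python
import math
import random
import statistics

def sample_poisson(rate, rng):
    """Draw one Poisson count with Knuth's multiplication method."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def sample_gamma_poisson(shape, scale, rng):
    """Draw a rate from a gamma distribution, then a Poisson count at that rate."""
    return sample_poisson(rng.gammavariate(shape, scale), rng)

rng = random.Random(0)
# Fixed rate 5: variance should be close to the mean.
fixed = [sample_poisson(5.0, rng) for _ in range(20_000)]
# Gamma-distributed rate with mean 2.0 * 2.5 = 5: variance exceeds the mean.
mixed = [sample_gamma_poisson(2.0, 2.5, rng) for _ in range(20_000)]

fixed_mean, fixed_var = statistics.mean(fixed), statistics.pvariance(fixed)
mixed_mean, mixed_var = statistics.mean(mixed), statistics.pvariance(mixed)
```

Comparing `fixed_var` with `mixed_var` shows the overdispersion that motivates the negative binomial for count data with biological variability.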
How does ChIP-seq work?
1. Isolate chromatin__2. Cross-link and fragment__3. Add antibodies, immunoprecipitate__4. Reverse cross-links, purify DNA__5. Ligate adaptors, sequence
What is ChIP-seq interested in?
Where is the protein bound? 90 % background!__Where are proteins bound differentially?__What do the sites mean biologically?
Describe the ChIP-seq workflow

asf

Why use a control track in ChIP-seq?
Get background. See tissue anomalies. Open chromatin???. Experimental / technical biases. Background distribution irregular.
Name three types of controls in ChIP-seq
Input__Vehicle__Non-specific antibody
Explain good experimental design in ChIP-seq
Technical replicates: multiple lanes per flowcell__Biological: patient samples, model organisms__Experimental: repeat procedures, (different antibody)
Name the important ChIP-seq parameters in experimental design
Single end / paired end, read length (50bp), read depth (20-30M), batches and randomisation, multiplexing (one pool is optimal)
What are blacklists for?
Duplicates etc.
How would you perform quality assessment in ChIP-seq?
Coverage histogram__Fragment length estimation (cross-correlation, cross coverage, normalised score)
How do you call peaks?
Identify maxima. Take region around it.
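The card's two-step recipe, as a deliberately naive sketch: find local maxima above a height cut-off and report a fixed window around each. Real peak callers model the background statistically (typically against a control track); the threshold, window size and function name here are assumptions for illustration.

```python
def call_peaks(coverage, min_height, half_width):
    """Return (start, end, summit) for each local maximum at or above min_height.

    Regions are summit +/- half_width, clipped to the track; end is exclusive.
    """
    peaks = []
    for i in range(1, len(coverage) - 1):
        is_summit = coverage[i - 1] < coverage[i] >= coverage[i + 1]
        if is_summit and coverage[i] >= min_height:
            start = max(0, i - half_width)
            end = min(len(coverage), i + half_width + 1)
            peaks.append((start, end, i))
    return peaks

# Hypothetical per-base coverage with two bumps of different height.
coverage = [0, 1, 3, 8, 3, 1, 0, 2, 4, 2, 0]
strict = call_peaks(coverage, min_height=5, half_width=2)
lenient = call_peaks(coverage, min_height=4, half_width=2)
```

With the stricter cut-off only the taller bump is reported; lowering it also admits the smaller one, which shows how sensitive naive calling is to the threshold.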
Name three peak-based metrics
1. Reads in peaks (reads that overlap peaks, dist of read density)__2. Peak profiles (mean density at each position relative to summit)__3. Clustering and PCA
What are the two types of differential binding analysis?
Overlap: peak/site occupancy__Quantitative: binding affinity__Binding site count density__Binding profile__Sliding windows
What are consensus peaks?
Peaks found in several samples from the same tissue/experiment
Explain the DiffBind workflow
1. Read in peaksets__2. Determine occupancy__3. Count reads__4. DBA__5. Plot and report, then re-evaluate.
Name plotting tools used in DBA
MA plots__Heatmaps / Clustering / PCA__Boxplots. Peak abundance / density.
Name as many as you can of the 7 regulatory elements:
TFs, Histone mods, Nucleosome pos, chromatin domains, Polycomb group (PcG) proteins, DNA methylation, non-coding RNA.
Name four types of TFs
Master regulators, General TFs, Pioneer factors, Tissue specific factors
What are the three classes of regulatory elements?
Core promoters__Enhancers & super enhancers__Locus control regions__Silencers__Insulators
Core promoters

asf

Enhancers & super enhancers

asf

Locus control regions

asf

Silencers & insulators

asf

How to identify regulatory elements?
Motif analysis__Sequence binding sites
MEME
MEME-ChIP
HOMER
What is the difference between epigenetics and epigenomics?
Epigenetics: gene expression__Epigenomics: overall chromatin state of the full genome; an organism has a single genome but many epigenomes
What is a histone?
What the DNA coils around -> gives chromatin
How can histone be modified?
5 common ways -- see slide
What are chromatin marks?

af

Chromatin segmentation

asf

Explain how Hi-C works

asf

What are some Hi-C technical and biological biases?

awf

A-B compartments

af

How to detect chromatin accessibility?

awf

ATAC-seq

asf

What is a pathway?
Series of consecutive interactions that give rise to a certain state
What are the three types of pathways?
Signalling__Genetic / transcriptional / regulatory__Metabolic (ana / cata / energy transport)
How would you determine enrichment of gene lists?
Fisher's exact / hypergeometric__Chi-squared__Binomial
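A sketch of the hypergeometric test for a gene list, using only the standard library. The upper-tail probability computed here is the same quantity as a one-sided Fisher's exact test on the corresponding 2x2 table. Parameter names are illustrative.

```python
from math import comb

def hypergeom_enrichment_p(population, annotated, drawn, overlap):
    """P(X >= overlap) when drawing `drawn` genes without replacement.

    population: genes in the background
    annotated:  background genes belonging to the pathway
    drawn:      genes in the list of interest
    overlap:    list genes that belong to the pathway
    """
    upper = min(annotated, drawn)
    total = comb(population, drawn)
    tail = sum(
        comb(annotated, x) * comb(population - annotated, drawn - x)
        for x in range(overlap, upper + 1)
    )
    return tail / total

# Toy numbers: 10 background genes, 4 in the pathway, list of 3 with 2 hits.
p = hypergeom_enrichment_p(10, 4, 3, 2)
```

For the toy numbers the tail is (36 + 4) / 120, i.e. one third; with realistic background sizes the same function applies unchanged because `math.comb` works on exact integers.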
How would you determine enrichment of ranked lists?
Kolmogorov-Smirnov__Minimum hypergeometric test__Wilcoxon rank sum
Where do the gene lists in PWA come from?
omics data: RNA / ChIP / proteomics
What is Gene Ontology?
Contains information on cellular component, molecular function, biological process
What are some causes of SNPs?
Replication errors__Repair error__Mutagens__Spontaneous
What are some causes of INDELs?
Strand slippage__Aberrant repair__Retrotransposons
What are some causes of Structural Variations?
Replication errors__Retrotransposition__Repair errors__Recombination errors
What is a somatic variation?
Mutations acquired post conception__SNVs, not SNPs__CNAs, not CNVs__INDELs__SVs
What is the difference between SNVs/SNPs and CNAs/CNVs?
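SNPs and CNVs are germline variants (inherited, present in every cell); SNVs and CNAs are their somatic counterparts, acquired after conception.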
Explain the copy number calling workflow
Quantify signal (depth / intensity)__Segment chromosome (HMM / smoothing)__Call changes (threshold, cluster, prob. models)
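The three steps can be sketched on binned read depth. This is a toy version under stated assumptions: the signal is the log2 ratio against a matched reference, a running median stands in for proper segmentation (an HMM or a change-point method in practice), and the gain/loss thresholds of +/-0.3 are arbitrary illustrative values.

```python
import math
import statistics

def log2_ratios(sample_depth, reference_depth):
    """Step 1, quantify: log2 of sample over reference depth per bin."""
    return [math.log2(s / r) for s, r in zip(sample_depth, reference_depth)]

def smooth(values, window=3):
    """Step 2, crude stand-in for segmentation: running median."""
    half = window // 2
    return [
        statistics.median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]

def call_bins(values, gain=0.3, loss=-0.3):
    """Step 3, call: threshold the smoothed signal."""
    return ["gain" if v > gain else "loss" if v < loss else "neutral"
            for v in values]

# Hypothetical depths for eight bins: two normal, three gained, three lost.
sample = [100, 100, 150, 150, 150, 50, 50, 50]
reference = [100] * 8
calls = call_bins(smooth(log2_ratios(sample, reference)))
```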
B-allele frequency

af

BAF-banding

fass

Some cancer types are mutation driven, some CNA driven

as