2023-08-09

Single-cell RNA-seq Methods

Presentation by Sophia

scRNA-seq can provide a more nuanced understanding of splicing across cell states and types.

Exon Inclusion vs Junction Usage to measure splicing

Exon inclusion measures the presence or absence of an exon in a given splicing event. Junction usage relies on split reads, which identify intron/exon boundaries.

  • Exon inclusion may have more reads associated but also requires splicing annotation.
  • PSI: percent spliced in. Metric used to measure either exon inclusion or junction usage.
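
A minimal sketch of how PSI could be computed for either measure, assuming raw read counts are already available. The function names (psi_exon, junction_usage) and the toy counts are illustrative and not taken from any tool discussed here; real pipelines often add normalization or minimum-coverage filters.

```python
from typing import Dict, Optional


def psi_exon(inclusion_reads: float, exclusion_reads: float) -> Optional[float]:
    """PSI for one exon: share of reads supporting inclusion of the exon."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None  # no coverage, so PSI is undefined for this cell/event
    return inclusion_reads / total


def junction_usage(cluster_counts: Dict[str, int]) -> Dict[str, Optional[float]]:
    """PSI per junction: share of a cluster's split reads using each junction."""
    total = sum(cluster_counts.values())
    if total == 0:
        return {junction: None for junction in cluster_counts}
    return {junction: count / total for junction, count in cluster_counts.items()}


# Toy counts (made up for illustration)
print(psi_exon(30, 10))
print(junction_usage({"junction_a": 3, "junction_b": 1}))
```

Returning None for zero coverage keeps "no reads" distinct from "PSI = 0", which matters given how sparse single-cell data is.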

Overview of challenges and motivations

Challenges

  • Limited coverage of the full transcript (3’ bias) in high-throughput sequencing (10x). This motivates alternative scRNA-seq approaches, e.g. Smart-seq, SPLiT-seq, long-read methods. Smart-seq provides better full-transcript coverage (but lower throughput). SPLiT-seq has higher throughput (but lower coverage than Smart-seq).
  • Sparsity. Only a small percentage of junctions and clusters have reads in a given single cell: fewer than 4% of clusters and fewer than 1% of junctions are observed (out of 11,585 clusters and 38,802 junctions in total).
  • Differential splicing across cells.
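
To make the sparsity point concrete, here is a small sketch of how the observed fraction of junctions and clusters could be computed for one cell. The data layout (a dict of junction counts and a junction-to-cluster mapping) and the function name are assumptions for illustration only.

```python
def observed_fractions(junction_counts, junction_to_cluster):
    """Fractions of junctions and of clusters with at least one read in a cell.

    junction_counts: dict of junction id -> read count in this cell
                     (junctions without reads may simply be absent).
    junction_to_cluster: dict of junction id -> cluster id for ALL junctions.
    """
    if not junction_to_cluster:
        raise ValueError("junction_to_cluster must not be empty")
    all_clusters = set(junction_to_cluster.values())
    seen_junctions = [j for j in junction_to_cluster if junction_counts.get(j, 0) > 0]
    # A cluster counts as observed if any of its junctions has a read
    seen_clusters = {junction_to_cluster[j] for j in seen_junctions}
    return (
        len(seen_junctions) / len(junction_to_cluster),
        len(seen_clusters) / len(all_clusters),
    )


# Toy example: 5 junctions in 3 clusters, one junction covered in this cell
toy_mapping = {"j1": "c1", "j2": "c1", "j3": "c2", "j4": "c2", "j5": "c3"}
toy_cell = {"j1": 4}
print(observed_fractions(toy_cell, toy_mapping))
```

With the totals quoted above, less than 1% of 38,802 junctions corresponds to fewer than about 390 junctions with reads in a cell.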

Motivations

UMAP on splicing data can capture the same clustering information as UMAP on gene expression.

Data

  • Tabula Muris (Tabula Sapiens)
  • ZooBrain (Satija Lab)
  • PBMC (ParseBio)

Methods

  • BRIE2. Read counts are measured for two distinct isoforms in a given splice event (plus ambiguous counts). PSI is learned for splice events across cell types. The Bayesian approach gives some sense of the relative uncertainty of differential splicing. Drawbacks: only considers two-isoform events, only skipped exon events made it through the pipeline, unclear exactly what sort of splicing patterns are being detected.
  • Psix. Foreground model: exon inclusion should be similar to that of nearby cells. Distance is measured within a low-dimensional projection. Background model: exon inclusion reflects the global average. Benefits: does not require pre-defined cell states / types, gives a continuous picture of splicing. Drawbacks: only considers skipped exon events, may need to account for gene expression bias, not the easiest to run because it requires many input files.
  • SpliZ. Junctions are considered if 3’ site has multiple associated 5’ sites. Junctions are ranked and averaged by distance between 3’ site per cell. SpliZ scores indicate a cell’s deviance from average junction distance. Benefits: can model more than just skipped exon (SE) events, only needs to perform one test per gene. Drawbacks: needs at least 2 junctions to consider a splice event. Finding a most commonly used junction by length excludes a lot of information about the cell’s splicing profile.
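
The foreground/background idea described for Psix can be illustrated with a toy comparison: does a cell's PSI sit closer to the average of its nearest neighbors in the low-dimensional projection, or to the global average? This is only a sketch of the intuition using squared error. It is not the Psix model or its API, and all names and data here are made up.

```python
import math


def knn_indices(coords, i, k):
    """Indices of the k cells closest to cell i (Euclidean distance)."""
    dists = sorted(
        (math.dist(coords[i], coords[j]), j) for j in range(len(coords)) if j != i
    )
    return [j for _, j in dists[:k]]


def neighbor_vs_global_error(coords, psi, k=2):
    """Mean squared error of PSI against the neighbor mean and the global mean.

    psi entries may be None for cells with no coverage of the exon.
    Returns (neighbor_error, global_error), or None if nothing is comparable.
    """
    observed = [i for i, p in enumerate(psi) if p is not None]
    if not observed:
        return None
    global_mean = sum(psi[i] for i in observed) / len(observed)
    fg_err = bg_err = 0.0
    n = 0
    for i in observed:
        nbrs = [j for j in knn_indices(coords, i, k) if psi[j] is not None]
        if not nbrs:
            continue  # no covered neighbors, so this cell is skipped
        nbr_mean = sum(psi[j] for j in nbrs) / len(nbrs)
        fg_err += (psi[i] - nbr_mean) ** 2
        bg_err += (psi[i] - global_mean) ** 2
        n += 1
    if n == 0:
        return None
    return fg_err / n, bg_err / n


# Two toy groups of cells with different exon inclusion
coords = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
psi = [0.9, 0.9, 0.9, 0.1, 0.1, 0.1]
print(neighbor_vs_global_error(coords, psi, k=2))
```

When the neighbor error is much smaller than the global error, the exon's inclusion varies with position in the projection, which is the kind of signal the foreground model is meant to capture without pre-defined cell types.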

Overall challenges

  • Large knowledge gap between the scores/information output by the models and the actual splicing profile.
  • Many methods require splicing annotations, limiting discovery of novel splice sites.
  • Many methods require pre-defined cell types / states.
  • Limited range of splicing event types captured.