2023-08-09

Single-cell RNA-seq Methods

Presentation by Sophia

scRNA-seq can provide a more nuanced understanding of splicing across cell states and types.

Exon Inclusion vs Junction Usage to measure splicing

Exon inclusion measures the presence or absence of an exon in a given splicing event. Junction usage relies on split reads, which identify intron/exon boundaries.

Exon inclusion may have more supporting reads, but it also requires splicing annotation.
PSI: percent spliced in. Metric used to measure either exon inclusion or junction usage.
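
As a minimal sketch of the PSI definition above, assuming PSI is taken as the fraction of reads supporting inclusion (or a given junction) out of all reads informative for the event. The function name and the choice to return None for events with no reads are illustrative assumptions, not taken from any of the tools discussed, and no length or junction-count normalization is applied.

```python
from typing import Optional


def psi(inclusion_reads: int, exclusion_reads: int) -> Optional[float]:
    """Percent spliced in, as a fraction in [0, 1].

    Hypothetical helper: returns None when the event has no informative
    reads, since PSI is undefined in that case (common in single cells).
    """
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


# Toy example: 30 reads support inclusion, 10 support exclusion.
example_psi = psi(30, 10)
```

In single-cell data many events fall into the no-reads case, which is one reason sparsity is listed as a challenge below.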

Overview of challenges and motivations

Challenges

Limited coverage of the full transcript (3’ bias) in high-throughput sequencing (10x). This motivates alternative scRNA-seq approaches, e.g. Smart-seq, SPLiT-seq, long-read methods. Smart-seq provides better full-transcript coverage (but lower throughput). SPLiT-seq has higher throughput (but lower coverage than Smart-seq).
Sparsity. Only a small percentage of junctions and clusters have reads in a given single cell (total junctions = 38,802 and clusters = 11,585). Fewer than 4% of clusters and fewer than 1% of junctions are observed.
Differential splicing across cells.
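
To make the sparsity numbers concrete: fewer than 1% of 38,802 junctions is under roughly 388 junctions per cell, and fewer than 4% of 11,585 clusters is under roughly 463 clusters per cell. The sketch below shows how such a per-cell observed fraction could be computed from a vector of read counts. The function name and the toy counts are assumptions for illustration only.

```python
def observed_fraction(cell_counts):
    """Fraction of features (junctions or clusters) with at least one read
    in a single cell. `cell_counts` is a list of per-feature read counts."""
    if not cell_counts:
        return 0.0
    observed = sum(1 for count in cell_counts if count > 0)
    return observed / len(cell_counts)


# Toy cell: 4 junctions, only one of which has any reads.
toy_fraction = observed_fraction([0, 3, 0, 0])
```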

Motivations

UMAP can capture the same clustering information from splicing data (compared to that from gene expression).

Data

Tabula Muris (Tabula Sapiens)
ZooBrain (Satija Lab)
PBMC (ParseBio)

Methods

BRIE2. Read counts are measured for two distinct isoforms in a given splice event (plus ambiguous counts). PSI is learned for splice events across cell types. The Bayesian approach gives some sense of the relative uncertainty of differential splicing. Drawbacks: only considers two-isoform events, only skipped exon events made it through the pipeline, unclear exactly what sort of splicing patterns are being detected.
Psix. Foreground model: exon inclusion should be similar to that of nearby cells. Distance is measured within a low-dimensional projection. Background model: exon inclusion reflects the global average. Benefits: does not require pre-defined cell states / types, gives a continuous picture of splicing. Drawbacks: only considers skipped exon events, may need to account for gene expression bias, not the easiest to run because it requires many input files.
SpliZ. Junctions are considered if the 3’ site has multiple associated 5’ sites. Junctions are ranked and averaged by distance between 3’ site per cell. SpliZ scores indicate a cell’s deviation from the average junction distance. Benefits: can model more than just skipped exon (SE) events, only needs to perform one test per gene. Drawbacks: needs at least 2 junctions to consider a splice event. Finding a most commonly used junction by length excludes a lot of information about the cell’s splicing profile.
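
The foreground vs background idea described for Psix can be illustrated with a toy calculation. This is not Psix's actual statistical model or API. It is a simplified sketch that assumes the foreground expectation is the mean PSI of the k nearest cells in a low-dimensional embedding and the background expectation is the global mean PSI. All names, coordinates, and PSI values below are made up for illustration.

```python
import math


def neighbor_mean_psi(cell, coords, psi, k):
    """Mean PSI of the k nearest cells (Euclidean distance in the embedding)
    that have an observed PSI. Simplified stand-in for a foreground model."""
    others = [i for i in range(len(coords)) if i != cell and psi[i] is not None]
    others.sort(key=lambda i: math.dist(coords[cell], coords[i]))
    nearest = others[:k]
    return sum(psi[i] for i in nearest) / len(nearest)


def global_mean_psi(psi):
    """Mean PSI over all cells with an observed value (background stand-in)."""
    observed = [p for p in psi if p is not None]
    return sum(observed) / len(observed)


# Toy embedding: two well-separated groups of three cells each.
coords = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
psi_values = [0.9, 0.8, 1.0, 0.1, 0.2, 0.0]

foreground = neighbor_mean_psi(0, coords, psi_values, k=2)
background = global_mean_psi(psi_values)

# For cell 0, the neighborhood average is closer to its observed PSI than
# the global average, which is the kind of signal the foreground model favors.
foreground_fits_better = abs(psi_values[0] - foreground) < abs(psi_values[0] - background)
```

In this toy case the exon's inclusion tracks position in the embedding, so the neighborhood explains cell 0 better than the global average does. An exon with no cell-state-dependent splicing would show no such difference.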

Overall challenges

Large knowledge gap between the scores/information output by the models and the actual splicing profile.
Many methods require splicing annotations, limiting discovery of novel splice sites.
Many methods require pre-defined cell types / states.
Limited capture of splicing event types.