2023-06-07

Multi-intron splicing order prediction using deep learning

Presentation by Sei Chang

Pre-mRNA splicing order is predetermined. Most genes spliced in one or few predominant orders. Preserved across different cell types and stages of neuron differentiation.

  • Direct RNA nanopore sequencing of chromatin-associated polyA-selected RNA from human K562 cells.
  • Sequencing of differentiated spinal motor neurons at three time points (D4, D9, D14).
  • Focus on post-transcriptional splicing, widespread in human cells.

Given: counts of partially spliced reads describing splicing status of each intron. Task: Apply deep learning on pre-mRNA sequence to find strongest determinants for predicting likeliest splicing order.

Problem Setup

Consider an exon flanked by introns A and B, both of which will be removed during the precursor mRNA splicing. It can occur in two ways:

  1. AB \(\rightarrow\) B \(\rightarrow\) 0
  2. AB \(\rightarrow\) A \(\rightarrow\) 0

Assume the counts to be given by \(y\) and the rates to be given by \(r\). Then, at equilibrium,

\[ y_{\textrm{AB}}\ r_{\textrm{AB}\rightarrow \textrm{B}} - y_{\textrm{B}}\ r_{\textrm{B}\rightarrow 0} = 0 \] \[ \frac{y_\textrm{B}}{y_{\textrm{AB}}} = \frac{r_{\textrm{AB}\rightarrow \textrm{B}}}{r_{\textrm{B}\rightarrow 0}} \]

Similarly,

\[ y_{\textrm{AB}}\ r_{\textrm{AB}\rightarrow \textrm{A}} - y_{\textrm{A}}\ r_{\textrm{A}\rightarrow 0} = 0 \] \[ \frac{y_\textrm{A}}{y_{\textrm{AB}}} = \frac{r_{\textrm{AB}\rightarrow \textrm{A}}}{r_{\textrm{A}\rightarrow 0}} \]

  • Input: encoded genomic sequence
  • Output: intron counts / pairwise ratios

Model Architecture: Convolutional LSTM

  • Sequence of time distributed convolutional layer + LSTM.
  • Stack multiple convLSTM.

Insplico: quantify intron splicing order.

AISO: Adjacent Intron Splicing Order.

  • Input: mapped short or long-read RNA-Seq.
  • Output: intron read count and other statistics.