2023-08-02

Long read analysis pipeline predicts RBPs with isoform-specific functions

Presentation by Stella Park

Alternative splicing produces different mRNAs. RBPs bind pre-mRNA and regulate splicing.

  • Identify both annotated and unannotated RBP isoforms using ENCODE data.

SQANTI3: Transcriptome characterization, Isoform classification

60% of the isoforms are NIC (novel isoforms with a new combination of known splice sites). 34% of the isoforms are FSM (matches all splice junctions perfectly). Of the NIC isoforms, more than 40% have intron retention, and almost 40% are combination of annotated splice sites. Remaining are combination of annotated splice junctions.

CPAT: ORF prediction, compares ORF to annotated tranlation start site and uses logistic model for prediction. ~97% ORF sequences of annotated isoforms (Stella’s) match with GENCODE annotated isoforms.

NMD: Non-sense mediated decay. Frameshifts can give rise to PTCs (premature termination codons) on these undesired isoforms so that they will be dgraded by NMD

Domain Prediction: Localization and RRM

Nucimportv2: Localization Prediction. Prediction nuclear import: infer P(import = true | e), where e is the evidence for query protein, including cNLS scores, NLSdb matches, ppi interactions and SVM scores. cNLS - classical Nuclear Localization Signals. NLS is an amino acid sequence that ‘tags’ a protein for import into the cell nucleus by nuclear transport. ppi - protein-protein interaction.

Most TFs are predicted to be imported into nucleus. RBPs - Different isoforms of same gene have more variable import probability.

PFAM domain comparison

  • Gencode v39 protein coding transcripts
  • tappAS Gencode v39 annotation
  • List of desired domains

More than 75% of RNA binding domains are from the zinc finger family.

Regions of potential interest

  • PTBP1
  • RRM
  • Intrinsically disordered regions (IDRs)