2023-04-26

Presentation by Scott Adamson

Title: Experimental parameters and their impact on sQTL detection.

Excerpts from the talk.

Functional interpretation of GWAS:

  • Colocalization between datasets
  • Integration of functional data (epigenetic, splicing)
  • Machine learning predictions
  • Experimental follow-up

How does a variant impact splicing? Splicing (dys)regulation is important for Mendelian and polygenic phenotypes.

  • 10% of human disease associated exonic alleles cause aberrant splicing (Soemedi R, Nat Genet 2017)
  • Changes in splicing are likely causes of 35% of Mendelian diseases without a diagnosis from exome or genome sequencing (Cummings BB, Sci Transl Med 2017)
  • sQTLs can contribute to GWAS signal more than eQTLs (asthma, breast cancer, heart rate, height, etc.). (Garrido-Martin D, Nat Commun 2021)

Scott is trying to develop a new assay that engineers variants at high throughput and measures their effect on splicing. Several parameters affect sQTL detection:

  • Read parameters
    • Read length
    • Paired-end vs single-end
  • Single cell specific parameters
    • Editing efficiency
    • Number of cells
    • Number of cells with sQTL variant
    • sQTL effect size
    • Junction parameters
    • Number of reads per cell

Downsample or trim reads and measure the number of junctions detected. The dataset has 150 nt paired-end reads sequenced to ~150M.

Number of junction-spanning reads or junctions detected: Scott is looking at these parameters. Yield per base pair: 0.0015 junctions/bp. The paired-end rate is almost twice the single-end rate.
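The trim/downsample experiment can be illustrated with a toy model. This is only a sketch, not the analysis from the talk: reads are half-open (start, end) intervals on a single made-up transcript, the junction coordinates are hypothetical, and the 8-base minimum overhang is an assumed threshold. Paired-end reads are not modelled.

```python
import random

def trim(reads, length):
    """Trim each (start, end) read to at most `length` bases from its start."""
    return [(s, min(e, s + length)) for s, e in reads]

def downsample(reads, fraction, rng):
    """Keep each read independently with probability `fraction`."""
    return [r for r in reads if rng.random() < fraction]

def spans(read, junction, min_overhang):
    """True if the read covers the junction with enough bases on both sides."""
    s, e = read
    return junction - s >= min_overhang and e - junction >= min_overhang

def spanning_reads(reads, junctions, min_overhang=8):
    """Number of reads that span at least one junction."""
    return sum(any(spans(r, j, min_overhang) for j in junctions) for r in reads)

def junctions_detected(reads, junctions, min_overhang=8):
    """Set of junctions supported by at least one spanning read."""
    return {j for j in junctions if any(spans(r, j, min_overhang) for r in reads)}

def yield_per_bp(reads, junctions, min_overhang=8):
    """Junction-spanning reads divided by total sequenced bases."""
    total_bases = sum(e - s for s, e in reads)
    return spanning_reads(reads, junctions, min_overhang) / total_bases if total_bases else 0.0

# Toy data: hypothetical junctions every 300 bases, 2000 reads of 150 nt.
rng = random.Random(0)
junctions = list(range(300, 3000, 300))
reads = [(s, s + 150) for s in (rng.randrange(0, 2850) for _ in range(2000))]

for length in (50, 100, 150):
    trimmed = trim(reads, length)
    print(length, spanning_reads(trimmed, junctions),
          len(junctions_detected(trimmed, junctions)),
          round(yield_per_bp(trimmed, junctions), 5))
```

Running the same loop over `downsample(reads, fraction, rng)` for several fractions gives the depth axis of the experiment.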

Population-scale scRNA-seq enables discovery of cell-type-specific eQTLs. Some are shared with bulk eQTLs and some are not. See Yazar et al, Science, 2022. OneK1K.org: Data of 1.27 million peripheral blood mononuclear cells (PBMCs) collected from 982 donors. I think cis-eQTLs will be shared across cell types.

scRNA-seq sQTL simulator

  • Input parameters fed via config files
  • Editing and sQTL workflows
  • Implemented in Python with PyTorch, Pyro, and tensorQTL
  • Optional cell type specific expression and proportions
  • Test for sQTL with guide only or with genotype
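To show how the parameters listed earlier (editing efficiency, number of cells, effect size, reads per cell) interact, here is a minimal standard-library sketch. It is not Scott's simulator and does not use PyTorch, Pyro, or tensorQTL. All function names, the binomial read model, and the permutation test are assumptions made for illustration. It contrasts testing by guide assignment alone with testing by true genotype.

```python
import random

def simulate_cells(n_cells, guide_fraction, editing_efficiency,
                   base_psi, effect_size, reads_per_cell, rng):
    """Simulate junction read counts per cell (hypothetical binomial model)."""
    cells = []
    for _ in range(n_cells):
        has_guide = rng.random() < guide_fraction
        # A cell carries the variant only if it has the guide AND editing succeeded.
        edited = has_guide and rng.random() < editing_efficiency
        psi = min(1.0, max(0.0, base_psi + (effect_size if edited else 0.0)))
        inclusion = sum(rng.random() < psi for _ in range(reads_per_cell))
        cells.append({"guide": has_guide, "edited": edited,
                      "inclusion": inclusion, "total": reads_per_cell})
    return cells

def pooled_psi(cells):
    """Fraction of inclusion reads pooled over a group of cells."""
    total = sum(c["total"] for c in cells)
    return sum(c["inclusion"] for c in cells) / total if total else float("nan")

def delta_psi(cells, label):
    """Pooled PSI of cells with `label` minus pooled PSI of cells without it."""
    carriers = [c for c in cells if c[label]]
    others = [c for c in cells if not c[label]]
    return pooled_psi(carriers) - pooled_psi(others)

def permutation_p_value(cells, label, n_perm, rng):
    """Two-sided permutation p-value for the PSI difference by `label`."""
    observed = abs(delta_psi(cells, label))
    labels = [c[label] for c in cells]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        shuffled = [dict(c, **{label: l}) for c, l in zip(cells, labels)]
        if abs(delta_psi(shuffled, label)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

rng = random.Random(1)
cells = simulate_cells(n_cells=2000, guide_fraction=0.5, editing_efficiency=0.5,
                       base_psi=0.5, effect_size=0.3, reads_per_cell=20, rng=rng)
print("delta PSI by guide:   ", round(delta_psi(cells, "guide"), 3))
print("delta PSI by genotype:", round(delta_psi(cells, "edited"), 3))
print("p (guide only):", permutation_p_value(cells, "guide", 50, rng))
```

With incomplete editing, the guide-only comparison mixes edited and unedited cells, so its observed effect is diluted relative to the genotype-based comparison. Repeating the simulation over a grid of parameter values gives a rough power estimate.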

Identifying and implementing reasonable default parameters is not simple.

  • Read count parameters: targeted vs whole transcriptome, implemented as technology specific expression quantiles.
  • Transcript-end read bias: Unsure how to implement.
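One possible reading of "technology specific expression quantiles" is inverse-CDF sampling from a per-technology quantile table. The sketch below assumes that interpretation. The table values are made-up placeholders, not estimates from any dataset, and `sample_read_count` is a hypothetical helper.

```python
import random
from bisect import bisect_right

# Hypothetical quantile tables: (cumulative probability, reads per cell).
# Placeholder numbers only; real values would be estimated from data.
QUANTILES = {
    "targeted": [(0.0, 0), (0.25, 5), (0.5, 20), (0.75, 60), (1.0, 200)],
    "whole_transcriptome": [(0.0, 0), (0.25, 0), (0.5, 1), (0.75, 3), (1.0, 15)],
}

def sample_read_count(technology, rng):
    """Draw a read count by linear interpolation between tabulated quantiles."""
    table = QUANTILES[technology]
    probs = [p for p, _ in table]
    u = rng.random()  # uniform in [0, 1)
    i = min(bisect_right(probs, u), len(table) - 1)
    (p0, v0), (p1, v1) = table[i - 1], table[i]
    frac = (u - p0) / (p1 - p0)
    return round(v0 + frac * (v1 - v0))

rng = random.Random(0)
targeted = [sample_read_count("targeted", rng) for _ in range(1000)]
whole = [sample_read_count("whole_transcriptome", rng) for _ in range(1000)]
print(sum(targeted) / len(targeted), sum(whole) / len(whole))
```

Swapping the table is then enough to switch between targeted and whole-transcriptome settings without changing the rest of a simulation.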

Scott is using targeted gene expression data (10x ECCITE data, Julia Domingo) to estimate read count parameters. For normal scRNA-seq, he is using a Snakemake pipeline to process a large number of scRNA-seq datasets.

How to estimate adequate base editing efficiency?