2023-04-26
Presentation by Scott Adamson
Title: Experimental parameters and their impact on sQTL detection.
Excerpts from the talk.
Functional interpretation of GWAS:
- Colocalization between datasets
- Integration of functional data (epigenetic, splicing)
- Machine learning predictions
- Experimental follow-up
How does a variant impact splicing? Splicing (dys)regulation is important for Mendelian and polygenic phenotypes.
- 10% of human disease-associated exonic alleles cause aberrant splicing (Soemedi R, Nat Genet 2017)
- Changes in splicing are likely causes of 35% of Mendelian diseases without a diagnosis from exome or genome sequencing (Cummings BB, Sci Transl Med 2017)
- sQTLs can contribute to GWAS signal more than eQTLs (asthma, breast cancer, heart rate, height, etc.). (Garrido-Martin D, Nat Commun 2021)
Scott is trying to develop a new assay that engineers variants at high throughput and measures their effect on splicing. Several parameters affect sQTL detection:
- Read parameters
- Read length
- Paired-end vs single-end
- Single-cell-specific parameters
- Editing efficiency
- Number of cells
- Number of cells with sQTL variant
- sQTL effect size
- Junction parameters
- Number of reads per cell
Downsample or trim reads and measure the number of junctions detected. The dataset has 150 nt paired-end reads sequenced to ~150M.
Scott is looking at the number of junction-spanning reads or junctions detected. Yield per base pair: 0.0015 junctions/bp. The paired-end rate is almost twice the single-end rate.
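The yield figure can be turned into a back-of-the-envelope estimate. The sketch below is not from the talk: it assumes the 0.0015 junctions/bp figure is the single-end rate, that yield scales linearly with the number of sequenced bases, and that "almost twice" can be stood in for by a factor of 1.9. The function names and the `paired_factor` parameter are hypothetical.

```python
def trimmed_bases(n_reads, read_length_nt, trim_to_nt=None, keep_fraction=1.0):
    """Bases remaining after downsampling reads and trimming each read.

    keep_fraction is the share of reads kept when downsampling;
    trim_to_nt is the length each read is cut down to (None = no trimming).
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    if trim_to_nt is None:
        length = read_length_nt
    else:
        length = min(read_length_nt, trim_to_nt)
    return n_reads * keep_fraction * length


def expected_junction_count(sequenced_bp, paired_end=False,
                            yield_per_bp=0.0015, paired_factor=1.9):
    """Expected junction observations under a linear-yield ASSUMPTION.

    yield_per_bp is taken from the notes and assumed to be the single-end
    rate; paired_factor is an assumed stand-in for "almost twice".
    """
    if sequenced_bp < 0:
        raise ValueError("sequenced_bp must be non-negative")
    rate = yield_per_bp * (paired_factor if paired_end else 1.0)
    return sequenced_bp * rate


if __name__ == "__main__":
    # 1,000 reads of 150 nt, keeping half of them and trimming to 75 nt.
    bases = trimmed_bases(1000, 150, trim_to_nt=75, keep_fraction=0.5)
    print(bases, expected_junction_count(bases),
          expected_junction_count(bases, paired_end=True))
```

Under these assumptions, halving the read length or the read count halves the expected junction count. Whether real data behave that linearly is what the downsampling and trimming experiment would show.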
Population-scale scRNA-seq enables discovery of cell-type-specific eQTLs. Some are shared with bulk eQTLs and some are not. See Yazar et al., Science, 2022. OneK1K.org: data from 1.27 million peripheral blood mononuclear cells (PBMCs) collected from 982 donors. I think cis-eQTLs will be shared across cell types.
scRNA-seq sQTL simulator
- Input parameters are fed in via config files
- Editing and sQTL workflows
- Implemented in Python with PyTorch, Pyro, and tensorQTL
- Optional cell type specific expression and proportions
- Test for sQTL with guide only or with genotype
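To make the idea of such a simulator concrete, here is a minimal standard-library sketch. It is not Scott's implementation and does not use PyTorch, Pyro, or tensorQTL. Every name (`SimConfig`, `guide_fraction`, `power`, and so on) is hypothetical, and the pooled two-proportion z-test is a simple substitute for whatever test the real simulator runs. It illustrates how editing efficiency, number of cells, reads per cell, and effect size interact, and how testing by guide differs from testing by genotype.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimConfig:
    # All defaults are illustrative assumptions, not values from the talk.
    n_cells: int = 1000
    guide_fraction: float = 0.5       # cells carrying the targeting guide
    editing_efficiency: float = 0.5   # guide cells that are actually edited
    mean_junction_reads: float = 3.0  # mean reads on the junction per cell
    psi_ref: float = 0.5              # inclusion rate in unedited cells
    effect_size: float = 0.2          # change in inclusion rate when edited


def poisson(rng, lam):
    """Draw a Poisson variate with Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_cells(cfg, rng):
    """Return per-cell tuples (has_guide, edited, inclusion_reads, total_reads)."""
    cells = []
    for _ in range(cfg.n_cells):
        has_guide = rng.random() < cfg.guide_fraction
        edited = has_guide and rng.random() < cfg.editing_efficiency
        psi = cfg.psi_ref + (cfg.effect_size if edited else 0.0)
        psi = min(max(psi, 0.0), 1.0)
        total = poisson(rng, cfg.mean_junction_reads)
        inclusion = sum(rng.random() < psi for _ in range(total))
        cells.append((has_guide, edited, inclusion, total))
    return cells


def two_proportion_pvalue(k1, n1, k2, n2):
    """Two-sided p-value of a pooled two-proportion z-test."""
    if n1 == 0 or n2 == 0:
        return 1.0
    pooled = (k1 + k2) / (n1 + n2)
    variance = pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2)
    if variance == 0.0:
        return 1.0
    z = (k1 / n1 - k2 / n2) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2.0))


def sqtl_pvalue(cells, by="genotype"):
    """Pool reads per group; group by true edit ("genotype") or by "guide"."""
    index = 1 if by == "genotype" else 0
    k = {True: 0, False: 0}
    n = {True: 0, False: 0}
    for cell in cells:
        k[cell[index]] += cell[2]
        n[cell[index]] += cell[3]
    return two_proportion_pvalue(k[True], n[True], k[False], n[False])


def power(cfg, by="genotype", n_sim=100, alpha=0.05, seed=0):
    """Fraction of simulated experiments in which the sQTL is detected."""
    rng = random.Random(seed)
    hits = sum(sqtl_pvalue(simulate_cells(cfg, rng), by) < alpha
               for _ in range(n_sim))
    return hits / n_sim
```

For example, comparing `power(SimConfig(editing_efficiency=0.1, effect_size=0.1), by="guide")` with the same call using `by="genotype"` shows how unedited guide-carrying cells dilute the signal when only the guide assignment is known. Pooling reads across cells ignores cell-to-cell overdispersion, so this sketch is likely optimistic compared with real data.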
Identifying and implementing reasonable default parameters is not simple.
- Read count parameters: targeted vs whole transcriptome, implemented as technology specific expression quantiles.
- Transcript-end read bias: Unsure how to implement.
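One way to read "technology specific expression quantiles" is inverse-CDF sampling from a per-technology quantile table. The sketch below assumes that interpretation. The quantile values are invented placeholders, not estimates from any dataset, and the names `QUANTILES`, `interpolate_quantile`, and `sample_read_count` are hypothetical.

```python
import bisect
import random

# PLACEHOLDER tables of (cumulative probability, reads per cell for a gene).
# These numbers are made up for illustration; real values would be estimated
# from targeted and whole-transcriptome datasets.
QUANTILES = {
    "targeted": [(0.0, 0), (0.5, 20), (0.9, 80), (1.0, 300)],
    "whole_transcriptome": [(0.0, 0), (0.5, 1), (0.9, 5), (1.0, 40)],
}


def interpolate_quantile(knots, u):
    """Linearly interpolate the read count at cumulative probability u."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must be between 0 and 1")
    probs = [p for p, _ in knots]
    # Index of the first knot whose probability exceeds u, clamped so that
    # knots[i - 1] and knots[i] always bracket u.
    i = min(max(bisect.bisect_right(probs, u), 1), len(knots) - 1)
    (p0, v0), (p1, v1) = knots[i - 1], knots[i]
    return v0 + (u - p0) / (p1 - p0) * (v1 - v0)


def sample_read_count(technology, rng):
    """Draw one per-cell read count for the given technology."""
    return round(interpolate_quantile(QUANTILES[technology], rng.random()))


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_read_count("targeted", rng) for _ in range(5)])
```

A quantile table keeps the simulator independent of any parametric count model, at the cost of needing a reference dataset for each technology. Transcript-end read bias is not modelled here.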
Scott is using targeted gene expression data (10x ECCITE data, Julia Domingo) to estimate read count parameters. For standard scRNA-seq, he is using a Snakemake pipeline to process a large number of scRNA-seq datasets.
How to estimate adequate base editing efficiency?