2023, April

2023-04-28

Tried and failed to install an editable version (pip install -e .) of Python packages with Fortran dependencies.
After installing all packages using conda, I ran pip install --upgrade --force-reinstall, which erased everything from the conda environment and reinstalled random packages with pip. I have to reinstall the environment. In future, I need to find out how to run pip install interactively, so that I can see what will change before it happens.
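A possible way to get something close to an interactive mode is sketched below. It is only a sketch: the helper names pip_install_cmd and confirm_and_install are made up for this note. It assumes pip 22.2 or newer, where --dry-run is available, and it assumes that --no-deps is enough to keep pip away from the conda-managed dependencies. I have not checked how it behaves with the Fortran builds.

```python
import subprocess
import sys


def pip_install_cmd(target=".", editable=True, dry_run=True, no_deps=True):
    """Build a pip command that avoids --force-reinstall (hypothetical helper)."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if dry_run:
        # Report what would be installed without changing the environment.
        cmd.append("--dry-run")
    if no_deps:
        # Do not let pip install or replace dependencies; leave them to conda.
        cmd.append("--no-deps")
    if editable:
        cmd.append("-e")
    cmd.append(target)
    return cmd


def confirm_and_install(target="."):
    """Show the dry-run report, then install only after an explicit 'y'."""
    subprocess.run(pip_install_cmd(target, dry_run=True), check=True)
    answer = input("Proceed with the real install? [y/N] ")
    if answer.strip().lower() == "y":
        subprocess.run(pip_install_cmd(target, dry_run=False), check=True)


# Print the command that would be run first; nothing is installed here.
print(" ".join(pip_install_cmd(".")))
```

Calling confirm_and_install(".") from the package directory would print the dry-run report and wait for confirmation before touching the environment.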

GradVI. Scatterplot comparing number of iterations, total runtime and ELBO between (a) GradVI compound and CAVI, and (b) GradVI direct and CAVI, for strongly correlated (\(r^2 = 0.99\)) covariates in multiple linear regression. Create new DSC simulation to measure the runtime of trend filtering in higher dimensions.
Meeting with Matthew. Discuss Stochastic Gradient Descent (SGD) and natural gradients. Discuss list of things to be done before the manuscript can be reviewed by Matthew.

Meeting with Ekene and Gilead. They are working on improving SparsePro using gradient descent optimization for estimating the coefficients \(w\) for functional annotations.
NPD. Fix errors in extraction of summary statistics of selected variants from PGC. Clean the R code for reading all the data.
RStudio Server is not showing plots.

Journal Club. Anjali discussed the article, “Adjusting for common variant polygenic scores improves yield in rare variant association analyses”, Jurgens et al., Nature Genetics 55, 544–548 (2023). Using 65 traits from the UK Biobank data, the authors showed that including PGS as a fixed-effect covariate in linear mixed model analysis (using GENESIS) improved the yield and discovery power for rare variant association signals. The authors constructed PGS for each trait using two different methods: (1) lead-SNP, and (2) PRS-CS, which applies a Bayesian regression framework to infer posterior variant effect sizes under a continuous shrinkage prior that is learnt directly from the data.
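To convince myself of the intuition, here is a toy simulation. It is not the authors' analysis: it uses plain OLS instead of the linear mixed model in GENESIS, and the sample size, allele frequency and effect sizes are all invented for illustration. The point is only that adjusting for a covariate that explains part of the trait variance (the PGS) shrinks the residual variance, and with it the standard error of the rare variant effect.

```python
import numpy as np


def ols_effect(y, covariates, g):
    """Return (estimate, standard error) for g in an OLS fit of y on [1, covariates, g]."""
    n = y.shape[0]
    X = np.column_stack([np.ones(n)] + list(covariates) + [g])
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)      # covariance of the estimates
    return beta[-1], np.sqrt(cov[-1, -1])


# Invented simulation settings (assumptions, not values from the paper).
rng = np.random.default_rng(0)
n = 20000
pgs = rng.normal(size=n)                         # standardized polygenic score
g = rng.binomial(2, 0.01, size=n).astype(float)  # rare variant genotype
y = 0.5 * g + 0.7 * pgs + rng.normal(size=n)     # trait

b_unadj, se_unadj = ols_effect(y, [], g)
b_adj, se_adj = ols_effect(y, [pgs], g)

print(f"without PGS: beta = {b_unadj:.3f}, se = {se_unadj:.3f}")
print(f"with PGS:    beta = {b_adj:.3f}, se = {se_adj:.3f}")
```

The standard error with the PGS in the model should come out smaller, which is the mechanism behind the improved power in this simplified setting.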
NPD. Tidy the selection of variants from GTEx and PGC summary statistics. Extract the summary statistics of selected variants from PGC, OpenGWAS and GTEx.

Group meeting, presentation by Tatsuhiko Naito.
ADSP meeting. Anjali discussed the p-value distributions from FST.
Set up RStudio on the NYGC cluster.
NPD. David shared the pilot project for NPD. I looked at the data cleaning of the summary statistics.
Set up the NPD website.

GradVI. Simulation complete for linear regression with correlated variables, \(r^2 = 0.99\). Run dscquery to collect the results. Compare niter among GradVI (compound), GradVI (direct) and mr.ash. Share figure on Slack.
General Safety and Compliance Orientation.

Set up navigation in Quarto journal.
Probgen Reading Club, presentation by Sei Chang.
GradVI. Create new notebook for linear regression with correlated variables, find examples where GradVI converges faster than CAVI, set up DSC simulation pipeline for those examples, submit DSC job on interactive node at RCC, check that jobs are running properly (no errors in data generation or lasso initialization).

This is the beginning of my NYGC journal, on my third day here.