2023, May

2023-05-24

  • Group meeting, listen to talks from “The Biology of Genomes”, a conference at CSHL from May9 - May13, 2023. We listened to two talks:
  1. “Discovering stimulatory state specific T2D GWAS mechanisms with single cell multi-omics on iPSC-derived FAP villages” by Christa Ventresca from the Stephen Parker lab,
  2. “DragoNNFruit—Learning cis- and trans-regulation of chromatin accessibility at single base and single cell resolution” by Jacob Schreiber from the Kundaje lab.
  • Start a Latex document to write the Frank-Wolfe algorithm for nuclear norm matrix factorization.

2023-05-23

  • Python implementation for convex optimization using Frank-Wolfe algorithm.
  • I observed that the linear optimization problem using \(K = 40\) principal components in the first step of the FW algorithm retains the structure of the matrix, but this is wrong!

2023-05-22

2023-04-16

  • Clean notebook files, set up workflow for publishing NPD notebooks.
  • Read more about conventional PCA and weighted PCA.

2023-04-15

  • Implement Delchambre’s weighted PCA.
  • I observed hidden structures for the summary statistics in conventional PCA, but I did not observe structures in weighted PCA. Why?

2023-04-12

  • Implement preprocessing of \(\beta\) and SE matrix in Python. Required for playing with PCA and understanding the structure of the data.

2023-04-11

  • Meeting with David. We discussed data cleaning and its effect on weighted PCA. See Slack for details.
  • Probgen reading club, presented by Scott Adamson. Chapter 8 from Population Genetics book by Graham Coop.

2023-04-10

2023-04-09

  • GradVI manuscript - reorganize methods and introduction.

2023-04-08

  • NPD Summary Statistics. Look at weighted PCA after data cleaning. It appears that there is not much distinguishable from the principal components.

2023-04-05

  • Last date for NYGC onboarding. Finish courses and benefits election.
  • Set up Macbook Pro for work.

2023-04-04

  • Make changes and create pull request on DSC for supporting collections.abc and rpy2 v3.5.11.
  • Numerical experiments for dense linear regression for correlated variables.
  • Plot results (ELBO, RMSE, runtime) for dense linear regression for correlated variables.
  • Run more replicates for the above simulation.
  • Seminar by Julia Domingo

2023-04-03

  • GradVI plots without colors in scatterplot.
  • GradVI runtime-per-iteration plot with \(\log_{10}\) scale.
  • Meeting with Matthew.

2023-04-02

  • Run genlasso with large \(n\) for one simulation at the NYGC cluster.
  • Onboarding - Health insurance.
  • Genlasso running into out-of-memory error.
  • Make changes in GradVI to run without creating the \(\mathbb{H}\) matrix.

2023-04-01

  • The GradVI trendfiltering runtime experiments have failed. Troubleshoot: 40G memory is not enough for the jobs. I removed jobs with \(n = (10^5, 10^6)\) and submitted new jobs with 100G of memory in interactive node. The idea is to run the large jobs separately for GradVI.
  • Simulate first order trend filtering data with less memory.
  • How to run GradVI without generating \(\mathbf{H}\) matrix? I derived equations for obtaining \(d_j\) without generating the \(\mathbf{H}\) matrix and made the required changes in the GradVI software to run without generating the matrix. This introduced a bug which prevents running trend filtering for higher orders.