About

This is a proof-of-concept implementation of the quasi-Laplace approximation with Mr.Ash. The quasi-Laplace approximation introduces a regularizer on the likelihood and approximates the posterior with a Gaussian distribution. This approach can be used for generalized linear models with any link function. See the implementation for the logistic model.

The usual variational treatment of the Poisson model restricts the factorized distributions to be Gaussian. Instead, here we restrict the posterior to be Gaussian, and the factorized distributions appear naturally.
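A minimal sketch of the idea, using the notation of the code below ($\mathbf{M}$, $\mathbf{S}$, and the diagonal regularizer $\mathbf{L}$): the regularized likelihood is replaced by a Gaussian centered at its mode, \begin{equation*} p\left(\mathbf{y} \mid \mathbf{X}, \boldsymbol{\beta}\right)\, \mathcal{N}\!\left(\boldsymbol{\beta} \mid \mathbf{0}, \mathbf{L}^{-1}\right) \approx \mathcal{N}\!\left(\boldsymbol{\beta} \mid \mathbf{M}, \mathbf{S}\right), \qquad \mathbf{S}^{-1} = \mathbf{X}^{\mathsf{T}}\,\mathrm{diag}\!\left(e^{\mathbf{X}\mathbf{M}}\right)\mathbf{X} + \mathbf{L}, \end{equation*} where the expression for $\mathbf{S}^{-1}$ is specific to the Poisson model with the exponential inverse link. The sparsity-inducing mixture prior is then combined with this Gaussian approximation, with the contribution of $\mathbf{L}$ removed so that the regularizer is not counted twice.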

Here, we compare the quasi-Laplace approach with the "blackbox" PoissonRegressor from scikit-learn, with the regularization strength alpha set to 1.0 divided by the number of samples.

We use the mean Poisson deviance to quantify the quality of the predictions, and the Gini index¹ to quantify the sparsity of the coefficients.
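For reference, the mean Poisson deviance reported by scikit-learn is \begin{equation*} D\left(\mathbf{y}, \hat{\mathbf{y}}\right) = \frac{1}{N}\sum_{i=1}^{N} 2\left(y_i \log\frac{y_i}{\hat{y}_i} - y_i + \hat{y}_i\right), \end{equation*} with $y_i \log(y_i/\hat{y}_i)$ taken to be zero when $y_i = 0$; lower values indicate a better fit.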

# collapse-hide

import numpy as np
np.set_printoptions(precision = 4, suppress=True)
import pandas as pd

from scipy import optimize
from scipy import stats
from scipy.special import digamma
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_poisson_deviance
from sklearn.utils import gen_even_slices

import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib import ticker as plticker
from mpl_toolkits.axes_grid1 import make_axes_locatable

import sys
sys.path.append("../../utils/")
import mpl_stylesheet
mpl_stylesheet.banskt_presentation(fontfamily = 'latex-clearsans', fontsize = 18, colors = 'banskt', dpi = 72)

Generate toy data

Let us consider a Poisson model with sparse coefficients, so that the number of causal variables ncausal is much smaller than the number of variables nvar in the model. To ensure sparsity of the coefficients, we use a Gaussian mixture prior of nGcomp components with variances given by $\sigma_k^2$ (sigmak2) and unknown mixture coefficients. The sparsity of the initialization is controlled by the variable sparsity, which specifies the mixture coefficient of the zero-th component $\mathcal{N}(0, 0)$ (i.e. the delta function). We are particularly interested in systems where the number of samples nsample is less than nvar. Validation is performed on a separate test dataset with ntest samples.
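In other words, each coefficient carries the mixture prior \begin{equation*} p\left(\beta_j \mid \boldsymbol{\pi}\right) = \sum_{k=0}^{K-1} \pi_k\, \mathcal{N}\!\left(\beta_j \mid 0, \sigma_k^2\right), \qquad \sigma_0^2 = 0, \end{equation*} with $K$ = nGcomp components, so the zero-th component is a point mass at zero and the mixture proportions $\pi_k$ are learnt from the data.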

nsample = [50, 200]
nvar = [100, 100]
ncausal = 5
nGcomp = 10
sparsity = 0.95
prior_variance = 5
ntest = 2000

sigmak2 = np.array([prior_variance * np.square(np.power(2, (i)/nGcomp) - 1) for i in range(nGcomp)])

We use the exponential inverse link function $\theta_i = \exp\left(\mathbf{X}_i\boldsymbol{\beta}\right)$ to generate the response variable $y_i$ for $i = 1, \ldots, N$ samples using the Poisson probability distribution \begin{equation*} p\left(y_i \mid \mathbf{X}_i, \boldsymbol{\beta}\right) = \frac{\theta_i^{y_i}e^{-\theta_i}}{y_i!} \end{equation*} $\mathbf{X}$ is centered and scaled such that for each variable $j$, the variance is $\mathrm{var}(\mathbf{x}_j) = 1$.

# collapse-hide

def standardize(X):
    Xnorm = (X - np.mean(X, axis = 0)) 
    Xstd =  Xnorm / np.std(Xnorm, axis = 0)
    return Xstd

def poisson_data(X, beta):
    Xbeta = np.dot(X, beta)
    pred = np.exp(Xbeta)
    Y = np.random.poisson(pred)
    return Y

def gini(x):
    n = len(x)
    xabs = abs(x)
    xsort = xabs[np.argsort(xabs)]
    t1 = xsort / np.sum(xabs)
    t2 = (n - np.arange(n) - 1 + 0.5) / n
    gini = 1 - 2 * np.sum(t1 * t2)
    return gini

def simulate_data(nsample, nvar, ncausal, ntest):
    X = np.random.rand(nsample * nvar).reshape(nsample, nvar)
    X = standardize(X)
    Xtest = np.random.rand(ntest * nvar).reshape(ntest, nvar)
    Xtest = standardize(Xtest)

    beta = np.zeros(nvar)
    causal_idx = np.random.choice(nvar, ncausal, replace = False)
    beta[causal_idx] = np.random.normal(loc = 0., scale = 1., size = ncausal)

    Y = poisson_data(X, beta)
    Ytest = poisson_data(Xtest, beta)
    
    return X, Y, beta, Xtest, Ytest

Let us have a look at the generated data.

# collapse-hide

X =     [None for n in nsample]
Xtest = [None for n in nsample]
Y =     [None for n in nsample]
Ytest = [None for n in nsample]
beta =  [None for n in nsample]

for i, n in enumerate(nsample):
    X[i], Y[i], beta[i], Xtest[i], Ytest[i] \
        = simulate_data(n, nvar[i], ncausal, ntest)

print(f'There are {ncausal} non-zero coefficients with variance {np.var(beta[0][beta[0] != 0]):.4f}')

fig = plt.figure(figsize = (12,12))
ax = [fig.add_subplot(221),
      fig.add_subplot(222),
      fig.add_subplot(223),
      fig.add_subplot(224)]

for i, n in enumerate(nsample):
    Xbeta = np.dot(X[i], beta[i])
    ax[i * 2].scatter(Xbeta, np.log(Y[i] + 1), s = 10)
    #ax[i * 2].scatter(np.exp(Xbeta), Y[i], s = 10)
    ax[i * 2 + 1].hist(Y[i])

    ax[i * 2].set_xlabel(r'$\sum_i X_{ni} \beta_i$')
    ax[i * 2].set_ylabel(r'$\log(Y_n + 1)$')
    ax[i * 2 + 1].set_xlabel(r'$Y_n$')
    ax[i * 2 + 1].set_ylabel('Number')
    
plt.tight_layout()
plt.show()
There are 5 non-zero coefficients with variance 0.0735

Implementation of quasi-Laplace with variational approximation

See the LaTeX notes for details of the theory. The current implementation focuses on readability and debugging; computational efficiency has been ignored.

Warning: The ELBO decreases with the parameter updates.

There must be an error in the theory or the implementation. Debug!
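For orientation, the per-coordinate updates implemented below (a transcription of the code, not a derivation) are, for components $k > 0$, \begin{align*} s_{jk}^2 &= \left( (\mathbf{S}^{-1})_{jj} - L_j + \sigma_k^{-2} \right)^{-1}, \\ \mu_{jk} &= s_{jk}^2 \left( (\mathbf{S}^{-1}\mathbf{M})_j - \sum_{i \neq j} (\mathbf{S}^{-1})_{ij}\, \bar{\beta}_i \right), \\ \alpha_{jk} &\propto \pi_k \sqrt{\frac{s_{jk}^2}{\sigma_k^2}}\, \exp\left( \frac{\mu_{jk}^2}{2 s_{jk}^2} \right), \end{align*} with $\bar{\beta}_i = \sum_{k>0} \alpha_{ik}\mu_{ik}$, while the zero-th component is pinned at $\mu_{j0} = 0$, $s_{j0}^2 = 0$ and $\alpha_{j0} \propto \pi_0$. The mixture weights are updated from a Dirichlet posterior with concentrations $b_k = \sum_j \alpha_{jk} + a_k$, and the regularizer $\mathbf{L}$ (and with it $\mathbf{M}$ and $\mathbf{S}$) is recomputed whenever $\boldsymbol{\pi}$ has moved by more than reg_tol.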

# collapse-hide

def regularized_log_likelihood(beta, X, Y, L):
    nvar = beta.shape[0]
    Xbeta = np.dot(X, beta)

    ## Function
    llb = np.sum(Y * Xbeta - np.exp(Xbeta)) # - gammaln(Y+1) term dropped: constant in beta
    reg = - 0.5 * np.einsum('i,i->', np.square(beta), L)
    loglik = llb + reg
    
    ## Gradient
    der = np.einsum('i,ij->j', Y, X) - np.einsum('ij, i->j', X, np.exp(Xbeta)) - np.multiply(beta, L)
    
    return -loglik, -der

def precisionLL(X, beta, L):
    nvar = X.shape[1]
    Xbeta = np.dot(X, beta)
    pred = np.exp(Xbeta)
    Sinv = np.einsum('i,ij,ik->jk', pred, X, X)
    Sinv[np.diag_indices(nvar)] += L
    return Sinv

def get_regularized_mode(X, Y, beta0, L):
    nsample, nvar = X.shape
    args  = X, Y, L
    gmode = optimize.minimize(regularized_log_likelihood,
                              beta0,
                              args=args,
                              method='L-BFGS-B',
                              jac=True,
                              bounds=None,
                              options={'maxiter': 20000000,
                                       'maxfun': 20000000,
                                       'ftol': 1e-9,
                                       'gtol': 1e-9
                                       #'disp': True
                                      })
    return gmode.x

def get_elbo_qL(alpha, mu, s2, pi, sigmak2, XPX, SinvM, M, L, logdetS, debug = False):
    nvar, nGcomp = alpha.shape
    Ebeta = np.sum(np.multiply(alpha, mu)[:, 1:], axis = 1)
    mu2 = np.square(mu)
    t1 = - 0.5 * np.dot(Ebeta.T, np.dot(XPX, Ebeta))
    t2 = np.dot(Ebeta.T, SinvM)
    t3 = 0
    t4 = 0
    for j in range(nvar):
        Ebeta2j = 0
        for k in range(nGcomp):
            mom2jk = mu2[j, k] + s2[j, k]
            Ebeta2j += alpha[j, k] * mom2jk
            t4jk = 0
            if alpha[j, k] != 0:
                t4jk += np.log(pi[k])
                t4jk += - np.log(alpha[j, k])
                if k > 0:
                    t4jk += 0.5 * np.log(s2[j, k])
                    t4jk += - 0.5 * np.log(sigmak2[k])
                    t4jk += - 0.5 * mom2jk / sigmak2[k]
                    t4jk += 0.5
                t4jk *= alpha[j, k]
            t4 += t4jk
        t3 += 0.5 * XPX[j, j] * (Ebeta[j] * Ebeta[j] - Ebeta2j)
    logdetL = np.sum(np.log(L))
    t0 = - 0.5 * (logdetL + logdetS + np.dot(M.T, SinvM))
    elbo = t0 + t1 + t2 + t3 + t4
    if debug:
        print(t0, t1 + t2, t3, t4)
    return elbo

#def get_elbo(alpha, mu, s2, pi, sigmak2, XPX, SinvM, M, L, logdetS):
def get_elbo(alpha, mu, s2, logpi, sigmak2, XPX, SinvM, M, L, logdetS):
    nvar, nGcomp = alpha.shape
    Ebeta = np.sum(np.multiply(alpha, mu)[:, 1:], axis = 1)
    mu2 = np.square(mu)
    logdetL = np.sum(np.log(L))
    t0 = - 0.5 * (logdetL + logdetS + np.dot(M.T, SinvM))
    t1 = - 0.5 * np.dot(Ebeta.T, np.dot(XPX, Ebeta))
    t2 = np.dot(Ebeta.T, SinvM)
    t3 = 0
    t4 = 0
    for j in range(nvar):
        Ebeta2j = 0
        for k in range(nGcomp):
            mom2jk = mu2[j, k] + s2[j, k]
            Ebeta2j += alpha[j, k] * mom2jk
            t4jk = 0
            if alpha[j, k] != 0:
                #t4jk += np.log(pi[k])
                t4jk += logpi[k]
                t4jk += - np.log(alpha[j, k])
                if k > 0:
                    t4jk += 0.5 * np.log(s2[j, k])
                    t4jk += - 0.5 * np.log(sigmak2[k])
                    t4jk += - 0.5 * mom2jk / sigmak2[k]
                    t4jk += 0.5
                t4jk *= alpha[j, k]
            t4 += t4jk
        t3 += 0.5 * XPX[j, j] * (Ebeta[j] * Ebeta[j] - Ebeta2j)
    t5 = np.sum(np.multiply(- np.sum(alpha, axis = 0), logpi))
    elbo = t0 + t1 + t2 + t3 + t4 + t5
    return elbo

def ash_qL_regularizer(X, Y, binit, L, mode_optim = True):
    nsample, nvar = X.shape
    if mode_optim:
        M = get_regularized_mode(X, Y, binit, L)
    else:
        M = binit.copy()
    Sinv = precisionLL(X, M, L)
    S = np.linalg.inv(Sinv)
    sgndetS, logdetS = np.linalg.slogdet(S)
    XPX = Sinv.copy()
    XPX[np.diag_indices(nvar)] -= L
    return M, S, Sinv, logdetS, XPX


def ash_qL_pi_init(nGcomp, pi0, sparse = True):
    '''
    By default, pi is initialized with sparsity, i.e. pi[0] > pi[j], for j > 0
    '''
    pi = np.zeros(nGcomp)
    pi[0] = pi0
    pi[1:(nGcomp - 1)] = (1 - pi0) / (nGcomp - 1)
    pi[nGcomp - 1] = 1 - np.sum(pi)
    if not sparse:
        pi = np.repeat(1 / nGcomp, nGcomp)
    return pi


def ash_logistic_qL(X, Y, nGcomp, sigmak2, binit, Linit, 
                    a_dirich = None,
                    maxiter = 1000, tol = 1e-10, pi0 = 0.8, sparse_init = True,
                    reg_step = 1, reg_tol = 0.1,
                    debug = True, debug_step = 10):
    
    '''
    Mr.Ash-type variational fit with the quasi-Laplace regularizer.
    The name is carried over from the logistic implementation; here it is applied to the Poisson model.
    The current focus is on readability and debugging.
    '''
    
    nsample, nvar = X.shape
    
    # default dirichlet prior on pi
    if a_dirich is None:
        a_dirich = np.repeat(1., nGcomp)
        a_dirich[0] = 1. # flat by default; set > 1 to softly favor pi[0]
    
    '''
    Initialize
    '''
    L = Linit.copy()
    M, S, Sinv, logdetS, XPX = ash_qL_regularizer(X, Y, binit, L)
    SinvM = np.dot(Sinv, M)
    SinvRtj = np.dot(Sinv, M) - np.multiply(np.diag(Sinv), M)
    
    pi = ash_qL_pi_init(nGcomp, pi0, sparse = sparse_init)
    logpi = np.log(pi)
    alpha = np.repeat(np.array([pi]), nvar, axis = 0)
    logalpha = np.log(alpha)
    s2 = np.zeros((nvar, nGcomp))
    s2[:, 1:] = 1 / (np.diag(XPX).reshape(-1, 1) + 1 / sigmak2[1:])
    mu = np.multiply(s2, (SinvM - SinvRtj).reshape(-1,1))
    R = np.sum(np.multiply(alpha, mu)[:, 1:], axis = 1)
    
    '''
    Iteration
    '''
    elbo_old = get_elbo(alpha, mu, s2, logpi, sigmak2, XPX, SinvM, M, L, logdetS)
    pi_old = pi.copy()
    pi_reg = pi.copy()
    
    if debug:
        print(f"Starting ELBO: {elbo_old:.4f} Pi0: {pi[0]:.4f}")
        

    for it in range(maxiter):
        
        # E-step
        for j in range(nvar):
            SinvRtj = np.sum(np.multiply(Sinv[:, j], R[:])) - Sinv[j, j] * R[j]
            
            s2[j, 0] = 0.
            mu[j, 0] = 0.
            #logalpha[j, 0] = np.log(pi[0])
            logalpha[j, 0] = logpi[0]
            for k in range(1, nGcomp):
                s2[j, k] = 1 / (XPX[j, j] + (1 / sigmak2[k]))
                mu[j, k] = s2[j, k] * (SinvM[j] - SinvRtj)
                ssf = np.log(s2[j, k] / sigmak2[k])
                ssr = mu[j, k] * mu[j, k] / s2[j, k]
                logalpha[j, k] = logpi[k] + 0.5 * ssf + 0.5 * ssr
                #logalpha[j, k] = np.log(pi[k]) + 0.5 * ssf + 0.5 * ssr
            
            maxL = np.max(logalpha[j, :])
            alpha[j, :]  = np.exp(logalpha[j, :] - maxL)
            alpha[j, :] /= np.sum(alpha[j, :])
            R[j] = np.sum(np.multiply(alpha[j, 1:], mu[j, 1:]))
            
        
        # M-step
        bk = np.sum(alpha, axis = 0) + a_dirich
        pi = bk / np.sum(bk)
        logpi = digamma(bk) - digamma(np.sum(bk))
        #for k in range(nGcomp):
        #    pi[k] = np.sum(alpha[:, k]) / nvar
            
        # Conditional M-step
        if (it + 1) % reg_step == 0:
            eps_reg = max([abs(x - y) for x, y in zip(pi, pi_reg)])
            if eps_reg > reg_tol:
                print(f"Recalculating regularizer, step {it}")            
                L = 1 / np.sum(alpha * sigmak2, axis = 1)
                M, S, Sinv, logdetS, XPX = ash_qL_regularizer(X, Y, R, L)
                SinvM = np.dot(Sinv, M)
                pi_reg = pi.copy()

        elbo = get_elbo(alpha, mu, s2, logpi, sigmak2, XPX, SinvM, M, L, logdetS)
        eps_elbo = elbo - elbo_old
        eps_pi = max([abs(x - y) for x, y in zip(pi, pi_old)])
        
        if debug and (it % debug_step == 0):
            print(f"Iteration {it}. ELBO: {elbo:.4f} Pi0: {pi[0]:.4f} epsilon: {eps_pi:.4f}, {eps_elbo:.4f}")
        
        '''
        Check termination, continue otherwise.
        '''
        if eps_pi < tol:
            print(eps_pi)
            break
 
        '''
        Keep this step in memory.
        Use deep copy to avoid in place substitutions.
        '''
        pi_old    = pi.copy()
        alpha_old = alpha.copy()
        mu_old    = mu.copy()
        s2_old    = s2.copy()
        R_old     = R.copy()
        elbo_old  = elbo
        
    return pi, alpha, mu, s2, R, L

Create a data frame for result summary

# collapse-hide 

res_df = [pd.DataFrame(columns = ['Method', 'RMSE', 'MeanPoisDev', 'Gini Index']) for n in nsample]

data_Xbeta = [np.dot(Xtest[i], beta[i])                              for i in range(len(nsample))]
data_mpd   = [mean_poisson_deviance(Ytest[i], np.exp(data_Xbeta[i])) for i in range(len(nsample))]
data_rmse  = [np.sqrt(mean_squared_error(Ytest[i], Ytest[i]))        for i in range(len(nsample))]
data_gini  = [gini(beta[i])                                          for i in range(len(nsample))]

for i in range(len(nsample)):
    # DataFrame.append has been removed in recent pandas; use pd.concat instead
    res_df[i] = pd.concat([res_df[i],
                           pd.DataFrame([['Data', data_rmse[i], data_mpd[i], data_gini[i]]],
                                        columns = res_df[i].columns)])

Fitting

  • Poisson Regressor (GLM_py)

# collapse-hide

clf = [linear_model.PoissonRegressor(alpha = 1.0/n, fit_intercept = False, max_iter = 5000) for n in nsample]

for i, n in enumerate(nsample):
    clf[i].fit(X[i], Y[i])

clf_Xbeta = [np.dot(Xtest[i], clf[i].coef_)                         for i in range(len(nsample))]
clf_mpd   = [mean_poisson_deviance(Ytest[i], 
                                   clf[i].predict(Xtest[i]))        for i in range(len(nsample))]
clf_rmse  = [np.sqrt(mean_squared_error(Ytest[i], 
                                        clf[i].predict(Xtest[i])))  for i in range(len(nsample))]
clf_gini  = [gini(clf[i].coef_)                                     for i in range(len(nsample))]

for i in range(len(nsample)):
    res_df[i] = pd.concat([res_df[i],
                           pd.DataFrame([['GLM_Py', clf_rmse[i], clf_mpd[i], clf_gini[i]]],
                                        columns = res_df[i].columns)])
  • QL Ash

# collapse-hide

pi =    [None for n in nsample]
alpha = [None for n in nsample]
mu =    [None for n in nsample]
s2 =    [None for n in nsample]
R =     [None for n in nsample]
L =     [None for n in nsample]

#probk = ash_qL_pi_init(nGcomp, sparsity)
for i in range(len(nsample)):
    gold_reg = np.repeat(1e4, nvar[i])
    for j, b in enumerate(beta[i]):
        if b != 0:
            gold_reg[j] = 1 / prior_variance
    #fix_reg = np.repeat(1 / sigmak2[nGcomp - 1], nvar[i])
    #fix_reg = np.repeat( 1 / np.dot(probk, sigmak2), nvar[i])
    fix_reg = np.repeat(1 / prior_variance, nvar[i])

    binit = np.zeros(nvar[i])
    Linit = fix_reg

    pi[i], alpha[i], mu[i], s2[i], R[i], L[i]  \
                             = ash_logistic_qL(X[i], Y[i], nGcomp, sigmak2, binit, Linit,
                                               tol = 1e-4, debug_step = 1, maxiter = 1000,
                                               reg_step = 10,
                                               pi0 = sparsity, sparse_init = True)

qLash_Xbeta = [np.dot(Xtest[i], R[i])            for i in range(len(nsample))]
qLash_rmse  = [np.sqrt(mean_squared_error(Ytest[i], np.exp(qLash_Xbeta[i])))  
                                                 for i in range(len(nsample))]
qLash_mpd   = [mean_poisson_deviance(Ytest[i], np.exp(qLash_Xbeta[i]))
                                                 for i in range(len(nsample))]
qLash_gini  = [gini(R[i])                        for i in range(len(nsample))]
for i in range(len(nsample)):
    res_df[i] = pd.concat([res_df[i],
                           pd.DataFrame([['qLash', qLash_rmse[i], qLash_mpd[i], qLash_gini[i]]],
                                        columns = res_df[i].columns)])
Starting ELBO: 47.5096 Pi0: 0.9500
Iteration 0. ELBO: 119.2579 Pi0: 0.8363 epsilon: 0.1137, 71.7483
Iteration 1. ELBO: 142.5224 Pi0: 0.8287 epsilon: 0.0076, 23.2645
Iteration 2. ELBO: 147.5091 Pi0: 0.8254 epsilon: 0.0039, 4.9868
Iteration 3. ELBO: 151.5264 Pi0: 0.8244 epsilon: 0.0020, 4.0173
Iteration 4. ELBO: 150.6354 Pi0: 0.8316 epsilon: 0.0072, -0.8910
Iteration 5. ELBO: 148.7583 Pi0: 0.8386 epsilon: 0.0069, -1.8771
Iteration 6. ELBO: 147.4634 Pi0: 0.8431 epsilon: 0.0046, -1.2949
Iteration 7. ELBO: 146.7829 Pi0: 0.8452 epsilon: 0.0021, -0.6805
Iteration 8. ELBO: 146.3258 Pi0: 0.8464 epsilon: 0.0012, -0.4571
Recalculating regularizer, step 9
Iteration 9. ELBO: 36.7813 Pi0: 0.8472 epsilon: 0.0008, -109.5445
Iteration 10. ELBO: 36.5829 Pi0: 0.8482 epsilon: 0.0010, -0.1984
Iteration 11. ELBO: 36.1625 Pi0: 0.8493 epsilon: 0.0011, -0.4204
Iteration 12. ELBO: 35.8729 Pi0: 0.8500 epsilon: 0.0007, -0.2896
Iteration 13. ELBO: 35.6859 Pi0: 0.8504 epsilon: 0.0005, -0.1870
Iteration 14. ELBO: 35.5640 Pi0: 0.8507 epsilon: 0.0003, -0.1219
Iteration 15. ELBO: 35.4843 Pi0: 0.8509 epsilon: 0.0002, -0.0797
Iteration 16. ELBO: 35.4320 Pi0: 0.8510 epsilon: 0.0001, -0.0523
Iteration 17. ELBO: 35.3975 Pi0: 0.8511 epsilon: 0.0001, -0.0345
8.085951900116406e-05
Starting ELBO: -88617.1823 Pi0: 0.9500
Iteration 0. ELBO: -8294.5825 Pi0: 0.1809 epsilon: 0.7691, 80322.5998
Iteration 1. ELBO: -4223.5688 Pi0: 0.0756 epsilon: 0.1053, 4071.0137
Iteration 2. ELBO: -3109.1345 Pi0: 0.0553 epsilon: 0.0335, 1114.4343
Iteration 3. ELBO: -2493.6543 Pi0: 0.0496 epsilon: 0.0181, 615.4803
Iteration 4. ELBO: -2077.7813 Pi0: 0.0391 epsilon: 0.0179, 415.8730
Iteration 5. ELBO: -1765.6863 Pi0: 0.0283 epsilon: 0.0179, 312.0950
Iteration 6. ELBO: -1521.4461 Pi0: 0.0290 epsilon: 0.0152, 244.2401
Iteration 7. ELBO: -1332.1208 Pi0: 0.0193 epsilon: 0.0186, 189.3253
Iteration 8. ELBO: -1178.0614 Pi0: 0.0158 epsilon: 0.0169, 154.0594
Recalculating regularizer, step 9
Iteration 9. ELBO: -1170.7134 Pi0: 0.0150 epsilon: 0.0158, 7.3480
Iteration 10. ELBO: -1067.0449 Pi0: 0.0133 epsilon: 0.0153, 103.6686
Iteration 11. ELBO: -979.0647 Pi0: 0.0127 epsilon: 0.0138, 87.9802
Iteration 12. ELBO: -903.3252 Pi0: 0.0135 epsilon: 0.0120, 75.7395
Iteration 13. ELBO: -837.1121 Pi0: 0.0161 epsilon: 0.0098, 66.2131
Iteration 14. ELBO: -778.7932 Pi0: 0.0203 epsilon: 0.0077, 58.3189
Iteration 15. ELBO: -727.7842 Pi0: 0.0246 epsilon: 0.0063, 51.0090
Iteration 16. ELBO: -682.2881 Pi0: 0.0314 epsilon: 0.0068, 45.4961
Iteration 17. ELBO: -644.0136 Pi0: 0.0347 epsilon: 0.0044, 38.2746
Iteration 18. ELBO: -610.4030 Pi0: 0.0348 epsilon: 0.0049, 33.6106
Iteration 19. ELBO: -579.1531 Pi0: 0.0358 epsilon: 0.0035, 31.2499
Iteration 20. ELBO: -552.0845 Pi0: 0.0343 epsilon: 0.0039, 27.0686
Iteration 21. ELBO: -527.2746 Pi0: 0.0309 epsilon: 0.0041, 24.8099
Iteration 22. ELBO: -504.7635 Pi0: 0.0269 epsilon: 0.0042, 22.5111
Iteration 23. ELBO: -483.7139 Pi0: 0.0251 epsilon: 0.0043, 21.0496
Iteration 24. ELBO: -463.6614 Pi0: 0.0259 epsilon: 0.0039, 20.0525
Iteration 25. ELBO: -444.6684 Pi0: 0.0283 epsilon: 0.0039, 18.9931
Iteration 26. ELBO: -426.6045 Pi0: 0.0319 epsilon: 0.0038, 18.0638
Iteration 27. ELBO: -410.0852 Pi0: 0.0347 epsilon: 0.0034, 16.5194
Iteration 28. ELBO: -395.6703 Pi0: 0.0350 epsilon: 0.0029, 14.4149
Iteration 29. ELBO: -382.7289 Pi0: 0.0338 epsilon: 0.0032, 12.9414
Iteration 30. ELBO: -370.7567 Pi0: 0.0320 epsilon: 0.0037, 11.9722
Iteration 31. ELBO: -359.2283 Pi0: 0.0308 epsilon: 0.0038, 11.5284
Iteration 32. ELBO: -347.7110 Pi0: 0.0312 epsilon: 0.0033, 11.5173
Iteration 33. ELBO: -337.1325 Pi0: 0.0306 epsilon: 0.0036, 10.5786
Iteration 34. ELBO: -327.4305 Pi0: 0.0292 epsilon: 0.0040, 9.7019
Iteration 35. ELBO: -317.8402 Pi0: 0.0289 epsilon: 0.0036, 9.5903
Iteration 36. ELBO: -307.5231 Pi0: 0.0321 epsilon: 0.0032, 10.3172
Iteration 37. ELBO: -296.6134 Pi0: 0.0384 epsilon: 0.0064, 10.9097
Iteration 38. ELBO: -285.5963 Pi0: 0.0467 epsilon: 0.0082, 11.0171
Iteration 39. ELBO: -274.9903 Pi0: 0.0551 epsilon: 0.0084, 10.6061
Iteration 40. ELBO: -265.4537 Pi0: 0.0613 epsilon: 0.0062, 9.5366
Iteration 41. ELBO: -256.8781 Pi0: 0.0653 epsilon: 0.0039, 8.5756
Iteration 42. ELBO: -249.0695 Pi0: 0.0677 epsilon: 0.0025, 7.8087
Iteration 43. ELBO: -241.8125 Pi0: 0.0694 epsilon: 0.0017, 7.2570
Iteration 44. ELBO: -235.0707 Pi0: 0.0705 epsilon: 0.0011, 6.7418
Iteration 45. ELBO: -228.8804 Pi0: 0.0711 epsilon: 0.0009, 6.1903
Iteration 46. ELBO: -223.0992 Pi0: 0.0713 epsilon: 0.0010, 5.7812
Iteration 47. ELBO: -217.5898 Pi0: 0.0717 epsilon: 0.0008, 5.5094
Iteration 48. ELBO: -212.0244 Pi0: 0.0735 epsilon: 0.0018, 5.5654
Iteration 49. ELBO: -206.0755 Pi0: 0.0779 epsilon: 0.0044, 5.9489
Iteration 50. ELBO: -200.8026 Pi0: 0.0814 epsilon: 0.0036, 5.2729
Iteration 51. ELBO: -195.7807 Pi0: 0.0845 epsilon: 0.0030, 5.0219
Iteration 52. ELBO: -190.5232 Pi0: 0.0886 epsilon: 0.0042, 5.2575
Iteration 53. ELBO: -185.5176 Pi0: 0.0926 epsilon: 0.0039, 5.0056
Iteration 54. ELBO: -181.1385 Pi0: 0.0942 epsilon: 0.0016, 4.3792
Iteration 55. ELBO: -177.0950 Pi0: 0.0944 epsilon: 0.0012, 4.0435
Iteration 56. ELBO: -173.2075 Pi0: 0.0941 epsilon: 0.0014, 3.8875
Iteration 57. ELBO: -169.2635 Pi0: 0.0942 epsilon: 0.0012, 3.9440
Iteration 58. ELBO: -164.8917 Pi0: 0.0959 epsilon: 0.0017, 4.3718
Iteration 59. ELBO: -159.9338 Pi0: 0.1007 epsilon: 0.0048, 4.9579
Iteration 60. ELBO: -154.5184 Pi0: 0.1089 epsilon: 0.0081, 5.4154
Iteration 61. ELBO: -149.3377 Pi0: 0.1180 epsilon: 0.0091, 5.1807
Iteration 62. ELBO: -144.7527 Pi0: 0.1256 epsilon: 0.0076, 4.5850
Iteration 63. ELBO: -141.0509 Pi0: 0.1302 epsilon: 0.0049, 3.7019
Iteration 64. ELBO: -137.7231 Pi0: 0.1335 epsilon: 0.0044, 3.3277
Iteration 65. ELBO: -134.5856 Pi0: 0.1360 epsilon: 0.0039, 3.1376
Iteration 66. ELBO: -131.7528 Pi0: 0.1375 epsilon: 0.0033, 2.8328
Iteration 67. ELBO: -129.0640 Pi0: 0.1387 epsilon: 0.0029, 2.6888
Iteration 68. ELBO: -126.1797 Pi0: 0.1413 epsilon: 0.0030, 2.8843
Recalculating regularizer, step 69
Iteration 69. ELBO: -145.1114 Pi0: 0.1458 epsilon: 0.0046, -18.9317
Iteration 70. ELBO: -141.8838 Pi0: 0.1507 epsilon: 0.0049, 3.2276
Iteration 71. ELBO: -139.2319 Pi0: 0.1540 epsilon: 0.0036, 2.6519
Iteration 72. ELBO: -137.0830 Pi0: 0.1553 epsilon: 0.0030, 2.1489
Iteration 73. ELBO: -135.0535 Pi0: 0.1558 epsilon: 0.0035, 2.0295
Iteration 74. ELBO: -133.0362 Pi0: 0.1563 epsilon: 0.0034, 2.0173
Iteration 75. ELBO: -130.8286 Pi0: 0.1579 epsilon: 0.0029, 2.2076
Iteration 76. ELBO: -128.1400 Pi0: 0.1614 epsilon: 0.0035, 2.6886
Iteration 77. ELBO: -125.7690 Pi0: 0.1642 epsilon: 0.0028, 2.3711
Iteration 78. ELBO: -123.9921 Pi0: 0.1648 epsilon: 0.0034, 1.7768
Iteration 79. ELBO: -122.4103 Pi0: 0.1648 epsilon: 0.0038, 1.5819
Iteration 80. ELBO: -120.9029 Pi0: 0.1646 epsilon: 0.0038, 1.5074
Iteration 81. ELBO: -119.4376 Pi0: 0.1644 epsilon: 0.0037, 1.4652
Iteration 82. ELBO: -118.0029 Pi0: 0.1641 epsilon: 0.0036, 1.4348
Iteration 83. ELBO: -116.5914 Pi0: 0.1640 epsilon: 0.0035, 1.4114
Iteration 84. ELBO: -115.1931 Pi0: 0.1639 epsilon: 0.0033, 1.3983
Iteration 85. ELBO: -113.7756 Pi0: 0.1643 epsilon: 0.0029, 1.4175
Iteration 86. ELBO: -112.1805 Pi0: 0.1658 epsilon: 0.0021, 1.5951
Iteration 87. ELBO: -109.8872 Pi0: 0.1702 epsilon: 0.0044, 2.2933
Iteration 88. ELBO: -107.0542 Pi0: 0.1764 epsilon: 0.0062, 2.8330
Iteration 89. ELBO: -103.9295 Pi0: 0.1833 epsilon: 0.0069, 3.1247
Iteration 90. ELBO: -100.4747 Pi0: 0.1907 epsilon: 0.0074, 3.4548
Iteration 91. ELBO: -97.9036 Pi0: 0.1968 epsilon: 0.0061, 2.5711
Iteration 92. ELBO: -96.1272 Pi0: 0.2008 epsilon: 0.0040, 1.7764
Iteration 93. ELBO: -94.8268 Pi0: 0.2028 epsilon: 0.0030, 1.3004
Iteration 94. ELBO: -93.2422 Pi0: 0.2054 epsilon: 0.0028, 1.5846
Iteration 95. ELBO: -91.1841 Pi0: 0.2094 epsilon: 0.0040, 2.0581
Iteration 96. ELBO: -89.8933 Pi0: 0.2112 epsilon: 0.0024, 1.2908
Iteration 97. ELBO: -88.8673 Pi0: 0.2119 epsilon: 0.0030, 1.0260
Iteration 98. ELBO: -87.8200 Pi0: 0.2130 epsilon: 0.0026, 1.0473
Iteration 99. ELBO: -86.4430 Pi0: 0.2157 epsilon: 0.0027, 1.3770
Iteration 100. ELBO: -84.4526 Pi0: 0.2197 epsilon: 0.0040, 1.9903
Iteration 101. ELBO: -82.9782 Pi0: 0.2214 epsilon: 0.0019, 1.4744
Iteration 102. ELBO: -82.0108 Pi0: 0.2222 epsilon: 0.0024, 0.9674
Iteration 103. ELBO: -81.1686 Pi0: 0.2223 epsilon: 0.0026, 0.8422
Iteration 104. ELBO: -80.3485 Pi0: 0.2223 epsilon: 0.0026, 0.8200
Iteration 105. ELBO: -79.5177 Pi0: 0.2222 epsilon: 0.0024, 0.8308
Iteration 106. ELBO: -78.6625 Pi0: 0.2220 epsilon: 0.0022, 0.8552
Iteration 107. ELBO: -77.7751 Pi0: 0.2219 epsilon: 0.0019, 0.8874
Iteration 108. ELBO: -76.8499 Pi0: 0.2218 epsilon: 0.0017, 0.9252
Iteration 109. ELBO: -75.8807 Pi0: 0.2218 epsilon: 0.0013, 0.9692
Iteration 110. ELBO: -74.8518 Pi0: 0.2221 epsilon: 0.0013, 1.0289
Iteration 111. ELBO: -73.7019 Pi0: 0.2230 epsilon: 0.0013, 1.1499
Iteration 112. ELBO: -72.2338 Pi0: 0.2253 epsilon: 0.0023, 1.4681
Iteration 113. ELBO: -70.5724 Pi0: 0.2281 epsilon: 0.0028, 1.6615
Iteration 114. ELBO: -69.2044 Pi0: 0.2290 epsilon: 0.0011, 1.3680
Iteration 115. ELBO: -67.9191 Pi0: 0.2293 epsilon: 0.0013, 1.2852
Iteration 116. ELBO: -66.6396 Pi0: 0.2293 epsilon: 0.0015, 1.2795
Iteration 117. ELBO: -65.3509 Pi0: 0.2293 epsilon: 0.0017, 1.2887
Iteration 118. ELBO: -64.0412 Pi0: 0.2292 epsilon: 0.0020, 1.3097
Iteration 119. ELBO: -62.7012 Pi0: 0.2291 epsilon: 0.0022, 1.3400
Iteration 120. ELBO: -61.3225 Pi0: 0.2289 epsilon: 0.0024, 1.3787
Iteration 121. ELBO: -59.8934 Pi0: 0.2289 epsilon: 0.0026, 1.4291
Iteration 122. ELBO: -58.3875 Pi0: 0.2290 epsilon: 0.0027, 1.5059
Iteration 123. ELBO: -56.7255 Pi0: 0.2298 epsilon: 0.0028, 1.6619
Iteration 124. ELBO: -54.7150 Pi0: 0.2318 epsilon: 0.0033, 2.0105
Iteration 125. ELBO: -52.5173 Pi0: 0.2346 epsilon: 0.0040, 2.1977
Iteration 126. ELBO: -50.6756 Pi0: 0.2362 epsilon: 0.0032, 1.8416
Iteration 127. ELBO: -48.9637 Pi0: 0.2369 epsilon: 0.0029, 1.7119
Iteration 128. ELBO: -47.1929 Pi0: 0.2380 epsilon: 0.0031, 1.7708
Iteration 129. ELBO: -45.0076 Pi0: 0.2410 epsilon: 0.0045, 2.1854
Iteration 130. ELBO: -42.6049 Pi0: 0.2445 epsilon: 0.0049, 2.4026
Iteration 131. ELBO: -40.7438 Pi0: 0.2455 epsilon: 0.0032, 1.8611
Iteration 132. ELBO: -39.0383 Pi0: 0.2455 epsilon: 0.0032, 1.7056
Iteration 133. ELBO: -37.3891 Pi0: 0.2455 epsilon: 0.0035, 1.6492
Iteration 134. ELBO: -35.7548 Pi0: 0.2455 epsilon: 0.0037, 1.6343
Iteration 135. ELBO: -34.0780 Pi0: 0.2461 epsilon: 0.0038, 1.6768
Iteration 136. ELBO: -32.1429 Pi0: 0.2486 epsilon: 0.0049, 1.9350
Iteration 137. ELBO: -29.1415 Pi0: 0.2546 epsilon: 0.0069, 3.0014
Iteration 138. ELBO: -25.7027 Pi0: 0.2594 epsilon: 0.0058, 3.4388
Recalculating regularizer, step 139
Iteration 139. ELBO: -50.1921 Pi0: 0.2629 epsilon: 0.0048, -24.4894
Iteration 140. ELBO: -47.4873 Pi0: 0.2656 epsilon: 0.0043, 2.7048
Iteration 141. ELBO: -45.3569 Pi0: 0.2665 epsilon: 0.0032, 2.1304
Iteration 142. ELBO: -43.4119 Pi0: 0.2665 epsilon: 0.0032, 1.9450
Iteration 143. ELBO: -41.5517 Pi0: 0.2664 epsilon: 0.0036, 1.8603
Iteration 144. ELBO: -39.7448 Pi0: 0.2663 epsilon: 0.0040, 1.8069
Iteration 145. ELBO: -37.9735 Pi0: 0.2661 epsilon: 0.0043, 1.7713
Iteration 146. ELBO: -36.2279 Pi0: 0.2660 epsilon: 0.0047, 1.7455
Iteration 147. ELBO: -34.5033 Pi0: 0.2658 epsilon: 0.0050, 1.7246
Iteration 148. ELBO: -32.7974 Pi0: 0.2657 epsilon: 0.0054, 1.7058
Iteration 149. ELBO: -31.1096 Pi0: 0.2656 epsilon: 0.0057, 1.6878
Iteration 150. ELBO: -29.4385 Pi0: 0.2656 epsilon: 0.0059, 1.6712
Iteration 151. ELBO: -27.7763 Pi0: 0.2659 epsilon: 0.0061, 1.6621
Iteration 152. ELBO: -26.0857 Pi0: 0.2667 epsilon: 0.0064, 1.6907
Iteration 153. ELBO: -24.1881 Pi0: 0.2693 epsilon: 0.0073, 1.8976
Iteration 154. ELBO: -21.6470 Pi0: 0.2753 epsilon: 0.0087, 2.5411
Iteration 155. ELBO: -19.1503 Pi0: 0.2823 epsilon: 0.0085, 2.4967
Iteration 156. ELBO: -16.7088 Pi0: 0.2881 epsilon: 0.0074, 2.4415
Iteration 157. ELBO: -14.4746 Pi0: 0.2921 epsilon: 0.0062, 2.2341
Iteration 158. ELBO: -12.6988 Pi0: 0.2938 epsilon: 0.0051, 1.7758
Recalculating regularizer, step 159
Iteration 159. ELBO: -19.8475 Pi0: 0.2965 epsilon: 0.0057, -7.1487
Iteration 160. ELBO: -17.7273 Pi0: 0.2993 epsilon: 0.0059, 2.1202
Iteration 161. ELBO: -16.1856 Pi0: 0.2999 epsilon: 0.0052, 1.5417
Iteration 162. ELBO: -14.7928 Pi0: 0.3001 epsilon: 0.0055, 1.3928
Iteration 163. ELBO: -13.4347 Pi0: 0.3004 epsilon: 0.0060, 1.3581
Iteration 164. ELBO: -12.0004 Pi0: 0.3016 epsilon: 0.0069, 1.4343
Iteration 165. ELBO: -10.1402 Pi0: 0.3048 epsilon: 0.0081, 1.8602
Iteration 166. ELBO: -7.5128 Pi0: 0.3115 epsilon: 0.0095, 2.6275
Iteration 167. ELBO: -5.1259 Pi0: 0.3186 epsilon: 0.0095, 2.3869
Iteration 168. ELBO: -2.8979 Pi0: 0.3238 epsilon: 0.0086, 2.2279
Iteration 169. ELBO: -1.1618 Pi0: 0.3258 epsilon: 0.0075, 1.7362
Iteration 170. ELBO: 0.1844 Pi0: 0.3260 epsilon: 0.0072, 1.3462
Iteration 171. ELBO: 1.3495 Pi0: 0.3258 epsilon: 0.0080, 1.1651
Iteration 172. ELBO: 2.3862 Pi0: 0.3255 epsilon: 0.0087, 1.0367
Iteration 173. ELBO: 3.3171 Pi0: 0.3251 epsilon: 0.0093, 0.9308
Iteration 174. ELBO: 4.1559 Pi0: 0.3248 epsilon: 0.0098, 0.8388
Iteration 175. ELBO: 4.9191 Pi0: 0.3245 epsilon: 0.0101, 0.7632
Iteration 176. ELBO: 5.6504 Pi0: 0.3246 epsilon: 0.0103, 0.7313
Iteration 177. ELBO: 6.4918 Pi0: 0.3256 epsilon: 0.0103, 0.8414
Iteration 178. ELBO: 7.5726 Pi0: 0.3279 epsilon: 0.0107, 1.0808
Recalculating regularizer, step 179
Iteration 179. ELBO: -4.8011 Pi0: 0.3299 epsilon: 0.0105, -12.3737
Iteration 180. ELBO: -4.3307 Pi0: 0.3304 epsilon: 0.0101, 0.4704
Iteration 181. ELBO: -4.0677 Pi0: 0.3301 epsilon: 0.0106, 0.2630
Iteration 182. ELBO: -3.9800 Pi0: 0.3295 epsilon: 0.0108, 0.0877
Iteration 183. ELBO: -4.0301 Pi0: 0.3289 epsilon: 0.0107, -0.0501
Iteration 184. ELBO: -4.1872 Pi0: 0.3283 epsilon: 0.0104, -0.1571
Iteration 185. ELBO: -4.4264 Pi0: 0.3277 epsilon: 0.0101, -0.2392
Iteration 186. ELBO: -4.7251 Pi0: 0.3273 epsilon: 0.0097, -0.2987
Iteration 187. ELBO: -5.0581 Pi0: 0.3269 epsilon: 0.0092, -0.3330
Iteration 188. ELBO: -5.3853 Pi0: 0.3268 epsilon: 0.0084, -0.3272
Iteration 189. ELBO: -5.5847 Pi0: 0.3277 epsilon: 0.0071, -0.1994
Iteration 190. ELBO: -5.0741 Pi0: 0.3312 epsilon: 0.0066, 0.5106
Iteration 191. ELBO: -3.3630 Pi0: 0.3381 epsilon: 0.0069, 1.7111
Iteration 192. ELBO: -2.5400 Pi0: 0.3425 epsilon: 0.0051, 0.8230
Iteration 193. ELBO: -2.2827 Pi0: 0.3439 epsilon: 0.0043, 0.2572
Iteration 194. ELBO: -2.1112 Pi0: 0.3443 epsilon: 0.0041, 0.1716
Iteration 195. ELBO: -1.9578 Pi0: 0.3445 epsilon: 0.0040, 0.1534
Iteration 196. ELBO: -1.7627 Pi0: 0.3448 epsilon: 0.0036, 0.1951
Iteration 197. ELBO: -1.4497 Pi0: 0.3456 epsilon: 0.0029, 0.3130
Iteration 198. ELBO: -0.8478 Pi0: 0.3474 epsilon: 0.0024, 0.6019
Recalculating regularizer, step 199
Iteration 199. ELBO: -11.9659 Pi0: 0.3510 epsilon: 0.0036, -11.1181
Iteration 200. ELBO: -10.6606 Pi0: 0.3558 epsilon: 0.0048, 1.3053
Iteration 201. ELBO: -8.9184 Pi0: 0.3613 epsilon: 0.0054, 1.7423
Iteration 202. ELBO: -7.8172 Pi0: 0.3646 epsilon: 0.0033, 1.1011
Iteration 203. ELBO: -6.9298 Pi0: 0.3668 epsilon: 0.0022, 0.8874
Iteration 204. ELBO: -5.6915 Pi0: 0.3706 epsilon: 0.0038, 1.2383
Iteration 205. ELBO: -4.3677 Pi0: 0.3746 epsilon: 0.0041, 1.3238
Iteration 206. ELBO: -2.8701 Pi0: 0.3779 epsilon: 0.0032, 1.4976
Iteration 207. ELBO: -1.1755 Pi0: 0.3817 epsilon: 0.0038, 1.6946
Iteration 208. ELBO: 0.1102 Pi0: 0.3841 epsilon: 0.0024, 1.2857
Iteration 209. ELBO: 1.3337 Pi0: 0.3858 epsilon: 0.0018, 1.2235
Iteration 210. ELBO: 2.7981 Pi0: 0.3885 epsilon: 0.0027, 1.4643
Iteration 211. ELBO: 4.7105 Pi0: 0.3922 epsilon: 0.0038, 1.9124
Iteration 212. ELBO: 6.2665 Pi0: 0.3942 epsilon: 0.0020, 1.5560
Iteration 213. ELBO: 7.3694 Pi0: 0.3951 epsilon: 0.0009, 1.1029
Iteration 214. ELBO: 8.3158 Pi0: 0.3961 epsilon: 0.0010, 0.9464
Iteration 215. ELBO: 9.2840 Pi0: 0.3980 epsilon: 0.0020, 0.9682
Iteration 216. ELBO: 10.5044 Pi0: 0.4021 epsilon: 0.0040, 1.2204
Iteration 217. ELBO: 12.1180 Pi0: 0.4090 epsilon: 0.0069, 1.6137
Iteration 218. ELBO: 13.9907 Pi0: 0.4172 epsilon: 0.0082, 1.8727
Iteration 219. ELBO: 16.1507 Pi0: 0.4245 epsilon: 0.0073, 2.1600
Iteration 220. ELBO: 18.5854 Pi0: 0.4314 epsilon: 0.0069, 2.4347
Iteration 221. ELBO: 20.8658 Pi0: 0.4361 epsilon: 0.0047, 2.2804
Iteration 222. ELBO: 23.1554 Pi0: 0.4407 epsilon: 0.0046, 2.2896
Iteration 223. ELBO: 24.9560 Pi0: 0.4453 epsilon: 0.0046, 1.8007
Iteration 224. ELBO: 26.9192 Pi0: 0.4507 epsilon: 0.0054, 1.9632
Iteration 225. ELBO: 28.9281 Pi0: 0.4553 epsilon: 0.0046, 2.0089
Iteration 226. ELBO: 30.5896 Pi0: 0.4603 epsilon: 0.0050, 1.6615
Iteration 227. ELBO: 32.3908 Pi0: 0.4672 epsilon: 0.0070, 1.8012
Iteration 228. ELBO: 34.7625 Pi0: 0.4781 epsilon: 0.0108, 2.3718
Recalculating regularizer, step 229
Iteration 229. ELBO: 18.7505 Pi0: 0.4917 epsilon: 0.0136, -16.0120
Iteration 230. ELBO: 21.2619 Pi0: 0.4994 epsilon: 0.0078, 2.5114
Iteration 231. ELBO: 23.0690 Pi0: 0.5044 epsilon: 0.0051, 1.8072
Iteration 232. ELBO: 25.2890 Pi0: 0.5115 epsilon: 0.0071, 2.2200
Iteration 233. ELBO: 27.3869 Pi0: 0.5174 epsilon: 0.0059, 2.0979
Iteration 234. ELBO: 29.7660 Pi0: 0.5237 epsilon: 0.0062, 2.3791
Iteration 235. ELBO: 31.8401 Pi0: 0.5292 epsilon: 0.0055, 2.0741
Iteration 236. ELBO: 34.5766 Pi0: 0.5373 epsilon: 0.0081, 2.7365
Iteration 237. ELBO: 37.0623 Pi0: 0.5459 epsilon: 0.0085, 2.4857
Iteration 238. ELBO: 38.7266 Pi0: 0.5541 epsilon: 0.0082, 1.6643
Iteration 239. ELBO: 40.6735 Pi0: 0.5610 epsilon: 0.0069, 1.9469
Iteration 240. ELBO: 42.2485 Pi0: 0.5631 epsilon: 0.0023, 1.5750
Iteration 241. ELBO: 43.4616 Pi0: 0.5657 epsilon: 0.0026, 1.2131
Iteration 242. ELBO: 45.5876 Pi0: 0.5715 epsilon: 0.0058, 2.1261
Iteration 243. ELBO: 47.6280 Pi0: 0.5741 epsilon: 0.0026, 2.0404
Iteration 244. ELBO: 48.4509 Pi0: 0.5771 epsilon: 0.0030, 0.8229
Iteration 245. ELBO: 49.3805 Pi0: 0.5805 epsilon: 0.0034, 0.9296
Iteration 246. ELBO: 50.1469 Pi0: 0.5829 epsilon: 0.0024, 0.7664
Iteration 247. ELBO: 50.7009 Pi0: 0.5838 epsilon: 0.0009, 0.5540
Iteration 248. ELBO: 51.1597 Pi0: 0.5842 epsilon: 0.0004, 0.4588
Iteration 249. ELBO: 51.5552 Pi0: 0.5846 epsilon: 0.0004, 0.3954
Iteration 250. ELBO: 51.9221 Pi0: 0.5851 epsilon: 0.0006, 0.3669
Iteration 251. ELBO: 52.3292 Pi0: 0.5862 epsilon: 0.0011, 0.4070
Iteration 252. ELBO: 53.0626 Pi0: 0.5889 epsilon: 0.0027, 0.7335
Iteration 253. ELBO: 54.2078 Pi0: 0.5942 epsilon: 0.0052, 1.1452
Iteration 254. ELBO: 54.8367 Pi0: 0.6006 epsilon: 0.0065, 0.6289
Iteration 255. ELBO: 55.7393 Pi0: 0.6077 epsilon: 0.0071, 0.9026
Iteration 256. ELBO: 57.0821 Pi0: 0.6184 epsilon: 0.0107, 1.3428
Iteration 257. ELBO: 59.3381 Pi0: 0.6344 epsilon: 0.0160, 2.2560
Iteration 258. ELBO: 61.3632 Pi0: 0.6453 epsilon: 0.0111, 2.0251
Recalculating regularizer, step 259
Iteration 259. ELBO: 42.3830 Pi0: 0.6513 epsilon: 0.0063, -18.9803
Iteration 260. ELBO: 43.7826 Pi0: 0.6607 epsilon: 0.0094, 1.3996
Iteration 261. ELBO: 47.2303 Pi0: 0.6788 epsilon: 0.0182, 3.4477
Iteration 262. ELBO: 49.9809 Pi0: 0.6938 epsilon: 0.0149, 2.7506
Iteration 263. ELBO: 52.7356 Pi0: 0.7095 epsilon: 0.0158, 2.7547
Iteration 264. ELBO: 54.4907 Pi0: 0.7188 epsilon: 0.0096, 1.7551
Iteration 265. ELBO: 55.6979 Pi0: 0.7253 epsilon: 0.0067, 1.2072
Iteration 266. ELBO: 58.1549 Pi0: 0.7410 epsilon: 0.0157, 2.4570
Iteration 267. ELBO: 61.0686 Pi0: 0.7597 epsilon: 0.0187, 2.9138
Iteration 268. ELBO: 62.4185 Pi0: 0.7717 epsilon: 0.0120, 1.3498
Recalculating regularizer, step 269
Iteration 269. ELBO: 48.1044 Pi0: 0.7850 epsilon: 0.0133, -14.3140
Iteration 270. ELBO: 50.7287 Pi0: 0.7997 epsilon: 0.0147, 2.6243
Iteration 271. ELBO: 52.5485 Pi0: 0.8219 epsilon: 0.0222, 1.8198
Iteration 272. ELBO: 53.9595 Pi0: 0.8382 epsilon: 0.0164, 1.4110
Iteration 273. ELBO: 54.1983 Pi0: 0.8477 epsilon: 0.0094, 0.2388
Iteration 274. ELBO: 55.2759 Pi0: 0.8562 epsilon: 0.0085, 1.0776
Iteration 275. ELBO: 57.2228 Pi0: 0.8652 epsilon: 0.0090, 1.9469
Iteration 276. ELBO: 57.7871 Pi0: 0.8690 epsilon: 0.0038, 0.5643
Iteration 277. ELBO: 58.1218 Pi0: 0.8695 epsilon: 0.0005, 0.3347
Iteration 278. ELBO: 58.3120 Pi0: 0.8697 epsilon: 0.0001, 0.1902
Iteration 279. ELBO: 58.4404 Pi0: 0.8697 epsilon: 0.0001, 0.1284
6.769251216631744e-05

Results

i = 0
print (f'N = {nsample[i]}, P = {nvar[i]}')
res_df[i]
N = 50, P = 100
   Method        RMSE  MeanPoisDev  Gini Index
0    Data    0.000000     1.054893    0.958865
0  GLM_Py    3.177705     3.400255    0.421482
0   qLash    2.346626     1.357556    0.919608
i = 1
print (f'N = {nsample[i]}, P = {nvar[i]}')
res_df[i]
N = 200, P = 100
   Method        RMSE  MeanPoisDev  Gini Index
0    Data    0.000000     0.810651    0.959049
0  GLM_Py  149.983669    17.259012    0.647575
0   qLash    4.995020     0.834997    0.959159

Calibration of predictions

To check that the methods yield reasonable predictions, we sort the test samples by the observed $\mathbf{y}$ and split them into bins. For each bin, we then plot the mean observed and the mean predicted response, $\langle\mathbf{y}_{\mathrm{pred}}\rangle$, on the log scale.

# collapse-hide

n_bins = 40

fig = plt.figure(figsize = (16, 8))
ax = [fig.add_subplot(121),
      fig.add_subplot(122)]

for i, n in enumerate(nsample):
    whichX = Xtest[i].copy()
    whichY = Ytest[i].copy()
    
    idx_sort = np.argsort(whichY)
    q = np.arange(0, 1, 1/n_bins) + 0.5/n_bins
    seg_true = np.zeros(n_bins)
    seg_clf  = np.zeros(n_bins)
    seg_qL   = np.zeros(n_bins)

    for j, sl in enumerate(gen_even_slices(len(whichY), n_bins)):
        seg_true[j] = np.average(whichY[idx_sort][sl])
    ax[i].scatter(q, np.log(seg_true+1), s = 10, label = 'Observations')
    ax[i].plot(q, np.log(seg_true+1))

    Xbeta_clf = np.dot(whichX, clf[i].coef_)
    Ypred_clf = clf[i].predict(whichX)
    for j, sl in enumerate(gen_even_slices(len(whichY), n_bins)):
        seg_clf[j] = np.average(Ypred_clf[idx_sort][sl])
    ax[i].scatter(q, np.log(seg_clf+1), s = 10, label = 'GLM_py')
    ax[i].plot(q, np.log(seg_clf+1))

    Xbeta_qL = np.dot(whichX, R[i])
    Ypred_qL = np.exp(Xbeta_qL)
    for j, sl in enumerate(gen_even_slices(len(whichY), n_bins)):
        seg_qL[j] = np.average(Ypred_qL[idx_sort][sl])
    ax[i].scatter(q, np.log(seg_qL+1), s = 10, label = 'qLash')
    ax[i].plot(q, np.log(seg_qL+1))

    leg = ax[i].legend(markerscale = 4)

    ax[i].set_xlabel('Fraction of samples')
    ax[i].set_ylabel(r'$\log (\langle{Y}\rangle + 1)$')
    ax[i].set_title(f'N = {nsample[i]}, P = {nvar[i]}', pad = 20)
    
plt.tight_layout(pad = 2.0)
plt.show()

Evaluation of ranking power

To evaluate the ranking power, we use the Lorenz curve. It is popularly used in economics to describe inequality in wealth, but it can be instructive here due to the sparsity of $\mathbf{y}$. We rank the samples by the predicted $\mathbf{y}$ of each model. The fraction of ranked samples is shown on the x-axis, and the cumulative sum of the observed $\mathbf{y}$ up to the corresponding ranked sample is shown on the y-axis. Both axes are scaled to 1.

# collapse-hide

def lorenz_curve(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    nsample = y_true.shape[0]
    ranking = np.argsort(y_pred)
    ranked_y = y_true[ranking]
    cumulated_y = np.cumsum(ranked_y).astype(float)
    cumulated_y /= cumulated_y[-1]
    cumulated_obs = np.cumsum(np.repeat(1., nsample))
    cumulated_obs /= cumulated_obs[-1]
    return cumulated_obs, cumulated_y

fig = plt.figure(figsize = (16, 8))
ax = [fig.add_subplot(121),
      fig.add_subplot(122)]

for i, n in enumerate(nsample):
    whichX = Xtest[i].copy()
    whichY = Ytest[i].copy()
    
    cum_obs, cum_y = lorenz_curve(whichY, whichY)
    ax[i].plot(cum_obs, cum_y, label = "Observation")
    
    cum_obs, cum_y = lorenz_curve(whichY, clf[i].predict(whichX))
    ax[i].plot(cum_obs, cum_y, label = "GLM_py")

    cum_obs, cum_y = lorenz_curve(whichY, np.exp(np.dot(whichX, R[i])))
    ax[i].plot(cum_obs, cum_y, label = "qLash")
    
    rand_idx = np.arange(len(whichY))
    np.random.shuffle(rand_idx)
    cum_obs, cum_y = lorenz_curve(whichY, whichY[rand_idx])
    ax[i].plot(cum_obs, cum_y, linestyle="dotted", color="black", label="Random")

    leg = ax[i].legend(handlelength = 1.5)

    ax[i].set_xlabel('Ranked samples')
    ax[i].set_ylabel('Cumulative sum of Y')
    ax[i].set_title(f'N = {nsample[i]}, P = {nvar[i]}', pad = 20)
    
plt.tight_layout(pad = 2.0)
plt.show()

1. For a review of sparsity indices, see Hurley and Rickard (2009). Comparing Measures of Sparsity. arXiv:0811.4706v2.