About

This is a proof-of-concept implementation of the quasi-Laplace approximation with Mr.Ash. The quasi-Laplace approximation introduces a regularizer on the likelihood and approximates the posterior with a Gaussian distribution. The approach can be used for generalized linear models with any link function.

Bayesian logistic regression with a Gaussian prior (ridge regression) uses the Laplace approximation to approximate the posterior with a Gaussian distribution. The variational treatment of the logistic model by Jaakkola and Jordan (2000) also leads to a Gaussian approximation of the posterior distribution, and its greater flexibility leads to improved accuracy compared to the Laplace method.
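
For reference, the Laplace approximation centers a Gaussian at the posterior mode: with a $\mathcal{N}(\mathbf{0}, \tau^{-1}\mathbf{I})$ prior on the coefficients,

$$
p(\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{X}) \approx \mathcal{N}\big(\boldsymbol{\beta} \mid \hat{\boldsymbol{\beta}}, \mathbf{H}^{-1}\big),
\qquad
\mathbf{H} = \mathbf{X}^{\mathsf{T}}\mathbf{W}\mathbf{X} + \tau\mathbf{I},
\quad
\mathbf{W} = \mathrm{diag}\big(\hat{p}_n(1-\hat{p}_n)\big),
$$

where $\hat{\boldsymbol{\beta}}$ is the posterior mode and $\hat{p}_n = \sigma(\mathbf{x}_n^{\mathsf{T}}\hat{\boldsymbol{\beta}})$.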

Here, I compare the quasi-Laplace approach with cross-validated Lasso ($L_1$ penalty). I use Tjur's $R^2$ statistic [1] to quantify the quality of classification, and the Gini index [2] to quantify the sparsity of the coefficients.
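
Concretely, with fitted probabilities $\hat{p}_n = \sigma(\mathbf{x}_n^{\mathsf{T}}\boldsymbol{\beta})$, the two metrics used below are

$$
R^2_{\mathrm{Tjur}} = \overline{\hat{p}}_{\,y=1} - \overline{\hat{p}}_{\,y=0},
\qquad
\mathrm{Gini}(\boldsymbol{\beta}) = 1 - 2 \sum_{k=1}^{P} \frac{|\beta|_{(k)}}{\lVert \boldsymbol{\beta} \rVert_1} \cdot \frac{P - k + \tfrac{1}{2}}{P},
$$

where $|\beta|_{(k)}$ denotes the absolute coefficients sorted in ascending order and $P$ is the number of variables. The Gini index is close to 1 for a vector with a single non-zero entry and decreases as the coefficients spread out. These expressions match the tjur_R2 and gini functions defined in the code below.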

Note: QL generally underestimates the coefficients irrespective of the choice of regularizer, while Lasso overestimates them.

I also found that the optimization depends on the choice of the Gaussian mixture family.

# collapse-hide

import numpy as np
np.set_printoptions(precision = 4, suppress=True)
import pandas as pd

from scipy import optimize
from scipy import stats
from scipy.special import digamma
from sklearn import linear_model

import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib import ticker as plticker
from mpl_toolkits.axes_grid1 import make_axes_locatable

import sys
sys.path.append("../../utils/")
import mpl_stylesheet
mpl_stylesheet.banskt_presentation(fontfamily = 'latex-clearsans', fontsize = 18, colors = 'banskt', dpi = 72)

Generate toy data

Let us consider a logistic model with sparse coefficients, so that the number of causal variables ncausal is much smaller than the number of variables nvar in the model. To ensure sparsity of the coefficients, we use a Gaussian mixture prior of nGcomp components with variances given by $\sigma_k^2$ (sigmak2) and unknown mixture weights. The sparsity of the initialization is controlled by the variable sparsity, which specifies the mixture weight of the zeroth component $\mathcal{N}(0, 0)$ (i.e. the delta function). We are particularly interested in settings where the number of samples nsample is smaller than nvar. Validation is performed on a separate test dataset with ntest samples.

nsample = [50, 200]
nvar = [100, 100]
ncausal = 5
nGcomp = 10
sparsity = 0.95
prior_variance = 5
ntest = 2000
hg2 = 0.8

sigmak2 = np.array([prior_variance * np.square(np.power(2, (i)/nGcomp) - 1) for i in range(nGcomp)])
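
The line above builds the variance grid

$$
\sigma_k^2 = v \big(2^{k/K} - 1\big)^2, \qquad k = 0, \dots, K - 1,
$$

with $v$ = prior_variance and $K$ = nGcomp, so that $\sigma_0^2 = 0$ corresponds to the point mass at zero and the largest variance is $v\,(2^{(K-1)/K} - 1)^2$.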

The data are generated from a liability threshold model, such that the explanatory variables explain only a given fraction hg2 of the response variable. $\mathbf{X}$ is centered and scaled such that each variable $j$ has variance $\mathrm{var}(\mathbf{x}_j) = 1$.
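
In equations, the liability model implemented below is

$$
\ell_n = \mathbf{x}_n^{\mathsf{T}}\boldsymbol{\beta} + \epsilon_n, \quad \epsilon_n \sim \mathcal{N}(0,\, 1 - h_g^2),
\qquad
y_n = \mathbb{1}\big[\ell_n \ge \Phi^{-1}(1 - \mathrm{prevalence})\big],
$$

where the coefficients are rescaled so that $\sum_j \beta_j^2 = h_g^2$; since the columns of $\mathbf{X}$ are standardized, the explanatory variables explain roughly a fraction $h_g^2$ of the unit-variance liability.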

# collapse-hide

def standardize(X):
    Xnorm = (X - np.mean(X, axis = 0)) 
    Xstd =  Xnorm / np.std(Xnorm, axis = 0)
    return Xstd

def logistic_data(X, beta):
    Xbeta = np.dot(X, beta)
    pred = 1 / (1 + np.exp(-Xbeta))
    Y = np.random.binomial(1, pred)
    return Y, pred

def liability_model(X, beta, prevalence = 0.5):
    hg2 = np.sum(np.square(beta))
    eff = np.dot(X, beta)
    rand_std = np.sqrt(1 - hg2)
    rand_eff = np.random.normal(0, rand_std, X.shape[0])
    liability = eff + rand_eff
    cutoff = stats.norm.ppf(1 - prevalence)
    cases = np.where(liability >= cutoff)[0]
    ctrls = np.where(liability <  cutoff)[0]
    Y = np.zeros(X.shape[0])
    Y[cases] = 1
    return Y, liability

def tjur_R2(X, Y, beta):
    Xbeta = np.dot(X, beta)
    pred = 1 / (1 + np.exp(-Xbeta))
    R2 = np.mean(pred[Y == 1]) - np.mean(pred[Y == 0])
    return R2

def gini(x):
    n = len(x)
    xabs = abs(x)
    xsort = xabs[np.argsort(xabs)]
    t1 = xsort / np.sum(xabs)
    t2 = (n - np.arange(n) - 1 + 0.5) / n
    gini = 1 - 2 * np.sum(t1 * t2)
    return gini

def simulate_data(nsample, nvar, ncausal, ntest, hg2):
    X = np.random.rand(nsample * nvar).reshape(nsample, nvar)
    X = standardize(X)
    Xtest = np.random.rand(ntest * nvar).reshape(ntest, nvar)
    Xtest = standardize(Xtest)

    beta = np.zeros(nvar)
    causal_idx = np.random.choice(nvar, ncausal, replace = False)
    beta[causal_idx] = np.random.normal(loc = 0., scale = 1., size = ncausal)
    beta = beta * np.sqrt(hg2 / np.sum(np.square(beta)))

    Y, Y_liab = liability_model(X, beta)
    Ytest, Ytest_liab = liability_model(Xtest, beta)
    #Y, Y_liab = logistic_data(X, beta)
    #Ytest, Ytest_liab = logistic_data(Xtest, beta)
    
    return X, Y, beta, Xtest, Ytest, Y_liab, Ytest_liab

Let us have a look at the generated data.

# collapse-hide

X =          [None for n in nsample]
Xtest =      [None for n in nsample]
Y =          [None for n in nsample]
Ytest =      [None for n in nsample]
Y_liab =     [None for n in nsample]
Ytest_liab = [None for n in nsample]
beta =       [None for n in nsample]

for i, n in enumerate(nsample):
    X[i], Y[i], beta[i], Xtest[i], Ytest[i], Y_liab[i], Ytest_liab[i] \
        = simulate_data(n, nvar[i], ncausal, ntest, hg2)

print(f'There are {ncausal} non-zero coefficients with heritability {hg2:.4f}')

fig = plt.figure(figsize = (12,6))
ax = [fig.add_subplot(121),
      fig.add_subplot(122)]

for i, n in enumerate(nsample):
    Xbeta = np.dot(X[i], beta[i])
    pred = 1 / (1 + np.exp(-Xbeta))
    ax[i].scatter(Xbeta, Y[i], s = 10)
    ax[i].scatter(Xbeta, pred, s = 10)

    ax[i].set_xlabel(r'$\sum_i X_{ni} \beta_i$')
    ax[i].set_ylabel(r'$Y_n$')
    
plt.tight_layout()
plt.show()
There are 5 non-zero coefficients with heritability 0.8000

Implementation of quasi-Laplace with variational approximation

See the LaTeX notes for details of the theory. The current implementation focuses on readability and ease of debugging; computational efficiency has been ignored.

Warning: The ELBO decreases after some parameter updates.

There must be an error in the theory or the implementation. Debug!
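
For reference, the quantities computed by the code below are the regularized log-likelihood, its mode $\mathbf{M}$, and the Gaussian (quasi-Laplace) approximation of the regularized posterior with precision $\mathbf{S}^{-1}$:

$$
\mathcal{L}_{\mathrm{reg}}(\boldsymbol{\beta}) = \sum_n \Big[ y_n\,\mathbf{x}_n^{\mathsf{T}}\boldsymbol{\beta} - \log\big(1 + e^{\mathbf{x}_n^{\mathsf{T}}\boldsymbol{\beta}}\big) \Big] - \frac{1}{2}\sum_j L_j \beta_j^2,
\qquad
\mathbf{M} = \arg\max_{\boldsymbol{\beta}}\, \mathcal{L}_{\mathrm{reg}}(\boldsymbol{\beta}),
$$

$$
\mathbf{S}^{-1} = \mathbf{X}^{\mathsf{T}}\mathbf{W}\mathbf{X} + \mathrm{diag}(\mathbf{L}),
\qquad
\mathbf{W} = \mathrm{diag}\big(p_n (1 - p_n)\big), \quad p_n = \sigma(\mathbf{x}_n^{\mathsf{T}}\mathbf{M}),
$$

where $\mathbf{L}$ is the vector of per-variable regularizer precisions. See the notes for the derivation of the ELBO that the variational updates maximize.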

# collapse-hide

def regularized_log_likelihood(beta, X, Y, L):
    nvar = beta.shape[0]
    Xbeta = np.dot(X, beta)

    ## Function: Bernoulli log-likelihood plus Gaussian regularizer -0.5 * sum_j L_j * beta_j^2
    llb = np.dot(Y, Xbeta) - np.sum(np.log(1 + np.exp(Xbeta)))
    reg = - 0.5 * np.dot(np.square(beta), L)
    loglik = llb + reg
    
    ## Gradient
    pred = 1 / (1 + np.exp(-Xbeta))
    der = np.einsum('i,ij->j', (Y - pred), X) - np.multiply(beta, L)
    
    return -loglik, -der

def precisionLL(X, beta, L):
    nvar = X.shape[1]
    Xbeta = np.einsum('i,ji->j', beta, X)
    pred = 1 / (1 + np.exp(-Xbeta))
    Sinv = np.einsum('i,i,ij,ik->jk', pred, (1 - pred), X, X)
    Sinv[np.diag_indices(nvar)] += L
    return Sinv

def get_regularized_mode(X, Y, beta0, L):
    nsample, nvar = X.shape
    args  = X, Y, L
    gmode = optimize.minimize(regularized_log_likelihood,
                              beta0,
                              args=args,
                              method='L-BFGS-B',
                              jac=True,
                              bounds=None,
                              options={'maxiter': 20000000,
                                       'maxfun': 20000000,
                                       'ftol': 1e-9,
                                       'gtol': 1e-9
                                       #'disp': True
                                      })
    return gmode.x

def get_elbo_qL(alpha, mu, s2, pi, sigmak2, XPX, SinvM, M, L, logdetS, debug = False):
    nvar, nGcomp = alpha.shape
    Ebeta = np.sum(np.multiply(alpha, mu)[:, 1:], axis = 1)
    mu2 = np.square(mu)
    t1 = - 0.5 * np.dot(Ebeta.T, np.dot(XPX, Ebeta))
    t2 = np.dot(Ebeta.T, SinvM)
    t3 = 0
    t4 = 0
    for j in range(nvar):
        Ebeta2j = 0
        for k in range(nGcomp):
            mom2jk = mu2[j, k] + s2[j, k]
            Ebeta2j += alpha[j, k] * mom2jk
            t4jk = 0
            if alpha[j, k] != 0:
                t4jk += np.log(pi[k])
                t4jk += - np.log(alpha[j, k])
                if k > 0:
                    t4jk += 0.5 * np.log(s2[j, k])
                    t4jk += - 0.5 * np.log(sigmak2[k])
                    t4jk += - 0.5 * mom2jk / sigmak2[k]
                    t4jk += 0.5
                t4jk *= alpha[j, k]
            t4 += t4jk
        t3 += 0.5 * XPX[j, j] * (Ebeta[j] * Ebeta[j] - Ebeta2j)
    logdetL = np.sum(np.log(L))
    t0 = - 0.5 * (logdetL + logdetS + np.dot(M.T, SinvM))
    elbo = t0 + t1 + t2 + t3 + t4
    if debug:
        print(t0, t1 + t2, t3, t4)
    return elbo

#def get_elbo(alpha, mu, s2, pi, sigmak2, XPX, SinvM, M, L, logdetS):
def get_elbo(alpha, mu, s2, logpi, sigmak2, XPX, SinvM, M, L, logdetS):
    nvar, nGcomp = alpha.shape
    Ebeta = np.sum(np.multiply(alpha, mu)[:, 1:], axis = 1)
    mu2 = np.square(mu)
    logdetL = np.sum(np.log(L))
    t0 = - 0.5 * (logdetL + logdetS + np.dot(M.T, SinvM))
    t1 = - 0.5 * np.dot(Ebeta.T, np.dot(XPX, Ebeta))
    t2 = np.dot(Ebeta.T, SinvM)
    t3 = 0
    t4 = 0
    for j in range(nvar):
        Ebeta2j = 0
        for k in range(nGcomp):
            mom2jk = mu2[j, k] + s2[j, k]
            Ebeta2j += alpha[j, k] * mom2jk
            t4jk = 0
            if alpha[j, k] != 0:
                #t4jk += np.log(pi[k])
                t4jk += logpi[k]
                t4jk += - np.log(alpha[j, k])
                if k > 0:
                    t4jk += 0.5 * np.log(s2[j, k])
                    t4jk += - 0.5 * np.log(sigmak2[k])
                    t4jk += - 0.5 * mom2jk / sigmak2[k]
                    t4jk += 0.5
                t4jk *= alpha[j, k]
            t4 += t4jk
        t3 += 0.5 * XPX[j, j] * (Ebeta[j] * Ebeta[j] - Ebeta2j)
    t5 = np.sum(np.multiply(- np.sum(alpha, axis = 0), logpi))
    elbo = t0 + t1 + t2 + t3 + t4 + t5
    return elbo

def ash_qL_regularizer(X, Y, binit, L, mode_optim = True):
    nsample, nvar = X.shape
    if mode_optim:
        M = get_regularized_mode(X, Y, binit, L)
    else:
        M = binit.copy()
    Sinv = precisionLL(X, M, L)
    S = np.linalg.inv(Sinv)
    sgndetS, logdetS = np.linalg.slogdet(S)
    XPX = Sinv.copy()
    XPX[np.diag_indices(nvar)] -= L
    return M, S, Sinv, logdetS, XPX


def ash_qL_pi_init(nGcomp, pi0, sparse = True):
    '''
    By default, pi is initialized with sparsity, i.e. pi[0] > pi[j], for j > 0
    '''
    pi = np.zeros(nGcomp)
    pi[0] = pi0
    pi[1:(nGcomp - 1)] = (1 - pi0) / (nGcomp - 1)
    pi[nGcomp - 1] = 1 - np.sum(pi)
    if not sparse:
        pi = np.repeat(1 / nGcomp, nGcomp)
    return pi


def ash_logistic_qL(X, Y, nGcomp, sigmak2, binit, Linit, 
                    a_dirich = None,
                    maxiter = 1000, tol = 1e-10, pi0 = 0.8, sparse_init = True,
                    reg_step = 1, reg_tol = 0.1,
                    debug = True, debug_step = 10):
    
    '''
    Current focus is on readability and debugging.
    '''
    
    nsample, nvar = X.shape
    
    # default dirichlet prior on pi
    if a_dirich is None:
        a_dirich = np.repeat(1., nGcomp)
        a_dirich[0] = 1. # soft max at pi[0]
    
    '''
    Initialize
    '''
    L = Linit.copy()
    M, S, Sinv, logdetS, XPX = ash_qL_regularizer(X, Y, binit, L)
    SinvM = np.dot(Sinv, M)
    SinvRtj = np.dot(Sinv, M) - np.multiply(np.diag(Sinv), M)
    
    pi = ash_qL_pi_init(nGcomp, pi0, sparse = sparse_init)
    logpi = np.log(pi)
    alpha = np.repeat(np.array([pi]), nvar, axis = 0)
    logalpha = np.log(alpha)
    s2 = np.zeros((nvar, nGcomp))
    s2[:, 1:] = 1 / (np.diag(XPX).reshape(-1, 1) + 1 / sigmak2[1:])
    mu = np.multiply(s2, (SinvM - SinvRtj).reshape(-1,1))
    R = np.sum(np.multiply(alpha, mu)[:, 1:], axis = 1)
    
    '''
    Iteration
    '''
    elbo_old = get_elbo(alpha, mu, s2, logpi, sigmak2, XPX, SinvM, M, L, logdetS)
    pi_old = pi.copy()
    pi_reg = pi.copy()
    
    if debug:
        print(f"Starting ELBO: {elbo_old:.4f} Pi0: {pi[0]:.4f}")
        

    for it in range(maxiter):
        
        # E-step
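        # Coordinate ascent over variables j. SinvRtj is the off-diagonal term
        # sum_{i != j} Sinv[i, j] * R[i]. Component k = 0 is the point mass at
        # zero (s2 = mu = 0); each k > 0 gets a Gaussian update with curvature
        # XPX[j, j]. alpha[j, :] is exponentiated with a max-shift and normalized.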
        for j in range(nvar):
            SinvRtj = np.sum(np.multiply(Sinv[:, j], R[:])) - Sinv[j, j] * R[j]
            
            s2[j, 0] = 0.
            mu[j, 0] = 0.
            #logalpha[j, 0] = np.log(pi[0])
            logalpha[j, 0] = logpi[0]
            for k in range(1, nGcomp):
                s2[j, k] = 1 / (XPX[j, j] + (1 / sigmak2[k]))
                mu[j, k] = s2[j, k] * (SinvM[j] - SinvRtj)
                ssf = np.log(s2[j, k] / sigmak2[k])
                ssr = mu[j, k] * mu[j, k] / s2[j, k]
                logalpha[j, k] = logpi[k] + 0.5 * ssf + 0.5 * ssr
                #logalpha[j, k] = np.log(pi[k]) + 0.5 * ssf + 0.5 * ssr
            
            maxL = np.max(logalpha[j, :])
            alpha[j, :]  = np.exp(logalpha[j, :] - maxL)
            alpha[j, :] /= np.sum(alpha[j, :])
            R[j] = np.sum(np.multiply(alpha[j, 1:], mu[j, 1:]))
            
        
        # M-step
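        # Update of the mixture weights under the Dirichlet prior: bk are the
        # expected counts, pi is the posterior mean of the weights, and
        # logpi = E[log pi_k] = digamma(bk) - digamma(sum(bk)).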
        bk = np.sum(alpha, axis = 0) + a_dirich
        pi = bk / np.sum(bk)
        logpi = digamma(bk) - digamma(np.sum(bk))
        #for k in range(nGcomp):
        #    pi[k] = np.sum(alpha[:, k]) / nvar
            
        # Conditional M-step
        if (it + 1) % reg_step == 0:
            eps_reg = max([abs(x - y) for x, y in zip(pi, pi_reg)])
            if eps_reg > reg_tol:
                print(f"Recalculating regularizer, step {it}")            
                L = 1 / np.sum(alpha * sigmak2, axis = 1)
                M, S, Sinv, logdetS, XPX = ash_qL_regularizer(X, Y, R, L)
                SinvM = np.dot(Sinv, M)
                pi_reg = pi.copy()

        elbo = get_elbo(alpha, mu, s2, logpi, sigmak2, XPX, SinvM, M, L, logdetS)
        eps_elbo = elbo - elbo_old
        eps_pi = max([abs(x - y) for x, y in zip(pi, pi_old)])
        
        if debug and (it % debug_step == 0):
            print(f"Iteration {it}. ELBO: {elbo:.4f} Pi0: {pi[0]:.4f} epsilon: {eps_pi:.4f}, {eps_elbo:.4f}")
        
        '''
        Check termination, continue otherwise.
        '''
        if eps_pi < tol:
            print(eps_pi)
            break
 
        '''
        Keep this step in memory.
        Use deep copy to avoid in place substitutions.
        '''
        pi_old    = pi.copy()
        alpha_old = alpha.copy()
        mu_old    = mu.copy()
        s2_old    = s2.copy()
        R_old     = R.copy()
        elbo_old  = elbo
        
    return pi, alpha, mu, s2, R, L
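
A note on the outputs: the variational posterior for each coefficient is the mixture $q(\beta_j) = \sum_k \alpha_{jk}\, \mathcal{N}(\mu_{jk}, s_{jk}^2)$, with the $k = 0$ component being the point mass at zero. The returned R holds the posterior means $\mathbb{E}[\beta_j] = \sum_{k \ge 1} \alpha_{jk} \mu_{jk}$, which are used as the QL-Ash coefficient estimates in the evaluation below.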

Hold the result summary statistics in a data frame

# collapse-hide 

res_df = [pd.DataFrame(columns = ['Method', 'RMSE', 'Tjur R2', 'Gini Index']) for n in nsample]

data_Xbeta = [np.dot(Xtest[i], beta[i])                              for i in range(len(nsample))]
data_tjur  = [tjur_R2(Xtest[i], Ytest[i], beta[i])                   for i in range(len(nsample))]
data_rmse  = [np.sqrt(np.mean(np.square(data_Xbeta[i] - 
                                        np.dot(Xtest[i], beta[i])))) for i in range(len(nsample))]
data_gini  = [gini(beta[i])                                          for i in range(len(nsample))]

for i in range(len(nsample)):
    res_df[i] = res_df[i].append(pd.DataFrame([['Data', data_rmse[i], data_tjur[i], data_gini[i]]], 
                                              columns = res_df[i].columns))

Fitting

  • Lasso

# collapse-hide

clf = [linear_model.LogisticRegressionCV(cv = 5, penalty='l1', solver='saga', 
                                        fit_intercept = False, max_iter = 5000) for n in nsample]

for i, n in enumerate(nsample):
    clf[i].fit(X[i], Y[i])

clf_Xbeta = [np.dot(Xtest[i], clf[i].coef_[0])                      for i in range(len(nsample))]
clf_tjur  = [tjur_R2(Xtest[i], Ytest[i], clf[i].coef_[0])           for i in range(len(nsample))]
clf_rmse  = [np.sqrt(np.mean(np.square(clf_Xbeta[i] - 
                                       np.dot(Xtest[i], beta[i])))) for i in range(len(nsample))]
clf_gini  = [gini(clf[i].coef_[0])                                  for i in range(len(nsample))]

for i in range(len(nsample)):
    res_df[i] = res_df[i].append(pd.DataFrame([['LASSO', clf_rmse[i], clf_tjur[i], clf_gini[i]]], 
                                              columns = res_df[i].columns))
  • QL Ash

# collapse-hide

pi =    [None for n in nsample]
alpha = [None for n in nsample]
mu =    [None for n in nsample]
s2 =    [None for n in nsample]
R =     [None for n in nsample]
L =     [None for n in nsample]

#probk = ash_qL_pi_init(nGcomp, sparsity)
for i in range(len(nsample)):
    gold_reg = np.repeat(1e4, nvar[i])
    for j, b in enumerate(beta[i]):
        if b != 0:
            gold_reg[j] = 1 / prior_variance
    #fix_reg = np.repeat(1 / sigmak2[nGcomp - 1], nvar[i])
    #fix_reg = np.repeat( 1 / np.dot(probk, sigmak2), nvar[i])
    fix_reg = np.repeat(1 / prior_variance, nvar[i])

    binit = np.zeros(nvar[i])
    Linit = fix_reg

    pi[i], alpha[i], mu[i], s2[i], R[i], L[i]  \
                             = ash_logistic_qL(X[i], Y[i], nGcomp, sigmak2, binit, Linit,
                                               tol = 1e-4, debug_step = 1, maxiter = 1000,
                                               reg_step = 10,
                                               pi0 = sparsity, sparse_init = True)

qLash_Xbeta = [np.dot(Xtest[i], R[i])            for i in range(len(nsample))]
qLash_rmse  = [np.sqrt(np.mean(np.square(qLash_Xbeta[i] - np.dot(Xtest[i], beta[i])))) 
                                                 for i in range(len(nsample))]
qLash_r2    = [tjur_R2(Xtest[i], Ytest[i], R[i]) for i in range(len(nsample))]
qLash_gini  = [gini(R[i])                        for i in range(len(nsample))]
for i in range(len(nsample)):
    res_df[i] = res_df[i].append(pd.DataFrame([['qLash', qLash_rmse[i], qLash_r2[i], qLash_gini[i]]], 
                                              columns = res_df[i].columns))
Starting ELBO: 61.1261 Pi0: 0.9500
Iteration 0. ELBO: 57.2035 Pi0: 0.8800 epsilon: 0.0700, -3.9226
Iteration 1. ELBO: 72.2507 Pi0: 0.8516 epsilon: 0.0284, 15.0472
Iteration 2. ELBO: 83.5596 Pi0: 0.8280 epsilon: 0.0236, 11.3089
Iteration 3. ELBO: 92.5864 Pi0: 0.8077 epsilon: 0.0202, 9.0268
Iteration 4. ELBO: 100.0577 Pi0: 0.7900 epsilon: 0.0178, 7.4713
Iteration 5. ELBO: 106.4002 Pi0: 0.7741 epsilon: 0.0159, 6.3425
Iteration 6. ELBO: 111.8875 Pi0: 0.7598 epsilon: 0.0143, 5.4873
Iteration 7. ELBO: 116.7056 Pi0: 0.7467 epsilon: 0.0131, 4.8181
Iteration 8. ELBO: 120.9864 Pi0: 0.7346 epsilon: 0.0121, 4.2808
Recalculating regularizer, step 9
Iteration 9. ELBO: 83.9211 Pi0: 0.7234 epsilon: 0.0112, -37.0653
Iteration 10. ELBO: 89.4515 Pi0: 0.7405 epsilon: 0.0171, 5.5305
Iteration 11. ELBO: 85.5565 Pi0: 0.7501 epsilon: 0.0096, -3.8951
Iteration 12. ELBO: 83.3611 Pi0: 0.7549 epsilon: 0.0048, -2.1954
Iteration 13. ELBO: 82.2490 Pi0: 0.7570 epsilon: 0.0027, -1.1121
Iteration 14. ELBO: 81.7565 Pi0: 0.7575 epsilon: 0.0026, -0.4926
Iteration 15. ELBO: 81.6184 Pi0: 0.7572 epsilon: 0.0024, -0.1380
Iteration 16. ELBO: 81.6874 Pi0: 0.7564 epsilon: 0.0023, 0.0690
Iteration 17. ELBO: 81.8793 Pi0: 0.7553 epsilon: 0.0022, 0.1919
Iteration 18. ELBO: 82.1446 Pi0: 0.7539 epsilon: 0.0021, 0.2653
Iteration 19. ELBO: 82.4530 Pi0: 0.7525 epsilon: 0.0020, 0.3084
Iteration 20. ELBO: 82.7855 Pi0: 0.7511 epsilon: 0.0019, 0.3325
Iteration 21. ELBO: 83.1300 Pi0: 0.7496 epsilon: 0.0018, 0.3445
Iteration 22. ELBO: 83.4785 Pi0: 0.7481 epsilon: 0.0018, 0.3485
Iteration 23. ELBO: 83.8258 Pi0: 0.7466 epsilon: 0.0017, 0.3473
Iteration 24. ELBO: 84.1683 Pi0: 0.7451 epsilon: 0.0016, 0.3425
Iteration 25. ELBO: 84.5038 Pi0: 0.7437 epsilon: 0.0016, 0.3354
Iteration 26. ELBO: 84.8307 Pi0: 0.7423 epsilon: 0.0015, 0.3269
Iteration 27. ELBO: 85.1481 Pi0: 0.7409 epsilon: 0.0015, 0.3174
Iteration 28. ELBO: 85.4555 Pi0: 0.7396 epsilon: 0.0014, 0.3074
Iteration 29. ELBO: 85.7526 Pi0: 0.7383 epsilon: 0.0014, 0.2971
Iteration 30. ELBO: 86.0393 Pi0: 0.7370 epsilon: 0.0013, 0.2867
Iteration 31. ELBO: 86.3157 Pi0: 0.7358 epsilon: 0.0013, 0.2764
Iteration 32. ELBO: 86.5820 Pi0: 0.7346 epsilon: 0.0012, 0.2663
Iteration 33. ELBO: 86.8384 Pi0: 0.7335 epsilon: 0.0012, 0.2564
Iteration 34. ELBO: 87.0851 Pi0: 0.7324 epsilon: 0.0011, 0.2467
Iteration 35. ELBO: 87.3224 Pi0: 0.7313 epsilon: 0.0011, 0.2373
Iteration 36. ELBO: 87.5507 Pi0: 0.7303 epsilon: 0.0011, 0.2283
Iteration 37. ELBO: 87.7703 Pi0: 0.7293 epsilon: 0.0010, 0.2196
Iteration 38. ELBO: 87.9815 Pi0: 0.7283 epsilon: 0.0010, 0.2112
Iteration 39. ELBO: 88.1845 Pi0: 0.7274 epsilon: 0.0009, 0.2031
Iteration 40. ELBO: 88.3798 Pi0: 0.7265 epsilon: 0.0009, 0.1953
Iteration 41. ELBO: 88.5677 Pi0: 0.7256 epsilon: 0.0009, 0.1878
Iteration 42. ELBO: 88.7483 Pi0: 0.7248 epsilon: 0.0009, 0.1807
Iteration 43. ELBO: 88.9221 Pi0: 0.7239 epsilon: 0.0008, 0.1738
Iteration 44. ELBO: 89.0893 Pi0: 0.7232 epsilon: 0.0008, 0.1672
Iteration 45. ELBO: 89.2502 Pi0: 0.7224 epsilon: 0.0008, 0.1609
Iteration 46. ELBO: 89.4049 Pi0: 0.7217 epsilon: 0.0007, 0.1548
Iteration 47. ELBO: 89.5539 Pi0: 0.7210 epsilon: 0.0007, 0.1490
Iteration 48. ELBO: 89.6973 Pi0: 0.7203 epsilon: 0.0007, 0.1434
Iteration 49. ELBO: 89.8353 Pi0: 0.7196 epsilon: 0.0007, 0.1380
Iteration 50. ELBO: 89.9682 Pi0: 0.7190 epsilon: 0.0006, 0.1329
Iteration 51. ELBO: 90.0962 Pi0: 0.7183 epsilon: 0.0006, 0.1280
Iteration 52. ELBO: 90.2194 Pi0: 0.7177 epsilon: 0.0006, 0.1232
Iteration 53. ELBO: 90.3381 Pi0: 0.7172 epsilon: 0.0006, 0.1187
Iteration 54. ELBO: 90.4524 Pi0: 0.7166 epsilon: 0.0006, 0.1143
Iteration 55. ELBO: 90.5625 Pi0: 0.7161 epsilon: 0.0005, 0.1101
Iteration 56. ELBO: 90.6686 Pi0: 0.7155 epsilon: 0.0005, 0.1061
Iteration 57. ELBO: 90.7709 Pi0: 0.7150 epsilon: 0.0005, 0.1023
Iteration 58. ELBO: 90.8694 Pi0: 0.7146 epsilon: 0.0005, 0.0985
Iteration 59. ELBO: 90.9644 Pi0: 0.7141 epsilon: 0.0005, 0.0950
Iteration 60. ELBO: 91.0559 Pi0: 0.7136 epsilon: 0.0005, 0.0916
Iteration 61. ELBO: 91.1442 Pi0: 0.7132 epsilon: 0.0004, 0.0883
Iteration 62. ELBO: 91.2293 Pi0: 0.7128 epsilon: 0.0004, 0.0851
Iteration 63. ELBO: 91.3114 Pi0: 0.7123 epsilon: 0.0004, 0.0821
Iteration 64. ELBO: 91.3905 Pi0: 0.7120 epsilon: 0.0004, 0.0791
Iteration 65. ELBO: 91.4668 Pi0: 0.7116 epsilon: 0.0004, 0.0763
Iteration 66. ELBO: 91.5404 Pi0: 0.7112 epsilon: 0.0004, 0.0736
Iteration 67. ELBO: 91.6115 Pi0: 0.7108 epsilon: 0.0004, 0.0710
Iteration 68. ELBO: 91.6800 Pi0: 0.7105 epsilon: 0.0003, 0.0685
Iteration 69. ELBO: 91.7460 Pi0: 0.7102 epsilon: 0.0003, 0.0661
Iteration 70. ELBO: 91.8098 Pi0: 0.7098 epsilon: 0.0003, 0.0638
Iteration 71. ELBO: 91.8714 Pi0: 0.7095 epsilon: 0.0003, 0.0615
Iteration 72. ELBO: 91.9307 Pi0: 0.7092 epsilon: 0.0003, 0.0594
Iteration 73. ELBO: 91.9881 Pi0: 0.7089 epsilon: 0.0003, 0.0573
Iteration 74. ELBO: 92.0434 Pi0: 0.7086 epsilon: 0.0003, 0.0553
Iteration 75. ELBO: 92.0968 Pi0: 0.7084 epsilon: 0.0003, 0.0534
Iteration 76. ELBO: 92.1483 Pi0: 0.7081 epsilon: 0.0003, 0.0515
Iteration 77. ELBO: 92.1981 Pi0: 0.7078 epsilon: 0.0003, 0.0498
Iteration 78. ELBO: 92.2461 Pi0: 0.7076 epsilon: 0.0002, 0.0480
Iteration 79. ELBO: 92.2925 Pi0: 0.7074 epsilon: 0.0002, 0.0464
Iteration 80. ELBO: 92.3373 Pi0: 0.7071 epsilon: 0.0002, 0.0448
Iteration 81. ELBO: 92.3805 Pi0: 0.7069 epsilon: 0.0002, 0.0432
Iteration 82. ELBO: 92.4223 Pi0: 0.7067 epsilon: 0.0002, 0.0418
Iteration 83. ELBO: 92.4626 Pi0: 0.7065 epsilon: 0.0002, 0.0403
Iteration 84. ELBO: 92.5015 Pi0: 0.7063 epsilon: 0.0002, 0.0389
Iteration 85. ELBO: 92.5391 Pi0: 0.7061 epsilon: 0.0002, 0.0376
Iteration 86. ELBO: 92.5755 Pi0: 0.7059 epsilon: 0.0002, 0.0363
Iteration 87. ELBO: 92.6106 Pi0: 0.7057 epsilon: 0.0002, 0.0351
Iteration 88. ELBO: 92.6445 Pi0: 0.7055 epsilon: 0.0002, 0.0339
Iteration 89. ELBO: 92.6772 Pi0: 0.7054 epsilon: 0.0002, 0.0327
Iteration 90. ELBO: 92.7088 Pi0: 0.7052 epsilon: 0.0002, 0.0316
Iteration 91. ELBO: 92.7394 Pi0: 0.7050 epsilon: 0.0002, 0.0306
Iteration 92. ELBO: 92.7689 Pi0: 0.7049 epsilon: 0.0002, 0.0295
Iteration 93. ELBO: 92.7974 Pi0: 0.7047 epsilon: 0.0002, 0.0285
Iteration 94. ELBO: 92.8250 Pi0: 0.7046 epsilon: 0.0001, 0.0276
Iteration 95. ELBO: 92.8516 Pi0: 0.7044 epsilon: 0.0001, 0.0266
Iteration 96. ELBO: 92.8774 Pi0: 0.7043 epsilon: 0.0001, 0.0257
Iteration 97. ELBO: 92.9022 Pi0: 0.7042 epsilon: 0.0001, 0.0249
Iteration 98. ELBO: 92.9263 Pi0: 0.7041 epsilon: 0.0001, 0.0240
Iteration 99. ELBO: 92.9495 Pi0: 0.7039 epsilon: 0.0001, 0.0232
Iteration 100. ELBO: 92.9719 Pi0: 0.7038 epsilon: 0.0001, 0.0224
Iteration 101. ELBO: 92.9936 Pi0: 0.7037 epsilon: 0.0001, 0.0217
Iteration 102. ELBO: 93.0146 Pi0: 0.7036 epsilon: 0.0001, 0.0210
Iteration 103. ELBO: 93.0348 Pi0: 0.7035 epsilon: 0.0001, 0.0203
Iteration 104. ELBO: 93.0544 Pi0: 0.7034 epsilon: 0.0001, 0.0196
Iteration 105. ELBO: 93.0733 Pi0: 0.7033 epsilon: 0.0001, 0.0189
Iteration 106. ELBO: 93.0916 Pi0: 0.7032 epsilon: 0.0001, 0.0183
9.718556947829748e-05
Starting ELBO: 105.4349 Pi0: 0.9500
Iteration 0. ELBO: 101.7404 Pi0: 0.8842 epsilon: 0.0658, -3.6945
Iteration 1. ELBO: 112.7592 Pi0: 0.8636 epsilon: 0.0206, 11.0188
Iteration 2. ELBO: 119.9030 Pi0: 0.8489 epsilon: 0.0147, 7.1437
Iteration 3. ELBO: 125.0468 Pi0: 0.8377 epsilon: 0.0113, 5.1438
Iteration 4. ELBO: 129.0120 Pi0: 0.8285 epsilon: 0.0092, 3.9653
Iteration 5. ELBO: 132.2176 Pi0: 0.8207 epsilon: 0.0078, 3.2055
Iteration 6. ELBO: 134.8975 Pi0: 0.8139 epsilon: 0.0068, 2.6800
Iteration 7. ELBO: 137.1931 Pi0: 0.8078 epsilon: 0.0060, 2.2955
Iteration 8. ELBO: 139.1949 Pi0: 0.8024 epsilon: 0.0055, 2.0018
Recalculating regularizer, step 9
Iteration 9. ELBO: 49.3419 Pi0: 0.7974 epsilon: 0.0050, -89.8530
Iteration 10. ELBO: 66.5506 Pi0: 0.7925 epsilon: 0.0049, 17.2087
Iteration 11. ELBO: 69.1194 Pi0: 0.7889 epsilon: 0.0037, 2.5688
Iteration 12. ELBO: 70.1870 Pi0: 0.7866 epsilon: 0.0023, 1.0676
Iteration 13. ELBO: 70.7297 Pi0: 0.7851 epsilon: 0.0015, 0.5427
Iteration 14. ELBO: 71.0743 Pi0: 0.7840 epsilon: 0.0013, 0.3446
Iteration 15. ELBO: 71.3446 Pi0: 0.7831 epsilon: 0.0013, 0.2702
Iteration 16. ELBO: 71.5870 Pi0: 0.7822 epsilon: 0.0012, 0.2424
Iteration 17. ELBO: 71.8172 Pi0: 0.7814 epsilon: 0.0011, 0.2302
Iteration 18. ELBO: 72.0391 Pi0: 0.7806 epsilon: 0.0010, 0.2220
Iteration 19. ELBO: 72.2533 Pi0: 0.7798 epsilon: 0.0009, 0.2141
Iteration 20. ELBO: 72.4589 Pi0: 0.7791 epsilon: 0.0009, 0.2056
Iteration 21. ELBO: 72.6553 Pi0: 0.7783 epsilon: 0.0008, 0.1964
Iteration 22. ELBO: 72.8421 Pi0: 0.7777 epsilon: 0.0008, 0.1869
Iteration 23. ELBO: 73.0192 Pi0: 0.7770 epsilon: 0.0007, 0.1771
Iteration 24. ELBO: 73.1866 Pi0: 0.7764 epsilon: 0.0007, 0.1673
Iteration 25. ELBO: 73.3443 Pi0: 0.7758 epsilon: 0.0006, 0.1578
Iteration 26. ELBO: 73.4928 Pi0: 0.7753 epsilon: 0.0006, 0.1484
Iteration 27. ELBO: 73.6322 Pi0: 0.7748 epsilon: 0.0005, 0.1395
Iteration 28. ELBO: 73.7631 Pi0: 0.7743 epsilon: 0.0005, 0.1309
Iteration 29. ELBO: 73.8859 Pi0: 0.7739 epsilon: 0.0005, 0.1227
Iteration 30. ELBO: 74.0009 Pi0: 0.7734 epsilon: 0.0004, 0.1150
Iteration 31. ELBO: 74.1085 Pi0: 0.7730 epsilon: 0.0004, 0.1077
Iteration 32. ELBO: 74.2093 Pi0: 0.7727 epsilon: 0.0004, 0.1008
Iteration 33. ELBO: 74.3035 Pi0: 0.7723 epsilon: 0.0004, 0.0943
Iteration 34. ELBO: 74.3917 Pi0: 0.7720 epsilon: 0.0003, 0.0881
Iteration 35. ELBO: 74.4741 Pi0: 0.7717 epsilon: 0.0003, 0.0824
Iteration 36. ELBO: 74.5511 Pi0: 0.7714 epsilon: 0.0003, 0.0770
Iteration 37. ELBO: 74.6231 Pi0: 0.7711 epsilon: 0.0003, 0.0720
Iteration 38. ELBO: 74.6904 Pi0: 0.7709 epsilon: 0.0003, 0.0673
Iteration 39. ELBO: 74.7532 Pi0: 0.7707 epsilon: 0.0002, 0.0629
Iteration 40. ELBO: 74.8120 Pi0: 0.7704 epsilon: 0.0002, 0.0587
Iteration 41. ELBO: 74.8669 Pi0: 0.7702 epsilon: 0.0002, 0.0549
Iteration 42. ELBO: 74.9182 Pi0: 0.7700 epsilon: 0.0002, 0.0513
Iteration 43. ELBO: 74.9661 Pi0: 0.7699 epsilon: 0.0002, 0.0479
Iteration 44. ELBO: 75.0108 Pi0: 0.7697 epsilon: 0.0002, 0.0448
Iteration 45. ELBO: 75.0527 Pi0: 0.7695 epsilon: 0.0002, 0.0418
Iteration 46. ELBO: 75.0917 Pi0: 0.7694 epsilon: 0.0001, 0.0391
Iteration 47. ELBO: 75.1282 Pi0: 0.7693 epsilon: 0.0001, 0.0365
Iteration 48. ELBO: 75.1624 Pi0: 0.7691 epsilon: 0.0001, 0.0341
Iteration 49. ELBO: 75.1942 Pi0: 0.7690 epsilon: 0.0001, 0.0319
Iteration 50. ELBO: 75.2240 Pi0: 0.7689 epsilon: 0.0001, 0.0298
Iteration 51. ELBO: 75.2518 Pi0: 0.7688 epsilon: 0.0001, 0.0278
Iteration 52. ELBO: 75.2778 Pi0: 0.7687 epsilon: 0.0001, 0.0260
Iteration 53. ELBO: 75.3021 Pi0: 0.7686 epsilon: 0.0001, 0.0243
9.355958308392143e-05

Results

i = 0
print (f'N = {nsample[i]}, P = {nvar[i]}')
res_df[i]
N = 50, P = 100
  Method      RMSE   Tjur R2  Gini Index
0   Data  0.000000  0.295451    0.964241
0  LASSO  0.814003  0.413014    0.943921
0  qLash  0.531422  0.275941    0.669638
i = 1
print (f'N = {nsample[i]}, P = {nvar[i]}')
res_df[i]
N = 200, P = 100
  Method      RMSE   Tjur R2  Gini Index
0   Data  0.000000  0.307515    0.971575
0  LASSO  2.593213  0.601840    0.870948
0  qLash  2.556943  0.628894    0.872833

# collapse-hide

fig = plt.figure(figsize = (12, 7))
ax = [fig.add_subplot(121),
      fig.add_subplot(122)]

for i, n in enumerate(nsample):
    whichX = Xtest[i].copy()
    whichY = Ytest[i].copy()

    Xbeta = np.dot(whichX, beta[i])
    pred_data = 1 / (1 + np.exp(-Xbeta))
    ax[i].scatter(Xbeta, whichY, s = 1)

    Xbeta_clf = np.dot(whichX, clf[i].coef_[0])
    pred_clf = 1 / (1 + np.exp(-Xbeta_clf))
    ax[i].scatter(Xbeta, pred_clf, s = 1, label = 'Lasso')

    Xbeta_qL = np.dot(whichX, R[i])
    pred_qL = 1 / (1 + np.exp(-Xbeta_qL))
    ax[i].scatter(Xbeta, pred_qL, s = 1, label = 'qLash')

    leg = ax[i].legend(markerscale = 10)

    ax[i].set_xlabel(r'$\sum_i X_{ni} \beta_i$')
    ax[i].set_ylabel(r'$p_n$')
    ax[i].set_title(f'N = {nsample[i]}, P = {nvar[i]}', pad = 20)
    
plt.tight_layout()
plt.show()

1. Tjur, T. (2009). Coefficients of Determination in Logistic Regression Models - A New Proposal: The Coefficient of Discrimination. The American Statistician, 63(4), 366-372. DOI: 10.1198/tast.2009.08210

2. For a review of sparsity indices, see Hurley and Rickard (2009). Comparing Measures of Sparsity. arXiv:0811.4706v2