Title: | Shrinkage for Effect Estimation |
---|---|
Description: | Computes shrinkage estimators for regression problems. Selects penalty parameter by minimizing bias and variance in the effect estimate, where bias and variance are estimated from the posterior predictive distribution. See Keller and Rice (2017) <doi:10.1093/aje/kwx225> for more details. |
Authors: | Joshua Keller |
Maintainer: | Joshua Keller <[email protected]> |
License: | GPL (>=2) |
Version: | 0.1.2 |
Built: | 2025-03-08 02:44:36 UTC |
Source: | https://github.com/kpkeller/eshrink |
Computes shrinkage estimators for regression problems. Selects penalty parameter by minimizing bias and variance in the effect estimate, where bias and variance are estimated from the posterior predictive distribution. See Keller and Rice (2017) <doi:10.1093/aje/kwx225> for more details.
Compute confidence intervals by 'inverting the test' to determine if a given value should lie in the confidence region.
check_CIbound( testBeta, obsEst, type = c("ridge", "lasso"), postParam, lambdaseq, X, nPost, ind = 1, Bstar = 100, B = 500, loss = "fMBV", lowerBound = TRUE, reproducible = TRUE, alpha = 0.025, returnDist = FALSE, ... ) invertTest( interval, type = "ridge", lower.interval = interval, upper.interval = interval, ..., tol = 0.005, fulldetail = FALSE )
check_CIbound( testBeta, obsEst, type = c("ridge", "lasso"), postParam, lambdaseq, X, nPost, ind = 1, Bstar = 100, B = 500, loss = "fMBV", lowerBound = TRUE, reproducible = TRUE, alpha = 0.025, returnDist = FALSE, ... ) invertTest( interval, type = "ridge", lower.interval = interval, upper.interval = interval, ..., tol = 0.005, fulldetail = FALSE )
testBeta |
Candidate value of beta to test. |
obsEst |
Estimate of beta from the observed data for which a confidence interval is desired |
type |
String indicating "ridge" or "LASSO". |
postParam |
List of parameters for the posterior distribution of beta. See |
lambdaseq |
Sequence of penalty values |
X |
deisgn matrix |
nPost |
Number of posterior samples to use. |
ind |
Index of parameter to test. Defaults to 1. |
Bstar |
Number of estimators to compute for comparison distribution. Larger values improve the precision of the procedure but increase computational cost. |
B |
Passed to |
loss |
Either |
lowerBound |
Logical indicating that the test is for a lower bound |
reproducible |
Should the simulated datasets be reproducible? |
alpha |
Percentile of the distribution to compare against. See details. |
returnDist |
If TRUE, then distribution of estimates generated is returned
instead of comparison against |
... |
In |
interval |
Interval to check. Used for both upper and lower bound, if they are not provided |
lower.interval , upper.interval
|
Bounding intervals over which to check for lower and upper endpoints of CI |
tol |
Passed to |
fulldetail |
If TRUE, then output from |
This function is used as part of an 'inverting the test' approach to generate confidence intervals for estimators from festRidge
. Bstar
datasets are generated from slices of the posterior distribution of the model parameters where beta (or other parameter indicated by ind
) is fixed at the value testBeta
. For each dataset, beta is estimated via festRidge
or festLASSO
, and the resulting distribution of estimators is compared against the estimate from the observed data (obsEst
).
The values of lambdaseq
, X
, nPost
, and loss
are passed to festRidge
or festLASSO
and typically match the values that were used to compute obsEst
.
The computational cost of this function is most affected by the values of
nPost
and Bstar
. Large values of the latter are important for
adequately representing the distribution of parameter estimates. In some
settings, nPost
can be reduced without substantially impacting
the results. However, each dataset is likely to be different.
Joshua Keller
Computes a vector of regression coefficients for a provided ridge penalty.
estRidge(lambda, X, y, penalize, XtX = crossprod(X))
estRidge(lambda, X, y, penalize, XtX = crossprod(X))
lambda |
ridge penalty factor |
X |
design matrix for the regression. |
y |
outcome vector. Unless |
penalize |
vector giving penalty structure. Values must be in [0, 1]. See Details for more information. |
XtX |
(optional) cross product of the design matrix. If running simulations or
other procedure for identical |
The input penalize
is a vector of ridge penalty factors,
such that the penalty for covariate j is lambda*penalize[j]
.
Although its primary purpose is for indicating which variables to penalize (1)
and which to not penalize (0), fractional values between 0 and 1 are accepted.
Defaults to c(0, rep(1, p-1)), where
p is number of columns in X (this penalizes all coefficients but
the first).
The design matrix X
is assumed to contain only numeric values, so
any factors should be coded according to desired contrast (e.g., via model.matrix
)
Joshua Keller
Computes a ridge or LASSO estimate for a given regression problem, with penalty parameter chosen to minimize bias and variance.
festLASSO( X, y, loss = c("fMSE", "fMBV", "both"), ind = 1, lseq, B = 500, penalize, rescale.lambda = TRUE, scale = FALSE, returnMSE = FALSE, postsamp, returnPS = FALSE, nPost = 1000, se.version = c("varExp", "full", "none"), ... ) festRidge( X, y, loss = c("fMSE", "fMBV", "both"), ind = 1, lseq, penalize, scale = FALSE, returnMSE = FALSE, postsamp, returnPS = FALSE, nPost = 1000, se.version = c("varExp", "full", "none"), XtXlamIinv = NULL, ... )
festLASSO( X, y, loss = c("fMSE", "fMBV", "both"), ind = 1, lseq, B = 500, penalize, rescale.lambda = TRUE, scale = FALSE, returnMSE = FALSE, postsamp, returnPS = FALSE, nPost = 1000, se.version = c("varExp", "full", "none"), ... ) festRidge( X, y, loss = c("fMSE", "fMBV", "both"), ind = 1, lseq, penalize, scale = FALSE, returnMSE = FALSE, postsamp, returnPS = FALSE, nPost = 1000, se.version = c("varExp", "full", "none"), XtXlamIinv = NULL, ... )
X |
Design matrix for the regression. Assumed to contain only numeric values, so
any factors should be coded according to desired contrast (e.g., via |
y |
Outcome vector. Unless |
loss |
Loss function for choosing the penalty parameter. See details. |
ind |
Vector of integers or logicals indicating which coefficients the loss is to be computed on. |
lseq |
Sequence of penalty values to consider. |
B |
Number of future datasets to simulate for each point in posterior sample. |
penalize |
See |
rescale.lambda |
If |
scale |
Logical indicating whether the design matrix X be scaled. See details. |
returnMSE |
Logical indicating whether mse object should be returned. |
postsamp |
List containing posterior sample (from |
returnPS |
logical indicating whether or not the full posterior sample should be included in output. |
nPost |
Size of posterior sample to compute |
se.version |
String indicating which version of standard errors to use. See |
... |
Other arguments passed to |
XtXlamIinv |
explicit value of (X'X + diag(penalty))^-1. Useful for simulations to save computation. |
The value of the ridge or LASSO penalty is selected by minimizing the
posterior expectation of the loss function, which is chosen by the argument
loss
. Possible options are fMBV
, which uses the loss function
and
fMSE
, which uses the loss function
.
To balance the influence of covariates, it is recommended
that the design matrix be standardized. This can be done by
the user or via the argument scale
. If scale=TRUE
,
then coefficient and standard error estimates are back-transformed.
Use the XtXlamIinv
argument with caution. No checks are done on the provided
value. Note that lseq
is re-ordered to be decreasing, and provided values
of XtXlamIinv
must account for this.
mseRidge,vcovfestRidge, simLASSO, check_CIbound
Computes the analytic mean-squared error (MSE), bias, and
variance for ridge regression estimators given different
values of the true beta
and sigma2
parameters.
mseRidge(lambda, XtX, beta, sigma2, penalize, ind = 1, XtXlamIinv = NULL) biasRidge(lambda, XtX, beta, penalize, ind = 1, XtXlamIinv = NULL) varRidge(lambda, XtX, sigma2 = 1, penalize, ind = 1, XtXlamIinv = NULL)
mseRidge(lambda, XtX, beta, sigma2, penalize, ind = 1, XtXlamIinv = NULL) biasRidge(lambda, XtX, beta, penalize, ind = 1, XtXlamIinv = NULL) varRidge(lambda, XtX, sigma2 = 1, penalize, ind = 1, XtXlamIinv = NULL)
lambda |
penalty parameter value. For |
XtX |
Cross product of design matrix. Not needed if |
beta |
True parameter values. Either a vector of length |
sigma2 |
Value of the variance parameter |
penalize |
Vector of penalty factors. See |
ind |
Numerical or logical vector indicating which elements of the bias vector and variance matrix should be returned. Defaults to the first element. |
XtXlamIinv |
Optional explicit value of |
The computations assume that all covariates are correctly included in the mean model and bias is due only to penalization. The bias is given by:
where . The variance is given by:
If beta
is provided as a matrix, this will treat
each column of beta
as a different true parameter vector
and return a matrix of bias values (or a vector, if ind
has length 1).
Providing a pre-computed value of XtXlamIinv
can reduce the computational
cost in simulations. However, the user is responsible for assuring that the value
of lambda
provided matches the value used to compute XtXlamIinv
.
For mseRidge
, a list containing the variance, bias, and MSE. For biasRidge
and varRidge
, a matrix is returned.
Joshua Keller
Draws a sample from the posterior distribution of parameters from a Bayesian Linear regression model.
samplePosterior( X, y, n, a0 = 1, b0 = 5e-05, v0inv = 1/1000, mu0 = 0, returnParams = TRUE, intercept = FALSE )
samplePosterior( X, y, n, a0 = 1, b0 = 5e-05, v0inv = 1/1000, mu0 = 0, returnParams = TRUE, intercept = FALSE )
X |
Design matrix of size |
y |
Outcome variable |
n |
Size of posterior sample to be computed. A value of 0 is accepted. |
a0 , b0
|
Hyperparameters (shape, rate) for inverse gamma distribution of the error variance. |
v0inv |
Prior precision for the error term. Either a single value
to be repeated in a diagonal precision matrix, or a |
mu0 |
Prior mean. Either a single value that will be repeated,
or a vector of length |
returnParams |
Logical indicating whether the parameters of the posterior distribution are returned. |
intercept |
Logical indicating whether an intercept is included in the model.
If |
This function draws a sample from the posterior distributions of the coefficient parameter () and error variance parameter (
) from a Bayesian linear regression model. The variance parameter is assumed to follow an inverse-gamma distribution. Conditional on the error variance parameter and a specified precision matrix, the coefficient parameters (
) are multivariate normal.
A list containing the following elements:
sigma2 |
A vector containing the posterior sample of |
beta |
Matrix containing the posterior sample of |
postMu |
Vector containing the posterior mean (if |
postV |
Matrix giving the posterior mean (if |
an , bn
|
Posterior hyperparameters for the inverse gamma
distribution of the error variance (if |
Joshua Keller
x <- rnorm(40, mean=2, sd=2) y <- x + rnorm(40, sd=1) samplePosterior(X=x, y=y, n=10) samplePosterior(X=cbind(1, x), y=y, n=10, intercept=TRUE) samplePosterior(X=cbind(1, x), y=y, n=0, mu=c(3, 3), intercept=TRUE)
x <- rnorm(40, mean=2, sd=2) y <- x + rnorm(40, sd=1) samplePosterior(X=x, y=y, n=10) samplePosterior(X=cbind(1, x), y=y, n=10, intercept=TRUE) samplePosterior(X=cbind(1, x), y=y, n=0, mu=c(3, 3), intercept=TRUE)
Simulates data from a regression model and computes the lasso estimate for this data.
simLASSO(lambda, X, beta, sigma, penalize, rescale.lambda = TRUE, ind = 1)
simLASSO(lambda, X, beta, sigma, penalize, rescale.lambda = TRUE, ind = 1)
lambda |
Penalty factor to be applied |
X |
Design matrix of regression problem |
beta |
true value of parameter vector to simulate from |
sigma |
true value of square root of variance parameter for simulating. |
penalize |
Vector giving penalty structure. Supplied to glmnet as ' |
rescale.lambda |
Should lambda be rescaled to account for the default re-scaling done by glmnet? |
ind |
Index of coefficient to be returned. Value of 0 implies all coefficients (i.e. the full parameter vector estimate) |
Simulates data from a regression model with true
coefficient parameter beta
and normal errors with
standard deviation sigma
. Computes the LASSO
estimate for the coefficient vector using the glmnet
function from the package of the same name.
Joshua Keller
Computes an estimate of the variance-covariance matrix for an 'fLoss' ridge estimator
vcovfestRidge( fLoss, lambda, XtX, postBeta, postSigma2, penalize, ind = 1, version = c("varExp", "full") )
vcovfestRidge( fLoss, lambda, XtX, postBeta, postSigma2, penalize, ind = 1, version = c("varExp", "full") )
fLoss |
A matrix of loss function values to be minimized. Assumed structure is the columns correspond to different values of penalty parameter and rows correspond to points in a posterior sample of (beta, sigma). |
lambda |
The sequence of penalty parameter values
corresponding to the columns of |
XtX |
Cross product of the design matrix. |
postBeta |
Matrix containing the posterior sample of beta values. Assumed to be n-by-p, where n is number of samples (and equal to number of rows in fLoss) and p is number of regression parameters in model. |
postSigma2 |
Vector containing the posterior sample of variance parameters. Should have same length as postBeta. |
penalize |
Vector indicating which variables are
penalized in the regression model. See details for
|
ind |
Numerical or logical vector indicating which elements of the variance matrix should be returned. Defaults to the (1,1) element |
version |
Character string indicating which version of standard error to compute. 'varExp' or 'full', corresponding to the variance of the conditional mean of the estimator or that plus the expected value of the conditional variance. In practice, the latter is often too large. |
Computes a standard error estimate for an 'fLoss' estimator, where 'fLoss' is typically fMSE or fMBV. Approximates the variance of the estimator using the the variance conditional on the observed data (i.e. using the posterior distribution of parameters). Currently, two different versions are available.
Joshua Keller