Penalized multiple regression (Migrated from community.research.microsoft.com)

  • Question

  • patwa posted on 08-30-2010 7:34 AM

    A hot topic in bioinformatics is penalized multiple regression where the number of explanatory variables is much larger than the number of observations (p >> n). I wonder whether it would be feasible to construct a model within Infer that could handle a data set where p is on the order of 1 million and n on the order of 10 000. Let's say that we use a simple multiple regression model, i.e. y_i = my + sum_j(X_ij*b_j) + e_i. Here, y is a vector of Gaussian data, my is the mean, which could be given an uninformative prior, X_ij is an indicator (-1, 0 or 1) for variable j and observation i, and e is Gaussian noise. What I want is a prior over b_j that allows for some kind of shrinkage. One approach often used is Gibbs sampling in combination with a mixture prior (Stochastic Search Variable Selection), but this is not computationally feasible for the sizes mentioned here. Another popular approach is the LASSO, which has both frequentist and Bayesian versions. Any recommendations would be welcome.

    Friday, June 3, 2011 6:01 PM
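
To make the model in the question concrete, here is a minimal toy-scale sketch of the frequentist LASSO fitted by cyclic coordinate descent in plain NumPy. It does not use Infer and says nothing about feasibility at p = 1 million. The synthetic data, the sizes (n = 200, p = 1000), the penalty `lam`, and the function names `soft_threshold` and `lasso_cd` are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of the L1 penalty: shrink z toward zero by t.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=50):
    """Minimise (1/(2n)) * ||y - my - X b||^2 + lam * ||b||_1
    by cyclic coordinate descent. The intercept my is not penalised."""
    n, p = X.shape
    b = np.zeros(p)
    my = y.mean()
    r = y - my                       # residual for b = 0
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue             # an all-zero column carries no information
            # Correlation of column j with the partial residual (b_j removed).
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            b_new = soft_threshold(rho, lam) / col_sq[j]
            if b_new != b[j]:
                r -= X[:, j] * (b_new - b[j])
                b[j] = b_new
        # Re-centre the residual, which updates the unpenalised intercept.
        shift = r.mean()
        my += shift
        r -= shift
    return my, b

# Synthetic data (assumed, for illustration only): indicators in {-1, 0, 1},
# five truly non-zero effects, Gaussian noise.
rng = np.random.default_rng(0)
n, p = 200, 1000
X = rng.integers(-1, 2, size=(n, p)).astype(float)
true_idx = rng.choice(p, size=5, replace=False)
b_true = np.zeros(p)
b_true[true_idx] = 2.0
y = 1.5 + X @ b_true + 0.5 * rng.standard_normal(n)

my_hat, b_hat = lasso_cd(X, y, lam=0.3)
selected = np.flatnonzero(b_hat)
print("intercept estimate:", my_hat)
print("number of non-zero coefficients:", selected.size)
```

The L1 penalty sets most b_j exactly to zero, which is the kind of shrinkage asked about. In practice `lam` would be chosen by cross-validation or a similar criterion rather than fixed by hand as it is here.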


All replies