Asked by:
Model selection in linear regression
Question

What is the most straightforward way to perform model selection in a simple linear regression coded as follows (I'm primarily interested in beta1):
Variable<double> beta0 = Variable.GaussianFromMeanAndVariance(0.0,sigsqBeta).Named("beta0");
Variable<double> beta1 = Variable.GaussianFromMeanAndVariance(0.0,sigsqBeta).Named("beta1");
Variable<double> tau = Variable.GammaFromShapeAndScale(A,1/B).Named("tau");
Range index = new Range(n).Named("index");
VariableArray<double> y = Variable.Array<double>(index).Named("y");
VariableArray<double> x = Variable.Array<double>(index).Named("x");
VariableArray<double> mu = Variable.Array<double>(index).Named("mu");
mu[index] = beta0 + beta1*x[index];
y[index] = Variable.GaussianFromMeanAndPrecision(mu[index],tau);
Thursday, June 9, 2011 9:15 AM
All replies

I'm assuming your goal to determine the hyperparameters (in your case, just sigsqBeta). If so, you can do this either using model evidence, or by splitting your data into training and validation sets. Then vary the hyperparameter. See also the final section of this example.
Model evidence can also be used in general for selection across different model structures, provided the likelihood function and the data stay the same.
Thursday, June 9, 2011 9:59 AMOwner 
My goal is to compare y = beta0 vs. y = beta0 + beta1. Is there some way to calculate Bayes Factor for the model above?Thursday, June 9, 2011 2:22 PM

The simplest way to compute a Bayes Factor for two models of the same data is to create a boolean variable that selects between the models, and put Variable.If around each model. The posterior distribution of the selector gives the Bayes Factor. See the Clinical trial tutorial for an example of this.Thursday, June 9, 2011 3:53 PM

I'm afraid I cannot follow. What variable in the Clinical trial exemple correspond to a Bayes factor?Friday, June 10, 2011 9:43 AM

In the Clinical trial example, the variable isEffective selects between the models. So its posterior distribution gives the Bayes factor. This posterior is written to the console in the example.Friday, June 10, 2011 10:50 AM

For example:
using System; using System.Collections.Generic; using System.Linq; using System.Text; using MicrosoftResearch.Infer.Models; using MicrosoftResearch.Infer; namespace patwa1 { class Program { static void Main(string[] args) { double sigsqBeta = 1.0; double[] xobs = { 1.0, 2.0, 3.0 }; double[] yobs = { 1.0, 2.0, 3.0 }; int n = xobs.Length; double A = 1.0; double B = 1.0; Range index = new Range(n).Named("index"); Variable<bool> model1 = Variable.Bernoulli(0.5).Named("model1"); VariableArray<double> y = Variable.Array<double>(index).Named("y"); VariableArray<double> x = Variable.Array<double>(index).Named("x"); VariableArray<double> mu = Variable.Array<double>(index).Named("mu"); using (Variable.If(model1)) { var beta0 = Variable.GaussianFromMeanAndVariance(0.0, sigsqBeta).Named("beta01"); var beta1 = Variable.GaussianFromMeanAndVariance(0.0, sigsqBeta).Named("beta11"); var tau = Variable.GammaFromShapeAndScale(A, 1 / B).Named("tau1"); mu[index] = beta0 + beta1 * x[index]; y[index] = Variable.GaussianFromMeanAndPrecision(mu[index], tau); } using (Variable.IfNot(model1)) { var beta0 = Variable.GaussianFromMeanAndVariance(0.0, sigsqBeta).Named("beta02"); var tau = Variable.GammaFromShapeAndScale(A, 1 / B).Named("tau2"); mu[index] = beta0.ForEach(index); y[index] = Variable.GaussianFromMeanAndPrecision(mu[index], tau); } x.ObservedValue = xobs; y.ObservedValue = yobs; var engine = new InferenceEngine(); Console.WriteLine(engine.Infer(model1)); } } }
Friday, June 10, 2011 11:05 AMOwner 
Great, many thanks.Friday, June 10, 2011 12:12 PM

Isn't this approach more of an indicator variable selection (related to stochastic search variable selection George, E. I. and McCulloch, R. E. (1993) instead of Bayes factor since the Bayes factor isn't restricted to the 01 interval?
Monday, June 13, 2011 12:54 PM 
The Bayes factor is the logodds of the inferred condition variable:
Console.WriteLine(engine.Infer<Bernoulli>(model1).LogOdds);
Monday, June 13, 2011 4:49 PMOwner 
I'm afraid that this yields negative estimates, Bayes factor should always be positive. Or am I wrong?
Wednesday, June 15, 2011 1:02 PM 
Sorry  this is the log of what you want. Just use Math.Exp(logBayesFactor) as in the user guideWednesday, June 15, 2011 1:14 PMOwner

OK, thanks. One more question. If I want to repeat this model on several data sets, would it be possible to parallelize it?Friday, June 17, 2011 9:52 AM

The main mechanism to do this is to take the generated inference class and add it to your project. You can then instantiate it as many times as you want. This is described in http://research.microsoft.com/infernet/docs/Using%20a%20precompiled%20inference%20algorithm.aspx.
Alternatively, you can get the compiled inference alogorithm (described in http://research.microsoft.com/infernet/docs/Controlling%20how%20inference%20is%20performed.aspx), get its type using GetType(), and then use Activator.CreateInstance() to create new instances of the generated class.
Tuesday, June 21, 2011 10:15 AMOwner 
When I run a simplified variant of this model
using System; using System.Collections.Generic; using System.Linq; using System.Text; using MicrosoftResearch.Infer.Models; using MicrosoftResearch.Infer.Factors; using MicrosoftResearch.Infer.Distributions; using MicrosoftResearch.Infer; namespace InferNetP2LinearRegression { class Program { static void Main(string[] args) { double[] xobs = { 0.2, 0.3, 0.8, 1.3, 1.8, 2.3, 2.8, 3.3, 3.8, 4.3, 4.8, 5.3}; double[] yobs = { 5.074476033, 6.600718815, 4.884130877, 4.417261879, 3.381936761, 3.97316699, 3.990442347, 4.120380425, 6.295349392, 2.835300298, 2.842412922, 3.007296355}; int n = xobs.Length; double A = 1.0; double B = 1.0; Range index = new Range(n).Named("index"); VariableArray<double> y = Variable.Array<double>(index).Named("y"); VariableArray<double> x = Variable.Array<double>(index).Named("x"); VariableArray<double> mu = Variable.Array<double>(index).Named("mu"); Variable<double> beta0 = Variable.GaussianFromMeanAndVariance(0.0, 10).Named("beta0"); Variable<double> beta1 = Variable.GaussianFromMeanAndVariance(0.0, 10).Named("beta1"); Variable<double> tau = Variable.GammaFromShapeAndScale(A, 1 / B).Named("tau"); mu[index] = beta0 + beta1 * x[index]; y[index] = Variable.GaussianFromMeanAndPrecision(mu[index], tau); x.ObservedValue = xobs; y.ObservedValue = yobs; var engine = new InferenceEngine(); engine.Algorithm = new ExpectationPropagation(); Console.WriteLine("beta0=" + engine.Infer(beta0)); Console.WriteLine("beta1=" + engine.Infer(beta1)); Console.WriteLine("tau=" + engine.Infer(tau)); Console.ReadLine(); } } }
With VariationalMessagePassing the program works fine, but with ExpectationPropagation I get an error: . Is this an expected behavior? What is the cause of this difference?Wednesday, June 22, 2011 10:10 PM 
See the post at http://social.microsoft.com/Forums/enUS/infer.net/thread/ca430d98ec3841d9a6e31cec061d4a7a and the links therein. In this case, you can for example make the prior tighter for beta (A=30, B=30) or perhaps reorder the data to be not so structured.
John
Thursday, June 23, 2011 9:56 AMOwner 
It is expected that EP will fail to converge on some problems. However, for this particular example, it is possible to get EP to converge by using a different update schedule. If you look at the generated code or set engine.ShowSchedule=true, you will see that it is updating (beta0,beta1,tau) in parallel, leading to poor results. For the next version of Infer.NET, we have a new scheduler that avoids parallel updates, and on this example it chooses a better schedule that does converge. Using Infer.NET version 2.4, one way to change the schedule is to add dummy constraints. The following dummy constraint seems to work for this example:
Variable.ConstrainEqualRandom(mu[index], Gaussian.Uniform());
This constraint doesn't change the probabilistic model mathematically but it does change the update schedule just enough to get EP to converge. A similar scheduling issue (and fix) appeared on a previous thread.
Thursday, June 23, 2011 2:11 PM