locked
Model selection in linear regression RRS feed

  • Question

  • What is the most straightforward way to perform model selection in a simple linear regression coded as follows (I'm primarily interested in beta1):

    Variable<double> beta0 = Variable.GaussianFromMeanAndVariance(0.0,sigsqBeta).Named("beta0");

    Variable<double> beta1 = Variable.GaussianFromMeanAndVariance(0.0,sigsqBeta).Named("beta1");

    Variable<double> tau = Variable.GammaFromShapeAndScale(A,1/B).Named("tau");

     

    Range index = new Range(n).Named("index");

    VariableArray<double> y = Variable.Array<double>(index).Named("y");

    VariableArray<double> x = Variable.Array<double>(index).Named("x");

    VariableArray<double> mu = Variable.Array<double>(index).Named("mu");

    mu[index] = beta0 + beta1*x[index];

    y[index] = Variable.GaussianFromMeanAndPrecision(mu[index],tau);

     

     

     

     

    Thursday, June 9, 2011 9:15 AM

All replies

  • I'm assuming your goal to determine the hyperparameters (in your case, just sigsqBeta). If so, you can do this either using model evidence, or by splitting your data into training and validation sets. Then vary the hyperparameter. See also the final section of this example.

    Model evidence can also be used in general for selection across different model structures, provided the likelihood function and the data stay the same.

    Thursday, June 9, 2011 9:59 AM
    Owner
  • My goal is to compare y = beta0 vs. y = beta0 + beta1. Is there some way to calculate Bayes Factor for the model above?
    Thursday, June 9, 2011 2:22 PM
  • The simplest way to compute a Bayes Factor for two models of the same data is to create a boolean variable that selects between the models, and put Variable.If around each model.  The posterior distribution of the selector gives the Bayes Factor.  See the Clinical trial tutorial for an example of this.
    Thursday, June 9, 2011 3:53 PM
  • I'm afraid I cannot follow. What variable in the Clinical trial exemple correspond to a Bayes factor?
    Friday, June 10, 2011 9:43 AM
  • In the Clinical trial example, the variable isEffective selects between the models.  So its posterior distribution gives the Bayes factor.  This posterior is written to the console in the example.
    Friday, June 10, 2011 10:50 AM
  • For example:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using MicrosoftResearch.Infer.Models;
    using MicrosoftResearch.Infer;
    
    namespace patwa1
    {
      class Program
      {
        static void Main(string[] args)
        {
          double sigsqBeta = 1.0;
          double[] xobs = { 1.0, 2.0, 3.0 };
          double[] yobs = { 1.0, 2.0, 3.0 };
          int n = xobs.Length;
          double A = 1.0;
          double B = 1.0;
    
          Range index = new Range(n).Named("index");
          Variable<bool> model1 = Variable.Bernoulli(0.5).Named("model1");
    
          VariableArray<double> y = Variable.Array<double>(index).Named("y");
          VariableArray<double> x = Variable.Array<double>(index).Named("x");
          VariableArray<double> mu = Variable.Array<double>(index).Named("mu");
    
          using (Variable.If(model1))
          {
            var beta0 = Variable.GaussianFromMeanAndVariance(0.0, sigsqBeta).Named("beta01");
            var beta1 = Variable.GaussianFromMeanAndVariance(0.0, sigsqBeta).Named("beta11");
            var tau = Variable.GammaFromShapeAndScale(A, 1 / B).Named("tau1");
            mu[index] = beta0 + beta1 * x[index];
            y[index] = Variable.GaussianFromMeanAndPrecision(mu[index], tau);
          }
          using (Variable.IfNot(model1))
          {
            var beta0 = Variable.GaussianFromMeanAndVariance(0.0, sigsqBeta).Named("beta02");
            var tau = Variable.GammaFromShapeAndScale(A, 1 / B).Named("tau2");
            mu[index] = beta0.ForEach(index);
            y[index] = Variable.GaussianFromMeanAndPrecision(mu[index], tau);
          }
    
          x.ObservedValue = xobs;
          y.ObservedValue = yobs;
    
          var engine = new InferenceEngine();
          Console.WriteLine(engine.Infer(model1));
        }
      }
    }
    

     

    Friday, June 10, 2011 11:05 AM
    Owner
  • Great, many thanks.
    Friday, June 10, 2011 12:12 PM
  • Isn't this approach more of an indicator variable selection (related to stochastic search variable selection George, E. I. and McCulloch, R. E. (1993) instead of Bayes factor since the Bayes factor isn't restricted to the 0-1 interval?


    Monday, June 13, 2011 12:54 PM
  • The Bayes factor is the log-odds of the inferred condition variable:

    Console.WriteLine(engine.Infer<Bernoulli>(model1).LogOdds);
    
    Monday, June 13, 2011 4:49 PM
    Owner
  • I'm afraid that this yields negative estimates, Bayes factor should always be positive. Or am I wrong?

    Wednesday, June 15, 2011 1:02 PM
  • Sorry - this is the log of what you want. Just use Math.Exp(logBayesFactor) as in the user guide
    Wednesday, June 15, 2011 1:14 PM
    Owner
  • OK, thanks. One more question. If I want to repeat this model on several data sets, would it be possible to parallelize it? 
    Friday, June 17, 2011 9:52 AM
  • The main mechanism to do this is to take the generated inference class and add it to your project. You can then instantiate it as many times as you want. This is described in http://research.microsoft.com/infernet/docs/Using%20a%20precompiled%20inference%20algorithm.aspx.

    Alternatively, you can get the compiled inference alogorithm (described in http://research.microsoft.com/infernet/docs/Controlling%20how%20inference%20is%20performed.aspx), get its type using GetType(), and then use Activator.CreateInstance() to create new instances of the generated class.

     

    Tuesday, June 21, 2011 10:15 AM
    Owner
  • When I run a simplified variant of this model

     

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    
    using MicrosoftResearch.Infer.Models;
    using MicrosoftResearch.Infer.Factors;
    using MicrosoftResearch.Infer.Distributions;
    using MicrosoftResearch.Infer;
    
    namespace InferNetP2LinearRegression
    {
      class Program
      {
        static void Main(string[] args)
        {
          double[] xobs = { -0.2,
                    0.3,
                    0.8,
                    1.3,
                    1.8,
                    2.3,
                    2.8,
                    3.3,
                    3.8,
                    4.3,
                    4.8,
                    5.3};
          double[] yobs = {  5.074476033,
                    6.600718815,
                    4.884130877,
                    4.417261879,
                    3.381936761,
                    3.97316699,
                    3.990442347,
                    4.120380425,
                    6.295349392,
                    2.835300298,
                    2.842412922,
                    3.007296355};
          int n = xobs.Length;
          double A = 1.0;
          double B = 1.0;
    
    
    
    
    
          Range index = new Range(n).Named("index");
    
          VariableArray<double> y = Variable.Array<double>(index).Named("y");
          VariableArray<double> x = Variable.Array<double>(index).Named("x");
          VariableArray<double> mu = Variable.Array<double>(index).Named("mu");
    
    
          Variable<double> beta0 = Variable.GaussianFromMeanAndVariance(0.0, 10).Named("beta0");
          Variable<double> beta1 = Variable.GaussianFromMeanAndVariance(0.0, 10).Named("beta1");
          Variable<double> tau = Variable.GammaFromShapeAndScale(A, 1 / B).Named("tau");
          mu[index] = beta0 + beta1 * x[index];
    
          y[index] = Variable.GaussianFromMeanAndPrecision(mu[index], tau);
    
          x.ObservedValue = xobs;
          y.ObservedValue = yobs;
    
          var engine = new InferenceEngine();
          engine.Algorithm = new ExpectationPropagation();
    
          Console.WriteLine("beta0=" + engine.Infer(beta0));
          Console.WriteLine("beta1=" + engine.Infer(beta1));
          Console.WriteLine("tau=" + engine.Infer(tau));
          Console.ReadLine();
        }
      }
    }
    

    With VariationalMessagePassing the program works fine, but with ExpectationPropagation I get an error: . Is this an expected behavior? What is the cause of this difference?

    Wednesday, June 22, 2011 10:10 PM
  • See the post at http://social.microsoft.com/Forums/en-US/infer.net/thread/ca430d98-ec38-41d9-a6e3-1cec061d4a7a and the links therein. In this case, you can for example make the prior tighter for beta (A=30, B=30) or perhaps reorder the data to be not so structured.

    John

    Thursday, June 23, 2011 9:56 AM
    Owner
  • It is expected that EP will fail to converge on some problems.  However, for this particular example, it is possible to get EP to converge by using a different update schedule.  If you look at the generated code or set engine.ShowSchedule=true, you will see that it is updating (beta0,beta1,tau) in parallel, leading to poor results.  For the next version of Infer.NET, we have a new scheduler that avoids parallel updates, and on this example it chooses a better schedule that does converge.  Using Infer.NET version 2.4, one way to change the schedule is to add dummy constraints.  The following dummy constraint seems to work for this example:
    Variable.ConstrainEqualRandom(mu[index], Gaussian.Uniform());
    This constraint doesn't change the probabilistic model mathematically but it does change the update schedule just enough to get EP to converge.  A similar scheduling issue (and fix) appeared on a previous thread.



    Thursday, June 23, 2011 2:11 PM