Model selection in linear regression

问题 Model selection in linear regression

  • 2011年6月9日 9:15
     
     

    What is the most straightforward way to perform model selection in a simple linear regression coded as follows (I'm primarily interested in beta1):

    Variable<double> beta0 = Variable.GaussianFromMeanAndVariance(0.0,sigsqBeta).Named("beta0");

    Variable<double> beta1 = Variable.GaussianFromMeanAndVariance(0.0,sigsqBeta).Named("beta1");

    Variable<double> tau = Variable.GammaFromShapeAndScale(A,1/B).Named("tau");

     

    Range index = new Range(n).Named("index");

    VariableArray<double> y = Variable.Array<double>(index).Named("y");

    VariableArray<double> x = Variable.Array<double>(index).Named("x");

    VariableArray<double> mu = Variable.Array<double>(index).Named("mu");

    mu[index] = beta0 + beta1*x[index];

    y[index] = Variable.GaussianFromMeanAndPrecision(mu[index],tau);

     

     

     

     

全部回复

  • 2011年6月9日 9:59
    所有者:
     
     

    I'm assuming your goal to determine the hyperparameters (in your case, just sigsqBeta). If so, you can do this either using model evidence, or by splitting your data into training and validation sets. Then vary the hyperparameter. See also the final section of this example.

    Model evidence can also be used in general for selection across different model structures, provided the likelihood function and the data stay the same.

  • 2011年6月9日 14:22
     
     
    My goal is to compare y = beta0 vs. y = beta0 + beta1. Is there some way to calculate Bayes Factor for the model above?
  • 2011年6月9日 15:53
     
     
    The simplest way to compute a Bayes Factor for two models of the same data is to create a boolean variable that selects between the models, and put Variable.If around each model.  The posterior distribution of the selector gives the Bayes Factor.  See the Clinical trial tutorial for an example of this.
  • 2011年6月10日 9:43
     
     
    I'm afraid I cannot follow. What variable in the Clinical trial exemple correspond to a Bayes factor?
  • 2011年6月10日 10:50
     
     
    In the Clinical trial example, the variable isEffective selects between the models.  So its posterior distribution gives the Bayes factor.  This posterior is written to the console in the example.
  • 2011年6月10日 11:05
    所有者:
     
      包含代码

    For example:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using MicrosoftResearch.Infer.Models;
    using MicrosoftResearch.Infer;
    
    namespace patwa1
    {
      class Program
      {
        static void Main(string[] args)
        {
          double sigsqBeta = 1.0;
          double[] xobs = { 1.0, 2.0, 3.0 };
          double[] yobs = { 1.0, 2.0, 3.0 };
          int n = xobs.Length;
          double A = 1.0;
          double B = 1.0;
    
          Range index = new Range(n).Named("index");
          Variable<bool> model1 = Variable.Bernoulli(0.5).Named("model1");
    
          VariableArray<double> y = Variable.Array<double>(index).Named("y");
          VariableArray<double> x = Variable.Array<double>(index).Named("x");
          VariableArray<double> mu = Variable.Array<double>(index).Named("mu");
    
          using (Variable.If(model1))
          {
            var beta0 = Variable.GaussianFromMeanAndVariance(0.0, sigsqBeta).Named("beta01");
            var beta1 = Variable.GaussianFromMeanAndVariance(0.0, sigsqBeta).Named("beta11");
            var tau = Variable.GammaFromShapeAndScale(A, 1 / B).Named("tau1");
            mu[index] = beta0 + beta1 * x[index];
            y[index] = Variable.GaussianFromMeanAndPrecision(mu[index], tau);
          }
          using (Variable.IfNot(model1))
          {
            var beta0 = Variable.GaussianFromMeanAndVariance(0.0, sigsqBeta).Named("beta02");
            var tau = Variable.GammaFromShapeAndScale(A, 1 / B).Named("tau2");
            mu[index] = beta0.ForEach(index);
            y[index] = Variable.GaussianFromMeanAndPrecision(mu[index], tau);
          }
    
          x.ObservedValue = xobs;
          y.ObservedValue = yobs;
    
          var engine = new InferenceEngine();
          Console.WriteLine(engine.Infer(model1));
        }
      }
    }
    

     

  • 2011年6月10日 12:12
     
     
    Great, many thanks.
  • 2011年6月13日 12:54
     
     

    Isn't this approach more of an indicator variable selection (related to stochastic search variable selection George, E. I. and McCulloch, R. E. (1993) instead of Bayes factor since the Bayes factor isn't restricted to the 0-1 interval?


  • 2011年6月13日 16:49
    所有者:
     
      包含代码

    The Bayes factor is the log-odds of the inferred condition variable:

    Console.WriteLine(engine.Infer<Bernoulli>(model1).LogOdds);
    
  • 2011年6月15日 13:02
     
     

    I'm afraid that this yields negative estimates, Bayes factor should always be positive. Or am I wrong?

  • 2011年6月15日 13:14
    所有者:
     
     
    Sorry - this is the log of what you want. Just use Math.Exp(logBayesFactor) as in the user guide
  • 2011年6月17日 9:52
     
     
    OK, thanks. One more question. If I want to repeat this model on several data sets, would it be possible to parallelize it? 
  • 2011年6月21日 10:15
    所有者:
     
     

    The main mechanism to do this is to take the generated inference class and add it to your project. You can then instantiate it as many times as you want. This is described in http://research.microsoft.com/infernet/docs/Using%20a%20precompiled%20inference%20algorithm.aspx.

    Alternatively, you can get the compiled inference alogorithm (described in http://research.microsoft.com/infernet/docs/Controlling%20how%20inference%20is%20performed.aspx), get its type using GetType(), and then use Activator.CreateInstance() to create new instances of the generated class.

     

  • 2011年6月22日 22:10
     
      包含代码

    When I run a simplified variant of this model

     

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    
    using MicrosoftResearch.Infer.Models;
    using MicrosoftResearch.Infer.Factors;
    using MicrosoftResearch.Infer.Distributions;
    using MicrosoftResearch.Infer;
    
    namespace InferNetP2LinearRegression
    {
      class Program
      {
        static void Main(string[] args)
        {
          double[] xobs = { -0.2,
                    0.3,
                    0.8,
                    1.3,
                    1.8,
                    2.3,
                    2.8,
                    3.3,
                    3.8,
                    4.3,
                    4.8,
                    5.3};
          double[] yobs = {  5.074476033,
                    6.600718815,
                    4.884130877,
                    4.417261879,
                    3.381936761,
                    3.97316699,
                    3.990442347,
                    4.120380425,
                    6.295349392,
                    2.835300298,
                    2.842412922,
                    3.007296355};
          int n = xobs.Length;
          double A = 1.0;
          double B = 1.0;
    
    
    
    
    
          Range index = new Range(n).Named("index");
    
          VariableArray<double> y = Variable.Array<double>(index).Named("y");
          VariableArray<double> x = Variable.Array<double>(index).Named("x");
          VariableArray<double> mu = Variable.Array<double>(index).Named("mu");
    
    
          Variable<double> beta0 = Variable.GaussianFromMeanAndVariance(0.0, 10).Named("beta0");
          Variable<double> beta1 = Variable.GaussianFromMeanAndVariance(0.0, 10).Named("beta1");
          Variable<double> tau = Variable.GammaFromShapeAndScale(A, 1 / B).Named("tau");
          mu[index] = beta0 + beta1 * x[index];
    
          y[index] = Variable.GaussianFromMeanAndPrecision(mu[index], tau);
    
          x.ObservedValue = xobs;
          y.ObservedValue = yobs;
    
          var engine = new InferenceEngine();
          engine.Algorithm = new ExpectationPropagation();
    
          Console.WriteLine("beta0=" + engine.Infer(beta0));
          Console.WriteLine("beta1=" + engine.Infer(beta1));
          Console.WriteLine("tau=" + engine.Infer(tau));
          Console.ReadLine();
        }
      }
    }
    

    With VariationalMessagePassing the program works fine, but with ExpectationPropagation I get an error: . Is this an expected behavior? What is the cause of this difference?

  • 2011年6月23日 9:56
    所有者:
     
     

    See the post at http://social.microsoft.com/Forums/en-US/infer.net/thread/ca430d98-ec38-41d9-a6e3-1cec061d4a7a and the links therein. In this case, you can for example make the prior tighter for beta (A=30, B=30) or perhaps reorder the data to be not so structured.

    John

  • 2011年6月23日 14:11
     
      包含代码
    It is expected that EP will fail to converge on some problems.  However, for this particular example, it is possible to get EP to converge by using a different update schedule.  If you look at the generated code or set engine.ShowSchedule=true, you will see that it is updating (beta0,beta1,tau) in parallel, leading to poor results.  For the next version of Infer.NET, we have a new scheduler that avoids parallel updates, and on this example it chooses a better schedule that does converge.  Using Infer.NET version 2.4, one way to change the schedule is to add dummy constraints.  The following dummy constraint seems to work for this example:
    Variable.ConstrainEqualRandom(mu[index], Gaussian.Uniform());
    This constraint doesn't change the probabilistic model mathematically but it does change the update schedule just enough to get EP to converge.  A similar scheduling issue (and fix) appeared on a previous thread.