Bayesian Linear Regression (Migrated from community.research.microsoft.com)

    Question

  • jlopes posted on 05-17-2009 10:07 PM

    Hi,

    I've been trying to do some Bayesian linear regression as a first trial, but without much success:

    double[,] data = new double[,] { {1,-3}, {1,-2.1}, {1,-1.3}, {1,0.5}, {1,1.2}, {1,3.3}, {1,4.4}, {1,5.5} };
    Range rows = new Range(data.GetLength(0));
    Range columns = new Range(data.GetLength(1));
    Variable<Matrix> x = Variable.Constant<Matrix>(new Matrix(data)).Named("x");
    Variable<Vector> w = Variable.VectorGaussianFromMeanAndPrecision(new Vector(new double[]{0,0}), PositiveDefiniteMatrix.Identity(2)).Named("w");
    Variable<Vector> yVar = Variable.MatrixTimesVector(x, w).Named("y");
    yVar.ObservedValue = new Vector(30, 45, 40, 80, 70, 100, 130, 110);
    InferenceEngine engine = new InferenceEngine(new VariationalMessagePassing());
    VectorGaussian postW = engine.Infer<VectorGaussian>(w);

     

    I suspect that it doesn't work because Variable.MatrixTimesVector is not implemented yet. What would be the best way to solve this problem: implement MatrixTimesVector, quit because it won't work at all because of..., or try to find a way around it?

     

    Thanks,

    Joao

     

     

    Friday, June 03, 2011 4:54 PM
    Owner

Answers

  • jlopes replied on 05-19-2009 6:58 PM

    Many thanks. That's what I meant to do in the first place, but I misconceived the model.

     

     

    Friday, June 03, 2011 4:54 PM
    Owner

All replies

  • jlopes replied on 05-18-2009 7:31 AM

    I tried to do it as a series of inner products of vectors, but that is also not supported. :)

    Guess I will have no choice?

     

    Friday, June 03, 2011 4:54 PM
    Owner
  • minka replied on 05-19-2009 1:31 AM

    The problem here is not with Infer.NET, but with the Variational Message Passing algorithm and the particular model being used here.  This is not really a standard linear regression model, since normally you would add Gaussian noise before observing the product.  Here there is no noise, and that is the source of the problem.  You are directly observing the product of two variables, which VMP cannot support. It is not a case of Infer.NET being incomplete.  VMP simply does not handle the case when a derived variable is observed.  You will run into this limitation no matter how you rewrite the model.  So, you should either use EP or change the model to have some additional noise.
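
    In symbols, the distinction drawn above can be sketched as follows. The notation is introduced here for illustration only: X is the 8x2 data matrix from the question, x_i is its i-th row, w is the weight vector with the N(0, I) prior from the posted code, and \sigma^2 is an assumed noise variance.

    ```latex
    % Model as posted: the product itself is observed, with no noise term
    w \sim \mathcal{N}(0, I), \qquad y = X w

    % Standard linear regression: Gaussian noise is added before observing
    w \sim \mathcal{N}(0, I), \qquad
    y_i \mid w \sim \mathcal{N}\!\left(x_i^\top w,\ \sigma^2\right), \quad i = 1, \dots, N
    ```

    In the first form y is a deterministic function of w, which is the "derived variable is observed" case described above. In the second form y_i is a random variable with its own Gaussian factor.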

    Friday, June 03, 2011 4:54 PM
    Owner
  • minka replied on 05-19-2009 1:38 AM

    You can get some insight into why VMP breaks down here by reading my paper "Divergence measures and message passing" (http://research.microsoft.com/en-us/um/people/minka/papers/message-passing/).  As shown there, VMP will not represent the posterior distribution but simply pick one possible solution and put all probability mass there.  This happens due to the zero-forcing nature of the divergence being minimized.  Rather than have VMP return degenerate solutions, we opted to have Infer.NET throw an exception in these cases.
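
    For readers who have not seen the paper, the "zero-forcing" remark can be made concrete by writing out the two directions of the KL divergence. This is a sketch in notation assumed here: q(w) is the approximating distribution and p(w | y) is the true posterior.

    ```latex
    % Direction minimized by VMP (exclusive, zero-forcing)
    \mathrm{KL}(q \,\|\, p) = \int q(w) \, \log \frac{q(w)}{p(w \mid y)} \, dw

    % Opposite direction (inclusive), which the paper associates with EP
    \mathrm{KL}(p \,\|\, q) = \int p(w \mid y) \, \log \frac{p(w \mid y)}{q(w)} \, dw
    ```

    The first expression is finite only if q(w) = 0 wherever p(w | y) = 0, so q is forced to zero outside the support of the posterior. When the observation is a hard constraint, that support can be a very thin set, which is consistent with the degenerate solutions described above.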

    Friday, June 03, 2011 4:54 PM
    Owner
  • jwinn replied on 05-19-2009 10:40 AM

    So to build on Tom's answer, here is a solution using Vectors and InnerProduct which adds Gaussian noise:

    // Each input is a 2-element vector: a constant 1.0 (bias term) followed by the feature value.
    Vector[] data = new Vector[] { new Vector( 1.0, -3 ), new Vector( 1.0, -2.1 ), new Vector( 1.0, -1.3 ), new Vector ( 1.0, 0.5 ), new Vector( 1.0, 1.2 ), new Vector( 1.0, 3.3 ), new Vector( 1.0, 4.4 ), new Vector( 1.0, 5.5 ) };
    Range rows = new Range(data.Length);
    VariableArray<Vector> x = Variable.Constant(data, rows).Named("x");
    // Prior over the weights: zero mean, identity precision.
    Variable<Vector> w = Variable.VectorGaussianFromMeanAndPrecision(new Vector(new double[] { 0, 0 }), PositiveDefiniteMatrix.Identity(2)).Named("w");
    VariableArray<double> y = Variable.Array<double>(rows);
    // Gaussian noise with variance 1.0 is added to each inner product before it is observed.
    y[rows] = Variable.GaussianFromMeanAndVariance(Variable.InnerProduct(x[rows], w), 1.0);
    y.ObservedValue = new double[] { 30, 45, 40, 80, 70, 100, 130, 110 };
    InferenceEngine engine = new InferenceEngine(new VariationalMessagePassing());
    VectorGaussian postW = engine.Infer<VectorGaussian>(w);
    Console.WriteLine("Posterior over the weights: "+Environment.NewLine+postW);

    Best,
    John W.
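
    As a cross-check on a model of this kind (prior N(0, I) on the weights, Gaussian noise with variance 1), the posterior over the weights is available in closed form: the precision is I + X^T X and the mean is (I + X^T X)^-1 X^T y. The sketch below computes that in plain C# without Infer.NET. The class and method names (ClosedFormCheck, Posterior) are made up for illustration and are not part of any library, and this is an independent calculation, not a description of what Infer.NET does internally.

    ```csharp
    using System;

    // Closed-form posterior for w under:  w ~ N(0, I),  y_i ~ N(x_i . w, 1).
    // Hypothetical helper for illustration; works for 2-element inputs only.
    public static class ClosedFormCheck
    {
        // Returns { mean0, mean1, cov00, cov01, cov11 }.
        public static double[] Posterior(double[][] x, double[] y)
        {
            // Start from the prior precision (identity), then add X^T X.
            double a = 1.0, b = 0.0, d = 1.0;   // precision = [[a, b], [b, d]]
            double r0 = 0.0, r1 = 0.0;          // X^T y
            for (int i = 0; i < y.Length; i++)
            {
                a += x[i][0] * x[i][0];
                b += x[i][0] * x[i][1];
                d += x[i][1] * x[i][1];
                r0 += x[i][0] * y[i];
                r1 += x[i][1] * y[i];
            }

            // Invert the symmetric 2x2 precision matrix to get the covariance.
            double det = a * d - b * b;
            double c00 = d / det, c01 = -b / det, c11 = a / det;

            // Posterior mean = covariance * X^T y.
            return new[] { c00 * r0 + c01 * r1, c01 * r0 + c11 * r1, c00, c01, c11 };
        }

        public static void Main()
        {
            double[] inputs = { -3, -2.1, -1.3, 0.5, 1.2, 3.3, 4.4, 5.5 };
            double[] y = { 30, 45, 40, 80, 70, 100, 130, 110 };

            // Same layout as the post above: bias 1.0 first, then the feature value.
            var x = new double[inputs.Length][];
            for (int i = 0; i < inputs.Length; i++)
            {
                x[i] = new[] { 1.0, inputs[i] };
            }

            double[] p = Posterior(x, y);
            Console.WriteLine("Posterior mean: ({0:F3}, {1:F3})", p[0], p[1]);
            Console.WriteLine("Posterior covariance: [[{0:F4}, {1:F4}], [{1:F4}, {2:F4}]]", p[2], p[3], p[4]);
        }
    }
    ```

    Because the posterior here is exactly Gaussian, one would expect the postW printed by the Infer.NET program to be close to these values, although the thread itself does not report the numbers.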

    Friday, June 03, 2011 4:54 PM
    Owner
  • Thanks for that example. I've been trying to figure out an elegant way to extend the data initialization to inputs with a very large number of dimensions. My motivation is that I have a CSV file which I load into an array of dimension > Nx4000, and entering such high-dimensional data points manually would take a lot of time. I could not figure out how to use Vector[] for this. I tried to extend the factor analysis and Bayes point machine examples but couldn't make much progress. Any help would be greatly appreciated! :)
    Tuesday, August 23, 2011 11:37 PM
  • Not sure exactly what question you are asking here - this seems like a C# question, which we would not typically address on this forum. But something like this should do the trick:

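    // At the top of the file:
    using System.Collections.Generic;  // List<T>
    using System.Globalization;        // CultureInfo
    using System.IO;                   // StreamReader
    using System.Linq;                 // Select, ToArray

    List<Vector> dataList = new List<Vector>();

    using (StreamReader sr = new StreamReader("myFileName.csv"))
    {
        string str;
        while ((str = sr.ReadLine()) != null)
        {
            // Parse one comma-separated row; the invariant culture makes "." the decimal separator on any machine.
            double[] arr = str.Split(',').Select(s => double.Parse(s, CultureInfo.InvariantCulture)).ToArray();
            dataList.Add(Vector.FromArray(arr));
        }
    }
    Vector[] data = dataList.ToArray();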
    

    John

    Wednesday, August 31, 2011 3:22 PM
    Owner
  • Many thanks for this, it worked. Sorry for posting a rather basic question.
    Thursday, September 08, 2011 11:20 PM
  • Would you tell me how this translates to the latest Infer.NET?

    I tried this:

    double[] input = { -3, -2.1, -1.3, 0.5, 1.2, 3.3, 4.4, 5.5};
    Vector[] data = new Vector[input.Length];
    for (int i = 0; i < data.Length; i++)
    {
        data[i] = Vector.FromArray(input[i]);
    }
    Range rows = new Range(data.Length);
    VariableArray<Vector> x = Variable.Constant(data, rows).Named("x");
    Variable<Vector> w = Variable.VectorGaussianFromMeanAndPrecision(Vector.Zero(2), PositiveDefiniteMatrix.Identity(2)).Named("w");
    VariableArray<double> y = Variable.Array<double>(rows);
    y[rows] = Variable.GaussianFromMeanAndVariance(Variable.InnerProduct(x[rows], w), 1.0);
    y.ObservedValue = new double[] { 30, 45, 40, 80, 70, 100, 130, 110 };
    InferenceEngine engine = new InferenceEngine(new VariationalMessagePassing());
    VectorGaussian postW = engine.Infer<VectorGaussian>(w);
    Console.WriteLine("Posterior over the weights: " + Environment.NewLine + postW);
    

     

    For me this results in: 'MicrosoftResearch.Infer.Utils.AssertFailedException' occurred in Infer.Runtime.dll

    Many thanks,

    Mirko

    Monday, October 03, 2011 2:00 PM
  • Hi Mirko

    Your weights are Vector variables of length 2, but your inputs are vectors of length one, and this throws the runtime exception. For example:

    data[i] = Vector.FromArray(input[i], 1.0);

    will add a bias input to your feature vectors whose length will then match the weights.

    John
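
    The length mismatch described above can be illustrated without Infer.NET. The sketch below builds the two-element rows (input followed by a constant 1.0) as plain arrays and checks their length against the weight dimension; each row could then be passed to Vector.FromArray as in the reply. The names BiasFeature, WithBias and weightDimension are hypothetical and used only for this example.

    ```csharp
    using System;

    // Hypothetical helper for illustration; not part of Infer.NET.
    public static class BiasFeature
    {
        // Appends a constant 1.0 to each scalar input so that every feature
        // row has two entries, matching a two-element weight vector.
        public static double[][] WithBias(double[] inputs)
        {
            var rows = new double[inputs.Length][];
            for (int i = 0; i < inputs.Length; i++)
            {
                rows[i] = new[] { inputs[i], 1.0 };
            }
            return rows;
        }

        public static void Main()
        {
            const int weightDimension = 2;  // length of the weight vector w
            double[] input = { -3, -2.1, -1.3, 0.5, 1.2, 3.3, 4.4, 5.5 };

            double[][] rows = WithBias(input);
            foreach (double[] row in rows)
            {
                // A row of length 1 would fail here, mirroring the runtime exception.
                if (row.Length != weightDimension)
                {
                    throw new InvalidOperationException("Feature length does not match weight length.");
                }
            }
            Console.WriteLine("All {0} feature rows have length {1}.", rows.Length, weightDimension);
        }
    }
    ```

    Note that with this ordering the first weight multiplies the input and the second acts as the intercept, which is the reverse of the earlier example where the constant 1.0 came first.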



    Monday, October 03, 2011 2:20 PM
    Owner
  • Thank you very much, John. The exception is gone and I've realised I haven't quite understood the code just yet :-)
    Monday, October 03, 2011 2:42 PM