Bayesian Linear Regression (Migrated from community.research.microsoft.com)

jlopes posted on 05-17-2009 10:07 PM
Hi,
I've been trying to do some Bayesian linear regression as a first trial, but without much success:
double[,] data = new double[,] { {1,3}, {1,2.1}, {1,1.3}, {1,0.5}, {1,1.2}, {1,3.3}, {1,4.4}, {1,5.5} };
Range rows= new Range(data.GetLength(0));
Range columns = new Range(data.GetLength(1));
Variable<Matrix> x = Variable.Constant<Matrix>(new Matrix(data)).Named("x");
Variable<Vector> w = Variable.VectorGaussianFromMeanAndPrecision(new Vector(new double[]{0,0}),PositiveDefiniteMatrix.Identity(2)).Named("w");
Variable<Vector> yVar = Variable.MatrixTimesVector(x, w).Named("y");
yVar.ObservedValue = new Vector(30, 45, 40, 80, 70, 100, 130, 110);
InferenceEngine engine = new InferenceEngine(new VariationalMessagePassing());
VectorGaussian postW = engine.Infer<VectorGaussian>(w);
I suspect that it doesn't work because Variable.MatrixTimesVector is not implemented yet. What would be the best way to solve this problem? Implement MatrixTimesVector, quit because it won't work at all because of..., or try to find a way around?
Thanks,
Joao
Friday, June 3, 2011 4:54 PM (Owner)

jlopes replied on 05-18-2009 7:31 AM
I tried to do it as a series of inner products of vectors, but that is also not supported. :)
Guess I will have no choice?
minka replied on 05-19-2009 1:31 AM
The problem here is not with Infer.NET, but with the Variational Message Passing algorithm and the particular model being used here. This is not really a standard linear regression model, since normally you would add Gaussian noise before observing the product. Here there is no noise, and that is the source of the problem. You are directly observing the product of two variables, which VMP cannot support. It is not a case of Infer.NET being incomplete. VMP simply does not handle the case when a derived variable is observed. You will run into this limitation no matter how you rewrite the model. So, you should either use EP or change the model to have some additional noise.
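To make the distinction concrete, the two models can be written side by side. This is only a sketch in notation that does not appear in the thread: $X$ is the matrix of inputs with rows $x_i^\top$, $w$ is the weight vector with the identity-precision prior used in the code, and $\sigma^2$ is an assumed noise variance.

```latex
% Model in the original post: the product is observed directly, with no noise
w \sim \mathcal{N}(0, I), \qquad y = X w

% Standard Bayesian linear regression: Gaussian noise is added before observing
w \sim \mathcal{N}(0, I), \qquad
y_i \mid w \sim \mathcal{N}\!\left(x_i^\top w,\; \sigma^2\right), \quad i = 1, \dots, N
```

In the first form $y$ is a deterministic function of $w$, so observing it imposes a hard constraint on $w$. In the second form the observation is a random variable with its own Gaussian factor, which is the case described above as the standard one.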
minka replied on 05-19-2009 1:38 AM
You can get some insight into why VMP breaks down here by reading my paper "Divergence measures and message passing" (http://research.microsoft.com/enus/um/people/minka/papers/messagepassing/). As shown there, VMP will not represent the posterior distribution but simply pick one possible solution and put all probability mass there. This happens due to the zero-forcing nature of the divergence being minimized. Rather than have VMP return degenerate solutions, we opted to have Infer.NET throw an exception in these cases.
jwinn replied on 05-19-2009 10:40 AM
So to build on Tom's answer, here is a solution using Vectors and InnerProduct which adds Gaussian noise:
Vector[] data = new Vector[] { new Vector( 1.0, 3 ), new Vector( 1.0, 2.1 ), new Vector( 1.0, 1.3 ), new Vector ( 1.0, 0.5 ), new Vector( 1.0, 1.2 ), new Vector( 1.0, 3.3 ), new Vector( 1.0, 4.4 ), new Vector( 1.0, 5.5 ) };
Range rows= new Range(data.Length);
VariableArray<Vector> x = Variable.Constant(data, rows).Named("x");
Variable<Vector> w = Variable.VectorGaussianFromMeanAndPrecision(new Vector(new double[] { 0, 0 }), PositiveDefiniteMatrix.Identity(2)).Named("w");
VariableArray<double> y = Variable.Array<double>(rows);
y[rows] = Variable.GaussianFromMeanAndVariance(Variable.InnerProduct(x[rows], w),1.0);
y.ObservedValue = new double[] { 30, 45, 40, 80, 70, 100, 130, 110 };
InferenceEngine engine = new InferenceEngine(new VariationalMessagePassing());
VectorGaussian postW = engine.Infer<VectorGaussian>(w);
Console.WriteLine("Posterior over the weights: "+Environment.NewLine+postW);
Best,
John W.
Proposed as answer by Harsh Vathsangam, Tuesday, August 23, 2011 11:37 PM
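Since w is the only unobserved variable once the noise variance is fixed, this model also has a standard closed-form Gaussian posterior, which gives a way to sanity-check whatever the engine prints. The following is a minimal sketch in plain C# with no Infer.NET dependency. The class name ExactPosteriorCheck and all variable names are made up for illustration, and it assumes the same identity prior precision, unit noise variance, and (1, input) feature order as the code above.

```csharp
using System;

class ExactPosteriorCheck
{
    static void Main()
    {
        // Second feature and targets copied from the thread; the first feature is the constant 1.
        double[] v = { 3, 2.1, 1.3, 0.5, 1.2, 3.3, 4.4, 5.5 };
        double[] y = { 30, 45, 40, 80, 70, 100, 130, 110 };
        double noiseVariance = 1.0; // same value as passed to GaussianFromMeanAndVariance

        // Posterior precision = prior precision (identity) + X^T X / noiseVariance.
        double p00 = 1.0, p01 = 0.0, p11 = 1.0;
        // b = X^T y / noiseVariance (the prior mean is zero, so it contributes nothing).
        double b0 = 0.0, b1 = 0.0;
        for (int i = 0; i < v.Length; i++)
        {
            p00 += 1.0 / noiseVariance;
            p01 += v[i] / noiseVariance;
            p11 += v[i] * v[i] / noiseVariance;
            b0 += y[i] / noiseVariance;
            b1 += v[i] * y[i] / noiseVariance;
        }

        // Invert the 2x2 precision matrix to get the posterior covariance.
        double det = p00 * p11 - p01 * p01;
        double c00 = p11 / det, c01 = -p01 / det, c11 = p00 / det;

        // Posterior mean = covariance * b.
        double m0 = c00 * b0 + c01 * b1;
        double m1 = c01 * b0 + c11 * b1;

        Console.WriteLine("mean = ({0:F4}, {1:F4})", m0, m1);
        Console.WriteLine("covariance = [[{0:F4}, {1:F4}], [{1:F4}, {2:F4}]]", c00, c01, c11);
    }
}
```

Under these assumptions the mean and covariance printed here would be expected to agree closely with the inferred VectorGaussian. A large discrepancy would suggest that the model or the data was entered differently from what was intended.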
jlopes replied on 05-19-2009 6:58 PM
Many thanks. That's what I meant to do in the first place but I misconceived the model.
Marked as answer by Microsoft Research (Owner), Friday, June 3, 2011 4:54 PM
Thanks for that example. I've been trying to figure out an elegant way to extend the data initialization to cases where the input has a very large number of dimensions. My motivation is that I have a CSV file which I load into an array of dimension > Nx4000. Manually entering such high-dimensional data points would take a lot of time. I was not able to figure out how to use Vector[] for this. I tried to extend the factor analysis and Bayes point machine examples but couldn't make much progress. Any help would be greatly appreciated! :)
Tuesday, August 23, 2011 11:37 PM

Not sure exactly what question you are asking here; this seems like a C# question, which we would not typically address on this forum. But something like this should do the trick:
// Needs: using System.Collections.Generic; using System.IO; using System.Linq;
List<Vector> dataList = new List<Vector>();
using (StreamReader sr = new StreamReader("myFileName.csv"))
{
    string str;
    // Read one CSV row at a time and convert it to a Vector.
    while ((str = sr.ReadLine()) != null)
    {
        double[] arr = str.Split(',').Select(s => double.Parse(s)).ToArray();
        dataList.Add(Vector.FromArray(arr));
    }
}
Vector[] data = dataList.ToArray();
John
Wednesday, August 31, 2011 3:22 PM (Owner)
Many thanks for this, it worked. Sorry for posting a rather basic question.
Thursday, September 8, 2011 11:20 PM

Would you tell me how this translates to the latest Infer.NET?
I tried this:
double[] input = { 3, 2.1, 1.3, 0.5, 1.2, 3.3, 4.4, 5.5 };
Vector[] data = new Vector[input.Length];
for (int i = 0; i < data.Length; i++)
{
    data[i] = Vector.FromArray(input[i]);
}
Range rows = new Range(data.Length);
VariableArray<Vector> x = Variable.Constant(data, rows).Named("x");
Variable<Vector> w = Variable.VectorGaussianFromMeanAndPrecision(Vector.Zero(2), PositiveDefiniteMatrix.Identity(2)).Named("w");
VariableArray<double> y = Variable.Array<double>(rows);
y[rows] = Variable.GaussianFromMeanAndVariance(Variable.InnerProduct(x[rows], w), 1.0);
y.ObservedValue = new double[] { 30, 45, 40, 80, 70, 100, 130, 110 };
InferenceEngine engine = new InferenceEngine(new VariationalMessagePassing());
VectorGaussian postW = engine.Infer<VectorGaussian>(w);
Console.WriteLine("Posterior over the weights: " + Environment.NewLine + postW);
For me this results in: 'MicrosoftResearch.Infer.Utils.AssertFailedException' occurred in Infer.Runtime.dll
Many thanks,
Mirko
Monday, October 3, 2011 2:00 PM 
Hi Mirko
Your weights are Vector variables of length 2, but your inputs are vectors of length one, and this throws the runtime exception. For example:
data[i] = Vector.FromArray(input[i], 1.0);
will add a bias input to your feature vectors whose length will then match the weights.
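The length mismatch can also be caught before the model is built. The following is a minimal sketch in plain C# with no Infer.NET dependency. The helper WithBias and the class name BiasFeatureSketch are hypothetical names introduced only for illustration, and weightDimension is assumed to equal the length used in the prior over w.

```csharp
using System;

class BiasFeatureSketch
{
    // Hypothetical helper: pairs each raw input with a constant 1.0 bias term.
    static double[][] WithBias(double[] input)
    {
        double[][] rows = new double[input.Length][];
        for (int i = 0; i < input.Length; i++)
        {
            rows[i] = new double[] { input[i], 1.0 };
        }
        return rows;
    }

    static void Main()
    {
        double[] input = { 3, 2.1, 1.3, 0.5, 1.2, 3.3, 4.4, 5.5 };
        int weightDimension = 2; // assumed length of the prior over w, e.g. Vector.Zero(2)

        double[][] rows = WithBias(input);
        foreach (double[] row in rows)
        {
            // Fail early with a readable message rather than an assertion during inference.
            if (row.Length != weightDimension)
                throw new InvalidOperationException(
                    "Feature length " + row.Length + " does not match weight length " + weightDimension);
        }
        Console.WriteLine("All " + rows.Length + " feature rows have length " + weightDimension);
    }
}
```

Each checked row can then be passed to Vector.FromArray. Note that with the (input, 1.0) order the first weight acts as the slope and the second as the intercept, which is the reverse of the 2009 example earlier in the thread, where the constant 1.0 came first.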
John
Edited by John Guiver (Microsoft employee, Owner), Monday, October 3, 2011 2:22 PM
Monday, October 3, 2011 2:20 PM (Owner)
Thank you very much, John. The exception is gone and I've realised I haven't quite understood the code just yet :)
Monday, October 3, 2011 2:42 PM

Would anyone please explain why a multivariate posterior is defined for w?
What if w is defined using a VariableArray, with each element having its own Gaussian distribution
(and then using Variable.Sum(w * data[i]) instead of InnerProduct)?
Tuesday, January 20, 2015 5:18 PM 
A VectorGaussian stores a full covariance matrix. A VariableArray does not. So you will get better results with the VectorGaussian.
Tuesday, January 20, 2015 5:42 PM (Owner)
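As a worked illustration of what the full covariance holds, take the eight data points from earlier in the thread with features ordered (1, input), an identity prior precision, and unit noise variance. These are the same assumptions as the 2009 example, and the symbols $\Lambda$ and $\Sigma$ are introduced here and are not from the thread. The exact posterior precision and covariance of w are then:

```latex
\Lambda = I + X^\top X =
\begin{pmatrix} 9 & 21.3 \\ 21.3 & 78.29 \end{pmatrix},
\qquad
\Sigma = \Lambda^{-1} = \frac{1}{250.92}
\begin{pmatrix} 78.29 & -21.3 \\ -21.3 & 9 \end{pmatrix}
\approx
\begin{pmatrix} 0.312 & -0.085 \\ -0.085 & 0.036 \end{pmatrix}
```

The off-diagonal entry corresponds to a correlation of about -0.80 between the intercept and the slope. A set of independent Gaussians, one per element, has no place to store that term. For a fully factorized Gaussian approximation the standard result is a variance of $1/\Lambda_{ii}$ per element, here about 0.111 and 0.013, which differ considerably from the exact marginal variances 0.312 and 0.036. Whether Infer.NET returns exactly these values for the VariableArray formulation is not confirmed in this thread.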

Hi All,
I am new to Bayesian inference, although I do have some basic understanding of probabilistic graphical models.
Can anyone please share some references to a tutorial or introductory paper on the Bayesian linear regression discussed in this post?
Thanks in advance,
Yogesh
Monday, September 7, 2015 1:19 PM 
Monday, September 7, 2015 1:24 PM (Owner)