How to include more outcomes for BPM?
Question

How can I modify Tutorial 4 (Bayes Point Machine) to include more outcomes?
For example, how can I add willRent and get the probability for both?
double[] incomes = { 63, 16, 28, 55, 22, 20 };
double[] ages = { 38, 23, 40, 27, 18, 40 };
bool[] willBuy = { true, false, true, true, false, false };
bool[] willRent = { false, false, true, false, true, false };
I figured out how to add more metrics (other than incomes and ages) but I can't add more outcomes.
Thank you for your help.
Sunday, November 11, 2012 8:29 PM
All replies

A more extensive Bayes Point Machine example is documented at http://research.microsoft.com/infernet/docs/Multiclass%20classification.aspx, and the code can be found in the Samples\C#\BayesPointMachine folder.
Edited by John Guiver (Microsoft employee, Owner) Monday, November 12, 2012 9:11 AM
Monday, November 12, 2012 9:09 AM (Owner)
Please help me understand the BPM example:
static void Main()
{
    int nClass = 3;
    int totalFeatures = 4;
    double noisePrec = 0.1;
    string trainingFile = @"..\..\data\data.txt";
    Vector[] testData = new Vector[2];
    testData[0] = Vector.FromArray(new double[] { 2.1, 0, 0, 0 });
    testData[1] = Vector.FromArray(new double[] { 0, 0, 1.3, 0 });
    Test_BPM(nClass, totalFeatures, noisePrec, trainingFile, testData);
}

private static void Test_BPM(int nClass, int totalFeatures, double noisePrec, string fileName, Vector[] testData)
{
    Console.WriteLine("\n BPM ");
    List<Vector>[] data = DataFromFile.Read(fileName, nClass);
    BPM bpm = new BPM(nClass, totalFeatures, noisePrec);
    bpm.TrainingEngine.ShowProgress = false;
    bpm.TestEngine.ShowProgress = false;
    VectorGaussian[] wInfer = bpm.Train(data);
    Discrete[] predictions = bpm.Test(testData);
    Console.WriteLine("\nPredictions:");
    foreach (Discrete pred in predictions)
        Console.WriteLine(pred);
    Console.WriteLine();
}
Tutorial 4 is clear about what you're inferring, but here it's not so clear. For example, in the tutorial we have the outcomes for the data points (willBuy). Where in the BPM example is that set? What is being predicted?
Thanks,
Wednesday, November 14, 2012 11:21 PM 
Each datum/row in the file is assumed to consist of the 0-based class index followed by the input vector (tab, space or comma delimited).
The read method reads this file and returns an array of Vector lists. This array is indexed by the 0-based class index. So the list of input vectors for class 0 is the list data[0], the list of input vectors for class 1 is the list data[1], etc.
Referring back to Tutorial 4, the data in the file would be (using 0 for false and 1 for true):
1, 63, 38
0, 16, 23
1, 28, 40
1, 55, 27
0, 22, 18
0, 20, 40
and the read method would create two lists:
data[0]: {16, 23}, {22, 18}, {20, 40}
data[1]: {63, 38}, {28, 40}, {55, 27}
John
Thursday, November 15, 2012 9:15 AM (Owner)
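To make the file format above concrete, here is a minimal plain C# sketch of what such a read method does. It is not the sample's DataFromFile.Read (whose source isn't shown in this thread); DataFileSketch and ReadLines are hypothetical names, and double[] stands in for Infer.NET's Vector so the sketch has no dependencies.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical stand-in for the sample's DataFromFile.Read: groups the input
// vectors of a training file by their 0-based class index.
public static class DataFileSketch
{
    // Field delimiters accepted: tab, space or comma.
    static readonly char[] Delimiters = { '\t', ' ', ',' };

    public static List<double[]>[] ReadLines(IEnumerable<string> lines, int nClass)
    {
        var data = new List<double[]>[nClass];
        for (int c = 0; c < nClass; c++)
            data[c] = new List<double[]>();

        foreach (string line in lines)
        {
            string[] fields = line.Split(Delimiters, StringSplitOptions.RemoveEmptyEntries);
            if (fields.Length == 0)
                continue; // skip blank lines

            // First field: 0-based class index; remaining fields: the input vector.
            int classIndex = int.Parse(fields[0], CultureInfo.InvariantCulture);
            if (classIndex < 0 || classIndex >= nClass)
                throw new ArgumentException($"Class index {classIndex} is outside 0..{nClass - 1}");

            var x = new double[fields.Length - 1];
            for (int i = 1; i < fields.Length; i++)
                x[i - 1] = double.Parse(fields[i], CultureInfo.InvariantCulture);

            data[classIndex].Add(x);
        }
        return data;
    }
}
```

For example, DataFileSketch.ReadLines(File.ReadLines(@"..\..\data\data.txt"), 2) on the Tutorial 4 file puts {16, 23}, {22, 18}, {20, 40} in data[0] and {63, 38}, {28, 40}, {55, 27} in data[1].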
Thank you for your help,
I modified the BPM example to use Tutorial 4's data, but I'm still getting errors. I modified the data file to have the following values:
1,63,38
0,16,23
1,28,40
1,55,27
0,22,18
0,20,40
and the program to:
static void Main()
{
    int nClass = 2;
    int totalFeatures = 4;
    double noisePrec = 0.1;
    string trainingFile = @"..\..\data\data.txt";
    double[] incomesTest = { 58, 18, 22 };
    double[] agesTest = { 36, 24, 37 };
    Vector[] testData = new Vector[2];
    testData[0] = Vector.FromArray(incomesTest);
    testData[1] = Vector.FromArray(agesTest);
    Test_BPM(nClass, totalFeatures, noisePrec, trainingFile, testData);
}
What should "totalFeatures" be?
Is this the right way to setup the testData?
Anything else that might be wrong?
btw, the error I'm getting is: Vectors have different size, Parameter name: that
Thursday, November 15, 2012 6:24 PM 
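In the code that you show, the number of features is 2 (given the earlier discussion about the training data), so totalFeatures should be 2. But your test data is laid out the wrong way round: you have two vectors of length 3 (all the incomes, then all the ages) rather than one vector of length 2 per test point. So your test data should be:
Vector[] testData = new Vector[3];
testData[0] = Vector.FromArray(new double[] { 58, 36 });
testData[1] = Vector.FromArray(new double[] { 18, 24 });
testData[2] = Vector.FromArray(new double[] { 22, 37 });
However, to truly match the Tutorial example, you should append a 1 to all vectors (a constant input whose weight plays the role of a bias), making them of length 3 (totalFeatures = 3). So the data in your training file would be: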
1,63,38,1
0,16,23,1
1,28,40,1
1,55,27,1
0,22,18,1
0,20,40,1
and your test data:
58, 36, 1
18, 24, 1
22, 37, 1
Tuesday, November 20, 2012 5:58 PM (Owner)
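Rather than typing the file by hand, the Tutorial 4 arrays can be converted into this format, bias column included. This is an illustration only: TrainingFileSketch and ToLines are hypothetical names, and the sketch assumes class 1 = true and class 0 = false, as in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: turns the Tutorial 4 arrays into lines of the form
// "classIndex,income,age,1", appending a constant 1 as the bias input.
public static class TrainingFileSketch
{
    public static List<string> ToLines(double[] incomes, double[] ages, bool[] labels)
    {
        if (incomes.Length != ages.Length || ages.Length != labels.Length)
            throw new ArgumentException("All arrays must have the same length.");

        var lines = new List<string>();
        for (int i = 0; i < labels.Length; i++)
        {
            int classIndex = labels[i] ? 1 : 0; // true -> class 1, false -> class 0
            lines.Add(string.Join(",",
                classIndex.ToString(CultureInfo.InvariantCulture),
                incomes[i].ToString(CultureInfo.InvariantCulture),
                ages[i].ToString(CultureInfo.InvariantCulture),
                "1")); // bias feature
        }
        return lines;
    }
}
```

Writing the result with File.WriteAllLines(@"..\..\data\data.txt", TrainingFileSketch.ToLines(incomes, ages, willBuy)) should give the six lines shown above. The test vectors need the same trailing 1, e.g. Vector.FromArray(new double[] { 58, 36, 1 }).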
It's not giving me an error but now I don't know how to read the output compared to the tutorial's:
Predictions:
Discrete(0.09839 0.8843 0.01729)
Discrete(0.5294 0.4522 0.01836)
Discrete(0.6677 0.3073 0.02508)
The output from the tutorial is:
output=
[0] Bernoulli(0.9555)
[1] Bernoulli(0.1565)
[2] Bernoulli(0.287)
Here's the code:
int nClass = 3;
int totalFeatures = 3;
double noisePrec = 0.1;
string trainingFile = @"..\..\data\data.txt";
Vector[] testData = new Vector[3];
testData[0] = Vector.FromArray(new double[] { 58, 36, 1 });
testData[1] = Vector.FromArray(new double[] { 18, 24, 1 });
testData[2] = Vector.FromArray(new double[] { 22, 37, 1 });
Test_BPM(nClass, totalFeatures, noisePrec, trainingFile, testData);
And this is the data file:
1,63,38,1
0,16,23,1
1,28,40,1
1,55,27,1
0,22,18,1
0,20,40,1
Tuesday, November 20, 2012 10:57 PM
Note that although you are specifying three classes, your training file only gives examples for two classes (i.e. you only have a 1 or a 0 in the first column).
The results are telling you that the probability of class 1 (= true in the tutorial) is 0.8843, 0.4522 and 0.3073 for the three examples. This differs from the tutorial because the formulation of the model is slightly different (the multiclass model has weights for each class and uses an argmax factor), and therefore, because of the very small amount of data, the priors on the weights have a large effect on the answers.
To understand the effect of the priors, let's simplify things by changing nClass = 3 to nClass = 2. For this two-class problem, the classifier needs to learn a bias which centres the data around 0, so that inner products on one side of 0 are one class and inner products on the other side of 0 are the other class. Given that the data values are large and positive, the bias weight needs to be very large compared to the weights for the other features in order to centre the data. But this consideration is not reflected in the priors, which treat all weights the same. The small quantity of data is not sufficient to overcome the effect of this poor prior.
In general, when you are using BPM for real, the best thing is to standardize the data so the values are centred around 0 and roughly of scale 1. Then the default weight priors will be more sensible. You might want to try this to see the effect on the predictions. In addition, the more data you have, the smaller the effect of the prior. The moral of the story is that you need to think carefully about priors when data is limited, and this requires an understanding of the model.
John
 Marked as answer by Atarax Wednesday, November 21, 2012 4:19 PM
Wednesday, November 21, 2012 9:26 AM (Owner)
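The standardization suggested in the answer could look roughly like this. It is a sketch under our own assumptions: the Standardizer class is hypothetical, it uses the population standard deviation of the training data, and it appends the bias input 1 after scaling so that the bias column itself is not standardized. Test points should be transformed with the training statistics, not their own.

```csharp
using System;
using System.Linq;

// Hypothetical helper: shifts and scales each feature by the training mean and
// population standard deviation, then appends a constant bias input of 1.
public sealed class Standardizer
{
    readonly double[] mean;
    readonly double[] std;

    public Standardizer(double[][] trainingRows)
    {
        int d = trainingRows[0].Length;
        mean = new double[d];
        std = new double[d];
        for (int j = 0; j < d; j++)
        {
            double m = trainingRows.Average(r => r[j]);
            double variance = trainingRows.Average(r => (r[j] - m) * (r[j] - m));
            mean[j] = m;
            // Guard against a constant feature (zero variance).
            std[j] = variance > 0 ? Math.Sqrt(variance) : 1.0;
        }
    }

    // Returns the standardized features followed by a bias term of 1.
    public double[] Transform(double[] row)
    {
        var x = new double[row.Length + 1];
        for (int j = 0; j < row.Length; j++)
            x[j] = (row[j] - mean[j]) / std[j];
        x[row.Length] = 1.0;
        return x;
    }
}
```

For the tutorial data the income column has mean 34 and the age column mean 31, so a point {34, 31} maps to {0, 0, 1}. The transformed rows can then be written to the training file (with the class index in front) and passed to Vector.FromArray for testing.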
I got lost in the details. Going back to the original question: if I wanted to calculate both willBuy and willRent, do I have to do them separately? Or is there a way to say "With these inputs (incomes, ages) we had these outcomes (willBuy, willRent). Now, given this test (incomes, ages), what are the probabilities for willBuy and willRent"?
Wednesday, November 21, 2012 4:49 PM

The probabilities are in the predictions as above; they are the parameters of the predictive distributions:
Discrete(0.09839 0.8843 0.01729)
Discrete(0.5294 0.4522 0.01836)
Discrete(0.6677 0.3073 0.02508)
For the first test point there is a probability of 0.09839 for your first class, 0.8843 for your second class, and 0.01729 for your third class.
Wednesday, November 28, 2012 6:23 PM (Owner)
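The thread does not fully settle the willBuy/willRent part of the question. One option (our suggestion, not one made in the thread) is to treat each (willBuy, willRent) pair as one of four classes, train with nClass = 4, and recover each outcome's probability by summing the relevant entries of the predicted Discrete distribution. JointOutcomes, ToClass and Marginals are hypothetical names. Training two separate two-class models, as the question suggests, also works, but gives no joint probability of buying and renting. With the tutorial's six rows, two of the four classes would have only one training example each, so the priors discussed in the accepted answer would matter even more.

```csharp
using System;

// Hypothetical encoding: each (willBuy, willRent) pair becomes one class of a
// four-class problem (nClass = 4).
//   class 0: buy = false, rent = false    class 1: buy = true,  rent = false
//   class 2: buy = false, rent = true     class 3: buy = true,  rent = true
public static class JointOutcomes
{
    public const int NClass = 4;

    // Class index to write in the first column of the training file.
    public static int ToClass(bool willBuy, bool willRent) =>
        (willBuy ? 1 : 0) + (willRent ? 2 : 0);

    // Given the four class probabilities of one prediction (the numbers shown
    // inside Discrete(...)), sum out the other outcome to get each marginal.
    public static (double pBuy, double pRent) Marginals(double[] classProbs)
    {
        if (classProbs.Length != NClass)
            throw new ArgumentException("Expected 4 class probabilities.");
        double pBuy = classProbs[1] + classProbs[3];
        double pRent = classProbs[2] + classProbs[3];
        return (pBuy, pRent);
    }
}
```

With the willBuy and willRent arrays from the original question, the six training rows would get class indices 1, 0, 3, 1, 2, 0.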