BayesPointMachine example (Migrated from community.research.microsoft.com)
Question

nTH posted on 03/02/2010 4:54 PM
Hi all,
I was going through the example provided for the Bayes Point Machine and was wondering how I can add more variables to the training set for it to take into account: not just integers, but strings as well.
Currently it uses incomes and ages as examples, but I want to add two new string arrays as well for it to use in computing the probability.
Friday, June 3, 2011 5:38 PM
All replies

John Guiver replied on 03/04/2010 11:48 AM
You need to give us a bit more detail about your model. We do not support strings as domain types for distributions, but if the strings are really just identifiers (i.e. 1-of-N codes), then there are ways of incorporating them into different models.
In the meantime, please check out the sparse Bayes Point Machine. It gives a bit more flexibility in terms of features.
John
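John's suggestion of treating string identifiers as 1-of-N codes can be illustrated with a small, self-contained C# sketch. It assumes each string column takes values from a fixed, known set of categories; the class and method names (`OneOfNEncoder`, `Vocabulary`, `Encode`, `BuildRow`) are hypothetical and not part of Infer.NET. Each category becomes its own 0/1 feature, placed after the numeric features and before the constant bias term.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not an Infer.NET API): turns string columns into 1-of-N features.
public static class OneOfNEncoder
{
    // Distinct category strings in a fixed (ordinal) order; the position of each is its index.
    public static string[] Vocabulary(IEnumerable<string> values)
    {
        return values.Distinct().OrderBy(s => s, StringComparer.Ordinal).ToArray();
    }

    // Encodes one string as a 0/1 vector with a single 1.0 at its category index.
    public static double[] Encode(string value, string[] vocabulary)
    {
        int index = Array.IndexOf(vocabulary, value);
        if (index < 0)
            throw new ArgumentException("Unknown category: " + value);
        double[] result = new double[vocabulary.Length];
        result[index] = 1.0;
        return result;
    }

    // Builds one feature row: numeric features, then each block of 1-of-N indicators,
    // then the constant 1 used as the bias input.
    public static double[] BuildRow(double[] numeric, params double[][] indicatorBlocks)
    {
        List<double> row = new List<double>(numeric);
        foreach (double[] block in indicatorBlocks)
            row.AddRange(block);
        row.Add(1.0);
        return row.ToArray();
    }
}
```

For example, `Vocabulary(new[] { "red", "blue", "red", "green" })` gives `{ "blue", "green", "red" }`, and `Encode("green", ...)` gives `{ 0, 1, 0 }`. Two string columns can be passed as two indicator blocks. Each resulting row would then be wrapped in an Infer.NET `Vector`, and the prior on `w` must have one dimension per entry in the row.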
nTH replied on 03/08/2010 2:15 PM
I wanted to add another variable to the BPM example in the tutorials. The variable I added was "expenses". Below is the tutorial code with "expenses" added. However, I'm getting an error about the number of parameters used. Am I adding the new variable to the BPM correctly?
using System;
using MicrosoftResearch.Infer;
using MicrosoftResearch.Infer.Distributions;
using MicrosoftResearch.Infer.Maths;
using MicrosoftResearch.Infer.Models;

public class BayesPointMachineExample
{
    public void Run()
    {
        // Training data: one entry per person
        double[] expenses = { 99, 0, 21, 46, 63, 80 };
        double[] incomes = { 63, 16, 28, 55, 22, 20 };
        double[] ages = { 38, 23, 40, 27, 18, 40 };
        bool[] willBuy = { true, false, true, true, false, false };

        // Observed target y
        VariableArray<bool> y = Variable.Observed(willBuy).Named("y");
        // Gaussian prior over the weight vector w (3-dimensional here)
        Variable<Vector> w = Variable.Random(new VectorGaussian(new Vector(3),
            PositiveDefiniteMatrix.Identity(3))).Named("w");
        BayesPointMachine(incomes, ages, expenses, w, y);

        // Training: infer the posterior over w
        InferenceEngine engine = new InferenceEngine(new ExpectationPropagation());
        VectorGaussian wPosterior = engine.Infer<VectorGaussian>(w);
        Console.WriteLine("Dist over w=\n" + wPosterior);

        // Test: use the posterior over w to predict willBuy for new people
        double[] expensesTest = { 99, 30, 2 };
        double[] incomesTest = { 58, 18, 22 };
        double[] agesTest = { 36, 24, 37 };
        VariableArray<bool> ytest = Variable.Array<bool>(new Range(agesTest.Length)).Named("ytest");
        BayesPointMachine(incomesTest, agesTest, expensesTest, Variable.Random(wPosterior).Named("w"), ytest);
        Console.WriteLine("output=\n" + engine.Infer(ytest));
    }

    public void BayesPointMachine(double[] incomes, double[] ages, double[] expenses,
        Variable<Vector> w, VariableArray<bool> y)
    {
        // Create the x vectors, augmented by a constant 1
        Range j = y.Range.Named("person");
        Vector[] xdata = new Vector[incomes.Length];
        for (int i = 0; i < xdata.Length; i++)
            xdata[i] = new Vector(incomes[i], ages[i], expenses[i], 1);
        VariableArray<Vector> x = Variable.Observed(xdata, j).Named("x");
        // Bayes Point Machine: y is true when the inner product of w and x is positive
        y[j] = Variable.IsPositive(Variable.InnerProduct(w, x[j]).Named("innerProduct"));
    }
}
John Guiver replied on 03/09/2010 8:00 AM
You have added another variable, but have not modified the w parameter vector which needs to be of size 4 (3 dimensions for the features, and 1 dimension for the bias). Change the line that defines w to:
Variable<Vector> w = Variable.Random(new VectorGaussian(new Vector(4),
    PositiveDefiniteMatrix.Identity(4))).Named("w");
John
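One way to avoid this kind of mismatch is to derive the size of `w` from the number of feature columns instead of hard-coding it. The sketch below is plain C# with hypothetical names (`BpmFeatures`, `AugmentedRows`, `WeightDimension`) that are not Infer.NET APIs; it packs any number of equal-length feature columns into rows ending in the constant 1, in the same order the tutorial code uses.

```csharp
using System;

// Hypothetical helper (not an Infer.NET API): builds augmented BPM inputs from feature columns.
public static class BpmFeatures
{
    // Row i holds the i-th value of every column, in argument order, followed by the constant 1 (bias).
    public static double[][] AugmentedRows(params double[][] columns)
    {
        if (columns.Length == 0)
            throw new ArgumentException("Need at least one feature column.");
        int n = columns[0].Length;
        foreach (double[] column in columns)
            if (column.Length != n)
                throw new ArgumentException("All feature columns must have the same length.");

        double[][] rows = new double[n][];
        for (int i = 0; i < n; i++)
        {
            rows[i] = new double[columns.Length + 1];
            for (int k = 0; k < columns.Length; k++)
                rows[i][k] = columns[k][i];
            rows[i][columns.Length] = 1.0;
        }
        return rows;
    }

    // The prior on w needs one dimension per feature plus one for the bias.
    public static int WeightDimension(int featureCount)
    {
        return featureCount + 1;
    }
}
```

With the three columns from this thread, `AugmentedRows(incomes, ages, expenses)` yields six rows, the first being `{ 63, 38, 99, 1 }`, and `WeightDimension(3)` returns 4, the value that would go into both `new Vector(...)` and `PositiveDefiniteMatrix.Identity(...)`.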
nTH replied on 03/09/2010 8:55 PM
Thank you so much. This is great help.
nTH replied on 03/30/2010 11:11 AM
Also, why is the x vector augmented by 1? If we change it to anything larger than 1, are we then assuming our ranges can be greater than 100?
nTH replied on 03/30/2010 1:19 PM
Actually, never mind. Augmenting by 1 has nothing to do with setting the ranges for the variables.
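To see why the constant 1 is a bias rather than a range setting: the inner product of w with [x, 1] equals the inner product of the first weights with x plus the last weight, so the last weight acts as an intercept (the "1 dimension for the bias" in John's earlier reply). The plain C# sketch below uses hypothetical helper names (`BiasDemo`, `Dot`, `Augment`) and made-up weights to check this numerically. Appending some other constant c only changes the last weight needed to produce the same bias, to b/c.

```csharp
using System;

// Hypothetical helpers (not Infer.NET APIs) for checking what the appended constant does.
public static class BiasDemo
{
    // Ordinary inner product of two equal-length vectors.
    public static double Dot(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");
        double sum = 0.0;
        for (int i = 0; i < a.Length; i++)
            sum += a[i] * b[i];
        return sum;
    }

    // Appends the constant c to a feature vector (the tutorial uses c = 1).
    public static double[] Augment(double[] x, double c)
    {
        double[] result = new double[x.Length + 1];
        Array.Copy(x, result, x.Length);
        result[x.Length] = c;
        return result;
    }
}
```

With x = { 63, 38, 99 }, feature weights { 0.5, -1, 0.25 } and bias 2, all three scores (direct, augmented by 1, augmented by 4 with last weight 0.5) come out to 20.25. The one real effect of a different constant is on the prior: with the identity covariance used in the tutorial, the last weight has unit prior variance, so the implied bias (c times that weight) has prior variance c squared.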
JimGale replied on 04/08/2010 11:43 AM
Hi John:
Does the ability yet exist to return a value, double or int, from the probability over a set of trained values, rather than the bool observations used above?
So, in this specific case, can one first observe purchases and then, with test data, infer how much one may purchase rather than whether one is likely to buy?
Thanks,
Jim Gale
John Guiver replied on 04/12/2010 9:39 AM
Hi Jim
Apologies for the tardy response. Is your question only within the context of the Bayes Point Machine? If so, please read http://community.research.microsoft.com/forums/t/4353.aspx for some clarifications about BPM, and pointers to multiclass treatments.
More generally, in Infer.NET you can build a model in many ways to incorporate different observations. Gaussian, Gamma, and Beta random variables can all be observed as doubles, and there are several factors which give rise to such variables.
If you want to observe counts of an event, you will need to incorporate a Binomial factor (for a two-class problem such as 'buy' or 'not buy') or a Multinomial factor (for multiclass problems); these factors give rise to int random variables, can be observed as int, and their posteriors can be recovered as Discrete distributions. For example, you might consider modifying the BPM model to replace the IsPositive factor by a Logistic factor feeding into a Binomial factor. You can then observe the output of the Binomial factor.
Your test model will then make use of the posterior weights, and will infer the counts rather than observing them.
John
 Marked as answer by Microsoft Research Friday, June 3, 2011 5:39 PM
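The Logistic-then-Binomial model described in the answer can be sketched as a generative story in plain C#, outside Infer.NET: the score (inner product of w and x) is squashed to a probability by the logistic function, and the observed count is Binomial with that probability. This is only an illustration under assumptions: the number of trials (for example, purchase opportunities per person) is a hypothetical quantity the data would have to supply, and `LogisticBinomialSketch`, `SampleCount` and `ExpectedCount` are hypothetical names, not Infer.NET APIs. In an actual Infer.NET model the weights would be inferred rather than fixed.

```csharp
using System;

// Plain C# sketch of the generative model only; it performs no Infer.NET inference.
public static class LogisticBinomialSketch
{
    // Logistic (sigmoid) function: maps the score w.x to a probability in (0, 1).
    public static double Logistic(double score)
    {
        return 1.0 / (1.0 + Math.Exp(-score));
    }

    // Draws a count from Binomial(trials, p) by summing independent Bernoulli(p) trials.
    public static int SampleCount(Random rng, int trials, double p)
    {
        int count = 0;
        for (int t = 0; t < trials; t++)
            if (rng.NextDouble() < p)
                count++;
        return count;
    }

    // Expected count trials * logistic(w.x) for one fixed weight vector (e.g. a posterior mean).
    public static double ExpectedCount(double[] w, double[] x, int trials)
    {
        if (w.Length != x.Length)
            throw new ArgumentException("w and x must have the same length.");
        double score = 0.0;
        for (int i = 0; i < w.Length; i++)
            score += w[i] * x[i];
        return trials * Logistic(score);
    }
}
```

`ExpectedCount` plugs in a single weight vector, such as the posterior mean, giving n times the logistic of the score; this ignores the posterior uncertainty in w that an Infer.NET test model, using the full posterior over the weights, would take into account.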