Using a discrete variable in the BayesPointMachineExample

  • Question

  • The inputs to the model in the BayesPointMachineExample are incomes and ages. How can I change one of these to be a discrete input, e.g. SalesPerson: {0, 1, 2}?

    I know I can declare it as follows but I'm not sure how to include it in the BPM:

    Variable<int> salesPerson = Variable.DiscreteUniform(3);

    Thanks

    Sunday, February 10, 2013 5:29 AM

All replies

  • In many applications, the inputs to a BPM represent discrete values. Even if you have continuous values, it is still worthwhile representing them as discrete (there are several ways to do this), especially if you have lots of data. The advantage of doing this is that you can learn a nonlinear mapping between inputs and outputs, whereas inputting the continuous value directly forces a linear relationship.
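
    One common way to turn a continuous value into a discrete one is to bucket it. The sketch below is only an illustration: the helper name BucketIndex and the bucket edges are assumptions, not part of the BayesPointMachineExample.

```csharp
public static class Bucketing
{
    // Hypothetical helper: returns the index of the bucket that 'value' falls into.
    // 'upperBounds' must be sorted ascending. Bucket b holds values below
    // upperBounds[b] (and not below the previous bound); anything at or above
    // the last bound goes into one extra overflow bucket, so there are
    // upperBounds.Length + 1 buckets in total.
    public static int BucketIndex(double value, double[] upperBounds)
    {
        for (int b = 0; b < upperBounds.Length; b++)
        {
            if (value < upperBounds[b]) return b;
        }
        return upperBounds.Length;
    }
}
```

    With assumed edges { 20, 30, 40 }, an age of 18 falls in bucket 0, 23 in bucket 1, 38 in bucket 2 and 40 in bucket 3. Each bucket then gets its own weight, which is what lets the model fit a non-monotonic effect of age.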

    You should keep the BPM as is, but write some code that maps from raw values to the input vector. Just represent a 3-valued discrete variable as a 1-of-3 code: (1, 0, 0), (0, 1, 0), (0, 0, 1).
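
    A minimal sketch of such a 1-of-N mapping, using plain arrays. The class and method names (OneOfN, Encode) are hypothetical and chosen only for illustration.

```csharp
using System;

public static class OneOfN
{
    // Returns a length-n array with a 1 at position 'index' and 0 elsewhere.
    public static double[] Encode(int index, int n)
    {
        if (index < 0 || index >= n)
            throw new ArgumentOutOfRangeException(nameof(index));
        double[] code = new double[n];   // C# arrays start out filled with zeros
        code[index] = 1.0;
        return code;
    }
}
```

    For example, OneOfN.Encode(1, 3) gives { 0, 1, 0 }. The resulting array can be wrapped with Vector.FromArray before it is handed to the model.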

    John

    Monday, February 11, 2013 9:38 AM
    Owner
  • I've changed the BayesPointMachineExample to use a discrete value (category instead of income) and to output a Gaussian representing a price instead of willBuy. It does seem to force a linear relationship: as I increase the category value, the output also increases, even for a category that did not appear in the input data.

    Here's the code:

    public void Run()
    {
        // data
        double[] category = { 2, 1, 0, 2, 1, 0 };
        double[] ages = { 38, 23, 40, 27, 18, 40 };
        double[] price = { 33, 50, 22, 19, 44, 19 };

        // Create target y
        VariableArray<double> y = Variable.Observed(price).Named("y");
        Variable<Vector> w = Variable.Random(new VectorGaussian(Vector.Zero(3),
            PositiveDefiniteMatrix.Identity(3))).Named("w");
        BayesPointMachine(category, ages, w, y);

        InferenceEngine engine = new InferenceEngine();
        if (!(engine.Algorithm is GibbsSampling))
        {
            VectorGaussian wPosterior = engine.Infer<VectorGaussian>(w);
            Console.WriteLine("Dist over w=\n" + wPosterior);

            double[] incomesTest = { 0, 1, 2, 3 };
            double[] agesTest = { 24, 24, 24, 24 };
            VariableArray<double> ytest = Variable.Array<double>(new Range(agesTest.Length)).Named("ytest");
            BayesPointMachine(incomesTest, agesTest, Variable.Random(wPosterior).Named("w"), ytest);
            Console.WriteLine("output=\n" + engine.Infer(ytest));
        }
        else
        {
            Console.WriteLine("This model has a non-conjugate factor, and therefore cannot use Gibbs sampling");
        }
    }

    public void BayesPointMachine(double[] incomes, double[] ages, Variable<Vector> w, VariableArray<double> y)
    {
        // Create x vector, augmented by 1
        Range j = y.Range.Named("person");
        Vector[] xdata = new Vector[incomes.Length];
        for (int i = 0; i < xdata.Length; i++)
            xdata[i] = Vector.FromArray(incomes[i], ages[i], 1);
        VariableArray<Vector> x = Variable.Observed(xdata, j).Named("x");

        // Bayes Point Machine
        double noise = 0.1;
        y[j] = Variable.GaussianFromMeanAndVariance(Variable.InnerProduct(w, x[j]).Named("innerProduct"), noise);
    }

    So if I want to map the discrete values to a vector instead, would I do something like this in the BayesPointMachine() method?

    for (int i = 0; i < xdata.Length; i++)
    {
        Vector category = Vector.FromArray(new double[] { 0, 0, 1, 0 }); // Category 2 (Category is 0 based)
        Vector age = Vector.FromArray(new double[] { 0, 0, 0, 0, 0, 0, 0, 0, 1 }); // Age = 9
        xdata[i] = category + age;
    }

    Thanks


    Monday, February 11, 2013 11:25 AM
  • You could do that. But it is more efficient to create a zero Vector of the correct length (Vector x = Vector.Zero(9)) and then set the individual non-zero values (x[2] = 1; x[8] = 1).

    If you are building an application here rather than just looking at toy problems, you should build up some feature infrastructure which maps from your raw records to feature vectors.
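
    As a rough sketch of what such a mapping might look like for the category/age data in this thread: each feature group gets its own block of positions, so a category index never collides with an age bucket. The class name FeatureMap, the age bucket edges and the layout are all assumptions made for illustration.

```csharp
using System;

public static class FeatureMap
{
    public const int CategoryCount = 3;                        // categories 0, 1, 2
    static readonly double[] AgeUpperBounds = { 20, 30, 40 };  // assumed bucket edges
    public static readonly int AgeBucketCount = AgeUpperBounds.Length + 1;
    // Layout: [category one-hot | age-bucket one-hot | bias]
    public static readonly int Length = CategoryCount + AgeBucketCount + 1;

    // Maps one raw record to a 0/1 feature array with a trailing bias of 1.
    public static double[] ToFeatures(int category, double age)
    {
        if (category < 0 || category >= CategoryCount)
            throw new ArgumentOutOfRangeException(nameof(category));

        double[] x = new double[Length];   // starts as all zeros
        x[category] = 1.0;                 // category block begins at position 0

        // Find the age bucket: count how many bounds the age reaches or exceeds.
        int bucket = 0;
        while (bucket < AgeUpperBounds.Length && age >= AgeUpperBounds[bucket])
            bucket++;
        x[CategoryCount + bucket] = 1.0;   // age block begins after the categories

        x[Length - 1] = 1.0;               // bias term
        return x;
    }
}
```

    Under these assumptions, ToFeatures(2, 38) gives { 0, 0, 1, 0, 0, 1, 0, 1 }. In the code above, xdata[i] could then be built with Vector.FromArray(FeatureMap.ToFeatures(...)) without appending another 1, and the prior over w would need dimension 8 (Vector.Zero(8) and PositiveDefiniteMatrix.Identity(8)) rather than 3.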

    If you have a lot of features and/or feature buckets, you are better off using the sparse version of BPM.
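
    To see why sparsity helps, note that a 1-of-N feature vector is almost entirely zeros, so it can be stored as just the positions of its ones. The sketch below is plain C# for illustration only; it is not the sparse BPM model itself, and the names are hypothetical.

```csharp
public static class SparseFeatures
{
    // For 0/1 features, the inner product w . x is simply the sum of the
    // weights at the active (non-zero) positions, so the cost depends on the
    // number of active features rather than on the full vector length.
    public static double InnerProduct(double[] weights, int[] activeIndices)
    {
        double sum = 0.0;
        foreach (int i in activeIndices)
            sum += weights[i];
        return sum;
    }
}
```

    With weights { 0.5, -1.0, 2.0, 0.25 } and active indices { 2, 3 }, the result is 2.25, the same value a dense inner product with (0, 0, 1, 1) would give.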

    Tuesday, February 12, 2013 9:05 AM
    Owner