Trying to learn Infer.NET in C#

  • Question

  • Hello all, I am trying to understand how to use Infer.NET with C#. I have been looking at the tutorials and videos, and I feel that it is a great tool. However, despite being a long-time programmer in C++ and C#, I am having trouble wrapping my head around how to use Infer.NET. So, to better understand how to use it from C#, I decided to create a little project for myself. Here is what I am trying to solve.

    I have a person with characteristics such as:
      Height, chosen from the enumeration {SHORT, MEDIUM, TALL}
      Weight, chosen from the enumeration {LIGHT, MEDIUM, HEAVY}
      Hair colour, chosen from the enumeration {BROWN, BLONDE, RED, BLACK}

     That person enters a kennel full of dogs, each of whom has an identity. For the sake of argument, say that there are N such dogs in the kennel. I identify them as D[i] (for Dog # i)
     I observe the behaviour of the dogs when presented with a person. I am interested in whether the dog LICKS the person or not.

     I would like to find out the following:
     Given a person with some set of characteristics (say SHORT height, MEDIUM weight, BROWN hair), what is the probability that a particular dog will LICK that person?
     For each dog in the kennel, can I find what set of characteristics in a person will make it more likely to LICK that person? Maybe I can use this information to better match a dog to a person coming into the kennel.

    I would truly appreciate a solution in C#. I would like to think that this problem is rather elementary, which is why I chose it: a simple problem should make the program easier for me to understand.

    Thank you very much for any help.

    Friday, November 25, 2011 4:58 PM

All replies

  • You are going about this the right way. First determine what your variables are. In your case, you have specified height, weight and hair colour as discrete variables, which can be coded as integers. You might also consider treating them as continuous variables, which would affect the model you design. Once you know what your variables are, you can set about designing a model. Given that you are going with discrete variables, a starting point is to pose this as a discrete Bayes net, which you can program much as in the Discrete Bayes Net example that is part of the download. The conditional probability table (CPT) that maps the person's characteristics to the lick variable is itself a variable in the problem, so you can infer the CPT and also ask predictive questions of the model. Here is a discrete Bayes net solution to your problem. There are many ways this model could be extended or modified: for example, to cover the absence of personal attributes, to take into account that your variables may be better represented as doubles, or to determine the discriminability of dogs. The purpose of Infer.NET is to let you build a model that fits your problem rather than force a problem into an existing model structure, so you need to think carefully about your data and design models accordingly.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using MicrosoftResearch.Infer.Models;
    using MicrosoftResearch.Infer.Maths;
    using MicrosoftResearch.Infer.Distributions;
    using MicrosoftResearch.Infer;

    namespace Cybister1
    {
      public class Lick
      {
        // Sizes of the data set and of each attribute's set of values
        public Variable<int> NumPeople = Variable.New<int>().Named("NumPeople");
        public Variable<int> NumDogs = Variable.New<int>().Named("NumDogs");
        public Variable<int> NumHeight = Variable.New<int>().Named("NumHeight");
        public Variable<int> NumWeight = Variable.New<int>().Named("NumWeight");
        public Variable<int> NumHair = Variable.New<int>().Named("NumHair");

        // One attribute value per person
        public VariableArray<int> Height;
        public VariableArray<int> Weight;
        public VariableArray<int> Hair;

        // Licks[person][dog]
        public VariableArray<VariableArray<int>, int[][]> Licks;

        // CPT indexed as [height][weight][hair], with its prior and learnt posterior
        public VariableArray<VariableArray<VariableArray<Vector>, Vector[][]>, Vector[][][]> CPTLick;
        public VariableArray<VariableArray<VariableArray<Dirichlet>, Dirichlet[][]>, Dirichlet[][][]> CPTLickPrior;
        public Dirichlet[][][] CPTLickPosterior;
        public InferenceEngine Engine = new InferenceEngine();

        public Lick()
        {
          Range p = new Range(NumPeople).Named("p");
          Range d = new Range(NumDogs).Named("n");
          Range h = new Range(NumHeight).Named("h");
          Range w = new Range(NumWeight).Named("w");
          Range hr = new Range(NumHair).Named("hr");
          Height = Variable.Array<int>(p).Named("Height");
          Height.SetValueRange(h);
          Weight = Variable.Array<int>(p).Named("Weight");
          Weight.SetValueRange(w);
          Hair = Variable.Array<int>(p).Named("Hair");
          Hair.SetValueRange(hr);
          CPTLickPrior = Variable.Array(Variable.Array(Variable.Array<Dirichlet>(hr), w), h).Named("cptLickPrior");
          CPTLick = Variable.Array(Variable.Array(Variable.Array<Vector>(hr), w), h).Named("cptLick");
          CPTLick[h][w][hr] = Variable<Vector>.Random(CPTLickPrior[h][w][hr]);
          Licks = Variable.Array(Variable.Array<int>(d), p).Named("lick");

          // Every dog draws its lick outcome from the same CPT entry,
          // selected by the person's height, weight and hair
          using (Variable.ForEach(p))
          using (Variable.Switch(Height[p]))
          using (Variable.Switch(Weight[p]))
          using (Variable.Switch(Hair[p]))
            Licks[p][d] = Variable.Discrete(CPTLick[Height[p]][Weight[p]][Hair[p]]).ForEach(d);
        }

        // Learn the CPT from data = (heights, weights, hairs, licks[person][dog])
        public void LearnParameters(
          Tuple<int[], int[], int[], int[][]> data,
          Dirichlet[][][] cptLickPrior)
        {
          int[] height = data.Item1;
          int[] weight = data.Item2;
          int[] hair = data.Item3;
          int[][] licks = data.Item4;

          NumDogs.ObservedValue = licks[0].Length;
          NumPeople.ObservedValue = height.Length;
          NumHeight.ObservedValue = cptLickPrior.Length;
          NumWeight.ObservedValue = cptLickPrior[0].Length;
          NumHair.ObservedValue = cptLickPrior[0][0].Length;

          Height.ObservedValue = height;
          Weight.ObservedValue = weight;
          Hair.ObservedValue = hair;
          Licks.ObservedValue = licks;

          CPTLickPrior.ObservedValue = cptLickPrior;
          CPTLickPosterior = Engine.Infer<Dirichlet[][][]>(CPTLick);
        }

        // Overload that starts from a uniform prior on every CPT entry
        public void LearnParameters(
          Tuple<int[], int[], int[], int[][]> data,
          int numHeight,
          int numWeight,
          int numHair)
        {
          Dirichlet[][][] cptLickPrior = Enumerable.Range(0, numHeight).Select(
            i => Enumerable.Range(0, numWeight).Select(
              j => Enumerable.Range(0, numHair).Select(
                k => Dirichlet.Uniform(2)).ToArray()).ToArray()).ToArray();

          LearnParameters(data, cptLickPrior);
        }

        // Predict for one new person and one dog, using the learnt posterior as the prior
        public double WillDogLickMe(
          int height, int weight, int hair)
        {
          var cptLickPrior = this.CPTLickPosterior;
          NumDogs.ObservedValue = 1;
          NumPeople.ObservedValue = 1;
          NumHeight.ObservedValue = cptLickPrior.Length;
          NumWeight.ObservedValue = cptLickPrior[0].Length;
          NumHair.ObservedValue = cptLickPrior[0][0].Length;

          Height.ObservedValue = new int[] { height };
          Weight.ObservedValue = new int[] { weight };
          Hair.ObservedValue = new int[] { hair };
          Licks.ClearObservedValue();
          CPTLickPrior.ObservedValue = cptLickPrior;

          var WillLick = Engine.Infer<Discrete[][]>(Licks)[0][0];
          // Returns the probability of outcome 0 of the two-valued lick variable
          return WillLick.GetProbs()[0];
        }
      }

      class Program
      {
        // Sample synthetic people and lick outcomes from a known ground truth
        public static Tuple<int[], int[], int[], int[][]> Sample(
          int numPeople,
          int numDogs,
          Vector probHeight,
          Vector probWeight,
          Vector probHair,
          Vector[][][] cptLick)
        {
          int[] heights = new int[numPeople];
          int[] weights = new int[numPeople];
          int[] hairs = new int[numPeople];
          int[][] licks = new int[numPeople][];
          for (int i = 0; i < numPeople; i++)
          {
            heights[i] = Discrete.Sample(probHeight);
            weights[i] = Discrete.Sample(probWeight);
            hairs[i] = Discrete.Sample(probHair);
            licks[i] = new int[numDogs];
            for (int j = 0; j < numDogs; j++)
            {
              licks[i][j] = Discrete.Sample(cptLick[heights[i]][weights[i]][hairs[i]]);
            }
          }

          return new Tuple<int[], int[], int[], int[][]>(heights, weights, hairs, licks);
        }

        static void Main(string[] args)
        {
          // Generate some data given a random ground truth:
          Rand.Restart(11);
          int numHeight = 3;
          int numWeight = 3;
          int numHair = 4;
          Vector probHeight = Dirichlet.Uniform(numHeight).Sample();
          Vector probWeight = Dirichlet.Uniform(numWeight).Sample();
          Vector probHair = Dirichlet.Uniform(numHair).Sample();
          Vector[][][] cptLick = Enumerable.Range(0, numHeight).Select(
            i => Enumerable.Range(0, numWeight).Select(
              j => Enumerable.Range(0, numHair).Select(
                k => Dirichlet.Uniform(2).Sample()).ToArray()).ToArray()).ToArray();
          int numPeople = 1000;
          int numDogs = 10;

          Tuple<int[], int[], int[], int[][]> data = Sample(numPeople, numDogs, probHeight, probWeight, probHair, cptLick);

          // Build the model
          var model = new Lick();

          // Parameter learning
          model.LearnParameters(data, numHeight, numWeight, numHair);

          // Compare learnt probabilities with ground truth
          for (int i = 0; i < numHeight; i++)
            for (int j = 0; j < numWeight; j++)
              for (int k = 0; k < numHair; k++)
              {
                Console.WriteLine("{0}, {1}, {2}: Expected: {3}, Actual: {4}",
                i, j, k, cptLick[i][j][k], model.CPTLickPosterior[i][j][k].GetMean());
              }

          // Will the dogs lick me?
          int myHeight = 1;
          int myWeight = 2;
          int myHair = 0;

          Console.WriteLine("Dog will lick me with probability {0:0.000}", model.WillDogLickMe(myHeight, myWeight, myHair));
        }
      }
    }
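
    The model above works with integer indices rather than the named categories from the question. Here is a minimal sketch of one way to bridge the two, assuming the categories are numbered in the order the question lists them. The enum names HeightLevel, WeightLevel and HairColour and the helper PersonIndices.ToIndices are hypothetical; they are not part of Infer.NET or of the code above.

```csharp
using System;

// Hypothetical enums mirroring the categories in the question.
// The default underlying values (0, 1, 2, ...) double as model indices.
public enum HeightLevel { Short, Medium, Tall }
public enum WeightLevel { Light, Medium, Heavy }
public enum HairColour { Brown, Blonde, Red, Black }

public static class PersonIndices
{
    // Number of categories per attribute, derived from the enums so the
    // counts stay in sync with the definitions (3, 3 and 4 here).
    public static readonly int NumHeight = Enum.GetValues(typeof(HeightLevel)).Length;
    public static readonly int NumWeight = Enum.GetValues(typeof(WeightLevel)).Length;
    public static readonly int NumHair = Enum.GetValues(typeof(HairColour)).Length;

    // Convert named attributes to (height, weight, hair) integer indices.
    public static Tuple<int, int, int> ToIndices(
        HeightLevel height, WeightLevel weight, HairColour hair)
    {
        return Tuple.Create((int)height, (int)weight, (int)hair);
    }
}
```

    Under this assumed numbering, the question's example person (SHORT height, MEDIUM weight, BROWN hair) maps to the indices (0, 1, 0), which could be passed to WillDogLickMe(0, 1, 0).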

    Monday, November 28, 2011 4:20 PM
    Owner
  • I don't think John's solution solves the problem, because it assumes that the dogs are identical. The problem statement was to model each dog so that it can be matched with the appropriate person, which only makes sense if the dogs are different. You could take John's solution and learn a separate CPT for each dog, though this would require a massive amount of data, since each dog would then have 36 parameters (one per combination of 3 heights, 3 weights and 4 hair colours). To learn from a smaller amount of data, you would have to couple the dogs together (using a more sophisticated parameter prior) or use a per-dog model with fewer parameters.
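
    To make the "separate CPT for each dog" idea concrete, here is a minimal sketch in plain C#; it does not use Infer.NET. It assumes every attribute and every lick outcome is observed. In that case a uniform Dirichlet prior over the two outcomes is updated by adding counts, so the posterior mean lick probability for one dog and one attribute combination is (licks + 1) / (visits + 2). The class name PerDogCpt, the method PosteriorMeanLick and the bool[][] layout licked[person][dog] are assumptions for illustration.

```csharp
using System;

public static class PerDogCpt
{
    // Returns mean[d][h][w][c]: the posterior mean probability that dog d
    // licks a person with height h, weight w and hair colour c, under an
    // independent uniform Dirichlet(1, 1) prior for every cell.
    public static double[][][][] PosteriorMeanLick(
        int[] height, int[] weight, int[] hair, bool[][] licked,
        int numHeight, int numWeight, int numHair)
    {
        int numPeople = height.Length;
        int numDogs = licked[0].Length;

        // Count visits and licks per (dog, height, weight, hair) cell.
        var licks = new int[numDogs, numHeight, numWeight, numHair];
        var visits = new int[numDogs, numHeight, numWeight, numHair];
        for (int p = 0; p < numPeople; p++)
        {
            for (int d = 0; d < numDogs; d++)
            {
                visits[d, height[p], weight[p], hair[p]]++;
                if (licked[p][d]) licks[d, height[p], weight[p], hair[p]]++;
            }
        }

        // Posterior mean of a Dirichlet(1 + licks, 1 + visits - licks).
        var mean = new double[numDogs][][][];
        for (int d = 0; d < numDogs; d++)
        {
            mean[d] = new double[numHeight][][];
            for (int h = 0; h < numHeight; h++)
            {
                mean[d][h] = new double[numWeight][];
                for (int w = 0; w < numWeight; w++)
                {
                    mean[d][h][w] = new double[numHair];
                    for (int c = 0; c < numHair; c++)
                    {
                        mean[d][h][w][c] =
                            (licks[d, h, w, c] + 1.0) / (visits[d, h, w, c] + 2.0);
                    }
                }
            }
        }
        return mean;
    }
}
```

    A cell that was never observed stays at the prior mean of 0.5. With 36 cells per dog, many cells remain close to the prior unless each dog has met many people, which is the data requirement described above.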

    To use fewer parameters, you can think of each dog as performing its own mapping from a set of person attributes to a probability of licking. Each dog can thus be viewed as a classifier, such as a Bayes Point Machine, and the problem boils down to a variant of the Bayes Point Machine tutorial with a separate weight vector for each dog. Alternatively, you could use a Gaussian Process classifier for each dog. Both approaches require real-valued attributes. To encode a category such as hair, it is usually best to represent it as 4 binary indicators, one for each hair type. This would lead to a lot of weights, but fewer than 36.
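
    The indicator encoding described above can be sketched in plain C#. This is only the feature construction, not a Bayes Point Machine. The name PersonFeatures.Encode, the default category counts and the choice to encode height and weight with indicators in the same way as hair are assumptions for illustration.

```csharp
using System;

public static class PersonFeatures
{
    // Builds a real-valued feature vector with one binary indicator per
    // category: numHeight + numWeight + numHair entries, of which exactly
    // three are 1.0 (one per attribute) and the rest are 0.0.
    public static double[] Encode(
        int height, int weight, int hair,
        int numHeight = 3, int numWeight = 3, int numHair = 4)
    {
        if (height < 0 || height >= numHeight)
            throw new ArgumentOutOfRangeException("height");
        if (weight < 0 || weight >= numWeight)
            throw new ArgumentOutOfRangeException("weight");
        if (hair < 0 || hair >= numHair)
            throw new ArgumentOutOfRangeException("hair");

        var x = new double[numHeight + numWeight + numHair];
        x[height] = 1.0;                        // height block
        x[numHeight + weight] = 1.0;            // weight block
        x[numHeight + numWeight + hair] = 1.0;  // hair block
        return x;
    }
}
```

    For example, Encode(0, 1, 0) returns a vector of length 10 with ones at positions 0, 4 and 6. A linear classifier per dog would then need 10 weights (plus an optional bias) rather than 36 CPT entries.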

    In summary, this problem is not elementary if you want to solve it with small amounts of data per dog.

    Tuesday, November 29, 2011 10:00 AM
    Owner
  • Thank you John for your long and detailed reply. I did not realize that my problem was not elementary. I will need to play with your solution and see if I can understand it. Again, thank you for taking the time to provide me with this solution.
    Friday, December 2, 2011 2:38 AM
  • Thank you Tom for your input. Indeed, the dogs are not identical. I will take a look at the Bayes Point Machine tutorial and see how it helps me better understand my problem.
    Friday, December 2, 2011 2:44 AM