Specifying large Conditional Probability Tables of discrete events (Migrated from community.research.microsoft.com)

  • Question

  • freddycct posted on 08-11-2009 9:41 AM

    Suppose I have 3 random variables x, y and z, each of which can take 10 possible states, and let x depend on y and z. Hence I need to specify the conditional probability table P(x | y, z).

    If I specify the conditional probabilities using "using (Variable.If(...))" or "using (Variable.Case(...))" blocks, then I need to type a lot of statements.

    I understand there is an IfBlock class I could use, but there's no simple example illustrating how to use it.

    Is there any help on this, or a simple workaround?

    Friday, June 3, 2011 5:13 PM

Answers

  • jwinn replied on 08-12-2009 5:57 AM

    You should be aware that some care is needed when working with large tabular conditional probabilities such as the one you are proposing. This table involves 10x10x10 = 1000 parameters, and learning such a large set of parameters will require a huge amount of data: you will need to see every combination of the states of x, y and z multiple times to get a good estimate of the probability P(x|y,z).
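
    A worked count, using notation introduced here for illustration ($K_x$, $K_y$, $K_z$ are the numbers of states of x, y and z): the table has one entry per combination of states, but each of the $K_y K_z$ conditional distributions over x must sum to one, so one entry in each of those distributions is determined by the others.

```latex
\underbrace{K_x K_y K_z}_{\text{table entries}} = 10 \cdot 10 \cdot 10 = 1000,
\qquad
\underbrace{(K_x - 1)\,K_y K_z}_{\text{free parameters}} = 9 \cdot 10 \cdot 10 = 900
```

    Either way, the data has to cover all $K_y K_z = 100$ parent configurations (y, z), since an unconstrained table estimates the distribution over x separately for each one.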

    The problem is that you are assuming nothing about the underlying relationship between x, y and z, so that P(x|y=1,z=1) could be utterly different from P(x|y=2,z=1) or P(x|y=1,z=2). This is rarely the case. I don't know your application, but suppose x, y and z are discretisations of continuous variables into ten bins: then you would expect P(x|y,z) to vary smoothly across x and also to vary smoothly as y and z change. This smoothness is lost if you use a tabular conditional probability. Infer.NET provides lots of other kinds of factor which can be used to represent more specific relationships between x, y and z. For example, if x, y and z are discretisations of continuous variables which are linearly related, then you can represent them directly as continuous variables and specify arithmetic relationships between them, e.g. x = Ay + Bz + C, and learn A, B and C: just three parameters instead of 1000.
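
    A minimal sketch of the linear alternative, assuming the open-source Infer.NET packages (namespaces Microsoft.ML.Probabilistic.*), that y and z are observed for every data point, broad Gaussian priors on A, B and C, and a fixed noise variance. The class LinearRelationship and its Fit method are hypothetical names chosen for illustration:

```csharp
using Microsoft.ML.Probabilistic.Distributions;
using Microsoft.ML.Probabilistic.Models;

// Hypothetical helper: learns x = A*y + B*z + C + Gaussian noise from paired observations.
public static class LinearRelationship
{
    // Returns the posterior means of A, B and C.
    public static (double A, double B, double C) Fit(
        double[] yData, double[] zData, double[] xData, double noiseVariance = 1.0)
    {
        Range n = new Range(xData.Length);

        // Broad Gaussian priors on the three unknown coefficients (an assumption of this sketch).
        Variable<double> a = Variable.GaussianFromMeanAndVariance(0, 100);
        Variable<double> b = Variable.GaussianFromMeanAndVariance(0, 100);
        Variable<double> c = Variable.GaussianFromMeanAndVariance(0, 100);

        // y and z are treated as observed continuous inputs.
        VariableArray<double> y = Variable.Observed(yData, n);
        VariableArray<double> z = Variable.Observed(zData, n);

        // Each x[i] is Gaussian around the linear prediction, with a fixed noise variance.
        VariableArray<double> x = Variable.Array<double>(n);
        x[n] = Variable.GaussianFromMeanAndVariance(a * y[n] + b * z[n] + c, noiseVariance);
        x.ObservedValue = xData;

        // Uses the engine's default inference algorithm.
        var engine = new InferenceEngine();
        engine.ShowProgress = false;
        return (engine.Infer<Gaussian>(a).GetMean(),
                engine.Infer<Gaussian>(b).GetMean(),
                engine.Infer<Gaussian>(c).GetMean());
    }
}
```

    Here the relationship is captured by three learned quantities rather than one probability per cell of a 10x10x10 table. Learning the noise level as well (for example with a Gamma prior on the precision) is a natural extension that this sketch leaves out.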

    I hope this makes sense and explains why we have many other kinds of relationships available in Infer.NET as well as conditional probability tables.

    Best,

    John W.

All replies

  • John Guiver replied on 08-11-2009 12:29 PM

    Useful question. I don't think we have example code in the user guide, but here is a succinct way to do it using the Variable.Switch statement. I have shown this for 2x2x2, but the approach is the same for any number of states:

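    using Microsoft.ML.Probabilistic.Distributions;
    using Microsoft.ML.Probabilistic.Math;
    using Microsoft.ML.Probabilistic.Models;

    // cpt[j, k] holds P(x | y = j, z = k); each Vector must sum to 1.
    Vector[,] cpt =
    {
        { Vector.FromArray(0.1, 0.9), Vector.FromArray(0.3, 0.7) },
        { Vector.FromArray(0.5, 0.5), Vector.FromArray(0.4, 0.6) }
    };

    Range xRange = new Range(cpt[0, 0].Count);   // states of x
    Range yRange = new Range(cpt.GetLength(0));  // states of y
    Range zRange = new Range(cpt.GetLength(1));  // states of z

    var x = Variable.New<int>();
    x.SetValueRange(xRange);
    var y = Variable.DiscreteUniform(yRange);
    var z = Variable.DiscreteUniform(zRange);

    // The whole table is passed in as a single observed 2-D array of Vectors.
    var probs = Variable.Array<Vector>(yRange, zRange);
    probs.ObservedValue = cpt;

    // Within each Switch block, y and z are treated as fixed, so probs[y, z]
    // picks out the row of the table that defines x.
    using (Variable.Switch(y))
    {
        using (Variable.Switch(z))
        {
            x.SetTo(Variable.Discrete(probs[y, z]));
        }
    }

    var engine = new InferenceEngine();
    Discrete xpost = engine.Infer<Discrete>(x);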

  • freddycct replied on 08-12-2009 6:48 AM

    Thanks for all the replies. I exaggerated a bit on the problem. The conditional table I have is P(x | y, z), where x and y have 5 states and z has 2 states. So I do have a total of 5x5x2 = 50 parameters, which I think is still a hassle to type in. I am still exploring Bayesian networks, so I am modeling my application using discrete variables instead of continuous variables.
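
    For a table of this size, one way to avoid typing nested Vector literals is to enter the 50 numbers as one flat array and reshape it in code. The sketch below is plain C# with no Infer.NET dependency; CptBuilder and FromFlat are hypothetical names, and the flat array is assumed to be ordered by y, then z, then x (row-major):

```csharp
using System;

// Hypothetical helper: reshapes a flat, row-major list of probabilities into
// table[yState, zState][xState] = P(x = xState | y = yState, z = zState).
public static class CptBuilder
{
    public static double[,][] FromFlat(double[] flat, int yStates, int zStates, int xStates)
    {
        if (flat.Length != yStates * zStates * xStates)
            throw new ArgumentException(
                $"Expected {yStates * zStates * xStates} numbers but got {flat.Length}.");

        var table = new double[yStates, zStates][];
        int offset = 0;
        for (int j = 0; j < yStates; j++)
        {
            for (int k = 0; k < zStates; k++)
            {
                // Copy the next xStates numbers as the distribution for (y = j, z = k).
                var row = new double[xStates];
                Array.Copy(flat, offset, row, 0, xStates);
                offset += xStates;

                // Each conditional distribution over x must sum to one.
                double sum = 0;
                foreach (double p in row) sum += p;
                if (Math.Abs(sum - 1.0) > 1e-9)
                    throw new ArgumentException($"Row (y={j}, z={k}) sums to {sum}, not 1.");

                table[j, k] = row;
            }
        }
        return table;
    }
}
```

    For example, CptBuilder.FromFlat(new[] { 0.1, 0.9, 0.3, 0.7, 0.5, 0.5, 0.4, 0.6 }, 2, 2, 2) reproduces the 2x2x2 table from John Guiver's reply. Wrapping each cell in an Infer.NET Vector (Vector.FromArray(...) in current releases) gives the Vector[,] that is assigned to probs.ObservedValue. For the 5x5x2 case the call would be FromFlat(numbers, 5, 2, 5), where numbers is your hypothetical array of 50 values.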
