Specifying large Conditional Probability Tables of discrete events (Migrated from community.research.microsoft.com)
Question

freddycct posted on 08-11-2009 9:41 AM
Suppose I have three random variables x, y and z, each of which can take 10 possible states. Let x depend on y and z, so I need to specify the conditional probability table P(x | y, z).
If I specify the conditional probabilities with "using (Variable.If(...))" or "using (Variable.Case(...))" statements, I have to type a lot of statements.
I understand there is an IfBlock to use, but there is no simple example illustrating how to use it.
Is there any help on this, or a simple workaround?
Friday, June 3, 2011 5:13 PM
Answers

jwinn replied on 08-12-2009 5:57 AM
You should be aware that some care is needed when working with large tabular conditional probabilities such as the one you are proposing. This table involves 10x10x10 = 1000 parameters. Learning such a large set of parameters will require a huge amount of data: you will need to see every combination of the states of x, y and z multiple times to get a good estimate of the probability P(x | y, z).
The problem is that you are assuming nothing about the underlying relationship between x, y and z, so that P(x | y=1, z=1) could be utterly different from P(x | y=2, z=1) or P(x | y=1, z=2). This is rarely the case. I don't know your application, but suppose x, y and z are discretisations of continuous variables into ten bins. Then you would expect P(x | y, z) to vary smoothly across x and also to vary smoothly as y and z change. This smoothness is lost if you use a tabular conditional probability. Infer.NET provides lots of other kinds of factor which can be used to represent more specific relationships between x, y and z. For example, if x, y and z are discretisations of continuous variables which are linearly related, then you can represent them directly as continuous variables and specify arithmetic relationships between them, e.g. x = Ay + Bz + C, and learn A, B and C: just three parameters instead of 1000.
I hope this makes sense and explains why we have many other kinds of relationships available in Infer.NET as well as conditional probability tables.
Best,
John W.
 Marked as answer by Microsoft Research Friday, June 3, 2011 5:13 PM
Friday, June 3, 2011 5:13 PM
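Editorial sketch (not part of the original reply): the linear alternative described above might look roughly like the following. It assumes the current open-source Infer.NET namespaces (Microsoft.ML.Probabilistic.*), which differ from those of the 2009 release discussed in this thread. The toy data, the broad Gaussian priors and the noise variance of 1.0 are illustrative assumptions, not values from the post.

```csharp
using System;
using Microsoft.ML.Probabilistic.Models;
using Microsoft.ML.Probabilistic.Distributions;

public static class LinearSketch
{
    public static void Main()
    {
        // Hypothetical toy data, generated by hand from x = 2y - z + 0.5.
        double[] yData = { 0.0, 1.0, 2.0, 3.0, 4.0 };
        double[] zData = { 1.0, 0.0, 2.0, 1.0, 3.0 };
        double[] xData = { -0.5, 2.5, 2.5, 5.5, 5.5 };

        // Three unknown parameters with broad Gaussian priors (assumed choice).
        Variable<double> a = Variable.GaussianFromMeanAndVariance(0, 100);
        Variable<double> b = Variable.GaussianFromMeanAndVariance(0, 100);
        Variable<double> c = Variable.GaussianFromMeanAndVariance(0, 100);

        Range n = new Range(xData.Length);
        VariableArray<double> y = Variable.Observed(yData, n);
        VariableArray<double> z = Variable.Observed(zData, n);
        VariableArray<double> x = Variable.Array<double>(n);

        // x = A*y + B*z + C plus Gaussian noise with an assumed variance of 1.
        x[n] = Variable.GaussianFromMeanAndVariance(a * y[n] + b * z[n] + c, 1.0);
        x.ObservedValue = xData;

        // Infer Gaussian posteriors over the three parameters.
        var engine = new InferenceEngine();
        Console.WriteLine("A: " + engine.Infer<Gaussian>(a));
        Console.WriteLine("B: " + engine.Infer<Gaussian>(b));
        Console.WriteLine("C: " + engine.Infer<Gaussian>(c));
    }
}
```

The point of the sketch is the parameter count: however finely x, y and z are measured, only A, B and C are learned, whereas the tabular model grows with the product of the numbers of states.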
All replies

John Guiver replied on 08-11-2009 12:29 PM
Useful question. I don't think we have example code in the user guide, but here is a succinct way to do it using Variable.Switch. I have shown this for 2x2x2, but the approach is the same for any number of states:
    // cpt[y, z] is the distribution P(x | y, z), stored as a probability vector over the states of x.
    // Note: later Infer.NET releases may need Vector.FromArray(...) in place of new Vector(...).
    Vector[,] cpt =
    {
        { new Vector(0.1, 0.9), new Vector(0.3, 0.7) },
        { new Vector(0.5, 0.5), new Vector(0.4, 0.6) }
    };
    var x = Variable.New<int>();
    // The ranges of y and z are taken from the dimensions of the table.
    Range yRange = new Range(cpt.GetLength(0));
    Range zRange = new Range(cpt.GetLength(1));
    var y = Variable.DiscreteUniform(yRange);
    var z = Variable.DiscreteUniform(zRange);
    // Wrap the table in a 2-D variable array so it can be indexed by y and z.
    var probs = Variable.Array<Vector>(yRange, zRange);
    probs.ObservedValue = cpt;
    // Switch over every value of y and z; inside the blocks, x is drawn from P(x | y, z).
    using (Variable.Switch(y))
    {
        using (Variable.Switch(z))
        {
            x.SetTo(Variable.Discrete(probs[y, z]));
        }
    }
    var engine = new InferenceEngine();
    var xpost = engine.Infer(x);
Friday, June 3, 2011 5:13 PM
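Editorial sketch (not part of the original reply): since y and z are uniform and nothing is observed, the marginal of x in the model above is simply the average of the table rows. That can be checked by hand with a few lines of plain C# that do not depend on Infer.NET. The class and method names here are hypothetical.

```csharp
using System;

public static class CptCheck
{
    // Marginalises a table cpt[y, z][x] over y and z, assuming y and z are
    // independent and uniformly distributed, as in the model above.
    public static double[] MarginalOfX(double[,][] cpt)
    {
        int ny = cpt.GetLength(0);
        int nz = cpt.GetLength(1);
        int nx = cpt[0, 0].Length;
        var px = new double[nx];
        for (int y = 0; y < ny; y++)
            for (int z = 0; z < nz; z++)
                for (int x = 0; x < nx; x++)
                    px[x] += cpt[y, z][x] / (ny * nz);   // weight 1/(ny*nz) per (y, z) pair
        return px;
    }
}
```

For the 2x2x2 table in the post, this gives approximately (0.325, 0.675), which is the value the inferred posterior over x should be close to.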
freddycct replied on 08-12-2009 6:48 AM
Thanks for all the replies. I exaggerated a bit on the problem. The conditional table I have is P(x | y, z), where x and y each have 5 states and z has 2 states. So I have a total of 5x5x2 = 50 parameters, which I think is still a hassle to type in. I am still exploring Bayesian networks, so I am modeling my application using discrete variables instead of continuous variables.
Friday, June 3, 2011 5:13 PM
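Editorial sketch (not part of the original thread): one way to avoid typing 50 vector constructors is to keep the probabilities as a flat list of numbers (for example pasted from a spreadsheet or read from a file) and reshape them into the 2-D table in a loop. The helper below is hypothetical and library-independent. It assumes the numbers are ordered with y varying slowest, then z, then x, and it takes the vector constructor as a delegate because that call differs between Infer.NET releases.

```csharp
using System;

public static class CptBuilder
{
    // Reshapes a flat list of probabilities (y slowest, then z, then x) into
    // table[y, z], converting each row of nx values with makeVector.
    public static T[,] FromFlat<T>(double[] flat, int ny, int nz, int nx,
                                   Func<double[], T> makeVector)
    {
        if (flat.Length != ny * nz * nx)
            throw new ArgumentException(
                $"Expected {ny * nz * nx} values but got {flat.Length}.");

        var table = new T[ny, nz];
        for (int y = 0; y < ny; y++)
        {
            for (int z = 0; z < nz; z++)
            {
                var row = new double[nx];
                Array.Copy(flat, (y * nz + z) * nx, row, 0, nx);

                // Each row is a distribution over x, so it must sum to one.
                double sum = 0;
                foreach (double p in row) sum += p;
                if (Math.Abs(sum - 1.0) > 1e-6)
                    throw new ArgumentException(
                        $"Row for y={y}, z={z} sums to {sum}, not 1.");

                table[y, z] = makeVector(row);
            }
        }
        return table;
    }
}
```

With the 5x5x2 table described above (y has 5 states, z has 2, x has 5), the call would be along the lines of CptBuilder.FromFlat(values, 5, 2, 5, row => Vector.FromArray(row)), and the result can be assigned to probs.ObservedValue exactly as in the 2x2x2 example. Vector.FromArray is the factory in recent Infer.NET releases. With the release used in this thread, row => new Vector(row) is the likely equivalent, though that is an assumption.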