Specifying large Conditional Probability Tables of discrete events (Migrated from community.research.microsoft.com)
Friday, June 03, 2011 5:13 PMOwner
freddycct posted on 08-11-2009 9:41 AM
Suppose I have 3 random variables x, y and z, each of which can take 10 possible states. Let
x depend on y and z, so I need to specify the conditional probability table P(x | y, z).
If I specify the conditional probabilities using the "using Variable.if(case) .... " statements, then I need to type a lot of statements.
I understand there is an ifblock to use, but there's no simple example to illustrate how to use it.
Is there any help on this, or a simple workaround?
All Replies
John Guiver replied on 08-11-2009 12:29 PM
Useful question. I don't think we have example code in the user guide, but here is a succinct way to do it using the Variable.Switch statement. I have shown this for 2x2x2, but the approach is the same for any number of states:
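// Conditional probability table P(x | y, z): cpt[y, z] is the distribution over x.
Vector[,] cpt =
{
    { new Vector(0.1, 0.9), new Vector(0.3, 0.7) },
    { new Vector(0.5, 0.5), new Vector(0.4, 0.6) }
};
var x = Variable.New<int>();
Range yRange = new Range(cpt.GetLength(0));
Range zRange = new Range(cpt.GetLength(1));
// Uniform priors over the parent variables y and z.
var y = Variable.DiscreteUniform(yRange);
var z = Variable.DiscreteUniform(zRange);
// Pass the table to the model as an observed 2-D array of probability vectors.
var probs = Variable.Array<Vector>(yRange, zRange);
probs.ObservedValue = cpt;
// For each value of (y, z), x is drawn from the matching entry of the table.
using (Variable.Switch(y))
{
    using (Variable.Switch(z))
    {
        x.SetTo(Variable.Discrete(probs[y, z]));
    }
}
var engine = new InferenceEngine();
var xpost = engine.Infer(x);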
jwinn replied on 08-12-2009 5:57 AM
You should be aware that some care is needed when working with large tabular conditional probabilities such as the one you are proposing. This table involves 10x10x10=1000 parameters. Learning such a large set of parameters will require a huge amount of data: you will need to see every combination of the states of x, y and z multiple times to get a good estimate of the probability P(x|y,z).
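As a rough check on the size of such a table, the arithmetic can be sketched in plain C#. The class and method names below are made up for illustration and are not part of Infer.NET. The sketch also shows a detail that follows from probabilities summing to 1: for each fixed (y, z) the last entry is determined by the others, so the number of free parameters is somewhat smaller than the number of table entries.

```csharp
using System;

public static class CptSize
{
    // Entries in a table P(x | y, z): one probability per (x, y, z) combination.
    public static int Entries(int xStates, int yStates, int zStates)
        => xStates * yStates * zStates;

    // Each distribution over x (for a fixed y, z) sums to 1,
    // so only xStates - 1 of its entries can be chosen freely.
    public static int FreeParameters(int xStates, int yStates, int zStates)
        => (xStates - 1) * yStates * zStates;

    public static void Main()
    {
        Console.WriteLine(Entries(10, 10, 10));        // prints 1000
        Console.WriteLine(FreeParameters(10, 10, 10)); // prints 900
    }
}
```

Either way, the count grows multiplicatively with the number of states of every variable involved, which is the point being made above.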
The problem is that you are assuming nothing about the underlying relationship between x, y and z, so that P(x|y=1,z=1) could be utterly different from P(x|y=2,z=1) or P(x|y=1,z=2). This is rarely the case. I don't know your application, but suppose x, y and z are discretisations of continuous variables into ten bins. You would then expect P(x|y,z) to vary smoothly across x and also to vary smoothly as y and z change. This smoothness is lost if you use a tabular conditional probability. Infer.NET provides lots of other kinds of factor which can be used to represent more specific relationships between x, y and z. For example, if x, y and z are discretisations of continuous variables which are linearly related, then you can represent them directly as continuous variables and specify arithmetic relationships between them, e.g. x = Ay + Bz + C, and learn A, B and C: just three parameters instead of 1000.
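To make the smoothness argument concrete, here is a minimal sketch in plain C#. It is not the Infer.NET continuous model described above. It only illustrates how a full 10x10x10 table can be generated from a handful of numbers when the relationship is linear. Everything here is an assumption made for illustration: the names `SmoothCpt`, `Build` and `ArgMax`, the Gaussian-shaped noise of width `sigma`, and the coefficient values used in `Main`.

```csharp
using System;

public static class SmoothCpt
{
    // Builds P(x | y, z) over bin indices 0..states-1 from a hypothetical linear rule
    // x ~ a*y + b*z + c, with Gaussian-shaped noise of width sigma, renormalised over the bins.
    // Assumes a*y + b*z + c stays near the bin range, so the weights do not all underflow to 0.
    public static double[,][] Build(int states, double a, double b, double c, double sigma)
    {
        var cpt = new double[states, states][];
        for (int y = 0; y < states; y++)
        {
            for (int z = 0; z < states; z++)
            {
                double mean = a * y + b * z + c;
                var row = new double[states];
                double total = 0.0;
                for (int x = 0; x < states; x++)
                {
                    double d = (x - mean) / sigma;
                    row[x] = Math.Exp(-0.5 * d * d); // unnormalised weight for bin x
                    total += row[x];
                }
                for (int x = 0; x < states; x++) row[x] /= total; // make the row sum to 1
                cpt[y, z] = row;
            }
        }
        return cpt;
    }

    // Index of the most probable bin in one distribution.
    public static int ArgMax(double[] row)
    {
        int best = 0;
        for (int i = 1; i < row.Length; i++)
            if (row[i] > row[best]) best = i;
        return best;
    }

    public static void Main()
    {
        // Illustrative coefficients only: x ~ 0.5*y + 0.4*z.
        var cpt = Build(10, 0.5, 0.4, 0.0, 1.0);
        Console.WriteLine(ArgMax(cpt[4, 5])); // prints 4, since 0.5*4 + 0.4*5 = 4
    }
}
```

All 1000 entries here are determined by `a`, `b`, `c` and `sigma`, and neighbouring (y, z) cells get similar distributions over x. A free-form table has no such constraint, which is why it needs so much more data to estimate.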
I hope this makes sense and explains why we have many other kinds of relationships available in Infer.NET as well as conditional probability tables.
Best,
John W.
- Marked As Answer by Microsoft ResearchOwner Friday, June 03, 2011 5:13 PM
freddycct replied on 08-12-2009 6:48 AM
Thanks for all the replies. I exaggerated the problem a bit. The conditional table I have is P(x | y, z), where x and y have 5 states and z has 2 states. So I have a total of 5x5x2 = 50 parameters, which I think is still a hassle to type. I am still exploring Bayesian networks, so I am modeling my application using discrete variables instead of continuous variables.
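For a table of this size, one option is to fill it in loops rather than type each entry. The following is a minimal sketch in plain C#, assuming the probabilities are to be estimated from observed (x, y, z) index triples. The names `CptFromCounts` and `Estimate`, the sample data, and the pseudo-count smoothing are all hypothetical choices made for illustration, not something the thread or Infer.NET prescribes.

```csharp
using System;

public static class CptFromCounts
{
    // Estimates P(x | y, z) from samples, where each sample is { x, y, z } as state indices.
    // pseudoCount (must be > 0) is added to every cell so that unseen combinations
    // still get a valid, non-zero distribution.
    public static double[,][] Estimate(int[][] samples, int xStates, int yStates, int zStates,
                                       double pseudoCount)
    {
        var cpt = new double[yStates, zStates][];
        for (int y = 0; y < yStates; y++)
        {
            for (int z = 0; z < zStates; z++)
            {
                cpt[y, z] = new double[xStates];
                for (int x = 0; x < xStates; x++) cpt[y, z][x] = pseudoCount;
            }
        }

        // Count how often each x occurs for each (y, z) combination.
        foreach (var s in samples) cpt[s[1], s[2]][s[0]] += 1.0;

        // Normalise each distribution over x so that it sums to 1.
        for (int y = 0; y < yStates; y++)
        {
            for (int z = 0; z < zStates; z++)
            {
                double total = 0.0;
                foreach (double v in cpt[y, z]) total += v;
                for (int x = 0; x < xStates; x++) cpt[y, z][x] /= total;
            }
        }
        return cpt;
    }

    public static void Main()
    {
        // Made-up samples: x = 0 seen twice and x = 1 seen once, all with y = 0, z = 0.
        int[][] samples = { new[] { 0, 0, 0 }, new[] { 0, 0, 0 }, new[] { 1, 0, 0 } };
        var cpt = Estimate(samples, 5, 5, 2, 1.0);
        Console.WriteLine(cpt[0, 0][0]); // prints 0.375, i.e. (2 + 1) / 8
    }
}
```

Each `cpt[y, z]` array could then be wrapped in a Vector and assigned to `probs.ObservedValue`, following the pattern in John Guiver's reply above. The same loops would work if the numbers came from a file instead of from counts.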