Model selection (Migrated from community.research.microsoft.com)
Question

Ravi Pandya posted on 08-07-2010 9:19 AM
Merging threads and posting our email discussion to the forum per John's request.
John: A goal with the model condition variables was to share computation among the different possible networks: e.g. a->b->c and a->b share a common edge, and there might be statistics computed from the observations that could be shared. I can share code without it, and it was probably premature (or misguided) optimization.
Tom: Copying y0 doesn't help, unfortunately. I did figure out the AllZeroException problem: I was directly observing the same Gaussian variable in each of the observation data sets, so I had a set of point masses, one for each observation. I made the observations an array of Gaussians centered on a common variable, and that fixed it.
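A plain C# sketch (not Infer.NET code) of why this fix helps, assuming a shared mean mu with a Gaussian prior and each observation drawn from a Gaussian centered on mu with a fixed noise variance. Precisions add, and the posterior mean is a precision-weighted average. Letting the noise variance go to zero turns each observation into a point mass on mu, and distinct point masses cannot all hold at once, which is the kind of conflict an AllZeroException reports. The function name, prior, and observation values are hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper: conjugate posterior for a shared mean mu given
//   mu ~ N(priorMean, priorVar)  and  ys[k] ~ N(mu, noiseVar)  for every k.
// Precisions (inverse variances) add; the mean is precision-weighted.
static (double Mean, double Variance) PosteriorOfSharedMean(
    double priorMean, double priorVar, double[] ys, double noiseVar)
{
    if (noiseVar <= 0)
        // With zero noise each observation is a point mass on mu; distinct
        // observed values cannot all be true, so no valid posterior exists.
        throw new ArgumentException("noiseVar must be positive");
    double precision = 1.0 / priorVar + ys.Length / noiseVar;
    double weightedSum = priorMean / priorVar + ys.Sum() / noiseVar;
    return (weightedSum / precision, 1.0 / precision);
}

var ys = new[] { 0.9, 1.1, 1.0 };   // hypothetical observations
var (mean, variance) = PosteriorOfSharedMean(0.0, 10.0, ys, 0.1);
Console.WriteLine($"posterior mean {mean:F4}, variance {variance:F4}");
```

Adding more observations shrinks the posterior variance smoothly instead of forcing mu to equal several different values at once.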
With these changes, and switching to VMP, it now successfully runs model selection over all 25 possible topologies (all non-cyclic 3-node directed graphs)! There is still one AllZeroException, but inference still proceeds; I think it might be running out of precision, since that topology is just a permutation of one that works fine in other cases.
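As a sanity check on the count of 25, the following C# sketch (plain .NET, independent of Infer.NET) enumerates every subset of the six possible directed edges among three labeled nodes and keeps the acyclic ones, using Kahn's algorithm as the cycle test. It counts the empty graph and graphs with isolated nodes as distinct topologies, which is what makes the total come out to 25.

```csharp
using System;
using System.Collections.Generic;

// Enumerate every directed acyclic graph (DAG) on N labeled nodes.
// Each of the N*(N-1) possible directed edges is present or absent,
// giving 2^6 = 64 candidate graphs for N = 3; keep only the acyclic ones.
const int N = 3;
var edges = new List<(int From, int To)>();
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        if (i != j) edges.Add((i, j));

var dags = new List<List<(int From, int To)>>();
for (int mask = 0; mask < (1 << edges.Count); mask++)
{
    var graph = new List<(int From, int To)>();
    for (int e = 0; e < edges.Count; e++)
        if ((mask & (1 << e)) != 0) graph.Add(edges[e]);
    if (IsAcyclic(graph, N)) dags.Add(graph);
}

Console.WriteLine($"{dags.Count} DAGs on {N} labeled nodes");

// Kahn's algorithm: a graph is acyclic iff repeatedly removing nodes
// with in-degree zero eventually removes every node.
static bool IsAcyclic(List<(int From, int To)> graph, int n)
{
    var inDegree = new int[n];
    foreach (var (_, to) in graph) inDegree[to]++;
    var queue = new Queue<int>();
    for (int v = 0; v < n; v++)
        if (inDegree[v] == 0) queue.Enqueue(v);
    int removed = 0;
    while (queue.Count > 0)
    {
        int v = queue.Dequeue();
        removed++;
        foreach (var (from, to) in graph)
            if (from == v && --inDegree[to] == 0) queue.Enqueue(to);
    }
    return removed == n;
}
```

Running it prints `25 DAGs on 3 labeled nodes`; the same enumeration could drive the construction of one separate model per topology.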
The model probabilities are all quite low (LogOdds ranging from -25 to -110!). I think I need to add more realistic priors and validate the factor graphs and calculations in more depth, so that I better understand what the model is doing.
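For reading numbers like these: a log-odds value L = log(p/(1-p)) maps back to p = 1/(1+e^(-L)), so a LogOdds of -25 corresponds to a probability of about 1.4e-11. When each topology is built as its own model, a common way to compare them is to normalize per-model log evidence into posterior model probabilities under a uniform prior over topologies. The C# sketch below shows both conversions; the function names and the three log-evidence values are hypothetical and not taken from the thread.

```csharp
using System;
using System.Linq;

// Convert a log-odds value L = log(p / (1 - p)) back to a probability p.
// The two branches keep Math.Exp from overflowing for large |L|.
static double LogOddsToProbability(double logOdds) =>
    logOdds >= 0
        ? 1.0 / (1.0 + Math.Exp(-logOdds))
        : Math.Exp(logOdds) / (1.0 + Math.Exp(logOdds));

// Normalize per-model log evidence into posterior model probabilities,
// assuming a uniform prior over models (it cancels in the ratio).
// Subtracting the maximum first (log-sum-exp trick) avoids underflow.
static double[] PosteriorOverModels(double[] logEvidence)
{
    double max = logEvidence.Max();
    double[] weights = logEvidence.Select(l => Math.Exp(l - max)).ToArray();
    double total = weights.Sum();
    return weights.Select(w => w / total).ToArray();
}

Console.WriteLine(LogOddsToProbability(-25.0));   // about 1.4e-11

// Hypothetical log-evidence values for three candidate topologies.
double[] posterior = PosteriorOverModels(new[] { -10.0, -12.0, -15.0 });
Console.WriteLine(string.Join(", ", posterior.Select(p => p.ToString("F4"))));
```

Only differences in log evidence matter here, so very negative absolute values do not by themselves mean a topology is implausible relative to the others.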
But this is great progress, thanks!
Ravi
Hi Ravi
We don’t support a Gaussian Product evidence message operator when both arguments are random variables. You could implement the evidence operator yourself; Tom would be the one to advise. In general, we use VMP for models with product factors, and your model should compile with VMP, or if the product is moved outside the Variable.If block.
A couple of other points:
(a) I am having difficulty understanding what you are trying to achieve with your model condition variable. Why are you trying to link up your models? Is it that you are trying to use a single model instance and using observations on the variables to configure the topology? This approach will avoid model recompilation, but is that important for your experiments? It will not save you any coding effort or gain you any code reuse, as you can already share as much code between models as you want without linking them up.
(b) As we are investing some time in this conversation, and the level of the conversation is not that proprietary and is of general interest, I would prefer to answer questions like this on the public forum so the whole community can benefit. Can I ask you to consider posting general questions on the forum? (I’m also happy to continue answering questions by email if you prefer :-) )
John
_____________________________________________
From: Tom Minka
Sent: Friday, August 06, 2010 9:09 AM
To: Ravi Pandya; John Guiver
Cc: John Winn
Subject: RE: Model selection
I think the issue is that this line:
y.SetTo(y0);
should be:
y.SetTo(Variable.Copy(y0));
Tom
_____________________________________________
From: Tom Minka
Sent: Friday, August 06, 2010 9:13 AM
To: Ravi Pandya; John Guiver
Cc: John Winn
Subject: RE: Model selection
The AllZeroException is referring to equilibrium0_rep0_B, not equilibrium0_uses_B. Check that this array does not contain conflicting point masses.
Tom
_____________________________________________
From: Ravi Pandya
Sent: 06 August 2010 14:51
To: John Guiver
Cc: John Winn; Tom Minka
Subject: RE: Model selection
I was able to build separate models for each network instead, so there’s no pressing need to fix this. I took a look at the SharedVariable docs, and it might be able to do what I was intending with the condition variables. I’ll try to get the simple separate models working first and then work on optimization (a good strategy in general :-) ).
As I try to model the network behavior more accurately, I am running into a recurring issue with the product of Gaussians. Here’s a small sample that illustrates the problem:
static void CompositionSample3()
{
    var x = Variable.GaussianFromMeanAndVariance(1.0, 0.1);
    var edge = Variable.GaussianFromMeanAndVariance(0.5, 0.3);
    var y0 = Variable.GaussianFromMeanAndVariance(1.0, 0.1);
    var y = Variable.New<double>();
    // model selection: true => y depends on x via edge, false => y = y0
    var model = Variable.Bernoulli(0.5);
    using (Variable.If(model))
    {
        // product of random variables; the "*y0" term is the one causing the error below
        y.SetTo(x * edge * y0);
    }
    using (Variable.IfNot(model))
    {
        y.SetTo(y0);
    }
    x.ObservedValue = 0.6;
    y.ObservedValue = 0.7;
    var engine = new InferenceEngine();
    var result = engine.Infer<Bernoulli>(model);
}
This gives me a set of errors during model construction like the following:
[0] System.ArgumentException: double is not of type Gaussian for argument 1 of method GaussianProductEvidenceOp.LogEvidenceRatio
    Parameter   Provided   Expected
    ---------   --------   --------
    product     double     Gaussian
    a           Gaussian   Gaussian
    b           Gaussian   double
If I remove the “*y0” term from the first branch, it works fine. I run into similar exceptions if I introduce Gaussian observation noise (i.e. keep x & y hidden, but observe xo & yo that are Gaussians with means x & y and some constant variance). Do you have any suggestions here? Could I implement a new Factor that would enable this?
Thanks for all your help,
Ravi
_____________________________________________
From: John Guiver
Sent: Friday, August 06, 2010 6:36 AM
To: Ravi Pandya
Cc: John Winn; Tom Minka
Subject: RE: Model selection
Hi Ravi
Thanks for this. The problem is in the lines
using (Variable.If(i[Variable.Constant(0)]))
{
    x.SetTo(x0);
}
where i is random and x0 is constant.
Although this is a reasonable thing to do from a model writer’s perspective, that particular construct is not currently supported. We were not planning on fixing this in the short term, as (a) there are higher-priority items on our TODO list and (b) there are better ways to do this. Also, I don’t think it is the pattern you want to be using; however, let us know if you want us to address this. If your models have a lot in common, you should be able to avoid code duplication by putting common code into method calls or base classes, or by passing a topology flag down to your model construction code. The important thing is to make sure the variable instances are different between your different topologies. Let me know if you need further help with this, and let us know how you get on with your model comparison.
John
_____________________________________________
From: Ravi Pandya
Sent: 02 August 2010 15:22
To: John Guiver; John Winn
Subject: RE: Model selection
I simplified part of the model and that error is no longer happening. I then got an error “Cannot define constant variable in a stochastic context”, which I figured out was due to my trying to share the same model for different model selection runs. I now build a distinct model for each network topology and have it running. The next step is to feed in some real data and see what kind of results I get.
In case you’re interested, here is a small code snippet that demonstrates the model sharing issue. I thought it might be worth trying to share the model between topologies, because there is a lot of commonality and with my data set (~1000 observations) there could be a lot of duplicated work.
static void ConditionalModelSample()
{
    // range over possible models
    var r = new Range(2).Named("r");
    // model selection
    var i = Variable.Array<bool>(r).Named("i");
    // hidden variable dependent on model
    var x = Variable.Array<bool>(r).Named("x");
    // observed variable
    var o = Variable.Array<double>(r).Named("o");
    using (Variable.ForEach(r))
    {
        i[r] = Variable.Bernoulli(0.5);
        // condition observed on hidden
        using (Variable.If(x[r]))
        {
            o[r] = Variable.GaussianFromMeanAndVariance(1.0, 0.1);
        }
        using (Variable.IfNot(x[r]))
        {
            o[r] = Variable.GaussianFromMeanAndVariance(0.0, 0.1);
        }
    }
    var x0 = Variable.Constant(new[] { true, false });
    var x1 = Variable.Constant(new[] { false, true });
    // condition hidden variable on model selection variable
    using (Variable.If(i[Variable.Constant(0)]))
    {
        x.SetTo(x0);
    }
    using (Variable.If(i[Variable.Constant(1)]))
    {
        x.SetTo(x1);
    }
    // observe
    o.ObservedValue = new double[] { 0.95, 0.05 };
    // infer model probabilities
    var engine = new InferenceEngine();
    engine.BrowserMode = BrowserMode.Always;
    engine.Infer(i[Variable.Constant(0)]);
}
Thanks,
Ravi
_____________________________________________
From: John Guiver
Sent: Saturday, July 31, 2010 11:15 AM
To: Ravi Pandya; John Winn
Subject: RE: Model selection
Ravi, if you could whittle down your model(s) to a simple example and email us the code, it will let us get to the root of the problem much more quickly.
Thanks
John
_____________________________________________
From: Ravi Pandya
Sent: 31 July 2010 17:08
To: John Winn; John Guiver
Subject: RE: Model selection
FYI, I also tried using an array of Boolean variables to select each network topology, as in the user guide, but I get the same error (actually “Model was null”). It might just be that my model is too complex and it can’t figure out a way to do model selection. I’ll see if I can simplify it.
Infer.NET is quite cool – the API for building models is quite elegant, despite the limitations of C# as a domain-specific language. Nice work!
Ravi
_____________________________________________
From: Ravi Pandya
Sent: Saturday, July 31, 2010 6:49 AM
To: John Winn; John Guiver
Subject: Model selection
I’m trying to use Infer.NET to do network structure inference for a gene regulatory network. I have a discrete topology variable with 25 values, one for each non-cyclic 3-node network. I use Case blocks to build the model for each network, using condition variables so they can share structure (e.g. edges) where possible. I have the model building successfully (though there were some oddities about ForEach that I don’t quite understand), but when I try to run inference I get a NullReferenceException “Model is null” and I can’t see any way to debug it. Do you have any suggestions on how to figure it out?
Thanks,
Ravi Pandya
Architect
eXtreme Computing Group (XCG)
http://research.microsoft.com/enus/labs/xcg/default.aspx
Friday, June 3, 2011 5:58 PM