Supervised LDA
Question

Dear Infer.NET community:
As I am still learning the details of Infer.NET, I came across an implementation need that I can't seem to get right after 2 days of trying a couple of things; perhaps I am going in the wrong direction.
I am trying to implement Supervised LDA. The model itself doesn't matter as much as the issue I am having: using the empirical topic counts in a dot product with a Gaussian-distributed weight vector, inside a logistic factor (BernoulliFromLogOdds). I have tried a few things, but can't quite get it. First, I tried using a Multinomial VariableArray instead of a Discrete variable for drawing topics. The issue I ran into was converting the array into a vector type to be used as an argument to the InnerProduct factor. Besides SetTo or ConstrainEqual (which gave me problems), I couldn't see any other way. Here is a sketch of what I tried to write, as it illustrates the problem (in this case I explicitly assume only two topics and write everything out unrolled, for illustration purposes):
class Program
{
    static void Main(string[] args)
    {
        Range n = new Range(10); // number of words
        Range k = new Range(2);  // number of latent topics

        // indicates whether nth word is in topic 1
        VariableArray<double> indicatorsTopicOne = Variable.Array<double>(n);
        // indicates whether nth word is in topic 0
        VariableArray<double> indicatorsTopicZero = Variable.Array<double>(n);
        // topic draws for each word
        VariableArray<int> topics = Variable.Array<int>(n);
        // prior on topics
        Variable<Vector> theta = Variable<Vector>.DirichletSymmetric(k.SizeAsInt, 1.0);
        // prior on topic counts. Probably not a good choice, but the line below it won't work
        Variable<Vector> topicCounts = Variable<Vector>.DirichletSymmetric(k.SizeAsInt, 1.0);
        topics[n] = Variable.Discrete(theta).ForEach(n);
        //Variable<Vector> topicCounts = new Variable<Vector>(); < this doesn't work

        using(Variable.ForEach(n))
        {
            Variable<bool> isTopicOne = topics[n] == 1;
            using(Variable.If(isTopicOne))
            {
                Variable.ConstrainEqual(indicatorsTopicOne[n], 1.0);
                Variable.ConstrainEqual(indicatorsTopicZero[n], 0.0);
            }
            using(Variable.IfNot(isTopicOne))
            {
                Variable.ConstrainEqual(indicatorsTopicZero[n], 1.0);
                Variable.ConstrainEqual(indicatorsTopicOne[n], 0.0);
            }
        }

        Variable<double> topicOneCount = Variable.Sum(indicatorsTopicOne);
        Variable<double> topicZeroCount = Variable.Sum(indicatorsTopicZero);
        topicCounts.SetValueRange(k);

        // I tried this below as well (in comments)
        //Variable.ConstrainEqual<double>(Variable.GetItem(topicCounts, 0), topicZeroCount);
        //Variable.ConstrainEqual<double>(Variable.GetItem(topicCounts, 1), topicOneCount);

        // Not sure if setting like this is the right way.
        Variable.GetItem(topicCounts, 0).SetTo(topicZeroCount);
        Variable.GetItem(topicCounts, 1).SetTo(topicOneCount);

        Vector mean = Vector.FromArray(new double[] {1.0,2.0});
        PositiveDefiniteMatrix prec = PositiveDefiniteMatrix.Identity(2);
        Variable<Vector> w = Variable<Vector>.VectorGaussianFromMeanAndPrecision(mean, prec);
        Variable<bool> y = Variable.BernoulliFromLogOdds(Variable.InnerProduct(w, topicCounts));
        Variable.ConstrainTrue(y);

        InferenceEngine engine = new InferenceEngine(new VariationalMessagePassing());
        engine.Infer(w);
    }
}
Is there a way to encode what I want to do in Infer.NET, or is it not possible at the moment? I am still struggling a bit in figuring out the right way to encode the model.
Thanks so much for your time and help!
Tuesday, April 22, 2014 10:07 PM
All replies

Put the topic counts into an array and use Variable.Vector to convert the array to a vector.
Wednesday, April 23, 2014 8:08 AM Owner
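A minimal sketch of this suggestion, under stated assumptions: the class name, the variable `countsArray`, and the Gaussian placeholder used to define its elements are illustrative only (in the actual model each element would be the random count for one topic), and the `using` lines assume the 2014-era namespaces. The sketch only builds the model and does not run inference.

```csharp
// Assumption: 2014-era namespaces; newer releases use Microsoft.ML.Probabilistic.*
using MicrosoftResearch.Infer.Maths;
using MicrosoftResearch.Infer.Models;

class VectorFromArraySketch
{
    static void Main()
    {
        Range k = new Range(2); // number of latent topics

        // Placeholder definition so the sketch is complete (hypothetical):
        // in the real model, countsArray[k] would hold the count for topic k.
        VariableArray<double> countsArray = Variable.Array<double>(k);
        countsArray[k] = Variable.GaussianFromMeanAndVariance(5.0, 1.0).ForEach(k);

        // Convert the array of doubles into a single Vector-valued variable.
        Variable<Vector> topicCounts = Variable.Vector(countsArray);

        // The vector can now be passed to InnerProduct, as in the question.
        Vector mean = Vector.FromArray(new double[] { 1.0, 2.0 });
        PositiveDefiniteMatrix prec = PositiveDefiniteMatrix.Identity(2);
        Variable<Vector> w = Variable.VectorGaussianFromMeanAndPrecision(mean, prec);
        Variable<bool> y = Variable.BernoulliFromLogOdds(Variable.InnerProduct(w, topicCounts));
        Variable.ConstrainTrue(y);
    }
}
```

This replaces the Dirichlet prior on `topicCounts` and the `GetItem(...).SetTo(...)` calls from the question: the counts are defined element by element as doubles and only then combined into a vector.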

Thank you Tom. I ran into another issue that I just can't seem to get around. To obtain the counts, I keep one array per topic, where each element is either 1 or 0 depending on whether the corresponding word is assigned to that topic. This is an array of doubles. To obtain the counts, I use the Sum factor. Alternatively, I tried keeping a boolean array and then applying SumWhere with a vector of all ones, but that raised exceptions because the output of SumWhere cannot be fixed. My question is probably trivial: how do I generate these arrays of 1s and 0s? Without assigning a definition to the array, I tried to use SetTo(1) or SetTo(0), but that resulted in exceptions as well (shown below). Alternatively, I could assign priors to these arrays and then use ConstrainEqual.
Here is what I tried:
VariableArray<double> oneDoubleIndices = Variable.Array<double>(n);
VariableArray<double> zeroDoubleIndices = Variable.Array<double>(n);
using(Variable.ForEach(n))
{
    topics[n] = Variable.Discrete(theta);
}
using(Variable.ForEach(n))
{
    Variable<bool> isOne = topics[n] == 1;
    using(Variable.If(isOne))
    {
        oneDoubleIndices[n] = 1.0;
        zeroDoubleIndices[n] = 0.0;
    }
    using(Variable.IfNot(isOne))
    {
        oneDoubleIndices[n] = 0.0;
        zeroDoubleIndices[n] = 1.0;
    }
}
Variable<double> zeroCount = Variable.New<double>();
Variable<double> oneCount = Variable.New<double>();
zeroCount.SetTo(Variable.Sum(zeroDoubleIndices));
oneCount.SetTo(Variable.Sum(oneDoubleIndices));
An error that I get is:
Error 0: Internal: Message on lhs has wrong type (Gaussian not assignable from Double) in Factor.Copy<double>(1.0)
Am I failing to understand how to properly do this?
Also, not necessarily specific to this problem, I am interested in how to define a variable deterministically from another random variable. For example, in this case topics[n] is stochastic, and oneDoubleIndices[n] is also stochastic but defined by a deterministic procedure, i.e. by comparing the value of topics[n] to a constant, branching, and then setting oneDoubleIndices[n] to a value. I understand that simple deterministic operations through operator overloading accomplish this, but I am not sure how to do it procedurally, which is what I am attempting here.
Finally, I tried using a vector of integer counts produced by the Multinomial factor, but could not figure out how to cast Variable<int> to Variable<double> for passing to the sum. Is that possible?
Thank you so much for taking the time to read this!
Friday, April 25, 2014 5:32 AM 
The issue is that oneDoubleIndices is stochastic and there is no indication of what distribution type to use. You need to provide a distribution type as follows:
oneDoubleIndices[n] = Variable.Random(Gaussian.PointMass(1.0));
The approach you have used is the correct way to convert from Boolean to double. To convert from int to double, use an array lookup.
Friday, April 25, 2014 8:21 AM Owner
Thank you again for your prompt reply, Tom.
I am really sorry to keep annoying you with this, but I am still having trouble.
This still gives me the same error when I define oneDoubleIndices[n] inside the ForEach loop, which I think is equivalent to what you suggested:
using(Variable.ForEach(n))
{
    Variable<bool> isOne = topics[n] == 1;
    using(Variable.If(isOne))
    {
        oneDoubleIndices[n] = Variable.Random(Gaussian.PointMass(1.0));
        zeroDoubleIndices[n] = Variable.Random(Gaussian.PointMass(0.0));
    }
where the array variables are declared outside the ForEach loop as before:
VariableArray<double> zeroDoubleIndices = Variable.Array<double>(n);
I am really confused about the point mass; in particular, why is it Gaussian? I think I am missing how what I need gets accomplished in Infer.NET. In particular, if I am trying to define a deterministic procedure that produces one stochastic variable from another (as I am trying to do here), is defining a point-mass distribution the way to go?
Regarding the conversion from int to double, you mentioned that I can just do an array lookup. I understood that to mean the following, which doesn't work for me, so I clearly misunderstood what you meant by array lookup:
VariableArray<int> topicsMult = Variable.Multinomial(k,theta);
VariableArray<double> topicsMultDouble = Variable.Array<double>(k);
using(Variable.ForEach(k))
{
    topicsMultDouble[k] = topicsMult[k];
}
This gives me a type error.
Sorry again for all the trouble, but this has all been very useful!
Monday, April 28, 2014 10:13 AM 
Sorry, I sent the wrong fix. The correct fix is:
oneDoubleIndices[n] = Variable.GaussianFromMeanAndVariance(1.0,0);
This is only a workaround for a bug in the current version. Your original code will work in the next version of Infer.NET.
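A minimal sketch of how this workaround fits into the indicator code from the earlier post, assuming two topics and ten words as in the original question. The class name is illustrative and the `using` lines assume the 2014-era namespaces; the sketch only builds the model and does not run inference.

```csharp
// Assumption: 2014-era namespaces; newer releases use Microsoft.ML.Probabilistic.*
using MicrosoftResearch.Infer.Maths;
using MicrosoftResearch.Infer.Models;

class IndicatorWorkaroundSketch
{
    static void Main()
    {
        Range n = new Range(10); // number of words
        Range k = new Range(2);  // number of latent topics

        Variable<Vector> theta = Variable<Vector>.DirichletSymmetric(k.SizeAsInt, 1.0);
        theta.SetValueRange(k);

        VariableArray<int> topics = Variable.Array<int>(n);
        topics[n] = Variable.Discrete(theta).ForEach(n);

        VariableArray<double> oneDoubleIndices = Variable.Array<double>(n);
        VariableArray<double> zeroDoubleIndices = Variable.Array<double>(n);
        using (Variable.ForEach(n))
        {
            Variable<bool> isOne = topics[n] == 1;
            using (Variable.If(isOne))
            {
                // Zero-variance Gaussians stand in for the constants 1.0 and 0.0.
                oneDoubleIndices[n] = Variable.GaussianFromMeanAndVariance(1.0, 0);
                zeroDoubleIndices[n] = Variable.GaussianFromMeanAndVariance(0.0, 0);
            }
            using (Variable.IfNot(isOne))
            {
                oneDoubleIndices[n] = Variable.GaussianFromMeanAndVariance(0.0, 0);
                zeroDoubleIndices[n] = Variable.GaussianFromMeanAndVariance(1.0, 0);
            }
        }

        // Empirical topic counts as sums of the 0/1 indicators.
        Variable<double> oneCount = Variable.Sum(oneDoubleIndices);
        Variable<double> zeroCount = Variable.Sum(zeroDoubleIndices);
    }
}
```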
To convert from int to double, make an array of doubles where index i contains (double)i. Indexing this array by a random int gives you the corresponding random double.
Monday, April 28, 2014 5:30 PM Owner
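The lookup-table idea in the last reply could be sketched as follows. This is an untested sketch under stated assumptions: the names `intToDouble`, `count`, and `countAsDouble` are hypothetical, the integer is assumed to take values 0 to 2, and wrapping the random index in Variable.Switch is my assumption about how the lookup should be expressed (the reply only says to index the array by the random int). The `using` line assumes the 2014-era namespace.

```csharp
// Assumption: 2014-era namespace; newer releases use Microsoft.ML.Probabilistic.*
using MicrosoftResearch.Infer.Models;

class IntToDoubleSketch
{
    static void Main()
    {
        // Possible integer values 0, 1, 2 (illustrative upper bound).
        Range v = new Range(3);

        // Lookup table: index i contains (double)i.
        double[] table = new double[] { 0.0, 1.0, 2.0 };
        VariableArray<double> intToDouble = Variable.Observed(table, v);

        // A random integer over the same range (placeholder prior).
        Variable<int> count = Variable.DiscreteUniform(v);

        // Index the table by the random int to get the corresponding double.
        // Assumption: a Switch block is used for the random index.
        Variable<double> countAsDouble = Variable.New<double>();
        using (Variable.Switch(count))
        {
            countAsDouble.SetTo(intToDouble[count]);
        }
    }
}
```

In the Multinomial version from the question, the same table would be indexed by each `topicsMult[k]` inside a ForEach over `k`, with the table sized to the largest possible count plus one.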