A question about LDA extension (Migrated from community.research.microsoft.com)

Question
-
msdy posted on 10-17-2009 12:23 AM
Referring to John's post on the LDA model, I am wondering how to extend it to different data sets, similar to correspondence LDA.
Suppose there are two data sets, both discrete. Then we can use two LDAs to construct this model, as in the figure.
I think most of the code is similar to John's post, but how can I index Z with X for G in Infer.NET? Part of my code is as follows, but I get errors. Can anyone help? Thank you.
var Z = Variable.Array(Variable.Array<int>(DocSize1), CorpusSize);
var W = Variable.Array(Variable.Array<int>(DocSize1), CorpusSize);
var G = Variable.Array(Variable.Array<int>(DocSize2), CorpusSize);
W.ObservedValue = docs1;
G.ObservedValue = docs2;
using (Variable.ForEach(CorpusSize))
{
    using (Variable.ForEach(DocSize1))
    {
        Z[CorpusSize][DocSize1] = Variable.Discrete(theta[CorpusSize]).Attrib(new ValueRange(TopicsNum));
        using (Variable.Switch(Z[CorpusSize][DocSize1]))
        {
            W[CorpusSize][DocSize1] = Variable.Discrete(phi[Z[CorpusSize][DocSize1]]);
        }
    }
    using (Variable.ForEach(DocSize2))
    {
        X[CorpusSize][DocSize2] = Variable.DiscreteUniform(docSizeVar[DocSize1]).Attrib(new ValueRange(DocSize1));
        using (Variable.Switch(X[CorpusSize][DocSize2]))
        {
            G[CorpusSize][DocSize1] = Variable.Discrete(phi[Z[CorpusSize][X[CorpusSize][DocSize2]]]);
        }
    }
}
Friday, June 3, 2011 5:21 PM
All replies
-
John Guiver replied on 10-18-2009 7:15 AM
Hi msdy
Sorry - I am unclear as to exactly what you want. Can you try describing the generative process? In the original model, the generative process is:
For each document in the corpus
    For each word in the document
        Draw a topic from the document's topic distribution
        Observe the word as drawn from that topic's word distribution
Could you give a similar description for your extension? Thanks
John G.
Friday, June 3, 2011 5:21 PM -
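The generative process John lists for the original LDA model can be sketched as a plain C# sampler that does not use Infer.NET. This is an illustration only, not inference code: the class `LdaSketch`, its method names, and the fixed `theta` and `topicWords` values are assumptions made up for the example (in the real model these distributions would be drawn from Dirichlet priors).

```csharp
using System;

public static class LdaSketch
{
    // Draw an index from a categorical distribution with the given probabilities.
    public static int SampleCategorical(double[] probs, Random rng)
    {
        double u = rng.NextDouble();
        double cumulative = 0.0;
        for (int i = 0; i < probs.Length; i++)
        {
            cumulative += probs[i];
            if (u < cumulative) return i;
        }
        return probs.Length - 1; // guards against rounding error in the sum
    }

    // For each word in the document: draw a topic from the document's topic
    // distribution, then draw the word from that topic's word distribution.
    public static int[] SampleDocument(double[] theta, double[][] topicWords, int length, Random rng)
    {
        int[] words = new int[length];
        for (int n = 0; n < length; n++)
        {
            int topic = SampleCategorical(theta, rng);
            words[n] = SampleCategorical(topicWords[topic], rng);
        }
        return words;
    }

    public static void Main()
    {
        var rng = new Random(0);
        // Hypothetical values: 2 documents, 2 topics, a vocabulary of 3 words.
        double[][] theta = { new[] { 0.9, 0.1 }, new[] { 0.2, 0.8 } };
        double[][] topicWords = { new[] { 0.7, 0.2, 0.1 }, new[] { 0.1, 0.2, 0.7 } };
        for (int d = 0; d < theta.Length; d++)
        {
            int[] words = SampleDocument(theta[d], topicWords, 5, rng);
            Console.WriteLine("document " + d + ": " + string.Join(",", words));
        }
    }
}
```

Writing the extension in the same nested-loop form makes it easier to see which index each variable needs in the Infer.NET model.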
minka replied on 10-18-2009 2:43 PM
Some of the indices seem to be wrong. This line:
X[CorpusSize][DocSize2] = Variable.DiscreteUniform(docSizeVar[DocSize1]).Attrib(new ValueRange(DocSize1));
should be:
X[CorpusSize][DocSize2] = Variable.DiscreteUniform(docSizeVar[CorpusSize]).Attrib(new ValueRange(DocSize1));
And this line:
G[CorpusSize][DocSize1] = Variable.Discrete(phi[Z[CorpusSize][X[CorpusSize][DocSize2]]]);
should be:
G[CorpusSize][DocSize2] = Variable.Discrete(phi[Z[CorpusSize][X[CorpusSize][DocSize2]]]);
To make the code clearer, I'd recommend changing the range names, as follows:
CorpusSize => document
DocSize1 => word1
DocSize2 => word2
This makes the generative model a bit clearer. For each document, you are generating a set of words, and then another set of words using the same topics.
Friday, June 3, 2011 5:21 PM -
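The two-word-set process described in this reply can also be written as a plain C# sampler, which may help when checking the indexing before expressing it in Infer.NET. It is a sketch under assumptions: the names `TwoSetLdaSketch`, `len1`, and `len2` are hypothetical, `lamda` and `phi` follow the notation used elsewhere in the thread, and all distributions are passed in as fixed arrays instead of being drawn from Dirichlet priors.

```csharp
using System;

public static class TwoSetLdaSketch
{
    // Draw an index from a categorical distribution with the given probabilities.
    public static int SampleCategorical(double[] probs, Random rng)
    {
        double u = rng.NextDouble();
        double cumulative = 0.0;
        for (int i = 0; i < probs.Length; i++)
        {
            cumulative += probs[i];
            if (u < cumulative) return i;
        }
        return probs.Length - 1; // guards against rounding error in the sum
    }

    // Generates both word sets of one document. Requires len1 >= 1.
    public static void SampleDocument(
        double[] theta, double[][] lamda, double[][] phi,
        int len1, int len2, Random rng,
        out int[] z, out int[] r, out int[] x, out int[] g)
    {
        z = new int[len1];
        r = new int[len1];
        for (int n = 0; n < len1; n++)
        {
            z[n] = SampleCategorical(theta, rng);        // topic of word n in the first set
            r[n] = SampleCategorical(lamda[z[n]], rng);  // first-set word drawn from that topic
        }
        x = new int[len2];
        g = new int[len2];
        for (int m = 0; m < len2; m++)
        {
            x[m] = rng.Next(len1);                        // uniform pick of a first-set position
            g[m] = SampleCategorical(phi[z[x[m]]], rng);  // second-set word reuses that position's topic
        }
    }

    public static void Main()
    {
        var rng = new Random(0);
        // Hypothetical values: 2 topics, vocabularies of 3 and 4 words.
        double[] theta = { 0.5, 0.5 };
        double[][] lamda = { new[] { 0.8, 0.1, 0.1 }, new[] { 0.1, 0.1, 0.8 } };
        double[][] phi = { new[] { 0.7, 0.1, 0.1, 0.1 }, new[] { 0.1, 0.1, 0.1, 0.7 } };
        int[] z, r, x, g;
        SampleDocument(theta, lamda, phi, 4, 5, rng, out z, out r, out x, out g);
        Console.WriteLine("z: " + string.Join(",", z));
        Console.WriteLine("r: " + string.Join(",", r));
        Console.WriteLine("x: " + string.Join(",", x));
        Console.WriteLine("g: " + string.Join(",", g));
    }
}
```

Note that `g[m]` is indexed by the second word range while its topic is looked up through `x[m]`, which mirrors the corrected line `G[CorpusSize][DocSize2] = ...` in this reply.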
msdy replied on 10-18-2009 6:52 PM
Hi John,
Thanks for the reply.
I'm not sure why my reply doesn't show up here, so I will reply again.
This model is actually Blei's correspondence LDA, which annotates images with captions. Here, I have just simplified it to generate two different types of documents.
It is easy to implement in WinBUGS, but WinBUGS cannot handle jagged arrays, so I'd like to try Infer.NET.
Here is the WinBUGS code. I hope it makes this extension easier to understand.
model{
    for(s in 1:S){
        theta[s, 1:K] ~ ddirch(alpha[])
        for(n in 1:L[s]){
            z[s,n] ~ dcat(theta[s,1:K])
            r[s,n] ~ dcat(lamda[z[s,n], 1:V1])
        }
        for(m in 1:T[s]){
            x[s,m] ~ dunif(1,N)
            g[s,m] ~ dcat(phi[z[s,x[s,m]], 1:V2])
        }
    }
    for(k in 1:K){
        lamda[k,1:V1] ~ ddirch(beta[])
        phi[k,1:V2] ~ ddirch(gamma[])
    }
}
static void Main(string[] args)
{
    // Settings
    int K = 2; // Number of topics
    int V1 = 3; // Vocabulary size of doc type 1
    int V2 = 4; // Vocabulary size of doc type 2
    // doc type 1
    int[][] Doc1 = {
        new int[] {0,1,1,1},
        new int[] {0,2,2,2},
        new int[] {0,1},
        new int[] {1,1,2,1},
        new int[] {1,1,0},
        new int[] {0,1,1,2}
    };
    // doc type 2
    int[][] Doc2 = {
        new int[] {2,0,0,3,1},
        new int[] {1,3,2,1,2},
        new int[] {0,3,2,1},
        new int[] {1,1,1,3,2},
        new int[] {0,0,3,0},
        new int[] {0,0,1,2,3}
    };
    // Put the sizes into arrays
    int S = Doc1.Length;
    int[] word1_size = new int[S];
    for (int i = 0; i < S; i++)
        word1_size[i] = Doc1[i].Length;
    int[] word2_size = new int[S];
    for (int i = 0; i < S; i++)
        word2_size[i] = Doc2[i].Length;
    // Set up the ranges
    Range document = new Range(S);
    Range TopicsNum = new Range(K);
    //Range word1num = new Range(V1);
    //Range word2num = new Range(V2);
    VariableArray<int> word1Var = Variable.Observed(word1_size, document);
    Range word1 = new Range(word1Var[document]);
    VariableArray<int> word2Var = Variable.Observed(word2_size, document);
    Range word2 = new Range(word2Var[document]);
    // Initialization
    double[] alpha = { 0.5, 0.5 };
    double[] beta = { 0.1, 0.1, 0.1 };
    double[] gamma = { 0.2, 0.2, 0.2, 0.2 };
    VariableArray<Vector> theta = Variable.Array<Vector>(document);
    VariableArray<Vector> lamda = Variable.Array<Vector>(TopicsNum);
    VariableArray<Vector> phi = Variable.Array<Vector>(TopicsNum);
    theta[document] = Variable.Dirichlet(alpha).ForEach(document);
    lamda[TopicsNum] = Variable.Dirichlet(beta).ForEach(TopicsNum);
    phi[TopicsNum] = Variable.Dirichlet(gamma).ForEach(TopicsNum);
    // Break symmetry by initialising lamda marginals
    Vector r_denseBeta = new Vector(V1, 10.0);
    Dirichlet[] initLmd = new Dirichlet[K];
    Dirichlet r_dirich = new Dirichlet(r_denseBeta);
    for (int k = 0; k < K; k++)
        initLmd[k] = new Dirichlet(r_dirich.Sample());
    lamda.InitialiseTo(Distribution<Vector>.Array(initLmd));
    // Break symmetry by initialising phi marginals
    Vector g_denseBeta = new Vector(V2, 10.0);
    Dirichlet[] initPhi = new Dirichlet[K];
    Dirichlet g_dirich = new Dirichlet(g_denseBeta);
    for (int k = 0; k < K; k++)
        initPhi[k] = new Dirichlet(g_dirich.Sample());
    phi.InitialiseTo(Distribution<Vector>.Array(initPhi));
    // Latent factors
    var Z = Variable.Array(Variable.Array<int>(word1), document);
    var X = Variable.Array(Variable.Array<int>(word2), document);
    // Observed values
    var R = Variable.Array(Variable.Array<int>(word1), document);
    R.ObservedValue = Doc1;
    var G = Variable.Array(Variable.Array<int>(word2), document);
    G.ObservedValue = Doc2;
    using (Variable.ForEach(document))
    {
        using (Variable.ForEach(word1))
        {
            Z[document][word1] = Variable.Discrete(theta[document]).Attrib(new ValueRange(TopicsNum));
            using (Variable.Switch(Z[document][word1]))
            {
                R[document][word1] = Variable.Discrete(lamda[Z[document][word1]]);
            }
        }
        using (Variable.ForEach(word2))
        {
            X[document][word2] = Variable.DiscreteUniform(word1Var[document]).Attrib(new ValueRange(word1));
            using (Variable.Switch(X[document][word2]))
            {
                using (Variable.Switch(Z[document][X[document][word2]]))
                {
                    G[document][word2] = Variable.Discrete(phi[Z[document][X[document][word2]]]);
                }
            }
        }
    }
    InferenceEngine engine = new InferenceEngine();
    Console.WriteLine(engine.Infer(Z));
}
Friday, June 3, 2011 5:21 PM -
msdy replied on 10-18-2009 6:56 PM
Hi Tom,
Thanks for your help.
I have corrected the code and replied here, but my posts have been pending for a day without showing up. Can you check it?
Thank you.
Friday, June 3, 2011 5:21 PM -
minka replied on 10-19-2009 5:29 AM
The error message means that you are missing some ValueRange attributes on the Discrete variables. The simplest way to add these is to change the Dirichlet constructors:
theta[document] = Variable.Dirichlet(alpha).ForEach(document);
lamda[TopicsNum] = Variable.Dirichlet(beta).ForEach(TopicsNum);
phi[TopicsNum] = Variable.Dirichlet(gamma).ForEach(TopicsNum);
In each of these calls, you should use Variable.Dirichlet(valueRange, range). Any Discrete variables you create from them will automatically inherit the valueRanges.
Friday, June 3, 2011 5:21 PM -
msdy replied on 10-20-2009 6:56 PM
Thanks for the reply.
I changed the constructors to
theta[document] = Variable.Dirichlet(document, alpha).ForEach(document);
lamda[TopicsNum] = Variable.Dirichlet(word1num, beta).ForEach(TopicsNum);
phi[TopicsNum] = Variable.Dirichlet(word2num, gamma).ForEach(TopicsNum);
Then I got another error: Range 'index0' is already open in a ForEach or Switch block at Z[document][word1] = ... in the second using block. Does that mean Z cannot be used in two parallel Switch blocks? Your help is appreciated. Thank you.
using (Variable.ForEach(document))
{
    using (Variable.ForEach(word1))
    {
        Z[document][word1] = Variable.Discrete(theta[document]).Attrib(new ValueRange(TopicsNum));
        using (Variable.Switch(Z[document][word1]))
        {
            R[document][word1] = Variable.Discrete(lamda[Z[document][word1]]);
        }
    }
    using (Variable.ForEach(word2))
    {
        X[document][word2] = Variable.DiscreteUniform(word1Var[document]).Attrib(new ValueRange(word1));
        using (Variable.Switch(X[document][word2]))
        {
            using (Variable.Switch(Z[document][X[document][word2]]))
            {
                G[document][word2] = Variable.Discrete(phi[Z[document][X[document][word2]]]);
            }
        }
    }
}
Friday, June 3, 2011 5:21 PM -
minka replied on 10-21-2009 10:26 AM
You used the wrong valueRange in the Dirichlet constructor for theta. theta should range over TopicsNum. If you have done this correctly, you will not need to put an explicit ValueRange attribute on Z.
Friday, June 3, 2011 5:22 PM -
msdy replied on 10-25-2009 9:10 PM
Thanks for pointing it out.
I have fixed it, and now I get another error message: "Cannot automatically determine distribution type for variable type 'int': you must specify a Marginal Prototype attribute for variable 'X'."
I tried to use
X.AddAttribute(new MarginalPrototype(new DiscreteUniform()));
but then I got "Error 1 Cannot create an instance of the static class 'MicrosoftResearch.Infer.Factors.DiscreteUniform'."
How do I specify a marginal prototype for variable X then?
Friday, June 3, 2011 5:22 PM -
John Guiver replied on 10-26-2009 5:15 AM
I think the problem is in your definition of X which should be:
X[document][word2] = Variable.DiscreteUniform(word1);
If you do this, you will not need a marginal prototype.
To answer your specific question, you cannot use 'new DiscreteUniform()' as a constructor because DiscreteUniform() is a factor method, and not a constructor for a Discrete distribution.
John
Friday, June 3, 2011 5:22 PM -
msdy replied on 10-26-2009 6:37 PM
Thank you, John.
It's right now.
- Marked as answer by Microsoft Research Friday, June 3, 2011 5:22 PM
Friday, June 3, 2011 5:22 PM