Error occurs when constructing an LDA model with a simple corpus (migrated from community.research.microsoft.com)
Question

xgear posted on 02212009 4:16 AM
Setting of the model:
number of documents in corpus: 12
number of topics: 3
number of words (terms) in corpus: 12
For simplicity, suppose that each document is composed of 2 words.
The whole corpus is shown below, with each line representing a document; the left column is the original corpus and the right column is the corpus after indexing:

university test      → 0, 3
teacher student      → 0, 1
teacher university   → 0, 2
university student   → 1, 2
economy bank         → 4, 7
economy money        → 4, 5
stock economy        → 4, 6
money stock          → 5, 6
goverment policy     → 8, 11
goverment president  → 8, 9
goverment military   → 8, 10
president policy     → 9, 10
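The indexing step the poster describes (mapping each distinct word to an integer id) can be sketched as follows. This is an editorial illustration, not part of the original post: ids here are assigned in order of first appearance, which may differ from the poster's exact assignment, and the spellings are kept as in the original corpus.

```python
# Build a word -> integer index over the corpus, then encode each document.
# Ids are assigned in order of first appearance; the original post may have
# used a different assignment.
docs = [
    "university test", "teacher student", "teacher university", "university student",
    "economy bank", "economy money", "stock economy", "money stock",
    "goverment policy", "goverment president", "goverment military", "president policy",
]

index = {}       # word -> id
encoded = []     # documents as lists of word ids
for doc in docs:
    ids = []
    for word in doc.split():
        if word not in index:
            index[word] = len(index)  # next unused id
        ids.append(index[word])
    encoded.append(ids)

print(len(index))   # 12 distinct words
print(encoded[0])   # [0, 1]
```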
After running the program, the console window shows:
Compile model.....compilation failed.
Then a "transform chain" window shows the message "can only indexed by loop variables,not index0". The error seems to occur near (in) the two nested "using" blocks of the source code.
By the way, can a jagged array provide an array of arrays where the length of the inner arrays is not fixed, so that I can remove the restriction that each document is composed of 2 words?
Your help is appreciated!
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
static void Main(string[] args)
{
    int M = 12;  // number of documents in corpus
    int K = 3;   // number of topics
    int V = 12;  // number of words (terms) in corpus
    int Nm = 2;  // suppose that each document is composed of 2 words
    Range CorpusSize = new Range(M);
    Range TopicsNum = new Range(K);
    Range WordsNum = new Range(V);
    Range DocSize = new Range(Nm);
    double[] alpha = { 0.5, 0.5, 0.5 };
    double[] beta = { 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1 };
    VariableArray<Vector> theta = Variable.Array<Vector>(CorpusSize);
    VariableArray<Vector> phi = Variable.Array<Vector>(TopicsNum);
    theta[CorpusSize] = Variable.Dirichlet(alpha).ForEach(CorpusSize);
    phi[TopicsNum] = Variable.Dirichlet(beta).ForEach(TopicsNum);
    VariableArray2D<int> W = Variable.Array<int>(CorpusSize, DocSize);
    VariableArray2D<int> Z = Variable.Array<int>(CorpusSize, DocSize);
    using (Variable.ForEach(CorpusSize))
    {
        using (Variable.ForEach(DocSize))
        {
            Z[CorpusSize, DocSize] = Variable.Discrete(theta[CorpusSize]);
            W[CorpusSize, DocSize] = Variable.Discrete(phi[Z[CorpusSize, DocSize]]);
        }
    }
    W = Variable.Observed(new int[,] { { 0, 3 }, { 0, 1 }, { 0, 2 }, { 1, 2 }, { 4, 7 }, { 4, 5 }, { 4, 6 }, { 5, 6 }, { 8, 11 }, { 8, 9 }, { 8, 10 }, { 9, 10 } }, CorpusSize, DocSize);
    InferenceEngine engine = new InferenceEngine();
    Console.WriteLine(engine.Infer(Z));
}
Friday, June 3, 2011 5:23 PM
Answers

msdy replied on 11132009 9:47 PM
Thanks for the reply. Now I have another question.
I am not sure if I understand this correctly, but please help.
In this implementation, each document is denoted by indexed words, and each word is sampled from a topic's word distribution. In the example, each word appears only once in a document.
My question is this: there is no dimensionality reduction for documents, since word counts are not used in this model. If a document contains repeated words, each occurrence is treated as a distinct token, and the output of the code is an inference for each individual token.
For example, if I replace the docs in John’s code with
// Documents of variable length
int[] block1 = System.Linq.Enumerable.Repeat(0, 1000).ToArray();
int[] block2 = System.Linq.Enumerable.Repeat(1, 2000).ToArray();
int[] block3 = System.Linq.Enumerable.Repeat(8, 1000).ToArray();
int[] block4 = System.Linq.Enumerable.Repeat(11, 1500).ToArray();
int[] doc1 = block1.Concat(block2).ToArray();
int[] doc2 = block3.Concat(block4).ToArray();
int[] doc3 = block1.Concat(block4).ToArray();
int[] doc4 = block2.Concat(block3).ToArray();
int[][] docs = {
doc1,
doc2,
doc3,
doc4
};
Even though there are only 4 unique words (indexed by 0, 1, 8, 11) in the corpus, the model treats every token in each document as distinct. This is not efficient.
Did I understand it right? How do we handle this situation?
Thank you.
 Marked as answer by Microsoft Research Friday, June 3, 2011 5:26 PM
Friday, June 3, 2011 5:26 PM
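msdy's efficiency concern is not resolved in this thread, but a common remedy (sketched here in Python as an editorial illustration, independent of Infer.NET) is a bag-of-words representation: collapse each document into (word id, count) pairs so that a model processes each distinct word once, weighting its contribution by the count.

```python
from collections import Counter

def to_bag_of_words(doc):
    """Collapse a token-id list into sorted (word_id, count) pairs.

    A model can then process each distinct word once, weighting its
    likelihood contribution by the count, instead of iterating over
    every repeated token.
    """
    return sorted(Counter(doc).items())

# msdy's example: a 3000-token document containing only two distinct words
doc1 = [0] * 1000 + [1] * 2000
print(to_bag_of_words(doc1))   # [(0, 1000), (1, 2000)]
```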
All replies

laura replied on 02222009 10:05 AM
Hi,
Since W depends on certain choices of Z, you have to add a gate (Variable.Switch).
Furthermore, you have to set the ValueRange attribute on Z, so Infer.NET knows which values the gate ranges over.
Use the following code and your model compiles.
Z[CorpusSize, DocSize] = Variable.Discrete(theta[CorpusSize]).Attrib(new ValueRange(TopicsNum));
using (Variable.Switch(Z[CorpusSize, DocSize]))
{
    W[CorpusSize, DocSize] = Variable.Discrete(phi[Z[CorpusSize, DocSize]]);
}
Laura
Friday, June 3, 2011 5:23 PM 
laura replied on 02222009 10:24 AM
I just came across a flaw in your code.
In your example, you first create a data structure for W and wire it into the model; then you redefine W using a new observed data structure, which is not linked to the model. Since the data is not linked, Infer() gives inference results based only on the prior.
You have to define your observed variable W as such up front.
Instead of
VariableArray2D<int> W = Variable.Array<int>(CorpusSize, DocSize).Named("W");
use the following line (and omit the later assignment):
VariableArray2D<int> W = Variable.Observed(new int[,] { { 0, 3 }, { 0, 1 }, { 0, 2 }, { 1, 2 }, { 4, 7 }, { 4, 5 }, { 4, 6 }, { 5, 6 }, { 8, 11 }, { 8, 9 }, { 8, 10 }, { 9, 10 } }, CorpusSize, DocSize);
Another thing is that you have to break symmetry, otherwise all phis will be identical.
To break symmetry slightly, create a dense Dirichlet (denseBeta), draw K times from it using dirich.Sample(), convert the samples to an Infer.NET array, and call phi.InitialiseTo():
double[] denseBeta = new double[V];
for (int v = 0; v < V; v++) denseBeta[v] = 10.0;
Dirichlet[] initPhi = new Dirichlet[K];
Dirichlet dirich = new Dirichlet(denseBeta);
for (int k = 0; k < K; k++)
{
    initPhi[k] = new Dirichlet(dirich.Sample());
}
phi.InitialiseTo(Distribution<Vector>.Array(initPhi));
Laura
Friday, June 3, 2011 5:24 PM 
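Laura's symmetry-breaking recipe (draw K samples from a dense Dirichlet and use them as initial marginals) can be illustrated outside Infer.NET. The Python sketch below is editorial, uses only the standard library, and draws Dirichlet samples via normalized Gamma variates; the point is that the K initial word distributions come out slightly different, nudging each topic toward a different posterior mode.

```python
import random

def dirichlet_sample(alpha):
    """Draw one sample from Dirichlet(alpha) via normalized Gamma draws."""
    draws = [random.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

V, K = 12, 3
dense_beta = [10.0] * V            # dense prior => samples close to, but not exactly, uniform
init_phi = [dirichlet_sample(dense_beta) for _ in range(K)]

# Each row is a slightly different word distribution; using these as the
# initial marginals breaks the symmetry between the K topics.
for row in init_phi:
    print(["%.3f" % p for p in row])
```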
laura replied on 02222009 10:31 AM
To answer your final question: yes, using jagged arrays, documents can have different lengths. If you need an example, in John Guiver's post in the Bernoulli thread (http://community.research.microsoft.com/forums/p/2779/4511.aspx#4511 ) "e" is a jagged random variable array. Note that "sRange" is a variable range depending on "uRange".
Laura
Friday, June 3, 2011 5:24 PM 
John Guiver replied on 02232009 6:41 AM
Just to summarise everything Laura has noted (many thanks, Laura), including the jagged-array handling, here is a modified version of your C# code that compiles and runs:
static void Main(string[] args)
{
    int K = 3;  // number of topics
    int V = 12; // number of words (terms) in corpus
    // Documents of variable length
    int[][] docs = {
        new int[] { 0, 3, 4 },
        new int[] { 0, 1 },
        new int[] { 0, 2, 4, 5 },
        new int[] { 1, 2 },
        new int[] { 4, 7 },
        new int[] { 4, 5 },
        new int[] { 4, 6 },
        new int[] { 5, 6 },
        new int[] { 8, 11 },
        new int[] { 8, 9 },
        new int[] { 8, 10 },
        new int[] { 9, 10 }};
    // Put the sizes into an array
    int M = docs.Length;
    int[] sizes = new int[M];
    for (int i = 0; i < M; i++)
        sizes[i] = docs[i].Length;
    // Set up the ranges
    Range CorpusSize = new Range(M);
    Range TopicsNum = new Range(K);
    Range WordsNum = new Range(V);
    VariableArray<int> docSizeVar = Variable.Observed(sizes, CorpusSize);
    Range DocSize = new Range(docSizeVar[CorpusSize]);
    double[] alpha = { 0.5, 0.5, 0.5 };
    double[] beta = { 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1 };
    VariableArray<Vector> theta = Variable.Array<Vector>(CorpusSize);
    VariableArray<Vector> phi = Variable.Array<Vector>(TopicsNum);
    theta[CorpusSize] = Variable.Dirichlet(alpha).ForEach(CorpusSize);
    phi[TopicsNum] = Variable.Dirichlet(beta).ForEach(TopicsNum);
    // Break symmetry by initialising phi marginals
    Vector denseBeta = new Vector(V, 10.0);
    Dirichlet[] initPhi = new Dirichlet[K];
    Dirichlet dirich = new Dirichlet(denseBeta);
    for (int k = 0; k < K; k++)
        initPhi[k] = new Dirichlet(dirich.Sample());
    phi.InitialiseTo(Distribution<Vector>.Array(initPhi));
    var Z = Variable.Array(Variable.Array<int>(DocSize), CorpusSize);
    var W = Variable.Array(Variable.Array<int>(DocSize), CorpusSize);
    W.ObservedValue = docs;
    using (Variable.ForEach(CorpusSize))
    {
        using (Variable.ForEach(DocSize))
        {
            Z[CorpusSize][DocSize] = Variable.Discrete(theta[CorpusSize]).Attrib(new ValueRange(TopicsNum));
            using (Variable.Switch(Z[CorpusSize][DocSize]))
            {
                W[CorpusSize][DocSize] = Variable.Discrete(phi[Z[CorpusSize][DocSize]]);
            }
        }
    }
    InferenceEngine engine = new InferenceEngine();
    Console.WriteLine(engine.Infer(Z));
}
Friday, June 3, 2011 5:24 PM
xgear replied on 02232009 8:42 AM
thanks
Friday, June 3, 2011 5:24 PM 
Junming Huang replied on 03052009 8:27 PM
What should be the returned value type of engine.Infer(Z) in the last line? I want to store the posterior distribution of Z in a local variable for future use. I tried several types, but none seemed to work.
Friday, June 3, 2011 5:24 PM 
Junming Huang replied on 03062009 3:10 AM
Oh, it seems to work if I use a variable of type DistributionArray<DistributionRefArray<Discrete, int>>.
Thanks all
Friday, June 3, 2011 5:24 PM 
John Guiver replied on 03062009 5:09 AM
Although what you have is correct in this case, DistributionArray, DistributionRefArray, and the other distribution array classes are not designed to be used in the API. Infer.NET may use any one of a number of classes to internally represent distribution arrays, choosing the most efficient representation for the model. However, they can all be referenced via the IDistribution<> interface.
We encourage you to use one of the following two approaches, depending on what you want to do with the posterior:
IDistribution<int[][]> ZPostAsDistribution = engine.Infer<IDistribution<int[][]>>(Z);
Discrete[][] ZPostAsArray = Distribution.ToArray<Discrete[][]>(engine.Infer(Z));
We are looking at possibly making the second case more succinct in a future release by allowing Discrete[][] to be a type parameter for the Infer method.
John G.
Friday, June 3, 2011 5:24 PM 
laura replied on 03062009 5:54 AM
Hi John,
Hiding the Ref/Struct arrays is a cool thing I wasn't yet aware of.
Unfortunately, I cannot make it work in F#. I tried the following, but the compiler complains "The field, constructor or member 'ToArray' is not defined." This is particularly funny since I can select the method from the member list of the Distribution class.
let infResult = inferenceEngine.Infer<IDistribution<Beta[]>>(epsilon)
let infResultObj = inferenceEngine.Infer<obj>(epsilon)
let epsilonPostAsArray = Distribution.ToArray<Beta[]>(infResultObj)
Is there anything special about this method?
Laura
Friday, June 3, 2011 5:25 PM 
John Guiver replied on 03062009 6:03 AM
I think that in F# you currently need to use Distribution< >.ToArray rather than Distribution.ToArray. This is an F# bug that has been logged: it occurs when you have a generic and a non-generic version of the same class name, and the non-generic version (Distribution in our case) has a generic method (ToArray in our case).
John
Friday, June 3, 2011 5:25 PM 
laura replied on 03062009 6:12 AM
I tried the following as well and still get the same error. I tried a full rebuild, just in case. Still no success.
let epsilonPostAsArray = Distribution<_>.ToArray<Beta[]>(infResultObj)
let epsilonPostAsArray = Distribution<Beta>.ToArray<Beta[]>(infResultObj)
// just in case I was referencing the wrong class
let epsilonPostAsArray = MicrosoftResearch.Infer.Distributions.Distribution<_>.ToArray<Beta[]>(infResultObj)
I find it strange that the following expression does not give compile errors:
let x = Distribution.Equals(infResult, infResultObj)
That is why I wonder what might be so special about the ToArray method.
Laura
Friday, June 3, 2011 5:25 PM 
John Guiver replied on 03062009 6:18 AM
You must have a space rather than an underscore in Distribution< >
John
Friday, June 3, 2011 5:25 PM 
laura replied on 03062009 6:28 AM
Thanks, John!
Friday, June 3, 2011 5:25 PM 
freddycct replied on 08132009 9:00 PM
Hi,
May I know the mathematical reason for breaking symmetry? What's so bad about all phis being identical? If we supply the data, the model learns and adapts accordingly, so I am not sure why we have to break symmetry.
Friday, June 3, 2011 5:25 PM 
freddycct replied on 08132009 9:58 PM
I ran the proposed LDA code with the symmetry-breaking code commented out, and the inference returned uniform results for the inferred variables.
I read the mixture-of-Gaussians tutorial, and it states that breaking symmetry is a consequence of using approximate inference algorithms such as VMP. Can I confirm my understanding?
- We break symmetry because of the approximate inference algorithms.
- If we use exact algorithms, do we still need to break symmetry?
- Will exact inference algorithms such as junction trees be supported in the future?
Friday, June 3, 2011 5:25 PM 
jwinn replied on 08142009 4:26 AM
The reason we need to break symmetry is that the model is not identifiable. Suppose we generated some data from the model with known phis, e.g. corresponding to topics 1=education, 2=health and 3=economy. Now suppose we relabel the topics so that 1=health and 2=education, and swap the parameters accordingly, i.e. we swap phi1 and phi2 and we swap the first two elements of theta. If we generate from this new model, we get data with exactly the same distribution as before the swap. In fact, this is true of any permutation of how we label the topics, because the model is symmetric with respect to the topics.
Now suppose we have some data, don't know the phis, and wish to infer their posterior distribution. The true posterior will be a multimodal distribution with one mode for every possible permutation of the topics. However, both EP and VMP can only capture a single posterior mode. Since there is no reason for the inference procedure to favour one mode over another, the symmetry in the model means that the updates will be exactly the same for all phis and all elements of theta, and the inference will get stuck in an unstable equilibrium where all phis are the same and theta is uniform. To escape from this symmetric fixed point, we need to perturb the system somehow, to arbitrarily break the symmetry between the different topic permutations. We can do this by making the initial messages slightly different for each topic; this means the algorithm starts slightly closer to one posterior mode than the others and can converge on that mode, corresponding to some particular permutation of the topics.
In summary, there is nothing wrong with the phis being identically distributed in the model before we see data. After seeing data, the non-identifiable model has a set of posterior modes corresponding to all possible permutations of the topics. Neither VMP nor EP can capture such highly multimodal distributions, so we need to nudge them slightly towards one of the modes.
Friday, June 3, 2011 5:25 PM 
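jwinn's relabelling argument can be checked numerically: under the LDA word likelihood P(w) = sum_k theta_k * phi_k[w], permuting the topic labels in theta and phi together leaves the distribution over words unchanged. The small self-contained check below uses made-up illustrative numbers, not anything from the thread:

```python
import itertools

# Tiny mixture: 3 topics over a 4-word vocabulary (illustrative numbers only).
theta = [0.5, 0.3, 0.2]                      # topic proportions for one document
phi = [
    [0.7, 0.1, 0.1, 0.1],                    # e.g. an "education" topic
    [0.1, 0.7, 0.1, 0.1],                    # e.g. a "health" topic
    [0.1, 0.1, 0.4, 0.4],                    # e.g. an "economy" topic
]

def word_prob(w, theta, phi):
    """P(word w) = sum_k theta_k * phi_k[w] -- the LDA word likelihood."""
    return sum(t, )if False else sum(t * p[w] for t, p in zip(theta, phi))

# Every relabelling of the topics yields the identical data distribution,
# which is exactly why the posterior has one mode per permutation.
for perm in itertools.permutations(range(3)):
    theta_p = [theta[k] for k in perm]
    phi_p = [phi[k] for k in perm]
    assert all(abs(word_prob(w, theta, phi) - word_prob(w, theta_p, phi_p)) < 1e-12
               for w in range(4))
print("all permutations give the same word distribution")
```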
jwinn replied on 08142009 4:45 AM
To expand further on your second question: if we were able to perform exact inference, we would recover the full multimodal posterior and symmetry breaking would not be necessary. However, exact inference is not tractable in this model. In general, exact inference is only tractable for relatively small, discrete models (with some exceptions). Hence most kinds of models that people are interested in using today (such as LDA!) are not tractable for exact inference. For this reason, supporting junction trees is relatively low on our priority list; note also that there are plenty of existing software packages for junction-tree inference in discrete models.
Friday, June 3, 2011 5:25 PM 
laura replied on 08142009 5:05 AM
With LDA we want to learn topics that are hidden in text documents. Phi refers to the word distributions that are characteristic of each topic. If all those phis are identical, we have found that all topics are identical. The question is whether all topics in the data are indeed identical (which I doubt is true for realistic data sets) or whether the inference got stuck at "a saddle point in optimization space". (I use the term saddle point here in a somewhat figurative manner.)
The problem with LDA is that any permutation of the topic indices gives an equally good solution.
Breaking the symmetry refers to initializing messages to non-null messages, i.e. instead of starting the inference loop at a position that is likely to be a saddle point, we start a bit next to it. This initialization should "wash out" during the iterations (just as a Gibbs-sampling initialization does not affect the final result).
You can achieve a similar effect by perturbing each phi's prior a bit, but this will not wash out during inference.
Laura
Friday, June 3, 2011 5:25 PM 
freddycct replied on 08142009 9:39 PM
Thank you all for the replies. I think things will be clearer when I learn about variational inference in my graphical model course.
Friday, June 3, 2011 5:26 PM 
msdy replied on 11122009 9:32 PM
Hi John,
I ran this model with Gibbs sampling, but it failed during compilation. Any ideas?
// Use Gibbs sampling
GibbsSampling gs = new GibbsSampling();
gs.BurnIn = 100;
gs.Thin = 10;
InferenceEngine ie = new InferenceEngine(gs);
ie.NumberOfIterations = 2000;
Console.WriteLine(ie.Infer(Z));
Friday, June 3, 2011 5:26 PM 
minka replied on 11132009 8:07 AM
The Gibbs sampler is still in the experimental stages and it does not yet support 'Variable.Switch' or 'Variable.If'.
Friday, June 3, 2011 5:26 PM 
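Since the Infer.NET Gibbs sampler cannot yet handle the Switch gate, one alternative is a hand-written collapsed Gibbs sampler for LDA in the style of Griffiths and Steyvers. The sketch below is an independent, editorial Python illustration (not Infer.NET code); the hyperparameters mirror the thread's alpha = 0.5, beta = 0.1, and the corpus is the 12-document toy corpus from John's code.

```python
import random

def lda_gibbs(docs, K, V, alpha=0.5, beta=0.1, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (Griffiths & Steyvers style).

    docs: list of token-id lists. Returns topic assignments z, one per token,
    mirroring the shape of docs.
    """
    rng = random.Random(seed)
    ndk = [[0] * K for _ in docs]          # doc-topic counts
    nkw = [[0] * V for _ in range(K)]      # topic-word counts
    nk = [0] * K                           # total tokens assigned to each topic
    z = [[rng.randrange(K) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):         # initialise counts from random z
        for i, w in enumerate(doc):
            k = z[d][i]
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                # remove current assignment
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # full conditional p(z = k | everything else)
                weights = [(ndk[d][k2] + alpha) * (nkw[k2][w] + beta) / (nk[k2] + V * beta)
                           for k2 in range(K)]
                r = rng.random() * sum(weights)
                k = 0
                while r > weights[k]:      # sample from the unnormalised weights
                    r -= weights[k]; k += 1
                z[d][i] = k                # add the new assignment back
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return z

# The 12-document toy corpus from John's code
docs = [[0, 3, 4], [0, 1], [0, 2, 4, 5], [1, 2], [4, 7], [4, 5], [4, 6], [5, 6],
        [8, 11], [8, 9], [8, 10], [9, 10]]
z = lda_gibbs(docs, K=3, V=12)
print(z[0])
```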