Mixture of unigram model (Migrated from community.research.microsoft.com)

  • Question

  • davmago posted on 03-02-2011 4:51 AM


    I'm trying to write a mixture of unigrams model, i.e. a model with an observed variable W (representing the word appearing in a document) and a latent variable \theta that selects "the topic" from which the current word in the document is sampled.

    The generative process for one document should be something like:

    • For each word in the document,
      • Sample a topic z ~ \theta,
      • Sample the word according to the topic-specific distribution, w ~ \theta_z
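To make the process above concrete, here is a minimal sketch that simulates it in plain Python. It is not Infer.NET code and does no inference. The names `sample_document`, `theta` and `topic_word`, as well as the toy probabilities, are assumptions made up for illustration: `theta` stands for the topic distribution and `topic_word[z]` for the topic-specific word distribution written as \theta_z above.

```python
import random

def sample_document(theta, topic_word, num_words, rng):
    """Sample one document as a list of (topic, word) index pairs.

    theta      -- topic probabilities (hypothetical name, plays the role of \\theta)
    topic_word -- topic_word[z] is the word distribution of topic z (hypothetical name)
    """
    topics = list(range(len(theta)))
    vocab = list(range(len(topic_word[0])))
    doc = []
    for _ in range(num_words):
        # Sample a topic z ~ theta.
        z = rng.choices(topics, weights=theta, k=1)[0]
        # Sample a word w from the distribution of topic z.
        w = rng.choices(vocab, weights=topic_word[z], k=1)[0]
        doc.append((z, w))
    return doc

# Toy setup (assumed values): 2 topics over a 4-word vocabulary.
rng = random.Random(0)
theta = [0.7, 0.3]
topic_word = [
    [0.5, 0.5, 0.0, 0.0],  # topic 0 only emits words 0 and 1
    [0.0, 0.0, 0.5, 0.5],  # topic 1 only emits words 2 and 3
]
doc = sample_document(theta, topic_word, 10, rng)
print(doc)
```

In the actual problem only the word indices would be observed. The topic of each word, `theta` and `topic_word` are the quantities to be inferred.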

    I would like to infer the topic distribution \theta and the topic-specific word distributions.

    I read the plain LDA model in the tutorial, but I'm a bit confused about how I can define \theta and W.

    Can you give me some advice?

    Thanks in advance



    Friday, June 3, 2011 6:29 PM


  • John Guiver replied on 03-04-2011 4:29 AM

    Hi Davide

    I am not sure how this differs from the LDA model we provide. The page on Inference and Prediction shows how to extract the distributions. The full source code is available as part of the download and shows examples of calling it.


    Friday, June 3, 2011 6:29 PM