[Contd] Online Bernoulli Mixture Model -- update messages

  • Question

  • I finally got a chance to get back to working on the online version of the Bernoulli Mixture model (cf. previous posts in this forum). My question is about the random variables and their linked accumulators: do I need to pass all of them to the next iteration (batch of data)? In the BMM model, there are several random variables that must be updated:

    • pi -- prior over cluster proportions
    • c -- cluster assignments for data points
    • t -- per-cluster Bernoulli parameters (array of Bernoullis)

    Online updates are possible via special accumulators that are created and attached to the random variable to be updated. For example, to enable "pi" updates, the following code defines an accumulator and attaches it:

    var piMessage = Variable.Observed<Dirichlet>(Dirichlet.Uniform(numClusters));
    Variable.ConstrainEqualRandom(pi, piMessage);

    One then proceeds with the model definition and the online-learning part. In the online-learning loop, where the data is fed in chunks, the "pi" updates are done like this:


    piMarginal = engine.Infer<Dirichlet>(pi);
    piMessage.ObservedValue = engine.Infer<Dirichlet>(pi, QueryTypes.MarginalDividedByPrior);
    ...
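    For intuition: the MarginalDividedByPrior query returns the ratio of the posterior and prior densities, and for exponential-family distributions dividing densities subtracts natural parameters. A plain-arithmetic sketch of the Dirichlet case (my own illustration, not Infer.NET code; the natural parameters of Dirichlet(alpha) are alpha - 1, so the ratio has alpha = marginal - prior + 1):

    double[] marginalAlpha = { 5.0, 3.0, 2.0 }; // example posterior pseudo-counts
    double[] priorAlpha    = { 1.0, 1.0, 1.0 }; // uniform prior
    double[] messageAlpha  = new double[marginalAlpha.Length];
    for (int i = 0; i < messageAlpha.Length; i++)
        messageAlpha[i] = marginalAlpha[i] - priorAlpha[i] + 1.0;
    // With a uniform prior the message equals the marginal; feeding it back
    // via ConstrainEqualRandom carries the accumulated evidence into the
    // next batch.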

    My question is: do I need to write such update code for all three model parameters?

    Friday, June 30, 2017 11:40 PM


  • You need it for pi and t since these are shared between batches.  c is not shared between batches.
    • Marked as answer by usptact Wednesday, July 5, 2017 3:53 PM
    Wednesday, July 5, 2017 11:28 AM

All replies

  • I spent quite some time figuring out how to do the update for t. The main difficulty is that it is an array of arrays.

    I start by declaring the array itself:

    var t = Variable.Array(Variable.Array<double>(d), k).Named("t");

    Then I initialize it with a uniform distribution (maybe I should make it slightly more informative by using Beta(2,2)?):

    t[k][d] = Variable.Beta(1,1).ForEach(k).ForEach(d);

    Next, I prepare a .NET array of arrays of Beta distributions:

    Beta[][] tInit = new Beta[numClusters][];
    for (int i = 0; i < numClusters; i++) {
      tInit[i] = new Beta[numDims];
      for (int j = 0; j < numDims; j++)
        tInit[i][j] = Beta.Uniform(); // or use Beta(2,2)?
    }

    Then I create an accumulator and make it observed. It is initialized with tInit and will be updated by the inference engine with each new batch:

    var tMessage = Variable.Observed(tInit, k, d);



    Then I need to constrain each corresponding element of t and tMessage to be equal (I tried to write this as a one-liner, to no avail):

    using (Variable.ForEach(k)) {
      using (Variable.ForEach(d)) {
        Variable.ConstrainEqualRandom(t[k][d], tMessage[k][d]);
      }
    }

    The rest of the code follows. Before running inference in batches, I create an array of arrays to hold the marginals of t, which I use to track learning progress:

    Beta[][] tMarginal = new Beta[numClusters][];
    for (int i = 0; i < numClusters; i++) {
      tMarginal[i] = new Beta[numDims];
      for (int j = 0; j < numDims; j++)
        tMarginal[i][j] = new Beta(2, 2);
    }

    Now comes the main learning loop. The data is fed in chunks, and information about pi and t is accumulated:

    bool[][] batch = new bool[batchSize][];
    for (int b = 0; b < numData / batchSize; b++) {
      nItems.ObservedValue = batchSize;
      batch = data.Skip(b * batchSize).Take(batchSize).ToArray();
      x.ObservedValue = batch;
      piMarginal = engine.Infer<Dirichlet>(pi);
      tMarginal = engine.Infer<Beta[][]>(t);
      var temp = engine.Infer<Dirichlet>(pi, QueryTypes.MarginalDividedByPrior);
      tMessage.ObservedValue = engine.Infer<Beta[][]>(t, QueryTypes.MarginalDividedByPrior);
      piMessage.ObservedValue = temp;
      // logic that displays tMarginal as an image
    }

    Are there any issues with this code? Here are my observations when trying to learn clusters from 10K MNIST images (28x28, binarized):

    • batch needs to be as large as possible
    • a small batch (e.g. 100) results in only a few clusters being updated; only a few positions in piMarginal change
    • large batches (e.g. 500 and more) require gigabytes of memory
    • large batches update most of the clusters (according to piMarginal)
    • displaying tMarginal as images shows that clusters such as "1", "0", "3", and occasionally "5" and "8" are learned
    • increasing the number of clusters past 10 (e.g. to 15) does not learn all 10 digits (only a few clusters are updated)

    What puzzles me is why not all clusters are populated and updated by the algorithm. It updates 3 out of 10 positions in piMessage after the first batch, and subsequent batches update only those same 3 clusters; I don't see updates spreading over all clusters. I suspect something is still wrong.

    Wednesday, July 5, 2017 4:32 PM
  • This code looks correct to me.  If you suspect a problem with inference, you can always generate synthetic data from known parameters and see if those parameters are recovered.
    Wednesday, July 5, 2017 4:36 PM
  • Thanks, Tom, for the quick feedback! Yes, I did try to learn from synthetic data. I generated data from 10 clusters (1000 examples from each). Each cluster was specified by 10 Bernoullis whose probTrue was randomly set in the range [0,1], so each cluster was guaranteed to differ from the others.
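    That generation step can be sketched in plain .NET as follows (my own sketch; names are hypothetical):

    var rng = new Random(0);
    int numClusters = 10, numDims = 10, perCluster = 1000;
    double[][] probTrue = new double[numClusters][];
    for (int ci = 0; ci < numClusters; ci++) {
      probTrue[ci] = new double[numDims];
      for (int di = 0; di < numDims; di++)
        probTrue[ci][di] = rng.NextDouble(); // random probTrue in [0,1]
    }
    bool[][] data = new bool[numClusters * perCluster][];
    for (int n = 0; n < data.Length; n++) {
      int ci = n / perCluster; // 1000 consecutive examples per cluster
      data[n] = new bool[numDims];
      for (int di = 0; di < numDims; di++)
        data[n][di] = rng.NextDouble() < probTrue[ci][di];
    }

    Note that with this layout consecutive examples come from the same cluster, so a small batch contains only one or two clusters; shuffling the data before batching may be worth trying.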

    When fitting the model, I noticed that the algorithm was able to recover only a few of them (within reasonable bounds for probTrue). The other clusters were significantly off.

    I will try hand-crafted clusters, e.g. a checkerboard pattern -- something that makes the clusters very different.

    Wednesday, July 5, 2017 5:31 PM
  • Any updates on your issue, @usptact? I'm trying to do something similar and it doesn't look like my variables are receiving updates either.
    Monday, October 2, 2017 3:07 PM
  • @cyentist I spent quite a lot of time trying to get it working. In my example, the updates were working, but only with a fairly large batch size. With a small batch, only a few cluster RVs got updated, no matter how many data batches followed. For example, with a batch size of 50 and 10 requested clusters, the learning algorithm starts by updating some random 2-3 clusters and keeps refining only those with subsequent batches. The other 7-8 cluster RVs stay at their initial values. I wasn't able to solve this issue...
    • Edited by usptact Wednesday, October 4, 2017 11:26 PM
    Wednesday, October 4, 2017 11:25 PM
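  • A possible cause of only a few clusters being updated is the symmetry of the mixture model: all components start identical, so inference can collapse onto whichever few components the first batch happens to touch. A common Infer.NET remedy (used in the Mixture of Gaussians tutorial) is to break symmetry by initializing the cluster-assignment marginals to random point masses. A sketch, assuming c is the assignment array, numClusters the number of clusters, and batchSize the number of items per batch:

    // Random point-mass initialization of the assignment marginals,
    // so different components start from different states.
    Discrete[] cInit = new Discrete[batchSize];
    for (int i = 0; i < batchSize; i++)
      cInit[i] = Discrete.PointMass(Rand.Int(numClusters), numClusters);
    c.InitialiseTo(Distribution<int>.Array(cInit));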