Beginner simple HMM implementation, not converging

  • Question

  • Hi all,

    I started using Infer.NET this week and tried to implement a very basic HMM.

    This is the code, for reference and reproducibility: http://pastebin.com/YaD9WwdU

    I know it is long (300 lines), but it is very basic and thoroughly commented.

    There are two parts:

     - a TargetModel that generates distributions for the transition and emission CPTs and then samples some sequences of the observed variable (see the sketch below), and

     - a main Model that tries to learn the transition and emission CPTs of the TargetModel.
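
    For concreteness, here is a minimal sketch of the TargetModel idea (not the actual pastebin code; sizes and names are illustrative):

        using MicrosoftResearch.Infer.Maths;
        using MicrosoftResearch.Infer.Distributions;

        // Illustrative sizes: K hidden states, V symbols, T time steps.
        int K = 2, V = 2, T = 1000;

        // Draw each CPT row from a uniform Dirichlet.
        Vector[] A = new Vector[K];   // transition CPT, one row per state
        Vector[] B = new Vector[K];   // emission CPT, one row per state
        for (int i = 0; i < K; i++)
        {
            A[i] = Dirichlet.Uniform(K).Sample();
            B[i] = Dirichlet.Uniform(V).Sample();
        }

        // Sample one observation sequence from the resulting HMM.
        int[] obs = new int[T];
        int state = Rand.Int(K);      // uniform initial state
        for (int time = 0; time < T; time++)
        {
            obs[time] = new Discrete(B[state]).Sample();
            state = new Discrete(A[state]).Sample();
        }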

    Unfortunately, it never converges to the right solution, no matter how many sequences I use, how long they are, or how many observed or hidden states there are (of course, in the trivial case with one hidden state it works).

    The code is inspired by these examples:

    [1] https://social.microsoft.com/Forums/en-US/cd249a23-72f5-4fe7-bfcb-79601998ca2d/an-influence-model-migrated-from-communityresearchmicrosoftcom?forum=infer.net 

    [2] http://research.microsoft.com/en-us/um/cambridge/projects/infernet/docs/Standard%20LDA.aspx

    Can someone please tell me if I'm doing something wrong? Even small comments/recommendations would help.

    Many thanks,

    Best,



    Thursday, June 2, 2016 3:11 PM

All replies

  • If you look at the learned A matrix, you will see that it isn't breaking symmetry between the two states.  When I initialize A at the true value, it converges to the correct answer.  These two pieces of evidence suggest that you need to use a smarter initialization.  Instead of initializing the parameters, I suggest initializing the hidden state sequence.  That is the approach used in the code at https://github.com/oliparson/infer-hmm.
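
    A minimal sketch of that initialization, assuming the unrolled chain is a Variable<int>[] named states of length T over K hidden states (names are illustrative, not from the posted code):

        // Break symmetry by initializing each hidden state's marginal
        // to a random point mass, instead of initializing A and B.
        for (int time = 0; time < T; time++)
            states[time].InitialiseTo(Discrete.PointMass(Rand.Int(K), K));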
    Friday, June 3, 2016 1:12 PM
    Owner
  • Thanks for the quick answer, Tom. From a quick look, the test model at your link is indeed very similar to mine, apart from the Gaussian observations.

    I'll try it in the next few days and let you know.

    Friday, June 3, 2016 9:43 PM
  • Unfortunately, apart from rare cases, I was not able to obtain convergence of A or B with either

    - the initialization of A to the true value, or

    - the random initialization of the hidden state sequences (as in your link).

    I reduced alphabetSize to 2 in order to work with a parameter space comparable to the model at your link.

    Did you obtain convergence with the code I posted before?


    Monday, June 6, 2016 1:09 PM
  • Yes.  With Rand.Restart(101) I get the following:
    A
    Discrete(0.3755 0.6245)
    Discrete(0.5382 0.4618)
    B
    Discrete(0.855 0.145)
    Discrete(0.5934 0.4066)
    transitions
    0.2824 0.7176
    0.4084 0.5916
    emissions
    0.5218 0.4782
    0.8266 0.1734
    Here we see that it has swapped the two hidden states but otherwise approximately recovered the parameters.  Keep in mind that the algorithm here is VMP, not EM, so you will not be getting maximum-likelihood estimates.
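
    Since the learned model recovers the states only up to a relabeling (label switching), one way to compare the learned transition matrix against the ground truth is to score every permutation of the state labels and keep the best one. A small illustrative helper, not part of the posted code:

        // L1 error between an estimated transition matrix and the truth,
        // under a relabeling `perm` of the hidden states (both axes of a
        // transition matrix are states, so `perm` applies to rows and columns).
        static double L1Error(double[][] est, double[][] truth, int[] perm)
        {
            double err = 0;
            for (int i = 0; i < truth.Length; i++)
                for (int j = 0; j < truth[i].Length; j++)
                    err += Math.Abs(est[perm[i]][perm[j]] - truth[i][j]);
            return err;
        }

        // For K = 2 there are only two labelings:
        // double best = Math.Min(L1Error(est, truth, new[] { 0, 1 }),
        //                        L1Error(est, truth, new[] { 1, 0 }));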
    Monday, June 6, 2016 4:47 PM
    Owner
  • First of all, thank you again for replying! I'm really curious about this library and I'd like to use it in my research.

    Yes, with Rand.Restart(101) (and no ground-truth initialization of A and B, nor random state initialization) I get exactly the same numbers.

    However, by looking at the estimated transitions matrix, I see that:

     - The transitions matrix seems different from A: in A we have 2 unstable states, while in transitions only state 0 is unstable.

     - Row 1 of transitions is similar to row 0 of A, but in reality it is inverted, because the self-transition transitions[1][1] is similar to A[0][1], which is not a self-transition.

    I admit this reasoning is quite rough, so I experimented a little with different seeds (Rand.Restart 102, 103, 104, etc.) and I get unpredictable behavior in the estimates. This is why I thought I had made some beginner mistake in the implementation.

    I experimented with EM with the same A and B matrices and one sequence of 10000 frames, and I get exact estimates of A and B up to the second decimal. Probably I adopted an inherently wrong approach by assuming that the mean of the posterior distributions (with an uninformative prior) would approximate the ML estimates.
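
    For reference, this is roughly how the point estimates are read off; a sketch assuming the CPT rows are a VariableArray<Vector> named A with Dirichlet priors (names illustrative):

        // Infer the posterior over each CPT row and take its mean.
        // With a Dirichlet(1,...,1) prior the posterior mean is
        // (counts + 1) / (total + K), which is close to, but not equal to,
        // the ML estimate counts / total; VMP adds its own approximation on top.
        Dirichlet[] Apost = engine.Infer<Dirichlet[]>(A);
        foreach (Dirichlet row in Apost)
            Console.WriteLine(row.GetMean());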

    I've just found in this forum a recommendation of your lectures, which I'll watch to help clarify my thoughts:

    (http://videolectures.net/mlss09uk_minka_ai/)

    ---

    I also tried the GibbsSampling engine (replacing just the inference engine initialization; see the sketch below) and it crashed with:

    "System.Exception: The model has a loop of deterministic edges, which is not allowed by Variational Message Passing."

    However, I can't see any such loop of deterministic edges, and I am not using VMP in this case, so I'm having difficulty figuring out how to debug this exception.
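
    The engine swap is just something like the following (illustrative, not the exact code):

        // Replace the default algorithm with Gibbs sampling.
        InferenceEngine engine = new InferenceEngine();
        engine.Algorithm = new GibbsSampling();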





    Monday, June 6, 2016 8:16 PM
  • An update: I experimented with Gaussian emissions (replacing just the emission part of the code with extracts from the oliparson/infer-hmm link), and the model works perfectly with both ExpectationPropagation and VMP.
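
    Roughly, the replacement looks like this; a sketch in the infer-hmm style with per-state mean and precision under conjugate priors (all names illustrative):

        // Per-state Gaussian emission parameters with conjugate priors,
        // replacing the discrete emission CPT.
        Range k = new Range(K);
        var mean = Variable.Array<double>(k);
        mean[k] = Variable.GaussianFromMeanAndPrecision(0, 0.01).ForEach(k);
        var prec = Variable.Array<double>(k);
        prec[k] = Variable.GammaFromShapeAndScale(1, 1).ForEach(k);

        // Emission at one time step, given the hidden state `state`
        // (a Variable<int> whose value range is k):
        Variable<double> y = Variable.New<double>();
        using (Variable.Switch(state))
            y.SetTo(Variable.GaussianFromMeanAndPrecision(mean[state], prec[state]));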

    Could it be an issue with discrete emissions?

    Tuesday, June 7, 2016 8:51 PM
  • This is the current version of the module (still not learning properly), for reference:

    http://pastebin.com/kfkcLzxq

    Hidden state initialization is at lines 244-245, called from line 496.

    Initialization of the parameters (A and B) is at lines 302-328, called from line 492 or 493.

    Thursday, June 9, 2016 3:22 PM