Possible to infer Left-Right HMM / CRF with covariates affecting the transition probabilities from latent states?

  • Question

  • Hi,

    I was wondering whether Infer.NET can perform inference on what I believe is a fairly standard case. I have not found another toolbox that can do this, and after browsing the documentation I could not tell whether Infer.NET can either.

    Perhaps it would be easiest to start by describing the model I am hoping to do inference on.  It's a very standard left-right HMM or conditional random field (CRF), shown in the figure below.  For a given number N of underlying states, there are 5 latent states, arranged in a column below, which connect to the latent states in the next column and in their own column.  The one catch in this model is that not all states emit observations; only 3 of the 5 do.  (This is a very common model in the biological sciences and underlies modern computational biology.  It's called a PairedHMM, where a sequence alternates between emitting and not emitting observations, and most DNA sequencing analysis uses it.)  Unfortunately, most "machine learning" packages seem unable to handle this, and I am not sure whether Infer.NET can.

    Outside of this one complication (insertions and deletions, such that the observed sequence length may not match the number of columns shown), it's a very standard CRF: one starts in the "Emit 3" state on the left and simulates multinomial transitions through the latent states until the path ends up in "Emit 3" on the right (the real model obviously extends much further to the right).  If you are in an emit state, the generative function is a very simple parameterized distribution over the 4 DNA base pairs.  The novel part is that I want to model the transition probabilities as a function of covariates.

    Each column of nodes has fixed covariates associated with it, and I would like to model the transition probabilities out of each node in that column as a standard multinomial logistic regression (or softmax) function of these covariates.  Ideally, all identically labelled nodes in the HMM would share the same regression function.  I didn't think this would be too hard, since it implies an obvious Gibbs sampling scheme, which I imagine is in the toolbox:

    1 - Use current regression parameters to sample a path through the CRF
    2 - Use current path to sample regression parameters
    3 - Repeat
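
To make the covariate-dependent transitions described above concrete, here is a minimal Python sketch. It is only an illustration and not Infer.NET code. The names `transition_probs` and `simulate_path`, the weights, and the covariate values are all hypothetical. As a simplifying assumption it draws exactly one state per column, and it ignores within-column moves, non-emitting states, and the observations, so it simulates the generative process rather than performing step 1 of the Gibbs scheme.

```python
import math
import random

N_STATES = 5  # latent states per column, as in the question


def softmax(scores):
    """Numerically stable softmax over a list of real-valued scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def transition_probs(weights, covariates):
    """Multinomial logistic regression with one weight vector per destination.

    weights[k][j] is the (hypothetical) coefficient of covariate j for
    destination state k; the result is a distribution over destinations.
    """
    scores = [sum(w * x for w, x in zip(row, covariates)) for row in weights]
    return softmax(scores)


def simulate_path(weights_by_state, column_covariates, start_state, rng):
    """Draw one latent state per column, left to right (a simplification).

    weights_by_state[s] is shared by every node labelled s, so the same
    regression is reused in each column with that column's covariates.
    """
    path = [start_state]
    for covariates in column_covariates[:-1]:
        probs = transition_probs(weights_by_state[path[-1]], covariates)
        path.append(rng.choices(range(N_STATES), weights=probs)[0])
    return path


rng = random.Random(0)
n_covariates = 2
# Hypothetical weights indexed as weights_by_state[source][destination][covariate]
weights_by_state = [
    [[rng.gauss(0.0, 1.0) for _ in range(n_covariates)] for _ in range(N_STATES)]
    for _ in range(N_STATES)
]
# Hypothetical fixed covariates, one vector per column
column_covariates = [[1.0, 0.1 * t] for t in range(6)]

path = simulate_path(weights_by_state, column_covariates, start_state=2, rng=rng)
print(path)
```

Because the weights are indexed by the source state's label rather than by column, every identically labelled node uses the same regression function, and only the covariates change from column to column.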

    I thought Infer.NET might be just the framework that was set up to handle this type of CRF.

    Does this seem feasible?  Any help or advice would be much appreciated.

    Sunday, February 8, 2015 9:10 AM

All replies

  • Yes, this type of HMM can be expressed in the Infer.NET modelling API.  See the user guide page on Markov chains.  A CRF would be more difficult.  Note that an HMM and CRF are not equivalent if you are learning the transition probabilities, so you need to be sure which one you want. 
    Monday, February 9, 2015 2:45 PM