Answered by:
weighted likelihood (Migrated from community.research.microsoft.com)
Question

TimSal posted on 02-22-2011 11:16 AM
I am estimating a large model where I want to down-weight the likelihood contributions of some of the older observations using a weighted log-likelihood function, i.e. log(p(y)) = w1*log(p(y1)) + w2*log(p(y2)).
Is there a convenient way to do this, apart from manually editing the messages in the generated code?
I know it would probably be better to formally allow the parameters of the model to vary over time, but this would greatly increase the size of the model, and Infer.NET does not yet handle Markov models optimally.
I can down-weight observations by making them conditional on unobserved boolean random variables, i.e.
using (Variable.If(Variable.Bernoulli(weight))) { process observations }
but this has the undesirable effect of making the actual weights dependent on the observations (as these influence the posterior of the boolean r.v. they depend on). Is there any way to get around this? Is it possible to fix the posterior distribution of the boolean r.v. in some way, or does anyone know another method of weighting the likelihood terms?
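To make the coupling concrete: under the Bernoulli gate, the evidence for a single gated observation becomes w*p(y) + (1-w), so the gate's posterior (the effective weight) depends on how well y fits the model, rather than staying fixed at w. A small numerical sketch in plain Python (not Infer.NET), assuming a standard-normal likelihood purely for illustration:

```python
import math

def gaussian_logpdf(y, mean=0.0, var=1.0):
    """Log density of N(mean, var) at y."""
    return -0.5 * math.log(2 * math.pi * var) - (y - mean) ** 2 / (2 * var)

def gate_posterior(y, w, mean=0.0, var=1.0):
    """Posterior of the gate b ~ Bernoulli(w) when the observation y is
    only scored inside Variable.If(b):
        p(b=1 | y) = w * p(y) / (w * p(y) + (1 - w)).
    When b is false the observation is unconstrained and contributes 1."""
    p_y = math.exp(gaussian_logpdf(y, mean, var))
    return w * p_y / (w * p_y + (1 - w))

# The gate posterior moves with the data, so the effective weight is no
# longer the fixed constant w that was intended.
print(gate_posterior(0.0, 0.5))   # y near the mode: gate pulled up
print(gate_posterior(4.0, 0.5))   # y in the tail: gate pulled down
```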
Friday, June 3, 2011 6:19 PM
Answers

John Guiver replied on 02-24-2011 10:26 AM
Again, that's going to depend on what you are doing - i.e. which of the many types of HMM are you looking at, and what do you want to learn? If you are learning transition matrix parameters then it is probably best right now to use a shared variable approach - this will guarantee getting the schedule right. Can you explain further why you think that this will be so inefficient - how many time steps are you talking about?
Anyway, a shared variable approach will involve creating a 'slice' model class which has an input state, output state, transition parameters, emission parameters, and emission variable. All variables will have 'variable' prior distributions (i.e. distributions that can be changed at run time without recompilation of the model - for example Variable&lt;Discrete&gt;); this is to avoid model compilation at each step of the chain. The output state will be constrained to, rather than derived from, the downstream 'prior' for the backward pass. You will also need some message initialisation (again in a form that does not trigger compilation of the model) in order to break symmetries. You also need a 'start of chain' model - however, this has large overlap with the slice model and can use the same class. Then create one 'start of chain' model instance and one 'slice model' instance, and use shared variables. I can help show how to implement this (I have used a similar approach on another project), but I would need a precise model description, and preferably some toy data to illustrate it.
If you are not learning transition parameters then you may be able to use the unrolled version - we can cope with much larger unrolled models than we could a year ago due to improvements in the model compiler.
John
 Marked as answer by Microsoft Research Friday, June 3, 2011 6:20 PM
Friday, June 3, 2011 6:20 PM
All replies

TimSal replied on 02-23-2011 4:23 AM
I managed to obtain the desired result (in a very limited setting) by implementing the following new factor: in one direction it passes messages on unaltered; in the other, it weights the natural parameters of the message by the specified weight. I would appreciate any feedback.
// Requires the Infer.NET factor/distribution namespaces, e.g.:
// using MicrosoftResearch.Infer.Distributions;
// using MicrosoftResearch.Infer.Factors;

public static class MyFactor
{
    // Identity factor: varOut = varIn; the weight only affects messages.
    [ParameterNames("varOut", "varIn", "weight")]
    public static double WeightContribution(double varIn, double weight)
    {
        return varIn;
    }

    [FactorMethod(typeof(MyFactor), "WeightContribution")]
    [Quality(QualityBand.Experimental)]
    public static class WeightContributionOp
    {
        // Forward message: pass the incoming distribution through unchanged.
        public static Gaussian VarOutAverageConditional([SkipIfUniform] Gaussian varIn)
        {
            return varIn;
        }

        // Backward message: scale the natural parameters by the weight,
        // i.e. raise the Gaussian to the power 'weight' by multiplying
        // its precision (dividing its variance) by the weight.
        public static Gaussian VarInAverageConditional([SkipIfUniform] Gaussian varOut, double weight)
        {
            double mean, variance;
            varOut.GetMeanAndVariance(out mean, out variance);
            return new Gaussian(mean, variance / weight);
        }

        [Skip]
        public static double LogEvidenceRatio(Gaussian varOut) { return 0.0; }
    }
}
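As a sanity check on the backward message: dividing the variance by w (equivalently, multiplying the precision by w) raises the Gaussian density to the power w up to a normalising constant, which is exactly the weighted likelihood term w*log(p(y)). A quick numerical check in plain Python (standalone, not Infer.NET):

```python
import math

def gaussian_logpdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def log_diff(x, mean, var, w):
    """log N(x; mean, var/w) minus w * log N(x; mean, var).
    If the factor is right, this is constant in x - the two sides
    differ only by a normalising constant."""
    return gaussian_logpdf(x, mean, var / w) - w * gaussian_logpdf(x, mean, var)

# The difference is the same at very different x values, so the
# precision-scaled Gaussian is proportional to the w-th power.
print(log_diff(-1.7, 0.3, 2.0, 0.25))
print(log_diff(5.2, 0.3, 2.0, 0.25))
```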
Friday, June 3, 2011 6:19 PM 
John Guiver replied on 02-23-2011 8:53 AM
Hi Tim
Congratulations on implementing a new factor!
In order for us to advise, we really need to know exactly what your model is. You can in fact do large-scale HMM models efficiently in Infer.NET right now (although it is not straightforward to get this right, and we are actively working on improving the API), so if you give more detail, we may be able to do what you want without a workaround. At the least it should help us provide more focused advice, because right now it is not clear how your new factor fits in the model, or why it only acts in one direction.
John
Friday, June 3, 2011 6:19 PM 
TimSal replied on 02-24-2011 3:39 AM
Hi John,
I'm working on several related models. What they have in common is that the dimension of the state vector (the number of parameters at any one time) is relatively large compared to the number of time periods. This probably means that "unrolling" the HMM will not result in a great loss of efficiency. However, it will of course be important to get the message-passing schedule right (i.e. forward-backward). What is the best way to achieve this? I am currently calling the compiled models from Matlab, which makes using shared variables somewhat impractical.
Friday, June 3, 2011 6:20 PM 
John Guiver replied on 02-24-2011 10:26 AM
Again, that's going to depend on what you are doing - i.e. which of the many types of HMM are you looking at, and what do you want to learn? If you are learning transition matrix parameters then it is probably best right now to use a shared variable approach - this will guarantee getting the schedule right. Can you explain further why you think that this will be so inefficient - how many time steps are you talking about?
Anyway, a shared variable approach will involve creating a 'slice' model class which has an input state, output state, transition parameters, emission parameters, and emission variable. All variables will have 'variable' prior distributions (i.e. distributions that can be changed at run time without recompilation of the model - for example Variable&lt;Discrete&gt;); this is to avoid model compilation at each step of the chain. The output state will be constrained to, rather than derived from, the downstream 'prior' for the backward pass. You will also need some message initialisation (again in a form that does not trigger compilation of the model) in order to break symmetries. You also need a 'start of chain' model - however, this has large overlap with the slice model and can use the same class. Then create one 'start of chain' model instance and one 'slice model' instance, and use shared variables. I can help show how to implement this (I have used a similar approach on another project), but I would need a precise model description, and preferably some toy data to illustrate it.
If you are not learning transition parameters then you may be able to use the unrolled version - we can cope with much larger unrolled models than we could a year ago due to improvements in the model compiler.
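For reference, the schedule that the shared-variable construction guarantees is the classic forward-backward sweep over slices: filter forward through the chain, then pass backward messages from the future before combining. A minimal plain-Python sketch of that schedule for a generic discrete-state chain (illustrative only - the function names and the toy likelihood table are assumptions, and this is not the Infer.NET shared-variable API):

```python
def forward_backward(pi, A, lik):
    """Forward-backward smoothing for a K-state chain over T steps.
    pi: initial distribution (length K); A[i][j] = p(next=j | cur=i);
    lik[t][k] = likelihood of observation t under state k."""
    T, K = len(lik), len(pi)

    def normalize(v):
        s = sum(v)
        return [x / s for x in v]

    # Forward sweep: filtered state distributions.
    alpha = [normalize([pi[k] * lik[0][k] for k in range(K)])]
    for t in range(1, T):
        pred = [sum(alpha[-1][i] * A[i][k] for i in range(K)) for k in range(K)]
        alpha.append(normalize([pred[k] * lik[t][k] for k in range(K)]))

    # Backward sweep: messages carrying evidence from the future,
    # analogous to constraining each slice's output state.
    beta = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = normalize([
            sum(A[k][j] * lik[t + 1][j] * beta[t + 1][j] for j in range(K))
            for k in range(K)
        ])

    # Smoothed marginals combine both sweeps.
    return [normalize([alpha[t][k] * beta[t][k] for k in range(K)])
            for t in range(T)]
```

The point of the shared-variable setup is that Infer.NET's scheduler is forced into exactly this sweep order, rather than an arbitrary schedule over an unrolled graph.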
John
 Marked as answer by Microsoft Research Friday, June 3, 2011 6:20 PM
Friday, June 3, 2011 6:20 PM