Asked by: Andrew Mao

What priors can be used to learn a (mixture of) Gamma distribution?

Question

As I noted in this thread, I'm trying to learn a gamma distribution; specifically, several of them as part of a mixture model. Similar to the Gaussian mixture examples, I've created prior distributions for each of the gamma parameters (also gamma distributions, since the parameters are positive). Here's the main part of the model:
```csharp
public TaskRateModel(int components, Priors p)
{
    evidence = Variable.Bernoulli(0.5).Named("evidence");
    // See http://research.microsoft.com/en-us/um/cambridge/projects/infernet/docs/Computing%20model%20evidence%20for%20model%20selection.aspx
    using (Variable.If(evidence))
    {
        // Define ranges
        k = new Range(components).Named("k");
        numUsers = Variable.New<int>().Named("numUsers");
        n = new Range(numUsers).Named("n");
        z = Variable.Array<int>(n).Named("z");
        observedSessions = Variable.Array<int>(n).Named("observedSessions");
        ns = new Range(observedSessions[n]).Named("userSessions");
        maxSessions = Variable.New<int>();
        Range ms = new Range(maxSessions);
        totalSessions = Variable.Array<double>(n).Named("totalSessions");
        taskRatesObserved = Variable.Array(Variable.Array<double>(ns), n).Named("taskRatesObserved");
        taskRatesObserved.SetValueRange(ms);
        sessionTimesObserved = Variable.Array(Variable.Array<double>(ns), n).Named("sessionTimesObserved");
        sessionTimesObserved.SetValueRange(ms);

        // Set up priors
        sessionCountShapePriors = Variable.Array<Gamma>(k).Named("sessionCountShapePriors");
        sessionCountShapePriors.ObservedValue = p.SessionCountShapes;
        sessionCountRatePriors = Variable.Array<Gamma>(k).Named("sessionCountRatePriors");
        sessionCountRatePriors.ObservedValue = p.SessionCountRates;
        taskRateShapePriors = Variable.Array<Gamma>(k).Named("taskRateShapePriors");
        taskRateShapePriors.ObservedValue = p.TaskRateShapes;
        taskRateRatePriors = Variable.Array<Gamma>(k).Named("taskRateRatePriors");
        taskRateRatePriors.ObservedValue = p.TaskRateRates;
        sessionTimeShapePriors = Variable.Array<Gamma>(k).Named("sessionTimeShapePriors");
        sessionTimeShapePriors.ObservedValue = p.SessionTimeShapes;
        sessionTimeRatePriors = Variable.Array<Gamma>(k).Named("sessionTimeRatePriors");
        sessionTimeRatePriors.ObservedValue = p.SessionTimeRates;

        sessionCountShapeDists = Variable.Array<double>(k).Named("sessionCountShapes");
        sessionCountShapeDists[k] = Variable<double>.Random(sessionCountShapePriors[k]);
        sessionCountRateDists = Variable.Array<double>(k).Named("sessionCountRates");
        sessionCountRateDists[k] = Variable<double>.Random(sessionCountRatePriors[k]);
        taskRateShapeDists = Variable.Array<double>(k).Named("taskRateShapes");
        taskRateShapeDists[k] = Variable<double>.Random(taskRateShapePriors[k]);
        taskRateRateDists = Variable.Array<double>(k).Named("taskRateRates");
        taskRateRateDists[k] = Variable<double>.Random(taskRateRatePriors[k]);
        sessionTimeShapeDists = Variable.Array<double>(k).Named("sessionTimeShapes");
        sessionTimeShapeDists[k] = Variable<double>.Random(sessionTimeShapePriors[k]);
        sessionTimeRateDists = Variable.Array<double>(k).Named("sessionTimeRates");
        sessionTimeRateDists[k] = Variable<double>.Random(sessionTimeRatePriors[k]);

        // Define mixture
        weightPrior = Variable.New<Dirichlet>().Named("weightPrior");
        weightPrior.ObservedValue = p.MixtureProb;
        weights = Variable<Vector>.Random(weightPrior).Named("weights");
        weights.SetValueRange(k);

        using (Variable.ForEach(n))
        {
            z[n] = Variable.Discrete(weights);
            using (Variable.Switch(z[n]))
            {
                totalSessions[n] = Variable.GammaFromShapeAndRate(sessionCountShapeDists[z[n]], sessionCountRateDists[z[n]]);
                using (Variable.ForEach(ns))
                {
                    taskRatesObserved[n][ns] = Variable.GammaFromShapeAndRate(taskRateShapeDists[z[n]], taskRateRateDists[z[n]]);
                    sessionTimesObserved[n][ns] = Variable.GammaFromShapeAndRate(sessionTimeShapeDists[z[n]], sessionTimeRateDists[z[n]]);
                }
            }
        }
    }
}
```
These are the priors I'm using to train the model:
```csharp
new Priors
{
    MixtureProb       = new Dirichlet(Enumerable.Repeat<double>(1, k).ToArray()),
    SessionCountShapes = Util.ArrayInit(k, i => Gamma.FromShapeAndScale(0.1, 10)),
    SessionCountRates  = Util.ArrayInit(k, i => Gamma.FromShapeAndScale(0.1, 10)),
    TaskRateShapes     = Util.ArrayInit(k, i => Gamma.FromShapeAndScale(0.1, 10)),
    TaskRateRates      = Util.ArrayInit(k, i => Gamma.FromShapeAndScale(0.1, 10)),
    SessionTimeShapes  = Util.ArrayInit(k, i => Gamma.FromShapeAndScale(0.1, 10)),
    SessionTimeRates   = Util.ArrayInit(k, i => Gamma.FromShapeAndScale(0.1, 10)),
};
```
I've tried doing inference on this model in several ways, with different algorithms and factors.
- When using EP with any Gamma factor, or VMP with GammaFromShapeAndScale or GammaFromMeanAndVariance, I get an error that seems to mean these factors do not support stochastic parameters (even though that appears to be supported according to the InferenceManager). There are (#components × #distributions) of these errors in total, as expected:
```text
Additional information: MessageTransform failed with 9 error(s) and 0 warning(s):
Error 0: This model is not supported with VariationalMessagePassing due to Gamma.Sample(double sample, double shape, double scale). Try using a different algorithm or expressing the model differently in Gamma.Sample(sessionCountShapes_k_, sessionCountRates_k_)
Details:
[0] System.ArgumentException: Gamma is not of type double for argument 2 of method GammaFromShapeAndScaleOp.AverageLogFactor(double sample = double, double shape = Gamma, double scale = Gamma)
[1] System.ArgumentException: Gamma is not of type double for argument 2 of method GammaFromShapeAndScaleOp.AverageLogFactor(Gamma sample = double, double shape = Gamma, double scale = Gamma)
```
- When using VMP with GammaFromShapeAndRate (the only case that compiles), I get the following warnings and eventually an ImproperDistributionException due to Gamma(NaN, NaN) during inference:
```text
Compiling model...compilation had 9 warning(s).
[1] GammaFromShapeAndRateOpBase.ShapeAverageLogarithm(Gamma.PointMass(sessionTimesObserved[n][userSessions]), sessionTimeShapes_marginal_F[k], sessionTimeRates_marginal_F[k], sessionTimeShapes_k__rep_B[n][k][userSessions]) has quality band Experimental which is less than the recommended quality band (Preview)
[2] GammaFromShapeAndRateOpBase.ShapeAverageLogarithm(Gamma.PointMass(taskRatesObserved[n][userSessions]), taskRateShapes_marginal_F[k], taskRateRates_marginal_F[k], taskRateShapes_k__rep_B[n][k][userSessions]) has quality band Experimental which is less than the recommended quality band (Preview)
[3] GammaFromShapeAndRateOpBase.ShapeAverageLogarithm(Gamma.PointMass(totalSessions[n]), sessionCountShapes_marginal_F[k], sessionCountRates_marginal_F[k], sessionCountShapes_k__B[n][k]) has quality band Experimental which is less than the recommended quality band (Preview)
[4] GammaFromShapeAndRateOpBase.AverageLogFactor(Gamma.PointMass(sessionTimesObserved[n][userSessions]), sessionTimeShapes_marginal_F[k], sessionTimeRates_marginal_F[k]) has quality band Experimental which is less than the recommended quality band (Preview)
[5] GammaFromShapeAndRateOpBase.AverageLogFactor(Gamma.PointMass(taskRatesObserved[n][userSessions]), taskRateShapes_marginal_F[k], taskRateRates_marginal_F[k]) has quality band Experimental which is less than the recommended quality band (Preview)
[6] GammaFromShapeAndRateOpBase.AverageLogFactor(Gamma.PointMass(totalSessions[n]), sessionCountShapes_marginal_F[k], sessionCountRates_marginal_F[k]) has quality band Experimental which is less than the recommended quality band (Preview)
[7] GammaFromShapeAndRateOpBase.ShapeAverageLogarithm(Gamma.PointMass(sessionTimesObserved[n][userSessions]), sessionTimeShapes_marginal_F[k], sessionTimeRates_marginal_F[k], sessionTimeShapes_k__rep_B[n][k][userSessions]) has quality band Experimental which is less than the recommended quality band (Preview)
[8] GammaFromShapeAndRateOpBase.ShapeAverageLogarithm(Gamma.PointMass(taskRatesObserved[n][userSessions]), taskRateShapes_marginal_F[k], taskRateRates_marginal_F[k], taskRateShapes_k__rep_B[n][k][userSessions]) has quality band Experimental which is less than the recommended quality band (Preview)
[9] GammaFromShapeAndRateOpBase.AverageLogFactor(Gamma.PointMass(totalSessions[n]), sessionCountShapes_marginal_F[k], sessionCountRates_marginal_F[k]) has quality band Experimental which is less than the recommended quality band (Preview)
```
Is there any hope of learning a Gamma distribution like we can learn mean/precision for Gaussians, or should we resort to a positively constrained Gaussian as you described in the other thread?
I would also be open to learning a single gamma (a point mass) for this mixture model instead of a distribution over gammas, but that doesn't seem possible to specify in Infer.NET despite the several ways I've tried: if the prior is a point mass, it's impossible to update the posterior and feed it back into the model. I'd be open to any suggestions for that, too.
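For concreteness, here is roughly what the positively constrained Gaussian alternative would look like if I went that route: log-transform the positive observations and model them with a conjugate Gaussian (i.e., treat the data as log-normal), exactly as in the Gaussian mixture examples. This is only an illustrative fragment, not my actual model; the variable names are my own and the log transform happens outside the model:

```csharp
// Illustrative sketch only: model log(x) as Gaussian (x is log-normal),
// which gives the same conjugate mean/precision updates as the Gaussian
// mixture examples, instead of learning the Gamma shape directly.
var mean = Variable.GaussianFromMeanAndPrecision(0, 0.01).Named("mean");
var prec = Variable.GammaFromShapeAndRate(1, 1).Named("prec");
var logData = Variable.Array<double>(n).Named("logData");
logData[n] = Variable.GaussianFromMeanAndPrecision(mean, prec).ForEach(n);
// Observations must be strictly positive before the transform, e.g.:
// logData.ObservedValue = rawData.Select(Math.Log).ToArray();
```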
 Edited by Andrew Mao Monday, December 29, 2014 7:47 PM
Monday, December 29, 2014 7:43 PM
All replies

It turns out that the problem here was that one of the input values was NaN (from a 0/0).
It's still the case that we can only use VMP with GammaFromShapeAndRate with this type of model (and get the experimental warnings), but it does seem to work.
Sorry about the long post. It might be more user-friendly for future users if a more informative error were thrown on NaN inputs to double values.
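In the meantime, for anyone hitting the same Gamma(NaN, NaN) failure: a 0/0 in the observed data silently produces NaN, which only surfaces much later as an ImproperDistributionException. A small guard applied to the data before setting ObservedValue catches it at the source (the helper below is my own, not part of Infer.NET):

```csharp
using System;

static class ObservedValueGuard
{
    // Throws immediately if any observed input is NaN or infinite,
    // instead of letting it surface later as Gamma(NaN, NaN) during inference.
    public static double[] CheckFinite(double[] values, string name)
    {
        for (int i = 0; i < values.Length; i++)
        {
            if (double.IsNaN(values[i]) || double.IsInfinity(values[i]))
                throw new ArgumentException(
                    $"{name}[{i}] = {values[i]}: check for 0/0 or x/0 upstream.");
        }
        return values;
    }
}
```

Usage would be, for example, `totalSessions.ObservedValue = ObservedValueGuard.CheckFinite(data, "totalSessions");`.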
 Marked as answer by Andrew Mao Monday, January 12, 2015 6:07 AM
 Edited by Andrew Mao Monday, January 12, 2015 7:53 PM
 Unmarked as answer by Andrew Mao Monday, January 12, 2015 7:53 PM
Monday, January 12, 2015 6:06 AM