# Building Bayesian change-point detection model • ### Question

• I am attempting to implement a simple Bayesian change-point detection model. As a starting point, I use the model from PyMC3 examples section:

http://docs.pymc.io/notebooks/getting_started.html#Case-study-2:-Coal-mining-disasters

The data is disaster count in coal mines from <g class="gr_ gr_19 gr-alert gr_gramm gr_inline_cards gr_run_anim Grammar multiReplace" data-gr-id="19" id="19">year</g> 1851 to <g class="gr_ gr_20 gr-alert gr_gramm gr_inline_cards gr_run_anim Grammar multiReplace" data-gr-id="20" id="20">year</g> 1962. The idea is to model each a change point (year) as a separator between two disaster rates. There is <g class="gr_ gr_29 gr-alert gr_gramm gr_inline_cards gr_run_anim Grammar only-ins replaceWithoutSep" data-gr-id="29" id="29">early</g> rate (Poisson variable) before the change-point and late rate (Poisson variable) after the change-point. The goal is to infer the probability over years where this change-point is.

I have come up with the following model in Infer.NET:

```            int[] disaster_data = new int[] {4, 5, 4, 0, 1, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6,
3, 3, 5, 4, 5, 3, 1, 4, 4, 1, 5, 5, 3, 4, 2, 5,
2, 2, 3, 4, 2, 1, 3, -999, 2, 1, 1, 1, 1, 3, 0, 0,
1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1,
0, 1, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2,
3, 3, 1, -999, 2, 1, 1, 1, 1, 2, 4, 2, 0, 0, 1, 4,
0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1 };

Range n = new Range(disaster_data.Length);

var switchpoint = Variable.DiscreteUniform(n);

var early_rate = Variable.Exp(1);
var late_rate = Variable.Exp(1);

var data = Variable.Array<int>(n);

using (ForEachBlock block = Variable.ForEach(n))
{
//var pt = Variable.Observed<int>(n.SizeAsInt);
using (Variable.If(switchpoint > block.Index))
data[n] = Variable.Poisson(early_rate);
using (Variable.IfNot(switchpoint > block.Index))
data[n] = Variable.Poisson(late_rate);
}

data.ObservedValue = disaster_data;

InferenceEngine engine = new InferenceEngine();

var switchpointMarginal = engine.Infer<Dirichlet>(switchpoint);```

When I run the program, I get the error of zero probability model. Where can be the issue? I suspect the problem is with this condition where I check whether <g class="gr_ gr_44 gr-alert gr_spell gr_inline_cards gr_run_anim ContextualSpelling" data-gr-id="44" id="44">switchpoint</g> is bigger than the current year (as run over by the Variable.Foreach(n)). Or should I use Switch somehow?

Thanks

• Edited by Friday, November 3, 2017 8:38 PM
Friday, November 3, 2017 8:37 PM

### Answers

• The model is correct now, but it seems you have found a bug in Infer.NET.  It will be fixed in the next version.  Meanwhile, you can work-around with a switch block, like so:

```            var switchpoint = Variable.DiscreteUniform(n.Clone());
switchpoint.Name = nameof(switchpoint);

using (Variable.Switch(switchpoint))
{
var early_rate = Variable.GammaFromShapeAndRate(2.0, 2.0).Named("early_rate");
var late_rate = Variable.GammaFromShapeAndRate(2.0, 2.0).Named("late_rate");

using (ForEachBlock block = Variable.ForEach(n))
{
using (Variable.If(switchpoint > block.Index))
data[n] = Variable.Poisson(early_rate);
using (Variable.IfNot(switchpoint > block.Index))
data[n] = Variable.Poisson(late_rate);
}
}
```

• Marked as answer by Monday, November 6, 2017 8:31 PM
Monday, November 6, 2017 4:41 PM

### All replies

• Sorry for the grammarly tags, I cannot remove them! Next time I will disable it when posting questions. Sorry!
Friday, November 3, 2017 8:39 PM
• Here is my latest code where I tried to introduce the years array.

```            int[] disaster_data = new int[] {4, 5, 4, 0, 1, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6,
3, 3, 5, 4, 5, 3, 1, 4, 4, 1, 5, 5, 3, 4, 2, 5,
2, 2, 3, 4, 2, 1, 3, -999, 2, 1, 1, 1, 1, 3, 0, 0,
1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1,
0, 1, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2,
3, 3, 1, -999, 2, 1, 1, 1, 1, 2, 4, 2, 0, 0, 1, 4,
0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1 };

int[] years_data = Enumerable.Range(0, 111).ToArray();

Range n = new Range(disaster_data.Length).Named("years_idx");

var switchpoint = Variable.DiscreteUniform(disaster_data.Length).Named("switchpoint");

var early_rate = Variable.Exp(1).Named("early_rate");
var late_rate = Variable.Exp(1).Named("late_rate"); ;

var data = Variable.Array<int>(n).Named("data");
var years = Variable.Array<int>(n).Named("years");

using (Variable.ForEach(n))
{
using (Variable.If(switchpoint > years[n]))
data[n] = Variable.Poisson(late_rate);
using (Variable.IfNot(switchpoint > years[n]))
data[n] = Variable.Poisson(early_rate);
}

data.ObservedValue = disaster_data;
years.ObservedValue = years_data;

InferenceEngine engine = new InferenceEngine();
engine.Compiler.GenerateInMemory = false;
engine.Compiler.WriteSourceFiles = true;
engine.Compiler.IncludeDebugInformation = true;

try
{
var switchpointMarginal = engine.Infer<Discrete>(switchpoint);
Console.WriteLine(switchpointMarginal);
}
catch (MicrosoftResearch.Infer.Maths.AllZeroException exception)
{
Console.WriteLine(exception);
}

Console.ReadKey();```

Friday, November 3, 2017 9:27 PM
• The error message here is correct.  The number -999 has zero probability of being drawn from a Poisson distribution.
Friday, November 3, 2017 10:35 PM
• Thanks! This is something that I wanted to address later with missing values but forgot to remove or impute that data for now.

I put in some values from the valid range to get rid of unobserved (-999) values. The inference gives me a flat posterior. Is there another issue that I can't see?

Friday, November 3, 2017 10:52 PM
• Both rates are the same constant, so there's no reason to put the changepoint anywhere.
Friday, November 3, 2017 11:06 PM
• Interesting! In the PyMC3 example the rates are both set to 1.0. To my understanding the data should update the two parameters. Or it can't because it is fixed?

I tried to put a prior for each of the rate parameters. Since there is no Exponential distribution in Infer.NET, I went with Gamma(1, param) where param is a RV that has a flat prior set. Hopefully this would allow the model to learn the rate parameters instead of being fixed. Am I correct here?

My current code is now:

```Gamma early_rate_prior = Gamma.Uniform();
Gamma late_rate_prior = Gamma.Uniform();

var early_rate_param = Variable.Random<double>(early_rate_prior);
var late_rate_param = Variable.Random<double>(late_rate_prior);

var early_rate = Variable.GammaFromShapeAndRate(1.0, early_rate_param).Named("early_rate");
var late_rate = Variable.GammaFromShapeAndRate(1.0, late_rate_param).Named("late_rate");```

Now I get the following exception: "The distribution is improper (Gamma(NaN, NaN)). Cannot compute expectations."

Friday, November 3, 2017 11:52 PM
• That is an improper prior.  You must use proper priors in Infer.NET.
Saturday, November 4, 2017 11:26 AM
• I definitely lack some basics then. Correct me if I am wrong:

• Isn't the Gamma prior the right choice for the Poisson distribution (support [0, +infty])?
• Isn't Poisson distribution special case of Gamma if shape=1.0 and rate=lambda/mean?
• To define the prior, the rate parameter, shouldn't I use a distribution that has a support [0, +infty]?

What do you think would be the right parametrization? May be you can suggest some reading that I can go and learn?

Thanks!

• Edited by Saturday, November 4, 2017 10:33 PM
Saturday, November 4, 2017 10:27 PM
• The meaning of the term "improper prior" can be found on Wikipedia.
Saturday, November 4, 2017 10:40 PM
• Thanks for the pointer.

When I set the two rates RVs as fixed Gamma distributions (with fixed parameters):

```var early_rate = Variable.GammaFromShapeAndRate(2.0, 2.0).Named("early_rate");
var late_rate = Variable.GammaFromShapeAndRate(2.0, 2.0).Named("late_rate");```

I get a different error:

```System.ArgumentException has been thrown
probTrue = 1 is not in [0,1]
Parameter name: probTrue```

I am pretty sure that Gamma is a proper prior for Poisson distribution parameter lambda. What is this "probTrue"?

Sunday, November 5, 2017 7:10 AM
• The model is correct now, but it seems you have found a bug in Infer.NET.  It will be fixed in the next version.  Meanwhile, you can work-around with a switch block, like so:

```            var switchpoint = Variable.DiscreteUniform(n.Clone());
switchpoint.Name = nameof(switchpoint);

using (Variable.Switch(switchpoint))
{
var early_rate = Variable.GammaFromShapeAndRate(2.0, 2.0).Named("early_rate");
var late_rate = Variable.GammaFromShapeAndRate(2.0, 2.0).Named("late_rate");

using (ForEachBlock block = Variable.ForEach(n))
{
using (Variable.If(switchpoint > block.Index))
data[n] = Variable.Poisson(early_rate);
using (Variable.IfNot(switchpoint > block.Index))
data[n] = Variable.Poisson(late_rate);
}
}
```

• Marked as answer by Monday, November 6, 2017 8:31 PM
Monday, November 6, 2017 4:41 PM
• Thank you, Tom! Much appreciated for unblocking me on this example!

I did not try the model yet but I will soon get to it and keep playing with it.

EDIT: Just updated my code and I can confirm that the model works. The switchpoint posterior matches very closely (visually) that computed using PyMC3 that runs NUTS under the hood.
• Edited by Tuesday, November 7, 2017 4:29 AM
Monday, November 6, 2017 10:07 PM