Iterative observation of data (arrays) and a problem with jagged array type specification in Infer.NET
Question

Hi everybody,
I read the document about the CyclingTime example and tried to implement my model in the same way. In my model I don't have a single averageTime and trafficNoise; instead I have a matrix and two arrays of variables that I want to infer.
I have several series of observations, so following that document I have to observe data, infer posteriors, set the posteriors as the new priors, and then observe the next set of data.
For example, one of my variables is defined as a jagged array of Gaussian variables:
protected VariableArray<VariableArray<double>, double[][]> w;
Now my InferModelData method returns a ModelData object, which I will use as the priors for the next step:
public ModelData InferModelData(double[][] trainingData)
{
    ModelData posteriors;
    genesT1.ObservedValue = trainingData[0];
    genesT2.ObservedValue = trainingData[1];
    posteriors.w = InferenceEngine.Infer<VariableArray<double>, double[][]>(w);
    posteriors.alpha = InferenceEngine.Infer<VariableArray<double>>(alpha);
    posteriors.beta = InferenceEngine.Infer<VariableArray<double>>(beta);
    posteriors.genesT1 = InferenceEngine.Infer<VariableArray<double>>(genesT1);
    return posteriors;
}
and ModelData is defined as follows:
public struct ModelData
{
    public VariableArray<double> genesT1;
    public VariableArray<double> alpha;
    public VariableArray<double> beta;
    public VariableArray<VariableArray<double>, double[][]> w;
}
I have two problems here:
* First, I don't know why the line below in InferModelData has errors. How should I determine the return type for w?
posteriors.w = InferenceEngine.Infer<VariableArray<double>, double[][]>(w);
** My second question: for a Gaussian distribution with a Gamma variance, how can I determine the type? The prior on the w elements is Gaussian.FromMeanAndVariance(0, Variable.GammaFromShapeAndRate(1, 1)), i.e. w has a Laplace distribution. But I can't pass this distribution to the initialization method using the Gaussian type.
Can anyone please explain how I can determine this type?
initialData initPriors = new initialData(
    Gaussian.FromMeanAndVariance(1, 0.002),
    // Gaussian.FromMeanAndVariance(1, 0.002),
    Gaussian.FromMeanAndVariance(1, 0.002),
    Gaussian.FromMeanAndVariance(1, 0.002),
    Gaussian.FromMeanAndVariance(0, Variable.GammaFromShapeAndRate(1, 1))); // ERROR HERE
The constructor for initialData is:
public initialData(Gaussian genesT1, Gaussian alpha, Gaussian beta, Gaussian w)
{
    genesT1Dist = genesT1;
    alphaDist = alpha;
    betaDist = beta;
    wDist = w;
}
Thanks a lot for reading this long post.
Any help is appreciated :)
Zahra
Monday, June 2, 2014 1:55 PM
All replies

Have a separate variable for the prior, and define w using the special Random factor. When doing incremental learning, set the observed value of the prior to the inferred posterior. This is shown in the recommender example; take a careful look at how the user and item traits are defined and inferred.
var userTraits = Variable.Array(Variable.Array<double>(trait), user);
var userTraitPrior = Variable.Array(Variable.Array<Gaussian>(trait), user);
userTraits[user][trait] = Variable<double>.Random(userTraitPrior[user][trait]);
userTraitPrior.ObservedValue = Util.ArrayInit(
    numUsers,
    u => Util.ArrayInit(numTraits, t => Gaussian.FromMeanAndVariance(0.0, 1.0)));
// Define the rest of the model
// Observe data
var userTraitPosterior = engine.Infer<Gaussian[][]>(userTraits);
userTraitPrior.ObservedValue = userTraitPosterior;
// Observe data
...
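To make the incremental-learning loop explicit, here is a hedged sketch of how the pattern above repeats over several data batches. The ratingBatches variable and the batch-observation step are illustrative assumptions, not part of the actual recommender example:

```csharp
// Sketch only: assumes userTraits, userTraitPrior, and engine are set up
// as in the snippet above, and that each iteration observes a fresh slice
// of the training data. ratingBatches is a hypothetical collection.
foreach (var batch in ratingBatches)
{
    // ... set the ObservedValue of the data variables to this batch ...

    // Infer the posterior over the traits from this batch,
    Gaussian[][] userTraitPosterior = engine.Infer<Gaussian[][]>(userTraits);

    // then feed it back in as the prior for the next batch.
    userTraitPrior.ObservedValue = userTraitPosterior;
}
```

Because the traits are defined via the Random factor from userTraitPrior, reassigning the prior's ObservedValue is all that is needed to carry knowledge from one batch to the next (in this non-hierarchical setting).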
 Edited by Yordan Zaykov (Microsoft employee) Monday, June 2, 2014 11:54 PM
 Proposed as answer by Yordan Zaykov (Microsoft employee) Monday, June 2, 2014 11:54 PM
 Unproposed as answer by Yordan Zaykov (Microsoft employee) Tuesday, June 3, 2014 4:08 PM
 Marked as answer by RazinR Wednesday, June 4, 2014 8:43 AM
Monday, June 2, 2014 3:12 PM 
Dear Yordan,
Thanks a lot for your answer. I followed the recommender system example and was able to improve the code. But I still have one question.
My w variable has a Gaussian distribution with a Gamma variance (a Laplace distribution). I can't just declare its type as Gaussian. Am I right?
I need to pass this as the prior to my ModelData class and also return the posterior of w from the InferModelData function. The problem is with the Gaussian type. Should I pass the mean and variance of w as separate priors instead of w itself, and then infer the mean and variance rather than w directly?
In this case, if the Gamma changes, do I still have a Laplace distribution? I mean, I think I need this Gamma(1)...
Thank you a lot
Gaussian alphaPrior = Gaussian.FromMeanAndVariance(1, 0.002);
Gaussian betaPrior = Gaussian.FromMeanAndVariance(1, 0.002);
Gaussian genesT1Prior = Gaussian.FromMeanAndVariance(1, 0.002);
Gaussian wPrior = Gaussian.FromMeanAndVariance(0, 1); // can't use a Gamma variance here; I need GaussianFromMeanAndVariance(0, Variable.GammaFromShapeAndRate(1, 1))
NetModelData initPriors = new NetModelData(
    Util.ArrayInit(nGenes, u => genesT1Prior),
    Util.ArrayInit(nGenes, u => alphaPrior),
    Util.ArrayInit(nGenes, u => betaPrior),
    Util.ArrayInit(nGenes, u => Util.ArrayInit(nGenes - 1, t => wPrior))
);
 Edited by RazinR Tuesday, June 3, 2014 2:58 PM
Tuesday, June 3, 2014 2:56 PM 
Hi,
=============
This is not the true solution (I thought it was!). Please read the next reply by Yordan. I'm not deleting it, so you can understand what is wrong here.
=============
My problem is solved: I defined priors on the mean and variance of w, not on w itself.
protected VariableArray<VariableArray<Gaussian>, Gaussian[][]> wMeanPrior;
protected VariableArray<VariableArray<Gamma>, Gamma[][]> wVariancePrior;

wMeanPrior = Variable.Array(Variable.Array<Gaussian>(geneWeightRange), geneRange).Named("wMeanPrior");
wVariancePrior = Variable.Array(Variable.Array<Gamma>(geneWeightRange), geneRange).Named("wVariancePrior");
wMean = Variable.Array(Variable.Array<double>(geneWeightRange), geneRange).Named("wMean");
wVariance = Variable.Array(Variable.Array<double>(geneWeightRange), geneRange).Named("wVariance");
w[geneRange][geneWeightRange] = Variable.GaussianFromMeanAndVariance(wMean[geneRange][geneWeightRange], wVariance[geneRange][geneWeightRange]);
Now I get posteriors on the mean and variance and use them in the next iterations of data observation.
I hope this is useful for somebody having the same problem :)
Wednesday, June 4, 2014 2:57 PM 
Hi Zahra,
I'm not sure that what you did is correct. In the new way you defined the model, w is no longer shared across data batches. I thought you did want to share w, though. For now I'll assume this is the case, so do let me know if it's not.
My previous post showed how to do online learning when the model parameters are defined using the Random factor. If you want to do online learning in a model where the prior is hierarchical, you need to take a different approach. I'll explain how to do this for a single model parameter to avoid boilerplate code, but I hope you can extend it to arrays of parameters.
Let's first consider the simpler scenario where w is defined using the Random factor from a fixed wPrior variable. That is, the prior is not hierarchical. We'll assume that the online learning is performed over a set of data batches, and the parameter w is shared across all of them. What you do after training on each data batch is to infer the posterior of w and plug it in as the new prior for the next batch. Thus, you can think of wPrior as the summary of what we've learnt "so far" (or "up to the current batch").
Unfortunately, when the prior is hierarchical, you can no longer "store" your current knowledge in it in this way. Look at the following model, which hopefully resembles your factor graph to a certain extent. The variable w now has a Laplace prior: the mean of the Gaussian is observed, but its variance is drawn from a Gamma. Here G.M.V. stands for GaussianFromMeanAndVariance and G.S.R. stands for GammaFromShapeAndRate.
I explicitly denoted the data batches with b1…bn to illustrate the abstract model that you have. This is important because it will differ from the model that you actually run inference in. That's because you want to process these batches one by one (as opposed to having them all at once as shown above).
To process the batches sequentially, we need to add a new “accumulator” of our knowledge (an equivalent to what wPrior was used for in the nonhierarchical case). We’ll call this wMsg. It stores the product of all messages sent to w by the previous batches. We’ll attach this variable to w by using yet another special factor – ConstrainEqualRandom (or C.E.R.). All it does is to pass forward the message of its argument. This is shown in the following factor graph:
Initially, wMsg should be Uniform, since there are no previous batches. After each batch, you need to store there the message sent upward to w, which also happens to be the marginal of w divided by its prior. Luckily, Infer.NET allows you to obtain this by doing the following:
wMsg.ObservedValue = engine.Infer<Gaussian>(w, QueryTypes.MarginalDividedByPrior);
Also, you need to give the compiler a hint that you'll be doing this:
w.AddAttribute(QueryTypes.MarginalDividedByPrior);
Here's the complete code for online learning of a Gaussian whose mean has a Laplace prior:
Variable<int> nItems = Variable.New<int>().Named("nItems");
Range item = new Range(nItems).Named("item");
Variable<double> variance = Variable.GammaFromShapeAndRate(1.0, 1.0).Named("variance");
Variable<double> w = Variable.GaussianFromMeanAndVariance(0.0, variance).Named("w");
VariableArray<double> x = Variable.Array<double>(item).Named("x");
x[item] = Variable.GaussianFromMeanAndPrecision(w, 1.0).ForEach(item);
Variable<Gaussian> wMsg = Variable.Observed(Gaussian.Uniform()).Named("wMsg");
Variable.ConstrainEqualRandom(w, wMsg);
w.AddAttribute(QueryTypes.Marginal);
w.AddAttribute(QueryTypes.MarginalDividedByPrior);
InferenceEngine engine = new InferenceEngine();

// inference on a single batch
double[] data = { 2, 3, 4, 5 };
x.ObservedValue = data;
nItems.ObservedValue = data.Length;
Gaussian wExpected = engine.Infer<Gaussian>(w);

// online learning in mini-batches
int batchSize = 1;
double[][] dataBatches = new double[data.Length / batchSize][];
for (int batch = 0; batch < dataBatches.Length; ++batch)
{
    dataBatches[batch] = data.Skip(batch * batchSize).Take(batchSize).ToArray();
}

Gaussian wMarginal = Gaussian.Uniform();
for (int batch = 0; batch < dataBatches.Length; ++batch)
{
    nItems.ObservedValue = dataBatches[batch].Length;
    x.ObservedValue = dataBatches[batch];
    wMarginal = engine.Infer<Gaussian>(w);
    Console.WriteLine("w after batch {0} = {1}", batch, wMarginal);
    wMsg.ObservedValue = engine.Infer<Gaussian>(w, QueryTypes.MarginalDividedByPrior);
}

Console.WriteLine("Expected: {0}", wExpected);
Console.WriteLine("Actual: {0}", wMarginal);
Note that in general you need to apply this method to all model parameters that are shared across the data batches.
Hope this helps,
Yordan
 Proposed as answer by Yordan Zaykov (Microsoft employee) Wednesday, June 4, 2014 11:19 PM
 Marked as answer by RazinR Sunday, June 8, 2014 11:29 AM
Wednesday, June 4, 2014 6:57 PM 
Dear Yordan,
Thank you SoOo Much for your great explanation. I appreciate your kind reply :)
Actually, I was also suspecting that this way of separating the variance and mean from w itself was not going to work :">
I have a w matrix (w[i][j]) which holds the weights between nodes i and j. I have different sets of data in which I observe values for these nodes, and I want to learn w from all of this data. So the data comes from a model that I'm trying to learn via w. Is that what you mean by sharing w between data batches?
Now, if I understand correctly, setting priors manually doesn't work in this case (since the prior on w is hierarchical), and defining wMsg instead does the job of accumulating the knowledge.
So I have to define a matrix of these Msg variables to accumulate knowledge for w. Then I write the same loop for learning from data batches, and there is no need to set priors from previous posteriors, etc. I hope I understand it now.
w will be learned at the end. And for the other parameters, like the nodes themselves, I don't have to do anything (like here, where the code does nothing special for x: x is updated after observing values and finally has its learned distribution, which depends on w).
Thank you very much again.
Bests,
Zahra
 Edited by RazinR Sunday, June 8, 2014 12:08 PM
Sunday, June 8, 2014 10:44 AM 
It does sound like w is shared between batches. What I meant by "shared" is that a single element w[i][j] might be affected by data from different batches. Alternatively, your model could be set up in such a way that some of the elements of w depend only on some of the batches; for example, batch 1 affects only one tile of the matrix w, batch 2 affects another (disjoint) tile, and so on. But I don't think this is the case.
You correctly understand that wMsg serves as a knowledge accumulator, and you need to define a matrix of these. Then there is indeed no need to plug in posteriors as priors, but instead you have to be careful to carry the knowledge through wMsg by using MarginalDividedByPrior (as shown in the example above).
The posterior over w will be learnt in the end. But note that in order to obtain it, you need to call Infer with query type Marginal (which is the default, so it can be omitted, as shown in my post above). You have to do this only once, at the end. In contrast, during the online learning, you need to call Infer on w with query type MarginalDividedByPrior for each batch and set this as the observed value of wMsg.
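To spell out what this means for a jagged array, here is a hedged sketch of how the scalar wMsg from the earlier example might generalize to a matrix of accumulators. The range names geneRange/geneWeightRange and the size nGenes come from Zahra's earlier posts; everything else is an illustrative assumption, not tested code:

```csharp
// One accumulator message per element of w, initially uniform.
var wMsg = Variable.Array(Variable.Array<Gaussian>(geneWeightRange), geneRange).Named("wMsg");
wMsg.ObservedValue = Util.ArrayInit(nGenes, i =>
    Util.ArrayInit(nGenes - 1, j => Gaussian.Uniform()));
Variable.ConstrainEqualRandom(w[geneRange][geneWeightRange], wMsg[geneRange][geneWeightRange]);

// Hint to the compiler that both query types will be requested on w.
w.AddAttribute(QueryTypes.Marginal);
w.AddAttribute(QueryTypes.MarginalDividedByPrior);

// After each batch: carry the knowledge forward through wMsg.
wMsg.ObservedValue = engine.Infer<Gaussian[][]>(w, QueryTypes.MarginalDividedByPrior);

// Once, after the final batch: the posterior over w.
Gaussian[][] wPosterior = engine.Infer<Gaussian[][]>(w);
```

The structure mirrors the scalar example one-to-one; only the types change from Gaussian to Gaussian[][] and the constraint is applied elementwise over the two ranges.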
 Marked as answer by RazinR Thursday, June 12, 2014 10:04 AM
Thursday, June 12, 2014 9:36 AM 
Thank you very much! :)
Great explanation. I appreciate it.
Thursday, June 12, 2014 10:05 AM 
I have a question about these lines of code:
Console.WriteLine("Expected: {0}", wExpected);
Console.WriteLine("Actual: {0}", wMarginal);
wExpected is affected by just one (large) batch of data, and wMarginal is the result of online learning from several batches. So wExpected and wMarginal should be almost equal, if I'm not wrong. The first is the result of observing all the data in one batch, and the second is the result of observing several batches.
So, as I understand it, I can just use wMarginal in my code, since my data all comes in several batches.
So I thought this is just an example of both situations, and in my case I just need the second solution, i.e. the loop over batches, and not wExpected.
Thanks a lot :)
x.ObservedValue = data;
// ...
Gaussian wMarginal = Gaussian.Uniform();
for (int batch = 0; batch < dataBatches.Length; ++batch)
{
    nItems.ObservedValue = dataBatches[batch].Length;
    x.ObservedValue = dataBatches[batch];
    ....
 Edited by RazinR Wednesday, June 25, 2014 4:05 PM
Wednesday, June 25, 2014 3:50 PM 
That is correct. wExpected is computed by running inference on the whole data at once (that is, one large batch), while wMarginal is computed by running inference on multiple small batches. The comparison between the two is only to demonstrate that in this simple model the two posteriors match.
If running inference on the whole data doesn't explode with an out-of-memory exception, then use the first approach. Otherwise, use the second.
 Marked as answer by RazinR Thursday, July 3, 2014 2:38 PM
Thursday, June 26, 2014 12:49 PM 
Thanks a lot Yordan.
In my case I have a dynamic Bayesian network with a certain number of genes at each level. I have time-series data and I can observe the whole data at once, but I have different experiments. I mean, for example, I have the complete time series for experiment 1, the complete time series for experiment 2, etc.
In this case I'm not changing the number of variables in order to fit the data in memory; rather, I want to manage several sets of complete data observations. So it seems OK to use the same approach (treating each experiment's time series as a batch). Am I right?
And also another question: to set the query type to Marginal for a matrix, is it correct to do this:
w[geneRange][geneWeightRange].AddAttribute(QueryTypes.Marginal);
Should I set it on the w elements rather than on w itself (w.AddAttribute(QueryTypes.Marginal);)?
Thank you :)
 Edited by RazinR Monday, June 30, 2014 9:58 AM
Monday, June 30, 2014 9:09 AM 
Hi Zahra,
As to your first question, I can't really answer because it's not clear to me how the different experiments are related under your model. Is it the case that in practice the learned posterior from experiment n is used as a prior in experiment n+1? Is there a reason why you don't want to have a single model which comprises all experiments?
As to your second question, please set the attribute to the whole jagged array instead of to the single elements. That is, do w.AddAttribute(QueryTypes.Marginal);
Thanks,
Yordan
Tuesday, July 8, 2014 4:10 PM
Hi Yordan,
Thanks for answering, and excuse me for checking your answer late; I was on vacation.
On the first question: I have a dynamic Bayesian network and different experiments. I want to use all of these experiments to train the weights of this dynamic Bayesian structure. So I have a fixed structure in which the weights are missing, and I want to feed the model this data step by step. I was thinking I should observe all related data (related = from the same experiment) at once in the model, and then observe the next experiment's data as another set.
The model is a bit complicated and I can't find a way to observe all the data at once. I have no idea how I can do this.
I have an array of arrays for storing my variables (i: rows) at the N layers (j: columns). I also have a matrix of Ws (Wij).
How can I extend this model to represent M experiments? If I want to observe all the data at once, should I have a cube (i x j x k, where k is the index of the experiment, i of the gene, and j of the layer)?
Thanks a lot,
Zahra
Monday, August 4, 2014 2:11 PM