Model selection in Infer.NET
Question

Hi,
As far as I can tell from the documentation, Infer.NET uses Variational Bayes (VB) or Expectation Propagation (EP) for inference. I was wondering how training is done. Does it use the following EM-like iteration?
 1. Inference (computing all marginals, using VB or EP).
 2. Maximizing the likelihood with a black-box optimization toolbox, using the marginals from the previous step.
Thanks
Saturday, October 5, 2013 2:29 AM
All replies

Infer.NET takes a Bayesian approach so training is not considered separate from inference. See the Bayes Point Machine tutorial.
 Proposed as answer by John Guiver (Microsoft employee, Owner) Tuesday, October 8, 2013 8:00 AM
Saturday, October 5, 2013 7:25 AM Owner
Thanks, John, for your comment. Can you please give me more details about how this is actually done inside Infer.NET?
Wednesday, October 23, 2013 9:56 PM 
The user guide has a brief description and links for the various algorithms, such as Variational Message Passing. If you want the actual update equations, you will need to read the published papers on these algorithms.
Thursday, October 24, 2013 9:11 AM Owner

No, I know the equations, but it looks to me that VMP and EP are just inference methods, i.e. "given the parameters of the model, they give you the marginal beliefs of your model". What I don't understand is how Infer.NET uses these inference tools to find the optimal parameters of the model. Would you please comment on that?
Thanks.
Thursday, October 24, 2013 11:02 PM 
Hi Asjai,
We're not finding the "optimal" parameters of the model. Try to think of each datum as an information carrier. This information is spread across the factor graph to the parameters of the model in a way defined by the inference algorithm (these are the "updates" that Tom referred to above). I think that this talk gives an extensive answer to your question.
Cheers,
Yordan
Thursday, October 24, 2013 11:59 PM
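
To make "each datum as an information carrier" concrete, here is a minimal sketch in plain Python. It is not Infer.NET code. It assumes a toy model: an unknown mean with a Gaussian prior, and observations with known noise precision. The `Gaussian` class and the function names are hypothetical and written only for this illustration. In this conjugate case each observation sends one Gaussian message to the mean, and the posterior is the product of the prior and all the messages.

```python
from dataclasses import dataclass


@dataclass
class Gaussian:
    """Gaussian stored in natural parameters, so multiplying densities is addition."""
    precision: float        # 1 / variance
    precision_mean: float   # precision * mean

    def __mul__(self, other):
        # Product of two Gaussian densities (up to normalization).
        return Gaussian(self.precision + other.precision,
                        self.precision_mean + other.precision_mean)

    @property
    def mean(self):
        return self.precision_mean / self.precision

    @property
    def variance(self):
        return 1.0 / self.precision


def message_from_datum(x, noise_precision):
    """Message that a single observation x sends to the unknown mean."""
    return Gaussian(noise_precision, noise_precision * x)


def posterior_over_mean(prior, data, noise_precision):
    """Multiply the prior by one message per datum."""
    belief = prior
    for x in data:
        belief = belief * message_from_datum(x, noise_precision)
    return belief


# Vague prior on the mean: mean 0, variance 100.
prior = Gaussian(precision=0.01, precision_mean=0.0)
data = [4.8, 5.1, 5.3, 4.9]
posterior = posterior_over_mean(prior, data, noise_precision=1.0)
print(posterior.mean, posterior.variance)
```

Nothing is optimized here. The result is a distribution over the parameter, and each extra datum simply narrows it.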
Thanks Yordan,
I watched the videos you pointed to. It looks to me that they just propose a way to approximate the product of factors (the posterior) in a graphical model, but I still don't understand what it is being used for.
I don't get what you mean by "We're not finding the "optimal" parameters of the model". In Infer.NET, after building a model with arbitrary distributions and given input samples, we should be able to estimate the optimal parameters, right? Or are you saying that this is done while passing messages via EP or VB?
Could you please clarify what you mean by "datum as an information carrier"?
Thanks a lot.
Monday, October 28, 2013 5:20 PM 
In the Bayesian approach to statistics, you don't need to find optimal parameters. Here is a short document that explains the basics of Bayesian inference: http://www.stat.rice.edu/~dcox/Stat431/BayesianInference.pdf You can find lots of other similar resources on the web or in textbooks on the subject.
Monday, October 28, 2013 5:59 PM Owner
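
A small sketch of what "no optimal parameters" means in practice, in plain Python rather than Infer.NET. It assumes a coin-flip model with a Beta prior on the heads probability. The function names are hypothetical. The prediction averages over the posterior belief about the parameter instead of plugging in a single best value.

```python
def beta_posterior(alpha, beta, flips):
    """Update a Beta(alpha, beta) belief with observed flips (1 = heads, 0 = tails)."""
    heads = sum(flips)
    tails = len(flips) - heads
    return alpha + heads, beta + tails


def predictive_heads(alpha, beta):
    """P(next flip is heads), averaged over the Beta(alpha, beta) belief."""
    return alpha / (alpha + beta)


flips = [1, 1, 1]                       # three heads in a row
a, b = beta_posterior(1.0, 1.0, flips)  # uniform prior -> Beta(4, 1)

bayes_prediction = predictive_heads(a, b)        # 0.8
plug_in_prediction = sum(flips) / len(flips)     # maximum likelihood: 1.0

print(bayes_prediction, plug_in_prediction)
```

The maximum-likelihood plug-in claims that tails is impossible after three heads. The Bayesian prediction keeps the remaining uncertainty about the parameter.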

Thanks for the link.
So you're saying that even the parameters of the model are given prior distributions based on the designer's knowledge. Then, given a new input query X_i, to find the output Y_i:
p(Y_i | X_i) = p(X_i | Y_i) p(Y_i) / p(X_i) \propto p(X_i | Y_i) p(Y_i) // for fixed X_i
We just need to be able to efficiently approximate p(X_i | Y_i) p(Y_i) and find its mode (with respect to Y_i). If I am not mistaken, Infer.NET tries to perform inference efficiently on models of this sort (where you don't have any parameters).
This somewhat surprised me, since it is usually impossible to model the prior information accurately, and it might be useful to allow some freedom by defining hyperparameters and optimizing them (that is still called "Bayesian", right?). Off the top of my head, one example is:
http://machinelearning.wustl.edu/mlpapers/paper_files/Tipping01.pdf
which extensively discusses its approach to finding optimal parameters by maximizing the likelihood of the samples. I guess such papers deviate from Bayesian conventions, then; right?
Thanks for your comments.
 Edited by Asjai Monday, October 28, 2013 10:54 PM
Monday, October 28, 2013 10:53 PM 
Yes, that paper deviates from Bayesian conventions. Infer.NET takes a pure Bayesian approach so it does not optimize anything, not even hyperparameters. This means that we can implement our inference algorithms once and apply them to all variables in the model. See the philosophy of Bayesian inference.
 Marked as answer by Asjai Tuesday, October 29, 2013 11:36 PM
Tuesday, October 29, 2013 7:06 AM Owner
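
For readers wondering what "not even hyperparameters" can look like, here is a hedged sketch in plain Python. It is not Infer.NET code and not how Infer.NET is implemented. It assumes a coin-flip model whose Beta prior has a "strength" hyperparameter, and it puts a prior over a small grid of candidate strengths. Enumerating the grid exactly is a stand-in for approximate message passing. All names and numbers are made up for the illustration.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_evidence(strength, heads, tails):
    """Log marginal likelihood of the flips under a Beta(strength/2, strength/2) prior."""
    a = b = strength / 2.0
    return log_beta(a + heads, b + tails) - log_beta(a, b)


def hyperparameter_posterior(strengths, hyperprior, heads, tails):
    """Posterior weights over candidate strengths: hyperprior times evidence, normalized."""
    log_w = [math.log(p) + log_evidence(s, heads, tails)
             for s, p in zip(strengths, hyperprior)]
    top = max(log_w)                       # subtract the max for numerical stability
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


def predictive_heads(strengths, weights, heads, tails):
    """P(next flip is heads), averaged over both the parameter and the hyperparameter."""
    n = heads + tails
    return sum(w * (s / 2.0 + heads) / (s + n) for s, w in zip(strengths, weights))


strengths = [0.2, 2.0, 20.0, 200.0]        # hypothetical candidate prior strengths
hyperprior = [0.25, 0.25, 0.25, 0.25]      # uniform prior over the candidates
heads, tails = 7, 3

weights = hyperparameter_posterior(strengths, hyperprior, heads, tails)
prediction = predictive_heads(strengths, weights, heads, tails)
print(weights, prediction)
```

An evidence-maximization approach such as the one in the Tipping paper would keep only the single strength with the largest evidence. Here every candidate stays in the answer, weighted by its posterior probability, so the same inference step handles parameters and hyperparameters alike.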
Very illuminating!
Thanks!
Tuesday, October 29, 2013 11:36 PM