Bayesian statistics theory question (Migrated from community.research.microsoft.com)
Question

freddycct posted on 07/31/2009 1:59 AM
Hi, I am a beginner in the Bayesian approach to machine learning. I have read some literature on Bayesian inference and parameter learning, and I have a few questions that I hope someone can answer.
In the case of a coin toss, we often use a Beta distribution as a conjugate prior for inferring the probability that the next toss lands on heads, given that we have observed a series of coin tosses. I understand that the Beta distribution has the convenient property that the posterior is also a Beta distribution.
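The conjugate update described here can be made concrete with a short sketch (the function names are hypothetical, chosen for this example). Starting from a Beta(a, b) prior and observing h heads and t tails gives a Beta(a + h, b + t) posterior, and the probability that the next toss lands heads is the posterior mean a' / (a' + b').

```python
from fractions import Fraction

def beta_bernoulli_update(a, b, tosses):
    """Return the Beta(a', b') posterior after observing tosses (1 = heads, 0 = tails)."""
    heads = sum(tosses)
    tails = len(tosses) - heads
    # Conjugacy: the posterior is again a Beta, so we only add the observed counts.
    return a + heads, b + tails

def prob_next_heads(a, b):
    """Posterior predictive P(next toss = heads) under Beta(a, b), i.e. its mean a / (a + b)."""
    return Fraction(a, a + b)

# Uniform prior Beta(1, 1), then observe 7 heads and 3 tails.
a_post, b_post = beta_bernoulli_update(1, 1, [1, 1, 0, 1, 1, 0, 1, 1, 0, 1])
print(a_post, b_post)                   # 8 4
print(prob_next_heads(a_post, b_post))  # 2/3
```

Exact fractions are used only so the arithmetic is easy to check by hand; floats would work equally well.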
1) My question is: how did the Beta distribution come about? How do we derive it? I know that Wikipedia has a derivation of the Dirichlet distribution using the Gamma distribution, but that is not intuitive or fundamental enough for me.
2) Can we use a Binomial distribution as a prior for Bayesian inference and prediction?
3) Finally, I read that discrete random variables should not have continuous random variables as parents. Why, then, do we use the Beta distribution, which is continuous, as the parent of a discrete event?
Pardon me if these questions sound silly, but I really hope to find an answer.
Friday, June 3, 2011 5:10 PM
All replies

John Guiver replied on 08/04/2009 5:05 AM
(1) The conjugate prior for a parameter of any exponential family distribution can be derived mechanically; see the Wikipedia article on exponential family distributions, which has a section on conjugate priors. The Beta distribution can be derived in this way as the conjugate prior of the single parameter (the probability of success) of both the Bernoulli distribution (one trial) and the Binomial distribution (many trials).
(2) The Binomial distribution, or a hierarchical Binomial-Bernoulli structure, could certainly be used as a prior distribution for an integer parameter with a known bound (i.e. the Binomial's number of trials), as, more generally, could any Discrete distribution.
(3) I'm not sure where you read this. Assuming I have interpreted your question correctly, the Bayes Point Machine is an example of a discrete random variable (the class label) having a continuous parent.
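For point (1), here is a sketch of the mechanical derivation for the Bernoulli case. The symbols χ, ν (prior hyperparameters), α, β, and the counts h, N are names introduced here for illustration. The recipe is: write the likelihood in exponential-family form, take the generic conjugate prior over the natural parameter η, and change variables back to μ.

```latex
% Bernoulli likelihood in exponential-family form, x \in \{0, 1\}
p(x \mid \mu) = \mu^{x}(1-\mu)^{1-x} = \exp\big( x\,\eta - A(\eta) \big),
\qquad \eta = \log\frac{\mu}{1-\mu}, \quad A(\eta) = \log(1 + e^{\eta}) = -\log(1-\mu)

% Generic conjugate prior for an exponential family, as a density in \eta
p(\eta \mid \chi, \nu) \propto \exp\big( \chi\,\eta - \nu A(\eta) \big)

% Change variables to \mu, using |d\eta / d\mu| = 1 / (\mu(1-\mu))
p(\mu \mid \chi, \nu) \propto \mu^{\chi}(1-\mu)^{\nu-\chi} \cdot \frac{1}{\mu(1-\mu)}
                      = \mu^{\chi-1}(1-\mu)^{\nu-\chi-1}

% This is Beta(\alpha, \beta) with \alpha = \chi, \beta = \nu - \chi; its normaliser is
B(\alpha, \beta) = \int_0^1 t^{\alpha-1}(1-t)^{\beta-1}\,dt = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}

% After N tosses with h heads the likelihood is \mu^{h}(1-\mu)^{N-h}, so
p(\mu \mid x_{1:N}) \propto \mu^{\alpha+h-1}(1-\mu)^{\beta+N-h-1}
\;\Rightarrow\; \mu \mid x_{1:N} \sim \mathrm{Beta}(\alpha + h,\ \beta + N - h)
```

The Binomial likelihood \binom{N}{h}\mu^{h}(1-\mu)^{N-h} differs from the product of Bernoulli terms only by a factor that does not depend on μ, so the same Beta prior is conjugate to it.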
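For point (2), a small sketch under hypothetical assumptions: the unknown integer parameter is a count N with a Binomial(M, q) prior, so M is the known bound, and we observe k successes drawn from Binomial(N, p) with p known. Because the support 0..M is finite, the posterior can be computed by enumerating every value of N. All function and parameter names are made up for this example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def posterior_over_count(k_observed, max_count, prior_p, success_p):
    """Posterior over an unknown integer N given k observed successes.

    Prior:      N ~ Binomial(max_count, prior_p)   (N is bounded by max_count)
    Likelihood: k ~ Binomial(N, success_p)
    """
    unnormalised = [
        binomial_pmf(n, max_count, prior_p) * binomial_pmf(k_observed, n, success_p)
        for n in range(max_count + 1)
    ]
    total = sum(unnormalised)
    return [w / total for w in unnormalised]

post = posterior_over_count(k_observed=6, max_count=20, prior_p=0.5, success_p=0.3)
most_probable_n = max(range(len(post)), key=post.__getitem__)
print(most_probable_n)  # 12
```

As a cross-check, this particular model has a closed form: given k, the excess N - k follows Binomial(M - k, q(1 - p) / (1 - qp)), which for the numbers above is Binomial(14, 7/17) with mode 6, hence N = 12.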
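For point (3), a minimal sketch of a discrete child with a continuous parent, loosely following the structure of the Bayes Point Machine: a one-dimensional weight w with a standard Gaussian prior is the continuous parent, and each binary label is y = sign(w * x + noise) with Gaussian noise, so P(y | w, x) = Phi(y * w * x / sigma). The noise level, the grid approximation, and all names are assumptions made for illustration; this is not the message-passing inference Infer.NET would run. Inference only needs P(y | w, x) as an ordinary function of the continuous parent.

```python
import math

NOISE_SD = 0.5  # assumed standard deviation of the label noise (hypothetical choice)

def std_normal_cdf(z):
    """Phi(z), the standard normal CDF, computed via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def label_likelihood(y, w, x):
    """P(y | w, x) for y in {-1, +1} when y = sign(w * x + noise), noise ~ N(0, NOISE_SD^2)."""
    return std_normal_cdf(y * w * x / NOISE_SD)

def grid_posterior(data, grid):
    """Posterior over w on a grid: N(0, 1) prior times the likelihood of every labelled point."""
    weights = []
    for w in grid:
        weight = math.exp(-0.5 * w * w)  # standard normal prior, up to a constant
        for x, y in data:
            weight *= label_likelihood(y, w, x)
        weights.append(weight)
    total = sum(weights)
    return [v / total for v in weights]

def predictive_positive(x_new, grid, post):
    """P(y = +1 | x_new, data): the discrete child's probability averaged over the continuous parent."""
    return sum(p * label_likelihood(+1, w, x_new) for w, p in zip(grid, post))

grid = [i / 100.0 for i in range(-400, 401)]  # w in [-4, 4]
data = [(1.0, +1), (2.0, +1), (-1.5, -1)]     # (x, y) pairs, all consistent with w > 0
post = grid_posterior(data, grid)
posterior_mean = sum(w * p for w, p in zip(grid, post))
print(round(posterior_mean, 3))
print(round(predictive_positive(1.0, grid, post), 3))
```

Grid enumeration is only practical for a parent with one or two dimensions; with a full weight vector an approximate method would be needed instead.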
Hope this helps.
John G
freddycct replied on 08/05/2009 12:53 AM
Hi,
Can you recommend a textbook that explains the exponential family of distributions in an accessible way?
For the problem of continuous parents and a discrete child, I read about the difficulties in the work of Uri Lerner, who classifies these as Hybrid Bayesian Networks (http://robotics.Stanford.EDU/~uri/Papers/). I guess it might be easier now.
Thanks for the reply.
 Marked as answer by Microsoft Research Friday, June 3, 2011 5:10 PM