freddycct posted on 07-31-2009 1:59 AM
Hi, I am a beginner in the Bayesian approach to machine learning. I have read some literature on Bayesian inference and parameter learning, and I have a few questions that I hope to find answers to.
In the case of a coin toss, we often use a Beta distribution as a conjugate prior when inferring the probability that the next toss lands on heads, given that we have observed a series of coin tosses. I understand that the Beta distribution has the convenient property that the posterior is also a Beta distribution.
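To check that I have that property right, here is a small Python sketch of the update I have in mind (assuming scipy is available; the prior pseudo-counts and the observed tosses are just made-up numbers for illustration):

# Beta(a, b) prior on the heads probability p; observe k heads in n tosses.
# Conjugacy means the posterior is again a Beta: Beta(a + k, b + n - k).
from scipy.stats import beta

a, b = 2.0, 2.0    # prior pseudo-counts (arbitrary choice for illustration)
n, k = 10, 7       # made-up data: 7 heads out of 10 tosses

posterior = beta(a + k, b + (n - k))   # posterior is still a Beta distribution

# Posterior mean of p, i.e. the predicted probability that the next toss is heads
print(posterior.mean())                # (a + k) / (a + b + n) = 9/14, about 0.643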
1) My question is: how did the Beta distribution come about? How do we derive the Beta distribution? I know that Wikipedia has a derivation of the Dirichlet distribution using the Gamma distribution (I have sketched a numerical check of the two-variable case after my questions below), but that derivation does not feel intuitive or fundamental enough to me.
2) Can we use a Binomial distribution as a prior for Bayesian inference and prediction?
3) Finally, I read that discrete random variables should not have continuous random variables as parents. Why, then, do we use the Beta distribution, which is continuous, as the parent of a discrete event?
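Regarding question 1, here is my attempt at an empirical check of the Gamma construction mentioned above, again just a sketch with numpy/scipy and arbitrary shape parameters: if X ~ Gamma(a) and Y ~ Gamma(b) are independent, then X / (X + Y) should follow Beta(a, b), which is the two-variable case of the Dirichlet derivation.

import numpy as np
from scipy.stats import beta, kstest

np.random.seed(0)
a, b = 2.0, 5.0                      # arbitrary shape parameters
x = np.random.gamma(a, size=100000)  # X ~ Gamma(a)
y = np.random.gamma(b, size=100000)  # Y ~ Gamma(b)
ratio = x / (x + y)                  # claim: ratio ~ Beta(a, b)

# Kolmogorov-Smirnov test comparing the samples against the Beta(a, b) CDF
print(kstest(ratio, beta(a, b).cdf))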
Pardon me if these questions sound silly, but I really hope to find answers.