Learning and prediction with a discrete model

  • Question

  • Hi,

    I’d like to know whether, using Infer.NET, it is possible to train a discrete model, save it, and then use it for prediction.

    I read the Infer.NET 101 guide, but it only contains an example with a continuous distribution.

    Regards


    Tuesday, June 5, 2012 2:15 PM

All replies

  • Hi Alessandro,

    What do you mean by 'saving' a model? There's no 'model format' yet, but you can definitely use the posteriors that you've learnt, and feed them into your model as priors. The 101 paper explains how this can be done for Gaussian, and you can implement it for other distributions in a similar way. For learning a Discrete, you might want to take a look at this post.
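
A minimal sketch of the "posterior becomes the prior" idea for a discrete variable, written in plain Python rather than Infer.NET. It assumes a Dirichlet prior over the category probabilities, so the posterior is again a Dirichlet whose pseudo-counts are the prior counts plus the observed counts. The function names, the JSON key, and the toy observations are all hypothetical and chosen only for illustration.

```python
import json


def update_dirichlet(prior_counts, observations):
    """Return posterior Dirichlet pseudo-counts after observing category indices."""
    posterior = list(prior_counts)
    for k in observations:
        posterior[k] += 1.0
    return posterior


def predictive(counts):
    """Posterior predictive probability of each category (normalised pseudo-counts)."""
    total = sum(counts)
    return [c / total for c in counts]


# Training phase: uniform Dirichlet(1, 1, 1) prior over 3 categories (assumed).
prior = [1.0, 1.0, 1.0]
train_obs = [0, 2, 2, 1, 2]  # toy data, not from the thread
posterior = update_dirichlet(prior, train_obs)

# "Saving the model" here just means storing the posterior's parameters.
saved = json.dumps({"dirichlet_pseudo_counts": posterior})

# Prediction phase: reload the parameters and use them as the new prior.
restored = json.loads(saved)["dirichlet_pseudo_counts"]
probs = predictive(restored)
print(probs)
```

Storing only the Dirichlet parameters is enough in this sketch because they fully describe the learnt distribution; further data can later be folded in by calling `update_dirichlet(restored, new_obs)`.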

    -Yordan

    Tuesday, June 5, 2012 3:32 PM
  • Let me try to explain better what I want to do.

    I have already defined a model like this:

    The gray nodes are all observed variables:

    O nodes represent the available observations of users' internet browsing

    F nodes represent users' features such as age, gender, etc.

    K is a hidden node. It is used to group configurations of the observed data into something like clusters.

    I want to learn this model with Infer.NET so that, in a second step, I can hide the users' features and infer them from the browsing data alone.

    My second model will be something like this:

    Thursday, June 7, 2012 9:07 AM
  • Hi Alessandro

    It is difficult to answer this without a bit more detail about your model, as there are several ways of formulating it. At first glance it would seem more sensible for the model to be generating the demographic data. Anyway, I am assuming that K is an integer random variable indexed by data that is derived from discrete distributions conditioned on URLs and/or demographics. It is the Dirichlet-distributed parameters of these discrete distributions that you need to learn and save out. Is your question, then, about how to serialize Dirichlet distributions?

    John

    Friday, June 8, 2012 3:30 PM
    Owner
  • Hi John,

    My aim is to infer more than one demographic feature simultaneously, unlike classifiers such as SVMs, decision trees, random forests, Bayesian classifiers, etc., which can infer only one feature at a time.

    I know it is more sensible to create a model in which the demographic data generate the browsing data but, in that case, I can consider only 3 demographic features at a time, because I can’t create a model with a variable that depends on more than 3 variables.

    I could do some experiments, changing the variables.

     

    Anyway, this is what I’d like to do.

    Let’s suppose I create a model like this:

     

    My dataset contains the browsing data of users with known features such as gender, age, etc.

    I split my dataset into 2 parts; in the future I can split it into n folds.

    In the first phase, I’d like to train this model on one part of the dataset. In the second phase, I’d like to infer the demographic data using only the browsing data and the learned model, and compare the demographic predictions with the true data. I don’t know whether this is possible in the Infer.NET framework.

     

    Another possibility is to create a model like this:

    Here K is a latent node whose range is defined arbitrarily.

    I can analyze the joint probability distribution of the demographic features for each “cluster”.

    E.g., Cluster 1: mostly male, age < 18, …

    Cluster 2: female, 18 < age < 27, …

    And so on, creating an association between each cluster and a configuration of demographic features.

    I’d like to use this information so that, when analyzing new data, I can estimate a user's demographic features from the cluster the user belongs to, but I cannot use this information without saving the model.

    I hope I have described my aim more clearly.

    Alessandro

    Monday, June 11, 2012 11:35 AM
  • Sorry Alessandro, but I still don't understand your model or what question you are asking. Let's concentrate on the second model. It seems that you have probability vectors over clusters conditioned on demographics. If so, you will need to save the posterior distribution (a Dirichlet) of the probability vector for each demographic - does this answer your question?
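
One way to picture this suggestion, again as a plain-Python sketch rather than Infer.NET code: keep one Dirichlet over clusters per demographic value, store its pseudo-counts, and at prediction time apply Bayes' rule to go from a cluster back to the demographic. The demographic labels, the pseudo-counts, the uniform demographic prior, and the assumption that the new user's cluster is known exactly are all hypothetical.

```python
import json

# Hypothetical saved posteriors: one Dirichlet over 3 clusters per demographic value.
saved = json.dumps({
    "male": [8.0, 1.0, 1.0],
    "female": [1.0, 6.0, 3.0],
})


def dirichlet_mean(counts):
    """Mean of a Dirichlet, i.e. the predictive P(cluster | demographic)."""
    total = sum(counts)
    return [c / total for c in counts]


def demographic_given_cluster(posteriors, demographic_prior, cluster):
    """Bayes' rule: P(d | k) is proportional to P(k | d) * P(d)."""
    scores = {
        d: dirichlet_mean(counts)[cluster] * demographic_prior[d]
        for d, counts in posteriors.items()
    }
    norm = sum(scores.values())
    return {d: s / norm for d, s in scores.items()}


# Prediction phase: reload the saved Dirichlets and query a new user's cluster.
posteriors = json.loads(saved)
prior = {"male": 0.5, "female": 0.5}  # assumed uniform prior over demographics
result = demographic_given_cluster(posteriors, prior, cluster=0)
print(result)
```

If the cluster of a new user is itself uncertain, the same function could be called for each cluster and the results averaged with the cluster probabilities as weights.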

    Tuesday, June 12, 2012 2:21 PM
    Owner