Constructing Bayesian network...CPT and DAG for discrete variable network? (Migrated from community.research.microsoft.com)

  • Question

  • bjornjon posted on 02-26-2011 12:56 PM

    Hi,

    I'm having some trouble getting into the Infer.NET mindset.

    I'm constructing a BN, and the approach I'm used to is:

    1) Preprocess the CPTs for each variable (node) from data/knowledge, as vectors or arrays of different dimensions

    - if a variable has no parents, use the priors for each discrete state

    - if, e.g., a variable with r states has parents with n and m states, the CPT has r*n*m values

    - and so on...

    2) Construct a DAG by connecting each node to its parents

    3) Plug the CPTs into the DAG for the correct variables
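Step 1 above can be sketched in plain C# for the two-parent case. This is only an illustration of counting a CPT from integer-coded data, not Infer.NET code; the class name `CptFromCounts`, the method `Estimate`, and the default pseudo-count of 1 per cell are assumptions made for this example.

```csharp
using System;

public static class CptFromCounts
{
    // Estimates P(child | parent1, parent2) from integer-coded samples.
    // cpt[i][j][k] is the probability of child state k given parent1 = i, parent2 = j.
    // n, m, r are the numbers of states of parent1, parent2 and child.
    // pseudoCount (an assumption of this sketch) must be > 0 so that
    // parent combinations never seen in the data still normalize.
    public static double[][][] Estimate(
        int[] parent1, int[] parent2, int[] child,
        int n, int m, int r, double pseudoCount = 1.0)
    {
        if (parent1.Length != child.Length || parent2.Length != child.Length)
            throw new ArgumentException("All data arrays must have the same length.");

        // Start every cell at the pseudo-count.
        var cpt = new double[n][][];
        for (int i = 0; i < n; i++)
        {
            cpt[i] = new double[m][];
            for (int j = 0; j < m; j++)
            {
                cpt[i][j] = new double[r];
                for (int k = 0; k < r; k++) cpt[i][j][k] = pseudoCount;
            }
        }

        // Add one count per observed (parent1, parent2, child) triple.
        for (int s = 0; s < child.Length; s++)
            cpt[parent1[s]][parent2[s]][child[s]] += 1.0;

        // Normalize each row so it sums to 1.
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
            {
                double total = 0.0;
                for (int k = 0; k < r; k++) total += cpt[i][j][k];
                for (int k = 0; k < r; k++) cpt[i][j][k] /= total;
            }

        return cpt;
    }

    public static void Main()
    {
        // Hypothetical toy data: two binary parents, one three-state child.
        int[] p1 = { 0, 0, 1, 1 };
        int[] p2 = { 0, 1, 0, 1 };
        int[] c  = { 2, 2, 0, 1 };
        double[][][] cpt = Estimate(p1, p2, c, 2, 2, 3);
        Console.WriteLine(string.Join(" ", cpt[0][0]));
    }
}
```

The nested array makes the table size explicit: one row of r probabilities for each of the n*m parent combinations.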

    I saw a nice example of the classic sprinkler Bayesian network demo in the forums, but there the variables are boolean. I have variables with maybe 10 discrete states, and some variables (nodes) have 2-3 parents, resulting in maybe 10*10*10*10 different values for the CPT. How do you code a Bayesian network like this in Infer.NET? And even if each discrete variable has just 3-4 states, how is that coded?
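The table sizes mentioned in the question follow from a simple product. As a worked check (the symbols $s_1, \dots, s_k$ for the parents' state counts are introduced here for illustration):

```latex
% Child with r states, parents with s_1, ..., s_k states.
\begin{align*}
  \text{CPT entries}      &= r \prod_{i=1}^{k} s_i \\
  \text{free parameters}  &= (r - 1) \prod_{i=1}^{k} s_i
    && \text{(each row sums to 1)} \\[4pt]
  r = 10,\ s_1 = s_2 = s_3 = 10:\quad
  \text{entries}          &= 10 \cdot 10^3 = 10\,000, \\
  \text{free parameters}  &= 9 \cdot 10^3 = 9\,000 .
\end{align*}
```

With only 3-4 states per variable and two parents the table is far smaller, for example 4 * 4 * 4 = 64 entries.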

    Best,

    BJ

    Friday, June 3, 2011 6:22 PM

Answers

  • John Guiver replied on 03-01-2011 4:06 AM

    To query for car type given model and engine type:

        public Discrete QueryCarTypeFromModelAndEngine(int carModel, int engineType)
        {
          NumCases.ObservedValue = 1;
          CarModel.ObservedValue = new int[] { carModel };
          EngineType.ObservedValue = new int[] { engineType };
          CarType.ClearObservedValue();
          PowerTrain.ClearObservedValue();

          PT_M_Prior.ObservedValue = PT_M_Posterior;
          PT_P_Prior.ObservedValue = PT_P_Posterior;
          CPT_E_Prior.ObservedValue = CPT_E_Posterior;
          CPT_T_Prior.ObservedValue = CPT_T_Posterior;

          // Run the inference
          return InfEngine.Infer<Discrete[]>(CarType)[0];
        }

    Friday, June 3, 2011 6:23 PM

All replies

  • bjornjon replied on 02-26-2011 1:19 PM

    My data looks like...

    Center  Barrack  Factory  Port  Academy  University  Teacher  ...
    100     2560     3302     100   10292    1212        1111     ...  [sec]
    201     3300     100      50    10098    222         1211     ...  [sec]
    ...

    Friday, June 3, 2011 6:23 PM
  • bjornjon replied on 02-26-2011 3:28 PM

    More to the point.....

    DATA: (values from 0 - 99999 sorted into 10 discrete bins)

    double[] expansion = {1,1,1,1,1,1,1,3,2,2,2,1,1,1,2,1....};
    double[] barrack   = {3,1,2,3,4,2,2,2,3,3,2,2,4,2,2,1....};
    double[] factory   = {2,5,3,1,4,5,1,4,3,2,3,4,4,4,2,3....};
    double[] starport  = {9,9,4,3,8,3,6,7,7,6,3,4,4,5,6,6....};
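The post does not say how the raw times were mapped to bins. A minimal sketch, assuming ten equal-width bins over 0 to 99999 and zero-based bin indices (both are assumptions, as are the names `Binning` and `ToBin`), could look like this. It returns `int` because the observed values elsewhere in this thread are passed as `int[]`.

```csharp
using System;

public static class Binning
{
    // Maps a raw value in [0, maxValue] to a zero-based bin index.
    // Equal-width bins are an assumption of this sketch; the original
    // data may have used different bin edges or one-based labels.
    public static int ToBin(int value, int maxValue = 99999, int binCount = 10)
    {
        if (value < 0 || value > maxValue)
            throw new ArgumentOutOfRangeException(nameof(value));

        int width = (maxValue + 1) / binCount;          // 10000 with the defaults
        return Math.Min(value / width, binCount - 1);   // guard the top edge
    }

    public static void Main()
    {
        Console.WriteLine(ToBin(25000)); // 2
    }
}
```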

    DAG (all connections pointing down)

               expansion
              /         \
        barrack         factory
              \         /
               starport

     

    INFERENCE:

    I observe new evidence that factory is built in time XX, which falls in bin 3, so I clamp the random variable factory to that state. I want to know the effect this new evidence has on starport, so I compute the posterior for starport to see how its probabilities have changed.

    How is this done?

    Friday, June 3, 2011 6:23 PM
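For the four-node DAG in the post above, the quantity being asked for can be written out exactly. This is the textbook enumeration formula, given here only to make the target of inference concrete; the shorthand $e, b, f, s$ for expansion, barrack, factory and starport is introduced for this example.

```latex
% Factorization implied by the DAG:
%   P(e, b, f, s) = P(e) P(b | e) P(f | e) P(s | b, f)
\begin{align*}
  P(s \mid f = 3)
    &= \frac{P(s, f = 3)}{P(f = 3)} \\[4pt]
    &= \frac{\displaystyle\sum_{e} \sum_{b}
             P(e)\, P(b \mid e)\, P(f = 3 \mid e)\, P(s \mid b, f = 3)}
            {\displaystyle\sum_{e} P(e)\, P(f = 3 \mid e)}
\end{align*}
```

With 10 states per variable, the numerator is a sum over 100 (e, b) pairs for each starport state, which is small enough to check any inferred posterior by hand or with a short loop.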
  • minka replied on 02-27-2011 5:01 PM

    The approach described here works for variables of any cardinality: http://community.research.microsoft.com/forums/t/6357.aspx.  The main thing to look at is the AddChildFromTwoParents function, which gives the general pattern for using CPTs in Infer.NET.

    Friday, June 3, 2011 6:23 PM
  • bjornjon replied on 02-28-2011 5:39 PM

    Thank you for the quick response.

    I'm diving into the code in the post you referred me to.

    The creation of the model looks really good and the methodology is clear.

    I don't completely understand how the PTs and CPTs are inferred from the supplied data, since I don't see the connection (in code) between the variables and the prior, posterior and actual PTs and CPTs... but they seem correct when printed out.

    After creating the model and inferring the parameters of the PTs and CPTs from the supplied data, I want to be able to use the network to estimate the probabilities of CarType given that, for example, there is 100% certainty that CarModel = 1 or EngineType = 2 or both.

    In short: after creating the model with parameters inferred from my data, how do I set CarModel = 1 with 100% certainty and, given that evidence, find the marginal probabilities of all the other variables?

    I've tried approaches similar to the one used in the CarTypeFromCarModel function without getting correct results.

    Thank you again for the response.

    Friday, June 3, 2011 6:23 PM
  • minka replied on 02-28-2011 5:48 PM

    The CarTypeFromCarModel function shows the way to do it.  What do you mean by "correct results"?  Remember that Infer.NET is using approximate inference (belief propagation by default).

    Friday, June 3, 2011 6:23 PM
  • bjornjon replied on 03-01-2011 3:45 AM

    For example, with this data I would expect clear differences in the CarType distributions for CarModel being 1 or 2.

    int[] modelData  = new int[] { 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2 };
    int[] typeData   = new int[] { 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1 };
    int[] ptData     = new int[] { 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2 };
    int[] engineData = new int[] { 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3 };

    RESULTS....

    Car model PT
    Dirichlet(1 9 5 1)

    Power Train PT
    Dirichlet(1 9 5)

    Engine CPT
    [0,0] Dirichlet(1 1 1 1)
    [0,1] Dirichlet(1 1 1 1)
    [0,2] Dirichlet(1 1 1 1)
    [1,0] Dirichlet(1 1 1 1)
    [1,1] Dirichlet(1 1 2 1)
    [1,2] Dirichlet(1 1 1 1)
    [2,0] Dirichlet(1 1 1 1)
    [2,1] Dirichlet(1 1 1 1)
    [2,2] Dirichlet(1 1 1 2)
    [3,0] Dirichlet(1 1 1 1)
    [3,1] Dirichlet(1 1 1 1)
    [3,2] Dirichlet(1 1 1 1)

    Type CPT:
    [0] Dirichlet(1 1)
    [1] Dirichlet(1 1)
    [2] Dirichlet(9 1)
    [3] Dirichlet(1 5)

    Distributions for car types from model types
    Probability of type, given model 0: Discrete(0,5167 0,4833)
    Probability of type, given model 1: Discrete(0,5627 0,4373)
    Probability of type, given model 2: Discrete(0,4933 0,5067)
    Probability of type, given model 3: Discrete(0,5167 0,4833)

    I know I'm missing something fundamental here. I'd be really happy if you could explain to me what it is.

    Friday, June 3, 2011 6:23 PM
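When reading the printed results above, it may help to convert each Dirichlet's pseudo-counts into the probabilities it implies. The standard posterior mean is shown below, applied to rows taken from the listing; the remark about the prior is an assumption that is consistent with the printed counts but is not stated in the thread.

```latex
% Mean of a Dirichlet with pseudo-counts alpha_1, ..., alpha_K:
\[
  \mathbb{E}[p_k] = \frac{\alpha_k}{\sum_{j=1}^{K} \alpha_j}
\]
% Rows from the "Type CPT" listing:
\begin{align*}
  \text{Dirichlet}(9\ \ 1) &\;\Rightarrow\; (0.9,\ 0.1) \\
  \text{Dirichlet}(1\ \ 5) &\;\Rightarrow\; (1/6,\ 5/6) \approx (0.167,\ 0.833) \\
  \text{Dirichlet}(1\ \ 1) &\;\Rightarrow\; (0.5,\ 0.5)
\end{align*}
```

Assuming a uniform prior of one pseudo-count per state, "Car model PT Dirichlet(1 9 5 1)" matches modelData exactly: eight observations of state 1 and four of state 2, each added to the prior count of 1.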