Missing data in Bayesian network inference

  • Question

  • Hi,

    Can anyone shed some light on how to run inference in a Bayesian network with missing data using Infer.NET? I have a network with around 300 variables, and in each observation some variables are missing (irregularly).

    Can I set observed values for each variable and each instance independently, instead of passing one array per variable that covers all instances?

    Thanks,

    Thursday, September 4, 2014 5:07 AM

Answers

  • Here is how you might do the Subarray approach:

    (1) Define the following three variables in the model:

    public Variable<int> NumCloudyObservations;
    public VariableArray<int> CloudyObservationIndices;
    public VariableArray<int> CloudyObservationValues;
    

    (2) Hook them up in the model as follows:

    // Number of instances in which Cloudy was observed (set at inference time).
    NumCloudyObservations = Variable.New<int>().Named("NumCloudy");
    // Range over the observed instances only.
    Range CObs = new Range(NumCloudyObservations);
    CloudyObservationValues = Variable.Array<int>(CObs).Named("CloudyObs");
    CloudyObservationIndices = Variable.Array<int>(CObs).Named("CloudyObsIndices");
    CloudyObservationIndices.SetValueRange(C);
    // Select the entries of Cloudy at the observed indices.
    CloudyObservationValues = Variable.Subarray(Cloudy, CloudyObservationIndices);
    

    (3) Modify the various inference methods in your class to observe these indices and values rather than the full set of cloudy observations (i.e. do not observe "Cloudy"):

    NumCloudyObservations.ObservedValue = cloudyValues.Length;
    CloudyObservationValues.ObservedValue = cloudyValues;
    CloudyObservationIndices.ObservedValue = cloudyIndices;
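
The `cloudyIndices` and `cloudyValues` arrays in step (3) have to be built from the raw data first. A minimal sketch of one way to do that in plain C#, assuming missing entries are represented as `null` in an `int?[]`. The `MissingDataHelper` class and `SplitObserved` method are hypothetical names for illustration and are not part of Infer.NET:

```csharp
using System;
using System.Collections.Generic;

public static class MissingDataHelper
{
    // Splits per-instance data (null = missing) into the positions that were
    // observed and the values seen at those positions.
    public static (int[] Indices, int[] Values) SplitObserved(int?[] data)
    {
        var indices = new List<int>();
        var values = new List<int>();
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i].HasValue)
            {
                indices.Add(i);              // instance i has an observation
                values.Add(data[i].Value);   // the observed state for instance i
            }
        }
        return (indices.ToArray(), values.ToArray());
    }
}
```

With a hypothetical `int?[] cloudyData` holding one entry per instance, `var (cloudyIndices, cloudyValues) = MissingDataHelper.SplitObserved(cloudyData);` would yield the two arrays observed in step (3). The same split would be repeated for each variable in the network.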
    
    Friday, September 5, 2014 8:23 AM
    Owner

All replies

  • There are a couple of ways to deal with irregularity.

    See http://research.microsoft.com/en-us/um/cambridge/projects/infernet/docs/How%20to%20handle%20missing%20data.aspx for one approach. Another approach is to use a Subarray factor indexed by an observed index array for each instance; then observe the output of that factor (which will be a jagged random variable array).   

    Thursday, September 4, 2014 8:11 AM
    Owner
  • Thanks! I saw approach one. It defines only the variables that have observed data, and the other variables are not initialized. I tried this on the sprinkler/rain network, defining only half of the "Cloudy" variables, but the model does not compile; it complains "not defined Cloudy variable".

    VariableArray<double> x = Variable.Array<double>(dataRange);

    using (Variable.ForEach(dataRange))
    {
        using (Variable.IfNot(isMissingVar[dataRange]))
        {
            x[dataRange] = Variable.GaussianFromMeanAndPrecision(mean, precision);
        }
    }

    Thursday, September 4, 2014 6:18 PM
  • Could you please explain more on the subarray factor approach?

    In the network, each node variable is represented as VariableArray<int>, and I can only set the ObservedValue for the entire array, instead of setting ObservedValue for each individual component.

    Thanks,

    Thursday, September 4, 2014 6:26 PM
  • In the example, x is observed, so it is defined for the whole range. When a value is not missing, it is also generated from the Gaussian, so the observation acts as a constraint on the generated model variable and drives the learning. When a value is missing, it is simply set to some fixed, irrelevant value and does not affect the learning. A variable must always be defined for all stochastic branches.
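
The data preparation implied by this approach can be sketched in plain C#: every entry of the observed array gets a value, and a parallel boolean array records which entries are placeholders. This is only an illustration under the assumption that missing entries arrive as `null` in a `double?[]`. The `MissingMask` class, the `Build` method, and the default placeholder of 0.0 are hypothetical and are not part of Infer.NET:

```csharp
using System;

public static class MissingMask
{
    // Converts nullable data into a fully populated value array plus a
    // flag array marking which entries were missing.
    public static (double[] Values, bool[] IsMissing) Build(double?[] data, double placeholder = 0.0)
    {
        var values = new double[data.Length];
        var isMissing = new bool[data.Length];
        for (int i = 0; i < data.Length; i++)
        {
            isMissing[i] = !data[i].HasValue;
            // Missing entries get a fixed placeholder value; the flag array
            // tells the model which entries to ignore.
            values[i] = data[i] ?? placeholder;
        }
        return (values, isMissing);
    }
}
```

The resulting arrays would then serve as the observed values for `x` and for `isMissingVar` in a model like the one quoted earlier in the thread.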
    Friday, September 5, 2014 8:35 AM
    Owner