Modeling Hidden Process Variable and Instrument Errors RRS feed

  • Question

  • Hello,

    I am attempting to model a hidden process variable that is measured by multiple instruments (four in this case), each of which is prone to its own hidden error. This is solvable using relatively simple mathematical techniques such as Lagrange Multipliers, but I am trying to see if Infer.net offers a robust, automated mechanism for estimating the various errors and variables. Here is the code:

        class Program
            static void Main(string[] args)
                // Declare meter and process priors (4 meters, one process):
                Variable<double> meter1mean = Variable.GaussianFromMeanAndVariance(0, 100);
                Variable<double> meter1precision = Variable.GammaFromShapeAndScale(10, .01);
                Variable<double> meter2mean = Variable.GaussianFromMeanAndVariance(0, 100);
                Variable<double> meter2precision = Variable.GammaFromShapeAndScale(10, .01);
                Variable<double> meter3mean = Variable.GaussianFromMeanAndVariance(0, 100);
                Variable<double> meter3precision = Variable.GammaFromShapeAndScale(10, .01);
                Variable<double> meter4mean = Variable.GaussianFromMeanAndVariance(0, 100);
                Variable<double> meter4precision = Variable.GammaFromShapeAndScale(10, .01);
                Variable<double> processmean = Variable.GaussianFromMeanAndVariance(500, 1000);
                Variable<double> processprecision = Variable.GammaFromShapeAndScale(100, .01);

                //declare meter and process dummy data and posteriors
                double[] meter1errordata = new double[25];
                double[] meter2errordata = new double[25];
                double[] meter3errordata = new double[25];
                double[] meter4errordata = new double[25];
                double[] processdata = new double[25];

                // measurement posteriors
                double[] meter1measurement = new double[25];
                double[] meter2measurement = new double[25];
                double[] meter3measurement = new double[25];
                double[] meter4measurement = new double[25];

                // calculate dummy data and posteriors for each measurement time. All meter errors are Gaussian.
                for (int i = 0; i < meter1measurement.Length; i++)
                    // Calculate background/dummy errors for each meter
                    meter1errordata[i] = Rand.Normal(0, 10);
                    meter2errordata[i] = Rand.Normal(0, 10);
                    meter3errordata[i] = Rand.Normal(0, 50);
                    meter4errordata[i] = Rand.Normal(0, 10);

                    // Calculate background/dummy process
                    processdata[i] = Rand.Normal(1000, 1);

                    // Calculate observed meter measurements
                    meter1measurement[i] = meter1errordata[i] + processdata[i];
                    meter2measurement[i] = meter2errordata[i] + processdata[i];
                    meter3measurement[i] = meter3errordata[i] + processdata[i];
                    meter4measurement[i] = meter4errordata[i] + processdata[i];

                    // Declare error distributions on meters:
                    Variable<double> meter1dist = Variable.GaussianFromMeanAndPrecision(meter1mean, meter1precision).Named("meter1dist" + i);
                    Variable<double> meter2dist = Variable.GaussianFromMeanAndPrecision(meter2mean, meter2precision).Named("meter2dist" + i);
                    Variable<double> meter3dist = Variable.GaussianFromMeanAndPrecision(meter3mean, meter3precision).Named("meter3dist" + i);
                    Variable<double> meter4dist = Variable.GaussianFromMeanAndPrecision(meter4mean, meter4precision).Named("meter4dist" + i);

                    // Distribution on the process variable:
                    Variable<double> processdist = Variable.GaussianFromMeanAndPrecision(processmean, processprecision).Named("flowdist" + i);

                    // Constrain the observed meter measurements to be equal to the sum of the meter error and flow distributions:
                    Variable.ConstrainEqual(meter1measurement[i], meter1dist + processdist);
                    Variable.ConstrainEqual(meter2measurement[i], meter2dist + processdist);
                    Variable.ConstrainEqual(meter3measurement[i], meter3dist + processdist);
                    Variable.ConstrainEqual(meter4measurement[i], meter4dist + processdist);

                // Create an inference engine for VMP
                InferenceEngine engine = new InferenceEngine();

                // Retrieve the posterior distributions for each meter:
                Gaussian marginalMeter1Mean = engine.Infer<Gaussian>(meter1mean);
                Gamma marginalMeter1Precision = engine.Infer<Gamma>(meter1precision);
                Gaussian marginalMeter2Mean = engine.Infer<Gaussian>(meter2mean);
                Gamma marginalMeter2Precision = engine.Infer<Gamma>(meter2precision);
                Gaussian marginalMeter3Mean = engine.Infer<Gaussian>(meter3mean);
                Gamma marginalMeter3Precision = engine.Infer<Gamma>(meter3precision);
                Gaussian marginalMeter4Mean = engine.Infer<Gaussian>(meter4mean);
                Gamma marginalMeter4Precision = engine.Infer<Gamma>(meter4precision);

                // Get the posterior distribution on the process variable:
                Gaussian marginalProcessMean = engine.Infer<Gaussian>(processmean);
                Gamma marginalProcessPrecision = engine.Infer<Gamma>(processprecision);

    The code is horribly inefficient, but that his not the point, as I am simply trying to see how well Infer.net solves the problem at this time. Although it does appear to generate correct answers, it has the following problems:

    1. It fails or won't run under VMP and Gibbs.

    2. It is sensitive to the priors and will blow up if they are poorly chosen.

    Do you have any suggestions to improve on this?



    Friday, January 6, 2012 7:14 PM

All replies

  • Hi Philip

    VMP does not support observing the output of a deterministic factor (essentially this is what your ConstrainEqual is doing - it is placing an observation of the sum of the meter variable and the process variable). In your model this is easily and equivalently got around by adding the process variable to the meter mean variable rather than the meter variable, and then feeding the resulting variable to a Gaussian factor with the meter precision as the precision. Then observe that variable (see code below).

    EP will often have difficulty with inference on random variables representing precisions. Tightening the priors (which unfortunately changes the model) or doing initialialisation can help, but this is largely an open question.

    Looking at your code, apart from the inefficiency (you need to make these variable arrays as below), I notice that the prior for the process mean is Gaussian.Variable.GaussianFromMeanAndVariance(500, 1000) - this has a density of essentially 0 at the values (around 1000), where the data are, and so is not a good prior.

    The following works with all algorithms (but you should check it is the model that you want):

    class Program 
      static void Main(string[] args) 
        // ------------------- 
        // The model 
        // ------------------- 
        Variable<int> M = Variable.New<int>().Named("M"); 
        Variable<int> N  = Variable.New<int>().Named("N"); 
        Range m = new Range(M).Named("m"); 
        Range n = new Range(N).Named("n"); 
        // Declare meter and process priors (4 meters, one process): 
        var meterMeans = Variable.Array<double>(m).Named("meterMeans"); 
        var meterPrecs = Variable.Array<double>(m).Named("meterPrecs"); 
        var meterMeanPriors = Variable.Array<Gaussian>(m).Named("MeanPriors"); 
        var meterPrecPriors = Variable.Array<Gamma>(m).Named("PrecPriors"); 
        meterMeans[m] = Variable<double>.Random(meterMeanPriors[m]); 
        meterPrecs[m] = Variable<double>.Random(meterPrecPriors[m]); 
        var processMeanPrior = Variable.New<Gaussian>().Named("processMeanPrior"); 
        var processPrecPrior = Variable.New<Gamma>().Named("processPrecPrior"); 
        var processMean = Variable<double>.Random(processMeanPrior).Named("processMean"); 
        var processPrec = Variable<double>.Random(processPrecPrior).Named("processPrec"); 
        var process = Variable.Array<double>(n).Named("process"); 
        process[n] = Variable.GaussianFromMeanAndPrecision(processMean, processPrec).ForEach(n); 
        var reading = Variable.Array(Variable.Array<double>(n), m).Named("reading"); 
        reading[m][n] = Variable.GaussianFromMeanAndPrecision(meterMeans[m] + process[n], meterPrecs[m]); 
        // ------------------- 
        // The data 
        // ------------------- 
        var meterMeanObservedPrior = new Gaussian[] { 
          Gaussian.FromMeanAndVariance(0, 100), 
          Gaussian.FromMeanAndVariance(0, 100), 
          Gaussian.FromMeanAndVariance(0, 100), 
          Gaussian.FromMeanAndVariance(0, 100)}; 
        var meterPrecObservedPrior = new Gamma[] { 
          Gamma.FromShapeAndScale(1000, .0001), 
          Gamma.FromShapeAndScale(1000, .0001), 
          Gamma.FromShapeAndScale(1000, .0001), 
          Gamma.FromShapeAndScale(1000, .0001)}; 
        var processMeanObservedPrior = Gaussian.FromMeanAndVariance(1000, 100); 
        var processPrecObservedPrior = Gamma.FromShapeAndScale(10000, .0001); 
        //declare meter and process dummy data and posteriors 
        int numMeters = meterMeanObservedPrior.Length; 
        int numMeasurements = 25; 
        double[][] metermeasurement = new double[numMeters][]; 
        for (int j = 0; j < numMeters; j++) 
          metermeasurement[j] = new double[numMeasurements]; 
        double[] errorStdDev = new double[] {10, 10, 50, 10}; 
        // calculate dummy data for each measurement time. All meter errors are Gaussian. 
        for (int i = 0; i < numMeasurements; i++) 
          var processData = Rand.Normal(1000, 1); 
          for (int j = 0; j < numMeters; j++) 
            metermeasurement[j][i] = Rand.Normal(0, errorStdDev[j]) + processData; 
        // Hook up the data to the model 
        M.ObservedValue = numMeters; 
        N.ObservedValue = numMeasurements; 
        meterMeanPriors.ObservedValue = meterMeanObservedPrior; 
        meterPrecPriors.ObservedValue = meterPrecObservedPrior; 
        processMeanPrior.ObservedValue = processMeanObservedPrior; 
        processPrecPrior.ObservedValue = processPrecObservedPrior; 
        reading.ObservedValue = metermeasurement; 
        // Create an inference engine for VMP 
        //InferenceEngine engine = new InferenceEngine(); 
        //InferenceEngine engine = new InferenceEngine(new VariationalMessagePassing()); 
        InferenceEngine engine = new InferenceEngine(new GibbsSampling()); 
        // Retrieve the posterior distributions for each meter: 
        Gaussian[] marginalMeterMean = engine.Infer<Gaussian[]>(meterMeans); 
        Gamma[] marginalMeterPrec = engine.Infer<Gamma[]>(meterPrecs); 
        Gaussian marginalProcessMean = engine.Infer<Gaussian>(processMean); 
        Gamma marginalProcessPrec = engine.Infer<Gamma>(processPrec); 


    Monday, January 9, 2012 4:26 PM
  • John--

    Thanks very much for response and assistance. It was very helpful. I have run some preliminary tests on the code, and it appears to function quite well. Some observations:

    1. The code runs quite well under all three inference methods with the original data that I provided. However, it becomes quite unhappy and fails if I use EP when the observation count is increased significantly. I have not managed to break it so far with VMP or Gibbs.

    2. I modified your code slightly to see if the inference looked good assuming different biases on the meters (i.e., different meter means), and this worked well.

    3. The inference results generally look pretty good. I observed that it doesn't take much data to get a good handle on the meter biases. However, it appears to take quite a bit of data to derive reliable error distributions on the meters.

    I am attempting to scale this approach up so that it can be used to infer process unknowns on a large plant scale. I am curious as to your thoughts regarding the following:

    a. The real meter errors are not strictly Gaussian, but have enough outliers that they are most likely better described by a mixture model. How would this affect the implicit assumption of normality on the reading variable?

    b. The actual process values are not necessarily the same at all measurement sites, but may be linked by mass conservation or other physical relations. We can probably relate their values via a set of predictive models that can be used to link groups of meters or measurements, but the models may have their own errors. It seems straightforward to apply this using a methodology similar to the one you have applied here, but again, I am interested in your thoughts.

    c. Do you see issues in scaling this approach up? And do you think a hybrid approach may be required?

    Again, thank you very much for your time.



    Monday, January 9, 2012 9:02 PM
  • Hi Philip

    a. If you feel that these are better described by a mixture, then you should consider incoporating a mixture into the meter part of the model

    b. In general the more knowledge you can incorporate into your model, the better the inference will be. One of the aims of Infer.NET is to encourage explicit modeling of the process that generated the data rather than force the data into a black box model.

    c. There are several mechanisms for scaling in Infer.NET, and how you do this will depend on the model and the form of the data. If you can split data into independent subsets, then there are standard ways of scaling.


    Wednesday, January 11, 2012 3:43 PM