Missing data in matrix

    Question

  • I am trying to implement the generative model presented in "Learning Whom to Trust with MACE" by Dirk Hovy et al, NAACL 2013.

    There is a fixed number of workers and a fixed number of work items. Each work item belongs to some category. Every work item is assigned to a small number (say 3) of randomly picked workers. The final data is a matrix with observed and unobserved values.

    The unobserved or latent variables are:

    • true label for item n
    • whether a worker m provided an incorrect label for item n

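    Whether a worker provides a spam label for an item (e.g. he/she is a spammer) is governed by a per-worker parameter theta. When the worker is spamming, the label is drawn from a per-worker distribution ksi; otherwise the worker reports the true label.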

    I started by building a model in which the data matrix is completely observed, and it seemed to work well. Things went south when I put "-1" values in the data matrix to indicate missing values. I followed the last part of the "How to handle missing data" section in Tutorials and examples and used a "using (Variable.If(A[n][m] > -1))" block. Unfortunately, this yielded the exception "System.ArgumentOutOfRangeException: 'Specified argument was out of the range of valid values'".

    Here is the complete code, including sample data:

    using System;
    using System.Linq;
    using MicrosoftResearch.Infer;
    using MicrosoftResearch.Infer.Distributions;
    using MicrosoftResearch.Infer.Maths;
    using MicrosoftResearch.Infer.Models;
    
    namespace MACE
    {
        class Program
        {
            static void Main(string[] args)
            {
                const int numWorkers = 7;
                const int numItems = 10;
                const int numCategories = 3;
    
                //
                // Sample data:
                // 7 workers
                //  - 1-5 are average workers making 1-2 mistakes each
                //  - 6th is a total spammer, always puts 0
                //  - 7th is a perfect worker
                // Each item gets 3 answers (data is missing at random)
                // Item true label is shown in comment
                //
    
                int[][] data = new int[numItems][];
    
                data[0] = new int[] { -1, -1, 1, 0, -1, -1, 0 };   // 0
                data[1] = new int[] { 2, -1, -1, 1, -1, -1, 0 };   // 0
                data[2] = new int[] { 1, -1, -1, -1, -1, 0, 1 };   // 1
                data[3] = new int[] { 1, -1, 1, -1, -1, -1, 1 };   // 1
                data[4] = new int[] { -1, -1, -1, 1, 1, 0, -1 };   // 1
                data[5] = new int[] { 2, 1, 2, -1, -1, -1, -1 };   // 1
                data[6] = new int[] { 0, 2, -1, -1, -1, 0, -1 };   // 0
                data[7] = new int[] { -1, -1, -1, 0, 0, 0, -1 };   // 1
                data[8] = new int[] { 1, 1, -1, -1, 0, -1, -1 };   // 1
                data[9] = new int[] { 0, 2, -1, -1, 2, -1, -1 };   // 2
    
                //
                // model variables
                //
    
                Range n = new Range(numItems).Named("Item");
                Range m = new Range(numWorkers).Named("Worker");
    
                var T = Variable.Array<int>(n).Named("TrueLabels");
                var S = Variable.Array(Variable.Array<bool>(m), n).Named("IsSpammer");
                var A = Variable.Array(Variable.Array<int>(m), n).Named("Answer");
    
                //
                // Parameters and their priors
                //
    
                var theta = Variable.Array<double>(m).Named("theta");
                theta[m] = Variable.Random(new Beta(2, 2)).ForEach(m);
    
                double[] initCounts = Enumerable.Repeat<double>(1.0, numCategories).ToArray();
                var ksi = Variable.Array<Vector>(m).Named("ksi");
                ksi[m] = Variable.Random(new Dirichlet(initCounts)).ForEach(m);
    
                //
                // Generative model
                //
    
                using (Variable.ForEach(n))
                {
                    T[n] = Variable.DiscreteUniform(numCategories);
                    using (Variable.ForEach(m))
                    {
                        S[n][m] = Variable.Bernoulli(theta[m]);
                        using (Variable.If(A[n][m] > -1))
                        {
                            using (Variable.If(S[n][m] == false))
                            {
                                A[n][m] = T[n];
                            }
                            using (Variable.If(S[n][m] == true))
                            {
                                A[n][m] = Variable.Discrete(ksi[m]);
                            }
                        }
                    }
                }
    
                A.ObservedValue = data;
    
                //
                // Class labels -- break symmetry
                //
    
                Discrete[] Tinit = new Discrete[numItems];
                for (int item = 0; item < numItems; item++)
                    Tinit[item] = Discrete.PointMass(Rand.Int(numCategories), numCategories);
                T.InitialiseTo(Distribution<int>.Array(Tinit));
    
                //
                // Inference
                //
    
                InferenceEngine engine = new InferenceEngine();
    
                Console.WriteLine("*** WORK ITEM LABELS ***");
                Discrete[] TMarginal = engine.Infer<Discrete[]>(T);
                for (int item = 0; item < numItems; item++)
                    Console.WriteLine("\t" + TMarginal[item]);
    
                Console.WriteLine("\n*** IS SPAMMER ***");
                Bernoulli[][] SMarginal = engine.Infer<Bernoulli[][]>(S);
                for (int worker = 0; worker < numWorkers; worker++)
                {
                    Console.WriteLine("Worker #{0}", worker);
                    for (int item = 0; item < numItems; item++)
                    {
                        Console.WriteLine("\t" + SMarginal[item][worker]);
                    }
                }
    
                Console.Read();
            }
        }
    }


    • Edited by usptact Saturday, August 05, 2017 2:10 AM
    Friday, August 04, 2017 9:06 PM

Answers

  • The issue is that the engine is trying to infer A, which fails due to the -1 values.  You can fix this with:

                A.AddAttribute(new DoNotInfer());
    

    • Marked as answer by usptact Saturday, August 05, 2017 5:43 AM
    Saturday, August 05, 2017 4:24 AM
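
    A possible complement to DoNotInfer is to keep the -1 sentinel out of the observed array altogether, by splitting the data into a Boolean mask and a copy that contains only valid category indices. The sketch below is plain C# with no Infer.NET calls; the class name MissingDataMask, the method Split, and the placeholder value are hypothetical names chosen for illustration, not part of the library or of the original program.

```csharp
using System;

// Hypothetical helper (not part of Infer.NET): separates a matrix that uses
// -1 as a "missing" sentinel into a Boolean mask and a sentinel-free copy.
public static class MissingDataMask
{
    // Returns the cleaned matrix; hasAnswer[i][j] is true where data[i][j] was observed.
    public static int[][] Split(int[][] data, out bool[][] hasAnswer, int placeholder = 0)
    {
        hasAnswer = new bool[data.Length][];
        var cleaned = new int[data.Length][];
        for (int item = 0; item < data.Length; item++)
        {
            int width = data[item].Length;
            hasAnswer[item] = new bool[width];
            cleaned[item] = new int[width];
            for (int worker = 0; worker < width; worker++)
            {
                bool observed = data[item][worker] >= 0;
                hasAnswer[item][worker] = observed;
                // Missing cells receive a valid category index, so no value
                // in the cleaned matrix falls outside the label range.
                cleaned[item][worker] = observed ? data[item][worker] : placeholder;
            }
        }
        return cleaned;
    }

    public static void Main()
    {
        // First two rows of the sample data from the question.
        int[][] data =
        {
            new[] { -1, -1, 1, 0, -1, -1, 0 },
            new[] { 2, -1, -1, 1, -1, -1, 0 },
        };
        bool[][] hasAnswer;
        int[][] cleaned = Split(data, out hasAnswer);
        Console.WriteLine(string.Join(",", hasAnswer[0]));
        Console.WriteLine(string.Join(",", cleaned[0]));
    }
}
```

    Under this assumption, the mask would be supplied to the model as a separate observed Boolean array and used as the condition of the Variable.If block in place of A[n][m] > -1, while the cleaned matrix would become A.ObservedValue. Whether this removes the need for DoNotInfer in this particular model has not been verified here.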

All replies

  • Thank you, Tom! I added this line after:

    A.ObservedValue = data;

    and everything just works!

    Saturday, August 05, 2017 5:43 AM
  • Is it a good idea to add this attribute to observed variables in all cases? What about variables that I don't want to infer? If I add such attributes consistently in all projects, wouldn't it harm the inference process in the engine? What is the best practice?

    Thanks

    Thursday, August 10, 2017 4:38 PM
  • The best practice in general is to use the engine.OptimiseForVariables option.  This option and DoNotInfer are generally beneficial for the inference process.
    Thursday, August 10, 2017 5:01 PM
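
    As a rough illustration of the OptimiseForVariables option mentioned above, the sketch below sets it on a toy coin-flip model rather than on the MACE model from the question. The class and variable names are hypothetical, and the namespaces follow the MicrosoftResearch.Infer release used in this thread; newer releases may use different namespaces.

```csharp
using System;
using MicrosoftResearch.Infer;
using MicrosoftResearch.Infer.Distributions;
using MicrosoftResearch.Infer.Models;

// Illustrative toy model (not the MACE model from the question).
public static class OptimiseForVariablesSketch
{
    public static Beta InferBias()
    {
        // Uniform prior on the bias of a coin, plus one observed flip.
        Variable<double> bias = Variable.Beta(1, 1).Named("bias");
        Variable<bool> flip = Variable.Bernoulli(bias).Named("flip");
        flip.ObservedValue = true;

        var engine = new InferenceEngine();
        // Declare up front which variables will be queried, so the engine
        // does not try to produce marginals for anything else.
        engine.OptimiseForVariables = new IVariable[] { bias };
        return engine.Infer<Beta>(bias);
    }

    public static void Main()
    {
        Console.WriteLine(InferBias());
    }
}
```

    In the program from the question, the analogous list would presumably contain T and S, the two variables that are actually passed to engine.Infer.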