Model not working on bigger data set

  • Question

  • I am trying to work with the DARE model (How to Grade a Test Without Knowing the Answers – ICML 2012). I have made a small change in the DefineGenerativeProcess() function; other than that, everything is the same as the DARE model.

    The model works seamlessly with a smaller dataset consisting of 1150 labels, but it throws an "improper distribution" exception when the dataset is bigger: approximately 8000 labels.

    What is the main cause of this problem? Any ideas on how I could solve it?

    public class DARE
        {
            // Ranges - size of the variables
            public static Range task;
            public static Range worker;
            public static Range choice;
            public static Range workerTask;
    
            // Main Variables in the model
            public static VariableArray<double> workerAbility;
            public static VariableArray<double> taskDifficulty;
            public static VariableArray<double> discrimination;
            public static VariableArray<int> trueLabel;
            public static VariableArray<VariableArray<int>, int[][]> workerResponse;
    
            // Variables in model
            public static Variable<int> WorkerCount;
            public static VariableArray<int> WorkerTaskCount;
            public static VariableArray<VariableArray<int>, int[][]> WorkerTaskIndex;
    
            // Prior distributions
            public static Gaussian workerAbilityPrior;
            public static Gaussian taskDifficultyPrior;
            public static Gamma discriminationPrior;
    
            // Posterior distributions
            public static Gaussian[] workerAbilityPosterior;
            public static Gaussian[] taskDifficultyPosterior;
            public static Gamma[] discriminationPosterior;
            public static Discrete[] trueLabelPosterior;
    
            // Inference engine
            public static InferenceEngine Engine;
    
            /// <summary>
            /// The number of inference iterations.
            /// </summary>
            public static int NumberOfIterations
            {
                get;
                set;
            }
    
            /// <summary>
            /// Creates a DARE model instance.
            /// </summary>
            public DARE()
            {
                NumberOfIterations = 35;
            }
    
            /// <summary>
            /// Initializes the ranges, the generative process and the inference engine of the DARE model.
            /// </summary>
            /// <param name="taskCount">The number of tasks.</param>
            /// <param name="workerCount">The number of workers.</param>
            /// <param name="labelCount">The number of labels.</param>
            public static void CreateModel(int taskCount, int workerCount, int labelCount)
            {
                DefineVariablesAndRanges(taskCount, workerCount, labelCount);
                DefineGenerativeProcess();
                DefineInferenceEngine();
            }
    
            /// <summary>
            /// Initializes the ranges of the variables.
            /// </summary>
            /// <param name="taskCount">The number of tasks.</param>
            /// <param name="workerCount">The number of workers.</param>
            /// <param name="labelCount">The number of labels.</param>
            public static void DefineVariablesAndRanges(int taskCount, int workerCount, int labelCount)
            {
                worker = new Range(workerCount).Named("worker"); 
                task = new Range(taskCount).Named("task");
                choice = new Range(labelCount).Named("choice");
    
                // The tasks for each worker
                WorkerTaskCount = Variable.Array<int>(worker).Named("WorkerTaskCount");
                workerTask = new Range(WorkerTaskCount[worker]).Named("workerTask");
                WorkerTaskIndex = Variable.Array(Variable.Array<int>(workerTask), worker).Named("WorkerTaskIndex");
                WorkerTaskIndex.SetValueRange(task);
    
                //worker ability for each worker
                workerAbilityPrior = new Gaussian(0, 1);
                workerAbility = Variable.Array<double>(worker).Named("workerAbility");
                workerAbility[worker] = Variable.Random(workerAbilityPrior).ForEach(worker);
    
                //task difficulty for each task
                taskDifficultyPrior = new Gaussian(0, 1);
                taskDifficulty = Variable.Array<double>(task).Named("taskDifficulty");
                taskDifficulty[task] = Variable.Random(taskDifficultyPrior).ForEach(task);
    
                // discrimination of each task
                discriminationPrior = Gamma.FromMeanAndVariance(1, 0.01);
                discrimination = Variable.Array<double>(task).Named("discrimination");
                discrimination[task] = Variable.Random(discriminationPrior).ForEach(task);
    
                //unobserved true label for each task
                trueLabel = Variable.Array<int>(task).Named("trueLabel");
                trueLabel[task] = Variable.DiscreteUniform(choice).ForEach(task);
    
                //worker label
                workerResponse = Variable.Array(Variable.Array<int>(workerTask), worker).Named("workerResponse");
            }
    
            /// <summary>
            /// Defines the DARE generative process.
            /// </summary>
            public static void DefineGenerativeProcess()
            {
                // The process that generates the worker's label
                using (Variable.ForEach(worker))
                {
                    using (Variable.ForEach(workerTask))
                    {
                        var index = WorkerTaskIndex[worker][workerTask];
                        var advantage = (workerAbility[worker] - taskDifficulty[index]).Named("advantage");
                        var advantageNoisy = Variable.GaussianFromMeanAndPrecision(advantage, discrimination[index]).Named("advantageNoisy");
                        var correct = (advantageNoisy > 0).Named("correct");
                        using (Variable.If(correct))
                            workerResponse[worker][workerTask] = trueLabel[index];
                        using (Variable.IfNot(correct))
                            workerResponse[worker][workerTask] = Variable.DiscreteUniform(choice);
                    }
                }
            }
    
            /// <summary>
            /// Initializes the DARE inference engine.
            /// </summary>
            public static void DefineInferenceEngine()
            {
                Engine = new InferenceEngine(new ExpectationPropagation());
                Engine.Compiler.UseParallelForLoops = true;
                Engine.ShowProgress = false;
                Engine.Compiler.WriteSourceFiles = false;
            }
    
            /// <summary>
            /// Attaches the data to the workers' labels.
            /// </summary>
            /// <param name="taskIndices">The matrix of the task indices (columns) of each worker (rows).</param>
            /// <param name="workerLabels">The matrix of the labels (columns) of each worker (rows).</param>
            public static void AttachData(int[][] taskIndices, int[][] workerLabels)
            {
                WorkerTaskCount.ObservedValue = taskIndices.Select(tasks => tasks.Length).ToArray();
                WorkerTaskIndex.ObservedValue = taskIndices;
                workerResponse.ObservedValue = workerLabels;
            }
    
            /// <summary>
            /// Infers the posteriors of DARE using the attached data and priors.
            /// </summary>
            /// <param name="taskIndices">The matrix of the task indices (columns) of each worker (rows).</param>
            /// <param name="workerLabels">The matrix of the labels (columns) of each worker (rows).</param>
            public static void Infer(int[][] taskIndices, int[][] workerLabels)
            {
                AttachData(taskIndices, workerLabels);
                Engine.NumberOfIterations = NumberOfIterations;
    
                workerAbility.AddAttribute(new Sequential());   // needed to get stable convergence
                taskDifficulty.AddAttribute(new Sequential());  // needed to get stable convergence
    
                workerAbilityPosterior = Engine.Infer<Gaussian[]>(workerAbility);
                taskDifficultyPosterior = Engine.Infer<Gaussian[]>(taskDifficulty);
                discriminationPosterior = Engine.Infer<Gamma[]>(discrimination);
                trueLabelPosterior = Engine.Infer<Discrete[]>(trueLabel); 
            }
            
        }
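
    For reference, this is roughly how I call the class. The data below is made up purely to show the call sequence; my real data is loaded from a CSV file:

        // Toy driver (hypothetical data, only to illustrate how the class is used)
        int taskCount = 3, workerCount = 2, labelCount = 2;
        int[][] taskIndices = { new[] { 0, 1 }, new[] { 1, 2 } };  // tasks labelled by each worker
        int[][] workerLabels = { new[] { 1, 0 }, new[] { 0, 0 } }; // that worker's label for each of those tasks

        var dare = new DARE();                                     // sets NumberOfIterations (35 by default)
        DARE.CreateModel(taskCount, workerCount, labelCount);
        DARE.Infer(taskIndices, workerLabels);

        for (int t = 0; t < taskCount; t++)
            Console.WriteLine("task {0}: {1}", t, DARE.trueLabelPosterior[t]);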

     
    Friday, March 27, 2015 10:30 AM

All replies

  • How is this model different from the one in your other thread?
    Friday, March 27, 2015 10:57 AM
  • Hi cindyak

    Can you

    (a) Remind me where you got this code from

    (b) Say what you changed

    (c) Confirm that you are using a serial schedule (I believe that this should be the default with Infer.NET 2.6 but better to explicitly set it).

    (d) Possibly make a data set available that triggers this exception.

    John

    Monday, March 30, 2015 12:44 PM
    Owner
  • Hi John Guiver,

    (a) I got the original code from the Infer.NET 2.6\Samples\C#\ExamplesBrowser\DifficultyAbility.cs file.

    (b) I made a small change in the DefineGenerativeProcess() function. In the original source code, each worker performs all the tasks. In my code, I changed it so that a worker performs only a subset of the tasks from the task pool.

            /// <summary>
            /// Defines the DARE generative process.
            /// </summary>
            public static void DefineGenerativeProcess()
            {
                // The process that generates the worker's label
                using (Variable.ForEach(worker)) 
                {
                    using (Variable.ForEach(workerTask)) // instead of all the tasks, a worker performs only a subset of the tasks
                    {
                        var index = WorkerTaskIndex[worker][workerTask];
                        var advantage = (workerAbility[worker] - taskDifficulty[index]).Named("advantage");
                        var advantageNoisy = Variable.GaussianFromMeanAndPrecision(advantage,discrimination[index]).Named("advantageNoisy");
                        var correct = (advantageNoisy > 0).Named("correct");
                        using (Variable.If(correct))
                            workerResponse[worker][workerTask] = trueLabel[index];
                        using (Variable.IfNot(correct))
                            workerResponse[worker][workerTask] = Variable.DiscreteUniform(choice);
                    }
                }
            }

    (c) Following the "Customizing the algorithm initialization" tutorial, I have already changed it.

    (d) Dataset




    • Edited by cindyak Tuesday, March 31, 2015 3:33 AM
    Tuesday, March 31, 2015 3:32 AM
  • Hi Cindy

    I am quite confused because the code in Infer.NET 2.6\Samples\C#\ExamplesBrowser\DifficultyAbility.cs doesn't look anything like your code. Your code looks more like the Crowdsourcing code from http://blogs.msdn.com/b/infernet_team_blog/archive/2014/06/25/community-based-bayesian-classifier-combination.aspx.

    As I don't have your complete code, it is difficult to run this and help figure out the problem. But in general we use the Subarray factor to efficiently deal with the non-dense case. Something like:

    using (Variable.ForEach(worker))
     {
         var workerTaskDifficulty = Variable.Subarray(TaskDifficulty, WorkerTaskIndex[worker]);
         using (Variable.ForEach(workerTask))
         {
             var advantage = workerAbility[worker] - workerTaskDifficulty[workerTask];
    
    ...
         }
     }
    John

     
    Tuesday, March 31, 2015 9:56 AM
    Owner
  • Hi John Guiver,

    Thanks for your response. 

    Here is my entire runnable source code. You will find that the EditDARE class is close to the original source code.

        /// <summary>
        /// The class for the main program.
        /// </summary> 
        class InferLabel
        {
            /// <summary>
            /// The data mapping.
            /// </summary>
            public static DataMapping Mapping
            {
                get;
                private set;
            }
    
        static string Dataset = "CFWithTrueLabels";
    
            static void Main(string[] args)
            {
                var data = Datum.LoadData(@".\Data\" + Dataset + ".csv");
                Mapping = new DataMapping(data);
    
                var labelsPerWorkerIndex = Mapping.GetLabelsPerWorkerIndex(data);
                var TaskPerWorkerIndex = Mapping.GetTaskIndicesPerWorkerIndex(data);
    
                EditDARE.RunDARE(Mapping.TaskCount, Mapping.WorkerCount, Mapping.LabelCount,
                                              labelsPerWorkerIndex, TaskPerWorkerIndex);
            }
        }
    
        /// <summary>
        /// Edited version of DARE model
        /// Reference: How to Grade a Test Without Knowing the Answers (ICML 2012).
        /// </summary>
        public class EditDARE
        {
            #region Fields
            // const
            public const double ABILITY_PRIOR_MEAN = 0;
            public const double ABILITY_PRIOR_VARIANCE = 1;
            public const double DIFFICULTY_PRIOR_MEAN = 0;
            public const double DIFFICULTY_PRIOR_VARIANCE = 1;
            public const double DISCRIM_PRIOR_SHAPE = 1;
            public const double DISCRIM_PRIOR_SCALE = 0.01;
            const int NUMBER_OF_ITERATIONS = 35;
    
            // Ranges - size of the variables
            public static Range task;
            public static Range worker;
            public static Range choice;
            public static Range workerTask;
    
            // Main Variables in the model
            public static VariableArray<double> workerAbility;
            public static VariableArray<double> taskDifficulty;
            public static VariableArray<double> discrimination;
            public static VariableArray<int> trueLabel;
            public static VariableArray<VariableArray<int>, int[][]> workerResponse;
    
            // Variables in model
            public static Variable<int> WorkerCount;
            public static VariableArray<int> WorkerTaskCount;
            public static VariableArray<VariableArray<int>, int[][]> WorkerTaskIndex;
    
            // Prior distributions
            public static Gaussian workerAbilityPrior;
            public static Gaussian taskDifficultyPrior;
            public static Gamma discriminationPrior;
    
            // Posterior distributions
            public static Gaussian[] workerAbilityPosterior;
            public static Gaussian[] taskDifficultyPosterior;
            public static Gamma[] discriminationPosterior;
            public static Discrete[] trueLabelPosterior;
    
            // Inference engine
            public static InferenceEngine Engine;
            #endregion
    
            public static void RunDARE(int nQuestions, int nSubjects, int nChoices, int[][] workerLabels, int[][] taskIndices)
            {
                worker = new Range(nSubjects).Named("worker");
                task = new Range(nQuestions).Named("task");
                choice = new Range(nChoices).Named("choice");
    
                // The tasks for each worker
                WorkerTaskCount = Variable.Array<int>(worker).Named("WorkerTaskCount");
                workerTask = new Range(WorkerTaskCount[worker]).Named("workerTask");
                WorkerTaskIndex = Variable.Array(Variable.Array<int>(workerTask), worker).Named("WorkerTaskIndex");
                WorkerTaskIndex.SetValueRange(task);
    
                //worker ability for each worker
                workerAbilityPrior = new Gaussian(ABILITY_PRIOR_MEAN, ABILITY_PRIOR_VARIANCE);
                workerAbility = Variable.Array<double>(worker).Named("workerAbility");
                workerAbility[worker] = Variable.Random(workerAbilityPrior).ForEach(worker);
                workerAbility[worker].InitialiseTo(workerAbilityPrior);
    
                //task difficulty for each task
                taskDifficultyPrior = new Gaussian(DIFFICULTY_PRIOR_MEAN, DIFFICULTY_PRIOR_VARIANCE);
                taskDifficulty = Variable.Array<double>(task).Named("taskDifficulty");
                taskDifficulty[task] = Variable.Random(taskDifficultyPrior).ForEach(task);
                taskDifficulty[task].InitialiseTo(taskDifficultyPrior);
    
                // discrimination of each task
                discriminationPrior = Gamma.FromMeanAndVariance(DISCRIM_PRIOR_SHAPE, DISCRIM_PRIOR_SCALE); // note: despite the constant names, these are passed as the mean and the variance
                discrimination = Variable.Array<double>(task).Named("discrimination");
                discrimination[task] = Variable.Random(discriminationPrior).ForEach(task);
                discrimination[task].InitialiseTo(discriminationPrior);
    
                //unobserved true label for each task
                trueLabel = Variable.Array<int>(task).Named("trueLabel");
                trueLabel[task] = Variable.DiscreteUniform(choice).ForEach(task);
    
                //worker label
                workerResponse = Variable.Array(Variable.Array<int>(workerTask), worker).Named("workerResponse");
    
                // The process that generates the worker's label
                using (Variable.ForEach(worker))
                {
                    var workerTaskDifficulty = Variable.Subarray(taskDifficulty, WorkerTaskIndex[worker]);
                    var workerTaskDiscrimination = Variable.Subarray(discrimination, WorkerTaskIndex[worker]);
                    var TrueLabel = Variable.Subarray(trueLabel, WorkerTaskIndex[worker]);
    
                    using (Variable.ForEach(workerTask))
                    {
                        var advantage = (workerAbility[worker] - workerTaskDifficulty[workerTask]).Named("advantage");
                        var advantageNoisy = Variable.GaussianFromMeanAndPrecision(advantage, workerTaskDiscrimination[workerTask]).Named("advantageNoisy");
                        var correct = (advantageNoisy > 0).Named("correct");
                        using (Variable.If(correct))
                            workerResponse[worker][workerTask] = TrueLabel[workerTask];
                        using (Variable.IfNot(correct))
                            workerResponse[worker][workerTask] = Variable.DiscreteUniform(choice);
                    }
                }
    
                Engine = new InferenceEngine(new ExpectationPropagation());
                Engine.Compiler.UseParallelForLoops = true;
                Engine.ShowProgress = false;
                Engine.Compiler.WriteSourceFiles = false;
    
                // Attach the data to the workers' labels.
                WorkerTaskCount.ObservedValue = taskIndices.Select(tasks => tasks.Length).ToArray();
                WorkerTaskIndex.ObservedValue = taskIndices;
                workerResponse.ObservedValue = workerLabels;
    
                Engine.NumberOfIterations = NUMBER_OF_ITERATIONS;
    
                workerAbility.AddAttribute(new Sequential());   // needed to get stable convergence
                taskDifficulty.AddAttribute(new Sequential());  // needed to get stable convergence
    
                trueLabelPosterior = Engine.Infer<Discrete[]>(trueLabel);
            }
        }
    
        /// <summary>
        /// This class represents a single datum, and has methods to read in data.
        /// </summary>
        public class Datum
        {
            /// <summary>
            /// The worker id.
            /// </summary>
            public string WorkerId;
    
            /// <summary>
            /// The task id.
            /// </summary>
            public string TaskId;
    
            /// <summary>
            /// The worker's label.
            /// </summary>
            public int WorkerLabel;
    
            /// <summary>
            /// The task's gold label (optional).
            /// </summary>
            public int? GoldLabel;
    
            /// <summary>
            /// Loads the data file in the format (worker id, task id, worker label, ?gold label).
            /// </summary>
            /// <param name="filename">The data file.</param>
            /// <returns>The list of parsed data.</returns>
            public static IList<Datum> LoadData(string filename)
            {
                var result = new List<Datum>();
                using (var reader = new StreamReader(filename))
                {
                    string line;
                    while ((line = reader.ReadLine()) != null)
                    {
                        var strarr = line.Split(',');
                        int length = strarr.Length;
                        //if (length < 3 || length > 4) //Filter bad entries!!
                        //    continue;
    
                        int workerLabel = int.Parse(strarr[2]);
                        //if (workerLabel < -4 || workerLabel > 4) //Filter bad entries!!
                        //    continue;
    
                        var datum = new Datum()
                        {
                            WorkerId = strarr[0],
                            TaskId = strarr[1],
                            WorkerLabel = workerLabel,
                        };
    
                        if (length == 4)
                            datum.GoldLabel = int.Parse(strarr[3]);
                        else
                            datum.GoldLabel = null;
    
                        result.Add(datum);
                    }
                }
    
                return result;
            }
        }
    
        /// <summary>
        /// Data mapping class. This class manages the mapping between the data (which is
        /// in the form of task, worker ids, and labels) and the model data (which is in term of indices).
        /// </summary>
        public class DataMapping
        {
            #region Fields
            /// <summary>
            /// The mapping from the worker index to the worker id.
            /// </summary>
            public string[] WorkerIndexToId;
    
            /// <summary>
            /// The mapping from the worker id to the worker index.
            /// </summary>
            public Dictionary<string, int> WorkerIdToIndex;
    
            /// <summary>
            /// The mapping from the community id to the community index.
            /// </summary>
            public Dictionary<string, int> CommunityIdToIndex;
    
            /// <summary>
            /// The mapping from the community index to the community id.
            /// </summary>
            public string[] CommunityIndexToId;
    
            /// <summary>
            /// The mapping from the task index to the task id.
            /// </summary>
            public string[] TaskIndexToId;
    
            /// <summary>
            /// The mapping from the task id to the task index.
            /// </summary>
            public Dictionary<string, int> TaskIdToIndex;
    
            /// <summary>
            /// The lower bound of the labels range.
            /// </summary>
            public int LabelMin;
    
            /// <summary>
            /// The upper bound of the labels range.
            /// </summary>
            public int LabelMax;
            #endregion
    
            #region Properties
            /// <summary>
            /// The enumerable list of data.
            /// </summary>
            public IEnumerable<Datum> Data
            {
                get;
                private set;
            }
    
            /// <summary>
            /// The number of label values.
            /// </summary>
            public int LabelCount
            {
                get
                {
                    return LabelMax - LabelMin + 1;
                }
            }
    
            /// <summary>
            /// The number of workers.
            /// </summary>
            public int WorkerCount
            {
                get
                {
                    return WorkerIndexToId.Length;
                }
            }
    
            /// <summary>
            /// The number of tasks.
            /// </summary>
            public int TaskCount
            {
                get
                {
                    return TaskIndexToId.Length;
                }
            }
            #endregion
    
            #region Methods
            /// <summary>
            /// Creates a data mapping.
            /// </summary>
            /// <param name="data">The data.</param>
            /// <param name="numCommunities">The number of communities.</param>
            /// <param name="labelMin">The lower bound of the labels range.</param>
            /// <param name="labelMax">The upper bound of the labels range.</param>
            public DataMapping(IEnumerable<Datum> data, int numCommunities = -1, int labelMin = int.MaxValue, int labelMax = int.MinValue)
            {
                WorkerIndexToId = data.Select(d => d.WorkerId).Distinct().ToArray();
                WorkerIdToIndex = WorkerIndexToId.Select((id, idx) => new KeyValuePair<string, int>(id, idx)).ToDictionary(x => x.Key, y => y.Value);
                TaskIndexToId = data.Select(d => d.TaskId).Distinct().ToArray();
                TaskIdToIndex = TaskIndexToId.Select((id, idx) => new KeyValuePair<string, int>(id, idx)).ToDictionary(x => x.Key, y => y.Value);
                var labels = data.Select(d => d.WorkerLabel).Distinct().OrderBy(lab => lab).ToArray();
    
                if (labelMin <= labelMax)
                {
                    LabelMin = labelMin;
                    LabelMax = labelMax;
                }
                else
                {
                    LabelMin = labels.Min();
                    LabelMax = labels.Max();
                }
                Data = data;
    
                if (numCommunities > 0)
                {
                    CommunityIndexToId = Util.ArrayInit(numCommunities, comm => "Community" + comm);
                    CommunityIdToIndex = CommunityIndexToId.Select((id, idx) => new KeyValuePair<string, int>(id, idx)).ToDictionary(x => x.Key, y => y.Value);
                }
            }
    
            /// <summary>
            /// Returns the matrix of the task indices (columns) of each worker (rows).
            /// </summary>
            /// <param name="data">The data.</param>
            /// <returns>The matrix of the task indices (columns) of each worker (rows).</returns>
            public int[][] GetTaskIndicesPerWorkerIndex(IEnumerable<Datum> data)
            {
                int[][] result = new int[WorkerCount][];
                for (int i = 0; i < WorkerCount; i++)
                {
                    var wid = WorkerIndexToId[i];
                    result[i] = data.Where(d => d.WorkerId == wid).Select(d => TaskIdToIndex[d.TaskId]).ToArray();
                }
    
                return result;
            }
    
            /// <summary>
            /// Returns the matrix of the labels (columns) of each worker (rows).
            /// </summary>
            /// <param name="data">The data.</param>
            /// <returns>The matrix of the labels (columns) of each worker (rows).</returns>
            public int[][] GetLabelsPerWorkerIndex(IEnumerable<Datum> data)
            {
                int[][] result = new int[WorkerCount][];
                for (int i = 0; i < WorkerCount; i++)
                {
                    var wid = WorkerIndexToId[i];
                    result[i] = data.Where(d => d.WorkerId == wid).Select(d => d.WorkerLabel - LabelMin).ToArray();
                }
    
                return result;
            }
    
            /// <summary>
            /// Returns the gold labels of each task.
            /// </summary>
            /// <returns>A dictionary keyed by task id whose value is the gold label.</returns>
            public Dictionary<string, int?> GetGoldLabelsPerTaskId()
            {
                // Gold labels that are not consistent are returned as null
                // Labels are returned as indexed by task index
                return Data.GroupBy(d => d.TaskId).
                  Select(t => t.GroupBy(d => d.GoldLabel).Where(d => d.Key != null)).
                  Where(gold_d => gold_d.Count() > 0).
                  Select(gold_d =>
                  {
                      int count = gold_d.Distinct().Count();
                      var datum = gold_d.First().First();
                      if (count == 1)
                      {
                          var gold = datum.GoldLabel;
                          if (gold != null)
                              gold = gold.Value - LabelMin;
                          return new Tuple<string, int?>(datum.TaskId, gold);
                      }
                      else
                      {
                          return new Tuple<string, int?>(datum.TaskId, (int?)null);
                      }
                  }).ToDictionary(tup => tup.Item1, tup => tup.Item2);
            }
    
            /// <summary>
            /// For each task, gets the majority vote label if it is unique.
            /// </summary>
            /// <returns>The list of majority vote labels.</returns>
            public int?[] GetMajorityVotesPerTaskIndex()
            {
                return Data.GroupBy(d => TaskIdToIndex[d.TaskId]).
                  OrderBy(g => g.Key).
                  Select(t => t.GroupBy(d => d.WorkerLabel - LabelMin).
                      Select(g => new { label = g.Key, count = g.Count() })).
                      Select(arr =>
                      {
                          int max = arr.Max(a => a.count);
                          int[] majorityLabs = arr.Where(a => a.count == max).Select(a => a.label).ToArray();
                          if (majorityLabs.Length == 1)
                              return (int?)majorityLabs[0];
                          else
                          {
                              return null;
                          }
                      }).ToArray();
            }
    
            /// <summary>
            /// For each task, gets the empirical label distribution.
            /// </summary>
            /// <returns>The array of empirical label distributions, indexed by task.</returns>
            public Discrete[] GetVoteDistribPerTaskIndex()
            {
                return Data.GroupBy(d => TaskIdToIndex[d.TaskId]).
                  OrderBy(g => g.Key).
                  Select(t => t.GroupBy(d => d.WorkerLabel - LabelMin).
                      Select(g => new
                      {
                          label = g.Key,
                          count = g.Count()
                      })).
                      Select(arr =>
                      {
                          Vector v = Vector.Zero(LabelCount);
                          foreach (var a in arr)
                              v[a.label] = (double)a.count;
                          return new Discrete(v);
                      }).ToArray();
            }
            #endregion
        }
    I hope you can tell me why I am getting the improper distribution exception.

    Thanks


    • Edited by cindyak Tuesday, March 31, 2015 1:47 PM
    Tuesday, March 31, 2015 1:45 PM
  • Precision random variables are always tricky to learn. I was able to run with a much tighter prior: 

    discriminationPrior = Gamma.FromShapeAndScale(1, 0.002);

    but I'm not sure it gives you much benefit. It is possible that you could do some damping, but I don't have time to look into this right now. You could also just dispense with discrimination for now and set it as a point mass.
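
    For concreteness, the point-mass option amounts to replacing the learned discrimination with a fixed precision inside the per-task loop, something like this (the value 1.0 is only an example):

     // sketch only: fix the discrimination at a point mass (1.0 here is just an example value)
     // instead of drawing it from a Gamma prior, so there is no precision variable to learn
     var advantageNoisy = Variable.GaussianFromMeanAndPrecision(advantage, 1.0).Named("advantageNoisy");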

    By the way, you don't need all your Initialise statements.

    John

    Tuesday, March 31, 2015 4:41 PM
    Owner
  • Hi John Guiver,

    Thanks a lot, that was very helpful. It seems to be working after changing the discrimination prior scale to 0.002.

     

    But I noticed that different datasets require different discrimination prior scales and numbers of iterations to run seamlessly without throwing the improper distribution exception.

     

    For example,

    1. Dataset "D" produces the best result when the discrimination prior scale = 0.01 and the number of iterations is 7 to 17
    2. Dataset "CF" produces the best result when the discrimination prior scale = 0.002 and the number of iterations = 35

     

    Now, is there any methodical way to learn how to select these numbers?

    Wednesday, April 1, 2015 2:44 AM