Struggling to define ball count model

  • General discussion

  • Hi,

    I am teaching myself by working through simple examples. Currently I am trying to build the Machine Learning Summer School 2009 example: infer the probability distribution over the number of colored balls in an urn. The number of balls is unknown. It is known that there are two colors and that each color is equally likely (50%). The observed data are the ball colors seen over a fixed number of draws (each ball is replaced after it is drawn).

    The materials are here: https://tminka.github.io/papers/mlss2009/

    The generative story seems to be pretty clear to me:

    (1) Select the number of balls (N ~ DiscreteUniform)

    (2) Select the color of each ball (isBlue ~ Bernoulli(0.5), one per ball)

    (3) For each drawn ball

     (a) Draw the ball index (DiscreteUniform?)

     (b) Observe the ball color by using that index
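
    The generative story above can be sketched as a plain forward sampler, with no inference library involved. This is only an illustration under stated assumptions: the class name UrnSampler, the method Sample, and the choice to exclude an empty urn (N is drawn from 1..maxBalls) are not from the worksheet.

```csharp
using System;

class UrnSampler
{
    // Runs the generative story once and returns the observed colors.
    // Hypothetical helper, not part of the worksheet.
    public static bool[] Sample(Random rng, int maxBalls, int numDraws, out int numBalls)
    {
        // (1) Number of balls, uniform on 1..maxBalls.
        // Assumption: an empty urn is excluded so that a draw is always possible.
        numBalls = rng.Next(1, maxBalls + 1);

        // (2) Each ball is independently blue with probability 0.5.
        bool[] isBlue = new bool[numBalls];
        for (int b = 0; b < numBalls; b++)
        {
            isBlue[b] = rng.NextDouble() < 0.5;
        }

        // (3) Draw with replacement.
        bool[] observedBlue = new bool[numDraws];
        for (int d = 0; d < numDraws; d++)
        {
            int ballIndex = rng.Next(numBalls);   // (a) uniform on 0..numBalls-1
            observedBlue[d] = isBlue[ballIndex];  // (b) color of the chosen ball
        }
        return observedBlue;
    }

    static void Main()
    {
        Random rng = new Random();
        int numBalls;
        bool[] draws = Sample(rng, 8, 10, out numBalls);

        Console.WriteLine("numBalls = " + numBalls);
        foreach (bool blue in draws)
        {
            Console.Write(blue ? "blue " : "green ");
        }
        Console.WriteLine();
    }
}
```

    Inference runs this story in reverse: given only the printed colors, recover a distribution over numBalls.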

    I followed the worksheet provided in the materials, but after many hours I am stuck. Here is my current (not working) code:

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using MicrosoftResearch.Infer.Models;
    using MicrosoftResearch.Infer;
    using MicrosoftResearch.Infer.Distributions;
    using MicrosoftResearch.Infer.Maths;
    
    namespace Counting
    {
    	class Program
    	{
    		static void Main(string[] args)
    		{
    			bool[] data = { true, true, true, true, true, true, true, true, true, true };
    
    			// The probabilistic program
    			// -------------------------
    
    			// Variables describing the population
    			int maxBalls = 8;
    			Range ball = new Range(maxBalls+1); // so that numBalls = (0,...,maxBalls)
    			Variable<int> numBalls = Variable.DiscreteUniform(ball);
    			VariableArray<bool> isBlue = Variable.Array<bool>(ball);
    			// ...add code here...
    			isBlue[ball] = Variable.Bernoulli(0.5).ForEach(ball);
    
    			// Variables describing the observations
    			Range draw = new Range(data.Length);
    			VariableArray<bool> observedBlue = Variable.Array<bool>(draw);
    			VariableArray<int> ballIndex = Variable.Array<int>(draw);
    			VariableArray<bool> dataArray = Variable.Observed<bool>(data, draw);
    			using (Variable.ForEach(draw)) {
    				// ...add code here...
    				ballIndex[draw] = Variable.DiscreteUniform(numBalls);
    				observedBlue[draw] = dataArray[ballIndex[draw]];
    			}
    
    			// Inference queries about the program
    			// -----------------------------------
    			InferenceEngine engine = new InferenceEngine();
    			// ...add code here...
    			Discrete numberOfBalls = engine.Infer<Discrete>(numBalls);
    
    			Console.WriteLine("Distribution over number of balls: " + numberOfBalls);
    			Console.WriteLine("Press any key...");
    			Console.ReadKey();
    
    			// Answer key
    			// ----------
    			// 10 blue, without noise:
    			// numBalls = Discrete(0 0.5079 0.3097 0.09646 0.03907 0.02015 0.01225 0.008336 0.006133)
    			// 10 blue, with 20% noise:
    			// numBalls = Discrete(0 0.463 0.2354 0.1137 0.06589 0.04392 0.0322 0.02521 0.02068)
    			// 10 blue, with 50% noise:
    			// numBalls = Discrete(0 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125)
    			// 5 blue/5 green, with 20% noise:
    			// numBalls = Discrete(0 0.08198 0.09729 0.1102 0.1217 0.1324 0.1425 0.1523 0.1617)
    		}
    	}
    }
    

    Monday, April 3, 2017 8:50 AM

All replies

  • The problem is that you are using the observed values in dataArray to define the generative process itself.  You need to first define the process, then attach the observed values to the observedBlue variable.
    Thursday, April 20, 2017 3:00 PM
    Owner
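
    A minimal sketch of that advice for the noiseless case, assuming the same Infer.NET namespaces and calls that appear in the posts in this thread: observedBlue is defined purely in terms of isBlue and ballIndex, and the data is attached only afterwards through ObservedValue. The class name NoiselessBallCount is made up for illustration.

```csharp
using System;
using MicrosoftResearch.Infer;
using MicrosoftResearch.Infer.Distributions;
using MicrosoftResearch.Infer.Models;

class NoiselessBallCount
{
    static void Main()
    {
        bool[] data = { true, true, true, true, true, true, true, true, true, true };

        // Population: unknown number of balls, each blue with probability 0.5.
        int maxBalls = 8;
        Range ball = new Range(maxBalls + 1); // numBalls ranges over 0..maxBalls
        Variable<int> numBalls = Variable.DiscreteUniform(ball);
        VariableArray<bool> isBlue = Variable.Array<bool>(ball);
        isBlue[ball] = Variable.Bernoulli(0.5).ForEach(ball);

        // Generative process for the draws. No data is referenced here.
        Range draw = new Range(data.Length);
        VariableArray<bool> observedBlue = Variable.Array<bool>(draw);
        VariableArray<int> ballIndex = Variable.Array<int>(draw);
        using (Variable.ForEach(draw))
        {
            ballIndex[draw] = Variable.DiscreteUniform(numBalls);
            // Switch is needed because the array index is itself random.
            using (Variable.Switch(ballIndex[draw]))
            {
                observedBlue[draw] = isBlue[ballIndex[draw]];
            }
        }

        // Only now attach the observations to the variable they were drawn from.
        observedBlue.ObservedValue = data;

        InferenceEngine engine = new InferenceEngine();
        Discrete numberOfBalls = engine.Infer<Discrete>(numBalls);
        Console.WriteLine("Distribution over number of balls: " + numberOfBalls);
    }
}
```

    The printed distribution can be compared with the "10 blue, without noise" line of the answer key quoted in the question.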
  • Thank you, Tom! I managed to come up with this code that appears to be working.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using MicrosoftResearch.Infer.Models;
    using MicrosoftResearch.Infer;
    using MicrosoftResearch.Infer.Distributions;
    using MicrosoftResearch.Infer.Maths;
    
    namespace Counting
    {
    	class Program
    	{
    		static void Main(string[] args)
    		{
    			bool[] data = { true, true, true, true, true, true, true, true, true, true };
    			//bool[] data = { true, true, true, true, true, false, false, false, false, false };
    
    			// The probabilistic program
    			// -------------------------
    
    			// Variables describing the population
    			int maxBalls = 8;
    			Range ball = new Range(maxBalls+1); // so that numBalls = (0,...,maxBalls)
    			Variable<int> numBalls = Variable.DiscreteUniform(ball);
    			VariableArray<bool> isBlue = Variable.Array<bool>(ball);
    			Variable<bool> switchedColor = Variable.Bernoulli(0.2);
    			isBlue[ball] = Variable.Bernoulli(0.5).ForEach(ball);
    
    			// Variables describing the observations
    			Range draw = new Range(data.Length);
    			VariableArray<bool> observedBlue = Variable.Array<bool>(draw);
    			VariableArray<int> ballIndex = Variable.Array<int>(draw);
    			using (Variable.ForEach(draw))
    			{
    				ballIndex[draw] = Variable.DiscreteUniform(numBalls);
    				using (Variable.Switch(ballIndex[draw])) {
    					using (Variable.If(switchedColor))
    					{
    						observedBlue[draw] = !isBlue[ballIndex[draw]];
    					}
    					using (Variable.IfNot(switchedColor))
    					{
    						observedBlue[draw] = isBlue[ballIndex[draw]];
    					}
    				}
    			}
    
    			observedBlue.ObservedValue = data;
    
    			// Inference queries about the program
    			// -----------------------------------
    			InferenceEngine engine = new InferenceEngine();
    			// ...add code here...
    			Discrete numberOfBalls = engine.Infer<Discrete>(numBalls);
    
    			Console.WriteLine("Distribution over number of balls: " + numberOfBalls);
    			Console.WriteLine("Press any key...");
    			Console.ReadKey();
    
    			// Answer key
    			// ----------
    			// 10 blue, without noise:
    			// numBalls = Discrete(0 0.5079 0.3097 0.09646 0.03907 0.02015 0.01225 0.008336 0.006133)
    			// 10 blue, with 20% noise:
    			// numBalls = Discrete(0 0.463 0.2354 0.1137 0.06589 0.04392 0.0322 0.02521 0.02068)
    			// 10 blue, with 50% noise:
    			// numBalls = Discrete(0 0.125 0.125 0.125 0.125 0.125 0.125 0.125 0.125)
    			// 5 blue/5 green, with 20% noise:
    			// numBalls = Discrete(0 0.08198 0.09729 0.1102 0.1217 0.1324 0.1425 0.1523 0.1617)
    		}
    	}
    }
    


    In this code I attempted the final assignment: making the model account for noisy observations (the Variable.If() cases). I am a bit stuck here, as I don't know where the mistake is. Maybe this is a topic for another question...

    Saturday, April 22, 2017 5:04 AM
  • You are using a single switchedColor variable for all balls.
    Sunday, April 23, 2017 2:43 PM
    Owner
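
    One way to give every draw its own noise flip is to declare switchedColor inside the ForEach block, so that it is replicated per draw instead of shared. The sketch below assumes the same Infer.NET calls used in the post above; the class name NoisyBallCount and the local variable noise are made up for illustration.

```csharp
using System;
using MicrosoftResearch.Infer;
using MicrosoftResearch.Infer.Distributions;
using MicrosoftResearch.Infer.Models;

class NoisyBallCount
{
    static void Main()
    {
        bool[] data = { true, true, true, true, true, true, true, true, true, true };
        double noise = 0.2; // probability that an observed color is flipped

        int maxBalls = 8;
        Range ball = new Range(maxBalls + 1); // numBalls ranges over 0..maxBalls
        Variable<int> numBalls = Variable.DiscreteUniform(ball);
        VariableArray<bool> isBlue = Variable.Array<bool>(ball);
        isBlue[ball] = Variable.Bernoulli(0.5).ForEach(ball);

        Range draw = new Range(data.Length);
        VariableArray<bool> observedBlue = Variable.Array<bool>(draw);
        VariableArray<int> ballIndex = Variable.Array<int>(draw);
        using (Variable.ForEach(draw))
        {
            ballIndex[draw] = Variable.DiscreteUniform(numBalls);
            // Declared inside the loop: each draw gets an independent flip.
            Variable<bool> switchedColor = Variable.Bernoulli(noise);
            using (Variable.Switch(ballIndex[draw]))
            {
                using (Variable.If(switchedColor))
                {
                    observedBlue[draw] = !isBlue[ballIndex[draw]];
                }
                using (Variable.IfNot(switchedColor))
                {
                    observedBlue[draw] = isBlue[ballIndex[draw]];
                }
            }
        }

        observedBlue.ObservedValue = data;

        InferenceEngine engine = new InferenceEngine();
        Discrete numberOfBalls = engine.Infer<Discrete>(numBalls);
        Console.WriteLine("Distribution over number of balls: " + numberOfBalls);
    }
}
```

    With a single shared switchedColor, either all ten observations are flipped or none are, which is a different model from independent per-draw noise.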
  • You are right, that was the problem! Is it more efficient to create a single random variable switchedColor inside the first "using" block, or to create an array?
    Tuesday, April 25, 2017 6:01 AM
  • Doesn't make a difference here.  Both approaches should generate the same inference code.
    Tuesday, April 25, 2017 8:29 AM
    Owner
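
    For comparison, the array form of the same noisy model might look like the sketch below, again assuming the Infer.NET calls already used in this thread. The wrapper method InferNumBalls and the class name are hypothetical; the only intended difference from a per-draw local variable is that switchedColor is an explicit VariableArray over the draw range.

```csharp
using System;
using MicrosoftResearch.Infer;
using MicrosoftResearch.Infer.Distributions;
using MicrosoftResearch.Infer.Models;

class NoisyBallCountArray
{
    // Hypothetical wrapper: builds the model and returns the posterior over numBalls.
    static Discrete InferNumBalls(bool[] data, double noise, int maxBalls)
    {
        Range ball = new Range(maxBalls + 1); // numBalls ranges over 0..maxBalls
        Variable<int> numBalls = Variable.DiscreteUniform(ball);
        VariableArray<bool> isBlue = Variable.Array<bool>(ball);
        isBlue[ball] = Variable.Bernoulli(0.5).ForEach(ball);

        Range draw = new Range(data.Length);
        VariableArray<bool> observedBlue = Variable.Array<bool>(draw);
        VariableArray<int> ballIndex = Variable.Array<int>(draw);

        // One independent noise flip per draw, stored in an array.
        VariableArray<bool> switchedColor = Variable.Array<bool>(draw);
        switchedColor[draw] = Variable.Bernoulli(noise).ForEach(draw);

        using (Variable.ForEach(draw))
        {
            ballIndex[draw] = Variable.DiscreteUniform(numBalls);
            using (Variable.Switch(ballIndex[draw]))
            {
                using (Variable.If(switchedColor[draw]))
                {
                    observedBlue[draw] = !isBlue[ballIndex[draw]];
                }
                using (Variable.IfNot(switchedColor[draw]))
                {
                    observedBlue[draw] = isBlue[ballIndex[draw]];
                }
            }
        }

        observedBlue.ObservedValue = data;

        InferenceEngine engine = new InferenceEngine();
        return engine.Infer<Discrete>(numBalls);
    }

    static void Main()
    {
        bool[] data = { true, true, true, true, true, true, true, true, true, true };
        Console.WriteLine("Distribution over number of balls: " + InferNumBalls(data, 0.2, 8));
    }
}
```

    The array form is mainly useful when the per-draw flips need to be referenced outside the loop, for example to infer which observations were noisy.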