Initialisation and shared variables with a compiled algorithm

  • Question

    Dear Infer.NET Team,

    First of all, thank you for making such a wonderful tool.  I was hoping you could answer two questions for me, since I cannot figure them out from the supplied documentation.
    1) Precisely what does InferShared() do?  I expect it sets the next chunk's priors to this chunk's posteriors. The documentation also says something about setting messages. Does this mean it reinitialises the shared variables, and if so, to what?  Does it do anything else?  I ask because I am trying to emulate its function with a CompiledAlgorithm.

    2) I am trying to perform a matrix factorization over a rating matrix (size numUsers by numItems), similar to the Netflix algorithms.  I split the ratings data into 2000-user chunks, so my chunks have disjoint user variables but shared item variables.  At the beginning of each chunk I set the observed values of the user variables' priors to their inferred marginals from the previous pass over the data.  I would also like to initialise the user variables to their previously inferred marginals, so I do

    userVariables.InitialiseTo(Distribution<double>.Array(userVariablesPrior.ObservedValue)); // userVariables and userVariablesPrior have range rChunkUsers of size numUsersInChunk
    

    I then compile my algorithm, and run
    compiledAlgorithm.SetObservedValue(numUsersInChunk.NameInGeneratedCode, numUsersInChunk);
    
    compiledAlgorithm.SetObservedValue(userBiasesPrior.NameInGeneratedCode, userBiasesMarginalSavedFromPreviousPass);
    //userBiasesMarginalSavedFromPreviousPass has size equal to the number of users in this chunk

    Now whenever I do compiledAlgorithm.Execute(1) or compiledAlgorithm.Reset() I get an IndexOutOfRange exception in Model_EP.cs. I think this is a general problem with initialising to an observed variable array whose size is not constant. Is there any way around this?

     

     


    • Edited by Peter Forbes Friday, December 9, 2011 10:32 AM Reformatted code
    Friday, December 9, 2011 10:31 AM

Answers

  • Hi Peter

    We have noted the request for IGeneratedAlgorithm to get its own InferShared convenience method, but no promises.

    You can make the initialiser a variable, as follows. I have just set it to uniform in the code below; you should change it to use the priors as needed.

    John

    //Model
    Variable<int> numUsersInCurrentChunk = Variable.New<int>().Named("NumUsers");
    Range rUsersInCurrentChunk = new Range(numUsersInCurrentChunk).Named("N");
    VariableArray<Gaussian> userValuesPrior = Variable.Array<Gaussian>(rUsersInCurrentChunk).Named("Prior");
    VariableArray<double> userValues = Variable.Array<double>(rUsersInCurrentChunk).Named("UserVals");
    userValues[rUsersInCurrentChunk] = Variable.Random<double, Gaussian>(userValuesPrior[rUsersInCurrentChunk]);
    VariableArray<double> dataToFit = Variable.Array<double>(rUsersInCurrentChunk).Named("Users");
    dataToFit[rUsersInCurrentChunk] = Variable.GaussianFromMeanAndVariance(userValues[rUsersInCurrentChunk], 1);
    Variable<IDistribution<double[]>> initialiser = Variable.New<IDistribution<double[]>>().Named("Init");
    userValues.InitialiseTo(initialiser);

    //Dummy observed values
    numUsersInCurrentChunk.ObservedValue = 0;
    userValuesPrior.ObservedValue = new Gaussian[0];
    dataToFit.ObservedValue = new double[0];
    initialiser.ObservedValue = Distribution<double>.Array(new Gaussian[0]);

    InferenceEngine ie = new InferenceEngine();
    var compiledAlgorithm = ie.GetCompiledInferenceAlgorithm(userValues);

    for (int i = 0; i < numChunks; i++)
    {
        compiledAlgorithm.SetObservedValue(numUsersInCurrentChunk.NameInGeneratedCode, numUsersPerChunk[i]);
        compiledAlgorithm.SetObservedValue(userValuesPrior.NameInGeneratedCode, userPriorsByChunk[i]);
        compiledAlgorithm.SetObservedValue(dataToFit.NameInGeneratedCode, dataToFitByChunk[i]);
        // TODO: set from previous iteration for this chunk
        var initialiserObserved = Distribution<double>.Array(Enumerable.Repeat(Gaussian.Uniform(), numUsersPerChunk[i]).ToArray());
        compiledAlgorithm.SetObservedValue(initialiser.NameInGeneratedCode, initialiserObserved);
        compiledAlgorithm.Execute(1);
    }

    Tuesday, December 13, 2011 9:50 AM
    Owner

All replies

  • Hi Peter

    For the straightforward case where the shared variable is defined in terms of an observable prior, Model.InferShared

    1. sets the prior's observed value to the product of the true prior and all output messages except the one from the current batch and model. This is equivalent to taking the ratio of the marginal and the output message from the current batch and model, which is more efficient.
    2. runs inference on the model and retrieves the output messages for the current model and batch. Assuming you are doing the efficient ratio version, for EP this just calls GetOutput; for VMP you need to get the marginal and divide by the input message from step 1.
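
    The two steps above can be written out explicitly. This is only a sketch under assumed notation that does not appear in the reply: \theta is the shared variable, p_0 its true prior, m_b the output message from batch b (a single model is assumed, so messages are indexed by batch only), q the current marginal, and q_b^new the marginal inferred on batch b. The Gaussian line uses the standard natural parameters, precision \tau and precision-times-mean \eta.

    ```latex
    \begin{align*}
    q(\theta) &\propto p_0(\theta) \prod_{b'=1}^{B} m_{b'}(\theta)
      && \text{(current marginal)} \\
    p_b(\theta) &\propto p_0(\theta) \prod_{b' \neq b} m_{b'}(\theta)
      \;\propto\; \frac{q(\theta)}{m_b(\theta)}
      && \text{(step 1: prior passed to batch } b\text{)} \\
    m_b^{\text{new}}(\theta) &\propto \frac{q_b^{\text{new}}(\theta)}{p_b(\theta)}
      && \text{(step 2 for VMP: marginal divided by the input message)}
    \end{align*}
    % Gaussian case: dividing densities subtracts natural parameters
    \[
    \tau_{p_b} = \tau_q - \tau_{m_b}, \qquad \eta_{p_b} = \eta_q - \eta_{m_b}
    \]
    ```

    As a small numeric check: if q has mean 1 and variance 0.5 (\tau = 2, \eta = 2) and m_b has \tau = 0.5, \eta = 1, then the prior passed to batch b has \tau = 1.5, \eta = 1, that is, mean 2/3 and variance 2/3.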

    For your second question, it would really help if you could post your model code stripped down with toy data. Would that be possible?

    John

    Friday, December 9, 2011 2:22 PM
    Owner
  • Thanks John.  Your first answer is detailed enough that I should be able to use chunked data with a compiled algorithm object.  Do you know whether IGeneratedAlgorithm will get its own InferShared convenience method in a future version?  If not, please add it to the suggestion list!

    Here is the minimal code for my second example.  I think my problem is with InitialiseTo().  Similar to this thread (http://social.microsoft.com/Forums/en-US/infer.net/thread/ebbc3ae6-738f-4dc7-a998-b92082879612), I found that initialising seemed to have no effect unless it was done before compiling the algorithm.  I would like to reinitialize on each iteration *after* the prior has been updated to its new observed value.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using MicrosoftResearch.Infer;
    using MicrosoftResearch.Infer.Distributions;
    using MicrosoftResearch.Infer.Models;
    
    namespace SimpleTestCase
    {
        class Program
        {
            static void Main(string[] args)
            {
                //Simple fake data for model
                int numChunks = 2;
                int[] numUsersPerChunk = new int[] { 1, 2 };
                Gaussian[][] userPriorsByChunk = new Gaussian[numChunks][];
                double[][] dataToFitByChunk = new double[numChunks][];
                for(int i=0; i<numChunks; i++){
                    userPriorsByChunk[i] = new Gaussian[numUsersPerChunk[i]];
                    dataToFitByChunk[i] = new double[numUsersPerChunk[i]];
                    for(int j=0; j<numUsersPerChunk[i]; j++){                    
                        userPriorsByChunk[i][j] = Gaussian.FromMeanAndVariance(0,1);
                        dataToFitByChunk[i][j] = 1;
                    }
                }
    
                //Model
                Variable<int> numUsersInCurrentChunk = Variable.New<int>();
                Range rUsersInCurrentChunk = new Range(numUsersInCurrentChunk);
                VariableArray<Gaussian> userValuesPrior = Variable.Array<Gaussian>(rUsersInCurrentChunk);
                VariableArray<double> userValues = Variable.Array<double>(rUsersInCurrentChunk);
                userValues[rUsersInCurrentChunk] = Variable.Random<double, Gaussian>(userValuesPrior[rUsersInCurrentChunk]);
                VariableArray<double> dataToFit = Variable.Array<double>(rUsersInCurrentChunk);
                dataToFit[rUsersInCurrentChunk] = Variable.GaussianFromMeanAndVariance( userValues[rUsersInCurrentChunk], 1);
    
                //Dummy observed values
                numUsersInCurrentChunk.ObservedValue = 0;
                userValuesPrior.ObservedValue = new Gaussian[0];
                dataToFit.ObservedValue = new double[0];
    
                //Set initializer here so it can be compiled in.
                userValues.InitialiseTo(Distribution<double>.Array(userValuesPrior.ObservedValue));
    
                InferenceEngine ie = new InferenceEngine();
                var compiledAlgorithm = ie.GetCompiledInferenceAlgorithm(userValues);
    
                for (int i = 0; i < numChunks; i++)
                {
                    compiledAlgorithm.SetObservedValue(numUsersInCurrentChunk.NameInGeneratedCode, numUsersPerChunk[i]);
                    compiledAlgorithm.SetObservedValue(userValuesPrior.NameInGeneratedCode, userPriorsByChunk[i]);
                    compiledAlgorithm.SetObservedValue(dataToFit.NameInGeneratedCode, dataToFitByChunk[i]);
                    compiledAlgorithm.Execute(1);
                }
            }
        }
    }
    
    


    Friday, December 9, 2011 5:04 PM
  • Thank you very much.  I see that you made "initialiser" a Variable which you then set to the observed value of userValuesPrior, rather than setting it to the value of userValuesPrior.ObservedValue directly. I never thought of this, but it makes perfect sense. Thanks again!
    Tuesday, December 13, 2011 9:56 AM
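
One possible way to fill in the TODO from the accepted answer is to keep each chunk's marginals from the previous pass and feed them back through the observed initialiser. The following is a minimal sketch, not a tested implementation. It assumes the Infer.NET release used in this thread (the `MicrosoftResearch.Infer` namespaces) and that `Marginal<Gaussian[]>` can convert the array marginal to `Gaussian[]`. The names `numPasses` and `savedMarginalsByChunk` are hypothetical and introduced only for illustration.

```csharp
using System;
using System.Linq;
using MicrosoftResearch.Infer;
using MicrosoftResearch.Infer.Distributions;
using MicrosoftResearch.Infer.Models;

class ChunkedInitialisationSketch
{
    static void Main()
    {
        // Toy data, mirroring the test case posted in the thread.
        int numChunks = 2;
        int numPasses = 3; // hypothetical number of sweeps over all chunks
        int[] numUsersPerChunk = { 1, 2 };
        Gaussian[][] userPriorsByChunk = new Gaussian[numChunks][];
        double[][] dataToFitByChunk = new double[numChunks][];
        for (int i = 0; i < numChunks; i++)
        {
            userPriorsByChunk[i] = Enumerable.Repeat(Gaussian.FromMeanAndVariance(0, 1), numUsersPerChunk[i]).ToArray();
            dataToFitByChunk[i] = Enumerable.Repeat(1.0, numUsersPerChunk[i]).ToArray();
        }

        // Model from the accepted answer: the initialiser is itself an observed variable.
        Variable<int> numUsers = Variable.New<int>().Named("NumUsers");
        Range n = new Range(numUsers).Named("N");
        VariableArray<Gaussian> prior = Variable.Array<Gaussian>(n).Named("Prior");
        VariableArray<double> userValues = Variable.Array<double>(n).Named("UserVals");
        userValues[n] = Variable.Random<double, Gaussian>(prior[n]);
        VariableArray<double> dataToFit = Variable.Array<double>(n).Named("Users");
        dataToFit[n] = Variable.GaussianFromMeanAndVariance(userValues[n], 1);
        Variable<IDistribution<double[]>> initialiser = Variable.New<IDistribution<double[]>>().Named("Init");
        userValues.InitialiseTo(initialiser);

        // Dummy observed values so the model can be compiled once.
        numUsers.ObservedValue = 0;
        prior.ObservedValue = new Gaussian[0];
        dataToFit.ObservedValue = new double[0];
        initialiser.ObservedValue = Distribution<double>.Array(new Gaussian[0]);

        var compiledAlgorithm = new InferenceEngine().GetCompiledInferenceAlgorithm(userValues);

        // Hypothetical store: marginals of each chunk from the previous pass (null before the first pass).
        Gaussian[][] savedMarginalsByChunk = new Gaussian[numChunks][];

        for (int pass = 0; pass < numPasses; pass++)
        {
            for (int i = 0; i < numChunks; i++)
            {
                compiledAlgorithm.SetObservedValue(numUsers.NameInGeneratedCode, numUsersPerChunk[i]);
                compiledAlgorithm.SetObservedValue(prior.NameInGeneratedCode, userPriorsByChunk[i]);
                compiledAlgorithm.SetObservedValue(dataToFit.NameInGeneratedCode, dataToFitByChunk[i]);

                // Use last pass's marginals if available, otherwise fall back to uniform.
                Gaussian[] init = savedMarginalsByChunk[i]
                    ?? Enumerable.Repeat(Gaussian.Uniform(), numUsersPerChunk[i]).ToArray();
                compiledAlgorithm.SetObservedValue(initialiser.NameInGeneratedCode, Distribution<double>.Array(init));

                compiledAlgorithm.Execute(1);

                // Assumption: the array marginal converts to Gaussian[] via Marginal<T>.
                savedMarginalsByChunk[i] = compiledAlgorithm.Marginal<Gaussian[]>(userValues.NameInGeneratedCode);
            }
        }
    }
}
```

Because the initialiser array is rebuilt with the current chunk's length on every iteration, its size always matches the observed range, which is the mismatch that the original compile-time `InitialiseTo` call could not handle.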