locked
Trueskill implementation help (Migrated from community.research.microsoft.com) RRS feed

  • Question

  • TimSal posted on 02-19-2010 8:36 AM

    First of all: amazing framework you guys are building!

    I have been trying out Infer.NET by implementing a version of the Trueskill model, using IronPython. Although the implementation works fine for small data sets, it runs out of memory when I increase the amount of data. (it still fits into memory easily in, for example, MATLAB). This is most likely due to the inefficient way in which I am specifying the priors on the skill levels of the players: Because I allow the skills to change over time, I am assigning these priors using the following (simplified) code

     

    # players = chronological list of players

    skills=list() # empty list

    last_game={} # empty dictionary

    for i in range(len(players)):

         # check whether this player has played a game before and specify prior

         if player[i] in last_game:

              game_number=last_game(player[i])

              skills.append(Variable.GaussianFromMeanAndPrecision(skills[game_number],1))

         else:

               skills.append(Variable.GaussianFromMeanAndPrecision(0,1))

         # enter/update player in last_game dictionary

         last_game[player[i]]=i

     

    How can I do this more efficiently? I have tried using various combinations of  "Variable.Array", "ForEach" and "with ... as forBlock" but with no success.

    Friday, June 3, 2011 6:24 PM

Answers

  • jimselkirk replied on 03-01-2011 12:34 PM

    Thanks John / Tom, exactly what I was after. Will post back after my implementation. Jim.

    Friday, June 3, 2011 6:25 PM

All replies

  • John Guiver replied on 02-22-2010 8:45 AM

    Hi Tim

    The following shows how you would scale up for a set of two-player contests. Extensions to more players are straightforward. Keep the dictionary of player skills outside the model. Build the model so that everything is a variable, and therefore the model does not get recompiled.

    I'm afraid I have done this in C# - I will leave you to port to IronPython.

    John

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using MicrosoftResearch.Infer.Models;
    using MicrosoftResearch.Infer.Distributions;
    using MicrosoftResearch.Infer;
    namespace TimSal1_TrueSkill
    {
      class TwoPersonTrueSkillModel
     
    {
        // In order of result:
        Variable<Gaussian>[] SkillPrior = new Variable<Gaussian>[2];
        Variable<double>[] Skill = new Variable<double>[2];
        Variable<double>[] Performance = new Variable<double>[2];
        InferenceEngine engine = new InferenceEngine();
       
        public
    TwoPersonTrueSkillModel(double beta)
        {
          for (int i = 0; i < 2; i++)
            SkillPrior[i] = Variable.New<Gaussian>();
          // Skills
          for (int i = 0; i < 2; i++)
            Skill[i] = Variable.Random<double, Gaussian>(SkillPrior[i]);
          // Performances
          for (int i=0; i < 2; i++)
            Performance[i] = Variable.GaussianFromMeanAndPrecision(Skill[i], beta);
          // Observed result
          Variable
    .ConstrainPositive(Performance[0] - Performance[1]);
        }
        public Gaussian[] Infer(Gaussian[] priors)
        {
          for (int i = 0; i < 2; i++)
            SkillPrior[i].ObservedValue = priors[i];

          Gaussian[] skillPosterior = new Gaussian[2];
          for (int i = 0; i < 2; i++)
            skillPosterior[i] = engine.Infer<Gaussian>(Skill[i]);
          return skillPosterior;
        }
      }
      class Program
     
    {
        static void Main(string[] args)
        {
          var model = new TwoPersonTrueSkillModel(1.0);
          var PlayerSkills = new Dictionary<int,Gaussian>();
          var currentGameSkills = new Gaussian[2];
          // Game results
          int[][] games = new int[][] {
          new int[] {0, 3, 0, 2, 2, 0}, // winners
          new int[] {1, 2, 2, 1, 3, 3}}; // losers
          // Set up priors
          for (int g = 0; g < games[0].Length; g++)
          {
             for (int i=0; i < 2; i++)
            {
              if (PlayerSkills.ContainsKey(games[i][g]))
                currentGameSkills[i] = PlayerSkills[games[i][g]];
              else
                currentGameSkills[i] = Gaussian.FromMeanAndPrecision(0.0, 1.0);
            }
            // Get update skills
            Gaussian[] updatedSkills = model.Infer(currentGameSkills);
            for (int i=0; i < 2; i++)
              PlayerSkills[games[i][g]] = updatedSkills[i];
          }
        }
      }
    }

    Friday, June 3, 2011 6:24 PM
  • TimSal replied on 02-23-2010 4:30 AM

    Thanks a lot! The code is a great help, although it's not exactly what I am trying to do: As it stands now, this code only allows you to do one forward pass over the data. (as the old skill levels are discarded) The resulting inference can be improved by saving all the intermediate skill levels and doing additional backward and forward passes (which is not according to the original description of TrueSkill, so perhaps I should have been more elaborate). I was able to code this myself, saving all skill levels, but throwing away all performance variables (and a number of other intermediate variables I use in my version of the model). My implementation in Infer.NET ran out of memory, but perhaps this is because all of these intermediate variables are retained, rather than because of my specification of the skill priors. I will try extending your code with a backward loop and persistent skill variables to see whether that works.

    Last question: How much overhead can I expect in storing the random variables? What I mean is: I can represent a Gaussian variable by two double values, but Infer.NET probably needs some extra data identifying the variable as a Gaussian, how much memory will this take up?

    Thanks again!

    Friday, June 3, 2011 6:24 PM
  • minka replied on 02-24-2010 7:17 AM

    I don't think the issue is in representing Gaussians.  Because Gaussian is a struct type in .NET, it should only require two doubles for storage, with no type tag required. A type tag is only required if the object is boxed. 

    We give some overall guidelines about memory handling in the FAQ: http://research.microsoft.com/en-us/um/cambridge/projects/infernet/docs/Frequently%20Asked%20Questions.aspx

    If you want a precise idea of how much memory is required by Infer.NET on your model, you can look at the generated code.  Set the following flags on the inference engine:

     

     

     

    engine.Compiler.WriteSourceFiles = true;
    e
    ngine.ShowCompileProgress = true;

    (The second flag is so you can verify that the compiler succeeds in writing the files.)  The location and structure of the generated code is documented here:

    http://research.microsoft.com/en-us/um/cambridge/projects/infernet/docs/Structure%20of%20generated%20inference%20code.aspx

    This allows you to see all the memory allocations being done.

    Friday, June 3, 2011 6:24 PM
  • TimSal replied on 02-26-2010 8:41 AM

    OK, thanks. I'll try adapting John's code for my application then.

    Any ideas on how I can generalize the code to a variable number of players without having Infer.NET recompile? I currently just use 10 different versions of the same model (with different numbers of players) and then use the appropriate one for each match, but this seems kind of clumsy.

    Friday, June 3, 2011 6:24 PM
  • John Guiver replied on 02-26-2010 9:33 AM

    Just to add to Tom's reply earlier, it is not the random variables or their marginal distributions which eat up lots of memory, it is all the messages between variables and factors in both directions that consume the memory. Message-passing algorithms are inherently memory-hungry. Large factor graphs which cannot fit in memory need to be split into smaller chunks, scheduled separetely, and the messages between different chunks need to be handled explicitly. Shared variables provide a useful mechanism for doing this.

    As regards your specific question, here is a way to avoid multiple models and recompilation:

    class NPersonTrueSkillModel
    {
      // Observed
      Variable<int> NumPlayers = Variable.New<int>();
      Variable<int> NumPlayersMinus1 = Variable.New<int>();
      VariableArray<int> LeftIndices;
      VariableArray<int> RightIndices;
      VariableArray<Gaussian> SkillPrior;
      // Skills/performances in order of result:
      VariableArray<double> Skill;
      VariableArray<double> Performance;
      InferenceEngine engine = new InferenceEngine();
      public NPersonTrueSkillModel(double beta)
      {
        Range p = new Range(NumPlayers);
        Range p1 = new Range(NumPlayersMinus1);
        SkillPrior = Variable.Array<Gaussian>(p);
        Skill = Variable.Array<double>(p);
        Skill[p] = Variable.Random<double, Gaussian>(SkillPrior[p]);
        Performance = Variable.Array<double>(p);
        Performance[p] = Variable.GaussianFromMeanAndPrecision(Skill[p], beta);
        LeftIndices = Variable.Array<int>(p1);
        RightIndices = Variable.Array<int>(p1);
        var left = Variable.Subarray(Performance, LeftIndices);
        var right = Variable.Subarray(Performance, RightIndices);
        Variable.ConstrainPositive(left[p1] - right[p1]);
      }
      public Gaussian[] Infer(Gaussian[] priors)
      {
        SkillPrior.ObservedValue = priors;
        int n1 = priors.Length-1;
        int[] leftIndices = new int[n1];
        int[] rightIndices = new int[n1];
        for (int i = 0; i < n1; i++)
        {
          leftIndices[i] = i;
          rightIndices[i] = i+1;
        }
        LeftIndices.ObservedValue = leftIndices;
        RightIndices.ObservedValue = rightIndices;
        NumPlayers.ObservedValue = priors.Length;
        NumPlayersMinus1.ObservedValue = n1;
        return engine.Infer<Gaussian[]>(Skill);
      }
    }

    Friday, June 3, 2011 6:24 PM
  • TimSal replied on 02-26-2010 10:33 AM

    Great! Thanks a lot!

    Friday, June 3, 2011 6:24 PM
  • jimselkirk replied on 02-28-2011 3:26 PM

    Hello John,

    How would you expand the above example to work with teams? I am guessing altering NPersonTrueSkillModel to include an extra layer between skill and performance, but would very much appreciate a pointer.

    Jim

    Friday, June 3, 2011 6:25 PM
  • minka replied on 03-01-2011 4:49 AM

    You would need to define an index array membersOfTeam that gives the player numbers on each team.  Using these numbers, extract the performances for the members of each team, sum them, and constrain each higher team sum to be higher than the next team sum.  Here is an example:

    Variable<int> nTeams = Variable.New<int>();

    Range team = new Range(nTeams);

    VariableArray<int> sizeOfTeam = Variable.Array<int>(team);

    Range teamMember = new Range(sizeOfTeam[team]);

    var membersOfTeam = Variable.Array(Variable.Array<int>(teamMember), team);

    var TeamPerformance = Variable.Array<double>(team);

    TeamPerformance[team] = Variable.Sum(Variable.Subarray(Performance, membersOfTeam[team]));

    Range team1 = new Range(nTeams-1);

    LeftIndices = Variable.Array<int>(team1);

    RightIndices = Variable.Array<int>(team1);

    var left = Variable.Subarray(TeamPerformance, LeftIndices);

    var right = Variable.Subarray(TeamPerformance, RightIndices);

    Variable.ConstrainPositive(left[team1] - right[team1]);

     

     

     

    Friday, June 3, 2011 6:25 PM
  • John Guiver replied on 03-01-2011 5:23 AM

    Hi Jim

    Yes, you are correct, you can just add another layer. The easiest ways is just to think of what was player as now team, and to build the team skills from the player skills. The representation of players in the game is now in terms of jagged arrays of integer variables where the outer index corresponds to teams, and the inner to players on the team (and there may be different numbers of player on each team).  Then it is just a few extra lines, assuming you want team skills to be the sum of the individual skills. Here is what the class would look like - I have explicitly named Skills as Team or Player.  The Infer method is pretty much the same, but now it receives and returns 2-D arrays of Gaussians representing team/player on team, and also requires 'observing' the number of players on each team:

      class TrueSkillTeamModel

      {

        // Observed

        VariableArray<int> NumPlayersPerTeam;

        Variable<int> NumTeams = Variable.New<int>();

        Variable<int> NumTeamsMinus1 = Variable.New<int>();

        VariableArray<int> LeftIndices;

        VariableArray<int> RightIndices;

        VariableArray<VariableArray<Gaussian>, Gaussian[][]> SkillPrior;

     

        // Skills/performances in order of result:

        VariableArray<VariableArray<double>, double[][]> PlayerSkill;

        VariableArray<double> TeamSkill;

        VariableArray<double> Performance;

        InferenceEngine engine = new InferenceEngine();

     

        public TrueSkillTeamModel(double beta)

        {

          Range t = new Range(NumTeams);

          Range t1 = new Range(NumTeamsMinus1);

          NumPlayersPerTeam = Variable.Array<int>(t);

          Range p = new Range(NumPlayersPerTeam[t]);

          SkillPrior = Variable.Array(Variable.Array<Gaussian>(p), t);

          PlayerSkill = Variable.Array(Variable.Array<double>(p), t);

          PlayerSkill[t][p] = Variable.Random<double, Gaussian>(SkillPrior[t][p]);

          TeamSkill = Variable.Array<double>(t);

          TeamSkill[t] = Variable.Sum(PlayerSkill[t]);

          Performance = Variable.Array<double>(t);

          Performance[t] = Variable.GaussianFromMeanAndPrecision(TeamSkill[t], beta);

          LeftIndices = Variable.Array<int>(t1);

          RightIndices = Variable.Array<int>(t1);

          var left = Variable.Subarray(Performance, LeftIndices);

          var right = Variable.Subarray(Performance, RightIndices);

          Variable.ConstrainPositive(left[t1] - right[t1]);

        }

     

        public Gaussian[][] Infer(Gaussian[][] priors)

        {

          SkillPrior.ObservedValue = priors;

          int n1 = priors.Length-1;

          int[] leftIndices = new int[n1];

          int[] rightIndices = new int[n1];

          for (int i = 0; i < n1; i++)

          {

            leftIndices[i] = i;

            rightIndices[i] = i+1;

          }

          LeftIndices.ObservedValue = leftIndices;

          RightIndices.ObservedValue = rightIndices;

          NumTeams.ObservedValue = priors.Length;

          NumTeamsMinus1.ObservedValue = n1;

          NumPlayersPerTeam.ObservedValue = priors.Select(g => g.Length).ToArray();

          return engine.Infer<Gaussian[][]>(PlayerSkill);

        }

      }

    Here is a driver program that shows this in action for some simple made-up data:

      class Program

      {

        static void Main(string[] args)

        {

          var model = new TrueSkillTeamModel(1.0);

          var PlayerSkills = new Dictionary<string, Gaussian>();

     

          // Results:

          string[][][] games = new string[][][]

          {

            // Game 1

            new string[][]

            {

              new string[] {"Alice", "Cath", "Ed"},  // Winning team

              new string[] {"Bob", "Fiona",}, // Second placed team

              new string[] {"Dave"} // Third placed team

            },

            // Game 2

            new string[][]

            {

              new string[] {"Alice", "Cath"},  // Winning team

              new string[] {"Bob", "Ed"} // Second placed team

            },

            // Game 2

            new string[][]

            {

              new string[] {"Bob", "Cath", "Dave"},  // Winning team

              new string[] {"Alice", "Fiona"} // Second placed team

            },

            // Game 2

            new string[][]

            {

              new string[] {"Cath", "Dave"},  // Winning team

              new string[] {"Bob", "Ed"}, // Second placed team

              new string[] {"Fiona", "Alice"} // Third placed team

            }

          };

     

          // Set up priors

          for (int g = 0; g < games.Length; g++)

          {

            int numTeams = games[g].Length;

            Gaussian[][] priorSkills = new Gaussian[numTeams][];

            for (int t=0; t < numTeams; t++)

            {

              int numPlayers = games[g][t].Length;

              priorSkills[t] = new Gaussian[numPlayers];

              for (int p = 0; p < numPlayers; p++)

              {

                string playerId = games[g][t][p];

                if (PlayerSkills.ContainsKey(playerId))

                  priorSkills[t][p] = PlayerSkills[playerId];

                else

                  priorSkills[t][p] = Gaussian.FromMeanAndPrecision(0.0, 1.0);

              }

            }

            // Get updated skills

            Gaussian[][] updatedSkills = model.Infer(priorSkills);

     

            for (int t=0; t < numTeams; t++)

            {

              int numPlayers = games[g][t].Length;

              for (int p = 0; p < numPlayers; p++)

              {

                PlayerSkills[games[g][t][p]] = updatedSkills[t][p];

              }

            }

          }

     

          foreach (string plr in PlayerSkills.Keys)

            Console.WriteLine("Skill of {0} = {1}", plr, PlayerSkills[plr]);

        }

      }

     

    Friday, June 3, 2011 6:25 PM
  • John Guiver replied on 03-01-2011 5:51 AM

    Sorry Jim

    As Tom pointed out in his post, the correct model is two sample individual's performances from their individual skills and then sum the performances to get the team performance rather than adding skills to get a team skill and then sampling performance. The correction to my version is as follows:

        public TrueSkillTeamModel(double beta)

        {

          Range t = new Range(NumTeams);

          Range t1 = new Range(NumTeamsMinus1);

          NumPlayersPerTeam = Variable.Array<int>(t);

          Range p = new Range(NumPlayersPerTeam[t]);

          SkillPrior = Variable.Array(Variable.Array<Gaussian>(p), t);

          PlayerSkill = Variable.Array(Variable.Array<double>(p), t);

          PlayerSkill[t][p] = Variable.Random<double, Gaussian>(SkillPrior[t][p]);

          PlayerPerformance = Variable.Array(Variable.Array<double>(p), t);

          PlayerPerformance[t][p] = Variable.GaussianFromMeanAndPrecision(PlayerSkill[t][p], beta);

          Performance = Variable.Array<double>(t);

          Performance[t] = Variable.Sum(PlayerPerformance[t]);

          LeftIndices = Variable.Array<int>(t1);

          RightIndices = Variable.Array<int>(t1);

          var left = Variable.Subarray(Performance, LeftIndices);

          var right = Variable.Subarray(Performance, RightIndices);

          Variable.ConstrainPositive(left[t1] - right[t1]);

        }

     

    where PlayerPerformance is of type VariableArray<VariableArray<double>, double[][]>

    John

    Friday, June 3, 2011 6:25 PM
  • jimselkirk replied on 03-01-2011 12:34 PM

    Thanks John / Tom, exactly what I was after. Will post back after my implementation. Jim.

    Friday, June 3, 2011 6:25 PM
  • Hi John,

    I have checked your answer but the result is not good. There are something weird happen in your code

    1. Because of the difference in the size of result (the 1st and 4th have size 3, the size of 2nd and 3rd are 2), there is an exception.
    2. When I try to add a new dummy array to equalize the size (new string[] {}), the result in your code and the result from the applet are not the same? 

    Would you please take time to check the code again?

    Thank you and best regards,

    Nam


    • Edited by Nam Doan Thursday, April 18, 2013 4:37 PM
    Thursday, April 18, 2013 4:31 PM
  • This code is not going to match the applet because the various parameters like beta are being set differently and also the model here doesn't include skill dynamics.
    Thursday, April 25, 2013 6:38 PM
    Owner
  • The exception could be solved by modifying the function Infer a little. Moving to the observed value of left indices, right indices and prior to the position before the inference.

            public Gaussian[][] Infer(Gaussian[][] priors)
            {
                
                int n1 = priors.Length - 1;
                int[] leftIndices = new int[n1];
                int[] rightIndices = new int[n1];
                for (int i = 0; i < n1; i++)
                {
                    leftIndices[i] = i;
                    rightIndices[i] = i + 1;
                }
                NumTeams.ObservedValue = priors.Length;
                NumTeamsMinus1.ObservedValue = n1;
                NumPlayersPerTeam.ObservedValue = priors.Select(g => g.Length).ToArray();
                SkillPrior.ObservedValue = priors;
                LeftIndices.ObservedValue = leftIndices;
                RightIndices.ObservedValue = rightIndices;
                return engine.Infer<Gaussian[][]>(PlayerSkill);
            }

    Saturday, August 3, 2013 11:34 AM