Q. Can I declare a variable for dataset having dynamic size? (Migrated from community.research.microsoft.com)

  • Question

  • sungchul kim posted on 12-06-2010 1:35 AM

    Hi all,

    I have a question. In my system, I want to declare a 'VariableArray<Vector>' or 'VariableArray<double[]>' for a dataset of dynamic size.

    For example, one of my datasets holds term features for advertisements.

    AdId Term T1 T2 T3 T4 T5

    1 great 1 0.4 2.3 2.1 4.1 5.2

    1 hotel 11 5.3 19.2 1.4 5.2 3.2

    1 book 1 13 5.2 1.2 5.2 5.2

    2 cheap 3 3.2 5.1 5.1 5.2

    2 price 3 12 5 23.2 1.2 5.

    where T1~T5 are the features for each word. However, I have to use the set of term vectors that share the same AdId. I've searched this forum and the documentation many times, but I have not found the answer yet.

    Friday, June 3, 2011 6:07 PM

Answers

All replies

  • John Guiver replied on 12-06-2010 4:00 AM

    Hi

    The answer depends on what the emphasis of your question is. Defining random variable arrays of dynamic size is straightforward, and is described in Jagged Arrays. Another question is how you want to use these in your model. The Sparse Bayes Point Machine gives an example of a model which uses dynamic feature vectors. Can you describe your model in more detail?

    John G.

     

    Friday, June 3, 2011 6:08 PM
  • sungchul kim replied on 12-06-2010 8:36 PM

    Thanks for the quick reply.

    Actually, the random variable that I want to use does not need to have a dynamic dimension. Sorry for the short explanation. What I want to do is this: suppose there are two nodes

    A -> B

    where A is a set of n-dimensional vectors and B is a list of scores (scalar values). Each score in B depends on a certain number of vectors in A, and that number is dynamic. For example, in my model, A consists of the feature vectors of the terms in each ad, and B is the ad score computed from A as follows.

    A = {a1, a2, a3, a4, a5}

    AdId Term T1 T2 T3 T4 T5

    1 great 1 0.4 2.3 2.1 4.1 5.2

    1 hotel 11 5.3 19.2 1.4 5.2 3.2

    1 book 1 13 5.2 1.2 5.2 5.2

    2 cheap 3 3.2 5.1 5.1 5.2

    2 price 3 12 5 23.2 1.2 5.

     

    B = {b1, b2}

    AdId  Score1 Score2 Score3

    1 MAX(<wa1>, <wa2>, <wa3>) SUM(<wa1>, <wa2>, <wa3>) MEAN(<wa1>, <wa2>, <wa3>)

    2 MAX(<wb1>, <wb2>) SUM(<wb1>, <wb2>) MEAN(<wb1>, <wb2>)

    where <wa1> means the inner product of w and a1 (assume that w is given). Each score for ad 1 uses 3 term vectors from A, and each score for ad 2 uses 2. Is this possible? If so, is there a provided syntax that can be applied to this case?

    Thanks again.

     

     

    Friday, June 3, 2011 6:08 PM
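
For reference, the deterministic computation described in the score table can be sketched in plain C#, without Infer.NET. This is only an illustration of what MAX, SUM and MEAN of the inner products mean over a jagged (per-ad, variable-length) set of term vectors. The names `AdScores`, `Dot` and `Compute` and the weight values in `w` are hypothetical; the thread only says that w is given. The feature values are the four-dimensional vectors used later in the thread.

```csharp
using System;
using System.Linq;

static class AdScores
{
    // Inner product <w, a> of two equal-length vectors.
    public static double Dot(double[] w, double[] a) =>
        w.Zip(a, (x, y) => x * y).Sum();

    // MAX, SUM and MEAN of <w, a_i> over the term vectors of one ad.
    public static (double Max, double Sum, double Mean) Compute(
        double[] w, double[][] termVectors)
    {
        double[] products = termVectors.Select(a => Dot(w, a)).ToArray();
        return (products.Max(), products.Sum(), products.Average());
    }
}

static class Program
{
    static void Main()
    {
        // Hypothetical weights, chosen only for illustration.
        double[] w = { 1.0, 0.0, 0.0, 0.0 };

        // Jagged array: ad 1 has three term vectors, ad 2 has two.
        double[][][] ads =
        {
            new[]
            {
                new[] { 0.4, 2.3, 2.1, 4.1 },
                new[] { 5.3, 19.2, 1.4, 5.2 },
                new[] { 5.2, 1.2, 5.2, 5.2 },
            },
            new[]
            {
                new[] { 3.2, 5.1, 5.1, 5.2 },
                new[] { 5.0, 23.2, 1.2, 5.0 },
            },
        };

        for (int i = 0; i < ads.Length; i++)
        {
            var s = AdScores.Compute(w, ads[i]);
            Console.WriteLine($"Ad {i + 1}: max={s.Max}, sum={s.Sum}, mean={s.Mean}");
        }
    }
}
```

The inner arrays have different lengths per ad, which is the same shape a jagged random variable array has to express in a probabilistic model.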
  • John Guiver replied on 12-07-2010 5:22 AM

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using MicrosoftResearch.Infer.Models;
    using MicrosoftResearch.Infer.Maths;
    using MicrosoftResearch.Infer.Distributions;
    using MicrosoftResearch.Infer;

    namespace sungchul_kim1
    {
        class Program
        {
            static void Main(string[] args)
            {
                // The model
                var numAds = Variable.New<int>();             // Number of ads
                var a = new Range(numAds);                    // Corresponding range
                var numTVs = Variable.Array<int>(a);          // Number of term vectors per ad
                var t = new Range(numTVs[a]);                 // Corresponding range
                var multiplier = Variable.Array<double>(a);   // Multiplier to create average
                var tv = Variable.Array(Variable.Array<Vector>(t), a);    // Term vectors
                var wPrior = Variable.New<VectorGaussian>();  // Prior weight distribution
                var w = Variable.Random<Vector, VectorGaussian>(wPrior);
                var w_tv = Variable.Array(Variable.Array<double>(t), a);  // Dot products
                var noise = 0.1;  // Passed below as the precision of the score
                var score = Variable.Array<double>(a);
                using (Variable.ForEach(a))      // For each ad
                {
                    using (Variable.ForEach(t))  // For each of the ad's term vectors
                        w_tv[a][t] = Variable.InnerProduct(w, tv[a][t]);

                    var average = Variable.Sum(w_tv[a]) * multiplier[a];
                    score[a] = Variable.GaussianFromMeanAndPrecision(average, noise);
                }

                // Inference engine
                var engine = new InferenceEngine();

                // For training, set an uninformative prior, and observe the scores
                Vector[][] trainingData = new Vector[][] {
                    new Vector[] {
                        Vector.FromArray(0.4, 2.3, 2.1, 4.1),
                        Vector.FromArray(5.3, 19.2, 1.4, 5.2),
                        Vector.FromArray(5.2, 1.2, 5.2, 5.2)},
                    new Vector[] {
                        Vector.FromArray(3.2, 5.1, 5.1, 5.2),
                        Vector.FromArray(5, 23.2, 1.2, 5.0)}
                };
                double[] trainingScores = new double[] { 1.0, 2.0 };
                numAds.ObservedValue = trainingData.Length;
                numTVs.ObservedValue = trainingData.Select(d => d.Length).ToArray();
                multiplier.ObservedValue = trainingData.Select(d => 1.0 / d.Length).ToArray();
                tv.ObservedValue = trainingData;
                int numFeatures = trainingData[0][0].Count;
                wPrior.ObservedValue = VectorGaussian.FromMeanAndPrecision(
                    Vector.Zero(numFeatures), PositiveDefiniteMatrix.Identity(numFeatures));
                score.ObservedValue = trainingScores;
                var wPosterior = engine.Infer<VectorGaussian>(w);
                Console.WriteLine(wPosterior.GetMean());

                // For test, set weight prior to posterior from training, and clear the score obs.
                Vector[][] testData = new Vector[][] {
                    new Vector[] {
                        Vector.FromArray(1.7, 2.7, 1.51, 4.5),
                        Vector.FromArray(3.6, 14.1, 2.4, 5.7)}};
                numAds.ObservedValue = testData.Length;
                numTVs.ObservedValue = testData.Select(d => d.Length).ToArray();
                multiplier.ObservedValue = testData.Select(d => 1.0 / d.Length).ToArray();
                tv.ObservedValue = testData;
                wPrior.ObservedValue = (VectorGaussian)wPosterior.Clone();
                score.ClearObservedValue();
                var predictedScores = engine.Infer<Gaussian[]>(score);
                Console.WriteLine(predictedScores[0]);
            }
        }
    }

    Friday, June 3, 2011 6:08 PM
  • sungchul kim replied on 12-07-2010 5:34 AM

    Thanks a lot. Your efforts are really helpful to me and my research. I think I could even do research all night long. ^^

    Friday, June 3, 2011 6:08 PM
  • sungchul kim replied on 12-07-2010 5:41 AM

    I have one more question. Do you think it is possible to use Variance(scores) or Max(scores) as a random variable for another score in this model?

    Friday, June 3, 2011 6:08 PM
  • John Guiver replied on 12-07-2010 9:30 AM

    I don't know of a good way to do this. There is a Max factor but it only takes two values - you would need to create a Max factor taking an array, and work out the mathematics for the message updates (as an example, see how the Sum factor and message operators are implemented).

    Friday, June 3, 2011 6:08 PM