Divergence problem (Migrated from community.research.microsoft.com)

  • Question

  • sungchul kim posted on 01-21-2011 5:33 AM

    Hi, all.

    I set up a model to predict the score of an ad from the word features in that ad. As I have posted a few times, each ad has some number of words, and each word is represented as a vector.

    I use an approach similar to the click model, which uses moment matching between a Gaussian and a Beta distribution. wa has a Gaussian prior, and beta is set manually. I don't think this model is that complex. The problem is that using the inner product and sum factors makes the estimated weight vector diverge. Do you have any ideas for solving this problem?

    // Namespaces added for completeness (not in the original post). These are the
    // Infer.NET 2.x names; later releases use Microsoft.ML.Probabilistic.* instead.
    using System.Linq;
    using MicrosoftResearch.Infer;
    using MicrosoftResearch.Infer.Distributions;
    using MicrosoftResearch.Infer.Maths;
    using MicrosoftResearch.Infer.Models;

    // Param is the poster's own type (not shown); it supplies meana, vara and betac.
    class attrModel
    {
        Variable<Vector> wa;                                        // weight vector shared by all ads
        Variable<int> NumAds = Variable.New<int>();                 // number of ads
        VariableArray<int> numTVs;                                  // number of words in each ad
        VariableArray<VariableArray<Vector>, Vector[][]> Xa;        // word vectors, per ad
        VariableArray<double> Attr;                                 // summed inner products, per ad
        VariableArray<double> Scr;                                  // noisy score, per ad
        VariableArray<VariableArray<double>, double[][]> AttrUnit;  // one inner product per word
        VariableArray<Gaussian> Prop;                               // moment-matched click-rate target, per ad
        InferenceEngine ie;

        public attrModel(int dima, Param beta)
        {
            Range n = new Range(NumAds);

            // Declare variables
            numTVs = Variable.Array<int>(n);
            Range dimA = new Range(numTVs[n]);

            Xa = Variable.Array(Variable.Array<Vector>(dimA), n).Named("Xa");
            Scr = Variable.Array<double>(n).Named("Scr");
            Attr = Variable.Array<double>(n).Named("Attr");
            AttrUnit = Variable.Array(Variable.Array<double>(dimA), n).Named("AttrUnit");
            Prop = Variable.Array<Gaussian>(n).Named("Prop");

            // Build model
            wa = Variable.Random<Vector>(new VectorGaussian(Vector.Constant(dima, beta.meana), PositiveDefiniteMatrix.IdentityScaledBy(dima, beta.vara))).Named("Wr");
            AttrUnit[n][dimA] = Variable.InnerProduct(Xa[n][dimA], wa);
            Attr[n] = Variable.Sum(AttrUnit[n]);
            Scr[n] = Variable.GaussianFromMeanAndVariance(Attr[n], beta.betac);
            Variable.ConstrainEqualRandom<double, Gaussian>(Scr[n], Prop[n]);

            ie = new InferenceEngine();
        }

        public void Infer(Vector[][] obXa, int[] imps, int[] clicks, out VectorGaussian waMarginal)
        {
            int adim = obXa[0][0].Count;
            waMarginal = new VectorGaussian(adim);

            // Moment-match a Beta over each ad's click rate to a Gaussian
            Gaussian[] props = new Gaussian[clicks.Length];
            for (int d = 0; d < clicks.Length; d++)
            {
                int nC = clicks[d];
                int nE = imps[d];
                int nNC = nE - nC;
                double b0 = 1.0 + nC;    // Observations of clicks
                double b1 = 1.0 + nNC;   // Observations of no clicks
                Beta b = new Beta(b0, b1);
                double m, v;
                b.GetMeanAndVariance(out m, out v);
                Gaussian g = new Gaussian();
                g.SetMeanAndVariance(m, v);
                props[d] = g;
            }

            // Attach the observed data
            NumAds.ObservedValue = clicks.Length;
            numTVs.ObservedValue = obXa.Select(d => d.Length).ToArray();
            Xa.ObservedValue = obXa;
            Prop.ObservedValue = props;

            ie.NumberOfIterations = clicks.Length / 10;
            waMarginal = ie.Infer<VectorGaussian>(wa);
        }
    }


    Friday, June 3, 2011 6:16 PM
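
    The loop in Infer above converts each ad's click counts into a Gaussian by matching the mean and variance of a Beta(1 + clicks, 1 + non-clicks) distribution. The following is a minimal, dependency-free sketch of that arithmetic, written without Infer.NET so it can be run on its own. The names BetaMoments, Of and FromCounts are hypothetical and do not appear in the original post.

```csharp
using System;

// Hypothetical helper (not from the original post): reproduces the
// Beta -> Gaussian moment matching from the Infer method without Infer.NET.
public static class BetaMoments
{
    // Mean and variance of a Beta(a, b) distribution.
    public static (double Mean, double Variance) Of(double a, double b)
    {
        double s = a + b;
        double mean = a / s;
        double variance = a * b / (s * s * (s + 1.0));
        return (mean, variance);
    }

    // Same construction as the post: Beta(1 + clicks, 1 + nonClicks).
    public static (double Mean, double Variance) FromCounts(int clicks, int impressions)
    {
        if (clicks < 0 || impressions < clicks)
            throw new ArgumentException("Require 0 <= clicks <= impressions.");
        return Of(1.0 + clicks, 1.0 + (impressions - clicks));
    }

    public static void Main()
    {
        // Same click rate, increasing number of impressions.
        foreach (var (c, n) in new[] { (3, 10), (30, 100), (300, 1000) })
        {
            var (m, v) = FromCounts(c, n);
            Console.WriteLine($"clicks={c} imps={n} mean={m:F4} variance={v:E2}");
        }
    }
}
```

    For a fixed click rate, the matched variance shrinks roughly in proportion to 1/impressions, so ads with many impressions enter the model as much tighter Gaussian targets than ads with few.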


  • John Guiver replied on 01-24-2011 4:19 AM

    Hi SungChul

    To repeat my general advice from last week, there are various ways to influence this type of problem:

    1. Initialisation of some variables. This will start the algorithm in a good place, and also influence the generated schedule (initialised variables are handled first)
    2. Adjusting priors (in particular, consider beta.betac) - unfortunately this changes your model
    3. Using shared variables, and showing one datum at a time
    4. Damping - there is an experimental Damp factor, but no examples yet.

    Without some data it is difficult to give further advice. I would start off by playing with 2 a bit, but you will probably need to go with 3. Can you put together a simple data set which exhibits the problem and send it to infersup@microsoft.com?


    Friday, June 3, 2011 6:17 PM
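
    Item 4 in the reply mentions damping without an example. The sketch below illustrates the general idea only, in plain C# with no Infer.NET dependency. It assumes that damping means mixing the previous message and the freshly computed message geometrically, which for Gaussians is a linear blend of the natural parameters (precision and mean times precision). It does not use or reproduce the API of the Damp factor, and the names GaussianMsg and Damping.Blend are hypothetical.

```csharp
using System;

// Hypothetical type: a 1-D Gaussian message stored in natural parameters.
public readonly struct GaussianMsg
{
    public readonly double Precision;
    public readonly double MeanTimesPrecision;

    public GaussianMsg(double precision, double meanTimesPrecision)
    {
        Precision = precision;
        MeanTimesPrecision = meanTimesPrecision;
    }

    public static GaussianMsg FromMeanAndVariance(double mean, double variance)
        => new GaussianMsg(1.0 / variance, mean / variance);

    public double Mean => MeanTimesPrecision / Precision;
    public double Variance => 1.0 / Precision;
}

public static class Damping
{
    // Assumed damping rule: previous^(1 - stepsize) * fresh^stepsize.
    // For Gaussians this is a linear blend of the natural parameters.
    // stepsize = 1 means no damping; smaller values change the message more slowly.
    public static GaussianMsg Blend(GaussianMsg previous, GaussianMsg fresh, double stepsize)
    {
        if (stepsize <= 0.0 || stepsize > 1.0)
            throw new ArgumentOutOfRangeException(nameof(stepsize), "Require 0 < stepsize <= 1.");
        return new GaussianMsg(
            (1.0 - stepsize) * previous.Precision + stepsize * fresh.Precision,
            (1.0 - stepsize) * previous.MeanTimesPrecision + stepsize * fresh.MeanTimesPrecision);
    }

    public static void Main()
    {
        var previous = GaussianMsg.FromMeanAndVariance(0.0, 1.0);
        var fresh = GaussianMsg.FromMeanAndVariance(2.0, 0.5);
        var damped = Blend(previous, fresh, 0.5);
        Console.WriteLine($"damped mean={damped.Mean:F4} variance={damped.Variance:F4}");
    }
}
```

    A smaller stepsize makes each update more conservative, which can help when messages oscillate or grow between iterations, at the cost of slower convergence.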