Matrix is Not Positive Definite (Migrated from community.research.microsoft.com)

  • Question

  • Hari1234 posted on 12-27-2010 11:04 PM

    Tom and John

    1. Thank You for the powerful framework.

    2. I am using the multiclass Bayes Point Machine (the very latest version, the one released on Dec 17th 2010), following pretty much the example solution provided. The only difference is that I have 35 features and 4 classes. I am using around 56 items for training and I have 13 rows for testing. I am getting "Matrix is Not Positive definite" when passing all 13 rows in the test data vector. However, I have noticed that when I pass the test data one row at a time (i.e. I train with 56 items and then pass one row of test data at a time), I am able to get predictions. Is there some limitation, or is it always better to do one test row at a time? Here is the line where I get the error when passing all 13 rows in the test data vector:


    Discrete[] yInferred = Distribution.ToArray<Discrete[]>(testModel.ie.Infer(testModel.y));


    3. Can my features be plain numbers, or should I take care of ranking myself? For example, if I feel that for a certain feature a higher value has a higher impact on the classification, should I rank the values before I feed them to the machine, or would it automatically take care of that? (Similarly the other way round: the smaller a certain feature's value, the higher the impact of belonging to a certain class.) I think it does not matter as long as the input training data is consistent in one direction or the other.

    4. Does the example solution automatically handle missing data if the feature has -1? If not, can you please show me how I should modify it to handle missing data?

    5. Any other good books you can suggest, other than Christopher Bishop's book?

    Thanks for all your help

    Friday, June 3, 2011 6:11 PM

Answers

  • green replied on 01-14-2011 7:01 AM

    Rejecting redundant clusters appears to resolve the Matrix Is Not Positive Definite problem. I have run the program perhaps 20 times with different numbers of clusters, and the problem has not recurred. Good news indeed.

    The next problem is the vast inaccuracy of prediction over the training set. The GaussianProcess kernel NNKernel is distinctly better than SquaredExponential or ARD, and for more than half the training rows the prediction is recognisably in the ballpark. Any guidance on what to set the NNKernel constructor parameters to? To get going, I just set both parameters to zero.

    I will most certainly try k-means clustering.  


All replies

  • John Guiver replied on 01-04-2011 11:24 AM

    1. Thanks!

    2: There should not be a limitation, but this is difficult to diagnose without further information:

    • Which model class are you using? I am assuming BPM or BPM_Shared, as the sparse classes (BPMSparse and BPMSparse_Shared) should not exhibit this problem.
    • Could you send the full stack trace?
    • Is your data scaled similarly for each feature?
    • Have you tried decreasing the noise precision?
    • As a last diagnostic resort, please send the data to infersup@microsoft.com.

    3: It can be useful to transform your data via a suitably-chosen transformation (linear or non-linear). This can be a means of standardising or uniformising the data; the data and the weights will then all have similar scaling (see 2). However, I would not recommend converting the raw data to a ranking (which is what I think you are suggesting). Also, make sure you use exactly the same transformation (with the same parameterisation) for train, validation, and test.

    4: No, putting in a -1 will not work. If you have missing features, please use the sparse versions of the model in which you specify indices and values.  If you want to learn a full VectorGaussian weight distribution (as in the non-sparse classes), there may be ways to achieve that (perhaps using the Subvector factor) but I don't have a ready answer.

    5: Please see the Resources and References section.

    John

  • green replied on 01-06-2011 3:53 AM

    Hi, I am getting the same error message with the same version of Infer.NET (2.4 beta 2):

    Unhandled Exception: MicrosoftResearch.Infer.Maths.PositiveDefiniteMatrixException: The matrix is not positive definite.
       at MicrosoftResearch.Infer.Maths.PositiveDefiniteMatrix.SetToInverse(PositiveDefiniteMatrix A, LowerTriangularMatrix L)
       at MicrosoftResearch.Infer.Distributions.SparseGPFixed.get_InvKernelOf_B_B()
       at MicrosoftResearch.Infer.Distributions.SparseGP.get_Var_B_B()
       at MicrosoftResearch.Infer.Distributions.SparseGP.get_Beta()
       at MicrosoftResearch.Infer.Distributions.SparseGP.Variance(Vector X)
       at MicrosoftResearch.Infer.Factors.SparseGPOp.FuncAverageConditional(Gaussian y, SparseGP func, Vector x, SparseGP result)
       at MicrosoftResearch.Infer.Factors.SparseGPOp.FuncAverageConditional(Double y, SparseGP func, Vector x, SparseGP result)
       at MicrosoftResearch.Infer.Models.User.Model_EP.Changed_numberOfIterationsDecreased_y_x_prior(Int32 numberOfIterations)
       at MicrosoftResearch.Infer.Models.User.Model_EP.Execute(Int32 numberOfIterations, Boolean initialise)
       at MicrosoftResearch.Infer.Models.User.Model_EP.Execute(Int32 numberOfIterations)
       at MicrosoftResearch.Infer.InferenceEngine.Execute(IGeneratedAlgorithm ca)
       at MicrosoftResearch.Infer.InferenceEngine.Infer[TReturn](IVariable var)
       at GaussianProcessExample.GP1.Main(String[] args)

    This being my first shot at using Infer.NET (and indeed C# or any other .NET language), there is every chance it is something to do with my program (which follows the Gaussian Process classifier example but modified for regression):

    using System;
    using System.IO;
    using System.Text.RegularExpressions;
    using System.Collections.Generic;
    using System.Text;
    using System.Linq;
    using MicrosoftResearch.Infer.Models;
    using MicrosoftResearch.Infer.Maths;
    using MicrosoftResearch.Infer.Distributions;
    using MicrosoftResearch.Infer.Distributions.Kernels;
    using MicrosoftResearch.Infer;

    namespace GaussianProcessExample
    {
     class GP1
     {
      static void Main(string[] args)
      {

       List<double[]> list = new List<double[]>();
       string filePath = @"d:\temp\train.txt";
       string line;
       if (File.Exists(filePath)) {
          StreamReader file = null;
          try {
             file = new StreamReader(filePath);
             while ((line = file.ReadLine()) != null) {
                string[] words = Regex.Split(line.Trim(), @"\s+");
                int n = words.Length;
                double[] d = new double[n];
                for (int w = 0; w < n; w++) {
                   try {
                      d[w] = Convert.ToDouble(words[w]);
                   }
                   catch (FormatException) {
                      Console.WriteLine("Unable to convert '{0}'", words[w]);
                   }
                   catch (OverflowException) {
                      Console.WriteLine("Outside the range of a Double");
                   }
                }
                list.Add(d);
             }
          }
          finally {
             if (file != null)
                file.Close();
          }
       }
       int rows = list.Count;
       int cols = list[0].Length;
       double[] outputs = new double[rows];
       Vector[] inputs = new Vector[rows];
       for (int r = 0; r < rows; r++) {
          double[] row = list[r];
          if (row.Length != cols) Console.WriteLine("number of attributes not constant across rows");
          outputs[r] = row[0];
          double[] inp = new double[cols - 1];
          for (int c = 1; c < cols; c++) {
             inp[c - 1] = row[c];
          }
          inputs[r] = Vector.FromArray(inp);
       }


       // Set up the GP prior, which will be filled in later
       Variable<SparseGP> prior = Variable.New<SparseGP>().Named("prior");

       // The sparse GP variable - a distribution over functions
       Variable<IFunction> f = Variable<IFunction>.Random(prior).Named("f");

       // The locations to evaluate the function
       VariableArray<Vector> x = Variable.Observed(inputs).Named("x");
       Range j = x.Range.Named("j");

       // The observation model
       VariableArray<double> y = Variable.Observed(outputs, j).Named("y");
       Variable<double> score = Variable.FunctionEvaluate(f, x[j]);
       //   y[j] = Variable.GaussianFromMeanAndVariance(score, 0.1);
       y[j] = score;


       InferenceEngine engine = new InferenceEngine(new ExpectationPropagation());

       // The basis, worry about clustering in a later version
       Vector[] basis = inputs;


       // Fill in the sparse GP prior
       GaussianProcess gp = new GaussianProcess(new ConstantFunction(0), new SquaredExponential(0));
       prior.ObservedValue = new SparseGP(new SparseGPFixed(gp, basis));

       // Infer the posterior Sparse GP
       SparseGP sgp = engine.Infer<SparseGP>(f);
       Console.WriteLine();
       Console.WriteLine("Predictions on training set:");
       for (int i = 0; i < outputs.Length; i++) {
        Gaussian post = sgp.Marginal(inputs[i]);
        Console.WriteLine("f({0}) = {1}", inputs[i], post);
       }
      }
     }
    }

  • John Guiver replied on 01-06-2011 10:52 AM

    Not sure without your data; if you can, please send to infersup@microsoft.com .

    I suggest that you keep the observation noise in your model (i.e. the line y[j] = Variable.GaussianFromMeanAndVariance(score, 0.1);) and try increasing the variance.

    Let me know how it goes.

    John

  • green replied on 01-06-2011 4:58 PM

    Data and program sent to infersup@microsoft.com

    Observation noise line reinstated; variance increased to 0.5, 2.0, and 10.0, but the outcome is the same.

  • John Guiver replied on 01-07-2011 4:22 AM

    I have several recommendations:

    (a) You are using the whole data set as the basis. GPs are highly non-scalable, which is the reason sparse versions of GPs were developed. To take advantage of this you should choose a much smaller basis; you can use something like k-means clustering on your input data, for example. Taking your whole data set as the basis leads to very large SparseGP messages being propagated through the graphical model (it caused an out-of-memory exception on my system).

    (b) Your target data is centred around 1.327, but you use a zero mean function. Either standardise your output data (subtract off the mean and divide by the standard deviation; this is what I would recommend), or use new ConstantFunction(1.327).

    (c) Different elements/features in your input data have very different scales, but you are using a covariance function (SquaredExponential) which assumes the same scale for all features. Either standardise your input data (which is what I would recommend), or use the ARD kernel to specify the scales:

    var ard = new ARD();
    ard.InitialiseFromData(inputs, Vector.FromArray(outputs));

    (d) Reinstate the observation noise.
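
    Recommendations (b) and (c) amount to standardising the outputs and each input feature column. Here is one possible sketch of such a helper: the Scaling class and Standardise method are my own names, not part of Infer.NET, and it is written in plain C# so it stands alone:

    ```csharp
    using System;
    using System.Linq;

    static class Scaling
    {
        // Shift and scale one column of values to zero mean and unit standard
        // deviation, returning the parameters via out so exactly the same
        // transformation can be reapplied to validation and test data.
        public static void Standardise(double[] values, out double mean, out double sd)
        {
            mean = values.Average();
            double m = mean; // local copy: out parameters cannot be used inside lambdas
            sd = Math.Sqrt(values.Select(v => (v - m) * (v - m)).Average());
            if (sd == 0.0) sd = 1.0; // constant column: just centre it
            for (int i = 0; i < values.Length; i++)
                values[i] = (values[i] - mean) / sd;
        }
    }
    ```

    Apply it to the outputs array and to each feature column before building the input Vectors, and keep the returned mean/sd so test rows go through exactly the same transformation as the training rows.
    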

  • green replied on 01-08-2011 12:33 AM

    Hi John,

    Implemented the recommended options in (b),(c),(d), but same outcome.

    Is there support for k-means clustering in Infer.NET? If so, where would I find it?

    Is Infer.NET likely to remain free for academic use?

  • John Guiver replied on 01-10-2011 4:19 AM

    K-means clustering is not supported per se in Infer.NET (though it should take just a few lines to implement in C#). But you can use Bayesian mixture models to learn clusterings. For example, you could adapt the mixture-of-Gaussians code provided in the examples browser and then take the means of the posterior mixture component means as the centres.
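
    Since k-means is indeed only a few lines of C#, here is one possible sketch (a hypothetical KMeans helper working on plain double arrays; none of these names come from Infer.NET):

    ```csharp
    using System;
    using System.Linq;

    static class KMeans
    {
        // Lloyd's algorithm: returns k centres that are representative,
        // well-separated points of the data set - suitable as a sparse-GP basis.
        public static double[][] Cluster(double[][] data, int k, int maxIters)
        {
            var rand = new Random(0);
            // Initialise with k distinct random data points
            double[][] centres = data.OrderBy(p => rand.Next())
                                     .Take(k)
                                     .Select(p => (double[])p.Clone())
                                     .ToArray();
            int[] assign = new int[data.Length];
            for (int it = 0; it < maxIters; it++)
            {
                bool changed = false;
                // Assignment step: nearest centre by squared Euclidean distance
                for (int i = 0; i < data.Length; i++)
                {
                    int best = 0;
                    double bestDist = double.MaxValue;
                    for (int c = 0; c < centres.Length; c++)
                    {
                        double dist = 0.0;
                        for (int j = 0; j < data[i].Length; j++)
                        {
                            double diff = data[i][j] - centres[c][j];
                            dist += diff * diff;
                        }
                        if (dist < bestDist) { bestDist = dist; best = c; }
                    }
                    if (assign[i] != best) { assign[i] = best; changed = true; }
                }
                if (!changed && it > 0) break; // assignments stable: converged
                // Update step: move each centre to the mean of its members
                for (int c = 0; c < centres.Length; c++)
                {
                    int cc = c;
                    int[] members = Enumerable.Range(0, data.Length)
                                              .Where(i => assign[i] == cc)
                                              .ToArray();
                    if (members.Length == 0) continue; // keep an empty cluster in place
                    for (int j = 0; j < centres[c].Length; j++)
                    {
                        int jj = j;
                        centres[c][j] = members.Average(i => data[i][jj]);
                    }
                }
            }
            return centres;
        }
    }
    ```

    The resulting centres could then be turned into the basis with something like basis = KMeans.Cluster(rawInputs, 10, 100).Select(Vector.FromArray).ToArray(); (rawInputs being the raw data rows, before conversion to Vector).
    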

  • green replied on 01-11-2011 9:02 PM

    I included code for clustering following the mixture of Gaussians example. An example of program behaviour now:

    First run after bringing up a cmd window does not throw any exceptions.

    Second run (no code change or recompilation) does not throw any exceptions.

    Third run (no code change or recompilation) throws the Matrix not Positive Definite exception.

    Further information: I am using the x64 csc compiler on Win 7 Ultimate x64 with .NET Framework 4. I was a little surprised this worked at all, given that the Infer.NET DLLs were presumably built for the x86 environment. With the DLLs in the same directory, I compiled with:

    csc  /r:Infer.Compiler.dll  /r:Infer.Runtime.dll  GP1.cs

    Sending program source to infersup, data as before. 

  • green replied on 01-14-2011 1:15 AM

    With 27 mixture components (up from 9), the program completed normally on one run, but postmean in the code fragment below becomes NaN. Might this shed light on the above problem?

             // Infer the posterior Sparse GP

             SparseGP sgp = engine.Infer<SparseGP>(f);

             // Check that training set is classified correctly
             Console.WriteLine();
             Console.WriteLine("Predictions on training set:");
             for (int i = 0; i < trainattr.Length; i++) {
                Gaussian post = sgp.Marginal(trainattr[i]);
                double postmean = post.GetMean();
                Console.WriteLine("f[{0}]  {1}  {2}", i, postmean, traintarget[i]);
             }

    The program has been restructured to improve readability and an attribute of the data has changed; I am sending the program and data to infersup.

  • John Guiver replied on 01-14-2011 4:57 AM

    Yes - I think it may shed light on it. Some clusters are redundant and remain at their prior value. This issue is accentuated when you have more cluster centres. I would reject any cluster centres which don't learn:

    var mns = ie.Infer<VectorGaussian[]>(means);
    var wghtpost = ie.Infer<Dirichlet>(weights);
    mns = mns.Where((v, i) => wghtpost.PseudoCount[i] > 1).ToArray();
    return mns.Select(mn => mn.GetMean()).ToArray();

  • minka replied on 01-14-2011 5:14 AM

    You should be using K-means to find the basis, not a Gaussian mixture model.  Fitting a Gaussian mixture model is not a good method for finding a GP basis, for the same reason that GMMs are not good for vector quantization.  To take an extreme example, suppose the data are generated by a single Gaussian.  Then a correctly-fitted GMM will model this data with a single Gaussian, making all means the same or deleting most of the components.  This is a correct model, but a poor GP basis. For a GP basis, you want K representative and separated points from the dataset.  K-means is designed to do this.
