Sampling from a distribution returns almost the same values

  • Question

  • Hi,

    I am creating some test data to test my program, and I noticed that, for example, my true positives always occur at the same indices of my output array.

    I checked the generated data and found that the sampled values are the same on every run (or differ only negligibly). I tried changing my variables from static to non-static, rebuilding the project, and even restarting Visual Studio, but the values stay the same.

    Could you please have a look at this? I thought I might need another function to reset the random seed, but in the Infer.NET example here (http://research.microsoft.com/en-us/um/cambridge/projects/infernet/blogs/bayesianpca.aspx) no other method is called alongside Sample. I don't know what I am missing.

    Here is my code for data generation:

     public static List<object> GenerateData(int nData, int nFeature)
     {
         double[][] inputData;
         double[] outputData;
         double w = 1;
         Console.WriteLine("Is sampling generating new data set? ");
         Gaussian data = Gaussian.FromMeanAndVariance(0, 1);
         Random rnd = new Random();
         inputData = new double[nData][];
         outputData = new double[nData];
         for (int i = 0; i < nData; i++)
         {
             inputData[i] = new double[nFeature + 1];
             for (int j = 0; j < nFeature; j++)
             {
                 inputData[i][j] = data.Sample(); // + rnd.Next(-50, 50);
                 // note: this prints a fresh draw, not the value stored above
                 if (j < 10 && i < 10) Console.WriteLine(data.Sample());
             }
             // constant 1 in the last column (presumably a bias term)
             inputData[i][nFeature] = 1;
             // output is a function of only two of the inputs
             outputData[i] = w * inputData[i][1] + w * inputData[i][2];
         }
         List<object> res = new List<object>();
         res.Add(inputData);
         res.Add(outputData);
         return res;
     }
    Wednesday, January 28, 2015 11:41 AM


  • Rand.Restart(12347) is called on each run, which restarts the random number generator at that seed, so every run produces the same sequence of samples. If you want a different data set on each run, just don't call Restart. (I see that you call it in two places. Also, you should not need the calls to the System.Random object.)

    In general, the purpose of calling Restart is repeatability, for testing and comparison.
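
    A minimal sketch of the repeatability point, not taken from the original post. It assumes the current Infer.NET namespaces from the Microsoft.ML.Probabilistic NuGet package (releases from around 2015 used MicrosoftResearch.Infer.Distributions and MicrosoftResearch.Infer.Maths instead). The names SeedDemo and Draw are hypothetical, chosen only for illustration.

    ```csharp
    using System;
    using System.Linq;
    // Assumption: current Infer.NET namespaces; older releases used
    // MicrosoftResearch.Infer.Distributions and MicrosoftResearch.Infer.Maths.
    using Microsoft.ML.Probabilistic.Distributions;
    using Microsoft.ML.Probabilistic.Math;

    public static class SeedDemo
    {
        // Hypothetical helper: draws n samples from a standard Gaussian.
        // If a seed is given, the Infer.NET generator is restarted at that seed first.
        public static double[] Draw(int n, int? seed = null)
        {
            if (seed.HasValue)
                Rand.Restart(seed.Value);

            Gaussian g = Gaussian.FromMeanAndVariance(0, 1);
            double[] samples = new double[n];
            for (int i = 0; i < n; i++)
                samples[i] = g.Sample();
            return samples;
        }

        public static void Main()
        {
            // Restarting at the same seed before each batch: the two batches should match.
            double[] a = Draw(6, 12347);
            double[] b = Draw(6, 12347);
            Console.WriteLine(a.SequenceEqual(b));

            // No restart: this batch simply continues the current random stream.
            double[] c = Draw(6);
            Console.WriteLine(string.Join(", ", c));
        }
    }
    ```

    Making the seed an optional parameter keeps the repeatable path available for tests, while the default path leaves the generator alone so data generation is not pinned to one sequence.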

    • Marked as answer by Capli19 Wednesday, January 28, 2015 4:16 PM
    Wednesday, January 28, 2015 4:05 PM