C# Incorrect ML Time Series forecasting with big values

  • Question

  • Hey there. I am building a very simple stock prediction app (a simple college project). For the forecast I used the Microsoft ML time series model. It works well, but I noticed that for data with big values like 1700 or 3800 it makes very incorrect predictions. For instance, if the CSV file has small values like 20.21 or 10.63, it works well and predicts values in or close to that range. But if the CSV file has columns with large values like 3700 or so, it goes crazy and roughly cuts the predicted values in half. For example, for values around 3136 it predicted 1800 or 700, depending on the model parameters.

    using System;
    using Microsoft.ML;
    using Microsoft.ML.Data;
    using Microsoft.ML.Transforms.TimeSeries;

    var context = new MLContext();

    // Load the CSV; rows are read in the order they appear in the file.
    var data = context.Data.LoadFromTextFile<StockData>(
        @"C:\Users\Vlad Mishyn\Desktop\DIPLOM1\ASIX\ASIX\bin\x64\Debug\FinalCsvFile.csv",
        hasHeader: true,
        separatorChar: ',');

    // SSA forecaster on the Close column, predicting 7 steps ahead.
    var pipeline = context.Forecasting.ForecastBySsa(
        "Forecast",
        nameof(StockData.Close),
        windowSize: 7,
        seriesLength: 30,
        trainSize: 365,
        horizon: 7,
        confidenceLevel: 0.95f,
        confidenceLowerBoundColumn: "LowerBoundRentals",
        confidenceUpperBoundColumn: "UpperBoundRentals");

    var model = pipeline.Fit(data);

    var forecastingEngine = model.CreateTimeSeriesEngine<StockData, StockForecast>(context);

    var forecasts = forecastingEngine.Predict();

    // Output schema: only the forecast column is mapped.
    internal class StockForecast
    {
        public float[] Forecast { get; set; }
    }

    // Input schema: one property per CSV column, by position.
    internal class StockData
    {
        [LoadColumn(0)]
        public DateTime Date { get; set; }

        [LoadColumn(1)]
        public float Close { get; set; }

        [LoadColumn(2)]
        public float Volume { get; set; }

        [LoadColumn(3)]
        public float Open { get; set; }

        [LoadColumn(4)]
        public float High { get; set; }

        [LoadColumn(5)]
        public float Law { get; set; }
    }
          

    I thought maybe I chose the wrong model for prediction, but I saw a lot of people using this model for prices with even bigger values. I played with all of the model's parameters to find an optimal set, but it didn't help. I am new to data prediction, but even with Python models I haven't seen such behavior. I'm just really interested in your opinion about that. This is what the CSV file looks like. Unfortunately I am still not allowed to attach photos or files.

    Date,Close,Volume,Open,High,Law
    11/13/2020,19.61,6527357,19.06,19.3581,18.69
    11/12/2020,18.93,8879036,19.21,19.3581,18.69
    11/11/2020,19.36,9736736,19.92,19.3581,18.69
    11/10/2020,19.87,11227230,19.77,19.3581,18.69
    11/09/2020,19.73,13947140,19.92,19.3581,18.69
    11/06/2020,19.25,6515724,19.1,19.3581,18.69
    11/05/2020,19.13,8926402,18.6,19.3581,18.69
    11/04/2020,18.28,8512730,18.64,19.3581,18.69
    11/03/2020,18.62,6340196,18.68,19.3581,18.69
    11/02/2020,18.41,7499051,18.18,19.3581,18.69
    10/30/2020,17.96,7883718,17.63,19.3581,18.69
    10/29/2020,17.78,7722001,17.28,19.3581,18.69

    Wednesday, November 25, 2020 3:33 PM
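
One thing worth ruling out, given the sample above: the rows are listed newest-first (11/13/2020 down to 10/29/2020), while a forecaster that reads rows in file order would treat the first row as the oldest observation. The sketch below is a minimal, hypothetical helper (`SeriesPrep.CloseOldestFirst` is not part of ML.NET) that parses lines in the `Date,Close,...` layout shown above and returns the Close values in chronological order. It assumes the `MM/dd/yyyy` date format seen in the sample and no quoted commas. Whether row order explains the halved forecasts is not established in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper for illustration; not an ML.NET type.
public static class SeriesPrep
{
    // Takes CSV lines including the header row and returns the Close
    // column (index 1) ordered from the oldest date to the newest.
    public static float[] CloseOldestFirst(IEnumerable<string> csvLines)
    {
        return csvLines
            .Skip(1) // skip the header row
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line.Split(','))
            .Select(cols => (
                // Assumption: dates look like 11/13/2020, as in the sample.
                Date: DateTime.ParseExact(cols[0].Trim(), "MM/dd/yyyy", CultureInfo.InvariantCulture),
                Close: float.Parse(cols[1], CultureInfo.InvariantCulture)))
            .OrderBy(row => row.Date)
            .Select(row => row.Close)
            .ToArray();
    }
}
```

The resulting array could be wrapped in objects with a `Close` property and loaded with `context.Data.LoadFromEnumerable` in place of `LoadFromTextFile`, so that the model sees the series oldest-first.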

All replies

  • Are you just changing a few of the values to big numbers as an experiment?  Because it should be clear that will screw up the training.  The classifier is looking for trends.  If there's a spike, that's going to make its forecasts wildly wrong.  If all of the values are in the 3000 range, then the resulting predictions should be in the same range.

    Tim Roberts | Driver MVP Emeritus | Providenza & Boekelheide, Inc.

    Sunday, November 29, 2020 7:45 AM
  • No, no, I am talking about all of the values in the dataset. For instance, when I have a dataset with numbers in the range 10 to 20 or 20 to 40, it predicts pretty well. But when I take another dataset with big numbers like 3000, it gives me split prediction values like 1800. I played with the parameters and normalized the dataset, but it didn't work for me.
    Monday, November 30, 2020 7:52 PM
  • Hi Vlad Mishyn,
    This Visual C# forum mainly discusses and asks questions about the C# programming language, IDE, libraries, samples, and tools.
    For questions about ML.NET, I suggest asking on the Microsoft Q&A forum, where you can get a more professional answer.
    Thank you for your understanding.
    Best Regards,
    Daniel Zhang

    Wednesday, December 2, 2020 6:09 AM
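
On the normalization attempt mentioned above: a quick way to confirm that scaling is applied and undone consistently is to standardize the Close values before training and map the forecasts back afterwards. The following is a sketch only. `ZScoreScaler` is a hypothetical helper, not an ML.NET type, and it uses plain z-score standardization (subtract the mean, divide by the population standard deviation).

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration; not an ML.NET type.
public sealed class ZScoreScaler
{
    public float Mean { get; }
    public float StdDev { get; }

    // Learns the mean and population standard deviation of the series.
    public ZScoreScaler(float[] values)
    {
        if (values == null || values.Length == 0)
            throw new ArgumentException("Series must not be empty.", nameof(values));

        double mean = values.Average(v => (double)v);
        double variance = values.Average(v => Math.Pow(v - mean, 2));

        Mean = (float)mean;
        // Guard against a constant series, where the deviation is zero.
        StdDev = variance > 0 ? (float)Math.Sqrt(variance) : 1f;
    }

    // Maps raw values to zero mean and unit variance.
    public float[] Scale(float[] values) =>
        values.Select(v => (v - Mean) / StdDev).ToArray();

    // Maps scaled values (for example forecasts) back to the original units.
    public float[] Unscale(float[] scaled) =>
        scaled.Select(s => s * StdDev + Mean).ToArray();
}
```

If forecasts made on the scaled series look reasonable after `Unscale` while forecasts on the raw series do not, that would point toward sensitivity to magnitude. If both are off, the cause is more likely elsewhere, for example in the row order or the window parameters.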