Machine Learning/Performance issue with Writing to CSV file - C#

  • Question

  • Hi,

    I'm not quite sure which forum is the right one for this issue.

    I have a Machine Learning (ML.NET) .NET Core console application (built with the multi-class prediction template).  I've created the ML model class, and I'm now using that model in another .NET Core console application to predict the class/type of stream bed.

    The ML app reads each row from the CSV input data and predicts the type of stream bed.  As it makes the predictions row by row, I append each row's prediction to a StringBuilder.  When all rows have been read, I call the File.WriteAllText() method.
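
A simplified sketch of the flow described above, not the exact code. The class name, the `predictRow` delegate (a stand-in for the ML.NET prediction call), and the output layout (input row followed by the predicted label) are assumptions made for illustration, and every line is treated as a data row:

```csharp
using System;
using System.IO;
using System.Text;

public static class BufferedCsvPredictor
{
    // predictRow is a hypothetical placeholder for the model's prediction call;
    // it takes one raw CSV row and returns the predicted stream bed type.
    public static void Run(string inputPath, string outputPath, Func<string, string> predictRow)
    {
        var results = new StringBuilder();

        // File.ReadLines enumerates the input lazily, one line at a time.
        foreach (string row in File.ReadLines(inputPath))
        {
            // Accumulate "original row,prediction" in memory.
            results.Append(row).Append(',').AppendLine(predictRow(row));
        }

        // Single write once every row has been processed.
        File.WriteAllText(outputPath, results.ToString());
    }
}
```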

    The Machine Learning console app is working fine, but my issue now is how to improve performance.  When the input CSV file has over 100K rows, the app runs very slowly (it produces results for the output CSV file at a rate of just over 1,000 rows per hour!).  Each of my data files has over 100K rows, and I need to process about 50 separate CSV files.

    Is there a better way of doing this?  Should I write each prediction to the CSV file as soon as its row has been read, instead of storing all the prediction rows in a StringBuilder before writing the file?
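
The row-by-row alternative asked about here could look roughly like the following. This is only a sketch under assumptions: `StreamingCsvPredictor` and `predictRow` are hypothetical names, with `predictRow` standing in for the ML.NET prediction call, and the output layout (input row plus predicted label) is illustrative:

```csharp
using System;
using System.IO;

public static class StreamingCsvPredictor
{
    // Writes each prediction as soon as its input row has been read.
    // Returns the number of rows processed.
    public static int Run(string inputPath, string outputPath, Func<string, string> predictRow)
    {
        int count = 0;

        // StreamWriter buffers internally, so each WriteLine is not a separate disk write.
        using (var writer = new StreamWriter(outputPath))
        {
            foreach (string row in File.ReadLines(inputPath))
            {
                writer.Write(row);
                writer.Write(',');
                writer.WriteLine(predictRow(row));
                count++;
            }
        }

        return count;
    }
}
```

Whether this changes the run time depends on where the time is actually spent. Timing the prediction call separately from the file output (for example with System.Diagnostics.Stopwatch) would show which step dominates.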

    Or, is there a faster way of writing the results to a CSV file?  

    Appreciate any advice.

    Marilyn Gambone

    Thursday, May 21, 2020 5:34 PM