Filtering/Sorting Data Efficiently in PowerShell

  • Question

  • I am trying to take a CSV file with over 10 million lines, extract the records I need, sort them, and save them to a new file. I get exactly the results I need by piping the output of Get-Content to Select-String -Pattern ("$Val1", "$Val2"), but it takes hours, whereas the batch files we have been using take just a few minutes. Is there a way to use a higher -ReadCount with Get-Content before piping it to Select-String? I have tried the example below, but for every line read I get an error saying the path cannot be found because it does not exist; the path shown in the error, however, is the exact data I'm trying to pull. Any help would be greatly appreciated.

    Get-Content $File -ReadCount 1000 | ForEach { Select-String -Path $_ -Pattern ("$Val1", "$Val2") } | Add-Content $OutputFile;


    • Edited by De_Ka Monday, October 1, 2018 5:34 PM Corrected mispelled words and Case mismatch in code
    • Moved by Bill_Stewart Monday, December 17, 2018 6:31 PM This is not "research things for me" forum
    Monday, October 1, 2018 5:32 PM

All replies

  • To filter a CSV just filter it:

    Get-Content $File |
        Where-Object { $_ -match "$Val1|$Val2" } |
        Out-File $OutputFile

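    The system buffers the file automatically, so it is never loaded into memory all at once.  For a simple line-by-line match this is the most direct approach, although the per-line pipeline overhead can still be significant on a file this large.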
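
    On the -ReadCount question: with -ReadCount 1000, each $_ is an array of up to 1000 lines, and Select-String -Path treats those lines as file names, which would explain the "path cannot be found" errors. A minimal sketch of one way to match whole batches instead, relying on -match returning the matching elements when its left side is an array. The temp file names and sample values are made up for illustration, and escaping the values assumes they are literal text rather than regex patterns:

```powershell
# Hypothetical sample data; in the real script $File, $Val1 and $Val2 already exist.
$File       = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount-demo.csv'
$OutputFile = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount-demo-out.csv'
$Val1 = 'ABC'
$Val2 = 'XYZ'

Set-Content -Path $File -Value @('1,ABC,10', '2,DEF,20', '3,XYZ,30', '4,GHI,40')
if (Test-Path $OutputFile) { Remove-Item $OutputFile }

# Assumption: the values are literal text, so escape them before building the regex.
$Pattern = ([regex]::Escape($Val1), [regex]::Escape($Val2)) -join '|'

# Each $_ is a batch of lines. @() guarantees an array, so -match acts as a
# filter and returns the matching lines rather than a single True/False.
Get-Content $File -ReadCount 1000 |
    ForEach-Object { @($_) -match $Pattern } |
    Add-Content $OutputFile
```

    Batching this way cuts the number of objects sent down the pipeline, which is usually where the time goes; how much it helps on a given file is something to measure rather than assume.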

    If you add a sort, the whole thing will stall at that step, because Sort-Object must collect all of its input before it can output anything.
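
    If the sort goes after the filter, only the matching lines have to be held in memory, not the whole file. A small sketch with made-up in-memory rows and a made-up pattern standing in for the real file and values:

```powershell
# Hypothetical sample rows and pattern, used only to illustrate the ordering.
$Lines   = @('3,XYZ,30', '2,DEF,20', '1,ABC,10')
$Pattern = 'ABC|XYZ'

# Filter first, then sort: Sort-Object only buffers the lines that matched.
$Sorted = $Lines |
    Where-Object { $_ -match $Pattern } |
    Sort-Object

$Sorted
```

    This sorts on the whole line as text. Sorting on a particular column would need a key, for example a script block passed to Sort-Object that splits the line on commas.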

    For extremely large files there are third-party utilities that use special memory file techniques and are much faster than anything in PS or CMD.  Something similar can be written with .NET, but it would require advanced programming skills.
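
    Short of those techniques, a much simpler .NET route is plain streaming with System.IO.File.ReadLines and a StreamWriter, which avoids the pipeline entirely. This is not the memory-file approach mentioned above, just a rough sketch of the streaming idea. It assumes PowerShell 5 or later for the ::new() syntax, and the file names and pattern are hypothetical:

```powershell
# Hypothetical sample file; full paths matter because .NET does not follow
# PowerShell's current location.
$File       = Join-Path ([System.IO.Path]::GetTempPath()) 'stream-demo.csv'
$OutputFile = Join-Path ([System.IO.Path]::GetTempPath()) 'stream-demo-out.csv'
[System.IO.File]::WriteAllLines($File, [string[]]@('1,ABC,10', '2,DEF,20', '3,XYZ,30'))

# Hypothetical pattern; IgnoreCase mirrors the default behaviour of -match.
$Options = [System.Text.RegularExpressions.RegexOptions]::IgnoreCase
$Regex   = [regex]::new('ABC|XYZ', $Options)

$Writer = [System.IO.StreamWriter]::new($OutputFile)
try {
    # ReadLines yields one line at a time, so the file is never fully loaded.
    foreach ($Line in [System.IO.File]::ReadLines($File)) {
        if ($Regex.IsMatch($Line)) { $Writer.WriteLine($Line) }
    }
}
finally {
    # Flush and close the output file even if the loop throws.
    $Writer.Dispose()
}
```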


    \_(ツ)_/

    Monday, October 1, 2018 5:43 PM