Does anybody know why the magic number 5 is used as $NThreads in all of the PowerShell multithreading examples?

  • Question

  • I've tried $NThreads = 8 on a notebook with an 8-thread i7 and didn't get a significant difference in execution time compared with 5 threads. So where does this limitation come from?

    • Edited by Oleg Kulikov Wednesday, August 22, 2018 8:55 PM
    • Moved by Bill_Stewart Wednesday, December 12, 2018 8:16 PM Unanswerable drive-by question
    Wednesday, August 22, 2018 8:53 PM

All replies

  • What is $NThreads?  There is no such thing in PowerShell.  PowerShell is not a multithreaded host but supports threaded components.

    Perhaps you are using non-PS info to make this assumption.


    \_(ツ)_/

    Wednesday, August 22, 2018 9:22 PM
  • You are not correct: look at https://gallery.technet.microsoft.com/scriptcenter/Multi-threading-Powershell-d2c0e2e5 and http://www.get-blog.com/?p=189. My own code uses 4-5 threads to parse big files, and multithreading cut its execution time by more than a factor of 5. There are several examples of organizing multithreading in PowerShell using System.Management.Automation methods. Also, http://www.get-blog.com/?p=18 is an example that uses 20 threads. That is, there are no restrictions on the number of threads.

     

    Thursday, August 23, 2018 1:11 PM
  • The article is old and has nothing to do with threading. Jobs are not threads.  Jobs are separate processes.  You can control the number of jobs released in a batch job execution but the system generally handles this best.

    The code we use in current versions of PowerShell that does threading is called a workflow and that also optimizes best when the system runs it.

    The linked article is just a bit of misdirection.

    The blog article has also been removed or the link you posted is wrong.

    There is no such parameter as $NThreads.
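
    The distinction between jobs and threads drawn above can be checked directly. The following is a minimal sketch, not taken from the linked articles: it compares the process ID reported inside a Start-Job background job with the one reported inside a [powershell] instance. The variable names are chosen here for illustration only.

```powershell
# Process ID of the current PowerShell session
$parentPid = $PID

# Start-Job launches a separate child PowerShell process,
# so $PID inside the job is expected to differ from the parent's.
$job    = Start-Job -ScriptBlock { $PID }
$jobPid = $job | Wait-Job | Receive-Job
Remove-Job -Job $job

# A [powershell] instance runs in a runspace inside the current process,
# so $PID inside it is expected to match the parent's.
$ps          = [powershell]::Create().AddScript('$PID')
$runspacePid = $ps.Invoke()[0]
$ps.Dispose()

"Parent process : $parentPid"
"Job process    : $jobPid"
"Runspace       : $runspacePid"
```

    If the job reports a different process ID while the runspace reports the same one, that matches the claim that jobs are processes while runspaces are in-process.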


    \_(ツ)_/

    Thursday, August 23, 2018 1:31 PM
  • $NThreads is simply the number of threads used, and here is the full set of links on multithreading that I've found:

    https://gallery.technet.microsoft.com/scriptcenter/Multi-threading-Powershell-d2c0e2e5
    https://www.codeproject.com/Articles/261193/ps
    https://www.codeproject.com/Tips/895840/Multi-Threaded-PowerShell-Cookbook

    https://stackoverflow.com/questions/4016451/can-powershell-run-commands-in-parallel
    https://stackoverflow.com/questions/3325911/how-does-threading-in-powershell-work
    http://www.get-blog.com/?p=189
    https://thesurlyadmin.com/2013/02/11/multithreading-powershell-scripts/
    https://powershell.org/2015/08/20/multithreading-using-jobs/
    http://geekswithblogs.net/hroggero/archive/2017/02/22/multithreading-with-powershell-using-runspacepool.aspx

    Besides, as I've mentioned, based on the examples above I've written my own code, which runs fine and is several times faster than without multithreading: a 1.5-million-line Setupact.log is parsed in less than 4 seconds, producing a report of nearly a hundred lines. Without multithreading the same parsing took no less than 15-20 seconds.



    Thursday, August 23, 2018 8:52 PM
  • Yes you can do this:

    $MaxThreads = 8
    $RunspacePool = [RunspaceFactory]::CreateRunspacePool(1, $MaxThreads)

    $RunspacePool.Open()

    There is no variable in PowerShell "$NThreads".  To thread you need to either use a runspacepool or a threadpool and set the min and max number of threads.  There is also a "Task" and "Task Factory" which manages threads better than the other two.
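
    As a rough end-to-end illustration of the runspace pool route, here is a small sketch that queues more work items than the pool has runspaces and collects the results. The value 4, the squaring workload and the variable names are assumptions made for the example, not part of the snippet above.

```powershell
$MaxThreads = 4   # assumed value, for illustration only
$pool = [RunspaceFactory]::CreateRunspacePool(1, $MaxThreads)
$pool.Open()

# Queue 8 work items; at most $MaxThreads of them run at the same time.
$tasks = foreach ($n in 1..8) {
    $ps = [powershell]::Create().AddScript('param($x) $x * $x').AddArgument($n)
    $ps.RunspacePool = $pool
    [pscustomobject]@{ Shell = $ps; Handle = $ps.BeginInvoke() }
}

# EndInvoke blocks until the item has finished and returns its output.
$squares = foreach ($t in $tasks) {
    $t.Shell.EndInvoke($t.Handle)
    $t.Shell.Dispose()
}

$pool.Close()
$pool.Dispose()

$squares -join ', '
```

    Results are collected in submission order because the handles are read back in the order they were created, regardless of which runspace finished first.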

    See: https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task?view=netframework-4.7.2

    Optimal threading is usually around 8 for most multi-core systems.  The exact number depends on the tasks and the system.  Microsoft has processes that use 10 to over 30 threads on all systems and other processes that adjust the upper limit based on a custom calculation for the process.

    Threading and thread scheduling are not a simple pick-one proposition.

    If you have a specific example then we can be of help but, in general, the question has no specific answer.

    In PowerShell the easiest way to multithread is with a workflow:

    help about_workflow


    \_(ツ)_/


    • Edited by jrv Thursday, August 23, 2018 9:13 PM
    Thursday, August 23, 2018 9:11 PM
  • Many thanks, but I have already converted the big-file parser to multithreading by splitting the ReadAllLines buffer into 5 chunks, each of which is parsed by the same code. I've tried using 8 threads but didn't get a reasonable improvement.
    My question arose because most of the examples I used ran exactly 5 threads, so I assumed that some limitation might exist.
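
    The chunk splitting described here can be sketched as follows. This is one possible way to compute the index ranges, under the assumption that the chunks should be contiguous and as equal in size as possible; Get-ChunkRange is a hypothetical helper name and not part of the poster's script.

```powershell
# Hypothetical helper: split indexes 0..($Count - 1) into $Chunks contiguous ranges.
function Get-ChunkRange {
    param(
        [int]$Count,    # total number of lines in the buffer
        [int]$Chunks    # number of threads / chunks
    )
    $base  = [int][math]::Floor($Count / $Chunks)   # minimum chunk size
    $extra = $Count % $Chunks                        # first $extra chunks get one more line
    $from  = 0
    for ($n = 0; $n -lt $Chunks; $n++) {
        $size = $base + $(if ($n -lt $extra) { 1 } else { 0 })
        if ($size -gt 0) {
            [pscustomobject]@{ Chunk = $n; From = $from; Till = $from + $size - 1 }
        }
        $from += $size
    }
}

Get-ChunkRange -Count 13 -Chunks 5
```

    For 13 lines and 5 chunks this yields the ranges 0-2, 3-5, 6-8, 9-10 and 11-12, so every line belongs to exactly one chunk and no chunk is more than one line larger than another.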

    Friday, August 24, 2018 8:12 AM
  • Without an example of your code there is no way to know what issue you are asking about.

    \_(ツ)_/

    Friday, August 24, 2018 12:39 PM
  • Omitting many details, the logic of the multithreading is given below:

    $buff  = [System.IO.File]::ReadAllLines("$path\Setupact.log");

    ........

    $NThreads = 5;

            $Script =

            {
                Param (
                    [string]$output,
                    [int]$From,
                    [int]$Till,
                    [String[]]$Data
                )
                $need  = "***,Ini,Set,Tru,Ope,End,TI:";
                $i     = $From;
               
                foreach( $s in $Data[$From..$Till])
                {
                    $lng = $s.Length;
                    if ( $lng -gt 50 )
                    {
                        $l = $s.Substring(50);

                        # ....+....1....+....2....+....3....+....4....+....5
                        # 2016-08-12 08:17:03, Info       [0x0803b4] MIG    COnlineWinNTPlatform: ComputerName=R500D
                        # 2018-05-01 09:30:27, Info       [0x0803b8] MIG    Processing profile: C:\Users\Евгений
                        if ( "3b4,3b8".IndexOf( $s.Substring( 38,3 ) ) -ge 0 )
                        {
                            $CommObj.$output.RawData += $s;
                            $CommObj.$output.Indexes += $i;
                        }
                        elseif (  # -and and -or share one precedence level, so the -and pair is grouped explicitly
                            $l.StartsWith( "VERBOSE: OS part root (NT)" ) -or
                            ( $l.StartsWith( 'Plugin {ff9c714f' ) -and $l.Contains( "\Disk" ) ) -or
                            $l.StartsWith( '[SetupPlatform.exe] Enumerate and log' )
                        )
                        {
                            $CommObj.$output.RawData += $s;
                            $CommObj.$output.Indexes += $i;
                        }
                        elseif ( $l.StartsWith( "CSetupDiag" ) )
                        {
                            if ( $l.Contains( "SkuName" ) -or $l.Contains( "OsLanguage" ) )
                            {
                                $CommObj.$output.RawData += $s;
                                $CommObj.$output.Indexes += $i;
                            }
                        }                                           
                        elseif ( $need.IndexOf( $l.Substring( 0, 3 ) ) -ge 0 )
                        {
                            $CommObj.$output.RawData += $s;
                            $CommObj.$output.Indexes += $i;
                        }
                    } # end if ( $lng -gt 50 )
                    ++$i;
                }; # end foreach
            } # end of script block

        $CommObj =
        @{
            output0 = @{RawData=@(); Indexes=@()};
            output1 = @{RawData=@(); Indexes=@()};
            output2 = @{RawData=@(); Indexes=@()};
            output3 = @{RawData=@(); Indexes=@()};
            output4 = @{RawData=@(); Indexes=@()};
            output5 = @{RawData=@(); Indexes=@()};
            output6 = @{RawData=@(); Indexes=@()};
            output7 = @{RawData=@(); Indexes=@()};
        }
        $sessionState = [System.Management.Automation.Runspaces.InitialSessionState]::CreateDefault();
        $sessionstate.Variables.Add((New-Object -TypeName System.Management.Automation.Runspaces.SessionStateVariableEntry `
        -ArgumentList "CommObj", $CommObj, "object" ) );
      
        # Create runspace pool consisting of $NThreads runspaces
        $RunspacePool = [RunspaceFactory]::CreateRunspacePool( 1, $NThreads, $sessionState, $Host );
        $RunspacePool.Open();
        $Jobs   = @();
        $chunkl = [math]::Round( $bfl / $NThreads ); # $bfl = $buff.Length;
        $from   = 0;
        for( $n = 0; $n -lt $NThreads; $n++ )
        {
            $size = $chunkl;
            if ( $n -eq 0 )
            {
                $size = $bfl - ( $NThreads - 1 ) * $chunkl;
            }
            $output = "output$n";
            $till   = $from + $size - 1;
            $Job    = [powershell]::Create().AddScript($Script);
            $Job    = $Job.AddParameter("output",$output).AddParameter("From",$from).AddParameter("Till",$till).AddParameter("Data",$buff );
            $Job.RunspacePool = $RunspacePool;
            $Jobs += New-Object PSObject -Property @{
                RunNum = $n;
                Job = $Job;
                Result = $Job.BeginInvoke();
            }
            $from += $size;
        }
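        # Poll until every runspace has finished
        do
        {
            Start-Sleep -Milliseconds 100;
        } while ( $Jobs.Result.IsCompleted -contains $false ) # $Jobs.Result is a collection
        # Release the PowerShell instances and the pool once the work is done
        $Jobs | ForEach-Object { $null = $_.Job.EndInvoke( $_.Result ); $_.Job.Dispose() };
        $RunspacePool.Close();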
        $RawData = @();
        $Indexes = @();
        1..$NThreads | ForEach-Object {
            $n = $_ - 1;
            $key = "output$n";
            $RawData += $CommObj.$key.RawData;
            $Indexes += $CommObj.$key.Indexes;
        }

    $RawData contains no more than 2000 lines and a lot of unnecessary information. This array is then parsed thoroughly, but as the number of lines is negligible, that takes milliseconds. The durations of the different steps, in milliseconds, are given below:

    Best duration 2858, current 3155
    Get-BuildProperties         :    15
    ReadAllLines                :   550
    Get-Hardware                :   179
    Get-OOBE-boot               :     1
    Get-MicrosoftAccount        :    91
    MainLoop                    :  1875    <----- multithreaded collection of the RawData
    RawDataParsing              :    73    <----- thorough parsing of the RawData
    UnattendGC                  :   124
    Get-ReportingEvents         :   173
    Get-SystemEvents            :    52
    ReparseEvents               :     0

    And here is the main result of the task:

    --- Installation brief description ------------------------------------------------------------
    Download                    : 2018-08-17 00:12:26  2018-08-17 00:36:56  24 minutes 30 seconds
    Installing 0-100%           : 2018-08-17 00:27:39  2018-08-17 00:55:05  27 minutes 26 seconds
    Step  0-30  %               : 2018-08-17 00:55:50  2018-08-17 01:03:29   7 minutes 39 seconds
    Step 30-75  %               : 2018-08-17 01:03:29  2018-08-17 01:06:55   3 minutes 26 seconds
    Step 75-100 %               : 2018-08-17 01:06:55  2018-08-17 01:12:04   5 minutes  9 seconds
    Setup Logging               : 2018-08-17 00:17:15  2018-08-17 01:12:04  54 minutes 49 seconds
    Upgrade Duration            : 2018-08-17 00:12:26  2018-08-17 01:12:04  59 minutes 38 seconds
    1st reboot                  : 2018-08-17 00:55:50  2018-08-17 00:56:19  29.000 seconds
    2nd reboot                  : 2018-08-17 01:03:29  2018-08-17 01:03:42  13.000 seconds
    3rd reboot                  : 2018-08-17 01:06:55  2018-08-17 01:07:07  12.000 seconds
    OOBE boot                   : 2018-08-17 01:07:23  2018-08-17 01:09:01   1 minutes 38 seconds
    Updated, Installed          : 2018-08-17 00:42:46  2018-08-17 01:09:35  26 minutes 49 seconds
    Offline Installation        : 2018-08-17 00:55:50  2018-08-17 01:09:01  13 minutes 11 seconds

    Script continued            : 3155 milliseconds (15:00:45.713,15:00:48.869)
    Setupact.log lines read     : 585030
    System Events analyzed      : 30

    But the full report contains more than a hundred lines and includes the hardware description read from the log, a description of the storage devices, a lot of OS properties and so on, plus all of the converted RawData lines, which EXPLAIN the durations in the table of Upgrade process steps. The Get-Hardware function, which collects most of the hardware and system info, also uses 4 threads to parse the 8000 lines at the beginning of the log; this minimizes the number of comparisons in the 5-thread Main Loop.



    Friday, August 24, 2018 4:51 PM
  • Ok.  There is no such variable in PowerShell.  You are creating that variable in your script.

    Yes, a runspace pool can be set to a minimum and maximum number of threads.  On a single-core system 5 may be optimal, but that depends on the scripts and the resources of the system.

    In general, scripts that do no IO or system calls will be less efficient when asking for more threads. Scripts that do large amounts of IO will be more efficient with more threads.  There is no specific formula for this.  On average the number of cores sets a simple limit.


    \_(ツ)_/

    Friday, August 24, 2018 4:58 PM
  • You might be interested to know that I have an I7G4 quad with 16Gb and an SSD.  I can set up a runspace pool with 8 or more threads and it will run about 20% to 30% faster than 5 threads.  Some tasks I have run have improved to a measurable degree up to 16 maximum threads. 

    I have an i5 dual-core with 8Gb that works best at 5 but shows no improvement beyond 5 and is slower at 8 and more threads.

    Also consider that the thread pool can be dynamically adjusted so a process can be self tuning.
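
    Assuming the runspace pool is the pool meant here, its upper limit can be changed while it is open. The sketch below only demonstrates the resize calls; the values 5 and 8 are taken from this thread, and any self-tuning logic that decides when to resize is left out.

```powershell
$pool = [RunspaceFactory]::CreateRunspacePool(1, 5)
$pool.Open()

$before  = $pool.GetMaxRunspaces()

# SetMaxRunspaces returns a Boolean indicating whether the limit was changed.
$changed = $pool.SetMaxRunspaces(8)
$after   = $pool.GetMaxRunspaces()

$pool.Close()
$pool.Dispose()

"Max runspaces before: $before, after: $after (changed: $changed)"
```

    A self-tuning script could call SetMaxRunspaces between batches, based on its own timing measurements.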

    In computing there is almost never a "one size fits all" rule.  "5" will give a good boost on any system but will not be optimal on all.


    \_(ツ)_/

    Friday, August 24, 2018 5:28 PM
  • Well, in my case (an i7 with 8 hardware threads, 32Gb RAM, NVMe) the execution time is only a few seconds and depends strongly on system activity, so a difference of 100-400 milliseconds in execution time is not a big deal. I need ANOTHER task with at least a minute of execution time to see the difference between 5 and 8 threads.
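
    One way to make such a comparison is a small timing harness that runs the same workload through pools of different sizes. The following is a sketch under stated assumptions: Measure-PoolRun is a hypothetical helper, the CPU-bound loop is an artificial stand-in for the real parser, and the item count and loop length are arbitrary and would need to be scaled up for a meaningful measurement.

```powershell
# Hypothetical helper: time $Items identical work items on a pool of $MaxThreads runspaces.
function Measure-PoolRun {
    param([int]$MaxThreads, [int]$Items = 16)

    # Artificial CPU-bound workload standing in for the real parsing code
    $work = '$sum = 0; foreach ($i in 1..200000) { $sum += $i % 7 }; $sum'

    $pool = [RunspaceFactory]::CreateRunspacePool(1, $MaxThreads)
    $pool.Open()
    $sw = [System.Diagnostics.Stopwatch]::StartNew()

    $tasks = foreach ($n in 1..$Items) {
        $ps = [powershell]::Create().AddScript($work)
        $ps.RunspacePool = $pool
        [pscustomobject]@{ Shell = $ps; Handle = $ps.BeginInvoke() }
    }
    foreach ($t in $tasks) {
        $null = $t.Shell.EndInvoke($t.Handle)   # wait for this item to finish
        $t.Shell.Dispose()
    }

    $sw.Stop()
    $pool.Close()
    $pool.Dispose()
    [pscustomobject]@{ MaxThreads = $MaxThreads; Milliseconds = $sw.ElapsedMilliseconds }
}

# Compare 1 thread, the "magic" 5, and the number of logical processors.
$sizes   = 1, 5, [Environment]::ProcessorCount | Sort-Object -Unique
$results = foreach ($n in $sizes) { Measure-PoolRun -MaxThreads $n }
$results | Format-Table -AutoSize
```

    Repeating each measurement several times and comparing the best runs would reduce the influence of background system activity on the numbers.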


    Saturday, August 25, 2018 6:23 AM