Windows HPC Server Job Submission and Scheduling ForumThis forum covers the Windows HPC Job Scheduler, including questions about Job Submission, Job Scheduling, and Job Activation on both Windows Compute Cluster Server 2003 and Windows HPC Server 2008.© 2009 Microsoft Corporation. All rights reserved.Tue, 24 Nov 2009 10:14:19 Z5f044828-7c80-40af-b96a-0f99be6da51fhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/3d71def2-3a9b-438a-8353-f7e34959368fhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/3d71def2-3a9b-438a-8353-f7e34959368fVolkiehttp://social.microsoft.com/Profile/en-US/?user=VolkieExit code 0Hi there,<br/> <br/> we are trying to set up a testing environment with a Windows HPC 2008 cluster: three compute nodes (for now) and a head node. All nodes have the evaluation licence key.<br/> All nodes are shown in the HPC Manager/Node Management with Online and Health OK state.<br/> As a test program we wanted to use the batchpi.exe from the files section, so we made a new job.<br/> Job Details: Name = TEST1, Template = Default, Project = TEST, Priority = Normal, RunOptions = nothing checked,<br/> Resources = Node/Auto/Auto<br/> Task List: My Task, c:\batchpi.exe 1000000000, all folders are set to c:\<br/> Resource Selection: Run this job only on nodes 1, 2, 3, not the head node<br/> &lt;Submit&gt;<br/> The job starts on the nodes and ends a second later with a failed job message: Task failed during execution with exit code 0. Please check task's output for error details.<br/> <br/> From the command line (cmd) it runs without problems. We also tried to run the job with mpiexec -n * c:\batchpi.exe 1000000000 and copied the batchpi.exe to the compute nodes, but nothing helps :(<br/> <br/> What are we doing wrong? 
Please help.<br/> <br/> Regards<br/> Volkie<br/>Mon, 23 Nov 2009 12:51:19 Z2009-11-24T10:14:19Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/999eaff0-3fcf-4f33-b2c7-620d16f1ef08http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/999eaff0-3fcf-4f33-b2c7-620d16f1ef08Johannes_dehttp://social.microsoft.com/Profile/en-US/?user=Johannes_deJob properties not changing when submitted from saved .xml file Hi,<br/> <br/> <br/> Do the following to reproduce the behaviour:<br/> <br/> <ol> <li>You create a job and save it to an xml-file.</li> <li>You create another job from that xml-file.</li> <li>In  job details -&gt; job run options  you change the maximum run time. </li> </ol> <br/> Bug:<br/> The change is completely ignored when submitting the job.<br/> The change is completely ignored when saving the job to an xml file.<br/> We suppose that there are more bugs like these, and that they have a common source.<br/> <br/> <br/> Regards,<br/> <br/> Johannes<br/><hr class="sig">JHMon, 23 Nov 2009 07:13:41 Z2009-11-24T06:32:25Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/838e0c16-0be4-42f4-be45-f2b6bdf3282ehttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/838e0c16-0be4-42f4-be45-f2b6bdf3282eJohannes_dehttp://social.microsoft.com/Profile/en-US/?user=Johannes_deWhich jobs query the activation filterHi,<br/> <br/> it was my understanding that the activation filter gets called for every job being in the status queued.<br/> That way I could implement fairshares, priorities  and of course licensing issues.<br/> However I just found out with my first own activation filter, that only the job with the highest priority of a user is queried by the activation filter.<br/> <br/> Is there a way to change these settings and to issue a call to the activation filter for each and every job which is in the state &quot;scheduled&quot;?<br/> <br/> Johannes<hr class="sig">JHMon, 23 Nov 2009 11:32:59 
Z2009-11-24T06:31:34Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/90d93608-d533-411d-9d85-4b7a1a9d3d86http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/90d93608-d533-411d-9d85-4b7a1a9d3d86Johannes_dehttp://social.microsoft.com/Profile/en-US/?user=Johannes_deNodeGroup not changed if other JobTemplate chosenThe following actions show an unexpected and unwanted behaviour, at least in my opinion:<br/> <br/> <ol> <li>Define a job template (Template0) which <strong>requires</strong> a NodeGroup. Let's say TestGroup0.</li> <li>Define another template (Template1) which <strong>requires</strong> another NodeGroup. Let's say TestGroup1.</li> <li>Create a job with Template0 and save it to an xml-file.</li> <li>Create a job from this xml-file and change the job template to Template1. </li> </ol> <br/> Bug symptoms:<br/> You can't submit the job because of error:<br/> &quot;Job template validation failed: The value of property NodeGroups<br/> is out of range. Update the job an try again.&quot;<br/> <br/> Bug symptoms continuing:<br/> You have a look at Resource Selection where you will find TestGroup0<br/> in the selected node groups window.<br/> <br/> Bug:<br/> The user interface is updated, but the data in the job file is not.<br/> <br/> Workaround in this case:<br/> 5. Add another node group to the selected node groups,<br/> and remove it again in the job properties.<br/> <br/> 6. The job can be submitted.<br/> <br/> <br/> In my opinion this is a bug.  I want to have different queues implemented by job templates. So e.g. 
a user tests the job properties with a quick run on my short runtime queue and then resubmits the same job on the long running queue with the intended setup.<br/> <br/> Regards,<br/> <br/> Johannes<hr class="sig">JHMon, 23 Nov 2009 06:41:28 Z2009-11-23T06:41:29Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/8dd7541d-97dc-47ae-96d5-eab108668b22http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/8dd7541d-97dc-47ae-96d5-eab108668b22pbrowntwihttp://social.microsoft.com/Profile/en-US/?user=pbrowntwiHPC 2008 job suspending/resumingI am used to using SGE on a Linux cluster with Abaqus. There, when a higher-priority job needs to be added, any existing jobs have to be suspended with a specific command so that Abaqus frees its FlexLM tokens.<br/> <br/> Is it possible to do this with HPC, or do jobs have to run to completion? Our problem is that some jobs take 2 days to complete, and to use Abaqus tokens efficiently we need to be able to suspend and resume them as smaller jobs enter the queue.<br/> <br/> Thanks<br/> <br/> PaulSat, 07 Nov 2009 09:23:05 Z2009-11-19T19:28:14Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/99933d51-d99c-4500-9961-a55c566acc1bhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/99933d51-d99c-4500-9961-a55c566acc1bvancloud_gaohttp://social.microsoft.com/Profile/en-US/?user=vancloud_gaohow to hide "Remember this password?" 
when using the job submit command?Hi all,<br/>I have a problem: when I use the &quot;job submit&quot; command, it shows the &quot;Remember this password?&quot; prompt, which blocks the command, since I run it remotely from a web site.<br/>Is there any way to hide this prompt, or to set a default answer for it?<br/><br/>thanks,Thu, 19 Nov 2009 12:25:31 Z2009-11-19T19:11:06Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/707871e8-8c5c-43bd-8e05-4b4b70a84e0fhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/707871e8-8c5c-43bd-8e05-4b4b70a84e0fSeifer Linhttp://social.microsoft.com/Profile/en-US/?user=Seifer%20Linx64 program using the IScheduler API in HPC Pack 2008 SDK SP1 got errors<p>Hi:<br/><br/>Below is my program. My platform is Windows Server 2008 Enterprise edition + HPC Pack 2008 SP1 (x64) <br/>For an x86 build of this program, everything works fine!<br/>But for an x64 build, I get an error at pJob-&gt;put_Name()<br/>The error is: job-&gt;put_Name() failed with 0x80040232.<br/>Can anyone help? Thanks very much!<br/>/////////////////////////////////////////////////////////////////////////////////<br/><br/>#define _WIN32_DCOM</p> <p>#include &lt;windows.h&gt;<br/>#include &lt;stdio.h&gt;<br/>#include &lt;comutil.h&gt;<br/>#pragma comment(lib, &quot;comsupp.lib&quot;)</p> <p>// The Microsoft.Hpc.Scheduler.tlb and Microsoft.Hpc.Scheduler.Properties.tlb type<br/>// libraries are included in the Microsoft HPC Pack 2008 SDK. The type libraries are<br/>// located in the &quot;Microsoft HPC Pack 2008 SDK\Lib\i386&quot; or \amd64 folder. 
Include the rename <br/>// attributes to avoid name collisions.<br/>#import &lt;Microsoft.Hpc.Scheduler.tlb&gt; named_guids no_namespace raw_interfaces_only \<br/>    rename(&quot;SetEnvironmentVariable&quot;,&quot;SetHpcEnvironmentVariable&quot;) \<br/>    rename(&quot;AddJob&quot;, &quot;AddHpcJob&quot;)<br/>#import &lt;Microsoft.Hpc.Scheduler.Properties.tlb&gt; named_guids no_namespace raw_interfaces_only</p> <p>int main(int argc, char **argv)<br/>{<br/>    CoInitializeEx(NULL, COINIT_MULTITHREADED);<br/>    HRESULT hr = S_OK;<br/>    IScheduler *pScheduler = NULL;<br/>    // Get an instance of the Scheduler object. <br/>    hr = CoCreateInstance(__uuidof(Scheduler), // CLSID_Scheduler, <br/>                          NULL,<br/>                          CLSCTX_INPROC_SERVER,<br/>                          __uuidof(IScheduler), // IID_IScheduler, <br/>                          reinterpret_cast&lt;void **&gt;(&amp;pScheduler));<br/>    if(FAILED(hr))<br/>    {<br/>        wprintf(L&quot;CoCreateInstance() failed with 0x%x.\n&quot;, hr);<br/>        if(pScheduler)<br/>        {<br/>            pScheduler-&gt;Release();<br/>        }<br/>        exit(-1);<br/>    }<br/>    <br/>    hr = pScheduler-&gt;Connect(_bstr_t(&quot;localhost&quot;));<br/>    if(FAILED(hr))<br/>    {<br/>        wprintf(L&quot;Connect() failed with 0x%x.\n&quot;, hr);<br/>        pScheduler-&gt;Release();<br/>        exit(-1);<br/>    }<br/>    <br/>    ISchedulerJob *pJob = NULL;<br/>    hr = pScheduler-&gt;CreateJob(&amp;pJob);<br/>    if(FAILED(hr))<br/>    {<br/>        wprintf(L&quot;CreateJob() failed with 0x%x.\n&quot;, hr); fflush(stdout);<br/>        pScheduler-&gt;Release();<br/>        exit(-1);<br/>    }</p> <p>    hr = pJob-&gt;put_Name(_bstr_t(&quot;MyHPCJob&quot;));<br/>    if(FAILED(hr))<br/>    {<br/>        wprintf(L&quot;job-&gt;put_Name() failed with 0x%x.\n&quot;, hr); fflush(stdout);<br/>        pJob-&gt;Release();<br/>        pScheduler-&gt;Release();<br/>        
exit(-1);<br/>    }</p> <p>    ISchedulerTask *pTask = NULL;<br/>    hr = pJob-&gt;CreateTask(&amp;pTask);<br/>    if(FAILED(hr))<br/>    {<br/>        wprintf(L&quot;CreateTask() failed with 0x%x.\n&quot;, hr); fflush(stdout);<br/>        pJob-&gt;Release();<br/>        pScheduler-&gt;Release();<br/>        exit(-1);<br/>    }</p> <p>    hr = pTask-&gt;put_Name(_bstr_t(&quot;MyHPCTask&quot;));<br/>    if(FAILED(hr))<br/>    {<br/>        wprintf(L&quot;task-&gt;put_Name() failed with 0x%x.\n&quot;, hr); fflush(stdout);<br/>        pTask-&gt;Release();<br/>        pJob-&gt;Release();<br/>        pScheduler-&gt;Release();<br/>        exit(-1);<br/>    }</p> <p>    hr = pTask-&gt;put_CommandLine(_bstr_t(&quot;hostname&quot;));<br/>    if(FAILED(hr))<br/>    {<br/>        wprintf(L&quot;put_CommandLine() failed with 0x%x.\n&quot;, hr); fflush(stdout);<br/>        pTask-&gt;Release();<br/>        pJob-&gt;Release();<br/>        pScheduler-&gt;Release();<br/>        exit(-1);<br/>    }</p> <p>    hr = pJob-&gt;AddTask(pTask);<br/>    if(FAILED(hr))<br/>    {<br/>        wprintf(L&quot;AddTask() failed with 0x%x.\n&quot;, hr); fflush(stdout);<br/>        pTask-&gt;Release();<br/>        pJob-&gt;Release();<br/>        pScheduler-&gt;Release();<br/>        exit(-1);<br/>    }</p> <p>    hr = pScheduler-&gt;SubmitJob(pJob, _bstr_t(argv[1]), _bstr_t(argv[2]));<br/>    if(FAILED(hr))<br/>    {<br/>        wprintf(L&quot;SubmitJob() failed with 0x%x.\n&quot;, hr);<br/>        pTask-&gt;Release();<br/>        pJob-&gt;Release();<br/>        pScheduler-&gt;Release();<br/>        exit(-1);        <br/>    }<br/><br/>    pTask-&gt;Release();<br/>    pJob-&gt;Release();<br/>    pScheduler-&gt;Release();</p> <p>    return 0;<br/>}</p>Tue, 27 Oct 2009 10:30:35 
Z2009-11-19T18:56:26Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4f70e67c-1c1e-4c2b-bc68-787e350ab27bhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4f70e67c-1c1e-4c2b-bc68-787e350ab27bvancloud_gaohttp://social.microsoft.com/Profile/en-US/?user=vancloud_gaoDoes HPC 2008 support a network drive as the working directory?Hi all,<br/>Does HPC 2008 support a network drive as the working directory? I mapped a network drive as Z (<a>\\host\sharedname</a>) and set it as the working directory when submitting a job, but HPC 2008 threw an exception saying that this working directory cannot be found.<br/><br/>thanksTue, 03 Nov 2009 05:30:18 Z2009-11-19T01:19:35Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4953d615-291a-4fa4-a0f0-e3b74920cb2ahttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4953d615-291a-4fa4-a0f0-e3b74920cb2aGarapa1http://social.microsoft.com/Profile/en-US/?user=Garapa1Running HPC Job Manager/Cluster Manager from outside AD domainI'm trying to use Windows HPC 2008 on a small cluster. My strength is not Windows administration, but rather development.<br/> (This was the first time I have set up an AD domain, and I know nearly nothing about it.)<br/> <br/> To create the cluster I did have to configure an Active Directory domain, which went fine, but now when I try to run Job Manager or Cluster Manager from my development machine, which is configured for a workgroup, not the domain, I get the following error:<br/> &quot;HPC Job Manager: The server has rejected the client credentials.&quot;<br/> <br/> I suspect this may be related to the fact that my dev. 
box is not part of the domain.<br/> <br/> Is there a way to get this to work without forcing my development box to join the cluster's domain?<br/> <br/> <br/> Thank you,<br/> Cameron<br/> PS<br/> Is this the best forum to ask these types of questions?Tue, 27 Oct 2009 18:43:44 Z2009-11-19T01:17:39Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/2b87f5af-e4c5-49eb-97a6-7e238252ed5fhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/2b87f5af-e4c5-49eb-97a6-7e238252ed5fMarkUMNhttp://social.microsoft.com/Profile/en-US/?user=MarkUMNUsing Kerberos Protocol Transition to submit jobs on behalf of usersHi Guys,<br/> <br/> We are writing software that interacts with HPC Server and we would like to be able to submit jobs on behalf of users without knowing their password.  So far the two options that appear to be viable are to either store a list of the user passwords on the submission node (not particularly secure!), or attempt to use Kerberos protocol transition as documented here:<br/> <br/> http://technet.microsoft.com/en-us/library/cc739587(WS.10).aspx<br/> <br/> Has anyone ever tried this?  
Any thoughts or suggestions?<br/> <br/> Thanks,<br/> MarkWed, 28 Oct 2009 21:23:09 Z2009-11-19T01:15:22Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/0272461f-6d58-4b7a-aafe-666568d50188http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/0272461f-6d58-4b7a-aafe-666568d50188Don Patteehttp://social.microsoft.com/Profile/en-US/?user=Don%20PatteeWindows HPC Server 2008 R2 Beta 1 now available<p><strong>Our first Beta release is now available!</strong> You can read the full press release at <a href="http://www.microsoft.com/presspass/press/2009/nov09/11-16SC09PR.mspx">http://www.microsoft.com/presspass/press/2009/nov09/11-16SC09PR.mspx</a> if you're into reading that kind of thing ;)</p> <p>Windows HPC Server 2008 R2 delivers productivity, performance and ease-of-use improvements in several areas, including the following:</p> <ul> <li>Improved scalability, with Windows HPC Server 2008 R2 offering out-of-the-box support for deploying, running and managing clusters up to 1,000 nodes</li> <li>New configuration and deployment options such as diskless boot, mixed-version clusters and support for a remote head node database</li> <li>Improved system management, diagnostics and reporting including an enhanced heat map, multiple customizable tabs, an extensible diagnostic framework and the ability to create richer custom reports</li> <li>Improved support for service-oriented architecture (SOA) workloads including a new fire-and-recollect programming model, finalization hooks, improved Java interoperability, automatic restart and failover of broker nodes, and improved management, monitoring, diagnostics and debugging</li> <li>Message Passing Interface (MPI) and networking enhancements including optimizations for new processors, enhanced support for RDMA over Ethernet and InfiniBand, improved MPI debugging, and a pushbutton HPC LINPACK optimization wizard</li> <li>New ways to accelerate Microsoft Office Excel workbooks such as support for 
Cluster-Aware User-Defined Functions and the capability to run distributed Excel 2010 for the cluster</li> </ul> <p><strong>Come and join our beta program to give it a try, and you can give us feedback (positive or negative) on it: </strong><a href="http://connect.microsoft.com/HPC/content/content.aspx?ContentID=6923"><strong>http://connect.microsoft.com/HPC/content/content.aspx?ContentID=6923</strong></a><br/>(posting in the 3 main forums, sorry for the spam if you read all of them :) )</p>Tue, 17 Nov 2009 00:16:19 Z2009-11-17T00:16:19Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/74654eab-d236-4814-8528-468facd77aeehttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/74654eab-d236-4814-8528-468facd77aeeBrian Grangerhttp://social.microsoft.com/Profile/en-US/?user=Brian%20GrangerBug in XML parsing of Job Description filesHi,<br/> <br/> I am using Python to write job description files (the .xml files).  Overall this is working really well...<br/> <br/> But I am finding that the XML parser that is used in the HPC Job Manager/job scheduler has a problem with the ordering of XML attributes in the Job tag.<br/> <br/> Basically, if AutoCalculateMax=&quot;true&quot; AutoCalculateMin=&quot;true&quot; are put too early in the attribute list, they don't do anything.  I found this out because <br/> the Python XML writer orders the attributes alphabetically, so these attributes are put first (they start with &quot;A&quot;).  Below, I have included an example<br/> that shows this behavior.  Just import the XML file into the Job Manager and you will see that it is NOT auto-calculating the min/max.<br/> <br/> I have heard of XML readers/writers that don't preserve attribute order in round trips (parse, then write), but this is because these tools use something like an unordered data<br/> structure internally for the attributes.  But I have never heard of one whose parsing itself depends on the attribute order.  
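A possible workaround on the writing side, as a minimal Python sketch: build the opening Job tag by hand so the attributes are emitted in a chosen order rather than alphabetically. The helper name `job_open_tag` and the particular order used here are illustrative assumptions. Whether placing AutoCalculateMax/AutoCalculateMin after the other attributes is enough for the Job Manager to honor them is only what the behavior described above suggests; it has not been verified.

```python
from xml.sax.saxutils import quoteattr


def job_open_tag(attrs):
    """Render a <Job ...> opening tag, keeping attributes in the order given.

    attrs is a list of (name, value) pairs; a list is used instead of a dict
    so the order is explicit on every Python version.
    """
    parts = [f"{name}={quoteattr(value)}" for name, value in attrs]
    return "<Job " + " ".join(parts) + ">"


# Hypothetical ordering: the AutoCalculate* attributes are placed last
# instead of first (which is where alphabetical sorting would put them).
attrs = [
    ("Version", "2.000"),
    ("Name", "IPCluster"),
    ("UnitType", "Core"),
    ("MinCores", "1"),
    ("MaxCores", "1"),
    ("AutoCalculateMax", "true"),
    ("AutoCalculateMin", "true"),
]

tag = job_open_tag(attrs)
print(tag)
```

`quoteattr` takes care of quoting and escaping the values, so UNC paths or names containing `&` or quotes can be passed through unchanged.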
Can someone look into this issue?<br/> <br/> Cheers,<br/> <br/> Brian<br/> <br/> &lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot;?&gt;<br/> &lt;Job AutoCalculateMax=&quot;true&quot; AutoCalculateMin=&quot;true&quot; Version=&quot;2.000&quot; Name=&quot;IPCluster&quot; UnitType=&quot;Core&quot; MaxCores=&quot;1&quot; MaxNodes=&quot;1&quot; MaxSockets=&quot;1&quot; MinCores=&quot;1&quot; MinNodes=&quot;1&quot; MinSockets=&quot;1&quot; RunUntilCanceled=&quot;false&quot; IsExclusive=&quot;false&quot; UserName=&quot;GNET\bgranger&quot; JobType=&quot;Batch&quot; Priority=&quot;Highest&quot; Project=&quot;IPython&quot; Owner=&quot;GNET\bgranger&quot; xmlns=&quot;http://schemas.microsoft.com/HPCS2008/scheduler/&quot;&gt;<br/>     &lt;Dependencies/&gt;<br/>     &lt;Tasks&gt;<br/>         &lt;Task CommandLine=&quot;\\blue\domainusers$\bgranger\Python\Python25\Scripts\ipcontroller.exe --log-to-file -p default --log-level 10&quot; IsParametric=&quot;false&quot; IsRerunnable=&quot;true&quot; MaxCores=&quot;1&quot; MaxNodes=&quot;1&quot; MaxSockets=&quot;1&quot; MinCores=&quot;1&quot; MinNodes=&quot;1&quot; MinSockets=&quot;1&quot; StdErrFilePath=&quot;controller-err.txt&quot; StdOutFilePath=&quot;controller-out.txt&quot; TaskName=&quot;Controller&quot; UnitType=&quot;Core&quot; WorkDirectory=&quot;\\blue\domainusers$\bgranger\.ipython\cluster_default&quot;&gt;<br/>             &lt;EnvironmentVariables&gt;<br/>                 &lt;Variable&gt;<br/>                     &lt;Name&gt;PYTHONPATH&lt;/Name&gt;<br/>                     &lt;Value&gt;\\blue\domainusers$\bgranger\Python\Python25\Lib\site-packages&lt;/Value&gt;<br/>                 &lt;/Variable&gt;<br/>             &lt;/EnvironmentVariables&gt;<br/>         &lt;/Task&gt;<br/>     &lt;/Tasks&gt;<br/> &lt;/Job&gt;Tue, 10 Nov 2009 06:02:52 
Z2009-11-19T01:20:18Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/368dbecb-75eb-4906-a1e7-e857e038e04chttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/368dbecb-75eb-4906-a1e7-e857e038e04cJamieBradfordhttp://social.microsoft.com/Profile/en-US/?user=JamieBradfordJob cancellation and process cleanup<p>I am using Ansoft HFSS v12 on an HPC Server 2008 cluster and have found that if I cancel a running job, the running process is killed and is thus unable to clean up after itself.  HFSS has an option to do a 'clean stop', which allows the application to complete running tasks, clean up lock files and other files related to the job, and flush any pending data to the result set UNC before closing.  This is sometimes necessary if, during the run, a solve on a particular frequency or set of frequencies fails to converge on a solution, meaning the rest of the run is wasted as the model needs modification.  Without this flushed results data, however, it becomes very difficult to track down the source of the problem.<br/><br/>Unfortunately, when HPC Server kills a job, it kills the process(es) involved in a way that prevents this sort of cleanup. I've been told this is just the way it is, but I'm wondering if anyone has seen this issue and done any work to find a way around it.<br/><br/>Thanks in advance!<br/><br/>Jamie</p>Fri, 30 Oct 2009 22:08:07 Z2009-11-17T00:23:24Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4459d5aa-1e27-4e7b-9081-48a1077f189bhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/4459d5aa-1e27-4e7b-9081-48a1077f189bJ. Scott Millerhttp://social.microsoft.com/Profile/en-US/?user=J.%20Scott%20MillerScheduling DB-bound tasksIn addition to per-node and per-core, computationally intensive tasks, our development group would like to use the HPC cluster to schedule jobs that involve significant database usage. 
While these jobs are not high-performance in the sense of computation, we would still like to schedule and manage these tasks on our cluster. Because the bottleneck is not local, we would like the tasks that compose each job to execute concurrently. Is there a way to schedule a job such that more than one of its tasks can execute on the same core?Wed, 28 Oct 2009 20:50:24 Z2009-11-19T01:19:56Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/52a7c38d-e53f-4028-a3c5-1ca42c3b6052http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/52a7c38d-e53f-4028-a3c5-1ca42c3b6052Brian Grangerhttp://social.microsoft.com/Profile/en-US/?user=Brian%20GrangerHow are tasks killed when you cancel a job?Hi,<br/> <br/> I have a long-running process (run as a task inside a job submitted to the scheduler) that installs some signal handlers (I know that Windows doesn't have signals like POSIX does, but I am using Python, which abstracts this in a way that makes sense).  These signal handlers make sure the process cleans up after itself.  When I run my process by hand from a regular cmd.exe session, the signal handlers get called when I press Control-C.  But when the process is running on a compute node as a task and I stop the process using &quot;Cancel Job&quot;, the task exits without the signal handlers being called.  
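For reference, the kind of handler setup described above looks roughly like the following minimal Python sketch. The names `cleanup` and `cleaned_up` are hypothetical, and the sketch only shows how handlers are registered for Control-C (and Control-Break where the platform defines `SIGBREAK`). It does not make the handlers run when the scheduler cancels a job.

```python
import signal

# Records which signals triggered cleanup (for illustration only).
cleaned_up = []


def cleanup(signum, frame):
    """Hypothetical cleanup hook: remove lock files, flush logs, and so on."""
    cleaned_up.append(signum)


# Control-C in a console session is delivered to Python as SIGINT.
signal.signal(signal.SIGINT, cleanup)

# SIGBREAK (Control-Break) is only defined on Windows builds of Python.
if hasattr(signal, "SIGBREAK"):
    signal.signal(signal.SIGBREAK, cleanup)
```

A process that is terminated forcibly never gets the chance to run Python-level handlers like these, which would be consistent with the symptom described in the post.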
This leaves a huge mess each time the process runs that I have to clean up.<br/> <br/> So, questions:<br/> <br/> * How exactly does the scheduler stop processes on compute nodes?<br/> * Can this be changed to basically mimic what a Control-C does (this is like a SIGINT on POSIX)?<br/> <br/> Cheers,<br/> <br/> BrianSun, 08 Nov 2009 22:36:49 Z2009-11-19T01:19:44Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/e6941991-bc27-41c6-8f48-da266eeccff9http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/e6941991-bc27-41c6-8f48-da266eeccff9Brian Grangerhttp://social.microsoft.com/Profile/en-US/?user=Brian%20GrangerPython and Paths: UNC vs mapped drivesHi,<br/> <br/> I have read through the previous threads about some of the issues related to UNC paths, Python, etc.  This is turning out to be a huge problem for us.  We are trying to enable an open source Python project (IPython: ipython.scipy.org) to work with the scheduler.  In particular we have a command line program that we want to use to submit parallel Python jobs for users (it is actually more complicated than that, but from the scheduler perspective, that is it).  I want to make sure I understand the issues related to paths...<br/> <br/> Are the following correct descriptions?<br/> <br/> * It looks like you have to use a UNC path for the working directory, rather than a mapped drive.<br/> <br/> * But cmd.exe doesn't accept this, so it defaults the working dir to C:\Windows.<br/> <br/> * But the environment variables get set correctly (%CCP_WORKDIR%).<br/> <br/> * But much of Python cannot handle UNC directories.  I am still trying to figure out what parts can and can't handle UNC paths, but it looks like this might be a huge issue for us.<br/> <br/> * It looks like the %PATH% environment variable is completely ignored when finding executables run by tasks.  
This, in combination with the ban on mapped drives, means that if an executable is on a shared drive, you have to give its UNC path at the command line:  \\server\share\program.exe<br/> <br/> * Are drives simply never mapped on the compute nodes?  From my tests, it looks like sometimes they are.  For example, sometimes my Python script can do os.chdir(r'z:\\documents') on a compute node.  Sometimes not.  Am I just asking for trouble here?<br/> <br/> * We need a robust UNC-based method of finding a user's home directory.  Currently we use %homedrive%%homepath%, but that gives us the mapped drive version.  It looks like %HOMESHARE% might have the information in UNC form, but are we always guaranteed that it will give the UNC path to the home directory?  What if the home directory for a user is local (not shared) - what is %HOMESHARE% in that case?<br/> <br/> Cheers,<br/> <br/> Brian<br/>Sun, 08 Nov 2009 01:02:45 Z2009-11-16T21:01:42Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/589fabfb-5b95-4907-b957-7625ec1ba9b8http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/589fabfb-5b95-4907-b957-7625ec1ba9b8BiedImthttp://social.microsoft.com/Profile/en-US/?user=BiedImtEvent Log Error<p>Hi,</p> <p>when I try to view the „Microsoft-Windows-HPCServer-Scheduler/Operational“ log in the Event Viewer on the HPC head node, I get the error<br/>„Some events fields may not display descriptive text because this information cannot be retrieved from the component that raises these events. The component may be misconfigured or corrupted.“<br/>And no events are shown. 
Is there a way to fix this issue?</p> <p>Kind regards,<br/>Sven</p>Wed, 15 Jul 2009 19:16:31 Z2009-10-15T23:42:54Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/16173ba2-5af7-4b47-9abe-3b5a7bd9ffc5http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/16173ba2-5af7-4b47-9abe-3b5a7bd9ffc5thegooderichttp://social.microsoft.com/Profile/en-US/?user=thegooderichpcbpws.ps1 - Install failed, no suitable client or server authentication certificates discoveredI'm trying to commission the HPC Basic Profile Web Service, and running hpcbpws.ps1 returns &quot;Install failed, no suitable client or server authentication certificates discovered&quot;.  I'd initially followed the steps outlined in &quot;The Windows HPC Server 2008 Cluster in a Linux Environment&quot; and set up the AD Certificate Services role on our head node.  Since our institution doesn't allow dynamic DNS, I didn't think I could make it an Enterprise cert server, so I made it a standalone CA in the wizard, otherwise following all the instructions in the document.<br/><br/>A certificate is shown in the Server Manager (domain-server-CA), but I still got the &quot;no suitable cert&quot; error.  I then read the &quot;HPC Basic Profile Web Service Documentation for CTP2&quot; document, and followed the instructions to bind the cert to https in IIS Manager, but the error is still the same.  <br/><br/>Any ideas?  I'll probably remove and add the cert server role as enterprise (if it will let me) next.<br/><br/>Cheers!  
-EricFri, 18 Sep 2009 19:19:00 Z2009-10-27T17:40:54Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/dc733d6f-2084-4228-9c14-e99436532a7fhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/dc733d6f-2084-4228-9c14-e99436532a7fslnovakhttp://social.microsoft.com/Profile/en-US/?user=slnovakUsing Job.SoftwareLicenseHi all,<br /> <br /> I'm relatively new to developing interfaces with HPC Server 2008.&nbsp; In my current application, I need to utilize the SoftwareLicense property of a job to ensure that only one job of a given type is being executed on a node at a given time.&nbsp; There will be other job &quot;types&quot; that will be able to run on that node, just not of the first type.&nbsp; To do this, I want to create a fictitious license for the node and require that the job use that specific license.<br /> <br /> When I try implementing this by creating a job with a fake/non-existent license, the job still runs although none of the nodes have that license available.&nbsp; How is this possible?&nbsp; I was expecting that the job would fail because none of the available resources/nodes have that license available.<br /> <br /> I've scrounged around through the Microsoft documentation, but I can only dig up two sentences on what the SoftwareLicense property does.<br /> <br /> Would someone mind clearing this up for me?<br /> <br /> Thanks!<br /> <br /> -StefanFri, 09 Oct 2009 19:05:26 Z2009-10-27T17:40:44Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/3e38a392-9d69-4d23-b4f8-179db9fd2e0fhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/3e38a392-9d69-4d23-b4f8-179db9fd2e0fLuke Scharfhttp://social.microsoft.com/Profile/en-US/?user=Luke%20ScharfJob Resource Usage report pedantryI've been looking through the help and Googling, but I can't find the answer to the following question:<br /> What exactly goes into the &quot;Total Run Time&quot; column in the Job Resource Usage report?<br /> <br /> Total Run Time: The 
biggest unresolved question is whether this value is the time from when the job was submitted until the time that the job ended.&nbsp; Or is it the time from when the job started?&nbsp; On our other clusters, it's common for jobs to sit around for 24 hours or more before the nodes are available to run them.&nbsp; It would clearly be unfair to charge the user for not using the cluster.<br /> <br /> Total CPU Hours: Does this value log just the CPU cycles used, but not the time that the processes were sitting around blocking and waiting for I/O?<br /> <br /> Thanks,<br /> -Luke<br />Thu, 08 Oct 2009 14:42:59 Z2009-10-09T14:42:41Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/92ffcf11-cb05-4f9e-8476-d4b663f0ef8ahttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/92ffcf11-cb05-4f9e-8476-d4b663f0ef8ahiasahttp://social.microsoft.com/Profile/en-US/?user=hiasaHow to check flexlm license availability before executing a job?Dear all, <div><br/></div> <div>In the Windows HPC Server 2008 scheduler, is it possible to check for FlexLM license availability before each job in the queue is executed?</div> <div>If it's possible, could someone tell me how?</div> <div><br/></div> <div>The program I am running uses a FlexLM network license. Right now the job starts even when no license is available, and then stops right away because it cannot get a license.</div> <div><br/></div> <div>I guess there are paid programs out there, but I am looking for free resources.</div> <div><br/></div> <div>Thank you.</div>Fri, 25 Sep 2009 02:59:10 
Z2009-10-05T15:59:24Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/1e51e3e0-f3ea-4cca-a65d-7e6634089b52http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/1e51e3e0-f3ea-4cca-a65d-7e6634089b52winsupporthttp://social.microsoft.com/Profile/en-US/?user=winsupportjob template1. Is it possible to create a job template which contains parameter constraints on the amount of memory the job will use (to prevent disk thrashing/virtual memory swapping when several single-core jobs that require lots of memory run on a single node)?<br/><br/>2. When a job runs on compute nodes and needs a temporary directory to write temporary logs/files, how does this get created?  Does it use the system's TEMP or the user's TEMP directory? Does it get deleted automatically after the job has finished?<br/><br/>thank you for your replies...Thu, 10 Sep 2009 01:36:31 Z2009-10-01T22:36:58Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/2f38ed99-40ef-4c9a-87a2-c6c3151e9602http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/2f38ed99-40ef-4c9a-87a2-c6c3151e9602akiladilahttp://social.microsoft.com/Profile/en-US/?user=akiladilaRemote ApplicationsHello All, <div><br/></div> <div>I work on an application that has both a client and a server component.  When the client starts, it requests that the server start.  
The user can request that the server start locally or that it start remotely.  Under Linux, SSH can be used to start remote processes/applications.  My question is: &quot;What is the best way to start remote applications under Windows?&quot;  I know that this question isn't specifically related to Job Submission, but right now I am just searching for ideas.</div> <div><br/></div> <div>Regards,</div> <div>Aquil</div>Tue, 15 Sep 2009 14:50:43 Z2009-10-01T22:36:45Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/dba23fdf-222c-4b62-8e56-a9a1de7d6a79http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/dba23fdf-222c-4b62-8e56-a9a1de7d6a79Marc W dehttp://social.microsoft.com/Profile/en-US/?user=Marc%20W%20deJob shrinking / growing with different resource granularityOur cluster is configured to adjust resources automatically to give precedence to jobs with higher priority (graceful pre-emption plus increasing and decreasing of resources).<br/> <br/> We have one job (A) with many tasks requiring a single (4-core) socket each. The job is at BelowNormal priority with 1-auto <em>sockets</em> for the job resources. <br/> A second job (B) with Normal priority was submitted afterwards. This job has several tasks requiring 1 core each, 2 of which are queued. The job is set to auto-auto <em>cores</em> .<br/> Neither of the jobs/tasks is set to exclusive resource usage.<br/> <br/> We were under the impression that job (A) should shrink whenever it finishes a task if job (B) still has waiting tasks looking for cores. However, this does not seem to happen: job (B) has several tasks waiting, while job (A) finished several tasks but kept its sockets (all 15 it had) to start new tasks instead of handing sockets over to (B), which is looking for 2 cores for its two remaining tasks. 
<br/> <br/> Is this the expected behavior or does it look like we have a configuration problem?<br/> <br/> <br/> On a side note, the activity log of job (A) lists &quot;added 1 core on X&quot; when it should probably be 1 socket.<br/> <br/> <br/> HPC Version: 2.1.1703.0 on Windows 2008 Server.<br/> <br/> <br/> PS: adding a third job caused job (A) to give up its socket. Looks like a socket is only given up if all its cores are requested by jobs with higher priority...<br/>Tue, 15 Sep 2009 11:23:49 Z2009-10-01T22:36:38Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/317f1f25-9356-4178-9a71-bd8c90e2c833http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/317f1f25-9356-4178-9a71-bd8c90e2c8331974http://social.microsoft.com/Profile/en-US/?user=1974Working directory not fully "inherited" by task batch-file commands?Hello all, If I set the working directory to \\server\share, and submit a job with a single task which is a batch file with two commands in it: echo “testing” &gt; out del out It will create the file “out” under \\server\share, but fails on the “del” saying “The file name, directory name, or volume label syntax is incorrect.” If I change the line “del out” to “del \\server\share\out” it succeeds. Whatever I set “Working directory” to, or if I let it default to my profile on the node, it creates “out”, but the “del” never finds it unless I hard-code it. Why must I hard-code the absolute UNC path? Note if I login manually to the node and “cd \\server\share; cmd.exe /c \\server\share\file.bat” it works fine. How can I use relative paths for all files in a task’s batch file? It seems working directory isn’t being used for every command (like del), but only for some (like echo + redirect). This is running 2003 Compute Cluster, Job Manager 1.0.067614. Thanks for any tips. 
Tue, 08 Sep 2009 18:42:08 Z2009-09-09T11:34:32Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/1573b8cf-fd68-43d9-92c4-ada716fdabdbhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/1573b8cf-fd68-43d9-92c4-ada716fdabdbHeftiSchlumpfhttp://social.microsoft.com/Profile/en-US/?user=HeftiSchlumpf/stdout with append mode?Hi!<br/> I am running a large number of tests on a Windows 2008 R2 HPC Server.<br/> <br/> I need to append the output of each test to one file.<br/> <br/> Normally you redirect it like this: <br/> command &gt;&gt; output.txt<br/> <br/> I am running my tests with:<br/> job submit  <br/> and a flag:<br/> /stdout:&quot;out.txt&quot;<br/> <br/> but out.txt is overwritten on each new test.<br/> I need to append; how can I do that?<br/> <br/>Tue, 08 Sep 2009 20:05:09 Z2009-09-08T23:16:34Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/b7816c3e-726e-437a-bd4e-464967407571http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/b7816c3e-726e-437a-bd4e-464967407571akiladilahttp://social.microsoft.com/Profile/en-US/?user=akiladilaCan job files have CommandLine attribute with double quotes?I am generating a job file whose CommandLine attribute contains double quotes.  Is there a way to escape these double quotes?  Here is a snippet... 
<div><br/></div> <div> <div>        &lt;Task Version=&quot;2.000&quot;</div> <div>               Name=&quot;test1.4528.batch_run&quot;</div> <div>               MinCores=&quot;2&quot;</div> <div>               MaxCores=&quot;2&quot;</div> <div>               CommandLine=&quot;C:\Program Files (x86)\Gizmo\client\bin\gizmo -a %COMPUTERNAME% -s C:\Program Files (x86)\Gizmo -c C:\Program Files (x86)\Gizmo\client\config\reference-config.xml -o gizmo.startscript=&quot;main()&quot;&quot;</div> <div>               WorkDirectory=&quot;C:\Users\aabdullah\gizmo\batch_run\results\200992_141522&quot;</div> <div>               StdOutFilePath=&quot;C:\Users\aabdullah\gizmo\batch_run\results\200992_141522\stdout.txt&quot;</div> <div>               StdErrFilePath=&quot;C:\Users\aabdullah\gizmo\batch_batch\results\200992_141522\stderr.txt&quot;</div> <div>               UnitType=&quot;Core&quot; /&gt;</div> <div><br/></div> <div><br/></div> <div>As you can see, my gizmo command line requires an option that specifies a startscript, and since its value contains parentheses it needs double quotes.</div> </div>Wed, 02 Sep 2009 18:48:08 Z2009-09-02T19:49:06Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/c7bfad32-fd17-4c36-a27c-6150ddf9c923http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/c7bfad32-fd17-4c36-a27c-6150ddf9c923prashshttp://social.microsoft.com/Profile/en-US/?user=prashsJobs stuck in configuring stateHi,<br/> <br/> I am using the Microsoft APIs (Microsoft.Hpc.Scheduler) to submit jobs to the HPC job scheduler. Our application is installed on 2 separate clusters. 
Both clusters are on different networks and different domains. Things are working well on one cluster, but on the other cluster the jobs submitted by our application get stuck in the &quot;Configuring&quot; state.<br/> <br/> If I right-click the job that is in the &quot;Configuring&quot; state (in HPC Cluster Manager) and click Submit Job, it gets submitted, but my application keeps waiting for a response.<br/> <br/> Any ideas what might be causing this behavior? Is it something related to permissions?   <br/> <br/> Thanks!<br/> PrashantTue, 18 Aug 2009 01:37:45 Z2009-09-18T23:41:42Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/02067d64-4546-4738-8304-910e831590dbhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/02067d64-4546-4738-8304-910e831590dbakiladilahttp://social.microsoft.com/Profile/en-US/?user=akiladilaHow do you determine which machines have been allocated for your job?Under systems like PBS, TORQUE, LSF, etc., when a job is submitted and goes into the &quot;Running&quot; state, the workload manager sets an environment variable such as PBS_NODEFILE or LSB_MCPU_HOSTS, which contains the names of the nodes that have been allocated for the job.  Is there a MS Job Scheduler equivalent? Basically, I've created an application that uses the command line to create a job and then submit it.  I use a job file as a &quot;template&quot; and then modify the template based on a user's input.  The problem is that my application starts a client application that then invokes a call to mpiexec.  [NOTE: Here template refers to my application template, not the job scheduler template used to define job policy.] My goal is to make sure that mpiexec uses the resources allocated for the job.  
The basic pattern looks as follows: <div><br/></div> <div>query user for job details</div> <div>generate XML job file from a generic XML job file.</div> <div>create new job: </div> <div>     job new /jobfile:[my_jobfile.xml]</div> <div>submit job: </div> <div>     job submit /id:[my_job_id]</div> <div><br/></div> <div>The CommandLine attribute in the Task element of my job file starts an application, which in turn starts an MPI application via mpiexec.</div> <div>I would like to be able to make sure mpiexec uses the resources allocated to the job.</div> <div><br/></div> <div>I know that this is somewhat convoluted, but one of my goals is to create a framework that is as generic as possible and allows me to run under multiple platforms and multiple workload managers.  My current abstraction gives me that flexibility, so I am hesitant to move to a new implementation unless I can be convinced that it is really the right thing to do.</div> <div><br/></div> <div>Thanks for any suggestions.</div> <div><br/></div> <div>Aquil</div>Mon, 31 Aug 2009 19:15:44 Z2009-09-18T23:41:28Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/8f7e4c9e-253e-4d52-acd7-3003bb9c6cabhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/8f7e4c9e-253e-4d52-acd7-3003bb9c6cabakiladilahttp://social.microsoft.com/Profile/en-US/?user=akiladilaPython Example of using the HPC Basic Profile Web ServiceHello All, <div><br/></div> <div>I have a Python application that I would like to use to drive a parallel application on Windows HPC Server 2008.  I took a look at the C# API provided for communicating with the HPC Basic Profile (HPCBP) Web service, which can run on the head node of a cluster.  
In the long run, I think that I may choose to go with a .NET implementation, but right now I would like to continue to use Python to drive my parallel application.</div> <div><br/></div> <div>I know that as a compromise I could use IronPython, but right now going that route has its own set of limitations. So my question is: does anyone know of an example, sample, tutorial, or outline for communicating with the HPC Server 2008 Job Scheduler via HPCBP using Python?</div> <div><br/></div> <div>Regards</div>Wed, 26 Aug 2009 23:18:13 Z2009-08-28T23:47:42Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/40b9cc26-fd17-41d4-a535-150c2e7e2e34http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/40b9cc26-fd17-41d4-a535-150c2e7e2e34Christoph Müllerhttp://social.microsoft.com/Profile/en-US/?user=Christoph%20M%u00fcllerAre custom resources for scheduling possible?<p>I am relatively new to HPC Server and I wonder whether it is possible to define custom per-node resources that are used for scheduling (i.e. can be requested by the user), as is possible with Sun Grid Engine. The background is the following: our cluster has GPUs for GPGPU, but the number of cores per node is larger than the number of GPUs. What I want to achieve is that if all GPUs on a node are in use (but not all cores), no further GPU jobs are assigned to this node (each GPU is an exclusive resource). <br/><br/>Is this possible, and how would I do this? If not, are there any plans for such a feature in future releases of HPC Server?<br/><br/>And an additional question: are there any plans to allow initiating console sessions remotely in the future? 
It would be really cool not to rely on the auto-logon feature and to allow per-user sessions.<br/><br/>Best regards,<br/>Christoph</p>Thu, 06 Aug 2009 19:46:15 Z2009-08-26T07:36:31Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/0b4d9363-7a18-41cf-8338-56aef0454173http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/0b4d9363-7a18-41cf-8338-56aef0454173Luke Scharfhttp://social.microsoft.com/Profile/en-US/?user=Luke%20ScharfHistorical job-queue records?How do I control retention of job records in the Win2k8 HPC scheduler?<br/> <br/> We need to keep them around for a long time (a year +) for accounting purposes.<br/> <br/> Thanks,<br/> -Luke<br/>Wed, 12 Aug 2009 18:37:12 Z2009-08-17T14:34:20Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/34d2b858-4a39-4457-8cb7-f33f8def62fdhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/34d2b858-4a39-4457-8cb7-f33f8def62fdtosa.yasunarihttp://social.microsoft.com/Profile/en-US/?user=tosa.yasunarifailure in "job submit" after HPC Pack SP1 installI can no longer use the &quot;job submit&quot; command line; I get the following error.   Can you tell me what I can do to eliminate the error?<br/> <br/> Database Exception<br/> Procedure or function 'Schd_NextTaskId' expects parameter '@numTasks', which was not supplied.<br/> <br/> I need to fix this ASAP so that we can go on.<br/> Our &quot;job submit&quot; was working before the installation of HPC Pack SP1. Wed, 05 Aug 2009 20:51:22 Z2009-08-13T18:12:14Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/083b23cb-2bc7-426a-8267-c5b0b54d0675http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/083b23cb-2bc7-426a-8267-c5b0b54d0675Phil Molzerhttp://social.microsoft.com/Profile/en-US/?user=Phil%20MolzerCluster Manager Job State stuck on ConfiguringHi,<br/>I am testing a SOA-style service on an HPC Server 2008 SP1 cluster.  I have one head node and 2 compute nodes.  
All nodes have State=Online and Node Health=OK in the Node Management pane of Cluster Manager.  I ran diagnostics without any issues.<br/><br/>When my client tries to create a session, the Cluster Manager's Job Management pane shows 2 items.  The first is Job ID 4 WCF service, the second is Job ID 5 WCF service - Broker for service job 4.  The State for both is 'Configuring'.  There is no further detail in the Job Details.  My client just hangs.  I let it run overnight and it never got past the Configuring stage.  <br/><br/>I need advice on how to troubleshoot.  I can't find any more information on what is happening during the Configuring state.<br/>thanks<br/>PhilFri, 26 Jun 2009 18:45:03 Z2009-08-07T17:12:56Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/2e66edfd-fa06-4c36-8e23-854d004fea76http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/2e66edfd-fa06-4c36-8e23-854d004fea76trimtrimhttp://social.microsoft.com/Profile/en-US/?user=trimtrimHow to evenly create the processors on the nodesI have an MPI program running on a Windows HPC cluster. The cluster has 50 nodes, and each node has 8 processors. I tried to run my program with 50 processes. Windows HPC always creates 8 processes on each node it uses. How can I use 25 nodes with 2 processes on each node? I tried to use a machine file; however, Windows HPC didn't create the processes as my machine file specified.Mon, 03 Aug 2009 05:50:42 Z2009-08-11T01:13:55Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/1a1113ad-a51f-4c56-a69c-c8ff4cd184fbhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/1a1113ad-a51f-4c56-a69c-c8ff4cd184fbrama1981http://social.microsoft.com/Profile/en-US/?user=rama1981How to Submit a job to a HPC cluster via WPF browser ApplicationHi,<br/><br/>I am using the HPC SDK to submit jobs to the cluster.  The app I am trying to use is a WPF browser app.  
Once the WPF browser app is published, it works fine on machines that have the HPC Client Tools installed.  However, I get an error when the WPF browser app is run from machines that do not have the HPC Client Tools.  I also tried publishing all the HPC-related DLLs with the app.  Even this does not work.  The error I get is as follows:<br/><br/>=========================================<br/>ERROR SUMMARY<br/> Below is a summary of the errors, details of these errors are listed later in the log.<br/> * An exception occurred while determining platform requirements. Following failure messages were detected:<br/>  + Unable to install or run the application. The application requires that assembly Microsoft.Hpc.Scheduler.Store Version 2.0.0.0 be installed in the Global Assembly Cache (GAC) first. <br/> * An exception occurred while downloading the application. Following failure messages were detected:<br/>  + The DeterminePlatformRequirements method failed. The application cannot be committed.<br/><br/>==================================<br/><br/>I know that the alternative would be to use JSDL, but that seems to be more cumbersome.  I would prefer to use the HPC SDK instead.  <br/><br/>Is there a way to get this working without getting the HPC DLLs installed in the GAC?<br/><br/>Thanks<br/>RamaMon, 27 Jul 2009 18:40:46 Z2009-08-11T01:14:08Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/cedb2531-39b8-4376-ab05-4e7f6202e82ahttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/cedb2531-39b8-4376-ab05-4e7f6202e82awendy dhttp://social.microsoft.com/Profile/en-US/?user=wendy%20dJob is in a "Failed" state but all tasks have a "finished" stateWe run on HPC 2008 and have a job that indicates that it has failed, and names the failed tasks in the error message. However, all tasks for the job have a state of &quot;Finished&quot; (i.e. there are none that failed). 
The tasks that the job manager says failed show no sign of failure: they are all Finished; they have no error messages; and they all processed what they were supposed to successfully. <br/><br/>There's no indication of problems in any event logs, and there's no one particular node that they were running on. (And all other tasks on the nodes were successful.)<br/><br/>The job is embarrassingly parallel and was running on 3 nodes with 28 cores. There are 271 tasks.<br/><br/>In one instance the job was running for 22 hours. The 5 tasks that failed all started within a 20-minute period. None of the other tasks started during that time. All cores were fully utilised throughout the job, so they weren't the only 5 things running at the time.<br/><br/>However, we've also had the problem with the same job (with a lot less data) running with the same number of nodes, cores and tasks for 5 minutes, after which one of the tasks is reported as failed. <br/><br/>We can't reproduce this behaviour at will.<br/><br/>We'd appreciate any thoughts on what's happened and how we can resolve what appears to be incorrect reporting of the state.<br/>Thu, 09 Jul 2009 06:00:05 Z2009-07-31T00:53:21Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/163d5d0a-b6be-4422-860d-e915d5c9c622http://social.microsoft.com/Forums/en-US/windowshpcsched/thread/163d5d0a-b6be-4422-860d-e915d5c9c622rmaghttp://social.microsoft.com/Profile/en-US/?user=rmagUnhandled Exceptions<span style="font-size:11pt;font-family:'Calibri','sans-serif'">If an application that is being run by the Job Manager throws an unhandled exception, the job keeps running indefinitely. When the application running in a task generates an unhandled exception, the just-in-time debugger dialog comes up on the node where the task was running and the job remains in a running state.  The only way to get the job to fail is to log in to the node and close the dialog box, which in turn causes the job to fail.  
Is there any way to make the jobs fail without interaction with the node if an exception is thrown?</span>Tue, 16 Jun 2009 12:30:18 Z2009-08-11T01:14:31Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/3f4ed814-f63e-4801-a680-21fbb808106chttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/3f4ed814-f63e-4801-a680-21fbb808106crmaghttp://social.microsoft.com/Profile/en-US/?user=rmagTrace.Assert Performing As Expected When Run From The Cluster Manager<p>Trace.Assert doesn't work as expected when running as a job in Cluster Manager. Here's a VB.NET code sample that illustrates the problem:</p> <p> </p> <p><code>    Sub Main(ByVal args() As String)</code></p> <p><code>        Console.WriteLine(&quot;Before assert&quot;)</code></p> <p><code>        Trace.Assert(False)</code></p> <p><code>        Console.WriteLine(&quot;After assert&quot;)</code></p> <p><code>    End Sub</code></p> <p> </p> <p>When logged directly into one of the HPC nodes, if you run the application it prints the first line to the console and then puts up a dialog as expected.  But if the application is run as a task in an HPC job, both lines are written to the console and the assert doesn't have any effect.  Is this behavior expected?</p>Tue, 16 Jun 2009 18:19:20 Z2009-07-30T18:37:41Zhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/a9ff79e6-68b0-4e0a-b5d5-ec03b16717bdhttp://social.microsoft.com/Forums/en-US/windowshpcsched/thread/a9ff79e6-68b0-4e0a-b5d5-ec03b16717bdcamtphttp://social.microsoft.com/Profile/en-US/?user=camtpTask fails with -1073741819I have tasks that fail with code -1073741819, which when translated indicates an access violation, but I am not sure whether this is how I should interpret the code. Within our HPC environment jobs are executed from a shelled cmd.exe (I am not sure whether this is the standard HPC mechanism for running jobs, but it's certainly ours). So I presume the return code from our program is passed back to cmd.exe, which then passes it back to HPC? In any event, I don't know how it picks up this error code, because our jobs seem to indicate success.Thu, 09 Jul 2009 08:41:24 Z2009-08-11T01:15:11Z
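
For reference, the exit code in the last post above can be checked with a few lines of Python. This is only a sketch: it reinterprets the signed 32-bit value shown by the scheduler as the unsigned value that Windows documents for NTSTATUS codes. The function name `decode_exit_code` is made up for this example, and the lookup table is an illustrative assumption covering a few well-known codes, not a complete list.

```python
# Sketch: reinterpret a signed 32-bit process exit code as an unsigned
# NTSTATUS-style value so it can be compared with documented constants.

# Illustrative subset of well-known NTSTATUS values (not a complete list).
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000094: "STATUS_INTEGER_DIVIDE_BY_ZERO",
}


def decode_exit_code(code):
    """Return (hex string, symbolic name or 'unknown') for an exit code."""
    unsigned = code & 0xFFFFFFFF  # two's-complement view of the 32-bit value
    return "0x{:08X}".format(unsigned), KNOWN_STATUS.get(unsigned, "unknown")


if __name__ == "__main__":
    # The value reported in the post
    print(decode_exit_code(-1073741819))
```

Under these assumptions, -1073741819 maps to 0xC0000005, which agrees with the poster's reading of the code as an access violation. The sketch says nothing about where in the cmd.exe chain the fault actually occurs.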