Activation Filter: how to get rid of "Remember this password? (Y/N)"?

  • Question

  • I'm getting weird behavior when trying to resume a job programmatically (using an activation filter). I tried to find something on the web, but I'm stuck. I'm using HPC version 2.1.1.1703.0.

    I'm using the API to move a job to the "Configuring" state when the required number of licenses is not available. A second program runs in the background, watches the "Configuring" queue, and tries to resubmit the job. If no license is available, the job is put back into the Configuring state and the cycle repeats.

    But then I get this annoying prompt, and my program hangs:

     This is the debug output:

     'HPCLicFilterService.vshost.exe' (Managed): Loaded 'C:\Windows\assembly\GAC_MSIL\System.Runtime.Remoting\2.0.0.0__b77a5c561934e089\System.Runtime.Remoting.dll', Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.

    Enter the password for 'DOMAIN\user' to connect to 'head-node':
    Remember this password? (Y/N)
    [the "Remember this password? (Y/N)" prompt is then reprinted over and over]

    The program '[4024] HPCLicFilterService.vshost.exe: Managed' has exited with code 0 (0x0).

     And my program just hangs here forever. I tried to find a way to pass an "N" or to suppress this question, but I did not find anything useful.

     The other thing is that this only happens inside the cluster (running from the head node or any compute node). If I run the program outside the cluster (from my laptop, for example, connecting to the cluster with Sch.Connect(REMOTE_HOST)), it runs fine.

     The piece of code that triggers this problem is:

     IScheduler scheduler = new Scheduler();
     ...
     scheduler.Connect(HPCfilterService.HPCClusterHost);
     ...
     scheduler.SubmitJob(job, null, null);     // <-- it hangs here
     ...
     scheduler.Dispose();

     

    I would appreciate any help, or at least a pointer to documentation (I have already read the HPC 2008 cluster guide).

     Thanks again.

    Regards

    • Edited by Larry25 Wednesday, May 12, 2010 11:17 AM
    Thursday, May 6, 2010 10:44 AM

Answers

  • The activation filter does not need to submit a job. The job has already been submitted and is ready to execute.

    The activation filter runs just before the job is to be launched on the cluster and its main purpose is to confirm that software licenses are available so that jobs won't fail unnecessarily after they start to execute because licenses aren't available. The job scheduler uses the return code of the activation filter to determine whether the job should be launched or not.

    Windows HPC Server 2008

    If you are running Windows HPC Server 2008, see http://technet.microsoft.com/en-us/library/dd346641(WS.10).aspx "C# Example: An Activation Filter that Checks for License Availability". The activation filter exit codes are:

    • 0 - The queue will not be blocked. If the filter did not cancel the job, the job will be activated.
    • Any other exit code - The queue will be blocked by this job. The job will not be activated and remains in the queue. The filter will reevaluate the job periodically until either the job passes, or until the job is canceled.

    Windows HPC Server 2008 R2 Beta 2

    If you are running Windows HPC Server 2008 R2 Beta 2, more return codes are available:

    see: http://technet.microsoft.com/en-us/library/ee783563(WS.10).aspx#BKMK_Activation

    The following list describes the supported exit codes for an activation filter, and the corresponding Job scheduler action:

    • 0: The job is started.
    • 1: The job is not started and remains in the queue. The filter reevaluates the job periodically until either the job passes, or until the job is canceled. No other jobs of equal or lower priority are started until the job passes or is canceled.
    • 2: The job is not started, but available resources are reserved for it depending on the Scheduling Mode: In Queued, up to the job’s maximum resources are reserved; in Balanced, the minimum resources are reserved. Other jobs can be started on other resources. The filter reevaluates the job periodically until the job passes.
    • 3: The job is put on hold until the date and time specified by the Hold Until job property. After the hold period, the job is reevaluated by the filter program. If the filter returns with exit code 3 and no Hold Until value is specified for that job, the job is held for the amount of time specified by the Default Hold Duration cluster setting.
    • 4: The job is marked as Failed with an error message that the job was failed by the activation filter.
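
    As a rough illustration of how a filter might map a license check onto the exit codes listed above, here is a minimal C# sketch. The enum member names, the Decide rule, and the two-number command line are assumptions made for this example; they are not part of the HPC API, and a real filter would have to obtain the license counts itself.

```csharp
using System;

// Exit codes from the list above (Windows HPC Server 2008 R2 Beta 2).
// The member names are invented for this sketch.
public enum FilterExitCode
{
    StartJob = 0,          // the job is started
    BlockQueue = 1,        // job stays queued; equal/lower priority jobs wait
    ReserveResources = 2,  // job stays queued; resources are reserved for it
    HoldJob = 3,           // job is put on hold and reevaluated later
    FailJob = 4            // job is marked as Failed
}

public static class LicenseFilterSketch
{
    // Hypothetical rule: start the job when enough licenses are free,
    // fail it when the input is invalid, otherwise hold it for a retry.
    public static FilterExitCode Decide(int requested, int available)
    {
        if (requested < 0 || available < 0)
        {
            return FilterExitCode.FailJob;
        }
        return requested <= available
            ? FilterExitCode.StartJob
            : FilterExitCode.HoldJob;
    }

    // Hypothetical command line: <licensesRequested> <licensesAvailable>.
    // The process exit code is what the job scheduler would act on.
    public static int Main(string[] args)
    {
        int requested, available;
        if (args.Length != 2
            || !int.TryParse(args[0], out requested)
            || !int.TryParse(args[1], out available))
        {
            Console.Error.WriteLine("usage: filter <requested> <available>");
            return (int)FilterExitCode.FailJob; // assumption: bad input fails the job
        }
        FilterExitCode code = Decide(requested, available);
        Console.WriteLine("exit code {0} ({1})", (int)code, code);
        return (int)code;
    }
}
```

    Which of codes 1 to 3 to return when licenses are short depends on whether the job should block the queue, reserve resources, or be held, as described in the list above.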

     

Monday, May 17, 2010 2:16 PM

All replies

  • Hi,

    Would it help to cache the credentials for the account that runs your background program, while logged in to the head node? You mentioned that it works from your laptop; is there a chance that these credentials are already cached there?

    Thanks,
    Łukasz

    Monday, May 10, 2010 5:55 PM
  • Hi Lukasz,

     

    Thanks for your reply, but how can I do this? The background program runs with high privileges (administrator), and when I run the same test on HPC Beta 3 it works fine (running on the head node). Each job is submitted by a different user, so which method lets me cache the credentials programmatically, without user intervention? As I understand it, SubmitJob(job, null, null) tries to use the cached credentials of the submitted job.

    I really don't know why it works from my laptop, "outside" the cluster, and not from inside (running the background program on the head node or a compute node).

     

    Regards.

    Larry

    • Edited by Larry25 Friday, May 14, 2010 11:02 AM
    Tuesday, May 11, 2010 2:09 PM
  • Hi Larry,

    I was thinking that it might help to cache only the credentials of the Administrator account that runs your background application. You need to cache them on the machine where this program is supposed to run. I suspect it runs fine from your laptop and on the beta head node because you already cached these credentials there (the credentials cache is separate for every user profile on the machine).

    To try this, log in to your head node as domain\administrator (the user that will run the background program) and use:

    cluscfg setcreds /user:domain\administrator

    The programmatic way of doing this is the IScheduler.SetCachedCredentials() method.

    Please let me know if this helps, as I may have missed something. I am not sure whether your program runs as a standalone app, a system service, etc.

    You may also be interested in the activation filter improvements introduced in the latest Beta 2 release of Windows HPC Server 2008 R2:
    http://blogs.technet.com/windowshpc/archive/2010/03/26/activation-filter-new-options-in-beta-2.aspx

    Thanks,
    Łukasz

    Tuesday, May 11, 2010 5:45 PM
  • Hi again Lukasz,

     

    Thanks for your comment, I will do the test and I will let you know.

     

    Yes, I'm aware of the new activation filter exit codes, and they work great. I have a version of this program that works like a charm with the new exit codes. And of course the background program/service is no longer needed there, so I don't need to resubmit the job.

     

    I will run some tests and let you know.

     

    Thanks!

     

    Regards,

    Larry



    Wednesday, May 12, 2010 11:13 AM
  • Hi Lukasz,

     

    I made some modifications to the code, but still no luck.

     

    I tried to use the IScheduler.SetCachedCredentials() method, but this only works if I set the user name and password manually, for example:

    SetCachedCredentials(USERNAME, PASSWORD);

    I know how to get the user name from a job already in the queue, but how can I get his/her password?

     

    If I set this to "domain\user", "*PASSWD*", it works fine. If not, it also hangs here:

    Enter the password for 'domain\user' to connect to 'headnode':
    Remember this password? (Y/N)

    So I get stuck here as well.


    Remember, the activation filter and the background program need to run without user intervention.

    Regards,

    Larry

     

    Thursday, May 13, 2010 12:16 PM
  • Hi Larry,

    Sorry about that. I thought you were being asked not for the credentials of the original user who submitted the job, but for the password of the user running the "background program".

    Could you clarify a few things to help me better understand your environment?

    1. Is the 'DOMAIN\User' from your first post the admin who runs the background program, or the owner of the job that fails to resubmit?

    2. For the job in the Configuring state for which Submit() asks for credentials, could you check whether the 'Owner' and 'RunAs' fields are present? You can do this via 'job view <jobid>' or with the GUI (after configuring the columns visible for a job).

    3. How are you testing your background program on the head node? Is it running as a standalone application?

    4. When your activation filter puts the job into the Configuring state, do you do anything else with the job (any kind of additional modification)?

    Regards,
    Łukasz

    Thursday, May 13, 2010 7:25 PM
  • Hi Lukasz,

     

    OK, let me try to explain myself better:


    The activation filter moves the job to the Configuring state, using just the IScheduler.ConfigureJob(JobID) method, only if there are no licenses available for the job. This way the job does not block the queue.

    Because the activation filter only runs when a new job arrives, I need a second program in the background that looks at the Configuring queue, checks which jobs are waiting for licenses, and (after waiting a reasonable time) tries to resubmit each job using the same user's credentials cached in the cluster. This background program runs as admin on the head node; I can't run it as my own user because it throws a permissions exception.

    The method I'm using to resubmit the job is:

    IScheduler.SubmitJob(job, null, null);

    With this method, the API is supposed to look up the cached credentials for the job if you pass null as the user name and null as the password.

    Now, when I run this program from the head node, I see the behavior I described at the beginning (it hangs).

    I ran the following tests:

    * Cached the credentials (domain\user, password) locally and manually on the head node, using cluscfg setcreds: it fails. (This was only a test.)

    * Ran the program myself from the command prompt, waited for the question, then supplied the password and answered "Y": the program runs fine, but only for jobs submitted by the same user (me, in this case); for any other user it raises a "Permission Denied" exception.

    * Cleared the cluster's cached credentials.

    * Ran exactly the same program manually or from Visual Studio in debug mode (from my laptop), where I'm logged in as domain\user: the program works fine with the jobs submitted by me, no questions, no hanging.

       * Works perfectly with jobs submitted by the administrator (head node, admin).

       * Works perfectly with jobs submitted by another user who is not me.

    In other words, from my laptop the second program uses the cached credentials to resubmit the job. Why doesn't the same behavior apply inside the cluster, on the head node for example?

    I added the column you suggested, "Run As user", and I can see the correct owner in both columns, both when the job is moved to the Configuring state (by the activation filter) and when it is moved back to the submit queue (by the second program).

    So, I don't know, but it's very annoying (a bug, maybe?).

    I'm guessing it has something to do with "trusted" privileges, but I'm running out of ideas.

     

    Regards

    Larry

     

    Friday, May 14, 2010 11:00 AM
  • Hi Steve,

     

    Thanks for your reply. I know all this; in fact, I have a version that works perfectly with the Beta 2 filter exit codes.

    My problem is precisely that I don't want to block the queue because the current job does not have X licenses. What happens if another job comes in asking for Y licenses rather than X? It will be blocked as well.

    That's why I instructed my filter to move the job to the Configuring state; this way it does not block the queue. The filter IS NOT submitting the job; it decides:

     If the job does not have enough licenses: move it to Configuring

     Else: return code 0

    Then a second program (which is not the activation filter) runs in the background, watching the Configuring queue and resubmitting the jobs that were waiting for licenses.

    I know that with the new Beta 2 I will not need the background program, because the exit codes tell HPC what to do.

    My problem IS NOT with the activation filter; it is with the background program that resumes the jobs.
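
    The two-program design described in this post can be modeled in a few lines of C#. This is only an in-memory sketch: ParkAndRetryModel, FilterJob, and RetryParked are hypothetical names, nothing here calls the Microsoft.Hpc.Scheduler API, and it does not touch the credential prompt that this thread is about.

```csharp
using System;
using System.Collections.Generic;

// In-memory model of "park the job, retry it later". Hypothetical types only.
public class ParkAndRetryModel
{
    public int FreeLicenses;
    public readonly List<int> Started = new List<int>();
    // Job id -> licenses needed, for jobs parked in the "Configuring" state.
    public readonly Dictionary<int, int> Parked = new Dictionary<int, int>();

    public ParkAndRetryModel(int freeLicenses)
    {
        FreeLicenses = freeLicenses;
    }

    // Filter step: start the job if licenses are free, otherwise park it
    // so that it does not block the jobs queued behind it.
    public bool FilterJob(int jobId, int licensesNeeded)
    {
        if (licensesNeeded <= FreeLicenses)
        {
            FreeLicenses -= licensesNeeded;
            Started.Add(jobId);
            return true;
        }
        Parked[jobId] = licensesNeeded;
        return false;
    }

    // Background step: resubmit every parked job once; jobs that still
    // lack licenses are parked again. Returns how many jobs were started.
    public int RetryParked()
    {
        int resubmitted = 0;
        foreach (int jobId in new List<int>(Parked.Keys))
        {
            int needed = Parked[jobId];
            Parked.Remove(jobId);
            if (FilterJob(jobId, needed))
            {
                resubmitted++;
            }
        }
        return resubmitted;
    }

    public static void Main()
    {
        var model = new ParkAndRetryModel(4);
        model.FilterJob(1, 3);   // starts, 1 license left
        model.FilterJob(2, 3);   // parked
        model.FilterJob(3, 1);   // starts even though job 2 is waiting
        model.FreeLicenses += 3; // job 1 finishes and returns its licenses
        int resumed = model.RetryParked();
        // prints: resumed 1, still parked 0
        Console.WriteLine("resumed {0}, still parked {1}", resumed, model.Parked.Count);
    }
}
```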

    Regards.

     

    Larry
    Tuesday, May 18, 2010 8:49 AM