How do I diagnose Node configuration problems?
-
August 26, 2011 14:27
I have a new cluster where currently only the head node has my service application files installed. Jobs run fine on that single node, but show failures (State: Failed) for the task IDs representing the other nodes in the cluster. I fully expect to see errors in this situation (I'm specifically testing for this condition), but I'm not sure how to get at the detail of the error. I was expecting something like "FileNotFound: C:\...\MyService.dll", but instead all I get in the Cluster Manager "View Job" screen is:
The task is running on a node which is no longer usable by the task's job. This could happen because the nodegroups have been changed in the cluster, or because the node has been added to the job's node exclusion list.
Where could I expect to see more detail on the actual problem (i.e. "You didn't install the service yet, dummy!")?
I _am_ going to get calls from my customers about nodes they haven't configured correctly.
All replies
-
October 3, 2011 14:18
You need to correctly deploy your service.
1. Put your "myservicename.dll" file on each compute node in the cluster.
2. Put your service config file "myservicename.config" on each compute node and broker node in the folder c:\program files\microsoft hpc pack 2008 r2\serviceregistration\ and share it.
Don't forget to edit the .config file so that it points to the path where the .dll is placed.
Also check the hosts files for "Private.*" entries.
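The two deployment steps above could be verified on each node with a small script. This is only a minimal sketch in Python: the function name `check_deployment` and its parameters are made up for illustration, and the check that the .config file mentions the .dll path is a plain substring search, not a real parse of the service configuration.

```python
import os


def check_deployment(service_dll, registration_dir, service_name):
    """Return a list of problems found on the local node (empty if none).

    service_dll:      full path where the service .dll is expected (hypothetical)
    registration_dir: folder that should hold "<service_name>.config"
    service_name:     service name without extension
    """
    problems = []

    # Step 1: the service assembly must be present on the node.
    if not os.path.isfile(service_dll):
        problems.append("FileNotFound: " + service_dll)

    # Step 2: the service config file must be in the registration folder.
    config_path = os.path.join(registration_dir, service_name + ".config")
    if not os.path.isfile(config_path):
        problems.append("FileNotFound: " + config_path)
    else:
        # Rough heuristic (an assumption): the config should mention the .dll path.
        with open(config_path, "r", errors="ignore") as handle:
            text = handle.read().lower()
        if service_dll.lower() not in text:
            problems.append("Config does not reference: " + service_dll)

    return problems
```

On a node, you would pass the real .dll location and the serviceregistration folder mentioned above, then print whatever the function returns so a missing file is named explicitly.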
I hope this helps.
-
October 17, 2011 20:43
Yes, _I_ understand all that, but if my customer has 1,000 nodes and one of them isn't configured correctly because one of their sysadmins didn't drink enough coffee before installing my software on the other 999, the diagnostics aren't helping him (and by extension me, usually at 2am on the weekend) figure out that he's missing some files.
-
October 18, 2011 9:44
If I understand you correctly, to diagnose the SOA service you could run the "SOA Service loading test" diagnostic in HPC Manager and configure it to check your service by entering your service name under "Configure Test Parameters". You would then see an error on the incorrectly configured nodes.
Or, if you would like to check that certain files exist, you could write a simple command like "clusrun if not exist MyFilePath (echo Error)".
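The same file-existence check could also be done from the head node without clusrun. Below is a rough sketch in Python that assumes the administrative shares (for example `C$`) are reachable on every node and that the caller has rights to read them; that is an assumption about the cluster, not something HPC Pack guarantees. The helper names `unc_path` and `nodes_missing_file` and the node names are hypothetical.

```python
import os


def unc_path(node, local_path):
    # Map a node-local path such as C:\dir\file to \\node\C$\dir\file.
    # Assumes local_path starts with a drive letter and a colon.
    drive, rest = local_path[0], local_path[2:]
    return f"\\\\{node}\\{drive}$" + rest


def nodes_missing_file(nodes, local_path, to_path=unc_path, exists=os.path.exists):
    """Return the nodes on which local_path cannot be found."""
    return [node for node in nodes if not exists(to_path(node, local_path))]


if __name__ == "__main__":
    # Hypothetical node names and file path, for illustration only.
    for node in nodes_missing_file(["NODE01", "NODE02"], r"C:\MyService\MyService.dll"):
        print(f"{node}: service file not found")
```

The output names each node that lacks the file, which is closer to the "FileNotFound" style of message asked for than a bare "Error". The `to_path` and `exists` parameters are there only so the lookup can be swapped out or tested without a real cluster.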
-
October 19, 2011 18:55
My customer's admin guy would love to be able to run that diagnostic, but he can't: http://social.microsoft.com/Forums/en/windowshpcdevs/thread/6f1384f9-fab0-4544-90ea-85a8ffb87331