HPC SDM Store Service will not start

  • General discussion

  • Although related to my earlier thread I thought I should have this as a separate item.

    I have an HPC Server 2012 setup with one head node and 4 compute nodes. I need common shared storage, which I have on a PowerVault 3200. To get this storage as a Cluster Shared Volume (CSV), I added the Failover Clustering feature in Server 2012 and created a CSV (I have not added a file server role, as it does not seem to need one). This seems to work exactly as I want. Immediately after installing failover clustering and creating the CSV, HPC Cluster Manager starts fine and diagnostics run fine on the compute nodes. But when I try to restart the 'HPC SDM Store' service, or reboot (which then tries to restart it), the service will not start and reports:

    Error 1064: An exception occurred in the service when handling the control request.

    I would really like to understand why this happens and fix it, because once the service is running the system does appear to work as I need it to. Starting is the only problem: once it is going, HPC cluster operation seems unaffected by the presence of the failover cluster.

    Can anyone help diagnose why this service won't start? It obviously has something to do with having created a failover cluster; maybe permissions have changed, or something like that. If I destroy the cluster, everything springs back to life.
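
    One way to start narrowing this down is to record the state and exit code that the Service Control Manager reports after a failed start. Below is a minimal Python sketch. It assumes the service's short name is HpcSdm (an assumption, not confirmed in this thread) and that `sc query` prints its usual STATE and WIN32_EXIT_CODE lines. The SAMPLE text is illustrative, not captured from the system described here.

```python
import re
import subprocess

# Assumption: "HpcSdm" is taken here as the short service name behind the
# 'HPC SDM Store' display name; confirm it in services.msc before relying on it.
SERVICE_NAME = "HpcSdm"


def parse_sc_query(output: str) -> dict:
    """Extract STATE and WIN32_EXIT_CODE from the text printed by `sc query`."""
    result = {"state": None, "win32_exit_code": None}
    state = re.search(r"STATE\s*:\s*\d+\s+(\w+)", output)
    if state:
        result["state"] = state.group(1)
    code = re.search(r"WIN32_EXIT_CODE\s*:\s*(\d+)", output)
    if code:
        result["win32_exit_code"] = int(code.group(1))
    return result


def query_service(name: str = SERVICE_NAME) -> dict:
    """Run `sc query <name>` (Windows only) and parse what it prints."""
    proc = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return parse_sc_query(proc.stdout)


# Illustrative output only, written by hand to show the expected layout.
SAMPLE = """
SERVICE_NAME: HpcSdm
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 1  STOPPED
        WIN32_EXIT_CODE    : 1064  (0x428)
        SERVICE_EXIT_CODE  : 0  (0x0)
"""
```

    On the head node, `query_service()` would run the real command. The exit code only confirms that the service threw an exception; the exception details themselves are more likely to be found in the Application event log.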

    Thanks



    Thursday, May 30, 2013 11:31 AM

All replies

  • Have you already solved this, or moved on?

    It sounds like the SDM Store service is being confused by the availability of the Failover Clustering service. The presence of this service, once configured, is what allows you to install HPC with Failover Clustering, and that in turn tells the SDM Store service to look for its files on the shared space in the cluster. Note that HPC requires the File Server role with Failover Clustering to function properly in a failover cluster.

    I'm curious what your solution was, if you found one. Even when everything is set up properly following the Microsoft guidelines for Failover Clustering with HPC 2012, there are still times when you simply cannot restart the SDM Store and have to reboot the entire cluster, because in many of those cases failing over won't fix it either.

    Friday, January 31, 2014 2:24 PM
  • I think this is a product issue; it has been reported back to the product team.

    My current suggestion is:

    1. If the bench has only one head node (i.e. it is not high availability), no matter how many compute nodes there are, please do not create a failover cluster on it. It's okay to install the Failover Clustering feature, but do not create a failover cluster on the head node.

    2. If the bench is an HA (high availability) bench (it has >= 2 head nodes), creating a failover cluster to host the file server role is by design.
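
    The two cases above reduce to a check on the head node count. A small Python sketch of that rule follows; the function name is made up for illustration and is not part of any HPC tooling.

```python
def should_create_failover_cluster(head_node_count: int) -> bool:
    """Per the suggestion above: only an HA bench (two or more head nodes)
    should have a failover cluster created on its head nodes."""
    if head_node_count < 1:
        raise ValueError("a cluster needs at least one head node")
    return head_node_count >= 2
```

    For the setup in the original post (one head node, 4 compute nodes), `should_create_failover_cluster(1)` returns False.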

    If you find another way to fix the issue, please let us know.

    Thanks for reporting the issue!


    BR, Yizhong


    • Edited by Yizhong Wu Tuesday, February 11, 2014 10:32 AM
    Tuesday, February 11, 2014 10:30 AM