Job in "Validating" status for 1 hour?

  • Question

  • Hi everyone,

    I'm working on an HPC project.

    Today my team tried to submit a job which contains 5000 tasks but found that it took a lot of time to validate.

    Could you provide some guidance on how to find the culprit?

    The cluster has 10 nodes, each with 16 cores and 32 GB of RAM.


    Wednesday, February 20, 2019 3:40 AM

All replies

  • Hi Khoi-Thinh,

    Which version of HPC Pack are you using? Could you export the job XML and send it to hpcpack@microsoft.com for a further check?

    Regards,

    Yutong Sun

    Thursday, February 21, 2019 5:55 AM
  • Hi Yutong,

    I wrote the wrong number of tasks: it's 50000, not 5000.

    Also, after checking the HPC logs, my team found an internal error related to the SQL Server instance that backs the HPC cluster.

    We're using HPC Pack 2016 SP 2 with SQL Server Express on the head node.

    The error we got is: HPC Scheduler The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information

     

    I believe there is no way we can change the SQL query inside the HPC cluster.

    Is there any way to address this issue?

    Friday, February 22, 2019 1:11 AM
  • Hi Khoi-Thinh,

    Right, SQL Server Express has limited capacity when executing queries. In this case, you may consider using SQL Server Enterprise as a remote database for the HPC cluster. Please check this doc for database capacity planning for HPC Pack.

    Meanwhile, if possible, you may try using a Parametric Sweep task instead of normal tasks to reduce the stress on SQL Server and see if it works with Express.
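
    To illustrate the Parametric Sweep suggestion, here is a minimal Python sketch that only builds (and does not run) the HPC Pack `job` command lines for adding a single sweep task covering indices 1 to 50000. The job ID 123 and the command `myapp.exe input*.dat` are hypothetical placeholders, and the `/parametric:<start>-<end>` and `/id:` switches are assumed from the HPC Pack `job add` and `job submit` commands, so please check them against your HPC Pack version. Whether this is enough to make validation succeed on SQL Server Express has not been tested here.

```python
from typing import List


def sweep_commands(job_id: int, first: int, last: int, command: str) -> List[List[str]]:
    """Build (but do not run) the CLI calls that add one parametric sweep task."""
    if first > last:
        raise ValueError("first must not exceed last")
    return [
        # One task definition covers the whole index range; the '*' in the
        # command line is meant to be replaced by the sweep index per instance.
        ["job", "add", str(job_id), f"/parametric:{first}-{last}", *command.split()],
        # Submit the already-created job by its ID.
        ["job", "submit", f"/id:{job_id}"],
    ]


if __name__ == "__main__":
    # Hypothetical job ID and application; replace with your own values.
    for argv in sweep_commands(job_id=123, first=1, last=50000, command="myapp.exe input*.dat"):
        print(" ".join(argv))
```

    The point of the sketch is that the job carries one task definition with an index range rather than 50000 separately defined tasks.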

    Regards,

    Yutong Sun

    Friday, February 22, 2019 6:43 AM
  • Thanks Yutong,

    I will check the remote DB option.

    Monday, February 25, 2019 12:52 AM