Too many jobs running at the same time in queued mode?

  • Question

  • I am running in queued scheduling mode with graceful preemption, and with "increase resources automatically", "grow by preemption" and "decrease resources automatically" enabled. Backfilling is disabled.

    Sometimes (but far from always) 10-20 similar jobs with the same priority, submitted at different times, are started and make progress at the same time. Each of these jobs has more single-core tasks than there are cores in the cluster.

    I would expect one job to complete (or rather, to have no more tasks in the queued state) before resources are allocated to the next job.

    Why do I see this resource allocation?

    Wednesday, October 28, 2015 3:33 PM

Answers

  • Hi Thomas,

      This issue has been reported to us before, but we have never been able to reproduce it. The next time you see it, please send the scheduler logs to hpcpack@microsoft.com (the latest two bin files under %CCP_DATA%LogFiles\Scheduler\HpcScheduler_0000*.bin). Meanwhile, we will keep trying to reproduce the issue on our side.


    Qiufang Shi

    • Marked as answer by Thomas Kofoed Thursday, October 29, 2015 9:00 AM
    Thursday, October 29, 2015 1:58 AM
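
    For anyone collecting the same logs: the sketch below is one possible way to pick out the two latest bin files mentioned in the answer. It is not part of HPC Pack. The helper name latest_scheduler_logs is hypothetical, and it assumes that "latest" means most recently modified and that the CCP_DATA environment variable is set on the head node.

    ```python
    import glob
    import os


    def latest_scheduler_logs(log_dir, count=2, pattern="HpcScheduler_0000*.bin"):
        """Return up to `count` files in log_dir matching pattern, newest first.

        Hypothetical helper: "newest" is judged by file modification time,
        which is an assumption about what "latest" means in the answer.
        """
        candidates = glob.glob(os.path.join(log_dir, pattern))
        candidates.sort(key=os.path.getmtime, reverse=True)
        return candidates[:count]


    # Usage: only runs where CCP_DATA is defined (normally the head node).
    ccp_data = os.environ.get("CCP_DATA")
    if ccp_data:
        scheduler_dir = os.path.join(ccp_data, "LogFiles", "Scheduler")
        for path in latest_scheduler_logs(scheduler_dir):
            print(path)
    ```

    The printed paths are the files to attach to the email. If the scheduler is still writing to the newest file, copying it before attaching may be safer.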

All replies

  • Many thanks. I have sent an email with the two latest bin files as the behavior persists.

    I am not sure whether this is relevant, but yesterday we rebooted a number of compute servers and switched them between the offline and online states. On Friday (6 days ago) we failed over the head node.

    Thanks for your time and efforts,
    Thomas

    Thursday, October 29, 2015 9:04 AM