WHS Console slow - when will this be fixed?


  • Does anyone know when MS is going to acknowledge and then fix the slow WHS Console problem? Searching these forums and the internet shows there is a general problem that needs to be addressed. It is far too common, and there has been no real acknowledgement of it or a fix. All the "solutions" I have seen amount to removing workload; none address the underlying issue. 

    Some background: I have had WHS for several years, first in a home-built server vastly oversized for the job (Supermicro/dual Xeons, 2GB memory; you get the picture), and for the last year in an HP EX495, also overbuilt for the task. Both systems used multiple drives: (6) 500GB drives in the first and (4) 1TB drives in the second, all of a matching generation. I have 1.5M+ files and several client systems backed up. Gigabit Ethernet, high-end switches and routers: I do not have a hardware problem. 

    The configuration is 10+ shares, all duplicated except the video share. I do not have any plug-ins installed; I gave up on the McAfee add-in because it's McAfee, with its obvious per-file overhead. I ended up leaving Firefly and similar services in place to stick with a standard install: whenever I try to remove them, behavior becomes even less predictable, and there is no way to reinstall them without reinstalling the entire HP stack. 

    Both systems had/have problems with extraordinarily slow console response. About 25% of the time the console is completely unresponsive: it would not come up in over an hour, and if it did, it would never return after the first click (I left it open for 24 hours). The other 75% of the time, the console takes 5-20 minutes to come up and 2-10 minutes per click. I never know when it will be responsive and when it will not. As I write this, I am delightfully surprised to note that the console is working for the first time in several weeks. 

    I have been in touch with HP; their only recommendation was to reinstall the WHS software stack, which I have done several times - no help. I have rearranged the priorities of all the CPU- and I/O-consuming processes - no help. Turned off search indexing - no help. I have stripped down everything I can think of - no help.

    Only one thing seems to change the responsiveness. When demigrator.exe is NOT running, the console is usable. When it IS running, just forget it. The bad news: if you kill the demigrator process, some other process restarts it. You cannot pause it or change its behavior, and lowering its priority makes no difference. I could chase the event path through the queues and the various processes, but that is a lot of work, which I should not have to do. 

    The conclusion I've come to is that this product does not scale up. My bet is that, because service response time to file I/O requests is NOT affected the same way, some serialized function sits in the console's path. To get to the basics: if I cannot administer the server through the console, the product is close to useless, particularly since server backups have to be started from the console. Great product idea; it just missed the mark on a key usability function. 

    Point is - from a macro perspective - there is an obvious architectural or technical problem with the console. When is Microsoft going to address this problem? If they have - can someone post a link to the problem and fix? At the very least - give us a functioning way to "pause" the problem process.  

     

    Sunday, September 5, 2010 4:56 PM

All replies

  • Probably never, because probably it actually is workload related. Here's how:

    Drive Extender has to manage a minimum of 2 items for every file on your server, and 3 for files in duplicated shares. In your case, probably 95% of your files are duplicated (excluding the videos folder isn't likely to cut the total much). Walking two copies of 1.5M files to make sure every changed file has been copied to its second shadow is simply going to take time. Getting through your entire collection in an hour (DE runs once an hour) means checking over 400 files per second just to determine whether anything has changed. That's a lot of I/O, and a good bit of CPU as well. It's entirely possible that Drive Extender is running almost constantly; it's even possible that a single DE pass takes more than an hour.
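A quick back-of-the-envelope check of the arithmetic above, as a Python sketch. The inputs are the figures from this thread (1.5M files, an assumed 95% duplicated, one pass per hour); the 2-vs-3 items-per-file rule is taken from the paragraph above, and the function names are illustrative only, not anything from WHS.

```python
# Back-of-the-envelope estimate of one Drive Extender pass.
# Assumptions: figures quoted in this thread; 2 items per file,
# 3 items per duplicated file; one pass per hour.

FILES = 1_500_000          # total files on the server
DUPLICATED_SHARE = 0.95    # fraction of files in duplicated shares (estimate)
PASS_SECONDS = 60 * 60     # DE runs once an hour


def items_to_check(files: int, duplicated_share: float) -> int:
    """Items DE must examine: 2 per plain file, 3 per duplicated file."""
    duplicated = round(files * duplicated_share)
    plain = files - duplicated
    return plain * 2 + duplicated * 3


def rate_needed(count: int, seconds: int) -> float:
    """Items (or files) per second needed to finish within one pass window."""
    return count / seconds


if __name__ == "__main__":
    items = items_to_check(FILES, DUPLICATED_SHARE)
    print(f"files/s needed: {rate_needed(FILES, PASS_SECONDS):.0f}")
    print(f"items to check per pass: {items:,}")
    print(f"items/s needed: {rate_needed(items, PASS_SECONDS):.0f}")
```

With these inputs, finishing within the hour means roughly 417 files per second, or about 1,229 items per second once the duplicated shadows are counted, before any actual copying is done.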

    Combine that with CPU- and memory-hungry add-ons and you'll end up with exactly what you describe: the console (or any other interactive task) will be sluggish because of high-priority background tasks and limited resources (in this case, a relatively slow CPU and probably limited memory as well).


    I'm not on the WHS team, I just post a lot. :)
    Sunday, September 5, 2010 6:15 PM
    Moderator
  • Ken, thank you for the information.

    It confirms my assertion that this is an architectural issue. Why the WHS team didn't simply extend the file system was something I didn't understand (I also know that most people do things for very good reasons, so I am not throwing rocks here). No file system platform should have to spin through every single file system object to establish the working set of maintenance operations. Algorithms for delta processing are legion; it just means you have to instrument the work-producing functions to use a work queue. The team went to the trouble of using Message Queuing, and then didn't use it to track the set of asynchronous operations? 
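To illustrate the distinction being drawn here (a periodic full scan versus delta processing from a work queue), here is a minimal Python sketch. It models the general technique only, assuming each file is tracked by a simple version number; it is not how Drive Extender is implemented and does not use Message Queuing. The class names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FullScanDuplicator:
    """Periodic reconciliation: every pass walks every file to find changes."""
    primary: dict[str, int]   # path -> version on the primary copy
    shadow: dict[str, int]    # path -> version on the duplicate copy
    inspected: int = 0

    def run_pass(self) -> list[str]:
        copied = []
        for path, version in self.primary.items():
            self.inspected += 1               # cost grows with TOTAL file count
            if self.shadow.get(path) != version:
                self.shadow[path] = version
                copied.append(path)
        return copied


@dataclass
class QueuedDuplicator:
    """Delta processing: writers enqueue changed paths; a pass touches only those."""
    primary: dict[str, int]
    shadow: dict[str, int]
    pending: deque[str] = field(default_factory=deque)
    inspected: int = 0

    def write(self, path: str) -> None:
        # The work-producing function records its own change.
        self.primary[path] = self.primary.get(path, 0) + 1
        self.pending.append(path)

    def run_pass(self) -> list[str]:
        copied = []
        while self.pending:
            path = self.pending.popleft()
            self.inspected += 1               # cost grows with number of CHANGES
            # A path written twice is queued twice; the second visit is a no-op.
            if self.shadow.get(path) != self.primary[path]:
                self.shadow[path] = self.primary[path]
                copied.append(path)
        return copied


if __name__ == "__main__":
    files = {f"file{i}": 1 for i in range(10_000)}
    scan = FullScanDuplicator(dict(files), dict(files))
    queued = QueuedDuplicator(dict(files), dict(files))
    scan.primary["file42"] += 1               # one change, made directly
    queued.write("file42")                    # the same change, via the queue
    print(scan.run_pass(), scan.inspected)
    print(queued.run_pass(), queued.inspected)
```

Both passes copy the one changed file, but the full scan inspects all 10,000 entries to find it while the queued pass inspects one. The trade-off is that every code path that modifies files must reliably enqueue its change; a missed enqueue means a missed copy, which a periodic full scan would still catch.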

    Just for the record, I'm not using any add-ons. They're all turned off. 

    My questions still stand: 

    1. When is Microsoft going to fix this? 
    2. If it is "working as designed" (ergo - not fixing this), when is Microsoft going to specify practical limits to the use of this platform? 
    3. If it is "working as designed", is Microsoft going to improve the scalability of the product? 
    4. In the short term - can someone provide the user community with a tool to pause the demigrator so we can get administrative functions done? At the very least give us the information we need to do it ourselves with a command line script. 
    I guess the last point is - where is the appropriate place to post these questions?  

    Sunday, September 5, 2010 7:34 PM

    My questions still stand: 

    1.  When is Microsoft going to fix this?

    Ken gave you the best answer he could, which is "probably never". The product is now 3 years old and the new version of WHS (codenamed Vail) is now in beta testing. FWIW, DE in the new version is significantly different and probably wouldn't produce the issues you're seeing now. (Vail is block-based and DE runs in real time, whereas v1 is file-based and DE does its checks hourly.) 
    2.  If it is "working as designed" (ergo - not fixing this), when is Microsoft going to specify practical limits to the use of this platform?

    I don't see them ever coming out and saying a finite limit to the number of files that can be stored on WHS. 

    3.  If it is "working as designed", is Microsoft going to improve the scalability of the product?

    Chances are the answer here is the same as the answer to your first question. 

    4.  In the short term - can someone provide the user community with a tool to pause the demigrator so we can get administrative functions done? At the very least give us the information we need to do it ourselves with a command line script.

    MS is not going to write a tool to effectively "break" their own product.  At best, perhaps some individual could, but even then, it would cause issues with duplication and unless a user knew exactly how to use it and understood its effects, they would be foolish to use it at all.

    I guess the last point is - where is the appropriate place to post these questions?
    The best way to get in contact with the WHS team directly is to file a bug report/product suggestion on Connect.
    Sunday, September 5, 2010 8:08 PM
    Moderator
  • Awesome. Thank you!
    Sunday, September 5, 2010 8:53 PM
  • 4.  In the short term - can someone provide the user community with a tool to pause the demigrator so we can get administrative functions done? At the very least give us the information we need to do it ourselves with a command line script.

    MS is not going to write a tool to effectively "break" their own product.  At best, perhaps some individual could, but even then, it would cause issues with duplication and unless a user knew exactly how to use it and understood its effects, they would be foolish to use it at all.

    Why not integrate such functionality directly into the console? When the console initializes, it could pause the demigrator so the console loads in a timely fashion. When the console is closed, it could then either resume the demigrator or keep it paused. Perhaps the console could also close itself by default after 15 minutes of inactivity.
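A minimal sketch of the proposed console behavior, in Python. `BackgroundJob`, `pause()` and `resume()` are hypothetical: WHS v1 offers no supported way to pause the demigrator, so this only illustrates the suggested control flow (pause on open, optionally resume on close, auto-close after 15 minutes idle). The clock is injectable so the idle timeout can be tested without waiting.

```python
import time
from typing import Callable, Protocol


class BackgroundJob(Protocol):
    """Hypothetical control surface; WHS v1 exposes no such API for the demigrator."""

    def pause(self) -> None: ...

    def resume(self) -> None: ...


class ConsoleSession:
    """Pauses a background job while the console is open (the proposal above)."""

    def __init__(
        self,
        job: BackgroundJob,
        idle_timeout_s: float = 15 * 60,
        resume_on_close: bool = True,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.job = job
        self.idle_timeout_s = idle_timeout_s
        self.resume_on_close = resume_on_close
        self.clock = clock
        self.is_open = False
        self.last_activity = 0.0

    def open(self) -> None:
        self.job.pause()                  # free resources before the console loads
        self.is_open = True
        self.last_activity = self.clock()

    def touch(self) -> None:
        """Record user activity (e.g. a click) to postpone the idle timeout."""
        self.last_activity = self.clock()

    def check_idle(self) -> bool:
        """Close the session if it has been idle too long; True if it closed."""
        if self.is_open and self.clock() - self.last_activity >= self.idle_timeout_s:
            self.close()
            return True
        return False

    def close(self) -> None:
        if not self.is_open:
            return
        self.is_open = False
        if self.resume_on_close:
            self.job.resume()             # let duplication catch up after admin work
```

The caveat raised earlier in the thread still applies: while the job is paused, duplication falls behind, which is why `resume_on_close=True` is the default in this sketch.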
    Thursday, January 20, 2011 7:07 AM