Server goes offline after an hour

  • Question

  • After running more or less problem-free for well over a year, my WHS 2011 started, about a week ago, to malfunction. Symptoms: no backups for the last week. Launchpad reports server offline. No access via remote desktop. My router doesn't see the server-- it really is offline...

    If I do a hard reboot, the server comes back online. I can then try to do a manual backup, but after about an hour, the server goes offline again; back to square one.

    This may be of no consequence, but I observed that when I'm running the manual backup, the tray icon never turns blue like it should--it just stays yellow. Backup gets to about 10% and then fails.

    After one recent hard reboot, I looked in the error log of the server and found that it said that the server hadn't been shut down properly. So I did a software restart and tried again, but I got the same story.

    Thanks for any help.


    Saul

    Sunday, July 15, 2012 5:27 AM

All replies

  • Saul,

    I hope that others will chime in with suggestions about resolving this issue, but I just wanted to pick up one thing that you wrote:

     "I observed that when I'm running the manual backup, the tray icon never turns blue like it should--it just stays yellow"

    In WHS 2011, unlike WHS v1, the tray icon will never turn blue during a backup. Microsoft removed that feature. The fact that it's yellow does point to some sort of network issue...

    Sunday, July 15, 2012 4:17 PM
  • Thanks, Geoff, I must have been imagining things with that blue color. :-) Not imagining the server going offline, however...

    Saul

    Monday, July 16, 2012 6:37 AM
  • I remember having had an issue where a specific client would not be able to see my server and no backups would be taken. In my particular case this was a network issue that was resolved by rebooting the switch to which my client was connected.
    Monday, July 16, 2012 8:18 AM
  • Thanks, Sven. In this case, it applies to all the clients, and I tried rebooting the router to no avail...

    Saul

    Tuesday, July 17, 2012 3:47 AM
  • I see on this thread (http://social.microsoft.com/Forums/en-US/whsvailbeta/thread/352134db-6316-48d2-8367-1b5e3002b3c4) that IPv6 might be an issue. Any thoughts on that?

    Saul

    Tuesday, July 17, 2012 3:58 AM
  • Hi Saul,

    Regarding the IPv6 angle discussed in the post to which you referred: I can understand the theory, but the question I ask myself is whether it applies to your situation. As you stated yourself, your system has been up and running without significant issues for well over a year. So why did this problem start occurring now? From what I understand of the other post, that issue applied to a fresh setup.

    If I were you, I would focus on any changes that were introduced in your network right before this problem started occurring:

    • Perhaps a new device was added somewhere, not directly related to your server, but somehow influencing it?
    • Perhaps new software was installed on your server (e.g. firewall, add-ins)?
    • Perhaps passwords were changed?
    • Also, check your event logs for any errors or warnings.

    In addition, it might be useful if you post a little more information about your setup and problem:

    • Do you have a single router, with all clients and server connected directly to the router?
    • Are there any additional switches?
    • Do you have any wireless connections?
    • Does your server have a fixed IP address (recommended), or is it assigned through DHCP?
    • Are you able to ping the server by name and by IP address before/after it goes offline?
    • When it goes offline, is it still running, and does the server have a valid IP address?
    • Are you able to ping from the server to another machine before/after it goes offline? 
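The ping checks in the list above could be left running on a client so that the moment the server drops off the network gets recorded. This is only a minimal sketch, assuming Python is installed on a client PC; the host name `SERVER` and the address `192.168.1.10` used in the note below are placeholders, not values from this thread.

```python
import platform
import subprocess
import time
from datetime import datetime


def ping_command(host, count=1):
    """Build a ping command; Windows uses -n for the count, most other systems use -c."""
    flag = "-n" if platform.system().lower() == "windows" else "-c"
    return ["ping", flag, str(count), host]


def is_reachable(host):
    """Return True if a single ping to host exits with status 0."""
    result = subprocess.run(
        ping_command(host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def format_status(when, host, reachable):
    """Format one log line: timestamp, host, and 'up' or 'DOWN'."""
    state = "up" if reachable else "DOWN"
    return f"{when:%Y-%m-%d %H:%M:%S} {host} {state}"


def monitor(hosts, rounds, interval_seconds=60):
    """Ping every host once per round and print one status line per host."""
    for _ in range(rounds):
        for host in hosts:
            print(format_status(datetime.now(), host, is_reachable(host)))
        time.sleep(interval_seconds)
```

Calling something like `monitor(["SERVER", "192.168.1.10"], rounds=120)` would check both the name and the IP address once a minute for two hours. If the name fails while the address still answers, that points toward name resolution rather than the server itself. On Windows, ping can sometimes exit with status 0 when another device replies "Destination host unreachable", so an "up" line is best treated as a hint rather than proof.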

    Regards,

    Sven

    Tuesday, July 17, 2012 8:00 AM
  • "server goes offline": anything in the event logs on the server?

    It sounds, though, like something has configured the server (or at least the server's NIC) to sleep after a period of inactivity.


    I'm not on the WHS team, I just post a lot. :)


    • Edited by Ken Warren Tuesday, July 17, 2012 1:57 PM
    Tuesday, July 17, 2012 1:56 PM
  • Thanks, Ken. Yes, the event logs seem to be jam-packed with events. In the Administrative Events Log I notice a gap from 7/7 (the day of the last successful backup) to 7/11, when I first noticed the problem and rebooted the server. Since then, there seems to be a pattern of lots of events, followed by a gap when the server is down, presumably. Here's an interesting one:

    Another one says Critical Level, Source: Kernel-Power. The last one before it went offline last night was Source: PrintService, Task Category Initializing. There are no printers attached to the server...
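The pattern described above (bursts of events separated by silent stretches) can be checked mechanically by looking for long gaps between consecutive event timestamps. Here is a rough sketch, assuming the timestamps have already been exported from Event Viewer into a Python list; the sample times and the 30-minute threshold are made up for illustration.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, threshold=timedelta(minutes=30)):
    """Return (start, end) pairs where consecutive events are further apart than threshold."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > threshold:
            gaps.append((earlier, later))
    return gaps


# Made-up sample: events stop at 09:55 and resume at 18:40 after a reboot.
events = [
    datetime(2012, 7, 16, 9, 0),
    datetime(2012, 7, 16, 9, 30),
    datetime(2012, 7, 16, 9, 55),
    datetime(2012, 7, 16, 18, 40),
    datetime(2012, 7, 16, 18, 45),
]

for start, end in find_gaps(events):
    print(f"No events between {start} and {end} ({end - start})")
```

The last event logged before each gap is the one worth reading closely, since it is the nearest thing to a record of what the server was doing when it stopped.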

    (There doesn't seem to be any way to attach a complete log file to this posting, is that correct?)

    Is there another log that would be more informative?


    Saul

    Tuesday, July 17, 2012 2:34 PM
  • Thanks, Sven, I will post the info you suggest tonight.

    Saul


    • Edited by scandib Tuesday, July 17, 2012 2:36 PM typo
    Tuesday, July 17, 2012 2:36 PM
  • ... Critical Level, Source: Kernel-Power. ...

    Do you, by any chance, have a UPS connected to your server, complete with data cable to allow monitoring?

    I'm not on the WHS team, I just post a lot. :)

    Tuesday, July 17, 2012 3:09 PM
  • Yes, the server is plugged into a UPS, a Tripplite. No data cable that I know of, however. What would that consist of?


    Saul

    Tuesday, July 17, 2012 3:23 PM
  • Never mind. I misread what you typed. The UPS could potentially be a source of problems at some point (an offline or line-interactive UPS may "panic" when it switches to battery because the output voltage sags; this is why I use only expensive online UPSes), but it isn't likely to be generating the error you mention.

    Take a look at this Microsoft knowledgebase article. I think the most likely scenario will be #3, followed by #1. Best of luck if it's hardware related; we will not be able to solve that for you.


    I'm not on the WHS team, I just post a lot. :)

    • Marked as answer by Sean Zhu - Monday, July 23, 2012 6:35 AM
    Tuesday, July 17, 2012 6:08 PM
  • Hi,

    Did you ever get this sorted? I am having the same problem with WHS 2011 hard hanging after an hour. I get the error message referred to in the knowledgebase article appearing in the log when this happens. I'm sure it is scenario 3, but there is no overheating and I can't think what other hardware problems could have caused this. My server is not connected to a UPS.

    Is there any significance in the 1 hour between restart and hanging? Any chance this could be a virus?

    Keith

    Wednesday, August 1, 2012 7:15 AM
  • Hi Keith--

    I'm still working on it--no solution yet. I took a break to deal with other issues that I thought might be affecting this one--specifically slow download speeds from Comcast. Got that taken care of, but there was no change to the WHS 2011 problem.

    However, I have done quite a bit of troubleshooting and I've had some interesting results:

    • I've checked my hardware, and it all appears to be okay--all of the hard disks show up as healthy in Disk Management; I also ran chkdsk on all of them.
    • I opened the case and reseated the memory.
    • I flashed the BIOS (I have a Zotac board). That produced the most interesting result--when I restarted after flashing the BIOS, for the first time the Launchpad reported that the server was online. (No little flashing flag in the lower left-hand corner.) It looked good--but then I got a blue screen.

    • I'm currently seeing what happens when I run in Safe Mode with Networking. Started it up that way this morning, and I'll check when I get home to see if it is still running.
    • Also, looking in one of the error logs, I noticed that there was a Backup error--supposedly, the server could not find a suitable disk on which to run a backup. This despite the fact that I have over 600 GB free on the specified backup drive (which is actually a spanned drive consisting of two 2 TB drives, making a total of 4 TB).

    I suspect that my troubles relate to this spanned drive, although I'm not sure why that should be. After all, it seemed to function fine for a year. But the behavior now is reminiscent of the behavior I saw shortly after I built this server, when I tried to use the Drive Bender add-in. (see my post from then.) At that time, the server would also go offline after an hour or so. When I reformatted the drives with the spanned drive, that issue went away.


    Saul

    Wednesday, August 1, 2012 5:39 PM
  • And I can now report that when I got home, the server was displaying a blue screen. "A process or thread crucial to system operation has unexpectedly exited or been terminated"

    Suggestions for the next step in troubleshooting? Apparently there is a crash dump, although I'll have to locate it.

    Thursday, August 2, 2012 4:05 AM
  • You will need to look up the specific STOP code shown on the blue screen (and possibly some of the additional codes as well) to determine exactly what failed, then spend some time with your search engine of choice to see what might be causing the problem. Memory (a bad stick) is always a popular choice. :)

    I'm not on the WHS team, I just post a lot. :)

    Thursday, August 2, 2012 3:22 PM
  • Interesting - thanks for the replies. I'll try reseating the memory too.

    My hardware is an HP MicroServer with the OS on a 64 GB SSD and 4x1TB WD Greens running as a software RAID 5 array. I wonder if the RAID 5 is causing an issue.

    Also interesting about Drive Bender - I tried installing that a while ago but decided RAID 5 was better for me. I don't think I ever uninstalled it, so I will look into that as well.

    Keith

    Thursday, August 2, 2012 5:29 PM
  • Given that you are running RAID 5, and the WD Green drives are not recommended in a RAID environment, could that be the root cause of your problems? My understanding is that the RAID can get confused by timing problems while waiting for the Green drives to start up.

    I like and use the WD Green drives but do not use RAID. My duplicator of choice is StableBit DrivePool.

    Dave


    The Frog on the Lilypad at Home


    • Edited by frogz1 Thursday, August 2, 2012 8:16 PM typo
    Thursday, August 2, 2012 8:15 PM
  • Thanks, Ken, that sounds like my next step.

    Saul

    Friday, August 3, 2012 5:57 PM
  • I'm not using RAID, but I also have the OS on a 64GB SSD. I wonder if that could be a factor?

    Saul

    Friday, August 3, 2012 5:58 PM
  • Started it up again today and it crashed within an hour. No blue screen, but the last event recorded in the System Windows Log was an error, which read:

    "Event 36, volsnap: The shadow copies of volume D: were aborted because the shadow copy storage could not grow due to a user imposed limit."

    Any idea what that means?


    Saul

    Saturday, August 4, 2012 1:13 AM
  • And D: is my spanned disk where client computer backups are supposed to go. Dashboard says it has 626.5 GB of free space...

    Saul

    Saturday, August 4, 2012 1:16 AM
  • Hmm,

    Sounds like my first step is to back out the RAID 5 and take the drives back to standard. Not all bad, as the array is pretty slow. Funny that it could suddenly cause an issue after 6 months running OK - could be something to do with capacity, though.

    If that doesn't work I'll switch out the SSD for a standard HDD and see what happens.

    Saul - what SSD is yours? Mine's a Crucial M4.

    Will report back when I get this done.

    Cheers for the help and advice.

    Keith

    Monday, August 6, 2012 11:31 AM
  • Yep, I have the same SSD. Would be a shame if that was the problem.

    I ran a memory test (windiag) on my memory and it passed with no errors, but I'm going to try substituting different memory anyway.

    If that doesn't change anything, the only other hardware items are the drives, the CPU, and the MOBO, so if it's a hardware problem (as it appears to be) it has to be one of those.


    Saul

    Tuesday, August 7, 2012 2:09 AM
  • Hi,

    I'm thinking it might be the SSD - especially given that's the common factor between our two systems.

    I've done a bit of reading and it seems there have been some problems with the M4 involving random freezes that often recur after a short period. I'm thinking that if the OS has frozen, then of course the server would stop responding to all requests until rebooted, which is exactly what I'm experiencing.

    Some interesting threads here

    http://forum.crucial.com/t5/Solid-State-Drives-SSD/Freezing-of-the-OS-on-the-Crucial-M4/td-p/76674

    and here

    http://forum.crucial.com/t5/Solid-State-Drives-SSD/Crucial-M4-128-GB-Random-Freezes/td-p/95787

    I'm going to try swapping mine out for the 250 GB standard HDD that came with my server, reinstall the OS, and see what happens.

    Keith

    Wednesday, August 8, 2012 11:54 AM
  • And this one...

    http://forum.crucial.com/t5/Solid-State-Drives-SSD/SSD-constantly-freeze-after-5000-hours-can-t-update-firmware/td-p/104754

    My drive was probably somewhere near the 5000 hours mark when this started happening, as it's in a 24x7 server.
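As a quick sanity check on that estimate, the arithmetic for a drive that is powered on around the clock can be sketched as follows. The 5000-hour figure comes from the linked thread's title; the function name is just illustrative.

```python
def power_on_days(hours, hours_per_day=24):
    """Convert power-on hours to days of operation at the given daily duty."""
    return hours / hours_per_day


days = power_on_days(5000)
print(f"{days:.0f} days, or about {days / 30:.1f} months of 24x7 running")
```

That works out to roughly 208 days, a little under seven months, which is at least in the same range as the six months of trouble-free running mentioned earlier in the thread.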

    I'm now going to try updating the M4 firmware before swapping out the OS drive. Fingers crossed!

    Keith

    Wednesday, August 8, 2012 12:13 PM
  • Sorry for the multiple posts - keep finding new stuff. Think this thread says it all...

    http://forum.crucial.com/t5/Solid-State-Drives-SSD/BSOD-Crucial-M4/td-p/79098

    Wednesday, August 8, 2012 12:18 PM
  • Wow, this really sounds like the problem. Thanks for digging this stuff up! And I haven't even gotten to the other threads yet. :-)

    I wonder if it's just the Crucial SSD or if it's a risk with any of them?


    Saul

    Thursday, August 9, 2012 5:33 AM
  • Same here, right around 5000 hours, in that ballpark at any rate. Let me know how the firmware update goes.

    Saul

    Thursday, August 9, 2012 5:35 AM
  • So, I managed to install the new firmware (000F) using the Windows installer. Not the best installer in the world, but it did the job.

    Device Manager verified the install and the server has now been running for 3 hours....

    I think it was a firmware bug on Crucial SSDs only - guess they're still getting to grips with these things!

    Hope it works for you - thanks for your original post, I'd never have found the problem otherwise!

    Keith

    Thursday, August 9, 2012 1:33 PM
  • Great news! I will give it a try tonight. I haven't looked at the instructions yet--do you install from a file on the SSD itself, or from a DVD, thumb drive, etc.?

    Saul

    Thursday, August 9, 2012 2:35 PM
  • Well, I went ahead and ran the update this morning. So far, looks good. Cautiously optimistic. When I left for work it had been running for about two hours and had nearly completed a server backup. Tonight I'll try client computer backups, assuming it's still running.

    One odd thing I observe, and this was happening even before the update--when I'm viewing the Computers and Backup tab of the Dashboard, and when a backup is in progress, the entire Dashboard screen "flashes"--kind of a hard refresh, every 10 seconds or so. I don't remember if it's always done that, or if it's something relatively new. It doesn't seem to happen on the other tabs of the Dashboard. Ever see anything like that? If everything else is okay, crash-wise, maybe I'll start another thread about the flashing issue.


    Saul

    Thursday, August 9, 2012 6:11 PM
  • It worked--no more crashes! But now I have another problem. The server runs, but it's offline. I can't connect via the Launchpad/Dashboard from client computers, or by remote access, and when I open the Dashboard directly by hooking a monitor up to the server, all the clients appear offline. My router has assigned the server a different IP address from what it had before, but I don't see why that should matter.

    I'll do a little more investigating and then maybe open a new thread...if it's not one thing it's another.


    Saul

    Friday, August 10, 2012 4:38 PM