SMB Shares stop working after a while, requiring reboot to fix

  • Question

  • Hi All

    I have been having issues with my WHS 2011 box and was hoping for suggestions on what I can try, as it is causing me a lot of grief (or, more correctly, annoying the wife and therefore requiring urgent attention!)

    Basically, after a random amount of time (anywhere from half a day to several days), my network shares stop working from every computer on the network. If I try to access them via a mapped drive, \\server or any other way, the client just sits there: no timeout, nothing happens.

    I can still reach the server during this time (via RDP, or via Radmin, which I use to remote-control the box), including the SQL Server instance I have installed on it. Everything seems to work fine except the network shares. Via Radmin, if I open \\server on the server itself, I can view and access all the shares without a problem.

    I saw a few threads such as http://social.microsoft.com/forums/en-US/whssoftware/thread/9f4393e9-4289-40cd-aa00-6dcd0e2f29ea/ which described similar circumstances. Although they are about WHS 1, I thought I would see whether copying a lot of data would trigger the issue. I copied around 80 GB across the network yesterday with no problems at all. It was still working this morning, but when I got home from work tonight the issue had occurred.

    I thought the network adapter might be locking up, so I set up a script that uses devcon.exe to disable and then re-enable the network card. (If I instead disabled the adapter directly over RDP or Radmin, I'd be kicked out and then couldn't get back in to re-enable it!) I ran the script via Radmin tonight, and it definitely cycled the network adapter, but after it came back up I still could not access the network shares. So I rebooted, and all was good again.
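Cycling the adapter from a script works because the disable and the re-enable both run inside one process on the server, so losing the remote session in between does not matter. The following is a minimal Python sketch of that kind of script, not the one used here. It assumes devcon.exe (from the Windows Driver Kit) is on the PATH. The hardware-ID pattern is hypothetical; the real one can be listed with `devcon hwids *`. devcon also has a single `restart` command that does the same job.

```python
import subprocess
import time

# Hypothetical hardware-ID pattern for an onboard Realtek gigabit NIC
# (PCI vendor 10EC). Replace it with the ID reported by "devcon hwids *".
NIC_HWID = r"PCI\VEN_10EC&DEV_8168*"


def devcon_cycle_commands(hwid, devcon="devcon.exe"):
    """Build the devcon command lines that disable, then re-enable, a device."""
    return [
        [devcon, "disable", hwid],
        [devcon, "enable", hwid],
    ]


def cycle_adapter(hwid=NIC_HWID, pause_seconds=5, dry_run=True):
    """Disable the adapter, wait, then re-enable it.

    Both steps run inside this one process, so a dropped RDP/Radmin
    session does not stop the re-enable. With dry_run=True the
    commands are only printed.
    """
    disable_cmd, enable_cmd = devcon_cycle_commands(hwid)
    if dry_run:
        print("would run:", " ".join(disable_cmd))
        print("would run:", " ".join(enable_cmd))
        return
    # Passing an argument list (no shell) keeps the "&" in the ID from
    # being treated as a command separator.
    try:
        subprocess.run(disable_cmd)
        time.sleep(pause_seconds)
    finally:
        # Always attempt the re-enable, even if the disable step failed,
        # so the server is not left without a network connection.
        subprocess.run(enable_cmd)


if __name__ == "__main__":
    cycle_adapter(dry_run=True)
```

The `try`/`finally` is the important design choice. A script that stops after a failed disable could leave the box unreachable, which is the situation the script exists to avoid.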

    I had already installed the latest drivers for the network card (Realtek onboard gigabit Ethernet), so my next plan was to buy a standalone Ethernet card to rule that out.

    If anyone has any extra suggestions of what I can try, I'd really appreciate it!

    Thanks

    Kane

     

    Monday, September 5, 2011 11:15 AM

All replies

  • Can you ping your server once you lose access to the shares? Have you investigated your server status once your shares drop off the network? Anything in the event log?

    Also, have you tried reinstalling Windows Home Server and leaving off line-of-business (LoB) applications like SQL Server? (Though they really should work: your license prohibits installing them, but the code base is shared with SBS Essentials, which does permit LoBs.)


    I'm not on the WHS team, I just post a lot. :)
    Monday, September 5, 2011 5:18 PM
  • Thanks for the reply Ken

    Yes, I can ping the server fine once I lose the shares. All other access works fine (SQL Server, IIS, etc.); it's just the shares.
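Ping still answers, so a useful next check from a client is whether the server still accepts TCP connections on the SMB port (445) while the shares hang. This separates a port that refuses or ignores connections from one that accepts the connection and then stalls. Below is a minimal standard-library Python sketch. Unlike the hung client described above, it gives up after a fixed timeout. The function name and the 5-second default are illustrative choices.

```python
import socket


def tcp_port_open(host, port=445, timeout=5.0):
    """Return True if a TCP connection to host:port completes within
    `timeout` seconds, False if it is refused, times out, or the name
    cannot be resolved."""
    try:
        # create_connection applies the timeout to the connect attempt;
        # the socket is closed again as soon as the check succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, socket.timeout and socket.gaierror
        # are all subclasses of OSError in Python 3.
        return False
```

For example, from a client you could compare `tcp_port_open("server", 445)` with `tcp_port_open("server", 3389)` (3389 is the default RDP port, and RDP is reported as still working). The host name `server` comes from the `\\server` path in the thread. If port 445 still accepts connections while the shares hang, the stall is more likely above TCP than in the network adapter.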

    I haven't tried reinstalling, and to be honest I want to leave that as the very last resort: firstly because this issue has been happening since a fresh installation anyway, and secondly because of the inconvenience.

    I have noticed one thing in the event log that could be related, although I'm not yet 100% sure it correlates with the times I get the issue. I have a RocketRAID 2680 card in the box, and every now and then I get the error "The device, \Device\Scsi\rr26801, did not respond within the timeout period." Last night, after seeing this, I installed the latest driver for the card and also flashed its BIOS to the latest version. So I'll play the waiting game again and see whether the issue comes back.
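To check whether the RocketRAID timeout really coincides with the share outages, you can compare the event times with the time the shares were found hung. The following is a minimal Python sketch. It assumes the System log entries have already been parsed into `(timestamp, message)` pairs; one way to dump them is `wevtutil qe System /rd:true /f:text`, but the parsing step is not shown. The function name and the 30-minute window are arbitrary illustrative choices.

```python
from datetime import timedelta

# Substring of the controller error quoted above.
TIMEOUT_TEXT = "did not respond within the timeout period"


def timeouts_near(events, outage_time, window=timedelta(minutes=30)):
    """Return the (timestamp, message) events that mention the device
    timeout and fall within +/- window of outage_time, oldest first."""
    hits = [
        (ts, msg)
        for ts, msg in events
        if TIMEOUT_TEXT in msg and abs(ts - outage_time) <= window
    ]
    return sorted(hits)
```

Calling this once for each outage and checking whether it usually returns at least one hit gives a rough answer to the "does it correlate?" question. A single coincidence proves little either way.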

    I agree the next step will have to be removing SQL Server to see if that is the culprit.

    Thanks

    Kane

     

    Monday, September 5, 2011 11:38 PM
  • Are you using enterprise drives with your RAID HBA? Every manufacturer has lines of drives intended for use in a RAID array; they include some form of shortened error recovery so that they won't "drop" from an array unexpectedly. The event you mention suggests that one of your drives (at least) is a consumer drive, and it's occasionally going into a long error recovery cycle (which can be up to 2 minutes in length, depending on manufacturer and the issue the drive thinks it's detecting). That plays hob with RAID HBAs, which will usually drop a drive if it doesn't respond within a few seconds. Whatever the actual cause of the event you mention, it's certainly the source of your issue. When the RAID array stops responding, everything on the array goes away as far as Windows is concerned.
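The mismatch described here is between two timers: how long a drive may spend retrying a bad sector, and how long the controller waits before giving up on it. On drives that support it, the retry limit can be read (and on some drives set) through SCT Error Recovery Control. smartmontools exposes this as `smartctl -l scterc`, with the times given in units of 100 ms.

The Python sketch below only builds those command lines and compares the two timers. The device name is a placeholder. Drives behind a RAID HBA may not be reachable this way at all, and consumer drives may report the feature as unsupported. The 7-second default is a commonly used value, not a requirement, and the 8-second controller timeout in the usage note is an assumption.

```python
def scterc_query_command(device):
    """Command that reports the drive's current SCT ERC read/write limits."""
    return ["smartctl", "-l", "scterc", device]


def scterc_set_command(device, read_seconds=7.0, write_seconds=7.0):
    """Command that sets the SCT ERC limits; smartctl expects tenths of a second."""
    read_ds = round(read_seconds * 10)
    write_ds = round(write_seconds * 10)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


def outlasts_controller(drive_recovery_s, controller_timeout_s):
    """True if a drive could still be retrying after the controller has given up."""
    return drive_recovery_s > controller_timeout_s
```

For example, `outlasts_controller(120, 8)` is True. A drive that spends up to two minutes on recovery, in front of a controller that waits only a few seconds, is exactly the case described above.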

    SQL Server is unlikely to be the culprit. That said, if you need to run a LoB on your server, to remain in compliance with the license you should be running Windows Small Business Server Essentials 2011, not Windows Home Server 2011.


    Tuesday, September 6, 2011 2:14 AM
  • Oh, perhaps I've erred here. I did a bunch of reading when setting it up, and I'm sure I saw that many people had used WD EARS drives in a RAID array with no problems. (I have four 2 TB WD EARS drives, model 00MVWB0, in a RAID 5 array.) I wasn't particularly interested in performance gains; I just wanted some redundancy.

    Also, I've loaded up the RR2680 console app, and there are no errors in its logs that suggest any failures. There are still some errors from a couple of months ago when I first set up the box (one of the brand-new drives failed to come up, so I got it replaced), but nothing since then.

    That being said, when my network share issue pops up I can still access the data on the RAID array with no problems at all. Do you think the issue you've mentioned could still apply? I'm hoping the driver and BIOS update will solve that error, but perhaps I'm being a bit too optimistic!
    Tuesday, September 6, 2011 2:46 AM

  • After doing some more reading, I'm wondering about these WD drives being Advanced Format (AF) drives. When setting up the WHS box, I read that WHS 2011 supports Advanced Format, so I didn't think I needed to use the WD calibration tool or put a jumper on the drives.

    But thinking about it more, I don't think WHS 2011 actually sees the drives themselves; it just sees the RAID device. So perhaps there could still be issues?
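Whether a partition is aligned for 4 KiB physical sectors comes down to its starting offset being a whole multiple of 4096 bytes. Behind a RAID controller, it should ideally also be a multiple of the array's stripe size. On Windows, the offsets can be listed with `wmic partition get Name,StartingOffset`. Below is a minimal Python sketch of the check. The 64 KiB stripe size is an assumed example value, not something reported in this thread.

```python
PHYSICAL_SECTOR = 4096  # Advanced Format drives use 4 KiB physical sectors


def is_aligned(starting_offset_bytes, unit_bytes=PHYSICAL_SECTOR):
    """True if a partition starting at this byte offset is aligned to unit_bytes."""
    return starting_offset_bytes % unit_bytes == 0


if __name__ == "__main__":
    one_mib = 1024 * 1024  # default partition offset on Windows Vista/Server 2008 and later
    legacy = 63 * 512      # sector-63 start used by Windows XP-era tools
    stripe = 64 * 1024     # assumed RAID stripe size, for illustration only
    print(is_aligned(one_mib), is_aligned(legacy), is_aligned(one_mib, stripe))
    # True False True
```

With a RAID HBA in between, this only checks alignment against the virtual disk the controller presents. Whether the controller then maps that cleanly onto the member drives' 4 KiB sectors is up to its firmware.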

    That being said, I'm not getting any of the reported issues with very slow speeds that seem to be an indication of these problems.

    I only have around 1 TB of data (and a spare non-RAID partition with more than 1 TB of free space), so I could certainly move all the data off and reconfigure things if need be.

    Would you definitely recommend against using these EARS drives in a RAID 5 array? What "enterprise" drives would you recommend? I'm not too keen on discarding the five 2 TB drives (one is a hot spare), but if I had to in order to fix this issue, I would!

    Thanks!

    Kane

    Tuesday, September 6, 2011 4:03 AM
  • Wait and see if the update to the RAID card helps. If there's an incompatibility between advanced format and your RAID HBA, though, you'll need to communicate with Highpoint to see how it can be resolved, if it can be resolved at all.

    As for enterprise drives, I don't generally spec hardware at that level. I'm more likely to say "X servers with Y sockets for quad CPU, ZZ GB RAM, SAN to support them, etc." Every major manufacturer has their line of enterprise drives, and they all deliver roughly comparable performance, so pick the one you like personally. But before I'd replace the drives, I'd ditch the RAID HBA.


    Tuesday, September 6, 2011 12:46 PM
  • Thanks Ken, I appreciate your help. And yes, hopefully the updates fix it (I'll definitely be posting my results for anyone else who has the same sort of issue).

    As a general rule for WHS boxes, would you recommend bypassing the RAID HBA and running the drives straight off the motherboard SATA ports? What strategy would you use for redundancy: software RAID?

    Cheers

    Kane

     

    Tuesday, September 6, 2011 12:51 PM
  • Redundancy is overrated. RAID is for high availability, not data protection. If you want data protection, back your server up and take the backup off-site.


    Tuesday, September 6, 2011 12:57 PM
  • After going just about 72 hours without an issue, I thought the RAID BIOS and driver updates had fixed it. Unfortunately, it looks like the issue has just recurred. I was actually at home when it happened, went straight into Event Viewer, and found that the pesky error messages from the RR2680 had popped up again.

    So it looks like it's definitely the RAID setup causing the issue; I'm not sure whether it's the drives, a dodgy HBA or something else.

    Oh well, I'll now get rid of the HBA completely!

    Thanks for your help - much appreciated

    Kane

     

    PS: regarding redundancy, I do back up, both with WHS Backup to two rotating USB hard drives and with a third USB hard drive that I manually copy all the important stuff onto every now and then (photos, documents, home videos of the kids, etc.).

    Thursday, September 8, 2011 6:52 AM