WHS file-conflicts, disk failure & server hangs/reboots

  • Question

  • Okay, I have been having hardware/software problems on and off for the last week. I have removed two Seagate 1.5TB drives during this time. One reported a failure in SeaTools, and the other did not. However, the other one was knocking really loudly, so it obviously had some failure. One of the drives was reported as missing from the pool, but the other one never had a reported error (from the WHS standpoint).

    All the while, I was getting file conflict errors. I would delete these files and go on. I did an overnight chkdsk run on the individual drives (using a script mentioned in these forums), and it did find a few errors tied to the file conflicts (and supposedly fixed them). Now, what makes me wonder is: why am I getting these problems? What is the purpose of folder duplication if I get these file conflicts and non-reported drive failures? What is WHS doing for me? And just now, I got a "backup service not running" error and corrupt backups. I attempt a repair, and the WHS console crashes. All the while, all the drives are reported as healthy. What am I supposed to do? I went with WHS because I'm not a true techie, but I'm getting no indication of what might be wrong. I've replaced the motherboard as well. Do I need a fresh install of WHS?
    • Edited by bondisdead Tuesday, May 26, 2009 9:01 PM
    Saturday, May 23, 2009 7:18 AM

All replies

  • Okay, I have been having hardware/software problems on and off for the last week. I have removed two Seagate 1.5TB drives during this time. One reported a failure in SeaTools, and the other did not. However, the other one was knocking really loudly, so it obviously had some failure. One of the drives was reported as missing from the pool, but the other one never had a reported error (from the WHS standpoint).

    All the while, I was getting file conflict errors. I would delete these files and go on. I did an overnight chkdsk run on the individual drives (using a script mentioned in these forums), and it did find a few errors tied to the file conflicts (and supposedly fixed them). Now, what makes me wonder is: why am I getting these problems? What is the purpose of folder duplication if I get these file conflicts and non-reported drive failures?

    WHS doesn't report an error in the Console unless the nightly automatic chkdsk finds errors after 4 consecutive attempts.

    What is WHS doing for me? And just now, I got a "backup service not running" error and corrupt backups. I attempt a repair, and the WHS console crashes. All the while, all the drives are reported as healthy. What am I supposed to do? I went with WHS because I'm not a true techie, but I'm getting no indication of what might be wrong. I've replaced the motherboard as well. Do I need a fresh install of WHS?
    How many hard drives do you have total in the server (and what size and type are they, including manufacturer)?  What is the wattage on your power supply?  Is your server in a reasonably cool environment (or at least well ventilated)?
    Saturday, May 23, 2009 3:28 PM
    Moderator
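
The reporting rule described in the reply above (no Console alert until the nightly check has found errors four times in a row) can be modelled in a few lines. This is only a sketch of the behaviour as the moderator describes it, not WHS code; the function name and the list-of-booleans input are hypothetical.

```python
def console_should_alert(nightly_results, threshold=4):
    """Return True when the most recent `threshold` nightly checks all found errors.

    nightly_results: list of booleans, oldest first; True means that night's
    check found errors. Hypothetical model of the rule described in the thread.
    """
    if len(nightly_results) < threshold:
        return False
    return all(nightly_results[-threshold:])


# Errors on scattered nights never reach four in a row, so no alert is raised.
intermittent = [True, False, True, True, False, True, True]
# Four consecutive failing nights cross the threshold.
persistent = [False, True, True, True, True]

print(console_should_alert(intermittent))
print(console_should_alert(persistent))
```

Under this model a drive that fails intermittently can log errors for days without the Console ever turning red, which would be consistent with the "healthy" status the original poster kept seeing.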
  • Thanks for your response and the info on chkdsk. It has not reported any errors at all. It once reported drive missing, which I believed was a bad motherboard, as the sata port would no longer work. I have replaced the MB with an identical one (ASUS P5Q-E). Does WHS automatically attempt to  perform repairs using chkdsk, or is that up to me?

    I have eight drives in my system and a 500W power supply. It's an Antec P180 tower case. I've got a lot of fans in this case, including two that bring fresh air in and blow it across the drives. Temps are in the 30s and 40s Celsius. The hottest is the SYS drive, which is a Hitachi 1TB. I do wonder if it's causing problems, which is in turn corrupting the system? Here is my current setup: five Seagate 1.5TB drives, one Hitachi 1TB, one Samsung 1TB green drive and a temporary WD 1TB USB drive. The USB drive is there because I had to remove two Seagate 1.5TB drives. Total size is 9.55TB with 500GB free. All eight drives are healthy. Temps are currently in the 30s for all drives, except the Samsung, which is a cool 26C.

    If I run an individual chkdsk on the seven data drives (all but the Hitachi), they are clean. However, I notice a lot of bad sectors on most of the Seagate drives. I guess those drives really are garbage.

    After I clean things up (remove file conflict files, run chkdsk with repair), the system will be clean for almost a day. But just last night (when I typed up the message), everything went red. It's been okay now since last night, but I'll see how long it lasts!

    Could my Hitachi SYS drive be failing?
    Saturday, May 23, 2009 4:18 PM
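
As a quick sanity check on the pool size quoted above: the eight drives add up to 10.5 TB on their labels, yet the poster reports 9.55TB. The sketch below assumes the labels use decimal terabytes (10^12 bytes) and that the reported total uses binary units (2^40 bytes); neither is stated in the thread, and the dictionary keys are just descriptive labels.

```python
TB = 10**12   # decimal terabyte, assumed to be what drive labels use
TIB = 2**40   # binary unit, assumed to be what the reported total uses

# Drive inventory as listed in the post, in labelled (decimal) terabytes.
drives_tb = {
    "Seagate 1.5TB x5": 5 * 1.5,
    "Hitachi 1TB": 1.0,
    "Samsung 1TB": 1.0,
    "WD 1TB USB": 1.0,
}


def pool_size_binary(drives):
    """Convert the summed label capacity to binary units."""
    total_bytes = sum(drives.values()) * TB
    return total_bytes / TIB


print(round(pool_size_binary(drives_tb), 2))  # 9.55
```

Under those assumptions the reported figure accounts for all eight drives, so no capacity appears to be missing from the pool.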
  • Thanks for your response and the info on chkdsk. It has not reported any errors at all.

    I'm guessing if you looked at the event logs on the server, you would find a few hard drive errors there (but not 4 consecutive days worth, which would then trigger the Console notification).

    It once reported drive missing, which I believed was a bad motherboard, as the sata port would no longer work. I have replaced the MB with an identical one (ASUS P5Q-E). Does WHS automatically attempt to  perform repairs using chkdsk, or is that up to me?

    The chkdsk it runs is read-only (which means you would need to run a full chkdsk /r to repair it, or just replace the drive).

    I have eight drives in my system and a 500W power supply. It's an Antec P180 tower case. I've got a lot of fans in this case, including two that bring fresh air in and blow it across the drives. Temps are in the 30s and 40s Celsius. The hottest is the SYS drive, which is a Hitachi 1TB. I do wonder if it's causing problems, which is in turn corrupting the system?

    Doubtful (certainly not causing your other drives to fail).

    Here is my current setup: five Seagate 1.5TB drives, one Hitachi 1TB, one Samsung 1TB green drive and a temporary WD 1TB USB drive. The USB drive is there because I had to remove two Seagate 1.5TB drives. Total size is 9.55TB with 500GB free. All eight drives are healthy. Temps are currently in the 30s for all drives, except the Samsung, which is a cool 26C.

    If I run an individual chkdsk on the seven data drives (all but the Hitachi), they are clean. However, I notice a lot of bad sectors on most of the Seagate drives. I guess those drives really are garbage.

    Are you aware of the Seagate 1.5 TB drive freezing issues when they first came out?  Any chance you have some of the bad ones?  (Do a Google search for Seagate 1.5TB freeze and you'll find more info on it.)  For what it's worth, I have 1 of those drives as well, but I made sure I had a more recent version before I started using it in my server.

    After I clean things up (remove file conflict files, run chkdsk with repair), the system will be clean for almost a day. But just last night (when I typed up the message), everything went red. It's been okay now since last night, but I'll see how long it lasts!

    Could my Hitachi SYS drive be failing?
    I guess it's possible (although not likely).  In any event, you could run chkdsk /r on both C and D partitions on your primary drive to verify that drive is healthy.
    Saturday, May 23, 2009 6:28 PM
    Moderator
  • Should I just run the chkdsk via Remote Desktop, using Explorer's drive properties to check each drive? This will then schedule the checks for the next boot. Is this the best way? Where do I find the log file? TIA.

    Oh, I am well aware of the Seagate 7200.11 problems. I updated all the firmware back in December/January with the supposedly "good stuff". There must still be issues with these drives, as there shouldn't be so many bad sectors.

    Saturday, May 23, 2009 6:32 PM
  • Should I just run the chkdsk via Remote Desktop, using Explorer's drive properties to check each drive? This will then schedule the checks for the next boot. Is this the best way?

    Yes.

    Where do I find the log file?

    The event logs can be found by right-clicking My Computer, then select Manage.  You will see Event Viewer.  Those are the event logs.

    TIA.

    Oh, I am well aware of the Seagate 7200.11 problems. I updated all the firmware back in December/January with the supposedly "good stuff". There must still be issues with these drives, as there shouldn't be so many bad sectors.

    Saturday, May 23, 2009 7:32 PM
    Moderator
  • I could not find those chkdsk logs in any of the Event Viewer logs. Nevertheless, I ran chkdsk /f on all drives. While this appears to have gotten rid of my file-conflict issues, my WHS is now locking up or shutting down by itself. It was locked up the other morning, and when I looked at log files, it had recovered from an unexpected shutdown on several occasions. There were messages about NTFS errors, and a corrupt drive. The volume number was listed, but I couldn't match it to a drive.

    I re-ran chkdsk in read-only mode and noticed that 4 of the 5 Seagate 1.5TB drives had bad sectors (the one with zero bad sectors has only been installed for 2 weeks). One of the drives had a ridiculous number of bad sectors, something like 98,000 KB, or .08% of the total disk space! However, when I ran the drive through all of the Seagate SeaTools tests, it passed: short/long SMART tests and short/long generic tests.

    My question is whether this is an indicator of failure, and could it be leading to the WHS locking up and shutting down? Note above that I already replaced the motherboard.

    Tuesday, May 26, 2009 9:00 PM
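
To put a "KB in bad sectors" figure from chkdsk in proportion, it can be expressed as a share of the drive. The helper below is a hypothetical sketch; it assumes chkdsk's KB means 1024 bytes and the drive's 1.5TB is a decimal label, and it uses the poster's approximate numbers, so the result need not match the percentage quoted in the post.

```python
def bad_sector_percent(bad_kb, drive_tb):
    """Share of a drive reported as bad sectors, in percent.

    bad_kb: "KB in bad sectors" as printed by chkdsk (assumed 1024-byte KB).
    drive_tb: labelled capacity in decimal terabytes (assumption).
    """
    bad_bytes = bad_kb * 1024
    drive_bytes = drive_tb * 10**12
    return 100.0 * bad_bytes / drive_bytes


pct = bad_sector_percent(98_000, 1.5)
print(f"{pct:.4f}% of the drive")
```

Either way the share is a small fraction of one percent. Comparing the figure between successive chkdsk runs is probably more informative than the absolute value, since a count that keeps growing suggests the drive is still deteriorating.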
  • Check your memory (memtest86+ is okay, and free, but you should let it run for at least 8 hours), and your power supply.
    I'm not on the WHS team, I just post a lot. :)
    Tuesday, May 26, 2009 9:51 PM
    Moderator
  • Check your memory (memtest86+ is okay, and free, but you should let it run for at least 8 hours), and your power supply.
    I'm not on the WHS team, I just post a lot. :)

    Thanks for your response. I will try a memtest86+ run tonight. The power supply is 430 watts, and the eXtreme Power Supply Calculator Lite v2.5, for my 8-drive system under an unrealistic 90% load condition, recommends < 350 watts. I think I'm okay there, unless you are suggesting it might be faulty. I guess I could swap it out and see what happens.

    What do you think about the high number of bad sectors on the one drive and the log entries about NTFS errors?
    Tuesday, May 26, 2009 10:09 PM
  • I'm suggesting your power supply could be underpowered or faulty. Even though a calculator says your PS is adequate, you may still be putting too much stress on one rail (probably 5V), leading to power problems everywhere. And if your PS is putting out dirty enough power, you could be causing damage to other components as a result.

    As for the errors you're seeing, memory issues are a surprisingly common source of such problems, which is why I suggested a good memory test. 

    Tuesday, May 26, 2009 10:24 PM
    Moderator
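
The point about individual rails can be illustrated with a rough per-rail budget: total wattage may look fine while one rail is close to its limit. Everything numeric below is a placeholder for illustration only (per-drive currents and rail ratings are not from the thread); real values would come from the drive datasheets and the label on the power supply.

```python
from dataclasses import dataclass


@dataclass
class Load:
    name: str
    amps_5v: float    # current drawn from the 5V rail (placeholder value)
    amps_12v: float   # current drawn from the 12V rail (placeholder value)


def rail_headroom(loads, rating_5v, rating_12v):
    """Return remaining amps per rail after summing all loads."""
    total_5v = sum(load.amps_5v for load in loads)
    total_12v = sum(load.amps_12v for load in loads)
    return {"5V": rating_5v - total_5v, "12V": rating_12v - total_12v}


# Eight drives with made-up per-drive currents; other components are omitted.
drives = [Load(f"drive{i}", amps_5v=0.7, amps_12v=2.0) for i in range(8)]

# Rail ratings are also made up; read the real ones off the PSU label.
headroom = rail_headroom(drives, rating_5v=20.0, rating_12v=30.0)
for rail, amps in headroom.items():
    status = "OK" if amps > 0 else "OVERLOADED"
    print(f"{rail}: {amps:.1f} A headroom ({status})")
```

A fuller budget would add the motherboard, CPU, fans and USB devices to the list of loads, and would use spin-up current rather than idle current for the drives.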
  • I want a solid system, so I will test the RAM and swap out the power supply. It's an Antec EarthWatts 80%-efficient supply, so not a cheapie.

    As a side note, none of the other drives have reported bad sectors, just the notorious Seagate 1.5TB drives.

    Tuesday, May 26, 2009 10:34 PM
  • Start by testing memory thoroughly. Antec power supplies are usually pretty good, but it's always possible to get unlucky. You can try a different power supply, but first check the specs on your current supply. Make sure that you're not exceeding the capacity of any of the rails.
    Wednesday, May 27, 2009 3:12 PM
    Moderator
  • Thought I would update this thread with what I did. I ran memtest86+ overnight and did not find any problems. I replaced the drive that had a ton of bad sectors with another one. If I wrote zeros to the drive and reformatted it, the bad sectors would go away. The drive passed all Seagate short/long tests. I replaced my 480-watt Antec with a high-end 620-watt Corsair. Not sure it was an issue, but I did it just in case. In the meantime, another drive started "knocking" pretty frequently. This has been an early sign of failure on these drives. It only seems to happen when the drive gets close to full. That drive will also pass all Seagate tests.

    Knock on wood, system has been stable for over a week now! :-)

    I think the thing to do when I get file conflict errors in the future is to immediately run chkdsk, in fix-errors mode, on all the disks. In the past, it has identified sectors that have gone bad on the failing disk. The only problem is that I have had to run chkdsk interactively, as I need to see a log of the run. Or I need to get better at writing DOS scripts, so that I can dump the results somewhere I can find them!

    Saturday, June 13, 2009 3:32 PM
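
For the logging problem mentioned in the last post, one option is a small wrapper that runs chkdsk per volume and saves the output to a timestamped file. This is a minimal sketch, not something from the thread: it assumes a Python 3.7+ interpreter is available on the machine (not a given on a WHS box), and the function names, the `runner` parameter, the volume letter and the log folder are all hypothetical. It defaults to read-only mode because `chkdsk /f` on a volume that is in use can stop and ask whether to dismount or schedule the check for the next boot.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def run_chkdsk(volume, fix=False, runner=subprocess.run):
    """Run chkdsk on one volume and return (exit_code, captured_output).

    `runner` is injectable so the function can be exercised without chkdsk.
    """
    cmd = ["chkdsk", volume]
    if fix:
        cmd.append("/f")  # fix-errors mode; may prompt on volumes in use
    result = runner(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout


def log_chkdsk(volumes, log_dir, fix=False, runner=subprocess.run):
    """Check each volume and write one timestamped log file per volume."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    written = []
    for volume in volumes:
        code, output = run_chkdsk(volume, fix=fix, runner=runner)
        # Make the volume name safe to use inside a file name.
        safe = volume.replace(":", "").replace("\\", "_")
        path = log_dir / f"chkdsk-{safe}-{stamp}.log"
        path.write_text(f"exit code: {code}\n{output}")
        written.append(path)
    return written


# Example call (hypothetical volume and folder):
# log_chkdsk(["D:"], r"C:\chkdsk-logs")
```

Keeping one file per volume and per run makes it easy to compare the "KB in bad sectors" line between runs for the same drive.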
  • I just had a drive fail in my WHS. I powered the machine down, removed the bad drive, and powered back up. I was able to log in via the console and see the status of all shared folders. I then powered the machine back down.

    I purchased a new drive, put it in place of the previous drive (slot two), and powered back on. The new drive's light lit up purple, so I let it sit overnight to see if it would clear itself. Unfortunately, it didn't. Now I can't log into the machine at all. I powered down, removed the new drive, powered back up, and still can't log in.

    Any suggestions here would be greatly appreciated.  BTW, shared folders will show all the file directories, but you can't access them....weird.
    Thursday, June 28, 2012 2:42 PM