BitShop, Inc.
Ashburn, VA
(703) 940-6703
Fax:
(703) 563-3826


Written by: Steve Radich - Founder, BitShop, Inc.
1/12/2010 3:59 AM 

This is what you REALLY don't want to see when replacing a failed drive in a RAIDZ (RAID-5-like) array:

  pool: tank1
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: resilver in progress for 28h8m, 100.00% done, 0h0m to go
config: (note: 100% done doesn't mean done. I don't know who wrote the % done formula, but it's useless.)

        NAME             STATE     READ WRITE CKSUM
        tank1            DEGRADED     0     0 6.32K
          raidz1-0       DEGRADED     0     0 12.8K
            c7t8d0       DEGRADED     0     0     0  too many errors
            c7t4d0       DEGRADED     0     0     0  too many errors
            c7t10d0      FAULTED     27   563     0  too many errors
            c7t11d0      DEGRADED     0     0     0  too many errors
            c7t12d0      DEGRADED     0     0     0  too many errors
            replacing-5  DEGRADED     0     0     0
              c7t13d0    FAULTED      4 9.60K     0  too many errors
              c7t3d0     ONLINE       0     0     0  384G resilvered

errors: Permanent errors have been detected in the following files:

        /tank1/sqlbackups/...masked..._200911240001.BAK
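
When a status report is this noisy, it can help to pull out only the devices that are actually faulted or reporting read/write errors. The following is a minimal sketch, not part of any ZFS tooling: it parses the device table pasted above with a regular expression. The function name `parse_devices` and the dictionary keys are made up for illustration, and the counters are kept as text because values such as "9.60K" are abbreviated.

```python
import re

# Device table copied from the status output above.
STATUS_TABLE = """\
        NAME             STATE     READ WRITE CKSUM
        tank1            DEGRADED     0     0 6.32K
          raidz1-0       DEGRADED     0     0 12.8K
            c7t8d0       DEGRADED     0     0     0  too many errors
            c7t4d0       DEGRADED     0     0     0  too many errors
            c7t10d0      FAULTED     27   563     0  too many errors
            c7t11d0      DEGRADED     0     0     0  too many errors
            c7t12d0      DEGRADED     0     0     0  too many errors
            replacing-5  DEGRADED     0     0     0
              c7t13d0    FAULTED      4 9.60K     0  too many errors
              c7t3d0     ONLINE       0     0     0  384G resilvered
"""

# Device name, state keyword, then the READ / WRITE / CKSUM columns.
# Only the three states that appear in this report are listed.
ROW = re.compile(r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED)\s+(\S+)\s+(\S+)\s+(\S+)")


def parse_devices(table):
    """Return one dict per device row (hypothetical helper for illustration)."""
    devices = []
    for line in table.splitlines():
        match = ROW.match(line)
        if match:  # the header row has no state keyword, so it is skipped
            name, state, read, write, cksum = match.groups()
            devices.append(
                {"name": name, "state": state,
                 "read": read, "write": write, "cksum": cksum}
            )
    return devices


devices = parse_devices(STATUS_TABLE)
faulted = [d["name"] for d in devices if d["state"] == "FAULTED"]
io_errors = [d["name"] for d in devices
             if d["read"] != "0" or d["write"] != "0"]

print("faulted:", faulted)
print("devices with read/write errors:", io_errors)
```

On this report both lists come out as c7t10d0 and c7t13d0: two members of a single-parity group are faulted at the same time, while every other disk shows zero read and write errors.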

Mathematically speaking this should be impossible: this was a simple drive replacement and resilver (a rebuild of the RAID).

For those who say RAID-5 is sufficient parity and that recovery times aren't that bad, here's a good example of why you need RAID-6 or another RAID solution.  This array had all drives working before c7t13d0 failed, and that drive was replaced immediately.
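
To make the parity argument concrete, here is a small sketch of single-parity recovery using plain XOR. It is a simplified model under stated assumptions (five data blocks plus one parity block per stripe, made-up block contents, a hypothetical helper named `xor_blocks`) and not how ZFS actually lays out RAIDZ stripes. It shows that one missing block can be rebuilt from the survivors, while two missing blocks leave only their XOR, which is not enough to recover either one.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Five data blocks and one parity block, as in a 6-disk single-parity stripe.
# The contents are invented for the example.
data = [b"disk0", b"disk1", b"disk2", b"disk3", b"disk4"]
parity = xor_blocks(data)

# One block lost (index 2): XOR of the survivors and parity rebuilds it.
survivors = data[:2] + data[3:]
rebuilt = xor_blocks(survivors + [parity])
print("one loss, rebuilt correctly:", rebuilt == data[2])

# Two blocks lost (indexes 2 and 3): the same calculation only yields the
# XOR of the two missing blocks, so neither can be recovered on its own.
survivors = data[:2] + data[4:]
leftover = xor_blocks(survivors + [parity])
print("two losses, only the XOR is known:",
      leftover == xor_blocks([data[2], data[3]]))
```

This is the situation a second drive fault during a resilver creates, and it is why a second parity block (RAID-6, or RAIDZ2 in ZFS terms) buys real safety during long rebuilds.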

I would say the enclosure or controller has failed; however, sitting in front of the computer, I hear clicking as if the drives are going bad.  Seems impossible.

Running zpool clear tank1 c7t10d0 (and the same for all the other devices) made the pool try to resilver again.  I reduced the queue depth to lighten the load on the disks during this recovery.  Unfortunately, the window for working on this server while it has very little load is almost over.

I'll update with another blog post once resolved.
