Those drives appear to have had several errors in the past and should not be trusted. Those past errors were from heat issues, which could have caused permanent damage.
Is there a more intelligent way to test the cable? And would those SATA cables go bad after a time? I've not had reported CRC errors for months. Also, is the serial number reported in the smartctl output the same as the manufacturer's serial number?
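One way to keep an eye on both things from the command line is sketched below, assuming the disk shows up as /dev/ada0 (a placeholder device name, substitute your own). The serial number smartctl prints is read from the drive itself, and the UDMA CRC counter is a lifetime count, so what matters is whether it keeps rising after the cable has been reseated or replaced.
Code:
# Serial number as the drive reports it (should match the sticker on the drive)
smartctl -i /dev/ada0 | grep -i serial

# SMART attribute 199 (UDMA_CRC_Error_Count) counts interface CRC errors;
# a raw value that stops increasing suggests the cable/port problem is gone
smartctl -A /dev/ada0 | grep -i crc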
Thank you for those answers. Last question: when I see CRC errors, does that explicitly mean that there is data corruption, or that it was noticed and repaired?
If ZFS does not tell you explicitly that you have data corruption in files or metadata, you probably don't. It may be worth doing a scrub to correct any correctable errors and to make sure there are no errors in data that hasn't been accessed recently.
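A minimal sketch of that, assuming the pool is named volume as in the status output further down:
Code:
# Start a scrub; it runs in the background and verifies every block against its checksum
zpool scrub volume

# Check progress and see whether any errors were found or repaired
zpool status -v volume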
So if I am running RAIDZ2 and all the other drives come back fine after a long SMART test, would it be that bad to run it until it actually does fail? It is not under warranty and I don't have the money to replace it until the end of the month.
Since your drives have been running hot (discovered just a day or two ago), you would be risking
your entire pool of data. If you continue to use the server and this one disk fails, AND before the
end of the month another drive craps out, then you have no redundancy left and the next drive
to fail takes your data with it when it dies.
If the drive cannot be replaced until funds become available, shut down your server and wait it out.
Good news! I got the temps down (ranging from 26-31 °C now). But after all that, I had a drive fail while setting it up. Interestingly enough, it wasn't the one that we have been talking about this whole time. So I broke down and got another drive, but I have no idea how to replace it. When I run zpool status -v, I get:
Code:
[root@freenas] ~# zpool status -v
pool: volume
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://illumos.org/msg/ZFS-8000-2Q
scan: resilvered 3.43M in 0h0m with 0 errors on Tue Apr 14 18:48:23 2015
config:
NAME                                            STATE     READ WRITE CKSUM
volume                                          DEGRADED     0     0     0
  raidz2-0                                      DEGRADED     0     0     0
    gptid/92e122fb-5a13-11e4-919c-c86000cb131c  ONLINE       0     0     0
    gptid/8e751139-6874-11e4-a79e-c86000cb131c  ONLINE       0     0     0
    gptid/93c736f3-5a13-11e4-919c-c86000cb131c  ONLINE       0     0     0
    gptid/94203f52-5a13-11e4-919c-c86000cb131c  ONLINE       0     0     0
    gptid/947317c4-5a13-11e4-919c-c86000cb131c  ONLINE       0     0     0
    gptid/94d99f4a-5a13-11e4-919c-c86000cb131c  ONLINE       0     0     0
    15112422042946592908                        UNAVAIL      0     0     0  was /dev/gptid/95402b37-5a13-11e4-919c-c86000cb131c
    4162859240354443090                         UNAVAIL      0     0     0  was /dev/gptid/95c39268-5a13-11e4-919c-c86000cb131c
    gptid/962ade58-5a13-11e4-919c-c86000cb131c  ONLINE       0     0     0
But when I view the disks, I see 8 of the 9 online.
Which one do I replace? And why does one screen show 7 drives healthy and another show 8?
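For anyone hitting the same thing, here is a rough sketch of how the missing disk can be tracked down from the shell; on FreeNAS the replacement itself is normally done from the GUI (Volume Status, select the UNAVAIL member, then Replace), which also takes care of partitioning the new disk. The device name /dev/ada6 below is a placeholder:
Code:
# List gptid labels and the devices they live on; the data disk missing from
# this list is the one that dropped out of the pool
glabel status | grep gptid

# Match a device to a physical drive by its serial number (placeholder device name)
smartctl -i /dev/ada6 | grep -i serial

# If replacing from the CLI instead of the GUI, the UNAVAIL member can be
# referenced by the GUID shown in 'zpool status' (GUID copied from the output above)
zpool replace volume 15112422042946592908 /dev/ada6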
Drive replaced and all is healthy. New last question: when I get emails like this:
Code:
pool: volume
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://illumos.org/msg/ZFS-8000-9P
scan: scrub repaired 40K in 20h1m with 0 errors on Mon Apr 20 11:22:46 2015
config:
NAME                                            STATE     READ WRITE CKSUM
volume                                          ONLINE       0     0     0
  raidz2-0                                      ONLINE       0     0     0
    gptid/92e122fb-5a13-11e4-919c-c86000cb131c  ONLINE       0     0     0
    gptid/8e751139-6874-11e4-a79e-c86000cb131c  ONLINE       0     0     0
    gptid/93c736f3-5a13-11e4-919c-c86000cb131c  ONLINE       0     0     0
    gptid/94203f52-5a13-11e4-919c-c86000cb131c  ONLINE       0     0     0
    gptid/947317c4-5a13-11e4-919c-c86000cb131c  ONLINE       0     0     2
    gptid/94d99f4a-5a13-11e4-919c-c86000cb131c  ONLINE       0     0     2
    gptid/95402b37-5a13-11e4-919c-c86000cb131c  ONLINE       0     0     2
    gptid/92f858cb-e622-11e4-9d41-90e2ba66cab8  ONLINE       0     0     0
    gptid/962ade58-5a13-11e4-919c-c86000cb131c  ONLINE       0     0     0
errors: No known data errors
According to the link, parity has fixed the corruption and my data is all well and good. But it also says that if I continue getting these, it could be a drive failing. If I don't clear it, I get the email every day; is it the same message each time? Is it saying that I should clear the errors, and that if I keep getting them after clearing them then I may have a problem? Or should this go away all by itself eventually (when the next scrub takes place)?
You have 3 disks with 2 checksum errors each. Usually that's a sign of a failing disk, but SMART can tell you more.
In the meantime I'd log those 3 gptids, figure out which disks those are, and figure out whether they have a problem, share a common controller or cable, etc.
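A small sketch of that step, assuming FreeBSD-style device names (the ada0 below is a placeholder for whatever glabel reports):
Code:
# Map the gptids that showed CKSUM errors to device nodes
glabel status | grep -E '947317c4|94d99f4a|95402b37'

# For each matching device: overall health, logged errors, and a long self-test
smartctl -H /dev/ada0
smartctl -l error /dev/ada0
smartctl -t long /dev/ada0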
Then, do a "zpool clear" and do another scrub and see if any errors develop. If they come back, you've got something to be concerned about.
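In concrete terms, with volume as the pool name from the output above, that sequence looks something like this:
Code:
# Reset the per-device error counters
zpool clear volume

# Run a fresh scrub and watch whether the CKSUM counts climb again
zpool scrub volume
zpool status -v volume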