Hello,
I finally set up my FreeNAS but I ran straight into a difficult hardware error. It started when one WD Red fell flat during writing to the pool. FreeBSD put it offline so I removed it while the server was still running. I checked its SMART values on another computer and they PASSED. Then I reconnected it to the server using a different SATA cable, because I suspected a bad cable, but poof! all WD Reds were sent to sleep as soon as I connected the SATA cable. All drives are brand new. I haven't checked their SMART values yet, but I bet they're all fine.
My hardware is as the following:
Supermicro X10SLH-F (C226, Haswell)
Intel Xeon E3 1230v3
bequiet! Straight Power E9 580W
4x Western Digital Red 3TB WD30EFRX
1x Seagate Barracuda 3TB ST3000DM001
1x Western Digital Black 1TB WD1001FALS (it's going to be replaced with a 3TB drive)
The cables I used where the ones that came with the Supermicro board.
The drives are connected to the 6 SATA ports in that order as seen in the list. It was the second drive which fell out of the RAID at first. I checked the message log and this is the error:
I have no idea what those errors mean. I don't think it is reallocating sectors, because those didn't show in the smart log.
Actually I suspect the WD Black to be the culprit... I was hesitant to use this drive because it gave me some trouble on my desktop before. When this drive was connected to the onboard controller, hotswapping of any drive resulted in drive resets, it didn't even have to be the WD Black being hotswapped. Just like the thing that happened when I reconnected drive #2 in my server. The WD Black didn't even reset, it was all the WD Reds that went offline. It was a daring move to use the WD Black to begin with but I hoped for the problem not to occur with the C226 chipset. Yet this is just a wild guess. The reason for these errors might be something different completely. Maybe the SATA cables Supermicro gave me aren't so super after all.
Well, this is an ongoing "investigation", after all I want to get this NAS running. But if anyone has some ideas in the meantime, I would appreciate it. Maybe somone had some trouble with the WD Black WD1001FALS? Or the WD Red WD30EFRX? Or the Supermicro X10SLH-F? Or even the bequiet! Straight Power E9 580W? Every single bit except the WD Black is brand-new, and even that drive gave me some trouble. So actually it could be anything. Maybe even that flimsy Fractal Design case I shouldn't have bought.
Anyway here's a smartctl output of the first drive that went offline, drive #2. Yeah I tried short self-tests, but smartctl left me with a prompt and I wondered how the heck is it going to notify me that the self-test finished, so I aborted it. The second time it seems to have been aborted by issuing the smartctl -a command. Weird stuff.
PS: Oh, btw. The pool is still saveable, right? I mean 4 drives were "destroyed", but the data on them should be fine. I mean it would be quite premature for ZFS to give up on a pool just because the drives dropped out and nothing else. The actual data on the drives is most probably fine, the drives are most probably fine (by themselves but not all together).
I finally set up my FreeNAS but I ran straight into a difficult hardware error. It started when one WD Red fell flat during writing to the pool. FreeBSD put it offline so I removed it while the server was still running. I checked its SMART values on another computer and they PASSED. Then I reconnected it to the server using a different SATA cable, because I suspected a bad cable, but poof! all WD Reds were sent to sleep as soon as I connected the SATA cable. All drives are brand new. I haven't checked their SMART values yet, but I bet they're all fine.
My hardware is as the following:
Supermicro X10SLH-F (C226, Haswell)
Intel Xeon E3 1230v3
bequiet! Straight Power E9 580W
4x Western Digital Red 3TB WD30EFRX
1x Seagate Barracuda 3TB ST3000DM001
1x Western Digital Black 1TB WD1001FALS (it's going to be replaced with a 3TB drive)
The cables I used where the ones that came with the Supermicro board.
The drives are connected to the 6 SATA ports in that order as seen in the list. It was the second drive which fell out of the RAID at first. I checked the message log and this is the error:
Code:
Feb 13 23:50:43 freenas kernel: ahcich1: Timeout on slot 3 port 0 Feb 13 23:50:43 freenas kernel: ahcich1: is 00000000 cs 00000008 ss 00000000 rs 00000008 tfd c0 serr 00000000 cmd 0004c317 Feb 13 23:50:43 freenas kernel: (ada1:ahcich1:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00 Feb 13 23:50:43 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: Command timeout Feb 13 23:50:43 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 13 23:51:14 freenas kernel: ahcich1: AHCI reset: device not ready after 31000ms (tfd = 00000080) Feb 13 23:51:44 freenas kernel: ahcich1: Timeout on slot 3 port 0 Feb 13 23:51:44 freenas kernel: ahcich1: is 00000000 cs 00000000 ss 00000000 rs 00000008 tfd 150 serr 00000000 cmd 0004c317 Feb 13 23:51:44 freenas kernel: (aprobe0:ahcich1:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00 Feb 13 23:51:44 freenas kernel: (aprobe0:ahcich1:0:0:0): CAM status: Command timeout Feb 13 23:51:44 freenas kernel: (aprobe0:ahcich1:0:0:0): Error 5, Retry was blocked Feb 13 23:54:47 freenas kernel: ahcich1: Timeout on slot 29 port 0 Feb 13 23:54:47 freenas kernel: ahcich1: is 00000000 cs 20000000 ss 00000000 rs 20000000 tfd c0 serr 00000000 cmd 0004dd17 Feb 13 23:54:47 freenas kernel: (ada1:ahcich1:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00 Feb 13 23:54:47 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: Command timeout Feb 13 23:54:47 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 13 23:55:19 freenas kernel: ahcich1: AHCI reset: device not ready after 31000ms (tfd = 00000080) Feb 13 23:55:49 freenas kernel: ahcich1: Timeout on slot 29 port 0 Feb 13 23:55:49 freenas kernel: ahcich1: is 00000000 cs 00000000 ss 00000000 rs 20000000 tfd 150 serr 00000000 cmd 0004dd17 Feb 13 23:55:49 freenas kernel: (aprobe0:ahcich1:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00 Feb 13 23:55:49 freenas kernel: (aprobe0:ahcich1:0:0:0): CAM status: Command timeout Feb 13 23:55:49 freenas kernel: (aprobe0:ahcich1:0:0:0): Error 5, Retry was blocked Feb 13 23:58:34 freenas kernel: ahcich1: Timeout on slot 13 port 0 Feb 13 23:58:34 freenas kernel: ahcich1: is 00000000 cs 00002000 ss 00000000 rs 00002000 tfd c0 serr 00000000 cmd 0004cd17 Feb 13 23:58:34 freenas kernel: (ada1:ahcich1:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00 Feb 13 23:58:34 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: Command timeout Feb 13 23:58:34 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 13 23:59:06 freenas kernel: ahcich1: AHCI reset: device not ready after 31000ms (tfd = 00000080) Feb 13 23:59:36 freenas kernel: ahcich1: Timeout on slot 13 port 0 Feb 13 23:59:36 freenas kernel: ahcich1: is 00000000 cs 00000000 ss 00000000 rs 00002000 tfd 150 serr 00000000 cmd 0004cd17 Feb 13 23:59:36 freenas kernel: (aprobe0:ahcich1:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00 Feb 13 23:59:36 freenas kernel: (aprobe0:ahcich1:0:0:0): CAM status: Command timeout Feb 13 23:59:36 freenas kernel: (aprobe0:ahcich1:0:0:0): Error 5, Retry was blocked Feb 13 23:59:45 freenas kernel: ahcich1: Timeout on slot 0 port 0 Feb 13 23:59:45 freenas kernel: ahcich1: is 00000000 cs 00000001 ss 00000000 rs 00000001 tfd 10c1 serr 00000000 cmd 0004c017 Feb 13 23:59:45 freenas kernel: ahcich1: Error while READ LOG EXT Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 10 53 6f 40 0c 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 50 50 6f 40 0c 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 90 58 6f 40 0c 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 90 55 6f 40 0c 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 50 51 6f 40 0c 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 10 54 6f 40 0c 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 50 57 6f 40 0c 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 d0 55 6f 40 0c 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 90 50 6f 40 0c 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 d0 52 6f 40 0c 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 13 23:59:45 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 13 23:59:57 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 90 bb 6f 40 0c 00 00 00 00 00 Feb 13 23:59:57 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 13 23:59:57 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 41 (DRDY ERR), error: 10 (IDNF ) Feb 13 23:59:57 freenas kernel: (ada1:ahcich1:0:0:0): RES: 41 10 90 bb 6f 40 0c 00 00 00 00 Feb 13 23:59:57 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 14 00:00:07 freenas kernel: ahcich1: Timeout on slot 28 port 0 Feb 14 00:00:07 freenas kernel: ahcich1: is 00000000 cs 10000000 ss 00000000 rs 10000000 tfd 10c1 serr 00000000 cmd 0004dc17 Feb 14 00:00:07 freenas kernel: ahcich1: Error while READ LOG EXT Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 10 1a 70 40 0c 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 50 1b 70 40 0c 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 90 19 70 40 0c 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 50 20 70 40 0c 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 50 1d 70 40 0c 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 d0 1d 70 40 0c 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 10 1c 70 40 0c 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 d0 1e 70 40 0c 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 90 21 70 40 0c 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 50 1f 70 40 0c 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 14 00:00:07 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 14 00:02:29 freenas kernel: ahcich1: Timeout on slot 21 port 0 Feb 14 00:02:29 freenas kernel: ahcich1: is 00000000 cs 00200000 ss 00000000 rs 00200000 tfd 10c1 serr 00000000 cmd 0004d517 Feb 14 00:02:29 freenas kernel: ahcich1: Error while READ LOG EXT Feb 14 00:02:29 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 40 50 1c c2 40 0c 00 00 00 00 00 Feb 14 00:02:29 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 14 00:02:29 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 14 00:02:29 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 14 00:02:29 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 14 00:02:29 freenas kernel: (ada1:ahcich1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 a0 1c c2 40 0c 00 00 00 00 00 Feb 14 00:02:29 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: ATA Status Error Feb 14 00:02:29 freenas kernel: (ada1:ahcich1:0:0:0): ATA status: 00 () Feb 14 00:02:29 freenas kernel: (ada1:ahcich1:0:0:0): RES: 00 00 00 00 00 00 00 00 00 00 00 Feb 14 00:02:29 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 14 00:04:02 freenas kernel: ahcich1: Timeout on slot 3 port 0 Feb 14 00:04:02 freenas kernel: ahcich1: is 00000000 cs 00000008 ss 00000000 rs 00000008 tfd c0 serr 00000000 cmd 0004c317 Feb 14 00:04:02 freenas kernel: (ada1:ahcich1:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00 Feb 14 00:04:02 freenas kernel: (ada1:ahcich1:0:0:0): CAM status: Command timeout Feb 14 00:04:02 freenas kernel: (ada1:ahcich1:0:0:0): Retrying command Feb 14 00:04:34 freenas kernel: ahcich1: AHCI reset: device not ready after 31000ms (tfd = 00000080) Feb 14 00:04:58 freenas kernel: (aprobe0:ahcich1:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 00 00 Feb 14 00:04:58 freenas kernel: (aprobe0:ahcich1:0:0:0): CAM status: Unconditionally Re-queue Request Feb 14 00:04:58 freenas kernel: (aprobe0:ahcich1:0:0:0): Error 5, Retry was blocked Feb 14 00:04:58 freenas kernel: ada1 at ahcich1 bus 0 scbus1 target 0 lun 0 Feb 14 00:04:58 freenas kernel: ada1: <WDC WD30EFRX-68EUZN0 80.00A80> s/n WD-WCC4N0581357 detached Feb 14 00:04:58 freenas kernel: GEOM_ELI: Crypto WRITE request failed (error=6). gptid/7978e1a0-94e3-11e3-9d3d-002590dc48b0.eli[WRITE(offset=103467245568, length=4096)] Feb 14 00:04:58 freenas kernel: GEOM_ELI: g_eli_read_done() failed gptid/7978e1a0-94e3-11e3-9d3d-002590dc48b0.eli[READ(offset=99794751488, length=12288)] Feb 14 00:04:58 freenas kernel: GEOM_ELI: g_eli_read_done() failed gptid/7978e1a0-94e3-11e3-9d3d-002590dc48b0.eli[READ(offset=270336, length=8192)] Feb 14 00:04:58 freenas kernel: GEOM_ELI: g_eli_read_done() failed gptid/7978e1a0-94e3-11e3-9d3d-002590dc48b0.eli[READ(offset=2998444761088, length=8192)] Feb 14 00:04:58 freenas kernel: GEOM_ELI: g_eli_read_done() failed gptid/7978e1a0-94e3-11e3-9d3d-002590dc48b0.eli[READ(offset=2998445023232, length=8192)] Feb 14 00:04:58 freenas kernel: GEOM_ELI: Crypto WRITE request failed (error=6). gptid/7978e1a0-94e3-11e3-9d3d-002590dc48b0.eli[WRITE(offset=286384291840, length=4096)] Feb 14 00:04:58 freenas kernel: GEOM_ELI: Device gptid/7978e1a0-94e3-11e3-9d3d-002590dc48b0.eli destroyed. Feb 14 00:04:58 freenas kernel: GEOM_ELI: Detached gptid/7978e1a0-94e3-11e3-9d3d-002590dc48b0.eli on last close. Feb 14 00:05:03 freenas kernel: (aprobe0:ahcich1:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00 Feb 14 00:05:03 freenas kernel: (aprobe0:ahcich1:0:0:0): CAM status: ATA Status Error Feb 14 00:05:03 freenas kernel: (aprobe0:ahcich1:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT ) Feb 14 00:05:03 freenas kernel: (aprobe0:ahcich1:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff Feb 14 00:05:03 freenas kernel: (aprobe0:ahcich1:0:0:0): Error 5, Retries exhausted Feb 14 00:05:03 freenas kernel: (aprobe0:ahcich1:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00 Feb 14 00:05:03 freenas kernel: (aprobe0:ahcich1:0:0:0): CAM status: ATA Status Error Feb 14 00:05:03 freenas kernel: (aprobe0:ahcich1:0:0:0): ATA status: d1 (BSY DRDY SERV ERR), error: 04 (ABRT ) Feb 14 00:05:03 freenas kernel: (aprobe0:ahcich1:0:0:0): RES: d1 04 ff ff ff ff ff ff ff ff ff Feb 14 00:05:03 freenas kernel: (aprobe0:ahcich1:0:0:0): Error 5, Retries exhausted
I have no idea what those errors mean. I don't think it is reallocating sectors, because those didn't show in the smart log.
Actually I suspect the WD Black to be the culprit... I was hesitant to use this drive because it gave me some trouble on my desktop before. When this drive was connected to the onboard controller, hotswapping of any drive resulted in drive resets, it didn't even have to be the WD Black being hotswapped. Just like the thing that happened when I reconnected drive #2 in my server. The WD Black didn't even reset, it was all the WD Reds that went offline. It was a daring move to use the WD Black to begin with but I hoped for the problem not to occur with the C226 chipset. Yet this is just a wild guess. The reason for these errors might be something different completely. Maybe the SATA cables Supermicro gave me aren't so super after all.
Well, this is an ongoing "investigation", after all I want to get this NAS running. But if anyone has some ideas in the meantime, I would appreciate it. Maybe somone had some trouble with the WD Black WD1001FALS? Or the WD Red WD30EFRX? Or the Supermicro X10SLH-F? Or even the bequiet! Straight Power E9 580W? Every single bit except the WD Black is brand-new, and even that drive gave me some trouble. So actually it could be anything. Maybe even that flimsy Fractal Design case I shouldn't have bought.
Anyway here's a smartctl output of the first drive that went offline, drive #2. Yeah I tried short self-tests, but smartctl left me with a prompt and I wondered how the heck is it going to notify me that the self-test finished, so I aborted it. The second time it seems to have been aborted by issuing the smartctl -a command. Weird stuff.
Code:
smartctl 6.2 2013-07-26 r3841 [x86_64-w64-mingw32-win7-sp1] (sf-6.2-1) Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Western Digital Red (AF) Device Model: WDC WD30EFRX-68EUZN0 Firmware Version: 80.00A80 User Capacity: 3.000.592.982.016 bytes [3,00 TB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 5400 rpm Device is: In smartctl database [for details use: -P show] ATA Version is: ACS-2 (minor revision not indicated) SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s) Local Time is: Fri Feb 14 02:45:20 2014 WEST SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled. Self-test execution status: ( 41) The self-test routine was interrupted by the host with a hard or soft reset. Total time to complete Offline data collection: (39360) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 395) minutes. Conveyance self-test routine recommended polling time: ( 5) minutes. SCT capabilities: (0x703d) SCT Status supported. SCT Error Recovery Control supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x002f 100 253 051 Pre-fail Always - 0 3 Spin_Up_Time 0x0027 213 213 021 Pre-fail Always - 4316 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 6 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0 7 Seek_Error_Rate 0x002e 100 253 000 Old_age Always - 0 9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 5 10 Spin_Retry_Count 0x0032 100 253 000 Old_age Always - 0 11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 5 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 4 193 Load_Cycle_Count 0x0032 200 200 000 Old_age Always - 6 194 Temperature_Celsius 0x0022 118 115 000 Old_age Always - 32 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short captive Interrupted (host reset) 90% 4 - # 2 Short offline Aborted by host 40% 4 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay.
PS: Oh, btw. The pool is still saveable, right? I mean 4 drives were "destroyed", but the data on them should be fine. I mean it would be quite premature for ZFS to give up on a pool just because the drives dropped out and nothing else. The actual data on the drives is most probably fine, the drives are most probably fine (by themselves but not all together).