I actually JUST posted the part below in another thread before reading the last couple of pages of this one, and felt it needed repeating since some folks just don't get the importance of running the proper hardware, especially ECC RAM. This is a real-world example of how things could have gone south real fast with my ZFS RaidZ2 pool had it not been for the hard-knock lessons learned by others. Don't even bother running ZFS if you aren't willing to run ECC RAM.
http://forums.freenas.org/index.php...lesector-offline-uncorrectable-sectors.22131/
------------------------
Wow, talk about a rough week for hardware failures! After swapping out the HDD that was throwing all the errors and resilvering the RaidZ2 pool, I shut the Supermicro X10SL7-F down and did a cold boot. Wouldn't you know, the darn thing wouldn't come back up! The screen stayed blank and four steady BIOS beeps sounded, one per second. It turned out this was a fatal memory error, since not even the BIOS screen would come up.
Luckily, I had a spare 16GB kit that I was about to install in my Supermicro X10SAE workstation. Got it installed and the server booted right back up! I tried the defective RAM on a different X10 board and got the same 4 beeps, so I'm pretty sure the RAM has failed. I've never had RAM fail before, and it was just dumb luck on the timing to have a spare set handy!
Once the server was back online, I fired up IPMI View and noticed some Correctable ECC events in the IPMI System Event Log! Scarily, these dates/times correlate exactly to when the monthly zpool scrub fires off! Yikes! I have since completed a new scrub, so would it be safe to say I likely dodged any possible data corruption? Thank goodness for ECC RAM. Anyone who runs ZFS without it is asking for serious trouble!
Code:
202,System Event,06/24/2014 06:45:22 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
203,System Event,06/24/2014 06:48:19 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
204,System Event,06/24/2014 07:16:10 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
205,System Event,06/24/2014 07:16:45 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
206,System Event,06/24/2014 07:24:50 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
207,System Event,06/24/2014 07:25:39 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
208,System Event,06/24/2014 07:31:36 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
209,System Event,06/24/2014 07:33:58 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
210,System Event,06/24/2014 07:33:58 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
211,System Event,06/24/2014 07:34:43 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
212,System Event,06/24/2014 07:34:58 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
213,System Event,06/24/2014 07:49:10 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
214,System Event,06/24/2014 08:19:54 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
215,System Event,06/24/2014 08:22:32 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
216,System Event,06/24/2014 09:32:00 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
217,System Event,06/24/2014 09:50:07 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
218,System Event,06/24/2014 10:09:11 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
219,System Event,06/24/2014 18:19:01 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
220,System Event,06/24/2014 18:19:02 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
221,System Event,06/25/2014 00:25:37 Wed,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
222,System Event,06/25/2014 00:25:37 Wed,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
223,System Event,06/25/2014 01:14:29 Wed,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
224,System Event,06/25/2014 05:15:03 Wed,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
225,System Event,06/25/2014 05:15:04 Wed,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
226,System Event,06/25/2014 07:01:47 Wed,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
227,System Event,06/25/2014 08:06:15 Wed,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
228,System Event,06/25/2014 18:46:34 Wed,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
229,System Event,06/26/2014 14:42:42 Thu,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
230,System Event,06/26/2014 14:42:43 Thu,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
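
For anyone who wants to line events like these up against their own scrub schedule, here is a rough Python sketch that tallies Correctable ECC events per day and per DIMM. It assumes the log has been saved as text with one event per row in the same comma-separated layout as the entries above; the names `parse_sel`, `count_by_day_and_dimm`, and `SAMPLE` are made up for illustration and are not part of IPMI View or any Supermicro tool. Other export formats would likely need a different pattern.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed row layout, taken from the log entries in this post:
#   id,System Event,MM/DD/YYYY HH:MM:SS Day,Memory,,Assertion: Memory| Event = Correctable ECC@DIMM(CPU)
ROW = re.compile(
    r"^(?P<id>\d+),System Event,"
    r"(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) \w{3},"
    r"Memory,,Assertion: Memory\| Event = Correctable ECC@"
    r"(?P<dimm>\w+)\((?P<cpu>\w+)\)\s*$"
)


def parse_sel(text):
    """Return (event_id, timestamp, dimm) tuples for rows matching ROW."""
    events = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # skip anything that is not a Correctable ECC row
        when = datetime.strptime(match["ts"], "%m/%d/%Y %H:%M:%S")
        events.append((int(match["id"]), when, match["dimm"]))
    return events


def count_by_day_and_dimm(events):
    """Count events per (ISO date, DIMM slot) pair."""
    return Counter((when.date().isoformat(), dimm) for _, when, dimm in events)


# Three rows copied from the log above, just to exercise the parser.
SAMPLE = """\
202,System Event,06/24/2014 06:45:22 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
203,System Event,06/24/2014 06:48:19 Tue,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
221,System Event,06/25/2014 00:25:37 Wed,Memory,,Assertion: Memory| Event = Correctable ECC@DIMMB2(CPU1)
"""

if __name__ == "__main__":
    counts = count_by_day_and_dimm(parse_sel(SAMPLE))
    for (day, dimm), n in sorted(counts.items()):
        print(day, dimm, n)
```

Feeding it the full log instead of `SAMPLE` gives a per-day count for each DIMM slot, which makes it easier to see whether the errors cluster around a scrub or keep happening outside of one.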