hrana
System Specs
- Motherboard make and model: Supermicro X10DRC-T4+ in SC846BE1C (SAS3) 24-bay case
- CPU make and model: 2x Intel Xeon E5-2699v4
- RAM quantity: 512GB RDIMM ECC
- Hard drives, quantity, model numbers, and RAID configuration, including boot drives: 12x WD Red 8TB as 2x 6-drive RAIDZ2 vdevs in one zpool. 24x WD Red 8TB as 3x 8-drive RAIDZ2 vdevs in one zpool. The boot drive is a Supermicro SATADOM 64GB module.
- Hard disk controllers: HBA 9405W-16e to SC847BE1C (SAS3) 36-bay case (1x 24-port Supermicro SAS3 expander and 1x 12-port Supermicro SAS3 expander) acting as a JBOD through the Supermicro PTJBOD-CB3 controller. The two external SAS3 cables connect through a 4-port ext-to-int adapter. Two cables go to the 24-port expander. Two cables go from the 24-port to the 12-port expander. Finally, two cables go from the 12-port expander to the ext-to-int adapter to allow for future daisy-chaining of another JBOD.
- Network cards: Intel X550-T2, X550-T4 (onboard), i350-T4
Over the last 3 days, I noticed that the rear 12 disks would not survive a scrub without TrueNAS having a kernel panic.
I have done the following:
1) Upgraded to U3.1. Result: No change.
2) Updated the firmware on the 9405W-16e. Result: No change.
3) Swapped power supplies (2x PWS-1K28P-SQ, 1000W). The system only uses 350-400W. Result: No change.
4) Scrubbed the front 24 disks without an issue. The rear 12 disks always cause a crash on scrub.
5) Searched to hell and back on Google and forums for an answer.
I have not yet done the following, but am ready to:
1) Replace the rear 12-port expander.
2) Swap out all of the cables (external, internal, and the internal-to-external connector that sits in a PCI bracket).
3) Ask for help here as I am sure it is probably something really basic that I am missing or got wrong.
All of the crash logs reference the aiodXX process.
Here is the latest crash log (and I can provide all of the others, if needed):
Code:
Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 16
fault virtual address = 0x3454f8f9
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff827c62ce
stack pointer = 0x0:0xfffffe022f4227e8
frame pointer = 0x0:0xfffffe022f422820
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 34994 (aiod26)
trap number = 12
panic: page fault
cpuid = 11
time = 1621361081
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe022f4224a0
vpanic() at vpanic+0x17b/frame 0xfffffe022f4224f0
panic() at panic+0x43/frame 0xfffffe022f422550
trap_fatal() at trap_fatal+0x391/frame 0xfffffe022f4225b0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe022f422600
trap() at trap+0x286/frame 0xfffffe022f422710
calltrap() at calltrap+0x8/frame 0xfffffe022f422710
--- trap 0xc, rip = 0xffffffff827c62ce, rsp = 0xfffffe022f4227e8, rbp = 0xfffffe022f422820 ---
avl_rotation() at avl_rotation+0x3e/frame 0xfffffe022f422820
zfs_rangelock_enter_impl() at zfs_rangelock_enter_impl+0x4e8/frame 0xfffffe022f422880
zfs_get_data() at zfs_get_data+0x15f/frame 0xfffffe022f422910
zil_commit_impl() at zil_commit_impl+0xe11/frame 0xfffffe022f422a70
zfs_fsync() at zfs_fsync+0xc1/frame 0xfffffe022f422ab0
VOP_FSYNC_APV() at VOP_FSYNC_APV+0x7b/frame 0xfffffe022f422ae0
aio_process_sync() at aio_process_sync+0x121/frame 0xfffffe022f422b40
aio_daemon() at aio_daemon+0x227/frame 0xfffffe022f422bb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe022f422bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe022f422bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
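In case it helps with comparing this log against the other crash logs, here is a rough Python sketch that pulls the faulting process and the backtrace function names out of a panic message like the one above. It is not part of TrueNAS; the function name `summarize_panic` is made up for illustration, and the regular expressions assume the log format shown here.

```python
import re

def summarize_panic(text):
    """Return (process_name, [backtrace symbols]) from a panic message.

    Hypothetical helper: assumes the message format shown in this post.
    """
    # "current process = 34994 (aiod26)" -> "aiod26"
    proc = re.search(r"current process\s*=\s*\d+\s*\((\S+?)\)", text)
    # Each backtrace entry looks like "name() at name+0x2b/frame 0x..."
    frames = re.findall(r"(\w+)\(\) at \w+\+0x[0-9a-f]+/frame", text)
    return (proc.group(1) if proc else None, frames)

# Shortened excerpt of the log above, used only as sample input.
sample = (
    "current process = 34994 (aiod26) trap number = 12 "
    "avl_rotation() at avl_rotation+0x3e/frame 0xfffffe022f422820 "
    "zfs_rangelock_enter_impl() at "
    "zfs_rangelock_enter_impl+0x4e8/frame 0xfffffe022f422880"
)

name, frames = summarize_panic(sample)
print(name, frames)
```

Running this over each saved log would show whether every panic goes through the same functions (here `avl_rotation` called from `zfs_rangelock_enter_impl`) or whether the faulting path varies between crashes.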
Does anyone have an idea?