[12.0-U3.1] Kernel Panics?

hrana · May 18, 2021

System Specs

Motherboard make and model: Supermicro X10DRC-T4+ in SC846BE1C (SAS3) 24-bay case
CPU make and model: 2x Intel Xeon E5-2699v4
RAM quantity: 512GB RDIMM ECC
Hard drives, quantity, model numbers, and RAID configuration, including boot drives: 12x WD Red 8TB as 2x 6-drive RAIDZ2 vdevs in one zpool. 24x WD Red 8TB as 3x 8-drive RAIDZ2 vdevs in one zpool. The boot drive is a Supermicro SATADOM 64GB module.
Hard disk controllers: HBA 9405W-16e to SC847BE1C (SAS3) 36-bay case (1x 24-port Supermicro SAS3 expander and 1x 12-port Supermicro SAS3 expander) acting as a JBOD through the Supermicro PTJBOD-CB3 controller. The two external SAS3 cables connect through a 4-port ext-to-int adapter. Two cables go to the 24-port expander. Two cables go from the 24-port expander to the 12-port expander. Finally, two cables go from the 12-port expander back to the ext-to-int adapter to allow for future daisy-chaining of another JBOD.
Network cards: Intel X550-T2, X550-T4 (onboard), I350-T4

This is a lab system that I have been testing for several months. Everything was fine under FreeNAS 11.3-U5. I upgraded to TrueNAS 12.0-U2 and everything was fine. The system has been running under 12.0-U3 without issues for several weeks.

Over the last 3 days, I noticed that the rear 12 disks cannot get through a scrub without TrueNAS hitting a kernel panic.

I have done the following:

1) Upgraded to U3.1. Result: No change.
2) Updated the firmware on the 9405W-16e. Result: No change.
3) Swapped power supplies (2x PWS-1K28P-SQ, 1000W). The system only uses 350-400W. Result: No change.
4) Scrubbed the front 24 disks without an issue. The rear 12 disks always cause a crash on scrub.
5) Searched to hell and back on Google and forums for an answer.

I have not yet done the following, but am ready to:

1) Replace the rear 12-port expander.
2) Swap out all of the cables (external, internal, and the internal-to-external connector that sits in a PCI bracket).
3) Ask for help here as I am sure it is probably something really basic that I am missing or got wrong.

All of the crash logs reference the aiodXX process.
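
For anyone who wants to check the same thing across several saved crash logs, here is a minimal Python sketch. It assumes each log contains a "current process" line in the format shown in the log below; the function name and regular expression are my own and are not part of TrueNAS.

```python
import re
from collections import Counter

# Matches lines such as "current process        = 34994 (aiod26)".
CURRENT_PROCESS = re.compile(
    r"^current process\s*=\s*(\d+)\s*\((.+?)\)\s*$", re.MULTILINE
)

def panicking_processes(log_texts):
    """Count the process names named in 'current process' lines (hypothetical helper)."""
    counts = Counter()
    for text in log_texts:
        for _pid, name in CURRENT_PROCESS.findall(text):
            counts[name] += 1
    return counts

# Excerpt in the same format as the crash log in this post.
sample = (
    "current process        = 34994 (aiod26)\n"
    "trap number        = 12\n"
    "panic: page fault\n"
)
print(panicking_processes([sample]))
```

Feeding it the text of every saved log gives a quick tally of which process was running at each panic.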

Here is the latest crash log (and I can provide all of the others, if needed):

Code:

Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 16
fault virtual address    = 0x3454f8f9
fault code        = supervisor read data, page not present
instruction pointer    = 0x20:0xffffffff827c62ce
stack pointer            = 0x0:0xfffffe022f4227e8
frame pointer            = 0x0:0xfffffe022f422820
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process        = 34994 (aiod26)
trap number        = 12
panic: page fault
cpuid = 11
time = 1621361081
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe022f4224a0
vpanic() at vpanic+0x17b/frame 0xfffffe022f4224f0
panic() at panic+0x43/frame 0xfffffe022f422550
trap_fatal() at trap_fatal+0x391/frame 0xfffffe022f4225b0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe022f422600
trap() at trap+0x286/frame 0xfffffe022f422710
calltrap() at calltrap+0x8/frame 0xfffffe022f422710
--- trap 0xc, rip = 0xffffffff827c62ce, rsp = 0xfffffe022f4227e8, rbp = 0xfffffe022f422820 ---
avl_rotation() at avl_rotation+0x3e/frame 0xfffffe022f422820
zfs_rangelock_enter_impl() at zfs_rangelock_enter_impl+0x4e8/frame 0xfffffe022f422880
zfs_get_data() at zfs_get_data+0x15f/frame 0xfffffe022f422910
zil_commit_impl() at zil_commit_impl+0xe11/frame 0xfffffe022f422a70
zfs_fsync() at zfs_fsync+0xc1/frame 0xfffffe022f422ab0
VOP_FSYNC_APV() at VOP_FSYNC_APV+0x7b/frame 0xfffffe022f422ae0
aio_process_sync() at aio_process_sync+0x121/frame 0xfffffe022f422b40
aio_daemon() at aio_daemon+0x227/frame 0xfffffe022f422bb0
fork_exit() at fork_exit+0x7e/frame 0xfffffe022f422bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe022f422bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
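
To make the trace above easier to compare against other logs, a rough Python sketch like the following pulls out the frames between the two "--- trap" markers (the code path that was running when the fault hit) and converts the "time =" field to UTC. It assumes the log uses the KDB frame format shown above; the helper names are hypothetical.

```python
import re
from datetime import datetime, timezone

# A KDB frame looks like:
#   avl_rotation() at avl_rotation+0x3e/frame 0xfffffe022f422820
FRAME = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+$")

def faulting_path(log_text):
    """Return function names listed between the first two '--- trap' markers."""
    names, in_fault = [], False
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("--- trap"):
            if in_fault:
                break  # second marker: end of the interrupted call chain
            in_fault = True
            continue
        match = FRAME.match(line)
        if in_fault and match:
            names.append(match.group(1))
    return names

def panic_time_utc(log_text):
    """Convert the 'time = <epoch seconds>' line to an aware UTC datetime."""
    match = re.search(r"^time = (\d+)$", log_text, re.MULTILINE)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)

# Excerpt copied from the log above.
excerpt = """time = 1621361081
calltrap() at calltrap+0x8/frame 0xfffffe022f422710
--- trap 0xc, rip = 0xffffffff827c62ce, rsp = 0xfffffe022f4227e8, rbp = 0xfffffe022f422820 ---
avl_rotation() at avl_rotation+0x3e/frame 0xfffffe022f422820
zfs_rangelock_enter_impl() at zfs_rangelock_enter_impl+0x4e8/frame 0xfffffe022f422880
zfs_fsync() at zfs_fsync+0xc1/frame 0xfffffe022f422ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
"""
print(faulting_path(excerpt))
print(panic_time_utc(excerpt))
```

Running the same two helpers over the other crash logs would show whether every panic goes through the same functions.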

Does anyone have an idea?

ThreeDee · May 18, 2021

hrana said:
I have not yet done the following, but am ready to:

1) Replace the rear 12-port expander.

That's where I'd start, beginning with the connectors and cables (easy, cheap stuff first).
