alexmarkley
Dabbler
- Joined: Jul 27, 2021
- Messages: 40
Yesterday I upgraded my TrueNAS SCALE box from Bluefin 22.12.3.2 to Cobia 23.10.0. After the upgrade, I noticed that none of my SATA drives are reporting temperatures anymore.
Here are some relevant screenshots:
At a low level, everything still seems to be working:
Code:
root@veritas2[~]# for DISK in $(smartctl --scan | awk '{ print $1; }'); \
do echo "==== Smartctl on $DISK ====" ; \
smartctl -a $DISK | grep -E '^Temperature:|Airflow_Temperature|ATTRIBUTE_NAME|Health Information'; \
echo ; done
==== Smartctl on /dev/sda ====
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022 061 048 000 Old_age Always - 39 (Min/Max 29/41)

==== Smartctl on /dev/sdb ====
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022 059 044 000 Old_age Always - 41 (Min/Max 31/44)

==== Smartctl on /dev/sdc ====
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022 060 049 000 Old_age Always - 40 (Min/Max 30/44)

==== Smartctl on /dev/sdd ====
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022 064 043 000 Old_age Always - 36 (Min/Max 30/41)

==== Smartctl on /dev/sde ====
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022 061 053 000 Old_age Always - 39 (Min/Max 28/40)

==== Smartctl on /dev/sdf ====
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022 067 044 040 Old_age Always - 33 (Min/Max 29/35)

==== Smartctl on /dev/sdg ====
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022 058 049 000 Old_age Always - 42 (Min/Max 30/44)

==== Smartctl on /dev/sdh ====
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022 058 048 000 Old_age Always - 42 (Min/Max 31/44)

==== Smartctl on /dev/sdi ====
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022 064 038 040 Old_age Always In_the_past 36 (2 54 39 30 0)

==== Smartctl on /dev/sdj ====
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022 062 040 000 Old_age Always - 38 (Min/Max 30/43)

==== Smartctl on /dev/sdk ====
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022 069 045 000 Old_age Always - 31 (Min/Max 28/35)

==== Smartctl on /dev/sdl ====
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022 066 048 000 Old_age Always - 34 (Min/Max 27/35)

==== Smartctl on /dev/sdm ====
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022 063 045 000 Old_age Always - 37 (Min/Max 27/38)

==== Smartctl on /dev/sdn ====
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022 064 052 000 Old_age Always - 36 (Min/Max 28/37)

==== Smartctl on /dev/sdo ====
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022 068 048 000 Old_age Always - 32 (Min/Max 28/36)

==== Smartctl on /dev/sdp ====
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0032 071 061 000 Old_age Always - 29

==== Smartctl on /dev/sdq ====
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0032 072 062 000 Old_age Always - 28

==== Smartctl on /dev/nvme0 ====
SMART/Health Information (NVMe Log 0x02)
Temperature: 48 Celsius

==== Smartctl on /dev/nvme1 ====
SMART/Health Information (NVMe Log 0x02)
Temperature: 42 Celsius
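The listing above is long, so here is a minimal sketch that boils the same `smartctl -a` text down to one temperature per device. It assumes the temperature shows up either as a SATA attribute row named `Airflow_Temperature_Cel` or `Temperature_Celsius` (raw value in the 10th field) or as an NVMe `Temperature:` line, which matches the output on this machine but may not hold for every drive. The function name `extract_temp` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -a` text on stdin and print the first
# temperature (in Celsius) it finds. Exits non-zero if none is found.
extract_temp() {
    awk '
        # SATA attribute rows: the raw value is the 10th whitespace-separated field.
        /Airflow_Temperature_Cel|Temperature_Celsius/ { print $10; found = 1; exit }
        # NVMe health log rows look like "Temperature: 48 Celsius".
        /^Temperature:/ { print $2; found = 1; exit }
        # Signal failure when no temperature row was seen.
        END { if (!found) exit 1 }
    '
}

# Example use (needs root and smartmontools installed):
#   for DISK in $(smartctl --scan | awk '{ print $1 }'); do
#       printf '%s %s\n' "$DISK" "$(smartctl -a "$DISK" | extract_temp)"
#   done
```

Parsing the text output keeps this consistent with the grep above; a drive that reports its temperature under a different attribute name would simply be skipped.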
The important hardware details on this machine:
- Supermicro A+ Server 1014S-WTRT
- AMD EPYC 7232P 8-Core Processor @ 3.10GHz
- 128 GB ECC DDR4 3200
- Micron 480 GB NVMe boot device
- Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3 (rev 02) HBA controller
- 15 SATA drives on an external SAS-3 backplane for tank pool
- Samsung SSD 990 PRO 1TB NVMe as an L2ARC for tank
- A couple of mirrored SSDs for apps and VMs in a separate pool
Do we know if this issue is "just" a reporting issue? Does it also impact alert notifications? If it's just a GUI bug, it's annoying. If it breaks drive temperature alerts, that's a lot more concerning.
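Until that is known, a stopgap check outside the GUI might look something like the sketch below. It only compares "DEVICE TEMP" pairs against a limit and prints a warning; it does not hook into TrueNAS alerting in any way. The function name `check_temps` and the 45 C limit in the usage comment are arbitrary choices for illustration, not recommended values.

```shell
#!/bin/sh
# Hypothetical stopgap: read "DEVICE TEMP" pairs on stdin and warn about any
# device at or above the given limit (degrees Celsius).
# Returns 1 if at least one device is too hot, 0 otherwise.
check_temps() {
    limit=$1
    awk -v limit="$limit" '
        # Only consider rows whose second field is a plain integer.
        $2 ~ /^[0-9]+$/ && $2 + 0 >= limit + 0 {
            printf "WARNING: %s is at %d C (limit %d C)\n", $1, $2, limit
            hot = 1
        }
        END { exit hot ? 1 : 0 }
    '
}

# Example use (needs root and smartmontools; 45 is an arbitrary example limit):
#   for DISK in $(smartctl --scan | awk '{ print $1 }'); do
#       TEMP=$(smartctl -a "$DISK" | awk '
#           /Airflow_Temperature_Cel|Temperature_Celsius/ { print $10; exit }
#           /^Temperature:/ { print $2; exit }')
#       echo "$DISK $TEMP"
#   done | check_temps 45
```

Run from cron or a similar scheduler, the non-zero exit status could be used to trigger an email, though that wiring is left out here.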
I posted about this issue in an older thread, but @morganL recommended I start a fresh one. For reference: https://www.truenas.com/community/t...ting-or-storage-dashboard-in-cobia-rc.112857/
This should go without saying, but I'm happy to provide additional debug information and/or perform additional troubleshooting if it helps track this down.