Hey, a few quick updates. I was able to offline the drive and format it with the commands provided above. A 4TB drive takes around 6-8 hours for a full reformat. You can watch the progress if you add --verbose to the sg_format command previously mentioned. Do not Ctrl-C it: it will not output a progress percentage immediately, and mine took 5-10 minutes before the first .0x% came out. After onlining the drive, it shows as faulted, but Daisuke thinks it might be a Linux issue identifying the drive as bad. Going to reboot once the other drives that I'm reformatting (not in the pool) are complete. A HUGE thank you to @Daisuke for all the help!!!
zpool shows a random number where the drive should be, but I managed to find the drive in question by its serial number through smartctl, e.g.
Code:
➜ ~ smartctl -i --all /dev/sdab
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Vendor: HGST
Product: H7240AS60SUN4.0T
Revision: A3A0
Compliance: SPC-4
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Logical block size: 512 bytes
LU is fully provisioned
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Logical Unit id: 0x5000cca073425558
Serial number: 001533E5GX8X PEH5GX8X
Device type: disk
Transport protocol: SAS (SPL-3)
Local Time is: Sat Dec 17 19:53:36 2022 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Temperature Warning: Enabled
I'm able to "replace" the disk with /dev/sdab from the dropdown menu in the pool storage area. Says it's currently resilvering!
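For anyone else hunting a disk by serial as described above, here is a minimal sketch of how the lookup could be scripted. It assumes smartctl is installed, that the script runs as root, and that disks appear as /dev/sd*; the function names serial_from_smartctl and find_disk_by_serial are made up for illustration and are not part of smartmontools.

```shell
#!/bin/sh
# Extract the serial number from `smartctl -i` text read on stdin.
# Matches the label case-insensitively, so it covers both the SAS
# spelling ("Serial number:") and the ATA spelling ("Serial Number:").
serial_from_smartctl() {
    awk -F': *' 'tolower($1) == "serial number" { print $2; exit }'
}

# Print every whole disk under /dev/sd* whose reported serial contains
# the wanted string. Partitions (names ending in a digit) are skipped.
find_disk_by_serial() {
    wanted=$1
    for dev in /dev/sd*[a-z]; do
        [ -b "$dev" ] || continue
        serial=$(smartctl -i "$dev" 2>/dev/null | serial_from_smartctl)
        case $serial in
            *"$wanted"*) echo "$dev  $serial" ;;
        esac
    done
}
```

Usage would be something like `find_disk_by_serial PEH5GX8X`, which prints the matching device path next to its full serial string.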
Synopsis
I noticed some users get this warning message, after Bluefin upgrade:
Code:
Disk(s): sda ... are formatted with Data Integrity Feature (DIF) which is unsupported.
Another case is disks not being seen by the pool, yet the pool reports no errors. You might be dealing with one or both of the issues listed below. Bluefin's newer kernel/ZFS version makes these feature checks available and throws a warning, which explains why you did not see it before. Linux cannot read disks formatted with 520-byte sectors, so you need to perform both checks listed below.
Data Integrity Field
DIF extends the disk sector from the traditional 512 bytes to 520 bytes by adding 8 protection bytes. You might also find disk sectors extended to 528 bytes by custom firmware. OEM-rebranded HDDs and SSDs from major storage vendors are plagued with this "enhancement", which is not supported by Linux.
If you look in /var/log/messages, you should see warnings similar to:
T10 Protection Information
T10-PI is an extension of the existing T10 SCSI Block Commands specification, covering communication between SCSI controllers and storage devices. Protection Information (PI) adds an extra 8 bytes of information to the 512-byte sectors typical of enterprise hard drives.
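To tell the two cases apart on a given disk, one option is to look at what `sg_readcap --long` reports for the logical block length and the protection flag. The sketch below only parses that text. It assumes the output contains lines of the form "Logical block length=512 bytes" and "Protection: prot_en=1, ...", which may vary between sg3_utils versions, and it treats 512 and 4096 as the only sector sizes that need no reformat. The function name classify_readcap is hypothetical.

```shell
#!/bin/sh
# Classify a disk from `sg_readcap --long` text read on stdin.
# Assumption: the text has a "Logical block length=N bytes" line and,
# when protection information is enabled, a "prot_en=1" field.
classify_readcap() {
    awk '
        /Logical block length=/ {
            split($0, a, "=")      # a[2] looks like "512 bytes"
            size = a[2] + 0        # keep the leading number only
        }
        /prot_en=1/ { pi = 1 }
        END {
            if (size == 0)
                print "unknown"
            else if (size != 512 && size != 4096)
                print "reformat: " size "-byte sectors"
            else if (pi)
                print "reformat: " size "-byte sectors with protection information"
            else
                print "ok: " size "-byte sectors, no protection information"
        }'
}
```

It could be fed from the generic SCSI node, for example `sg_readcap --long /dev/sg7 | classify_readcap`, since a 520-byte disk may not show up as a regular /dev/sdX block device.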
format.log will contain the format command output, useful for further troubleshooting.
I don't know how ZFS pools react to sg_format. Technically, you need to take the disk offline, format it, then add it back into the pool. While the formatting is simple to execute, I would like input from experienced users on the correct steps for the offline disk procedure.
Not to derail or go off topic, but reading this thread makes me nervous. How long has this been going on?
When you order a WD drive from Amazon, New Egg, B&H, Best Buy, etc, where is there any indication on the merchant's website that you might be getting a branded disk?
I'm looking at different results for WD Red Plus drives, and I'm not seeing any clear label on the sticker or the product description. We're not even talking SAS drives, just your typical SATA drives that home users purchase for their NAS servers (or heck, their own personal computers.)
Goodness, can this even permeate into white label drives that you shuck from an external enclosure?
How can you know ahead of time if the drive you're purchasing online is a "branded disk"?
This reminds me of the SMR debacle in which Western Digital only cleaned up their act after-the-fact to make their marketing clear to their customers after there was a surge of public negative blowback.
This is very common for disks purchased on eBay as refurbished. Disks with 520-byte sectors or custom firmware have existed for many years. They are mostly found in appliances like EMC or NetApp arrays or in special server configurations.
Whoever sells you the disks knows very well whether they are branded or 520-byte, so just ask. If they don't know and reply that the disks have been pulled from some type of array, the disks will probably be affected. A responsible seller will hook the array to a Linux machine and format all disks in one shot, but most sellers don't bother. Retail disks are never branded or formatted with 520-byte sectors, unless they were swapped and returned by a purchaser, then sold again by the retailer.
ASMT ASM1156-PM 0 peripheral_type: disk [0x0]
PROTECT=0
Unit serial number: 00000000000000000000
LU name: 5000000000000001
mode sense(10) cdb: [5a 00 01 00 00 00 00 00 fc 00]
mode sense(10):
Fixed format, current; Sense key: Illegal Request
Additional sense: Invalid field in cdb
bad field in MODE SENSE (10) [mode_page 1 not supported?]
sg_format -vFs 512 /dev/sg2 0.00s user 0.00s system 24% cpu 0.010 total
I don't recall seeing this type of error, especially with the serial number purposely erased. It must be custom firmware. Try these commands and let me know the output:
BTW, this is the proper format for posting Linux command results: the command, followed by the server response. The # tells me you are executing the command as root; $ means you are a regular user:
Got it. Thank you for all your patience and help, especially with the proper etiquette here.
The only way I got these to show up outside of the BIOS was using a USB-connected hard drive dock.
Here are the results
Code:
root@truenas[~]# smartctl -Hi /dev/sg2
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: WDC WD80EFZZ-68BTXN0
Serial Number: [No Information Found]
Firmware Version: [No Information Found]
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: [No Information Found]
Local Time is: Sun Aug 7 06:26:41 2022 PDT
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 82-83 don't show if SMART supported.
SMART support is: Ambiguous - ATA IDENTIFY DEVICE words 85-87 don't show if SMART is enabled.
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
Code:
root@truenas[~]# sg_format -6vFs 512 /dev/sg2
ASMT ASM1156-PM 0 peripheral_type: disk [0x0]
PROTECT=0
Unit serial number: 00000000000000000000
LU name: 5000000000000001
mode sense(6) cdb: [1a 00 01 00 fc 00]
mode sense(6):
Fixed format, current; Sense key: Illegal Request
Additional sense: Invalid field in cdb
bad field in MODE SENSE (6) [mode_page 1 not supported?]
I had the drives connected to the motherboard on 2 different machines, one at a time, with Windows and Linux, and couldn't get them to show up other than in the BIOS. So the USB dock is my last resort here before returning the drives.
@kamalov the last command is telling me the disk is using custom firmware. I would recommend posting in r/homelab and listing all the command outputs you tried. I'm sure someone will post additional guidelines. Please post the Reddit link here, and let us know if you managed to fix the disk. Where did you purchase your disk?
Ask the seller to issue you an RMA or, even better, a refund. I only purchase refurbished (grade A) HGST Ultrastar Helium CMR hard drives. They have a stunning MTBF (Mean Time Between Failures) of 2.5M hours, and reputable vendors offer a 5-year warranty, even on refurbished units. Helium doesn’t affect the AFR (Annualized Failure Rate) of hard drives, versus air-filled drives. Yes, these hard drives are that good; I personally have not had any issues with them in many years.
@rollee that's an easy fix. Run sg_map to make sure you are formatting the right disk, then take the disk offline in the Scale UI. Example for my disk:
Once the disk is offline, run as root:
Code:
# time sg_format -v -F /dev/sg7
This will take several hours, so you should definitely use tmux, as described in the OP. If your ssh connection is lost, you will need to start again, and who knows what errors you might have to deal with afterwards.
Once the disk is formatted, reboot the server and bring the disk back online. The resilvering will start automatically and should also take a while.
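The sequence above can be written down as a dry-run helper that only prints the commands for review and runs nothing itself. This is a sketch, not the exact procedure from the OP: the function name print_reformat_plan is made up, the pool, disk and sg names are placeholders you must supply, and `zpool offline` / `zpool online` are assumed here as command-line stand-ins for the offline and online actions done in the Scale UI.

```shell
#!/bin/sh
# Print (do not execute) the command sequence for reformatting one disk.
# $1 = pool name, $2 = disk as known to the pool, $3 = sg node to format.
# All three are placeholders; check them with sg_map before running anything.
print_reformat_plan() {
    pool=$1 vdev=$2 sgdev=$3
    cat <<EOF
tmux new-session -s reformat
zpool offline $pool $vdev
sg_format -v -F $sgdev
reboot
zpool online $pool $vdev
EOF
}
```

For example, `print_reformat_plan tank sdh /dev/sg7` prints five lines that can be copied one at a time into the tmux session. Device names can change across the reboot, so it is worth rechecking the disk name before the final step.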
If drives are seen by Linux, they are 512-byte formatted but have additional T10 protection which Linux does not like.
BTW, I updated the guide to use the actual disk name instead of the device ID, which removes confusion. You can use either; the mapping from device ID to disk is done automatically.
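If you want to double-check the pairing yourself, sg_map prints each sg node next to its disk name, and a small filter can pull out the one you need. This is a sketch that assumes the usual two-column sg_map layout (sg node first, disk name second, with some lines having no second column); sg_for_disk is a hypothetical helper name.

```shell
#!/bin/sh
# Print the sg node paired with a disk name in `sg_map` output read on stdin.
# Assumes lines like "/dev/sg7  /dev/sdh"; lines without a second column
# (devices with no block device) are ignored.
# Exits non-zero when the disk is not listed.
sg_for_disk() {
    awk -v disk="$1" '$2 == disk { print $1; found = 1 } END { exit !found }'
}
```

Usage: `sg_map | sg_for_disk /dev/sdh` prints the matching sg node, or nothing with a failing exit status if the disk is not in the map.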
TrueNAS specs
OS TrueNAS Scale, Bluefin running on two mirrored USB's
Hardware
Dell 720XD, with two E5-2650 V2, 4 x 16GB Samsung PC3-14900R ECC RAM,
12 x 6TB SAS Dell 7.2K in a RaidZ3
2 x 1TB SSD Samsung 870 EVO in a Mirror
When I upgraded to Bluefin, it displayed the error most people seem to be having:
Disk(s): sdb, sdi, sdm, sdl, sdn, sdc, sdd, sde, sdf, sdg, sdj are formatted with Data Integrity Feature (DIF) which is unsupported.
2022-12-19 16:45:08 (Australia/Adelaide)
Since upgrading, however, one of my SAS drives has stopped registering in TrueNAS. It was working fine before the upgrade, and I can see the drive as a blank drive in the installation media GUI. I have checked that all cables and connectors are working correctly as far as I can see, and I have also tried the following:
Code:
# fdisk -l /dev/sda
OUTPUT for unresponsive disk
fdisk: cannot open /dev/sda: Input/output error
OUTPUT for any other mechanical disk
Disk /dev/sdn: 5.46 TiB, 6001175126016 bytes, 11721045168 sectors
Disk model: MG04SCA60EE
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: D9012A69-2086-4F9A-80B7-38EBB47551C5
Device Start End Sectors Size Type
/dev/sdn1 128 4194304 4194177 2G Linux swap
/dev/sdn2 4194432 11721045134 11716850703 5.5T Solaris /usr & Apple ZFS
smartctl -l selftest /dev/sda
OUTPUT
root@truenas[~]# smartctl -l selftest /dev/sda
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.79+truenas] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log
Num Test Status segment LifeTime LBA_first_err [SK ASC ASQ]
Description number (hours)
# 1 Background short Completed 80 43957 - [- - -]
# 2 Background short Completed 80 43954 - [- - -]
# 3 Background short Completed 80 43952 - [- - -]
# 4 Background short Completed 80 43936 - [- - -]
# 5 Background short Completed 80 43895 - [- - -]
# 6 Background short Completed 80 43866 - [- - -]
# 7 Background short Completed 80 43858 - [- - -]
# 8 Background short Completed 80 43842 - [- - -]
# 9 Background short Completed 80 5 - [- - -]
#10 Reserved(7) Completed 64 5 - [- - -]
Long (extended) Self-test duration: 37873 seconds [631.2 minutes]
In the storage tab I can see the apparently failed drive under unassigned disks. But when I go into the pool, select the failed drive (its label has changed to 3869088571791395513) and attempt to replace it with the unassigned drive, I get:
Error: [EFAULT] Unable to GPT format the disk "sda": Warning! Read error 5; strange behavior now likely! Warning: Partition table header claims that the size of partition table entries is 0 bytes, but this program supports only 128-byte entries. Adjusting accordingly, but partition table may be garbage. Warning! Read error 5; strange behavior now likely! Warning: Partition table header claims that the size of partition table entries is 0 bytes, but this program supports only 128-byte entries. Adjusting accordingly, but partition table may be garbage. Unable to save backup partition table! Perhaps the 'e' option on the experts' menu will resolve this problem. Warning! An error was reported when writing the partition table! This error MIGHT be harmless, or the disk might be damaged! Checking it is advisable.
Not sure if this is useful or not, saw you asked for it when helping the other people :)
The troubleshooting steps and formatting process are in the OP, so you can format the disk accordingly. Since Linux cannot read 520-byte sectors, it is safe to start by formatting the disk to 512-byte sectors. If you cannot perform this action successfully, it usually means your drive has custom firmware applied or is simply defective. You can reach out on specialized forums like STH or r/homelab and hope someone with knowledge has the answer.
The goal of this thread is to fix the DIF and T10 issues in Bluefin, not to determine why a disk cannot be replaced in a pool. It is best to create your own thread for that.