After Hyper-V issue Bare metal issue is worse

tony95

Contributor
Joined
Jan 2, 2021
Messages
117
Well, after the hiccups in the beginning, my Ryzen system is rock solid. No more hard lockups, stable network connections, steady SMB performance, and my VMs run just fine. But I'm planning to switch to SCALE when it's released because I hope the hardware compatibility is better with Linux under the hood than with FreeBSD.
My problem seems to be gone. I think the user didn't port over correctly, but it's strange how it worked and then didn't. I also changed a setting in the BIOS; not sure if that helped.
 

tony95

Contributor
Joined
Jan 2, 2021
Messages
117
How quickly you gloss over the problems of the OS developer. Look, Ryzen was the first architecture on the market to run a huge number of cores, each with both local cache, and access over a system bus to other cores' cache. Keeping data consistent in cache, with both local and remote CPUs reading and writing at different rates, and with extra latencies introduced by cores ramping up and down in power by ramping up and down their clocks is a non-trivial problem. Even the Linux developers had trouble with it, and they've only just overcome those issues in recent kernels. FreeBSD has a much smaller developer pool and market share, so they're obviously going to take longer to climb that same learning curve.

Unless you've written code and had to support it out in the field, on inadequate fixed budgets and limited manpower, you don't know what you're talking about.

Yes, I write software for a living. If the TrueNAS system has an inadequate budget and limited manpower, then you have to agree that is a serious issue. Ryzen is in its 4th generation, and Intel is becoming more like AMD, not the other way around. AMD makes plenty of server products; not sure why someone would think they don't. I have 7 computers and 5 are AMD. I have been using both AMD and Intel for 20+ years. You can't very well tell half of the desktop user base that their processor choice is why they are having issues and expect to be taken seriously. Seems to me there are plenty of people building TrueNAS on Ryzen and having good success. Blaming the processor might mean that you don't have a very good idea what the problem is and are just pointing to something without actually knowing.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Fanboi Ryzen all you like. There are unresolved hardware errata that AMD will NOT fix. (For example, see this Reddit thread.) We've provided the known work-arounds, but the experience here is to steer clear of Ryzen if you value your system's stability and data's integrity.
 

ethebubbeth

Cadet
Joined
Nov 4, 2014
Messages
8
The issue here is the network dropping. I'm guessing it's more likely related to the Realtek 8111H NIC on that board than the Ryzen CPU. They can have issues going down under load on FreeBSD. My understanding is the Realtek driver included in the FreeBSD kernels is rather old and you might need to look into loading a newer one. I do not know if TrueNAS ships with a newer Realtek driver.
 

Hellione

Explorer
Joined
Jan 23, 2021
Messages
55
you might be right. every server product out there recommends intel nic´s. and there is no need to use a board with realtek. you can get pretty good ryzen boards with intel lan. or just throw in an intel nic for some bucks and disable the onboard rtl shit. if you need a recommendation, i built my system with the asrock x570 phantom gaming. ok i dont use the onboard intel nic, i use a intel 82599 based 10G SFP+ card instead. it´s running stable as-a-rock since day 1. i put in 64gb kingston 2666 ecc (use the asrock recommended module list), for sure no overclocking shit^^, all 8 sata ports and all 2 nvme ports are equipped with drives. asrock knows how to build boards, they are very competent in the server market too, and they support ecc on all am4 boards i know, even their consumer products like the x570pg.
 

tony95

Contributor
Joined
Jan 2, 2021
Messages
117
The issue here is the network dropping. I'm guessing it's more likely related to the Realtek 8111H NIC on that board than the Ryzen CPU. They can have issues going down under load on FreeBSD. My understanding is the Realtek driver included in the FreeBSD kernels is rather old and you might need to look into loading a newer one. I do not know if TrueNAS ships with a newer Realtek driver.

Yeah, the problem is that it happens with any NIC I use. It happens the same with the Realtek or the Mellanox ConnectX-3. It happens bare metal or through Hyper-V. It happens under load and at idle. It happens whether I am writing to my Z2 array or a single SSD.

They say disable C-States in BIOS, but the only thing in my BIOS is global c-states, I can't find anything else.

Also happens on 2 different motherboards and two different chips.
 

tony95

Contributor
Joined
Jan 2, 2021
Messages
117
you might be right. every server product out there recommends intel nic´s. and there is no need to use a board with realtek. you can get pretty good ryzen boards with intel lan. or just throw in an intel nic for some bucks and disable the onboard rtl ****. if you need a recommendation, i built my system with the asrock x570 phantom gaming. ok i dont use the onboard intel nic, i use a intel 82599 based 10G SFP+ card instead. it´s running stable as-a-rock since day 1. i put in 64gb kingston 2666 ecc (use the asrock recommended module list), for sure no overclocking ****^^, all 8 sata ports and all 2 nvme ports are equipped with drives. asrock knows how to build boards, they are very competent in the server market too, and they support ecc on all am4 boards i know, even their consumer products like the x570pg.

Wow, an X570? If it works on that, it should work on mine. But I have been testing with an MSI X470 Gaming Plus and an MSI B350 Tomahawk, and I have tried both the built-in Realtek NIC and the Mellanox ConnectX-3. I have tried bare metal and Hyper-V. I even removed the 10G switch from the network and got the same issue. Could it be my router? I have an ATT Gateway and a TP-Link router. Nothing else seems to be having a problem, but maybe FreeBSD is more sensitive? It's about the only thing that I haven't tested, but I can't test without internet because the qBittorrent client is what I am focused on. That said, it does happen when writing any large batch of small files. Large files almost never fail to write, and reads from the NAS are almost always good. I wrote 40TB of large files and only a few dozen failed out of about 50,000 files.
 

tony95

Contributor
Joined
Jan 2, 2021
Messages
117
I think I misunderstood you; I never disabled ErpReady. I will look for that when I get home. Maybe that is the solution.
I think ErpReady is disabled by default, I have the same board in my office and it is disabled, but I will double check.
 

Hellione

Explorer
Joined
Jan 23, 2021
Messages
55
hmm at first i would test everything locally. do you have a managed or websmart switch? you could see link up-down counters there. i use a d-link dgs-1510-28x and the truenas is connected with twinax / dac cable. just copy files locally to test it, send ping while doing that, and so on. and yes the x570 is nice, nice cpu-chipset throughput of 8GB/s for future ssd upgrades hehe. it´s also cheap in my country, only <150 euro. msi does not mention ecc memory on their products, and that is one very important option for me at a 24/7 server, so i always use asrock if i want to use consumer grade boards in a server.
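A rough Python sketch of that kind of local copy test, for anyone who wants something repeatable. It writes a batch of small files to a directory (for example a mounted share), reads each one back, and records a timestamp for every failure so it can be lined up against a ping log. The function name, file names, and default sizes are made up for illustration, and a read-back over the network may be answered from the client's cache, so treat it as a write test first.

```python
import os
import time
from pathlib import Path

def write_small_files(target_dir, count=1000, size=4096):
    """Write `count` files of `size` random bytes, read each back, log failures."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    failures = []
    for index in range(count):
        path = target / f"smallfile_{index:06d}.bin"
        payload = os.urandom(size)
        try:
            path.write_bytes(payload)
            if path.read_bytes() != payload:
                raise OSError("read-back mismatch")
        except OSError as error:
            # Timestamp each failure so it can be matched against a ping log.
            failures.append((time.strftime("%H:%M:%S"), path.name, str(error)))
    return failures
```

An empty list back means every file was written and read without an error.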
 

tony95

Contributor
Joined
Jan 2, 2021
Messages
117
hmm at first i would test everything locally. do you have a managed or websmart switch? you could see link up-down counters there. i use a d-link dgs-1510-28x and the truenas is connected with twinax / dac cable. just copy files locally to test it, send ping while doing that, and so on. and yes the x570 is nice, nice cpu-chipset throughput of 8GB/s for future ssd upgrades hehe. it´s also cheap in my country, only <150 euro. msi does not mention ecc memory on their products, and that is one very important option for me at a 24/7 server, so i always use asrock if i want to use consumer grade boards in a server.

I actually own that exact board, the X570 Phantom Gaming, for my 5800X, but that PC is not for a NAS server. Eventually I will put a 6800 XT in it, but it will be a while. It isn't the greatest overclocking board in the world and vdroop is pretty bad, but I get pretty good performance for a $99 board. I bought it on sale way before I bought the 5800X.

Another thing that is odd, when the small files fail to write to the NAS, I will have 4-5 video encoders reading from the NAS and they don't fail. I am not sure what to make of that fact, but it doesn't seem to be a complete network failure, just a failure of those files to write. Every encoder is running from a different PC and reading from the NAS across the network.

Seems like the ZFS file system is finding some kind of corruption in the files and refusing to write them? It shouldn't be bad memory because I have tested on two machines and 3 different sets of memory.
 

Hellione

Explorer
Joined
Jan 23, 2021
Messages
55
but keep in mind, video clients buffer some seconds, you might not notice a short network outage.
overclocking is not my world. i prefer stability and free time for myself xD. if i want more performance, i buy a new cpu.
 

Hellione

Explorer
Joined
Jan 23, 2021
Messages
55
you could simply
ping your-nas-ip -t > c:\ping.txt
and check later if there is any line with no answer :smile:
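If the log gets long, a small script can do the checking. This is only a sketch: the failure strings are the ones an English-language Windows ping normally prints, which is an assumption about the client, and the function names are made up for illustration.

```python
from pathlib import Path

# Substrings Windows ping prints when an echo request gets no usable reply.
# Assumption: English-language Windows output.
FAILURE_MARKERS = (
    "Request timed out",
    "Destination host unreachable",
    "General failure",
    "transmit failed",
)

def find_failures(lines):
    """Return (line_number, text) for every line that looks like a lost ping."""
    failures = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if any(marker.lower() in text.lower() for marker in FAILURE_MARKERS):
            failures.append((number, text))
    return failures

def scan_log(path):
    """Read a log written by `ping ... -t > file` and list the failed lines."""
    content = Path(path).read_text(errors="replace")
    return find_failures(content.splitlines())
```

Calling `scan_log(r"c:\ping.txt")` returns an empty list when every request was answered.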
 

tony95

Contributor
Joined
Jan 2, 2021
Messages
117
but keep in mind, video clients buffer some seconds, you might not notice a short network outage.
overclocking is not my world. i prefer stability and free time for myself xD. if i want more performance, i buy a new cpu.

I overclock for stability, I don't trust MB manufacturers to throw voltage at my CPU. I had an ASRock board blow up at the socket running stock, so I run a very safe, very stable overclock ever since.

I don't know how much buffering the encoders are doing, but with 5 going you would think one would error out every once in a while, and I am not seeing that happen.

Of the 40 TB I wrote to my NAS, when I did a CRC check, not a single file failed reading it back and 100% matched the source.
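For reference, a check like that can be sketched in a few lines of Python. This is not the tool used above, just one possible way to do it: it walks a source tree, computes a CRC-32 for each file and for its counterpart in the copy, and lists anything missing or different. The function names are hypothetical.

```python
import zlib
from pathlib import Path

def file_crc32(path, chunk_size=1 << 20):
    """CRC-32 of a file, read in chunks so large files do not fill memory."""
    crc = 0
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc

def compare_trees(source_root, copy_root):
    """Return relative paths that are missing from the copy or differ in CRC-32."""
    source_root, copy_root = Path(source_root), Path(copy_root)
    problems = []
    for source_file in sorted(source_root.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_root)
        copy_file = copy_root / relative
        if not copy_file.is_file() or file_crc32(source_file) != file_crc32(copy_file):
            problems.append(relative.as_posix())
    return problems
```

An empty result corresponds to the "100% matched" outcome described above.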
 

tony95

Contributor
Joined
Jan 2, 2021
Messages
117
you could simply
ping your-nas-ip -t > c:\ping.txt
and check later if there is any line with no answer :smile:
I am running this test now; 1 hour should be a good test. A write failure usually shows up within 30-40 minutes.
 

RegularJoe

Patron
Joined
Aug 19, 2013
Messages
330
If VMware ESXi makes it work, it is because VMware is taking control of your power management. Even in VMware we have had to tell our Intel hardware not to slow the clock down to save power, but to run full tilt all the time and save no power. VMware is very stable, but it also has a VERY small compatibility list, and once you run your hardware on it all the buttons and knobs are gone, no fiddling. It always seems that when I fiddle, I change something that makes for more issues now or later. I have learned not to change anything unless I am absolutely sure I know the ramifications. I have an older HP N54L Gen 7 micro server and it has an AMD CPU, so AMD can work.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Only a crap ton of computers are made by AMD,

Actually, AMD doesn't make computers.

if you can't get FreeBSD stable on Ryzen then there is a serious problem with development there.

Problems similar to the ones Linux had getting stable on Ryzen. Heck, even Windows had a bunch of issues early on. The FreeBSD community is a smaller community, and just isn't going to have developer coverage for every rando desktop board out there. If you think this is a problem, you are absolutely encouraged to donate money or systems to the FreeBSD Foundation, because developing for large numbers of platforms is a cash intensive business.

I can't think of any product that doesn't run on both AMD and Intel. Overclocking is a very good idea; letting the system handle boost can lead to unnecessarily high voltages. I would know, because I had an ASRock board blow up at the socket running stock.

Well I think this says it all.
 

tony95

Contributor
Joined
Jan 2, 2021
Messages
117
you could simply
ping your-nas-ip -t > c:\ping.txt
and check later if there is any line with no answer :smile:

Ping statistics for 192.168.0.108:
Packets: Sent = 9694, Received = 9694, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
Minimum = 0ms, Maximum = 3ms, Average = 0ms
Control-C


Ran for several hours (forgot about it) and 0 packets lost.
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Try ping -t -f -l <packet size> <your NAS IP> and vary the packet sizes from 1500 to 9000.
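A minimal Python sketch of that sweep, assuming a Windows client (`-f` and `-l` are Windows ping options). It swaps `-t` for `-n 4` so that each size finishes by itself, and it reads the "Lost = N" figure from the summary. The 500-byte step and the function names are choices made for this sketch, not anything specified above.

```python
import re
import subprocess

def build_ping_command(host, payload_size, count=4):
    """Windows ping: -n count, -f sets Don't Fragment, -l sets payload bytes."""
    return ["ping", "-n", str(count), "-f", "-l", str(payload_size), host]

def parse_lost(output):
    """Extract the 'Lost = N' figure from the ping summary, or None if absent."""
    match = re.search(r"Lost = (\d+)", output)
    return int(match.group(1)) if match else None

def sweep(host, sizes=range(1500, 9001, 500), run=subprocess.run):
    """Ping once per payload size and collect the reported loss for each."""
    results = {}
    for size in sizes:
        completed = run(build_ping_command(host, size), capture_output=True, text=True)
        results[size] = parse_lost(completed.stdout)
    return results
```

Calling `sweep("192.168.0.108")` returns a dictionary of payload size to lost packets. With a standard 1500-byte MTU on IPv4, the largest payload that fits unfragmented is 1472 bytes (1500 minus 20 bytes of IP header and 8 of ICMP header), so sizes in this range would be expected to show loss unless jumbo frames are enabled along the whole path.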
 