tl;dr OP is right, don't be jerks and prepare the update properly. It's YOUR data ... or YOUR head if the data is someone else's.
>> If you're sitting on the toilet and you're bored of playing mobile games, here is the full post ...
The way the OP describes the process might get one thinking that FreeNAS is a sh!tty product which explodes into radioactive particles, taking the whole neighborhood with it, upon system update. The reality is (as
@ornias already noted) that the described process more or less matches
the usual approach when talking about enterprise systems. (Which Free/TrueNAS actually is ... unless you pick the BETA or initial release - see
THIS)
And this is not just some "i read some crap so i will just copy/paste it" ... not at all. I work for a company that takes care of various (inter)national companies. I did major upgrades/migrations of systems which are crucial for company operations (you don't want your trucks/logistics getting stuck on the way, or the whole warehouse going dark. Or how about all of your terminals in every single store you have across the country not being able to accept payments?). There is no room for ANY mistakes, and some of the projects are planned for several months before pulling the trigger. Yet sometimes shit hits the fan no matter how many preventive measures you took.
So if I drill down to the OP's points and remove the relation to "FreeNAS", I can easily match them to what I do every time I am about to update something on a production system...
Read the release notes carefully, along with the Guide for the new version, section 1.1.
Oh yeah, every (fcking) time: RTFM!!! ... When I was a rookie I did something on a non-production system twice without fully reading the release notes/installation docs. It worked fine. The third time I screwed up the system and we had to restore the system DB. Yeah, been there as well.
Prepare an action plan to mitigate all the gotchas in your specific installation.
I don't like project managers, their Excel sheets, their status meetings to which I keep being invited, their questions when they won't understand the techie answers anyway, ... etc. BUT I respect the need for a proper project plan. So yeah, even a couple of bullets written on paper/in Notepad that you actually look at IS a project plan. You can easily spot a gap and mitigate a potential issue.
Actually I did that HERE when I was about to update my FreeNAS 9.10 to 11.2.
Have printed screenshots of all relevant configuration screens, in case your configuration backup doesn't work, and you need to re-enter everything again by hand. Make sure these are current.
Okay "
printed" i don't like as i see this as a waste of paper (i am not some ecologic activist but i just don't understand the reason of printing something what i can easily view/show on my PC/Notebook. I saw ppl printing a fcking email conversation and distributing the copies on meeting ... USE THE FCKING BEAMER BEHIND YOU ... you moron! And forward the email to all participants if necessary). Anyway ... just make a screenshots. Or be sure you can export the configuration
and you're able to read it outside the system! Some things tends to break/reset to default upon update (in general, not related to FreeNAS) so it is better to have screenshots so you could easily re-do the config w/o spending whole night on it (like when you initially set it up 2 years ago...)
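If you want to go one step further than screenshots, you can stash a copy of the config itself. Here is a minimal Python sketch of that idea; the config DB path /data/freenas-v1.db matches FreeNAS 9/11 as far as I know, and the destination dataset is just a made-up example - adjust for your box:

```python
#!/usr/bin/env python3
# Minimal sketch: keep a timestamped copy of the FreeNAS config DB
# before an update. The /data/freenas-v1.db path matches FreeNAS 9/11;
# the destination dataset is a made-up example.
import shutil
import sqlite3
from datetime import datetime
from pathlib import Path

CONFIG_DB = Path("/data/freenas-v1.db")       # assumed config location
DEST_DIR = Path("/mnt/tank/backups/config")   # hypothetical dataset

def backup_config() -> Path:
    DEST_DIR.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = DEST_DIR / f"freenas-v1-{stamp}.db"
    shutil.copy2(CONFIG_DB, dest)
    # Sanity check: the copy must open as a valid SQLite database
    con = sqlite3.connect(str(dest))
    con.execute("PRAGMA integrity_check;")
    con.close()
    return dest

if __name__ == "__main__":
    print(f"Config saved to {backup_config()}")
```

The point of the integrity check is simple: a config copy you cannot open is as useful as no copy at all.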
Backup your pool, in case you need to reconstitute it.
Backup, backup, backup ... yes, we do backups of everything. Hell, I even copy certain directories somewhere else before specific actions, even though I know there are full FS backups in place for all of the filesystems on that host.
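For a ZFS pool the usual recipe is a recursive snapshot plus zfs send. A rough sketch of what I mean (the pool name and target path are made up, and for a big pool you would rather pipe this over ssh to a second ZFS box than into a file):

```python
#!/usr/bin/env python3
# Minimal sketch of a pre-update pool backup: recursive snapshot,
# then stream it into a file. "tank" and the target path are made up.
import subprocess
from datetime import datetime

POOL = "tank"                                     # hypothetical pool name
TARGET = "/mnt/usb-backup/tank-pre-update.zfs"    # hypothetical target file

snap = f"{POOL}@pre-update-{datetime.now():%Y%m%d}"
subprocess.run(["zfs", "snapshot", "-r", snap], check=True)

# Full recursive replication stream of the snapshot into one file;
# for big pools you'd pipe this over ssh to a second ZFS box instead.
with open(TARGET, "wb") as out:
    subprocess.run(["zfs", "send", "-R", snap], stdout=out, check=True)
```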
Test your backups to make sure they work.
Oh yeah, we do regular DB and FS restore tests to validate the integrity of the backups. A few times I even requested a test system into which I could restore a recent backup so I could do a test upgrade there. Note that we usually have more than one non-prod system with the same software constellation as the production system, so we actively test the update there before we go for the prod system. Yet this is not 100% bulletproof, and things like the data in the DB or its size, or maybe a cluster setup, can make things different...
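A backup you never tried to restore is just a hope. Staying with the zfs send example above, a minimal restore test could receive the stream into a scratch dataset and spot-check a few checksums against the live pool (all names here are hypothetical):

```python
#!/usr/bin/env python3
# Minimal sketch of a restore test: receive the saved stream into a
# scratch dataset and spot-check a few file checksums against the
# live pool. All names are hypothetical.
import hashlib
import subprocess
from pathlib import Path

STREAM = "/mnt/usb-backup/tank-pre-update.zfs"  # stream from the backup step
SCRATCH = "scratch/restore-test"                # hypothetical scratch dataset

with open(STREAM, "rb") as src:
    subprocess.run(["zfs", "receive", "-F", SCRATCH], stdin=src, check=True)

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# The same relative file must hash identically on both sides
for rel in ("important/data.db", "movies/index.txt"):   # sample files
    live = sha256(Path("/mnt/tank") / rel)
    restored = sha256(Path("/mnt") / SCRATCH / rel)
    assert live == restored, f"Mismatch on {rel} - backup is NOT trustworthy!"
print("Spot checks passed.")
```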
Review the steps needed to reboot back into your current version from the Guide, section 2.5.5.
There is ALWAYS a "fallback" plan and a "point of no return". If that gets crossed and something is not working ... weeellll ... it is restore timeeee!
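On FreeNAS the first fallback line is usually the boot environment: the updater creates a new one, and the old one stays bootable. A sketch of what falling back looks like, assuming the pre-update BE is named "11.2-U8" (that name is hypothetical):

```python
#!/usr/bin/env python3
# Minimal fallback sketch: list the boot environments and point the
# next boot back at the pre-update one. "beadm" is the FreeBSD/FreeNAS
# boot-environment tool; the BE name "11.2-U8" is hypothetical.
import subprocess

# Show what we can fall back to
subprocess.run(["beadm", "list"], check=True)

# Re-activate the old, known-good environment for the next boot
subprocess.run(["beadm", "activate", "11.2-U8"], check=True)

# ... then reboot into it (commented out on purpose):
# subprocess.run(["reboot"], check=True)
```

Of course this only helps with the OS side; if the pool itself got upgraded to new feature flags past the point of no return, it really is restore time.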
All this prep work may seem like overkill, but if something goes wrong, you'll have a plan to address it, and as a last resort, revert back to the last known working configuration.
Yes and no ... it depends on how important the system/data are. I can wreck some sandbox as many times as I want, and I can always build it back from scratch or restore it from backup. With a QA/pre-prod system I cannot do that, and IF there are issues I need to fix/revert them. At least I am not under that huge time pressure from the client side. With prod systems ... as I said, there is no room for stupid mistakes (yet they happen...).
Remember, a system upgrade is a high-risk event. Many things can go wrong, and an ounce of prevention is better than a pound of cure.
We don't even have to talk about upgrades (major changes). Even a tiny patch can wreak havoc. I recall a situation from a few years ago when we were doing regular patching of kernels (not the OS but the application). We did several systems - all OK. Then we did two non-prod systems with the same constellation (on Monday and Wednesday). The client confirmed that everything was OK and we had a GO for the production system during the weekend. We did it, and we had confirmation from two people on the client side that everything was fine: system stable, data consistent, communication working, etc. ... our checklist was all green, and the one on the other side as well. We called it a day around 11 PM and went to bed. The sh*tstorm started on Monday morning when people actually came to the office and put load on the system. First it went slow, then it crashed (more than once). I was called in for an immediate check, followed by a war-room meeting with high management asking WTF was going on. We investigated, collected data, and immediately rolled back to the previous version. Sadly (luckily?!) we found data corruption, so we had to restore a ~15 TB database.
All of the collected data was sent to vendor support, as from our side everything had been done properly. Well, we had hit a very nasty bug which nobody had faced yet. The version passed the vendor's release tests, caused no issues across a dozen other clients who updated as well, and also passed our test updates. Yet we were the winners... A specific DB version, combined with specific data within the DB and actions performed by end users, caused unexpected behavior which wrecked the system within a few minutes and caused data loss.
The vendor immediately pulled the patch release, and a VeryHigh-priority announcement was sent to all clients that there was a risk of data loss, with further details about mitigation.
See ... this is similar to the nasty bug in 11.3-U2 where you could actually lose data. Just for the record: 11.3-U2 was released 7.4.2020, the bug was reported 17.4.2020, and 11.3-U2.1 was released 22.4.2020, so 5 days later. I am not sure whether 11.3-U2 was pulled from the update servers, but the issue was heavily discussed on the forums in the relevant section. Personally, I watch the "Installation and Updates" subforum closely when I am about to update my system, and I check the recent posts/threads a second before I click "Restart and apply update".
So yeah, even minor updates/patches can cause mayhem... and I am not even speaking about ransomware infecting 3500 workstations and a few hundred servers (maybe you've heard about HYDRO last year?).
Sorry for the quite long post, I just wanted to share my view here. A decade ago I would have just hit the button and gone for a beer or something. I grew up over the years ... And moreover, my wife would most likely kill me if I lost her collection of black-and-white movies from the early days of Czech cinematography...
--------------
//EDIT:
I never did any kind of this prep on Windows or Mac OS in 30 years and never lost any data or had to "roll back" or these kind of things.
Hence my disappointment. Clearly I just had wrong expectations about the mission of FreeNAS.
See, and I have a different experience, AND Windows is more of a minor OS in our environment (it is mostly AIX and SLES). Yet I saw Windows hosts BSODing after regular patching the moment the MSSQL services started. I saw a native Windows cluster silently screwed up because the network card arrangement switched places for NO bloody reason! Sadly nobody noticed (or even thought that this could happen from a simple patchset which had nothing to do with networking in the first place...). We found out the hard way when one of the hosts was shut down by the clusterware due to a HW issue. All of the systems were moved to the other cluster node prior to the emergency stop, yet some of them failed to start. Well, our Windows admins found that the NICs were messed up. It was fixed quickly, but the retrospective investigation into WHY it happened led to the one patchset which had messed it up...
So yes, even the MAJOR players on the enterprise playing field (Microsoft, IBM, Oracle, SAP, SUSE, ...) make mistakes. So pardon me when I get triggered because someone shares his disappointment with FreeNAS over the fact that he should have done a few precautionary tasks prior to updating the system...