Please double-check my logic before applying deduplication to backups.

fkelava (Cadet; joined Nov 10, 2020; 1 message)

Hey everyone,

As a freshly minted TrueNAS user with relatively meagre experience running ZFS, I'd appreciate a sanity check before I proceed with deduplicating a backup dataset. As far as I understand it, deduplication should be approached with care: it is effectively irreversible for data already written (turning the property off later only affects new writes), and it can have fairly nasty consequences if the dedup table outgrows sane memory constraints.

The idea is this: the pool currently contains a single device, a 6TiB Toshiba X300 drive. It is divided into multiple datasets, but I intend to deduplicate only one: the dataset that contains VM backups from the local Proxmox VE deployment. The only VMs that currently have a strict backup schedule (once weekly) are the twin domain controllers running on Proxmox. Their backups are generally 5-6GiB in size and, because these are domain controllers, largely unchanging. It stands to reason that they should be a workload well suited to deduplication. The deduplicated backup dataset has a hard quota of 500GiB, which I do not intend to deviate from. Backups of larger devices/VMs that are prone to constant change will go into a separate, non-deduplicated dataset.

Can I check ahead of time (I have some older backups that I can squeeze into the dataset before enabling dedup) what the theoretical deduplicability is? If so, how?
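
For reference, ZFS has a built-in simulation for this question: `zdb -S <poolname>` builds a throwaway dedup table and prints the projected ratio, though as far as I know it works on the whole pool rather than on a single dataset. A rough per-directory estimate can also be sketched offline by hashing the backup files in fixed-size blocks. The sketch below is only an approximation under stated assumptions: the function name `estimate_dedup_ratio` is made up for illustration, the 128 KiB block size assumes the default `recordsize`, and it ignores ZFS compression and the fact that compressed backup archives tend to deduplicate poorly because small changes shift the whole byte stream.

```python
import hashlib


def estimate_dedup_ratio(paths, block_size=128 * 1024):
    """Estimate block-level dedup potential for a set of files.

    Hypothetical helper, not part of ZFS or TrueNAS. Splits each file into
    fixed-size blocks (assumed to match the dataset recordsize), hashes every
    block, and compares the total block count with the unique block count.
    Returns (total_blocks, unique_blocks, ratio).
    """
    seen = set()
    total = 0
    for path in paths:
        with open(path, "rb") as fh:
            while True:
                block = fh.read(block_size)
                if not block:
                    break
                total += 1
                # SHA-256 stands in for the checksum ZFS would use for dedup.
                seen.add(hashlib.sha256(block).digest())
    unique = len(seen)
    ratio = total / unique if unique else 1.0
    return total, unique, ratio
```

Calling it with the list of existing backup files gives a ratio close to 1.0 when there is little to gain; a ratio well above 1.0 suggests the data is worth a closer look with `zdb -S`.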

TrueNAS is running in a VM with 16GiB of RAM allotted, a limit I am willing to raise if needed. I believe a 6TiB pool in which only one small dataset is deduplicated should fit within that constraint. Setup attached below.
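
The memory question can be roughed out with back-of-the-envelope arithmetic. The sketch below assumes the widely quoted rule of thumb of about 320 bytes of RAM per dedup table entry (the real figure varies with ZFS version and is not confirmed anywhere in this thread), one entry per unique block, and the worst case in which every block under the 500GiB quota is unique. The function name `ddt_ram_estimate` and its parameters are hypothetical.

```python
import math


def ddt_ram_estimate(data_bytes, record_size=128 * 1024, bytes_per_entry=320):
    """Worst-case dedup table size in bytes, assuming every block is unique.

    bytes_per_entry=320 is a rule-of-thumb assumption, not a measured value.
    record_size is assumed to match the dataset's recordsize property.
    """
    blocks = math.ceil(data_bytes / record_size)
    return blocks * bytes_per_entry


GIB = 2 ** 30

# 500 GiB quota at the default 128 KiB recordsize.
default_case = ddt_ram_estimate(500 * GIB)
# Same quota if the data ends up in 16 KiB blocks instead.
small_block_case = ddt_ram_estimate(500 * GIB, record_size=16 * 1024)

print(f"128 KiB records: {default_case / GIB:.2f} GiB")
print(f" 16 KiB records: {small_block_case / GIB:.2f} GiB")
```

Under these assumptions the table stays a little above 1 GiB at 128 KiB records but grows eightfold at 16 KiB records, so the estimate is very sensitive to the effective block size. ZFS also does not let the dedup table use all of the ARC, so the usable headroom is smaller than the VM's total RAM.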

HP Z440 workstation
Intel Xeon E5-1620v3 (4C/8T@3.50GHz) | 32GiB RAM [thereof 16 assigned to TrueNAS VM, 16 to other domain infra]
Proxmox VE 6.2-4 [Linux 5.4.34-1-pve #1 SMP PVE 5.4.34-2 (Thu, 07 May 2020 10:02:02 +0200), pve-manager/6.2-4/9824574a] | TrueNAS-12.0-RELEASE

Drive configuration:
1x Samsung PM981 256GB SSD - currently hosting Proxmox/TrueNAS
1x Toshiba DT01ACA100 1TiB - currently not in use, will soon host Proxmox/TrueNAS so the SSD remains open for ZIL/L2ARC if need be
1x Toshiba HDWE160 (X300) 6TiB - currently the only device in the pool. More of the same arriving in due course.

Does this appear to be a sane course of action? Would you suggest something else? Am I correct in assuming I'd be able to fit this within the current memory limit or do you believe more would be required? If more/supplementary info is required, I'd be happy to provide it.

Kind regards, and thank you in advance for your time and help.
 

Heracles (Wizard; joined Feb 2, 2018; 1,401 messages)

Hey @fkelava,

Personally, I would not do this:
I would not use a virtual server for production
I would not do deduplication at all
I would not do deduplication with less than 128GB of RAM
I would not run anything from a single-drive vdev

As for me, that is a clear and big No-Go...
 