Duperemove can be used to deduplicate files in a directory or across an entire Btrfs file system.
It should be available in most distributions' package repositories, e.g. on Debian/Ubuntu:
apt -y install duperemove
To recursively deduplicate the directory duplicates/:
duperemove -hdr duplicates/
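If there is a large number of files and duperemove will be run regularly (for example from cron), it is recommended to use a hash file. The hash file stores the hashes computed on previous runs, so subsequent runs do not need to rehash unchanged files and are much faster. It also keeps the hashes out of memory, which reduces memory use on large data sets.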
To use a hash file:
duperemove -hdr --hashfile=.duplicates.hash duplicates/
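To compare disk usage before and after deduplication, run the following command each time (it reports usage for the whole Btrfs file system that contains duplicates/, not only that directory):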
btrfs fi df duplicates/
The compsize utility can also be used (it needs root):
sudo compsize duplicates/
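To see the effect on the directory itself rather than the whole file system, btrfs filesystem du can be used; its Exclusive column (data not shared with other files) should shrink as duplicate extents become shared. The following is a minimal sketch, assuming btrfs-progs and duperemove are installed and the directory is on Btrfs. The function names exclusive_bytes and dedupe_with_report are hypothetical helpers, not part of either tool:

```shell
#!/usr/bin/env bash
# Sketch: compare a directory's exclusive (unshared) data before and after
# a duperemove run. Assumes btrfs-progs and duperemove are installed and
# that the directory is on a Btrfs file system.

# Read `btrfs filesystem du -s --raw` output on stdin and print the
# Exclusive column in bytes (line 1 is the header, line 2 the summary).
exclusive_bytes() {
    awk 'NR == 2 { print $2 }'
}

# Hypothetical helper: deduplicate DIR and report exclusive bytes.
dedupe_with_report() {
    local dir=$1 before after
    before=$(btrfs filesystem du -s --raw "$dir" | exclusive_bytes)
    duperemove -hdr "$dir"
    after=$(btrfs filesystem du -s --raw "$dir" | exclusive_bytes)
    echo "Exclusive before: $before bytes"
    echo "Exclusive after:  $after bytes"
}
```

Usage: dedupe_with_report duplicates/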
A systemd timer and service can be used to deduplicate the directories automatically.
Create the service, listing the directories to deduplicate, in the file /etc/systemd/system/dedupe.service:
[Unit]
Description=Deduplicate BTRFS file system directories
[Service]
# oneshot is required for several ExecStart= lines; they run one after another
Type=oneshot
# Abort if the runs together take longer than one hour
TimeoutStartSec=3600
ExecStart=/usr/bin/duperemove -hdr --hashfile=/path/to/.dedupe.hash /path/to/dedupe
ExecStart=/usr/bin/duperemove -hdr --hashfile=/another/path/to/.dedupe.hash /another/path/to/dedupe
[Install]
WantedBy=default.target
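With many directories, the ExecStart= lines can be generated rather than written by hand. This is a sketch, assuming the hash file sits beside each directory as .<name>.hash (the layout used in the example above) and that the paths contain no spaces, since systemd would need those quoted. execstart_lines is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one ExecStart= line per directory argument.
# Assumes paths without spaces and a hash file stored next to each
# directory as .<name>.hash, e.g. /path/to/.dedupe.hash for /path/to/dedupe.
execstart_lines() {
    local dir parent name
    for dir in "$@"; do
        dir=${dir%/}                     # drop one trailing slash
        parent=$(dirname -- "$dir")      # e.g. /path/to
        name=$(basename -- "$dir")       # e.g. dedupe
        # ${parent%/} avoids a double slash when the parent is /
        printf 'ExecStart=/usr/bin/duperemove -hdr --hashfile=%s/.%s.hash %s\n' \
            "${parent%/}" "$name" "$dir"
    done
}
```

Usage: execstart_lines /path/to/dedupe /another/path/to/dedupe prints the two ExecStart= lines shown in the service file, ready to paste into its [Service] section.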
Create a timer that triggers the above service in the file /etc/systemd/system/dedupe.timer:
[Unit]
Description=Deduplicate BTRFS directories daily
RefuseManualStart=no
RefuseManualStop=no
[Timer]
OnCalendar=daily
Persistent=true
RandomizedDelaySec=7200
Unit=dedupe.service
[Install]
WantedBy=timers.target
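Before enabling anything, systemd-analyze can check the unit files and show when the timer fires. OnCalendar=daily means midnight (*-*-* 00:00:00), and RandomizedDelaySec=7200 adds a random delay of up to 7200 s = 2 hours, so a run should start between 00:00 and 02:00. A minimal sketch; validate_dedupe_units is a hypothetical wrapper around two real systemd-analyze subcommands:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; assumes the unit file paths used in this note.
validate_dedupe_units() {
    # Load both unit files and report configuration errors, if any
    systemd-analyze verify /etc/systemd/system/dedupe.service \
        /etc/systemd/system/dedupe.timer
    # Print the normalized form of "daily" and when it next elapses
    systemd-analyze calendar daily
}
```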
Finally, reload systemd so it picks up the new unit files, then enable and start the timer:
systemctl daemon-reload
systemctl enable --now dedupe.timer
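Once the timer is enabled, its schedule and the result of the last run can be checked. A sketch assuming the unit names used above; check_dedupe is a hypothetical wrapper:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; assumes the unit names dedupe.timer/dedupe.service.
check_dedupe() {
    # NEXT and LAST columns: when the timer fires next and when it last fired
    systemctl list-timers dedupe.timer
    # State and exit status of the most recent run
    systemctl status dedupe.service --no-pager
    # Last 50 lines of duperemove output captured by the journal
    journalctl -u dedupe.service -n 50 --no-pager
}
```

To run a deduplication pass immediately instead of waiting for the timer, start the service directly:
systemctl start dedupe.service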