I automate my rsync backups to avoid failures and keep recoveries predictable. With clearly defined Cron jobs, SSH transport and incremental runs, I efficiently secure web servers, databases and configurations.
Key points
- Automation: Time-controlled jobs reduce errors and effort.
- Efficiency: Delta transfer saves bandwidth and storage.
- Security: SSH, key management and offsite targets.
- Strategy: GFS retention and clear RPO/RTO targets.
- Transparency: Logging, monitoring and restore tests.
Why I automate backups
I consistently back up production systems because a single failure can stop entire projects and I want to guarantee availability. A scheduled backup run at 02:00 replaces error-prone manual work and ensures clean data states. I define clear targets for each server: how much data loss is acceptable (RPO) and how quickly recovery must be completed (RTO). These targets influence the schedule, storage target and options so that I can reliably safeguard operations. For web servers in particular, I reduce the risks of hardware defects, ransomware or accidental deletion to a calculable minimum.
rsync briefly explained: Functionality and strengths
rsync only transfers changes, uses an efficient delta-transfer algorithm and avoids unnecessary copies. This significantly reduces runtimes, network load and I/O on the target system. I work in archive mode (-a) so that permissions, times, owners and symlinks remain consistent. I use --delete to keep mirrors up to date, but pay attention to the intended use and combine it with separate directories for versioning. I use SSH for transport, direct paths for local jobs, and add compression (-z) and a bandwidth limit (--bwlimit) if required.
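A minimal sketch of how these options combine for a remote run, using the host and paths that appear later in this article:
# Delta transfer over SSH with compression and a bandwidth cap (values are placeholders)
rsync -az --bwlimit=20000 -e "ssh -i /root/.ssh/backup_key" \
  /data/www/ backup@backuphost:/srv/backups/www/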
Automation with Cron: step by step
I start with a simple script, because a clear baseline can be expanded later. First I install rsync if it is missing, and create a secure working directory for logs and status files. Then I write a script with sources, a target and sensible options including exclusions. The cron job runs daily or hourly depending on the RPO and writes log files for evaluation and alerting. A dry run (-n) before the first productive run prevents unwanted deletions.
# Installation (Debian/Ubuntu)
sudo apt-get install rsync
# Minimal run locally
rsync -a /data/www/ /backup/www/
# Remote mirror via SSH with deletions
rsync -a --delete -e "ssh -i /root/.ssh/backup_key" /data/www/ backup@backuphost:/srv/backups/www/
# Cron: daily at 02:00 a.m.
0 2 * * * /usr/local/bin/rsync-backup.sh >> /var/log/rsync-backup.log 2>&1
Set up SSH backups securely
I use SSH keys with limited rights because careful key management reduces the attack surface. On the target system, I limit commands via authorized_keys and use a separate backup user. Fail2ban, firewall rules and restrictive SSH options (e.g. PasswordAuthentication no) increase security. I pin the host key so that man-in-the-middle attacks have no chance. If you are looking for a structured start, you can find tried and tested ideas at Automate backups.
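A brief sketch of the key setup, assuming the backup user and key path from the examples in this article:
# Dedicated key pair for unattended backup runs
ssh-keygen -t ed25519 -f /root/.ssh/backup_key -C "rsync-backup"
# Install the public key for the dedicated backup user on the target
ssh-copy-id -i /root/.ssh/backup_key.pub backup@backuphost
# Hardening on the target in /etc/ssh/sshd_config (then reload sshd):
#   PasswordAuthentication no
#   PermitRootLogin no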
Pull instead of push: security benefits in practice
Where possible, I let the backup server pull the data instead of having the production server push it. This leaves production systems without outgoing keys, and a compromised web server cannot delete offsite backups. On the target, I restrict the key in authorized_keys with restrictive options and a forced command.
# Example: /home/backup/.ssh/authorized_keys on the backup target
# (the restrictive options are prepended to the backup user's public key)
from="10.0.0.10",no-agent-forwarding,no-pty,no-port-forwarding,no-X11-forwarding,command="/usr/local/bin/rsync-serve-backups" ssh-ed25519 AAAA... backup@web01
The called script only allows rsync server calls and sets path limits. This is how I achieve a principle of least privilege without complicating operation.
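A minimal sketch of such a wrapper, assuming the script name /usr/local/bin/rsync-serve-backups from the authorized_keys example and the /srv/backups/ prefix used above; the rrsync helper distributed with rsync covers the same idea more robustly:
#!/usr/bin/env bash
# Sketch: only permit rsync in server mode and only below /srv/backups/
set -euo pipefail
case "${SSH_ORIGINAL_COMMAND:-}" in
  "rsync --server"*" /srv/backups/"*)
    exec $SSH_ORIGINAL_COMMAND   # intentional word splitting into command + args
    ;;
  *)
    echo "Rejected: ${SSH_ORIGINAL_COMMAND:-<empty>}" >&2
    exit 1
    ;;
esac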
Versioning and storage with hard links
For multiple restore points I build daily, weekly and monthly folders with --link-dest, because hard links save storage and simplify restores. Each generation references identical, unchanged files from the previous backup; new or changed files are physically stored in the new folder. This allows me to achieve many restore points with moderate storage requirements. I remove old generations with a simple rotation script without risking data consistency. A fixed schedule (e.g. 7 days, 4 weeks, 6 months) keeps storage planning clear and transparent.
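A sketch of a dated generation with --link-dest, assuming a target layout of /backup/www/YYYY-MM-DD plus a latest symlink; names are placeholders:
# New dated generation; unchanged files are hard-linked from the previous run
TODAY=$(date +%F)
rsync -a --delete --link-dest=/backup/www/latest /data/www/ /backup/www/$TODAY/
# Point "latest" at the newest generation for the next run
ln -sfn /backup/www/$TODAY /backup/www/latest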
Control resources: Bandwidth, CPU and I/O
I limit the data throughput with --bwlimit so that the production load remains stable and users do not notice any degradation. I use nice and ionice to reduce the priority of the backup processes. I switch on compression (-z) on slow networks and leave it off on fast local media. For large files, I select --partial to be able to resume interrupted transfers. For local mirrors, I often use --whole-file because the delta algorithm has no advantage there.
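One possible combination of these throttles, reusing the paths from the earlier examples; the limits are placeholders to tune per environment:
# Run rsync with lowered CPU/IO priority and a bandwidth cap of about 20 MB/s
nice -n 10 ionice -c2 -n7 \
  rsync -a --partial --bwlimit=20000 \
  -e "ssh -i /root/.ssh/backup_key" \
  /data/www/ backup@backuphost:/srv/backups/www/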
Consistent data statuses: snapshots and databases
To maintain consistent backups despite open files, I use snapshots or application hooks. File systems such as LVM, ZFS or Btrfs allow fast snapshots, which I use as a source for rsync. This lets me logically freeze the data state without blocking services for long.
# Example: LVM snapshot as consistent source
lvcreate -L 10G -s -n data_snap /dev/vg0/data
mkdir -p /mnt/data_snap
mount /dev/vg0/data_snap /mnt/data_snap
rsync -a --delete /mnt/data_snap/www/ backup@host:/srv/backups/www/
umount /mnt/data_snap
lvremove -f /dev/vg0/data_snap
For databases I separate logic and files. I back up MySQL/MariaDB via dump or Percona XtraBackup, PostgreSQL with pg_dump or base backups. The sequence is important: first create a consistent dump, then transfer the dump path via rsync. This prevents half-written files in the backup.
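A short sketch of that sequence for MySQL/MariaDB, assuming credentials in /root/.my.cnf and a hypothetical dump directory /var/backups/mysql:
# 1) Create a consistent dump first (--single-transaction avoids long locks on InnoDB)
mysqldump --single-transaction --all-databases | gzip > /var/backups/mysql/all-$(date +%F).sql.gz
# 2) Only then transfer the finished dump directory offsite
rsync -a -e "ssh -i /root/.ssh/backup_key" /var/backups/mysql/ backup@backuphost:/srv/backups/mysql/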
Typical sources of error and how to avoid them
The most common stumbling block is the trailing slash on a path, so I double-check path specifications: /src/ vs. /src. I test exclusions with --dry-run and --itemize-changes to see their effect. I quote patterns with spaces correctly and keep the exclude file in the repository. Before using --delete I check mounts, because an unmounted target can lead to unwanted deletions. Finally, I log return codes and activate alerts so that I see errors immediately.
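A quick way to check both the trailing slash and the exclusions before the first productive run, using the placeholder paths from above:
# /data/www/ copies the contents of www; /data/www creates a www subdirectory in the target
rsync -an --itemize-changes /data/www/ /backup/www/
rsync -an --itemize-changes /data/www /backup/www/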
Backup strategy: GFS and recovery targets
I set RPO/RTO first, because clear goals guide every decision on frequency, storage location and retention. A common scheme is GFS: daily differentials, weekly fulls or merged generations via hard links, monthly long-term copies. Compliance requirements influence the retention period, so I separate short-lived operational data from long-term archives. For critical systems, I plan offsite backups in addition to local copies. This combination protects against site failures and enables both fast and remote restores.
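A deliberately simple pruning sketch for the dated daily generations from the --link-dest example above; weekly and monthly copies would live in their own folders and follow the same pattern, and the paths are placeholders:
# Remove daily generations older than 7 days; files still hard-linked from
# newer generations survive the deletion
find /backup/www/ -maxdepth 1 -mindepth 1 -type d -mtime +7 -exec rm -rf {} +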
Cron or systemd-timer: Reliable planning
Cron is simple and robust. For hosts that occasionally sleep or restart, I also use systemd timers with dependencies and catch-up of missed runs. This ensures that no run is lost after a reboot and that the ordering is correct (e.g. only after the network is back up).
# /etc/systemd/system/rsync-backup.service
[Unit]
Description=Rsync Backup
After=network-online.target
Wants=network-online.target
[Service]
Type=oneshot
ExecStart=/usr/local/bin/rsync-backup.sh
Nice=10
IOSchedulingClass=best-effort
IOSchedulingPriority=7
# /etc/systemd/system/rsync-backup.timer
[Unit]
Description=Daily Rsync backup timer
[Timer]
OnCalendar=02:00
Persistent=true
[Install]
WantedBy=timers.target
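To activate both units, the standard systemd commands are enough:
systemctl daemon-reload
systemctl enable --now rsync-backup.timer
systemctl list-timers rsync-backup.timer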
Table: Important rsync options in everyday life
I use a few but effective options, which I document for each job and version in the repo. Archive mode forms the basis and reduces configuration errors. I keep mirrors clean with --delete, but only use it with a correct target check. For versioning, I combine --link-dest with a rotation plan. The following table shows the most important switches and their use.
| Option | Purpose | Note |
|---|---|---|
| -a | Archive mode | Preserves permissions, times and ownership for consistency |
| -z | Compression | Useful over WAN, often omitted locally |
| --delete | Removes files deleted at the source | Only use for mirrors, dry run beforehand |
| --bwlimit=KBPS | Throttle bandwidth | Protects production systems from load peaks |
| --link-dest=DIR | Versioning via hard links | Space-saving generations in separate folders |
| -e "ssh ..." | Secure transport | Keys, host keys, restrictive users |
| -n / --dry-run | Test run without changes | Shows planned actions in advance |
Test recovery: Restore exercises
I regularly test restores, because a backup without a tested restore is just an illusion. For spot checks, I restore individual files and entire webroots into staging environments. I back up databases separately via dump and import them on a test basis to check consistency. Checksums help me confirm integrity and detect silent bit errors. After each test, I update the documentation, scripts and playbooks so that the next emergency runs faster.
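One simple way to let checksums confirm integrity, reusing the local mirror paths from earlier; any output line marks a difference:
# Dry run with forced checksum comparison between source and backup
rsync -anc --itemize-changes /data/www/ /backup/www/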
Bare metal and system backups: special features
For system or bare-metal backups, I extend the rsync options to include ACLs, xattrs and hard links. On Linux, -aAXH and --numeric-ids have proven their worth. I exclude pseudo file systems such as /proc, /sys, /run and /dev/pts, and I keep boot and configuration files documented and backed up separately.
# System backup (example)
rsync -aAXH --numeric-ids \
--exclude={"/proc/*","/sys/*","/run/*","/dev/pts/*","/tmp/*"} \
/ root@backup:/srv/backups/hostA/current/
# Restore (simplified, from live media)
rsync -aAXH --numeric-ids /srv/backups/hostA/current/ /mnt/target/
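# Assumption: restoring from live media; bind-mount the pseudo filesystems into
# the target before chroot so that initramfs and GRUB tooling can work
for d in dev proc sys; do mount --bind /$d /mnt/target/$d; done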
chroot /mnt/target update-initramfs -u
grub-install /dev/sda && update-grub
I plan more time for such restores, document the sequence and keep bootloader steps to hand. This significantly reduces stress in an emergency.
Plesk integration: cleverly combining panel jobs
I combine Plesk tasks with rsync so that panel backups and offsite copies work together. Post-backup hooks transfer fresh backups to external storage immediately. Schedules coincide with maintenance windows so that deployments and backups do not interfere with each other. Log rotation in the panel and on the target system prevents log directories from growing unchecked. A good starting point is provided by Plesk Best Practices with a focus on repeatable processes.
cPanel integration: Homedirs and databases
I then pull cPanel backups to an external host via rsync so that offsite protection is in place without additional load. The directory structure facilitates selective restores of individual accounts. For large reseller setups, I schedule differential runs overnight and full backups at the weekend. In combination with quotas and rotations, I keep storage requirements transparent. A practical addition is the information on cPanel backups for solid day-to-day operations.
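A hedged sketch of such an offsite copy, assuming cPanel's default backup directory /backup and the backup host from the earlier examples; adjust paths to the actual configuration:
# Mirror the local cPanel backup directory to the offsite host
rsync -a --delete -e "ssh -i /root/.ssh/backup_key" \
  /backup/ backup@backuphost:/srv/backups/cpanel/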
Scaling and structure: manage many jobs cleanly
When environments grow, I structure sources and excludes centrally. With --files-from I pass lists that I version in the repo. This way I change the backup set without touching scripts and keep locations consistent.
# files-from example: contents of /etc/backup/www.list
/data/www/
/etc/nginx/
/var/www/html/
# Run using the list; note that with --files-from, -a does not imply recursion,
# so -r has to be given explicitly
rsync -a -r --delete --files-from=/etc/backup/www.list / backup@host:/srv/backups/www/
I avoid overlaps by clearly separating path responsibility (e.g. the webroot separate from logs). For large sets, I plan parallelism deliberately: a few well-timed jobs rather than dozens of competing processes.
Robustness in operation: locking, retries, timeouts
To avoid overlapping runs, I use flock or lock files. I absorb network problems with retries and --partial. With --timeout I terminate hanging connections cleanly and raise an alert.
# /usr/local/bin/rsync-backup.sh (excerpt)
#!/usr/bin/env bash
set -euo pipefail
exec 9> /var/lock/rsync-backup.lock
flock -n 9 || { echo "Backup already running"; exit 1; }
LOG=/var/log/rsync-backup.log
SRC=/data/www/
DST=backup@backuphost:/srv/backups/www/
for i in {1..3}; do
if rsync -a --delete --partial --timeout=600 -e "ssh -i /root/.ssh/backup_key" "$SRC" "$DST"; then
echo "OK"; exit 0
fi
echo "Retry $i" | tee -a "$LOG"
sleep 60
done
echo "Error after retries" >> "$LOG"; exit 1
Options for special cases: ACLs, xattrs, sparse and atomicity
I adapt rsync depending on the type of data. For web and system paths, I back up ACLs and xattrs with -A and -X. Large, sparsely populated files (VM images, databases) benefit from --sparse. With --delay-updates and --delete-delay I minimize intermediate states and achieve quasi-atomic updates on the target. For sensitive data, I avoid --inplace so that a failed transfer does not overwrite the last good copy.
# Example for extensive metadata
rsync -aAXH --sparse --delete-delay --delay-updates SRC/ DST/
If I need absolutely atomic directories (e.g. for staging), I synchronize to a temporary target and then use mv to switch it into place as the live directory.
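A minimal sketch of that switch, with hypothetical staging paths:
# Sync into a temporary directory first, then swap it in with mv
rsync -a /data/www/ /srv/stage/www.tmp/
mv /srv/stage/www /srv/stage/www.old 2>/dev/null || true
mv /srv/stage/www.tmp /srv/stage/www
rm -rf /srv/stage/www.old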
Secure deletion limits and plausibility checks
To prevent disasters caused by misconfiguration, I set --max-delete as a guardrail. I also check mounts, free disk space and inodes before the run. I log the last successful backup and warn about outliers (extreme deletion or modification rates).
# Protection against mass deletion
rsync -a --delete --max-delete=5000 SRC/ DST/
# Plausibility check (simple)
df -h /srv/backups
df -i /srv/backups
Briefly summarized: This is how I proceed
I define RPO/RTO first, because clear priorities guide every technical choice. I then write a lean script, test with --dry-run and log every execution. With SSH keys, bandwidth limits and versioning, I back up efficiently and traceably. Offsite destinations supplement local copies, and regular restore exercises confirm that everything works. This way my rsync backup remains reliable, fast and always ready for use.


