How to diff directories over ssh
Efficiently Diff Directories Over SSH

Learn how to compare the contents of two directories on a remote server, or between local and remote, using SSH and common Linux utilities.
Comparing directories is a common task for developers and system administrators. Whether you're verifying deployments, checking for configuration drift, or simply synchronizing files, knowing how to perform a directory diff remotely over SSH is invaluable. This article will guide you through various methods, from simple diff commands to more robust rsync techniques, ensuring you can effectively manage your remote file systems.
Understanding the Challenge: Remote File Comparison
Directly comparing directories on a remote server or between a local machine and a remote server presents a challenge because diff typically operates on local files. To overcome this, we leverage SSH to execute commands on the remote host or to securely transfer file listings for comparison. The primary goal is to identify differences in file names, sizes, modification times, and content without necessarily downloading all files.
flowchart TD
A[Local Machine] -->|SSH Connection| B[Remote Server]
B --> C{Directory A}
B --> D{Directory B}
C -- Compare --> E[Differences]
D -- Compare --> E
E -->|Report| A
Conceptual flow of comparing remote directories via SSH.
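When both directories live on the same remote host, the simplest form of "execute commands on the remote host" is to run diff there and send back only its report. The following is a minimal sketch under stated assumptions: remote_diff and SSH_BIN are hypothetical names introduced for illustration, and the paths are assumed to contain no single quotes.

```shell
#!/usr/bin/env bash
# Sketch: compare two directories that both live on one remote host.
# remote_diff and SSH_BIN are illustrative names, not part of any tool.
SSH_BIN="${SSH_BIN:-ssh}"

remote_diff() {
    local host=$1 dir_a=$2 dir_b=$3
    # diff runs on the remote side, so only its report crosses the network.
    # -r recurses into subdirectories; -q reports which files differ, not how.
    # Single quotes protect spaces in paths (assumes no single quotes in them).
    "$SSH_BIN" "$host" "diff -rq '$dir_a' '$dir_b'"
}
```

A call such as remote_diff user@remote_host /path/to/dir1 /path/to/dir2 returns diff's own exit status (0 for identical trees, 1 for differences), while ssh itself returns 255 if the connection fails.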
Method 1: Using diff with SSH and tar
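One way to compare two directories on a remote server is to stream each directory as a tar archive over SSH, extract both locally into temporary directories, and then use diff on the extracted contents. This method is particularly useful when you need a detailed content comparison. Archiving with -C keeps the member paths relative, so both trees extract with the same layout and diff -r can pair up matching files; add -z to both tar invocations if you want the stream compressed.
# tar -x needs the target directories to exist
mkdir -p /tmp/dir1_local /tmp/dir2_local
# -C archives paths relative to each directory so the two trees line up
ssh user@remote_host 'tar -C /path/to/dir1 -cf - .' | tar -xf - -C /tmp/dir1_local
ssh user@remote_host 'tar -C /path/to/dir2 -cf - .' | tar -xf - -C /tmp/dir2_local
diff -r /tmp/dir1_local /tmp/dir2_local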
Comparing two remote directories by streaming tar archives locally.
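The same tar-over-SSH idea can be wrapped so that the stream is gzip-compressed and the scratch directories are cleaned up afterwards. This is a sketch rather than a definitive script: fetch_dir, compare_remote_dirs, and SSH_BIN are hypothetical names, and the paths are assumed to contain no single quotes.

```shell
#!/usr/bin/env bash
# Sketch: stream two remote directories as compressed tar archives,
# diff the extracted copies, then remove the scratch space.
# fetch_dir, compare_remote_dirs and SSH_BIN are illustrative names.
SSH_BIN="${SSH_BIN:-ssh}"

fetch_dir() {
    local host=$1 remote_dir=$2 dest=$3
    mkdir -p "$dest"
    # -C makes archive members relative ("./file"); -z gzips the stream.
    "$SSH_BIN" "$host" "tar -C '$remote_dir' -czf - ." |
        tar -xzf - -C "$dest"
}

compare_remote_dirs() {
    local host=$1 dir_a=$2 dir_b=$3 scratch status=0
    scratch=$(mktemp -d) || return 2
    # Stop at the first failing step and remember its exit status.
    fetch_dir "$host" "$dir_a" "$scratch/a" &&
        fetch_dir "$host" "$dir_b" "$scratch/b" &&
        diff -r "$scratch/a" "$scratch/b" || status=$?
    rm -rf "$scratch"
    return "$status"
}
```

Calling compare_remote_dirs user@remote_host /path/to/dir1 /path/to/dir2 prints the diff -r report and returns 0 when the trees match and 1 when they differ. The paths in the report point into the scratch directory, which is already gone by the time you read them.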
tar archives can consume significant bandwidth. Consider using rsync for more efficient comparisons, especially if you only need to check for file existence or modification times.
Method 2: Leveraging rsync for Efficient Comparison
rsync is a powerful utility for synchronizing files and directories, but it also has excellent capabilities for comparing them. By using the --dry-run (-n) and --itemize-changes (-i) flags, rsync can show you exactly what would change without actually performing any transfers. This is often the most efficient method for checking differences between a local and remote directory.
rsync -avn --delete /path/to/local/dir/ user@remote_host:/path/to/remote/dir/
Using rsync in dry-run mode to compare local and remote directories.
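The output of rsync -avn lists the files that would be transferred because they are missing on the destination or differ from the source. By default rsync decides that a file differs by comparing size and modification time, not content; add -c (--checksum) to compare checksums instead, at the cost of reading every file on both sides. The --delete flag is needed to also list files that exist on the destination but not on the source, which show up as deletions; in the command above, those are files present remotely but missing locally. The -i (--itemize-changes) flag adds a per-file code string showing why a file is considered different (e.g., size, modification time, permissions).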
rsync -avni --delete /path/to/local/dir/ user@remote_host:/path/to/remote/dir/
Detailed rsync dry-run output with itemized changes.
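The itemized lines can be turned into a short report with a small filter. The sketch below assumes the rsync 3.x format described in the rsync man page, where each line starts with an update code string (for example <f for a file sent to the remote side, with all-plus attributes for a file that is new there) and deletions are marked *deleting. The function name summarize_itemized is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: classify lines of `rsync -ni --delete` output read from stdin.
# summarize_itemized is an illustrative name; directory entries, header
# lines and summary lines are skipped.
summarize_itemized() {
    awk '
        {
            flags = $1                  # update codes, e.g. <f.st......
            name = $0
            sub(/^[^ ]+ +/, "", name)   # rest of the line is the path
        }
        flags == "*deleting"      { print "only on receiver: " name; next }
        flags ~ /^[<>]f[+]+$/     { print "only on sender: " name; next }
        flags ~ /^[<>]f/          { print "differs (" flags "): " name }
    '
}
```

It is meant to sit at the end of a pipeline, for example rsync -ani --delete /path/to/local/dir/ user@remote_host:/path/to/remote/dir/ | summarize_itemized. File names that begin with a space are not handled by this simple field split.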
The trailing slash on the source path (/path/to/local/dir/) is important with rsync. Without it, rsync would copy the directory itself into the destination, rather than its contents.
Method 3: Comparing Directory Listings
For a quick check of file names and basic attributes (like size or modification time), you can compare the output of ls -lR from both directories. This method is less resource-intensive than content-based diffs but won't tell you about content differences within files.
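diff <(cd /path/to/local/dir && ls -lR) <(ssh user@remote_host 'cd /path/to/remote/dir && ls -lR')
Comparing ls -lR output between local and remote directories.
This command uses process substitution (<(...)), a feature of bash, zsh, and ksh rather than plain POSIX sh, to feed the output of ls -lR from the local directory and from the remote directory (executed via SSH) directly into the diff command. Changing into each directory first keeps the subdirectory headers that ls -lR prints relative (./sub:), so they match on both sides even when the two base paths differ. The result is a line-by-line comparison of the directory listings.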
ls -lR output is sensitive to differences in file permissions, ownership, and timestamps, even if the file content is identical. Use this method when you need to identify such metadata discrepancies.
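If metadata noise is unwanted and you care only about content, one alternative to ls -lR is to compare checksum manifests, which still avoids downloading the files. This swaps in a different listing technique from the one above. It is a sketch that assumes GNU find and sha256sum are available on both machines; MANIFEST_CMD, local_manifest, remote_manifest, compare_manifests, and SSH_BIN are hypothetical names, and the remote path is assumed to contain no single quotes.

```shell
#!/usr/bin/env bash
# Sketch: compare local and remote trees by per-file SHA-256 checksums.
# All function and variable names here are illustrative.
SSH_BIN="${SSH_BIN:-ssh}"

# Prints "<sha256>  ./relative/path" per regular file, sorted by path.
# LC_ALL=C keeps the sort order identical on both hosts.
MANIFEST_CMD='find . -type f -exec sha256sum {} + | LC_ALL=C sort -k2'

local_manifest() {
    (cd "$1" && sh -c "$MANIFEST_CMD")
}

remote_manifest() {
    local host=$1 dir=$2
    "$SSH_BIN" "$host" "cd '$dir' && $MANIFEST_CMD"
}

compare_manifests() {
    local local_dir=$1 host=$2 remote_dir=$3
    # Only the manifests cross the network, not the file contents.
    diff <(local_manifest "$local_dir") <(remote_manifest "$host" "$remote_dir")
}
```

For example, compare_manifests /path/to/local/dir user@remote_host /path/to/remote/dir prints one line per file whose checksum or presence differs and ignores permissions, ownership, and timestamps. Both sides must read every file to hash it, so this costs disk I/O even though it saves bandwidth.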