Working with remote files as if they're local: SSHFS
The problem
A common task for digital nomads and research scientists alike is accessing often huge, heterogeneous datasets on remote systems. There are plenty of powerful tools that synchronize data between any two computers, but they are tailored to copying changes back and forth, with a physical copy present on both systems. Often you do not need the entire dataset, or even a sizeable part of it: you just want to browse, load, view, make small edits, and save. If your data is stored on compute clusters, SSHFS can be very useful for this.
The solution
SSHFS is a FUSE (Filesystem in Userspace) filesystem: it takes a mountpoint, a local directory, and translates any file operations on it into remote operations over SSH, while to the local operating system (and to you) the files appear to be local. SSHFS falls into the category of network filesystems; another example is NFS, but that requires an NFS server on the remote side. SSHFS, on the other hand, works wherever the remote machine runs an SSH server with SFTP support, which almost all clusters do, as SSH is the standard way to securely authenticate to and access them.
Dependencies
You need SSHFS, which in turn relies on FUSE, on your local machine.
On Fedora, install the fuse-sshfs package:
sudo dnf install fuse-sshfs
Next, we create a mountpoint, essentially an empty directory that will be our access point to the remote filesystem:
mkdir -p /home/$USER/remotefiles
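Mounting on top of a non-empty directory hides its local contents for as long as the mount is active (older FUSE versions refuse to do it by default), so it can be worth checking the mountpoint first. The following is a small sketch assuming bash; prepare_mountpoint is a hypothetical helper name, not part of SSHFS. As a side effect, it also refuses a directory that is already an active SSHFS mount, because listing it shows the remote files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create the mountpoint if needed and verify it is empty.
prepare_mountpoint() {
  local dir="$1"
  # -p: no error if it already exists, create parent directories as needed
  mkdir -p "$dir" || return 1
  # ls -A lists everything except . and ..; any output means "not empty"
  if [ -n "$(ls -A "$dir")" ]; then
    echo "refusing to use non-empty directory: $dir" >&2
    return 1
  fi
}
```

For example, run prepare_mountpoint /home/$USER/remotefiles and only call sshfs if it succeeds.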
Suppose you have access to a remote system called remote.com, with a user account remoteme and a file accounts.txt. On the remote system:
ls /home/remoteme/myfiles
accounts.txt
Next, we establish the connection:
sshfs remoteme@remote.com:/home/remoteme/myfiles /home/$USER/remotefiles
Here remoteme is your remote user name. Before mounting, set up key-based SSH access to the remote system, optionally with a host entry in ~/.ssh/config.
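Key-based access is a one-time setup: ssh-keygen -t ed25519 creates a key pair, and ssh-copy-id remoteme@remote.com installs the public key on the server. The helper below then appends a matching Host entry to ~/.ssh/config, skipping it if the alias already exists. It is a sketch assuming an OpenSSH client and bash; add_ssh_host and the alias remote are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a Host entry for key-based login to an ssh
# config file, unless an entry with the same alias already exists.
# Args: alias, hostname, remote user, private key path, [config file]
add_ssh_host() {
  local host_alias="$1" host="$2" user="$3" key="$4"
  local config="${5:-$HOME/.ssh/config}"
  # -m 700 only takes effect if the directory is created here
  mkdir -p -m 700 "$(dirname "$config")" || return 1
  # ssh refuses config files that others can write to
  touch "$config" && chmod 600 "$config" || return 1
  # -x: whole-line match, -F: treat the pattern as a fixed string
  if grep -qxF "Host $host_alias" "$config"; then
    echo "Host $host_alias already present in $config" >&2
    return 0
  fi
  printf '\nHost %s\n    HostName %s\n    User %s\n    IdentityFile %s\n' \
    "$host_alias" "$host" "$user" "$key" >> "$config"
}
```

Usage: add_ssh_host remote remote.com remoteme '~/.ssh/id_ed25519'. Because sshfs runs ssh under the hood, the alias then also works for mounting, e.g. sshfs remote:/home/remoteme/myfiles /home/$USER/remotefiles.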
Now, listing the mountpoint locally:
ls /home/$USER/remotefiles
accounts.txt
You can now do anything you'd like with this file, and your changes are written back to the remote system.
So far, this isn't that different from using, say, rsync or another copy-based synchronization tool.
However, if your remote directory holds a million files and 20 PB of datasets, and you simply want to browse files or quickly view some images, the advantages are clear.
SSHFS only transfers what you actually access, whereas sync tools typically copy everything (there are exceptions).
Advanced usage
Caching for low latency
Because your file operations are now translated to network operations, slow networks can cause high latency. If you know you are the only one accessing these files, you can let SSHFS cache file attributes and directory listings for longer, and let the kernel keep file contents in its page cache, instead of asking the server on every access. This can dramatically speed up file operations, at the risk of seeing stale data if the files do change on the remote side. Enabling this is straightforward; add the options:
-o cache=yes -o cache_timeout=$CACHE -o kernel_cache
Here $CACHE is the cache timeout in seconds, e.g. 300 for 5 minutes.
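As a sketch, the cache options can be wrapped in a small function that also checks that the timeout really is a whole number of seconds. mount_cached and the SSHFS_BIN override (set SSHFS_BIN=echo to print the command instead of running it) are hypothetical names chosen for illustration, assuming bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mount with SSHFS caching plus the kernel page cache.
# Only sensible if nobody else modifies the remote files while mounted.
# Args: remote (user@host:/path), local mountpoint, cache timeout in seconds.
# SSHFS_BIN is an illustrative override, e.g. SSHFS_BIN=echo for a dry run.
mount_cached() {
  local remote="$1" mnt="$2" cache="$3"
  # Reject empty values and anything containing a non-digit (e.g. "5m")
  case "$cache" in
    ''|*[!0-9]*)
      echo "cache timeout must be a whole number of seconds, got: '$cache'" >&2
      return 2
      ;;
  esac
  "${SSHFS_BIN:-sshfs}" -o cache=yes -o "cache_timeout=$cache" -o kernel_cache \
    "$remote" "$mnt"
}
```

Usage: mount_cached remoteme@remote.com:/home/remoteme/myfiles /home/$USER/remotefiles 300.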
Robustness with reconnecting
If your network connection drops, the link to the remote server breaks, and you have to unmount and reissue the command. Instead, you can tell SSHFS to reconnect automatically by adding:
-o reconnect
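Reconnecting only helps once ssh notices that the connection is gone, which can take a long time on a silently dropped link. A common pairing, sketched below, is to also pass ssh's keepalive options through sshfs, which accepts ssh_config options via -o: with ServerAliveInterval=15 and ServerAliveCountMax=3, ssh gives up after roughly 45 seconds without a reply, after which SSHFS can reconnect. The values, the function name mount_resilient, and the SSHFS_BIN dry-run override are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: mount with automatic reconnection plus ssh keepalives,
# so a dead link is detected (~15 s x 3 probes) and SSHFS can reconnect.
# SSHFS_BIN is an illustrative override, e.g. SSHFS_BIN=echo for a dry run.
mount_resilient() {
  local remote="$1" mnt="$2"
  "${SSHFS_BIN:-sshfs}" -o reconnect \
    -o ServerAliveInterval=15 -o ServerAliveCountMax=3 \
    "$remote" "$mnt"
}
```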
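Symlinks
Directories on Linux systems, especially on clusters, are quite often symbolic links (symlinks). By default SSHFS shows a symlink as a link rather than following it, so links that use absolute paths or point outside the mounted directory may appear broken locally. To have the server resolve them instead, add: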
-o follow_symlinks
Compression
If your data is compressible, e.g. text files or uncompressed images (TIFF), you can reduce bandwidth use, and often speed up transfers over slow links, at the cost of CPU time, by enabling on-the-fly SSH compression:
-C
Debugging
If you want to check if a directory is still mounted:
mount | grep sshfs
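The same check can be made scriptable. The sketch below reads /proc/mounts (Linux only), where SSHFS mounts show up with the filesystem type fuse.sshfs; list_sshfs_mounts and is_sshfs_mounted are hypothetical helper names, and the optional file argument exists only to make testing easier. /proc/mounts encodes spaces in paths as \040, so paths containing spaces would need extra handling.

```shell
#!/usr/bin/env bash
# Hypothetical helpers built on /proc/mounts (Linux only). Each line there is
# "source mountpoint fstype options dump pass"; SSHFS mounts have fstype
# fuse.sshfs. The optional last argument overrides the file (handy for tests).

# Print the mountpoint of every active SSHFS mount, one per line.
list_sshfs_mounts() {
  awk '$3 == "fuse.sshfs" { print $2 }' "${1:-/proc/mounts}"
}

# Exit status 0 if the given directory is an active SSHFS mountpoint.
is_sshfs_mounted() {
  local dir="${1%/}"   # /proc/mounts lists paths without a trailing slash
  awk -v d="$dir" '$3 == "fuse.sshfs" && $2 == d { found = 1 }
                   END { exit !found }' "${2:-/proc/mounts}"
}
```

For example, is_sshfs_mounted /home/$USER/remotefiles || echo "not mounted". On systems with util-linux, findmnt -t fuse.sshfs prints a similar overview.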
Bringing it all together
A script that does all of the above can be found here. For example, to mount:
./remotemount.sh remote.server.com:/home/$REMOTEUSER /home/$USER/mountpoint 240
and to unmount:
./remotemount.sh /home/$USER/mountpoint
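The script itself is not reproduced here. To give an idea of what such a helper might contain, below is a hypothetical sketch following the same calling convention: three arguments mount (assuming the third is the cache timeout in seconds), one argument unmounts. It combines the options from the previous sections; the SSHFS_BIN dry-run override is an illustrative hook, not part of SSHFS.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a remotemount.sh helper (not the original script).
#   remotemount.sh REMOTE MOUNTPOINT CACHE_SECONDS   mount REMOTE (user@host:/dir)
#   remotemount.sh MOUNTPOINT                        unmount
# SSHFS_BIN is an illustrative override, e.g. SSHFS_BIN=echo for a dry run.

remotemount() {
  case "$#" in
    3)
      case "$3" in
        ''|*[!0-9]*)
          echo "CACHE_SECONDS must be a whole number, got: '$3'" >&2
          return 2
          ;;
      esac
      mkdir -p "$2" || return 1
      # Compression, auto-reconnect, server-side symlinks, caching
      "${SSHFS_BIN:-sshfs}" -C -o reconnect -o follow_symlinks \
        -o cache=yes -o "cache_timeout=$3" -o kernel_cache \
        "$1" "$2"
      ;;
    1)
      # FUSE 3 provides fusermount3; FUSE 2 systems only have fusermount
      if command -v fusermount3 >/dev/null 2>&1; then
        fusermount3 -u "$1"
      else
        fusermount -u "$1"
      fi
      ;;
    *)
      echo "usage: remotemount.sh REMOTE MOUNTPOINT CACHE_SECONDS" >&2
      echo "       remotemount.sh MOUNTPOINT" >&2
      return 2
      ;;
  esac
}

# Only act when arguments are given; without them the file just defines
# the function (so it can also be sourced).
if [ "$#" -gt 0 ]; then
  remotemount "$@"
fi
```

Unmounting goes through fusermount3 -u (or fusermount -u on FUSE 2 systems), which works as a regular user, so neither direction needs root.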
Conclusion
SSHFS offers an elegant, lightweight way to access files interactively on a remote machine, relying solely on SSH. Apart from installing SSHFS and FUSE on your local machine, you do not need root access: mounting and unmounting work as a regular user.
Alternatives & Resources
- Arch Wiki: the always excellent Arch Linux documentation has its own page on SSHFS
- Red Hat docs
- SiriKali: GUI front end for encrypted filesystems and SSHFS
- Globus: data transfer service with GridFTP support; requires an account
- SSHFS for Windows
- rsync: fast, incremental file transfer
- NFS: Network File System