Frequently Use{d|ful} Linux System Administration Commands
This page collects shell commands I tend to use a lot, albeit infrequently. They were scattered across notes, history files on different systems, post-its, and Slack messages to myself. Keeping them in one place seems better organized, and at the same time I can review them for sanity. Because I work with RHEL based systems, most of these assume systemd and dnf in some form or other.
** These are not by any means safe for copy-paste usage, these are system administration commands that can ruin your system, or worse, a production system **
Git (to be moved to sep page) #
Change the last commit’s email (used for privacy at github)
git commit --amend --author="XY <email>"
Tmux #
Enable mouse
#~/.tmux.conf
set -g mouse on
Package management #
Install security updates without touching kernel #
sudo dnf update --security --exclude="kernel*"
Motivation: updates invariably break things, and updating the kernel means a recompilation of graphics drivers, which are still snowflakes in terms of stability, in my experience at least. Similarly, while firmware updates on Linux have been much improved with the advent of fwupd, my last ‘just how recent is my backup’ panic was induced by a firmware upgrade that touched UEFI and removed the bootloader entries. Restoring that from an encrypted disk is less than amusing.
Fixing CVEs is critical though, so this updates any program with a CVE, but otherwise keeps the system untouched.
Device management #
Show a tree of USB devices. The tree format is useful to see which device controls others.
lsusb -tv
Pick out a specific device, with full verbose details
lsusb -v -s <bus>:<dev>
where bus and dev are the bus and device numbers shown in the tree listing.
Full listing of all hardware
<sudo> lshw
Get RAM configuration (aka ‘how many free slots are there left?’)
sudo dmidecode -t 17 | more
Reads the DMI information of RAM sticks and parses it into human format.
For example to know how many slots you have
sudo dmidecode -t 16 | grep Devices | wc -l
Finding the nr of populated RAM slots
sudo dmidecode -t 17 | grep "Size:" | grep -vc "No Module"
Maximum size your board supports
sudo dmidecode -t 16 | grep Maximum
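The same counting logic, shown offline on a captured `-t 17` fragment (the sample below is illustrative, not from a real board):

```shell
# Illustrative dmidecode -t 17 fragment: three slots, one empty.
sample='Memory Device
    Size: 16 GB
Memory Device
    Size: No Module Installed
Memory Device
    Size: 16 GB'
echo "$sample" | grep -c "^Memory Device"              # total slots → 3
echo "$sample" | grep "Size:" | grep -vc "No Module"   # populated  → 2
```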
GPU #
GPUs are ubiquitous in HPC, but getting them to work can be non-trivial. The following commands are frequently used to debug drivers/compute libraries, focusing on NVidia mostly because I don’t have access to other hardware.
Nvidia-smi #
Essential tool. Most use it as-is, without parameters, but it’s very powerful beyond that
nvidia-smi
Print raw CSV like stats
nvidia-smi stats
Switch to monitoring
nvidia-smi dmon
With detailed selection
nvidia-smi dmon -s puct -o T -c 1
Where
- -s selects the metrics: p (power), u (utilization), c (clocks), t (PCIe throughput, Rx/Tx)
- -c 1 take one sample, then exit
- -d set the sampling interval in seconds
- -o T prefix each line with the time (D adds the date)
History #
The unsung hero feature of shells: the history file and its commands allow you to time-travel back to when you solved the problem you’re facing, again.
history | grep <query>
Execute the 19th command
!19
Get the 19th command, but do not execute it, because you want to change(e)/print(p) it
!19:p
Security #
SELinux #
sestatus
SELinux controls pretty much everything, and that includes which contexts can access ports. So if you change e.g. the SSH port, you need to tell SELinux. Modifying a port
sudo semanage port -l | grep ^ssh
sudo semanage port -a -t ssh_port_t -p tcp <NEW>
# or if that port is used by another service
sudo semanage port -m -t ssh_port_t -p tcp <NEW>
Debugging SELinux #
Look at the raw logs
sudo grep "AVC" /var/log/audit/audit.log
Look at the same logs, but let setroubleshoot translate it to human
sealert -a /var/log/audit/audit.log
It will then suggest the right thing to do: file a bug. Given that you still need to ‘fix’ the problem for now, it usually also tells you how to override. This example is for iptables, DO NOT COPY.
sudo ausearch -c 'iptables' --raw | audit2allow -M my-iptables
sudo semodule -i my-iptables.pp
Firewall #
(RHEL based) Firewalld works with zones (~ configurations/templates/policies).
With all commands, the shell will happily oblige and do exactly what you wrote, which is not always what you meant. The occasional typo wiping out a local filesystem is a fun exercise for your backup routine. It’s a different problem if you break the SSH or firewall configuration and lock yourself out: with no out-of-band access (web console), you need to physically go to the machine.
So you’d do something like
- configure the daemon
- tell SELinux about the change
- tell firewallcmd about the change
- review / test runtime
- If 0 issues, make it permanent
Adding SSH to the public zone. Permanent means the configuration is, well, permanent, aka it survives reloads and reboots. Use --permanent once you have tested the change at runtime.
sudo firewall-cmd --permanent --zone=public --add-service=ssh
sudo firewall-cmd --reload
Add a port
sudo firewall-cmd --zone=<ZONE> --add-port=<PORT>/<PROTOCOL> --permanent
e.g. --add-port=22/tcp
Viewing status
sudo firewall-cmd --zone=public --list-all
Managing processes #
Searching #
Quickly find out which running processes match a query. Useful for e.g. checking whether something under your account is running that shouldn’t be, or whether a process you just killed actually died rather than ignoring the signal.
ps aux | grep "query"
Timeout #
timeout <k>[s|m|h|d] <command>
If the command does not finish within k seconds/minutes/hours/days, kill it. Useful for ‘try but not beyond’ use cases, e.g. SSH potentially hanging.
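The exit status tells you whether the timeout fired: GNU timeout exits with 124 when it had to kill the command.

```shell
# sleep 10 is killed after 1 second; timeout reports 124.
timeout 1s sleep 10
echo $?   # → 124
```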
Interrupting #
kill [-SIGNR] PID
pkill [-SIGNR] regex
For example
kill -USR1 PID
for the PID of dd
would ask it to print its current status (bytes written).
kill -9 PID
Sends SIGKILL to the process. Default is SIGTERM. Processes in a locked / frozen / unresponsive state may not even respond to SIGKILL. As a last resort, you could try inducing a segmentation fault, which occurs when a program references memory it should not, typical for C/C++ programs and libraries. This is unwise unless everything else failed, and it’s very likely the killed process will not clean up (file/socket descriptors), though the OS may do it for you.
kill -11 PID
Stopping a swathe of processes #
Sometimes you want to halt all processes that match a certain pattern, not just one by one.
ps aux | grep <pattern> | grep -v "grep" | awk '{print $2}' | xargs kill [-SIGNR]
Decomposed
grep -v "grep"
By default, the grep process will match its own pattern, so it will be listed. Killing it is meaningless: by the time kill runs, that PID no longer exists, and you didn’t want to kill it in the first place. Hence the negative second match.
awk '{print $2}'
Print the 2nd column of each line
xargs kill [-SIGNR]
For each PID selected, xargs invokes kill with the given signal.
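pgrep/pkill exist for exactly this, and avoid the self-matching grep problem entirely. A sketch using a harmless background sleep as the target:

```shell
# Start a disposable process, then signal it by PID.
sleep 31419 &
pid=$!
pgrep -f "sleep 31419"        # lists the matching PID(s)
kill -TERM "$pid"
wait "$pid" 2>/dev/null || true   # reap it; the PID is gone afterwards
# One-shot equivalent: pkill -TERM -f "sleep 31419"
# With GNU xargs, -r skips kill entirely when pgrep matches nothing:
#   pgrep -f "sleep 31419" | xargs -r kill -TERM
```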
Files #
Get the full path of a file/dir #
readlink -f $X
or in Julia
abspath(X)
Find the last modified entry in a directory. #
printlastmodified () { ls -Alsht "$1" | head -n 2 | tail -n 1 ; }
Use case: imagine having 100 jobs writing log files; finding the one that’s active (when the others are deadlocked) is pretty useful. Note: do not use this on a large, slow filesystem, it’s not efficient at all.
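A cheaper variant with GNU find, which skips the ls formatting and sorts on raw mtimes (the function name is my own):

```shell
# Print the most recently modified entry directly under "$1".
lastmodified () {
    find "$1" -mindepth 1 -maxdepth 1 -printf '%T@ %p\n' \
        | sort -n | tail -n 1 | cut -d' ' -f2-
}
d=$(mktemp -d)
touch "$d/old"; sleep 1; touch "$d/new"
lastmodified "$d"   # prints the path of "new"
```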
Archiving #
You can either craft a oneliner for zip, or install 7zip (CLI) and run that.
7z a archive dir        # add/create
7z d archive pattern    # delete matching entries
7z x archive            # extract, keeping directory structure
7z e archive            # extract flattened into one directory
Include/exclude filters are -i!pattern and -x!pattern (quote the ! in most shells).
{pop|push}d #
Deep directory traversal is a pain. A typical workflow is exploratory: you’re in a code directory, explore (cd) into a data directory, and now need to go back.
Autocomplete helps, but it’d be easier if the shell just unwound your steps. Bash and zsh support pushd/popd
cd
pwd
> $HOME
cd /dev/shm
pwd
> /dev/shm
cd -
pwd
> $HOME
More advanced:
pushd <somedir> # add . or somedir to stack
popd # unwinds stack
dirs # show history currently recorded
Doing it automatically with ZSH
# ~/.zshrc
setopt autopushd
Now cd X equates to pushd X
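The stack in action (pushd/popd/dirs are bash and zsh builtins, so run this in one of those shells):

```shell
pushd /tmp > /dev/null    # stack: /tmp <previous dir>
pushd /usr > /dev/null    # stack: /usr /tmp <previous dir>
dirs                      # show the stack
popd > /dev/null          # drop /usr, back to /tmp
pwd                       # → /tmp
```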
Archives #
Verbose compression of a directory
tar -cvzf archive.tgz dir
Verbose extraction, optionally limited to files matching a pattern
tar -xvzf archive.tgz [filepattern]
Listing files
tar -tvf archive.tgz
Using parallel compression with human readable progress
tar --use-compress-program="pigz -p $CORES --best --recursive" -c --checkpoint-action=ttyout='%{%Y-%m-%d %H:%M:%S}t (%d sec): #%u, %T%*\r' $DIRECTORY -f $ARCHIVE
Needs pigz.
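If pigz isn’t installed, plain -z (single-threaded gzip) always works; the block below is a self-contained sketch with placeholder inputs, plus a commented zstd alternative for multi-threading:

```shell
# Placeholder inputs so the sketch is self-contained.
DIRECTORY=$(mktemp -d)
echo demo > "$DIRECTORY/file.txt"
ARCHIVE="$DIRECTORY.tgz"
# Single-threaded fallback: plain gzip via -z.
tar -czf "$ARCHIVE" -C "$(dirname "$DIRECTORY")" "$(basename "$DIRECTORY")"
tar -tzf "$ARCHIVE"   # sanity check: lists the directory and file.txt
# Multi-threaded alternative, if zstd is installed:
#   tar --use-compress-program="zstd -T0" -cf archive.tar.zst dir
```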
Remote mounting #
File synchronization can be too heavy-handed if all you want to do is ‘browse’ a remote filesystem. Mounting over SSHFS allows you to do just that, but it will be at the cost of efficiency.
MOUNTPOINT=/your/empty/dir
OPTIONS="-C -o follow_symlinks -o cache=yes -o reconnect -o cache_timeout=300 -o kernel_cache"
sshfs $REMOTE:/home/$REMOTEUSER $MOUNTPOINT $OPTIONS
Decrease the timeout to see remote writes sooner. Reconnect is a blessing: you run this on a laptop, hibernate, and it will re-establish the tunnel. The caching options give a nice speed/latency boost if you work alone, but if multiple users/processes access the files you’re mounting, you may want to either mount read-only, or lower the cache settings to see updates instantly.
Network #
DIY latency measurement. Test how variable the latency of the network is, by pinging the IPv4 address of Cloudflare’s DNS servers. The idea being that it’s very unlikely for them to fail, so any failure in measuring response has to be the network route to them.
watch -d -n 2 "ping -A -c 10 -i 0.2 1.1.1.1 | tail -n 2"
This sends out 10 pings, 200ms apart to 1.1.1.1, then strips the summary and watches it for differences. It’s a quick heuristic, if during a zoom this shows sudden spikes in the leading digits of min/max rtt, then I know whatever stuttering or drops in quality will be on my side, not the endpoints.
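To pull the numbers out programmatically, the iputils summary line splits cleanly with awk; shown here on a captured sample line (pipe real ping output through the same awk):

```shell
# Sample iputils summary line; field 4 holds the slash-separated numbers.
summary='rtt min/avg/max/mdev = 10.088/10.222/10.361/0.111 ms'
echo "$summary" \
  | awk '/rtt/ {split($4, t, "/"); print "min=" t[1], "avg=" t[2], "max=" t[3]}'
# → min=10.088 avg=10.222 max=10.361
```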
Logs #
When I’m performing sysadmin tasks, the first place I look at login is usually the system log. Same holds for pretty much any aberrant system behavior, from unstable Wifi to freezes, high power usage and so on.
journalctl --disk-usage
sudo journalctl --vacuum-size=200M
Interactively follow journal entries
journalctl --all --follow
When enabling or reconfiguring a new service, tracking only that service is essential:
journalctl -f -u <service>
Quickly view the last section with human interpretation
(sudo) journalctl -xe
-x look up the message in the catalog, for more meaningful info
-e jump to the end
Searching
journalctl | grep "myquery"
# OR
journalctl -g "PERLREGX"
The second one being a bit more powerful.
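A taste of what PCRE buys you over basic grep, on a made-up sample log line (requires grep compiled with -P support, which GNU grep usually is):

```shell
line='Dec 01 12:00:01 host sshd[123]: Failed password for root'
echo "$line" | grep -oP 'sshd\[\d+\]'     # \d shorthand → sshd[123]
echo "$line" | grep -oP '(?<=for )\w+'    # lookbehind   → root
```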
Force a flush to disk (for example, when you suspect imminent hardware crash, instability, etc.)
sudo journalctl --flush # flush /run/log/journal to /var/log/journal/
sudo journalctl --sync # force unwritten entries to disk
Access control lists #
ACLs control user access to a Linux filesystem in addition to the standard Linux user/group system, the advantage being that a single file can grant or deny rights to many users and many groups in a far more granular way. These are common in high performance computing clusters, and allow useful things such as sharing folders between three groups: current students, visiting students, and a visiting postdoc.
View current rights
getfacl <file>
Set defaults for new files
setfacl -d -m g:<group>:rwx $TARGET
-d default (not current)
Set current rights
setfacl -R -m g:<group>:rwx $TARGET
-R : recurse -m : modify
Note #
Make sure the entire path to a target has +x for the group or user in question, on each folder. If $TARGET is /a/b/c/d, then a, b, and c all need execute permission for the group or user; otherwise the permissions on ‘d’ might be correct, but there’s no way for users to reach ‘d’ in the first place (execute on a folder grants access to its contents).
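The traverse rule is easy to demonstrate with plain permission bits, which follow the same logic as the ACL execute bits; as a non-root user, the middle cat fails:

```shell
base=$(mktemp -d)
mkdir -p "$base/a/b"
echo hi > "$base/a/b/file"
chmod 600 "$base/a"        # strip execute from a path component
cat "$base/a/b/file" 2>/dev/null \
  || echo "blocked: no +x on a path component"   # (as non-root)
chmod 700 "$base/a"        # restore traversal
cat "$base/a/b/file"       # works again → hi
```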
Modules #
On HPC clusters with environment modules, reset the loaded state and load a specific toolchain in one go
module purge && module load julia/1.6.2 && module load singularity