Frequently Use{d|ful} Linux System Administration Commands
This page collects shell commands I tend to use a lot, albeit infrequently. They were scattered across notes, history files on different systems, post-its, and Slack messages to myself. Keeping them in one place seems better organized, and at the same time I can review them for sanity. Because I work with RHEL based systems, most of these assume systemd and dnf in some form or other.
** These are not by any means safe for copy-paste usage, these are system administration commands that can ruin your system, or worse, a production system **
Git (to be moved to sep page) #
Change the last commit’s email (used for privacy at github)
git commit --amend --author="XY <email>"
Tmux #
Enable mouse
#~/.tmux.conf
set -g mouse on
Package management #
Install security updates without touching kernel #
sudo dnf update --security --exclude="kernel*"
Motivation: updates invariably break things, and updating the kernel means a recompilation of graphics drivers, which are still snowflakes in terms of stability, in my experience at least. Similarly, while firmware updates on Linux have been much improved with the advent of fwupd, my last ‘just how recent is my backup’ panic was induced by a firmware upgrade that touched UEFI and removed the bootloader entries. Restoring that from an encrypted disk is less than amusing.
Fixing CVEs is critical though, so this updates any program with a CVE, but otherwise keeps the system untouched.
Device management #
Show a tree of USB devices. The tree format is useful to see which device controls others.
lsusb -tv
Pick out a specific device, with full verbose details
lsusb -v -s <bus>:<dev>
where bus and dev are the bus and device numbers shown in the tree listing.
Full listing of all hardware
<sudo> lshw
Get RAM configuration (aka ‘how many free slots are there left?’)
sudo dmidecode -t 17 | more
Reads the DMI information of RAM sticks and parses it into human format.
For example to know how many slots you have
sudo dmidecode -t 16 | grep Devices | wc -l
Finding the nr of populated RAM slots
sudo dmidecode -t 17 | grep "Size:" | grep -vc "No Module"
Maximum size your board supports
sudo dmidecode -t 16 | grep Maximum
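The same counting logic, shown offline on a captured `-t 17` fragment (the sample below is illustrative, not from a real board):

```shell
# Illustrative dmidecode -t 17 fragment: three slots, one empty.
sample='Memory Device
    Size: 16 GB
Memory Device
    Size: No Module Installed
Memory Device
    Size: 16 GB'
echo "$sample" | grep -c "^Memory Device"              # total slots → 3
echo "$sample" | grep "Size:" | grep -vc "No Module"   # populated  → 2
```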
GPU #
GPUs are ubiquitous in HPC, but getting them to work can be non-trivial. The following commands are frequently used to debug drivers/compute libraries, focusing on NVidia mostly because I don’t have access to other hardware.
Nvidia-smi #
Essential tool. Most use it as-is, without parameters, but it’s very powerful beyond that
nvidia-smi
Print raw CSV like stats
nvidia-smi stats
Switch to monitoring
nvidia-smi dmon
With detailed selection
nvidia-smi dmon -s puct -o T -c 1
Where
- -s selects the metrics: p (power), u (utilization), c (clocks), t (PCIe throughput, Rx/Tx)
- -c 1 take one sample, then exit
- -d set the sampling interval in seconds
- -o T prefix each line with the time (D adds the date)
History #
The unsung hero feature of shells: the history file and its commands allow you to time-travel back to when you solved the problem you’re facing, again.
history | grep <query>
Execute the 19th command
!19
Get the 19th command, but do not execute it, because you want to change(e)/print(p) it
!19:p
Security #
SELinux #
sestatus
SELinux controls pretty much everything, and that includes which contexts can access ports. So if you change e.g. the SSH port, you need to tell SELinux. Modifying a port
sudo semanage port -l | grep ^ssh
sudo semanage port -a -t ssh_port_t -p tcp <NEW>
# or if that port is used by another service
sudo semanage port -m -t ssh_port_t -p tcp <NEW>
Debugging SELinux #
Look at the raw logs
sudo grep "AVC" /var/log/audit/audit.log
Look at the same logs, but let setroubleshoot translate it to human
sealert -a /var/log/audit/audit.log
It will then suggest the right thing to do: file a bug. Given that you still need to ‘fix’ the problem for now, it usually also tells you how to override. This example is for iptables, DO NOT COPY.
sudo ausearch -c 'iptables' --raw | audit2allow -M my-iptables
sudo semodule -i my-iptables.pp
Firewall #
(RHEL based) Firewalld works with zones (~ configurations/templates/policies).
With all commands, the shell will happily oblige and do exactly what you wrote, which is not always what you meant. The occasional typo wiping out a local filesystem is a fun exercise for your backup routine. It’s a different problem if you break the SSH or firewall configuration and lock yourself out: with no out-of-band access (web console), you need to physically go to the machine.
So you’d do something like
- configure the daemon
- tell SELinux about the change
- tell firewallcmd about the change
- review / test runtime
- If 0 issues, make it permanent
Adding SSH to the public zone. Permanent means the configuration is, well, permanent, aka it survives reloads and reboots. Use --permanent once you have tested the change at runtime.
sudo firewall-cmd --permanent --zone=public --add-service=ssh
sudo firewall-cmd --reload
Add a port
sudo firewall-cmd --zone=<ZONE> --add-port=<PORT>/<PROTOCOL> --permanent
e.g. --add-port=22/tcp
Viewing status
sudo firewall-cmd --zone=public --list-all
Managing processes #
Searching #
Quickly find out which running processes match a query. Useful for e.g. checking whether something under your account is running that shouldn’t be, or whether a process you just killed actually died rather than ignoring the signal.
ps aux | grep "query"
Timeout #
timeout <k>[s|m|h|d] <command>
If the command does not finish within k seconds/minutes/hours/days, kill it. Useful for ‘try but not beyond’ use cases, e.g. SSH potentially hanging.
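The exit status tells you whether the timeout fired: GNU timeout exits with 124 when it had to kill the command.

```shell
# sleep 10 is killed after 1 second; timeout reports 124.
timeout 1s sleep 10
echo $?   # → 124
```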
Interrupting #
kill [-SIGNR] PID
pkill [-SIGNR] regex
For example
kill -USR1 PID
for the PID of dd
would ask it to print its current status (bytes written).
kill -9 PID
Sends SIGKILL to the process. Default is SIGTERM. Processes in a locked / frozen / unresponsive state may not even respond to SIGKILL. As a last resort, you could try inducing a segmentation fault, which occurs when a program references memory it should not, typical for C/C++ programs and libraries. This is unwise unless everything else failed, and it’s very likely the killed process will not clean up (file/socket descriptors), though the OS may do it for you.
kill -11 PID
Stopping a swathe of processes #
Sometimes you want to halt all processes that match a certain pattern, not just one by one.
ps aux | grep <pattern> | grep -v "grep" | awk '{print $2}' | xargs kill [-SIGNR]
Decomposed
grep -v "grep"
By default, the grep process will match its own pattern, so it will be listed. Killing it is meaningless: by the time kill runs, that PID no longer exists, and you didn’t want to kill it in the first place. Hence the negative second match.
awk '{print $2}'
Print the 2nd column of each line
xargs kill [-SIGNR]
For each PID selected, xargs invokes kill with the given signal.
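pgrep/pkill exist for exactly this, and avoid the self-matching grep problem entirely. A sketch using a harmless background sleep as the target:

```shell
# Start a disposable process, then signal it by PID.
sleep 31419 &
pid=$!
pgrep -f "sleep 31419"        # lists the matching PID(s)
kill -TERM "$pid"
wait "$pid" 2>/dev/null || true   # reap it; the PID is gone afterwards
# One-shot equivalent: pkill -TERM -f "sleep 31419"
# With GNU xargs, -r skips kill entirely when pgrep matches nothing:
#   pgrep -f "sleep 31419" | xargs -r kill -TERM
```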
Files #
Get the full path of a file/dir #
readlink -f $X
or in Julia
abspath(X)
Find the last modified entry in a directory. #
printlastmodified () { ls -Alsht "$1" | head -n 2 | tail -n 1 ; }
Use case: imagine having 100 jobs writing log files; finding the one that’s active (when the others are deadlocked) is pretty useful. Note: do not use this on a large, slow filesystem, it’s not efficient at all.
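A cheaper variant with GNU find, which skips the ls formatting and sorts on raw mtimes (the function name is my own):

```shell
# Print the most recently modified entry directly under "$1".
lastmodified () {
    find "$1" -mindepth 1 -maxdepth 1 -printf '%T@ %p\n' \
        | sort -n | tail -n 1 | cut -d' ' -f2-
}
d=$(mktemp -d)
touch "$d/old"; sleep 1; touch "$d/new"
lastmodified "$d"   # prints the path of "new"
```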
Archiving #
You can either craft a oneliner for zip, or install 7zip (CLI) and run that.
7z a archive dir        # add/create
7z d archive pattern    # delete matching entries
7z x archive            # extract, keeping directory structure
7z e archive            # extract flattened into one directory
Include/exclude filters are -i!pattern and -x!pattern (quote the ! in most shells).
{pop|push}d #
Deep directory traversal is a pain. A typical workflow is exploratory: you’re in a code directory, explore (cd) into a data directory, and now need to go back.
Autocomplete helps, but it’d be easier if the shell just unwound your steps. Bash and zsh support pushd/popd
cd
pwd
> $HOME
cd /dev/shm
pwd
> /dev/shm
cd -
pwd
> $HOME
More advanced:
pushd <somedir> # add . or somedir to stack
popd # unwinds stack
dirs # show history currently recorded
Doing it automatically with ZSH
# ~/.zshrc
setopt autopushd
Now cd X equates to pushd X
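The stack in action (pushd/popd/dirs are bash and zsh builtins, so run this in one of those shells):

```shell
pushd /tmp > /dev/null    # stack: /tmp <previous dir>
pushd /usr > /dev/null    # stack: /usr /tmp <previous dir>
dirs                      # show the stack
popd > /dev/null          # drop /usr, back to /tmp
pwd                       # → /tmp
```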
Archives #
Verbose compression of a directory
tar -cvzf archive.tgz dir
Verbose extraction, optionally limited to files matching a pattern
tar -xvzf archive.tgz [filepattern]
Listing files
tar -tvf archive.tgz
Using parallel compression with human readable progress
tar --use-compress-program="pigz -p $CORES --best --recursive" -c --checkpoint-action=ttyout='%{%Y-%m-%d %H:%M:%S}t (%d sec): #%u, %T%*\r' $DIRECTORY -f $ARCHIVE
Needs pigz.
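If pigz isn’t installed, plain -z (single-threaded gzip) always works; the block below is a self-contained sketch with placeholder inputs, plus a commented zstd alternative for multi-threading:

```shell
# Placeholder inputs so the sketch is self-contained.
DIRECTORY=$(mktemp -d)
echo demo > "$DIRECTORY/file.txt"
ARCHIVE="$DIRECTORY.tgz"
# Single-threaded fallback: plain gzip via -z.
tar -czf "$ARCHIVE" -C "$(dirname "$DIRECTORY")" "$(basename "$DIRECTORY")"
tar -tzf "$ARCHIVE"   # sanity check: lists the directory and file.txt
# Multi-threaded alternative, if zstd is installed:
#   tar --use-compress-program="zstd -T0" -cf archive.tar.zst dir
```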
Remote mounting #
File synchronization can be too heavy-handed if all you want to do is ‘browse’ a remote filesystem. Mounting over SSHFS allows you to do just that, but it will be at the cost of efficiency.
MOUNTPOINT=/your/empty/dir
OPTIONS="-C -o follow_symlinks -o cache=yes -o reconnect -o cache_timeout=300 -o kernel_cache"
sshfs $REMOTE:/home/$REMOTEUSER $MOUNTPOINT $OPTIONS
Decrease the timeout to see remote writes sooner. Reconnect is a blessing: you run this on a laptop, hibernate, and it will re-establish the tunnel. The caching options give a nice speed/latency boost if you work alone, but if multiple users/processes access the files you’re mounting, you may want to either mount read-only, or lower the cache settings to see updates instantly.
Network #
DIY latency measurement. Test how variable the latency of the network is, by pinging the IPv4 address of Cloudflare’s DNS servers. The idea being that it’s very unlikely for them to fail, so any failure in measuring response has to be the network route to them.
watch -d -n 2 "ping -A -c 10 -i 0.2 1.1.1.1 | tail -n 2"
This sends out 10 pings, 200ms apart to 1.1.1.1, then strips the summary and watches it for differences. It’s a quick heuristic, if during a zoom this shows sudden spikes in the leading digits of min/max rtt, then I know whatever stuttering or drops in quality will be on my side, not the endpoints.
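To pull the numbers out programmatically, the iputils summary line splits cleanly with awk; shown here on a captured sample line (pipe real ping output through the same awk):

```shell
# Sample iputils summary line; field 4 holds the slash-separated numbers.
summary='rtt min/avg/max/mdev = 10.088/10.222/10.361/0.111 ms'
echo "$summary" \
  | awk '/rtt/ {split($4, t, "/"); print "min=" t[1], "avg=" t[2], "max=" t[3]}'
# → min=10.088 avg=10.222 max=10.361
```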
Logs #
When I’m performing sysadmin tasks, the first place I look at login is usually the system log. Same holds for pretty much any aberrant system behavior, from unstable Wifi to freezes, high power usage and so on.
journalctl --disk-usage
sudo journalctl --vacuum-size=200M
Interactively follow journal entries
journalctl --all --follow
When enabling or reconfiguring a new service, tracking only that service is essential:
journalctl -f -u <service>
Quickly view the last section with human interpretation
(sudo) journalctl -xe
-x look up the message in the catalog, for more meaningful info
-e jump to the end
Searching
journalctl | grep "myquery"
# OR
journalctl -g "PERLREGX"
The second one being a bit more powerful.
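A taste of what PCRE buys you over basic grep, on a made-up sample log line (requires grep compiled with -P support, which GNU grep usually is):

```shell
line='Dec 01 12:00:01 host sshd[123]: Failed password for root'
echo "$line" | grep -oP 'sshd\[\d+\]'     # \d shorthand → sshd[123]
echo "$line" | grep -oP '(?<=for )\w+'    # lookbehind   → root
```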
Force a flush to disk (for example, when you suspect imminent hardware crash, instability, etc.)
sudo journalctl --flush # flush /run/log/journal to /var/log/journal/
sudo journalctl --sync # force unwritten entries to disk
Access control lists #
ACLs control user access to a Linux filesystem in addition to the standard Linux user/group system, the advantage being that a single file can grant or deny rights to many users and many groups in a far more granular way. These are common in high performance computing clusters, and allow useful things such as sharing folders between three groups: current students, visiting students, and a visiting postdoc.
View current rights
getfacl <file>
Set defaults for new files
setfacl -d -m g:<group>:rwx $TARGET
-d default (not current)
Set current rights
setfacl -R -m g:<group>:rwx $TARGET
-R : recurse -m : modify
Note #
Make sure the entire path to a target has +x for the group or user in question, on each folder. If $TARGET is /a/b/c/d, then a, b, and c all need execute permission for the group or user; otherwise the permissions on ‘d’ might be correct, but there’s no way for users to reach ‘d’ in the first place (execute on a folder grants access to its contents).
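The traverse rule is easy to demonstrate with plain permission bits, which follow the same logic as the ACL execute bits; as a non-root user, the middle cat fails:

```shell
base=$(mktemp -d)
mkdir -p "$base/a/b"
echo hi > "$base/a/b/file"
chmod 600 "$base/a"        # strip execute from a path component
cat "$base/a/b/file" 2>/dev/null \
  || echo "blocked: no +x on a path component"   # (as non-root)
chmod 700 "$base/a"        # restore traversal
cat "$base/a/b/file"       # works again → hi
```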
Modules #
On HPC clusters with environment modules, reset the loaded state and load a specific toolchain in one go
module purge && module load julia/1.6.2 && module load singularity