Even though RAM is cheap these days, there are some conditions in which your Linux server could run out of it completely. Just the other day, I noticed my main hosting server had gone down. I could still ping it, but DNS, SSH and Apache2 were not responding, so I had to call the datacenter to have them reboot my system. After analyzing the system, I realized that some unknown process had eaten up all the memory and all the swap space!

I use Cacti to monitor my server’s performance, so I could see that it took a nose-dive after getting hit with a few million requests in a couple of days (these were raw mobile device detection requests). After the system ran out of memory, it started swapping. This lasted for about 2 weeks before the swap space was depleted, after which the server struggled on for another 18 hours. By then it was critically starved of memory, and oom_killer (Out of Memory Killer) was invoked to start killing processes in a vain attempt to free up memory.

The oom_killer seems to have very little intelligence as to which processes to kill first, as sshd and named were early victims. After this episode, I decided to create a script that adjusts the order in which key processes are killed, to make sure I still have access to the server in the event of a memory leak or OOM condition.
oom_adjust.sh to the rescue!
I’ve created oom_adjust.sh to adjust (and periodically readjust via cron) the order in which the processes may be killed by oom_killer. The script uses a config file called oom_adjust.conf (by default it looks for it in /etc) in which you can list processes and the oom_adj value that you want to give them. The possible values are from -17 (never kill) to 15 (kill first).
    # Adjust process oom_adj values so they are more or less likely to be killed in an oom event
    # procname oom_adj

    # Keep sshd ALIVE
    sshd -17

    # DNS is very important to me too
    named -8

    # I'd prefer that MySQL stays alive, but it's not required
    mysqld -1

    # Apache2 is a memory hog, but I'll give it a fighting chance
    # I'm giving it 0 since the workers will respawn at 0 anyway
    apache2 0

    # Sphinx search is cool, but I can live without it if an oom occurs
    searchd 3

    # Memcache is in the same boat as Sphinx search
    memcached 3

    # I only use mongodb for testing on this server
    mongod 5

    # It would be nice if smtpd stayed up, so I still get alerts
    smtpd 5

    # These services can be killed first
    pure-ftpd 10
    pure-ftpd-mysql 10
    snmpd 10
    fail2ban-server 10
    ntpd 10
    authdaemond 10
    saslauthd 10
    qmgr 10
    pickup 10
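Values from the config end up being written straight into /proc, so it may be worth sanity-checking the file before using it. The following is only a minimal sketch: `valid_oom_adj` and `check_conf` are hypothetical helper names, not part of oom_adjust.sh, and the check assumes the -17 to 15 range described above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of oom_adjust.sh) for checking a config file.

# valid_oom_adj VALUE: succeed only if VALUE is an integer between -17 and 15.
valid_oom_adj() {
    case "$1" in
        # Reject: empty, a lone "-", any non-digit/non-dash character,
        # or a "-" anywhere other than the first position.
        ''|-|*[!0-9-]*|?*-*) return 1 ;;
    esac
    [ "$1" -ge -17 ] && [ "$1" -le 15 ]
}

# check_conf FILE: report every "procname oom_adj" line whose value is invalid.
# Returns 0 if all lines are fine, 1 otherwise.
check_conf() {
    status=0
    while read -r name adj; do
        # Skip blank lines and comments
        case "$name" in ''|\#*) continue ;; esac
        if ! valid_oom_adj "$adj"; then
            echo "bad oom_adj for $name: '$adj'" >&2
            status=1
        fi
    done < "$1"
    return $status
}
```

Calling `check_conf /etc/oom_adjust.conf` should print nothing and return 0 for the config shown above.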
Here is the main script, which I’ve symlinked into /usr/sbin for convenience.
oom_adjust.sh
    #!/bin/sh
    # oom_adjust.sh Out of Memory Killer (oom_killer) Priority Adjustment Script
    # by Steve Kamerman <email@example.com>, Jan 2011
    # https://www.stevekamerman.com

    OOM_ADJ_FILE=/etc/oom_adjust.conf

    # Bail out early if the config file is missing
    if [ ! -f "$OOM_ADJ_FILE" ]; then
        echo "oom_adjust.sh: config file $OOM_ADJ_FILE was not found" >&2
        exit 1
    fi

    echo "oom_adjust.sh is setting oom_killer priorities"

    # Drop comment lines, lines starting with whitespace, and blank lines,
    # then turn "procname value" into "procname:value" so each entry is one word
    for LINE in `cat $OOM_ADJ_FILE | sed -e '/^[# \t].*/d' | sed -e '/^$/d' | sed -e 's/ /:/'`; do
        NAME=`echo $LINE | cut -d":" -f1`
        ADJ=`echo $LINE | cut -d":" -f2`
        echo " Setting $NAME to $ADJ"
        # A process name can map to several PIDs (e.g. Apache workers)
        for PID in `pidof $NAME`; do
            echo $ADJ > /proc/$PID/oom_adj
        done
    done

    exit 0
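One caveat if you run a newer kernel: as far as I know, /proc/PID/oom_adj has been deprecated since Linux 2.6.36 in favor of /proc/PID/oom_score_adj, which ranges from -1000 (never kill) to 1000 (kill first). The kernel scales legacy values itself, so the script above should keep working, but here is a rough sketch of the same scaling in case you want to write oom_score_adj directly. The function names `adj_to_score_adj` and `set_oom_score_adj` are my own for illustration, and the formula is my reading of how the kernel converts the value, so treat it as an assumption.

```shell
#!/bin/sh
# Hypothetical helpers, not part of oom_adjust.sh.

# adj_to_score_adj ADJ: convert a legacy oom_adj value (-17..15)
# to the matching oom_score_adj value (-1000..1000).
adj_to_score_adj() {
    if [ "$1" -eq 15 ]; then
        # The maximum is special-cased so that 1000 stays reachable
        echo 1000
    else
        # Integer division, truncating toward zero
        echo $(( $1 * 1000 / 17 ))
    fi
}

# set_oom_score_adj NAME ADJ: apply the converted value to every PID of NAME.
# Must run as root; does nothing if no matching process is running.
set_oom_score_adj() {
    for pid in $(pidof "$1"); do
        adj_to_score_adj "$2" > "/proc/$pid/oom_score_adj"
    done
}
```

For example, `set_oom_score_adj sshd -17` would write -1000 for each sshd process.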
If your distro uses /etc/rc.local, you can call this script from there to apply the adjustments on startup. I also run it from crontab every night on my servers to keep the processes in check, in case they have respawned or restarted with a different PID.
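The two hooks might look something like this. It is only a sketch: the 03:00 schedule is an arbitrary choice of mine, and the path assumes the /usr/sbin symlink mentioned earlier.

```shell
# In /etc/rc.local (before any final "exit 0"): apply the adjustments at boot
/usr/sbin/oom_adjust.sh

# In root's crontab (edit with "crontab -e" as root): reapply every night at 03:00
0 3 * * * /usr/sbin/oom_adjust.sh > /dev/null
```

Redirecting stdout keeps cron from mailing the "Setting ..." lines every night, while errors written to stderr still get through.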