Even though RAM is cheap these days, there are conditions in which your Linux server can run out of it completely. Just the other day, my main hosting server went down: I could still ping it, but DNS, SSH and Apache2 were not responding, so I had to call the datacenter and have them reboot it. After analyzing the system, I realized that some unknown process had eaten up all the memory and all the swap space! I monitor my server’s performance with Cacti, so I could see that it took a nose-dive after getting hit with a few million requests over a couple of days (these were raw mobile device detection requests). After the system ran out of memory, it started swapping. This lasted for about two weeks before the swap space was depleted, after which it struggled on for another 18 hours.

At that point it was critically starved of memory, and the oom_killer (Out of Memory Killer) was invoked to start killing processes in a vain attempt to free up memory. The oom_killer seems to have very little intelligence about which processes to kill first, as sshd and named were early victims. After this episode, I decided to write a script that adjusts the order in which key processes are killed, so that I can still get into the server in the event of a memory leak or OOM condition.
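If you want to see which processes the kernel would go after first, every process exposes its current OOM "badness" in /proc/<pid>/oom_score, and the OOM killer picks the highest score. The sketch below is my own illustration, not part of oom_adjust.sh: oom_top is a hypothetical helper name, and reading /proc/<pid>/comm assumes Linux 2.6.33 or later.

```shell
#!/bin/sh
# Sketch: list the processes the OOM killer currently ranks highest.
# oom_top is a hypothetical helper; /proc/<pid>/comm needs Linux 2.6.33+.
oom_top() {
        limit=${1:-10}
        for dir in /proc/[0-9]*; do
                # A process can exit while we loop; skip it if its files are gone.
                score=$(cat "$dir/oom_score" 2>/dev/null) || continue
                name=$(cat "$dir/comm" 2>/dev/null) || continue
                printf '%s %s %s\n' "$score" "${dir#/proc/}" "$name"
        done | sort -rn | head -n "$limit"
}
```

Running oom_top 5 prints lines of the form "score pid name", highest score first.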

oom_adjust.sh to the rescue!

I’ve created oom_adjust.sh to adjust (and periodically readjust via cron) the order in which processes may be killed by the oom_killer. The script reads a config file called oom_adjust.conf (by default /etc/oom_adjust.conf, set by OOM_ADJ_FILE at the top of the script) in which you list processes and the oom_adj value you want to give each of them. Valid values run from -17 to 15: -17 exempts a process from the oom_killer entirely, and within -16 to 15 a higher value makes the process more likely to be killed first.
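Since a typo in the config would otherwise only show up as a failed write to /proc, it can help to check the file first. Here is a minimal sketch, not part of the original script (oom_conf_check is a hypothetical name), that accepts the same "procname value" format, skips blank and comment lines, and flags any value that is not an integer between -17 and 15.

```shell
#!/bin/sh
# Sketch: sanity-check an oom_adjust.conf-style file before applying it.
# oom_conf_check is a hypothetical helper, not part of oom_adjust.sh.
# Each non-comment line must be "procname value" with -17 <= value <= 15.
oom_conf_check() {
        conf=$1
        status=0
        lineno=0
        # Extra fields after the value are read into "extra" and ignored
        while read -r name adj extra || [ -n "$name" ]; do
                lineno=$((lineno + 1))
                # Blank lines and comments are ignored
                case "$name" in ''|'#'*) continue ;; esac
                # Reject missing values and anything that is not a plain integer
                case "$adj" in
                        ''|-|*[!0-9-]*|?*-*)
                                echo "$conf:$lineno: invalid value '$adj' for $name" >&2
                                status=1
                                continue ;;
                esac
                if [ "$adj" -lt -17 ] || [ "$adj" -gt 15 ]; then
                        echo "$conf:$lineno: value $adj for $name is outside -17..15" >&2
                        status=1
                fi
        done < "$conf"
        return "$status"
}
```

oom_conf_check /etc/oom_adjust.conf returns 0 when every entry is valid and prints file:line diagnostics to stderr otherwise, so it could be called just before the values are applied.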

oom_adjust.conf

# Adjust process oom_adj values so they are more or less likely to be killed in an oom event
# procname oom_adj

# Keep sshd ALIVE
sshd -17

# DNS is very important to me too
named -8

# I'd prefer that MySQL stays alive, but it's not required
mysqld -1

# Apache2 is a memory hog, but I'll give it a fighting chance
# I'm giving it 0 since the workers will respawn at 0 anyway
apache2 0

# Sphinx search is cool, but I can live without it if an oom occurs
searchd 3

# Memcache is in the same boat as Sphinx search
memcached 3

# I only use mongodb for testing on this server
mongod 5

# It would be nice if smtpd stayed up, so I still get alerts
smtpd 5

# These services can be killed first
pure-ftpd 10
pure-ftpd-mysql 10
snmpd 10
fail2ban-server 10
ntpd 10
authdaemond 10
saslauthd 10
qmgr 10
pickup 10

Here is the main script, which I’ve symlinked into /usr/sbin for convenience.
oom_adjust.sh

#!/bin/sh

# oom_adjust.sh Out of Memory Killer (oom_killer) Priority Adjustment Script
# by Steve Kamerman <stevekamerman@gmail.com>, Jan 2011
# http://www.stevekamerman.com

OOM_ADJ_FILE=/etc/oom_adjust.conf

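if [ ! -f "$OOM_ADJ_FILE" ]; then
        echo "oom_adjust.sh: config file $OOM_ADJ_FILE was not found" >&2
        exit 1
fi

echo "oom_adjust.sh is setting oom_killer priorities"
# Read "procname oom_adj" pairs; any run of spaces or tabs may separate them,
# and a final line without a trailing newline is still processed.
while read -r NAME ADJ REST || [ -n "$NAME" ]; do
        # Skip blank lines and comment lines
        case "$NAME" in
                ''|'#'*) continue ;;
        esac
        echo "  Setting $NAME to $ADJ"
        for PID in $(pidof "$NAME"); do
                # A process may exit between pidof and the write; warn instead of failing
                { echo "$ADJ" > "/proc/$PID/oom_adj"; } 2>/dev/null ||
                        echo "    oom_adjust.sh: could not update PID $PID" >&2
        done
done < "$OOM_ADJ_FILE"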
exit 0
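On kernels from 2.6.36 onward, /proc/<pid>/oom_adj is deprecated in favor of /proc/<pid>/oom_score_adj, which uses a -1000 to 1000 scale (-1000 disables OOM killing for the process). As far as I know, writes to oom_adj are still accepted there, but if you would rather use the newer file, here is a hedged sketch that keeps the config on the old -17 to 15 scale and converts each value. The conversion mirrors the scaling the kernel applies to oom_adj writes (15 maps to 1000, anything else to value * 1000 / 17, truncated toward zero). The function names are hypothetical, and the sketch falls back to oom_adj when oom_score_adj does not exist.

```shell
#!/bin/sh
# Sketch: apply oom_adjust.conf via oom_score_adj (Linux 2.6.36+).
# Function names are hypothetical; config values stay on the -17..15 scale.

# Convert an oom_adj value (-17..15) to the oom_score_adj scale (-1000..1000).
oom_adj_to_score_adj() {
        if [ "$1" -eq 15 ]; then
                echo 1000
        else
                # Shell arithmetic truncates toward zero, like the kernel's C division
                echo $(( $1 * 1000 / 17 ))
        fi
}

# Write the value for one PID, preferring oom_score_adj when it exists.
oom_set_pid() {
        if [ -e "/proc/$1/oom_score_adj" ]; then
                { oom_adj_to_score_adj "$2" > "/proc/$1/oom_score_adj"; } 2>/dev/null
        else
                { echo "$2" > "/proc/$1/oom_adj"; } 2>/dev/null
        fi
}

# Same config format and loop as oom_adjust.sh, using oom_set_pid for the write.
oom_apply_conf() {
        while read -r name adj extra || [ -n "$name" ]; do
                case "$name" in ''|'#'*) continue ;; esac
                echo "  Setting $name to $adj"
                for pid in $(pidof "$name"); do
                        oom_set_pid "$pid" "$adj" || echo "    could not update PID $pid" >&2
                done
        done < "${1:-/etc/oom_adjust.conf}"
}
```

With these functions in place, the loop in oom_adjust.sh could be replaced by oom_apply_conf "$OOM_ADJ_FILE". For the config above, sshd at -17 becomes -1000 and named at -8 becomes -470.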

If your distro uses /etc/rc.local, you can call this script from there to apply the adjustments at startup. I also run it from cron every night to keep the processes in check, in case they have respawned or restarted with a different PID.
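To confirm that a run actually took effect (for example, that the nightly cron job picked up processes with new PIDs), you can read the values back from /proc. This is a small sketch under my own assumptions: oom_show_pid and oom_report are hypothetical helper names, and the config path defaults to /etc/oom_adjust.conf as in the script.

```shell
#!/bin/sh
# Sketch: report the current oom_adj and oom_score of each configured process.
# oom_show_pid and oom_report are hypothetical helper names.

# Print the values for one PID ('?' if the process has already gone away).
oom_show_pid() {
        _name=$1
        _pid=$2
        _adj=$(cat "/proc/$_pid/oom_adj" 2>/dev/null) || _adj='?'
        _score=$(cat "/proc/$_pid/oom_score" 2>/dev/null) || _score='?'
        printf '%-16s pid=%-6s oom_adj=%-4s oom_score=%s\n' "$_name" "$_pid" "$_adj" "$_score"
}

# Walk the config file and report every running PID whose name is listed.
oom_report() {
        while read -r name _rest || [ -n "$name" ]; do
                case "$name" in ''|'#'*) continue ;; esac
                for pid in $(pidof "$name"); do
                        oom_show_pid "$name" "$pid"
                done
        done < "${1:-/etc/oom_adjust.conf}"
}
```

Running oom_report after oom_adjust.sh should show oom_adj matching the config for every listed process; a process that restarted since the last run may show a different value until the next run. For the nightly run itself, a root crontab line such as 30 3 * * * /usr/sbin/oom_adjust.sh > /dev/null would do; the schedule is arbitrary.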
