7.15 Defensive garbage collection
We tend to worry that crackers will destroy our
systems and make them unusable, but many operating systems are
programmed to do this to themselves! Few systems can
survive a full system disk, and yet many logging agents go on filling
up disks without ever checking to see how full they are getting. In
short, they choke themselves in a self-inflicted denial of service attack.
Cfagent can help here by rotating logs frequently and by tidying
temporary file directories:
disable:

   # Disabling these log files weekly prevents them from
   # growing so enormous that they fill the disk!

tidy:

   /tmp pattern=* age=1
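A weekly log-rotation entry might look like the following. This is only a minimal sketch, assuming cfengine 2 syntax; the file name /var/log/messages, the choice of Sunday and the value rotate=2 are illustrative assumptions, not taken from the text above:

```
disable:

   Sunday::

      # Hypothetical log file: substitute the logs that actually
      # grow on your hosts. rotate=2 keeps two old generations.
      /var/log/messages rotate=2
```

Placing the entry under a day-of-week class means the rotation is attempted only when cfagent runs on that day, so a weekly schedule needs no extra bookkeeping.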
Process garbage collection is just as important.
There are lots of reasons why process tables
fill up with unterminated processes. One example
is faulty X terminal software which does not kill
its children at logout. Another is that programs
like netscape and pine tend to go into loops from
which they never return, gradually loading the system
with an ever increasing glacial burden. Just killing
old processes can cause your system to spring back
from its ice age blues (hopefully without littering
the system with too many dead mammoths or bronze age
axe-bearers). If the host concerned has important
duties then this lack of responsiveness can compromise
key services. It also gives local users a way of carrying
out denial of service attacks on the system.
If users always log out at the end of the day and
log in again the day after, then this is easy to address
with cfengine. Here is some code to kill commonly hanging
processes. Note that on BSD-like systems the process options
"aux" are required to see the relevant processes:
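A minimal sketch of such a stanza, assuming cfengine 2 syntax. Only the two programs named above appear in the include lines; a real list would be longer and site-specific, and the choice of signal=kill is an assumption:

```
processes:

   # Matches process lines whose start time shows a month name
   "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

      signal=kill

      # Only lines that also contain one of these strings are signalled
      include=netscape
      include=pine
```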
The pattern works as follows: once a process is more than a day old,
the name of the month appears in its start
time. These month names are matched by the regular expression. The include
lines then filter the list of processes further, picking out
lines which contain the specified strings.
On some BSD-like systems the default ps option string is
"-ax" and you might need to reset it to something
which adds the start date in order to make this work.
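In cfengine 2 the option string can be reset inside the processes section with SetOptionString. A sketch, assuming a BSD-like ps for which "aux" produces a listing that includes the start date; check the local ps manual page before relying on it:

```
processes:

   # Ask ps for the long listing, which includes the start date column
   SetOptionString "aux"
```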
Another job for process management is to clean up processes which
have hung, gone amok or which are left over from old logins. Here
are two regular expressions which detect non-root processes which
have clocked up more than 100 hours of CPU time (or 100 minutes, where
ps reports CPU time as minutes:seconds). This is a depressingly
common phenomenon when a program goes into an infinite loop. It can
starve other processes of resources in a very efficient denial
of service attack.
processes:

   # Kill processes which have run on for too long, e.g. 999:99 cpu time.
   # Careful: a pattern that matches 99:99 will kill everything!

   "[0-9][0-9][0-9][0-9]:[0-9][0-9]" signal=term exclude=root
   "[0-9][0-9][0-9]:[0-9][0-9]" signal=term exclude=root

Under NT this is not so simple, since the process table for
the cygwin library applies only to processes which have been
started by programs working under the Unix process emulation.
Hopefully this shortcoming can be worked around at some point
in the future.
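Since a pattern that is one digit too short would match nearly every process, the CPU-time patterns can first be tried in warning mode. A sketch, assuming cfengine 2's action=warn option for process entries, which reports matches without sending a signal:

```
processes:

   # Dry run: report matching processes instead of signalling them
   "[0-9][0-9][0-9]:[0-9][0-9]" action=warn exclude=root
```

Once the reported matches look right on a given host, the entry can be switched back to signal=term.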