> Apache Security: Chapter 8. Logging and Monitoring


8 Logging and Monitoring

One of the most important tasks of an administrator is to configure a system to be secure, but it is also necessary to know it is secure. The only way to know a system is secure (and behaving correctly) is through informative and trustworthy log files. Though the security point of view is almost all we care about, we have other reasons to have good logs, such as to perform traffic analysis (which is useful for marketing) or to charge customers for the use of resources (billing and accounting).

Most administrators do not think about the logs much before an intrusion happens and only realize their configuration mistakes when it is discovered that critical forensic information is not available. In this chapter, we will cover the subjects of logging and monitoring, which are important to ensure the system records relevant information from a security perspective.

This chapter covers the following:

Apache can produce many types of logs. The two essential types are the access log, where all requests are noted, and the error log, which is designed to log various informational and debug messages, plus every exceptional event that occurs. Additional information can be found in module-specific logs, as is the case with mod_ssl, mod_rewrite and mod_security. The access log is created and written to by the module mod_log_config, which is not a part of the core, but this module is so important that everyone treats it as if it is.

You only need to be familiar with three configuration directives to manage request logging: LogFormat, TransferLog, and CustomLog.

In fact, you will need to use only two. The CustomLog directive is so flexible and easy to use that you will rarely need to use TransferLog in your configuration. (It will become clear why later.)

Other directives are available, but they are deprecated and should not be used because CustomLog can achieve all the necessary functionality. Some have been removed from Apache 2:

Before covering the process of logging to files, consider the format of our log files. One of the benefits of Apache is its flexibility when it comes to log formatting. All this is owed to the LogFormat directive, whose default is the following, referred to as the Common Log Format (CLF):

LogFormat "%h %l %u %t \"%r\" %>s %b" common

The first parameter is a format string indicating the information to be included in a log file and the format in which it should be written; the second parameter gives the format string a name. You can decipher the log format using the symbol table. The table is available from the Apache reference documentation (http://httpd.apache.org/docs-2.0/mod/mod_log_config.html). It is reproduced in Table 8-1.

Table 8-1. Standard logging format strings

Format string     Description
%%                The percent sign
%...a             Remote IP address
%...A             Local IP address
%...B             Bytes sent (excluding headers)
%...b             Bytes sent (excluding headers); a dash (-) is used instead of a zero
%...{Name}C       The contents of the cookie Name
%...D             Time taken to serve the request, in microseconds (Apache 2 only)
%...{Name}e       The contents of the environment variable Name
%...f             Filename
%...h             Remote host
%...H             Request protocol
%...{Name}i       The contents of the request header Name
%...l             Remote log name (from identd)
%...m             Request method
%...{Name}n       Contents of the note Name
%...{Name}o       Contents of the response header Name
%...p             Canonical port of the server
%...P             Process ID
%...{Format}P     Depending on Format, Process ID (pid) or thread ID (tid)
%...q             Query string
%...r             Request line
%...s             Response status
%...t             Time, in common log format
%...{Format}t     Time, in custom format
%...T             Time taken to serve the request, in seconds
%...u             Remote user
%...U             The URL, excluding the query string
%...v             Canonical server name
%...V             Server name according to UseCanonicalName directive
%...X             Connection status at the end of the request (“X” for aborted, “+” for persistent, and “-” for closed)

You have a lot of fields to play with. Format strings support optional parameters, as represented by the “...” in each format string representation in the table. Optional parameters can be used for the following actions:

Apache modules can collaborate on logging if they create a named note (a text string) and attach it to the request. If the %{note}n format string is used, the contents of the note will be written to the log. A change in the Apache architecture in the second generation allows for modules to collaborate and provide custom format strings. These format strings are available if the module that provides them is included in the configuration. (See Table 8-2.)

Table 8-2. Format string directives available only in Apache 2

Format string     Module       Description
%I                mod_logio    Total bytes received, on a network level
%O                mod_logio    Total bytes sent, on a network level
%{Variable}x      mod_ssl      The contents of the variable Variable
%{Variable}c      mod_ssl      Deprecated cryptography format function, included for backward compatibility with mod_ssl 1.3.x

With the inclusion of mod_logio, you can measure the number of bytes transferred for every request. This feature allows hosting providers to put accurate billing mechanisms in place. (With Apache 1, you can only record the size of the response body, leaving request headers, request body, and response headers unmeasured.)

Now that you are familiar with format strings, look at commonly used log formats (see Table 8-3). (You will need to define these formats in httpd.conf if they are not already there.)

Table 8-3. Commonly used log formats


Name                    LogFormat string
common (the default)    %h %l %u %t "%r" %>s %b
combined                %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
vcommon                 %v %h %l %u %t "%r" %>s %b
vcombined               %v %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"

Though you can create your own log format, you will most likely use one of the formats above since that is what web server log analyzers support. Nevertheless, the ability to create logs with a custom format is convenient for advanced uses, as we shall see later in this chapter.

TransferLog is the basic request logging directive, which creates an access log with the given filename:

TransferLog /var/www/logs/access_log

The filename can be given with an absolute path, as above; if a relative filename is supplied, Apache will create the full path by prepending the server home directory (e.g., /usr/local/apache).

By default, the TransferLog directive uses the Common Log Format (CLF), which logs every request on a single line, with the information formatted as described earlier. Here is an example of what such a line looks like:

- - [29/Jun/2004:14:36:04 +0100] "POST /upload.php HTTP/1.1" 200 3229
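Because every CLF entry has the same shape, it can be taken apart mechanically. The following is a minimal sketch in Python, not part of Apache. The function name, the field names, and the sample entry (including the documentation-only address 192.0.2.1) are assumptions made for illustration.

```python
import re

# One named group per field of "%h %l %u %t \"%r\" %>s %b".
# The group names are illustrative, not Apache terminology.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["status"] = int(fields["status"])
    # %b writes a dash instead of a zero when no body bytes were sent
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    return fields


# Hypothetical entry; 192.0.2.1 is a documentation-only address.
sample = ('192.0.2.1 - - [29/Jun/2004:14:36:04 +0100] '
          '"POST /upload.php HTTP/1.1" 200 3229')
entry = parse_clf(sample)
print(entry["host"], entry["request"], entry["status"], entry["size"])
```

The pattern assumes the request line contains no embedded double quotes. A production parser would have to handle escaped quotes as well.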

However, if a LogFormat directive has been used earlier in the configuration file, the TransferLog directive will use the format it defined and not the CLF. This is unexpected and can lead to errors since changing the order in which formats are defined can lead to a different format being used for the log files. I prefer not to use TransferLog, and instead use the CustomLog directive (which forces me to explicitly define the log format).

The real power comes from using the CustomLog directive. The equivalent to the TransferLog usage described above looks like this:

CustomLog /var/www/logs/access_log common

The explicit naming of the log format helps us avoid mistakes. I like this directive because of its conditional logging features. Have a look at the following configuration fragment:

# determine which requests are static - you may need to
# adjust the regular expression to exclude other files, such
# as PDF documents, or archives
SetEnvIfNoCase REQUEST_URI "\.(gif|png|jpg)$" static_request
# only log dynamic requests
CustomLog /var/www/logs/application_log combined env=!static_request
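Before deploying such a rule, it helps to check which URIs the regular expression catches. Here is a small sketch in Python that applies the same pattern, with case-insensitive matching standing in for the NoCase part of the directive. The function name and the sample URIs are made up for illustration, and Apache's own regular expression engine may differ from Python's in corner cases.

```python
import re

# Same pattern as in the SetEnvIfNoCase line above.
STATIC_REQUEST = re.compile(r"\.(gif|png|jpg)$", re.IGNORECASE)


def would_be_logged(request_uri):
    """True when static_request would not be set, so the entry is logged."""
    return STATIC_REQUEST.search(request_uri) is None


# Hypothetical request URIs.
for uri in ["/index.php", "/images/logo.GIF", "/docs/manual.pdf"]:
    print(uri, would_be_logged(uri))
```

The PDF request is still logged, which is why the comment in the configuration suggests extending the pattern.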

The conditional logging opens the door to many interesting logging opportunities, which really helps in real life. Most commonly, you will use mod_setenvif or mod_rewrite (which can also set environment variables) to determine what gets logged.

I mentioned that, by default, Apache uses the CLF, which does not record many request parameters. At the very least you should change the configuration to use the combined format, which includes the User-Agent and the Referer fields.

Looking at the log format string table shown in the LogFormat section, you can see over twenty different format strings, so even the use of a combined format results in loss of information. Create your own log format based on your information requirements. A nice example can be found at:

“Profiling LAMP Applications with Apache’s Blackbox Logs” by Chris Josephes (http://www.onlamp.com/pub/a/apache/2004/04/22/blackbox_logs.html)

In the article, Chris makes a case for a log format that allows for web serving troubleshooting and performance management. At the end, he names the resulting log format Blackbox.

The Apache error log contains error messages and information about events unrelated to request serving. In short, the error log contains everything the access log doesn’t:

The format of the error log is fixed. Each line essentially contains only three fields: the time, the error level, and the message. In some rare cases, you can get raw data in the error log (no time or error level). Apache 2 adds the Referer information to 404 responses noted in the error log.

Error logs are created using the ErrorLog configuration directive. Standard file naming conventions apply here; a relative filename will be assumed to be located in the server main folder.

ErrorLog /var/www/logs/error_log

The directive can be configured globally or separately for each virtual host. The LogLevel directive configures log granularity and keeps the log from containing more information than necessary. Its single parameter is one of the levels in Table 8-4. Events that are on the specified level or higher will be written to the log file.

Table 8-4. Error log levels




Level     Description
emerg     Emergencies (system unstable)
alert     Alerts to act on immediately
crit      Critical conditions
error     Error messages
warn      Warning messages
notice    Normal but significant conditions
info      Informational messages
debug     Debugging information

The default setting is warn. However, Apache always logs the messages of level notice when logging to text files. Some interesting messages are emitted on the informational level (e.g., that a client timed out on a connection, a potential sign of a DoS attack). Consider running the error log on the information level:

LogLevel info

Take some time to observe the error log to get a feeling as to what constitutes normal Apache behavior. Some messages seem dangerous but may not be.

On server startup, you will get a message similar to this one:

[Mon Jul 05 12:26:27 2004] [notice] Apache/2.0.50 (Unix) DAV/2
PHP/4.3.4 configured -- resuming normal operations

You will see a message to log the shutdown of the server:

[Mon Jul 05 12:27:22 2004] [notice] caught SIGTERM, shutting down

Most other relevant events will find their way to the error log as well.

The Apache error log is good at telling you that something bad has happened, but it may not contain enough information to describe it. For example, since it does not contain information about the host where the error occurred, it is difficult to share one error log between virtual hosts.

There is a way to get more informational error messages using the mechanism of custom logging. Here is an example:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{error-notes}n\"" commone
CustomLog logs/super_error_log commone

Most of the time, the error message that caused a request to fail is contained in the error-notes note. By adding the contents of that variable to the log line output to the access log, we can get any request detail we want and the error message at the same time. This trick does not remove a need for the error log but makes forensic log analysis much easier.

Apache processes should never crash, but when they do, a message such as the following will appear in the error log:

[Mon Jul  5 08:33:08 2004] [notice] child pid 1618 exit signal
Segmentation fault (11)

A segmentation fault appears because of an error in Apache code or because a hacker is taking advantage of the web server through a buffer overflow attack. Either way, this is bad and you have to find out why it is happening. Having frequent unexplained segmentation faults is a reason for concern.

Your first impulse after discovering a segmentation fault will probably be to find the request that caused it. Due to the inadequate format of the error log, this may be difficult. Segmentation fault messages appear only in the main error log and not in the virtual hosts. Finding the corresponding request log entry may prove difficult when hosting a server with more than a couple of virtual hosts since the information about which virtual host was being processed at the time is unavailable.

To make the matter worse, the request usually is not logged to the access log. The logging phase is one of the last phases of request processing to take place, so nothing is logged when the server crashes during one of the earlier phases.

The purpose of mod_forensics (shipped with Apache under the name mod_log_forensic; available since Versions 1.3.31 and 2.0.50) is to reveal the requests that make the server crash. It does that by having a special log file where requests are logged twice: once at the beginning and once at the end. A special utility script is used to process the log file. If a request appears only once in the log file, we know the server crashed before it could log the request for the second time.

To enable mod_forensics you also need to enable mod_unique_id. After you add the module to your configuration, decide where to put the new log file:

ForensicLog /var/www/logs/forensic_log

After restarting the server, the beginning of each request will be marked with a log of the request data (with headers but excluding the request body). Here is an example:

+QOmjHtmgtpkAADFIBBw|GET /cgi-bin/modsec-test.pl
Keep-Alive:300|User-Agent:Mozilla/5.0 %28Windows%3b U%3b Windows NT 5.1%3b
en-US%3b rv:1.7%29 Gecko/20040616

For each request that was properly handled, the unique identifier will be written to the log, too:

-QOmjHtmgtpkAADFIBBw


As you can see, a lot of data is being logged, so implement frequent log rotation for the forensic log. I don’t think it is a good idea to leave mod_forensics enabled on a production server because excessive logging decreases performance.
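The pairing logic is simple enough to sketch. The Python fragment below is an illustration, not the utility script mentioned earlier. It assumes each entry occupies one physical line, that a request's first entry starts with a plus sign followed by the unique identifier and a pipe character (as in the example above), and that the completion entry is the same identifier prefixed with a minus sign. The identifiers and requests in the sample are invented.

```python
def unfinished_requests(lines):
    """Return {identifier: request data} for requests never marked complete."""
    pending = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("+"):
            # "+<id>|<request line>|<header>:<value>|..."
            request_id, _, request_data = line[1:].partition("|")
            pending[request_id] = request_data
        elif line.startswith("-"):
            # "-<id>" marks a request that was fully processed
            pending.pop(line[1:], None)
    return pending


# Hypothetical forensic log contents.
sample_log = [
    "+idAAA|GET /index.html HTTP/1.1|Host:www.example.com",
    "-idAAA",
    "+idBBB|GET /cgi-bin/crash.pl HTTP/1.1|Host:www.example.com",
]
for request_id, request_data in unfinished_requests(sample_log).items():
    print(request_id, request_data)
```

Any identifier left in the result belongs to a request the server started but never finished, which makes it a candidate for the crash.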

The chances of catching the offending request with mod_forensics are good though in some rare instances this module will fail:

Once you figure out the request, you should determine which of the active modules causes it. Your goal here is to determine whether to contact the module author (for a third-party module) or the Apache developers at dev@apache.org (for standard modules).

If you have to continue on your own, consider the following tips:

One major disadvantage of Apache’s (and most other web servers’) logging facilities is that there is no way to observe and log request and response bodies. While most web application attacks take place through GET requests, that is only because they are performed (or programmed) by less capable attackers. The dangerous types will take the extra two minutes to craft a POST request, knowing the chances of the attack being logged are very small.

However, audit logging becomes a possibility with the help of mod_security (http://www.modsecurity.org). This module (described further in Chapter 12) adds audit logging configuration directives that can be placed almost anywhere in the configuration. It works with the main server, virtual servers, or in a directory context. To specify the audit log file and start audit logging, add the following to your configuration:

SecAuditEngine On
SecAuditLog /var/www/logs/audit_log

After the installation and configuration, you will be able to log the contents of those POST payloads for the first time. Below is an example of an individual audit log entry, where mod_security denied the request because a pattern “333” was detected in the request body. (“333” is not a real attack but something I often use for testing to make sure my configuration works.)

Request: - - [29/Jun/2004:12:04:05 +0100] "POST /cgi-bin/
HTTP/1.0" 500 539
Handler: cgi-script
POST /cgi-bin/modsec-test.pl HTTP/1.0
Connection: Close
Content-Length: 5
Content-Type: application/x-www-form-urlencoded
User-Agent: mod_security regression test utility
mod_security-message: Access denied with code 500. Pattern match 
"333" at POST_PAYLOAD.
mod_security-action: 500
HTTP/1.0 500 Internal Server Error
Connection: close
Content-Type: text/html; charset=iso-8859-1

The entry begins with a few request identifiers followed by the request headers and the request body, followed by the response headers. The module will automatically detect and make use of the unique ID generated by mod_unique_id. This variable can help track a request over several log files. Currently, the module does not support response body logging, though the filter architecture of Apache 2 allows for it.

The audit engine of mod_security supports several logging levels (configured with the SecAuditEngine directive):

An experimental feature in the Apache 2 version of mod_security adds performance measurement support. Measuring script performance can be difficult because the response is typically generated and transmitted back to the client concurrently. The only measure normally available is the total time it took to process a request. But that number does not mean much. For example, for a client accessing the server over a slow link (e.g., a modem connection), the processing time will be long but that does not indicate a fault.

You can measure performance of individual processes but only if you separate them first. This can be done if the response is not sent to the client as it is being generated. Instead, the response is kept in a memory buffer until generation is complete; this is called buffering. mod_security already introduces buffering into the request processing but for different reasons (security). With buffering in place, performance measurement becomes trivial. mod_security records elapsed time at three points for each request:

These measurements are useful when used in a custom log together with information provided by the mod_logio module, because to make sense of the numbers you need to know the number of bytes sent to (format string %I) and from (format string %O) the server:

CustomLog logs/timer_log "%t \"%r\" %>s - %I %O - \
%{mod_security-time1}n %{mod_security-time2}n \
%{mod_security-time3}n %D"

Each entry in the log will look something like this:

[19/Nov/2004:22:30:08 +0000] "POST /upload.php HTTP/1.1" 200
- 21155 84123 - 673761 687806 5995926 7142031

All times are given in microseconds, relative to the beginning of request processing. The following conclusions can be made out of the line given in the previous example (with the figures rounded to the nearest millisecond so they are easier to read):
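Since every reading is cumulative, the time spent in each phase is the difference between consecutive readings. The Python sketch below does that subtraction for the sample entry. The function name and the neutral phase labels are my own; what each interval represents depends on the three measurement points described earlier.

```python
def timing_breakdown(time1, time2, time3, total):
    """Turn cumulative microsecond readings into per-phase milliseconds."""
    def to_ms(microseconds):
        return round(microseconds / 1000)

    return {
        "start_to_time1": to_ms(time1),
        "time1_to_time2": to_ms(time2 - time1),
        "time2_to_time3": to_ms(time3 - time2),
        "time3_to_end": to_ms(total - time3),
    }


# The three mod_security readings and %D from the sample log entry.
phases = timing_breakdown(673761, 687806, 5995926, 7142031)
for name, milliseconds in phases.items():
    print(name, milliseconds, "ms")
```

Read next to the byte counts in the same entry (21155 bytes in, 84123 bytes out), the intervals show where the seven seconds went.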

Include the application logs on the list of logs you monitor. At the very least, you should integrate the logs of the application engine with the rest of the logs. For example, configuring PHP to send errors to the Apache error log (described in Chapter 3) removes one thing from the TODO list. For each application, you should do the following:

  1. Determine (from the documentation, or by talking to the programmers) what logs the application produces.

  2. Classify logs according to the material they contain. How sensitive are the application logs? They are often verbose and may contain passwords and credit card numbers.

  3. Implement log rotation.

Consider the following five recommendations to increase the security of your application logs:

  • The application logs will have to be written to by the web server processes and, thus, have to be owned by the web server user. Do not jeopardize the security of the main Apache logs because of that! Create a separate folder in which to keep the application logs and allow the web server process to write there.

  • Being owned by the web server user, application logs are in danger since an attacker will most likely come through the web server. To minimize the danger, implement a custom rotation script to periodically rotate the logs. The idea is to move the logs to a separate directory, change the ownership (to root), and change the permissions (so the web server user cannot get to them any more).

  • If the sensitive data in the log files is not needed (or is needed for a limited time only), consider removing it from the logs at the same time as the rotation.

  • If you can, move the logs from the server altogether. A complete discussion on centralized logging strategies can be found below.

  • If you cannot get the logs out of the server, consider encrypting them on a regular basis with a public encryption key (while not storing the private key on the same server).
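The second recommendation (move, change ownership, change permissions) can be sketched as a small Python helper. This is an illustration under stated assumptions, not a finished rotation script: the function name and parameters are hypothetical, the ownership change only works when the script runs as root, and the application must be able to create or reopen its log once the old file is gone.

```python
import os
import shutil
import stat


def archive_log(log_path, archive_dir, owner_uid=None, owner_gid=None):
    """Move a log out of the web server's reach and make it owner-only."""
    os.makedirs(archive_dir, exist_ok=True)
    target = os.path.join(archive_dir, os.path.basename(log_path))
    shutil.move(log_path, target)
    if owner_uid is not None and owner_gid is not None:
        # Changing ownership (for example, to root: uid 0, gid 0)
        # requires root privileges.
        os.chown(target, owner_uid, owner_gid)
    # Read and write access for the owner only.
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR)
    return target
```

A call such as archive_log("/var/www/app_logs/app.log", "/var/log/app_archive", 0, 0), with both paths being placeholders, would be run from cron as root.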

The default logging format is adequate to generate traffic statistics but inadequate for forensic analysis. We need to use the custom logging facility and design a log format that provides us with the information we need. By starting with the combined log format and adding more fields, we increase the information logged while retaining backward-compatibility with traffic analysis software.

We add six fields to the log format:

The new log format will be shown soon after discussing how the information needed for the additional fields may be obtained. For example, integration with applications is required to achieve adequate logging levels. This comes in two forms: usage of HTTP status codes and integration with PHP.

First, the application must make use of HTTP status codes other than 200 (which is used by default) where appropriate. These codes are very useful but not many applications utilize them. There are five code categories (see Table 8-5).

The 4XX category is particularly interesting and is the one we use the most (see Table 8-6).

With the status codes in mind, Table 8-7 presents the codes an application should return, given various events.

At first, I thought using the 401 status would be impossible since it would make the browser ask users to enter their credentials. Having done some tests, I determined that returning the status code alone (without the WWW-Authenticate header) is insufficient to trigger the authentication process. The 401 status can be used after all, and it appears in the access log.

When installed as a module, PHP integrates with Apache and allows direct communication between modules to take place. Other application engines may provide similar support. We will take advantage of the POST request body being available to the PHP code. We can, therefore, take the body and return it to Apache, along with other parameters known to the application (the username and the session identifier). This is possible because Apache has a feature called notes, which was specifically designed for inter-module communication.

The following code fragment sends some of the information from the PHP module to Apache, where the information is available for other modules to use. It creates three Apache notes (x_username, x_sessionid, and x_request) and sets one environment variable (x_log).

function inform_apache($username, $sessionid) {
    // reconstruct the first line of the request
    $request = $_SERVER["REQUEST_METHOD"];
    $request .= " " . $_SERVER["REQUEST_URI"];
    // add any available POST parameters
    if (count($_POST) != 0) {
        // some POST requests contain parameters in the URI
        if (strpos($request, "?") === false) $request .= "?";
        else $request .= "&";
        $count = 0;
        foreach($_POST as $name => $value) {
            if ($count != 0) $request .= "&";
            $request .= urlencode($name) . "=" . urlencode($value);
            $count++;
        }
    }
    $request .= " " . $_SERVER["SERVER_PROTOCOL"];
    // send the parameters to Apache through notes
    apache_note("x_username", $username);
    apache_note("x_sessionid", $sessionid);
    apache_note("x_request", $request);
    // set an environment variable to trigger
    // conditional logging
    apache_setenv("x_log", "true");
}

Sending a message from the application to the logging module can be useful. This can be done through a warning note:

function warn_apache($warning) {
    apache_note("x_warning", $warning);
}

Apache does a good job with log format definition, but some features are missing, such as log rotation and log compression. Some reasons given for their absence are technical, and some are political:

Of course, nothing prevents third-party modules from implementing any kind of logging functionality, including rotation. After all, the default logging is done through a module (mod_log_config) without special privileges. However, at the time of this writing no modules exist that log to files and support rotation. There has been some work done on porting Cronolog (see Section 8.2.2) to work as a module, but the beta version available on the web site has not been updated recently.

Piped logging is a mechanism used to offload log manipulation from Apache and onto external programs. Instead of giving a configuration directive the name of the log file, you give it the name of a program that will handle logs in real time. A pipe character is used to specify this mode of operation:

CustomLog "|/usr/local/apache/bin/piped.pl /var/www/logs/piped_log" combined

All logging directives mentioned so far support piped logging. Many third-party modules also try to support this way of logging.

External programs used this way are started by the web server and restarted later if they die. They are started early, while Apache is still running as root, so they are running as root, too. Bugs in these programs can have significant security consequences. If you intend to experiment with piped logging, you will find the following proof-of-concept Perl program helpful to get you started:

#!/usr/bin/perl
use IO::Handle;
# check input parameters
if ((!@ARGV)||($#ARGV != 0)) {
    print "Usage: piped.pl <log filename>\n";
    exit;
}
# open the log file for appending, configuring
# autoflush to avoid potential data loss
$logfile = shift(@ARGV);
open(LOGFILE, ">>$logfile") || die "Failed to open $logfile for writing";
LOGFILE->autoflush(1);
# handle log entries until the end
while (my $logline = <STDIN>) {
    print LOGFILE $logline;
}
close(LOGFILE);

If you prefer C to Perl, every Apache distribution comes with C-based piped logging programs in the support/ folder. Use these programs for skeleton source code.

Though the piped logging functionality serves the purpose of off-loading the logging task to an external program, it has some drawbacks:

Because no one has unlimited storage space available, logs must be rotated on a regular basis. No matter how large your hard disk, if you do not implement log rotation, your log files will fill the partition.

Log rotation is also very important to ensure no loss of data. Log data loss is one of those things you only notice when you need the data, and then it is too late.

There are two ways to handle log rotation:

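The correct procedure to rotate a log from a script is:

  1. Rename the log file.

  2. Gracefully restart the server.

  3. Wait long enough for the old processes to finish serving their requests.

  4. Process the renamed log file (for example, compress it).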

Here is the same procedure given in a shell script, with the added logic to keep several previous log files at the same location:

#!/bin/sh
cd /var/www/logs
mv access_log.3.gz access_log.4.gz
mv access_log.2.gz access_log.3.gz
mv access_log.1.gz access_log.2.gz
mv access_log access_log.1
/usr/local/apache/bin/apachectl graceful
sleep 600
gzip access_log.1

Without the use of piped logging, there is no way to get around restarting the server; it has to be done for it to re-open the log files. A graceful restart (that’s when Apache patiently waits for a child to finish with the request it is processing before it shuts it down) is recommended because it does not interrupt request processing. But with a graceful restart, the wait in step 3 becomes somewhat tricky. An Apache process doing its best to serve a client may hang around for a long time, especially when the client is slow and the operation is long (e.g., a file download). If you proceed to step 4 too soon, some requests may never be logged. A waiting time of at least 10 minutes is recommended.

Many Linux distributions come with a utility called logrotate, which can be used to rotate all log files on a machine. This handy program takes care of most of the boring work. To apply the Apache log rotation principles to logrotate, place the configuration code given below into a file /etc/logrotate.d/apache and replace /var/www/logs/* with the location of your log files, if different:

/var/www/logs/* {
    # rotate monthly
    monthly
    # keep nine copies of the log
    rotate 9
    # compress logs, but with a delay of one rotation cycle
    compress
    delaycompress
    # restart the web server only once, not for
    # every log file separately
    sharedscripts
    # gracefully restart Apache after rotation
    postrotate
        /usr/local/apache/bin/apachectl graceful > /dev/null 2> /dev/null
    endscript
}

Use logrotate with the -d switch to make it tell you what it wants to do to log files without doing it. This is a very handy tool to verify logging is configured properly.

There are two schools of thought regarding Apache log configurations. One is to use the CustomLog and ErrorLog directives in each virtual host container, which creates two files per virtual host. This is a commonsense approach that works well but has two drawbacks:

To overcome these problems, the second school of thought regarding configuration was formed. The idea is to have only two files for all virtual hosts and to split the logs (creating one file per virtual host) once a day. Log post-processing can be performed just before the splitting. This is where the vcombined access log format comes into play. The first field on the log line, the hostname, is used to determine to which virtual host the entry belongs. But the problem is the format of the error log is fixed; Apache does not allow its format to be customized, and we have no way of knowing to which host an entry belongs.
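The access log half of the splitting step needs very little code. The Python sketch below groups entries by their first field, assuming the vcombined format in which the hostname comes first and is separated from the rest of the line by a single space. The function name and the sample entries (with documentation-only hostnames and addresses) are invented for illustration. A real script would write each group to its own file instead of keeping it in memory.

```python
from collections import defaultdict


def split_by_vhost(lines):
    """Group vcombined entries by their first field, the virtual host name."""
    per_host = defaultdict(list)
    for line in lines:
        host, separator, rest = line.partition(" ")
        if not separator:
            continue  # skip blank or malformed lines
        per_host[host].append(rest)
    return dict(per_host)


# Hypothetical vcombined entries.
sample = [
    'www.example.com 192.0.2.1 - - [05/Jul/2004:12:26:27 +0100] '
    '"GET / HTTP/1.1" 200 512 "-" "-"',
    'shop.example.com 192.0.2.7 - - [05/Jul/2004:12:26:30 +0100] '
    '"GET /cart HTTP/1.1" 200 2048 "-" "-"',
    'www.example.com 192.0.2.1 - - [05/Jul/2004:12:26:31 +0100] '
    '"GET /a.css HTTP/1.1" 304 - "-" "-"',
]
for host, entries in split_by_vhost(sample).items():
    print(host, len(entries))
```

Stripping the hostname leaves each group in plain combined format, which is what log analyzers expect.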

One way to overcome this problem is to patch Apache to put a hostname at the beginning of every error log entry. One such patch is available for download from the Glue Logic web site (http://www.gluelogic.com/code/apache/). Apache 2 offers facilities to third-party modules to get access to the error log so I have written a custom module, mod_globalerror, to achieve the same functionality. (Download it from http://www.apachesecurity.net/.)

Logging to the local filesystem on the same server is fine when it is the only server you have. Things get complicated as the number of servers rises. You may find yourself in one or more of the following situations:

The solution is usually to introduce a central logging host to the system, but there is no single ideal solution. I cover several approaches in the following sections.

Logging via syslog is the default approach for most system administrators. The syslog protocol (see RFC 3164 at http://www.ietf.org/rfc/rfc3164.txt) is simple and has two basic purposes:

Since all Unix systems come with syslog preinstalled, it is fairly easy to start using it for logging. A free utility, NTsyslog (http://ntsyslog.sourceforge.net), is available to enable logging from Windows machines.

The most common path a message will take starts with the application, through the local daemon, and across the network to the central logging host. Nothing prevents applications from sending UDP packets across the network directly, but it is often convenient to funnel everything to the localhost and decide what to do with log entries there, at a single location.

Apache supports syslog logging directly only for the error log. If the special keyword syslog is specified, all error messages will go to the syslog:

ErrorLog syslog:facility

The facility is an optional parameter, but you are likely to want to use it. Every syslog message consists of three parts: priority, facility, and the message. Priority can have one of the following eight values: debug, info, notice, warning, error, crit, alert, and emerg. Apache will set the message priority according to the seriousness of the message. Message facility is of interest to us because it allows messages to be grouped. Possible values for facility are the following: auth, authpriv, cron, daemon, kern, lpr, mail, mark, news, security, syslog, user, uucp, and local0 through local7. You can see many Unix legacy names on the list. Local facilities are meant for use by user applications. Because we want only Apache logs to go to the central server, we will choose an unused facility:

ErrorLog syslog:local4

We then configure syslog to single out Apache messages (that is, those with facility local4) and send them to the central logging host. You need to add the following lines at the bottom of /etc/syslog.conf (assuming the central logging host occupies the address

# Send web server error messages to the central host

At the remote server, the following addition to /etc/syslog.conf makes local4 log entries go into a single file:

local4.*    /var/www/logs/error_log

To send access log entries to syslog, you must use piped logging. One way of doing this is through the logger utility (normally available on every Unix system):

CustomLog "|/usr/bin/logger -p local5.info" combined

I have used the -p switch to assign the priority and the facility to the syslog messages. I have also used a different facility (local5) for the access log to allow syslog to differentiate the access log messages from the error log messages. If more flexibility is needed, send the logs to a simple Perl script that processes them and optionally sends them to syslog. You can write your own script using the skeleton code given in this chapter, or you can download, from this book’s web site, the one I have written.
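As a rough illustration of what such a script could do with each line, here is a minimal sketch built on the Sys::Syslog module that ships with Perl. It is not the script from the book's web site. The priority_for and forward functions, the policy of raising the priority for 5xx responses, the httpd identifier, and the sample log lines are all assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Sys::Syslog qw(openlog syslog closelog);

# Hypothetical policy: requests that ended in a server error (5xx)
# get a higher priority than everything else.
sub priority_for {
    my ($line) = @_;
    # in the combined format the status code follows the quoted request
    return ($line =~ /" (\d{3}) / && $1 >= 500) ? "warning" : "info";
}

# Read log lines from a filehandle and pass each one to syslog
# under the local5 facility used for the access log.
sub forward {
    my ($fh) = @_;
    openlog("httpd", "pid", "local5");
    while (my $line = <$fh>) {
        chomp $line;
        syslog(priority_for($line), "%s", $line);
    }
    closelog();
}

# Show the priority chosen for two illustrative log lines.
for my $line (
    '192.168.2.3 - - [10/Oct/2004:13:55:36 +0100] "GET / HTTP/1.0" 200 512 "-" "-"',
    '192.168.2.3 - - [10/Oct/2004:13:55:37 +0100] "GET /app HTTP/1.0" 500 0 "-" "-"',
) {
    print priority_for($line) . "\n";
}
```

To use the sketch as a piped logger, the demonstration loop at the end would be replaced with a call to forward(\*STDIN), and the script would be named in a CustomLog directive in the same way as logger above.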

Not everyone uses syslog, because the syslog transport protocol has three drawbacks:

On top of all this, the default daemon (syslogd) is inadequate for anything but the simplest configurations. It supports few transport modes and practically no filtering options.

Attempts have been made to improve the protocol (RFC 3195, for example) but adoption of such improvements has been slow. It seems that most administrators who decide on syslog logging choose to resolve the problems listed above by using Syslog-NG (http://www.balabit.com/products/syslog_ng/) and Stunnel (http://www.stunnel.org). Syslog-NG introduces reliable logging via TCP, which is nonstandard but does the job when Syslog-NG is used on all servers. Adding Stunnel on top of that solves the authentication and confidentiality problems. The combination of these two programs is the recommended solution for automated, reliable, and highly secure logging.

Chapter 12 of Linux Server Security by Michael D. Bauer, which covers system log management and monitoring and includes detailed coverage of Syslog-NG, is available for free download from O’Reilly (http://www.oreilly.com/catalog/linuxss2/ch12.pdf).

Remember how I said that some developers do not believe the web server should be wasting its time with logging? Well, some people believe in the opposite. A third-party module, mod_log_sql, adds database-logging capabilities to Apache. The module supports MySQL, and support for other popular databases (such as PostgreSQL) is expected. To obtain this module, go to http://www.outoforder.cc/projects/apache/mod_log_sql.

The module comes with comprehensive documentation and I urge you to read through it before deciding whether to use the module. There are many reasons to choose this type of logging but there are also many reasons against it. The advantage of having the logs in the database is you can use ad-hoc queries to inspect the data. If you have a need for that sort of thing, then go for it.

After you configure the database to allow connections from the web server, the change to the Apache configuration is simple:

# Enable the required modules
LoadModule log_sql_module modules/mod_log_sql.so
LoadModule log_sql_mysql_module modules/mod_log_sql_mysql.so
# The location of the database where logs will be stored
LogSQLLoginInfo mysql://user:pass@ 
# Automatically create tables in the database
LogSQLCreateTables on
# The name of the access_log table
LogSQLTransferLogTable access_log
# Define what is logged to the database table
LogSQLTransferLogFormat AbHhmRSsTUuv

After restarting the server, all your logs will go into the database. I find the idea of putting the logs into a database very interesting, but it also makes me uneasy; I am not convinced this type of data should be inserted into the database in real-time. mod_log_sql is a fast module, and it achieves good performance by having each child open its own connection to the database. With the Apache process model, this can turn into a lot of connections.

Another drawback is that you can create a central bottleneck out of the database logging server. After all, a web server can serve pages faster than any database can log them. Also, none of the web statistics applications can access the data in the database, and you will have to export the logging data as text files to process it. The mod_log_sql module comes with a utility for doing this export.

Though I am not quite convinced this is a good solution for all uses, I am intrigued by the possibility of using database logging only for security purposes. Continue logging to files and log only dynamic requests to the database:

LogSQLRequestAccept .html .php

With this restriction, the load on the database should be a lot smaller. The volume of data will also be smaller, allowing you to keep the information in the database longer.

Every once in a while, one encounters a technology for which the only word to describe it is “cool.” This is the case with the Spread Toolkit (http://www.spread.org), a reliable messaging toolkit. Specifically, we are interested in one application of the toolkit, mod_log_spread (http://www.backhand.org/mod_log_spread/).

The Spread Toolkit is cool because it allows us to create rings of servers that participate in reliable conversation. It is not very difficult to set up, and it almost feels like magic when you see the effects. Though Spread is a generic messaging toolkit, it works well for logs, which are, after all, only messages.

Though the authors warn about complexity, the installation process is easy provided you perform the steps in the correct order:

In our example Spread configuration, we will have four instances of spread, three web servers with mod_log_spread running and one instance of spreadlogd. We specify the ring of machines using their names and IP addresses in the spread.conf file:

Spread_Segment {

In the Apache configuration on each web server, we let the modules know the port the Spread daemon is listening on. We send the logs to a spread group called access:

SpreadDaemon 4803
CustomLog $access vcombined

The purpose of the spreadlogd daemon is to collect everything sent to the access group into a file. The configuration (spreadlogd.conf) is self-explanatory:

BufferSize = 65536
Spread {
    Port = 4803
    Log {
        RewriteTimestamp = CommonLogFormat
        Group = access
        File = access_log
    }
}

With this configuration in place, the three web servers send their logs to the Spread ring over the network. All members of the ring receive all messages, and the group names are used to differentiate one class of messages from another. One member of the ring is the logging daemon, and it writes the logs into a single file. The problem of cluster logging is elegantly solved.

The beauty of Spread is its flexibility. I have used only one logging group in the configuration above, but there can be any number of groups, each addressed to a different logging daemon. And it is not required to have only one logging daemon; two or more such daemons can be configured to log the same group, providing redundancy and increasing availability.

On top of all this, the authors mention speed improvements in the range of 20 to 30 percent for busy web servers. Though Spread does offer virtual hosting support, it does not work well with a large number of messaging groups. I do not see this as a problem since a sensible logging strategy is to use a logging format where the hostname is a part of the logging entry, and split logs into per-virtual host files on the logging server.
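Splitting a combined stream into per-virtual host files on the logging server can be sketched as follows. The sketch assumes the virtual host name is the first field of every entry (that is, a log format that starts with %v); the logfile_for function, the .access_log suffix, and the sample lines are made up for illustration. Because the name taken from the log entry ends up in a file name, anything that does not look like a hostname is rejected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Work out which per-host file a log line belongs to. Assumes the
# virtual host name is the first field of the line. Returns undef
# for lines that do not carry a usable name.
sub logfile_for {
    my ($line) = @_;
    my ($vhost) = $line =~ /^(\S+)\s/ or return;
    $vhost = lc $vhost;
    # the name becomes part of a file name, so accept only
    # characters that are valid in hostnames
    return unless $vhost =~ /^[a-z0-9][a-z0-9.-]*$/;
    return "$vhost.access_log";
}

# Group illustrative lines by destination file instead of writing
# them, to keep the sketch free of side effects.
my %lines_for;
for my $line (
    'www.example.com 192.168.2.3 - - [10/Oct/2004:13:55:36 +0100] "GET / HTTP/1.0" 200 512',
    'shop.example.com 192.168.2.4 - - [10/Oct/2004:13:55:37 +0100] "GET /cart HTTP/1.0" 200 734',
    '../../etc/passwd 192.168.2.5 - - [10/Oct/2004:13:55:38 +0100] "GET / HTTP/1.0" 400 0',
) {
    my $file = logfile_for($line);
    push @{ $lines_for{$file} }, $line if defined $file;
}
print "$_: " . scalar(@{ $lines_for{$_} }) . " line(s)\n" for sort keys %lines_for;
```

The third sample line is dropped because its first field is not a valid hostname. A real splitter would more likely write such lines to a catch-all file than discard them.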

The module does not support error logging (because it cannot be done on Apache 1 without patching the core of the server) but a provided utility script error_log_spread.pl can be used, together with piped logging.

mod_log_spread only works with Apache 1 at the moment. This is not a problem since we have the piped logging route as a choice. Besides, as just mentioned, mod_log_spread does not support error logging, so you would have to use piped logging on a production system anyway. To support Apache 2, I have slightly improved the error_log_spread.pl utility script, adding a -c switch to force a copy of the logs to be stored on a local filesystem. This is necessary because error logs are often needed there on the server for diagnostic purposes. The switch makes sense only when used for the error log:

CustomLog "|/usr/local/apache/bin/log_spread.pl -g access" vcombined
ErrorLog "|/usr/local/apache/bin/log_spread.pl -g error -c /var/www/

After covering the mechanics of logging in detail, one question remains: which strategy do we apply? That depends on your situation and no single perfect solution exists. Use Table 8-8 as a guideline.

Here is some general advice about logging:

Successful log analysis begins long before the need for it arises. It starts with the Apache installation, when you are deciding what to log and how. By the time something that requires log analysis happens, you should have the information to perform it.

A complete log analysis strategy consists of the following steps:

Log analysis is a long and tedious process. It involves looking at large quantities of data trying to make sense out of it. Traditional Unix tools (e.g., grep, sed, awk, and sort) and the command line are very good for text processing and, therefore, are a good choice for log file processing. But they can be difficult to use with web server logs because such logs contain a great deal of information. The bigger problem is that attackers often utilize evasion methods that must be taken into account during analysis, so a special tool is required. I have written one such tool for this book: logscan.

logscan parses log lines and allows field names to be used with regular expressions. For example, the following will examine the access log and list all requests whose status code is 500:

$ logscan access_log status 500

The parameters are the name of the log file, the field name, and the pattern to be used for comparison. By default, logscan understands the following field names, listed in the order in which they appear in access log entries:

logscan also attempts to counter evasion techniques by performing the following operations against the request_uri field:
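The idea of matching a regular expression against a named field, rather than against the whole line, can be sketched in a few lines of Perl. This is not logscan itself, and it performs no evasion handling. Apart from status, which appears in the command above, the field names are assumptions, as are the parse_combined and field_matches functions and the sample log line. The sketch understands only the combined log format.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical field names for the combined log format, in the
# order in which the fields appear in a log line.
my @FIELDS = qw(remote_addr remote_logname remote_user time
                request status bytes_out referer user_agent);

# Split one combined-format line into a hash of named fields.
# Returns undef if the line does not look like a combined log entry.
sub parse_combined {
    my ($line) = @_;
    my @values = $line =~
        /^(\S+) (\S+) (\S+) \[([^\]]+)\] "(.*?)" (\d{3}) (\S+) "(.*?)" "(.*)"\s*$/
        or return;
    my %entry;
    @entry{@FIELDS} = @values;
    return \%entry;
}

# True if the named field of a parsed entry matches the pattern.
sub field_matches {
    my ($entry, $field, $pattern) = @_;
    return defined $entry->{$field} && $entry->{$field} =~ /$pattern/;
}

# Illustrative log line (made up for this example).
my $sample = '192.168.2.3 - - [10/Oct/2004:13:55:36 +0100] '
           . '"GET /index.php?id=1 HTTP/1.0" 500 1024 "-" "Mozilla/4.0"';

my $entry = parse_combined($sample);
print "$entry->{request}\n"
    if $entry && field_matches($entry, 'status', '^500$');
```

Parsing first and matching second is what keeps a pattern such as 500 from matching a response size or a part of the URI by accident, which is the main weakness of running grep over raw log lines.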

You will find the following web server log forensics resources useful:

The key to running a successful project is to be in control. System information must be collected regularly for historical and statistical purposes, and real-time notification must be in place for when something goes wrong.

The first thing to consider when it comes to event monitoring is whether to implement real-time monitoring. Real-time monitoring sounds fancy, but unless an effort is made to turn it into a useful tool, it can do more harm than good. Imagine the following scenario:

This is real-time monitoring gone bad. Real problems often go undetected because of too many false positives. A similar lesson can be learned from the next example, too:

The two cases I have just described are not something I invented to prove a point. There are numerous administrative and development teams suffering like that. These problems can be resolved by following four rules:

One way to implement periodic monitoring is to use the concept of Artificial Ignorance invented by Marcus J. Ranum. (The original email message on the subject is at http://www.ranum.com/security/computer_security/papers/ai/.) The process starts with raw logs and goes along the following lines:

The idea is to uncover a specific type of event, but without the specifics. The numerical value is used to assess the seriousness of the situation. Here is the same logic implemented as a Perl script (I call it error_log_ai) that you can use:

#!/usr/bin/perl -w
# loop through the lines that are fed to us
while (defined($line = <STDIN>)) {
    # ignore "noisy" lines
    if (!( ($line =~ /Processing config/)
        || ($line =~ /Server built/)
        || ($line =~ /suEXEC/) )) {
        # remove unique features of log entries
        $line =~ s/^\[[^]]*\] //;
        $line =~ s/\[client [^]]*\] //;
        $line =~ s/\[unique_id [^]]*\]//;
        $line =~ s/child pid [0-9]*/child pid X/;
        $line =~ s/child process [0-9]*/child process X/;
        # add to the list for later
        push(@lines, $line);
    }
}
@lines = sort @lines;
# replace multiple occurrences of the same line
$count = 0;
$prevline = "";
foreach $line (@lines) {
    next if ($line =~ /^$/);
    if (!($line eq $prevline)) {
        if ($count != 0) {
            $prefix = sprintf("%5i", $count);
            push @outlines, "$prefix $prevline";
        }
        $count = 1;
        $prevline = $line;
    } else {
        $count++;
    }
}
# record the final group of identical lines
if ($count != 0) {
    $prefix = sprintf("%5i", $count);
    push @outlines, "$prefix $prevline";
}
undef @lines;
@outlines = sort @outlines;
print "--httpd begin------\n";
print reverse @outlines;
print "--httpd end--------\n";

The script is designed to take input from stdin and send output to stdout, so it is easy to use it on the command line with any other script:

# cat error_log | error_log_ai.pl | mail ivanr@webkreator.com

From the following example of daily output, you can see how a long error log file was condensed into a few lines that can tell you what happened:

--httpd begin------
  38 [notice] child pid X exit signal Segmentation fault (11)
  32 [info] read request line timed out
  24 [error] File does not exist: /var/www/html/403.php
  19 [warn] child process X did not exit, sending another SIGHUP
   6 [notice] Microsoft-IIS/5.0 configured -- resuming normal operations
   5 [notice] SIGHUP received.  Attempting to restart
   4 [error] File does not exist: /var/www/html/test/imagetest.GIF
   1 [info] read request headers timed out
--httpd end--------

Simple Event Correlator (SEC, available from http://www.estpak.ee/~risto/sec/) is the tool to use when you want to implement a really secure system. Do not let the word “simple” in the name fool you; SEC is a very powerful tool. Consequently, it can be a bit difficult to configure.

It works on the same principles as Swatch, but it keeps track of events and uses that information when evaluating future events. I will give a few examples of SEC to demonstrate its capabilities.

SEC is based around several types of rules, which are applied to events. The rule types and their meanings are:

Do not worry if this looks confusing. Read it a couple of times and it will start to make sense. I have prepared a couple of examples to put the rules above in the context of what we do here.

The following two rules cause SEC to wait for a nightly backup and alert the administrator if it does not happen:

# At 01:59 start waiting for the backup operation
# that takes place at 02:00 every night. The time is
# in a standard cron schedule format.
type = Calendar 
time = 59 1 * * *
action = event %s
# This rule will be triggered by the previous rule
# it will wait for 31 minutes for the backup to
# arrive, and notify the administrator if it doesn't
type = PairWithWindow
ptype = SubStr
action = shellcmd notify.pl "%s"
ptype2 = SubStr 
action2 = none
window = 1860

The following rule counts the number of failed login attempts and notifies the administrator should the number of attempts become greater than six in the last hour. The shell script could also be used to disable login completely from that IP address.

type = SingleWithThreshold
ptype = RegExp
pattern = LOGIN FAILED, IP=([0-9.]+)
window = 3600
thresh = 6
desc = Login failed from IP: $1
action = shellcmd notify.pl "Too many login attempts from: $1"

SEC uses the description of the event to distinguish between series of events. Because I have included the IP address in the preceding description, the rule, in practice, monitors each IP address. Therefore, it may be a good idea to add another rule to watch the total number of failed login attempts during a time interval:

type = SingleWithThreshold
ptype = RegExp
pattern = LOGIN FAILED, IP=([0-9.]+)
window = 3600
thresh = 24
desc = Login failed (overall)
action = shellcmd notify.pl "Too many login attempts"

This rule would detect a distributed brute-force hacking attempt.
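The way the event description selects a counter can be sketched in Perl. This is a simplified sliding-window counter written for illustration, not SEC's actual algorithm; the make_threshold function, the timestamps, and the address are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns a closure that records events and reports when the number
# of events sharing a description reaches $thresh within $window seconds.
sub make_threshold {
    my ($window, $thresh) = @_;
    my %seen;    # description => list of event timestamps
    return sub {
        my ($desc, $now) = @_;
        my $times = $seen{$desc} ||= [];
        push @$times, $now;
        # forget events that have fallen out of the window
        shift @$times while @$times && $times->[0] <= $now - $window;
        if (@$times >= $thresh) {
            @$times = ();    # start counting afresh after an alert
            return 1;
        }
        return 0;
    };
}

# Per-address rule: the description contains the IP address,
# so every address gets its own counter.
my $per_ip = make_threshold(3600, 6);
my $alerts = 0;
for my $i (1 .. 6) {
    my $ip = "192.168.2.3";    # illustrative address
    $alerts++ if $per_ip->("Login failed from IP: $ip", 1000 + $i * 60);
}
print "alerts raised: $alerts\n";
```

Passing a constant description such as "Login failed (overall)" to the same closure makes all addresses share one counter, which is the difference between the two rules above.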

In an ideal world, you would monitor your Apache installations via a Network Management System (NMS) as you would monitor other network devices and applications. However, Apache does not support Simple Network Management Protocol (SNMP). (There is a commercial version of the server, Covalent Apache, that does.) There are two third-party modules that implement limited SNMP functionality:

My experiences with these modules are mixed. The last time I tried mod_snmp, it turned out the patch did not work well when applied to recent Apache versions.

In the absence of reliable SNMP support, we will have to use the built-in module mod_status for server monitoring. Though this module helps, it comes at the cost of having to build our own tools to automate monitoring. The good news is that I have built the tools, which you can download from the book’s web site.

The configuration code for mod_status is probably present in your httpd.conf file (unless you have created the configuration file from scratch). Find and uncomment the code, replacing the YOUR_IP_ADDRESS placeholder with the IP address (or range) from which you will be monitoring the server:

# increase information presented
ExtendedStatus On
<Location /server-status>
    SetHandler server-status
    Order Deny,Allow
    Deny from all
    # you don't want everyone to see what
    # the web server is doing
    Allow from YOUR_IP_ADDRESS
</Location>

When the location specified above is opened in a browser on a machine within the allowed range, you get the details of the server status. The Apache Foundation has made their server status public (via http://www.apache.org/server-status/), and since their activity is more interesting than anything I have, I used it for the screenshot shown in Figure 8-1.

There is plenty of information available; you can even see which requests are being executed at that moment. This type of output can be very useful for troubleshooting, but it does not help us with our primary requirement, which is monitoring. Fortunately, if the string ?auto is appended to the URL, a different type of output is produced. The example screenshot is given in Figure 8-2. This type of output is easy to parse with a computer program.

In the following sections, we will build a Perl program that collects information from a web server and stores the information in an RRD file. We will discuss another Perl program that can produce fancy activity graphs. Both programs are available from the web site for this book.

We need to understand what data we have available. Looking at the screenshot (Figure 8-2), the first nine fields are easy to spot since each is presented on its own line. Then comes the scoreboard, which lists all processes (or threads) and tells us what each process is doing. The legend can be seen in the first screenshot, Figure 8-1. The scoreboard is not useful to us in the given format but we can count how many times each activity occurs in the scoreboard and create 10 more variables for storing this information. Therefore, we have a total of 19 variables that contain information obtained from the mod_status machine-parsable output.

First, we write the part of the Perl program that fetches and parses the mod_status output. By relying on existing Perl libraries for HTTP communication, our script can work with proxies, support authentication, and even access SSL-protected pages. The following code fetches the page specified by $url:

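# fetch the page
use LWP::UserAgent;
use HTTP::Request;
my $ua = LWP::UserAgent->new;
my $request = HTTP::Request->new(GET => $url);
my $response = $ua->request($request);
die "Cannot fetch $url: " . $response->status_line . "\n"
    unless $response->is_success;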

Parsing the output is fairly simple. Watch out for the incompatibility between the mod_status output in Apache 1 and Apache 2.

# Fetch the named fields first
# Set the results associative array. Each line in the file
# results in an element in the array. Each element
# has a key that is the text preceding the colon in a line 
# of the file, and a value that is whatever appears after
# any whitespace after the colon on that line.
my %results = split/:\s*|\n/, $response->content;
# There is a slight incompatibility between
# Apache 1 and Apache 2, so the following makes
# the results consistent between the versions. Apache 2 uses
# the term "BusyWorkers" where Apache 1 uses "BusyServers".
if ($results{"BusyServers"}) {
    $results{"BusyWorkers"} = $results{"BusyServers"};
    $results{"IdleWorkers"} = $results{"IdleServers"};
}
# Count the occurrences of certain characters in the scoreboard
# by using the translation operator to find and replace each
# particular character (with itself) and return the number of
# replacements.
$results{"s_ _"} = $results{"Scoreboard"} =~ tr/_/_/;
$results{"s_s"} = $results{"Scoreboard"} =~ tr/S/S/;
$results{"s_r"} = $results{"Scoreboard"} =~ tr/R/R/;
$results{"s_w"} = $results{"Scoreboard"} =~ tr/W/W/;
$results{"s_k"} = $results{"Scoreboard"} =~ tr/K/K/;
$results{"s_d"} = $results{"Scoreboard"} =~ tr/D/D/;
$results{"s_c"} = $results{"Scoreboard"} =~ tr/C/C/;
$results{"s_l"} = $results{"Scoreboard"} =~ tr/L/L/;
$results{"s_g"} = $results{"Scoreboard"} =~ tr/G/G/;
$results{"s_i"} = $results{"Scoreboard"} =~ tr/I/I/;

After writing this code, I realized some of the fields mod_status gave me were not very useful. ReqPerSec, BytesPerSec, and BytesPerReq are calculated over the lifetime of the server and practically remain constant after a certain time period elapses. To get around this problem, I decided to keep the output from the previous run and manually create the statistics by comparing the values of the Total Accesses and Total kBytes fields, as appropriate, in relation to the amount of time between runs. The code for doing this can be seen in the program (apache-monitor) on the book’s web site.
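The calculation might look like the following sketch. The interval_rates function and the keys used for a stored sample (time, accesses, kbytes) are assumptions made for illustration; the apache-monitor program on the book's web site may do this differently. Counters that go backward are treated as a sign that the server was restarted between runs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compute request and byte rates for the interval between two samples.
# Each sample is a hash reference with the keys time (seconds since
# the epoch), accesses (Total Accesses), and kbytes (Total kBytes).
# Returns an empty list when no meaningful rate can be calculated.
sub interval_rates {
    my ($prev, $curr) = @_;
    my $elapsed  = $curr->{time}     - $prev->{time};
    my $requests = $curr->{accesses} - $prev->{accesses};
    my $bytes    = ($curr->{kbytes}  - $prev->{kbytes}) * 1024;

    # no time has passed, or the counters were reset by a restart
    return if $elapsed <= 0 || $requests < 0 || $bytes < 0;

    return (
        ReqPerSec   => $requests / $elapsed,
        BytesPerSec => $bytes / $elapsed,
        BytesPerReq => $requests ? $bytes / $requests : 0,
    );
}

# Illustrative values: two samples taken 60 seconds apart.
my %rates = interval_rates(
    { time => 1000, accesses => 5000, kbytes => 20000 },
    { time => 1060, accesses => 5300, kbytes => 20600 },
);
printf "%.1f req/s, %.0f bytes/s, %.0f bytes/req\n",
    @rates{qw(ReqPerSec BytesPerSec BytesPerReq)};
```

When the function returns nothing, a caller can store the value U (unknown) in the RRD file for that interval rather than a misleading zero.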

Next, we store the data into an RRD file so that it can be processed by an RRD tool. We need to test to see if the desired RRD file (specified by $rrd_name in the following) exists and create it if it does not:

if (! -e $rrd_name) {
    # create the RRD file since it does not exist
    RRDs::create($rrd_name,
        # store data at 60 second intervals
        "-s 60",
        # data fields. Each line defines one data source (DS)
        # that stores the measured value (GAUGE) at maximum 10 minute
        # intervals (600 seconds), and takes values from zero
        # to infinity (U). One DS line is needed for each of the
        # 19 variables; only one scoreboard counter is shown here.
        "DS:sc__:GAUGE:600:0:U",
        # keep 10080 original samples (one week of data,
        # since one sample is made every minute); AVERAGE with
        # an xfiles factor of 0.5 is assumed in both RRA lines
        "RRA:AVERAGE:0.5:1:10080",
        # keep 8760 values calculated by averaging every
        # 60 original samples (each calculated value covers one
        # hour, so that comes to one year)
        "RRA:AVERAGE:0.5:60:8760"
    );
}

Finally, we add the data to the RRD file:

RRDs::update($rrd_name, $time
    . ":" . $results{"Total Accesses"}
    . ":" . $results{"Total kBytes"}
    . ":" . $results{"CPULoad"}
    . ":" . $results{"Uptime"}
    . ":" . $results{"ReqPerSec"}
    . ":" . $results{"BytesPerSec"}
    . ":" . $results{"BytesPerReq"}
    . ":" . $results{"BusyWorkers"}
    . ":" . $results{"IdleWorkers"}
    . ":" . $results{"s_ _"}
    . ":" . $results{"s_s"}
    . ":" . $results{"s_r"}
    . ":" . $results{"s_w"}
    . ":" . $results{"s_k"}
    . ":" . $results{"s_d"}
    . ":" . $results{"s_c"}
    . ":" . $results{"s_l"}
    . ":" . $results{"s_g"}
    . ":" . $results{"s_i"}
);

Creating graphs from the information stored in the RRD file is the really fun part of the operation. Everyone loves the RRDtool because no skills are required to produce fabulous graphs. For example, the Perl code below creates a graph of the number of active and idle servers throughout a designated time period, such as the third graph shown in Figure 8-3. The graph is stored in a file specified by $pic_name.

RRDs::graph($pic_name,
    "-v Servers",
    "-s $start_time",
    "-e $end_time",
    # extracts the busyWorkers field from the RRD file
    "DEF:busy=$rrd_name:busyWorkers:AVERAGE",
    # extracts the idleWorkers field from the RRD file
    "DEF:idle=$rrd_name:idleWorkers:AVERAGE",
    # draws a filled area in blue
    "AREA:busy#0000ff:Busy servers",
    # draws a line in green
    "LINE2:idle#00ff00:Idle servers"
);

I decided to create four graphs out of the available data:

The graphs are shown in Figure 8-3. You may want to create other graphs, such as ones showing the uptime and the CPU load. Note: The live view of the web server statistics for apache.org is available at http://www.apachesecurity.net/stats/, where it will remain for as long as the Apache Foundation keeps their mod_status output public.

Two scripts, parts of which were shown above, are used to record the statistics and create graphs. Both are available from the web site for this book. One script, apache-monitor, fetches statistics from a server and stores them. It expects two parameters. The first specifies the (RRD) file in which the results should be stored, and the second specifies the web page from which server statistics are obtained. Here is a sample invocation:

$ apache-monitor /var/www/stats/apache.org http://www.apache.org/server-status/

For a web page that requires a username and password, you can embed these directly in the URL (e.g., http://username:password@www.example.com/server-status/). The script is smart enough to create a new RRD file if one does not exist. To get detailed statistics of the web server activity, configure cron to execute this script once a minute.

The second script, apache-monitor-graph, draws graphs for a given RRD file. It needs to know the path to the RRD file (given as the first parameter), the output folder (the second parameter), and the duration in seconds for the time period the graphs need to cover (the third parameter). The script calculates the starting time by deducting the given duration from the present time. The following invocation will create graphs for the last six hours:

$ apache-monitor-graph /var/www/stats/apache.org /var/www/stats/ 21600

Four files will be created and stored in the output folder, each showing a single graph:

$ cd /var/www/stats
$ ls

You will probably want to create several graphs to monitor the activity over different time periods. Use the values in seconds from Table 8-9.
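The arithmetic behind such values, and the start-time calculation the graphing script performs, can be sketched as follows. The period names, the 30-day month and 365-day year, and the graph_window function are assumptions for illustration; they are not taken from Table 8-9 or from apache-monitor-graph.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Common reporting periods expressed in seconds. The names are
# illustrative; a month is taken as 30 days and a year as 365 days.
my %period = (
    hour  => 60 * 60,
    day   => 24 * 60 * 60,
    week  => 7 * 24 * 60 * 60,
    month => 30 * 24 * 60 * 60,
    year  => 365 * 24 * 60 * 60,
);

# The graphing window ends now and starts $duration seconds earlier.
# The current time can be supplied explicitly, which helps testing.
sub graph_window {
    my ($duration, $now) = @_;
    $now = time unless defined $now;
    return ($now - $duration, $now);
}

# List the periods from shortest to longest.
for my $name (sort { $period{$a} <=> $period{$b} } keys %period) {
    printf "%-5s %8d\n", $name, $period{$name};
}
```

The two values returned by graph_window correspond to the start and end times passed to the graphing call with the -s and -e options.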

Calling the graphing script every five minutes is sufficient. Having created the graphs, you only need to create some HTML code to glue them together if you want to show multiple graphs on a single page (see Figure 8-3).