12 Web Intrusion Detection

In spite of all your efforts to secure a web server, there is one part you do not and usually cannot control in its entirety: web applications. Web application design, programming, and maintenance require a different skill set. Even if you have the skills, in a typical organization these tasks are usually assigned to someone other than a system administrator. But the problem of ensuring adequate security remains. This final chapter suggests ways to secure applications by treating them as black boxes and examining the way they interact with the environment. The techniques that do this are known under the name intrusion detection.

This chapter covers the evolution of intrusion detection, the strengths and weaknesses of network-based and host-based systems, the features commonly found in web application firewalls, and the deployment and configuration of mod_security, a web application firewall module for Apache.

Intrusion detection has been in use for many years. Its purpose is to detect attacks by looking at the network traffic or by looking at operating system events. The term intrusion prevention is used to refer to systems that are also capable of preventing attacks.

Today, when people mention intrusion detection, in most cases they are referring to a network intrusion detection system (NIDS). An NIDS works on the TCP/IP level and is used to detect attacks against any network service, including the web server. The job of such systems, the most popular and most widely deployed of all IDSs, is to monitor raw network packets to spot malicious payloads. Host-based intrusion detection systems (HIDSs), on the other hand, work on the host level. Though they can analyze network traffic (only the traffic that arrives at that single host), this task is usually left to NIDSs. Host-based intrusion detection is mostly concerned with the events that take place on the host (such as users logging in and out and executing commands) and the system error messages that are generated. An HIDS can be as simple as a script watching a log file for error messages, as mentioned in Chapter 8. Integrity validation programs (such as Tripwire) are also a form of HIDS. Some systems can be complex: one form of HIDS uses system call monitoring on a kernel level to detect processes that behave suspiciously.

Using a single approach for intrusion detection is insufficient. Security information management (SIM) systems are designed to manage various security-relevant events they receive from agents, where an agent can listen to the network traffic or operating system events or can work to obtain any other security-relevant information.

Because many NIDSs are in place, a large effort was made to make the most of them and to use them for web intrusion detection, too. Though NIDSs work well for the problems they were designed to address and they can provide some help with web intrusion detection, they do not and cannot live up to the full web intrusion detection potential for the following reasons:

  • NIDSs were designed to work with TCP/IP. The Web is based around the HTTP protocol, which introduces a completely new vocabulary with its own set of problems and challenges, different from those of TCP/IP.

  • The real problem is that web applications are not simple users of the HTTP protocol. Instead, HTTP is only used to carry the application-specific data. It is as though each application builds its own protocol on top of HTTP.

  • Many new protocols are deployed on top of HTTP (think of Web Services, XML-RPC, and SOAP), pushing the level of complexity further up.

  • Other problems, such as the inability of an NIDS to see through encrypted SSL channels (which most web applications that are meant to be secure use) and the inability to cope with a large amount of web traffic, make NIDSs insufficient tools for web intrusion detection.

Vendors of NIDSs have responded to the challenges by adding extensions to better understand HTTP. The term deep-inspection firewalls refers to systems that make an additional effort to understand the network traffic on a higher level. Ultimately, a new breed of IDSs was born. Web application firewalls (WAFs), also known as web application gateways, are designed specifically to guard web applications. Designed from the ground up to support HTTP and to exploit its transactional nature, web application firewalls often work as reverse proxies. Instead of going directly to the web application, a request is rerouted to go to a WAF first and only allowed to proceed if deemed safe.

Web application firewalls were designed from the ground up to deal with web attacks and are better suited for that purpose. NIDSs are better suited for monitoring on the network level and cannot be replaced for that purpose.

Though most vendors are focusing on supporting HTTP, the concept of application firewalls can be applied to any application and protocol. Commercial products have become available that act as proxies for other popular network protocols and for popular databases. (Zorp, at http://www.balabit.com/products/zorp/, available under a commercial and open source license, is one such product.)

Learn more about intrusion detection to gain a better understanding of common problems. I have found the following resources useful:

I already covered one form of web intrusion detection in Chapter 8. Log-based web intrusion detection makes use of the fact that web servers produce detailed access logs, in which information about every request is kept. It is also possible to create logs in special formats to control which data is collected. This method introduces intrusion detection to a system cost-effectively, but there is a drawback: because log analysis is performed only after transactions take place, attack prevention is not possible, only detection. If you can live with that (it is a valid decision and it depends on your threat model), then you only need to take a few steps to implement this technique:

  1. Make sure logging is configured and takes place on all web servers.

  2. Optionally reconfigure logging to log more information than that configured by default.

  3. Collect all logs to a central location.

  4. Implement scripts to examine the logs regularly, in real time or in batch mode (e.g., daily).

That is all there is to it. (Refer to Chapter 8 for a detailed discussion.)
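For step 2, here is a minimal sketch of extended logging; the format name and file path are illustrative only. It adds the Referer and User-Agent headers and the time taken to serve each request to the common log format, all of which are useful when investigating suspicious activity:

# Log additional fields useful for intrusion detection
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %T" detection
CustomLog /var/www/logs/access_log detection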

With real-time intrusion detection, not only can you detect problems, but you can react to them as well. Attack prevention is possible, but it comes with a price tag of increased complexity and more time required to run the system. Most of this chapter discusses ways of running real-time web intrusion detection. There are two approaches: embed the detection engine in the web server itself, or deploy it on the network as a standalone device (typically working as a reverse proxy) through which traffic is routed.

Which of these two you choose depends on your circumstances. The web server-based approach is easy to implement since it does not mandate changes to the network design and configuration. All that is needed is the addition of a module to the web server. But if you have many web servers, and especially if the network contains proprietary web servers, then having a single place from which to perform intrusion detection can be the more efficient approach. Though network-based web IDSs typically perform full separation of clients and servers, web server-based solutions can be described more accurately as separating clients from applications, with servers left unprotected in the middle. In this case, therefore, network-based protection is better because it can protect from flaws in web servers, too.

With Apache and mod_security you can choose either approach to real-time web intrusion detection. If network-based web intrusion detection suits your needs best, then you can build such a node by installing an additional Apache instance with mod_security to work in a reverse proxy configuration. (Reverse proxy operation is discussed in Chapter 9.) Aside from initial configuration, the two modes of operation are similar. The rest of this chapter applies equally to both.

Later in this chapter, I will present a web intrusion detection solution based on open source components. The advantage of using open source components is that they are free and familiar (being based on Apache). Products from the commercial arena have more features, and they have nice user interfaces that make some tasks much easier. Here I will present the most important aspects of web IDSs, even if some features are present only in commercial products. I expect the open source products to catch up, but at this point a discussion of web intrusion detection cannot be complete without including features available only in commercial products. The following sections describe some common intrusion detection features.

If you have ever worked to develop a firewall policy, you may have been given (good) advice to first put rules in place to deny everything, and then proceed to allow what is safe. That is a positive security model. On the other side is a negative security model, in which everything that is not dangerous is allowed. Each approach asks a different question: the positive model asks, "What is safe?" and denies everything else, while the negative model asks, "What is dangerous?" and allows everything else.

A negative security model is used more often. You identify a dangerous pattern and configure your system to reject it. This is simple, easy, and fun, but not foolproof. The concept relies on you knowing what is dangerous. If there are aspects of the problem you are not aware of (which happens from time to time) then you have left a hole for the attacker to exploit.

A positive security model (also known as a white-list model) is a better approach to building policies and works well for firewall policy building. In the realm of web application security, a positive security model approach boils down to enumerating every script in the application. For each script in the list, you need to determine the following (the positive security model example near the end of this chapter shows what the resulting rules look like):

  • The request methods the script accepts

  • The parameters it accepts, and which of them are mandatory

  • The permitted format and length of every parameter value

This is what programmers are supposed to do but frequently do not. Using the positive security model is better if you can afford to spend the time to develop it. One difficult aspect of this approach is that the application model changes as the application evolves. You will need to update the model every time a new script is added to the application or if an existing one changes. But it works well to protect stable, legacy applications that no one maintains anymore.

Automating policy development can ease these problems; for example, some tools can observe live application traffic and use it to generate a policy model automatically, leaving you to review and refine the result.

Rule-based IDSs comprise the majority of what is available on the market. In principle, every request (or packet in the case of NIDS) is subject to a series of tests, where each test consists of one or more inspection rules. If a test fails, the request is rejected as invalid.

Rule-based IDSs are easy to build and use and are efficient when used to defend against known problems or when the task is to build a custom defense policy. But since they must know about the specifics of every threat to protect from it, these tools must rely on using extensive rule databases. Vendors maintain rule databases and distribute their tools with programs to update IDS installations automatically.

This approach is unlikely to be able to protect custom applications or to protect from zero-day exploits (exploits that attack vulnerabilities not yet publicly known). This is where anomaly-based IDSs work better.

The idea behind anomaly-based protection is to build a protection layer that will observe legal application traffic and then build a statistical model to judge the future traffic against. In theory, once trained, an anomaly-based system should detect anything out of the ordinary. With anomaly-based protection, rule databases are not needed and zero-day exploits are not a problem. Anomaly-based protection systems are difficult to build and are thus rare. Because users do not understand how they work, many refuse to trust such systems, making them less popular.

The stateless nature of the HTTP protocol has many negative impacts on web application security. Sessions can and should be implemented on the application level, but for many applications the added functionality is limited to fulfilling business requirements other than security. Web IDSs, on the other hand, can throw their full weight into adding various session-related protection features. Some of the features include:

Enforcement of entry points

At most web sites, you can start browsing from any site URL that is known to you. This is often convenient for attackers and inconvenient for defenders. An IDS that understands sessions will realize the user is making his first request and redirect him back to the default entry point (possibly logging the event).

Observation of each user session individually

Being able to distinguish one session from another opens interesting possibilities, e.g., it becomes possible to watch the rate at which requests are made and the way users navigate through the application, going from one page to another. Looking at the behavior of just one user, it becomes much easier to detect intrusion attempts.

Detecting and responding to brute-force attacks

Brute-force attacks normally go undetected in most web applications. With state management in place, an IDS tracks unusual events (such as login failures), and it can be configured to take action when a threshold is reached. It is often convenient to slow down future authentication attempts slightly, not enough for real users to notice but enough to practically stop automated scripts. If an authentication script takes 50 milliseconds to make a decision, a script can make around 20 attempts per second. If you introduce a delay of, say, one second, that will bring the speed to under one attempt per second. That, combined with an alert to someone to investigate further, would provide a decent defense. (A configuration sketch for this defense follows this list of features.)

Implementation of session timeouts

Sessions can be expired after a configured timeout, requiring users to re-authenticate. Users can also be logged out automatically after a period of inactivity.

Detection and prevention of session hijacking

In most cases, session hijacking results in a change of IP address and some other request data (that is, request headers are likely to be different). A stateful monitoring tool can detect the anomalies and prevent exploitation from taking place. The recommended action to take is to terminate the session, ask the user to re-authenticate, and log a warning.

Allowing only links provided to the client in the previous request

Some tools can be strict and only allow users to follow the links that have been given in the previous response. This seems like an interesting feature but can be difficult to implement. One problem with it is that it prevents the user from using more than one browser window with the application. Another problem is that it can cause incompatibilities with applications using JavaScript to construct links dynamically.
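Returning to brute-force defense: with mod_security (covered in detail later in this chapter), a minimal sketch of the slow-down technique might look like the following. The login script path and the one-second delay are assumptions for illustration:

# Pause one second (1,000 ms) on every login attempt, then
# let the request proceed; log the event for later analysis
<Location /login.php>
    SecFilterSelective REQUEST_METHOD ^POST$ pause:1000,log,pass
</Location>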

One area where network-based IDSs have had trouble with web traffic is evasion techniques (see Chapter 10). The problem is that there are many ways to alter incoming (attack) data so that it keeps its original meaning, and the application still interprets it, yet it is modified sufficiently to sneak under the IDS radar. This is an area where dedicated web IDSs provide significant improvement. For example, just by looking at a whole HTTP request at a time, an entire class of attacks based on request fragmentation is avoided. And because they understand HTTP well and can separate dynamic requests from requests for static resources (and thus choose not to waste time protecting static requests that cannot be compromised), they can afford to apply many anti-evasion techniques that would prove too time consuming for NIDSs.

mod_security is a web application firewall module I developed for the Apache web server. It is available under the open source GPL license, with commercial support and commercial licensing as an option. I originally designed it as a means to obtain a proper audit log, but it grew to include other security features. There are two versions of the module, one for each major Apache branch, and they are almost identical in functionality. In the Apache 2 version, mod_security uses the advanced filtering API available in that version, making interception of the response body possible. The Apache 2 version is also more efficient in terms of memory consumption. In short, mod_security does the following:

  • Intercepts HTTP requests, including request bodies, before they are fully processed by the web server

  • Performs anti-evasion normalization and encoding validation

  • Applies user-defined rules to any part of the request and, in the Apache 2 version, to the response

  • Intercepts, validates, and optionally stores uploaded files

  • Records complete transactions in an audit log

In this section, I present a deployment guide for mod_security, but the principles behind it are the same and can be applied to any web application firewall. For a detailed reference manual, visit the project documentation area at http://www.modsecurity.org/documentation/.

The basic ingredients of every mod_security configuration are:

  • The filtering engine switch (SecFilterEngine)

  • Request body buffering (SecFilterScanPOST)

  • Encoding validation and anti-evasion settings

  • The default action list (SecFilterDefaultAction)

  • Filtering rules (SecFilter and SecFilterSelective)

  • Debug and audit logging settings

The purpose of this section is to present enough information about how these ingredients interact with each other to enable you to configure and use mod_security. The subsequent sections cover advanced topics that provide the additional insight needed in some specific cases.

To install mod_security, you need to compile it using the apxs tool, as you would any other module. Some contributors provide system-specific binaries for download, and I put links to their web sites at http://www.modsecurity.org/download/. If you have installed Apache from source, apxs will be with other Apache binaries in the /usr/local/apache/bin/ folder. If you cannot find the apxs tool on your system, examine the vendor-provided documentation to learn how to add it. For example, on Red Hat systems apxs is a part of the httpd-devel package.

Change to the correct source code directory (there is one directory for each Apache branch) and execute the following commands:

# /usr/local/apache/bin/apxs -cia mod_security.c
# /usr/local/apache/bin/apachectl stop
# /usr/local/apache/bin/apachectl start

After having restarted Apache, mod_security will be active but disabled. I recommend the following configuration to enable it with minimal chances of denying legitimate requests. You can enable mod_security with fewer configuration directives; most options have default settings that match the configuration below, but I prefer to configure things explicitly rather than wonder whether I understand what the default settings are:

# Enable mod_security
SecFilterEngine On
   
# Retrieve request payload
SecFilterScanPOST On
   
# Reasonable automatic validation defaults
SecFilterCheckURLEncoding On
SecFilterCheckCookieFormat Off
SecFilterNormalizeCookies Off
SecFilterCheckUnicodeEncoding Off
   
# Accept almost all byte values
SecFilterForceByteRange 1 255
   
# Reject invalid requests with status 403
SecFilterDefaultAction deny,log,status:403
   
# Only record the relevant information 
SecAuditEngine RelevantOnly
SecAuditLog /var/www/logs/audit_log
   
# Where to store temporary and intercepted files
SecUploadDir /var/www/logs/files/
# Do not store intercepted files for the time being
SecUploadKeepFiles Off
   
# Use 0 for the debug level in production
# and 4 for testing
SecFilterDebugLog /var/www/logs/modsec_debug_log
SecFilterDebugLevel 4

Starting from the top, this configuration data enables mod_security and tells it to intercept request bodies, configures settings for various encoding validation and anti-evasion features (explained below), configures the default action list to handle invalid requests, and configures the two log types.

After adding the configuration data to your httpd.conf file, make a couple of requests to the web server and examine the audit_log and modsec_debug_log files. Without any rules configured, there won’t be much output in the debug log but at least you will be certain the module is active.

To use mod_security effectively, you must understand what it does to every request and in what order. Generally, processing consists of four phases:

Initialization

At the beginning of this phase, mod_security determines whether it should process the request. No processing will be performed unless the module is explicitly enabled in configuration (via SecFilterEngine On). Similarly, if the module is configured only to process dynamic requests (via SecFilterEngine DynamicOnly) and the current request is for a static resource, processing will end immediately.

If the processing is to continue, the module will initialize its structures, read in the complete request body (if one is present and if request body buffering is enabled), and perform initial request validation as defined in the configuration. The initial request validation covers the whole of the request: the first line, the headers, and the parameters. If any part of the request fails validation, the request will be rejected. This will happen even if the default action (configured using the SecFilterDefaultAction directive) is configured to allow requests to proceed in case of a rule match. This exception is necessary for mod_security to have consistent internal structures to base the rest of processing on. If you do not want a request to be rejected under any circumstances, then disable all encoding validation options (a configuration sketch follows these phase descriptions).

Input analysis

In the input analysis phase, the rule engine is activated to apply rules to the requests and perform actions specified in the configuration. If the request passes this phase, Apache will call the request handler to process the request.

Output analysis

The output analysis phase exists only in the Apache 2 version of the module and only occurs if output buffering is enabled. In that case, mod_security intercepts output and stores it until the entire response is generated. After that, the rule engine is activated again but this time to analyze the response data.

Logging

The logging phase is the last to take place. This phase does not depend on previous phases. For example, the mod_security rule engine may be turned off but the audit engine may continue to work. Similar to what takes place at the beginning of the initialization phase, the first task that is performed at the beginning of the logging phase is to determine whether logging should take place, based on your configuration.
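Referring back to the initialization phase: if you truly never want a request rejected during validation, a minimal sketch relaxes the validation options shown in the earlier configuration example:

# Turn off all automatic validation so that no request
# is rejected during the initialization phase
SecFilterCheckURLEncoding Off
SecFilterCheckUnicodeEncoding Off
SecFilterCheckCookieFormat Off
SecFilterForceByteRange 1 255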

As mentioned in Chapter 10, evasion techniques can be used to sneak in malicious payload undetected by web intrusion detection software. To counter that, mod_security performs the following anti-evasion techniques automatically:

  • Decodes URL-encoded text (e.g., changing %26 to &)

  • Converts Windows folder separation characters to Unix folder separation characters (\ to /)

  • Removes self references (converting /./ to /)

  • Removes redundant folder separation characters (e.g., changing // to /)

  • Changes content to lowercase

  • Converts null bytes to spaces

In some ways, encoding validation can be treated as anti-evasion. As mentioned previously, web servers and applications are often very flexible and allow invalid requests to be processed anyway. Using one of the following encoding validation options, it is possible to restrict what is accepted:

URL encoding validation

Certain invalid URL encodings (e.g., %XV, as explained in Chapter 10) can be used to bypass application security mechanisms. When URL encoding validation is turned on for mod_security, requests will be rejected if either of the two possible invalid encoding situations is encountered: an invalid hexadecimal number or a missing hexadecimal number.

Unicode encoding validation

Invalid or overlong Unicode characters are often dangerous. Turning on Unicode encoding validation can detect three types of problems: invalid characters, missing bytes, and overlong characters. This type of validation is off by default since many applications do not understand Unicode, and it is not possible to detect whether they do by looking at a request. Applications that are not Unicode aware sometimes use character combinations that are valid but that resemble special Unicode characters. Unicode validation would interpret such combinations as attacks and lead to false positives.

Cookie format validation

This option enforces strict cookie formats. It is disabled by default.

Cookie value normalization

Cookie values are often URL encoded though such encoding is not mandated by the specification. Performing normalization (which includes all anti-evasion actions) on the value allows a rule to see through the encoding. However, if URL encoded cookies are not used, false positives are possible. Enable cookie value normalization only if appropriate.

Byte range validation

Some applications use only a small part of the full range of byte values (0-255). For example, applications designed only for the English-speaking population might use only values between 32 and 126, inclusive. Restricting the bytes that can be used in a request to a small range can be beneficial as it reduces the chances of a successful buffer overflow attack. This validation option is controlled with the SecFilterForceByteRange directive (shown in the earlier configuration example and demonstrated below).
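For example, a sketch for an application that serves only English-speaking users might restrict requests to printable ASCII:

# Accept only printable ASCII characters (byte values 32-126)
SecFilterForceByteRange 32 126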

The best part of mod_security is its flexible rule engine. In the simplest form, a rule requires only a single keyword. The SecFilter directive performs a broad search against the request parameters, as well as against the request body for POST requests:

SecFilter KEYWORD

If the keyword is detected, the rule will be triggered and will cause the default action list to be executed.

The keyword is actually a regular expression pattern. Using a simple string, such as 500, will find its occurrence anywhere in the search content. To make full use of mod_security, learn about regular expressions. If you are unfamiliar with them, I suggest the link http://www.pcre.org/pcre.txt as a good starting point. If you prefer a book, check out Mastering Regular Expressions by Jeffrey E. F. Friedl (O’Reilly), which is practically a regular expression reference guide.

Here are a couple of points I consider important:

  • A pattern normally matches anywhere in the searched content; use the anchors ^ (beginning) and $ (end) when the whole value must match.

  • In mod_security, preceding a pattern with an exclamation mark (!) inverts the rule, so it triggers when the pattern does not match.

I will demonstrate what can be done with regular expressions with a pattern you will find useful in the real world: ^[0-9]{1,9}$. Because it is anchored at both ends, this pattern matches only values that consist entirely of digits, at least one and at most nine of them.
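Such a pattern is typically used in a selective rule (introduced next). For instance, assuming a parameter named id, the following hypothetical rule rejects any value that is not a one- to nine-digit number:

SecFilterSelective ARG_id !^[0-9]{1,9}$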

Although broad rules are easy to write, they usually do not work well in real life. Their use significantly increases the chances of introducing false positives and reducing system availability to its legitimate users (not to mention the annoyance they cause). A much better approach to rule design is to consider the impact and only apply rules to certain parts of HTTP requests. This is what SecFilterSelective is for. For example, the following rule will look for the keyword only in the query string:

SecFilterSelective QUERY_STRING KEYWORD

The QUERY_STRING variable is one of the supported variables. The complete list is given in Table 12-1 (standard variables, also available to mod_rewrite and CGI scripts) and Table 12-2 (extended variables specific to mod_security). In most cases, the variable names are the same as those used by mod_rewrite and the CGI specification.

Table 12-1. Standard rule variables

Variable name

Description

REMOTE_ADDR

IP address of the client.

REMOTE_HOST

Host name of the client, when available.

REMOTE_USER

Authenticated username, when available.

REMOTE_IDENT

Remote username (provided by the identd daemon but almost no one uses it any more).

REQUEST_METHOD

Request method (e.g., GET, POST).

SCRIPT_FILENAME

Full system path for the script being executed.

PATH_INFO

The extra part of the URI given after the script name. For example, if the URI is /view.php/5, the value of PATH_INFO is /5.

QUERY_STRING

The part of the URI after the question mark, when available (e.g. id=5).

AUTH_TYPE

The string Basic or Digest, when available.

DOCUMENT_ROOT

Path to the document root, as specified with the DocumentRoot directive.

SERVER_ADMIN

The email address of the server administrator, as specified with the ServerAdmin directive.

SERVER_NAME

The hostname of the server, as specified with the ServerName directive.

SERVER_ADDR

The IP address of the server where the request was received.

SERVER_PORT

Server port where the request was received.

SERVER_PROTOCOL

The protocol specified in the request (e.g., HTTP/1.1).

SERVER_SOFTWARE

Apache version, as configured with ServerTokens.

TIME_YEAR

Current year (e.g., 2004).

TIME_MON

Current month as a number (e.g., 10 for October).

TIME_DAY

Current day of month as a number.

TIME_HOUR

Current hour as a number in a 24-hour day (e.g., 14 for 2 PM).

TIME_MIN

Current minute.

TIME_SEC

Current second.

TIME_WDAY

Current weekday as a number (e.g., 4 for Thursday when Monday is considered to be the first day of the week).

TIME

Current time as a combination of individual elements listed above in the form YmdHMS (e.g., 20041014144619 for October 14 2004, 14:46:19).

THE_REQUEST

Complete first line of the request (e.g., GET /view.php?id=5 HTTP/1.0).

REQUEST_URI

The second token on the request line (e.g., /view.php?id=5).

REQUEST_FILENAME

A synonym for SCRIPT_FILENAME.

Table 12-2. Extended rule variables

Variable Name

Description

POST_PAYLOAD

Gives access to the raw request body except for requests using the multipart/form-data encoding (which is required for file uploads). In such cases, the request body will probably contain binary data and interfere with regular expressions. To get around this problem, mod_security takes the original request apart and re-creates and gives access to a fake request body in the application/x-www-form-urlencoded format, effectively hiding the differences between the two formats.

HTTP_headername

Value of the header headername. The prefix HEADER_ (in place of HTTP_) will also work.

ENV_envname

Value of the environment variable envname.

ARG_varname

Value of the parameter varname.

ARGS

Gives direct access to a single string containing all parameters and their values, which is equal to the combined value of QUERY_STRING and POST_PAYLOAD. (The request body will be faked if necessary, as discussed above.)

ARGS_COUNT

Number of parameters in the request.

ARGS_NAMES

List of the names of all parameters given to the script.

ARGS_VALUES

List of the values of all parameters given to the script.

FILE_NAME_varname

The filesystem name of the file contained in the request and associated with the script parameter varname.

FILE_SIZE_varname

The size of file uploaded in the parameter varname.

FILES_COUNT

Number of files contained in the request.

FILES_NAMES

List of the filesystem names of all files contained in the request.

FILES_SIZES

List of the sizes of all files.

HEADERS

List of all request headers, in the form "Name: Value".

HEADERS_COUNT

Number of headers in the request.

HEADERS_NAMES

List of the names of all headers in the request.

HEADERS_VALUES

List of the values of all headers in the request.

SCRIPT_UID

The uid of the owner of the script that will handle the request.

SCRIPT_GID

The gid of the group of the script that will handle the request.

SCRIPT_USERNAME

The username equivalent to the uid. Using a username is slower than using a uid since mod_security needs to perform a lookup every time.

SCRIPT_GROUPNAME

The group name equivalent to the gid. Using a group name is slower than using a gid as well.

SCRIPT_MODE

Script permissions, in the standard Unix format, with four digits with a leading zero (e.g., 0755).

COOKIE_cookiename

Value of the cookie cookiename.

COOKIES_COUNT

Number of cookies in the request.

COOKIES_NAMES

List of the names of all cookies given to the script.

COOKIES_VALUES

List of the values of all cookies given to the script.

When using selective rules, you are not limited to examining one field at a time. You can separate multiple variable names with a pipe. The following rule demonstrates how to access named parts of the request, in this example, a parameter and a cookie:

# Look for the keyword in the parameter "authorized"
# and in the cookie "authorized". A match in either of
# them will trigger the rule.
SecFilterSelective ARG_authorized|COOKIE_authorized KEYWORD

If a variable is absent from the current request, it will be treated as empty. For example, to detect the presence of a variable, use the following format, which triggers execution of the default action list whenever the variable is not empty:

SecFilterSelective ARG_authorized !^$

A special syntax allows you to create exceptions. The following applies the rule to all parameters except the parameter html:

SecFilterSelective ARGS|!ARG_html KEYWORD

Finally, single rules can be combined to create more complex expressions. In my favorite example, I once had to deploy an application that had to be publicly available because our users could be located anywhere on the Internet. The application had a powerful, potentially devastating administration account, and the login page for normal users and for the administrator was the same. It was impossible to use other access control methods to restrict administrative logins to an IP address range, and modifying the source code was not an option because we had no access to it. I came up with the following two rules:

SecFilterSelective ARG_username ^admin$ chain
SecFilterSelective REMOTE_ADDR !^192\.168\.254\.125$

The first rule triggers whenever someone tries to log in as an administrator (it looks for a parameter username with value admin). Without the optional action chain being specified, the default action list would be executed. Since chain is specified, processing continues with execution of the second rule. The second rule allows the request to proceed if it is coming from a single predefined IP address (192.168.254.125). The second rule never executes unless the first rule is satisfied.

You can do many things when an invalid request is discovered. The SecFilterDefaultAction directive determines the default action list:

# Reject invalid requests with status 403
SecFilterDefaultAction deny,log,status:403

You can override the default action list by supplying a list of actions to individual rules as the last (optional) parameter:

# Only log a warning message when the KEYWORD is found
SecFilter KEYWORD log,pass

The full list of supported actions is given in Table 12-3.

Table 12-3. mod_security action list

Action

Description

allow

Skip over the remaining rules and allow the request to be processed.

auditlog

Log the request to the audit log.

chain

Chain the current rule with the one that follows. Process the next rule if the current rule matches. This feature allows many rules to be used as one, performing a logical AND.

deny

Deny request processing.

exec:filename

Execute the external script specified by filename on rule match.

id:n

Assign a unique ID n to the rule. The ID will appear in the log. Useful when there are many rules designed to handle the same problem.

log

Log the rule match. A message will go into the Apache error log and into the audit log (if such logging is enabled).

msg:text

Assign a message text to the rule, which will appear in the log.

noauditlog

Do not log the request to the audit log. All requests that trigger a rule will be written to the audit log by default (unless audit logging is completely disabled by configuration). This action should be used when you don’t want a request to appear in the audit log (e.g., it may be too long and you do not need it).

nolog

Do not log the rule match.

pass

Proceed to the next rule in spite of the current rule match. This is useful when you want to perform some action but otherwise don’t want to reject the request.

pause:n

Pause for n milliseconds on rule match. Be careful with this one; it makes it easy to DoS yourself by having many Apache processes sleep for too long a time.

redirect:url

Perform a redirection to the address specified by url when a request is denied.

setenv:name=value

Set the environment variable name to value. The value is optional. 1 is used if the parameter is omitted.

skipnext:n

On rule match skip the next n rules (or just one if the parameter is omitted).

status:n

Configure the status n to be used to deny the request.
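As an illustration of combining actions, the following hedged example (the URL is hypothetical) logs the match and redirects the offending client to a warning page instead of returning an error status:

# On match, log the event and redirect the client
SecFilter KEYWORD log,redirect:http://www.example.com/warning.html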

There are three places where, depending on the configuration, you may find mod_security logging information:

  • The Apache error log, where rule matches are recorded (unless suppressed with the nolog action)

  • The mod_security debug log, configured with the SecFilterDebugLog directive

  • The mod_security audit log, configured with the SecAuditLog directive

Here is an example of an error message resulting from invalid content discovered in a cookie:

[Tue Oct 26 17:44:36 2004] [error] [client 127.0.0.1]
mod_security: Access denied with code 500. Pattern match "!(^$|^[a-zA-Z0-9]+$)"
at COOKIES_VALUES(sessionid) [hostname "127.0.0.1"]
[uri "/cgi-bin/modsec-test.pl"] [unique_id bKjdINmgtpkAADHNDC8AAAAB]

The message indicates that the request was rejected (“Access denied”) with an HTTP 500 response because the content of the cookie sessionid contained content that matched the pattern !(^$|^[a-zA-Z0-9]+$). (The pattern allows a cookie to be empty, but if it is not, it must consist only of one or more letters and digits.)

In addition to the basic information presented in the previous sections, some additional (important) aspects of mod_security operation are presented here.

The use of mod_security results in increased memory consumption by the Apache web server. The increase can be very small, but it can be very big in some rare circumstances. Understanding why it happens will help you avoid problems in those rare circumstances.

When mod_security is not active, Apache only sees the first part of the request: the request line (the first line of the request) and the subsequent headers. This is enough for Apache to do its work. When request processing begins, the module that does the processing feeds the request body to where it needs to be consumed. In the case of PHP, for example, the request body goes directly to PHP. Apache almost never sees it. With mod_security enabled, it becomes a requirement to have access to the complete request body before processing begins. That is the only approach that can protect the application. (Early versions of mod_security did look at the body bit by bit but that proved to be insufficient.) That is why mod_security reads the complete request into its own buffer and later feeds it from there to the processing module. Additional memory space is needed so that the anti-evasion processing can take place. A buffer twice the size of the request body is required by mod_security to complete processing.

In most cases, this is not a problem since request bodies are small. The only case when it can be a problem is when file upload functionality is required. Files can be quite large (sizes of over 100 MB are not unheard of), and mod_security will want to put all of them into memory, twice. If you are running Apache 1, there is no way around this but to disable request body buffering (as described near the end of this chapter) for those parts of the application where file upload takes place. You can also (and probably should) limit the maximum size of the body by using the Apache configuration directive LimitRequestBody. There is good news for the users of Apache 2, though. Because of its powerful content filtering API, mod_security for Apache 2 is able to stream the request body to disk if its size is larger than a predefined value (using the directive SecUploadInMemoryLimit, set to 64 KB by default), so increased memory consumption does not take place. However, mod_security will need to store the complete request to disk and read it again when it sends it forward for processing.
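A sketch combining both suggestions follows; the limits are illustrative only:

# Reject request bodies larger than 1 MB
LimitRequestBody 1048576
# Apache 2 only: stream request bodies larger than 64 KB
# to disk instead of holding them in memory
SecUploadInMemoryLimit 65536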

A similar thing happens when you enable output monitoring (described later in this chapter). Again, the output cannot and will not be delivered to the client until all of it is available to mod_security and after the analysis takes place. This process introduces response buffering. At the moment, there is no way to limit the amount of memory spent doing output buffering, but it can be used in a controlled manner and only enabled for HTML or text files, while disabled for binary files, via output filtering, described later in this chapter.

Although mod_security supports the exec action, which allows a custom script to be executed when a rule matches, Apache itself offers two mechanisms that allow for tighter integration and more flexibility.

One mechanism you should use is the ErrorDocument, which allows a script to be executed (among other things) whenever request processing returns with a particular response status code. This feature is frequently used to create a “Page not found” message. Depending on your security policy, the same feature can be used to explain that the security system you put in place believes something funny is going on and, therefore, decided to reject the request. At the same time, you can add code to the script to do something else, for example, to send a notification somewhere. An example script for Apache integration comes with the mod_security distribution.
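For example, assuming the default action list denies requests with status 403 (as configured earlier), a sketch of such integration might be (the script path is hypothetical):

# Invoke a custom script whenever a request is denied
ErrorDocument 403 /cgi-bin/error403.pl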

The other thing you can do is add mod_unique_id (distributed with Apache and discussed in Chapter 8) into your configuration. After you do, this module will generate a unique ID (guaranteed to be unique within the server) for every request, storing it in the environment variable UNIQUE_ID (where it will be picked up by mod_security). This feature makes it easy to find the request you are looking for. I frequently use it in the output of an ErrorDocument script, where the unique ID is presented to the user with instructions to cite it as a reference when she complains to the support group. This allows you to quickly and easily pinpoint and solve the problem.
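Assuming mod_unique_id was compiled as a shared module, enabling it is a one-line change:

# Generate a unique ID for every request; mod_security picks
# it up from the UNIQUE_ID environment variable
LoadModule unique_id_module modules/mod_unique_id.so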

Deploying a web application firewall for a known system requires planning and careful execution. It consists of the following steps:

  1. Learn about the system you want to protect.

  2. Decide whether intrusion detection can bring a noticeable increase in security.

  3. Establish a protection policy.

  4. Install and configure the tool.

  5. Deploy in detection mode, and switch to prevention mode only once you are confident in the configuration.

Probably the best advice I can give is for you to learn about the system you want to protect. I am asked all the time to provide an example of a tight mod_security configuration, but I hesitate and almost never do. Intrusion detection (like many other security techniques) is not a simple, fire-and-forget solution, in spite of what some commercial vendors say. Incorrect rules, when deployed, will result in false positives that waste analysts' time. When used in prevention mode, false positives result in reduced system availability, which translates to lost revenue (or increased operations expenses, depending on the way you look at it).

In step 2, you need to decide whether intrusion detection can bring a noticeable increase in security. This is not the same as what I previously discussed in this chapter, that is, whether intrusion detection is a valid tool at all. Here, the effort of introducing intrusion detection needs to be weighed against other ways to solve the problem. First, understand the time commitment intrusion detection requires. If you cannot afford to follow up on all alerts produced by the system and to work continuously to tweak and improve the configuration, then you might as well give up now. The other thing to consider is the nature and the size of the system you want to protect. For smaller applications for which you have the source code, invest in a code review and fix the problems in the source code.

Establishing a protection policy is arguably the most difficult part of the work. You start with the list of weaknesses you want to protect and, having in mind the capabilities of the protection software, work out a feasible protection plan. If it turns out the tool is not capable enough, you may look for a better tool. Work on the policy is similar to the process of threat modeling discussed in Chapter 1.

Installation and configuration is the easy part and is already covered in detail here. You need to work within the constraints of your selected tool to implement the previously designed policy. The key to performing this step is to work on a development server first and to test the configuration thoroughly to ensure the protection rules behave as you expect them to. The mod_security distribution includes a tool (run_test.pl) that can be used for automated tests. As a low-level tool, run_test.pl takes a previously created HTTP request from a text file, sends it to the server, and examines the status code of the response to determine the operation's success. Run regression tests periodically to verify your IDS still behaves as expected.

Deploying in detection mode only is what you do to test the configuration in real life while avoiding disturbance to normal system operation. For several weeks, the IDS should only send notifications without interrupting requests. The configuration should then be fine-tuned to reduce the false-positive rate, hopefully to zero. Once you are confident the protection is well designed (do not hurry), the operation mode can be changed to prevention mode. I prefer to use prevention mode only for problems I know I have. In all other cases, run in detection mode at least for some time and see whether you really have the problems you think you have.

There is a set of rules I normally use as a starting point, in addition to the basic configuration given earlier. These rules are not meant to protect from direct attacks but rather to enforce strict HTTP protocol usage and make manual attacks more difficult. As I warned, these rules may not be suitable for all situations. If you are running a public web site, there will be all sorts of visitors, including search engines, which can be a little eccentric in the way they send otherwise normal HTTP requests. Tight configurations usually work better in closed environments.

# Accept only valid protocol versions, helps
# fight HTTP fingerprinting.
SecFilterSelective SERVER_PROTOCOL !^HTTP/(0\.9|1\.0|1\.1)$
   
# Allow supported request methods only.
SecFilterSelective REQUEST_METHOD !^(GET|HEAD|POST)$
   
# Require the Host header field to be present.
SecFilterSelective HTTP_Host ^$
   
# Require explicit and known content encodings for methods
# other than GET or HEAD. The multipart/form-data encoding
# should not be allowed at all if the application does not
# make use of file upload. There are many automated attacks
# out there that are using wrong encoding names.
SecFilterSelective REQUEST_METHOD !^(GET|HEAD)$ chain
SecFilterSelective HTTP_Content-Type \
!(^application/x-www-form-urlencoded$|^multipart/form-data;)
   
# Require Content-Length to be provided with
# every POST request. Length is a requirement for
# request body filtering to work.
SecFilterSelective REQUEST_METHOD ^POST$ chain
SecFilterSelective HTTP_Content-Length ^$
   
# Don't accept transfer encodings we know we don't handle
# (you probably don't need them anyway).
SecFilterSelective HTTP_Transfer-Encoding !^$

You may also choose to add some of the following rules to warn you of requests that do not seem to be from common browsers. Rules such as these are suited for applications where the only interaction is expected to come from users using browsers. On a public web site, where many different types of user agents are active, they result in too many warnings.

# Most requests performed manually (e.g., using telnet or nc)
# will lack one of the following headers.
# (Accept-Encoding and Accept-Language are also good
# candidates for monitoring since popular browsers
# always use them.)
SecFilterSelective HTTP_User-Agent|HTTP_Connection|HTTP_Accept ^$ log,pass
   
# Catch common nonbrowser user agents.
SecFilterSelective HTTP_User-Agent \
(libwhisker|paros|wget|libwww|perl|curl) log,pass

Ironically, your own monitoring tools are likely to generate error log warnings. If you have a dedicated IP address from which you perform monitoring, you can add a rule to skip the warning checks for all requests coming from it. Put the following rule just above the rules that produce warnings:

# Allow requests coming from 192.168.254.125
SecFilterSelective REMOTE_ADDR ^192\.168\.254\.125$ allow

Though you could place this rule on the top of the rule set, that is a bad idea; as one of the basic security principles says, only establish minimal trust.

Web IDSs are good at enforcing strict protocol usage and defending against known application problems. Attempts to exploit common web application problems often have a recognizable footprint. Pattern matching can be used to detect some attacks but it is generally impossible to catch all of them without having too many false positives. Because of this, my advice is to use detection only when dealing with common web application attacks. There is another reason to adopt this approach: since it is not possible to have a foolproof defense against a determined attacker, having a tight protection scheme will force such an attacker to adopt and use evasion methods you have not prepared for. If that happens, the attacker will become invisible to you. Let some attacks through so you are aware of what is happening.

The biggest obstacle to reliable detection is the ability for users to enter free-form text, and this is common in web applications. Consequently, content management systems are the most difficult ones to defend. (Users may even be discussing web application security in a forum!) When users are allowed to enter arbitrary text, they will sooner or later attempt to enter something that looks like an attack.

In this section, I will discuss potentially useful regular expression patterns without going into details as to how they are to be added to the mod_security configuration since the method of adding patterns to rules has been described. (If you are not familiar with common web application attacks, reread Chapter 10.) In addition to the patterns provided here, you can seek inspiration in rules others have created for nonweb IDSs. (For example, rules for Snort, a popular NIDS, can be found at http://www.snort.org and http://www.bleedingsnort.com.)

Database attacks are executed by sneaking an SQL query or a part of it into request parameters. Attack detection must, therefore, attempt to detect commonly used SQL keywords and metacharacters. Table 12-4 shows a set of patterns that can be used to detect database attacks.

Table 12-4. Patterns to detect SQL injection attacks

Pattern

Query example

delete[[:space:]]+from

DELETE FROM users

drop[[:space:]]+table

DROP TABLE users

create[[:space:]]+table

CREATE TABLE newusers

update.+set.+=

UPDATE users SET balance = 1000

insert[[:space:]]+into.+values

INSERT INTO users VALUES (1, 'admin')

select.+from

SELECT username, balance FROM users

union.+select

Appends to an existing query: ... UNION ALL SELECT username FROM users

or.+1[[:space:]]*=[[:space:]]*1

Attempt to modify the original query to always be true: SELECT * FROM users WHERE username = 'admin' and password = 'xxx' OR 1=1--

.+--

Attempt to escape out of a string and inject a query, and then comment out the rest of the original query: SELECT * FROM users WHERE username = 'admin' OR username= 'guest' --
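To turn these patterns into protection, each one goes into a rule. A minimal sketch applying two of the patterns to all request parameters might be:

# Detect common SQL injection signatures in any parameter
SecFilterSelective ARGS "union.+select"
SecFilterSelective ARGS "delete[[:space:]]+from"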

So far, I have presented generic SQL patterns. Most databases have proprietary extensions of one kind or another, which require keywords that are often easier to detect. These patterns differ from one database to another, so creating a good set of detection rules requires expertise in the deployed database. Table 12-5 shows some interesting patterns for MSSQL and MySQL.

Table 12-5. Database-specific detection patterns

Pattern

Attack

exec.+xp_

MSSQL. Attempt to execute an extended stored procedure: EXEC xp_cmdshell.

exec.+sp_

MSSQL. Attempt to execute a stored procedure: EXEC sp_who.

@@[[:alnum:]]+

MSSQL. Access to an internal variable: SELECT @@version.

into[[:space:]]+outfile

MySQL. Attempt to write the contents of a table to disk: SELECT * FROM users INTO OUTFILE '/tmp/users'.

load[[:space:]]+data

MySQL. Attempt to load a file from disk: LOAD DATA INFILE '/tmp/users' INTO TABLE users.

Cross-site scripting (XSS) attacks can be difficult to detect when launched by those who know how to evade detection systems. If the entry point is in the HTML, the attacker must find a way to switch from HTML into something more dangerous. Danger comes from JavaScript, ActiveX components, Flash programs, or other embedded objects. The following list of problematic HTML tags is by no means exhaustive, but it proves the point:

  • <script>, which introduces JavaScript

  • <object>, <applet>, and <embed>, which load external components

  • <iframe>, which embeds content from an arbitrary location

Your best bet is to try to detect any HTML in the parameters, as well as the special JavaScript entity syntax that works only in Netscape. If a pattern such as <.+> is too broad for you, you may want to list all possible tag names and detect those instead. But if the attacker can sneak in a tag, detection becomes increasingly difficult because of the many evasion techniques available; for example, the payload can be assembled at runtime from character codes (fromCharCode) or hidden behind layers of character encoding, obfuscating the string enough to make detection practically impossible.

If the attacker can inject content directly into JavaScript, the list of evasion options is even longer; for example, he can use the eval() function to execute an arbitrary string, or the document.write() function to output HTML into the document.

Now you understand why you should not stop attackers too early. Knowing you are being attacked, even successfully attacked, is sometimes better than not knowing at all. A useful list of warning patterns for XSS attacks is given in Table 12-6. (I call them warning patterns because you probably do not want to reject requests automatically based on them.) They are not foolproof, but they cast a wide net to catch potential abuse. You may have to refine the list over time to reduce false positives for your particular application.

Table 12-6. XSS attack warning patterns

&#[0-9a-fA-F]{2}
\x5cx[0-9a-fA-F]{2}
<.+>
<applet
<div
<embed
<iframe
<img
<meta
<object
<script
document.cookie
document.write
dynsrc
eval[[:space:]]*(
fromCharCode
http-equiv
javascript:
onAbort
onBlur
onChange
onClick
onDblClick
onDragDrop
onError
onFocus
onKeyDown
onKeyPress
onKeyUp
onLoad
onMouseDown
onMouseOut
onMouseOver
onMouseUp
onMove
onReset
onResize
onSelect
onSubmit
onUnload
style[[:space:]]*=
vbscript:

I conclude this chapter with a few advanced topics. These topics are regularly the subject of email messages I get about mod_security on the users’ mailing list.

Since mod_security understands the multipart/form-data encoding used for file uploads, it can extract uploaded files from the request and store them for future reference. In a way, this is a form of audit logging (see Chapter 8). mod_security offers another exciting feature: validation of uploaded files in real time. All you need is a script designed to take the full path to the file as its first and only parameter, plus the following directive to enable file validation functionality in mod_security:

SecUploadApproveScript /usr/local/apache/bin/upload_verify.pl

The script will be invoked for every file upload attempt. If the script returns 1 as the first character of the first line of its output, the file will be accepted. If it returns anything else, the whole request will be rejected. It is useful to have the error message (if any) on the same line after the first character as it will be printed in the mod_security log. File upload validation can be used for several purposes:

  • To inspect uploaded files for viruses or other types of attack

  • To allow only files of certain types (e.g., images)

  • To inspect and validate file content

If you have the excellent open source antivirus program Clam AntiVirus (http://www.clamav.net) installed, then you can use the following utility script as an interface:

#!/usr/bin/perl

# Location of the Clam AntiVirus command-line scanner
$CLAMSCAN = "/usr/bin/clamscan";

if (@ARGV != 1) {
    print "Usage: modsec-clamscan.pl <filename>\n";
    exit;
}

my ($FILE) = @ARGV;

# Run clamscan and capture the first line of its output
$cmd = "$CLAMSCAN --stdout --disable-summary $FILE";
$input = `$cmd`;
$input =~ m/^(.+)/;
$error_message = $1;

# Fallback response if the output cannot be interpreted
$output = "0 Unable to parse clamscan output";

if ($error_message =~ m/: Empty file\.$/) {
    $output = "1 empty file";
}
elsif ($error_message =~ m/: (.+) ERROR$/) {
    $output = "0 clamscan: $1";
}
elsif ($error_message =~ m/: (.+) FOUND$/) {
    $output = "0 clamscan: $1";
}
elsif ($error_message =~ m/: OK$/) {
    $output = "1 clamscan: OK";
}

# The first character (1 = accept, 0 = reject) tells
# mod_security what to do with the uploaded file
print "$output\n";

When mod_security operates from within Apache (as opposed to working as a network gateway), it can obtain more information about requests. One useful bit of information is the choice of module to handle the request (called a handler). In the early phases of request processing, Apache looks for candidate modules to handle the request, usually by looking at the extension of the targeted file. If a handler is not found, the request is probably for a static file (e.g., an image). Otherwise, the handler will probably process the file in some way (for example, executing the script in the case of PHP) and dynamically create a response. Since mod_security mostly serves to protect dynamic resources, this information can be used for optimization. If you configure the SecFilterEngine directive with the DynamicOnly parameter, then mod_security will act only on those requests that have a handler attached to them.

# Only process dynamic requests
SecFilterEngine DynamicOnly

Unfortunately, it is possible to configure Apache to serve dynamic content and have the handler undefined, by misusing its AddType directive. Even the official PHP installation guide recommends this approach. If that happens, mod_security will not be able to determine which requests are truly dynamic and will not be able to protect them. The correct approach is to use the AddHandler directive, as in this example for PHP:

AddHandler application/x-httpd-php .php

Relying on the existence of a request handler to decide whether to protect a resource can be rewarding, but it can also be dangerous if handlers are not configured correctly, so check whether relying on handlers really works in your case. You can do this by adding a rule that rejects every request (in which case it will be obvious whether mod_security is working) or by looking at what mod_security writes to the debug log (where it states whether it believes the incoming request is for a static resource).

Though most of this chapter used negative security model protection for examples, you can deploy mod_security in a positive security model configuration. A positive security model relies on identifying requests that are safe instead of looking for dangerous content. In the following example, I will demonstrate how this approach can be used by showing the configuration for two application scripts. For each script, the standard Apache container directive <Location> is used to enclose mod_security rules that will only be applied to that script. The use of the SecFilterSelective directive to specify rules has previously been described.

<Location /user_view.php>
    # This script only accepts GET
    SecFilterSelective REQUEST_METHOD !^GET$
    # Accept only one parameter: id
    SecFilterSelective ARGS_NAMES !^id$
    # Parameter id is mandatory, and it must be
    # a number, 4-14 digits long
    SecFilterSelective ARG_id !^[[:digit:]]{4,14}$
</Location>
   
<Location /user_add.php>
    # This script only accepts POST
    SecFilterSelective REQUEST_METHOD !^POST$
    # Accept three parameters: firstname, lastname, and email
    SecFilterSelective ARGS_NAMES !^(firstname|lastname|email)$
    # Parameter firstname is mandatory, and it must
    # contain text 1-64 characters long
    SecFilterSelective ARG_firstname !^[[:alnum:][:space:]]{1,64}$
    # Parameter lastname is mandatory, and it must
    # contain text 1-64 characters long
    SecFilterSelective ARG_lastname !^[[:alnum:][:space:]]{1,64}$
    # Parameter email is optional, but if it is present
    # it must consist only of characters that are
    # allowed in an email address 
    SecFilterSelective ARG_email !(^$|^[[:alnum:].@]{1,64}$)
</Location>

There is a small drawback to this configuration approach. To determine which <Location> block is applicable for a request, Apache has to look through all such directives present. For applications with a small number of scripts, this will not be a problem, but it may present a performance problem for applications with hundreds of scripts, each of which need a <Location> block.

A feature to allow user-defined types (predefined regular expressions), such as the one present in mod_parmguard (see the sidebar), would significantly ease the task of writing configuration data.