2 Installation and Configuration

Installation is the first step in making Apache functional. Before you begin, you should have a clear idea of the installation’s purpose. This idea, together with your paranoia level, will determine the steps you will take to complete the process. The system-hardening matrix (described in Chapter 1) presents one formal way of determining the steps. Though every additional step you take now makes the installation more secure, it also increases the time you will spend maintaining security. Think about it realistically for a moment. If you cannot put in that extra time later, then why bother putting the extra time in now? Don’t worry about it too much, however. These things tend to sort themselves out over time: you will probably be eager to make everything perfect in the first couple of Apache installations you do; then, you will likely back off and find a balance among your security needs, the effort required to meet those needs, and available resources.

As a rule of thumb, if you are building a high profile web server—public or not—always go for a highly secure installation.

Though the purpose of this chapter is to be a comprehensive guide to Apache installation and configuration, you are encouraged to read others’ approaches to Apache hardening as well. Every approach has its unique points, reflecting the personalities of its authors. Besides, the opinions presented here are heavily influenced by the work of others. The Apache reference documentation is a resource you will go back to often. In addition to it, make sure you read the Apache Benchmark, a well-documented reference installation procedure that allows security to be quantified. It includes a semi-automated scoring tool to be used for assessment.

The following is a list of some of the most useful Apache installation documentation I have encountered:

The installation instructions given in this chapter are designed to apply to both active branches (1.x and 2.x) of the Apache web server running on Linux systems. If you are running some other flavor of Unix, I trust you will understand what the minimal differences between Linux and your system are. The configuration advice given in this chapter works well for non-Unix platforms (e.g., Windows) but the differences in the installation steps are more noticeable:

One of the first decisions you will make is whether to compile the server from the source or use a binary package. This is a good example of the dilemma I mentioned at the beginning of this chapter. There is no one correct decision for everyone or one correct decision for you alone. Consider some pros and cons of the different approaches:

This book shows the approach of compiling from the source code since that approach gives us the most power and the flexibility to change things according to our taste. To download the source code, go to http://httpd.apache.org and pick the latest release of the branch you want to use.

Habitually checking the integrity of archives you download from the Internet is a good idea. The Apache distribution system works through mirrors. Someone may decide to compromise a mirror and replace the genuine archive with a trojaned version (a version that feels like the original but is modified in some way, for example, programmed to allow the attacker unlimited access to the web server). You will go through a lot of trouble to secure your Apache installation, and it would be a shame to start with a compromised version.

If you take a closer look at the Apache download page, you will discover that though archive links point to mirrors, archive signature links always point to the main Apache web site.

One way to check the integrity is to calculate the MD5 sum of the archive and to compare it with the sum in the signature file. An MD5 sum is an example of a hash function, also known as one-way encryption (see Chapter 4 for further information). The basic idea is that, given data (such as a binary file), a hash function produces seemingly random output. However, the output is always the same when the input is the same, and it is not possible to reconstruct the input given the output. In the example below, the first command calculates the MD5 sum of the archive that was downloaded, and the second command downloads and displays the contents of the MD5 sum from the main Apache web site. You can see the sums are identical, which means the archive is genuine:

$ md5sum httpd-2.0.50.tar.gz
8b251767212aebf41a13128bb70c0b41  httpd-2.0.50.tar.gz
$ wget -O - -q http://www.apache.org/dist/httpd/httpd-2.0.50.tar.gz.md5
8b251767212aebf41a13128bb70c0b41  httpd-2.0.50.tar.gz

Using MD5 sums to verify archive integrity can be circumvented if an intruder compromises the main distribution site. He will be able to replace the archives and the signature files, making the changes undetectable.

A more robust, but also a more complex approach is to use public-key cryptography (described in detail in Chapter 4) for integrity validation. In this approach, Apache developers use their cryptographic keys to sign the distribution digitally. This can be done with the help of GnuPG, which is installed on most Unix systems by default. First, download the PGP signature for the appropriate archive, such as in this example:

$ wget http://www.apache.org/dist/httpd/httpd-2.0.50.tar.gz.asc

Attempting to verify the signature at this point will result in GnuPG complaining about not having the appropriate key to verify the signature:

$ gpg httpd-2.0.50.tar.gz.asc
gpg: Signature made Tue 29 Jun 2004 01:14:14 AM BST using DSA key ID DE885DD3
gpg: Can't check signature: public key not found

GnuPG gives out the unique key ID (DE885DD3), which can be used to fetch the key from one of the key servers (for example, pgpkeys.mit.edu):

$ gpg --keyserver pgpkeys.mit.edu --recv-key DE885DD3
gpg: /home/ivanr/.gnupg/trustdb.gpg: trustdb created
gpg: key DE885DD3: public key "Sander Striker <striker@apache.org>" imported
gpg: Total number processed: 1
gpg:               imported: 1

This time, an attempt to check the signature gives satisfactory results:

$ gpg httpd-2.0.50.tar.gz.asc
gpg: Signature made Tue 29 Jun 2004 01:14:14 AM BST using DSA key ID DE885DD3
gpg: Good signature from "Sander Striker <striker@apache.org>"
gpg:                 aka "Sander Striker <striker@striker.nl>"
gpg:                 aka "Sander Striker <striker@striker.nl>"
gpg:                 aka "Sander Striker <striker@apache.org>"
gpg: checking the trustdb
gpg: no ultimately trusted keys found
Primary key fingerprint: 4C1E ADAD B4EF 5007 579C  919C 6635 B6C0 DE88 5DD3

At this point, we can be confident the archive is genuine. On the Apache web site, a file contains the public keys of all Apache developers (http://www.apache.org/dist/httpd/KEYS). You can use it to import all their keys at once but I prefer to download keys from a third-party key server. You should ignore the suspicious-looking message (“no ultimately trusted keys found”) for the time being. It is related to the concept of web of trust (covered in Chapter 4).
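If you do decide to import all developer keys at once, the procedure is short (the KEYS URL is the one given above):

$ wget http://www.apache.org/dist/httpd/KEYS
$ gpg --import KEYS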

The next big decision is whether to create a single static binary, or to compile Apache to use dynamically loadable modules. Again, the tradeoff is whether to spend more time in order to get more security.
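To make the difference concrete, here is a sketch of the two approaches as they look with the Apache 2 configure script (the --enable-so and --enable-mods-shared options are those of the 2.0 branch; check ./configure --help for your version):

# everything compiled statically into the httpd binary
$ ./configure --prefix=/usr/local/apache

# mod_so enabled; the selected modules built as dynamically loadable objects
$ ./configure --prefix=/usr/local/apache --enable-so --enable-mods-shared=all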

In this chapter, I will assume the following locations for the specified types of files:

Apache binaries and configuration    /usr/local/apache
Public files (document root)         /var/www/htdocs
CGI scripts                          /var/www/cgi-bin
Log files                            /var/www/logs

Installation locations are a matter of taste. You can adopt any layout you like as long as you use it consistently. Special care must be taken when deciding where to store the log files since they can grow over time. Make sure they reside on a partition with enough space and where they won’t jeopardize the system by filling up the root partition.

Different circumstances dictate different directory layouts. The layout used here is suitable when only one web site is running on the web server. In most cases, you will have many sites per server, in which case you should create a separate set of directories for each. For example, you might create the following directories for one of those sites:

/var/www/apachesecurity.net/bin
/var/www/apachesecurity.net/cgi-bin
/var/www/apachesecurity.net/data
/var/www/apachesecurity.net/htdocs
/var/www/apachesecurity.net/logs

A similar directory structure would exist for another one of the sites:

/var/www/modsecurity.org/bin
/var/www/modsecurity.org/cgi-bin
/var/www/modsecurity.org/data
/var/www/modsecurity.org/htdocs
/var/www/modsecurity.org/logs

Before the installation can take place Apache must be made aware of its environment. This is done through the configure script:

$ ./configure --prefix=/usr/local/apache

The configure script examines your operating system and creates a Makefile tailored to it. You can then execute the following commands to compile Apache, copy the files into the directory set by the --prefix option, and start the server via the apachectl script:

$ make
# make install
# /usr/local/apache/bin/apachectl start

Though this will install and start Apache, you also need to configure your operating system to start Apache when it boots. The procedure differs from system to system on Unix platforms but is usually done by creating a symbolic link to the apachectl script for the relevant runlevel (servers typically use run level 3):

# cd /etc/rc3.d
# ln -s /usr/local/apache/bin/apachectl S85httpd

On Windows, Apache is configured to start automatically when you install from a binary distribution, but you can do it from a command line by calling Apache with the -k install command switch.

To verify the startup has succeeded, try to access the web server using a browser as a client. If it works, you will see the famous “Seeing this instead of the website you expected?” page, as shown in Figure 2-1. At the time of this writing, there are talks on the Apache developers’ list to reduce the welcome message to avoid confusing users (not administrators but those who stumble on active but unused Apache installations that are publicly available on the Internet).

As a bonus, toward the end of the page, you will find a link to the Apache reference manual. If you are near a computer while reading this book, you can use this copy of the manual to learn configuration directive specifics.

Using the ps tool, you can find out how many Apache processes there are:

$ ps -Ao user,pid,ppid,cmd | grep httpd
root     31738     1  /usr/local/apache/bin/httpd -k start
httpd    31765 31738  /usr/local/apache/bin/httpd -k start
httpd    31766 31738  /usr/local/apache/bin/httpd -k start
httpd    31767 31738  /usr/local/apache/bin/httpd -k start
httpd    31768 31738  /usr/local/apache/bin/httpd -k start
httpd    31769 31738  /usr/local/apache/bin/httpd -k start

Using tail, you can see what gets logged when different requests are processed. Enter a nonexistent filename in the browser location bar and send the request to the web server; then examine the access log (logs are in the /var/www/logs folder). The example below shows successful retrieval (as indicated by the 200 return status code) of a file that exists, followed by an unsuccessful attempt (404 return status code) to retrieve a file that does not exist:

192.168.2.3 - - [21/Jul/2004:17:12:22 +0100] "GET /manual/images/feather.gif
HTTP/1.1" 200 6471
192.168.2.3 - - [21/Jul/2004:17:20:05 +0100] "GET /manual/not-here
HTTP/1.1" 404 311

Here is what the error log contains for this example:

[Wed Jul 21 17:17:04 2004] [notice] Apache/2.0.50 (Unix) configured
-- resuming normal operations
[Wed Jul 21 17:20:05 2004] [error] [client 192.168.2.3] File does not
exist: /usr/local/apache/manual/not-here
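To watch both logs in real time while you send test requests, you can run tail with the -f switch (the paths match the layout used in this chapter):

$ tail -f /var/www/logs/access_log /var/www/logs/error_log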

The idea is to become familiar with how Apache works. As you learn what constitutes normal behavior, you will learn how to spot unusual events.

The theory behind module selection says that the smaller the number of modules running, the smaller the chances of a vulnerability being present in the server. Still, I do not think you will achieve much by being too strict with default Apache modules. The likelihood of a vulnerability being present in the code rises with the complexity of the module. Chances are that the really complex modules, such as mod_ssl (and the OpenSSL libraries behind it), are the dangerous ones.

Your strategy should be to identify the modules you need to have as part of an installation and not to include anything extra. Spend some time researching the modules distributed with Apache so you can correctly identify which modules are needed and which can be safely turned off. The complete module reference is available at http://httpd.apache.org/docs-2.0/mod/.

The following modules are more dangerous than the others, so you should consider whether your installation needs them:

On the other hand, you should include these modules in your installation:

In the configure example, I assumed acceptance of the default module list. In real situations, this should rarely happen as you will want to customize the module list to your needs. To obtain the list of modules activated by default in Apache 1, you can ask the configure script. I provide only a fragment of the output below, as the complete output is too long to reproduce in a book:

$ ./configure --help
...
[access=yes      actions=yes     alias=yes      ]
[asis=yes        auth_anon=no    auth_dbm=no    ]
[auth_db=no      auth_digest=no  auth=yes       ]
[autoindex=yes   cern_meta=no    cgi=yes        ]
[digest=no       dir=yes         env=yes        ]
[example=no      expires=no      headers=no     ]
[imap=yes        include=yes     info=no        ]
[log_agent=no    log_config=yes  log_forensic=no]
[log_referer=no  mime_magic=no   mime=yes       ]
[mmap_static=no  negotiation=yes proxy=no       ]
[rewrite=no      setenvif=yes    so=no          ]
[speling=no      status=yes      unique_id=no   ]
[userdir=yes     usertrack=no    vhost_alias=no ]
...

As an example of interpreting the output, userdir=yes means that the module mod_userdir will be activated by default. Use the --enable-module and --disable-module configure options to adjust the list of modules to be activated:

$ ./configure \
> --prefix=/usr/local/apache \
> --enable-module=rewrite \
> --enable-module=so \
> --disable-module=imap \
> --disable-module=userdir

Obtaining a list of modules activated by default in Apache 2 is more difficult. I obtained the following list by compiling Apache 2.0.49 without passing any parameters to the configure script and then asking the httpd binary to produce a list of modules:

$ ./httpd -l
Compiled in modules:
  core.c
  mod_access.c
  mod_auth.c
  mod_include.c
  mod_log_config.c
  mod_env.c
  mod_setenvif.c
  prefork.c
  http_core.c
  mod_mime.c
  mod_status.c
  mod_autoindex.c
  mod_asis.c
  mod_cgi.c
  mod_negotiation.c
  mod_dir.c
  mod_imap.c
  mod_actions.c
  mod_userdir.c
  mod_alias.c
  mod_so.c

To change the default module list on Apache 2 requires a different syntax than that used on Apache 1:

$ ./configure \
> --prefix=/usr/local/apache \
> --enable-rewrite \
> --enable-so \
> --disable-imap \
> --disable-userdir   

Now that you know your installation works, make it more secure. Being brave, we start with an empty configuration file, and work our way up to a fully functional configuration. Starting with an empty configuration file is a good practice since it increases your understanding of how Apache works. Furthermore, the default configuration file is large, containing the directives for everything, including the modules you will never use. It is best to keep the configuration files nice, short, and tidy.

Start the configuration file (/usr/local/apache/conf/httpd.conf) with a few general-purpose directives:

# location of the web server files
ServerRoot /usr/local/apache
# location of the web server tree
DocumentRoot /var/www/htdocs
# path to the process ID (PID) file, which
# stores the PID of the main Apache process
PidFile /var/www/logs/httpd.pid
# which port to listen at
Listen 80
# do not resolve client IP addresses to names
HostnameLookups Off

Unless told otherwise, Apache will serve any file it can access. This is probably not what most people want; a configuration error could accidentally expose vital system files to anyone caring to look. To change this, we would deny access to the complete filesystem and then allow access to the document root only by placing the following directives in the httpd.conf configuration file:

<Directory />
    Order Deny,Allow
    Deny from all
</Directory>
<Directory /var/www/htdocs>
    Order Allow,Deny
    Allow from all
</Directory>

This sort of protection will not help with incorrectly or maliciously placed symbolic links that point outside the /var/www/htdocs web server root. System users could create symbolic links to resources they do not own. If someone creates such a link and the web server can read the resource, it will accept a request to serve the resource to the public. Symbolic link usage and other file access restrictions are controlled with the Options directive (inside a <Directory> directive). The Options directive can have one or more of the following values:

The following configuration directive will disable symbolic link usage in Apache:

Options -FollowSymLinks

The minus sign before the option name instructs Apache to keep the existing configuration and disable the listed option. The plus character is used to add an option to an existing configuration.

If you need symbolic links consider using the Alias directive, which tells Apache to incorporate an external folder into the web server tree. It serves the same purpose but is more secure. For example, it is used in the default configuration to allow access to the Apache manual:

Alias /manual/ /usr/local/apache/manual/

If you want to keep symbolic links, it is advisable to turn ownership verification on by setting the SymLinksIfOwnerMatch option. After this change, Apache will follow symbolic links only if the link and its target belong to the same user:

Options -FollowSymLinks +SymLinksIfOwnerMatch

Other features you do not want to allow include the ability to have scripts and server-side includes executed anywhere in the web server tree. Scripts should always be placed in special folders, where they can be monitored and controlled.

Options -Includes -ExecCGI

If you do not intend to use content negotiation (to have Apache choose a file to serve based on the client’s language preference), you can (and should) turn all of these features off in one go:

Options None

In addition to serving any file it can access by default, Apache also by default allows parts of configuration data to be placed under the web server tree, in files normally named .htaccess. Configuration information in such files can override the information in the httpd.conf configuration file. Though this can be useful, it slows down the server (because Apache is forced to check whether the file exists in any of the subfolders it serves) and allows anyone who controls the web server tree to have limited control of the web server. This feature is controlled with the AllowOverride directive, which, like Options, appears within the <Directory> directive specifying the directory to which the options apply. The AllowOverride directive supports the following options:

AuthConfig

Allows use (in .htaccess files) of the authorization directives (explained in Chapter 7)

FileInfo

Allows use of the directives controlling document types

Indexes

Allows use of the directives controlling directory indexing

Limit

Allows use of the directives controlling host access

Options

Allows use of the directives controlling specific directory functions (the Options and XbitHack directives)

All

Allows all options listed

None

Ignores .htaccess configuration files

For our default configuration, we choose the None option. So, our <Directory> directives are now:

<Directory />
    Order Deny,Allow
    Deny from all
    Options None
    AllowOverride None
</Directory>
   
<Directory /var/www/htdocs>
    Order Allow,Deny
    Allow from all
</Directory>

Only enable CGI scripts when you need them. When you do, a good practice is to have all scripts grouped in a single folder (typically named cgi-bin). That way you will know what is executed on the server. The alternative solution is to enable script execution across the web server tree, but then it is impossible to control script execution; a developer may install a script you may not know about. To allow execution of scripts in the /var/www/cgi-bin directory, include the following <Directory> directive in the configuration file:

<Directory /var/www/cgi-bin>
    Options ExecCGI
    SetHandler cgi-script
</Directory>

An alternative is to use the ScriptAlias directive, which has a similar effect:

ScriptAlias /cgi-bin/ /var/www/cgi-bin/

There is a subtle but important difference between these two approaches. In the first approach, you are setting the configuration for a directory directly. In the second, a virtual directory is created and configured, and the original directory is still left without a configuration. In the examples above, there is no difference because the names of the two directories are the same, and the virtual directory effectively hides the real one. But if the name of the virtual directory is different (e.g., my-cgi-bin/), the real directory will remain visible under its own name and you would end up with one web site directory where files are treated like scripts (my-cgi-bin/) and with one where files are treated as files (cgi-bin/). Someone could download the source code of all scripts from the latter. Using the <Directory> directive approach is recommended when the directory with scripts is under the web server tree. In other cases, you may use ScriptAlias safely.

Having a record of web server activity is of utmost importance. Logs tell you which content is popular and whether your server is underutilized, overutilized, misconfigured, or misused. This subject is so important that a complete chapter is dedicated to it. Here I will only draw your attention to two details: how to configure logging and how not to lose valuable information. It is not important to understand the meaning of every logging directive at this point. When you are ready, proceed to Chapter 8 for full coverage.

Two types of logs exist. The access log is a record of all requests sent to a particular web server or web site. To create an access log, you need two steps. First, use the LogFormat directive to define a logging format. Then, use the CustomLog directive to create an access log in that format:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /var/www/logs/access_log combined

The error log contains a record of all system events (such as web server startup and shutdown) and a record of errors that occurred during request processing. For example, a request for a resource that does not exist generates an HTTP 404 response for the client, one entry in the access log, and one entry in the error log. Two directives are required to set up the error log, just as for the access log. The following LogLevel directive increases the logging detail from a default value of notice to info. The ErrorLog directive creates the actual log file:

LogLevel info
ErrorLog /var/www/logs/error_log

Though you are not likely to fine-tune the server during installation, you must be aware of the existence of server limits and the way they are configured. Incorrectly configured limits make a web server an easy target for attacks (see Chapter 5). The following configuration directives all show default Apache configuration values and define how long the server will wait for a slow client:

# wait up to 300 seconds for slow clients
TimeOut 300
# allow connections to be reused between requests
KeepAlive On
# allow a maximum of 100 requests per connection
MaxKeepAliveRequests 100
# wait up to 15 seconds for the next
# request on an open connection
KeepAliveTimeout 15

The default value for the connection timeout (300 seconds) is too high. You can safely reduce it below 60 seconds and increase your tolerance against denial of service (DoS) attacks (see Chapter 5).
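For example, to be more aggressive about dropping slow clients (the exact value is a judgment call, not a recommendation from the Apache documentation):

# wait at most 45 seconds for slow clients
TimeOut 45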

The following directives impose limits on various aspects of an HTTP request:

# impose no limits on the request body
LimitRequestBody 0
# allow up to 100 headers in a request
LimitRequestFields 100
# each header may be up to 8190 bytes long
LimitRequestFieldsize 8190
# the first line of the request can be
# up to 8190 bytes long
LimitRequestLine 8190
# limit the XML request body to 1 million bytes (Apache 2.x only)
LimitXMLRequestBody 1000000

LimitXMLRequestBody is an Apache 2 directive and is used by the mod_dav module to limit the size of its command requests (which are XML-based).

Seeing that the maximum size of the request body is unlimited by default (2 GB in practice), you may wish to specify a more sensible value for LimitRequestBody. You can go as low as 64 KB if you do not plan to support file uploads in the installation.
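For example, to apply the 64 KB limit suggested above:

# limit the request body to 64 KB (65536 bytes)
LimitRequestBody 65536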

The following directives control how server instances are created and destroyed in Apache 1 and sometimes in Apache 2 (as described further in the following text):

# keep 5 servers ready to handle requests
MinSpareServers 5
# do not keep more than 10 servers idle
MaxSpareServers 10
# start with 5 servers
StartServers 5
# allow a max of 150 clients at any given time
MaxClients 150
# allow unlimited requests per server
MaxRequestsPerChild 0

You may want to lower the maximum number of clients (MaxClients) if your server does not have enough memory to handle 150 Apache instances at one time.

You should make a habit of putting a limit on the maximum number of requests served by one server instance, which is unlimited by default in Apache 1 (as indicated by the 0 MaxRequestsPerChild value) but set to 10000 in Apache 2. When a server instance reaches the limit, it will be shut down and replaced with a fresh copy. A high value such as 1000 (or even more) will not affect web server operation but will help if an Apache module has a memory leak. Interestingly, when the Keep-Alive feature (which allows many requests to be performed over a single network connection) is used, all requests performed over a single Keep-Alive connection will be counted as one for the purposes of MaxRequestsPerChild handling.
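For example:

# recycle each server process after it serves 1000 requests
MaxRequestsPerChild 1000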

Apache 2 introduces the concept of multiprocessing modules (MPMs), which are special-purpose modules that determine how request processing is organized. Only one MPM can be active at any one time. MPMs were introduced to allow processing to be optimized for each operating system individually. The Apache 1 processing model (multiple processes, no threads, each process handling one request at one time) is called prefork, and it is the default processing model in Apache 2 running on Unix platforms. On Windows, Apache always runs as a single process with multiple execution threads, and the MPM for that is known as winnt. On Unix systems running Apache 2, it is possible to use the worker MPM, which is a hybrid, as it supports many processes each with many threads. For the worker MPM, the configuration is similar to the following (refer to the documentation for the complete description):

# the maximum number of processes
ServerLimit 16
# how many processes to start with
StartServers 2
# how many threads per process to create
ThreadsPerChild 25
# minimum spare threads across all processes
MinSpareThreads 25
# maximum spare threads across all processes
MaxSpareThreads 75
# maximum clients at any given time
MaxClients 150

Since the number of threads per process is fixed, the Apache worker MPM will change the number of active processes to obey the minimum and maximum spare threads configured. Unlike with the prefork MPM, the MaxClients directive now controls the maximum number of active threads at any given time.
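As a quick sanity check on the example above: the hard ceiling on the number of worker threads is ServerLimit x ThreadsPerChild = 16 x 25 = 400, and MaxClients should not be set higher than that ceiling; here it caps the number of simultaneously active threads at 150, comfortably below the maximum the other two directives allow.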

By default, Apache provides several bits of information to anyone interested. Any information obtained by attackers helps them build a better view of the system and makes it easier for them to break into the system.

For example, the installation process automatically puts the email address of the user compiling Apache (or, rather, the email address it thinks is the correct email address) into the configuration file. This reveals the account to the public, which is undesirable. The following directive replaces the Apache-generated email address with a generic address:

ServerAdmin webmaster@apachesecurity.net

By default, the email address defined with this directive appears on server-generated pages. Since this is probably not what you want, you can turn off this feature completely via the following directive:

ServerSignature Off

The HTTP protocol defines a response header field Server, whose purpose is to identify the software responding to the request. By default, Apache populates this header with its name, version number, and names and version numbers of all its modules willing to identify themselves. You can see what this looks like by sending a test request to the newly installed server:

$ telnet localhost 80
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
HEAD / HTTP/1.0
   
HTTP/1.1 200 OK
Date: Fri, 19 Mar 2004 22:05:35 GMT
Server: Apache/1.3.29 (Unix)
Content-Location: index.html.en
Vary: negotiate,accept-language,accept-charset
TCN: choice
Last-Modified: Fri, 04 May 2001 00:00:38 GMT
ETag: "4002c7-5b0-3af1f126;405a21d7"
Accept-Ranges: bytes
Content-Length: 1456
Connection: close
Content-Type: text/html
Content-Language: en
Expires: Fri, 19 Mar 2004 22:05:35 GMT

This header field reveals specific and valuable information to the attacker. You cannot hide it completely through configuration alone (as you will find in the next section, there are other ways), but you can tell Apache to disclose only the name of the server (“Apache”):

ServerTokens ProductOnly

We turned off the directory indexing feature earlier when we set the Options directive to have the value None. Having the feature off by default is a good approach. You can enable it later on a per-directory basis:

<Directory /var/www/htdocs/download>
    Options +Indexes
</Directory>

Automatic directory indexes are dangerous because programmers frequently create folders that have no default indexes. When that happens, Apache tries to be helpful and lists the contents of the folder, often showing the names of files that are publicly available (because of an error) but should not be seen by anyone, such as the following:

To fight the problem of unintentional file disclosure, you should turn off automatic indexing (as described in Section 2.2.3.2) and instruct Apache to reject all requests for files matching the regular expression given below. Similar configuration code exists in the default httpd.conf file to deny access to .htaccess files (the per-directory configuration files I mentioned earlier). The following extends the regular expression to look for various file extensions that should normally not be present on the web server:

<FilesMatch "(^\.ht|~$|\.bak$|\.BAK$)">
    Order Allow,Deny    
    Deny from all
</FilesMatch>

The FilesMatch directive only looks at the last part of the full filename (the basename), and thus, FilesMatch configuration specifications do not apply to directory names. To completely restrict access to a particular directory, for example to deny access to CVS administrative files (frequently found on web sites), use something like:

<DirectoryMatch /CVS/>
    Order Allow,Deny
    Deny from all
</DirectoryMatch>

One of the principles of web server hardening is hiding as much information from the public as possible. By extending the same logic, hiding the identity of the web server makes perfect sense. This subject has caused much controversy. Discussions usually start because Apache does not provide facilities to control all of the content provided in the Server header field, and some poor soul tries to influence Apache developers to add it. Because no clear technical reasons support either opinion, discussions continue.

I have mentioned the risks of providing server information in the Server response header field defined in the HTTP standard, so a first step in our effort to avoid this will be to fake its contents. As you will see later, this is often not straightforward, but it can be done. Suppose we try to be funny and replace our standard response “Apache/1.3.30 (Unix)” with “Microsoft-IIS/5.0” (it makes no difference to us that Internet Information Server has a worse security record than Apache; our goal is to hide who we are). An attacker sees this but sees no trace of Active Server Pages (ASP) on the server, and that makes him suspicious. He decides to employ operating system fingerprinting. This technique uses the variations in the implementations of the TCP/IP protocol to figure out which operating system is behind an IP address. This functionality comes with the popular network scanner NMAP. Running NMAP against a Linux server will sometimes reveal that the server is not running Windows. Microsoft IIS running on a Linux server—not likely!
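The operating system check takes a single command (the -O switch asks Nmap to guess the operating system of the target; the hostname is, of course, hypothetical):

# nmap -O www.example.com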

There are also differences in the implementations of the HTTP protocol supplied by different web servers. HTTP fingerprinting exploits these differences to determine the make of the web server. The differences exist for the following reasons:

The most frequently used example of web server behavior that may allow exploitation is certainly the way Apache treats URL-encoded forward slash characters. Try this:

  • Request a page you know exists and whose URL ends with a forward slash (a directory index such as /manual/, for example), and confirm it works.

  • Request the same URL again, this time with the trailing slash replaced by its URL-encoded form, %2f. Apache rejects encoded slashes in the path and responds with a 404, while most other web servers serve the same page as before.

This happens only if the site runs Apache. In two steps you have determined the make of the web server without looking at the Server header field. Automating this check is easy.

This behavior was so widely and frequently discussed that it led Apache developers to introduce a directive (AllowEncodedSlashes) to the 2.x branch to toggle how Apache behaves. This will not help us much in our continuing quest to fully control the content provided in the Server header field. There is no point in continuing to fight for this. In theory, the only way to hide the identity of the server is to put a reverse proxy (see Chapter 9) in front and instruct it to alter the order of header fields in the response, alter their content, and generally do everything possible to hide the server behind it. Even if someone succeeds at this, this piece of software will be so unique that the attacker will identify the reverse proxy successfully, which is as dangerous as what we have been trying to hide all along.

Not everything is lost, however. You may not be able to transform your installation’s identity, but you can pretend to be, say, a different version of the same web server. Or you can pretend to be a web server with a list of modules different from reality. There is a great opportunity here to mislead the attacker and make him spend a lot of time on the wrong track and, hopefully, give up. To conclude:

  • With a different server name in the Server header field, you can deflect some automated tools that use this information to find servers of certain make.

  • It is possible to fool and confuse a range of attackers with not quite developed skills. Not everyone knows of TCP/IP and HTTP fingerprinting, for example.

  • Small changes can be the most effective.

Now, let’s see how we can hide server information in practice.

The following sections discuss alternative approaches to changing the web server identity.

You can make modifications to change the web server identity in two places in the source code. One is in the include file httpd.h in Apache 1 (ap_release.h in Apache 2) where the version macros are defined:

#define SERVER_BASEVENDOR   "Apache Group"
#define SERVER_BASEPRODUCT  "Apache"
#define SERVER_BASEREVISION "1.3.29"
#define SERVER_BASEVERSION  SERVER_BASEPRODUCT "/" SERVER_BASEREVISION
#define SERVER_PRODUCT  SERVER_BASEPRODUCT
#define SERVER_REVISION SERVER_BASEREVISION
#define SERVER_VERSION  SERVER_PRODUCT "/" SERVER_REVISION

Apache Benchmark recommends that only the value of the SERVER_BASEPRODUCT macro be changed, allowing the other information such as the version number to remain in the code so it can be used later, for example, for web server version identification (by way of code audit, not from the outside). If you decide to follow this recommendation, the ServerTokens directive must be set to ProductOnly, as discussed earlier in this chapter.

The reason Apache Benchmark recommends changing just one macro is because some modules (such as mod_ssl) are made to work only with a specific version of the Apache web server. To ensure correct operation, these modules check the Apache version number (contained in the SERVER_BASEVERSION macro) and refuse to run if the version number is different from what is expected.
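Following that recommendation, the change amounts to editing a single line (the replacement product name below is an arbitrary example):

/* was: #define SERVER_BASEPRODUCT  "Apache" */
#define SERVER_BASEPRODUCT  "Generic-Web-Server"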

A different approach for changing the name in a source file is to replace the ap_set_version() function, which is responsible for construction of the server name in the first place. For Apache 1, replace the existing function (in http_main.c) with one like the following, specifying whatever server name you wish:

static void ap_set_version(void)
{
    /* set the server name */
    ap_add_version_component("Microsoft-IIS/5.0");
    /* do not allow other modules to add to it */
    version_locked++;
}

For Apache 2, replace the function (defined in core.c):

static void ap_set_version(apr_pool_t *pconf)
{
    /* set the server name */
    ap_add_version_component(pconf, "Microsoft-IIS/5.0");
    /* do not allow other modules to add to it */
    version_locked++;
}

Changing the source code can be tiresome, especially if it is done repeatedly. A different approach to changing the name of the server is to use a third-party module, mod_security (described in detail in Chapter 12). For this approach to work, we must allow Apache to reveal its full identity, and then instruct mod_security to change the identity to something else. The following directives can be added to Apache configuration:

# Reveal full identity (standard Apache directive)
ServerTokens Full
# Replace the server name (mod_security directive)
SecServerSignature "Microsoft-IIS/5.0"

Apache modules are not allowed to change the name of the server completely, but mod_security works by finding where the name is kept in memory and overwriting the text directly. The ServerTokens directive must be set to Full to ensure the web server allocates a large enough space for the name, giving mod_security enough space to make its changes later.

The key to changing web server identity is consistency. The trouble we went through to change the web server make may be useless if we leave the default Apache content around. The removal of the default content is equivalent to changing one’s clothes when going undercover. This action may be useful even if we do not intend to change the server identity. Applications often come with sample programs and, as a general rule, it is a good practice to remove them from production systems; they may contain vulnerabilities that may be exploited later.

Most of the default content is out of reach of the public, since we have built our Apache from scratch, changed the root folder of the web site, and have not included aliases for the manual and the icons. Just to be thorough, erase the following directories:

You will probably want to keep the original /usr/local/apache/logs directory though the logs are stored in /var/www/logs. This is because many modules use the logs/ folder relative to the Apache installation directory to create temporary files. These modules usually offer directives to change the path they use, but some may not. The only remaining bit of default content is the error pages Apache displays when errors occur. These pages can be replaced with the help of the ErrorDocument directive. Using one directive per error code, replace the error pages for all HTTP error codes. (A list of HTTP codes is given in Chapter 8; it can also be found at http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html.)

ErrorDocument 401 /error/401.html
ErrorDocument 403 /error/403.html
ErrorDocument 404 /error/404.html
ErrorDocument 500 /error/500.html
...

An alternative to creating dozens of static pages is to create one intelligent script that retrieves the error code from Apache and uses it to display the appropriate message. A small bit of programming is required in this case, following guidance from the Apache documentation at http://httpd.apache.org/docs-2.0/custom-error.html.
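Here is a minimal sketch of such a handler, written in C only to stay consistent with the other code in this chapter (a few lines of shell, Perl, or PHP would do equally well). It relies on the REDIRECT_STATUS environment variable, which Apache sets when it invokes an ErrorDocument target, and it assumes a hypothetical directive such as ErrorDocument 404 /cgi-bin/error.cgi pointing at the compiled binary:

/* error.c: generic error-handler CGI (sketch only) */
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    /* Apache stores the original response code in REDIRECT_STATUS */
    const char *status = getenv("REDIRECT_STATUS");
    if (status == NULL)
        status = "500";

    /* propagate the original status code back to the client */
    printf("Status: %s\r\n", status);
    printf("Content-Type: text/html\r\n\r\n");

    printf("<html><body><h1>Error %s</h1>\n", status);
    printf("<p>The server could not process your request.</p>\n");
    printf("</body></html>\n");

    return 0;
}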

Even the most secure software installations get broken into. Sometimes, this is because you get the attention of a skilled and persistent attacker. Sometimes, a new vulnerability is discovered, and an attacker uses it before the server is patched. Once an intruder gets in, his next step is to look for a local vulnerability that will allow him to become superuser. When this happens, the whole system becomes contaminated, and the only solution is to reinstall everything.

Our aim is to contain the intrusion to just a part of the system, and we do this with the help of the chroot(2) system call. This system call allows restrictions to be put on a process, limiting its access to the filesystem. It works by choosing a folder to become the new filesystem root. Once the system call is executed, a process cannot go back (in most cases, and provided the jail was properly constructed).
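To illustrate what the call does, here is a minimal C sketch. It assumes it is run as root and that the jail directory (/chroot, the one built later in this section) already exists; a real jail requires far more preparation than this:

/* jail_demo.c: minimal illustration of the chroot(2) system call */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    /* make /chroot the new filesystem root for this process */
    if (chroot("/chroot") != 0) {
        perror("chroot");
        return 1;
    }
    /* step into the new root so nothing points outside the jail */
    if (chdir("/") != 0) {
        perror("chdir");
        return 1;
    }
    /* from this point on, "/" refers to /chroot; replace the
       process with a program that exists inside the jail */
    execl("/bin/bash", "bash", (char *)NULL);
    perror("execl");  /* reached only if the exec fails */
    return 1;
}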

The term chroot is often used interchangeably with the term jail, both as a verb and a noun. If you say Apache is chrooted, for example, you are saying that Apache was put in jail, typically via use of the chroot binary or the chroot(2) system call. On Linux systems, the meanings of chroot and jail are close enough. BSD systems have a separate jail() call, which implements additional security mechanisms. For more details about the jail() call, see the following: http://docs.freebsd.org/44doc/papers/jail/jail.html.

Incorporating the jail mechanism (using either chroot(2) or jail()) into your web server defense gives the following advantages:

The chroot(2) call was not originally designed as a security measure. Its use for security is essentially a hack, and will be replaced as server virtualization technologies advance. For Linux, that will happen once these efforts become part of a mainstream kernel. Though server virtualization falls outside the scope of this book, some information on this subject is provided in Chapter 9.

The following sections describe various approaches to putting Apache in jail. First, an example demonstrating use of the original chroot binary to put a process in jail is shown. That example demonstrates the issues that typically come up when attempting to put a process in jail and briefly documents tools that are useful for solving these issues. Next, the steps required for creating a jail and putting Apache in it using chroot are shown. This is followed by the simpler chroot(2) approach, which can be used in some limited situations. Finally, the use of mod_security or mod_chroot to chroot Apache is presented.

Before you venture into chroot land you must become aware of several tools and techniques you will need to make things work and to troubleshoot problems when they appear. The general problem you will encounter is that programs do not expect to be run without full access to the filesystem. They assume certain files are present and they do not check error codes of system calls they assume always succeed. As a result, these programs fail without an error message. You must use diagnostic tools such as those described below to find out what has gone wrong.

The ldd tool—available by default on all Unix systems—prints shared library dependencies for a given binary. Most binaries are compiled to depend on shared libraries and will not work without them. Using ldd with the name of a binary (or another shared library) as the first parameter gives a list of files that must accompany the binary to work. Trying ldd on /bin/bash gives the following output:

# ldd /bin/bash
        libtermcap.so.2 => /lib/libtermcap.so.2 (0x0088a000)
        libdl.so.2 => /lib/libdl.so.2 (0x0060b000)
        libc.so.6 => /lib/tls/libc.so.6 (0x004ac000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x00494000)

Therefore, bash depends on four shared libraries. Create copies of these files in jail:

# mkdir /chroot/lib
# cp /lib/libtermcap.so.2 /chroot/lib
# cp /lib/libdl.so.2 /chroot/lib
# cp /lib/tls/libc.so.6 /chroot/lib
# cp /lib/ld-linux.so.2 /chroot/lib

The jailed execution of a bash shell will finally succeed:

# chroot /chroot /bin/bash
bash-2.05b#

You are rewarded with a working shell prompt. You will not be able to do much from it, however. Though the shell works, none of the binaries you would normally use are available inside the jail (ls, for example). You can only use the built-in shell commands, as can be seen in this example:

bash-2.05b# pwd
/
bash-2.05b# echo /*
/bin /lib
bash-2.05b# echo /bin/*
/bin/bash
bash-2.05b# echo /lib/*
/lib/ld-linux.so.2 /lib/libc.so.6 /lib/libdl.so.2 /lib/libtermcap.so.2

As the previous example demonstrates, from a jailed shell you can access a few files you explicitly copied into the jail and nothing else.

The strace tool (truss on systems other than Linux) intercepts and records system calls that are made by a process. It gives much insight into how programs work, without access to the source code. Using chroot and ldd, you will be able to get programs to run inside a jail, but you will need strace to figure out why they fail when there is no error message, or when the error message does not indicate the real cause of the problem. For that reason, you will often need strace inside the jail itself. (Remember to remove it afterwards.)

Using strace you will find that many innocent looking binaries do a lot of work before they start. If you want to experiment, I suggest you write a simple program such as this one:

#include <stdio.h>
#include <stdarg.h>
   
int main(void) {
    puts("Hello world!");
}

Compile it once normally (dynamically linked against shared libraries) and once statically:

# gcc helloworld.c -o helloworld.shared
# gcc helloworld.c -o helloworld.static -static

Using strace on the static version gives the following output:

# strace ./helloworld.static
execve("./helloworld.static", ["./helloworld.static"], [/* 22 vars */]) = 0
uname({sys="Linux", node="ben", ...})   = 0
brk(0)                                  = 0x958b000
brk(0x95ac000)                          = 0x95ac000
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xbf51a000
write(1, "Hello world!\n", 13Hello world!
)          = 13
munmap(0xbf51a000, 4096)                = 0
exit_group(13)

The strace output is ugly. Each line in the output represents a system call made from the process. It is not important at the moment what each line contains. Jailed binaries most often fail because they cannot open a file. If that happens, one of the lines near the end of the output will show the name of the file the binary attempted to access:

open("/usr/share/locale/locale.alias", O_RDONLY) = -1 ENOENT
(No such file or directory)

As an exercise, use strace on the dynamically compiled version of the program and compare the two outputs. You will see how many shared libraries are accessed even from a small program such as this one.

Now that you know the basics of using chroot to put a process in jail and you are familiar with tools required to facilitate the process, we can take the steps required to put Apache in jail. Start by creating a new home for Apache and move the version installed (shown in Section 2.1.4) to the new location:

# mkdir -p /chroot/apache/usr/local
# mv /usr/local/apache /chroot/apache/usr/local
# ln -s /chroot/apache/usr/local/apache /usr/local/apache
# mkdir -p /chroot/apache/var
# mv /var/www /chroot/apache/var/
# ln -s /chroot/apache/var/www /var/www

The symbolic link from the old location to the new one allows the web server to be used with or without being jailed as needed and allows for easy web server upgrades.

Like other programs, Apache depends on many shared libraries. The ldd tool gives their names (this ldd output comes from an Apache that has all default modules built-in statically):

# ldd /chroot/apache/usr/local/apache/bin/httpd
        libm.so.6 => /lib/tls/libm.so.6 (0x005e7000)
        libcrypt.so.1 => /lib/libcrypt.so.1 (0x00623000)
        libgdbm.so.2 => /usr/lib/libgdbm.so.2 (0x00902000)
        libexpat.so.0 => /usr/lib/libexpat.so.0 (0x00930000)
        libdl.so.2 => /lib/libdl.so.2 (0x0060b000)
        libc.so.6 => /lib/tls/libc.so.6 (0x004ac000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x00494000)

This is a long list; we make copies of these libraries in the jail:

# mkdir /chroot/apache/lib
# cp /lib/tls/libm.so.6 /chroot/apache/lib
# cp /lib/libcrypt.so.1 /chroot/apache/lib
# cp /usr/lib/libgdbm.so.2 /chroot/apache/lib
# cp /usr/lib/libexpat.so.0 /chroot/apache/lib
# cp /lib/libdl.so.2 /chroot/apache/lib
# cp /lib/tls/libc.so.6 /chroot/apache/lib
# cp /lib/ld-linux.so.2 /chroot/apache/lib

Most CGI scripts send email using the sendmail binary. That will not work in our jail since the sendmail binary isn’t there. Adding the complete sendmail installation to the jail would defy the very purpose of having a jail in the first place. If you encounter this problem, consider installing mini_sendmail (http://www.acme.com/software/mini_sendmail/), a sendmail replacement specifically designed for jails. Most programming languages come with libraries that allow email to be sent directly to an SMTP server. PHP can send email directly, and from Perl you can use the Mail::Sendmail library. Using these libraries reduces the number of packages that are installed in a jail.

You will probably encounter database connectivity problems when scripts in jail try to connect to a database engine running outside the jail. This happens if the program is using localhost as the host name of the database server. When a database client library sees localhost, it tries to connect to the database using a Unix domain socket. This socket is a special file usually located in /tmp, /var/run, or /var/lib, all outside the jail. One way to get around this is to use 127.0.0.1 as the host name and force the database client library to use TCP/IP. However, since a performance penalty is involved with that solution (Unix domain socket communication is much faster than communication over TCP/IP), a better way would be to have the socket file in the jail.

For PostgreSQL, find the file postgresql.conf (usually in /var/lib/pgsql/data) and change the line containing the unix_socket_directory directive to read:

unix_socket_directory = '/chroot/apache/tmp'

Create a symbolic link from the previous location to the new one:

# ln -s /chroot/apache/tmp/.s.PGSQL.5432 /tmp

MySQL keeps its configuration options in a file called my.cnf, usually located in /etc. In the same file, you can add a client section (if one is not there already) and tell clients where to look for a socket:

[mysqld]
datadir=/var/lib/mysql
socket=/chroot/apache/var/lib/mysql/mysql.sock
   
[client]
socket=/chroot/apache/var/lib/mysql/mysql.sock

Or, just as you did with PostgreSQL, create a symbolic link:

# mkdir -p /chroot/apache/var/lib/mysql
# chown mysql /chroot/apache/var/lib/mysql/
# ln -s /chroot/apache/var/lib/mysql/mysql.sock /var/lib/mysql

Now that I have explained the manual chroot process, you are probably wondering if an easier way exists. The answer is, conditionally, yes.

The approach so far was to create the jail before the main process was started. For this approach to work, the jail must contain all shared libraries and files the process requires. This approach is also known as an external chroot.

With an internal chroot, the jail is established from within the process after the process initialization is completed. In the case of Apache, the jail must be created before request processing begins, at the latest. The process is born free and then jailed. Since the process has full access to the filesystem during the initialization phase, it is free to access any files it needs. Because of the way chrooting works, descriptors to files opened before the call remain valid afterward. Therefore, we do not have to create a copy of the filesystem, and we can have a “perfect” jail: one that contains only the files needed for web serving, the files in the web server tree.

It is obvious that internal chrooting is not a universal solution. It works only if the following is true:

Now that I have lured you into thinking you can get away from the hard labor of chrooting, I will have to disappoint you: Apache does not support internal chrooting natively. But help comes from Arjan de Vet in the form of a chroot(2) patch. It is available for download from http://www.devet.org/apache/chroot/. After the patch is applied to the source code, Apache will support a new directive, ChrootDir. Chrooting Apache can be as easy as supplying the new root of the filesystem as the first parameter to ChrootDir. The record of a successful chroot(2) call will appear in the error log.
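With the patch applied and the jail from this chapter in place, a single configuration line would do the job:

ChrootDir /chroot/apache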

As a downside, you will have to apply the patch every time you install Apache. And there is the problem of finding the patch for the version of Apache you want to install. At the time of this writing only the patch for Apache 1.3.31 is available. But not everything is lost.

In a saga with more twists than a soap opera, I will describe a third way to jail Apache. Provided the limitations described in the previous section are acceptable to you, this method is the simplest: chrooting using mod_security (http://www.modsecurity.org) or mod_chroot (http://core.segfault.pl/~hobbit/mod_chroot/). Both modules use the same method to do their work (at the time of this writing) so I will cover them in this section together. Which module you will use depends on your circumstances. Use mod_security if you have a need for its other features. Otherwise, mod_chroot is likely to be a better choice because it only contains code to deal with this one feature and is, therefore, less likely to have a fault.

The method these two modules use to perform chrooting is a variation of the chroot(2) patch. Thus, the discussion about the usefulness of the chroot(2) patch applies to this case. The difference is that here the chroot(2) call is made from within the Apache module (mod_security or mod_chroot), avoiding a need to patch the Apache source code. And it works for 1.x and 2.x branches of the server. As in the previous case, there is only one new directive to learn: SecChrootDir for mod_security or ChrootDir for mod_chroot. Their syntaxes are the same, and they accept the name of the root directory as the only parameter:

SecChrootDir /chroot/apache

The drawback of working from inside the module is that it is not possible to control exactly when the chroot call will be executed. But, as it turns out, it is possible to successfully perform a chroot(2) call if the module is configured to initialize last.