Access control is an important part of security and is its most visible aspect, leading people to assume it is security. You may need to introduce access control to your system for several reasons. The first and most obvious is to allow some people to see (or do) what you want them to see or do while keeping others out. However, you must also know who did what and when, so that users can be held accountable for their actions.
This chapter covers the following:
Access control concepts
HTTP authentication protocols
Form-based authentication as an alternative to HTTP-based authentication
Access control mechanisms built into Apache
Single sign-on
Access control concerns itself with restricting access to authorized persons and with establishing accountability. There are four terms that are commonly used in discussions related to access control:
Identification
Process in which a user presents his identity
Authentication
Process of verifying the user is allowed to access the system
Authorization
Process of verifying the user is allowed to access a particular resource
Accountability
Ability to tell who accessed a resource and when, and whether the resource was modified as part of the access
System users rarely encounter accountability, and from their point of view the rest of the processes can appear to be a single step. When working as a system administrator, however, it is important to distinguish which operation is performed in which step and why. I have been very careful to word the definitions to reflect the true meanings of these terms.
Identification is the easiest process to describe. When required, users present their credentials so subsequent processes to establish their rights can begin. In real life, this is the equivalent of showing a pass upon entering a secure area.
The right of the user to access the system is established in the authentication step. This part of the process is often viewed as establishing someone’s identity but, strictly speaking, this is not the case. Several types of information, called factors, are used to make the decision:
A Type 1 factor is the most commonly used: the user is required to demonstrate knowledge of some information—e.g., a password, passphrase, or PIN code.
A Type 2 factor requires the user to demonstrate possession of some material access control element, usually a smart card or token of some kind. In a wider sense, this factor can include the time and location attributes of an access request, for example, "Access is allowed from the central office during normal work hours."
Finally, a Type 3 factor treats the user as an access control element through the use of biometrics; that is, physical attributes of a user such as fingerprints, voiceprint, or eye patterns.
The term two-factor authentication describes a system that requires two of the factors to be used as part of the authentication process. For example, to withdraw money from an ATM, you must present your ATM card and know the PIN associated with it.
Before the authorization part of the access control process begins, it is already known who the user is, and that he has the right to be there. For a simple system, this may be enough and the authorization process practically always succeeds. More complex systems, however, consist of many resources and access levels. Within an organization, some users may have access to some resources but not to others. This is a normal operating condition. Therefore, the authorization process looks at the resource and makes a decision whether the user is allowed to access it. The best way to differentiate between authentication and authorization is in terms of what they protect. Authentication protects the system, while authorization protects resources.
Accountability requirements should be considered when deciding how authentication and authorization are going to be performed. For example, if you allow a group of people to access an application using identical credentials, you may achieve the first goal of access control (protecting resources), but you will have no way of knowing who accessed what, though you will know when. So, when someone leaks that confidential document to the public and no one wants to take the blame, the system logs will not help either. (This is why direct root login should never be allowed. Let the users log in as themselves first, and then change into root. That way the log files will contain a reliable access record.)
This section discusses three widely deployed authentication methods:
Basic authentication
Digest authentication
Form-based authentication
The first two are built into the HTTP protocol and defined in RFC 2617, "HTTP Authentication: Basic and Digest Access Authentication" (http://www.ietf.org/rfc/rfc2617.txt). Form-based authentication is a way of moving the authentication problem from a web server to the application.
Other authentication methods exist (Windows NT challenge/response authentication and the Kerberos-based Negotiate protocol), but they are proprietary to Microsoft and of limited interest to Apache administrators.
Authentication methods built into HTTP use headers to send and receive authentication-related information. When a client attempts to access a protected resource, the server responds with a challenge. The response is assigned a 401 HTTP status code, which means that authentication is required. (HTTP uses the word "authorization" in this context, but ignore that for a moment.) In addition to the response code, the server sends a response header, WWW-Authenticate, which includes information about the required authentication scheme and the authentication realm. The realm is a case-insensitive string that uniquely identifies (within the web site) the protected area. Here is an example of an attempt to access a protected resource and the response returned from the server:
$ telnet www.apachesecurity.net 80
Trying 217.160.182.153...
Connected to www.apachesecurity.net.
Escape character is '^]'.
GET /review/ HTTP/1.0
Host: www.apachesecurity.net

HTTP/1.1 401 Authorization Required
Date: Thu, 09 Sep 2004 09:55:07 GMT
WWW-Authenticate: Basic realm="Book Review"
Connection: close
Content-Type: text/html
The first HTTP 401 response returned when a client attempts to access a protected resource is normally not displayed to the user. The browser reacts to such a response by displaying a pop-up window, asking the user to type in the login credentials. After the user enters her username and password, the original request is attempted again, this time with more information.
$ telnet www.apachesecurity.net 80
Trying 217.160.182.153...
Connected to www.apachesecurity.net.
Escape character is '^]'.
GET /review/ HTTP/1.0
Host: www.apachesecurity.net
Authorization: Basic aXZhbnI6c2VjcmV0

HTTP/1.1 200 OK
Date: Thu, 09 Sep 2004 10:07:05 GMT
Connection: close
Content-Type: text/html
The browser has added an Authorization request header, which contains the credentials collected from the user. The first part of the header value contains the authentication scheme (Basic in this case), and the second part contains a base-64 encoded combination of the username and the password. The aXZhbnI6c2VjcmV0 string from the header decodes to ivanr:secret. (To experiment with base-64 encoding, use the online encoder/decoder at http://makcoder.sourceforge.net/demo/base64.php.) Provided valid credentials were supplied, the web server proceeds with the request normally, as if authentication was not necessary.
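The encoding step can be reproduced in a few lines of Python (used here purely for illustration; the browser performs the same transformation internally):

```python
import base64

# Join the username and password with a colon, then base-64 encode,
# exactly as a browser does for the Authorization header.
credentials = base64.b64encode(b"ivanr:secret").decode("ascii")
print("Authorization: Basic " + credentials)   # aXZhbnI6c2VjcmV0

# Decoding is just as easy, which is why Basic authentication
# offers no confidentiality at all: base-64 is an encoding,
# not encryption.
assert base64.b64decode(credentials) == b"ivanr:secret"
```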
Nothing in the HTTP protocol suggests a web server should remember past authentication requests, regardless of whether they were successful. As long as the credentials are missing or incorrect, the web server will keep responding with status 401. This is where some browsers behave differently than others. Mozilla will keep prompting for credentials indefinitely. Internet Explorer, on the other hand, gives up after three times and displays the 401 page it got from the server. Being "logged in" is only an illusion provided by browsers. After one request is successfully authenticated, browsers continue to send the login credentials until the session is over (i.e., the user closes the browser).
Basic authentication is not an ideal authentication protocol. It has a number of disadvantages:
Credentials are transmitted over the wire in plaintext.
There are no provisions for user logout (on user request, or after a timeout).
The login page cannot be customized.
HTTP proxies can extract credentials from the traffic. This may not be a problem in controlled environments where the proxies are trusted, but it is a potential problem in general, when proxies cannot be trusted.
An attempt to solve some of these problems was made with the addition of Digest authentication to the HTTP protocol.
The major purpose of Digest authentication is to allow authentication to take place without sending user credentials to the server in plaintext. Instead, the server sends the client a challenge. The client responds to the challenge by computing a hash of the challenge and the password, and sends the hash back to the server. The server uses the response to determine if the client possesses the correct password.
The increased security of Digest authentication makes it more complex, so I am not going to describe it here in detail. As with Basic authentication, it is documented in RFC 2617, which makes for interesting reading. The following is an example of a request successfully authenticated using Digest authentication:
$ telnet www.apachesecurity.net 80
Trying 217.160.182.153...
Connected to www.apachesecurity.net.
Escape character is '^]'.
GET /review/ HTTP/1.1
Host: www.apachesecurity.net
Authorization: Digest username="ivanr", realm="Book Review",
nonce="OgmPjb/jAwA=7c5a49c2ed9416dba1b04b5307d6d935f74a859d",
uri="/review/", algorithm=MD5, response="3c430d26043cc306e0282635929d57cb",
qop=auth, nc=00000004, cnonce="c3bcee9534c051a0"

HTTP/1.1 200 OK
Authentication-Info: rspauth="e18e79490b380eb645a3af0ff5abf0e4",
cnonce="c3bcee9534c051a0", nc=00000004, qop=auth
Connection: close
Content-Type: text/html
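The response field in the exchange above is computed as RFC 2617 describes for qop=auth. The following Python sketch shows the calculation; the password ("secret") is an assumed value for illustration, so the resulting hash will not match the captured exchange unless that happens to be the real password:

```python
import hashlib

def md5_hex(data: str) -> str:
    """MD5 digest rendered as a lowercase hex string, as RFC 2617 requires."""
    return hashlib.md5(data.encode("utf-8")).hexdigest()

def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    # HA1 is also the form htdigest stores in the password file,
    # which is why the stored form is tied to one realm.
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")

resp = digest_response("ivanr", "Book Review", "secret", "GET", "/review/",
                       "OgmPjb/jAwA=7c5a49c2ed9416dba1b04b5307d6d935f74a859d",
                       "00000004", "c3bcee9534c051a0")
print(resp)  # a 32-character hexadecimal digest
```

The password itself never travels over the wire; only this hash does, bound to the server-chosen nonce.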
Though Digest authentication succeeds in its goal, its adoption on the server side and on the client side was (is) very slow, most likely because it was never deemed significantly better than Basic authentication. It took years for browsers to start supporting it fully. In Apache, the mod_auth_digest module used for Digest authentication (described later) is still marked "experimental." Consequently, it is rarely used today.
Digest authentication suffers from several weaknesses:
Though user passwords are stored in a form that prevents an attacker from extracting the actual passwords even if he has access to the password file, the stored form itself can be used to authenticate against a Digest authentication-protected area.
Because the realm name is used to convert the password into a form suitable for storing, Digest authentication requires one password file to exist for each protection realm. This makes user database maintenance much more difficult.
Though user passwords cannot be extracted from the traffic, the attacker can deploy what is called a “replay attack” and reuse the captured information to access the authenticated areas for a short period of time. How long the attacker can do so depends on the server configuration. With a default Apache configuration, the maximum duration is five minutes.
The most serious problem is that Digest authentication simply does not solve the root issue. Though the password is somewhat protected (admittedly, that can be important in some situations), an attacker who can listen to the traffic can read the traffic directly and extract resources from there.
Engaging in secure, authenticated communication when using an unencrypted channel is impossible. Once you add SSL to the server (see Chapter 4), it corrects most of the problems people have had with Basic authentication. If using SSL is not an option, then deployment of Digest authentication is highly recommended. There are many freely available tools that allow almost anyone (since no technical knowledge is required) to automatically collect Basic authentication passwords from the traffic flowing on the network. But I haven’t seen any tools that automate the process of performing a replay attack when Digest authentication is used. The use of Digest authentication at least raises the bar to require technical skills on the part of the attacker.
There is one Digest authentication feature that is very interesting: server authentication. As of RFC 2617 (which obsoletes RFC 2069), clients can use Digest authentication to verify that the server does know their password. It sounds as though widespread use of Digest authentication could help the fight against the numerous phishing attacks that take place on the Internet today (see Chapter 10).
In addition to the previously mentioned problems with HTTP-based authentication, there are further issues:
HTTP is a stateless protocol. Therefore, applications must add support for sessions so that they can remember what the user did in previous requests.
HTTP has no provisions for authorization. Even if it had, it would only cover the simplest cases since authorization is usually closely integrated with the application logic.
Programmers, responsible for development and maintenance of applications, often do not have sufficient privileges to do anything related to the web servers, which are maintained by system administrators. This has prompted programmers to resort to using the authentication techniques they can control.
Having authentication performed on the web-server level and authorization on the application level complicates things. Furthermore, there are no APIs developers could use to manage the password database.
Since applications must invest significant resources for handling sessions and authorization anyway, it makes sense to shift the rest of the responsibility their way. This is what form-based authentication does. As a bonus, the boundary between programmers’ and system administrators’ responsibilities is better defined.
Form-based authentication is not a protocol since every application is free to implement access control any way it chooses (except in the Java camp, where form-based authentication is a part of the Servlets specification). In response to a request from a user who has not yet authenticated herself, the application sends a form (hence the name form-based) such as the one created by the following HTML:
<form action="/login.php" method="POST">
<input type="text" name="username"><br>
<input type="password" name="password"><br>
<input type="submit" value="Submit"><br>
</form>
The user is expected to fill in appropriate username and password values and select the Submit button. The script login.php then examines the username and password parameters and decides whether to let the user in or send her back to the login form.
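The server-side logic behind such a script can be sketched as follows. This is a Python stand-in for the hypothetical login.php; the credential store and the redirect targets are illustrative assumptions, and a real application would store password hashes in a database, never plaintext:

```python
# Hypothetical credential store for the sketch only; real code
# would look hashes up in a database, not keep plaintext in source.
USERS = {"ivanr": "secret"}

def handle_login(form: dict) -> str:
    """Examine the submitted username/password parameters and decide
    where to send the user next: the protected area on success, the
    login form again on failure."""
    username = form.get("username", "")
    password = form.get("password", "")
    if USERS.get(username) == password:
        return "/review/"       # let the user in
    return "/login.html"        # back to the login form
```

A successful login would also create a session (a cookie-backed server-side record), which is exactly the burden form-based authentication shifts onto the application.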
HTTP-based authentication does not necessarily need to be implemented on the web server level. Applications can use it for their purposes. However, since that approach has limitations, most applications implement their own authentication schemes. This is unfortunate because most developers are not security experts, and they often design inadequate access control schemes, which lead to insecure applications.
Authentication features built into Apache (described below) are known to be secure because they have stood the test of time. Users (and potential intruders) are not allowed to interact with an application if they do not authenticate themselves first. This can be a great security advantage. When authentication takes place at the application level (instead of the web-server level), the intruder has already passed one security layer (that of the web server). Applications are often given far less testing than the web server and potentially contain more security issues. Some files in the application, for example, may not be protected at all. Images are almost never protected. Often applications contain large amounts of code that are executed prior to authentication. The chances of an intruder finding a hole are much higher when application-level authentication is used.
Out of the box, Apache supports the Basic and Digest authentication protocols with a choice of plaintext or DBM files (documented in a later section) as backends. (Apache 2 also includes the mod_auth_ldap module, but it is considered experimental.)
The way authentication is internally handled in Apache changed dramatically in the 2.1 branch. (In the Apache 2 branch, odd-numbered releases are development versions. See http://cvs.apache.org/viewcvs.cgi/httpd-2.0/VERSIONING?view=markup for more information on the new Apache versioning rules.) Many improvements are being made with little impact to the end users. For more information, take a look at the web site of the 2.1 Authentication Project at http://mod-auth.sourceforge.net.
Outside Apache, many third-party authentication modules enable authentication against LDAP, Kerberos, various database servers, and every other system known to man. If you have a special need, the Apache module repository at http://modules.apache.org is the first place to look.
The easiest way to add authentication to Apache configuration is to use mod_auth, which is compiled in by default and provides Basic authentication using plaintext password files as the authentication source.
You need to create a password file using the htpasswd utility (found in the Apache /bin folder after installation). You can keep it anywhere you want, but ensure it is out of reach of other system users. I tend to keep the password file in the same place where I keep the Apache configuration so it is easier to find:

# htpasswd -c /usr/local/apache/conf/auth.users ivanr
New password: ******
Re-type new password: ******
Adding password for user ivanr
This utility expects a path to a password file as its first parameter and the username as its second. The first invocation requires the -c switch, which instructs the utility to create a new password file if it does not exist. A look into the newly created file reveals a very simple structure:

# cat /usr/local/apache/conf/auth.users
ivanr:EbsMlzzsDXiFg
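The format is simple enough to parse by hand: one username:hash pair per line. The sketch below (Python, for illustration only) loads such a file into a dictionary; note that Apache verifies a login by running the submitted password through crypt(3), using the stored hash as the salt, and comparing the result to the stored hash:

```python
def load_htpasswd(path: str) -> dict:
    """Parse an Apache htpasswd-style file into {username: password_hash}.
    Blank lines and comment lines are skipped."""
    users = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            username, _, password_hash = line.partition(":")
            users[username] = password_hash
    return users
```

This also makes clear why lookups in large plaintext files are slow: every request forces a sequential scan of the file.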
You need the htpasswd utility to encrypt the passwords since storing passwords in plaintext is a bad idea. For all other operations, you can use your favorite text editor. In fact, you must use a text editor because htpasswd provides no features to rename accounts, and most versions do not support deletion of user accounts. (The Apache 2 version of the htpasswd utility does allow you to delete a user account with the -D switch.)
To password-protect a folder, add the following to your Apache configuration, replacing the folder, realm, and user file specifications with values relevant for your situation:
<Directory /var/www/htdocs/review/>
    # Choose authentication protocol
    AuthType Basic
    # Define the security realm
    AuthName "Book Review"
    # Location of the user password file
    AuthUserFile /usr/local/apache/conf/auth.users
    # Valid users can access this folder and no one else
    Require valid-user
</Directory>
After you restart Apache, access to the folder will require valid login credentials.
Using one password file per security realm may work fine in simpler cases but does not work well when users are allowed access to some realms but not others. Changing passwords for such users would require changes to all the password files they belong to. A better approach is to have only one password file. The Require directive can then be used to grant access only to named users:
# Only the book reviewers can access this folder
Require user reviewer1 reviewer2 ivanr
But this method can get out of hand as the number of users and realms rises. A better solution is to use group membership as the basis for authorization. Create a group file, such as /usr/local/apache/conf/auth.groups, containing a group definition such as the following:

reviewers: reviewer1 reviewer2 ivanr
Then change the configuration to reference the file and require membership in the group reviewers in order to allow access:
<Directory /var/www/htdocs/review/>
    AuthType Basic
    AuthName "Book Review"
    AuthUserFile /usr/local/apache/conf/auth.users
    # Location of the group membership file
    AuthGroupFile /usr/local/apache/conf/auth.groups
    # Only the book reviewers can access this folder
    Require group reviewers
</Directory>
Looking up user accounts in plaintext files can be slow, especially when the number of users grows over a couple of hundred. The server must open and read the file sequentially until it finds a matching username, and it must repeat this process on every request. The mod_auth_dbm module also performs Basic authentication, but it uses efficient DBM files to store user account data. DBM files are simple databases, and they allow usernames to be indexed, enabling quick access to the required information. Since mod_auth_dbm is not compiled in by default, you will have to recompile Apache to use it. Using mod_auth_dbm directives instead of mod_auth ones in the previous example gives the following:
<Directory /var/www/htdocs/review/>
    AuthType Basic
    AuthName "Book Review"
    AuthDBMUserFile /usr/local/apache/conf/auth.users.dat
    # Location of the group membership file. Yes,
    # it points to the same file as the password file.
    AuthDBMGroupFile /usr/local/apache/conf/auth.users.dat
    # Only the book reviewers can access this folder
    Require group reviewers
</Directory>
The directive names are almost the same. I added the .dat extension to the password and group files to avoid confusion. Since DBM files cannot be edited directly, you will need to use the dbmmanage utility to manage the password and group files. (The file will be created automatically if it does not exist.) The following adds a user ivanr, a member of the group reviewers, to the file auth.users.dat. The dash after the username tells the utility to prompt for the password.

# dbmmanage /usr/local/apache/conf/auth.users.dat adduser ivanr - reviewers
New password: ******
Re-type new password: ******
User ivanr added with password encrypted to 9yWQZ0991uFnc:reviewers using crypt
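The idea behind the DBM backend can be demonstrated with Python's dbm module. As the dbmmanage output above shows, the value stored under the username key is the encrypted password and the group list, separated by a colon. The dbm.dumb flavor is used here only so the sketch is portable; Apache itself would use SDBM, GDBM, or NDBM:

```python
import dbm.dumb
import os
import tempfile

# Create a small DBM "user database" the way dbmmanage would:
# key = username, value = encrypted-password:group-list.
path = os.path.join(tempfile.mkdtemp(), "auth.users")
with dbm.dumb.open(path, "c") as db:
    db[b"ivanr"] = b"9yWQZ0991uFnc:reviewers"

# Lookup is a direct, keyed fetch -- no sequential scan of a text
# file, which is the whole point of using DBM files.
with dbm.dumb.open(path, "r") as db:
    pwhash, _, groups = db[b"ivanr"].decode().partition(":")
    print(pwhash, groups)   # 9yWQZ0991uFnc reviewers
```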
When using DBM files for authentication, you may encounter a situation where dbmmanage creates a DBM file of one type while Apache expects a DBM file of another type. This happens because Unix systems often support several DBM formats: dbmmanage determines which format it is going to use at runtime, while Apache determines the default expected format at compile time. Neither of the two tools is smart enough to figure out the format of the file it is given. If your authentication is failing and you find a message in the error log stating mod_auth_dbm cannot find the DBM file and you know the file is there, use the AuthDBMType directive to set the DBM file format (try any of the following settings: SDBM, GDBM, NDBM, or DB).
The use of Digest authentication requires the mod_auth_digest module to be compiled into Apache. From an Apache administrator's point of view, Digest authentication is not at all difficult to use. The main difference with Basic authentication is the use of a new directive, AuthDigestDomain. (There are many other directives, but they control the behavior of the Digest authentication implementation.) This directive accepts a list of URLs that belong to the same protection space.
<Directory /var/www/htdocs/review/>
    AuthType Digest
    AuthName "Book Review"
    AuthDigestDomain /review/
    AuthDigestFile /usr/local/apache/conf/auth.users.digest
    Require valid-user
</Directory>
The other difference is that a separate utility, htdigest, must be used to manage the password database. As mentioned earlier, Digest authentication forces you to use one password database per protection space. Without a single user database for the whole server, the AuthDigestGroupFile directive is much less useful. (You can have user groups, but you can only use them within one realm, which may happen, but only rarely.) Here is an example of using htdigest to create the password database and add a user:
# htdigest -c /usr/local/apache/conf/auth.users.digest "Book Review" ivanr
Adding password for ivanr in realm Book Review.
New password: ******
Re-type new password: ******
The combination of any of the authentication methods covered so far and SSL encryption provides a solid authentication layer for many applications. However, that is still one-factor authentication. A common choice when two-factor authentication is needed is to use private client certificates. To authenticate against such a system, you must know a password (the client certificate passphrase, a Type 1 factor) and possess the certificate (a Type 2 factor).
Chapter 4 discusses cryptography, SSL, and client certificates. Here, I bring a couple of authentication-related points to your attention. Only two directives are needed to start asking clients to present their private certificates provided everything else SSL-related has been configured:
SSLVerifyClient require
SSLVerifyDepth 1
This, combined with the use of the SSLRequireSSL directive to enforce SSL-only access for a host or a directory, will ensure only strong authentication takes place.
The SSLRequire directive allows fine-grained access control using arbitrarily complex boolean expressions and any of the Apache environment variables. The following (added to a directory context somewhere) will limit access to a web site to customer services staff, and only during business hours:
SSLRequire ( %{SSL_CLIENT_S_DN_OU} eq "Customer Services" ) and \
           ( %{TIME_WDAY} >= 1 and %{TIME_WDAY} <= 5 ) and \
           ( %{TIME_HOUR} >= 8 and %{TIME_HOUR} <= 19 )
SSLRequire works only for SSL-enabled sites. Attempts to use this directive to perform access control for nonencrypted sites will fail silently because the expressions will not be evaluated. Use mod_rewrite for non-SSL sites instead.
The full reference for the SSLRequire directive is available in the Apache documentation at http://httpd.apache.org/docs-2.0/mod/mod_ssl.html#sslrequire.
Network access control is performed with the help of the mod_access module. The directives Allow and Deny are used to allow or deny access to a directory. Each directive takes a hostname, an IP address, or a fragment of either of the two. (Fragments will be taken to refer to many addresses.) A third directive, Order, determines the order in which allow and deny actions are evaluated. This may sound confusing, and it is (it always has been to me), so let us see how it works in practice.
To allow access to a directory from the internal network only (assuming the network uses the 192.168.254.x network range):
<Directory /var/www/htdocs/review/>
    Order Deny,Allow
    Deny from all
    Allow from 192.168.254.
</Directory>
You are not required to use IP addresses for network access control. The following identification formats are allowed:
192.168.254.125
Just one IP address
192.168.254
Whole network segment, one class C network
192.168.254.0/24
Whole network segment, one class C network
192.168.254.0/255.255.255.0
Whole network segment, one class C network
ivanr.apachesecurity.net
Just one IP address, resolved at runtime
.apachesecurity.net
IP address of any host in the domain, resolved at runtime
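The network notations can be checked with Python's ipaddress module, which makes it easy to confirm which client addresses a given Allow or Deny pattern covers. (Only the CIDR and netmask forms map directly; the bare 192.168.254 fragment is Apache-specific syntax.)

```python
import ipaddress

net = ipaddress.ip_network("192.168.254.0/24")

# The prefix-length and netmask notations describe the same network.
assert net == ipaddress.ip_network("192.168.254.0/255.255.255.0")

# Membership tests show which clients a rule would match.
assert ipaddress.ip_address("192.168.254.125") in net
assert ipaddress.ip_address("192.168.1.1") not in net
```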
A performance penalty is incurred when domain names are used for network access control because Apache must perform a reverse DNS lookup to convert the IP address into a name. In fact, Apache will perform another forward lookup to ensure the name points back to the same IP address. This is necessary because sometimes many names are associated with an IP address (for example, in name-based shared hosting).
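The double-lookup logic can be sketched as a small function. The resolver callables are parameters so the sketch stays testable offline; in real use they would wrap socket.gethostbyaddr and socket.gethostbyname_ex:

```python
def hostname_allowed(ip, allowed_suffix, reverse_lookup, forward_lookup):
    """Mimic Apache's paranoid double lookup: reverse-resolve the
    client IP to a name, check the name against the rule, then
    forward-resolve the name and require it to map back to the IP."""
    try:
        hostname = reverse_lookup(ip)
    except OSError:
        return False
    if not hostname.endswith(allowed_suffix):
        return False
    try:
        addresses = forward_lookup(hostname)
    except OSError:
        return False
    # The forward lookup must confirm the reverse lookup; otherwise
    # anyone controlling reverse DNS for their IP range could claim
    # any hostname.
    return ip in addresses

# Fake resolvers standing in for DNS, for demonstration only.
reverse = lambda ip: "ivanr.apachesecurity.net"
forward = lambda name: ["192.168.254.125"]

print(hostname_allowed("192.168.254.125", ".apachesecurity.net",
                       reverse, forward))   # True
```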
Do the following to let anyone but the users from the internal network access the directory:
<Directory /var/www/htdocs/review/>
    Order Allow,Deny
    Allow from all
    Deny from 192.168.254.
</Directory>
The addresses in Allow and Deny directives can overlap. This feature can be used to create exceptions for an IP address or an IP address range, as in the following example, where access is allowed to users from the internal network but is explicitly forbidden to the user whose workstation uses the IP address 192.168.254.125:
<Directory /var/www/htdocs/review/>
    Order Allow,Deny
    Allow from 192.168.254.
    Deny from 192.168.254.125
    # Access will be implicitly denied to requests
    # that have not been explicitly allowed.
</Directory>
With Order set to Allow,Deny, access is denied by default; with Deny,Allow, access is allowed by default.
To make it easier to configure network access control properly, you may want to do the following:

Put the Allow and Deny directives in the order you want them executed. This will not affect the execution order (you control that via the Order directive), but it will give you one less thing to think about.

Use an explicit Allow from all or Deny from all instead of relying on the implicit behavior.

Always test the configuration to ensure it works as expected.
Allow and Deny support a special syntax that can be used to allow or deny access based not on the request IP address but on the information available in the request itself or on the contents of an environment variable. If you have mod_setenvif installed (and you probably do, since it is there by default), you can use the SetEnvIf directive to inspect incoming requests and set an environment variable if certain conditions are met.
In the following example, I use SetEnvIf to set an environment variable whenever the request uses GET or POST. Later, such requests are allowed via the Allow directive:
# Set the valid_method environment variable if
# the request method is either GET or POST
SetEnvIf Request_Method "^(GET|POST)$" valid_method=1

# Then only allow requests that have this variable set
<Directory /var/www/htdocs/review/>
    Order Deny,Allow
    Deny from all
    Allow from env=valid_method
</Directory>
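The pattern given to SetEnvIf is an ordinary anchored regular expression; trying it in Python shows exactly which method names it accepts:

```python
import re

# The same pattern SetEnvIf applies to the Request_Method variable.
valid_method = re.compile(r"^(GET|POST)$")

for method in ("GET", "POST", "TRACE", "GETX"):
    print(method, bool(valid_method.match(method)))
# Only GET and POST match; the ^...$ anchors reject anything
# longer or shorter than the exact method names.
```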
Restricting access to a proxy server is very important if you are running a forward proxy, i.e., when a proxy is used to access other web servers on the Internet. A warning about this fact appears at the beginning of the mod_proxy reference documentation (http://httpd.apache.org/docs-2.0/mod/mod_proxy.html).
Failure to properly secure a proxy will quickly result in spammers abusing the
server to send email. Others will use your proxy to hide their tracks as they
perform attacks against other servers.
In Apache 1, proxy access control is done through a specially named directory (proxy:), using network access control (as discussed in Section 7.3.5):
# Allow forward proxy requests
ProxyRequests On

# Allow access to the proxy only from
# the internal network
<Directory proxy:*>
    Order Deny,Allow
    Deny from all
    Allow from 192.168.254.
</Directory>
In Apache 2, the equivalent <Proxy> directive is used. (Apache 2 also provides the <ProxyMatch> directive, which allows the supplied URL to be an arbitrary regular expression.)
# Allow forward proxy requests
ProxyRequests On

# Allow access to the proxy only from
# the internal network
<Proxy *>
    Order Deny,Allow
    Deny from all
    Allow from 192.168.254.
</Proxy>
Proxying SSL requests requires use of the special CONNECT method, which is designed to allow arbitrary TCP/IP connection tunneling. (See Chapter 11 for examples.) By default, Apache will allow connection tunneling only to target ports 443 (SSL) and 563 (SNEWS). You should not allow other ports to be used (via the AllowCONNECT directive) since that would allow forward proxy users to connect to other services through the proxy.
One consequence of using a proxy server is transfer of trust. Instead of users on the internal network, the target server (or application) is seeing the proxy as the party initiating communication. Because of this, the target may give more access to its services than it would normally do. One common example of this problem is using a forward proxy server to send email. Assuming an email server is running on the same machine as the proxy server, this is how a spammer would trick the proxy into sending email:
POST http://localhost:25/ HTTP/1.0
Content-Length: 120

MAIL FROM: aspammer
RCPT TO: ivanr@webkreator.com
DATA
Subject: Please have some of our spam

Spam, spam, spam...
.
QUIT
This works because SMTP servers are error tolerant. When receiving the above request, the proxy opens a connection to port 25 on the same machine (that is, to the SMTP server) and forwards the request to that server. The SMTP server ignores errors incurred by the HTTP request line and the header that follows and processes the request body normally. Since the body contains a valid SMTP communication, an email message is created and accepted.
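The tolerance at work here can be illustrated with a toy model of an error-tolerant command loop (all names are illustrative; real SMTP servers are stateful, e.g., DATA switches the session into message mode):

```python
def smtp_like_replies(lines):
    """Toy model of an error-tolerant SMTP command loop: unknown input
    (such as the HTTP request line and headers) draws a 500 reply, but
    the session continues, so valid commands that follow still work."""
    known = {"MAIL", "RCPT", "DATA", "QUIT"}
    replies = []
    for line in lines:
        words = line.split()
        verb = words[0].upper() if words else ""
        replies.append("250 OK" if verb in known else "500 Command unrecognized")
    return replies

# The HTTP preamble is rejected, but MAIL FROM is still accepted afterward
print(smtp_like_replies([
    "POST http://localhost:25/ HTTP/1.0",
    "Content-Length: 120",
    "MAIL FROM: aspammer",
]))
```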
Unlike for the CONNECT method, Apache does not offer directives to control target ports for normal forward proxy requests. However, Apache Cookbook (Recipe 10.2) provides a solution for the proxy-sending-email problem in the form of a couple of mod_rewrite rules:

<Proxy *>
    RewriteEngine On
    # Do not allow proxy requests to target port 25 (SMTP)
    RewriteRule "^proxy:[a-z]*://[^/]*:25(/|$)" "-" [F,NC,L]
</Proxy>
I will mention more Apache directives related to access control. Prior to presenting that information, I would like to point out one more thing: many modules other than the ones described in this chapter can also be used to perform access control, even if that isn't their primary purpose. I have used one such module, mod_rewrite, many times in this book to do things that would be impossible otherwise. Some modules are designed to perform advanced access control. This is the case with mod_dosevasive (mentioned in Chapter 5) and mod_security (described in detail in Chapter 12).
The <Limit> and <LimitExcept> directives are designed to perform access control based on the method used in the request. Each method has a different meaning in HTTP. Performing access control based on the request method is useful for restricting usage of some methods capable of making changes to the resources stored on the server. (Such methods include PUT, DELETE, and most of the WebDAV methods.) The possible request methods are defined in the HTTP and the WebDAV specifications. Here are descriptions and access control guidance for some of them:
GET
HEAD
The GET method is used to retrieve the information identified by the request URI. The HEAD method is identical to GET, but the response must not include a body. It should be used to retrieve resource metadata (contained in response headers) without having to download the resource itself. Static web sites need only these two methods to function properly.
POST
The POST method should be used by requests that want to make changes on the server. Unlike the GET method, which does not contain a body, requests that use POST contain a body. Dynamic web applications require the POST method to function properly.
PUT
DELETE
The PUT and DELETE methods are designed to allow a resource to be uploaded to the server or deleted from the server, respectively. Web applications typically do not use these methods, but some client applications (such as Netscape Composer and FrontPage) do. By default Apache is not equipped to handle these requests. The Script directive can be used to redirect requests that use these methods to a custom CGI script that knows how to handle them (for example, Script PUT /cgi-bin/handle-put.pl). For the CGI script to do anything useful, it must be able to write to the web server root.
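Such a handler could be sketched along the following lines (shown in Python rather than the Perl name used above; the environment variables are standard CGI, but everything else, including the complete absence of path validation, marks this as an illustration, not production code):

```python
import os
import sys

def handle_put(environ, body):
    """Write the request body to the file Apache mapped the URI to.
    A real handler must authorize the user and validate the target
    path before writing anything to disk."""
    target = environ.get("PATH_TRANSLATED")
    if not target:
        return "403 Forbidden"
    with open(target, "wb") as f:
        f.write(body)
    return "201 Created"

if __name__ == "__main__":
    # Standard CGI entry point: read the body, emit a minimal response
    length = int(os.environ.get("CONTENT_LENGTH", "0"))
    status = handle_put(os.environ, sys.stdin.buffer.read(length))
    sys.stdout.write("Status: %s\r\nContent-Type: text/plain\r\n\r\n%s\n"
                     % (status, status))
```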
CONNECT
The CONNECT method is only used in a forward proxy configuration and should be disabled otherwise.
OPTIONS
TRACE
The OPTIONS method is designed to enable a client to inquire about the capabilities of a web server (for example, to learn which request methods it supports). The TRACE method is used for debugging. Whenever a TRACE request is made, the web server should respond by putting the complete request (the request line and the headers received from a client) into the response body. This allows the client to see what is being received by the server, which is particularly useful when the client and the server do not communicate directly, but through one or more proxy servers. These two methods are not dangerous, but some administrators prefer to disable them because they send out information that can be abused by an attacker.
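For administrators who do decide to disable TRACE, newer Apache versions (1.3.34 and 2.0.55 onward, if memory serves) provide a dedicated directive; on older servers, a mod_rewrite rule matching the TRACE method achieves a similar effect:

```apache
# Refuse TRACE requests server-wide
TraceEnable off
```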
PROPFIND
PROPPATCH
MKCOL
COPY
MOVE
LOCK
UNLOCK
These methods are all defined in the WebDAV specification and provide the means for a capable client to manipulate resources on the web server, just as it would manipulate files on a local hard disk. These methods are enabled automatically when the WebDAV Apache module is enabled, and are only needed when you want to provide WebDAV functionality to your users. They should be disabled otherwise.
The <Limit> directive allows access control to be performed for known request methods. It is used in the same way as the <Directory> directive is to protect directories. The following example allows only authenticated users to make changes on the server using the PUT and DELETE methods:

<Limit PUT DELETE>
    AuthType Basic
    AuthName "Content Editors Only"
    AuthUserFile /usr/local/apache/conf/auth.users
    Require valid-user
</Limit>
Since the <Limit> directive only works for named request methods, it cannot be used to defend against unknown request methods. This is where the <LimitExcept> directive comes in handy. It does the opposite: it allows anonymous access only for requests using the listed methods, forcing authentication for all others. The following example performs essentially the same function as the previous example but forces authentication for all methods except GET, HEAD, and POST:

<LimitExcept GET HEAD POST>
    AuthType Basic
    AuthName "Content Editors Only"
    AuthUserFile /usr/local/apache/conf/auth.users
    Require valid-user
</LimitExcept>
Authentication-based and network-based access control can be combined with help from the Satisfy configuration directive. This directive can have two values:
Any
If more than one access control mechanism is specified in the configuration, allow access if any of them is satisfied.
All
If more than one access control mechanism is specified in the configuration, allow access only if all are satisfied. This is the default setting.
This feature is typically used to relax access control in some specific cases. For example, a frequent requirement is to allow internal users access to a resource without providing passwords, but to require authentication for requests coming in from outside the organization. This is what the following example does:
<Directory /var/www/htdocs>
    # Network access control
    Order Deny,Allow
    Deny from all
    Allow from 192.168.254.
    # Authentication
    AuthType Basic
    AuthName "Content Editors Only"
    AuthUserFile /usr/local/apache/conf/auth.users
    Require valid-user
    # Allow access if either of the two
    # requirements above is satisfied
    Satisfy Any
</Directory>
Though most authentication examples only show one authentication module in use at a time, you can configure multiple modules to require authentication for the same resource. This is when the order in which the modules are loaded becomes important. The first authentication module initialized will be the first to verify the user’s credentials. With the default configuration in place, the first module will also be the last. However, some (possibly all) authentication modules support an option to allow subsequent authentication modules to attempt to authenticate the user. Authentication delegation happens if the first module processing the request is unable to authenticate the user. In practice, this occurs if the user is unknown to the module. If the username used for the request is known but the password is incorrect, delegation will not happen.
Each module uses a directive with a different name for this option, but the convention is to have the names end in "Authoritative." For example, the AuthAuthoritative directive configures mod_auth, and the AuthDBMAuthoritative directive configures mod_auth_dbm.
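Assuming mod_auth_dbm runs first (remember that module load order, not directive order, decides which module handles the request first), a configuration along these lines would let users missing from the DBM database fall through to mod_auth's plain-text file; the paths are illustrative:

```apache
<Directory /var/www/htdocs/private>
    AuthType Basic
    AuthName "Private Area"
    # Consulted first; unknown users are delegated onward
    AuthDBMUserFile /usr/local/apache/conf/auth.users.dat
    AuthDBMAuthoritative Off
    # Fallback consulted by mod_auth
    AuthUserFile /usr/local/apache/conf/auth.users
    Require valid-user
</Directory>
```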
The term single sign-on (SSO) is used today to refer to several different problems, but it generally refers to a system where people can log in only once and have access to system-wide resources. What people mean when they say SSO depends on the context in which the term is used:
SSO within a single organization
SSO among many related organizations
Internet-wide SSO among unrelated organizations
The term identity management is used to describe the SSO problem from the point of view of those who maintain the system. So what is the problem that makes implementing SSO difficult? Even within a single organization where the IT operations are under the control of a central authority, achieving all business goals by deploying a single system is impossible, no matter how complex the system. In real life, business goals are achieved with the use of many different components. For example, at minimum, every modern organization must enable their users to do the following:
Log on to their workstations
Send email (via an SMTP server)
Read email (via a POP or IMAP server)
In most organizations, this may lead to users having three sets of unrelated credentials, so SSO is not achieved. And I haven’t even started to enumerate all the possibilities. A typical organization will have many web applications (e.g., intranet, project management, content management) and many other network accounts (e.g., FTP servers). As the organization grows, the problem grows exponentially. Maintaining the user accounts and all the passwords becomes a nightmare for system administrators even if users simplify their lives by using a single password for all services. From the security point of view, a lack of central access control leads to complete failure to control access and to be aware of who is doing what with the services. On the other hand, unifying access to resources means that if someone’s account is broken into, the attacker will get access to every resource available to the user. (In a non-SSO system, only one particular service would be compromised.) Imagine only one component that stores passwords insecurely on a local hard drive. Anyone with physical access to the workstation would be able to extract the password from the drive and use it to get access to other resources in the system.
SSO is usually implemented as a central database of user accounts and access privileges (usually one set of credentials per user used for all services). This is easier said than done since many of the components were not designed to play well with each other. In most cases, the SSO problem lies outside the realm of web server administration since many components are not web servers. Even in the web server space, there are many brands (Apache, Microsoft IIS, Java-based web servers) and SSO must work across all of them.
A decent SSO strategy is to use a Lightweight Directory Access Protocol (LDAP) server to store user accounts. Many web servers and other network servers support the use of LDAP for access control. Microsoft decided to use Kerberos (http://web.mit.edu/kerberos/www/) for SSO, but the problem with Kerberos is that all clients must be Kerberos-aware and most browsers still are not. In the Apache space, the mod_auth_kerb module (http://modauthkerb.sourceforge.net) can be configured to use Basic authentication to collect credentials from the user and check them against a Kerberos server, thus making Kerberos work with any browser.
Expanding the scope to include more than one organization brings new problems and makes the task vastly more complex. Microsoft was among the first to attempt to introduce Internet-wide SSO with their Passport program (now called .NET Passport), described at http://www.passport.net. There were many concerns about their implementation, and the fact that Microsoft holds a monopoly on the desktop did not help either. To counter their solution, Sun initiated Project Liberty (http://www.projectliberty.org) and formed an organization called the Liberty Alliance to run it. This organization claims to have more than 150 members.
Solving a web-only SSO problem seems to be easier since there are several freely available solutions. You can find them listed on the home page of the WebISO Working Group (http://middleware.internet2.edu/webiso/). Also of interest is the Shibboleth project (http://shibboleth.internet2.edu), which aims to establish a standard way of sharing resources related to inter-organizational access control.
Implementing a web SSO solution consists of finding and configuring one of the available implementations that suit your requirements. Most web single sign-on solutions work in much the same way:
All web servers are assigned subdomains on the same domain name. For example, valid names could be app1.apachesecurity.net, app2.apachesecurity.net, and login.apachesecurity.net. This is necessary so cookies issued by one web server can be received by some other web server. (Cookies can be reused when the main domain name is the same.)
When a client without a cookie comes to a content server, he is forwarded to the central server for authentication. This way the password is never disclosed to any of the content servers. If the authentication is successful the login server issues a shared authentication cookie, which will be visible to all web servers in the ring. It then forwards the user back to the content server he came from.
When a client with a cookie comes to a content server, the server contacts the login server behind the scenes to verify it. If the cookie is valid, the content server creates a new user session and accepts the user. Alternatively, if the login server has signed the cookie with its private key, the content server can use public-key cryptography to verify the cookie without contacting the login server.
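The signed-cookie variant can be sketched as follows. For brevity, this toy uses an HMAC with a secret shared by the login and content servers instead of the public-key signature described above; the principle, verifying the cookie locally without a round-trip to the login server, is the same. The key and names are illustrative:

```python
import hashlib
import hmac

SHARED_SECRET = b"example-key-shared-by-all-servers"  # illustrative only

def issue_cookie(username):
    """Login server: bind the username to a MAC computed over its value."""
    mac = hmac.new(SHARED_SECRET, username.encode(), hashlib.sha256).hexdigest()
    return "%s:%s" % (username, mac)

def verify_cookie(cookie):
    """Content server: recompute the MAC locally; no call to the login server."""
    username, _, mac = cookie.rpartition(":")
    expected = hmac.new(SHARED_SECRET, username.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac, expected)
```

A real deployment would also embed an expiry time in the signed value so stolen cookies cannot be replayed indefinitely.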
If all you have to worry about is authentication against Apache web servers, a brilliant little module, called mod_auth_remote (see http://puggy.symonds.net/~srp/stuff/mod_auth_remote/), allows authentication (and authorization) to be delegated from one server to another. All you need to do is have a central web server where all authentication will take place (the authentication server) and install mod_auth_remote on all other web servers (which I will refer to as content servers). The approach this module takes is very smart. Not only does it use Basic authentication to receive credentials from clients, it also uses Basic authentication to talk to the central web server behind the scenes. What this means is that there is no need to install anything on the central server, and there are no new configuration directives to learn. At the central server you are free to use any authentication module you like. You can even write an application (say, using PHP) to implement a custom authentication method.
The configuration on a content server looks much like that of any other authentication module:
<Directory /var/www/htdocs/review/>
    AuthType Basic
    AuthName "Book Review"
    AuthRemoteServer sso.apachesecurity.net
    AuthRemotePort 80
    AuthRemoteURL /auth
    Require valid-user
</Directory>
On the central server, you only need to secure one URL. If you need SSO, then you have many servers with many requests; therefore, using mod_auth_dbm to speed up the authentication process seems appropriate here:

<Location /auth>
    AuthType Basic
    AuthName "Central Authentication"
    AuthDBMUserFile /usr/local/apache/conf/auth.users.dat
    Require valid-user
</Location>
At first glance, it looks like this module is only good for authentication, but if you use different remote URLs for different protection realms, the script on the central server can take the URL into account when making the decision as to whether to allow someone access.
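A URL-aware central script of that kind could be sketched as follows (a toy in Python; the realm paths, credential store, and header handling are all illustrative, and in practice the passwords would be hashed and kept in a DBM file or database rather than in plain text):

```python
import base64

# Illustrative credential store, keyed by protection realm (the remote URL)
REALMS = {
    "/auth/review": {"editor": "secret"},
    "/auth/admin": {"root": "t0psecret"},
}

def check_basic_auth(realm_path, authorization_header):
    """Return True if the Basic credentials are valid for the given realm."""
    if not authorization_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(authorization_header[6:]).decode()
    except (ValueError, UnicodeDecodeError):
        return False
    user, _, password = decoded.partition(":")
    return REALMS.get(realm_path, {}).get(user) == password
```

The same username and password can thus be accepted for one realm and rejected for another, which is what makes the remote URL useful for authorization, not just authentication.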
There are two weak points:
For every request coming to a content server, mod_auth_remote performs a request against the authentication server. This increases latency and, in environments with heavy traffic, may create a processing bottleneck.
Communication between servers is not encrypted, so both servers must be on a secure private network. Since adding SSL support to mod_auth_remote is not trivial, chances are it will not be improved to support it in the near future.
If you have a situation where the authentication server is not on a trusted network, you could use the Stunnel universal SSL driver (as described in Appendix A) to secure communication between mod_auth_remote and the authentication server. However, if you recall the discussion from Chapter 4, establishing an SSL communication channel is the most expensive part of SSL communication. Without proper SSL support built into mod_auth_remote (enabling session reuse), performance will be inadequate.
Credential caching (actually the absence of it) is a frequent problem with authentication modules. The new authentication backend (the one from the 2.1 branch) includes a module, mod_authn_cache (http://mod-auth.sourceforge.net/docs/mod_authn_cache/), to enable caching. For Apache 1, similar functionality is provided by mod_auth_cache (http://mod-auth-cache.sourceforge.net).