Access control is an important part of security and is its most visible aspect, leading people to assume it is security. You may need to introduce access control to your system for several reasons. The first and most obvious is to allow some people to see (or do) what you want them to see or do while keeping others out. However, you must also know who did what and when, so that users can be held accountable for their actions.
This chapter covers the following:
Access control concepts
HTTP authentication protocols
Form-based authentication as an alternative to HTTP-based authentication
Access control mechanisms built into Apache
Single sign-on
Access control concerns itself with restricting access to authorized persons and with establishing accountability. There are four terms that are commonly used in discussions related to access control:
Identification
Process in which a user presents his identity
Authentication
Process of verifying the user is allowed to access the system
Authorization
Process of verifying the user is allowed to access a particular resource
Accountability
Ability to tell who accessed a resource and when, and whether the resource was modified as part of the access
System users rarely encounter accountability, and from their point of view the rest of the processes can appear to be a single step. When working as a system administrator, however, it is important to distinguish which operation is performed in which step and why. I have been very careful to word the definitions to reflect the true meanings of these terms.
Identification is the easiest process to describe. When required, users present their credentials so subsequent processes to establish their rights can begin. In real life, this is the equivalent of showing a pass upon entering a secure area.
The right of the user to access the system is established in the authentication step. This part of the process is often viewed as establishing someone’s identity but, strictly speaking, this is not the case. Several types of information, called factors, are used to make the decision:
A Type 1 factor is the most commonly used: the user is required to demonstrate knowledge of some information—e.g., a password, passphrase, or PIN code.
A Type 2 factor requires the user to demonstrate possession of some material access control element, usually a smart card or token of some kind. In a wider sense, this factor can include the time and location attributes of an access request, for example, "Access is allowed from the central office during normal work hours."
Finally, a Type 3 factor treats the user as an access control element through the use of biometrics; that is, physical attributes of a user such as fingerprints, voiceprint, or eye patterns.
The term two-factor authentication describes a system that requires two of the factors to be used as part of the authentication process. For example, to withdraw money from an ATM, you must present your ATM card and know the PIN associated with it.
Before the authorization part of the access control process begins, it is already known who the user is, and that he has the right to be there. For a simple system, this may be enough and the authorization process practically always succeeds. More complex systems, however, consist of many resources and access levels. Within an organization, some users may have access to some resources but not to others. This is a normal operating condition. Therefore, the authorization process looks at the resource and makes a decision whether the user is allowed to access it. The best way to differentiate between authentication and authorization is in terms of what they protect. Authentication protects the system, while authorization protects resources.
Accountability requirements should be considered when deciding how authentication and authorization are going to be performed. For example, if you allow a group of people to access an application using identical credentials, you may achieve the first goal of access control (protecting resources), but you will have no way of knowing who accessed what, though you will know when. So, when someone leaks that confidential document to the public and no one wants to take the blame, the system logs will not help either. (This is why direct root login should never be allowed. Let the users log in as themselves first, and then change into root. That way the log files will contain a reliable access record.)
This section discusses three widely deployed authentication methods:
Basic authentication
Digest authentication
Form-based authentication
The first two are built into the HTTP protocol and defined in RFC 2617, "HTTP Authentication: Basic and Digest Access Authentication" (http://www.ietf.org/rfc/rfc2617.txt). Form-based authentication is a way of moving the authentication problem from a web server to the application.
Other authentication methods exist (Windows NT challenge/response authentication and the Kerberos-based Negotiate protocol), but they are proprietary to Microsoft and of limited interest to Apache administrators.
Authentication methods built into HTTP use headers to send and receive authentication-related information. When a client attempts to access a protected resource, the server responds with a challenge. The response is assigned a 401 HTTP status code, which means that authentication is required. (HTTP uses the word "authorization" in this context, but ignore that for a moment.) In addition to the response code, the server sends a response header, WWW-Authenticate, which includes information about the required authentication scheme and the authentication realm. The realm is a case-insensitive string that uniquely identifies (within the web site) the protected area. Here is an example of an attempt to access a protected resource and the response returned from the server:
$ telnet www.apachesecurity.net 80
Trying 217.160.182.153...
Connected to www.apachesecurity.net.
Escape character is '^]'.
GET /review/ HTTP/1.0
Host: www.apachesecurity.net

HTTP/1.1 401 Authorization Required
Date: Thu, 09 Sep 2004 09:55:07 GMT
WWW-Authenticate: Basic realm="Book Review"
Connection: close
Content-Type: text/html
The first HTTP 401 response returned when a client attempts to access a protected resource is normally not displayed to the user. The browser reacts to such a response by displaying a pop-up window, asking the user to type in the login credentials. After the user enters her username and password, the original request is attempted again, this time with more information.
$ telnet www.apachesecurity.net 80
Trying 217.160.182.153...
Connected to www.apachesecurity.net.
Escape character is '^]'.
GET /review/ HTTP/1.0
Host: www.apachesecurity.net
Authorization: Basic aXZhbnI6c2VjcmV0

HTTP/1.1 200 OK
Date: Thu, 09 Sep 2004 10:07:05 GMT
Connection: close
Content-Type: text/html
The browser has added an Authorization request header, which contains the credentials collected from the user. The first part of the header value contains the authentication scheme (Basic in this case), and the second part contains a base-64 encoded combination of the username and the password. The aXZhbnI6c2VjcmV0 string from the header decodes to ivanr:secret. (To experiment with base-64 encoding, use the online encoder/decoder at http://makcoder.sourceforge.net/demo/base64.php.) Provided valid credentials were supplied, the web server proceeds with the request normally, as if authentication was not necessary.
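The encoding step can be reproduced in a few lines of Python (used here purely for illustration; the browser performs the same transformation internally):

```python
import base64

# Join the username and password with a colon, then base-64 encode,
# exactly as a browser does for the Authorization header.
credentials = base64.b64encode(b"ivanr:secret").decode("ascii")
print("Authorization: Basic " + credentials)   # aXZhbnI6c2VjcmV0

# Decoding is just as easy, which is why Basic authentication
# offers no confidentiality at all: base-64 is an encoding,
# not encryption.
assert base64.b64decode(credentials) == b"ivanr:secret"
```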
Nothing in the HTTP protocol suggests a web server should remember past authentication requests, regardless of whether they were successful. As long as the credentials are missing or incorrect, the web server will keep responding with status 401. This is where some browsers behave differently than others. Mozilla will keep prompting for credentials indefinitely. Internet Explorer, on the other hand, gives up after three times and displays the 401 page it got from the server. Being "logged in" is only an illusion provided by browsers. After one request is successfully authenticated, browsers continue to send the login credentials until the session is over (i.e., the user closes the browser).
Basic authentication is not an ideal authentication protocol. It has a number of disadvantages:
Credentials are transmitted over the wire in plaintext.
There are no provisions for user logout (on user request, or after a timeout).
The login page cannot be customized.
HTTP proxies can extract credentials from the traffic. This may not be a problem in controlled environments where the proxies are trusted, but it is a potential problem in general, when proxies cannot be trusted.
An attempt to solve some of these problems was made with the addition of Digest authentication to the HTTP protocol.
The major purpose of Digest authentication is to allow authentication to take place without sending user credentials to the server in plaintext. Instead, the server sends the client a challenge. The client responds to the challenge by computing a hash of the challenge and the password, and sends the hash back to the server. The server uses the response to determine if the client possesses the correct password.
The increased security of Digest authentication makes it more complex, so I am not going to describe it here in detail. As with Basic authentication, it is documented in RFC 2617, which makes for interesting reading. The following is an example of a request successfully authenticated using Digest authentication:
$ telnet www.apachesecurity.net 80
Trying 217.160.182.153...
Connected to www.apachesecurity.net.
Escape character is '^]'.
GET /review/ HTTP/1.1
Host: www.apachesecurity.net
Authorization: Digest username="ivanr", realm="Book Review",
nonce="OgmPjb/jAwA=7c5a49c2ed9416dba1b04b5307d6d935f74a859d",
uri="/review/", algorithm=MD5, response="3c430d26043cc306e0282635929d57cb",
qop=auth, nc=00000004, cnonce="c3bcee9534c051a0"

HTTP/1.1 200 OK
Authentication-Info: rspauth="e18e79490b380eb645a3af0ff5abf0e4",
cnonce="c3bcee9534c051a0", nc=00000004, qop=auth
Connection: close
Content-Type: text/html
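The response field in the exchange above is computed as RFC 2617 describes for qop=auth. The following Python sketch shows the calculation; the password ("secret") is an assumed value for illustration, so the resulting hash will not match the captured exchange unless that happens to be the real password:

```python
import hashlib

def md5_hex(data: str) -> str:
    """MD5 digest rendered as a lowercase hex string, as RFC 2617 requires."""
    return hashlib.md5(data.encode("utf-8")).hexdigest()

def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    # HA1 is also the form htdigest stores in the password file,
    # which is why the stored form is tied to one realm.
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")

resp = digest_response("ivanr", "Book Review", "secret", "GET", "/review/",
                       "OgmPjb/jAwA=7c5a49c2ed9416dba1b04b5307d6d935f74a859d",
                       "00000004", "c3bcee9534c051a0")
print(resp)  # a 32-character hexadecimal digest
```

The password itself never travels over the wire; only this hash does, bound to the server-chosen nonce.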
Though Digest authentication succeeds in its goal, its adoption on the server side and on the client side was (is) very slow, most likely because it was never deemed significantly better than Basic authentication. It took years for browsers to start supporting it fully. In Apache, the mod_auth_digest module used for Digest authentication (described later) is still marked "experimental." Consequently, it is rarely used today.
Digest authentication suffers from several weaknesses:
Though user passwords are stored in a form that prevents an attacker from extracting the actual passwords even if he has access to the password file, the stored form itself can be used to authenticate against a Digest authentication-protected area.
Because the realm name is used to convert the password into a form suitable for storing, Digest authentication requires one password file to exist for each protection realm. This makes user database maintenance much more difficult.
Though user passwords cannot be extracted from the traffic, the attacker can deploy what is called a “replay attack” and reuse the captured information to access the authenticated areas for a short period of time. How long the attacker can do so depends on the server configuration. With a default Apache configuration, the maximum duration is five minutes.
The most serious problem is that Digest authentication simply does not solve the root issue. Though the password is somewhat protected (admittedly, that can be important in some situations), an attacker who can listen to the traffic can read the traffic directly and extract resources from there.
Engaging in secure, authenticated communication when using an unencrypted channel is impossible. Once you add SSL to the server (see Chapter 4), it corrects most of the problems people have had with Basic authentication. If using SSL is not an option, then deployment of Digest authentication is highly recommended. There are many freely available tools that allow almost anyone (since no technical knowledge is required) to automatically collect Basic authentication passwords from the traffic flowing on the network. But I haven’t seen any tools that automate the process of performing a replay attack when Digest authentication is used. The use of Digest authentication at least raises the bar to require technical skills on the part of the attacker.
There is one Digest authentication feature that is very interesting: server authentication. As of RFC 2617 (which obsoletes RFC 2069), clients can use Digest authentication to verify that the server does know their password. It sounds as though widespread use of Digest authentication could help the fight against the numerous phishing attacks that take place on the Internet today (see Chapter 10).
In addition to the previously mentioned problems with HTTP-based authentication, there are further issues:
HTTP is a stateless protocol. Therefore, applications must add support for sessions so that they can remember what the user did in previous requests.
HTTP has no provisions for authorization. Even if it had, it would only cover the simplest cases since authorization is usually closely integrated with the application logic.
Programmers, responsible for development and maintenance of applications, often do not have sufficient privileges to do anything related to the web servers, which are maintained by system administrators. This has prompted programmers to resort to using the authentication techniques they can control.
Having authentication performed on the web-server level and authorization on the application level complicates things. Furthermore, there are no APIs developers could use to manage the password database.
Since applications must invest significant resources for handling sessions and authorization anyway, it makes sense to shift the rest of the responsibility their way. This is what form-based authentication does. As a bonus, the boundary between programmers’ and system administrators’ responsibilities is better defined.
Form-based authentication is not a protocol since every application is free to implement access control any way it chooses (except in the Java camp, where form-based authentication is a part of the Servlets specification). In response to a request from a user who has not yet authenticated herself, the application sends a form (hence the name form-based) such as the one created by the following HTML:
<form action="/login.php" method="POST">
<input type="text" name="username"><br>
<input type="password" name="password"><br>
<input type="submit" value="Submit"><br>
</form>
The user is expected to fill in appropriate username and password values and select the Submit button. The script login.php then examines the username and password parameters and decides whether to let the user in or send her back to the login form.
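The server-side logic behind such a script can be sketched as follows. This is a Python stand-in for the hypothetical login.php; the credential store and the redirect targets are illustrative assumptions, and a real application would store password hashes in a database, never plaintext:

```python
# Hypothetical credential store for the sketch only; real code
# would look hashes up in a database, not keep plaintext in source.
USERS = {"ivanr": "secret"}

def handle_login(form: dict) -> str:
    """Examine the submitted username/password parameters and decide
    where to send the user next: the protected area on success, the
    login form again on failure."""
    username = form.get("username", "")
    password = form.get("password", "")
    if USERS.get(username) == password:
        return "/review/"       # let the user in
    return "/login.html"        # back to the login form
```

A successful login would also create a session (a cookie-backed server-side record), which is exactly the burden form-based authentication shifts onto the application.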
HTTP-based authentication does not necessarily need to be implemented on the web server level. Applications can use it for their purposes. However, since that approach has limitations, most applications implement their own authentication schemes. This is unfortunate because most developers are not security experts, and they often design inadequate access control schemes, which lead to insecure applications.
Authentication features built into Apache (described below) are known to be secure because they have stood the test of time. Users (and potential intruders) are not allowed to interact with an application if they do not authenticate themselves first. This can be a great security advantage. When authentication takes place at the application level (instead of the web-server level), the intruder has already passed one security layer (that of the web server). Applications are often given far less testing than the web server and potentially contain more security issues. Some files in the application, for example, may not be protected at all. Images are almost never protected. Often applications contain large amounts of code that are executed prior to authentication. The chances of an intruder finding a hole are much higher when application-level authentication is used.
Out of the box, Apache supports the Basic and Digest authentication protocols with a choice of plaintext or DBM files (documented in a later section) as backends. (Apache 2 also includes the mod_auth_ldap module, but it is considered experimental.)
The way authentication is internally handled in Apache changed dramatically in the 2.1 branch. (In the Apache 2 branch, odd-numbered releases are development versions. See http://cvs.apache.org/viewcvs.cgi/httpd-2.0/VERSIONING?view=markup for more information on the new Apache versioning rules.) Many improvements are being made with little impact to the end users. For more information, take a look at the web site of the 2.1 Authentication Project at http://mod-auth.sourceforge.net.
Outside Apache, many third-party authentication modules enable authentication against LDAP, Kerberos, various database servers, and every other system known to man. If you have a special need, the Apache module repository at http://modules.apache.org is the first place to look.
The easiest way to add authentication to Apache configuration is to use mod_auth, which is compiled in by default and provides Basic authentication using plaintext password files as the authentication source.
You need to create a password file using the htpasswd utility (found in the Apache /bin folder after installation). You can keep it anywhere you want, but ensure it is out of reach of other system users. I tend to keep the password file in the same place where I keep the Apache configuration so it is easier to find:

# htpasswd -c /usr/local/apache/conf/auth.users ivanr
New password: ******
Re-type new password: ******
Adding password for user ivanr
This utility expects a path to a password file as its first parameter and the username as its second. The first invocation requires the -c switch, which instructs the utility to create a new password file if it does not exist. A look into the newly created file reveals a very simple structure:

# cat /usr/local/apache/conf/auth.users
ivanr:EbsMlzzsDXiFg
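The format is simple enough to parse by hand: one username:hash pair per line. The sketch below (Python, for illustration only) loads such a file into a dictionary; note that Apache verifies a login by running the submitted password through crypt(3), using the stored hash as the salt, and comparing the result to the stored hash:

```python
def load_htpasswd(path: str) -> dict:
    """Parse an Apache htpasswd-style file into {username: password_hash}.
    Blank lines and comment lines are skipped."""
    users = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            username, _, password_hash = line.partition(":")
            users[username] = password_hash
    return users
```

This also makes clear why lookups in large plaintext files are slow: every request forces a sequential scan of the file.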
You need the htpasswd utility to encrypt the passwords since storing passwords in plaintext is a bad idea. For all other operations, you can use your favorite text editor. In fact, you must use a text editor because htpasswd provides no features to rename accounts, and most versions do not support deletion of user accounts. (The Apache 2 version of the htpasswd utility does allow you to delete a user account with the -D switch.)
To password-protect a folder, add the following to your Apache configuration, replacing the folder, realm, and user file specifications with values relevant for your situation:
<Directory /var/www/htdocs/review/>
    # Choose authentication protocol
    AuthType Basic
    # Define the security realm
    AuthName "Book Review"
    # Location of the user password file
    AuthUserFile /usr/local/apache/conf/auth.users
    # Valid users can access this folder and no one else
    Require valid-user
</Directory>
After you restart Apache, access to the folder will require valid login credentials.
Using one password file per security realm may work fine in simpler cases but does not work well when users are allowed access to some realms but not others. Changing passwords for such users would require changes to all the password files they belong to. A better approach is to have only one password file. The Require directive can then be used to grant access only to named users:
# Only the book reviewers can access this folder
Require user reviewer1 reviewer2 ivanr
But this method can get out of hand as the number of users and realms rises. A better solution is to use group membership as the basis for authorization. Create a group file, such as /usr/local/apache/conf/auth.groups, containing a group definition such as the following:

reviewers: reviewer1 reviewer2 ivanr
Then change the configuration to reference the file and require membership in the group reviewers in order to allow access:
<Directory /var/www/htdocs/review/>
    AuthType Basic
    AuthName "Book Review"
    AuthUserFile /usr/local/apache/conf/auth.users
    # Location of the group membership file
    AuthGroupFile /usr/local/apache/conf/auth.groups
    # Only the book reviewers can access this folder
    Require group reviewers
</Directory>
Looking up user accounts in plaintext files can be slow, especially when the number of users grows over a couple of hundred. The server must open and read the file sequentially until it finds a matching username, and it must repeat this process on every request. The mod_auth_dbm module also performs Basic authentication, but it uses efficient DBM files to store user account data. DBM files are simple databases, and they allow usernames to be indexed, enabling quick access to the required information. Since mod_auth_dbm is not compiled in by default, you will have to recompile Apache to use it. Using mod_auth_dbm directives instead of mod_auth ones in the previous example gives the following:
<Directory /var/www/htdocs/review/>
    AuthType Basic
    AuthName "Book Review"
    AuthDBMUserFile /usr/local/apache/conf/auth.users.dat
    # Location of the group membership file. Yes,
    # it points to the same file as the password file.
    AuthDBMGroupFile /usr/local/apache/conf/auth.users.dat
    # Only the book reviewers can access this folder
    Require group reviewers
</Directory>
The directive names are almost the same. I added the .dat extension to the password and group files to avoid confusion. Since DBM files cannot be edited directly, you will need to use the dbmmanage utility to manage the password and group files. (The file will be created automatically if it does not exist.) The following adds a user ivanr, a member of the group reviewers, to the file auth.users.dat. The dash after the username tells the utility to prompt for the password.

# dbmmanage /usr/local/apache/conf/auth.users.dat adduser ivanr - reviewers
New password: ******
Re-type new password: ******
User ivanr added with password encrypted to 9yWQZ0991uFnc:reviewers using crypt
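The idea behind the DBM backend can be demonstrated with Python's dbm module. As the dbmmanage output above shows, the value stored under the username key is the encrypted password and the group list, separated by a colon. The dbm.dumb flavor is used here only so the sketch is portable; Apache itself would use SDBM, GDBM, or NDBM:

```python
import dbm.dumb
import os
import tempfile

# Create a small DBM "user database" the way dbmmanage would:
# key = username, value = encrypted-password:group-list.
path = os.path.join(tempfile.mkdtemp(), "auth.users")
with dbm.dumb.open(path, "c") as db:
    db[b"ivanr"] = b"9yWQZ0991uFnc:reviewers"

# Lookup is a direct, keyed fetch -- no sequential scan of a text
# file, which is the whole point of using DBM files.
with dbm.dumb.open(path, "r") as db:
    pwhash, _, groups = db[b"ivanr"].decode().partition(":")
    print(pwhash, groups)   # 9yWQZ0991uFnc reviewers
```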
When using DBM files for authentication, you may encounter a situation where dbmmanage creates a DBM file of one type while Apache expects a DBM file of another type. This happens because Unix systems often support several DBM formats: dbmmanage determines which format it is going to use at runtime, while Apache determines the default expected format at compile time. Neither of the two tools is smart enough to figure out the format of the file it is given. If your authentication is failing and you find a message in the error log stating mod_auth_dbm cannot find the DBM file and you know the file is there, use the AuthDBMType directive to set the DBM file format (try any of the following settings: SDBM, GDBM, NDBM, or DB).
The use of Digest authentication requires the mod_auth_digest module to be compiled into Apache. From an Apache administrator's point of view, Digest authentication is not at all difficult to use. The main difference with Basic authentication is the use of a new directive, AuthDigestDomain. (There are many other directives, but they control the behavior of the Digest authentication implementation.) This directive accepts a list of URLs that belong to the same protection space.
<Directory /var/www/htdocs/review/>
    AuthType Digest
    AuthName "Book Review"
    AuthDigestDomain /review/
    AuthDigestFile /usr/local/apache/conf/auth.users.digest
    Require valid-user
</Directory>
The other difference is that a separate utility, htdigest, must be used to manage the password database. As mentioned earlier, Digest authentication forces you to use one password database per protection space. Without a single user database for the whole server, the AuthDigestGroupFile directive is much less useful. (You can have user groups, but you can only use them within one realm, which may happen, but only rarely.) Here is an example of using htdigest to create the password database and add a user:
# htdigest -c /usr/local/apache/conf/auth.users.digest "Book Review" ivanr
Adding password for ivanr in realm Book Review.
New password: ******
Re-type new password: ******
The combination of any of the authentication methods covered so far and SSL encryption provides a solid authentication layer for many applications. However, that is still one-factor authentication. A common choice when two-factor authentication is needed is to use private client certificates. To authenticate against such a system, you must know a password (the client certificate passphrase, a Type 1 factor) and possess the certificate (a Type 2 factor).
Chapter 4 discusses cryptography, SSL, and client certificates. Here, I bring a couple of authentication-related points to your attention. Only two directives are needed to start asking clients to present their private certificates provided everything else SSL-related has been configured:
SSLVerifyClient require
SSLVerifyDepth 1
This, combined with the use of the SSLRequireSSL directive to enforce SSL-only access for a host or a directory, will ensure only strong authentication takes place.
The SSLRequire directive allows fine-grained access control using arbitrarily complex boolean expressions and any of the Apache environment variables. The following (added to a directory context somewhere) will limit access to a web site to customer services staff, and only during business hours:
SSLRequire ( %{SSL_CLIENT_S_DN_OU} eq "Customer Services" ) and \
           ( %{TIME_WDAY} >= 1 and %{TIME_WDAY} <= 5 ) and \
           ( %{TIME_HOUR} >= 8 and %{TIME_HOUR} <= 19 )
SSLRequire works only for SSL-enabled sites. Attempts to use this directive to perform access control for nonencrypted sites will fail silently because the expressions will not be evaluated. Use mod_rewrite for non-SSL sites instead.
The full reference for the SSLRequire directive is available in the Apache documentation at http://httpd.apache.org/docs-2.0/mod/mod_ssl.html#sslrequire.
Network access control is performed with the help of the mod_access module. The directives Allow and Deny are used to allow or deny access to a directory. Each directive takes a hostname, an IP address, or a fragment of either of the two. (Fragments will be taken to refer to many addresses.) A third directive, Order, determines the order in which allow and deny actions are evaluated. This may sound confusing, and it is (it always has been to me), so let us see how it works in practice.
To allow access to a directory from the internal network only (assuming the network uses the 192.168.254.x network range):
<Directory /var/www/htdocs/review/>
    Order Deny,Allow
    Deny from all
    Allow from 192.168.254.
</Directory>
You are not required to use IP addresses for network access control. The following identification formats are allowed:
192.168.254.125
Just one IP address
192.168.254
Whole network segment, one class C network
192.168.254.0/24
Whole network segment, one class C network
192.168.254.0/255.255.255.0
Whole network segment, one class C network
ivanr.apachesecurity.net
Just one IP address, resolved at runtime
.apachesecurity.net
IP address of any host in the domain, resolved at runtime
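The network notations can be checked with Python's ipaddress module, which makes it easy to confirm which client addresses a given Allow or Deny pattern covers. (Only the CIDR and netmask forms map directly; the bare 192.168.254 fragment is Apache-specific syntax.)

```python
import ipaddress

net = ipaddress.ip_network("192.168.254.0/24")

# The prefix-length and netmask notations describe the same network.
assert net == ipaddress.ip_network("192.168.254.0/255.255.255.0")

# Membership tests show which clients a rule would match.
assert ipaddress.ip_address("192.168.254.125") in net
assert ipaddress.ip_address("192.168.1.1") not in net
```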
A performance penalty is incurred when domain names are used for network access control because Apache must perform a reverse DNS lookup to convert the IP address into a name. In fact, Apache will perform another forward lookup to ensure the name points back to the same IP address. This is necessary because sometimes many names are associated with an IP address (for example, in name-based shared hosting).
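The double-lookup logic can be sketched as a small function. The resolver callables are parameters so the sketch stays testable offline; in real use they would wrap socket.gethostbyaddr and socket.gethostbyname_ex:

```python
def hostname_allowed(ip, allowed_suffix, reverse_lookup, forward_lookup):
    """Mimic Apache's paranoid double lookup: reverse-resolve the
    client IP to a name, check the name against the rule, then
    forward-resolve the name and require it to map back to the IP."""
    try:
        hostname = reverse_lookup(ip)
    except OSError:
        return False
    if not hostname.endswith(allowed_suffix):
        return False
    try:
        addresses = forward_lookup(hostname)
    except OSError:
        return False
    # The forward lookup must confirm the reverse lookup; otherwise
    # anyone controlling reverse DNS for their IP range could claim
    # any hostname.
    return ip in addresses

# Fake resolvers standing in for DNS, for demonstration only.
reverse = lambda ip: "ivanr.apachesecurity.net"
forward = lambda name: ["192.168.254.125"]

print(hostname_allowed("192.168.254.125", ".apachesecurity.net",
                       reverse, forward))   # True
```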
Do the following to let anyone but the users from the internal network access the directory:
<Directory /var/www/htdocs/review/>
    Order Allow,Deny
    Allow from all
    Deny from 192.168.254.
</Directory>
The addresses in Allow and Deny directives can overlap. This feature can be used to create exceptions for an IP address or an IP address range, as in the following example, where access is allowed to users from the internal network but is explicitly forbidden to the user whose workstation uses the IP address 192.168.254.125:
<Directory /var/www/htdocs/review/>
    Order Allow,Deny
    Allow from 192.168.254.
    Deny from 192.168.254.125
    # Access will be implicitly denied to requests
    # that have not been explicitly allowed.
</Directory>
With Order set to Allow,Deny, access is denied by default; with Deny,Allow, access is allowed by default.
To make it easier to configure network access control properly, you may want to do the following:

Put the Allow and Deny directives in the order you want them executed. This will not affect the execution order (you control that via the Order directive), but it will give you one less thing to think about.

Use an explicit Allow from all or Deny from all instead of relying on the implicit behavior.

Always test the configuration to ensure it works as expected.
Allow and Deny support a special syntax that can be used to allow or deny access based not on the request IP address but on the information available in the request itself or on the contents of an environment variable. If you have mod_setenvif installed (and you probably do, since it is there by default), you can use the SetEnvIf directive to inspect incoming requests and set an environment variable if certain conditions are met.
In the following example, I use SetEnvIf to set an environment variable whenever the request uses GET or POST. Later, such requests are allowed via the Allow directive:
# Set the valid_method environment variable if
# the request method is either GET or POST
SetEnvIf Request_Method "^(GET|POST)$" valid_method=1

# Then only allow requests that have this variable set
<Directory /var/www/htdocs/review/>
    Order Deny,Allow
    Deny from all
    Allow from env=valid_method
</Directory>
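The pattern given to SetEnvIf is an ordinary anchored regular expression; trying it in Python shows exactly which method names it accepts:

```python
import re

# The same pattern SetEnvIf applies to the Request_Method variable.
valid_method = re.compile(r"^(GET|POST)$")

for method in ("GET", "POST", "TRACE", "GETX"):
    print(method, bool(valid_method.match(method)))
# Only GET and POST match; the ^...$ anchors reject anything
# longer or shorter than the exact method names.
```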
Restricting access to a proxy server is very important if you are running a forward proxy, i.e., when a proxy is used to access other web servers on the Internet. A warning about this fact appears at the beginning of the mod_proxy reference documentation (http://httpd.apache.org/docs-2.0/mod/mod_proxy.html).
Failure to properly secure a proxy will quickly result in spammers abusing the
server to send email. Others will use your proxy to hide their tracks as they
perform attacks against other servers.
In Apache 1, proxy access control is done through a specially named directory (proxy:), using network access control (as discussed in Section 7.3.5):
# Allow forward proxy requests
ProxyRequests On

# Allow access to the proxy only from
# the internal network
<Directory proxy:*>
    Order Deny,Allow
    Deny from all
    Allow from 192.168.254.
</Directory>
In Apache 2, the equivalent <Proxy> directive is used. (Apache 2 also provides the <ProxyMatch> directive, which allows the supplied URL to be an arbitrary regular expression.)
# Allow forward proxy requests
ProxyRequests On

# Allow access to the proxy only from
# the internal network
<Proxy *>
    Order Deny,Allow
    Deny from all
    Allow from 192.168.254.
</Proxy>
Proxying SSL requests requires use of the special CONNECT method, which is designed to allow arbitrary TCP/IP connection tunneling. (See Chapter 11 for examples.) By default, Apache will allow connection tunneling only to target ports 443 (SSL) and 563 (SNEWS). You should not allow other ports to be used (via the AllowCONNECT directive) since that would allow forward proxy users to connect to other services through the proxy.
One consequence of using a proxy server is transfer of trust. Instead of users on the internal network, the target server (or application) is seeing the proxy as the party initiating communication. Because of this, the target may give more access to its services than it would normally do. One common example of this problem is using a forward proxy server to send email. Assuming an email server is running on the same machine as the proxy server, this is how a spammer would trick the proxy into sending email:
POST http://localhost:25/ HTTP/1.0
Content-Length: 120

MAIL FROM: aspammer
RCPT TO: ivanr@webkreator.com
DATA
Subject: Please have some of our spam

Spam, spam, spam...
.
QUIT
This works because SMTP servers are error tolerant. When receiving the above request, the proxy opens a connection to port 25 on the same machine (that is, to the SMTP server) and forwards the request to that server. The SMTP server ignores errors incurred by the HTTP request line and the header that follows and processes the request body normally. Since the body contains a valid SMTP communication, an email message is created and accepted.
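The tolerance at work here can be illustrated with a toy model of an error-tolerant command loop (all names are illustrative; real SMTP servers are stateful, e.g., DATA switches the session into message mode):

```python
def smtp_like_replies(lines):
    """Toy model of an error-tolerant SMTP command loop: unknown input
    (such as the HTTP request line and headers) draws a 500 reply, but
    the session continues, so valid commands that follow still work."""
    known = {"MAIL", "RCPT", "DATA", "QUIT"}
    replies = []
    for line in lines:
        words = line.split()
        verb = words[0].upper() if words else ""
        replies.append("250 OK" if verb in known else "500 Command unrecognized")
    return replies

# The HTTP preamble is rejected, but MAIL FROM is still accepted afterward
print(smtp_like_replies([
    "POST http://localhost:25/ HTTP/1.0",
    "Content-Length: 120",
    "MAIL FROM: aspammer",
]))
```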
Unlike for the CONNECT method, Apache does not offer directives to control target ports for normal forward proxy requests. However, Apache Cookbook (Recipe 10.2) provides a solution for the proxy-sending-email problem in the form of a couple of mod_rewrite rules:

<Proxy *>
    RewriteEngine On
    # Do not allow proxy requests to target port 25 (SMTP)
    RewriteRule "^proxy:[a-z]*://[^/]*:25(/|$)" "-" [F,NC,L]
</Proxy>
I will mention more Apache directives related to access control. Prior to presenting that information, I would like to point out one more thing: many modules other than the ones described in this chapter can also be used to perform access control, even if that isn't their primary purpose. I have used one such module, mod_rewrite, many times in this book to do things that would be impossible otherwise. Some modules are designed to perform advanced access control. This is the case with mod_dosevasive (mentioned in Chapter 5) and mod_security (described in detail in Chapter 12).
The <Limit> and <LimitExcept> directives are designed to perform access control based on the method used in the request. Each method has a different meaning in HTTP. Performing access control based on the request method is useful for restricting usage of some methods capable of making changes to the resources stored on the server. (Such methods include PUT, DELETE, and most of the WebDAV methods.) The possible request methods are defined in the HTTP and the WebDAV specifications. Here are descriptions and access control guidance for some of them:
GET
HEAD
The GET method is used to retrieve the information identified by the request URI. The HEAD method is identical to GET, but the response must not include a body. It should be used to retrieve resource metadata (contained in response headers) without having to download the resource itself. Static web sites need only these two methods to function properly.
POST
The POST method should be used by requests that want to make changes on the server. Unlike the GET method, which does not contain a body, requests that use POST contain a body. Dynamic web applications require the POST method to function properly.
PUT
DELETE
The PUT and DELETE methods are designed to allow a resource to be uploaded to the server or deleted from the server, respectively. Web applications typically do not use these methods, but some client applications (such as Netscape Composer and FrontPage) do. By default Apache is not equipped to handle these requests. The Script directive can be used to redirect requests that use these methods to a custom CGI script that knows how to handle them (for example, Script PUT /cgi-bin/handle-put.pl). For the CGI script to do anything useful, it must be able to write to the web server root.
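Such a handler could be sketched along the following lines (shown in Python rather than the Perl name used above; the environment variables are standard CGI, but everything else, including the complete absence of path validation, marks this as an illustration, not production code):

```python
import os
import sys

def handle_put(environ, body):
    """Write the request body to the file Apache mapped the URI to.
    A real handler must authorize the user and validate the target
    path before writing anything to disk."""
    target = environ.get("PATH_TRANSLATED")
    if not target:
        return "403 Forbidden"
    with open(target, "wb") as f:
        f.write(body)
    return "201 Created"

if __name__ == "__main__":
    # Standard CGI entry point: read the body, emit a minimal response
    length = int(os.environ.get("CONTENT_LENGTH", "0"))
    status = handle_put(os.environ, sys.stdin.buffer.read(length))
    sys.stdout.write("Status: %s\r\nContent-Type: text/plain\r\n\r\n%s\n"
                     % (status, status))
```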
CONNECT
The CONNECT method is only used in a forward proxy configuration and should be disabled otherwise.
OPTIONS
TRACE
The OPTIONS method is designed to enable a client to inquire about the capabilities of a web server (for example, to learn which request methods it supports). The TRACE method is used for debugging. Whenever a TRACE request is made, the web server should respond by putting the complete request (the request line and the headers received from a client) into the response body. This allows the client to see what is being received by the server, which is particularly useful when the client and the server do not communicate directly, but through one or more proxy servers. These two methods are not dangerous, but some administrators prefer to disable them because they send out information that can be abused by an attacker.
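For administrators who do decide to disable TRACE, newer Apache versions (1.3.34 and 2.0.55 onward, if memory serves) provide a dedicated directive; on older servers, a mod_rewrite rule matching the TRACE method achieves a similar effect:

```apache
# Refuse TRACE requests server-wide
TraceEnable off
```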
PROPFIND
PROPPATCH
MKCOL
COPY
MOVE
LOCK
UNLOCK
These methods are all defined in the WebDAV specification and provide the means for a capable client to manipulate resources on the web server, just as it would manipulate files on a local hard disk. These methods are enabled automatically when the WebDAV Apache module is enabled, and are only needed when you want to provide WebDAV functionality to your users. They should be disabled otherwise.
The <Limit> directive allows access control to be performed for known request methods. It is used in the same way as the <Directory> directive is to protect directories. The following example allows only authenticated users to make changes on the server using the PUT and DELETE methods:

<Limit PUT DELETE>
    AuthType Basic
    AuthName "Content Editors Only"
    AuthUserFile /usr/local/apache/conf/auth.users
    Require valid-user
</Limit>
Since the <Limit> directive only works for named request methods, it cannot be used to defend against unknown request methods. This is where the <LimitExcept> directive comes in handy. It does the opposite: it allows anonymous access only for requests using the listed methods, forcing authentication for all others. The following example performs essentially the same function as the previous example but forces authentication for all methods except GET, HEAD, and POST:

<LimitExcept GET HEAD POST>
    AuthType Basic
    AuthName "Content Editors Only"
    AuthUserFile /usr/local/apache/conf/auth.users
    Require valid-user
</LimitExcept>
Authentication-based and network-based access control can be combined with help from the Satisfy configuration directive. This directive can have two values:
Any
If more than one access control mechanism is specified in the configuration, allow access if any of them is satisfied.
All
If more than one access control mechanism is specified in the configuration, allow access only if all are satisfied. This is the default setting.
This feature is typically used to relax access control in some specific cases. For example, a frequent requirement is to allow internal users access to a resource without providing passwords, but to require authentication for requests coming in from outside the organization. This is what the following example does:
<Directory /var/www/htdocs>
    # Network access control
    Order Deny,Allow
    Deny from all
    Allow from 192.168.254.
    # Authentication
    AuthType Basic
    AuthName "Content Editors Only"
    AuthUserFile /usr/local/apache/conf/auth.users
    Require valid-user
    # Allow access if either of the two
    # requirements above is satisfied
    Satisfy Any
</Directory>
Though most authentication examples only show one authentication module in use at a time, you can configure multiple modules to require authentication for the same resource. This is when the order in which the modules are loaded becomes important. The first authentication module initialized will be the first to verify the user’s credentials. With the default configuration in place, the first module will also be the last. However, some (possibly all) authentication modules support an option to allow subsequent authentication modules to attempt to authenticate the user. Authentication delegation happens if the first module processing the request is unable to authenticate the user. In practice, this occurs if the user is unknown to the module. If the username used for the request is known but the password is incorrect, delegation will not happen.
Each module uses a directive with a different name for this option, but the convention is to have the names end in "Authoritative." For example, the AuthAuthoritative directive configures mod_auth, and the AuthDBMAuthoritative directive configures mod_auth_dbm.
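Assuming mod_auth_dbm runs first (remember that module load order, not directive order, decides which module handles the request first), a configuration along these lines would let users missing from the DBM database fall through to mod_auth's plain-text file; the paths are illustrative:

```apache
<Directory /var/www/htdocs/private>
    AuthType Basic
    AuthName "Private Area"
    # Consulted first; unknown users are delegated onward
    AuthDBMUserFile /usr/local/apache/conf/auth.users.dat
    AuthDBMAuthoritative Off
    # Fallback consulted by mod_auth
    AuthUserFile /usr/local/apache/conf/auth.users
    Require valid-user
</Directory>
```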
The term single sign-on (SSO) is used today to refer to several different problems, but it generally refers to a system where people can log in only once and have access to system-wide resources. What people mean when they say SSO depends on the context in which the term is used:
SSO within a single organization
SSO among many related organizations
Internet-wide SSO among unrelated organizations
The term identity management is used to describe the SSO problem from the point of view of those who maintain the system. So what is the problem that makes implementing SSO difficult? Even within a single organization where the IT operations are under the control of a central authority, achieving all business goals by deploying a single system is impossible, no matter how complex the system. In real life, business goals are achieved with the use of many different components. For example, at minimum, every modern organization must enable their users to do the following:
Log on to their workstations
Send email (via an SMTP server)
Read email (via a POP or IMAP server)
In most organizations, this may lead to users having three sets of unrelated credentials, so SSO is not achieved. And I haven’t even started to enumerate all the possibilities. A typical organization will have many web applications (e.g., intranet, project management, content management) and many other network accounts (e.g., FTP servers). As the organization grows, the problem grows exponentially. Maintaining the user accounts and all the passwords becomes a nightmare for system administrators even if users simplify their lives by using a single password for all services. From the security point of view, a lack of central access control leads to complete failure to control access and to be aware of who is doing what with the services. On the other hand, unifying access to resources means that if someone’s account is broken into, the attacker will get access to every resource available to the user. (In a non-SSO system, only one particular service would be compromised.) Imagine only one component that stores passwords insecurely on a local hard drive. Anyone with physical access to the workstation would be able to extract the password from the drive and use it to get access to other resources in the system.
SSO is usually implemented as a central database of user accounts and access privileges (usually one set of credentials per user used for all services). This is easier said than done since many of the components were not designed to play well with each other. In most cases, the SSO problem lies outside the realm of web server administration since many components are not web servers. Even in the web server space, there are many brands (Apache, Microsoft IIS, Java-based web servers) and SSO must work across all of them.
A decent SSO strategy is to use a Lightweight Directory Access Protocol (LDAP) server to store user accounts. Many web servers and other network servers support the use of LDAP for access control. Microsoft decided to use Kerberos (http://web.mit.edu/kerberos/www/) for SSO, but the problem with Kerberos is that all clients must be Kerberos-aware and most browsers still are not. In the Apache space, the mod_auth_kerb module (http://modauthkerb.sourceforge.net) can be configured to use Basic authentication to collect credentials from the user and check them against a Kerberos server, thus making Kerberos work with any browser.
Expanding the scope to include more than one organization brings new problems and makes the task vastly more complex. Microsoft was among the first to attempt to introduce Internet-wide SSO with their Passport program (now called .NET Passport), described at http://www.passport.net. There were many concerns about their implementation, and the fact that Microsoft holds a monopoly on the desktop did not help either. To counter their solution, Sun initiated Project Liberty (http://www.projectliberty.org) and formed an organization called the Liberty Alliance to run it. This organization claims to have more than 150 members.
Solving a web-only SSO problem seems to be easier since there are several freely available solutions. You can find them listed on the home page of the WebISO Working Group (http://middleware.internet2.edu/webiso/). Also of interest is the Shibboleth project (http://shibboleth.internet2.edu), which aims to establish a standard way of sharing resources related to inter-organizational access control.
Implementing a web SSO solution consists of finding and configuring one of the available implementations that suit your requirements. Most web single sign-on solutions work in much the same way:
All web servers are assigned subdomains on the same domain name. For example, valid names could be app1.apachesecurity.net, app2.apachesecurity.net, and login.apachesecurity.net. This is necessary so cookies issued by one web server can be received by some other web server. (Cookies can be reused when the main domain name is the same.)
When a client without a cookie comes to a content server, he is forwarded to the central server for authentication. This way the password is never disclosed to any of the content servers. If the authentication is successful the login server issues a shared authentication cookie, which will be visible to all web servers in the ring. It then forwards the user back to the content server he came from.
When a client with a cookie comes to a content server, the server contacts the login server behind the scenes to verify it. If the cookie is valid, the content server creates a new user session and accepts the user. Alternatively, if the login server has signed the cookie with its private key, the content server can use public-key cryptography to verify the cookie without contacting the login server.
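The signed-cookie variant can be sketched as follows. For brevity, this toy uses an HMAC with a secret shared by the login and content servers instead of the public-key signature described above; the principle, verifying the cookie locally without a round-trip to the login server, is the same. The key and names are illustrative:

```python
import hashlib
import hmac

SHARED_SECRET = b"example-key-shared-by-all-servers"  # illustrative only

def issue_cookie(username):
    """Login server: bind the username to a MAC computed over its value."""
    mac = hmac.new(SHARED_SECRET, username.encode(), hashlib.sha256).hexdigest()
    return "%s:%s" % (username, mac)

def verify_cookie(cookie):
    """Content server: recompute the MAC locally; no call to the login server."""
    username, _, mac = cookie.rpartition(":")
    expected = hmac.new(SHARED_SECRET, username.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac, expected)
```

A real deployment would also embed an expiry time in the signed value so stolen cookies cannot be replayed indefinitely.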
If all you have to worry about is authentication against Apache web servers, a brilliant little module, called mod_auth_remote (see http://puggy.symonds.net/~srp/stuff/mod_auth_remote/), allows authentication (and authorization) to be delegated from one server to another. All you need to do is have a central web server where all authentication will take place (the authentication server) and install mod_auth_remote on all other web servers (which I will refer to as content servers). The approach this module takes is very smart. Not only does it use Basic authentication to receive credentials from clients, it also uses Basic authentication to talk to the central web server behind the scenes. What this means is that there is no need to install anything on the central server, and there are no new configuration directives to learn. At the central server you are free to use any authentication module you like. You can even write an application (say, using PHP) to implement a custom authentication method.
The configuration on a content server looks much like that of any other authentication module:
<Directory /var/www/htdocs/review/>
    AuthType Basic
    AuthName "Book Review"
    AuthRemoteServer sso.apachesecurity.net
    AuthRemotePort 80
    AuthRemoteURL /auth
    Require valid-user
</Directory>
On the central server, you only need to secure one URL. If you need SSO, then you have many servers with many requests; therefore, using mod_auth_dbm to speed up the authentication process seems appropriate here:

<Location /auth>
    AuthType Basic
    AuthName "Central Authentication"
    AuthDBMUserFile /usr/local/apache/conf/auth.users.dat
    Require valid-user
</Location>
At first glance, it looks like this module is only good for authentication, but if you use different remote URLs for different protection realms, the script on the central server can take the URL into account when making the decision as to whether to allow someone access.
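A URL-aware central script of that kind could be sketched as follows (a toy in Python; the realm paths, credential store, and header handling are all illustrative, and in practice the passwords would be hashed and kept in a DBM file or database rather than in plain text):

```python
import base64

# Illustrative credential store, keyed by protection realm (the remote URL)
REALMS = {
    "/auth/review": {"editor": "secret"},
    "/auth/admin": {"root": "t0psecret"},
}

def check_basic_auth(realm_path, authorization_header):
    """Return True if the Basic credentials are valid for the given realm."""
    if not authorization_header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(authorization_header[6:]).decode()
    except (ValueError, UnicodeDecodeError):
        return False
    user, _, password = decoded.partition(":")
    return REALMS.get(realm_path, {}).get(user) == password
```

The same username and password can thus be accepted for one realm and rejected for another, which is what makes the remote URL useful for authorization, not just authentication.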
There are two weak points:
For every request coming to a content server, mod_auth_remote performs a request against the authentication server. This increases latency and, in environments with heavy traffic, may create a processing bottleneck.
Communication between servers is not encrypted, so both servers must be on a secure private network. Since adding SSL support to mod_auth_remote is not trivial, chances are it will not be improved to support it in the near future.
If you have a situation where the authentication server is not on a trusted network, you could use the Stunnel universal SSL driver (as described in Appendix A) to secure communication between mod_auth_remote and the authentication server. However, if you recall the discussion from Chapter 4, establishing an SSL communication channel is the most expensive part of SSL communication. Without proper SSL support built into mod_auth_remote (enabling session reuse), performance will be inadequate.
Credential caching (actually the absence of it) is a frequent problem with authentication modules. The new authentication backend (the one from the 2.1 branch) includes a module, mod_authn_cache (http://mod-auth.sourceforge.net/docs/mod_authn_cache/), to enable caching. For Apache 1, similar functionality is provided by mod_auth_cache (http://mod-auth-cache.sourceforge.net).