9 Infrastructure

In this chapter, we take a step back from a single Apache server to discuss the infrastructure and the architecture of the system as a whole. Topics include:

We want to make each element of the infrastructure as secure as it can be, designing each to remain secure even if the others did not exist. We must do the following:

Some sections of this chapter (the ones on host security and network security) discuss issues that not only relate to Apache but could also be applied to running any service. I will mention them briefly so you know you need to take care of them. If you wish to explore these other issues, I recommend the following books:

Network Security Hacks is particularly useful because it is concise and allows you to find an answer quickly. If you need to do something, you look up the hack in the table of contents, and a couple of pages later you have the problem solved.

Choosing a correct application isolation strategy can have a significant effect on a project’s security. Ideally, a strategy will be selected early in the project’s life, as a joint decision of the administration and the development team. Delaying the decision may result in the inability to deploy certain configurations.

Isolating application modules from each other helps reduce damage caused by a break-in. The idea is not to put all your eggs into one basket. First, you need to determine whether there is room for isolation. When separating the application into individual logical modules, you need to determine whether there are modules that are accessed by only one class of user. Each module should be separated from the rest of the application to have its own:

This configuration will allow for maximal security and maximal configuration flexibility. If you cannot accommodate such separation initially, due to budget constraints, you should plan for it anyway and upgrade the system when the opportunity arises.

To argue the case for isolation, consider the situation where a company information system consists of the following modules:

Four groups of users each use their own application module and, more importantly, the modules carry four different levels of risk. The public application is the one carrying the largest risk. If you isolate application modules, a potential intrusion through the public portion of the application will not spill into the rest of the company (servers, databases, LDAP servers, etc.).

Here is the full range of solutions for isolation, given in the order of decreasing desirability from a security standpoint:

As previously mentioned, having many physical servers for security purposes can be costly. In between a full separate physical server solution and a chroot sits a third option: virtual servers.

Virtual servers are a software-based solution to the problem. Only one physical server exists, but it hosts many virtual servers. Each virtual server behaves like a less-powerful standalone server. There are many commercial options for virtual servers and two open source approaches:

Both solutions offer similar functionality, yet they take different paths to get there. User Mode Linux is a full emulation of a system: each virtual server runs its own kernel and has its own process list, memory allocation, etc. Virtual servers on a Linux VServer system share the same kernel, so isolation between them relies on heavy patching of that kernel.

Both solutions appear to be production ready. I have used User Mode Linux with good results. Many companies offer virtual-server hosting using one of these two solutions. The drawback is that both solutions require heavy kernel patching to make them work, and you will need to spend a lot of time to get them up and running. Note: User Mode Linux has been incorporated into the SUSE Enterprise Server family since Version 9.

On the plus side, consider the use of virtual servers in environments where hardware resources are limited but many projects require loose permissions on the server. Giving each project a virtual server would solve the problem without jeopardizing the security of the system as a whole.

Going backward from applications, host security is the first layer we encounter. Though we will continue to build additional defenses, the host must be secured as if no additional protection existed. (This is a recurring theme in this book.)

After the operating system installation, you will discover many active shell accounts in the /etc/passwd file. For example, each database engine comes with its own user account. Few of these accounts are needed. Review every active account and cancel the shell access of each account not needed for server operation. To do this, replace the shell specified for the user in /etc/passwd with /bin/false. For example, you would replace this line:

ivanr:x:506:506::/home/users/ivanr:/bin/bash

with:

ivanr:x:506:506::/home/users/ivanr:/bin/false
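
A quick sketch such as the following can help with the review; it lists every account whose shell has not been disabled yet (extend the list of harmless shells, such as /sbin/nologin, to match your system):

# list accounts that still have a working shell
awk -F: '$7 != "/bin/false" && $7 != "/sbin/nologin" { print $1, $7 }' /etc/passwd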

Restrict to whom you provide shell access. Users who are not security conscious represent a threat. Work to provide some other way for them to do their jobs without shell access. Most users only need a way to transport files and are quite happy using FTP for that. (Unfortunately, FTP sends credentials in plaintext, making it easy to break in.)

Finally, secure the entry point for interactive access by disabling insecure plaintext protocols such as Telnet, leaving only secure shell (SSH) as a means for host access. Configure SSH to refuse direct root logins, by setting PermitRootLogin to no in the sshd_config file. Otherwise, in an environment where the root password is shared among many administrators, you may not be able to tell who was logged on at a specific time.
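
The relevant line, usually found in /etc/ssh/sshd_config, is simply the following (restart the SSH daemon after changing it):

# refuse direct root logins; administrators log in with their own
# accounts and use su or sudo to become root
PermitRootLogin no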

If possible, do not allow users to use a mixture of plaintext (insecure) and encrypted (secure) services. For example, in the case of the FTP protocol, deploy Secure FTP (SFTP) where possible. If you absolutely must use a plaintext protocol and some of the users have shells, consider opening two accounts for each such user: one account for use with secure services and the other for use with insecure services. Interactive login should be forbidden for the latter; that way, a compromise of the account is less likely to lead to an attacker gaining a shell on the system.

Every open port on a host represents an entry point for an attacker. Closing as many ports as possible increases the security of a host. Operating systems often have many services enabled by default. Use the netstat tool on the command line to retrieve a complete listing of active TCP and UDP ports on the server:

# netstat -nlp
Proto Recv-Q Send-Q Local Address   Foreign Address   State      PID/Program name
tcp        0      0 0.0.0.0:3306    0.0.0.0:*         LISTEN     963/mysqld
tcp        0      0 0.0.0.0:110     0.0.0.0:*         LISTEN     834/xinetd
tcp        0      0 0.0.0.0:143     0.0.0.0:*         LISTEN     834/xinetd
tcp        0      0 0.0.0.0:80      0.0.0.0:*         LISTEN     13566/httpd
tcp        0      0 0.0.0.0:21      0.0.0.0:*         LISTEN     1060/proftpd
tcp        0      0 0.0.0.0:22      0.0.0.0:*         LISTEN     -
tcp        0      0 0.0.0.0:23      0.0.0.0:*         LISTEN     834/xinetd
tcp        0      0 0.0.0.0:25      0.0.0.0:*         LISTEN     979/sendmail
udp        0      0 0.0.0.0:514     0.0.0.0:*                    650/syslogd

Now that you know which services are running, turn off the ones you do not need. (You will probably want port 22 open so you can continue to access the server.) Turning services off permanently is a two-step process. First you need to turn the running instance off:

# /etc/init.d/proftpd stop

Then you need to prevent the service from starting the next time the server boots. The exact procedure depends on the operating system, but there are two places to look: on Unix systems, a service is either started at boot time (in which case it is permanently active) or started on demand, through the Internet services daemon (inetd or xinetd).
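
As a sketch (the exact commands differ from one distribution to another), here is how the two cases are handled on a Red Hat-style system:

# services started from init scripts are disabled with chkconfig
# (Debian-based systems use update-rc.d instead)
chkconfig proftpd off

# services started on demand are disabled in their xinetd
# configuration file, e.g., /etc/xinetd.d/telnet:
#     disable = yes
# followed by a reload of xinetd
/etc/init.d/xinetd reload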

Uninstall any software you do not need. For example, you will probably not need an X Window system on a web server, or the KDE, GNOME, and related programs.

Though desktop-related programs are mostly benign, you should uninstall some of the more dangerous tools such as compilers, network monitoring tools, and network assessment tools. In a properly run environment, a compiler on a host is not needed. Provided you standardize on an operating system, it is best to do development and compilation on a single development system and to copy the binaries (e.g., Apache) to the production systems from there.

It is important to gather the information you can use to monitor the system or to analyze events after an intrusion takes place.

Here are the types of information that should be gathered:

System statistics

Having detailed statistics of the behavior of the server is very important. In a complex network environment, a network management system (NMS) collects vital system statistics via the SNMP protocol, stores them, and acts when thresholds are reached. Having some form of an NMS is recommended even with smaller systems; if you can't justify such an activity, the sysstat package will probably serve the purpose. This package consists of several binaries executed by cron to probe system information at regular intervals, storing data in binary format. The sar binary is used to inspect the binary log and produce reports. Learn more about sar and its switches; the amount of data you can get out of it is incredible. (Hint: try the -A switch.)
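
For example, assuming the sysstat package is installed and its cron jobs are already collecting data:

# report everything collected so far today
sar -A
# watch CPU utilization live: 12 samples, 5 seconds apart
sar -u 5 12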

Integrity validation

Integrity validation software—also often referred to as host intrusion detection software—monitors files on the server and alerts the administrator (usually in the form of a daily or weekly report) whenever a change takes place. It is the only mechanism to detect a stealthy intruder. The most robust integrity validation software is Tripwire (http://www.tripwire.org). It uses public-key cryptography to prevent signature database tampering. Some integrity validation software is absolutely necessary for every server. Even a simple approach such as using the md5sum tool (which computes an MD5 hash for each file) will work, provided the resulting hashes are kept on a different computer or on a read-only media.
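
Here is what the md5sum approach might look like in its simplest form (the path is only an example; store the resulting file on another machine or on read-only media):

# create a baseline of the Apache installation
find /usr/local/apache -type f -exec md5sum {} \; > apache.md5

# later, report only the files that have changed
md5sum -c apache.md5 2>/dev/null | grep -v ': OK$'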

Process accounting

Process accounting enables you to log every command executed on a server (see Chapter 5).

Automatic log analysis

Except maybe in the first couple of days after installing your shiny new server, you will not review your logs manually. Therefore, you must find some other way to keep an eye on events. Logwatch (http://www.logwatch.org) looks at the log files and produces an activity report on a regular basis (e.g., once a day). It is a modular Perl script, and it comes preinstalled on Red Hat systems. It is great for summarizing what has been going on, and unusual events become easy to spot. If you want something that works in real time, try Swatch (http://swatch.sourceforge.net). Swatch and other log analysis programs are discussed in Chapter 8.

Though a network firewall is necessary for every network, individual hosts should have their own firewalls for the following reasons:

On Linux, a host-based firewall is configured through the Netfilter kernel module (http://www.netfilter.org). In the user space, the binary used to configure the firewall is iptables. As you will see, it pays off to spend some time learning how Netfilter works. On a BSD system, ipfw and ipfilter can be used to configure a host-based firewall. Windows server systems have a similar functionality but it is configured through a graphical user interface.

Whenever you design a firewall, follow the basic rules:

What follows is an example iptables firewall script for a dedicated server. It assumes the server occupies a single IP address (192.168.1.99), and the office occupies a fixed address range 192.168.2.0/24. It is easy to follow and to modify to suit other purposes. Your actual script should contain the IP addresses appropriate for your situation. For example, if you do not have a static IP address range in the office, you may need to keep the SSH port open to everyone; in that case, you do not need to define the address range in the script.

#!/bin/sh
   
IPT=/sbin/iptables
# IP address of this machine
ME=192.168.1.99
# IP range of the office network
OFFICE=192.168.2.0/24
   
# flush existing rules
$IPT -F
   
# accept traffic from this machine
$IPT -A INPUT -i lo -j ACCEPT
$IPT -A INPUT -s $ME -j ACCEPT
   
# allow access to the HTTP and HTTPS ports
$IPT -A INPUT -m state --state NEW -d $ME -p tcp --dport 80 -j ACCEPT
$IPT -A INPUT -m state --state NEW -d $ME -p tcp --dport 443 -j ACCEPT
   
# allow SSH access from the office only
$IPT -A INPUT -m state --state NEW -s $OFFICE -d $ME -p tcp --dport 22 -j ACCEPT
# To allow SSH access from anywhere, comment the line above and uncomment
# the line below if you don't have a static IP address range to use
# in the office
# $IPT -A INPUT -m state --state NEW -d $ME -p tcp --dport 22 -j ACCEPT
   
# allow related traffic
$IPT -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
   
# log and deny everything else
$IPT -A INPUT -j LOG
$IPT -A INPUT -j DROP

As you can see, installing a host firewall can be very easy to do, yet it provides excellent protection. You may also consider logging unrelated outgoing traffic: on a dedicated server, such traffic may be a sign of an intrusion. To use this technique, you need to be able to tell what constitutes normal outgoing traffic. For example, the server may have been configured to download operating system updates automatically from the vendor's web site. This is an example of normal (and required) outgoing traffic.
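
As a sketch, the following addition to the script above logs every new connection the server initiates itself; on a dedicated web server the list of legitimate destinations is short, so anything unexpected will stand out in the logs:

# log new outgoing connections (tune this to exclude known-good
# destinations, such as your vendor's update servers)
$IPT -A OUTPUT -m state --state NEW -j LOG --log-prefix "NEW OUTGOING: "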

For systems intended to be highly secure, you can make that final step and patch the kernel with one of the specialized hardening patches:

These patches will enhance the kernel in various ways. They can:

I mention grsecurity’s advanced kernel-auditing capabilities in Chapter 5.

Some operating systems have kernel-hardening features built into them by default. For example, Gentoo supports grsecurity as an option, while the Fedora developers prefer SELinux. Most systems do not have these features; if they are important to you, consider using one of the operating systems that support them. Such a decision will save you a lot of time. Otherwise, you will have to patch the kernel yourself. The biggest drawback of using a kernel patch is that you must start with a vanilla kernel, then patch and compile it every time you need to upgrade. If this is done without a clear security benefit, the kernel patches can be a great waste of time. Playing with mandatory access control, in particular, takes a lot of time and nerves to get right.

To learn more about kernel hardening, see the following:

  • “Minimizing Privileges” by David A. Wheeler (http://www-106.ibm.com/developerworks/linux/library/l-sppriv.html)

  • “Linux Kernel Hardening” by Taylor Merry (http://www.sans.org/rr/papers/32/1294.pdf)

Taking another step back from host security, we encounter network security. We will consider network design a little later. For the moment, I will discuss issues that need to be considered in this context:

A central firewall is mandatory. The remaining three steps are highly recommended but not strictly necessary.

As the number of servers grows, the ability to manually follow what is happening on each individual server decreases. The "standard" growth path for most administrators is to use host-based monitoring tools or scripts and rely on email messages for notification of unusual events. If you follow this path, you will soon discover you are getting too many emails and you still don't know what is happening and where.

Implementing a centralized logging system is one of the steps toward a solution for this problem. Having the logs at one location ensures you are seeing everything. As an additional benefit, centralization enhances the overall security of the system: if a single host on the network is breached, the attacker may attempt to modify the logs to hide her tracks. This is more difficult when logs are duplicated on a central log server. Here are my recommendations:

You will find that the syslog daemon installed by default on most distributions is not adequate for advanced configurations: it only offers UDP as a means of transport and does not offer flexible message routing. I recommend a modern syslog daemon such as syslog-ng (http://www.balabit.com/products/syslog_ng/). Here are its main advantages over the stock syslog daemon:

If you decide to implement central logging, that dedicated host can be used to introduce additional security to the system by implementing network monitoring or running an intrusion detection system. Intrusion detection is just another form of logging.

Network monitoring systems are passive tools whose purpose is to observe and record information. Here are two tools:

Argus is easy to install, easy to run, and produces very compact logs. I highly recommend that you install it, even if it runs on the same system as your main (and only) web server. For in-depth coverage of this subject, I recommend Richard Bejtlich’s book The Tao of Network Security Monitoring: Beyond Intrusion Detection (Addison-Wesley).

Intrusion detection system (IDS) software observes traffic and reacts to suspicious events. Many commercial and open source IDS tools are available. From the open source community, the following two are especially worth mentioning:

Snort is an example of a network intrusion detection system (NIDS) because it monitors the network. Prelude is a hybrid IDS; it monitors the network (potentially using Snort as a sensor), but it also supports events coming from other types of sensors. Using a hybrid IDS is a step toward a complete security solution.

The term intrusion prevention system (IPS) was coined to denote a system capable of detecting and preventing intrusion. An IPS can, therefore, offer better results, provided its detection mechanisms are reliable enough to avoid blocking legitimate traffic.

Since NIDSs are generic tools designed to monitor any network traffic, it is natural to attempt to use them for HTTP traffic as well. Though they work, the results are not completely satisfying:

These problems have led to the creation of specialized network appliances designed to work as HTTP firewalls. Because they are designed from the ground up with HTTP in mind, and have enough processing power, the two problems mentioned are neutralized. Several such systems are:

The terms web application firewall and application gateway are often used to describe systems that provide web application protection. Such systems are not necessarily embedded in hardware only. An alternative approach is to embed a software module into the web server and to protect web applications from there. This approach also solves the two problems mentioned earlier: there is no problem with SSL because the module acts after the SSL traffic is decrypted, and such modules typically operate on whole requests and responses, giving access to all the features of HTTP.

In the open source world, mod_security is an embeddable web application protection engine. It works as an Apache module. Installed together with mod_proxy and other supporting modules on a separate network device in the reverse proxy mode of operation, it creates an open source application gateway appliance. The setup of a reverse proxy will be covered in Section 9.4. Web intrusion detection and mod_security will be covered in Chapter 12.

A proxy is an intermediary communication device. The term “proxy” commonly refers to a forward proxy, which is a gateway device that fetches web traffic on behalf of client devices. We are more interested in the opposite type of proxy. Reverse proxies are gateway devices that isolate servers from the Web and accept traffic on their behalf.

There are two reasons to add a reverse proxy to the network: security and performance. The benefits coming from reverse proxies stem from the concept of centralization: by having a single point of entry for the HTTP traffic, we are increasing our monitoring and controlling capabilities. Therefore, the larger the network, the more benefits we will have. Here are the advantages:

There are some disadvantages as well:

The use of Apache 2 is recommended in reverse proxy systems. The new version of the mod_proxy module offers better support for standards and conforms to the HTTP/1.1 specification. The Apache 2 architecture introduces filters, which allow many modules to look at the content (both on the input and the output) simultaneously.

The following modules will be needed:

You are unlikely to need mod_proxy_connect, which is needed for forward proxy operation only.
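
As an illustration only (the exact list of modules depends on the features you use; the flags below cover the directives shown in this section), the proxy support can be compiled in statically like this:

# configure Apache 2 with the proxy and supporting modules
./configure \
    --enable-proxy \
    --enable-proxy-http \
    --enable-headers \
    --enable-rewrite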

Compile the web server as usual. Whenever the proxy module is used within a server, turn off the forward proxying operation:

# do not work as forward proxy
ProxyRequests Off

Not turning it off is a frequent error that creates an open proxy out of a web server, allowing anyone to go through it to reach any other system the web server can reach. Spammers will want to use it to send spam to the Internet, and attackers will use the open proxy to reach the internal network.

Two directives are needed to activate the proxy:

ProxyPass / http://web.internal.com/
ProxyPassReverse / http://web.internal.com/

The first directive instructs the proxy to forward all requests it receives to the internal server web.internal.com and to forward the responses back to the client. So, when someone types the proxy address in the browser, she will be served the content from the internal web server (web.internal.com) without having to know about it or access it directly.

The same applies to the internal server. It is not aware that all requests are executed through the proxy. To it the proxy is just another client. During normal operation, the internal server will use its real name (web.internal.com) in a response. If such a response goes to the client unmodified, the real name of the internal server will be revealed. The client will also try to use the real name for the subsequent requests, but that will probably fail because the internal name is hidden from the public and a firewall prevents access to the internal server.

This is where the second directive comes in. It instructs the proxy server to observe response headers, modify them to hide the internal information, and respond to its clients with responses that make sense to them.

Another way to use the reverse proxy is through mod_rewrite. The following would have the same effect as the ProxyPass directive above. Note the use of the P (proxy throughput) and L (last rewrite directive) flags.

RewriteRule ^(.+)$ http://web.internal.com/$1 [P,L]

At this point, one problem remains: applications often generate and embed absolute links into HTML pages. But unlike the response header problem, which is handled by Apache, absolute links in pages are left unmodified. Again, this reveals the real name of the internal server to its clients. This problem cannot be solved with standard Apache alone, but it can be solved with the help of a third-party module, mod_proxy_html, maintained by Nick Kew. It can be downloaded from http://apache.webthing.com/mod_proxy_html/. It requires libxml2, which can be found at http://xmlsoft.org. (Note: the author warns against using libxml2 versions lower than 2.5.10.)

To compile the module, I had to pass the compiler the path to libxml2:

# apxs -Wc,-I/usr/include/libxml2 -cia mod_proxy_html.c

For the same reason, in the httpd.conf configuration file, you have to load the libxml2 dynamic library before attempting to load the mod_proxy_html module:

LoadFile /usr/lib/libxml2.so
LoadModule proxy_html_module modules/mod_proxy_html.so

The module looks into every HTML page, searches for absolute links referencing the internal server, and replaces them with links referencing the proxy. To activate this behavior, add the following to the configuration file:

# activate mod_proxy_html
SetOutputFilter proxy-html
   
# prevent content compression in backend operation
RequestHeader unset Accept-Encoding
   
# replace references to the internal server
# with references to this proxy
ProxyHTMLURLMap http://web.internal.com/ /

You may be wondering about the directive to prevent compression. If the client supports content decompression, it will state that with an appropriate Accept-Encoding header:

Accept-Encoding: gzip,deflate

If that happens, the backend server will respond with a compressed response, but mod_proxy_html does not know how to handle compressed content and fails to do its job. By removing the header from the request, we force plaintext communication between the reverse proxy and the backend server. This is not a problem: chances are both servers share a fast local network, where compression would do little to improve performance.

Read Nick’s excellent article published in Apache Week, in which he gives more tips and tricks for reverse proxying:

“Running a Reverse Proxy With Apache” by Nick Kew (http://www.apacheweek.com/features/reverseproxies)

There is an unavoidable performance penalty when using mod_proxy_html. To avoid an unnecessary slowdown, activate this module only when a problem with absolute links needs to be solved.

A well-designed network is the basis for all other security efforts. Though we are dealing with Apache security here, our main subject alone is insufficient. Your goal is to implement a switched, modular network where services of different risk are isolated into different network segments.

Figure 9-1 illustrates a classic demilitarized zone (DMZ) network architecture.

This architecture assumes you have a collection of backend servers to protect and also assumes danger comes from one direction only, which is the Internet. A third zone, DMZ, is created to work as an intermediary between the danger outside and the assets inside.

Ideally, each service should be isolated onto its own server. When circumstances make this impossible (e.g., financial reasons), try not to combine services of different risk levels. For example, combining a public email server with an internal web server is a bad idea. If a service is not meant to be used directly from the outside, moving it to a separate server would allow you to move the service out of the DMZ and into the internal LAN.

For complex installations, it may be justifiable to create classes of users. For example, a typical business system will operate with:

With proper planning, each of these user classes can have its own DMZ, and each DMZ will have different privileges with regards to access to the internal LAN. Multiple DMZs allow different classes of users to access the system via different means. To participate in high-risk systems, partners may be required to access the network via a virtual private network (VPN).

To continue to refine the network design, there are four paths from here:

So far I have discussed the mechanics of reverse proxy operation. I am now going to describe usage patterns to illustrate how and why you might use the various types of reverse proxies on your network. Reverse proxies are among the most useful tools in HTTP network design. None of their benefits are HTTP-specific—it is just that HTTP is what we are interested in. Other protocols benefit from the same patterns I am about to describe.

The nature of patterns is to isolate one way of doing things. In real life, you may have all four patterns discussed below combined onto the same physical server.

For additional coverage of this topic, consider the following resources:

The configuration of an integration reverse proxy, illustrated in Figure 9-3, is similar to that of a front door pattern, but the purpose is completely different. The purpose of the integration reverse proxy is to integrate multiple application parts (often on different servers) into one unique application space. There are many reasons for doing this:

Basically, this pattern allows a messy configuration that no one wants to touch to be transformed into a well-organized, secured, and easy-to-maintain system.

There are two ways to use this pattern. The obvious way is to hide the internal workings of a system and present clients with a single server. But there is also a great benefit of having a special internal integration proxy to sort out the mess inside.
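
As a sketch (the internal host names below are made up), an integration reverse proxy is little more than a set of mappings from parts of the public URL space to the internal servers that implement them:

# never work as a forward proxy
ProxyRequests Off

# one part of the application lives on one internal server...
ProxyPass        /crm/  http://crm.internal.example.com/
ProxyPassReverse /crm/  http://crm.internal.example.com/

# ...and another part on a different one; clients see a single site
ProxyPass        /shop/ http://shop.internal.example.com/
ProxyPassReverse /shop/ http://shop.internal.example.com/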

In recent years there has been a lot of talk about web services. Systems are increasingly using port 80 and the HTTP protocol for internal communication as a new implementation of remote procedure calling (RPC). Technologies such as REST, XML-RPC, and SOAP (listed in ascending order of complexity) belong to this category.

Allowing internal systems to communicate directly results in a system where interaction is not controlled, logged, or monitored. The integration reverse proxy pattern brings order.

A protection reverse proxy, illustrated in Figure 9-4, greatly enhances the security of a system:

  • Internal servers are no longer exposed to the outside world. The pattern introduces another layer of protection for vulnerable web servers and operating systems.

  • Network topology remains hidden from the outside world.

  • Internal servers can be moved out of the demilitarized zone.

  • Vulnerable applications can be protected by putting an HTTP firewall on the reverse proxy.

The protection reverse proxy is useful when you must maintain an insecure, proprietary, or legacy system. Direct exposure to the outside world could lead to a compromise, but putting such systems behind a reverse proxy would extend their lifetime and allow secure operation. A protection reverse proxy can also actually be useful for all types of web applications since they can benefit from having an HTTP firewall in place, combined with full traffic logging for auditing purposes.

There are three reasons why you would concern yourself with advanced HTTP architectures:

It would be beneficial to define relevant terms first (this is where Wikipedia, http://www.wikipedia.org, becomes useful):

We will cover the advanced architectures as a journey from a single-server system to a scalable and highly available system. The application part of the system should be considered during the network design phase. There are too many application-dependent issues to leave them out of this phase. Consult the following for more information about application issues related to scalability and availability:

The following sections describe various advanced architectures.

At the bottom of the scale we have a single-server system. It is great if such a system works for you. Introducing scalability and increasing availability of a system involves hard work, and it is usually done under pressure and with (financial) constraints.

So, if you are having problems with that server, you should first look into ways to enhance the system without changing it too much:

If you have done all of this and you are still on the edge of the server’s capabilities, then look into replacing the server with a more powerful machine. This is an easy step because hardware continues to improve and drop in price.

The approach I have just described is not very scalable but is adequate for many installations that will never grow to require more than one machine. There remains a problem with availability—none of this will increase the availability of the system.

A cluster of servers (see Figure 9-7) provides scalability, high availability, and efficient resource utilization (load balancing). First, we need to create a cluster. An ideal cluster consists of N identical servers, called (cluster) nodes. Each node is capable of serving a request equally well. To create consistency at the storage level, one of the following strategies can be used:

  • Install nodes from a single image and automate maintenance afterward.

  • Boot nodes from the network. (Such nodes are referred to as diskless nodes.)

  • Use shared storage. (This can be a useful thing to do, but it can be expensive and it is a central point of failure.)

  • Replicate content (e.g., using rsync; a minimal sketch follows this list).

  • Put everything into a database (optionally clustering the database, too).
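
For the replication option, a minimal sketch (the node names and paths are made up) could be as simple as a cron job on the master node:

# push the document root from the master to every other node,
# removing files that no longer exist on the master
for node in www1 www2 www3 www4; do
    rsync -az --delete /var/www/ $node:/var/www/
done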

After creating a cluster, we need to distribute requests among cluster nodes. The simplest approach is to use a feature called DNS Round Robin (DNSRR). Each node is given a real IP address, and all IP addresses are associated with the same domain name. Before a client can make a request, it must resolve the domain name of the cluster to an IP address. The following query illustrates what happens during the resolution process. This query returns all IP addresses associated with the specified domain name:

$ dig www.cnn.com
   
; <<>> DiG 9.2.1 <<>> www.cnn.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38792
;; flags: qr rd ra; QUERY: 1, ANSWER: 9, AUTHORITY: 4, ADDITIONAL: 4
   
;; QUESTION SECTION:
;www.cnn.com.                   IN      A
   
;; ANSWER SECTION:
www.cnn.com.            285     IN      CNAME   cnn.com.
cnn.com.                285     IN      A       64.236.16.20
cnn.com.                285     IN      A       64.236.16.52
cnn.com.                285     IN      A       64.236.16.84
cnn.com.                285     IN      A       64.236.16.116
cnn.com.                285     IN      A       64.236.24.4
cnn.com.                285     IN      A       64.236.24.12
cnn.com.                285     IN      A       64.236.24.20
cnn.com.                285     IN      A       64.236.24.28

Here you can see the domain name www.cnn.com resolves to eight different IP addresses. If you repeat the query several times, you will notice the order in which the IP addresses appear changes every time. Hence the name “round robin.” Similarly, during domain name resolution, each client gets a “random” IP address from the list. This leads to the total system load being distributed evenly across all cluster nodes.

But what happens when a cluster node fails? The clients working with the node have already resolved the name, and they will not repeat the process. For them, the site appears to be down though other nodes in the cluster are working.

One solution for this problem is to dynamically modify the list of IP addresses in short intervals, while simultaneously shortening the time-to-live (TTL, the period during which DNS query results are to be considered valid).
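
In BIND zone file terms, this is nothing more than several A records sharing one name, each with a short TTL (the addresses and TTL below are made up):

; each record carries a 60-second TTL, so clients re-resolve often
www   60   IN   A   192.168.0.101
www   60   IN   A   192.168.0.102
www   60   IN   A   192.168.0.103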

If you look at the results of the query for www.cnn.com, the TTL is set to 285 seconds. In fact, CNN domain name servers regenerate the list every five minutes. When a node fails, its IP address will not appear on the list until it recovers. In that case, a portion of the clients will experience downtime of a couple of minutes.

This process can be automated with the help of Lbnamed, a load-balancing name server written in Perl (http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html).

Another solution is to keep the DNS static but implement a fault-tolerant cluster of nodes using Wackamole (http://www.backhand.org/wackamole/). Wackamole works in a peer-to-peer fashion and ensures that all IP addresses in a cluster remain active. When a node breaks down, Wackamole detects the event and instructs one of the remaining nodes to assume the lost IP address.

The DNSRR clustering architecture works quite well, especially when Wackamole is used. However, a serious drawback is that there is no place to put the central security reverse proxy to work as an application gateway.

A different approach to solving the DNSRR node failure problem is to introduce a central management node to the cluster (Figure 9-8). In this configuration, cluster nodes are given private addresses. The system as a whole has only one IP address, which is assigned to the management node. The management node will do the following:

  • Monitor cluster nodes for failure

  • Measure utilization of cluster nodes

  • Distribute incoming requests

To avoid a central point of failure, the management node itself is clustered, usually in a failover mode with an identical copy of itself (though you can use a DNSRR solution with an IP address for each management node).

This is a classic high-availability/load-balancing architecture. Distribution is often performed on the TCP/IP level so the cluster can work for any protocol, including HTTP (though all solutions offer various HTTP extensions). It is easy, well understood, and widely deployed. The management nodes are usually off-the-shelf products, often quite expensive but quite capable, too. These products include:

An open source alternative for Linux is the Linux Virtual Server project (http://www.linuxvirtualserver.org). It provides tools to create a high availability cluster (or management node) out of cheap commodity hardware.
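
As a sketch of what configuring such a management node with the Linux Virtual Server tools might look like (the addresses are made up, and ipvsadm options vary between versions):

# define a virtual HTTP service on the management node, using
# round-robin scheduling
ipvsadm -A -t 192.168.1.100:80 -s rr
# add two real servers behind it, using NAT forwarding
ipvsadm -a -t 192.168.1.100:80 -r 192.168.2.10:80 -m
ipvsadm -a -t 192.168.1.100:80 -r 192.168.2.11:80 -m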

Reverse proxy clusters are the same in principle as management node clusters except that they work on the HTTP level and, therefore, only for the HTTP protocol. This type of proxy is of great interest to us because it is the only architecture that allows HTTP firewalling. Commercial solutions that work as proxies are available, but here we will discuss an open source solution based around Apache.

Ralf S. Engelschall, the man behind mod_rewrite, was the first to describe how reverse proxy load balancing can be achieved using mod_rewrite:

“Website Balancing, Practical approaches to distributing HTTP traffic” by Ralf S. Engelschall (http://www.webtechniques.com/archives/1998/05/engelschall/)

First, write a script that will generate a list of available cluster nodes and store it in a file, servers.txt:

# a list of servers to load balance
www www1|www2|www3|www4

The script should be executed every few minutes to regenerate the list. Then configure mod_rewrite to use the list to redirect incoming requests through the internal proxy:

RewriteMap servers rnd:/usr/local/apache/conf/servers.txt
RewriteRule ^/(.+)$ http://${servers:www}/$1 [P,L]

In this configuration, mod_rewrite is smart enough to detect when the file servers.txt changes and to reload the list. You can configure mod_rewrite to start an external daemon script and communicate with it in real time (which would allow us to use a better algorithm for load distribution).
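
Here is a rough sketch of that approach: replace the rnd: map with a prg: map (for example, RewriteMap servers prg:/usr/local/apache/bin/pick_node.sh; the path and node names are made up) and have the external program answer every lookup with the node that should receive the request:

#!/bin/sh
# answer each RewriteMap lookup (one key per line on stdin) with the
# next node name, round-robin; a real program would also consider
# node health and current load
NODES="www1 www2 www3 www4"
i=0
while read key; do
    i=$(( (i % 4) + 1 ))
    echo "$NODES" | cut -d' ' -f$i
done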

With only a couple of additional lines added to the httpd.conf configuration file, we have created a reverse proxy. We can proceed to add features to it by adding other modules (mod_ssl, mod_deflate, mod_cache, mod_security) to the mix. The reverse proxy itself must be highly available, using one of the two methods we have described. Wackamole peer-to-peer clustering is a good choice because it allows the reverse proxy cluster to consist of any number of nodes.

An alternative to using mod_rewrite for load balancing, but only for the Apache 1.x branch, is to use mod_backhand (http://www.backhand.org/mod_backhand/). While load balancing in mod_rewrite is a hack, mod_backhand was specifically written with this purpose in mind.

This module does essentially the same thing as mod_rewrite, but it also automates the load balancing part. An instance of mod_backhand runs on every backend server and communicates with other mod_backhand instances. This allows the reverse proxy to make an educated judgment as to which of the backend servers should be handed the request to process. With mod_backhand, you can easily have a cluster of very different machines.

Only a few changes to the Apache configuration are required. To configure a mod_backhand instance to send status to other instances, add the following (replacing the specified IP addresses with ones suitable for your situation):

# the folder for interprocess communication
UnixSocketDir /usr/local/apache/backhand
# multicast data to the local network
MulticastStats 192.168.1.255:4445
# accept resource information from all hosts in the local network
AcceptStatus 192.168.1.0/24

To configure the reverse proxy to send requests to backend servers, you need to feed mod_backhand a list of candidacy functions. Candidacy functions process the server list in an attempt to determine which server is the best candidate for the job:

# byAge eliminates servers that have not
# reported in the last 20 seconds
Backhand byAge
# byLoad reorders the server list from the
# least loaded to the most loaded
Backhand byLoad

Finally, on the proxy, you can configure a handler to access the mod_backhand status page:

<Location /backhand/>
    SetHandler backhand-handler
</Location>