> Apache Security: Chapter 11. Web Security Assessment


11 Web Security Assessment

The purpose of a web system security assessment is to determine how tight security is. Many deployments get it wrong because the responsibility to ensure a web system’s security is split between administrators and developers. I have seen this many times. Neither party understands the whole system, yet they have responsibility to ensure security.

The way I see it, web security is the responsibility of the system administrator. With the responsibility assigned to one party, the job becomes an order of magnitude easier. If you are a system administrator, think about it this way:


It is your server. That makes you responsible!

To get the job done, you will have to approach the other side, web application development, and understand how it is done. The purpose of Chapter 10 was to give you a solid introduction to web application security issues. The good news is that web security is very interesting! Furthermore, you will not be expected to create secure code, only judge it.

The assessment methodology laid down in this chapter is what I like to call “lightweight web security assessment methodology.” The word “lightweight” is there because the methodology does not cover every detail, especially the programming parts. In an ideal world, web application security should only be assessed by web application security professionals. They need to concern themselves with programming details. I will assume you are not this person, you have many tasks to do, and you do not do web security full time. Have the 20/80 rule in mind: expend 20 percent of the effort to get 80 percent of the benefits.

Web security professionals can also benefit from this book, but they will use it as a starting point and make the additional 80 percent of effort that is expected of them.

A complete web security assessment consists of three complementary parts. They should be executed in the following order:

Black-box testing

Testing from the outside, with no knowledge of the system.

White-box testing

Testing from the inside, with full knowledge of the system.

Gray-box testing

Testing that combines the previous two types of testing. Gray-box testing can reflect the situation that might occur when an attacker can obtain the source code for an application (it could have been leaked or is publicly available). In such circumstances, the attacker is likely to set up a copy of the application on a development server and practice attacks there.

Before you continue, look at Appendix A, where you will find a list of web security tools. Knowing how something works under the covers is important, but testing everything manually takes away too much of your precious time.

In black-box testing, you pretend you are an outsider, and you try to break in. This useful technique simulates the real world. The less you know about the system you are about to investigate, the better. I assume you are doing black-box assessment because you fall into one of these categories:

Unless you belong to the first category, you must ensure you have permission to perform black-box testing. Black-box testing can be treated as hostile and is often illegal. If you are doing a favor for a friend, get written permission from someone who has the authority to provide it.

Ask yourself these questions: Who am I pretending to be? Or, what is the starting point of my assessment? The answer depends on the nature of the system you are testing. Here are some choices:

Different starting points require different approaches. A system administrator may have access to the most important servers, but such servers are (hopefully) out of reach of a member of the public. The best way to conduct an assessment is to start with no special privileges and examine what the system looks like from that point of view. Then continue upward, assuming other roles. While doing all this, remember you are doing a web security assessment, which is a small fraction of the subject of information security. Do not cover too much territory, or you will never finish. In your initial assessment, you should focus on the issues mostly under your responsibility.

As you perform the assessment, record everything, and create an information trail. If you know something about the infrastructure beforehand, you must prove you did not use it as part of black-box testing. You can use that knowledge later, as part of white-box testing.

Black-box testing consists of the following steps:

I did not include report writing, but you will have to do that, too. To make your job easier, mark your findings this way:

Information gathering is the first step of every security assessment procedure and is especially important when performed as part of a black-box testing methodology. Working blindly, you will see only the information available to a potential attacker. Here we assume you are armed only with the name of a web site.

Information gathering can be broadly separated into two categories: passive and active. Passive techniques cannot be detected by the organization being investigated. They involve extracting knowledge about the organization from systems outside the organization. They may include techniques that involve communication with systems run by the organization but only if such techniques are part of their normal operation (e.g., the use of the organization’s DNS servers) and cannot be detected.

Most information gathering techniques are well known, having been used as part of traditional network penetration testing for years. Passive information gathering techniques were covered in the paper written by Gunter Ollmann:

“Passive Information Gathering: The Analysis Of Leaked Network Security Information” by Gunter Ollmann (NGSS) (http://www.nextgenss.com/papers/NGSJan2004PassiveWP.pdf)

The name of the web site you have been provided will resolve to an IP address, giving you the vital information you need to start with. Depending on what you have been asked to do, you must decide whether you want to gather information about the whole of the organization. If your only target is the public web site, the IP address of the server is all you need. If the target of your research is an application used internally, you will need to expand your search to cover the organization’s internal systems.

The IP address of the public web site may help discover the whole network, but only if the site is internally hosted. For smaller web sites, hosting internally is overkill, so hosting is often outsourced. Your best bet is to exchange email with someone from the organization. Their IP address, possibly the address from an internal network, will be embedded into email headers.
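The email-header technique can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample message, hostnames, and addresses are hypothetical, and the regular expression is a rough match for dotted IPv4 addresses rather than a complete parser for Received headers. The sketch lists every address found in the Received headers and marks those that fall into the RFC 1918 private ranges, since those hint at the internal network layout.

```python
import ipaddress
import re
from email import message_from_string

# Rough pattern for dotted-quad IPv4 addresses; validation happens below.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

# RFC 1918 private ranges, listed explicitly for clarity.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def received_addresses(raw_message):
    """Return (address, is_internal) pairs from Received headers, top to bottom."""
    msg = message_from_string(raw_message)
    found = []
    for header in msg.get_all("Received", []):
        for candidate in IPV4_RE.findall(header):
            try:
                addr = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # looked like an address but is not a valid one
            internal = any(addr in net for net in RFC1918)
            found.append((str(addr), internal))
    return found


# Hypothetical message used only for illustration.
SAMPLE = (
    "Received: from mail.example.com (mail.example.com [192.0.2.25])\n"
    "\tby mx.example.net with ESMTP; Wed, 2 Jun 2004 15:54:00 +0000\n"
    "Received: from desktop17 ([10.1.4.17])\n"
    "\tby mail.example.com with SMTP; Wed, 2 Jun 2004 15:53:58 +0000\n"
    "From: someone@example.com\n"
    "Subject: Re: question\n"
    "\n"
    "Body text.\n"
)

for address, internal in received_addresses(SAMPLE):
    print(address, "internal" if internal else "public")
```

The lowest Received header is usually the one added closest to the sender, so that is where an internal address is most likely to appear.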

Current domain name registration practices require significant private information to be provided to the public. This information can easily be accessed using the whois service, which is available in many tools, web sites, and on the command line.

There are many whois servers (e.g., one for each registrar), and the important part of finding the information you are looking for is in knowing which server to ask. Normally, whois servers issue redirects when they cannot answer a query, and good tools will follow redirects automatically. When using web-based tools (e.g., http://www.internic.net/whois.html), you will have to perform redirection manually.
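To make the redirect behavior concrete, here is a minimal Python sketch of a whois client that follows referrals. The whois protocol itself is simple: open a TCP connection to port 43, send the query followed by CRLF, and read until the server closes the connection. The list of referral field names in the sketch is an assumption based on commonly seen responses, not a standard, and starting at whois.iana.org is only one possible choice.

```python
import re
import socket

# Field names some servers use to point at another whois server.
# Assumption: this list is illustrative and certainly incomplete.
REFERRAL_RE = re.compile(
    r"^\s*(?:refer|whois|Whois Server|Registrar WHOIS Server|ReferralServer):"
    r"\s*(\S+)",
    re.IGNORECASE | re.MULTILINE,
)


def find_referral(response_text):
    """Return the hostname of the server a response refers to, or None."""
    match = REFERRAL_RE.search(response_text)
    if not match:
        return None
    host = match.group(1)
    # Some referrals are written as URLs, e.g. whois://whois.example.net:43
    host = re.sub(r"^r?whois://", "", host, flags=re.IGNORECASE)
    return host.split(":")[0].rstrip("/").lower()


def whois_query(server, query, timeout=10.0):
    """Send one query to a whois server (TCP port 43) and return the reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def whois_follow(query, server="whois.iana.org", max_hops=3):
    """Query a server and follow referrals for a limited number of hops."""
    text = ""
    for _ in range(max_hops):
        text = whois_query(server, query)
        next_server = find_referral(text)
        if not next_server or next_server == server:
            break
        server = next_server
    return text
```

Calling whois_follow("example.com") would perform the manual redirection described above automatically. The hop limit guards against servers that refer to each other in a loop.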

Look at the information we can find on O’Reilly (registrar disclaimers have been removed from the output to save space):

$ whois oreilly.com
O'Reilly & Associates
   1005 Gravenstein Hwy., North
   Sebastopol, CA, 95472
   Domain Name: OREILLY.COM

   Administrative Contact -
        DNS Admin -  nic-ac@OREILLY.COM
        O'Reilly & Associates, Inc.
        1005 Gravenstein Highway North
        Sebastopol, CA 95472
        Phone -  707-827-7000
        Fax -  707-823-9746
   Technical Contact -
        technical DNS -  nic-tc@OREILLY.COM
        O'Reilly & Associates
        1005 Gravenstein Highway North
        Sebastopol, CA 95472
        Phone -  707-827-7000
        Fax -  - 707-823-9746

   Record update date -  2004-05-19 07:07:44
   Record create date -  1997-05-27
   Record will expire on -  2005-05-26
   Database last updated on -  2004-06-02 10:33:07 EST
   Domain servers in listed order:

A tool called dig can be used to convert names to IP addresses or do the reverse, convert IP addresses to names (known as reverse lookup). An older tool, nslookup, is still popular and widely deployed.

$ dig oreilly.com any
; <<>> DiG 9.2.1 <<>> oreilly.com any
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30773
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 3, ADDITIONAL: 4
;oreilly.com.                   IN      ANY

oreilly.com.            20923   IN      NS      ns1.sonic.net.
oreilly.com.            20923   IN      NS      ns2.sonic.net.
oreilly.com.            20923   IN      NS      ns.oreilly.com.
oreilly.com.            20924   IN      SOA     ns.oreilly.com. 
2004052001 10800 3600 604800 21600
oreilly.com.            20991   IN      MX      20 smtp2.oreilly.com.

oreilly.com.            20923   IN      NS      ns1.sonic.net.
oreilly.com.            20923   IN      NS      ns2.sonic.net.
oreilly.com.            20923   IN      NS      ns.oreilly.com.
ns1.sonic.net.          105840  IN      A
ns2.sonic.net.          105840  IN      A
ns.oreilly.com.         79648   IN      A
smtp2.oreilly.com.      21011   IN      A

;; Query time: 2 msec
;; WHEN: Wed Jun  2 15:54:00 2004
;; MSG SIZE  rcvd: 262

This type of query reveals basic information about a domain name, such as the name servers and the mail servers. We can gather more information by asking a specific question (e.g., “What is the address of the web site?”):

$ dig www.oreilly.com
;www.oreilly.com.               IN      A
www.oreilly.com.        20269   IN      A
www.oreilly.com.        20269   IN      A

The dig tool converts IP addresses into names when the -x option is used:

$ dig -x
;   IN      PTR
;; ANSWER SECTION: 86381 IN   PTR     www.oreillynet.com.

You can see that the reverse query, run against the IP address obtained by looking up the domain name oreilly.com, gave us a whole new domain name: www.oreillynet.com.
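A reverse lookup is an ordinary DNS query for a PTR record under a special name built from the address. The following Python sketch shows how that name is derived and how a reverse lookup can be performed through the system resolver. The address 192.0.2.10 is a documentation placeholder, not one of the addresses from the examples above.

```python
import ipaddress
import socket


def ptr_name(ip):
    """Return the DNS name that is queried for a reverse (PTR) lookup."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip):
    """Return the primary hostname for an address, or None if none resolves."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return None
    return hostname


# The octets are reversed and placed under in-addr.arpa.
print(ptr_name("192.0.2.10"))
```

Running reverse_lookup against each address you have collected is a quick way to find additional domain names hosted by the same organization.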

A zone transfer is a service where all the information about a particular domain name is transferred from a domain name server. Such services are handy because of the wealth of information they provide. For the same reason, the access to a zone transfer service is often restricted. Zone transfers are generally not used for normal DNS operation, so requests for zone transfers are sometimes logged and treated as signs of preparation for intrusion.

You have probably discovered several IP addresses by now. IP addresses are not sold; they are assigned to organizations by bodies known as Regional Internet Registries (RIRs). The information kept by RIRs is publicly available. Four registries cover address allocation across the globe:

Registries do not work with end users directly. Instead, they delegate large blocks of addresses to providers, who delegate smaller chunks further. In effect, an address can be assigned to multiple parties. In theory, every IP address should be associated with the organization using it. In real life, Internet providers may not update the IP address database. The best you can do is to determine the connectivity provider of an organization.

IP assignment data can be retrieved from any active whois server, and different servers can give different results. In the case below, I just guessed that whois.sonic.net exists. This is what we get for one of O’Reilly’s IP addresses:

$ whois -h whois.sonic.net
[Querying whois.sonic.net]
You asked for
network:IP-Network-Block: -
network:Org-Name:John Irwin
network:IP-Network-Block: -
network:Org-Name:Sonic Hostmaster

Search engines have become a real resource when it comes to information gathering. This is especially true for Google, which has exposed its functionality through an easy-to-use programming interface. Search engines can help you find:

Look at some example Google queries. If you want to find a list of PDF documents available on a site, type a Google search query such as the following:

site:www.modsecurity.org filetype:pdf

To see if a site contains Apache directory listings, type something like this:

site:www.modsecurity.org intitle:"Index of /" "Parent Directory"

To see if it contains any WS_FTP log files, type something like this:

site:www.modsecurity.org inurl:ws_ftp.log
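If you need to run the same set of queries against several sites, it helps to generate them. The sketch below builds the three queries shown above for any site name and turns each into a search URL. It assumes that the public search page accepts the query in a q parameter; it does not use the Google API, and the helper names are mine.

```python
from urllib.parse import urlencode

# Query fragments taken from the examples above.
CHECKS = {
    "pdf documents": ["filetype:pdf"],
    "directory listings": ['intitle:"Index of /"', '"Parent Directory"'],
    "ws_ftp logs": ["inurl:ws_ftp.log"],
}


def site_query(site, *terms):
    """Build a query restricted to one site."""
    return " ".join([f"site:{site}", *terms])


def search_url(query):
    """Build a search URL (assumption: the query goes in the 'q' parameter)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


def queries_for(site):
    """Return a mapping of check name to (query, url) for one site."""
    result = {}
    for name, terms in CHECKS.items():
        query = site_query(site, *terms)
        result[name] = (query, search_url(query))
    return result


for name, (query, url) in queries_for("www.modsecurity.org").items():
    print(f"{name}: {query}")
    print(f"  {url}")
```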

Anyone can register with Google and receive a key that will support up to 1,000 automated searches per day. To learn more about Google APIs, see the following:

Social engineering is arguably the oldest hacking technique, having been used hundreds of years before computers were invented. With social engineering, a small effort can go a long way. Kevin Mitnick (http://en.wikipedia.org/wiki/Kevin_Mitnick) is the most well-known practitioner. Here are some social-engineering approaches:

For more information on social engineering (and funny real-life stories), see:

For each domain name or IP address you acquire, perform a connectivity check using traceroute. Again, I use O’Reilly as an example.

$ traceroute www.oreilly.com
traceroute: Warning: www.oreilly.com has multiple addresses; using 208.201.
traceroute to www.oreilly.com (, 30 hops max, 38 byte packets
 1    gw-prtr-44-a.schlund.net (  0.238 ms
 2    v999.gw-dist-a.bs.ka.schlund.net (  0.373 ms
 3    ge-41.gw-backbone-b.bs.ka.schlund.net (  0.535 ms
 4    pos-80.gw-backbone-b.ffm.schlund.net (  3.210 ms
 5    cr02.frf02.pccwbtn.net (  4.363 ms
 6    pos3-0.cr02.sjo01.pccwbtn.net (  195.201 ms
 7    layer42.ge4-0.4.cr02.sjo01.pccwbtn.net (  187.701 ms
 8    2.fast0-1.gw.equinix-sj.sonic.net (  185.405 ms
 9    fast5-0-0.border.sr.sonic.net (  191.517 ms
10    eth1.dist1-1.sr.sonic.net (  192.652 ms
11    www.oreillynet.com (  190.662 ms

The traceroute output shows the route packets use to travel from your location to the target’s location. The last few lines matter; the last line is the server. On line 10, we see what is most likely a router, connecting the network to the Internet.

Port scanning is an active information-gathering technique. It is viewed as impolite and legally dubious. You should only perform port scanning against your own network or where you have written permission from the target.

The purpose of port scanning is to discover active network devices on a given range of addresses and to analyze each device to discover public services. In the context of web security assessment, you will want to know if a publicly accessible FTP or a database engine is running on the same server. If there is, you may be able to use it as part of your assessment.

The most popular port-scanning tool is Nmap (http://www.insecure.org/nmap/), which is free and useful. It is a command line tool, but a freeware frontend called NmapW is available from Syhunt (http://www.syhunt.com/section.php?id=nmapw). In the remainder of this section, I will demonstrate how Nmap can be used to learn more about running devices. In all examples, the real IP addresses are masked because they belong to real devices.

The process of the discovery of active hosts is called a ping sweep. An attempt is made to ping each IP address and live addresses are reported. Here is a sample run, in which XXX.XXX.XXX.112/28 represents the IP address you would type:

# nmap -sP XXX.XXX.XXX.112/28

Starting nmap 3.48 ( http://www.insecure.org/nmap/ )
Host (XXX.XXX.XXX.112) seems to be a subnet broadcast address (returned 1
extra pings).
Host (XXX.XXX.XXX.114) appears to be up.
Host (XXX.XXX.XXX.117) appears to be up.
Host (XXX.XXX.XXX.120) appears to be up.
Host (XXX.XXX.XXX.122) appears to be up.
Host (XXX.XXX.XXX.125) appears to be up.
Host (XXX.XXX.XXX.126) appears to be up.
Host (XXX.XXX.XXX.127) seems to be a subnet broadcast address (returned 1
extra pings).
Nmap run completed -- 16 IP addresses (6 hosts up) scanned in 7 seconds
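The numbers in the output follow directly from the /28 prefix: 4 host bits give 16 addresses, of which the first is the network address and the last is the broadcast address. A short Python sketch makes the arithmetic visible. Since the real addresses are masked, the sketch uses 192.0.2.112/28, a documentation range chosen only because it ends in the same final octet.

```python
import ipaddress


def sweep_targets(cidr):
    """Describe the address range a ping sweep of the given block would cover."""
    net = ipaddress.ip_network(cidr, strict=True)
    return {
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
        "total": net.num_addresses,
        "hosts": [str(host) for host in net.hosts()],
    }


plan = sweep_targets("192.0.2.112/28")
print(plan["total"], "addresses,", len(plan["hosts"]), "usable")
print("network:", plan["network"], "broadcast:", plan["broadcast"])
```

This matches the sample run: addresses ending in .112 and .127 are reported as broadcast addresses, and the hosts that respond all fall between .113 and .126.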

After that, you can proceed to get more information from individual hosts by looking at their TCP ports for active services. The following is sample output from scanning a single host. I have used one of my servers since scanning one of O’Reilly’s servers without a permit would have been inappropriate.

# nmap -sS XXX.XXX.XXX.XXX

Starting nmap 3.48 ( http://www.insecure.org/nmap/ )
The SYN Stealth Scan took 144 seconds to scan 1657 ports.
Interesting ports on XXX.XXX.XXX.XXX:
(The 1644 ports scanned but not shown below are in state: closed)
21/tcp   open  ftp
22/tcp   open  ssh
23/tcp   open  telnet
25/tcp   open  smtp
53/tcp   open  domain
80/tcp   open  http
110/tcp  open  pop-3
143/tcp  open  imap
443/tcp  open  https
993/tcp  open  imaps
995/tcp  open  pop3s
3306/tcp open  mysql
8080/tcp open  http-proxy
Nmap run completed -- 1 IP address (1 host up) scanned in 157.022 seconds
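To see what a port scanner does at its simplest, consider the following Python sketch. It is not what Nmap does with the -sS switch: a SYN scan sends raw packets and never completes the handshake, while this sketch performs a full TCP connect to each port, which is slower and easier for the target to log. The port list is taken from the sample output above. As with any port scanning, run it only against hosts you are permitted to test.

```python
import socket

# Ports reported as open in the sample output above.
SAMPLE_PORTS = [21, 22, 23, 25, 53, 80, 110, 143, 443, 993, 995, 3306, 8080]


def tcp_port_open(host, port, timeout=1.0):
    """Return True if a full TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error number otherwise.
        return sock.connect_ex((host, port)) == 0


def connect_scan(host, ports, timeout=1.0):
    """Return the subset of ports that accepted a connection."""
    return [port for port in ports if tcp_port_open(host, port, timeout)]
```

For example, connect_scan("127.0.0.1", SAMPLE_PORTS) reports which of those services are listening on your own machine.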

You can go further if you use Nmap with a -sV switch, in which case it will connect to the ports you specify and attempt to identify the services running on them. In the following example, you can see the results of service analysis when I run Nmap against ports 21, 80, and 8080. It uses the Server header field to identify web servers, which is the reason it incorrectly identified the Apache running on port 80 as a Microsoft Internet Information Server. (I configured my server with a fake server name, as described in Chapter 2, where HTTP fingerprinting for discovering real web server identities is discussed.)

# nmap -sV XXX.XXX.XXX.XXX -P0 -p 21,80,8080
Starting nmap 3.48 ( http://www.insecure.org/nmap/ )
Interesting ports on XXX.XXX.XXX.XXX:
21/tcp   open  ftp     ProFTPD 1.2.9
80/tcp   open  http    Microsoft IIS webserver 5.0
8080/tcp open  http    Apache httpd 2.0.49 ((Unix) DAV/2 PHP/4.3.4)
Nmap run completed -- 1 IP address (1 host up) scanned in 22.065 seconds

Scanning results will usually fall into one of three categories:

If scan results fall into the first or the second category, the server is probably not being closely monitored. The third option shows the presence of people who know what they are doing; additional security measures may be in place.

This is where the real fun begins. At a minimum, you need the following tools:

Optionally, you may choose to perform an assessment through one or more open proxies (by chaining). This makes the test more realistic, but it may disclose sensitive information to others (whoever controls the proxy), so be careful.

We will take these steps:

I have put SSL tests first because, logically, SSL is the first layer of security you encounter. Also, in some rare cases you will encounter a target that requires use of a privately issued client certificate. In such cases, you are unlikely to progress further until you acquire a client certificate. However, you should still attempt to trick the server into giving you access without a valid client certificate.

Attempt to access the server using any kind of client certificate (even a certificate you created will do). If that fails, try to access the server using a proper certificate signed by a well-known CA. On a misconfigured SSL server, such a certificate will pass the authentication phase and allow access to the application. (The server is only supposed to accept privately issued certificates.) Sometimes using a valid certificate with a subject admin or Administrator may get you inside (without a password).

Whether or not a client certificate is required, perform the following tests:

After SSL testing (if any), attempt to identify the web server. Start by typing a Telnet command such as the following, substituting the appropriate web site name:

$ telnet www.modsecurity.org 80
Connected to www.modsecurity.org.
Escape character is '^]'.
Host: www.modsecurity.org
HTTP/1.1 200 OK
Date: Tue, 08 Jun 2004 10:54:52 GMT
Server: Microsoft-IIS/5.0
Content-Length: 0

We learn two things from this output:

We turn to httprint for the confirmation of the signature:

$ httprint -P0 -h www.modsecurity.org -s signatures.txt
httprint v0.202 (beta) - web server fingerprinting tool
(c) 2003,2004 net-square solutions pvt. ltd. - see readme.txt
Finger Printing on http://www.modsecurity.org:80/
Derived Signature:
Banner Reported: Microsoft-IIS/5.0
Banner Deduced: Apache/1.3.27
Score: 140
Confidence: 84.34

This confirms the version of the web server that was reported by Netcraft. The confirmation shows the web server had not been upgraded since October 2003, so the chances of web server modules having been upgraded are slim. This is good information to have.

This complete signature gives us many things to work with. From here we can go and examine known vulnerabilities for Apache, PHP, mod_ssl, and OpenSSL. The OpenSSL version (reported by Netcraft as 0.9.6b) looks very old. According to the OpenSSL web site, Version 0.9.6b was released in July 2001. Many serious OpenSSL vulnerabilities have been made public since that time.

A natural way forward from here would be to explore those vulnerabilities further. In this case, however, that would be a waste of time because the version of OpenSSL running on the server is not vulnerable to current attacks. Vendors often create custom branches of software applications that they include in their operating systems. After the split, the included applications are maintained internally, and the version numbers rarely change. When a security problem is discovered, vendors perform what is called a backport: the patch is ported from the current software version (maintained by the original application developers) back to the older release. This only results in a change of the packaging version number, which is typically only visible from the inside. Since there is no way of knowing this from the outside, the only thing to do is to go ahead and check for potential vulnerabilities.

We now know the site likely uses PHP because PHP used to appear in the web server signature. We can confirm our assumption by browsing and looking for a nonstatic part of the site. Pages with the extension .php are likely to be PHP scripts.

Some sites attempt to hide the technology they use by hiding file extensions. For example, they may associate the extension .html with PHP, making all pages dynamic. Or, if the site is running on a Windows server, associating the extension .asp with PHP may make the application look as if it was implemented in ASP.

Suppose you are not sure what technology is used at a web site. For example, suppose the extension for a file is .asp but you think that ASP is not used. The HTTP response may reveal the truth:

$ telnet www.modsecurity.org 80
Connected to www.modsecurity.org.
Escape character is '^]'.
HEAD /index.asp HTTP/1.0
Host: www.modsecurity.org
HTTP/1.1 200 OK
Date: Tue, 24 Aug 2004 13:54:11 GMT
Server: Microsoft-IIS/5.0
X-Powered-By: PHP/4.3.3-dev
Set-Cookie: PHPSESSID=9d3e167d46dd3ebd81ca12641d82106d; path=/
Connection: close
Content-Type: text/html

There are two clues in the response that tell you this is a PHP-based site. First, the X-Powered-By header includes the PHP version. Second, the site sends a cookie (the Set-Cookie header) whose name is PHP-specific.
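Checks like these are easy to automate. The following Python sketch takes raw response headers and reports the clues discussed above. The table of session cookie names reflects the default names used by some common platforms; treat it as an assumption, because any application can rename its session cookie.

```python
# Default session cookie names (or prefixes) of some common platforms.
# Assumption: illustrative only; applications can change these names.
SESSION_COOKIE_HINTS = {
    "PHPSESSID": "PHP",
    "JSESSIONID": "Java servlet container",
    "ASPSESSIONID": "classic ASP",
    "ASP.NET_SessionId": "ASP.NET",
}


def parse_headers(raw_response):
    """Return (lowercased name, value) pairs from a raw HTTP response."""
    lines = raw_response.replace("\r\n", "\n").split("\n")
    headers = []
    for line in lines[1:]:  # skip the status line
        if not line:
            break  # blank line marks the end of the headers
        name, _, value = line.partition(":")
        headers.append((name.strip().lower(), value.strip()))
    return headers


def technology_clues(raw_response):
    """List header-based hints about the technology behind a site."""
    clues = []
    for name, value in parse_headers(raw_response):
        if name == "server":
            clues.append(f"Server banner: {value}")
        elif name == "x-powered-by":
            clues.append(f"X-Powered-By: {value}")
        elif name == "set-cookie":
            cookie_name = value.split("=", 1)[0].strip()
            for prefix, tech in SESSION_COOKIE_HINTS.items():
                if cookie_name.upper().startswith(prefix.upper()):
                    clues.append(
                        f"Session cookie {cookie_name} suggests {tech}"
                    )
    return clues


# The response from the example above.
SAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Tue, 24 Aug 2004 13:54:11 GMT\r\n"
    "Server: Microsoft-IIS/5.0\r\n"
    "X-Powered-By: PHP/4.3.3-dev\r\n"
    "Set-Cookie: PHPSESSID=9d3e167d46dd3ebd81ca12641d82106d; path=/\r\n"
    "Connection: close\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
)

for clue in technology_clues(SAMPLE_RESPONSE):
    print(clue)
```

Note that the sketch reports the Server banner without trusting it. When the banner and the other clues disagree, as they do here, the banner is the more likely of the two to have been faked.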

Don’t forget a site can utilize more than one technology. For example, CGI scripts are often used even when there is a better technology (such as PHP) available. Examine all parts of the site to discover the technologies used.

Test to see if proxy operations are allowed in the web server. A running proxy service that allows anyone to use it without restriction (a so-called open proxy) represents a big configuration error. To test, connect to the target web server and request a page from a totally different web server. In proxy mode, you are allowed to enter a full hostname in the request (otherwise, hostnames go into the Host header):

$ telnet www.example.com 80
Connected to www.example.com.
Escape character is '^]'.
HEAD http://www.google.com:80/ HTTP/1.0
HTTP/1.1 302 Found
Date: Thu, 11 Nov 2004 14:10:14 GMT
Server: GWS/2.1
Location: http://www.google.de/
Content-Type: text/html; charset=ISO-8859-1
Via: 1.0 www.google.com
Connection: close
Connection closed by foreign host.

If the request succeeds (you get a response, like the response from Google in the example above), you have encountered an open proxy. If you get a 403 response, that could mean the proxy is active but configured not to accept requests from your IP address (which is good). Getting anything else as a response probably means the proxy code is not active. (Web servers sometimes simply respond with a status code 200 and return their default home page.)

The other way to use a proxy is through a CONNECT method, which is designed to handle any type of TCP/IP connection, not just HTTP. This is an example of a successful proxy connection using this method:

$ telnet www.example.com 80
Connected to www.example.com.
Escape character is '^]'.
CONNECT www.google.com:80 HTTP/1.0
HTTP/1.0 200 Connection Established
Proxy-agent: Apache/2.0.49 (Unix)
Host: www.google.com
HTTP/1.0 302 Found
Location: http://www.google.de/
Content-Type: text/html
Server: GWS/2.1
Content-Length: 214
Date: Thu, 11 Nov 2004 14:15:22 GMT
Connection: Keep-Alive
Connection closed by foreign host.

In the first part of the request, you send a CONNECT line telling the proxy server where you want to go. If the CONNECT method is allowed, you can continue typing. Everything you type from this point on goes directly to the target server.

Having access to a proxy that is also part of an internal network opens up interesting possibilities. Internal networks usually use nonroutable private address space that cannot be reached from the outside. But the proxy, because it is sitting on two addresses simultaneously, can be used as a gateway. Suppose you know the IP address of a database server on the internal network. (For example, you may have found this information in an application library file through file disclosure.) There is no way to reach this database server directly, but if you ask the proxy nicely it may respond:

$ telnet www.example.com 80
Connected to www.example.com.
Escape character is '^]'.
HTTP/1.0 200 Connection Established
Proxy-agent: Apache/2.0.49 (Unix)

If you think a proxy is there but configured not to respond to your IP address, make a note of it. This is one of those things whose exploitation can be attempted later, for example after a successful entry to a machine that holds an IP address internal to the organization.
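The two proxy probes described above can be sketched in Python as follows. The functions build the raw requests and interpret a raw response according to the rules given earlier. One detail is my own addition and should be read as an assumption: because some web servers answer any request with their own home page, the sketch asks for a marker (for example, a header only the remote site would send) before concluding that the proxy is open.

```python
def proxy_head_request(target_url):
    """Request with a full URL in the request line, as sent to a proxy."""
    return f"HEAD {target_url} HTTP/1.0\r\n\r\n".encode("ascii")


def connect_request(host, port):
    """CONNECT request asking the proxy to open a TCP tunnel to host:port."""
    return f"CONNECT {host}:{port} HTTP/1.0\r\n\r\n".encode("ascii")


def status_code(raw_response):
    """Extract the numeric status code from a raw HTTP response, or None."""
    first_line = raw_response.split(b"\r\n", 1)[0]
    parts = first_line.split()
    if len(parts) >= 2 and parts[1].isdigit():
        return int(parts[1])
    return None


def interpret_probe(raw_response, remote_marker):
    """Classify a response to a proxy probe.

    remote_marker is a byte string expected only in a reply that really
    came from the remote site (hypothetical example: b"Server: GWS").
    """
    code = status_code(raw_response)
    if code == 403:
        return "proxy may be active but refuses this client"
    if code is not None and remote_marker in raw_response:
        return "likely open proxy"
    return "proxy probably not active"


print(proxy_head_request("http://www.google.com:80/"))
print(connect_request("www.google.com", 80))
```

A result of "proxy may be active but refuses this client" is worth recording even though it gives you nothing to use right away.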

The presence of WebDAV may allow file enumeration. You can test this using the WebDAV protocol directly (see Chapter 10) or with a WebDAV client. Cadaver (http://www.webdav.org/cadaver/) is one such client. You should also attempt to upload a file using a PUT method. On a web server that supports it, you may be able to upload and execute a script.

Another frequent configuration problem is the unrestricted availability of web server access logs. The logs, when available, can reveal direct links to other interesting (possibly also unprotected) server resources. Here are some folder names you should try:

  • /logs

  • /stats

  • /weblogs

  • /webstats
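Trying these names by hand gets tedious when there are several sites to check. A minimal Python sketch, using only the standard library, builds the candidate URLs and asks for each with a HEAD request. The function names are mine, and the interpretation of status codes is left to you: a 200 deserves a look in the browser, while a 401 or 403 at least tells you the folder exists.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

# Folder names from the list above.
CANDIDATE_FOLDERS = ["/logs", "/stats", "/weblogs", "/webstats"]


def candidate_urls(base_url, folders=CANDIDATE_FOLDERS):
    """Return the URLs to try for a given site."""
    return [urljoin(base_url, folder) for folder in folders]


def probe(url, timeout=5.0):
    """Return the HTTP status code for a HEAD request, or None on failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 401, 403, 404, and so on
    except urllib.error.URLError:
        return None  # could not connect at all


def check_site(base_url):
    """Map each candidate URL to the status code the server returned."""
    return {url: probe(url) for url in candidate_urls(base_url)}
```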

If the source of the web application you are assessing is commonly available, then download it for review. (You can install it later if you determine there is a reason to practice attacking it.) Try to find the exact version used at the target site. Then proceed with the following:

The remainder of this section continues with the review under the assumption the source code is unavailable. The principle is the same, except that with the source code you will have much more information to work with.

You have collected enough information about the application to analyze three potentially vulnerable areas in every web application:

Session management

Session management mechanisms, especially those that are homemade, may be vulnerable to one of the many attacks described in Chapter 10. Session tokens should be examined and tested for randomness.


Authentication

The login page is possibly the most important page in an application, especially if the application is not open for public registration. One way to attack the authentication method is to look for script vulnerabilities as you would for any other page. Perhaps the login page is vulnerable to an SQL injection attack and you could craft a special request to bypass authentication. An alternative is to attempt a brute-force attack. Since HTTP is a stateless protocol, many web applications were not designed to detect multiple authentication failures, which makes them vulnerable to brute-force attacks. Though such attacks leave clearly visible tracks in the error logs, they often go unnoticed because logs are not regularly reviewed. It is trivial to write a custom script (using Perl, for example) to automate brute-force attacks, and most people do just that. You may be able to use a tool such as Hydra (http://thc.org/thc-hydra/) to do the same without any programming.


Authorization

The authorization subsystem can be tested once you authenticate with the application. The goal of the tests should be to find ways to perform actions that should be beyond your normal user privileges. The ability to do this is known as privilege escalation. For example, a frequent authorization problem occurs when a user’s unique identifier is used in a script as a parameter but the script does not check that the identifier belongs to the user who is executing the script. When you hear in the news of users being able to see other users’ banking details online, the cause was probably a problem of this type. This is known as horizontal privilege escalation. Vertical privilege escalation occurs when you are able to perform an action that can normally only be performed by a different class of user altogether. For example, some applications keep the information as to whether the user is a privileged user in a cookie. In such circumstances, any user can become a privileged user simply by forging the cookie.
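Horizontal privilege escalation is easier to recognize once you have seen it in code. The following Python sketch is entirely hypothetical (the account data, user names, and function names are invented for illustration), but it shows the pattern: the first function trusts the identifier supplied in the request, while the second verifies that the record belongs to the user who owns the session.

```python
from dataclasses import dataclass


@dataclass
class Account:
    account_id: int
    owner: str
    balance: float


# Hypothetical data store standing in for a database.
ACCOUNTS = {
    1: Account(1, "alice", 120.0),
    2: Account(2, "bob", 75.5),
}


class Forbidden(Exception):
    """Raised when a user asks for a record that is not theirs."""


def view_account_vulnerable(session_user, account_id):
    # Flaw: the identifier comes from a request parameter and is never
    # compared with the logged-in user, so any user can read any account.
    return ACCOUNTS[account_id]


def view_account_checked(session_user, account_id):
    account = ACCOUNTS[account_id]
    # The ownership check that the vulnerable version is missing.
    if account.owner != session_user:
        raise Forbidden(f"{session_user} may not view account {account_id}")
    return account


# alice changes the account identifier in the request from 1 to 2:
print(view_account_vulnerable("alice", 2).owner)
```

When testing from the outside, the equivalent step is to log in as one user, change the identifier in the request to one that belongs to another user, and see whether the application objects.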

The final step of black-box vulnerability testing requires the public interface of the application, parameterized pages, to be examined to prove (or disprove) they are susceptible to attacks.

If you have already found some known vulnerabilities, you will need to confirm them, so do that first. The rest of the work is a process of going through the list of all pages, fiddling with the parameters, attempting to break the scripts. There is no single straight path to take. You need to understand web application security well, think on your feet, and combine pieces of information to build toward an exploit.

This process is not covered in detail here. Practice using the material available in this chapter and in Chapter 10. You should follow the links provided throughout both chapters. You may want to try out two web application security learning environments (WebMaven and WebGoat) described in Appendix A.

Here is a list of the vulnerabilities you may attempt to find in an application. All of these are described in Chapter 10, with the exception of DoS attacks, which are described in Chapter 5.

  • SQL injection attacks

  • XSS attacks

  • File disclosure flaws

  • Source code disclosure flaws

  • Misconfigured access control mechanisms

  • Application logic flaws

  • Command execution attacks

  • Code execution attacks

  • Session management attacks

  • Brute-force attacks

  • Technology-specific flaws

  • Buffer overflow attacks

  • Denial of service attacks

White-box testing is the complete opposite of what we have been doing. The goal of black-box testing was to rely only on your own resources and remain anonymous and unnoticed; here we can access anything anywhere (or so the theory goes).

The key to a successful white-box review is having direct contact and cooperation from developers and people in charge of system maintenance. Software documentation may be nonexistent, so you will need help from these people to understand the environment to the level required for the assessment.

To begin the review, you need the following:

The process of white-box testing consists of the following steps:

At the end of your white-box testing, you should have a review report that documents your methodology, contains review notes, lists notices, warnings, and errors, and offers recommendations for improvement.

The purpose of the architecture review is to pave the way for the actions ahead. A good understanding of the application is essential for a successful review. You should examine the following:

Application security policy

If you are lucky, the application review will begin with a well-defined security policy in hand. If such a thing does not exist (which is common), you will have difficulties defining what “security” means. Where possible, a subproject should be branched out to create the application security policy. Unless you know what needs to be protected, it will not be possible to determine whether the system is secure enough. If a subproject is not a possibility, you will have to sketch a security policy using common sense. Such a policy will suffer from being focused too much on technology and from being based on your assumptions about the business (which may be incorrect). In any case, you will definitely need something to guide you through the rest of the review.

Application modules

Code review will be the subject of later review steps. At this point, we are only interested in major application modules. A typical example would be an application that consists of a public part and the administrative interfaces.


Application libraries

Applications are built on libraries that handle common tasks. It is these libraries that interact with the environment, so they should be the place to look for security problems.


Data

What kind of data is the application storing? How is it stored and where? Is the storage methodology secure enough for that type of data? Authentication information (such as passwords) should be treated as data, too. Here are some common questions: Are passwords stored in plaintext? What about credit card information? Such information should not be stored in plaintext and should not be stored with a method that would allow an attacker to decrypt it on the server.
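As an illustration of what non-plaintext password storage can look like in code you review, here is a minimal sketch in Python. It assumes a salted one-way hash (PBKDF2 from the standard library); the function names and the iteration count are hypothetical choices made for this example, and the application under review may well use a different scheme.

```python
import hashlib
import hmac
import os

# Assumption: PBKDF2-HMAC-SHA256 with a random per-password salt.
# The iteration count is an illustrative value, not taken from the text.
ITERATIONS = 200_000


def hash_password(password: str) -> tuple:
    """Return (salt, digest); only these two values are stored, never the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest for the candidate password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

Because the digest cannot be reversed, an attacker who obtains the stored values still has to guess each password. This approach suits passwords only; data that must be read back later (such as credit card numbers) needs a different design.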

Interaction with external systems

Which external systems does the application connect to? Most web applications connect to databases. Is the rule of least privilege used?

Further questions to ask yourself at this point are:

In a configuration review, you pay attention to the environment the application resides in. You need to ask yourself the following questions:

Applications typically have their own configuration files. You need to know where such files are stored and familiarize yourself with the options. Make copies of the files for record-keeping purposes.

You will probably be interested in options related to logging and access control. Applications often need their own password to access other parts of the system (e.g., a database), and you should note how those passwords are stored. If the application supports a debugging mode, you need to examine if it is used and how.

Examine how a connection to the database is made. You do not want to see:

The web application should have minimal database privileges. It is acceptable for an application to use one account to access a database and have full privileges over it. It is not acceptable for that account to be able to access more than one database (think about containment). The application privileges should be further restricted wherever possible (e.g., do not allow the account to drop tables, or give it read-only access to parts of the database).

The same concept (“least privilege used”) applies to connections to other types of systems, for example LDAP.

When reviewing file permissions, we are interested in deviations from the default permissions, which are defined as follows:

We examine the potential for information leakage first, by understanding who is allowed read access to application files. If read access is discovered and it cannot be justified, the discovery is marked as an error. We automate the search using the find utility.

Examine if any suid or sgid files are present. Such files allow binaries to run with the privileges of their owner or group (typically root) and not those of the user who is executing them. Their presence (though unlikely) may be very dangerous, so it is worth checking for them:

# find /home/application -type f -and \( -perm -4000 -or -perm -2000 \) | xargs ls -adl

The following finds world-readable files, where any system user can read the files and folders:

# find /home/application -perm -4 | xargs ls -adl

The following finds files owned by users other than the application user:

# find /home/application ! -user appuser | xargs ls -adl

The following finds group-readable files, where the group is not the application group:

# find /home/application -perm -40 ! -group appgrp | xargs ls -adl

Allowing users other than the application user write access opens a whole new attack vector and is, therefore, very dangerous. This is especially true for the web server user because an attacker may be able to manipulate the publicly available scripts into creating a file under the application tree, leading to compromise through code execution.

The following finds world-writable files:

# find /home/application -perm -2 | xargs ls -adl

The following finds files owned by users other than the application user. This includes files owned by the web server user.

# find /home/application ! -user appuser | xargs ls -adl

The following finds group-writable files, in which the group is not the application group (group-writable files are not necessary but there may be a good reason for their existence):

# find /home/application -perm -20 ! -group appgrp | xargs ls -adl
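The individual find commands above can also be folded into a single report. The following is a minimal sketch in Python, not a replacement for the commands; the function name and the finding labels are hypothetical, and the expected owner and group are passed in as numeric IDs.

```python
import os
import stat


def audit_tree(root, app_uid, app_gid):
    """Walk root and return a list of (path, findings) for entries that deviate.

    The root directory itself is not checked, only what lies beneath it.
    """
    report = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symbolic links
            if stat.S_ISLNK(st.st_mode):
                continue  # permission bits on symbolic links are not meaningful
            findings = []
            if stat.S_ISREG(st.st_mode) and st.st_mode & (stat.S_ISUID | stat.S_ISGID):
                findings.append("suid/sgid")
            if st.st_mode & stat.S_IROTH:
                findings.append("world-readable")
            if st.st_mode & stat.S_IWOTH:
                findings.append("world-writable")
            if st.st_uid != app_uid:
                findings.append("foreign owner")
            if st.st_gid != app_gid:
                if st.st_mode & stat.S_IRGRP:
                    findings.append("readable by foreign group")
                if st.st_mode & stat.S_IWGRP:
                    findings.append("writable by foreign group")
            if findings:
                report.append((path, findings))
    return report

# Possible usage, with the account names used in the find examples:
#   import pwd, grp
#   uid = pwd.getpwnam("appuser").pw_uid
#   gid = grp.getgrnam("appgrp").gr_gid
#   for path, findings in audit_tree("/home/application", uid, gid):
#       print(path, ", ".join(findings))
```

Each reported entry still has to be judged by hand; the script only collects the candidates in one place.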

We now go through the file listing, trying to understand the purpose of each file and make a judgment as to whether it is in the right place and whether the permissions are configured properly. Here is advice regarding the different types of files:

At the end of this step, we go back to the file permission report and note as errors any assigned permissions that are not essential for the application to function properly.

The next step is to examine parts of the source code. A full source code review is expensive and often not economical (plus it requires very good understanding of programming and the technology used, an understanding only developers can have). To meet our own goals, we perform a limited review of the code:

Web applications are typically built on top of infrastructure that is designed to handle common web-related tasks. This is the layer where many security issues are found. I say “typically” because the use of libraries is a best practice and not a mandatory activity. Badly designed applications will have the infrastructure tasks handled by the same code that provides the application functionality. It is a bad sign if you cannot identify the following basic building blocks:

Input validation

Input data should never be accessed directly. Individual bits of data should first be validated for type (“Is it a number?”) and meaning (“Birth dates set in the future are not valid”). It is generally accepted that the correct strategy to deal with input is to accept what you know is valid (as opposed to trying to filter out what you know is not).
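When reading code, validation of this kind might look roughly like the following Python sketch. The function names and the nine-digit limit are assumptions made for the example; the point is that each value is checked for type and then for meaning before anything else touches it.

```python
import re
from datetime import date
from typing import Optional

# Accept only what is known to be valid: one to nine ASCII digits.
_DIGITS = re.compile(r"[0-9]{1,9}")


def parse_int_param(raw: str) -> int:
    """Type check: the whole value must be a short run of digits."""
    if not _DIGITS.fullmatch(raw):
        raise ValueError("not a number")
    return int(raw)


def parse_birth_date(raw: str, today: Optional[date] = None) -> date:
    """Type check (a valid ISO date), then meaning check (not in the future)."""
    today = today or date.today()
    value = date.fromisoformat(raw)  # raises ValueError on malformed input
    if value > today:
        raise ValueError("birth date is in the future")
    return value
```

Note that neither function tries to strip out dangerous characters; anything that does not match the expected form is rejected outright.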

Output escaping

To prevent XSS attacks, output should be properly escaped. The correct way to perform escaping depends on the context. In the case of HTML files, the metacharacters < (less than), > (greater than), & (ampersand), ' (single quote), and " (double quote) should be replaced with their safe equivalents: &lt;, &gt;, &amp;, &#39;, and &quot;, respectively. (Remember that an HTML file can contain other types of content, such as JavaScript, and escaping rules can be different for them.)
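The HTML replacement rule is simple enough to sketch in a few lines of Python. The function name below is hypothetical, and the sketch covers HTML body and attribute text only, not JavaScript or other embedded content.

```python
# Replacement table for the five HTML metacharacters. The ampersand is
# replaced first so that ampersands introduced by the other replacements
# are not escaped a second time.
_HTML_ESCAPES = (
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&#39;"),
    ('"', "&quot;"),
)


def escape_html(text: str) -> str:
    """Return text with HTML metacharacters replaced by their safe equivalents."""
    for char, entity in _HTML_ESCAPES:
        text = text.replace(char, entity)
    return text
```

In a real Python application you would more likely see the standard library function html.escape, which does the same job (it writes the single quote as &#x27;). During review, what matters is that some such function is applied consistently wherever user-supplied data is written to a page.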

Database interaction

Examine how database queries are constructed. The ideal way is through use of prepared statements. Constructing queries through string concatenation is easy to get wrong even if special care is taken.
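To make the difference concrete, here is a small self-contained sketch using Python's sqlite3 module and an in-memory database. The table and function names are made up for the example; the same contrast applies to any database API that supports placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_user_unsafe(name):
    # String concatenation: the input becomes part of the SQL text itself.
    query = "SELECT name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user(name):
    # Placeholder: the input is bound as data and never parsed as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


attack = "x' OR '1'='1"
# find_user_unsafe(attack) returns every row in the table, because the
# quote in the input closes the string literal and adds a true condition.
# find_user(attack) returns no rows: no user has that literal name.
```

When you review code, queries built like find_user_unsafe are the ones to flag, even when the surrounding code appears to filter the input first.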

External system interaction

Examine the interaction with systems other than databases. For example, in the case of LDAP, you want to see the LDAP query properly constructed to avoid the possibility of LDAP injection.
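For LDAP, proper construction means escaping the characters that have special meaning in a search filter before a value is placed into it. The following Python sketch assumes the escaping rules of RFC 4515 (backslash followed by two hex digits); the function names and the uid attribute are hypothetical.

```python
# Characters that must be escaped inside an LDAP search filter value.
_LDAP_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_ldap_filter_value(value: str) -> str:
    """Escape one value so it cannot change the structure of the filter."""
    return "".join(_LDAP_FILTER_ESCAPES.get(ch, ch) for ch in value)


def build_uid_filter(username: str) -> str:
    """Build a filter that matches exactly one uid value."""
    return "(uid=" + escape_ldap_filter_value(username) + ")"
```

Without the escaping step, an input such as *)(uid=* would turn a lookup for one user into a filter with a different structure. Code that assembles filters from raw request parameters deserves a note in the review.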

Session management

Examine the session management mechanisms for weaknesses (as described in Chapter 10).

Access control

Examine the code that performs access control. Does it make sense? You are looking to spot dumb mistakes here, such as storing access control information in cookies or performing authentication only at the gate, which lets those who know the layout of the application straight through.


Logging

The application should have an error log and an audit log. It should actively work to log relevant application events (e.g., users logging in, users logging out, users accessing documents). If, as recommended, you did black-box testing, you should look in the log files for your own traces. Learning how to catch yourself will help catch others.
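Looking for your own traces can be as simple as filtering the logs by the address you tested from. The Python sketch below assumes a log in which the client address is the first whitespace-separated field on each line, as in the Apache Common Log Format; application logs may use a different layout, and the function name is hypothetical.

```python
def find_traces(log_lines, client_ip):
    """Return the log lines whose first field equals client_ip."""
    hits = []
    for line in log_lines:
        fields = line.split(None, 1)  # split off the first field only
        if fields and fields[0] == client_ip:
            hits.append(line.rstrip("\n"))
    return hits

# Possible usage (the path and address are placeholders):
#   with open("/path/to/access_log") as log:
#       for entry in find_traces(log, "192.0.2.10"):
#           print(entry)
```

If a test you know you performed does not show up in any log, that absence is itself a finding worth recording.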

In the third and final phase of security assessment, the black-box testing procedures are executed again but this time using the knowledge acquired in the white-box testing phase. This is similar to the type of testing an attacker might do when he has access to the source code, but here you have a slight advantage because you know the layout of the files on disk, the configuration, and changes made to the original source code (if any). This time you are also allowed to have access to the target system while you are testing it from the outside. For example, you can look at the application logs to discover why some of your attacks are failing.

The gray-box testing phase is the time to confirm or deny the assumptions about vulnerabilities you made in the black-box phase. For example, maybe you thought Apache was vulnerable to a particular problem but you did not want to try to exploit it at that time. Looking at it from the inside, it is much easier and quicker to determine if your assumption was correct.