11 Web Security Assessment

The purpose of a web system security assessment is to determine how tight security is. Many deployments get it wrong because the responsibility to ensure a web system’s security is split between administrators and developers. I have seen this many times. Neither party understands the whole system, yet they have responsibility to ensure security.

The way I see it, web security is the responsibility of the system administrator. With the responsibility assigned to one party, the job becomes an order of magnitude easier. If you are a system administrator, think about it this way:

Note

It is your server. That makes you responsible!

To get the job done, you will have to approach the other side, web application development, and understand how it is done. The purpose of Chapter 10 was to give you a solid introduction to web application security issues. The good news is that web security is very interesting! Furthermore, you will not be expected to create secure code, only judge it.

The assessment methodology laid down in this chapter is what I like to call “lightweight web security assessment methodology.“ The word ”lightweight“ is there because the methodology does not cover every detail, especially the programming parts. In an ideal world, web application security should only be assessed by web application security professionals. They need to concern themselves with programming details. I will assume you are not this person, you have many tasks to do, and you do not do web security full time. Have the 20/80 rule in mind: expend 20 percent of the effort to get 80 percent of the benefits.

Though web security professionals can benefit from this book, such professionals will, however, use the book as a starting point and make that 80 percent of additional effort that is expected of them. A complete web security assessment consists of three complementary parts. They should be executed in the following order:

Black-box testing: Testing from the outside, with no knowledge of the system.
White-box testing: Testing from the inside, with full knowledge of the system.
Gray-box testing: Testing that combines the previous two types of testing. Gray-box testing can reflect the situation that might occur when an attacker can obtain the source code for an application (it could have been leaked or is publicly available). In such circumstances, the attacker is likely to set up a copy of the application on a development server and practice attacks there.

Before you continue, look at the Appendix A, where you will find a list of web security tools. Knowing how something works under the covers is important, but testing everything manually takes away too much of your precious time.

Black-Box Testing

In black-box testing, you pretend you are an outsider, and you try to break in. This useful technique simulates the real world. The less you know about the system you are about to investigate, the better. I assume you are doing black-box assessment because you fall into one of these categories:

You want to increase the security of your own system.
You are helping someone else secure their system.
You are performing web security assessment professionally.

Unless you belong to the first category, you must ensure you have permission to perform black-box testing. Black-box testing can be treated as hostile and often illegal. If you are doing a favor for a friend, get written permission from someone who has the authority to provide it.

Ask yourself these questions: Who am I pretending to be? Or, what is the starting point of my assessment? The answer depends on the nature of the system you are testing. Here are some choices:

A member of the general public
A business partner of the target organization
A customer on the same shared server where the target application resides
A malicious employee
A fellow system administrator

Different starting points require different approaches. A system administrator may have access to the most important servers, but such servers are (hopefully) out of reach of a member of the public. The best way to conduct an assessment is to start with no special privileges and examine what the system looks like from that point of view. Then continue upward, assuming other roles. While doing all this, remember you are doing a web security assessment, which is a small fraction of the subject of information security. Do not cover too much territory, or you will never finish. In your initial assessment, you should focus on the issues mostly under your responsibility.

As you perform the assessment, record everything, and create an information trail. If you know something about the infrastructure beforehand, you must prove you did not use it as part of black-box testing. You can use that knowledge later, as part of white-box testing.

Black-box testing consists of the following steps:

Information gathering (passive and active)
Web server analysis
Web application analysis
Vulnerability probing

I did not include report writing, but you will have to do that, too. To make your job easier, mark your findings this way:

Notices: Things to watch out for
Warnings: Problems that are not errors but are things that should be fixed
Errors: Problems that should be corrected as soon as possible
Severe errors: Gross oversights; problems that must be corrected immediately

Information Gathering

Information gathering is the first step of every security assessment procedure and is important when performed as part of black-box testing methodology. Working blindly, you will see information available to a potential attacker. Here we assume you are armed only with the name of a web site.

Information gathering can be broadly separated into two categories: passive and active. Passive techniques cannot be detected by the organization being investigated. They involve extracting knowledge about the organization from systems outside the organization. They may include techniques that involve communication with systems run by the organization but only if such techniques are part of their normal operation (e.g., the use of the organization’s DNS servers) and cannot be detected.

Most information gathering techniques are well known, having been used as part of traditional network penetration testing for years. Passive information gathering techniques were covered in the paper written by Gunter Ollmann:

“Passive Information Gathering: The Analysis Of Leaked Network Security Information“ by Gunter Ollmann (NGSS) (http://www.nextgenss.com/papers/NGSJan2004PassiveWP.pdf)

The name of the web site you have been provided will resolve to an IP address, giving you the vital information you need to start with. Depending on what you have been asked to do, you must decide whether you want to gather information about the whole of the organization. If your only target is the public web site, the IP address of the server is all you need. If the target of your research is an application used internally, you will need to expand your search to cover the organization’s internal systems.

The IP address of the public web site may help discover the whole network, but only if the site is internally hosted. For smaller web sites, hosting internally is overkill, so hosting is often outsourced. Your best bet is to exchange email with someone from the organization. Their IP address, possibly the address from an internal network, will be embedded into email headers.

Organizational information

Your first goal is to learn as much as possible about the organization, so going to its public web site is a natural place to start. You are looking for the following information:

Names and positions
Email addresses
Addresses and telephone numbers, which reveal physical locations
Posted documents, which often reveal previous revisions, or information on who created them

The web site should be sufficient for you to learn enough about the organization to map out its network of trust. In a worst-case scenario (from the point of view of attacking them), the organization will trust itself. If it relies on external entities, there may be many opportunities for exploitation. Here is some of the information you should determine:

Size: The security posture of a smaller organization is often lax, and such organizations usually cannot afford having information security professionals on staff. Bigger companies employ many skilled professionals and possibly have a dedicated information security team.
Outsourcing: Organizations are rarely able to enforce their procedures when parts of the operations are outsourced to external entities. If parts of the organization are outsourced, you may have to expand your search to target other sites.
Business model: Do they rely on a network of partners or distributors to do the business? Distributors are often smaller companies with lax security procedures. A distributor may be an easy point of entry.

Domain name registration

Current domain name registration practices require significant private information to be provided to the public. This information can easily be accessed using the whois service, which is available in many tools, web sites, and on the command line.

There are many whois servers (e.g., one for each registrar), and the important part of finding the information you are looking for is in knowing which server to ask. Normally, whois servers issue redirects when they cannot answer a query, and good tools will follow redirects automatically. When using web-based tools (e.g., http://www.internic.net/whois.html), you will have to perform redirection manually.

Watch what information we can find on O’Reilly (registrar disclaimers have been removed from the output to save space):

$ whois oreilly.com
...
O'Reilly & Associates
   1005 Gravenstein Hwy., North
   Sebastopol, CA, 95472
   US
   
   Domain Name: OREILLY.COM

   Administrative Contact -
        DNS Admin -  nic-ac@OREILLY.COM
        O'Reilly & Associates, Inc.
        1005 Gravenstein Highway North
        Sebastopol, CA 95472
        US
        Phone -  707-827-7000
        Fax -  707-823-9746
   Technical Contact -
        technical DNS -  nic-tc@OREILLY.COM
        O'Reilly & Associates
        1005 Gravenstein Highway North
        Sebastopol, CA 95472
        US
        Phone -  707-827-7000
        Fax -  - 707-823-9746

   Record update date -  2004-05-19 07:07:44
   Record create date -  1997-05-27
   Record will expire on -  2005-05-26
   Database last updated on -  2004-06-02 10:33:07 EST
   
   Domain servers in listed order:
   
   NS.OREILLY.COM                209.204.146.21
   NS1.SONIC.NET                 208.201.224.11

Domain name system

A tool called dig can be used to convert names to IP addresses or do the reverse, convert IP addresses to names (known as reverse lookup). An older tool, nslookup, is still popular and widely deployed.

$ dig oreilly.com any
   
; <<>> DiG 9.2.1 <<>> oreilly.com any
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 30773
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 3, ADDITIONAL: 4
   
;; QUESTION SECTION:
;oreilly.com.                   IN      ANY

;; ANSWER SECTION:
oreilly.com.            20923   IN      NS      ns1.sonic.net.
oreilly.com.            20923   IN      NS      ns2.sonic.net.
oreilly.com.            20923   IN      NS      ns.oreilly.com.
oreilly.com.            20924   IN      SOA     ns.oreilly.com. 
                                                nic-tc.oreilly.com.
2004052001 10800 3600 604800 21600
oreilly.com.            20991   IN      MX      20 smtp2.oreilly.com.

;; AUTHORITY SECTION:
oreilly.com.            20923   IN      NS      ns1.sonic.net.
oreilly.com.            20923   IN      NS      ns2.sonic.net.
oreilly.com.            20923   IN      NS      ns.oreilly.com.
   
;; ADDITIONAL SECTION:
ns1.sonic.net.          105840  IN      A       208.201.224.11
ns2.sonic.net.          105840  IN      A       208.201.224.33
ns.oreilly.com.         79648   IN      A       209.204.146.21
smtp2.oreilly.com.      21011   IN      A       209.58.173.10

;; Query time: 2 msec
;; SERVER: 217.160.182.251#53(217.160.182.251)
;; WHEN: Wed Jun  2 15:54:00 2004
;; MSG SIZE  rcvd: 262

This type of query reveals basic information about a domain name, such as the name servers and the mail servers. We can gather more information by asking a specific question (e.g., “What is the address of the web site?”):

$ dig www.oreilly.com
   
;; QUESTION SECTION:
;www.oreilly.com.               IN      A
   
;; ANSWER SECTION:
www.oreilly.com.        20269   IN      A       208.201.239.36
www.oreilly.com.        20269   IN      A       208.201.239.37

The dig tool converts IP addresses into names when the -x option is used:

$ dig -x 208.201.239.36
   
;; QUESTION SECTION:
;36.239.201.208.in-addr.arpa.   IN      PTR
   
;; ANSWER SECTION:
36.239.201.208.in-addr.arpa. 86381 IN   PTR     www.oreillynet.com.

You can see that this reverse query of the IP address from looking up the domain name oreilly.com gave us a whole new domain name.

A zone transfer is a service where all the information about a particular domain name is transferred from a domain name server. Such services are handy because of the wealth of information they provide. For the same reason, the access to a zone transfer service is often restricted. Zone transfers are generally not used for normal DNS operation, so requests for zone transfers are sometimes logged and treated as signs of preparation for intrusion.

Note

If you have an address range, you can gather information similar to that of a zone transfer by performing a reverse lookup on every individual IP address.

Regional Internet Registries

You have probably discovered several IP addresses by now. IP addresses are not sold; they are assigned to organizations by bodies known as Regional Internet Registries (RIRs). The information kept by RIRs is publicly available. Four registries cover address allocation across the globe:

APNIC: Asia-Pacific Network Information Center (http://www.apnic.net)
ARIN: American Registry for Internet Numbers (http://www.arin.net)
LACNIC: Latin American and Caribbean Internet Address Registry (http://www.lacnic.net)
RIPE NCC: RIPE Network Coordination Centre (http://www.ripe.net)

Registries do not work with end users directly. Instead, they delegate large blocks of addresses to providers, who delegate smaller chunks further. In effect, an address can be assigned to multiple parties. In theory, every IP address should be associated with the organization using it. In real life, Internet providers may not update the IP address database. The best you can do is to determine the connectivity provider of an organization.

IP assignment data can be retrieved from any active whois server, and different servers can give different results. In the case below, I just guessed that whois.sonic.net exists. This is what we get for one of O’Reilly’s IP addresses:

$ whois -h whois.sonic.net 209.204.146.21
[Querying whois.sonic.net]
[whois.sonic.net]
You asked for 209.204.146.21
network:Class-Name:network
network:Auth-Area:127.0.0.1/32
network:ID:NETBLK-SONIC-209-204-146-0.127.0.0.1/32
network:Handle:NETBLK-SONIC-209-204-146-0
network:Network-Name:SONIC-209-204-146-0
network:IP-Network:209.204.146.0/24
network:IP-Network-Block:209.204.146.0 - 209.204.146.255
network:Org-Name:John Irwin
network:Email:ora@sonic.net
network:Tech-Contact;Role:SACC-ORA-SONIC.127.0.0.1/32
   
network:Class-Name:network
network:Auth-Area:127.0.0.1/32
network:ID:NETBLK-SONIC-209-204-128-0.127.0.0.1/32
network:Handle:NETBLK-SONIC-209-204-128-0
network:Network-Name:SONIC-209-204-128-0
network:IP-Network:209.204.128.0/18
network:IP-Network-Block:209.204.128.0 - 209.204.191.255
network:Org-Name:Sonic Hostmaster
network:Email:ipowner@sonic.net
network:Tech-Contact;Role:SACC-IPOWNER-SONIC.127.0.0.1/32

Search engines

Search engines have become a real resource when it comes to information gathering. This is especially true for Google, which has exposed its functionality through an easy-to-use programming interface. Search engines can help you find:

Publicly available information on a web site or information that was available before.
Information that is not intended for public consumption but that is nevertheless available unprotected (and the search engine picked it up).
Posts from employees to newsgroups and mailing lists. Post headers reveal information about the infrastructure. Even message content can reveal bits about the infrastructure. If you find a member of the development team asking questions about a particular database engine, chances are that engine is used in-house.
Links to other organizations, possibly those that have done work for the organization being targeted.

Look at some example Google queries. If you want to find a list of PDF documents available on a site, type a Google search query such as the following:

site:www.modsecurity.org filetype:pdf

To see if a site contains Apache directory listings, type something like this:

site:www.modsecurity.org intitle:"Index of /" "Parent Directory"

To see if it contains any WS_FTP log files, type something like this:

site:www.modsecurity.org inurl:ws_ftp.log

Anyone can register with Google and receive a key that will support up to 1,000 automated searches per day. To learn more about Google APIs, see the following:

Google Web APIs (http://www.google.com/apis/)
Google Web API Reference (http://www.google.com/apis/reference.html)

$ traceroute www.oreilly.com
traceroute: Warning: www.oreilly.com has multiple addresses; using 208.201.
                                                                   239.36
traceroute to www.oreilly.com (208.201.239.36), 30 hops max, 38 byte packets
 1    gw-prtr-44-a.schlund.net (217.160.182.253)  0.238 ms
 2    v999.gw-dist-a.bs.ka.schlund.net (212.227.125.253)  0.373 ms
 3    ge-41.gw-backbone-b.bs.ka.schlund.net (212.227.116.232)  0.535 ms
 4    pos-80.gw-backbone-b.ffm.schlund.net (212.227.112.127)  3.210 ms
 5    cr02.frf02.pccwbtn.net (80.81.192.50)  4.363 ms
 6    pos3-0.cr02.sjo01.pccwbtn.net (63.218.6.66)  195.201 ms
 7    layer42.ge4-0.4.cr02.sjo01.pccwbtn.net (63.218.7.6)  187.701 ms
 8    2.fast0-1.gw.equinix-sj.sonic.net (64.142.0.21)  185.405 ms
 9    fast5-0-0.border.sr.sonic.net (64.142.0.13)  191.517 ms
10    eth1.dist1-1.sr.sonic.net (208.201.224.30)  192.652 ms
11    www.oreillynet.com (208.201.239.36)  190.662 ms

The traceroute output shows the route packets use to travel from your location to the target’s location. The last few lines matter; the last line is the server. On line 10, we see what is most likely a router, connecting the network to the Internet.

Note

traceroute relies on the ICMP protocol to discover the path packets use to travel from one point to another, but ICMP packets can be filtered for security reasons. An alternative tool, tcptraceroute (http://michael.toren.net/code/tcptraceroute/) performs a similar function but uses other methods. Try tcptraceroute if traceroute does not produce results.

Port scanning

Port scanning is an active information-gathering technique. It is viewed as impolite and legally dubious. You should only perform port scanning against your own network or where you have written permission from the target.

The purpose of port scanning is to discover active network devices on a given range of addresses and to analyze each device to discover public services. In the context of web security assessment, you will want to know if a publicly accessible FTP or a database engine is running on the same server. If there is, you may be able to use it as part of your assessment.

Note

Services often run unprotected and with default passwords. I once discovered a MySQL server on the same machine as the web server, running with the default root password (which is an empty string). Anyone could have accessed the company’s data and not bother with the web application.

The process of the discovery of active hosts is called a ping sweep. An attempt is made to ping each IP address and live addresses are reported. Here is a sample run, in which XXX.XXX.XXX.112/28 represents the IP address you would type:

# nmap -sP 
XXX.XXX.XXX.112/28

Starting nmap 3.48 ( http://www.insecure.org/nmap/ )
   
Host (XXX.XXX.XXX.112) seems to be a subnet broadcast address (returned 1
extra pings).
Host (XXX.XXX.XXX.114) appears to be up.
Host (XXX.XXX.XXX.117) appears to be up.
Host (XXX.XXX.XXX.120) appears to be up.
Host (XXX.XXX.XXX.122) appears to be up.
Host (XXX.XXX.XXX.125) appears to be up.
Host (XXX.XXX.XXX.126) appears to be up.
Host (XXX.XXX.XXX.127) seems to be a subnet broadcast address (returned 1
extra pings).
   
Nmap run completed -- 16 IP addresses (6 hosts up) scanned in 7 seconds

After that, you can proceed to get more information from individual hosts by looking at their TCP ports for active services. The following is sample output from scanning a single host. I have used one of my servers since scanning one of O’Reilly’s servers without a permit would have been inappropriate.

# nmap -sS 
XXX.XXX.XXX.XXX

Starting nmap 3.48 ( http://www.insecure.org/nmap/ )
   
The SYN Stealth Scan took 144 seconds to scan 1657 ports.
Interesting ports on XXX.XXX.XXX.XXX:
(The 1644 ports scanned but not shown below are in state: closed)
PORT     STATE SERVICE
21/tcp   open  ftp
22/tcp   open  ssh
23/tcp   open  telnet
25/tcp   open  smtp
53/tcp   open  domain
80/tcp   open  http
110/tcp  open  pop-3
143/tcp  open  imap
443/tcp  open  https
993/tcp  open  imaps
995/tcp  open  pop3s
3306/tcp open  mysql
8080/tcp open  http-proxy
   
Nmap run completed -- 1 IP address (1 host up) scanned in 157.022 seconds

You can go further if you use Nmap with a -sV switch, in which case it will connect to the ports you specify and attempt to identify the services running on them. In the following example, you can see the results of service analysis when I run Nmap against ports 21, 80, and 8080. It uses the Server header field to identify web servers, which is the reason it incorrectly identified the Apache running on port 80 as a Microsoft Internet Information Server. (I configured my server with a fake server name, as described in Chapter 2, where HTTP fingerprinting for discovering real web server identities is discussed.)

# nmap -sV 
XXX.XXX.XXX.XXX
 -P0 -p 21,80,8080
Starting nmap 3.48 ( http://www.insecure.org/nmap/ )
   
Interesting ports on XXX.XXX.XXX.XXX:
PORT     STATE SERVICE VERSION
21/tcp   open  ftp     ProFTPD 1.2.9
80/tcp   open  http    Microsoft IIS webserver 5.0
8080/tcp open  http    Apache httpd 2.0.49 ((Unix) DAV/2 PHP/4.3.4)
   
Nmap run completed -- 1 IP address (1 host up) scanned in 22.065 seconds

Note

Another well-known tool for service identification is Amap (http://www.thc.org/releases.php). Try it if Nmap does not come back with satisfactory results.

Scanning results will usually fall into one of three categories:

No firewall: Where there is no firewall in place, you will often find many unrestricted services running on the server. This indicates a server that is not taken care of properly. This is the case with many managed dedicated servers.
Limited firewall: A moderate-strength firewall is in place, allowing access to public services (e.g., http) but protecting private services (e.g., ssh). This often means whoever maintains the server communicates with the server from a static IP address. This type of firewall uses an “allow by default, deny what is sensitive“ approach.
Tight firewall: In addition to protecting nonpublic services, a tight firewall configuration will restrict ICMP (ping) traffic, restrict outbound traffic, and only accept related incoming traffic. This type of firewall uses a “deny by default, allow what is acceptable” approach.

If scan results fall into the first or the second category, the server is probably not being closely monitored. The third option shows the presence of people who know what they are doing; additional security measures may be in place.

Web Server Analysis

This is where the real fun begins. At a minimum, you need the following tools:

A browser to access the web server
A way to construct and send custom requests, possibly through SSL
A web security assessment proxy to monitor and change traffic

Optionally, you may choose to perform an assessment through one or more open proxies (by chaining). This makes the test more realistic, but it may disclose sensitive information to others (whoever controls the proxy), so be careful.

Note

If you do choose to go with a proxy, note that special page objects such as Flash animations and Java applets often choose to communicate directly with the server, thus revealing your real IP address.

We will take these steps:

Test SSL.
Identify the web server.
Identify the application server.
Examine default locations.
Probe for common configuration problems.
Examine responses to exceptions.
Probe for known vulnerabilities.
Enumerate applications.

Testing SSL

I have put SSL tests first because, logically, SSL is the first layer of security you encounter. Also, in some rare cases you will encounter a target that requires use of a privately issued client certificate. In such cases, you are unlikely to progress further until you acquire a client certificate. However, you should still attempt to trick the server to give you access without a valid client certificate.

Attempt to access the server using any kind of client certificate (even a certificate you created will do). If that fails, try to access the server using a proper certificate signed by a well-known CA. On a misconfigured SSL server, such a certificate will pass the authentication phase and allow access to the application. (The server is only supposed to accept privately issued certificates.) Sometimes using a valid certificate with a subject admin or Administrator may get you inside (without a password).

Whether or not a client certificate is required, perform the following tests:

Version 2 of the SSL protocol is known to suffer from a few security problems. Unless there is a good reason to support older SSLv2 clients, the web server should be configured to accept only SSLv3 or TLSv1 connections. To check this, use the OpenSSL client, as demonstrated in Chapter 4, adding the -no_ssl3 and -no_tls1 switches.
A default Apache SSL configuration will allow various ciphers to be used to secure the connection. Many ciphers are not considered secure any more. They are there only for backward compatibility. The OpenSSL s_client tool can be used for this purpose, but an easier way exists. The Foundstone utility SSLDigger (described in the Appendix A) will perform many tests attempting to establish SSL connections using ciphers of different strength. It comes with a well-written whitepaper that describes the tool’s function.
Programmers sometimes redirect users to the SSL portion of the web site from the login page only and do not bother to check at other entry points. Consequently, you may be able to bypass SSL and use the site without it by directly typing the URL of a page.

Identifying the web server

After SSL testing (if any), attempt to identify the web server. Start by typing a Telnet command such as the following, substituting the appropriate web site name:

$ telnet www.modsecurity.org 80
Trying 217.160.182.153...
Connected to www.modsecurity.org.
Escape character is '^]'.
OPTIONS / HTTP/1.0
Host: www.modsecurity.org
   
HTTP/1.1 200 OK
Date: Tue, 08 Jun 2004 10:54:52 GMT
Server: Microsoft-IIS/5.0
Content-Length: 0
Allow: GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, PATCH, PROPFIND,
PROPPATCH, MKCOL, COPY, MOVE, LOCK, UNLOCK, TRACE

We learn two things from this output:

The web server supports WebDAV. You can see this by the appearance of the WebDAV specific methods, such as PATCH and PROPFIND, in the Allow response header. This is an indication that we should perform more WebDAV research.
The Server signature tells us the site is running the Microsoft Internet Information Server. Suppose you find this unlikely (having in mind the nature of the site and its pro-Unix orientation). You can use Netcraft’s “What’s this site running?” service (at http://uptime.netcraft.co.uk and described in the Appendix A) and access the historical data if available. In this case, Netcraft will reveal the site is running on Linux and Apache, and that the server signature is “Apache/1.3.27 (Unix) (Red-Hat/Linux) PHP/4.2.2 mod_ssl/2.8.12 openSSL/0.9.6b“ (as of August 2003).

We turn to httprint for the confirmation of the signature:

$ httprint -P0 -h www.modsecurity.org -s signatures.txt
httprint v0.202 (beta) - web server fingerprinting tool
(c) 2003,2004 net-square solutions pvt. ltd. - see readme.txt
http://net-square.com/httprint/
httprint@net-square.com
   
--------------------------------------------------
Finger Printing on http://www.modsecurity.org:80/
Derived Signature:
Microsoft-IIS/5.0
9E431BC86ED3C295811C9DC5811C9DC5050C5D32505FCFE84276E4BB811C9DC5
0D7645B5811C9DC5811C9DC5CD37187C11DDC7D7811C9DC5811C9DC58A91CF57
FCCC535BE2CE6923FCCC535B811C9DC5E2CE69272576B769E2CE69269E431BC8
6ED3C295E2CE69262A200B4C6ED3C2956ED3C2956ED3C2956ED3C295E2CE6923
E2CE69236ED3C295811C9DC5E2CE6927E2CE6923
   
Banner Reported: Microsoft-IIS/5.0
Banner Deduced: Apache/1.3.27
Score: 140
Confidence: 84.34

This confirms the version of the web server that was reported by Netcraft. The confirmation shows the web server had not been upgraded since October 2003, so the chances of web server modules having been upgraded are slim. This is good information to have.

This complete signature gives us many things to work with. From here we can go and examine known vulnerabilities for Apache, PHP, mod_ssl, and OpenSSL. The OpenSSL version (reported by Netcraft as 0.9.6b) looks very old. According to the OpenSSL web site, Version 0.9.6b was released in July 2001. Many serious OpenSSL vulnerabilities have been made public since that time.

A natural way forward from here would be to explore those vulnerabilities further. In this case, however, that would be a waste of time because the version of OpenSSL running on the server is not vulnerable to current attacks. Vendors often create custom branches of software applications that they include in their operating systems. After the split, the included applications are maintained internally, and the version numbers rarely change. When a security problem is discovered, vendors perform what is called a backport: the patch is ported from the current software version (maintained by the original application developers) back to the older release. This only results in a change of the packaging version number, which is typically only visible from the inside. Since there is no way of knowing this from the outside, the only thing to do is to go ahead and check for potential vulnerabilities.

Identifying the application server

We now know the site likely uses PHP because PHP used to appear in the web server signature. We can confirm our assumption by browsing and looking for a nonstatic part of the site. Pages with the extension .php are likely to be PHP scripts.

Some sites can attempt to hide the technology by hiding extensions. For example, they may associate the extension .html with PHP, making all pages dynamic. Or, if the site is running on a Windows server, associating the extension .asp with PHP may make the application look as if it was implemented in ASP.

Note

Attempts to increase security in this way are not likely to succeed. If you look closely, determining the technology behind a web site is easy. For system administrators it makes more sense to invest their time where it really matters.

Suppose you are not sure what technology is used at a web site. For example, suppose the extension for a file is .asp but you think that ASP is not used. The HTTP response may reveal the truth:

$ telnet www.modsecurity.org 80
Trying 217.160.182.153...
Connected to www.modsecurity.org.
Escape character is '^]'.
HEAD /index.asp HTTP/1.0
Host: www.modsecurity.org
   
HTTP/1.1 200 OK
Date: Tue, 24 Aug 2004 13:54:11 GMT
Server: Microsoft-IIS/5.0
X-Powered-By: PHP/4.3.3-dev
Set-Cookie: PHPSESSID=9d3e167d46dd3ebd81ca12641d82106d; path=/
Connection: close
Content-Type: text/html

There are two clues in the response that tell you this is a PHP-based site. First, the X-Powered-By header includes the PHP version. Second, the site sends a cookie (the Set-Cookie header) whose name is PHP-specific.

Don’t forget a site can utilize more than one technology. For example, CGI scripts are often used even when there is a better technology (such as PHP) available. Examine all parts of the site to discover the technologies used.

Examining default locations

A search for default locations can yield significant rewards:

Finding files present where you expect them to be present will reinforce your judgment about the identity of the server and the application server.
Default installations can contain vulnerable scripts or files that reveal information about the target.
Management interfaces are often left unprotected, or protected with a default username/password combination.

For Apache, here are the common pages to try to locate:

/server-status
/server-info
/mod_gzip_status
/manual
/icons
~root/
~nobody/

Probing for common configuration problems

Test to see if proxy operations are allowed in the web server. A running proxy service that allows anyone to use it without restriction (a so-called open proxy) represents a big configuration error. To test, connect to the target web server and request a page from a totally different web server. In proxy mode, you are allowed to enter a full hostname in the request (otherwise, hostnames go into the Host header):

$ telnet www.example.com 80
Connected to www.example.com.
Escape character is '^]'.
HEAD http://www.google.com:80/ HTTP/1.0
   
HTTP/1.1 302 Found
Date: Thu, 11 Nov 2004 14:10:14 GMT
Server: GWS/2.1
Location: http://www.google.de/
Content-Type: text/html; charset=ISO-8859-1
Via: 1.0 www.google.com
Connection: close
   
Connection closed by foreign host.

If the request succeeds (you get a response, like the response from Google in the example above), you have encountered an open proxy. If you get a 403 response, that could mean the proxy is active but configured not to accept requests from your IP address (which is good). Getting anything else as a response probably means the proxy code is not active. (Web servers sometimes simply respond with a status code 200 and return their default home page.)

The other way to use a proxy is through a CONNECT method, which is designed to handle any type of TCP/IP connection, not just HTTP. This is an example of a successful proxy connection using this method:

$ telnet www.example.com 80
Connected to www.example.com.
Escape character is '^]'.
CONNECT www.google.com:80 HTTP/1.0
   
HTTP/1.0 200 Connection Established
Proxy-agent: Apache/2.0.49 (Unix)
   
HEAD / HTTP/1.0
Host: www.google.com
   
HTTP/1.0 302 Found
Location: http://www.google.de/
Content-Type: text/html
Server: GWS/2.1
Content-Length: 214
Date: Thu, 11 Nov 2004 14:15:22 GMT
Connection: Keep-Alive
   
Connection closed by foreign host.

In the first part of the request, you send a CONNECT line telling the proxy server where you want to go. If the CONNECT method is allowed, you can continue typing. Everything you type from this point on goes directly to the target server. Having access to a proxy that is also part of an internal network opens up interesting possibilities. Internal networks usually use nonroutable private space that cannot be reached from the outside. But the proxy, because it is sitting on two addresses simultaneously, can be used as a gateway. Suppose you know that the IP address of a database server is 192.168.0.99. (For example, you may have found this information in an application library file through file disclosure.) There is no way to reach this database server directly but if you ask the proxy nicely it may respond:

$ telnet www.example.com 80
Connected to www.example.com.
Escape character is '^]'.
CONNECT 192.168.0.99:3306 HTTP/1.0
   
HTTP/1.0 200 Connection Established
Proxy-agent: Apache/2.0.49 (Unix)

If you think a proxy is there but configured not to respond to your IP address, make a note of it. This is one of those things whose exploitation can be attempted later, for example after a successful entry to a machine that holds an IP address internal to the organization.

The presence of WebDAV may allow file enumeration. You can test this using the WebDAV protocol directly (see Chapter 10) or with a WebDAV client. Cadaver (http://www.webdav.org/cadaver/) is one such client. You should also attempt to upload a file using a PUT method. On a web server that supports it, you may be able to upload and execute a script.

Another frequent configuration problem is the unrestricted availability of web server access logs. The logs, when available, can reveal direct links to other interesting (possibly also unprotected) server resources. Here are some folder names you should try:

/logs
/stats
/weblogs
/webstats

Examining responses to exceptional requests

For your review, you need to be able to differentiate between normal responses and exceptions when they are coming from the web server you are investigating. To do this, make several obviously incorrect requests at the beginning of the review and watch for the following:

Is the server responding with HTTP status 404 when pages are not found, as expected?
Is an IDS present? Simulate a few attacks against arbitrary scripts and see what happens. See if there might be a device that monitors the traffic and interferes upon attack detection.

Some applications respond to errors with HTTP status 200 as they would for successful requests, rather than following the HTTP standard of returning suitable status codes (such as status 404 when a page is not found). They do this in error or in an attempt to confuse automated vulnerability scanners. Authors of vulnerability scanners know about this trick, but it is still used. Having HTTP status 200 returned in response to errors will slow down any programmatic analysis of the web site but not much. Instead of using the response status code to detect problems, you will have to detect problems from the text embedded in the response page.

Examine the error messages produced by the application (even though we have not reached application analysis yet). If the application gives out overly verbose error messages, note this problem. Then proceed to use this flaw for information discovery later in the test.

Probing for known vulnerabilities

If there is sufficient information about the web server and the application server and there is reason to suspect the site is not running the latest version of either, an attacker will try to exploit the vulnerabilities. Vulnerabilities fall into one of the following three categories:

Easy to exploit vulnerabilities, often web-based
Vulnerabilities for which ready-made exploits are available
Vulnerabilities for which exploits are not yet released

Attackers are likely to attempt exploitation in cases 1 and 2. Exploitation through case 3 is possible in theory, but it requires much effort and determination by the attacker. Run up-to-date software to prevent the exploitation of valuable targets.

If you have reason to believe a system is vulnerable to a known vulnerability, you should attempt to compromise it. A successful exploitation of a vulnerability is what black-box assessment is all about. However, that can sometimes be dangerous and may lead to interrupted services, server crashing, or even data loss, so exercise good judgment to stop short of causing damage.

Enumerating applications

The last step in web server analysis is to enumerate installed applications. Frequently, there will be only one. Public web sites sometimes have several applications, one for the main content, another for forums, a third for a web log, and so on. Each application is an attack vector that must be analyzed. If you discover that a site uses a well-known application, you should look for its known vulnerabilities (for example, by visiting http://www.securityfocus.com/bid or http://www.secunia.com). If the application has not been patched recently there may be vulnerabilities that can be exploited.

The web application analysis steps should be repeated for every identified application.

Assessing the execution environment

Depending on the assessment you are performing, you may be able to execute processes on the server from the beginning (if you are pretending to be a shared hosting customer, for example). Even if such a privilege is not given to you, a successful exploitation of an application weakness may still provide you with this ability. If you can do this, one of the mandatory assessment steps would be to assess the execution environment:

Use a tool such as env_audit (see Chapter 6) to search for process information leaks.
Search the filesystem to locate executable binaries, files and directories you can read and write.

Web Application Analysis

If the source of the web application you are assessing is commonly available, then download it for review. (You can install it later if you determine there is a reason to practice attacking it.) Try to find the exact version used at the target site. Then proceed with the following:

Learn about the application architecture.
Discover how session management is implemented.
Examine the access control mechanisms.
Learn about the way the application interacts with other components.
Read through the source code (if available) for vulnerabilities.
Research whether there are any known vulnerabilities.

The remainder of this section continues with the review under the assumption the source code is unavailable. The principle is the same, except that with the source code you will have much more information to work with.

Using a spider to map out the application structure

Map out the entire application structure. A good approach is to use a spider to crawl the site automatically and review the results manually to fill in the blanks. Many spiders do not handle the use of the HTML <base> tag properly. If the site uses it, you will be likely to do most of the work manually.

As you are traversing the application, you should note response headers and cookies used by the application. Whenever you discover a page that is a part of a process (for example, a checkout process in an e-commerce application), write the information down. Those pages are candidates for tests against process state management weaknesses.

Examining page elements

Look into the source code of every page (here I mean the HTML source code and not the source of the script that generated it), examining JavaScript code and HTML comments. Developers often create a single JavaScript library file and use it for all application modules. It may happen that you get a lot of JavaScript code covering the use of an administrative interface.

Enumerating pages with parameters

Enumerate pages that accept parameters. Forms are especially interesting because most of the application functionality resides in them. Give special attention to hidden form fields because applications often do not expect the values of such fields to change.

For each page, write down the following information:

Target URL
Method (GET/POST)
Encoding (usually application/x-www-form-urlencoded; sometimes multipart/form-data)
Parameters (their types and default values)
If authentication is required
If SSL is required
Notes

You should note all scripts that perform security-sensitive operations, for the following reasons:

File downloads performed through scripts (instead of directly by the web server) may be vulnerable to file disclosure problems.
Scripts that appear to be using page parameters to include files from disk are also candidates for file disclosure attacks.
User registration, login, and pages to handle forgotten passwords are sensitive areas where brute-force attacks may work.

Examining well-known locations

Attempt to access directories directly, hoping to get directory listings and discover new files. Use WebDAV directory listings if WebDAV is available.

If that fails, some of the well-known files may provide more information:

robots.txt (may contain links to hidden folders)
.bash_history
citydesk.xml (contains a list of all site files)
WS_FTP.LOG (contains a record of all FTP transfers)
WEB-INF/ (contains code that should never be accessed directly)
CVS/ (contains a list of files in the folder)
_mm/contribute.xml (Macromedia Contribute configuration)
_notes/<pagename>.mno (Macromedia Contribute file notes)
_baks (Macromedia Contribute backup files)

Mutate existing filenames, appending frequently used backup extensions and sometimes replacing the existing extension with one of the following:

~
.bak
.BAK
.old
.OLD
.prev
.swp (but with a dot in front of the filename)

Finally, attempting to download predictably named files and folders in every existing folder of the site may yield results. Some sample predictable names include:

phpinfo.php
p.php
test.php
secret/
test/
new/
old/

Attacks Against Access Control

You have collected enough information about the application to analyze three potentially vulnerable areas in every web application:

Session management: Session management mechanisms, especially those that are homemade, may be vulnerable to one of the many attacks described in Chapter 10. Session tokens should be examined and tested for randomness.
Authentication: The login page is possibly the most important page in an application, especially if the application is not open for public registration. One way to attack the authentication method is to look for script vulnerabilities as you would for any other page. Perhaps the login page is vulnerable to an SQL injection attack and you could craft a special request to bypass authentication. An alternative is to attempt a brute-force attack. Since HTTP is a stateless protocol, many web applications were not designed to detect multiple authentication failures, which makes them vulnerable to brute-force attacks. Though such attacks leave clearly visible tracks in the error logs, they often go unnoticed because logs are not regularly reviewed. It is trivial to write a custom script (using Perl, for example) to automate brute-force attacks, and most people do just that. You may be able to use a tool such as Hydra (http://thc.org/thc-hydra/) to do the same without any programming.
Authorization: The authorization subsystem can be tested once you authenticate with the application. The goal of the tests should be to find ways to perform actions that should be beyond your normal user privileges. The ability to do this is known under the term privilege escalation. For example, a frequent authorization problem occurs when a user’s unique identifier is used in a script as a parameter but the script does not check that the identifier belongs to the user who is executing the script. When you hear in the news of users being able to see other users’ banking details online, the cause was probably a problem of this type. This is known as horizontal privilege escalation. Vertical privilege escalation occurs when you are able to perform an action that can normally only be performed by a different class of user altogether. For example, some applications keep the information as to whether the user is a privileged user in a cookie. In such circumstances, any user can become a privileged user simply by forging the cookie.

Vulnerability Probing

The final step of black-box vulnerability testing requires the public interface of the application, parameterized pages, to be examined to prove (or disprove) they are susceptible to attacks.

If you have already found some known vulnerabilities, you will need to confirm them, so do that first. The rest of the work is a process of going through the list of all pages, fiddling with the parameters, attempting to break the scripts. There is no single straight path to take. You need to understand web application security well, think on your feet, and combine pieces of information to build toward an exploit.

This process is not covered in detail here. Practice using the material available in this chapter and in Chapter 10. You should follow the links provided throughout both chapters. You may want to try out two web application security learning environments (WebMaven and WebGoat) described in the Appendix A.

Here is a list of the vulnerabilities you may attempt to find in an application. All of these are described in Chapter 10, with the exception of DoS attacks, which are described in Chapter 5.

SQL injection attacks
XSS attacks
File disclosure flaws
Source code disclosure flaws
Misconfigured access control mechanisms
Application logic flaws
Command execution attacks
Code execution attacks
Session management attacks
Brute-force attacks
Technology-specific flaws
Buffer overflow attacks
Denial of service attacks

White-Box Testing

White-box testing is the complete opposite of what we have been doing. The goal of black-box testing was to rely only on your own resources and remain anonymous and unnoticed; here we can access anything anywhere (or so the theory goes).

The key to a successful white-box review is having direct contact and cooperation from developers and people in charge of system maintenance. Software documentation may be nonexistent, so you will need help from these people to understand the environment to the level required for the assessment.

To begin the review, you need the following:

Complete application documentation and the source code.
Direct access to application developers and system administrators. There is no need for them to be with you all the time; having their telephone numbers combined with a meeting or two will be sufficient.
Unrestricted access to the production server or to an exact system replica. You will need a working system to perform tests since looking at the code is not enough.

The process of white-box testing consists of the following steps:

Architecture review
Configuration review
Functional review

At the end of your white-box testing, you should have a review report that documents your methodology, contains review notes, lists notices, warnings, and errors, and offers recommendations for improvement.

Architecture Review

The purpose of the architecture review is to pave the way for the actions ahead. A good understanding of the application is essential for a successful review. You should examine the following:

Application security policy: If you are lucky, the application review will begin with a well-defined security policy in hand. If such a thing does not exist (which is common), you will have difficulties defining what “security” means. Where possible, a subproject should be branched out to create the application security policy. Unless you know what needs to be protected, it will not be possible to determine whether the system is secure enough. If a subproject is not a possibility, you will have to sketch a security policy using common sense. This security policy will suffer from being focused too much on technology, and based on your assumptions about the business (which may be incorrect). In any case, you will definitely need something to guide you through the rest of the review.
Application modules: Code review will be the subject of later review steps. At this point, we are only interested in major application modules. A typical example would be an application that consists of a public part and the administrative interfaces.
Libraries: Applications are built onto libraries that handle common tasks. It is these libraries that interact with the environment and should be the place to look for security problems.
Data: What kind of data is the application storing? How is it stored and where? Is the storage methodology secure enough for that type of data? Authentication information (such as passwords) should be treated as data, too. Here are some common questions: Are passwords stored in plaintext? What about credit card information? Such information should not be stored in plaintext and should not be stored with a method that would allow an attacker to decrypt it on the server.
Interaction with external systems: Which external systems does the application connect to? Most web applications connect to databases. Is the rule of least privilege used?

Further questions to ask yourself at this point are:

Is the application architecture prone to DoS attacks?
Is the application designed in such a way as to allow it to scale to support its users and processing demands?

Configuration Review

In a configuration review, you pay attention to the environment the application resides in. You need to ask yourself the following questions:

What: What operating system is the server running? What kind of protection does it have? What other services does it offer?
How: Is the server exclusively used for this application? Are many applications sharing the same server? Is it a shared hosting server managed by a third party?
Who: Who has access to the system and how? Shell access is the most dangerous because it gives great flexibility, but other types of access (FTP, CGI scripts) can become equally dangerous with effort and creativity.

Preparing a storage area for review files

To begin your configuration review, create a temporary folder somewhere to store the files you will create during the review, as well as the relevant files you will copy from the application. We assume the path /home/review is correct.

Note

Always preserve the file path when making copies. For example, if you want to preserve /etc/passwd, copy it to the location /home/review/etc/passwd.

As you are making copies ensure you do not copy some of the sensitive data. For example, you do not want to make a copy of the server’s private key. If configuration files contain passwords, you should replace them with a note.

There can always be exceptions. If you have a good reason to make a copy of a sensitive file, go ahead and do it. Review results are likely to be classified as sensitive data, too.

Preparing a file listing and initial notes

Armed with the knowledge of how the application works (or how it should work), we go to the filesystem to assess the configuration. This part of the review starts by creating a record of all files that are part of the application. I find it useful to have a folder tree at the beginning followed by the detailed listing of all files:

# find /home/application/ -type d | sort > /home/review/filelist.txt
# echo >> /home/review/filelist.txt
# ls -albR /home/application >> /home/review/filelist.txt

In the example above, I have assumed the application sits in the /home/application folder. Ideally, all application files will reside within a single folder. If they do not, the review should include all relevant folders. For now we assume we have everything listed in the file filelist.txt.

Continue to use the same file for your notes. It is convenient to have everything in one place. You will need at least two console windows and a browser window to test assumptions you make during the review. In your notes, include the following:

Name of the application and a short description of its purpose
Details about the environment (e.g., the name of the server and whether it is a production server, a development server, or a demo setup for the review)
Your name and email address
Possibly a phone number
Description of the activity (e.g., “Routine web security review“)

Reviewing the web server configuration

Make a copy of the web server configuration files first. Then examine the relevant parts of the configuration, making notes as you go. Remember to include the .htaccess files in the review (if used). Record the following information:

Hostnames and web server ports
Web server document root folder(s) and aliases
Extension-based mappings, folders where CGI scripts are allowed to run, and script aliases
Parts of the site that are password-protected
Situations in which access control is based on file or folder names (e.g., “.htaccess files cannot be downloaded“)
Situations in which access control is based on client IP address or hostname (e.g., “Access to the administrative interface is allowed only from UK offices“)

In most cases, you can copy the server configuration and add your notes to it. Remember your audience will include people who do not know how to configure Apache, so your notes should translate the configuration for them.

Creating a comprehensive checklist of things to look for in web server configuration is difficult. The approach most likely to succeed is to compare the documented requirements (if they exist) with the actual configuration to find flaws. Ask yourself if the web server is configured to mitigate DoS attacks (see Chapter 5).

Reviewing the application configuration

Applications typically have their own configuration files. You need to know where such files are stored and familiarize yourself with the options. Make copies of the files for record-keeping purposes.

Note

Some applications keep their configuration, or parts of the configuration, in a database. If you find this is the case, you need to dump the configuration part of a database into a file and store the dump as a record.

You will probably be interested in options related to logging and access control. Applications often need their own password to access other parts of the system (e.g., a database), and you should note how those passwords are stored. If the application supports a debugging mode, you need to examine if it is used and how.

Examine how a connection to the database is made. You do not want to see:

A connection based on trust (e.g., “accept all connections from localhost“). This would mean that any local user could gain access to the database.
A connection made with a root account. This account will typically have full access to the database system.

The web application should have minimal database privileges. It is acceptable for an application to use one account to access a database and have full privileges over it. It is not acceptable to be able to access more than one database (think about containment). The application privileges should be further restricted wherever possible (e.g., do not allow the account to drop tables, or give it read-only access to parts of the database).

The same concept (“least privilege used”) applies to connections to other types of systems, for example LDAP.

Reviewing file permissions

When reviewing file permissions, we are interested in deviations from the default permissions, which are defined as follows:

Application files are owned by the application user (for example, appuser) and the application group (for example appgrp). The account and the group are not used for other purposes, which also means that no other users should be members of the application group.
Write access is not allowed.
Other users and groups have no access to application files.
As an exception, the web server user is allowed read access for files and is allowed read and execute access for CGI scripts (see Chapter 6).

We examine the potential for information leakage first, by understanding who is allowed read access to application files. If read access is discovered and it cannot be justified, the discovery is marked as an error. We automate the search using the find utility.

Examine if any suid or guid files are present. Such files allow binaries to run as their owner (typically root) and not as the user who is executing them. Their presence (though unlikely) may be very dangerous, so it is worth checking for them:

# find /home/application -type f -and \( -perm -4000 -or -perm -2000 \) | 
xargs ls -adl

The following finds world-readable files, where any system user can read the files and folders:

# find /home/application -perm -4 | xargs ls -adl

The following finds files owned by users other than the application user:

# find /home/application ! -user appuser | xargs ls -adl

The following finds group-readable files, where the group is not the application group:

# find /home/application -perm -40 ! -group appgrp | xargs ls -adl

Allowing users other than the application user write access opens a whole new attack vector and is, therefore, very dangerous. This is especially true for the web server user because it may be possible for an attacker to control the publicly available scripts to create a file under the application tree, leading to code execution compromise.

The following finds world-writable files:

# find /home/application -perm -2 | xargs ls -adl

The following finds files owned by users other than the application user. This includes files owned by the web server user.

# find /home/application ! -user appuser | xargs ls -adl

The following finds group-writable files, in which the group is not the application group (group-writable files are not necessary but there may be a good reason for their existence):

# find /home/application -perm -20 ! -group appgrp | xargs ls -adl

Reviewing the files

We now go through the file listing, trying to understand the purpose of each file and make a judgment as to whether it is in the right place and whether the permissions are configured properly. Here is advice regarding the different types of files:

Data: Datafiles should never be stored under the web server tree. No user other than the application user should have access to them.
Library files: Library files should never be kept under the web server tree either, but they are found there sometimes. This is relatively safe (but not ideal) provided the extension used is seen by the web server as that of a script. Otherwise, having such files under the web server tree is a configuration error. For example, some programmers use a .inc extension for PHP library files or a .class extension for individual PHP classes. These will probably not be recognized as PHP scripts.
Obscure files: This class covers temporary files placed under the web server for download, “special” folders that can be accessed by anyone who knows their names. Such files do not belong on a web site. Temporary files should be moved to the assessment storage area immediately. If there is a genuine need for functionality that does not exist (for example, secure download of certain files), a note should be made to implement the functionality securely.
Uploaded files: If file upload is allowed, the folder where writing is allowed should be configured not to allow script or code execution. Anything other than that is a code execution compromise waiting to happen.
Files that should not be there: All sorts of files end up under the web server tree. Archives, backup files created by editors, and temporary files are dangerous as they can leak system information.

At the end of this step, we go back to the file permission report and note as errors any assigned permissions that are not essential for the application to function properly.

Functional Review

The next step is to examine parts of the source code. A full source code review is expensive and often not economical (plus it requires very good understanding of programming and the technology used, an understanding only developers can have). To meet our own goals, we perform a limited review of the code:

Basic review to understand how the application works
Review of critical application components
Review of hot spots, the parts of the code most vulnerable to attacks

Basic application review

In basic application review, you browse through the source code, locate the libraries, and examine the general information flow. The main purpose of the review is to identify the application building blocks, and review them one by one.

Application infrastructure review

Web applications are typically built on top of infrastructure that is designed to handle common web-related tasks. This is the layer where many security issues are found. I say “typically” because the use of libraries is a best practice and not a mandatory activity. Badly designed applications will have the infrastructure tasks handled by the same code that provides the application functionality. It is a bad sign if you cannot identify the following basic building blocks:

Input validation: Input data should never be accessed directly. Individual bits of data should first be validated for type (“Is it a number?”) and meaning (“Birth dates set in the future are not valid”). It is generally accepted that the correct strategy to deal with input is to accept what you know is valid (as opposed to trying to filter out what you know is not).
Output escaping: To prevent XSS attacks, output should be properly escaped. The correct way to perform escaping depends on the context. In the case of HTML files, the metacharacters < (less than), > (greater than), & (ampersand), ’ (single quote), and “ (double quotes) should be replaced with their safe equivalents: <, >, &, ', and ", respectively. (Remember that an HTML file can contain other types of content, such as Javascript, and escaping rules can be different for them.)
Database interaction: Examine how database queries are constructed. The ideal way is through use of prepared statements. Constructing queries through string concatenation is easy to get wrong even if special care is taken.
External system interaction: Examine the interaction with systems other than databases. For example, in the case of LDAP, you want to see the LDAP query properly constructed to avoid the possibility of LDAP injection.
Session management: Examine the session management mechanisms for weaknesses (as described in Chapter 10).
Access control: Examine the code that performs access control. Does it make sense? You are looking to spot dumb mistakes here, such as storing information in cookies or performing authentication only at the gate, which lets those who know the layout of the application straight through.
Logging: The application should have an error log and an audit log. It should actively work to log relevant application events (e.g., users logging in, users logging out, users accessing documents). If, as recommended, you did black-box testing, you should look in the log files for your own traces. Learning how to catch yourself will help catch others.

Hot spot review

You should look for application hot spots by examining scripts that contain “dangerous” functions, which include those for:

File manipulation
Database interaction
Process execution
Access to input data

Some hot spots must be detected manually by using the application. For others, you can use the find and grep tools to search through the source code and tell you where the hot spots are.

First, create a grep pattern file, for example hotspots.txt, where each line contains a pattern that will match one function you would like to review. A list of patterns to look for related to external process invocation under PHP looks like this:

exec
passthru
proc_open
shell_exec
system
`
popen

Next, tell grep to search through all PHP files. If other extensions are also used, be sure to include extensions other than the .php one shown.

# find . -name "*.php" | xargs grep -n -f hotspots.txt

If you find too many false positives, create a file notspots.txt and fill it with negative patterns (I needed to exclude the pg_exec pattern, for example). Then use another grep process to filter out the negative patterns:

# find . -name "*.php" | xargs grep -n -f hotspots.txt | grep -v -f notspots.txt

After you find a set of patterns that works well, store it for use in future reviews.

Note

If you do not like working from a command line like this, another option is to use RATS (http://www.securesw.com/rats/), a tool for statistical source code analysis.

Gray-Box Testing

In the third and final phase of security assessment, the black-box testing procedures are executed again but this time using the knowledge acquired in the white-box testing phase. This is similar to the type of testing an attacker might do when he has access to the source code, but here you have a slight advantage because you know the layout of the files on disk, the configuration, and changes made to the original source code (if any). This time you are also allowed to have access to the target system while you are testing it from the outside. For example, you can look at the application logs to discover why some of your attacks are failing.

The gray-box testing phase is the time to confirm or deny the assumptions about vulnerabilities you made in the black-box phase. For example, maybe you thought Apache was vulnerable to a particular problem but you did not want to try to exploit it at that time. Looking at it from the inside, it is much easier and quicker to determine if your assumption was correct.