Linux4Dummies: Developing Secure Web Applications

Contents:

Sanitize browser input/
Don't put everything in the HTML directory
Avoid the shell
'hidden' fields aren't
Don't trust HTTP_REFERER
Use POST instead of GET
Validate on the server
Use taint checking
Avoid real directory and file names
Use absolute path and filenames
Specify the open mode
Log suspicious errors
Look beyond the web application
Recommended reading

Web application development is very different from other environments. Web browser and the nature of HTTP pose security pitfalls not found in traditional client-server applications.

Web developers must know how web servers and browsers interact, the nature of Internet communications, and the attacks web applications undergo on the Internet.

If you think your network is secured by a firewall and network intrusion detection system, think again. Security flaws in web applications easily bypass firewalls and other basic security measures. Many banks, military and e-commerce sites have learned that lesson. It's easy for a well-experienced software developer to unwittingly create a web application that allows outsiders access to files on the server, gather passwords and customer information, and even alter the application itself despite firewalls and other security measures.

This document presents common web application security problems. The examples used here relate to Unix, Perl and the CGI.pm library running under Unix (arguably the most widely deployed environment for web apps). While the examples are specific, the flaws and concepts described apply to all languages and platforms: ASP, PHP, Servlets, Cold Fusion and so on.

Sanitize browser input:

All input from web browsers, such as user data from HTML forms and cookies, must be stripped of special characters and HTML tags as described in the following CERT advisories:

CERT advisory 97.25.CGI: http://www.cert.org/advisories/CA-1997-25.html
CERT advisory CA-2000-02: http://www.cert.org/advisories/CA-2000-02.html

This is by far the most common vulnerability in web applications. Everything from directory traversal problems to cross-site scripting problems can usually be traced to the simple lack of proper stripping of user input.

There are two separate dangers with browser input data:

Input containing special characters such as ! and & could cause the web server to execute an operating system command or have other unexpected behavior.
User input stored on the server, such as comments posted to a web discussion program, could contain malicious HTML tags and scripts. When another user views the input, that user's web browser could execute the HTML and scripts.

Special characters in form input

Characters such as & > ! $ (sometimes called "meta characters") have special meaning to many operating systems. For example, both Unix and Windows interpret the symbol < ("less than") as meaning "read input from a file".

When HTML form input is used to open files, send mail or interact with the operating system in any way, malicious users can enter meta characters hoping they will be passed to the underlying operating system, possibly accessing files or executing commands.

"Invisible" characters are also a threat... especially the NULL character (ASCII zero) and end-of-line characters. Programs written in C use the NULL character to mark the end of strings. However, languages like Perl do not. Inserting a NULL character in HTML form input can cause strings to be terminated early or be unrecognized by simple pattern matching filters. This "poisoned NULL" attack can cause interesting security problems when Web applications are built using both C and other languages. See Phrack magazine Vol 9 Issue 55 http://www.insecure.org/news/P55-07.txt for an excellent description of the problem.

Malicious HTML in stored data

Some web guest book and discussion board software allow users to format their comments with HTML tags such as <B></B> and <FONT SIZE>. This is pretty, but storing arbitrary HTML also allows users to insert Javascript, Java and DHTML tags. When another user reads the comments posted, the code can be executed on their browser, as if your web site had created it. User-created HTML can contain malicious scripts, applet references and other techniques to access files, delete files, crash a user's computer or steal information.

Stored HTML tags don't threaten your web server security like metacharacters do. Instead, they threaten the users of your web application. The real danger is one of trust... an outsider is allowed to store potentially dangerous HTML on your server. When another user reads it, they and the security settings of their web browser trust that the HTML was created by your organization. A Java applet or insecure Javascript functions that normally would be stopped could be allowed to run in the user's browser because it came from a trusted source.

Solutions

The best practice is to strip unwanted characters, invisible characters and HTML tags from user input.

When stripping unwanted characters, the safest way is to check the input against a list of valid characters, not a list of invalid ones. Why? It's too difficult to determine all possible malicious characters... just when you think you've thought of them all, a cracker invents an unexpected attack like "poisoned null" characters described above. It's also easier to simply check input against a list of characters A-Z and 0-9.

All input should be sanitized, not just selected fields. All input can potentially percolate through to unexpected places. Even if you're certain a particular input field cannot cause problems now, it might become possible in future revisions of the application. Rather than try to guess what input could be dangerous, it's simpler and more effective to just sanitize all input immediately when received from the browser.

It's simple to strip unwanted characters from a string. However, input from web applications is not always plain ASCII or UTF-8. Characters in HTML form fields, cookies and CGI query strings can also be expressed as HTML Character Entities (seehttp://www.w3.org/TR/REC-html40/sgml/entities.html) For example, the symbol < ("less than") can be input using the HTML Entity code < or in numeric format as <.

Stripping ampersand (&) and semi-colon (;) characters from input will disable such sly attempts to bypass input filters, but if your web application requires those characters it is best to decode any HTML character entities in all input to their corresponding characters before stripping. The Perl module HTML::Entities will do that for you.

If your web application is written in Perl using the CGI.pm library, all form input fields can be sanitized at the beginning of the program with a routine similar to the following:

use HTML::Entities ();
use CGI qw/:standard/;
$ok_chars = 'a-zA-Z0-9 ,-';
foreach $param_name ( param() ) {
    $_ = HTML::Entities::decode( param($param_name) );
    $_ =~ s/[^$ok_chars]//go;
    param($param_name,$_);
}

The above converts HTML character entities to plain characters then silently removes all characters except the ones listed in $ok_chars from all HTML form input fields collected by CGI.pm. This method cripples shell metacharacters, the poisoned NULL attack, disables HTML tags and defeats attempts to hide metacharacters using HTML entities.

The above code snippet works well, but instead of silently stripping input a more thorough solution would be to raise an alarm for the system administrator. When invalid characters are detected, it would be better to alert a system administrator and log the error and IP of the user to a file. Then you would know when potential crack attempts are made and by whom.

When sanitizing input, keep in mind that user input is not just limited to HTML form fields. A malicious user can potentially altereverything a web browser normally sends. HTML cookies and HTTP headers such as REMOTE_USER can be altered easily. It's not difficult to write a Perl script that poses as a web browser and sends altered versions of form input, hidden fields and other data complete with forged cookies and HTTP headers. Never trust any input from a browser... all of it can be altered.

Note that the above example Perl code only removes characters from HTML form fields. A more thorough sanitization routine would also strip input from cookies and HTTP headers in a similar way.

Don't put everything in the HTML directory:

Most web applications use many types of files:

Static files: HTML headers, footers, JPEG images and other raw non-changing content.
Parsed HTML: static HTML mixed with executable code, such as PHP and ASP files.
Include files: Libraries and routines shared by parsed HTML and other files.
Data files: Database or flat text files written by the web application

Most beginning web developers dump all their files in the HTML directory... the "public_html" "htdocs" or "C:\inetpub\wwwroot" directory on the web server.

Web servers use a "permission by directory" model: files are treated according to the directory they are in. Every file in the HTML directory can be accessed by a web browser if the URL is known. Every file in the cgi-bin directory can be executed.

If your web applications write data files (such as a guest book application or message board), don't put the data files in the cgi-bin directory or html directory. Every file in cgi-bin can be executed, including data files. Every file in /public_html can be read by a browser. Never assuming an outside will never guess the name of a file in those directories, because eventually... they will.

For example, if your web app places a data file named "card_numbers.dat" in /public_html, any outsider who guesses the file name can view it's contents in their browser. Likewise, if library file "user_validation.pl" is placed in cgi-bin it can be run, even though the code in that file should only be read by other executable files.

Place data files is a directory outside of both /html and /cgi-bin. For example, create a /data directory outside the cgi-bin and public_html directories so the files cannot be executed or read from a browser. Likewise a create /include directory so library files and header code cannot be executed or read from a browser.

Here's an example layout of a web server home directory on Unix. Note separate sub-directories in /data to hold data files from each web app used on the site:

/cgi-bin
/public_html
/data
   /catalog
   /webforum
   /webmail
/include

Keeping all data files in a common location also makes it easier to manage the web site. You have separate locations for HTML files, executable programs, shared library code and data. Separating the data files into sub-directories by application helps eliminate file naming problems, such as two different apps that create data files named "data.txt"

Avoid using a shell:

If you program in Perl, use Perl's system() call to run external programs instead of backticks (` `), as mentioned in the WWW security FAQ (http://www.w3.org/Security/Faq/www-security-faq.html).

Backticks in Perl are a convenient way to run external programs, carried over from shell programming. However, backticks cause Perl to spawn a Unix shell (or Windows command interpreter on that platform) then execute the program. Shells have powerful capabilities... by spawning a shell your application gives control to another application that is not as security-conscious as your own application. Stray metacharacters from user input could misdirect the shell into doing something nasty.

When using system(), separate each parameter in the system() call with commas. For example:

system("command", "arg1", "arg2");

This avoids a shell being spawned. If you string the program name together with the parameters, system() spawns a shell.

Not only is using system() more secure than the backtick method, it's more efficient... by not spawning a shell your application uses fewer system resources and runs faster.

'hidden' fields aren't:

Many CGI programs rely on so-called "hidden" form fields to store state information, settings and previous input data. However, HTML "hidden" fields are not hidden and not secure. Users can see them simply by viewing the HTML source of your form in their browser. It's easy for a user to change "hidden" fields... they only have to same the HTML form to their computer, edit the HTML then re-submit the form.

Contents of hidden fields should be sanitized and validated just like any other user input field. Hidden fields should not be used to set access modes or privileges for a CGI program (such as an 'admin' mode or 'paying user' privilege) without also using some form of user validation, such as password and username access restrictions.

The problem is demonstrated in detail in our paper "Preventing HTML form tampering"

( http://fafadiatech.blogspot.com/2012/05/preventing-html-form-tampering.html ). A solution, complete with example Perl code is presented.

If you write web applications in Perl, check out the CGI::EncryptForm module by Peter Marelas (available from CPANhttp://www.cpan.org/). It's one way of encrypting hidden form field information to prevent users from changing hidden fields.

A better way of preserving state information and settings is to store data in a file or database on the server then use an HTTP cookie or unique URL ID to reference the file. This is more difficult to program, but important data stays on your server. The Perl module CGI::Persistent by Vipul Ved Prakash is one tool that makes this technique easier to implement.

Don't trust HTTP_REFERER:

Don't rely on the HTTP_REFERER CGI variable for authentication. Many CGI scripts rely on that variable to make sure the script is being called by itself or another trusted source. However, the user's web browser, not the web server, sends HTTP_REFERER. Like all browser input, it can be can easily be spoofed.

In addition, some browsers and users accessing your script from behind proxy servers and firewalls don't send an HTTP_REFERER variable at all.

For example, the privacy protection proxy server Junkbuster and many firewall proxies routinely strip HTTP_REFERER.

Use POST instead of GET:

HTML forms can be submitted using either GET or POST methods. POST is preferred, especially when sending sensitive information.

The GET method sends all form input to the web application as part of the URL. For example:

http://www.example.com/cgi-bin/cart.cgi?username=jsmith&password=happypup

When the web application is called using GET, the above input is visible on the browser's URL location window. However, a more dangerous problem is that URLs are logged in many places:

The web server access log
The web browser's disk cache and history file
In firewall logs
In proxy server and web cache logs such as Squid.

All this logging allows others to see the data sent from HTML forms using GET.

The POST method sends form input in a data stream, not part of the URL. The data is not visible in the browser location window and is not recorded in web server log files.

The POST method is also more practical... there's a limit to how many characters can be sent using the GET method, but POST can send an almost unlimited amount of data from an HTML form.

However, even though POST information is generally not logged, like all other plain text information sent from a browser it can still be sniffed as it passes across the Internet. However, sniffing must be done in real time as information is sent across the Internet and requires the attacker to have physical access to the data lines between the web browser and web server. The risk of information being sniffed is far less than the risk of information being gathered from log files.

Validate on the server

Developers know they must check that form input fields contain the correct data type, that required input is not missing and to perform other simple sanity checks. However, a disturbing trend recently has been to use Javascript or Java applets to validate user input.

In a traditional web application, HTML form fields are checked when the user submits the HTML form to the web server. If a mandatory field is blank or has the wrong type of input, the web application typically sends back an HTML page describing the errors and lets the user must submit the form again.

Client-side validation lets this work be done in by the browser. Javascript or Java applets can be used to check inputs before the form is submitted to the web application, and not allow the form information to be sent until all fields are correct. This saves time and processing by the web application. It moves the overhead of input validation from the web application to the browser.

This technique saves load on the web server, but the problem is that anything running on the client end can never be trusted. Javascript in an HTML form can be changed or disabled by the end user very easily. Java applets can be disabled by the end user, or even decompiled and re-written.

It's easy for a knowledgeable user to save an HTML form, disable the embedded Javascript, then use the modified form to submit bad data back to the web application. When the application expects all input validation to have already been done by the web browser, and therefore doesn't double check the input, your web app can be compromised.

For this reason web applications should always validate form inputs on the server even if they are also validated on the client. Client-side input can never be trusted. Client-side validation should be used as a complement to server-side validation... a means of catching simple input mistakes to reduce the number of times the web server has to validate input. Client-side validation should never be trusted as a replacement for additional server-side validation.

Client-side validation is valuable... for highly loaded web sites it offloads a lot of work from the server to the browsers. However if server load is not a factor, it's probably not worth the trouble of writing both client-side and server-side validation routines for HTML form input.

If your web application is written in Perl, turn on Perl's "taint checks". These attempt to ensure that data coming from outside the program cannot accidentally be used to affect something at the operating system level, such as an unchecked user input field being used to erase a file.

Perl taint checks can be turned on by adding the -T parameter to #! line of the program. For example:

#!/usr/bin/perl -T

Beware that Perl's taint check is far from foolproof. It is not a substitute for knowing web application security issues and writing secure programs. More information about Perl taint checking is found on the 'perlsec' man page (http://language.perl.com/info/documentation.html)

The principle used by Perl's taint checks can be used in other programming languages, but for other languages the check must be applied manually.

Avoid real directory and file names:

Never use actual directory or file names as parameters or construct names based on user input. Instead, use keywords that arepointers and store the actual file or directory names in a lookup table.

For example, in a Perl program do NOT do this:

WRONG!> $datafile = param('datafilename');
WRONG!> $open DATAFILE $datafile or die;

Instead, do something like this:

BETTER> my %filelist = ( "name" => "/home/data/name.txt";
BETTER> "address" => "/home/data/address.txt" );
BETTER> $keyword = param('datafilename');
BETTER> open DATAFILE $filelist($keyword) or die;

Using the above method, HTML form input is never passed directly to the 'open' command. If a malicious user tries to pass a bad value (such as '/etc/passwd'), it will fail to find a match in the associative array. Lookup tables also prevent a cracker from using a poisoned NULL attack (see "Sanitize browser input," above) to shorten strings.

For flexibility, the locations of the files can be pulled in from an external configuration file, rather than be hard-coded into the application. However, make sure the config file cannot be accessed by web users (ie. put it in a separate data directory outside the html directory. See "Use a data directory," above). If the web server is shared by many users, also ensure the data file can't be changed by other users on the server.

If your web application absolutely must be capable of opening ad hoc files based directly on browser input, rather than a predetermined list, never accept complete path and filenames from HTML form input fields. Your web application should at least prefix the input filename with an absolute path and strip slashes, backslashes, NULLs and sequential dots ( ".." and "..." ) from the input.

For example:

$datadir = '/sites/internet/data';
datafile = param('datafilename');
sanitize($datafile);
open DATAFILE $datadir . $datafile or die;

Use absolute path and filenames:

Never assume a web app is being executed from a particular directory on the server. When opening files, whether for reading or writing, always use a fully qualified absolute path and filename. For example, instead of referring to a file like this:

open '../../../data/mydata.txt'

Use a complete pathname such as:

open '/sites/internet/data/mydata.txt'

Specify the mode when opening a file:

When opening a file for reading only (such as a configuration file), specify that it should be opened read-only. Most programming languages allow modes to be specified when opening files: read-only, write, append, read/write etc. Always explicitly specify the mode, especially is a file should only be opened read-only. Do not rely on the default mode of the programming language… the default may change in future versions.

For example, The Perl 'open' command defaults to read-only mode if no specific mode has been specified:

open DATAFILE '/sites/internet/data/mydata.txt';

However, future versions of Perl may change this, or Perl running on an unusual platform could use a different default. It is better to be specific:

open DATAFILE '</sites/internet/data/mydata.txt';

Log suspicious errors:

All applications should be written to trap errors. However, web applications are frequently attacked by crackers. It's a good idea to not only trap and recover from errors, but also to log events that may indicate an attack:

Before opening a file for reading, a web application should check that the file exists and is readable. If the filename was constructed based on input from an HTML form, attempts to access a non-existent file or one it doesn't have privileges to read can indicate an attack.
If a web application was written to be called with the POST method, it should raise an alarm if called using methods such as GET or PUT. Some web apps written to work when called by one method will fail when called by others. Even if your app handles with all methods properly, many crackers try this form of attack. Detecting it will warn you someone is trying to crack your web application.
If certain HTML input fields must always be present in a form (such as hidden fields that store values), the application should raise an alarm if the form is submitted without all the required fields. Missing fields could indicate a cracker is calling the web app from a saved version of the HTML form or from a script in an attempt to break in.
For Unix-based web applications, any input containing '/etc/passwd' should set off alarm bells. For Windows applications, attempts to access registry files or SAM files should be monitored. Input with ".." is always suspicious, indicating someone is trying a relative path attack (described above).
Denial of service attacks (where a malicious user tries to overload a server or application) can be detected by watching for repeated access from the same IP address. Multiple requests per second from the same IP could indicate a denial of service attempt, or that a cracker is running an automated script attempting to break the application.

Look beyond the web application:

All the above recommendations are methods of locking the "barn door"... securing the way users normally access a web application.

However, securing the application itself is worthless if the web server itself has not been secured. Internet security begins with having secure servers and networks. That means the web server operating system and all software on it has the latest security patches installed, it sits behind a well-maintained firewall, and everything is monitored for break-in attempts.

Many organizations believe skilled application developers are also skilled at system and network management. However, that is extremely rare. Being an application developer is a full time career, as is being a system administrator, network manager and security professional.

Many of the well-publicized attacks on web sites targeted well-known operating system and network problems that were missed simply because the organization was focused on developing the web applications and design of the web site, or assumed their development team had all the answers. The "Code Red" worm that affected hundreds of thousand of Microsoft's IIS web server was able to spread largely because unskilled or overworked staff didn't patch a security hole in that product soon enough.

Securing web applications requires a combined effort in many areas: application design, server management, network management and security auditing. Professionals specializing in one of these areas require a good knowledge of all the others, but in reality each area is a specialty that requires the attention of a focused individual or team.

Linux4Dummies

Pages

Tuesday, May 8, 2012

Developing Secure Web Applications