Help Understanding This Code

Could someone explain how this code works?

         

MickeyRoush

4:24 am on Jul 19, 2011 (gmt 0)




This code is from a third-party Joomla plugin. I can tell from the comments that it's trying to protect the site from XSS, but I was hoping someone could explain how it works.
I'm really curious as to how $data is being used. Can someone please explain this?

Here is part of the code:

/**
* Cross site filtering (XSS). Recursive.
*
* @param string Data to be cleaned
* @return mixed
*/
public function xss_clean($data) {
// If its empty there is no point cleaning it :\
if (empty($data))
return $data;

// Recursive loop for arrays
if (is_array($data)) {
foreach ($data as $key => $value) {
$data[$key] = $this->xss_clean($value);
}

return $data;
}

// Fix &entity\n;
$data = str_replace(array('&amp;', '&lt;', '&gt;'), array('&amp;amp;', '&amp;lt;', '&amp;gt;'), $data);
$data = preg_replace('/(&#*\w+)[\x00-\x20]+;/u', '$1;', $data);
$data = preg_replace('/(&#x*[0-9A-F]+);*/iu', '$1;', $data);
$data = html_entity_decode($data, ENT_COMPAT, 'UTF-8');

// Remove any attribute starting with "on" or xmlns
$data = preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+>#iu', '$1>', $data);

// Remove javascript: and vbscript: protocols
$data = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#u', '$1=$2nomozbinding...', $data);

// Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviour[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#iu', '$1>', $data);

// Remove namespaced elements (we do not need them)
$data = preg_replace('#</*\w+:\w[^>]*+>#i', '', $data);

/* Remove eval() */
if ($this->pluginParams->get('check_eval', 1)) {
$data = str_ireplace('eval(', '', $data);
}

/* Remove base64_decode() */
if ($this->pluginParams->get('check_base64', 1)) {
$data = str_ireplace('base64_decode(', '', $data);
}

/* Remove UNION SQL command */
if ($this->pluginParams->get('check_sql', 1)) {
$data = str_ireplace('UNION', '', $data);
}

/* Remove CONCAT SQL command */
if ($this->pluginParams->get('check_sql', 1)) {
$data = str_ireplace('CONCAT', '', $data);
}

/* Remove SELECT * FROM SQL command */
if ($this->pluginParams->get('check_sql', 1)) {
$data = str_ireplace('SELECT * FROM', '', $data);
}

/* Remove jos_users to prevent SQL injections into users table */
if ($this->pluginParams->get('check_sql', 1)) {
$data = str_ireplace('jos_users', '', $data);
}

/* Remove 1=1-- SQL injection pattern */
if ($this->pluginParams->get('check_sql', 1)) {
$data = str_ireplace('1=1--', '', $data);
}

/* Remove a=a-- SQL injection pattern */
if ($this->pluginParams->get('check_sql', 1)) {
$data = str_ireplace('a=a--', '', $data);
}


do {
// Remove really unwanted tags
$old_data = $data;
$data = preg_replace('#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i', '', $data);
} while ($old_data !== $data);

return $data;
}

brotherhood of LAN

4:40 am on Jul 19, 2011 (gmt 0)




It looks like it's protecting and cleaning data. This was probably a smaller piece of code at first and has grown as new vulnerabilities became known.

// Fix &entity\n;
$data = str_replace(array('&amp;', '&lt;', '&gt;'), array('&amp;amp;', '&amp;lt;', '&amp;gt;'), $data);
$data = preg_replace('/(&#*\w+)[\x00-\x20]+;/u', '$1;', $data);
$data = preg_replace('/(&#x*[0-9A-F]+);*/iu', '$1;', $data);
$data = html_entity_decode($data, ENT_COMPAT, 'UTF-8');


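This normalizes HTML entities before the other checks run. &amp;, &lt; and &gt; are double-escaped first so that they are still escaped after the decode. Entities with stray whitespace before the semicolon, or numeric entities with no semicolon at all, are repaired. Then html_entity_decode() turns everything else into real characters. That way something hidden as an entity can't slip past the patterns that follow, while text that was deliberately escaped still displays literally, just like WebmasterWorld can allow you to display <html> without the tag rendering.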
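Here is a rough sketch of those four lines at work. The wrapper name normalize_entities() and the sample inputs are made up for illustration; the statements inside are the ones quoted above.

```php
<?php
// Hypothetical wrapper around the four "Fix &entity\n;" statements.
function normalize_entities($data) {
    // Double-escape &amp; &lt; &gt; so they are still escaped after the decode below
    $data = str_replace(array('&amp;', '&lt;', '&gt;'), array('&amp;amp;', '&amp;lt;', '&amp;gt;'), $data);
    // Drop whitespace/control characters sitting between an entity and its semicolon
    $data = preg_replace('/(&#*\w+)[\x00-\x20]+;/u', '$1;', $data);
    // Make sure every numeric entity ends in exactly one semicolon
    $data = preg_replace('/(&#x*[0-9A-F]+);*/iu', '$1;', $data);
    // Decode what is left so the later patterns see the real characters
    return html_entity_decode($data, ENT_COMPAT, 'UTF-8');
}

// "ja" written as numeric entities with the semicolons left off
$hidden  = normalize_entities('&#106&#97vascript:alert(1)'); // 'javascript:alert(1)'
// Markup that was escaped on purpose stays escaped
$escaped = normalize_entities('&lt;b&gt;');                  // '&lt;b&gt;'
```

So the obfuscated protocol comes out in plain sight for the javascript: check further down, while the escaped <b> tag is left alone.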

// Remove any attribute starting with "on" or xmlns
$data = preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+>#iu', '$1>', $data);


Removing attributes that start with "on" suggests they're trying to strip JavaScript event handlers, like <a onclick="alert('This is annoying');">click me</a>. xmlns (XML namespace) declarations are probably removed because there's no need for them within $data, but I'm not sure.
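To see what that pattern actually leaves behind, here it is run against two sample tags (the inputs are my own, not from the plugin):

```php
<?php
// The "on"/xmlns pattern quoted above
$pattern = '#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+>#iu';

// Event handler after another attribute
$first  = preg_replace($pattern, '$1>', '<a href="page.html" onclick="alert(1)">click me</a>');
// $first is '<a href="page.html" >click me</a>'

// Event handler before another attribute
$second = preg_replace($pattern, '$1>', '<a onclick="alert(1)" href="page.html">click me</a>');
// $second is '<a >click me</a>'
```

Note that the match runs from the on... attribute to the closing >, so any attribute that comes after it in the same tag is removed as well, as the second case shows.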

// Remove javascript: and vbscript: protocols
$data = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#u', '$1=$2nomozbinding...', $data);


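Clearly they don't want any scripting in $data. These replace javascript:, vbscript: and -moz-binding: at the start of an attribute value with harmless text, and the first two still match when whitespace or control characters are inserted between the letters.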
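A quick sketch of the javascript: pattern on two sample links, one plain and one with a tab hidden inside the word (both inputs are made up):

```php
<?php
// The javascript: pattern quoted above
$pattern = '#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu';

// $1 is the attribute name, $2 is the opening quote
$plain = preg_replace($pattern, '$1=$2nojavascript...', '<a href="javascript:alert(1)">x</a>');

// Same payload with a tab between "java" and "script"
$split = preg_replace($pattern, '$1=$2nojavascript...', "<a href=\"java\tscript:alert(1)\">x</a>");

// Both come out as '<a href="nojavascript...alert(1)">x</a>'
```

The tag and attribute survive, but the value no longer starts with a scheme the browser would run.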

// Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviour[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#iu', '$1>', $data);


Apparently there's an IE quirk that allows scripting with the style attribute. I imagine this is not a well known quirk but it exists nonetheless. This code removes any potential scripting from within style attributes.
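As a small illustration, this runs the first of those patterns against the span from the code comment (with alert(1) swapped in to keep the quoting simple):

```php
<?php
// The style/expression pattern quoted above
$pattern = '#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#i';

$input  = '<span style="width: expression(alert(1));">hi</span>';
$output = preg_replace($pattern, '$1>', $input);
// $output is '<span >hi</span>'
```

The whole style attribute goes, not just the expression() part, and so does anything else between it and the closing >.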

// Remove namespaced elements (we do not need them)
$data = preg_replace('#</*\w+:\w[^>]*+>#i', '', $data);


Again, with the namespace, there must be some kind of vulnerability involving declaration of namespaces so this function is making sure there aren't any in $data
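For what it's worth, the pattern only removes the namespaced tags themselves, not what is between them. A one-line sketch with a made-up input:

```php
<?php
// The namespaced-element pattern quoted above
$output = preg_replace('#</*\w+:\w[^>]*+>#i', '', '<p>Hello <o:p>world</o:p></p>');
// $output is '<p>Hello world</p>'
```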

/* Remove eval() */
if ($this->pluginParams->get('check_eval', 1)) {
$data = str_ireplace('eval(', '', $data);
}


eval('code goes here') evaluates code in a number of server- and client-side languages. The plugin apparently doesn't want that in $data, so the text eval( is stripped whenever the check_eval setting is on.

/* Remove base64_decode() */
if ($this->pluginParams->get('check_base64', 1)) {
$data = str_ireplace('base64_decode(', '', $data);
}


This strips the literal text base64_decode( from the input. I think it's aimed at injected PHP that hides its nasties as base64 data, for example in GET variables.

/* Remove UNION SQL command */
if ($this->pluginParams->get('check_sql', 1)) {
$data = str_ireplace('UNION', '', $data);
}


This and the bunch of checks that follow look like they're meant to prevent SQL injection. The SQL statements being built must involve user-submitted values, so known injection strings are stripped out of them first.
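One thing worth knowing about this approach: str_ireplace() makes a single pass and removes the text wherever it appears. The sketch below only shows how the function behaves on two made-up strings; it says nothing about how the plugin's queries are built.

```php
<?php
// Ordinary text that happens to contain the keyword gets mangled
$reunion = str_ireplace('UNION', '', 'Class reunion photos');
// $reunion is 'Class re photos'

// A keyword nested inside itself is put back together by the removal
$nested = str_ireplace('UNION', '', 'UNUNIONION');
// $nested is 'UNION'
```

Unlike the tag removal at the end of the function, these replacements are not repeated until the string stops changing.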

do {
// Remove really unwanted tags
$old_data = $data;
$data = preg_replace('#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i', '', $data);
} while ($old_data !== $data);


Stuff that could really mess up a page if unchecked. Even a simple tag like <base> somewhere on a page could break all links on a page if they're relatively referenced.
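The do/while around it matters, because removing one tag can leave a new one behind. A sketch with a made-up input and a pass counter added ($passes is not in the original):

```php
<?php
// The unwanted-tags pattern quoted above
$pattern = '#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i';

// A script tag wrapped around another script tag
$data   = '<scr<script>ipt>alert(1)</script>';
$passes = 0; // counter added for illustration only

do {
    $old_data = $data;
    $data = preg_replace($pattern, '', $data);
    $passes++;
} while ($old_data !== $data);

// Pass 1 leaves '<script>alert(1)', pass 2 leaves 'alert(1)',
// pass 3 changes nothing, so the loop stops.
// $data is 'alert(1)' and $passes is 3
```

With a single preg_replace() call the result of pass 1 would have gone through with a working <script> tag in it.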



Basically this looks like it's sanitizing user data, most likely for a comment section or user-submitted data that ends up on a webpage.

rocknbil

3:51 pm on Jul 19, 2011 (gmt 0)




how $data is being used


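Note also that $data is passed in as a parameter and that it can be an array. The docblock says string, but the is_array() branch handles arrays and the return type is mixed: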

* @param string Data to be cleaned
* @return mixed

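So theoretically, from inside the plugin class (xss_clean is a method, so it has to be called on the object), you could do

$_POST = $this->xss_clean($_POST);
or
$my_cleansed_array = $this->xss_clean($_POST);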
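Here is a stripped-down sketch of how the array branch walks nested input. clean_recursive() is a made-up stand-alone function, and strip_tags() is only a stand-in for the regex passes of the real method. It assumes each element, $value, is what gets handed to the recursive call.

```php
<?php
// Hypothetical stand-alone version of the array handling in xss_clean()
function clean_recursive($data) {
    // Nothing to clean
    if (empty($data)) {
        return $data;
    }
    // Arrays: clean every element, going as deep as the nesting goes
    if (is_array($data)) {
        foreach ($data as $key => $value) {
            $data[$key] = clean_recursive($value);
        }
        return $data;
    }
    // Strings: stand-in for the real filtering
    return strip_tags($data);
}

$input = array('name' => '<b>Bob</b>', 'tags' => array('<i>a</i>', 'b'));
$clean = clean_recursive($input);
// $clean is array('name' => 'Bob', 'tags' => array('a', 'b'))
```

The keys and the shape of the array are kept, only the string values change, which is why passing $_POST in and assigning the result back works.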

The variable $data itself is local to the function (it exists only within the function's scope). That is, you could have a variable named $data somewhere else in your program and it would not be affected by what happens to $data inside the function. You could declare $data as a global if you really needed to (which you shouldn't), but that goes against the whole idea of functions and classes: black boxes you pass information to and receive a result from, as above.
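A tiny sketch of that scoping, with a made-up function shout():

```php
<?php
// A $data in the main program
$data = 'outer value';

// Hypothetical function with its own parameter named $data
function shout($data) {
    // This changes only the function's local copy
    $data = strtoupper($data);
    return $data;
}

$result = shout('inner value');
// $result is 'INNER VALUE', and the outer $data is still 'outer value'
```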

MickeyRoush

4:26 am on Jul 20, 2011 (gmt 0)




@ brotherhood of LAN and rocknbil
Thanks!