Forum Library, Charter, Moderators: coopster & jatar k & phranque

Perl Server Side CGI Scripting Forum

    
Perl security: how to scrub/handle input data?
Submitted data will contain malware
SteveWh




msg:4225666
 9:46 pm on Nov 2, 2010 (gmt 0)

I'm starting to prepare to go live with my first web application that uses Perl, and would like to know if there are any security precautions that should be added that I've overlooked. The server is Linux/Apache/PHP/Perl.

Except that its functionality is different, the interface is similar to the W3C HTML validation "by direct input" form: the user selects various options and then pastes a large text block, which can optionally be the source code of a web page, into an HTML textarea.

JavaScript truncates the text block, if necessary, to its maximum allowed length, and the text and options are submitted to the server as POST data.

The PHP receiver script validates all the submitted option values. If any values are invalid (e.g. non-digit characters in a numeric value), it substitutes default legal values. It again truncates the text block to size in case it wasn't done by the JavaScript.

PHP then writes the text block (in its raw unscrubbed state) to a temporary file with a random name (that is never revealed to the user) in a directory that is blocked from web access (.htaccess: deny from all).

PHP then invokes my Perl script to scrub and perform its processing on the text that's in the temporary file:

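// Each argument is quoted on its own with escapeshellarg(). The combined
// string is not passed through escapeshellcmd(), which would add
// backslashes inside the already-quoted arguments.
$cmd = 'perl -wT ' .
    escapeshellarg("/path/to/myscript.pl") . ' ' .
    [various args, including the name of the temporary file];
$perlresult = shell_exec($cmd);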


At this point, the text block in the temporary file may contain, and sometimes certainly will contain, malicious iframes, JavaScript, PHP, SQL injection exploits, and any other type of malicious code. Here is what Perl does with it:

It again interprets the option values and restricts them to legal values. Then...

use strict;
use warnings;
use Encode;
use HTML::Scrubber;# strips HTML tags
use HTML::Entities;# converts HTML entities
use Getopt::Long;

# [...code omitted...]

# READ ALL INPUT INTO A SINGLE STRING
# SO THAT SUBSEQUENT SEARCHES FOR TAGS
# CAN SUCCEED EVEN IF OPENING AND CLOSING
# TAGS ARE ON DIFFERENT LINES.

my $intext = '';# the entire text in one scalar string (empty if the file is empty)
while (<>)# THIS READS THE TEXT FROM THE TEMP FILE
{
    $intext .= lc($_);# lower case all
}

# ---- FILTER THE INPUT TEXT
# I don't know if the next line's "do-nothing" decoding
# will actually accomplish anything. Its intended purpose
# is to turn all unsupported (i.e. illegally encoded) chars into
# legal supported ones even if the resulting output is garbage.

Encode::from_to($intext, "cp1252", "cp1252", Encode::FB_DEFAULT);

# Must strip out any embedded PHP code
# *before* passing the text to Scrubber,
# whose processing does not strip PHP,
# but does make PHP tags subsequently unfindable,
# while it preserves their potentially malicious contents in the text.
# TODO: also strip ASP and what other code?

$intext =~ s/\<\?(php)?.*?\?\>/ /sig;

# ---- STRIP OUT ALL HTML TAGS, COMMENTS, JAVASCRIPT.

my $scrubber = HTML::Scrubber->new;
$intext = $scrubber->scrub($intext);

# ---- DECODE ENTITIES.
# THE SCRIPT'S PROCESSING NEEDS THE ACTUAL CHARS, NOT "&apos;" etc.

$intext = decode_entities($intext);

# CHANGE CONTROL (0-31) AND SPACE CHARS TO SINGLE SPACE.
$intext =~ s/[[:space:][:cntrl:]]+/ /gi;

# THIS ALTERNATIVE ADDS TESTS FOR ANY REMAINING < AND >
# I AM NOT SURE WHETHER THIS IS NECESSARY.
$intext =~ s/[[:space:][:cntrl:]\<\>]+/ /gi;

# AT THIS POINT, $intext CONTAINS ONLY THE READABLE TEXT FROM THE
# WEB PAGE (IF THAT'S WHAT IT WAS), WITH ALL TAGS REMOVED.


In the remaining code, Perl processes the text and prints its summary output, which is received by PHP, which places the report on the page in an HTML textarea. Currently, PHP does not do any entity conversion. I haven't yet determined if that's necessary for textarea use. Then PHP deletes the temp file.
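
On the doubt in the listing about whether stripping leftover < and > is necessary: entity decoding runs after the tag stripping, so text that arrived as entities can turn back into literal angle brackets at that point. A minimal sketch of that effect, using only decode_entities() and the same substitution as the "alternative" line; the sample string is made up for illustration:

```perl
use strict;
use warnings;
use HTML::Entities;    # provides decode_entities()

# Hypothetical input: no real tags, only entity-encoded ones,
# so a tag stripper has nothing to remove.
my $text = 'hello &lt;script&gt;alert(1)&lt;/script&gt; world';

# In non-void context decode_entities() returns a decoded copy;
# the entities become literal angle brackets again.
my $decoded = decode_entities($text);

# Same substitution as the "alternative" line in the listing:
# runs of whitespace, control chars, < and > collapse to one space.
(my $cleaned = $decoded) =~ s/[[:space:][:cntrl:]<>]+/ /g;

print "$decoded\n";    # hello <script>alert(1)</script> world
print "$cleaned\n";    # hello script alert(1) /script world
```

So the second substitution is the one that guarantees no angle brackets survive into the report, whatever the input looked like.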

My main concern (at least the one I'm aware of) is whether it's possible for the unscrubbed text in the temporary file to contain any kind of exploit that could subvert or hijack the Perl <> operator while it reads the file, or subvert or corrupt HTML::Scrubber's processing of the text as it strips the tags.
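
A note on the <> part of that question: the contents of the file are only ever data to <>, but the file names it takes from @ARGV are opened with two-argument open semantics, so a name that begins with a mode character or ends in a pipe symbol is interpreted rather than taken literally. With a random name generated by PHP that should not come up, but a sketch of an explicit alternative follows. The helper name slurp_file is hypothetical, and the usage line assumes the temp file name is the first argument left after option parsing:

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file into one string using an
# explicit three-argument open, so the name is always treated as a
# plain file name and never as a pipe or mode prefix.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open '$path': $!";
    local $/;                 # undef record separator: read everything at once
    my $content = <$fh>;
    close($fh);
    return defined $content ? $content : '';
}

# Usage (assumption: the temp file name is the first remaining argument):
# my $intext = lc slurp_file($ARGV[0]);
```

Reading the file in one go also gives the single string that the tag searches need, without the line-by-line loop.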

Hopefully, if my code is any good, it can serve as an example to help someone else. If it needs fixing, that information can help anyone who reads what's wrong with it. Thank you to anyone who's willing to look at it.

 

janharders




msg:4225908
 11:57 am on Nov 3, 2010 (gmt 0)


My main concern (at least the one I'm aware of) is whether it's possible for the unscrubbed text in the temporary file to contain any kind of exploit that could subvert or hijack the Perl <> operator while it reads the file, or subvert or corrupt HTML::Scrubber's processing of the text as it strips the tags.

You should be safe. Personally, I don't think you need to strip PHP tags etc.; just always use the htmlentities() function in PHP when you output to a web page. There might be legitimate cases where someone wants to paste code but is not trying to attack your website.
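
The same idea can be sketched in Perl with HTML::Entities, in case the escaping is done before the report leaves the script rather than in PHP. The report string is a made-up example of text that tries to close the textarea early:

```perl
use strict;
use warnings;
use HTML::Entities;    # provides encode_entities()

# Hypothetical report text that tries to break out of the textarea.
my $report = 'done</textarea><script>alert(1)</script>';

# Encode only the characters that are special in HTML markup.
# In non-void context encode_entities() returns an encoded copy.
my $safe = encode_entities($report, '<>&"\'');

print "<textarea>$safe</textarea>\n";
```

Once encoded, the closing tag in the data is displayed as text inside the textarea instead of ending it.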

chorny




msg:4225970
 2:00 pm on Nov 3, 2010 (gmt 0)

Unless you use eval in Perl, you do not need to bother with any code that is contained in a Perl string. Problems may appear on the PHP side, but Perl has almost no security-related bugs.

phranque




msg:4226500
 10:19 am on Nov 4, 2010 (gmt 0)

it doesn't sound like you need this in your case, but if you are using any input data in a SQL statement you will need to escape any special characters to transform the input into a legal string and to avoid SQL injection.
for example, if you were using a MySQL database, you would need to escape any backslashes and (single & double) quotes as described below.

http://dev.mysql.com/doc/refman/5.0/en/string-syntax.html:
When writing application programs, any string that might contain any of these special characters must be properly escaped before the string is used as a data value in an SQL statement that is sent to the MySQL server. You can do this in two ways:
* Process the string with a function that escapes the special characters. In a C program, you can use the mysql_real_escape_string() C API function to escape characters. See Section 20.8.3.53, "mysql_real_escape_string()". The Perl DBI interface provides a quote method to convert special characters to the proper escape sequences. See Section 20.10, "MySQL Perl API". Other language interfaces may provide a similar capability.
* As an alternative to explicitly escaping special characters, many MySQL APIs provide a placeholder capability that enables you to insert special markers into a statement string, and then bind data values to them when you issue the statement. In this case, the API takes care of escaping special characters in the values for you.

[edited by: phranque at 2:03 pm (utc) on Nov 4, 2010]

chorny




msg:4226519
 11:30 am on Nov 4, 2010 (gmt 0)

IMHO, placeholders are better for avoiding SQL injection because it is easier to use them.

phranque




msg:4226589
 2:05 pm on Nov 4, 2010 (gmt 0)

DBI: Placeholders and Bind Values:
http://search.cpan.org/~timb/DBI/DBI.pm#Placeholders_and_Bind_Values

note that placeholders cannot be used in all cases, as specified in the CPAN reference; in those cases you must escape the input yourself as described above.
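
A minimal sketch of both approaches with DBI. Assumptions: an in-memory SQLite database (via DBD::SQLite) stands in for MySQL so the example is self-contained, and the table name and hostile string are made up for illustration:

```perl
use strict;
use warnings;
use DBI;

# Assumption: SQLite in memory instead of MySQL; needs DBD::SQLite.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# Hypothetical table for illustration.
$dbh->do('CREATE TABLE submissions (id INTEGER PRIMARY KEY, body TEXT)');

my $hostile = q{x'); DROP TABLE submissions; --};

# Placeholder: the value is bound separately from the SQL text.
my $sth = $dbh->prepare('INSERT INTO submissions (body) VALUES (?)');
$sth->execute($hostile);

# Where a placeholder cannot be used, quote() escapes a value...
my $literal = $dbh->quote($hostile);
# ...and quote_identifier() does the same for a table or column name.
my $table = $dbh->quote_identifier('submissions');

my ($stored) = $dbh->selectrow_array(
    "SELECT body FROM $table WHERE body = $literal");

print "stored intact\n" if defined $stored && $stored eq $hostile;
```

The bound value is stored exactly as submitted, and the table is still there afterwards because the text was never parsed as SQL.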

SteveWh




msg:4228689
 2:49 am on Nov 10, 2010 (gmt 0)

Sorry for my delay. Thank you all for the replies.

janharders, the main reason for stripping HTML and PHP tags is that they're irrelevant to the script's purpose. It's a text processor. If the user pastes a web page, the script still only wants the text that would be displayed on it.

chorny and phranque, thank you for the database-related comments. Once the general methods for this script are worked out, there are multiple uses I can think of for it, and some involve MySQL. When starting to learn about PHP database methods (also a work in progress), I chose mysqli with its prepared statements and bound parameters. It's good to know that Perl DBI appears to provide equivalent methods.
