.htaccess now returns 500 code for RewriteCond
Wanting 403 status rather than 500 in RewriteCond with bad bots
lvilletxmike
msg:4661433 - 2:22 pm on Apr 8, 2014 (gmt 0)

Something changed in my RewriteCond bad-bot User-Agent rules, and they now return a 500 status code instead of the 403 I have them set for. Can someone help? I realize what I have is long, but I am new to this process and am learning. My site is hosted on a Linux server.
---------------

As it appears in .htaccess:

# server custom error pages
ErrorDocument 403 /403.html
ErrorDocument 404 /404.html
ErrorDocument 406 /406.html
ErrorDocument 500 /500.html

# block bad bots

RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} 360Spider [OR]
RewriteCond %{HTTP_USER_AGENT} A(?:ccess|ppid) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} C(?:apture|lient|opy|rawl|url) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} D(?:ata|evSoft|o(?:main|wnload)) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} E(?:ngine|zooms) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} f(?:etch|ilter) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} genieo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Ja(?:karta|va) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Li(?:brary|nk|bww) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} nutch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} P(?:r(?:eview|oxy)|ublish) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} robot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} s(?:craper|istrix|pider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} W(?:get|(?:in(32|Http))) [NC]
RewriteRule .? - [F]

RewriteEngine On
RewriteBase /
# RewriteCond %{REMOTE_HOST} ^.*\.compute-1\.amazonaws\.com$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*\.mail\.ru [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*fc5* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^aiHitBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Aboundex/0.3 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^AntBot/1.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^AISearchBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^AMZNKAssocBot/4.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^archive\.org_bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider+ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^coccoc/1.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^CoPubbot/v1.2 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^checks\.panopta [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DBLBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Daumoa/3.0.6 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Daumoa/3.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Drupal [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Dillo/0.8 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Ezooms/1.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EasouSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^emefgebot/beta [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^elefent/Elefent/1.2 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^FlightDeckReportsBot/2.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^fulltextrobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Gigabot/3.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Googlebot-Image/1.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^heritrix [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^info.wbcrwl.305.09 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^KamodiaBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^kimsufi [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^KomodiaBot/1.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Konqueror/3.1-rc3 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^linkoatlbot/Linkoatl-0.9 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MeanpathBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MRSPUTNIK [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^news\ bot\ /2.1 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSeer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetcraftSurveyAgent/1.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^oBot/2.3.1 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot/0.1 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Python-urllib [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^1.2.3\ CPython/2.7.3 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Powermarks/3.5 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Plukkie/1.5 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Presto/2.7.62 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Sogou [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SISTRIXCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SeznamBot/3.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ssearch_bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Sleuth/1.3.8 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^VoilaBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Yeti/1.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^YandexBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^YandexDirect/3.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^YandexImages/3.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^YandexBot/3.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^YandexFavicons/1.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^YisouSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^woriobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wotbox/2.01 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wotbox [NC]
RewriteRule .* - [F]

RewriteEngine on
RewriteBase /
# IF UA contains Opera and comes from IP range
RewriteCond %{HTTP_USER_AGENT} Opera
RewriteCond %{REMOTE_ADDR} ^178\.137\.160\.157 [OR]
RewriteCond %{REMOTE_ADDR} ^91\.207\.9\.226
RewriteRule .* - [F]

RewriteEngine On
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule .* - [F]

RewriteEngine on
# RewriteBase /
RewriteCond %{HTTP_USER_AGENT} 176\.138\.58\.59\.broad\.pt\.fj\.dynamic\.163data\.com\.cn [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (?:<|>||%0[AD0]|%27|%3[CE]|\d/\*|#\!|\\"|\n|^(?:.{0,9}|(?:\d+\.)?\d+)$) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.{0,20}$ [NC,OR]
#RewriteCond %{REMOTE_HOST} ^.*\.compute-1\.amazonaws\.com$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} your-server\.de [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} startdedicated\.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} checks\.panopta\.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} djmail\.yaris\.co [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Firefox/mutant [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/4.61 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/0.91 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/0.6\ Beta [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MSIE\ ([23456])\. [OR]
RewriteCond %{HTTP_USER_AGENT} MSIE\ (7.0a1)\.
RewriteRule ^.* - [F]

 

aristotle
msg:4661435 - 2:29 pm on Apr 8, 2014 (gmt 0)

Did you make a recent change? If so, that's probably the cause of the problem.

not2easy
msg:4661436 - 2:49 pm on Apr 8, 2014 (gmt 0)

First suggestion: you only need to add the

RewriteEngine on
RewriteBase /

pair one time. Beyond that, there is too much mixing of terms and rules. One example:

RewriteCond %{HTTP_USER_AGENT} your-server\.de [NC,OR]

your-server.de is a referer or remote host, not a User-Agent. Combining IPs, UAs, and domain names in the same ruleset is not good.

Use \ to escape special characters, and remember to escape the literal "." .

There are too many things to detail one by one, but as written, it cannot work the way you want it to. I suggest that you start by outlining your objectives and splitting User Agents, referers, and IPs into separate rulesets.
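For illustration, a minimal sketch of that split. The patterns here are only placeholders pulled from the post above, not a recommended blocklist, and %{REMOTE_HOST} only works when the server performs hostname lookups:

# 1) User-Agent strings only
RewriteCond %{HTTP_USER_AGENT} (Wget|libwww|Jakarta) [NC]
RewriteRule .? - [F]

# 2) Reverse-DNS host names only (requires hostname lookups)
RewriteCond %{REMOTE_HOST} \.compute-1\.amazonaws\.com$ [NC]
RewriteRule .? - [F]

# 3) IP addresses only
RewriteCond %{REMOTE_ADDR} ^178\.137\.160\.157$
RewriteRule .? - [F]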

lvilletxmike
msg:4661440 - 3:05 pm on Apr 8, 2014 (gmt 0)

Ok. I have commented out all but the first RewriteEngine on and RewriteBase / as you suggested. I have also commented out all but the first two (2) sections of bad bots, which should remove all mixed terms. I am not sure what you mean by: Use \ to escape and remember to escape the "." .

lvilletxmike
msg:4661447 - 3:29 pm on Apr 8, 2014 (gmt 0)

I now understand what you mean by using \ to escape, and remembering to escape the "." . Looking over the second section of my bots, I have commented out: RewriteCond %{HTTP_USER_AGENT} ^info.wbcrwl.305.09 [NC,OR] . In that second section (which, as I stated above, is one of the two I have kept), do I need to use the \ to escape in bots like Ezooms/1.0 to Ezooms/1\.0?

aristotle
msg:4661488 - 6:56 pm on Apr 8, 2014 (gmt 0)

do I need to use the \ to escape in bots like Ezooms/1.0 to Ezooms/1\.0

I think the best way is to put quotation marks around the whole expression. Like "Ezooms/1.0". That should always work (I think).
But it might be better in a case like this just to use Ezooms by itself, without the /1.0.

lucy24
msg:4661490 - 6:59 pm on Apr 8, 2014 (gmt 0)

Escape all literal periods in all patterns everywhere in mod_rewrite. Both in the rule and in any conditions.
The only place you don't need to escape is in targets. Literal spaces need to be escaped in Apache only; this is about Apache syntax, not Regular Expressions in general. Sometimes you have a choice between escaping a space and putting the whole package inside quotation marks.
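For illustration, a sketch of two of the posted patterns written with escaping and then with quoting. Treat this as a sketch, not a tested ruleset:

# Escape the literal period with a backslash:
RewriteCond %{HTTP_USER_AGENT} Ezooms/1\.0 [NC,OR]
# Escape literal spaces with a backslash (Apache syntax, not regex):
RewriteCond %{HTTP_USER_AGENT} news\ bot\ /2\.1 [NC,OR]
# Or quote the whole pattern instead of escaping the spaces; periods
# inside the quotes still need escaping to match literally:
RewriteCond %{HTTP_USER_AGENT} "news bot /2\.1" [NC]
RewriteRule .? - [F]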

Now, personally I'd shift everything over to mod_setenvif (BrowserMatch and BrowserMatchNoCase), saving RewriteRules for the truly complicated sets of conditions. I don't know if there exist tests comparing speed of mod_rewrite and mod_setenvif on equivalent actions; this is just gut feeling. It's definitely easier to read.
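A rough mod_setenvif equivalent might look like this. It is only a sketch: the UA fragments are examples taken from the list above, and the Order/Deny syntax assumes Apache 2.2 (Apache 2.4 without mod_access_compat would use Require directives instead):

# Set an environment variable when the User-Agent matches, then deny on it
BrowserMatchNoCase (Wget|libwww|Jakarta|MJ12bot) bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot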

g1smd
msg:4661619 - 9:55 pm on Apr 8, 2014 (gmt 0)

Install the rulesets one at a time on the server until you get an error.

That will narrow it down. It's likely to be a simple syntax typo somewhere.

lucy24
msg:4661687 - 2:13 am on Apr 9, 2014 (gmt 0)

Oh, and: Do you have access to your error logs? Not the general access logs, the error logs by that name. Sometimes they'll divulge something useful when it's a 500-class error.

lvilletxmike
msg:4661691 - 2:35 am on Apr 9, 2014 (gmt 0)

I do have access to the log, but there is never anything in it

not2easy
msg:4661714 - 3:23 am on Apr 9, 2014 (gmt 0)

I have also commented out all but the first two (2) sections of bad bots which should remove all mixed terms.

Your edits will leave all these rules in part 2 in place, and many of them won't work the way you want them to work because of the syntax:

RewriteCond %{HTTP_USER_AGENT} ^woriobot [NC,OR]

This part: ^woriobot tells the server to deliver a 403 only when the UA string begins with woriobot, but the first part of a UA does not usually contain the bot's name. There are other formats that would work better and faster for blocking named UAs. You already have some of this in your first blocking section, for example:

RewriteCond %{HTTP_USER_AGENT} Ja(?:karta|va) [NC,OR]

Though I'm not familiar with that exact format, many of the entries in the long list in part 2 would work better in a format like:

RewriteCond %{HTTP_USER_AGENT} (Access|Ahrefs|appid|Blog) [NC,OR]

just because of that ^ issue.
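To make that concrete, here is a rough, unanchored rewrite of a few of the part-2 entries in that alternation style. The names are taken from the posted list; it is a sketch, not a complete replacement:

RewriteCond %{HTTP_USER_AGENT} (aiHitBot|Aboundex|EasouSpider|woriobot|Wotbox) [NC]
RewriteRule .? - [F]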

wilderness
msg:4661727 - 4:17 am on Apr 9, 2014 (gmt 0)

RewriteCond %{HTTP_USER_AGENT} P(?:r(?:eview|oxy)|ublish) [NC,OR]

lucy24
msg:4661741 - 7:49 am on Apr 9, 2014 (gmt 0)

I'm not familiar with that exact format

It's a Regular Expression with alternation.
Ja(?:karta|va)
starts with "Ja"
next piece is either
karta
or
va
(always wondered about the connection, but have never been moved to look it up).

The ?: no-capture markup isn't essential, but might save a speck of work for the server as it doesn't have to keep things in memory until it passes the next Condition.

The ^ is not an Apache punctuation mark. It's a Regular Expression anchor meaning "The specified content has to come at the very beginning of the text I'm looking at", here the User-Agent.

lvilletxmike
msg:4661941 - 11:12 pm on Apr 9, 2014 (gmt 0)

Found the issue: RewriteCond %{HTTP_USER_AGENT} ^.{0,20}$ [NC,OR] ... not sure how this got into the .htaccess ... but I commented it out and the problem is solved. Thanks to all!

lucy24
msg:4661953 - 12:33 am on Apr 10, 2014 (gmt 0)

Yikes. "Any request with a UA string no longer than 20 characters" (with superfluous [NC] since there's no alphabetic content).

I once looked at lengths of UA strings because robots tend to be very short. But assorted people hereabouts pointed out some legitimate short UAs. Drat.

Don't see why it would create a 500-class error, though. Just an awful lot of false positives. Apache does recognize the {numbers-here} notation.
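If a minimum-length test is ever wanted again, the same condition without the superfluous [NC] would simply be the following. This is a sketch only, and as noted above it still risks blocking legitimate short UAs:

# Block any request whose User-Agent string is 20 characters or fewer
RewriteCond %{HTTP_USER_AGENT} ^.{0,20}$
RewriteRule .? - [F]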

wilderness
msg:4664836 - 5:14 am on Apr 22, 2014 (gmt 0)

I do have access to the log, but there is never anything in it


The logs are not accumulating for one of the following reasons:

1) You have not turned your logs on.
2) Your htaccess file generates a constant 500 (due to a syntax error) and no (ZERO) visitors (including yourself) are actually seeing your website.
