
Forum Moderators: Ocean10000 & incrediBILL & phranque


.htaccess now returns 500 code for RewriteCond

Wanting 403 status rather than 500 in RewriteCond with bad bots

     
2:22 pm on Apr 8, 2014 (gmt 0)

New User

joined:Apr 8, 2014
posts: 8
votes: 0


Something changed in my RewriteCond rules for bad-bot User-Agents, and the server now returns a 500 status instead of the 403 I have it set for. Can someone help? I realize what I have is long, but I am new to this process and am learning. My site is hosted on a Linux server.
---------------

As appears in htaccess:

# server custom error pages
ErrorDocument 403 /403.html
ErrorDocument 404 /404.html
ErrorDocument 406 /406.html
ErrorDocument 500 /500.html

# block bad bots

RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} 360Spider [OR]
RewriteCond %{HTTP_USER_AGENT} A(?:ccess|ppid) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} C(?:apture|lient|opy|rawl|url) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} D(?:ata|evSoft|o(?:main|wnload)) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} E(?:ngine|zooms) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} f(?:etch|ilter) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} genieo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Ja(?:karta|va) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Li(?:brary|nk|bww) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} nutch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} P(?:r(?:eview|oxy)|ublish) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} robot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} s(?:craper|istrix|pider) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} W(?:get|(?:in(32|Http))) [NC]
RewriteRule .? - [F]

RewriteEngine On
RewriteBase /
# RewriteCond %{REMOTE_HOST} ^.*\.compute-1\.amazonaws\.com$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*\.mail\.ru [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.*fc5* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^aiHitBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Aboundex/0.3 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^AntBot/1.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^AISearchBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^AMZNKAssocBot/4.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^archive\.org_bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider+ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^coccoc/1.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^CoPubbot/v1.2 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^checks\.panopta [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DBLBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Daumoa/3.0.6 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Daumoa/3.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Drupal [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Dillo/0.8 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Ezooms/1.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EasouSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^emefgebot/beta [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^elefent/Elefent/1.2 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^FlightDeckReportsBot/2.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^fulltextrobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Gigabot/3.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Googlebot-Image/1.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^heritrix [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^info.wbcrwl.305.09 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^KamodiaBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^kimsufi [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^KomodiaBot/1.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Konqueror/3.1-rc3 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^linkoatlbot/Linkoatl-0.9 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MeanpathBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MRSPUTNIK [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^news\ bot\ /2.1 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSeer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetcraftSurveyAgent/1.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^oBot/2.3.1 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot/0.1 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Python-urllib [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^1.2.3\ CPython/2.7.3 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Powermarks/3.5 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Plukkie/1.5 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Presto/2.7.62 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Sogou [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SISTRIXCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SeznamBot/3.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ssearch_bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Sleuth/1.3.8 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^VoilaBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Yeti/1.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^YandexBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^YandexDirect/3.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^YandexImages/3.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^YandexBot/3.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^YandexFavicons/1.0 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^YisouSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^woriobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wotbox/2.01 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wotbox [NC]
RewriteRule .* - [F]

RewriteEngine on
RewriteBase /
# IF UA contains Opera and comes from IP range
RewriteCond %{HTTP_USER_AGENT} Opera
RewriteCond %{REMOTE_ADDR} ^178\.137\.160\.157 [OR]
RewriteCond %{REMOTE_ADDR} ^91\.207\.9\.226
RewriteRule .* - [F]

RewriteEngine On
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule .* - [F]

RewriteEngine on
# RewriteBase /
RewriteCond %{HTTP_USER_AGENT} 176\.138\.58\.59\.broad\.pt\.fj\.dynamic\.163data\.com\.cn [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (?:<|>||%0[AD0]|%27|%3[CE]|\d/\*|#\!|\\"|\n|^(?:.{0,9}|(?:\d+\.)?\d+)$) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^.{0,20}$ [NC,OR]
#RewriteCond %{REMOTE_HOST} ^.*\.compute-1\.amazonaws\.com$ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} your-server\.de [NC,OR]
#RewriteCond %{HTTP_USER_AGENT} startdedicated\.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} checks\.panopta\.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} djmail\.yaris\.co [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Firefox/mutant [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/4.61 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/0.91 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/0.6\ Beta [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MSIE\ ([23456])\. [OR]
RewriteCond %{HTTP_USER_AGENT} MSIE\ (7.0a1)\.
RewriteRule ^.* - [F]
2:29 pm on Apr 8, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle

joined:Aug 4, 2008
posts:2677
votes: 94


Did you make a recent change? If so, that's probably the cause of the problem.
2:49 pm on Apr 8, 2014 (gmt 0)

Moderator from US 


joined:Dec 27, 2006
posts:2556
votes: 48


First suggestion is that you only need to add the
RewriteEngine on
RewriteBase /

one time. Beyond that there is too much mixing of terms and rules. Some examples:
RewriteCond %{HTTP_USER_AGENT} your-server\.de [NC,OR]
your-server.de is a referer or remote host, not a User Agent. Combining IPs, UAs and domain names is not good.

Use \ to escape and remember to escape the .

Too many things to detail each one, but as written, it cannot work the way you want it to. I suggest that you start by outlining your objectives and splitting off User Agents, referers and IPs into separate rules.
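A sketch of that separation might look like this (the UA names, host, and IP are illustrative examples pulled from the rules earlier in the thread, not a complete replacement):

```apache
RewriteEngine On
RewriteBase /

# --- User-Agent substrings only ---
RewriteCond %{HTTP_USER_AGENT} (Ezooms|MJ12bot|woriobot) [NC]
RewriteRule .? - [F]

# --- Remote hosts only ---
RewriteCond %{REMOTE_HOST} \.compute-1\.amazonaws\.com$ [NC]
RewriteRule .? - [F]

# --- IP addresses only ---
RewriteCond %{REMOTE_ADDR} ^91\.207\.9\.226$
RewriteRule .? - [F]
```

Each block then tests exactly one kind of thing (UA, host, or IP), which makes a stray condition much easier to spot.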
3:05 pm on Apr 8, 2014 (gmt 0)

New User

joined:Apr 8, 2014
posts: 8
votes: 0


Ok. I have commented out all but the first RewriteEngine on and RewriteBase / as you suggested. I have also commented out all but the first two (2) sections of bad bots, which should remove all mixed terms. I am not sure what you mean by: Use \ to escape and remember to escape the .
3:29 pm on Apr 8, 2014 (gmt 0)

New User

joined:Apr 8, 2014
posts: 8
votes: 0


I understand what you mean by Use \ to escape, and remember to escape the "." . Looking over the second section of my bots, I have commented out: RewriteCond %{HTTP_USER_AGENT} ^info.wbcrwl.305.09 [NC,OR] . In the second section (which, as I stated above, is one of the two I kept), do I need to use the \ to escape in bots like Ezooms/1.0, changing it to Ezooms/1\.0?
6:56 pm on Apr 8, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle

joined:Aug 4, 2008
posts:2677
votes: 94


do I need to use the \ to escape in bots like Ezooms/1.0 to Ezooms/1\.0

I think the best way is to put quotation marks around the whole expression. Like "Ezooms/1.0". That should always work (I think).
But it might be better in a case like this just to use Ezooms by itself without the /1.0
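For instance, a sketch of the two options, using Ezooms from the list above:

```apache
# Escaped: \. matches only a literal period
RewriteCond %{HTTP_USER_AGENT} Ezooms/1\.0 [NC]
# Simpler: drop the version number, so future releases are caught too
RewriteCond %{HTTP_USER_AGENT} Ezooms [NC]
```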
6:59 pm on Apr 8, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24

joined:Apr 9, 2011
posts:12692
votes: 243


Escape all literal periods in all patterns everywhere in mod_rewrite. Both in the rule and in any conditions.
The only place you don't need to escape is in targets. Literal spaces need to be escaped in Apache only; this is about Apache syntax, not Regular Expressions in general. Sometimes you have a choice between escaping a space and putting the whole package inside quotation marks.

Now, personally I'd shift everything over to mod_setenvif (BrowserMatch and BrowserMatchNoCase), saving RewriteRules for the truly complicated sets of conditions. I don't know if there exist tests comparing speed of mod_rewrite and mod_setenvif on equivalent actions; this is just gut feeling. It's definitely easier to read.
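A minimal sketch of that mod_setenvif approach, using bot names from the lists above and the Apache 2.2 access-control syntax that was current at the time (2.4 would use Require instead):

```apache
# Tag unwanted User-Agents once, case-insensitively
BrowserMatchNoCase "Ezooms"      bad_bot
BrowserMatchNoCase "MJ12bot"     bad_bot
BrowserMatchNoCase "EasouSpider" bad_bot

# Then deny anything carrying the flag (returns 403)
Order Allow,Deny
Allow from all
Deny from env=bad_bot
```

One BrowserMatchNoCase line per pattern, one deny block at the end, and no [OR]/[NC] flag chains to get wrong.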
9:55 pm on Apr 8, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd

joined:July 3, 2002
posts:18903
votes: 0


Install the rulesets one at a time on the server until you get an error.

That will narrow it down. It's likely to be a simple syntax typo somewhere.
2:13 am on Apr 9, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24

joined:Apr 9, 2011
posts:12692
votes: 243


Oh, and: Do you have access to your error logs? Not the general access logs, the error logs by that name. Sometimes they'll divulge something useful when it's a 500-class error.
2:35 am on Apr 9, 2014 (gmt 0)

New User

joined:Apr 8, 2014
posts: 8
votes: 0


I do have access to the log, but there is never anything in it
3:23 am on Apr 9, 2014 (gmt 0)

Moderator from US 


joined:Dec 27, 2006
posts:2556
votes: 48


I have also commented out all but the first two (2) sections of bad bots which should remove all mixed terms.

Your edits will leave all these rules in part 2 in place, and many of them won't work the way you want them to work because of the syntax:

RewriteCond %{HTTP_USER_AGENT} ^woriobot [NC,OR]

This part: ^woriobot tells the server to deliver a 403 only when the UA string starts with "woriobot", but a UA string does not usually have the bot's name at the very beginning. There are other formats that would work better and faster for blocking named UAs. You sort of have some of it in the first blocking rules section - example:
RewriteCond %{HTTP_USER_AGENT} Ja(?:karta|va) [NC,OR]

though I'm not familiar with that exact format, many in the long list in part 2 would work better in a format like:
RewriteCond %{HTTP_USER_AGENT} (Access|Ahrefs|appid|Blog) [NC,OR]

Just because of that ^ issue.
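In other words, something along these lines (names taken from the part 2 list; a sketch, not a drop-in replacement):

```apache
# Unanchored alternation: matches the name anywhere in the UA string
RewriteCond %{HTTP_USER_AGENT} (woriobot|Wotbox|EasouSpider|aiHitBot) [NC]
RewriteRule .? - [F]
```

One condition replaces four, and it still fires when the bot name appears mid-string.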
4:17 am on Apr 9, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness

joined:Nov 11, 2001
posts:5408
votes: 2


RewriteCond %{HTTP_USER_AGENT} P(?:r(?:eview|oxy)|ublish) [NC,OR]
7:49 am on Apr 9, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24

joined:Apr 9, 2011
posts:12692
votes: 243


I'm not familiar with that exact format

It's a Regular Expression with alternation.
Ja(?:karta|va)

starts with "Ja"
next piece is either
karta
or
va
(always wondered about the connection, but have never been moved to look it up).

The ?: no-capture markup isn't essential, but might save a speck of work for the server as it doesn't have to keep things in memory until it passes the next Condition.

The ^ is not an Apache punctuation mark. It's a Regular Expression anchor meaning "The specified content has to come at the very beginning of the text I'm looking at", here the User-Agent.
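Side by side, the difference looks like this (illustrative):

```apache
# Anchored: matches only if the UA string BEGINS with "woriobot"
RewriteCond %{HTTP_USER_AGENT} ^woriobot [NC]
# Unanchored: matches "woriobot" anywhere in the UA string
RewriteCond %{HTTP_USER_AGENT} woriobot [NC]
```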
11:12 pm on Apr 9, 2014 (gmt 0)

New User

joined:Apr 8, 2014
posts: 8
votes: 0


found the issue: RewriteCond %{HTTP_USER_AGENT} ^.{0,20}$ [NC,OR] ... not sure how this got into the htaccess ... but commented it out and problem solved. Thanks to all!
12:33 am on Apr 10, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24

joined:Apr 9, 2011
posts:12692
votes: 243


Yikes. "Any request with a UA string no longer than 20 characters" (with superfluous [NC] since there's no alphabetic content).

I once looked at lengths of UA strings because robots tend to be very short. But assorted people hereabouts pointed out some legitimate short UAs. Drat.

Don't see why it would create a 500-class error, though. Just an awful lot of false positives. Apache does recognize the {numbers-here} notation.
5:14 am on Apr 22, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness

joined:Nov 11, 2001
posts:5408
votes: 2


I do have access to the log, but there is never anything in it


The logs are not accumulating for one of the following reasons:

1) You have not turned your logs on.
2) Your htaccess file generates a constant 500 (due to a syntax error) and no (ZERO) visitors (including yourself) are actually reaching your website.