Forum Moderators: phranque
I've checked it using Wannabrowser, spoofing a blacklisted bot's user-agent, and it works as expected in most cases.
If you try to access robots.txt, you get an HTTP 200 and the contents of the file.
If you try to access a page outside of the root, e.g. http://www.example.com/page.htm, you get an HTTP 403 and the custom error page. However...
If you try to access the site root http://www.example.com you get the desired HTTP 403, but no custom error page - instead it returns an Apache HTTP Server Test Page.
Below is the .htaccess file - any help is greatly appreciated.
<.htaccess file start>
# CUSTOM ERROR PAGES
ErrorDocument 400 /error/400.htm
ErrorDocument 401 /error/401.htm
ErrorDocument 403 /error/403.htm
ErrorDocument 404 /error/404.htm
ErrorDocument 500 /error/500.htm
# EXTENSION FREE URI'S
Options +MultiViews
Options +FollowSymLinks
# 301 REDIRECT WWW/NON-WWW CANONICAL ISSUE
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
# BLOCK BAD BOTS
RewriteCond %{HTTP_USER_AGENT} SurveyBot/2.3 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget
RewriteRule !^(error/403\.htm|robots\.txt)$ - [F,L]
<.htaccess file end>
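To sanity-check the final RewriteRule, here is a minimal Python sketch of its matching logic. Two assumptions: the broken-bar character (¦) in the posted regex was originally a pipe (|) alternation, and, as mod_rewrite does in per-directory (.htaccess) context, the leading slash is stripped before matching, so a request for the site root arrives as an empty string.

```python
import re

# Exclusion pattern from the RewriteRule, with "|" assumed for the broken bar
pattern = re.compile(r"^(error/403\.htm|robots\.txt)$")

def blocked(uri_path: str) -> bool:
    """True if the negated rule would fire [F] for a bad bot.

    RewriteRule !^(...)$ - [F,L]  =>  forbidden when the pattern does NOT match.
    """
    return pattern.match(uri_path) is None

print(blocked(""))               # site root -> True  (403 expected)
print(blocked("page.htm"))       # True  (403 expected)
print(blocked("robots.txt"))     # False (allowed through)
print(blocked("error/403.htm"))  # False (error-page subrequest allowed through)
```

So the 403 on the root is what the rule is supposed to produce; only the substitution of the Test Page for /error/403.htm is unexplained.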
Bubster
I'm wondering whether it really makes any difference what content is returned to the blocked bots, so long as they receive a valid HTTP 403 response.
I've checked the log for wannabrowser and I get the following for a request of the main site root (www.example.com) from a bad bot:
HTTP/1.1 403 Forbidden
Date: Tue, 19 Feb 2008 17:51:53 GMT
Server: Apache
Vary: Host
Accept-Ranges: bytes
Content-Length: 5044
Connection: close
Content-Type: text/html
Does it really matter if it doesn't return my custom 403 error file to the blocked bots? Or am I missing something fundamental?
Thanks again for any assistance.
A bad-bot request to any of the following:
http://www.example.com/page1.htm
http://www.example.com/page2.htm
http://www.example.com/directory1/
http://www.example.com/directory2/ etc.
returns a 403 with my custom error file - as desired.
It is only when the request is to the domain root http://www.example.com/ that the 403 is returned without the custom error page, which is replaced by the standard Apache test page.
I'm wondering if this could be a server configuration issue !?
Bots are using a variety of tactics today that were not formerly used.
One of these is sending an incomplete URL, which serves the same purpose as a ping and draws a 301 (which is actual access), even though the bot or its IP range may be denied.
I have no idea whether that is what is happening in your case, but I'm seeing this issue in my visitor logs more and more frequently.
The only quibble I'd have with it is: why bother redirecting non-canonical requests if the user-agent is a bad bot? Consider reversing the two rule blocks.
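Reversed, the two stanzas from the .htaccess above would look something like this (a sketch only; the broken-bar character in the posted regex is assumed to have been a pipe):

```apache
RewriteEngine On
RewriteBase /

# BLOCK BAD BOTS FIRST - no point issuing a canonical redirect to a denied agent
RewriteCond %{HTTP_USER_AGENT} SurveyBot/2.3 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget
RewriteRule !^(error/403\.htm|robots\.txt)$ - [F,L]

# 301 REDIRECT WWW/NON-WWW CANONICAL ISSUE
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

With this ordering a bad bot requesting http://example.com/page.htm gets a 403 immediately, rather than a 301 followed by a second request that is then refused.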
Jim
I've spoken to my tech support, and apparently the Apache server test page is only served when an index page isn't available (and there is one), so I'm not sure how the request is missing it.
I don't have access to the config files, so I may need to take another approach - oh, and MultiViews doesn't seem to make any difference.
Does anyone know the benefits of using a redirect instead of just using an [F] or [G] flag, which I presume just stops the bot dead?
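For comparison, the flags in question would look something like this (hypothetical rules, not from the file above):

```apache
# [F] - 403 Forbidden: the request is refused on the spot, no second round-trip
RewriteRule ^ - [F,L]

# [G] - 410 Gone: refused on the spot, and tells well-behaved crawlers
# the URL is permanently dead so they should drop it from their index
RewriteRule ^ - [G,L]

# [R=301] - external redirect: the client is sent a new URL and must make
# a second request, which the server then handles (or refuses) separately
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

Both [F] and [G] answer with a single response, whereas a redirect costs an extra request before anything is actually denied.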
Cheers