Blocking a bot that redirects a certain page

         

jake66

5:59 am on Jun 15, 2008 (gmt 0)

10+ Year Member



I've come to notice that a lot of "bots" (the bad ones) don't obey mod_rewrite properly, and this results in them requesting a bunch of broken image URLs on my site.

Instead of viewing:
mysite.com/rewrite/stuff.html
mysite.com/images/header.jpg

they show up in my error logs as:
mysite.com/rewrite/images/header.jpg

I have about 6 or 7 site-wide rules for changing dynamic URLs to static.
How can I redirect someone off my site or whatever, if they disobey one of my rewrite rules?
Is this possible?

And: How do I know these are bots?
I pick a random IP from my error log that hits pages like this and search for the IP. If a zillion results come up (Wikipedia bans, other sites' visitor logs, etc.), I assume it's pretty safe to say I'm dealing with a bot that doesn't identify itself as a bot.

wilderness

1:04 pm on Jun 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



jake66 wrote:
I've come to notice, a lot of "bots" (the bad ones) don't obey mod_rewrite properly and this results in them viewing a bunch of broken images on my site.

If imposed properly?
They have no choice in the matter!

robots.txt allows choice.
htaccess imposes restrictions.

There are no standard rules for recognizing bots that have not followed the protocol of identifying themselves in the UA (User-Agent) portion of the visitor log.
The methods are ever-changing for most bots.

You simply need to be aware of the inter-relation of your content and make a determination about traffic based on that knowledge.

jdMorgan

2:17 pm on Jun 15, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As wilderness says, if your code is implemented properly, then 'bots and browsers have no choice about "obeying" mod_rewrite -- They can't even tell it's there.

So you may have a problem with your code.

The solution to unwanted access is not to redirect the client -- *If* the 'bot actually follows redirects, that just passes off the problem to someone else, and wastes internet bandwidth. The proper response is a simple, low-bandwidth 403-Forbidden response, and be done with it.
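
A minimal sketch of that kind of response, assuming the rules live in a .htaccess file in the document root and that the bad requests all begin with /rewrite/images/, as in the log example earlier in the thread. The pattern is illustrative only and would need adjusting to the real URL layout:

```apache
# Assumes mod_rewrite is enabled and this is the document-root .htaccess.
RewriteEngine On

# Hypothetical pattern, taken from the error-log example in this thread.
# In .htaccess context the URL-path is matched without its leading slash.
# "-" means "no substitution"; [F] returns 403 Forbidden and stops processing.
RewriteRule ^rewrite/images/ - [F]
```

A rule like this would normally go ahead of the dynamic-to-static rules so that it is evaluated first. If ordinary browsers also turn up requesting those paths, the cause is more likely relative image links in the pages than a misbehaving client, and a 403 would only hide that.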

Jim