Forum Moderators: DixonJones


Blocking bots from crawling files not present on server

I want to stop bots from crawling pages that have NEVER existed on the server

         

ashish21cool

12:14 pm on Feb 8, 2007 (gmt 0)

10+ Year Member



Hi Friends,

While going through the raw logs of my sites, I came across a strange error. The logs point to pages that have never existed on my server, and this happens in every day's log. To make it clearer, here is a sample from the error log:
[Wed Jan 31 20:08:41 2007] [error] [client #*$!.#*$!.#*$!.#*$!] File does not exist: C:/IBM HTTP Server/htdocs/en_US/blog
[Wed Jan 31 20:08:47 2007] [error] [client #*$!.#*$!.#*$!.#*$!] File does not exist: C:/IBM HTTP Server/htdocs/en_US/blogs
[Wed Jan 31 20:08:47 2007] [error] [client #*$!.#*$!.#*$!.#*$!] File does not exist: C:/IBM HTTP Server/htdocs/en_US/community
[Wed Jan 31 20:08:48 2007] [error] [client #*$!.#*$!.#*$!.#*$!] File does not exist: C:/IBM HTTP Server/htdocs/en_US/blogs

The #*$!.#*$!.#*$!.#*$! placeholder stands for a number of different IP addresses.
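To see which missing paths are requested most often, and by how many clients, the error log can be tallied with a short script. This is only a minimal sketch: the regular expression assumes the exact line format shown in the sample above, and the client addresses in `sample` are hypothetical values from the documentation range 192.0.2.0/24, since the real ones are masked.

```python
import re
from collections import Counter

# Pattern for "File does not exist" lines, assuming the format in the sample above.
# It is not a general parser for every Apache / IBM HTTP Server log layout.
LINE_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[error\] \[client (?P<client>[^\]]+)\] "
    r"File does not exist: (?P<path>.+)$"
)

def tally_missing(lines):
    """Count hits per missing path and per client address."""
    paths, clients = Counter(), Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            paths[match.group("path")] += 1
            clients[match.group("client")] += 1
    return paths, clients

# Hypothetical client addresses; the paths are the ones from the sample log.
sample = [
    "[Wed Jan 31 20:08:41 2007] [error] [client 192.0.2.10] File does not exist: C:/IBM HTTP Server/htdocs/en_US/blog",
    "[Wed Jan 31 20:08:47 2007] [error] [client 192.0.2.10] File does not exist: C:/IBM HTTP Server/htdocs/en_US/blogs",
    "[Wed Jan 31 20:08:47 2007] [error] [client 192.0.2.77] File does not exist: C:/IBM HTTP Server/htdocs/en_US/community",
    "[Wed Jan 31 20:08:48 2007] [error] [client 192.0.2.77] File does not exist: C:/IBM HTTP Server/htdocs/en_US/blogs",
]

paths, clients = tally_missing(sample)
for path, hits in paths.most_common():
    print(hits, path)
```

In practice the list would be replaced by the lines of the real error log file.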

I would like to know how I can stop these bots from eating up my site's bandwidth, crawling pages that do not exist, and filling the log with these errors.

Thanks, friends, for your help.

mipapage

12:55 pm on Feb 10, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I imagine that this bot is being returned a 404 Not Found, so the bandwidth cost is probably minimal.

In any case, why not use a robots.txt file to keep bots from these areas in case there is something that points there?
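As a rough illustration of the robots.txt idea, the sketch below checks a candidate rule set with Python's standard `urllib.robotparser` before it goes live. The URL paths (`/en_US/blog`, `/en_US/community`) are an assumption derived from the file paths in the log, and `ExampleBot` is a made-up user agent. Keep in mind that robots.txt is only honored by well-behaved bots.

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt content. The paths are assumed from the logged file
# paths under htdocs; adjust them to the real URL layout of the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /en_US/blog
Disallow: /en_US/community
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(path, agent="ExampleBot"):
    """Return True if the given (hypothetical) agent may fetch the path."""
    return parser.can_fetch(agent, path)

# Disallow rules match by prefix, so /en_US/blog also covers /en_US/blogs.
for path in ["/en_US/blog", "/en_US/blogs", "/en_US/community", "/en_US/index.html"]:
    print(path, "allowed" if allowed(path) else "disallowed")
```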

blend27

3:28 pm on Feb 10, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Get the IP range(s) for this "bot" and block them.
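One way to find candidate ranges is to group the client addresses from the log by network and see where the hits cluster. The sketch below uses Python's standard `ipaddress` module; the /24 prefix length and the addresses are assumptions for illustration only (documentation ranges, not real bot addresses).

```python
import ipaddress
from collections import Counter

def group_by_prefix(addresses, prefix_len=24):
    """Count how many addresses fall into each network of the given prefix length."""
    counts = Counter()
    for addr in addresses:
        # strict=False lets us pass a host address and get its enclosing network.
        network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        counts[network] += 1
    return counts

# Hypothetical client addresses taken from the documentation ranges.
clients = ["192.0.2.10", "192.0.2.77", "198.51.100.5", "192.0.2.200"]

ranges = group_by_prefix(clients)
for network, hits in ranges.most_common():
    print(network, hits)
```

Networks with many hits are the ones worth looking at first; a range with a single hit is probably not worth blocking on its own.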

ashish21cool

4:05 am on Feb 17, 2007 (gmt 0)

10+ Year Member



It's not possible to stop these bots by their IP addresses, as they come from many different IP addresses and in large numbers.
Is there any other alternative to this?

Look forward to hearing from you.