192.comAgent

violates robots.txt

     
10:37 pm on Jul 29, 2006 (gmt 0)

5+ Year Member



This ripped through one of our sites last month, violating file and folder Disallows in robots.txt numerous times:

s15184428.rootmaster.info - - [08/Jun/2006:09:21:11 +1000] "GET /robots.txt HTTP/1.1" 200 1818 "-" "192.comAgent"
s15184428.rootmaster.info - - [08/Jun/2006:09:21:11 +1000] "GET / HTTP/1.1" 200 9270 "-" "192.comAgent"

Interestingly, even though 192.com seems to be a UK-oriented site, the IP this bot came from, 217.160.75.202, is located in Germany. I wonder if it is even related to 192. I couldn't find any mention of the bot on the 192 site.
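
For anyone wanting to confirm this kind of violation in their own logs, here is a minimal sketch in Python. It assumes the logs are in the combined format shown above and that you have the text of your robots.txt at hand. The function name `find_violations` and the regular expression are illustrative only, and the matching relies on the standard library's `urllib.robotparser`, which may interpret some rules differently from any particular crawler.

```python
import re
from urllib.robotparser import RobotFileParser

# Combined log format:
# host ident user [time] "METHOD path protocol" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def find_violations(log_lines, robots_txt, agent="192.comAgent"):
    """Return (time, path) for each request by `agent` that robots.txt disallows."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    violations = []
    for line in log_lines:
        m = LOG_RE.match(line.strip())
        if m is None or m.group("agent") != agent:
            continue  # unparseable line, or a different user agent
        path = m.group("path")
        if path == "/robots.txt":
            continue  # fetching robots.txt itself is never a violation
        if not rules.can_fetch(agent, path):
            violations.append((m.group("time"), path))
    return violations
```

Pass it the lines of an access log and the contents of robots.txt; an empty list means no logged request from that agent hit a disallowed path.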

7:21 pm on Aug 7, 2006 (gmt 0)

10+ Year Member



It's also hammering all the sites hosted on my servers with no regard to robots.txt or bandwidth (clocked at 18 pages per second)!
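
A rate like that can be read straight out of the access log. The following is a rough sketch in Python, assuming combined-format log lines with one-second timestamp resolution; `peak_requests_per_second` is a made-up helper name, not part of any server tooling.

```python
import re
from collections import Counter

# Captures the bracketed timestamp and the final quoted user-agent field.
AGENT_LINE = re.compile(
    r'^\S+ \S+ \S+ \[([^\]]+)\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"$'
)


def peak_requests_per_second(log_lines, agent="192.comAgent"):
    """Return (timestamp, count) for the agent's busiest second, or None."""
    per_second = Counter()
    for line in log_lines:
        m = AGENT_LINE.match(line.strip())
        if m and m.group(2) == agent:
            # Log timestamps are whole seconds, so the string is the bucket key.
            per_second[m.group(1)] += 1
    if not per_second:
        return None
    return per_second.most_common(1)[0]
```

This counts every request, not only pages, so images and other assets fetched by the bot would be included in the figure.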

<edited>
In my case the culprit seems to be s15205108.onlinehome-server.info (also in Germany?). I wonder if this is like the Grub project that was running a couple of years ago.
</edited>

[edited by: JKMitchell at 7:24 pm (utc) on Aug. 7, 2006]

7:34 pm on Aug 7, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Same thing here. Right now it's banned by user agent. Hopefully that'll be enough.

11:14 am on Aug 8, 2006 (gmt 0)

5+ Year Member



I've been having trouble with this one too. It totally ignores the robots.txt on all of my sites (I've disallowed it in there too).

The way I've stopped it for now is to call a bit of ASP in the pages I want to keep it out of. It reads the "user-agent" server variable and, if it matches "192.comAgent", redirects to a custom 401 page, which in turn reads the IP (the remote_addr server variable) and redirects the bot back to itself :)

Until whoever is responsible for this bot sorts it out, I'm keeping it out.
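
The same user-agent check can be sketched outside ASP. Below is a minimal Python version written as WSGI middleware. It is not the ASP code described above: instead of the redirect trick it simply answers with 403 Forbidden, and the names `block_agents` and `BLOCKED_AGENTS` are hypothetical. The only thing taken from the WSGI standard is the `HTTP_USER_AGENT` environ key.

```python
# Substrings to look for in the User-Agent header, compared in lowercase.
BLOCKED_AGENTS = ("192.comagent",)


def block_agents(app, blocked=BLOCKED_AGENTS):
    """Wrap a WSGI app so requests from blocked user agents get a 403."""

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(name in agent for name in blocked):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Anything else is passed through to the wrapped application untouched.
        return app(environ, start_response)

    return middleware
```

Wrapping an existing application is then a one-liner such as `application = block_agents(application)`. A bot that changes its user-agent string will get past this, so it is a stopgap in the same spirit as the ASP version.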

8:40 pm on Aug 12, 2006 (gmt 0)

10+ Year Member



The way I've stopped it for now is to call a bit of ASP in the pages I want to keep it out of. It reads the "user-agent" server variable and, if it matches "192.comAgent", redirects to a custom 401 page, which in turn reads the IP (the remote_addr server variable) and redirects the bot back to itself :)

That's a neat idea and might give the bot the idea that it's not wanted.

 
