
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
192.comAgent
violates robots.txt
Mokita




msg:3027560
 10:37 pm on Jul 29, 2006 (gmt 0)

This ripped through one of our sites last month, violating file and folder Disallow rules in robots.txt numerous times:

s15184428.rootmaster.info - - [08/Jun/2006:09:21:11 +1000] "GET /robots.txt HTTP/1.1" 200 1818 "-" "192.comAgent"
s15184428.rootmaster.info - - [08/Jun/2006:09:21:11 +1000] "GET / HTTP/1.1" 200 9270 "-" "192.comAgent"

Interestingly, even though 192.com seems to be a UK-oriented site, the IP this bot came from, 217.160.75.202, is located in Germany. I wonder if it is even related to 192.com. I couldn't find any mention of the bot on the 192.com site.
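
For anyone wanting to check their own logs for the same behaviour, here is a minimal Python sketch that parses combined-format log lines like the two above and flags requests a given robots.txt would have disallowed. The `ROBOTS_TXT` rules and the names `LOG_RE` and `find_violations` are made up for illustration; the actual robots.txt for the site is not shown in this thread.

```python
import re
from urllib.robotparser import RobotFileParser

# Matches host, request path, status, and user agent in a combined-format log line.
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

# Hypothetical rules, used only so the sketch has something to check against.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""


def find_violations(log_lines, robots_txt=ROBOTS_TXT):
    """Return (host, path, agent) for each request that robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    violations = []
    for line in log_lines:
        match = LOG_RE.match(line.strip())
        if match is None:
            continue  # skip lines that are not in the expected format
        agent, path = match.group("agent"), match.group("path")
        if not parser.can_fetch(agent, path):
            violations.append((match.group("host"), path, agent))
    return violations
```

Feeding it the lines of an access log gives a list of the offending requests, which can then be grouped by host or user agent.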

 

JKMitchell




msg:3037682
 7:21 pm on Aug 7, 2006 (gmt 0)

It's also hammering all the sites hosted on my servers with no regard to robots.txt or bandwidth (clocked at 18 pages per second)!

<edited>
In my case the culprit seems to be s15205108.onlinehome-server.info (also in Germany?). I wonder if this is like the Grub project that was running a couple of years ago.
</edited>

[edited by: JKMitchell at 7:24 pm (utc) on Aug. 7, 2006]
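
A rate like "18 pages per second" can be measured from the access log by counting how many requests share the same timestamp. A rough Python sketch, assuming timestamps in the Apache format shown earlier in the thread (for example `08/Jun/2006:09:21:11 +1000`) and an English locale for the month names; `peak_requests_per_second` is a hypothetical helper name:

```python
from collections import Counter
from datetime import datetime


def peak_requests_per_second(timestamps):
    """Return the largest number of requests logged within a single second."""
    counts = Counter(
        # %z parses the "+1000" offset, so equal instants compare as equal.
        datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")
        for ts in timestamps
    )
    return max(counts.values(), default=0)
```

Pass in only the timestamps of the requests from the bot in question, otherwise normal visitor traffic gets counted too.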

GaryK




msg:3037710
 7:34 pm on Aug 7, 2006 (gmt 0)

Same thing here. Right now it's banned by user agent. Hopefully that'll be enough.
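
Banning by user agent is usually done in the web server configuration, but the idea can be sketched in Python as a small WSGI wrapper. This is only an illustration of the technique, not the setup actually used by the poster; `BLOCKED_AGENTS`, `is_blocked`, and `block_agents` are made-up names.

```python
# Lower-case substrings of user agents to refuse.
BLOCKED_AGENTS = ("192.comagent",)


def is_blocked(user_agent):
    """Return True if the user-agent string contains a blocked token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENTS)


def block_agents(app):
    """Wrap a WSGI app so that blocked agents get a 403 before the app runs."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware
```

The match is case-insensitive, so it catches both "192.comAgent" and "192.comagent". A bot that changes its user-agent string will get through, which is why this may not be enough on its own.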

marktheleg




msg:3038401
 11:14 am on Aug 8, 2006 (gmt 0)

I've been having trouble with this one too. It totally ignores the robots.txt on all of my sites (I've disallowed it in there too).

The way I've stopped it for now is to call a bit of ASP in the pages I want to keep it out of. The script reads the "user-agent" server variable and, if it matches "192.comAgent", redirects the request to a custom 401 page, which in turn reads the IP (the remote_addr server variable) and redirects it back to itself :)

Until whoever is responsible for this bot sorts it out, I'm keeping it out.
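
The original is ASP and is not shown, so the following is only a Python sketch of the logic as described. It assumes "redirects it back to itself" means sending the bot to its own IP address; `BLOCKED_PAGE` and `respond` are hypothetical names.

```python
BLOCKED_PAGE = "/blocked"  # hypothetical path standing in for the custom 401 page


def respond(path, user_agent, remote_addr):
    """Return (status, headers) for a request, following the described trap."""
    if "192.comagent" not in (user_agent or "").lower():
        return "200 OK", []  # everyone else is served normally
    if path != BLOCKED_PAGE:
        # Step 1: a protected page sends the bot to the custom error page.
        return "302 Found", [("Location", BLOCKED_PAGE)]
    # Step 2 (assumed reading): the error page sends the bot to its own address.
    return "302 Found", [("Location", f"http://{remote_addr}/")]
```

Whether the bot actually follows the redirects is another matter; a plain 403 is cheaper for the server if it does not.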

JKMitchell




msg:3044147
 8:40 pm on Aug 12, 2006 (gmt 0)

"The way I've stopped it for now is to call a bit of ASP in the pages I want to keep it out of. The script reads the "user-agent" server variable and, if it matches "192.comAgent", redirects the request to a custom 401 page, which in turn reads the IP (the remote_addr server variable) and redirects it back to itself :)"

That's a neat idea, and it might give the bot the hint that it's not wanted.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved