
Search Engine Spider and User Agent Identification Forum

    
192.comAgent violates robots.txt
Mokita

5+ Year Member



 
Msg#: 3027558 posted 10:37 pm on Jul 29, 2006 (gmt 0)

This ripped through one of our sites last month, violating the file and folder Disallows in robots.txt numerous times:

s15184428.rootmaster.info - - [08/Jun/2006:09:21:11 +1000] "GET /robots.txt HTTP/1.1" 200 1818 "-" "192.comAgent"
s15184428.rootmaster.info - - [08/Jun/2006:09:21:11 +1000] "GET / HTTP/1.1" 200 9270 "-" "192.comAgent"
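To confirm from the log that requests like these really break the site's rules, each requested path can be checked against robots.txt with Python's standard `urllib.robotparser`. The following is a minimal sketch that assumes Apache combined log format lines like the two above. The robots.txt in the code is hypothetical, because the site's real rules aren't shown, and `violations` is an illustrative name:

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the real file from this thread is not shown.
ROBOTS_TXT = """\
User-agent: 192.comAgent
Disallow: /

User-agent: *
Disallow: /private/
"""

# Apache combined log format:
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def violations(log_lines, robots_txt):
    """Yield (host, path, agent) for each request that robots.txt disallows for that agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        path, agent = m["path"], m["agent"]
        if path == "/robots.txt":
            continue  # fetching robots.txt itself is always allowed
        if not parser.can_fetch(agent, path):
            yield m["host"], path, agent
```

Feeding it the two log lines above flags only the `GET /` request, and only if robots.txt actually disallows `/` for that agent.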

Interestingly, even though 192.com seems to be a UK-oriented site, the IP this bot came from, 217.160.75.202, is located in Germany. I wonder whether it is even related to 192.com. I couldn't find any mention of the bot on the 192.com site.

 

JKMitchell

5+ Year Member



 
Msg#: 3027558 posted 7:21 pm on Aug 7, 2006 (gmt 0)

It's also hammering all the sites hosted on my servers with no regard to robots.txt or bandwidth (clocked at 18 pages per second)!
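A figure like 18 pages per second can be checked from the access log by counting how many requests each host made within the same one-second timestamp. The following is a minimal Python sketch that assumes Apache common/combined log format, like the log excerpt earlier in the thread. It reports the busiest single second per host, not a sustained average, and `peak_rate` is an illustrative name:

```python
import re
from collections import Counter
from datetime import datetime

# Host and [dd/Mon/yyyy:HH:MM:SS zone] timestamp at the start of a common/combined log line.
LINE_RE = re.compile(r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\]')
TIME_FMT = "%d/%b/%Y:%H:%M:%S %z"


def peak_rate(log_lines):
    """Return {host: highest number of requests logged within any single second}."""
    per_second = Counter()
    for line in log_lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines without a recognisable host/timestamp prefix
        stamp = datetime.strptime(m["time"], TIME_FMT)
        per_second[(m["host"], stamp)] += 1

    peaks = {}
    for (host, _second), count in per_second.items():
        peaks[host] = max(peaks.get(host, 0), count)
    return peaks
```

Because Apache timestamps only have one-second resolution, this is the finest rate the log can show.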

In my case the culprit seems to be s15205108.onlinehome-server.info (also in Germany?). I wonder if this is like the Grub project that was running a couple of years ago.

[edited by: JKMitchell at 7:24 pm (utc) on Aug. 7, 2006]

GaryK

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 3027558 posted 7:34 pm on Aug 7, 2006 (gmt 0)

Same thing here. Right now it's banned by user agent. Hopefully that'll be enough.
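The post doesn't say how the user-agent ban is implemented. A server rewrite rule would do it just as well. One possible approach is Python WSGI middleware that answers 403 Forbidden to any request whose User-Agent contains the bot's token. This is a minimal sketch that assumes the site is served by a WSGI application. `block_user_agents` and `BLOCKED_AGENTS` are hypothetical names. Matching is case-insensitive because the agent string appears in this thread as both "192.comAgent" and "192.comagent":

```python
# Hypothetical block list; tokens are compared against the lower-cased User-Agent.
BLOCKED_AGENTS = ("192.comagent",)


def block_user_agents(app, blocked=BLOCKED_AGENTS):
    """Wrap a WSGI app so requests whose User-Agent contains a blocked token get 403."""

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in agent for token in blocked):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
            )
            return [body]
        # Everyone else is passed through to the wrapped application unchanged.
        return app(environ, start_response)

    return middleware
```

A user-agent ban only holds as long as the bot keeps sending the same User-Agent string.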

marktheleg

5+ Year Member



 
Msg#: 3027558 posted 11:14 am on Aug 8, 2006 (gmt 0)

I've been having trouble with this one too. It totally ignores robots.txt on all of my sites (I've disallowed it in there too).

The way I've stopped it for now is with a bit of ASP in the pages I want to keep it out of. The script reads the "user-agent" server variable and, if it matches "192.comAgent", redirects the request to a custom 401 page. That page in turn reads the client's IP (the remote_addr server variable) and redirects the bot back to itself :)
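The ASP code itself isn't posted. The same two-hop flow can be sketched as Python WSGI middleware, assuming a WSGI-served site. `redirect_bot`, `BOT_TOKEN` and the `/denied` path (standing in for the custom 401 page) are hypothetical, and both hops in this sketch are plain 302 redirects:

```python
BOT_TOKEN = "192.comagent"  # compared against the lower-cased User-Agent
DENIED_PATH = "/denied"     # hypothetical stand-in for the custom 401 page


def redirect_bot(app):
    """Send matching bots to DENIED_PATH, and from there back to their own IP address."""

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if BOT_TOKEN not in agent:
            return app(environ, start_response)  # normal visitors are untouched

        if environ.get("PATH_INFO", "/") != DENIED_PATH:
            # Hop 1: any protected page redirects the bot to the "denied" page.
            start_response("302 Found", [("Location", DENIED_PATH)])
            return [b""]

        # Hop 2: the denied page redirects the bot to its own address (REMOTE_ADDR).
        client_ip = environ.get("REMOTE_ADDR", "")
        if not client_ip:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        host = f"[{client_ip}]" if ":" in client_ip else client_ip  # bracket IPv6 literals
        start_response("302 Found", [("Location", f"http://{host}/")])
        return [b""]

    return middleware
```

Usage would be along the lines of `application = redirect_bot(application)` in the site's WSGI entry point. Whether the bot actually follows redirects is up to the bot.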

Until whoever is responsible for this bot sorts it out, I'm keeping it out.

JKMitchell

5+ Year Member



 
Msg#: 3027558 posted 8:40 pm on Aug 12, 2006 (gmt 0)

marktheleg wrote:
"The way i've stopped it for now is to call a bit of asp in the pages i want to keep it out of that requests the "user-agent" server variable and if it matches the "192.comagent" redirects it to a custom 401 page, which in turn requests the ip (remote_addr server variable) and redirects it back to itself :)"

That's a neat idea, and it might give the bot the idea that it's not wanted.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved