Secret ADSL Crawlers

         

papamaku

12:06 pm on Jul 21, 2004 (gmt 0)

10+ Year Member



For the last few weeks I've been getting hit heavily (a new page every 10 secs or so) from:

ip - 213.bb.cc.dd
nslookup - act****-dsl.example.com
location - example, USA
user agent - "Mozilla/4.0 (compatible; MSIE 4.01; Windows 98)"

Seems like a home user running an IE-based crawler on their ADSL line.

Now I don't mind being crawled, I just like people to clearly state who/what they are.

[edited by: Brett_Tabke at 9:36 pm (utc) on July 21, 2004]

Brett_Tabke

1:11 pm on Jul 21, 2004 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Yes, it was definitely a home user there.

We try not to post the IPs of just any user, since it could lead to problems for that person.

There are so many bots out there right now that if we started posting the home-run bots, this forum would have hundreds of thousands of IPs dumped in it.

So, we try to stick to just the known search engine IPs.

wilderness

2:10 pm on Jul 21, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



hello papamaku,
For these unidentified crawls your options are limited.
1) You either implement a bot trap, which stops them dead in their tracks
( [webmasterworld.com...] )

2) or you begin denying visitors based on either the UA (which in this instance does not offer anything distinctive to key on), or you deny IP ranges with either SetEnvIf or rewrites.
[webmasterworld.com...]
(With manual implementation you merely prevent that visitor from returning with the same UA or from within the same IP range.)
With the advent of DSL/cable and shared rather than fixed ISP ranges, it's getting quite difficult to eliminate bad visitors while keeping the number of innocent visitors caught by the same block to a minimum.
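
As a rough illustration of the manual route (SetEnvIf or rewrites), a minimal .htaccess sketch might look like the one below. It assumes Apache with mod_setenvif and the older Order/Allow/Deny access directives (on Apache 2.4 these need mod_access_compat or the Require equivalents). The range 192.0.2.0/24 is a reserved documentation range standing in for the obscured address, and the variable name unwanted_crawler is made up for the example.

```apache
# Variant A: flag the range with SetEnvIf, then deny flagged requests.
# 192.0.2.x is a placeholder range, not the address from this thread.
SetEnvIf Remote_Addr "^192\.0\.2\." unwanted_crawler
Order Allow,Deny
Allow from all
Deny from env=unwanted_crawler

# Variant B: the same block done with mod_rewrite instead.
# Use one variant or the other; both return 403 Forbidden.
RewriteEngine On
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.
RewriteRule .* - [F]
```

Either variant refuses every address in the range, which is exactly the trade-off described above: a wide pattern also shuts out innocent visitors on the same ISP, so the narrower the range, the better.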

(On a side note, I have the above conflict with both Eastern US IPs and RIPE IPs. My situation is quite unusual, and upon inquiry I just apologize and inform the inquiring innocent visitor that IF their provider can give them either a "fixed or very narrow octet range", then I'm willing to match that effort with an htaccess entry.)
[These extremes would not be required if ISPs would only enforce their own UAGs against their customers; they apparently cannot see how these intrusions affect their other customers.]

papamaku

8:24 am on Jul 22, 2004 (gmt 0)

10+ Year Member



Brett - I wasn't sure what info I could/couldn't post, so I just sent it all, knowing you guys were currently pre-modding.

How about putting a post template for spider info in the charter (as most people will have the same kind of question/info) and listing what can/can't be posted?

Wilderness, yeah I've let him have a good run of it now + am gonna .htaccess him outta there :)

volatilegx

4:46 pm on Jul 23, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



papamaku,

Your post was OK. Brett obscured the IP address and hostname because they appear to belong to a private individual and not a search engine spider.

Dan