cloaked spider from 66.220.7.* and 66.220.20.*

relentless .html + .txt spider

         

Hetta

5:13 pm on Jan 19, 2006 (gmt 0)

10+ Year Member



This particular bot picked up about 1000 .html and .txt files over the last 10 days. The UA is Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322).

Those IPs resolve to Hurricane Electric, which I've now blocked; I have a severe allergy to cloaked spiders.

Related: plinki, in the IP block 66.220.23.192/27. There's a plinki hit among all the cloaked spider's hits, and plinki picked up robots.txt where the cloaked spider didn't.

One robots.txt over 10 days? Shouldn't they pick that up once a day or so?
... perhaps they have even more IPs. Sigh.
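
For anyone wanting to check log entries against the ranges above, here is a minimal sketch in Python. It assumes the "*" wildcards in the thread title mean full /24 networks; the names BLOCKED_NETWORKS and is_blocked are made up for illustration and are not from any particular server or tool.

```python
import ipaddress

# Ranges from the thread title. Treating "66.220.7.*" and "66.220.20.*"
# as /24 networks is an assumption about what the wildcard covers.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("66.220.7.0/24"),
    ipaddress.ip_network("66.220.20.0/24"),
]


def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside one of the blocked networks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        # Not a parseable IP address (e.g. a malformed log field).
        return False
    return any(addr in net for net in BLOCKED_NETWORKS)
```

The plinki block (66.220.23.192/27) could be appended to the same list if it should be denied as well; it is left out here because the post only says the cloaked spider's ranges were blocked.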

volatilegx

5:03 am on Jan 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hurricane Electric is mentioned fairly often in this forum as being a host for unwanted spiders :)

incrediBILL

6:36 am on Jan 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you're using PHP, try serving your robots.txt file from a PHP script and make it a spider trap: block whoever asks for it, in real time, if they aren't whitelisted. Spider be gone.

I'm whitelisting Google, Yahoo, MSN, Teoma and Gigablast (for now); the rest hit the skids.
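
The trap logic described above might look roughly like this. It is a sketch in Python rather than PHP, and not the poster's actual script: the user-agent tokens, the in-memory blocked_ips set, and the function names are all assumptions for illustration. Matching on the user-agent string alone is easy to spoof, so a real setup would probably also verify the requester's IP.

```python
# Hypothetical user-agent tokens for the whitelisted engines
# (Google, Yahoo, MSN, Teoma, Gigablast). These are assumptions.
WHITELISTED_UA_TOKENS = ("googlebot", "slurp", "msnbot", "teoma", "gigabot")

# Placeholder robots.txt content served to whitelisted crawlers.
ROBOTS_BODY = "User-agent: *\nDisallow:\n"

# In-memory block list; a real site would persist this somewhere.
blocked_ips = set()


def is_whitelisted(user_agent: str) -> bool:
    """Case-insensitive check for a whitelisted crawler token."""
    ua = user_agent.lower()
    return any(token in ua for token in WHITELISTED_UA_TOKENS)


def serve_robots_txt(ip: str, user_agent: str):
    """Handle a robots.txt request; block the requester if not whitelisted."""
    if is_whitelisted(user_agent):
        return 200, ROBOTS_BODY
    # Anything else that asks for robots.txt is treated as an unwanted bot.
    blocked_ips.add(ip)
    return 403, "Forbidden\n"


def is_allowed(ip: str) -> bool:
    """Check run on every later request from this address."""
    return ip not in blocked_ips
```

Ordinary browsers rarely request robots.txt, which is what makes the file usable as a trap. Note that it would not have caught the cloaked spider in this thread, since that one never fetched robots.txt at all.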

blend27

3:43 pm on Feb 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Every time I see an IP in my logs that belongs to the Hurricane Electric IP ranges, it turns out to be a scraper bot. In fact, I was just looking for an explanation of why a page would be requested from a completely orphan IP when the page can only be found on MSN by running a "site:" command, and even then it's on page 6 or 7 of the results.

I personally block all requests that come from HE ranges.

As incrediBILL says, if it's not whitelisted - .....