Net-sweeper is a service that "filters" web pages for clients, a bit like Net Nanny, I guess. My understanding is that when you request a URL, they fetch the page first (if it is not already in their database) and check it for "offensive" words. Like I said, that is only my understanding...
In practice, I have seen this all over my sites. As you note, it does NOT respect robots.txt. (My guess is that it is trying to anticipate its clients' next click, so it downloads all the links from the first page it visits, regardless of robots.txt.) I have the original IP address banned because of this practice. I am not sure how this affects their clients... but frankly I do not care.
dave
You can use mod_rewrite:
RewriteCond %{REMOTE_ADDR} ^66\.207\.120\.(22[4-9]|2[34][0-9]|25[0-5])$
RewriteRule .* - [F]
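A quick way to sanity-check a range regex like this is to compare it, address by address, against the CIDR block it is meant to cover. The sketch below assumes the target is 66.207.120.224/27 (.224 through .255), which is what the 22x and 25x endpoints suggest. It also shows that keeping only the 22[4-9] and 25[0-5] alternatives silently skips .230 through .249.

```python
import ipaddress
import re

# Assumption: the intended target is 66.207.120.224/27 (.224 through .255).
BLOCK = ipaddress.ip_network("66.207.120.224/27")

# Corrected pattern: adds 2[34][0-9] to cover .230-.249 and anchors with $
# instead of a trailing "\." (REMOTE_ADDR has no dot after the last octet).
# Python's re agrees with Apache's PCRE for an expression this simple.
FIXED = re.compile(r"^66\.207\.120\.(22[4-9]|2[34][0-9]|25[0-5])$")

# The original two alternatives, with the pipe repaired, for comparison.
ORIGINAL = re.compile(r"^66\.207\.120\.(22[4-9]|25[0-5])$")


def mismatches(pattern):
    """List addresses in 66.207.120.0/24 where the regex and the /27 disagree."""
    bad = []
    for last in range(256):
        addr = f"66.207.120.{last}"
        matched = pattern.match(addr) is not None
        inside = ipaddress.ip_address(addr) in BLOCK
        if matched != inside:
            bad.append(addr)
    return bad


if __name__ == "__main__":
    print(mismatches(FIXED))          # []
    print(len(mismatches(ORIGINAL)))  # 20 (.230 through .249 are missed)
```

An empty list for the corrected pattern means it blocks exactly the /27 and nothing else in that /24.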
BTW, there was a recent discussion in which the two backbones, Fibre Wired/Hamilton Hydro and Guelph Hydro, were mentioned.
[webmasterworld.com...]
On this one, I am not sure if I would ban the entire net block...
Given that this one says it does, and given my summary above (if accurate), 66.207.120.227 is probably all that needs to be banned.
I THINK that 66.207.120.227 is their "preview" bot, which downloads the page and looks for flagged words or phrases. The rest of the IP range COULD be legitimate users.
On this one, I would urge caution about banning it all... I think you will be fine JUST banning the single IP.
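To make the trade-off concrete, here is a small sketch comparing a single-IP ban with a ban on the whole range. The /27 boundary is an assumption inferred from the 224-255 range in the earlier rule, and the helper names are hypothetical.

```python
import ipaddress

# Hypothetical policies for illustration: ban only the suspected "preview"
# bot, or ban the whole assumed /27 it sits in.
SINGLE_IP = {ipaddress.ip_address("66.207.120.227")}
WHOLE_BLOCK = ipaddress.ip_network("66.207.120.224/27")


def banned_single(addr):
    """Ban only the one address believed to be the fetching bot."""
    return ipaddress.ip_address(addr) in SINGLE_IP


def banned_block(addr):
    """Ban every address in the (assumed) net block."""
    return ipaddress.ip_address(addr) in WHOLE_BLOCK


def collateral(addrs):
    """Addresses the block ban refuses but the single-IP ban would allow."""
    return [a for a in addrs if banned_block(a) and not banned_single(a)]


if __name__ == "__main__":
    print(collateral(["66.207.120.227", "66.207.120.230", "10.0.0.1"]))
    # ['66.207.120.230']
```

Anything `collateral` returns is a possibly legitimate visitor you lose by banning the block. In mod_rewrite terms, the single-IP version is just `RewriteCond %{REMOTE_ADDR} ^66\.207\.120\.227$` followed by `RewriteRule .* - [F]`.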
dave
I can't tell you how many times my laxity in IP denies has permitted some pest to return and grab way too many pages.
My zealousness may not be appropriate for everybody, but it fits my situation.
I've been getting hit by more than a few of FibreWired/Hamilton Hydro's users for nearly two years. It would be nice if I could deny on a UA and not close the door on all their users.
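Denying on the User-Agent instead of the address boils down to a case-insensitive substring or regex test on that header, which is what `RewriteCond %{HTTP_USER_AGENT} ... [NC]` does in mod_rewrite. The thread never says which UA string the offending clients send, so `ExampleFilterBot` below is a hypothetical placeholder.

```python
import re

# Hypothetical signature: the real User-Agent string is unknown, so
# "ExampleFilterBot" stands in for whatever the bot actually sends.
BAD_UA = re.compile(r"examplefilterbot", re.IGNORECASE)


def deny_by_ua(user_agent):
    """Return True if the request should be refused based on its User-Agent."""
    # A missing header is treated as an empty string rather than refused.
    return bool(BAD_UA.search(user_agent or ""))
```

The catch is that a User-Agent is whatever the client chooses to send, so this only helps while the bot keeps identifying itself consistently.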
ONLY Hamilton Hydro can solve that dilemma, and like most ISPs they hardly have their hearing aid on, today or any other day, when it comes to webmaster issues :(
Don
Thanks for the consideration and thought, though.
Not a problem! You are a bit more, well, reactive than I am. I think I am a bit strong sometimes, too. (I am really hard on anyone stealing my images, but that is my business, as in my product!)
But to each his or her own... I am just glad we share info like this, so we can all make up our own minds how far to go!
<aside>Sometimes I really wish I COULD just cut off all of APNIC like you. But I do get some sales from there. Not many in number, but usually high-dollar amounts... so, again, we differ. ;) </aside>
dave
Initially I put a 403 block into effect, but the badly designed bot just kept hammering away. In the end I had to block them at the IP level. I'm going to send these characters an invoice for the problems they have caused.
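A 403 only changes the response. The bot still connects and keeps asking. One way to spot a client like this is to count 403 responses per address in the access log. Addresses that keep piling them up are candidates for blocking lower down, at the IP level. This sketch assumes Apache's Common or Combined Log Format, and the threshold is an arbitrary, hypothetical choice.

```python
import re
from collections import Counter

# Matches the start of an Apache Common/Combined Log Format line:
# host ident authuser [timestamp] "request" status ...
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) ')


def persistent_403s(lines, threshold):
    """Return IPs that received at least `threshold` 403 responses."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group(2) == "403":
            counts[m.group(1)] += 1
    return sorted(ip for ip, n in counts.items() if n >= threshold)
```

Feeding it an open log file, e.g. `persistent_403s(open("access_log"), 100)`, lists the addresses that ignored repeated refusals.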
Regards...jmc