Forum Moderators: phranque
walrus said: ...here's an aggressive new scraper you might want to block with htaccess: 38.100.41.112
What was the bot in question? What user-agent was declared?
Thank you.
Eliz.
Mozilla/4.0 (compatible; MSIE 6.0; Windows XP
Not sure whether the UA you've provided is incomplete as the result of a copy-and-paste error
OR
whether the UA is EXACTLY as shown?
If it is exact, then the trailing ) is missing, and that makes it a fake UA.
I seem to recall that some folks were at one time using an "ends with XP" denial; however, my recollection could be playing tricks on me.
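A minimal sketch of such an "ends with XP" denial, assuming Apache 2.2-style access control (the same Order/Allow/Deny syntax used later in this thread) with mod_setenvif available. The variable name bad_ua is a hypothetical label. Note that the pattern only catches a UA that literally ends in "XP", i.e. the truncated form quoted above, and not the complete "...Windows XP)" string.

```apache
# Hypothetical example: flag any request whose User-Agent ends in "XP"
# with no closing parenthesis; a well-formed MSIE UA closes with ")".
SetEnvIf User-Agent "XP$" bad_ua

# Apache 2.2 access control: allow everyone first,
# then deny any request carrying the bad_ua variable.
Order Allow,Deny
Allow from all
Deny from env=bad_ua
```

If your .htaccess already contains Order/Allow/Deny lines, add only the SetEnvIf line and the "Deny from env=bad_ua" line rather than a second Order directive.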
Why do they do that?
1) They do not want their active web pages harvested by anything less than the major bots, regardless of criteria.
2) They resent either the explanation given, or the lack of any accepted Internet protocol requiring bots to comply with robots.txt.
3) They do NOT want any association with the person, IP range, or company behind the spidering.
There are a multitude of other reasons, which I have omitted.
There's an old thread in forum #11 in which Jim and others provided some valid explanations.
38.100.41.112 - - [02/Nov/2007:23:51:29 -0400] "GET / HTTP/1.1" 200 11312 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows XP)"
order allow,deny
deny from 38.
allow from all
Is this correct?
Yes and perhaps ;)
That is, if you have an existing file and you're just adding those lines.
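One caveat on the rule above: in Apache 2.2, a partial address such as "38." matches every host whose IP begins with 38, i.e. the whole 38.0.0.0/8 block, not just the scraper. A minimal sketch of narrower alternatives, assuming the same Apache 2.2 Order/Allow/Deny syntax; which range to block is a judgment call, and the 38.100.41. range below is only an illustrative choice, not a known allocation boundary for this scraper.

```apache
order allow,deny
allow from all
# Block only the single address seen in the log line above.
deny from 38.100.41.112
# Or, if the bot rotates addresses, a partial IP covering 38.100.41.0-255
# (equivalent to the CIDR form 38.100.41.0/24):
# deny from 38.100.41.
```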
Here are some links to assorted explanations and examples.
Some old threads:
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]
<snip>
[edited by: jdMorgan at 12:39 am (utc) on Nov. 7, 2007]
[edit reason] Removed URLs per TOS. [/edit]