Looks like it was hitting about ten pages per second. VERY bad!
Plus, it is STUPID. I have URLs in mixed upper and lower case; all the requests were lower case.
I do not know for sure that it follows robots.txt, but I did not catch it cheating. It hit me 120 times.
The IP address checks out to a Santa Barbara location, which makes sense for Commission Junction.
Here is a hit:
207.71.241.81 - - [08/Apr/2003:03:21:24 -0600] "GET /a/lower/case/path/that/is/wrong HTTP/1.1" 404 11948 "-" "CJ Spider/"
dave
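To check figures like "about ten pages per second" and "hit me 120 times" against your own logs, you can count the spider's requests and group them by timestamp. Below is a minimal Python sketch, assuming Apache's combined log format as in the line above. The `spider_stats` function and the `COMBINED` regex are hypothetical helpers written for illustration, not part of any tool mentioned in this thread. Log timestamps only have one-second resolution, so the peak count for a single timestamp is the pages-per-second figure.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical regex for Apache's "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def spider_stats(lines, agent_prefix="CJ Spider/"):
    """Return (total hits, peak hits in any one second) for one user agent."""
    per_second = Counter()
    for line in lines:
        m = COMBINED.match(line)
        # Skip unparseable lines and requests from other user agents.
        if not m or not m.group("agent").startswith(agent_prefix):
            continue
        # Timestamps look like 08/Apr/2003:03:21:24 -0600
        ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        per_second[ts] += 1
    total = sum(per_second.values())
    peak = max(per_second.values(), default=0)
    return total, peak

log = [
    '207.71.241.81 - - [08/Apr/2003:03:21:24 -0600] '
    '"GET /a/lower/case/path/that/is/wrong HTTP/1.1" 404 11948 "-" "CJ Spider/"',
]
print(spider_stats(log))  # (1, 1)
```

Fed a whole day's access log instead of the single sample line, the first number is the total hit count and the second is the worst one-second burst.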
207.71.241.81 is definitely a CJ IP address. I have seen it in previous log entries when CJ support had checked something on my site. If you block that IP address, you will also block all human visitors from CJ. It is probably best to deny it by user agent instead.
And yes, I got hit by this one for the first time today.
Ted
The dilemma is whether banning the spider would trigger an automatic review of a website, or would simply eliminate the unwanted bandwidth usage.
Ted
>> Using mod_rewrite what would be the correct syntax to block 'CJ Spider/'?
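# mod_rewrite in a .htaccess (per-directory) context requires FollowSymLinks
Options +FollowSymLinks
RewriteEngine on
# Match requests whose User-Agent header begins with "CJ Spider/"
RewriteCond %{HTTP_USER_AGENT} ^CJ\ Spider/
# For any URL, make no substitution (-) and return 403 Forbidden
RewriteRule .* - [F]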
HTH,
Jim
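One way to sanity-check the RewriteCond pattern before uploading it is to run the same regular expression against a few sample user-agent strings. This is a sketch in Python, assuming the pattern behaves the same under Python's `re` module as under Apache's PCRE, which should hold for a simple pattern like this one. `would_be_forbidden` is a hypothetical helper name. Because `RewriteRule .*` matches every URL, the condition alone decides whether a request gets a 403.

```python
import re

# Same pattern as the RewriteCond: "CJ Spider/" anchored at the start.
# The backslash-escaped space matches a literal space.
CJ_SPIDER = re.compile(r"^CJ\ Spider/")

def would_be_forbidden(user_agent: str) -> bool:
    """True if the RewriteCond would match, i.e. the request would get a 403."""
    return CJ_SPIDER.search(user_agent) is not None

for ua in ["CJ Spider/", "Mozilla/4.0 (compatible; MSIE 6.0)", "cj spider/"]:
    print(f"{ua!r}: {'403' if would_be_forbidden(ua) else 'allowed'}")
# 'CJ Spider/': 403
# 'Mozilla/4.0 (compatible; MSIE 6.0)': allowed
# 'cj spider/': allowed
```

The match is case-sensitive and anchored at the start of the string; adding the [NC] flag to the RewriteCond would make it case-insensitive. Since only the User-Agent header is tested, human visitors arriving from the same CJ IP address are not blocked.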
Note that their documentation on the above page indicates that their spider
will visit, on a daily basis, all registered sites and all pages that have generated traffic within the past 30 days
Ted
>> will visit, on a daily basis, all registered sites and all pages that have generated traffic within the past 30 days
That is a bit of an overstatement, don't you think? They might TRY, but if they tried to visit EVERY page on my site EVERY day, they'd have to pull in 150,000 to 200,000 pages a DAY (at a guess). And if they try THAT...
dave
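For a sense of scale, dave's guess of 150,000 to 200,000 pages can be turned into a sustained crawl rate and compared with the roughly ten pages per second reported at the top of the thread. A small Python sketch of that arithmetic; the page counts are dave's estimate, not measured data, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def pages_per_second(pages_per_day: int) -> float:
    """Average rate needed to fetch this many pages in one day."""
    return pages_per_day / SECONDS_PER_DAY

def hours_to_crawl(pages: int, rate_per_second: float) -> float:
    """Hours needed to fetch `pages` at a constant rate."""
    return pages / rate_per_second / 3600

for pages in (150_000, 200_000):
    print(f"{pages:,} pages/day = {pages_per_second(pages):.2f} pages/s sustained; "
          f"at 10 pages/s that is {hours_to_crawl(pages, 10):.1f} h of crawling")
# 150,000 pages/day = 1.74 pages/s sustained; at 10 pages/s that is 4.2 h of crawling
# 200,000 pages/day = 2.31 pages/s sustained; at 10 pages/s that is 5.6 h of crawling
```

In other words, a full daily crawl of a site that size would mean about two requests per second around the clock, or four to six hours a day at the burst rate seen earlier.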