I've already identified a wide variety of misbehaving robots, but there are some Urchin statistics that I'm having trouble with.
Specifically, two of the listings for robots that have visited my sites are described as "Mozilla Compatible Agent" and "Googlebot."
The specifics for the Mozilla compatible agent are:
Mozilla Compatible Agent:
* Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
* Mozilla/3.01 (compatible;)
* Mozilla/4.7 [en](Exabot@exava.com)
* Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
* Mozilla/4.0 (compatible; grub-client-2.3)
* Mozilla/3.0 (compatible; Indy Library)
* Mozilla/3.0 (compatible)
The specifics for the Googlebot listing are:
Googlebot:
* Googlebot/2.1 (+http://www.google.com/bot.html)
I have already banned the grub-client and Indy Library bots, but I'm unsure which Googlebot is legitimate. Also, which of the other Mozilla-compatible bots are suspect?
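(For reference, a user-agent ban like that usually looks something like the lines below in .htaccess; the patterns are illustrative, not necessarily the exact rules I'm using.)
# Illustrative user-agent bans; patterns are examples only
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} grub-client [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Indy Library" [NC]
RewriteRule .* - [F]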
What a wealth of information! I'll add this to my resources. Thanks, Wilderness.
> Both Googlebots could be legitimate. You would have to check their
> IP address to be sure, though.
Oy. Both of the sites I'm working with now have log files I can't access, and unfortunately the Urchin stats don't seem to correlate the robot statistics with IP addresses. With this info, however, maybe I can talk the host into giving me access to them.
> Btw, welcome to WebmasterWorld mattie :)
Thanks.
I've done extensive research on this over the last week, and consistently found the answers I needed here.
I appreciate your input, FiestaGirl, Wilderness, and Claus!
With this application, of course, we'd be able to add the latest Web cretins to the list.
Or has someone already done this?
Thanks all,
Mattie
A Close to Perfect Htaccess (this old thread should keep you busy for a week or so ;)
[webmasterworld.com...]
> With this application, of course, we'd be able to add the latest Web cretins to the list. Or has someone already done this?
Somebody beat you to it: [joseluis.pellicer.org...]
# Request with a blank referer...
RewriteCond %{HTTP_REFERER} ^$
# ...and a user agent that is nothing but "Mozilla/x.y"
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9]{1,2}$
# Forbid it
RewriteRule .* - [F]
Jim had RewriteRule !^403i?\.html$ - [F,L]
"This allows access only to custom 403 error and 'help' pages, which were not subsequently fetched." I just put the lines at end of my long RewriteCond...[OR] list.
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9]{1,2}$
because an MSIECrawler was getting in:
62.118.153.11 - - [01/Nov/2004:11:41:53 -0500] "GET /mydomain/page.htm HTTP/1.1" 200 2511 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; MSIECrawler)"
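Put together with Jim's rule, the relevant part of the file looks roughly like this (a sketch; the 403/help page name is just the example from above):
# Blank referer plus a bare "Mozilla/x.y" user agent: forbid everything
# except the custom 403/help pages (page names are examples)
RewriteCond %{HTTP_REFERER} ^$
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/[0-9]\.[0-9]{1,2}$
RewriteRule !^403i?\.html$ - [F]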
Another domain, where I didn't add those RewriteConds, still gave the MSIECrawler a 403 because of other RewriteConds I have in the .htaccess files on both domains.
The following MSIECrawler got a 403. The difference is that its UA string has "SV1" instead of ".NET CLR 1.1.4322".
204.119.21.25 - - [01/Nov/2004:02:23:01 -0500] "GET /mydomain/ HTTP/1.1" 403 217 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; MSIECrawler)"
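(Those other RewriteConds aren't shown here, but a rule that catches this crawler by name, whatever else is in the UA string, would look something like this; it's an illustration, not necessarily what's in those files:)
# Block anything identifying itself as MSIECrawler, case-insensitively
RewriteCond %{HTTP_USER_AGENT} MSIECrawler [NC]
RewriteRule .* - [F]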
I sure didn't understand it to begin with. I'm doing trial and error; copying, pasting and deleting; and learning what works along the way.
<suspiciously narrowing my eyes> Surely some of Them are here on WWW right now, looking for ways to foil our Rewrites. Maybe it's even the guy who started this thread! If only the Internet allowed us to form an angry mob with pitchforks and a good scapegoat.