Whew. iaea.org is hyper

         

toolman

4:10 am on Nov 29, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Anyone else getting pounded by the scrapers? Glad I block 'em.

bobriggs

3:20 pm on Nov 29, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, I have been seeing 'em lately, but I can't block them.

I read:
[webmasterworld.com...]

mod_setenvif is a loaded module on my host, but it's Apache 1.3.6, and I need 1.3.13 or higher to use it in .htaccess.

I don't have mod_rewrite loaded. And of course the IPs come from different locations, so I can't block based on that. Can you/anyone think of a way to block with this configuration? (I only have access to .htaccess, not the server config.)

agerhart

3:26 pm on Nov 29, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have been seeing them in the logs lately as well. Why?

websurprise

7:56 pm on Dec 5, 2001 (gmt 0)



In November almost 10% of my pageviews were from iaea.org. Last month it was about 2%. Is there any way to stop them with robots.txt, or some other way that I can upload to my web site? Thanks.

volatilegx

11:52 pm on Dec 5, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



robots.txt isn't guaranteed to work, but .htaccess will.

See [httpd.apache.org...]

bobriggs

12:23 pm on Dec 6, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



...with mod_setenvif AND Apache version 1.3.13 or higher.
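
For anyone whose host does meet those requirements, here is a minimal sketch of what the mod_setenvif approach might look like in .htaccess. It assumes the host's AllowOverride settings permit these directives, and the variable name bad_referer is just a made-up label for this example. The referer pattern is also a guess at what shows up in the logs, so check your own logs before relying on it.

```apache
# Sketch only: needs mod_setenvif, and Apache 1.3.13 or higher for .htaccess use.
# "bad_referer" is an arbitrary variable name chosen for this example.
# Flag any request whose Referer header starts with the iaea.org address.
SetEnvIfNoCase Referer ^http://(www\.)?iaea\.org bad_referer

# Allow everyone, then deny requests that were flagged above.
Order Allow,Deny
Allow from all
Deny from env=bad_referer
```

Flagged requests should get a 403 Forbidden, the same result the mod_rewrite [F] flag gives.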

toolman

3:23 pm on Dec 6, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I use this (mod_rewrite):

RewriteEngine on  
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
# Forbid requests referred from iaea.org (any path, with or without www)
RewriteCond %{HTTP_REFERER} ^http://(www\.)?iaea\.org [NC]
RewriteRule ^.* - [F]

websurprise

3:30 pm on Dec 6, 2001 (gmt 0)



My hosting company uses http-analyze 2.4 for site stats. How can I find out what the spider name is to put in the robots.txt file?

I am contacting my hosting company about the .htaccess file too.

bobriggs

3:41 pm on Dec 6, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's the problem, there is no spider name, and it won't look at robots.txt at all.

Lately it's been coming (to my site) from AOL IP addresses, so you can't use a list of IPs. It just sends the referer (iaea.org).

And so far the only solutions I've seen to block it require mod_rewrite, which I don't have, or mod_setenvif with Apache 1.3.13 or higher.

Toolman's solution (mod_rewrite) is by far the easiest to maintain and you can add to it easily.

wilderness

12:10 am on Dec 9, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I only have a single entry in my htaccess for IA bot:
deny from 209.247.40.

Lexus is the software name.
Somebody has included you in a dummy research survey.
The survey is only as accurate as the originator's list of available websites.
I would also block those referer pages in .htaccess, if you can find the site that has included you in the survey.
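
For context, the single deny entry above normally sits alongside the mod_access ordering directives. A minimal sketch, assuming the host lets .htaccess override access controls; the address prefix is the one quoted in this post, not a verified current range:

```apache
# Sketch only: mod_access directives as they might appear in .htaccess.
# Allow everyone, then deny the quoted address prefix.
Order Allow,Deny
Allow from all
Deny from 209.247.40.
```

This only helps against a bot that comes from a fixed address range. It does nothing for the iaea.org hits discussed earlier, which arrive from many different IPs.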