Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Really enthusiastic human, or...?
*.europeonline.net - Pro/1.29
mivox




msg:395416
 7:30 pm on May 8, 2001 (gmt 0)

I got hammered by these four guys:

  • butterfly.europeonline.net
  • laurel.europeonline.net
  • hardy.europeonline.net
  • heromaster.europeonline.net

All show a browser of Pro/1.29.

    I would assume they were spiders, due to the large number of pages they fetched, but one of them shows an external referrer, which I've never seen on a spider before...

    So are these a nifty referrer-giving spider, or does our website have a HUGE fan in Europe somewhere?

     

    Froggyman




    msg:395417
     8:29 pm on May 8, 2001 (gmt 0)

    Teleport Pro/1.29
    [tenmax.com...]

    "...Launch up to ten simultaneous retrieval threads, access password-protected sites, filter files by size and type, search for keywords, and much more..."

    mivox




    msg:395418
     8:35 pm on May 8, 2001 (gmt 0)

    So it could be a European human user who doesn't want to pay internet-access-by-the-minute to browse our site?

    I don't like the idea of our entire site being downloaded, but if I had to pay for my internet per minute, I'd do the same... *sigh*

    jetsetter




    msg:395419
     8:59 pm on May 9, 2001 (gmt 0)

    I just had someone run this on my site and access 1000 pages over the course of an hour.

    I've been checking the tenmax.com website, but I haven't found any way to block this besides pure IP blocking when identified.

    Any ideas?

    Joe

    Froggyman




    msg:395420
     9:34 pm on May 9, 2001 (gmt 0)

    If you are using Apache, add this to your .htaccess file:

    SetEnvIf User-Agent ^Pro/1.29
    <Directory /docroot>
    Order Allow,Deny
    Allow from=all
    Deny from env=Pro/1.29
    </Directory>

    Replace "docroot" with your own directory name.

    jetsetter




    msg:395421
     9:55 pm on May 9, 2001 (gmt 0)

    Thanks! I've already got the following in my .htaccess file. If I put the above code at the bottom, it locks me out of the site. How should I configure the file?

    <Limit GET PUT POST>
    order deny,allow
    deny from a.certain.specific.ip
    </Limit>
    Options -Indexes
    ErrorDocument 401 /error.php
    ErrorDocument 403 /error.php
    ErrorDocument 404 /error.php
    ErrorDocument 500 /error.php
    <Files .*>
    Deny from all
    </Files>

    <edit>Actually, the error I get in my log is 'Missing envariable expression for SetEnvIf'
    </edit>

    Froggyman




    msg:395422
     11:03 pm on May 9, 2001 (gmt 0)

    Try this:

    SetEnvIf User-Agent ^Pro/1.29 Pro/1.29
    <Directory /docroot>
    Order Allow,Deny
    Allow from=all
    Deny from env=Pro/1.29
    </Directory>

    jetsetter




    msg:395423
     11:11 pm on May 9, 2001 (gmt 0)

    Thanks again.

    I've tried adding this to my .htaccess above the original code, and at the bottom. Still locks me out.

    I've tried with /docroot and /docroot/ and it still locks me out (with my docroot in place of docroot)

    I get a 500 error.

    Hmmm.

    I'm looking over this page: [httpd.apache.org...] as we speak.

    Froggyman




    msg:395424
     11:24 pm on May 9, 2001 (gmt 0)

    I'm sorry, I got it wrong. This will work.

    SetEnvIf User-Agent ^Pro/1.29 banned_bot
    <Directory /docroot>
    Order Allow,Deny
    Allow from all
    Deny from env=banned_bot
    </Directory>

    jetsetter




    msg:395425
     11:31 pm on May 9, 2001 (gmt 0)

    Arggh. Still doesn't do it. Maybe something in my original .htaccess file has to be removed.

    I'll try tomorrow. Thanks for your help.

    Frogman, if you're the same as the ifroggy I'm thinking of, then we are both on the same host and it should work!
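
A note on the 500 errors reported in this thread: Apache does not allow a <Directory> section inside a .htaccess file, since the file already applies to the directory it sits in, so every snippet wrapped in <Directory /docroot> would be rejected there. Two other likely culprits are "Allow from=all" (the syntax is "Allow from all") and the ^ anchor: if the full header is "Teleport Pro/1.29", as the product name suggests, a pattern starting with ^Pro never matches. What follows is only a sketch of how the two files in this thread might be merged, not a tested configuration. It assumes mod_setenvif is loaded, that the host's AllowOverride setting permits these directives, and that the server uses the Order/Allow/Deny access syntax current when this thread was written. The variable name banned_bot is arbitrary, and a.certain.specific.ip is the placeholder from the post above.

```apache
# Tag any request whose User-Agent contains "Pro/1.29".
# No ^ anchor, because the header may begin with "Teleport" rather than "Pro".
# The dot is escaped so it matches a literal "." only.
SetEnvIf User-Agent "Pro/1\.29" banned_bot

# No <Directory> wrapper: .htaccess already applies to its own directory.
<Limit GET PUT POST>
Order Allow,Deny
Allow from all
# Block the tagged user agent and the one known address.
Deny from env=banned_bot
Deny from a.certain.specific.ip
</Limit>

Options -Indexes
ErrorDocument 401 /error.php
ErrorDocument 403 /error.php
ErrorDocument 404 /error.php
ErrorDocument 500 /error.php

# Hide dotfiles such as .htaccess itself.
<Files .*>
Deny from all
</Files>
```

With Order Allow,Deny, the Allow line is evaluated first and any matching Deny line then overrides it, so ordinary visitors get through while tagged requests receive a 403. The User-Agent header is trivial to change in most downloaders, so this only stops clients that identify themselves honestly.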


    All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
    WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
    © Webmaster World 1996-2014 all rights reserved