Search Engine Spider and User Agent Identification Forum

    
Unknown bot - amzn_assoc
anyone seen it?
upside

Msg#: 1515 posted 3:53 pm on Nov 30, 2002 (gmt 0)

I've been getting a tremendous amount of hits from a bot with a user agent of "amzn_assoc". It has shown up at the following IPs this past week:

66.125.173.224
64.160.49.82
64.166.159.198
64.166.156.221
64.165.204.84
64.166.157.42

Does anyone know who/what this is? Any reason not to ban it? A search on Google turned up nothing.

I know Amazon webservices uses a bot named "aranhabot" but does this one belong to Amazon?

P.S. - Thanks to all on Webmasterworld for all the good reading. I've been lurking for quite a long time!

 

chiyo

Msg#: 1515 posted 8:05 pm on Nov 30, 2002 (gmt 0)

Is the site an Amazon affiliate? We get these too, but assumed it was Amazon checking that their affiliates were following their TOS, e.g. if you are using their web services you must update the search results regularly.

Romeo

Msg#: 1515 posted 8:33 pm on Nov 30, 2002 (gmt 0)

... the IP addresses are ADSL subscriber lines of snfc21.pacbell.net ... I don't think it is the real Amazon.

Regards,
R.
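
For anyone who wants to repeat this kind of check: a reverse DNS lookup, followed by a forward lookup on the name it returns, is the usual way to see who an IP belongs to. Below is a minimal sketch in Python (not the Perl used elsewhere in this thread). The function names are made up for illustration, and it only calls the standard `socket` module, so treat it as a starting point rather than a finished tool.

```python
import socket


def reverse_dns(ip):
    """Return the PTR hostname for ip, or None if no reverse record is found."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return None
    return hostname


def forward_confirms(hostname, ip):
    """True if hostname resolves back to ip (forward-confirmed reverse DNS)."""
    try:
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except socket.gaierror:
        return False
    return ip in addresses


def belongs_to(hostname, domain):
    """True if hostname is the domain itself or one of its subdomains."""
    hostname = hostname.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    return hostname == domain or hostname.endswith("." + domain)


# Example use (needs network access, so it is left commented out):
# host = reverse_dns("66.125.173.224")
# if host and forward_confirms(host, "66.125.173.224"):
#     print(host, belongs_to(host, "amazon.com"))
```

The suffix test is deliberately strict about the leading dot, so a name that merely ends in the same letters as the domain does not pass.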

WebGuerrilla

Msg#: 1515 posted 8:56 pm on Nov 30, 2002 (gmt 0)

I've seen it quite a bit also. Definitely one for the ban list.

carfac

Msg#: 1515 posted 5:26 am on Dec 2, 2002 (gmt 0)

upside:

Got hit by this last week.... sorry, I should have posted it. Hit pretty hard, got itself banned on my sites.

dave

Key_Master

Msg#: 1515 posted 7:26 am on Dec 2, 2002 (gmt 0)

This is what an Amazon employee looks like...

IP: 207.171.180.101
Host: 207-171-180-101.amazon.com

upside

Msg#: 1515 posted 5:22 pm on Dec 4, 2002 (gmt 0)

Amazon posted this [forums.prospero.com] on their Associates Announcement Board.

So Amazon does have a bot named amzn_assoc. I wonder, though, why it's coming from pacbell.net.

Last week I banned it using .htaccess because it downloaded over 350 megs of dynamically generated content without once checking robots.txt.

There is more information and discussion about this in this thread [forums.prosperotechnologies.com] on their Associates Board.
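
If you want to pull the same numbers out of your own logs (hits, bytes served, and whether the bot ever asked for robots.txt), something like the following Python sketch would do it. It assumes Apache's combined log format. The sample lines, byte counts and the `ExampleBot/1.0` agent are invented for illustration, and `summarize_agent` is a made-up helper name.

```python
import re

# Assumes Apache "combined" log format; adjust the pattern for other layouts.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def summarize_agent(lines, agent_substring):
    """Count hits and bytes for one user agent and note any robots.txt request."""
    hits = 0
    total_bytes = 0
    ips = set()
    fetched_robots_txt = False
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or agent_substring not in match.group("agent"):
            continue
        hits += 1
        ips.add(match.group("ip"))
        if match.group("size") != "-":
            total_bytes += int(match.group("size"))
        # Ignore any query string when comparing the requested path.
        if match.group("path").split("?", 1)[0] == "/robots.txt":
            fetched_robots_txt = True
    return {
        "hits": hits,
        "bytes": total_bytes,
        "ips": sorted(ips),
        "fetched_robots_txt": fetched_robots_txt,
    }


# Invented sample lines, only to show the expected input shape.
SAMPLE = [
    '66.125.173.224 - - [30/Nov/2002:15:50:01 +0000] "GET /page?id=1 HTTP/1.0" 200 51200 "-" "amzn_assoc"',
    '64.160.49.82 - - [30/Nov/2002:15:50:02 +0000] "GET /page?id=2 HTTP/1.0" 200 48800 "-" "amzn_assoc"',
    '192.0.2.10 - - [30/Nov/2002:15:50:03 +0000] "GET /robots.txt HTTP/1.0" 200 120 "-" "ExampleBot/1.0"',
]

if __name__ == "__main__":
    print(summarize_agent(SAMPLE, "amzn_assoc"))
```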

spinnercee

Msg#: 1515 posted 2:43 am on Dec 5, 2002 (gmt 0)

If it's any help, I get incoming from Alexa.com, who is an associate/partner of Amazon.com --- Check to see if you're listed... who knows? I haven't noticed the bot, though.

spinnercee

Msg#: 1515 posted 3:49 am on Dec 5, 2002 (gmt 0)

I should mention that Alexa is a search engine that caches a cute little image (looks like a screen capture) of webpages -- Does anybody know how they do that? :)

jdMorgan

Msg#: 1515 posted 3:58 am on Dec 5, 2002 (gmt 0)

If any of you are Amazon Associates, it would be a service to all if you would report to them that this 'bot is being banned because it does not check or respect robots.txt.

I have few hard-and-fast, non-negotiable rules, but this is one: Respect robots.txt or eat 403s.

Jim

P.S. Spinnercee: Welcome to WebmasterWorld!
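
For reference, "respecting robots.txt" boils down to a check like the one below before every fetch. This is a small Python sketch using the standard library's `urllib.robotparser`. The robots.txt content and the `www.example.com` URLs are invented for illustration, and nothing here is taken from how amzn_assoc itself works.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: keep amzn_assoc out of /cgi-bin/, allow everyone else.
ROBOTS_TXT = """\
User-agent: amzn_assoc
Disallow: /cgi-bin/

User-agent: *
Disallow:
"""

if __name__ == "__main__":
    for agent in ("amzn_assoc", "OtherBot"):
        url = "http://www.example.com/cgi-bin/search.pl"
        print(agent, allowed(ROBOTS_TXT, agent, url))
```

A bot that never requests the file at all, as reported above, cannot run this check, which is why a server-side ban is the only remaining option.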

carfac

Msg#: 1515 posted 6:39 pm on Dec 5, 2002 (gmt 0)

Respect robots.txt or eat 403s

:) :) :)

Jim, you kill me!

dave

wilderness

Msg#: 1515 posted 8:04 pm on Dec 5, 2002 (gmt 0)

<snip>Jim, you kill me!>

Actually... a new internet acronym may be in order?

"403ONK" :-)

jdMorgan

Msg#: 1515 posted 1:05 am on Dec 6, 2002 (gmt 0)

dave,

Yeah, but who talked me into hacking that bad-bot-blocking script? :)

wilderness,

Maybe this is a dumb question, but "ONK"? Whadizzat?

Jim

upside

Msg#: 1515 posted 1:43 am on Dec 6, 2002 (gmt 0)

I received a call from the VP of webservices at Alexa. They have temporarily stopped their bot while they investigate the matter. I'm supplying them with relevant sections from my logs. Their bot tries to access a particular site no more than 2 times per second. However, I have several sites that all share one IP address, so the combined load on my server was much higher than that. Apparently their checks are based on domain rather than IP.

It's really refreshing to see both Amazon and Alexa respond like this.

BTW, don't I get a "Welcome to WWW", too?!

jdMorgan

Msg#: 1515 posted 1:54 am on Dec 6, 2002 (gmt 0)

upside,

Where are my manners?!? ... :o

Welcome to WebmasterWorld [webmasterworld.com]!

Let us know how this turns out. If they fix it, I'll unblock them.

Thanks,
Jim

Key_Master

Msg#: 1515 posted 2:07 am on Dec 6, 2002 (gmt 0)

I won't.

wilderness

Msg#: 1515 posted 5:11 am on Dec 6, 2002 (gmt 0)

<snip>Maybe this is a dumb question, but "ONK"? Whadizzat?>

A bad joke I thought would be easy to understand. :-(
Try a search on PLONK :-)

jdMorgan

Msg#: 1515 posted 6:33 am on Dec 6, 2002 (gmt 0)

Ah, OK - Thanks for clarifying - I am not up to speed on these Usenet terms.

Yes, this UA is a real plonker and it did indeed make a nice *plonk* sound when it hit the bottom of my kill file. Which, BTW, was created by the bad-bot banning script [webmasterworld.com] originally written and posted here on WebmasterWorld by Key_Master, for which I am very grateful. I was encouraged - even goaded - into installing said script by member carfac, for which I am also very grateful. I can't believe I used to try to keep up with this manually!

Thanks,
Jim

carfac

Msg#: 1515 posted 4:33 pm on Dec 6, 2002 (gmt 0)

Key_Master's script is great, and he deserves a lot of thanks for posting it here and making it available. Like Jim, I can't believe I survived without it! So credit where it is due: to Key_Master!

dave

BTW, I made some minor changes to Key_Master's script: I added a time stamp (so you can easily empty out the entries over a week old) and a bypass for WAP users. I would be happy to send the changes to anyone who wants them.

upside

Msg#: 1515 posted 5:30 pm on Dec 6, 2002 (gmt 0)

carfac. I'd like to see your script :) Please stickymail me.

Edge

Msg#: 1515 posted 8:04 pm on Dec 7, 2002 (gmt 0)

Sticky me as well if you don't mind carfac.

Thanks

Josk

Msg#: 1515 posted 3:29 pm on Dec 9, 2002 (gmt 0)

This is now a Slashdot topic...

carfac

Msg#: 1515 posted 4:47 pm on Dec 9, 2002 (gmt 0)

Hi:

I had a few requests, and the script is not a secret, so I decided to post it here. Apologies to Key_Master for hacking his script! Comments welcome, that's why we are in this forum, right? Let's make it all better!

OK, here is the script. See instructions AFTER the script!

#!/usr/local/bin/perl
# Name this script trap.pl, upload it in ASCII mode to your cgi-bin and set the file permissions to CHMOD 755.
# Original script by Key_Master taken from [webmasterworld.com...]
use strict;
use warnings;

# This is the only variable that needs to be modified. Replace it with the absolute path to your root directory.
my $rootdir = "/path/to/root/directory";

# Grab the IP of the bad bot
my $visitor_ip = $ENV{'REMOTE_ADDR'} || '';

# Bypass: these addresses get a harmless page and are never added to the ban list
if ($visitor_ip =~ /^216\.239\.3[379]\.5$|^216\.239\.35\.4$/) {
    print "Content-type: text/html\n\n";
    print "<html>\n";
    print "<head>\n";
    print "<title>Forward On</title>\n<META NAME=\"robots\" CONTENT=\"NOINDEX,NOFOLLOW\">\n";
    print "</head>\n";
    print "<body>\n";
    print "<p><b>We had an error.<BR>Please return to continue!</b></p>\n";
    print "</body>\n";
    print "</html>\n";
    exit;
}
else {

    # Escape the dots so the IP can be used as a regular expression
    $visitor_ip =~ s/\./\\\./g;

    # Set Date
    my $date = scalar localtime(time);

    # Read the existing ban list (bad_ip.txt must already exist)
    open(my $in, '<', "$rootdir/bad_ip.txt") or die $!;
    my @banned = <$in>;
    close($in);

    # Rewrite the ban list with the new IP and its time stamp at the top
    open(my $out, '>', "$rootdir/bad_ip.txt") or die $!;
    print $out "^$visitor_ip\$\n# $date\n";
    foreach my $deny_ip (@banned) {
        print $out $deny_ip;
    }
    close($out);

    # Send the bot a fake error page
    print "Content-type: text/html\n\n";
    print "<html>\n";
    print "<head>\n";
    print "<title>Error</title>\n<META NAME=\"robots\" CONTENT=\"NOINDEX,NOFOLLOW\">\n";
    print "</head>\n";
    print "<body>\n";
    print "<p><b>A fatal error has occurred:</b></p>\n";
    print "<p><b>Invalid Site HTML method...</b></p>\n";
    print "<p><b>Please enable debugging in setup for more details.</b></p>\n";
    print "<A HREF=\"http://www.imdb.com/harvest_me/\"> </A>\n";
    print "</body>\n";
    print "</html>\n";
    exit;

}

########################## END OF SCRIPT

OK, so take and pop this little puppy into the same directory you want protected and chmod it 755. I use a file I call "bad_ip.txt" to hold my bad IPs, but you can change that to anything (including .htaccess, but make sure you escape the "."!). This writes to the TOP of whatever file you have set up, so the older stuff moves down two lines every time it gets written to.

If you edit bad_ip.txt (or .htaccess), make SURE to upload it to your server in ASCII mode, or it will not work.

I think that is it... comments welcome!

dave
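
Since the time stamp exists so that old entries can be cleared out, here is one way the weekly clean-up could be automated. It is a rough Python sketch, not part of the script above. It assumes the two-line layout the script writes (a `^ip$` pattern line followed by a `# date` line in Perl's `scalar localtime` format), and `prune` is a made-up function name.

```python
from datetime import datetime, timedelta

# Layout of Perl's "scalar localtime", e.g. "Mon Dec  9 16:47:00 2002".
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def prune(lines, now, max_age=timedelta(days=7)):
    """Drop pattern/time-stamp pairs older than max_age; keep everything else."""
    kept = []
    i = 0
    while i < len(lines):
        pattern = lines[i]
        stamp = lines[i + 1] if i + 1 < len(lines) else ""
        if stamp.startswith("#"):
            # Collapse the double space localtime uses for single-digit days.
            text = " ".join(stamp.lstrip("# ").split())
            try:
                when = datetime.strptime(text, STAMP_FORMAT)
            except ValueError:
                when = None  # unreadable date: keep the entry to be safe
            if when is None or now - when <= max_age:
                kept.extend([pattern, stamp])
            i += 2
        else:
            # Entry without a time stamp (e.g. added by hand): keep it.
            kept.append(pattern)
            i += 1
    return kept
```

To use it, read bad_ip.txt into a list of lines with the newlines stripped, call `prune(lines, datetime.now())`, and write the result back. Entries with no time stamp or an unreadable one are kept rather than guessed at.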

EliteWeb

Msg#: 1515 posted 5:02 pm on Dec 9, 2002 (gmt 0)

:) Slashdot caught the news too (:

andye

Msg#: 1515 posted 9:59 am on Dec 11, 2002 (gmt 0)

You might be interested in Randal Schwartz's 'Throttle' script as well:

[stonehenge.com...]

Prevents any one IP address from submitting requests too quickly.

Andy.
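
To show the general idea of per-IP throttling, here is a small sliding-window limiter sketched in Python. It is not Randal Schwartz's script and makes no claim about how that script works internally. The `Throttle` class and its parameters are invented for illustration. Note that it keys on the client IP, not on the requested domain, which is the distinction that mattered earlier in this thread for sites sharing one address.

```python
from collections import defaultdict, deque


class Throttle:
    """Allow at most max_requests per window_seconds for each client IP."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.recent = defaultdict(deque)  # ip -> times of recently allowed requests

    def allow(self, ip, now):
        """Return True and record the request if ip is under its limit at time now."""
        times = self.recent[ip]
        # Forget requests that have aged out of the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.max_requests:
            return False
        times.append(now)
        return True
```

In a real handler `now` would come from `time.monotonic()`, and a refused request would get an error response such as a 403 or 503. Passing the time in explicitly just keeps the sketch easy to test.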


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved