Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Does Amazon have a crawler or are all Amazon AWS IPs fair game?
bigtoga

5+ Year Member



 
Msg#: 4514970 posted 9:53 am on Nov 2, 2012 (gmt 0)

There is no valid reason for anyone to hit my site from an Amazon AWS box. I'd love to block the whole of Amazon AWS, etc. straight at the firewall, but I'm worried that I'd also block an Amazon crawler/spider, which could hurt the sales I do on Amazon.

Does anyone have suggestions or links for this sort of thing? I want to allow the actual Amazon company, if it has a crawler, to browse the site, but block Amazon AWS customers who spin up a server and then scrape or spam with it.

 

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4514970 posted 6:09 pm on Nov 2, 2012 (gmt 0)

Pfui has a long and dedicated thread [webmasterworld.com]

bigtoga

5+ Year Member



 
Msg#: 4514970 posted 6:14 pm on Nov 2, 2012 (gmt 0)

Yes, I've seen that, thank you. I'm still not sure, though, whether Amazon has its own crawler.

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4514970 posted 6:25 pm on Nov 2, 2012 (gmt 0)

Nor am I. Unfortunately, Amazon's hosting customers have a proven record of abuse, and Amazon AWS has a record of accepting such customers.

Perhaps the Amazon FAQ (NOT Amazon AWS) provides the answer.

The easiest answer is in your raw visitor logs: look at the requests for the images referenced by your own Amazon pages.
What are those IPs?
Simply separate them from the Amazon AWS IPs.
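
One way to do that separation, as a rough sketch: collect the IPs that requested your images from the raw log, then test each one against a list of AWS CIDR blocks. The log layout (common/combined log format), the `/images/` path prefix, the function names, and the CIDR blocks below (reserved documentation ranges used purely as placeholders) are all assumptions, not real Amazon data.

```python
import ipaddress
import re

# Placeholder CIDR blocks (reserved documentation ranges) standing in for
# whatever AWS ranges you have collected. These are NOT real AWS ranges.
AWS_RANGES = [ipaddress.ip_network(c) for c in ("192.0.2.0/24", "198.51.100.0/24")]

# Captures the client IP and request path of a common/combined log format line.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (\S+)')


def image_request_ips(lines, prefix="/images/"):
    """Return the set of client IPs that requested a path under `prefix`."""
    ips = set()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group(2).startswith(prefix):
            ips.add(m.group(1))
    return ips


def split_by_aws(ips):
    """Split IPs into (inside AWS_RANGES, outside AWS_RANGES)."""
    inside, outside = [], []
    for ip in sorted(ips):
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # hostname or malformed entry, skip it
        if any(addr in net for net in AWS_RANGES):
            inside.append(ip)
        else:
            outside.append(ip)
    return inside, outside
```

The IPs in the second list are the candidates worth looking up by hand (reverse DNS, whois) before deciding whether they belong to Amazon the retailer.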

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4514970 posted 8:09 pm on Nov 2, 2012 (gmt 0)

It used to be called "A1", but I haven't seen that UA for a while.

Then there were versions of "AWSpider" (AWSpider 0.3.2.12 last hit my logs in 2011.)
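
If that user agent ever reappears, a quick way to spot it in a log is a case-insensitive pattern match. This is only a minimal sketch: the function name is made up for illustration, and the exact layout of the real UA string beyond "AWSpider" plus a dotted version is an assumption. A user agent is also trivially spoofed, so a match says nothing about who actually sent the request.

```python
import re

# "AWSpider", optionally followed by a space or slash and a dotted version,
# e.g. "AWSpider 0.3.2.12". The separator is an assumption.
AWSPIDER = re.compile(r"AWSpider(?:[ /]([\d.]+))?", re.IGNORECASE)


def awspider_version(user_agent):
    """Return the version if the UA mentions AWSpider ('' if none given), else None."""
    m = AWSPIDER.search(user_agent)
    if m is None:
        return None
    return m.group(1) or ""
```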

Bewenched

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4514970 posted 9:35 pm on Nov 5, 2012 (gmt 0)

I've busted several Amazon "bots" scraping our site for images... It makes me wonder if they are actually stealing product images from sites for their own use.

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4514970 posted 10:52 pm on Nov 5, 2012 (gmt 0)

Bots don't exactly leave resumes ;)

Harvesting, plagiarizing, or simply indexing: who knows the why?

The AWS customers hit us all; that's why the long threads exist.

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4514970 posted 5:54 am on Nov 6, 2012 (gmt 0)


The AWS customers hit us all, that's why the long threads exist.

Yes, but we're discussing "Amazon" bots.

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4514970 posted 6:57 am on Nov 6, 2012 (gmt 0)

My bad, hope it's just a full moon ;)

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved