
Search Engine Spider and User Agent Identification Forum

    
Does Amazon have a crawler, or are all Amazon AWS IPs fair game?
bigtoga




msg:4514972
 9:53 am on Nov 2, 2012 (gmt 0)

There is no valid reason for anyone to hit my site from an Amazon AWS box. I'd love to just block all of Amazon AWS and the like straight at the firewall. But I'm worried that I'll somehow block an Amazon crawler/spider, and that could hurt the sales I make on Amazon.

Does anyone have suggestions or links for this sort of thing? I want to allow the actual Amazon company (if it has a crawler) to browse the site, but block Amazon's AWS customers who spin up a server and then scrape or spam with it.
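A minimal sketch of the allow/block decision described above, assuming you maintain two lists of CIDR ranges yourself: one for addresses you have verified as Amazon's own crawler, and one for AWS ranges you want to block. The ranges below are placeholder documentation networks (RFC 5737), not real Amazon or AWS ranges, and the function name `decide` is hypothetical.

```python
import ipaddress

# PLACEHOLDER ranges (RFC 5737 documentation networks), NOT real Amazon/AWS data.
# Replace them with ranges you have verified yourself.
AMAZON_CRAWLER_RANGES = [ipaddress.ip_network("192.0.2.0/24")]
AWS_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def in_any(ip, networks):
    """Return True if ip falls inside any of the given networks."""
    return any(ip in net for net in networks)


def decide(client_ip: str) -> str:
    """Hypothetical policy: allow verified crawler IPs, block AWS IPs, allow everything else."""
    ip = ipaddress.ip_address(client_ip)
    # Check the allowlist first so a verified crawler is never caught by a broader AWS block.
    if in_any(ip, AMAZON_CRAWLER_RANGES):
        return "allow"
    if in_any(ip, AWS_RANGES):
        return "block"
    return "allow"


print(decide("192.0.2.10"))    # allow (placeholder "crawler" range)
print(decide("198.51.100.7"))  # block (placeholder "AWS" range)
print(decide("8.8.8.8"))       # allow (in neither list)
```

Checking the allowlist before the blocklist means an address that ends up in both lists is let through, which errs on the side of not blocking a real Amazon crawler.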

 

wilderness




msg:4515139
 6:09 pm on Nov 2, 2012 (gmt 0)

Pfui has a long and dedicated thread [webmasterworld.com]

bigtoga




msg:4515145
 6:14 pm on Nov 2, 2012 (gmt 0)

Yes, I've seen that, thank you. I'm still not sure whether Amazon has its own crawler, though.

wilderness




msg:4515149
 6:25 pm on Nov 2, 2012 (gmt 0)

Nor am I. Unfortunately, their hosting customers have a proven record of abuse, and so does Amazon AWS for accepting those customers.

Perhaps the Amazon FAQ (NOT Amazon AWS) provides the answer.

The easiest answer is in your raw visitor logs: look at the requests for the images referenced from your own Amazon pages.
What are those IPs?
Simply separate them from the Amazon AWS IPs.
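A rough sketch of that log check, assuming your raw logs use the common Apache/Nginx "combined" format and the images in question are ordinary .jpg/.jpeg/.png/.gif files. It counts which IPs request images and splits them into those inside your AWS list and everything else. The AWS range is a placeholder documentation network (RFC 5737), not real AWS data, and `split_image_requesters` is a hypothetical name.

```python
import ipaddress
import re
from collections import Counter

# PLACEHOLDER AWS range (RFC 5737 documentation network), NOT real AWS data.
AWS_RANGES = [ipaddress.ip_network("198.51.100.0/24")]

# Start of a "combined" log line: IP, identd, user, [time], "METHOD path ..."
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)')

IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif")


def split_image_requesters(lines):
    """Count IPs that requested image files, split into (aws, other) Counters."""
    aws, other = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        ip_str, path = m.group(1), m.group(2)
        # Ignore any query string when checking the file extension.
        if not path.lower().split("?", 1)[0].endswith(IMAGE_EXTS):
            continue
        try:
            ip = ipaddress.ip_address(ip_str)
        except ValueError:
            continue  # e.g. a logged hostname instead of an IP
        bucket = aws if any(ip in net for net in AWS_RANGES) else other
        bucket[ip_str] += 1
    return aws, other


sample = [
    '198.51.100.7 - - [05/Nov/2012:21:35:00 +0000] "GET /img/widget.jpg HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [05/Nov/2012:21:36:00 +0000] "GET /img/widget.jpg HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [05/Nov/2012:21:37:00 +0000] "GET /index.html HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
]
aws, other = split_image_requesters(sample)
print("AWS:", dict(aws))
print("Other:", dict(other))
```

The "Other" bucket is where you would expect to find the addresses fetching images for your Amazon pages, if they come from outside AWS.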

keyplyr




msg:4515184
 8:09 pm on Nov 2, 2012 (gmt 0)

It used to be called "A1", but I haven't seen that UA for a while.

Then there were versions of "AWSpider" (AWSpider 0.3.2.12 last hit my logs in 2011).
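A small sketch for spotting the two UA names mentioned above in a log's user-agent field. The exact full UA strings are not given here, so this only looks for "AWSpider" (with an optional version) and "A1" as a standalone token; the patterns and the function name `amazon_ua_name` are assumptions. Keep in mind a UA string only says what a client claims to be: anyone on AWS can send any UA, so this is best paired with an IP check.

```python
import re
from typing import Optional

# ASSUMED patterns: match the names as whole tokens, not the exact original UA strings.
AWSPIDER_RE = re.compile(r"\bAWSpider(?:[ /]([\d.]+))?")
# "A1" must not be glued to letters, digits, dots or hyphens (so "A10" or "HA1" won't match).
A1_RE = re.compile(r"(?<![\w.-])A1(?![\w.-])")


def amazon_ua_name(user_agent: str) -> Optional[str]:
    """Return 'AWSpider <version>', 'AWSpider', 'A1', or None if neither name appears."""
    m = AWSPIDER_RE.search(user_agent)
    if m:
        return f"AWSpider {m.group(1)}" if m.group(1) else "AWSpider"
    if A1_RE.search(user_agent):
        return "A1"
    return None


print(amazon_ua_name("AWSpider 0.3.2.12"))  # AWSpider 0.3.2.12
print(amazon_ua_name("A1"))                 # A1
print(amazon_ua_name("A10 Networks"))       # None
```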

Bewenched




msg:4516059
 9:35 pm on Nov 5, 2012 (gmt 0)

I've caught several Amazon "bots" scraping our site for images... It makes me wonder whether they are actually stealing product images from sites for their own use.

wilderness




msg:4516078
 10:52 pm on Nov 5, 2012 (gmt 0)

Bots don't exactly leave resumes ;)

Harvesting, plagiarizing or simply indexing: who knows why?

The AWS customers hit us all, that's why the long threads exist.

keyplyr




msg:4516178
 5:54 am on Nov 6, 2012 (gmt 0)


"The AWS customers hit us all, that's why the long threads exist."

Yes, but we're discussing "Amazon" bots.

wilderness




msg:4516201
 6:57 am on Nov 6, 2012 (gmt 0)

My bad, hope it's just a full moon ;)

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved