Amazon AWS gunning for Google?

         

Pfui

1:45 am on Sep 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've been bothered by what's coming OUT of Amazon AWS for a long time. [webmasterworld.com...] [webmasterworld.com...]

Today I'm also bothered by what's going IN to Amazon AWS:

Amazon is indexing commonly used images and files from websites and storing them on the cloud network, so they load faster on the browser and improve its performance. The system also anticipates the next page users are likely to view so it loads faster on the device. ...

Source: "Amazon adds Fire, sizzle to tablet wars" (Sept. 28-29) [seattletimes.nwsource.com...]

Wait -- this IS Amazon we're talking about, right?

- indexing commonly used images and files from websites [Googlebot]
- storing them [Google cache, etc.]
- anticipates the next page users are likely to view so it loads faster [Google Preview; Google Suggestions; etc.]

At least Google has Google Webmaster Tools so we have some say in how that behemoth 'uses' our copyrighted material. And Googlebot heeds robots.txt; and we can code NO-CACHE.

But Amazon? Nada.

No webmaster news. No tools. No nothing. The hometown article's the first I've heard of their plan to repurpose my stuff for their stuff.

Given the years-long, relentless assault by bad bots hailing from amazonaws.com and its IP ilk, I wonder which bot(s) Amazon will use -- is using -- to scrape. One of its AWS spawn? Or Amazon-owned archive.org/Wayback? Or A3? Or--?

Regardless, Amazon's going to scrape 'n' serve my stuff without my say-so? So no.
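For anyone less familiar with the robots.txt mechanism mentioned above, here is a minimal sketch of how a well-behaved crawler is expected to consult it, using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the rules, and the example.com URLs are all made up for illustration; nothing here reflects what Amazon's crawler actually sends or honors.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one named bot entirely,
# and keep everyone else out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""

def allowed(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

print(allowed("ExampleBot", "http://example.com/index.html"))
print(allowed("OtherBot", "http://example.com/index.html"))
```

Of course this only helps against bots that choose to check the file at all, which is the whole complaint.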

dstiles

8:14 pm on Sep 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Be interesting to see if this comes from a known AWS IP range or not. If not, we need to know very soon in order to block it.

Without being an Amazon user, though, can I assume this only applies to things listed on Amazon? I.e., if you have a "shop" there I can see it uploading some of your site, but if you don't have content there I would assume they couldn't know about your site. OK, silly assertion, but you know what I mean. :)

Pfui

10:49 pm on Sep 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Silk is a browser [webmasterworld.com...] so your stuff is 100% eligible for repurposing/repackaging by Amazon... ditto data from your online travels even if you never go to amazon.com

Amazon the corporate entity is far, far more than its namesake retail site. Its acquisitions and developments rival Google's. A very mere sampling of its more obvious holdings: AbeBooks.com - Alexa.com - AmazonAWS.com - AmazonFresh.com - AmazonLocal.com - AmazonPayments.com - AmazonWireless.com - Archive.org - Askville.com - Audible.com - Diapers.com - DPReview.com - Endless.com - Fabric.com - IMDb.com - MYHABIT.com - Shopbop.com - SmallParts.com - Soap.com - Woot.com - Zappos.com

Further reach can be seen in a partial list of their trademarks [amazon.com...] And don't forget all those world-wide Associates with sites and blogs with items and links as insidious as Facebook's and Twitter's and Google +1's buttons. That adds up to an awful lot of amassed, marketable, monetizable knowledge, with many logins and all purchases linked to bank accounts and debit and credit cards.

Suffice it to say Jeff Bezos didn't reach a net worth of $19 billion this month [forbes.com...] by thinking small -- or not drawing a bead on your wallet.

Or now, apparently, your website.

FWIW, I have a love-hate thing going on with Amazon. I adore Prime's 'free' 2-day shipping. Heck, AmazonFresh trucks bring groceries to my back door once a week. And the company is interesting to watch because it looks so singularly retail, almost benign, while making money hand over fist. [businessweek.com...] I wish I held AMZN stock and owned any of the space they lease all over Seattle.

But as a webmaster?

I abhor the cesspool that is amazonaws.com and having to defend against it every day. And as a domino-effect victim of last summer's instant-axing by Amazon of all Associates in California, I'm now very, very aware of how, like Google, some Amazon something can suddenly change everything, good or bad, online and off.

dstiles

6:58 pm on Oct 1, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



And... now you've ruined my weekend! :)

I remember them as a more-or-less start-up years ago and then for the most part ignored them, apart from their cloud activities.

Still not sure how they can hit my web sites - although I don't doubt the possibility - but at least we do not have to accept their bots as we do with Google, nor rely on their referrals.

Staffa

7:43 pm on Oct 1, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



dstiles, they may be hitting your site and you might not be aware of it.
50.17.124.98 with UA : Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 ( .NET CLR 3.5.30729; .NET4.0E)
hit one of my sites today. It didn't get anything, because I block all known AWS ranges. But if you don't know that the IP number is from AWS, and it takes pages and images, it won't jump out of the logs as a non-human visitor ;o)

I am watching for the first Silk browser to appear, and if I don't like what I see (i.e. Silk acting as a scraper rather than humans using Silk) then it will get banned. It's an easy way to tank its spread before it is in general use. If it doesn't work, people will stop using it.

dstiles

6:00 pm on Oct 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



50.16/14 is blocked here so no problem :)
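As a quick sanity check, here is a minimal sketch using Python's standard `ipaddress` module to confirm that the address quoted above (50.17.124.98) falls inside 50.16.0.0/14. The blocklist holds only the one range mentioned in this thread and is purely illustrative; it is not a complete or current list of AWS ranges, and the function name is made up.

```python
import ipaddress

# Illustrative blocklist: just the range named in this thread,
# written out in full CIDR form (50.16/14 -> 50.16.0.0/14).
BLOCKED_NETWORKS = [ipaddress.ip_network("50.16.0.0/14")]

def is_blocked(ip: str) -> bool:
    """Return True if ip falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)

print(is_blocked("50.17.124.98"))
```

A /14 starting at 50.16.0.0 covers 50.16.0.0 through 50.19.255.255, so the 50.17.x.x hit is inside it. In practice the same test would normally live in the firewall or server config rather than in a script.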

Not sure about Silk - the first time I've come across a reference to it was here. Just had a look at the project site and I have to admit it's worrying, if only because it's built on WebKit, a known poor-security app ("a webkit-based Konqueror", which is primarily Linux?)

Anyone have a UA for it yet?

If the mods will let this stand the KDE project notes are at
[techbase.kde.org...]

There are a few worrying points (from a webmaster's viewpoint) listed there.

The note in the Amazon intro (link earlier in this thread) seems a bit inaccurate: I get the impression that Silk is a general multi-platform browser that happens to work on Kindle.

Pfui

10:51 pm on Oct 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Which note (in which link? Part of the Seattle Times article?) seems inaccurate?

incrediBILL

7:20 am on Oct 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Have you all forgotten Amazon has a search engine called A9? [a9.com...]

They're coming for you :)

Pfui

4:01 pm on Oct 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Nope. I just forgot how to type -- "Or A3?" -- in the OP. :)

dstiles

6:41 pm on Oct 3, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Pfui - the bit about silk being unique to amazon. The project notes seem to contradict that, especially since it's part of KDE?

Bill - shouldn't a search engine have some way of searching for things? A9 doesn't seem to have a search box. Or did I miss something? Anyway, if the bots come from AWS ranges they'll get rejected along with all the other rubbish that comes from the cloud.