Any Downside to Blocking Amazonbot?

         

RubicCubed

11:54 am on Nov 10, 2022 (gmt 0)

Top Contributors Of The Month



We run an ecommerce store, and for the first time I've seen Amazonbot crawling our site. Allegedly Amazon does this to help Alexa answer questions. We once sold on Amazon and were not impressed with (a) their costly, anti-business, pro-consumer-fraud policies and (b) the low-quality customers they have. Is there any downside to blocking Amazonbot? What we sell is not sold on Amazon, and I'm not aware of Amazon sending Alexa users to sites outside of Amazon. Considering our past experiences with Amazon, it wouldn't surprise me if they were working with the Chinese to identify and knock off our products. We've previously blocked AWS from accessing our site, and we have full country blocks on China and on traffic from other undesirable countries.

crawl.amazonbot.amazon
(Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)

[developer.amazon.com...]

not2easy

1:28 pm on Nov 10, 2022 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Feel free to block any bots visiting that don't benefit your site goals. Given that background, I'd be blocking them via UA.

Have you done any checking into volume, destinations and hosts? I have not seen that UA and am curious whether they are a distributed bot.

lucy24

4:46 pm on Nov 10, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Heh. I'd forgotten about Amazonbot. There was a small cluster of requests earlier this year, but nothing after I disallowed them in robots.txt (always the first step in deciding whether to poke holes). That is, no requests at all, not even for robots.txt, so they must be fairly focused in what they want to look at.

Edit: The ones I’ve seen came from 3.224.220, where I have all of 3. flagged by default as bad_range. Oops, no, one set came from 23.22--but that's still AWS.
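
For anyone who wants to confirm that a robots.txt disallow says what they think it says before relying on it, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt body and the example.com URL are assumptions for illustration, not anyone's actual file. It only shows what a compliant crawler should conclude; it says nothing about whether a given bot obeys.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out Amazonbot, leave everyone else alone.
ROBOTS_TXT = """\
User-agent: Amazonbot
Disallow: /

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder, not a site from this thread.
    for agent in ("Amazonbot", "SomeOtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, "https://example.com/products/1"))
```

Pass the product token ("Amazonbot") rather than the full User-Agent header, since the parser matches on the part of the name before the first slash.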

RubicCubed

2:40 am on Nov 11, 2022 (gmt 0)

Top Contributors Of The Month



"Have you done any checking into volume, destinations and hosts?"

Amazonbot crawled the entire website in 46 minutes. IP addresses reverse to crawl.amazonbot.amazon so it's legit. Amazonbot hasn't been back since I took action (see below).
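
A reverse-lookup check like this is normally paired with a forward lookup on the returned hostname (forward-confirmed reverse DNS), since a PTR record alone can be set by whoever controls the IP block. Below is a rough Python sketch, assuming verified hosts are crawl.amazonbot.amazon or a subdomain of it, as reported here. The function names and the injectable resolver parameters are hypothetical, added so the logic can be exercised without touching the network.

```python
import socket
from typing import Callable, List

# Assumption: legitimate hosts are this name or a subdomain of it.
AMAZONBOT_DOMAIN = "crawl.amazonbot.amazon"


def default_reverse(ip: str) -> str:
    """PTR lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def default_forward(hostname: str) -> List[str]:
    """A lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_amazonbot(
    ip: str,
    reverse: Callable[[str], str] = default_reverse,
    forward: Callable[[str], List[str]] = default_forward,
) -> bool:
    """True only if ip reverses into the expected domain AND that
    hostname resolves forward to the same ip."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        in_domain = (
            hostname == AMAZONBOT_DOMAIN
            or hostname.endswith("." + AMAZONBOT_DOMAIN)
        )
        if not in_domain:
            return False
        return ip in forward(hostname)
    except OSError:
        # socket.herror and socket.gaierror are both OSError subclasses;
        # a failed lookup is treated as "not verified".
        return False
```

Calling is_verified_amazonbot(ip) with the defaults performs live DNS lookups, so results are worth caching if this runs per request.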

"The ones I’ve seen came from"

Amazonbot hit our site from the following:

23.22.35.
3.224.220.
52.70.240.
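
The three prefixes above can be tested against a log entry or a live request with Python's ipaddress module. This is only a sketch: it assumes each three-octet prefix stands for a /24, which is not stated here, and the list reflects what one site observed rather than any published range.

```python
import ipaddress

# Assumption: each three-octet prefix from the list is treated as a /24.
OBSERVED_NETS = [
    ipaddress.ip_network(cidr)
    for cidr in ("23.22.35.0/24", "3.224.220.0/24", "52.70.240.0/24")
]


def in_observed_ranges(ip: str) -> bool:
    """Return True if ip falls inside one of the observed prefixes."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in OBSERVED_NETS)
```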

At the present time I'm feeding Amazonbot 503 errors, which is the same as I have been doing with AWS. It's important to note we've had some, but very few, sales from AWS ranges. However, the bulk of the AWS traffic serves no legitimate purpose, so we'll accept the loss of the rare sale for the sake of security/savings. Next on tap is Zscaler, which I see clicked two different ads about a dozen times despite its IP range being blocked in Google Ads.

lucy24

4:05 am on Nov 11, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"At the present time I'm feeding Amazonbot 503 errors"
That seems counterproductive, as it sends a message of “Try again later”. In fact, almost an explicit message, since the code basically means “Sorry, can’t help you right now, but come back another time.”

If you don’t feel like hitting them with a plain 403--there can be reasons not to--why not return a manual 404? “Duhh, sorry, dunno, but I have no idea what you’re looking for.”
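
To make the difference concrete, here is a minimal sketch of answering by User-Agent with a 403 rather than a 503, written as a bare WSGI app in Python. WSGI, the token list, and the function names are assumptions for illustration, since the backend in question has not been described. A 404 can be swapped in by changing one constant.

```python
from http import HTTPStatus

# Hypothetical settings: which UA substrings to refuse, and with what code.
BLOCKED_UA_TOKENS = ("amazonbot",)
BLOCK_STATUS = HTTPStatus.FORBIDDEN  # or HTTPStatus.NOT_FOUND for a "manual 404"


def choose_status(user_agent: str) -> HTTPStatus:
    """Pick the response status from the User-Agent header alone."""
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_UA_TOKENS):
        return BLOCK_STATUS
    return HTTPStatus.OK


def app(environ, start_response):
    """Minimal WSGI application: empty body for blocked agents."""
    status = choose_status(environ.get("HTTP_USER_AGENT", ""))
    body = b"hello\n" if status is HTTPStatus.OK else b""
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response(f"{status.value} {status.phrase}", headers)
    return [body]
```

Matching on the User-Agent only stops clients that identify themselves honestly; it is a courtesy filter, not a security control.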

RubicCubed

12:29 pm on Nov 11, 2022 (gmt 0)

Top Contributors Of The Month



I'm limited by our backend system (not htaccess directly) to serving 503s, which is something I've been working on. Hopefully my push to have greater flexibility in creating rules will allow various status codes to be served.

Pfui

5:53 am on Nov 13, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've never seen an Amazonbot but I'm confused, Rubic. You say the IPs reverse to "crawl.amazonbot.amazon" but variations of that Hostname don't appear to exist, despite Amazon's examples on the bot's reference string.

When you've seen it, is the TLD actually .amazon (dot-amazon)? Because the company was initially and repeatedly turned down for that TLD beginning approx. 10 years ago: [icannwiki.org...]

If indeed .amazon, I've got some new denials to set up. Sigh.

lucy24

6:08 am on Nov 13, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"is the TLD actually .amazon (dot-amazon)?"
Coincidentally, earlier today I was on a site that professes to have lists of all domains everywhere. There's a whopping total of 26 .amazon domains (and similar numbers for .dell, .bing, .windows and the like, in case anyone wondered). Further perfunctory research says the TLD was authorized in 2019, probably in May, over the objections of Peru and Brazil.

domaintools admits to having heard of amazonbot.amazon, but will divulge no further information except that it was created 215 days ago. There's no website, though; it just redirects to an info page at developer.amazon.com

Pfui

2:54 pm on Nov 13, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Lucy, interestingly, MYIP.MS shows amazonbot.amazon reverses to a cloudfront.net (amazon/AWS/awsdns) address since 2019. Between G and AWS, it's a wonder there are any IPs left!

RubicCubed

12:24 pm on Nov 14, 2022 (gmt 0)

Top Contributors Of The Month



Given that .amazon resolves/redirects to amazon.com, there's little question of its legitimacy as a domain. The scant news about it is likely the result of objections from others who feel the .amazon domain is best reserved for the Amazon rainforest.

[reuters.com...]

The global Internet Corporation for Assigned Names and Numbers (ICANN), which oversees internet addresses, said on Monday it had decided to proceed with the designation requested by Amazon Inc pending a 30-day period of public comment after the eight nations bordering the world’s largest rainforest and the company failed to reach an agreement.

Brazil lamented the ICANN decision and said it should have opted for shared governance of the domain, the country’s foreign ministry said in a statement.

Martin Potter

6:50 pm on Nov 14, 2022 (gmt 0)

5+ Year Member Top Contributors Of The Month



I hope that some international society of mathematicians has already registered the .googol domain [en.wikipedia.org ] .
(Sorry, I have forgotten how to do that properly.)

lucy24

11:24 pm on Nov 14, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"I have forgotten how to do that properly"
Like this:
[url=https://en.wikipedia.org/wiki/Googol].googol domain[/url]

Martin Potter

12:39 am on Nov 15, 2022 (gmt 0)

5+ Year Member Top Contributors Of The Month



Thanks, Lucy! I had even forgotten that it is much the same as HTML. You're a Saviour.

tangor

2:25 am on Nov 16, 2022 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Chuckles ... OP asked about "downsides" ... I always ask are there any "benefits".

So far, no.