
Forum Moderators: Ocean10000 & keyplyr

A6 Indexer

Heavy Activity

     
9:14 pm on July 10, 2018 (gmt 0)

Full Member

Top Contributors Of The Month

joined:July 3, 2015
posts:254
votes: 44


Is anyone familiar with this bot?

We had a ridiculous attack from them last night.

It brought our server cluster to its knees by creating thousands of simultaneous connections.

I was able to block at least 40-50 incoming IP addresses, and then blocked the user agent as a whole.

100% of the IPs were coming from Amazon servers.

Most of the Amazon bots were using the A6 Indexer agent, but some of them were not.
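A rough way to compile a list like those 40-50 addresses is to tally requests per client IP for log lines whose user agent contains "A6-Indexer". This is a minimal sketch that assumes Apache/Nginx "combined" format access logs. The sample lines and their documentation-range IPs are hypothetical. Some of the Amazon-hosted requests did not use this UA, so a UA match alone will miss them.

```python
import re
from collections import Counter

# Assumed "combined" log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def count_ips_for_agent(lines, agent_substring):
    """Count requests per client IP whose User-Agent contains agent_substring."""
    counts = Counter()
    needle = agent_substring.lower()
    for line in lines:
        m = COMBINED.match(line)
        if m and needle in m.group("ua").lower():
            counts[m.group("ip")] += 1
    return counts

# Hypothetical log lines (documentation IPs, not real A6-Indexer traffic).
sample = [
    '192.0.2.10 - - [10/Jul/2018:21:00:01 +0000] "GET /forum/ HTTP/1.1" 200 5120 "-" "A6-Indexer"',
    '192.0.2.10 - - [10/Jul/2018:21:00:01 +0000] "GET /forum/ HTTP/1.1" 200 5120 "-" "A6-Indexer"',
    '198.51.100.7 - - [10/Jul/2018:21:00:02 +0000] "GET / HTTP/1.1" 200 900 "-" "A6-Indexer"',
    '203.0.113.5 - - [10/Jul/2018:21:00:03 +0000] "GET / HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
]

for ip, n in count_ips_for_agent(sample, "A6-Indexer").most_common():
    print(ip, n)
```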

There isn't much info about them out there, other than that they always use Amazon's servers to scrape.

A few sites connected them to an ad company called A6 Corp, but they don't seem to be around any longer, and their website is now some type of portal for medical information.

Someone is obviously running the bots. I complained to Amazon, and they are having trouble tracking down the culprit because of the sheer number of changing IPs.
10:25 pm on July 10, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12083
votes: 770


"Is anyone familiar with this bot?"
Thanks, vegasrick. Yes, A6-Indexer is A6 Corp's bot. I don't know why you consider it "an attack." All they are doing is requesting files. That's what bots do.

If you send them an email including a log snippet (sans any personal info) they will stop hitting your server. Keep on them.

[webmasterworld.com...]

There are hundreds of independent bots using AWS (Amazon Web Services).

Amazon IP ranges [webmasterworld.com]

Server Farm IP Ranges [webmasterworld.com]
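If you filter on Amazon's ranges, Amazon publishes its address ranges as a JSON file, ip-ranges.json, whose "prefixes" entries carry an "ip_prefix" field. Below is a minimal sketch of the membership check using Python's ipaddress module. The prefixes are documentation ranges standing in for real AWS data, so load the published file before using this for actual filtering.

```python
import ipaddress
import json

# Shape mirrors Amazon's ip-ranges.json ("prefixes" entries with "ip_prefix").
# These prefixes are hypothetical stand-ins (documentation ranges), not AWS data.
SAMPLE_IP_RANGES = json.loads("""
{
  "prefixes": [
    {"ip_prefix": "192.0.2.0/24",    "region": "us-east-1", "service": "EC2"},
    {"ip_prefix": "198.51.100.0/24", "region": "us-west-2", "service": "EC2"}
  ]
}
""")

def load_networks(doc):
    """Turn the IPv4 'prefixes' entries into ip_network objects."""
    return [ipaddress.ip_network(p["ip_prefix"]) for p in doc.get("prefixes", [])]

def in_ranges(ip, networks):
    """True if the address falls inside any of the given networks."""
    addr = ipaddress.ip_address(ip)
    # Mixing IPv4 and IPv6 is safe: `in` simply returns False.
    return any(addr in net for net in networks)

nets = load_networks(SAMPLE_IP_RANGES)
print(in_ranges("192.0.2.10", nets))   # True
print(in_ranges("203.0.113.5", nets))  # False
```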

Please use the Site Search in the upper-right corner of any page *before* you post UAs or IP ranges. Most of these have been documented before. But if you find something new, then we want to know about it :)

[fix typo]

[edited by: keyplyr at 5:18 am (utc) on Jul 12, 2018]

10:55 pm on July 10, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 30, 2002
posts: 2640
votes: 96


Deepsixed them years ago. If they are using the same user agent, use the string-match filtering in iptables to drop packets containing the agent string.
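iptables string matching works at the packet level. If you'd rather refuse the UA inside the application, a different technique is a small filter in front of the site. Below is a minimal sketch that assumes a Python WSGI stack. The middleware, the stand-in app, and the test helper are hypothetical names. The filter returns 403 before expensive work such as database queries runs, although the connection itself is still accepted, unlike with a firewall drop.

```python
# Substrings to refuse, matched case-insensitively against the User-Agent.
BLOCKED_AGENTS = ("a6-indexer",)

class BlockUserAgents:
    """Hypothetical WSGI middleware: 403 if the User-Agent contains a blocked substring."""

    def __init__(self, app, blocked=BLOCKED_AGENTS):
        self.app = app
        self.blocked = tuple(b.lower() for b in blocked)

    def __call__(self, environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(b in ua for b in self.blocked):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Not blocked: hand the request to the real site.
        return self.app(environ, start_response)

def hello_app(environ, start_response):
    """Stand-in for the real site (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]

app = BlockUserAgents(hello_app)

def fake_request(ua):
    """Call the app directly (no server) and capture the status line and body."""
    status = []
    body = app({"HTTP_USER_AGENT": ua}, lambda s, headers: status.append(s))
    return status[0], b"".join(body)

print(fake_request("A6-Indexer/1.0"))
print(fake_request("Mozilla/5.0"))
```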

Regards...jmcc
11:07 pm on July 10, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14905
votes: 649


Oh, that's funny: I'd entirely forgotten about that old thread. As far as I knew, I'd only started seeing A6-Indexer at the beginning of this year, when it showed up in response to a directory listing. Since they comprehensively ignore
User-Agent: A6-Indexer
Disallow: /
(honestly now, it's not as if they can say "Oh, sorry, I didn't know you meant me" when that is their complete, literatim UA string), it seems a bit disingenuous for them to claim they will go away if you ask them to. Isn't that like spam email that professes to honor an opt-out when you never opted in?
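Python's standard robots.txt parser confirms that this rule addresses the bot by name. The parser matches on the token before the "/", so it covers both the bare "A6-Indexer" string and the older "A6-Indexer/1.0" form. This sketch shows only what a compliant crawler would conclude. robots.txt is advisory and says nothing about whether the bot actually obeys it. The example.com URL is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule quoted above, as a site's robots.txt would serve it.
ROBOTS_TXT = """\
User-Agent: A6-Indexer
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Matching uses the product token before any "/version" suffix, case-insensitively.
for ua in ("A6-Indexer", "A6-Indexer/1.0", "Googlebot"):
    print(ua, rp.can_fetch(ua, "https://example.com/forum/"))
```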

:: detour to raw logs ::

Oh, I see. In the latter months of 2014 there were a few visits from
A6-Indexer/1.0 (http://www.a6corp.com/a6-web-scraping-policy/)
(insert boilerplate about UAs calculated to inspire confidence) changing in November to a handful using the shorter version, and then they disappeared entirely until this past January.

Edit: Oh, right, and the whole thing becomes a bit moot when their new, bite-sized UA string contains no contact information.
1:08 am on July 11, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12083
votes: 770


I have yet to read a definitive statement of A6-Indexer's purpose. I think the old A6 bot started out as Amazon's search bot. However, Amazon's search met an early demise and is now basically just a search index for Amazon's properties & products.

A6-Indexer seems to be associated with social media. If your site gets mentioned on any of the SM platforms, you'll see A6-Indexer in your logs.

I've never seen it abusing my servers. IMO, extremely heavy activity is not abuse; it's not very good netizenship, but it's not abuse.
7:52 pm on July 11, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12083
votes: 770


@lucy24 - that bot info page (http://www.a6corp.com/a6-web-scraping-policy/) now returns a 404, but the entire site just talks about health & vitamins.

I think it was the same thing the last time I looked, a year or two ago. The domain was probably acquired, hence the removal of the info link from the UA string.
3:42 am on July 12, 2018 (gmt 0)

Full Member

Top Contributors Of The Month

joined:July 3, 2015
posts: 254
votes: 44


I consider it an attack when the bot has 1200 simultaneous connections to each webserver, 2400 connections in all, pounding me over and over. They were causing our forum's MySQL server to crash, with that many connections going through millions of threads at rapid speed.

What's odd is that they were hitting the same URLs (on our front end) over and over.

Rather than block any Amazon IPs, I blocked only A6-Indexer's bot through Cloudflare. Today alone, I saw it blocked at least 2,000 times.

I'd never heard of them before.

Not sure about it being Amazon's bot. A lot of what I read associates their bot with an advertising company called A6 Corp, which openly admitted to scraping websites. The site is now a medical site, so I have no idea who is using them, but I doubt Amazon would buy a scraping mechanism from a defunct ad network.
4:15 am on July 12, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12083
votes: 770


"millions of threads at rapid speed"
That's a pretty good trick considering AWS has limiters for abuse of this nature. This must be something unique and I would be persistent and prove to AWS exactly what happened.

Yes, A6 Corp is using AWS.

However, considering your front end is Cloudflare, that's likely the problem. Cloudflare distributed the requests, so your server received 100x what would normally have been just a basic crawl, like everyone else sees with this UA.
4:32 am on July 12, 2018 (gmt 0)

Full Member

Top Contributors Of The Month

joined:July 3, 2015
posts: 254
votes: 44


Here's the thing: if A6 Corp went under, who is using the bot technology, and for what reason?
4:36 am on July 12, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12083
votes: 770


Sorry, I didn't explain it well.

At one time the A6 bot, run by A6 Corp, was used by Amazon to index Amazon's properties & products. Many big companies use a third-party search company. For example, Yahoo once used an upstart named Google, and now uses Bing.

Some webmasters also suspected that Amazon was building an index of websites for a purpose yet to be known. At some point a few years ago, though, Amazon stopped using this bot to crawl our websites; it appears they still use it (or a version of it) to crawl their own properties.

The same A6 Corp runs the current A6-Indexer. A6 Corporation is mapping the Internet Topology for their products: Ad Targeting, Advertising, Fraud Detection.
4:46 am on July 12, 2018 (gmt 0)

Full Member

Top Contributors Of The Month

joined:July 3, 2015
posts: 254
votes: 44


I tried my best to find some form of contact for A6 Corp and can't find anything. Their LinkedIn, Facebook, etc. are all erased. Their numbers are disconnected, and their website's URL has been taken over by someone else.
4:53 am on July 12, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12083
votes: 770


A6 Corporation maintains the Internet Topology, a living map of the internet that tracks 13.5 billion relationships between URLs, calculating the rank of each web page within one or more subject realms. The Internet Topology is used by online advertisers to increase response rates and avoid fraud, and by website publishers to increase revenues on...
source: crunchbase.com/organization/a6-corporation

On that page they list their Twitter account as: @a6corp and when you go to that Twitter page they list their website as: http://a6corp.com ...so same company.

If you publish ads, you may wish to allow this UA through your AWS IP range filters.

[added]
I just removed the IP/UA filter and allowed A6-Indexer access. So far they have been unobtrusive and compliant.