A6 Indexer

Heavy Activity

     
9:14 pm on Jul 10, 2018 (gmt 0)

Full Member

Top Contributors Of The Month

joined:July 3, 2015
posts:256
votes: 44


Is anyone familiar with this bot?

We had a ridiculous attack from them last night.

It brought our server cluster to its knees by creating thousands of simultaneous connections.

I was able to block at least 40-50 incoming IP strings - and then blocked the user agent as a whole.

100% of the IPs were coming from Amazon servers.

Most of the Amazon bots were using the A6 Indexer agent, but some of them were not.

Not that much info about them out there - other than they always use Amazon's servers to scrape.

A few sites connected them to an ad company called A6 Corp - but they don't seem to be around any longer and their website is now some type of portal for medical information.

Someone is obviously running the bots. I complained to Amazon and they are having trouble tracking down the culprit due to the sheer number of changing IPs.
10:25 pm on July 10, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12582
votes: 841


Is anyone familiar with this bot?
Thanks vegasrick - yes, A6-Indexer is A6 Corp's bot. Don't know why you consider it "an attack." All they are doing is requesting files. That's what bots do.

If you send them an email including a log snippet (sans any personal info) they will stop hitting your server. Keep on them.

[webmasterworld.com...]

There are hundreds of independent bots using AWS (Amazon Web Services).

Amazon IP ranges [webmasterworld.com]

Server Farm IP Ranges [webmasterworld.com]

Please use the Site Search in the upper-right corner of any page *before* you post UAs or IP ranges. Most of these have been documented before. But if you find something new, then we want to know about it :)

[fix typo]

[edited by: keyplyr at 5:18 am (utc) on Jul 12, 2018]

10:55 pm on July 10, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 30, 2002
posts: 2646
votes: 97


Deepsixed them years ago. If they are using the same user agent, use the filtering function of iptables to drop queries containing the agent string.

Regards...jmcc
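
For anyone without access to iptables, the same idea (drop anything whose agent string contains the bot's name) can be sketched at the application layer instead. This is not the iptables string match mentioned above but a swapped-in alternative: a small Python WSGI wrapper. The names `block_user_agent`, `demo_app` and `status_for` are made up for illustration, and the only assumption about the bot is that it keeps sending "A6-Indexer" in its User-Agent header.

```python
def block_user_agent(app, blocked_substrings=("A6-Indexer",)):
    """Wrap a WSGI app; answer 403 when the User-Agent contains a blocked substring."""
    lowered = tuple(s.lower() for s in blocked_substrings)

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(s in user_agent for s in lowered):
            # Refuse before the request reaches the application (and its database).
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)

    return middleware


def demo_app(environ, start_response):
    """Hypothetical stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


def status_for(user_agent):
    """Push one fake request through the wrapped app and return its status line."""
    seen = []
    wrapped = block_user_agent(demo_app)
    body = wrapped({"HTTP_USER_AGENT": user_agent},
                   lambda status, headers: seen.append(status))
    b"".join(body)  # consume the response body
    return seen[0]
```

Calling `status_for("A6-Indexer")` gives the 403 status line, while an ordinary browser agent passes through to the app. A filter like this only helps while the bot keeps announcing itself; it does nothing against requests sent under a different agent string.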
11:07 pm on July 10, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15119
votes: 676


Oh, that's funny: I'd entirely forgotten about that old thread. As far as I knew, I'd only started seeing A6-Indexer at the beginning of this year, when it showed up in response to a directory listing. Since they comprehensively ignore
User-Agent: A6-Indexer
Disallow: /
(honestly now, it's not as if they can say "Oh, sorry, I didn't know you meant me" when that is their complete, litteratim UA string) it seems a bit disingenuous for them to claim they will go away if you ask them to. Isn't that like spam email that professes to honor an opt-out when you never opted in?
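
As a sanity check that the two-line rule really does name them, here is a minimal sketch using Python's standard `urllib.robotparser`. It only shows what a compliant crawler would conclude from that robots.txt; it says nothing about what the bot actually does. The helper name `allowed`, the test path and the "SomeOtherBot" agent are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule quoted in the post.
ROBOTS_TXT = """\
User-Agent: A6-Indexer
Disallow: /
"""


def allowed(user_agent, path="/some/page.html"):
    """Return True if a robots.txt-compliant crawler with this UA may fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```

Both `allowed("A6-Indexer")` and the longer 2014 string `allowed("A6-Indexer/1.0 (http://www.a6corp.com/a6-web-scraping-policy/)")` come back False, while an unrelated agent such as "SomeOtherBot" comes back True. So the rule is unambiguous for either version of the UA.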

:: detour to raw logs ::

Oh, I see. In the latter months of 2014 there were a few visits from
A6-Indexer/1.0 (http://www.a6corp.com/a6-web-scraping-policy/)
(insert boilerplate about UAs calculated to inspire confidence) changing in November to a handful using the shorter version, and then they disappeared entirely until this past January.

Edit: Oh, right, and the whole thing becomes a bit moot when their new, bite-sized UA string contains no contact information.
1:08 am on July 11, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12582
votes: 841


I have yet to read a definitive mission for A6-Indexer. I think the old A6 bot started out as Amazon's search bot. However, Amazon's search met with an early demise and now is basically just a search index for Amazon's properties & products.

A6-Indexer seems to be associated with Social Media. If your site gets mentioned on any of the SM platforms, you'll see A6-Indexer in your logs.

I've never seen it abusing my servers. IMO extremely heavy activity is not abuse; not very good netizenship, but not abuse.
7:52 pm on July 11, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12582
votes: 841


@lucy24 - that bot info page (http://www.a6corp.com/a6-web-scraping-policy/) now returns a 404, but the entire site just talks about health & vitamins.

I think it was the same thing the last time I looked a year or two ago. The domain was probably acquired, hence the removal of the info link from the UA string.
3:42 am on July 12, 2018 (gmt 0)

Full Member

Top Contributors Of The Month

joined:July 3, 2015
posts: 256
votes: 44


I consider it an attack when the bot has 1200 simultaneous connections to each webserver - 2400 connections in all - pounding me over and over. They were causing our forum's mysql server to crash, with that many connections going through millions of threads at rapid speed.

What's odd is they were hitting the same URLs (on our frontend), over and over.

Rather than block any Amazon IPs, I only blocked A6-Indexer's bot through Cloudflare. I saw them blocked today, at least 2,000 times alone.
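
For anyone who wants to put numbers on this kind of activity from their own logs, here is a rough Python sketch that tallies hits per IP for one agent string. It assumes the common "combined" access log format with the User-Agent as the last quoted field. The sample lines are invented for illustration (documentation-range IPs), not taken from our servers, and `hits_by_ip` is a made-up helper name.

```python
import re
from collections import Counter

# Combined log format: the client IP comes first, the User-Agent is the last quoted field.
LOG_LINE = re.compile(r'^(?P<ip>\S+) .* "(?P<ua>[^"]*)"$')


def hits_by_ip(lines, ua_substring="A6-Indexer"):
    """Count requests per client IP whose User-Agent contains ua_substring."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and ua_substring.lower() in match.group("ua").lower():
            counts[match.group("ip")] += 1
    return counts


# Invented sample lines, for illustration only.
SAMPLE = [
    '203.0.113.5 - - [10/Jul/2018:21:00:01 +0000] "GET /forum/ HTTP/1.1" 200 512 "-" "A6-Indexer"',
    '203.0.113.5 - - [10/Jul/2018:21:00:01 +0000] "GET /forum/ HTTP/1.1" 200 512 "-" "A6-Indexer"',
    '198.51.100.7 - - [10/Jul/2018:21:00:02 +0000] "GET /forum/ HTTP/1.1" 200 512 "-" "A6-Indexer"',
    '192.0.2.9 - - [10/Jul/2018:21:00:03 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]
```

`hits_by_ip(SAMPLE).most_common()` lists the busiest IPs first. A tally like this only catches requests that announce the agent string, so it would miss the ones arriving under a different UA.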

I never heard of them before.

Not sure about it being Amazon's bot. A lot of what I read has their bot associated with an advertising company called A6 Corp, which openly admitted to scraping websites. The site is now a medical site, so I have no idea who is using them - but I doubt Amazon would buy some scraping mechanism from a defunct ad network.
4:15 am on July 12, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12582
votes: 841


millions of threads at rapid speed
That's a pretty good trick considering AWS has limiters for abuse of this nature. This must be something unique and I would be persistent and prove to AWS exactly what happened.

Yes, A6 Corp using AWS.

However, considering your front end is Cloudflare, that's likely the problem. Cloudflare has distributed the requests so your server received 100x what would normally have been just a basic crawl, like everyone else sees with this UA.
4:32 am on July 12, 2018 (gmt 0)

Full Member

Top Contributors Of The Month

joined:July 3, 2015
posts: 256
votes: 44


Here is the thing: if A6 Corp went under, who is using the bot technology and for what reason?
4:36 am on July 12, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12582
votes: 841


Sorry, I didn't explain it well.

At one time the A6 bot, run by A6 Corp, was used by Amazon to index Amazon's properties & products. Many big companies use a 3rd party search company. Example: Yahoo once used an upstart named Google, and now uses Bing.

Some webmasters also suspected that Amazon was building an index of websites for a purpose yet to be known. At some point a few years ago, Amazon stopped using this bot to crawl our websites, but it appears they still use it (or a version of it) to crawl their own properties.

The same A6 Corp runs the current A6-Indexer. A6 Corporation is mapping the Internet Topology for their products: Ad Targeting, Advertising, Fraud Detection.
4:46 am on July 12, 2018 (gmt 0)

Full Member

Top Contributors Of The Month

joined:July 3, 2015
posts: 256
votes: 44


I tried my best to find some form of contact for A6 Corp and can't find anything. Their LinkedIn, Facebook, etc. are all erased. Numbers are disconnected. The website's URL has been taken over by someone else.
4:53 am on July 12, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12582
votes: 841


A6 Corporation maintains the Internet Topology, a living map of the internet that tracks 13.5 billion relationships between URLs, calculating the rank of each web page within one or more subject realms. The Internet Topology is used by online advertisers to increase response rates and avoid fraud, and by website publishers to increase revenues on...
source: crunchbase.com/organization/a6-corporation

On that page they list their Twitter account as: @a6corp and when you go to that Twitter page they list their website as: http://a6corp.com ...so same company.

If you publish ads, you may wish to allow this UA through your AWS IP range filters.

[added]
I just removed the IP/UA filter and allowed A6-Indexer access. So far they have been unobtrusive and compliant.
1:40 pm on July 22, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 30, 2002
posts: 2646
votes: 97


Just saw them trying to hammer a site. They also seem to be using stealth scrapers via Digital Ocean and cnode.io in addition to Amazon IPs. Recommend a permanent deepsix.

Regards...jmcc
7:47 pm on July 22, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12582
votes: 841


Recommend a permanent deepsix
Why? Just because you see no benefit from this bot you are recommending that everyone else also block it?

You should consider that not all site owners have the same interests as you. Some may see a clear benefit from what this bot offers.
7:59 pm on July 22, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 30, 2002
posts: 2646
votes: 97


And just what does this bot offer?

Regards...jmcc
8:02 pm on July 22, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12582
votes: 841


Assassination prior to investigation?

As with all UAs, you need to do the research.
8:05 pm on July 22, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15119
votes: 676


unobtrusive and compliant
Compliant with what?
8:09 pm on July 22, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 30, 2002
posts: 2646
votes: 97


Assassination prior to investigation?
More like termination with extreme prejudice. They caused a problem on my site trying to download thousands of pages. They used multiple IP addresses to evade limiting. They were deepsixed. And I don't believe their waffle about Internet Topology either.

Regards...jmcc
8:16 pm on July 22, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12582
votes: 841


Block the bots that are of no benefit to *you*, but that does not mean the bot is not beneficial to *other* site owners.

Bots built to scrape data and package it into various products that marketers rely on are absolutely essential for the entire ad business.

That's why I wonder about the thinking of those who block all bots, but publish ads on their site & then complain they aren't making enough money (just one example).
8:25 pm on July 22, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 30, 2002
posts: 2646
votes: 97


To put it in some kind of perspective: there are over 500,000,000 pages (>500 million) on the site they hit. Their a6corp website is a splog. Their Twitter account has been dead since 2014. Maybe you believe that rubbish about Internet Topology but I don't. I can't take the risk of letting some maggots bring servers down.

A dead corporate website turned into a splog. A dead Twitter account. An aged mention on Crunchbase. This does not look like a legitimate operation with the best interests of the webmasters at heart.

Regards...jmcc
8:54 pm on July 22, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12582
votes: 841


This does not look like a legitimate operation with the best interests of the webmasters at heart
That's an unreasonable statement. Why would any botrunner have "the best interests of the webmasters at heart"?

They represent their own interests & those of their clients. Occasionally, our interests and their interests overlap and there is some benefit for us. That's when the site owner weighs the pros/cons of allowing the UA access to their web properties.

Do the research, make the decision... that simple.
9:00 pm on July 22, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 30, 2002
posts:2646
votes: 97


You are the one claiming that the ad industry depends on such. However, the A6corp website is a splog. Its Twitter a/c is dead. Its FB page is gone. Its LinkedIn page is gone. It tried to evade rate limiting by using multiple IPs. That's the research. Those are the facts.

<snip>

Regards...jmcc

[edited by: keyplyr at 10:22 pm (utc) on Jul 22, 2018]
[edit reason] removed derogatory statement [/edit]

9:13 pm on July 22, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15119
votes: 676


:: detour to World's Leading Search Engine ::

Huh. Today I learned a new word.
9:15 pm on July 22, 2018 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member brotherhood_of_lan is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 30, 2002
posts:4931
votes: 27


Do the research


I rarely partake in the whack-a-mole of UA IDing but I suspect that 99.9% of all threads on here result in a block, given the ones I've read in the past.

From what I've gathered from the knowledge here, the general consensus is that if it doesn't A) obey robots.txt, B) rate limit, and C) have a generally symbiotic mission statement, then it's banned.
9:24 pm on July 22, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12582
votes: 841


There's nothing "whack a mole" about website security. I hear the term often, usually from those who are unwilling to invest the time it takes to effectively secure their web documents & their users from the ongoing threats on today's everchanging netscape.

Most bots do not support robots.txt. That is not applicable to whether they are beneficial to the site owner's interests. Unfortunately, some beneficial agents do not include a comprehensive mission statement. That's why it's necessary to do the research.
9:28 pm on July 22, 2018 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Aug 30, 2002
posts: 2646
votes: 97


Most bots do not support robots.txt. That is not applicable to whether they are beneficial to the site owner's interests. Unfortunately, some beneficial agents do not include a comprehensive mission statement. That's why it's necessary to do the research.
To point it out again: their website is a splog. There are no contact details or "about page" in the UA. Their Twitter a/c has been dead since 2014. Their Facebook page has been deleted and ditto for their LinkedIn page. All that's left is some rubbish about Internet Topology in a Crunchbase article from 2011. To use an old New York expression: A perp is a perp is a perp. :)

Regards...jmcc
9:41 pm on July 22, 2018 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12582
votes: 841


@jmccormac - we get it. You see no benefit to A6 Indexer and block it.

Let other site owners make their own decisions.
 
