
Forum Moderators: Ocean10000 & incrediBILL & keyplyr


A6 on its Knees

They are taking notice

     
3:43 am on Aug 10, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 13, 2003
posts:694
votes: 0


If not a course on ethics.


54.210.238.138 - - [08/Aug/2014] "GET / HTTP/1.1" 403 294 "-" "A6-Indexer/1.0 (http://www.a6corp.com/a6-web-scraping-policy/)"

Just for a chuckle I read their a6corp.com/a6-web-scraping-policy/

It didn't let me down.
1:07 pm on Aug 10, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5429
votes: 2


54.196.35.231 - - [20/Apr/2014:19:43:47
"A6-Indexer/1.0 (http://www.a6corp.com/a6-web-scraping-policy/)"

came as a follow-up request from a valid page request (w/supporting files)

And another (also followed by a valid page request)
50.30.41.53 - - [22/Mar/2014:18:09:40 -0600] "GET /MySub/SubSub/MyPage.html HTTP/1.1" 403 616 "-" "ADmantX Platform Semantic Analyzer - ADmantX Inc. - www. admantx .com - support@admantx .com"
54.82.49.102 - - [22/Mar/2014:18:10:16 -0600] "GET /robots.txt HTTP/1.1" 200 3210 "-" "A6-Indexer/1.0 (http://www.a6corp.com/a6-web-scraping-policy/)"
54.82.49.102 - - [22/Mar/2014:18:10:16 -0600] "GET //SameSub/SameSubSub/Same.html HTTP/1.1" 403 616 "-" "A6-Indexer/1.0 (http://www.a6corp .com/a6-web-scraping-policy/)"

Wonder if A6-Indexer is robots compliant?
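One way to answer that from the logs themselves, as a rough Python sketch. It assumes the site's robots.txt shuts out A6-Indexer entirely; the rules in `ROBOTS_TXT` are hypothetical, since the real file isn't shown in this thread, and `disallowed_requests` is just an illustrative name. It parses combined-format log lines with a regex and uses the standard library's `urllib.robotparser` to list requests the bot made that those rules would forbid.

```python
import re
from urllib.robotparser import RobotFileParser

# Apache combined log format, as in the lines quoted above.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"\s*$'
)

# ASSUMPTION: hypothetical robots.txt rules, not the poster's actual file.
ROBOTS_TXT = """\
User-agent: A6-Indexer
Disallow: /
"""


def disallowed_requests(log_lines, agent_token="A6-Indexer", robots_txt=ROBOTS_TXT):
    """Return (ip, path, status) for requests by agent_token that robots_txt forbids."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    hits = []
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m or agent_token not in m["agent"]:
            continue  # unparseable line, or some other visitor
        if m["path"] == "/robots.txt":
            continue  # fetching robots.txt itself is always allowed
        if not rules.can_fetch(agent_token, m["path"]):
            hits.append((m["ip"], m["path"], int(m["status"])))
    return hits
```

Fed the three lines quoted above, this flags only the `//SameSub/...` request, which the log shows arriving in the same second as the robots.txt fetch.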
1:45 pm on Aug 10, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Nov 5, 2005
posts: 2038
votes: 1


Wonder if A6-Indexer is robots compliant?

In a word? Hellno.

The Amazon-inbred A6 Corp (née SocialForce) scraper also appends stuff to plain file names, a la:

/filename.html&media_subtypes=1&ct=0
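For anyone wanting to spot that pattern in their own logs, here is a small Python sketch. The function name `has_stray_params` is made up for illustration, and the heuristic is an assumption about what counts as "appended stuff": `&key=value` pairs tacked onto a path that has no `?` at all.

```python
def has_stray_params(request_path: str) -> bool:
    """True if the path carries '&key=value' pairs without any '?' query marker."""
    if "?" in request_path or "&" not in request_path:
        return False
    # Require an '=' after the first '&' so names like /R&D/ are not flagged.
    return "=" in request_path.split("&", 1)[1]
```

A legitimate query string such as `/page.html?a=1&b=2` is left alone because it contains a `?`.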
2:25 pm on Aug 10, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 13, 2003
posts:694
votes: 0


Q/
We may re-scrape the page in the future depending on how frequently the content changes. We do not crawl pages, or follow links.

We do not republish content.

We reserve the right to change and update our policy without notice.

4. Contact
Email us at info@a6corp.com with questions.
/Q

Who could reasonably complain about such transparent [dis]honesty?
5:30 pm on Aug 10, 2014 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:6076
votes: 75


I asked them to stop and they did.
7:02 pm on Aug 10, 2014 (gmt 0)

New User

joined:Nov 17, 2013
posts: 6
votes: 0


54.0.0.0/8

don't have time or desire to ask them anything
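A quick Python sketch, using the standard `ipaddress` module, to check which of the addresses logged in this thread that one range actually catches. The helper name `is_blocked` is only illustrative. Bear in mind a /8 spans roughly 16.7 million addresses, so it is a blunt instrument.

```python
import ipaddress

# The range given above.
BLOCKED = ipaddress.ip_network("54.0.0.0/8")


def is_blocked(ip: str) -> bool:
    """True if the dotted-quad address falls inside the blocked range."""
    return ipaddress.ip_address(ip) in BLOCKED


# A6-Indexer addresses quoted in this thread.
for ip in ("54.210.238.138", "54.196.35.231", "54.82.49.102", "54.227.78.6"):
    print(ip, is_blocked(ip))
```

The ADmantX request from 50.30.41.53 falls outside the range and would need its own rule.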
10:27 pm on Aug 10, 2014 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:6076
votes: 75


@miscbyproduct
Yes - regardless of their range being blocked, they continued to fill our logs with endless 403s for months on end. I emailed Amazon asking them to stop and they did; simple as that.
11:14 pm on Aug 10, 2014 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:12991
votes: 287


We do not crawl websites.

Insert "You keep using that word" boilerplate.

Edit: I thought I'd never set eyes on them, but log search turned up an interesting one-off:

54.227.78.6 - - [19/Sep/2013:17:08:34 -0700] "GET /ebooks/blind/ThreeBlindMice.html HTTP/1.1" 403 1600 "-" "A6-Indexer/1.0 (http://www.a6corp.com/a6-web-scraping-policy/)" 

And your point is...?

Although that was the one and only occurrence of "a6corp", it was actually part of a more complex visit which I did note at the time. Other IPs involved were:
174.142.184.205 (page only; Carpathia Hosting)
198.72.106.168-190, 198.72.102.181-184 (iWeb; page again, plus about half of all images associated with this page)
all apparently triggered by a human visit from
207.32.55.abc
about half a minute earlier. The image requests began before either of the robotic page requests, and the a6corp request was randomly mixed in with the others. Given the size of my site, there is effectively zero possibility that these visits from various IPs were unrelated.
12:06 pm on Aug 11, 2014 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 13, 2003
posts:694
votes: 0


Creepy eh Lucy?

But that's what you get for advertising, and you must be according to the a6corp comedy script:

Q/
1. Focus
A6 Corporation is a Seattle based software company focused on the advancement of advertising technology. We scrape website content in order to utilize classification technologies that are designed to help advertisers execute highly-targeted campaigns.

2. General Policies
We do not crawl websites.
A6 connects into several of the major ad exchanges.
/Q

Surely you recall contracting "the major ad exchanges" Lucy?
Presumably one of whom is G Adsense, and no I've never touched them either.
 
