
Forum Moderators: Ocean10000 & incrediBILL


A6 on its Knees

They are taking notice

     
3:43 am on Aug 10, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If not a course on ethics.


54.210.238.138 - - [08/Aug/2014] "GET / HTTP/1.1" 403 294 "-" "A6-Indexer/1.0 (http://www.a6corp.com/a6-web-scraping-policy/)"

Just for a chuckle I read their a6corp.com/a6-web-scraping-policy/

It didn't let me down.
1:07 pm on Aug 10, 2014 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



54.196.35.231 - - [20/Apr/2014:19:43:47
"A6-Indexer/1.0 (http://www.a6corp.com/a6-web-scraping-policy/)"

came as a follow-up request from a valid page request (w/supporting files)

And another (also followed by a valid page request)
50.30.41.53 - - [22/Mar/2014:18:09:40 -0600] "GET /MySub/SubSub/MyPage.html HTTP/1.1" 403 616 "-" "ADmantX Platform Semantic Analyzer - ADmantX Inc. - www. admantx .com - support@admantx .com"
54.82.49.102 - - [22/Mar/2014:18:10:16 -0600] "GET /robots.txt HTTP/1.1" 200 3210 "-" "A6-Indexer/1.0 (http://www.a6corp.com/a6-web-scraping-policy/)"
54.82.49.102 - - [22/Mar/2014:18:10:16 -0600] "GET //SameSub/SameSubSub/Same.html HTTP/1.1" 403 616 "-" "A6-Indexer/1.0 (http://www.a6corp .com/a6-web-scraping-policy/)"

Wonder if A6-Indexer is robots compliant?
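
One rough way to check this from the logs is to feed the site's robots.txt rules to Python's standard `urllib.robotparser` and test each logged request against them. This is only a sketch: the rules in `ROBOTS_LINES` are hypothetical (the thread does not show the actual robots.txt), the function name is made up for illustration, and the regex assumes the combined log format seen in the excerpts above.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical rules; substitute the contents of your own robots.txt.
ROBOTS_LINES = [
    "User-agent: A6-Indexer",
    "Disallow: /",
]

# Captures the request path and the user agent from a combined-format log line.
LOG_RE = re.compile(
    r'"(?:GET|HEAD|POST) (\S+) [^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)

def robots_violations(log_lines, robots_lines=ROBOTS_LINES):
    """Return (path, user_agent) pairs for requests the rules disallow."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    violations = []
    for line in log_lines:
        match = LOG_RE.search(line)
        if not match:
            continue  # not a line this sketch understands
        path, agent = match.groups()
        if path == "/robots.txt":
            continue  # reading robots.txt itself is never a violation
        if not parser.can_fetch(agent, path):
            violations.append((path, agent))
    return violations
```

A robot that fetches robots.txt and then requests a disallowed page in the same second, as in the 22/Mar excerpt, would show up in the returned list if the real rules exclude it.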
1:45 pm on Aug 10, 2014 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Wonder if A6-Indexer is robots compliant?

In a word? Hellno.

The Amazon-inbred A6 Corp (née SocialForce) scraper also appends stuff to plain file names, a la:

/filename.html&media_subtypes=1&ct=0
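
Requests of that shape are easy to flag, since they carry `&` parameters without any `?` to start a query string. A minimal sketch, with a made-up function name; it assumes none of your legitimate file names contain an ampersand.

```python
def has_stray_parameters(request_path):
    """True if the path has '&' parameters but no '?' opening a query string."""
    return "&" in request_path and "?" not in request_path
```

Normal query strings such as `/filename.html?ct=0&x=1` are left alone because they contain a `?`.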
2:25 pm on Aug 10, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Q/
We may re-scrape the page in the future depending on how frequently the content changes. We do not crawl pages, or follow links.

We do not republish content.

We reserve the right to change and update our policy without notice.

4. Contact
Email us at info@a6corp.com with questions.
/Q

Who could reasonably complain about such transparent [dis]honesty?
5:30 pm on Aug 10, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I asked them to stop and they did.
7:02 pm on Aug 10, 2014 (gmt 0)



54.0.0.0/8

don't have time or desire to ask them anything
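
For anyone wanting to test log entries against that range before committing to it, Python's standard `ipaddress` module can do the membership check. A small sketch; the constant and function names are made up for illustration. Note that a /8 covers 16,777,216 addresses, so it is a very broad block.

```python
import ipaddress

# The range quoted above.
BLOCKED_RANGE = ipaddress.ip_network("54.0.0.0/8")

def is_in_blocked_range(ip_text):
    """True if the dotted-quad address falls inside BLOCKED_RANGE."""
    return ipaddress.ip_address(ip_text) in BLOCKED_RANGE
```

All of the A6-Indexer addresses quoted in this thread start with 54 and so fall inside the range; the ADmantX address 50.30.41.53 does not.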
10:27 pm on Aug 10, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



@ miscbyproduct
Yes - Regardless of their range being blocked, they continued to fill our logs with endless 403s for months on end. I emailed Amazon asking them to stop and they did; simple as that.
11:14 pm on Aug 10, 2014 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



We do not crawl websites.

Insert "You keep using that word" boilerplate.

Edit: I thought I'd never set eyes on them, but log search turned up an interesting one-off:

54.227.78.6 - - [19/Sep/2013:17:08:34 -0700] "GET /ebooks/blind/ThreeBlindMice.html HTTP/1.1" 403 1600 "-" "A6-Indexer/1.0 (http://www.a6corp.com/a6-web-scraping-policy/)" 

And your point is...?

Although that was the one and only occurrence of "a6corp", it was actually part of a more complex visit which I did note at the time. Other IPs involved were:
174.142.184.205 (page only; Carpathia Hosting)
198.72.106.168-190, 198.72.102.181-184 (iWeb; page again, plus about half of all images associated with this page)
all apparently triggered by a human visit from
207.32.55.abc
about half a minute earlier. The image requests began before either of the robotic page requests, and the a6corp request was randomly mixed in with the others. Given the size of my site, there is effectively zero possibility that these visits from various IPs were unrelated.
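
That kind of correlation can be roughed out in a script: pick the human visitor's IP, then list every request from other IPs that lands within a short window afterwards. A sketch only; the function names and the 60-second default are assumptions, and it matches on timing alone, so it will not reproduce the finer ordering details described above.

```python
import re
from datetime import datetime, timedelta

# Captures IP, timestamp and request path from a combined-format log line.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+) [^"]*"')

def parse_entry(line):
    """Return (ip, timestamp, path), or None if the line does not match."""
    match = LOG_RE.match(line)
    if not match:
        return None
    ip, stamp, path = match.groups()
    return ip, datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z"), path

def related_requests(lines, trigger_ip, window_seconds=60):
    """List (ip, path) for other IPs seen within the window after trigger_ip."""
    entries = [entry for entry in map(parse_entry, lines) if entry]
    window = timedelta(seconds=window_seconds)
    trigger_times = [ts for ip, ts, _ in entries if ip == trigger_ip]
    related = []
    for ip, ts, path in entries:
        if ip == trigger_ip:
            continue
        # Keep the request if it follows any trigger visit closely enough.
        if any(timedelta(0) <= ts - start <= window for start in trigger_times):
            related.append((ip, path))
    return related
```

On a small site, anything this returns is worth a second look; on a busy one the window would need to be much tighter, or combined with a match on the requested page.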
12:06 pm on Aug 11, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Creepy eh Lucy?

But that's what you get for advertising, and you must be according to the a6corp comedy script:

Q/
1. Focus
A6 Corporation is a Seattle based software company focused on the advancement of advertising technology. We scrape website content in order to utilize classification technologies that are designed to help advertisers execute highly-targeted campaigns.

2. General Policies
We do not crawl websites.
A6 connects into several of the major ad exchanges.
/Q

Surely you recall contracting "the major ad exchanges" Lucy?
Presumably one of whom is G Adsense, and no I've never touched them either.
 
