
Home / Forums Index / Search Engines / Search Engine Spider and User Agent Identification
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
A6 on its Knees
They are taking notice
Angonasec

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4694308 posted 3:43 am on Aug 10, 2014 (gmt 0)

If not a course on ethics.


54.210.238.138 - - [08/Aug/2014] "GET / HTTP/1.1" 403 294 "-" "A6-Indexer/1.0 (http://www.a6corp.com/a6-web-scraping-policy/)"

Just for a chuckle I read their a6corp.com/a6-web-scraping-policy/

It didn't let me down.

 

wilderness

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4694308 posted 1:07 pm on Aug 10, 2014 (gmt 0)

54.196.35.231 - - [20/Apr/2014:19:43:47
"A6-Indexer/1.0 (http://www.a6corp.com/a6-web-scraping-policy/)"

came as a follow-up request from a valid page request (w/supporting files)

And another (also followed by a valid page request)
50.30.41.53 - - [22/Mar/2014:18:09:40 -0600] "GET /MySub/SubSub/MyPage.html HTTP/1.1" 403 616 "-" "ADmantX Platform Semantic Analyzer - ADmantX Inc. - www. admantx .com - support@admantx .com"
54.82.49.102 - - [22/Mar/2014:18:10:16 -0600] "GET /robots.txt HTTP/1.1" 200 3210 "-" "A6-Indexer/1.0 (http://www.a6corp.com/a6-web-scraping-policy/)"
54.82.49.102 - - [22/Mar/2014:18:10:16 -0600] "GET //SameSub/SameSubSub/Same.html HTTP/1.1" 403 616 "-" "A6-Indexer/1.0 (http://www.a6corp .com/a6-web-scraping-policy/)"

Wonder if A6-Indexer is robots compliant?
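
One way to look into that question is to pull every request made by the agent out of the access log and see what it fetched after reading robots.txt. The following is a minimal sketch, assuming the log is in Apache combined format as in the lines quoted above; the names `LOG_RE`, `parse_line` and `requests_by_agent` are illustrative only. It shows what the bot requested, not whether those requests were permitted, since that depends on the site's own robots.txt rules.

```python
import re

# Apache combined log format:
# ip ident user [time] "method path protocol" status size "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of log fields, or None if the line is truncated or malformed."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None

def requests_by_agent(lines, needle="A6-Indexer"):
    """List (ip, path, status) for every request whose user-agent contains needle."""
    hits = []
    for line in lines:
        entry = parse_line(line)
        if entry and needle in entry["agent"]:
            hits.append((entry["ip"], entry["path"], int(entry["status"])))
    return hits

# The three log lines quoted in the post above, copied as posted.
SAMPLE = [
    '50.30.41.53 - - [22/Mar/2014:18:09:40 -0600] "GET /MySub/SubSub/MyPage.html HTTP/1.1" 403 616 "-" "ADmantX Platform Semantic Analyzer - ADmantX Inc. - www. admantx .com - support@admantx .com"',
    '54.82.49.102 - - [22/Mar/2014:18:10:16 -0600] "GET /robots.txt HTTP/1.1" 200 3210 "-" "A6-Indexer/1.0 (http://www.a6corp.com/a6-web-scraping-policy/)"',
    '54.82.49.102 - - [22/Mar/2014:18:10:16 -0600] "GET //SameSub/SameSubSub/Same.html HTTP/1.1" 403 616 "-" "A6-Indexer/1.0 (http://www.a6corp .com/a6-web-scraping-policy/)"',
]

for hit in requests_by_agent(SAMPLE):
    print(hit)
```

On the sample, the agent fetches /robots.txt and requests a page within the same second. Comparing the requested paths against the Disallow rules in robots.txt would be the next step.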

Pfui

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4694308 posted 1:45 pm on Aug 10, 2014 (gmt 0)

Wonder if A6-Indexer is robots compliant?

In a word? Hellno.

The Amazon-inbred A6 Corp (née SocialForce) scraper also appends stuff to plain file names, a la:

/filename.html&media_subtypes=1&ct=0
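
Requests like that are easy to spot mechanically, because the parameters are glued on with "&" and there is no "?" to start a query string. A minimal sketch follows, with hypothetical helper names (`has_stray_params`, `strip_stray_params`); it is a heuristic and would also flag any legitimate file name that happens to contain "&" followed by "=".

```python
def has_stray_params(path):
    """True when the path carries '&key=value' material but no '?' separator."""
    return "?" not in path and "&" in path and "=" in path.split("&", 1)[1]

def strip_stray_params(path):
    """Return the path with the appended '&...' junk removed, if any."""
    return path.split("&", 1)[0] if has_stray_params(path) else path

# The example from the post above, plus two ordinary requests for comparison.
for candidate in ("/filename.html&media_subtypes=1&ct=0",
                  "/filename.html",
                  "/page.html?a=1&b=2"):
    print(candidate, has_stray_params(candidate), strip_stray_params(candidate))
```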

Angonasec

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4694308 posted 2:25 pm on Aug 10, 2014 (gmt 0)

Q/
We may re-scrape the page in the future depending on how frequently the content changes. We do not crawl pages, or follow links.

We do not republish content.

We reserve the right to change and update our policy without notice.

4. Contact
Email us at info@a6corp.com with questions.
/Q

Who could reasonably complain about such transparent [dis]honesty?

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4694308 posted 5:30 pm on Aug 10, 2014 (gmt 0)

I asked them to stop and they did.

miscbyproduct



 
Msg#: 4694308 posted 7:02 pm on Aug 10, 2014 (gmt 0)

54.0.0.0/8

don't have time or desire to ask them anything
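
For anyone wanting to check which logged addresses a range like that would catch, here is a minimal sketch using Python's standard `ipaddress` module. The `BLOCKED` list and the `is_blocked` helper are illustrative names, not anyone's actual configuration. Bear in mind that a /8 covers 16,777,216 addresses, so it blocks far more than this one bot.

```python
import ipaddress

# The range named in the post above.
BLOCKED = [ipaddress.ip_network("54.0.0.0/8")]

def is_blocked(ip, networks=BLOCKED):
    """True if the address falls inside any of the blocked networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

# Addresses taken from the log lines quoted earlier in this thread.
for ip in ("54.210.238.138", "54.196.35.231", "54.82.49.102",
           "54.227.78.6", "50.30.41.53"):
    print(ip, is_blocked(ip))
```

All four A6-Indexer addresses quoted in the thread fall inside the range; the ADmantX address at 50.30.41.53 does not.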

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4694308 posted 10:27 pm on Aug 10, 2014 (gmt 0)

@miscbyproduct: Yes. Regardless of their range being blocked, they continued to fill our logs with endless 403s for months on end. I emailed Amazon asking them to stop and they did; simple as that.

lucy24

WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributor of the Month



 
Msg#: 4694308 posted 11:14 pm on Aug 10, 2014 (gmt 0)

We do not crawl websites.

Insert "You keep using that word" boilerplate.

Edit: I thought I'd never set eyes on them, but log search turned up an interesting one-off:

54.227.78.6 - - [19/Sep/2013:17:08:34 -0700] "GET /ebooks/blind/ThreeBlindMice.html HTTP/1.1" 403 1600 "-" "A6-Indexer/1.0 (http://www.a6corp.com/a6-web-scraping-policy/)"

And your point is...?

Although that was the one and only occurrence of "a6corp", it was actually part of a more complex visit which I did note at the time. Other IPs involved were:
174.142.184.205 (page only; Carpathia Hosting)
198.72.106.168-190, 198.72.102.181-184 (iWeb; page again, plus about half of all images associated with this page)
all apparently triggered by a human visit from
207.32.55.abc
about half a minute earlier. The image requests began before either of the robotic page requests, and the a6corp request was randomly mixed in with the others. Given the size of my site, there is effectively zero possibility that these visits from various IPs were unrelated.
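
That kind of correlation can be sketched in code: take the human visit as the trigger and list every other IP that shows up within a short window afterwards. The following is only an illustration. The function name `followers`, the 60-second window, the image path and every timestamp except the a6corp one (which comes from the log line above) are assumptions made up for the example, and "203.0.113.9" is a documentation-range address standing in for an unrelated later visitor.

```python
from datetime import datetime, timedelta

# Apache timestamp layout; %b expects English month names (default C locale).
APACHE_TIME = "%d/%b/%Y:%H:%M:%S %z"

def parse_time(stamp):
    return datetime.strptime(stamp, APACHE_TIME)

def followers(requests, trigger_ip, window_seconds=60):
    """IPs other than trigger_ip seen within window_seconds of its first request."""
    ordered = sorted(requests, key=lambda r: r[0])
    start = next((t for t, ip, _ in ordered if ip == trigger_ip), None)
    if start is None:
        return []
    end = start + timedelta(seconds=window_seconds)
    seen = []
    for t, ip, _ in ordered:
        if ip != trigger_ip and start <= t <= end and ip not in seen:
            seen.append(ip)
    return seen

PAGE = "/ebooks/blind/ThreeBlindMice.html"
REQUESTS = [
    # Hypothetical time for the human visit (the IP is obfuscated in the post).
    (parse_time("19/Sep/2013:17:08:04 -0700"), "207.32.55.abc", PAGE),
    # Hypothetical image request and time.
    (parse_time("19/Sep/2013:17:08:30 -0700"), "198.72.106.168", "/ebooks/blind/example.png"),
    # Timestamp taken from the quoted log line.
    (parse_time("19/Sep/2013:17:08:34 -0700"), "54.227.78.6", PAGE),
    # Hypothetical time.
    (parse_time("19/Sep/2013:17:08:36 -0700"), "174.142.184.205", PAGE),
    # Hypothetical unrelated visitor well outside the window.
    (parse_time("19/Sep/2013:18:30:00 -0700"), "203.0.113.9", "/"),
]

print(followers(REQUESTS, "207.32.55.abc"))
```

On a busy site a time window alone would produce plenty of false matches; it is the low traffic described above that makes the coincidence meaningful.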

Angonasec

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4694308 posted 12:06 pm on Aug 11, 2014 (gmt 0)

Creepy eh Lucy?

But that's what you get for advertising, and you must be, according to the a6corp comedy script:

Q/
1. Focus
A6 Corporation is a Seattle based software company focused on the advancement of advertising technology. We scrape website content in order to utilize classification technologies that are designed to help advertisers execute highly-targeted campaigns.

2. General Policies
We do not crawl websites.
A6 connects into several of the major ad exchanges.
/Q

Surely you recall contracting "the major ad exchanges" Lucy?
Presumably one of whom is G Adsense, and no I've never touched them either.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved