Apple search engine bot?

         

johnhh

8:56 pm on Feb 13, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just noticed a 'heavy hitter' bot called Fetcher/0.1 that appears to be coming from an Apple Inc. IP address.

Is this new, or am I just slow on the uptake?

blend27

3:03 pm on Feb 14, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



not here.

maybe an earlier version of [webmasterworld.com...]

lucy24

8:30 pm on Feb 14, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Either brand-new or selective. I did a quick search of saved logs (up to a few days ago) and came up cold. You have to admit that "0.1" sounds extremely new.

But one of my weird corollary discoveries after splitting a website is that not all robots crawl everywhere.

trintragula

10:04 pm on Feb 14, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



It showed up at my place yesterday evening. 20 hits, blocked from the get-go. That was more than 24 hours ago now, so I'd have to download the raw logs to find out exactly what they did. Probably as much as they could under the circumstances. They appear to have been well-behaved.

Mozilla/5.0 (compatible; Fetcher/0.1)
from
17.228.4.nnn

I've previously seen
python-requests/1.2.3 CPython/2.7.5+ Linux/3.11.0-20-generic
from 3 places in 17.142/16, most recently in November, but only single hits.
This is also the San Jose Apple Engineering division. So, same people.

I concur with Lucy about bots not going everywhere. I've been frequently visited by Netshelter Contentscan (out of AWS) for months, but no-one here has mentioned them.
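
For anyone who wants to flag these hits in their own logs, here is a minimal sketch that checks both the user-agent string and whether the address falls inside 17/8. The function name and the sample addresses are made up for illustration; only the UA string and the 17/8 range come from this thread.

```python
import ipaddress

# 17.0.0.0/8 is the range discussed in this thread.
APPLE_NET = ipaddress.ip_network("17.0.0.0/8")

# User-agent token as reported above.
FETCHER_TOKEN = "Fetcher/0.1"


def is_fetcher_from_17_8(ip: str, user_agent: str) -> bool:
    """Return True when the hit is inside 17/8 AND carries the Fetcher UA.

    Hypothetical helper: both conditions must hold, so ordinary Safari
    traffic from the same range is not flagged.
    """
    return ipaddress.ip_address(ip) in APPLE_NET and FETCHER_TOKEN in user_agent


if __name__ == "__main__":
    ua = "Mozilla/5.0 (compatible; Fetcher/0.1)"
    # 17.228.4.1 is an illustrative address, not one taken from real logs.
    print(is_fetcher_from_17_8("17.228.4.1", ua))
    print(is_fetcher_from_17_8("17.228.4.1", "Mozilla/5.0 (Macintosh) Safari"))
```

Matching on both fields rather than on the IP range alone keeps the check from catching human visitors in the same block.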

keyplyr

7:50 am on Feb 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Any human traffic come from 17/8? Apple use any part of it for customer connectivity? Any reason not to block this range?

trintragula

10:59 am on Feb 15, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



I've seen visits from there with OS X 10_10 or 10_9 Safari in recent versions, which is what you might expect their 40000 employees to be using if they were visiting at lunchtime.
Having said that I get so little traffic of any kind from 17/8 that I wouldn't regard it as a source of trouble.

keyplyr

11:42 am on Feb 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They appear to have been well-behaved.

IMO a covert agent that does not declare what it is, where it's from or what it will do with my files is not well behaved.

I wouldn't regard it as a source of trouble.

Depends what Fetcher/0.1 is up to.

I've been frequently visited by Netshelter Contentscan (out of AWS) for months, but no-one here has mentioned them.

Probably because most here block AWS.

toidi

12:59 pm on Feb 15, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



So, y'all are going to block what might be the next big search engine?

aristotle

2:28 pm on Feb 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I did a web search for Fetcher/0.1 and found the following article:
[thesempost.com] [The Apple Search Bot is Actually a Crawler to Improve Siri Not New Search Engine]
It began popping up in the middle of October with Apple’s IP range with a NetName of APPLE-WWNET as “Mozilla/5.0 (compatible; Fetcher/0.1)” written in Go...

But it would appear that the reality isn’t nearly as sexy as the possibility that Apple is laying down groundwork for a brand new search engine. It is actually used by data scientists using statistical analysis and machine learning to improve Siri’s accuracy and performance.

trintragula

3:10 pm on Feb 15, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



I don't think what people do here will interfere with the long term success of any new search engines.

I've blocked 1049 page requests from AWS today. And overlooked 3 which probably should have been blocked.
But I don't block AWS because it's AWS - and I occasionally get good traffic from them. 3 of my forum members have posted from AWS in the last 12 months - and that's not counting ranges that were bought from Merck during that period.

wilderness

5:11 pm on Feb 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So, y'all are going to block what might be the next big search engine?


MSN was the last major SE in 2003 and they came from an obscure MSN IP.

lucy24

8:59 pm on Feb 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Probably because most here block AWS.

I checked my raw logs, which include all 403s. Case-insensitive search for the separate words "Netshelter" and "Contentscan" came up absolutely cold. And even the vilest Ukrainian scraper is allowed to see robots.txt if they ask for it.
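
A case-insensitive search like the one described can be sketched in a few lines. The function name and the sample log lines below are hypothetical; only the two search words come from this thread.

```python
import re


def search_logs(lines, words):
    """Return the lines containing any of the given words, ignoring case."""
    # re.escape keeps characters such as '.' or '/' in a word literal.
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    return [line for line in lines if pattern.search(line)]


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        '203.0.113.5 - - "GET / HTTP/1.1" 403 "-" "NetShelter ContentScan"',
        '203.0.113.9 - - "GET / HTTP/1.1" 200 "-" "SomeOtherBot/1.0"',
    ]
    for hit in search_logs(sample, ["netshelter", "contentscan"]):
        print(hit)
```

An empty result over logs that include the 403s means the bot never asked for anything, not just that it was blocked.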

dstiles

9:35 pm on Feb 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I see a fair amount of traffic from 17.142.0.0/16. I had to create a special bypass for one of the header fields that it fails to send, but it looks legit apart from that.

I had a visit from fetcher yesterday - 3 hits, in fact, all at 17.228.4.80. Automatically blocked. As is that IP as of now.

Apart from that the only ranges I have blocked in 17/8 are:

17.152.253.0 - 17.152.253.255
17.161.96.0 - 17.161.127.255
17.199.16.0 - 17.199.16.255
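
If your firewall or server config wants CIDR notation rather than start/end pairs, the ranges above can be converted with Python's standard `ipaddress` module. This is only a sketch; the helper name `range_to_cidrs` is made up for illustration.

```python
import ipaddress


def range_to_cidrs(first: str, last: str) -> list:
    """Convert an inclusive start/end IPv4 range into a list of CIDR blocks."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(net) for net in networks]


if __name__ == "__main__":
    # The three ranges listed in the post above.
    blocked = [
        ("17.152.253.0", "17.152.253.255"),
        ("17.161.96.0", "17.161.127.255"),
        ("17.199.16.0", "17.199.16.255"),
    ]
    for first, last in blocked:
        print(first, "-", last, "=>", range_to_cidrs(first, last))
```

All three ranges happen to align on a single block each: two /24s and one /19.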

keyplyr

9:53 pm on Feb 15, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



...ranges I have blocked in 17/8 are:
17.152.253.0 - 17.152.253.255
17.161.96.0 - 17.161.127.255
17.199.16.0 - 17.199.16.255

@dstiles - If you don't mind, what are the reasons for blocking?

trintragula

12:44 pm on Feb 16, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



For the month of August 2014, I was spot-blocking daily visits by Windows Chrome/31 from 17.161.96.0 - 17.161.127.255, which dstiles listed - possibly for the same reason. As far as I know that's the only other bot-like behaviour I've seen from 17/8.

blend27

12:58 pm on Feb 16, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So, y'all are going to block what might be the next big search engine?


For now,

Tilt the head 45 degrees upwards, then to the left. Open eyes widely. Raise the right hand towards the same direction where the eyes are pointed. Spread the fingers into a burst. Now swoosh the hand to the right across the horizon while at the same time whispering : the next big search !engine.

When it's that, then I will see.

dstiles

9:29 pm on Feb 16, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



keyplyr:

17.152.253.0 - 17.152.253.255 - bot-like hits including bare java UA
17.161.96.0 - 17.161.127.255 - 2 bad hits and several IPs are "stealth"
17.199.16.0 - 17.199.16.255 - bot-like activity

NOTE: Trap dates go back to 2014 and 2013, so usage may have changed.

johnhh

11:31 pm on Feb 16, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I wasn't especially worried about the IP address that the Fetcher/0.1 bot came from. What I was worried about was that it was fetching multiple pages per second, and that caused some stress on the server. So I blocked it.
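
A rough way to catch that kind of burst is a sliding-window counter per IP. This is a sketch under assumptions: the class name, the threshold and the window size are all made up, and real servers usually have rate-limiting modules that do this job better.

```python
from collections import defaultdict, deque


class RateWatcher:
    """Flag an IP once it exceeds max_hits requests within `window` seconds."""

    def __init__(self, max_hits: int = 5, window: float = 1.0):
        self.max_hits = max_hits      # hypothetical threshold
        self.window = window          # hypothetical window length in seconds
        self.hits = defaultdict(deque)

    def hit(self, ip: str, ts: float) -> bool:
        """Record a request at time ts; return True if the IP is over the limit."""
        q = self.hits[ip]
        q.append(ts)
        # Drop timestamps that have fallen out of the window.
        while q[0] <= ts - self.window:
            q.popleft()
        return len(q) > self.max_hits


if __name__ == "__main__":
    watcher = RateWatcher(max_hits=2, window=1.0)
    # Illustrative timestamps in seconds; the address is only an example.
    for ts in (0.0, 0.1, 0.2, 5.0):
        print(ts, watcher.hit("17.228.4.80", ts))
```

Timestamps are passed in rather than read from the clock, so the same class works on live requests or on a replay of old logs.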

toidi

1:35 pm on Feb 17, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Please excuse my ignorance here, but how is
data scientists using statistical analysis and machine learning to improve Siri’s accuracy and performance.
not a search engine?

blend27

5:23 pm on Feb 17, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



not a search engine?

It is: a profit-driven one, at 600+ beans a pop.

keyplyr

10:55 am on Feb 25, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Took robots.txt where it is clearly disallowed:
17.228.4.80 - - [24/Feb/2015:15:19:08 -0800] "GET /robots.txt HTTP/1.1" 200 1534 "-" "Mozilla/5.0 (compatible; Fetcher/0.1)"

Then blatantly ignored that and attempted to crawl (where it is blocked by other methods):
17.228.4.80 - - [24/Feb/2015:15:19:15 -0800] "GET /example.html HTTP/1.1" 403 968 "-" "Mozilla/5.0 (compatible; Fetcher/0.1)"

Bad scene Apple!
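
The pattern in those two log lines (fetch robots.txt, then request a disallowed page) can be spotted automatically. The sketch below parses combined-format log lines and checks each later request against the rules using the standard `urllib.robotparser`. The robots.txt content is an assumption, since the real file isn't shown here, and the function name is made up.

```python
import re
from urllib.robotparser import RobotFileParser

# Matches the combined log format used in the lines quoted above.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def find_violations(log_lines, robots_txt):
    """Return (ip, path, status) for requests made to disallowed paths
    by a client that had already fetched robots.txt."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    saw_robots = set()
    violations = []
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        key = (m["ip"], m["ua"])
        if m["path"] == "/robots.txt":
            saw_robots.add(key)
        elif key in saw_robots and not rules.can_fetch(m["ua"], m["path"]):
            violations.append((m["ip"], m["path"], m["status"]))
    return violations


if __name__ == "__main__":
    # ASSUMPTION: a blanket disallow; the poster's actual robots.txt is not shown.
    robots = "User-agent: *\nDisallow: /\n"
    ua = "Mozilla/5.0 (compatible; Fetcher/0.1)"
    logs = [
        '17.228.4.80 - - [24/Feb/2015:15:19:08 -0800] '
        '"GET /robots.txt HTTP/1.1" 200 1534 "-" "%s"' % ua,
        '17.228.4.80 - - [24/Feb/2015:15:19:15 -0800] '
        '"GET /example.html HTTP/1.1" 403 968 "-" "%s"' % ua,
    ]
    print(find_violations(logs, robots))
```

Keying on the IP and user-agent pair is a simplification; a crawler that rotates addresses between the robots.txt fetch and the crawl would slip past it.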

Nutterum

11:31 am on Feb 25, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



Did a quick check and no visits on my end, though a friend told me the bot ignored their robots.txt as well. Aggressive little bugger.

lucy24

6:15 pm on Feb 25, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Serious question: What does a robot gain by reading robots.txt and then ignoring it? Sure, you might learn the names of protected directories-- but these stories never seem to end with unwanted visitors asking for /admin/ or /includes/. And the mere act of asking for robots.txt is often enough to flag you as a robot. So what's in it for them?

keyplyr

8:49 pm on Feb 25, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What does a robot gain by reading robots.txt and then ignoring it?

Possibly to give the illusion that the protocol was followed, or to get past filters.