Applebot

Fresh from Apple

         

Pfui

3:24 pm on Mar 2, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



From the URL in the UAs below:
---
Applebot is the web crawler for Apple, used by products including Siri and Spotlight Suggestions. It respects customary robots.txt rules and robots meta tags. It originates in the 17.0.0.0 net block.

User-agent strings will contain “Applebot” together with additional agent information. For example:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1)

If robots instructions don't mention Applebot but do mention Googlebot, the Apple robot will follow Googlebot instructions.

Last Modified: Feb 27, 2015
---
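
For anyone who wants to reason about that Googlebot fallback, here is a minimal Python sketch of how a crawler following the quoted rule might pick a robots.txt group. The function name `pick_group`, the exact-match comparison of agent tokens, and the final fallback to `*` are my assumptions for illustration; the Apple page only states the Applebot-then-Googlebot part, and this is not Apple's actual code.

```python
def pick_group(robots_txt, agent="applebot", fallback="googlebot"):
    """Return (group_name, rules) for the group that would apply to `agent`.

    Assumed order: the agent's own group, then the fallback agent's group,
    then the wildcard group. Tokens are compared by exact lowercase match,
    which is a simplification of what real parsers do.
    """
    groups = {}          # lowercased user-agent token -> list of (directive, value)
    current_agents = []  # agents named by the group currently being read
    in_rules = False     # True once the current group has a rule line
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                      # a new group starts here
                current_agents = []
                in_rules = False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field in ("allow", "disallow"):
            in_rules = True
            for name in current_agents:
                groups[name].append((field, value))
    for name in (agent, fallback, "*"):
        if name in groups:
            return name, groups[name]
    return None, []
```

With a file that names Googlebot and `*` but not Applebot, the sketch returns the Googlebot group, which is the behavior the page describes.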

In recent weeks, numerous UAs have been landing from 17 -- none from Apple Host names, some with Apple UAs with variations of "(Fetcher)" and "(Getter)" appended, some not. Alas, NONE asked for robots.txt -- despite scores and scores and scores of hits every day. Thus this lifelong Apple user went from curious to annoyed to furious and opted to 403 everything from 17.

Finally over the weekend, 'new' UAs bearing the Applebot details appear. 'Bout time! But --

Hits from multiple Apple IPs with this iPhone UA *did not* ask for robots.txt:

17.142.152.72
Mozilla/5.0 (iPhone; CPU iPhone OS 8_1 like Mac OS X) AppleWebKit/600.1.4 (KHTML, like Gecko) Version/8.0 Mobile/12B410 Safari/600.1.4 (Applebot/0.1; +http://www.apple.com/go/applebot)

Whereas hits from multiple Apple IPs with this Macintosh UA *did*:

17.142.149.254
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)

FWIW: Thus far I'm not sure if their stated Applebot/Googlebot behavior will hold. (Ironically, my main Mac ended up at the Apple Store with a dead logic board yesterday, so I'm currently unable to allow 17 or tweak my robots.cgi scripts and watch what happens.)

keyplyr

10:25 pm on Mar 2, 2015 (gmt 0)


Related discussion: [webmasterworld.com...]

lucy24

12:41 am on Mar 3, 2015 (gmt 0)


Hits from multiple Apple IPs with this iPhone UA *did not* ask for robots.txt

That's pretty funny, since that's the identical behavior I noticed last month with bing's fresh-minted Mobile Bingbot. The mobile (iPhone) version never asks for robots.txt; only the vanilla non-mobile version does.

While double-checking this in logs, I discovered that, contrary to what I'd initially thought, bing gave their mobile bot a dry run for a few days at the end of November before bringing it on full-force in mid-January. But I digress.

What do you suppose is the rationale behind the Googlebot equivalence? "Well, obviously we're a cut above all the other robots, so we'll follow the rules intended for the crème de la crème" ... ?

keyplyr

5:21 am on Mar 3, 2015 (gmt 0)


17.142.152.72 - - [01/Mar/2015:13:58:48 -0800] "GET /example.html HTTP/1.1" 200 9175 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)"

Did *not* request robots.txt (where it is disallowed by name) and proceeded to crawl a dozen pages w/ related files.

Now blocked by UA.

keyplyr

10:39 pm on Mar 8, 2015 (gmt 0)


Now it *is* requesting robots.txt (where it is disallowed) and blatantly ignores it:

17.142.152.100 - - [07/Mar/2015:07:56:08 -0800] "GET /robots.txt HTTP/1.1" 200 1521 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)"
17.142.152.100 - - [07/Mar/2015:07:56:09 -0800] "GET / HTTP/1.1" 403 968 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)"

aristotle

7:06 pm on Mar 13, 2015 (gmt 0)


There was no request for robots.txt on this visit:
Host: 17.142.151.80
/example.html
Http Code: 200 Date: Mar 13 08:26:14 Http Version: HTTP/1.1 Size in Bytes: 11577
Referer: -
Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)

So sometimes it asks for robots.txt and sometimes it doesn't.
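
A rough way to check this across a whole log rather than eyeballing single visits: the Python sketch below tallies, per IP, whether a client whose UA contains "Applebot" ever fetched /robots.txt and how many other URLs it requested. It assumes Apache combined log format like the lines keyplyr posted above; the names `LOG_LINE` and `robots_audit` are made up for this example.

```python
import re
from collections import defaultdict

# Apache "combined" format, as in the log lines quoted earlier in this thread.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

def robots_audit(lines, token="Applebot"):
    """Map each IP whose UA contains `token` to {'robots': bool, 'pages': int}."""
    seen = defaultdict(lambda: {"robots": False, "pages": 0})
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m or token not in m["agent"]:
            continue                      # unparsable line or some other client
        entry = seen[m["ip"]]
        if m["path"] == "/robots.txt":
            entry["robots"] = True
        else:
            entry["pages"] += 1
    return dict(seen)
```

Feed it the open log file, e.g. `robots_audit(open("access.log"))` (the file name is a placeholder). Any IP reported with `'robots': False` and a nonzero page count crawled without asking first.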

blend27

5:11 pm on Mar 22, 2015 (gmt 0)


It read the custom robots.txt (IP not on the list):

User-agent: *
Disallow: /

and simply ignored it.

No RDNS, No Headers...

ip: 17.142.152.130
rdns: 17.142.152.130
time: {ts '2015-03-22 010:56:06'}
method: GET
protocol: HTTP/1.1
-----------------------------------------
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Applebot/0.1; +http://www.apple.com/go/applebot)
Accept-Encoding: gzip,deflate
Content-Length: 0
Connection: Keep-Alive


That is -13 points to get banned in my script.

What's interesting is that I get the point about them trying to teach Siri to understand more about what is on the web, but when a standard protocol (robots.txt, my file, I call what I want) is ignored, come on.... from the company that makes it almost impossible to .... Steve would probably not approve of this.

dstiles

8:34 pm on Mar 23, 2015 (gmt 0)


This is becoming google-ish. G has been pushing bots of one type or another through their public proxies for years. Now apple is pushing applebot through apple proxies.

I allow all of 17.142.0.0/16 for proxy users but block applebot.
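
As a sketch of that policy in Python (illustrative only, not my actual server config): anything announcing itself as Applebot gets a 403, other traffic from 17.142.0.0/16 is let through, and everything else is left to whatever other rules are in place. The function name `decide` and the use of `None` for "no opinion" are assumptions made up for this example.

```python
import ipaddress

# Range named in the post above; treated here as Apple's proxy block.
APPLE_PROXY_NET = ipaddress.ip_network("17.142.0.0/16")

def decide(ip, user_agent):
    """Return 403 to block, 200 to allow, or None to defer to other rules."""
    if "applebot" in user_agent.lower():
        return 403                      # block the bot wherever it comes from
    if ipaddress.ip_address(ip) in APPLE_PROXY_NET:
        return 200                      # ordinary users behind Apple's proxies
    return None                         # not covered by this rule
```

Checking the UA first is deliberate: the bot and the proxy users share the same address range, so the IP alone cannot tell them apart.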

keyplyr

11:08 pm on Mar 23, 2015 (gmt 0)


Just an FYI - I did an iPhone location search for businesses nearby. My company is prominent in local search for Google & Bing, but the iPhone did not list my company even though I was 5 miles away. This is likely due to my blocking Applebot, whose info page says:
Applebot is the web crawler for Apple, used by products including Siri and Spotlight Suggestions for iTunes, App Store, movie showtimes, locations nearby, and more.
(emphasis mine)

I just removed all blocks for Applebot :)

dstiles

7:52 pm on Mar 24, 2015 (gmt 0)


They have loads of IPs. Why can't they do the job properly? :(

It remains blocked here - at least, until I get proof it's beneficial.

keyplyr

11:09 pm on Mar 24, 2015 (gmt 0)


Several hours after I removed robots.txt disallow & server block for Applebot, sure enough it showed and crawled 60 pages; coincidence?

I'll give it a day or two for indexing, then borrow an iPhone (I don't personally use iPhone) and retest the location feature.

dstiles

8:55 pm on Mar 25, 2015 (gmt 0)


I'd be interested in the results, keyplyr. You may even persuade me to remove the block. :)

keyplyr

9:10 am on Mar 28, 2015 (gmt 0)


Voila! Siri found my business, associated it with what we do, recommended us & gave driving directions.

It seems Siri will only recommend & give directions to a business listed in Yelp, Insider Pages, City Search, Bing Local or Google Plus Local. But blocking Applebot (Siri's bot) from finding your business will cause Siri to not include it, presumably because Siri needs to know about the business first, then it weighs web presence and customer reviews to rank your business before recommending it.

We were the only business doing what I searched for within 10 miles, so we were the 1st one Siri recommended. It would be interesting to test "coffee houses" to see how much weight customer reviews & web presence carry in recommendations.

Only tried this on a new iPhone 6+ so don't know if earlier builds/iOS function the same way.

Note: I do use ICBM and Geo META tags. Don't actually know if these have ever helped SEs locate me but they're supposed to :)

Also, found this posted in a Siri discussion group:

6 Steps Local Businesses Need To Take To Make Sure They Are Listed With Siri

#1 Maintain Their Google+ Local Page. This includes making sure it displays the correct business name, address, phone number, website, videos, and pictures.

#2 Rating And Reviews Are Important. Siri will incorporate both ratings and reviews when a user asks for a business. Rating and reviews are becoming more important with getting your business found. Encourage your customers to give you positive reviews on Yelp, Insider Pages, City Search and Google+ Local.

#3 Get Listed On Local Directories. It is not enough to just be on Google Plus Local (formerly Google Places), Facebook and Twitter. Your business needs a profile and to be listed in other local directories and review sites such as: Foursquare, Savings.Com, Yahoo Local, Bing Local, Yelp, Grubhub, Open Table, Judy’s Book, City Search, Urbanspoon, Superpages etc.

#4 Remove Obstacles Blocking Information On Your Website. This means remove javascript, flash graphics and animation. Moving graphics are poison to mobile websites and invisible to Google rankings online. Make sure pertinent information is not hidden on subpages or pdfs.

#5 Optimize Your Website For Mobile. More people are on their phones now and will access your website on the go. Make sure your address, local phone number and hours of operation are on the home page of your site. If you are using WordPress, Joomla, or Drupal to manage your site, this is a fairly simple process. If you don’t use one of those, hire someone to produce a mobile version of your website.

#6 More Listings = More Exposure. Make Siri’s job easier so she’ll be able to send you more business. The more local directories and review sites you list your business on, the more chances you will have of customers finding you.

lucy24

9:32 am on Mar 28, 2015 (gmt 0)


remove javascript, flash graphics and animation. Moving graphics are poison to mobile websites and invisible to Google rankings online.
...
If you are using WordPress, Joomla, or Drupal to manage your site, this is a fairly simple process. If you don’t use one of those, hire someone to produce a mobile version of your website.

They seem to be making a lot of assumptions about their readers'/users' intelligence. Are the assumptions correct?

If you want to make webpagesthatsuck dot com really, really mad, make your restaurant's menu available only as a downloadable PDF, and put all your essential time-and-place information into an image.

Pfui

11:24 pm on Mar 28, 2015 (gmt 0)


I don't tend to any retail/brick-and-mortar business sites. Is the lookup local business-focused, or inclusive of sole proprietor/service sites or more generic searches?

keyplyr, what happens when you ask Siri for, oh, WebmasterWorld (or Webmaster World), plz? Does she/it say the URL?

keyplyr

11:53 pm on Mar 28, 2015 (gmt 0)


Not an Apple fan. Don't know anything more than what I posted, sorry.

incrediBILL

4:29 pm on May 6, 2015 (gmt 0)


In what universe is an APPLEBOT the same as a GOOGLEBOT?

If Googlebot is mentioned in robots.txt and not Applebot, and they follow Googlebot instructions, how in the hell is that following robots.txt?

They are not Google and if I were Google, I'd have a real problem with that web page and that bot.

That logic alone makes me want to block their stupid bot because it's 100% WRONG!

keyplyr

9:12 pm on May 6, 2015 (gmt 0)


if I were Google, I'd have a real problem with that web page and that bot.

That may well be one of the reasons, given their relationship.