And Now Google's Doing It. JS Stats Show GoogleBot


TheMadScientist - 2:51 am on May 15, 2011 (gmt 0)


As on-the-fly rendering is only done based on a user request (when a user activates previews), it’s possible that it will include embedded content which may be blocked from Googlebot using a robots.txt file.

Uh, it doesn't say 'when a user would like to preview your page', it says 'when a user activates previews' ... Which user, on which SERP, I wonder? My guess is they're telling the truth: When a user [any user] activates previews [on any SERP] ... lol ... They might not be quite that bad about it, but the wording they use allows them to be...

If that's really the case and the reason why they disregard robots.txt, then why not send an X-Forwarded-For header? Because the user doesn't actually have to request it, maybe? They're so good at the wording ... If you don't 'split hairs' it reads like the user has to request the page preview, but if you really look at the way it's worded, they disregard robots.txt on the CHANCE 'a user' might want to see it when 'a user' activates previews ... They do not do it 'at the request of the user', so they don't send the X-Forwarded-For [because it's not]; they do it on the chance [activation of previews] the user might want to see it.
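For anyone who wants to check their own logs for this, here is a minimal sketch of the test I mean: does a given fetch declare a client address via X-Forwarded-For or not? The function names and the sample headers are hypothetical, and the header is trivially spoofable, so treat a hit as a hint rather than proof.

```python
def forwarded_client(headers):
    """Return the left-most X-Forwarded-For address, or None if absent."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("x-forwarded-for", "")
    # The header may list several hops; the first entry is the claimed client.
    first = value.split(",")[0].strip()
    return first or None


def classify_fetch(headers):
    """Label a request by whether it declares a user address (hypothetical labels)."""
    if forwarded_client(headers):
        return "declares a user address"
    return "no user address declared"


# Hypothetical request headers, for illustration only.
with_user = {"User-Agent": "ExampleFetcher", "X-Forwarded-For": "203.0.113.7, 10.0.0.1"}
without_user = {"User-Agent": "ExampleFetcher"}
print(classify_fetch(with_user))
print(classify_fetch(without_user))
```

A fetch that really was triggered by one specific user's request could carry that header; one made on the chance somebody might want a preview has nothing to put in it.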

Basically, it's a well-worded 'Hey, this user-agent completely disregards robots.txt, because it helps us out and we feel like it...' statement ... Nice!

In order for images to be embedded in previews, it is important that they are not disallowed by your robots.txt file. In order to block crawlable images from being indexed, you can use the "noindex" x-robots-tag HTTP header element.

[ClearsThroat] B***S*** ... What is it likely Disallow: /img relates to in the robots.txt I posted? The previews are fine...
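To show what a Disallow: /img line is supposed to mean to a crawler that honors it, here is a rough sketch using Python's standard urllib.robotparser. The robots.txt content below is a stand-in containing only that one rule (not the full file I posted earlier), and example.com and the is_allowed helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Stand-in robots.txt with a single rule; an assumption for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /img
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a robots.txt-compliant crawler may fetch the URL."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Anything under /img is off limits; other paths are not.
print(is_allowed("Googlebot", "http://example.com/img/logo.png"))
print(is_allowed("Googlebot", "http://example.com/index.html"))
```

A compliant fetcher would never request the image path, so a preview that still shows the images was built by something that skipped this check.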

https://sites.google.com/site/webmasterhelpforum/en/faq-instant-previews

Yeah, I went and did the searching I didn't feel like doing.

I guess at least they are really pretty much up front about throwing robots.txt out the window, because a user might want them to after activating previews, and we all know their users are way more important than web standards and OUR users ... Plus, we can feel free to disallow GoogleBot from files, and they don't even mind, because when 'a user' activates previews they're going to fetch the disallowed files anyway ... Hmmmm ... Do you think they really wait(ed) for a user to land on a SERP with the page they request on it every time or do you think it's more likely they wait(ed) for exactly what they say: A USER to ACTIVATE PREVIEWS and then immediately start(ed) spidering pages?


Thread source: http://www.webmasterworld.com/search_engine_spiders/4312058.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com