|brotherhood of LAN|
|To make things easier to debug, we're currently working on a tool for helping webmasters better understand how Google renders their site. We look forward to making it available for you in the coming days in Webmaster Tools.|
Bill, it's clear that Google can use JS no problem for all the reasons you listed, though due to JS's event-driven nature, how deep do they look?
My guess is that content is analysed after any "onload" events, essentially what the user would see after the page has 'loaded' (OT, fwiw it's pretty difficult to decide when a 'page' has 'loaded' in the context of what the document is 'about'). I doubt that it hovers over, clicks, or fires every event on every pixel and node of a page; even inspecting every listener would be quite time-consuming.
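To illustrate the difference (toy markup; the element id and strings are made up), "rendered after onload" and "rendered after events" really are two separate questions:

// Block 1: injected during the load event. Any renderer that executes
// JS and waits for onload will see this.
window.addEventListener('load', function () {
  var div = document.createElement('div');
  div.textContent = 'Inserted during onload';
  document.body.appendChild(div);
});

// Block 2: injected only by a click handler. A bot sees this only if it
// finds the listener and actually fires the event.
document.getElementById('more-link').addEventListener('click', function () {
  var div = document.createElement('div');
  div.textContent = 'Inserted only after a click';
  document.body.appendChild(div);
});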
It seems the majority of commentary is from 3-4 years ago simply acknowledging that Google has fetched what was previously hidden behind JS events.
|due to JS's event-driven nature, how deep do they look?|
Not just events. There's system information too. I check for the presence of specific, named fonts, and make certain minor but visible adjustments accordingly. (For comparison purposes: The plainclothes bingbot behaves as though it has Euphemia but not Pigiarniq. This is plausible. So far I haven't been able to find out what the Googlebot says about itself.)
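For the curious, one common way of doing that sort of check (a sketch, not necessarily the exact method in use here): compare measured text widths with and without the named font in the stack.

function hasFont(fontName) {
  var canvas = document.createElement('canvas');
  var ctx = canvas.getContext('2d');
  var sample = 'mmmmmmmmmmlli'; // mixing wide and narrow glyphs amplifies differences

  // Width with the pure fallback font...
  ctx.font = '72px monospace';
  var fallbackWidth = ctx.measureText(sample).width;

  // ...versus width with the candidate font first in the stack. If they
  // differ, the browser found the named font and used it.
  ctx.font = '72px "' + fontName + '", monospace';
  return ctx.measureText(sample).width !== fallbackWidth;
}

// e.g. the Euphemia-but-not-Pigiarniq profile described above:
if (hasFont('Euphemia') && !hasFont('Pigiarniq')) {
  // ...apply those minor but visible adjustments...
}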
For a human visitor, <script> and <noscript> are mutually exclusive. A robot of any kind-- including but not limited to search engines-- can choose to follow up on both. (Again for comparison purposes: facebookexternalhit takes the noscript option.)
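In other words (a toy example):

<script>
  document.write('<p>A visitor with JS on sees only this.</p>');
</script>
<noscript>
  <p>A visitor with JS off sees only this; a robot can choose to read both.</p>
</noscript>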
Waiting for <meta name="robots" content="doNotTriggerEvents"> in order to prevent G from messing with my outbound link statistics...
Google announced today on the Webmaster Central blog that the Fetch as Google feature of Webmaster Tools will now show how Googlebot would render the page:
What was the function? It was a trigger that logged time spent on a web page, firing either when the visitor left the page or when focus moved to another tab. So our client was quite peeved when his users had not been billed.
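Something along these lines (entirely illustrative; the endpoint and names are made up, but the mechanism is the classic one):

var start = Date.now();
var sent = false;

function logTimeSpent() {
  if (sent) return; // report once per page view
  sent = true;
  var seconds = Math.round((Date.now() - start) / 1000);
  // Fire-and-forget image beacon to the logging endpoint.
  new Image().src = '/log-time?seconds=' + seconds;
}

window.addEventListener('blur', logTimeSpent);         // focus lost to another tab
window.addEventListener('beforeunload', logTimeSpent); // leaving the page

Whether a headless renderer ever fires blur or beforeunload is exactly what decides whether entries like this land in the logs at all.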
|the Fetch as Google feature of Webmaster Tools will now show how Googlebot would render the page |
:: detour to check some specimen pages ::
Oh, come on, Googlebot. You want us to believe you don't have a single member of this list?
Euphemia, "Euphemia UCAS", Pigiarniq, "DejaVu Sans", AiPaiNutaaq, Ballymun, "Ballymun RO", Code2000, NunacomU, Uqammaq
To make sure, I checked the one page where I name a font explicitly rather than rely on font substitution. Further experimentation confirms that they understand font substitution and are perfectly willing to do classical Greek. They can even render Devanagari correctly. Whew. But really, Googlebot, no Euphemia? Let's not be ridiculous.
(There was a point to the preceding. I wanted to compare their behavior to that of the plainclothes bingbot.)
:: further detour to logs ::
They really do use the Googlebot UA, not the Preview they use for most WMT functions.
|Bill, it's clear that Google can use JS no problem for all the reasons you listed, though due to JS's event-driven nature, how deep do they look?|
All I know for sure is what scrapers are now capable of doing, including what I can do with PhantomJS, and I'd assume Google, with a really big team, can do a heck of a lot more.
My first assumption would be that any AJAX content can be read, unless they're not as good as the scrapers.
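For instance, rendering a page "like a browser" is only a few lines of PhantomJS (the URL and the settle-time are placeholders):

var page = require('webpage').create();

page.open('http://example.com/', function (status) {
  if (status !== 'success') {
    console.log('load failed');
    phantom.exit(1);
  } else {
    // Give XHR-driven content a moment to arrive before reading the DOM.
    setTimeout(function () {
      console.log(page.content); // the rendered DOM, not the raw source
      phantom.exit();
    }, 1000);
  }
});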
Google has crawled AJAX+pushState XML pages on my XHTML website so it can analyze the window.onclick event. As far as I can tell, Googlebot is now a highly automated Chrome browser; in other words, if you're cloaking in any form or fashion, it's going to see it. If you're triggering popups to shady websites, it's going to see that too.
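For reference, the AJAX+pushState pattern in question looks something like this (ids and URLs are made up): a bot that only reads the raw source never sees the fetched fragment, while one that fires onclick sees everything.

document.getElementById('next-page').onclick = function (event) {
  event.preventDefault();

  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/fragments/page-2.xml');
  xhr.onload = function () {
    // Swap in the fetched fragment without a full page load...
    document.getElementById('content').innerHTML = xhr.responseText;
    // ...and update the address bar so the state stays addressable.
    history.pushState({ page: 2 }, '', '/page-2');
  };
  xhr.send();
};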