Google's proposal is to insert a new token after the hash mark. That would alert crawling technology that the following information creates a new page state, often by going to the server to update only part of the page content.
The new token Google proposes is an exclamation mark added immediately after the hash mark, so a stateful AJAX URL might look like this:
http://www.example.com/page?query#!state
This approach would allow stateful AJAX urls to be shown in search results. More detail is available in the Google Webmaster Central Blog article [googlewebmastercentral.blogspot.com]
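To make the proposed mapping concrete, here is a minimal sketch (my own illustration in TypeScript, not Google's code) of how a crawler might translate a #! URL into the query-parameter form discussed later in this thread:

// Sketch: translate a stateful #! URL into a crawler-friendly form,
// assuming the _escaped_fragment_ convention described below.
function toCrawlerUrl(url: string): string {
  const hashIndex = url.indexOf("#!");
  if (hashIndex === -1) return url; // no AJAX state token, leave as-is
  const base = url.slice(0, hashIndex);
  const state = url.slice(hashIndex + 2); // everything after "#!"
  const separator = base.includes("?") ? "&" : "?";
  return base + separator + "_escaped_fragment_=" + encodeURIComponent(state);
}

// toCrawlerUrl("http://www.example.com/page?query#!state")
// -> "http://www.example.com/page?query&_escaped_fragment_=state"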
- MAINLY helps ONLY their SE.
- Could possibly change its original intent AFTER the rest of the web decides to use it how the web wishes, instead of how Goog wishes.
Google used up all their "benefit of the doubt" and "this is how we'd like to improve the web" points with their constant changes in the philosophy of rel=nofollow. At the end of the day, this proposal only really helps Google. NOT the 99.9% of the web who don't use or even know what AJAX is.
Many/most webmasters will never have need for AJAX. But for those who use it, making that content more crawlable is a very sane goal.
Could possibly change its original intent AFTER the rest of the web decides to use it how the web wishes, instead of how Goog wishes.
This proposal comes to meet a real need that a portion of the web has already decided on. Not everything Google does is evil.
I'd also like to see a proposal from Bing about the challenge of indexing AJAX modified pages.
Many/most webmasters will never have need for AJAX. But for those who use it, making that content more crawlable is a very sane goal.
Many/most webmasters will never have need for rel=nofollow. But for those who use rel=nofollow, making that change is a very sane goal.
This proposal comes to meet a real need that a portion of the web has already decided on
Same arguments were used around rel=nofollow. -.-
Tell it to someone who gives Goog the benefit of the doubt.
I can see where this is going, even if others don't.
The tweet that circulated was a rumor that may have originally been about this proposal, but it was not accurately understood at all, and it came out garbled.
Hence, it helps Google itself more than any mass of webmasters or web users who are demanding a change to Ajax handling...
Many/most webmasters will never have need for AJAX.
Very good point tedster... Most people can't even implement a simple redirect, canonicalize domains, or remove the /index.html from their directories without expert help, let alone code a site that's AJAX based.
Personally, I own two AJAX-based sites: one I would like to have crawled, and another I do not want crawled. The one I don't want crawled bans not only Googlebot but all other compliant bots in robots.txt, and this proposal certainly (enormously) simplifies the SEO on the other.
Honestly, if it was me and I had the traffic Google does, I would not suggest, I would state: If you run an AJAX based website and would like your site to be crawled and indexed by Google, place an ! after the # symbol to tell GoogleBot how to access the information.
People think Google is 'overstepping' or 'out of line' by suggesting? LMAO. Be glad I'm not in charge at Google, because I wouldn't ask I would dictate, much the same way M$ does with their browsers / software...
Example: return widget shops in various states:
http://www.example.com/returnShoplist#FL would become
http://www.example.com/returnShoplist#!FL - telling G that there is Ajax content
The bot then returns and asks for:
http://www.example.com/returnShoplist?_escaped_fragment_=FL
and your server generates the relevant HTML, e.g. through a headless browser
G would then return http://www.example.com/returnShoplist#!FL for a query on Florida widget shops
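A rough server-side sketch of that flow (assuming a Node/Express setup, which is purely my choice for illustration - the route and snapshot store are invented): when the bot asks for the _escaped_fragment_ version, serve the pre-rendered HTML for that state.

import express from "express";

const app = express();

// Hypothetical snapshot store: state -> pre-rendered HTML,
// e.g. produced ahead of time by a headless browser.
const snapshots: Record<string, string> = {
  FL: "<html><body>Widget shops in Florida...</body></html>",
};

app.get("/returnShoplist", (req, res) => {
  const state = req.query._escaped_fragment_;
  if (typeof state === "string" && snapshots[state]) {
    // Googlebot asking for /returnShoplist?_escaped_fragment_=FL
    res.send(snapshots[state]);
  } else {
    // Normal visitors get the AJAX shell; it reads #!FL client-side.
    res.send("<html><body><!-- AJAX shell page --></body></html>");
  }
});

app.listen(3000);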
Edit: in reply to Future way above - you guys must be able to read a lot faster than me :)
Also, I have some serious doubts about cost / benefit for webmasters vs cost / benefit for big G. I'll certainly not be jumping on this bandwagon anytime soon.
Unfortunately there is now a lot of inappropriate AJAX around the web - the kind of thing that's done mostly just to display someone's technical prowess (geek credentials). That approach hides useful content, and I think such situations are what this proposal is an attempt to resolve.
So if I keep using # (i.e. do not implement #!), would this mean that I can avoid the duplicate content?
Or would it require extensive use of rel="canonical" on such pages?
Or basically, what I am asking is: will pages that use AJAX but do not change # into #! still be ignored and not indexed by Google? In that case it stays within the user's control whether they want the page with the AJAX-changed state indexed or not.
But from what I understand, Google proposes that they index changed state of the page. So I currently have:
www.example.com/page1.html
www.example.com/page1.html#pic1
www.example.com/page1.html#pic2
www.example.com/page1.html#pic3
So Google says all these could appear in SERPS?
And in my example these pages are the same really apart from a different photo.
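For what it's worth, the kind of client-side handler behind such a gallery might look like this (a sketch only - the element ID and image paths are invented). It shows why the fragments are near-duplicates: only the photo changes, and everything else on the page stays the same.

// Swap the photo based on the fragment, e.g. page1.html#pic2
function showPicFromHash(): void {
  const pic = window.location.hash.replace("#", ""); // e.g. "pic2"
  const img = document.getElementById("photo") as HTMLImageElement | null;
  if (img && /^pic\d+$/.test(pic)) {
    img.src = "/images/" + pic + ".jpg"; // the rest of the page is untouched
  }
}

window.addEventListener("hashchange", showPicFromHash);
showPicFromHash(); // handle a direct arrival at page1.html#pic2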
So if I keep using # (i.e. do not implement #!), would this mean that I can avoid the duplicate content?
As I understand Google's proposal, you are correct in this analysis. The issue for Google is the same as for you - currently they cannot adequately distinguish distinct content changes on AJAX-driven pages, and they can't second-guess without creating never-ending duplicates. The proposal would offer the appropriate hints to Googlebot as to what is unique content and what is not.
Google is at the forefront of AJAX usage, but they are not looking to improve the indexing of their own content; rather, they want to promote AJAX adoption and improve their SERPs. It is in their interest to get more AJAX-driven content into their search results (improving search quality), and such inclusion would be mutually beneficial for webmasters who are currently wary of adopting AJAX solutions due to the technical challenge of getting into the current Google index.
For example, in cases where the frame (heading / footer / sidebars) of the page is graphics-heavy and AJAX is used to change the text content to save load time. The generated "page" IS a new page from the content point of view, but Google is not able to crawl it at the moment.
As long as it is in the user's control whether they add the !, this should impact only the ones who want to be impacted.
It is important, I think, to consider both the AJAX proposition and the (already implemented) new consideration of fragment-identifiers as two sides of the same coin. This is the same technical issue: fragment-identifiers are the "old-school" use of "#". Ask yourself - how will your current AJAX implementation be seen in relation to Google's attempts to subdivide pages into named fragments?
Ask yourself - how will your current AJAX implementation be seen in relation to Google's attempts to subdivide pages into named fragments?
Interesting thought...
Perhaps Google can distinguish between a # where the reference is to an anchor within the same page (which, from what I have seen, are the cases where page fragments were shown as "Jump to" links in the SERPS snippet) and a # where there is a request to get some part of the page content from the server using AJAX.
With AJAX pages being indexed, I would imagine that this would not be a "jump to a page fragment", instead it would appear as URL in its own right (including #!etc) in SERPS.
Maybe Google should allocate another "test data centre" like Caffeine to the webmaster community before they put these changes live, because if it ends up with bugs then who knows what this could do to SERPS - for both sites not using AJAX (an influx of new indexed content) and sites using AJAX (which could suddenly trip various filters owing to all the new site content suddenly being indexed).
Or should the SEO strategy for AJAX sites be "add ! to # little by little and see the impact on your site ranking..."
Perhaps Google can distinguish between a # where the reference is to an anchor within the same page (which, from what I have seen, are the cases where page fragments were shown as "Jump to" links in the SERPS snippet) and a # where there is a request to get some part of the page content from the server using AJAX.
Actually aakk9999, that part is fairly easy, because all you really have to do is parse the page referenced with #Reference in the link and check for <a name="Reference">Anchor Text</a>, and you can determine that it's content already on the page.
I may be over-simplifying slightly, but for the most part /page-linked.html#Reference + name="Reference" within an <a > on /page-linked.html indicates a named anchor. It's actually relatively simple to determine an anchor tag link given the overall complexity of spidering the Web. If it's not a named anchor <a name=""></a>, you can reasonably determine it's another type of reference, which would include AJAX.
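A quick sketch of that check (my own illustration - it uses the browser's DOMParser for brevity, where a real spider would use its own HTML parser): given the fetched page and the fragment, see whether the fragment matches an in-page target.

// Decide whether a fragment is an old-school named anchor on the page.
function isNamedAnchor(html: string, fragment: string): boolean {
  const doc = new DOMParser().parseFromString(html, "text/html");
  // <a name="Reference"> marks an in-page target (as does id="Reference");
  // anything else is some other type of reference, which would include AJAX.
  return doc.querySelector(
    'a[name="' + fragment + '"], [id="' + fragment + '"]'
  ) !== null;
}

// isNamedAnchor(pageHtml, "Reference") -> true: plain named anchor
// isNamedAnchor(pageHtml, "!FL") -> false: likely an AJAX state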
In reading the Webmaster Central Blog (I skimmed through it), I think where they have the tough part is actually parsing the JS / AJAX to get to the content, so they want you to serve it to SE spiders, GBot in particular, in a slightly different manner than via JS / AJAX, and the #! would indicate accessibility...
I could be mistaken in the preceding though as it's late and I have had an adult beverage. :)
Since the # is the only character that separates the server-side and client-side parts of the URL, there is not much choice anyway, if you don't want to make major changes to how URLs are interpreted today.
Current AJAX implementations show the default page state to a spider, and also to a human visitor on first arrival, before they click on an AJAX link. That does not create a cloaking problem.
And I don't believe it's a coincidence that #! is a Unix "hash-bang", a 2-character command indicating that the contents of a file is a script which should be executed.
And with this announcement we can abolish any notions that bots can't or don't execute javascript. We've known for some time that they can, and now we can be sure that they will. A URL with a #! requires the user-agent to execute AJAX requests to retrieve content for indexing, which requires the user-agent to execute scripts. QED.
I was wondering about this, and I keep thinking: if that's the case (they can / do execute JS), then why ask anyone to make a change? Why not just spider it if they can?
And then they wonder why Google doesn't love them.
For those sites that already have tons of content buried in AJAX, this will be a great way to get content attribution without having to create a separate channel for accessibility or having to rewrite a bunch of code.
I like it. I hope it gets adopted.
Graceful degradation is the way to go - this ensures that all of that "buried" content will be accessible to the broadest range of users, including Lynx, screen readers, and mobile devices - not just Google.
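As a sketch of what that graceful degradation can look like (the class name, URL, and container ID are all invented for illustration): the link carries a real href that works without script, and JavaScript upgrades it to an AJAX load while writing the #! state into the address bar.

// Progressive enhancement: real hrefs for non-JS user-agents,
// AJAX loading plus a #! fragment when script is available.
document.querySelectorAll<HTMLAnchorElement>("a.ajax-nav").forEach((link) => {
  link.addEventListener("click", async (event) => {
    event.preventDefault(); // only runs when JS is available
    const response = await fetch(link.href); // same URL Lynx or a bot would get
    const html = await response.text();
    const container = document.getElementById("shop-list");
    if (container) container.innerHTML = html;
    if (link.dataset.state) {
      window.location.hash = "!" + link.dataset.state; // e.g. #!FL
    }
  });
});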