Forum Moderators: open
Nothing is wrong with HTML: after all, XHTML is just XML-compliant markup that corresponds to the HTML 4.01 schema, basically. Still, your page still isn't *quite* XML. And the way things are currently going, you can almost bet that pages in the future will be written mainly in XML (or at least a growing percentage will be).
... But in terms of programmatically generating pages in .NET, for instance, it's a breeze from a programmer's point of view to generate XML content with stylesheets to make it render appropriately.
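For what it's worth, here's a minimal sketch of the kind of thing I mean (file names and element names are just for illustration): an XML document pointing at an XSLT stylesheet, which a capable browser transforms client-side into renderable HTML.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="page.xsl"?>
<page>
  <title>Budget Widgets</title>
  <body>Our widgets are cheap.</body>
</page>
```

And a bare-bones page.xsl to go with it:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Turn the custom XML vocabulary into plain HTML for display -->
  <xsl:template match="/page">
    <html>
      <head><title><xsl:value-of select="title"/></title></head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <p><xsl:value-of select="body"/></p>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```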
Unfortunately, only Gecko-based browsers and Opera can handle it...
But, I have faith that the next version of IE can handle it.
If not, XSLT can then be easily incorporated for IE users.
Bottom line, I like playing with the latest and greatest, even though it's not always commercially feasible.
I think the question should be: could Google index the XML and the associated XSLT stylesheet, with its h1 tags, anchors, alts, etc.?
Or, more succinctly: what factors would Google use to rank an XML document?
The XHTML standard is, I believe, supposed to be a transition or bridge from HTML to XML compliance for the web as a whole, according to the W3C, if I remember correctly.
Anyone please feel free to correct anything above.
Separation of presentation from content is one. There are many others, though.
You're right though, john, I was thinking in terms of client-side transformation... Still, I'm amazed that we have to transform a new standard into an old standard for the likes of someone like Google.
"I was trying to surf an xml doc in an old version of Netscape earlier today. Barf, said the browser."
The old Netscape doesn't have an XML parser. Jeez, I'm not trying to be funny, but if you're using an old Netscape I really am worried for Google.
[edited by: tantalus at 12:45 pm (utc) on Dec. 18, 2003]
Doesn't Google's searching the XML as text find the data?
If there are transforms associated then, in a world of browsers that can do the business, the user would see the data transformed..
I'm surprised people are expecting Google to do the transforms.. but then I am new to XML/XSLT.
My impression was that XHTML 1.1 -> 2.0 was here to stay and XSLT can generate it easily enough.
Anyway if you're interested..
I did a search for .xml; here's a couple of examples from the SERPs:
www.****xx.com/weblog/index.xml
File Format: Unrecognized - View as HTML
Similar pages
xx.****.com/index.xml
File Format: Unrecognized - View as HTML
Similar pages
All of them say "File Format: Unrecognized" and all are RSS feeds, if that makes any difference.
Click on "View as HTML" and you get a blank Google cache.
I'd be far more worried if Google wasn't interested in being compatible for the entire searching public. There are still large corporations and government agencies using Netscape 4.7.
Under 5% usage isn't much to worry about when the site is the one Joe_Webmaster's nephew made for him in FrontPage that gets maybe 500 uniques a month looking for budget widgets, but when you're getting into billions of searches, that's a lot of users whose needs would be neglected.
1. Does Googlebot follow an <a href> found in an XML page?
2. What about links that are not in the form of <a href>, such as <link>http://www.domain.com</link>. Does/will google follow those?
I guess my main question is, are pure XML pages currently "dead ends" for Googlebots?
see this thread maybe [webmasterworld.com ]
It seems probable that the days of coding the actual display language will end, and a machine-language standard will be adopted universally, with everything done via command text docs or WYSIWYG, the way console apps are done today, before the computer world gets turned upside down by XML or XHTML.
Wishing html was dead won't make it so.
but I expect it'll be all the rage in ~2006+?
I suspect a little sooner than that.
But it is certainly the future path - a search around for tools and applications that use XML as a mark-up language is testament to that.
It's not about browsers. It's about the integration of many platform-independent systems over the internet, of which browsers form part.
I doubt whether HTML will ever be "dead" - it will just move through various incarnations.
XML is not a mark-up display language, it's a protocol.
are pure XML pages currently "dead ends" for Googlebots?
If you link to a "pure" XML page then probably, yes (although google may index the text, but I very much doubt that). Certainly googlebot would not follow an XML <link> style tag.
TJ
I'd love to know whether you're attaching an XSL stylesheet to your posts :) oops, it's just gone.
I quickly looked at the advanced search on Google and noticed that in the 'return results of the file format' drop-down, neither .txt nor .xml was listed.
It does seem to index the title and follow links too, but that seems to be about it.
XML is not a mark-up display language, it's a protocol.
are pure XML pages currently "dead ends" for Googlebots?
If you link to a "pure" XML page then probably, yes (although google may index the text, but I very much doubt that). Certainly googlebot would not follow an XML <link> style tag.
I agree.
The problem with XML from the point of view of a search engine robot is what it says on the tin, i.e. eXtensible Markup Language. There are lots of different namespaces and flavours of XML, which is one of its appeals; it can be all things to all men. I guess that if it does get past the XHTML stage, then there would probably be a limited range of doctypes and DTDs that search engines would be prepared to crawl and parse.
I think that the tail may be wagging the dog for some time to come on this one, and who in their right mind is going to produce a web page that SE robots can't crawl when it is actually easier to produce one that they can?
Best wishes
Sid
It does seem to index the title and follow links too, but that seems to be about it.
That doesn't surprise me.... although "indexing the title" I don't really understand. Are you sure it's not indexing the anchor text of the link to the XML file?
XML is just text. If you create an XML file, but with an .HTML extension, then google will index it. And if you use <a href=> style tags for link structure, then it will probably follow the link and transfer PR through. But it will not validate, and to google it sure will look ugly.
XML is really just a protocol and data storage format. The data from an XML file is parsed into an HTML file for display to the user. And it's the "display to the user" part that google is interested in.
XML is also used to call a function, method or procedure on a SOAP server or other XML-based server application over a network. Googlebot would have absolutely no interest in that.
<guess>So I suspect what you're seeing in google is indexed anchor text and nothing more</guess>.
TJ
If you want to play with the latest and greatest, that is fine. Go for it. But if you want to be useful to the greatest number of people, then go with HTML. Google's goal is to serve the many, so they need to be concerned with what works with most browsers.
I expect that xml will gain in popularity, but there is no compelling reason for most sites to change from HTML. There are billions of static pages out there that are going to stay on the web for a long time, and they are owned by people that have no interest in being on the bleeding edge. XML will have to be in the browsers for quite a while before it even starts making a dent in the total number of pages out there.
Kinda right... it seems to use the URL as the title, sorry, I wasn't looking.
kirkcerny501
It might be to do with the doctype you are using... i.e.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
"Google needs to worry about the *majority* of their users"
I agree with your sentiments, bigdave, and at the end of the day, as was said previously, all you need do is implement a server-side transformation... but what I don't understand is, when the W3C has been trying to set a new standard (going back two or three years now) with all the big guns represented on the committee, i.e. M$ and IBM etc., I would at least expect Google to show a passing interest, particularly given the potential impact it could have on the net.
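For anyone wondering what a server-side transformation buys you: the server runs the XSLT itself and sends ordinary HTML down the wire, so Googlebot and old browsers never see the raw XML at all. A rough sketch (element names invented for illustration): adding an xsl:output declaration tells the processor to serialize the result as plain HTML 4.01 rather than XML.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Serialize the result as HTML (e.g. <br> rather than <br/>)
       so crawlers and legacy browsers get markup they understand -->
  <xsl:output method="html"
      doctype-public="-//W3C//DTD HTML 4.01 Transitional//EN"/>
  <xsl:template match="/page">
    <html>
      <head><title><xsl:value-of select="title"/></title></head>
      <body><h1><xsl:value-of select="title"/></h1></body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

The same stylesheet you'd hand to the browser can usually be reused on the server, which is why the transition is less painful than it sounds.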
Does Google like <br> better than <br />?
Hi,
I have an XHTML site/page which is in the SERPs at #3 for its primary term. It has quite a few <br /> tags and /> at the end of image tags; this does not seem to harm its SERP ranking.
From a sample of one, I can say that XHTML tags do not seem to affect Google.
Best wishes
Sid
what I don't understand is, when the W3C has been trying to set a new standard (going back two or three years now) with all the big guns represented on the committee, i.e. M$ and IBM etc., I would at least expect Google to show a passing interest, particularly given the potential impact it could have on the net.
Here is something that a lot of people in the computer industry just don't understand: standards bodies don't set standards. They never have and they never will. Even after it is voted in as a standard, it is not a standard.
The only real form of standard is a de facto standard: the one that is actually used.
Look at all the HTML that was deprecated with the introduction of CSS, or the new tags such as <strong>. Did those tags really go away with the introduction of the *new* way to do things, as the "standards" suggest? Hell no! Because the users did not deprecate them. In fact, the older method is usually a lot easier to read.
So what is represented in the HTML 4 docs is not really the standard. Everything new is part of the HTML 4 standard, but those things that have been removed or deprecated from the doc are no less a part of the standard. No one writing a browser will start pretending that they do not know what <b> means, just to meet W3C specs.
I'm sure that there are people at Google looking at XML, and they will be ready to fully index it when they decide there is value in it. Probably the best thing XML supporters can do is to start putting up XML pages so that they can reach a critical mass in the supply of pages. They should just understand that if they want to be on the bleeding edge, it is their blood that will be spilled. Those pages will not rank well for now.
It is better to start new topics in a new thread..
Closed tags such as <br /> are XHTML rather than HTML, so you're a couple of steps ahead of yourself with an HTML 4.01 Transitional header, which is why you're getting the suggestion that the tag is XML.
You might want to read this thread.. [webmasterworld.com ]
HTML 4.01 Transitional... Should I change this doctype statement, delete it, or keep it?