Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

    
Missing <BODY> tag - will Google not index the page properly?
doughayman

5+ Year Member



 
Msg#: 3815098 posted 2:42 pm on Dec 28, 2008 (gmt 0)

To all,

If a standard HTML-based web page does not contain a <BODY> tag, a </BODY> tag, or both, will Google fail to crawl and index the page properly?

Are there any other meta tags and associated delimiters that Google frowns upon if missing?

Thanks in advance!

 

encyclo

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3815098 posted 3:04 pm on Dec 28, 2008 (gmt 0)

The body element is optional in HTML, so there is no technical reason why Google would have a problem with this. The closing tag is certainly unnecessary, but as you suspect, I would think the issue is more about Googlebot successfully distinguishing the end of the head section from the beginning of the body section.

Assuming there is no ambiguity - that is, the page is valid HTML and no head elements appear in the source code after any body content - then you will almost certainly not have any problems. The lack of a clear delimiter does make it more important that there are no errors in your markup, however.

As for the question about other delimiters and required elements, the most frequent cause of errors is unclosed elements, which can make the parser skip over some of your content.
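It's easy to poke at this with a tolerant parser. The sketch below uses Python's stdlib html.parser purely as an illustration (Googlebot's actual parser is not public): a valid page with no html/head/body tags still yields every piece of text.

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collects the text content a tolerant parser can see."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

# A minimal but valid page: the html, head, and body tags are all
# omitted, yet every piece of content is still reachable.
minimal_page = """<!DOCTYPE html>
<title>Test page</title>
<h1>Heading</h1>
<p>Body copy the bot should find.</p>"""

parser = TextCollector()
parser.feed(minimal_page)
parser.close()
print(parser.chunks)
```

Nothing is lost: the title, heading, and paragraph text all come through as data events, with the head/body boundary left entirely implicit.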

pageoneresults

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3815098 posted 3:07 pm on Dec 28, 2008 (gmt 0)

If a standard HTML-based web page does not contain a <BODY> tag, a </BODY> tag, or both, will Google fail to crawl and index the page properly?

That is an interesting question!

I'm going to set up a few pages just for my own peace of mind. I'm aware that you don't need any of the normal markup that we have become accustomed to using. I wonder how a document will perform if it has only content and NO <head>/<body> elements?

doughayman

5+ Year Member



 
Msg#: 3815098 posted 3:54 pm on Dec 28, 2008 (gmt 0)

I should have stated that I have some old sites, with some pages that are not indexed, where the HTML was handcrafted by yours truly. There may be issues with these files that have been revealed by some of the public-domain HTML validators available on the net.

I was wondering if anyone has done any formal testing in this arena, and whether Google implies certain tags if they are missing. I too will need to do some testing.

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3815098 posted 5:17 pm on Dec 28, 2008 (gmt 0)

I would think that if the files pass validation (at either HTML 3.2 or HTML 4.01 Transitional) then the bot will have nothing to guess at or correct.

pageoneresults

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3815098 posted 6:05 pm on Dec 28, 2008 (gmt 0)

I would think that if the files pass validation (at either HTML 3.2 or HTML 4.01 Transitional) then the bot will have nothing to guess at or correct.

I'm setting up my test page at this very moment. It will not pass validation. I'd have to "undo" quite a bit to have a page that validated and matched the existing site. So, I'm going to deal with the 11 errors and 2 warnings that are present. We'll see how not having <head> and <body> elements changes things.

I'm not too certain I have things right. Honestly? I've never built a page without those elements.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<title></title>
<style type="text/css">@import url("file.css");</style>
<div id="body">
<h1></h1>
<p></p>
<!--#include virtual="footer"-->
</div>
<!--#include virtual="right"-->
<!--#include virtual="left"-->
<!--#include virtual="top"-->

Even with the 11 errors and 2 warnings, the page displays just fine in the browsers. The visitor would never know the difference, and the bot probably won't either. I've also got a slight advantage as I use SOC (Source Ordered Content) and can serve primary content first, which of course allows Google to index what it came for: the content of the page. The rest of the fluff is secondary. ;)

If the browser displays the page as it should, the bot is most likely going to get the same thing. All that stuff in the <head> is there to further refine the document's contents. I would think this experiment will help in determining how SOC comes into play, since we don't have the ability to specify a <title> (see update) and description. We have to rely on the first thing the bot indexes, which in an SOC environment is going to be an <h1> followed by a summary of the page content (IPW). ;)

References

7.3 The HTML element
[w3.org...]
Start tag: optional, End tag: optional

7.4.1 The HEAD element
[w3.org...]
Start tag: optional, End tag: optional

7.5.1 The BODY element
[w3.org...]
Start tag: optional, End tag: optional

Update: I was able to add a <title> element.

pageoneresults

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3815098 posted 5:34 pm on Dec 29, 2008 (gmt 0)

2008-12-29 Update: My test results are in and the page was indexed just fine. In fact, it holds top positions for its targeted keyword phrases without <html></html>, <head></head>, and <body></body> elements.

As expected, the <title> was indexed, with the <h1> and first <p> showing as the snippet.

Yes, I can see results in less than 24 hours sometimes.

tedster

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3815098 posted 6:01 pm on Dec 29, 2008 (gmt 0)

One underlying issue here is how well Google's error recovery routines work - and when they fail. I know from experience that browser error recovery varies from browser to browser. Google's error recovery is in yet another category, partly because their end goal is not to render the page visually (although they do some checking along those lines) but to analyze it for search relevance.

I once worked with a page that displayed fine in the major browsers, but Google's index was missing just some of the text - text containing some relatively uncommon terms. On investigating, I discovered a missing angle bracket [ > ] in a tag just before the unfindable text. I fixed that and the phrase became findable within a few days.
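That missing-bracket effect is reproducible with any tolerant parser: when a ">" is missing, everything up to the next ">" tends to be swallowed as part of the tag. Here is a sketch using Python's stdlib html.parser as one concrete example (browsers and Googlebot each have their own recovery rules, so this is an illustration, not a statement about Googlebot):

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collects the text content a tolerant parser can actually see."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())

# The ">" after class="x" is missing, so the parser keeps consuming the
# start tag until the next ">" - the words in between never become text.
broken = '<p>ok <em class="x" swallowed words here <p>visible again</p>'

parser = TextCollector()
parser.feed(broken)
parser.close()
print(parser.chunks)
```

The words after the unterminated tag are parsed as bogus attributes rather than content - exactly the "unfindable text" symptom described above. Adding the missing ">" restores them as ordinary text.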

That experience is now a few years back, and I'll bet that Google's error recovery has continued to improve. After all, they want to find content. Even though they may not include it in the final decision, they still want to make that choice.

And in this case, it's clear that the page could be easily indexed - so that question is now answered - thanks P1R.

encyclo

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3815098 posted 1:58 am on Dec 31, 2008 (gmt 0)

When talking about delimiters for the sections of a document, we can only make vague guesses as to the way Googlebot handles the markup. I've certainly had "minimalist" pages indexed with no problems, and I doubt that there are any elements which can be considered essential for a document to be indexed by Googlebot. The HTML specifications require implied HTML, HEAD, and BODY elements when none are present (meaning the parser has to add them to its DOM tree); however, as Google only has to extract data and not actually render the page, it may well not function in the same way that a graphical browser would.

The challenge is not really to see whether valid minimalist pages will be parsed, but to try and determine how Googlebot uses the body element with pages containing confusing markup. For example:

<html>
<title>test</title>
<h1>Will this be parsed?</h1>
<meta name="keywords" content="test">
<p>Or will the document start here?
</html>

If you think this kind of markup is unlikely, consider the many pages where a poorly-implemented server-side include pulls in a complete HTML document rather than a fragment - so you get multiple head and body elements within the same page.
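One way to see what a tolerant parser makes of such a page is to run a doubled-up document through Python's stdlib html.parser (again just an illustration: it reports events rather than building a DOM, so it sidesteps the question a DOM-building indexer must answer of which head/body "wins"):

```python
from html.parser import HTMLParser

class OutlineCollector(HTMLParser):
    """Records structural tags and text so we can see what a tolerant
    parser makes of a page with duplicated head/body sections."""
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        if tag in ("html", "head", "body", "title"):
            self.events.append(("start", tag))

    def handle_data(self, data):
        if data.strip():
            self.events.append(("text", data.strip()))

# A badly-implemented include: the included file is a complete HTML
# document, so the page ends up with two head and two body sections.
page = """<html><head><title>Outer</title></head><body>
<p>Outer content</p>
<html><head><title>Inner include</title></head><body>
<p>Included content</p>
</body></html>
</body></html>"""

parser = OutlineCollector()
parser.feed(page)
parser.close()
for event in parser.events:
    print(event)
```

At the event level nothing is lost - both body sections and both titles are reported - but any consumer that needs a single document tree has to decide how to reconcile the duplicates, and that is where indexing ambiguity could creep in.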

Googlebot has to be very liberal in what it accepts, due to the nature of the pages it has to digest, in much the same way that browsers handle extremely-broken documents. However, some errors will undoubtedly make the parser skip over zones of content, as tedster mentioned above.

See this example from HTML5 developer (and Google employee) Ian Hickson: Tag Soup: How UAs handle <x> <y> </x> </y> [ln.hixie.ch] to get an idea of how user agents such as browsers and Googlebot work when handling invalid markup.

Quadrille

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 3815098 posted 3:02 am on Dec 31, 2008 (gmt 0)

I doubt Google will worry much - but some browsers may decide to fail to display the pages as you would wish.

It's much more a browser issue than an SE issue.

I suspect that a missing <title> tag and/or meta description would have a much greater influence on the SERPs.

Depending on your site, geolocation tags, character sets, and 'expires' meta tags may also matter.

Test your site in Opera and all the major browsers before making final decisions.


WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved