XHTML and Spidering

Are there any issues?

         

pageoneresults

11:17 am on Mar 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Are there any issues that I should be concerned with when converting to XHTML? Are the spiders recognizing the " /> at the end of the meta description? What else should I be concerned about? I converted my first page early this morning and it validates 100% against...

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "DTD/xhtml1-transitional.dtd">

That was a piece of cake. Now what?

tedster

1:28 pm on Mar 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Back in January on this thread [webmasterworld.com] detlev reported that his XHTML pages were doing as well as or better than a comparable HTML page.

About those meta tags - isn't this form also valid?
<meta name="description" content="Ladle read rotten hut"></meta>

bird

2:20 pm on Mar 22, 2002 (gmt 0)


Are the spiders recognizing the " /> at the end of the meta description?

Why would they have to? They only need to be able to ignore it. If you place a space before the /, then they will treat it as an "unknown attribute" of the respective tag. If you leave out the space, then a simplistic parser may not be able to separate the / from the preceding tag name or attribute. And you don't want it to appear at the end of your description in a SERP, do you? ;)
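
To make that concrete, here is a minimal Python sketch of a deliberately simplistic tag parser. It is not how any real spider is known to work; `naive_parse_tag` is a hypothetical helper that just splits the tag body on whitespace (respecting quotes) and then on "=".

```python
import shlex

def naive_parse_tag(tag):
    """Deliberately simplistic: split the tag body on whitespace, then on '='."""
    body = tag.strip()[1:-1]        # drop the leading '<' and the trailing '>'
    tokens = shlex.split(body)      # whitespace split that keeps quoted values together
    name, attrs = tokens[0], {}
    for token in tokens[1:]:
        key, _, value = token.partition("=")
        attrs[key] = value
    return name, attrs

# With the space, the slash becomes a harmless "unknown attribute".
with_space = naive_parse_tag('<meta name="description" content="Ladle read rotten hut" />')

# Without the space, the slash sticks to the preceding attribute value.
without_space = naive_parse_tag('<meta name="description" content="Ladle read rotten hut"/>')

# With no attributes at all, the slash sticks to the tag name.
bare = naive_parse_tag('<br/>')

print(with_space)
print(without_space)
print(bare)
```

With the space, the description comes through clean and the "/" lands in the attribute dictionary as a key of its own. Without it, this toy parser leaves the slash on the end of the description, and `<br/>` yields a tag literally named "br/".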

About those meta tags - isn't this form also valid?
<meta name="description" content="Ladle read rotten hut"></meta>

From the XHTML 1.0 specification [w3.org]:

...
Appendix C. HTML Compatibility Guidelines
This appendix is informative.
...
C.2 Empty Elements
Include a space before the trailing / and > of empty elements, e.g. <br />, <hr /> and <img src="karen.jpg" alt="Karen" />. Also, use the minimized tag syntax for empty elements, e.g. <br />, as the alternative syntax <br></br> allowed by XML gives uncertain results in many existing user agents.
...

So yes, the </meta> would be technically valid, but probably not a good idea. I'd avoid the experiment with a real site.
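
For what it's worth, a real XML parser sees no difference between the two forms. A small sketch using Python's standard `xml.etree.ElementTree` (the `summary` helper is only there for illustration):

```python
import xml.etree.ElementTree as ET

minimized = ET.fromstring('<meta name="description" content="Ladle read rotten hut" />')
expanded = ET.fromstring('<meta name="description" content="Ladle read rotten hut"></meta>')

def summary(element):
    """Reduce an element to the parts that matter for comparison."""
    return (element.tag, dict(element.attrib), element.text or "", len(element))

# Both spellings produce the same tag, the same attributes, no text and no children.
print(summary(minimized) == summary(expanded))
```

So the choice between the two is purely about how non-XML parsers (older browsers, spiders) cope, which is what Appendix C is addressing.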

Xoc

2:58 pm on Mar 22, 2002 (gmt 0)


Yes. I recommend that you always use <meta></meta> instead of <meta />. The same for every tag that allows a closing tag, such as <input>. The only ones that I can think of that require you use the < /> format are <br />, <hr />, and <img ... />.

I have had absolutely no problem with my XHTML sites with any of the major search engines.

So I agree with tedster and disagree with bird, based on my personal experience.

pageoneresults

4:46 pm on Mar 22, 2002 (gmt 0)


Thanks to the contributors so far! Xoc, you said...

I recommend that you always use <meta></meta> instead of <meta />.

I cannot find anything in the W3C spec that clarifies this. I always view the source code of the W3C pages to see what they are doing, and they use the " /> format.

My goal is to follow the spec and not take any shortcuts if I don't have to.

With regard to XHTML and its current status, do you see the web making a transition to XHTML, or will this end up being a long, drawn-out battle like CSS?

And, in the case of simplistic brochureware sites, would it be necessary to make the conversion? Are there any advantages from an SEO standpoint? I read what detlev had to say in the post mentioned above, but I don't fully understand the implications just yet.

pageoneresults

5:20 pm on Mar 22, 2002 (gmt 0)


Just did some further testing using Brett's SIM Spider and another program that I have. Neither recognized the meta tags with the " /> format. When I took that away and added </meta>, everything spidered fine. So, what gives?

papabaer

5:32 pm on Mar 22, 2002 (gmt 0)


I have been using <meta /> for months with no problems. Many of my pages written as such have excellent rankings - top spots. My descriptions appear as intended as well.

pageoneresults

5:38 pm on Mar 22, 2002 (gmt 0)


Hey there papabaer, I was hoping you'd join in. Just tried the <meta /> and that one spiders fine.

Okay, so there are three methods to closing off the <meta> tags...

1. " />
2. </meta>
3. <meta />

The first one is not recognized by the spidering programs I am using, and it is the format recommended by the W3C. The other two spider fine. What concerns me is the W3C recommendation, which does not mention the use of #2 and #3. So, where is the XHTMLGuy?

(edited by: pageoneresults at 5:57 pm (utc) on Mar. 22, 2002)

bird

5:46 pm on Mar 22, 2002 (gmt 0)


1. <" />

Uh, what exactly is this supposed to mean?
If you put that in like this literally, then you have a serious problem (an unclosed string attribute). Or is this a shorthand for something else?

pageoneresults

5:52 pm on Mar 22, 2002 (gmt 0)


1. " />

That is the W3C recommendation for closing off the <meta> tags. Here is a piece of meta from their own site...

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

I should have included the content before the <" to make sure there was no confusion. I'm not closing off the meta with <" />. Sorry, I updated my post above and eliminated the beginning <.

digitalghost

5:58 pm on Mar 22, 2002 (gmt 0)


The meta element is a child of the head element and is an empty element.

<meta name="author" content="author name" />

A space and a slash before the final angle bracket are valid, as meta is an empty element.

DG

pageoneresults

6:00 pm on Mar 22, 2002 (gmt 0)


<meta name="author" content="author name" />

Okay, that puts me back to my original question. If the spidering programs I'm using (including Brett's) don't recognize the W3C recommendation, what does one do?

pageoneresults

6:03 pm on Mar 22, 2002 (gmt 0)


I just respidered using methods #2 and #3 and neither validates! Arrrggghhh! Darned if you do and darned if you don't!

digitalghost

6:10 pm on Mar 22, 2002 (gmt 0)


The empty element tag doesn't have full support yet. The sites I am familiar with use the valid form and haven't had any issues with spiders or being indexed properly, yet.

Google has absolutely no problem with it, but then again, Google will index the contents of your cigar humidor if they find a link to it. :)

I don't know if the bots you are using have any support for XML. With XML, more than any other markup, I find it especially important to go with the W3C recommendations. The browser developers are particularly lost, as are most search engines, when it comes to XML, so I expect they will look first to the W3C when developing for it.

DG

pageoneresults

6:16 pm on Mar 22, 2002 (gmt 0)


Thanks digitalghost. That brings up an important point. I believe the W3C states that the spidering program needs to read the DTD to parse the document correctly. Would I be correct in assuming that the robots need to be reconfigured to read the XHTML DTDs?

On a side note, Brett, is the SIM Spider set up to validate XHTML properly?

tedster

6:20 pm on Mar 22, 2002 (gmt 0)


#1 above is a shorthand version of #3. And #3 still is shorthand because it has no attributes.

digitalghost

6:28 pm on Mar 22, 2002 (gmt 0)


Are you creating a valid document, or a well-formed one?

XML data can be processed without a DTD. If the XML is well-formed all the entities are declared. In many instances a DTD is used during the authoring, then discarded as it slows processing.

Spiders may need the DTD now, although I've seen well-formed XML documents properly indexed, but that shouldn't be a long term requirement.
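
As a rough illustration of the well-formed vs. valid distinction, here is a Python sketch that checks well-formedness only. It uses the standard `xml.etree.ElementTree` parser, which reads the DOCTYPE line but does not fetch the external DTD; `is_well_formed` and the sample documents are made up for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """True if the markup parses as XML; no DTD validation is attempted."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

DOCTYPE = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
           '"DTD/xhtml1-transitional.dtd">')

# Empty element closed with " />": well-formed.
closed = (DOCTYPE + '<html><head><meta name="author" content="author name" />'
          '<title>t</title></head><body></body></html>')

# HTML-style <meta> with no close at all: not well-formed XML.
unclosed = (DOCTYPE + '<html><head><meta name="author" content="author name">'
            '<title>t</title></head><body></body></html>')

print(is_well_formed(closed))
print(is_well_formed(unclosed))
```

This says nothing about validity against the XHTML DTD (element order, required attributes and so on); that still needs a validating tool such as the W3C validator.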

DG

papabaer

11:50 pm on Mar 22, 2002 (gmt 0)


I should have clarified: by <meta /> I meant as per the W3C recommendations for handling empty elements, e.g.

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />

I have been following the recommended specs since converting to xhtml early last year.

Since migrating to XHTML my pages have had great success in the SERPs. This is why I believe there will be a strong migration to XHTML & CSS: the optimized pages are much more "spider friendly," especially when use of tables is limited to "tabular" data and not used for layouts.

pageoneresults

11:54 pm on Mar 22, 2002 (gmt 0)


I'm still concerned that the spidering programs that I've used do not see the meta tags when using " />. Is it possible that these spider tools are not set up to parse XHTML? One of them is Brett's SIM spider and I would sure like to think that he has it set up to handle the latest and greatest!
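
One way to sanity-check the markup locally, independent of any particular spider tool, might be to run it through a lenient HTML parser and see which meta tags come out. The sketch below uses Python's standard `html.parser`; it is only a stand-in and says nothing about how SIM Spider or any search engine actually parses pages. `MetaCollector` and `collect_meta` are hypothetical names.

```python
from html.parser import HTMLParser

class MetaCollector(HTMLParser):
    """Collect name/http-equiv -> content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Also reached for <meta ... />: the default handle_startendtag
        # forwards to handle_starttag and then handle_endtag.
        if tag == "meta":
            attributes = dict(attrs)
            key = attributes.get("name") or attributes.get("http-equiv")
            if key:
                self.meta[key.lower()] = attributes.get("content")

def collect_meta(markup):
    parser = MetaCollector()
    parser.feed(markup)
    parser.close()
    return parser.meta

slash_form = collect_meta('<head><meta name="description" content="Ladle read rotten hut" /></head>')
close_form = collect_meta('<head><meta name="description" content="Ladle read rotten hut"></meta></head>')

print(slash_form)
print(close_form)
```

If a tolerant parser like this one picks up the description in both forms but a given spider tool does not, that would point at the tool's parser rather than at the markup.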

Xoc

3:37 am on Mar 24, 2002 (gmt 0)


That's been my experience. All the browsers handle either syntax. But spiders are different. There's no easy way to test them. Almost all spiders are going to follow the rules in HTML 3.2. And 3.2 had <meta ...></meta> as an optional syntax, whereas <meta ... /> is just a fluke that it works in a browser.

I was finding that some of the spiders were getting confused, but that was over a year ago. That's why I recommend the closing tag. Both are valid XHTML, but as we all know, just because it's in the spec doesn't mean that the parser handles it correctly.