google + xhtml revisited

an observation...

         

jackson

12:29 pm on Jun 16, 2003 (gmt 0)

10+ Year Member



Looks like I've been smacked with an issue relating to the XHTML closing meta tag - as in "... />"

Not sure what google's story is in this regard, but here we go...

In February 2003 completely refurbished my site using CSS and making it XHTML 1.0 compliant. The pages validate against W3 and other tools. Pages look good using most current browsers. So far so good.

Did a search on google. Site is listed but full of stuff that bears no relation to what my site is about.

Did a check using Sim Spider. Guess what? No meta description and no meta keywords. Fiddled around and removed the closing forward slash. Retested and hey, there's the description and all the keywords.

So, we seem to be drawing the same conclusion as many others here have - google cannot read XHTML compliant web pages.

There's more. As mentioned, completely refurbished the site, moved stuff around, added more pages, etc. 4 months down the road, the previous site's description is still being used - and yes, we have the June 15 date stamp. No big deal there.

However, what's worse is that all the search results' content now relates to items in various CSS tags - section titles, navigation markers and copyright notices. Click on "similar pages" and what do I get there? My notice to users using non-compliant and early generation browsers.

In fact this notice appears in the other search engine results as well - being the first bit of readable copy on each web page. Not visible in compliant browsers but is there for others.

To cap this all, ALLTHEWEB has something similar but at least they have the CORRECT DESCRIPTION of each of the pages listed - this obtained from the meta tag description. Go figure.

All this may sound like a whinge. However, I now seem to be stuck between backpedalling (as in going back to HTML 4.01), corrupting my XHTML compliance, or coming up with some sort of "exotic" fiddle to try and fix up this mess. The issue here isn't PR as such but rather, an accurate representation of what I have out there on the Internet.

creative craig

12:34 pm on Jun 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"google cannot read XHTML compliant web pages."

I have sites that are XHTML compliant and rank fine in Google and have done for the last 4 months.

Craig

waldemar

1:04 pm on Jun 16, 2003 (gmt 0)

10+ Year Member



Also positive indexing here with an XHTML 1.1 compliant site.

jackson

1:20 pm on Jun 16, 2003 (gmt 0)

10+ Year Member



Craig, it's not the ranking that concerns me, it's the stuff that google has picked off my site and placed in the result output - for want of a better word.

Besides the usual stuff, this is what pops up:
... this week's picture 'heading' commissioned >> personal >> stories >> all content - ©2003 xxx xxx - all rights reserved.

commissioned >> personal >> stories >> are the navigational links to other parts of the site.

Perhaps the point I'm trying to make here is that the content listed above is inelegant to say the least.

I may need to apply a fix of some sort. Something akin to adding in, as someone here has already suggested, a p.title tag with the appropriate and descriptive content tacked in there.

I am sure that this is not what was intended when those guys at W3 put together the XHTML spec ...

waldemar

5:09 pm on Jun 16, 2003 (gmt 0)

10+ Year Member



"what's worse is that all the search results' content now relates to items in various CSS tags"

"commissioned >> personal >> stories >> are the navigational links to other parts of the site."

Do they appear in the first sections of your html document? Then try to reorganize your divisions. Some people put navigation and other second-level-content to the very end of the html page leaving the *real* content right after <body>...
I don't think this is XHTML-related, though.

Oaf357

10:21 pm on Jun 16, 2003 (gmt 0)

10+ Year Member



The true question is what do you want Google to use?

Look here:
[webmasterworld.com...]

I think that will help you.

jackson

11:21 pm on Jun 16, 2003 (gmt 0)

10+ Year Member



waldemar,

Thanks for your response. Put it this way, my site is image intensive. On the main page (index.html) there is nothing there but an image and then a header and by-line of sorts, the navigation links and copyright notice. With this layout, it would thus be difficult to move any of this stuff around short of adding in some "invisible" content - maybe the stuff in the description meta tag.

What blew my cool on this whole thing was using Sim Spider. Using the XHTML recommended meta closing tag in the style of "xxx. />", no description or keywords showed up in Sim Spider. Take out the forward slash and everything pops up.

Now, looking at that result and seeing all the other garbage that is now appearing in the search engine results - google as elsewhere - has brought forth the conclusion that is now the subject of this thread.

papabaer

11:37 pm on Jun 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Been XHTML'n for a long while, definitely among the very early (VERY EARLY!) adopters.

Lots of graphic stuff too, all valid XHTML---but! Always in conjunction with accessibility concerns.

There is where you need to focus.
Google.. and your GOOGLE description, will love you for it.

Great rankings, very 'user-friendly' SERP/Descriptions.

jackson

12:15 am on Jun 17, 2003 (gmt 0)

10+ Year Member



Oaf357,

Thanks for the follow up. Took a look at that thread. Made the changes and ... back to square one.

Put in the </meta> tag and this to no avail. Did a Sim Spider validation and bye-bye meta tag description and keywords. Nothing. All I get are the words making up the page. As mentioned in response to waldemar above, my site is image intensive and other than navigational links and stuff like copyright notices, there is very little content on these pages. So, the meta tag page description is vital.

Other than the above observations I am at a loss in figuring out why ALLTHEWEB has got the page description right while all the other SE's - like Sim Spider with the correct XHTML syntax - seem to give it a miss.

Hence the above conclusion, most search engines do not work effectively with XHTML compliant pages.

BTW - forgot to mention that the previous version of my web site which was HTML 4.01 compliant - none of these issues were apparent thus reinforcing my opinion. And again, page ranking is not an issue here, the technical details are.

Oaf357

12:33 am on Jun 17, 2003 (gmt 0)

10+ Year Member



You have to remember XHTML is a new thing.

It will soon be fully supported, I'm sure. But, I don't know why ></meta> isn't working for you. It works great for me.

jackson

12:40 am on Jun 17, 2003 (gmt 0)

10+ Year Member



papabaer - thanks for the follow up.

Yes, I've tried to be a good boy here as well. Besides ALT's to all my images and the usual "accessibility" features I also have put in an NS 4 type style sheet as well as a notice for those browsers that cannot handle CSS 1.0 and XHTML compliant pages. Let's say this, these pages have been designed to "degrade gracefully" which is what they do as far as I have ascertained and, therein lies the rub.

Not funny when something like this pops up on the SE's results page:

"This message only appears if stylesheets have been switched off ...." etc.

Not what was intended. I'd prefer it that the SE's pick up the meta tag page description. This is not happening.

Looks like plan B is to add in a p.description CSS tag. This would be invisible in current browsers but would appear in early generation browsers. Wouldn't go amiss amongst all the other stuff lying around there I suppose.

grahamstewart

12:47 am on Jun 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"add in a p.description CSS tag. This would be invisible in current browsers"

That sounds like hidden text - which I believe Google will heavily penalise you for.

pageoneresults

12:52 am on Jun 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Since I was heavily involved in the other thread I decided to make some changes and add the </meta>. SIM Spider reads the description and keywords when using that format and it validates.

One of the first things I did was check out one of papabaer's sites to see what he was doing. He's using the preferred method /> of course.

So, it looks like the only way to appease SIM Spider is to close the </meta>. There are more than a couple right now who are chalking up their problems to this issue. When I see multiple complaints I usually raise an eyebrow and check things out.

[edited by: pageoneresults at 1:07 am (utc) on June 17, 2003]

papabaer

12:52 am on Jun 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I just feel better doing things 'right.'

I've never used <meta ...></meta>; I use /> for all my meta descriptions, and GOOGLE (as well as ATW, et al) picks up the descriptions just fine.

I do keep meta descriptions short... many times choice (well positioned page text) appends the meta description.

I don't worry about 'sim spider' -- gut instinct and experience are good guides for positioning text that you would like to see added to your serp/page description.

[edited by: papabaer at 12:56 am (utc) on June 17, 2003]

jackson

12:54 am on Jun 17, 2003 (gmt 0)

10+ Year Member



Oaf357 - need to tender an apology here.

Er - as we say - made something of an oversight here. Didn't see the '>' in the ></meta>. Slapped the </meta> at the end of the line in question without the closing bracket. No wonder.

Fixed that up, tested and it now works in Sim Spider - (big red face).

Guess we'll need to wait a few weeks now to see what effect this has.

Still, I guess if it wasn't for this forum none of this would have ever been made apparent. Apologies and thanks guys.

pageoneresults

1:01 am on Jun 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



jackson, no apologies needed! I brought this same question up in March 2002. Never really had closure to that topic...

XHMTL and Spidering [webmasterworld.com]

Oaf357

1:02 am on Jun 17, 2003 (gmt 0)

10+ Year Member



No problem.

papabaer,

I mentioned in the other thread that as of February/March Google wasn't pulling descriptions using shorthand formatting. How exactly do you format yours?

Oaf357

1:14 am on Jun 17, 2003 (gmt 0)

10+ Year Member



Another thing I realized just a minute ago is that if Google (or any other search engine for that matter) isn't correctly indexing shorthand ( />) meta tags it's because their spiders don't care about DTDs.

Apparently ATW either has a DTD knowledgeable spider or has programmed it to accept meta tags ending with />.
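
How ATW, Google or Sim Spider actually parse meta tags is not known from this thread. As a rough illustration only, the sketch below uses Python's standard html.parser to show that a tolerant parser can read the description from all three forms discussed here (HTML 4.01 style, the XHTML shorthand />, and the explicit ></meta> close). The names MetaCollector and read_meta are made up for this example.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name=... content=...> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...>
        # and <meta name=...> are treated alike. Tags ending in "/>" are
        # routed here as well by the default handle_startendtag().
        if tag != "meta":
            return
        fields = dict(attrs)
        name = fields.get("name")
        if name and fields.get("content") is not None:
            self.meta[name.lower()] = fields["content"]


def read_meta(markup):
    """Return a dict of meta name -> content found in the markup."""
    parser = MetaCollector()
    parser.feed(markup)
    parser.close()
    return parser.meta


samples = {
    "html4": '<meta name="description" content="some description">',
    "xhtml_short": '<meta name="description" content="some description" />',
    "xhtml_long": '<META NAME="description" Content="some description"></meta>',
}

for label, markup in samples.items():
    print(label, read_meta(markup))
```

A spider that only recognises one of these forms would need a stricter, hand-rolled matcher; nothing here says which engines do that.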

jackson

1:46 am on Jun 17, 2003 (gmt 0)

10+ Year Member



grahamstewart

That I have my "This message only appears if stylesheets ..." as "hidden text" and google and the other SE's are picking up on this seems to be neither here nor there.

As "hidden text" this may be a little contentious. That message doesn't appear in current browsers - IE 5 and 6, Opera 6 and 7 and hopefully NS 7. However it is set to appear in earlier browsers. So whether it's "hidden" or not is another matter.

pageoneresults and papabaer

Yes, I have just now discovered the error of my ways thanks to all the guys here.

It appears that the correct syntax to use is thus:
<META NAME="description" Content="some description"></meta>

Before I was using:
<META NAME="description" Content="some description" />

This as per the XHTML recommendations. From the foregoing it seems quite clear that using this particular method "impairs" SE functionality. This with regard to picking up meta tag descriptions, keywords and the rest of the meta tag items for that matter.

Hope this helps.

papabaer

2:11 am on Jun 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I use <meta name="description" content="some description" /> (valid XHTML) and Google has no problems with it.

In fact, I am convinced, it is the clarity of many of my meta descriptions that helps garner traffic. Whether first, fifth or tenth, a good description on the SERPS can be the deciding factor.

<added>I just checked a number of recently added pages (last several days), and the descriptions show fine---using the preferred /> meta/closing XHTML format.</added>

<added title="second addendum">I'm looking at one page now.. Google indexed it yesterday. The entire meta description is listed on the results page.</added>

Oaf357

2:26 am on Jun 17, 2003 (gmt 0)

10+ Year Member



Hmm...

I know I for one would like to see the other major SEs produce the same results with />

jackson

4:19 am on Jun 17, 2003 (gmt 0)

10+ Year Member



papabaer,

As an example, the text in one of my page descriptions looks like this:

"xxx - smithfields meat market, the story"

The "xxx" part is my name. I don't think that could be any more simple. As mentioned before, ATW has picked up this description. As for the other SE's - they have that NS4 degrade message.

Oaf357 and papabaer,

As mentioned before, the item that blew my cool was using Sim Spider. Nothing comes up when using ..." />. It works when using ..."></meta>. To me, that's been the deciding factor.

The jury is out. Have made the modifications. Let's give it a week or so and let's see what comes back. Either way, I'll report back - be it to this thread or in a new one.

waldemar

4:46 am on Jun 17, 2003 (gmt 0)

10+ Year Member



It appears that the correct syntax to use is thus:
<META NAME="description" Content="some description"></meta>

Before I was using:
<META NAME="description" Content="some description" />

Maybe there's a problem... tags must be lowercase in xhtml.

(Since when are meta tags so much back in fashion again? I guess, putting that description in your regular content body would improve your search engine results...)

Oaf357

6:39 pm on Jun 17, 2003 (gmt 0)

10+ Year Member



Actually, I tested the meta tag theory.

When I first put my site up I wasn't using them (description and keyword). Then I added description and got a nice improvement in rankings. Then I added keywords and got not only the nice improvement in rankings but a nice jump in traffic too.

I'd say that meta tags are still in fashion when properly used. In my case, no more than 150 characters (including spaces and punctuation) in descriptions and no more than twenty words in keywords tag. That seems to be the most effective method I've come up with after asking around, researching, and testing.
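
The two limits quoted above (150 characters for the description, twenty words for keywords) are one poster's rule of thumb, not a documented search engine requirement. A minimal sketch of a check against them could look like this; check_meta and the sample keywords are hypothetical, and counting keywords as comma- or space-separated words is an assumption about what "twenty words" means.

```python
import re

MAX_DESCRIPTION_CHARS = 150  # includes spaces and punctuation
MAX_KEYWORD_WORDS = 20


def check_meta(description, keywords):
    """Return a list of warnings; an empty list means both limits are met."""
    warnings = []
    if len(description) > MAX_DESCRIPTION_CHARS:
        warnings.append(
            f"description is {len(description)} characters "
            f"(limit {MAX_DESCRIPTION_CHARS})"
        )
    # Count individual words, treating commas and whitespace as separators.
    words = [w for w in re.split(r"[,\s]+", keywords) if w]
    if len(words) > MAX_KEYWORD_WORDS:
        warnings.append(
            f"keywords has {len(words)} words (limit {MAX_KEYWORD_WORDS})"
        )
    return warnings


# Sample values: the description is the one quoted in this thread,
# the keywords are invented for illustration.
print(check_meta("xxx - smithfields meat market, the story",
                 "meat market, photography, stories"))
```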

jackson

12:02 am on Jun 18, 2003 (gmt 0)

10+ Year Member



waldemar and oaf357

I'm also of the opinion that meta tags matter. ATW is currently using the meta description off each page indexed whereas most of the other SE's are not.

What started off this thread is that google, lycos and the rest all seemed to have bypassed the meta tags and, instead, are using text items that have little if anything to do with what each of those listed pages is about.

What I have omitted to mention thus far is that all the pages listed by the other SE's show up with this "extraneous text". The worst case being in Lycos where they show a sequence of at least a half dozen pages all showing "This message only appears if style sheets ..." as part thereof. This is my NS4 degrade message. Clearly, this is not what was intended. This message has nothing to do with what each of those pages is about.

Hence the suggestion that, if this is the first line of text these SE's are picking up off each page, then best add in the meta tag description as that first line instead and in the same manner as the NS4 degrade message. This text would be "invisible" in current browsers and would only be visible in 3rd and 4th generation browsers - which is no big deal. In fact, this may enhance each page's usability factor. Just a back handed idea.

grahamstewart

12:06 am on Jun 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Couldn't you just move your NS4 degrade message further down the page?

Or better still do some browser sniffing on the server side and only add the degrade message if it is required?
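
A very rough sketch of the server-side sniffing idea, in Python. Everything here is an assumption for illustration: the function names, the notice markup, and above all the User-Agent heuristic (Netscape 4.x sent strings like "Mozilla/4.79 [en] (...)" without a "compatible" token, while IE and Opera of the period sent "Mozilla/4.0 (compatible; MSIE ...)"). Real User-Agent strings vary, so treat the pattern as a starting point rather than a reliable detector.

```python
import re

# Notice text is abbreviated exactly as quoted in this thread.
DEGRADE_NOTICE = '<p class="degrade">This message only appears if stylesheets ...</p>'

# Heuristic only: matches "Mozilla/4.x" at the start of the string.
NS4_PATTERN = re.compile(r"^Mozilla/4\.\d+")


def needs_degrade_notice(user_agent):
    """Guess whether the client looks like Netscape 4."""
    ua = user_agent or ""
    return bool(NS4_PATTERN.match(ua)) and "compatible" not in ua.lower()


def render_page(body_html, user_agent):
    """Prepend the notice only for clients that appear to be Netscape 4."""
    if needs_degrade_notice(user_agent):
        return DEGRADE_NOTICE + "\n" + body_html
    return body_html


print(needs_degrade_notice("Mozilla/4.79 [en] (Windows NT 5.0; U)"))
print(needs_degrade_notice("Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"))
```

With this approach, any client that does not match the pattern, spiders included, would receive the page without the notice in its markup at all.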

pageoneresults

12:07 am on Jun 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"This message only appears if style sheets ..." are part thereof. This is my NS4 degrade message. Clearly, this is not what was intended. This message has nothing to do with what each of those pages are about.

Care to show us a snippet of that particular code? They should not be indexing that content unless something is wrong somewhere. I'm going to guess if that is showing as your description, that the rest of the page is not getting indexed.

<added>Hey there grahamstewart, we must be on the same time schedules. ;)</added>

grahamstewart

12:15 am on Jun 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm guessing here but I reckon he has his "degrading messaging" (LOL) in a named div. And then he uses @import to include a stylesheet that sets display:none on this div.

NS4 and various other dodgy browsers can't handle @import so they get to see the message. Of course, spiders don't use CSS either - so they also see the message and index it as the first (and therefore most important) thing on his page.
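
To make that guess concrete, here is a small Python sketch of what a reader that ignores CSS would see first. The page markup, the class name "degrade" and the file name modern.css are all hypothetical; the point is only that display:none in an imported stylesheet has no effect on a program that reads the raw markup in source order.

```python
from html.parser import HTMLParser

# Hypothetical page: class name, file name and wording are assumptions.
PAGE = """<html><head>
<title>xxx - stories</title>
<style type="text/css">@import "modern.css";</style>
<meta name="description" content="xxx - smithfields meat market, the story" />
</head><body>
<div class="degrade">This message only appears if stylesheets have been switched off.</div>
<h1>this week's picture</h1>
</body></html>"""


class BodyText(HTMLParser):
    """Collect body text in source order, ignoring CSS entirely."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        text = data.strip()
        if self.in_body and text:
            self.chunks.append(text)


def body_text(markup):
    """Return the non-empty text chunks found inside <body>."""
    parser = BodyText()
    parser.feed(markup)
    parser.close()
    return parser.chunks


# The first chunk is the degrade notice, not the page heading.
print(body_text(PAGE)[0])
```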

(pageone: it's 9:45am here in Oz, must be later than that in California I guess).

g1smd

1:41 am on Jun 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If sim spider seemingly requires </meta> at the end of the tag, then I would regard that as a bug with their software and not a valid reason to change the code on your site.

You need lower case on all tags and attributes to have valid XHTML code:

<meta name="description" content="some description" />

Check using [validator.w3.org...] that there isn't some other reason for the observed phenomenon.
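
The validator remains the proper check. As a quick local aid only, the sketch below lists element and attribute names in start tags that are not lowercase. It is an assumption-laden helper (the names CaseChecker and uppercase_names are invented, it ignores end tags, and it is no substitute for validating against the DTD).

```python
import re
from html.parser import HTMLParser

QUOTED = re.compile(r'"[^"]*"|\'[^\']*\'')
ATTR_NAME = re.compile(r"\s([A-Za-z_:][-\w:.]*)\s*=")
TAG_NAME = re.compile(r"<\s*([A-Za-z][-\w:.]*)")


class CaseChecker(HTMLParser):
    """Report start tags whose element or attribute names are not lowercase."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases names, so look at the raw tag text instead.
        raw = self.get_starttag_text()
        # Blank out quoted values so text inside them is not taken for names.
        stripped = QUOTED.sub('""', raw)
        names = TAG_NAME.findall(stripped)[:1] + ATTR_NAME.findall(stripped)
        for name in names:
            if name != name.lower():
                self.problems.append(name)


def uppercase_names(markup):
    """Return the offending names in the order they appear."""
    checker = CaseChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


print(uppercase_names('<META NAME="description" Content="some description" />'))
print(uppercase_names('<meta name="description" content="some description" />'))
```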

Oaf357

1:50 am on Jun 18, 2003 (gmt 0)

10+ Year Member



"If sim spider seemingly requires </meta> at the end of the tag, then I would regard that as a bug with their software and not a valid reason to change the code on your site."

Actually. It wasn't JUST a bug. It may be now though.

This 37 message thread spans 2 pages.