|Yahoo! Introduces "Robots-Nocontent" Tag |
to allow focusing on the page content
[ysearchblog.com...] and [help.yahoo.com...]
|...webmasters can now mark parts of a page with a 'robots-nocontent' tag which will indicate to our crawler what parts of a page are unrelated to the main content and are only useful for visitors. We won't use the terms contained in these special tagged sections as information for finding the page or for the abstract in the search results. Note: Using a "nocontent" tag to mark explicit sections of content is not considered "cloaking" because all of the content on the page is available to protect the relevance of the results (unlike "cloaking" where we may be served content that is different from what visitors see). |
It is not clear, though, whether Slurp will follow the links within class="robots-nocontent".
System: The following message was spliced on to this thread from: http://www.webmasterworld.com/yahoo_search/3329640.htm [webmasterworld.com] by engine - 11:57 am on May 3, 2007 (utc +1)
|When a "robots-nocontent" tag is used to mark a section of content on a web page, Yahoo! will not use the terms contained in that section as information for finding the page or for the abstract of that page in search results. Note: Using a "robots-nocontent" tag to mark explicit sections of content is not considered "cloaking" because all the content on the page is available to us (unlike "cloaking" where we may be served content that is different from what users see.) |
<p class="robots-nocontent">don't read this</p>
Yahoo Guidelines [help.yahoo.com]
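For anyone wondering what "will not use the terms contained in that section" could mean in practice, here is a minimal sketch of an indexer that drops text inside any element marked with the robots-nocontent class. It is an illustration only: the names NoContentFilter and indexable_text are made up for this post, it assumes reasonably well-formed markup (every non-void element is closed), and nothing here reflects how Yahoo's crawler is actually implemented.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class NoContentFilter(HTMLParser):
    """Hypothetical filter: keeps page text except inside robots-nocontent elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside an excluded element
        self.kept = []       # text fragments that would be handed to the index

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        # The class attribute is a whitespace-separated list of tokens.
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or "robots-nocontent" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.kept.append(data.strip())


def indexable_text(html):
    """Return the text that remains once marked sections are ignored."""
    parser = NoContentFilter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.kept)


page = ('<h1>Widgets</h1>'
        '<p class="robots-nocontent">don\'t read this</p>'
        '<p>Blue widgets</p>')
print(indexable_text(page))
```

The full page is still fetched and parsed, which matches Yahoo's point that this is not cloaking: the crawler sees everything and simply chooses not to index the marked parts.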
Oh that's VERY interesting. I suspect that it will not be used much though and when it is, there will be a danger that if you use it to nocontent the menu structure, say, you could lose the link theming.
Given the number of people who misunderstand robots.txt syntax, embedding more commands within the page is asking for trouble! But well used, it could really help to focus a web page.
I am "scared"
in-page embedded robot syntax...
if Yahoo does it right
how do the other SEs react to such syntax?
You would think they could simply come up with one tag for all engines. So now you tell Google where the content is and Yahoo where it isn't. Stupid.
I really don't like the fact that they are using a "<p>" tag... Not a very clean solution.
I like the Adsense comment tag idea much better...
|I really don't like the fact that they are using a "<p>" tag... Not a very clean solution. |
Semantically speaking, it's the correct tag to use. Judging by the description, the purpose of this new attribute is to "hide" text from yahoo when it isn't relevant to your page.
I don't know about you, but I mark up my paragraphs with the <p> tag.
On another note, I don't think using it in ANY tags is a very clean solution. When I design pages, I tend to use as few ids and classes as possible.
Firstly, it's an attribute value, not a tag, despite what Yahoo are calling it. Secondly, as Yahoo are supposed to be Web-2.0-Savvy, where's the properly-defined microformat [microformats.org] for this invention? Yahoo have not chosen to use the robots-exclusion profile [microformats.org] (which has never got past the draft stage and is totally unsupported), although they have taken the same approach of defining a class to ensure that the markup can validate.
|I really don't like the fact that they are using a "<p>" tag... |
You can use any element that accepts a class attribute, not just a paragraph. The example markup given by Yahoo is explicit on that.
Agreed that if well used it should help, but it might have been better if they'd added a "robots-content" value instead so you could highlight/wrap the relevant content - less work than excluding multiple sections, header, nav, footer, adverts etc.
I'm not sure I'm happy about robots using Markup/CSS class attributes, *with new values*, like this - imho it will only cause confusion as there are many who don't understand the concept of multiple classes and the CSS cascade. Multiple robot class names, if introduced for other purposes, re: microformats, along with rel=nofollow is simply adding to presentational markup which the proper usage of a CSS id/class is designed to take away.
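On the multiple-classes point: a class attribute holds a whitespace-separated list of tokens, so the robot value can sit next to whatever styling classes an element already has. A minimal sketch of the token test (has_class is a made-up helper name, not anything Yahoo has published):

```python
def has_class(class_attr, name):
    """Return True if `name` is one of the whitespace-separated class tokens."""
    return name in (class_attr or "").split()


# An element can keep its styling class and carry the robot value as well.
print(has_class("sidebar robots-nocontent", "robots-nocontent"))
# A longer class name that merely contains the string is a different token.
print(has_class("robots-nocontent-wrapper", "robots-nocontent"))
```

A plain substring search would wrongly match the second case, which is exactly the kind of confusion that arises when people treat the attribute as a single string.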
Perhaps this should've been an addition to robots.txt instead, where you could put in something like:
where the robots would be applying their own/new values to *existing* classes - by reading their own file.
doesn't this defeat the whole point of search engines?
what content could be interesting to users that the search engine doesn't want to know about?
This is starting to get out of hand.
Hey Yahoo!, you can't just arbitrarily come up with an attribute that only you support (Google can, but not you). What about the other freakin' search engines? Huh?
It will go absolutely nowhere!
Just give us an element that is accepted by all search engines that can be used to prevent specific content from being indexed. You know, something like...
So, is this the new thing for search engines? See who can come out with the most elements and attributes to really muck up the Internet?
lol! Who was the person that thought of that one? You should have added another 10-15 characters and possibly another hyphen, no?
|We won't use the terms contained in these special tagged sections as information for finding the page or for the abstract in the search results. Note: Using a "nocontent" tag to mark explicit sections of content is not considered "cloaking" because all of the content on the page is available to protect the relevance of the results (unlike "cloaking" where we may be served content that is different from what visitors see). |
I've read the above multiple times from the Yahoo! Blog. For some reason, I just can't get that to make sense.
In the first part we have...
|We won't use the terms contained in these special tagged sections as information for finding the page or for the abstract in the search results. |
Which tells us that it won't be utilized for search relevancy. But then we have this second part...
|Note: Using a "nocontent" tag to mark explicit sections of content is not considered "cloaking" because all of the content on the page is available to protect the relevance of the results. |
Doesn't that seem a bit contradictory?
This is just so absurd. Here, mark your pages for Yahoo! and help us clean up our index. Forget Google, MSN, Ask, etc. We don't care about them.
Tell me, who was in that session that gave you this idea? What about existing protocols? What about the other search engines? And, what happens to the site's overall performance if you decide to exclude all of your navigation? How are you going to follow links? How are you going to determine relevancy based on anchor text?
Exactly what is the purpose of this tag other than trying to create some hubbub for the week of 2007 May 1 in the industry? What?
This is just what I wanted. Actually I just posted a couple weeks ago asking if such a thing existed.
Maybe not for the reasons other people want it -- I want to be able to exclude certain things from being searchable, like phone numbers of my clients (which have to be on all my pages). They don't like seeing their name come up when they search for their phone number.
I don't know why you would use it to block out your navigation... seems like that's Yahoo's job to figure out what your navigation is and weight it appropriately.
|I want to be able to exclude certain things from being searchable, like phone numbers of my clients (which have to be on all my pages). They don't like seeing their name come up when they search for their phone number. |
If it's on the web, it can be found. What about Google, MSN, Ask and the others? It will be found there, won't it? And, at some point, a scraper is going to get it and send it right back to Yahoo! in a new page of regurgitated content that does not contain the robots-nocontent attribute.
The attribute has ZERO value to anyone other than Yahoo! It doesn't follow any existing protocol. It isn't being used by the other search engines. It's the same concept as building a site for IE only. ;)
|You can use the "class=robots-nocontent" attribute with all XHTML tags and thus have great flexibility on applying this to your site pages. |
Huh? XHTML only? Must there be an XHTML DOCTYPE present before this attribute is recognized?
<div class="robots-nocontent"> This is a section where ads are displayed on the page. Words that show up in ads may be entirely unrelated to the page contents</div>
I couldn't hold this one in, as it almost made me fall off my chair.
Are they mocking the AdSense sites that polluted their index?
Is there an index key that will match this entry in their index, spelled <div class='human-nocontent'>?
-- crawler focusing on the main content --
This should be worded as "crawler focusing on the main content that belongs on your site, which in return will ensure that particular source is not limited in the number of times it appears in the top ten".
Seems to me people have been asking for this for a while to get around duplicate content issues.
It does feel like a step backward having to code differently for the search engines. We've come a long way since having to deal with proprietary browser codes, now we're facing proprietary search engine codes?
Bare minimum, the search engines should collaborate on standards or not do it at all.
Okay, now that I've calmed down a bit, let me explain my argument.
As MB mentions above, this should have been a collaborative effort. To institute a major change like this without first collaborating with your peers normally backfires.
While I feel it's an excellent tool for the everyday Webmaster to have access to, it needs to work for everyone and not just Yahoo!.
The big three were able to agree on the nofollow attribute, why not something of this nature? Could it be that Yahoo! wanted to be the leader this time and let Google, MSN, Ask and the others follow? Those are some lofty expectations, wouldn't you say so? ;)
So what gives? Why wasn't there a joint effort on this?
|This is just so absurd. Here, mark your pages for Yahoo! and help us clean up our index. Forget Google, MSN, Ask, etc. We don't care about them. |
At least they aren't calling out the snitch brigade ;)
This seems to be another sign that SEs are losing the battle. Wouldn't it be interesting to see human reviewed directories rise from the ashes to become the preferred method of finding quality content?
|Wouldn't it be interesting to see human reviewed directories rise from the ashes |
I've been wondering the same thing. I had high hopes that web 2.0 might be the next stage of web evolution, but it's too unstable. Relying heavily on web 2.0 leaves everything in a constant state of chaos. Directories are the opposite end of the spectrum. Boring but dependable.
Local search is where it's going to be. The Internet is too big for any one resource to be all encompassing. More and more people will turn to a portal type environment and localize their results.
They do refer to this as an "attribute" in the second sentence of their description, but then decay into calling it a "tag" from there on out. (Should be defined as an "attribute value", anyway. The "attribute" is "class".)
My question is, why use the "class" attribute for this? Are they actually including tag attributes in their indexing functions/storage/algorithms? If so, what a waste.
If they really want to use an indicator within the definition of a tag, why not <div noyahoo ...> or something similar (if they don't want to use an actual nonsense tag like <noyahoo>, as has been suggested here)?
Thanks, Yahoo, but please take it back to the drawing board.
This is becoming a joke. So I can have a Gandhi page to attract teenagers doing papers, only to lead them to pron pages--that just happened to be in the nocontent section.
Who said they didn't crawl the nocontent area?
They just won't rank you for it.
It may be run through the filters first, just won't dilute your relevance once it's passed inspections. ( Or perhaps not, but that'd be silly. )
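To make the "crawled but not ranked" idea concrete, here is a purely hypothetical sketch in which a crawler leaves the marked sections out of its indexed terms but still records every link it sees, including those inside robots-nocontent elements. Yahoo's announcement does not say whether Slurp behaves this way. The class name TermsAndLinks and the function crawl are invented for illustration, and the code assumes well-formed markup.

```python
from html.parser import HTMLParser

# Elements with no closing tag; they must not affect nesting depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TermsAndLinks(HTMLParser):
    """Hypothetical crawler pass: links are always kept, terms only outside marked sections."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a robots-nocontent element
        self.terms = []      # words that would count toward relevance
        self.links = []      # every href seen, marked section or not

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        if tag in VOID_ELEMENTS:
            return
        classes = (attrs.get("class") or "").split()
        if self.skip_depth or "robots-nocontent" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.terms.extend(data.split())


def crawl(html):
    """Return (terms, links) for one page under the assumptions above."""
    parser = TermsAndLinks()
    parser.feed(html)
    parser.close()
    return parser.terms, parser.links


menu_page = ('<ul class="robots-nocontent"><li><a href="/about">About us</a></li></ul>'
             '<p>Blue widgets</p>')
terms, links = crawl(menu_page)
print(terms, links)
```

Under this assumption, wrapping a menu in the class would remove its words from the page's terms without hiding the pages it links to. Whether the anchor text would still count for the target page is a separate question this sketch does not answer.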
Perhaps... exactly because of its implications, Yahoo! wanted to limit its usage to the absolute necessary. ( Instead of allowing huge blind spots on pages, they allow the reduction of relevance you don't want to have. ) This is an "I don't want to come up for this / I don't want my relevance to be diluted with this" and not the wild west all over again.
Also, there's relatively little point in using this attribute, or any similar one, with Google. (It doesn't much care about on-page factors anyway... except for anchor text, but that would be there to get indexed and not ONLY for the users, am I right? Plain text is either present or not, there's no such thing as excessive.)
Over there you either include something to be able to come up for it, or just include it on a different page and optimize there in full. There's no such thing as "keyword density" for Google, so this really doesn't play a role. Unless you allow unmoderated user comments on your key indexed pages, in which case you're stone cold crazy.
This might be useful in Yahoo! for things you'd have been using before if they weren't considered cloaking. Like geo-targeted content. Also it's good to exclude user-added content.
As nofollow is for user-added links, nocontent is for user-added content. (?)
Why not... it'd play out just fine if Google uses nofollow for their link-a-holic algo, and Yahoo!, with their (bit high) focus on on-page factors, starts using nocontent, while neither really cares about what non-standard attributes the other side rolls out.
Now, if only MSN would introduce the "no-subdomain-url" and "no-blogspot" attributes, everything would be nice and calm for a while.
|As MB mentions above, this should have been a collaborative effort. To institute a major change like this without first collaborating with your peers normally backfires. |
Yeah, things are getting out of hand, there needs to be a sitdown. There are only a couple or three folks who can pull it off, hopefully one or two will step up to the plate.
My personal reaction to this is that if any folks have enough noisy crap on a page to warrant using this in the first place, well, those are the folks I like as my competitors ;-).
Could be a concession on Yahoo's part that they can't figure out what's important on a page. Or it could be that this is a sign that CSS positioning has made page structure irrelevant.
Search engine algos are at their best when they become an expansion of human perception. This mucks up the whole thing. I lump it in there with the nofollow attribute for controlling PageRank distribution. As soon as you have that kind of disconnect between the algo and the user experience, you're on a slippery slope. Ecological disaster, IMO.
I don't know if I like it or not, but it seems funny everyone is blasting Yahoo! when we have a rel="nofollow" link attribute for PR.
Did Google ask MSN and Y! to make sure they were going to support the tag that directly impacts their Patented PageRank algo? Or, if Y! and M would support it for discounting paid advertisers links? (I don't see any of you ranting in the Google forum because of the increased page clutter.)
No, sorry, my bad, I forgot, it's Google's tag, they can do as they feel is right, but no one else can. Don't use it if you don't like it.
How ridiculous everyone gets their panties in a wad because Yahoo! gave people something some people asked for and didn't ask permission. How out of line did Y! get? What were those idiots thinking? Silly fools, don't they know they have to ask?
|Are they actually including tag attributes in their indexing ... storage... algorithms? |
You really think SEs 'strip' the HTML tags prior to storage? If so, where do those cached pages come from? You know, the ones with the formatting that look like the page on your website, but are served from the SE...
If it weren't ever 'out of date' I would buy the theory they are using something similar to PHP's fopen(), but since the cache and the actual page aren't always the same, methinks they have some copies of them tags on file somewhere...
Added: Maybe I should state "I don't see most ranting because of increased page clutter created by Google. Yes, there were some who were opposed, but that was to the principle of the tag (most recently re: paid advertisers), not the increased clutter, because it's too long. It could just as easily be rel="nf", etc."
Edited: A couple of times, for clarity, etc.
Added2: Sorry, was a little hot when it seemed like all anyone was doing was ranting about how silly Y! was for not making sure everyone could support the tag, when we have tags for other SEs not everyone supports.
Added3: I bet if we all didn't use any proprietary tags, they would stop creating proprietary tags, but that's probably not going to happen.
I probably shouldn't have clicked on the Y! link today. Sorry.
One last addition:
If they really stripped all html prior to processing and ranking, then having an H1 or H2 tag could not possibly matter, because they wouldn't know it's there.
They also wouldn't know what's inside that title tag, or the description, or... I wonder what methods they use to try to detect white text on a white background? The notion SEs strip all html or tags prior to processing and/or storage is illogical.
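The argument can be shown with a small sketch: with the markup intact, a parser can tell which words sit in the title or a heading, and once the tags are stripped that information is gone. The names StructureSignals, signals and strip_tags are made up for this example, and the regex-based stripping is deliberately crude.

```python
import re
from html.parser import HTMLParser


class StructureSignals(HTMLParser):
    """Records the text found inside <title>, <h1> and <h2> elements."""

    TRACKED = {"title", "h1", "h2"}

    def __init__(self):
        super().__init__()
        self.current = None  # tracked element we are currently inside, if any
        self.found = {}      # element name -> list of text fragments

    def handle_starttag(self, tag, attrs):
        if tag in self.TRACKED:
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.current = None

    def handle_data(self, data):
        if self.current and data.strip():
            self.found.setdefault(self.current, []).append(data.strip())


def signals(html):
    """Return the title/heading text a parser can recover from the markup."""
    parser = StructureSignals()
    parser.feed(html)
    parser.close()
    return parser.found


def strip_tags(html):
    """Crude tag removal, standing in for 'storing text only'."""
    return re.sub(r"<[^>]+>", " ", html)


stored = "<title>Widgets</title><h1>Blue widgets</h1><p>Cheap.</p>"
print(signals(stored))              # structure is recoverable from the markup
print(signals(strip_tags(stored)))  # nothing left to distinguish headings from body text
```

So any ranking signal that depends on where a word appears has to be computed from the markup, whether or not the markup is also what gets stored.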
since when does yahoo pay attention to tags?
I've tried every incarnation of robots noindex, noodp and still can't get a site de-listed. Yes, de-listed.