Matt Cutts recently made it clear that nofollow'ed links are completely dropped from Google's link graph [webmasterworld.com]. So I'd say those links are not counted in PR calculations.
I guess that would make them the better candidates for this project then. Oh well, it's a bunch of work, but it will be for the best, I believe.
Certainly adds a new dimension to internal link flow. It would be interesting to play with it a bit.
The only time I'd use nofollow on internal links is if I want to control PageRank to the target URL but I want the URL to stay in the main index.
For example, if I want to remove my TOS page from the index or prevent the page from eating up PageRank, I wouldn't use nofollow. It's easier to just use either robots.txt (to block PageRank) or META noindex (to remove the URL from the index).
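For what it's worth, here is a minimal sketch of the robots.txt route, written in Python with the standard library's `urllib.robotparser`. The `/tos.html` and `/booking/` paths and the `example.com` host are made up for illustration, and the check only shows how a compliant crawler would read the rules; it says nothing about how any particular engine treats PageRank.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep every compliant crawler out of a TOS page
# and a dynamic booking engine. Both paths are invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tos.html
Disallow: /booking/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://example.com/articles/rome.html",
        "http://example.com/tos.html",
        "http://example.com/booking/search?sid=123",
    ):
        print(url, is_allowed("Googlebot", url))
```

Because the rules sit under `User-agent: *`, they apply to every crawler that respects the protocol, not just Googlebot.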
|We have a site with plenty of static content, the links in the content occasionally link to an internal dynamic engine which I don't want parsed. The question is, which way would be better, to disallow Google through the links or the pages themselves? |
Sorry, but I have to ask this first question: If this has to do with basic site architecture and restricting content, why are you just concerned about Google? Google-specific instructions will leave that internal dynamic engine open to all other spiders.
I would not recommend the use of nofollow to restrict bots from any pages. It's an informal, non-standard directive initially meant to help with comment spam on blogs and other user-generated content sites. Its use for other purposes has been advocated only by one wild-eyed Google engineer ;-), and there's no guarantee those extended uses will continue to be recognized.
I'd say the best bet is to future-proof yourself and stick with the meta robots noindex directive.
|Google-specific instructions will leave that internal dynamic engine open to all other spiders. |
That's an excellent point and one that gets overlooked here daily.
|I would not recommend the use of nofollow to restrict bots from any pages. It's an informal, non-standard directive initially meant to help with comment spam on blogs and other user-generated content sites. |
I'll second that motion.
|And there's no guarantee those extended uses will continue to be recognized. |
I don't think so either. Personally I feel it was a temporary fix for a much bigger problem that Google is faced with: an algorithm that relies on links as one of its main factors.
|I'd say the best bet is to future-proof yourself and stick with the meta robots noindex directive. |
I'll second that motion too. But remember, it's usually only the good bots that are going to abide by the protocols. If you've got bad bots scraping and indexing that stuff and regurgitating it elsewhere, I think that may become an issue. That's why many will just block all but known bots to minimize the damage that may be done by scrapers.
To answer some of your questions, the site itself has been around for a number of years now and traffic from Google makes up nearly 50% of the total traffic. MSN is a distant second with 8% and Y! comes in 3rd with 4.5%, so I prefer to focus on what affects the Goog first because I believe most of the potential in an upswing in traffic lies there.
Perhaps it would just be best to still use the nofollow (for PR diverting purposes like Halfdeck is saying) and yet still use the robots.txt exclusion for the booking engine.
For the record, the reason I want to exclude those pages is because they just don't offer any real content, plus the URLs contain session IDs and I don't want dupe content ever coming into the mix. We all know how that virus can spread.
Thanks for the input everyone!
|That's why many will just block all but known bots to minimize the damage that may be done by scrapers. |
How do you block an unknown robot?
|How do you block an unknown robot? |
|Block all but known bots to minimize the damage that may be done by scrapers. |
You allow the known bots and disallow the bad bots. Anything that is not known "typically" gets served something from the 400 range of Status Codes.
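A rough sketch of that idea in Python, assuming the decision is made on the User-Agent header alone. The allowlist, the marker list, and the choice of 403 are all assumptions for illustration, not a recommended configuration.

```python
# Assumed allowlist of crawlers to let through (substrings of their User-Agent).
ALLOWED_BOTS = ("googlebot", "msnbot", "slurp")

# Assumed markers that suggest an automated client of some kind.
BOT_MARKERS = ("bot", "crawl", "spider", "slurp", "curl", "wget", "python")


def status_for(user_agent: str) -> int:
    """Pick an HTTP status code for a request based on its User-Agent."""
    ua = user_agent.lower()
    if any(name in ua for name in ALLOWED_BOTS):
        return 200  # known bot: serve the page
    if not ua or any(marker in ua for marker in BOT_MARKERS):
        return 403  # unknown or unidentified bot: something from the 400 range
    return 200      # looks like an ordinary browser


if __name__ == "__main__":
    print(status_for("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"))
    print(status_for("SomeScraperBot/1.0"))
```

User-Agent strings are trivial to forge, so a check like this only stops bots that identify themselves honestly.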
Do you mean block all users except known robots? The scrapers I see have "normal-looking" user agents. I would think they could come from any IP range too.
"So I'd say those links are not counted in PR calculations."
Matt said the opposite. He likened nofollow links to pages with noindex meta tags. Linking to noindex pages discards the pagerank. Of course, nothing Google is doing on this topic makes any sense whatsoever, so making any assumptions today might be inaccurate tomorrow.
"Matt said the opposite. He likened nofollow links to pages with noindex meta tags. Linking to noindex pages discards the pagerank."
Right. According to Matt Cutts, when Googlebot fishes out links from a page, it ignores any links tagged with nofollow, like those links aren't even there.
No, again, the point is the opposite if the noindex comment is true. That is, the linked-to pages are assigned no pagerank, but the page with the links on it does send pagerank into oblivion. In other words, if there are ten links on a page, and two are nofollowed, the other eight links each send 1/10th pagerank to their linked-to pages, while the two nofollowed links discard their pagerank, just as if the linked-to page was noindexed.
No steve. If there are 10 links on a page and 2 are nofollowed, then to Googlebot there are 8 links on that page. Google ignores links tagged with nofollow. Those links don't even get added to the link graph.
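To put numbers on the two readings being argued here, a small Python sketch. It ignores the damping factor and everything else in the real calculation, and the function name and the "ignored"/"discarded" labels are invented for this example. Neither reading is confirmed as what Google actually does.

```python
from fractions import Fraction


def followed_link_share(total_links: int, nofollowed: int, reading: str):
    """Return (share per followed link, share lost) under one reading.

    "ignored":   nofollowed links are removed before links are counted.
    "discarded": nofollowed links are counted, but their share goes nowhere.
    """
    followed = total_links - nofollowed
    if followed <= 0:
        raise ValueError("need at least one followed link")
    if reading == "ignored":
        return Fraction(1, followed), Fraction(0)
    if reading == "discarded":
        return Fraction(1, total_links), Fraction(nofollowed, total_links)
    raise ValueError(f"unknown reading: {reading}")


if __name__ == "__main__":
    for reading in ("ignored", "discarded"):
        share, lost = followed_link_share(10, 2, reading)
        print(reading, "per followed link:", share, "lost:", lost)
```

For the ten-link page, "ignored" gives each of the eight followed links 1/8 with nothing lost, while "discarded" gives each 1/10 and loses 1/5.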
That is not what he said. That is the point. What you just said is 180 degrees opposite to what he said.
He also didn't say they weren't added to the link graph, he said they were "dropped out of" the link graph, which again is a way of saying the PR is destroyed.
Additionally he also said:
"Plenty of other mechanisms would also work (e.g. a link through a page that is robot.txt'ed out)"
In this case, it has always been clear the PR is discarded. Google would initially count twenty links on a page for PR purposes. If five of them are through robots.txt redirects that never deliver the PR, that still means the 15 links on the original page that pass PR all would pass 1/20th of the PR available.
He said the same thing in three ways: PR is wasted/lost. What is true in actual reality is a different issue, of course.
"That is not what he said."
That's almost exactly what he said; the only difference is he said nofollowed links are "dropped" from Google's link graph, while I'd say links are never added to the link graph in the first place.
"He likened nofollow links to pages with noindex meta tags."
No, he likened nofollow links to pages with META NOFOLLOW tags.
"for Google, nofollow'ed links are dropped out of our link graph; we don't even use such links for discovery. By the way, the nofollow meta tag does that same thing, but at a page level."
Implementing rel=nofollow handling is trivial. Just ignore links on a page tagged with nofollow, as if they didn't exist.
Implementing META nofollow handling is similar, except instead of ignoring specific links on a page, you ignore every single link on a page.
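Here is roughly what I mean, as a Python sketch using the standard library's `html.parser`. The class and function names are made up, and this is only one way a crawler might do it, not a description of Googlebot's code.

```python
from html.parser import HTMLParser


class FollowableLinks(HTMLParser):
    """Collect hrefs, skipping links tagged rel=nofollow."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.page_nofollow = False  # set by <meta name="robots" content="nofollow">

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            tokens = [token.strip().lower() for token in content.split(",")]
            if "nofollow" in tokens:
                self.page_nofollow = True
        elif tag == "a" and attrs.get("href"):
            rel_tokens = (attrs.get("rel") or "").lower().split()
            if "nofollow" not in rel_tokens:
                self.links.append(attrs["href"])


def followable_links(html: str) -> list:
    """Return the links a crawler honoring both directives would keep."""
    parser = FollowableLinks()
    parser.feed(html)
    parser.close()
    # META nofollow: ignore every single link on the page.
    return [] if parser.page_nofollow else parser.links


if __name__ == "__main__":
    page = '<a href="/a.html">A</a> <a rel="nofollow" href="/b.html">B</a>'
    print(followable_links(page))
```

The link-level check drops individual hrefs as they are seen; the page-level check throws away the whole list at the end, so it works wherever the meta tag appears in the markup.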
[edited by: Halfdeck at 12:46 pm (utc) on Sep. 7, 2007]
We've got a related report in another thread. At least for one person, nofollowed links in Yahoo Answers are showing up in their Webmaster Tools report:
Nofollow Links in Yahoo Answers Appear in Webmaster Console [webmasterworld.com]
To me, this sounds contrary to what Matt said.
Quite, and more muddying of the waters.
"To me, this sounds contrary to what Matt said."
I agree. So nofollow links do get stored somewhere. But I wouldn't assume those nofollowed links that show up for either link: command or inside Webmaster Tools are part of Google's link graph used for PageRank calculation or ranking. But I take back my assumption that nofollowed links are ignored outright.
[edited by: Halfdeck at 4:14 pm (utc) on Sep. 7, 2007]
I don't think you or anybody else was making an assumption, Halfdeck. Matt's "nofollow'ed links are dropped out of our link graph; we don't even use such links for discovery," appears to be pretty clear; I don't think we could have interpreted it any other way. (Though it now looks like we have to come up with some new assumptions if what was said in that other thread holds up.)
"META NOFOLLOW tags"
Yes but it's still the same thing.
Everything Matt said suggests PR that would have gone to links with nofollow on them is dropped, lost, discarded, NOT that the PR is recalculated for fewer links.
Obviously they could do it either way, but Google would be doing a massive, unnatural reinventing of the web if it allowed savvy webmasters to increase the pagerank of pages of their choice by nofollowing hrefs to images, duplicates, or unpopular pages, even if a random walk would show them to be as important as another page. That's not to say they haven't done it, but that isn't what Matt says they have done.
"Yes but its still the same thing."
Oh come on steve, META noindex, META nofollow, and META noindex,nofollow are three different directives.
"The noindex meta tag will keep a page from showing up in Google’s index at all...The nofollow meta tag will prevent Googlebot from following any outgoing links from a page."
Matt Cutts purposefully said META nofollow instead of META noindex.
Also, assuming a nofollowed link is dropped from the link graph prior to PageRank iteration, how would it be counted toward the total number of links on a page? I don't buy that.
If a nofollowed link stayed in the link graph but passed no PageRank, then I'd agree with you, but that's not what Matt said. And even if he said exactly what you're suggesting and reality reflected his words, I would probably recommend that Googlers rewrite their code.
"are three different directives"
That's true, but the point is the same... the PR that was sent to that page is LOST, not recycled somewhere else on a domain. You are still ignoring ("I don't buy that") what he said.
If they are "dropped" from the link graph then it should be pretty well assumed the PR is "dropped" from the PR web. Assuming otherwise is at the very least the more unlikely scenario.
Additionally you can't just ignore what he DIDN'T say:
"Nofollow links are never added to the link graph."
"Nofollow links are completely ignored in all pagerank calculations as if they were not there at all."
At the very least his statement is crystal clear as to what he did not say.
"he said they were "dropped out of" the link graph, which again is a way of saying the PR is destroyed."
That's your interpretation.
PageRank calculation is done after the link graph is constructed. In other words, PageRanks aren't calculated during webcrawls. You obviously know this.
1. Google crawls the web. From that Google can maintain a list of URLs and a set of links that connect those URLs, aka the link graph.
2. Nofollowed links are dropped from that link graph.
3. Google calculates PageRank.
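Those three steps can be sketched in Python as below. This is a toy power-iteration PageRank over a four-page graph, following the order of steps as I read them; the page names, the 0.85 damping factor, and the iteration count are assumptions, and none of it is a confirmed description of Google's system.

```python
def drop_nofollowed(crawled_links):
    """Step 2: keep only (source, target) pairs whose link is not nofollowed."""
    return [(src, dst) for src, dst, nofollow in crawled_links if not nofollow]


def pagerank(pages, edges, damping=0.85, iterations=50):
    """Step 3: plain power iteration over the remaining link graph."""
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    outlinks = {page: [dst for src, dst in edges if src == page] for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # A page with no outlinks spreads its rank evenly over all pages.
            targets = outlinks[page] or pages
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


if __name__ == "__main__":
    pages = ["hub", "a", "b", "c"]
    # Step 1: links found by the crawl, as (source, target, is_nofollow).
    crawled = [
        ("hub", "a", False), ("hub", "b", True), ("hub", "c", True),
        ("a", "hub", False), ("b", "hub", False), ("c", "hub", False),
    ]
    ranks = pagerank(pages, drop_nofollowed(crawled))
    for page in pages:
        print(page, round(ranks[page], 4))
```

With the nofollowed edges removed before iteration, "hub" has a single outlink, so all of the rank it passes goes to "a" and none of it is split three ways.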
Basically, you're asserting that a page with 100 links, 99 of which are nofollowed, will pass PageRank that's worth 1/100th of that page's PageRank.
That's not hard to test.
[edited by: Halfdeck at 1:03 am (utc) on Sep. 8, 2007]
It's not hard to test, and has been the status quo. You are making a large number of assumptions counter to what he said, and previous actions.
But of course it would be easy to test by putting up a page that links to 100 pages, then nofollowing 99 and seeing if the remaining one rises in the SERPs significantly from where it started.
Also once again, Matt did not say this:
"2. Nofollowed links are dropped from that link graph.
3. Google calculates PageRank."
when of course he could have.
|More like people can use it for internal links if they're power-user-y enough to want to sculpt PageRank flow within their site at the link level... It's available if you want to get into that much fine-grained control. |
Why would Matt Cutts advise nofollowing internal links as a "power-user-y" PageRank sculpting tactic if he knew that every time you nofollow an internal link, PageRank gets sucked away into a black hole?
On top of that, Matt Cutts is clearly not saying Google counts nofollow links during PageRank iteration either.
You read his words one way, I read them another way - I'm just going to leave it at that.