Matt Cutts clarified that in one of those many videos. WMT includes no-followed links for the webmaster's information only - but the link provides no link juice.
google will use nofollow links for discovery.
i published a bunch of content:
i then linked to page1 from my starting page with rel=nofollow. page1 linked to page2 and so on. I also tested different variations of having pages in between with follow and then nofollow again.
at most I got google to crawl two successive nofollow links and find the pages.
they were crawled but not indexed (probably because they had zero PR)
oddly, in a test where the link to page1 was nofollowed and the link to page2 (from page1) was followed, page2 enjoyed a brief stay in the index.
i couldn't really see any practical applications so I never tested any further... and therefore have not repeated the tests with more variables.
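For anyone who wants to repeat a test like this, a chain of pages could be generated along the following lines. This is only a sketch under stated assumptions: the file names, the `build_chain` and `write_chain` helpers and the exact follow/nofollow pattern are hypothetical and are not the setup used in the test described above.

```python
from pathlib import Path


def build_chain(rels):
    """Return {filename: html} for a chain start -> page1 -> page2 -> ...

    rels[i] is the rel value ("nofollow" or "") of the link that points
    at page i+1. All file names here are hypothetical.
    """
    names = ["start.html"] + [f"page{i}.html" for i in range(1, len(rels) + 1)]
    pages = {}
    for i, name in enumerate(names):
        if i < len(rels):
            rel_attr = f' rel="{rels[i]}"' if rels[i] else ""
            link = f'<a href="{names[i + 1]}"{rel_attr}>next</a>'
        else:
            link = ""  # the last page in the chain links nowhere
        pages[name] = f"<html><body><p>Test page {i}</p>{link}</body></html>"
    return pages


def write_chain(pages, out_dir):
    """Write the generated pages to out_dir, one file per page."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, html in pages.items():
        (out / name).write_text(html, encoding="utf-8")


# Variation from the post: nofollowed link to page1, followed link to page2.
chain = build_chain(["nofollow", ""])
```

Each variation (two nofollows in a row, follow then nofollow, and so on) is just a different `rels` list, which makes it easier to keep the variations apart when reading the logs later.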
So: no link juice, and used only to discover pages. OK.
But we did not talk about anchor texts.
Are they counted in a backlink profile (even if there is no link juice)?
Google does not use nofollow links for discovery. They do use other methods for discovery.
Three years ago Loren Baker summarized a video by Matt Cutts about this on Search Engine Journal [searchenginejournal.com]:
|Nofollow as a link attribute causes Google to drop those links out of our link graph. If you have a nofollow link from page A to page B, we won't crawl via page A's link to discover page B. Note that we may still find page B via other links around the web, though. |
Some people have been confused because the links show up in WMT and in Google, but that's because Google only shows a sample: not every link shown is used for ranking, and not every link used for ranking a site is shown.
|Google does not use nofollow links for discovery. They do use other methods for discovery. |
hrm. maybe that was badly phrased. Google doesn't "use" nofollow links for discovery - but google will follow a nofollow link - for whatever reason.
there was absolutely no other way for google to find the pages, and yet the google bot hit them. i did not search for a page in the index until i saw it had been crawled. i never visited the page from my browser (even though I do not have the G' toolbar installed). there was no GA tracking code on the site.
I have played around with the idea of the content being scraped and republished, but searches for that (unique) keyword came up empty, except for the original linking page.
Does anyone have any thoughts on what I might have missed, i.e. any other ways google could have found the pages?
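One way to narrow this down is to separate crawler requests from every other request to the test URLs in the server's access log. The sketch below assumes the common "combined" log format; the sample lines, paths and IP addresses are made up for illustration, and matching on the user-agent string is itself an assumption, since that string can be spoofed.

```python
import re

# Combined log format:
# ip - - [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def hits_by_agent(lines, test_paths, agent_substring="Googlebot"):
    """Split requests to the test paths into crawler hits and all other hits."""
    crawler, other = [], []
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m.group("path") not in test_paths:
            continue
        record = (m.group("time"), m.group("path"), m.group("referer"))
        if agent_substring in m.group("agent"):
            crawler.append(record)
        else:
            other.append(record)
    return crawler, other


# Hypothetical sample lines, not real log data.
sample = [
    '66.249.66.1 - - [11/Mar/2010:10:00:00 +0000] "GET /page1.html HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [11/Mar/2010:10:05:00 +0000] "GET /page2.html HTTP/1.1" '
    '200 498 "http://example.com/page1.html" "Mozilla/4.0 (compatible; MSIE 7.0)"',
    '203.0.113.7 - - [11/Mar/2010:10:06:00 +0000] "GET /other.html HTTP/1.1" '
    '200 100 "-" "Mozilla/4.0"',
]

crawler_hits, other_hits = hits_by_agent(sample, {"/page1.html", "/page2.html"})
```

If `other_hits` is empty for a test URL up to the time of the first crawler hit, a human visit (toolbar or clickstream) becomes a less likely explanation for that URL, though it still cannot be ruled out from the log alone.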
Here are two ideas: someone visited the page and their browser has the Google Toolbar installed; or Google is buying clickstream data from an ISP.
Was the URL ever communicated via ANY Google service at all? They certainly have a boatload of them these days.
"Google buying clickstream data from an ISP"
I respect you a lot because I find your posts very useful and take your comments very seriously. Can you please explain this? Does google buy clickstream data from ISPs? And how do they use this data? To what extent does it affect search results?
I suppose they could have come up with something more confusing than the term "nofollow", but they would have had to work at it. But then again, maybe confusion is the point...
I can't say with 100% certainty that Google does buy this data - but it is definitely available on the marketplace. There's lots of information to be mined when you've got such traffic information - and most certainly URL discovery, the topic discussed here in this thread.
When people get their shorts in a knot about data privacy and target Google specifically, they are ignoring a much bigger loony mess around data privacy that involves all kinds of players in the entire Internet data stream, including ISPs and governments.
[edited by: tedster at 6:49 pm (utc) on Mar 11, 2010]
The only way that I can see for google to have found the URLs is through the original page. The only way to get to the second page was from the first...
Both the toolbar and ISP scenarios would have required a user to click through all the pages that the crawler hit.
I have been able to confirm that google uses TB data to find URLs. I had an issue where google was indexing pages not even supposed to exist. The reason was users generating search result pages for products on the site that didn't exist. At the time (two or three years ago) we hadn't thought to block these pages or tag them as noindex since there weren't any links pointing to them. After those pages started getting SE traffic from the longtail we realised what was happening and added the noindex tag so that the pages we actually wanted in the index wouldn't compete with the 'dead' pages.
Google buying clickstream data is something I haven't looked into (i.e. tested). Mostly because I haven't been able to figure out a test where that would be the "only" way for G' to acquire the data, but I am very confident that they do purchase it.
I'll think about setting up my 'nofollow' test again, just to see if I can eliminate the toolbar scenario. A second test would also help verify the results of the first test - or disqualify them as a bug or error if I can't reproduce the results.
I've been running a bunch of spiderability / indexing tests and have added this to them as it interested me - thanks for sharing Wendy.
Re: discovery via toolbar etc. During my tests I have visited pages that are linked to by non-spiderable means (but are still potentially indexable) using IE with toolbar installed and with Chrome - and none have been indexed.
I'm not saying it doesn't happen, I'm saying that it hasn't happened to me yet. So the "it couldn't possibly have been indexed via a nofollow path because Google say so" statements don't convince me.
I've been careful not to visit the new URLs I've put up; nor have I told anyone about the experiment who could click on them and spoil the test.
If a search engine knows that an authority source restricts its links with no follows, it may also, by implication, treat such a link as an endorsement that the authority is passing to the recipient site.
Link juice may not be "followed" through, but other ranking factors may be transferred. I can't believe that Google does not consider this contextual data. In fact, with the abuse [in Google's eyes] of paid linking, data like this must surely feed into Google's algorithm as one of the many factors. Any thoughts? Fact or fiction?
I've never seen any evidence of this. To misquote Tom Cruise, "Show me the data!"
Power by association? I've seen SERP results improve with terms surrounding links in a contextual form. Why would Google not consider this in a no follow form, or association?
I can only imagine this.
This is quite strange. If Google eventually follows a nofollow link, then the only reason that stands out is the link juice.
As for having a small percentage of no-follow links to look natural, and wheel saying G hires statisticians..
I've wondered before if maybe google divides websites into two categories: clearly SEO'd & not (clearly) SEO'd..and has the non-SEO'd websites' pages rank better in comparison to actively SEO'd sites.
Pure speculation on my behalf, but I'd be surprised if they haven't ever considered that (if they're a bunch of statisticians ;)).
Any way to tell?
Oh and...Does using a few no-follows really make a link profile look more natural, or would coming off as an amateur webmaster (who doesn't know what #*$! nofollow is :-)) maybe be better? staying under the radar, and all
|or would coming off as an amateur webmaster...maybe be better |
Wouldn't that be a sad state of affairs if there is anything to your suspicions -- "Here's what we recommend that you do (and if you do it, we'll punish you)". Like saying "You'd better not drive more than 55 on the expressway (and if you do that, we'll pull you over for slowing down traffic)". Let's hope that some of the seasoned veterans here will dispel the scary thought that looking like you don't know what you're doing is a GOOD thing. But then again, I've always thought that the so-called "over optimization penalty" is one of the dumbest ideas to ever come down the pike. Like telling a talented child that his essays are too well written, so his grades will have to be lowered.
I did some further research around the subject and thought I'd put these thoughts, myths and claims out for comment after what I read.
Relevance - although no follows pass zilch PR or link text benefit to the recipient site, there is a lot more than link juice involved. Google scores the relevance surrounding the link and associates this with the recipient site. The anchor text is still reinforcing the association of relevant and potentially authoritative sites. Myth? Anyone tested this to prove / disprove?
No follows can create trust. If Google sees a link on Wikipedia, it pays attention to the association. Y/N?
Natural linking profiles involve a lot of links that don't work. Mistyped URLs, dead links, "click here's" and no follows. Google counts the number of no follows into a link profile to see what balance exists in the overall scheme of things. Is the theory strong / weak / plausible?
Having all do follows is therefore not natural. Y/N?
As I said above, no one anywhere I've read has shown any data on any of these ideas. The closest we have here is gn_wendy's anecdotal evidence -- which is not a true test, but rather a data point. I've seen similar happenings, but when I look I do see other explanations.
Matt Cutts has said "no way no how" about nofollowed links - and said it very clearly, and many times. Yes, sometimes his carefully chosen public words may hide this or that area (some dare call it FUD) -- but what would Google have to gain by hiding this kind of information about nofollow?
I regularly analyze backlink profiles in the 5-digit and 6-digit size. Nofollow is not always present to any significant degree, so I don't see anything unnatural about only dofollow linking.
I see nothing of substance in all that conjecture (and I don't just mean you, Whitey, I know that you're only reporting on what you read from others).
[edited by: tedster at 1:22 am (utc) on Apr 20, 2010]
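For readers who want to check the nofollow share in a backlink profile themselves, a rough sketch follows. It assumes you already have the HTML of the linking pages on hand; the `nofollow_share` helper, the host names and the sample documents are hypothetical, and this is not the method used for the profiles mentioned above.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Count followed and nofollowed links that point at one target host."""

    def __init__(self, target_host):
        super().__init__()
        self.target_host = target_host
        self.followed = 0
        self.nofollowed = 0

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        if urlparse(href).hostname != self.target_host:
            return  # link goes somewhere else, not part of this profile
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed += 1
        else:
            self.followed += 1


def nofollow_share(html_docs, target_host):
    """Return the fraction of links to target_host that carry rel=nofollow."""
    followed = nofollowed = 0
    for doc in html_docs:
        collector = LinkCollector(target_host)  # fresh parser per document
        collector.feed(doc)
        collector.close()
        followed += collector.followed
        nofollowed += collector.nofollowed
    total = followed + nofollowed
    return nofollowed / total if total else 0.0


# Hypothetical linking pages.
docs = [
    '<a href="http://example.com/">a</a> '
    '<a rel="nofollow" href="http://example.com/x">b</a>',
    '<a href="http://example.com/y" rel="external NoFollow">c</a> '
    '<a href="http://other.org/">d</a> '
    '<a href="http://example.com/z">e</a>',
]

share = nofollow_share(docs, "example.com")
```

A number like this only describes the profile; it says nothing about whether a search engine treats a given share as natural or unnatural.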
@Reno: I know what I said was highly theoretical..I have no evidence whatsoever of them actually doing that.
Then again, I believe the SEs would not really care a ton if it's fair or not.
I only have this from hearsay - but I've heard that websites that have adhered to all of google's guidelines have seen the bottom of the SERPs during an algo update more than once before....whereas sites that don't adhere to all of google's guidelines (wheel mentioned this in a paid links kind of thread here) often go unpunished for years.
|If Google sees a link on Wikipedia, it pays attention to the association. |
There are a lot of sites that scrape wikipedia content for their websites - including the links in those articles. I see it everyday when doing backlink analysis of some competitor websites. This is one reason nofollow links on wikipedia do bring some value - but not necessarily because of any link power from wikipedia.
I ran another ranking test through twitter and ended up coming to the same conclusion. There is little or no link power from links on twitter - but a lot of websites republish twitter streams.
|The closest we have here is gn_wendy's anecdotal evidence -- which is not a true test, but rather a data point. |
I agree. The findings are at best inconclusive. I stick to my assumption that G' will use nofollow links for discovery - or retain the URI found in the source code in some index - or that G' at least does something with it.
I find it very hard to believe that G' blatantly ignores the very existence of a link simply because it is tagged nofollow.