joined:Oct 17, 2003
If I put those pages in the robots.txt file, this would apparently keep robots away from them. But if plain-text href links to the garnish pages are still left on the page, will some PageRank be transferred to them anyway?
Those URLs may be listed in Google (just the URL; no title, snippet or cache), so you do give them PageRank. Because they're not fetched, they won't give you any back, so you lose a little overall.
(1 - 0.85) + (0.85 * 200 / 4) = 42.65
If there are 9 pages to transfer PR to, the number becomes
(1 - 0.85) + (0.85 * 200 / 9) = 19.04
I would obviously rather give 42 than 19 to the notional pages within the site.
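The arithmetic above can be checked with a few lines of Python. This is only a sketch of the simplified per-link figure used in this post, (1 - d) + d * PR / n, with d = 0.85 and a page PR of 200 taken from the numbers above; the function name `pr_passed_per_link` is made up for illustration.

```python
def pr_passed_per_link(page_pr: float, outlinks: int, d: float = 0.85) -> float:
    """Simplified figure from the post: (1 - d) + d * PR / outlinks."""
    if outlinks <= 0:
        raise ValueError("the page needs at least one outgoing link")
    return (1 - d) + d * page_pr / outlinks


four = pr_passed_per_link(200, 4)   # 4 links on the page
nine = pr_passed_per_link(200, 9)   # 9 links on the page
print(f"4 links: {four:.2f}")
print(f"9 links: {nine:.2f}")
```

With these inputs the 4-link case comes out at 42.65 and the 9-link case at about 19.04 (19.03 if truncated rather than rounded).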
The question is HOW to shoo Gbot away from those pages - will just disallowing them in robots.txt do?
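As a rough way to check what a Disallow rule does and does not cover, Python's standard `urllib.robotparser` can evaluate a robots.txt against a URL. The file contents, paths and host below are placeholders, not taken from this thread, and the check only answers "may this robot fetch the URL"; it says nothing about whether links to the URL still count.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the paths are placeholders for the "garnish" pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /contact.html
Disallow: /privacy.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# May a robot identifying itself as Googlebot fetch these URLs?
can_fetch_contact = parser.can_fetch("Googlebot", "http://www.example.com/contact.html")
can_fetch_products = parser.can_fetch("Googlebot", "http://www.example.com/products.html")
print("contact.html fetchable:", can_fetch_contact)
print("products.html fetchable:", can_fetch_products)
```

The disallowed page reports as not fetchable while the other page stays fetchable.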
What I'll do is set up JS navigation to the garnish pages and only put plain hrefs to them on the site map.
This should take some PR only from the site map, not from the notional pages.
Yes, all links on a page count as links, but when Google hasn't spidered the remote page they are called dangling links. Dangling links are dropped from the PR calculations within the first few iterations and put back again a few iterations before the end. In that way, they have minimal effect on the resulting PRs of other pages. That's according to Brin and Page's original paper.
Google doesn't spider pages that the robots.txt file says it can't fetch. So they are the same as pages that it hasn't even found yet but has links to. That's why I believe those 'Contact' type pages will be treated as dangling links and, if they are, they won't suck up any PR - or only a very minimal amount.
Case I: Dangling links are counted at each iteration.
After enough iterations, we expect PageA to converge to a steady PR. PageB converges on much less; we'll call it 1 less than PageA on the Toolbar. There was a reason for choosing 19 links.
Case II: Dangling links are not counted until near the end.
After enough iterations, we expect PageA to converge to a steady PR. This time, PageB converges on very slightly less; we would probably call it something like 1/30 less than PageA on the Toolbar (not that we get to see it, of course). I think this is where the 'dangling links don't suck PR' idea came from, but there's a problem. When the link is put back, it should take only _one_ iteration for PageB to snap down to a low PR as in Case I. If PageB links to PageC, which links to PageD, etc., then it will take a few iterations for the PR sucking to trickle down.
I haven't really been able to test this as it looks like there are quite a few iterations after the dangling links are put back, if they're taken away at all.
Remember that PageB doesn't link back? Even if it links to PageA, and only to PageA, we can add maybe three or four iterations I think.
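For anyone who wants to play with the two cases without pencil and paper, here is a minimal sketch. It assumes the commonly quoted formula PR(p) = (1 - d) + d * sum(PR(q) / C(q)) with simple Jacobi sweeps, and a made-up four-page site (`home`, `a`, `b`, plus a robots-blocked `contact` page), not the PageA/PageB setup above. The starting value given to the dangling page on put-back is also an assumption. Nothing here claims to be what Google actually runs.

```python
D = 0.85  # damping factor used throughout the thread


def sweep(links, pr, times=1):
    """Apply `times` Jacobi sweeps of PR(p) = (1 - D) + D * sum(PR(q) / C(q))."""
    for _ in range(times):
        new = {p: 1 - D for p in pr}
        for q, targets in links.items():
            for t in targets:
                new[t] += D * pr[q] / len(targets)
        pr = new
    return pr


# Hypothetical site: "contact" is never fetched, so it has no known outlinks
# and the link from "home" to it is a dangling link.
SITE = {"home": ["a", "b", "contact"], "a": ["home"], "b": ["home"], "contact": []}


def case_one(times=200):
    """Case I: the dangling link is counted in every sweep."""
    return sweep(SITE, {p: 1.0 for p in SITE}, times)


def case_two(final_sweeps, times=200):
    """Case II: drop the dangling page, converge, then put it back for a few sweeps."""
    dangling = {p for p, ts in SITE.items() if not ts}  # one pass is enough for this graph
    core = {p: [t for t in ts if t not in dangling]
            for p, ts in SITE.items() if p not in dangling}
    pr = sweep(core, {p: 1.0 for p in core}, times)
    pr.update({p: 1 - D for p in dangling})  # assumed starting value on put-back
    return sweep(SITE, pr, final_sweeps)


print("Case I                  :", round(case_one()["a"], 4))
for k in (0, 1, 2, 5, 20):
    print(f"Case II, {k:2d} final sweeps:", round(case_two(k)["a"], 4))
```

In this toy graph, page `a` sits well above its Case I value while the dangling link is left out, drops on the first sweep after the link is put back, and keeps moving toward the Case I value with every further sweep. How much difference survives therefore depends on how many sweeps are run after put-back.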
<script>document.write('<a href="http://' + n1 + n2 + '">');</script>
Although it does try.
To test your experiment would require a pencil, some paper, and a fair amount of time - or a PR calculator that can be set up to remove and re-insert dangling links on various iterations. So....
I've never made any attempt to calculate the effect of a dangling link because B&P said at the start what happens to them:-
"...Because dangling links do not affect the ranking of any other page directly, we simply remove them from the system until all the PageRanks are calculated. After all the PageRanks are calculated they can be added back in without affecting things significantly."
I've always taken that to mean that dangling links have no effect on the PRs of other pages. But in another place they said that it would take only a few iterations for them all to be removed, meaning that they are removed during the first two or three(?) iterations and each of them is in the calculations for a short time, so there would be a small effect.
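One possible reading of "a few iterations" is that the removal itself has to be repeated, because taking out a dangling page can leave the page that linked to it with no outlinks either. The sketch below illustrates that reading on a made-up graph; the function name `remove_dangling` and the page names are hypothetical.

```python
def remove_dangling(links):
    """Return (pruned_links, passes); each pass drops all pages with no outlinks."""
    links = {p: set(ts) for p, ts in links.items()}
    passes = 0
    while True:
        dangling = {p for p, ts in links.items() if not ts}
        if not dangling:
            return links, passes
        passes += 1
        # Drop the dangling pages and every link that pointed at them.
        links = {p: ts - dangling for p, ts in links.items() if p not in dangling}


# Hypothetical graph: a two-page core plus a chain that ends in a dead end.
GRAPH = {
    "home": {"about", "x1"},
    "about": {"home"},
    "x1": {"x2"},
    "x2": {"x3"},
    "x3": set(),
}

core, passes = remove_dangling(GRAPH)
print("pages left:", sorted(core))
print("removal passes:", passes)
```

Here the chain x1, x2, x3 takes three passes to disappear, one page per pass, and only `home` and `about` remain.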
However, this doesn't mean that Google is using Case I for PR calculation.
By the way, the number of iterations strongly depends on the iteration scheme.
Much as I would love to pretend I play with PR using pencil and paper, I tend to use a spreadsheet (or just stare at a blank wall if I'm feeling brave).
Doc, although I take your point about resolving the equations, in practice the difference in results between case I and II is zero for the immediate neighbourhood (i.e. for URLs not too far in the link map from the dangling links). But if, for example, you have a very deep site with dangling links on your home page, then you should see a large difference between case I and case II. On a very deep site of mine with dangling links near the top, the results seem to match case I.
To be honest, if I do not want Google to index such a page, I'd just add
<meta name="robots" content="noindex,follow">
in the header.
Maybe, there's a small PR loss for the other pages - but then, in the time it takes to sort out the alternative link options, I could acquire a good, relevant link that makes up for this loss - and adds value for my visitors.
The question about the difference in PR between case I and II is quite complicated. It depends strongly on the iteration scheme as well as on the number of iterations performed to compute the PR of the dangling pages (in case II). Consider, for example, a chain of pages (X1, X2, X3, ...), where the first page links to the second, which links to the third, and so on. The last page is a dead end. In case II, all these pages have to be taken out of the calculation. Thus, it takes n iterations with the dangling pages included (in case II) until page Xn gets a non-zero PR if the simple Jacobi iteration is used. (I never had any problems with such chains of pages. Thus I would conclude that Google is either using a different iteration scheme or computes PR according to case I. I would guess they are doing both.)
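The chain example can be sketched in a few lines. To isolate the PR that is passed along the links, this sketch leaves out the constant (1 - d) term, holds a source page X0 fixed at 1.0, and restarts the chain pages from zero; all of that is an assumption made for illustration. It compares a Jacobi sweep (every page reads the values from the previous sweep) with a Gauss-Seidel sweep run in chain order (every page reads values already updated in the same sweep).

```python
def sweeps_until_reached(n, scheme, d=0.85):
    """Count sweeps until the last page of the chain X0 -> X1 -> ... -> Xn holds link-passed PR."""
    pr = [1.0] + [0.0] * n           # X0 fixed at 1.0, chain pages restart from zero
    sweeps = 0
    while pr[n] == 0.0:
        sweeps += 1
        old = pr[:]                  # snapshot of the previous sweep, used by Jacobi
        for k in range(1, n + 1):
            # Jacobi reads the snapshot; Gauss-Seidel reads the values updated so far.
            source = old if scheme == "jacobi" else pr
            pr[k] = d * source[k - 1]
    return sweeps


for scheme in ("jacobi", "gauss-seidel"):
    print(scheme, sweeps_until_reached(10, scheme))
```

For a chain of ten pages the Jacobi version needs ten sweeps before X10 receives anything, while the Gauss-Seidel version reaches it in one, provided the pages are processed in chain order. With a less favourable ordering Gauss-Seidel would need more sweeps.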
Also, the difference between case I and II depends on whether the PR of the non-dangling pages is held fixed during the final PR computation or not. Holding it fixed is much faster, but less accurate.
Of course, for a global view the difference between case I and II might not be important. However, for your own page/site the difference can be significant. Also, pages can be affected even if they are not in the neighbourhood of dangling pages.
The reason that Kamvar et al. still remove dangling pages is that they still treat the PR calculation as the determination of eigenvectors. This requires a non-zero determinant for the transition matrix, i.e. pages which have at least one outgoing link. They claim that this technique accelerates the computation. However, there are well-known algorithms for sparse matrices which are faster.