Clearly, this is the main algorithm used by Google,
and anyone doing any kind of search engineering needs
to fully understand it.
Now, are there some tutorials and clear notes out there?
I don't mean the original papers by the Google founders.
I'm looking for 100% clear tools and visual examples that a non-technical person could understand quickly.
Is there anyone out there who knows 100% how PageRank in Google is calculated and can provide some links or notes?
There just seems to be too much speculation and guesswork right now.
Even a "basic" example would do; clearly no one except people working at Google will know the complete PageRank algorithm! I've read the FAQs, and there is no simple "dummies" list of steps we can take. Can we all help each other to build one? A clear flow chart: see my step 1 below as a starting point.
[edited by: The_Subtle_Knife at 11:20 pm (utc) on Feb. 23, 2003]
Hope that helps your quest.
I realized that the original paper was misleading in the way that the calculations were done. It uses its own form of PageRank, which the author calls "mini-rank". Mini-rank changes Google's PageRank equation for no apparent reason, making the results of the calculations very misleading.
Again, the whole point of this thread is to come up
with a set of agreed rules and axioms to explain
Google. According to the Google FAQ, they
use 100 factors. Can we start building rules that we know
to be correct? That's what I'm trying to do, not merely understand PageRank.
We desperately need to agree on a common language to create
a set of axioms which can be proven with real examples (theory is useless unless it is shown how it is applied).
So far I have not seen, here or anywhere, any set of axioms or rules for Google. It's all just wishy-washy, with no examples to prove that that's what it's doing.
If all we need to know is how to create incoming links... well, OK, what type? What are the rules?
I know from previous threads that Google actively bans link farms etc.
There must be a strategy that works and that we can all agree upon.
Otherwise we have learnt nothing.
Can we come up with and agree on a set of variable names,
and case examples using real data? Is anyone prepared
to truly help make a difference?
Feedback appreciated. The theory is useless without real examples!
The purpose of this thread is to create a set
of variables that we all agree upon, and a rough equation for the PageRank algorithm.
The PageRank Algo is already known. There is another factor in the algo that Calum has gone into before.
There are only a few people on here that know how to run a controlled experiment. Very few people on here do their own research. The vast majority doesn't even bother reading official Google papers. I am not aware of anyone who has shared their research like I have.
So far I have not seen, here or anywhere, any set of axioms or rules for Google. It's all just wishy-washy, with no examples to prove that that's what it's doing.
My page has 19 different gradations of PageRank with EXAMPLES. Exactly how many more examples do you want? I show an example for each possibility there is. My paper gives enough information to be falsifiable.
You aren't going to be able to come up with a set of rules using ATW and how far indented down the page something is.
As Chris points out, PageRank is well known. Google's founders' descriptions still apply; the tweaks have been minor. BigDave's link is in my opinion the simplest accurate explanation of the process. Progress from there to "The Anatomy of a Large-Scale Hypertextual Web Search Engine" before the interesting but slightly less easy to follow "The PageRank Citation Ranking: Bringing Order to the Web".
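For anyone who wants to see the founders' description in action, here is a minimal Python sketch of the iterative calculation from "The Anatomy of a Large-Scale Hypertextual Web Search Engine": PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where T1...Tn are the pages linking to A, C(T) is the number of links going out of T, and d = 0.85 is the damping factor the paper suggests. The three-page web is made up, and this is only the published textbook form, not whatever tweaked version Google actually runs.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) for every page A.

    links maps each page to the pages it links to. This sketch assumes every
    page has at least one outbound link (no dangling pages).
    """
    pr = {page: 1.0 for page in links}  # every page starts at 1
    for _ in range(iterations):
        new_pr = {page: 1.0 - d for page in links}
        for source, targets in links.items():
            share = d * pr[source] / len(targets)  # C(T): outbound link count of T
            for target in targets:
                new_pr[target] += share
        pr = new_pr
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(web)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

In this made-up web, C comes out on top because both A and B link to it, and because no page is left without outbound links, the values add up to the number of pages (3 here).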
Of course, not everyone needs to understand the process. The simple answer for people who want to improve their PageRank is this: get links from high PageRank pages that have few links and are able to pass PageRank on.
The last part ("and are able to pass PageRank on") is only to cover the case of some very rare penalties. The middle part ("that have few links") is an important part of how PageRank works, but in most cases has less effect than the first part: "Get links from high PageRank pages".
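The advice follows directly from the published formula, where a single link from page T passes on d * PR(T) / C(T). A small Python sketch, using made-up raw PageRank values (not toolbar numbers) and the paper's d = 0.85; link_value is a hypothetical helper name:

```python
def link_value(source_pr, outbound_links, d=0.85):
    """PageRank passed along one link: d * PR(T) / C(T) from the published formula."""
    return d * source_pr / outbound_links


# Hypothetical raw (not toolbar) PageRank values, for comparison only.
strong_few = link_value(source_pr=40.0, outbound_links=10)    # high PR, few links
strong_many = link_value(source_pr=40.0, outbound_links=200)  # high PR, many links
weak_few = link_value(source_pr=4.0, outbound_links=10)       # low PR, few links
print(strong_few, strong_many, weak_few)
```

Ten times the source PageRank gives ten times the value per link, and twenty times the outbound links gives one twentieth of it, which is why both parts of the rule matter.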
Caveat: PageRank is just one component of Google's ranking methods; it is not helpful to consider it to be the 'main algorithm'. Without link text and on page factors it doesn't help your rankings at all.
There are only a few people on here that know how to run a controlled experiment.
While I do know how to run a controlled experiment, for many factors in the algo (not the pagerank algo) there really is no need to run a controlled experiment. PageRank is the major exception to this.
Some of the obvious ones like title and anchor text are fairly easy to deduce from anecdotal evidence. The same with things like having the actual keyphrase instead of just the different keywords scattered around the page.
I suppose that you could do some page size and keyword density experiments in a controlled fashion.
The one area that I find most intellectually interesting, but I have no intention of testing, is the situations where Google might block PR from getting passed. One example that has been mentioned has been guestbook signing. I would never test it because spamming someone's guestbook is just plain rude.
When I do finish with my experiments with PR, which are not as thorough as yours, I plan to make the results public.
<added>BTW, thank you for making your work available to us.</added>
There are only a few people on here that know how to run a controlled experiment. Very few people on here do their own research. The vast majority doesn't even bother reading official Google papers. I am not aware of anyone who has shared their research like I have.
Surely we can agree on a set of statements?
statement 1. Google does this when this..
example.
Probability: x% votes: x/yy
statement 2. Google does bla bla
example.
Probability: x% votes: x/yy
I really can't see why this is so extremely difficult to agree to set up.
The last time I wrote this Rule, I still got objections,
I mean can we agree on anything in this forum?
But I'll try again.
Rule 1:
Google will not show any results when using the link: function if your PageRank is less than 4.
Can we just write down all the factors, with examples,
and build up a common knowledge base, in the process
building up a language/variables/terminology?
I really can't see why people won't contribute and create this: a definitive list of Google knowledge, simply explained
in a <"snip"> kind of style.
Having to wade through thousands of forums, sites, papers,
with no up-to-date examples and still having no concrete idea how it all really works is frankly really
beginning to piss me off. Are we search engineers or just a bunch of clowns?
Are there any real scientists here, or hackers with reverse-engineering knowledge? I can probably expect more replies
completely missing the point, going off the beaten track, and still not even a start on a list of axioms, rules, or methods
describing the behaviour of Google.
Google wasn't written in an ad hoc, wishy-washy way, so why
should we work in such non-scientific ways?
[edited by: Marcia at 8:56 pm (utc) on Feb. 24, 2003]
[edit reason] copyright/trademark issue [/edit]
1. A document that's intended for public distribution cannot use either a name or a recognizable derivative that's already a registered trademark, with extensive copyright protection, without express commissioning from the publishers who own the names.
2. If there's to be a definitive, basic guide to how Page Rank is calculated developed that will be published in a document of some sort, I'm wondering where such a document will be published. Of course, appropriate citations have to be included, and I do imagine there would be difficulty in obtaining legal permissions from people with identities like Sly_Old_Dog. Publishers like O'Reilly might have a problem with issues like that.
3. Trade Secrets
Are there any real scientists here, or hackers with reverse engineering knowledge.
4. Knowledge is on a need-to-know basis, and people who know how to get PageRank to work for them advantageously don't need to know the mathematical intricacies.
5. To explain the intricacies of PageRank at a basic level, it would first be necessary to explain simply and clearly what logarithms are and how they work, since PR is on a logarithmic scale rather than a linear one.
So that's really where it needs to begin, explaining logarithms, for those who want more in-depth knowledge than just using PR effectively to achieve high rankings, which is basically what most of us are interested in.
The PageRank scale is believed to be logarithmic, but only Google knows the actual base. People here who run tests have speculated that the base is 5, 6, or maybe even as high as 20.
For those of you who don't like math, what you need to know is this: a PR5 page is at least 5-6 times more powerful than a PR4 page.
As ciml said above, the simplest rule is to get links from as high a PR page as possible, with as few links as possible.
In practice, here's what I do. If a page offering a link swap is PR0, PR1, or PR2, I don't bother with it. If a page is PR3, I will only do the link swap if it is really easy for me to do. My attitude on these is that they will help a teeny bit on PR, and I'll get another text link with my keyword in it. And the page may become a PR4-PR6 page in the future. PR4-PR6 pages are what I look for to swap with (sure, I look for 7s and 8s too, but I haven't found any in my industry).
Now here's how knowing that a PR5 page is at least 5-6 times more powerful than a PR4 page helps. If a PR5 page has 100 links going out of it, then each link will pass on about the same amount of PageRank as a PR4 page with 20 links going out of it (5 times 20 equals 100).
I find this a useful guide for evaluating link swaps. I have found PR4 pages with over 100 links on them, and I have found PR6 pages with 75 links on them. A link from the PR6 page is worth about 33 links from the PR4 page (75 divided by 5, divided by 5 again, equals 3; 100 divided by 3 equals 33).
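The arithmetic in the two paragraphs above can be sketched in Python. It assumes, as the posts do, that raw PageRank grows roughly as base ** toolbar_pr with a base of 5; the base is a guess (the thread mentions 5, 6 or even 20), so only the ratios are meaningful, and per_link_value is a hypothetical helper name.

```python
def per_link_value(toolbar_pr, outbound_links, base=5.0):
    """Relative PageRank passed per link, assuming raw PR grows as base ** toolbar_pr.

    The base is an assumption (the thread guesses 5, 6 or even 20), so only
    ratios between two results mean anything.
    """
    return base ** toolbar_pr / outbound_links


# PR5 page with 100 links vs PR4 page with 20 links.
ratio_pr5_pr4 = per_link_value(5, 100) / per_link_value(4, 20)
# PR6 page with 75 links vs PR4 page with 100 links.
ratio_pr6_pr4 = per_link_value(6, 75) / per_link_value(4, 100)
# Same comparison if the base were 6 instead of 5.
ratio_pr6_pr4_base6 = per_link_value(6, 75, base=6.0) / per_link_value(4, 100, base=6.0)
print(ratio_pr5_pr4)        # 1.0
print(ratio_pr6_pr4)        # about 33.3
print(ratio_pr6_pr4_base6)  # about 48
```

The first two ratios reproduce the figures above (equal value, and about 33 PR4 links per PR6 link); the third shows how much the answer shifts if the true base is 6 instead, so treat these as rules of thumb.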
You can't completely get away from math if you want to understand this. Also, read the papers linked above in this thread. That and WebmasterWorld discussions are how I learned PageRank.
Also none of this is fact. Nothing you read here at WebmasterWorld is fact. And nothing you read in the original Google papers is fact. The only facts are known to Google staff, and those facts they keep secret. We can only make assumptions based on observation.
Having to wade through thousands of forums, sites, papers, with no up-to-date examples and still having no concrete idea how it all really works is frankly really
beginning to piss me off.
I'm not really sure why you think this information should all be available to you on a silver platter. If you don't understand how Page Rank is calculated, isn't it up to *you* to do the research on it? I'm feeling pretty "concrete" about how it works, although it's ever-changing, so that's why anyone who wants the knowledge needs to work for it. If you know how it works today, that doesn't mean you'll know how it works tomorrow.
In addition, with the attitude you throw at those who are actually trying to help you, I'm not sure that you deserve to know. Being polite really does go a long way. Try it.
A couple of questions: you mention not even bothering with sites that have a PageRank of 0, 1 or 2, and that 3s contribute only a tiny bit of PR.
Is this specific to your site's current PR, or true overall for all sites?
If you were starting a brand new site, would you have a different opinion?
Obviously you would still target the 4s and higher, but would you still ignore 1s and 2s?
My limited research of my industry indicates I need a PR 5-6 and an optimized page etc. to be highly competitive, and I'm wondering how much work I've got ahead of me.
How much can I derive from creating content pages that link back to the home page? How much do I need to get from other sites? Also, I found one site that offers a sponsor listing; they have a homepage PR7 and content page PR6.
They include the sponsor link on all pages. Is getting 1 site with 100 PR6 pages linking to me the same as going and exchanging links with 100 PR6 sites?
2. If there's to be a definitive, basic guide to how Page Rank is calculated developed that will be published in a document of some sort, I'm wondering where such a document will be published. Of course, appropriate citations have to be included, and I do imagine there would be difficulty in obtaining legal permissions from people with identities like Sly_Old_Dog. Publishers like O'Reilly might have a problem with issues like that.
I beg your pardon Marcia? :)
Are you suggesting I might change my handle to John Smith if I want to be taken seriously?
Marcia:
> rather than publicly disclose their knowledge in full and figuratively speaking, "give away the farm".
I think that's very true. Anyone who reads here long enough will come across many pieces of the puzzle, but those pieces are not likely to end up together in one thread. The real trick is to identify those nuggets.
Having to wade through thousands of forums, sites, papers,
Subtle Knife
IMHO you don't need a set of hard and fast rules to be able to understand Google. In order to protect their credibility against a no-doubt ongoing barrage of SEO tricks, Google must keep their methods secret.
If you understand the principle of PR, then I think you'd be set..
Having tried to explain PR to non-tech people before now, it's occurred to me that PR is not some unique Google-specific phenomenon at all; it's just a mathematical application of a system present in much of our lives..
The example I tend to use is that of celebrity or notoriety (can one of the moderators correct my spelling please;) )
Take a famous person - Bruce Willis for (random) example. Everybody - in this country at least - knows who he is. Therefore, however unimportant we may all be, we are all voting for his fame (a lot of links from low PR pages if you will). He could be seen as having a high PR. If Bruce goes and marries some girl nobody has ever heard of, she will, by association become famous. (This is a link from a high PR page). In the same vein if he stars in a film, that film would get high fame. (PR)
This sort of voting (linking) happens throughout our society. Another example would be a single company (a website). The CEO would be the root page (/index.html). Everyone in the company knows who he is, and he is therefore likely to have the highest PR. His board of directors (the top directories in the site) are likely to be next. The site then departmentalises until you get down to the normal grunt workers (the pages hidden deep in the site), who all have pretty low PR...
If your CEO wants to promote the work of a particular department (the people who make the blue widgets), he might start telling people about it. That department ('/research/development/widgets/blue.html') would end up with a PR much higher than its peers. (Equivalent to a link being placed on the root page that links to the blue widgets page.)
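The CEO analogy can be checked on a toy site with the published formula (d = 0.85). This Python sketch is hypothetical throughout: the page names, the two-department layout, and the assumption that every product page links back to /index.html are made up for illustration.

```python
BLUE = "/research/development/widgets/blue.html"


def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)), starting every page at 1."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_pr = {page: 1.0 - d for page in links}
        for source, targets in links.items():
            share = d * pr[source] / len(targets)
            for target in targets:
                new_pr[target] += share
        pr = new_pr
    return pr


def company_site(ceo_promotes_blue):
    """Hypothetical company site; every product page links back to the home page."""
    links = {
        "/index.html": ["/research/", "/sales/"],
        "/research/": [BLUE, "/research/red.html"],
        "/sales/": ["/sales/north.html", "/sales/south.html"],
        BLUE: ["/index.html"],
        "/research/red.html": ["/index.html"],
        "/sales/north.html": ["/index.html"],
        "/sales/south.html": ["/index.html"],
    }
    if ceo_promotes_blue:
        links["/index.html"].append(BLUE)  # the "CEO mentions blue widgets" link
    return links


before = pagerank(company_site(ceo_promotes_blue=False))
after = pagerank(company_site(ceo_promotes_blue=True))
print("blue before:", round(before[BLUE], 3), "after:", round(after[BLUE], 3))
print("red after:", round(after["/research/red.html"], 3))
```

Without the extra link, the blue and red widget pages tie; with a link from the root page, the blue page pulls well ahead of its peer, while the red page, which now gets a smaller slice via /research/, drops slightly.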
If it's a massive company, its CEO (/index.html) would have a high PR even if very few external people discuss it (link to it). In the same way, Google(.com) is not a particularly large company, but because lots of people (even little people) discuss it, it becomes famous / high PR'ed.
The thing is, PR is ONLY important within your industry (keyword area). Google has millions of times more PR than our site, but if I search for our company by name we're first; not google.com .. In the same way as if you asked someone to name a famous sportsman they wouldn't answer "Bruce Willis"!..
Is this making any sense to anyone? IMHO the principles of Google/PR are such that you don't necessarily need to understand the maths (though it helps), just the idea of balancing the fame of pages.
Cheerio :)
I'm not really sure why you think this information should all be available to you on a silver platter.
Why shouldn't it? Why can't we create a list of rules and axioms that is constantly peer-reviewed and updated?
Why is no one being helpful in creating this?
The new Google patent on re-ranking clearly shows that we should be doing this as a think tank. So far we are no further forward.
There should be something called "How Google Works for Idiots/Non-Geeks".
Not for novices or beginners, but for non-search-engine-centric people: something that clearly describes the behaviour of Google, not just the PageRank part.
new google re-ranking patent:
[patft.uspto.gov...]
> you mention not even bothering with sites that have a page rank of 0,1,2 and that 3's contribute only a tiny bit of PR. Is this specific to your sites current PR or overall for all sites?
IMO, this is a good general rule for anyone who has any competition for keywords. If your keywords are competitive at all, then you'll want to have at least a PR4 home page to begin to be competitive (more often you need a PR5 or PR6, as you stated). If you were going after low-competition keywords, you could get well ranked for those with a PR3 (maybe even a PR2). I do have some deep content article pages in my site that are PR3, which get top ten rankings for low-traffic, low-competition, highly targeted keywords.
> If you were starting a brand new site would you have a different opinion?
No. I recently helped a client plan and build a new site. I got them 2 text links from two separate PR4 free directory pages (each page had less than 30 links). This site's home page came into Google as a PR4. It ranked in the top 10 for a competitive high traffic keyword on its second Google update.
> Obviously you would still target the 4's and higher but would you still ignore 1's and 2's?
Yes. It takes as much time to request a link exchange from a PR1 or PR2 site as it does a PR4 or PR5 site.
> My limited research of my industry indicates I need a PR 5-6 and an optimized page etc to be highly competitive, and I'm wondering how much work I've got ahead of me.
Probably not as much as you think. A few PR4 and PR5 link swaps should get you to a PR4 or better score. Check into the PR value of getting a Yahoo listing for $299/year. Some Yahoo categories can pass on good PR. Definitely get listed in ODP. It can be hard (it's taken me two years), but it's worth it. Look for free directories to get listed in as well.
> How much I can derive from creating content pages that link back to the home page? How much I need to get from other sites?
Expect to acquire most of your PageRank externally. Building your own pages certainly helps, but how much I have never tried to quantify (I couldn't give a rule of thumb where for every 100 internal pages constructed, one could expect to see x% increase in PageRank).
> is getting 1 site with 100 PR6 pages linking to me the same as going and exchanging links with 100 PR6 sites?
Yes. However, there are a couple of things to check. Look at the number of links on each page. If each page has a thousand links on it, then it's no good. If the number of links per page is reasonable, I would consider the sponsor listing if the price was right.
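One way to see why the answer is "yes" under the published formula: the PageRank a page receives is just the sum of d * PR(T) / C(T) over the pages T linking to it, and nothing in that sum refers to which site T lives on. A Python sketch with made-up numbers (raw PageRank 50 and 20 outbound links per linking page, hypothetical domain names); whether Google treats sitewide sponsor links differently in practice is something this cannot tell you.

```python
def received_pagerank(linking_pages, d=0.85):
    """Sum d * PR(T) / C(T) over linking pages; the domain is carried but never used."""
    return sum(d * pr / outbound for _site, pr, outbound in linking_pages)


# Each entry: (site, raw PageRank of the linking page, outbound links on that page).
one_sponsor_site = [("sponsor.example", 50.0, 20) for _ in range(100)]
hundred_sites = [(f"site{i}.example", 50.0, 20) for i in range(100)]
crowded_pages = [("sponsor.example", 50.0, 1000) for _ in range(100)]

from_one_site = received_pagerank(one_sponsor_site)
from_hundred_sites = received_pagerank(hundred_sites)
from_crowded_pages = received_pagerank(crowded_pages)
print(from_one_site, from_hundred_sites, from_crowded_pages)
```

The crowded case shows the caveat above: at 1000 links per page, the same 100 pages pass on one fiftieth as much.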
The Subtle Knife:
> why is no-one being helpful to create this?
They are, every day. You just need to read the posts. :-)
Not so subtle... you are absolutely UNBELIEVABLE.
You think you just have a RIGHT to these people's years of experience?
Again, you are UNBELIEVABLE!
I've read the responses that you have received and the threads you've been pointed to, and WITH NO WORK you have gotten so much information, and yet you still want more.
How about this, Einstein: maybe the people who know, or at least have a pretty darn good idea about, the mechanics of getting a good ranking page (AT THE #1 SE) have nothing to gain except competition by participating in your little experiment.
UNBELIEVABLE!
I would like to express my thanks to Brett for WW and to all of the people who contribute ... Thanks from someone who appreciates it!
You're not likely to get many people taking time to answer your question, because to many of us, the question is rather pointless.
And don't think we're just lazy or underedjumacated... There are many scientificitos here: comp sci grads, psych degrees, mathematicians. One thing we all share is a desire to make money. Calculating PR won't make us any money...
(The part of your brain that jumped at that and said HEY, WAIT? Ignore it. Understanding and applying PR knowledge can make you money, yes, but calculating it, no. Unless you plan on trying to sell us an app or service... pls don't.)
I know a lot of engineers who've gotten seriously jonesed in the past trying to tackle google as a black box... all I can do is laugh... i mean really, c'mon, of all the things in the world to try and look at as a black box... they spend hours upon hours, for nothing more than what you've already gleaned from these forums.
So, many engineering brains disagree with me, but in the end, you getting angry that someone isn't laying out research for the world to see, tackling it as though it were the genome project... simply indicates that you don't make money for yourself from web results (if you did, you'd be busy trying to understand and apply PageRank in the real world [webmasterworld.com], right?), unless of course you're trying to circumvent the work we all do, find a secret formula, and cloak the world... sigh... the worst laid plans.
The members here who understand as much as they will ever need to know about PR are busy appropriating their time. Word of advice, do the same.
but this is simply an opinion piece.. you can spend your time however you wish, we don't mind the lack of active competition ;)
I stopped going because I was annoyed that they changed their name for that, and it was now too crowded and the quality dropped. But they didn't miss my business. They were packed every night for years after that.
Fiver said:
I know a lot of engineers who've gotten seriously jonesed in the past trying to tackle google as a black box... all I can do is laugh... i mean really, c'mon, of all the things in the world to try and look at as a black box... they spend hours upon hours, for nothing more than what you've already gleaned from these forums.
As an engineer, I had to figure out how PR worked out of curiosity. I am also interested in how Google applies all the various factors.
As a webmaster, I believe in getting good content which allows me to get links. Links and content make Google happy. I prefer to use SEO to make things better for my users, not Google. It just so happens that making things better for Google and the user at the same time is a great way to do things. The calculations don't help much here. Get content, get links, make easy navigation.
But you are definitely right: most engineers don't know when to step back and stop calculating.
> As a webmaster, I believe in getting good content which allows me to get links.
BigDave,
What a great way of putting it!
Of course a bit more understanding of PR does allow you to distribute it across your site the way you want it to be distributed. But by and large, beg for links and provide the content to back up your begging :)
You were saying a page is not likely to be pagerank 6 without incoming links, but these are your own words from another thread:
First off, going back to the theory that any page contributes PR, and that the domain from which it originates (yours or someone else's) means nothing: if I create a 1000-page website, then it's probably going to have a structure such as this:
Home page -> 10 Sub-Pages -> Each to 10 Sub-Pages -> Each to 10 Sub-Pages.. or something of that sort.
As we know, the toolbar PR is different from the actual PR (the "actual PR" being a number that Google doesn't tell us), but the toolbar is an indicator of actual PR on a logarithmic scale with a base of 6-8, which may look something like this:
Toolbar PR -> Actual PR
0 -> 0
1 -> 1
2 -> 8
3 -> 64
4 -> 512
5 -> 4096
6 -> 32768
7 -> 262144
8 -> 2097152
9 -> 16777216
10 -> 134217728

These conversions are speculation on my part, but they are of that form.
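The speculative table above is simply powers of 8 shifted by one step. A Python sketch that regenerates it and goes the other way; base 8 and the 0 -> 0 convention come from the table, while the function names are hypothetical and the base is left as a parameter so other guesses (6 or 7, say) can be tried.

```python
def toolbar_to_actual(toolbar_pr, base=8):
    """Speculative mapping from the table: 0 -> 0, otherwise base ** (toolbar_pr - 1)."""
    return 0 if toolbar_pr == 0 else base ** (toolbar_pr - 1)


def actual_to_toolbar(actual_pr, base=8):
    """Inverse of the mapping: the highest toolbar step (max 10) whose threshold is reached."""
    toolbar = 0
    while toolbar < 10 and actual_pr >= toolbar_to_actual(toolbar + 1, base):
        toolbar += 1
    return toolbar


for step in range(11):
    print(step, "->", toolbar_to_actual(step))
```

Stepping up with integer comparisons, rather than taking a floating-point logarithm, avoids rounding surprises exactly at the boundaries (64 lands on toolbar 3, not 2).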
So my brand new site full of 1000 brand new pages (each page having a theoretical actual PR of 1 prior to indexing) means that my site has a total PR of 1000 which it will pass around based on the internal and external linking.
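The 1000-page argument can be sketched with the published formula, each page starting at 1 as in the post. Assumptions: the exact 10 x 10 x 10 structure gives 1111 pages rather than 1000, every page links to its children, and, so that no page is left without outbound links, every bottom-level page links back to the home page. All names are hypothetical.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)), starting every page at 1."""
    pr = {page: 1.0 for page in links}
    for _ in range(iterations):
        new_pr = {page: 1.0 - d for page in links}
        for source, targets in links.items():
            share = d * pr[source] / len(targets)
            for target in targets:
                new_pr[target] += share
        pr = new_pr
    return pr


def build_tree_site(branching=10, depth=3):
    """Hypothetical site: home -> 10 -> 100 -> 1000 pages; leaf pages link back home."""
    links = {"home": []}
    level = ["home"]
    for _ in range(depth):
        next_level = []
        for parent in level:
            for i in range(branching):
                child = f"{parent}/{i}"
                links[parent].append(child)
                links[child] = []
                next_level.append(child)
        level = next_level
    for leaf in level:
        links[leaf].append("home")
    return links


site = build_tree_site()
ranks = pagerank(site)
print(len(site), round(sum(ranks.values()), 6))  # page count and total PageRank
print(round(ranks["home"], 1), round(ranks["home/0"], 1), round(ranks["home/0/0/0"], 2))
```

The total stays equal to the number of pages; the internal linking only decides where it concentrates, and in this layout the home page ends up with far more than any other single page.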
I've seen a site come from nowhere to pagerank 6 in one update, and whilst you are right, there may have been lots of PR3s linking to it, it's highly unlikely that the site could have arranged that in the space of 1 month.