| This 82 message thread spans 3 pages: < < 82 ( 1 2  ) || |
|Google Algorithm - What are the 200 Variables?|
At PubCon, Matt Cutts mentioned that there were over 200 variables in the Google Algorithm.
I thought I’d start a list...
- Age of Domain
- History of domain
- KWs in domain name
- Sub domain or root domain?
- TLD of Domain
- IP address of domain
- Location of IP address / Server
- HTML structure
- Use of Headers tags
- URL path
- Use of external CSS / JS files
- Keyword density of page
- Keyword in Title Tag
- Keyword in Meta Description (Not Meta Keywords)
- Keyword in header tags (H1, H2 etc)
- Keyword in body text
- Freshness of Content
Per Inbound Link
- Quality of website linking in
- Quality of web page linking in
- Age of website
- Age of web page
- Relevancy of page’s content
- Location of link (Footer, Navigation, Body text)
- Anchor text of link
- Title attribute of link
- Alt tag of images linking
- Country specific TLD domain
- Authority TLD (.edu, .gov)
- Location of server
- Authority Link (CNN, BBC, etc)
Cluster of Links
- Uniqueness of Class C address.
Internal Cross Linking
- No of internal links to page
- Location of link on page
- Anchor text of FIRST text link (Bruce Clay’s point at PubCon)
- Over Optimisation
- Purchasing Links
- Selling Links
- Comment Spamming
- Hidden Text
- Duplicate Content
- Keyword stuffing
- Manual penalties
- Sandbox effect (Probably the same as age of domain)
- No Follow Links
- Performance / Load of a website
- Speed of JS
- XML Sitemap (Aids the crawler but doesn’t help rankings)
- PageRank (General Indicator of page’s performance)
Welcome to WebmasterWorld!
The human editorial portion of the rankings really doesn't have anything to do with PageRank. PageRank is about links and the human editorial input is basically a + or - regarding the general usefulness of a result in a set of results.
I knew this was going on (and I'm actually glad, because IMO a hand review could easily benefit me / my sites), but the discussion we were having RE human input was whether it's actually part of the ranking algo directly or another dimension of the overall results with a less immediate impact than the underlying mechanism. It appears it might have a more immediate impact than I originally thought.
As far as Matt and his button, his actually says 'Spam', while Amit Singhal has the 'Like' or 'Not Spam' button in his office... Personally, I believe they have button wars which are ultimately settled over a game of asteroids, but this is my opinion only and might be a slightly controversial view of how rankings are actually determined by Google. Most of the other people I've talked to think they play interoffice, team Battleship to keep the SERPs more fair and balanced...
PageRank is all about popularity among peers: other pages (specifically, other URLs). In its original form every URL was given one voting point, and the web was given a nominal upper limit.
The one voting point was split among all the links (whether on-domain or off-domain). The voting point was retained, and to it was added all the other part-points from all inbound links. Each page now had 1+X voting points.
The process was repeated, but now some pages had more (possibly much more) voting power than others.
More iterations occurred, until the sum of all the "points" on all the "pages" totalled the pre-defined limit.
Some dampening occurs, with less voting power arriving from a link than left the voting page. Indeed, the dampening and the upper limit are mathematically connected. Internal links are dampened more than external links.
This is not quite an accurate depiction of PageRank, but it is sufficient for my point, which is:
PageRank does not, will not, never has, never will have anything to do with the content of a page
It does not speak of
- Correctness/Truth ("are we getting closer to getting the 200 factors")
- Authority ("is this subject something the site is renowned for")
- Trust ("is this site a penalty magnet")
- Content value of any kind
- Markup structure in any way
- Load Speed
- Server latency
- Page size in KB
- Keyword Density
- Anchor Text
- ANYTHING OTHER THAN A NUMERICAL REPRESENTATION OF HOW LIKELY A RANDOM SURFER IS TO LAND ON A GIVEN PAGE
Many, many other factors now piggyback on the PR mechanism (seed proximity, semantics, good/bad neighbourhood), but they are not PR. And I'm talking actual PR, not the crappy Toolbar kind.
Apologies for the slight OT, but the Page Rank Vs Ranking Factor confusion has been made explicitly at least twice, and I dare say many more times by those who read without posting, or post without commenting on it.
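For anyone who wants to see the random-surfer idea from the post above in runnable form, here is a minimal sketch of the textbook power-iteration version of PageRank. It is not Google's implementation. The function name `pagerank`, the toy link graph, and the damping value of 0.85 are assumptions for illustration, and it uses one damping factor for every link rather than treating internal and external links differently. Note that the only input is the link graph: page content never enters the calculation.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Textbook PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of page -> probability that a random surfer is on it.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with an even split

    for _ in range(max_iter):
        # Rank sitting on pages with no outbound links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}

        # Each page splits its damped rank equally among its outbound links.
        for page, targets in links.items():
            if not targets:
                continue
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share

        # Stop once the values have settled.
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical four-page site, purely for illustration.
graph = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],  # links out, but nothing links to it
}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph "home" ends up with the largest share and "orphan" with the smallest, purely because of who links to whom.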
I'm surprised that no one has picked up on what I said earlier in this thread. As a guess, of the 200 factors there are probably 20 or 30 that you are both able to affect and are worth the effort. The other 170-180 are either naturally occurring things like age of page that you can't do anything about or they have so little effect that they are not worth trying to do anything about.
I bet a brewery that analyses its beer measures 30 or 40 factors. Bacterial count, carbon dioxide content, age, tannins etc etc. What the customer cares about is that it looks nice with a nice head, tastes good and makes you pi$$ed after a few pints. It is similar with Google. My advice to all you desperate website owners is concentrate on 20 things that you can do something about and do them well. There are loads of free tools out there to help you.
Good advice, Sid. I see people obsessing about relatively small factors that someone somewhere mentioned (c-blocks and kw density immediately come to mind) but ignoring some really major factors. Like their navigation and architecture, or having 800 links in the page template.
|there are probably 20 or 30 that you are both able to affect and are worth the effort. |
Absolutely, spot on. I just checked some of my #1 ranked pages and there are 20-30 key elements I focus upon for every page and then let Google take its course :-)
Of course, as has been started in another thread, there are the things one should not do as well!
Shaddows - That was a great summary. In my original post, I created the misconceptions section for PageRank as I didn't want people to think I forgot about it.
@OP nice list! some additional factors (imho):
Intent behind query ? Informational / Transactional / Navigational ?
Personalization turned on ?
Query IP ?
Keyword position in title ?
Keyword in filename
Image quality/resolution ( image search )
Tags, comments, (video search )
Social signals reinforcement ( RT’s, diggs, etc )
user interaction / satisfaction with results ?
|You don't win the prize unless you get all 200 in correct order.... happy! |
I have seen authority sites (think BBC, CNN etc) get away with things that would be considered bad for small sites. I believe that TrustRank is an early decision branch which, if it evaluates true, skips some of the factors that concern small sites.
Many potential historical factors are listed in this Google patent [appft1.uspto.gov]. The question has always been, which ones are active?
One of the complexities with historical factors is that for certain queries the same factor (e.g. backlink growth) can work as either a negative or a positive.
Other factors not yet mentioned:
1. the number of IMPRESSIONS a domain receives in Google SERPs overall. Especially when that number spikes, it's been mentioned in patents as a possible spam signal.
2. the quality of advertisers that a site runs.
They'll probably adjust it, but judging by the linked thread, where a page ranks for some 'odd terms', it seems like you can throw out most factors except 'authority' (a combination of (.*)Rank factors) and having the text on the page...
Google test new search page, featuring sidebar [webmasterworld.com]
It doesn't have the text in inbound links.
It doesn't have the text in the title.
It doesn't have the text in a heading.
It doesn't have the text in a link out.
The overall topic of the surrounding pages doesn't have anything to do with one of the subjects it ranks for.
All it really has is 'authority' without good content regarding the subject, without good relevant links on the subject, without a picture or ten on the page. It really has nothing to do with the topic, other than mentioning the words in small text on the page.
How many variables are people saying are probably actually important? I think you could probably narrow it down to 3 judging by the preceding... PageRank, TrustRank, Text on the Page.
|...without good content regarding the subject, without good relevant links on the subject, without a picture or ten on the page. It really has nothing to do with the topic... |
Sounds more like Halloween than Thanksgiving -- scary!
|I think you could probably narrow it down to 3 judging by the preceding... PageRank, TrustRank, Text on the Page. |
Pretty much all the factors fall into those three areas - but each one has a lot of detail to it.
The work Google did this past year with their so-called "intention engine" is still a frustration for many. Certain query terms are simply classified as a certain type of user intention. If a website type (according to Google's taxonomies) doesn't match up with that intention, then you can pretty much forget ranking that site on that query.
Some queries seem to have a "diverse" intention attached, and then there's more of a hope. I've been considering this intention engine obstacle more lately. It seems to me that when Google gets user intention right, then maybe that traffic wouldn't really do much good for the excluded types of sites even if they would send it.
I see what you're saying tedster, but what I was referring to seems to be the opposite of your comments which are in regard to 'exclusion' based on 'intent' of the search... In the case I was referring to it appears the converse is true: if the site fits the perceived 'intention type' even if it's completely unrelated overall but has 'Trust & Page Rank', then it ranks? Scary, IMO.
|It seems to me that when Google gets user intention right, then maybe that traffic wouldn't really do much good for the excluded types of sites even if they would send it. |
This part is really interesting, because it might depend on the definition of 'do any good'. For instance: If it's an informational site that includes links to (possibly ads for) products and the 'intention' is regarded as shopping the traffic might very well have done them some good, even if it did not match the 'perceived intent' of the search...
The 'intent factor' might provide better results, but that doesn't really help the webmaster who's used to making a living off a site with a different 'intent' or alternate focus when compared with 'search intent'. I don't think it's 'wrong' of Google to try to get the search results right this way, but do think webmasters should prepare for some changes in traffic patterns and might very well need to make some adjustments if they plan to stay in the race.
Also: I guess what I was trying to say WRT only TrustRank PR and Text on the page is... Throw out the title, h1, page name, URL, and a bunch of other ish as 'super important', because obviously they're not. The original number 1 had a page with exactly the same structure as the 'wrong page' ranking and was in the same directory of the same site and only one click away. The non-ranking page had a better title, url, page name, text on the page, and overall content for the search, but did not rank.
Edited: Clarification, additions.
Here's some more detail about the former number one. I'll just post information and let the readers decide what (if anything) could possibly be included / excluded from the list of 200 variables based on this result. (Keep in mind this is simply looking at 'on-page factors' which may or may not have been mentioned as possible variables in this thread and is only from one result.)
About the former number one page:
338 = Lines of CSS. (On Page)
365 = Line number of the <body>
1315 = Lines of Source Code.
7 = Words in the Title. (2 From the Search)
32 = Words in Headings.
164 = Words in plain Body Text.
439 = Words in Links. (Includes URLs)
644 = Total Words on the Page.
(644 Includes Source Code. 196 (or so) Viewable Words)
0 = Keywords, Description, Words from the Search in the Domain Name.
1 = Canonical Tag, <h1>, <iframe>
<h1> = 7 Words. Words from the Search 0.
2 = <h3>s, <h4>s, Google Ad Blocks (Small)
<h3>s = 7 Words Total. Words from the Search 2 of 4.
<h4>s = 4 Words Total. Words from the Search 0.
3 = <strong> (4 unique words; 1 from the 4 word search;)
4 = <p>s
5 = <h2>s (15 Total Words. Words From the Search 0.)
27 = <script>s
91 = <a>s
121 = <div>s
56 = Number of Images.
17.36k = File Size According to Firefox.
2 of 4 = Words from the search in the <title>
2 of 4 = Words from the search anywhere in the URL
2 = Links to the page you really wanted to find RE the search. (2 Out of 91)
About the links you wanted to find:
1 = Picture (1st Link to the correct page. Line 536)
1 = Text: 'Newer Post' (Line 907)
At Least 1 = Number of people who think it does a better job of satisfying the search and is a much better result for the search than the current number 1. (If you happen to own the page (site) this post is about sorry about your ranking. I didn't do it on purpose and think yours is a way better choice than the one it was replaced by. I actually had no idea it happened until someone else pointed it out...)
[edited by: tedster at 5:59 am (utc) on Nov. 28, 2009]
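Counts like the ones in the post above can be gathered with a short script rather than by hand. The following is only a rough sketch using Python's standard `html.parser`. The names `TagCounter` and `page_stats`, the sample HTML, and the sample query are all made up for illustration, and words are split on whitespace with no punctuation handling, so the numbers will not exactly match a manual count.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts opening tags and collects the words inside <title>."""

    def __init__(self):
        super().__init__()
        self.tags = Counter()
        self.title_words = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        self.tags[tag] += 1
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_words.extend(data.lower().split())


def page_stats(html, query):
    """Return tag counts plus how many query words appear in the title."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    query_words = query.lower().split()
    hits = sum(1 for word in query_words if word in parser.title_words)
    return {
        "tags": parser.tags,
        "title_words": len(parser.title_words),
        "query_words_in_title": hits,
        "query_words": len(query_words),
    }


# Made-up page and query, purely for illustration.
sample_html = (
    "<html><head><title>Blue Widget Reviews</title></head>"
    "<body><h1>Widgets</h1><p>One</p>"
    "<p>Two <a href='/x'>link</a></p></body></html>"
)
stats = page_stats(sample_html, "cheap blue widget prices")
print(stats["tags"]["p"], "<p>s,", stats["tags"]["a"], "<a>s")
print(stats["query_words_in_title"], "of", stats["query_words"],
      "words from the search in the <title>")
```

Running the same function over the ranking page and the non-ranking page would give two sets of numbers to put side by side, which is the kind of comparison being discussed here.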
You can't look at these things in isolation. What matters is comparing what is on the #1 with what is on the #2 and so on. Ranking isn't absolute; it is comparative.
For some terms you just need to get one or two things right and you will rank well which is why folks who bother to research long tail terms find it so easy to rank for those terms.
In some areas you have to get the mix just right and do a couple of things better than the competition AND hope they don't analyse what you have done and spot it so they can continue the arms race.
True enough Sid, but the comparison in this case does not necessarily need to be between position #1 and #2 in the SERPs to see what's important...
The site had a similar page, with the main difference being a keyword which was actually in the search as the focus, making the non-ranking page much more appropriate for the search, yet it was not returned in the results. (IOW: The page ranking had 2 of the 4 keywords in the search, the not ranking page had 3 and the 3rd word changed the entire focus of the page, like the difference between sun and rain, cat and mouse, etc. The different word completely changed the information on the page to what the search was actually about.)
So, in this case you can forget about the #1 and #2 result comparison for a minute and compare the Page in the results to the Page Not in the results to see where the differences are (which must be factors other than the above since the pages were very similar, yet unique in the information they presented) and by doing this, IMO, you can eliminate some of the factors I listed above from the 'super important' list.
If all the factors listed above actually carried significant ranking importance the correct page from the site would have ranked rather than one which simply linked to it with poor anchor text and an image, because the correct page 'had all the right words in all the right places', so to speak...
One possible factor I picked up from a patent somewhere - I haven't verified this - is that a page that begins to rank for one query and then a different query that is not just some modification of the first, can sometimes get an additional boost for both queries.
Unless I've missed it in a reply: Keywords within domain name seem to have a noticeable influence.
I always thought it was a shame that whether the site was any good or not didn't play a more important role in the algorithm ;-)
When links weren't such a manipulated factor, PageRank (backlink power) was a pretty good indicator of whether a site was good or not. But now? Not so much.
It would not be a list of items; it would be a complicated algorithm. Depending on many factors, some items may count for some people and others won't. Some factors work better in combination with others. Some factors have lesser value depending on doing or not doing some other factors. Almost all of them can give you a penalty if done in a spammy way.
Like I have said before, you can have a copy of the on-page algo and you're not going to rank very well without good backlinks. You spend a week working on your website and I will spend a week getting links, and I will rank very far ahead of you. Everybody wants to worry about on-page factors because that is easy to do. Link building is hard work.
You really only need to work on the basics. Don't waste a lot of time trying to follow some on-page formula. Build a website and have lots of good content with good titles. Check analytics a lot to come up with new title and content ideas. Then build links until you pass out.