100 variables

muesli

8:56 pm on Aug 29, 2002 (gmt 0)

Split from: [webmasterworld.com...]

just learned about "keyword order" in a different thread, and as i liked the way the list in the thread "where is everybody from? [webmasterworld.com]" evolved, i thought i'd update this one ;-)

page rank
keyword in title
keyword in H1
keyword in URL
keyword density (total word count being considered)
keyword in title
keyword in links to page (anchor text)
keyword in bold / strong, etc.
keyword in other parts (full text, alt, title, meta description)
keyword proximity (if search for 2+ keywords)
keyword order (does or not order in page match order in query)
keyword prominence (how early in page/tag)
URL length

more guessing:
==============
absence of competitive keywords other than search term

[edited by: Brett_Tabke at 12:09 am (utc) on Aug. 30, 2002]

muesli

10:35 pm on Aug 29, 2002 (gmt 0)

BHoLAN,

the argument about the relationship between file size and keyword density makes a lot of sense. that is what i meant by the bracketed text "total word count being considered".

the mere fact that one page is bigger than another IMO doesn't affect ranking. the argument would have to be that google wants to educate us to design smaller pages at the cost of search quality. as only very few people consider SEO at all, this would not have much effect on file sizes. i interpret brett's statistics as reflecting the distribution of file sizes across the entire web. so i'll add your theory to the "guesses" ;-)
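
To make the "total word count being considered" point concrete, here is a minimal sketch of keyword density as a plain ratio. The function name, the tokenizer and the sample pages are assumptions for illustration; nothing here is known to be how Google computes it.

```python
import re

def keyword_density(text: str, keyword: str) -> float:
    """Share of the words in `text` that equal `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

# Hypothetical pages: same two keyword hits, very different total word counts.
short_page = "widgets for sale cheap widgets"
long_page = short_page + " " + "filler " * 15

print(keyword_density(short_page, "widgets"))  # 2 hits out of 5 words
print(keyword_density(long_page, "widgets"))   # 2 hits out of 20 words
```

Under this definition a bigger page only scores differently because the same number of hits is spread over more words, which is the link between file size and density discussed here.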

updated list:
=============

page rank
keyword in title
keyword in H1 and H2
keyword in URL
keyword density (total word count being considered)
keyword in title
keyword in links to page (anchor text)
keyword in bold / strong, etc.
keyword in other parts (full text, alt, title, meta description)
keyword proximity (if search for 2+ keywords)
keyword order (does or not order in page match order in query)
keyword prominence (how early in page/tag)
URL length

more guessing:
==============
absence of competitive keywords other than search term (muesli)
file size (brotherhood of LAN)
acceleration of LP / relation between LP and age (beachboy, slud)
themes

ciml

12:46 pm on Aug 30, 2002 (gmt 0)

> keyword order (does or not order in page match order in query)

Definitely. If you search for something fairly competitive then you should see a change. For example "hotel in yourcountry" is different from "hotel yourcountry" is different from "yourcountry hotel".
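
A rough sketch of how keyword order and proximity could be measured for a two-word query. It only looks at the first occurrence of each query word, which is a simplifying assumption, and the helper names and the sample sentence are made up for illustration.

```python
import re

def tokens(text):
    """Lowercased word list."""
    return re.findall(r"[a-z0-9]+", text.lower())

def first_positions(page_words, query_words):
    """Index of the first occurrence of each query word, or None if one is missing."""
    positions = []
    for word in query_words:
        if word not in page_words:
            return None
        positions.append(page_words.index(word))
    return positions

def order_matches(page, query):
    """True if the query words first appear on the page in query order."""
    pos = first_positions(tokens(page), tokens(query))
    return pos is not None and pos == sorted(pos)

def proximity(page, query):
    """Distance in words between the earliest and latest first occurrence."""
    pos = first_positions(tokens(page), tokens(query))
    return None if pos is None else max(pos) - min(pos)

page = "Book a hotel in yourcountry today"  # hypothetical page text
print(order_matches(page, "hotel yourcountry"), proximity(page, "hotel yourcountry"))
print(order_matches(page, "yourcountry hotel"), proximity(page, "yourcountry hotel"))
```

The two queries get the same proximity but a different order match, so a ranking that uses both signals would treat them differently.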

There's a strong element that I can see, to some extent, but have never tracked down. I guess it's OK to post this example, given that it's Google's page anyway. Everything I know about Google tells me that this page [google.com] should be much higher up in a search for "solutions". Density, title, PageRank and inbound link text (from a bunch of the highest PR pages on the Web) all point to it being ranked very well, yet it's not in the top 150.

It might be an element of old-fashioned link popularity (i.e. how many domains link to you). Maybe it helps to get the right link text from more than one domain. I just don't know.

muesli

2:39 pm on Aug 30, 2002 (gmt 0)

hi calum,

the search doesn't bring up google's page in the first 15 pages. the top results have everything to a lesser extent than google's page: keyword density, PR, etc. what they DO have is plenty of anchor text, all containing the word "solutions".

so in a later step, when we might add percentages to the list, anchor text should reflect the important role it seems to play.

muesli

ciml

3:50 pm on Aug 30, 2002 (gmt 0)

> what they DO have is plenty of anchor text, all containing the word "solutions"

And yet Google have a whole bunch of PR9 and even PR10 pages linking to it with that word. So is it a case of more links (or even links from more domains) with matching anchor text helps, irrespective of their PageRank?
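
The "links from more domains" idea can be sketched as a count of distinct linking hosts whose anchor text contains the keyword. The link data and the function name are hypothetical, and counting by hostname is an assumption; a real system might group by registered domain or IP instead.

```python
from urllib.parse import urlparse

def linking_domains(inbound_links, keyword):
    """Hostnames with at least one link whose anchor text contains `keyword`."""
    kw = keyword.lower()
    return {
        urlparse(url).hostname
        for url, anchor in inbound_links
        if kw in anchor.lower().split()
    }

# Hypothetical inbound links as (source URL, anchor text) pairs.
links = [
    ("http://a.example/page1", "search solutions"),
    ("http://a.example/page2", "Solutions"),
    ("http://b.example/", "business solutions"),
    ("http://c.example/", "click here"),
]

domains = linking_domains(links, "solutions")
print(len(links), "links,", len(domains), "distinct domains with the keyword")
```

Two links from the same host count once here, so many links from a single high-PR site would score lower than fewer links spread across several sites.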

muesli

7:24 pm on Aug 30, 2002 (gmt 0)

> So is it a case of more links (or even links from more domains) with matching anchor text helps, irrespective of their PageRank?

i'd say so, yes. PR certainly has its role but is maybe overestimated. many links on decent PR pages from many different domains IMO do the trick. maybe the "time span" of the links plays a role, too. very old links and very young ones containing the KW are a perfect sign of relevance and continuity. for the #1 ranking the time span is definitely immense.

muesli
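
A small sketch of the "time span" guess: the number of days between the oldest and the newest inbound link that carries the keyword in its anchor text. It assumes a crawler has recorded a first-seen date for every link, which is purely an assumption, and the dates below are invented.

```python
from datetime import date

def anchor_age_span_days(inbound_links, keyword):
    """Days between the oldest and newest link whose anchor text contains `keyword`."""
    kw = keyword.lower()
    dates = [
        first_seen
        for first_seen, anchor in inbound_links
        if kw in anchor.lower().split()
    ]
    if not dates:
        return None
    return (max(dates) - min(dates)).days

# Hypothetical links as (first-seen date, anchor text) pairs.
links = [
    (date(1999, 3, 1), "solutions"),
    (date(2001, 6, 15), "search solutions"),
    (date(2002, 8, 1), "home"),
    (date(2002, 8, 20), "Solutions page"),
]

print(anchor_age_span_days(links, "solutions"))
```

A page with both very old and very recent matching links gets a large span, while a page whose matching links all appeared in the same month gets a small one.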

squared

8:21 pm on Aug 30, 2002 (gmt 0)

Although it does come up first for search solutions:

[google.com]

ciml

1:10 pm on Aug 31, 2002 (gmt 0)

Good point squared. Quite often I see this pattern, and generally it's a one word phrase.

That may be just because the one word phrase is more competitive; even with the lack of weight for solutions that the others have, it's so overwhelmingly optimised (including backlinks) for the two word phrase that it gets over the 'barrier' (whatever that is).

startup

2:40 pm on Aug 31, 2002 (gmt 0)

For "solutions search" it ranks #10. Keyword and anchor text order are playing a part. For "search" I quit looking after the 17th page.
So, this page is better optimized for "solutions" than it is for "search".

ciml

2:45 pm on Aug 31, 2002 (gmt 0)

Or "search" is more competitive than "solutions"?

startup

2:58 pm on Aug 31, 2002 (gmt 0)

"competitive" meaning more people missing the mark.
search = 203,000,000
solutions = 25,000,000

Yes, I did not consider the competitiveness of the query at all.

muesli

4:23 pm on Aug 31, 2002 (gmt 0)

time for an update of the list (added/changed=bold):

updated list:
=============
page rank
keyword in title
keyword in H1 and H2
keyword in URL
keyword density (total word count being considered)
keyword in title
keyword in links to page (anchor text)
keyword in bold / strong, etc.
keyword in other parts (full text, alt, title, meta description)
keyword proximity (if search for 2+ keywords)
keyword order (does or not order in page match order in query)
keyword prominence (how early in page/tag)
URL length

more guessing:
==============
absence of competitive keywords other than search term (muesli)
file size (brotherhood of LAN)
acceleration of LP / relation between LP and age (beachboy, slud)
themes
number of different domains where keyword appears in anchor text (ciml)
"age span" of inbound anchor text where keyword appears (muesli)

anything we should add? has anybody made experiences with the guesses, so we can delete them or move them to the first list?

ciml

5:36 pm on Aug 31, 2002 (gmt 0)

OK, here's some more guessing.

"Keyword in links to page" really means "keyword near links to page", so how about "keyword density and HTML title of the pages that link."?

That might be quite easy for Google to implement compared with contextual PageRank?

Now to get futuristic...

* Keyword density and HTML title of the pages that are linked from the pages that link to the page in question. (In other words, do the 'Similar pages' match the phrase?)

* Keyword density and HTML title of the pages that link to the pages that link to the page in question. (A very low-tech attempt at contextual PageRank).

* Title of the ODP category the page is in (and maybe the parent categories too).

* Title of the ODP category of the pages that link to the page (and maybe their parent categories too).

* Full Contextual PageRank (i.e. calculate PageRank across the Web for each phrase. Not likely any time soon.)

* 'Topic Sensitive PageRank' (i.e. calculate PageRank across the Web for a few topics (eg. ODP), then match one of those topical PageRanks to the search phrase instead of the general PR - see Haveliwala's paper of that name.)

* Build the list of phrase hits using PageRank and on-page factors (like Google does) then use those pages to find the most authoritative (similar to Bharat's and Mihaila's Hilltop).

There are so many things for Google to try, but presumably the key is not just how well they work but how practical they are.

brotherhood of LAN

5:43 pm on Aug 31, 2002 (gmt 0)

ciml,

Everything you mention there is off the page, which might be the only way out for google? :)

Would you say that it's scale/technology/finance reasons that may stop these things from happening a lot quicker? Or is it just hard to implement?

ciml

6:05 pm on Aug 31, 2002 (gmt 0)

The main reason I'm talking off-page factors is that muesli covered the on-page factors so well.

As for scale/technology/finance reasons, I think that we can be fairly sure that "Title of the ODP category the page is in" would have minimal implications while "Full Contextual PageRank" could not be done on a monthly cycle with current technology.

I don't know how long it takes Google to iterate PageRank, but I'd guess not too long (since they spend most of the cycle spidering and the indices must take a lot of crunching). But how many potential search phrases are there? It's currently unthinkable to iterate PageRank for each word combination or even each word, IMO.

Which is less expensive to calculate, Topic Sensitive PageRank or Hilltop? I'm sure that Google will have investigated both.
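
For a feel of what "iterating PageRank" involves, here is a minimal power-iteration sketch. The `teleport` argument biases the random jump toward chosen pages, which is the basic idea behind a topic-sensitive variant; the tiny graph, the damping factor of 0.85 and the iteration count are assumptions, and this is not Google's implementation.

```python
def pagerank(links, teleport=None, damping=0.85, iterations=50):
    """Power iteration over `links` (page -> list of pages it links to).

    `teleport` is the distribution used for random jumps: uniform gives
    ordinary PageRank, a topic-biased one gives a topic-sensitive variant.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    if teleport is None:
        teleport = {p: 1.0 / len(pages) for p in pages}
    rank = {p: teleport.get(p, 0.0) for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) * teleport.get(p, 0.0) for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: hand its rank back via the teleport distribution.
                for q in pages:
                    new[q] += damping * rank[p] * teleport.get(q, 0.0)
        rank = new
    return rank

# Hypothetical four-page web.
links = {
    "hub": ["widgets", "news"],
    "widgets": ["hub"],
    "news": ["hub"],
    "widget_fans": ["widgets"],
}

general = pagerank(links)
topical = pagerank(links, teleport={"widget_fans": 1.0})
print(general["widgets"], topical["widgets"])
```

Each topic needs its own full run over the link graph, which is why one run per topic looks feasible while one run per search phrase does not.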

muesli

6:06 pm on Aug 31, 2002 (gmt 0)

BHoLAN, are you sure some of the guesses aren't part of the algo already? would we notice?

squared

6:15 pm on Aug 31, 2002 (gmt 0)

Personally, I think there's too much discussion on topic sensitive pagerank (or whatever you want to call it). I don't think Google should do this. Google is interested in high quality and relevant results. If a relevant page is picked up because it's high quality, although it's on a site that includes many different topics, it should still make it to the top of search results. Topic sensitive page rank might exclude many eclectic sites.

-Squared

brotherhood of LAN

6:40 pm on Aug 31, 2002 (gmt 0)

> BHoLAN, are you sure some of the guesses aren't part of the algo already? would we notice?

I was thinking much of what ciml mentioned would already be in action or in the pipeline.

I think people are expecting google to put more emphasis on a theme approach alongside current PR and on-page rankings.

Even if they were in there and we weren't 100% sure, if we list a variable and ask why it could be of use and how effective that would be, then well, we've as good a chance as google at putting the jigsaw together :)

I've never studied G like others have, though I'd hope that whichever way they process their PR, it will not be normalised into one simple value. There must be stages where values can be used again in the algo to help get the results.

ie. all the things that ciml mentions off-page. DMOZ is noted for its 'neutrality' and the fact that everything is human edited. This will no doubt be taken into account...and could be an additional factor in itself further along the algo if themes were to be implemented in a more mainstream way.

muesli, the way I see it, there are only a finite number of ways they can make the SERPs relevant...after that it's throwing away the ones they won't be using and figuring out how they use the rest :)

ciml

6:51 pm on Aug 31, 2002 (gmt 0)

squared:
> Topic sensitive page rank might exclude many eclectic sites.

If Google decides to approach themes by using domains, then I'll agree with you 100% squared. My assumption has been that they won't, and that instead they'll use the link graph (as PageRank does). So a page about widgets would do well if it has links from authorities about widgets, not just because it's on a domain about widgets.

My concern about ODP related methods is that a) it will favour only the exact URLs in the ODP (something makes me twitchy about that but it doesn't sound too problematic) or that b) the ODP entry would apply to the domain (that would be a major backward step IMO).

brotherhood:
> already be in action or in the pipeline

The pipeline possibility is probably why so many of us think in terms of theme pyramids, and not just PageRank and link text. Up until now, most of the approaches I've tried on the Web have ended up being useful for Google ranking some time afterwards. I want to keep it that way.

mbauser2

7:11 pm on Aug 31, 2002 (gmt 0)

> absence of competitive keywords other than search term

I am really, truly, sincerely doubting that Google cares about what SEOs think are "competitive keywords".

I'm also kinda doubting that there's a good metric for "competitive". Some phrases aren't competitive, they're just popular. What's Google going to do, knock down any widgets page that happens to mention George Bush? That would be weird (and hugely disappointing to any widget company that gets endorsed by George Bush).

Anyway, what is a thread of complete guesses to accomplish, besides creating more urban legends and misinformed newbies?

ciml

7:37 pm on Aug 31, 2002 (gmt 0)

I agree about the 'absence of competitive keywords', mbauser, but I think it's much better to discuss guesses openly than to post them as fact.

muesli

8:39 pm on Aug 31, 2002 (gmt 0)

ok, update:

part of the google algo:
========================

page rank

kw in title

kw in H1 and H2

kw in URL

kw density (total word count being considered)

kw in title

kw in and near links to page (anchor text)

kw in bold / strong, etc.

kw in other parts (full text, alt, title, meta description)

kw proximity (if search for 2+ kw)

kw order (does or not order in page match order in query)

kw prominence (how early in page/tag)

URL length

several ODP-related measures (i suggest not to get into details here yet as we look for a more generic formula)

more guessing:
==============

file size

acceleration of link pop / relation between link pop and age (beachboy, slud)

themes

number of different domains where kw appears in anchor text (ciml)

"age span" of inbound anchor text where kw appears (muesli)

kw density and HTML title of the pages that link (ciml)

kw density and HTML title of the pages that are linked from the pages that link to the page in question. (do the 'Similar pages' match the phrase?) (ciml)

kw density and HTML title of the pages that link to the pages that link to the page in question. (low-tech attempt at contextual PR) (ciml)

'Topic Sensitive PR' (see Haveliwala's paper of that name) (ciml)

Build the list of phrase hits using PR and on-page factors (like Google does) then use those pages to find the most authoritative (similar to Bharat's and Mihaila's Hilltop). (ciml)

i dropped "absence of other competitive keywords" as mbauser convinced me that google wouldn't be able to distinguish between competitive and popular. please let me know if you have anything to add to the first list.

startup

11:55 pm on Aug 31, 2002 (gmt 0)

We get to guess?
- Font size of KW
- <b>, <i>, <strong> for the anchor text; also include font size and <H> tags
- Where the link or links appear on the page, e.g. in a paragraph, top of page

rmjvol

5:49 am on Sep 1, 2002 (gmt 0)

Nice list, muesli.

You have "keyword in title" as your 2nd & 6th points. How are you differentiating these?

rmjvol

profitpuppy

9:59 am on Sep 1, 2002 (gmt 0)

Here are some more guesses/rumors:

1. Slight preference for edu/gov domains
2. Already mentioned, but having the exact keyword phrase in inbound links. I.e. if you are trying to optimize a page for "swimming suits", then having all inbound links with the keyword phrase "swimming suits" will have a better effect than "best swimming suits site on the web" (the importance of the keyword phrase will be diluted by the other words in the link).
3. I'm guessing that Google judges pages based on the outgoing links, especially for pages where there are lots of outgoing links, to make sure that they are of a similar theme, and penalizes sites that have lots of outgoing links to disparate sites ... could be measured by similar high keyword density in the sites you are linking to ... maybe that's a bit far fetched.
4. Possible slight preference for one word domain names that are exactly the same as the keyword. E.g. if you are searching for "xyz" then google will give preference to www.xyz.com over www.xyzstuff.com
5. Links from sites on the same ip address have less weight and may be penalized if there are too many cross links from different domains on the same ip.

In my very humble opinion, the most important factor is having the keyword phrase in inbound links.
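
The dilution guess in point 2 can be written as a simple ratio: the share of the anchor's words that belong to the keyword phrase. This is only one way to read "diluted", and the function name is made up for illustration.

```python
def phrase_share(anchor_text, phrase):
    """Fraction of the anchor's words taken up by `phrase`; 0.0 if the phrase is absent."""
    words = anchor_text.lower().split()
    target = phrase.lower().split()
    if not words or not target:
        return 0.0
    n = len(target)
    # The phrase must appear as consecutive words, in order.
    found = any(words[i:i + n] == target for i in range(len(words) - n + 1))
    return n / len(words) if found else 0.0

print(phrase_share("swimming suits", "swimming suits"))
print(phrase_share("best swimming suits site on the web", "swimming suits"))
print(phrase_share("click here", "swimming suits"))
```

The exact-match anchor gets the full share, the seven-word anchor gets two sevenths, and an anchor without the phrase gets nothing.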

brotherhood of LAN

3:21 pm on Sep 1, 2002 (gmt 0)

I read a little of that hilltop pr thing after ciml mentioning it further up last night.

If they are using the 'flavour' of "expert documents" ie DMOZ (and other places perhaps, like high PR pages) then it seems google will have more than 100 variables to play with...I just wonder which are the most efficient AND effective ones.

Whole lotta "iteratin" if you ask me :)

Does anyone think there is more significance to DMOZ than simply having a "bonus" from a link in their directory? I've always thought "number of clicks away from home page of DMOZ" is something that could be a measure of scale and importance...either way you'd think that google's reliance on DMOZ will be fairly heavy.

/sidenote
muesli, any chance you can separate the compiling list into "on the page" - "PR related" and maybe even "theme related" ? :)

startup

3:44 pm on Sep 1, 2002 (gmt 0)

We are discussing these (Google applies hypertext analysis using more than 100 variables to determine relevance) variables, correct?
"Hilltop" goes a lot further than hypertext. This is not to say they cannot use some of the variables from it. PR made G what it is today and I don't think they are going to stray too far from it.
If you want to see a type of HT in action, teoma is where to look. HT was done for compaq and they may still own it. If they don't, it was in the package that was sold to AV. So we will never see it duplicated exactly. The idea can be used with variations.
"hypertext analysis" should include some analysis of the page linking to the target page, but is not themed at this point.

flowilu

8:11 pm on Sep 1, 2002 (gmt 0)

A sorted list hopefully in accordance with muesli.

*: considered to be part of Google ranking algorithm, others are guesses.

Factors on page
1. density in body text (total word count being considered)*,
2. keyword in URL*, URL length*, keyword as one word domain name, file size
3. keyword in title*, in ALT tag*, in meta description*,
4. keyword font size, in H1/H2 tags*, bold/strong/italic tags*,
5. two or more keywords:
- keyword proximity in body text*
- keywords order (kw 1 ... kw2 or kw2 ... kw1)*
6. keyword prominence (how early in page/tag)*
7. outgoing links of a similar theme [profitpuppy]

Factors on pages that link to page in question
1. keyword in anchor text of links*,
2. keyword font size, headline size, bold/strong/italic tags [startup]
3. Open Directory related measures*
4. where the link appears on the page eg, in a paragraph, top of page, [startup]
5. keyword in body text near to link
6. keyword density and HTML title of the page [ciml]
7. number of different domains where keyword appears in anchor text [ciml]
8. Slight preference for edu/gov domains [profitpuppy]
9. dilution of the keyword phrase by the other words in the link [profitpuppy]
10. less weight for links from sites on the same ip address [profitpuppy]
11. "age span" of pages with keyword in anchor text [muesli]

Page Rank*
- Link popularity:
- acceleration of link popularity [beachboy, slud]
- relation between link pop and age

Contextual Pagerank criteria [ciml]
- keyword density and HTML title of the pages that link to the pages that link to the page in question (low-tech attempt at contextual PR)
- do the 'Similar pages' match the phrase?
- 'Topic Sensitive PR' (see Haveliwala's paper of that name)
- Build the list of phrase hits using PR and on-page factors (like Google does) then use those pages to find the most authoritative (similar to Bharat's and Mihaila's Hilltop)

User behaviour factors
- clicks on pages on SERP and duration there
- toolbar voting button results

startup

9:33 pm on Sep 1, 2002 (gmt 0)

Okay, let's add KWs, with all possible attributes and positions in links (anchor text) on the target page.
Click tracking uses a lot of processor time, even though it is possible. System loads would be staggering. I know smaller SEs have gotten away with using it. The amount of queries going through G makes it unlikely.
Toolbar: very, very possible.

muesli

11:35 am on Sep 2, 2002 (gmt 0)

thank you flowilu for revising the list. i'll now go one step further and add my estimation of importance (bold). i try to judge the factors on the two concrete queries posted here: "solutions" and "search solutions" (without quotes).

factors on page ("search solutions": 35%, "solutions": 25%)
================
1. density in body text (total word count being considered),
2. keyword in URL, URL length,
3. keyword in title, in ALT tag, in meta description,
4. keyword font size, in H1/H2 tags, bold/strong/italic tags,
5. two or more keywords:
- keyword proximity in body text
- keywords order (kw 1 ... kw2 or kw2 ... kw1)
6. keyword prominence (how early in page/tag)

factors on pages that link to page in question ("search solutions": 35%, "solutions": 40%)
==============================================
1. keyword in anchor text of links,
3. Open Directory related measures
5. keyword in body text near to link
6. keyword density and HTML title of the page
7. number of different domains/IPs where keyword appears in anchor text (i think all agree on this one)

PageRank ("search solutions": 20%, "solutions": 25%)
========
- link popularity (PR formula)

guesses ("search solutions": 10%, "solutions": 10%)
========
(see posts above)

those are first guesstimates of mine. what are yours?
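
To show what these percentages would mean in practice, here is a sketch that treats them as weights in a weighted sum. The weights are the guesstimates above; the group names, the 0 to 1 group scores and the two pages are invented for illustration, and nothing suggests Google combines factors in this simple way.

```python
# Per-query weights taken from the guesstimates above.
WEIGHTS = {
    "search solutions": {"on_page": 0.35, "linking_pages": 0.35, "pagerank": 0.20, "guesses": 0.10},
    "solutions": {"on_page": 0.25, "linking_pages": 0.40, "pagerank": 0.25, "guesses": 0.10},
}

def combined_score(group_scores, query):
    """Weighted sum of per-group scores (each assumed to lie between 0 and 1)."""
    weights = WEIGHTS[query]
    return sum(weights[group] * group_scores.get(group, 0.0) for group in weights)

# Hypothetical pages: A is strong on-page and in PR, B is strong in anchor text.
page_a = {"on_page": 0.9, "linking_pages": 0.2, "pagerank": 0.9, "guesses": 0.5}
page_b = {"on_page": 0.4, "linking_pages": 0.9, "pagerank": 0.5, "guesses": 0.5}

for query in WEIGHTS:
    print(query, combined_score(page_a, query), combined_score(page_b, query))
```

With these made-up scores page B wins for "solutions" and page A wins for "search solutions", so the same two pages can swap places when only the weights change.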

This 39 message thread spans 2 pages.
