I wanted to start this thread because of the "PR for sale" thread, wondering how much Google's implementation of themes is going to affect things like the sale of PR, and everything else for that matter ;)
Pagerank
There are many resident experts here! But my summary is: the more links the better, and links from higher-PR pages are better still. High-PR links are trumps, especially from on-topic pages that have few other links "stealing" the PR that could be passed to you.
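To make the "stealing" point concrete: in the formula from the original Google/backrub paper, PR(A) = (1 - d) + d * sum(PR(T)/C(T)), a linking page T splits its PR evenly across its C(T) outbound links. Here's a rough power-iteration sketch of that formula with d = 0.85. The page names and link graph are made up purely for illustration.

```python
# Toy PageRank, following the non-normalized formula from the original paper:
#   PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over every page T that links to A,
# where C(T) is the number of outbound links on T. The graph below is hypothetical.

def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        new_pr = {p: 1 - d for p in pages}
        for source, targets in links.items():
            if not targets:
                continue
            share = pr[source] / len(targets)  # PR is split across ALL outbound links
            for t in targets:
                new_pr[t] += d * share
        pr = new_pr
    return pr

# "hub" gives equal PR to "focused" and "cluttered". "focused" links only to
# site_a; "cluttered" links to site_b plus three other pages that take a share.
links = {
    "hub": ["focused", "cluttered"],
    "focused": ["site_a"],
    "cluttered": ["site_b", "other1", "other2", "other3"],
    "site_a": ["hub"],
    "site_b": ["hub"],
}
pr = pagerank(links)
print(f"site_a: {pr['site_a']:.3f}  site_b: {pr['site_b']:.3f}")
```

Both linking pages end up with the same PR, so the only difference is how many ways each splits its vote. That's why site_a comes out ahead of site_b.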
Themes
So themes rely less on the PageRank model: they place less emphasis on sheer numbers of links and more on the actual words and the context in which they are used. So Google must have some way of computing a mathematical "score" for a theme, for every site.
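Nobody here knows how (or whether) Google scores themes. Purely as an illustration of what a per-site theme "score" could look like, this sketch represents a site's text and a hypothetical theme as word-count vectors and compares them with cosine similarity, a standard IR measure. The theme vocabulary and sample texts are invented.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Lowercase word counts: a crude bag-of-words representation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical theme: a handful of words assumed to characterise the topic.
THEME_WEBMASTERING = term_vector("webmaster seo links pagerank search engine ranking site")

site_text = "Smart webmaster tips: build links, watch your PageRank, tune your site for search."
recipe_text = "Slow-cooked apple pie with cinnamon and a buttery crust."

site_score = cosine(term_vector(site_text), THEME_WEBMASTERING)
recipe_score = cosine(term_vector(recipe_text), THEME_WEBMASTERING)
print(round(site_score, 3), round(recipe_score, 3))
```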
On the Page
Most of the Google threads are concerned with on-the-page stuff and PR. I believe the on-the-page stuff is commonly referred to as IR (information retrieval)? Anyway, it seems search engines have evolved away from relying on the spammable information that can be on the page. Why spammable? Because people are greedy, so (ironically) the page itself is, to an extent, the least reliable source of information for determining a page's relevancy!
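Here's a toy example of why on-the-page signals are so easy to game. Suppose relevance were simply how often the query word appears on the page. That's a deliberately naive term-frequency score, not anything Google is known to use. Under it, repeating the word wins.

```python
import re

def naive_on_page_score(page_text, query_word):
    """Fraction of the words on the page that equal the query word (toy TF score)."""
    words = re.findall(r"[a-z0-9]+", page_text.lower())
    if not words:
        return 0.0
    return words.count(query_word.lower()) / len(words)

honest = "A guide to choosing hiking boots for wet, rocky trails."
stuffed = "boots boots cheap boots best boots buy boots boots boots"

print(naive_on_page_score(honest, "boots"), naive_on_page_score(stuffed, "boots"))
```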
Anyway, how will a search engine like Google measure the theme relevancy of a page when it's busy number-crunching PageRank? I have read the Wisenut PDF that Brett posted in the PR for sale thread, and the Google/backrub paper that has been referenced here in the forums many times before.
So the question I ask is this: how would Google apply the idea of themes to a website?
I'm going to make a bold statement (for someone relatively ignorant about the G algo) and say that Google's implementation of themes is almost non-existent.
It's evident that the on-the-page stuff can be manipulated, as can the PR system.
The third element is the theme... how would Google build this into its algo?
I'm throwing this thread into the wind, hoping that I'm pressing the right buttons and saying the right things ;) Sometimes I wish GG would start these sorts of threads.
Without going back through my papers, I think it works something like this:
<title>brotherhood_of_LAN is a smart webmaster</title>
<h1>smart webmasters r us</h1>
<a href="http://www.example.com/">brotherhood_of_LAN site is smart</a>
WORD IDS:
brotherhood_of_LAN=UNK
smart=29849823
webmaster=20830762
webmasters=38674987
site=378649872
DOC IDS:
This one=038749872987349
YourSite=3975698743
Someone searches for "smart": Google looks up the word ID, finds the doc IDs containing it, and finds that this page is cross-linked to your site.
I am skipping a few steps here (mostly the ones I don't know or understand), but I think this is how it works, more or less...
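The word ID → doc ID lookup described above can be sketched as a lexicon plus an inverted index, loosely after the structures in the Google/backrub paper. The numeric IDs here are assigned on the fly rather than copied from the example, and the whole thing is a simplification. One detail does come from the paper: anchor text is credited both to the page the link is on and to the page it points to.

```python
import re
from collections import defaultdict

lexicon = {}                        # word -> word ID
inverted_index = defaultdict(set)   # word ID -> set of doc IDs

def word_id(word):
    """Look up a word's numeric ID, assigning the next free one if it's new."""
    return lexicon.setdefault(word, len(lexicon))

def index_text(doc_id, text):
    for w in re.findall(r"[a-z0-9_]+", text.lower()):
        inverted_index[word_id(w)].add(doc_id)

# Hypothetical doc IDs for the two documents in the example.
THIS_PAGE, YOUR_SITE = 1, 2

index_text(THIS_PAGE, "brotherhood_of_LAN is a smart webmaster")  # <title>
index_text(THIS_PAGE, "smart webmasters r us")                     # <h1>
anchor = "brotherhood_of_LAN site is smart"                        # <a href=...>
index_text(THIS_PAGE, anchor)  # the anchor text sits on this page...
index_text(YOUR_SITE, anchor)  # ...and is also credited to the page it links to

def search(word):
    """Return the sorted doc IDs containing the word ([] if it's unknown)."""
    wid = lexicon.get(word.lower())
    return sorted(inverted_index[wid]) if wid is not None else []

print(search("smart"))
```

A search for "smart" turns up both documents. YourSite never uses the word itself; it gets it only through the anchor text of the link pointing at it.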
I dunno. How about these two examples:
"Planet of the Apes : Discovery Channel"
"Discovery of Apes on the Planet"
Say these two pages have the same PR and roughly the same number of links pointing to each, though they are no doubt different pages on different topics.
So the on-the-page stuff will be roughly equal, as will PR. Won't that lead Google to use the same "numbers" for totally different subjects?
I guess what I am getting at is that if more emphasis is placed on theme, theme can also be "spammed":
a) the SEO places the usual spam on the page
b) buys PageRank
c) attempts to use words similar to those on the PR pages that point to the site.
"Planet of the Apes : Discovery Channel"
"Discovery of Apes on the Planet"
Say you add a third site:
"Discovery of New Planet by Apes"
(I know, I know... no one will search for this, but...)
and someone searches for "discovery apes".
Perhaps in this example the "Discovery Channel" one would be commercial, the second one is produced by a non-profit .edu site... and the third one has the information the searcher is REALLY looking for.
So what if the .edu is selling PageRank to the Discovery Channel site? ;)
See what I'm getting at? Many words have different meanings and can be used in different contexts. As far as I know the wee bot ain't capable of abstract thought... so it "counts" each word as it is: a_word.
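Here's a quick illustration of the "counts each word as it is" point. With a plain bag-of-words match, all three titles contain both query words, so word counting alone can't tell them apart. The match method and the tiny stopword list are assumptions for illustration; real engines weigh many other signals.

```python
import re

STOPWORDS = {"of", "the", "on", "by"}  # tiny hypothetical stopword list

def bag_of_words(text):
    """Set of lowercase words, minus stopwords (word order and meaning are lost)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

titles = [
    "Planet of the Apes : Discovery Channel",
    "Discovery of Apes on the Planet",
    "Discovery of New Planet by Apes",
]
query = bag_of_words("discovery apes")

# Score = number of query words each title contains.
scores = {t: len(query & bag_of_words(t)) for t in titles}
for title, score in scores.items():
    print(score, title)
```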
The question is: given the search engines of today, are themes as potentially spammable as on-the-page stuff and PR?
If you make a fake page about Planet of the Apes and link it to your page, will that help?
Probably.
But if Google traces back from that page, what kind of theme is it going to find?
People don't like PageRank, but it is still very good at what it does. Combine it with everything else and it's pretty powerful.
It just depends on how much processing Google can do. You can probably get around almost anything with enough time and effort; Google can't defend against everything.
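One way to picture "combine it with everything else" is a final score that mixes several signals with weights. The signal names, values and weights below are invented for illustration; Google's actual combination isn't public. The point is that spamming one signal only moves part of the total.

```python
# Hypothetical per-page signals, each already scaled to the range 0..1.
WEIGHTS = {"on_page": 0.3, "pagerank": 0.4, "theme": 0.3}  # assumed weights

def combined_score(signals):
    """Weighted sum of the signals; a missing signal counts as 0."""
    return sum(w * signals.get(name, 0.0) for name, w in WEIGHTS.items())

honest_page = {"on_page": 0.5, "pagerank": 0.6, "theme": 0.8}
spam_page = {"on_page": 1.0, "pagerank": 0.6, "theme": 0.1}  # stuffed text, off-theme links

print(round(combined_score(honest_page), 2), round(combined_score(spam_page), 2))
```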
>>The question is: given the search engines of today, are themes as potentially spammable as on-the-page stuff and PR?
It seems to me that the value of themes is that you suddenly have the possibility of two or more "drill down" subject categories for many searches. Teoma and Vivisimo are examples of this. The very idea of spam is to put it in front of the greatest number of eyeballs, while the idea of user "drill down" is that the searcher selects what he wants and eliminates what is irrelevant to his needs.
The spammer, in order to spam for themes, automatically limits his eyeball audience, as he can effectively spam for only one theme at a time. Therefore, content analysis is more immune to spam.
But content analysis is so difficult (natural language processing), and so expensive computationally, that it's rarely an obvious, working solution. Google does not do any content analysis that I can detect, but I believe that they're interested in the topic.
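The "drill down" idea can be sketched as grouping results under themes and letting the searcher pick one bucket. In this toy version, each result goes to whichever hypothetical theme shares the most words with its title. It's a crude stand-in for real clustering, not how Teoma or Vivisimo actually work. It does show the spam point, though: each page lands in only one bucket.

```python
import re
from collections import defaultdict

# Hypothetical theme vocabularies.
THEMES = {
    "film": {"planet", "apes", "movie", "remake", "channel"},
    "science": {"discovery", "planet", "astronomy", "new", "telescope"},
}

def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def drill_down(results):
    """Group result titles by the theme whose vocabulary overlaps them most."""
    groups = defaultdict(list)
    for title in results:
        best = max(THEMES, key=lambda t: len(THEMES[t] & words(title)))
        groups[best].append(title)
    return dict(groups)

results = [
    "Planet of the Apes movie remake",
    "Discovery of New Planet by Telescope",
    "Planet of the Apes : Discovery Channel",
]
for theme, titles in drill_down(results).items():
    print(theme, titles)
```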
It all comes down to money then :o Just like the sale of PR :)
>>topic
Does anyone think Google would get its "topical structure" from DMOZ, or just infer something from the web... or maybe program the themes in?
I guess this is the problem Google faces: they "could" have relevant results if they just let their spiders run around a little more, but at the same time they risk presenting a whole bunch of 404s to the user at the end of their "update".
Well... I was wondering how themes worked with G; I guess I have a better idea now :)