Forum Moderators: Robert Charlton & goodroi


Club of Trust - is there a primary index?

Threshold for relevance is inferior to the threshold for trust, right?

         

photopassjapan

3:27 pm on Oct 24, 2006 (gmt 0)

10+ Year Member



A very simple question, fishfingers has mentioned this before, and honestly that's my only lead in trying to imagine how this works. ( although i have an all too active imagination )

It's whether you can or can not enter the primary index for complex searches ( 3 words and up ) that dare to include a keyword that's competitive. Meaning if a page is more relevant than others for a long search term, ( why ) is it excluded from the primary index when the competitive keyword in there is just... common sense to be added.

Okay so...
A page is relevant for a competitive "keyword".
Only relevant, not enough links, references, just... you know that silly thing called content ;)

Is indexed, has no problems whatsoever.

It's also relevant for "keyword2" and "keyword3", which are actually its main theme and are much less competitive... or should i say, next to non-existent on commercial pages. While "keyword" is general enough that using it may or may not indicate a commercial site where the bloodshed is taking place.

So...

If you enter "keyword" as the search term, you get a list of the most trusted, most linked, most relevant sites ( okay, theoretically :P ) and this list has a threshold set high enough that no one can simply jump into it without at least a low-end version of the same parameters. Meaning if there are only 300 domains qualified for the threshold, heck, Google will display 300 and act dumb as to where the rest of the umpteen million results are. This is what fishfingers called the "primary index", which is like a club for highly trusted sites. Entry with your G passport.
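To make the "club" idea concrete, here is a toy sketch of the behavior described above: for a competitive keyword, results are first cut at a trust threshold and only then sorted by relevance, so a highly relevant but less trusted page simply never appears. Every name and number here is made up for illustration; this is speculation about observed behavior, not Google's actual algorithm.

```python
# Hypothetical "primary index" filter, as described in this thread.
# TRUST_THRESHOLD and all example data are invented for illustration.

TRUST_THRESHOLD = 0.8  # assumed cutoff applied only to competitive terms

def primary_index(results, competitive):
    """Filter (url, relevance, trust) tuples the way the thread describes:
    competitive queries drop everything below the trust threshold first,
    then the survivors are ranked by relevance."""
    if competitive:
        results = [r for r in results if r[2] >= TRUST_THRESHOLD]
    return sorted(results, key=lambda r: r[1], reverse=True)

pages = [
    ("trusted-shop.example", 0.6, 0.9),      # less relevant, very trusted
    ("niche-reference.example", 0.9, 0.5),   # more relevant, less trusted
]

# Competitive keyword: the more relevant but less trusted page vanishes.
print(primary_index(pages, competitive=True))
# Non-competitive phrase: relevance alone decides the order.
print(primary_index(pages, competitive=False))
```

Under this toy model, adding a single competitive word to an otherwise obscure query flips the `competitive` switch and shrinks the candidate pool to club members only, which matches the disappearing-page behavior described later in the thread.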

Scene two.

You enter "keyword2 keyword3". No such high threshold, for these keywords are not commercially competitive. You get the page mentioned earlier in a very decent position. Because it's trusted, if not that much, and has backlinks, if not tens of thousands, and again, you know... i'll keep repeating this childish parameter... content.

Scene three.

People rightfully associate the three terms. And are used to crappy results so they try to be as precise as they can be with the phrase they enter... so...

You enter "keyword keyword2 keyword3". Or, as a matter of fact, any combination involving "keyword", the commercially usable phrase. It's not a commercial phrase, just general enough to allow the possibility of being used as such... as in "cars" shouldn't necessarily mean "we sell cars", but for Google's thresholds it does.

The above mentioned page does not appear on the list.
Even though it's more relevant than some other pages, has actual content, and was top 10 before we added "keyword".

The problem is: "keyword2" and "keyword3", even with the least knowledge about them, are connected in everyone's mind with "keyword", meaning "keyword" is a MUST on websites mentioning the other two. It's just a generalisation that everyone knows, as "tomatoes" and "cucumbers" are to "vegetables".

If a site is good enough to be top ten for "tomatoes and cucumbers", why does it disappear into oblivion... in favor of sites much less relevant... when you add "vegetables"? Because a highly competitive keyword could mean sales, and sales sit behind the closed doors of the club, right?

The high threshold set for trust and other non-content-related parameters... results in a list of sites much less relevant in the SERPs. Meaning the more people try to be precise and enter three to four key terms, including but a single general word... only to refine their searches... the less relevant the results become after the top few.

So the secondary index comes up for "keyword" when it's on the pages, most definitely connected in background context and in relevance, but it's not entered in the search bracket? For if you enter it, it means "please filter my results to those favoured by the primary index".

... these are not rules ( or perhaps they are ), but experiences.
Some of the things i babble about are more like questions on your experiences of this, and i know this is not news right now, but i figured it's just as important. ( to me :P )

Our site itself is a non-commercial reference site.
Just so you know :)

And i'm not saying Google only lists commercial sites for general keywords, no, of course not. It's just that the possibility of commercial use calls for a higher threshold of trust, to filter out spam, MFAs, scrapers, whatnot.

My question is why extended search terms that include only a single such word, mixed with two or three others, fall under this... phenomenon.

tedster

1:44 pm on Oct 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There's an interesting post from mistah in the Create a Custom Search Engine [webmasterworld.com] thread that describes a situation that sounds parallel:

I've had a chance to play around with this and I'm getting some funny results.

I've got approx 2000 sites in my "custom search engine." For some searches
"widgets" brings up no results, but "blue widgets" brings up several results.
Anyone else finding this?

Hmmm... now that's very suggestive of something in the logic of how Google codes search, isn't it? And of the possibility of using this custom search service to study Google's behavior over a more limited sampling of domains? I would think the custom service is protected from exposing too much of a reverse-engineering tool, but still, something useful might be learned, right?

europeforvisitors

2:55 pm on Oct 25, 2006 (gmt 0)



I would think that the custom service is protected from exposing too much of a reverse engineering tool, but still something useful might be learned, right?

Maybe the learner is Google, which can give a little "TrustRank" boost to sites that are often added to custom search engines (at least until the SEOs catch on and develop "custom search engine spam").

tedster

3:13 pm on Oct 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



photopassjapan, when you add the extra, more generic search term [tomatoes cucumbers vegetables] how many more total results are delivered?

photopassjapan

2:54 am on Oct 26, 2006 (gmt 0)

10+ Year Member



It differs based on where you add the generic keyword.

If you add it as the first word you'll sometimes get more results, let's say 60,000,000 as opposed to the 50,000,000 without it.

If you add it as the last word, you'll sometimes get fewer, for example 40,000,000. But i can't say that's always true.

You have to pick a keyword that is most likely on all of these pages, though. Getting a different number of results just by playing with word order ( oh, and i did NOT enter the phrases in quotes ) makes no sense, unless the results are filtered in descending order of keyword importance. ( "First word has to be relevant, second should be, third would be nice, fourth... well... why not... are there any such sites?" ...kind of SERP sorting. )

Thresholds would be...

word1-relevance > word2-relevance > word3-relevance

...and at the same time perhaps...

most-competitive-word-trust > word2-trust > word3-trust?

In which case using the most generic term as the first word, resulting in more matching sites - would make sense. Or would it ;)?
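The descending-importance idea above can be sketched as a toy scoring function: each query word contributes a weight that decays with its position, so the same page scores differently depending on where the generic (and possibly missing) word sits in the query. The decay formula and data are invented for illustration; this is pure speculation about the observed word-order effect, not any known ranking formula.

```python
# Toy sketch of "word1-relevance > word2-relevance > word3-relevance":
# earlier query words carry more weight, so reordering the same words
# changes a page's score. Weights are an assumption for illustration.

def matches(page_words, query):
    """Score a page: each query word it contains earns a weight that
    decays with the word's position in the query (1, 1/2, 1/3, ...)."""
    score = 0.0
    for pos, word in enumerate(query):
        if word in page_words:
            score += 1.0 / (pos + 1)
    return score

page = {"tomatoes", "cucumbers"}  # page never mentions "vegetables"

# Generic word first: missing it costs the biggest weight.
print(matches(page, ["vegetables", "tomatoes", "cucumbers"]))
# Generic word last: the miss is cheap, the page scores higher.
print(matches(page, ["tomatoes", "cucumbers", "vegetables"]))
```

In this sketch a fixed score cutoff would admit the page for one word order and drop it for the other, which is one way the result counts could shuffle with word order as described above.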

But in either case, the list is populated with results in favor of more trusted sites, even if they are less relevant. ( the ones dropping out aren't your first-day-in-school MySpace kind of sites either; they're well established, just not in the race. They're trusted enough to come up for less generic but far from obscure 2-3 word searches... you know... )

I've tried it now with a couple of phrases, and the site that ranked top 10 with "tomatoes cucumbers carrots" - and is in fact highly relevant for "vegetables" too - simply disappears off the list when you do a search for "tomatoes cucumbers carrots vegetables".

Played with this for a while to see how other supposedly similarly trusted/relevant sites' URLs behave on SERPs. And it's the same.

The threshold seems to be set at a level you can only clear with a lot of relevant and trusted IBLs. That i can tell just by looking at the sites that stay, even after adding the generic term. Not to mention they're the ones we like to visit ourselves, so i know them well :)

Sorry, i may not be making any sense right now. But there is a filter on the SERPs that seems to be overruling relevance, to the level where it can't even show more than a couple of hundred ( 1-400 ) results. The rest are omitted, including quite well-established pages.

Which is just... not good.
I think. -.-

jcmoon

5:27 pm on Nov 7, 2006 (gmt 0)

10+ Year Member



It almost sounds like we've regressed a decade in our search engine technology. I vividly remember being a computer science student in '97, and when I got frustrated with my Java textbook, I predicted there might be a book called Java For Dummies out there.

So I went to the search engine I trusted -- HotBot -- and entered /java dummies/, figuring the word "for" would just get thrown out as too common. Nope, nothing but crap. So I irritably went back to the search box and put in /java for dummies/ (without quotes at all), and boom, the top result was a page about a book called Java For Dummies.

I didn't buy the book, but it was quite a memorable lesson in how the search engine failed: it didn't bring me to what I wanted on the first query.