- Key phrase in domain name / URLs
- Lots of internal links with the key phrase in anchor text, ALT tags, etc.
- Lots of low-PR backlinks with good link text that don't show up in a Google link: search. Try ATW.
- Their SEO is bad but the competitors are worse :-)
- They're using some nasty spam technique you haven't spotted
- They got lucky
I think the engineers at the Plex might be a little disheartened that you would think their algo is that simple!
I would certainly look at the ALT tags, the structure of the site (it should be terrible if there's been no SEO), and lastly the incoming link text, whether from high- or low-PR pages.
Originally Google was set up with two inverted indexes. One was the fancy index; the other was the plain index. The idea was that many one-word searches could be satisfied by consulting the smaller fancy index, without needing to go on to the plain (full-text) index.
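For what it's worth, here's a toy Python sketch of that two-index layout (the names and structure are my own guesses at the idea, not Google's actual code): the fancy index only holds words from high-signal spots like the title, so many one-word lookups never have to touch the big full-text index.

```python
from collections import defaultdict

# Hypothetical two-tier index: "fancy" holds only words from important
# fields (just the title here), "plain" holds every word of the body.
fancy_index = defaultdict(set)
plain_index = defaultdict(set)

def index_page(url, title, body):
    for word in title.lower().split():
        fancy_index[word].add(url)   # small, high-signal index
    for word in body.lower().split():
        plain_index[word].add(url)   # large, full-text index

def one_word_search(word):
    # Consult the cheap fancy index first; only fall back to the
    # full-text index when the fancy index has nothing.
    return fancy_index.get(word.lower()) or plain_index.get(word.lower(), set())
```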
I have seen many (too many) examples of pages ranking high when the keywords appear only in anchor text from backlinks, and do not appear at all on the page itself.
What Google has done in their war against spam, it seems to me, is to overhaul their fancy index so that instead of being based on scraping the most important words off of a page, now it's based on scraping the anchor text from backlinks. This makes sense for two reasons: 1) anchor text (especially if the backlink is external to the site) is more immune to spamming, compared to on-page features, and 2) the entire front end of their ranking process was tuned to PageRank, which was link based, and was precomputed irrespective of the content on the page. It would not be that difficult to compile a separate fancy inverted index based on anchor text in links, at the same time that PageRank is computed, since the overall architecture would not have to change that much.
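A rough sketch of what that side-pass might look like (everything here is hypothetical, including the link-tuple format): as you walk the same link data a PageRank computation consumes, you scrape each link's anchor text into an index keyed by the target page.

```python
from collections import defaultdict
from urllib.parse import urlparse

def build_anchor_index(links):
    """links: iterable of (source_url, target_url, anchor_text) tuples,
    a hypothetical stand-in for the crawl data a PageRank pass consumes."""
    anchor_index = defaultdict(set)
    for source, target, anchor in links:
        # Count external anchors only: same-host links are trivial to self-spam.
        if urlparse(source).netloc == urlparse(target).netloc:
            continue
        for word in anchor.lower().split():
            anchor_index[word].add(target)   # credit goes to the *target* page
    return anchor_index
```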
What I think happens is that if they get a very close match based on anchor text from the fancy inverted index, then this is sufficient to satisfy the search query. If they cannot return enough such matches to fill the SERP page with links, then they go on to consult the main (full text) inverted index. That's why you see pages flying to the top of the SERPs based on close matches with anchor text, when the search terms do not even appear on the page.
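Something like this, in other words (purely speculative; SERP_SIZE and the index names are made up, and real ranking within each phase is omitted):

```python
SERP_SIZE = 10  # hypothetical: results needed to fill one results page

def search(query, anchor_index, fulltext_index):
    words = query.lower().split()
    if not words:
        return []
    # Phase 1: pages whose *backlink anchor text* matches every query word.
    hits = set.intersection(*(anchor_index.get(w, set()) for w in words))
    if len(hits) >= SERP_SIZE:
        return sorted(hits)[:SERP_SIZE]   # page is full, never touch full text
    # Phase 2: top up from the full-text index only when phase 1 falls short.
    more = set.intersection(*(fulltext_index.get(w, set()) for w in words))
    return (sorted(hits) + sorted(more - hits))[:SERP_SIZE]
```

Note that in this sketch the query terms never need to appear on the winning pages at all, which would explain exactly the behaviour described above.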
The entire effort at Google is optimized for speed. The more precomputation you can do before consulting the full text of a page, the faster you can return results that are superficially ranked ("superficial" here means a ranking based on something less than full-text analysis). This is what made Google scalable, and this is the reason why they can handle 200 million queries a day.
In a sense, Google became so big so fast, that they are now a prisoner of their own efficiencies.
We've recently taken on a new (PR4) client who is now no. 3 in the SERPs for a very competitive phrase without us having touched the site - yet. (No reference on the home page - yet - to the main phrase.)
Achieved through use of optimised anchor text on other decent sites.
A good link from an important source with good keyword text does wonders! :)
If Google wants an accurate count of maximum page hits, they would have to add the fancy index hits to the plain index hits and then subtract all the same-page overlaps.
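The arithmetic being described is just inclusion-exclusion over the two hit sets. A toy Python illustration with made-up figures:

```python
# Made-up figures to illustrate the overlap arithmetic.
fancy_hits = {"a.com", "b.com", "c.com"}             # matched via anchor text
plain_hits = {"b.com", "c.com", "d.com", "e.com"}    # matched in the full text

naive_count   = len(fancy_hits) + len(plain_hits)    # 7: double-counts b and c
correct_count = len(fancy_hits | plain_hits)         # 5: the union drops overlaps
overlap       = len(fancy_hits & plain_hits)         # 2
assert correct_count == naive_count - overlap
```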
But I suspect it's getting very tricky to figure out the overlaps without incurring too much overhead. Remember, they have to come up with this count once per query! It looks to me like they aren't subtracting the overlaps anymore because it's too hard to compute, and they just hoped no one would notice. It probably only happens on the higher counts, which makes it less noticeable because no one is able to prove differently anyway.
Of course, webmasters who know exactly how many files are in each of their directories can figure out instantly that Google is miscounting. But for every webmaster who says Google is wrong, you have some dimwit media pundit starting out a column with a cheap lead sentence such as, "If you search in Google for blah, blah, you get XXX,000 hits...."
If this doesn't get fixed soon, then Google may have just decided it was easier to take a little flak from WebmasterWorld than to redesign a monster algorithm.
Thanks for these comments, guys - obviously apart from FLaMiN
I just said "yes" as in... I have some ideas...
But I can't mention them on here; they are rude words like "doorways" and "cloaking"...
Now that you mention the ODP listing tho, that just MIGHT be the reason. lol.
You'd really need to give the URL and let someone check it out, but "give me the URL" is rude words too.