Is this a bug? Is someone else seeing similar things?
Also, I don't think that this is a reasonable behaviour:
1. companyXY is a trademark. Most of the results shown have no right to use it. However, these pages are shown while all results from companyXY are removed.
2. It doesn't make sense for the user to be shown all results except those from www.companyxy.com in the normal search, and only results from www.companyxy.com when repeating the search with 'omitted results included'. (No results are 'included' - they are replaced.)
The reason this word triggers a filter might be that people are bidding on this keyword on Google AdWords.
the page may have focused upon "term1 term2" so exactly that it doesn't have enough relevancy to deal with "term1 also term2" or "term1 domain term2" - remember Google doesn't give much weighting to the order of words as you type them in.
if you add a 3rd word to your search in place of the domain name, do you see the same result?
ranking highly for "term1 term2" does not indicate a high ranking for "term1 term2 term3"
it doesn't have enough relevancy to deal with "term1 also term2" or "term1 domain term2" - remember Google doesn't give much weighting to the order of words as you type them in.
ranking highly for "term1 term2" does not indicate a high ranking for "term1 term2 term3"
It's not a ranking problem, because the pages are completely removed from the SERPs (there are only a few results left for these combinations). Moreover, the problem is independent of the order of the words. Also, as already mentioned, the only relevant result for a search that contains (the trademark) companyXY is www.companyXY.com. Of course, the page isn't optimized for companyXY, but the term appears several times (title, description, text).
if you add a 3rd word to your search in place of the domain name, do you see the same result?
I tried this with several different words. In most cases, pages from that domain are shown at the top of the results. But there are also a few searches where the pages are completely removed. In the latter case, 'repeating the search with the omitted results included' always leads to the strange behaviour described above.
Google is trying to eliminate "similar results", even from other domains (and, considering the difficulties, is surprisingly good at it). One search I do every now and then eliminates 90% of the top 100 results, causing content from the second primary source to first appear in position 11 rather than position 116.
The first source can't be said to be harmed, as it dominates page 1 anyway; the second and third sources must surely approve of this.
As an aside, the eliminated results are split about equally into (1) pages on the same domain, wildly different, but containing the same author name; and (2) pages, each on a separate domain, that contain a near BUT NOT EXACT copy of a page on yet another domain.
[edited by: hutcheson at 4:32 pm (utc) on May 4, 2004]
The page is still in the search results! It's just not shown when there is a higher-ranking page in the same domain, or a VERY SIMILAR higher-ranking page on another domain. So if you're counting on SMC product descriptions, or your hotel-now hotel promotional blurbs, to get picked up by Google -- count again. With seven million results, people aren't going to be saying "Oh, the first three million aren't enough, I'll look at the others."
Put another way, it's not a "mom-and-pop" filter, it's a plagiarism filter.
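To make the behaviour described above concrete, here is a minimal sketch of such a filter. It is not Google's actual algorithm, which isn't public; the function names, the 3-word shingles and the 0.8 threshold are all assumptions picked for illustration. A result is hidden when a higher-ranking shown result is on the same host, or when its text is nearly identical to a higher-ranking shown result on any host.

```python
from urllib.parse import urlparse


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def collapse_results(ranked, threshold=0.8):
    """Split ranked (url, text) pairs into (shown, omitted) URL lists.

    Hypothetical rule: a result is omitted if a higher-ranked shown
    result is on the same host, or has shingle similarity >= threshold.
    """
    shown, omitted = [], []
    for url, text in ranked:
        host = urlparse(url).netloc.lower()
        sh = shingles(text)
        hidden = any(
            host == s_host or jaccard(sh, s_sh) >= threshold
            for _, s_host, s_sh in shown
        )
        if hidden:
            omitted.append(url)
        else:
            shown.append((url, host, sh))
    return [u for u, _, _ in shown], omitted
```

Nothing is dropped from the index here: the omitted list is what "repeat the search with the omitted results included" would bring back.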
If Google kept first-indexed-on (date/time) data for pages, it would be able to determine to a reasonably high level of reliability which pages are original and which are copies. Attempting to filter out duplicates without such data is guaranteed to fail frequently (and result in original pages being filtered out).
Again, as a concept, this is not rocket science but it seems to be beyond the algo designers at the Plex.
Kaled.
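A minimal sketch of the first-indexed-on idea, assuming the search engine stored such a timestamp for every page. The function name and the data layout are hypothetical; nothing here reflects what Google actually stores.

```python
from datetime import datetime


def pick_original(copies):
    """Return the URL of the copy with the earliest first-indexed time.

    copies: list of (url, first_indexed) pairs, where first_indexed is
    a datetime. The field itself is an assumption for illustration.
    """
    if not copies:
        raise ValueError("no copies given")
    return min(copies, key=lambda c: c[1])[0]


copies = [
    ("http://copy.example/page", datetime(2004, 3, 1)),
    ("http://original.example/page", datetime(2003, 11, 15)),
]
print(pick_original(copies))
```

Note that first-indexed is not the same as first-published: if the crawler happens to reach a copy before the original, this rule picks the wrong page.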
It sounds to me like the results from that domain are "considered similar" to one (or more) of the results from another domain.
This sounds reasonable (and there are cases where the original content is filtered out), but it doesn't seem to fit this case, for the following reasons:
- I couldn't find any similar results
- even if the first (top) result from this domain were considered similar, there are numerous other (unique) results
- the pages from companyXY are the original pages and most of them have high PR
- if they are considered original content for 'word1 word2', it would be strange to see these pages treated as duplicate content for 'word1 word2 companyXY' (assuming that some similar pages exist)
"Similar" pages [as in "show similar pages" in the search results] won't necessarily be found by the same search term, but generally seem to be connected by hyperlinks and perhaps by similarity of vocabulary. The omitted "very similar pages" can be on any domain, but seem to always have very similar page text. But all the examples I have are pretty extreme -- nearly all of the page text is identical.