Wikipedia's pages do say "(Redirected from #*$!)", but that seems to be some internal redirection in their server software, because the browser address bar still shows the old URL.
So Wikipedia eats up two lines, when they really only deserve one, in Google. If I tried this, they'd say I was making doorway pages, no?
Here are some terms to search in Google to see this:
<Sorry, no specific search terms. See Forum Charter [webmasterworld.com]>
[edited by: tedster at 9:12 pm (utc) on July 18, 2006]
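As an aside, you can confirm that this is a wiki-level redirect rather than an HTTP one with a quick check: if the old URL answers with a plain 200 and the article content, it's internal; a 301/302 would be a true redirect. Here's a rough Python sketch (the article titles are only examples, swap in whichever redirect pair you're looking at):

    # Quick check of what a Wikipedia "redirect" URL actually answers with.
    # http.client does not follow redirects, so a 200 here means the old URL
    # serves the target article directly, while a 301/302 would be a real
    # HTTP redirect.
    import http.client

    def fetch_status(host, path):
        conn = http.client.HTTPSConnection(host)
        conn.request("GET", path, headers={"User-Agent": "redirect-check/0.1"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")

    # Example titles only -- substitute whichever redirect pair you are seeing.
    print(fetch_status("en.wikipedia.org", "/wiki/UK"))
    print(fetch_status("en.wikipedia.org", "/wiki/United_Kingdom"))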
It's terrible and dumb, but as long as Google refuses to index destination pages, it will be like that.
Google does not understand Wikipedia's redirects properly
as long as Google refuses to index destination pages, it will be like that
I run a niche site and our pages are all hand-written and very "on target". We have experts who volunteer to write our pages, and an editor makes sure the writing is professional. Yet if you look up certain keywords in G, Wikipedia will be at the top, no matter how useless the results. I have pages and pages of insanely relevant URLs that are on page 3 or 7 or 12 of G, and totally crap Wikipedia pages are on page 1, either the top or second listing.
And why should the snippets be different? Is Wikipedia serving Google a different page than what we see when they spider? If you click each link you get an identical page, so why would Google summarize each differently?
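If you ever want to sanity-check the "different page for Google" theory yourself, a crude test is to fetch the URL once with a browser-style User-Agent and once with a Googlebot-style one and compare what comes back. Something like this rough Python sketch (the URL and UA strings are just examples, and dynamic pages can differ between requests anyway, so a mismatch is only a hint, not proof):

    # Crude check: fetch the same URL with two User-Agent strings and compare
    # a hash of the body. Pages with per-request content will differ regardless,
    # so this only flags candidates for a closer look.
    import hashlib
    import urllib.request

    def body_hash(url, user_agent):
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(req) as resp:
            return hashlib.sha256(resp.read()).hexdigest()

    url = "https://en.wikipedia.org/wiki/Example"   # example URL only
    browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    bot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    print(body_hash(url, browser_ua))
    print(body_hash(url, bot_ua))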
And why should the snippets be different?
As mcavic said above, dupe pages should be filtered out in the SERPs, but this at least explains why one of these pages isn't. Google's dupe filter checks the title and the snippet. If they're the same, they're considered duplicates and one will not be served on the same results page as the other.
So in this case, since the snippets are different the pages aren't filtered.
Why the snippets are different is a very good question.
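To make that filtering idea concrete, here's a toy sketch of a dupe filter along the lines described above: two results only collapse if both the title and the snippet match, so Wikipedia's two URLs with differing snippets would both survive. The data structures here are made up purely for illustration:

    # Toy duplicate filter: collapse results whose (title, snippet) pair repeats.
    # Two URLs for the same article both survive if their snippets differ at all.
    def filter_serp(results):
        seen = set()
        filtered = []
        for r in results:
            key = (r["title"].strip().lower(), r["snippet"].strip().lower())
            if key in seen:
                continue          # same title and snippet -> treated as a dupe
            seen.add(key)
            filtered.append(r)
        return filtered

    results = [
        {"url": "http://en.wikipedia.org/wiki/Foo",     "title": "Foo", "snippet": "Foo is a ..."},
        {"url": "http://en.wikipedia.org/wiki/Foo_bar", "title": "Foo", "snippet": "Redirected from Foo bar ..."},
    ]
    print(filter_serp(results))   # both survive, because the snippets differ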
Bot 1 rips through the site and will index anything and everything. This bot is smart enough to follow sort queries and all sorts of other stuff that will cause issues.
Bot 2 comes around (at a later time) and does a comparison for dup content. It now has to determine which of the dup content to keep. Which one it keeps seems to be related to the number of inbound links and/or PR the page has.
I think in Wiki's case, you'll see duplicate listings appear sporadically. It takes a bit of time for Googlebot to process all of that data and "do the right thing".
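If the second pass really does pick a survivor by link weight, the decision could be as simple as this toy sketch (scoring by inbound link counts is just my guess at a PR stand-in):

    # Toy "second pass": group pages judged to be duplicates, then keep the one
    # with the most inbound links, dropping the rest from the index.
    def pick_survivors(dup_groups, inbound_links):
        keep = []
        for group in dup_groups:
            survivor = max(group, key=lambda url: inbound_links.get(url, 0))
            keep.append(survivor)
        return keep

    dup_groups = [
        ["http://example.com/page?sort=asc", "http://example.com/page"],
    ]
    inbound_links = {"http://example.com/page": 120, "http://example.com/page?sort=asc": 3}
    print(pick_survivors(dup_groups, inbound_links))  # keeps the clean URL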
I think that's why we see such wild fluctuations in the page counts when doing site: searches. Google is "continually" merging and purging. ;)
In reference to case issues, Google's smart and understands that there could be a case-sensitive URI structure. So, it ends up indexing both upper and lower case versions but will eventually purge one of them, usually the upper case version, unless of course your URLs really are case sensitive.
This is where harnessing the bot comes into play. Preventing the indexing of sort queries, case issues, anything that should NOT be getting indexed and/or followed. There are all sorts of ways to implement these strategies too. You gotta be careful though! ;)
Think of it this way: let's say Googlebot only came around to index your site once a month, and there was a limit on the number of pages it would index. Wouldn't you want to make sure that the bot was not bouncing around your site generating sort queries, etc.?
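On the "harnessing the bot" point, one common way to do it is to normalize every URL you link to (and 301 anything else to that normalized form) so the crawler never sees the case and sort-query variants in the first place. A rough sketch of the normalization step, with made-up parameter names:

    # Normalize a URL before linking to it or redirecting to it:
    # lower-case the path and strip presentation-only parameters like sort order.
    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    IGNORED_PARAMS = {"sort", "order", "sessionid"}   # made-up examples

    def canonical_url(url):
        parts = urlsplit(url)
        path = parts.path.lower()
        query = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in IGNORED_PARAMS]
        return urlunsplit((parts.scheme, parts.netloc, path, urlencode(query), ""))

    print(canonical_url("http://example.com/Widgets/List.php?cat=5&sort=price_asc"))
    # -> http://example.com/widgets/list.php?cat=5

The same canonical form is what you'd 301 the stray variants to, so the bot spends its limited visits on pages you actually want indexed.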
I have pages and pages of insanely relevant URLs that are on page 3 or 7 or 12 of G, and totally crap Wikipedia pages are on page 1, either the top or second listing.
Google, like many users searching for the page, is ultimately unable to detect whether the words were written by an expert or not. The very fact that you are searching for an answer means that most users are obviously not experts in the subject.
Now that idiotic assumption, that searchers are the ones able to judge quality by linking, is what brought WP up to where it is now.
Even if G has changed its algo by now ... old wisdom established something like peer review, which is still flawed by nepotism, but I guess it's better than an algorithm programmed by a few.