Wikipedia's pages do say "(Redirected from #*$!)", but that seems to be some internal redirection in their server software, because the browser address bar still shows the old URL.
So Wikipedia eats up two lines, when they really only deserve one, in Google. If I tried this, they'd say I was making doorway pages, no?
Here are some terms to search in Google to see this:
<Sorry, no specific search terms. See Forum Charter [webmasterworld.com]>
[edited by: tedster at 9:12 pm (utc) on July 18, 2006]
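As an aside, you can confirm that this is a wiki-level redirect rather than an HTTP one with a quick check: if the old URL answers with a plain 200 and the article content, it's internal; a 301/302 would be a true redirect. Here's a rough Python sketch (the article titles are only examples, swap in whichever redirect pair you're looking at):

    # Quick check of what a Wikipedia "redirect" URL actually answers with.
    # http.client does not follow redirects, so a 200 here means the old URL
    # serves the target article directly, while a 301/302 would be a real
    # HTTP redirect.
    import http.client

    def fetch_status(host, path):
        conn = http.client.HTTPSConnection(host)
        conn.request("GET", path, headers={"User-Agent": "redirect-check/0.1"})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")

    # Example titles only -- substitute whichever redirect pair you are seeing.
    print(fetch_status("en.wikipedia.org", "/wiki/UK"))
    print(fetch_status("en.wikipedia.org", "/wiki/United_Kingdom"))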
It's terrible and dumb, but as long as Google refuses to index destination pages, it will be like that.
Google does not understand Wikipedia's redirects properly
as long as Google refuses to index destination pages, it will be like that
I run a niche site and our pages are all hand-written and very "on target". We have experts who volunteer to write our pages, and an editor makes sure the writing is professional. Yet if you look up certain keywords in G, Wikipedia will be at the top, no matter how useless the results. I have pages and pages of insanely relevant URLs that are on page 3 or 7 or 12 of G, and totally crap Wikipedia pages are on page 1, either the top or second listing.
And why should the snippets be different? Is Wikipedia serving Google a different page than what we see when they spider? If you click each link you get an identical page, so why would Google summarize each differently?
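If you ever want to sanity-check the "different page for Google" theory yourself, a crude test is to fetch the URL once with a browser-style User-Agent and once with a Googlebot-style one and compare what comes back. Something like this rough Python sketch (the URL and UA strings are just examples, and dynamic pages can differ between requests anyway, so a mismatch is only a hint, not proof):

    # Crude check: fetch the same URL with two User-Agent strings and compare
    # a hash of the body. Pages with per-request content will differ regardless,
    # so this only flags candidates for a closer look.
    import hashlib
    import urllib.request

    def body_hash(url, user_agent):
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(req) as resp:
            return hashlib.sha256(resp.read()).hexdigest()

    url = "https://en.wikipedia.org/wiki/Example"   # example URL only
    browser_ua = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    bot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    print(body_hash(url, browser_ua))
    print(body_hash(url, bot_ua))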
And why should the snippets be different?
As mcavic said above, dupe pages should be filtered out in the SERPs, but this at least explains why one of these pages isn't. Google's dupe filter checks the title and the snippet. If they're the same, they're considered duplicates and one will not be served on the same results page as the other.
So in this case, since the snippets are different the pages aren't filtered.
Why the snippets are different is a very good question.
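To make that filtering idea concrete, here's a toy sketch of a dupe filter along the lines described above: two results only collapse if both the title and the snippet match, so Wikipedia's two URLs with differing snippets would both survive. The data structures here are made up purely for illustration:

    # Toy duplicate filter: collapse results whose (title, snippet) pair repeats.
    # Two URLs for the same article both survive if their snippets differ at all.
    def filter_serp(results):
        seen = set()
        filtered = []
        for r in results:
            key = (r["title"].strip().lower(), r["snippet"].strip().lower())
            if key in seen:
                continue          # same title and snippet -> treated as a dupe
            seen.add(key)
            filtered.append(r)
        return filtered

    results = [
        {"url": "http://en.wikipedia.org/wiki/Foo",     "title": "Foo", "snippet": "Foo is a ..."},
        {"url": "http://en.wikipedia.org/wiki/Foo_bar", "title": "Foo", "snippet": "Redirected from Foo bar ..."},
    ]
    print(filter_serp(results))   # both survive, because the snippets differ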
Bot 1 rips through the site and will index anything and everything. This bot is smart enough to follow sort queries and all sorts of other stuff that will cause issues.
Bot 2 comes around (at a later time) and does a comparison for dup content. It now has to determine which of the dup content to keep. Which one it keeps seems to be related to the number of inbound links and/or PR the page has.
I think in Wiki's case, you'll see duplicate listings appear sporadically. It takes a bit of time for Googlebot to process all of that data and "do the right thing".
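If the second pass really does pick a survivor by link weight, the decision could be as simple as this toy sketch (scoring by inbound link counts is just my guess at a PR stand-in):

    # Toy "second pass": group pages judged to be duplicates, then keep the one
    # with the most inbound links, dropping the rest from the index.
    def pick_survivors(dup_groups, inbound_links):
        keep = []
        for group in dup_groups:
            survivor = max(group, key=lambda url: inbound_links.get(url, 0))
            keep.append(survivor)
        return keep

    dup_groups = [
        ["http://example.com/page?sort=asc", "http://example.com/page"],
    ]
    inbound_links = {"http://example.com/page": 120, "http://example.com/page?sort=asc": 3}
    print(pick_survivors(dup_groups, inbound_links))  # keeps the clean URL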
I think that's why we see such wild fluctuations in the page counts when doing site: searches. Google is "continually" merging and purging. ;)
In reference to case issues, Google's smart and understands that there could be a case-sensitive URI structure. So, it ends up indexing both upper and lower case versions but will eventually purge one of them, usually the upper case version, unless of course your URLs really are case sensitive.
This is where harnessing the bot comes into play. Preventing the indexing of sort queries, case issues, anything that should NOT be getting indexed and/or followed. There are all sorts of ways to implement these strategies too. You gotta be careful though! ;)
Think of it this way: let's say Googlebot only came around to index your site once a month, and there was a limit on the number of pages it would index. Wouldn't you want to make sure that the bot was not bouncing around your site generating sort queries, etc.?
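On the "harnessing the bot" point, one common way to do it is to normalize every URL you link to (and 301 anything else to that normalized form) so the crawler never sees the case and sort-query variants in the first place. A rough sketch of the normalization step, with made-up parameter names:

    # Normalize a URL before linking to it or redirecting to it:
    # lower-case the path and strip presentation-only parameters like sort order.
    from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

    IGNORED_PARAMS = {"sort", "order", "sessionid"}   # made-up examples

    def canonical_url(url):
        parts = urlsplit(url)
        path = parts.path.lower()
        query = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in IGNORED_PARAMS]
        return urlunsplit((parts.scheme, parts.netloc, path, urlencode(query), ""))

    print(canonical_url("http://example.com/Widgets/List.php?cat=5&sort=price_asc"))
    # -> http://example.com/widgets/list.php?cat=5

The same canonical form is what you'd 301 the stray variants to, so the bot spends its limited visits on pages you actually want indexed.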
I have pages and pages of insanely relevant URLs that are on page 3 or 7 or 12 of G, and totally crap Wikipedia pages are on page 1, either the top or second listing.
Google, like many users searching for the page, is ultimately unable to detect whether the words were written by an expert or not. The very fact that you are searching for an answer means that most users are obviously not experts in the subject.
Now that idiotic assumption, that searchers are the ones able to judge quality by linking, is what brought WP up to where it is now.
Even if G has changed its algo by now ... old wisdom established something like peer review, which is still flawed by nepotism, but I guess it's better than an algorithm programmed by a few.