Duplicate Content Observation

Some sites are losing ALL of their relevant pages

7:05 pm on Sep 29, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member caveman is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 17, 2003
votes: 0

We've just done a whole bunch of analysis on the dup issues with G, and I wish to post an observation about just one aspect of the current problems:

The fact that even within a single site, when pages are deemed too similar, G is not throwing out just the dups - it's throwing out ALL the similar pages.

The result of this miscalculation is that high quality pages from leading/authoritative sites, some of which also act as hubs, are lost in the SERPs. In most cases, these pages are not actually penalized or pushed into the Supplemental index. They are simply dampened so badly that they no longer appear anywhere in the SERPs.

The current problem is actually not new IMHO. It began surfacing on or about Dec 15 or 16 of last year. At that time, the best page for the query simply seemed to take a 5-10 spot drop in the SERPs...enough to kill most traffic to the page, but at least the page was still in the SERPs. If there were previously indented listings, those were dropped way down.

From early Feb through about mid March, the situation was corrected and the best pages for specific queries were again elevated to higher rankings. When indented listings were involved, however, the indented listing now seemed less relevant than was the case pre-Dec.

From mid March to about mid May, the situation worsened again, approximately to the level of the problems witnessed in mid Dec: the most relevant pages dropped 5-10 spots, and indents vanished, as in Dec.

But the most serious aspect of the problem began in mid May, when G started dropping even the best page for the query out of the visible SERPs.

A few days ago, the problem worsened, going deeper into the ranks of high quality, authoritative sites. This added fuel to what has become the longest non-update thread [webmasterworld.com] I've ever seen.

Why This is Such a Problem
The short answer is that a lot of very useful, relevant pages are now not being featured. I'm not talking about just downgraded. They're nowhere.

Now, I'm sure that there are sites that deserved the loss of these vanished pages. But there are plenty of others whose absence is simply hurting the SERPs. There is a difference, after all, between indexing the world's information and making it available.

Hypothetical Example

We help a client with a scientific site about insects (not really, but the example is highly analogous). Let's discuss this hypothetical site's hypothetical section about bees. Bees are after all very useful little creatures. :-)

There are many types of bees. And then there are regional differences in those types of bees, and different kinds of bees within each type and regional variation (worker, queen, etc). Now, if you research bees, and want to search on a certain type of bee - in particular a worker bee from the species that does its work in a certain region of the world - then you'd like to find the page on that specific bee.

Well, you used to be able to find that page, near the top of the SERPs, when searching for it.

Then in mid Dec, you could find it, but only somewhere in the lower part of the top 20 results.

Now, G is not showing any pages on bees from that site. Ergghh.

What is an Affected Site To Do?
One option, presumably, would be to stop allowing the robots to index the lesser pages that are 'causing' the SE's to drop ALL the related pages. But this is a disservice to the user, especially in an era when GG has gone on record as taking pride in delivering especially relevant results, particularly for longer-tail terms.

Should we noindex all the bee subpages, so that at least searchers can find SOME page on bees from this site? (I'm assuming that noindexing or nofollowing the 'dup' pages that are not really 'dup' pages at all would nonetheless free the one remaining page on the topic to resurface; perhaps a bad assumption.)
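For concreteness, the page-level exclusion being contemplated is the robots meta tag. A minimal sketch (the page and title are hypothetical stand-ins for one of those bee subpages):

```html
<!-- Hypothetical subpage that G might be treating as a 'dup' -->
<head>
  <title>European Worker Bees</title>
  <!-- noindex: keep this page out of the index;
       follow: still let its links pass value to other pages -->
  <meta name="robots" content="noindex,follow">
</head>
```

Whether freeing the 'dup' siblings this way would actually let the one remaining page resurface is, as noted above, an untested assumption.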

In any case, I refuse. Talk about rigging sites simply for the purpose of ranking. That's exactly what we're NOT supposed to be doing.

G needs to sort this out. ;-)

Note: Posters, please limit comments to the specific issues outlined in this thread. There are a lot of dup issues out there right now. This is just one of them.

4:54 pm on Nov 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Dec 7, 2003
votes: 0

Most of the <title>s are 99% similar - for new sections I'm developing, I will use "full" titles at menu level to draw traffic, but make page titles unique to avoid duplicated data.
This isn't as informative for users, but it avoids problems.

5:15 pm on Nov 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member annej is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Feb 17, 2002
votes: 0

I just checked both my sites with site:mydomain.com and found my larger site seems to have no listings without the www. The only Supplemental results are pages I've removed.

But the small site that got hit then recovered during Bourbon has several pages listed without the www and all are now supplemental.

What I can't understand is that I did the same fixes for each. I set up a 301 redirect and went through the sites to be sure I always used www and a trailing /.
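The usual Apache version of that non-www fix looks roughly like this (assuming mod_rewrite is available; example.com is a placeholder for the real domain):

```apache
# .htaccess sketch: 301 everything on the bare host to the www host
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

The R=301 flag is what matters: a permanent redirect tells the engine the non-www URLs are not separate pages.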

Hopefully Jagger 3 will fix the problem; otherwise I guess it will be back to the drawing board.

5:15 pm on Nov 2, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member caveman is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Apr 17, 2003
votes: 0

Guys, there are sites where one particular subpage template is used thousands of times, with no ill effect. Be careful here not to run off changing lots of things unnecessarily. It's all to do with the amount of content that is perceived to be unique across the templated pages.
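That unique-vs-templated ratio can be illustrated with a rough near-duplicate check. The shingling/Jaccard sketch below is only a guess at the *kind* of measure an engine might use, not Google's actual algorithm; all the page text is invented:

```python
def shingles(text, k=4):
    """Return the set of k-word shingles (overlapping word windows)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets (0.0 disjoint, 1.0 identical)."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

# Two templated pages that differ in only a few words look near-identical,
# while a genuinely different page scores near zero.
template = ("Bee species guide navigation about contact "
            "All about the {} bee habitat diet lifecycle "
            "copyright 2005 example site all rights reserved")
page_a = template.format("European worker")
page_b = template.format("African queen")
page_c = "A completely different essay on beekeeping equipment and hive tools."

print(jaccard(page_a, page_b))  # high: mostly shared template text
print(jaccard(page_a, page_c))  # low: no shared word windows
```

If a threshold on something like this is what trips the filter, then a heavily templated page with only a couple of unique words is at risk, while the same template wrapped around a paragraph of unique text is not - which matches caveman's observation.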
9:53 am on Nov 4, 2005 (gmt 0)

Full Member

10+ Year Member

joined:Nov 12, 2003
votes: 0

Canonical URL problem was causing dupe content for me - that has been half fixed this morning on G.
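For anyone untangling the same canonical-URL mess, a toy normalizer along the lines of what the 301 fixes discussed above enforce (the preference for www and a trailing slash is illustrative, not a Google rule):

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalize(url, prefer_www=True):
    """Map URL variants to one canonical form: lowercase host,
    consistent www prefix, slash on a bare path, fragment dropped.
    (These particular choices are illustrative only.)"""
    scheme, host, path, query, frag = urlsplit(url)
    host = host.lower()
    if prefer_www and not host.startswith("www."):
        host = "www." + host
    if not path:
        path = "/"
    # Fragments never reach the server, so they can't create dupes.
    return urlunsplit((scheme, host, path, query, ""))

print(canonicalize("http://Example.com"))  # http://www.example.com/
print(canonicalize("http://example.com/bees?x=1#top"))
```

The point is that http://example.com, http://Example.com and http://www.example.com/ should all resolve (via 301) to a single form, so the engine only ever sees one URL per page.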