Forum software duplicate content issues and Panda

10:10 pm on Jun 3, 2012 (gmt 0)

Good evening guys.

One of my websites uses Invision Power Board as its forum software.

There is a fundamental flaw that was introduced in V3. Each page of a thread is seen as a completely new thread, not as a page of a thread. Some threads can have 20+ pages. GWT reports each page as having a duplicate title and meta description.

Furthermore, searching Google for an exact match of a topic often returns pages 1 and 2 of the thread, but Google sees them as entirely different threads. I know this because where it mentions posts and participants, it only takes them into account for the page in question, not for the entire thread as it did previously and as all other forum software does.

So, for example, page 1 might say 20 posts by 7 participants and page 2 might say 6 posts by 2 participants.

I've submitted the bug, along with my proposed fixes. IPB have agreed it is a problem and are going to address it, but this may take weeks or months.

During the discussion in their forums, it emerged that some sites have been hit by Panda as a result.

I want to create my own short-term fix. What I want to do is apply noindex, follow to pages 2+.

On the same note, I want to noindex the profiles, as most members don't fill them in and they're largely duplicated.
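As a rough illustration of the short-term fix described above, here is a minimal sketch in Python (not IPB's own template code). The function name and the `page_type` / `page_number` inputs are hypothetical; a real forum template would supply its own equivalents.

```python
def robots_meta(page_type: str, page_number: int = 1) -> str:
    """Return a robots meta tag for a forum page.

    Assumption: page_type is "thread", "profile" or something else,
    and page_number starts at 1. Both names are made up for this sketch.
    """
    if page_type == "profile":
        # Profiles are mostly empty and near-identical, so keep them out of the index.
        noindex = True
    elif page_type == "thread" and page_number > 1:
        # Pages 2+ of a thread: not indexed, but links are still followed.
        noindex = True
    else:
        noindex = False

    content = "noindex, follow" if noindex else "index, follow"
    return f'<meta name="robots" content="{content}">'
```

With this sketch, `robots_meta("thread", 1)` keeps page 1 indexable, while `robots_meta("thread", 2)` and `robots_meta("profile")` both return the noindex, follow tag.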

Does noindex work with Panda? Obviously I can't delete the members or pages 2+, but I don't want to leave the duplicate content there to hurt me in the future.

Thanks a lot.
6:52 am on Jun 4, 2012 (gmt 0)

Duplicate title would be solved by adding " - Page n" to the end of the title.

There's a more insidious problem with forum software that is never addressed. When new content appears on page 1, what was on page 1 is now on page 2, and what was on page 2 is now on page 3.

There are two problems.
- Page 2 is seen as a duplicate of page 1 and page 3 is seen as a duplicate of page 2 until all pages have been respidered and reindexed.
- For a given search, page 2 is listed in search results and some text is shown in the snippet. However, since indexing there have been dozens of new posts or dozens of new threads. When I click the link in the SERP, the page I am taken to no longer contains the content shown in the snippet. It's now several pages away, and there's no clue how many pages away it might be.
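
The page drift described in the two points above can be shown with a small worked example. This is only a sketch of a newest-first listing; the item names, the page size of 2 and the `page_of` helper are all made up for illustration.

```python
def page_of(item: str, items_newest_first: list[str], per_page: int) -> int:
    """Return the 1-based page number on which an item currently appears."""
    index = items_newest_first.index(item)
    return index // per_page + 1


per_page = 2
listing = ["t5", "t4", "t3", "t2", "t1"]  # newest first

# Page the item is on when the crawler indexes it.
before = page_of("t3", listing, per_page)

# Three newer items arrive and push everything down the listing.
listing = ["t8", "t7", "t6"] + listing
after = page_of("t3", listing, per_page)

print(before, after)  # prints "2 3"
```

The URL indexed for page 2 still exists, but the item that earned the snippet has moved to page 3, and it keeps moving as more items arrive.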
9:59 am on Jun 4, 2012 (gmt 0)

g1smd, new content gets added to the last page of a thread in all the forum software that I've used. I've never seen it work as you've just described.

Although the thread index will work in the way you've described, the thread index would never be what ranks in the SERPs. It's just an index of sorts, similar to the way a WordPress blog would work, except that with a forum, new posts to a thread cause that thread to be bumped to the top of the list.

The forum threads do have "- Page n" appended to their titles, but they're still being flagged in GWT. The second page isn't being seen as page 2, just as a brand new thread. The issue lies mainly in their URL structure.

Personally, I think pages 2+ should have Page n first, followed by the title of the thread, essentially to un-optimise the later pages and prevent them competing with the first.
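
To make the two title styles concrete, here is a minimal sketch. The `page_title` function and its `prefix` flag are hypothetical and do not reflect how IPB builds its titles; the exact separator is also an assumption.

```python
def page_title(thread_title: str, page_number: int, prefix: bool = False) -> str:
    """Build a page title for one page of a paginated thread.

    prefix=False gives the suffix style ("Title - Page n").
    prefix=True gives the style proposed above ("Page n - Title").
    Page 1 always keeps the plain thread title.
    """
    if page_number <= 1:
        return thread_title
    if prefix:
        return f"Page {page_number} - {thread_title}"
    return f"{thread_title} - Page {page_number}"
```

For a thread called "Example thread", page 2 becomes "Example thread - Page 2" in the suffix style and "Page 2 - Example thread" in the prefix style. Either way each page gets a distinct title.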

But let's say I noindex 500,000 low-quality profiles. Is this enough for Panda? Or am I supposed to physically delete my members' profiles, or any "thin" content for that matter?
12:54 am on Jun 5, 2012 (gmt 0)

I cannot find an official word on this. Is noindex enough to remove "duplicate" content to recover from Panda? I realise that alone may not be enough to fully recover, but what I mean is, is noindex treated similarly to returning a 404?
8:18 pm on Jun 5, 2012 (gmt 0)

I've just found 56 copies of my homepage in Google's index. The joys of using a CMS with bugs.

I am guessing 56 copies of my homepage are going to be a giant problem with Panda?

This website survived Panda 100% until we updated our CMS, which introduced these crazy problems, and then boom, we were hit on the next update.

I didn't study Panda like I should have, largely because I wasn't affected and didn't expect to be.

Hopefully fixing all these issues will fix things.
7:47 am on Jun 6, 2012 (gmt 0)

