
WordPress And Google: Avoiding Duplicate Content Issues

What about posts in a few different categories?

   
3:50 pm on Sep 26, 2006 (gmt 0)

5+ Year Member



Hey guys,

I was wondering what you think about blogs and WordPress. As you know, WordPress has categories in which it shows certain posts.
So I can have 3 categories, A, B, and C, and then make a post that is filed in all 3 cats... it'll show in each category, as well as on the main page and in the archives. As you can see, there are many places on the site where that one post shows up.

What do you think, is this duplicate content or not? How does Google treat such behaviour?

Any clues?

Thanks,
Manca

5:01 am on Sep 27, 2006 (gmt 0)

10+ Year Member



Nope, not from what I see on my blogs. If it can't find the page (like when I changed the names of some of them), it will go supplemental. Other than that, I wouldn't worry about that dup penalty or supplemental stuff.
10:36 am on Sep 27, 2006 (gmt 0)

5+ Year Member



But you have the same content on the main page (index.php), the same content on the category page (category.php, or if rewritten, domain.com/category/name), the same content on the archive page (domain.com/2006/09/ for example), and the same content on the single post page (domain.com/2006/09/name-of-post/)... don't you think that's all duplicate content?
2:27 pm on Sep 27, 2006 (gmt 0)

WebmasterWorld Senior Member lammert is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



It is duplicate content, and this can sometimes get you in trouble. I have a WordPress-based blog on which I post about once each day. I use search engine friendly URLs, and I have a calendar in the sidebar where people can select posts for a specific date. Because there is often just one post per day, the result is that two almost identical pages are accessible for every post I make, i.e. /blog/2006/09/27/ and /blog/2006/09/27/here-the-subject/. The only difference between these two pages is that the date page has no title by default and the post page has a title.

I ran into huge duplicate content problems because of this about a year ago, and many pages went supplemental. I resolved the issue by putting an on-the-fly generated robots meta "noindex,follow" on the date pages and category pages. The indexable version is the post itself, which has a proper title to display in the SERPs and is therefore the most likely to be clicked. After this auto-generation of robots meta tags, all supplementals eventually disappeared and rankings increased.
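
A minimal sketch of that kind of on-the-fly tag, assuming the standard WordPress conditionals (is_date() covers the date archives) and a theme's header.php; treat it as a starting point, not lammert's exact code:

<?php
// Sketch: mark date and category archives noindex,follow so the
// single post page stays the only indexable copy of each post.
if ( is_date() || is_category() ) {
echo '<meta name="robots" content="noindex,follow">';
}
?>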

5:17 pm on Sep 27, 2006 (gmt 0)

5+ Year Member



Umm, nice idea, lammert... I may try inserting that on category and archive pages... but what about the main page and the post page?
The same content is displayed there too, right?
12:46 am on Sep 28, 2006 (gmt 0)

5+ Year Member



I was thinking about making a very similar thread.

I run a number of WordPress blogs, and recently my site got hit by Google; I believe it's because of the duplicate content.

I have about 20 categories, and often post articles in at least two of these categories.

So there is:
1) The post
2) The Index
3) Category 1
4) Category 2 (maybe more categories)
5) The Monthly Archive.

That is a LOT of duplicate content. 5x or more.

I now try to post my articles in as few categories as possible, and have blocked Google from my monthly archives (it's easier to navigate using the categories anyway).

I'm not sure what else we can do to minimise dup content. Maybe add a "noindex" tag to the index pages (after page one, of course).

12:48 am on Sep 28, 2006 (gmt 0)

5+ Year Member



This works well. It only allows the index, pages, and posts to be indexed.


<?php if(is_home() || is_single() || is_page()){
echo '<meta name="robots" content="index,follow">';
} else {
echo '<meta name="robots" content="noindex,follow">';
}?>
12:58 am on Sep 28, 2006 (gmt 0)

5+ Year Member



The above code will also allow all index pages to be indexed, i.e. /page/2/, /page/3/, etc.

If you only want the first page to be indexed, add in

<?php if ( is_home() ) {?>
<?php if ( $paged < 2 ) {?>
code...
<?php }?>
<?php }?>

On thinking about this more, my posts and categories are most important to me, and not so much page 2, page 3, page 4, etc. of the index.

I feel it would be best for me to allow indexing of posts, categories, and the front index page only.
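
A hedged sketch of that policy, assuming the standard conditionals and WordPress's paged query variable; a sketch only, not a drop-in:

<?php
// Sketch: index the front page (page one of the home index),
// single posts, and category archives; noindex everything else.
$paged = intval( get_query_var('paged') );
if ( ( is_home() && $paged < 2 ) || is_single() || is_category() ) {
echo '<meta name="robots" content="index,follow">';
} else {
echo '<meta name="robots" content="noindex,follow">';
}
?>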

What do you think?

===

I'm not sure how some large blogs can get away with posting in many sections AND using the tagging system. Some posts have about 8 tags, leading to many, many duplicate pages!

[edited by: Ma2T at 12:59 am (utc) on Sep. 28, 2006]

1:12 am on Sep 28, 2006 (gmt 0)

WebmasterWorld Senior Member marcia is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I'm going daft trying to figure out how to deal with the duplicates.

<?php if(is_home() || is_single() || is_page()){
echo '<meta name="robots" content="index,follow">';
} else {
echo '<meta name="robots" content="noindex,follow">';
}?>

Where does that code go? Which file(s) and where?

[edited by: Marcia at 1:40 am (utc) on Sep. 28, 2006]

1:17 am on Sep 28, 2006 (gmt 0)

5+ Year Member



In header.php, in the <head> section
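
Something like this, for example (just a sketch; every theme's header.php looks a little different):

<head>
<title><?php bloginfo('name'); ?> <?php wp_title(); ?></title>
<?php // the robots meta conditional from the earlier posts goes here ?>
</head>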
1:24 am on Sep 28, 2006 (gmt 0)

5+ Year Member



Yeah, I think using a NOINDEX meta is by far the best way to keep Gbot from indexing pages with dup content.
I still allow it to index the main page, all pages, and single post pages, though... it shouldn't be a big problem, I think. Categories and archives are the ones that have to be excluded from indexing, because they contain the same stuff all the time, while the main page and "sub pages" (page 1, page 2, page 3, etc.) have different content... for example, if you update daily you'll see different content on those pages all the time...
I am sure big G loves fresh and unique content updated regularly, hence I think it's still OK to have the main page and sub pages available to be indexed...

Just my 2 cents. I've already done that. Let's see the results :)

Manca

1:34 am on Sep 28, 2006 (gmt 0)

5+ Year Member



I've also been having problems with Google indexing the /post/feed page instead of the /post.
For some reason both versions are in supplemental. Dunno if it might be because of dup content issues.
Read the thread:
[webmasterworld.com...]
1:37 am on Sep 28, 2006 (gmt 0)

5+ Year Member



I agree with you manca, and good luck.

I think we first have to make a choice and use one of two systems: date based (monthly archives) or categories. For me categories are very important, so I will go with those rather than the dates.

Also, I think categories are more important to me than, say, page 4 and page 5 of my main site. (Also, we link to categories from every page, and we don't link to page 5 from every page.)

I think this is my final answer for my situation.

Allow:
Main Index page, Articles, Categories.

Disallow:
Pages 2, 3, 5, etc. from the index; monthly archives.

1:44 am on Sep 28, 2006 (gmt 0)

WebmasterWorld Senior Member marcia is a WebmasterWorld Top Contributor of All Time 10+ Year Member



When trying out parent and child categories on a test blog recently, I set them up like so:

/parent/child1/child2/

The same posts get archived in all of them up the tree, however many there are. If an entry is for /child2/, it ends up in all of them.

Any ideas on how to handle that?

1:46 am on Sep 28, 2006 (gmt 0)

5+ Year Member



Not bad advice, Ma2T...
I'll try to think something up. You're probably right about categories: they are very important, and it would be bad not to have them indexed. On the other hand, older pages (2, 3, 4...) are not that important, as they are really only numbers in the URI and don't have many links pointing to them, so their rankings will be low anyway.

Pretty good thinking ;) Thanks for giving me some clues. I was definitely blind.

1:50 am on Sep 28, 2006 (gmt 0)

5+ Year Member



Marcia, you have just made me realise another problem. I have about 15 categories under one category... that's even more duplicates under the main category :/

The more parents, the worse it is, I guess.

You can add a "noindex" to certain categories

Tags:

is_category('6')
When the archive page for Category 6 is being displayed.
is_category('Cheeses')
When the archive page for the Category with Name "Cheeses" is being displayed.

Eg:
<?php if ( is_category('6') ) {?>
code..
<?php }?>

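Filled in, that might look like this (a sketch; the category ID is only an example):

<?php if ( is_category('6') ) {
echo '<meta name="robots" content="noindex,follow">';
} ?>
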
Now it's time to add this to my blog!

--

No problem, manca, I'm glad I could help. I'm still giving this some thought to work out the best way. I agree with you on the whole page number thing too, good thought. It's going to be hard to eliminate all duplicate content, but hopefully it won't be too much of a problem.

[edited by: Ma2T at 1:55 am (utc) on Sep. 28, 2006]

2:02 am on Sep 28, 2006 (gmt 0)

5+ Year Member



Does anyone know how Google behaves here?

If I add a "noindex" to site.com/category/ (only that page)

Would it stop site.com/category/article-name/ from being indexed? That page would not include the "noindex" tag.

I'm just wondering if Google would pass this restriction down to the rest of the folder.

I'm hoping not.. I assume it wouldn't, but I would like some confirmation if possible.

2:02 am on Sep 28, 2006 (gmt 0)

5+ Year Member



Actually, there will be a lot of child categories... and I just remembered the additional pages for certain categories, for example /category/name/page/2.

damn...

3:05 am on Sep 28, 2006 (gmt 0)

5+ Year Member



"noindex" is a page attribute, so no, it wouldn't cause a problem with other pages deeper in your site–unless of course you use nofollow as well.

If you block it in robots.txt, then that would be a different story ;)

I had this same duplicate content problem, but it didn't become a problem until I got some serious link juice, which caused Google to finally deep-crawl my site, and hence find all of those category pages.

I managed to fix it via robots.txt and meta noindex tags.

Works like a dream, but may take a while for Google to sort it out once you make the changes.

I now only allow indexing of my index page, and my post pages. Everything else is blocked.
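
For anyone wanting to try the robots.txt side of that, a hedged sketch (the paths are illustrative and depend on your permalink structure, so check them against your own URLs first):

User-agent: *
Disallow: /category/
Disallow: /page/
Disallow: /feed/
# Only add date folders like /2006/ if your post permalinks
# don't live under them, or you'll block the posts too.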

[edited by: Dead_Elvis at 3:06 am (utc) on Sep. 28, 2006]

3:21 am on Sep 28, 2006 (gmt 0)

10+ Year Member



With the millions of blogs and the huge popularity of WordPress as a platform, one would think Google would already be taking these issues into consideration.
5:00 am on Sep 28, 2006 (gmt 0)

5+ Year Member



I tried the code in my header.php (inside the head tag) and I keep getting a parse error... any ideas why?

Thanks

5:58 am on Sep 28, 2006 (gmt 0)

5+ Year Member



This forum breaks the pipe character (it renders as ¦); you need to replace it with a real single pipe (|) when you paste the code. Check those single quote marks too: they need to be straight quotes, not curly ones.

I'm not sure how concerned I should be with this, since my WP is displayed in an iframe, but I made some of the modifications anyway. The container page had been sitting at noindex for a while, and that was changed today. We'll see how the iframe gets handled from here.

8:55 am on Sep 28, 2006 (gmt 0)

5+ Year Member



Well, I assume we have to wait for Google and see what's going to happen.
2:08 pm on Sep 28, 2006 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Nice to see this thread about WordPress here at WebmasterWorld.

These are the same sorts of issues that I have been banging on about with forums, such as vBulletin, for the last year or two.

If you herd the bot into indexing what you want to be indexed and restrict all the alternative URLs, you will not see any Supplemental Results for your site.

If you are already indexed, it will take a year for the supplemental results to fade out, but you will notice other improvements within a month or so of making the changes.

6:19 pm on Sep 28, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have heard Google handles WordPress out of the box with no problem. I can't speak from experience, since all my blogs are small.

Matt Cutts uses WordPress. Search for the character that Matt dressed up as last year. Matt seems to rank OK for that term.

You spammed my index...prepare to die!

8:31 pm on Sep 28, 2006 (gmt 0)

5+ Year Member



smells so good -

Thanks a lot for your help! I was pulling my hair out for a few hours over this! :0)

9:09 pm on Sep 28, 2006 (gmt 0)

5+ Year Member



So, what are we going to do now? Filter the additional pages, filter the archives, and leave the main page, categories, and single posts open for indexing?

I actually did that, we'll be waiting for results...

9:16 pm on Sep 28, 2006 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Don't forget that if anything is Supplemental, it will hang around in the SERPs for many, many months. That doesn't mean the fix isn't working; it just means that Google hangs on to Supplemental results for a long time.

Your measure of success is in seeing how well the URLs that you do want to be indexed are doing.

10:01 pm on Sep 28, 2006 (gmt 0)

5+ Year Member



Yeah,
thanks for that info, g1smd.
I just hope I won't get supplemental results, because as of now I have probably only one or two pages in the supplemental index out of 100 indexed.
Another question I'd like to ask here is about pages.
Namely, when I search for domain.com/page it doesn't appear in first place in the SERPs, and I don't know why.

Another thing I noticed about those pages is that Google indexed both domain.com/page and domain.com/page/, yet interestingly neither of them is in the supplemental index. Very weird... I don't get it.
What do you recommend I do with those pages? Should I link to them as page/ or page? They are not actual directories but, as you know, mod_rewritten dynamic URLs.

Got clues?

10:10 pm on Sep 28, 2006 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Well, that is two different URLs leading to the exact same content, so that is classic "duplicate content".

Get your .htaccess file to rewrite one form to the other, and issue the "301" for the original one. That will cure it.

Which one, and which way, is up to you...
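
For example, a hedged mod_rewrite sketch that forces the trailing-slash form (example.com is a placeholder; invert the logic if you prefer the slashless form, and put it above the WordPress rules in .htaccess):

RewriteEngine On
# 301 any slashless URL that isn't a real file to its trailing-slash twin.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.*[^/])$ http://www.example.com/$1/ [R=301,L]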


Don't worry about any URLs that appear as Supplemental Results after they have been turned into redirects. That is normal. Google hangs on to URLs that return a 301 or a 404 for one year after they start doing so.

They do NOT count as Duplicate Content if their HTTP code is 301 or 404. They will get cleaned up soon enough.

Your measure of success is in seeing that the URLs that you do want to be indexed do get indexed, and that they are no longer tagged as Supplemental, perhaps a few weeks after the fixes are put in place.

Again, you only need to look into why a URL is Supplemental if that URL returns a "200 OK" response.
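
A quick hedged way to check that from PHP 5 is get_headers() (the URL here is a placeholder):

<?php
// Sketch: print the HTTP status line for a URL so you can tell
// 200, 301 and 404 apart before worrying about Supplementals.
$headers = get_headers('http://www.example.com/blog/some-post/');
echo $headers[0]; // e.g. "HTTP/1.1 200 OK"
?>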
