

WordPress And Google: Avoiding Duplicate Content Issues

What about posts in a few different categories?

     
3:50 pm on Sep 26, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:June 14, 2006
posts:107
votes: 0


Hey guys,

I was wondering what you think about blogs and WordPress. As you know, WordPress has categories in which it shows certain posts.
So I can have three categories: A, B, C, and make a post that is assigned to all three... it will show in each category, as well as on the main page and in the archives. As you can see, there are many places on the site where that one post shows up.

What do you think: is this duplicate content or not? How does Google treat such behaviour?

Any clues?

Thanks,
Manca

5:01 am on Sept 27, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 13, 2005
posts:52
votes: 0


Nope, not from what I see on my blogs. If Google can't find a page (like when I changed the names of some of them), it will go supplemental. Other than that, I wouldn't worry about the dup penalty or supplemental stuff.
10:36 am on Sept 27, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:June 14, 2006
posts:107
votes: 0


But you have the same content on the main page (index.php), the same content on the category page (category.php, or if rewritten, domain.com/category/name), the same content on the archive page (domain.com/2006/09/ for example), and the same content on the single post page (domain.com/2006/09/name_of-post/)... don't you think that's all duplicate content?
2:27 pm on Sept 27, 2006 (gmt 0)

Senior Member from KZ 

WebmasterWorld Senior Member lammert is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 10, 2005
posts:2886
votes: 1


It is duplicate content, and it can sometimes get you in trouble. I have a WordPress based blog on which I post about once a day. I use search engine friendly URLs, and I have a calendar in the sidebar where people can select posts from a specific date. Because there is often just one post per day, the result is that two almost identical pages are accessible for every post I make, i.e. /blog/2006/09/27/ and /blog/2006/09/27/here-the-subject/. The only difference between these two pages is that the date page has no title by default and the post page has a title.

I ran into huge duplicate content problems because of this about a year ago, and many pages went supplemental. I resolved the issue by putting an on-the-fly generated robots meta "noindex,follow" tag in the date pages and category pages. The indexable version is the post itself, which has a proper title to display in the SERPs and is therefore the most likely to be clicked. After this auto-generation of robots meta tags, all supplementals eventually disappeared and rankings increased.
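
In simplified form, the idea is something like this in header.php (a sketch rather than my exact code; is_date() and is_category() are the standard WordPress conditional tags):

<?php
// Sketch: noindex the date and category archive pages,
// leave everything else (posts, home page) indexable.
if ( is_date() || is_category() ) {
    echo '<meta name="robots" content="noindex,follow">';
} else {
    echo '<meta name="robots" content="index,follow">';
}
?>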

5:17 pm on Sept 27, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:June 14, 2006
posts:107
votes: 0


Umm, nice idea, lammert... I may try inserting that on category and archive pages... but what about the main page and the post page?
The same content is displayed there too, right?
12:46 am on Sept 28, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 30, 2006
posts:76
votes: 0


I was thinking about making a very similar thread.

I run a number of WordPress blogs, and recently my site got hit by Google; I believe it's because of the duplicate content.

I have about 20 categories, and often post articles in at least two of these categories.

So there is:
1) The post
2) The Index
3) Category 1
4) Category 2 (maybe more categories)
5) The Monthly Archive.

That is a LOT of duplicate content. 5x or more.

I now try to post my articles in as few categories as possible, and have blocked Google from my monthly archives (it's easier to navigate using the categories anyway); see the sketch below.

I'm not sure what else we can do to minimise dup content. Maybe add a "noindex" tag to the paginated index pages (after page one, of course).
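
For reference, blocking the monthly archives in robots.txt can look something like this (a sketch only, assuming date-based archive URLs like /2006/09/; adjust the paths to your own permalink structure):

User-agent: *
# Sketch: keep crawlers out of the date-based archives
Disallow: /2005/
Disallow: /2006/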

12:48 am on Sept 28, 2006 (gmt 0)

New User

5+ Year Member

joined:June 28, 2006
posts:4
votes: 0


This works well. It only allows the index, pages, and posts to be indexed.


<?php if ( is_home() || is_single() || is_page() ) {
echo '<meta name="robots" content="index,follow">';
} else {
echo '<meta name="robots" content="noindex,follow">';
} ?>
12:58 am on Sept 28, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 30, 2006
posts:76
votes: 0


The above code will also allow all paginated index pages to be indexed, i.e. /page/2/, /page/3/, etc.

If you only want the first page of the index to be indexed, wrap the "index,follow" tag in:

<?php if ( is_home() ) { ?>
<?php if ( $paged < 2 ) { ?>
<!-- your "index,follow" meta tag here -->
<?php } ?>
<?php } ?>

Thinking about this more, my posts and categories are the most important things to me, not so much page 2, page 3, page 4, etc. of the index.

I feel it would be best for me to allow indexing of posts, categories, and the front index page only.

What do you think?

===

I'm not sure how some large blogs get away with posting in many sections AND using the tagging system. Some posts have about 8 tags, leading to many, many duplicate pages!

[edited by: Ma2T at 12:59 am (utc) on Sep. 28, 2006]

1:12 am on Sept 28, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member marcia is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Sept 29, 2000
posts:12095
votes: 0


I'm going daft trying to figure out how to deal with the duplicates.

<?php if ( is_home() || is_single() || is_page() ) {
echo '<meta name="robots" content="index,follow">';
} else {
echo '<meta name="robots" content="noindex,follow">';
} ?>

Where does that code go? Which file(s) and where?

[edited by: Marcia at 1:40 am (utc) on Sep. 28, 2006]

1:17 am on Sept 28, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 30, 2006
posts:76
votes: 0


In header.php, in the <head> section
1:24 am on Sept 28, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:June 14, 2006
posts:107
votes: 0


Yeah, I think using a NOINDEX meta tag is by far the best way to keep Googlebot from indexing pages with duplicate content.
I still allow it to index the main page, all sub pages, and single post pages, though... it shouldn't be a big problem, I think. Categories and archives are the ones that have to be excluded from indexing, because they contain the same stuff all the time, while the main page and "sub pages" (page 1, page 2, page 3, etc.) show different content... for example, if you update daily you'll see different content on those pages all the time.
I'm sure big G loves fresh and unique content updated regularly, hence I think it's still OK to have the main page and sub pages available to be indexed...

Just my 2 cents. I've already done that. Let's see the results :)

Manca

1:34 am on Sept 28, 2006 (gmt 0)

New User

10+ Year Member

joined:Sept 1, 2005
posts:14
votes: 0


I've also been having problems with Google indexing the /post/feed page instead of the /post.
For some reason both versions are in supplemental. Dunno if it might be because of dup content issues.
Read the thread:
[webmasterworld.com...]
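
I'm thinking of keeping Googlebot out of the feed URLs via robots.txt, maybe something like this (just a sketch; as far as I know Googlebot supports the * wildcard in Disallow paths, but other bots may not):

User-agent: Googlebot
# Sketch: block the per-post feed URLs and the main feed
Disallow: /*/feed/
Disallow: /feed/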
1:37 am on Sept 28, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 30, 2006
posts:76
votes: 0


I agree with you, manca, and good luck.

I think we first have to make a choice and use one of two systems: date-based (monthly archives) or categories. For me categories are very important, so I will go with those rather than the dates.

Also, I think categories are more important to me than, say, page 4 and page 5 of my main site. (We also link to categories from every page, and we don't link to page 5 from every page.)

I think this is my final answer for my situation.

Allow:
Main index page, articles, categories.

Disallow:
Pages 2, 3, 5, etc. of the index; monthly archives.
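
In header.php terms, that plan might look something like this (a sketch using the standard conditional tags; $paged holds the current page number of a paginated listing):

<?php
// Sketch of the plan above: index the front page (page 1 only),
// single posts, and category pages; noindex everything else.
if ( ( is_home() && $paged < 2 ) || is_single() || is_category() ) {
    echo '<meta name="robots" content="index,follow">';
} else {
    echo '<meta name="robots" content="noindex,follow">';
}
?>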

1:44 am on Sept 28, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member marcia is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Sept 29, 2000
posts:12095
votes: 0


When trying parent and child categories on a test blog recently, I set up categories like so:

/parent/child1/child2/

The same posts get archived in all of them, all the way up the tree, however many there are. If an entry is for /child2/, it ends up in all of them.

Any ideas on how to handle that?

1:46 am on Sept 28, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:June 14, 2006
posts:107
votes: 0


Not bad advice, Ma2T...
I'll try to think something up. You're probably right about categories: they are very important and it would be bad not to have them indexed. On the other hand, older pages (2, 3, 4...) are not that important, as they are really just numbers in the URI and don't have a lot of links pointing to them, so their rankings will be low anyway.

Pretty good thinking ;) Thanks for giving me some clues. I was definitely flying blind.

1:50 am on Sept 28, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 30, 2006
posts:76
votes: 0


Marcia, you have just made me realise another problem. I have about 15 categories under one category... that's even more duplicates under the main category :/

The more parents, the worse it is, I guess.

You can add a "noindex" to certain categories.

Tags:

is_category('6')
True when the archive page for category 6 is being displayed.

is_category('Cheeses')
True when the archive page for the category named "Cheeses" is being displayed.

E.g.:
<?php if ( is_category('6') ) { ?>
<!-- your "noindex" meta tag here -->
<?php } ?>

Now it's time to add this to my blog!

--

No problem, manca, I'm glad I could help. I'm still giving this some thought to work out the best way. I agree with you on the whole page number thing too; good thinking. It's going to be hard to eliminate all duplicate content, but hopefully it won't be too much of a problem.

[edited by: Ma2T at 1:55 am (utc) on Sep. 28, 2006]

2:02 am on Sept 28, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 30, 2006
posts:76
votes: 0


Does anyone know how Google behaves here?

If I add a "noindex" to site.com/category/ (only that page),

would it stop site.com/category/article-name/ from being indexed? That page would not include the "noindex" tag.

I'm just wondering whether Google would pass this restriction down to the rest of the folder.

I'm hoping not... I assume it wouldn't, but I'd like some confirmation if possible.

2:02 am on Sept 28, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:June 14, 2006
posts:107
votes: 0


Actually, there will be a lot of child categories... and don't forget the additional pages for certain categories... for example, /category/name/page/2.

Damn...
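
One thought on those paginated duplicates: WordPress has an is_paged() conditional tag that is true on page 2 and beyond of any listing (the index, categories, archives), so something like this sketch would catch all of them at once:

<?php
// Sketch: is_paged() is true on /page/2/ and beyond of any listing,
// so this noindexes every paginated duplicate in one go.
if ( is_paged() ) {
    echo '<meta name="robots" content="noindex,follow">';
}
?>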

3:05 am on Sept 28, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 17, 2006
posts:61
votes: 0


"noindex" is a page attribute, so no, it wouldn't cause a problem with other pages deeper in your site–unless of course you use nofollow as well.

If you block it in Robots.txt then that would be a different story ;)

I had this same duplicate content problem, but it didn't become a problem until I got some serious link juice, which caused Google to finally deep-crawl my site, and hence find all of those category pages.

I managed to fix it via robots.txt and meta noindex tags.

Works like a dream, but may take a while for Google to sort it out once you make the changes.

I now only allow indexing of my index page, and my post pages. Everything else is blocked.

[edited by: Dead_Elvis at 3:06 am (utc) on Sep. 28, 2006]

3:21 am on Sept 28, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:Aug 13, 2002
posts:408
votes: 0


With the millions of blogs and the huge popularity of WordPress as a platform, one would think Google would already be taking these issues into consideration.
5:00 am on Sept 28, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:June 18, 2005
posts:49
votes: 0


I tried the code in my header.php (inside the head tag) and I keep getting a parse error... any ideas why?

thnx

5:58 am on Sept 28, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:Aug 1, 2006
posts:112
votes: 0


This forum software breaks the pipe character (it shows ¦ instead of |), so if you copied the code as it was originally posted, replace the broken pipes with regular single pipes. Check those single quote marks too; they need to be straight quotes, not curly ones, or PHP will throw a parse error.

I'm not sure how concerned I should be with this since my WP is displayed in an iframe, but I made some of the modifications anyway. The container page had been sitting at noindex for a while, and that was changed today. We'll see how the iframe gets handled from here.

8:55 am on Sept 28, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:June 14, 2006
posts:107
votes: 0


Well, I assume we have to wait for Google and see what's going to happen.
2:08 pm on Sept 28, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Nice to see this thread about WordPress here at WebmasterWorld.

These are the same sorts of issues that I have been banging on about with forums, such as vBulletin, for the last year or two.

If you herd the bot into indexing only what you want indexed and restrict all the alternative URLs, you will not see any Supplemental Results for your site.

If you are already indexed, it will take a year for the supplemental results to fade out, but you will notice other improvements within a month or so of making the changes.

6:19 pm on Sept 28, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 16, 2003
posts:746
votes: 0


I have heard Google handles WordPress out of the box with no problem. I can't speak from experience, since all my blogs are small.

Matt Cutts uses WordPress. Search for the character that Matt dressed up as last year; Matt seems to rank OK for that term.

You spammed my index... prepare to die!

8:31 pm on Sept 28, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:June 18, 2005
posts:49
votes: 0


smells so good-

Thanks a lot for your help! I was pulling my hair out for a few hours over this! :0)

9:09 pm on Sept 28, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:June 14, 2006
posts:107
votes: 0


So, what are we going to do now? Filter the additional pages, filter the archives, and leave the main page, categories, and single posts ready for INDEX?

I actually did that; now we wait for the results...

9:16 pm on Sept 28, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Don't forget that if anything is Supplemental, it will hang around in the SERPs for many, many months. That doesn't mean the fix isn't working; it just means that Google hangs on to Supplemental results for a long time.

Your measure of success is in seeing how well the URLs that you do want to be indexed are doing.

10:01 pm on Sept 28, 2006 (gmt 0)

Junior Member

5+ Year Member

joined:June 14, 2006
posts:107
votes: 0


Yeah,
thanks for that info, g1smd.
I just hope I won't get supplemental results, because as of now I have probably only one or two pages in the supplemental index out of 100 indexed.
Another question I'd like to ask here is about pages.
Namely, when I search for domain.com/page it doesn't appear in first place in the SERPs; I don't know why that is.

Another thing I noticed about those pages is that Google indexed both domain.com/page and domain.com/page/, and interestingly, neither of them is in the supplemental index. Very weird... I don't get it.
What do you recommend I do with those pages? Should I interlink them as page/ or page? They are not actual directories but, as you know, mod_rewritten dynamic URLs.

Got clues?

10:10 pm on Sept 28, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Well, that is two different URLs leading to exactly the same content, so that is classic "duplicate content".

Get your .htaccess file to rewrite one form to the other, and issue a "301" for the original one. That will cure it.

Which one, and which way, is up to you...
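
For example, forcing the trailing-slash version with mod_rewrite might look something like this (a sketch only; test it carefully against your own rewrite rules):

RewriteEngine On
# Sketch: 301 any non-slash URL to its trailing-slash twin,
# but leave requests for real files (images, CSS, etc.) alone.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^(.+[^/])$ /$1/ [R=301,L]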

.

Don't worry about any URLs that appear as Supplemental Results after they have been turned into redirects. That is normal. Google hangs on to URLs that return a 301 or a 404 for a year after they start doing so.

They do NOT count as duplicate content if their HTTP code is 301 or 404. They will get cleaned up soon enough.

Your measure of success is seeing that the URLs you do want indexed actually get indexed, and that they are no longer tagged as Supplemental, perhaps a few weeks after the fixes are put in place.

Again, you only need to look into why a URL is Supplemental if that URL returns a "200 OK" response.
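
An easy way to see which HTTP code a URL actually returns is a HEAD request from the command line (the URL here is just a placeholder):

# Hypothetical URL; the first line of the output shows 200, 301, or 404
curl -I http://www.example.com/2006/09/name-of-post/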
