WordPress And Google: Avoiding Duplicate Content Issues
What about posts in a few different categories?
manca




msg:3097708
 3:50 pm on Sep 26, 2006 (gmt 0)

Hey guys,

I was wondering what you think about blogs and WordPress. As you know, WordPress has categories in which it shows certain posts.
So I can have 3 categories, A, B, C, and then make a post that goes into all 3... it'll show in each category, as well as on the main page and in the archives. As you can see, there are many places on the site where that one post shows up.

What do you think: is this duplicate content or not? How does Google treat this kind of behaviour?

Any clues?

Thanks,
Manca

 

mrhazelj




msg:3098585
 5:01 am on Sep 27, 2006 (gmt 0)

Nope, not from what I see on my blogs. If Google can't find a page (like when I changed the names of some of them), it will go supplemental. Other than that, I wouldn't worry about the dup penalty or supplemental stuff.

manca




msg:3098811
 10:36 am on Sep 27, 2006 (gmt 0)

But you have the same content on the main page (index.php), the same content on the category page (category.php, or domain.com/category/name if rewritten), the same content on the archive page (domain.com/2006/09/, for example), and the same content on the single post page (domain.com/2006/09/name-of-post/)... don't you think that's all duplicated content?

lammert




msg:3099052
 2:27 pm on Sep 27, 2006 (gmt 0)

It is duplicate content, and it can sometimes get you in trouble. I have a WordPress-based blog on which I post about once a day. I use search engine friendly URLs, and I have a calendar in the sidebar where people can select posts from a specific date. Because there is often just one post per day, the result is that two nearly identical pages are accessible for every post I make, i.e. /blog/2006/09/27/ and /blog/2006/09/27/here-the-subject/. The only difference between these two pages is that the date page has no title by default and the post page has a title.

I ran into huge duplicate problems because of this about a year ago, and many pages went supplemental. I resolved the issue by putting an on-the-fly generated robots meta tag of "noindex,follow" on the date pages and category pages. The indexable version is the post itself, which has a proper title to display in the SERPs and is therefore the most likely to be clicked. After this auto-generation of robots meta tags, all supplementals eventually disappeared and rankings increased.
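
A minimal sketch of that approach, for anyone who wants to try it: assuming a standard WordPress theme and the stock conditional tags, an on-the-fly robots meta tag in header.php (inside the <head> section) could look something like this. It is an illustration of the idea, not lammert's actual code.

<?php
// Noindex date and category archives; index everything else.
// is_date() covers the /2006/09/27/ style year/month/day archives.
if ( is_date() || is_category() ) {
    echo '<meta name="robots" content="noindex,follow">';
} else {
    echo '<meta name="robots" content="index,follow">';
}
?>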

manca




msg:3099289
 5:17 pm on Sep 27, 2006 (gmt 0)

Hmm, nice idea, lammert... I may try inserting that on category and archive pages... but what about the main page and the post page?
The same content is displayed there too, right?

Ma2T




msg:3099829
 12:46 am on Sep 28, 2006 (gmt 0)

I was thinking about making a very similar thread.

I run a number of WordPress blogs, and recently my site got hit by Google, and I believe it's because of the duplicate content.

I have about 20 categories, and often post articles in at least two of these categories.

So there is:
1) The post
2) The Index
3) Category 1
4) Category 2 (maybe more categories)
5) The Monthly Archive.

That is a LOT of duplicate content. 5x or more.

I now try to post my articles in as few categories as possible, and I have blocked Google from my monthly archives (it's easier to navigate using the categories anyway).

I'm not sure what else we can do to minimise dup content. Maybe add a "noindex" tag to the paginated index pages (after page one, of course).

cheesehead2




msg:3099832
 12:48 am on Sep 28, 2006 (gmt 0)

This works well. It only allows the index, pages, and posts to be indexed.


<?php if ( is_home() || is_single() || is_page() ) {
echo '<meta name="robots" content="index,follow">';
} else {
echo '<meta name="robots" content="noindex,follow">';
} ?>

Ma2T




msg:3099844
 12:58 am on Sep 28, 2006 (gmt 0)

The above code will also allow all paginated index pages to be indexed, i.e. page 2, page 3, etc.

If you only want the first page to be indexed, add in

<?php if ( is_home() ) {?>
<?php if ( $paged < 2 ) {?>
code...
<?php }?>
<?php }?>
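
A hedged variant of the same idea, assuming it also sits in header.php: WordPress's is_paged() conditional is true on page 2 and beyond of any listing, which avoids relying on the $paged variable being in scope in the template.

<?php
// Index only the first page of the home index; noindex page 2, 3, ...
if ( is_home() && ! is_paged() ) {
    echo '<meta name="robots" content="index,follow">';
} else {
    echo '<meta name="robots" content="noindex,follow">';
}
?>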

On thinking about this more, my posts and categories are most important to me, and not so much page 2, page 3, page 4, etc. of the index.

I feel it would be best for me to allow indexing of posts, categories, and the front index page only.

What do you think?

===

I'm not sure how some large blogs can get away with posting in many sections AND using the tagging system. Some posts have about 8 tags, leading to many, many duplicate pages!

[edited by: Ma2T at 12:59 am (utc) on Sep. 28, 2006]

Marcia




msg:3099852
 1:12 am on Sep 28, 2006 (gmt 0)

I'm going daft trying to figure out how to deal with the duplicates.

<?php if ( is_home() || is_single() || is_page() ) {
echo '<meta name="robots" content="index,follow">';
} else {
echo '<meta name="robots" content="noindex,follow">';
} ?>

Where does that code go? Which file(s) and where?

[edited by: Marcia at 1:40 am (utc) on Sep. 28, 2006]

Ma2T




msg:3099856
 1:17 am on Sep 28, 2006 (gmt 0)

In header.php, in the <head> section
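
For anyone unsure of the placement, a rough sketch of a header.php <head> section with the conditional dropped in; the surrounding markup here is illustrative only, not taken from any poster's theme.

<head>
<title><?php wp_title(); ?> <?php bloginfo('name'); ?></title>
<?php if ( is_home() || is_single() || is_page() ) { ?>
<meta name="robots" content="index,follow">
<?php } else { ?>
<meta name="robots" content="noindex,follow">
<?php } ?>
<?php wp_head(); ?>
</head>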

manca




msg:3099858
 1:24 am on Sep 28, 2006 (gmt 0)

Yeah, I think using a NOINDEX meta tag is by far the best way to keep Gbot from indexing pages with dup content.
I still allow it to index the main page, all pages, and single post pages, though... it shouldn't be a big problem, I think... categories and archives are the ones that have to be excluded from indexing, because they contain the same stuff all the time, while the main page and "sub pages" (page 1, page 2, page 3, etc.) have different content... for example, if you update daily you'll see different content on those pages all the time...
I am sure big G loves fresh and unique content updated regularly, hence I think it's still OK to have the main page and sub pages available to be indexed...

Just my 2 cents. I've already done that. Let's see the results :)

Manca

kektex




msg:3099875
 1:34 am on Sep 28, 2006 (gmt 0)

I've also been having problems with Google indexing the /post/feed page instead of the /post.
For some reason both versions are in supplemental. Dunno if it might be because of dup content issues.
Read the thread:
[webmasterworld.com...]

Ma2T




msg:3099878
 1:37 am on Sep 28, 2006 (gmt 0)

I agree with you manca, and good luck.

I think we first have to make a choice and use one of two systems: date-based (monthly archives) or categories. For me categories are very important, so I will go with those rather than the dates.

Also, I think that categories are more important to me than, say, page 4 and page 5 of my main site. (We also link to categories from every page, and we don't link to page 5 from every page.)

I think this is my final answer for my situation.

Allow:
Main Index page, Articles, Categories.

Disallow:
Pages 2, 3, 5, etc. from the index, and the monthly archives.
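
Put together, a sketch of that rule set, assuming it sits in header.php and uses the standard WordPress conditional tags (is_paged() is true on page 2 and beyond of a listing); an illustration only, not a drop-in fix.

<?php
// Index: front page (page 1 only), single posts, category archives.
// Noindex: paginated index pages, date archives, everything else.
if ( ( is_home() && ! is_paged() ) || is_single() || is_category() ) {
    echo '<meta name="robots" content="index,follow">';
} else {
    echo '<meta name="robots" content="noindex,follow">';
}
?>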

Marcia




msg:3099887
 1:44 am on Sep 28, 2006 (gmt 0)

When trying parent and child categories on a test blog recently, I tried categories like so

/parent/child1/child2/

The same posts get archived in all of them up the tree, however many there are. If an entry is posted to /child2/, it ends up in all of them.

Any ideas on how to handle that?

manca




msg:3099889
 1:46 am on Sep 28, 2006 (gmt 0)

Not bad advice, Ma2T...
I'll try to think something up. You're probably right about categories: they are very important and it would be bad not to have them indexed. On the other hand, older pages (2, 3, 4...) are not that important, as they are really just numbers in the URI and don't have a lot of links pointing to them, so their rankings will be low anyway.

Pretty good thinking ;) Thanks for giving me some clues. I was definitely missing that.

Ma2T




msg:3099899
 1:50 am on Sep 28, 2006 (gmt 0)

Marcia, you have just made me realise another problem. I have about 15 categories under one category... That's even more duplicates under the main category :/

The more parents, the worse it is, I guess.

You can add a "noindex" to certain categories:

Conditional tags:

is_category('6')
When the archive page for Category 6 is being displayed.
is_category('Cheeses')
When the archive page for the Category with Name "Cheeses" is being displayed.

Eg:
<?php if ( is_category('6') ) {?>
code..
<?php }?>
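
Filling in the "code.." part with the actual noindex output, a sketch of per-category noindexing; the '6' and 'Cheeses' values are just the example ID and name from above, not real categories.

<?php
// Noindex particular category archives by ID or by name.
if ( is_category( '6' ) || is_category( 'Cheeses' ) ) {
    echo '<meta name="robots" content="noindex,follow">';
}
?>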

Now it's time to add this to my blog!

--

No problem, manca, I'm glad I could help. I'm still giving this some thought to work out the best way. I agree with you on the whole page-number thing too, good thought. It's going to be hard to eliminate all duplicate content, but hopefully it won't be too much of a problem.

[edited by: Ma2T at 1:55 am (utc) on Sep. 28, 2006]

Ma2T




msg:3099907
 2:02 am on Sep 28, 2006 (gmt 0)

Does anyone know how Google behaves here?

If I add a "noindex" to site.com/category/ (only that page),

would it stop site.com/category/article-name/ from being indexed? That page would not include the "noindex" tag.

I'm just wondering whether Google would pass this restriction down to the rest of the folder.

I'm hoping not. I assume it wouldn't, but I would like some confirmation if possible.

manca




msg:3099908
 2:02 am on Sep 28, 2006 (gmt 0)

Actually, there will be a lot of child categories... Just recall the additional pages for certain categories... for example category/name/page/2

damn...

Dead_Elvis




msg:3099942
 3:05 am on Sep 28, 2006 (gmt 0)

"noindex" is a page attribute, so no, it wouldn't cause a problem with other pages deeper in your site–unless of course you use nofollow as well.

If you block it in Robots.txt then that would be a different story ;)

I had this same duplicate content problem, but it didn't become a problem until I got some serious link juice, which caused Google to finally deep-crawl my site, and hence find all of those category pages.

I managed to fix it via robots.txt and meta noindex tags.

Works like a dream, but may take a while for Google to sort it out once you make the changes.

I now only allow indexing of my index page, and my post pages. Everything else is blocked.

[edited by: Dead_Elvis at 3:06 am (utc) on Sep. 28, 2006]
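
A sketch of the robots.txt side of that kind of setup, assuming "pretty" permalinks where categories, date archives, and feeds live under these paths; the exact lines depend entirely on your own permalink structure, so treat this as an illustration only. Also bear in mind that robots.txt blocking stops crawling, while a meta noindex has to be crawled to be seen, so the two shouldn't both be applied to the same URLs if you want the noindex to be read.

# robots.txt sketch only - paths assume default "pretty" permalinks
User-agent: *
# category archives
Disallow: /category/
# date archives (one line per year)
Disallow: /2006/
# feeds
Disallow: /feed/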

vik_c




msg:3099956
 3:21 am on Sep 28, 2006 (gmt 0)

With the millions of blogs and the huge popularity of WordPress as a platform, one would think Google would already be taking these issues into consideration.

victorP




msg:3100038
 5:00 am on Sep 28, 2006 (gmt 0)

I tried the code in my header.php (inside the head tag) and I keep getting a parse error... any ideas why?

thnx

smells so good




msg:3100064
 5:58 am on Sep 28, 2006 (gmt 0)

This forum breaks the pipe character (¦); you need to replace it with a real pipe (|) in your own file. Check those curly single quote marks too - they need to be straight quotes.

I'm not sure how concerned I should be with this, since my WP is displayed in an iframe; however, I made some of the modifications. The container page has been sitting at noindex for a while, and this was changed today. We'll see how the iframe gets handled from here.

manca




msg:3100158
 8:55 am on Sep 28, 2006 (gmt 0)

Well, I assume we have to wait for Google and see what's going to happen.

g1smd




msg:3100449
 2:08 pm on Sep 28, 2006 (gmt 0)

Nice to see this thread about WordPress here at WebmasterWorld.

These are the same sorts of issues that I have been banging on about with forums, such as vBulletin, for the last year or two.

If you herd the bot into indexing what you want indexed and restrict all the alternative URLs, you will not see any Supplemental Results for your site.

If you are already indexed, it will take a year for the supplemental results to fade out, but you will notice other improvements within a month or so of making the changes.

MrSpeed




msg:3100808
 6:19 pm on Sep 28, 2006 (gmt 0)

I have heard Google handles WordPress out of the box with no problem. I can't speak from experience, since all my blogs are small.

Matt Cutts uses WordPress. Search for the character that Matt dressed up as last year; Matt seems to rank OK for that term.

You spammed my index...prepare to die!

victorP




msg:3100958
 8:31 pm on Sep 28, 2006 (gmt 0)

smells so good -

Thanks a lot for your help! I was pulling my hair out for a few hours over this! :0)

manca




msg:3101014
 9:09 pm on Sep 28, 2006 (gmt 0)

So, what are we going to do now? Filter the additional pages, filter the archives, and leave the main page, categories, and single posts ready to be indexed?

I've actually done that; we'll be waiting for results...

g1smd




msg:3101026
 9:16 pm on Sep 28, 2006 (gmt 0)

Don't forget that if anything is Supplemental, it will hang around in the SERPs for many, many months. That doesn't mean that the fix isn't working. It just means that Google hangs on to Supplemental results for a long time.

Your measure of success is in seeing how well the URLs that you do want to be indexed are doing.

manca




msg:3101100
 10:01 pm on Sep 28, 2006 (gmt 0)

Yeah, thanks for that info, g1smd.
I just hope I won't get supplemental results, because as of now I probably have just one or two pages in the supplemental index out of 100 indexed.
Another question I'd like to ask here is about pages.
Namely, when I search for domain.com/page it doesn't appear in first place in the SERPs; I dunno why that is.

Another thing I noticed about those pages is that Google indexed both domain.com/page and domain.com/page/ but, interestingly, neither of them is in the supplemental index. Very weird... I don't get this.
What do you recommend I do with those pages? Should I interlink them as page/ or page? They are not actual directories but, as you know, mod_rewritten dynamic URLs.

Got clues?

g1smd




msg:3101109
 10:10 pm on Sep 28, 2006 (gmt 0)

Well, that is two different URLs leading to the exact same content, so that is classic "duplicate content".

Get your .htaccess file to rewrite one form to the other, and issue the "301" for the original one. That will cure it.

Which one, and which way, is up to you...
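
For example, if the trailing-slash form is the one being kept, a mod_rewrite sketch along these lines could do it (an assumption-laden illustration; it belongs above the standard WordPress rewrite block in .htaccess so the redirect fires before WordPress handles the request):

# 301 anything without a trailing slash to the slashed version,
# leaving real files and directories alone
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*[^/])$ /$1/ [R=301,L]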

.

Don't worry about any URLs that appear as Supplemental Results after they have been turned into redirects. That is normal. Google hangs on to URLs that return a 301 or a 404 for one year after they start doing so.

They do NOT count as Duplicate Content if their HTTP code is 301 or 404. They will get cleaned up soon enough.

Your measure of success is in seeing that the URLs that you do want to be indexed do get indexed, and that they are no longer tagged as Supplemental, perhaps a few weeks after the fixes are put in place.

Again, you only need to look into why a URL is Supplemental if that URL returns a "200 OK" response.
