WordPress And Google: Avoiding Duplicate Content Issues

Forum Moderators: Robert Charlton & goodroi

What about posts in a few different categories?

manca

3:50 pm on Sep 26, 2006 (gmt 0)

Hey guys,

I was wondering what you think about blogs and WordPress. As you know, WordPress can have categories in which it shows certain posts.
So I can have 3 categories, A, B and C, and then make a post that is filed in all 3 of them. It will show in each category, as well as on the main page and in the archives. As you can see, there are many places on the site where that one post shows up.

What do you think, is this duplicate content or not? How does Google treat this kind of behaviour?

Any clues?

Thanks,
Manca

mindaugas13

2:19 pm on Oct 20, 2006 (gmt 0)

graywolf's solution probably works well because of the "more" tag. It might be a good way to go all around.

laertes

2:39 pm on Oct 20, 2006 (gmt 0)

Could someone explain please how the "More Tag" solution works for those not overly familiar with WP?

john5000

7:36 pm on Oct 20, 2006 (gmt 0)

laertes,

This might help:

[codex.wordpress.org...]

graywolf

5:35 pm on Oct 21, 2006 (gmt 0)

Basically the "more" tag allows you to decide where the dividing line for what to show on the home and category pages is. If you write 4 paragraphs put the "more" tag at the end of the first. This way the remaining content (majority) is only on the individual page and not duplicated anywhere else.

As I see it the individual pages are the "money pages". There used to be a great illustration on this page [searchengineworld.com...] but it seems to have gone AWOL
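
To make the mechanics concrete, here is a rough sketch in Python of what the "more" tag does to a post. It is only an illustration, not WordPress's own code: `<!--more-->` is the marker WordPress looks for in the post body, but the helper name `split_at_more` and the sample post are made up for this example.

```python
MORE_TAG = "<!--more-->"  # the marker WordPress looks for in the post body


def split_at_more(post_body):
    """Return (teaser, full_text) for a post body.

    The teaser is what a home or category page would show; the full
    text is what the individual post page shows. Hypothetical helper,
    not part of WordPress.
    """
    teaser, found, rest = post_body.partition(MORE_TAG)
    if not found:
        # No marker: listing pages would repeat the whole post.
        whole = post_body.strip()
        return whole, whole
    return teaser.strip(), (teaser + rest).strip()


# Four paragraphs, with the marker placed after the first one.
sample_post = (
    "Paragraph one.\n"
    + MORE_TAG
    + "\nParagraph two.\nParagraph three.\nParagraph four."
)
teaser, full_text = split_at_more(sample_post)
print(teaser)
print(len(full_text.split()) - len(teaser.split()), "words appear only on the post page")
```

With the marker after the first paragraph, only that paragraph is repeated on listing pages; the other three exist on the individual post page alone.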

triumph

1:18 am on Oct 22, 2006 (gmt 0)

Basically the "more" tag allows you to decide where the dividing line for what to show on the home and category pages is. If you write 4 paragraphs put the "more" tag at the end of the first. This way the remaining content (majority) is only on the individual page and not duplicated anywhere else.

Graywolf, thanks for the tip.

Do you find most of your search results in Google go directly to an individual post, or do the related categories show up in the index?

For example, for some of my keywords I get the category indexed as a result and for other keywords the individual post gets indexed.

graywolf

3:09 am on Oct 22, 2006 (gmt 0)

The individual post shows up in the SERPs and is where all the traffic goes. Category listings show up sometimes, but mostly only as a second result.

victorP

3:48 am on Oct 22, 2006 (gmt 0)

The "more" tag has solved a lot of my problems. I can report an increase in SE traffic after I went through all my posts and broke them up.

Tomseys

4:24 am on Oct 22, 2006 (gmt 0)

Great thread. The more tag sounds like the best solution. Thanks for the information!

triumph

2:20 pm on Oct 22, 2006 (gmt 0)

The individual post shows up in the SERPs and is where all the traffic goes. Category listings show up sometimes, but mostly only as a second result.

Do you think this is a result of the "more tag" in categories?

ALbino

4:44 pm on Oct 22, 2006 (gmt 0)

I don't use WordPress, or blog, but it seems to me that the comments on the individual pages would make the overall content "different" from the main posts. Is this not the case?

triumph

7:13 pm on Oct 22, 2006 (gmt 0)

For those of you who have too many posts (like myself) to shorten everything by hand, try this plugin: [guff.szub.net ]

triumph

7:30 pm on Oct 22, 2006 (gmt 0)

Will there be a duplicate content issue if the main page displays the full post?

iProgram

2:40 am on Oct 23, 2006 (gmt 0)

The more tag works only for new posts, doesn't it? It cannot solve the problem for archived posts, so the noindex meta tag looks like a better idea to me.

Does the noindex meta tag also block the Mediapartners-Google bot?

optimierung

8:53 am on Oct 24, 2006 (gmt 0)

g1smd

Include all the stuff for Googlebot in that section because if there is a User-agent: Googlebot section, then Google totally ignores the User-agent: * section of the robots.txt file.

I just read your remark, and I cannot confirm your statement. In my experience Googlebot has always taken the * section into account, even when there is also a Googlebot section.

Or are there other conditions I do not know about?

tedster

10:00 am on Oct 24, 2006 (gmt 0)

Here's a recent reference thread on that -- with confirmation from GoogleGuy. This was news to a lot of us!

Why Google Might "Ignore" a robots.txt Disallow Rule [webmasterworld.com]
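
The precedence rule under discussion (a crawler that finds a group naming it obeys that group and skips the * group) can be tried out with Python's standard `urllib.robotparser`. This is only a sketch of the convention as that parser implements it, not a test of Googlebot itself. The paths and the bot name `SomeOtherBot` are made up for the example.

```python
from urllib import robotparser

# Hypothetical robots.txt with a general group and a Googlebot group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /feed/
"""

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has its own group, so only that group is applied to it:
# /private/ is not blocked for it, /feed/ is.
googlebot_private = parser.can_fetch("Googlebot", "/private/page.html")
googlebot_feed = parser.can_fetch("Googlebot", "/feed/")

# A crawler without its own group falls back to the * group.
other_private = parser.can_fetch("SomeOtherBot", "/private/page.html")

print(googlebot_private, googlebot_feed, other_private)
```

The practical consequence is the one quoted above: anything Googlebot should stay out of has to be repeated inside the Googlebot group.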

traffik daddy

12:32 pm on Oct 24, 2006 (gmt 0)

I'm also getting the same, I have tried different ways of implementing the PHP code in my header.php but still get a parse error.

<?php if(is_home() ¦¦ is_single() ¦¦ is_page()){
echo “<meta name="robots" content="index,follow">”;
} else {
echo “<meta name="robots" content="noindex,follow">”;
}?>

Is there anything else I need to change, apart from turning the ¦¦ back into ||, to get it working? :(

optimierung

1:28 pm on Oct 24, 2006 (gmt 0)

tedster

Why Google Might "Ignore" a robots.txt Disallow Rule

Many thanks for referencing that brainwashing robots.txt thread! For many years I have read the specifications the other way round!

BUT just now I am checking one simple and very specific example (disallowing only one page of a site) which does not fit the "new rules". Since the page is only mentioned in the * section, it should be indexed by G. In earlier times a SITE: search showed only the URL, as usual, but now the page is not listed at all (all other pages of the site are indexed).

manca

3:55 pm on Oct 24, 2006 (gmt 0)

traffik daddy,
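you do get a parse error indeed, because the double quotes inside the strings aren't escaped.
Try this:

<?php
// Index the home page, single posts and static pages; noindex everything else.
// Plain straight quotes around each string, with the inner double quotes escaped.
if (is_home() || is_single() || is_page()) {
    echo "<meta name=\"robots\" content=\"index,follow\">";
} else {
    echo "<meta name=\"robots\" content=\"noindex,follow\">";
}
?>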

Chef_Brian

1:05 pm on Oct 25, 2006 (gmt 0)

Yep, I have been loving this thread, having moved to WordPress about 15 months ago. Speaking of the "more" tag, I tried using it for a while, but found that simply using "the excerpt" instead of "the content" on category pages can do even better than the more tag.

Every time I post to any of my blogs now, I create a unique excerpt for the post. That way I get a "snippet" of content that only goes to one category page and a full "page" of content that goes to the permalink page. Of course, I only select one category per post. Doing this really splits up the content nicely.

Like others I would love some recommendations however on how to keep google bot from indexing my various rss feeds.

Any ideas?

Brian

kektex

11:05 pm on Oct 25, 2006 (gmt 0)

Like others I would love some recommendations however on how to keep google bot from indexing my various rss feeds.

I am having the same problem with Google indexing the www.example.com/post/feed URL and showing that instead of the actual post in the SERPs. All supplemental, of course.
Last month I disallowed */feed and */trackback in robots.txt. Let's see how that works.

john5000

3:00 am on Oct 26, 2006 (gmt 0)

One option to keep those feed pages (which look very strange to the average person) from being indexed is redirecting all feeds to feedburner (or a similar service).

You can then configure feedburner to use just a summary for each post. The way I have things set up, feedburner uses either the first 250 words of a post or, if I've manually entered a meta description in the excerpt field (with help from the head-meta description plugin), that description/excerpt gets shown in the feed.

Unfortunately, I believe feedburner pages can still be indexed, and partial feeds may be unwanted for some publishers. But at least the feed has clickable links back to your website, and I would imagine GOOG and other SEs might recognize feedburner feeds as feeds and not penalize them for duplicate content (any thoughts on this?).

I thought about just adding <meta name="robots" content="noindex,follow"> to the top of all the wordpress feed template files, but then I read on one of google's Q/A pages that this might interfere with indexing the site in google's blog search.
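
The summary rule described in this post (use the hand-written excerpt when there is one, otherwise the first 250 words) is simple enough to sketch in Python. This is an illustration of the rule only, not feedburner's or WordPress's actual code; the function name `feed_summary` and its parameters are hypothetical.

```python
def feed_summary(post_body, excerpt="", max_words=250):
    """Pick the text a summary-only feed would carry for one post.

    Hypothetical helper: a hand-written excerpt wins; otherwise the
    post is cut to its first `max_words` words.
    """
    if excerpt.strip():
        return excerpt.strip()
    words = post_body.split()
    return " ".join(words[:max_words])


# A long post without an excerpt is truncated to 250 words.
long_post = " ".join(f"word{i}" for i in range(600))
print(len(feed_summary(long_post).split()))

# A post with a manual excerpt uses the excerpt instead.
print(feed_summary(long_post, excerpt="A short hand-written description."))
```

Either way, the feed carries much less text than the permalink page, which is the point of the setup.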

kektex

5:13 pm on Oct 26, 2006 (gmt 0)

I thought about just adding <meta name="robots" content="noindex,follow"> to the top of all the wordpress feed template files, but then I read on one of google's Q/A pages that this might interfere with indexing the site in google's blog search.

So you think it might be a bad idea to disallow */feed pages in robots.txt?
I'm very troubled by this problem. Around 80% of my Google hits are coming in to the /post/feed pages.

john5000

5:59 pm on Oct 26, 2006 (gmt 0)

Hi kektex, yeah, I've noticed a lot of feed pages showing up when I do searches these days.

Here's what I read from google:

What if I don't want to be listed [In Google Blog Search]?
If you do not publish a site feed for your blog, it will not be included in Blog Search. However, if you previously published a site feed that was included, the old posts will remain in the index, even though new ones are not added.
Blog Search will also respect robots.txt files and NOINDEX, NOFOLLOW meta tags, as described here.
[google.com...]
I don't know if "disallow" is treated differently from "noindex". I also don't know how these tags would affect other blog search engines like Technorati.
I'm still learning all this stuff on the fly. Hopefully some others can chime in on the feed issue :)

Kangol

7:11 pm on Oct 26, 2006 (gmt 0)

"I am having the same problem with google indexing the www.example.com/post/feed url and showing that instead of the actual post in the SERPS."

Yep, disallowing the feeds is a tricky one. I've only managed to add a nofollow to the links that point to feeds.

Here goes:
edit /wp-includes/feed-functions.php
find function comments_rss_link
add the nofollow: echo "<a rel='nofollow' href='$url'>$link_text</a>";

The only problem is that WP will parse the double quotes as single quotes.

Lovejoy

10:33 pm on Oct 27, 2006 (gmt 0)

Hi all,

I know this sounds simplistic, but what about simply moving blog posts to pages after a day or two? That way you wouldn't have duplicate articles listed.

tgbob

12:31 pm on Oct 30, 2006 (gmt 0)

I never had anything go supplemental, but I did have most of my pages listed as "similar pages" when I did a "site:" search. After implementing these changes:

Things I've done:
- Basic 301 from all non-www pages to www pages
- The same 301 if a page is requested without a / at the end, redirecting it to the same page with a / at the end
- Unique meta description tags, using the head-meta plugin
- Testing whether the page is an archive or page > 1, and adding a noindex,follow meta tag if so
- Optimized titles, so they read "Name of Post: Name of blog" for permalink posts and "Name of blog" for the main page, and for categories "Name of category Category: Name of blog" (unique titles)
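
The last two items in the list come down to two small decisions per page. A rough Python sketch of that logic follows. It is not WordPress template code: the page-type labels ("home", "single", "category", "archive") and both function names are hypothetical, and falling back to the blog name for any other page type is an assumption.

```python
def robots_meta(page_type, page_number=1):
    """Robots meta value: keep archives and paginated listings out of the index."""
    if page_type == "archive" or page_number > 1:
        return "noindex,follow"
    return "index,follow"


def page_title(page_type, blog_name, name=""):
    """Unique title per page, following the scheme in the list above."""
    if page_type == "single":
        return f"{name}: {blog_name}"
    if page_type == "category":
        return f"{name} Category: {blog_name}"
    # Main page; also used here for any other type (an assumption).
    return blog_name


print(robots_meta("single"), "|", page_title("single", "My Blog", "Hello World"))
print(robots_meta("category", page_number=2), "|", page_title("category", "My Blog", "News"))
print(robots_meta("archive"), "|", page_title("home", "My Blog"))
```

Because `follow` stays on in both cases, links on the noindexed pages can still be crawled through to the individual posts.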

things are all moving out of the similar pages section into pages in their own right. Nothing is supplemental. In fact, it's all looking really good now.

Gumball Monkey

5:41 am on Nov 9, 2006 (gmt 0)

For anyone looking for the 301 code, this is how to modify the WordPress .htaccess code:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_HOST}!^www.yoursite.com$ [NC]
RewriteRule ^(.*)$ [yoursite.com...] [R,L]
RewriteBase /
RewriteCond %{REQUEST_FILENAME}!-f
RewriteCond %{REQUEST_FILENAME}!-d
RewriteRule . /index.php [L]
</IfModule>

Replace 'yoursite.com' with your own URL - obviously! ;)

victorP

7:58 am on Nov 9, 2006 (gmt 0)

Manca-

Just wondering if you would mind posting the final code you decided to use?

g1smd

12:10 pm on Nov 9, 2006 (gmt 0)

Be aware that the forum eats a space before an exclamation mark, so the corrected code is actually:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.yoursite\.com$ [NC]
# R=301 makes this a permanent redirect; a bare R sends a temporary 302
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
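
For anyone unsure what the first RewriteCond/RewriteRule pair is doing, here is the same host check as a small Python sketch. It only mirrors the logic, it is not a replacement for the .htaccess rules. `yoursite.com` is the placeholder from the post, and the function name `redirect_target` is made up.

```python
import re

CANONICAL_HOST = "www.yoursite.com"  # placeholder, as in the rules above

# Same pattern as the RewriteCond; IGNORECASE plays the role of [NC].
HOST_PATTERN = re.compile(r"^www\.yoursite\.com$", re.IGNORECASE)


def redirect_target(host, path):
    """Return the URL to redirect to, or None if the host is already canonical.

    In .htaccess the rule sees the path without its leading slash, so
    the slash is stripped here before the canonical URL is rebuilt.
    """
    if HOST_PATTERN.match(host):
        return None  # condition "!^www\.yoursite\.com$" fails: no redirect
    return f"http://{CANONICAL_HOST}/{path.lstrip('/')}"


print(redirect_target("yoursite.com", "/2006/10/my-post/"))
print(redirect_target("WWW.YourSite.com", "/2006/10/my-post/"))
```

Every host name other than the www one ends up on the same URL, so each post has only one address for Google to index.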
