WordPress And Google: Avoiding Duplicate Content Issues

What about posts in several different categories?

   
3:50 pm on Sep 26, 2006 (gmt 0)

5+ Year Member



Hey guys,

I was wondering what you think about blogs and WordPress. As you know, WordPress lets you file posts under categories, and each category page shows the posts filed under it.
So now I can have 3 categories, A, B and C, and then write a post that is filed in all 3 categories. It will show up on each category page, as well as on the main page and in the archives. As you can see, there are many places on the site where that one post appears.

What do you think: is this duplicate content or not? How does Google treat this kind of behaviour?

Any clues?

Thanks,
Manca
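To make the question concrete, here is a small Python sketch that lists the URLs on which one post appears. The URL scheme (`/category/<name>/`, `/YYYY/MM/` and `/<slug>/`) and the slug are hypothetical, chosen to resemble a common WordPress permalink setup; your blog's URLs may differ.

```python
from datetime import date


def urls_showing_post(slug: str, categories: list[str], published: date) -> list[str]:
    """Return the (hypothetical) URLs on which a single post is displayed."""
    urls = ["/"]  # the main page lists the latest posts
    # one archive page per category the post is filed under
    urls += [f"/category/{c.lower()}/" for c in categories]
    # the monthly date archive
    urls.append(f"/{published.year}/{published.month:02d}/")
    # the post's own permalink page
    urls.append(f"/{slug}/")
    return urls


pages = urls_showing_post("my-post", ["A", "B", "C"], date(2006, 9, 26))
print(len(pages))
print(pages)
```

With three categories, the same text is reachable at six URLs in this example.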

2:19 pm on Oct 20, 2006 (gmt 0)

10+ Year Member



Graywolf's solution probably works well because of the "more" tag. It might be a good way to go all around.

2:39 pm on Oct 20, 2006 (gmt 0)

5+ Year Member



Could someone please explain how the "more" tag solution works, for those of us not overly familiar with WP?

7:36 pm on Oct 20, 2006 (gmt 0)

5+ Year Member



laertes,

This might help:

[codex.wordpress.org...]

5:35 pm on Oct 21, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Basically, the "more" tag lets you decide where the dividing line falls for what is shown on the home and category pages. If you write 4 paragraphs, put the "more" tag at the end of the first. That way the remaining content (the majority) appears only on the individual post page and is not duplicated anywhere else.
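A minimal Python sketch of the idea. WordPress marks the split with the HTML comment `<!--more-->` in the post body; the function below is a simplified model of the behaviour described above (the name `visible_text` is hypothetical), not WordPress's own code.

```python
MORE_TAG = "<!--more-->"  # the marker placed in the post body


def visible_text(content: str, single_post_page: bool) -> str:
    """Simplified model of what a page shows for one post."""
    teaser, _, rest = content.partition(MORE_TAG)
    if single_post_page:
        # Individual post page: the whole post (here the marker is simply dropped).
        return teaser + rest
    # Home, category and archive pages stop at the marker. If there is no
    # marker, partition() returns the whole post as `teaser`, so the full
    # text is repeated on every listing page.
    return teaser


post = "<p>Para 1</p><!--more--><p>Para 2</p><p>Para 3</p><p>Para 4</p>"
print(visible_text(post, single_post_page=False))
print(visible_text(post, single_post_page=True))
```

Only the first paragraph is shared with the listing pages; the other three exist on the permalink page alone.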

As I see it, the individual pages are the "money pages". There used to be a great illustration on this page [searchengineworld.com...], but it seems to have gone AWOL.

1:18 am on Oct 22, 2006 (gmt 0)

10+ Year Member



Basically, the "more" tag lets you decide where the dividing line falls for what is shown on the home and category pages. If you write 4 paragraphs, put the "more" tag at the end of the first. That way the remaining content (the majority) appears only on the individual post page and is not duplicated anywhere else.

Graywolf, thanks for the tip.

Do you find that most of your Google search results point directly to an individual post, or do the related category pages show up in the index?

For example, for some of my keywords the category page is what ranks, and for other keywords it's the individual post.

3:09 am on Oct 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The individual post shows up in the SERPs and is where all the traffic goes. Category listings show up sometimes, but mostly only as a second result.

3:48 am on Oct 22, 2006 (gmt 0)

5+ Year Member



The "more" tag has solved a lot of my problems. I can report an increase in search engine traffic after I went through all my posts and split them up.

4:24 am on Oct 22, 2006 (gmt 0)

10+ Year Member



Great thread. The "more" tag sounds like the best solution. Thanks for the information!

2:20 pm on Oct 22, 2006 (gmt 0)

10+ Year Member



The individual post shows up in the SERPs and is where all the traffic goes. Category listings show up sometimes, but mostly only as a second result.

Do you think this is a result of using the "more" tag on the category pages?

4:44 pm on Oct 22, 2006 (gmt 0)

10+ Year Member



I don't use WordPress, or blog, but it seems to me that the comments on the individual pages would make their overall content different from the posts on the main page. Is this not the case?
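One way to reason about this question is to measure how much two pages overlap. The sketch below uses word shingles and Jaccard similarity, a common textbook near-duplicate measure. It is only an illustration and says nothing about how Google actually scores duplicates; the sample texts are made up.

```python
def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of k-word sequences ("shingles") in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Share of shingles two texts have in common (1.0 = identical sets)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


post = "the quick brown fox jumps over the lazy dog near the river bank"
comments = "great post thanks for sharing i had the same problem with my own blog last week"

print(jaccard(post, post))                   # post text on the main page vs. itself
print(jaccard(post, post + " " + comments))  # main page vs. post page with comments
```

By this kind of measure, the comments do lower the overlap between the two pages. Whether that matters to a search engine is a separate question.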
7:13 pm on Oct 22, 2006 (gmt 0)

10+ Year Member



For those of you who have too many posts (like me) to shorten everything by hand, try this plugin: [guff.szub.net]

7:30 pm on Oct 22, 2006 (gmt 0)

10+ Year Member



Will there be a duplicate content issue if the main page displays the full post?

2:40 am on Oct 23, 2006 (gmt 0)

10+ Year Member



The "more" tag only works for new posts, doesn't it? It can't solve the problem for archived posts, so the noindex meta tag is a better idea for me.

Does the noindex meta tag also block the Mediapartners-Google bot?

8:53 am on Oct 24, 2006 (gmt 0)

5+ Year Member



g1smd
Include all the stuff for Googlebot in that section because if there is a User-agent: Googlebot section, then Google totally ignores the User-agent: * section of the robots.txt file.

Having just read your remark, I cannot confirm your statement: I have always seen Googlebot take the * section into account, even when there is also a Googlebot section.

Or are there other conditions I do not know?

10:00 am on Oct 24, 2006 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Here's a recent reference thread on that, with confirmation from GoogleGuy. This was news to a lot of us!

Why Google Might "Ignore" a robots.txt Disallow Rule [webmasterworld.com]
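Python's standard-library `urllib.robotparser` applies a similar rule. A crawler that matches a named User-agent group uses only that group, and the `*` group is just a fallback for crawlers that have no group of their own. This makes it handy for checking a robots.txt file offline. Keep in mind that it is Python's parser, not Googlebot, and that the robots.txt content below is a made-up example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a generic group and a Googlebot-specific group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /feed/

User-agent: Googlebot
Disallow: /trackback/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Googlebot matches its own group, so the "*" group's Disallow: /feed/ is not applied.
print(rp.can_fetch("Googlebot", "http://www.example.com/feed/"))
print(rp.can_fetch("Googlebot", "http://www.example.com/trackback/"))
# Any other crawler falls back to the "*" group.
print(rp.can_fetch("SomeOtherBot", "http://www.example.com/feed/"))
```

If Googlebot should also stay out of /feed/, the Disallow line has to be repeated inside the Googlebot group.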

12:32 pm on Oct 24, 2006 (gmt 0)

10+ Year Member



I'm getting the same error. I have tried different ways of putting the PHP code into my header.php, but I still get a parse error.

<?php if(is_home() || is_single() || is_page()){
echo ‘<meta name="robots" content="index,follow">’;
} else {
echo ‘<meta name="robots" content="noindex,follow">’;
}?>

Is there anything else I need to change, apart from replacing the ¦¦ with ||, to get it working? :(

TD

1:28 pm on Oct 24, 2006 (gmt 0)

5+ Year Member



tedster
Why Google Might "Ignore" a robots.txt Disallow Rule

Many thanks for referencing that eye-opening robots.txt thread! For many years I had read the specification the other way round!

BUT right now I am looking at one simple and very specific example (disallowing only one page of a site) that does not fit the "new rules". The page is mentioned only in the * section, so under those rules it should be indexed by G. In the past, a site: search showed only the URL, as usual, but now the page is not listed at all (all other pages of the site are indexed).

3:55 pm on Oct 24, 2006 (gmt 0)

5+ Year Member



traffik daddy,
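you do have a parse error, because the curly quotes (‘ ’) around your strings are not valid PHP string delimiters. Use straight double quotes and escape the inner ones.
Try this:

<?php if (is_home() || is_single() || is_page()) {
    // Home page, single posts and static pages: let them be indexed.
    echo "<meta name=\"robots\" content=\"index,follow\">";
} else {
    // Category and date archives: keep them out of the index, but follow their links.
    echo "<meta name=\"robots\" content=\"noindex,follow\">";
} ?>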
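The same decision logic, sketched in Python so it can be checked without a WordPress install. The three booleans stand in for WordPress's `is_home()`, `is_single()` and `is_page()` conditional tags. The function name `robots_meta` is hypothetical.

```python
def robots_meta(is_home: bool, is_single: bool, is_page: bool) -> str:
    """Return the robots meta tag for a page.

    Index the home page, single posts and static pages; every other page
    (category and date archives) gets noindex, but its links are still
    followed so the crawler can reach the posts.
    """
    content = "index,follow" if (is_home or is_single or is_page) else "noindex,follow"
    return f'<meta name="robots" content="{content}">'


print(robots_meta(is_home=False, is_single=True, is_page=False))   # a single post
print(robots_meta(is_home=False, is_single=False, is_page=False))  # e.g. a category page
```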

1:05 pm on Oct 25, 2006 (gmt 0)

10+ Year Member



Yep, I have been loving this thread since moving to WordPress about 15 months ago. Speaking of the "more" tag, I used it for a while but found that simply using the_excerpt() instead of the_content() on category pages works even better than the more tag.

Every time I post to any of my blogs now, I write a unique excerpt for each post. That way I get a "snippet" of content that goes only to one category page, and a full "page" of content that goes to the permalink page. Of course, I only select one category per post. Doing this really splits up the content nicely.

Like others, I would love some recommendations on how to keep Googlebot from indexing my various RSS feeds.

Any ideas?

Brian

11:05 pm on Oct 25, 2006 (gmt 0)

5+ Year Member



Like others, I would love some recommendations on how to keep Googlebot from indexing my various RSS feeds.

I am having the same problem, with Google indexing the www.example.com/post/feed URL and showing it instead of the actual post in the SERPs. All supplemental, of course.
Last month I disallowed */feed and */trackback in robots.txt. Let's see how that works.
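A `*` inside a path is not part of the original robots.txt standard. It is an extension that Googlebot supports, along with `$` meaning "end of URL". Google's documentation describes paths that start with `/`, so `/*/feed` is probably the safer way to write the rule. The function below is a hypothetical sketch of that matching, useful for checking which URLs a pattern would catch; it is not Google's code.

```python
import re


def google_rule_matches(pattern: str, path: str) -> bool:
    """Sketch of Google-style robots.txt path matching.

    '*' matches any run of characters and a trailing '$' anchors the end of
    the URL; apart from that, the rule is a prefix match from the start.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then turn each '*' back into '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


print(google_rule_matches("/*/feed", "/my-post/feed/"))          # the feed URL
print(google_rule_matches("/*/feed", "/feeding-tips/"))          # a normal post
print(google_rule_matches("/*/feed", "/my-post/feedback/"))      # prefix match!
print(google_rule_matches("/*/trackback", "/my-post/trackback/"))
```

Because rules are prefix matches, `/*/feed` also catches URLs such as `/my-post/feedback/`. It is worth checking your own permalinks before relying on it.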
3:00 am on Oct 26, 2006 (gmt 0)

5+ Year Member



One way to keep those feed pages (which look very strange to the average person) out of the index is to redirect all feeds to FeedBurner (or a similar service).

You can then configure FeedBurner to use just a summary for each post. The way I have things set up, the feed shows either the first 250 words of a post or, if I've manually entered a meta description in the excerpt field (with help from the head-meta description plugin), that description/excerpt.
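A minimal sketch of that summary rule, assuming plain-text post content (HTML markup is not handled). The function name and the 250-word default only mirror the description above; they are not FeedBurner or WordPress settings.

```python
def feed_summary(content: str, excerpt: str = "", max_words: int = 250) -> str:
    """Prefer the hand-written excerpt; otherwise use the first `max_words` words."""
    if excerpt.strip():
        return excerpt.strip()
    words = content.split()
    summary = " ".join(words[:max_words])
    if len(words) > max_words:
        summary += " [...]"  # signal that the post continues on the site
    return summary


long_post = "word " * 300
print(len(feed_summary(long_post).split()))
print(feed_summary(long_post, excerpt="A short hand-written description."))
```

Either way, the full text stays on the permalink page and the feed carries only a shorter version.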

Unfortunately, I believe FeedBurner pages can still be indexed, and some publishers may not want partial feeds. But at least the feed has clickable links back to your website, and I would imagine GOOG and other SEs recognize FeedBurner feeds as feeds and don't penalize them for duplicate content (any thoughts on this?).

I thought about just adding <meta name="robots" content="noindex,follow"> to the top of all the WordPress feed template files, but then I read on one of Google's Q&A pages that this might interfere with indexing the site in Google's Blog Search.

5:13 pm on Oct 26, 2006 (gmt 0)

5+ Year Member



I thought about just adding <meta name="robots" content="noindex,follow"> to the top of all the WordPress feed template files, but then I read on one of Google's Q&A pages that this might interfere with indexing the site in Google's Blog Search.

So you think it might be a bad idea to disallow */feed pages in robots.txt?
I'm very troubled by this problem. Around 80% of my Google hits are landing on the /post/feed pages.

5:59 pm on Oct 26, 2006 (gmt 0)

5+ Year Member



Hi kektex, yeah, I've noticed a lot of feed pages showing up when I do searches these days.

Here's what I read from Google:

What if I don't want to be listed [In Google Blog Search]?

If you do not publish a site feed for your blog, it will not be included in Blog Search. However, if you previously published a site feed that was included, the old posts will remain in the index, even though new ones are not added.

Blog Search will also respect robots.txt files and NOINDEX, NOFOLLOW meta tags, as described here.
[google.com...]

I don't know if "disallow" is treated differently from "noindex". I also don't know how these tags would affect other blog search engines like Technorati.

I'm still learning all this stuff on the fly. Hopefully some others can chime in on the feed issue :)

7:11 pm on Oct 26, 2006 (gmt 0)

10+ Year Member



"I am having the same problem with google indexing the www.example.com/post/feed url and showing that instead of the actual post in the SERPS."

Yep, disallowing the feeds is a tricky one. I've only managed to add a nofollow to the links that point to feeds.

Here goes:
Edit /wp-includes/feed-functions.php
Find the function comments_rss_link
Add the nofollow: echo "<a rel='nofollow' href='$url'>$link_text</a>";

The only problem is that WP will parse double quotes as single quotes.

10:33 pm on Oct 27, 2006 (gmt 0)

10+ Year Member



Hi all,

I know this sounds simplistic, but what about simply moving blog posts to pages after a day or two? That way you wouldn't have duplicate articles listed.

12:31 pm on Oct 30, 2006 (gmt 0)

10+ Year Member



I never had anything go supplemental, but most of my pages did show up under "similar pages" when I did a "site:" search. Then I implemented these changes.

Things I've done:
- A basic 301 from all non-www pages to the www pages
- The same 301 when a page is requested without a trailing /, redirecting it to the same page with a / at the end
- A unique meta description tag on every page, using the head-meta plugin
- A test for whether the page is an archive or page > 1, adding a noindex,follow meta tag if so
- Optimized page titles: "Name of Post: Name of blog" for permalink posts, "Name of blog" for the main page, and "Name of category Category: Name of blog" for categories (unique titles)

Now things are all moving out of the "similar pages" section into pages in their own right. Nothing is supplemental. In fact, it's all looking really good now.
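The first two items on that list (www and trailing slash) amount to a single check: what is the canonical form of this URL? Below is a minimal Python sketch that uses `www.example.com` as a placeholder host. The rule for skipping file-like paths is a hypothetical heuristic. On a real site this job is done by the web server, for example with mod_rewrite rules in .htaccess.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str, host: str = "www.example.com") -> Optional[str]:
    """Return the URL to 301-redirect to, or None if `url` is already canonical."""
    parts = urlsplit(url)
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    # Hypothetical heuristic: add a trailing slash unless the last segment
    # looks like a file name (e.g. /feed.xml, /index.php).
    if not path.endswith("/") and "." not in last_segment:
        path += "/"
    # Always use the www host name; keep scheme, query string and fragment.
    target = urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
    return None if target == url else target


print(canonical_url("http://example.com/my-post"))
print(canonical_url("http://www.example.com/my-post/"))
```

Returning `None` for URLs that are already canonical is what keeps a redirect rule like this from looping.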

5:41 am on Nov 9, 2006 (gmt 0)

5+ Year Member



For anyone looking for the 301 code, this is how to modify the WordPress .htaccess code:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.yoursite\.com$ [NC]
RewriteRule ^(.*)$ [yoursite.com...] [R=301,L]
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>

Replace 'yoursite.com' with your own domain, obviously! ;)

7:58 am on Nov 9, 2006 (gmt 0)

5+ Year Member



Manca-

Just wondering if you would mind posting the final code you decided to use?

12:10 pm on Nov 9, 2006 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Be aware that the forum eats the space before an exclamation mark, so the corrected code is actually:

<IfModule mod_rewrite.c>
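RewriteEngine On
# Permanently (301) redirect any host name other than www.yoursite.com to the www host.
# Without "=301", the R flag sends a temporary 302 redirect.
RewriteCond %{HTTP_HOST} !^www\.yoursite\.com$ [NC]
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]
# Standard WordPress rules: send requests for anything that is not a real file or directory to index.php
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]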
</IfModule>
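To see what the host condition does, here is a small Python emulation that uses the same regular expression, with `re.IGNORECASE` standing in for the `[NC]` flag. It is a sketch for reasoning about the rules, not a description of how Apache evaluates them internally. It also shows why the dots are escaped: an unescaped `.` matches any character. The function name `redirect_target` is hypothetical.

```python
import re
from typing import Optional

# The RewriteCond pattern; re.IGNORECASE plays the role of the [NC] flag.
CANONICAL_HOST = re.compile(r"^www\.yoursite\.com$", re.IGNORECASE)
# The same pattern with unescaped dots, where "." matches any character.
LOOSE_HOST = re.compile(r"^www.yoursite.com$", re.IGNORECASE)


def redirect_target(host: str, path: str) -> Optional[str]:
    """Emulate the host redirect.

    If the Host header does NOT match the canonical pattern (the "!" in the
    RewriteCond), build the target URL from the RewriteRule. In .htaccess,
    ^(.*)$ sees the path without its leading slash, so `path` is e.g. "my-post/".
    """
    if CANONICAL_HOST.match(host):
        return None  # already on www.yoursite.com: no redirect
    return "http://www.yoursite.com/" + path  # $1 is the whole captured path


print(redirect_target("yoursite.com", "my-post/"))
print(redirect_target("WWW.YOURSITE.COM", "my-post/"))
print(bool(LOOSE_HOST.match("wwwXyoursiteXcom")))  # the unescaped pattern matches too much
```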
