Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 142 message thread spans 5 pages; this is page 4.
WordPress And Google: Avoiding Duplicate Content Issues
What about posts in few different categories?
manca




msg:3097708
 3:50 pm on Sep 26, 2006 (gmt 0)

Hey guys,

I was wondering what you think about blogs and WordPress. As you know, WordPress can have categories in which it will show certain posts.
So I can have 3 categories, A, B and C, and then make a post which is filed in all 3 cats. It will show in each category, as well as on the main page and in the archives. As you can see, there are many places on the site where that one post shows up.

What do you think: is this duplicate content or not? How does Google treat such behaviour?

Any clues?

Thanks,
Manca

 

mindaugas13




msg:3128527
 2:19 pm on Oct 20, 2006 (gmt 0)

graywolf's solution probably works well because of the "more" tag. It might be a good way to go all around.

laertes




msg:3128576
 2:39 pm on Oct 20, 2006 (gmt 0)

Could someone please explain how the "More Tag" solution works, for those not overly familiar with WP?

john5000




msg:3129035
 7:36 pm on Oct 20, 2006 (gmt 0)

laertes,

This might help:

[codex.wordpress.org...]

graywolf




msg:3130026
 5:35 pm on Oct 21, 2006 (gmt 0)

Basically the "more" tag allows you to decide where the dividing line for what to show on the home and category pages is. If you write 4 paragraphs put the "more" tag at the end of the first. This way the remaining content (majority) is only on the individual page and not duplicated anywhere else.

As I see it, the individual pages are the "money pages". There used to be a great illustration on this page [searchengineworld.com...] but it seems to have gone AWOL.
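
To make the effect of the "more" tag concrete, here is a minimal Python sketch of the split graywolf describes. It assumes the tag is stored in the post body as the literal marker `<!--more-->`; the function name `split_at_more` and the sample post are made up for illustration, and this is not how WordPress itself is implemented.

```python
MORE_TAG = "<!--more-->"  # the marker WordPress stores in the post body


def split_at_more(content):
    """Return (teaser, remainder).

    The teaser is what home and category pages would show; the remainder
    appears only on the individual post page. Without a more tag the whole
    post is the teaser and the remainder is empty.
    """
    teaser, _, remainder = content.partition(MORE_TAG)
    return teaser.strip(), remainder.strip()


post = (
    "<p>First paragraph.</p>\n"
    "<!--more-->\n"
    "<p>Second paragraph.</p>\n"
    "<p>Third paragraph.</p>"
)
teaser, remainder = split_at_more(post)
print("Listing pages show:", teaser)
print("Only on the permalink:", remainder)
```

With the tag placed after the first paragraph, only that paragraph is repeated on listing pages, so most of the text lives at a single URL.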

triumph




msg:3130305
 1:18 am on Oct 22, 2006 (gmt 0)

Basically the "more" tag allows you to decide where the dividing line for what to show on the home and category pages is. If you write 4 paragraphs put the "more" tag at the end of the first. This way the remaining content (majority) is only on the individual page and not duplicated anywhere else.

Graywolf, thanks for the tip.

Do you find most of your search results in Google go directly to an individual post, or do the related categories show up in the index?

For example, for some of my keywords I get the category indexed as a result and for other keywords the individual post gets indexed.

graywolf




msg:3130375
 3:09 am on Oct 22, 2006 (gmt 0)

The individual post shows up in the SERPs and is where all the traffic goes. Category listings show up sometimes, but mostly only as a second result.

victorP




msg:3130412
 3:48 am on Oct 22, 2006 (gmt 0)

The "more" tag has solved a lot of my problems. I can report an increase in SE traffic after I went through all my posts and broke them up.

Tomseys




msg:3130437
 4:24 am on Oct 22, 2006 (gmt 0)

Great thread. The more tag sounds like the best solution. Thanks for the information!

triumph




msg:3130695
 2:20 pm on Oct 22, 2006 (gmt 0)

The individual post shows up in the SERPs and is where all the traffic goes. Category listings show up sometimes, but mostly only as a second result.

Do you think this is a result of the "more tag" in categories?

ALbino




msg:3130813
 4:44 pm on Oct 22, 2006 (gmt 0)

I don't use WordPress, or blog, but it seems to me that the comments on the individual pages would make the overall content "different" from the main posts. Is this not the case?

triumph




msg:3130921
 7:13 pm on Oct 22, 2006 (gmt 0)

For those of you who (like myself) have too many posts to shorten everything by hand, try this plugin: [guff.szub.net]

triumph




msg:3130935
 7:30 pm on Oct 22, 2006 (gmt 0)

Will there be a duplicate content issue if the main page displays the full post?

iProgram




msg:3131277
 2:40 am on Oct 23, 2006 (gmt 0)

The more tag only works for new posts, doesn't it? It can't solve the problem for archived posts, so the noindex meta tag is a better idea for me.

Does the noindex meta tag also block the Mediapartners-Google bot?

optimierung




msg:3132737
 8:53 am on Oct 24, 2006 (gmt 0)

g1smd
Include all the stuff for Googlebot in that section because if there is a User-agent: Googlebot section, then Google totally ignores the User-agent: * section of the robots.txt file.

I just read your remark and cannot confirm it: in my experience, Googlebot has always taken the * section into account even when there is also a Googlebot section.

Or are there other conditions I do not know?

tedster




msg:3132770
 10:00 am on Oct 24, 2006 (gmt 0)

Here's a recent reference thread on that -- with confirmation from GoogleGuy. This was news to a lot of us!

Why Google Might "Ignore" a robots.txt Disallow Rule [webmasterworld.com]
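
The "most specific section wins" behaviour discussed here can be tried out with Python's standard-library robots.txt parser. This is only a sketch: it shows how `urllib.robotparser` resolves user-agent sections, not how Googlebot itself behaves, and the robots.txt content, bot name `SomeOtherBot`, and example.com URLs are invented for the demonstration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with a catch-all section and a Googlebot section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /feed/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot has its own section, so the parser applies only that section.
googlebot_private = parser.can_fetch(
    "Googlebot", "http://www.example.com/private/page.html")
googlebot_feed = parser.can_fetch(
    "Googlebot", "http://www.example.com/feed/")

# A bot without its own section falls back to the * section.
otherbot_private = parser.can_fetch(
    "SomeOtherBot", "http://www.example.com/private/page.html")

print(googlebot_private, googlebot_feed, otherbot_private)
```

In this parser the `/private/` rule never reaches the Googlebot section, which is why g1smd's advice is to repeat every rule you want Googlebot to obey inside its own section.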

traffik daddy




msg:3132883
 12:32 pm on Oct 24, 2006 (gmt 0)

I'm also getting the same error. I have tried different ways of implementing the PHP code in my header.php, but I still get a parse error.

<?php if(is_home() ¦¦ is_single() ¦¦ is_page()){
echo ‘<meta name="robots" content="index,follow">’;
} else {
echo ‘<meta name="robots" content="noindex,follow">’;
}?>

Is there anything else I need to change apart from the ¦¦ to get it working? :(

TD

optimierung




msg:3132935
 1:28 pm on Oct 24, 2006 (gmt 0)

tedster
Why Google Might "Ignore" a robots.txt Disallow Rule

Many thanks for referencing that eye-opening robots.txt thread! For many years I had understood the specification the other way round!

BUT just now I am checking one simple and very specific example (a site that disallows only a single page) which does not fit the "new rules". Since the page is mentioned only in the * section, it should be indexed by Google under those rules. Earlier, a site: search showed it as a URL-only listing, as usual, but now the page is not listed at all (all other pages of the site are indexed).

manca




msg:3133226
 3:55 pm on Oct 24, 2006 (gmt 0)

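traffik daddy,
you do indeed have a parse error, because of the quoting: the strings need straight quotes, with the double quotes inside them escaped.
Try this:

<?php if (is_home() || is_single() || is_page()) {
    // Home page, single posts and static pages may be indexed.
    echo "<meta name=\"robots\" content=\"index,follow\">";
} else {
    // Everything else (categories, archives, ...) is kept out of the index.
    echo "<meta name=\"robots\" content=\"noindex,follow\">";
}?>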
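
Stripped of the PHP quoting problems, the decision being made in header.php is small. The Python sketch below restates it with plain strings; the page-type names are hypothetical stand-ins for WordPress's `is_home()`, `is_single()` and `is_page()` checks, and `robots_meta` is an illustrative helper, not part of WordPress.

```python
# Page types that should stay in the index (stand-ins for is_home(),
# is_single() and is_page() in the PHP snippet).
INDEXABLE = {"home", "single", "page"}


def robots_meta(page_type):
    """Return the robots meta tag to emit for a given kind of page."""
    directive = "index,follow" if page_type in INDEXABLE else "noindex,follow"
    return f'<meta name="robots" content="{directive}">'


for kind in ("home", "single", "category", "archive"):
    print(kind, "->", robots_meta(kind))
```

Category and archive listings get noindex,follow, so their links are still followed while the duplicated listing pages stay out of the index.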

Chef_Brian




msg:3134287
 1:05 pm on Oct 25, 2006 (gmt 0)

Yep, I am loving this thread, having moved to WordPress about 15 months ago. Speaking of the "more" tag, I tried using it for a while but found that by simply using "the excerpt" instead of "the content" on category pages you can do even better than the more tag.

Every time I post to any of my blogs now, I create a unique excerpt for each post. In doing this I get a "snippet" of content that only goes to one category page and a full "page" of content that goes to the permalink page. Of course, I only select one category per post. Doing this really splits up the content nicely.

Like others, I would love some recommendations, however, on how to keep Googlebot from indexing my various RSS feeds.

Any ideas?

Brian

kektex




msg:3135001
 11:05 pm on Oct 25, 2006 (gmt 0)

Like others, I would love some recommendations, however, on how to keep Googlebot from indexing my various RSS feeds.

I am having the same problem with Google indexing the www.example.com/post/feed URL and showing that instead of the actual post in the SERPs. All supplemental, of course.
Last month I disallowed */feed and */trackback in robots.txt. Let's see how that works.
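
A rule like `*/feed` relies on wildcard matching, which Google supports as an extension; the original robots.txt convention (and, as far as I know, Python's `urllib.robotparser`) only does plain prefix matching. The sketch below approximates the wildcard semantics (`*` matches any run of characters, a trailing `$` anchors the end) so you can check which paths a pattern would catch. `rule_matches` is a hypothetical helper written for this illustration, not a Google or WordPress API.

```python
import re


def rule_matches(pattern, path):
    """Approximate wildcard matching for a robots.txt path pattern.

    '*' matches any sequence of characters; a trailing '$' anchors the
    pattern to the end of the path; otherwise the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, path) is not None


for path in ("/post/feed", "/post/feed/rss2", "/post/", "/feedback"):
    print(path, rule_matches("*/feed", path))
```

Note that under these semantics `*/feed` also catches `/feedback`, so a pattern such as `*/feed/` or `*/feed$` may be closer to the intent, depending on how the feed URLs end.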

john5000




msg:3135181
 3:00 am on Oct 26, 2006 (gmt 0)

One option to keep those feed pages (which look very strange to the average person) from being indexed is redirecting all feeds to feedburner (or a similar service).

You can then configure feedburner to just use a summary for each post. The way I have things set up, feedburner uses either the first 250 words of a post or, if I've manually entered a meta description in the excerpt field (with help from the head-meta description plugin), that description/excerpt.

Unfortunately, I believe feedburner pages can still be indexed, and partial feeds may be unwanted for some publishers. But at least the feed has clickable links back to your website, and I would imagine GOOG and other SEs might recognize feedburner feeds as feeds and not penalize them for duplicate content (any thoughts on this?).

I thought about just adding <meta name="robots" content="noindex,follow"> to the top of all the wordpress feed template files, but then I read on one of google's Q/A pages that this might interfere with indexing the site in google's blog search.
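
The summary fallback john5000 describes (a hand-written excerpt if there is one, otherwise the first 250 words) can be sketched in a few lines of Python. The function name `feed_summary` and its parameters are assumptions made for illustration; this is not feedburner's or WordPress's actual code.

```python
def feed_summary(content, excerpt="", max_words=250):
    """Pick the text to publish in a partial feed.

    Prefer the manually written excerpt; otherwise fall back to the
    first `max_words` words of the post body.
    """
    if excerpt.strip():
        return excerpt.strip()
    words = content.split()
    return " ".join(words[:max_words])


body = "word " * 600  # stand-in for a long post
print(len(feed_summary(body).split()))  # length of the truncated summary
print(feed_summary(body, excerpt="A hand-written description."))
```

Either way the feed carries only part of the post, so the full text remains unique to the permalink page.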

kektex




msg:3135791
 5:13 pm on Oct 26, 2006 (gmt 0)

I thought about just adding <meta name="robots" content="noindex,follow"> to the top of all the wordpress feed template files, but then I read on one of google's Q/A pages that this might interfere with indexing the site in google's blog search.

So you think it might be a bad idea to disallow */feed pages in robots.txt?
I'm very troubled by this problem. Around 80% of my Google hits are coming in to the /post/feed pages.

john5000




msg:3135840
 5:59 pm on Oct 26, 2006 (gmt 0)

Hi kektex, yeah, I've noticed a lot of feed pages showing up when I do searches these days.

Here's what I read from google:

What if I don't want to be listed [In Google Blog Search]?

If you do not publish a site feed for your blog, it will not be included in Blog Search. However, if you previously published a site feed that was included, the old posts will remain in the index, even though new ones are not added.

Blog Search will also respect robots.txt files and NOINDEX, NOFOLLOW meta tags, as described here.
[google.com...]

I don't know if "disallow" is treated differently from "noindex". I also don't know how these tags would affect other blog search engines like Technorati.

I'm still learning all this stuff on the fly. Hopefully some others can chime in on the feed issue :)

Kangol




msg:3135919
 7:11 pm on Oct 26, 2006 (gmt 0)

"I am having the same problem with google indexing the www.example.com/post/feed url and showing that instead of the actual post in the SERPS."

Yep, disallowing the feeds is a tricky one. I've only managed to add a nofollow to the links that point to feeds.

Here goes:
edit /wp-includes/feed-functions.php
find function comments_rss_link
add the nofollow: echo "<a rel='nofollow' href='$url'>$link_text</a>";

The only problem is that WP will parse double quotes as single quotes.

Lovejoy




msg:3137586
 10:33 pm on Oct 27, 2006 (gmt 0)

Hi all,

I know this sounds simplistic, but what about simply moving blog posts to pages after a day or two? That way you wouldn't have duplicate articles listed.

tgbob




msg:3139648
 12:31 pm on Oct 30, 2006 (gmt 0)

I never had anything go supplemental, but I did have most of my pages listed as "similar pages" when I did a "site:" search. After implementing these changes:

Things I've done:
- Basic 301 from all non-www pages to www pages
- Same 301 if the page is called without / at the end, to redirect it to the same page with / at the end.
- A unique meta description tag for each page, using the head-meta plugin
- Testing if the page is an archive or page>1 and adding a noindex,follow meta tag
- Optimized titles, so permalink posts look like "Name of Post: Name of blog" and the main page like "Name of blog". Categories get "Name of category Category: Name of blog" (unique titles)

things are all moving out of the similar pages section into pages of their own right. Nothing is supplemental. In fact, it's all looking really good now.
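
The title scheme and the noindex test from the list above can be written down as two small functions. This is a Python sketch of the rules as stated in the post; the page-type strings and the function names `page_title` and `needs_noindex` are hypothetical and do not correspond to WordPress functions.

```python
def page_title(blog_name, page_type, name=""):
    """Build a unique <title> string following the scheme in the list."""
    if page_type == "single":      # permalink page for one post
        return f"{name}: {blog_name}"
    if page_type == "category":    # category listing
        return f"{name} Category: {blog_name}"
    return blog_name               # main page


def needs_noindex(page_type, page_number=1):
    """True for archive listings and for any listing page beyond the first."""
    return page_type == "archive" or page_number > 1


print(page_title("My Blog", "single", "Avoiding Duplicate Content"))
print(page_title("My Blog", "category", "SEO"))
print(needs_noindex("home", page_number=2))
```

Each URL ends up with its own title, and only the first page of non-archive listings stays indexable.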

Gumball Monkey




msg:3151029
 5:41 am on Nov 9, 2006 (gmt 0)

For anyone looking for the 301 code, this is how to modify the WordPress .htaccess code:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{HTTP_HOST}!^www.yoursite.com$ [NC]
RewriteRule ^(.*)$ [yoursite.com...] [R,L]
RewriteBase /
RewriteCond %{REQUEST_FILENAME}!-f
RewriteCond %{REQUEST_FILENAME}!-d
RewriteRule . /index.php [L]
</IfModule>

Replace 'yoursite.com' with your own URL - obviously! ;)

victorP




msg:3151115
 7:58 am on Nov 9, 2006 (gmt 0)

Manca-

Just wondering if you would mind posting the final code you decided to use?

g1smd




msg:3151241
 12:10 pm on Nov 9, 2006 (gmt 0)

Be aware that the forum eats a space before an exclamation mark, so the corrected code is actually:

<IfModule mod_rewrite.c>
RewriteEngine On
# Redirect any host other than www.yoursite.com to the www version.
# R=301 makes the redirect permanent; a bare R defaults to a 302.
RewriteCond %{HTTP_HOST} !^www\.yoursite\.com$ [NC]
RewriteRule ^(.*)$ http://www.yoursite.com/$1 [R=301,L]
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
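
For readers who find mod_rewrite syntax opaque, the host check in the rules above boils down to the logic below. This is a Python restatement for illustration only; `redirect_target` is a hypothetical helper, and unlike the .htaccess rule (where `$1` is the path without its leading slash) it takes the full path including the leading slash.

```python
from typing import Optional

CANONICAL_HOST = "www.yoursite.com"  # placeholder, as in the .htaccess example


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the URL to redirect to, or None if the host is already canonical.

    The comparison is case-insensitive, mirroring the [NC] flag.
    """
    if host.lower() == CANONICAL_HOST:
        return None
    return f"http://{CANONICAL_HOST}{path}"


print(redirect_target("yoursite.com", "/2006/10/some-post/"))
print(redirect_target("WWW.YourSite.com", "/"))
```

Only requests that arrive on a non-canonical host are redirected; everything else falls through to the standard WordPress rules that hand the request to index.php.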

skweb




msg:3151383
 3:13 pm on Nov 9, 2006 (gmt 0)

Our company has six blogs using WP, and we have never tinkered with the settings. Indeed, only with a site: command can one see the RSS/trackback/email this/comments RSS feeds; they generally don't appear in an actual Google search.

I have been following the whole discussion on duplicate content. For us it is simply not an issue, even though we often assign multiple categories to each post. Google relies heavily on what you actually link to, and on the anchor text you use, to serve the results.

So for all of you who believe that we are in business for our readers rather than bots, relax - just create good content and life will be good.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved