Moderators: Robert Charlton, aakk9999, brotherhood of lan, goodroi

Google SEO News and Discussion Forum

This 142-message thread spans 5 pages; this is page 3.
WordPress And Google: Avoiding Duplicate Content Issues
What about posts in a few different categories?

 3:50 pm on Sep 26, 2006 (gmt 0)

Hey guys,

I was wondering what you think about blogs and WordPress. As you know, WordPress can have categories in which it'll show certain posts.
So now I can have 3 categories, A, B and C, and then make a post which will be posted in all 3 cats... it'll show in each category, as well as on the main page and in the archives. As you can see, there are many places on the site where that one post shows.

What do you think, is this duplicate content or not? How does Google treat such behaviour?

Any clues?




 9:31 pm on Oct 10, 2006 (gmt 0)

My bad, this is what I added:

<meta name="robots" content="noindex,follow">

Strange that google would can the site?



 5:34 pm on Oct 14, 2006 (gmt 0)

Hey there, I'm just getting started with WordPress.

Could someone please tell me what would happen if I just removed the links to the monthly archives from my sidebar template file? (Which I'm thinking of doing anyway.)

If there are no links pointing to these archives, will the SEs never be able to see them?


 6:50 pm on Oct 14, 2006 (gmt 0)

If anyone ever externally links to any of them, they will get picked up anyway.

You're better off with a meta robots noindex tag on them to stop them being indexed.


 6:52 pm on Oct 14, 2006 (gmt 0)

>> My bad <<

It's little slips like that which can really catch you out and have devastating consequences for indexing...

Been there. Done that. :-)


 8:00 pm on Oct 14, 2006 (gmt 0)


My mistake was in what I posted on WebmasterWorld...

I have the correct noindex,follow on my site. Google still doesn't show even one page or post from my blog, only the homepage. What I find so darn strange is that I am ranked number one for a phrase with more than 30 million results, a phrase that other site owners also use in their own site names.

I used to have about 150 posts/pages indexed in Google, which were bringing me about 2k visitors during the Halloween season.



 8:17 pm on Oct 14, 2006 (gmt 0)

OK. Good that the site is correct. I just wanted to make a comment for the benefit of future WebmasterWorld readers.


Well I now have an interesting situation to look at too:

A perfectly indexed 160-page site suddenly shows up with only 120 pages indexed. The pages were all updated 3 weeks ago, so I'm wondering if some error was introduced back then.


 4:51 am on Oct 15, 2006 (gmt 0)

The only issue I haven't been able to solve is the trackback URLs.

I wonder if the major bots will understand and follow the rule below

Disallow: /*/trackback/

I know in their guidelines they say the following works

Disallow: /*?

But if you use that and try to use the page removal page it throws an error.

Any thoughts?

[edited by: JeremyL at 4:55 am (utc) on Oct. 15, 2006]


 4:22 pm on Oct 15, 2006 (gmt 0)

Use that notation only in the User-agent: Googlebot section. Other bots do not understand it.

Include all the stuff for Googlebot in that section because if there is a User-agent: Googlebot section, then Google totally ignores the User-agent: * section of the robots.txt file.
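
A rough Python sketch of the behaviour described in the last two posts, offered as an illustration and not as Google's actual parser: a bot that finds a group naming it uses only that group and ignores the User-agent: * group, and a * inside a Disallow path matches any run of characters. The helper names (pick_group, pattern_to_regex, is_blocked) and the sample rules are made up for this example.

```python
import re

# Hypothetical robots.txt rules, keyed by lowercased User-agent token.
# The googlebot group repeats the generic rule because, per the post above,
# Googlebot ignores the "*" group once it has a group of its own.
RULES = {
    "*": ["/wp-admin/"],
    "googlebot": ["/wp-admin/", "/*/trackback/", "/*?"],
}


def pick_group(user_agent, rules):
    """Return the bot's own Disallow list if it has one, else the '*' list."""
    ua = user_agent.lower()
    for token, disallows in rules.items():
        if token != "*" and token in ua:
            return disallows
    return rules.get("*", [])


def pattern_to_regex(pattern):
    """Translate a wildcard Disallow pattern into a regex.

    '*' matches any characters; a trailing '$' anchors the end of the URL.
    Without '$' the pattern is a prefix match, like a plain Disallow path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(user_agent, path, rules=RULES):
    """True if any Disallow pattern in the bot's group matches the path."""
    group = pick_group(user_agent, rules)
    return any(pattern_to_regex(p).match(path) for p in group)
```

With these sample rules, a trackback URL is blocked for Googlebot but not for a bot that only reads the * group, which is why the wildcard lines belong in the Googlebot section only.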


 8:15 pm on Oct 15, 2006 (gmt 0)

>>If anyone ever externally links to any them, they will get picked up anyway.<<

g1smd, that's a good point, thanks. This stuff is all very new to me.

So can you tell me if I need to have a robots.txt file to add all these noindex codes? I think it all goes in header.php, right? I don't even know if I have a robots.txt file... where would I look?

Also, what would happen if I receive links to an index page (other than home page) or a category page, and I'm using noindex for category and index pages? Would there still be any link juice from those links?

Are most netizens (who aren't all website owners) savvy enough to use the permalink when linking to a post instead of linking to an index or category page?


 9:00 pm on Oct 15, 2006 (gmt 0)

Look in domain.com/robots.txt for the robots.txt file.

Not all sites have one though.

I have no idea if PageRank is passed right through non-indexed pages, but I suspect that it might be.


 10:37 pm on Oct 15, 2006 (gmt 0)

OK, I'll have to check if I have a robots.txt.

Now what happens with PageRank in the case of redirects?

For example, I'm using the Permalink Redirect WordPress plugin, which redirects URLs without a trailing slash to URLs with the trailing slash.

If somebody links to the URL without the trailing slash, is the PageRank passed through to the redirected URL?


 10:39 pm on Oct 15, 2006 (gmt 0)

Yes. Most, if not all, of the PageRank passes through a redirect like that, where both URLs are within the same site.


 7:57 pm on Oct 16, 2006 (gmt 0)

I'll just say I had a lot of categories on one of my blogs, like 30. I dumped a ton of them - kept only those cats with 10+ entries, and made sure I didn't file a post under multiple cats. A couple of weeks later, pages are starting to pop up in the main index, whereas a month before, the blog was 99% supplemental. No change in the amount/quality of inbound links.


 4:26 am on Oct 18, 2006 (gmt 0)

Thanks g1smd and others for all the info in this thread.

Here's what I've done so far:

yes-www plugin: redirect non-www to www
Permalink Redirect Plugin: redirects no trailing slash to trailing slash.

I don't know which plugin is doing this (probably the second one), but index.php is redirecting to the root

and posts and pages with /trackback/ at the end are redirected to the post's or page's permalink.
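
The redirects listed above all amount to one rule: every variant of a URL should 301 to a single canonical form. Here is a small Python sketch of that rule, purely as an illustration. It is not what either plugin actually does internally; the function name canonical_url is invented, and it assumes every URL on the blog is a directory-style permalink whose preferred form is the www host with a trailing slash.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url):
    """Return the 301 target for a URL, or None if it is already canonical.

    Assumptions (not taken from any plugin): the www host is preferred,
    permalinks end in '/', and index.php and /trackback/ variants collapse
    to the page they belong to.
    """
    scheme, host, path, query, fragment = urlsplit(url)

    # non-www -> www
    if not host.startswith("www."):
        host = "www." + host

    # /index.php -> /
    if path.endswith("/index.php"):
        path = path[: -len("index.php")]

    # no trailing slash -> trailing slash
    if not path.endswith("/"):
        path += "/"

    # .../trackback/ -> the post's or page's permalink
    if path.endswith("/trackback/"):
        path = path[: -len("trackback/")]

    target = urlunsplit((scheme, host, path, query, fragment))
    return None if target == url else target
```

For example, canonical_url("http://example.com/2006/10/my-post") gives "http://www.example.com/2006/10/my-post/", while the canonical URL itself gives None, meaning no redirect is needed.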


Now can someone please tell me if I added this code together correctly (it looks funny to me):

<?php if ( is_home() ) { ?>
<?php if ( $paged < 2 ) { ?>
<?php if ( is_home() || is_single() || is_page() ) {
echo '<meta name="robots" content="index,follow">';
} else {
echo '<meta name="robots" content="noindex,follow">';
} ?>
<?php } ?>
<?php } ?>


And what exactly will this code do?

Will it only allow indexing of the first index page (which is also the home page), as well as individual posts (permalinks), and individual pages (different than index pages)?

And will it disallow indexing of all category pages, all index pages beyond page 1, all date archive pages, all feed pages (such as anything with /feed/ added at the end), and basically anything besides what is specifically allowed above?

(btw, i've gotta study for exams now so sorry if i cant get back for a while)
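
The rule being asked about can be modelled outside WordPress. The Python sketch below only illustrates the intended outcome (index the first home page, single posts and static pages; noindex,follow everything else). It is not theme code: the function name robots_meta and the page-type labels are invented for this example, and in a real theme the checks would be WordPress conditional tags such as is_home(), is_single() and is_page().

```python
def robots_meta(page_type, paged=1):
    """Return the robots meta tag for a given kind of page.

    page_type is a made-up label: 'home', 'single', 'page', 'category',
    'date' or 'search'. paged is the listing page number (1 = first page).
    """
    indexable = (
        page_type in ("single", "page")
        or (page_type == "home" and paged < 2)
    )
    content = "index,follow" if indexable else "noindex,follow"
    return '<meta name="robots" content="%s">' % content
```

Note that the three allowed cases sit side by side in one condition instead of being nested inside each other. Feeds are left out on purpose: as far as I know they are not rendered through the theme's header.php, so a meta tag added there would not reach them, but check that against your own install.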


 4:55 pm on Oct 18, 2006 (gmt 0)

I believe that code only allows for the first page to be indexed..


 6:00 pm on Oct 18, 2006 (gmt 0)

It's funny I've run into this thread. I've been tweaking my WP for 3 days...

is_home(): when the main page is being displayed.
is_single(): when any single Post page is being displayed.
is_page(): when any Page is being displayed.
is_category(): when any Category archive page is being displayed.

So <?php if ( is_home() || is_single() || is_page() ) {
echo '<meta name="robots" content="index,follow">'; } ?>

Will add <meta name="robots" content="index,follow"> into your header.

You can configure it as you want. Make sure that you replace "" and "".


 10:00 pm on Oct 18, 2006 (gmt 0)

Hey guys,
Glad to see you posted some serious and very useful ideas for tweaking WP blogs.
As for me, I did everything that was suggested, so my site no longer has a single duplicate page, and guess what? Big G indexed EVERYTHING on my site that I want it to index, everything is visible with the site: command and damn, I do not have a single supp. page...
It really works ;) Things I've done:
- Basic 301 from all non-www pages to www pages
- Same 301 if a page is requested without / at the end, redirecting it to the same page with / at the end
- A unique meta description tag on each page, using the head-meta plugin
- Testing if the page is an archive or page>1 and adding a noindex,follow meta tag
- Optimized title for each page, so it looks like "Name of Post: Name of blog" for permalink posts and "Name of blog" for the main page. Also for categories: "Name of category Category: Name of blog" (unique titles)

And that's basically it. If you do everything listed above, you will definitely have a great WP blog without a single supp. page in the Google index, I bet!

Hope it helps...and have a great BLOGGING, cuz blogging rocks ;)



 10:36 pm on Oct 18, 2006 (gmt 0)

Kangol, thanks for that explanation, it helps a lot.

manca, great news! I was hoping to see a post like yours and there it is.

so can you tell me what exactly was the code you inserted for all the index/noindex information?


 4:52 am on Oct 19, 2006 (gmt 0)

You might try using is_paged(), which is true on page 2 and beyond of a listing, in the head section:

<?php if ( is_paged() ) :?><meta name="ROBOTS" content="NOINDEX,NOFOLLOW">
<?php endif;?>


 5:59 am on Oct 19, 2006 (gmt 0)

While on this topic, has anyone effectively found a way to not index any feeds from wordpress including any comment feed, category and index page feeds?


 4:59 pm on Oct 19, 2006 (gmt 0)

My final header is:

<?php
// One noindex,follow tag for paged listings, search results,
// archives and trackback requests.
if ( $paged > 1 || is_search() || is_archive() || is_trackback() ) {
echo '<meta name="robots" content="noindex,follow"/>';
}
?>

Hope Google will pick it up without errors.

[edited by: Kangol at 5:00 pm (utc) on Oct. 19, 2006]


 5:38 pm on Oct 19, 2006 (gmt 0)

I've found this thread fascinating. I run a couple of TypePad blogs and the principles are much the same. I have another issue to bring up.

In August I lost many of the pages on the Google index, along with lots of other people, apparently. I had about 90 pages indexed and it went down gradually to 4.

It has recovered twice and then disappeared twice - at the moment only four pages are listed, my home page and three random others.

My traffic has dropped a lot, obviously, and I'm tempted to start listing more posts on the home page, so that there are more potential hits for search engines.

This has the unwelcome side-effect of meaning that more content is displayed in two places, in the categories as well as on the main page.

It looks like a choice between few posts on the main page and almost zero Google hits until the indexing issue recovers, or many stories on the main page with the attendant risk of duplicate content penalties.

Any ideas?


 7:50 pm on Oct 19, 2006 (gmt 0)

I saw that the w3.org validator does not validate:
<meta name="robots" content="noindex,follow">
but does validate:
<meta name="robots" content="noindex,follow"/>

What is the correct version, ">" or "/>"?


 8:06 pm on Oct 19, 2006 (gmt 0)

<..... "..."> is for HTML 4.01 pages.

<..... "..." /> is only for XHTML pages.


 9:32 pm on Oct 19, 2006 (gmt 0)

I've been running more WordPress blogs than I care to admit and have NEVER had problems with duplicate content, sites going supplemental, or dupe content penalties.

I remember having a dialog with Adam of Google at one point, and him alluding to Google being aware of how WordPress works and that posts can appear under more than one URL.

I've even got my blog configured to show thousands of posts on the category page and I don't have problems. I do, however, use the "more" tag to keep the majority of the content on the post's own page and not elsewhere.


 9:54 pm on Oct 19, 2006 (gmt 0)

sweet thread.

I'm with graywolf though, never noticed any problems with dup content, although I'll investigate some of these options just to be sure.


 10:35 pm on Oct 19, 2006 (gmt 0)

Interesting experience. Glad to hear you shared it with us.
Although my experience was a little different, it doesn't mean you're just lucky or anything. I myself had a WP site with more than half of its pages in the supp. index simply because they were not "unique": they had no meta descriptions of their own, there were non-www and www problems, and so forth.
After I made some major changes to my blogs, guess what happened? I got everything (every single post) indexed and ranking very well in the SERPs.

It may be that google is aware about blogs and their whole concept but my theory is, prevention is better than cure ;)

That's all I can say about this issue. And my advice is: Just do everything to avoid duplicate content on your pages and you'll be fine with Google, I am almost 100% sure ;)



 12:51 am on Oct 20, 2006 (gmt 0)

Does anyone know whether the trackback feature will still work if I 301 redirect trackback URLs to the real post? I haven't looked hard into how the trackback system works, so I'm not sure.

[edited by: JeremyL at 12:51 am (utc) on Oct. 20, 2006]


 7:05 am on Oct 20, 2006 (gmt 0)

I installed WordPress in a subdirectory of a non-profit site I work on, and I've been having a canonical issue, even with the entire site 301'd from non-www to www. No matter what or how I've tried, the folder with WordPress will not redirect. What Google did was pick up the blog page with http*:example.com/blog/ and they dumped the rest of the site with the www, which has been indexed since late 1998.

No idea how to fix this, it's 301'd for the root, but what I've done for the time being is bar Googlebot from the blog in robots.txt altogether, so now the right pages are back in and ranking as they were, but the Toolbar now shows PR0 for the homepage. No problem with the other engines, just Google.


 11:27 am on Oct 20, 2006 (gmt 0)

Run Xenu LinkSleuth over the site, starting at www, and then again starting at non-www, and see what you get. There should be plenty of clues.

Look especially for an .htaccess file in the /blog/ folder with different and special rules just for WordPress.


 2:19 pm on Oct 20, 2006 (gmt 0)

graywolf's solution probably works well because of the "more" tag. might be a good way to go all around.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved