
Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

Wordpress - Robots.txt to fix Duplicate Content
I need a robots.txt file to prevent dup content from being crawled

 7:46 pm on Apr 26, 2007 (gmt 0)

Here's the problem:

I'm doing some SEO for a site and the site also has a blog attached: http://www.example.com/blog/ . The problem is, there is duplicate content on the blog and I need a robots.txt file that will allow the bots to index only one copy of the content while avoiding the others.

For example, one post will appear in three different locations:

1. Category page
2. Single post page
3. Archive page

Ideally, I would like only the 'Single post page' to be indexed while the others are ignored by bots (but I want to keep all three copies for ease of user navigation).

I know how to create a robots.txt file, but I'm not confident that I can pull this off without blocking pages on the root domain, or blocking every copy of the content instead of all but one.

I know this is a common problem for Wordpress blogs, and I've done some research and found some answers, but I'm not confident in what I'm doing.

Is there anyone out there who has a robots.txt file addressing this same issue? Can I see it? I would greatly appreciate it.
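(For illustration only: a minimal sketch of what such a file might look like, assuming a default WordPress permalink layout where category pages live under /blog/category/ and date-based archives under /blog/2007/ and so on. These paths are assumptions, not taken from the site in question.)

```
# Hypothetical sketch -- adjust paths to the blog's actual permalink structure
User-agent: *
# Block category listings (copy #1 of each post)
Disallow: /blog/category/
# Block date-based archives (copy #3 of each post)
Disallow: /blog/2006/
Disallow: /blog/2007/
# Single post pages are left crawlable
```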


[edited by: encyclo at 1:48 am (utc) on April 27, 2007]
[edit reason] switched to example.com [/edit]



 8:40 pm on Apr 26, 2007 (gmt 0)

I copied this from here or somewhere. Seems to do the trick, though I haven't had time to confirm:

User-agent: *
Disallow: */feed*
Disallow: */trackback
Disallow: */wp-admin
Disallow: */wp-content
Disallow: */wp-includes
Disallow: *wp-login.php


 8:07 am on Apr 28, 2007 (gmt 0)

All of my blog pages went supplemental on me. I'll try this, thanks.


 6:50 pm on Apr 28, 2007 (gmt 0)

For WordPress, see the post by ogletree in this earlier thread:



 6:23 pm on May 7, 2007 (gmt 0)

Only Google understands the * wildcard notation.

Those patterns MUST all go in the User-agent: Googlebot section of your robots.txt file.
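Sketched out, that structure would look something like this (reusing the rules quoted earlier; the /blog/ prefix in the fallback section is an assumption about where WordPress is installed):

```
# Wildcard rules -- only for crawlers that understand '*' pattern matching
User-agent: Googlebot
Disallow: */feed*
Disallow: */trackback
Disallow: */wp-admin
Disallow: */wp-content
Disallow: */wp-includes
Disallow: *wp-login.php

# Plain prefix rules as a fallback for everyone else
User-agent: *
Disallow: /blog/wp-admin
Disallow: /blog/wp-content
Disallow: /blog/wp-includes
```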


 2:40 pm on May 22, 2007 (gmt 0)

I didn't think that still applied.



 2:16 pm on May 23, 2007 (gmt 0)

bouncybunny is correct; Yahoo also handles the wildcard (aka pattern matching).

The biggest players will handle it, but most of the smaller players don't, since it is not officially part of the robots.txt protocol.
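For anyone curious how that pattern matching behaves, here is a rough Python sketch of Googlebot-style rule matching ('*' matches any run of characters, a trailing '$' anchors the match at the end of the path, and rules otherwise match as prefixes). This is an illustration of the technique, not Google's actual implementation:

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """Return True if a Disallow rule matches a URL path, Googlebot-style.

    '*' matches any sequence of characters; a trailing '$' anchors the
    match at the end of the path; otherwise the rule matches as a prefix.
    """
    # Escape regex metacharacters, then restore '*' as a wildcard
    pattern = re.escape(rule).replace(r"\*", ".*")
    # A trailing '$' in the rule becomes a real end-of-string anchor
    if pattern.endswith(r"\$"):
        pattern = pattern[:-2] + "$"
    # re.match anchors at the start, giving prefix semantics by default
    return re.match(pattern, path) is not None
```

So `*/feed*` would match `/blog/feed` and `/blog/2007/04/some-post/feed/`, but not `/blog/some-post/`; a plain rule like `/wp-admin` still matches `/wp-admin/options.php` as a prefix.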
