WordPress - robots.txt to Fix Duplicate Content

I need a robots.txt file to prevent duplicate content from being crawled

     
7:46 pm on Apr 26, 2007 (gmt 0)

New User

10+ Year Member

joined:Feb 4, 2007
posts:4
votes: 0


Here's the problem:

I'm doing some SEO for a site that also has a blog attached at http://www.example.com/blog/. The problem is that there is duplicate content on the blog, and I need a robots.txt file that will let bots index only one copy of the content while avoiding the others.

For example, one post will appear in three different locations:

1. Category page
2. Single post page
3. Archive page

Ideally, I would like the single post page to be the only copy that gets indexed, with the category and archive copies ignored by bots (but I want to keep all three copies for ease of user navigation).

I know how to create a robots.txt file, but I'm not confident I can block all but one copy of the content without also blocking pages on the root domain.

I know this is a common problem for WordPress blogs, and I've done some research and found some answers, but I'm not confident in what I'm doing.
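For what it's worth, the closest thing I've found looks something like this. It's only a sketch, and it assumes the category pages live under /blog/category/ and the archives under year directories like /blog/2007/, with single posts on their own paths (if posts also live under the date paths, this would block them too):

User-agent: *
# category pages (assumed permalink path)
Disallow: /blog/category/
# date-based archive pages (assumed year directories)
Disallow: /blog/2006/
Disallow: /blog/2007/

Is that roughly the right idea?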

Is there anyone out there who has a robots.txt file addressing this same issue? Can I see it? I would greatly appreciate it.

Thanks

[edited by: encyclo at 1:48 am (utc) on April 27, 2007]
[edit reason] switched to example.com [/edit]

8:40 pm on Apr 26, 2007 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 22, 2003
posts:1483
votes: 0


I copied this from here or somewhere. It seems to do the trick, though I haven't had time to confirm:

User-agent: *
Disallow: */feed*
Disallow: */trackback
Disallow: */wp-admin
Disallow: */wp-content
Disallow: */wp-includes
Disallow: *wp-login.php
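
Note that this covers the feeds, trackbacks and the WordPress admin/core files, but not the category and archive copies you mentioned. If those sit on predictable paths, you could probably add lines like these (untested, and the paths are guesses at your permalink settings):

Disallow: */category/
Disallow: */2006/
Disallow: */2007/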

8:07 am on Apr 28, 2007 (gmt 0)

New User

10+ Year Member

joined:Jan 10, 2005
posts:36
votes: 0


All of my blog pages went supplemental on me. I'll try this, thanks.

6:50 pm on Apr 28, 2007 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 31, 2003
posts:9074
votes: 6


For WordPress, see the post by ogletree in this earlier thread:

[webmasterworld.com...]

6:23 pm on May 7, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Only Google understands the * wildcard notation.

Those wildcard lines MUST all go in the User-agent: Googlebot section of your robots.txt file.
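
In other words, restructure it along these lines, with the wildcard rules under Googlebot and only plain path prefixes for everyone else (the plain paths below assume the blog lives at /blog/ and are just illustrative):

User-agent: Googlebot
Disallow: */feed*
Disallow: */trackback
Disallow: */wp-admin
Disallow: */wp-content
Disallow: */wp-includes
Disallow: *wp-login.php

User-agent: *
Disallow: /blog/wp-admin/
Disallow: /blog/wp-content/
Disallow: /blog/wp-includes/
Disallow: /blog/wp-login.php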

2:40 pm on May 22, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 10, 2006
posts:665
votes: 0


I didn't think that still applied.

[webmasterworld.com...]

2:16 pm on May 23, 2007 (gmt 0)

Administrator from US 

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 21, 2004
posts:3284
votes: 225


bouncybunny is correct: Yahoo also handles the wildcard, aka pattern matching.

The biggest players will handle it, but most of the smaller players don't, since it is not officially part of the robots.txt protocol.
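
If you want Yahoo to apply the wildcard rules as well, give its crawler its own section in robots.txt; Yahoo's bot answers to Slurp. A rough sketch, repeating whichever wildcard lines you settle on:

User-agent: Slurp
Disallow: */feed*
Disallow: */trackback

Everyone else then falls through to the plain User-agent: * rules.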

 
