
WordPress - Robots.txt to fix Duplicate Content

I need a robots.txt file to prevent dup content from being crawled

     
7:46 pm on Apr 26, 2007 (gmt 0)

5+ Year Member



Here's the problem:

I'm doing some SEO for a site that also has a blog attached: http://www.example.com/blog/. The problem is that the blog has duplicate content, and I need a robots.txt file that will let the bots index only one copy of the content while avoiding the others.

For example, one post will appear in three different locations:

1. Category page
2. Single post page
3. Archive page

Ideally, only the 'Single post page' would be indexed while the other two are ignored by bots (but I want to keep all three copies for ease of user navigation).

I know how to create a robots.txt file, but I'm not confident that I can block all but one copy of the content without also blocking pages on the root domain.

I know this is a common problem for WordPress blogs, and I've done some research and found some answers, but I'm not confident in what I'm doing.
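
For example, here is the kind of thing I've pieced together so far, assuming the category base is /blog/category/ and the date archives live under paths like /blog/2007/04/ (I'd have to check the actual permalink settings), but I'm not sure it won't block more than I intend:

User-agent: *
# Keep category archive pages out (assumes the default category base)
Disallow: /blog/category/
# Keep date-based archive pages out (one line per year, no wildcards needed)
Disallow: /blog/2006/
Disallow: /blog/2007/

Though I realize this only works if the single post URLs don't themselves start with those date paths.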

Is there anyone out there who has a robots.txt file addressing this same issue? Can I see it? I would greatly appreciate it.

Thanks

[edited by: encyclo at 1:48 am (utc) on April 27, 2007]
[edit reason] switched to example.com [/edit]

8:40 pm on Apr 26, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I copied this from here or somewhere. It seems to do the trick, though I haven't had time to confirm:

User-agent: *
# Block feed URLs (the leading * is wildcard pattern matching)
Disallow: */feed*
# Block trackback URLs
Disallow: */trackback
# Block the WordPress admin and core directories
Disallow: */wp-admin
Disallow: */wp-content
Disallow: */wp-includes
# Block the login page
Disallow: *wp-login.php
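
That covers feeds, trackbacks and the WP system files, but not the category and archive pages you mentioned. For those you'd want something along these lines, assuming the default /category/ base and date-based archive URLs (check your own permalink structure first):

# Block category archive pages
Disallow: */category/
# Block date-based archive pages (only safe if single posts don't use date permalinks)
Disallow: */2006/
Disallow: */2007/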

8:07 am on Apr 28, 2007 (gmt 0)

10+ Year Member



All of my blog pages went supplemental on me. I'll try this, thanks.

6:50 pm on Apr 28, 2007 (gmt 0)

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member



For WordPress, see the post by ogletree in this earlier thread:

[webmasterworld.com...]

6:23 pm on May 7, 2007 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Only Google understands the * wildcard notation.

Those wildcard rules MUST all go in a User-agent: Googlebot section of your robots.txt file.
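
In other words, a layout like this, with the wildcard rules under Googlebot and plain prefix rules for everyone else (the paths here are just placeholders standing in for the rules above):

# Wildcard rules - only for bots that understand pattern matching
User-agent: Googlebot
Disallow: */feed*
Disallow: */trackback

# Plain prefix rules for all other bots (no wildcards)
User-agent: *
Disallow: /blog/feed/
Disallow: /blog/trackback/

Note that a bot obeys only the most specific matching section, so Googlebot will follow its own group and ignore the User-agent: * group.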

2:40 pm on May 22, 2007 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I didn't think that still applied.

[webmasterworld.com...]

2:16 pm on May 23, 2007 (gmt 0)

WebmasterWorld Administrator goodroi is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



bouncybunny is correct; Yahoo also handles the wildcard (pattern matching) syntax.

The biggest players will handle it, but most of the smaller players don't, since it is not officially part of the robots.txt protocol.

 
