
Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Wordpress - Robots.txt to fix Duplicate Content
I need a robots.txt file to prevent dup content from being crawled
jman123
5+ Year Member
Msg#: 3324959 posted 7:46 pm on Apr 26, 2007 (gmt 0)

Here's the problem:

I'm doing some SEO for a site and the site also has a blog attached: http://www.example.com/blog/ . The problem is, there is duplicate content on the blog and I need a robots.txt file that will allow the bots to index only one copy of the content while avoiding the others.

For example, one post will appear in three different locations:

1. Category page
2. Single post page
3. Archive page

Ideally, I would like only the 'Single post page' to be indexed while the others are ignored by bots (but I want to keep all three copies for ease of user navigation).

I know how to create a robots.txt file, but I'm not confident that I can pull this off without blocking pages on the root domain and/or accidentally blocking every copy of the content.

I know this is a common problem for Wordpress blogs, and I've done some research and found some answers, but I'm not confident in what I'm doing.

Is there anyone out there who has a robots.txt file addressing this same issue? Can I see it? I would greatly appreciate it.

Thanks

[edited by: encyclo at 1:48 am (utc) on April 27, 2007]
[edit reason] switched to example.com [/edit]
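A minimal sketch of what such a robots.txt could look like, assuming the blog uses WordPress's default permalink structure under /blog/ (the /category/, /tag/, /page/, and year-archive paths below are assumptions; adjust them to the site's actual URLs). Plain path prefixes like these are understood by all compliant crawlers, with no wildcards required:

```
User-agent: *
Disallow: /blog/category/
Disallow: /blog/tag/
Disallow: /blog/page/
Disallow: /blog/2006/
Disallow: /blog/2007/
```

Note that date archives need one line per year under the original protocol, since plain prefix rules cannot express "any four-digit year".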

 

skipfactor
WebmasterWorld Senior Member 10+ Year Member
Msg#: 3324959 posted 8:40 pm on Apr 26, 2007 (gmt 0)

I copied this from here or somewhere else. It seems to do the trick, though I haven't had time to confirm:

User-agent: *
Disallow: */feed*
Disallow: */trackback
Disallow: */wp-admin
Disallow: */wp-content
Disallow: */wp-includes
Disallow: *wp-login.php
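For anyone wanting to check which URLs those wildcard rules would actually block under Google's pattern-matching semantics (where * matches any run of characters and a trailing $ anchors the end of the path), here is a rough Python sketch; the sample paths are hypothetical:

```python
import re

def wildcard_rule_to_regex(rule: str):
    """Translate a Googlebot-style Disallow pattern into a regex.
    '*' matches any run of characters; a trailing '$' anchors the end."""
    anchored = rule.endswith("$")
    body = rule[:-1] if anchored else rule
    pattern = re.escape(body).replace(r"\*", ".*")
    return re.compile(pattern + ("$" if anchored else ""))

RULES = ["*/feed*", "*/trackback", "*/wp-admin", "*wp-login.php"]
BLOCKED = [wildcard_rule_to_regex(r) for r in RULES]

def is_blocked(path: str) -> bool:
    # A path is blocked if any Disallow pattern matches from the start.
    return any(rx.match(path) for rx in BLOCKED)

print(is_blocked("/blog/feed/"))     # -> True  (feed URL is blocked)
print(is_blocked("/blog/my-post/"))  # -> False (single post stays crawlable)
```

This is only a local sanity check on the patterns themselves, not a full robots.txt parser.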

shyspinner
5+ Year Member
Msg#: 3324959 posted 8:07 am on Apr 28, 2007 (gmt 0)

All of my blog pages went supplemental on me. I'll try this, thanks.

encyclo
WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member
Msg#: 3324959 posted 6:50 pm on Apr 28, 2007 (gmt 0)

For WordPress, see the post by ogletree in this earlier thread:

[webmasterworld.com...]

g1smd
WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member
Msg#: 3324959 posted 6:23 pm on May 7, 2007 (gmt 0)

Only Google understands the * wildcard notation.

Those MUST all go in the User-agent: Googlebot section of your robots.txt file.
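Following that advice, the file could be split into sections: wildcard rules under a Googlebot section, and plain prefix fallbacks for everyone else. A sketch (the /blog/... prefixes are assumptions about the site's layout):

```
# Wildcard rules: understood by Googlebot
User-agent: Googlebot
Disallow: */feed*
Disallow: */trackback

# Prefix-only fallback for crawlers that follow the original protocol
User-agent: *
Disallow: /blog/feed/
Disallow: /blog/trackback/
```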

bouncybunny
WebmasterWorld Senior Member 5+ Year Member
Msg#: 3324959 posted 2:40 pm on May 22, 2007 (gmt 0)

I didn't think that still applied.

[webmasterworld.com...]

goodroi
WebmasterWorld Administrator, Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month
Msg#: 3324959 posted 2:16 pm on May 23, 2007 (gmt 0)

bouncybunny is correct: Yahoo also handles the wildcard, aka pattern matching.

The biggest players will handle it, but most of the smaller crawlers don't, since it is not officially part of the robots.txt protocol.
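That point shows up in Python's standard library: urllib.robotparser implements the original prefix-only protocol, so a wildcard rule is treated as a literal path and matches nothing. A rough demonstration (the URLs are hypothetical):

```python
from urllib.robotparser import RobotFileParser

# Two rules: one wildcard-style, one plain prefix.
rules = """\
User-agent: *
Disallow: */feed
Disallow: /blog/category/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# The strict protocol reads "*/feed" as a literal prefix,
# so it matches no real path and the feed URL stays fetchable:
print(rp.can_fetch("SomeBot", "http://www.example.com/blog/feed/"))          # -> True

# The plain prefix rule is honored:
print(rp.can_fetch("SomeBot", "http://www.example.com/blog/category/foo/"))  # -> False
```

So wildcard rules are safe to include for the crawlers that support them, but anything you rely on for every bot should also be expressed as a plain prefix.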
