Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Wordpress - Robots.txt to fix Duplicate Content
I need a robots.txt file to prevent dup content from being crawled
jman123 · msg:3322916 · 7:46 pm on Apr 26, 2007 (gmt 0)

Here's the problem:

I'm doing some SEO for a site that also has a blog attached: http://www.example.com/blog/ . The problem is that there is duplicate content on the blog, and I need a robots.txt file that lets the bots index only one copy of each post while skipping the others.

For example, one post will appear in three different locations:

1. Category page
2. Single post page
3. Archive page

Ideally, only the 'Single post page' would be indexed, with the other two ignored by bots (but I want to keep all three copies for ease of user navigation).

I know how to create a robots.txt file, but I'm not confident I can pull this off without accidentally blocking pages on the root domain, or blocking every copy of the content instead of just the duplicates.

I know this is a common problem for Wordpress blogs, and I've done some research and found some answers, but I'm not confident in what I'm doing.

Is there anyone out there who has a robots.txt file addressing this same issue? Can I see it? I would greatly appreciate it.

Thanks

[edited by: encyclo at 1:48 am (utc) on April 27, 2007]
[edit reason] switched to example.com [/edit]
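
For the three locations described above, one common shape of the fix is to disallow only the archive-style paths. The sketch below assumes category archives live under /blog/category/ and date archives under year-prefixed paths like /blog/2007/, with single posts directly under /blog/post-name/ — all assumptions about this site's permalink settings that the post does not confirm:

```
User-agent: *
# Category pages (assumed path)
Disallow: /blog/category/
# Date-based archive pages (assumed path; one line per year)
Disallow: /blog/2007/
```

If single posts also sit under date-based paths (e.g. /blog/2007/04/post-name/), rules like these would block them too, so the permalink structure has to be checked before using anything like this.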

 

skipfactor · msg:3322974 · 8:40 pm on Apr 26, 2007 (gmt 0)

I copied this from here or somewhere else. It seems to do the trick, though I haven't had time to confirm:

User-agent: *
Disallow: */feed*
Disallow: */trackback
Disallow: */wp-admin
Disallow: */wp-content
Disallow: */wp-includes
Disallow: *wp-login.php
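
To see which URLs those wildcard rules would actually catch, they can be exercised with a short matcher. This is a sketch of Googlebot-style pattern matching ('*' matching any run of characters), written purely for illustration — it is not how a plain prefix-only robots.txt parser behaves, and the sample paths are made up:

```python
import re

# Wildcard Disallow patterns quoted from the post above.
RULES = ["*/feed*", "*/trackback", "*/wp-admin", "*/wp-content",
         "*/wp-includes", "*wp-login.php"]

def to_regex(pattern):
    """Translate a robots.txt pattern into a regex, Googlebot-style:
    '*' matches any run of characters, and rules match from the start
    of the URL path (a trailing '$' would anchor the end; unused here)."""
    parts = [re.escape(p) for p in pattern.split("*")]
    return re.compile("^" + ".*".join(parts))

def blocked(path):
    """True if any Disallow rule matches the given URL path."""
    return any(to_regex(rule).match(path) for rule in RULES)

# Hypothetical example paths:
# blocked("/blog/feed/")             -> True  (matches */feed*)
# blocked("/blog/my-post/trackback") -> True  (matches */trackback)
# blocked("/blog/my-post/")          -> False (still crawlable)
```

Note this ignores Allow lines and user-agent grouping entirely — it only answers "would one of these patterns hit this path".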

shyspinner · msg:3324691 · 8:07 am on Apr 28, 2007 (gmt 0)

All of my blog pages went supplemental on me. I'll try this, thanks.

encyclo · msg:3324960 · 6:50 pm on Apr 28, 2007 (gmt 0)

For WordPress, see the post by ogletree in this earlier thread:

[webmasterworld.com...]

g1smd · msg:3333136 · 6:23 pm on May 7, 2007 (gmt 0)

Only Google understands the * wildcard notation.

Those MUST all go in the User-agent: Googlebot section of your robots.txt file.
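
Applied to the rules quoted earlier in the thread, that advice would give a file shaped like this — a sketch only, with the wildcard lines confined to a Googlebot section and plain prefix rules left as a fallback for everyone else:

```
User-agent: Googlebot
Disallow: */feed*
Disallow: */trackback
Disallow: */wp-admin
Disallow: */wp-content
Disallow: */wp-includes
Disallow: *wp-login.php

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-content/
Disallow: /wp-includes/
Disallow: /wp-login.php
```

Crawlers that don't understand '*' fall through to the second section, which uses only standard path-prefix matching.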

bouncybunny · msg:3346482 · 2:40 pm on May 22, 2007 (gmt 0)

I didn't think that still applied.

[webmasterworld.com...]

goodroi · msg:3347604 · 2:16 pm on May 23, 2007 (gmt 0)

bouncybunny is correct: Yahoo also handles the wildcard (aka pattern matching).

The biggest players will handle it, but most of the smaller players don't, since it is not officially part of the robots.txt protocol.
