Good, Basic Solution to Resolve Wordpress Duplicate Content Issues

Basic solution to duplicate content in WordPress

     
8:54 pm on May 2, 2007 (gmt 0)

5+ Year Member



OK, I've read a ton of information about the duplicate content risks associated with WordPress, but I have yet to find the one-stop shop for the proper code to add to the .htaccess and robots.txt files. So, I thought it would be helpful to see if someone could post the definitive "basic" code required for each of the requirements below.

I need to make sure that the .htaccess does the following:

1) all non-www redirects to www
2) pages like /index.php get redirected to /index.php/ (as discussed here [webmasterworld.com...])
3) www.domain.com/index.php/ always redirects to the root directory www.domain.com

I also need the robots.txt file to stop Google (and other engines) from crawling only those directories/pages of the site that could trigger duplicate content filters (resulting in the site being banished to the supplemental index).

7:13 pm on May 3, 2007 (gmt 0)

5+ Year Member



You also need to change the default format of the <title>, because by default it uses "Site Name - Post Name", and that helps pages end up in the supplemental results.

A good practice is to use only the post title, or the post title followed by the site title.

I hope this helps.

7:23 pm on May 3, 2007 (gmt 0)

5+ Year Member



There's a one-stop plug-in available to do all of that, as messing with .htaccess on a WordPress site causes problems.

The rules here don't allow posting of URLs so try searching for "enforce www preference" and you should find it.

7:54 pm on May 3, 2007 (gmt 0)

5+ Year Member



www.domain.com/index.php/ always redirects to the root directory www.domain.com

I had to do the direct opposite on my WordPress site to get the permalinks working correctly (www.domain.com to www.domain.com/index.php/).

Think this could cause problems? One bad fallout of this is I totally lost indexing on MSN about 6 months ago.

9:02 pm on May 3, 2007 (gmt 0)

5+ Year Member



OK, so after a ton of searching, reading and so on...I found what I think is the solution to the Wordpress Duplicate content issue. I will go point by point and answer each of my questions:

1) all non-www redirects to www

This can be accomplished in a variety of ways (I implemented all of them):

a) In your Google Webmaster Dashboard > Preferred domain, specify whether Google should show the site as www or non-www.

b) Some web hosting companies allow you to specify this distinction in your admin panel. Some do not.

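c) Lastly, you should add the following code to your .htaccess file, which will redirect non-www requests to www:

# Non-www requests for URLs that lack a trailing slash (and are not real
# files or directories) are sent to www with the slash added.
# Non-www URLs that already end in a slash, including the bare root, do not match this rule.
RewriteCond %{HTTP_HOST} ^example\.com
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*[^/])$ http://www.example.com/$1/ [R=301,L]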

2) pages like /index.php get redirected to /index.php/ (as discussed here [webmasterworld.com...] ).

You can add the following code to your .htaccess file (* Or see below for a plug-in that does this as well as the next requirement):

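#Add / to pages
# First rule: send every non-www request to www.
RewriteCond %{HTTP_HOST} ^example\.com
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
# Second rule: add the trailing slash, unless the request is a real file or directory.
# The full URL in the target keeps the redirect independent of RewriteBase.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*[^/])$ http://www.example.com/$1/ [R=301,L]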

3) www.domain.com/index.php/ redirects to the root directory www.domain.com

OK, I found and installed this nifty plug-in called "permalink redirect" [fucoder.com...], which replies with a 301 permanent redirect if the requested URI is different from the entry's (or archive's) permalink. It is used to ensure that there is only one URL associated with each blog entry. This accomplishes requirements #2 & #3.
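
If you would rather not rely on a plug-in for requirement #3, something along these lines in .htaccess may do the job. Treat it as an untested sketch, not code confirmed in this thread. It assumes mod_rewrite is enabled, that WordPress is installed in the site root, and that example.com stands in for your own domain. The WordPress block shown is the one WordPress typically writes for a root install, and yours may differ. The redirect sits above that block so it runs before the internal rewrite, and the THE_REQUEST condition looks only at what the browser actually asked for, so WordPress's own rewrite to /index.php does not set off a redirect loop.

```apache
RewriteEngine On

# Redirect a direct request for /index.php or /index.php/ to the site root.
# THE_REQUEST holds the original request line, e.g. "GET /index.php HTTP/1.1",
# so requests that WordPress rewrites internally to index.php are not affected.
# Longer paths such as /index.php/2007/05/some-post/ do not match.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php/?(\?[^\ ]*)?\ HTTP/
RewriteRule ^index\.php/?$ http://www.example.com/ [R=301,L]

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```

Any query string is carried over by the redirect, so /index.php?p=123 would land on /?p=123. Do not combine this with a setup that redirects the root to /index.php/, as described earlier in this thread, or the two redirects will chase each other.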

Note: I also uncovered a plug-in called "wordpress duplicate content cure" [seologs.com...], which leaves only the index, pages, and posts indexed by adding noindex robots meta tags everywhere else. Alternatively, you can add the following code to your header.php file, which works as well:

<?php if (is_home() || is_single() || is_page()) {
echo "<meta name=\"robots\" content=\"index,follow\">";
} else {
echo "<meta name=\"robots\" content=\"noindex,follow\">";
} ?>

Whew, what a couple of days ;) Hope this makes someone's life a bit easier.

[edited by: tedster at 12:36 am (utc) on May 4, 2007]
[edit reason] fix link [/edit]

11:53 pm on May 3, 2007 (gmt 0)

10+ Year Member



Nice work!
12:28 am on May 4, 2007 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



There was a thread here a few months ago about WordPress issues, I believe.

Didn't Matt Cutts also cover some of that ground, on his blog, late last year too?

1:54 am on May 4, 2007 (gmt 0)

WebmasterWorld Senior Member tedster is a WebmasterWorld Top Contributor of All Time 10+ Year Member



A link to the Wordpress [webmasterworld.com] discussion is available in the Hot Topics thread, pinned to the top position of this forum's index page.
2:57 am on May 4, 2007 (gmt 0)

5+ Year Member



Tedster,

Thanks! That was one of the deep posts I waded through seeking information ;) I was hoping to start a more concise post so that people (like me) could find the information a little more quickly. Also, if I was missing something or made a mistake, the response wouldn't be 15 pages away.

4:36 am on May 4, 2007 (gmt 0)

5+ Year Member



I really don't know my way around Apache, so it'd be awesome if someone else confirmed the code I posted...

Thanks!

 
