Lost in Wordpress tweaks and code

need help tweaking .htaccess and robots.txt

   
6:42 pm on May 2, 2007 (gmt 0)

5+ Year Member



OK, I've spent the last 4 hours reading the various topics relating to WordPress and the modifications needed to avoid duplicate content penalties and subsequent supplemental indexing. (Whew! A lot of info to process.)

Anyway, I still feel uncertain as to whether or not I've found the "definitive" answers I so desperately want.

1) Noindex code for WordPress: From what I've read, the first thing to do is control which URLs are indexed. We want only the home page, pages, and posts to be indexed. This is accomplished by adding the following code to header.php:

<?php if (is_home() || is_single() || is_page()) {
echo "<meta name=\"robots\" content=\"index,follow\">";
} else {
echo "<meta name=\"robots\" content=\"noindex,follow\">";
}?>

2) 301 code to modify the WordPress .htaccess code: Our goal here is to resolve any canonical issues with a 301 redirect...

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} ^domain\.com
RewriteRule (.*) [domain.com...] [R=301,L]
</IfModule>

However, this does not seem to redirect the /index.php to the root directory. I was unable to figure this one out...HELP?

Then I saw discussion of code to 301 redirect every URL generated by WordPress (e.g. www.domain.com/page1; www.domain.com/category/page1) to the same URL with a / at the end, because of duplicate content issues, without touching real files such as wp-admin.php or wp-login.php, or any other physical file on the server. A plain 301 redirect from page to page/ would be great, again without affecting physical files... Again, HELP?

3) Lastly, the robots.txt file. Lots of people do things differently. I found the following, which seemed the most comprehensive, but is it overkill?

User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /z/j/
Disallow: /z/c/
Disallow: /stats/
Disallow: /dh_
Disallow: /about/
Disallow: /contact/
Disallow: /tag/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /contact
Disallow: /manual
Disallow: /manual/*
Disallow: /phpmanual/
Disallow: /category/

User-agent: Googlebot
# disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$

# disallow all files with ? in url
Disallow: /*?*

# disable duggmirror
User-agent: duggmirror
Disallow: /

# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*

# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

Thanks in advance for any advice and help! Everyone on this forum is an invaluable wealth of information.

11:22 pm on May 2, 2007 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



If you want Googlebot to keep away from the URLs listed under the User-Agent: * heading you need to replicate those URL rules in the User-Agent: Googlebot section too.

If there is a User-Agent: Googlebot section in the robots.txt file then Googlebot reads ONLY that section.
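
As a rough illustration of this rule, here is a minimal sketch using the robots.txt parser in Python's standard library (urllib.robotparser). It is not Google's crawler, but it picks groups the same way: a bot that finds a group naming it ignores the User-agent: * group. The shortened rule set and the bot name SomeOtherBot are made up for the example. As far as I know this parser treats Disallow paths as plain prefixes and does not understand * or $ wildcards, so only plain prefixes are used here.

```python
from urllib.robotparser import RobotFileParser

# Shortened, hypothetical version of the file from the first post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /category/

User-agent: Googlebot
Disallow: /stats/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(bot, path):
    """Return True if the named bot may fetch the path under ROBOTS_TXT."""
    return parser.can_fetch(bot, path)


if __name__ == "__main__":
    for bot in ("SomeOtherBot", "Googlebot"):
        for path in ("/category/news/", "/stats/report.html"):
            verdict = "allowed" if is_allowed(bot, path) else "blocked"
            print(bot, path, verdict)
```

In this sketch Googlebot is allowed into /category/ even though the * group disallows it, because only the Googlebot group is consulted for that bot. That is why the * rules have to be repeated in the Googlebot section.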

11:25 pm on May 2, 2007 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Disallow: /manual

The above rule disallows any URL that begins with /manual, which makes the following rule redundant:

Disallow: /manual/*
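
To see the prefix behaviour concretely, here is a small sketch with Python's standard urllib.robotparser, which matches Disallow paths as plain prefixes. The bot name AnyBot and the example paths are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Only the shorter rule is present; "Disallow: /manual/*" is deliberately left out.
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /manual"])


def blocked(path):
    """Return True if a generic bot is barred from the path."""
    return not parser.can_fetch("AnyBot", path)


if __name__ == "__main__":
    for path in ("/manual", "/manual/intro.html", "/manuals.html", "/phpmanual/"):
        print(path, "blocked" if blocked(path) else "allowed")
```

Note that the prefix also catches a URL such as /manuals.html. If only the directory is meant, Disallow: /manual/ is the narrower rule.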

1:44 pm on May 3, 2007 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



> However, this does not seem to redirect the /index.php to the root directory. I was unable to figure this one out...HELP?

More information is needed on this point: Do you mean that example.com/index.php is not redirected to www.example.com/index.php, or do you mean that example.com/ is not redirected to www.example.com/ ? Or both? Or something else?

This might be caused by an error in your rule order (if you didn't post all of the rules that might affect this problem), or it might be an error in your server configuration where PHP is configured to execute before mod_rewrite, and therefore, PHP files cannot be rewritten or redirected by mod_rewrite.

Jim

4:35 pm on May 3, 2007 (gmt 0)

5+ Year Member



G1smd,

Does the first user-agent: * command apply to Google-bot as well? Since it's a wildcard, could I just add the second conditions to it, as follows:

User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /z/j/
Disallow: /z/c/
Disallow: /stats/
Disallow: /dh_
Disallow: /about/
Disallow: /contact/
Disallow: /tag/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /contact
Disallow: /manual
Disallow: /manual/*
Disallow: /phpmanual/
Disallow: /category/
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$

Or do I NEED a second rule specifically for Googlebot?

4:37 pm on May 3, 2007 (gmt 0)

5+ Year Member



JD,

What I meant is that I want www.example.com/index.php to redirect to www.example.com.

Hope that clarifies things.

5:36 pm on May 3, 2007 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Ah, well there was no code whatsoever to do that in your post, so I assumed something else was going on...

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]

This can be adapted to redirect example.com/<any-directory>/index.php to www.example.com/<any-directory>/ as well; we've posted the code here dozens of times if you need it.

Also, this code should be placed above your generic domain redirect in order to avoid multiple "chained" redirects in the case where example.com/index.php is requested and needs to be redirected to www.example.com/ -- You want to avoid having to use two redirects to do that.

Jim

9:10 pm on May 3, 2007 (gmt 0)

5+ Year Member



JD,

By placing this code "above" the generic redirect, do you mean above the following code (which redirects non-www to www)?

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} ^domain\.com
RewriteRule (.*) [domain.com...] [R=301,L]
</IfModule>

10:01 pm on May 3, 2007 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



>> Does the first user-agent: * command apply to Google-bot as well? <<

No. It does not.

If there is a section for Googlebot then Google reads ONLY the Googlebot section.

See this earlier thread for more: [webmasterworld.com...]

3:09 am on May 4, 2007 (gmt 0)

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Put the new index redirection code after the RewriteBase directive in your existing code, and before the first RewriteCond.

Jim

12:01 am on May 5, 2007 (gmt 0)

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Yes, redirect index pages first, but make sure the target URL of that redirect also specifies the domain you want to end up on.

Cater for all other cases after that specific stuff has already been done. You must avoid creating a Redirection Chain.
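
The effect of rule order on redirect chains can be sketched with a toy simulation in Python. This is not mod_rewrite. Each function only stands in for one of the two redirects discussed in this thread, and the target of the generic rule (www.example.com plus the requested path) is an assumption, since the posted rule was truncated. The THE_REQUEST condition is left out because the toy model has no internal rewrites.

```python
import re


def index_rule(host, path):
    # Stands in for: RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]
    if path == "/index.php":
        return "www.example.com", "/"
    return None


def www_rule(host, path):
    # Stands in for the generic non-www to www redirect (target assumed).
    if re.match(r"example\.com", host):
        return "www.example.com", path
    return None


def follow(host, path, rules):
    """Apply the first matching rule per request; return (hops, host, path)."""
    hops = 0
    while hops < 10:  # safety limit against redirect loops
        for rule in rules:
            target = rule(host, path)
            if target is not None:
                host, path = target
                hops += 1
                break
        else:
            break  # no rule matched, so the request is served as-is
    return hops, host, path


if __name__ == "__main__":
    print(follow("example.com", "/index.php", [index_rule, www_rule]))
    print(follow("example.com", "/index.php", [www_rule, index_rule]))
```

With the index rule first, a request for example.com/index.php reaches www.example.com/ in one redirect. With the generic rule first it takes two, which is the chain to avoid.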

 
