Anyway, I still feel uncertain as to whether or not I've found the "definitive" answers I so desperately want.
1) Noindex code for WordPress: From what I've read, the first thing to address is which URLs get indexed. We want only the index (home page), pages, and posts to be indexed. This is accomplished by adding the following code to header.php:
<?php if(is_home() || is_single() || is_page()){
echo "<meta name=\"robots\" content=\"index,follow\">";
} else {
echo "<meta name=\"robots\" content=\"noindex,follow\">";
}?>
2) 301 code to modify the WordPress .htaccess code: Our goal here is to resolve any canonical issues with a 301 redirect...
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} ^domain\.com
RewriteRule (.*) [domain.com...] [R=301,L]
</IfModule>
However, this does not seem to redirect /index.php to the root directory. I was unable to figure this one out... HELP?
Then I saw discussion of code to 301-redirect all pages generated by WordPress (e.g. www.domain.com/page1, www.domain.com/category/page1) to the same URL with a / at the end, because of duplicate-content issues, without rewriting real files such as wp-admin.php or wp-login.php, or any other physical file on the server. A plain 301 redirect from page to page/ would be great, again without affecting physical files... Again, HELP?
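To make the trailing-slash request concrete, here is a minimal Python sketch of the decision being asked for. It is not mod_rewrite code. The function name `trailing_slash_target` and the `physical_files` set are hypothetical, standing in for a real "does this file exist on disk" check (in .htaccess that check is usually written as a `RewriteCond %{REQUEST_FILENAME} !-f` test).

```python
def trailing_slash_target(path, physical_files):
    """Return the path to 301-redirect to, or None if no redirect is needed.

    physical_files is a hypothetical set of paths that exist as real files
    on the server; a real setup would ask the filesystem instead.
    """
    if path.endswith("/"):
        return None          # already has the trailing slash
    if path in physical_files:
        return None          # real file such as /wp-login.php: leave alone
    return path + "/"        # WordPress-generated URL: add the slash


# Hypothetical list of real files for the example.
files = {"/wp-login.php", "/index.php"}

page_target = trailing_slash_target("/category/page1", files)
login_target = trailing_slash_target("/wp-login.php", files)
slashed_target = trailing_slash_target("/page1/", files)
```

Under these assumptions only `/category/page1` is redirected (to `/category/page1/`), while the physical file and the already-slashed URL are left untouched.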
3) Lastly, there is the robots.txt file. Lots of people do things differently. I found the following, which seemed the most comprehensive, but is it overkill?
User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /z/j/
Disallow: /z/c/
Disallow: /stats/
Disallow: /dh_
Disallow: /about/
Disallow: /contact/
Disallow: /tag/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /contact
Disallow: /manual
Disallow: /manual/*
Disallow: /phpmanual/
Disallow: /category/
User-agent: Googlebot
# disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
# disallow all files with ? in url
Disallow: /*?*
# disable duggmirror
User-agent: duggmirror
Disallow: /
# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*
# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
Thanks in advance for any advice and help! Everyone on this forum is an invaluable wealth of information.
More information is needed on this point: Do you mean that example.com/index.php is not redirected to www.example.com/index.php, or do you mean that example.com/ is not redirected to www.example.com/ ? Or both? Or something else?
This might be caused by an error in your rule order (if you didn't post all of the rules that might affect this problem), or it might be an error in your server configuration where PHP is configured to execute before mod_rewrite, and therefore, PHP files cannot be rewritten or redirected by mod_rewrite.
Jim
Does the first User-agent: * record apply to Googlebot as well? Since it's a wildcard, could I just add the second set of rules to it, as follows:
User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /z/j/
Disallow: /z/c/
Disallow: /stats/
Disallow: /dh_
Disallow: /about/
Disallow: /contact/
Disallow: /tag/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /contact
Disallow: /manual
Disallow: /manual/*
Disallow: /phpmanual/
Disallow: /category/
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Or do I NEED a second rule specifically for Googlebot?
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]
Also, this code should be placed above your generic domain redirect in order to avoid multiple "chained" redirects in the case where example.com/index.php is requested and needs to be redirected to www.example.com/ -- You want to avoid having to use two redirects to do that.
Jim
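The point about rule order can be illustrated with a toy model in Python. This is only a sketch of "first matching rule wins, then the browser re-requests the new URL", not a model of mod_rewrite itself. The names `first_redirect` and `count_hops` are hypothetical, and `www.example.com` is assumed to be the canonical host.

```python
from urllib.parse import urlsplit

CANONICAL_HOST = "www.example.com"  # assumption: the www host is canonical


def first_redirect(host, path, index_rule_first):
    """Return the target of the first matching redirect rule, or None."""
    def index_rule():
        # Mirrors the intent of the index.php rule: send it to the root.
        if path == "/index.php":
            return "http://" + CANONICAL_HOST + "/"
        return None

    def host_rule():
        # Mirrors the generic non-www to www redirect.
        if host != CANONICAL_HOST:
            return "http://" + CANONICAL_HOST + path
        return None

    rules = [index_rule, host_rule] if index_rule_first else [host_rule, index_rule]
    for rule in rules:
        target = rule()
        if target is not None:
            return target
    return None


def count_hops(host, path, index_rule_first):
    """Follow redirects until none applies and count how many were needed."""
    hops = 0
    while True:
        target = first_redirect(host, path, index_rule_first)
        if target is None:
            return hops
        hops += 1
        parts = urlsplit(target)
        host, path = parts.netloc, parts.path or "/"


hops_index_first = count_hops("example.com", "/index.php", True)
hops_host_first = count_hops("example.com", "/index.php", False)
```

In this model a request for example.com/index.php needs one redirect when the index.php rule comes first and two when the generic host rule comes first, which is the chained redirect described above.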
By placing this code "above" the generic redirect, do you mean above the following code (which redirects non-www to www):
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} ^domain\.com
RewriteRule (.*) [domain.com...] [R=301,L]
</IfModule>
No. The User-agent: * section does not also apply to Googlebot in that file.
If there is a section for Googlebot then Google reads ONLY the Googlebot section.
See this earlier thread for more: [webmasterworld.com...]
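The "only the most specific section is read" behavior can be tried out with Python's standard `urllib.robotparser`. This is a sketch under assumptions: Python's parser is only a stand-in for Google's own, it does not understand the `*` and `$` wildcard extensions used in the file above, and so the sample rules below are simplified to plain path prefixes. `SomeOtherBot` is a made-up user-agent name.

```python
from urllib.robotparser import RobotFileParser

# Simplified sample: one generic section and one Googlebot section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /category/

User-agent: Googlebot
Disallow: /tag/
"""


def build_parser(text):
    """Parse robots.txt text from a string instead of fetching a URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Googlebot has its own section, so the generic /category/ rule is ignored.
googlebot_category = parser.can_fetch("Googlebot", "/category/news/")
googlebot_tag = parser.can_fetch("Googlebot", "/tag/seo/")

# A bot without its own section falls back to the User-agent: * rules.
other_category = parser.can_fetch("SomeOtherBot", "/category/news/")
```

With this parser, Googlebot is allowed into /category/ but blocked from /tag/, while the other bot is blocked from /category/. So any generic rule that should also bind Googlebot has to be repeated inside the Googlebot section.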