
Apache Web Server Forum

    
Lost in Wordpress tweaks and code
Need help tweaking .htaccess and robots.txt
chazeo




msg:3328959
 6:42 pm on May 2, 2007 (gmt 0)

OK, I've spent the last 4 hours reading the various topics relating to WordPress and the modifications needed to avoid duplicate content penalties and subsequent supplemental indexing. (Whew! A lot of info to process)

Anyway, I still feel uncertain as to whether or not I've found the "definitive" answers I so desperately want.

1) Noindex code for WordPress: From what I've read, the first thing to do is control which URLs are indexed. We want only the home page, pages, and posts to be indexed. This is accomplished by adding the following code to header.php:

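<?php if (is_home() || is_single() || is_page()) {
    // Home page, single posts, and static pages: allow indexing.
    echo "<meta name=\"robots\" content=\"index,follow\">";
} else {
    // Everything else (archives, categories, tags, search results): keep out of the index.
    echo "<meta name=\"robots\" content=\"noindex,follow\">";
} ?>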

2) 301 code to modify the WordPress .htaccess code: Our goal here is to resolve any canonical issues with a 301 redirect...

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} ^domain\.com
RewriteRule (.*) [domain.com...] [R=301,L]
</IfModule>

However, this does not seem to redirect the /index.php to the root directory. I was unable to figure this one out...HELP?

Then I saw discussion of code to redirect all URLs generated by WordPress (e.g. www.domain.com/page1 or www.domain.com/category/page1) to the same URLs with a / at the end, because of duplicate content issues, but without touching real files such as wp-admin.php or wp-login.php, or any other physical file on the server. A plain 301 redirect from page to page/ would be great, again without affecting physical files...Again, HELP?

3) Lastly, there is the robots.txt file. Lots of people do things differently. I found the following, which seemed the most comprehensive, but is it overkill?

User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /z/j/
Disallow: /z/c/
Disallow: /stats/
Disallow: /dh_
Disallow: /about/
Disallow: /contact/
Disallow: /tag/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /contact
Disallow: /manual
Disallow: /manual/*
Disallow: /phpmanual/
Disallow: /category/

User-agent: Googlebot
# disallow all files ending with these extensions
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$

# disallow all files with ? in url
Disallow: /*?*

# disable duggmirror
User-agent: duggmirror
Disallow: /

# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*

# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

Thanks in advance for any advice and help! Everyone on this forum is an invaluable wealth of information.

 

g1smd




msg:3329293
 11:22 pm on May 2, 2007 (gmt 0)

If you want Googlebot to keep away from the URLs listed under the User-Agent: * heading you need to replicate those URL rules in the User-Agent: Googlebot section too.

If there is a User-Agent: Googlebot section in the robots.txt file then Googlebot reads ONLY that section.

g1smd




msg:3329297
 11:25 pm on May 2, 2007 (gmt 0)

Disallow: /manual

The above rule disallows any URL that begins with /manual, which makes the following rule redundant:

Disallow: /manual/*

jdMorgan




msg:3329751
 1:44 pm on May 3, 2007 (gmt 0)

> However, this does not seem to redirect the /index.php to the root directory. I was unable to figure this one out...HELP?

More information is needed on this point: Do you mean that example.com/index.php is not redirected to www.example.com/index.php, or do you mean that example.com/ is not redirected to www.example.com/ ? Or both? Or something else?

This might be caused by an error in your rule order (if you didn't post all of the rules that might affect this problem), or it might be an error in your server configuration where PHP is configured to execute before mod_rewrite, and therefore, PHP files cannot be rewritten or redirected by mod_rewrite.

Jim

chazeo




msg:3329933
 4:35 pm on May 3, 2007 (gmt 0)

G1smd,

Does the first user-agent: * command apply to Google-bot as well? Since it's a wildcard, could I just add the second set of rules to it, as follows:

User-agent: *
# disallow all files in these directories
Disallow: /cgi-bin/
Disallow: /z/j/
Disallow: /z/c/
Disallow: /stats/
Disallow: /dh_
Disallow: /about/
Disallow: /contact/
Disallow: /tag/
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /contact
Disallow: /manual
Disallow: /manual/*
Disallow: /phpmanual/
Disallow: /category/
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.wmv$
Disallow: /*.cgi$
Disallow: /*.xhtml$

Or do I NEED a second rule specifically for Googlebot?

chazeo




msg:3329934
 4:37 pm on May 3, 2007 (gmt 0)

JD,

What I meant is that I want www.example.com/index.php to redirect to www.example.com.

Hope that clarifies things.

jdMorgan




msg:3330001
 5:36 pm on May 3, 2007 (gmt 0)

Ah, well there was no code whatsoever to do that in your post, so I assumed something else was going on...

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]

This can be adapted to redirect example.com/<any-directory>/index.php to www.example.com/<any-directory>/ as well; we've posted the code here dozens of times if you need it.
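
For readers who want the any-directory variant, a minimal sketch follows. It is one possible adaptation of the two lines above, not necessarily the exact code posted elsewhere on this forum. It assumes the rules sit in the .htaccess file in the document root and that www.example.com is the canonical hostname.

```apache
# Sketch only: assumes a root-level .htaccess and www.example.com as the canonical host.
# Match only direct client requests for [any-directory/]index.php. THE_REQUEST holds
# the original request line, so internal rewrites to index.php do not match here.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php
# Capture the directory path (possibly empty) and redirect to it with index.php removed.
RewriteRule ^(([^/]+/)*)index\.php$ http://www.example.com/$1 [R=301,L]
```

Under those assumptions, a request for example.com/blog/index.php would be sent to http://www.example.com/blog/ in a single redirect, because the target already names the www host.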

Also, this code should be placed above your generic domain redirect in order to avoid multiple "chained" redirects in the case where example.com/index.php is requested and needs to be redirected to www.example.com/ -- You want to avoid having to use two redirects to do that.

Jim

chazeo




msg:3330163
 9:10 pm on May 3, 2007 (gmt 0)

JD,

By placing this code "above" the generic redirect, do you mean above the following code (which redirects non-www to www)?

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} ^domain\.com
RewriteRule (.*) [domain.com...] [R=301,L]
</IfModule>

g1smd




msg:3330192
 10:01 pm on May 3, 2007 (gmt 0)

>> Does the first user-agent: * command apply to Google-bot as well? <<

No. It does not.

If there is a section for Googlebot then Google reads ONLY the Googlebot section.

See this earlier thread for more: [webmasterworld.com...]

jdMorgan




msg:3330359
 3:09 am on May 4, 2007 (gmt 0)

Put the new index redirection code after the RewriteBase directive in your existing code, and before the first RewriteCond.
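
As a rough sketch of that placement, the combined block might look like the following. The target URL of the generic domain redirect is an assumption (the URL in the original post was truncated by the forum software), as is the use of example.com in place of domain.com.

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

# 1) Specific rule first: redirect direct client requests for /index.php
#    straight to the canonical root URL on the www host.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]

# 2) Generic rule second: redirect the non-www host to www.
#    The target below is assumed; it is not taken from the original post.
RewriteCond %{HTTP_HOST} ^example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
</IfModule>
```

With this order, a request for example.com/index.php matches the first rule and goes to http://www.example.com/ in one hop. With the order reversed, it would first be redirected to www.example.com/index.php and then again to www.example.com/.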

Jim

g1smd




msg:3331364
 12:01 am on May 5, 2007 (gmt 0)

Yes, redirect index pages first, but make sure the target URL of that redirect already includes the domain you want to end up on.

Cater for all other cases only after the specific cases have been handled. You must avoid creating a redirection chain.

