It is well known that .htaccess can be used for many operations: serving php pages under html URLs, redirecting to other pages, etc. When URLs are rewritten this way, a robots.txt can also be used to hide the original pages from bots and spiders, so as to avoid duplicate content and losing rankings.
However, are there any specific patterns that we can use when writing a .htaccess file that make it more 'spider/bot friendly'?
Thx in advance
////spider friendly Example////////
/////////HTACCESS//////////
RewriteRule ^(.*)/(.*)/hotels/(.*)-(.*)/hotel-facilities.htm$ intermediate/hotel-facilities.php?area_name=$1&island_name=$2&island_name=$3&hotel_name=$4 [nc]
/////////ROBOTS////////////////
User-agent: *
Disallow: intermediate/hotel-facilities.php
//////////////////////////////////////////////
This is a way to create spider-friendly URLs while hiding the source files from spiders to avoid duplicate content...
I don't know whether RedirectMatch, RedirectPermanent, etc. are spider friendly, and if they are not, how to handle that...
I am just asking whether there are specific .htaccess rules which are spider friendly...
Hope I am clearer this time :)
Your robots.txt syntax is incorrect, however. The URL-paths should all start with "/".
Disallow: /intermediate/hotel-facilities.php
You should be aware that if there are any links on your site or others that point to (or used to point to) your php pages, then the search engines may list those php URLs with link text, but no title or description. If you want to remove those php URLs from search engines, then delete the robots.txt lines (a disallowed URL is never fetched, so the spiders would never see the redirect), and add a second rule to your mod_rewrite code:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /intermediate/hotel-facilities\.php\?area_name=([^&]+)&island_name=([^&]+)&island_name=([^&]+)&hotel_name=([^&\ ]+)\ HTTP/
RewriteRule ^intermediate/hotel-facilities\.php$ http://www.example.com/%1/%2/hotels/%3-%4/hotel-facilities.htm? [R=301,L]
This second rule will redirect any direct requests for your .php page to the corresponding .htm page. Search engines will follow the redirect, update their search listings with the .htm URL, and assign the PageRank or link popularity of the .php URL to the .htm page.
Because the new rule tests %{THE_REQUEST}, which holds the original request line sent by the browser or robot and is not changed by internal rewrites, it will not interfere with your first rule.
So, the first rule lets your server work with .htm URLs without telling the spiders that you use .php, and the second (new) rule tells the spiders to stop using .php URLs, and use the .htm URLs instead.
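As a rough sketch only, the two rules might sit together in one .htaccess file like this. It assumes the file lives in the web root, that mod_rewrite is enabled, and that www.example.com stands in for your real host name. The redirect is placed first so that direct requests for the .php URL are caught before any internal rewriting. The condition stops the last capture at " HTTP/" so it cannot swallow the protocol part of the request line, and the redirect target ends in "?" so the old query string is not re-appended.

```apache
# Sketch only: www.example.com is a placeholder host name.
RewriteEngine On

# External redirect: direct client requests for the .php URL get a 301
# to the friendly .htm URL.
# %{THE_REQUEST} is the request line exactly as the client sent it, e.g.
#   GET /intermediate/hotel-facilities.php?area_name=a&island_name=b&island_name=c&hotel_name=d HTTP/1.1
# The trailing "?" on the target drops the old query string.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /intermediate/hotel-facilities\.php\?area_name=([^&]+)&island_name=([^&]+)&island_name=([^&]+)&hotel_name=([^&\ ]+)\ HTTP/
RewriteRule ^intermediate/hotel-facilities\.php$ http://www.example.com/%1/%2/hotels/%3-%4/hotel-facilities.htm? [R=301,L]

# Internal rewrite: serve the friendly .htm URL from the .php script.
# The client never sees the .php URL, and %{THE_REQUEST} still shows the
# .htm request line, so the condition above does not match afterwards.
RewriteRule ^(.*)/(.*)/hotels/(.*)-(.*)/hotel-facilities.htm$ intermediate/hotel-facilities.php?area_name=$1&island_name=$2&island_name=$3&hotel_name=$4 [nc]
```

With something like this in place, a direct request for the .php URL should come back as a 301 to the .htm URL, and a request for the .htm URL should return the page itself. It is worth checking both responses with a header checker on a test copy of the site before going live.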
The Redirect/RedirectMatch family of directives is not sufficiently flexible to be of use for your needs: they match only the URL-path and cannot test the query string.
[added] Be aware that the RewriteCond should be all on one line, with a space between the "\" and the "/" characters where you may see a line-wrap above, depending on your screen size. [/added]
Jim