Forum Moderators: phranque


Spider Friendly .htaccess

specific patterns...

         

omoutop

6:42 am on Aug 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi to all!

It is well known that .htaccess can be used for many operations: transforming .php pages into .html URLs, redirecting to other pages, etc. In the case of such a transformation, a robots.txt file can also be used to hide the original pages from bots and spiders, so as to avoid duplicate content and lost rankings.

However, are there any specific patterns we can use when writing an .htaccess file that make it more 'spider/bot friendly'?

Thx in advance

jd01

10:15 am on Aug 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi omoutop,

I am not sure I understand the question - are you asking whether the way an .htaccess file is written can impact spidering, or whether you can use .htaccess to take advantage of 'better' URLs?

Please give an example of what you are trying to accomplish.

Justin

omoutop

10:58 am on Aug 11, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



for instance:

//// spider-friendly example ////
//// .htaccess ////
RewriteRule ^(.*)/(.*)/hotels/(.*)-(.*)/hotel-facilities\.htm$ intermediate/hotel-facilities.php?area_name=$1&island_name=$2&island_name=$3&hotel_name=$4 [NC]
//// robots.txt ////
User-agent: *
Disallow: intermediate/hotel-facilities.php
////////////////////

This is a way to create spider-friendly URLs while hiding the source files from spiders, to avoid duplicate content...
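
For example (the area, island and hotel names here are made up for illustration), a request for

www.example.com/greece/crete/hotels/crete-bluepalace/hotel-facilities.htm

would be served internally by

intermediate/hotel-facilities.php?area_name=greece&island_name=crete&island_name=crete&hotel_name=bluepalace

so visitors and spiders only ever see the .htm address.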

I don't know whether RedirectMatch, RedirectPermanent etc. are spider friendly, and if not, how to handle that...

I am just asking if there are specific .htaccess rules which are spider friendly...

Hope I am clearer this time :)

jdMorgan

2:08 am on Aug 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you do an internal rewrite using mod_rewrite as you have shown, then spiders will be completely unaware of the mod_rewrite code. So that's not a problem.

Your robots.txt syntax is incorrect, however. The URL-paths should all start with "/".


Disallow: /intermediate/hotel-facilities.php
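
So the complete robots.txt entry would read:

User-agent: *
Disallow: /intermediate/hotel-facilities.php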

You should be aware that if there are any links on your site or on others that point (or used to point) to your .php pages, then the search engines may list those .php URLs with link text, but no title or description. If you want to remove those .php URLs from the search engines, then delete the robots.txt lines and add a second rule to your mod_rewrite code:


RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /intermediate/hotel-facilities\.php\?area_name=([^&]+)&island_name=([^&]+)&island_name=([^&]+)&hotel_name=([^&]+)
RewriteRule ^intermediate/hotel-facilities\.php$ http://www.example.com/%1/%2/hotels/%3-%4/hotel-facilities.htm? [R=301,L]

This second rule will redirect any direct requests for your .php page to the corresponding .htm page. Search engines will follow the redirect, update their search listings with the .htm URL, and assign the PageRank or link popularity of the .php URL to the .htm page.

Because the new rule tests %{THE_REQUEST}, which is the original URL requested by the browser or robot, it will not interfere with your first rule.

So, the first rule lets your server work with .htm URLs without telling the spiders that you use .php, and the second (new) rule tells the spiders to stop using .php URLs, and use the .htm URLs instead.
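
Put together, the relevant part of the .htaccess file would look something like this (a sketch only - the www.example.com hostname and the parameter names are carried over from the examples above):

RewriteEngine On

# Redirect direct requests for the .php URL to the corresponding .htm URL.
# %{THE_REQUEST} holds the original request line, so this rule never fires
# on URLs produced by the internal rewrite below.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /intermediate/hotel-facilities\.php\?area_name=([^&]+)&island_name=([^&]+)&island_name=([^&]+)&hotel_name=([^&]+)
RewriteRule ^intermediate/hotel-facilities\.php$ http://www.example.com/%1/%2/hotels/%3-%4/hotel-facilities.htm? [R=301,L]

# Internally rewrite the spider-friendly .htm URL to the PHP script that
# actually serves it - the client never sees this happen.
RewriteRule ^(.*)/(.*)/hotels/(.*)-(.*)/hotel-facilities\.htm$ intermediate/hotel-facilities.php?area_name=$1&island_name=$2&island_name=$3&hotel_name=$4 [NC,L]

The redirect is listed first by convention, but because of the %{THE_REQUEST} test either order works.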

The Redirect/RedirectMatch family of directives is not sufficiently flexible to be of use for your needs.
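
To illustrate the limitation (a sketch only): mod_alias tests its pattern against the URL-path alone, never the query string, so the closest RedirectMatch equivalent cannot recover the area, island and hotel names needed to rebuild the .htm URL:

# The pattern below sees only "/intermediate/hotel-facilities.php" - the
# query string is invisible to it, so every request could only be sent to
# a single fixed target:
RedirectMatch 301 ^/intermediate/hotel-facilities\.php$ http://www.example.com/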

[added] Be aware that the RewriteCond should be all on one line, and that there is a real space between the "\" and the "/" characters at the point where you may see a line-wrap above, depending on your screen size. [/added]

Jim

omoutop

6:23 am on Aug 12, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thx Jim for all your info...!
This is exactly what I was asking for..

omoutop

DonMateo

8:03 am on Aug 12, 2005 (gmt 0)

10+ Year Member



Excellent! I was just about to start a new thread asking for the best way to avoid duplicate content penalties when using rewrites. Now there's no need - thanks!