Redirect all 'html' to 'php'

A bug in my htaccess?

         

fish_eye

11:37 pm on Aug 1, 2005 (gmt 0)

10+ Year Member



I recently changed every 'htm' and 'html' page on a site to 'php'. I have the following code in the htaccess:

# add the www
RewriteCond %{HTTP_HOST} ^example\.com\.au
RewriteRule ^(.*)$ http://www.example.com.au/$1 [R=301]
# (temp) redirect all htm to php
RewriteCond %{REQUEST_URI} ^(.*)\.htm.*
RewriteRule ^(.*)\.htm.* http://www.example.com.au/$1\.php [R=301]

I've noticed that Slurp has recently been getting 404s on http://www.example.com.au/http://www.example.com.au/widgets.php

It's happily reading the robots.txt - but I think that every other request from its robot in this time period has had the doubled-up http.

In the same time period Google has had "11+2" hits (2 robots.txt, 11 others) and has had no 404s.

When I was mucking around with the htaccess I probably had a bug that caused the double up on the http and this was probably only there for about 10 minutes.

Is the code above okay - and therefore does it seem likely that slurp just happened to crawl in that 10 minute window? I do not have access to logs that are older than 24 hours (something I will address).

PS. I remember reading somewhere that a technique similar to the above does not retain Google PR - is that so?

jdMorgan

12:04 am on Aug 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You don't need a RewriteCond on this:

# (temp) redirect all htm to php
RewriteRule ^([^.]+)\.html?$ http://www.example.com.au/$1.php [R=301,L]

Yes, it's likely that Slurp picked up some bogus URLs while you were testing. You could always detect and redirect those URLs if Slurp carries on for more than a few weeks.
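For reference, here is a minimal sketch of how the two redirects might sit together in one .htaccess. The `RewriteEngine On` line, the `[NC]` flag, the `[L]` on the www rule, and the rule order are assumptions added for illustration, not part of the posted code. One reason `[L]` matters: a rule flagged `[R]` without `[L]` passes its rewritten URL on to the rules that follow, and the original second pattern `^(.*)\.htm.*` is loose enough to match a full `http://...` string. That is one way a doubled-up URL could arise for requests arriving on the non-www host.

```apache
RewriteEngine On

# Redirect .htm/.html to .php first, so a non-www request for an old
# page reaches its final URL in a single hop.
RewriteRule ^([^.]+)\.html?$ http://www.example.com.au/$1.php [R=301,L]

# Add the www for everything else. [L] stops processing here, so the
# redirect target is not handed on to any later rule.
RewriteCond %{HTTP_HOST} ^example\.com\.au [NC]
RewriteRule ^(.*)$ http://www.example.com.au/$1 [R=301,L]
```

Note that `[^.]+` does not match across a dot, so a path with a dot in a directory name would not be redirected by the first rule. The same property keeps it from matching a string that already starts with `http://www.example.com.au/`.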

Jim

fish_eye

12:18 am on Aug 2, 2005 (gmt 0)

10+ Year Member



I noticed the missing 'L' just after I posted - thanks Jim.

I guess I could put this in for a couple of weeks - sheesh!


RewriteRule ^\/http\:\/\/www\.example\.com\.au\/(.*)$ http://www.example.com.au/$1 [R=301,L]

jdMorgan

12:52 am on Aug 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You're getting way carried away with the escaping rules for mod_rewrite... :)

RewriteRule ^http://www\.example\.com\.au/(.*)$ http://www.example.com.au/$1 [R=301,L]

It might be worthwhile -- Slurp has a bad reputation for asking for removed (404/410) pages... Sometimes for up to a year.
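A slightly more tolerant variant of the cleanup rule, offered only as a sketch. It assumes .htaccess (per-directory) context, where the leading slash is stripped before the pattern is tested, and it assumes the rule sits above the other redirects. The `/+` is an addition not in the rule above: it accepts one or more slashes after `http:`, in case the server has merged repeated slashes in the path before matching.

```apache
# Assumption: placed above the other redirect rules in the same .htaccess.
# Catches requests whose path begins with a stray copy of the site URL
# and redirects them to the intended page.
RewriteRule ^http:/+www\.example\.com\.au/(.*)$ http://www.example.com.au/$1 [R=301,L]
```

Once the bogus requests stop showing up in the logs, the rule can simply be deleted.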

Jim

fish_eye

1:06 am on Aug 2, 2005 (gmt 0)

10+ Year Member



Yes - I've noticed the one year delay :(

Yes - enjoyed all those slashes - was not sure ... but I KNEW you'd correct them - thanks.

Yes - I noticed the first slash (escaped or otherwise) was not required.

Many thanks again Jim.

Sam.

jdMorgan

1:12 am on Aug 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> but I KNEW you'd correct them - thanks.

:) I have to... It's self-defense. Otherwise, they're all over the Web within a year, and we start getting questions about them here. Believe me, I have had my very own code typos come back to haunt me repeatedly in threads here. :o

Jim

fish_eye

5:27 am on Aug 2, 2005 (gmt 0)

10+ Year Member



Geez - one year with an extra line in my htaccess just because I stuffed up for less than ten minutes. :(

Any comment on the use of this (sledgehammer htm->php) technique for search indexes and PR (etc.)?