index.php does not redirect to root - help


Jason_Brown

5:18 pm on Oct 4, 2008 (gmt 0)

10+ Year Member



Hi Guys,

I have just converted a site from html to php. I knew that I would have to do some redirection not to lose PR etc, so in my research I found the code I thought I needed.

I created an htaccess file with the following code:

RedirectMatch 301 (.*)\.html$ http://www.example.com$1.php

At first it seemed to have worked. In discussion with another webmaster he noticed that http://www.example.com/index.php was showing and not redirecting to the root .com/

In further discussions I added the next set of code:

RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.html\ HTTP/
RewriteRule ^(.*)index\.html$ http://www.example.com/$1 [R=301,L]

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /.*index\.php\ HTTP/
RewriteRule ^(.*)index\.php$ http://www.example.com/$1 [R=301,L]

I thought this would sort out the index page but it has not.

The site is an addon domain (if that could make a difference). I am unsure if this could cause duplicate content issues or any other issues.

Appreciate any assistance you can offer :)

Thanks
Jason

jdMorgan

7:08 pm on Oct 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Jason,

Welcome to WebmasterWorld!

You could have simply named your .php page files with .html extensions, and then set up your server to parse .html files for PHP using a single directive -- either an AddHandler or an AddType, depending on how PHP is installed on your server. This would have avoided any need for redirection, and would have had no impact whatsoever on your search rankings. As it is, you may now expect a temporary loss of rankings while the search engines find and follow all of your redirects and re-assign your old URLs' rank to your new URLs. The effects vary depending on the current rank of your pages and how often each is crawled, but range from a minor drop for a few days or weeks to being dumped to page 20 of the SERPs for nine months... :(

If you do decide to change your page URLs again in the future, then I'd suggest you take the opportunity to completely remove "file extensions" from your URLs, and then internally rewrite those extensionless URLs to the proper files when requested from your server. In this way, should you ever change your site technology again, your URLs would stay the same and you'd just adjust the (single) RewriteRule to point to a different file extension for extensionless page URLs.
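As a rough illustration of the extensionless approach described above, a minimal sketch follows. It assumes the .htaccess file sits in the site's root folder, that DOCUMENT_ROOT points at that folder, and that pages are currently stored as .php files; none of this comes from Jason's actual setup.

```apache
Options +FollowSymLinks
RewriteEngine on
#
# Internally rewrite extensionless page URLs (e.g. /about or /products/widgets)
# to the matching .php file, but only when that file exists on the server.
# The URL shown to the visitor does not change.
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.php [L]
```

If the site technology changed later, only the ".php" in the condition and the substitution would need adjusting; the published URLs would stay as they are.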

This isn't perfect, but is as close as you can get to following Sir Tim Berners-Lee's directive [w3.org] -- simply put: Never change or remove a URL -- ever.

Sir Tim Berners-Lee is the inventor of the World Wide Web. It doesn't really matter if you believe in his philosophy, but it does matter that the search engines generally do. For best results in search, follow his advice.

Using mod_rewrite on Apache, or ISAPI rewrite on IIS, you can change your filenames all you want, but you never have to change your URLs.

OK, time to get to the code. The rules are not in the proper order, and some efficiency improvements are needed. I'd go with this -- replacing everything you posted above:


Options +FollowSymLinks
RewriteEngine on
#
# Redirect direct client requests for <any-directory-level>/index.html to
# <any-directory-level>/ in canonical domain
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html$ http://www.example.com/$1 [R=301,L]
#
# Redirect direct client requests for <any-directory-level>/index.php to
# <any-directory-level>/ in canonical domain
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://www.example.com/$1 [R=301,L]
#
# Redirect remaining html URLs to PHP URLs in canonical domain
RewriteRule ^(([^/]+/)*[^.]+)\.html$ http://www.example.com/$1.php [R=301,L]
#
# Redirect remaining URLs to canonical domain
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Note that these rules are in order from most-specific to least-specific. If you later add internal rewrites, add them after these external redirects, and put the rewrites in order from most-specific to least-specific as well. The purpose of putting all external redirects first is to avoid having an external redirect 'expose' an internal rewrite to the client. The purpose of going in order from most-specific to least-specific is to avoid executing multiple redirects or rewrites for any single HTTP request. It also helps to prevent unexpected pattern matches and the resulting unexpected execution of the 'wrong' rules.

Also, the reason you don't want to mix mod_alias (RedirectMatch) directives and mod_rewrite directives is that directives are executed in per-module order, and not necessarily in the order you put them into .htaccess. So if you mix directives from different modules, they won't execute in the order you expect them to, and this can lead to "mysterious" problems that are impossible to find if you don't know about the per-module directive processing. So getting rid of the mod_alias Redirects is really part of the same fix described in the previous paragraph.

Now that you've got some code to try, search for and get a copy of the "Live HTTP Headers" add-on for Firefox/Mozilla browsers, and install it. Then make a list of all possible old/wrong URLs, and start testing them one by one. Observe the server response headers using Live HTTP Headers, and make sure you get one and only one 301 redirect from the old/wrong URL to the new/correct one. Check these with both the www and non-www domain, too.

Important: After changing any code on your server, be sure to completely flush your browser cache. Otherwise, your browser will serve your requests from its cache on your local hard drive, and no request will be sent to your server. And if no request is sent to your server, then the new code can have no effect, and your test results will be invalid.

If you have any trouble, post back here with the requested URL, the desired target URL, and the actual result of each test. If you get any server errors, please post the relevant contents of your server access *and* server error logs.

For links to resources about mod_rewrite and regular-expressions and information about this forum itself, see our forum charter. For more example code, see our forum library.

Jim

g1smd

7:08 pm on Oct 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Think very carefully about what you are doing.

What you should be doing is keeping the externally used URLs all the same even though the filename on the server has changed.

However, you can only do that (easily!) if the rest of the URL (folder names, and filename before extension) has stayed the same, or there is a very simple relationship between the old and the new.

For that you either need a rewrite to connect the old URL to the new filename or you need to set up your server so that you use .html filenames but they are then parsed to run their PHP scripted content as if they were .php files.

If the rest of the folder path and filename has completely changed, and there is *not* a very simple relationship between the old and the new, then it isn't going to be easy to keep the same URLs. In that case you will need a redirect to connect requests for the old URL to instead now request the new URL.

Finally, yes, you do need to redirect index filename requests to strip the index part (and force www at the same time for those), and redirect all remaining non-www requests to the www version.

Your code for the index redirect must be first but that example isn't the most efficient. There are some better examples that jdMorgan has provided in recent days. In particular, the .* part (in your example) is greedy and can be more efficiently coded.

You will need a redirect from .html URLs to .php URLs if you are changing all your URLs.

However, done properly, I would have kept all the URLs the same and used a rewrite to connect the .html URL requests to the .php filename on the server, or else set up the server to parse .html files for PHP scripts (and that's just one line of code in .htaccess to make that happen).

The non-www to www redirect is fairly easy, and must be the last of the redirects before your rewrite code (if you have any).

[Heh. jd posted just a couple of seconds before I did.]

Jason_Brown

7:32 pm on Oct 4, 2008 (gmt 0)

10+ Year Member



JD / G1,

Many thanks for your replies.

I'm still trying to absorb the information. My initial reaction is to go back and change all the file extensions and then use the parse you mention.

I only updated the site on Thursday night.

About 30 files so could be done quite quickly. In your opinion would that be the most secure and efficient way?

Many thanks
Jason

ps I was told this was a great forum and you have lived up to your reputation.

g1smd

7:37 pm on Oct 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes. Keep the same URLs if you can. As the change was only a few days ago you can likely get away with it if you act quickly (and especially so if Google hasn't found the new URLs yet) and change them back. Google will have seen the .html URLs returning a 404 error for a few days, and then they and their content magically reappear. That's not a major issue if it is just a few days, site ownership hasn't changed, and the pages are on the same server as before.

Their system will likely simply write it off as a few days of downtime. In fact, it is entirely possible that they didn't even visit your site in those three days, and have no idea about what you did, and will never know if you act quickly.

You'll need to alter the code above a little bit.

.

You will need the index redirect for .../index to .../ just as before (and for all index.php and index.html and index.htm requests).

You'll need to redirect .php URLs to .html URLs just in case something has found those "new" URLs in the few days they were active (that's a reverse of the example above).

You'll need the non-www to www redirect just as before.

You'll need to either parse .html files for the PHP scripts inside, or else internally rewrite .html URL requests to pull the equivalent .php file from the server.
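For the second option in that last point (keeping .php files on the server and rewriting internally), a minimal sketch might look like the following. It is not code from this thread: it assumes the .htaccess file is in the site's root folder and that DOCUMENT_ROOT points there. The index and non-www redirects are left out for brevity; following the ordering advice above, they would stay ahead of the internal rewrite.

```apache
Options +FollowSymLinks
RewriteEngine on
#
# Externally redirect direct client requests for .php URLs to the .html URL.
# Testing THE_REQUEST (the original request line) stops this rule from
# firing again after the internal rewrite below, which would cause a loop.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*[^.]+\.php(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*[^.]+)\.php$ http://www.example.com/$1.html [R=301,L]
#
# Internally rewrite .html URL requests to the equivalent .php file,
# only when that .php file exists on the server
RewriteCond %{DOCUMENT_ROOT}/$1.php -f
RewriteRule ^(([^/]+/)*[^.]+)\.html$ /$1.php [L]
```

If the files are renamed back to .html and parsed for PHP instead, the second rule is not needed at all.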

[Me and jd play tag on this forum.] :-)

Jason_Brown

8:04 pm on Oct 4, 2008 (gmt 0)

10+ Year Member



Just to confirm I only changed the extension, all other folders etc were the same. The redirect worked for all pages, the only issue was the index.php did not redirect to .com/

When you entered the url www.example.com you were sent to www.example.com/index.php

If I understand, if I rename the files then I can use a parse instead of the redirect?

If that is correct could you show me the parse or where is a good source.

If not then I'll use the above code.

Many many thanks
Jason

g1smd

8:13 pm on Oct 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You might need AddType or AddHandler depending on your server setup.

I use this:

# Allow .html .htm extensions to be PHP scripted files:

AddType application/x-httpd-php .html .htm 

You might need something else.

This doesn't really fit your situation, but it might help you with parts of what you need to know: [webmasterworld.com...]

Jason_Brown

10:10 pm on Oct 4, 2008 (gmt 0)

10+ Year Member



Just checked with my hosting support and they have confirmed I need to use

AddHandler application/x-httpd-php .html .htm

I am about to rename all the files back to .html and change the htaccess file.

I think that should resolve all issues with external links and PR.

Just had a thought - in the new design I put a lot of internal links within the content to php pages so I have to redirect all php pages to html - (why did I change it ?)

so can I use

RedirectMatch 301 (.*)\.php$ http://www.example.com$1.html

to rectify that?

I really appreciate your help and will definitely come here before I do anything with any site in the future :)

Thanks again
Jason

jdMorgan

10:31 pm on Oct 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You need to put everything back the way it was: change the files back to .html, and the on-page links back to .html.

Then you'll probably want to redirect requests for .php URLs back to .html, and retain the /index file redirects and domain canonicalization rules as well.

The only rule that changes from what I posted above is the third one: swap the letters "html" and "php" and you're done.

The bottom line is that redirects can only "fix" things if your links are correct: Otherwise, you'll confuse the 'bots, and every request for a .php page on your site as the result of a .php link on your pages will result in a second request, because the first one will get a redirect to .html.

Jim

[edited by: jdMorgan at 10:33 pm (utc) on Oct. 4, 2008]

g1smd

10:48 pm on Oct 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, and to confirm that again - the links on your pages need to point to the .html URLs.

The link to "Home" should be to "/" or to "www.domain.com/" too.

The redirect from .php to .html is just there for anything that has found the .php URLs in the two days they were up - and make them forget about them and use the .html URLs instead.

Don't use the RedirectMatch code you showed above.

Use jd's code with the one change that he mentioned above (in the third rule, swap the words "php" and "html" with each other).

Jason_Brown

11:37 pm on Oct 4, 2008 (gmt 0)

10+ Year Member



Jim,

have just gone through entire site and changed everything :)

php is not working.

my htaccess looks like this.

AddHandler application/x-httpd-php .html .htm

Options +FollowSymLinks
RewriteEngine on

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html$ http://www.example.com/$1 [R=301,L]

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://www.example.com/$1 [R=301,L]

RewriteRule ^(([^/]+/)*[^.]+)\.php$ http://www.example.com/$1.html [R=301,L]

RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) [example...] [R=301,L]

Any suggestions? is it the addhandler in the wrong place?

Thanks
Jason

g1smd

11:45 pm on Oct 4, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I can't see anything wrong with that code, though there is a slightly more efficient way of doing the index redirects - where the first two rules can now be combined into one rule. It is in the other thread that I linked to above.

Can I also get you to add comments before each rule describing what it does... so it will mean something to you in six months time?

.

Can you quickly try the "AddType" version of that PHP-enabling code that I posted, just in case the host is wrong about what you need?

I remember I had to try several variations before I got it to work.

jdMorgan

12:13 am on Oct 5, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Warning: Although combining the rules may be more efficient, it also leaves a "hole": if the URL in THE_REQUEST has been rewritten, the rule can fail.

Jim

Jason_Brown

12:18 am on Oct 5, 2008 (gmt 0)

10+ Year Member



OK...

tried the addtype with still no joy.

With the more efficient code you described in the other thread, I will change to this:

# Force all remaining requests for named index files to drop
# the index file filename, and force www:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.(html?¦php)(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]*/)*)index\.(html?¦php)$ http://www.example.com/$1 [R=301,L]

Good suggestion about description of code, I will try to remember.

I'm now not sure what else to try, apart from going back to support.

Nothing to do with addon ? Clutching at straws now.

Thanks
Jason

g1smd

12:29 am on Oct 5, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Jim reckons there is a problem with using that more efficient code in this application - I'm not sure what that problem is.

As far as I know, there are no rewrites in any of what we are doing.

The final problem, here, is in getting the AddHandler or AddType code to work.

Don't forget to also change the ¦ to be real "pipe" symbols. The forum software does not display them correctly.
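For reference, the combined index rule quoted in the previous post would read as follows once the broken-bar characters are typed as real pipes. Nothing else is changed from what was posted, and Jim's caution about the combined form still applies.

```apache
# Force all requests for named index files to drop
# the index file filename, and force www:
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.(html?|php)(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]*/)*)index\.(html?|php)$ http://www.example.com/$1 [R=301,L]
```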

What is happening now when you try to access your site? Do you see raw HTML or do you see your PHP scripting as well? What does Live HTTP Headers show for the Content-Type returned in the HTTP Header by the server?

Jason_Brown

1:13 am on Oct 5, 2008 (gmt 0)

10+ Year Member



YOU BEAUTY !

needed to make a tiny change:

AddHandler application/x-httpd-php5 .html .htm

Note the php 5

Thank you both very much for your help today.

It's 2.15am here in the UK, and it's the second night in a row I won't get much sleep.

Speak soon

Jason

g1smd

8:27 am on Oct 5, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm in the UK too, and I turned in minutes after my last post - an early night for me!

I completely forgot that sometimes you need php4 or php5 instead of just php - mostly you see that in the URL, and only rarely in the setup line.

Anyway, glad you got it working OK in the end.

.

So, now your site should be working off all the same URLs that it always did and all your links should point to same filenames and same extensions as before - apart from for index files. Those should have a link that ends in a trailing / without the index filename being stated.

Your site should redirect any incoming external non-www URL requests to the www version. If someone requests any URL that includes an index filename (either .php, or .html or .htm) then the site should redirect to strip the index filename from the URL. If someone requests a .php URL (because they found and remembered one from sometime during the two days they did exist) they should be redirected to the .html URL.
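Putting the pieces of this thread together, the behaviour just described corresponds roughly to the file below. It is only a sketch assembled from the rules posted earlier plus the handler name that worked on Jason's host. That handler name is host-specific and may differ elsewhere, and this version does not cover index.htm requests.

```apache
# Parse .html and .htm files for PHP (handler name as confirmed on this host)
AddHandler application/x-httpd-php5 .html .htm
#
Options +FollowSymLinks
RewriteEngine on
#
# Redirect direct client requests for <any-directory-level>/index.html to
# <any-directory-level>/ in canonical domain
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html\ HTTP/
RewriteRule ^(([^/]+/)*)index\.html$ http://www.example.com/$1 [R=301,L]
#
# Redirect direct client requests for <any-directory-level>/index.php to
# <any-directory-level>/ in canonical domain
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.php\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://www.example.com/$1 [R=301,L]
#
# Redirect remaining .php URLs back to .html URLs in canonical domain
RewriteRule ^(([^/]+/)*[^.]+)\.php$ http://www.example.com/$1.html [R=301,L]
#
# Redirect remaining non-www requests to canonical domain
RewriteCond %{HTTP_HOST} ^example\.com [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```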

.

You're not quite finished. The last few things to do are these:

1. Run your site through Xenu LinkSleuth. You need to make sure that all of your internal navigation is perfect - that you never hit a redirect when internally browsing the site, and that all the URLs are www, etc. There are two ways to do things, and I would do both.

Firstly I would just scan the site starting at example.com and then again at www.example.com and see what happens. That will verify your internal linking is correct, but is only part of the story.

Secondly, I would make a simple text file listing a load of test URLs. You can get the basic list from the output of the previous Xenu tests above. The initial list will show all 30 pages of your site as they exist now, as well as images, stylesheets, JavaScript files, and so on.

Copy any URL that ends with a / and duplicate it to end in /index.html and again for /index.php. Add a few URLs that should not exist at all, random letters. Add entries for /robots.txt etc too. Make sure the list also shows your images, stylesheet files, and JavaScript files. You'll now have a list of perhaps 80 URLs, maybe more.

What you now do is copy that list of 80 entries and change .html to .php each time (just use "find and replace" in your text editor), and add all of that back to the main list. You now have a list of 160 URLs. Now copy that list of 160 and this time remove the www from all of the URLs (find "www" replace with ""). Add those all back to the main list. You now have a list of 320 URLs.

Using cut and paste, and find and replace, it should only take a few minutes to build the list. Save this file as a text file. Get Xenu to check all the URLs in this file and make sure that all of the responses are correct: 200 / 301 / 404 etc for each one.

2. (Sign up for, and) Look at the data in Google WebmasterTools a few times per week for the next few weeks to see if Google did find any .php URLs and just keep an eye on how long they take to correct themselves (it can take 6 weeks, or it can take 3 days - it depends).

3. Run site checks to see what Google lists. Use all of these:

http://www.google.co.uk/search?num=100&q=site:www.yoursite.com

http://www.google.co.uk/search?num=100&q=-inurl:www+site:yoursite.com

http://www.google.co.uk/search?num=100&filter=0&q=site:www.yoursite.com

http://www.google.co.uk/search?num=100&filter=0&q=-inurl:www+site:yoursite.com

Be very careful with those searches. Some have www in a different place - that is correct. Just keep an eye on what they list, and how that changes over time.

4. Keep an eye on your log files and/or Google WMT for any URLs that are being requested that issue a 404. I find that I get incoming links from forums with a full stop or comma on the end of the URL. That is caused by the dumb auto-linking parser in their forum software. I set up a redirect to strip the punctuation so that the link now works. There is an example of that in the thread linked to above. You might want to install those two redirects, placed before all the stuff you just worked on yesterday.
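The linked thread's code is not reproduced here, so the following is only a guess at what a punctuation-stripping redirect could look like, covering just the trailing full stop and comma cases mentioned above. It assumes the rule sits in the root .htaccess ahead of the other redirects.

```apache
RewriteEngine on
#
# Redirect URLs that arrive with one or more trailing full stops or commas
# to the same URL without the punctuation, in canonical domain
RewriteRule ^(.*[^.,])[.,]+$ http://www.example.com/$1 [R=301,L]
```

With this in place, a request for /page.html. or /page.html, would be sent to /page.html with a single 301, which is worth confirming with a header checker before relying on it.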