
Apache Web Server Forum

    
permanent 301 redirect - shtml to php without extension.
mavi - 3:12 am on Apr 29, 2012 (gmt 0)

Hi there.

I just changed my site from shtml to php and used this code to hide the php extension.

RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^(.*)$ $1.php

But I would now like to redirect ALL shtml files to these non-extension php files.

For example:
myfile.shtml should permanently 301 redirect to
myfile

Any suggestions would be gr8!
thanx
m.

 

lucy24 - 3:45 am on Apr 29, 2012 (gmt 0)

OK, what have you tried so far and what are the results?

Yup. This is one of those nasty sadistic forums where we force you to do it yourself, with nudges where needed. Kinda like forcing your kids to clean their own rooms even though you could do it yourself in a quarter the time.

mavi - 3:59 am on Apr 29, 2012 (gmt 0)

I've tried this:

# first remove the .php
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^(.*)$ $1.php

# then redirect shtml to no-extension
RewriteRule ^(.+)\.shtml$ $1 [L,NC,R=301]


Both work independently, but they don't work together...
I guess I'm creating endless redirects?
I've also tried redirecting to no-extension first.

g1smd - 8:12 am on Apr 29, 2012 (gmt 0)

It's too late to redirect a request to a different URL once you have rewritten the request to fetch the content from a non-default location on the server hard drive.

The rules should be redirect first, rewrite second.

The redirect should include the target domain name. Additionally, (.+) is too general a pattern. You need ([^.]+) here.

The rewrite with (.*) pattern captures all requests. The -f condition forces the server to check the hard drive to see if this request matches a real file. This takes forever. If it's not a file the request is then rewritten and the pointer now points at a .php filename. mod_rewrite then checks the htaccess file again, and the (.*) pattern again captures this request and the hard-drive is once again checked to see if this request matches a real file. This takes forever. The request does now match a real file and mod_rewrite exits. The content handler now takes over and checks to see if this file exists. It does, and the file is fetched and served.

The -f check is very inefficient: the hard drive is checked several times per request, and the (.*) pattern captures all requests. Since you know you want to rewrite only extensionless requests, change the rule pattern to match only extensionless requests (something like ([^/.]+) or (([^/]+/)*[^/.]+) will do it) and delete the -f condition.

Add a non-www to www redirect after the shtml redirect and before the internal rewrite.
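In outline, the ordering described above looks like this (a sketch only; example.com is a placeholder and the patterns are the ones suggested in this thread, not tested against your site):

```apache
# 1. External 301 redirect first: old .shtml URL -> extensionless URL (full host in target)
RewriteRule ^([^.]+)\.shtml$ http://www.example.com/$1 [R=301,L]

# 2. Then the non-www to www canonical redirect
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# 3. Internal rewrite last: extensionless request -> the real .php file
RewriteRule ^(([^/]+/)*[^/.]+)$ /$1.php [L]
```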

lucy24 - 9:32 pm on Apr 29, 2012 (gmt 0)

To be sure I've got this right:

All your pages used to be .shtml. (I'm going to have trouble with this. My eyes see "shtml" but my brain processes "https" and says "Huh? That's not an extension!" Oops.)

You still have the same pages in the same places, only now they have the .php extension.

You want to hide the extension so all the user sees is pagename-- which is the same as always, except for the extension.

Did I get that right?

One thing that's easy to forget is that everything passes through htaccess. Not just the URLs your user types in or clicks on, but also your stylesheets and images and javascript and behind-the-scenes php/html/whatever. But probably 95% of your rules only need to apply to pages, because if people are blocked or redirected from pages, they'll never get as far as asking for the wrong image in the wrong place.

So always constrain your rules as tightly as possible. If it applies only to one extension, or only to directories or only to extensionless requests, put that part in the Rule itself. That way the server won't even need to look at the condition most of the time.

There is some boilerplate on the redirect-to-rewrite two-step elsewhere in this Forum. You can search for it and you'll see the basic pattern:

RewriteCond looking at THE_REQUEST
RewriteRule to create pretty URL, ending in [R=301,L]

followed by

RewriteRule to get content from "real" page, ending in [L]
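Made concrete, that redirect-to-rewrite two-step might look like this (a sketch; example.com is a placeholder and the top-level-only pattern in the rewrite is an assumption):

```apache
# Redirect: when the browser asked for the .php name directly, 301 to the pretty URL
RewriteCond %{THE_REQUEST} \.php\ HTTP
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]

# Rewrite: internally map the pretty (extensionless) URL to the real file
RewriteRule ^([^/.]+)$ /$1.php [L]
```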

mavi - 11:23 pm on Apr 29, 2012 (gmt 0)

@g1smd thanx. It almost worked this way, but with (([^/]+/)*[^/.]+) people were able to see my server path.

@lucy24: Yes, the same pages/filenames still exist. I was using shtml because I used to use SSI commands, and now it's all php. And since I also want to get rid of the extensions, I want to avoid parsing shtml as php (which would have been another option).

So in the end I did this (below), and it works.
But is it correct? Am I slowing my server down, or is my code more complicated than it should be? Again, I'm not using (([^/]+/)*[^/.]+) because it shows my server path.



# parse file work as file.php
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^(.*)$ $1.php

# force redirect of /dir/foo.shtml to /dir/foo
RewriteCond %{THE_REQUEST} ^GET\s.+\.shtml [NC]
RewriteRule ^(.+)\.shtml$ /$1 [R=301,L,NC]

# non-www to www
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\.domain\.com$
RewriteRule ^(.*) [domain.com...] [R=301,L]

g1smd - 12:15 am on Apr 30, 2012 (gmt 0)

You ignored the instructions to place redirects first, rewrites last. That's why "people can see your server path".

The redirects should each have the protocol and domain name in the target. That means both of them.

The -f condition takes hundreds, perhaps thousands, of times longer than processing the RegEx pattern that I suggested.

These are three of the most common errors seen in htaccess configuration.

Additionally, please use example.com in code snippets in this forum. That stops the auto-linking of the URL.

lucy24 - 1:51 am on Apr 30, 2012 (gmt 0)

RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^(.*)$ $1.php


User-agent requests
directory/images/prettypic.jpg

The RewriteRule applies up front to all requests, internal and external, with or without slash or extension, so the request gets bumped up to the Condition. The Condition evaluates the request:

Is there such a file as

directory/images/prettypic.jpg.php ?

Nope, we can skip this rule.

This step happens for every single image and stylesheet and external .js file and supplementary php file and requests for robots.txt and sitemap.xml and ... well, you get the idea.

Never use [NC] when you will be capturing and reusing. Depending on your server, you will end up with either Duplicate Content or a 404.

To be safe, you should redirect both .shtml and .php extensions. This will protect you against type-ins and against any links that you put up before you decided to go extensionless-- as well as against Senior Moments. ("Why do I keep getting these ### requests for directory/index.html? Oh. Because in one place on one page I absent-mindedly linked to it.") You can shove it all into a single Rule-plus-Condition:

RewriteCond %{THE_REQUEST} \.(php|shtml)\ HTTP
RewriteRule ^([^.]+)\.(php|shtml)$ http://www.example.com/$1 [R=301,L]

(I put in the space-plus-HTTP part to ensure that the extension came at the end of the request, as in the Rule. Note that literal spaces always have to be escaped in mod_rewrite, since a space by itself has syntactic meaning.)

If you don't include the full protocol and domain in each Redirect target, mod_rewrite will use whatever form came with the request. If it was the wrong form, the user will then get redirected twice. This creates extra work for the server and annoys search engines.
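The double redirect warned about above looks like this when the host is omitted (a sketch; example.com is a placeholder, and the second hop comes from a separate non-www rule):

```apache
# Omitting the host in the target redirects relative to whatever host was requested:
#   http://example.com/foo.shtml -> http://example.com/foo -> http://www.example.com/foo  (two hops)
# Including the full protocol and host collapses the chain to a single hop:
RewriteRule ^([^.]+)\.(php|shtml)$ http://www.example.com/$1 [R=301,L]
```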

mavi - 4:08 am on Apr 30, 2012 (gmt 0)

Hi again.

I hope this is an improvement now :)

# force redirect of shtml to no-extension
RewriteCond %{THE_REQUEST} ^GET\s.+\.shtml
RewriteRule ^(([^/]+/)*[^/.]+)\.shtml$ http://www.example.com/$1 [R=301,L]

# non-www to www
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^(([^/]+/)*[^/.]+) http://www.example.com/$1 [R=301,L]

# parse file as file.php
RewriteCond %{REQUEST_FILENAME}\.php -f
RewriteRule ^(([^/]+/)*[^/.]+)$ $1.php


Removed the NC,
put the non-www to www redirect in the middle, and
added the domain name to the shtml redirect. Should I also add it to the rewrite rule in "# parse file as file.php", using (([^/]+/)*[^/.]+)?
PROBLEM: When I remove the -f I get an internal server error.

mavi - 4:13 am on Apr 30, 2012 (gmt 0)

BTW: should the non-www rewrite rule use (.*)?
BTW2: All my shtml files are gone now.

g1smd - 6:29 am on Apr 30, 2012 (gmt 0)

That's looking a bit better. :)

The non-www/www rule should be for (.*) meaning all non-www redirected to www.

In the rewrite target, $1.php should be /$1.php [L]

The domain name is needed only on the two redirects. Don't add it to the rewrite, otherwise you'll turn it into a redirect.

Clear your browser cache before you test again.

lucy24 - 8:18 am on Apr 30, 2012 (gmt 0)

PROBLEM: When I remove the -f i get an internal server error

I hope you mean just the -f, not the whole Condition. You'd certainly expect an error if you remove the "matching" half of the condition!

But you don't need a Condition here at all. You're simply making the server do the same work twice-- first while evaluating the Condition, and then later when it goes to the page for real. Go ahead and let it stick a php onto the extensionless request. If the file doesn't exist you'll end up with the same 404 either way, with less work for the server.

To clarify: the / in front of the rewrite target is not necessary to make the rule work. It's a security precaution.
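Putting that advice together, the extensionless rewrite can drop the condition entirely (a sketch; the pattern is the one suggested earlier in this thread):

```apache
# No -f check: just append .php internally. A missing file still produces the
# same 404 either way, but the server skips the extra disk lookup per request.
RewriteRule ^(([^/]+/)*[^/.]+)$ /$1.php [L]
```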

mavi - 2:31 pm on Apr 30, 2012 (gmt 0)

THANX for all the help with this !
It works gr8.

One more question though: what's the ^GET\s.+ part for? Should I just use
RewriteCond %{THE_REQUEST} \.(php|shtml)\ HTTP
...because both work.

And is there some kind of rule for where I should put error docs and normal 301 redirects? I have about 200 which are over 5 years old.

RewriteCond %{THE_REQUEST} ^GET\s.+\.(php|shtml)
RewriteRule ^(([^/]+/)*[^/.]+)\.(php|shtml)$ http://www.example.com/$1 [R=301,L]

RewriteBase /
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule ^(.*) http://www.example.com/$1 [R=301,L]

RewriteRule ^(([^/]+/)*[^/.]+)$ /$1.php

mavi - 3:49 pm on Apr 30, 2012 (gmt 0)

BTW, "sometimes" the shtml doesn't disappear when I type a URL with shtml and hit enter ... maybe 5 percent of the time. Is that normal?

g1smd - 7:36 pm on Apr 30, 2012 (gmt 0)

Use Live HTTP Headers to check whether a 301 redirect occurs or something else happens.

In place of
^GET\s.+\.(php|shtml) I would use ^[A-Z]{3,9}\ /[^.]+\.(php|shtml)\ HTTP/

You don't need the RewriteBase line at all, certainly not in the middle of the file.

You need the [L] flag on every rule.

Whatever you mean by "normal redirects", they should all use RewriteRule syntax and appear at the top of the file.
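The stricter condition suggested above reads like this in context (a sketch; the rule line repeats the one already in this thread, with example.com as a placeholder):

```apache
# Any HTTP method (GET, POST, HEAD, ...), then a path containing no dot until
# the extension, then .php or .shtml, then the protocol marker at the end.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+\.(php|shtml)\ HTTP/
RewriteRule ^(([^/]+/)*[^/.]+)\.(php|shtml)$ http://www.example.com/$1 [R=301,L]
```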

mavi - 9:51 pm on Apr 30, 2012 (gmt 0)

By "normal" redirect I meant something like this:

Redirect 301 /folder/file1273.htm /folder/file

I have over 300 5-10 year old 301 redirects like this and there is no way I can simplify them. So should I also use RewriteRule for them and move them up?

g1smd - 9:57 pm on Apr 30, 2012 (gmt 0)

Yes. Never mix mod_rewrite directives (RewriteRule) with mod_alias directives (Redirect and RedirectMatch) in the same site.

Use RewriteRule for all of the rules.
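For the example quoted above, the conversion from mod_alias to mod_rewrite would look something like this (a sketch; example.com is a placeholder for your canonical host):

```apache
# mod_alias form (old):
#   Redirect 301 /folder/file1273.htm /folder/file
# mod_rewrite equivalent: no leading slash in the pattern (htaccess context),
# the literal dot escaped, and the full host in the target.
RewriteRule ^folder/file1273\.htm$ http://www.example.com/folder/file [R=301,L]
```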

mavi - 10:45 pm on Apr 30, 2012 (gmt 0)

ok THAAANX again for your help ! :)

lucy24 - 1:54 am on May 1, 2012 (gmt 0)

I have over 300 5-10 year old 301 redirects like this and there is no way I can simplify them. So should I also use RewriteRule for them and move them up?

5-10 years old?!?!?! Just how slow on the uptake are your users? If you're waiting for search engines to stop periodically crawling old URLs ... well, I hope you come from a long line of centenarians.

My current rough-and-ready rule, just because I had to come up with something, is:

Redirect (301) for one year (rounded off to the nearest quarter to make it easier for me to keep track).

Change to Gone (410) for another year. If you don't already have a custom 410 page for humans, make one. Or simply send 'em to your 404 page. At the beginning of this period there will be a flurry of activity as the search robots run around screaming Where'd it go? What happened? Something has changed! but it will soon level off.

After the second year, pull the plug. Yank those long-gone suckers out of your htaccess entirely. Again, make sure you have a nice human-friendly 404 page and keep its links up-to-date. The search engines will still come by periodically to sulk. But at this point they are just being silly and can be safely ignored.

That's for pages. For supporting files, three months instead of a year.

Oh, and...

If you have a text editor that speaks RegEx (just about any dialect will do), open a copy of your htaccess. If you are brave, this should even work as an unsupervised global replace.

#1 change . to \.
^(Redirect \d\d\d \S+?[^\\])\. TO $1\\.
#2a now change Redirect to Rewrite
^Redirect(?:Match)? 301 /(.+) TO RewriteRule $1 [R=301,L]
#2b and
^Redirect(?:Match)? 410 /(.+) TO RewriteRule $1 - [G]

Abracadabra :)

For sorting: In general, sort in order of strongest to weakest: [F] before [G] before [R] before your basic Rewrite in [L]. (I stress: in general. There are exceptions.) Within each group, sort from most specific to most general. If a Rule has more than one Condition, start with the one most likely to fail.

If you have a whole bunch of conditionless Rules, like all those former Redirects, you may group them thematically without putting a space after every single one. But always space before and after a Condition(s)-plus-Rule group.
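The strongest-to-weakest sort order described above can be sketched as (illustrative paths only; none of these rules come from the thread's actual site):

```apache
# [F] before [G] before [R] before plain rewrites in [L]:
RewriteRule ^private/ - [F]                                  # 403 Forbidden
RewriteRule ^old-section/ - [G]                              # 410 Gone
RewriteRule ^([^.]+)\.shtml$ http://www.example.com/$1 [R=301,L]  # external redirect
RewriteRule ^(([^/]+/)*[^/.]+)$ /$1.php [L]                  # internal rewrite, last
```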

g1smd - 6:44 am on May 1, 2012 (gmt 0)

I leave 301 redirects in place essentially forever, using the word "permanent" as a hint. Just last month Google asked for some URLs that haven't existed since 1998 - but which (presumably) something somewhere links to and which still bring the odd visitor. The site has had two or three changes of URL structure since then (originally .html, then two versions with parameters, and now extensionless), and the redirects get the visitor to the right page with the minimum of fuss. YMMV.

lucy24 - 7:19 am on May 1, 2012 (gmt 0)

Well, the redirects I'm currently dumping are linked from nowhere ... except other pages from the same long-gone directories. (googlebot looks around in bewilderment: "###! I swear there were links to this page around here somewhere!") And, of course, sitemaps last seen in 2010.

I do have some redirects that will stay forever, but they're for pages that a human might really try to reach by mistake. For assorted historical reasons I have a fair number of directories that contain pages but no index page, so each of those potential requests gets pointed to the appropriate place. (Never, ever the top-level home page.) No point to changing the structure, because then I'd just be redirecting the ### search engines instead :)

It's like a business putting up a sign saying "We've moved." Eventually they will take down the sign. Doesn't mean they've gone back to the old location; it just means that everyone who might be interested should know by now.

mavi - 4:32 pm on May 1, 2012 (gmt 0)

Aha... Well, I think most of these old redirects are very important in my case. They have some very strong backlinks and I keep getting big traffic from my old links. For many of these pages I rank #1 on search engines. To me it looks like, in the age of Facebook and Twitter, there are fewer "real" websites and therefore fewer links from NEW sites (especially in my niche). So I will check which ones are really important, but I think I will keep most of them. And I will try to change to RewriteRule instead of using Redirect 301.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved