Forum Moderators: phranque

htaccess file suddenly broken

         

bmen

1:27 am on Jun 7, 2012 (gmt 0)

10+ Year Member



I have a website that I wrote and maintain for a client. I am currently hosting it on 1&1. The website was working perfectly up until 3-4 weeks ago. No changes were made to it anywhere in that time frame, the client just noticed that the redirects were no longer working.

The site is set up so that all urls are redirected to a view.php file, which then loads the correct page into the website template using a php include statement.

There are three main test cases:

1) http://www.example.com/factsheets/workplace/industrial-safety
-> should redirect to: /view.php?fs=workplace&p=industrial-safety

2) http://www.example.com/library/topical-index
-> should redirect to: /view.php?f=library&p=topical-index

3) http://www.example.com/factsheets/pdfs/workplace-safety-programming.pdf
-> should NOT redirect

When it first broke (it loaded the view.php page but couldn't find the correct content page to include), I added a PHP echo statement to view.php to see which GET variables were being passed. They were consistently f=reset: and p=view, no matter which URL I tried.

I tried editing the htaccess file to update it and make sure everything was clean, simple, and correct. Now case 1 is working fine. Case 3 was working 5 minutes ago but isn't now: it tries to redirect to the view page. Case 2 is going straight to the '/library/topical-index.php' page that actually holds the content, instead of redirecting to the view.php page so that it loads inside the template. Both htaccess files are below.

Thanks in advance for any help you can provide!


Original htaccess file
(which WAS working. I did call 1&1, but they assured me they hadn't changed anything except updating Apache a little over a month ago; since that was before my site broke, it MUST be my script).

NOTE: the last 8 RewriteRules are the relevant ones.


RewriteEngine on
RewriteBase /

#Exceptions to rewrite rules
RewriteRule ^(phaseout)($|/) - [L]

#Control user navigation quirks (like just deleting parts of the link)
redirect 301 /index.htm http://www.example.com
redirect 301 /default.htm http://www.example.com
redirect 301 /default.html http://www.example.com

RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

RewriteCond %{THE_REQUEST} ^.*/index\.html
RewriteRule ^(.*)index.html$ http://www.example.com/$1 [R=301,L]

#Ensure the website is always addressed as www.example.com
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

ErrorDocument 404 /sitemanagement/404
ErrorDocument 403 /sitemanagement/403

... (a bunch of redirect 301 rules to keep search engine rank from the old site) ...

#Rewrite dynamic URIs as static/flat URIs for user ease and search engine happiness
RewriteRule ^factsheets/([^/\.]+)/([^/.]+)$ view.php?fs=$1&p=$2 [L]
RewriteRule ^factsheets/([^/.]+)/([^/.]+).php$ view.php?fs=$1&p=$2 [L]
RewriteRule ^factsheets/([^/.]+)/([^/.]+).htm$ view.php?fs=$1&p=$2 [L]
RewriteRule ^factsheets/([^/.]+)/([^/.]+).html$ view.php?fs=$1&p=$2 [L]

RewriteRule ^([^/.]+)/([^/.]+)$ view.php?f=$1&p=$2 [L]
RewriteRule ^([^/.]+)/([^/.]+).php$ view.php?f=$1&p=$2 [L]
RewriteRule ^([^/.]+)/([^/.]+).htm$ view.php?f=$1&p=$2 [L]
RewriteRule ^([^/.]+)/([^/.]+).html$ view.php?f=$1&p=$2 [L]





New htaccess file
NOTE: the last 3 RewriteRules are the relevant ones.


RewriteEngine on
RewriteBase /

#Exceptions to rewrite rules
RewriteRule ^(phaseout)($|/) - [L]

#Control user navigation quirks (like just deleting parts of the link)
redirect 301 /index.htm http://www.example.com
redirect 301 /default.htm http://www.example.com
redirect 301 /default.html http://www.example.com

# -> Prevent users from accessing another domain name
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,N]

# -> Prevent users from trying to access the 'index page' of a folder or section
RewriteCond %{THE_REQUEST} ^.*/index\.html
RewriteRule ^(.*)index.html$ http://www.example.com/$1 [R=301,N]

# -> Ensure the website is always addressed as www.example.com
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,N]

#Custom error docs
ErrorDocument 404 /sitemanagement/404
ErrorDocument 403 /sitemanagement/403

#Maintain legacy search engine page ranks by redirecting them to our new pages
...(the same redirect 301 rules as the original htaccess file) ...

#Rewrite dynamic URIs as static/flat URIs for user ease and search engine happiness
RewriteRule ^(\.pdf)$ - [L]
RewriteRule ^factsheets/([^/\.]+)/([^/\.]+)(.*)$ view.php?fs=$1&p=$2 [L]
RewriteRule ^([^/\.]+)/([^/\.]+)(.*)$ view.php?f=$1&p=$2 [L]

[edited by: bmen at 1:54 am (utc) on Jun 7, 2012]

Leosghost

1:34 am on Jun 7, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You might want to read this [webmasterworld.com...]
Especially the 8th bullet point... an admin will be along (on their own time) to "clean up" re the 8th bullet point.

Btw, using 1&1 is not, in most people's experience, a good idea for mission-critical sites... nor even for those of the local kindergarten. Their support* is second to just about everyone's...
* I use the word "support" in a very loose sense here re 1&1... one might go so far as to say a "surrealistic" sense.

bmen

1:58 am on Jun 7, 2012 (gmt 0)

10+ Year Member



I thought I had read the charter, but missed that point. Sorry! I edited my post to remove all specific names, and to remove all the irrelevant redirect statements. Thanks for alerting me to my error.

The lack of support from 1&1 has caused me to switch away from them for new sites. Regrettably, switching hosts isn't much of an option for this customer right now. Any help in pointing out an error in the htaccess code would be much appreciated.

Leosghost

2:20 am on Jun 7, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




Don't worry, you actually went back and edited ..many don't..

re "errors"... I'm afraid that, it being after 04:15 where I am and it having been a very, very long day, any suggestions re errors in the code might very well suffer from bleary eyes and an even blearier brain... and might well (almost certainly) introduce more errors, if errors there be, and not just 1&1's "twilight zone of tech support" normal FUBAR :)

But lucy24 is up and about nearer "left side of the pond" daylight hours... and bright-eyed and scaly-tailed :)... and speaks fluent htaccess... and I even saw g1smd earlier.

So you'll be advised and in good hands..

Meanwhile ..welcome to WebmasterWorld :)

wilderness

2:40 am on Jun 7, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



g1smd and lucy know more about these things than I do, but could the likely reason be a "loop"?

What do your error logs say?

Explore this from the Apache docs:

The [N] flag causes the ruleset to start over again from the top, using the result of the ruleset so far as a starting point. Use with extreme caution, as it may result in loop.
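For comparison, a minimal sketch of the three redirects from the new file with each [N] changed back to [L], as the original (working) file had them. It assumes nothing else in the file changes, and it only removes the start-over-from-the-top behaviour; rule order and the mixed Redirect/RewriteRule issue are separate matters.

```apache
# Same three rules as the new file, only the flag changed from N to L.
# An external redirect should stop processing for this pass; the browser
# then makes a fresh request for the new URL anyway.
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

RewriteCond %{THE_REQUEST} ^.*/index\.html
RewriteRule ^(.*)index.html$ http://www.example.com/$1 [R=301,L]

RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```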

lucy24

5:09 am on Jun 7, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yawp! I would have completely overlooked that [N]. Don't know about you, but I say unless your name is jdmorgan, don't mess with [N]. Or [S]. Or [C]. They will simply make the rule sneak up and bite you behind.

# -> Ensure the website is always addressed as www.example.com
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,N]

This should be your last Redirect, and it should have an [L] flag like all other Redirects. The RewriteCond covers the likeliest condition, but for a truly robust canonicalization rule it should say

:: crossing fingers ::

!^(www\.example\.com)?$
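A sketch of that condition as a complete rule, assuming www.example.com is the canonical host and that this is the last redirect in the file. The `( )?` lets a blank Host header (for example from an HTTP/1.0 client that sends none) match the pattern, so such requests are left alone instead of being redirected over and over.

```apache
# Condition is true for any host other than exactly www.example.com
# or an empty Host header; those requests go to the canonical host.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```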

Your post often says "redirect" when you mean "rewrite" (Apache likes to confuse you by saying "internal redirect"), but the actual rules have it right, so that's the important part.

And speaking of redirect...

redirect 301 /index.htm http://www.example.com


Double yawp! Do not mix mod_rewrite and mod_alias (the directive literally named Redirect) in the same htaccess. You can sometimes do it in the config file when it is your own server and you know beyond the shadow of a doubt what will execute when, but otherwise don't risk it. Especially when the host goes tweaking the configuration and possibly moving to a new version and not telling you what else has been changed.

The "index" bit should in any case be a generic rule, expressed something like

RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1 [R=301,L]

It would normally be your second-to-last redirect, before the final www mop-up.
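One way to fold that generic rule together with the three mod_alias lines (index.htm, default.htm, default.html) and the original file's index.html rule, assuming those four filenames are the only "index" names the site uses. As in the original file's index rule, a RewriteCond on THE_REQUEST limits the redirect to requests where the client itself asked for the file, so a DirectoryIndex lookup that serves index.html internally is not bounced back to the folder URL.

```apache
# Fire only if the browser's own request line named index/default .htm(l),
# at the root or in any folder, with or without a query string.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/\ ]+/)*(index|default)\.html?(\?[^\ ]*)?\ HTTP/
# $1 is the folder path (empty at the root), so /library/index.htm -> /library/
RewriteRule ^(([^/]+/)*)(index|default)\.html?$ http://www.example.com/$1 [R=301,L]
```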

^factsheets/([^/\.]+)/([^/\.]+)(.*)$ view.php?fs=$1&p=$2 [L]

This format-- I've picked one at random-- is bad news. You seem to be saying: Make the first "directory" into the first query, the second "directory" into the second query ... and ignore anything that comes afterward. That "anything" could be a directory slash, or it could be "index.php", or it could be google's beloved "anyoldrandomgarbage.html". Stick around and you will get to hear g1smd reading you the riot act about Infinite URL Space. Figure out exactly what, if anything, comes after the second capture, and write the rule accordingly.
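A sketch of rules anchored so that nothing may follow the second capture, assuming the only page URLs are the two extensionless shapes from test cases 1 and 2 in the first post:

```apache
# /factsheets/<section>/<page>  ->  view.php?fs=<section>&p=<page>
RewriteRule ^factsheets/([^/.]+)/([^/.]+)$ view.php?fs=$1&p=$2 [L]
# /<folder>/<page>  ->  view.php?f=<folder>&p=<page>
RewriteRule ^([^/.]+)/([^/.]+)$ view.php?f=$1&p=$2 [L]
```

Because both captures exclude the dot and the pattern ends at `$`, the PDF in test case 3 matches neither rule and is served as a normal file. A trailing slash, an extra path segment or a stray "index.php" also falls through, so it can get a real 404 rather than becoming another URL for the same page.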

RewriteRule ^factsheets/([^/\.]+)/([^/.]+)$ view.php?fs=$1&p=$2 [L]
RewriteRule ^factsheets/([^/.]+)/([^/.]+).php$ view.php?fs=$1&p=$2 [L]
RewriteRule ^factsheets/([^/.]+)/([^/.]+).htm$ view.php?fs=$1&p=$2 [L]
RewriteRule ^factsheets/([^/.]+)/([^/.]+).html$ view.php?fs=$1&p=$2 [L]

These are all the same rule. If you wanted to use them-- which you don't-- they would collapse to

{blahblah}(\.(php|htm?))?$

This in particular
RewriteRule ^([^/.]+)/([^/.]+).php$ view.php?f=$1&p=$2 [L]

risks an infinite loop. You need a RewriteCond looking at THE_REQUEST to make sure you're not rewriting something that has already been rewritten.
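A sketch of that kind of guard applied to the .php rule quoted above, assuming the rule should fire only when the browser literally requested /folder/page.php. THE_REQUEST keeps the original request line even after internal rewrites, so a path that some other rule rewrote into this shape does not satisfy the condition. The dot before php is escaped here; unescaped, it matches any character.

```apache
# Only when the client's own request line was /folder/page.php (optionally
# followed by a query string), not a path produced by an internal rewrite.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^/.\ ]+/[^/.\ ]+\.php(\?[^\ ]*)?\ HTTP/
RewriteRule ^([^/.]+)/([^/.]+)\.php$ view.php?f=$1&p=$2 [L]
```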

But what you really need most of all is to sort out what your URLs actually look like.

:: looking airily around for g1smd and lecture on Going Extensionless ::

phranque

5:53 am on Jun 7, 2012 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld, bmen!

you've gotten some great advice here and hopefully you can act on that and solve it or try another iteration here soon.

speaking more generally your problem description at each stage should include something more specific than "broke" to describe the HTTP response.
at the least you should be describing the status code and any relevant headers such as the Location: header in this case.
there are tools such as Live HTTP Headers for firefox that will show you the entire request and response for each resource.
also if you have access to your server log files you will often find useful clues in the access log and error log.

{blahblah}(\.(php|htm?))?$

i think that's missing an 'l'.
(how could a linguistic lucy do that?)
should be:

{blahblah}(\.(php|html?))?$
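With the html? correction, each group of four rules from the original file collapses to one. This is only a sketch, and lucy24's caveat still applies: serving the same page with no extension, .php, .htm and .html is exactly what creates several URLs for one page.

```apache
# Optional extension group accepts: none, .php, .htm or .html
RewriteRule ^factsheets/([^/.]+)/([^/.]+)(\.(php|html?))?$ view.php?fs=$1&p=$2 [L]
RewriteRule ^([^/.]+)/([^/.]+)(\.(php|html?))?$ view.php?f=$1&p=$2 [L]
```

$1 and $2 still refer to the folder and page captures; the extension groups become $3 and $4 and are simply not used in the substitution.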

bmen

1:53 am on Jun 8, 2012 (gmt 0)

10+ Year Member



Thanks for the welcome and the help all y'all! I'm digesting and trying to apply it all right now, and will let you know the results as soon as I can.

bmen

1:40 am on Jun 15, 2012 (gmt 0)

10+ Year Member



OK y'all, I've tried heavily editing my htaccess file, and here's what I have simplified and edited it down to at the moment:

RewriteEngine on
RewriteBase /

RewriteRule ^(\.pdf)$ - [L]

#RewriteCond %{THE_REQUEST} ^factsheets/.* [NC]
RewriteRule ^factsheets/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)(\.(php|htm?))?$ view.php?fs=$1&p=$2 [L]

#RewriteCond %{THE_REQUEST} ^(about|copyright|library|products|sitemanagement)/.* [NC]
RewriteRule ^([A-Za-z0-9-]+)/([A-Za-z0-9-]+)(\.(php|htm?))?$ view.php?f=$1&p=$2 [L]

#RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1 [R=301,L]

#RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
#RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]


When I try this on my local test server, it works just fine IF I comment out the RewriteConds (as above). For some reason, however, it returns a 404 if I enable any of them.

On 1&1, it just loads the plain content page directly instead of rewriting the URL to view.php, which is supposed to load the content into its template. Any ideas?

lucy24

4:13 am on Jun 15, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



THE_REQUEST is the entire request line, beginning with GET. Or POST or whatnot (for example: GET /library/topical-index HTTP/1.1). You have to either give the whole thing, or leave out the beginning anchor.

And
:: cough, cough ::
you have replicated my typo as caught by phranque up above:

(\.(php|htm?))?$

error for
(\.(php|html?))?$


The extension has to be html? (meaning html or htm) not htm? (meaning htm or, ahem, a malformed request).

And you only need this full package if all three forms actually exist. Otherwise, let them carry on and collect their well-earned 404s.

Oops.

The rule
RewriteRule ^(\.pdf)$ - [L]


will always fail, unless you really are plagued with people asking for
www.example.com/.pdf


like that. Otherwise, you need to leave off the opening anchor. And you don't need the parentheses, since you are not capturing.
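The PDF exception with both fixes applied: no opening anchor, so it matches a .pdf at any depth, and no parentheses, since nothing is captured. A sketch only; where it sits in the file still matters.

```apache
# "-" means no substitution: serve the .pdf as-is and stop this pass.
RewriteRule \.pdf$ - [L]
```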

Note also that this [L] has to come further down if your RewriteRules include anything in [F]. You don't want a bunch of evil robots running wild among your pdfs, do you? (Or do you? The OP was several days ago; I can't remember that far.)

I think you have got some of your THE_REQUEST lines backward, anyway. Conceptually backward, not structurally backward. The basic principle in a redirect-to-rewrite situation is: When the rule refers to a long complicated icky URL, look at THE_REQUEST to verify that that is what the (human) user originally asked for. You don't need to verify THE_REQUEST in the case of a pretty URL, because the whole point of your rewrite is that pretty URLs can only come from the outside. You will never rewrite to them.
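A sketch of that redirect-to-rewrite pair for the general section URLs, assuming view.php is only ever requested with f and p in that order (the fs= factsheets variant would follow the same pattern). The redirect deals with the icky URL, so it checks THE_REQUEST; the rewrite deals with the pretty URL, so it does not.

```apache
# 1) Redirect: a client that asked for view.php?f=...&p=... directly is sent
#    to the pretty URL. Because THE_REQUEST still holds the browser's original
#    request line, the internal rewrite in step 2 cannot trigger this rule.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /view\.php\?f=([^&\ ]+)&p=([^&\ ]+)\ HTTP/
# %1 and %2 come from the RewriteCond; the trailing "?" drops the old query.
RewriteRule ^view\.php$ http://www.example.com/%1/%2? [R=301,L]

# 2) Rewrite: pretty URLs only ever arrive from outside, so no check needed.
RewriteRule ^([^/.]+)/([^/.]+)$ view.php?f=$1&p=$2 [L]
```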

g1smd

6:36 am on Jun 15, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The rules are in the wrong order.

You need to list redirects before rewrites otherwise the redirects will expose previously rewritten filepaths back out on to the web as new URLs.

Within each of those two groups, list from most specific to most general. For the redirects, that means the index redirect is listed before the non-www to www redirect.

For your index redirect, you are missing the RewriteCond that should go with it. You had it in your original post, but not your latest.

If you ever test THE_REQUEST, the RegEx pattern will need to begin ^[A-Z]{3,9}\ / because the request begins GET / or POST / followed by any path, file and/or parameters.

To exclude .pdf requests from being rewritten, make sure the rule is the first of the rewrites.

Don't use Redirect in the same file you use RewriteRule. Convert all of the rules to use RewriteRule.
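Pulling the thread's advice together, a sketch of how the whole file might be ordered. Assumptions: www.example.com is the canonical host, the only page URLs are the extensionless shapes from the first post, and the legacy redirects (elided in the thread and not reproduced here) are converted from Redirect to RewriteRule.

```apache
RewriteEngine on
RewriteBase /

ErrorDocument 404 /sitemanagement/404
ErrorDocument 403 /sitemanagement/403

# Exception from the original file: leave /phaseout alone entirely
RewriteRule ^(phaseout)($|/) - [L]

# ---- External redirects, most specific first ----

# Legacy URLs from the old site go here, each converted from
# "Redirect 301 /old-path http://..." to a RewriteRule with [R=301,L].

# index/default pages -> folder URL, only when the client asked for them
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/\ ]+/)*(index|default)\.html?(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*)(index|default)\.html?$ http://www.example.com/$1 [R=301,L]

# Any non-canonical host -> www.example.com (always the last redirect)
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# ---- Internal rewrites ----

# PDFs are served as-is (first of the rewrites)
RewriteRule \.pdf$ - [L]

# Extensionless pretty URLs -> view.php
RewriteRule ^factsheets/([^/.]+)/([^/.]+)$ view.php?fs=$1&p=$2 [L]
RewriteRule ^([^/.]+)/([^/.]+)$ view.php?f=$1&p=$2 [L]
```

Since the pretty-URL captures already exclude the dot, the PDF rule is belt-and-braces here rather than strictly necessary, but it keeps the "exclusions first" structure explicit if broader rewrites are added later.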