Errors in .htaccess - where?

         

JackR

7:45 pm on Oct 15, 2011 (gmt 0)

10+ Year Member



Hey everyone,

I wanted to ask for some help to optimise my .htaccess file. One checker reports errors, but I'm not sure what exactly is wrong:



Options +FollowSymLinks
RewriteEngine on
#RewriteOptions MaxRedirects=3
ErrorDocument 404 /notfound.shtml
# RewriteBase /
#
# Return 410-Gone for myEmail URLs
RewriteRule myEmail - [G]
#
# Return 410-Gone for specific query string
RewriteCond %{QUERY_STRING} &usg=#*$!#*$!#*$!#*$!#*$!x
RewriteRule .* - [G]
#
# Internally rewrite links URLs to non-existent path to force a 404-Not Found response
RewriteRule ^links/ /notfound.shtml [R=404,L]
#
# Externally redirect request with specific query strings
RewriteCond %{QUERY_STRING} ^a=j$
RewriteRule .* /joinus.htm? [R=301,L]
RewriteCond %{QUERY_STRING} ^a=g$
RewriteRule .* /example.html? [R=301,L]
#
# Externally redirect direct client requests for "<any-directory>/index.html" and # "<any-directory>/index.htm" to "<any-directory>/" RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.html?.*\ HTTP/
RewriteRule ^(([^/]*/)*)index\.html?$ [domain.com...] [R=301,L]
#
# Internally rewrite specific URLs to example.cgi
RewriteRule ^([^/]*/)*[^.]+\.html?$ /cgi-bin/example.cgi [L]
RewriteRule ^sitemap\.xml$ /cgi-bin/example.cgi?a=sX [L]
#
# Externally redirect the non www hostname to the www hostname
RewriteCond %{HTTP_HOST} ^domain.com [NC]
RewriteRule (.*) [domain.com...] [R=301,L]
#
# Externally redirect to fix up FQDN and appended port numbers
RewriteCond %{HTTP_HOST} ^domain.com(\.|:[0-9]*) [NC]
RewriteRule (.*) [domain.com...] [R=301,L]



Thank you

g1smd

7:54 pm on Oct 15, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes. There are syntax errors, rules in the wrong order, overuse of the .* pattern, several places that need + rather than *, and literal periods that are missing their escaping.
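To illustrate two of those points on the hostname rule from the post above: this is only a sketch, assuming example.com stands in for the real domain, that www.example.com is the intended redirect target (the original target was truncated by auto-linking), and that the port pattern is one of the places where + is meant.

```apache
# As posted: the unescaped "." matches any character, and [0-9]*
# also accepts a colon followed by no digits at all.
#RewriteCond %{HTTP_HOST} ^example.com(\.|:[0-9]*) [NC]

# Sketch: literal period escaped, and + so at least one port digit is required.
RewriteCond %{HTTP_HOST} ^example\.com(\.|:[0-9]+) [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```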

There has been at least one post per week here for the last decade covering the fixes for these issues in detail.

There's also a line of code that is effectively commented out. For long comments, spread the comment over two lines with a # at the beginning of each.
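The line in question appears to be the RewriteCond that sits on the same line as the index.html comment: everything after the first # on a line is treated as comment text, so the condition never runs. A minimal sketch of the fix, reusing the pattern from the original post and assuming http://www.example.com/ as the redirect target (the posted target was truncated):

```apache
# Externally redirect direct client requests for "<any-directory>/index.html"
# and "<any-directory>/index.htm" to "<any-directory>/"
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.html?.*\ HTTP/
RewriteRule ^(([^/]*/)*)index\.html?$ http://www.example.com/$1? [R=301,L]
```

With the condition on its own line, the redirect only fires for direct client requests and not for internal subrequests.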

Use example.com in this forum to stop URL auto-linking.

JackR

8:24 pm on Oct 15, 2011 (gmt 0)

10+ Year Member



Thanks for the reply. I feared as much.

I'm not going to touch it just in case I kill the site. One for the web guy on Monday I think.

JackR

8:33 pm on Oct 15, 2011 (gmt 0)

10+ Year Member



Actually, after searching through previous threads here on the forum I found the thread which I used for the .htaccess originally:

[webmasterworld.com...]

Can anyone please suggest what I should revise and/or remove entirely?

Thank you

lucy24

10:41 pm on Oct 15, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Pin this line to your wall.
Mod_Rewrite is as clear as Medieval Icelandic Poetry!

You were joking but there's a truth behind the joke: mod_rewrite has very clear and specific rules. They just happen to be different from the rules that apply to everything else in the known universe.

In all, it seems you may be copying and pasting code without fully understanding it.

Well, where would we be without the scissors and glue-pot? :)

If your site has been functioning since 2005 (the date of the original thread), something must have been working. Although possibly not

RewriteRule ^links/ /notfound.shtml [R=404,L] 


which for some reason is the one that caused me to leap up and scream.

g1smd

10:48 pm on Oct 15, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



RewriteRule ^links/ /notfound.shtml [R=404,L]

... is actually valid code in some scenarios.

I much prefer
RewriteRule ^links/ /does-not-exist [L]

as that also triggers the correct 404 response.

JackR

11:27 pm on Oct 15, 2011 (gmt 0)

10+ Year Member



After 5 years of trying, I gave up lucy24!

Here's my slightly modified version.

I'm open to suggestions (g1smd, *cough*) as to how to make this .htaccess cleaner than the first snow of winter. I'm guessing that this file can probably be half as long ...



Options +FollowSymLinks
RewriteEngine on
#
# Redirect all requests for all non-canonical domains to same page in www.spicegirlslondonescorts.com
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
#
RewriteOptions MaxRedirects=3
ErrorDocument 404 /notfound.shtml
# RewriteBase /
#
# Return 410-Gone for myEmail URLs
RewriteRule myEmail - [G]
#
# Return 410-Gone for specific query string
RewriteCond %{QUERY_STRING} &usg=ALkJrhgz6MRDcFkp-kcCoKgNS9ERG-CLtQ
RewriteRule .* - [G]
#
# Internally rewrite links URLs to non-existent path to force a 404-Not Found response
RewriteRule ^links/ /does-not-exist [L]
#
# Externally redirect request with specific query strings
RewriteCond %{QUERY_STRING} ^a=j$
RewriteRule .* /joinus.htm? [R=301,L]
RewriteCond %{QUERY_STRING} ^a=g$
RewriteRule .* /example.html? [R=301,L]
#
# Externally redirect direct client requests for "<any-directory>/index.html" and # "<any-directory>/index.htm" to "<any-directory>/" RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.html?.*\ HTTP/
RewriteRule ^(([^/]*/)*)index\.html?$ http://www.example.com/$1? [R=301,L]
#
# Internally rewrite specific URLs to example.cgi
RewriteRule ^([^/]*/)*[^.]+\.html?$ /cgi-bin/example.cgi [L]
RewriteRule ^sitemap\.xml$ /cgi-bin/example.cgi?a=sX [L]
#
# Externally redirect to fix up FQDN and appended port numbers
RewriteCond %{HTTP_HOST} ^example.com(\.|:[0-9]*) [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

lucy24

2:05 am on Oct 16, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



# Redirect all requests for all non-canonical domains to same page

goes after all other redirects, because any specific redirect will have taken care of the www at the same time. Never use two redirects if one will do. Especially if...

RewriteOptions MaxRedirects=3

should go near the very top-- but not as a RewriteOption, because
MaxRedirects is no longer available in version 2.1 and later

If your version of Apache is that old, you've got more serious problems than the details of htaccess ;) I believe MaxRedirects is now a core directive and goes by a different name. Code for: I know it exists, and the default is generally 10, but I can't find the ### thing.

Edit: Further investigation tells me that, first, I have got it mixed up with LimitInternalRecursion which is something entirely different, and, second, that Apache dumped the MaxRedirects option almost as soon as it was created:
MaxRedirects is available in Apache 2.0.45 and later

which doesn't leave a lot of room.
If you really need more internal redirects than 10 per request, you may increase the default to the desired value.

Wiser heads apparently prevailed, and decided that if you believe you need more than ten redirects, you probably should not be allowed loose near mod_rewrite in the first place ;)

Now then...
# RewriteBase /

Last time I looked, / was the default, so you won't need this. Keep it as a commented-out line if it makes you comfortable.

Put a blank line after each RewriteRule. This is not a mod_rewrite thing; it's to keep yourself organized. (Space characters have meaning; blank lines don't.) You don't need # before empty lines, but again, you can do it that way if it makes you comfortable.

I am a little mystified by the rule involving the /links/ directory. Does it actually exist? If everything involving this directory is getting redirected, we're not talking about something like a private directory or storage space, because you don't have any exceptions. (Like IP address if you want to exclude yourself, or %{THE_REQUEST} if it's used internally.)

JackR

12:13 pm on Oct 16, 2011 (gmt 0)

10+ Year Member



Thank you Lucy

According to Firebug, the server is as follows: Apache/2.2.3 (CentOS).

The site used to have a Links directory. This was removed entirely several years ago, but Google still tries occasionally to crawl one or two of those 'gone' pages.

I've read through your comments and made further revisions. Please let me know what you think. It's looking better already!

One thing I did wonder though: is it not possible to group ALL instances of 'RewriteCond' or 'RewriteRule' together?



Options +FollowSymLinks
RewriteEngine on

# Redirect all requests for all non-canonical domains to same page in www.example.com
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# Custom 404 page
ErrorDocument 404 /notfound.shtml

# Return 410-Gone for myEmail URLs
RewriteRule myEmail - [G]

# Return 410-Gone for specific query string
RewriteCond %{QUERY_STRING} &usg=ALkJrhgz6MRDcFkp-kcCoKgNS9ERG-CLtQ
RewriteRule .* - [G]

# Internally rewrite links URLs to non-existent path to force a 404-Not Found response
RewriteRule ^links/ /does-not-exist [L]

# Externally redirect request with specific query strings
RewriteCond %{QUERY_STRING} ^a=j$
RewriteRule .* /joinus.htm? [R=301,L]
RewriteCond %{QUERY_STRING} ^a=g$
RewriteRule .* /example.html? [R=301,L]

# Externally redirect direct client requests for "<any-directory>/index.html" and # "<any-directory>/index.htm" to "<any-directory>/" RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.html?.*\ HTTP/
RewriteRule ^(([^/]*/)*)index\.html?$ http://www.example.com/$1? [R=301,L]

# Internally rewrite specific URLs to example.cgi
RewriteRule ^([^/]*/)*[^.]+\.html?$ /cgi-bin/example.cgi [L]
RewriteRule ^sitemap\.xml$ /cgi-bin/example.cgi?a=sX [L]

# Externally redirect to fix up FQDN and appended port numbers
RewriteCond %{HTTP_HOST} ^example.com(\.|:[0-9]*) [NC]
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

lucy24

10:06 pm on Oct 16, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



is it not possible to group ALL instances of 'RewriteCond' or 'RewriteRule' together?

Each RewriteCond applies only to the immediately following RewriteRule. You can't have a batch of rules all attached to the same condition, darn it. So every time you pass a rule, putting in an empty line tells you (the site maintainer) "this one's done, now on to the next one".
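A small sketch of what that means in practice. The two paths (index.html and home.html) are hypothetical and only there to show the condition being repeated:

```apache
# First rule: the RewriteCond directly above applies to this rule only.
RewriteCond %{QUERY_STRING} ^a=j$
RewriteRule ^index\.html$ /joinus.htm? [R=301,L]

# Second rule: the condition has to be written out again.
# Without it, this rule would fire for every request matching its pattern.
RewriteCond %{QUERY_STRING} ^a=j$
RewriteRule ^home\.html$ /joinus.htm? [R=301,L]
```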

mod_rewrite works on a "two steps forward, one step back" principle. It only looks at the conditions if the rule's pattern matches: for example, a request for such-and-such directory. So try to work at least part of the condition into the rule pattern. At least \.html$ (or php or whatever you use), so Apache doesn't have to go back and check the conditions on every single request. If someone's allowed into the main file, you can generally assume they've got permission to access all the associated files like images and style sheets, so you don't need to slow down your server by making it check every last request.
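As a rough sketch of that advice, applied to one of the query-string redirects from the posted file. It assumes the a=j links only ever point at .htm or .html URLs; if they can also point at the bare directory, the narrower pattern would miss them.

```apache
# Broad pattern: .* matches every request, so the query-string
# condition is evaluated for images, style sheets and everything else.
#RewriteCond %{QUERY_STRING} ^a=j$
#RewriteRule .* /joinus.htm? [R=301,L]

# Narrower pattern: the condition is only evaluated for .htm/.html requests.
RewriteCond %{QUERY_STRING} ^a=j$
RewriteRule \.html?$ /joinus.htm? [R=301,L]
```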

If someone asks for a nonexistent directory, don't they already get a 404 without need for any action on your own part? They do in MAMP, which leads me to assume it's default behavior.

g1smd

10:10 pm on Oct 16, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



At the beginning of the file you list rewrites which block access for malicious requests.

Next come the redirects. These redirect requests to a new or different URL.

The non-www/www redirect(s) must be the last of the redirects and listed before the content-delivery rewrites.
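Put together, that ordering might look roughly like the skeleton below. It reuses rules already posted in this thread; treating the 410-Gone rule as a "blocking" rule is an assumption, and the index.html RewriteCond is shown on its own line so that it actually runs.

```apache
Options +FollowSymLinks
RewriteEngine on

# 1. Blocking rules first (here: the 410-Gone rule from the thread)
RewriteRule myEmail - [G]

# 2. Specific external redirects, each pointing at the canonical hostname
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]*/)*index\.html?.*\ HTTP/
RewriteRule ^(([^/]*/)*)index\.html?$ http://www.example.com/$1? [R=301,L]

# 3. Canonical hostname redirect, last of the redirects
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

# 4. Internal rewrites that deliver content come after all redirects
RewriteRule ^sitemap\.xml$ /cgi-bin/example.cgi?a=sX [L]
```

Because the specific redirects in step 2 already name the www hostname, a request that matches one of them is fixed in a single hop and never needs the step 3 redirect as well.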

JackR

11:44 pm on Oct 16, 2011 (gmt 0)

10+ Year Member



Noted and understood. Makes sense that there's clear space between each rule.

Before I upload my cleaner version, is there anything either of you would suggest can be safely removed?