
Apache Web Server Forum

    
How to fix this htaccess code?
Please take a look at this code
Murmur




msg:4678689
 10:20 am on Jun 10, 2014 (gmt 0)
RewriteEngine On
Options +FollowSymlinks
RewriteCond %{HTTP_HOST} ^([^.]+\.)+([^.]+\.[^.]+)\.?(:[0-9]+)?$
RewriteRule (.*) http://%2/$1 [R=301,L]
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://(www\.)?domain\.com [NC]
RewriteRule \.(jpe?g|gif)$ - [F]

ErrorDocument 404 http://domain.com
DirectoryIndex index.html

Redirect 301 /download http://domain.com




(1) Can someone please advise what is the function of this HTACCESS code and does it look clean?

(2) Can this code be simplified to make it parse much quicker without changing any functionality? I feel it resolves with a minor truncation.

(3) How can we change the code so that if someone enters domain.com they get sent to domain.com/ (trailing slash included).

(4) My server admin keeps telling me that all instances of http://domain.com in the code SHOULD be changed to:

http://domain.com/ (with trailing slash included)

And the reasoning was that the correct mode is with the trailing slash. Without it, there would be a small delay (5ms?) in resolving the page.

Is he right about that, and if so, how should this code be altered?

I realize a lot of people will have different views on how to change the code but there doesn't seem to be a fixed idea about how to treat the trailing slash.....

 

phranque




msg:4678706
 11:00 am on Jun 10, 2014 (gmt 0)

welcome to WebmasterWorld, Murmur!


(1) Can someone please advise what is the function of this HTACCESS code and does it look clean?



RewriteEngine On
Options +FollowSymlinks
RewriteCond %{HTTP_HOST} ^([^.]+\.)+([^.]+\.[^.]+)\.?(:[0-9]+)?$
RewriteRule (.*) http://%2/$1 [R=301,L]

boilerplate mod_rewrite startup code.
followed by mod_rewrite ruleset that looks designed for a hostname canonicalization redirect.
i would suggest something like this instead:
RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule (.*) http://example.com/$1 [R=301,L]

this should be the last ruleset for external redirects.


RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com [NC]
RewriteRule \.(jpe?g|gif)$ - [F]

this ruleset looks like it is designed to provide a 403 Forbidden response for image file hotlink protection.


ErrorDocument 404 http://example.com

this specifies a custom 404 error document.
however since you specified the protocol and hostname, this will cause a "soft 404" which in this case is a redirect to a 200 OK (the homepage).
also showing the home page for a 404 Not Found is not friendly.
i would specify a local/relative path to a helpful error document.


DirectoryIndex index.html

this defines index.html as the default directory index document.
whenever a directory path (trailing slash) is requested, index.html in that directory will be used instead of displaying the directory contents.


Redirect 301 /download http://example.com

this is a mod_alias directive that will 301 redirect requests for /download to the homepage.
you should not mix mod_rewrite and mod_alias directives.
you should be using a RewriteRule here.
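
A minimal sketch of what that RewriteRule could look like, assuming the .htaccess file sits in the document root and that everything at or under /download is meant to land on the homepage (the thread does not say which). Unlike Redirect, this version does not append the rest of the requested path to the target.

```apache
# Possible mod_rewrite replacement for: Redirect 301 /download http://example.com
# In .htaccess the pattern is matched without the leading slash.
# (/|$) approximates the prefix matching of Redirect without also catching /downloads.
RewriteRule ^download(/|$) http://example.com/ [R=301,L]
```

It would go with the other external redirects, before the hostname canonicalization ruleset.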


IMPORTANT: Please Use Example.com For Domain Names in Posts [webmasterworld.com]

phranque




msg:4678717
 11:15 am on Jun 10, 2014 (gmt 0)

(3) How can we change the code so that if someone enters example.com they get sent to example.com/ (trailing slash included).



you want to use mod_dir's DirectorySlash Directive:
http://httpd.apache.org/docs/current/mod/mod_dir.html#directoryslash
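
For reference, a short sketch of that directive. It applies to real directories below the root; for the bare hostname, the browser already requests the path "/" because an HTTP request cannot carry an empty path, so example.com and example.com/ arrive at the server as the same request.

```apache
# mod_dir: when a request names a real directory without the trailing slash
# (/somedir), answer with a redirect to /somedir/.
# On is the default, so this line only matters if it was switched Off elsewhere.
DirectorySlash On
```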


(4) My server admin keeps telling me that all instances of http://example.com in the code SHOULD be changed to:

http://example.com/ (with trailing slash included)

And the reasoning was that the correct mode is with the trailing slash. Without it, there would be a small delay (5ms?) in resolving the page.

all external redirects should specify the full canonical protocol, hostname and path to avoid chained redirects.
an additional redirect is a full round trip request/response which is surely more than 5ms.
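
To illustrate the point about chained redirects, here is a sketch with made-up page names (old-page.html and new-page.html are not from this thread). Because the first target already names the canonical host and the final path, a request for http://www.example.com/old-page.html gets one 301 instead of two.

```apache
# Hypothetical page names. The target names the canonical host and final path,
# so no second redirect is needed to fix the hostname afterwards.
RewriteRule ^old-page\.html$ http://example.com/new-page.html [R=301,L]

# Hostname rule last: only requests not already redirected above reach it.
RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule (.*) http://example.com/$1 [R=301,L]
```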

Murmur




msg:4678725
 12:03 pm on Jun 10, 2014 (gmt 0)
I have edited the code to include your suggestion as follows:


RewriteEngine On
Options +FollowSymlinks
RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule (.*) http://example.com/$1 [R=301,L]
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com [NC]
RewriteRule \.(jpe?g|gif)$ - [F]

ErrorDocument 404 http://example.com/
DirectoryIndex index.html

Redirect 301 /download http://example.com/



(1) We have a wildcarding in our url so that for instance:

http://wwwwwwwwww.example.com
ends up at http://example.com/

http://example.com/nonexistantpage.html
ends up at http://example.com/

What part of this code handles that?
Will this continue to work with the above code?

(2) How do we change this code to include the trailing slashes? Is it now better that we have added / (trailing slash) at the end of the last two instances of example.com? Or should it be http://www.example.com/ and then let the code switch it to http://example.com/ ?

(3) By external redirects using the full canonical protocol, do you mean that if you are redirecting another domain name to example.com, you should use http://www.example.com/ rather than, say, http://example.com/ or http://example.com (without slash)? I'm not clear on this point.

(4) Does the new code I have given here (with your changes) look fully functional? In your estimation, will this change make an improvement?


Welcome all views on this & thank you in advance....
lucy24




msg:4678787
 5:25 pm on Jun 10, 2014 (gmt 0)

Psst! You can also prevent unwanted smileys by wrapping your code in [code ] tags. This also makes it look more, er, code-like.

RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com [NC]
RewriteRule \.(jpe?g|gif)$ - [F]

The first line says "only apply this rule if the request has a referer at all". That's to exclude search engines as well as the occasional human browser that simply doesn't send a referer.

The second line says "don't apply the rule if the referer is my own site". I used this same form for years before realizing that it's wrong.

Once you've got your htaccess hammered into shape, it will contain a domain-name-canonicalization redirect. That means everyone who lands on your site will use the same form of the name: either example.com or www.example.com, your choice, but only one or the other. And always with the same casing. This, in turn, means that any referer in the other form is automatically fake. So the anti-hotlinking line should say

RewriteCond %{HTTP_REFERER} !^http://example\.com
OR
RewriteCond %{HTTP_REFERER} !^http://www\.example\.com

Not both, nothing optional, and no [NC] flag. If you use https you do need to allow for it, but here I assume you don't.
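
For a site that does serve https, a sketch of how the condition might allow for it, assuming the canonical hostname is example.com without www. The (/|$) after the hostname is an extra not mentioned in the thread; it keeps a referer such as http://example.com.other.example from passing as your own site.

```apache
# Skip requests that send no referer at all.
RewriteCond %{HTTP_REFERER} .
# Block when the referer is anything other than the canonical host, http or https.
RewriteCond %{HTTP_REFERER} !^https?://example\.com(/|$)
RewriteRule \.(jpe?g|gif)$ - [F]
```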


I was working up some boilerplate on cleaning up your htaccess so let's try it here. It doesn't have any lethal typos, but is otherwise a bit rough around the edges:
Cleaning up an htaccess file

Step 1: Organize. Collect all the directives for each module in one place. The server doesn't care, but you-- and anyone who comes along after you-- will appreciate it.

Tip: Use a text editor with a "Find All" window to pull up all lines beginning with the element "Rewrite..." That takes care of mod_rewrite; dump them all at the end for now.

Step 2: Get rid of all <IfModule> envelopes. Not their contents, just the envelopes themselves. These envelopes are hallmarks of mass-produced htaccess files that have to work anywhere, on any server. You are now on your own site. Any given mod is either available to you or it isn't.

Step 3: Sort by module. The server doesn't care what order the directives are listed in, or even if rules from different modules are all garbled together. Each module works separately, seeing only its own directives. But humans need to be able to find things.

For most people it will be most practical to group one-liners at the beginning:

Options -Indexes

is a good start. If your htaccess file contains only one line, that's probably it. Other quick directives are ones starting with words like AddCharset or Expires. Then list your error documents.

If you have any very short Files or FilesMatch envelopes, put them near the top too. For example:
<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>

<FilesMatch "\.(css|js)">
Header set X-Robots-Tag "noindex"
</FilesMatch>


Be sure to have an "Allow from all" envelope for your custom 403 page. If you are on shared hosting and they provide default error-document names such as "forbidden.html", this has probably already been done in the config file. But it does no harm to repeat it.
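
A sketch of such an envelope, using the "forbidden.html" name mentioned above as a stand-in for whatever your 403 page is actually called. It uses the same Order/Allow syntax as the other examples here; Apache 2.4 without mod_access_compat would need "Require all granted" instead.

```apache
# Let everyone fetch the custom 403 page, even visitors who are denied elsewhere.
<Files "forbidden.html">
Order Allow,Deny
Allow from all
</Files>
```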

Step 4: Consolidate redirects.

Step 4a: Get rid of mod_alias. If your htaccess file contains any mod_rewrite directives, it can't use mod_alias (Redirect... by that name), or things may happen in the wrong order. For large-scale updating, use these Regular Expressions, changing \1 to $1 if that's what your text editor uses. Each of these can safely be run as an unsupervised global replace.

# change . to \. in pattern
^(Redirect \d\d\d \S+?[^\\])\.
TO
\1\\.

# now change Redirect to Rewrite
^Redirect(?:Match)? 301 /(.+)
TO
RewriteRule \1 [R=301,L]

# and if needed
^Redirect(?:Match)? 410 /(.+)
TO
RewriteRule \1 - [G]

^Redirect(?:Match)? 403 /(.+)
TO
RewriteRule \1 - [F]
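
As a worked example of the first two replaces, here is one line traced through by hand (old.html and new.html are made-up names, not from anyone's real file):

```apache
# Before (mod_alias):
#   Redirect 301 /old.html http://example.com/new.html
# After the first replace (literal period in the pattern escaped):
#   Redirect 301 /old\.html http://example.com/new.html
# After the second replace (now mod_rewrite):
RewriteRule old\.html http://example.com/new.html [R=301,L]
```

Two things to watch: the first replace escapes only one period per line per pass, so repeat it until it finds nothing more, and the resulting pattern is unanchored, so add ^ and $ by hand where a rule should match only that exact path.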


Step 4b: Sort your RewriteRules. At the beginning is the single line

RewriteEngine on

A RewriteBase is almost never needed; get rid of any lines that mention it. Instead, make sure every target begins with either protocol-plus-domain or a slash / for the root.

Sort RewriteRules twice.

First group them by severity. Access-control rules (flag [F]) go first. Then any 410s (flag [G]). Not all sites will have these. Then external redirects (flag [R=301,L] unless there is a specific reason to say something different). Then simple rewrites (flag [L] alone). Finally, there may be a few rules without an [L] flag, such as ones that set cookies or environment variables.

Function overrides flag. If your redirects are so complicated that they've been exiled to a separate .php file, the RewriteRule will have only an [L] flag. But group it with the external redirects. If certain users are forcibly redirected to an "I don't like your face" page, the RewriteRule will have an R flag. But group it with the access-control [F] rules.

Then, within each functional group, list rules from most specific to most general. In most htaccess files, the second-to-last external redirect will take care of "index.html" requests. The very last one will fix the domain name, such as with/without www.

Leave a blank line after each RewriteRule, and put a
# comment
before each ruleset (Rule plus any preceding Conditions). A group of closely related rulesets can share an explanation.
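
Put together, the ordering described in Step 4b might look like the skeleton below. This is only a sketch: the page and script names are hypothetical, the canonical host is assumed to be example.com without www, and the index.html ruleset is a commonly used pattern rather than anything posted in this thread.

```apache
RewriteEngine on

# 1. Access control: block hotlinked images
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://example\.com
RewriteRule \.(jpe?g|gif)$ - [F]

# 2. Gone: hypothetical page that was removed on purpose
RewriteRule ^retired-page\.html$ - [G]

# 3. External redirects, most specific first (hypothetical renamed page)
RewriteRule ^old-page\.html$ http://example.com/new-page.html [R=301,L]

# Second-to-last redirect: strip index.html from requested URLs.
# The THE_REQUEST condition limits this to what the client literally asked for,
# so the internal DirectoryIndex lookup does not cause a redirect loop.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.html
RewriteRule ^(([^/]+/)*)index\.html$ http://example.com/$1 [R=301,L]

# Last redirect: fix the hostname (with/without www)
RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule (.*) http://example.com/$1 [R=301,L]

# 4. Internal rewrites (hypothetical script name)
RewriteRule ^articles/([0-9]+)$ /article.php?id=$1 [L]
```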

Step 5: Notes on error documents.

Reminder: ErrorDocument directives must not include a domain name, or else everything will turn into a 302 redirect. Start each one with a / representing the root.

Caution: Since each module is an island, any module that can issue a 403 must have its own error-document override. "Allow from all" covers mod_authz_host. If you have RewriteRules that end in [F], make sure your 403 documents can bypass these rules.
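
One way to arrange that bypass in mod_rewrite, as a sketch with a hypothetical file name: give the 403 page its own do-nothing rule ahead of every rule that ends in [F].

```apache
# Hypothetical file name. Placed before any rule ending in [F]:
# "-" leaves the URL alone and [L] stops this pass, so the 403 page itself
# is never run through the blocking rules listed after it.
RewriteRule ^forbidden\.html$ - [L]
```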

not2easy




msg:4678809
 6:11 pm on Jun 10, 2014 (gmt 0)

Just to be clear, this part is regarding the 404 error document line that you have:
Reminder: ErrorDocument directives must not include a domain name, or else everything will turn into a 302 redirect. Start each one with a / representing the root.

This is where you need to change
ErrorDocument 404 http://example.com/
To be
ErrorDocument 404 /errorpage.html
using the actual name of the page, whatever you named it.

Murmur




msg:4678829
 6:56 pm on Jun 10, 2014 (gmt 0)

RewriteEngine On
Options +FollowSymlinks
RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule (.*) http://example.com/$1 [R=301,L]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?example\.com
RewriteRule \.(jpe?g|gif)$ - [F]

ErrorDocument 404 /index.html
DirectoryIndex index.html



After tidying up and removing the unnecessary 301 and the condition looking for a referrer, I am left with the code above. Would this be search engine friendly, and does it look okay (apart from the missing error page, which I'll add later)?

As a side point, after testing the above code, I note that the size of the web page has been reduced from 137k to just under 132k - I cannot explain this.

Any views appreciated.

lucy24




msg:4678871
 9:00 pm on Jun 10, 2014 (gmt 0)

Access-control rules go BEFORE any redirects or 404/410s. No point in redirecting a request that will end up getting blocked.

Within redirects, the domain-name-canonicalization rule comes last. Typically the "index.html" redirect is second-to-last.

removing the unnecessary 301 and condition looking for referrer

Uh-oh, I think you misunderstood something. Your anti-hotlinking routine has to include either
RewriteCond %{HTTP_REFERER} .
OR
RewriteCond %{HTTP_REFERER} !^-?$
unless you actually want to lock out search engines as well as referer-less humans.

I note that the size of the web page has been reduced from 137k to just under 132k

Size of what web page? Your logs show the size of the response sent out by the server, including any headers. Is this the size given in one of those online tools? Same tool, before and after? Changes to htaccess should have no effect on the size of material sent out, unless you were previously shipping an extra 5k worth of headers, brr, or perhaps some superfluous include files.

In any case, I really hope 137k is the total size of everything, including supporting files. That would be a lot of text; I don't have anything that size except a few ebooks and similar.

Murmur




msg:4678930
 2:03 am on Jun 11, 2014 (gmt 0)

RewriteEngine On
Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://www\?example\.com
RewriteRule \.(jpe?g|gif)$ - [F]
RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule (.*) http://example.com/$1 [R=301,L]

ErrorDocument 404 /index.html
DirectoryIndex index.html



This is my latest version with the order basically changed. Added back the referrer condition. Does it look ok?

Yes it was an online tool I used and the size of the page was reduced by 5k - I have no idea why but it may be that the old code was forcing a partial revisit or something.

Much thanks as always....

lucy24




msg:4678989
 6:00 am on Jun 11, 2014 (gmt 0)

www\?example\.com

Typo, I hope, because the pattern would otherwise always fail to match (\? means "a literal question mark"), which makes the negated condition true for every request that sends a referer.

Incidentally...
Lines like DirectoryIndex and FollowSymLinks will do no harm. But if you're on shared hosting, the odds are overwhelming that the same information is already in the config file-- especially for something like "index.html" which is the Apache default. So why make the server read it all over again?

Murmur




msg:4679066
 12:39 pm on Jun 11, 2014 (gmt 0)

It is a dedicated server. Well you say \? looks like a typo because the condition would otherwise always fail.

Whenever I tried to replace the ? with a dot, the images on the site would not load - so (using my Sherlock brain) I guess they were not loading when the condition was working. And loading fine when the condition was not working.

I figured this may be to do with the fact that the site always displays in browser as example.com or http://example.com/ (never with a www.)

So I have adapted the code to remove www from second RewriteCond (down from top) and it is like this:

RewriteEngine On
Options +FollowSymlinks
RewriteCond %{HTTP_REFERER} .
RewriteCond %{HTTP_REFERER} !^http://example\.com
RewriteRule \.(jpe?g|gif)$ - [F]
RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule (.*) http://example.com/$1 [R=301,L]

ErrorDocument 404 /index.html
DirectoryIndex index.html



It is working fine generally, except if I try to visit a directory at example.com - then the images will not load.

All feedback greatly appreciated.

lucy24




msg:4679140
 6:07 pm on Jun 11, 2014 (gmt 0)

the site always displays in browser as example.com or http://example.com/ (never with a www.)

Any anti-hotlinking routine has to use the form of the hostname that you actually use; that's the whole point of this function.

By the way, where is the domain-name redirect? It should happen in the same htaccess file as any other redirects.

It is working fine generally, except if I try to visit a directory at example.com - then the images will not load.

? Do you mean it works everywhere except in one directory, or that it doesn't work on your site at all? Look at your logs and see if there's a difference in the form of the request.

By the way: If it's a dedicated server, why is any of this happening in htaccess? Or is this just temporary while you iron out the kinks?
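
For anyone wondering what the non-htaccess version might look like, here is a rough sketch of the same rules moved into the server config. The directory path is made up, the ServerAlias wildcard is an assumption based on the wildcard hostnames mentioned earlier in the thread, and whatever access directives your existing config already has for the directory are left out.

```apache
<VirtualHost *:80>
    ServerName example.com
    ServerAlias *.example.com
    # Hypothetical path
    DocumentRoot /var/www/example

    <Directory /var/www/example>
        # Stop Apache from looking for .htaccess files on every request
        AllowOverride None
        Options +FollowSymLinks

        RewriteEngine On

        # Inside <Directory>, patterns are matched as in .htaccess
        # (no leading slash on the path).
        RewriteCond %{HTTP_REFERER} .
        RewriteCond %{HTTP_REFERER} !^http://example\.com
        RewriteRule \.(jpe?g|gif)$ - [F]

        RewriteCond %{HTTP_HOST} !^(example\.com)?$
        RewriteRule (.*) http://example.com/$1 [R=301,L]
    </Directory>

    ErrorDocument 404 /errorpage.html
</VirtualHost>
```

The server reads this once at startup or reload, rather than re-reading .htaccess on each request.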

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved