
Apache Web Server Forum

    
Proper Order for htaccess
is this right?
Lorel




msg:4184255
 2:59 pm on Aug 9, 2010 (gmt 0)

I'm confused as to the proper order to list items in an htaccess file. Is the following correct? Also is it ok to leave blank lines between items?

ErrorDocument 404 /missing.htm
AddHandler server-parsed .htm

Options +Includes
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} !^(www\.EXAMPLE\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1 [R=301,L]

#if the "/" page is not supposed to have a query string or non-blank query
RewriteCond %{QUERY_STRING} .
# Redirect to "/" after clearing query string
RewriteRule ^$ http://www.example.com/? [R=301,L]

RewriteRule ^us\.html$ http://www.example.com/us.htm [R=301,L]

RewriteRule ^assesment\.htm$ http://www.example.com/assessment.htm [R=301,L]
RewriteRule ^PDF/Catalog\.pdf - [G]

 

g1smd




msg:4184304
 4:51 pm on Aug 9, 2010 (gmt 0)

Access controls first.

Redirects next. Within the list of redirects, most specific first, most general last.

Rewrites last. Within the list of rewrites, most specific first, most general last.

Most specific means affects the smallest number of URL requests. Most general means affects the largest number of URL requests.

Your existing rules 1 to 6 should be re-ordered 6 - 4 - 5 - 2 - 3 - 1, and a question mark added to the redirect target of at least the first five rules.

I would not redirect all query strings to the root. I would strip only the query string from the original URL and redirect to that.
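
A minimal sketch of that alternative, assuming no URL on the site legitimately takes a query string (if some do, the pattern would need to exclude them):

```apache
# Strip the query string but keep the requested URL-path
RewriteCond %{QUERY_STRING} .
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]
```

The trailing "?" on the target clears the query string; $1 carries the original path through unchanged.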


See also: [webmasterworld.com...]

Lorel




msg:4184328
 5:30 pm on Aug 9, 2010 (gmt 0)

Your existing rules 1 to 6 should be re-ordered 6 - 4 - 5 - 2 - 3 - 1, and a question mark added to the redirect target of at least the first five rules.


Please let me know if I got this right (see below) -- I'm assuming options +Includes needs to be on top.

I got all these off this forum over the years and have never seen a question mark being used with these redirects. Can you explain where to put it?

I would not redirect all query strings to the root. I would strip only the query string from the original URL and redirect to that


In this particular event they were linking to the home page with a query string but I'll make note of that.

Thanks,

Options +Includes
Options +FollowSymLinks
RewriteEngine on

RewriteRule ^assesment\.htm$ http://www.example.com/assessment.htm [R=301,L]
RewriteRule ^PDF/Catalog\.pdf - [G]

#if the "/" page is not supposed to have a query string or non-blank query
RewriteCond %{QUERY_STRING} .
# Redirect to "/" after clearing query string
RewriteRule ^$ http://www.example.com/? [R=301,L]

RewriteRule ^us\.html$ http://www.example.com/us.htm [R=301,L]

RewriteCond %{HTTP_HOST} !^(www\.EXAMPLE\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1 [R=301,L]

ErrorDocument 404 /missing.htm
AddHandler server-parsed .htm

g1smd




msg:4184403
 7:41 pm on Aug 9, 2010 (gmt 0)

You can't count. Counting individual RewriteRule lines, and based on your first post, 6 - 4 - 5 - 2 - 3 - 1 was the required order.

Yes, Options remains first and can be combined in a single line.

Any RewriteCond lines remain attached to the same RewriteRule as before, moved as a pair.
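
Applied to the first post, that order would give something like the following sketch. The rules are copied unchanged apart from their position and the combined Options line; the number in each comment is the rule's position in the original file, and the suggested question marks are not added here.

```apache
Options +Includes +FollowSymLinks
RewriteEngine on

# 6
RewriteRule ^PDF/Catalog\.pdf - [G]
# 4
RewriteRule ^us\.html$ http://www.example.com/us.htm [R=301,L]
# 5
RewriteRule ^assesment\.htm$ http://www.example.com/assessment.htm [R=301,L]
# 2 (the RewriteCond moves together with its RewriteRule)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*index\.htm\ HTTP/
RewriteRule ^(([^/]+/)*)index\.htm$ http://www.example.com/$1 [R=301,L]
# 3
RewriteCond %{QUERY_STRING} .
RewriteRule ^$ http://www.example.com/? [R=301,L]
# 1 (most general: affects every request on a non-canonical hostname)
RewriteCond %{HTTP_HOST} !^(www\.EXAMPLE\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

ErrorDocument 404 /missing.htm
AddHandler server-parsed .htm
```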

jdMorgan




msg:4184886
 4:00 pm on Aug 10, 2010 (gmt 0)

I would not redirect all query strings to the root. I would strip only the query string from the original URL and redirect to that

This seems to be a non-issue here, as the rule only removes query strings from requests for "/" and redirects to "/".

# if the "/" page is not supposed to have a query string or non-blank query
RewriteCond %{QUERY_STRING} .
# Redirect to "/" after clearing query string
RewriteRule ^$ http://www.example.com/? [R=301,L]

To clear a query string, add a "?" to the end of the target URL on "the right side" of the rule, as in the present example.

Jim

Future




msg:4187134
 8:29 pm on Aug 14, 2010 (gmt 0)

Thanks for a healthy discussion.
Need more info. to learn here.

Rgds.,

Angonasec




msg:4187414
 10:01 pm on Aug 15, 2010 (gmt 0)

According to the protocol:
Q/
Access controls first.
Redirects next. Within the list of redirects, most specific first, most general last.
Rewrites last. Within the list of rewrites, most specific first, most general last.
Most specific means affects the smallest number of URL requests. Most general means affects the largest number of URL requests.
/Q

Two questions Gents:
(Using Apache 1.3 on a virtual host account.)

1) What about setting caching, and "Expires"? Where should they be positioned?

# Turn on Expires and set default to 0
ExpiresActive On
ExpiresDefault A0

# Set up caching on media files for 336 days (29030400 seconds, a little under 1 year)
<FilesMatch "\.(gif|jpg|ico|png)$">
ExpiresDefault A29030400
Header append Cache-Control "public"
</FilesMatch>

2) Secondly:
I assume controlling visibility of the .htaccess file itself comes under "Access control" and should therefore be placed at the top of the file. Correct?

# Strong htaccess protection
<Files ~ "^.*\.([Hh][Tt][Aa])">
order allow,deny
deny from all
</Files>

jdMorgan




msg:4187743
 3:07 pm on Aug 16, 2010 (gmt 0)

Several points need clarification and expansion, since this thread got put on our front page:

  • These rules of thumb apply to access controls, external redirects, and internal rewrites implemented using Apache mod_rewrite in server config files (e.g. httpd.conf) and in per-directory config files (.htaccess files).

  • First, Apache "executes" .htaccess files in per-module order. Each Apache module in turn parses the .htaccess files looking for directives that it understands. Therefore, only the order of directives belonging to the same Apache module is meaningful, since the order of invocation will depend on the module execution order, and not on the order in which directives appear in the .htaccess file. (Short answer: Put the module-specific directives in any order that you like, although I suggest grouping them by module to keep the .htaccess file(s) organized for easier maintenance.)

    The module execution order is determined by the reverse of the LoadModule list in Apache 1.x, and by an internal priority scheme in Apache 2.x.

  • These rules of thumb apply across *all* config files. For example, *all* external redirects must be invoked before *any* internal rewrites as the server processes server config files (e.g. httpd.conf) first, then each .htaccess file encountered in the directory-path to the document or object that will be served (or the script used to produce them).

    In simple terms, if a higher-level .htaccess file or config file does an internal rewrite and then a lower-level .htaccess file does an external redirect, the internal server filepath target of the earlier rewrite will be exposed as a URL to the client by this later external redirect, leading to "ugly URLs" and problems in search engine listings. This is one reason to consolidate all redirects into the highest-level config file possible, even at the expense of server performance.

  • Because of the per-module execution order and the need to avoid exposing internal filepaths as URLs, many of the contributors here recommend *not* mixing mod_alias directives such as "Redirect" and "RedirectMatch" with mod_rewrite directives such as "RewriteRule." Unless you are on Apache 1.x and can control the LoadModule order, there is no way to guarantee that mod_alias directives will be processed first, and that no mod_rewrite redirect will be undesirably preempted by a Redirect or RedirectMatch redirect. Even if it works today, there is no guarantee that a config change or a server upgrade won't break the code tomorrow by changing the module execution order.

  • When making changes to server config files such as httpd.conf, it is necessary to restart the server so that it re-reads and re-parses the new configuration. (.htaccess files, by contrast, are re-read on each request.)

  • In order to avoid invalid testing results due to your browser serving stale cached versions of server response codes, pages, and objects, you must clear your browser cache before testing any new server-side code, including server config files, .htaccess files, and scripts.
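
    Taken together, the points above suggest a layout along these lines. This is only a sketch: the /product.php script and the numeric-ID URL pattern are made up for illustration, and the Order/Deny syntax is the Apache 1.3/2.2 form used elsewhere in this thread.

    ```apache
    # --- Access controls ---
    <FilesMatch "^\.ht">
    Order allow,deny
    Deny from all
    </FilesMatch>

    # --- mod_expires: its position relative to the rewrite rules does not matter ---
    ExpiresActive On
    ExpiresDefault A0

    # --- mod_rewrite: external redirects first, internal rewrites last ---
    RewriteEngine on
    # External redirect, written as a RewriteRule rather than a mod_alias "Redirect"
    RewriteRule ^old-page\.html$ http://www.example.com/new-page.php [R=301,L]
    # Internal rewrite: hypothetical pretty URL mapped to a hypothetical script
    RewriteRule ^products/([0-9]+)$ /product.php?id=$1 [L]
    ```

    Grouping by module is for the maintainer's benefit only; within the mod_rewrite group, the order of the rules is what matters.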

    -----

    To help make sense of mod_rewrite, it is important to keep the concepts of URLs and internal filepaths separate and distinct. URLs are defined and exist "out there, on the Web." A URL "exists" as soon as it is published as a link on an HTML Web page, regardless of whether it resolves to an existing domain, server, or "file."

    Internal filepaths define a different name-space, and have meaning only inside a server. A filepath does not "exist" unless there is a physical file accessible by using that path inside this server (we'll ignore Aliases and symlinks in the name of simplicity here).

    The primary job of an HTTP server is to translate URLs in incoming client-initiated HTTP requests to the proper filepath of the requested object, or to the filepath of the script needed to generate or create that object or to retrieve it from a database.

    In the absence of any more-complicated rewriting by mod_rewrite, mod_alias, or mod_dir, this 'translation' is accomplished by removing the protocol and hostname from the requested URL, and prepending the DocumentRoot path defined in the server configuration. The result is a filepath.

    Apache mod_dir can be used to rewrite HTTP requests for URLs like "example.com/" to the internal filepath "/index.html" or "/index.php".
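
    As a rough illustration of the default mapping and of mod_dir, assuming a hypothetical DocumentRoot of /var/www/example:

    ```apache
    # Server config (DocumentRoot is not allowed in .htaccess)
    DocumentRoot /var/www/example
    # URL requested:      http://www.example.com/images/logo.gif
    # Resulting filepath: /var/www/example/images/logo.gif

    # mod_dir: a request for "/" is served from the first of these files that exists
    DirectoryIndex index.html index.php
    # URL requested:      http://www.example.com/
    # Resulting filepath: /var/www/example/index.html (if present)
    ```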

    Apache mod_rewrite and mod_alias can be used to rewrite HTTP requests for just about any URL that resolves to this server.

    These three modules "live at the boundary" between the HTTP Web-world and the internal filesystem of the server (which depends on the server's operating system). This points out one of the main reasons for the existence of HTTP and URLs: to eliminate the need for clients (like browsers and search engine robots) to know or care about the specific operating system or filesystem organization of the server from which an object is to be requested.

    -----

    Why I recommend the access controls - external redirects - internal rewrites order:

    In general, there is no benefit from redirecting or rewriting requests from unwelcome user-agents. For example, why waste time and server resources redirecting a non-canonical-domain request from an unwelcome user-agent? Therefore, all access control rules should generally go first.
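
    A small sketch of that ordering, assuming a made-up user-agent string "BadBot" stands in for whatever you consider unwelcome:

    ```apache
    RewriteEngine on
    # Access control first: answer the unwelcome user-agent with 403 Forbidden
    RewriteCond %{HTTP_USER_AGENT} BadBot [NC]
    RewriteRule .* - [F]
    # Only requests that passed the check above reach the canonical-hostname redirect
    RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
    ```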

    As discussed above, the exposure of previously-rewritten internal server filepaths by subsequent external redirects should be avoided to prevent search engines and browsers from "seeing" and displaying these internal filepaths. Failure to properly order the external-redirect and internal-rewrite rules under all conditions can result in the utter defeat of any attempts to use "search-engine-friendly" (SEF) or "short and pretty" URLs.
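
    The exposure problem can be reproduced even within a single .htaccess file, because in per-directory context the rules are run again on the rewritten path. The sketch below assumes a hypothetical /catalog.php script behind a hypothetical "/widgets" URL:

    ```apache
    RewriteEngine on

    # Problem order (shown commented out): the internal rewrite comes first
    # RewriteRule ^widgets$ /catalog.php?cat=widgets [L]
    # RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
    # RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
    # A request for http://example.com/widgets is rewritten to /catalog.php,
    # the rules then run again on the rewritten path, and the hostname redirect
    # sends the client to http://www.example.com/catalog.php?cat=widgets

    # Safer order: external redirect first, internal rewrite last
    RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
    RewriteRule ^widgets$ /catalog.php?cat=widgets [L]
    ```

    With the safer order, the same request is first redirected to http://www.example.com/widgets, and only the follow-up request is rewritten internally.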

    -----

    Why we say to put the most-specific rules first:

    As a group of Webmasters, we're generally concerned with search engine ranking here at WebmasterWorld. It has been observed that while most search engines will pass "link juice" through one 301 redirect, they either discount or discard this "ranking power" if two or more redirects are encountered. Therefore, it can be important to search engine ranking to redirect straight to the final, fully-correct URL if a redirect is to be done at all.

    Consider, for example, the following very-common two-rule set:

    # Externally redirect all non-blank non-canonical hostname requests to canonical hostname
    RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
    #
    # Externally redirect request for URL-path /old-page.html to www.example.com/new-page.php
    RewriteRule ^old-page\.html$ http://www.example.com/new-page.php [R=301,L]

    Now imagine requesting "http://example.com/old-page.html" from this server. The first rule will be invoked because the requested hostname is non-canonical, redirecting the client to "http://www.example.com/old-page.html". Then the second rule will get invoked because the request is for "/old-page.html", resulting in a redirect to "http://www.example.com/new-page.php". The end result is two stacked or chained redirects for the one initial browser request, wasting client and server resources, slowing down the "user experience," and possibly discarding all link-juice associated with the "http://example.com/old-page.html" URL.

    Now reverse these two rules, and work through the thought experiment again. Only a single redirect will be invoked.
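
    For reference, the reversed pair would look like this (the same two rules, with the hostname pattern anchored at the start):

    ```apache
    # Most specific first: the single-page redirect goes straight to the final URL
    RewriteRule ^old-page\.html$ http://www.example.com/new-page.php [R=301,L]
    #
    # Most general last: everything else on a non-canonical hostname
    RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
    RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
    ```

    A request for "http://example.com/old-page.html" now matches the first rule and receives one redirect, directly to "http://www.example.com/new-page.php".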

    -----

    I consistently (and very possibly, annoyingly) try to use the terms "External redirect" and "Internal Rewrite" when needed to distinguish the very important difference between these functions. A redirect is a server response that tells the client, "The resource you have requested has moved. Please ask for it again at this new URL." In addition to providing the new URL, it provides information about the status of this redirect -- whether it is permanent (301), temporary (307), or a "see other" pointer to a different resource that should be fetched with GET (303). (Note that the 302 response code 'sort of' means "temporary" in HTTP/1.0, but clients handled it inconsistently, so HTTP/1.1 added the 303 and 307 response codes to remove the ambiguity.)

    This redirect response terminates the current HTTP transaction. The client may or may not choose to follow the redirect and make a new request to the server using the new URL provided in the server's previous redirect response. Although a browser will usually follow a redirect, it is important to note that it is the browser's option to do so; always remember that the client, not the server, is in charge.

    To summarize this point, a redirect is a message to the client to use a different URL. It can be seen as a URL-to-URL translation.

    By contrast, an internal rewrite modifies the requested-URL-to-internal-filepath mapping of a request to something other than what it would have been by default. This modified URL-to-filepath mapping takes place within the context of the current HTTP transaction, and is therefore not 'visible' to the client browser or search engine robot. Only if the server config code is improperly implemented (as previously discussed) can the client know that an internal rewrite took place. So an internal rewrite is simply a non-default URL-to-filepath translation.
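
    Side by side in mod_rewrite syntax, the two look like this. It is a sketch only; /article.php and the "articles/" URL pattern are invented for the example:

    ```apache
    RewriteEngine on
    # External redirect: URL-to-URL. Full URL as target plus the R flag;
    # the client receives a 301 and must make a second request.
    RewriteRule ^old-page\.html$ http://www.example.com/new-page.php [R=301,L]
    # Internal rewrite: URL-to-filepath. Local path as target, no R flag;
    # the client keeps seeing /articles/some-title and never learns about /article.php.
    RewriteRule ^articles/([a-z-]+)$ /article.php?slug=$1 [L]
    ```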

    Hopefully, this clarifies some of the more troublesome misunderstandings that we see here. Confusion abounds about the "direction" of internal rewrites, about the time and place where mod_rewrite "acts," and about the differences between redirects and rewrites and between URLs and files, and it causes much grief, loss of time, loss of search ranking, and loss of revenue.

    Jim

    jdMorgan




    msg:4187753
     3:18 pm on Aug 16, 2010 (gmt 0)

    Just a tweak:

    # Strong .htaccess, .htpasswd, and .htgroup file protection
    <FilesMatch "\.[Hh][Tt][AaPpGg].+$">
    Order allow,deny
    Deny from all
    </FilesMatch>

    Jim

    g1smd




    msg:4187973
     10:04 pm on Aug 16, 2010 (gmt 0)

    I propose that post #4187743 be read out loud and in full at every SEO conference, and repeated at each one until everyone "gets it". All of the points made above are crucial to understanding how this stuff works, and failure to follow those tips seemingly accounts for the vast majority of inefficient or malformed .htaccess code found on the web.

    SevenCubed




    msg:4187977
     10:18 pm on Aug 16, 2010 (gmt 0)

    Good idea g1! And Jim should consider writing a good "Learn Apache" book. Since joining WW I've had Apache demystified by reading the posts here. I went looking for a good Apache book at the local library, but there is no such creature. The Apache docs themselves are presented "matter of factly" and are hard to learn from; the examples posted here, with comments, make it easier to understand. We all have different learning abilities, and I learn by example or by trial and error, so what Jim has written above has been saved locally to my PC for future reference. I know I've said it before, but I'll say it one last time to apply forevermore: thanks so very much, Jim, for sparing the time to put together these types of tutorials.

    parorrey




    msg:4188306
     4:55 pm on Aug 17, 2010 (gmt 0)

    I second SevenCubed for Jim's book.

    Lorel




    msg:4188309
     5:10 pm on Aug 17, 2010 (gmt 0)

    I also think Jim should write an Apache book -- this is the only forum on the Internet where you can get reliable in-depth knowledge of htaccess. If one of us posts an error, we get a response real quick so others don't copy our error.

    Angonasec




    msg:4188602
     11:27 am on Aug 18, 2010 (gmt 0)

    Thanks for the tweak Jim.

    I take it this thread ONLY refers to ordering: Access, Redirects, and Rewrites.

    Therefore other things, such as caching, Expires, and DirectoryIndex, can be placed in any order and any position within an .htaccess file.

    However, it's probably worth mentioning another source of confusion in htaccess ordering.

    I recently ran into an htaccess order related snag with the way my virtual host makes subdomains and merges the htaccess files.

    My host makes subdomains like Folders within the main directory and then "invisibly" de-merges them via the server configuration merging:

    So the actual directory arrangement is this:
    www/.htaccess
    www/subdomain/.htaccess

    And the server ("invisibly") makes the site urls appear to be like this:
    www/.htaccess
    subdomain/.htaccess

    As an aside:
    In case the "invisibility" breaks-down, and the wrong url shows, I do have as insurance the redirect:
    redirect 301 /subdomain/ htttp://subdomain.example.com/
    (NB: Extra "t" added to foil WebmasterWorld bug)

    A further complication is that my host has disabled htaccess inheritance in order to facilitate their custom access logging package. Therefore most of the www htaccess data has to be repeated in the subdomain htaccess file. (Very inefficient, but apparently my fellow virtual hosters can't live without tagged access logging!)

    The problem I'm highlighting arose when I used <Files *> ... </Files> and <Limit> ... </Limit> containers in various places in my www and subdomain files. Most of my access directives stopped working, and I finally had to abandon using "containers" like this and used Rewrites in their place.

    There were none of the usual 500 server errors, it was far more obscure. The CTO advised that I'd run into a very complex "container precedence problem". They had stopped inheritance, but not precedence! None of the CTO's suggested fixes worked, and he concluded:

    "When you start having multi-level .htaccess files like yours, the complexity of it all increases exponentially due to scope and how the various directives interact with each other..."

    Apparently during the +server level merging+ of the various htaccess instructions, some are obeyed, and some are ignored, depending on their placement.

    I tried many variations of my <Files *> and <Limit> containers, but couldn't get them to work as desired, so put the contained instructions into Rewrites instead, then things worked properly.
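
    As a rough idea of what such a substitution can look like (not the actual rules from this account; the file name "private.txt" and the list of allowed methods are assumptions made for the example):

    ```apache
    RewriteEngine on
    # In place of a <Files private.txt> container with "Deny from all":
    # answer any request for that file with 403 Forbidden
    RewriteRule ^private\.txt$ - [F]
    # In place of a <Limit> container: refuse every method except GET, HEAD and POST
    RewriteCond %{REQUEST_METHOD} !^(GET|HEAD|POST)$
    RewriteRule .* - [F]
    ```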

    I only mention this here to show how a subtle change of order between htaccess files, as well as within htaccess files, can seriously roger things. aka "container precedence problem".

    g1smd




    msg:4188863
     9:35 pm on Aug 18, 2010 (gmt 0)

    Your line of code:

    Redirect 301 /subdomain/ http://subdomain.example.com/

    will be problematical.

    As it says above, do not mix Redirect and RewriteRule directives in the same site. Use RewriteRule for all of the directives and mind the order of them all.
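
    A possible mod_rewrite replacement for that line is sketched below, assuming it sits in the www-level .htaccess file. The THE_REQUEST condition is an extra precaution so that the rule fires only when the client itself asked for /subdomain/..., and not when the host has mapped the subdomain onto that folder internally; whether it is needed depends on how the host does that mapping.

    ```apache
    RewriteEngine on
    RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /subdomain/
    RewriteRule ^subdomain/(.*)$ http://subdomain.example.com/$1 [R=301,L]
    ```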

    jdMorgan




    msg:4189007
     5:03 am on Aug 19, 2010 (gmt 0)

    @ 'ang on:

    ...and if you want to have multiple domains or users on one physical server without all that mickey-mouse logging/rewriting/merging stuff, give it a unique IP address!

    That description fits second-rate hosting, I'm afraid. There is no reason for it except for *them* to save money... and your resulting problems apparently don't matter to them. Unacceptable, in my book.

    Jim

    Angonasec




    msg:4189736
     12:26 pm on Aug 20, 2010 (gmt 0)

    Noted: Thanks g1smd, I'm busy consolidating directives to Rewrites in place of Redirects, as Jim wrote:

    Q/
    Because of the per-module execution order and the need to avoid exposing internal filepaths as URLs, many of the contributors here recommend *not* mixing mod_alias directives such as "Redirect" and "RedirectMatch" with mod_rewrite directives such as "RewriteRule." Unless you are on Apache 1.x and can control the LoadModule order, there is no way to guarantee that mod_alias directives will be processed first, and that no mod_rewrite redirect will be undesirably preempted by a Redirect or RedirectMatch redirect. Even if it works today, there is no guarantee that a config change or a server upgrade won't break the code tomorrow by changing the module execution order.
    /Q

    Jim: Thanks for your comments on my virtual host. But I plead leniency for them; it may have been my summary that unintentionally gave the impression they are below par. They have a very good reputation, and I've been there many years and will remain until a better virtual host appears.

    As soon as YOU set up a virtual hosting business, I'll switch.
    So now you have four pot-boilers: Day job, pillar of WebmasterWorld, book author, and hosting service.

    jdMorgan




    msg:4189749
     1:10 pm on Aug 20, 2010 (gmt 0)

    It's not at all personal, it's a cold factual assessment of this bit:
    A further complication is that my host has disabled htaccess inheritance in order to facilitate their custom access logging package. Therefore most of the www htaccess data has to be repeated in the subdomain htaccess file. (Very inefficient, but apparently my fellow virtual hosters can't live without tagged access logging!)

    There are many, many hosts that get "custom logging" working right -- without crippling their customers...

    Also, if you could 'blow up' custom access logging on the server simply by adding "RewriteOptions inherit" to a .htaccess file, then that's a potentially-serious 'security' problem... :o
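
    For reference, the directive in question would sit in the subdomain's .htaccess file roughly like this. Whether the host permits it, and what it would do to their logging, is not something this sketch can answer.

    ```apache
    RewriteEngine on
    # Ask mod_rewrite to also apply the rules defined for the parent directory
    RewriteOptions inherit
    ```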

    Jim

    Angonasec




    msg:4190074
     12:45 am on Aug 21, 2010 (gmt 0)

    Noted, Jim. I've gently nudged them for a while over it, but don't want to add to the pressures they must be under.

    This is interesting:

    "simply by adding "RewriteOptions inherit" to a .htaccess file..."

    I take it that would only work in the server htaccess configuration file, not in my virtual htaccess files? I've never tried that in my virtual account. I don't want to "blow up" their custom logging, but wonder if it may actually get my account locally inheriting www rules to subdomain... security risk notwithstanding.

    Food for thought.

    All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
    WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
    © Webmaster World 1996-2014 all rights reserved