Creating htaccess file size efficiencies

Whitey

11:36 am on Jul 21, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



g1smd wrote this some time ago:

It is not the number of lines that is the problem, but the processing time for each request.

With intelligent pattern creation and the use of "local OR" notation it is likely the number of lines could be significantly cut.

Consider similar URLs and look at the th(is|at) and [ct]hat logic, etc. [webmasterworld.com...]

Can someone please explain this in terms that I can convey to our developer? (Perhaps a bit of understanding on my part is needed too.)

We are looking to redirect around 20,000 URLs, i.e. around 20,000 lines.

penders

2:27 pm on Jul 21, 2014 (gmt 0)


I find g1smd's statement a little bit contradictory...

It is not the number of lines [in .htaccess] that is the problem, but the processing time for each request.


Well, yes. Except that the processing time for each request is somewhat related to the number of lines. (Bit of a catch-22?)

With intelligent pattern creation and the use of "local OR" notation it is likely the number of lines could be significantly cut.


And, as mentioned, the purpose of this is specifically to reduce the number of lines in .htaccess...

The basic idea is that instead of something verbose like:

RewriteRule ^oldpage1$ /newpage1 [R=301,L] 
RewriteRule ^oldpage2$ /newpage2 [R=301,L]
RewriteRule ^oldpage3$ /newpage3 [R=301,L]
RewriteRule ^oldpage4$ /newpage4 [R=301,L]
RewriteRule ^oldpage5$ /newpage5 [R=301,L]
# ...for 20,000 lines!


You do something like:

RewriteRule ^oldpage(\d+)$ /newpage$1 [R=301,L]


By using regex pattern matching we have reduced 20,000 lines to just 1 line in your .htaccess file. Yes, that is probably an oversimplification of your problem, but the basic idea is to work with patterns in the old and new URLs.
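
To make the "local OR" notation from g1smd's quote a bit more concrete, here is a minimal sketch. The URLs are invented purely for illustration; your own patterns will depend on what your old and new URLs actually have in common.

```apache
# Hypothetical URLs, for illustration only.

# Alternation: th(is|at) matches "this" or "that", so one rule
# covers both /this-page and /that-page.
RewriteRule ^th(is|at)-page$ /new/th$1-page [R=301,L]

# Character class: [ct]hat matches "chat" or "that".
RewriteRule ^([ct]hat)$ /new/$1 [R=301,L]
```

Each rule replaces two one-to-one lines here; with longer alternations the saving grows accordingly.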

You would also see better performance if you do these redirects in your server config (httpd.conf) rather than per-directory .htaccess files.
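
As a rough sketch of what the server-config version might look like (the hostname and document root are placeholders, not anything from your setup):

```apache
# Sketch for httpd.conf; www.example.com and the path are placeholders.
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot "/var/www/example"

    RewriteEngine On
    # In server / virtual-host context the pattern is matched against the
    # full URL-path, so it begins with a slash (unlike in .htaccess).
    RewriteRule ^/oldpage(\d+)$ /newpage$1 [R=301,L]
</VirtualHost>
```

The leading slash in the pattern is the detail that most often trips people up when moving rules out of .htaccess.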

However, to go beyond Apache/mod_rewrite for a moment... if there literally are no patterns in the URLs and you just have a 1 to 1 mapping (on 20,000+ URLs) then it is probably better to seek a different solution, such as using a custom 404 to do a database lookup on the URL. If it is a recognised redirect then... redirect, otherwise fall back to a 404. This way you only get the performance hit on a 404, and not on every request. And it's only a single database lookup on the primary key, so it will scale OK.
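
On the Apache side, the custom 404 approach needs only one directive. This is a sketch; redirect-lookup.php is a hypothetical script name, and the lookup logic itself lives in that script, not in the config.

```apache
# redirect-lookup.php is a hypothetical name for your lookup script.
# Apache hands the request to it only when the request would otherwise
# end in a 404, so ordinary requests never pay for the lookup.
ErrorDocument 404 /redirect-lookup.php
```

The script would then look up the originally requested URL and either send a 301 with a Location header or leave the 404 response in place.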

lucy24

4:00 pm on Jul 21, 2014 (gmt 0)


Note too that the issue is not "lines in htaccess" as such. It's two separate things:
--the number of config lines, wherever they are, that each request has to pass through
and
--the mere existence of an htaccess-- or even the possibility of one, as expressed in the AllowOverride directive, since the server has to check each directory in the path on every request, on the off chance that an htaccess has been added since three nanoseconds ago
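
For the second point, a minimal sketch of switching the htaccess check off, assuming you have access to the server config (the directory path is a placeholder):

```apache
# The path is a placeholder for your own document root.
<Directory "/var/www/example">
    # With overrides off, Apache does not look for .htaccess files in
    # this directory or below it, so the per-request check goes away.
    AllowOverride None
</Directory>
```

Any rules currently in htaccess would have to move into the server config first, since they are ignored once overrides are off.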

20,000 separate redirects is too many, no matter where the lines are. Keep in mind that every single request has to pass through all those lines, even though 90% of each human visit is non-page files that generally aren't subject to redirection. (That is, you may move images, but a browser only sees the URL named in the html, which will be correct.)

If you can't collapse your 20,000 rules into patterns-- which admittedly is at least 90% of the posts in this subforum ;) -- then do something like

RewriteCond %{REQUEST_URI} (.+)
RewriteRule (?:/|\.html)$ /fixup.php?page=%1 [L]

replacing .html with whatever extensions you actually use. Omit / if no directory-index pages are to be redirected.

Although this is superficially a rewrite alone, position it among your redirects, because fixup.php will end up issuing a 301 (or 404/410 as the case may be). The sole purpose of the Condition is to save the server the work of capturing and storing anything unless it's actually needed-- which, again, is only a small minority of human requests.

If you go this route, make sure fixup.php includes a provision for index.html requests unless you have dealt with them in a previous line.
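
One way of dealing with index.html "in a previous line" is a redirect like the following, placed above the fixup rule. Treat it as a sketch of a commonly used pattern rather than a drop-in; it assumes a root-level htaccess and .html index files.

```apache
# Redirect direct client requests for index.html to the bare directory URL.
# THE_REQUEST holds the original request line, so this does not fire (and
# loop) when mod_dir internally maps "/" to index.html.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/([^/]+/)*index\.html[?\s]
RewriteRule ^(([^/]+/)*)index\.html$ /$1 [R=301,L]
```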