RE: "A guide to fixing duplicate content & URL issues on Apache"

A question about the JDMorgan 301 "framework" - how scalable is it?

         

stevej444

8:05 pm on May 10, 2007 (gmt 0)

10+ Year Member



Hi,

Sorry to be quite so specific - this is related to the post at [webmasterworld.com...]

I wondered how scalable a solution this is (assuming the uppercase-to-lowercase conversion is not live). We've used it quite successfully on a small site and found it very elegant and powerful (many thanks and much respect due to JD Morgan for that). It implements many redirects on that one site, approx. 200 with various different rules, with no impact on performance at all.

However - we're now looking at running this on a site on a much larger scale e.g. approx. 700,000 unique users per month.

So the question is - if this is run as an .htaccess file (as detailed in that post) and has some meaty rules in it, say approx. 50 rules, and provided of course that the regular expressions are reasonably well implemented - could there be a noticeable impact on performance?

I know this is a bit of a "how long is a piece of string" question - though part of the answer could be that this has already been applied to sites with similar or heavier traffic. So I'd be very interested to know where else the approach detailed in that post has been applied, and in any comments on traffic levels or scalability.

Thanks,
Steve

jdMorgan

9:44 pm on May 10, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You might want to consider moving the 'stable' elements of that code to httpd.conf. The advantage is that code in httpd.conf (or conf.d) is parsed and compiled once, at server restart, and is held in memory thereafter. In .htaccess, the code is read and interpreted for each and every HTTP request. So by pre-compiling the code, you can get much better performance.

The downside is that the server must be restarted before any change to the code in httpd.conf will take effect. Therefore, as stated, I suggest putting only stable elements of the code into the server config files -- you can leave new and temporary rules in .htaccess for a while.
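As a rough sketch of that split (the hostnames and file names are made up for illustration and are not from the original thread), a long-standing redirect could live in the server config while a newer one stays in .htaccess. This assumes the stable rule goes in the main server or <VirtualHost> context, where the pattern is matched against the full URL-path including the leading slash:

```apache
# --- httpd.conf (main server or <VirtualHost> context) ---
# Parsed once at server start; changes need a restart to take effect.
RewriteEngine On

# 'Stable' rule (hypothetical URLs). In server context the pattern
# sees the full URL-path, so it starts with a slash.
RewriteRule ^/old-page\.html$ http://www.example.com/new-page.html [R=301,L]

# --- .htaccess in the document root ---
# Read and interpreted on every request; no restart needed after edits.
RewriteEngine On

# New or temporary rule (hypothetical URLs). In .htaccess the
# directory prefix is stripped, so there is no leading slash.
RewriteRule ^spring-promo\.html$ http://www.example.com/promo/ [R=301,L]
```

The leading-slash difference is the usual thing to check when copying rules from .htaccess into httpd.conf.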

Also, I want to point out that the code was intended as a demonstration of a work-around for the Apache bug described in that thread, and having overcome that bug, as a method for 'fixing' several URL problems at once using only one external redirect.

As such, some of it may be too complex (or too elegant -- thanks) for everyday, real-world requirements. For example, if you see very few malformed URL requests to your site, then the utility of handling all possible problems in one routine is questionable; you might get better performance by putting everyday administrative redirects in one section, and then only using the malformed-URL fixes as needed in a separate section. It's all a balance between real URL fix-up needs, complex versus simple and easy-to-maintain code, and performance. And that's where we get into the 'piece of string' problem, because only live testing will really give you much useful info.

If, however, someone has already tested this on a high-traffic site, I'd love to hear about it, too -- I thought of it as an academic exercise, but if it's being deployed, I'm certainly interested in how it performs.

Jim

[edit] speling [/edit]

[edited by: jdMorgan at 9:46 pm (utc) on May 10, 2007]

stevej444

7:51 am on May 11, 2007 (gmt 0)

10+ Year Member



Hi Jim,
Thanks for the response. A few comments below, preceded by SJ>>

JIM>> You might want to consider moving the 'stable' elements of that code to httpd.conf. The advantage is that code in httpd.conf (or conf.d) is parsed and compiled once, at server restart, and is held in memory thereafter. In .htaccess, the code is read and interpreted for each and every HTTP request. So by pre-compiling the code, you can get much better performance.

SJ>> My take on 'stable' is - those rules that have been through testing, or have been live for a while, and that one is almost 100% sure just aren't going to change. Because, as you say, Apache has to be restarted with any change.

JIM>> Also, I want to point out that the code was intended as a demonstration of a work-around for the Apache bug described in that thread, and having overcome that bug, as a method for 'fixing' several URL problems at once using only one external redirect.

SJ>> Understood.

JIM>> As such, some of it may be too complex (or too elegant -- thanks) for everyday, real-world requirements. For example, if you see very few malformed URL requests to your site, then the utility of handling all possible problems in one routine is questionable; you might get better performance by putting everyday administrative redirects in one section, and then only using the malformed-URL fixes as needed in a separate section. It's all a balance between real URL fix-up needs, complex versus simple and easy-to-maintain code, and performance. And that's where we get into the 'piece of string' problem, because only live testing will really give you much useful info.

SJ>> Not quite sure what you mean by administrative redirects here.
SJ>> Also - not entirely sure what you mean by "separate section".

SJ>> Presumably one could spread the load by spreading the redirects throughout various folders? So, supposing the site in question was a large site (as I said, approx. 700,000 unique users/month) with a number of different microsites/verticals, e.g. www.mydomain.com/soccernews, www.mydomain.com/baseballnews etc ... then presumably the redirection load could be spread by having some redirects in the Apache config, then others in .htaccess files within /soccernews, /baseballnews etc ....

JIM>> If, however, someone has already tested this on a high-traffic site, I'd love to hear about it, too -- I thought of it as an academic exercise, but if it's being deployed, I'm certainly interested in how it performs.

SJ>> Yes, likewise!

SJ>> Thanks again.

jdMorgan

2:16 pm on May 11, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Because I'm not intimately familiar with anyone else's sites, I have to generalize.

A stable rule is whatever you define as stable... If you're wrong, you'll have to fix it and restart the server. The restart may not concern you at all, or it may cost you revenue due to lost shopping sessions -- I don't know, so I can't say.

Administrative redirect: Those needed simply because you've changed page names or removed content, and not those related to fixing canonicalization problems, incorrect backlinks, or malicious linking attempts.
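To make the distinction concrete, here is a minimal .htaccess sketch with one rule of each kind. The hostname and page names are hypothetical, and the canonicalization rule is a generic hostname fix rather than the code from the referenced thread:

```apache
RewriteEngine On

# Administrative redirect: a page was renamed (hypothetical file names).
RewriteRule ^old-name\.html$ http://www.example.com/new-name.html [R=301,L]

# Canonicalization fix: any request arriving under a non-canonical
# hostname is redirected to www.example.com, keeping the requested path.
# The second condition skips requests that send no Host header at all.
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteCond %{HTTP_HOST} !^$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

The first kind changes whenever content moves; the second kind tends to be written once and left alone, which is what makes it a candidate for the server config.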

Separate section: Rules might be moved to an .htaccess file, or placed in a separate <Directory> container in httpd.conf.

Spreading the load: Again, you can move some rules to .htaccess files located in subdirectories, or use multiple <Directory> containers in httpd.conf -- The difference is purely httpd.conf performance versus .htaccess ease of maintenance, as above.
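A sketch of the httpd.conf variant, using the microsite names from the question above; the filesystem paths and page names are assumptions for illustration only:

```apache
# httpd.conf -- one <Directory> container per microsite.
# Patterns are matched relative to each directory, as they would be
# in an .htaccess file placed in that directory.

<Directory "/var/www/mydomain/htdocs/soccernews">
    # mod_rewrite in per-directory context needs FollowSymLinks enabled.
    Options +FollowSymLinks
    RewriteEngine On
    RewriteRule ^old-fixtures\.html$ http://www.mydomain.com/soccernews/fixtures.html [R=301,L]
</Directory>

<Directory "/var/www/mydomain/htdocs/baseballnews">
    Options +FollowSymLinks
    RewriteEngine On
    RewriteRule ^old-scores\.html$ http://www.mydomain.com/baseballnews/scores.html [R=301,L]
</Directory>
```

One caveat worth testing before relying on this: as far as I know, per-directory rule sets are not combined by default, so an .htaccess file with its own rewrite rules in one of these directories would replace the container's rules for that directory unless RewriteOptions Inherit is set.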

I hope I got 'em all! :)

Jim