Forum Moderators: phranque
THE QUESTION FIRST:
Does Apache stipulate a limit for the number of RewriteRules we can have? What is the biggest ugliest number after which Apache will croak? Can we have millions of them in a poor .htaccess file? Or can we break them into different .htaccess files?
THE EXPLANATION:
I know what you will say. That I do not need so many RewriteRules, and that I should instead have about five or six RewriteRules that map the old URLs to our new site's JSP page which in turn does the heavy work of doing the mapping according to our business logic.
This is fine by me, and it doesn't matter one bit to me, but our IT manager is hung up on the behavior of a competing bank's site, where each URL points (by 301) directly to the destination URL.
My manager seems to have discovered the "HTTP Header Check" services online. When he enters one of our old URLs, he gets, as expected, a 301 to the intermediate JSP that does the URL mapping. He is not happy with this, because the competitor's 301s have no intermediate page.
I don't know how the other bank has accomplished this -- how do you get the HTTP headers to show the destination site directly? One way could be to have all the rules in the .htaccess itself, and the other way would be to have millions of files in the filesystem, each containing just the URL-forwarding code. Neither of these makes sense to me, so I'd appreciate any thoughts or ideas!
The wording of your post makes me suspect that you've missed an important point, and that perhaps it will offer you some relief from this problem; You used the generic term 'map' several times, instead of specifying "internal rewrite" or "external redirect." At the same time, you say that your manager sees a "redirect" to the .jsp back-end when using his server headers checker. If this is true, then the original "mapping" function is what I sometimes call "mis-implemented;" The "mapping" of URLs to the .jsp scripts need not include any external (client-visible) redirects whatsoever; Instead, an internal rewrite can be used so that when someone requests a URL, the server changes the internal filepath associated with that URL request -- No client-detectable external redirect need be involved.
Take a look at the implementation; You should be using the internal rewrite syntax of mod_rewrite --or perhaps the reverse-proxy functions of mod_proxy-- to map requested URLs to your .jsp scripts. Within mod_rewrite, you may be using straight RewriteRules, or perhaps the RewriteMap function. At any rate, there is likely no need to create and use "millions" of individual rewrites; the ability of mod_rewrite to use regular expressions would usually preclude the need for such an ad-hoc approach.
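As a rough sketch of that approach, assuming a single hypothetical lookup script at /new/lookup.jsp that reads a "page" parameter and issues the one 301 itself, a single regex-based internal rewrite in the site's root .htaccess could cover every old URL:

```apache
# Root .htaccess (per-directory context, so the pattern has no leading slash).
# /new/lookup.jsp and its "page" parameter are hypothetical names.
RewriteEngine On

# Internal rewrite: no [R] flag and no scheme/host in the substitution,
# so the client never sees /new/lookup.jsp. The old page name is passed
# to the script as a query-string parameter for its mapping logic;
# QSA keeps any query string the client originally sent.
RewriteRule ^old/some/path/(.+)\.htm$ /new/lookup.jsp?page=$1 [QSA,L]
```

With this in place, a header check on an old URL should show only the 301 that the script sends, pointing straight at the final destination.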
I suspect that using mod_proxy [httpd.apache.org] or the internal rewrite capability of mod_rewrite [httpd.apache.org] --with or without RewriteMap-- will offer you a more attractive solution. (Note that RewriteMaps can only be defined at the server config level, and not in .htaccess.)
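If the old-to-new pairs can be exported as a list, a RewriteMap can let Apache send the 301 straight to the final destination with no intermediate script at all. A minimal sketch, assuming a hypothetical map file at /etc/httpd/conf/legacy-urls.txt and placement in httpd.conf or a <VirtualHost> block:

```apache
# Server config or <VirtualHost> only: RewriteMap cannot be declared in .htaccess.
# The map file path and its entries are hypothetical. Each line is "key value":
#   accounts/savings  https://www.example.com/banking/savings
#   loans/auto        https://www.example.com/lending/auto-loans
# For a very large map, convert the text file with httxt2dbm and declare it
# as "dbm:/etc/httpd/conf/legacy-urls.map" instead of "txt:...".
RewriteMap legacy "txt:/etc/httpd/conf/legacy-urls.txt"

RewriteEngine On

# Server context: the pattern sees the full URL-path, so it starts with a slash.
# The condition lets the rule fire only when the map has an entry for $1.
RewriteCond ${legacy:$1} !=""
RewriteRule ^/old/some/path/(.+)\.htm$ ${legacy:$1} [R=301,L]
```

Old URLs with no map entry fall through to any rules that follow, so a catch-all (such as an internal rewrite to the JSP) can still handle them.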
Jim
[edited by: jdMorgan at 3:44 pm (utc) on July 21, 2007]
This is what I have now:
RewriteEngine On
RewriteRule ^old/some/path/(.*)\.htm$ /new/$1.jsp [L,R=301]
How can I make this an internal rewrite? Should I be looking at RewriteBase or something? From my reading, it sounded useless for my purpose.
I just need one nudge in the right direction and then I'll find and experiment myself. Many thanks!
1. /old/path/x.htm should point to /new/x.jsp
2. The file /new/x.jsp actually does a header 301 redirect
Now I think from reading your own older posts in this forum, I understand "internal rewrite", which is basically something that starts with "/" instead of "http://domain.com/" -- right?
If this is correct, then my .htaccess rules are already internal rewrites. I don't think that's the issue. When I do an HTTP header check on Step 1 above, it shows me a 301 to /new/x.jsp. When I do a header check on Step 2, it shows the correct destination.
The other company I mentioned has done something so that when I do an HTTP header check on Step 1 on their server, the 301 already points to the external destination (a totally different, industry-standard transaction website)! In our case, the intermediate "new/x.jsp" 301 shows up.
Hope I am explaining this well?! Should I be looking at RewriteMap? Or do I need mod_proxy? mod_proxy is not currently installed, and there will be a not-fun round of policy hoops to get it installed, but if it is worth it -- especially from a speed/performance perspective -- then I can make the case.
Would really appreciate some guru thoughts. Thanks!
> RewriteEngine On
> RewriteRule ^old/some/path/(.*)\.htm$ /new/$1.jsp [L,R=301]
> It's pretty simple, right? And it works. But I suppose my code above is "external redirect" given how the http headers show the result.
> How can I make this an internal rewrite? Should I be looking at RewriteBase or something? From my reading, it sounded useless for my purpose.
External redirect (with the protocol and canonical domain included):
RewriteRule ^old/some/path/(.*)\.htm$ http://www.example.com/new/$1.jsp [R=301,L]
Internal rewrite:
RewriteRule ^old/some/path/(.*)\.htm$ new/$1.jsp [L]
> when I put these same instructions in virtualhost declaration it doesn't work!
Again, as documented, the path 'seen' by RewriteRule in an .htaccess (per-directory) context is stripped of the directory path leading to the directory in which the .htaccess file resides, while in httpd.conf, RewriteRule sees the full URL-path. So it will be necessary to add the rest of the path to the RewriteRule pattern for use in httpd.conf. If your .htaccess is in your Web root directory, then all you'll need to add is a leading slash. Examples:
Internal rewrite in "/test/.htaccess":
RewriteRule ^old/some/path/(.*)\.htm$ new/$1.jsp [L]
Equivalent in httpd.conf:
RewriteRule ^/test/old/some/path/(.*)\.htm$ /test/new/$1.jsp [L]
Internal rewrite in "/.htaccess":
RewriteRule ^old/some/path/(.*)\.htm$ new/$1.jsp [L]
Equivalent in httpd.conf:
RewriteRule ^/old/some/path/(.*)\.htm$ /new/$1.jsp [L]
> 2. The file /new/x.jsp actually does a header 301 redirect
So it sounds like you have two redirects back-to-back. That's very inefficient, and it slows down the visitor experience, as each access requires three client requests and three server responses...
You may not be able to fix the second redirect (done by x.jsp), as it may require re-architecting the whole site... Only you can determine this.
In order to stop x.jsp from redirecting, it will have to be re-written. Instead of sending a 301 redirect response to the client, telling it to go to some other URL to get the content it is asking for, you will need to re-write x.jsp to open the file that was requested, read it into a temp buffer, and then send it to the client. Therefore, the content delivery will take place within the context of the original HTTP request. This will be possible if the requested content is local, but if it's on an external site (like your competitor's payment gateway that you described), then a 301 is the only way to 'get there'.
Take-home lesson: A redirect is a server response to the client, telling it that a requested resource has moved, and giving it the new URL to use to fetch that resource. This redirect response terminates the current HTTP transaction, and leaves it up to the client to make a new request for the resource using the new address provided in the redirect response. So the client must make a new (second) HTTP request to get "the stuff" that it asked for the first time.
An internal redirect, by contrast, simply changes the server file-path associated with the requested URL. The client asks for "foo.html", and you change that to point to a script "x.jsp" that produces a virtual page known as "foo.html" to the outside world.
To drive this home: A URL is a way to specify the location of information on the Web. A filepath is a way to specify the location of information within a server. These two are not at all the same thing, and in fact, need not have anything in common. It is the server's job to 'map' URLs to corresponding filepaths, and those filepaths may lead to (for example) static HTML documents, to scripts that produce HTML documents, or to scripts that produce redirects to be used to locate those documents.
Correspondingly within your script, you can either send a redirect response to tell the client to go somewhere else and ask for the resource it wanted, or the script itself can open the requested resource (file), read it in, and "pipe it" to the client.
I hope this explains the situation and offers you a workable solution...
Jim
The coding difference between a redirect and a rewrite is simply that the redirect has an [R] in it.
The redirect sends a 301 or a 302 code and the URL of where to go. The browser then requests the page at the new URL.
The rewrite translates the requested URL into an internal filepath and simply fetches the data directly from there.
A "proper" external redirect also has a protocol (http:, https:, ftp:, etc.) and a canonical domain name in the RewriteRule substitution URL -- This avoids leaving yourself at the mercy of the host having set up a correct ServerName or VirtualHost when UseCanonicalName is On:
Internal rewrite:
RewriteRule ^foo\.html$ bar.html [L]
External redirect:
RewriteRule ^foo\.html$ http://www.example.com/bar.html [R=301,L]
Jim
And thanks jd for the httpd.conf suggestions. I was just a front-slash away from making it work! :)
Also, I think the "Options -Indexes" stuff works only when I put it inside a "<Directory />" type tag.
All good, and I cannot tell you how much I appreciate your help!
Elated..
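Pulling together the httpd.conf points from this thread (full URL-path patterns in server context, and rewrite directives declared inside the virtual host itself), a hypothetical <VirtualHost> block might look like the sketch below. The host name and DocumentRoot are placeholders, and the rule is the internal rewrite discussed above:

```apache
# Hypothetical virtual host: ServerName and DocumentRoot are placeholders.
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/example

    # By default, mod_rewrite settings in the main server config are not
    # inherited by virtual hosts, so the engine is enabled here.
    RewriteEngine On

    # Virtual host context: the pattern is matched against the full URL-path,
    # so it needs the leading slash that the .htaccess version omits.
    # No [R] flag and no scheme/host: this is an internal rewrite.
    RewriteRule ^/old/some/path/(.*)\.htm$ /new/$1.jsp [L]
</VirtualHost>
```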