---- Faceted Browse: Friendly URL Nightmare, Too Many Possible Combinations


g1smd - 10:43 pm on Jan 15, 2012 (gmt 0)


The (.*) pattern is greedy, promiscuous and ambiguous.

The first (.*) captures the entire URL string from there to the very end and then has to look for a hyphen "after the end". It then has to back up and retry to "find" a hyphen. After finding a hyphen it moves forward and finds that the next characters are "f3" not "c1". It now has to back up and retry to find the previous hyphen. After finding it, it moves forward and finds "f2", not "c1". It now has to back up and retry to find the previous hyphen. After finding it, it moves forward and finds "f1", not "c1". It now has to back up and retry to find the previous hyphen. Now and only now it finds "c1".

The second (.*) captures the entire URL string from there to the very end and then has to look for a hyphen "after the end". It then has to back up and retry to "find" a hyphen. After finding a hyphen it moves forward and finds that the next characters are "f3" not "f1". It now has to back up and retry to find the previous hyphen. After finding it, it moves forward and finds "f2", not "f1". It now has to back up and retry to find the previous hyphen. Now and only now it finds "f1".

This multi-step "back up and retry" exercise is repeated again for f2 and then again for f3.
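To put rough numbers on that, here is a toy model of the "back up and retry" behaviour. It is only a sketch: the URL is a made-up example (the real URL format is an assumption on my part), `greedy_backoff_steps` is a hypothetical helper, and real regex engines apply optimisations this ignores. It counts every end position a greedy (.*) tries before the literal that follows it finally matches.

```python
def greedy_backoff_steps(text, literal, start=0):
    """Simplified model of a greedy (.*) followed by a literal.

    The capture begins at `start`, first swallows everything to the end of
    the string, then gives back one character at a time until `literal`
    matches immediately after it. Returns (positions tried, index just
    past the literal), or (positions tried, None) if it never matches.
    """
    steps = 0
    for end in range(len(text), start - 1, -1):
        steps += 1
        if text.startswith(literal, end):
            return steps, end + len(literal)
    return steps, None


# Hypothetical URL with the markers at the END of each fragment.
url = "widgets-c1_blue-f1_large-f2_steel-f3"

position = 0
total = 0
for marker in ("-c1", "-f1", "-f2", "-f3"):
    steps, position = greedy_backoff_steps(url, marker, position)
    total += steps
    print(marker, "positions tried:", steps)
print("total positions tried:", total)
```

That is the cost of one rule that matches. A rule that does not match is worse, because each (.*) gets retried for every position the earlier ones give up, and that is before multiplying by several dozen rules.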

This is a serious coding error, especially given there are several dozen other rules with the same major flaw.

It's not beyond reason to guess that some URLs might cause your server to perform tens of thousands of "back up and retry" attempts for a single incoming request.

The replacement rule I suggested above came with a "maybe". It will fail if there are multiple hyphens within each element beyond the ones immediately before c1, f1, f2 and f3. It will likely need adjusting.

The coding would be a bucket load easier if the c1, f1, f2 and f3 markers were at the start of each URL fragment, not at the end.

Find "c1-" then capture until double underscore. Find "f1-" then capture until double underscore. Find "f2-" then capture until double underscore. Find "f3-" then capture until end.
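As a rough illustration of that marker-first layout, the steps above could be sketched like this. The URL and the `MARKER_FIRST` name are hypothetical, and the pattern assumes the captured values never contain an underscore, so each capture can use a negated character class that stops at the first underscore instead of a (.*) that overshoots and backs up.

```python
import re

# Marker-first layout: c1-...__f1-...__f2-...__f3-...
# Assumption: values never contain "_", so [^_]+ cannot run past the "__".
MARKER_FIRST = re.compile(
    r"c1-([^_]+)__"   # find "c1-", capture until the double underscore
    r"f1-([^_]+)__"   # find "f1-", capture until the double underscore
    r"f2-([^_]+)__"   # find "f2-", capture until the double underscore
    r"f3-(.+)"        # find "f3-", capture until the end
)

# Hypothetical URL; hyphens inside a value are no longer a problem.
match = MARKER_FIRST.fullmatch("c1-widgets__f1-dark-blue__f2-large__f3-steel")
if match:
    print(match.groups())
```

The same pattern body should carry over to a RewriteRule more or less unchanged, since mod_rewrite uses Perl-compatible regular expressions, but treat that as something to test rather than a given.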

URL design should be one of the first steps of getting a new site designed, coded and online. It seems that it's often the last step, and merely exposes database calls as URLs without any normalisation of format, or sorting of parameters or elements.

One thing your site should do is this: if there's a request for f3 f2 f1 c1 the user should be redirected to c1 f1 f2 f3. Likewise all other non-canonical versions should be redirected.

Only requests for the canonical version of the URL should be rewritten to deliver the content.
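The redirect-then-rewrite logic could look something like the sketch below. It assumes the marker-first layout with double-underscore separators discussed above, and it assumes any subset of the four markers is allowed; `canonicalise` and `route` are hypothetical names, not anything from the site in question.

```python
CANONICAL_ORDER = ("c1", "f1", "f2", "f3")


def canonicalise(path):
    """Return the fragments of `path` in canonical order, or None if the
    path is not in the assumed marker-value__marker-value layout."""
    fragments = {}
    for fragment in path.split("__"):
        marker, sep, value = fragment.partition("-")
        if not sep or not value:
            return None               # no "marker-value" shape
        if marker not in CANONICAL_ORDER or marker in fragments:
            return None               # unknown or repeated marker
        fragments[marker] = value
    return "__".join(
        f"{marker}-{fragments[marker]}"
        for marker in CANONICAL_ORDER
        if marker in fragments
    )


def route(path):
    """Decide what to do with a request: 404, 301 to canonical, or serve."""
    canonical = canonicalise(path)
    if canonical is None:
        return (404, None)
    if canonical != path:
        return (301, canonical)       # non-canonical order: redirect
    return (200, canonical)           # canonical: rewrite and serve content


print(route("f3-steel__f2-large__f1-blue__c1-widgets"))
print(route("c1-widgets__f1-blue__f2-large__f3-steel"))
```

Only the last branch ever reaches the content script, so each page has one URL that returns content and every other ordering collapses onto it with a redirect.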


Thread source: http://www.webmasterworld.com/apache/4406852.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com