Script is currently:
Options +FollowSymLinks
RewriteEngine on
RewriteRule ^(.*)/(.*)\.php$ product.php?manufacturer=$1&style=$2 [L]
RewriteRule ^(.*)/$ index.php?manufacturer=$1
RewriteRule ^(.*)$ index.php?manufacturer=$1
I've been reading round, and come to a dead end but understand that:
1) The [L] flag just stops this loop, and that the rewrite engine then goes round again (so catching the rewritten url and modifying it in the second set of rules)
2) I need a rewriteCond to follow the first rule. Which is where the problem lies - I've found lots of things with people saying I need one, but have absolutely no clue what to put in it.
The htaccess file resides in a folder called 'manufacturers', and basically I want anyone going looking for manufacturers/whatever (or manufacturers/whatever/) to go to index.php?manufacturer=whatever with users then progressing to specific product pages at product.php?manufacturer=whatever&product=something.
This is probably a no-brainer for many of you, but it's giving me massive headaches!
As an aside, I strongly suspect that one additional line of code can be used to push requests for pages (products) that don't exist to a custom 404 error page at /error_docs/404/php? Am I right? If so, how?
Hope this makes sense, and hope someone can help before stressing about it ruins my weekend!
Cheers,
Nik
But the second rule's ".*" pattern matches "anything, everything, or nothing -- with or without a trailing slash," and so must be prevented from being re-invoked (and looping) with previously-internally-rewritten requests for either index.php or product.php:
Options +FollowSymLinks
RewriteEngine on
#
RewriteRule ^(([^/]+/)+)([^.]+)\.php$ product.php?manufacturer=$1&style=$3 [L]
#
RewriteCond $1 !(product|index)\.php$
RewriteRule ^(.*)/?$ index.php?manufacturer=$1 [L]
The regex patterns in the first rule above have been modified to improve performance.
Note that you really should not allow that trailing slash to be optional, because that creates two URLs for each page/product. That is duplicate-content, a much-discussed search ranking problem. You should pick either slashed or non-slashed URLs as your preferred (canonical) URL-form, and rewrite only those to your script. The other form should be detected and 301-redirected to the canonical form.
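A minimal sketch of that canonicalization, assuming you pick the non-slashed form as canonical, that this .htaccess lives in /manufacturers/, and that manufacturer names are not physical subdirectories (otherwise mod_dir's own trailing-slash redirect would fight with this one). Treat it as a starting point rather than tested code:

```apache
# Assumption: non-slashed URLs are canonical.
# Externally redirect /manufacturers/whatever/ to /manufacturers/whatever
RewriteRule ^(.+)/$ /manufacturers/$1 [R=301,L]
#
# Internally rewrite only the canonical (non-slashed) form to the script,
# skipping requests that have already been rewritten to a script file
RewriteCond $1 !^(product|index)\.php$
RewriteRule ^([^/]+)$ index.php?manufacturer=$1 [L]
```

These two rules would take the place of the second rule above; the product rule stays as it is, since product URLs end in .php and are not affected by the slash redirect.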
-----
Missing/bogus product page URLs:
As far as the server is concerned, any and all pages (products) in that subdirectory exist, because all are rewritten to one or the other script file, which physically exists. The scripts themselves are the only things that can "know" whether a page (product) exists, based on the presence or absence of a database entry for that page/product. Therefore, the script itself must check for the page's existence, and return a proper 404-Not Found response if no database entry can be found to generate an HTML page for that product.
Note that for pages/products that once existed, but which you no longer wish to support, you can/should return a 410-Gone response instead of a 404, assuming that you mark obsolete product records as obsolete, rather than just deleting them (in other words, whether you can support this function depends on how you set up and administer your database). A 410 response (and its error page) essentially tells visitors "We used to sell that, but no longer do," rather than just saying, "Sorry, we can't find that page, and we don't know why," which is all that a 404 means.
It is best practice to "remember" all old/obsolete URLs that you used to have pages for, and to handle those separately (410-Gone) from mis-typed or bogus URLs (404-Not Found).
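If only a handful of retired URLs need a 410, one option is mod_rewrite's [G] flag, placed ahead of the other rules. Note this is a different technique from the script-level database check described above, and the manufacturer and product names here are made up for illustration:

```apache
# Hypothetical retired product URL, relative to the /manufacturers/ directory.
# "-" means no substitution; [G] returns 410 Gone and stops rule processing.
RewriteRule ^acme/old-widget\.php$ - [G]
```

For more than a few retired products, marking the records as obsolete in the database and letting the script send the 410 is likely to be easier to maintain.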
Jim
[edited by: jdMorgan at 3:41 pm (utc) on July 19, 2009]
Anyway, if I'm gonna learn, I need to unpick this and understand it rather than just copying & pasting so...
First RewriteRule says "find anything that starts either with or without a / (the [^/] signifying the 'with'), is followed by a / and is then followed by something.php and redirect to ... When done, stop this rewrite run & start again."
The RewriteCond says "if the first bracketed pattern isn't product or index.php, then..."
Not sure about the significance of the ? in the second RewriteRule. Is it literally a question mark? Or is it saying "find anything that may or may not end in a / and redirect to..."
Thanks for the tip about the 404/410 pages as well - I can check for non-existent pages/products in the php, so no big issues there, but may well alter the database to allow 'deleted' products to retain some value which can be used to pass through to a 410 - am becoming a bit of a stickler for standards compliance, clean code etc, so the least I can do is make sure things like this aren't left unresolved - to not do so would just be lazy (and that's something which will set me off on a rant I can probably join in on elsewhere on the forums!)...
Thanks again,
Nik
"?" is a quantifier meaning, "match zero or one of the preceding character, alternate character group, or parenthesized sub-pattern."
"[^/]+/" is "Match one or more characters which are not a slash, followed by a slash."
"([^/]+/)+" is "Match one or more characters which are not a slash, followed by a slash, and do all that one or more times."
So "^(([^/]+/)+)([^.]+)\.php$" is "Match one or more characters which are not a slash, followed by a slash, one or more times, followed by one or more characters which are not a period, followed by a period, and ending with 'php'."
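As a rough worked example, assuming a made-up request for manufacturers/acme/widget.php, the pattern splits the path as shown in the comments here:

```apache
# Path seen by the rule in /manufacturers/.htaccess:  acme/widget.php
#   (([^/]+/)+)  matches "acme/"   -> $1 (the trailing slash is captured too)
#   ([^.]+)      matches "widget"  -> $3
#   \.php$       matches ".php"
# Internal rewrite target: product.php?manufacturer=acme/&style=widget
RewriteRule ^(([^/]+/)+)([^.]+)\.php$ product.php?manufacturer=$1&style=$3 [L]
```

Since $1 includes the trailing slash, the script would probably want to trim it before using the value.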
So that pattern parses the requested URL-path into the "directory part" in $1 and the "filename part" in $3. It will work properly as long as the 'filename.filetype' part of your URLs does not contain multiple periods. If they do, then the same technique used for slashes can be used to make the period-matching more robust, e.g. "^(([^/]+/)+)(([^.]+\.)*[^.]+)\.php$"
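Dropped into the full rule, the more robust pattern would look something like this. The filename widget.v2.php is a hypothetical example, and the group numbering is the thing to watch: the filename part is still $3, because the new inner group becomes $4.

```apache
# Hypothetical request path: acme/widget.v2.php
#   $1 = "acme/"      (directory part, with trailing slash)
#   $3 = "widget.v2"  (filename part, inner periods allowed)
RewriteRule ^(([^/]+/)+)(([^.]+\.)*[^.]+)\.php$ product.php?manufacturer=$1&style=$3 [L]
```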
Jim
And as noted, part of the complexity comes from the flexibility of the pattern: It will accept one or more 'subdirectory levels' and more than one period in the filename itself. If you don't need that flexibility, you can simplify the pattern.
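For instance, a simplified sketch, assuming every product URL has exactly one manufacturer level and no extra periods in the filename:

```apache
# Exactly one directory level, one filename without slashes or periods.
#   $1 = manufacturer (no trailing slash this time), $2 = style
RewriteRule ^([^/]+)/([^/.]+)\.php$ product.php?manufacturer=$1&style=$2 [L]
```

With only two groups the back-references are $1 and $2, and the manufacturer value arrives without the trailing slash.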
Jim