I'm trying to create an htaccess file using mod_rewrite for the following scenario:
Visitor clicks on:
http://www.example.com/somefolder/product-name.type/
Note: the '-' and '.' are important - I do have them in the url.
Once he/she clicks I want a script that is in the "somefolder" folder to get the product-name.type and generate a page:
http://www.example.com/somefolder/script.cgi?product-name.type
Note: you don't need any "id=" parameter to invoke the script (I could change that of course but I suppose it's irrelevant), just the "product-name.type" part.
So can anyone give me an example of a mod_rewrite syntax to use in this scenario...
I've toyed around for some time but I just can't get it to work.
[edited by: jdMorgan at 3:02 pm (utc) on Feb. 17, 2009]
[edit reason] example.com [/edit]
Please see this thread in our forum library to get started: Changing Dynamic URLs to Static URLs [webmasterworld.com]
Our Apache Forum Charter [webmasterworld.com] also provides links to useful documents and information on how to get the most from this forum.
Jim
The easiest solution is to either make the pattern more specific -- for example "^([^/.]+)/?$", or to explicitly exclude /script.cgi requests from being re-rewritten by using a RewriteCond on the rule.
Note also that your pattern accepts a request_URI with or without a trailing slash. I suggest that you pick one or the other form, externally 301-redirect the non-preferred form to the preferred form, internally rewrite only the preferred form to your script, and so avoid duplicate content issues.
Jim
You didn't reply to the issue that I wrote about above. If you are using a RewriteCond testing %{REQUEST_URI} to prevent looping, then be aware that query strings will NOT be included in %{REQUEST_URI}, and can only be tested using a RewriteCond examining %{QUERY_STRING} or %{THE_REQUEST}.
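To illustrate the point about query strings, here is a small Python sketch (not Apache code) that splits a request target the way the two variables see it. The helper name split_request is made up for this illustration; urlsplit is the standard-library URL parser.

```python
from urllib.parse import urlsplit

def split_request(request_target):
    """Return (path, query): roughly what REQUEST_URI and QUERY_STRING hold."""
    parts = urlsplit(request_target)
    return parts.path, parts.query

path, query = split_request("/somefolder/script.cgi?product-name.type")
print(path)   # the only part a %{REQUEST_URI} condition can test
print(query)  # visible only via %{QUERY_STRING} or %{THE_REQUEST}
```

A loop-prevention condition written against %{REQUEST_URI} therefore has to match on the script's path alone, never on anything after the question mark.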
Jim
Thanks for helping me out.
I used the following: ^([^/.]+)/?$
The RewriteCond is like totally obscure for me right now.
I think the problem is that I also pass a dot '.' in my <value> string. I'm trying to figure out a way to explicitly allow this symbol in the syntax.
Also about the trailing slash I didn't quite get it - it does work either way, so why change it? What's the logic behind the 301 redirect?
Please excuse me if my questions are totally off beat, but this is REALLY new material for me.
Thank you again for all your help!
Major search engines run de-duplication processes in their back-ends, but do you want to depend on them to "pick" the right URL? What if they screw up and pick the wrong one -- the one with the currently-worst ranking? What if they set the threshold incorrectly, and decide that just two URLs with the same content are enough to mark the domain as "spammy"?
I don't know about you, but I never "rely on the kindness of strangers" when my sites' ranking/revenue is at stake. It is an almost-trivial matter to pick either xyz or xyz/ as the preferred "canonical" URL, link only to that one, and redirect the non-canonical variant to the canonical URL.
If you pass a dot in the URL, which then gets rewritten into the "f=" query for use by your script, then you must use a RewriteCond here. You will also have problems because you may have to exclude "well-known files" and "non-page" (e.g. media file) URLs from being rewritten as well, because their URLs contain dots: robots.txt, sitemap.xml, labels.rdf, favicon.ico, w3c/p3p.xml, etc.
RewriteCond $1 !^robots\.txt$
RewriteCond $1 !\.(gif|jpe?g|ico|css|js|pdf)$
RewriteCond $1 !^script\.cgi$
RewriteRule ^([^/]+)/?$ script.cgi?f=$1 [L]
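As a rough way to see what this rule set lets through, the following Python sketch mimics its logic. It is only an approximation: Python's re module stands in for Apache's regex engine, the function name rewrite is hypothetical, and the input is assumed to be the URL-path as seen from the .htaccess file's own directory. It follows mod_rewrite's processing order, in which the RewriteRule pattern is matched first and the RewriteConds are then tested against the captured $1.

```python
import re

# Pattern and exclusions copied from the rule and conditions above.
RULE = re.compile(r"^([^/]+)/?$")
EXCLUDE = [
    re.compile(r"^robots\.txt$"),
    re.compile(r"\.(gif|jpe?g|ico|css|js|pdf)$"),
    re.compile(r"^script\.cgi$"),
]

def rewrite(local_path):
    """Return the rewritten target, or None if the request is left alone."""
    m = RULE.search(local_path)
    if m is None:
        return None            # rule pattern did not match
    captured = m.group(1)      # this is $1
    if any(p.search(captured) for p in EXCLUDE):
        return None            # a negated RewriteCond failed
    return "script.cgi?f=" + captured

for candidate in ["product-name.type/", "product-name.type", "robots.txt", "script.cgi"]:
    print(candidate, "->", rewrite(candidate))
```

The script.cgi exclusion is what stops the rewritten request from being rewritten again on the next pass.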
Jim
[edited by: jdMorgan at 5:43 am (utc) on Feb. 20, 2009]
So for example in the RewriteCond I set rules to match all the possible values? Like:
RewriteCond =[dir.name1]$
RewriteCond =[dir.name2]$
etc.
Another question, partially related to the trailing slash as well.
This whole thing is located within a subfolder on my domain, e.g. example.com/somefolder/script.cgi. Should I place the htaccess in that subfolder, or use the top-level htaccess to handle things globally? In the latter case I could use just one 301 rule for the whole domain, so that all URIs that do not end with a slash would be redirected to ones that do, e.g.:
domain.com -> domain.com/
domain.com/subfolder -> domain.com/subfolder/
domain.com/subfolder2 -> domain.com/subfolder2/
domain.com/subfolder3/subfolder4 -> domain.com/subfolder3/subfolder4/
Again, sorry if all this seems off beat. I'm not a programmer and I know just some very basic stuff, so it's quite a challenge for me to get this working. I'm starting to get this slowly, but considering I got into the whole matter just a few days ago, I'll probably ask a few more dumb questions...
Thanks again for your time and help!
Be careful to consider all possibilities when thinking up URL-handling strategies. For example, you very likely do not want to redirect "robots.txt" to "robots.txt/" just because it does not end with a slash!
(Hint: it does not end with a slash, but it *does* contain a period in the final URL-path-part. Use both factors to decide if you want to redirect or not.)
Always look at these ideas from multiple angles, and ask yourself, "What if...?"
Jim
# Redirect to add a missing slash only if no filetype in final URL path-part
RewriteRule ^(([^/]+/)*[^/.]+)$ http://www.example.com/$1/ [R=301,L]
Note how the second subpattern here matches the one you are already using for the internal rewrite rule discussed above. After adding this redirect, change that internal rewrite rule by removing the question mark after the final slash in its pattern, which makes the slash non-optional, and by removing the RewriteConds, since they are no longer needed:
RewriteRule ^([^/]+)/$ script.cgi?f=$1 [L]
That can be problematic, so you might actually want to consider either NOT using trailing slashes, or perhaps even 'tagging' all URL-path names that will become "f=" values by putting a unique directory path in front of them. This largely eliminates the possibility of "collisions" between real (sub)directory paths, and the virtual ones used to feed "f=" into your script. For example, instead of using a URL of example.com/<f=value_here>, use example.com/products/<f=value_here>. The "products" path can now be used to unambiguously determine that the "subdirectory" that follows it is an "f=" value, and not a real subdirectory.
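The add-a-slash redirect pattern and the slash-required rewrite pattern given above can be tried out with a short Python sketch. This is only an approximation of what Apache does: Python's re stands in for the server's regex engine, and add_slash is a name invented for this example.

```python
import re

# Patterns copied from the redirect rule and the revised rewrite rule.
REDIRECT = re.compile(r"^(([^/]+/)*[^/.]+)$")   # no dot and no slash at the end
REWRITE = re.compile(r"^([^/]+)/$")             # trailing slash now required

def add_slash(local_path):
    """Return the redirect target, or None if the path is not redirected."""
    m = REDIRECT.search(local_path)
    if m is None:
        return None
    return "http://www.example.com/" + m.group(1) + "/"

for candidate in ["subfolder", "subfolder3/subfolder4", "robots.txt", "subfolder/"]:
    print(candidate, "->", add_slash(candidate))
```

Note that a final path-part containing a dot is never redirected by this pattern, because [^/.]+ cannot match across the dot.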
Jim
When I try it without a dot it works both ways - with and without a trailing slash, and when it's without it does redirect to the trailing slash version.
However, when I use a dot, it only works if I go to the trailing-slash version. Otherwise it gives a 404 error, as if it were actually looking for such a file/path on the server...
I suggest you remove the trailing slashes from the URLs on your pages which are to be rewritten to your script, and then replace both rules with:
# If requested "products" URL with trailing slash does not resolve to an existing subdirectory
# of this /products directory, externally redirect to remove the trailing slash
RewriteCond %{DOCUMENT_ROOT}/products/$1/ !-d
RewriteRule ^([^/]+)/$ http://www.example.com/products/$1 [R=301,L]
#
# Internally rewrite slashless "products" URLs which do not resolve to existing files to script.cgi
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)$ /products/script.cgi?f=$1 [L]
I don't normally like to use file- or directory-exists checks, but since this code is only applied in one subdirectory, the performance impact shouldn't be so bad, even on a busy server.
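For anyone who wants to reason through these two rules before deploying them, here is a hedged Python sketch of their decision logic. It is a simplification, not a model of Apache itself: the function name handle and the existing_dirs and existing_files sets are hypothetical stand-ins for the -d and -f filesystem checks, and the input is assumed to be the URL-path as seen from /products/.htaccess.

```python
import re

SLASHED = re.compile(r"^([^/]+)/$")    # pattern of the redirect rule
SLASHLESS = re.compile(r"^([^/]+)$")   # pattern of the internal rewrite rule

def handle(local_path, existing_dirs, existing_files):
    """Return (action, target) for a path relative to /products/."""
    m = SLASHED.search(local_path)
    if m and m.group(1) not in existing_dirs:        # stands in for !-d
        return ("redirect", "http://www.example.com/products/" + m.group(1))
    m = SLASHLESS.search(local_path)
    if m and local_path not in existing_files:       # stands in for !-f
        return ("rewrite", "/products/script.cgi?f=" + m.group(1))
    return ("pass", local_path)

files = {"script.cgi"}
print(handle("widget.blue/", set(), files))
print(handle("widget.blue", set(), files))
print(handle("script.cgi", set(), files))
```

Because script.cgi exists as a file, the rewritten request falls through on the next pass, so no separate loop-prevention condition is needed in this sketch.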
Jim
[edit] Corrected as noted below. [/edit]
[edited by: jdMorgan at 12:10 am (utc) on Feb. 26, 2009]
It all works perfectly nice now!
I even understand the logic of the syntax now for the most part; just the ^([^/]+)/$ and ^([^/]+)$ are still kind of obscure. I mean, I get what the symbols mean, but I still need some more reading to actually understand what the whole expression means. Maybe I need to practice with a text file or something.
Anyways, thanks again buddy! You've been a true blessing!
P.S. Sorry for the server error log - I'll know better next time.
Or equivalently, "Starting at the beginning of the requested URL-path, match all characters up to but not including the first slash found, save that matched sub-string in $1, then require a final slash."
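The same reading can be written out token by token. The sketch below uses Python's verbose regex mode purely as an annotation aid; Apache does not take patterns in this form, and the helper name first_segment is made up for the example.

```python
import re

PATTERN = re.compile(r"""
    ^          # start of the URL-path as the rule sees it
    ( [^/]+ )  # one or more characters that are not a slash, saved as $1
    /          # a literal slash
    $          # end of the URL-path
""", re.VERBOSE)

def first_segment(local_path):
    """Return what would be saved in $1, or None if the pattern fails."""
    m = PATTERN.search(local_path)
    return m.group(1) if m else None

print(first_segment("someproduct/"))
print(first_segment("someproduct"))
print(first_segment("products/someproduct/"))
```

Only the first call matches: the second input lacks the required final slash, and the third has more text after its first slash.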
Jim
The URL-path would be http://www.example.com/products/someproduct
So why does it match only the "someproduct" part? That would make more sense if the rule said to start at the END. I mean, there are characters other than a slash in the part before "someproduct"... so it should match only "http:".
Another thing.
In "RewriteCond %{DOCUMENT_ROOT}/products/$1/ !-d" why is there a "$1" ? There hasn't been anything saved in "$1" yet as this is the first line, so why is "$1" there? Or does it mean "if the requested URL that has "products" in it, has a part after "products" eg. the $1, see if that part, $1, matches a directory, if it doesn't proceed further". So in this case "$1" instead of being a previously saved string is more of a variable that would later be checked if it matches a directory?
URL-paths "seen" by RewriteRules in an .htaccess file are localized to that .htaccess file's directory. Therefore, the entire URL-path seen by a RewriteRule in /products/.htaccess when "/products/someproduct/" is requested by the client will be "someproduct/" -- again, this is a documented behavior.
This is the reason that RewriteRule patterns in code for use in httpd.conf or conf.d --at the server config level, that is-- must include the leading slash, whereas in "/.htaccess" the rule pattern cannot include it... That's because the URL-path has been localized for use in /.htaccess, and the path to the current htaccess file's directory *is* "/". So the leading URL-path slash is removed by this localization process.
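The localization described above can be pictured with a few lines of Python. This is only a sketch of the idea, assuming the per-directory prefix is simply stripped from the front of the URL-path; the function name localize is hypothetical and is not part of Apache.

```python
def localize(url_path, htaccess_dir):
    """Strip the .htaccess directory's URL prefix from the requested path."""
    prefix = htaccess_dir.rstrip("/") + "/"
    if not url_path.startswith(prefix):
        raise ValueError("path is outside this .htaccess file's directory")
    return url_path[len(prefix):]

# What a RewriteRule would be matched against in each location:
print(localize("/products/someproduct/", "/products"))  # in /products/.htaccess
print(localize("/products/someproduct/", "/"))          # in /.htaccess
```

In both cases the result has no leading slash, which is why a pattern starting with ^/ cannot match in .htaccess, while at the server config level the full path, leading slash included, is what the rule sees.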
Jim