
Apache Web Server Forum

    
mod rewrite help
westman7




msg:3850820
 6:31 pm on Feb 16, 2009 (gmt 0)

Hello,

I'm trying to create an htaccess file using mod_rewrite for the following scenario:

Visitor clicks on:
http://www.example.com/somefolder/product-name.type/

Note: the '-' and '.' are important - I do have them in the url.

Once he/she clicks I want a script that is in the "somefolder" folder to get the product-name.type and generate a page:

http://www.example.com/somefolder/script.cgi?product-name.type

Note: you don't need any "id=" parameter to invoke the script (I could change that of course but I suppose it's irrelevant), just the "product-name.type" part.

So can anyone give me an example of a mod_rewrite syntax to use in this scenario...

I've toyed around for some time but I just can't get it to work.

[edited by: jdMorgan at 3:02 pm (utc) on Feb. 17, 2009]
[edit reason] example.com [/edit]

 

jdMorgan




msg:3851500
 3:05 pm on Feb 17, 2009 (gmt 0)

Hi westman7, and welcome to WebmasterWorld!

Please see this thread in our forum library to get started: Changing Dynamic URLs to Static URLs [webmasterworld.com]

Our Apache Forum Charter [webmasterworld.com] also provides links to useful documents and information on how to get the most from this forum.

Jim

westman7




msg:3851826
 9:24 pm on Feb 17, 2009 (gmt 0)

Here's what I've done so far:

Options +FollowSymLinks
RewriteEngine on

RewriteRule ^([^/]+)/?$ script.cgi?f=$1 [L]

I've set the script to print the parameter it gets, and for some odd reason it prints out its own name, 'script.cgi'. I really have no idea what I'm doing wrong in the syntax.

jdMorgan




msg:3851926
 11:46 pm on Feb 17, 2009 (gmt 0)

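Your regular-expression pattern isn't specific enough. An initial request gets rewritten to /script.cgi, and then that request for /script.cgi matches the same pattern and gets rewritten to /script.cgi again, this time with "script.cgi" captured in $1 (rewrites in .htaccess are recursive). That is why your script prints its own name.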

The easiest solution is either to make the pattern more specific -- for example "^([^/.]+)/?$" -- or to explicitly exclude /script.cgi requests from being re-rewritten by using a RewriteCond on the rule.
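
As a rough sketch, the two options could look like this in the .htaccess file from the post above. Only one of them should be active at a time, and the "f=" parameter name is taken from that post.

```apache
Options +FollowSymLinks
RewriteEngine on

# Option 1: forbid dots in the match, so "script.cgi" can never match.
# (This also rejects values that contain a dot, such as "product-name.type".)
# RewriteRule ^([^/.]+)/?$ script.cgi?f=$1 [L]

# Option 2: keep the broad pattern, but skip the rule when the captured
# path is the script itself, so the second pass leaves it alone.
RewriteCond $1 !^script\.cgi$
RewriteRule ^([^/]+)/?$ script.cgi?f=$1 [L]
```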

Note also that your pattern accepts a request_URI with or without a trailing slash. I suggest that you pick one or the other form, externally 301-redirect the non-preferred form to the preferred form, internally rewrite only the preferred form to your script, and so avoid duplicate content issues.

Jim

westman7




msg:3853432
 9:47 pm on Feb 19, 2009 (gmt 0)

Should there be any change to the syntax if instead of script.cgi?f=$1 I make it work like script.cgi?$1 ... because it doesn't work that way and I don't know if the reason is in the htaccess file.

jdMorgan




msg:3853482
 10:35 pm on Feb 19, 2009 (gmt 0)

Whatever your script requires is what you should pass to it. If it wants a name/value pair of "f=<value>" then that's what you should use when rewriting. There's no magic here -- and the behavior will be predictable if your syntax is correct.

You didn't reply to the issue that I wrote about above. If you are using a RewriteCond testing %{REQUEST_URI} to prevent looping, then be aware that query strings will NOT be included in %{REQUEST_URI}, and can only be tested using a RewriteCond examining %{QUERY_STRING} or %{THE_REQUEST}.
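
A minimal sketch of the difference, assuming the script lives at /somefolder/script.cgi as in the first post and takes an "f=" parameter. Either condition alone would be enough to stop the loop; both are shown only to illustrate which variable sees which part of the request.

```apache
# %{REQUEST_URI} holds only the URL-path, e.g. "/somefolder/script.cgi".
# The query string is never part of it.
RewriteCond %{REQUEST_URI} !^/somefolder/script\.cgi$
# %{QUERY_STRING} holds only what follows the "?", e.g. "f=product-name.type".
RewriteCond %{QUERY_STRING} !^f=
RewriteRule ^([^/]+)/?$ script.cgi?f=$1 [L]
```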

Jim

westman7




msg:3853553
 12:05 am on Feb 20, 2009 (gmt 0)

Hello Jim,

Thanks for helping me out.

I used the following: ^([^/.]+)/?$

The RewriteCond is like totally obscure for me right now.

I think the problem is that I also pass a dot '.' in my <value> string. I'm trying to figure out a way to explicitly allow this symbol in the syntax.

Also about the trailing slash I didn't quite get it - it does work either way, so why change it? What's the logic behind the 301 redirect?

Please excuse me if my questions are totally off beat, but this is REALLY new material for me.

Thank you again for all your help!

jdMorgan




msg:3853704
 5:40 am on Feb 20, 2009 (gmt 0)

The *problem* is that it would work either way. The result is that the same content would be accessible from the Web at two different URLs -- xyz/ and xyz -- both returning the content generated by a script call for "f=xyz". Returning the same content in response to requests for two different URLs is what we call "duplicate content", and if you or anyone else links to both variants, it causes those two URLs to "compete" for search ranking.

Major search engines run de-duplication processes in their back-ends, but do you want to depend on them to "pick" the right URL? What if they screw up and pick the wrong one -- the one with the currently-worst ranking? What if they set the threshold incorrectly, and decide that only two URLs with the same content qualify to mark the domain as "spammy?"

I don't know about you, but I never "rely on the kindness of strangers" when my sites' ranking/revenue is at stake. It is an almost-trivial matter to pick either xyz or xyz/ as the preferred "canonical" URL, link only to that one, and redirect the non-canonical variant to the canonical URL.

If you pass a dot in the URL, which then gets rewritten into the "f=" query for use by your script, then you must use a RewriteCond here. You will also have problems because you may have to exclude "well-known files" and "non-page" (e.g. media file) URLs from being rewritten as well, because their URLs contain dots: robots.txt, sitemap.xml, labels.rdf, favicon.ico, w3c/p3p.xml, etc.

RewriteCond $1 !^robots\.txt$
RewriteCond $1 !\.(gif|jpe?g|ico|css|js|pdf)$
RewriteCond $1 !^script\.cgi$
RewriteRule ^([^/]+)/?$ script.cgi?f=$1 [L]

Jim

[edited by: jdMorgan at 5:43 am (utc) on Feb. 20, 2009]

westman7




msg:3854008
 3:39 pm on Feb 20, 2009 (gmt 0)

I was thinking, if I originally know exactly what kind of values can be called (they are static) wouldn't it be better to just set those in the htaccess file instead of trying to come up with every possible scenario as this seems more prone to bugs?

So for example in the RewriteCond I set rules to match all the possible values? Like:

RewriteCond =[dir.name1]$
RewriteCond =[dir.name2]$
etc.

Another question, partially related to the trailing slash as well.

This whole thing is located within a subfolder on my domain, eg. example.com/somefolder/script.cgi should I place the htaccess in that subfolder or use the top level htaccess to handle things globally? In this scenario I could use just one 301 rule for the whole domain - eg. all URIs that do not end with a slash would be redirected to ones that end up with a slash, eg:

domain.com -> domain.com/
domain.com/subfolder -> domain.com/subfolder/
domain.com/subfolder2 -> domain.com/subfolder2/
domain.com/subfolder3/subfolder4 -> domain.com/subfolder3/subfolder4/

Again, sorry if all this seems off beat. I'm not a programmer and I know just some very basic stuff, so it's quite a challenge for me to get this working. I'm starting to get this slowly, but considering I got into the whole matter just a few days ago, I'll probably ask a few more dumb questions...

Thanks again for your time and help!

jdMorgan




msg:3854270
 8:37 pm on Feb 20, 2009 (gmt 0)

The answer to both questions is that it is largely a matter of style -- and to a somewhat lesser extent, of code efficiency versus ease of administration. Consider what happens if your site is a huge success, and the number of subdirectories doubles, or quadruples, or increases by a factor of 10... or more.
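
For comparison, the "list every value" approach from the question could be sketched as a single rule with alternation. The names dir.name1 and dir.name2 are the placeholders from the question, not real values.

```apache
# Rewrite only the listed names; every other request is left alone,
# so script.cgi, robots.txt and media files need no exclusions.
RewriteRule ^(dir\.name1|dir\.name2)/?$ script.cgi?f=$1 [L]
```

Every new value means another edit to this file, which is the administration cost to weigh against the simpler matching.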

Be careful to consider all possibilities when thinking up URL-handling strategies. For example, you very likely do not want to redirect "robots.txt" to "robots.txt/" just because it does not end with a slash!
(Hint: it does not end with a slash, but it *does* contain a period in the final URL-path-part. Use both factors to decide if you want to redirect or not.)

Always look at these ideas from multiple angles, and ask yourself, "What if...?"

Jim

westman7




msg:3855288
 4:13 pm on Feb 22, 2009 (gmt 0)

OK, I've decided to go with a separate htaccess for the subfolder, and a general rule instead of listing each value separately. But I just can't find a way to create a rule for the dot. To be frank, I can't even come up with the logic for it. Can you advise?

jdMorgan




msg:3855394
 9:04 pm on Feb 22, 2009 (gmt 0)


# Redirect to add a missing slash only if no filetype in final URL path-part
RewriteRule ^(([^/]+/)*[^/.]+)$ http://www.example.com/$1/ [R=301,L]

The pattern reads "Match anything but a slash followed by a slash, and as many of those subpatterns as possible (zero or more), followed by anything except slashes or periods." If there is a match, we take the whole matched URL-path, prefix it with the protocol and domain, add a trailing slash, and redirect.

Note how the second subpattern here matches the one you are already using for the internal rewrite rule discussed above. After adding this redirect, change that internal rewrite rule by removing the question mark after the final slash, which makes the slash non-optional, and by removing the RewriteConds, since they are no longer needed:

RewriteRule ^([^/]+)/$ script.cgi?f=$1 [L]

However, if you add any subdirectories at all, then they will need to be excluded from this rule, and your script/CMS must not allow real subdirectory names to be used as path names that will later become "f=" values.

That can be problematic, so you might actually want to consider either NOT using trailing slashes, or perhaps even 'tagging' all URL-path names that will become "f=" values by putting a unique directory path in front of them. This largely eliminates the possibility of "collisions" between real (sub)directory paths, and the virtual ones used to feed "f=" into your script. For example, instead of using a URL of example.com/<f=value_here>, use example.com/products/<f=value_here>. The "products" path can now be used to unambiguously determine that the "subdirectory" that follows it is an "f=" value, and not a real subdirectory.
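
A sketch of the "tagged" variant, under two assumptions that are not stated above: the rule lives in the site root .htaccess (with no rewrite rules of its own in /products/), and the script sits at /products/script.cgi.

```apache
# Only URLs under /products/ are treated as "f=" values, so real
# directories elsewhere on the site can never collide with them.
RewriteRule ^products/([^/]+)/$ /products/script.cgi?f=$1 [L]
```

The rewritten path "products/script.cgi" has no trailing slash, so it does not match the pattern again.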

Jim

westman7




msg:3855481
 11:33 pm on Feb 22, 2009 (gmt 0)

Thanks Jim!

Yes, that's exactly how I'm doing it - with a "products" subdirectory, so these rules will apply only there (I have a separate htaccess there with these rules).

westman7




msg:3857631
 6:22 pm on Feb 25, 2009 (gmt 0)

Sorry to bother you again, but I can't figure out why it all works when I call it like example.com/products/someproduct/, i.e. with a trailing slash, but not when I leave the slash off.

jdMorgan




msg:3857637
 6:27 pm on Feb 25, 2009 (gmt 0)

Did you add the additional rule I suggested two posts up?

Jim

westman7




msg:3857754
 8:24 pm on Feb 25, 2009 (gmt 0)

Yes.

When I try it without a dot it works both ways - with and without a trailing slash, and when it's without it does redirect to the trailing slash version.

However, when I use a dot, it only works if I go to the trailing slash version. Otherwise it gives a 404 error, as if it were actually looking for such a file/path on the server...

jdMorgan




msg:3857776
 8:52 pm on Feb 25, 2009 (gmt 0)

Yeah, there's a logical inconsistency with putting a dot before a slash in a URL, trying to make this code work, and canonicalize the products URLs as well.
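
To see where the inconsistency comes from, here is a trace of the two earlier rules as they would presumably sit in /products/.htaccess. The /products/ prefix in the redirect target is an assumption about how the rule was adapted for the subdirectory, and "some.product" is a made-up value.

```apache
# Request /products/someproduct    -> seen here as "someproduct"
#   rule 1 matches (no dot, no slash)  -> 301 to /products/someproduct/
# Request /products/someproduct/   -> seen here as "someproduct/"
#   rule 2 matches                     -> script.cgi?f=someproduct
# Request /products/some.product/  -> seen here as "some.product/"
#   rule 2 matches                     -> script.cgi?f=some.product
# Request /products/some.product   -> seen here as "some.product"
#   rule 1 fails (dot in final part), rule 2 fails (no slash) -> 404
RewriteRule ^(([^/]+/)*[^/.]+)$ http://www.example.com/products/$1/ [R=301,L]
RewriteRule ^([^/]+)/$ script.cgi?f=$1 [L]
```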

I suggest you remove the trailing slashes from the URLs on your pages which are to be rewritten to your script, and then replace both rules with:

# If requested "products" URL with trailing slash does not resolve to an existing subdirectory
# of this /products directory, externally redirect to remove the trailing slash
RewriteCond %{DOCUMENT_ROOT}/products/$1/ !-d
RewriteRule ^([^/]+)/$ http://www.example.com/products/$1 [R=301,L]
#
# Internally rewrite slashless "products" URLs which do not resolve to existing files to script.cgi
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([^/]+)$ /products/script.cgi?f=$1 [L]

This will redirect any trailing-slash "products" URL that is not an existing subdirectory to the non-trailing-slash URL, to fix up existing bookmarks and links out on the Web, and then rewrite any non-trailing-slash "products" URL that does not resolve to a physical file to your script.

I don't normally like to use file- or directory-exists checks, but since this code is only applied in one subdirectory, the performance impact shouldn't be so bad, even on a busy server.

Jim

[edit] Corrected as noted below. [/edit]

[edited by: jdMorgan at 12:10 am (utc) on Feb. 26, 2009]

westman7




msg:3857831
 10:07 pm on Feb 25, 2009 (gmt 0)

Hmm, it gives a 500 Internal Error and I'm sure it's not the script itself...

g1smd




msg:3857909
 11:57 pm on Feb 25, 2009 (gmt 0)


..

jdMorgan




msg:3857922
 12:12 am on Feb 26, 2009 (gmt 0)

Sorry, the first line in the second rule should be a RewriteCond, not a RewriteRule. See correction to code in the post above.

When you get a 500 Server error, please report the relevant contents of your server error log.

Jim

westman7




msg:3858167
 1:32 pm on Feb 26, 2009 (gmt 0)

Thank you Jim!

It all works perfectly nice now!

I even understand the logic of the syntax now for the most part; just the ^([^/]+)/$ and ^([^/]+)$ are still kind of obscure. I mean I get what the symbols mean, but I still need some more reading to actually understand what the whole expression means. Maybe I need to practice with a text file or something.

Anyways, thanks again buddy! You've been a true blessing!

P.S. Sorry for the server error log - I'll know better next time.

jdMorgan




msg:3858176
 1:44 pm on Feb 26, 2009 (gmt 0)

The pattern "^([^/]+)/$" means "Start at the beginning of the requested URL-path and match one or more characters that are not slashes, save that matched sub-string in $1, and require a slash at the end."

Or equivalently, "Starting at the beginning of the requested URL-path, match all characters up to but not including the first slash found, save that matched sub-string in $1, then require a final slash."
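
The same reading, piece by piece, as comments on the rule. The sample paths in the last three comment lines are illustrative only.

```apache
# ^       anchor at the start of the URL-path seen by this rule
# (       start capturing into $1
# [^/]+   one or more characters that are not a slash
# )       stop capturing
# /       require a literal slash
# $       anchor at the end of the URL-path
#
# "someproduct/"  matches, $1 = "someproduct"
# "someproduct"   does not match (no final slash)
# "a/b/"          does not match ($1 cannot contain a slash)
RewriteRule ^([^/]+)/$ script.cgi?f=$1 [L]
```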

Jim

westman7




msg:3858534
 7:28 pm on Feb 26, 2009 (gmt 0)

But there is one thing I don't get about "^([^/]+)/$"...

The URL-path would be http://www.example.com/products/someproduct

So why does it match only the "someproduct" part? That would make more sense to me if the rule said to start at the END. There are non-slash characters in the part before "someproduct", so shouldn't it match only "http:"?

Another thing.

In "RewriteCond %{DOCUMENT_ROOT}/products/$1/ !-d" why is there a "$1" ? There hasn't been anything saved in "$1" yet as this is the first line, so why is "$1" there? Or does it mean "if the requested URL that has "products" in it, has a part after "products" eg. the $1, see if that part, $1, matches a directory, if it doesn't proceed further". So in this case "$1" instead of being a previously saved string is more of a variable that would later be checked if it matches a directory?

jdMorgan




msg:3858547
 7:41 pm on Feb 26, 2009 (gmt 0)

RewriteRule patterns are matched before any RewriteConds are evaluated, and therefore, $1 *has* been populated, and can be used by any of the rule's RewriteConds... See the Apache mod_rewrite documentation "Rule Processing", please.

URL-paths "seen" by RewriteRules in an .htaccess file are localized to that .htaccess file's directory. Therefore, the entire URL-path seen by a RewriteRule in /products/.htaccess when "/products/someproduct/" is requested by the client will be "someproduct/" -- again, this is a documented behavior.

This is the reason that RewriteRule patterns in code for use in httpd.conf or conf.d --at the server config level, that is-- must include the leading slash, whereas in "/.htaccess" the rule pattern cannot include it... That's because the URL-path has been localized for use in /.htaccess, and the path to the current htaccess file's directory *is* "/". So the leading URL-path slash is removed by this localization process.
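
A small sketch of the same redirect written for each context. The file names old.html and new.html are hypothetical, and the two rules are alternatives for different files, not meant to be used together.

```apache
# In httpd.conf (server or <VirtualHost> context): the pattern sees the
# full URL-path, including the leading slash.
RewriteRule ^/old\.html$ http://www.example.com/new.html [R=301,L]

# In /.htaccess (per-directory context): the directory prefix "/" has
# been stripped, so the same request is seen as "old.html".
RewriteRule ^old\.html$ http://www.example.com/new.html [R=301,L]
```

By the same localization, a rule in /products/.htaccess sees a request for /products/someproduct/ as just "someproduct/".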

Jim

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved