rewrite condition based on first instance in string

         

superbluecouch

5:50 am on May 23, 2007 (gmt 0)

10+ Year Member



I'm trying to create a rewrite that will allow an external URL to be passed in line with my domain but won't affect internal links to things like images. Here is what I have:

<IfModule mod_rewrite.c>
Options FollowSymLinks
RewriteEngine on
RewriteCond %{REQUEST_URI} !(^/path/.*(gif|php|html|css|jpg|png|js|asf|avi|wmv|swf|xsl|jar)) [NC]
RewriteRule ^path/(.+) /index.php?q=$1 [L]
</IfModule>
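The condition's pattern can be tried outside Apache to see which URL-paths it excludes. This is a minimal Python sketch, assuming Python's `re` behaves like Apache's PCRE for this simple pattern, that `[NC]` corresponds to `re.IGNORECASE`, and that the forum's broken bars stand for `|`. `excluded_by_original` is a hypothetical helper name.

```python
import re

# The original CondPattern (without the leading "!"), written with solid pipes.
# There is no "$" end-anchor, so a listed filetype anywhere after /path/ counts.
ORIGINAL = re.compile(
    r"^/path/.*(gif|php|html|css|jpg|png|js|asf|avi|wmv|swf|xsl|jar)",
    re.IGNORECASE,  # stands in for the [NC] flag
)


def excluded_by_original(uri: str) -> bool:
    """True if the pattern matches, so the negated condition fails and no rewrite happens."""
    return ORIGINAL.search(uri) is not None


for uri in ["/path/foo.jpg",
            "/path/webmasterworld.com",
            "/path/webmasterworld.com/foo.jpg"]:
    print(uri, "-> excluded" if excluded_by_original(uri) else "-> rewritten")
```

Because the pattern is not end-anchored, a hypothetical `/path/jsfiddle.net` would also be excluded, simply because it contains "js".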

This works correctly for URLs like these:
http://example.com/path/webmasterworld.com
or
http://example.com/path/foo.jpg

But I also want it to allow a URL like:
http://example.com/path/webmasterworld.com/foo.jpg
As it stands, that URL is caught by the condition.
I was thinking that I could look at only the first period in the URI and then check whether what follows it is a file extension or a domain extension, but I'm not sure how to write that.
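The "first period" idea can be sketched directly: take the label that follows the first period after `/path/` and check it against the list of local filetypes. This is a minimal Python sketch, assuming the prefix is always `/path/` and that a label ends at the next `/` or `.`; `first_extension`, `is_local_file`, and `LOCAL_FILETYPES` are hypothetical names, with the list copied from the rule above.

```python
from typing import Optional

# Hypothetical set of local filetypes, copied from the RewriteCond above.
LOCAL_FILETYPES = {"gif", "php", "html", "css", "jpg", "png", "js",
                   "asf", "avi", "wmv", "swf", "xsl", "jar"}


def first_extension(uri: str, prefix: str = "/path/") -> Optional[str]:
    """Return the label after the first period that follows prefix.

    The label ends at the next "/" or "." (or at the end of the string).
    Returns None if uri does not start with prefix or has no period there.
    """
    if not uri.startswith(prefix):
        return None
    _, dot, tail = uri[len(prefix):].partition(".")
    if not dot:
        return None
    end = len(tail)
    for sep in "/.":
        pos = tail.find(sep)
        if pos != -1:
            end = min(end, pos)
    return tail[:end]


def is_local_file(uri: str) -> bool:
    """True if the first 'extension' is one of the local filetypes (case-insensitive)."""
    ext = first_extension(uri)
    return ext is not None and ext.lower() in LOCAL_FILETYPES


for uri in ["/path/foo.jpg",
            "/path/webmasterworld.com",
            "/path/webmasterworld.com/foo.jpg"]:
    print(uri, first_extension(uri), is_local_file(uri))
```

One limitation: a local file under a dotted directory name (a hypothetical `/path/v1.2/foo.jpg`) yields the label "2" and would not be treated as local.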

Thanks in advance.

jdMorgan

2:27 pm on May 23, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think you'll find "end-anchoring" and negative-match regular-expression patterns to be a salvation here:

This modified RewriteCond excludes only URL-paths containing a single period preceding the filetype. URL-paths that include domain names, and those that do not end with one of the specified filetypes, will not be excluded, and will therefore be rewritten and passed to your script.


RewriteCond %{REQUEST_URI} !^/path/[^.]+\.(gif|php|html|css|jpg|png|js|asf|avi|wmv|swf|xsl|jar)$ [NC]
RewriteRule ^path/(.+)$ /index.php?q=$1 [L]

Note that using a negative-match pattern here both helps solve the problem, and is MUCH faster to process than the ambiguous and greedy ".*" pattern -- Avoid using the ".*" pattern wherever possible in patterns which are not fully-anchored, and avoid using multiple ".*" subpatterns in any pattern at almost any cost.
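The difference between the negated character class `[^.]+` and `.*` shows up directly on the problem URL. This is a minimal Python sketch, using `re` as a stand-in for Apache's PCRE and `re.IGNORECASE` for `[NC]`; the `.*` variant is a hypothetical contrast for illustration, not one of the rules posted in the thread.

```python
import re

FILETYPES = "gif|php|html|css|jpg|png|js|asf|avi|wmv|swf|xsl|jar"

# Jim's pattern: only non-period characters may precede the single period.
NEGATED_CLASS = re.compile(rf"^/path/[^.]+\.({FILETYPES})$", re.IGNORECASE)
# Same shape with ".*", which is free to run across periods and slashes.
DOT_STAR = re.compile(rf"^/path/.*\.({FILETYPES})$", re.IGNORECASE)

uri = "/path/webmasterworld.com/foo.jpg"
print(bool(NEGATED_CLASS.search(uri)))  # False: "[^.]+" cannot get past the first period
print(bool(DOT_STAR.search(uri)))       # True: ".*" swallows "webmasterworld.com/foo"
print(bool(NEGATED_CLASS.search("/path/foo.jpg")))  # True: a plain local file is still excluded
```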

Jim

superbluecouch

5:10 pm on May 24, 2007 (gmt 0)

10+ Year Member



Hey Jim, thanks for the help! That worked for me, but the problem I'm having is that I don't want to block any of the files belonging to the URL added in the path. I only want certain file types on my own server to be excluded. That way I don't need a long test to see whether the first extension is a file extension or a domain extension: I'll know exactly which extensions I'm using (mainly php, html, gif, and jpg) and can just test for those. Does that make sense?

jdMorgan

7:18 pm on May 24, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, if I'm understanding what you said, then try this:

RewriteCond %{REQUEST_URI} !^/path/([^.]+\.)+(gif|php|html|css|jpg|png|js|asf|avi|wmv|swf|xsl|jar)$ [NC]
RewriteRule ^path/(.+)$ /index.php?q=$1 [L]

The main functional difference between this and what you had initially is the "$" end-anchor on the pattern (the new pattern also requires a period immediately before the filetype). The end-anchor means that only the last part of the URL-path is tested for a filetype, and if it is one of the listed filetypes, the rule is bypassed.

However, the pattern is again optimized to allow matching in only a few passes, as opposed to the original pattern, which would have required many back-off-and-retries to get a match.

You can shorten the list of filetypes as you like.
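Because `[^.]+` also matches `/`, the repeated group `([^.]+\.)+` in this version can step across the period in a domain name, so it excludes `/path/webmasterworld.com/foo.jpg` as well as `/path/foo.jpg`. The sketch below, in Python with `re` standing in for Apache's PCRE, builds the pattern from a list so the filetype list is easy to shorten; `build_pattern` is a hypothetical helper, and the four-type list is only an example taken from the types named in the previous post.

```python
import re


def build_pattern(filetypes):
    """Build Jim's second CondPattern (without the "!") from a list of filetypes."""
    alternation = "|".join(re.escape(ft) for ft in filetypes)
    # ([^.]+\.)+ : one or more "run of non-periods followed by a period" segments.
    # [^.] also matches "/", so a segment can span a directory boundary.
    return re.compile(rf"^/path/([^.]+\.)+({alternation})$", re.IGNORECASE)


# Shortened list: the types named as the ones actually in use.
SHORT = build_pattern(["php", "html", "gif", "jpg"])

for uri in ["/path/foo.jpg",
            "/path/webmasterworld.com",
            "/path/webmasterworld.com/foo.jpg"]:
    print(uri, "-> excluded" if SHORT.search(uri) else "-> rewritten")
```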

Jim

[edited by: jdMorgan at 7:25 pm (utc) on May 24, 2007]

jdMorgan

7:25 pm on May 24, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



On review, I think the first code I posted does what you want. Did you completely flush your browser cache before testing?

Jim

superbluecouch

5:02 am on May 26, 2007 (gmt 0)

10+ Year Member



It could be that I've just confused myself but I did flush my cache and I was getting different results. Let me try one other way to explain it since I don't feel like I've been all that helpful up until this point.

1. mysite.com/path/example.jpg
This should NOT be redirected
This works with my condition but not yours

2. mysite.com/path/webmasterworld.com
This should be redirected
This works with both your condition and mine

3. mysite.com/path/webmasterworld.com/example.jpg
This should be redirected
This works with your condition but not mine

If possible, I'd also need to handle subdomains, so:

4. mysite.com/path/subdomain.webmasterworld.com
This should be redirected

5. mysite.com/path/subdomain.webmasterworld.com/example.jpg
This should be redirected
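The five cases above can be written down as a small table and checked against Jim's first rule. This is a minimal Python sketch under stated assumptions: it models only the request path (the `mysite.com` host is not part of `%{REQUEST_URI}`), assumes the rules sit in the document root's `.htaccess` so the RewriteRule sees the path without its leading slash, uses Python's `re` in place of Apache's PCRE, and ignores query strings and the extra internal pass Apache makes after a rewrite. The function name `rewrite` is hypothetical.

```python
import re
from typing import Optional

FILETYPES = "gif|php|html|css|jpg|png|js|asf|avi|wmv|swf|xsl|jar"
# RewriteCond pattern; the rule applies only when this does NOT match ("!").
COND = re.compile(rf"^/path/[^.]+\.({FILETYPES})$", re.IGNORECASE)
# RewriteRule pattern as seen in a root .htaccess (no leading slash).
RULE = re.compile(r"^path/(.+)$")


def rewrite(request_uri: str) -> Optional[str]:
    """Return the rewritten URL, or None if the rule is not applied."""
    m = RULE.match(request_uri.lstrip("/"))  # the RewriteRule pattern is tested first
    if m is None:
        return None
    if COND.search(request_uri):  # pattern matched, so "!" makes the condition fail
        return None
    return "/index.php?q=" + m.group(1)


# Expected results for the five cases (host name omitted).
CASES = {
    "/path/example.jpg": None,
    "/path/webmasterworld.com": "/index.php?q=webmasterworld.com",
    "/path/webmasterworld.com/example.jpg":
        "/index.php?q=webmasterworld.com/example.jpg",
    "/path/subdomain.webmasterworld.com":
        "/index.php?q=subdomain.webmasterworld.com",
    "/path/subdomain.webmasterworld.com/example.jpg":
        "/index.php?q=subdomain.webmasterworld.com/example.jpg",
}

for uri, expected in CASES.items():
    assert rewrite(uri) == expected, uri
    print(uri, "->", rewrite(uri) or "not rewritten")
```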

Is this helping at all or am I making things worse?
Thanks for your patience.

jdMorgan

5:00 pm on May 26, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You didn't say which of my conditions you were referring to, and you're using the term 'redirected' instead of 'rewritten' (they are NOT the same thing), but the first rule I posted:

RewriteCond %{REQUEST_URI} !^/path/[^.]+\.(gif|php|html|css|jpg|png|js|asf|avi|wmv|swf|xsl|jar)$ [NC]
RewriteRule ^path/(.+)$ /index.php?q=$1 [L]

meets all of your conditions:

> 1. mysite.com/path/example.jpg
> This should NOT be [rewritten]
Because the filetype ".jpg" is not preceded by any additional periods, this URL-path matches the pattern, the match is negated by "!", and therefore the rule *is not* applied, and no rewrite takes place.

> 2. mysite.com/path/webmasterworld.com
> This should be [rewritten]
Because there is no matching filetype, this URL-path does not match the pattern, the non-match is negated by "!", and therefore the rule *is* applied, and the URL is rewritten.

> 3. mysite.com/path/webmasterworld.com/example.jpg
> This should be [rewritten]
Because there is an additional period preceding the ".jpg" filetype, this URL path does not match the pattern, the non-match is negated by "!", and therefore, the rule *is* applied and the URL is rewritten.

By way of explanation, the pattern "!^/path/[^.]+\.(gif|php|html|css|jpg|png|js|asf|avi|wmv|swf|xsl|jar)$" means "match a URL-path starting with '/path/' followed by one or more characters not a period, followed by a period, and ending with one of 'gif', ..., or 'jar', and then invert the match/no-match result."

This pattern will not match any URL-path not starting with "/path/", containing more than one period, or not ending with one of the specified filetypes, and then the "!" NOT operator reverses this match/no-match result.
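The explanation above can be made concrete by writing the same CondPattern in Python's verbose regex syntax, one piece per line, with the "!" applied as a separate step. This is a sketch, assuming Python's `re` matches this simple pattern the way Apache's PCRE does; `condition_passes` is a hypothetical helper name.

```python
import re

# Jim's first CondPattern, spelled out piece by piece with re.VERBOSE comments.
PATTERN = re.compile(r"""
    ^/path/            # URL-path starts with "/path/"
    [^.]+              # one or more characters that are not a period
    \.                 # exactly one period
    (gif|php|html|css|jpg|png|js|asf|avi|wmv|swf|xsl|jar)   # one of the filetypes
    $                  # ...at the very end of the URL-path
""", re.VERBOSE | re.IGNORECASE)


def condition_passes(request_uri: str) -> bool:
    """Apply the leading "!": the condition passes when the pattern does NOT match."""
    return PATTERN.search(request_uri) is None


for uri in ["/path/example.jpg",
            "/path/webmasterworld.com",
            "/path/webmasterworld.com/example.jpg"]:
    print(uri, "condition passes" if condition_passes(uri) else "condition fails")
```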

So that's why I asked if you were completely flushing your browser cache before testing the new code: Doing a Reload or closing and restarting the browser is not sufficient -- You must go into the browser options and explicitly empty the cache.

Also, make sure you replace the broken pipe "¦" characters in the patterns with solid pipes "|" before use. Posting on this forum modifies the pipe characters, the change is easy to overlook, and I forgot to mention it above, so this may in fact be the cause of the problem. However, having already typed all of the above, I'll leave it, in case this is not the problem.
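When copying rules out of a forum post, the broken bars can be replaced mechanically before pasting into `.htaccess`. This is a minimal Python sketch, assuming the forum's character is U+00A6 BROKEN BAR (the character that appears in this thread); `fix_pipes` is a hypothetical helper name. It also shows why the unfixed pattern misbehaves: a broken bar is an ordinary literal character, so the group can only match that literal text.

```python
import re

# Assumption: the forum displays "|" as U+00A6 BROKEN BAR.
BROKEN_BAR = "\u00a6"


def fix_pipes(config_text: str) -> str:
    """Replace every broken bar with a solid pipe so the alternation works again."""
    return config_text.replace(BROKEN_BAR, "|")


pasted = "RewriteCond %{REQUEST_URI} !^/path/[^.]+\\.(gif\u00a6jpg)$ [NC]"
print(fix_pipes(pasted))  # RewriteCond %{REQUEST_URI} !^/path/[^.]+\.(gif|jpg)$ [NC]

# With the broken bar left in, "(gif¦jpg)" only matches that literal text.
broken = re.compile("^/path/[^.]+\\.(gif\u00a6jpg)$")
fixed = re.compile(fix_pipes("^/path/[^.]+\\.(gif\u00a6jpg)$"))
print(bool(broken.search("/path/example.jpg")))  # False
print(bool(fixed.search("/path/example.jpg")))   # True
```

With the broken pattern never matching, the "!" makes the condition succeed for every URL-path under `/path/`, so even a local `/path/example.jpg` gets rewritten.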

We will discuss subdomains later; although the rule above appears to meet this requirement as well, one must never add complexity to a simpler but unsolved problem... :)

Jim