negated group at mod rewrite pattern

Can't get mod_rewrite to skip rewriting when certain strings are found

         

phoenix_fly

1:14 am on Sep 2, 2009 (gmt 0)

10+ Year Member



Hello Folks, Hello Morgan!

I've been doing many things using mod_rewrite, but this one is proving to be a real challenge for me.

To get the best possible SEO, I want to offer Google links that have the following structure:

Instead of
www.mysite.com/productsearch/hairspray
which is a pretty good SEO-friendly URL,

I want
www.mysite.com/hairspray

The problem is that I have legitimate subdirectories at the web root, and I don't want requests for them to end up at the search script. So I am facing a negated group problem.

I tried this, but it didn't work:

RewriteCond %{REQUEST_URI} !^/(blog|cgi\-bin|mod_perl)
RewriteCond %{REQUEST_URI} ^(.+)$
RewriteRule ^([!\.]+) $ /home/mysite/www/mod_perl/busca.cgi?searchby=productname&keyword=$1 [L]

And putting the negated group pattern in the RewriteRule directive seems to be impossible, by the very nature of the directive, which is a positive match. So the solution seemed to lie somewhere in the direction above. But this still doesn't work.

Any ideas?

Thanks a lot, fellows

Mark

jdMorgan

2:07 am on Sep 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think you're over-complicating things, and mixing the exclusion (in the first RewriteCond) with the back-reference in the RewriteRule. The second RewriteCond appears to serve no purpose whatsoever.

Also note that your "negated-group" syntax was incorrect: Use "^" at the beginning of a [group] to indicate that the grouped characters should be rejected.

If you wish to exclude "legitimate subdirectories" and all URL-paths containing periods (e.g. images, CSS and JS files, robots.txt, sitemap.xml, etc.) from being rewritten, then you don't even need a RewriteCond. Just reject URL-paths containing periods or slashes in the RewriteRule pattern itself:


RewriteRule ^([^/.]+)$ /home/mysite/www/mod_perl/busca.cgi?searchby=productname&keyword=$1 [L]

Also, the substitution filepath appears to be excessively long. If this is a 'home server' then consider defining your DocumentRoot at a lower level, for example "DocumentRoot /home/mysite/www". Then you won't have to refer to such long filepaths, and you'll more likely be able to easily move this to a commercially-hosted server.
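
A minimal sketch of what the rule could look like once DocumentRoot points at /home/mysite/www, assuming the rule sits in the .htaccess file at the web root (the thread does not say where it lives). The substitution then becomes a URL-path relative to the DocumentRoot rather than a full filesystem path:

```apache
# Assumption: "DocumentRoot /home/mysite/www" is set in the server config,
# and this file is /home/mysite/www/.htaccess
RewriteEngine On

# Single-segment URL-paths with no period (e.g. "hairspray") go to the search script
RewriteRule ^([^/.]+)$ /mod_perl/busca.cgi?searchby=productname&keyword=$1 [L]
```

The rewritten path mod_perl/busca.cgi contains both a slash and a period, so it cannot match the pattern a second time and the rule does not loop.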

Jim

phoenix_fly

3:50 am on Sep 2, 2009 (gmt 0)

10+ Year Member



Hey Jim

Thanks for clearing things up, it was really messy.

But, hey, shouldn't I add a backslash before the period, so it is not interpreted as any-character?

RewriteRule ^([^/\.]+)

And, also, shouldn't I negate it as well?

RewriteRule ^([^/^\.]+)

Also, besides that, there's a problem with the directories issue. I do have a 'blog' directory which can be seen by just typing 'www.mysite.com/blog'. Shouldn't that be a problem? I mean, the trailing slash does get added by Apache, but I am not sure whether mod_rewrite will capture the request before this happens.

Thanks a lot for the special attention Jim

Mark

BTW: Great tip about the DocumentRoot, I'll definitely do that.

jdMorgan

11:42 am on Sep 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Test.

If mod_dir does not 'fix-up' the missing-slash-subdirectory requests before your new busca.cgi rule executes, then you should add another rule to fix-up those slashless URLs, correcting that problem with an external redirect before attempting to execute the internal busca.cgi rewrite.
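
One way such a fix-up rule might look, assuming .htaccess context at the web root and that any slashless request naming an existing directory should be redirected. The -d test and the 301 status are illustrative choices, not something specified in this thread:

```apache
RewriteEngine On

# Externally redirect /blog to /blog/ when "blog" is an existing directory
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^([^/.]+)$ /$1/ [R=301,L]

# Only then rewrite the remaining single-segment requests to the search script
RewriteRule ^([^/.]+)$ /home/mysite/www/mod_perl/busca.cgi?searchby=productname&keyword=$1 [L]
```

Because the redirect rule comes first and ends with [L], a request for an existing directory never reaches the busca.cgi rule.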

Also, review the regular-expressions 'escaping' rules; they are quite different inside and outside [groups].

Inside [groups], only "]", " " (space), "^" used as the first character, and "-" used as anything but the first character need to be escaped. No other 'regex function tokens' such as "." or "+" or "*" are recognized within groups, because those functions wouldn't make sense to use inside a group.
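
As a small illustration of the rule above, the two patterns below should behave identically, because a period inside a [group] is already literal. The second rule is commented out and shown only for comparison:

```apache
# Period unescaped inside the group: matches any character except "/" and "."
RewriteRule ^([^/.]+)$ /home/mysite/www/mod_perl/busca.cgi?searchby=productname&keyword=$1 [L]

# Period escaped inside the group: same meaning, the backslash is simply redundant
# RewriteRule ^([^/\.]+)$ /home/mysite/www/mod_perl/busca.cgi?searchby=productname&keyword=$1 [L]
```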

Jim

phoenix_fly

1:16 pm on Sep 2, 2009 (gmt 0)

10+ Year Member



Hey Jim

Thanks for the reply.

Perfect, it worked!
The trailing slash wasn't a problem at all.
(I just had to add a negation to the period, which was missing in your pattern. Unlike my other changes, that one wasn't a product of my average-Joe knowledge of regexp. Well, at least it worked this way; I'm not sure whether it would have worked your way too.)

RewriteRule ^([^/^.]+)$

Thanks a lot! All working just fine now!

Mark

jdMorgan

1:29 pm on Sep 2, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My original "^([^/.]+)$" pattern was correct.

Again, you need to review regex escaping requirements. Your pattern ^([^/^.]+)$ means "Match any URL-path that consists of one or more characters which are NOT a slash, a caret, or a period." If used as the first character, the caret ("^") means "Negate this entire group." A caret appearing in any but the first position within the group is taken as a literal character to be matched as part of the group.
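
The difference can be seen by putting the two groups side by side. This is only a sketch; the example path "hair^spray" is made up for illustration, and the second rule is commented out:

```apache
# [^/.]  negated group: any character except "/" and "."
#        would accept "hairspray" and also "hair^spray"
RewriteRule ^([^/.]+)$ /home/mysite/www/mod_perl/busca.cgi?searchby=productname&keyword=$1 [L]

# [^/^.] negated group: any character except "/", "^" and "."
#        would accept "hairspray" but reject "hair^spray"
# RewriteRule ^([^/^.]+)$ /home/mysite/www/mod_perl/busca.cgi?searchby=productname&keyword=$1 [L]
```

For ordinary keywords that contain no caret, both patterns match the same requests, which is why the second form appeared to work.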

If my regex group pattern failed the first time you tried it, the likely reason is that you forgot to completely flush (delete) your browser cache before testing after changing your server-side code. If you don't delete your browser cache, then your browser will show you previously-cached pages and server responses, and it will not send a request to your server (unless the requested URL was previously marked as non-cacheable by code on your server).

Jim