Forum Moderators: phranque


Help on regex needed :(


sleidia

11:43 pm on May 12, 2010 (gmt 0)

10+ Year Member



Hello :)

I've noticed that Google has indexed my site with
inappropriate URLs that seem to come from nowhere.

So I would like to know how to make spiders do all three of the
following things, with mod_rewrite I suppose:

1. Index http://www.example.com/anydirectory/ instead of http://www.example.com/anydirectory/index.php

2. Index http://www.example.com/anydirectory/ instead of http://www.example.com/anydirectory/?anyvar=anydata

3. Index http://www.example.com/anydirectory/ instead of http://www.example.com/anydirectory/index.php?anyvar=anydata

Thanks in advance for the help :)

jdMorgan

1:26 am on May 13, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We can help, but you haven't posted anything to help with...

Mod_rewrite is one of the best tools for this problem. Query strings can be tested with

RewriteCond %{QUERY_STRING} ^query-string-pattern$

or

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /url-path-pattern\?query-string-pattern(#[^\ ]*)?\ HTTP/

and this second variant will be needed to stop a potential infinite loop in this case: THE_REQUEST holds the original request line sent by the client, so it is not changed by any internal rewrites.
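As a hedged illustration, here is how the second template might be filled in for the /anydirectory/?anyvar=anydata request from the first post. It assumes the rules sit in the site's root .htaccess (so the RewriteRule pattern has no leading slash), and the directory and variable names are only the placeholders from the question:

```apache
# Hypothetical example: redirect /anydirectory/?anyvar=anydata to /anydirectory/
# THE_REQUEST is the client's original request line, e.g.
#   GET /anydirectory/?anyvar=anydata HTTP/1.1
# Internal rewrites elsewhere in the file do not change it.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /anydirectory/\?anyvar=anydata(#[^\ ]*)?\ HTTP/
# The trailing "?" in the substitution discards the query string
RewriteRule ^anydirectory/$ http://www.example.com/anydirectory/? [R=301,L]
#
# Alternative test on the query string alone (it never includes the "?"):
# RewriteCond %{QUERY_STRING} ^anyvar=anydata$
```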

Please post your best effort at coding a solution, so we have a basis for discussion.

If the reason for this kind of reply isn't clear, please review our Forum Charter (link at top of this page). The Charter also contains links to particularly-useful documentation and tutorials.

Thanks,
Jim

sleidia

3:35 am on May 13, 2010 (gmt 0)

10+ Year Member



Hi Jim,

I'm sorry, but I'm a total beginner with mod_rewrite.

To remove any query string, I found and successfully tested this:
RewriteCond %{QUERY_STRING} .
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]

But when I test with
http://www.example.com/?
the question mark doesn't get removed.
Is there a way to remove the question mark as well?
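A likely explanation: for a request ending in a bare "?", QUERY_STRING is empty, so the "." condition (which needs at least one character) never matches. A sketch that tests the raw request line for a literal "?" instead, assuming the same root .htaccess setup as the rule above:

```apache
# Match any request line containing a literal "?", even with nothing after it
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?\ ]*\?[^\ ]*\ HTTP/
# Redirect to the same path; the trailing "?" drops the query string
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]
```

Like the original rule, this applies to every URL on the site.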

Also, when I add this rule to remove index.php, I get an internal server error:
RewriteCond %{THE_REQUEST} /index.php HTTP
RewriteRule (.*)index.php$ /$1 [R=301,L]

How do I combine the two so that they don't trigger errors?

Thanks to anyone who will be kind enough to help.

jdMorgan

4:22 am on May 13, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



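Unescaped spaces cause server errors... :( In your RewriteCond, the space before "HTTP" ends the pattern, so Apache tries to read "HTTP" as the [flags] argument and fails. A literal space must be escaped as "\ ".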
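For comparison, the rule from the previous post becomes valid once the space (and, for accuracy, the literal dots) are escaped. This is a minimal syntax repair, sketched for a root .htaccess file; it only catches requests for index.php with no query string, which is why the rules below use a broader pattern:

```apache
# "\ " is a literal space; "\." is a literal dot
RewriteCond %{THE_REQUEST} /index\.php\ HTTP/
RewriteRule ^(.*)index\.php$ /$1 [R=301,L]
```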

# 1 & 3. Redirect requests for /index.php in any directory
# to "/" in that same directory, removing any query string
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.php([?#][^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://www.example.com/$1? [R=301,L]
#
# 2. Redirect to remove query string from any directory-paths
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*\?[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)$ http://www.example.com/$1? [R=301,L]

Jim

sleidia

5:03 am on May 13, 2010 (gmt 0)

10+ Year Member



Wow Jim! You just saved me long painful hours :)
Thanks to you, I'm starting to understand how these things work.

Thanks 1000 times for that.

One last question, for anyone: is it possible to exempt a
specific directory (my CMS interface) and all its subdirectories
from the query string removal?

g1smd

8:41 am on May 13, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes: add a negative-match RewriteCond that tests REQUEST_URI and excludes URL requests beginning with that directory's path.

sleidia

1:41 am on May 14, 2010 (gmt 0)

10+ Year Member



Hi g1smd,

By any chance, would you have a short code snippet that illustrates your idea?

Thanks :)

g1smd

7:37 am on May 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The negative match is done with a ! before the pattern. The pattern will be the path you want to test, and as said above it will be a RewriteCond testing REQUEST_URI.

I had assumed those simple instructions would have allowed you to try:

RewriteCond %{REQUEST_URI} !^/somepath
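Combined with the rules posted earlier in the thread, that might look like the following sketch. /cms/ is a hypothetical name standing in for the CMS directory; because REQUEST_URI holds only the URL-path (no query string), the negative condition exempts /cms/ and everything below it:

```apache
# Hypothetical: /cms/ stands for the CMS interface directory
# 1 & 3. Redirect /index.php to "/", dropping any query string, except inside /cms/
RewriteCond %{REQUEST_URI} !^/cms/
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.php([?#][^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://www.example.com/$1? [R=301,L]
#
# 2. Remove query strings from directory paths, except inside /cms/
RewriteCond %{REQUEST_URI} !^/cms/
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*\?[^\ ]*\ HTTP/
RewriteRule ^(([^/]+/)*)$ http://www.example.com/$1? [R=301,L]
```

All RewriteCond lines directly above a RewriteRule are ANDed together, so the exemption has to be repeated for each rule it should apply to.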