Drop a query string from a url

using mod rewrite

         

madmatt69

2:24 am on Feb 9, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hey all,

I've got a query string that keeps getting appended onto some URLs: ?start=0&postdays=0&postorder=asc&highlight=

I'd like to use mod_rewrite to drop it until I can find out where in the code it's being generated.

I'd like to take the url from being "topic-vt123.html?start=0&postdays=0&postorder=asc&highlight=" to just being "topic-vt123.html"

I tried this code:

RewriteRule ^forums/(.+vt[0-9]+)\.html?start=0&postdays=0&postorder=asc&highlight=$ http://www.example.com/forums/$1.html [R=301,L]

But it didn't work.

Can anyone help me out with the code above?

Thanks!

jdMorgan

3:32 am on Feb 9, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You must use RewriteCond %{QUERY_STRING} to test and handle query strings.

[added] You will also need to append a "?" to the substitution (new) URL in order to tell mod_rewrite to clear the existing query string. [/added]
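
To illustrate the effect of the trailing "?", here is a minimal sketch. The paths old.html and new.html are hypothetical, and the two rules are alternatives, not meant to be used together:

```apache
# Alternative A: no "?" on the substitution. mod_rewrite carries the
# original query string over, so /old.html?x=1 redirects to /new.html?x=1
RewriteRule ^old\.html$ http://www.example.com/new.html [R=301,L]

# Alternative B: trailing "?" on the substitution. The existing query
# string is dropped, so /old.html?x=1 redirects to /new.html
RewriteRule ^old\.html$ http://www.example.com/new.html? [R=301,L]
```

On Apache 2.4 and later, the [QSD] flag is another way to discard the query string.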

Jim

[edited by: jdMorgan at 3:44 am (utc) on Feb. 9, 2007]

madmatt69

3:54 am on Feb 9, 2007 (gmt 0)


Something like this?

RewriteCond %{QUERY_STRING} ^forums/(.+vt[0-9]+)\.html?start=0&postdays=0&postorder=asc&highlight=$
RewriteRule http://www.example.com/forums/$1.html [R=301,L]

?

jdMorgan

2:37 pm on Feb 9, 2007 (gmt 0)


No, because "forums/" is not part of a query-string, it's a URL-path. And you have no pattern at all in your RewriteRule. Test something like this, and let us know the test conditions and results:

RewriteCond %{QUERY_STRING} ^start=0&postdays=0&postorder=asc&highlight=
RewriteRule ^forums/([^\-]+-vt[0-9]+)\.html$ http://www.example.com/forums/$1.html? [R=301,L]

This will work only if the query string name/value pairs are always in that order. If not, then a more robust solution will be needed.
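
If the order does vary, one possible approach is to test each name/value pair with its own RewriteCond, since consecutive RewriteConds are ANDed by default. This is an untested sketch: the "(^|&)" and "(&|$)" anchors are an assumption about how to keep "start=0" from matching inside something like "restart=0" or "start=01", and the URL-path pattern is the multi-hyphen form worked out later in this thread.

```apache
# Each pair may appear anywhere in the query string; all four must be present.
RewriteCond %{QUERY_STRING} (^|&)start=0(&|$)
RewriteCond %{QUERY_STRING} (^|&)postdays=0(&|$)
RewriteCond %{QUERY_STRING} (^|&)postorder=asc(&|$)
RewriteCond %{QUERY_STRING} (^|&)highlight=(&|$)
# Trailing "?" clears the query string on the redirect target.
RewriteRule ^forums/(([^\-]+-)+vt[0-9]+)\.html$ http://www.example.com/forums/$1.html? [R=301,L]
```

Note that any other parameters present in the query string would be dropped by the redirect as well.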

Jim

madmatt69

7:30 pm on Feb 9, 2007 (gmt 0)


Thanks again for the help -

The code didn't end up rewriting anything. When I go to the link in question, the query string remains. No changes.

Plan B? I'm trying to think of what else could work..

jdMorgan

7:57 pm on Feb 9, 2007 (gmt 0)


Well, this is a very simple problem, so how about posting some details:
Where is the .htaccess file with this code - what is the URL-path or filepath to it?
What is an exact URL that you used to test - changing only the domain to comply with our TOS?
Do you have any other working RewriteRules (Yes/No)?

Jim

madmatt69

1:38 am on Feb 10, 2007 (gmt 0)


It's a biggie:


redirect 301 /old/old.shtml http://www.example.com/new/new.shtml

RewriteEngine On
RewriteBase /

RewriteCond %{QUERY_STRING} ^start=0&postdays=0&postorder=asc&highlight=
RewriteRule ^sub/([^\-]+-vt[0-9]+)\.html$ http://www.example.com/sub/$1.html? [R=301,L]

RewriteRule ^sub/(.+vt[0-9]+)_([0-9]+)\.htm$ http://www.example.com/sub/$1-$2.html [R=301,L]
RewriteRule ^sub/([^_]+)_([^.]+)\.html$ http://www.example.com/sub/$1-$2.html [R=301,L]
RewriteRule ^sub/([^_]+)\.htm$ http://www.example.com/sub/$1.html [R=301,L]
RewriteRule ^sub/.*-vp([0-9]+)\.html$ http://www.example.com/sub/post$1.html [R=301,L]
RewriteRule ^sub/.+/([^/]+\.html)$ http://www.example.com/sub/index.php [R=301,L]

RewriteCond %{HTTP_HOST} !^www.example\.com [NC]
RewriteRule ^(.*) http://www.example.com/$1 [QSA,R=301,L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /wp-index.php [L]

RewriteRule ^sub/.+-vc([0-9]+)\.html$ /sub/index.php?c=$1 [QSA,L]

RewriteRule ^sub/.+-vf([0-9]+)-([0-9]+)\.html$ /sub/viewforum.php?f=$1&start=$2 [QSA,L]

RewriteRule ^sub/.+-vf([0-9]+)\.html$ /sub/viewforum.php?f=$1 [QSA,L]

RewriteRule ^sub/.+-vt([0-9]+)-([0-9]+)\.html$ /sub/viewtopic.php?t=$1&start=$2 [QSA,L]

RewriteRule ^sub/.+-vt([0-9]+)\.html$ /sub/viewtopic.php?t=$1 [QSA,L]

RewriteRule ^sub/post([0-9]+)\.html$ /sub/viewtopic.php?p=$1 [QSA,L]

RewriteRule ^sub/member([0-9]+)\.html$ /sub/profile.php?mode=viewprofile&u=$1 [QSA,L]

RewriteRule ^sub/sitemaps\.html$ /sub/sitemaps.php [QSA,L]

RewriteRule ^sub/mx-map\.html$ /sub/sitemaps.php?mx [QSA,L]

RewriteRule ^sub/forum-map\.html$ /sub/sitemaps.php?fim [QSA,L]

RewriteRule ^sub/.+-fmp([0-9]+)-([0-9]+)\.html$ /sub/sitemaps.php?fmp=$1&start=$2 [QSA,L]

RewriteRule ^sub/.+-fmp([0-9]+)\.html$ /sub/sitemaps.php?fmp=$1 [QSA,L]

RewriteRule ^sub/.+-sc([0-9]+)\.html$ /sub/sitemaps.php?c=$1 [QSA,L]

RewriteRule ^sub/rss-?(l|s)?-?(m)?\.([xml|xml\.gz]+)$ /sub/rss.php?$1&$2 [L]

RewriteRule ^sub/sub-rss-?(l|s)?-?(m)?\.([xml|xml\.gz]+)$ /sub/rss.php?forum&c&$1&$2 [L]

RewriteRule ^sub/([a-z]+)-rss([0-9]*)-?(l|s)?-?(m)?\.([xml|xml\.gz]+)$ /sub/rss.php?$1=$2&$3&$4 [L]

RewriteRule ^sub/.+-rf([0-9]+)-?(l|s)?-?(m)?\.([xml|xml\.gz]+)$ /sub/rss.php?forum=$1&$2&$3 [L]

RewriteRule ^sub/sitemaps\.([xml|xml\.gz]+)$ /sub/sitemap.php [L]

RewriteRule ^sub/([a-z]+)-sitemap\.([xml|xml\.gz]+)$ /sub/sitemap.php?$1 [L]

RewriteRule ^sub/.+-gf([0-9]+)\.([xml|xml\.gz]+)$ /sub/sitemap.php?forum=$1 [L]

RewriteRule ^sub/urllist\.([txt|txt\.gz]+)$ /sub/urllist.php [L]

# start mod_gzip
mod_gzip_on Yes
mod_gzip_can_negotiate Yes
mod_gzip_static_suffix .gz
AddEncoding gzip .gz
mod_gzip_update_static No
mod_gzip_command_version '/mod_gzip_status'
mod_gzip_minimum_file_size 500
mod_gzip_maximum_file_size 500000
mod_gzip_maximum_inmem_size 60000
mod_gzip_min_http 1000

mod_gzip_dechunk Yes
mod_gzip_add_header_count Yes
mod_gzip_send_vary Yes

# mod_gzip_temp_dir /tmp
# mod_gzip_keep_workfiles No

# not implimented yet, compression_level, maybe next version
# mod_gzip_compression_level9
mod_gzip_handle_methods GET POST

mod_gzip_item_exclude reqheader "User-agent: Mozilla/4.0[678]"
mod_gzip_item_exclude mime ^image/

mod_gzip_item_include file \.html$
mod_gzip_item_include file \.shtml$
mod_gzip_item_include file \.htm$
mod_gzip_item_include file \.shtm$
mod_gzip_item_include file \.php$
mod_gzip_item_include file \.phtml$
mod_gzip_item_include file \.js$
mod_gzip_item_include file \.css$
mod_gzip_item_include file \.pl$
mod_gzip_item_include handler ^cgi-script$
mod_gzip_item_include mime ^text/html$
mod_gzip_item_include mime ^text/plain$
mod_gzip_item_include mime ^httpd/unix-directory$

AddHandler application/x-httpd-php .php .shtm .shtml .htm .html .tpl .xml .txt
Options +FollowSymlinks -Indexes

<Files .htaccess>
deny from all
</Files>

Notes: 1) I put the code you recommended earlier to remove the query string as the first rewrite rule. 2) I added the line breaks to make things easier to read.

I tried re-ordering the htaccess file a little and the query string rewrite still doesn't work.

Any obvious fixes in there?

jdMorgan

3:21 am on Feb 10, 2007 (gmt 0)


The main problem is that the pattern in the rule I posted won't properly match a request with multiple hyphens in it.

So the rule pattern needs to change:


RewriteCond %{QUERY_STRING} ^start=0&postdays=0&postorder=asc&highlight=
RewriteRule ^forums/(([^\-]+-)+vt[0-9]+)\.html$ http://www.example.com/forums/$1.html? [R=301,L]

Here, we use "([^\-]+-)+", meaning one or more characters not equal to a literal hyphen, followed by a hyphen, with the outer parentheses and "+" allowing one or more repetitions of that sequence.

This new pattern will now match the semi-anonymized example URL-path and query you provided (via stickymail), "/forums/bull-running-in-spain-vt1306.html?start=0&postdays=0&postorder=asc&highlight="

Despite its apparent complexity, this is a far more efficient pattern than the ".+-vt" that occurs in several of your other rules, because "([^\-]+-)+" allows the requested URL to be matched in a single left-to-right pass, unlike the ambiguous ".+-", which forces multiple retry attempts to find a match.
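
As an illustration of applying the same idea to one of the other rules, the viewtopic rule could be rewritten along these lines. This is an untested sketch. Note that the extra parentheses shift the topic number from $1 to $2, and that it assumes titles never contain two hyphens in a row, since "[^\-]+" needs at least one character between hyphens.

```apache
# Before: ".+-vt" is ambiguous and may need several retries to find a match
# RewriteRule ^sub/.+-vt([0-9]+)\.html$ /sub/viewtopic.php?t=$1 [QSA,L]

# After: $1 is now the last "word-" group, so the topic number moves to $2
RewriteRule ^sub/([^\-]+-)+vt([0-9]+)\.html$ /sub/viewtopic.php?t=$2 [QSA,L]
```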

There is a second (and potentially major) problem that recurs several times in your code. You should replace the improperly-coded subpattern "([xml|xml\.gz]+)" with "(xml(\.gz)?)" wherever you find it. However, you should then re-check the back-references ($1-$9) and make sure that the additional set of parentheses doesn't require you to change a back-reference number in the URL substitution. Because the "xml.gz" occurs at the end of the requested URL-path, I did not find any cases where that would be necessary, but keep it in mind.

The reason that "([xml|xml\.gz]+)" is improperly coded is that characters in square brackets have no position dependency; the square brackets enclose a group of alternate characters, any one of which will be accepted as a match. Therefore, the pattern [abc] will match "a", "b", or "c", and so will the pattern [cba] or [cab].

So, "[xml|xml\.gz]+" will match any string consisting only of one or more of the characters "x", "m", "l", "g", "z", ".", or "|" in any order, which is obviously not what was intended. Try requesting a URL ending in "zg.lmx" (or even just one of any of those characters) and see what happens; if the rest of the URL matches one of your rules' URL-patterns, it will be rewritten!
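
For example, three of the rules might end up looking like this. It is a sketch only: the "txt" version is an extrapolation of the same fix to the urllist rule, which has the same bracket problem.

```apache
# "(xml(\.gz)?)" matches exactly "xml" or "xml.gz" and nothing else
RewriteRule ^sub/sitemaps\.(xml(\.gz)?)$ /sub/sitemap.php [L]

# Back-references $1 and $2 are unchanged because the new groups come last
RewriteRule ^sub/rss-?(l|s)?-?(m)?\.(xml(\.gz)?)$ /sub/rss.php?$1&$2 [L]

# Same fix applied by analogy to the "txt" / "txt.gz" case
RewriteRule ^sub/urllist\.(txt(\.gz)?)$ /sub/urllist.php [L]
```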

Jim

madmatt69

3:32 am on Feb 10, 2007 (gmt 0)


Please tell me WebmasterWorld is paying you for your always helpful and educational posts.

If you're at the next pubcon, I owe you some drinks.

It worked perfectly, and my site definitely sped up thanks to optimizing some of those rules.

Hopefully it'll help take some pages out of supplemental now.

Thanks so much!