

Apache Web Server Forum

    
mod_rewrite: string added at the end problem
spok




msg:4150539
 3:26 pm on Jun 10, 2010 (gmt 0)

Hello, please help me with this:
I want to permanently redirect all incoming requests to domain/product [or] application/item.
This code does the job, but it also appends the whole query string to the end:


RewriteCond %{QUERY_STRING} (product|application)=(.*)
RewriteRule ^(.*)$ %1/%2 [R=301,L]

it transforms http://www.example.com/?page=2&application=electronics

into http://www.example.com/application/electronics?page=2&application=electronics

I need to remove everything from the "?" onward, including the "?" itself.
Thanks in advance

spok

 

jdMorgan




msg:4150615
 4:40 pm on Jun 10, 2010 (gmt 0)

The default behavior of RewriteRule is to pass-through query strings unchanged. Add a question mark to the end of the substitution path to clear the query string (See the mod_rewrite RewriteRule documentation at apache.org).

Jim
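
As a minimal sketch of the fix described above, here is the rule from the first post with a question mark appended to the substitution. The [^&]+ capture (in place of .*) and the explicit http://www.example.com/ prefix are assumptions for illustration, not part of the original code; the narrower capture keeps any following parameters out of the new path. On Apache 2.4 and later, the [QSD] flag has the same effect as the trailing question mark.

```apache
RewriteEngine On
# Look for product=... or application=... in the query string
# (unanchored, as in the original post).
# %1 is the parameter name, %2 is its value up to the next "&".
RewriteCond %{QUERY_STRING} (product|application)=([^&]+)
# The trailing "?" on the substitution discards the original query string.
RewriteRule ^(.*)$ http://www.example.com/%1/%2? [R=301,L]
```

With this in place, a request for /?page=2&application=electronics should be redirected to /application/electronics with nothing after it.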

spok




msg:4151136
 12:04 pm on Jun 11, 2010 (gmt 0)

Thanks, I should have seen it in there.
Another question: I want to rebuild the site with nice-looking URLs. I need to permanently redirect old-style URLs (?page=2&product=name_of_product) to "product/name_of_product", which is working, but internally still look up the content the old way. There is a loop in the following code that I can't see:

RewriteRule index\.php(.*) $1 [R=301]

RewriteCond %{QUERY_STRING} (application|product)=(.*)
RewriteRule ^(.*)$ %1/%2? [R=301]

RewriteRule ^(application)\/(.+)$ ?page=2&$1=$2 [NC,L]
RewriteRule ^(product)\/(.+)$ ?page=3&$1=$2 [NC,L]

I think the redirects would fit better at the end, but I have read that redirects should come first. The code doesn't work either way.

When I try lines 3 & 4 or lines 5 & 6 on their own, each pair does its job, but together they don't.

g1smd




msg:4151138
 12:13 pm on Jun 11, 2010 (gmt 0)

You have an infinite loop, as the rewritten filepath matches the pattern to redirect again.

Fix it by adding a preceding RewriteCond checking THE_REQUEST for the redirects only. In that way, only direct client requests will be redirected.

There are several thousand prior examples of similar code here.
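
A minimal sketch of that advice, applied to the query-string redirect in the previous post. THE_REQUEST holds the request line exactly as the client sent it (for example "GET /?page=2&application=electronics HTTP/1.1") and is not altered by internal rewrites, so the redirect should no longer fire on the rewritten path. The exact pattern and the www.example.com hostname are assumptions for illustration.

```apache
# Redirect only when the client itself asked for the old query-string URL.
# %1 is the parameter name, %2 is its value.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?\ ]*\?(?:[^&\ ]*&)*(application|product)=([^&\ ]+)
# The trailing "?" clears the query string on the redirect target.
RewriteRule ^ http://www.example.com/%1/%2? [R=301,L]
```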

spok




msg:4151185
 1:30 pm on Jun 11, 2010 (gmt 0)

Thanks g1smd, also for the last remark. It helped a lot, and everything is working perfectly.

jdMorgan




msg:4151193
 1:53 pm on Jun 11, 2010 (gmt 0)

Possibly two infinite loops, actually, if the original first rule was meant to redirect direct client requests for "/index.php" to "/" and if DirectoryIndex is defined as "index.php"

Original first rule:
Client: "GET /index.php"
First rule: "/index.php"->301->"/"
mod_dir: "/"->rewrite->"/index.php"
First rule: "/index.php"->301->"/" LOOP!

Original second rule:
Client: "GET /?product=widget"
Second rule: "/?product=widget"->301->"/product/widget"
Third rule: "/product/widget"->rewrite->"/?product=widget"
Second rule: "/?product=widget"->301->"/product/widget" LOOP!

Due to lack of comments in the code or any description in the post, I'm not sure what the purpose of the original first rule was, so I provide two different replacements here -- with comments. As I was guessing at original intent, the first of these two new alternate rules may still need tweaking, even if it is "close" to what was intended. Note that it works with additional URL-path info, and not with appended query strings, which are not 'visible' to the RewriteRule pattern.

The new third rule is the "bog-standard" /index-to-slash redirect.

# Externally redirect to remove additional URL-path info following
# "/index.php" e.g. "GET /index.php/<additional-directory-and/or-filepath>"
RewriteRule ^index\.php/(.*)$ http://www.example.com/$1 [R=301,L]
#
# Externally redirect direct client requests for old dynamic URLs to new 'SE friendly' URLs
# (e.g. redirect requests for "/?product=widget" to "www.example.com/product/widget" or
# redirect requests for "/index.php?product=widget" to "www.example.com/product/widget")
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(index\.php)?\?page=2&(application)=([^&\ ]*)\ HTTP/ [NC,OR]
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(index\.php)?\?page=3&(product)=([^&\ ]*)\ HTTP/ [NC]
RewriteRule ^(index\.php)?$ http://www.example.com/%2/%3? [R=301,L]
#
# Externally redirect direct client requests for "/<any-directory>/index.php" to "/<any_directory>/"
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index\.php([?#][^\ ]*)?\ HTTP/
RewriteRule ^(([^/]+/)*)index\.php$ http://www.example.com/$1 [R=301,L]
#
# Last (least-specific) redirect: Redirect all non-blank non-canonical hostname requests to canonical hostname
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#
# Internally rewrite SE-friendly URLs to script filepath+query
RewriteRule ^(application)/(.+)$ /index.php?page=2&$1=$2 [L]
RewriteRule ^(product)/(.+)$ /index.php?page=3&$1=$2 [L]

By checking THE_REQUEST, we ensure that the dynamic-URL request being redirected came directly from the HTTP client, and is not a result of the friendly-URL-to-script rewrite rule's action (or, in the case of the new third rule, the index-to-slash redirect, not a result of mod_dir's action).

Note that the [NC] flags have been removed from the internal rewrite to prevent duplicate content. If it is possible that requests for "application" or "product" with casing errors will be received, then I suggest rewriting mis-cased requests to a script which can easily do a case-conversion and then generate a 301 redirect to the corrected-case URL. It *is* possible to do case-conversion in mod_rewrite in a .htaccess context, but it is horribly inefficient because it can correct only one character at a time (it can therefore take several seconds to 'fix' a request), and I do not recommend it.
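
One way to sketch the script hand-off suggested above: catch requests whose first path segment matches "application" or "product" only when case is ignored, and pass them to a case-fixing script. The script name /fix-case.php is hypothetical, and the script itself (which would lowercase the path and issue the 301) is not shown. These lines would go after the external redirects and before the internal rewrites.

```apache
# Matches "application/" or "product/" in any casing...
RewriteCond %{REQUEST_URI} ^/(application|product)/ [NC]
# ...but not when the casing is already correct.
RewriteCond %{REQUEST_URI} !^/(application|product)/
# Hand the request to a (hypothetical) script that emits the 301.
RewriteRule ^ /fix-case.php [L]
```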

In addition, the target filepath of the new internal rewrite rules is now explicit, in order to avoid the necessity of subsequently invoking mod_dir to rewrite the request from partial-path "/?page=3&product=widget" to filepath "/index.php?page=3&product=widget". This improves efficiency (well, as long as my assumption that "/" is mapped to "/index.php" by mod_dir's DirectoryIndex is correct). :)

Note also that all redirect rules should specify a protocol and the canonical hostname in the substitution URL in order to prevent problems if the configured ServerName is non-canonical and UseCanonicalName is set to "On" in the server configuration.

Always comment your code correctly and concisely -- You will be glad you did when you re-visit the code later.

Jim

artefaqs




msg:4151460
 10:15 pm on Jun 11, 2010 (gmt 0)

Shocked! Shocked, I am to see someone else having the exact same problem I am. Thanks to jdMorgan for a quick solution!

I do have two further questions, though --

If I have this line in my htaccess:

Redirect /A/B.php http://www.example.com/C/D.php

Is this a 301, or a 302 redirect?

Also, is there some magic regexp that could replace all of the underscores in a URL with hyphens, instead? The idea is to avoid several hundred of the following type of redirects:

Redirect /web_page http://www.example.com/web-page

g1smd




msg:4151462
 10:25 pm on Jun 11, 2010 (gmt 0)


You can test your Redirect by using Live HTTP Headers for Firefox. It will tell you in seconds whether it is a 301 or 302 redirect.
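
For reference, mod_alias's Redirect directive sends a 302 (temporary) redirect when no status argument is given. A short sketch of the three forms, using the paths from the question above:

```apache
# No status argument: mod_alias sends "302 Found".
Redirect /A/B.php http://www.example.com/C/D.php

# Either of these sends "301 Moved Permanently" instead.
# Use only one line per path; they are listed together here for comparison.
Redirect 301 /A/B.php http://www.example.com/C/D.php
Redirect permanent /A/B.php http://www.example.com/C/D.php
```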


Yes, the underscores can be replaced with hyphens.

What is the maximum number of underscores there will ever be in a URL?


Whatever this "Redirect" actually turns out to be, and now that you are using RewriteRule on this site, you must also change this "Redirect" rule to use the RewriteRule syntax with the [R=301,L] flags.

artefaqs




msg:4151469
 10:49 pm on Jun 11, 2010 (gmt 0)

That's an interesting question. I guess the /maximum/ number of underscores would be something in the area of 10. Most have between two and four.

I dunno about the flags. Every time I try that, it does bad things. For example this works fine:

Redirect 301 /A/1028/B_C_D.php http://www.example.com/A/1028/E-F-G.php?


This throws an error:

Redirect /A/1028/B_C_D.php http://www.example.com/A/1028/E-F-G.php? [R=301,L]


Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

Please contact the server administrator, and inform them of the time the error occurred, and anything you might have done that may have caused the error.

More information about this error may be available in the server error log.

g1smd




msg:4151480
 10:59 pm on Jun 11, 2010 (gmt 0)

Yes, you must replace the word "Redirect" with the word "RewriteRule" and then code this rule using the correct "RewriteRule" syntax.
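
A sketch of that conversion, using the redirect from the previous post. The flags in square brackets belong to RewriteRule only; appending them to a mod_alias Redirect line is most likely what triggers the 500 error. This assumes the rule lives in the .htaccess file in the document root, where the pattern is matched without a leading slash.

```apache
# mod_alias form from the previous post:
#   Redirect 301 /A/1028/B_C_D.php http://www.example.com/A/1028/E-F-G.php?
#
# mod_rewrite equivalent. The pattern is a regular expression, so the
# literal dot is escaped; the trailing "?" discards any query string.
RewriteRule ^A/1028/B_C_D\.php$ http://www.example.com/A/1028/E-F-G.php? [R=301,L]
```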


The underscores rule is going to hit the skids if there are more than nine in a URL.

Where are the underscores in relation to any slashes in the URL? Is there any consistency, or a pattern? Does the number of slashes vary?

The idea is to document "exactly" what a matching URL looks like and then build a RegEx pattern that exactly and efficiently matches it.

artefaqs




msg:4151487
 11:05 pm on Jun 11, 2010 (gmt 0)

Thanks for helping with this. I appreciate your patience.

There are always exactly three slashes in the URL (well, five if you count the "http://")... and they're always in the same place.

http://www.example.com/A/913/B_C_D.php

The only part that will vary is the B_C_D area. Sometimes it will be B_C, sometimes B_C___D_E_F, sometimes... who knows.

I think it's OK if it works up to nine underscores. 10 was really just a guesstimate.

g1smd




msg:4151490
 11:10 pm on Jun 11, 2010 (gmt 0)

OK. If there are *never* any underscores before the final slash, the rule can potentially be a LOT more efficient in operation.

For underscores in a row, with no other intervening characters, is there any pattern to that? Is it always the second block, or anywhere? How many in a row, maximum?

artefaqs




msg:4151492
 11:18 pm on Jun 11, 2010 (gmt 0)

There are never any underscores before the final slash.

Underscores in a row are uncommon, but do happen. It's fairly random. Never more than three, though.
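
Given the constraints stated so far (three fixed slashes, no underscores before the final slash, runs of up to three underscores), one possible mod_rewrite approach is a set of redirect rules ordered from the most underscore groups to the fewest. This is only a sketch under those assumptions: it covers up to three underscore groups, collapses each run of underscores into a single hyphen (an assumption about the desired result), and uses www.example.com as the canonical host. More groups need more rules of the same shape, and a single rule has only $1 through $9 to work with.

```apache
# Three underscore groups, e.g. A/913/B_C___D_E.php -> A/913/B-C-D-E.php
RewriteRule ^([^/]+/[^/]+/[^_/]+)_+([^_/]+)_+([^_/]+)_+([^_/]+)$ http://www.example.com/$1-$2-$3-$4 [R=301,L]
# Two underscore groups, e.g. A/913/B_C_D.php -> A/913/B-C-D.php
RewriteRule ^([^/]+/[^/]+/[^_/]+)_+([^_/]+)_+([^_/]+)$ http://www.example.com/$1-$2-$3 [R=301,L]
# One underscore group, e.g. A/913/B_C.php -> A/913/B-C.php
RewriteRule ^([^/]+/[^/]+/[^_/]+)_+([^_/]+)$ http://www.example.com/$1-$2 [R=301,L]
```

Because each segment pattern excludes underscores, a URL matches only the rule with its exact number of underscore groups, so each request is redirected once. File names that begin or end with an underscore are left alone by these patterns.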

artefaqs




msg:4151547
 3:44 am on Jun 12, 2010 (gmt 0)

Thanks for your help on this. I ended up doing it in the document's PHP instead of at the server level. It may not be the best way to do it, but at least it's a way I understand.

For those who Google this in the future, here's what I put at the top of my page:

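// Build the hyphenated version of the requested URL.
// Note: REQUEST_URI includes the query string, so underscores there are replaced too.
$HyphenatedURL = str_replace("_", "-", $_SERVER['REQUEST_URI']);
if ($HyphenatedURL != $_SERVER['REQUEST_URI'])
{
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: http://{$_SERVER['HTTP_HOST']}{$HyphenatedURL}");
    exit; // stop here so the old page body is not sent along with the redirect
}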

