
Apache Web Server Forum

    
HTACCESS rewrite URL and remove Querystring
lostdreamer



 
Msg#: 4171357 posted 10:21 am on Jul 16, 2010 (gmt 0)

Hi all,

I'm sure this has been asked in other forms, but I can't for the life of me get this thing working after hours of googling and trying.

I have a bunch of URLs like this:

example.com/scripts/gateway.php?action=SomeStaticAction&section=SomeDynamicSection


What I want is for all of these URLs to be available only as clean URLs.

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)\.html$ /scripts/gateway.php?action=SomeStaticAction&section=$1 [NC]


This works fine to create nice URLs like example.com/SomeStaticAction.html

Now I also want the URL
'/scripts/gateway.php?action=SomeStaticAction&section=SomeStaticAction' to 301 redirect to /SomeStaticAction.html

With everything I tried I either get no redirect or an infinite redirect because it will also redirect the querystring.

Can any Htaccess voodoo master help me with this?


Regards,
LostDreamer

 

g1smd

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4171357 posted 11:50 am on Jul 16, 2010 (gmt 0)

Use a question mark on the redirect target to clear the query string.

The redirect also needs a preceding RewriteCond testing THE_REQUEST so that the redirect only fires for direct client requests, not as the result of a prior internal rewrite.

There are several hundred prior examples of this code in this forum. It's a question that has been asked every week since the forum began.

lostdreamer



 
Msg#: 4171357 posted 12:59 pm on Jul 16, 2010 (gmt 0)

Thanks for answering so quickly.

I went through the first few pages here and tried the search function a bunch of times, read a lot of topics (again) and got a lot closer.

I have the following two parts now, but they appear to go into an infinite loop:


# All .html URLs should go internally redirect to the scripts/gateway.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)\.html$ /scripts/Gateway.php?action=TextAction&section=$1 [NC]

# test to 301 the old URLs to the clean ones
RewriteCond %{THE_REQUEST} ^(GET|HEAD)
RewriteCond %{REQUEST_URI} ^/scripts/(.*)
RewriteCond %{QUERY_STRING} ^.*section=([a-zA-Z]+).*$
RewriteRule ^scripts/Gateway.php$ /%1.html? [R=301,L]


What am I missing here?
If someone could help me get these two to not clash anymore, I'd be helped a lot.


Regards,
LostDreamer

g1smd




 
Msg#: 4171357 posted 1:22 pm on Jul 16, 2010 (gmt 0)

Change "internal redirect" in your comment to be "internal rewrite".

Change "301" in your comment to "external 301 redirect".

Add the domain name to the redirect target.

Add the [L] flag to the rewrite. All rules usually need the [L] flag.

List the redirect code before the rewrite code.

Add your canonical non-www to www redirect code after the parameter redirect code, and before the internal rewrite code.

[edited by: g1smd at 1:25 pm (utc) on Jul 16, 2010]

lostdreamer



 
Msg#: 4171357 posted 1:23 pm on Jul 16, 2010 (gmt 0)

Apparently I was staring myself blind at this. Here's my working version:

# All other should go to the scripts/gateway
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} ^/(.*)\.html$
RewriteRule ^(.*)\.html$ /scripts/Gateway.php?action=TextAction&section=$1 [L]

# test to get only the clean URLs
RewriteCond %{THE_REQUEST} ^.*scripts/.*
RewriteCond %{REQUEST_URI} ^/scripts/(.*)
RewriteCond %{QUERY_STRING} ^.*section=([a-zA-Z]+).*$
RewriteRule ^scripts/Gateway.php$ /%1.html? [R=301]


I only needed to change the

RewriteCond %{THE_REQUEST} ^.*scripts/.*


Thanks a lot for giving me that heads up (and of course all the great info here :) )

g1smd




 
Msg#: 4171357 posted 1:27 pm on Jul 16, 2010 (gmt 0)

There are other things to fix. Some of the patterns are less than optimal, but we'll get to that after you fix the things I highlighted in the post I added just seconds before your last one.

You can also combine your three RewriteConds testing THE_REQUEST, REQUEST_URI and QUERY_STRING into one RewriteCond testing the whole lot against THE_REQUEST.

lostdreamer



 
Msg#: 4171357 posted 1:58 pm on Jul 16, 2010 (gmt 0)

Thanks again g1smd for your help.
What I now have is the following (which is now working very nicely):


# external 301 redirect old URLs to the clean ones
RewriteCond %{THE_REQUEST} ^.*scripts/(.*)section=([a-zA-Z]+).*
RewriteRule ^scripts/Gateway.php$ /%2.html? [R=301,L]

# www canonicalization
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# All .html URLs should internally rewrite to the scripts/gateway.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-zA-Z]+)\.html$ /scripts/Gateway.php?action=TextAction&section=$1 [NC,L]

[edited by: engine at 7:47 am (utc) on Jul 22, 2010]
[edit reason] member requested domain obfuscation [/edit]

g1smd




 
Msg#: 4171357 posted 2:05 pm on Jul 16, 2010 (gmt 0)

The most important fixes still to do:

1. Replace the (.*) patterns in the THE_REQUEST line with more efficient patterns. There are thousands of prior examples. The pattern usually begins ^[A-Z]{3,8} and ends \ HTTP/ with the bit in the middle designed to match the specific requests.

2. Add the canonical domain name to the redirect target in the first rule. If you don't, then consider what happens for a non-www request for an old URL: the first rule will redirect to the new URL but still at non-www, and then the second rule will add the www. That's a "redirection chain" and is something that should be avoided. It is easily fixed.
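
The redirection chain in point 2 can be counted in a rough Python sketch. This is not Apache code: `respond` and `hops` are hypothetical helpers that only imitate the first two rules, and the simplified `OLD_URL` pattern and the `absolute_target` switch are assumptions made for illustration.

```python
import re
from urllib.parse import urlsplit

# Simplified stand-in for the "old URL" test (an assumption, not the real RewriteCond).
OLD_URL = re.compile(r"^/scripts/Gateway\.php\?(?:[^&]*&)*section=([A-Za-z]+)")

def respond(url, absolute_target):
    """Return the redirect target for url, or None if no redirect rule fires."""
    parts = urlsplit(url)
    path_and_query = parts.path + ("?" + parts.query if parts.query else "")
    m = OLD_URL.match(path_and_query)
    if m:
        # Rule 1: old parameter URL -> clean URL.
        # Without a hostname in the target, the redirect stays on the requested host.
        host = "www.example.com" if absolute_target else parts.netloc
        return f"http://{host}/{m.group(1)}.html"
    if parts.netloc != "www.example.com":
        # Rule 2: canonical hostname redirect.
        return f"http://www.example.com{path_and_query}"
    return None

def hops(url, absolute_target):
    """Follow redirects until none fires and return the list of targets visited."""
    chain = []
    while (target := respond(url, absolute_target)) is not None:
        chain.append(target)
        url = target
    return chain
```

With a non-www request for an old URL, `hops(..., absolute_target=False)` returns two targets (first the clean URL at non-www, then the www version), while `absolute_target=True` returns one.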

StupidScript

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 4171357 posted 9:31 pm on Jul 16, 2010 (gmt 0)

As a precursor to posting my own issues with Rewrite, let's break down the RewriteRule in this thread:

RewriteRule ^(.*)\.html$ /scripts/gateway.php?action=SomeStaticAction&section=$1 [NC]


PATTERN:

^(.*)\.html$

^ = beginning of the string to test
(.*) = any number of any character(s) placed in a match container
\. = escape the period (checks for a period, not 'any character')
html = string 'html'
$ = end of the string to test


This seems to MATCH index.html, thing.html, etc.

This is the PATTERN to seek, yes? Any number of any character(s) at the beginning of the test string followed by a period and the string 'html'.

Then the SUBSTITUTION:

/scripts/gateway.php?action=SomeStaticAction&section=$1 [NC]

/scripts/gateway.php?action=SomeStaticAction&section= = INCLUDE this string
$1 = referencing the contents of the first match container in the PATTERN ((.*))
[NC] = case-insensitive


Seems to say: "Take whatever is in the matching container in the PATTERN and include it at the end of the SUBSTITUTION string."

So the result should be:

PATTERN MATCHED:
index.html

SUBSTITUTION TO PERFORM:
/scripts/gateway.php?action=SomeStaticAction&section=index
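
The same reading can be checked outside Apache with a small Python sketch. It only mimics the regex part of the rule: `internal_rewrite` is a hypothetical helper, `re.IGNORECASE` stands in for [NC], and the input is assumed to be the requested path as .htaccess sees it (no leading slash).

```python
import re

# PATTERN: what the requested path is tested against ([NC] -> IGNORECASE).
PATTERN = re.compile(r"^(.*)\.html$", re.IGNORECASE)
# SUBSTITUTION: where the server fetches content from; \1 plays the role of $1.
SUBSTITUTION = r"/scripts/gateway.php?action=SomeStaticAction&section=\1"

def internal_rewrite(requested_path):
    """Return the internal path for a request, or the path unchanged if the rule does not match."""
    m = PATTERN.match(requested_path)
    if m is None:
        return requested_path
    return m.expand(SUBSTITUTION)
```

So the clean URL is the input (the PATTERN side) and the script path with parameters is the output (the SUBSTITUTION side), which agrees with the breakdown above.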

I am bringing this up because this thread and a few others around town seem to be indicating that the PATTERN to be matched is

/scripts/gateway.php?action=SomeStaticAction&section=$1 [NC]

and the SUBSTITUTION is

^(.*)\.html

"..to create nice URLs like example.com/SomeStaticAction.html"

I see that in subsequent messages, the OP seems to be sending everything into the /scripts/ directory and then rewriting everything back to get a "clean" URI with the second set of instructions, but in the first message it seems like the posted code would NOT result in a "clean" URI. Maybe that was a mistake? Maybe the first post should have ALSO included the second set of instructions? Otherwise, how could the OP have obtained "clean" URIs with what was posted?

I just don't get it.

What is in the "address bar" area before the rewrite?
What is in the "address bar" area after the rewrite?

Can someone please clarify the relationship between PATTERN and SUBSTITUTION? It seems exactly backwards in the first part of this thread, and unaddressed after that.

Thanks.

jdMorgan

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member



 
Msg#: 4171357 posted 12:37 am on Jul 17, 2010 (gmt 0)

I'd suggest:

# externally redirect direct client requests for old URLs to new clean ones
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /scripts/Gateway\.php\?([^&]*&)*section=([A-Za-z]+)(&[^\ ]*)?\ HTTP/
RewriteRule ^scripts/Gateway\.php$ http://www.example.com/%2.html? [R=301,L]
#
# externally redirect non-blank non-canonical hostname requests to canonical www host
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#
# internally rewrite all .html URLs to scripts/gateway.php with filename passed as query parameter
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^([a-z]+)\.html$ /scripts/Gateway.php?action=TextAction&section=$1 [L]
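
As a rough way to try the THE_REQUEST pattern above without a server, here is a Python sketch. The regex is copied from the first RewriteCond; `redirect_target` is a hypothetical helper, and the sample request lines are assumptions.

```python
import re

# The THE_REQUEST pattern from the first RewriteCond; "\ " is just an escaped space.
OLD_URL_REQUEST = re.compile(
    r"^[A-Z]+\ /scripts/Gateway\.php\?([^&]*&)*section=([A-Za-z]+)(&[^\ ]*)?\ HTTP/"
)

def redirect_target(the_request):
    """Return the 301 target for a raw request line, or None if the rule does not fire."""
    m = OLD_URL_REQUEST.match(the_request)
    if m is None:
        return None
    # %2 in the RewriteRule is the second parenthesised group of the RewriteCond.
    # The trailing "?" in the Apache rule drops the old query string, so none is added here.
    return f"http://www.example.com/{m.group(2)}.html"
```

THE_REQUEST holds the request line exactly as the client sent it, so after /about.html has been internally rewritten to the script it still reads "GET /about.html HTTP/1.1". The pattern does not match that line, which is why the redirect does not loop.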

Jim

g1smd




 
Msg#: 4171357 posted 1:36 am on Jul 17, 2010 (gmt 0)

The bit on the left, the pattern, is matched against the GET request sent by the browser.

The server then either
- sends a redirect header back to the browser and the details of a new URL for the browser to use to make a new request for that new URL; or,
- internally rewrites the request to silently fetch the content from a different location inside the server.

So, in the case of a rewrite, the user sees in their URL bar the URL contained in the href of the link they clicked on. The browser asks for that URL, and the server serves some content. The URL seen in the URL bar does not change.

Rewrites do not "make" URLs. URLs are defined in links. A rewrite translates a URL request into an internal server filepath to fetch the required content from.

The code above has a redirect and a rewrite. The redirect has [R=301,L] and the rewrite just [L]. The redirect target is a URL with domain name and path. The rewrite target is purely an internal server path with attached parameters.

That is:

1. website page contains link pointing to www.example.com/somepage. User clicks link. Browser requests "GET /somepage HTTP/1.1" from www.example.com server. Server processes rewrite to fetch content from /folder/index.php?page=somepage inside the server.

2. Search engine results page contains link to old www.example.com/folder/index.php?page=somepage URL. User clicks link. Browser requests "GET /folder/index.php?page=somepage HTTP/1.1" from www.example.com server. Server processes request, generates a 301 status and a "Location: http://www.example.com/somepage" header, and sends both to the browser. Browser makes new request for "GET /somepage HTTP/1.1" and server responds exactly as detailed in step 1 above.

To be clear, Mod_Rewrite does not "make" URLs. It deals with requests coming into the server from the outside world. It then either tells the outside world to go make a different request for another URL (a redirect), or else it goes off and finds the content from inside the server but at a different internal location to that suggested by the path part of the incoming URL request (a rewrite).

It is the internal path pointer that is rewritten. For users to use new URLs, those are the URLs you need to link to in the pages of your site.

The (.*) is problematical. It matches the entire input, and then has to back-off-and-retry multiple times in an attempt to find a match. The initial match was "too greedy". Replacing it with ([^.]+) or similar will parse much faster.
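
A small Python sketch shows what the narrower pattern accepts compared with (.*). It does not measure speed, and `captured` is a hypothetical helper used only for this comparison.

```python
import re

GREEDY = re.compile(r"^(.*)\.html$")      # any characters at all, including dots
NARROW = re.compile(r"^([^.]+)\.html$")   # one or more characters that are not a dot

def captured(pattern, path):
    """Return what the first group captures for path, or None if there is no match."""
    m = pattern.match(path)
    return m.group(1) if m else None
```

Both patterns capture "about" from "about.html". They differ on "a.b.html" and on a bare ".html": the (.*) version accepts both, while the ([^.]+) version rejects them, so the narrower pattern also tightens which URLs get rewritten.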
