Forum Moderators: phranque


$_GET rewrite across all pages

Maintain current functionality of .htaccess

         

Readie

11:48 pm on Mar 17, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm in the process of setting up a completely new website that is as centralised as possible: only index.php is accessible in the root directory, and all the page content lives in PHP includes stored below the web root.

So that I don't have to modify the .htaccess file every time a new page is created (and this site is going to be huge by the time I reach the end of my current "to do" list), I want a standardised approach to the $_GET variables:

page | action1 | action2 | action3 | etc...

A while ago jdMorgan gave me an .htaccess script (below) that 301-redirects requests for URLs with a .php or .html extension to the same URL without the extension, and I don't want to lose that functionality.

Current .htaccess:
DirectoryIndex /index.php
#
Options +FollowSymLinks -MultiViews
AcceptPathInfo off
RewriteEngine on
#
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/?\ ]*/)*[^/.?\ ]+\.(php|html)(\?[^\ ]*)?\ HTTP/
RewriteRule ^(([^/]*/)*[^/.]+)\.(php|html)$ http://www.example.com/$1 [R=301,L]
#
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(([^/]*/)*[^/.]+)$ /$1.php [L]
#
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(([^/]*/)*[^/.]+)$ /$1.html [L]
#
ErrorDocument 400 /index.php?page=error&action1=400
ErrorDocument 401 /index.php?page=error&action1=401
ErrorDocument 403 /index.php?page=error&action1=403
ErrorDocument 404 /index.php?page=error&action1=404
ErrorDocument 500 /error500.html
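A note on why the first redirect tests %{THE_REQUEST}: that variable holds the request line exactly as the client sent it, and internal rewrites do not change it. So when the later rules internally rewrite /about to /about.php, the redirect does not fire a second time and no loop occurs. The PHP sketch below (the function name and sample request lines are made up for illustration) runs the same two patterns against a request line to show which requests get the 301. Like mod_rewrite's default behaviour, it carries any query string over to the redirect target.

```php
<?php
// Sketch only: mirrors the THE_REQUEST condition and the RewriteRule from the
// .htaccess above. "#" is the PHP delimiter, so "/" needs no escaping.
const EXT_REQUEST = '#^[A-Z]+\ /([^/?\ ]*/)*[^/.?\ ]+\.(php|html)(\?[^\ ]*)?\ HTTP/#';
const EXT_PATH    = '#^(([^/]*/)*[^/.]+)\.(php|html)$#';

// Returns the 301 target for a request line such as "GET /about.php HTTP/1.1",
// or null if the redirect would not fire.
function extension_redirect(string $requestLine): ?string
{
    if (!preg_match(EXT_REQUEST, $requestLine)) {
        return null; // e.g. the client asked for /about, not /about.php
    }
    // The second field of "METHOD /path?query HTTP/x.y" is the requested URL.
    $url = explode(' ', $requestLine)[1];
    // In .htaccess, RewriteRule sees the path without its leading slash.
    $path = ltrim((string) parse_url($url, PHP_URL_PATH), '/');
    $query = parse_url($url, PHP_URL_QUERY);
    if (!preg_match(EXT_PATH, $path, $m)) {
        return null;
    }
    // mod_rewrite keeps the original query string unless told otherwise.
    return 'http://www.example.com/' . $m[1] . ($query !== null ? '?' . $query : '');
}
```

For example, 'GET /shop/item.html?id=7 HTTP/1.1' gives http://www.example.com/shop/item?id=7, while 'GET /about HTTP/1.1' (which is also what THE_REQUEST still contains after the internal rewrite to /about.php) gives null.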

Now, I can see a very easy way to do the 301s that remove trailing slashes:
RewriteRule ^(((([^/]+)/[^/]+)/[^/]+)/[^/]+)/$ http://www.example.com/$1 [R=301,L]

But first I'd like someone's opinion on the above. I'm also not sure how to do the actual rewrites, because I keep running into the following problem:

A blanket rewrite with no fixed starting string will break all the image and .css requests, unless there is some way to do something like this (forgive my use of PHP to make the point):
if (!preg_match('/^([^\.]+)\.([a-z0-9]{3,4})$/i', $_SERVER['REQUEST_URI'])) {
// Rewrite
} else {
// Allow
}
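A runnable take on that check, as a sketch (the function name is made up). Two things the one-liner above would trip over: $_SERVER['REQUEST_URI'] includes the query string, and an extension length of {3,4} misses two-letter extensions such as .js. This version looks only at the last path segment and allows 2 to 4 characters:

```php
<?php
// Sketch: true if the request looks like an extensionless "page" URL that
// should go to index.php, false if it looks like a static file (.png, .css, .js ...).
function looks_like_page(string $requestUri): bool
{
    // REQUEST_URI includes any query string, so test only the path.
    $path = (string) parse_url($requestUri, PHP_URL_PATH);
    // Take the last path segment, e.g. "logo.png" from "/images/logo.png".
    $slash = strrpos($path, '/');
    $last = $slash === false ? $path : substr($path, $slash + 1);
    // A 2-4 character alphanumeric extension marks a file request.
    return preg_match('/\.[a-z0-9]{2,4}$/i', $last) !== 1;
}
```

The mod_rewrite patterns in the replies make the same decision before PHP is involved at all, which is cheaper for every image and stylesheet request.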

I also suspect that, unless it is written carefully, a blanket rewrite will clash with the "force-no-extension" redirect.

Any insight on this one would be very much appreciated.

Regards,

Mike Read

g1smd

12:15 am on Mar 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Image file URLs usually have an extension. So do CSS and JS files, as does the robots.txt file etc.

Go for extensionless URLs for your HTML pages. A pattern that matches "contains characters but does not contain a period after the last slash" will find those URLs.

I would number the pages as well as give them a name.

That way, the URL would be www.example.com/42463-some-info-here, and that makes the rewrite rules even easier.

It also makes it easier to redirect calls for www.example.com/index.php?id=42463&title=some-info-here over to the correct URL.

The script needs to issue a 404 error when an invalid ID number is requested. That is vital.

It also needs to issue a 301 redirect to the correct URL if the title part of the URL is not correct for the requested page number ID.

Once that functionality is in place you can post short URLs like example.com/42463 to Twitter too.
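A sketch of the checks g1smd lists, assuming the page titles can be looked up by numeric ID (here a plain array stands in for what would normally be a database query; the function name and sample data are hypothetical). It returns the status code the script should send and, for a 301, the canonical URL:

```php
<?php
// Sketch: $titles maps page ID => canonical title slug (hypothetical data source).
// Returns [HTTP status, redirect URL or null].
function resolve_page(string $path, array $titles): array
{
    // Accept "/42463" or "/42463-some-info-here".
    if (!preg_match('#^/?(\d+)(?:-[a-z0-9-]+)?$#i', $path, $m)) {
        return [404, null];
    }
    $id = (int) $m[1];
    if (!isset($titles[$id])) {
        return [404, null]; // unknown ID: must be a real 404
    }
    $canonical = '/' . $id . '-' . $titles[$id];
    if ('/' . ltrim($path, '/') !== $canonical) {
        // Missing or wrong title (including a bare short URL): 301 to the canonical URL.
        return [301, 'http://www.example.com' . $canonical];
    }
    return [200, null];
}
```

With $titles = [42463 => 'some-info-here'], the short URL /42463 gives [301, 'http://www.example.com/42463-some-info-here'], /42463-some-info-here gives [200, null], and /99999-anything gives [404, null]. The front controller would then send that status (plus a Location header for the 301) before any output.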

jdMorgan

3:22 am on Mar 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




# Redirect to remove trailing slashes from all non-directory URLs without a "file extension"
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(([^/]+/)*[^/.]+)/$ http://www.example.com/$1 [R=301,L]
#
# Rewrite all requested URLs without trailing slashes or file extensions to the index.php script
RewriteRule ^(([^/]+/)*[^/.]+)$ index.php?url-path=$1 [L]

Jim
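To see how Jim's two rules divide up requests, here is a PHP sketch that applies the same two patterns to a path as mod_rewrite sees it in .htaccess (no leading slash). The !-d directory test needs the real filesystem, so it is passed in as a flag; the function name and return strings are just for illustration.

```php
<?php
// Sketch of the two rules above. $path is what RewriteRule sees in .htaccess
// (no leading slash); $isDirectory stands in for the RewriteCond !-d test.
function simulate_rules(string $path, bool $isDirectory = false): string
{
    // Rule 1: trailing slash on a non-directory, extensionless URL -> 301.
    if (!$isDirectory && preg_match('#^(([^/]+/)*[^/.]+)/$#', $path, $m)) {
        return 'redirect 301 http://www.example.com/' . $m[1];
    }
    // Rule 2: no trailing slash, no extension -> internal rewrite to index.php.
    if (preg_match('#^(([^/]+/)*[^/.]+)$#', $path, $m)) {
        return 'rewrite index.php?url-path=' . $m[1];
    }
    // Anything else (files with extensions, index.php itself, "") is untouched.
    return 'pass through';
}
```

For example, news/2010/march/ is redirected to http://www.example.com/news/2010/march, which on the next request is rewritten to index.php?url-path=news/2010/march. Paths such as images/logo.png, the rewritten index.php itself and the empty home-page path fall through untouched, which is what keeps images, CSS and the internal rewrite out of the net.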

Readie

9:03 am on Mar 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thank you for the replies, but I believe I may have been misinterpreted.

The main thing I wanted was a generic rewrite that sends all/url/requests with no file extension to index.php?page=all&action1=url&action2=requests. However, I believe it may be a lot simpler than I thought:

RewriteRule ^([^/]+)/([^/]+)/([^/.]+)$ /index.php?page=$1&action1=$2&action2=$3
RewriteRule ^([^/]+)/([^/.]+)$ /index.php?page=$1&action1=$2
RewriteRule ^([^/.]+)$ /index.php?page=$1

By my logic, the above should work, and I'd only need to change the .htaccess when I need more $_GET variables.

Cheers,

Mike Read
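A quick way to check the three rules above is to apply the same patterns in PHP, since preg_replace understands the same $1-style back-references. This is a sketch, not how Apache evaluates them internally, and the function name is made up. It also shows the limit noted above: a fourth path segment matches none of the rules.

```php
<?php
// Sketch: applies the three RewriteRules above in order and returns the
// internal URL, or null when none match (four or more segments, an extension...).
function apply_three_rules(string $path): ?string
{
    $rules = [
        '#^([^/]+)/([^/]+)/([^/.]+)$#' => '/index.php?page=$1&action1=$2&action2=$3',
        '#^([^/]+)/([^/.]+)$#'         => '/index.php?page=$1&action1=$2',
        '#^([^/.]+)$#'                 => '/index.php?page=$1',
    ];
    foreach ($rules as $pattern => $target) {
        if (preg_match($pattern, $path)) {
            // preg_replace expands $1, $2, $3 much like mod_rewrite does.
            return preg_replace($pattern, $target, $path);
        }
    }
    return null;
}
```

For example, all/url/requests becomes /index.php?page=all&action1=url&action2=requests and contact becomes /index.php?page=contact, while a/b/c/d and images/logo.png return null.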

g1smd

3:23 pm on Mar 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I understood what you wanted, but also chose to give additional detail about other parts of how you might implement your site.

Jim's solution does almost the same as yours and works at any folder depth, but it delivers the page data as a single "folder/folder/page" value rather than in separate variables. It would be up to your script to slice and dice the data to suit.
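The "slice and dice" step could look something like this hypothetical helper, which turns the single url-path value from Jim's rule into the page/action1/action2/... keys from the original post, at any depth:

```php
<?php
// Sketch: split "page/action1/action2/..." into named variables.
// Empty segments (from doubled or trailing slashes) are ignored.
function split_url_path(string $urlPath): array
{
    $segments = array_values(array_filter(explode('/', $urlPath), 'strlen'));
    $vars = [];
    foreach ($segments as $i => $segment) {
        $vars[$i === 0 ? 'page' : 'action' . $i] = $segment;
    }
    return $vars;
}

// In index.php (PHP 7+ for the ?? operator):
// $vars = split_url_path($_GET['url-path'] ?? '');
```

split_url_path('all/url/requests') gives ['page' => 'all', 'action1' => 'url', 'action2' => 'requests'], so the rest of the script can keep using the same names without ever touching the .htaccess again.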

jdMorgan

3:30 pm on Mar 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Looks good, but always add an [L] flag to each and every rule, for efficiency. The cases where you should not use [L] are quite rare, and when you don't need [L], you will likely know why.

For example, you don't need [L] when:
  • Using the [F], [G], or [P] flags. All are documented as being applied immediately, and so have an implied [L] "built-in".
  • Using the [C] (Chain), [N] (Next), or [S] (Skip) flags. Since these all imply a continuation of mod_rewrite execution, [L] is obviously not wanted.
  • In very rare cases, when using multiple consecutive rules to rewrite a URL, you wouldn't use [L]. But mod_rewrite is so powerful that I have never found a need to use more than one rule for any given rewrite -- a very few poorly-coded examples in the Apache documentation notwithstanding.

Jim

Readie

4:31 pm on Mar 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> g1smd

Ahh, I apologise for doubting your understanding, then :)

With regard to your initial post: I'm not 301 redirecting or 404ing - I'm reverting to the home page with an error message explaining why the visitor is there (for an invalid page), or to the highest-level action that is valid, again with an error message explaining why (for an invalid action[x]).
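The "highest-level valid action" fallback described here could be sketched like this, assuming each page's valid actions are described as a nested array (the structure and function name are hypothetical). It keeps the longest valid prefix of the requested actions and reports whether anything was dropped, so the error message can be shown:

```php
<?php
// Sketch: $validActions is a hypothetical nested array, e.g.
// ['orders' => ['view' => [], 'cancel' => []]].
// Returns [longest valid prefix of $requested, whether anything was dropped].
function valid_action_prefix(array $requested, array $validActions): array
{
    $kept = [];
    $node = $validActions;
    foreach ($requested as $action) {
        if (!is_array($node) || !array_key_exists($action, $node)) {
            return [$kept, true]; // stop at the first unknown action
        }
        $kept[] = $action;
        $node = $node[$action];
    }
    return [$kept, false];
}
```

With the example tree in the comment, ['orders', 'refund'] falls back to ['orders'] with the "dropped" flag set, while ['orders', 'view'] is kept as-is.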

With regard to your later post: I misinterpreted Jim's RewriteRule, then. I think I'd find it easier just to deal with separate $_GET variables, however :) I have the functionality I want now, at any rate, and I'm a big fan of "If it ain't broke, don't fix it".

---

>> jdMorgan

Yeah, I was typing it in a rush :) I added the L flags before I uploaded it.

g1smd

5:08 pm on Mar 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"I'm not 301 redirecting or 404ing - I'm reverting to..."


What is the very first HTTP status code that is returned for such a request?

There are two or three ways to 'do this stuff right' and hundreds of other ways to do it and kill your site in the process.

Readie

5:32 pm on Mar 18, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's interpreting everything after example.com/ as $_GET, so the header info is:

[0]=> string(35) "X-Powered-By: PHP/5.2.12-pl0-gentoo"


If they request a URL with a file extension and the file does not exist, then yes, it will 404.

Readie

5:19 pm on Mar 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just a note: I've since changed this to return a 404 for all incorrect pages rather than redirecting to the home page. I have the navigation menu and login form on the 404 page, and I thought it was a bit weird to have some bad requests 404 and some not.
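A minimal sketch of sending a real 404 while still showing the normal layout, assuming a hypothetical list of known page names (a real site might instead check whether the matching below-root include exists). The important part is that the status is set before any output. http_response_code() needs PHP 5.4 or later; on older versions such as the PHP 5.2 shown above, header('HTTP/1.1 404 Not Found') does the same job. Note also that headers_list(), which looks like what produced the dump above, lists headers but not the status line, so a missing status there means PHP's default of 200.

```php
<?php
// Sketch: decide the status for a requested page. $knownPages is hypothetical.
function status_for_page(string $page, array $knownPages): int
{
    return in_array($page, $knownPages, true) ? 200 : 404;
}

// Front-controller step: set the status before any output, then render the
// usual layout (navigation menu, login form) around the page or an error.
function send_page(string $page, array $knownPages): void
{
    $status = status_for_page($page, $knownPages);
    if (!headers_sent()) {
        http_response_code($status); // the status line the client sees first
    }
    $body = $status === 404
        ? 'Sorry, that page does not exist.'
        : 'Content for ' . $page;
    // ... navigation menu and login form would be output here ...
    echo '<p>', htmlspecialchars($body), "</p>\n";
}
```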