Forum Moderators: phranque

Help redirecting to www, trailing slash & pretty urls

         

Komoti

4:56 am on Oct 15, 2009 (gmt 0)

10+ Year Member



Hey everyone, first post hopefully more to come ;D

I've got quite a mess going on with my .htaccess. The structure I'm after for pages in a directory should look like [domain.com...]. Root level pages should look like [domain.com...] and direct requests look like [domain.com...]

There are three goals I'm looking for.

#1: Redirect all non-www requests to www like: [domain.com...] -> [(www).domain.com...]

#2: Redirect all pages to a trailing slash like: [domain.com(...] -> [domain.com...]

#3: Redirect ALL requests except (css|jpg|gif|js) to index.php and rewrite urls to [domain.com...] for categories/root level pages and [domain.com...] for individual pages inside categories.

Basically I need to have WWW added, trailing slash added and redirect everything to index.php for bootstrapping.

It seems that every time I add another rule I screw up an existing one. So far the following works for redirecting everything to index.php (even though I've got no idea why ;D)

RewriteEngine On

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !.*\.(css|jpg|gif|js)

RewriteRule ^([^/]*)/([^/]*)\/$ /?folder=$1&file=$2 [L]


This allows [domain.com...] to be accessed but without a trailing slash it turns up a 404 when I need it to 301 to the slash version.

One last note is this is for a custom CMS and I'd like to keep the structure generic instead of listing a specific domain. [%{HTTP_HOST}...] worked to some degree.

If anyone can help me sort this out I'd appreciate it.

g1smd

7:34 am on Oct 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Redirect ALL requests except (css|jpg|gif|js) to index.php

Do you really mean 'redirect' all URL requests to the new URL /index.php or do you want to 'rewrite' incoming URL requests to the internal filepath '/index.php'?

There is a huge difference between those two things.

You also talk about 'rewriting to new URLs'.

I think your understanding of a rewrite is exactly backwards. A rewrite does not 'make' a new URL. A rewrite reacts to an incoming URL request and looks for a different file on the server compared to that suggested by the path part of the URL request. The URL requested by the browser will be the one that was clicked on in the browser window. Therefore, to make new URLs you need to edit the links on the page so that they show these new URLs.

[edited by: g1smd at 7:37 am (utc) on Oct. 15, 2009]

Komoti

7:36 am on Oct 15, 2009 (gmt 0)

10+ Year Member




Yeah I think I was getting confused there. I want all URLs (as visible in the browser) to actually load one file, index.php. All links on the site are set up to reference "static looking URLs" so I don't really need any part of the visible browser URL to change. Thanks fer the clarification ;D

--------------
Well after 5 hours I finally had a bit of success by finding another thread on here.


# Turn On Rewrite Engine #
RewriteEngine On

# Redirect To Add Trailing Slash If Missing #
RewriteCond %{REQUEST_URI} ^/[^\.]+[^/]$
RewriteRule ^(.*)$ http://%{HTTP_HOST}/$1/ [R=301,L]

# Redirect To Add WWW To Domain If Missing #
RewriteCond %{HTTP_HOST} !^www\.
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]

This works for the first two problems I had.
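With the rules in that order, a request that is missing both the www and the trailing slash gets two 301s in a row (slash first, then www). The two can be folded into a single hop. This is only a sketch, assuming Apache 2.x (PCRE, so the non-capturing group works), plain http, and a Host header with no port number or trailing dot; the slash pattern is a slightly tightened variant of the one above, not the original.

```apache
RewriteEngine On

# Capture the hostname without any leading "www." into %1,
# so the rule stays generic and names no specific domain.
RewriteCond %{HTTP_HOST} ^(?:www\.)?(.+)$ [NC]
# Extensionless path with no trailing slash:
# add the slash and force www in one 301.
RewriteRule ^([^.]*[^./])$ http://www.%1/$1/ [R=301,L]

# Everything else: path is already fine, only the www is missing.
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]
```

Any query string on the original request is carried over to the redirect target automatically, because neither substitution contains a "?".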

The third has to do with my lack of understanding how the behind the scenes part works.

I'm using PHP to explode the request_uri and pull out what's needed by the database to display the data. For example, a user types in or goes to http://www.example.com/folder/file-name/. PHP pulls out $folder and $file-name as variables to match against the database; if there's a match it displays the data, otherwise it sends a 404 header.

The current .htaccess I'm using for this is:


RewriteCond %{REQUEST_URI} !.*\.(css|jpg|gif|js)
RewriteRule ^([^/]*)/([^/]*)\/$ /index.php?folder=$1&file=$2 [L]

The data isn't accessible at http://www.example.com/index.php?folder=category&file=file-name with this method, since that URL doesn't match what PHP requires to pull the content from the database, so PHP sends the 404 header.

My question is this: Is there any problem with this method? Will a search engine see 404s in any way when all the links look static? For example, with this code, would a search engine that saw a link (http://www.example.com/category/file-name/) actually be viewing (http://www.example.com/index.php?folder=category&file=file-name) on every URL, with a 404 status due to the RewriteRule ^([^/]*)/([^/]*)\/$ /index.php?folder=$1&file=$2? Or am I just backwards? (It has been a long day)

I did a headers check and it outputs 200 only on the correct links so I'm assuming I'm not too far off.

--------
I just found out I can remove ?folder=$1&file=$2 from the rewrite rule and just use /index.php since I didn't need those variables.
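With the query string dropped, the third goal reduces to a plain internal rewrite. A minimal sketch, assuming the URL shapes described above (one segment for categories and root level pages, two for pages inside a category, always with a trailing slash) and assuming index.php sits in the same directory as the .htaccess file:

```apache
RewriteEngine On

# Leave real files and real directories alone.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Leave static assets alone (extension anchored to the end of the path).
RewriteCond %{REQUEST_URI} !\.(css|jpg|gif|js)$
# Internal rewrite only: there is no R flag and no hostname in the
# substitution, so the browser never sees /index.php.
# Matches "category/" and "category/file-name/".
RewriteRule ^([^/]+/)?[^/]+/$ /index.php [L]
```

PHP still finds the originally requested path in $_SERVER['REQUEST_URI'], so the explode() approach keeps working. This block belongs after the redirect rules, so that non-canonical requests are redirected before anything is handed to index.php.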

jdMorgan

1:01 pm on Oct 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I believe that the root of your question goes to the same issue as that brought up by g1smd -- The difference between a redirect and a rewrite -- or equivalently, the difference between a URL and a filepath.

An external redirect is part of an HTTP transaction. When the client requests URL 'A', the server may send a redirect response that tells the client that the resource has moved to URL 'B'. This ends the current HTTP transaction initiated by the client, and nothing more need happen.

However, most times the client will issue a second HTTP request as a result of the server's redirect response, asking again for the resource it wanted, but this time using the new URL 'B'. Therefore, a redirect may be viewed as a URL-to-URL translation function of the server.

By contrast, an internal rewrite happens entirely within the context of a single HTTP transaction, and is simply a modified URL-to-filepath translation. If properly implemented (as a rewrite, not a redirect), the client is totally unaware of this modified URL-to-filepath translation. The client is no more aware of mod_rewrite's internal rewrites than it is of any of the other behind-the-scenes URL-to-filepath 'rewriting' that a server does as part of its fundamental function of accepting URLs and serving files. The server must always 'rewrite' the URL to a filepath whether mod_rewrite is used or not... Otherwise, your URLs published as links on your pages would have to be full server filepaths, which would make the Web a big mess, and would require you to update your site every time your host added a hard drive or re-arranged the servers' directory structures to allow easier maintenance, change user filespace partitioning, etc.

---

In the absence of any 'cloaking' on your part, search engines see exactly the same thing on your site that a browser sees. Use a server headers checker to test your site to be sure that HTTP responses are correct and appropriate.

---

Because most Webmasters don't ever code a redirect or rewrite, there's a lot of confusion around this subject. What is essential to remember is that putting a link on a Web page <a href="xyz"> creates and defines a URL. Once a link is published on the Web, that URL 'exists' whether it resolves to an existing resource on an existing domain or not. In fact you could argue that all URLs always exist, since you can type in any URL in your browser and get some kind of response from "somewhere."

By contrast, files are created by putting them into the filespace on a server. But files aren't URLs and URLs aren't files. Think of your current set-up: most of your URLs do not correspond directly to files or pages. Instead a URL-request for an "HTML page" is intercepted and passed to a PHP script that creates the HTML page on the fly and sends it back to the client. So in fact, there is never a 'file' on your server that contains the 'page' that was requested by using the URL -- there is no file that corresponds directly to those requested URLs. So URL-to-filepath mapping is a fundamental part of a server's function.

---

For your current purposes, then, you need to understand the function and syntax of two of mod_rewrite's main functions, namely external redirects and internal rewrites.


# Externally redirect client requests for URL "example.com/abc" to URL "example.com/xyz"
RewriteRule ^abc$ http://example.com/xyz [R=301,L]
#
# Internally rewrite client requests for URL "example.com/xyz" to filepath "/pqr"
RewriteRule ^xyz$ /pqr [L]

[input error; Post too long. Terminated by poster]

Jim