I have the following URL:
http://example.com/news/index.php?title=friendly-page-title
and I want to access it via:
http://example.com/news/friendly-page-title
or
http://example.com/news/friendly-page-title/
I have tried putting the following in /news/.htaccess:
Options +FollowSymLinks
RewriteEngine on
RewriteRule /(.*)/$ index.php?title=$1
Can someone put me out of my misery? I've wasted hours on this...
Many thanks for any guidance.
[edited by: jdMorgan at 4:05 pm (utc) on Mar. 3, 2009]
[edit reason] example.com [/edit]
Check recent threads for a whole chunk of information and code.
For starters, remove the leading / from the pattern, and add [L] to the end.
However, there's a whole lot more you will need to do to finish the whole job.
Options +FollowSymlinks
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ index.php?title=$1 [L]
/news/friendly-title
now works, and gets passed as
/news/index.php?title=friendly-title
but two things I can't work out:
1.) if there is a trailing slash, then it gets passed through to the title argument. Is there a way to remove it without php having to look for it?
2.) if I remove the RewriteCond statements then it all fails to work? This I don't understand, as all other examples I have seen don't seem to rely on a RewriteCond preceding the RewriteRule.
For the trailing slash problem set up some redirects before the rewrite.
1. Redirect any request with trailing slash for both www and non-www to remove the trailing slash and force the www to be added if it is missing. You now have a canonical URL for your content.
2. Redirect non-www to www for all other non-www requests.
3. Place the rewrite after these redirects.
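The three steps above might be sketched roughly as follows for a root-level .htaccess. The hostname www.example.com is a placeholder, and the sketch assumes that friendly titles contain no dots and that no real directories under /news/ need to keep their trailing slash.

```apache
RewriteEngine On

# 1. Strip the trailing slash and force www in a single 301 (placeholder hostname)
RewriteRule ^news/([^./]+)/$ http://www.example.com/news/$1 [R=301,L]

# 2. Send every remaining non-www request to the www hostname
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# 3. Only now rewrite the canonical URL internally to the script
RewriteRule ^news/([^./]+)$ /news/index.php?title=$1 [L]
```

Putting the slash redirect first and giving it the full hostname means a request that is wrong in both ways is corrected in one hop rather than two.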
I now have
Options +FollowSymlinks
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^./]+)$ /news/$1/ [L]
RewriteRule ^(.*)/$ /news/index.php?url=$1 [L]
And it appears to work fine... Although I have read up on regex, I still don't fully understand ^([^./]+)$
So, what actually happens is, if a trailing slash is missing then the first RewriteRule fires, ceases any further processing and passes the request (now with the trailing slash) back to apache?
If someone requests the wrong URL, you want to hard redirect them to the correct URL with a 301 redirect. That makes the user agent 'see' a new URL for the content.
Once they come back requesting the correct URL, and only then, fire the rewrite to get that content.
RewriteRule ^(.*)$ index.php?title=$1 [L]
Ask for a URL like example.com/12345 and see what happens.
The rule rewrites this to index.php?title=12345 - so far so good.
The index.php still matches (.*) so gets rewritten to index.php?title=index.php&title=12345
The index.php still matches (.*) so gets rewritten to index.php?title=index.php&title=index.php&title=12345
The index.php still matches (.*) so gets rewritten to index.php?title=index.php&title=index.php&title=index.php&title=12345
... and so on, forever.
That code would also feed requests for robots.txt through to index.php?title=robots.txt and I am quite sure your script would not deliver something that looked like a robots.txt file.
You need the (.*) to be more selective in what can actually trigger the rewrite.
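One way to narrow the pattern, sketched here for the /news/.htaccess file from the question, is to refuse anything containing a dot or a slash. Requests such as robots.txt or index.php then never match, so the rule cannot be applied to its own output. This assumes friendly titles never contain a dot.

```apache
RewriteEngine On

# [^./]+ matches one or more characters that are neither "." nor "/",
# so "friendly-page-title" matches but "index.php" and "robots.txt" do not
RewriteRule ^([^./]+)$ /news/index.php?title=$1 [L]
```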
I thought the [L] would have stopped that but obviously not. I need to understand how the headers bounce back and forth from the browser to the server.
Once a rewrite occurs, and the [L] is encountered, then the modified URL goes where? direct to apache? And then it all happens again - i.e. the rewrite rules are all parsed again?
Sorry to sound thick... I see this is going to be one of linux's steeper learning curves...
I have thought about what you say, and now have this:
Options +FollowSymlinks
RewriteEngine on
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^./]+)$ /news/$1/ [R=301,L] <--- make them go away and get the right URL?
RewriteRule ^(.*)/$ /news/index.php?url=$1 [L] <--- only the trailing slash version gets served up!
Does this pass the sanity test? It seems to work fine.
Steve
A RewriteCond, or a group of conditions, applies only to the single rule that follows it, so that code is not correct. Don't insert a redirect in the middle like that.
You need to place redirects before the pre-existing rewrite code that you already had. As I stated above, you also need to add the domain name to the redirect to force www at the same time. As you have it now, both www and non-www will directly serve content. You need the redirect to force only one of the two to be the place to directly get the content from.
For extensionless URLs you should be redirecting to "without slash". A URL with a slash is supposed to be a folder.
You'll also need the other redirect that I mentioned. Pay attention to the order they need to appear in. See the 1-2-3 list above.
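The point about RewriteCond scope can be illustrated with a short sketch for /news/.htaccess (the hostname is a placeholder). Conditions attach only to the next RewriteRule, so a second rule that needs the same checks has to repeat them.

```apache
# These two conditions guard ONLY the redirect that follows them
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^./]+)/$ http://www.example.com/news/$1 [R=301,L]

# A new rule starts with no conditions attached, so they are repeated here
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^./]+)$ /news/index.php?title=$1 [L]
```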
I now have the following in the /news/.htaccess
Options +FollowSymlinks
RewriteEngine on
RewriteCond %{HTTP_HOST} !^www\.example\.co\.uk [NC]
RewriteRule ^(.*)$ http://www.example.co.uk/news/$1 [R=301,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^./]+)/$ /news/$1 [R=301,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /news/index.php?url=$1 [L]
It appears to be doing what I want... I actually have the canonical name redirection in the virtual root above the news directory. I tried using inherit, but the redirection always ended up pointing at the root, and putting in the physical path... I gave up and just put it in the /news/.htaccess as well.
Do I pass the noobie exam yet?
Options All -Indexes
DirectoryIndex index.php index.htm
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.example\.co\.uk [NC]
RewriteRule ^(.*)$ http://www.example.co.uk/$1 [R=301]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^news/([^./]+)/$ /news/$1 [R=301,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^news/(.*)$ /news/index.php?url=$1 [L]
Hopefully this is the (well, one at least) correct way of doing things - it appears to work fine.
Steve
A good rule of thumb is to put your rules in order with external redirects first, ordered from most-specific pattern (fewest URLs affected) to least-specific pattern, followed by internal rewrites, again from most-specific to least-specific.
Options All -Indexes
DirectoryIndex index.php index.htm
RewriteEngine on
#
# Externally redirect to remove trailing slashes from virtual /news/ URLs
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^news/([^./]+)/$ http://www.example.co.uk/news/$1 [R=301,L]
#
# Externally redirect to canonical hostname
RewriteCond %{HTTP_HOST} !^www\.example\.co\.uk$
RewriteRule (.*) http://www.example.co.uk/$1 [R=301,L]
#
# Internally rewrite /news/ URLs to index.php script with query string
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^news/([^./]+)?$ /news/index.php?url=$1 [L]
Note the minor tweaks: anchoring, hostname casing, the [L] flag on the domain redirect, and the more-specific internal rewrite pattern, which mirrors the pattern used in the external redirect.
It is good to make RewriteRule patterns as specific as possible, especially when using slow, inefficient file- and directory-exists checking (which involve calling the filesystem, and may even require disk accesses). When possible, it is better to use characteristics of the requested URL itself to avoid the necessity of file- and directory-exists checking.
Jim
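As a sketch of that last suggestion: a pattern that excludes dots and slashes cannot match index.php or any file with an extension, so the filesystem checks might be dropped altogether. This assumes that no extensionless files or subdirectories directly under /news/ need to be served as they are; if any do, the conditions are still required.

```apache
# No -f / -d lookups: the shape of the URL alone decides whether to rewrite.
# "news/index.php" contains a dot, so it never matches this rule.
# The group is not optional here, so a bare /news/ request is left to DirectoryIndex.
RewriteRule ^news/([^./]+)$ /news/index.php?url=$1 [L]
```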
The regex ^news/([^./]+)?$ - let me have a go at explaining this to see if I have it right...
^news/ # match anything starting with news/
( )? # everything in the parenthesis either not at all, or only once. Also, whatever is in the parenthesis is the pass back argument $1, used later in the replacement statement
[^./]+ # a character class that says match any character that isn't a dot or a slash. The following + sign means match it one or more times. It took me a while to realise the dot in the character class is a literal, and not a metacharacter!
Do I now get a gold (noob) star?
Steve (aged 41, and rapidly wondering if he's getting too old for this crap!)
once upon a time g1kad on 144 MHz
^news/ # match anything starting with news/
( )? # everything in the parenthesis either not at all, or only once. Also, whatever is in the parenthesis is the pass back argument $1, used later in the replacement statement
Correct. Note that "pass back" is more-formally termed "back-reference."
[^./]+ # a character class that says match any character that isn't a dot or a slash. The following + sign means match it one or more times.
Correct: "Match one or more characters not equal to a period or a slash."
It took me a while to realise the dot in the character class is a literal, and not a metacharacter!
Escaping rules vary within and outside [alternate-character groups], with fewer characters needing to be escaped within groups than without.
Jim
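A small sketch of that escaping difference, using a hypothetical rule that maps /news/some-title.html to the script (the .html suffix is invented for illustration and is not part of the setup in this thread):

```apache
# Inside [...] the dot is a literal and needs no backslash.
# Outside the brackets an unescaped dot would match any character,
# so the dot before "html" is written as \.
RewriteRule ^news/([^./]+)\.html$ /news/index.php?url=$1 [L]
```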