
Apache Web Server Forum

This 39 message thread spans 2 pages; this is page 2.
SEO friendly URLs
JamieEff




msg:4464352
 1:05 pm on Jun 12, 2012 (gmt 0)

Hi there

Newbie to the forum here (and to mod_rewrite as well!)

Basically I am trying to get blog urls re-written into something a little more acceptable to the search engines.

The url that is generated by the system is something like this:

http://www.example.com/blog-article.php?zzBlog=4 (where the 4 is the unique ID for a specific article)

I am looking to re-write it to look something like this:

http://www.example.com/blog/4.html

I have written the rules below to do the rewrite:

RewriteEngine On
RewriteRule ^blog/([^-]*)\.html$ /blog-article.php?zzBlog=$1 [L]


but it doesn't seem to be working.

The .htaccess file is uploaded and in the root of the server

Any suggestions gratefully received.

Many thanks

Jamie
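
For reference, a minimal sketch of the two-rule setup this kind of question usually ends up with. It assumes the article ID is always numeric, that the rules live in the .htaccess file in the document root (so patterns carry no leading slash), and that http://www.example.com is the canonical host; none of that is confirmed by the post above.

```apache
RewriteEngine On

# External redirect: a direct client request for the old URL is sent to the new one.
# THE_REQUEST holds the original request line, so the internal rewrite below
# cannot re-trigger this rule and cause a loop.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /blog-article\.php\?zzBlog=([0-9]+)\ HTTP/
RewriteRule ^blog-article\.php$ http://www.example.com/blog/%1.html? [R=301,L]

# Internal rewrite: /blog/4.html is served by /blog-article.php?zzBlog=4
RewriteRule ^blog/([0-9]+)\.html$ /blog-article.php?zzBlog=$1 [L]
```

The trailing ? on the redirect target drops the original query string from the new URL.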

 

jasimon9




msg:4539714
 3:06 am on Jan 28, 2013 (gmt 0)

Thanks again for the pointers. Yes, I see that the %1 on the second RewriteRule is not needed.

I know you have several times said to include the protocol and hostname on the redirect. However at [httpd.apache.org...] they say the following:

Use of the [R] flag causes a HTTP redirect to be issued to the browser. If a fully-qualified URL is specified (that is, including [servername...] then a redirect will be issued to that location. Otherwise, the current protocol, servername, and port number will be used to generate the URL sent with the redirect.


So I felt that following that would be cleaner. Perhaps you can explain why it is a better practice to include them.

The code I developed is only a prototype, and I either need to be very careful about the ordering of the parameters or, better yet, use a more robust regex as you have suggested.

lucy24




msg:4539726
 4:19 am on Jan 28, 2013 (gmt 0)

Put a blank line after each Rule.

Oops, yes.

After.
After.
After.

Otherwise, the current protocol, servername, and port number will be used to generate the URL sent with the redirect.

... and that's exactly why you need to spell it out. Otherwise you get:

user asks for
www.example.com/oldname
is redirected to
www.example.com/newname

user asks for
example.com/oldname
is redirected to
example.com/newname

user asks for
www.example.com:1234/oldname
is redirected to
www.example.com:1234/newname

... and then you have to redirect them all over again to get your hostname canonicalized in your final RewriteRule. The human user won't notice-- unless they are on a very very slow connection-- but search engines get huffy if there is more than one redirect. And, of course, your server has to do extra work because it ends up processing three requests instead of two.
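
A sketch of the difference, assuming .htaccess context (no leading slash in the pattern) and assuming http://www.example.com is the canonical protocol and hostname; /oldname and /newname are the placeholder paths from the examples above.

```apache
# Path-only target: Apache reuses whatever protocol, host and port the request
# arrived on, so example.com/oldname lands on example.com/newname and still
# needs a second redirect to reach the canonical host.
# RewriteRule ^oldname$ /newname [R=301,L]

# Fully-qualified target: every variant reaches the canonical URL in one hop.
RewriteRule ^oldname$ http://www.example.com/newname [R=301,L]

# General hostname canonicalization, placed after the specific redirects.
# It fires only for requests that none of the rules above already redirected.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```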

jasimon9




msg:4540066
 12:42 am on Jan 29, 2013 (gmt 0)

I previously replied to the most recent posts from lucy24 and g1smd, but I don't see that my post got properly submitted to this page. In short, I commented that I agreed that the %1 in the second RewriteRule was wrong.

I also posted a quote from the Apache documentation saying that the protocol and host default to those of the current request, and was wondering whether it is nevertheless better practice to include them. From page [httpd.apache.org...]

Use of the [R] flag causes a HTTP redirect to be issued to the browser. If a fully-qualified URL is specified (that is, including [servername...] then a redirect will be issued to that location. Otherwise, the current protocol, servername, and port number will be used to generate the URL sent with the redirect.

g1smd: Thanks for the additional insight into the regex fine points. Fortunately, all of the query string parameters occur in a fixed order, and in addition they work as flags. That is, the name=value pair as a whole is the flag; the value never varies. So my simpler regex could work. Nevertheless, I believe it is wise to make the regex more robust, so that it does not depend on assumptions about the current implementation.

jasimon9




msg:4540067
 12:44 am on Jan 29, 2013 (gmt 0)

Now my previous post is showing, and I missed some of the intermediate posts! You have already answered the question that I reposted. Sorry.

jasimon9




msg:4540084
 3:32 am on Jan 29, 2013 (gmt 0)

One additional question: I had found previously that part of the problem had to do with SSL. I since modified httpd-ssl.conf to contain all the same rewrite rules as in httpd-vhosts.conf. Is this the right way to do that?

I first thought, why not? Then I thought, "no, only the SSL related redirects should be in httpd-ssl.conf". Then I thought, but once the browser goes into SSL, it stays in SSL and the rules in httpd-ssl.conf would be needed. So I put them all back, and essentially now they duplicate the rules in httpd-vhosts.conf for the related host.

Is this right?
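
One possible way to avoid keeping two copies in step, sketched under assumptions: put the shared rules in a single file and pull it into both virtual hosts with Include. The file name conf/extra/rewrite-common.conf and the ServerName are hypothetical, and the existing SSL directives are omitted. mod_rewrite configuration is not inherited by virtual hosts by default, so each host needs its own RewriteEngine On.

```apache
# httpd-vhosts.conf (plain HTTP host)
<VirtualHost *:80>
    ServerName www.example.com
    RewriteEngine On
    # Hypothetical shared file holding the rules common to both hosts
    Include conf/extra/rewrite-common.conf
</VirtualHost>

# httpd-ssl.conf (HTTPS host; SSLEngine and certificate directives omitted)
<VirtualHost _default_:443>
    ServerName www.example.com
    RewriteEngine On
    Include conf/extra/rewrite-common.conf
    # Any SSL-only rules would follow here
</VirtualHost>
```

Any rule that hard-codes the protocol in its redirect target would still need a per-host variant.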

jasimon9




msg:4540448
 4:59 am on Jan 30, 2013 (gmt 0)

Here are the prototype rules I have come up with. They seem to be working as I want, with one exception, as described below. But I am thinking maybe I can live with that. The rules assume a page called redirect.php. This page simply displays the query string parameters that it is passed, for testing.

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /redirect\.php\?one=(\w+)(?:&two=(\w+)(?:&three=(\w+)(?:&four=(\w+)(?:&five=(\w+))?)?)?)?\ HTTP/
RewriteRule ^redirect\.php$ /redirected/%1/%2/%3/%4/%5? [R=301,L]

RewriteRule ^redirected/(\w+)(?:/(\w+)(?:/(\w+)(?:/(\w+)(?:/(\w+))?)?)?)?/?$ /redirect.php?one=$1&two=$2&three=$3&four=$4&five=$5 [L]

Here is what the rules do:

1. The pretty URL looks like this: redirected/a/b/c/d/e. The first parameter "a" is mandatory, but any of the trailing parameters are optional. Parameters cannot be skipped, however. Another way of saying this is that any part of the "path" beyond the first level can be dropped.

2. If the "old" ugly url is used, likewise, any of the trailing parameters can be omitted in the same way.

3. Although the page works properly when parameters are omitted from the ugly url, there are "directories" named with the null-string. For example, when the url is redirect.php?one=a&two=b , then the rewritten url that appears in the address bar is /redirected/a/b///. This is because the first rewrite rule has 5 hard-coded slashes.

While this looks unusual, I am thinking this is not a problem for the following reasons:

1. I have done some testing at well-known websites, and it appears that adding null-string named directories, i.e., a couple of slashes, at the end of a url does not bother anything.

2. This would only occur with ugly urls coming from outside my website, which redirect to the actual page with a valid query string.

3. The problem does not occur at all with the pretty url that omits a portion of the "path".

I am wondering if this issue is merely a cosmetic alteration to the address bar, or there is something else that a server, browser, or search engine might object to.

g1smd




msg:4540494
 7:58 am on Jan 30, 2013 (gmt 0)

I am wondering if this issue is merely a cosmetic alteration to the address bar

No. The browser address bar shows the URL being requested, and it's malformed.

There are several ways to tackle this.
The most obvious is to have 5 rulesets: one for a, another for a and b, another for a and b and c, ...
The other way is to not redirect in htaccess at all, but instead rewrite (that's rewrite, not redirect) the requests to a PHP script that works out what the new URL will be and then uses the PHP header() function to redirect the browser to the new URL. When you do this, the rule needs to be high up the list of rules in the htaccess file and you need to add the PHP filename as an exclusion to the non-www/www redirect; otherwise non-www requests with parameters get re-exposed with parameters at www instead of redirecting to the friendly URL.

It doesn't matter which method you use: htaccess redirect or PHP header() redirect after internal rewrite. As long as asking for a URL with parameters returns either 404 because it's invalid or 301 to the new URL, the process is sound.
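
The first option could be sketched as below, reusing the parameter names from the posted prototype. The http://www.example.com target is an assumed canonical host, and the patterns assume the same per-directory context as the original rules. Each condition is anchored on the trailing "HTTP/", so it matches exactly one parameter count and no empty path segments are emitted.

```apache
# Five parameters
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /redirect\.php\?one=(\w+)&two=(\w+)&three=(\w+)&four=(\w+)&five=(\w+)\ HTTP/
RewriteRule ^redirect\.php$ http://www.example.com/redirected/%1/%2/%3/%4/%5? [R=301,L]

# Four parameters
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /redirect\.php\?one=(\w+)&two=(\w+)&three=(\w+)&four=(\w+)\ HTTP/
RewriteRule ^redirect\.php$ http://www.example.com/redirected/%1/%2/%3/%4? [R=301,L]

# Three parameters
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /redirect\.php\?one=(\w+)&two=(\w+)&three=(\w+)\ HTTP/
RewriteRule ^redirect\.php$ http://www.example.com/redirected/%1/%2/%3? [R=301,L]

# Two parameters
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /redirect\.php\?one=(\w+)&two=(\w+)\ HTTP/
RewriteRule ^redirect\.php$ http://www.example.com/redirected/%1/%2? [R=301,L]

# One parameter
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /redirect\.php\?one=(\w+)\ HTTP/
RewriteRule ^redirect\.php$ http://www.example.com/redirected/%1? [R=301,L]
```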

lucy24




msg:4540574
 1:51 pm on Jan 30, 2013 (gmt 0)

(?:/(\w+)(?:/(\w+)(?:/(\w+)(?:/(\w+))?)?)?)?/?

Uhm, you don't actually have to do that. Just
(?:/(\w+))?
five times in a row. And the same thing, mutatis mutandis, for the "pretty" version. mod_rewrite will pick them up one at a time, left to right, and stop when it runs out. Same number of parentheses, same number of question marks. But leaving out the multi-nesting means less potential confusion for you and less strain on the server. (For a given definition of "strain", of course.)

In any case I gotta say that's a masterful display of how to use capturing and non-capturing groups :)
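
Spelled out, the flat form of both rules might look like this sketch. It keeps the first parameter mandatory, as in the posted prototype, so the optional group appears four times after it.

```apache
# Redirect: ugly URL to pretty URL (still emits empty segments when
# trailing parameters are missing, exactly like the nested version)
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /redirect\.php\?one=(\w+)(?:&two=(\w+))?(?:&three=(\w+))?(?:&four=(\w+))?(?:&five=(\w+))?\ HTTP/
RewriteRule ^redirect\.php$ /redirected/%1/%2/%3/%4/%5? [R=301,L]

# Internal rewrite: pretty URL to the real script
RewriteRule ^redirected/(\w+)(?:/(\w+))?(?:/(\w+))?(?:/(\w+))?(?:/(\w+))?/?$ /redirect.php?one=$1&two=$2&three=$3&four=$4&five=$5 [L]
```

One difference to be aware of: for the path version the flat and nested patterns accept the same URLs, because every segment has the same shape. For the query-string version the flat pattern also accepts a request that skips a middle parameter (one=a&three=c), which the nested pattern rejects.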

jasimon9




msg:4540806
 3:11 am on Jan 31, 2013 (gmt 0)

Thanks again to g1smd and lucy!

My first attempts at this approach did not use the nested groups, but I switched to the more complex approach in the hope that somehow I could eliminate the slashes signifying the "empty directories". But I could not find a way to do that, and neglected to revert to the non-nested approach. Which reversion I have now done.

Bottom line is that each slash has to come from somewhere not in the back-reference; thus the nested approach gives no benefit over the simpler approach.

The present internal rewrite handles the variable number of parameters just fine. Thus all links from within the website can be handled. The problem arises only in the redirect, which at present has to insert the slashes. After some thought, I am inclined to set up multiple rules to handle the different variations that actually occur, as suggested by g1smd.

Another approach might be to put in a rule to fill out the missing parameters when they don't occur. However, this would probably end up being at least as complicated as having separate rules for the cases.
