My first rewriterules! Help please

rewriterule

         

ozbodd

3:48 pm on Mar 5, 2009 (gmt 0)


This is my code

RewriteEngine On

RewriteRule ^([A-Za-z0-9-]+)/([A-Za-z0-9-]+)?$ http://www.example.co.uk/index.php?lmenu=$2&brand=$1 [L]

This works fine for links like
<a href="catimini/15"></a>
to
http://www.example.co.uk/index.php?lmenu=15&brand=catimini

It fails though if the brand has a space in its path for example

<a href="rip curl/16"></a>
doesn't convert to
http://www.example.co.uk/index.php?lmenu=16&brand=rip curl

I have read quite a bit, but I must admit this is taking longer than usual for the penny to drop! Can anyone point me in the right direction (in as plain English as possible, please) to handle the spaces in the querystrings?

Also, a question on link paths: is it better to use absolute paths or relative paths in links to internal pages?

[edited by: jdMorgan at 5:00 pm (utc) on Mar. 5, 2009]
[edit reason] example.co.uk [/edit]

jdMorgan

4:59 pm on Mar 5, 2009 (gmt 0)


Your RewriteRule syntax invokes an external redirect, telling the client to ask again for the content that it wanted, using a new URL. Further, it defaults to a 302-Found redirect.

Neither of these is likely what you wanted.
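
To make the distinction concrete, here is a minimal sketch of the two forms, written with explicit flags so the behaviour does not depend on defaults. The pattern is the one from the original rule; placing the rules in a .htaccess file in the document root is an assumption. Note that, per the Apache mod_rewrite documentation, an absolute URL whose hostname matches the current server may be treated as an internal rewrite, so whether the original rule actually redirects can depend on the server's configuration.

```apache
RewriteEngine On

# Form 1 (external redirect): the client is told to request the new URL,
# so the script URL appears in the browser's address bar.
# A bare R flag would send a 302; R=301 makes the redirect permanent.
# Shown commented out, because only one of the two forms should be active.
# RewriteRule ^([A-Za-z0-9-]+)/([A-Za-z0-9-]+)?$ http://www.example.co.uk/index.php?lmenu=$2&brand=$1 [R=301,L]

# Form 2 (internal rewrite): the server serves index.php behind the scenes
# while the client keeps seeing the friendly URL.
RewriteRule ^([A-Za-z0-9-]+)/([A-Za-z0-9-]+)?$ /index.php?lmenu=$2&brand=$1 [L]
```

For friendly URLs, the second form is usually the one wanted: the substitution is a local URL-path, so nothing is sent back to the client.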

You can save space in your pattern, and speed up the rule execution by 25% or so, simply by using the [NC] flag on the rule, to make the pattern-matching case-insensitive.

In order to match a space, you must include the space in your [alternate character group]. And in order to avoid the mod_rewrite parser throwing an error, you must escape that space by preceding it with a backslash.

Finally, in order to prevent future problems due to server upgrades and differences in the regex libraries provided by various servers' operating systems, it's a good idea to also escape literal hyphens in regex patterns to distinguish them from the character range operator (e.g. the hyphen in a-z):


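RewriteEngine on
#
# Internally rewrite /brand/menu-id to the script; [NC] makes the match case-insensitive.
# The hyphen and the space inside each character class are escaped with a backslash.
RewriteRule ^([a-z0-9\-\ ]+)/([a-z0-9\-\ ]+)?$ /index.php?lmenu=$2&brand=$1 [NC,L]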

Having fixed these problems, you may also want to consider redirecting direct client requests (only) for the dynamic URL back to the new friendly static URL. This prevents duplicate-content problems (the same content available at two or more URLs), and speeds up search engines' replacement of the old URLs with the new.
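
A minimal sketch of such a redirect, assuming the rules live in a .htaccess file in the document root and that the dynamic URLs always carry lmenu first and brand second (the parameter order is an assumption). Testing THE_REQUEST, the request line exactly as the client sent it, keeps the redirect from firing on requests that were internally rewritten, which would otherwise cause a loop.

```apache
RewriteEngine On

# Externally redirect direct client requests for the dynamic URL.
# THE_REQUEST holds the original request line, for example:
#   GET /index.php?lmenu=15&brand=catimini HTTP/1.1
# so it is not affected by the internal rewrite further down.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /index\.php\?lmenu=([^&\ ]*)&brand=([^&\ ]+)\ HTTP/
# %1 = lmenu, %2 = brand; the trailing "?" drops the old query string.
# NE leaves escapes such as %20 in the brand as they are instead of re-encoding them.
RewriteRule ^index\.php$ http://www.example.co.uk/%2/%1? [NE,R=301,L]

# Internally rewrite the friendly URL to the script.
RewriteRule ^([a-z0-9\-\ ]+)/([a-z0-9\-\ ]+)?$ /index.php?lmenu=$2&brand=$1 [NC,L]
```

The redirect is placed first so that it is evaluated before the internal rewrite. Brand values containing characters outside the friendly-URL pattern would need the pattern widened to match.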

There are three basic forms of linking: Page-relative, server-relative, and canonical (loosely-termed "absolute"). Which of these you use depends on several factors: Test environment, server-configuration canonicalization support, and personal preference.

If you are testing on a PC and don't have a test server running on that PC, then using relative links preserves your ability to test your site without a server.

If you do full subdomain, domain, FQDN, port number, URL-path, and URL-fragment (named anchor) canonicalization in the server configuration (e.g. in httpd.conf or .htaccess), then there is no search-related need to include longer server-relative or canonical links on your pages. If not, then you should consider using these longer forms to prevent duplicate content from arising, for example, if a search engine indexes your whole site starting with "example.co.uk" instead of the canonical "www.example.co.uk". Lacking forced canonicalization in your server config, this would result in two "copies" of your site, one at example.co.uk and one at www.example.co.uk, both with the exact same content. The pages would essentially compete with each other for ranking, you'd end up with links to the non-canonical domain, and over time, your canonical pages would lose ranking power to the non-canonical ones.

Major search engines have begun to do back-end processing and have recently added an HTML element (the rel="canonical" link element) to address this problem, but the fact remains that using these band-aid approaches introduces an external dependency of your site on outside parties to "get it right." When such problems can be corrected --and in fact prevented entirely-- by proper server configuration, there's little reason to rely on the kindness of strangers for your site's success...

So, you might want to look into the various threads here in the forum and in our Apache Forum Library to see the code snippets needed to do a thorough job of forcing canonical URLs, so that any given page on your site can be reached with one and only one unique URL, and all other valid-but-non-canonical requests result in a 301-Moved Permanently redirect to the canonical URL.
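
As a starting point, here is a minimal sketch of the hostname part of that job only, assuming a .htaccess file in the document root and www.example.co.uk as the canonical hostname. It does not cover the other canonicalization cases listed above.

```apache
RewriteEngine On

# If the requested hostname is anything other than the canonical one
# (and the Host header is not empty, as it can be with HTTP/1.0 clients) ...
RewriteCond %{HTTP_HOST} !^www\.example\.co\.uk$ [NC]
RewriteCond %{HTTP_HOST} !^$
# ... send a 301 redirect to the same path on the canonical hostname.
RewriteRule ^(.*)$ http://www.example.co.uk/$1 [R=301,L]
```

A request for http://example.co.uk/catimini/15 would then be redirected to http://www.example.co.uk/catimini/15. External redirects like this one are normally placed ahead of any internal rewrites.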

One other factor is involved: That of content-scrapers who copy your whole site (usually to slap ads all over it and to steal your traffic). Using canonical links can help here, as they will have to edit all your pages or scripts to change those links to work on their domain. However, sometimes a compromise is appropriate: You can use relative links to everything except for the home page, and use a canonical link for that. This allows you to test most links on a PC without running a server, while still saving lots of bytes in your other links. The degree to which you use canonical links for this application also depends on how well you 'defend' your site against content-scrapers; if you have good battlements around your site and armor on your pages, canonical links as a defense against scrapers may not be needed at all.

All that said, it still comes down to a matter of personal preference.

Jim

jdMorgan

5:03 pm on Mar 5, 2009 (gmt 0)


Forgot the examples...

<a href="images/logo.gif"> Page-relative link
<a href="/images/logo.gif"> Server-relative link
<a href="http://www.example.com/images/logo.gif"> Canonical link

Jim

ozbodd

8:36 pm on Mar 8, 2009 (gmt 0)


Thank you for the reply. I'll try to digest that; it looks very comprehensive. Thanks again.

ozbodd

8:55 pm on Mar 8, 2009 (gmt 0)


First observations: your code breaks the stylesheet path and any second-level hyperlinks, though I suspect this is something to do with the path from root? (/index.php/catimini/) If I put back the http://www.example.com/index.php... path, it works with the spaces and doesn't affect the next level.

To be honest, I am not 100% sure why I have to do this or to what level. I believe this improves the site's ranking in Google by providing better paths?

I will continue to explore this black magic ;o) but if there is a simple fix for those paths I would appreciate the heads up.

Cheers again
Steve

jdMorgan

9:24 pm on Mar 8, 2009 (gmt 0)


Erm... That's *your* code, not mine. The modification I provided did exactly what you asked for by recognizing spaces in the requested URL-path. It's up to you to get it working.

If you are now having problems with CSS and other URL-paths containing spaces, then you either need to explicitly exclude those paths from being rewritten by the rule, link to those resources using server-relative or canonical URLs (as opposed to page-relative URLs), or further refine your requirements; the code does exactly what you write it to do, not necessarily what you want...
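
As a sketch of the first option, the rule can be preceded by conditions that leave real files and known resource directories alone. The css, images, and js directory names are hypothetical; substitute whatever the site actually uses. Keep in mind that with an internal rewrite the browser still regards /catimini/15 as the page's URL, so a page-relative href such as style.css is requested as /catimini/style.css. Exclusions do not fix that case, whereas a server-relative link such as /style.css does.

```apache
RewriteEngine On

# Skip requests under directories that hold static resources
# (the directory names are hypothetical).
RewriteCond %{REQUEST_URI} !^/(css|images|js)/
# Skip anything that exists on disk as a real file or directory.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([a-z0-9\-\ ]+)/([a-z0-9\-\ ]+)?$ /index.php?lmenu=$2&brand=$1 [NC,L]
```

All three conditions must be true for the rule to apply, so a request for an existing stylesheet or image is served directly instead of being handed to index.php.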

Jim

ozbodd

9:41 pm on Mar 10, 2009 (gmt 0)


Yes I understand.

What I didn't (and still don't) get is why, when I had the full path, it only affected first-level links, but when I used your modification without the full path, it then affected second- and third-level links, breaking the stylesheet. I do not understand how such a small change caused that.

I will continue to explore; I do not find this very intuitive at all.

Cheers
Steve

g1smd

9:53 pm on Mar 10, 2009 (gmt 0)


That's the difference between page-relative and server-relative references.

Use the version beginning with a slash to avoid these problems.

ozbodd

11:06 pm on Mar 10, 2009 (gmt 0)


OK, then I just need to figure out why the stylesheet broke on some links with the /index.php path. At the moment I have reverted to the full path, which works correctly on the first links but leaves the rest of the links functioning as querystrings.

I'll read on :o/

Thanks guys, I can see there are a lot of similar questions on this forum and you are working hard to explain things, which reinforces how cryptic this really is for newbies. Reading them is like reading a foreign language dictionary, and I can code freely in XHTML, CSS, JavaScript, PHP and MySQL!

Is there a simple and example driven site that covers this topic effectively?

Steve

jdMorgan

12:00 am on Mar 11, 2009 (gmt 0)


Any such "simple site" would violate Einstein's dictum: "Make everything as simple as possible, but no simpler."

The basic fact is that it's not simple. It starts off with regular expressions --which alone have had hundreds of books devoted to them-- then carries on through the various syntactical constructs for RewriteRule and RewriteCond, the server variables they can reference, the flags that can be used to modify condition/rule behaviour, and ends up with internal rewrites, external redirects, and proxy through-puts.

And this doesn't even address the fact that there are potential server performance and search-ranking side-effects to every rule; you simply can't hide from the fact that you are adjusting the server configuration, and that is never a "simple" matter. Neither is it particularly safe -- for those who are not detail-oriented.

There are some tutorial and example threads in our Apache Forum Library. There are links to reference material in our Forum Charter. See the links at the top of this page.

Jim