(Yet another) url rewriting question

Problems with relative images

         

St_Michiel

12:26 pm on Nov 8, 2008 (gmt 0)

10+ Year Member



I've searched a lot but I'm still having trouble with rewriting. I need to redirect a user who requests a URL with a trailing slash to the same URL without one.

I have this .htaccess


Options +FollowSymLinks
RewriteEngine on
RewriteBase /
RewriteRule ^/(.*).html?/(.*)$ $1.html [R,L]
RewriteRule ^/(.*).html?/$ $1.html [R,L]
RewriteRule ^(.*).html?(.*)$ index.php?page=$1&$2

It will transform "home.html" into "index.php?page=home".
But it still allows index.php to be loaded for http://example.com/home.html/ (with a trailing slash). The PHP file is loaded, but images with relative URLs fail. I've searched but nobody else seems to have this problem. Basically, how do I tell the user agent to strip that trailing slash?

[edited by: jdMorgan at 6:40 pm (utc) on Nov. 9, 2008]
[edit reason] example.com [/edit]

jdMorgan

5:26 pm on Nov 8, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The only rule shown that will work in .htaccess is the last: URL-paths "seen" by RewriteRule in .htaccess do not start with a leading slash.
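
As a minimal sketch of that point, assume the .htaccess file sits in the document root and the client requests http://example.com/home.html/. The two patterns below differ only in the leading slash; the substitution and flags are illustrative choices, not a recommendation for the final rule set.

```apache
# Assumption: this .htaccess lives in the document root and the
# client requests http://example.com/home.html/
# In this per-directory context RewriteRule is matched against
# "home.html/" because the leading slash has already been stripped.
RewriteEngine on
#
# Never matches in .htaccess: the pattern requires a leading slash
RewriteRule ^/(.+)\.html/$ /$1.html [R=301,L]
#
# Matches: the pattern does not expect a leading slash
RewriteRule ^(.+)\.html/$ /$1.html [R=301,L]
```

The same patterns behave differently in the main server configuration, where the URL-path seen by RewriteRule does begin with a slash.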

*Everybody* has problems with page-relative URLs after modifying the URL used by the client. This is because it is the client which resolves relative URLs, and if the URL used by the client differs in its directory-path from the actual URL-path at the server, then these page-relative links are incorrectly resolved. Look at your server error log to confirm.

The solution is to use server-relative links ( <a href="/images/logo.gif"> ) or canonical links ( <a href="http://example.com/images/logo.gif"> ) on your pages.

Furthermore, RewriteRule cannot "see" query strings appended to URLs. Query strings are not part of a URL, but rather, data appended to a URL to be passed to the resource at that URL. Remember, a URL is a "locator" and the location in this case is a script, not the values passed to that script.

The solution is to use a RewriteCond to examine %{QUERY_STRING} and create one or more back-references to it (%1 through %9). However, if your only intent is to copy the existing query string, that is not necessary. Just use the [QSA] flag of RewriteRule.
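
A rough sketch of both approaches follows. The "theme" parameter name and the URL home.html?theme=lorem are assumptions used only for illustration; use one variant or the other, not both.

```apache
RewriteEngine on
#
# Variant A: append the entire original query string with [QSA].
# home.html?theme=lorem is rewritten to index.php?page=home&theme=lorem
RewriteRule ^(.+)\.html$ index.php?page=$1 [QSA,L]
#
# Variant B (alternative to A, shown commented out): pick a single
# value out of the query string with RewriteCond. %2 is the second
# parenthesized group of the RewriteCond pattern; "theme" is a
# hypothetical parameter name.
# RewriteCond %{QUERY_STRING} (^|&)theme=([^&]+)
# RewriteRule ^(.+)\.html$ index.php?page=$1&theme=%2 [L]
```

Variant A is the shorter choice when every incoming parameter should reach the script unchanged; variant B passes on only what the condition captured and does not fire at all when the parameter is absent.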

With this information and the effects it may have on your plans in mind, how do you wish to proceed?

Jim

St_Michiel

7:45 pm on Nov 8, 2008 (gmt 0)

10+ Year Member



Thank you, JD Morgan. I know the last rule works. The first two rules were failed attempts to redirect. I'm really bad with regular expressions and was glad that the first rewrite worked, until I noticed that URLs which should fail were still being served a web page.

What I want to do is that the client's user agent that visits "http://my.domain.com/home.html/" gets either a 404 or a redirect to "http://my.domain.com/home.html".

Relative links need to work because of an obscure CMS structure. So it's essential that a user agent requesting the trailing-slash URL is told either that the page is not there (404) or that it has moved (301 redirect).

g1smd

2:40 am on Nov 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, but relative links will be relative to the *new* URL if the browser has been redirected to a new URL.

St_Michiel

3:56 am on Nov 9, 2008 (gmt 0)

10+ Year Member



Please read my post. I want to strip that trailing slash. I know it's relative to the new url.

I have an index.php that serves all the content. Without the URL rewrite:
"index.php?page=main" would serve a main page
"index.php/" would give a 404 and no content would be served
With the rewrite rule
"main.html" serves the main page (which is index.php?page=main)
"main.html/" shows index.php but with all links broken.
I want to fix the last trailing slash.. strip it.. no directories.

Please.

jdMorgan

2:36 pm on Nov 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




Options +FollowSymLinks
RewriteEngine on
#
# Externally redirect to remove trailing slash from .html URLs
RewriteRule ^(.+)\.html/$ http://www.example.com/$1.html [R=301,L]
#
# Internally rewrite .html URLs to index.php script with page name as query parameter
RewriteRule ^(.+)\.html$ index.php?page=$1 [L]

This code works as described by the comments. I don't know what your "$2" parameter was expected to accomplish, as it has not been described or demonstrated by example URLs. I removed "RewriteBase /" as it is the default, and therefore redundant.

Jim

[edit] Added [L] flag to second rule [/edit]

[edited by: jdMorgan at 8:04 pm (utc) on Nov. 9, 2008]

St_Michiel

3:12 pm on Nov 9, 2008 (gmt 0)

10+ Year Member



Ah excellent! Thank you so much! That's what I was looking for.
The html? was for having both .htm and .html extensions work. The $2 was for passing along extra $_GET data, as in "home.html?theme=lorem". I'll get it to work.

g1smd

3:14 pm on Nov 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You should not have things set up such that more than one URL can be rewritten to one internal resource.

If you do, then you will find that:

example.com/filename.htm
example.com/filename.html
www.example.com/filename.htm
www.example.com/filename.html

all get indexed as Duplicate Content.

If you need the user to be able to type one of several URLs in and still be able to reach the content, then three of the four URL options should issue a 301 redirect to the one canonical form that you do want to be indexed.

In this case there would be two redirects involved.

1. Redirect any non-www or www URL ending with .htm to the www form with the .html ending.

2. Redirect any remaining non-www URLs to the www form (this picks up non-www requests for .html endings).

For any URL request, only one of the redirects applies, thereby avoiding a chain.

Once the user has the correct URL in the browser address bar, and only then, rewrite that URL request to get the actual content.
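
The two redirects followed by the rewrite could be sketched roughly as below. This assumes www.example.com is the canonical host and that the file is a document-root .htaccess; it is an illustration of the ordering, not tested against any particular server setup.

```apache
RewriteEngine on
#
# 1. Any request ending in .htm (www or non-www host): redirect to
#    the www host with the .html ending
RewriteRule ^(.+)\.htm$ http://www.example.com/$1.html [R=301,L]
#
# 2. Any remaining request on a host other than www.example.com:
#    redirect to the www host (this picks up non-www .html requests).
#    The first condition skips requests that send no Host header.
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
#
# Only after the redirects: internally rewrite the canonical URL
RewriteRule ^(.+)\.html$ index.php?page=$1 [L]
```

A non-www request for filename.htm is caught by the first rule and sent straight to the final URL, so the second rule never sees it and no redirect chain results.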

g1smd

3:24 pm on Nov 9, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Jim's redirect forces www and .html for all URL requests (www and non-www) that end in .html followed by a trailing slash.

You can easily modify that redirect so that it caters for .htm as well.

However, you would also need at least one more rule to cater for the other conditions.

All of the redirects must be placed before any of the rewrites.

Also, be aware that [R] makes a 302 redirect. You will need [R=301,L] in your rules.
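
One possible way to extend the trailing-slash redirect to .htm, as a sketch that assumes www.example.com is the canonical host and .html the canonical ending:

```apache
RewriteEngine on
#
# Remove the trailing slash from .htm/ and .html/ requests and force
# the .html ending on the www host in a single 301 redirect.
# "html?" makes the final "l" optional, so both endings match.
RewriteRule ^(.+)\.html?/$ http://www.example.com/$1.html [R=301,L]
```

Like the other redirects, this rule belongs above the internal rewrite to index.php.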

St_Michiel

6:23 pm on Nov 9, 2008 (gmt 0)

10+ Year Member



g1smd
Thank you for your input. I was struggling with the redirect and you helped me a lot.
I used [R] for testing, but messed up the regex. I am aware of Google's penalty, but the .htm redirects should be fixed pretty soon.
Thanks!