Need help to finish setting up clean URLs

         

jorg

6:47 am on May 20, 2009 (gmt 0)

10+ Year Member



My site currently redirects 'www.example.com/page.php' to 'example.com/page', which is exactly what I want. I have one problem with that, though: 'example.com/page' can also be loaded from 'example.com/page/'. This is not good, as it will cause duplicate content. There is no folder called '/page/', only a file called 'page.php'.

The last thing I would like to do is have 'example.com/page?search=answer&submit=submit' appear as 'example.com/page/answer'. I tried:

RewriteRule ^page/(.+)?$ page?search=$1&submit=submit [L]

I also tried a few other things which didn't work.

I had my own configuration, which wasn't working properly. I ended up finding an example posted by jdMorgan that fixed the problems I was having.

So here is what's in my .htaccess:


RewriteEngine on
#
# externally redirect client /index page requests to "/"
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^/]+/)*index?
RewriteRule ^(([^/]+/)*)index\.php?$ http://example.com/$1 [R=301,L]
#
# externally redirect client requests containing a .php extension to the extensionless URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^/]+/)*[^.]+\.php?
RewriteRule ^(([^/]+/)*[^.]+)\.php?$ http://example.com/$1 [R=301,L]
#
# externally redirect non-blank non-canonical hostname requests to canonical hostname
RewriteCond %{HTTP_HOST} !^(example\.com)?$
RewriteRule (.*) http://example.com/$1 [R=301,L]
#
# if requested extensionless URL-path does not resolve to an existing directory
RewriteCond %{REQUEST_FILENAME} !-d
# and if requested extensionless URL-path plus ".php" does resolve to an existing file
RewriteCond %{REQUEST_FILENAME}.php -f
# then append ".php" to resolve the actual filename
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.php [L]

Thank you to anyone that can lend me a hand. :)

[edited by: jdMorgan at 12:38 pm (utc) on May 20, 2009]
[edit reason] example.com, formatting [/edit]

jdMorgan

12:57 pm on May 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Requests for the /page/ URLs should not be rewritten to your script with the code posted above; the rewrite pattern in the "extensionless URL rewrite rule" specifically rejects URL-paths with slashes or periods in the final path-part. So I'm wondering if maybe you've got mod_negotiation enabled here. Try adding
 Options -MultiViews 

at the top of the code (just before RewriteEngine would be a good place.)

mod_speling or AcceptPathInfo (on Apache 2.x) could also cause the same kind of problem.

Jim
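
A minimal sketch of where these directives could sit in the .htaccess file. It assumes the host allows Options and FileInfo overrides; the IfModule guard is an added precaution (not part of the advice above) so the file still loads when mod_speling is absent:

```apache
# Turn off content negotiation so /page/ is not silently mapped to page.php
Options -MultiViews
# Reject trailing path information such as /page.php/extra (Apache 2.x)
AcceptPathInfo Off
# Only touch mod_speling if it is actually loaded
<IfModule mod_speling.c>
CheckSpelling Off
</IfModule>
#
RewriteEngine on
```

Changing one directive at a time and re-testing /page/ after each change should show which feature was responsible.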

jorg

5:47 pm on May 20, 2009 (gmt 0)

10+ Year Member



Thank you, that was a quick fix. I added 'Options -MultiViews' and now a request for /page/ gets a 404. I suppose it would be better if /page/ redirected to /page.

jdMorgan

6:54 pm on May 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, but only if that "/page/" path doesn't exist as a real subdirectory...

RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)/$ http://example.com/$1 [R=301,L]

The problem is that testing for 'directory exists' is inefficient, because a call must be made to the OS file manager --possibly invoking a read of the physical disk-- for each and every HTTP request, unless steps are taken to prevent that.

So, I'd suggest 'listing' any directories that you know exist and using a 'skip rule' construct, so as to avoid doing that filesystem check whenever possible:


# Skip next rule if known or existing subdirectory, or if no trailing slash
RewriteCond %{REQUEST_URI} ^/(w3c/|forum/|stats/|cgi-bin/|.*[^/])$ [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^ - [S=1]
# Externally redirect to remove trailing slash if not an existing subdirectory
RewriteRule ^(.+)/$ http://example.com/$1 [R=301,L]

Jim

jorg

9:15 pm on May 20, 2009 (gmt 0)

10+ Year Member



Ah, that worked perfectly. Could you explain what I'm doing wrong with:

RewriteRule ^page/(.+)$ page?search=$1&submit=submit [L]

As I said in my original post, I want 'example.com/page?search=answer&submit=submit' to appear as 'example.com/page/answer'.

As it is, it doesn't seem to have any effect.

Also what exactly does 'RewriteRule ^ - [S=1]' do? Could you direct me to a proper guide explaining the parameters for RewriteCond and RewriteRule?

Thank you jdMorgan.

jdMorgan

1:07 am on May 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are several 'proper guides' cited in our Apache Forum Charter [webmasterworld.com].

There's nothing apparently wrong with your code. So...
How did you test it? (What did you type in?)
What results did you expect?
What results did you get?
How did the actual results differ from those that you expected?

Jim
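
On the 'RewriteRule ^ - [S=1]' question, a short annotated sketch of the skip construct may help. The /known-dir/ path is hypothetical and stands in for any directory that is known to exist:

```apache
# "^" matches every URL-path, "-" means "leave the URL unchanged",
# and [S=1] tells mod_rewrite to skip the next 1 RewriteRule
# (together with that rule's RewriteCond lines) when this rule matches.
RewriteCond %{REQUEST_URI} ^/known-dir/
RewriteRule ^ - [S=1]
# Skipped for /known-dir/ requests, evaluated for everything else
RewriteRule ^(.+)/$ http://example.com/$1 [R=301,L]
```

A larger number after S= skips that many following rules, which makes the construct work like a simple "if" block.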

jorg

3:28 am on May 21, 2009 (gmt 0)

10+ Year Member



The result I expected was to have
example.com/page?search=answer&submit=submit
appear as
example.com/page/answer

I had a typo in my test; it does work, and I can now access the page from both URLs. So the problem left is that when the user submits the form on example.com/page, it goes to example.com/page?search=answer&submit=submit and not example.com/page/answer.

So what I would like is to have example.com/page?search=answer&submit=submit redirect to example.com/page/answer. But if I do that then the user is going to see example.com/page?search=answer&submit=submit in the address bar first, then it'll redirect? I'd like example.com/page?search=answer&submit=submit never to be shown to the user at all.

jorg

6:24 am on May 21, 2009 (gmt 0)

10+ Year Member



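I figured out how to do it through PHP. It was really simple. Check the requested URL; if a match is found, send a redirect header.

// Look for the form's query string in the requested URL
if (preg_match('/page\?search=([^&]+)&submit=submit/', $_SERVER['REQUEST_URI'], $match))
{
// Send the browser to the clean URL (a bare Location header produces a 302)
header('Location: http://example.com/page/'.$match[1]);
exit;
}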

Thanks again for all your help jdMorgan. :)

g1smd

10:15 am on May 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That gives a 302 redirect (it's not a rewrite), so you're creating a problem there.

To be clear on "rewrites" - a rewrite does not 'make' a URL. URLs are defined by the links on webpages. After a link is clicked, the browser sends the URL information to the server, and the server uses it to fetch the file. A rewrite changes the server path to get the data from a different place.
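
A small sketch of the difference, assuming rules in the document-root .htaccess. The /old-page and /temp-page paths are hypothetical and only serve as illustration:

```apache
RewriteEngine on
# External redirect: the server replies with a 301 and the browser makes
# a second request for the new URL, which then shows in the address bar
RewriteRule ^old-page$ http://example.com/page [R=301,L]
# A bare [R] with no status code sends a 302 instead of a 301
RewriteRule ^temp-page$ http://example.com/page [R,L]
# Internal rewrite: no R flag, so the browser still shows /page
# while the server quietly serves page.php
RewriteRule ^page$ /page.php [L]
```

Only the first two rules change what the browser displays; the last one changes only where the server looks for the content.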

jorg

10:32 am on May 21, 2009 (gmt 0)

10+ Year Member



lol, just looked at how I coded that, pretty tired, I'll ignore it (no edit button).

@g1smd: The type of redirect didn't even come to mind. I'll try to figure out how to do it with a rewrite tomorrow, bedtime for me.

jorg

7:12 pm on May 21, 2009 (gmt 0)

10+ Year Member



I also noticed another problem. I have an HTTP headers check on my server. If the user inputs: domain.tld/page

It goes to example.com/headers/domain.tld/page

Then it gets a 404. :x

g1smd

7:22 pm on May 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



*** "It goes to" ***

Use Live HTTP Headers for Firefox to see "how" it "goes" there.

It is likely a redirect, probably of the 302 type.

Note that "goes to" is not at all a clear description of what happens, when we have explicit words like "redirect" and "rewrite" available to concisely describe the situation.

jorg

9:27 pm on May 21, 2009 (gmt 0)

10+ Year Member



I do use Live HTTP Headers. The type of redirection didn't cross my mind last night as I was so happy that it actually 'worked', and I was too tired to think of much else. Yes, when you submit the form on my site it is a 302 redirect. Of course if you go directly to the page example.com/page/answer it's not a redirect. I'm not sure if this will actually have any negative effect on SEO.

But then again, if this can be done properly through mod_rewrite, I'd much rather do it that way. Unfortunately I still don't know how.

g1smd

10:17 pm on May 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, Mod_Rewrite can be used to generate a 301 redirect or a 302 redirect or an internal rewrite depending on the exact code used. That's why it is crucially important to be absolutely sure which one of those things you want each rule to do.

In general there will be several redirects to force canonicalisation, and to redirect direct client requests for the dynamic URL, and there will be a rewrite to accept a URL request in the "static" format and rewrite it to fetch the content from an internal dynamic filepath.
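
For the search form in this thread, that layout might look like the following. This is a sketch that has not been tested against the poster's site: it assumes the form always sends 'search' first and 'submit=submit' second, that page.php reads 'search' from the query string, and that the rules live in the document-root .htaccess.

```apache
RewriteEngine on
# 1. Redirect direct client requests for the dynamic URL to the static form.
#    THE_REQUEST holds the original request line, so internally rewritten
#    requests cannot trigger this rule and cause a loop.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /page\?search=([^&\ ]+)&submit=submit\ HTTP/
#    The trailing "?" drops the old query string; NE avoids re-encoding
#    a value that is already percent-encoded in the request line.
RewriteRule ^page$ http://example.com/page/%1? [NE,R=301,L]
#
# 2. Internally rewrite the static URL to the script that produces the content.
RewriteRule ^page/([^/]+)$ /page.php?search=$1&submit=submit [L]
```

A form submission would still request the query-string URL first and then be redirected, so this only cleans up afterwards; it does not stop the dynamic URL from being requested.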

ogletree

10:26 pm on May 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Changing URLs for SEO is always a bad idea. Most of the time it can actually hurt you if you are already getting some traffic from Google. If it is part of the algo, and I don't think it is, it plays such a tiny role that it is statistically irrelevant.

SEO URLs are an old concept. Most of the time people are hurting themselves more than helping. If you're making a new website and it is not much work to make "SEO URLs", then do it. Do not spend lots of time changing URLs.

If you ever talk to an SEO company and they want to change your URLs, run away.

jdMorgan

10:28 pm on May 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you want your URL "A" to appear as URL "B" in the browser's address bar, then the only solution is to re-code your script to produce links to URL "B" on your Web pages. Mod_rewrite cannot be used to "fix" or "change" a URL in any way that would be "good" for SEO or for your users.

So instead of modifying your script to force a redirect, modify it to output the "pretty" URLs by re-formatting the path information taken from your database into a pretty form when it is 'building' a URL to put into a link on the page. You can use preg_replace if needed to do this.

If the URLs for links are taken direct from your database, that makes it even easier: Just change your URLs in your database.

Jim

jorg

12:25 am on May 22, 2009 (gmt 0)

10+ Year Member



Thanks for the tips guys. Well unfortunately I don't have the time to figure this stuff out right away (hopefully this weekend I'll find time). I just set the forms back to using post method (for now).