Question about "SE friendly" URLs

         

FalseDawn

6:05 am on May 1, 2005 (gmt 0)

10+ Year Member



I'm currently using a .htaccess file to rewrite my site's .php files to a more friendly format,

eg: www.mydomain.com/login.php is accessed by

www.mydomain.com/secure/login

etc

This works fine - however, I'm a bit concerned about the index.php file - I currently have this rule:

RewriteRule ^$ index.php

So when the site is accessed via www.mydomain.com, it automatically uses this file and doesn't display www.mydomain.com/index.php in the address bar.

My concern is what will happen if www.mydomain.com/index.php gets indexed by search engines, or is this unlikely?
Also, is this a valid thing to do? Do Search Engines look for index.php by default, and with the above rule will they not index it?

Is there any way to prevent access to my site using /blahblah.php? Ie throw a 404 if this is attempted?

Thanks for any advice.

jd01

7:18 am on May 1, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi FalseDawn,

Normally a search engine will index the default page under both / and /index.whatever-the-default-is if there are any links to the full version of the default. There are theories of PageRank splitting and other oddities because of this. (Not my forte, but this is what I've read.)

What you can do to stop this:

A. Rather than rewriting silently to index.php (i.e. the URL in the browser does not change), you can do an external, visible redirect by adding [R=301,L] to your rule like this:

RewriteRule ^$ index.php [R=301,L]

This way your site will always default to yoursite.com/index.php

B. You can 'catch' any direct, original request for index.php and redirect it back to /. To do this and still serve the page correctly, you will have to use a %{THE_REQUEST} condition before the rule like this:

RewriteRule ^$ index.php [L]

RewriteCond %{THE_REQUEST} ^/index\.php\ HTTP/ [NC]
RewriteRule $index\.php$ [yoursite.com...] [R=301,L]

If you use this method, remember the order must be intact, or you may create an ugly loop.

Either of these should get you close to eliminating duplicates. Go with whatever way you think is better for your situation.

Hope this helps and gives you some ideas.

Justin

If you decide to use either and have trouble getting them to work, keep posting and I'll look closer at them.

FalseDawn

5:17 pm on May 1, 2005 (gmt 0)

10+ Year Member



Many thanks for the reply - I see I have a lot to learn about mod rewrite.
I find regular expressions just about the hardest thing to get my head around sometimes. I've lost count of the number of times I've introduced loops and server errors in my htaccess files, but I am getting better!

I'm off to try your suggestions and will report back.

FalseDawn

6:53 pm on May 1, 2005 (gmt 0)

10+ Year Member



Well, something's not working.. :(

I don't want to use your first suggestion:
RewriteRule ^$ index.php [R=301,L]

Since my reason in using rewrites is to hide the php extensions.

Tried the second suggestion, and www.mysite.com/index.php is not being re-written.

In this line:
RewriteCond %{THE_REQUEST} ^/index\.php\ HTTP/ [NC]

Can you explain further what this does exactly - is it supposed to check for a request coming in from somewhere external to my site?

This line:
RewriteRule $index\.php$ [yoursite.com...] [R=301,L]

Is the first "$" supposed to be a "^"?
In either case, it's not working.

I tried removing the RewriteCond line, and the rewrite works, but the URL in the address bar remains the same - is this normal? (plus I'm now getting looping in certain cases)
I was hoping that www.mysite.com/index.php would actually be changed to www.mysite.com in the address bar, or is this not the case?

I also have these lines in the file:
RewriteCond %{HTTP_HOST} !^$
RewriteCond %{HTTP_HOST} !^www\.mysite\.com [NC]
RewriteRule ^(.*) [mysite.com...] [L,R=301]

to change www.mysite.com to [mysite.com,...] and this is reflected in the address bar - I have no idea why the other rewrites are not doing this.

Thanks

jd01

7:15 pm on May 1, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry my typing / reading skills aren't always the best.

RewriteCond %{THE_REQUEST} ^/index\.php\ HTTP/
RewriteRule . [yoursite.com...] [R=301,L]

Yes, you were correct in thinking the $ was wrong.

The single .(dot) will check all non-blank requests for a match.

%{THE_REQUEST} matches only the original request sent by the client, not re-written requests, so by keeping this 'set' at the end, you should only match direct requests, e.g. a mouse click on a link or a typed-in URL.
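
As a rough illustration of why the condition matters (this is not from the original posts), here is a sketch of what happens in a .htaccess file when the redirect has no %{THE_REQUEST} guard. The host www.example.com is a placeholder, and the fragment is deliberately broken to show the loop:

```apache
# Deliberately broken sketch: do not use as-is.
# www.example.com is a placeholder host.
RewriteEngine On

# Step 1: a request for "/" is rewritten internally to index.php.
RewriteRule ^$ index.php [L]

# Step 2: in .htaccess context the rules run again on the rewritten
# path "index.php", so this unguarded rule now matches and sends a
# 301 back to "/", which triggers step 1 again: a redirect loop.
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]
```

During both passes %{THE_REQUEST} still holds the line the client originally sent (for example "GET / HTTP/1.1"), which is why a condition on it can tell a direct request for index.php apart from the internal rewrite.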

This should work better for you.

Justin

You will also have to make sure you use the www version of yoursite.com in the rule.

FalseDawn

7:29 pm on May 1, 2005 (gmt 0)

10+ Year Member



Still no joy. I know the rewrite is not working because if I change it to

RewriteCond %{THE_REQUEST} ^/index\.php\ HTTP/
RewriteRule . [yoursite.com...] [R=301,L]

I don't get a 404

Any more ideas as to what could be wrong?

FalseDawn

7:59 pm on May 1, 2005 (gmt 0)

10+ Year Member



I changed it to:
RewriteCond %{THE_REQUEST} /index\.php
RewriteRule . [mysite.com...] [R=301,L]

And it "appears" to be working now, or did I just break something?

FalseDawn

9:12 pm on May 1, 2005 (gmt 0)

10+ Year Member



The following also works, that I spotted in another thread:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/ [NC]

jdMorgan

9:23 pm on May 1, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The string evaluated by RewriteCond %{THE_REQUEST} will be the exact request line (the first line of the HTTP request) sent by the browser, for example:

GET /mypage.php HTTP/1.1

Therefore, the original RewriteCond posted above wouldn't work because the pattern was start-anchored on the page name and had no provision for the HTTP method (GET, POST, HEAD, etc.) that comes first. The pattern you found in the other thread accepts any all-uppercase method from 3 to 9 characters long, which should cover them all. You won't need the [NC] flag unless your index.php is linked using both upper- and lower-case links, in which case you may have another problem: Apache on *nix is case-sensitive, so it won't find that page if the case doesn't match exactly.
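
Putting the pieces from this thread together, a minimal .htaccess sketch might look like the following. It assumes the file sits in the document root with mod_rewrite enabled, and www.example.com stands in for the real host name; neither detail comes from the posts above.

```apache
RewriteEngine On

# Serve index.php internally for the bare root; the address bar
# keeps showing "/".
RewriteRule ^$ index.php [L]

# Redirect only direct client requests for /index.php to the root.
# THE_REQUEST looks like "GET /index.php HTTP/1.1", so the pattern
# allows for the method (3 to 9 uppercase letters) before the path.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /index\.php\ HTTP/
RewriteRule ^index\.php$ http://www.example.com/ [R=301,L]
```

Note that this pattern does not match a request that carries a query string, such as /index.php?page=2; widening it to cover that case would be a separate decision.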

Jim

jd01

10:29 pm on May 1, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



?

I'm using the example in post #5 and it's working... what is making the difference in situation(s)?

I understand what is being said, but not sure why the example is working if not correct.

Justin

Edited for correct example.

FalseDawn

11:03 pm on May 1, 2005 (gmt 0)

10+ Year Member



Thanks for the clarification jdMorgan
It's all still a bit like black magic to me - I'm learning to not ask too many questions and take the easy approach of "if it works, leave it alone"!
:)

jd01

12:02 am on May 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ah, never mind...

The ones I'm using aren't start anchored... I'm rewriting all strings containing the file... duh!

Thanks for cleaning up my mistake again Mr. Morgan.

Justin