Bi-directional mod-rewrite

yet another mod-rewrite newb question

         

CNibbana

12:25 am on Apr 25, 2004 (gmt 0)

10+ Year Member



I just successfully converted my site to PHP with dynamic includes. For whatever reason (SEO?), I would like my pages to appear static. And since we're at it, would it be harder to make them look like .html instead of .php?

Example URLs of my site would be:
[domain.com...]
[domain.com...]

I would like them to appear:
[domain.com...] (or .html)
[domain.com...]

But here's where my bi-directional question comes in: Is it possible to use mod-rewrite to have my links within my document appear as:
<a href="home.php">home</a>
instead of current:
<a href="index.php?page=home">home</a>

Or, because of the $_GET['page'] factor in PHP, is that not possible? Here's my PHP dynamic page code if that helps:
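<?php
// Accept only simple names so the include cannot reach outside the site
if (isset($_GET['page']) && preg_match('/^[A-Za-z0-9_-]+$/', $_GET['page'])) {
    // Read the value from $_GET explicitly; do not rely on register_globals
    $page = $_GET['page'];
    include("{$page}/index.php");
    print($page);
}
else {
    include('home/index.php');
}
?>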
Would this need to be changed to accomplish what I'm trying to do?

BTW, I tried tackling this myself but didn't get very far. Because each page variable changes, I got confused:
RewriteEngine on
RewriteRule ^index\.php$ newlink.php [R=301,L]

jdMorgan

12:59 am on Apr 25, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



CNibbana,

mod_rewrite takes the URI specified in an incoming HTTP request and either generates an external redirect or modifies the local path for use within the server, depending on what you specify in the RewriteRule. This takes place on the front-end of HTTP request processing -- before any script is called or any content is served. mod_rewrite cannot be used to change the links within pages already generated by your scripts before the server sends those pages to the requesting client; it only works on requested URIs.

To achieve search-engine friendly URIs, do the following:

  • Change the links that your pages generate to 'friendly' format.
  • Use mod_rewrite to intercept requests for those friendly URIs, and convert them to the query-string form needed to call your script.
    -OR-
  • Modify your script to accept friendly URIs, and extract the needed parameters from those URIs within the script itself. The requested URI will be available to your script in the REQUEST_URI server variable ($_SERVER['REQUEST_URI'] in PHP).

    Jim
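
A minimal sketch of the first approach (friendly links plus an internal rewrite), assuming an .htaccess file in the document root and pages requested as home.html. The .html naming is an assumption for illustration; the `page` parameter and index.php come from the question above.

```apache
RewriteEngine On

# Links in the pages are written as <a href="home.html">home</a>.
# An incoming request for /home.html is mapped internally to
# /index.php?page=home. There is no [R] flag, so this is not a
# redirect and the visitor's address bar keeps the friendly URL.
RewriteRule ^([^/.]+)\.html$ /index.php?page=$1 [L]
```

With a rule like this in place, the script can keep reading $_GET['page'] as before; only the href values written into the pages need to change.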

    CNibbana

    1:27 am on Apr 25, 2004 (gmt 0)

    10+ Year Member



    I think I'll take option three and modify the PHP script. In the meantime, is it still possible to take these URLs with a mod-rewrite:

    [domain.com...]
    [domain.com...]

    And make them appear:
    [domain.com...] (or .html)
    [domain.com...]

    How would I do that? TIA

    jdMorgan

    1:40 am on Apr 25, 2004 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Yes, see this recent thread [webmasterworld.com], starting at msg#8

    Jim

    CNibbana

    9:19 pm on Apr 26, 2004 (gmt 0)

    10+ Year Member



    I got mine to work; it turned out to be simpler than I thought:

    RewriteEngine on
    Options +FollowSymlinks
    RewriteBase /
    RewriteRule ^(.*)\.htm /?page=$1 [L]

    However, I'm curious if there is a way to specify the RewriteRule to work whether the user addresses

    mydomain/link.htm -OR-
    mydomain/link/ -OR-
    mydomain/link

    without having to write a separate rule for each?

    And what's the difference between [NC] and [QSA] after the rule?

    CNibbana

    11:23 pm on Apr 26, 2004 (gmt 0)

    10+ Year Member



    Forget my last question in msg#5. I realized I got myself into trouble with every page under the domain being rewritten. I don't want pages further down to be rewritten.

    I realize I can make a string for the rewrite to identify and only rewrite on it:

    RewriteRule ^index/(.*)\.htm /?page=$1 [L]

    however, I would prefer not to have every page appear this way in the address bar:
    domain.com/index/home.htm
    domain.com/index/about.htm

    Is there a way to have mod-rewrite only take the information in the first subdirectory:

    domain.com/home.htm

    and complete the rewrite, yet not attempt a rewrite if the address is deeper than one subdirectory:

    domain.com/about/location/map.htm

    jdMorgan

    12:50 am on Apr 27, 2004 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Well...
    ^index\.html$ in your home directory, and
    ^subdir/index\.html$
    differ in two ways: the word "subdir", and the slash after "subdir", which will be there for every subdirectory.

    So, "search" for the slash, and don't rewrite if there is one following any subdirectory name and before any filename. Just add a RewriteCond ahead of your rule:
    RewriteCond %{REQUEST_URI} !^/[^/]+/

    Ref:
    Apache mod_rewrite Documentation [httpd.apache.org]
    Apache URL Rewriting Guide [httpd.apache.org]
    Regular Expressions Tutorial [etext.lib.virginia.edu]

    Jim
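
Put together with the single-level rule from the earlier post, the condition might sit in .htaccess as sketched below. The sample paths in the comments are taken from the question; the placement in a document-root .htaccess is an assumption.

```apache
RewriteEngine On

# Condition: the request path must NOT have a slash after its first segment.
#   /home.htm                -> no match on ^/[^/]+/ -> rule runs
#   /about/location/map.htm  -> matches ^/[^/]+/     -> rule is skipped
RewriteCond %{REQUEST_URI} !^/[^/]+/
RewriteRule ^(.*)\.htm /?page=$1 [L]
```

Note that a RewriteCond applies only to the RewriteRule that immediately follows it, so the condition has to be repeated for any other rule that should be restricted the same way.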

    CNibbana

    1:59 am on Apr 27, 2004 (gmt 0)

    10+ Year Member



    Thank you, jd! It seems like you're the one-man band fielding all the questions in this area.

    I appreciate your help!

    CNibbana

    3:30 am on Apr 30, 2004 (gmt 0)

    10+ Year Member



    I need to make one more change to my rewrite and I've been reading all of the tutorials and I still can't figure it out.

    I need to rewrite the URL as follows, regardless of its length (number of subdirectories):

    [domain.com...] REWRITE TO:
    [domain.com...]

    but also work if there are more subdirectories (regardless of the number of subdirectories):
    [domain.com...]
    [domain.com...]
    OR
    [domain.com...]
    [domain.com...]

    Current rule:
    RewriteEngine on
    Options +FollowSymlinks
    RewriteBase /
    RewriteRule ^(.*)\.htm /?page=$1 [L]

    I tried to incorporate parts of the RewriteCond that jd gave me, but I couldn't get any variation to work.

    RewriteRule ^[(.*)+/](.*)\.htm $1/?page=$2 [L]

    I tried this and 30+ other combinations. Please help?

    jdMorgan

    4:34 am on Apr 30, 2004 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    Just to get the semantics straight, your code appears to be an attempt to rewrite
    FROM http://domain.com/<anything>/home.htm
    TO http://domain.com/<anything>/index.php?page=home
    and not the other way around.
    You are rewriting from the requested static URL to the dynamic URL needed by your script. This is the right thing to do, but I want to make sure we get the 'from' and 'to' straight, because otherwise, it is confusing. mod_rewrite rewrites the URL requested by the browser to a different URL or to a different server-internal filepath before any scripts are run or any content is served.

    Based on that, try something like this:


    RewriteRule (.*/)?([^/]+)\.htm$ /$1index.php?page=$2 [L]

    This is sort of a tricky regex pattern, and means, "match any characters followed by a slash (all of which is optional), followed by one or more characters that are not slashes, directly followed by a literal period and 'htm' to end the string." This will "isolate" the filename at the end and allow you to move it into the query string using back-reference $2, while $1 will contain all of the subdirectory path up to the actual filename.

    The regex engine has to backtrack from the end of the string to match this pattern, so it will be a bit slow. Also, this code is intended for use in .htaccess. For use in httpd.conf, leave off the slash in front of $1.

    Jim
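
To see how the two groups split a request, here is the same rule annotated with two sample paths. The paths are illustrative only. In .htaccess the pattern is tested against the path without its leading slash.

```apache
RewriteEngine On

# Requested path (per-directory)   $1                  $2
#   home.htm                       (empty)             home
#   about/location/map.htm         about/location/     map
#
# Resulting internal URLs:
#   /index.php?page=home
#   /about/location/index.php?page=map
RewriteRule (.*/)?([^/]+)\.htm$ /$1index.php?page=$2 [L]

# In httpd.conf the path keeps its leading slash, so $1 already starts
# with "/" and the substitution becomes: $1index.php?page=$2
```

Because the rewritten target ends in index.php rather than .htm, the rule does not match its own output when the request is processed again.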

    CNibbana

    5:48 pm on Apr 30, 2004 (gmt 0)

    10+ Year Member



    Jim, you are absolutely incredible. You obviously know this stuff upside down.

    Because this is slower and puts more strain on the server, I'm going to see if I can think of a better way to accomplish this over the weekend without using such a complex mod-rewrite. Thanks again for your help.

    jdMorgan

    6:02 pm on Apr 30, 2004 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member



    I doubt it'll make much difference, unless you get 100,000 hits a day or more. I just try to cover all the bases, and patterns like that, ones where the regex parser has to "back into" the string, are less efficient to process.

    Take a look at the "design" of your friendly URLs, and consider adding a "key" for regex to look for. This is a bad example, but something like:
    mydomain.com/fee/fie/foh/fum/start/somepage.html
    is easy to parse if you tell regex to put everything up to "start" in $1, and everything after "start" in $2.

    The key shown above as "start" needs to be a unique word, one that won't conflict with a real directory name, and also one that won't look silly or suspicious to visitors.
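
The "key" idea can be sketched as follows, assuming the marker word is literally `start` and that the target keeps the same index.php?page= form used earlier in the thread. Both choices are assumptions for illustration, not something the thread specifies.

```apache
RewriteEngine On

# /fee/fie/foh/fum/start/somepage.html
#   $1 = fee/fie/foh/fum     (everything before /start/)
#   $2 = somepage            (the page name after /start/)
# -> /fee/fie/foh/fum/index.php?page=somepage
RewriteRule ^(.+)/start/([^/]+)\.html$ /$1/index.php?page=$2 [L]
```

The links generated by the pages would then need to include the marker segment, which is the trade-off against the shorter URLs matched by the previous rule.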

    Another alternative would be to move the whole local filepath into the query string, and let the script take care of it.

    Just some thoughts.

    Jim
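
The last alternative, handing the whole path to the script, could look like the sketch below. The parameter name `path` is hypothetical, and the script would have to split and validate the value itself before using it in an include.

```apache
RewriteEngine On

# Leave requests for files that really exist on disk alone
RewriteCond %{REQUEST_FILENAME} !-f
# about/location/map.htm -> /index.php?path=about/location/map
RewriteRule ^(.+)\.htm$ /index.php?path=$1 [L]
```

This keeps the rewrite pattern trivial and moves the parsing work into the script, where it is easier to test and change.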