
mod_rewrite + httpd.conf + MySQL + PHP script = clean URLs

Problem implementing a rewrite using a MySQL database, mod_rewrite, and PHP

         

kidcobra

6:45 pm on Mar 29, 2011 (gmt 0)

10+ Year Member



After studying the post by StupidScript [webmasterworld.com...], which has working code for a rewrite in httpd.conf that calls a PHP script in cgi-bin, I tried to duplicate that implementation as closely as possible on our website. This is a first step to shake out any problems before modifying it to do something a little more complicated.

The goal is to improve the user experience with cleaner, easier-to-use URLs, and to improve search engine performance, since search engines seem to be giving more weight to exact matches in URLs.

The implementation is for a new section of the site, not an existing section. The new section is in development and not live (except when uploaded for testing), so there is no issue of changing existing URLs anywhere on the site or out there on the web.

I have not been able to get the duplicate implementation working and have run out of ideas as to why. Here is our test goal and setup:

Take an incoming URL:

http://example.com/dog/collie

And serve this page from the server:

http://example.com/domestic-animals/pet-test-target.php?ID=74

where example.com is the website, domestic-animals is an actual folder, and pet-test-target.php is the dynamic page.

Our site example.com is on a Network Solutions VPS along with three other websites we set up. It has a seemingly default httpd.conf file (with all kinds of instructions, etc.) that we have never altered. We know the file is being read, because we first tried this without the cleanurls.php script in cgi-bin, and the VPS shut down with an error saying the cleanurls.php file could not be found. The httpd.conf has three sections: Global, Main Server Config, and Virtual Hosts. No hosts are entered under Virtual Hosts, yet all four sites work and have their own log files, etc., even though no options for them were set in httpd.conf. Possibly our problem lies there, in the unlikely event that we have the code right. Anyway, on to the data.

In the database "stuff" in the table "pets", the relevant columns are:

ID - primary ID
type
breed

Sample table data:

ID - 74
type - dog
breed - collie

This was set up to try and match relevant fields in the referenced code as follows:

ID - rid
type - type
breed - alias

So our "dog" is analogous to "article" in the sample script, and our "collie" is analogous to "this-is-an-article". We didn't already have an extra numbering field in the database mirroring the primary ID field (which appears to be the "rid" in the script), and from looking at the script we didn't think this was an issue, so we just used the primary ID field.

Skipping over the hours of crazy attempts and simplifying to where we are now: to start, I copied the rewrite code for httpd.conf and changed only one thing, the word in the RewriteCond pattern, from "article" to "dog":

RewriteEngine on
RewriteMap newurl prg://var/www/cgi-bin/cleanurls.php
RewriteLock /var/lock/map.newurl.lock
RewriteCond %{REQUEST_URI} ^/dog.*
RewriteRule ^/(.*) ${newurl:$1} [L]

I put this at the top of Section 2 of httpd.conf, under the Main Server Configuration heading. I also tried it at the end of Section 2, just above the Section 3 Virtual Hosts heading. I restarted the Apache web server from Virtuozzo so as not to shut down the entire VPS (which I had already done once by mistake, as mentioned :), figuring this constitutes the soft restart I have read about.

Next, our cleanurls.php file. It was uploaded to cgi-bin, its permissions were changed to 755, and its owner was changed to apache, with the group left as root.

We did isolate one problem with this cleanurls.php file. While looking for reasons our setup was not working, we ran a simple test: a PHP script in cgi-bin, started via cron, that updates a row in the MySQL table. When that script used the path to our database connection file, it found the file but could not get access to it; the cron email came back with a permission error. After messing around with permissions and getting nowhere (out of ignorance), I finally went with the code you see (localhost, user, pword) and specified the database and table names in the query as shown. This worked in the simple "update a row" test, so I assumed it would work here. I don't know whether this is a security concern: it leaves a PHP script in cgi-bin with the database user and password in the clear.
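On the security question: a common approach is to keep the user name and password in a separate file outside both cgi-bin and the document root, with permissions restricted to the accounts that need it, and to load that file at run time. A minimal sketch using PHP's parse_ini_file(); the load_db_config name, the file location you pass in, and the key names are all hypothetical.

```php
<?php
// Sketch: read MySQL credentials from an INI file kept outside cgi-bin and
// the document root. The path passed in and the key names are hypothetical.
function load_db_config(string $iniPath): array
{
    $cfg = parse_ini_file($iniPath);   // false if the file is missing or unreadable
    if ($cfg === false) {
        throw new RuntimeException("cannot read database settings from $iniPath");
    }
    // Refuse to continue with a partial configuration.
    foreach (['host', 'user', 'password', 'database'] as $key) {
        if (!isset($cfg[$key])) {
            throw new RuntimeException("missing '$key' in $iniPath");
        }
    }
    return $cfg;
}
```

The INI file would hold lines such as host = "localhost" and password = "pword"; quoting the values avoids surprises when a password contains characters that have special meaning in INI syntax.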

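#!/usr/bin/php
<?php
// Connect to MySQL once, when Apache starts this program
mysql_connect("localhost","user","pword");
// Never time out: the program must keep running as long as Apache does
set_time_limit(0);
// Apache writes one lookup key (e.g. "dog/collie") per line to STDIN
$keyboard = fopen("php://stdin","r");
while (1) {
    $line = fgets($keyboard);
    // Split the key at its last slash: $igot[1] = type, $igot[2] = breed
    if (preg_match('/(.*)\/(.*)/', $line, $igot)) {

        $getalias = mysql_query("select ID from `stuff `.`pets ` where type = '$igot[1]' && breed = '$igot[2]'");
        // Keep the ID from the (last) matching row
        while($row=mysql_fetch_array($getalias)) {
            $arid = $row['ID'];
        }
        // Reply to Apache with the rewritten path for this key
        print "/example.com/domestic-animals/pet-test-target.php?ID=$arid\n";
    }
    else {
        // No slash in the key: echo the line back as the answer
        print "$line\n";
    }
}
?>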

Other than the database connection change, the only other intentional change we made to the script was to remove one of the two values the select statement was looking for, along with the line that assigned that value to a variable. The original script used two variables in the file path, but we only need one: the ID number. For the purposes of the test, we made certain that type and breed form a unique combination in our database.

So basically it says (or is supposed to say):

If an incoming URL starts with http://example.com/dog (whatever follows doesn't matter), hand it off to the script. The script treats /dog/collie in our example as two variables and uses them to look up the ID, which is needed for the file path; the server then returns that page without changing the clean URL in the browser's address bar. If nothing matches, just use what the user typed in as the file path.

When we test this, we get a 404.
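One thing worth checking against the 404: in a server-level RewriteRule, a substitution beginning with "/" is treated as a path on the same site (unless its first segment exists at the root of the filesystem), so the "/example.com/domestic-animals/..." line printed by the script would most likely be looked up as a folder named example.com under the document root, which would be consistent with a 404. The map's answer probably needs to start at the folder, as in "/domestic-animals/pet-test-target.php?ID=74". Below is a minimal sketch of the lookup described above, written as a pure function so it can be tested without Apache; key_to_path and the $findId callable are hypothetical names, and $findId stands in for whatever database query is used.

```php
<?php
// Sketch of the lookup described in the post: key "dog/collie" -> ID -> path.
// $findId is a hypothetical callable (string $type, string $breed): ?int.
// $key must already have its trailing newline removed by the caller.
function key_to_path(string $key, callable $findId): ?string
{
    // Expect exactly "<type>/<breed>", each part non-empty and slash-free.
    if (!preg_match('#^([^/]+)/([^/]+)$#', $key, $m)) {
        return null;                          // not a type/breed key
    }
    $id = $findId($m[1], $m[2]);
    if ($id === null) {
        return null;                          // no such type/breed pair
    }
    // A URL-path on this same host: starts at the folder, no hostname.
    return '/domestic-animals/pet-test-target.php?ID=' . (int) $id;
}
```

A null result could be sent back to Apache as the string NULL (mod_rewrite's "no value" answer for prg: maps), and a fallback can be written into the rule with the ${newurl:$1|default} form, which is one way to get the "if nothing matches, use what was typed" behavior.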

Your comments and insight would be greatly appreciated.

jdMorgan

12:25 am on Apr 7, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The RewriteCond is redundant, wrong, or both.

If it is simply redundant, then a single line of code:
 
RewriteRule ^/(dog/.*) ${newurl:$1} [L]

should work identically.

That won't fix your problem, though.
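Because mod_rewrite patterns are PCRE regular expressions, PHP's preg_match() can show what $1, the key handed to the map, will be for a given request; in server-level config the pattern is tested against the URL-path, which starts with a slash. A small sketch with a hypothetical rewrite_key helper. It also shows why the original condition ^/dog.* can be "wrong": it accepts paths such as /doghouse, while ^/(dog/.*) requires the slash.

```php
<?php
// mod_rewrite patterns are PCRE, so preg_match() shows what the rule
// ^/(dog/.*) would capture as $1 (the key sent to the map) for a URL-path.
function rewrite_key(string $urlPath): ?string
{
    return preg_match('#^/(dog/.*)#', $urlPath, $m) === 1 ? $m[1] : null;
}
```

For example, rewrite_key('/dog/collie') gives "dog/collie", which is the line the map program receives on STDIN.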

As for debugging this stuff, I suggest that you divide and conquer. Temporarily replace cleanurls.php with a script that reads STDIN (as before) but always returns a fixed string such as "tiger", or some other valid path that is not a pet or dog path. This takes the database out of the problem entirely and lets you confirm that the rewrite is actually being invoked. If that works, then you know the problem is related to the database access in the script.
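The divide-and-conquer step might look like the following minimal PHP sketch, assuming the same line protocol as cleanurls.php: one key in on STDIN, one line out on STDOUT. The function name and the "/tiger" path are hypothetical. Nothing here touches the database, so if requests under /dog start returning the fixed page, the rewrite itself is working.

```php
<?php
// Debugging stand-in for cleanurls.php: ignores the key it is given and
// always answers with the same fixed path, so no database is involved.
function answer_every_key_with(string $fixedPath, $in, $out): void
{
    // fgets() returns false at end of input (Apache closed the pipe) or on a read error.
    while (fgets($in) !== false) {
        fwrite($out, $fixedPath . "\n"); // exactly one answer line per key
        fflush($out);                    // Apache waits for this line before continuing
    }
}
```

Installed as the map program, the file would start with the #!/usr/bin/php line and end with a call such as answer_every_key_with('/tiger', STDIN, STDOUT);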

A 'trick' you may find useful is to temporarily replace the rewrite with an external redirect, so that you can 'see' the substitution path output by the rule in your browser's address bar. This helps if you cannot use a RewriteLog because of server admin permission settings or the (possibly huge) load it places on the server.

Do be very sure that the cleanurls.php script cannot "die" or "exit" for any reason. It must listen on STDIN forever, no matter what happens: it is started only when the server (re)starts, and if it ever exits or quits, your whole site goes down...
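One way to approach this in PHP is to wrap each individual lookup in try/catch, so a failed query is answered with "NULL" (mod_rewrite's "no value for this key" reply) instead of ending the process. A sketch, assuming PHP 7 or later for \Throwable; serve_map and the $handler callable are hypothetical. It cannot guard against everything (a fatal error such as memory exhaustion still ends the process), and it stops when fgets() reports end of input, which normally means Apache has closed the pipe.

```php
<?php
// Sketch of a RewriteMap "prg:" loop that answers every key and does not exit
// on a failed lookup. $handler is a hypothetical callable (string $key): ?string.
function serve_map($in, $out, callable $handler): void
{
    // fgets() returns false at end of input (normally: Apache closed the pipe).
    while (($line = fgets($in)) !== false) {
        $key = rtrim($line, "\r\n");          // drop the newline Apache sends
        try {
            $answer = $handler($key);
        } catch (\Throwable $e) {             // exceptions and PHP 7+ Errors
            $answer = null;                   // treat a failure as "no match"
        }
        if ($answer === null || $answer === '') {
            $answer = 'NULL';                 // mod_rewrite's "no value" reply
        }
        // Exactly one line per key, with any embedded newlines removed.
        fwrite($out, str_replace(["\r", "\n"], '', (string) $answer) . "\n");
        fflush($out);
    }
}
```

The map program would then end with a call such as serve_map(STDIN, STDOUT, $handler), where $handler does the actual lookup.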

Also, be sure that you set everything up, open the database, get the value you need, and then close the database immediately; no "logic" of any kind should occur between opening and closing the database. You don't want any possibility whatsoever of leaving a database connection open (forever).
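A sketch of "open, fetch one value, close" for each lookup. It uses PDO with a prepared statement in place of the mysql_* functions from the post (which were removed in PHP 7), so text taken from the URL is never pasted into the SQL. The table and column names (pets: ID, type, breed) come from the post; the find_pet_id name, the DSN (for example "mysql:host=localhost;dbname=stuff") and the credentials are assumptions.

```php
<?php
// Sketch: open a connection, fetch one ID with a prepared statement, close.
// Table/columns (pets: ID, type, breed) are from the post; the DSN and
// credentials are assumptions, e.g. "mysql:host=localhost;dbname=stuff".
function find_pet_id(string $dsn, ?string $user, ?string $pass, string $type, string $breed): ?int
{
    $pdo = new PDO($dsn, $user, $pass, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
    $stmt = null;
    try {
        $stmt = $pdo->prepare('SELECT ID FROM pets WHERE type = ? AND breed = ? LIMIT 1');
        $stmt->execute([$type, $breed]);
        $id = $stmt->fetchColumn();           // false when no row matches
        return $id === false ? null : (int) $id;
    } finally {
        // PDO closes the connection once nothing references it or its statements.
        $stmt = null;
        $pdo = null;
    }
}
```

Opening a connection per lookup costs a little time on each request; the trade-off is that no connection can be left open between requests.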

You may already be well aware of the warnings above, but I include them for *all* readers of this thread.

Jim