After studying the post by StupidScript [webmasterworld.com...] with working code for doing a rewrite in httpd.conf that calls a PHP script in cgi-bin, I tried to duplicate that implementation as closely as possible on our website. This was meant as a first step to shake out any problems before modifying it to do something a little more complicated.
The goal is to improve the user experience with cleaner, easier-to-handle URLs, and to improve search engine performance, since search engines seem to be taking more account of exact matches in URLs.
The implementation is for a new section of the site, not an existing section. The new section is in development and not live (except when uploaded for testing), so there is no issue of changing existing URLs anywhere on the site or out there on the web.
I have not been able to get the duplicate implementation working and have run out of ideas as to why. Here is our test goal and setup:
Take an incoming URL:
http://example.com/dog/collie
And grab from the server this filepath:
http://example.com/domestic-animals/pet-test-target.php?ID=74
where example.com is the website, domestic-animals is an actual folder name, and pet-test-target.php is the dynamic page address.
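To make the intended mapping concrete, here is a rough sketch in PHP of what we want to happen. It is only an illustration: the $pets array is a hypothetical stand-in for our database table, and clean_url_to_target is a made-up helper name, not part of the referenced script.

```php
<?php
// Hypothetical stand-in for the "pets" table: "type/breed" => ID.
$pets = array('dog/collie' => 74);

// Hypothetical helper: turn a clean path such as "/dog/collie" into the
// real target URL, or return null when there is no matching row.
function clean_url_to_target($path, $pets)
{
    // Strip surrounding slashes and whitespace: "/dog/collie" -> "dog/collie"
    $key = trim($path, "/ \r\n");
    if (!isset($pets[$key])) {
        return null;
    }
    return 'http://example.com/domestic-animals/pet-test-target.php?ID=' . $pets[$key];
}

echo clean_url_to_target('/dog/collie', $pets), "\n";
// prints http://example.com/domestic-animals/pet-test-target.php?ID=74
```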
Our site example.com is on a Network Solutions VPS along with three other websites we set up. The httpd.conf file appears to be the default one (full of instructions and comments) and had never been altered by us. We know the file is being read because we first tried this without the cleanurls.php script in cgi-bin, and the VPS shut down with an error that the cleanurls.php file could not be found.
The httpd.conf has three sections: Global, Main Server Config, and Virtual Hosts. There are no hosts entered under Virtual Hosts, yet all four sites work and all have log files, etc., even though we set no options for them in httpd.conf. Possibly our problem lies there, in the unlikely event that we have the code right.
In the database "stuff" in the table "pets", the relevant columns are:
ID - primary ID
type
breed
Sample table data:
ID - 74
type - dog
breed - collie
This was set up to try and match relevant fields in the referenced code as follows:
ID - rid
type - type
breed - alias
So our "dog" is analogous to the sample script's "article", and our "collie" is analogous to the sample script's "this-is-an-article". We didn't already have an extra numbering field in the database mirroring the primary ID field (which appears to be the "rid" in the script). From looking at the script we didn't think this was an issue, so we just used the primary ID field.
Ignoring the hours and crazy attempts, and simplifying to where we are now: to start, I copied the rewrite code for httpd.conf and changed only one thing, the word in the rewrite condition from "article" to "dog":
RewriteEngine on
RewriteMap newurl prg://var/www/cgi-bin/cleanurls.php
RewriteLock /var/lock/map.newurl.lock
RewriteCond %{REQUEST_URI} ^/dog.*
RewriteRule ^/(.*) ${newurl:$1} [L]
I put this at the top of Section 2 of httpd.conf, directly under the Section 2 heading for the main server configuration. I had also tried it at the end of Section 2, just above the Section 3 Virtual Hosts heading. I restarted using the Virtuozzo restart of the Apache web server so as not to shut down the entire VPS (which I did anyway by mistake, as mentioned above :), figuring this constitutes the soft start I have read about.
Next, our cleanurls.php file. This was uploaded to cgi-bin, its permissions were set to 755, and its owner was changed to the Apache user, with the group left as root.
We encountered one problem with this cleanurls.php file that we did manage to isolate. While looking for reasons our setup was not working, we ran a simple test: a PHP script in cgi-bin, started via cron, that updates a row in the MySQL table. When the script used the path to our db connection file, it found the file but was refused access, and the cron email came back with a no-permission error. After screwing around with permissions and getting nowhere (out of ignorance), I finally just went with the code you see (localhost, user, pword), specifying the database and table names later in the query as shown. This worked in the simple "update a row" test, so I assumed it would work here. I don't know whether this method is a security concern, leaving a PHP script in cgi-bin with the database user and password in the clear.
#!/usr/bin/php
<?php
mysql_connect("localhost","user","pword");
set_time_limit(0);
$keyboard = fopen("php://stdin","r");
while (1) {
    $line = fgets($keyboard);
    if (preg_match('/(.*)\/(.*)/', $line, $igot)) {
        $getalias = mysql_query("select ID from `stuff `.`pets ` where type = '$igot[1]' && breed = '$igot[2]'");
        while($row=mysql_fetch_array($getalias)) {
            $arid = $row['ID'];
        }
        print "/example.com/domestic-animals/pet-test-target.php?ID=$arid\n";
    }
    else {
        print "$line\n";
    }
}
?>
Other than the database connection change, the only other thing we intentionally did to the script was to remove one of the two pieces of info the select statement was fetching, along with the assignment of a function to that variable. The original script used two variables in the filepath, but we only need one: the ID number. For purposes of the test, we made certain that the type and breed were a unique combination in our database.
So basically it says (or is supposed to say):
If an incoming URL starts with http://example.com/dog (whatever comes after that), hand it off to the script. The script treats dog and collie from /dog/collie in our example as the two variables and uses them to get the ID needed for the filepath, then grabs and returns that page without changing the clean URL in the browser's address bar. If nothing matches up, just use what the user typed in as the filepath.
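As I understand it, a prg: rewrite map program receives one lookup key per line on stdin and must write exactly one line back on stdout for each. Below is a minimal sketch of that loop that can be run by hand, outside Apache, to see what the script would answer. Everything here is an assumption for illustration: serve_map is a made-up helper, the anonymous lookup function replaces the database query, and the answer string is simply the one our script prints, not a confirmed working path.

```php
<?php
// Hypothetical helper: read keys line by line from $in and write exactly
// one answer line to $out for each key. $lookup is any callable taking
// (type, breed) and returning an ID, or null when nothing matches.
function serve_map($in, $out, $lookup)
{
    while (($line = fgets($in)) !== false) {
        $key = rtrim($line, "\r\n");          // drop the trailing newline
        $id = null;
        if (preg_match('#^([^/]+)/([^/]+)$#', $key, $m)) {
            $id = call_user_func($lookup, $m[1], $m[2]);
        }
        if ($id !== null) {
            // Same answer string our script prints (an assumption, see above).
            fwrite($out, "/example.com/domestic-animals/pet-test-target.php?ID=$id\n");
        } else {
            fwrite($out, $key . "\n");        // no match: echo the key back
        }
        fflush($out);                         // answer immediately, unbuffered
    }
}

// Drive the loop with in-memory streams instead of real stdin/stdout.
$in = fopen('php://memory', 'w+');
fwrite($in, "dog/collie\nabout-us\n");
rewind($in);
$out = fopen('php://memory', 'w+');

serve_map($in, $out, function ($type, $breed) {
    // Stand-in for the database: only dog/collie is known.
    return ($type === 'dog' && $breed === 'collie') ? 74 : null;
});

rewind($out);
echo stream_get_contents($out);
```

Run from the command line, this prints one answer line for dog/collie and echoes about-us back unchanged, which is the behaviour I was expecting from cleanurls.php.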
When we test this, we get a 404.
Your comments and insight would be greatly appreciated.