1. What is being used to manipulate the URLs like this? What is typically used to parse this on the server?
2. Can this be implemented with any server-side scripting language? ASP, PHP, CF?
3. If the page is extensionless, do any of the search engines have problems spidering the site?
4. Are there other alternatives for passing variables to dynamic pages without using ?'s?
If anyone can share a little wisdom with me, and at least point me in the right direction, it would be greatly appreciated.
Thanks in advance,
Dan LaRiviere
This works on Apache 1.3.9 running on Linux; I cannot speak for anything else.
Assume you had a database and were generating pages based on the QUERY_STRING, as in this example:
www.nowhere.com/cgi-bin/runprog.cgi?name1=value1&name2=value2
You can get rid of the question mark by using PATH_INFO instead of QUERY_STRING:
www.nowhere.com/cgi-bin/runprog.cgi/name1=value1&name2=value2
Everything after runprog.cgi will end up in PATH_INFO instead of QUERY_STRING. I'd strongly recommend a bit of homebrew parsing using characters that won't get you into trouble:
www.nowhere.com/cgi-bin/runprog.cgi/name1-value1_name2-value2
The underscore and hyphen are safe, and A-Z, a-z, and 0-9 are safe. Nothing else is safe.
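For what it's worth, here is a minimal sketch of that homebrew parsing in Python. The function name parse_pairs and the strict A-Z, a-z, 0-9 check are my own assumptions for illustration, not anything Apache gives you; the script would read the string from the PATH_INFO environment variable.

```python
import re

# Only letters and digits are allowed inside a name or a value;
# "-" separates name from value and "_" separates the pairs.
SAFE = re.compile(r"[A-Za-z0-9]+")

def parse_pairs(path_info):
    """Turn '/name1-value1_name2-value2' into a dict of name/value pairs."""
    data = path_info.lstrip("/")
    pairs = {}
    for chunk in data.split("_"):
        name, sep, value = chunk.partition("-")
        # Reject anything that is not a plain name-value pair of safe characters.
        if not sep or not SAFE.fullmatch(name) or not SAFE.fullmatch(value):
            raise ValueError("malformed pair: %r" % chunk)
        pairs[name] = value
    return pairs
```

Rejecting anything unexpected, rather than trying to repair it, keeps stray characters from ever reaching the database lookup.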
Next, drop the ".cgi" extension; Apache doesn't care:
www.nowhere.com/cgi-bin/runprog/name1-value1_name2-value2
Next, get rid of the cgi-bin directory by asking your ISP to make another directory ["newdir"] and giving it ExecCGI status in the Apache configuration file:
www.nowhere.com/newdir/runprog/name1-value1_name2-value2
Now change the name of the program so it doesn't look like a program (let's call it "fakedir"):
www.nowhere.com/newdir/fakedir/name1-value1_name2-value2
Now slap a ".html" on the end of the data, which you'll parse out but it will make it look good:
www.nowhere.com/newdir/fakedir/name1-value1_name2-value2.html
Now get rid of that clumsy name/value pair stuff; you don't need all of this baggage since you're doing your own parsing anyway. Use cool names that might help your ranking (this goes for the newdir and fakedir above as well, which I won't illustrate here):
www.nowhere.com/newdir/fakedir/widget1-feature2.html
Okay, now looking at this new URL, can any bot on earth figure out that this is a dynamically-generated page, assuming that the page is properly an html page? I don't think so.
Your "fakedir" program gets "/widget1-feature2.html" into the PATH_INFO. You throw out the ".html" on the end, throw out the leading slash, and parse the rest. The parsed data gives you the information you need to define what gets extracted dynamically from the database. Now dump this to standard output with the proper HTML coding in it, starting with Content-type: text/html\n\n
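A rough sketch of what such a "fakedir" program might look like, written in Python purely as an illustration (the thread does not say which language the original used). The render function is a hypothetical stand-in for the database lookup; only the PATH_INFO handling and the Content-type header follow the steps described above.

```python
import html
import os
import sys

def parse_path_info(path_info):
    """'/widget1-feature2.html' -> ['widget1', 'feature2']"""
    slug = path_info.lstrip("/")           # throw out the leading slash
    if slug.endswith(".html"):
        slug = slug[:-len(".html")]        # throw out the cosmetic ".html"
    return slug.split("-")

def render(keys):
    # Hypothetical stand-in: a real script would look these keys up
    # in the database and build the page from the result.
    title = html.escape(" ".join(keys))
    body = ("<html><head><title>%s</title></head>"
            "<body><h1>%s</h1></body></html>\n" % (title, title))
    # The header, then a blank line, then the document.
    return "Content-type: text/html\n\n" + body

def main():
    keys = parse_path_info(os.environ.get("PATH_INFO", ""))
    sys.stdout.write(render(keys))

if __name__ == "__main__":
    main()
```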
Everyone is happy.
Would appreciate feedback on either thread.
Isn't mod_rewrite the simplest way if the host won't make changes and the module is available?
We have a script that handles our caching (it lets us keep the load on the database server pretty light, and make changes without the users seeing them first) and the URL parsing for the system.
Basically, a few addresses are let straight in... images, stylesheets, javascript files, etc. They are in directories and passed straight through.
All other URLs go to our parsing script. The parser does the caching and converts the values between the slashes into variables.
We found that Google indexes these well.
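To make the idea concrete, here is a small sketch of that kind of dispatch in Python. The directory prefixes and the slot names are invented for the example, and the caching step is left out entirely; it only shows static paths being passed through and slash-separated segments being mapped onto variables by position.

```python
# Hypothetical directories whose contents are served as-is.
STATIC_PREFIXES = ("/images/", "/css/", "/js/")

def route(path, slot_names=("section", "item", "page")):
    """Decide whether a path is passed straight through or parsed into variables."""
    if path.startswith(STATIC_PREFIXES):
        return ("static", path)
    # Everything else: each non-empty segment fills the next named slot.
    segments = [s for s in path.split("/") if s]
    return ("dynamic", dict(zip(slot_names, segments)))
```

With the assumed slot names, route("/widgets/blue/2") gives the variables section=widgets, item=blue, page=2.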
Now, our next gen. parser is going to be easier to configure, so we don't get URLs that look like they are 8 levels deep and passing variables through logical names.
RewriteEngine on
RewriteRule ^fakedir/([^/]+)/([^/]+)/([^/]+)\.html$ /cgi-bin/script.cgi?var1=$1&var2=$2&var3=$3 [L]
Quite a mouthful... Here's what it does:
As a result, you can access a URL like this:
www.example.com/fakedir/blah/blubb/glugg.html
and the server will process that internally as:
www.example.com/cgi-bin/script.cgi?var1=blah&var2=blubb&var3=glugg
All of this is untested off the top of my head, so you'll have to verify the details for yourself... ;)
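If you want to check the pattern before touching the server config, the same regular expression can be tried out in Python. This is only a simulation of the matching and substitution, not mod_rewrite itself, and it assumes the rule sees the path without a leading slash (as it would in a per-directory .htaccess context).

```python
import re

# Same pattern as in the RewriteRule: three slash-separated groups, then ".html".
PATTERN = re.compile(r"^fakedir/([^/]+)/([^/]+)/([^/]+)\.html$")

def rewrite(path):
    """Return the internal URL the rule would produce, or None if it does not match."""
    m = PATTERN.match(path)
    if m is None:
        return None
    # $1, $2 and $3 in the rule correspond to the three captured groups.
    return "/cgi-bin/script.cgi?var1=%s&var2=%s&var3=%s" % m.groups()
```

A path with the wrong number of segments, such as fakedir/blah/glugg.html, simply does not match, so the rule leaves it alone.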
The underscore and hyphen are safe, and A-Z, a-z, and 0-9 are safe. Nothing else is safe.
Next, drop the ".cgi" extension; Apache doesn't care:
www.nowhere.com/cgi-bin/runprog/name1-value1_name2-value2
Next, get rid of the cgi-bin directory by asking your ISP to make another directory ["newdir"] and giving it ExecCGI status in the Apache configuration file:
www.nowhere.com/newdir/runprog/name1-value1_name2-value2
Hmmm then again, maybe that only works with PHP ... anyone confirm?
www.nowhere.com/cgi-bin/runprog.cgi/name1-value1_name2-value2
The underscore and hyphen are safe, and A-Z, a-z, and 0-9 are safe. Nothing else is safe.
In the URL pattern of the RewriteRule, most of the slashes are matched literally and can be replaced with other characters to your liking. Of course, you can also encode the variable names in the faked directory and file names.
RewriteRule ^somedir/([^/]+)-([^/]+)_([^/]+)-([^/]+)\.html$ /cgi-bin/runprog.cgi?$1=$2&$3=$4 [L]
If you want to support different numbers of variables, then you'll have to establish separate rules for each number. In any case, how you assign the "slots" created this way is up to you.
The only thing you need to remember is that the first (group) in the pattern is matched by $1, the second by $2, etc.
Btw: In my last example, you may want to change the group patterns, so that they match everything except the character that terminates each group (I had left the slash from my first example). A better pattern would therefore look like this:
RewriteRule ^somedir/([^-]+)-([^_]+)_([^-]+)-([^\.]+)\.html$ /cgi-bin/runprog.cgi?$1=$2&$3=$4 [L]
You could also just exclude all separators from each group pattern to stay on the safe side: ([^/_.-]+) (the hyphen goes last inside the brackets so that it is not read as a range).
Oh, and in case it hasn't become obvious so far, the "." needs a leading "\" to escape its normal meaning of "match any character". Thus the pattern "\." matches a literal dot character.
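Here is a quick way to try the hyphen/underscore pattern outside Apache, again as a Python simulation rather than the real module. It also shows one thing worth checking when several separators go into a single bracket expression: a hyphen placed between two characters is read as a range, so "/-_" covers every character from "/" to "_", digits and capital letters included.

```python
import re

# Each group excludes the character that terminates it.
PATTERN = re.compile(r"^somedir/([^-]+)-([^_]+)_([^-]+)-([^.]+)\.html$")

def rewrite(path):
    """Return the internal URL for a matching path, or None otherwise."""
    m = PATTERN.match(path)
    if m is None:
        return None
    name1, value1, name2, value2 = m.groups()
    return "/cgi-bin/runprog.cgi?%s=%s&%s=%s" % (name1, value1, name2, value2)

# Hyphen in the middle: "/-_" is a range, so digits and capitals are excluded too.
RANGE_CLASS = re.compile(r"[^/-_.]+")
# Hyphen last: only the four separator characters are excluded.
LITERAL_CLASS = re.compile(r"[^/_.-]+")

def accepts(pattern, text):
    """True if the whole text consists of characters the class allows."""
    return pattern.fullmatch(text) is not None
```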
I have seen far too many sites that use ASP/PHP to serve essentially static content. In that case, you are taking a long detour to emulate static content in your dynamic web application. Of course, you can charge many consulting hours on this topic alone...
If you really are serving dynamic content, ultimately determined by user behavior, why would you want to have such "private" content indexed by search engines?