How to get rid of ? in URL

Eliminating allergic symbols

         

dantheriver

3:10 am on Dec 30, 2001 (gmt 0)



Hi all-
I understand that I need to get rid of the ? in my URLs to enable better spidering of my site. I've seen many sites that seem to pass variables with multiple /'s. For example... www.mystore.com/item/category/radio.
This raises a few questions for me:

1. What is being used to manipulate the URLs like this? What is typically used to parse this on the server?
2. Can this be implemented with any server-side scripting language? ASP, PHP, CF?
3. If the page is extensionless, do any of the search engines have problems spidering the site?
4. Are there other alternatives to pass variables to dynamic pages without using ?'s?

If anyone can share a little wisdom with me, and at least point me in the right direction, it would be greatly appreciated.
Thanks in advance,
Dan LaRiviere

ralnikov

2:35 pm on Dec 30, 2001 (gmt 0)

10+ Year Member



1) I did not understand the question.
2) Yes, it could be implemented with ASP3/IIS5 or with Apache mod_rewrite.
3) I think not. But you could add an extension to the virtual URL.
4) There are no standard ways to pass variable values other than the standard ?param=value.

Everyman

9:24 pm on Dec 30, 2001 (gmt 0)



You can use PATH_INFO.

This works on Apache 1.3.9 running on Linux; I cannot speak for anything else.

Assume you had a database and were generating pages based on the QUERY_STRING, as in this example:

www.nowhere.com/cgi-bin/runprog.cgi?name1=value1&name2=value2

You can get rid of the question mark by using PATH_INFO instead of QUERY_STRING:

www.nowhere.com/cgi-bin/runprog.cgi/name1=value1&name2=value2

Everything after runprog.cgi will end up in PATH_INFO instead of QUERY_STRING. I'd strongly recommend a bit of homebrew parsing using characters that won't get you into trouble:

www.nowhere.com/cgi-bin/runprog.cgi/name1-value1_name2-value2

The underscore and hyphen are safe, and A-Z, a-z, and 0-9 are safe. Nothing else is safe.

Next, drop the ".cgi" extension; Apache doesn't care:

www.nowhere.com/cgi-bin/runprog/name1-value1_name2-value2

Next, get rid of the cgi-bin directory by asking your ISP to make another directory ["newdir"] and giving it ExecCGI status in the Apache configuration file:

www.nowhere.com/newdir/runprog/name1-value1_name2-value2

Now change the name of the program so it doesn't look like a program (let's call it "fakedir"):

www.nowhere.com/newdir/fakedir/name1-value1_name2-value2

Now slap a ".html" on the end of the data, which you'll parse out but it will make it look good:

www.nowhere.com/newdir/fakedir/name1-value1_name2-value2.html

Now get rid of that clumsy name/value pair stuff; you don't need all of this baggage since you're doing your own parsing anyway. Use cool names that might help your ranking (this goes for the newdir and fakedir above as well, which I won't illustrate here):

www.nowhere.com/newdir/fakedir/widget1-feature2.html

Okay, now looking at this new URL, can any bot on earth figure out that this is a dynamically-generated page, assuming that the page is properly an html page? I don't think so.

Your "fakedir" program gets "/widget1-feature2.html" into the PATH_INFO. You throw out the ".html" on the end, throw out the leading slash, and parse the rest. The parsed data gives you the information you need to define what gets extracted dynamically from the database. Now dump this to standard output with the proper HTML coding in it, starting with Content-type: text/html\n\n

Everyone is happy.
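
The PATH_INFO steps described above can be sketched roughly as follows. This is only an illustration in Python, not the program the post refers to: the function name parse_path_info, the hyphen-only splitting, and the 404 response are assumptions made for the example, and the database lookup is left as a comment.

```python
#!/usr/bin/env python3
"""Sketch of a CGI program that reads its parameters from PATH_INFO."""
import os
import re
import sys

# Accept only the characters the post calls safe: letters, digits, "-" and "_".
SAFE = re.compile(r"[A-Za-z0-9_-]+")


def parse_path_info(path_info):
    """Turn '/widget1-feature2.html' into ['widget1', 'feature2'].

    Returns None when the path is empty or contains unexpected characters.
    (Hypothetical helper; the name and return shape are assumptions.)
    """
    data = path_info.lstrip("/")        # throw out the leading slash
    if data.endswith(".html"):
        data = data[: -len(".html")]    # throw out the fake extension
    if not SAFE.fullmatch(data):
        return None
    return data.split("-")              # homebrew parsing: split on hyphens


def main():
    parts = parse_path_info(os.environ.get("PATH_INFO", ""))
    if parts is None:
        sys.stdout.write("Status: 404 Not Found\nContent-type: text/html\n\n")
        sys.stdout.write("<html><body>Not found</body></html>\n")
        return
    # A real program would use `parts` to pull content from the database here.
    sys.stdout.write("Content-type: text/html\n\n")
    sys.stdout.write("<html><body>Requested: %s</body></html>\n" % ", ".join(parts))


if __name__ == "__main__":
    main()
```

Rejecting anything outside the safe character set before parsing keeps odd input away from whatever lookup follows.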

Robert Charlton

5:45 am on Feb 7, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I'd started another thread [webmasterworld.com] in the Server Side Scripting forum, pursuing this question from a slightly different angle, which is whether it's actually necessary to create the fake pagename with the html extension... or whether the spiders will follow a URL that looks like a fake directory.

Would appreciate feedback on either thread.

Marcia

5:59 am on Feb 7, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I just saw a PHP script that gets rid of the dynamic ? and changes to a static URL. Also, I believe there's a way to configure Cold Fusion to do it. I believe there was a thread on how to do it with ASP - a site search for dynamic pages should turn it up.

Isn't mod_rewrite the simplest way if the host won't make changes and the module is available?

alex_h

6:10 pm on Feb 13, 2002 (gmt 0)

10+ Year Member



Use a parsing script...

We have a script that handles both our caching (which lets us keep the load on the database server pretty light, and make changes without the users seeing them first) and our URL parsing.

Basically, a few addresses are let straight in... images, stylesheets, javascript files, etc. They are in directories and passed straight through.

All other URLs go to our parsing script. The parser does the caching and converts the slash-separated parts of the path into variables.

We found that Google indexes these well.

Now, our next gen. parser is going to be easier to configure, so we don't get URLs that look like they are 8 levels deep and passing variables through logical names.
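
A minimal sketch of that kind of parsing script, assuming a fixed list of static prefixes and a fixed order of logical names. STATIC_PREFIXES, SEGMENT_NAMES and route are all made up for this illustration, and the caching layer is left out entirely.

```python
from urllib.parse import unquote

# Hypothetical directories whose contents are passed straight through.
STATIC_PREFIXES = ("/images/", "/css/", "/js/")

# Hypothetical logical names, one per position in the path.
SEGMENT_NAMES = ("section", "category", "item")


def route(path):
    """Return ('static', path) or ('dynamic', {name: value, ...})."""
    if path.startswith(STATIC_PREFIXES):
        return ("static", path)
    # Split on slashes, drop empty pieces, and decode any %xx escapes.
    segments = [unquote(s) for s in path.split("/") if s]
    # zip() stops at the shorter sequence, so extra segments are ignored.
    return ("dynamic", dict(zip(SEGMENT_NAMES, segments)))
```

For example, route("/store/audio/radio") gives the variables section=store, category=audio and item=radio, while anything under /images/ is left alone.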

physics

8:35 pm on Feb 16, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




Next, get rid of the cgi-bin directory by asking your ISP to make another directory ["newdir"] and giving it execcgi status in the Apache configuration file:

Is there a way to do this myself, with .htaccess or something? My webhost chose to ignore this request :(

physics

4:11 am on Feb 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've seen posts here which state you can have .html processed as a script by using .htaccess, but I can't see how that same method could be used here. Seems like there must be a way!

bird

4:31 am on Feb 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The principle works like this (in .htaccess):


RewriteEngine on
RewriteRule ^fakedir/([^/]+)/([^/]+)/([^/]+)\.html$ /cgi-bin/script.cgi?var1=$1&var2=$2&var3=$3 [L]

Quite a mouthful... Here's what it does:


  • RewriteEngine on # activate rewriting
  • RewriteRule # rewrite according to this rule
  • ^fakedir/ # match the fake directory name at the beginning of the URL path (in .htaccess the leading slash is stripped before matching; in the main server configuration you would use ^/fakedir/ instead)
  • ([^/]+) # the () create a "group", which can later be referenced in sequence by $1, $2, etc.
  • [^/] # any character except a slash
  • [^/]+ # one or more characters that are not slashes
  • \.html # present a fake file name extension to the user
  • $ # matches the end of the URL string
  • /cgi-bin/... # the path that the rule redirects to internally
  • $1, $2, $3 # the references to the groups defined in the pattern
  • [L] # stop rewriting (ignore any further rules)

As a result, you can access a URL like this:

www.example.com/fakedir/blah/blubb/glugg.html

and the server will process that internally as:

www.example.com/cgi-bin/script.cgi?var1=blah&var2=blubb&var3=glugg

All of this is untested off the top of my head, so you'll have to verify the details for yourself... ;)
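
One way to check the pattern before touching a live server is to try it in a scripting language. The sketch below uses Python's re module, which is not the regex engine Apache uses, so treat it as a sanity check on the pattern only. The rewrite function is a stand-in written for this example, not part of mod_rewrite.

```python
import re

# The pattern from the RewriteRule above. In .htaccess context the path is
# matched without its leading slash, so none is used here.
PATTERN = re.compile(r"^fakedir/([^/]+)/([^/]+)/([^/]+)\.html$")


def rewrite(path):
    """Return the internal target for `path`, or None if the rule does not match."""
    m = PATTERN.match(path)
    if m is None:
        return None
    # m.groups() holds what the rule refers to as $1, $2 and $3.
    return "/cgi-bin/script.cgi?var1=%s&var2=%s&var3=%s" % m.groups()
```

Calling rewrite("fakedir/blah/blubb/glugg.html") yields the script URL with blah, blubb and glugg as the three values. A path with only two segments returns None, meaning the rule would not fire.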

physics

5:01 am on Feb 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks, I'll give it a shot. I'm still wondering if there is a way to do what is mentioned in the above post though:

www.nowhere.com/cgi-bin/runprog.cgi/name1-value1_name2-value2

The underscore and hyphen are safe, and A-Z, a-z, and 0-9 are safe. Nothing else is safe.

Next, drop the ".cgi" extension; Apache doesn't care:

www.nowhere.com/cgi-bin/runprog/name1-value1_name2-value2

Next, get rid of the cgi-bin directory by asking your ISP to make another directory ["newdir"] and giving it ExecCGI status in the Apache configuration file:

www.nowhere.com/newdir/runprog/name1-value1_name2-value2



Can this only be done in the Apache config file?

Black Knight

1:07 pm on Feb 17, 2002 (gmt 0)

10+ Year Member



You can actually simplify that URL a little more by dropping the 'newdir' directory. Just place the script itself at the root level called something innocuous like 'content'
and then:
www.nowhere.com/newdir/runprog/name1-value1_name2-value2
can become:
www.nowhere.com/content/name1-value1_name2-value2

Hmmm then again, maybe that only works with PHP ... anyone confirm?

bird

3:38 pm on Feb 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm still wondering if there is a way to do what is mentioned in the above post though:

www.nowhere.com/cgi-bin/runprog.cgi/name1-value1_name2-value2

The underscore and hyphen are safe, and A-Z, a-z, and 0-9 are safe. Nothing else is safe.

In the URL pattern of the RewriteRule, most of the slashes are matched literally and can be replaced with other characters to your liking. Of course, you can also encode the variable names in the faked directory and file names.

RewriteRule ^somedir/([^/]+)-([^/]+)_([^/]+)-([^/]+)\.html$ /cgi-bin/runprog.cgi?$1=$2&$3=$4 [L]

If you want to support different numbers of variables, then you'll have to establish separate rules for each number. In any case, how you assign the "slots" created this way is up to you.

The only thing you need to remember is that the first (group) in the pattern is matched by $1, the second by $2, etc.

physics

4:46 pm on Feb 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks bird. On one of the servers I'm on it works great! On the other one it causes a site-wide server error :( However, on the server that went down there was no .htaccess file in my /home/me/public_html/ directory. There was one called htaccess.out though... I don't know what the deal is with that.

bird

5:39 pm on Feb 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



For all of this to work, the server needs to have mod_rewrite present and configured to be active. The htaccess.out file is non-standard, so it could mean anything, possibly a template.

Btw: In my last example, you may want to change the group patterns, so that they match everything except the character that terminates each group (I had left the slash from my first example). A better pattern would therefore look like this:

RewriteRule ^somedir/([^-]+)-([^_]+)_([^-]+)-([^\.]+)\.html$ /cgi-bin/runprog.cgi?$1=$2&$3=$4 [L]

You could also just exclude all separators from each group pattern to stay on the safe side: ([^/_.-]+) (the hyphen goes last in the brackets so that it is taken literally instead of forming a range)
Oh, and in case it hasn't become obvious so far, the "." needs a leading "\" to escape its normal meaning of "match any character". Thus the pattern "\." matches a literal dot character.
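
One detail worth checking with any catch-all character class: a hyphen between two characters inside [] forms a range, so where the hyphen sits changes what the class excludes. The sketch below checks this with Python's re module. That is not Apache's regex engine, though the range rule for brackets should behave the same way there as far as I know. The names RULE, SAFE_GROUP, RANGE_GROUP and rewrite are made up for the illustration.

```python
import re

# The rule from above: each group stops at its own separator.
RULE = re.compile(r"^somedir/([^-]+)-([^_]+)_([^-]+)-([^\.]+)\.html$")

# Hyphen placed last, so it is a literal character in the class.
SAFE_GROUP = re.compile(r"[^/_.-]+")

# Hyphen placed between "/" and "_": this is the range "/" through "_",
# which also covers digits and uppercase letters but not "-" itself.
RANGE_GROUP = re.compile(r"[^/-_\.]+")


def rewrite(path):
    """Return the internal target for `path`, or None if the rule does not match."""
    m = RULE.match(path)
    if m is None:
        return None
    return "/cgi-bin/runprog.cgi?%s=%s&%s=%s" % m.groups()
```

With these definitions, SAFE_GROUP accepts a value like Widget1 in full, while RANGE_GROUP rejects it because W and 1 fall inside the accidental range.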

lazerzubb

11:29 am on Aug 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I wonder if anyone has found a solution. I need to present it to a client and need to know if you can rewrite the URL in Cold Fusion?

Go2

9:25 pm on Aug 7, 2002 (gmt 0)

10+ Year Member



Maybe the ?s aren't necessary in the first place?

I have seen far too many sites that use ASP/PHP to serve essentially static content. In that case, you are taking a long detour to emulate static content in your dynamic web application. Of course, you can charge many consulting hours on this topic alone...

If you really are serving dynamic content, ultimately determined by user behavior, why would you want to have such "private" content indexed by search engines?

Internet Marketing M

10:51 pm on Aug 22, 2002 (gmt 0)

10+ Year Member



After you replace
"http://domain.com/index.asp?page=content"
with
"http://domain.com/_content"
(notice the "index.asp?page=" has been replaced with a "_")

Would it be better to leave it:
/_content
or
/_content.html