CGI urls without "?"s, but also with no extensions - Perl Server Side CGI Scripting forum at WebmasterWorld - WebmasterWorld


CGI urls without "?"s, but also with no extensions

How do search engines see these?

Robert Charlton

3:07 am on Feb 7, 2002 (gmt 0)

WebmasterWorld Administrator

10+ Year Member

Top Contributors Of The Month

First, let me say that I know almost nothing about CGI. I did a lot of reading and searching before posting this, and I still don't know much. I've also been following dynamic page issues in the other forums for quite a while now; and I do understand that some, but not all, engines are beginning to look at dynamic content... and that static pages sometimes do better.

I'm optimizing a large, non-profit site with essentially all CGI pages. It's a well-established site, showing a lot of incoming links on all engines, and it ranks well on some very competitive stuff even though it's essentially not optimized. I attribute the ranking to a lot of relevant links coupled with chance page and title content.

When I started to look at the urls, I was completely puzzled, because there basically aren't any page names or extensions as I understand them. For the most part, the question marks in the query strings have been replaced by "/"s followed by the query (though not, I was told, for SEO purposes); and the "urls" look like

www.domain.org/modperl/go/ca

or:

www.domain.org/cgi-bin/ca/district_profile/286

I'm wondering whether, when I try to create links and page content for optimizing the interior pages, I can keep the pages and their urls in this form, or whether we need to create dupes of our most important pages with conventional extensions... or, whether there's any other trick that might work, if in fact a trick is necessary.

Google clearly can follow the existing links and can see these urls. I'm not sure about the other engines.

I just need to get enough of a handle on how the spiders will see things before making suggestions to the technical people on the site. Obviously, they'd like to leave things alone if they can... it's a big site.

If it would help, I can explain the "url" naming conventions as they were explained to me, but I suspect most of you are already ahead of me on this. It's definitely not what I'm used to.

physics

3:42 am on Feb 7, 2002 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

My understanding is this:


www.domain.org/cgi-bin/ca/district_profile/286
                       ^--this is the program ca, without a .cgi extension

Then, the /district_profile/286 is read by the ca program as URL input information... similar to the way key=value pairs in a query string are normally read. It sounds like they've done the sort of thing recommended in this post [webmasterworld.com]. Then the script uses this info as input parameters to create the page. No extensions are necessary as you aren't calling a real page but rather sending info to a program. You should have no problems linking to this... just give it a try with a test link!
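To make the split concrete, here is a minimal sketch in Python (not the site's actual Perl code, which I haven't seen). The function name and the assumption that the script lives at /cgi-bin/ca are hypothetical, for illustration only:

```python
def split_script_and_args(url_path, script_path):
    """Split a URL path into the script part and the extra path segments.

    script_path is where we ASSUME the program lives (hypothetical);
    a real server works this out from its own configuration.
    """
    if url_path != script_path and not url_path.startswith(script_path + "/"):
        raise ValueError("URL path does not point at the script")
    extra = url_path[len(script_path):]              # e.g. "/district_profile/286"
    args = [seg for seg in extra.split("/") if seg]  # drop empty segments
    return script_path, args


script, args = split_script_and_args(
    "/cgi-bin/ca/district_profile/286", "/cgi-bin/ca"
)
print(script)  # the program that gets executed
print(args)    # the pieces the program treats as its input
```

The program is "ca"; everything after it is just data for the program to interpret, which is why no file extension is involved.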

Robert Charlton

5:38 am on Feb 7, 2002 (gmt 0)

WebmasterWorld Administrator

10+ Year Member

Top Contributors Of The Month

physics - Thanks for the explanation and the link to the "Eliminating allergic symbols" thread. Yes, I think that's essentially what they're doing on the site I'm working on. It is an Apache server. For reference, I'm including at the end of this post the explanation that I was sent.

>>No extensions are necessary as you aren't calling a real page but rather sending info to a program.

This is the heart of my question. Is it actually necessary to create the fake pagename with the html extension, as in the "Eliminating allergic symbols" thread? The urls without a filename seem to work just fine to take me to pages from my browser. The question is whether "sending info to a program" includes spiders as programs... i.e., will spiders follow these links too?

I've seen posts on the forums suggesting that the engines might favor html pages over others, and Everyman's post also suggests an html extension is desirable. How true is that these days?

Creating the "fake" pages with extensions would be a lot of work... could conceivably mess up hundreds of external links... and if the pages were created as dupe pages for spidering only, there might be a dupe content problem as well.

Here's the explanation I was sent by the engineer on the site:

The normal way to pass parameters to a cgi program looks something like this:
www.domain.org/cgi-bin/cs_compare?state=ca
where the name of the program is 'cs_compare' and we are passing a variable that has a name of 'state' and a value of 'ca'. The web server takes everything after the question mark as name/value pairs and passes that info into the cgi program. Instead we do something like this:
www.domain1.domain.org/cgi-bin/cs_compare/ca
This looks kind of like a directory, but actually it is a cgi program named 'cs_compare'. The web server knows that it is supposed to execute anything in the cgi-bin directory, so it executes the 'cs_compare' cgi program and ignores the following slash and everything after it. Then the program parses the url itself and extracts the 'ca' and knows that that is supposed to be the state.

I should probably add that the actual form of the url described above for 'ca' ends up as:
www.domain.org/modperl/go/ca
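For anyone comparing the two URL styles in the engineer's explanation, here is a rough sketch in Python of how a program could accept either one. It assumes the server hands the program the standard CGI variables QUERY_STRING (the part after the "?") and PATH_INFO (the part after the program name); the function name get_state is made up for illustration, and this is not the site's actual code:

```python
from urllib.parse import parse_qs


def get_state(environ):
    """Return the 'state' value from CGI-style variables, or None.

    environ is a dict standing in for the CGI environment
    (in a real CGI program this would be os.environ).
    """
    # Style 1: /cgi-bin/cs_compare?state=ca  -> QUERY_STRING is "state=ca"
    query = parse_qs(environ.get("QUERY_STRING", ""))
    if "state" in query:
        return query["state"][0]

    # Style 2: /cgi-bin/cs_compare/ca  -> PATH_INFO is "/ca"
    segments = [seg for seg in environ.get("PATH_INFO", "").split("/") if seg]
    return segments[0] if segments else None


print(get_state({"QUERY_STRING": "state=ca"}))  # question-mark style
print(get_state({"PATH_INFO": "/ca"}))          # slash style
```

Either way the program ends up with the same value; only the shape of the url that visitors and spiders see is different.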

physics

6:03 am on Feb 7, 2002 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

In my experience, sometimes having the .html extension is better. But I think I noticed that with Google it doesn't matter that much. In any case, if the cgi-bin directory is used in the path the 'trickiness' factor is destroyed anyway... any cgi savvy dude (ahem... or SE algo) knows it's a script and not a static page. If I were you, I wouldn't worry about adding the .html (imho). Anyone else?