parsing URLs with hyphens/dashes

         

smallcompany

8:38 pm on Oct 30, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi,

We have a CGI script that parses the HTML pages as you browse through a site, so that the variables attached to URLs are carried from page to page.
At the end, when you click an outgoing link, the script passes the variables to a PHP script, which finally takes you to the outside site.

In some cases the variable gets inserted into the middle of the URL rather than at the end. By comparing against sites where it works fine, we found it gets “puzzled” when a subfolder has a hyphen (dash) in its name. In our case, most of the (HTML) file names are two or more words joined with hyphens, and those work fine. It looks like only a hyphenated subfolder is a problem.

Now, without posting parts of that script (yet), is there any kind of rule about hyphens/dashes?

Here is the head of that script:

#!/usr/bin/perl

use CGI::Carp qw(fatalsToBrowser);
use CGI qw/:cgi/;
use Data::Dumper;
use HTML::LinkExtor;
$HTML::Tagset::linkElements{'option'} = ['value'];
use URI;

Please note that the $HTML::Tagset line was added because we initially had problems with links that were part of forms, where there would be no “href” but an “option” instead.

Thank you

jatar_k

2:21 pm on Oct 31, 2007 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



maybe post the code that is doing the parsing of the params and adding them to the url

smallcompany

8:10 am on Nov 2, 2007 (gmt 0)

Show me the money - :) - just kidding.

I've picked some coder from scriptlance and he already sent 4 questions. If my answers do not help him, I'll post all here so other folks that find it useful don't suffer like I do. :(

...once we fix it, certainly.

perl_diver

6:44 am on Nov 3, 2007 (gmt 0)

10+ Year Member



Now, without posting parts of that script (yet), is there any kind of rule about hyphens/dashes?

Hyphens/dashes are nothing special. The problem might be that, if you are using a regexp, the dash is being interpreted as a range operator inside a character class, as in /[a-z0-9]/. But without seeing your code there is no way to tell what the problem is.
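To make the range-operator point concrete, here is a minimal sketch in plain Perl. It is not taken from the script in question (which was never posted); the folder name and the patterns are only illustrative. The dash means "range" only inside square brackets, so a class written to accept lowercase names can silently stop at a hyphen.

```perl
use strict;
use warnings;

my $folder = 'word-word';

# Inside [...] a dash between two characters means "range":
# [a-z] is every letter from a to z, so a literal '-' is not matched.
my $no_dash = $folder =~ /^[a-z]+$/ ? 1 : 0;

# Put the dash last (or first, or escape it) to match it literally.
my $with_dash = $folder =~ /^[a-z0-9-]+$/ ? 1 : 0;

# \w does not include '-' either, so a \w+ capture stops at the hyphen.
my ($first_word) = $folder =~ /^(\w+)/;

print "no_dash=$no_dash with_dash=$with_dash first_word=$first_word\n";
# prints: no_dash=0 with_dash=1 first_word=word
```

If a script splits URLs with a pattern like the first or third one, a folder called “word-word” would be cut after the first “word”, which is the kind of thing worth checking.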

smallcompany

2:55 am on Nov 4, 2007 (gmt 0)

It's getting worse. Some sites work with this, some don't.

If I take a site that has the trouble, delete the images and content, and leave just some navigation, internal links and a few outgoing links…

no problem!

I’ll give some time to the programmer I found on scriptlance.

Thanks for showing interest. No matter how we resolve this, I'll make sure I update this thread with what the problem was.

smallcompany

3:58 am on Nov 4, 2007 (gmt 0)

Sooner than I thought… I fixed it – MYSELF!

Here is what was causing it:

Site navigation is on the left and at the bottom. One template for all pages. Within both the left and the bottom navigation there was ONE link pointing to a subfolder in the form “word-word/”.
That subfolder contains “index.html” and several more files.

If you are on a page that points to some of the files inside the “word-word” folder, all links would break and would look like:

../word-word/?VARIABLErest of url?VARIABLE

…instead of:

/word-word/rest of url?VARIABLE

If I renamed the folder “word-word” to “word”, everything worked fine, but that was a crappy solution, as I wanted to know why it was breaking.

Finally, this evening, I changed “word-word/” to “word-word/index.html” and everything started working fine.

In our web world, is there any technical difference between “folder/” and “folder/index.html”?

Going back to the Perl/CGI script that does the work of appending VARIABLE to the end of each link within the site, I have no technical clue why it does that wrongly in the case I’ve explained.

Yet I am convinced that if we avoid ending our links with “folder/” and use “folder/index.html” instead, we will not experience this problem again.
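Since the script's code was never posted, the actual cause stays unknown. Purely as a sketch, and assuming the URI module that the script already loads, one way to make sure a variable always lands in the query string is to resolve each link against the page it sits on and let URI place the parameter. The helper name tag_link, the example.com addresses and the value 123 are made up for illustration; this is not the original script's code.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: resolve $href against the page it appears on,
# then add one name/value pair to the end of the query string.
sub tag_link {
    my ($href, $base, $name, $value) = @_;
    my $uri = URI->new_abs($href, $base);                # handles ../ and bare folder links
    $uri->query_form($uri->query_form, $name => $value); # keep existing params, append ours
    return $uri->as_string;
}

my $page = 'http://www.example.com/other/page.html';

print tag_link('../word-word/',           $page, 'VARIABLE', '123'), "\n";
# prints: http://www.example.com/word-word/?VARIABLE=123
print tag_link('../word-word/index.html', $page, 'VARIABLE', '123'), "\n";
# prints: http://www.example.com/word-word/index.html?VARIABLE=123
print tag_link('list.html?page=2',        $page, 'VARIABLE', '123'), "\n";
# prints: http://www.example.com/other/list.html?page=2&VARIABLE=123
```

Because the link is parsed rather than pattern-matched, neither the hyphen nor the trailing slash changes where the variable ends up.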

perl_diver

4:55 am on Nov 4, 2007 (gmt 0)

In our web world, is there any technical difference between “folder/” and “folder/index.html”?

Yes, there is a difference, but most servers are set up to search a list of default files if no file is named in a URL. 'index.html' is generally the first one the server will look for, which is why 'www.example.com' does not have to be written as 'www.example.com/index.html' to work properly.
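On the link-rewriting side, a small sketch (assuming the URI module from CPAN; the example.com addresses and page.html are invented for illustration) shows how the forms compare when a relative link is resolved against them. “folder/” and “folder/index.html” behave the same as a base; only a folder link written without the trailing slash resolves differently.

```perl
use strict;
use warnings;
use URI;

my @bases = (
    'http://www.example.com/word-word/',
    'http://www.example.com/word-word/index.html',
    'http://www.example.com/word-word',        # no trailing slash
);

# Resolve the same relative link against each base URL.
my @resolved = map { URI->new_abs('page.html', $_)->as_string } @bases;

print "$bases[$_] -> $resolved[$_]\n" for 0 .. $#bases;
# prints:
# http://www.example.com/word-word/ -> http://www.example.com/word-word/page.html
# http://www.example.com/word-word/index.html -> http://www.example.com/word-word/page.html
# http://www.example.com/word-word -> http://www.example.com/page.html
```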

smallcompany

7:06 am on Nov 4, 2007 (gmt 0)

Is there any difference when you are working with folders with Perl/CGI?

perl_diver

7:53 am on Nov 4, 2007 (gmt 0)

Your question is not clear, but a folder is just a folder: it's a list of files and other subfolders. Perl reads folders just like any other programming language does.

smallcompany

4:59 pm on Nov 4, 2007 (gmt 0)

It must be something specific to this situation, rather than something well known as a rule.

Thanks

perl_diver

8:13 pm on Nov 4, 2007 (gmt 0)

Right. And unless we see some code there is no telling what the problem might be.