We have a CGI script that parses HTML pages as you browse through a site so that it can carry variables attached to URLs.
At the end, when you click an outgoing link, the script passes the variables to a PHP script, which finally takes you to the outside site.
In some cases the variable gets inserted into the middle of the URL rather than at the end. By comparing with sites where it works fine, we found that it gets “puzzled” when a subfolder has a hyphen (dash) in its name. Most of our (HTML) file names are two or more words joined with hyphens, and those work fine, so it looks like the subfolder is the problem.
Now, without posting parts of that script (yet): is there any kind of rule about hyphens/dashes?
Here is the head of that script:
#!/usr/bin/perl
use CGI::Carp qw(fatalsToBrowser);
use CGI qw/:cgi/;
use Data::Dumper;
use HTML::LinkExtor;
$HTML::Tagset::linkElements{'option'} = ['value'];
use URI;
Please note that the $HTML line was added because we initially had problems with links inside forms, where there is no “href” but an “option”.
Thank you
Now, without posting parts of that script (yet): is there any kind of rule about hyphens/dashes?
Hyphens/dashes are nothing special. One possible problem: if the script uses a regexp, a dash inside a character class may be interpreted as a range operator, as in /[a-z0-9]/. But without seeing your code there is no way to tell what the problem is.
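To illustrate the range pitfall, here is a hypothetical pattern (not taken from the script in question) that is meant to accept word characters, dots, slashes and dashes. Inside square brackets, `.-/` is read as the range from '.' to '/', so the dash itself is silently left out. That would reject a hyphenated folder while accepting an unhyphenated one, which resembles the symptom described, though it is only one possible cause.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Intended: word characters, dot, slash and dash.
# But inside [...], ".-/" is a RANGE from '.' to '/' (just those two
# characters), so the dash is NOT part of this class.
my $buggy = qr{^[\w.-/]+$};

# Fixed: a dash placed last in the class (or escaped as \-) is literal.
my $fixed = qr{^[\w./-]+$};

for my $path ('word/index.html', 'word-word/index.html') {
    printf "%-22s buggy: %-8s fixed: %s\n", $path,
        ($path =~ $buggy ? 'match' : 'no match'),
        ($path =~ $fixed ? 'match' : 'no match');
}
```

The buggy pattern matches `word/index.html` but not `word-word/index.html`; the fixed one matches both. Putting the dash first or last in a character class, or escaping it, is the usual way to make it literal.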
If I take a site that has the problem, delete the images and content, and leave only some navigation, internal links and a few outgoing links…
there is no problem!
I’ll give some time to the programmer I found on scriptlance.
Thanks for your interest. However we resolve this, I'll make sure to update this thread with what the problem was.
Here is what was causing it:
Site navigation is on the left and at the bottom, with one template for all pages. Both the left and the bottom navigation contained ONE link pointing to a subfolder in the form “word-word/”.
That subfolder contains “index.html” and several more files.
If you were on a page that points to any of the files inside the “word-word” folder, all links would break and look like:
../word-word/?VARIABLErest of url?VARIABLE
…instead of:
/word-word/rest of url?VARIABLE
If I renamed the folder “word-word” to “word”, everything worked fine, but that was a crappy solution, as I wanted to know why it was breaking.
Finally, this evening, I changed “word-word/” to “word-word/index.html” and everything started working fine.
In our web world, is there any technical difference between “folder/” and “folder/index.html”?
Going back to the Perl/CGI script that inserts VARIABLE at the end of each link within the site: I have no technical clue why it gets this wrong in the case I've explained.
Still, I am convinced that by avoiding links ending in “folder/” and using “folder/index.html” instead, we will not run into this problem again.
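Since the original script isn't shown, the following is only a sketch of what "append VARIABLE to the end of each link" could look like when done independently of the path's shape. The helper name `add_var` and the parameter `sid=abc123` are hypothetical. It assumes the variable is already URL-encoded, appends with `?` or `&` depending on whether a query string already exists, and keeps any `#fragment` last.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: append a query variable to the END of a link,
# whatever the path looks like (hyphens, trailing slash, ../ etc.).
# $var is assumed to be already URL-encoded, e.g. "sid=abc123".
sub add_var {
    my ($link, $var) = @_;

    # Set any #fragment aside; it must stay after the query string.
    my $fragment = '';
    if ($link =~ s/(#.*)\z//s) {
        $fragment = $1;
    }

    # Use '&' if the link already has a query string, otherwise '?'.
    my $sep = index($link, '?') >= 0 ? '&' : '?';
    return $link . $sep . $var . $fragment;
}

print add_var('../word-word/', 'sid=abc123'), "\n";
print add_var('/word-word/index.html?page=2', 'sid=abc123'), "\n";
print add_var('/word-word/rest-of-url.html#top', 'sid=abc123'), "\n";
```

The helper never inspects folder or file names, so a hyphen or a trailing slash cannot change where the variable lands. Since the script already loads the URI module, the same could also be done for key/value pairs with that module's `query_form` method.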
In our web world, is there any technical difference between “folder/” and “folder/index.html”?
Yes, there is a difference, but most servers are set up to try a list of default files when a URL names a directory instead of a file. 'index.html' is generally the first one the server looks for, which is why 'www.example.com' does not have to be written as 'www.example.com/index.html' to work properly.
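The workaround from this thread (writing “folder/index.html” instead of “folder/”) could also be applied automatically before links are processed. The sketch below assumes the directory index is `index.html`, as is typical per the reply above; the name `normalize_link` is hypothetical. It only rewrites paths that end in a slash and leaves any query string or fragment untouched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the server serves index.html for directory URLs.
my $DIRECTORY_INDEX = 'index.html';

# Hypothetical helper: turn "folder/" links into "folder/index.html".
sub normalize_link {
    my ($link) = @_;

    # Split off any ?query and #fragment so only the path is inspected.
    my ($path, $rest) = $link =~ /\A([^?#]*)(.*)\z/s;

    # Only touch paths that end in a slash (i.e. name a folder).
    $path .= $DIRECTORY_INDEX if $path =~ m{/\z};

    return $path . $rest;
}

print normalize_link('word-word/'), "\n";
print normalize_link('../word-word/?p=1'), "\n";
print normalize_link('/word-word/page.html'), "\n";
```

If a site's directory index is something else (for example index.php), the constant would need to change. Note that this only normalizes links; it does not by itself explain why the original script misplaces the variable.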