Technology independent URLs

and a problem with all locations ending in "/"

oddsod

7:50 pm on Jul 6, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It has often been discussed here (which is where I got the idea) that you can dispense with page extensions by having all links to locations on your site go to directories. So it's mysite.com/contactus/ rather than mysite.com/contactus.htm (or asp/php/html/shtml).

I've just noticed one thing though: If someone links to me without the trailing slash the URL changes from mysite.com/this-is-my-products-page/ to mysite.com/this%2Dis%2Dmy%2Dproducts%2Dpage.

OK, so the "-" is being replaced by "%2D". Does anyone know how this impacts on SEs spidering/indexing/listing the page? Particularly on duplicate content type problems? Or do they just automatically read the "%2D" as a hyphen?
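For reference, a minimal sketch of what "%2D" decodes to under standard URL percent-decoding, using Python's standard library. The path is the example one from this post. This only shows what a standards-following decoder does; it says nothing about how any particular search engine treats the two forms.

```python
from urllib.parse import unquote

# Example path from the post above, with hyphens percent-encoded as %2D.
encoded = "/this%2Dis%2Dmy%2Dproducts%2Dpage/"

# unquote() reverses percent-encoding: %2D is the ASCII code for "-".
decoded = unquote(encoded)
print(decoded)  # /this-is-my-products-page/

# The same decoding turns %20 into a space.
print(unquote("/my%20page/"))  # /my page/
```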

jomaxx

8:17 pm on Jul 6, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



IMO this would indeed be a problem, except that I've never seen that happen with a regular hyphen. My server does not do this.

I'll bet some site links to you with the %2D encoded in the HREF tag, but your browser converts it to a dash for display purposes. Is that possible?

jdMorgan

10:21 pm on Jul 6, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd recommend using mysite.com/contactus instead of mysite.com/contactus/ as a page URL, if your server allows you to implement it that way.

You are probably seeing a redirect: the server looks for a file at the slash-less URL, does not find one, and then redirects to the URL with a slash added, on the assumption that it is a directory. During the redirect, restricted characters are encoded to prevent problems with HTTP compliance. You can use the server headers checker in the WebmasterWorld control panel to test this.
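If you would rather check the headers yourself, here is a rough Python sketch of the same idea. It sends a HEAD request without following redirects and reports the status code and Location header. The function names are made up for illustration, the query string is ignored for brevity, and mysite.com is just the placeholder domain used in this thread.

```python
import http.client
from urllib.parse import urlsplit


def fetch_redirect(url):
    """Send one HEAD request and return (status, Location header or None).

    http.client does not follow redirects, so this shows the first response only.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe(status, location):
    """Turn a status code and Location header into a short description."""
    if status in (301, 302, 303, 307, 308) and location:
        kind = "permanent" if status in (301, 308) else "temporary"
        return f"{kind} redirect ({status}) to {location}"
    return f"no redirect (status {status})"


# Example usage (placeholder URL, needs network access):
# status, location = fetch_redirect("http://mysite.com/this-is-my-products-page")
# print(describe(status, location))
```

Whether the Location header comes back with "-" or "%2D" tells you if the server is doing the encoding during the redirect.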

Jim

oddsod

8:09 am on Jul 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



jdMorgan, thanks for your suggestion but on the shared hosting I don't have that option.

Can anyone please comment on the SE related questions?

ryan26

12:53 pm on Jul 7, 2005 (gmt 0)

10+ Year Member



oddsod,
I have many URLs indexed with "%2D" and "%20", and it currently poses no problem in any of the search engines. The pages rank high for their intended terms.
If you were a search engine you'd surely be able to accept the fact that "%20" means a space.
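A rough sketch of how a crawler could treat the two spellings as the same URL, assuming it applies the percent-encoding normalization described in RFC 3986: decode escapes for unreserved characters such as "-", and leave others such as "%20" encoded. The function name is invented for this example, and I am not claiming any search engine actually does it this way.

```python
import re
import string

# Unreserved characters per RFC 3986: letters, digits, "-", ".", "_", "~".
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def normalize_unreserved(url):
    """Decode percent-escapes of unreserved characters; uppercase the rest."""

    def replace(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char  # e.g. %2D becomes "-"
        return "%" + match.group(1).upper()  # e.g. %20 stays encoded

    return re.sub(r"%([0-9A-Fa-f]{2})", replace, url)


# Placeholder URLs from this thread.
a = normalize_unreserved("http://mysite.com/this%2Dis%2Dmy%2Dproducts%2Dpage/")
b = normalize_unreserved("http://mysite.com/this-is-my-products-page/")
print(a == b)  # True
```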

oddsod

1:05 pm on Jul 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks ryan26

If you were a search engine you'd surely be able to accept the fact that "%20" means a space.

That's what I'd have thought earlier with the www and non-www versions - that they would see it as dup. And the 302 redirect being "safe". You can't tell with the SEs and I wanted to double-check with those, like you, who've got experience of it.

encyclo

2:39 pm on Jul 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Unless you're using a database and mod_rewrite to create static-looking URLs without extensions, or you are using content negotiation (which is rarely available on a shared server), I find it is best to stick to URLs with file extensions, usually technology-independent ones (i.e. .htm or .html rather than .php or .asp).

There are several reasons for this: firstly the problem with inbound links which may or may not use the trailing slash. Apache normally issues a 301 redirect automatically to the trailing slash version, but why take the risk? The second problem is local file management - you end up with every file named index.html and a huge, unwieldy directory structure. Finally, Googlebot (and other bots) appear to associate file extensions with file types (logical really) and a .html file therefore has less chance of confusing the spider (there's a Googleguy post somewhere but I can't find it).

gershon

4:00 pm on Jul 7, 2005 (gmt 0)

10+ Year Member



As an aside, Tim Berners-Lee (and the W3C) officially recommend leaving the extensions off URLs (though everyone, including the W3C themselves, ends up using extensions anyway):

[w3.org...]


What to leave out
...
File name extension. This is a very common one. "cgi", even ".html" is something which will change. You may not be using HTML for that page in 20 years time, but you might want today's links to it to still be valid. The canonical way of making links to the W3C site doesn't use the extension.

See also [w3.org...] for a simple way of doing this (anyone try it?)

oddsod

4:29 pm on Jul 14, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



bump.

That was something I didn't know, gershon. I look forward to any other comments anyone has.