I've just noticed one thing though: if someone links to me without the trailing slash, the URL changes from mysite.com/this-is-my-products-page/ to mysite.com/this%2Dis%2Dmy%2Dproducts%2Dpage.
OK, so the "-" is being replaced by "%2D". Does anyone know how this affects SEs spidering, indexing and listing the page, particularly with regard to duplicate-content problems? Or do they just automatically read the "%2D" as a hyphen?
You are probably seeing a redirect: the server first tries to find a file at the URL without the slash and, not finding one, redirects to the URL with a slash added, on the assumption that it is a directory. During the redirect, restricted characters are encoded to prevent problems with HTTP compliance. You can use the server headers checker in the WebmasterWorld control panel to test this.
Jim
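If you'd rather check the headers from a script than from the control panel, something like the following Python sketch should do it. It is not from this thread: the function name `check_redirect` is made up for illustration, and it assumes the site answers plain HTTP on the given port. It sends a HEAD request without following redirects and reports the status code and the `Location` header, which is where an encoded "%2D" would show up.

```python
import http.client


def check_redirect(host: str, path: str, port: int = 80):
    """Send one HEAD request and return (status, Location header or None).

    http.client never follows redirects on its own, so a 301/302 issued
    by the server is reported as-is instead of being silently resolved.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For example, `check_redirect("mysite.com", "/this-is-my-products-page")` would return the status code and redirect target for the slashless URL, assuming mysite.com is reachable from where the script runs.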
If you were a search engine you'd surely be able to accept the fact that "%20" means a space.
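To make that concrete, here is a small Python sketch (mine, not from any search engine) showing that percent-decoding turns "%2D" back into a hyphen and "%20" into a space. The helper `same_path` is a hypothetical name, and whether any given spider actually normalises URLs this way is an assumption, not something confirmed in this thread.

```python
from urllib.parse import unquote


def same_path(a: str, b: str) -> bool:
    """Return True if two URL paths are identical once percent-decoded."""
    return unquote(a) == unquote(b)


# "%2D" is just the percent-encoded form of "-".
print(unquote("/this%2Dis%2Dmy%2Dproducts%2Dpage"))  # /this-is-my-products-page

# "%20" is the percent-encoded form of a space.
print(unquote("/my%20page"))  # /my page

print(same_path("/this%2Dis%2Dmy%2Dproducts%2Dpage",
                "/this-is-my-products-page"))  # True
```

Note that decoding everything is only safe for a comparison like this one; an encoded reserved character such as "%2F" does not mean the same thing as a literal "/", so a real normaliser would have to be more careful.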
There are several reasons for this. The first is the problem with inbound links, which may or may not use the trailing slash. Apache normally issues a 301 redirect automatically to the trailing slash version, but why take the risk? The second problem is local file management: you end up with every file named index.html and a huge, unwieldy directory structure. Finally, Googlebot (and other bots) appears to associate file extensions with file types (logical really), and a .html file therefore has less chance of confusing the spider (there's a Googleguy post somewhere but I can't find it).
[w3.org...]
What to leave out
...
File name extension. This is a very common one. "cgi", even ".html" is something which will change. You may not be using HTML for that page in 20 years time, but you might want today's links to it to still be valid. The canonical way of making links to the W3C site doesn't use the extension.