Do search engines care?
I've been reading many contradictory tidbits on the trailing slash... My question is: do search engines care about the trailing slash, yes or no? That is, will a link on one site to [domain.com...] and a link on another site to [domain.com...] dilute link juice between the / and non-/ versions? Should I 301 the non-/ page to the /-page?
|brotherhood of LAN|
Yes, they will treat the URLs as unique pages.
But some engines like Google are clever enough to check the URLs to see if they're unique enough to warrant being called a unique page.
It doesn't matter for the root of the domain, i.e. domain.com and domain.com/ though it's good practice to use the latter.
The issue is quite similar to www.domain.com and domain.com. It is easy to assume that they also point to the same page, though that's not necessarily true.
Depending on how important the non-/ link is to you, perhaps you can just 404 it instead? If you do that then in future people won't mistakenly link to the wrong URL.
[edited by: brotherhood_of_LAN at 1:16 pm (utc) on Oct. 5, 2008]
I have many outstanding non-/ URLs so I guess the 301 is the most viable option...
If article is a "page" rather than a directory or directory index, then don't add a slash if you want to be HTTP-compliant. Directories and directory indexes have slashes, and extensionless pages do not, so I'd suggest that you 301-redirect the slashed page URLs to the equivalent non-slashed URLs.
When you have this link
Isn't it the browser that adds the trailing slash prior to sending the request?
Or is that just for the domain name?
example.com -> example.com/
The browser should "correct" only the domain index path example.com to example.com/
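To illustrate that point, here is a minimal Python sketch of the path a client would put in its request. The helper name request_path is made up for this example; it only models the rule that an empty path is sent as "/", while any other path is sent unchanged.

```python
from urllib.parse import urlsplit

def request_path(url):
    """Return the path a client would request for the given URL.

    Only an empty path (the bare domain) is turned into "/".
    Other paths, slashed or not, are passed through as written.
    """
    path = urlsplit(url).path
    return path or "/"

print(request_path("http://example.com"))           # /
print(request_path("http://example.com/article"))   # /article
print(request_path("http://example.com/article/"))  # /article/
```

So example.com and example.com/ end up as the same request, but /article and /article/ reach the server as two different URLs.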
jdMorgan, how would you go about rewriting all 'slashed' URLs to non-slashed URLs while taking directories into account? As far as I know, mod_rewrite cannot do that, so the only solution I can think of is to just add a slash to everything that's non-slashed. This may not be HTTP-compliant, but to me it seems like the only way to keep a large dynamic site manageable.
Since the site is dynamic and uses clean URL rewrites, I would not expect any excess burden on the server, since the request is rewritten to a compliant messy URL before actual processing.
As it is non-HTTP compliant, some browsers will take exception to what your site sends to them and will fail to process it properly.
There was an example of that just last week, here in the Apache forum at WebmasterWorld.
The usual approach to implementing extensionless URLs is to use mod_rewrite to detect URL-paths having no extension (e.g. having no period in the final URL-path-part), append ".html" or ".php" to those URLs, and then do a check for "file exists" with that appended extension. If the extensionless URL plus appended filetype exists as a file, then internally rewrite to that file. Otherwise, leave the request alone and let mod_dir try to resolve it as a directory.
It is also possible, though not very efficient, to work through a "priority list" of possible extensions: For example, to try .php, .html, and .htm extensions in order, and rewrite to the first one that exists.
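The lookup described above can be modeled in a few lines of Python. This is only a sketch of the decision logic, not mod_rewrite configuration: the function name resolve_extensionless and the files set are hypothetical, and the set membership test stands in for the "file exists" check that mod_rewrite would do.

```python
from posixpath import basename

def resolve_extensionless(url_path, files, extensions=(".php", ".html", ".htm")):
    """Return the file to rewrite to internally, or None to leave the request alone.

    files is a set of existing file paths and stands in for the "-f" check.
    """
    last_part = basename(url_path)
    # Only handle URL-paths whose final part is non-empty and has no period.
    if not last_part or "." in last_part:
        return None
    # Work through the priority list and take the first extension that exists.
    for ext in extensions:
        candidate = url_path + ext
        if candidate in files:
            return candidate
    # Nothing matched: leave the request for directory handling.
    return None

files = {"/article.php", "/about.html", "/about.htm"}
print(resolve_extensionless("/article", files))    # /article.php
print(resolve_extensionless("/about", files))      # /about.html
print(resolve_extensionless("/style.css", files))  # None
print(resolve_extensionless("/blog/", files))      # None
```

Passing a single extension, for example extensions=(".php",), gives the simpler one-extension case.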
To solve the appended-slash-on-page-URL problem, you can use an external 301 redirect based on opposite and somewhat extended logic: If the slashed URL does not exist as a directory, but does exist as a file when the slash is removed and an extension added, then externally redirect to remove the slash, and let the code described above do the rest. Alternatively, you can let "pages" take precedence over directories if/when a naming collision occurs by simply omitting the directory-exists check.
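Here is the same redirect decision as a small Python model, again a sketch rather than server configuration. The names slash_redirect_target, files, dirs, and pages_win are invented for this example; the two sets stand in for the file-exists and directory-exists checks, and pages_win=True corresponds to omitting the directory-exists check.

```python
from posixpath import basename

def slash_redirect_target(url_path, files, dirs,
                          extensions=(".php", ".html", ".htm"),
                          pages_win=False):
    """Return the URL-path to 301-redirect to, or None if no redirect applies."""
    # Only slashed URLs other than the site root are candidates.
    if url_path == "/" or not url_path.endswith("/"):
        return None
    stripped = url_path[:-1]  # remove the one trailing slash
    last_part = basename(stripped)
    if not last_part or "." in last_part:
        return None
    # A real directory keeps its slash, unless pages take precedence.
    if not pages_win and stripped in dirs:
        return None
    # Redirect only if a page file exists once an extension is added.
    if any(stripped + ext in files for ext in extensions):
        return stripped
    return None

files = {"/article.php", "/blog.php"}
dirs = {"/blog"}
print(slash_redirect_target("/article/", files, dirs))              # /article
print(slash_redirect_target("/blog/", files, dirs))                 # None
print(slash_redirect_target("/blog/", files, dirs, pages_win=True)) # /blog
```

The /blog/ case shows the naming collision: with the directory check in place the directory wins, and without it the page does.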
For more information, see the RewriteCond directive and its "-f" and "-d" tokens in the Apache mod_rewrite documentation.
Google doesn't care if you use the / or not. However, Google highly recommends that you be consistent. Never use both. Choose one or the other. By the way, I got this tip from Google concerning duplicate content. :))