|Force trailing slash in URL?|
My site has a folder-style URL structure, e.g.
All my internal links specify a trailing slash, and I perform a 301 redirect to the same URL + trailing slash, if it wasn't there already.
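The redirect rule described above can be sketched as a small function that decides whether a request needs the 301 and where it should point. This is only an illustration, not the poster's actual setup. The function name `trailing_slash_redirect` is hypothetical, and so is the assumption that any path whose last segment contains a dot (e.g. `/style.css`) is a real file and should not be redirected.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def trailing_slash_redirect(url: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if no redirect is needed.

    Assumption (hypothetical): every extensionless path is a 'folder' that
    should end in '/', while paths whose last segment contains a dot are
    treated as files and left alone.
    """
    parts = urlsplit(url)
    path = parts.path or "/"  # a bare host is the root, which already ends in '/'
    if path.endswith("/"):
        return None  # already canonical, serve the page with 200
    last_segment = path.rsplit("/", 1)[-1]
    if "." in last_segment:
        return None  # looks like a file such as /style.css, do not redirect
    # Same scheme, host and query string, with the slash appended to the path.
    return urlunsplit((parts.scheme, parts.netloc, path + "/", parts.query, parts.fragment))
```

For example, `http://example.com/category/widget` maps to `http://example.com/category/widget/`, and a query string is preserved after the new slash. In a real deployment the same rule usually lives in the web server's rewrite configuration rather than in application code.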
I've also noticed that Google seems to be storing two results for each version (no trailing slash & trailing slash).
What should I be doing? I don't see any other sites having this kind of issue :/
[edited by: tedster at 4:02 pm (utc) on Jul 5, 2010]
[edit reason] make the full example URLs readable [/edit]
This certainly shouldn't happen if there is indeed a permanent redirect from one to the other. I would verify that Google is actually receiving a 301 status: you could use the "Fetch as Googlebot" tool in Google Webmaster Tools (GWT), for instance.
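Outside of GWT, you can also check the raw status code yourself. A browser follows redirects silently, but Python's `http.client` never follows them, so a 301 shows up as a 301. The sketch below is a hedged example: `fetch_status` is a hypothetical helper, and the small local server (`start_demo_server`) is a stand-in for a site that redirects `/folder` to `/folder/`, included only so the example runs without network access.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple


def fetch_status(host: str, path: str, port: int = 80) -> Tuple[int, Optional[str]]:
    """Send one HEAD request and return (status, Location header or None).

    http.client does not follow redirects, so the first response is reported as-is.
    """
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


class _SlashRedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical demo site: 301 anything without a trailing slash."""

    def do_HEAD(self):
        if self.path.endswith("/"):
            self.send_response(200)
        else:
            self.send_response(301)
            self.send_header("Location", self.path + "/")  # relative Location, demo only
        self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass


def start_demo_server() -> HTTPServer:
    """Run the demo server on a free localhost port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), _SlashRedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    srv = start_demo_server()
    port = srv.server_address[1]
    print(fetch_status("127.0.0.1", "/folder", port))   # (301, '/folder/')
    print(fetch_status("127.0.0.1", "/folder/", port))  # (200, None)
    srv.shutdown()
    srv.server_close()
```

Against a live site you would call something like `fetch_status("example.com", "/folder")` and expect `301` with a `Location` ending in `/folder/`. Note that this only tells you what your server sends, not what Googlebot saw.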
Another possibility is that Google is merely manipulating the display URL in SERPs, rather than actually storing a distinct copy. This seems to happen when Google anticipates that a redirected URL will get more clickthroughs for a particular search.
One way to check if this is happening is to use the search syntax below:
In cases where Google has correctly mapped redirects, you'll usually see the canonical URL returned for that query. In those instances, it's a display issue rather than a problem that would affect performance.
The canonical URL for the index page in a folder is example.com/folder/ with a trailing slash, but for an extensionless page the URL is example.com/page without the trailing slash.
The difference is that a folder can contain other pages besides its index page.
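The folder-versus-page rule above can be written as a small normalization function. This is a sketch under stated assumptions: `canonical_path` is a hypothetical name, the caller is assumed to already know whether a path is a folder, and the list of index file names is an assumption that should match your server's directory-index settings.

```python
# Assumption: these are the index file names your server serves for a folder.
INDEX_FILES = ("index.html", "index.htm", "index.php")


def canonical_path(path: str, is_folder: bool) -> str:
    """Folder index -> trailing slash; extensionless page -> no trailing slash."""
    head, _, last = path.rpartition("/")
    if last in INDEX_FILES:
        # /folder/index.html is the folder's index page, so it maps to /folder/
        path, is_folder = head + "/", True
    trimmed = path.rstrip("/")
    if is_folder:
        return trimmed + "/"  # /folder  -> /folder/
    return trimmed or "/"     # /page/   -> /page (the root always stays "/")
```

So `canonical_path("/folder", True)` gives `/folder/`, `canonical_path("/folder/index.html", False)` also gives `/folder/`, and `canonical_path("/page/", False)` gives `/page`.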
|I've also noticed that Google seems to be storing two results for each version |
Are you seeing both URL versions in the actual SERPs - or in some other area of Google reporting?
Rather strange, this one: the URLs I found earlier (using site:http://mysite.tld/categoryname/widgetname/) have since disappeared, and I can find no trace of them after checking several other pages.
I'll double check again tomorrow and update if I find anything.
Think I'll change my brand of coffee.
|Think I'll change my brand of coffee. |
Just make sure you've got the kind with Extra Caffeine...
Google should make more sense that way. :)
On a somewhat related issue, we just started using this new line of code, which Matt Cutts suggested in one of his recent videos:
<link rel="canonical" href="http://example.com/index.html"/>
I'm wondering whether it should be used with the trailing slash, as we're doing now, or without it, or whether it makes no difference either way. Thanks.
A canonical link should handle the trailing slash issue very well. But it relies on Google processing it without problems, so it's better to configure your server properly too.
I do like the canonical link in addition, because it catches so many edge cases - and you can barely imagine all the crazy things that might happen.
If you're talking about this trailing slash (the "/" just before the closing ">"):
<link rel="canonical" href="http://example.com/index.html"/>
Then you should not be using it unless you're using XHTML syntax in your markup. (Note also that this would not then be the same issue as the original poster above.)
Also, in your example, it is unlikely that your canonical URL would include the "index.html" part, and you may need a "www" in the URL too. So tread carefully so as not to create problems for yourself - first decide what the canonical link for the page should really be.
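Once you have decided what the canonical URL should be, generating the tag is mechanical. The sketch below assumes you want to drop the index file name and force one chosen host (www or non-www); both choices are assumptions to adjust, and `canonical_link_tag` is a hypothetical helper. The `xhtml` flag only controls the self-closing " />" discussed above.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(url: str, host: str, xhtml: bool = False) -> str:
    """Build a rel="canonical" link element for url on the chosen host.

    Assumptions: url is absolute, the index file name should be dropped,
    and `host` is whichever of www / non-www you picked as canonical.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    for name in ("index.html", "index.htm", "index.php"):
        if path.endswith("/" + name):
            path = path[: -len(name)]  # /folder/index.html -> /folder/
            break
    href = urlunsplit((parts.scheme or "http", host, path, parts.query, ""))
    closing = " />" if xhtml else ">"  # trailing slash only for XHTML syntax
    return f'<link rel="canonical" href="{escape(href)}"{closing}'
```

For example, `canonical_link_tag("http://example.com/index.html", "www.example.com")` returns `<link rel="canonical" href="http://www.example.com/">`, while passing `xhtml=True` produces the `" />"` form.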
Yes, that is the slash I meant. I think we mostly use it without XHTML, and I don't recall Matt Cutts mentioning that aspect. Also, we basically never use www with any pages, links, or sites, since we decided to go nearly 100% non-www several years ago. I also think Matt said the index.html (or whatever the index file name is, such as .htm or .php) should be included in the URL (but I'm not positive). What issues would all of that cause?
No trailing slash on your home page is fine, browsers append it anyway and both give a 200 OK code.
A trailing slash is appropriate after any category or sub-level in the url.
No trailing slash, or even better a .htm or .html (or .php, etc.), is fine at the end of the full URL for your articles; there is no valid reason to make your final articles look like categories.
Just an added note for WordPress users: WordPress categories are (still) broken in the latest release. By default, all category URLs contain /category/, and even if you remove that or replace it with a real category name, it remains broken: visiting example.com/category/ returns a 404 error. That is a fairly significant oversight (though it is probably known to, and accounted for by, search engines).
It is possible to fix WordPress categories, but it requires several modifications.
The / you are talking about doesn't really matter... Using dir/index.html vs dir/ doesn't really matter as long as only one is accessible... Using www. or non-www. doesn't really matter... I would say:
Google doesn't really care if your html / xhtml tags are closed correctly, so the canonical reference should have the same effect with /> or with >.
/index.html is some extra unnecessary characters, and I don't think it looks as good or professional personally, so I don't use it. (Of course, I don't ever use a trailing / at all in my URLs either, because although some may disagree, IMO it's just an extra character that the visitor doesn't really need, so my URLs end with no trailing /.)
Using non-www. is something I decided to do on one site, and I'm going to keep it that way because I have other domains I can serve graphics from. But I probably won't do it again, because one of the best reasons I've heard for adding those extra characters to the front of the domain name is that it lets you use the non-www. version as a cookieless domain for serving images. It doesn't work the other way round. If you set a cookie on the non-www. domain, it is often sent by the browser to every subdomain with every request as well (for example, whenever it is set with Domain=example.com), even when it isn't needed. But if you add the 4 extra characters to the front of the domain, the cookie is not sent to the non-www. version (unless you specify .example.com when setting the cookie). So you can actually speed things up a bit by serving 'cookie unnecessary' files from the non-www. version while serving the 'cookie needed' pages and files from the www. version.
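The cookie scoping argument can be illustrated with a simplified version of the domain-matching rule from the cookie specification (RFC 6265). This sketch ignores paths, the Secure flag, and public-suffix checks, and the name `cookie_sent` is hypothetical. `host_only=True` models a cookie set without a Domain attribute, which browsers following RFC 6265 return only to the exact host that set it; `host_only=False` models a cookie set with `Domain=...`, which also goes to every subdomain.

```python
def cookie_sent(request_host: str, cookie_domain: str, host_only: bool) -> bool:
    """Simplified RFC 6265 check: would the browser attach this cookie?

    host_only=True:  cookie set without a Domain attribute; exact host only.
    host_only=False: cookie set with Domain=cookie_domain; that host and
                     every subdomain of it.
    """
    request_host = request_host.lower()
    cookie_domain = cookie_domain.lower().lstrip(".")  # ".example.com" == "example.com"
    if host_only:
        return request_host == cookie_domain
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)
```

A cookie scoped to `www.example.com` is never sent to `example.com`, even with a Domain attribute, so `cookie_sent("example.com", "www.example.com", host_only=False)` is `False`. A cookie with `Domain=example.com`, by contrast, reaches `static.example.com` and any other subdomain.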
|No trailing slash on your home page is fine, browsers append it anyway and both give a 200 OK code. |
To clarify this, a trailing slash is a requirement for the root of the site: the request has to name a path, and for the root that path is just "/". Without it there is nothing to request and nothing to retrieve. This is why browsers are designed to append it for you automatically.
It's related to the way web requests work. Visiting www.example.com/test in a typical browser results in the request below:
GET /test HTTP/1.1
So, for the root, without the trailing slash there would be nothing to put after GET - no request!
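The HTTP/1.1 specification (RFC 7230, section 5.3.1, since replaced by RFC 9112) states this directly: if the URL's path is empty, the client must send "/" as the path. A minimal sketch of how a client builds the request line (`request_line` is a hypothetical helper, and the URL must include its scheme so that `urlsplit` can find the host):

```python
from urllib.parse import urlsplit


def request_line(url: str, method: str = "GET") -> str:
    """Build the HTTP/1.1 request line a client would send for an absolute URL."""
    parts = urlsplit(url)
    target = parts.path or "/"  # empty path (bare host) must be sent as "/"
    if parts.query:
        target += "?" + parts.query
    return f"{method} {target} HTTP/1.1"
```

`request_line("http://www.example.com/test")` gives `GET /test HTTP/1.1`, and both `http://www.example.com` and `http://www.example.com/` give `GET / HTTP/1.1`.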
So, your browser will automatically add a slash if you request the root, otherwise nothing would happen ;)
For this reason, it's good practice to include the trailing slash when linking to the root: the link then matches exactly what the browser will request, and every link to your home page uses one consistent URL.