|Are URL structure and file names important for usability?|
In a recent discussion about using dashes versus underscores for file names, one of the reasons given for favoring hyphens is that it makes the URL more usable, because the underscore could be mistaken for a space in underlined links. But does the URL structure really have any influence on the usability of a site?
For the above example, the usability "problem" comes about when a live link is styled with an underline. But real users will just click - so it doesn't matter what it looks like. Let's face it, how many sites do you navigate via hacking the URL? How many inexperienced users (who are the ones who are supposedly confused) navigate via URL hacking? It just doesn't happen.
Fancy URLs are only useful for SEO. End users never type in keywords at the end of domain names - some may type in domain names, but that's all. They don't care if your URL is
example.com/script.aspx?variable=keyword&variable2=keyword2, they just click links.
So are URLs a non-issue for usability? What do you think?
|So are URLs a non-issue for usability? What do you think? |
I think they can be a major usability issue, particularly when it comes to email and other devices where long URIs can become a problem.
Yes, the underscore is an issue in links. Print a page that has raw links with underscores and what do you see? A lot of links with what appear to be spaces and not underscores.
Using underscores also creates one more keystroke for the user, the shift key. When I say user in this instance, I'm referring mostly to the developer/maintainer of the site.
|Fancy URLs are only useful for SEO. |
Let's define Fancy URI. If it's more than a certain character length, then it is too fancy. If it contains too many delimiters, it is too fancy. If it doesn't properly describe the resource in a short and succinct manner, it is too fancy. If it contains a file extension, you may not be thinking ahead.
I've just found that working with clean URIs makes a site so much easier to manage from a variety of standpoints, from usability at the consumer level to usability at the programming level. Everybody wins.
Let me continue playing devil's advocate for a while. ;)
|email and other devices where long URIs can become a problem |
The only real problem on the email side is the 80-character limit. Users are still clicking, not typing. It doesn't need to be readable, it just needs to be clickable.
What real, concrete problems occur with other devices, or is it all just false assumptions? How many user agents can't handle URIs with variables, for example on a site such as a search engine?
|Using underscores also creates one more keystroke for the user, the shift key ... I'm referring mostly to the developer/maintainer of the site. |
For the developer, maybe, but that's not a usability problem. :)
I'm not saying that clean URIs aren't better for developers, or better for search engines, just that it is not important for users.
|Let me continue playing devil's advocate for a while. ;) |
|Just that it is not important for users. |
How about yourself? Do you type URIs into the address bar? Are we not users? ;)
What about URIs in print? Specific promotions? Etc...
Here are two great articles on URIs...
Towards Next Generation URIs
Jakob Nielsen's Alertbox, March 21, 1999: URL as UI
[edited by: pageoneresults at 1:12 am (utc) on June 28, 2006]
|Do you type URIs into the address bar? |
It's extremely rare that I type in anything other than a domain name into the address bar. The exceptions are for...
|What about URIs in print? Specific promotions? |
My local radio station has an associated website, and each program has its own section. They use one-keyword redirects to the "real" URL, so:
You can see the same approach from big names:
leads to the "real" URL:
So you can set up redirects for specific campaigns, but the real underlying structure doesn't need to be "clean" to be usable.
Of course, you can't always depend on this approach:
So you have to know the shortcuts exist to use them. Outside of the aspect given above (for type-ins from radio or print ads), there is no real use for URL shortcuts. If you can't find the page or section by clicking via the navigation or searching the site, then URL-guessing is a desperate last resort, and an indication of a usability failure. Most users would give up before trying to guess the link; inexperienced users don't use the address bar for navigation at all - at best they use it as a search box.
Nielsen's article is mostly about domain names, not file names. Domain name choice is also important, but branding is much more popular than keyword domains, which are supposedly more usable as they are more obvious.
|So you can set up redirects for specific campaigns, but the real underlying structure doesn't need to be "clean" to be usable. |
I guess I have a much different perspective on this. To me, the underlying structure should be no different than the visible one, short and sweet. The redirects are an added layer that I don't see as being needed. Not for the average site anyway. MS and the other big names are a different story. And, those redirect setups have caused issues for some, particularly 302s.
In working with clients during development, I'm regularly asking them to browse to a particular area of the site. Most of my long term clients are so used to the structures of their sites, they know they can type a specific URI path and find their destination in most instances.
URI structure defines the site. It's the map to and through your site. Usability comes into play in many instances. Not only from the consumer side, but from the development side. Usability isn't just for the consumer. ;)
I work in a fairly large Japanese corporation and have noticed a disturbing trend that is lessening, but refuses to disappear. That is the printing of e-mail messages. This is not just older staff that "don't get the Internet". I see staff in their 20's doing the same thing. Every morning they dutifully print out all of their mail (spam included). Links in these messages are then typed back in if access is required. I still get calls from people who can't access a URL and inevitably it's an underscore/space issue.
This company and a lot of similar companies in Asia also still rely heavily on faxes. From time to time we get long URLs faxed to us on order sheets and inquiries. Needless to say, it can be difficult to determine what a URL is at all if the transmission gets garbled or the fax paper is fed in at an angle.
These are just two real-world examples where I still see URL structure being an issue.
Draft - Make readable URIs
|Machine processing and human comprehension and reproduction of URIs are, however, not antithetical. Similarly, the "opacity by design" of URIs does not mean they must be overly complicated. On the contrary: well designed, user-friendly URIs can result in easier management and an increased readership. |
Common HTTP Implementation Problems
|Do not put too much meaning in a URI. As Berners-Lee writes, designing mostly means leaving information out. If you put too much meaning, too much semantics in your URI, chances are your resource will evolve outside of the semantic frame, resulting in an unnecessary division of the resource or change of URI. Use simple URIs, easy to type, write down, spell, or at least easy to cut and paste. They are likely to be easy to remember if you follow this rule. |
Okay, I'm not going to let this topic slide that easily. Back to discussion... ;)
A little off topic, but it was brought up by someone else:
In the 'real world' users just don't hack URLs.
I've even got employees that annoy the heck out of me by using search engines instead of the address bar when they know the URL they want to visit. I fume inside when I see that, but it serves well to remind me that I should try harder to put myself in the place of a newbie - it's so easy to assume people know things. The average joe knows diddly squat about how to use a browser, what a dodgy site is, why they should be suspicious of certain things and why clicking 10 times on something will not make it arrive sooner. (sorry for the rant)
I fail to see why we're having this discussion :) For SEO, it's well known that friendly URIs help. For development, a decent URI structure replicates the internal IA. For users, it helps when sending emails, and for the few of us (like me) who do hack URIs. I just can't see any downsides.
What really matters is the end user
Most of my MySQL fields use an " _ "
So out of reflex I use it in naming sub_dir and some pages.
As mentioned earlier it can be lost in an email address underlined.
Instead of changing my habits (I know...)
I always add a PS) "empty spaces are HYPHEN"
Well, it just shows that hyphen might not be that good of a naming practice :)
|I fail to see why we're having this discussion :) |
I started it for several reasons, including the importance of challenging received wisdom, clarifying the relative importance of URL structure compared to other aspects of usability, and simply to provoke a reaction by playing devil's advocate. ;)
|the underlying structure should be no different than the visible one |
One aspect of the development side is that, as the tendency is to have database-driven sites running through a single script, the URI structure has no direct relation to the underlying file system. Site structure is therefore defined by navigation rather than by URI.
So, what's a "usable" URI? Is this usable?:
Is this better?:
The length of that example URL would be a problem for me. I often find myself apologising for 'not having the time to give a shorter answer', meaning that it's far more difficult to say something in a few words than it is to express it in several paragraphs. Nice URLs are just the same: they should be concise but include information that users and search engines can easily understand.
The shorter URL is also problematic: there is no information that lets a user 'guess' what the page is about.
|Is this usable?: example.com/forum116/83.htm |
Yes, that is a good usable URI for the type of environment it is used in.
|Is this better?: example.com/Accessibility_Usability/URL-structure-and-file names-important-for-usability |
Personally, the second example doesn't do much for me from a usability and visibility perspective. For one, it is using mixed case. Two, there are 6 hyphens in that file name, yikes! Three, the length of the URI concerns me.
And since your second example uses user-entered data, it opens up a can of worms when it comes to typos, grammar, special characters, etc. Think long-term management in this case. I know where you're going with this, and I can tell you that I've been known to go back through old topics and correct titles, descriptions, etc. That particular URI structure is determined by data I may be changing.
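To make the management problem concrete, here is a minimal Python sketch. The slugify helper is hypothetical (nothing in this thread says which forum software or naming routine is in use); it simply assumes the file name is built from the lower-cased words of the topic title, joined with hyphens. Correcting a typo in the title changes the resulting file name, so the old address would need a redirect to stay reachable.

```python
import re
import unicodedata


def slugify(title):
    """Hypothetical helper: build a hyphenated file name from a topic title."""
    # Strip accents and any other non-ASCII characters.
    ascii_text = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Keep runs of letters and digits; punctuation and spaces are dropped.
    words = re.findall(r"[a-z0-9]+", ascii_text.lower())
    return "-".join(words)


original = slugify("URL structure and file names: important for usabilility?")
corrected = slugify("URL structure and file names: important for usability?")

print(original)
print(corrected)
print("URI changed after title edit:", original != corrected)
```

If the file name is derived from an ID instead (as in the shorter example), editing the title never touches the URI.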
My idea of the perfect URI using the examples above...
And then I might look at...
A sub-domain approach may be an option for a site large enough to be broken down "naturally".
And, if the primary forum name needs to be used...
I'm slightly allergic to trailing slashes on URIs unless they map to a genuine physical directory. Otherwise you can get problems if your mod_rewrite rules don't take into account that you can get inbound links (eg. from Yahoo Search) without the trailing slash. So I would consider the following:
or closer to root (FWIW):
to be better solutions. I admit that I am old-fashioned and I still favor a generic file extension (.htm or .html) which leaves no doubt at all - much like the URI of this thread.
I should add that the above is not really a critique of the WebmasterWorld URI structure, but rather the "blog" style of URIs which end up being a long string of words separated by hyphens - which add nothing to the site usability and are just there for SEO.
Do you think that user-generated content is usually better without keyword-rich URIs? What matters most? Brevity? Complexity? Clarity? Is there a conflict between usable URIs and SEO-friendly URIs?
|I'm slightly allergic to trailing slashes on URIs unless they map to a genuine physical directory. |
Hmmm, I've had no allergic reactions so far. ;)
|Otherwise you can get problems if your mod_rewrite rules don't take into account that you can get inbound links (eg. from Yahoo Search) without the trailing slash. |
I think in this instance that Apache's Content Negotiation comes into play.
|So I would consider the following: example.com/forums/116/83 or closer to root (FWIW): example.com/forum116/83 to be better solutions. |
I would too if I knew the future didn't hold a possible horizontal growth to other primary categories like example.com/blogs/ and example.com/faqs/, example.com/articles/.
I've found over the years that it doesn't matter where the content resides. It's how the links are mapped to that content. It can be six levels deep and if there is a link to it from the home page or another primary category entrance page, it's going to be indexed as if it were sitting at the root. This is also all relative to the current PR of the site, its age, the crawl routines of the spiders, etc. Think...
Harnessing and Funneling Link Juice
|I admit that I am old-fashioned and I still favor a generic file extension (.htm or .html) which leaves no doubt at all - much like the URI of this thread. |
I used to think the same way until I had to change the underlying technology of a site and deal with the intricate issues of regular expressions and pattern matching which your typical webmaster doesn't get involved with. I surely didn't want to lose all of those indexed .htm pages.
|Blog style of URIs which end up being a long string of words separated by hyphens - which add nothing to the site usability and are just there for SEO. |
Matt Cutts's Blog is a prime example of this. Personally I wouldn't go near a URI structure like that. And Matt could do whatever he wants to with his Blog as it won't really have any major impact. He could use underscores, hyphens, periods, whatever and with the sheer number of inbound links powering that blog, URI structure wouldn't be a major issue.
How many others do you think have followed in his footsteps? ;)
|Do you think that user-generated content is usually better without keyword-rich URIs? |
|What matters most? Brevity? Complexity? Clarity? |
Brevity and clarity. Removing the complexity is the end goal.
|Is there a conflict between usable URIs and SEO-friendly URIs? |
What is an SEO-friendly URI? Is it a keyword laden, hyphenated structure? Or, is it a natural path that mimics the exact architecture of the site using brevity and clarity?
The above is a good example of brevity and clarity. The problem comes in when you have names that use two or three words. I typically prefer single word URI structures. I've moved away from using hyphens where possible and have done away with actual file extensions in most instances (the old fashioned way). I'm currently studying the implementation of Content Negotiation on my Windows Servers and will eventually migrate to that platform once I feel comfortable with it.
The trailing slash is redundant for a rewritten URI, and the Apache negotiation only works if there is a physical directory with the name of the slash-less filename. If you are using mod_rewrite for a URI example.com/word/ and don't take into account that Yahoo (or type-ins) leave out the trailing slash, then calls for example.com/word will lead to a 404 Not Found. I have personally been caught out by this when using a rule similar to this:
RewriteRule ^word([0-9]+)-([0-9]+)/$ myscript\.php?cat=$1&page=$2 [QSA,L]
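The effect can be reproduced outside Apache. What follows is only a sketch in Python, assuming its re module treats this particular pattern the same way mod_rewrite does and that the path is matched without a leading slash (as in a per-directory context). The rewrite helper and the LENIENT variant, which makes the slash optional with "/?", are illustrative assumptions and not part of the rule above.

```python
import re

# Pattern copied from the rule above: the trailing slash is required.
STRICT = re.compile(r"^word([0-9]+)-([0-9]+)/$")
# Hypothetical variant: "/?" accepts the path with or without the slash.
LENIENT = re.compile(r"^word([0-9]+)-([0-9]+)/?$")


def rewrite(pattern, path):
    """Return the internal target for path, or None if the rule does not fire."""
    match = pattern.match(path)
    if match is None:
        return None  # no rule matched; the server would answer 404
    return "myscript.php?cat={}&page={}".format(*match.groups())


for path in ("word116-83/", "word116-83"):
    print(path, "| strict:", rewrite(STRICT, path), "| lenient:", rewrite(LENIENT, path))
```

With the strict pattern the slash-less request falls through, which is the 404 described above. The lenient pattern serves both forms, at the cost of having the same content reachable at two addresses.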
So either no extension (and no unnecessary trailing slash) or an explicit file extension is probably best. The trailing directory slash is really just as much "legacy thinking" as using .htm. :)
So we are left with:
We get a short, robust, usable URI, but one which may be suboptimal for indexing purposes. I find that the SEO and technical issues as mentioned above tend to take precedence over the usability aspect in most cases, as the importance of URI usability is eclipsed by the importance of indexation.
|The trailing slash is redundant for a rewritten URI, and the Apache negotiation only works if there is a physical directory with the name of the slash-less filename. |
No, the trailing slash is mandatory for a rewritten URI if you are using a sub-directory URI naming scheme.
|If you are using mod_rewrite for a URI example.com/word/ and don't take into account that Yahoo (or type-ins) leave out the trailing slash then calls for example.com/word will lead to a 404 Not Found. |
They shouldn't, or at least that has been my experience. Since I've not delved into the Content Negotiation aspects of this, I cannot comment on the absence of the trailing forward slash in that type of scenario. But, my research tells me that with the Content Negotiation in place, there should be no issues. You can have both addresses indexed with their respective content.
Also, I found that when working with rewrites, you need to hack the URI and back your way through it to be sure that all possible paths in that URI are returning the proper responses.
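A rough sketch of that "back your way through it" check, in Python. The parent_uris helper is a hypothetical name; it only lists each shorter path of a URI, one segment at a time. You would then request each one and compare the response with what you expect for that level.

```python
from urllib.parse import urlsplit, urlunsplit


def parent_uris(uri):
    """List every ancestor path of uri, from the deepest up to the site root."""
    parts = urlsplit(uri)
    segments = [segment for segment in parts.path.split("/") if segment]
    ancestors = []
    while segments:
        segments.pop()  # drop the last path segment
        path = "/" + "/".join(segments)
        if segments:
            path += "/"  # treat intermediate levels as directories
        ancestors.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return ancestors


for ancestor in parent_uris("http://example.com/forums/116/83"):
    print(ancestor)
```

Whether the intermediate levels should carry a trailing slash is exactly the point under debate in this thread, so treat that line as an assumption and adjust it to your own scheme.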
|So either no extension (and no unnecessary trailing slash) or an explicit file extension is probably best. |
Explicit file extension is fine. The issues come in when you change the underlying technology of the site. For example, from .htm to .asp. And yes, I understand the whole concept of parsing htm as asp, etc. But, that doesn't properly address inbound links to the old technology.
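One way to think about the inbound-link problem is as a lookup from the old, extension-bearing address to an extension-free one that survives a change of technology. The Python sketch below is an assumption-laden illustration: the LEGACY_EXTENSIONS set and the redirect_target helper are made up for this example, and it says nothing about how the redirect itself would be issued on a given server.

```python
import posixpath

# Extensions assumed to belong to past or present technologies of the site.
LEGACY_EXTENSIONS = {".htm", ".html", ".asp"}


def redirect_target(path):
    """Return the extension-free path to redirect to, or None if none is needed."""
    root, extension = posixpath.splitext(path)
    if extension.lower() in LEGACY_EXTENSIONS:
        return root
    return None


for old_path in ("/forum116/83.htm", "/forum116/83.asp", "/forum116/83"):
    print(old_path, "->", redirect_target(old_path))
```

Both old addresses end up at the same extension-free URI, so inbound links to either technology keep working after the switch.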
|The trailing directory slash is really just as much "legacy thinking" as using .htm. :) |
I may be missing something but I don't see any legacy thinking in the use of sub-directories. The trailing forward slash indicates that there is a root level page at that address and reduces one call to the server to append the trailing forward slash.
If Yahoo! does not properly handle URIs with and without trailing forward slashes, then they have a technical problem on their end. I've not run into any problems with Yahoo! And then again, I'm not using Content Negotiation to strip file extensions. It would be nice to hear from someone who has experience in this exact scenario.
|We get a short, robust, usable URI, but one which may be suboptimal for indexing purposes. |
Actually I find both of the above examples extremely optimal for both usability and SEO purposes.
|I find that the SEO and technical issues as mentioned above tend to take precedence over the usability aspect in most cases, as the importance of URI usability is eclipsed by the importance of indexation. |
This all comes down to the level of understanding from the person structuring the URIs. When you say indexing, the first thing I think of is how clean the URI is, the depth of the content and how the site is structured.
Keyword laden URIs may cause problems. There is a point when you cross a line and possibly raise a flag or two. If you keep them short, succinct and without all the other fluff, then you have a URI structure that is perfect for usability and indexing.
P.S. When it comes to rewrites, we get into a lot of technical issues that many overlook. For example, I was reviewing a rewrite of another site not long ago. I started to hack the URI and each time I removed one character from the string, I'd get a custom 404 returning a 200 status and the person was wondering why their pages were not getting indexed properly. They had an absolute mess on their hands with 200 status codes being returned for almost everything.
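The character-by-character test described in this P.S. can be sketched in Python. Everything named here is hypothetical: find_soft_404s takes a fetch_status callable standing in for a real HTTP request, and the two fake servers exist only so the example runs without a network. Addresses listed in known_good are skipped because a 200 is the right answer for them.

```python
def find_soft_404s(path, fetch_status, known_good):
    """Trim path one character at a time; collect bogus variants answered with 200."""
    suspicious = []
    candidate = path
    while len(candidate) > 1:
        candidate = candidate[:-1]  # remove one character from the end
        if candidate in known_good:
            continue  # a real page, 200 is expected here
        if fetch_status(candidate) == 200:
            suspicious.append(candidate)
    return suspicious


REAL_PAGES = {"/", "/forums/", "/forums/116/"}


def healthy_server(path):
    """Fake server that answers 404 for anything that is not a real page."""
    return 200 if path in REAL_PAGES else 404


def broken_server(path):
    """Fake server whose custom error page is sent with a 200 status."""
    return 200


print("healthy:", find_soft_404s("/forums/116/", healthy_server, REAL_PAGES))
print("broken: ", find_soft_404s("/forums/116/", broken_server, REAL_PAGES))
```

On the healthy server the list comes back empty. On the broken one every truncated address is reported, which matches the "200 for almost everything" mess described above.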
Would be my ultimate choice. But, if an existing site follows a structure that doesn't use a keyword naming structure, it's not a big deal as long as all other factors have been addressed. It's just one part in the overall equation. You can make up for it in other areas.