You might look at this thread [webmasterworld.com] or do a site search on " file - space" without the quotes.
Start here [webmasterworld.com], and follow the first link in that thread to find what nancyb was talking about, where GoogleGuy recommends hyphens.
- cow horse goat.html or
- cow%20horse%20goat.html?
These are one and the same: naming a file with spaces produces the %20 in a browser. Don't use either one.
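To see why the two spellings are the same file, here is a small Python sketch of percent-encoding using the standard library's urllib.parse. The file name is the one from the question; nothing here is specific to any browser or search engine.

```python
from urllib.parse import quote, unquote

# The file name as it would be stored on disk, spaces included.
name = "cow horse goat.html"

# Percent-encoding turns each space into %20, which is what ends up in the URL.
encoded = quote(name)
print(encoded)  # cow%20horse%20goat.html

# Decoding the URL form gives back the original name, so both refer to one file.
print(unquote(encoded) == name)  # True
```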
IMO, dashes are better than underscores for your readers because they aren't disguised in an underlined hyperlink if your link happens to get copied and pasted into an e-mail, for example.
I also think conservative use of the dash is more eye-catching in the SERPs.
If any of your examples were to give a slight boost for a particular search engine, from what I've read, it would typically be the dashed example.
added: beg your pardon, my points are covered in those threads, damned phone.
[edited by: skipfactor at 10:42 pm (utc) on June 25, 2003]
Having turned one of my sites into a file structure where the names are based on keywords (though not obsessively), I've found that the site-map in Dreamweaver is easy on the eye, and the pages that google offers up look to have welcoming filenames...
Better to click on
order-widget.html
than
sales.html
and the google listing definitely looks more inviting as a result...
DerekH
[december.com...]
www.yourdomain.com/cow_horse_goat.html
is better.
Never leave spaces in between words.
Use lower case to comply with XHTML in the future.
"You can name a file in Unix using up to fourteen characters in any combination..." If you follow this recommendation, your life will be easier when you make back ups on CDs.
If you need to extend the file name beyond 14 characters, then create a sub-folder: cow_horse_goat
www.yourdomain.com/cow_horse_goat/cat_dog.html
There's a simple reason why the most basic file naming system is recommended with Unix. Without going into technicalities, most activities in Unix are done through what is called 'shell scripting', so it is crucial to be very careful with your file names, or it can crash your system.
A hyphen/dash (-) placed at the front of a file name, e.g.
-widget.sh
can be misinterpreted by Unix as a command-line option.
Thus it became a practice among Unix power users not to include a hyphen/dash (-) in their file names, so as to keep command options separate from file names.
This became a sort of mantra among Unix programmers and even spilled over into the Perl community, since most Perl programmers happen to be Unix users as well.
It became the de facto standard for naming files, such that blue widget becomes blue_widget.
The truth is you can use a hyphen/dash (-) in file names even in Unix, as long as the dash is not at the front of the file name; thus blue widget can be named
blue-widget.
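As a rough illustration of the leading-dash problem, the Python sketch below prefixes "./" to any relative name that starts with a dash, so a command-line tool receives a path and not something that looks like an option. The helper name shell_safe is hypothetical, and this is only one common workaround, not a complete treatment of shell quoting.

```python
def shell_safe(filename: str) -> str:
    """Return a form of a relative file name that cannot be mistaken for an option."""
    # A name such as "-widget.sh" looks like an option to most command-line tools;
    # "./-widget.sh" names the same file but starts with a path component.
    if filename.startswith("-"):
        return "./" + filename
    # A dash anywhere else in the name is harmless, so leave it alone.
    return filename


print(shell_safe("-widget.sh"))   # ./-widget.sh
print(shell_safe("blue-widget"))  # blue-widget
```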
As far as Google is concerned, it certainly reads blue-widget as 'blue' and 'widget'. Just for fun, search for "_" (without the quotes) and you will get millions of results, but search for "-" (without the quotes) and you get nothing, nada, zip. I think that's proof enough of how Google treats the hyphen and the underscore.
As far as ranking is concerned, whatever advantage a URL gets from having '-' instead of '_' is negligible. I would give it a very small weight when optimizing a site or a page.
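The same split shows up in ordinary regular expressions, where the underscore counts as a word character and the hyphen does not. The Python sketch below is only an analogy; it is not Google's tokenizer, and the function name words is made up for the example.

```python
import re


def words(text: str) -> list:
    """Split text into runs of word characters."""
    # \w matches letters, digits and the underscore, so "_" stays inside a token
    # while "-" ends one.
    return re.findall(r"\w+", text)


print(words("blue-widget"))  # ['blue', 'widget']
print(words("blue_widget"))  # ['blue_widget']
```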
Cheers
[webmasterworld.com...]
a [cough] year before googleguy recommended them :)
I don't think it is given as much weight as it was before. I'm 90% sure it doesn't make more than 1% of a difference, but why take the chance?
"For complete file name compatibility keep file names to 8.3 and only A..Z, 0..9, _ and - and one period '.'. For file name compatibility and easy access for MacOS, Unix and Windows 95/98 NT/2000 keep the file names down to 31 characters and limit the characters to only A..Z, 0..9, underscore (_), dollar sign ($), tilde (~), exclamation point (!), number sign (#), hyphen (-), parenthesis (), and apostrophe (') with NO spaces." [imagemontage.com...]
So, you'll be fine by using:
cow-horse-goat.html
cow_horse_goat.html
CowHorseGoat.html
W3C uses URLs like the third one on its Web site.
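For anyone who wants to check names against the 31-character rule quoted above, here is a minimal Python sketch. It assumes lowercase letters are acceptable alongside A..Z and that the period before the extension is allowed; the names ALLOWED, MAX_LENGTH and is_portable are made up for the example.

```python
import string

# Characters listed in the quoted 31-character rule. Accepting lowercase letters
# and the period before the extension is an assumption of this sketch.
ALLOWED = set(string.ascii_letters + string.digits + "_$~!#-()'.")
MAX_LENGTH = 31


def is_portable(filename: str) -> bool:
    """True when the name is non-empty, at most 31 characters, and uses only allowed characters."""
    return 0 < len(filename) <= MAX_LENGTH and set(filename) <= ALLOWED


for name in ["cow-horse-goat.html", "cow_horse_goat.html",
             "CowHorseGoat.html", "cow horse goat.html"]:
    print(name, is_portable(name))
```

The three recommended forms pass; the version with spaces does not.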
Something else comes to mind when asked about the structure of the file name. Does anyone else think that the file extension should be dropped as well? eg:
/cow-horse-goat
I'm wondering if this has any pluses or minuses with regard to Google?
I never thought much about dropping file extensions until I read an article by Tim Berners-Lee, called Cool URIs don't change [w3.org]. My whole outlook on URI design (yes design) changed after this.
You can use mod_rewrite to hide the extensions, so you can still have all your files with .html on the file system. Plus it lets you change the underlying technology as needed and keep your URIs the same.
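The idea behind hiding the extension can be sketched in a few lines of Python. This is not mod_rewrite syntax, just an illustration of the mapping from a public, extensionless URI to the file kept on disk; the function name resolve and the default ".html" extension are assumptions for the example.

```python
from pathlib import PurePosixPath


def resolve(uri_path: str, extension: str = ".html") -> str:
    """Map a public URI path without an extension to the file name stored on disk."""
    path = PurePosixPath(uri_path)
    # Leave the site root and anything that already carries an extension alone.
    if not path.name or path.suffix:
        return str(path)
    # Otherwise append the extension the files actually have on the file system.
    return str(path.with_suffix(extension))


print(resolve("/cow-horse-goat"))       # /cow-horse-goat.html
print(resolve("/cow-horse-goat.html"))  # /cow-horse-goat.html
```

If the site later moves to another technology, only the extension argument changes and the public URIs stay the same.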
I hope your question gets answered soon. Be well.
You can verify it.
We know DMOZ has URLs like this:
[dmoz.org...]
Yet doing an AllInURL search using dmoz.org and the word "survival" does not yield that page:
link [google.com]
Perhaps GoogleGuy could look into this -- personally I think it's either a bug or an oversight, since underscores are almost always counted as spaces. Also, I believe it is a standard convention that underscores are the "official" space replacement when you cannot use a space for whatever reason.
[edited by: Brett_Tabke at 2:12 pm (utc) on June 27, 2003]
[edit reason] fix long url [/edit]
For instance "Catherine Zeta-Jones" (just an example -- not sure if she uses a hyphen). If you need to write her name without any spaces and used "Catherine-Zeta-Jones", it'd be unclear what purpose the hyphens serve. Are they how she writes her name? Or are they there as space substitutes? Who knows. In contrast, if you wrote "Catherine_Zeta-Jones" it's quite obvious why the hyphen is there. This concept would be true for any and all hyphenated words.
But thanks for clarifying Google's position. I'd rather disagree with it than not know what it is.
Oh... on the topic of product searches, I think it is far, far more common for a hyphen to exist in a model name/number than an underscore. The same would be true for things like phone numbers. In fact I can only think of one other use for an underscore other than as a space substitute -- that is to indicate text that should be underlined when dealing with a vanilla ASCII text file.
In any case, when I need to search for an exact product name/model I'll quote it if it uses a hyphen so Google doesn't parse the model name apart.
in RL, - is used to join words into one, e.g. dark-blue - where the word means neither blue nor dark, only dark-blue.
whereas, _ is used to replace spaces by people making yahoo and hotmail accounts the world over, aint that so mr joe_bloggs@hotmail.com?
1. Follow Unix file name rules:
[december.com...]
2. The following options are acceptable:
cow_horse_goat.html
cow-horse-goat.html
CowHorseGoat.html
3. Never leave spaces in between words.
4. Use lower case preferably.
5. "You can name a file in Unix using up to fourteen characters in any combination..." If you follow this recommendation, your life will be easier when you make back ups of your site on CDs. Think of your Web hosting company at the time of back ups.
6. If you need to extend the file name beyond 14 characters, then create a folder such as: www.yourdomain.com/cow_horse_goat/cat_dog.html
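Rules 2 to 4 above (no spaces, lower case, a hyphen or underscore between words) are easy to automate. Here is a minimal Python sketch; the function name make_filename and its default separator and extension are assumptions for the example, and it does not enforce the 14-character limit.

```python
import re


def make_filename(title: str, separator: str = "-", extension: str = ".html") -> str:
    """Lowercase the title and join its words with a separator, dropping spaces and punctuation."""
    # Keep only runs of ASCII letters and digits; everything else acts as a word break.
    parts = re.findall(r"[a-z0-9]+", title.lower())
    return separator.join(parts) + extension


print(make_filename("Cow Horse Goat"))                 # cow-horse-goat.html
print(make_filename("Cow Horse Goat", separator="_"))  # cow_horse_goat.html
```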
I have always favoured: cow.horse.goat.html with it all in lower case with a dot between each word pair.
Sites at #1 with it, so it doesn't appear to work against you. I have no idea if it works for you.
Any comments?
I never have spaces or underscores in filenames, and I use all lowercase.
I wouldn't use periods though. I know for a fact that a couple years ago Google saw URLs like
[example.com...]
as malformed (and consequently didn't index them)... probably because of the location of the period.
I carried on a brief email correspondence with a Google tech, I think his name was David DesJardin's if GG knows him, and they fixed it.
However, since there was once a problem with such URLs, I'd personally use something different. You never know if another search engine might have the same problem or if a similar problem might pop up.