Google News Archive Forum

This 36 message thread spans 2 pages (page 1 of 2).
Which filenames are best for Google
kippie




msg:143789
 4:17 pm on Jun 25, 2003 (gmt 0)

If I want to use more than one word in a file name (for example: cow, horse and goat), which of these forms would work best with Google:
- cow-horse-goat.html or
- cow_horse_goat.html or
- cow horse goat.html or
- cow%20horse%20goat.html?

 

nancyb




msg:143790
 9:24 pm on Jun 25, 2003 (gmt 0)

Can't find the thread right now, but GoogleGuy said some time back that "-" was treated as a space.

You might look at this thread [webmasterworld.com] or do a site search on " file - space" without the quotes.

JayC




msg:143791
 9:35 pm on Jun 25, 2003 (gmt 0)

A search for hyphen underscore /forum3/ will at least reveal that it's a common question.

Start here [webmasterworld.com], and follow the first link in that thread to find what nancyb was talking about, where GoogleGuy recommends hyphens.

skipfactor




msg:143792
 10:25 pm on Jun 25, 2003 (gmt 0)

- cow horse goat.html or
- cow%20horse%20goat.html?

These are one and the same; naming a file with spaces produces the %20 in a browser--don't use either one.
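
To illustrate why the two forms are equivalent, here is a minimal sketch using Python's standard `urllib.parse` module (the file name is the thread's own example; the variable names are only illustrative). Percent-encoding the space yields %20, and decoding gives the original name back.

```python
from urllib.parse import quote, unquote

filename = "cow horse goat.html"

# Percent-encode characters that are not safe in a URL path;
# each space becomes %20.
encoded = quote(filename)
print(encoded)  # cow%20horse%20goat.html

# Decoding restores the original name, so both forms name the same file.
print(unquote(encoded) == filename)  # True
```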

IMO, dashes are better than underscores for your readers because they aren't disguised in an underlined hyperlink if your link happens to get copy & pasted around in an e-mail for example.

I also think conservative use of the dash is more eye-catching in the SERPs.

If any of your examples were to give a slight boost for a particular search engine, from what I've read, it would typically be the dashed example.

added: beg your pardon, my points are covered in those threads, damned phone.

[edited by: skipfactor at 10:42 pm (utc) on June 25, 2003]

DerekH




msg:143793
 10:39 pm on Jun 25, 2003 (gmt 0)

I've had great success with the hyphen (minus) - it definitely pops pages up in the SERPS when all else fails - a page filed under fred-bill.html will index under fred or bill when there are no stronger terms in the content of the page that can be indexed.

Having turned one of my sites into a file structure where the names are based on keywords (though not obsessively), I've found that the site-map in Dreamweaver is easy on the eye, and the pages that google offers up look to have welcoming filenames...

Better to click on
order-widget.html
than
sales.html
and the google listing definitely looks more inviting as a result...
DerekH

subway




msg:143794
 10:55 pm on Jun 25, 2003 (gmt 0)

GoogleGuy recommends hyphens

I could almost swear I see more underscores than hyphens in the SERPS!

gutabo




msg:143795
 11:40 pm on Jun 25, 2003 (gmt 0)

IIRC, "blue_widgets" is read like one big word, and "blue-widgets" is read like two separate words so... I would say "'-' pwnz j00".

BTW, I've seen some peeps that use "blue-widgets-blue-widgets"... anyone know if that's better?

zafile




msg:143796
 11:44 pm on Jun 25, 2003 (gmt 0)

Follow Unix file name rules always:

[december.com...]

www.yourdomain.com/cow_horse_goat.html

is better.

Never leave spaces in between words.

Use lower case to comply with XHTML in the future.

"You can name a file in Unix using up to fourteen characters in any combination..." If you follow this recommendation, your life will be easier when you make backups on CDs.

If you need to extend the file name beyond 14 characters, then create a sub-folder: cow_horse_goat

www.yourdomain.com/cow_horse_goat/cat_dog.html

Net_Wizard




msg:143797
 4:17 am on Jun 26, 2003 (gmt 0)

Unix

There's a simple reason why the most basic file naming scheme is recommended on Unix. Without going into technicalities, most work on Unix is done through 'shell scripting', so you have to be very careful with your file names or your scripts can fail badly.

A hyphen/dash (-) placed at the start of a file name, e.g.
-widget.sh, can be misinterpreted by Unix commands as a command-line option.

So it became a practice among Unix power users not to include the hyphen/dash (-) in file names, to keep command options distinct from file names.

This became a sort of mantra among Unix programmers and even spilled over into the Perl community, since most Perl programmers happen to be Unix users as well.

It became the de facto standard for naming files, so that blue widget becomes blue_widget.

The truth is that you can use the hyphen/dash (-) in file names even on Unix, as long as the dash is not at the front of the name; thus blue widget can be named
blue-widget.
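
As a rough illustration of the leading-dash problem, the sketch below shows one common workaround: prefixing "./" so the argument no longer begins with "-". The helper name `safe_arg` is hypothetical, not part of any library. Many Unix commands also accept a bare `--` argument to mark the end of options, which serves the same purpose.

```python
def safe_arg(filename: str) -> str:
    """Return a form of filename that a command will not parse as an option."""
    # "./name" refers to the same file in the current directory,
    # but it no longer starts with a dash.
    if filename.startswith("-"):
        return "./" + filename
    return filename


print(safe_arg("-widget.sh"))   # ./-widget.sh
print(safe_arg("blue-widget"))  # blue-widget
```

A dash in the middle of a name is returned unchanged, which matches the point above: only a leading dash needs care.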

Google

As far as Google is concerned, it certainly reads blue-widget as 'blue' and 'widget'. Just for fun, search for "_" {without the quotes} and you will get millions of results, but search for "-" {without the quotes} and you get nothing, nada, zip. I think that's proof enough of how Google treats the hyphen and the underscore.

As far as ranking is concerned, whatever advantage a URL gets from '-' instead of '_' is negligible. I would give it a very small weight when optimizing a site or a page.

Cheers

Chris_R




msg:143798
 4:39 am on Jun 26, 2003 (gmt 0)

I would go with hyphens for the same reason I mentioned in this thread:

[webmasterworld.com...]

a [cough] year before googleguy recommended them :)

I don't think it is given as much weight as it was before. I'm 90% sure it doesn't make more than 1% of a difference, but why take the chance?

zafile




msg:143799
 4:52 am on Jun 26, 2003 (gmt 0)

More about file name rules:

"For complete file name compatibility keep file names to 8.3 and only A..Z, 0..9, _ and - and one period '.'. For file name compatibility and easy access for MacOS, Unix and Windows 95/98 NT/2000 keep the file names down to 31 characters and limit the characters to only A..Z, 0..9, underscore (_), dollar sign ($), tilde (~), exclamation point (!), number sign (#), hyphen (-), parenthesis (), and apostrophe (') with NO spaces." [imagemontage.com...]

djgreg




msg:143800
 6:18 am on Jun 26, 2003 (gmt 0)

From my own experience I can say that "-" definitely stands for a space.
I have several domains and other files, and at first I named them like widget1widget2.html, but then I read here that it is better to change the name to widget1-widget2.html. Since the change, the sites rank higher in the SERPs.
greg

mayor




msg:143801
 7:43 am on Jun 26, 2003 (gmt 0)

You forgot about CowHorseGoat.html

That's been working fine for me, but I haven't done any comparative analysis of its effect on ranking. Maybe someone else has.

zafile




msg:143802
 8:16 am on Jun 26, 2003 (gmt 0)

The URL that explains the final HTML specification [w3.org...] uses dashes.

So, you'll be fine by using:

cow-horse-goat.html
cow_horse_goat.html
CowHorseGoat.html

W3C uses URLs like the third one on its Web site.

menton




msg:143803
 8:36 am on Jun 26, 2003 (gmt 0)

hi,

I think that "-" is not as effective anymore. I have a couple of sites that use "-", and when they came up in the SERPs the keywords in the domain were bold. This is no longer the case, so I don't think it matters as much with file names.

menton

GoogleGuy




msg:143804
 6:00 am on Jun 27, 2003 (gmt 0)

Shouldn't matter much. %20 is just ugly though. :)

dkubb




msg:143805
 7:06 am on Jun 27, 2003 (gmt 0)

I use all lower-cased letters, with dashes separating individual keywords in the file name. No other punctuation or spaces. I may use numbers in extremely rare cases, only when necessary.
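
A naming scheme like this is easy to automate. The following is a minimal sketch, assuming plain ASCII titles; the function name `slugify` is illustrative, not taken from the thread.

```python
import re


def slugify(title: str) -> str:
    """Build a lower-case, dash-separated file name stem from a title."""
    # Lower-case first, then collapse every run of characters that are
    # not a-z or 0-9 (spaces, underscores, punctuation) into one dash.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Drop any dash left at the start or end.
    return slug.strip("-")


print(slugify("Cow, Horse and Goat") + ".html")  # cow-horse-and-goat.html
```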

Something else comes to mind when asked about the structure of the file name. Does anyone else think that the file extension should be dropped as well? eg:

/cow-horse-goat

I'm wondering if this has any pluses or negatives with concern to Google?

I never thought much about dropping file extensions until I read an article by Tim Berners-Lee, called Cool URIs don't change [w3.org]. My whole outlook on URI design (yes design) changed after this.

You can use mod_rewrite to hide the extensions, so you can still have all your files with .html on the file system. Plus it lets you change the underlying technology as needed and keep your URIs the same.
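
The idea behind hiding extensions can be sketched in a few lines of Python. This is not mod_rewrite configuration, only an illustration of the mapping such a rule performs; the function name `resolve` and the document-root layout are assumptions.

```python
from pathlib import Path
from typing import Optional


def resolve(uri_path: str, docroot: Path) -> Optional[Path]:
    """Map an extensionless URI path to an .html file under docroot."""
    root = docroot.resolve()
    candidate = (root / (uri_path.strip("/") + ".html")).resolve()
    # Refuse anything that escapes the document root (e.g. "../").
    if root not in candidate.parents:
        return None
    # Only answer for files that really exist on disk.
    return candidate if candidate.is_file() else None
```

With a file cow-horse-goat.html in the document root, a request for /cow-horse-goat resolves to that file, and the public URI stays the same if the file is later replaced by another technology.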

zafile




msg:143806
 8:27 am on Jun 27, 2003 (gmt 0)

dkubb, thanks for providing us with Tim Berners-Lee's suggestions. I already put the page in my favorites.

I hope your question gets answered soon. Be well.

vincevincevince




msg:143807
 8:37 am on Jun 27, 2003 (gmt 0)

how about /cow+horse+goat.html?

i used a certain well-known php function and it changed my spaces into +

does google like it?
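
For comparison, Python's `urllib.parse.quote_plus` behaves in a similar way to what is described here: it uses the form-encoding convention, where a space becomes "+". The sketch below (variable names are illustrative) also shows why the result is ambiguous in a path: plain path decoding leaves "+" as a literal plus sign.

```python
from urllib.parse import quote_plus, unquote, unquote_plus

name = "cow horse goat.html"

# Form-style encoding replaces each space with "+".
plus_encoded = quote_plus(name)
print(plus_encoded)  # cow+horse+goat.html

# Form-style decoding turns "+" back into a space...
print(unquote_plus(plus_encoded))  # cow horse goat.html

# ...but ordinary path decoding keeps "+" as a literal character.
print(unquote(plus_encoded))  # cow+horse+goat.html
```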

bnc929




msg:143808
 1:53 pm on Jun 27, 2003 (gmt 0)

A month or so ago I checked and Google was not counting an underscore as a space. I thought this odd and sent them an email about it.

You can verify it.

We know DMOZ has URLs like this:

[dmoz.org...]

Yet doing an AllInURL search using dmoz.org and the word "survival" does not yield that page:

link [google.com]

Perhaps GoogleGuy could look into this -- personally I think it's either a bug or an oversight, since underscores are almost always counted as spaces. Also, I believe it is a standard convention that underscores are the "official" space replacement at times when you cannot use a space for whatever reason.

[edited by: Brett_Tabke at 2:12 pm (utc) on June 27, 2003]
[edit reason] fix long url [/edit]

GoogleGuy




msg:143809
 3:27 pm on Jun 27, 2003 (gmt 0)

I don't consider that a bug--I think underscores are often useful for phrase matches, and I used one in a product search the other day. Dang'ed if I can't recall it now. So I would stay away from both %20 (ugly) and underscore (treated as part of the word) if you want the url to have separate words parsed out from it.
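
The same distinction shows up in ordinary regular expressions, which may help picture it. The sketch below is only an analogy and says nothing about how Google actually tokenizes URLs; the function name `words` is made up for the example.

```python
import re


def words(url_part: str) -> list:
    # \w matches letters, digits and the underscore, so "_" stays inside
    # a token while "-" acts as a separator between tokens.
    return re.findall(r"\w+", url_part)


print(words("blue-widgets"))  # ['blue', 'widgets']
print(words("blue_widgets"))  # ['blue_widgets']
```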

bnc929




msg:143810
 4:10 pm on Jun 27, 2003 (gmt 0)

I disagree, plenty of sites like DMOZ use them to replace spaces (and have used them for years and years), I think it stems from the fact that a hyphen might actually grammatically belong in a phrase even if you can use spaces.

For instance "Catherine Zeta-Jones" (just an example -- not sure if she uses a hyphen). If you need to write her name without any spaces and used "Catherine-Zeta-Jones" it'd be unclear what purpose the hyphens serve. Are they how she writes her name? Or are they there as space substitutes? Who knows. In contrast, if you wrote "Catherine_Zeta-Jones" it's quite obvious why the hyphen is there. This concept would be true for any and all hyphenated words.

But thanks for clarifying Google's position. I'd rather disagree with it than not know what it is.

Oh... on the topic of product searches, I think it is far far more common for a hyphen to exist in a model name/number than an underscore. The same would be true for things like phone numbers. In fact I can only think of one other use for an underscore other than as a space substitute -- that is to indicate text that should be underlined when dealing with a vanilla ASCII text file.

In any case, when I need to search for an exact product name/model I'll quote it if it uses a hyphen so Google doesn't parse the model name apart.

vincevincevince




msg:143811
 8:54 pm on Jun 27, 2003 (gmt 0)

it is odd indeed how this is totally the opposite of the real world (remember that place? hehe)

in RL, - is used to join words into one, e.g. dark-blue - where the word means neither blue nor dark, only dark-blue.

whereas, _ is used to replace spaces by people making yahoo and hotmail accounts the world over, aint that so mr joe_bloggs@hotmail.com?

zafile




msg:143812
 9:22 pm on Jun 27, 2003 (gmt 0)

I wouldn't sacrifice established standards such as Unix file name rules to target search queries in Google. I'd rather provide enough content so I don't have to bypass standards.

1. Follow Unix file name rules:
[december.com...]

2. The following options are acceptable:

cow_horse_goat.html
cow-horse-goat.html
CowHorseGoat.html

3. Never leave spaces in between words.

4. Use lower case preferably.

5. "You can name a file in Unix using up to fourteen characters in any combination..." If you follow this recommendation, your life will be easier when you make backups of your site on CDs. Think of your Web hosting company at backup time.

6. If you need to extend the file name beyond 14 characters, then create a folder such as: www.yourdomain.com/cow_horse_goat/cat_dog.html

rfgdxm1




msg:143813
 9:34 pm on Jun 27, 2003 (gmt 0)

I'd say GoogleGuy's answer should be considered definitive. And there is one other advantage to using - in URLs. Newbies might not realize, when their browser underlines a URL, that there is a _ there rather than a space.

warrenk




msg:143814
 9:55 pm on Jun 27, 2003 (gmt 0)

Some of my file names are '-widget.html'. Are there any problems with having a dash as the first character in a name?

GoogleGuy




msg:143815
 10:15 pm on Jun 27, 2003 (gmt 0)

I think domain names can't start with a dash, but don't see any reason why paths on the url can't. Kinda weird, but that's your call. :)

g1smd




msg:143816
 11:46 pm on Jun 27, 2003 (gmt 0)

>> You forgot about CowHorseGoat.html <<

I have always favoured: cow.horse.goat.html with it all in lower case with a dot between each word pair.
I have sites at #1 with it, so it doesn't appear to work against you. I have no idea if it works for you.

Any comments?

I never have spaces or underscores in filenames, and I use all lowercase.

bnc929




msg:143817
 12:36 am on Jun 28, 2003 (gmt 0)

With the relatively lower value of keywords in the URL anyways I'd think this is more of a usability issue than a search engine issue. Sure keywords in the URL can help, especially if someone is linking to you with your URL (so you get them in the anchor text) but it isn't going to make or break your site.

I wouldn't use periods though. I know for a fact that a couple years ago Google saw URLs like

[example.com...]

as malformed (and consequently didn't index them)... probably because of the location of the period.

I carried on a brief email correspondence with a Google tech, I think his name was David DesJardin's if GG knows him, and they fixed it.

However, since there was once a problem with such URLs, I'd personally use something different. You never know if another search engine might have the same problem or if a similar problem might pop up.

edit_g




msg:143818
 12:40 am on Jun 28, 2003 (gmt 0)

how about /cow+horse+goat.html?

This works just fine. <added> for asp and aspx pages.</added> I've recently had some experience with it and it works really well. Previously I was using underscores but now I am reconsidering.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved