Odd characters in file names

underscore, dot and pipe - what's the effect?

         

tedster

6:54 pm on Oct 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I've been asked to work with a site that renames dynamic content to appear as flat pages. But I'm struggling with some of the conventions that are currently in place. My gut tells me that these are bad news, at least for getting any weight to the keywords involved. But I would value other input here.

1) The underscore -- an old conversation really, and settled to my satisfaction months ago by inspecting DMOZ pages in the Google index. A filename such as kw1_kw2.html doesn't (or didn't) get seen as containing either keyword. A hyphen separates them, but an underscore doesn't. I wonder if anyone has seen any counterevidence recently.

2) The pipe -- this one really feels bad to me. The pages that use a pipe in their filename seem to be indexed (in Google, Inktomi and others) with the pipe character turned into "%7C". But I really wonder what kind of back-end trouble it's causing and whether those keywords are seen as separate.

3) The period or dot -- I'm talking about a page name like atlanta.georgia.html here. I don't think there's any problem with the keywords being seen as separate, but don't some operating systems have trouble with periods when they're not file extensions or domain name separators? These pages don't seem to be getting the rank I'd expect from Google and I'm wrestling with the period as one possible culprit.
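For anyone who wants to see where the "%7C" comes from, here is a minimal sketch using Python's standard urllib.parse module. The file names are the examples from this post (plus a made-up kw1|kw2.html for the pipe case), and the helper name encoded_path is only for illustration. It shows how each name looks once it is percent-encoded for a URL path; it says nothing about how any search engine treats the result.

```python
from urllib.parse import quote, unquote


def encoded_path(filename: str) -> str:
    """Percent-encode a file name as it would appear in a URL path."""
    # quote() leaves letters, digits and the characters _ . - ~ untouched
    # and escapes everything else, so only the pipe changes here.
    return quote(filename)


if __name__ == "__main__":
    for name in ("kw1-kw2.html", "kw1_kw2.html",
                 "kw1|kw2.html", "atlanta.georgia.html"):
        print(name, "->", encoded_path(name))
    # Decoding gives the original name back.
    print(unquote("kw1%7Ckw2.html"))
```

The hyphen, underscore and dot pass through unchanged, while the pipe is the only one of the three characters that ends up stored in a different form from the one typed into the link.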

EBear

1:21 pm on Oct 10, 2003 (gmt 0)

10+ Year Member



tedster, I'd tend to agree with you in avoiding all three. You've largely made the arguments yourself.

1) GoogleGuy has confirmed here (about 4 or 5 months back) that underscore does not separate keywords, whereas hyphen does.

2) I've never even considered using a pipe in a filename. It used to have other functions in DOS, so I'd be scared of accidental side-effects. I think, though, that any character that gets converted to %? can cause havoc with a browser's history: the stored URL never matches the (unconverted) link URL, so a link will never be marked as visited. This is a problem with spaces too.

3) Again, this is really a matter of cross-platform compatibility. I imagine those extra periods would cause havoc on a Windows system, making life very difficult for your competitors trying to cache your pages to study your SEO techniques. ;)

tedster

6:27 pm on Oct 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the input EBear, especially about the browser history.

Funnily enough, I've done some more research and the pipe character does get seen as a "word separator" by Google. The browser history argument makes a strong point, however, even if Google has learned how to cope with the pipe character.

If anyone who reads this still needs convincing that Google doesn't see underscores as word separators (they are only seen as a character in the middle of a long word), play around with DMOZ's pages on Google. DMOZ uses underscores as a rule, and there's no shortage of fodder for research. For example, do searches like this:

allinurl: open site:dmoz.org

There is a page named http://www.dmoz.org/Computers/Open_Source/ but you don't see it in the results. Not even searching DMOZ on allinurl: "open source" works. The only way that page gets returned is on this search:

allinurl: open_source site:dmoz.org

The dash and the period don't show me any keyword separation problems on Google, however, and I've invested a decent bit of research time on the issue.
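A rough analogy, and only an analogy: this is not Google's tokenizer, whose internals none of us can see. In regular expressions the word-character class \w includes the underscore, so a simple word splitter behaves much like the DMOZ results above. The function name split_words is made up for the sketch.

```python
import re
from typing import List


def split_words(name: str) -> List[str]:
    """Split a name into 'words' using the regex word-character class."""
    # \w matches letters, digits AND the underscore, so an underscore
    # never ends a token; hyphens, dots and pipes all do.
    return re.findall(r"\w+", name.lower())


if __name__ == "__main__":
    for name in ("Open_Source", "open-source", "atlanta.georgia", "kw1|kw2"):
        print(name, "->", split_words(name))
```

Under this splitter "Open_Source" stays one long token while the hyphen, dot and pipe versions each come out as two, which matches what the allinurl: searches suggest.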

sun818

7:15 pm on Oct 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just to be safe, I only include alphanumeric characters [a-z0-9], with dashes replacing non-alphanumeric characters (e.g. spaces, colons, extra periods). Only one period exists, and that is for the file extension.
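A minimal sketch of that rule in Python, in case it helps anyone writing rewrite rules. The function name safe_filename, the default "html" extension, and the choice to collapse a run of odd characters into a single dash are assumptions made for the example, not part of the rule as stated.

```python
import re


def safe_filename(title: str, extension: str = "html") -> str:
    """Build a lowercase [a-z0-9-] file name with a single dot before the extension."""
    stem = title.lower()
    # Any run of characters outside a-z and 0-9 (spaces, colons, underscores,
    # pipes, extra periods) becomes one dash.
    stem = re.sub(r"[^a-z0-9]+", "-", stem)
    # Drop dashes left dangling at either end.
    stem = stem.strip("-")
    if not stem:
        raise ValueError("title contains no alphanumeric characters")
    return f"{stem}.{extension}"


if __name__ == "__main__":
    print(safe_filename("Atlanta.Georgia"))
    print(safe_filename("kw1_kw2|kw3: extra  spaces"))
```

The only period in the output is the one in front of the extension, and every separator is a hyphen.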

pageoneresults

7:24 pm on Oct 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



tedster, I'll mirror what Ebear and sun818 said. In addition I'll add that memorability is also an issue. I've used the pipe only as a separator in horizontal link navigation, never in a file name. Making a URI that is easy to type in is a big benefit from a usability standpoint, or at least it has been with various clients of mine.

I've not once since 1996 used an underscore in any file name, nor have I used mixed case. I always use lower case.

I keep it simple too. I want to provide the shortest possible route to the resource. I only use hyphens to separate words in file names when applicable.

Brett has done an excellent job of keeping the URIs here at the forums very neat and trim. ;)

EBear

7:36 pm on Oct 10, 2003 (gmt 0)

10+ Year Member



Ditto with the lowercase-only rule. I used to do a lot of basic HTML training, and no matter how much I stressed that rule, I couldn't tell you the number of times I saw students upload an entire site created and tested on Windows to an Apache server, then spend an afternoon correcting all those case-sensitive links. A good lesson hard learnt.
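A small sketch of the kind of check that would have saved those afternoons, assuming you already have a list of link targets and a list of the file names actually on disk (how you collect them is up to you). The name case_mismatches is invented for the example. It reports links that match a file only when case is ignored, which is exactly the kind that works on Windows and breaks on a case-sensitive server file system.

```python
from typing import Dict, Iterable, List, Tuple


def case_mismatches(links: Iterable[str],
                    files: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (link, actual_file) pairs that differ only by letter case."""
    # Assumes no two files differ only by case; if they do, the last one wins.
    by_lower: Dict[str, str] = {name.lower(): name for name in files}
    problems: List[Tuple[str, str]] = []
    for link in links:
        actual = by_lower.get(link.lower())
        if actual is not None and actual != link:
            problems.append((link, actual))
    return problems


if __name__ == "__main__":
    links = ["About.html", "index.html", "Images/Logo.GIF"]
    files = ["about.html", "index.html", "images/logo.gif"]
    for link, actual in case_mismatches(links, files):
        print(link, "should be", actual)
```

Links that match no file at all are ignored here; they are broken on every platform and a different problem.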

My experience as a trainer (almost twenty years) has taught me that the most valuable PC skill is good file management. SEO considerations aside, a well-structured site displays clear planning, easy navigation and easy maintenance and expandability. Also, I get a buzz any time I'm looking for a file I made a year ago and I can guess exactly what I called it. When that happens, you know you're doing something right. (Doesn't always happen though :( .)

tedster

8:04 pm on Oct 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I resonate completely with this advice. Unfortunately, I just took on a client where their legacy work (over 8 years of development) did not consider any of these factors.

Fortunately (I hope!) we're talking about rules for rewriting dynamic URLs for ASP pages. It should be an easy fix, now that I've convinced the client that it needs to be done. But as with any "should be easy" fix, I'm ready for all the hidden dependencies that aren't immediately obvious. Who knows what dragons lie hidden in that database tangle?