'Illegal' chars in address

what character set to use in addresses

     
9:40 pm on Apr 15, 2004 (gmt 0)

New User

10+ Year Member

joined:Apr 13, 2004
posts:30
votes: 0


Hi,

This post follows another one, at the end of which I concluded that I had better rewrite my PHP addresses with mod_rewrite to remove the variables from them.

I wonder, though, whether there are rules about which characters to use in the rewritten addresses. I have to pack all the variables I used to pass via GET into a fake document name. mod_rewrite then has to pick the variable names and their values back out of that name in order to rebuild the query string.

I thought of separating my variable names with a character never found in the names of the variables or their values, probably an underscore ( _ ). That means the 'fake' name might look like i13_ttmain_le_d2.htm, for example.

Would that be bad practice? Could a long filename prevent my website from being properly indexed? I don't really care if the address is not 'human readable', as long as it is indexed.

Thanks for the advice.

Mart

1:01 pm on Apr 18, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 23, 2003
posts:801
votes: 0


I've seen some massively long filenames indexed by Google - a URL a line and a half long at 1024x768 screen resolution...

General feeling here is that a hyphen might offer a small advantage over an underscore because hyphens are word separators, and break a URL into keywords.
DerekH

3:50 pm on Apr 19, 2004 (gmt 0)

New User

10+ Year Member

joined:Apr 13, 2004
posts:30
votes: 0


Thanks for the advice.

Since I have 2 integer variables and 2 character variables, I decided to alternate them (char, int, char, int) in the filename. The pattern is then easy to match with a regexp, so I don't need any separator.
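A minimal sketch of the alternating (char, int, char, int) idea, written in Python rather than as a mod_rewrite rule. The variable names (section, item, lang, page) and the sample file name are hypothetical and do not come from the thread. The approach only works if the character values never contain digits, because otherwise the boundaries become ambiguous.

```python
import re

# Hypothetical layout: <letters><digits><letters><digits>.htm
# The field names below are assumptions made for illustration only.
PATTERN = re.compile(r"([a-z]+)(\d+)([a-z]+)(\d+)\.htm")

def build_name(section, item, lang, page):
    """Pack two text values and two integers into one file name."""
    return f"{section}{item}{lang}{page}.htm"

def parse_name(name):
    """Recover the four values, or return None if the name does not match."""
    m = PATTERN.fullmatch(name)
    if m is None:
        return None
    section, item, lang, page = m.groups()
    return {"section": section, "item": int(item), "lang": lang, "page": int(page)}

print(build_name("main", 13, "e", 2))
print(parse_name("main13e2.htm"))
```

The same capture groups could in principle be carried over to a rewrite rule that maps each group back to a query-string variable.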

9:53 pm on Apr 21, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Common usage, without any problem, includes hyphens, dots, and commas.

Avoid spaces and underscores. They both have various inherent problems.

2:54 am on Apr 22, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Oct 8, 2001
posts:2882
votes: 0


Yah, I'd stick to hyphens, periods, or commas. Most people seem to prefer hyphens. If you use an underscore '_' character, then Google will combine the two words on either side into one word. So bla.com/kw1_kw2.html wouldn't show up by itself for kw1 or kw2. You'd have to search for kw1_kw2 as a query term to bring up that page.
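To picture the behaviour described above, here is a toy word splitter in Python. It is only an illustration and not Google's actual tokenizer: it leans on the fact that the regex class \w counts the underscore as a word character, so a hyphen separates words while an underscore glues them together. The function name url_words is made up for this sketch.

```python
import re

def url_words(path):
    """Split a URL path into 'words' the way a \\w-based tokenizer would."""
    # \w matches letters, digits and "_", so "_" never acts as a separator.
    return re.findall(r"\w+", path.lower())

print(url_words("kw1-kw2.html"))  # ['kw1', 'kw2', 'html']
print(url_words("kw1_kw2.html"))  # ['kw1_kw2', 'html']
```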

The characters you can use in domain names are pretty restricted: a-z, 0-9, and the '-' character. For subdomains and url paths (stuff after the slash), you've got a lot more flexibility, but I'd recommend keeping it pretty simple. That makes it easier for search engines and users to understand.
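As a rough sketch of the a-z, 0-9 and '-' rule for domain names, the check below tests a single label (the part between two dots). Two details go beyond what the post says and come from the general DNS hostname rules: a label may not start or end with a hyphen, and it is limited to 63 characters. The function name is_plain_label is hypothetical.

```python
import re

# One label: letters, digits, hyphens; no hyphen at either end; 1 to 63 chars.
LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")

def is_plain_label(label):
    """Return True if the label uses only the restricted character set."""
    return LABEL.fullmatch(label.lower()) is not None

print(is_plain_label("my-site"))  # True
print(is_plain_label("my_site"))  # False
```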

There's actually a proposal so that you can encode all sorts of characters in a domain (e.g. CJK--Chinese/Japanese/Korean) but that's a little outside the scope of your question, and I'm not as familiar with the encoding. My rule of thumb is to keep it simple where you can.

10:24 pm on Apr 22, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


For the non-Latin-character-set domain name issue, do a search on your favourite SE for Punycode.
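For a quick look at what Punycode does, here is a small sketch using Python's built-in "idna" codec, which converts each non-ASCII label of a domain name to its ASCII "xn--" form and back. The domain bücher.example is just a sample name, and the helper names to_ascii and to_unicode are made up for this sketch.

```python
def to_ascii(domain):
    """Encode a domain name to its ASCII (Punycode-based) form."""
    return domain.encode("idna").decode("ascii")

def to_unicode(domain):
    """Decode an ASCII 'xn--' domain name back to readable text."""
    return domain.encode("ascii").decode("idna")

print(to_ascii("bücher.example"))           # xn--bcher-kva.example
print(to_unicode("xn--bcher-kva.example"))  # bücher.example
```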