What does G. consider as a space in URLs?

         

Paco

2:35 pm on May 31, 2002 (gmt 0)

10+ Year Member



I thought that Google parsed some characters in URLs as spaces, but after investigating the issue a bit I got really confused. The general feeling seems to be that Google considers - as a space but doesn't consider _ as one. And what happens with the + symbol?

Some programs parse spaces as +. Don't html rules say that spaces in urls should be replaced by +s? I think I've read that somewhere (w3c?), but I could be wrong. If Google doesn't consider a + to be a space I'm worried because my site has lots of urls with them.

In other words:

If I have a URL like .../a-b/... G. counts a and b as keywords, right?

.../a_b/... a_b is a keyword for G., but not a alone or b alone, right?

.../a+b/... ?????

paynt

1:43 pm on Jun 1, 2002 (gmt 0)



Hi Paco, I wanted to offer you a welcome to Webmaster World. I'm not sure I have an answer to your question but I hope by bumping your post back up that someone will come along and help you out.

brotherhood of LAN

2:04 pm on Jun 1, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi paco, welcome!

Either papabear or pageoneresults was talking about this the other day and, most importantly, I remember what they said.

From what it "seems", a - separates words but _ does not, "according to Google". I'll try to find the thread where I read this, but for sure, definitely a question worth getting a good complete answer for.

Don't hold me to this, but I believe a-b is treated as 2 words while a_b is treated as one.
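To make the hyphen/underscore belief above concrete, here is a minimal sketch in Python. It is not Google's tokenizer, which nobody in this thread has seen; `url_keywords` and its `separators` parameter are hypothetical names invented for illustration. The default separator set only models the guess that - splits words and _ does not, and whether + belongs in that set is exactly the open question.

```python
import re

def url_keywords(path, separators="-/"):
    """Split a URL path into keyword tokens.

    Assumption for illustration only: '-' and '/' separate words,
    while '_' (and anything else) stays inside a token.
    """
    # Build a character class from the chosen separators, e.g. [\-/]+
    pattern = "[" + re.escape(separators) + "]+"
    # Drop the empty strings produced by leading/trailing separators
    return [token for token in re.split(pattern, path) if token]

print(url_keywords("/a-b/"))                    # ['a', 'b']
print(url_keywords("/a_b/"))                    # ['a_b']
print(url_keywords("/a+b/"))                    # ['a+b']
print(url_keywords("/a+b/", separators="-/+"))  # ['a', 'b']
```

Passing a different `separators` string lets you model either answer for + without changing the function.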

Paco

2:28 pm on Jun 3, 2002 (gmt 0)

10+ Year Member



Thanks for the welcome :D

brotherhood_of_LAN: I agree it's worth getting a complete answer to this.

Brett_Tabke

2:51 pm on Jun 3, 2002 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



The plus is not valid in a URL to a file resource, only in a URL to a form-style resource. It can't be, because it is used as a form separator.
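The file-versus-form distinction can be seen in how a URL library decodes the same string. This is only a sketch using Python's standard `urllib.parse`; it shows how that library treats +, not how Google or any web server does. The sample string `a+b` comes from the original question.

```python
from urllib.parse import unquote, unquote_plus

segment = "a+b"

# Path-style decoding: only %XX escapes are decoded, "+" is left alone.
as_path = unquote(segment)

# Form-style decoding (application/x-www-form-urlencoded): "+" means space.
as_form = unquote_plus(segment)

print(repr(as_path))  # 'a+b'
print(repr(as_form))  # 'a b'
```

So the same characters read as one token or as two words depending on whether they are decoded as a file path or as form data.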

Paco

2:11 pm on Jun 5, 2002 (gmt 0)

10+ Year Member



:(

Does it mean that I should replace all the +s with -s?

At least I can make a script to do most of the work. I would hate to change the names of my 1400 pages. I'll still need to do it for 1 or 2 hundred.

Did anybody else read that spaces should be replaced with +s? If not, I guess I dreamed it.
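A rough sketch of the kind of rename script mentioned above, in Python. The function names `plan_renames` and `apply_renames` and the sample filenames are made up for illustration, and the sketch assumes all pages sit in one directory. It only renames files; links pointing at the old names and any redirects would still need separate handling.

```python
from pathlib import Path

def plan_renames(filenames):
    """Map each filename containing '+' to the same name with '-' instead."""
    existing = set(filenames)
    plan = {}
    for name in filenames:
        new_name = name.replace("+", "-")
        if new_name == name:
            continue  # no '+' in this name, nothing to do
        # Refuse to overwrite an existing file or another planned target
        if new_name in existing or new_name in plan.values():
            raise ValueError(f"{name!r} would collide with {new_name!r}")
        plan[name] = new_name
    return plan

def apply_renames(directory, plan):
    """Rename the files on disk according to a plan from plan_renames."""
    root = Path(directory)
    for old, new in plan.items():
        (root / old).rename(root / new)

# Dry run on hypothetical filenames: inspect the plan before touching disk.
print(plan_renames(["blue+widget.html", "index.html"]))
# {'blue+widget.html': 'blue-widget.html'}
```

Building the plan first and applying it separately makes it easy to review the changes before any file is renamed.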

sun818

3:24 am on Sep 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi paco, you are not dreaming. When I was in my Netscape compatibility phase, I worked at making my site compatible with Netscape 4.x as well. This browser version (and earlier? heh, it's been a while) would drop the rest of the URL past the first space.
For example:

[yourwebsite.com...] world

interpreted as

[yourwebsite.com...]

unless you joined it with a +

[yourwebsite.com...]

About 1.5% of my users are on Netscape 4.x. Not sure if it is worth supporting them anymore.

danny

3:39 am on Sep 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google *should* treat _s as token separators. I suspect the reason it doesn't is that programmers often use _s in variable names, but the fraction of web pages that are source code is now pretty small. And let's face it, _ is a more natural replacement for a space than -, because -s occur in English (as hyphens and dashes), while _s don't naturally occur.

I'm a bit peeved because my web site has filenames based on book titles, e.g. The_Secret_History.html. I guess I could abandon that convention now, but I have over 600 old files and I'd really like to keep it consistent.

bcc1234

4:05 am on Sep 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Did anybody else read that spaces should be replaced with +s? If not, I guess I dreamed it.

Spaces are also encoded as %20,
so /a%20b.html should (I did say should) display the file 'a b.html'.

There are rules on URL encoding; search some docs to find the exact sequence (I just don't remember). But I would not bet that Google would consider blue%20widget as 'blue widget'.

Try using advanced search with something like 'inurl:' (check the commands, I'm not sure here either).
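For reference, the two encodings of a space discussed in this thread can be reproduced with Python's standard `urllib.parse`. This is only a sketch of what that library does; it says nothing about how Google indexes either form. The filename `a b.html` is the one from the example above.

```python
from urllib.parse import quote, quote_plus, unquote

filename = "a b.html"

# Percent-encoding, as used in URL paths: space becomes %20.
encoded_path = quote(filename)

# Form-style encoding, as used in query strings: space becomes "+".
encoded_form = quote_plus(filename)

# Decoding the path form gives back the original name.
decoded = unquote("/a%20b.html")

print(encoded_path)   # a%20b.html
print(encoded_form)   # a+b.html
print(repr(decoded))  # '/a b.html'
```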