
Google News Archive Forum

    
Google recognizing word parts?
filesharing -> file sharing?
globay




msg:155191
 3:13 pm on Feb 9, 2003 (gmt 0)

When visiting the German Google page (www.google.de) and searching for "filesharing", I was quite surprised to get results for both "filesharing" and "file sharing".

Has anybody seen something like this before? I didn't find an example on google.com. Is Google maybe testing a new algorithm?

Or have I missed something?

--
globay

 

Yidaki




msg:155192
 4:06 pm on Feb 9, 2003 (gmt 0)

If Google also returns results that use the two words "file sharing" in their title although you searched for "filesharing", that isn't an indicator of a new algo.

If you have some (or even just one) inbound link(s) with the anchor text "filesharing", although your title is "file sharing", that's enough to be found if someone searches for "filesharing" ...

globay




msg:155193
 6:41 pm on Feb 9, 2003 (gmt 0)

Yes, but on the Google result page, "filesharing" appears in bold as well as "file sharing"!

So I guess it's not only the inbound links.

--
globay

NFFC




msg:155194
 7:20 pm on Feb 9, 2003 (gmt 0)

Looks new to me. It may still be the links, but I haven't seen it before.

Take a look at this cached page
[google.de...]

I searched for applemac, it shows that term and apple mac.

Digimon




msg:155195
 11:29 pm on Feb 9, 2003 (gmt 0)

I've seen similar examples in my market niche... Google is splitting the words. Big change?

rfgdxm1




msg:155196
 1:19 am on Feb 10, 2003 (gmt 0)

It does look like in some cases Google is splitting words now. My guess is that they are only doing this in special cases they have hard coded into the software. Thus, since basically always "filesharing" = "file sharing", they are treated as the same. However, rugrats won't be split into rug and rats.
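The "hard coded special cases" guess above can be pictured as a simple lookup table. This is only a minimal sketch of that guess, not anything Google is known to do; the table name, the function name, and the choice of entries (taken from examples in this thread) are all assumptions.

```python
# Hypothetical table of compounds treated as equivalent to their split form.
# The entries are examples from this thread; the table itself is an assumption.
HARD_CODED_SPLITS = {
    "filesharing": "file sharing",
    "applemac": "apple mac",
}


def expand_query(query: str) -> list[str]:
    """Return the query plus its split form if it is a listed special case."""
    variants = [query]
    split_form = HARD_CODED_SPLITS.get(query.lower())
    if split_form is not None:
        variants.append(split_form)
    return variants
```

With a table like this, "filesharing" would expand to both spellings, while a word that is not listed, such as "rugrats", would be left alone.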

HarryM




msg:155197
 1:24 am on Feb 10, 2003 (gmt 0)

Could this be specific to Google.de, to allow for the German habit of creating words by stringing together other words? An example from the page would be Momentaufnahmen (snapshots).

rfgdxm1




msg:155198
 1:48 am on Feb 10, 2003 (gmt 0)

I had the same thought, HarryM. From what others have told me, this sort of combining words is very common in the German language. Google may have made a special tweak for google.de to take this into account.

JayC




msg:155199
 1:53 am on Feb 10, 2003 (gmt 0)

Right... it doesn't seem to happen at google.com with "filesharing," (just get the "Did you mean: file sharing?" prompt) but clearly does at google.de.

vitaplease




msg:155200
 7:19 am on Feb 10, 2003 (gmt 0)

Nice find. Google also highlights the split "file sharing" in its cache.

Strange though: computersystem is also split up, but computerfile is not.

Markus




msg:155201
 9:47 am on Feb 10, 2003 (gmt 0)

I think they have been testing it for quite some time now. I heard about it a few weeks ago, but I couldn't verify it back then. I'll have to take a look at our logs to check when it started.

It appears that there is no general scheme behind it. A search for 'websitedesign' finds 'website design' and 'web site design'. A search for 'searchengineoptimization' finds nothing but that term. But a search for 'searchengine' also finds 'search engine'.

Generally, it appears that only terms which are actually searched by users are split. If you merge words that don't make sense together, like 'domake' or 'haveget', you receive the "did you mean..." message but not the correct spelling in the SERP.
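One way to picture the "only terms actually searched by users" observation is a splitter that accepts a split only if the resulting phrase shows up often enough in a query log. This is a rough sketch of that idea only; the log, its counts, the threshold, and the function name are invented for illustration and say nothing about how Google really does it.

```python
# Hypothetical query-log counts; the numbers are made up for illustration.
QUERY_COUNTS = {
    "search engine": 5000,
    "website design": 1200,
    "web site design": 900,
}


def split_if_searched(term: str, min_count: int = 100) -> list[str]:
    """Return every two- or three-word split of `term` found in the query log."""
    found = []
    n = len(term)
    for i in range(1, n):
        # Try a single split point: "abc" + "def".
        two = f"{term[:i]} {term[i:]}"
        if QUERY_COUNTS.get(two, 0) >= min_count:
            found.append(two)
        # Try a second split point to the right of the first.
        for j in range(i + 1, n):
            three = f"{term[:i]} {term[i:j]} {term[j:]}"
            if QUERY_COUNTS.get(three, 0) >= min_count:
                found.append(three)
    return found
```

Under these made-up counts, 'websitedesign' yields both 'website design' and 'web site design', while a nonsense merge like 'domake' yields nothing because no split of it appears in the log.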

Since queries are split but not merged, it may have a major impact on SEO in German-speaking countries. Now, splitting terms certainly has some advantages, and having different pages for both versions, as I always did, is rather unfavorable because the PR and anchor text of inbound links are split up between two pages. Looks like a little work to do...

cwebb




msg:155202
 11:24 am on Feb 10, 2003 (gmt 0)

Yuck, I noticed a lot of German-speaking guys in here, and this certainly looks like a new challenge. But I think it's great, as some of the domains I have to optimize use keywordandkeyword.at, and it would be great if Google managed to separate those!

Yidaki




msg:155203
 5:26 pm on Feb 10, 2003 (gmt 0)

I think Google just splits words based on its spell check data. To follow the given examples:

- Filesharing is wrong - should be File Sharing
- Webdesign is wrong - should be Web design

If Google thinks (or knows) that a searched word is misspelled (although there are results), it probably does another search for the correct spelling and then adds these results to the "misspelled set".
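The idea in this post (run a second search for the corrected spelling and add those results to the first set) could look roughly like the sketch below. The spelling table, the toy index, and the page names are placeholders invented for the example, and the whole thing is a guess at the behaviour, not a description of Google's software.

```python
# Hypothetical stand-ins for a spell-check table and a search index.
SPELL_CORRECTIONS = {
    "filesharing": "file sharing",
    "webdesign": "web design",
}
INDEX = {
    "filesharing": ["page-a", "page-b"],
    "file sharing": ["page-b", "page-c"],
}


def search_with_correction(query: str) -> list[str]:
    """Search for the query, then append results for its corrected spelling."""
    results = list(INDEX.get(query, []))
    corrected = SPELL_CORRECTIONS.get(query)
    if corrected is not None:
        for page in INDEX.get(corrected, []):
            if page not in results:  # skip pages already in the first set
                results.append(page)
    return results
```

In this sketch the expansion only runs from the "misspelled" form to the corrected one, because the correction table is only consulted in that direction.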

HitProf




msg:155204
 5:57 pm on Feb 10, 2003 (gmt 0)

It seems to happen in Google.de only.

In Google.nl applemac stays applemac and filesharing stays filesharing, whereas in Google.de the separated terms are highlighted as well.

(The Dutch also combine existing words to form new ones.)

jpavery




msg:155205
 6:00 pm on Feb 10, 2003 (gmt 0)

Google will ask
"did you mean file sharing"

so maybe the suggestion has something to do with it...

globay




msg:155206
 6:43 pm on Feb 10, 2003 (gmt 0)

I agree with what rfgdxm1 said: it looks like Google hard coded these special cases where two words mean the same. Especially in the German language where "File-Sharing" is as correct as "Filesharing" this would improve search results.

But is this just the beginning of an advanced change in Google? Are they going to merge the results of similar words and their different spellings, like "Optimization" and "Optimisation", since both are correct somewhere and some people certainly get confused and spell them wrong?

Is Google going to recognize the parts of a domain name that are not separated by a hyphen? You can't use a space, and there is no spelling rule that tells you to use a hyphen instead (and many don't!). Is there a difference in quality between my-domain.com and mydomain.com? I don't think so. And isn't Google trying to list results according to their relevance? Well, I think there are going to be more changes sooner or later. At least there is a lot of potential for improvement!

What do you think?
--
globay

globay




msg:155207
 6:45 pm on Feb 10, 2003 (gmt 0)

By the way: it is interesting that Google suggests "Optimization" when you spell it with s, but it does not suggest anything when you spell it with z ;-)

stlouislouis




msg:155208
 7:40 pm on Feb 10, 2003 (gmt 0)

This is something I've wondered about for a while.

Why can't Google derive at least some "word splits"
within a character string (such as a multiworddomainname.com)
from the link text part of a link that's visible to a viewer
or from text surrounding a link?

Maybe figure the ratio of occurrence of "filesharing" to
"file sharing" in the link text and estimate the probability
that filesharing actually is two words, not one. Seems an
algo along this line would be fairly accurate. Especially
when filesharing is neither a dictionary word nor encountered
that often, compared to "file" and "sharing" both being in
a dictionary and often appearing in order -- and when
related to the theme of the page.

If, within the link text, the same string of (non space)
letters appears -- but with spaces between some letters --
that might indicate a high probability that a string of
characters like "filesharing" in a domain name should really be
"file sharing" if that's what is in some (at least) link text,
surrounding text, or some other on page factor/occurrence of
"file sharing".
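As a rough illustration of the ratio idea, the sketch below counts how often the joined and the split form occur in a list of link texts and returns the share that uses the split form. The function name, the sample anchor texts, and the simple substring counting are all assumptions made for the example; a real system would need far more care.

```python
from typing import Optional


def split_probability(anchor_texts: list[str], joined: str, split: str) -> Optional[float]:
    """Estimate how likely `joined` is really `split`, from link-text counts.

    Returns the fraction of occurrences that use the split form, or None
    when neither form occurs at all.
    """
    joined_count = sum(text.lower().count(joined) for text in anchor_texts)
    split_count = sum(text.lower().count(split) for text in anchor_texts)
    total = joined_count + split_count
    if total == 0:
        return None
    return split_count / total


# Made-up link texts pointing at a hypothetical filesharing domain.
anchors = [
    "Free file sharing tools",
    "filesharing.com",
    "best file sharing site",
    "File Sharing guide",
]
probability = split_probability(anchors, "filesharing", "file sharing")
```

A value close to 1 would suggest the joined string is usually written as two words in the links pointing at it.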

Just some random thoughts I had a while back when reading
the discussion here about emptywebsite.com, and looking at
the links to it, as the #1 result for a search on Google
for "empty website". However, I'm too much of a newbie to
feel too confident in any such "insights/suspicions" I have
as I begin learning about SEO, so I didn't post about it on
that thread. Hope the above has some value to this discussion.

Take care,

Louis

JayC




msg:155209
 9:40 pm on Feb 10, 2003 (gmt 0)

By the way: it is interesting that Google suggests "Optimization" when you spell it with s, but it does not suggest anything when you spell it with z

Not really surprising: "optimization" returns 3.4 million hits; "optimisation" returns only 838,000. As far as the logic in the software is concerned, that makes it more likely that the latter is a spelling error, or at least that you'd want to look at results for the much more common term.
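The reasoning here can be written down as a small rule: suggest the other spelling only when it has far more hits than the one typed. The sketch below uses the two hit counts quoted in this post; the variant table, the function name, and the 2x threshold are assumptions chosen for illustration, not known details of Google's suggestion logic.

```python
from typing import Optional

# Hit counts as quoted in the post above.
HIT_COUNTS = {
    "optimization": 3_400_000,
    "optimisation": 838_000,
}
# Hypothetical table linking alternative spellings to each other.
VARIANTS = {
    "optimisation": "optimization",
    "optimization": "optimisation",
}


def suggest(query: str, ratio: float = 2.0) -> Optional[str]:
    """Suggest the alternative spelling only if it is much more common."""
    alternative = VARIANTS.get(query)
    if alternative is None:
        return None
    # The ratio threshold is an assumed cut-off, not a known value.
    if HIT_COUNTS.get(alternative, 0) >= ratio * HIT_COUNTS.get(query, 0):
        return alternative
    return None
```

With these numbers the rule suggests "optimization" for the s spelling and suggests nothing for the z spelling, which matches what globay described.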

jimbeetle




msg:155210
 9:50 pm on Feb 10, 2003 (gmt 0)

When I looked at this thread yesterday I couldn't see what globay was talking about. That's because I tried it with "file sharing" (no quotes).

It only seems to go one way.

For "file sharing" Google will pick up "file sharing" and "file-sharing." But for "filesharing" it picks up all three variations.

With the German language propensity to bung words together you would think Google would be consistent.

cwebb




msg:155211
 9:08 am on Feb 11, 2003 (gmt 0)

I don't even want to check on all the possibilities of DonauDampfSchiffFahrtsGesellschaftsKapitänsAnwärterAusbilder

:)

HitProf




msg:155212
 2:18 pm on Feb 11, 2003 (gmt 0)

jpavery:
google will ask
"did you mean file sharing"

so maybe the suggestion has something to do with it...

no, it still suggests the same when splitting up

JayC:
Not really surprising: "optimization" returns 3.4 million hits; "optimisation" returns only 838,000.

sounds logical, but I've seen Google suggest words returning *fewer* results. Perhaps it checks the number of queries?

Onza




msg:155213
 7:29 pm on Feb 11, 2003 (gmt 0)

I have paid special attention to the keyword "Citybike" for a while. It is English, yes; however, it is also broadly used for urban bicycles in Germany and Switzerland.

On my site I never (never never) spell it "City Bike". Now suddenly I rank top for "City Bike" too, and "Citybike" is highlighted in the search results. Definitely a new feature.

Ex Ce Lent for Everyone I think. Helps the user to find the right product, no matter if he decides to spell it in one or two words.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved