

How does Google treat Accented characters?

Some Spanish examples

         

jon80

9:24 pm on May 7, 2003 (gmt 0)

10+ Year Member



split from this discussion [webmasterworld.com]

"Wrt your second point, I have three accented characters in my main keyphrase and get the same results with or without them."

Typing any Spanish word with or without the accent returns completely different SERPs for me.

The problem is made worse by the fact that many who write in Spanish incorrectly omit the accented characters at times.

Writing the noun in uppercase, where using the accented character would not be correct, returns results different from the noun written in lowercase.

[edited by: heini at 10:46 am (utc) on May 8, 2003]
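One way to picture what the posters are testing: a search engine that folds case and accents before looking a term up would give INFORMACION and información the same index key. The sketch below assumes such folding, using a hypothetical `fold` helper built on Python's standard `unicodedata` module. It is not a description of what Google actually does; the replies show results differing from poster to poster.

```python
import unicodedata


def fold(term: str) -> str:
    """Hypothetical index key: case-fold, then drop accent marks."""
    # NFD splits a precomposed letter such as "ó" into "o" + U+0301 (combining acute)
    decomposed = unicodedata.normalize("NFD", term.casefold())
    # Drop nonspacing combining marks (Unicode category "Mn"), keep the base letters
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


for query in ["información", "INFORMACION", "México", "mexico", "diseño", "DISENO"]:
    print(f"{query!r:15} -> {fold(query)!r}")
```

Note that this naive fold also turns ñ into n, which matters for the Spanish word pairs discussed further down the thread.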

mipapage

9:39 pm on May 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Interesting,

I'm talkin' Spanish too, and I get equal results with or without, and in the pages themselves, one of them spells a word consistently with the accent, and another page without!

I'll have to try the cases thing.

"The problem is made worse by the fact that many who write in Spanish incorrectly omit the accented characters at times."

Yes, that stinks. I had seen this problem when I first made the site, but about two months ago it stopped making a difference.

UPDATE: Tried it in uppercase, same results for me.

jon80

9:49 pm on May 7, 2003 (gmt 0)

10+ Year Member



información 13,3000,000
INFORMACION 1,380,000

Are you not getting the same results?

PatrickDeese

10:10 pm on May 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



try:

mexico: 5,170,000
méxico: 5,130,000

DISENO: 403,000
diseño: 397,000

jon80

10:29 pm on May 7, 2003 (gmt 0)

10+ Year Member



Just checked and I get:
Mexico 25,600,000
México 4,920,000

Are you checking results on www as opposed to sj or one of the other servers?

wanderer

10:37 pm on May 7, 2003 (gmt 0)

10+ Year Member



Not sure how this might relate to your question and the answers posted here, but I recently did a search on Google for a particular Java applet by its "name" (e.g. a search for "java applet funthing"). The top 10 results were all German pages. The only place the name of the applet "funthing" appeared in any of the results was in the HTML code itself, where the applet was named:

<APPLET CODE="funthing.class"

or, in the case of the JavaScript code I found, in the src attribute:

<script language="JavaScript" src="funthing.js">

It was very surprising to me. I didn't think Google would include this kind of thing in search results...

Has anyone else seen this kind of thing in SERPs before?

PatrickDeese

10:54 pm on May 7, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I was using the toolbar and www.

I just did it again and got similar quantities. However, I have a Mexican IP address range, so I don't know if that makes a difference.

mipapage

9:58 am on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I see, but for my terms the results are the same: same numbers, same SERPs.

Maybe this is a test area?

As another example, if I were to try:
diseño de páginas web

with or without the accents, the total "about" result counts are different, but the SERPs are very similar. I only looked at the first page, where they are identical!

*wanderer*

Super interesting find!

jon80

10:36 am on May 8, 2003 (gmt 0)

10+ Year Member



I really don't know how Google handles the problems inherent in Spanish with accented characters.

número - noun
numero - verb, 1st person singular

doméstico - adjective
domestico - verb, 1st person singular

In other cases the meaning of the word changes completely when written with or without accented characters.

campana - bell
campaña - campaign

Canadá - Canada
cañada - ravine

These are not rare examples. It happens frequently in Spanish.

Strictly speaking ñ is not considered to be an accented character in Spanish, rather a separate letter.
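To make the point concrete: a fold that strips every diacritic merges campana with campaña and Canadá with cañada, while one that treats ñ as its own letter (as Spanish does) keeps those pairs apart but still merges número with numero. Both functions below are hypothetical illustrations built on Python's `unicodedata`, not any search engine's documented behavior.

```python
import unicodedata


def naive_fold(word: str) -> str:
    """Lowercase and strip every combining mark, including the tilde on ñ."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


def spanish_fold(word: str) -> str:
    """Lowercase and strip accents, but keep ñ as a separate letter."""
    out = []
    # NFC first so that "n" + combining tilde becomes the single character ñ
    for ch in unicodedata.normalize("NFC", word.casefold()):
        if ch == "ñ":
            out.append(ch)  # ñ is a letter in Spanish, not an accented n
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if unicodedata.category(c) != "Mn"))
    return "".join(out)


pairs = [("campana", "campaña"), ("Canadá", "cañada"), ("número", "numero")]
for a, b in pairs:
    print(a, b,
          "naive:", naive_fold(a) == naive_fold(b),
          "spanish:", spanish_fold(a) == spanish_fold(b))
```

Even the Spanish-aware fold cannot separate número (noun) from numero (verb); only the accent distinguishes them, which is why context such as neighboring words would be needed.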

mipapage

11:27 am on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"I really don't know how Google handles the problems inherent in Spanish with accented characters."

Speculating here... since they examine keyword proximity, if you were to enter a phrase like the one I mentioned above, maybe they look at the surrounding words to 'guess' at the context?

french tourist

12:19 pm on May 8, 2003 (gmt 0)

10+ Year Member



I just tried to understand it, and I think I now have an idea of how it works:

* if you ask some query including "television", you receive only pages containing the very word "television"

* if you ask some query including "télévision", you receive a mix of French pages containing "télévision" and international pages containing "television". If you click on the "Cached" version of some US page, say www.abc.com, Google highlights "television" in yellow in the frame, and at the same time asserts that it has highlighted "télévision".

But on the other hand, the result count in the blue bar (the "Results 1-10 of about xxxx") SEEMS (I am not sure) to give the number of pages containing exactly your query: the number of pages about "television" if you typed it without accents (18,700,000), and about "télévision" if you typed the accents (1,690,000).

But the truth is more complicated! I just tried "télevision" and "telévision", which should only return a small number of pages suffering from typos. Each of them returns in its blue bar a number smaller than for "télévision" (about 1,000,000 instead of 1,700,000), but that is certainly not the number of pages containing this specific typo. So I must admit I don't understand the blue-bar count when using accents.
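The observations above can be modeled as an asymmetric matching rule: an unaccented query word matches only the unaccented spelling, while an accented query word matches both its own spelling and the accent-stripped one. The `matches` function below is a hypothetical reconstruction of that described behavior, not Google's documented logic, and it does not explain the puzzling result counts.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Case-fold and remove all combining marks."""
    decomposed = unicodedata.normalize("NFD", word.casefold())
    stripped = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def matches(query_word: str, page_word: str) -> bool:
    """Hypothetical rule inferred from the thread, not a documented algorithm."""
    q = unicodedata.normalize("NFC", query_word.casefold())
    p = unicodedata.normalize("NFC", page_word.casefold())
    if q == p:
        return True  # the exact spelling always matches
    if q != strip_accents(q):
        # Accented query: also accept the fully unaccented spelling on the page
        return p == strip_accents(q)
    return False  # unaccented query: no expansion to accented forms


# Toy "index" of two pages (invented example data)
pages = {
    "us-page": ["abc", "television", "news"],
    "fr-page": ["chaîne", "télévision"],
}
for query in ["television", "télévision"]:
    hits = [name for name, words in pages.items() if any(matches(query, w) for w in words)]
    print(query, "->", hits)
```

Under this rule "television" finds only the US page, while "télévision" finds both, which mirrors the mix of French and international pages described above.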

jon80

12:36 pm on May 8, 2003 (gmt 0)

10+ Year Member



There has been some speculation here that Google has recently started to process words with accented characters in a more logical way, but as the examples you have given show, Google is still having difficulties. I can't fathom the logic.

mipapage

12:53 pm on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"I can't fathom the logic."

Me neither, but something is up/in the works! I am glad that at least I don't have to do what Jakob Nielsen had on his site: those very small versions of common misspellings of his name.

jon80

1:13 pm on May 8, 2003 (gmt 0)

10+ Year Member



Some of the Mexican search engines I have submitted to ask for site titles and descriptions to be submitted without accented characters, i.e., they ask you to submit the material incorrectly spelt. Make of that what you will, but this is obviously an issue that is complicated in search engine terms.

sullen

1:56 pm on May 8, 2003 (gmt 0)

10+ Year Member



My theory on this is that Google treats accented characters as the same as the unaccented letter, but treats the HTML codes (e.g. &eacute;) differently. Or perhaps vice versa.

Certainly I have a Spanish page which has only started showing up in a search involving the word "niño" since I changed the HTML codes to characters. Could have been a coincidence though.
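Whether a page writes niño literally or as ni&ntilde;o, a parser that decodes character references before indexing should end up with identical text. The sketch below assumes exactly that step, using Python's standard `html.unescape` plus NFC normalization; whether Google's indexer did this at the time is what the theory above is speculating about. The `page_text` helper is hypothetical.

```python
import html
import unicodedata


def page_text(raw_html_fragment: str) -> str:
    """Decode named and numeric character references, then normalize to NFC."""
    return unicodedata.normalize("NFC", html.unescape(raw_html_fragment))


variants = ["ni&ntilde;o", "ni&#241;o", "niño"]
decoded = [page_text(v) for v in variants]
print(decoded)
```

If an indexer skipped the decoding step, the entity form would be stored as a different token from the literal character, which is one way the two spellings could rank differently.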

mipapage

3:28 pm on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Don't know if this helps some theories or not, but I use the &eacute; varieties only.

PatrickDeese

7:00 pm on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's funny, because I don't use the HTML entity equivalents; however, since I use a doctype and encoding meta tags, I think that helps.
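The encoding declaration matters because the same bytes decode to different text depending on which character set the reader assumes. A minimal illustration with Python's built-in codecs follows; the UTF-8 versus ISO-8859-1 mismatch is an assumed example, not taken from any poster's site. If a crawler guesses the wrong charset, the word it indexes is no longer diseño.

```python
word = "diseño"

utf8_bytes = word.encode("utf-8")      # b"dise\xc3\xb1o": ñ takes two bytes
latin1_bytes = word.encode("latin-1")  # b"dise\xf1o": ñ takes one byte

# Decoded with the charset the page actually uses: round-trips cleanly
print(utf8_bytes.decode("utf-8"))
print(latin1_bytes.decode("latin-1"))

# Decoded with the wrong charset: the indexed word no longer matches
print(utf8_bytes.decode("latin-1"))                    # diseÃ±o (mojibake)
print(latin1_bytes.decode("utf-8", errors="replace"))  # ñ becomes U+FFFD
```

Entity references such as &ntilde; are pure ASCII, so they survive a wrong charset guess, while literal characters rely on the declared encoding being honored.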

mipapage

7:05 pm on May 8, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



PatrickDeese,

I'm tired and my head is real thick. I too use a full doctype and encoding. Can you explain a bit the relevance of your comment?

Sorry if the answer's obvious; I've been trying to replace sleep with caffeine and work this week.

fatpeter

8:40 pm on May 8, 2003 (gmt 0)

10+ Year Member



I've been following accented keywords, and from what I can see there is no pattern: one minute different results, then an hour later the same results... It's been like that for the last month or so!