Text with accent marks and text without accent marks

How does a search engine see it?

         

grnidone

3:33 pm on Aug 28, 2001 (gmt 0)



I am sure this has been asked before, but I don't know how to search for it.

How does a search engine see accent marks?

For example, does 'compañía' = 'companía' = 'compañia' = 'compania'

or does an engine see them as different words? I would think a spider would see four different but similar words. (?)

I talked to a few bilingual (English/Spanish) folks I work with, and they said they usually do not put the accent marks in when they search because it is too much trouble with a westernized keyboard. Since there are a good number of Hispanics in the US, I thought this might be an issue over here.

-G

rencke

4:15 pm on Aug 28, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It has always been my understanding that the engines strip accent marks before the input is used to search the index. So Suède (French for Sweden) and suede (a type of leather in English) would result in the same search. Similarly, Västerås would be equivalent to Vasteras. The rationale is that people with English keyboards are likely to type the nearest available letter, which indeed they do.

Having said that, I should say that not all engines seem to be handling this the same way. Yesterday, and to my great surprise, I found a major engine (AV if memory serves me) treating a search for Västergötland differently from a search for Vastergotland. Perhaps it is time to do a systematic study of differences between the engines in this respect. That could be important indeed when optimizing pages in non-English languages.

If you would like to volunteer and become our heroine, then all you have to do is to copy and paste from the examples above and report back here in a week or so. :)

(edited by: rencke at 4:16 pm (gmt) on Aug. 28, 2001)
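To make the accent-stripping idea above concrete, here is a minimal Python sketch of one way such folding could work. It is not how any particular engine does it; `fold_accents` is a hypothetical helper name, and the approach (Unicode decomposition followed by dropping combining marks) is an assumption chosen for illustration.

```python
import unicodedata


def fold_accents(text: str) -> str:
    """Hypothetical helper: return text with combining accent marks removed."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep every character that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Words taken from the posts above.
for word in ["compañía", "Suède", "Västerås", "Västergötland"]:
    print(word, "->", fold_accents(word))
```

An engine that folds both the indexed page text and the query this way would treat Västerås and Vasteras as the same term; an engine that skips the step would keep them apart, which would match the differing results described above.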

Eric_Jarvis

4:16 pm on Aug 28, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The SE will see what you tell it to see... so if the character encoding is done correctly, it will see the accent and treat it as a specific character.
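As a rough illustration of why the encoding has to be "done correctly", the Python sketch below encodes one accented word two ways and then decodes it with the wrong charset. The choice of UTF-8 and Latin-1 is an assumption made for the example; the thread does not say which encodings were in use.

```python
word = "Västerås"

# The same text becomes different bytes under different encodings.
utf8_bytes = word.encode("utf-8")
latin1_bytes = word.encode("latin-1")
print(len(utf8_bytes), utf8_bytes)
print(len(latin1_bytes), latin1_bytes)

# Decoding with the encoding that was actually used gives the word back.
round_trip = utf8_bytes.decode("utf-8")
print(round_trip == word)

# Decoding with the wrong encoding yields different characters, so a
# reader (or a spider) no longer sees the accented letters at all.
garbled = utf8_bytes.decode("latin-1")
print(garbled)
```

If a page declares one charset but is saved in another, the reader ends up with something like the garbled string rather than the intended accented word.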

Macguru

4:17 pm on Aug 28, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I recently did a market search for maison à vendre (house for sale in French). I found that 2.5 times more people type maison a vendre instead. Since we cannot write official text without accents, I keep the unaccented versions for seamless doorways.

Google does show different results for accented searches. Most SEs are up to date with accents, except in the title tag.

[webmasterworld.com...]

marino

4:38 pm on Aug 28, 2001 (gmt 0)

10+ Year Member



Hi grnidone,
In the META keywords I wrote pubblicità as pubblicit+&+agrave. I had to put the "+" signs in this message to keep it readable, but you have to delete them in the code.
Hope this helps.
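For anyone who wants to generate such entities rather than type them by hand, here is a small Python sketch. `to_named_entities` is a hypothetical helper, not part of any tool mentioned in this thread; it relies on the standard library's `html.entities.codepoint2name` table and falls back to a numeric reference when no named entity exists.

```python
from html import unescape
from html.entities import codepoint2name


def to_named_entities(text: str) -> str:
    """Hypothetical helper: replace non-ASCII characters with HTML entities."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)  # plain ASCII is left untouched
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")  # named entity
        else:
            out.append(f"&#{code};")  # numeric fallback
    return "".join(out)


encoded = to_named_entities("pubblicità")
print(encoded)
# unescape() turns the entity back into the accented character.
print(unescape(encoded))
```

Either form, the literal character in a correctly declared charset or the entity, should be read by a browser as the same letter.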

grnidone

7:26 pm on Aug 28, 2001 (gmt 0)



Yes, that does help a *lot*.

>so if the character encoding is done correctly

So... does that mean I put in the code with a keystroke (CTRL + Whatever)

Or

put the HTML equivalent like the copyright symbol?

Thanks everyone.

And welcome to WebMasterWorld, marino. Or should I say:

<AltaVista Babelfish>
Benvenuto a WebMasterWorld!

Parlo come un turista? <--Do I speak like a tourist? ;)

</AltaVista Babelfish>

-G

marino

10:04 am on Aug 29, 2001 (gmt 0)

10+ Year Member



No, non parli come un turista, piuttosto come un altavista! <--No, you don't speak like a tourist, more like an altavista!
:)
Wouldn't it be useful to have a plain-text option when posting messages?
Thanks for the welcome, I feel great here!