Forum Moderators: Robert Charlton & goodroi
This is a heads up to webmasters and SEOs running websites in multiple languages.
Google has incorporated a language detection parameter to the algo. I can't say as to whether it is a *new* feature, or whether it has only been given greater weight in the past 3-4 weeks.
Case example:
English page for BlueWidget
Large Spanish-speaking user base - has left comments/reviews
These are displayed on the BlueWidget English page
Google now kicks the page because the content-language does not match the actual content.
The Spanish version of the same page is doing wonderfully though.
I am seeing this right across the board for all Latin-character-based languages.
Have you tried changing this to see if the page comes back? Google engineers, in the past at least, denied that the server's http header or the page's meta information for language had much effect on rankings.
When you say "kicks the page" I assume you mean filters it out or doesn't rank it, correct?
Yes ;) The pages did rank (and were indexed) and now they don't rank - and are not indexed.
- on a side note, the corresponding pages for the BlueWidgets where comment language = content language are enjoying a much higher/better indexation than before.
Have you tried changing this to see if the page comes back?
Yes... currently working on it, but there are a lot of pages, and 'generating' unique content won't work. Experimenting with a language-detection software to only show comments/reviews for the relevant language.
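For anyone wanting to try the same thing, here is a rough sketch of the idea of only showing comments in the page's language. The stopword lists and the function names (detect_language, comments_for_page) are made up for illustration; this is not the detection software mentioned above, and a real setup would need far bigger word lists or a proper detector.

```python
# Rough sketch: keep only the comments whose detected language matches the page.
# STOPWORDS and both function names are illustrative assumptions.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "it", "this", "was", "for", "with"},
    "es": {"el", "la", "los", "que", "es", "muy", "para", "con", "una", "pero"},
}

PUNCTUATION = ".,;:!?\"'()"


def detect_language(text):
    """Return the language whose stopwords occur most often, or None if none occur."""
    words = [w.strip(PUNCTUATION) for w in text.lower().split()]
    hits = {
        lang: sum(1 for w in words if w in stops)
        for lang, stops in STOPWORDS.items()
    }
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def comments_for_page(comments, page_lang):
    """Keep comments detected as page_lang; undetected comments are held back."""
    return [c for c in comments if detect_language(c) == page_lang]
```

Comments that cannot be classified are left out here rather than shown, which seemed the safer default for the problem described; that choice is a guess on my part.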
My English-language site has got a fair amount of foreign-language phrases on it, by necessity. I haven't seen any negative effect as of today, but if Google start to penalize that then I'm pretty much hosed.
What I have seen is that it primarily affects pages where there is a substantial amount of 'foreign' language elements in one or more large blocks in the same language.
If you have a few short comments, then you shouldn't be affected.
However, to be safe I would do some checks to make sure that you have at a bare minimum 60-65% of 'correct' language content on the page.
Again, I only noticed this because not all of the pages from the website are indexed yet. Now Google has started skewing its selection and is not indexing pages it probably finds "not relevant to index". So I would not call it a penalty as such... but it might as well be one, given what it is doing to my traffic, since a lot of important long-tail pages are no longer indexed.
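A quick way to run the 60-65% check is to count words per language on each page. The sketch below assumes you already know the language of each text block (for example main copy vs. each review); the names language_shares and meets_minimum and the word-count measure are my own assumptions, since nobody knows how Google would actually measure it.

```python
from collections import Counter


def language_shares(blocks):
    """blocks: list of (lang_code, text) pairs. Returns {lang: share of all words}."""
    counts = Counter()
    for lang, text in blocks:
        counts[lang] += len(text.split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: n / total for lang, n in counts.items()}


def meets_minimum(blocks, page_lang, minimum=0.60):
    """True if page_lang makes up at least `minimum` of the words on the page."""
    return language_shares(blocks).get(page_lang, 0.0) >= minimum
```

For example, a page with 7 English words of copy and 3 Spanish words of reviews comes out at 70% English, so it passes at the default 0.60 but would fail a stricter minimum of 0.75.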
Experimenting with a language-detection software
I've tried using a free script/method that uses trigrams, 3-letter sequences to best guess a language. It seems to be fairly effective.
I've read somewhere that Google's AJAX language API [google.com] uses a similar method.
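To show what the trigram approach looks like, here is a minimal sketch. It is not the free script mentioned above and says nothing about how Google does it; the sample texts, the cosine scoring and the function names are all assumptions for illustration. Real profiles would be built from a much larger corpus per language.

```python
import math
from collections import Counter

# Tiny illustrative training samples (assumption: real profiles need far more text).
SAMPLES = {
    "en": ("the quick brown fox jumps over the lazy dog and this is a review "
           "of the product which works very well and the price is good"),
    "es": ("el producto es muy bueno y el precio es excelente lo recomiendo "
           "a todos los que buscan calidad y esta es una buena compra"),
}


def trigrams(text):
    """Count 3-character sequences, with whitespace normalised and padded."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def cosine(a, b):
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def guess_language(text):
    """Return the profile language whose trigram counts best match the text."""
    grams = trigrams(text)
    return max(PROFILES, key=lambda lang: cosine(grams, PROFILES[lang]))
```

Short strings give very few trigrams, so guesses on one-line comments will be shaky with something this small.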
I've tried using a free script/method that uses trigrams, 3-letter sequences to best guess a language. It seems to be fairly effective.
The language detection seems to be working with a 90% reliability. Getting it to work over all pages/comments and languages is what's proving to be the challenge ;)
Then you have issues other than what you are suggesting. Google doesn't refuse to index pages based on language.
Yes. The website has one other major issue. I mentioned it earlier in the thread. We have too many pages (in relation to inbound external links) since our content is multilingual.
We used to have a good spread on the indexing though, and now it is heavily skewed, because of the language. Our total indexation has taken a slight beating (which happens from time to time), but the language factor has hurt specific portions of the website severely. It wasn't obvious at first, but after analyzing which pages have been dropped, and also what pages have been added, there can be only one conclusion.
Also, what is your ratio of the English written content to the text from Spanish comments on the page? 40%-60%? 30%-70%? Other?
Are you publishing the same comments on both versions of the page?
Yes, but since they are reviews they tend to be rather long-winded.
what is your ratio of the English written content
It varies. Worst case pages with lots of reviews have as little as 15% English.
-- -- -- -- -- -- --
I would like to stress that the following information is mostly guesswork and has not been tested yet. These are merely observations and not to be taken at face value.
For the affected pages it seems that the red line is drawn at ~60% page-language content. That is to say English vs Spanish 60-40.
However that is only where the problem is bilingual. If the content is multilingual you can get closer to ~40% English... so long as none of the non-English content reaches higher than ~40% in and of itself.
However, that does not explain the issue.
1. If there are enough comments/reviews (which is the case 70% of the time) we have a filter to sort the comments differently, i.e. the displayed comments are different on the Spanish vs. English page, unless the user clicks to see more comments (these are on a separate page w/ "robots='noindex'" tag)
2. Comments/reviews were just an example I picked. There are other pages where I am seeing the same issue with different types of (unique) content.
3. The problem has affected all languages and all categories of pages. The only common factor for dropped pages is the language factor.
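The ~60% / ~40% guesses above can be written down as a simple check, which makes it easier to run over a list of dropped pages. To be clear, the thresholds are only the untested guesswork from this post, and the function name, the parameter names and the way the two cases are folded into one rule are my own assumptions.

```python
def looks_at_risk(shares, page_lang, foreign_cap=0.40, primary_floor=0.40):
    """shares: {lang: fraction of page content}.

    Flags a page when the page language falls below primary_floor, or when
    any single other language exceeds foreign_cap. With two languages this
    reduces to the 60-40 line guessed at above.
    """
    primary = shares.get(page_lang, 0.0)
    largest_foreign = max(
        (share for lang, share in shares.items() if lang != page_lang),
        default=0.0,
    )
    return primary < primary_floor or largest_foreign > foreign_cap
```

So a 55-45 English/Spanish page would be flagged, a 45-30-25 English/Spanish/French page would not, and the worst-case 15% English pages mentioned earlier would be flagged.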
We have a few pages where there is a mixture of english and local language(s), with english prevailing. These pages are not ranking very well on the local language google(s), but they never did rank particularly well.
What I am going to do is record exact ranking position of these pages, then remove english content and leave the local language content only. This will make these pages smaller, but it will still have around 300 words of unique content on the page. I will see if the ranking improves the next time google caches these pages.
Google has incorporated a language detection parameter to the algo. I can't say as to whether it is a *new* feature, or whether it has only been given greater weight in the past 3-4 weeks.
I will do my test anyway. I have 4 good pages that each mix english and another (one) language, with the ratio currently being around 60% - 40% in favour of english language, even though the page was intended for local language. I can test french, italian, german and hungarian.
The test will be on how much the ranking changes on local google domains for the language-specific key phrase(s) after the 60% of english content has been removed from the page. I will not touch other elements of the page, I will only remove english content.
The domain in question is .com domain, no geo targeting set.
Whether they pick a single language for the whole page only or do it on a more granular level (per div, p, sentence...), and whether that changed, I have no idea, though.
What I am going to do is record exact ranking position of these pages, then remove english content and leave the local language content only. This will make these pages smaller, but it will still have around 300 words of unique content on the page. I will see if the ranking improves the next time google caches these pages.
Well, my test has finished and I can report that both pages (one in italian and one in german), have moved up significantly in SERPs.
Basically, the german page had about 60% of content in english. By removing the english content from the page, the page still had 250 words in german remaining. After Google had cached it, the page jumped 2/3 up in ranking for a selected phrase. A similarly significant jump was observed for the italian page.