Forum Moderators: Robert Charlton & goodroi


robots.txt for pages with Google-translated content?

         

Tomas12345

10:08 am on Jul 5, 2017 (gmt 0)

5+ Year Member



Hello,

Our new website needs additional non-English versions (formal reason: required by potential institutional partners). Each language version will have e.g. 1,000 subpages, out of which 10 are general pages that are professionally translated, and 990 are user-generated subpages originally written in English with ~99% of the content translated via the Google Translate plugin.

The real value lies in the 990 auto-translated pages... What do you think is best?
A/
Block the entire non-English version via robots.txt (or a meta robots tag?), because the 10 pages are worth little and we're afraid that poor content on the other language versions may harm the SERPs of the quality content in English.
B/
Use robots.txt (or meta robots tags?) only for the 990 auto-translated pages.
C/
Don't block anything, as there is nothing illicit on those subpages; there's just poor content which Google may not show anyway until we improve it...


This is an interesting problem, and I hope the answer may be helpful for other people as well.

Thank you very much.

not2easy

2:02 pm on Jul 5, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Both to avoid duplicate content issues and to serve the correct page version in visitors' search results, Google suggests [support.google.com] either rel="alternate" hreflang="x" annotations or a language-version sitemap. Which solution is better for your site depends on the type of pages and your platform.
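For reference, a minimal hreflang setup might look like this (just a sketch; example.com and the /es/ language folder are placeholders, not from the thread):

```html
<!-- In the <head> of the English original, e.g. https://example.com/page.html -->
<link rel="alternate" hreflang="en" href="https://example.com/page.html" />
<link rel="alternate" hreflang="es" href="https://example.com/es/page.html" />

<!-- Each language version must carry the same full set of annotations,
     including a link pointing back to itself, or Google ignores them. -->
```

The sitemap variant expresses the same relationships with `<xhtml:link rel="alternate" hreflang="...">` entries under each `<url>` instead of head tags.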

lucy24

4:45 pm on Jul 5, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'd be inclined to use a meta noindex instead of robots.txt. Unless your 990 pages contain extremely specialized, highly-useful-to-non-English-speakers content, I wouldn't want them indexed, would you?
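For the record, the meta noindex lucy24 mentions is a single line in each auto-translated page's head (a sketch of the common form):

```html
<!-- Lets Googlebot crawl the page and follow its links,
     but keeps the page itself out of the index -->
<meta name="robots" content="noindex, follow">
```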

Tomas12345

7:40 pm on Jul 5, 2017 (gmt 0)

5+ Year Member



Dear @not2easy & @lucy24

Thank you very much for the very helpful replies!

Those 990 user-generated pages will include very specialized content that is also useful for non-English speakers.
The only problem is the moderate quality of Google Translate. It's understandable what the content is about, but everyone will see that it was not translated by a human. We would love to curate the translations, but we don't have the resources for that yet...

- Do you think we would risk a penalty if we don't block Google's robots from crawling/indexing them?
- Do I understand correctly that listing the 10 well-translated pages in the non-English sections is not a problem?

not2easy

10:10 pm on Jul 5, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



If you have quality content in more than one language and are using the hreflang annotations, there is no need to noindex those pages. If there are low-quality pages (pages that have been auto-translated and are copies of pages in another language that are being indexed), those pages might be better off noindexed, at least until they are improved. If and when they are improved, the hreflang annotations will ensure there is no confusion, and the pages can be indexed.

I agree with lucy24 that it is better to noindex the auto-translated pages that are the same as other, indexed pages until such time as they are improved to be of more value to users. When that improvement is made, they can be indexed if you add the language attributes as suggested.
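One practical note on the robots.txt option discussed earlier in the thread: a robots.txt disallow stops Googlebot from fetching a page at all, so it will never see a meta noindex placed on that page; pick one mechanism or the other. A robots.txt block would look like this (a sketch, assuming the auto-translated pages live under a hypothetical /es/ folder):

```
# robots.txt at the site root (/es/ is a hypothetical path)
User-agent: *
Disallow: /es/
```

Note that a disallowed URL can still appear in results (without a snippet) if other sites link to it, which is another reason the meta noindex suggested above is usually the better fit here.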