According to Google’s John Mueller, you shouldn’t create pages in multiple languages unless they’re actually necessary.
This advice was posted on Reddit in response to a thread in r/TechSEO where somebody wanted help with hreflang.
“I have an international site that I’m managing for a client that has pages being indexed for different languages. The problem is that some pages are still in english (ex: example.com/ru still has english content).
Since the X-default is english, there’s no need to have these other URLs, yet they are unfortunately being created and indexed as new content gets added, with the only difference being the URL.
Reading Google’s guidelines (if I understood correctly, which I may not have), they suggest blocking these URLs in the robots.txt file, but should these pages that are already indexed be removed, redirected, canonicalised, or noindexed? I’m stumped, so any help is appreciated.”
Mueller responded that pages which have already been created shouldn’t be disallowed in robots.txt. If crawling is blocked, Google can’t read the pages, so it can’t canonicalize them at all.
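To see why blocking gets in the way, here is a minimal Python sketch using the standard library’s `urllib.robotparser`. The robots.txt contents, the URLs, and the `can_crawl` helper are assumptions made up for illustration, not taken from the thread. It only shows that a crawler obeying the rule never fetches the /ru/ page, which means it never gets to read any canonical tag placed on that page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the untranslated /ru/ section.
ROBOTS_TXT = """\
User-agent: *
Disallow: /ru/
"""


def can_crawl(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if a crawler obeying robots_txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The /ru/ duplicate is off-limits, so its HTML (including any
# rel="canonical" tag) is never seen by a compliant crawler.
blocked = can_crawl(ROBOTS_TXT, "https://example.com/ru/pricing")

# The English original stays crawlable.
allowed = can_crawl(ROBOTS_TXT, "https://example.com/pricing")
```

In this sketch `blocked` is `False` and `allowed` is `True`. Leaving the duplicate crawlable is what lets a search engine notice it matches the English page.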
If you throw hreflang into the mix, things can get overly complicated, Mueller explains:
“It’s easy to dig into endless pits of complexity with hreflang. “Let’s create all languages! Let’s make pages for all countries! What if someone in Japan wants to read it in Swahili? Let’s make even more pages!” My guess is most of these “pages created because you can” get very little traffic, add very little value, and they add a significant overhead (crawling, indexing, canonicalization, ranking, maintenance, hreflang, structured data, etc.).”
Ideally, site owners should limit the number of multi-language pages to those the site actually needs to achieve its goals.
“My recommendation would be first to limit the number of pages you create to those that are absolutely critical & valuable — maybe that already cuts the pages you’re thinking about. Think big here; if you’re talking about individual pages within a medium-sized site, it’s probably a non-issue. On the other hand, if you’re considering copying your whole site into 20 languages x 10 countries, that’s something else.”
When it comes to hreflang in particular, Mueller recommends focusing on pages that are receiving wrong-language traffic.
“Past that, for hreflang, I’d focus first on pages where you’re seeing wrong-language traffic — often these are pages that get a lot of global, branded queries, where it’s hard to determine which language content they want. A search for “google” can match a lot of language pages, hreflang can help to differentiate. On the other hand, a search for “search engine” is pretty clear & matches pages where you write about “search engine” already, so pages like that don’t need as much help being language-targeted.”
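For the pages that do need hreflang, the annotations themselves are simple. The Python sketch below builds the `<link rel="alternate">` tags for a page, listing only language versions that exist as real translations plus an x-default fallback. The `hreflang_links` function, its parameters, and the example.com URLs are hypothetical; treat it as an illustration of the markup under those assumptions rather than a recommended implementation.

```python
from html import escape


def hreflang_links(translated: dict[str, str], x_default: str) -> list[str]:
    """Build hreflang <link> tags for the language versions given.

    translated maps a language code to the URL of a page that is
    genuinely written in that language. x_default names the language
    whose URL should serve as the fallback.
    """
    if x_default not in translated:
        raise ValueError(f"x-default language {x_default!r} has no page")

    # One tag per real translation, in a stable order.
    tags = [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in sorted(translated.items())
    ]
    # Fallback for users whose language matches none of the above.
    tags.append(
        '<link rel="alternate" hreflang="x-default" '
        f'href="{escape(translated[x_default])}" />'
    )
    return tags


# Only English and German are actually translated; the /ru/ copy that
# still shows English content is deliberately left out.
pages = {"en": "https://example.com/", "de": "https://example.com/de/"}
links = hreflang_links(pages, x_default="en")
```

Printing `links` one per line gives the tags to place in the `<head>` of each listed version. Keeping the input limited to real translations is the point: a URL that only differs by path, like the /ru/ case in the question, never gets an annotation.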
Mueller notes that it can be hard to determine how to balance the “just do it everywhere” approach with the “save effort by thinking” approach.
You can find the full thread on Reddit in r/TechSEO.