Three caveats in enterprise continuous localization

I wrote this blog post with software developers in mind, but it is also a good source of information for translation companies that don’t know what to request from customers who wish to move to a continuous localization model.

We introduce three concepts: disambiguation, explicitation and dispatching.

It may be logical for a software developer to think about continuous localization as a constant flow of externalized translatable information that leaves the system and comes back later with the final translated versions. This model would work to perfection if there were no people, ambiguity, languages or transactional cost involved. In reality, translators need context and instructions to deal with ambiguous text, and they can only resolve it by communicating with each other or reviewing the source text. Working with languages implies more complexity than most developers assume, and one of the biggest problems with using English as a source language is that English is more an exception to many common grammatical rules than the standard. There are also transactional costs, which are nothing other than the cost of working with many people – just imagine, 15 target languages with translation and review means coordinating at least 30 people. Below we describe three important aspects that you should keep in mind when developing for continuous localization, no matter whether you are translating custom-developed software or database content.

Disambiguation: prepare for source enrichment

As a developer, when creating the user interface text of an application, you have a clear understanding of everything that’s going to show up on the screen. You are so familiar with the environment that you don’t even consider ambiguity. What ambiguity, you may ask? Surely the ID of the string clarifies the context in which the string appears, so other people can understand how to translate it into any given language. Yet experience shows that there is a good deal of ambiguity in English-language controls and texts.

When you code, best practice usually encourages adding comments. Likewise, when you write text, best internationalization practice suggests that you need to also explain where this text appears. Nobody wants to burden you with unnecessary commenting. Just like the best benchmark for understanding your code is to have a programmer peer review it, the best benchmark for understanding the ambiguity of the text is to have a linguist review it.

From the code perspective, provide a way to add extra information for the translator: this may be another XML element or a column in a CSV file. One single comment field per translatable element is enough. Either you comment, or you work with the localization team on disambiguation. Be careful when phrasing these comments; the readers are translators, not developers! A style guide on how to write comments may go a long way.
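As a minimal sketch of the single comment field described above, the following Python snippet exports a string table to CSV with one comment column and lists the strings that still lack a note for the translator. The column names (id, source, comment), the string IDs and the comments themselves are assumptions made up for illustration; your own format may well differ.

```python
import csv
import io

# Hypothetical string table: IDs, texts and comments are made up for illustration.
STRINGS = [
    {"id": "clipboard.copy.button", "source": "Copy",
     "comment": "Verb. Button that copies the password to the clipboard."},
    {"id": "files.copy.label", "source": "Copy",
     "comment": "Noun. Label shown next to a duplicated file."},
    {"id": "login.title", "source": "Sign in", "comment": ""},
]


def export_for_translation(strings):
    """Write one CSV row per translatable string, with a single comment column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "source", "comment"])
    writer.writeheader()
    writer.writerows(strings)
    return buffer.getvalue()


def missing_comments(strings):
    """Return the IDs of strings that have no note for the translator yet."""
    return [s["id"] for s in strings if not s["comment"].strip()]


csv_text = export_for_translation(STRINGS)
needs_comment = missing_comments(STRINGS)
```

Note how the same source text, “Copy”, appears twice with different meanings; the comment column is what tells the translator which one is the verb and which one is the noun.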

To save time on this, make sure that the modules of your code follow a friendly naming standard. It is better to work this out quite early in the development process, otherwise the dependencies in the code make it a hassle. Also remember that in agile development, much functionality is not given a name by marketing before it’s developed, and then the name needs to be fixed retrospectively in the code.

A software solution that allows gathering, handling and discussing source strings can generally be referred to as a disambiguation engine.

Explicitation: send full sentences or bear the consequences

Ambiguity is not the only factor that comes into play when translating text: there is also the question of translatability. Full sentences are almost always translatable into any language, but what happens with short expressions? Developers are taught that the more compact the code is, the better: this is the “do not repeat yourself” (DRY) principle. The problem is that the economy of logic is different from the economy of language. Using variables (also called placeholders) and conditions in a linguistic asset is not good practice. During internationalization, code scanning should highlight such assets, and they should be examined together with the localization team. If you have no placeholders in your content, skip to the next section. If you do, and you’re interested in why this is or can be a problem, read on.

Imagine the following text:

“Copy the %fieldname% into the field.” %fieldname% can be either “username” or “password”.

In a language like Spanish (and many other European languages), words have gender: username is masculine, password is feminine. And as a result, “the” becomes “el” or “la”, depending on the value of %fieldname%. Translators are creative and they would rephrase it like this:

“Copy the following information into the field: %fieldname%.”

I believe you will agree that it sounds rather artificial, even if it is grammatically correct. A better solution would be the equivalent of

“Copy the value of %fieldname% into the field.”

Another example is the use of numbers: English has a singular and a plural form, and common localization formats such as PO files and iOS .stringsdict files provide for plural variants too. When there is one instance you use the singular form; if there are more, you use the plural. The plural form is often simple in English: it’s usually made by adding “s” (with some exceptions of course, for words such as child or ox, which are not particularly common in software). If you have an adjective, it does not change at all: “1 instant message” vs “2 instant messages”.

But other languages work differently. Most of the Germanic, Romance and Slavic languages – the European languages most in demand when it comes to localization – have complicated plurals (the plural form is not easily predictable because it is not simply an “s” at the end of each word, and it actually requires working with a plural dictionary), inflect the adjective (so even “instant” changes), and the condition can also be different: in Slovenian, for example, there’s a separate dual form, which stands for exactly two instances.

The only realistic solution to this is using full sentences as much as possible. If you can, make sure that you list every possible combination for translation.
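To make the plural problem concrete, here is a minimal Python sketch that picks a full sentence per plural category. The function and dictionary names are hypothetical, only the integer rules for English and Slovenian are covered, and the Slovenian strings are illustrative (they translate just “message”, without the adjective) and should be checked by a linguist before any real use.

```python
def plural_category(lang, n):
    """Return the plural category for a non-negative integer n.

    Only integer rules for English ("en") and Slovenian ("sl") are sketched.
    """
    if lang == "en":
        return "one" if n == 1 else "other"
    if lang == "sl":
        if n % 100 == 1:
            return "one"
        if n % 100 == 2:
            return "two"   # the dual form
        if n % 100 in (3, 4):
            return "few"
        return "other"
    raise ValueError(f"no plural rule sketched for {lang!r}")


# One full sentence per category and language, instead of gluing an "s" on.
MESSAGES = {
    "en": {"one": "{n} instant message", "other": "{n} instant messages"},
    "sl": {
        "one": "{n} sporočilo",
        "two": "{n} sporočili",
        "few": "{n} sporočila",
        "other": "{n} sporočil",
    },
}


def render(lang, n):
    """Pick the sentence that matches the plural category of n."""
    return MESSAGES[lang][plural_category(lang, n)].format(n=n)
```

English needs two entries here while Slovenian needs four, which is exactly why the translator has to receive every variant as a full sentence.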

Another thing I’ve seen is the use of conditional cases: producing a different string based on the value of a certain variable. (In a way, the use of numbers is also a specific example of this.) This is a must in template-based texts, but it’s not a good idea to use it just to save a few words of translation. Always have full sentences for every alternative!
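Listing every alternative as a full sentence can be automated on the source side. The following Python sketch expands a %placeholder% template into one complete sentence per combination of values, each with its own ID, so that translators never see the placeholder. The expand function, the copy_hint ID and the ID scheme are assumptions for illustration, not an existing tool.

```python
from itertools import product


def expand(string_id, template, values):
    """Expand a %placeholder% template into one full sentence per combination.

    `values` maps each placeholder name to the list of texts it can take.
    Returns a dict of new string IDs to fully expanded source sentences.
    """
    names = sorted(values)
    expanded = {}
    for combo in product(*(values[name] for name in names)):
        text = template
        for name, value in zip(names, combo):
            text = text.replace(f"%{name}%", value)
        # Hypothetical ID scheme: original ID plus the chosen values.
        expanded[string_id + "." + ".".join(combo)] = text
    return expanded


sentences = expand(
    "copy_hint",
    "Copy the %fieldname% into the field.",
    {"fieldname": ["username", "password"]},
)
```

With the example from this section, the translator receives “Copy the username into the field.” and “Copy the password into the field.” as two separate strings and can use “el” or “la” as needed.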

At the moment, I am not aware of any explicitation engine that would expand text with placeholders into fully expanded options, but I believe that the Mozilla Fluent documentation and their localization content best practices deserve a careful read.

Dispatching: be aware of minimum fees

Continuous localization removes the hassle of project management and speeds up the translation and thus the release process. Still, it would be foolish to think that it’s cheap. First of all, your vendors need an infrastructure to make it happen. While smaller companies may also be able to do this, most believe that it’s better to work with the large global players – who in turn are quite expensive.

Most companies have a minimum fee for translation, because they have to ensure the availability of translators and take care of project administration. Also, their vendors are usually other translation companies who don’t have the technology infrastructure. This creates a lot of project management overhead – it’s generally not a good idea to send out a three-word project every time a string is added.

The only way of saving on minimum fees is gathering projects and dispatching them at regular intervals. This requires an infrastructure that gathers strings, checks them at regular intervals and, when the volume exceeds a certain threshold, sends translation projects to the right vendors.
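The gathering and dispatching logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Dispatcher class, its parameter names and the word-count threshold are hypothetical, and the maximum waiting time is an addition of mine so that small batches do not sit in the queue forever.

```python
from collections import defaultdict


class Dispatcher:
    """Queue strings per target language and release them in batches."""

    def __init__(self, min_words, max_wait_hours):
        self.min_words = min_words            # volume threshold per language
        self.max_wait_hours = max_wait_hours  # assumption: cap on waiting time
        self.queues = defaultdict(list)       # language -> [(text, queued_at_hour)]

    def add(self, language, text, now_hour):
        """Gather a new source string for one target language."""
        self.queues[language].append((text, now_hour))

    def due_batches(self, now_hour):
        """Call at regular intervals; returns the batches ready to send out."""
        batches = {}
        for language, items in list(self.queues.items()):
            if not items:
                continue
            words = sum(len(text.split()) for text, _ in items)
            oldest = min(queued_at for _, queued_at in items)
            if words >= self.min_words or now_hour - oldest >= self.max_wait_hours:
                batches[language] = [text for text, _ in items]
                self.queues[language] = []
        return batches
```

A scheduler would call due_batches once per interval and hand each returned batch to whichever vendor handles that language; a three-word string simply waits until enough volume has accumulated or the waiting time runs out.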

Bear in mind that with this in place, you will be able to outsource to multiple translation vendors with the same overhead as to a single one. When preparing for continuous localization, consider a dispatching engine.

How can BeLazy help you?

When companies – no matter if they are established players or start-ups – embark on continuous localization, they often make a few mistakes that later turn out to be costly. We provide consultancy and also technology solutions to help them make the move cost-effectively.

We are platform-independent and don’t want to turn our tool into a translation management system, but we’d like to conceptualize, develop and customize tools that help enterprise translation departments work better with their software development and database/content management counterparts.

We’d like to understand your challenges and requirements.

Contact us for a free first consultation!
