I am back from the Mozilla Summit and somewhat managed to process all the new information I got there. But instead of posting yet another summit summary or more summit photos (what, you didn’t know how great this summit was?) I have a far more boring topic for today: localization of XULRunner-based applications.
I mean, what is there to say about localization? It is really very simple. Some magic in the chrome:// protocol makes sure that whenever a file in the locale “subdirectory” is accessed, one of the available locales is selected and the file is loaded from there. This automatic selection mechanism works very well and will select the locale that is closest to the value of the
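The chrome registry's exact matching rules are not spelled out here, so the following is only a simplified Python sketch of the general idea of picking the “closest” available locale: an exact match first, then any locale with the same language, then a default. The function name `select_locale` and the `en-US` default are assumptions for illustration, not XULRunner's actual implementation.

```python
def select_locale(requested: str, available: list[str], default: str = "en-US") -> str:
    """Pick the available locale closest to the requested one (simplified sketch)."""
    if not available:
        raise ValueError("no locales available")
    # Compare case-insensitively so that "pt-br" matches "pt-BR".
    by_lower = {locale.lower(): locale for locale in available}
    wanted = requested.lower()
    # 1. Exact match.
    if wanted in by_lower:
        return by_lower[wanted]
    # 2. Same language: prefer the bare language ("de"), then any regional variant ("de-DE").
    language = wanted.split("-")[0]
    if language in by_lower:
        return by_lower[language]
    for locale in available:
        if locale.lower().split("-")[0] == language:
            return locale
    # 3. Nothing related is available: use the default locale, else the first one listed.
    return default if default in available else available[0]
```

For example, a user asking for `de-AT` in an application that ships `en-US`, `de-DE` and `fr` would get `de-DE` under these rules.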
A typical locale contains files of two types: DTD files and properties files. The DTD file format is part of the XML specification and can be used with any XML file (including XUL and XHTML files). The idea is to associate XML entities in DTD files with localized strings; the XUL document then only references the entities. This is a rather unorthodox use of DTD files, but the approach has the clear advantage of not requiring any special handling: the browser simply processes an XML file as it usually would. The downside, however, is that the DTD format requires a significant amount of boilerplate and leaves much room for mistakes. Any mistake in a DTD file (missing entity definition, syntax error, invalid character, Byte Order Mark) results in a fatal error: the entire XUL file is rejected with a parsing error. The other issue is that including multiple DTD files in a XUL file is complicated and rather counterintuitive.
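To make the DTD approach concrete: a locale file declares entities such as `<!ENTITY window.title "Example">`, and the XUL document uses `&window.title;` wherever that string should appear. The Python sketch below uses hypothetical helper names and a regex rather than a real DTD parser, assuming plain quoted entity values and ignoring parameter entities. It reports references that have no definition, which is exactly the kind of mistake that makes the whole XUL file fail to load.

```python
import re

# General entity declarations such as <!ENTITY window.title "Example">.
# Parameter entities (<!ENTITY % name ...>) are deliberately not matched.
ENTITY_DECL = re.compile(r"""<!ENTITY\s+([A-Za-z_][\w.\-]*)\s+(?:"([^"]*)"|'([^']*)')\s*>""")
# Entity references such as &window.title; (character references like &#38; never match).
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.\-]*);")
# Entities predefined by XML itself, which need no declaration.
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def parse_dtd(text: str) -> dict[str, str]:
    """Map entity names to their (unexpanded) values."""
    entities = {}
    for match in ENTITY_DECL.finditer(text):
        name, double_quoted, single_quoted = match.groups()
        entities[name] = double_quoted if double_quoted is not None else single_quoted
    return entities


def undefined_entities(xul: str, dtd: str) -> set[str]:
    """Return entity names referenced in the XUL markup but not declared in the DTD."""
    defined = parse_dtd(dtd)
    return {name for name in ENTITY_REF.findall(xul) if name not in defined and name not in PREDEFINED}
```

Running `undefined_entities` over a XUL file and its locale DTD before shipping catches the “missing entity definition” case without having to open the localized build.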
Ensuring working localizations
So at the moment the historically grown localization landscape in XULRunner is somewhat inconsistent. But this inconsistency is merely a minor annoyance, and one that the L20n effort will hopefully make go away soon. Fatal errors caused by localization mistakes, however, are significantly more problematic; they haunted TomTom HOME, for example, quite regularly during the early phases of the project. It turns out that you cannot really trust localizers to deliver DTD files that use the correct encoding, have no BOM and are free of syntax errors. Given that localized application versions typically get less testing, these mistakes would sometimes go unnoticed. Nor can it always be guaranteed that all strings are translated in all locales, particularly not in the middle of a development cycle. But it would be nice to always have usable localized builds.
So, here is what you need for working localizations:
- Validation: Ensure that the localization files use UTF-8 encoding without a BOM and check their syntax (this makes sense even for properties files: any “trash” that the browser silently ignores indicates an issue). Ideally, the tools localizers use to create the translated files already ensure a valid format; otherwise, scripts have to do this job.
- Completeness: Locales have to be compared against the base locale to find missing or unnecessary strings. Ideally, the scripts used here will also add missing strings from the base locale to prevent errors in the build (arguably, this fallback behavior should be implemented in XULRunner, yet it isn’t).
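The validation step above could be scripted roughly as follows. This is a minimal Python sketch, not Mozilla's l10n-checks: the function name `validate` is hypothetical, and the properties syntax it accepts is a simplifying assumption (blank lines, `#`/`!` comments and `key=value` lines, without backslash continuations).

```python
import codecs
import re

# Comments and general entity declarations are the only things a locale DTD should contain.
DTD_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
DTD_ENTITY = re.compile(r"""<!ENTITY\s+[A-Za-z_][\w.\-]*\s+(?:"[^"]*"|'[^']*')\s*>""")
# Simplified properties syntax (assumption): blank lines, "#"/"!" comments and "key=value" lines.
# Line continuations with a trailing backslash are not supported by this sketch.
PROPERTIES_LINE = re.compile(r"\s*(?:[#!].*|[^=\s][^=]*=.*)?")


def validate(filename: str, data: bytes) -> list[str]:
    """Return human-readable problems found in one localization file."""
    problems = []
    if data.startswith(codecs.BOM_UTF8):
        problems.append(f"{filename}: starts with a UTF-8 Byte Order Mark")
        data = data[len(codecs.BOM_UTF8):]
    try:
        text = data.decode("utf-8")  # strict decoding: invalid byte sequences raise
    except UnicodeDecodeError as err:
        problems.append(f"{filename}: not valid UTF-8 ({err.reason} at byte {err.start})")
        return problems
    if filename.endswith(".dtd"):
        # Whatever survives after removing comments and declarations is "trash".
        leftover = DTD_ENTITY.sub("", DTD_COMMENT.sub("", text)).strip()
        if leftover:
            problems.append(f"{filename}: unparsable content {leftover[:40]!r}")
    elif filename.endswith(".properties"):
        for number, line in enumerate(text.splitlines(), start=1):
            if not PROPERTIES_LINE.fullmatch(line):
                problems.append(f"{filename}:{number}: unparsable line {line!r}")
    return problems
```

Treating any leftover text as an error, rather than skipping it, is what turns silently ignored “trash” into a visible build failure.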
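The completeness step can be sketched in Python as well, assuming both locales have already been parsed into dictionaries mapping string keys to values. The names `compare_locale` and `to_dtd` are hypothetical; the merged result fills every missing key with the base-locale string, which is the fallback behavior XULRunner itself does not provide.

```python
def compare_locale(base: dict[str, str], translated: dict[str, str]) -> tuple[list[str], list[str], dict[str, str]]:
    """Compare a translated locale against the base locale (both parsed into key -> string dicts)."""
    missing = [key for key in base if key not in translated]
    unnecessary = [key for key in translated if key not in base]
    # Keep the base locale's key order, fall back to the base string for every
    # untranslated key and drop keys the base locale no longer has.
    merged = {key: translated.get(key, text) for key, text in base.items()}
    return missing, unnecessary, merged


def to_dtd(strings: dict[str, str]) -> str:
    """Serialize strings back into DTD entity declarations."""
    # Assumes the values are already valid entity values (e.g. taken from a parsed DTD);
    # only double quotes are escaped so that they cannot terminate the declaration.
    lines = []
    for key, value in strings.items():
        escaped = value.replace('"', "&#34;")
        lines.append(f'<!ENTITY {key} "{escaped}">\n')
    return "".join(lines)
```

The `missing` and `unnecessary` lists go back to the localizers as a report, while the merged strings go into the build so that an incomplete locale still loads.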
Mozilla apparently has a set of scripts called l10n-checks to do this job. Unfortunately, I am not familiar with them and cannot say whether they are a complete solution for the problems above; the documentation doesn’t really make it clear either. For TomTom HOME I had to write custom scripts, and from what I can tell Songbird also uses its own custom solution (I didn’t look too closely though).
Getting good localizations
But wait, a working localization doesn’t necessarily mean a good localization — it might contain pretty crappy translations. And finding good translators is only one step towards good localization. Some of the other steps are:
- Find a good translation environment for translators to use. Mozilla uses narro and Verbatim; unfortunately, I don’t know much about the merits of either.
- Make sure to provide translators with some context about the strings they are translating. This means first of all having developers choose meaningful string IDs that describe the function of a string rather than its value. And it also means adding comments to explain how a string is used if it isn’t obvious.
- If the space for a particular string is limited, this should be communicated to translators. Remember that English is a very compact language; translations will often be significantly longer. Oh, and no: telling translators about the size constraints doesn’t mean that testers no longer need to check whether any localized strings are cut off or make the layout look bad.
- Avoid inserting numbers or words dynamically into a sentence; use different static variants of the same sentence if possible. Assembling a sentence dynamically might work well in English but will usually get very complicated in other languages (at least if you want a result that sounds somewhat correct). L20n is meant to address this issue, though I have my doubts here.
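Context and length limits can be combined: a comment in the base locale explains a string and states its limit, and a script flags translations that exceed it. The Python sketch below assumes a hypothetical comment format modeled on the `LOCALIZATION NOTE (key):` comments found in Mozilla's own locale files; the phrase “at most N characters” and the function names are assumptions, not an established convention.

```python
import re

# Hypothetical convention: the base locale states a length limit in a Mozilla-style
# localization note, e.g. "# LOCALIZATION NOTE (toolbar.sync.label): at most 12 characters".
LIMIT_NOTE = re.compile(r"LOCALIZATION NOTE \(([^)]+)\):.*?at most (\d+) characters")


def length_limits(base_text: str) -> dict[str, int]:
    """Collect {key: maximum length} from the notes in a base locale file."""
    return {key: int(limit) for key, limit in LIMIT_NOTE.findall(base_text)}


def too_long(translations: dict[str, str], limits: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {key: (actual length, limit)} for translations that exceed their limit."""
    # Counting characters only approximates the rendered width, so this
    # does not replace checking the actual layout.
    return {
        key: (len(translations[key]), limit)
        for key, limit in limits.items()
        if key in translations and len(translations[key]) > limit
    }
```

A German translation “Synchronisieren” (15 characters) for a string limited to 12 characters would be reported, while “Sync” would pass.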
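The difference between gluing a sentence together and choosing between static variants can be sketched in Python. The string keys and the `%d` placeholder are hypothetical; the point is that each variant is a complete sentence, so a translator can move the number wherever the target language needs it.

```python
# Anti-pattern: the sentence is glued together from fragments, so English word
# order and the English plural "s" are hard-coded.
def status_concatenated(count: int) -> str:
    return "Found " + str(count) + " file" + ("" if count == 1 else "s")


# Alternative: complete sentences per case (hypothetical keys), translated as a whole.
STRINGS_EN = {
    "status.noFiles": "No files were found.",
    "status.oneFile": "One file was found.",
    "status.manyFiles": "%d files were found.",
}
STRINGS_DE = {
    "status.noFiles": "Es wurden keine Dateien gefunden.",
    "status.oneFile": "Es wurde eine Datei gefunden.",
    "status.manyFiles": "Es wurden %d Dateien gefunden.",
}


def status_message(strings: dict[str, str], count: int) -> str:
    """Pick the static variant that matches the count."""
    if count == 0:
        return strings["status.noFiles"]
    if count == 1:
        return strings["status.oneFile"]
    return strings["status.manyFiles"] % count
```

Note that a fixed zero/one/many split is itself an English-centric assumption: languages such as Polish or Russian use more plural forms, which is part of why this problem is hard to solve generically.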
Once you’ve done your homework and got great localizations for your application, you might notice an issue: some strings are not localized, e.g. the labels of default alert dialog buttons, the entire add-on manager or error console UI, and some error messages. Yes, these strings are not part of your application; they are part of XULRunner. The good news: XULRunner locales are all there, and you can get them. The bad news: XULRunner locales aren’t exactly small, around 150 kB (compressed) or more each. If you played with the idea of putting all the available locales of your application into one installation package, this is quite a setback: including just 20 XULRunner locales will increase the download size by 3 MB.
So, what are the options:
- Do not offer installation packages with multiple locales; that’s what Firefox does. The disadvantage: the user has to decide on a language before downloading and cannot change his mind afterwards.
- Download additional locales automatically when the user selects a different locale. I am not aware of any application that chose this approach, probably because even building all the required XULRunner locales is rather complicated.
- Discover that there are only a few places where XULRunner strings “shine through” and replace these with your own UI. That’s the approach that TomTom HOME followed pretty consistently (which was a pain for developers) and Songbird less consistently (which is probably a pain for users).
Do you know a perfect solution? I don’t.