Locale kit: quick developer guide

Blog post by PulkoMandy on Thu, 2009-07-16 16:36
This week I was at the RMLL in Nantes, busy showing Haiku to people and explaining to them why it is so much better than Linux, so I had little time for GSoC coding. Still, I did some cleanup and fixed a few small bugs. The catalog part of the locale kit now works fine and can be used to internationalize applications. Here is a small guide for those who want to get an application speaking their own language.

Source code changes

You have to alter your source code to get it working, but we've tried to keep the required changes to a minimum. First, you have to #include two files: Catalog.h and Locale.h. They are system headers from the locale kit. Next, you have to tell the locale kit to initialize a catalog for you. A catalog is a class that you will use to map strings to their translated equivalents. The locale kit will automatically find the right data files for you, depending on the system-wide language preferences, your application's MIME signature, and some other magic (see the part of this post about the build system changes). So, you only have to add two lines of code:
BCatalog cat;
be_locale->GetAppCatalog(&cat);
These lines must be executed before any localization occurs. You can put them at the beginning of your main() function to be safe, or somewhere else, such as your main window constructor, if you know what you are doing. You don't need to keep the cat variable around; the locale kit will handle everything by itself.

Now, you need to define contexts in your application. Contexts are a way to separate different parts of the application. They are needed because a string may have a different translation depending on where it appears. For example, you usually find "Edit" in English applications. Depending on the context, this becomes "Éditer" or "Édition" in French. If your application is small or you don't know what to separate, using a single context for the whole application is a good idea. You can #define TR_CONTEXT to anything you want, but try to be helpful to the translators so they know where the string comes from. For example, you can have a different context for each window in your app. This way, a French translator who sees "Edit" in your file can tell from the context whether to look for it in the "Main window" or the "Settings window", and choose the right translation.

The last step, and the biggest one, is to enclose all the strings you want to localize in the TR() macro. For example, printf("hello"); becomes printf(TR("hello"));. You can also be more precise: to help the translator, you can replace "This program is free software" with TR_CMT("This program is free software","free as in free speech, not as in free beer"). This way our French translator knows to use "libre" and not "gratuit".

Finally, it is possible to override the context for just one string, but then you are forced to add a comment. So, if you have an Edit button (Éditer in French) and an Edit menu (Édition in French), you can do something like this:
#define TR_CONTEXT "Main window"
TR_ALL("Edit","Main window menu","This is édition in french");
TR_CMT("Edit","These french guys are strange, this one is éditer");
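To picture what contexts buy you, here is a tiny standalone sketch of a lookup keyed by string and context. It is not the real BCatalog: ToyCatalog, Add, Find and MakeFrenchToyCatalog are made-up names for illustration, the sketch ignores translator comments, and it says nothing about how the locale kit actually stores its data. It only shows the idea of the same string resolving differently per context, with the original string returned when nothing matches.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for a catalog; NOT the real BCatalog API.
class ToyCatalog {
public:
    // Register a translation for an (original string, context) pair.
    void Add(const std::string& original, const std::string& context,
             const std::string& translated)
    {
        assert(!original.empty()); // an empty key would be a caller bug
        fEntries[std::make_pair(original, context)] = translated;
    }

    // Look up a translation; return the original string when none is known.
    std::string Find(const std::string& original,
                     const std::string& context) const
    {
        auto it = fEntries.find(std::make_pair(original, context));
        return it != fEntries.end() ? it->second : original;
    }

private:
    std::map<std::pair<std::string, std::string>, std::string> fEntries;
};

// Build a small French catalog for the "Edit" example from the text.
ToyCatalog MakeFrenchToyCatalog()
{
    ToyCatalog cat;
    cat.Add("Edit", "Main window", "Éditer");        // the button
    cat.Add("Edit", "Main window menu", "Édition");  // the menu title
    return cat;
}
```

With this toy catalog, Find("Edit", "Main window") gives the button text and Find("Edit", "Main window menu") gives the menu title, while a string nobody translated comes back unchanged.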
It is also possible to translate strings by ID, which makes your program language-neutral (and unreadable):
TR_ID(1844535) // This is "Hello, world!"
I advise you not to do that. It was planned to support multiple data types instead of just strings, so you could use this ID system to get a picture or a sound. However, as everything in a catalog is loaded into RAM at startup, this would be pretty heavy for big programs. So we found another solution for you:
fopen(TR_CMT("path/to/english/file.wav","this is me shouting Hello World"), "rb");
The translator will then record a translated version of the sound and "translate" the path to path/to/french/file.wav. You can do the same for anything that is referenced by a string, so it should be possible to access pretty much anything this way.

Build system

That's all for the code, but now you have to generate the files to send to your translators. Note that as long as you don't use TR_ID, your program will work fine in the original language without any catalog file: if the locale kit can't find a translation for something, it returns the original string. There are two different cases for the build tools: either you are localizing something integrated in the Haiku build tree (a preflet, for example), or you are working on an external app.

Manual tool invocation

If it's your own app, you have to do the work yourself. There are two commands to use, both included in Haiku images. The first one is collectcatkeys. It walks through your source code, finds the strings you want localized, and dumps them to a simple text file you can send to your translators. Here is how to use it:
cpp *.cpp > preprocessed.cpp.tmp
collectcatkeys preprocessed.cpp.tmp -l english -o english.catkeys
rm preprocessed.cpp.tmp
The other tool is linkcatkeys. It compiles the catkeys files into a binary format suitable for use at application runtime (just a flattened BMessage, actually).
linkcatkeys english.catkeys -l english -s mimesignature -o english.catalog
The mimesignature should be your application's. You now just have to place the generated catalog file next to your executable, in the locale/catalogs/mimesignature/ folder. It is also possible to tell linkcatkeys to put the catalog directly into your executable file as an attribute or a resource if you want to avoid having all the catalogs around, but this is still untested. There are some other folders where you can put your catalog files: the locale kit will also look in /boot/home/config/etc and /boot/system/etc/. These are meant for users who want to override the catalogs of a specific app, and for Haiku's own system-wide catalogs, which we don't want to spread all over the system.

Haiku jam rules

If you are inside the Haiku jam system, the rules for localization are already written for you (they live in build/jam/BeOSRules if you ever want to look at them). Here is the example from the locale preflet, which is the first localized application in Haiku:
DoCatalogs Locale #App. executable name
 : x-vnd.Haiku-Locale # mime signature
 : Locale.cpp #source files
   LocaleWindow.cpp
 : english.catalog #default catalog generated from sourcecode
 :
 [ FDirName $(HAIKU_TOP) src preferences locale french.catkeys ] # list of available translations
 ;
That's all. Jam will take care of everything else.

Comments

Re: Locale kit: quick developer guide

Adrien, nice write-up, this is just what I was looking for, it'll come in handy for when I get to the point of adding translations to things.
-scottmc

Re: Locale kit: quick developer guide

Yes, pretty cool article!

Adrien, I was wondering about the contexts you mentioned. Is it still possible to have some sort of hierarchical translations, where certain words are found in a translation "higher up in the hierarchy", or will each application need its own translation, possibly with a fair number of repeated translations of the same words across various applications?

You are making really nice progress! It's fun to watch how everything falls into place!

Re: Locale kit: quick developer guide

In the current state, there is no global catalog for common strings (I can think of "Revert", "Cancel", and some other things we tend to use everywhere). However, there is support for multiple catalog files. This can be used in several ways: a user can choose to override just one or two strings from the default catalog of an app by creating a tiny catalog file with these changes in their home directory, while still getting the default translation for everything else. There is the case you mentioned, and finally, there is the case where an app is only partially translated to one language and you want to fall back to another one (not necessarily English) for the missing parts. There is also support for language chains in the current code: for example, you can have a catalog with language "english-british", which provides just a small set of strings, everything else being handled by an "english" catalog. Support for this chainloading is currently disabled, because there is no way to build the language tree safely from freeform names.
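
As a rough illustration of the chain idea, here is a standalone sketch that walks a list of string tables from the most specific language to the most general one. It is not the kit's code: StringTable, LookupInChain and MakeFrenchChain are made-up names, and the chain order is spelled out by hand because it cannot be derived safely from freeform names. The example uses a French variant rather than "english-british" so that the fallback is visible in the output.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One language's strings: original -> translated (hypothetical layout).
using StringTable = std::map<std::string, std::string>;

// Walk the chain from most specific to most general and return the first
// translation found, or the original string if no table knows it.
std::string LookupInChain(const std::vector<StringTable>& chain,
                          const std::string& original)
{
    assert(!original.empty()); // an empty key would be a caller bug
    for (const StringTable& table : chain) {
        auto it = table.find(original);
        if (it != table.end())
            return it->second;
    }
    return original;
}

// Build a two-level chain: a small variant table first, then the
// general table that handles everything else.
std::vector<StringTable> MakeFrenchChain()
{
    StringTable canadian;           // the small, specific catalog
    canadian["Email"] = "Courriel";

    StringTable french;             // the general catalog
    french["Email"] = "E-mail";
    french["Cancel"] = "Annuler";

    std::vector<StringTable> chain;
    chain.push_back(canadian);
    chain.push_back(french);
    return chain;
}
```

Here "Email" is answered by the variant table, "Cancel" falls through to the general table, and anything unknown to both comes back untranslated.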

When ICU is put into the mix, the catalogs will probably need to use ISO codes for their languages. This will allow clear detection of language variants and make it possible to use language chainloading properly.

So: it is possible to load multiple catalogs for various reasons, but this part of the code is disabled and untested. If two apps have the same MIME signature, they will be able to share catalogs (then only contexts will differentiate them), but I think that will not happen too often. I don't think we want too many levels, but maybe two levels of hierarchy could be interesting: a set of system-wide strings, and a catalog for each application.

However, this could be done in a different way. There is a GetString() method in the BLanguage class that could be used to handle these common translations. This method already handles things like the language name and some other similar things. Right now it provides day and month names, "yes", "no", and things like "Future", "Yesterday", "Tomorrow". I think these strings were meant to be useful in OpenTracker when the locale kit was started. In this case, instead of the TR macro you'd use things like B_LOCALE_UNDO_BUTTON_TEXT (defined as a little macro to call the right function). I'm not sure which of the approaches is better. Using the TR macros seems more transparent, but you would have to specify a context (let's say "System wide translation" or something similar) to separate it from the rest of the app, so maybe it's easier to use other macros for these special cases.

... And if translators find strings like "Undo" or "Revert" in the catalog texts you send them for translation, I'm sure they will point you to the right way to do it :)

I think the first step is to define the set of strings that would use this system-wide translation. The second step is to find translations for them in some languages (this will raise some cases like Edit = Éditer or Édition in French). Then, depending on the size of the set of words/sentences we end up with, choose whether it's better to store them in a hash map or in a static table in the data of each BLanguage.
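
To make the static table option a bit more concrete, here is a minimal standalone sketch. Everything in it is hypothetical: the CommonString IDs, the LanguageTable struct and GetCommonString are invented for illustration and are not part of BLanguage or the locale kit, and falling back to the first (English) table for unknown languages is just one possible choice.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical IDs for system-wide strings; not part of the locale kit.
enum CommonString { kYes, kNo, kCancel, kCommonStringCount };

// One row per language, with one string per ID.
struct LanguageTable {
    const char* language;                     // freeform language name
    const char* strings[kCommonStringCount];  // indexed by CommonString
};

static const LanguageTable kTables[] = {
    { "english", { "Yes", "No", "Cancel" } },
    { "french",  { "Oui", "Non", "Annuler" } },
};

// Return the string for `id` in `language`; use the first table
// (English) when the language is unknown.
const char* GetCommonString(const char* language, CommonString id)
{
    assert(id < kCommonStringCount); // guard the array index
    for (const LanguageTable& table : kTables) {
        if (std::strcmp(table.language, language) == 0)
            return table.strings[id];
    }
    return kTables[0].strings[id];
}
```

A table like this costs nothing to load and is trivially indexed, but every new string means touching every language row, which is where a hash map starts to look more attractive as the set grows.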

Re: Locale kit: quick developer guide

FYI:
There has been a small change with the LocaleKit since this article was written. The macro now is "B_TRANSLATE" instead of "TR".

Regards,
Humdinger