TripleO Deep Dive: Internationalisation in the UI

Yesterday, as part of the TripleO Deep Dives series, I gave a short introduction to internationalisation in the TripleO UI: the technical aspects of it, as well as a quick overview of how we work with the I18n team.

You can catch the recording on BlueJeans or YouTube, and below is a transcript.

~

Life and Journey of a String

Internationalisation was added to the UI during Ocata - just a release ago. Florian did the lion's share of the work, as can be seen on the blueprint if you're curious about the nitty-gritty details.

Addition to the codebase

Here's an example patch from during the transition. On the left you can see how things were hard-coded, and on the right you can see the new defineMessages() interface we now use. Nowadays, new patches should of course be written directly the way the right-hand side looks.

The defineMessages() dictionary requires a unique id and a default English string for every message. Optionally, you can also provide a description to clarify the meaning or head off possible confusion. The description will be shown to translators in Zanata - remember they see no other context, only the string itself.

For example, a string might read like an action, as if it belonged to a button, but actually be descriptive help text. Or some expressions are known to be confusing in English: "provide a node" has been the source of multiple discussions, both on the mailing list and live, so we might as well pre-empt questions and offer additional context to help the translators decide on an appropriate translation.
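To make this concrete, here is a rough sketch of two message dictionaries. The ids and strings are taken from the extracted Login.json and the messages.pot output shown later in this post; the dictionary keys (`username`, `addPlanName`, etc.) are my assumption. In tripleo-ui, defineMessages() is imported from react-intl; react-intl's version simply returns its argument (it acts as a marker for the extraction tooling), so a one-line stand-in keeps the sketch runnable on plain Node.js.

```javascript
// In tripleo-ui: import { defineMessages } from 'react-intl';
// react-intl's defineMessages() returns its argument unchanged; it is a
// marker for the extraction tooling. This stand-in behaves the same way.
const defineMessages = (descriptors) => descriptors;

// Login component messages: the strings are self-explanatory,
// so no description is given (it is optional).
const loginMessages = defineMessages({
  username: {
    id: 'Login.username',
    defaultMessage: 'Username'
  },
  usernameRequired: {
    id: 'Login.usernameRequired',
    defaultMessage: 'Username is required.'
  }
});

// "Add a Plan Name" reads like a button label but is actually a tooltip,
// so a description tells translators (who only see the string) what it is.
const planFormMessages = defineMessages({
  addPlanName: {
    id: 'PlanFormTabs.addPlanName',
    defaultMessage: 'Add a Plan Name',
    description: 'Tooltip for "Plan Name" form field'
  }
});
```

In a component, a descriptor would then be rendered with e.g. `<FormattedMessage {...loginMessages.username} />` or `intl.formatMessage(loginMessages.username)`.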

Extraction & conversion

Now we know how to add an internationalised string to the codebase - how do these get extracted into a file that will be uploaded to Zanata?

All of the following steps are described in the translation documentation in the tripleo-ui repository. Assuming you've already run the installation steps (basically, npm install):

$ npm run build

This does a lot more than just extracting strings - it prepares the code for being deployed in production. Once this ends you'll be able to find your newly extracted messages under the i18n directory:

$ ls i18n/extracted-messages/src/js/components

You can see the directory structure mirrors that of the source code. And if you peek into one of the files, you'll note the content is basically the same as what we had in our defineMessages() dictionary:

$ cat i18n/extracted-messages/src/js/components/Login.json
[
  {
    "id": "UserAuthenticator.authenticating",
    "defaultMessage": "Authenticating..."
  },
  {
    "id": "Login.username",
    "defaultMessage": "Username"
  },
  {
    "id": "Login.usernameRequired",
    "defaultMessage": "Username is required."
  },
[...]

However, JSON is not a format that Zanata understands by default. I think the latest version we upgraded to, or perhaps the next one, might have some support for it, but since there's no i18n JSON standard that support is somewhat limited. In open-source software projects, po/pot files are generally the standard to go with.

$ npm run json2pot

> tripleo-ui@7.1.0 json2pot /home/jpichon/devel/tripleo-ui
> rip json2pot ./i18n/extracted-messages/**/*.json -o ./i18n/messages.pot

> [react-intl-po] write file -> ./i18n/messages.pot ✔️

$ cat i18n/messages.pot
msgid ""
msgstr ""
"POT-Creation-Date: 2017-07-07T09:14:10.098Z\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"MIME-Version: 1.0\n"
"X-Generator: react-intl-po\n"


#: ./i18n/extracted-messages/src/js/components/nodes/RegisterNodesDialog.json
#. [RegisterNodesDialog.noNodesToRegister] - undefined
msgid ""No Nodes To Register""
msgstr ""

#: ./i18n/extracted-messages/src/js/components/nodes/NodesToolbar/NodesToolbar.json
#. [Toolbar.activeFilters] - undefined
#: ./i18n/extracted-messages/src/js/components/validations/ValidationsToolbar.json
#. [Toolbar.activeFilters] - undefined
msgid "Active Filters:"
msgstr ""

#: ./i18n/extracted-messages/src/js/components/nodes/RegisterNodesDialog.json
#. [RegisterNodesDialog.addNew] - Small button, to add a new Node
msgid "Add New"
msgstr ""

#: ./i18n/extracted-messages/src/js/components/plan/PlanFormTabs.json
#. [PlanFormTabs.addPlanName] - Tooltip for "Plan Name" form field
msgid "Add a Plan Name"
msgstr ""
[...]

This messages.pot file is what will be automatically uploaded to Zanata.
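The actual conversion is done by react-intl-po's `rip json2pot`. To make the output above less magical, here is a rough, hypothetical re-implementation of the core idea, working on in-memory data rather than files. It is not react-intl-po's code: it skips the header, and its escaping follows the usual PO rules, which is my assumption about what a well-behaved converter should do. Messages sharing the same English string collapse into one msgid (like "Active Filters:" above), with a `#:` reference and a `#.` comment per occurrence, and entries come out sorted by msgid.

```javascript
// Rough sketch of a JSON-to-POT conversion; NOT react-intl-po's actual code.
// Input: { [extractedFilePath]: [{ id, defaultMessage, description? }, ...] }

// Escape a string for use inside a PO double-quoted literal.
function escapePo(str) {
  return str
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
}

function toPot(filesToMessages) {
  // Group occurrences by English source string: the msgid is the string
  // itself, so identical strings from different files share one entry.
  const byMsgid = new Map();
  for (const [file, descriptors] of Object.entries(filesToMessages)) {
    for (const { id, defaultMessage, description } of descriptors) {
      if (!byMsgid.has(defaultMessage)) byMsgid.set(defaultMessage, []);
      byMsgid.get(defaultMessage).push({ file, id, description });
    }
  }

  const entries = [...byMsgid.keys()].sort().map((msgid) => {
    const comments = byMsgid.get(msgid).flatMap(({ file, id, description }) => [
      `#: ${file}`,
      // A missing description interpolates as "undefined", as in the output above.
      `#. [${id}] - ${description}`
    ]);
    return [...comments, `msgid "${escapePo(msgid)}"`, 'msgstr ""'].join('\n');
  });
  return entries.join('\n\n') + '\n';
}
```

Note that if a literal newline were written into a msgid without escaping, the line would end without its closing quote, which is the kind of "missing end-quote" failure described in the debugging section below.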

Infra: from the git repo, to Zanata

The following steps are done by the infrastructure scripts. There's infra documentation on how to enable translations for your project; in our case, as the first internationalised JavaScript project, we also had to update the scripts a little. This is useful to know if an issue happens with the infra jobs: debugging will probably bring you here.

The scripts live in the project-config infra repo and there are three files of interest for us: common_translation_update.sh, upstream_translation_update.sh and propose_translation_update.sh.

For the upload, upstream_translation_update.sh is the file of interest: it simply sets up the project on line 76, then sends the pot file up to Zanata on line 115.

What does "setting up the project" entail? It's a function in common_translation_update.sh that pretty much runs the steps we talked about in the previous section, and also creates a config file to talk to Zanata.

Monitoring the post jobs

Post jobs run after a patch has already merged - usually to upload tarballs where they should be, update the documentation pages, etc., and also to upload message catalogues onto Zanata. Being a 'post' job, however, means that if something goes wrong there is no notification on the original review, so it's easy to miss.

Here's the OpenStack Health page to monitor 'post' jobs related to tripleo-ui. Scroll to the bottom - hopefully tripleo-ui-upstream-translation-update is still green! It's good to keep an eye on it although it's easy to forget. Thankfully, AJaeger from #openstack-infra has been great at filing bugs and letting us know when something does go wrong.

Debugging when things go wrong: an example

We had a couple of issues where a line break got introduced into one of the strings, which works fine in JSON but breaks our pot file. Here is the content from the bug (the full logs are no longer accessible):

2017-03-16 12:55:13.468428 | + zanata-cli -B -e push --copy-trans False
[...]
2017-03-16 12:55:15.391220 | [INFO] Found source documents:
2017-03-16 12:55:15.391405 | [INFO]            i18n/messages
2017-03-16 12:55:15.531164 | [ERROR] Operation failed: missing end-quote

You'll notice the first line is the last command run by the upstream_translation_update.sh script; when debugging, that tells you which steps to follow to reproduce the problem. The upstream Zanata instance also lets you create toy projects if you want to test uploads yourself (this can't be done directly on the OpenStack Zanata instance).

This particular newline issue has popped up a couple of times already. We're treating it with band-aids at the moment; ideally we'd get a proper test on the gate to prevent it from happening again, which is why this bug is still open. I'm not very familiar with JavaScript testing and haven't had a chance to look into it yet; if you'd like to give it a shot, that'd be a useful contribution :)
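As a possible starting point, here is a minimal sketch of such a check: walk the extracted messages and report any defaultMessage containing a line break. It assumes the i18n/extracted-messages layout shown earlier (files holding arrays of `{ id, defaultMessage }` objects, run after `npm run build`); the function names are hypothetical, and how to hook it into `npm test` or the gate is left open.

```javascript
const fs = require('fs');
const path = require('path');

// Recursively collect every .json file under dir
// (the extracted messages mirror the source tree).
function listJsonFiles(dir) {
  return fs.readdirSync(dir, { withFileTypes: true }).flatMap((entry) => {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) return listJsonFiles(full);
    return entry.name.endsWith('.json') ? [full] : [];
  });
}

// Return one problem description per message whose English string
// contains a line break (\n or \r), which would break the pot file.
function findMultilineMessages(dir) {
  const problems = [];
  for (const file of listJsonFiles(dir)) {
    const descriptors = JSON.parse(fs.readFileSync(file, 'utf8'));
    for (const { id, defaultMessage } of descriptors) {
      if (/[\r\n]/.test(defaultMessage)) {
        problems.push(`${file}: ${id} contains a line break`);
      }
    }
  }
  return problems;
}
```

A gate job could then fail whenever `findMultilineMessages('i18n/extracted-messages')` returns a non-empty list, catching the problem before it reaches the post job.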

Zanata, and contributing translations

The OpenStack Zanata instance lives at https://translate.openstack.org. This is where the translators do their work. Here's the page for tripleo-ui: you can see there is one project per branch (stable/ocata and master, for now). Sort by "Percent Translated" to see the languages currently translated. Here's an example of the translator's view, for Spanish: the English string is on the left, and the translator fills in the right side. No context! Just strings.

At this stage of the release cycle the focus would be on 'master', although it is still early for translations: there is still a lot of churn.

If you'd like to contribute translations, the I18n team has good documentation on how to go about it. The short version: sign up on Zanata, request to join your language team, and once you're approved, you're good to go!

Return of the string

Now that we have our strings available in multiple languages, it's time for another infra job to kick in and bring them into our repository. This is where propose_translation_update.sh comes in. We pull the po files from Zanata, convert them to JSON, then do a git commit that will be proposed to Gerrit.

There's also a cleanup step, which does more than it might seem. It checks whether files are translated above a certain ratio (~75% for code), which avoids adding new languages when only one or two words are translated (e.g. someone just testing Zanata to see how it works). Switching to your language and yet having the vast majority of the UI still appear in English is not a great user experience.

In theory, files that were added but are now below 40% should get automatically removed; however, this doesn't quite work for JavaScript at the moment - another opportunity to help! Manual cleanups can be done in the meantime, but it's a rare event, so not a major issue.
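The add/remove rule can be sketched as follows. The ~75% and 40% thresholds come from the description above; everything else is an assumption for illustration: the function names, the input shape (a map of message id to translated string, with an empty string when untranslated, covering every source message) and the exact comparison operators. The real logic lives in the infra shell scripts, not in JavaScript.

```javascript
// Thresholds from the description above (approximate, for code projects).
const ADD_THRESHOLD = 0.75;    // minimum ratio for a new language to be added
const REMOVE_THRESHOLD = 0.40; // below this, an existing language should be removed

// Fraction of messages with a non-empty translation.
function translatedRatio(translations) {
  const values = Object.values(translations);
  if (values.length === 0) return 0;
  const done = values.filter((s) => typeof s === 'string' && s.trim() !== '').length;
  return done / values.length;
}

// Decide what to do with one language, depending on whether it
// is already present in the repository.
function decide(translations, alreadyInRepo) {
  const ratio = translatedRatio(translations);
  if (!alreadyInRepo) return ratio >= ADD_THRESHOLD ? 'add' : 'skip';
  return ratio < REMOVE_THRESHOLD ? 'remove' : 'update';
}
```

The gap between the two thresholds is deliberate in this sketch: a language that was good enough to add isn't dropped the moment a few new untranslated strings push it slightly below 75%.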

Monitoring the periodic jobs

Zanata is checked once a day, every morning, and there is an OpenStack Health page for this as well. You can see there are two jobs at the moment (hopefully green!), one per branch: tripleo-ui-propose-translation-update and tripleo-ui-propose-translation-update-ocata. The job should run every day even if there are no updates; that simply means there might not be a review proposed at the end.

We haven't had issues with the periodic job so far, though the debugging process would be the same: figure out based on the failure if it is happening at the infra script stage or in one of our commands (e.g. npm run po2json), try to reproduce and fix. I'm sure super-helpful AJaeger would also let us know if he were to notice an issue here.

Automated patches

You may have seen the automated translation updates pop up on Gerrit. The commit message has some tips on how to review these: basically, don't agonise over the translation contents, as problems there should be handled in Zanata anyway; just make sure the format looks good and is unlikely to break the code. A JSON validation tool runs during the infra prep step in order to "prettify" the JSON blob and limit the size of the diffs, so once the patch makes it out to Gerrit we know the JSON is at least well-formed.
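The reason prettifying doubles as validation is that the blob has to be parsed before it can be re-serialised: malformed JSON fails at the parse step, and anything that parses comes out in one consistent layout. A minimal illustration of the idea (not the actual infra tool; the 2-space indent and trailing newline are assumptions):

```javascript
// Parse to validate, then re-serialise with a stable layout so that
// successive automated patches only differ where translations changed.
function prettifyJson(text) {
  const data = JSON.parse(text); // throws SyntaxError on malformed JSON
  return JSON.stringify(data, null, 2) + '\n';
}
```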

Try to review these patches quickly, out of respect for the translators' work. It's not very nice to spend a lot of time translating a project and then not have your work included because no one could be bothered to merge it :)

A note about new languages...

If the automated patch adds a new language, there'll be an additional step required after merging the translations in order to enable it: adding a string with the language name to a constants file. Until recently, this took 3 or 4 steps - thanks to Honza for making it much simpler!
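As a rough illustration of that step, here is a hypothetical constants map from locale code to the name shown to users, plus a small check that flags locales which have translations but no display name yet. The constant name, its entries and the helper are made up for this sketch and not copied from tripleo-ui.

```javascript
// Hypothetical shape of the constants file: locale code -> display name.
const LANGUAGE_NAMES = {
  en: 'English',
  es: 'Español',
  ja: '日本語'
};

// Return the locale codes that have translation files but no display
// name yet, i.e. languages merged without the follow-up step.
function missingLanguageNames(localeCodes, names = LANGUAGE_NAMES) {
  return localeCodes.filter((code) => !(code in names));
}
```

For example, if a merged patch brought in German translations, `missingLanguageNames(['en', 'es', 'de'])` would flag `de` as still needing an entry.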

This concludes the technical journey of a string. If you'd like to help with i18n tasks, we have a few related bugs open. They range from very simple low-hanging fruit you could use to make your first contribution to the UI, to weird buttons that have translations available yet still show up in English in certain modals, to the kind of CI resiliency tasks I linked to earlier. Something for everyone! ;)

Working with the I18n team

It's really all about communication. Starting with...

Release schedule and string freezes

String freezes are noted on the main schedule, but they follow the regular cycle-with-milestones timeline. This is a problem for a cycle-trailing project like tripleo-ui, as we could be implementing features up to 2 weeks after the other projects, so we can't freeze strings that early.

There were discussions at the Atlanta PTG around whether the I18n team should care at all about projects that don't respect the freeze deadlines. That would have made it impossible for projects like ours to ever make it onto the I18n team's official radar. The compromise was that cycle-trailing projects should have an I18n cross-project liaison who communicates with the I18n PTL and team to inform them of deadlines, and that these projects ignore Soft Freeze and only do a Hard Freeze.

This will all be documented under an i18n governance tag; in the meantime, the notes from the sessions are available for the curious!

What's a String Freeze again?

Both are defined on the schedule: soft freeze means no changes to existing strings, as changes invalidate the translators' work and force them to retranslate; hard freeze means no additions, changes or anything else, in order to give translators a chance to catch up.

When we looked at Zanata earlier, there were translation percentages beside each language: the goal is always the satisfaction of reaching 100%. If we keep adding new strings then the goalpost keeps moving, which is discouraging and unfair.
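Because every message carries a stable id, the two freezes can be expressed as a comparison between the messages extracted at the freeze point and the current ones. A sketch, assuming both snapshots are available as id → defaultMessage maps (e.g. built from two runs of the extraction shown earlier); the function names are hypothetical and this is an illustration of the definitions, not an existing tripleo-ui or I18n tool.

```javascript
// Compare two snapshots of extracted messages (id -> English string).
function diffMessages(before, after) {
  const added = [];
  const changed = [];
  const removed = [];
  for (const [id, text] of Object.entries(after)) {
    if (!(id in before)) added.push(id);
    else if (before[id] !== text) changed.push(id);
  }
  for (const id of Object.keys(before)) {
    if (!(id in after)) removed.push(id);
  }
  return { added, changed, removed };
}

// Soft freeze: existing strings must not change.
// Hard freeze: no changes, additions or removals at all.
function freezeViolations(before, after, freeze) {
  const { added, changed, removed } = diffMessages(before, after);
  return freeze === 'hard' ? [...changed, ...added, ...removed] : changed;
}
```

One caveat of keying on ids: rewording a string under a brand-new id shows up as a removal plus an addition, which the soft-freeze check above would not flag.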

Of course, there's also an "exception process" for when it's needed: send an email to the openstack-i18n mailing list to ask for permission to merge a string change, with an explanation, or at least to give a heads-up. Not to be abused :)

Role of the I18n liaison

...Liaise?! Haha. The role is defined briefly on the Cross-Projects Liaison wiki page. It's much more important toward the end of the cycle, when the codebase starts to stabilise, there are fewer changes and translators start working on the strings to be included in the release.

In general it's good to hang out on the #openstack-i18n IRC channel (very low traffic), attend the weekly meeting (it alternates times), be available to answer questions, and keep the PTL informed of the I18n status of the project. In the case of cycle-trailing projects (quite a new release model still), it's also important to be around to explain the deadlines.

A couple of examples of what an active liaison helps with:

  • Toward the end or after the release, once translations into the stable branch have settled, the stable translations get copied into the master branch on Zanata. The strings should still be fairly similar at that point and it avoids translators having to re-do the work. It's a manual process, so you need to let the I18n PTL know when there are no longer changes to stable/*.
  • Last cycle, because the cycle-trailing status of tripleo-ui was not correctly documented, a Zanata upgrade was planned right after the main release - which for us ended up being right when the codebase had stabilised enough and several translators had planned to be most active. This would have been avoided with better, earlier communication :)

Post-release

After the Ocata release, I sent a few screenshots of tripleo-ui to the i18n list so translators could see the result of their work. I don't know if anybody cared :-) But unlike Horizon, which has an informal test system available for translators to check their strings during the RC period, we had nothing similar, so most of the people who volunteered translations had no idea what the UI looked like. It'd be cool if we could offer a test system with regular string updates next release - maybe just an undercloud on the new RDO cloud? Deployment success/failure strings wouldn't be verifiable but the rest would, and the system would be easier to maintain than a full dev TripleO environment: better than nothing. Perhaps an idea for the Queens cycle!

The I18n team has a priority board on the Zanata main page (only visible when logged in, I think). I'm grateful to see TripleO UI in there! :) Realistically we'll never move past Low or perhaps Medium priority, which is fair, as TripleO doesn't have the same kind of reach or visibility as Horizon or the installation guides. I'm happy that we're included! The OpenStack I18n team is probably the most volunteer-driven team in OpenStack. Let's be kind, respect string freezes and translators' time! \o/

</braindump>
