TripleO Deep Dive: Internationalisation in the UI

Yesterday, as part of the TripleO Deep Dives series I gave a short introduction to internationalisation in TripleO UI: the technical aspects of it, as well as a quick overview of how we work with the I18n team.

You can catch the recording on BlueJeans or YouTube, and below's a transcript.


Life and Journey of a String

Internationalisation was added to the UI during Ocata - just a release ago. Florian implemented most of it and did the lion's share of the work, as can be seen on the blueprint if you're curious about the nitty-gritty details.

Addition to the codebase

Here's an example patch from during the transition. On the left you can see how things were hard-coded, and on the right you can see the new defineMessages() interface we now use. Obviously new patches should directly look like on the right hand-side nowadays.

The defineMessages() dictionary requires a unique id and default English string for every message. Optionally, you can also provide a description if you think there could be confusion or to clarify the meaning. The description will be shown in Zanata to the translators - remember they see no other context, only the string itself.

For example, a string might sound active like if it were related to an action/button but actually be a descriptive help string. Or some expressions are known to be confusing in English - "provide a node" has been the source of multiple discussions on list and live so might as well pre-empt questions and offer additional context to help the translators decide on an appropriate translation.

Extraction & conversion

Now we know how to add an internationalised string to the codebase - how do these get extracted into a file that will be uploaded to Zanata?

All of the following steps are described in the translation documentation in the tripleo-ui repository. Assuming you've already run the installation steps (basically, npm install):

$ npm run build

This does a lot more than just extracting strings - it prepares the code for being deployed in production. Once this ends you'll be able to find your newly extracted messages under the i18n directory:

$ ls i18n/extracted-messages/src/js/components

You can see the directory structure is kept the same as the source code. And if you peek into one of the files, you'll note the content is basically the same as what we had in our defineMessages() dictionary:

$ cat i18n/extracted-messages/src/js/components/Login.json 
    "id": "UserAuthenticator.authenticating",
    "defaultMessage": "Authenticating..."
    "id": "Login.username",
    "defaultMessage": "Username"
    "id": "Login.usernameRequired",
    "defaultMessage": "Username is required."

However, JSON is not a format that Zanata understands by default. I think the latest version we upgraded to, or the next one might have some support for it, but since there's no i18n JSON standard it's somewhat limited. In open-source software projects, po/pot files are generally the standard to go with.

$ npm run json2pot

> tripleo-ui@7.1.0 json2pot /home/jpichon/devel/tripleo-ui
> rip json2pot ./i18n/extracted-messages/**/*.json -o ./i18n/messages.pot

> [react-intl-po] write file -> ./i18n/messages.pot ✔️

$ cat i18n/messages.pot 
msgid ""
msgstr ""
"POT-Creation-Date: 2017-07-07T09:14:10.098Z\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"MIME-Version: 1.0\n"
"X-Generator: react-intl-po\n"

#: ./i18n/extracted-messages/src/js/components/nodes/RegisterNodesDialog.json
#. [RegisterNodesDialog.noNodesToRegister] - undefined
msgid ""No Nodes To Register""
msgstr ""

#: ./i18n/extracted-messages/src/js/components/nodes/NodesToolbar/NodesToolbar.json
#. [Toolbar.activeFilters] - undefined
#: ./i18n/extracted-messages/src/js/components/validations/ValidationsToolbar.json
#. [Toolbar.activeFilters] - undefined
msgid "Active Filters:"
msgstr ""

#: ./i18n/extracted-messages/src/js/components/nodes/RegisterNodesDialog.json
#. [RegisterNodesDialog.addNew] - Small button, to add a new Node
msgid "Add New"
msgstr ""

#: ./i18n/extracted-messages/src/js/components/plan/PlanFormTabs.json
#. [PlanFormTabs.addPlanName] - Tooltip for "Plan Name" form field
msgid "Add a Plan Name"
msgstr ""

This messages.pot file is what will be automatically uploaded to Zanata.

Infra: from the git repo, to Zanata

The following steps are done by the infrastructure scripts. There's infra documentation on how to enable translations for your project, in our case as the first internationalised JavaScript project we had to update the scripts a little as well. This is useful to know if an issue happens with the infra jobs; debugging will probably bring you here.

The scripts live in the project-config infra repo and there are three files of interest for us:

In this case, is the file of interest to us: it simply sets up the project on line 76, then sends the pot file up to Zanata on line 115.

What does "setting up the project" entails? It's a function in, that pretty much runs the steps we talked about in the previous section, and also creates a config file to talk to Zanata.

Monitoring the post jobs

Post jobs run after a patch has already merged - usually to upload tarballs where they should be, update the documentation pages, etc, and also upload messages catalogues onto Zanata. Being a 'post' job however means that if something goes wrong, there is no notification on the original review so it's easy to miss.

Here's the OpenStack Health page to monitor 'post' jobs related to tripleo-ui. Scroll to the bottom - hopefully tripleo-ui-upstream-translation-update is still green! It's good to keep an eye on it although it's easy to forget. Thankfully, AJaeger from #openstack-infra has been great at filing bugs and letting us know when something does go wrong.

Debugging when things go wrong: an example

We had a couple of issues whereby a linebreak gets introduced into one of the strings, which works fine in JSON but breaks our pot file. If you look at the content from the bug (the full logs are no longer accessible):

2017-03-16 12:55:13.468428 | + zanata-cli -B -e push --copy-trans False
2017-03-16 12:55:15.391220 | [INFO] Found source documents:
2017-03-16 12:55:15.391405 | [INFO]            i18n/messages
2017-03-16 12:55:15.531164 | [ERROR] Operation failed: missing end-quote

You'll notice the first line is the last function we call in the script; for debugging that gives you an idea of the steps to follow to reproduce. The upstream Zanata instance also lets you create toy projects, if you want to test uploads yourself (this can't be done directly on the OpenStack Zanata instance.)

This particular newline issue has popped up a couple of times already. We're treating it with band-aids at the moment, ideally we'd get a proper test on the gate to prevent it from happening again: this is why this bug is still open. I'm not very familiar with JavaScript testing and haven't had a chance to look into it yet; if you'd like to give it a shot that'd be a useful contribution :)

Zanata, and contributing translations

The OpenStack Zanata instance lives at This is where the translators do their work. Here's the page for tripleo-ui, you can see there is one project per branch (stable/ocata and master, for now). Sort by "Percent Translated" to see the languages currently translated. Here's an example of the translator's view, for Spanish: you can see the English string on the left, and the translator fills in the right side. No context! Just strings.

At this stage of the release cycle, the focus would be on 'master,' although it is still early to do translations; there is a lot of churn still.

If you'd like to contribute translations, the I18n team has good documentation about how to go about how to do it. The short version: sign up on Zanata, request to join your language team, once you're approved - you're good to go!

Return of the string

Now that we have our strings available in multiple languages, it's time for another infra job to kick in and bring them into our repository. This is where comes in. We pull the po files from Zanata, convert them to JSON, then do a git commit that will be proposed to Gerrit.

The cleanup step does more than it might seem. It checks if files are translated over a certain ratio (~75% for code), which avoids adding new languages when there might only be one or two words translated (e.g. someone just testing Zanata to see how it works). Switching to your language and yet having the vast majority of the UI still appear in English is not a great user experience.

In theory, files that were added but are now below 40% should get automatically removed, however this doesn't quite work for JavaScript at the moment - another opportunity to help! Manual cleanups can be done in the meantime, but it's a rare event so not a major issue.

Monitoring the periodic jobs

Zanata is checked once a day every morning, there is an OpenStack Health page for this as well. You can see there are two jobs at the moment (hopefully green!), one per branch: tripleo-ui-propose-translation-update and tripleo-ui-propose-translation-update-ocata. The job should run every day even if there are no updates - it simply means there might not be a git review proposed at the end.

We haven't had issues with the periodic job so far, though the debugging process would be the same: figure out based on the failure if it is happening at the infra script stage or in one of our commands (e.g. npm run po2json), try to reproduce and fix. I'm sure super-helpful AJaeger would also let us know if he were to notice an issue here.

Automated patches

You may have seen the automated translations updates pop up on Gerrit. The commit message has some tips on how to review these: basically don't agonise over the translation contents as problems there should be handled in Zanata anyway, just make sure the format looks good and is unlikely to break the code. A JSON validation tool runs during the infra prep step in order to "prettify" the JSON blob and limit the size of the diffs, therefore once the patch  makes it out to Gerrit we know the JSON is well-formed at least.

Try to review these patches quickly to respect the translators' work. Not very nice to spend a lot of time on translating a project and yet not have your work included because no one was bothered to merge it :)

A note about new languages...

If the automated patch adds a new language, there'll be an additional step required after merging the translations in order to enable it: adding a string with the language name to a constants file. Until recently, this took 3 or 4 steps - thanks to Honza for making it much simpler!

This concludes the technical journey of a string. If you'd like to help with i18n tasks, we have a few related bugs open. They go from very simple low-hanging-fruits you could use to make your first contribution to the UI, to weird buttons that have translations available yet show in English but only in certain modals, to the kind of CI resiliency tasks I linked to earlier. Something for everyone! ;)

Working with the I18n team

It's really all about communication. Starting with...

Release schedule and string freezes

String freezes are noted on the main schedule but tend to fit the regular cycle-with-milestones work. This is a problem for a cycle-trailing project like tripleo-ui as we could be implementing features up to 2 weeks after the other projects, so we can't freeze strings that early.

There were discussions at the Atlanta PTG around whether the I18n should care at all about projects that don't respect the freeze deadlines. That would have made it impossible for projects like ours to ever make it onto the I18n official radar. The compromise was that cycle-trailing project should have a I18n cross-project liaison that communicates with the I18n PTL and team to inform them of deadlines, and also to ignore Soft Freeze and only do a Hard Freeze.

This will all be documented under an i18n governance tag; while waiting for it the notes from the sessions are available for the curious!

What's a String Freeze again?

The two are defined on the schedule: soft freeze means not allowing changes to strings, as it invalidates the translator's work and forces them to retranslate; hard freeze means no additions, changes or anything else in order to give translators a chance to catch up.

When we looked at Zanata earlier, there were translation percentages beside each language: the goal is always the satisfaction of reaching 100%. If we keep adding new strings then the goalpost keeps moving, which is discouraging and unfair.

Of course there's also an "exception process" when needed, to ask for permission to merge a string change with an explanation or at least a heads-up, by sending an email to the openstack-i18n mailing list. Not to be abused :)

Role of the I18n liaison

...Liaise?! Haha. The role is defined briefly on the Cross-Projects Liaison wiki page. It's much more important toward the end of the cycle, when the codebase starts to stabilise, there are fewer changes and translators look at starting their work to be included in the release.

In general it's good to hang out on the #openstack-i18n IRC channel (very low traffic), attend the weekly meeting (it alternates times), be available to answer questions, and keep the PTL informed of the I18n status of the project. In the case of cycle-trailing projects (quite a new release model still), it's also important to be around to explain the deadlines.

A couple of examples having an active liaison helps with:

  • Toward the end or after the release, once translations into the stable branch have settled, the stable translations get copied into the master branch on Zanata. The strings should still be fairly similar at that point and it avoids translators having to re-do the work. It's a manual process, so you need to let the I18n PTL know when there are no longer changes to stable/*.
  • Last cycle, because the cycle-trailing status of tripleo-ui was not correctly documented, a Zanata upgrade was planned right after the main release - which for us ended up being right when the codebase had stabilised enough and several translators had planned to be most active. Would have been solved with better, earlier communication :)


After the Ocata release, I sent a few screenshots of tripleo-ui to the i18n list so translators could see the result of their work. I don't know if anybody cared :-) But unlike Horizon, which has an informal test system available for translators to check their strings during the RC period, most of the people who volunteered translations had no idea what the UI looked like. It'd be cool if we could offer a test system with regular string updates next release - maybe just an undercloud on the new RDO cloud? Deployment success/failures strings wouldn't be verifiable but the rest would, while the system would be easier to maintain than a full dev TripleO environment - better than nothing. Perhaps an idea for the Queens cycle!

The I18n team has a priority board on the Zanata main page (only visible when logged in I think). I'm grateful to see TripleO UI in there! :) Realistically we'll never move past Low or perhaps Medium priority which is fair, as TripleO doesn't have the same kind of reach or visibility that Horizon or the installation guides do. I'm happy that we're included! The OpenStack I18n team is probably the most volunteer-driven team in OpenStack. Let's be kind, respect string freezes and translators' time! \o/


Leave a comment

OpenStack Pike PTG: TripleO, TripleO UI Some highlights

For the second part of the PTG (vertical projects), I mainly stayed in the TripleO room, moving around a couple of times to attend cross-project sessions related to i18n.

Although I always wish I understood more/everything, in the end my areas of interest (and current understanding!) in TripleO are around the UI, installing and configuring it, the TripleO CLI, and the tripleo-common Mistral workflows. Therefore the couple of thoughts in this post are mainly relevant to these - if you're looking for a more exhaustive summary of the TripleO discussions and decisions made during the PTG, I recommend reading the PTL's excellent thread about this on the dev list, and the associated etherpads.

Random points of interest

  • Containers is the big topic and had multiple sessions dedicated to it, both single and cross-projects. Many other sessions ended up revisiting the subject as well, sometimes with "oh that'll be solved with containers" and sometimes with "hm good but that won't work with containers."
  • A couple of API-breaking changes may need to happen in Tripleo Heat Templates (e.g. for NFV, passing a role mapping vs a role name around). The recommendation is to get this in as early as possible (by the first milestone) and communicate it well for out of tree services.
  • When needing to test something new on the CI, look at the existing scenarios and prioritise adding/changing something there to test for what you need, as opposed to trying to create a brand new job.
  • Running Mistral workflows as part of or after the deployment came up several times and was even a topic during a cross-project Heat / Mistral / TripleO sessions. Things can get messy, switching between Heat, Mistral and Puppet. Where should these workflows live (THT, tripleo-common)? Service-specific workflows (pre/post-deploy) are definitely something people want and there's a need to standardise how to do that. Ceph's likely to be the first to try their hand at this.
  • One lively cross-project session with OpenStack Ansible and Kolla was about parameters in configuration files. Currently whenever a new feature is added to Nova or whatever service, Puppet and so on need to be updated manually. The proposal is to make a small change to oslo.config to enable it to give an output in machine-readable YAML which can then be consumed (currently the config generated is only human readable). This will help with validations, and it may help to only have to maintain a structure as opposed to a template.
  • Heat folks had a feedback session with us about the TripleO needs. They've been super helpful with e.g. helping to improve our memory usage over the last couple of cycles. My takeaway from this session was "beware/avoid using YAQL, especially in nested stacks." YAQL is badly documented and everyone ends up reading the source code and tests to figure out how to things. Bringing Jinja2 into Heat or some kind of way to have repeated patterns from resources (e.g. based on a file) also came up and was cautiously acknowledged.
  • Predictable IP assignment on the control plane is a big enough issue that some people are suggesting dropping Neutron in the undercloud over it. We'd lose so many other benefits though, that it seems unlikely to happen.
  • Cool work incoming allowing built-in network examples to Just Work, based on a sample configuration. Getting the networking stuff right is a huge pain point and I'm excited to hear this should be achievable within Pike.

Python 3

Python 3 is an OpenStack community goal for Pike.

Tripleo-common and python-tripleoclient both have voting unit tests jobs for Python 3.5, though I trust them only moderately for a number of reasons. For example many of the tests tend to focus on the happy path and I've seen and fixed Python 3 incompatible code in exceptions several times (no 'message' attribute seems easy to get caught into), despite the unit testing jobs being all green. Apparently there are coverage jobs we could enable for the client, to ensure the coverage ratio doesn't drop.

Python 3 for functional tests was also brought up. We don't have functional tests in any of our projects and it's not clear the value we would get out of it (mocking servers) compared to the unit testing and all the integration testing we already do. Increasing unit test coverage was deemed a more valuable goal to pursue for now.

There are other issues around functional/integration testing with Python 3 which will need to be resolved (though likely not during Pike). For example our integration jobs run on CentOS and use packages, which won't be Python 3 compatible yet (cue SCL and the need to respin dependencies). If we do add functional tests, perhaps it would be easier to have them run on a Fedora gate (although if I recall correctly gating on Fedora was investigated once upon a time at the OpenStack level, but caused too many issues due to churn and the lack of long-term releases).

Another issue with Python 3 support and functional testing is that the TripleO client depends on Mistral server (due to the Series Of Unfortunate Dependencies I also mentioned in the last post). That means Mistral itself would need to fully support Python 3 as well.

Python 2 isn't going anywhere just yet so we still have time to figure things out. The conclusions, as described in Emilien's email seem to be:

  • Improve the unit test coverage
  • Enable the coverage job in CI
  • Investigate functional testing for python-tripleoclient to start with, see if it makes sense and is feasible

Sample environment generator

Currently environment files in THT are written by hand and quite inconsistent. This is also important for the UI, which needs to display this information. For example currently the environment general description is in a comment at the top of the file (if it exists at all), which can't be accessed programmatically. Dependencies between environment files are not described either.

To make up for this, currently all that information lives in the capabilities map but it's external to the template themselves, needs to be updated manually and gets out of sync easily.

The sample environment generator to fix this has been out there for a year, and currently has two blockers. First, it needs a way to determine which parameters are private (that is, parameters that are expected to be passed in by another template and shouldn't be set by the user).

One way could be to use a naming convention, perhaps an underscore prefix similar to Python. Parameter groups cannot be used because of a historical limitation, there can only be one group (so you couldn't be both Private and Deprecated). Changing Heat with a new feature like Nested Groups or generic Parameter Tags could be an option. The advantage of the naming convention is that it doesn't require any change to Heat.

From the UI perspective, validating if an environment or template is redefining parameters already defined elsewhere also matters. Because if it has the same name, then it needs to be set up with the same value everywhere or it's uncertain what the final value will end up as.

I think the second issue was that the general environment description can only be a comment at the moment, there is no Heat parameter to do this. The Heat experts in the room seemed confident this is non-controversial as a feature and should be easy to get in.

Once the existing templates are updated to match the new format, the validation should be added to CI to make sure that any new patch with environments does include these parameters. Having "description" show up as an empty string when generating a new environment will make it more obvious that something can/should be there, while it's easy to forget about it with the current situation.

The agreement was:

  • Use underscores as a naming convention to start with
  • Start with a comment for the general description

Once we get the new Heat description attribute we can move things around. If parameter tags get accepted, likewise we can automate moving things around. Tags would also be useful to the UI, to determine what subset of relevant parameters to display to the user in smaller forms (easier to understand that one form with several dozens of fields showing up all at once). Tags, rather than parameter groups are required because of the aforementioned issue: it's already used for deprecation and a parameter can only have one group.

Trusts and federation

This was a cross-project session together with Heat, Keystone and Mistral. A "Trust" lets you delegate or impersonate a user with a subset of their rights. From my experience in TripleO, this is particularly useful with long running Heat stacks as a authentication token expires after a few hours which means you lose the ability to do anything in the middle of an operation.

Trusts have been working very well for Heat since 2013. Before that they had to encrypt the user password in order to ask for a new token when needed, which all agreed was pretty horrible and not anything people want to go back to. Unfortunately with the federation model and using external Identity Providers, this is no longer possible. Trusts break, but some kind of delegation is still needed for Heat.

There were a lot of tangents about security issues (obviously!), revocation, expiration, role syncing. From what I understand Keystone currently validates Trusts to make sure the user still has the requested permissions (that the relevant role hasn't been removed in the meantime). There's a desire to have access to the entire role list, because the APIs currently don't let us check which role is necessary to perform a specific action. Additionally, when Mistral workflows launched from Heat get in, Mistral will create its own Trusts and Heat can't know what that will do. In the end you always kinda end up needing to do everything. Karbor is running into this as well.

No solution was discovered during the session, but I think all sides were happy enough that the problem and use cases have been clearly laid out and are now understood.

TripleO UI

Some of the issues relevant to the UI were brought up in other sessions, like standardising the environment files. Other issues brought up were around plan management, for example why do we use the Mistral environment in addition to Swift? Historically it looks like it's because it was a nice drop-in replacement for the defunct TripleO API and offered a similar API. Although it won't have an API by default, the suggestion is to move to using a file to store the environment during Pike and have a consistent set of templates: this way all the information related to a deployment plan will live in the same place. This will help with exporting/importing plans, which itself will help with CLI/UI interoperability (for instance there are still some differences in how and when the Mistral environment is generated, depending on whether you deployed with the CLI or the UI).

A number of other issues were brought up around networking, custom roles, tracking deployment progress, and a great many other topics but I think the larger problems around plan management was the only expected to turn into a spec, now proposed for review.

I18n and release models

After the UI session I left the TripleO room to attend a cross-project session about i18n, horizon and release models. The release model point is particularly relevant because the TripleO UI is a newly internationalised project as of Ocata and the first to be cycle-trailing (TripleO releases a couple of weeks after the main OpenStack release).

I'm glad I was able to attend this session. For one it was really nice to collaborate directly with the i18n and release management team, and catch up with a couple of Horizon people. For second it turns out tripleo-ui was not properly documented as cycle-trailing (fixed now!) and that led to other issues.

Having different release models is a source of headaches for the i18n community, already stretched thin. It means string freezes happen at different times, stable branches are cut at different points, which creates a lot of tracking work for the i18n PTL to figure which project is ready and do the required manual work to update Zanata upon branching. One part of the solution is likely to figure out if we can script the manual parts of this workflow so that when the release patch (which creates the stable branch) is merged, the change is automatically reflected in Zanata.

For the non-technical aspects of the work (mainly keeping track of deadlines and string freezes) the decision was that if you want to be translated, then you need to respect the same deadlines than the cycle-with-milestones projects do on the main schedule, and if a project doesn't want to - if it doesn't respect the freezes or cut branches when expected, then they will be dropped from the i18n priority dashboard in Zanata. This was particularly relevant for Horizon plugins, as there's about a dozen of them now with various degrees of diligence when it comes to doing releases.

These expectations will be documented in a new governance tag, something like i18n:translated.

Obviously this would mean that cycle-trailing projects would likely never be able to get the tag. The work we depend on lands late and so we continue making changes up to two weeks after each of the documented deadlines. ianychoi, the i18n PTL seemed happy to take these projects under the i18n wing and do the manual work required, as long as there is an active i18n liaison for the project communicating with the i18n team to keep them informed about freezes and new branches. This seemed to work ok for us during Ocata so I'm hopeful we can keep that model. It's not entirely clear to me if this will also be documented/included in the governance tag so I look forward to reading the spec once it is proposed! :)

In the case of tripleo-ui we're not a priority project for translations nor looking to be, but we still rely on the i18n PTL to create Zanata branches and merge translations for us, and would like to continue with being able to translate stable branches as needed.


The CI Q&A session on Friday morning was amazingly useful and unanimously agreed it should be moved to the documentation (not done yet). If you've ever scratched your head about something related to TripleO CI, have a look at the etherpad!

Leave a comment

A Quick Introduction to Mistral Usage in TripleO (Newton) For developers

Since Newton, Mistral has become a central component to the TripleO project, handling many of the operations in the back-end. I recently gave a short crash course on Mistral, what it is and how we use it to a few people and thought it might be useful to share some of my bag of tricks here as well.

What is Mistral?

It's a workflow service. You describe what you want as a series of steps (tasks) in YAML, and it will coordinate things for you, usually asynchronously.

Link: Mistral overview.

We are using it for a few reasons:

  • it lets us manage long-running processes (e.g. introspection) and track their state
  • it acts a common interface/API, that is currently used by both the TripleO CLI and UI thus avoiding duplication, and can also be consumed directly by external non-OpenStack consumers (e.g. ManageIQ).


A workbook contains multiple workflows. (The TripleO workbooks live at

A workflow contains a series of 'tasks' which can be thought of as steps. We use the default 'direct' type of workflow on TripleO, which means tasks are executed in the order written, moving around based on the on-success and on-error values.

Every task calls to an action (or to another workflow), which is where the work actually gets done.

OpenStack services are automatically mapped into actions thanks to the mappings defined in Mistral, so we get a ton of actions for free already.

Useful tip: with the following commands you can see locally which actions are available, for a given project.

$ mistral action-list | grep $projectname

You can of course create your own actions. Which we do. Quite a lot.

$ mistral action-list | grep tripleo

An execution is what an instance of a running workflow is called, once you started one.

Link: Mistral terminology (very detailed, with diagrams and examples).

Where the TripleO Mistral workflows live

Let's look at a couple of examples.

A short one to start with, scaling down

It takes some input, starts with the 'delete_node' task and continues on to on-success or on-error depending on the action result.

Note: You can see we always end the workflow with send_message, which is a convention we use in the project. Even if an action failed and moves to on-error, the workflow itself should be successful (a failed workflow would indicate a problem at the Mistral level). We end with send_message because we want to let the caller know what was the result.

How will the consumer get to that result? We associate every workflow with a Zaqar queue. This is a TripleO convention, not a Mistral requirement. Each of our workflow takes a queue_name as input, and the clients are expected to listen to the Zaqar socket for that queue in order to receive the messages.

Another point, about the action itself on line 20: tripleo.scale.delete_node is a TripleO-specific action, as indicated in the name. If you were interested in finding the code for it, you should look at the entry_points in setup.cfg for tripleo-common (where all the workflows live):

which would lead you to the code at:

A bit more complex: node configuration

It's "slightly more complex" in that it has a couple more tasks, and it also calls to another workflow (line 426). You can see it starts with a call to ironic.node_list in its first task at line 417, which comes for free with Mistral. No need to reimplement it.

Debugging notes on workflows and Zaqar

Each workflow creates a Zaqar queue, to send progress information back to the client (CLI, web UI, other...).

Sometimes these messages get lost and the process hangs. It doesn't mean the action didn't complete successfully.

  • Check the Zaqar processes are up and running: $ sudo systemctl | grep zaqar (this has happened to me after reboots)
  • Check Mistral for any errored workflow: $ mistral execution-list
  • Check the Mistral logs (executor.log and engine.log are usually where the interesting errors are)
  • Ocata has timeouts for some of the commands now, so this is getting better

Following a workflow through its execution via CLI

This particular example will run somewhat fast so it's more of a "tracing back what happened afterwards."

$ openstack overcloud plan create my-new-overcloud
Started Mistral Workflow. Execution ID: 05d550f2-5d13-4782-be7f-a775a1d86a84
Default plan created

The CLI nicely tells you which execution ID to look for, so let's use it:

$ mistral task-list 05d550f2-5d13-4782-be7f-a775a1d86a84

| ID                                   | Name                            | Workflow name                              | Execution ID                         | State   | State info                   |
| c6e0fef0-4e65-4ee6-9ae4-a6d9e8451fd0 | verify_container_doesnt_exist   | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | ERROR   | Failed to run action [act... |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| 72c1310d-8379-4869-918e-62eb04530e46 | verify_environment_doesnt_exist | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | ERROR   | Failed to run action [act... |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| 74438300-8b18-40fd-bf73-62a1d90f71b3 | create_container                | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| 667c0e4b-6f6c-447d-9325-ab6c20c8ad98 | upload_to_container             | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| ef447ea6-86ec-4a62-bca2-a083c66f96d3 | create_plan                     | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| f37ebe9f-b39c-4f7a-9a60-eceb80782714 | ensure_passwords_exist          | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| 193f65fb-502a-4e4c-9a2d-053966500990 | plan_process_templates          | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| 400d7e11-aea8-45c7-96e8-c61523d66fe4 | plan_set_status_success         | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| 9df60103-15e2-442e-8dc5-ff0d61dba449 | notify_zaqar                    | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |

This gives you an idea of what Mistral did to accomplish the goal. You can also map it back to the workflow defined in tripleo-common to follow through the steps and find out what exactly was run. It if the workflow stopped too early, this can give you an idea of where the problem occurred.

Side-node about plans and the ERRORed tasks above

As of Newton, information about deployment is stored in a "Plan" which is implemented as a Swift container together with a Mistral environment. This could change in the future but for now that is what a plan is. (Edited to add: this changed in Pike. The plan environment is now stored in Swift as well, in a file named plan-environment.yaml.)

To create a new plan, we need to make sure there isn't already a container or an environment with that name. We could implement this in an action in Python, or since Mistral already has commands to get a container / get an environment we can be clever about this and reverse the on-error and on-success actions compared to usual:

If we do get a 'container' then it means it already exists and the plan already exists, so we cannot reuse this name. So 'on-success' becomes the error condition.

I sometimes regret a little us going this way because it leaves exception tracebacks in the logs, which is misleading when folks go to the Mistral logs for the first time in order to debug some other issue.

Finally I'd like to end all this by mentioning the Mistral Quick Start tutorial, which is excellent. It takes you from creating a very simple workflow to following its journey through the execution.

How to create your own action/workflow in TripleO

Mistral documentation:

In short:

  • Start writing your python code, probably under tripleo_common/actions
  • Add an entry point referencing it to setup.cfg
  • /!\ Restart Mistral /!\ Action code is only taken in once Mistral starts

This is summarised in the TripleO common README (personally I put this in a script to easily rerun it all).

Back to deployments: what's in a plan

As mentioned earlier, a plan is the combination of a as a Swift container + Mistral environment. In theory this is an implementation detail which shouldn't matter to deployers. In practice knowing this gives you access to a few more debugging tricks.

For example, the templates you initially provided will be accessible through Swift.

$ swift list $plan-name

Everything else will live in the Mistral environment. This contains:

  • The default passwords (which is a potential source of confusion)
  • The parameters_default aka overriden parameters (this takes priority and would override the passwords above)
  • The list of enabled environments (this looks nicer for plans created from the UI, as they are all munged into one user-environment.yaml file when deploying from CLI - see bug 1640861)
$ mistral environment-get $plan-name

For example, with an SSL-deployment done from the UI:

$ mistral environment-get ssl-overcloud
| Field       | Value                                                                             |
| Name        | ssl-overcloud                                                                     |
| Description | <none>                                                                            |
| Variables   | {                                                                                 |
|             |     "passwords": {                                                                |
|             |         "KeystoneFernetKey1": "V3Dqp9MLP0mFvK0C7q3HlIsGBAI5VM1aW9JJ6c5lLjo=",     |
|             |         "KeystoneFernetKey0": "ll6gbwcbhyAi9jNvBnpWDImMmEAaW5dog5nRQvzvEz4=",     |
|             |         "HAProxyStatsPassword": "NXgvwfJ23VHJmwFf2HmKMrgcw",                      |
|             |         "HeatPassword": "Fs7K3CxR636BFhyDJWjsbAQZr",                              |
|             |         "ManilaPassword": "Kya6gr2zp2x8ApD6wtwUUMcBs",                            |
|             |         "NeutronPassword": "x2YK6xMaYUtgn8KxyFCQXfzR6",                           |
|             |         "SnmpdReadonlyUserPassword": "5a81d2d83ee4b69b33587249abf49cd672d08541",  |
|             |         "GlancePassword": "pBdfTUqv3yxpH3BcPjrJwb9d9",                            |
|             |         "AdminPassword": "KGGz6ApEDGdngj3KMpy7M2QGu",                             |
|             |         "IronicPassword": "347ezHCEqpqhmANK4fpWK2MvN",                            |
|             |         "HeatStackDomainAdminPassword": "kUk6VNxe4FG8ECBvMC6C4rAqc",              |
|             |         "ZaqarPassword": "6WVc8XWFjuKFMy2qP2qqqVk82",                             |
|             |         "MysqlClustercheckPassword": "M8V26MfpJc8FmpG88zu7p3bpw",                 |
|             |         "GnocchiPassword": "3H6pmazAQnnHj24QXADxPrguM",                           |
|             |         "CephAdminKey": "AQDloEFYAAAAABAAcCT546pzZnkfCJBSRz4C9w==",               |
|             |         "CeilometerPassword": "6DfAKDFdEFhxWtm63TcwsEW2D",                        |
|             |         "CinderPassword": "R8DvNyVKaqA44wRKUXEWfc4YH",                            |
|             |         "RabbitPassword": "9NeRMdCyQhekJAh9zdXtMhZW7",                            |
|             |         "CephRgwKey": "AQDloEFYAAAAABAACIfOTgp3dxt3Sqn5OPhU4Q==",                 |
|             |         "TrovePassword": "GbpxyPdnJkUCjXu4AsjmgqZVv",                             |
|             |         "KeystoneCredential0": "1BNiiNQjthjaIBnJd3EtoihXu25ZCzAYsKBpPQaV12M=",    |
|             |         "KeystoneCredential1": "pGZ4OlCzOzgaK2bEHaD1xKllRdbpDNowQJGzJHo6ETU=",    |
|             |         "CephClientKey": "AQDloEFYAAAAABAAoTR3S00DaBpfz4cyREe22w==",              |
|             |         "NovaPassword": "wD4PUT4Y4VcuZsMJTxYsBTpBX",                              |
|             |         "AdminToken": "hdF3kfs6ZaCYPUwrTzRWtwD3W",                                |
|             |         "RedisPassword": "2bxUvNZ3tsRfMyFmTj7PTUqQE",                             |
|             |         "MistralPassword": "mae3HcEQdQm6Myq3tZKxderTN",                           |
|             |         "SwiftHashSuffix": "JpWh8YsQcJvmuawmxph9PkUxr",                           |
|             |         "AodhPassword": "NFkBckXgdxfCMPxzeGDRFf7vW",                              |
|             |         "CephClusterFSID": "3120b7cc-b8ac-11e6-b775-fa163e0ee4f4",                |
|             |         "CephMonKey": "AQDloEFYAAAAABAABztgp5YwAxLQHkpKXnNDmw==",                 |
|             |         "SwiftPassword": "3bPB4yfZZRGCZqdwkTU9wHFym",                             |
|             |         "CeilometerMeteringSecret": "tjyywuf7xj7TM7W44mQprmaC9",                  |
|             |         "NeutronMetadataProxySharedSecret": "z7mb6UBEHNk8tJDEN96y6Acr3",          |
|             |         "BarbicanPassword": "6eQm4fwqVybCecPbxavE7bTDF",                          |
|             |         "SaharaPassword": "qx3saVNTmAJXwJwBH8n3w8M4p"                             |
|             |     },                                                                            |
|             |     "parameter_defaults": {                                                       |
|             |         "OvercloudControlFlavor": "control",                                      |
|             |         "ComputeCount": "2",                                                      |
|             |         "ControllerCount": "3",                                                   |
|             |         "OvercloudComputeFlavor": "compute",                                      |
|             |         "NtpServer": ""                                  |
|             |     },                                                                            |
|             |     "environments": [                                                             |
|             |         {                                                                         |
|             |             "path": "overcloud-resource-registry-puppet.yaml"                     |
|             |         },                                                                        |
|             |         {                                                                         |
|             |             "path": "environments/inject-trust-anchor.yaml"                       |
|             |         },                                                                        |
|             |         {                                                                         |
|             |             "path": "environments/tls-endpoints-public-ip.yaml"                   |
|             |         },                                                                        |
|             |         {                                                                         |
|             |             "path": "environments/enable-tls.yaml"                                |
|             |         }                                                                         |
|             |     ],                                                                            |
|             |     "template": "overcloud.yaml"                                                  |
|             | }                                                                                 |
| Scope       | private                                                                           |
| Created at  | 2016-12-02 16:27:11                                                               |
| Updated at  | 2016-12-06 21:25:35                                                               |

Note: 'environment' is an overloaded word in the TripleO world, be careful. Heat environment, Mistral environment, specific templates (e.g. TLS/SSL, Storage...), your whole setup, ...

Bonus track

There is documentation on going from zero (no plan, no nodes registered) till running a deployment, directly using Mistral:

Also, with the way we work with Mistral and Zaqar, you can switch between the UI and CLI, or even using Mistral directly, at any point in the process.


Thanks to Dougal for his feedback on the initial outline!

Leave a comment