A Quick Introduction to Mistral Usage in TripleO (Newton) | For developers

Since Newton, Mistral has become a central component to the TripleO project, handling many of the operations in the back-end. I recently gave a short crash course on Mistral, what it is and how we use it to a few people and thought it might be useful to share some of my bag of tricks here as well.

What is Mistral?

It's a workflow service. You describe what you want as a series of steps (tasks) in YAML, and it will coordinate things for you, usually asynchronously.

Link: Mistral overview.

We are using it for a few reasons:

  • it lets us manage long-running processes (e.g. introspection) and track their state
  • it acts a common interface/API, that is currently used by both the TripleO CLI and UI thus avoiding duplication, and can also be consumed directly by external non-OpenStack consumers (e.g. ManageIQ).

Terminology

A workbook contains multiple workflows. (The TripleO workbooks live at https://github.com/openstack/tripleo-common/tree/master/workbooks).

A workflow contains a series of 'tasks' which can be thought of as steps. We use the default 'direct' type of workflow on TripleO, which means tasks are executed in the order written, moving around based on the on-success and on-error values.

Every task calls to an action (or to another workflow), which is where the work actually gets done.

OpenStack services are automatically mapped into actions thanks to the mappings defined in Mistral, so we get a ton of actions for free already.

Useful tip: with the following commands you can see locally which actions are available, for a given project.

$ mistral action-list | grep $projectname

You can of course create your own actions. Which we do. Quite a lot.

$ mistral action-list | grep tripleo

An execution is what an instance of a running workflow is called, once you started one.

Link: Mistral terminology (very detailed, with diagrams and examples).

Where the TripleO Mistral workflows live

Let's look at a couple of examples.

A short one to start with, scaling down

https://github.com/openstack/tripleo-common/blob/156d2c/workbooks/scale.yaml#L8

It takes some input, starts with the 'delete_node' task and continues on to on-success or on-error depending on the action result.

Note: You can see we always end the workflow with send_message, which is a convention we use in the project. Even if an action failed and moves to on-error, the workflow itself should be successful (a failed workflow would indicate a problem at the Mistral level). We end with send_message because we want to let the caller know what was the result.

How will the consumer get to that result? We associate every workflow with a Zaqar queue. This is a TripleO convention, not a Mistral requirement. Each of our workflow takes a queue_name as input, and the clients are expected to listen to the Zaqar socket for that queue in order to receive the messages.

Another point, about the action itself on line 20: tripleo.scale.delete_node is a TripleO-specific action, as indicated in the name. If you were interested in finding the code for it, you should look at the entry_points in setup.cfg for tripleo-common (where all the workflows live):

https://github.com/openstack/tripleo-common/blob/156d2c/setup.cfg#L81

which would lead you to the code at:

https://github.com/openstack/tripleo-common/blob/156d2c/tripleo_common/actions/scale.py#L52

A bit more complex: node configuration

https://github.com/openstack/tripleo-common/blob/156d2c/workbooks/baremetal.yaml#L402

It's "slightly more complex" in that it has a couple more tasks, and it also calls to another workflow (line 426). You can see it starts with a call to ironic.node_list in its first task at line 417, which comes for free with Mistral. No need to reimplement it.

Debugging notes on workflows and Zaqar

Each workflow creates a Zaqar queue, to send progress information back to the client (CLI, web UI, other...).

Sometimes these messages get lost and the process hangs. It doesn't mean the action didn't complete successfully.

  • Check the Zaqar processes are up and running: $ sudo systemctl | grep zaqar (this has happened to me after reboots)
  • Check Mistral for any errored workflow: $ mistral execution-list
  • Check the Mistral logs (executor.log and engine.log are usually where the interesting errors are)
  • Ocata has timeouts for some of the commands now, so this is getting better

Following a workflow through its execution via CLI

This particular example will run somewhat fast so it's more of a "tracing back what happened afterwards."

$ openstack overcloud plan create my-new-overcloud
Started Mistral Workflow. Execution ID: 05d550f2-5d13-4782-be7f-a775a1d86a84
Default plan created

The CLI nicely tells you which execution ID to look for, so let's use it:

$ mistral task-list 05d550f2-5d13-4782-be7f-a775a1d86a84

+--------------------------------------+---------------------------------+--------------------------------------------+--------------------------------------+---------+------------------------------+
| ID                                   | Name                            | Workflow name                              | Execution ID                         | State   | State info                   |
+--------------------------------------+---------------------------------+--------------------------------------------+--------------------------------------+---------+------------------------------+
| c6e0fef0-4e65-4ee6-9ae4-a6d9e8451fd0 | verify_container_doesnt_exist   | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | ERROR   | Failed to run action [act... |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| 72c1310d-8379-4869-918e-62eb04530e46 | verify_environment_doesnt_exist | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | ERROR   | Failed to run action [act... |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| 74438300-8b18-40fd-bf73-62a1d90f71b3 | create_container                | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| 667c0e4b-6f6c-447d-9325-ab6c20c8ad98 | upload_to_container             | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| ef447ea6-86ec-4a62-bca2-a083c66f96d3 | create_plan                     | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| f37ebe9f-b39c-4f7a-9a60-eceb80782714 | ensure_passwords_exist          | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| 193f65fb-502a-4e4c-9a2d-053966500990 | plan_process_templates          | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| 400d7e11-aea8-45c7-96e8-c61523d66fe4 | plan_set_status_success         | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
| 9df60103-15e2-442e-8dc5-ff0d61dba449 | notify_zaqar                    | tripleo.plan_management.v1.create_default_ | 05d550f2-5d13-4782-be7f-a775a1d86a84 | SUCCESS | None                         |
|                                      |                                 | deployment_plan                            |                                      |         |                              |
+--------------------------------------+---------------------------------+--------------------------------------------+--------------------------------------+---------+------------------------------+

This gives you an idea of what Mistral did to accomplish the goal. You can also map it back to the workflow defined in tripleo-common to follow through the steps and find out what exactly was run. It if the workflow stopped too early, this can give you an idea of where the problem occurred.

Side-node about plans and the ERRORed tasks above

As of Newton, information about deployment is stored in a "Plan" which is implemented as a Swift container together with a Mistral environment. This could change in the future but for now that is what a plan is. (Edited to add: this changed in Pike. The plan environment is now stored in Swift as well, in a file named plan-environment.yaml.)

To create a new plan, we need to make sure there isn't already a container or an environment with that name. We could implement this in an action in Python, or since Mistral already has commands to get a container / get an environment we can be clever about this and reverse the on-error and on-success actions compared to usual:

https://github.com/openstack/tripleo-common/blob/156d2c/workbooks/plan_management.yaml#L129

If we do get a 'container' then it means it already exists and the plan already exists, so we cannot reuse this name. So 'on-success' becomes the error condition.

I sometimes regret a little us going this way because it leaves exception tracebacks in the logs, which is misleading when folks go to the Mistral logs for the first time in order to debug some other issue.

Finally I'd like to end all this by mentioning the Mistral Quick Start tutorial, which is excellent. It takes you from creating a very simple workflow to following its journey through the execution.

How to create your own action/workflow in TripleO

Mistral documentation:

In short:

  • Start writing your python code, probably under tripleo_common/actions
  • Add an entry point referencing it to setup.cfg
  • /!\ Restart Mistral /!\ Action code is only taken in once Mistral starts

This is summarised in the TripleO common README (personally I put this in a script to easily rerun it all).

Back to deployments: what's in a plan

As mentioned earlier, a plan is the combination of a as a Swift container + Mistral environment. In theory this is an implementation detail which shouldn't matter to deployers. In practice knowing this gives you access to a few more debugging tricks.

For example, the templates you initially provided will be accessible through Swift.

$ swift list $plan-name

Everything else will live in the Mistral environment. This contains:

  • The default passwords (which is a potential source of confusion)
  • The parameters_default aka overriden parameters (this takes priority and would override the passwords above)
  • The list of enabled environments (this looks nicer for plans created from the UI, as they are all munged into one user-environment.yaml file when deploying from CLI - see bug 1640861)
$ mistral environment-get $plan-name

For example, with an SSL-deployment done from the UI:

$ mistral environment-get ssl-overcloud
+-------------+-----------------------------------------------------------------------------------+
| Field       | Value                                                                             |
+-------------+-----------------------------------------------------------------------------------+
| Name        | ssl-overcloud                                                                     |
| Description | <none>                                                                            |
| Variables   | {                                                                                 |
|             |     "passwords": {                                                                |
|             |         "KeystoneFernetKey1": "V3Dqp9MLP0mFvK0C7q3HlIsGBAI5VM1aW9JJ6c5lLjo=",     |
|             |         "KeystoneFernetKey0": "ll6gbwcbhyAi9jNvBnpWDImMmEAaW5dog5nRQvzvEz4=",     |
|             |         "HAProxyStatsPassword": "NXgvwfJ23VHJmwFf2HmKMrgcw",                      |
|             |         "HeatPassword": "Fs7K3CxR636BFhyDJWjsbAQZr",                              |
|             |         "ManilaPassword": "Kya6gr2zp2x8ApD6wtwUUMcBs",                            |
|             |         "NeutronPassword": "x2YK6xMaYUtgn8KxyFCQXfzR6",                           |
|             |         "SnmpdReadonlyUserPassword": "5a81d2d83ee4b69b33587249abf49cd672d08541",  |
|             |         "GlancePassword": "pBdfTUqv3yxpH3BcPjrJwb9d9",                            |
|             |         "AdminPassword": "KGGz6ApEDGdngj3KMpy7M2QGu",                             |
|             |         "IronicPassword": "347ezHCEqpqhmANK4fpWK2MvN",                            |
|             |         "HeatStackDomainAdminPassword": "kUk6VNxe4FG8ECBvMC6C4rAqc",              |
|             |         "ZaqarPassword": "6WVc8XWFjuKFMy2qP2qqqVk82",                             |
|             |         "MysqlClustercheckPassword": "M8V26MfpJc8FmpG88zu7p3bpw",                 |
|             |         "GnocchiPassword": "3H6pmazAQnnHj24QXADxPrguM",                           |
|             |         "CephAdminKey": "AQDloEFYAAAAABAAcCT546pzZnkfCJBSRz4C9w==",               |
|             |         "CeilometerPassword": "6DfAKDFdEFhxWtm63TcwsEW2D",                        |
|             |         "CinderPassword": "R8DvNyVKaqA44wRKUXEWfc4YH",                            |
|             |         "RabbitPassword": "9NeRMdCyQhekJAh9zdXtMhZW7",                            |
|             |         "CephRgwKey": "AQDloEFYAAAAABAACIfOTgp3dxt3Sqn5OPhU4Q==",                 |
|             |         "TrovePassword": "GbpxyPdnJkUCjXu4AsjmgqZVv",                             |
|             |         "KeystoneCredential0": "1BNiiNQjthjaIBnJd3EtoihXu25ZCzAYsKBpPQaV12M=",    |
|             |         "KeystoneCredential1": "pGZ4OlCzOzgaK2bEHaD1xKllRdbpDNowQJGzJHo6ETU=",    |
|             |         "CephClientKey": "AQDloEFYAAAAABAAoTR3S00DaBpfz4cyREe22w==",              |
|             |         "NovaPassword": "wD4PUT4Y4VcuZsMJTxYsBTpBX",                              |
|             |         "AdminToken": "hdF3kfs6ZaCYPUwrTzRWtwD3W",                                |
|             |         "RedisPassword": "2bxUvNZ3tsRfMyFmTj7PTUqQE",                             |
|             |         "MistralPassword": "mae3HcEQdQm6Myq3tZKxderTN",                           |
|             |         "SwiftHashSuffix": "JpWh8YsQcJvmuawmxph9PkUxr",                           |
|             |         "AodhPassword": "NFkBckXgdxfCMPxzeGDRFf7vW",                              |
|             |         "CephClusterFSID": "3120b7cc-b8ac-11e6-b775-fa163e0ee4f4",                |
|             |         "CephMonKey": "AQDloEFYAAAAABAABztgp5YwAxLQHkpKXnNDmw==",                 |
|             |         "SwiftPassword": "3bPB4yfZZRGCZqdwkTU9wHFym",                             |
|             |         "CeilometerMeteringSecret": "tjyywuf7xj7TM7W44mQprmaC9",                  |
|             |         "NeutronMetadataProxySharedSecret": "z7mb6UBEHNk8tJDEN96y6Acr3",          |
|             |         "BarbicanPassword": "6eQm4fwqVybCecPbxavE7bTDF",                          |
|             |         "SaharaPassword": "qx3saVNTmAJXwJwBH8n3w8M4p"                             |
|             |     },                                                                            |
|             |     "parameter_defaults": {                                                       |
|             |         "OvercloudControlFlavor": "control",                                      |
|             |         "ComputeCount": "2",                                                      |
|             |         "ControllerCount": "3",                                                   |
|             |         "OvercloudComputeFlavor": "compute",                                      |
|             |         "NtpServer": "my.ntp-server.example.com"                                  |
|             |     },                                                                            |
|             |     "environments": [                                                             |
|             |         {                                                                         |
|             |             "path": "overcloud-resource-registry-puppet.yaml"                     |
|             |         },                                                                        |
|             |         {                                                                         |
|             |             "path": "environments/inject-trust-anchor.yaml"                       |
|             |         },                                                                        |
|             |         {                                                                         |
|             |             "path": "environments/tls-endpoints-public-ip.yaml"                   |
|             |         },                                                                        |
|             |         {                                                                         |
|             |             "path": "environments/enable-tls.yaml"                                |
|             |         }                                                                         |
|             |     ],                                                                            |
|             |     "template": "overcloud.yaml"                                                  |
|             | }                                                                                 |
| Scope       | private                                                                           |
| Created at  | 2016-12-02 16:27:11                                                               |
| Updated at  | 2016-12-06 21:25:35                                                               |
+-------------+-----------------------------------------------------------------------------------+

Note: 'environment' is an overloaded word in the TripleO world, be careful. Heat environment, Mistral environment, specific templates (e.g. TLS/SSL, Storage...), your whole setup, ...

Bonus track

There is documentation on going from zero (no plan, no nodes registered) till running a deployment, directly using Mistral: http://tripleo.org/mistral-api/mistral-api.html.

Also, with the way we work with Mistral and Zaqar, you can switch between the UI and CLI, or even using Mistral directly, at any point in the process.

~

Thanks to Dougal for his feedback on the initial outline!

links

social