For the second part of the PTG (vertical projects), I mainly stayed in the TripleO room, moving around a couple of times to attend cross-project sessions related to i18n.
Although I always wish I understood more/everything, in the end my areas of interest (and current understanding!) in TripleO are around the UI, installing and configuring it, the TripleO CLI, and the tripleo-common Mistral workflows. Therefore the couple of thoughts in this post are mainly relevant to these - if you're looking for a more exhaustive summary of the TripleO discussions and decisions made during the PTG, I recommend reading the PTL's excellent thread about this on the dev list, and the associated etherpads.
Random points of interest
- Containers is the big topic and had multiple sessions dedicated to it, both single and cross-projects. Many other sessions ended up revisiting the subject as well, sometimes with "oh that'll be solved with containers" and sometimes with "hm good but that won't work with containers."
- A couple of API-breaking changes may need to happen in Tripleo Heat Templates (e.g. for NFV, passing a role mapping vs a role name around). The recommendation is to get this in as early as possible (by the first milestone) and communicate it well for out of tree services.
- When needing to test something new on the CI, look at the existing scenarios and prioritise adding/changing something there to test for what you need, as opposed to trying to create a brand new job.
- Running Mistral workflows as part of or after the deployment came up several times and was even a topic during a cross-project Heat / Mistral / TripleO sessions. Things can get messy, switching between Heat, Mistral and Puppet. Where should these workflows live (THT, tripleo-common)? Service-specific workflows (pre/post-deploy) are definitely something people want and there's a need to standardise how to do that. Ceph's likely to be the first to try their hand at this.
- One lively cross-project session with OpenStack Ansible and Kolla was about parameters in configuration files. Currently whenever a new feature is added to Nova or whatever service, Puppet and so on need to be updated manually. The proposal is to make a small change to oslo.config to enable it to give an output in machine-readable YAML which can then be consumed (currently the config generated is only human readable). This will help with validations, and it may help to only have to maintain a structure as opposed to a template.
- Heat folks had a feedback session with us about the TripleO needs. They've been super helpful with e.g. helping to improve our memory usage over the last couple of cycles. My takeaway from this session was "beware/avoid using YAQL, especially in nested stacks." YAQL is badly documented and everyone ends up reading the source code and tests to figure out how to things. Bringing Jinja2 into Heat or some kind of way to have repeated patterns from resources (e.g. based on a file) also came up and was cautiously acknowledged.
- Predictable IP assignment on the control plane is a big enough issue that some people are suggesting dropping Neutron in the undercloud over it. We'd lose so many other benefits though, that it seems unlikely to happen.
- Cool work incoming allowing built-in network examples to Just Work, based on a sample configuration. Getting the networking stuff right is a huge pain point and I'm excited to hear this should be achievable within Pike.
Python 3 is an OpenStack community goal for Pike.
Tripleo-common and python-tripleoclient both have voting unit tests jobs for Python 3.5, though I trust them only moderately for a number of reasons. For example many of the tests tend to focus on the happy path and I've seen and fixed Python 3 incompatible code in exceptions several times (no 'message' attribute seems easy to get caught into), despite the unit testing jobs being all green. Apparently there are coverage jobs we could enable for the client, to ensure the coverage ratio doesn't drop.
Python 3 for functional tests was also brought up. We don't have functional tests in any of our projects and it's not clear the value we would get out of it (mocking servers) compared to the unit testing and all the integration testing we already do. Increasing unit test coverage was deemed a more valuable goal to pursue for now.
There are other issues around functional/integration testing with Python 3 which will need to be resolved (though likely not during Pike). For example our integration jobs run on CentOS and use packages, which won't be Python 3 compatible yet (cue SCL and the need to respin dependencies). If we do add functional tests, perhaps it would be easier to have them run on a Fedora gate (although if I recall correctly gating on Fedora was investigated once upon a time at the OpenStack level, but caused too many issues due to churn and the lack of long-term releases).
Another issue with Python 3 support and functional testing is that the TripleO client depends on Mistral server (due to the Series Of Unfortunate Dependencies I also mentioned in the last post). That means Mistral itself would need to fully support Python 3 as well.
Python 2 isn't going anywhere just yet so we still have time to figure things out. The conclusions, as described in Emilien's email seem to be:
- Improve the unit test coverage
- Enable the coverage job in CI
- Investigate functional testing for python-tripleoclient to start with, see if it makes sense and is feasible
Sample environment generator
Currently environment files in THT are written by hand and quite inconsistent. This is also important for the UI, which needs to display this information. For example currently the environment general description is in a comment at the top of the file (if it exists at all), which can't be accessed programmatically. Dependencies between environment files are not described either.
To make up for this, currently all that information lives in the capabilities map but it's external to the template themselves, needs to be updated manually and gets out of sync easily.
The sample environment generator to fix this has been out there for a year, and currently has two blockers. First, it needs a way to determine which parameters are private (that is, parameters that are expected to be passed in by another template and shouldn't be set by the user).
One way could be to use a naming convention, perhaps an underscore prefix similar to Python. Parameter groups cannot be used because of a historical limitation, there can only be one group (so you couldn't be both Private and Deprecated). Changing Heat with a new feature like Nested Groups or generic Parameter Tags could be an option. The advantage of the naming convention is that it doesn't require any change to Heat.
From the UI perspective, validating if an environment or template is redefining parameters already defined elsewhere also matters. Because if it has the same name, then it needs to be set up with the same value everywhere or it's uncertain what the final value will end up as.
I think the second issue was that the general environment description can only be a comment at the moment, there is no Heat parameter to do this. The Heat experts in the room seemed confident this is non-controversial as a feature and should be easy to get in.
Once the existing templates are updated to match the new format, the validation should be added to CI to make sure that any new patch with environments does include these parameters. Having "description" show up as an empty string when generating a new environment will make it more obvious that something can/should be there, while it's easy to forget about it with the current situation.
The agreement was:
- Use underscores as a naming convention to start with
- Start with a comment for the general description
Once we get the new Heat description attribute we can move things around. If parameter tags get accepted, likewise we can automate moving things around. Tags would also be useful to the UI, to determine what subset of relevant parameters to display to the user in smaller forms (easier to understand that one form with several dozens of fields showing up all at once). Tags, rather than parameter groups are required because of the aforementioned issue: it's already used for deprecation and a parameter can only have one group.
Trusts and federation
This was a cross-project session together with Heat, Keystone and Mistral. A "Trust" lets you delegate or impersonate a user with a subset of their rights. From my experience in TripleO, this is particularly useful with long running Heat stacks as a authentication token expires after a few hours which means you lose the ability to do anything in the middle of an operation.
Trusts have been working very well for Heat since 2013. Before that they had to encrypt the user password in order to ask for a new token when needed, which all agreed was pretty horrible and not anything people want to go back to. Unfortunately with the federation model and using external Identity Providers, this is no longer possible. Trusts break, but some kind of delegation is still needed for Heat.
There were a lot of tangents about security issues (obviously!), revocation, expiration, role syncing. From what I understand Keystone currently validates Trusts to make sure the user still has the requested permissions (that the relevant role hasn't been removed in the meantime). There's a desire to have access to the entire role list, because the APIs currently don't let us check which role is necessary to perform a specific action. Additionally, when Mistral workflows launched from Heat get in, Mistral will create its own Trusts and Heat can't know what that will do. In the end you always kinda end up needing to do everything. Karbor is running into this as well.
No solution was discovered during the session, but I think all sides were happy enough that the problem and use cases have been clearly laid out and are now understood.
Some of the issues relevant to the UI were brought up in other sessions, like standardising the environment files. Other issues brought up were around plan management, for example why do we use the Mistral environment in addition to Swift? Historically it looks like it's because it was a nice drop-in replacement for the defunct TripleO API and offered a similar API. Although it won't have an API by default, the suggestion is to move to using a file to store the environment during Pike and have a consistent set of templates: this way all the information related to a deployment plan will live in the same place. This will help with exporting/importing plans, which itself will help with CLI/UI interoperability (for instance there are still some differences in how and when the Mistral environment is generated, depending on whether you deployed with the CLI or the UI).
A number of other issues were brought up around networking, custom roles, tracking deployment progress, and a great many other topics but I think the larger problems around plan management was the only expected to turn into a spec, now proposed for review.
After the UI session I left the TripleO room to attend a cross-project session about i18n, horizon and release models. The release model point is particularly relevant because the TripleO UI is a newly internationalised project as of Ocata and the first to be cycle-trailing (TripleO releases a couple of weeks after the main OpenStack release).
I'm glad I was able to attend this session. For one it was really nice to collaborate directly with the i18n and release management team, and catch up with a couple of Horizon people. For second it turns out tripleo-ui was not properly documented as cycle-trailing (fixed now!) and that led to other issues.
Having different release models is a source of headaches for the i18n community, already stretched thin. It means string freezes happen at different times, stable branches are cut at different points, which creates a lot of tracking work for the i18n PTL to figure which project is ready and do the required manual work to update Zanata upon branching. One part of the solution is likely to figure out if we can script the manual parts of this workflow so that when the release patch (which creates the stable branch) is merged, the change is automatically reflected in Zanata.
For the non-technical aspects of the work (mainly keeping track of deadlines and string freezes) the decision was that if you want to be translated, then you need to respect the same deadlines than the cycle-with-milestones projects do on the main schedule, and if a project doesn't want to - if it doesn't respect the freezes or cut branches when expected, then they will be dropped from the i18n priority dashboard in Zanata. This was particularly relevant for Horizon plugins, as there's about a dozen of them now with various degrees of diligence when it comes to doing releases.
These expectations will be documented in a new governance tag, something like i18n:translated.
Obviously this would mean that cycle-trailing projects would likely never be able to get the tag. The work we depend on lands late and so we continue making changes up to two weeks after each of the documented deadlines. ianychoi, the i18n PTL seemed happy to take these projects under the i18n wing and do the manual work required, as long as there is an active i18n liaison for the project communicating with the i18n team to keep them informed about freezes and new branches. This seemed to work ok for us during Ocata so I'm hopeful we can keep that model. It's not entirely clear to me if this will also be documented/included in the governance tag so I look forward to reading the spec once it is proposed! :)
In the case of tripleo-ui we're not a priority project for translations nor looking to be, but we still rely on the i18n PTL to create Zanata branches and merge translations for us, and would like to continue with being able to translate stable branches as needed.
The CI Q&A session on Friday morning was amazingly useful and unanimously agreed it should be moved to the documentation (not done yet). If you've ever scratched your head about something related to TripleO CI, have a look at the etherpad!