Planet RDO

October 18, 2018

Adam Young

Creating a Self Trust In Keystone

Let's say you are an administrator of an OpenStack cloud. This means you are pretty much all-powerful in the deployment. Now you need to perform some operation, but you don't want to do it with full admin privileges. Why not? Well, do you work as root on your Linux box? I hope not. Here's how to set up a self trust for a reduced set of roles on your token.

First, get a regular token, but use the --debug option to see what the project ID, role ID, and your user ID actually are:

In my case, they are … long uuids.
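
If you would rather not dig through the debug output, here is an alternative sketch of commands that surface the same values (assuming your admin credentials are already loaded in the environment):

openstack token issue         # shows your project_id and user_id
openstack role show _member_  # shows the ID of the role you want to delegate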

I'll trim them down, both for obscurity and to make them more legible. Here is the command to create the trust.

openstack trust create --project 9417f7 --role 9fe2ff 154741 154741

Mine returned:

+--------------------+----------------------------------+
| Field              | Value                            |
+--------------------+----------------------------------+
| deleted_at         | None                             |
| expires_at         | None                             |
| id                 | 26f8d2                           |
| impersonation      | False                            |
| project_id         | 9417f7                           |
| redelegation_count | 0                                |
| remaining_uses     | None                             |
| roles              | _member_                         |
| trustee_user_id    | 154741                           |
| trustor_user_id    | 154741                           |
+--------------------+----------------------------------+

On my system, role_id 9fe2ff is the _member_ role.

Note that, if you are Admin, you need to explicitly grant yourself the _member_ role, or use an implied role rule that says admin implies member.
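
As a rough sketch using the trimmed IDs from above (the exact commands depend on your deployment; the implied-role variant is the alternative approach mentioned above):

openstack role add --project 9417f7 --user 154741 _member_
# or, alternatively, declare that admin implies _member_:
openstack implied role create admin --implied-role _member_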

Now, you can get a reduced scope token. Unset the variables that are used to scope the token, since you want to scope to the trust now.

$ unset OS_PROJECT_DOMAIN_NAME 
$ unset OS_PROJECT_NAME 
$ openstack token issue --os-trust-id  26f8d2eaf1404489ab8e8e5822a0195d
+------------+----------------------------------+
| Field      | Value                            |
+------------+----------------------------------+
| expires    | 2018-10-18T10:31:57+0000         |
| id         | f16189                           |
| project_id | 9417f7                           |
| user_id    | 154741                           |
+------------+----------------------------------+

This still requires you to authenticate with your user ID and password. An even better mechanism is the new Application Credentials API. It works much the same way, but you use a separate, explicitly created secret instead of your password. More about that next time.

by Adam Young at October 18, 2018 02:44 AM

October 16, 2018

Website and blog of Pablo Iranzo Gómez

Contributing to OSP upstream a.k.a. Peer Review

Introduction

In the article "Contributing to OpenStack" we covered how to prepare accounts and how to prepare your changes for submission upstream (and even how to find low-hanging fruit to start contributing).

Here, we'll cover what happens behind the scenes to get a change published.

Upstream workflow

Peer review

Upstream contributions to OSP and other projects are based on peer review; that means that once a new set of code has been submitted, several validation steps are required before it gets merged.

The last command executed (git-review) in the submit sequence (in the prior article) will effectively submit the patch to the defined git review service (git-review -s does the required setup process) and will print a URL that can be used to access the review.

Each project might have a different review platform, but for OSP it's usually https://review.openstack.org, while for other projects it can be https://gerrit.ovirt.org, https://gerrithub.io, etc. (this is defined in the .gitreview file in the repository).

A sample .gitreview file looks like:

[gerrit]
host=review.gerrithub.io
port=29418
project=citellusorg/citellus.git

For a review example, we'll use one from gerrithub from Citellus project:

https://review.gerrithub.io/#/c/380646/

Here, we can see that we're on review 380646 and that's the link that allows us to check the changes submitted (the one printed when executing git-review).

CI tests (Verified +1)

Once a review has been submitted, the bots are usually the first ones to pick it up and run the defined unit tests on the new changes, to ensure that it doesn't break anything (based on what is defined to be tested).

This is a critical point as:

  • Tests need to be defined if new code is added or modified, to ensure that later updates don't break this new code without someone being aware.
  • Infrastructure should be able to test it (for example, you might need some specific hardware to test a card or network configuration).
  • The environment should be sane, so that prior runs don't affect the validation.

OSP CI can be checked at 'Zuul' (http://zuul.openstack.org/), where you can input the number of your review and see how the different bots are running CI tests on it, or whether it's still queued.

If everything is OK, the bot will 'vote' your change as Verified +1, allowing others to see that it should not break anything based on the tests performed.

In the case of OSP, there are also third-party CIs that can validate the change on third-party systems. For some of them, the votes count towards or against the proposed change; for others it's just a comment to take into account.

Even if you know that your code is right, sometimes there's a failure because of the infrastructure; in those cases, writing a new comment saying recheck will schedule a new CI test run.

This is common during busy periods, when it's harder for the scheduler to get available resources for the review validation. Also, sometimes there are errors in the configuration of CI that must be fixed in order to validate those changes.

Note: you can run some of the tests on your own system to validate faster if you have issues, by running tox. This sets up virtual environments for the tests to run in, so it's easier to catch issues before upstream CI does (it's always a good idea to run tox even before submitting the review with git-review, to detect errors early).
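
For example (a sketch; the exact environments depend on what each project defines in its tox.ini):

tox            # run all default test environments
tox -e pep8    # run a single environment, e.g. the style checks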

This is, however, not always possible, as some changes include requirements like testing upgrades, full environment deployments, etc. that cannot be done without the required preparation steps or even the infrastructure.

Code Review+2

This is probably the 'longest' step. It requires peers to be added as reviewers (you can get an idea of who to add based on other reviews submitted for the same component), or they will pick up new reviews as they pop up on notification channels or pending queues.

Here, you must prepare mentally for everything... developers could suggest a different approach, highlight other problems, or just leave some small nit comments about things like formatting, spacing, variable naming, etc.

After each suggested comment/change, repeat the workflow for submitting a new patchset, but make sure you're using the same review ID (by keeping the Change-Id that was appended to the commit message): this allows the code review platform to identify this change as an update to a prior one, lets you compare changes across versions, and also notifies the prior reviewers of new changes.

Once reviewers are OK with your code, and some 'core' developers also agree, you'll see some voting happening (-2..+2), meaning they like the change in its actual form or not.

Once you get Code Review +2, together with the prior Verified +1, you're almost ready to get the change merged.

Workflow+1

OK, the last step is to have someone with Workflow permissions give a +1. This will 'seal' the change, saying that everything is OK (as it had CR +2 and Verified +1) and the change is valid...

This vote will trigger another build by CI, and when it finishes, the change will be merged into the upstream code. Congratulations!

Cannot merge, please rebase

Sometimes your change touches the same files that other contributors have modified, so there's no way to automatically 'rebase' the change. In this case, the bad news is that you need to:

git checkout master # to change to master branch
git pull # to pull latest upstream changes
git checkout yourbranch # to get back to your change branch
git rebase master # to apply your changes on top of current master

After this step, it might be required to manually fix the code to solve the conflicts and follow the instructions given by git to mark them as resolved.

Once that's done, remember to do as with any patchset you submitted afterwards:

git commit --amend # to commit the new changes on the same commit Id you used
git-review # to upload a new version of the patchset

This will start the process over, but once completed, it will get the change merged.

How do we do it with Citellus?

In Citellus we've replicated more or less what we have upstream... even the use of tox.

Citellus uses https://gerrithub.io (a free service that hooks into GitHub and allows doing peer review).

We've set up a machine that runs Jenkins to do 'CI' on the tests we've defined (mostly for the Python wrapper and some tests); what it effectively does is run tox. We also use the https://travis-ci.org free tier to repeat the same on another platform.

Tox is a tool that lets you define several commands that are executed inside Python virtual environments, so new libraries can be installed or removed within the boundaries of a test without touching your system libraries. It helps in running:

  • pep8 (Python formatting compliance)
  • py27 (python 2.7 environment test)
  • py35 (python 3.5 environment test)

The py tests just validate that the code can run on both base Python versions; what they do is run the defined unit test scripts under each interpreter.

For local testing, you can run tox and it will go through the different tests defined and report status... if everything is OK, there's a good chance your new code review will also pass CI.
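
As an illustration, a minimal tox.ini for the environments listed above could look roughly like this (the deps and commands are assumptions for the sketch, not the actual Citellus configuration):

[tox]
envlist = pep8,py27,py35

[testenv]
deps = -r{toxinidir}/test-requirements.txt
commands = python -m unittest discover

[testenv:pep8]
deps = flake8
commands = flake8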

Jenkins will do the +1 on verified and 'core' reviewers will give +2 and 'merge' the change once validated.

Hope you enjoy!

Pablo

by Pablo Iranzo Gómez at October 16, 2018 05:32 AM

October 09, 2018

RDO Blog

Stein PTG Summary for Documentation and i18n

Ian Y. Choi and I already shared a summary of docs and i18n updates from the Stein Project Teams Gathering with the openstack-dev mailing list, but I also wanted to post the updates here for wider distribution. So, here is what I found most interesting from our docs- and i18n-related meetings and discussions in Denver from 10 through 14 September.

The overall schedule for all our sessions with additional comments and meeting minutes can be found in OpenStack Etherpad.

First things first, so the following is our obligatory team picture (with quite a few members missing); picture courtesy of OpenStack Foundation folks:

Operators documentation

We met with the Ops community to discuss the future of Ops docs. The plan is for the Ops group to take ownership of the operations-guide (done), ha-guide (in progress), and the arch-design guide (to do).

These three documents are being moved from the openstack-manuals repository to their own repos, owned by the newly formed Operations Documentation SIG.

See also ops-meetup-ptg-denver-2018-operations-guide for more notes.

Documentation site and design

We discussed improving the docs.openstack.org site navigation, guide summaries (in particular, install-guide), adding a new index page for project team contrib guides, and more. We met with the OpenStack Foundation staff to discuss the possibility of getting assistance with site design work.

We are also looking into accepting contributions from the Strategic Focus Areas folks to make parts of the docs toolchain like openstackdocstheme more easily reusable outside of the official OpenStack infrastructure. Support for some of the external project docs has already landed in git.

We got feedback on our front page template for project team docs, with Ironic being the pilot for us.

We got input on restructuring and reworking specs site to make it easier for users to understand that specs are not feature descriptions nor project docs, and to make it more consistent in how the project teams publish their specs. This will need to be further discussed with the folks owning the specs site infra.

Support status badges showing at the top of docs.openstack.org pages may not work well for projects following the cycle-with-intermediary release model, such as swift. We need to rethink how we configure and present the badges.

There are also some UX bugs present in badges (for instance, bug 1788389).

Translations

We met with the infra team to discuss progress on translating project team docs and, related to that, generating PDFs.

With the Foundation staff, we discussed translating Edge and Container whitepapers and similar material.

More details in Ian’s notes.

Reference, REST API docs and Release Notes

With the QA team, we discussed the scope and purpose of the /doc/source/reference documentation area in project docs. Because the scope of /reference might be unclear and can be used inconsistently by project teams, the suggestion is to continue with the original migration plan and migrate REST API and possibly Release Notes under /doc/source, as documented in doc-contrib-guide.

Contributor Guide

The OpenStack Contributor Guide was discussed in a separate session, see FC_SIG_ptg_stein for notes.

Thanks!

Finally, I’d like to thank everybody who attended the sessions, and a special thanks goes to all the PTG organizers and the OpenStack community in general for all their work!

by Petr Kovar at October 09, 2018 02:14 PM

October 08, 2018

dmsimard

AnsibleFest 2018: Community project highlights

With two days of AnsibleFest instead of one this time around, we had 100% more time to talk about Ansible things! I got to attend great sessions, learn a bunch of things, chat and exchange war stories about Ansible, ARA, Zuul, Tower and many other things. It was awesome and I wanted to take the time to share a bit about some of the great Ansible community projects that were featured during the event.

October 08, 2018 12:00 AM

October 07, 2018

Emilien Macchi

OpenStack Containerization with Podman – Part 3 (Upgrades)

For this third episode, here are some thoughts on how upgrades from Docker to Podman could work for us in OpenStack TripleO. Don’t miss the first and second episodes where we learnt how to deploy and operate Podman containers.

I spent some time this week investigating how we could upgrade the OpenStack Undercloud that is running Docker containers to run Podman containers, without manual intervention or service disruption. The way I see it at this time (the discussion is still ongoing) is that we could remove the Docker containers in Paunch just before starting the Podman containers and services in Systemd. It would be done per container, in serial.

for container in containers:
    docker rm container
    podman run container
    create systemd unit file && enable service

In the following demo, you can see the output of openstack undercloud upgrade with a work-in-progress prototype. You can observe HAProxy running in Docker and, during Step 1 of the containers deployment, the container is stopped (top right) and immediately started in Podman (bottom right).

You might think “that’s it?”. Of course not. There are still some problems that we want to figure out:

  • Migrate containers not managed by Paunch (Neutron containers, Pacemaker-managed containers, etc).
  • Whether we want to remove the Docker containers or just stop them (in the demo the containers are removed from Docker).
  • Stopping Docker daemon at the end of the upgrade (will probably be done by upgrade_tasks in Docker service from TripleO Heat Templates).

The demo is a bit long as it shows the whole upgrade output. However if you want to see when HAproxy is stopped from Docker and started in Podman, go to 7 minutes. Also don’t miss the last minute of the video where we see the results (podman containers, no more docker containers managed by Paunch, and SystemD services).

Thanks for following this series of OpenStack / Podman related posts. Stay in touch for the next one! By the way, did you know you could follow our backlog here? Any feedback on these efforts is warmly welcome!

by Emilien at October 07, 2018 03:08 PM

October 05, 2018

John Likes OpenStack

Updating ceph-ansible in a containerized undercloud

Update

What's below won't be the case for much longer, because ceph-ansible will become a dependency of TripleO and the mistral-executor container will bind mount the ceph-ansible source directory from the container host. What's in this post could still be used as an example of updating a package in a TripleO container, but don't be misled into thinking it is still the way to update ceph-ansible.

Original Content

In Rocky the TripleO undercloud will run containers. If you're using TripleO to deploy Ceph in Rocky, this means that ceph-ansible shouldn't be installed on your undercloud server directly, because your undercloud server is a container host. Instead, ceph-ansible should be installed in the mistral-executor container because, as per config-download, that is the container which runs Ansible to configure the overcloud.

If you install ceph-ansible on your undercloud host, it will lead to confusion about which version of ceph-ansible is being used when you try to debug it. Instead, install it in the mistral-executor container.

So this is the new normal in Rocky on an undercloud that can deploy Ceph:


[root@undercloud-0 ~]# rpm -q ceph-ansible
package ceph-ansible is not installed
[root@undercloud-0 ~]#

[root@undercloud-0 ~]# docker ps | grep mistral
0a77642d8d10 192.168.24.1:8787/tripleomaster/openstack-mistral-api:2018-08-20.1 "kolla_start" 4 hours ago Up 4 hours (healthy) mistral_api
c32898628b4b 192.168.24.1:8787/tripleomaster/openstack-mistral-engine:2018-08-20.1 "kolla_start" 4 hours ago Up 4 hours (healthy) mistral_engine
c972b3e74cab 192.168.24.1:8787/tripleomaster/openstack-mistral-event-engine:2018-08-20.1 "kolla_start" 4 hours ago Up 4 hours (healthy) mistral_event_engine
d52708e0bab0 192.168.24.1:8787/tripleomaster/openstack-mistral-executor:2018-08-20.1 "kolla_start" 4 hours ago Up 4 hours (healthy) mistral_executor
[root@undercloud-0 ~]#

[root@undercloud-0 ~]# docker exec -ti d52708e0bab0 rpm -q ceph-ansible
ceph-ansible-3.1.0-0.1.rc18.el7cp.noarch
[root@undercloud-0 ~]#

So what happens if you're in a situation where you want to try a different ceph-ansible version on your undercloud?

In the next example I'll update my mistral-executor container from ceph-ansible rc18 to rc21. These commands are just variations of the upstream documentation, but with a focus on updating the undercloud, not overcloud, container. Here's the image I want to update:


[root@undercloud-0 ~]# docker images | grep mistral-executor
192.168.24.1:8787/tripleomaster/openstack-mistral-executor 2018-08-20.1 740bb6f24755 2 days ago 1.05 GB
[root@undercloud-0 ~]#
I have a copy of ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch.rpm in my current working directory

[root@undercloud-0 ~]# mkdir -p rc21
[root@undercloud-0 ~]# cat > rc21/Dockerfile <<EOF
> FROM 192.168.24.1:8787/tripleomaster/openstack-mistral-executor:2018-08-20.1
> USER root
> COPY ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch.rpm .
> RUN yum install -y ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch.rpm
> USER mistral
> EOF
[root@undercloud-0 ~]#
So again that file is (for copy/paste later):

[root@undercloud-0 ~]# cat rc21/Dockerfile
FROM 192.168.24.1:8787/tripleomaster/openstack-mistral-executor:2018-08-20.1
USER root
COPY ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch.rpm .
RUN yum install -y ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch.rpm
USER mistral
[root@undercloud-0 ~]#
Build the new container

[root@undercloud-0 ~]# docker build --rm -t 192.168.24.1:8787/tripleomaster/openstack-mistral-executor:2018-08-20.1 ~/rc21
Sending build context to Docker daemon 221.2 kB
Step 1/5 : FROM 192.168.24.1:8787/tripleomaster/openstack-mistral-executor:2018-08-20.1
---> 740bb6f24755
Step 2/5 : USER root
---> Using cache
---> 8d7f2e7f9993
Step 3/5 : COPY ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch.rpm .
---> 54fbf7185eec
Removing intermediate container 9afe4b16ba95
Step 4/5 : RUN yum install -y ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch.rpm
---> Running in e80fce669471

Examining ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch.rpm: ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch
Marking ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch.rpm as an update to ceph-ansible-3.1.0-0.1.rc18.el7cp.noarch
Resolving Dependencies
--> Running transaction check
---> Package ceph-ansible.noarch 0:3.1.0-0.1.rc18.el7cp will be updated
---> Package ceph-ansible.noarch 0:3.1.0-0.1.rc21.el7cp will be an update
--> Finished Dependency Resolution

Dependencies Resolved

================================================================================
Package
Arch Version Repository Size
================================================================================
Updating:
ceph-ansible
noarch 3.1.0-0.1.rc21.el7cp /ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch 1.0 M

Transaction Summary
================================================================================
Upgrade 1 Package

Total size: 1.0 M
Downloading packages:
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Updating : ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch 1/2
Cleanup : ceph-ansible-3.1.0-0.1.rc18.el7cp.noarch 2/2
Verifying : ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch 1/2
Verifying : ceph-ansible-3.1.0-0.1.rc18.el7cp.noarch 2/2

Updated:
ceph-ansible.noarch 0:3.1.0-0.1.rc21.el7cp

Complete!
---> 41a804e032f5
Removing intermediate container e80fce669471
Step 5/5 : USER mistral
---> Running in bc0db608c299
---> f5ad6b3ed630
Removing intermediate container bc0db608c299
Successfully built f5ad6b3ed630
[root@undercloud-0 ~]#
Upload the new container to the registry:

[root@undercloud-0 ~]# docker push 192.168.24.1:8787/tripleomaster/openstack-mistral-executor:2018-08-20.1
The push refers to a repository [192.168.24.1:8787/tripleomaster/openstack-mistral-executor]
606ffb827a1b: Pushed
fc3710ffba43: Pushed
4e770d9096db: Layer already exists
4d7e8476e5cd: Layer already exists
9eef3d74eb8b: Layer already exists
977c2f6f6121: Layer already exists
00860a9b126f: Layer already exists
366de6e5861a: Layer already exists
2018-08-20.1: digest: sha256:50aae064d930e8d498702673c6703b70e331d09e966c6f436b683bb152e80337 size: 2007
[root@undercloud-0 ~]#
Now we see new the f5ad6b3ed630 container in addition to the old one:

[root@undercloud-0 ~]# docker images | grep mistral-executor
192.168.24.1:8787/tripleomaster/openstack-mistral-executor 2018-08-20.1 f5ad6b3ed630 4 minutes ago 1.09 GB
192.168.24.1:8787/tripleomaster/openstack-mistral-executor 740bb6f24755 2 days ago 1.05 GB
[root@undercloud-0 ~]#
The old container is still running though:

[root@undercloud-0 ~]# docker ps | grep mistral
373f8c17ce74 192.168.24.1:8787/tripleomaster/openstack-mistral-api:2018-08-20.1 "kolla_start" 6 hours ago Up 6 hours (healthy) mistral_api
4f171deef184 192.168.24.1:8787/tripleomaster/openstack-mistral-engine:2018-08-20.1 "kolla_start" 6 hours ago Up 6 hours (healthy) mistral_engine
8f25657237cd 192.168.24.1:8787/tripleomaster/openstack-mistral-event-engine:2018-08-20.1 "kolla_start" 6 hours ago Up 6 hours (healthy) mistral_event_engine
a7fb6df4e7cf 740bb6f24755 "kolla_start" 6 hours ago Up 6 hours (healthy) mistral_executor
[root@undercloud-0 ~]#
Merely updating the image doesn't restart the container and neither does `docker restart a7fb6df4e7cf`. Instead I need to stop it and start it but there's a lot that goes into starting these containers with the correct parameters.

The upstream docs section on Debugging with Paunch shows me a command to get the exact command that was used to start my container. I just needed to use `paunch list | grep mistral` first to know I need to look at the tripleo_step4.


[root@undercloud-0 ~]# paunch debug --file /var/lib/tripleo-config/docker-container-startup-config-step_4.json --container mistral_executor --action print-cmd
docker run --name mistral_executor-glzxsrmw --detach=true --env=KOLLA_CONFIG_STRATEGY=COPY_ALWAYS --net=host --health-cmd=/openstack/healthcheck --privileged=false --restart=always --volume=/etc/hosts:/etc/hosts:ro --volume=/etc/localtime:/etc/localtime:ro --volume=/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro --volume=/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro --volume=/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro --volume=/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro --volume=/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro --volume=/dev/log:/dev/log --volume=/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro --volume=/etc/puppet:/etc/puppet:ro --volume=/var/lib/kolla/config_files/mistral_executor.json:/var/lib/kolla/config_files/config.json:ro --volume=/var/lib/config-data/puppet-generated/mistral/:/var/lib/kolla/config_files/src:ro --volume=/run:/run --volume=/var/run/docker.sock:/var/run/docker.sock:rw --volume=/var/log/containers/mistral:/var/log/mistral --volume=/var/lib/mistral:/var/lib/mistral --volume=/usr/share/ansible/:/usr/share/ansible/:ro --volume=/var/lib/config-data/nova/etc/nova:/etc/nova:ro 192.168.24.1:8787/tripleomaster/openstack-mistral-executor:2018-08-20.1
[root@undercloud-0 ~]#
Now that I know the command, I can see my six-hour-old container:

[root@undercloud-0 ~]# docker ps | grep mistral_executor
a7fb6df4e7cf 740bb6f24755 "kolla_start" 6 hours ago Up 12 minutes (healthy) mistral_executor
[root@undercloud-0 ~]#
stop it

[root@undercloud-0 ~]# docker stop a7fb6df4e7cf
a7fb6df4e7cf
[root@undercloud-0 ~]#
ensure it's gone

[root@undercloud-0 ~]# docker rm a7fb6df4e7cf
Error response from daemon: No such container: a7fb6df4e7cf
[root@undercloud-0 ~]#
and then run the command I got from above to start the container and finally see my new container

[root@undercloud-0 ~]# docker ps | grep mistral-executor
d8e4073441c0 192.168.24.1:8787/tripleomaster/openstack-mistral-executor:2018-08-20.1 "kolla_start" 14 seconds ago Up 13 seconds (health: starting) mistral_executor-glzxsrmw
[root@undercloud-0 ~]#
Finally I confirm that my container has the new ceph-ansible package:

(undercloud) [stack@undercloud-0 ~]$ docker exec -ti d8e4073441c0 rpm -q ceph-ansible
ceph-ansible-3.1.0-0.1.rc21.el7cp.noarch
(undercloud) [stack@undercloud-0 ~]$
I was then able to deploy my overcloud and see that the rc21 version fixed a bug.

by John (noreply@blogger.com) at October 05, 2018 12:13 PM

October 04, 2018

Emilien Macchi

OpenStack Containerization with Podman – Part 2 (SystemD)

In the first post, we demonstrated that we can now use Podman to deploy a containerized OpenStack TripleO Undercloud. Let’s see how we can operate the containers with SystemD.

Podman, by design, doesn't have any daemon running to manage the containers' lifecycle, while Docker runs dockerd-current and docker-containerd-current, which take care of a bunch of things, such as restarting the containers when they fail (if configured to do so, with restart policies).

In OpenStack TripleO, we still want our containers to restart when they are configured to, so we thought about managing the containers with SystemD. I recently wrote a blog post about how Podman can be controlled by SystemD, and we finally implemented it in TripleO.

The way it works, as of today, is that any container managed by Podman with a restart policy in the Paunch container configuration will be managed by SystemD.

Let’s take the example of Glance API. This snippet is the configuration of the container at step 4:

    step_4:
      map_merge:
        - glance_api:
            start_order: 2
            image: *glance_api_image
            net: host
            privileged: {if: [cinder_backend_enabled, true, false]}
            restart: always
            healthcheck:
              test: /openstack/healthcheck
            volumes: *glance_volumes
            environment:
              - KOLLA_CONFIG_STRATEGY=COPY_ALWAYS

As you can see, the Glance API container is configured to always try to restart (so Docker would do so). With Podman, we re-use this flag and we create (and enable) a SystemD unit file:

[Unit]
Description=glance_api container
After=paunch-container-shutdown.service
[Service]
Restart=always
ExecStart=/usr/bin/podman start -a glance_api
ExecStop=/usr/bin/podman stop -t 10 glance_api
KillMode=process
[Install]
WantedBy=multi-user.target

How it works underneath:

  • Paunch will run podman run to start the container, during the deployment steps.
  • If there is a restart policy, Paunch will create a SystemD unit file.
  • The SystemD service is named after the container name, so if you were used to the old service names before containerization, you'll have to refresh your mind. By choice, we decided to go with the container name to avoid confusion with the podman ps output.
  • Once the containers are deployed, they need to be stopped / started / restarted by SystemD, as shown in the sketch below. If you use the Podman CLI to do it, SystemD will take over (see the demo).
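
A quick sketch of what that looks like for the glance_api unit created above (standard SystemD commands; the journalctl line assumes the container output goes to the journal):

systemctl status glance_api
systemctl restart glance_api
journalctl -u glance_api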

Stay in touch for the next post in the series of deploying TripleO and Podman!

by Emilien at October 04, 2018 11:14 PM

OpenStack Containerization with Podman – Part 1 (Undercloud)

In this series of blog posts, we'll demonstrate how we can replace Docker with Podman when deploying OpenStack containers with TripleO.

Group of seals, also known as a pod

This first post will focus on the Undercloud (the deployment cloud), which contains the necessary components to deploy and manage an "Overcloud" (a workload cloud). During the Rocky release, we switched the Undercloud to be containerized by default, using the same mechanism as we did for the Overcloud. If you need to be convinced about Podman, I strongly suggest watching this talk, but in short, Podman brings more security and makes systems more lightweight. It also brings containers into a Kubernetes-friendly environment.

Note: Deploying OpenStack on top of Kubernetes isn’t in our short-term roadmap and won’t be discussed during these blog posts for now.

To reproduce this demo, you'll need to follow the official documentation which explains how to deploy an Undercloud, but change undercloud.conf to have container_cli = podman (instead of the default docker, for now).
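
In practice, that is a one-line change; a sketch of the relevant snippet (placing the option under the [DEFAULT] section is an assumption of this sketch):

[DEFAULT]
container_cli = podman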

In the next post, we’ll talk about operational changes when containers are managed with Podman versus Docker.

by Emilien at October 04, 2018 09:39 PM

September 26, 2018

Website and blog of Juan Antonio (Ozz) Osorio

Oslo Policy Deep Dive (part 2)

In the previous blog post I covered all you need to know to write your own policies and understand where they come from.

Here, we'll go through some examples of how you would change the policy for a service, and how to take that new policy into use.

For this, I’ve created a repository to try things out and hopefully get you practicing this kind of thing. Of course, things will be slightly different in your environment, depending on how you’re running OpenStack. But you should get the basic idea.

We’ll use Barbican as a test service to do basic policy changes. The configuration that I’m providing is not meant for production, but it makes it easier to make changes and test things out. It’s a very minimal and simple barbican configuration that has the “unauthenticated” context enabled. This means that it doesn’t rely on keystone, and it will use whatever roles and project you provide in the REST API.

The default policy & how to change it

As mentioned in the previous blog post, nowadays the default policy is shipped as part of the codebase. For some services, folks might still package the policy.json file. However, for our test service (Barbican), this is not the case.

You can easily overwrite the default policy by providing a policy.json file yourself. By default, oslo.policy will read the project’s base directory, and try to get the policy.json file from there. For barbican, this will be /etc/barbican/policy.json. For keystone, /etc/keystone/policy.json.

It is worth noting that the location of this file is configurable via the policy_file setting in your service's configuration, under the oslo_policy group of the configuration file.

If you have a running service and you add or modify the policy.json file, the changes will take effect immediately. There is no need to restart or reload your service.

The way this works is that oslo.policy will attempt to read the file's modification time (using os.path.getmtime(filename)) and cache it. If on a subsequent read the modification time has changed, it'll re-read the policy file and load the new rules.
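
To illustrate the idea, here is a simplified sketch of that caching behaviour (not the library's actual code):

import json
import os

_cached_mtime = None
_cached_rules = None

def load_rules(path="/etc/barbican/policy.json"):
    """Re-read the policy file only when its modification time changes."""
    global _cached_mtime, _cached_rules
    mtime = os.path.getmtime(path)
    if mtime != _cached_mtime:
        with open(path) as f:
            _cached_rules = json.load(f)
        _cached_mtime = mtime
    return _cached_rules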

It is also worth noting that when using policy.json, you don’t need to provide the whole policy, only the rules and aliases you’re planning to change.

If you need to get the policy of a specific service, it’s fairly straightforward given the tools that oslo.policy provides. All you need to do is the following:

oslopolicy-policy-generator --namespace $SERVICE_NAME

It is important to note that this will get you the effective policy that’s being executed. So, any changes that you make to the policy will be reflected in the output of this command.

If you want to get a sample file for the default policy with all the documentation for each option, you’ll do the following:

oslopolicy-sample-generator --namespace $SERVICE_NAME

So, in order to output Barbican’s effective policy, we’ll do the following:

oslopolicy-policy-generator --namespace barbican

Note that this outputs the policy in YAML format, and oslo.policy reads policy.json by default, so you'll have to transform such a file into JSON to take it into use.
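
One minimal way to do that conversion (a sketch that relies on the PyYAML module being available):

oslopolicy-policy-generator --namespace barbican > policy.yaml
python -c 'import sys, yaml, json; json.dump(yaml.safe_load(sys.stdin), sys.stdout, indent=2)' < policy.yaml > policy.json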

Setting up the testing environment

NOTE: If you just plan to read through this and not actually do the exercises, you may skip this section.

Let's clone the repository first:

git clone https://github.com/JAORMX/barbican-policy-tests.git
cd barbican-policy-tests

Now that we're in the repo, you'll notice several scripts there. To provide you with a consistent environment, I decided to rely on containeeeers!!! So, in order to continue, you'll need to have Docker installed on your system.

(Maybe in the future I’ll update this to run with Podman and Buildah)

To build the minimal barbican container, execute the following:

./build-barbican-container-image.sh

You can verify that you have the barbican-minimal image with the latest tag by running docker images.

To test that the image was built correctly and you can run barbican, execute the following:

./run-barbican-simple.sh

You will notice barbican is running, and you can see the name of its container with docker ps. You'll notice it's listening on port 9311 on localhost.
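
A quick sanity check (a sketch; the root endpoint is expected to return version discovery information):

curl http://localhost:9311/ | python -m json.tool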

Exercises

Preface

In the following exercises, we'll make some changes to the Barbican policy. To do this, it's worth understanding a few things about the service and the policy itself.

Barbican is Secret Storage as a service. To simplify things, we’ll focus on the secret storage side of things.

These are the operations you can do on a secret:

  • secrets:get: List all secrets for the specific project.

  • secrets:post: Create a new secret.

  • secret:decrypt: Decrypt the specified secret.

  • secret:get: Get the metadata for the specified secret.

  • secret:put: Modify the specified secret.

  • secret:delete: Delete the specified secret.

Barbican also assumes 5 keystone roles, and bases its policy on the usage of these roles:

  • admin: Can do all operations on secrets (List, create, read, update, delete and decrypt)

  • creator: Can do all operations on secrets; this role is limited on other resources (such as secret containers), but we'll ignore other resources in these exercises.

  • observer: In the context of secrets, observers can only list secrets and view a specific secret’s metadata.

  • audit: In the context of secrets, auditors can only view a specific secret’s metadata (but cannot do anything else).

  • service_admin: can’t do anything related to secrets. This role is meant for admin operations that change the Barbican service itself (such as quotas).

The Barbican default policy also comes with some useful aliases as defaults:

{
"admin": "role:admin",
"observer": "role:observer",
"creator": "role:creator",
"audit": "role:audit",
"service_admin": "role:key-manager:service-admin",
...
}

So this makes overwriting specific roles fairly straightforward.

Scenario #1

The Keystone default roles proposal proposes the usage of three roles, which should also work with all OpenStack services. These roles are: reader, member and admin.

Let's take this into use in Barbican, and replace our already existing observer role with reader.

In this case, we can take the alias into use; with very minimal changes, we can replace the usage of observer entirely.

I have already defined this role in the aforementioned repo; let's take a look:

{
"observer": "role:reader"
}

And that’s it!

Now in the barbican policy, every instance of the “rule:observer” assertion will actually reference the “reader” role.

Testing scenario #1

There is already a script that runs barbican and takes this policy into use. Let's run it, and verify that we can effectively use the reader role instead of the observer role:

# Run the container
./run-barbican-with-reader-role.sh

# Create a sample secret
./create-secret.sh

# Attempt to list the available secrets with the reader role. This
# operation should succeed.
./list-secrets.sh reader

# Attempt to list the available secrets with the observer role. This
# operation should fail.
./list-secrets.sh observer

# Once you're done, you can stop the container

Scenario #2

Barbican's audit role is meant to only read a very minimal set of things from Barbican's entities. For some, this role might not be very useful, and it also doesn't fit with Keystone's set of default roles, so let's delete it!

As before, I have already defined a policy for this purpose:

{
"audit": "!"
}

As you can see, this replaces the audit alias, and any attempt to use it will be rejected by the policy, effectively disallowing use of the audit role.

Testing scenario #2

There is already a script that runs barbican and takes this policy into use. Let's run it, and verify that we can effectively no longer use the audit role:

# run the container
./run-barbican-without-audit-policy.sh

# create a secret
./create-secret.sh

# Attempt to view the secret metadata with the creator role. This
# operation should succeed.
curl -H 'X-Project-Id: 1234' -H 'X-Roles: creator' \
    http://localhost:9311/v1/secrets/<some ID> | python -m json.tool

# Attempt to view the secret metadata with the audit role. This
# operation should fail.
curl -H 'X-Project-Id: 1234' -H 'X-Roles: audit' \
    http://localhost:9311/v1/secrets/<some ID> | python -m json.tool

# Once you're done, you can stop the container

Scenario #3

Now that we have tried a couple of things and it has gone fine, let's put it all together and replicate the Keystone default role recommendation.

Here’s what we’ll do: As before, we’ll replace the observer role with reader. We’ll also replace the creator role with member, and finally, we’ll remove the audit role.

Here’s the policy file:

{
"observer": "role:reader",
"creator": "role:member",
"audit": "!"
}

This time, we’ll change the policy file in-place, as this is something you might need to do or automate in your own deployment.

Testing scenario #3

Here, we’ll run a minimal container that doesn’t take any specific policy into use. We’ll log into it, modify the policy.json file, and test out the results.

# Run the container
./run-barbican-simple.sh

# Open a bash session in the container
docker exec -ti $(docker ps | grep barbican-minimal | awk '{print $1}') bash

# (In the container) Create the new policy file
cat <<EOF > /etc/barbican/policy.json
{
"observer": "role:reader",
"creator": "role:member",
"audit": "!"
}
EOF

# (In the container) Exit the container
exit

# Attempt to create a sample secret with the creator role. This operation
# should fail
./create-secret.sh creator

# Attempt to create a sample secret with the member role. This operation
# should succeed
./create-secret.sh member

# Attempt to list the available secrets with the observer role. This
# operation should fail.
./list-secrets.sh observer

# Attempt to list the available secrets with the reader role. This
# operation should succeed.
./list-secrets.sh reader

# Attempt to view the secret metadata with the audit role. This
# operation should fail.
curl -H 'X-Project-Id: 1234' -H 'X-Roles: audit' \
    http://localhost:9311/v1/secrets/<some ID> | python -m json.tool

# Attempt to view the secret metadata with the creator role. This
# operation should fail.
curl -H 'X-Project-Id: 1234' -H 'X-Roles: creator' \
    http://localhost:9311/v1/secrets/<some ID> | python -m json.tool

# Attempt to view the secret metadata with the member role. This
# operation should succeed.
curl -H 'X-Project-Id: 1234' -H 'X-Roles: member' \
    http://localhost:9311/v1/secrets/<some ID> | python -m json.tool

# Once you're done, you can stop the container

Scenario #4

For our last case, let's assume that for some reason you need a "super-admin" role that is able to read everybody's secret metadata. There is no equivalent of this role in Barbican, so we'll have to modify more things in order to get this to work.

To simplify things, we’ll only modify the GET operation for secret metadata.

Please note that this is only done for learning purposes, do not try this in production.

First thing we’ll need is to retrieve the policy line that actually gets executed for secret metadata. In Barbican, it’s the secret:get policy.

From within the container, or if you have the barbican package installed somewhere, you can do the following in order to get this exact policy:

oslopolicy-policy-generator --namespace barbican | grep "secret:get"

This will get us the following line:

"secret:get": "rule:secret_non_private_read or rule:secret_project_creator or rule:secret_project_admin or rule:secret_acl_read"

Note that in the Barbican policy, for most users we explicitly check that the user is in the same project as the one the secret belongs to. In this case, we'll omit this check in order to enable the "super-admin" to retrieve any secret's metadata.

Here is the final policy.json file we’ll use:

{
"super_admin": "role:super-admin",
"secret:get": "rule:secret_non_private_read or rule:secret_project_creator or rule:secret_project_admin or rule:secret_acl_read or rule:super_admin"
}

Testing scenario #4

Here, we’ll run a minimal container that doesn’t take any specific policy into use. We’ll log into it, modify the policy.json file, and test out the results.

# Run the container
./run-barbican-simple.sh

# Open a bash session in the container
docker exec -ti $(docker ps | grep barbican-minimal | awk '{print $1}') bash

# (In the container) Lets verify what the current policy is for "secret:get".
# This should output the default rule.
oslopolicy-policy-generator --namespace barbican | grep "secret:get"

# (In the container) Create the new policy file
cat <<EOF > /etc/barbican/policy.json
{
"super_admin": "role:super-admin",
"secret:get": "rule:secret_non_private_read or rule:secret_project_creator or rule:secret_project_admin or rule:secret_acl_read or rule:super_admin"
}
EOF

# (In the container) Lets verify what the current policy is for "secret:get".
# This should output the updated policy.
oslopolicy-policy-generator --namespace barbican | grep "secret:get"

# (In the container) Exit the container
exit

# Lets now create a couple of secrets with the creator role in the default
# project (1234).

# This will be secret #1
./create-secret.sh creator
# This will be secret #2
./create-secret.sh creator

# Lets now create a couple of secrets with the creator role in another project
# (1111).

# This will be secret #3
./create-secret.sh creator 1111

Using the creator role and project ‘1234’, you should only be able to retrieve secrets #1 and #2, but should get an error with secret #3.

# So... this should work
curl -H 'X-Project-Id: 1234' -H 'X-Roles: creator' \
    http://localhost:9311/v1/secrets/<secret #1> | python -m json.tool

# this should work
curl -H 'X-Project-Id: 1234' -H 'X-Roles: creator' \
    http://localhost:9311/v1/secrets/<secret #2> | python -m json.tool

# ...And this should fail
curl -H 'X-Project-Id: 1234' -H 'X-Roles: creator' \
    http://localhost:9311/v1/secrets/<secret #3> | python -m json.tool

Using the creator role and project ‘1111’, you should only be able to retrieve secret #3, but should get an error with secrets #1 and #2

# So... this should fail
curl -H 'X-Project-Id: 1111' -H 'X-Roles: creator' \
    http://localhost:9311/v1/secrets/<secret #1> | python -m json.tool

# this should fail
curl -H 'X-Project-Id: 1111' -H 'X-Roles: creator' \
    http://localhost:9311/v1/secrets/<secret #2> | python -m json.tool

# ...And this should work
curl -H 'X-Project-Id: 1111' -H 'X-Roles: creator' \
    http://localhost:9311/v1/secrets/<secret #3> | python -m json.tool

Finally, let's try our new super-admin role. As you will notice, you don't even need to be part of the projects to get the metadata:

# So... this should work
curl -H 'X-Project-Id: POLICY' -H 'X-Roles: super-admin' \
    http://localhost:9311/v1/secrets/<secret #1> | python -m json.tool

# this should work
curl -H 'X-Project-Id: IS' -H 'X-Roles: super-admin' \
    http://localhost:9311/v1/secrets/<secret #2> | python -m json.tool

# ...And this should work too
curl -H 'X-Project-Id: COOL' -H 'X-Roles: super-admin' \
    http://localhost:9311/v1/secrets/<secret #3> | python -m json.tool

Conclusion

You have now learned how to do simple modifications to your service’s policy!

With great power comes great responsibility… And all those things… But seriously, be careful! You might end up with unintended results.

In the next blog post, we’ll cover implied roles and how you can use them in your policies!

September 26, 2018 02:01 PM

RDO Blog

Introducing Networking-Ansible

During the OpenStack Rocky release cycle a new OpenStack ML2 driver project was established: networking-ansible. This project integrates OpenStack with the Ansible Networking project. Ansible Networking is the part of the Ansible project that focuses on providing an Ansible interface for network operators to manage network switch configuration. By consuming Ansible Networking as the backend interface to network switch configuration, we abstract the communication with the switching hardware to the Ansible layer. This provides the opportunity to support multiple switching platforms in a single ML2 driver, which reduces the maintenance overhead for OpenStack operators integrating a heterogeneous network environment with baremetal guest deployments, since only a single ML2 driver needs to be configured.

The networking-ansible team had two general goals in the Rocky release cycle. First, to establish the project. A significant amount of work was completed in the Rocky release cycle to establish OpenStack repositories and tracking tools, RDO packaging, upstream testing, and integration with Neutron, Ansible Networking, and TripleO. We completed, and in some ways exceeded, our goals here. A big thank you to the RDO and OpenStack community members that contributed to this project's successful establishment.

Second, we intended to support a single initial use case, with a single basic feature, focused on a single switch platform. We also accomplished and exceeded this goal. The Ironic project needs the ability to modify the switch port a baremetal guest is connected to, so that the node can be put onto the Ironic provisioning network for provisioning and then moved to the Neutron-assigned tenant network for guest tenant traffic. This use case assumes a single network interface on the guest attached to a switch port in access mode. Using networking-ansible, Neutron can swap the access port's VLAN between the Ironic provisioning network and the Neutron-assigned tenant network VLAN, using Ansible Networking as its backend. We ended up testing on OVS and a Juniper QFX this cycle. Untested code exists for EOS and NXOS.

Looking towards the future, we have planned a set of goals for the OpenStack Stein release cycle. First, support for more platforms. There are a handful of switch platforms that we have gained access to. We plan to add support for them to the code base and work through as much testing as possible to expand the set of supported platforms. Second, improved security and trunk port support. We are in the process of adopting Ansible Vault to store switch credentials and working on offering the ability to configure a baremetal guest's port in trunk mode to allow it to connect to multiple networks. Finally, exposing a Python API. The underlying code that interfaces with Ansible Networking does not need to have any hard dependencies on OpenStack. An API will be exposed and documented that is isolated from OpenStack dependencies. This API will be useful for use cases that would like the abstracted interface to networking hardware via Ansible Networking, but have different management needs than what OpenStack offers.

My congratulations go out to the team and supporting community members that worked on this project for a very successful release cycle. My thanks again to the OpenStack and RDO communities for the support offered as we established this project. I look forward to adding the new features being worked on, and I hope we'll be just as successful in completing our new goals six months from now.

by Dan Radez at September 26, 2018 01:31 PM

September 24, 2018

Website and blog of Juan Antonio (Ozz) Osorio

Oslo Policy Deep Dive (part 1)

For the upcoming OpenStack Summit in Berlin we have submitted a talk to teach folks about oslo.policy. The name of the talk is OpenStack Policy 101, and its purpose is:

  • To teach folks how policy works in OpenStack (from both a developer and an operator point of view) and what you can do with the oslo.policy library.

  • To show folks they can write their own policies, change them and subsequently take them into use in OpenStack.

  • Hopefully to teach folks how to write policy drivers that they can use to evaluate their OpenStack policies.

The purpose of this post is to write a comprehensive set of notes to deliver for the talk, as well as have me review all the material :D. But I hope this is useful for other folks that are interested in this topic as well.

What is oslo.policy - overview

It's a Python library that OpenStack services use in order to enable RBAC (Role Based Access Control) policy. This policy determines which user can access which objects or resources, and in which way. The library contains its own implementation of a policy language that the service creator uses in order to create appropriate defaults on what is allowed by each endpoint of the service. Operators can then overwrite these defaults in order to customize the policy for the specific service.

The policy language is based on either YAML or JSON; however, given that these implementations are quite similar, here we'll focus on only one of them. We assume it's fairly trivial to use both, since a JSON-written policy will also be correctly parsed as YAML.

Where are the policies defined?

Given that each service has different purposes, endpoints, and needs, each service has its own policy, and the responsibility of defining it.

In the past, it used to be the case that you would find the policy for a given service as a yaml or json file itself. This had the advantage that you would then see the whole policy in a single file.

Recently though, OpenStack moved the default policy to be in-code rather than in a yaml file shipped with the projects. This change was mostly targeted at giving a better experience to operators, with the following reasons:

  • If the operator doesn’t change the default policies, they don’t need to do or change anything. (no longer needing to package the policy.yaml, and everything will work as-is)

  • If the operator does change the default policies, they now only need to add the rules they’re modifying, which has several advantages:

    • Easier auditing of what’s changing

    • Easier maintenance (only maintain the changes and not a whole policy.yaml file).

This doesn't mean that the usage of a full-fledged policy.yaml is no longer available, as folks can still generate these files from the OpenStack project's codebase with the tooling that was created as part of this work (I'll tell you how to do this later). So you also don't need to dig into the project's code base to get the default policy; just use the tooling and it'll all be fine :).

How do I write policies?

Whether you're a service creator or an operator, it is quite useful to know how to write policies if you want proper RBAC to work with your service, or if you want to modify it. So let's give it a go!

Each policy rule is defined as follows:

"< target >": "< rule >"

Simple as that!

The targets can be either aliases or actions. Let's leave aliases for later. Actions represent API calls or operations. For Keystone, for instance, it could be something like "create user" or "list users". For Barbican, it could be "create secret" or "list secrets". It is whatever operation your service is capable of doing.

The target (as an action) will typically look as follows: secrets:get. In the aforementioned case, that target refers to the "list secrets for a specific project" action. Typically, the service creators define these names, and the only way to know which action name refers to which operation is to either refer to the project's documentation, or to dig into the project's code base.

The “rule” section defines what needs to be fulfilled in order to allow the operation.

Here’s what rules can be:

  • Always true.
  • Always false.
  • A special check (for a role, another rule, or an external target).
  • A comparison of two values.
  • Boolean expressions based on simpler rules.

It is also possible to use operators on these rules. The available operators are the following (in order of precedence):

  • grouping: Defined with parentheses: ( ... )
  • negation: Defined with the not operation: not <rule>
  • and: e.g. <rule 1> and <rule 2>
  • or: e.g. <rule 1> or <rule 2>

Let's dig through each case:

Always true

So, let's say that you want to write a policy where anyone can list the compute instances. Here's what you can do:

  • An empty string
"compute:get_all": ""
  • An empty list
"compute:get_all": []
  • The “@” value
"compute:get_all": "@"

Any of the three aforementioned values will get you the same result, which is to allow anybody to list the compute instances.

Always false

If you want to be very restrictive and not allow anybody to do such an operation, you use the following:

  • The “!” value
"compute:get_all": "!"

This will deny the operation for everyone.

Special checks

Role checks

Let's say that you only want to allow users with the role "lister" to list instances. You can do so with the following rule:

"compute:get_all": "role:lister"

These roles tie in directly with Keystone roles, so when using such a policy, you need to make sure that the relevant users have the appropriate roles in Keystone. For some services, this tends to cause confusion, as is the case for Barbican. In Barbican, the default policy makes reference to several roles that are non-standard in OpenStack:

  • creator
  • observer
  • audit

So, it is necessary to give your users these roles if you want them to have access to Barbican without being admin.

Rule aliases

Remember in the beginning where I mentioned that targets could be either aliases or actions? Well, here are the aliases!

In order to re-use rules, it is possible to create rule aliases and subsequently use these aliases in other rules. This comes in handy when your rules start to get longer and you take operators into use. For this example, let's use the "or" operator and create a rule that allows users with the "admin" role or the "creator" role to list compute instances:

"admin_or_creator": "role:admin or role:creator"
"compute:get_all": "rule:admin_or_creator"

As you can see, the compute:get_all rule is a reference to the admin_or_creator rule that we defined in the line above it. We can even take that rule into use for another target. For instance, to create servers:

"compute:create": "rule:admin_or_creator"

External check

It is also possible to use an external engine in order to evaluate individual policies. The syntax is fairly simple, as one only needs to use the URL of the external decision endpoint.

So, lets say that we have written a service that does this, and we’ll use it to evaluate if a certain user can create compute instances. We would write the rule as follows:

"compute:create": "http://my-external-service.example.com/path/to/resource"

Or better yet:

"compute:create": "https://my-external-service.example.com/path/to/resource"

The external resource then needs to answer exactly “True”. Any other response is considered negative.

The external resource is passed the same enforcement data as oslo.policy gets: Rule, target and credentials (I’ll talk about these later).

There are also several ways that one can configure the interaction with this external engine, and this is done through oslo.config. In the service configuration and under the oslo_policy section, one can set the following:

  • remote_content_type: This defines how to pass the data to the external enforcer (either URL encoded or as JSON). The available options are: application/x-www-form-urlencoded or application/json

  • remote_ssl_verify_server_crt: Whether to enable or disable the external server certificate validation (it defaults to False).

  • remote_ssl_ca_crt_file: The CA path to use to validate the external server certificate.

  • remote_ssl_client_crt_file: The client certificate to use in order to authenticate (through TLS) to the external server.

  • remote_ssl_client_key_file: The client key to use in order to authenticate (through TLS) to the external server.
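
As a rough sketch, these settings live in the service’s configuration file under the [oslo_policy] section; it might look something like this (the file paths are placeholders):

[oslo_policy]
remote_content_type = application/json
remote_ssl_verify_server_crt = True
remote_ssl_ca_crt_file = /path/to/ca.crt
remote_ssl_client_crt_file = /path/to/client.crt
remote_ssl_client_key_file = /path/to/client.key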

Note that it is possible to create custom checks, but we’ll cover this topic in a subsequent blog post.

Comparisons

In certain cases where checking the user’s role isn’t enough, we can also do comparisons between several things. Here are the available objects we can use:

  • Constants: Strings, numbers, true, false
  • API attributes
  • Target object attributes

Constants

If you would like to base your policy decision by comparing a certain attribute to a constant, it’s possible to do so as follows:

"compute:get_all": "<variable>:'xpto2035abc'"
"compute:create": "'myproject':<variable>"

API attributes

We typically derive these from the request’s context. These would normally be:

  • Project ID: as project_id

  • User ID: as user_id

  • Domain ID: as domain_id

While most projects have tried to keep these attributes constant, it is important to note that not all of the projects use the exact same names. This is because the way these are passed is dependent on how the oslo.policy library is called. There are, however, efforts to standardize this. Hopefully in the near future (as this gets standardized), the available API attributes will be the same ones as what’s available from oslo.context.
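
As a small illustration (the rule target is hypothetical), a policy that only lets callers act on resources belonging to their own project could compare the caller’s project ID against the target’s:

"compute:delete": "project_id:%(project_id)s"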

Target object attributes

This refers to the objects that the policies are working on.

Lets take barbican as an example. We want to make sure that the incoming user’s project ID matches the secret’s project ID. So, for this, we created the following rule:

"secret_project_match": "project:%(target.secret.project_id)s",

Here project refers to the user’s project ID, while target.secret.project_id refers to the secret that is target of this operation.

It is important to note that how these “targets” are passed is highly project specific, and you would typically need to dig into the project’s code to figure out how these attributes are passed.

Checks recap

The oslo.policy code documentation contains a table that sums up the aforementioned cases quite nicely:

  • User’s Role: role:admin
  • Rules already defined on policy: rule:admin_required
  • Against URLs: http://my-url.org/check
  • User attributes: project_id:%(target.project.id)s
  • Strings: <variable>:'xpto2035abc' or 'myproject':<variable>
  • Literals: project_id:xpto2035abc, domain_id:20, True:%(user.enabled)s

Where do API attributes and target objects come from?

As I mentioned in previous sections, these parameters are dependent on how the library is called, and it varies from project to project. Lets see how this works.

oslo.policy enforces policy using an object called Enforcer. You’ll typically create it like this:

from oslo_config import cfg
from oslo_policy import policy

CONF = cfg.CONF
# _POLICY_PATH points at the service's policy file
enforcer = policy.Enforcer(CONF, policy_file=_POLICY_PATH)

Once you have this Enforcer object created, every time you need policy to be evaluated, you need to call the enforce or authorize methods on that object:

enforce(rule, target, creds, do_raise=False, exc=None, *args, **kwargs)

enforce and authorize take the same arguments.

Lets look at the relevant parameters:

  • rule: This is the name of the rule. So if you want to enforce policy on secrets:get, you’ll pass that as a string.

  • target: This is the target object. It is a dictionary that should receive information about the object you’re applying the operation to. If it’s a secret, you can add here what project the secret belongs to.

  • creds: This is the information about the user, and will be the “API attributes”. You can either pass in a map containing the information, or you can pass an oslo.context object.

Unfortunately, if you ever need to change the policy and decipher what information is passed as the API attributes and the target, you’ll need to dig into the project’s codebase and look for where the enforce or authorize calls are made for the relevant policy rule you’re looking for.
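
To make this concrete, here is a minimal sketch of what such a call could look like for a hypothetical secrets:get rule, reusing the enforcer object from above (the IDs and the exact contents of the maps are illustrative):

# Information about the object being accessed (the "target").
target = {'project_id': 'some-project-id'}
# Information about the caller (the "API attributes").
creds = {'project_id': 'some-project-id',
         'user_id': 'some-user-id',
         'roles': ['creator']}

# Raises PolicyNotAuthorized if the rule evaluates to False.
enforcer.enforce('secrets:get', target, creds, do_raise=True)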

Barbican example

Lets take Barbican as an example. If we look at the code-base, we can see that barbican enforces policy as part of a common decorator. This decorator ends up calling the _do_enforce_rbac function which, at the time of writing this, looks as follows:

def _do_enforce_rbac(inst, req, action_name, ctx, **kwargs):
    """Enforce RBAC based on 'request' information."""
    if action_name and ctx:

        # Prepare credentials information.
        credentials = {
            'roles': ctx.roles,
            'user': ctx.user,
            'project': ctx.project_id
        }

        # Enforce special case: secret GET decryption
        if 'secret:get' == action_name and not is_json_request_accept(req):
            action_name = 'secret:decrypt'  # Override to perform special rules

        target_name, target_data = inst.get_acl_tuple(req, **kwargs)
        policy_dict = {}
        if target_name and target_data:
            policy_dict['target'] = {target_name: target_data}

        policy_dict.update(kwargs)
        # Enforce access controls.
        if ctx.policy_enforcer:
            ctx.policy_enforcer.enforce(action_name, flatten(policy_dict), credentials, do_raise=True)

Here we can see that the credentials are derived from the oslo.context object, which contains information about the user that’s making the request.

Preferably, barbican should instead pass the oslo.context object and let oslo.policy derive the needed parameters from that object. Although for this, the default policy will need to be adjusted to match the names: e.g. project_id instead of just project.

Subsequently, depending on the class that calls this, we’ll get the appropriate information about the target object from the get_acl_tuple function.

If we’re dealing with secrets, we can see how the target is filled up:

class SecretController(controllers.ACLMixin):
    """Handles Secret retrieval and deletion requests."""

    def __init__(self, secret):
        LOG.debug('=== Creating SecretController ===')
        self.secret = secret
        self.transport_key_repo = repo.get_transport_key_repository()

    def get_acl_tuple(self, req, **kwargs):
        d = self.get_acl_dict_for_user(req, self.secret.secret_acls)
        d['project_id'] = self.secret.project.external_id
        d['creator_id'] = self.secret.creator_id
        return 'secret', d

Note that the target’s project_id and creator_id are derived from the secret that the user is trying to access. We also query the ACLs to get extra information about the access that the user might have on the secret.

So, for barbican, the target map will look as follows:

{
    "target": {
        "<entity>": {
            # actual target data
            ...
        }
    }
}

For secrets, it would be:

{
    "target": {
        "secret": {
            "project_id": "<some id>",
            "creator_id": "<some other id>",
            ...
        }
    }
}

I would like to point out once more that the target map structure is highly dependent on the project. So, if you’re dealing with another project, you’ll need to dig into that code to know how the project fills it up. But at least now you know what to look for :) .

Conclusion

Here we learned what oslo.policy is, how to write policies with it, and how to get the relevant information on how the policy is called for specific projects.

In the next blog post, we’ll learn how to do modifications to policies and how to reflect them on a running service.

September 24, 2018 10:29 AM

September 20, 2018

RDO Blog

RDO Rocky Released

The RDO community is pleased to announce the general availability of the RDO build for OpenStack Rocky for RPM-based distributions, CentOS Linux and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Rocky is the 18th release from the OpenStack project, which is the work of more than 1400 contributors from around the world.

The release is already available on the CentOS mirror network at http://mirror.centos.org/centos/7/cloud/x86_64/openstack-rocky/

The RDO community project curates, packages, builds, tests and maintains a complete OpenStack component set for RHEL and CentOS Linux and is a member of the CentOS Cloud Infrastructure SIG. The Cloud Infrastructure SIG focuses on delivering a great user experience for CentOS Linux users looking to build and maintain their own on-premise, public or hybrid clouds.

All work on RDO and on the downstream release, Red Hat OpenStack Platform, is 100% open source, with all code changes going upstream first.

Photo via Good Free Photos

New and Improved

Interesting things in the Rocky release include:

  • New neutron ML2 driver networking-ansible has been included in RDO. This module abstracts management and interaction with switching hardware to Ansible Networking.
  • Swift3 has been moved into the swift package as the “s3api” middleware.

Other improvements include:

  • Metalsmith is now included in RDO. This is a simple tool to provision bare metal machines using ironic, glance and neutron.

Contributors

During the Rocky cycle, we saw the following new contributors:

  • Bob Fournier
  • Bogdan Dobrelya
  • Carlos Camacho
  • Carlos Goncalves
  • Cédric Jeanneret
  • Charles Short
  • Dan Smith
  • Dustin Schoenbrun
  • Florian Fuchs
  • Goutham Pacha Ravi
  • Ilya Etingof
  • Konrad Mosoń
  • Luka Peschke
  • mandreou
  • Nate Johnston
  • Sandhya Dasu
  • Sergii Golovatiuk
  • Tobias Urdin
  • Tony Breeds
  • Victoria Martinez de la Cruz
  • Yaakov Selkowitz

Welcome to all of you and Thank You So Much for participating!

But we wouldn’t want to overlook anyone. A super massive Thank You to all SIXTY-NINE contributors who participated in producing this release. This list includes commits to rdo-packages and rdo-infra repositories:

  • Ade Lee
  • Alan Bishop
  • Alan Pevec
  • Alex Schultz
  • Alfredo Moralejo
  • Bob Fournier
  • Bogdan Dobrelya
  • Brad P. Crochet
  • Carlos Camacho
  • Carlos Goncalves
  • Cédric Jeanneret
  • Chandan Kumar
  • Charles Short
  • Christian Schwede
  • Daniel Alvarez
  • Daniel Mellado
  • Dansmith
  • Dmitry Tantsur
  • Dougal Matthews
  • Dustin Schoenbrun
  • Emilien Macchi
  • Eric Harney
  • Florian Fuchs
  • Goutham Pacha Ravi
  • Haikel Guemar
  • Honza Pokorny
  • Ilya Etingof
  • James Slagle
  • Jason Joyce
  • Javier Peña
  • Jistr
  • Jlibosva
  • Jon Schlueter
  • Juan Antonio Osorio Robles
  • karthik s
  • Kashyap Chamarthy
  • Kevin Tibi
  • Konrad Mosoń
  • Lon
  • Luigi Toscano
  • Luka Peschke
  • marios
  • Martin André
  • Matthew Booth
  • Matthias Runge
  • Mehdi Abaakouk
  • Nate Johnston
  • Nmagnezi
  • Oliver Walsh
  • Pete Zaitcev
  • Pradeep Kilambi
  • rabi
  • Radomir Dopieralski
  • Ricardo Noriega
  • Sandhya Dasu
  • Sergii Golovatiuk
  • shrjoshi
  • Steve Baker
  • Thierry Vignaud
  • Tobias Urdin
  • Tom Barron
  • Tony Breeds
  • Tristan de Cacqueray
  • Victoria Martinez de la Cruz
  • Yaakov Selkowitz
  • yatin

The Next Release Cycle

At the end of one release, focus shifts immediately to the next, Stein, which has a slightly longer release cycle due to the PTG and Summit co-location next year, with an estimated GA the week of 08-12 April 2019. The full schedule is available at https://releases.openstack.org/stein/schedule.html.

Twice during each release cycle, RDO hosts official Test Days shortly after the first and third milestones; therefore, the upcoming test days are 01-02 November 2018 for Milestone One and 14-15 March 2019 for Milestone Three.

Get Started

There are three ways to get started with RDO.

To spin up a proof of concept cloud, quickly, and on limited hardware, try an All-In-One Packstack installation. You can run RDO on a single node to get a feel for how it works.
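
As a rough sketch (see the Packstack quickstart documentation for the authoritative steps), the sequence on a fresh CentOS 7 node looks roughly like this:

$ sudo yum install -y centos-release-openstack-rocky
$ sudo yum update -y
$ sudo yum install -y openstack-packstack
$ sudo packstack --allinone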

For a production deployment of RDO, use the TripleO Quickstart and you’ll be running a production cloud in short order.

Finally, for those that don’t have any hardware or physical resources, there’s the OpenStack Global Passport Program. This is a collaborative effort between OpenStack public cloud providers to let you experience the freedom, performance and interoperability of open source infrastructure. You can quickly and easily gain access to OpenStack infrastructure via trial programs from participating OpenStack public cloud providers around the world.

Get Help

The RDO Project participates in a Q&A service at https://ask.openstack.org. We also have our users@lists.rdoproject.org mailing list (https://lists.rdoproject.org/mailman/listinfo/users) for RDO-specific users and operators. For more developer-oriented content we recommend joining the dev@lists.rdoproject.org mailing list. Remember to post a brief introduction about yourself and your RDO story. The mailing list archives are all available at https://mail.rdoproject.org. You can also find extensive documentation on RDOproject.org.

The #rdo channel on Freenode IRC is also an excellent place to find and give help.

We also welcome comments and requests on the CentOS mailing lists and the CentOS and TripleO IRC channels (#centos, #centos-devel, and #tripleo on irc.freenode.net), however we have a more focused audience within the RDO venues.

Get Involved

To get involved in the OpenStack RPM packaging effort, check out the RDO contribute pages, peruse the CentOS Cloud SIG page, and inhale the RDO packaging documentation.

Join us in #rdo on the Freenode IRC network and follow us on Twitter @RDOCommunity. You can also find us on Facebook, Google+ and YouTube.

by Rain Leander at September 20, 2018 02:02 PM

September 19, 2018

Website and blog of Juan Antonio (Ozz) Osorio

Adding custom databases and database users in TripleO

For folks integrating with TripleO, it has been quite painful to always need to modify puppet in order to integrate with the engine. This has been typically the case for things like adding an HAProxy endpoint and adding a database and a database user (and grants). As mentioned in a previous post, this is no longer the case for HAProxy endpoints, and this ability has been in TripleO for a couple of releases now.

With the same logic in mind, I added this same functionality for mysql databases and database users. This recently landed in Stein. So, all you need to do is add something like this to your service template:

    service_config_settings:
      mysql:
        ...
        tripleo::my_service_name::mysql_user:
          password: 'myPassword'
          dbname: 'mydatabase'
          user: 'myuser'
          host: {get_param: [EndpointMap, MysqlInternal, host_nobrackets]}
          allowed_hosts:
            - '%'
            - "%{hiera('mysql_bind_host')}"

This will create:

  • A database called mydatabase
  • A user that can access that database, called myuser
  • The user myuser will have the password myPassword
  • And grants will be created so that user can connect from the hosts specified in the host and allowed_hosts parameters.

Now you don’t need to modify puppet to add a new service to TripleO!
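
If you want to double-check the result, a quick verification sketch (run on a controller node, or inside the database container if MariaDB is containerized) would be:

$ sudo mysql -e "SELECT User, Host FROM mysql.user WHERE User='myuser';"
$ sudo mysql -e "SHOW GRANTS FOR 'myuser'@'%';"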

September 19, 2018 04:50 AM

September 18, 2018

Website and blog of Juan Antonio (Ozz) Osorio

Testing TLS everywhere with tripleo-quickstart

I’ve gotten the request for help deploying TLS everywhere with TripleO several times. Even though there’s documentation, deploying from scratch can be quite daunting, specially if all you want to do is test it out, or merely integrate your service to it.

However, for development purposes, there is tripleo-quickstart, which makes deploying such a scenario way simpler.

Here’s the magical incantation to deploy TripleO with TLS everywhere enabled:

./quickstart.sh --no-clone --teardown all --clean -p quickstart-extras.yml \
    -N config/nodes/1ctlr_1comp_1supp.yml \
    -c config/general_config/ipa.yml \
    -R master-tripleo-ci \
    --tags all \
    $VIRTHOST

Note that this assumes that you’re in the tripleo-quickstart repository.

Assuming $VIRTHOST is the host where you’ll do the deployment, this will leave you with a very minimal deployment: An undercloud, one controller, one compute, and a supplemental node where we deploy FreeIPA.

Because we’re using the master-tripleo-ci, this setup also deploys the latest promoted images. If you want to use the latest “stable” master deployment, you can use master instead. If you want to deploy Queens, you’ll merely use queens instead. So, for reference, here’s how to deploy a Queens environment:

./quickstart.sh --no-clone --teardown all --clean -p quickstart-extras.yml \
    -N config/nodes/1ctlr_1comp_1supp.yml \
    -c config/general_config/ipa.yml \
    -R queens \
    --tags all \
    $VIRTHOST

Lets also note that --tags all deploys the “whole thing”; meaning, it’ll also do the overcloud deployment. If you remove this, the quickstart will leave you with a deployed undercloud, and you can do the overcloud deployment yourself.

September 18, 2018 05:50 AM

September 14, 2018

Website and blog of Juan Antonio (Ozz) Osorio

TripleO Denver 2018 PTG notes

I recently had the opportunity to attend the OpenStack PTG at Denver. It’s always good to see folks face to face :D. Here are my notes on what I thought was relevant.

Edge edge edge

A big topic at the PTG has been Edge. From the Working Group perspective, the discussion was more focused on identifying the Edge cases and figuring out appropriate architectures for them, while at the same time trying to come up with gaps in the projects that we could tackle.

On the TripleO side, the discussion is slightly more focused, as the main issue is to figure out how to deploy a base architecture that will cover our most sought-after cases.

Currently, some “Edge” TripleO setups run an entire separate TripleO deployment (undercloud, overcloud and all) per edge site.

To address this issue and try to make the deployments more lightweight, there are several ideas and approaches:

  • One of the proposals is to move away from having each deployment be a heat stack, and instead leverage ansible to provide a base deployment, and fill in the gaps with variables. This has the advantage of making very lightweight deployments (heat stacks are expensive), and exposing very clear repeatable edge architectures.

  • Another idea was to have our baremetal deployment driven by metalsmith instead of Heat/Nova. This would reduce the load on Heat/Nova, and instead rely on Ansible. This would allow us to have some failure tolerance, where folks can deploy large amounts of machines, and the deployment won’t necessarily fail if one node fails.

  • There is the approach for split deployments, where each set of machines is represented by a stack. You would deploy the control plane, and subsequently you would deploy sets of computes (on each edge cluster), with their respective storage. The leaf nodes would require information (passwords, IPs and such) from the central control plane nodes’ stack. Ollie Walsh already has a POC working with this, and will continue working on ironing out this approach and making it more accessible. Ultimately, this approach seems like the most likely to come in the near future, since it relies on already existing things in TripleO, and only minimal changes are required.

To make TripleO be more “Edge-friendly” there are still a lot of things that we need to add to the engine. One of them is making routed networks available to TripleO’s internal networks, and not just the control plane. This will allow to have different network segments on the edge sites, while having our interfaces such as the ServiceNetMap work as expected with the segments.

Python 3

With the on-going OpenStack wide goal of switching to python 3, there are several implications and work items that we need to do in TripleO in order to get this properly working and tested.

Alex Schultz has been looking into this, and it seems that at least on the TripleO side we’re quite well off into having our tooling run on python 3. Being a deployment engine, however, we do depend on other projects supporting python 3.

To test all this, we also need to run our deployment in an environment that has python 3 by default. CentOS (what we currently use to test), doesn’t have this. So the proposal has been brought up to start building Fedora 28 container images. This way, we can move forward in our python 3-based deployments testing. This will require a lot of work though, since Kolla currently doesn’t build container images for Fedora. Alternatives will be investigated.

Standalone OpenStack with TripleO

Work has been done to get TripleO to deploy a standalone one-node OpenStack instance, such as Packstack is able to do. There were a bunch of folks trying it out and the results seem quite promising. Here is the relevant documentation. The big advantage in this is that it’ll enable developers to test out their features in a faster manner than the regular full blown deployment, allowing for faster deployment times and faster iterations. The reason being that it’s no longer an undercloud and an overcloud, but one node that contains the base OpenStack services.

We’ll also switch several of our multinode scenario jobs to run standalone deployments. This will enable us to have faster CI times and more lightweight testing environments. However, this also means that the scenarios will have a reduced set of services, given that the nodes we get from infra are quite limited. This will result in us introducing more scenarios to make up for this. However, we would still benefit from shorter CI runs.

tripleo-quickstart / -extras merging

This used to be kept separate for historical reasons, but in the near future, the plan is to merge these two repositories. This will make it easier for folks to make changes and find the relevant places to make such changes. Subsequently, if other projects (such as infrared), would like to use parts of tripleo-quickstart, these will be divided in roles in separate repos, as requests come.

Ansible

With us moving more and more into ansible, our current repo structure is getting more challenging to understand. Where does heat end and ansible begin?

To address this issue, efforts are being made to move the ansible bits into tripleo-specific ansible roles. Right now, the plan is to move each service to have its own repository with the relevant role. However, this plan is still open for discussion.

There is also the need to run tasks when deleting a host, or scaling down. This used to work since Heat used to manage the deployment, so we would run scripts based on heat triggering a DELETE action. With the move to Ansible, this no longer works. So a spec has been proposed in order to address this. The plan is to introduce new sections to the service deployment, and run specific ansible tasks on a specialized command that will execute the scale down. This will be very useful, for instance, when scaling down and needing to unregister the RHEL node.

Security

Password rotation

Password rotation for some services was broken when we moved our services to containers. Specifically there’s the issue of changing the master MySQL password, which, currently breaks as the new password is used, and we’re not able to set the new one. Steps are being taken to address this, and in order to avoid regressions, we’ll create a periodic job that will run this action. Here is where the standalone job approach shines, since we can have a fairly fast and lightweight job to only test this capability. Ultimately, we’ll want to notify folks that care about this job when it breaks, so the ability to notify specific people in a Squad will be added to CI.

Another issue that was brought up, is that password rotation requires service restarts. So there is no clean way in OpenStack in general to rotate passwords without service interruptions. Not a lot we can do in TripleO, but I’ll bring this up to the OpenStack TC to see if we can make this a community goal; similarly to the work that was done to make the “debug” option tunable on runtime.

SELinux support for containers

With the move to supporting podman as a container deployment tool, we are also looking into getting our containers to play nicely with SELinux. This work is being lead by Cédric Jeanneret and is a great improvement on TripleO security.

Unfortunately this is not so simple to test upstream, as we get our SELinux rules from RHEL, down to OpenStack.

The proposal to get better visibility on our support for SELinux is to enable better logging in our jobs. We’ll still run with SELinux in permissive mode, however, we can enable more logs and even notifications to the security squad whenever new breakages in the policy happen.

Secret management in TripleO

My team has been working in getting oslo.config to have the ability to fetch the values from different drivers. Castellan will be one of these drivers, which could subsequently use Vault to fetch data in a more secure manner.

This work is moving forward, however, time is soon coming to see how we’ll hook this up to TripleO.

This is not as straight-forward as it seems. If we want to keep the sensitive data to be as safe as possible (which is the whole point), we want to avoid duplicating this in other places (like heat or ansible) where it could end up unencrypted. One of the ideas was to bring up a temporary instance of Vault where we would store all the sensitive data, and eventually copy the encrypted database to the overcloud.

This is still quite raw, and we hope to solidify a sane approach in the coming months.

UX / UI

In a nutshell, there will be on-going work to make the CLI and the UI converge better, so they’ll use the same mistral interfaces and have similar workflows for doing things. This might result in breaking some old CLI commands in favor of workflows similar to what we do in the UI, however, this will reduce the testing matrix and hopefully the code-base as well.

Our validations framework will also be re-vamped, to uniformly depend on Mistral for running. This way, it can be leveraged from both the UI and the CI. The hope is to standardize and make validations part of the service definitions, this will make validations more visible to other developers and improve the experience.

Finally, work is coming for a framework for folks to be able to generate roles safely. The issue is that when building custom roles, it’s not apparent what services are needed, and what services can be deployed together, or even which services conflict (such as is the case for ODL). So having a tool to generate roles, and that contains enough information to resolve such metadata about the services, would be a great usability improvement for deployers.

Getting rid of Nova in the undercloud

It was brought up that there is on-going work to remove Nova and expose more explicit options for folks to deploy their baremetal nodes. This is quite beneficial to TripleO as it will make the undercloud a lot lighter than before, while also giving deployers more flexibility and features for their baremetal deployments. It also opens TripleO to the possibility of becoming a more general case baremetal provisioning framework. We already are able to deploy OpenShift on baremetal, hopefully the more this is used, the more use-cases and feature requests we get in order to make TripleO more usable for folks outside of OpenStack.

The baremetal deployment would be driven by a tool called metalsmith which leverages ironic, glance and neutron. Good progress has been already made, and there’s even a patch to enable this workflow.

While this work might land on Stein, it won’t be enabled by default, since there are still many things to figure out; such as how to upgrade from a heat stack that uses Nova resources, to the nova-less approach. Another thing to figure out is how to make the TLS everywhere workflow work without nova, since currently we rely on nova-metadata and a vendor-data plugin to make this work. Given the community seemed to have positive feelings about the metalsmith approach, it seems relevant that we come up with an alternative approach for TLS everywhere that we’ll introduce in the T release. Since we now have config-download as a default, using Ansible to make TLS everywhere work is probably the way to go.

Major OS upgrades

In an effort to make TripleO handle more and more scenarios, and to make operator’s lives easier, it’s only a natural step that TripleO also manages major OS upgrades. Currently our major upgrade workflow only handles major OpenStack version upgrades, but we haven’t taken into account major version upgrades for the Operating System. This type of workflow is quite complex.

In a nutshell, the proposed solution, while destructive in some ways, is probably the only sane way to go.

The current plan is:

  • Tear down and unprovision the first controller baremetal node. (if the node would be enrolled to FreeIPA, here we could delete it).

  • Get ironic to provision the node again with the new OS installed.

  • Update the OpenStack RPMs.

  • Pull the new containers.

  • Stop pacemaker services on the other controllers.

  • Back up the database from one of the other controllers onto the first controller, which has been updated already.

  • Run per-service upgrade steps.

  • Upgrade the database (mariaDB).

  • Restart pacemaker on the first controller.

  • Force galera restart.

  • Run the regular deployment steps.

  • Shut down vrouters on the rest of the controllers.

  • Delete and unprovision the rest of the controllers

  • Add them to the pacemaker cluster.

  • ???

  • profit.

This was a sketch of the rough plan that came out of a long discussion about this. Several other options were discussed (such as a big-bang approach that unprovisions all the nodes and puts them up at the same time). However, this seemed to address most of the concerns that people came up with.

A blueprint will be written with a more structured workflow and hopefully we’ll have a working solution in the future.

September 14, 2018 09:04 PM

September 08, 2018

RDO Blog

Interviews at OpenStack PTG Denver

I’m attending PTG this week to conduct project interviews. These interviews have several purposes. Please consider all of the following when thinking about what you might want to say in your interview:

  • Tell the users/customers/press what you’ve been working on in Rocky
  • Give them some idea of what’s (what might be?) coming in Stein
  • Put a human face on the OpenStack project and encourage new participants to join us
  • You’re welcome to promote your company’s involvement in OpenStack but we ask that you avoid any kind of product pitches or job recruitment

In the interview I’ll ask some leading questions and it’ll go easier if you’ve given some thought to them ahead of time:

  • Who are you? (Your name, your employer, and the project(s) on which you are active.)
  • What did you accomplish in Rocky? (Focus on the 2-3 things that will be most interesting to cloud operators)
  • What do you expect to be the focus in Stein? (At the time of your interview, it’s likely that the meetings will not yet have decided anything firm. That’s ok.)
  • Anything further about the project(s) you work on or the OpenStack community in general.

Finally, note that there are only 40 interview slots available, so please consider coordinating with your project to designate the people that you want to represent the project, so that we don’t end up with 12 interviews about Neutron, or whatever.

I mean, LOVE me some Neutron, but let’s give some other projects love, too.

It’s fine to have multiple people in one interview – maximum 3, probably.

Interview slots are 30 minutes, in which time we hope to capture somewhere between 10 and 20 minutes of content. It’s fine to run shorter, but 15 minutes is probably an ideal length.

by Rain Leander at September 08, 2018 04:36 PM

September 04, 2018

Website and blog of Juan Antonio (Ozz) Osorio

Adding a custom HAProxy endpoint in TripleO

Typically, when you want to add a new service to TripleO, there’s a bunch of files you need to touch, both in tripleo-heat-templates and some puppet code too.

Unfortunately this has made it quite tedious to add new services to TripleO, as you need to modify puppet-tripleo’s haproxy manifest to add your service.

A while ago, I added a nice trick so you could do this dynamically via hieradata. This code stayed there for a while without many people paying attention to it, and, wrongly, I also didn’t document it. But what this gives you is that you no longer need to touch puppet at all to enable a new endpoint in HAProxy.

So, in your service template’s service_config_settings section, you’ll need to add the following:

    service_config_settings:
      haproxy:
        ...
        tripleo::my_service_name::haproxy_endpoints:
            my_service_name:
                public_virtual_ip: "%{hiera('public_virtual_ip')}"
                internal_ip: "%{hiera('my_service_name_vip')}"
                service_port: {get_param: MyServicePublicPort}
                public_ssl_port: {get_param: MyServicePublicSSLPort}
                member_options: [ 'check', 'inter 2000', 'rise 2', 'fall 5' ]
                haproxy_listen_bind_param: ['transparent']

Here, service_config_settings is used because we specifically want to add this hieradata to nodes that deploy haproxy.

In this example, my_service_name is the service_name from the service template. It has to match in order for the resource to properly fill the ip_addresses and server_names parameters. Else, you’ll have to manually set up the needed values to fill those parameters.

Also, it is important to know that, if you added your service to the ServiceNetMap (which you can add by passing your service via that parameter in heat), there will be some hiera keys enabled for you. For instance, lets say that you added a service entry as follows:

parameter_defaults:
    ServiceNetMap:
        my_service_name: internal_api

This would mean that you added your service to run on the internal API network in TripleO. Thus, you’ll get a hiera key called my_service_name_vip, which will have the value of the Virtual IP associated to the internal API network.

To know and take better use of all the available options, I recommend reading the puppet resource’s code that actually creates the HAProxy endpoint.

It is also important to note that TripleO already fills up some defaults for your application:

  Tripleo::Haproxy::Endpoint {
    haproxy_listen_bind_param   => $haproxy_listen_bind_param,
    member_options              => $haproxy_member_options,
    public_certificate          => $service_certificate,
    use_internal_certificates   => $use_internal_certificates,
    internal_certificates_specs => $internal_certificates_specs,
    listen_options              => $default_listen_options,
    manage_firewall             => $manage_firewall,
  }

From these, it is important to know that the certificates will be filled up for you, so you don’t need to add them.

Stein update

There are some services that need two or more endpoints; for these, it’s not possible to make the endpoints’ names match the service_name parameter. For these cases, I added the base_service_name parameter.

By setting base_service_name to match the service_name of the service you want to load balance, the ip_addresses and the server_names parameters will be filled out auto-magically. This makes it easier to add customized endpoints to load balance your service.

Lets take an example from the following patch, which adds HAProxy endpoints to load balance OpenShift’s infra endpoints. This adds two endpoints in HAProxy, which will listen on specific ports, and forward the traffic towards the nodes that contain the openshift_infra service.

      service_config_settings:
        haproxy:
          tripleo::openshift_infra::haproxy_endpoints:
            openshift-router-http:
              base_service_name: openshift_infra
              public_virtual_ip: "%{hiera('public_virtual_ip')}"
              internal_ip: "%{hiera('openshift_infra_vip')}"
              service_port: 80
              listen_options:
                balance: 'roundrobin'
              member_options: [ 'check', 'inter 2000', 'rise 2', 'fall 5' ]
              haproxy_listen_bind_param: ['transparent']
            openshift-router-https:
              base_service_name: openshift_infra
              public_virtual_ip: "%{hiera('public_virtual_ip')}"
              internal_ip: "%{hiera('openshift_infra_vip')}"
              service_port: 443
              listen_options:
                balance: 'roundrobin'
              member_options: [ 'check', 'inter 2000', 'rise 2', 'fall 5' ]
              haproxy_listen_bind_param: ['transparent']

September 04, 2018 01:07 PM

August 30, 2018

RDO Blog

Community Blog Round Up: 30 August

There have only been four articles in the past month? YES! But brace yourself. Release is HERE! Today is the official release day of OpenStack’s latest version, Rocky. And, sure, while we only have four articles for today’s blogroll, we’re about to get a million more posts as everyone installs, administers, uses, reads, inhales, and embraces the latest version of OpenStack. Please enjoy John’s personal system for running TripleO Quickstart at home as well as how to update ceph-ansible in a containerized undercloud, inhale Gonéri’s introduction to distributed CI and InfraRed, a tool to deploy and test OpenStack, and experience Jiří’s instructions to upgrade ceph and OpenShift Origin with TripleO.

Photo by Anderson Aguirre on Unsplash

PC for tripleo quickstart by John

I built a machine for running TripleO Quickstart at home.

Read more at http://blog.johnlikesopenstack.com/2018/08/pc-for-tripleo-quickstart.html

Distributed-CI and InfraRed by Gonéri Le Bouder

Red Hat OpenStack QE team maintains a tool to deploy and test OpenStack. This tool can deploy different types of topologies and is very modular. You can extend it to cover some new use-case. This tool is called InfraRed and is a free software and is available on GitHub.

Read more at https://blogs.rdoproject.org/2018/08/distributed-ci-and-infrared/

Updating ceph-ansible in a containerized undercloud by John

In Rocky the TripleO undercloud will run containers. If you’re using TripleO to deploy Ceph in Rocky, this means that ceph-ansible shouldn’t be installed on your undercloud server directly because your undercloud server is a container host. Instead ceph-ansible should be installed on the mistral-executor container because, as per config-download, that is the container which runs ansible to configure the overcloud.

Read more at http://blog.johnlikesopenstack.com/2018/08/updating-ceph-ansible-in-containerized.html

Upgrading Ceph and OKD (OpenShift Origin) with TripleO by Jiří Stránský

In OpenStack’s Rocky release, TripleO is transitioning towards a method of deployment we call config-download. Basically, instead of using Heat to deploy the overcloud end-to-end, we’ll be using Heat only to manage the hardware resources and Ansible tasks for individual composable services. Execution of software configuration management (which is Ansible on the top level) will no longer go through Heat, it will be done directly. If you want to know details, i recommend watching James Slagle’s TripleO Deep Dive about config-download.

Read more at https://www.jistr.com/blog/2018-08-15-upgrading-ceph-and-okd-with-tripleo/

by Rain Leander at August 30, 2018 11:00 AM

August 29, 2018

John Likes OpenStack

PC for tripleo quickstart

I built a machine for running TripleO Quickstart at home.

My complete part list is on pcpart picker, with the exception of the extra Noctua NM-AM4 Mounting Kit and the video card (which I only used to install the OS).

I also have photos from when I built it.

My nodes.yaml gives me:

  • Three 9GB 2CPU controller nodes
  • Three 6GB 2CPU ceph storage nodes
  • One 3GB 2CPU compute node (that's enough to spawn one nested VM for a quick test)
  • One 13GB 8CPU undercloud node

That leaves less than 2GB of RAM for the hypervisor, and all 16 vCPUs (8 cores * 2 threads) are assigned to VMs, so I'm pushing it a little.

When using this system with the same nodes.yaml, my run times are as follows for Rocky RC1:

  • undercloud install of rocky: 43m44.118s
  • overcloud install of rocky: 49m51.369s

by John (noreply@blogger.com) at August 29, 2018 09:09 PM

Website and blog of Juan Antonio (Ozz) Osorio

Dissecting TripleO service templates (part 4)

In this series of blog posts, I’ve been covering all the different sections of the service templates for TripleO.

To recap:

  • On the first part I covered the bare-minimum sections you need for your template.

  • On the second part I covered the sections that allow you to use Ansible to write and customize your service.

  • On the third part I covered the sections that allow you to write a containerized service for TripleO.

Here I cover “the rest”.

TripleO offers a lot of options to modify and configure your services. With all this flexibility, we have needed to cover different common cases, as well as some not-so-common ones. It is also worth noting that TripleO is meant to manage the day 2 operations of your OpenStack cloud, so this also needs to be covered.

Now that you know the basics, lets briefly cover the more advanced sections of the service templates.

The overall view

While looking at other service templates is a good way to see what’s being done and how one can do things, if you want to know all the available options there is actually a file where this is gathered: common/services.yaml

Here you’ll find where the ResourceChain is called, so from this you can derive the mandatory parameters from the templates. You’ll also find what outputs are gathered and how.

With this in mind, we can now continue and dive into the rest of the sections. Given the diversity of the outputs that are left to cover, I’ll try to divide them into sections.

Extra hieradata options

These options, similarly to the config_settings section mentioned in part 1, set up appropriate hieradata, however, their usage and behavior varies.

global_config_settings section

While config_settings sets up hieradata for the role where the service is deployed, global_config_settings allows you to output the needed hieradata in all nodes of the cluster.

service_config_settings section

Allows you to output hieradata to wherever a certain service is configured. This is especially useful if your service can be a backend for another service. Lets take Barbican as an example:

service_config_settings:
  ...
  nova_compute:
    nova::compute::keymgr_backend: >
      castellan.key_manager.barbican_key_manager.BarbicanKeyManager
    nova::compute::barbican_endpoint:
      get_param: [EndpointMap, BarbicanInternal, uri]
    nova::compute::barbican_auth_endpoint:
      get_param: [EndpointMap, KeystoneInternal, uri_no_suffix]
  cinder_api:
    cinder::api::keymgr_backend: >
      castellan.key_manager.barbican_key_manager.BarbicanKeyManager
    cinder::api::keymgr_encryption_api_url:
      get_param: [EndpointMap, BarbicanInternal, uri]
    cinder::api::keymgr_encryption_auth_url:
      get_param: [EndpointMap, KeystoneInternal, uri_no_suffix]

In this case, the Barbican service template explicitly configures the nova_compute and cinder_api services by setting hieradata to wherever they’re at. This way, if someone enables barbican, we automatically enable the volume encryption feature.

Update-related options

These sections belong to the update workflow, which is an update within the same version (passing from one version to another is called an upgrade).

update_tasks section

Similarly to the deploy_steps_tasks mentioned in part 2, these are Ansible tasks that run on the node. However, these run as part of the updates workflow at the beginning of the Ansible run. So, if you’re acquainted with this workflow, these run at the beginning of the openstack overcloud update run command, which runs the Ansible playbook for updates. After this, host_prep_tasks and subsequently deploy_step_tasks run. Finalizing with the post_update_tasks section.

To summarize, if your application needs to execute some actions when a minor update is executed, which needs to happen before the TripleO steps, then you need this section.
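
A hedged sketch of what this could look like for a hypothetical service (the service name and the step it hooks into are illustrative):

update_tasks:
  - name: Stop my_service before updating packages
    when: step|int == 1
    service:
      name: my_service
      state: stopped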

post_update_tasks section

As mentioned in the previous section, this runs at the end of the minor update workflow. You might need this section for your service if you need to execute some ansible tasks after the TripleO steps.

external_update_tasks section

While at the time of writing this no service is using this section, it might prove to be useful at some point. This is fairly similar to the update_tasks section, except that this runs on the node that runs the playbook (typically the undercloud). So, it also runs before the TripleO steps as part of the updates workflow. This is meant for services that are deployed with the external_deploy_tasks section, which was mentioned in part 2.

Upgrade-related options

These sections belong to the upgrade workflow. Similarly to the update workflow mentioned before, this is also powered via Ansible and has a similar call path, with actions that run before and after the steps.

upgrade_tasks section

Similarly to update_tasks, this runs before the TripleO steps, but in the upgrade workflow.

post_upgrade_tasks section

Similarly to post_update_tasks, this runs after the TripleO steps, but in the upgrade workflow.

external_upgrade_tasks section

Similarly to external_update_tasks, this runs before the TripleO steps, but in the host that’s calling the ansible playbook and on the upgrade workflow.

pre_upgrade_rolling_tasks section

This runs before upgrade_tasks (which already runs before the TripleO steps) and is executed in a node-by-node rolling fashion at the beginning of the major upgrade workflow.

This is quite a special case, where you need to take special care and do so in a node-by-node fashion, such as was the case when upgrading the neutron services on to containerized services. This made sure that instance connectivity wasn’t lost.

Fast Forward upgrades options

The following sections belong to the Fast forward upgrades workflow, which updates from one release up to 3 releases forward (N -> N+3).

fast_forward_upgrade_tasks section

These are carried out in steps, and are also split by release. So, moving from release to release, you’ll need to specify which tasks are executed for what release, and in what step of the deployment they’re executed.

There is a maximum of 9 steps, and they are divided into two categories.

From steps 0 to 3, these are considered prep tasks, so they’re run on all nodes containing that service.

After this, from steps 4 to 9, these are bootstrap tasks, so they’re run only on one node that contains the service.

For more information on this, the developer documentation is quite relevant, and the commit that introduced this has a great explanation.
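
To give an idea of the shape of these tasks, here is a rough, hypothetical sketch (the service name, step and release are illustrative; the documentation linked above has the authoritative details):

fast_forward_upgrade_tasks:
  - name: Stop my_service during the prep steps
    when:
      - step|int == 1
      - release == 'ocata'
    service:
      name: my_service
      state: stopped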

fast_forward_post_upgrade_tasks section

Similarly to the updates and upgrades _post_*_tasks sections, this runs after the TripleO steps on the FFU upgrades and does certain ansible tasks.

Other options

external_post_deploy_tasks section

In synergy with the external_deploy_tasks section described in part 2, and similarly to other *_post_* sections, external_post_deploy_tasks executes ansible tasks on the node that runs the Ansible playbook, and this happens after the TripleO steps.

monitoring_subscription section

This is used for sensu integration. For all of the services, the subscription names will be gathered and set up for the sensu client subscriptions.
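
In practice this is usually a one-liner in the roles_data output; a hedged sketch (the parameter name is hypothetical) would be:

monitoring_subscription: {get_param: MonitoringSubscriptionMyService}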

service_metadata_settings section

This belongs to the TLS everywhere workflow and is better described in the developer documentation. But, in a nutshell, this controls how the service principals are created in FreeIPA via novajoin. Eventually this information gets passed from the templates via nova-metadata to the novajoin vendordata plugin, which subsequently calls FreeIPA and generates the necessary service principals.

docker_config_scripts section

This section is meant to create scripts that will be persisted in the /var/lib/docker-config-scripts directory. It takes the following options:

  • mode: The mode number that defines the file permissions.

  • content: The actual content of the script.
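
A minimal sketch of such an entry (the script name and its contents are purely illustrative):

docker_config_scripts:
  my_service_setup.sh:
    mode: "0755"
    content: |
      #!/bin/bash
      echo "one-off setup task for my_service"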

workflow_tasks section

Allows you to execute Mistral actions or workflows for a specific service. It was primarily used by Ceph when it was introduced, but it changed to use Ansible directly instead.

An example would be the following:

  workflow_tasks:
    step2:
      - name: echo
        action: std.echo output=Hello
    step3:
      - name: external
        workflow: my-pre-existing-workflow-name
        input:
          workflow_param1: value
          workflow_param2: value

cellv2_discovery section

This is meant to be a boolean flag that indicates if a node should be considered for cellv2 discovery. Mostly, the nova-compute and ironic services set this flag in order for t-h-t to consider adding them to the list of nodes. Chances are you don’t need to set this flag at all, unless you write a service that overrides the nova-compute service.

Deprecated or unused parameters

Finally, the following parameters are deprecated, unused, or slated for deprecation. I’m adding them here in case you have Queens or newer templates, in the hope that they don’t confuse you.

The following sections used to be for fluentd integration:

  • logging_sources

  • logging_groups

These are no longer used, and instead, this integration is now done via hieradata directly.

August 29, 2018 04:35 AM

August 28, 2018

Website and blog of Juan Antonio (Ozz) Osorio

Dissecting TripleO service templates (part 3)

In this series of blog posts, I’ve been covering all the different sections of the service templates for TripleO.

To recap:

  • On the first part I covered the bare-minimum sections you need for your template.

  • On the second part I covered the sections that allow you to use Ansible to write and customize your service.

This covers the sections that allow you to write a containerized service for TripleO.

Containerized services brought a big change to TripleO. From packaging puppet manifests and relying on them for configuration, we now have to package containers, make sure the configuration ends up in the container somehow, then run the containers. Here I won’t describe the whole workflow of how we containerized OpenStack services, but instead I’ll describe what you need to know to deploy a containerized service with TripleO.

Lets take a look at an example. Here’s the output section of the containerized etcd template in t-h-t:

outputs:
  role_data:
    description: Role data for the etcd role.
    value:
      service_name: {get_attr: [EtcdPuppetBase, role_data, service_name]}
      ...
      config_settings:
        map_merge:
          - {get_attr: [EtcdPuppetBase, role_data, config_settings]}
          - etcd::manage_service: false
      # BEGIN DOCKER SETTINGS
      puppet_config:
        config_volume: etcd
        config_image: &etcd_config_image {get_param: DockerEtcdConfigImage}
        step_config:
          list_join:
            - "\n"
            - - "['Etcd_key'].each |String $val| { noop_resource($val) }"
              - get_attr: [EtcdPuppetBase, role_data, step_config]
      kolla_config:
        /var/lib/kolla/config_files/etcd.json:
          command: /usr/bin/etcd --config-file /etc/etcd/etcd.yml
          config_files:
            - source: "/var/lib/kolla/config_files/src/*"
              dest: "/"
              merge: true
              preserve_properties: true
          permissions:
            - path: /var/lib/etcd
              owner: etcd:etcd
              recurse: true
      docker_config:
        step_2:
          etcd:
            image: {get_param: DockerEtcdImage}
            net: host
            privileged: false
            restart: always
            healthcheck:
              test: /openstack/healthcheck
            volumes:
              - /var/lib/etcd:/var/lib/etcd
              - /etc/localtime:/etc/localtime:ro
              - /var/lib/kolla/config_files/etcd.json:/var/lib/kolla/config_files/config.json:ro
              - /var/lib/config-data/puppet-generated/etcd/:/var/lib/kolla/config_files/src:ro
            environment:
              - KOLLA_CONFIG_STRATEGY=COPY_ALWAYS
      docker_puppet_tasks:
        # Etcd keys initialization occurs only on single node
        step_2:
          config_volume: 'etcd_init_tasks'
          puppet_tags: 'etcd_key'
          step_config:
            get_attr: [EtcdPuppetBase, role_data, step_config]
          config_image: *etcd_config_image
          volumes:
            - /var/lib/config-data/etcd/etc/etcd/:/etc/etcd:ro
            - /var/lib/etcd:/var/lib/etcd:ro
...

Here, we can already see some familiar sections:

  • service_name: Typically you want the service name to match the non-containerized service’s name, so here we directly call the service_name output.

  • config_settings: Since we still use puppet to configure the service, we still take the hieradata into use, so we use the same hieradata as the non-containerized template, and add any extra hieradata we need.

After these, the rest are container-only sections. So lets describe them in more detail

puppet_config section

As I mentioned in a previous blog post, before getting into the steps where TripleO starts running services and containers, there is a step where puppet is run in containers and all the needed configurations are created. The puppet_config section controls this step.

There are several options we can pass here:

  • puppet_tags: This describes the puppet resources that will be allowed to run in puppet when generating the configuration files. Note that deeper knowledge of your manifests and what runs in puppet is required for this. Else, it might be better to generate the configuration files with Ansible with the mechanisms described in a previous blog post. Any service that specifies tags will have the default tags of 'file,concat,file_line,augeas,cron' appended to the setting. To know what settings to set here, as mentioned, you need to know your puppet manifests. But, for instance, for keystone, an appropriate setting would be: keystone_config. For our etcd example, no tags are needed, since the default tags we set here are enough.

  • config_volume: The name of the directory where configuration files will be generated for this service. You’ll eventually use this to know what location to bind-mount into the container to get the configuration. So, the configuration will be persisted in: /var/lib/config-data/puppet-generated/<config_volume>

  • config_image: The name of the container image that will be used for generating configuration files. This is often the same container that the runtime service uses. Some services share a common set of config files which are generated in a common base container. Typically you’ll get this from a parameter you pass to the template, e.g. <Service name>Image or <Service name>ConfigImage. Dealing with these images requires dealing with the container image prepare workflow. The parameter should point to the specific image to be used, and it’ll be pulled from the registry as part of the deployment.

  • step_config: Similarly to the step_config that I described in the first blog post of this series, this setting controls the puppet manifest that is run for this service. The aforementioned puppet tags are used along with this manifest to generate a config directory for this container.
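
Putting these options together, here is a hedged sketch of what a puppet_config section could look like for a keystone-like service (the KeystoneBase resource and the DockerKeystoneConfigImage parameter are illustrative assumptions, not necessarily the exact names in the real template):

puppet_config:
  config_volume: keystone
  puppet_tags: keystone_config
  step_config:
    get_attr: [KeystoneBase, role_data, step_config]
  config_image: {get_param: DockerKeystoneConfigImage}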

One important thing to note is that, if you’re creating a containerized service, you don’t need to output a step_config section from the role_data output. TripleO figures out whether you’re creating a containerized service by checking for the existence of the docker_config section in the role_data output.

kolla_config section

As you might know, TripleO uses kolla to build the container images. Kolla, however, not only provides the container definitions, but also a rich framework to extend and configure your containers. Part of this is an entry point that receives a configuration file, with which you can modify several aspects of the container on start-up. We take advantage of this in TripleO, and it’s exactly what the kolla_config section represents.

For each container we create, we have a relevant kolla_config entry, with a mapping key that has the following format:

/var/lib/kolla/config_files/<container name>.json

This contains YAML that describes how to map config files into the container. In the container, this typically ends up mapped as /var/lib/kolla/config_files/config.json, which kolla will read.

The typical configuration settings we use here are the following (a sketch combining them follows the list):

  • command: This defines the command we’ll be running on the container. Typically it’ll be the command that runs the “server”. So, in the example you see /usr/bin/etcd ..., which will be the main process running.

  • config_files: This tells kolla where to read the configuration files from, and where to persist them to. Typically, the configuration generated by puppet is mounted read-only from the host at /var/lib/kolla/config_files/src and subsequently copied onto the right location by the kolla mechanisms. This way we make sure that the container has the right permissions for the right user, given that we’ll typically be in another user namespace in the container.

  • permissions: As you would expect, this sets up the appropriate permissions for a file or set of files in the container.
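
As an illustration, a hedged kolla_config entry for the etcd example could look roughly like the following (the exact command flags and file ownership are assumptions made for the sake of the sketch):

kolla_config:
  /var/lib/kolla/config_files/etcd.json:
    command: /usr/bin/etcd --config-file /etc/etcd/etcd.yml
    config_files:
      - source: "/var/lib/kolla/config_files/src/*"
        dest: "/"
        merge: true
        preserve_properties: true
    permissions:
      - path: /var/lib/etcd
        owner: etcd:etcd
        recurse: true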

docker_config section

This is the section where we tell TripleO what containers to start. Here, we explicitly write on which step to start which container. Steps are set as keys with the step_<step number> format. Inside these, we should set up keys with the specific container names. In our example, we’re running only the etcd container, so we use a key called etcd to give it such a name. A tool called paunch will read these parameters, and start the containers with those settings.

In our example, this is the container definition:

step_2:
  etcd:
    image: {get_param: DockerEtcdImage}
    net: host
    privileged: false
    restart: always
    healthcheck:
      test: /openstack/healthcheck
    volumes:
      - /var/lib/etcd:/var/lib/etcd
      - /etc/localtime:/etc/localtime:ro
      - /var/lib/kolla/config_files/etcd.json:/var/lib/kolla/config_files/config.json:ro
      - /var/lib/config-data/puppet-generated/etcd/:/var/lib/kolla/config_files/src:ro
    environment:
      - KOLLA_CONFIG_STRATEGY=COPY_ALWAYS

This is what we’re telling TripleO to do:

  • Start the container on step 2

  • Use the container image coming from the DockerEtcdImage heat parameter.

  • For the container, use the host’s network.

  • The container is not privileged.

  • Docker will use the /openstack/healthcheck endpoint for healthchecking

  • We tell it what volumes to mount

    • Aside from the necessary mounts, note that we’re bind-mounting the file /var/lib/kolla/config_files/etcd.json on to /var/lib/kolla/config_files/config.json. This will be read by kolla in order for the container to execute the actions we configured in the kolla_config section.

    • We also bind-mount /var/lib/config-data/puppet-generated/etcd/, which is where the puppet run (which was executed inside a container) persisted the needed configuration files. We bind-mount this to /var/lib/kolla/config_files/src since we told kolla to copy it to the correct location inside the container in the config_files section that’s part of kolla_config.

  • Environment tells the container engine which environment variables to set

    • We set KOLLA_CONFIG_STRATEGY=COPY_ALWAYS in the example, since this tells kolla to always execute the config_files and permissions directives as part of the kolla entry point. If we don’t set this, it will only be executed the first time we run the container.

docker_puppet_tasks section

These are containerized puppet executions that are meant as bootstrapping tasks. They typically run on a “bootstrap node”, meaning they only run on one relevant node in the cluster, and are meant for actions that you should only execute once. Examples of this are: creating keystone endpoints, creating keystone domains, creating the database users, etc.

The format is quite similar to the one described in the puppet_config section, except that you can set several of these, and they also run as part of the steps (divided by the step_<step number> keys).

Conclusion

With these sections you can create service templates for your containerized services. If you plan to develop a containerized service, I suggest you also read the guide on the containerized deployment from the TripleO documentation.

With this, we have covered all the need-to-know sections for you to be effective with TripleO templates. There are still several other sections you can make use of; I’ll cover the rest in a subsequent blog post.

August 28, 2018 04:38 AM

August 27, 2018

Website and blog of Juan Antonio (Ozz) Osorio

Dissecting TripleO service templates (part 2)

In the previous blog post we covered the bare-minimum pieces you need to create a service. It was also briefly mentioned that you can use Ansible in order to configure your service.

In the blog post about the steps in which TripleO deploys services you can notice that there are three main bits in the deployment where Ansible is ran:

  • Host prep deployment (host_prep_tasks in the templates)

  • External deploy tasks (external_deploy_tasks in the templates)

  • Deploy steps tasks (deploy_steps_tasks in the templates)

Please note that External deploy tasks and deploy step tasks are only available since the Rocky release. So, if you need to write templates for Queens, this is not yet available.

Let's describe these better:

Host prep deployment (or host prep tasks)

This is seen as host_prep_tasks in the deployment service templates. These are Ansible tasks that run before the configuration steps start, and before any major services are configured (such as pacemaker). Here you would put actions such as wiping out your disk, or migrating log files.

Let's look at the output section of the example from the previous blog post:

outputs:
  role_data:
    description: Role data for the RHSM service.
    value:
      service_name: rhsm
      config_settings:
        tripleo::rhsm::firewall_rules: {}
      upgrade_tasks: []
      step_config: ''
      host_prep_tasks:
        - name: Red Hat Subscription Management configuration
          vars: {get_attr: [RoleParametersValue, value, vars]}
          block:
          - include_role:
              name: redhat-subscription

Here we see that an Ansible role is called directly from the host_prep_tasks section. In this case, we’re setting up the Red Hat subscription for the node where this is running. We would definitely want this to happen in the very beginning of the deployment, so host_prep_tasks is an appropriate place to put it.

External deploy tasks

These are Ansible tasks that take place on the node where you executed the “overcloud deploy”. You’ll find these in the service templates in the external_deploy_tasks section. These actions are also run as part of the deployment steps, so you’ll have the step fact available in order to limit the Ansible tasks to only run on a specific step. Note that this runs on each step before the “deploy steps tasks”, the puppet run, and the container deployment.

Typically you’ll see this used when, to configure a service, you need to execute an Ansible role that has special requirements for the Ansible inventory.

Such is the case for deploying OpenShift on baremetal via TripleO. The Ansible role for deploying OpenShift requires several hosts and groups to exist in the inventory, so we set those up in external_deploy_tasks:

...

- name: generate openshift inventory for openshift_master service
  copy:
    dest: "{{playbook_dir}}/openshift/inventory/{{tripleo_role_name}}_openshift_master.yml"
    content: |
      {% if master_nodes | count > 0%}
      masters:
        hosts:
        {% for host in master_nodes %}
        {{host.hostname}}:
            {{host | combine(openshift_master_node_vars) | to_nice_yaml() | indent(6)}}
        {% endfor %}
      {% endif %}

      {% if new_masters | count > 0 %}
      new_masters:
        hosts:
        {% for host in new_masters %}
        {{host.hostname}}:
            {{host | combine(openshift_master_node_vars) | to_nice_yaml() | indent(6)}}
        {% endfor %}

      new_etcd:
        children:
          new_masters: {}
      {% endif %}

      etcd:
        children:
          masters: {}

      OSEv3:
        children:
          masters: {}
          nodes: {}
          new_masters: {}
          new_nodes: {}
          {% if groups['openshift_glusterfs'] | default([]) %}glusterfs: {}{% endif %}

In the case of OpenShift, Ansible itself is also called as a command from here, using variables and the inventory that’s generated in this section. This way we don’t need to mix the inventory that the overcloud deployment itself is using with the inventory that the OpenShift deployment uses.
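For illustration only, invoking the external installer from external_deploy_tasks could look roughly like the sketch below; the playbook path, the step number and the inventory location are assumptions, not the exact openshift-ansible integration:

- name: run openshift-ansible against the generated inventory
  when: step == '3'    # assumed step
  shell: |
    ansible-playbook \
      -i "{{ playbook_dir }}/openshift/inventory" \
      /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml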

Deploy steps tasks

These are Ansible tasks that take place on the overcloud nodes. Note that, like any other service, these tasks will only execute on the nodes whose role has this service enabled. You’ll find this as the deploy_steps_tasks section in the service templates. These actions are also run as part of the deployment steps, so you’ll have the step fact available in order to limit the Ansible tasks to only run on a specific step. Note that on each step, this runs after the “external deploy tasks”, but before the puppet run and the container deployment.

Typically you’ll run quite simple tasks in this section, such as setting the boot parameters for the nodes, although you can also run more complex roles, such as the IPSec service deployment for TripleO:

...
- name: IPSEC configuration on step 1
  when: step == '1'
  block:
  - include_role:
      name: tripleo-ipsec
    vars:
      map_merge:
      - ipsec_configure_vips: false
        ipsec_skip_firewall_rules: false
      - {get_param: IpsecVars}
...

This type of deployment applies for services that are better tied to TripleO’s Ansible inventory or that don’t require a specific inventory to run.

Conclusion

With these three options you can already build quite powerful service templates powered by Ansible. Please note that full support for “external deploy tasks” and “deploy steps tasks” came on the Rocky release; so this is not available for Queens.

Finally, in the next part of the series, I’ll describe all the relevant parts of the service template in order to deploy containerized services as implemented in TripleO.

August 27, 2018 12:18 PM

Dissecting TripleO service templates (part 1)

The purpose of this blog post is to dissect TripleO service templates and explain what each part does and why it’s there.

Please note that it’s important to know how to write Heat templates before continuing this guide; else you’re gonna be quite confused and won’t get the most benefit out of this.

As I mentioned in a previous blog post, all the service definitions for TripleO live in tripleo-heat-templates. At the time of writing this, we have three main directories where we can find these service definitions:

  • puppet/services/

  • docker/services/

  • extraconfig/services/

But, looking at the services in these directories can be quite confusing… Even knowing that the role_data output is the main thing, and that it has several options, it’s hard to discern what all these sections actually do; which are mandatory and which are optional; and even, in what cases are parameters mandatory and in which cases they aren’t. There’s a lot of legacy in these templates, and so, I thought trying to give some explanation for them would be a good idea.

What’s the bare-minimum?

Before digging into details, it’s always good to know what the bare minimum is. So let's look at a very minimal service template, rhsm.yaml:

heat_template_version: rocky

description: Configure Red Hat Subscription Management.

parameters:
  RoleNetIpMap:
    default: {}
    type: json
  ServiceData:
    default: {}
    description: Dictionary packing service data
    type: json
  ServiceNetMap:
    default: {}
    description: Mapping of service_name -> network name. Typically set
                 via parameter_defaults in the resource registry.  This
                 mapping overrides those in ServiceNetMapDefaults.
    type: json
  DefaultPasswords:
    default: {}
    type: json
  RoleName:
    default: ''
    description: Role name on which the service is applied
    type: string
  RoleParameters:
    default: {}
    description: Parameters specific to the role
    type: json
  EndpointMap:
    default: {}
    description: Mapping of service endpoint -> protocol. Typically set
                 via parameter_defaults in the resource registry.
    type: json
  RhsmVars:
    default: {}
    description: Hash of ansible-role-redhat-subscription variables
                 used to configure RHSM.
    # The parameters contains sensible data like activation key or password.
    hidden: true
    tags:
      - role_specific
    type: json

resources:
  # Merging role-specific parameters (RoleParameters) with the default parameters.
  # RoleParameters will have the precedence over the default parameters.
  RoleParametersValue:
    type: OS::Heat::Value
    properties:
      type: json
      value:
        map_replace:
          - map_replace:
            - vars: RhsmVars
            - values: {get_param: [RoleParameters]}
          - values:
              RhsmVars: {get_param: RhsmVars}

outputs:
  role_data:
    description: Role data for the RHSM service.
    value:
      service_name: rhsm
      config_settings:
        tripleo::rhsm::firewall_rules: {}
      upgrade_tasks: []
      step_config: ''
      host_prep_tasks:
        - name: Red Hat Subscription Management configuration
          vars: {get_attr: [RoleParametersValue, value, vars]}
          block:
          - include_role:
              name: redhat-subscription

Lets go piece by piece and explain what’s going on.

Version and description

As with any other heat template, you do need to specify the heat_template_version, and preferably give a description of what the stack/template does.

Parameters

You’ll notice that there are a bunch of heat parameters defined in this template that are not necessarily used. This is because service templates are created in the form of a heat resource chain object. This type of object can create a “chain”, or a set of objects, with the same parameters, and gather their outputs. So, eventually we pass the same mandatory parameters to the chain. This happens in the common/services.yaml file. Let's take a look and see how this is called:

  ServiceChain:
    type: OS::Heat::ResourceChain
    properties:
      resources: {get_param: Services}
      concurrent: true
      resource_properties:
        ServiceData: {get_param: ServiceData}
        ServiceNetMap: {get_param: ServiceNetMap}
        EndpointMap: {get_param: EndpointMap}
        DefaultPasswords: {get_param: DefaultPasswords}
        RoleName: {get_param: RoleName}
        RoleParameters: {get_param: RoleParameters}

Here we can see that the mandatory parameters for the services are the following:

  • ServiceData: Contains an entry called net_cidr_map, which is a map that has the CIDRs for each network in your deployment.

  • ServiceNetMap: Contains a mapping that tells you what network is each service configured at. Typical entries will look like: BarbicanApiNetwork: internal_api.

  • EndpointMap: Contains the keystone endpoints for each service. With this you’ll be able to get what port, what protocol, and even different entries for the public, internal and admin endpoints.

  • DefaultPasswords: Defines the default passwords for only some of the services… Namely, the following parameters are available through here: DefaultMysqlRootPassword, DefaultRabbitCookie, DefaultHeatAuthEncryptionKey, DefaultPcsdPassword, DefaultHorizonSecret. Note that TripleO usually will autogenerate the passwords with secure, randomly generated defaults, so this is barely used.

  • RoleName: This is the name of the role on which the service is applied. It could be one of the default roles (e.g. “Controller” or “Compute”), or a custom role, depending on how you’re deploying.

  • RoleParameters: A Map containing parameters to be applied to the specific role.

So, if you’re writing a service template yourself, these are the parameters you have to copy into your template.

Aside from these parameters, you can define any other parameters yourself for the service, and in order for your service to consume them, you need to pass them via parameter_defaults.
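
For example, to feed the RhsmVars parameter of the rhsm.yaml template shown earlier, you could pass an environment file similar to the one below with -e on the deploy command (the variable names inside RhsmVars are illustrative; check ansible-role-redhat-subscription for the exact ones):

parameter_defaults:
  RhsmVars:
    rhsm_method: portal                 # assumed role variable
    rhsm_org_id: "1234567"              # assumed role variable
    rhsm_activation_key: "my-key"       # assumed role variable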

The role_data output

This is the sole output that will be read and parsed in order to get the relevant information needed from your service. Its value must be a map, and from the aforementioned example, it minimally contains the following:

  • service_name: This is the name of the service you’re configuring. The format is lower case letters and underscores. Setting this is quite important, since this is how TripleO reports what services are enabled, and generates appropriate hieradata, such as a list of all services enabled, and flags that say that your service is enabled on a certain node.

  • config_settings: This will contain a map of key value pairs; the map will be written to the hosts in the form of hieradata, which puppet can then run and use to configure your service. Note that the hieradata will only be written on hosts that are tagged with a role that enables your service.

  • upgrade_tasks: These are ansible tasks that run when TripleO is running an upgrade with your service enabled. If you don’t have any upgrade tasks to do, you still have to specify this output, but it’s enough to set it as an empty list.

  • step_config: This defines what puppet manifest should be run to configure your service. It typically is a string with the specific include statement that puppet will run. If you’re not configuring your service with puppet, then you need to set this value as an empty string. There is an exception, however: When you’re configuring a containerized service. We’ll dig into that later.

These are the bare-minimum sections of role_data you need to set up. However, you might have noticed that the example I linked above has another section called host_prep_tasks. This section is not mandatory, but it is one of the several ways you can execute Ansible tasks on the host in order to configure your service. These options powered by Ansible will be covered in the next part of this series.

Also note that if the service executes its configuration on bare metal, step_config will execute in steps, so it’s important that the puppet manifests take steps into account (which you will notice in the manifests in puppet-tripleo). If you want to understand what steps TripleO executes, check out my blog post about it.

August 27, 2018 11:31 AM

August 23, 2018

RDO Blog

Distributed-CI and InfraRed

Introduction

The Red Hat OpenStack QE team maintains a tool to deploy and test OpenStack. This tool can deploy different types of topologies and is very modular; you can extend it to cover new use-cases. This tool is called InfraRed; it is free software and is available on GitHub.

The purpose of Distributed-CI (or DCI) is to help OpenStack partners test new Red Hat OpenStack Platform (RHOSP) releases before they are published. This allows them to train on new releases, identify regressions or prepare new drivers ahead of time. In this article, we will explain how to integrate InfraRed with another tool called Distributed-CI, or DCI.

InfraRed

InfraRed has been designed to be flexible and it can address numerous different use-cases. In this article, we will use it to prepare a virtual environment and drive a regular Red Hat OpenStack Platform 13 (OSP13) deployment on it.

InfraRed is covered by complete documentation that we won’t copy-paste here. To summarize, once installed, InfraRed exposes a CLI. This CLI gives the user the ability to create a workspace that tracks the state of the environment. The user can then trigger all the required steps to ultimately get a running OpenStack. In addition, InfraRed offers additional features through a plug-in system.

Distributed-CI

Global diagram of DCI

The partners use DCI to validate OpenStack in their labs. It’s a way to validate that they will still be able to use their gear with the next release. A DCI agent runs the deployment and is in charge of the communication with Red Hat. The partners then have to provide a set of scripts to deploy OpenStack in their lab automatically; these scripts will be used during the deployment.

DCI can be summarized with the following list of actions:

  1. Red Hat exposes the last internal snapshots of the product on the DCI
  2. Partner’s DCI agent pulls the last snapshot and deploys it internally using the local configuration and deployment scripts
  3. Partner’s DCI agent runs the tests and sends back the final result to DCI.

Deployment of the lab

For this article, we will use a libvirt hypervisor to virtualize our lab. The hypervisor can be based either on RHEL7 or a CentOS7.

The network configuration

In this tutorial, we will rely on libvirt's ‘default’ network. This network uses the 192.168.122.0/24 range, and 192.168.122.1 is our hypervisor. The IP addresses of the other VMs will be dynamic, and InfraRed will create some additional networks for you. We also use the hypervisor's public IP, which is 192.168.1.40.

Installation of the Distributed-CI agent for OpenStack

The installation of DCI agent is covered by its own documentation. All the steps are rather simple as soon as the partner has a host to run the agent that matches DCI requirements. This host is called the jumpbox in DCI jargon. In this document, the jumpbox is also the hypervisor host.

In the rest of this document we will assume you have admin access to a DCI project, that you have created the remoteci on http://www.distributed-ci.io, and that you have deployed the agent on your jumpbox with the help of its installation guide. To validate everything, you should be able to list the remoteci of your tenant with the following command.

# source /etc/dci-ansible-agent/dcirc.sh
# dcictl remoteci-list
+--------------------------------------+--------------+--------+------------------------------------------------------------------+--------+--------------------------------------+--------------------------------------+
|                  id                  |     name     | state  |                            api_secret                            | public |               role_id                |               team_id                |
+--------------------------------------+--------------+--------+------------------------------------------------------------------+--------+--------------------------------------+--------------------------------------+
| e86ab5ba-695c-4437-b163-261e20b20f56 | FutureTown | active | something |  None  | e5e20d68-bbbe-411c-8be4-e9dbe83cc74e | 2517154c-46b4-4db9-a447-1c89623cc00a |
+--------------------------------------+--------------+--------+------------------------------------------------------------------+--------+--------------------------------------+--------------------------------------+

So far so good, we can now start the agent for the very first time with:

# systemctl start dci-ansible-agent --no-block
# journalctl -exf -u dci-ansible-agent

The agent pulls the bits from Red Hat and uses the jumpbox to expose them. Technically speaking, it’s a Yum repository in /var/www/html and an image registry on port 5000. These resources need to be consumed during the deployment. Since we don’t have any configuration yet, the run will fail. It’s time to fix that and prepare our integration with InfraRed.

One of the crucial requirements is the set of scripts that will be used to deploy OpenStack. Those scripts are maintained by the user and will be called by the agent through a couple of Ansible playbooks:

  • hooks/pre-run.yml: This playbook is the very first one to be called on the jumpbox. It’s the place where the partner can, for instance, fetch the latest copy of the configuration.
  • hooks/running.yml: This is the place where the automation will be called. Most of the time, it’s a couple of extra Ansible tasks that will call a script or include another playbook.

Preliminary configuration

Security, firewall and SSH keypair

Some services like Apache will be exposed without any restriction. This is why we assume the hypervisor is on a trusted network.

We take the liberty of disabling firewalld to simplify the whole process. Please do:

# systemctl stop firewalld
# systemctl disable firewalld

InfraRed interacts with the hypervisor using SSH. Just a reminder, in our case, the hypervisor is the local machine. To keep the whole setup simple, we share the same SSH key for the root and dci-ansible-agent users:

# ssh-keygen
# mkdir -p /var/lib/dci-ansible-agent/.ssh
# cp -r /root/.ssh/* /var/lib/dci-ansible-agent/.ssh/
# chown -R dci-ansible-agent:dci-ansible-agent /var/lib/dci-ansible-agent/.ssh
# chmod 700 /var/lib/dci-ansible-agent/.ssh
# chmod 600 /var/lib/dci-ansible-agent/.ssh/*
# restorecon /var/lib/dci-ansible-agent/.ssh

You can validate everything works fine with:

# su - dci-ansible-agent
$ ssh root@localhost id

Libvirt

We will deploy OpenStack on our libvirt hypervisor with the Virsh provisioner.

# yum install libvirt
# systemctl start libvirtd
# systemctl enable libvirtd

Red Hat Subscription Manager configuration (RHSM)

InfraRed uses the RHSM during the deployment to register the nodes and pull the latest RHEL updates. It loads the credentials from a little YAML file that you can store in the /etc/dci-ansible-agent directory with the other files:

# cat /etc/dci-ansible-agent/cdn_creds.yml
username: gleboude@redhat.com
password: 9328878db3ea4519912c36525147a21b
autosubscribe: yes

RHEL guest image

InfraRed needs a RHEL guest image to prepare the nodes. It tries hard to download it by itself (thanks, InfraRed…), but the default location is https://url.corp.redhat.com/rhel-guest-image-7-5-146-x86-64-qcow2, which is unlikely to match your environment. Go to https://access.redhat.com/downloads and download the latest RHEL guest image. The file should be stored here on your hypervisor: /var/lib/libvirt/images/rhel-guest-image-7-5-146-x86-64-qcow2. The default image name will probably change in the future; you can list the default values for the driver with the infrared (or ir) command:

# su - dci-ansible-agent
$ source .venv/bin/activate
$ ir virsh --help

Configure the agent for InfraRed

All the configuration files of this example are available on GitHub.

Run bootstrap (pre-run.yml)

First, we want to install InfraRed dependencies and prepare a virtual environment. These steps will be done with the pre-run.yml.

---
- name: Install the RPM that InfraRed depends on
  yum:
    name: '{{ item }}'
    state: present
  with_items:
  - git
  - python-virtualenv
  become: True

We pull InfraRed directly from its Git repository using Ansible’s git module:

- name: Wipe any existing InfraRed checkout
  file:
    path: ~/infrared
    state: absent

- name: Pull the last InfraRed version
  git:
    repo: https://github.com/openstack-redhat/infrared.git
    dest: /var/lib/dci-ansible-agent/infrared
    version: master

Finally, we prepare a Python virtual environment to preserve the integrity of the system and we install InfraRed in it.

- name: Wipe any existing infrared virtualenv
  file:
    path: ~/infrared/.venv
    state: absent
- name: Install InfraRed in a fresh virtualenv
  shell: |
    cd ~/infrared
    virtualenv .venv && source .venv/bin/activate
    pip install --upgrade pip
    pip install --upgrade setuptools
    pip install .

As mentioned above, the agent is called by the dci-ansible-agent user, so we have to ensure everything is done in its home directory.

- name: Enable the InfraRed plugins that we will use during the deployment
  shell: |
    cd ~/infrared
    source .venv/bin/activate
    infrared plugin add plugins/virsh
    infrared plugin add plugins/tripleo-undercloud
    infrared plugin add plugins/tripleo-overcloud

Before we start anything, we do a cleanup of the environment. For that, we rely on InfraRed. Its virsh plugin can remove all the existing resources thanks to the --cleanup argument:

- name: Clean the hypervisor
  shell: |
    cd ~/infrared
    source .venv/bin/activate
    infrared virsh \
      --host-address 192.168.122.1 \
      --host-key $HOME/.ssh/id_rsa \
      --cleanup True

Be warned: InfraRed removes all the existing VMs, networks and storage from your hypervisor.

Hosts deployment (running.yml)

As mentioned before, running.yml is the place where the deployment is actually done. We ask InfraRed to prepare our hosts:

- name: Prepare the hosts
  shell: |
    cd ~/infrared
    source .venv/bin/activate
    infrared virsh \
      --host-address 192.168.122.1 \
      --host-key $HOME/.ssh/id_rsa \
      --topology-nodes undercloud:1,controller:1,compute:1

Undercloud deployment (running.yml)

We can now deploy the Undercloud:

- name: Install the undercloud
  shell: |
    cd ~/infrared
    source .venv/bin/activate
    infrared tripleo-undercloud \
      --version 13 \
      --images-task rpm \
      --cdn /etc/dci-ansible-agent/cdn_creds.yml \
      --repos-skip-release True \
      --repos-url http://192.168.1.40/dci_repo/dci_repo.repo

At this stage, our libvirt virtual machines are ready and one of them hosts the undercloud. All these machines have a floating IP. InfraRed keeps the machine names up to date in /etc/hosts. We rely on that to get the undercloud IP address:

- name: Register InfraRed's undercloud-0 IP
  set_fact: undercloud_ip="{{ lookup('pipe', 'getent hosts undercloud-0').split()[0]}}"

You can also use InfraRed to interact with all these hosts with a dynamic IP:

# su - dci-ansible-agent
$ cd ~/infrared
$ source .venv/bin/activate
$ ir ssh undercloud-0

Here ir is an alias for the infrared command. In both cases, it’s pretty cool, InfraRed did all the voodoo for us.

Overcloud deployment (running.yml)

It’s time to run the final step of our deployment.

- name: Deploy the overcloud
  shell: |
    cd ~/infrared
    source .venv/bin/activate
    infrared tripleo-overcloud \
      --deployment-files virt \
      --version 13 \
      --introspect yes \
      --tagging yes \
      --deploy yes \
      --post yes \
      --containers yes \
      --registry-skip-puddle yes \
      --registry-undercloud-skip yes \
      --registry-mirror 192.168.1.40:5000 \
      --registry-tag latest \
      --registry-namespace rhosp13 \
      --registry-prefix openstack- \
      --vbmc-host undercloud \
      --ntp-server 0.rhel.pool.ntp.org

Here we pass some extra arguments to accommodate InfraRed:

  • --registry-mirror: we don’t want to use the images from Red Hat. Instead, we will pick the ones delivered by DCI. Here 192.168.1.40 is the first IP address of our jumpbox; it’s the one the agent uses when it deploys the image registry. Use the following command to validate that you use the correct address: cat /etc/docker-distribution/registry/config.yml|grep addr
  • --registry-namespace and --registry-prefix: our images name start with /rhosp13/openstack-.
  • --vbmc-host undercloud: During the Overcloud installation, TripleO uses Ironic for the node provisioning. Ironic interacts with the nodes through a Virtual BMC server. By default InfraRed installs it on the hypervisor; in our case we prefer to keep the hypervisor clean, which is why we target the undercloud instead.

The virtual BMC instances will look like that on the undercloud:

[stack@undercloud-0 ~]$ ps aux|grep bmc
stack     4315  0.0  0.0 426544 15956 ?        Sl   13:19   0:00 /usr/bin/python2 /usr/bin/vbmc start controller-2
stack     4383  0.0  0.0 426544 15952 ?        Sl   13:19   0:00 /usr/bin/python2 /usr/bin/vbmc start controller-1
stack     4451  0.0  0.0 426544 15952 ?        Sl   13:19   0:00 /usr/bin/python2 /usr/bin/vbmc start controller-0
stack     4520  0.0  0.0 426544 15936 ?        Sl   13:19   0:00 /usr/bin/python2 /usr/bin/vbmc start compute-1
stack     4590  0.0  0.0 426544 15948 ?        Sl   13:19   0:00 /usr/bin/python2 /usr/bin/vbmc start compute-0
stack    10068  0.0  0.0 112708   980 pts/0    S+   13:33   0:00 grep --color=auto bmc

DCI lives

Let’s start the beast!

Ok, at this stage, we can start the agent. The standard way to trigger a DCI run is through systemd:

# systemctl start dci-ansible-agent --no-block

A full run takes more than 2 hours; the --no-block argument above tells systemctl to give control back to the shell even if the unit’s start-up is not completed yet.

You can follow the progress of your deployment either on the web interface: https://www.distributed-ci.io/ or with journalctl:

# journalctl -exf -u dci-ansible-agent

The CLI

DCI also comes with CLI interface that you can use directly on the hypervisor.

# source /etc/dci-ansible-agent/dcirc.sh
# dcictl job-list

This command can also give you an output in the JSON format. It’s handy when you want to reuse the DCI results in some script:

# dcictl --format json job-list --limit 1 | jq .jobs[].status
running

To conclude

I hope you enjoyed the article and that it will help you prepare your own configuration. Please don’t hesitate to contact me if you have any questions.

I would like to thank François Charlier and the InfraRed team. François started the DCI InfraRed integration several months ago. He did a great job resolving all the issues one by one with the help of the InfraRed team.

by Gonéri Le Bouder at August 23, 2018 05:13 PM

August 15, 2018

Website and blog of Jiří Stránský

Upgrading Ceph and OKD (OpenShift Origin) with TripleO

In OpenStack’s Rocky release, TripleO is transitioning towards a method of deployment we call config-download. Basically, instead of using Heat to deploy the overcloud end-to-end, we’ll be using Heat only to manage the hardware resources and Ansible tasks for individual composable services. Execution of software configuration management (which is Ansible on the top level) will no longer go through Heat, it will be done directly. If you want to know details, i recommend watching James Slagle’s TripleO Deep Dive about config-download.

The transition towards config-download also affects services/components which we deploy by embedding external installers, like Ceph or OKD (aka OpenShift Origin). E.g. previously we deployed Ceph via a Heat resource, which created a Mistral workflow, which executed ceph-ansible. This is no longer possible with config-download, so we had to adapt the solution for external installers.

Deployment architecture

Before talking about upgrades, it is important to understand how we deploy services with external installers when using config-download.

Deployment using external installers with config-download has been developed during OpenStack’s Queens release cycle for the purpose of installing Kubernetes and OpenShift Origin. In Rocky release, installation of Ceph and Skydive services transitioned to using the same method (shout out to Giulio Fidente and Sylvain Afchain who ported those services to the new method).

The general solution is described in my earlier Kubernetes in TripleO blog post. I recommend being somewhat familiar with that before reading on.

Upgrades architecture

In OpenStack, and by extension in TripleO, we distinguish between minor updates and major upgrades, but with external installers the distinction is sometimes blurred. The solution described here was applied to both updates and upgrades. We still make a distinction between updates and upgrades with external installers in TripleO (e.g. by having two different CLI commands), but the architecture is the same for both. I will only mention upgrades in the text below for the sake of brevity, but everything described applies for updates too.

It was more or less a given that we would use Ansible tasks for upgrades with external installers, same as we already use Ansible tasks for their deployment. However, two possible approaches suggested themselves. Option A was to execute a service’s upgrade tasks and then immediately its deploy tasks, favoring an upgrade procedure which reuses a significant part of that service’s deployment procedure. Option B was to execute only upgrade tasks, giving more separation between the deployment and upgrade procedures, at the risk of producing repetitive code in the service templates.

We went with option A (upgrade procedure includes re-execution of deploy tasks). The upgrade tasks in this architecture are mainly meant to set variables which then affect what the deploy tasks do (e.g. select a different Ansible playbook to run). Note that with this solution, it is still possible to fully skip the deploy tasks if needed (using variables and when conditions), but it optimizes for maximum reuse between upgrade and deployment procedures.
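
As a rough sketch of option A (the variable name, step numbers and playbook paths below are illustrative, not the exact ceph-ansible integration), the upgrade tasks only set a fact, and the deploy tasks pick a playbook based on it:

external_upgrade_tasks:
  - when: step == '0'
    set_fact:
      ceph_ansible_playbook: /usr/share/ceph-ansible/infrastructure-playbooks/rolling_update.yml

external_deploy_tasks:
  - when: step == '2'
    shell: >-
      ansible-playbook
      -i {{ playbook_dir }}/ceph-ansible/inventory.yml
      {{ ceph_ansible_playbook | default('/usr/share/ceph-ansible/site-docker.yml') }}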

Upgrades with external installers

Implementation for Ceph and OKD

With the focus on reuse of deploy tasks, and both ceph-ansible and openshift-ansible being suitable for such approach, implementing upgrades via the architecture described above didn’t require much code.

Feel free to skim through the Ceph upgrade and OKD upgrade patches to get an idea of how the upgrades were implemented.

CLI and workflow

In the CLI, the external installer upgrades got a new command, openstack overcloud external-upgrade run. (For minor version updates it is openstack overcloud external-update run; service template authors may decide if they want to distinguish between updates and upgrades, or if they want to run the same code.)

The command is a part of the normal upgrade workflow, and should be run between openstack overcloud upgrade prepare and openstack overcloud upgrade converge. It is recommended to execute it after openstack overcloud upgrade run, which corresponds to the place within the upgrade workflow where we have been upgrading Ceph.

After introducing the new external-upgrade run command we have removed ceph-upgrade run command. This means that Ceph is no longer a special citizen in the TripleO upgrade procedure, and uses generic commands and hooks available to any other service.

Separate execution of external installers

There might be more services utilizing external installers within a single TripleO-managed environment, and the operator might wish to upgrade them separately. openstack overcloud external-upgrade run would upgrade all of them at the same time.

We started adding Ansible tags to the external upgrade and deploy tasks, allowing us to select which installers we want to run. This way openstack overcloud external-upgrade run --tags ceph would only run ceph-ansible, similarly openstack overcloud external-upgrade run --tags openshift would only run openshift-ansible. This also allows fine tuning the spot in the upgrade workflow where operator wants to run a particular external installer upgrade (e.g. before or after upgrade of natively managed TripleO services).

A full upgrade workflow making use of these possibilities could then perhaps look like this:

openstack overcloud upgrade prepare <args>
openstack overcloud external-upgrade run --tags openshift
openstack overcloud upgrade run --roles Controller
openstack overcloud upgrade run --roles Compute
openstack overcloud external-upgrade run --tags ceph
openstack overcloud upgrade converge <args>

by Jiří Stránský at August 15, 2018 12:00 AM

July 31, 2018

RDO Blog

Community Blog Round Up: 31 July

One last happy birthday to OpenStack before we get ready to wrap up Rocky and prep for OpenStack PTG in Denver, Colorado. Mary makes us drool over cupcakes, Carlos asks for our vote for his TripleO presentations, and Assaf dives into tenant, provider, and external neutron networks!

Happy Birthday OpenStack from SF Bay Area Open Infra by Mary Thengvall

I love birthday celebrations! They’re so full of joy and reminiscing of years gone by. Discussions of “I knew her when” or “Remember when he… ?” They have a tendency to bring separate communities of people together in unique and fun ways. And we all know how passionate I am about communities…

Read more at https://blogs.rdoproject.org/2018/07/happy-birthday-openstack-from-sf-bay-area-open-infra/

Vote for the OpenStack Berlin Summit presentations! by Carlos Camacho

I pushed some presentations for this year's OpenStack Summit in Berlin; the presentations are related to updates, upgrades, backups, failures and restores.

Read more at https://www.anstack.com/blog/2018/07/24/openstack-berlin-summit-vote-for-presentations.html

Tenant, Provider and External Neutron Networks by assafmuller

To this day I see confusion surrounding the terms: Tenant, provider and external networks. No doubt countless words have been spent trying to tease apart these concepts, so I thought that it’d be a good use of my time to write 470 more.

Read more at https://assafmuller.com/2018/07/23/tenant-provider-and-external-neutron-networks/

by Rain Leander at July 31, 2018 09:57 AM

July 30, 2018

RDO Blog

Happy Birthday OpenStack from SF Bay Area Open Infra

I love birthday celebrations! They’re so full of joy and reminiscing of years gone by. Discussions of “I knew her when” or “Remember when he… ?” They have a tendency to bring separate communities of people together in unique and fun ways. And we all know how passionate I am about communities…

So when Rain Leander suggested that I attend the SF Bay Area celebration of OpenStack’s 8th birthday as one of my last tasks as interim community lead for RDO, I jumped at the chance! Celebrating a birthday AND getting to know this community better, as well as reuniting with friends I already knew? Sign me up!

I arrived at the event in time to listen to a thought-provoking panel led by Lisa-Marie Namphy, Developer Advocate, community architect, and open source software specialist at Portworx. She spoke with Michael Richmond (NIO), Tony Campbell (Red Hat) and Robert Starmer (Kumulus Tech) about Kubernetes in the Real World.

Lew Tucker, CTO of Cisco, spoke next, and said one of my favorite quotes of the night:

Cloud computing has won… and it’s multiple clouds.

My brain instantly jumped to wondering about the impact that community has had on the fact that it’s not a particular company that has won in this new stage of technology, but a concept.

Dinner and mingling with OpenStack community friends new and old was up next, followed by an awesome recap of how the OpenStack community has grown over the last 8 years.

While 8 years in the grand scheme of history doesn’t seem like much, in the Open Source world, it’s a big accomplishment. The fact that OpenStack is up to 89,000+ community members in 182 countries and supported by 672 organizations is a huge feat and one that deserves to be celebrated!

Speaking of celebrating… we at RDO express our appreciation and love for community through sharing food (Rocky Road ice cream, anyone?) and this celebration was no exception. We provided the best (and cutest) mini cupcakes that I’ve ever had. The Oreo cupcake with cookie frosting gets two thumbs up in my book!

The night ended with smiles and promises of another great year to come.

Here’s to the next 8 years of fostering and building communities, moving the industry forward, and enjoying the general awesomeness that is OpenStack.

by Mary Thengvall at July 30, 2018 12:14 PM

July 24, 2018

Carlos Camacho

Vote for the OpenStack Berlin Summit presentations!

¡¡¡Please vote!!!

I pushed some presentations for this year's OpenStack Summit in Berlin; the presentations are related to updates, upgrades, backups, failures and restores.

Happy TripleOing!

by Carlos Camacho at July 24, 2018 12:00 AM

July 23, 2018

Assaf Muller

Tenant, Provider and External Neutron Networks

To this day I see confusion surrounding the terms: Tenant, provider and external networks. No doubt countless words have been spent trying to tease apart these concepts, so I thought that it’d be a good use of my time to write 470 more.

At a Glance

+----------+---------------+----------------------+-------------------------+----------------------------+
|          | Creator       | Model                | Segmentation            | External router interfaces |
+----------+---------------+----------------------+-------------------------+----------------------------+
| Tenant   | User          | Self service         | Selected by Neutron     |                            |
| Provider | Administrator | Pre-created & shared | Selected by the creator |                            |
| External | Administrator | Pre-created & shared | Selected by the creator | Yes                        |
+----------+---------------+----------------------+-------------------------+----------------------------+

A Closer Look

Tenant networks are created by users, and Neutron is configured to automatically select a network segmentation type like VXLAN or VLAN. The user cannot select the segmentation type.

Provider networks are created by administrators, that can set one or more of the following attributes:

  1. Segmentation type (flat, VLAN, Geneve, VXLAN, GRE)
  2. Segmentation ID (VLAN ID, tunnel ID)
  3. Physical network tag

Any attributes not specified will be filled in by Neutron.

OpenStack Neutron supports self service networking – the notion that a user in a project can articulate their own networking topology, completely isolated from other projects in the same cloud, via the support of overlapping IPs and other technologies. A user can create their own network and subnets without the need to open a support ticket or the involvement of an administrator. The user creates a Neutron router, connects it to the internal and external networks (defined below) and off they go. Using the built-in ML2/OVS solution, this implies using the L3 agent, tunnel networks, floating IPs and liberal use of NAT techniques.

Provider networks (read: pre-created networks) are an entirely different networking architecture for your cloud. You’d forgo the L3 agent, tunneling, floating IPs and NAT. Instead, the administrator creates one or more provider networks, typically using VLANs, shares them with users of the cloud, and disables the ability of users to create networks, routers and floating IPs. When a new user signs up for the cloud, the pre-created networks are already there for them to use. In this model, the provider networks are typically routable – they are advertised to the public internet by physical routers via BGP. Therefore, provider networks are often said to be mapped to pre-existing data center networks, both in terms of VLAN IDs and subnet properties.

External networks are a subset of provider networks with an extra flag enabled (aptly named ‘external’). The ‘external’ attribute of a network signals that virtual routers can connect their external facing interface to the network. When you use the UI to give your router external connectivity, only external networks will show up on the list.

To summarize, I think that the confusion is due to a naming issue. Had the network types been called: self-service networks, data center networks and external networks, this blog post would not have been necessary and the world would have been even more exquisite.

by assafmuller at July 23, 2018 05:56 PM

July 18, 2018

RDO Blog

Community Blog Round-Up: 18 July

We’ve got three posts this week related to OpenStack – Adam Young’s insight on how to verify if a patch has been tested as a reviewer, while Zane Bitter takes a look at OpenStack’s multiple layers of services, and then Nir Yechiel introduces us to the five things we need to know about networking on Red Hat OpenStack Platform 13. As always, if you know of an article not included in this round up, please comment below or track down leanderthal (that’s me! Rain Leander!) on Freenode irc #rdo.

Testing if a patch has test coverage by Adam Young

When a user requests a code review, the reviewer is responsible for making sure that the code is tested. While the quality of the tests is a subjective matter, their presence is not; either they are there or they are not there. If they are not there, it is on the developer to explain why or why not.

Read more at https://adam.younglogic.com/2018/07/testing-patch-has-test/

Limitations of the Layered Model of OpenStack by Zane Bitter

One model that many people have used for making sense of the multiple services in OpenStack is that of a series of layers, with the ‘compute starter kit’ projects forming the base. Jay Pipes recently wrote what may prove to be the canonical distillation (this post is an edited version of my response):

Read more at https://www.zerobanana.com/archive/2018/07/17#openstack-layer-model-limitations

Red Hat OpenStack Platform 13: five things you need to know about networking by Nir Yechiel, Principal Product Manager, Red Hat

Red Hat OpenStack Platform 13, based on the upstream Queens release, is now Generally Available. Of course this version brings in many improvements and enhancements across the stack, but in this blog post I’m going to focus on the five biggest and most exciting networking features found in this latest release.

Read more at https://redhatstackblog.redhat.com/2018/07/12/red-hat-openstack-platform-13-five-things-you-need-to-know-about-networking/

by Rain Leander at July 18, 2018 09:05 AM

July 17, 2018

Adam Young

Testing if a patch has test coverage

When a user requests a code review, the reviewer is responsible for making sure that the code is tested. While the quality of the tests is a subjective matter, their presence is not; either they are there or they are not there. If they are not there, it is on the developer to explain why or why not.

Not every line of code is testable.  Not every test is intelligent.  But, at a minimum, a test should ensure that the code in a patch is run at least once, without an unexpected exception.

For Keystone and related projects, we have a tox job called cover that we can run on a git repo at a given revision. For example, I can code review (even without git review) by pulling down a revision using the checkout link in Gerrit, and then running tox:

 

git fetch git://git.openstack.org/openstack/keystoneauth refs/changes/15/583215/2 && git checkout FETCH_HEAD
git checkout -b netloc-and-version
tox -e cover

I can look at the patch using show –stat to see what files were changed:

$ git show --stat
commit 2ac26b5e1ccdb155a4828e3e2d030b55fb8863b2
Author: wangxiyuan 
Date:   Tue Jul 17 19:43:21 2018 +0800

    Add netloc and version check for version discovery
    
    If the url netloc in the catalog and service's response
    are not the same, we should choose the catalog's and
    add the version info to it if needed.
    
    Change-Id: If78d368bd505156a5416bb9cbfaf988204925c79
    Closes-bug: #1733052

 keystoneauth1/discover.py                                 | 16 +++++++++++++++-
 keystoneauth1/tests/unit/identity/test_identity_common.py |  2 +-

and I want to skip looking at any files in keystoneauth1/tests as those are not production code. So we have 16 lines of new code. What are they?

Modifying someone else's code, I got to:

 git show | gawk 'match($0,"^@@ -([0-9]+),[0-9]+ [+]([0-9]+),[0-9]+ @@",a){left=a[1];right=a[2];next};\
   /^\+\+\+/{print;next};\
   {line=substr($0,2)};\
   /^-/{left++; next};\
   /^[+]/{print right++;next};\
   {left++; right++}'

Which gives me:

+++ b/keystoneauth1/discover.py
420
421
422
423
424
425
426
427
428
429
430
431
432
433
437
+++ b/keystoneauth1/tests/unit/identity/test_identity_common.py
332

Looking in the cover directory, I can see if a line is uncovered by its class:

class="stm mis"

For example:

$ grep n432\" cover/keystoneauth1_discover_py.html | grep "class=\"stm mis\""

432

For the lines above, I can use a seq to check them, since they are in order (with none missing)

for LN in `seq 420 437` ; do grep n$LN\" cover/keystoneauth1_discover_py.html ; done

Which produces:

420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437

I drop the grep "class=\"stm mis\"" to make sure I get something, then add it back in, and get no output.

by Adam Young at July 17, 2018 05:35 PM

Zane Bitter

Limitations of the Layered Model of OpenStack

One model that many people have used for making sense of the multiple services in OpenStack is that of a series of layers, with the ‘compute starter kit’ projects forming the base. Jay Pipes recently wrote what may prove to be the canonical distillation (this post is an edited version of my response):

Nova, Neutron, Cinder, Keystone and Glance are a definitive lower level of an OpenStack deployment. They represent a set of required integrated services that supply the most basic infrastructure for datacenter resource management when deploying OpenStack. Depending on the particular use cases and workloads the OpenStack deployer wishes to promote, an additional layer of services provides workload orchestration and workflow management capabilities.

I am going to explain why this viewpoint is wrong, but first I want to acknowledge what is attractive about it (even to me). It contains a genuinely useful observation that leads to a real insight.

The insight is that whereas the installation instructions for something like Kubernetes usually contain an implicit assumption that you start with a working datacenter, the same is not true for OpenStack. OpenStack is the only open source project concentrating on the gap between a rack full of unconfigured equipment and somewhere that you could run a higher-level service like Kubernetes. We write the bit where the rubber meets the road, and if we do not there is nobody else to do it! There is an almost infinite variety of different applications and they will all need different parts of the higher layers, but ultimately they must be reified in a physical data center and when they are OpenStack will be there: that is the core of what we are building.

It is only the tiniest of leaps from seeing that idea as attractive, useful, and genuinely insightful to believing it is correct. I cannot really blame anybody who made that leap. But an abyss awaits them nonetheless.

Back in the 1960s and early 1970s there was this idea about Artificial Intelligence: even a 2 year old human can (for example) recognise images with a high degree of accuracy, but doing (say) calculus is extremely hard in comparison and takes years of training. But computers can already do calculus! Ergo, we have solved the hardest part already and building the rest out of that will be trivial, AGI is just around the corner, and so on. The popularity of this idea arguably helped created the AI bubble, and the inevitable collision with the reality of its fundamental wrongness led to the AI Winter. Because, in fact, though you can build logic out of many layers of heuristics (as human brains do), it absolutely does not follow that it is trivial to build other things that also require layers of heuristics out of some basic logic building blocks. (In contrast, the AI technology of the present, which is showing more promise, is called Deep Learning because it consists literally of multiple layers of heuristics. It is also still considerably worse at it than any 2 year old human.)

I see the problem with the OpenStack-as-layers model as being analogous. (I am not suggesting there will be a full-on OpenStack Winter, but we are well past the Peak of Inflated Expectations.) With Nova, Keystone, Glance, Neutron, and Cinder you can build a pretty good Virtual Private Server hosting service. But it is a mistake to think that cloud is something you get by layering stuff on top of VPS hosting. It is relatively easy to build a VPS host on top of a cloud, just like teaching someone calculus. But it is enormously difficult to build a cloud on top of a VPS host (it would involve a lot of expensive layers of abstraction, comparable to building artificial neurons in software).

That is all very abstract, so let me bring in a concrete example. Kubernetes is event-driven at a very fundamental level: when a pod or a whole kubelet dies, Kubernetes gets a notification immediately and that prompts it to reschedule the workload. In contrast, Nova/Cinder/&c. are a black hole. You cannot even build a sane dashboard for your VPS—let alone cloud-style orchestration—over them, because it will have to spend all of its time polling the APIs to find out if anything happened. There is an entire separate project, that almost no deployments include, basically dedicated to spelunking in the compute node without Nova’s knowledge to try to surface this information. It is no criticism of the team in question, who are doing something that desperately needs doing in the only way that is really open to them, but the result is an embarrassingly bad architecture for OpenStack as a whole.

So yes, it is sometimes helpful to think about the fact that there is a group of components that own the low level interaction with outside systems (hardware, or IdM in the case of Keystone), and that almost every application will end up touching those directly or indirectly, while each using different subsets of the other functionality… but only in the awareness that those things also need to be built from the ground up as interlocking pieces in a larger puzzle.

Saying that the compute starter kit projects represent a ‘definitive lower level of an OpenStack deployment’ invites the listener to ignore the bigger picture; to imagine that if those lower level services just take care of their own needs then everything else can just build on top. That is a mistake, unless you believe that OpenStack needs only to provide enough building blocks to build VPS hosting out of, because support for all of those higher-level things does not just fall out for free. You have to consciously work at it.

Imagine for a moment that, knowing everything we know now, we had designed OpenStack around a system of event sources and sinks that are reliable in the face of hardware failures and network partitions, with components connecting into it to provide services to the user and to each other. That is what Kubernetes did. That is the key to its success. We need to enable something similar, because OpenStack is still necessary even in a world where Kubernetes exists.

One reason OpenStack is still necessary is the one we started with above: something needs to own the interaction with the underlying physical infrastructure, and the alternatives are all proprietary. Another place where OpenStack can provide value is by being less opinionated and allowing application developers to choose how the event sources and sinks are connected together. That means that users should, for example, be able to customise their own failover behaviour in ‘userspace’ rather than rely on the one-size-fits-all approach of handling everything automatically inside Kubernetes. This is theoretically an advantage of having separate projects instead of a monolithic design—though the fact that the various agents running on a compute node are more tightly bound to their corresponding services than to each other has the potential to offer the worst of both worlds.

All of these thoughts will be used as fodder for writing a technical vision statement for OpenStack. My hope is that will help align our focus as a community so that we can work together in the same direction instead of at cross-purposes. Along the way, we will need many discussions like this one to get to the root of what can be some quite subtle differences in interpretation that nevertheless lead to divergent assumptions. Please join in if you see one happening!

by Zane Bitter at July 17, 2018 03:17 PM

July 15, 2018

Nir Yechiel

July 13, 2018

Red Hat Stack

Red Hat OpenStack Platform 13: five things you need to know about networking

Red Hat OpenStack Platform 13, based on the upstream Queens release, is now Generally Available. Of course this version brings in many improvements and enhancements across the stack, but in this blog post I’m going to focus on the five biggest and most exciting networking features found in this latest release.

Photo by Franck V. on Unsplash

ONE: Overlay network management – bringing consistency and better operational experience

Offering solid support for network virtualization was always a priority of ours. Like many other OpenStack components, the networking subsystem (Neutron) is pluggable so that customers can choose the solution that best fits their business and technological requirements. Red Hat OpenStack Platform 13 adds support for Open Virtual Network (OVN), a network virtualization solution which is built into the Open vSwitch (OVS) project. OVN supports the Neutron API, and offers a clean and distributed implementation of the most common networking capabilities such as bridging, routing, security groups, NAT, and floating IPs. In addition to OpenStack, OVN is also supported in Red Hat Virtualization (available with Red Hat Virtualization 4.2 which was announced earlier this year), with support for Red Hat OpenShift Container Platform expected down the road. This marks our efforts to create consistency and a more unified operational experience between Red Hat OpenStack Platform, Red Hat OpenShift, and Red Hat Virtualization.     

OVN was available as a technology preview feature with Red Hat OpenStack Platform 12, and is now fully supported with Red Hat OpenStack Platform 13. OVN must be enabled as the overcloud Neutron backend from Red Hat OpenStack Platform director at deployment time, as the default Neutron backend is still ML2/OVS. Also note that migration tooling from ML2/OVS to OVN is not supported with Red Hat OpenStack Platform 13; it is expected to be offered in a future release, so OVN is only recommended for new deployments.

TWO: Open source SDN Controller

OpenDaylight is a flexible, modular, and open software-defined networking (SDN) platform, which is now fully integrated and supported with Red Hat OpenStack Platform 13. The Red Hat offering combines carefully selected OpenDaylight components that are designed to enable the OpenDaylight SDN controller as a networking backend for OpenStack, giving it visibility into, and control over, OpenStack networking, utilization, and policies.

OpenDaylight is co-engineered and integrated with Red Hat OpenStack Platform, including Red Hat OpenStack Platform director for automated deployment, configuration and lifecycle management.

The key OpenDaylight project used in this solution is NetVirt, offering support for the OpenStack Neutron API on top of OVS. For telecommunication customers this support extends to OVS-DPDK implementations. In addition, as a technology preview, customers can use OpenDaylight with OVS hardware offload on capable network adapters to offload the virtual switch data path processing to the network card, further optimizing the server footprint.

 


THREE: Cloud ready load balancing as a service

Load balancing is a fundamental service of any cloud. It is a key element for enabling automatic scaling and availability of applications hosted in the cloud, and is required both for traditional “three tier” apps and for emerging cloud-native, microservices-based app architectures.

During the last few development cycles, the community has worked on a new load balancing as a service (LBaaS) solution based on the Octavia project. Octavia provides tenants with a load balancing API and implements the delivery of load balancing services via a fleet of service virtual machine instances, which it spins up on demand. With Red Hat OpenStack Platform 13, customers can use the OpenStack Platform director to easily deploy and set up Octavia and expose it to the overcloud tenants, including setting up a pre-created, supported, and secured Red Hat Enterprise Linux based service VM image.

Figure 2. Octavia HTTPS traffic flow through to a pool member

FOUR: Integrated networking for OpenStack and OpenShift

OpenShift Container Platform, Red Hat’s enterprise distribution of Kubernetes optimized for continuous application development, is infrastructure independent. You can run it on public cloud, virtualization, OpenStack or anything that can boot Red Hat Enterprise Linux. But in order to run Kubernetes and application containers, you need control and flexibility at scale on the infrastructure level. Many of our customers are looking into OpenStack as a platform to expose VM and bare metal resources for OpenShift to provide Kubernetes clusters to different parts of the organization – nicely aligning with the strong multi-tenancy and isolation capabilities of OpenStack as well as its rich APIs.     

As a key contributor to both OpenStack and Kubernetes, Red Hat is shaping this powerful combination so that enterprises can not only deploy OpenShift on top of OpenStack, but also take advantage of the underlying infrastructure services exposed by OpenStack. A good example of this is through networking integration. Out of the box, OpenStack provides overlay networks managed by Neutron. However, OpenShift, based on Kubernetes and the Container Network Interface (CNI) project, also provides overlay networking between container pods. This results in two unrelated network virtualization stacks running on top of each other, which makes the operational experience, as well as the overall performance of the solution, less than optimal. With Red Hat OpenStack Platform 13, Neutron was enhanced so that it can serve as the networking layer for both OpenStack and OpenShift, allowing a single network solution to serve both container and non-container workloads. This is done through project Kuryr and kuryr-kubernetes, a CNI plugin that provides OpenStack networking to Kubernetes objects.

Customers will be able to take advantage of Kuryr with an upcoming Red Hat OpenShift Container Platform release, where we will also release openshift-ansible support for automated deployment of Kuryr components (kuryr-controller, kuryr-cni) on OpenShift Master and Worker nodes.   

Figure 3. OpenShift and OpenStack

FIVE: Deployment on top of routed networks

As data center network architectures evolve, we are seeing a shift away from L2-based network designs towards fully L3 routed fabrics in an effort to create more efficient, predictable, and scalable communication between end-points in the network. One such trend is the adoption of leaf/spine (Clos) network topology where the fabric is composed of leaf and spine network switches: the leaf layer consists of access switches that connect to devices like servers, and the spine layer is the backbone of the network. In this architecture, every leaf switch is interconnected with each and every spine switch using routed links. Dynamic routing is typically enabled throughout the fabric and allows the best path to be determined and adjusted automatically. Modern routing protocol implementations also offer Equal-Cost Multipathing (ECMP) for load sharing of traffic between all available links simultaneously.

Originally, Red Hat OpenStack Platform director was designed to use shared L2 networks between nodes. This significantly reduces the complexity required to deploy OpenStack, since DHCP and PXE booting are simply done over a shared broadcast domain. This also makes the network switch configuration straightforward, since typically there is only a need to configure VLANs and ports, but no need to enable routing between all switches. This design, however, is not compatible with L3 routed network solutions such as the leaf/spine network architecture described above.

With Red Hat OpenStack Platform 13, director can now deploy OpenStack on top of fully routed topologies, utilizing its composable network and roles architecture, as well as a DHCP relay to support provisioning across multiple subnets. This provides customers with the flexibility to deploy on top of L2 or L3 routed networks from a single tool.


Learn more

Learn more about Red Hat OpenStack Platform:


For more information on Red Hat OpenStack Platform and Red Hat Virtualization contact your local Red Hat office today!

by Nir Yechiel, Principal Product Manager, Red Hat at July 13, 2018 01:28 AM

July 11, 2018

RDO Blog

Community Blog Round-Up: 11 July

I know what you’re thinking – another blog round up SO SOON?!? Is it MY BIRTHDAY?!? Maybe! But it’s definitely OpenStack’s birthday this month – eight years old – and there are an absolute TON of blog posts as a result. Well, maybe not a ton, but definitely a lot to write about and therefore, there are a lot more community blog round ups. Expect more of the same as content allows! So, sit back and enjoy the latest RDO community blog round-up while you eat a piece of cake and wish a very happy birthday to OpenStack.

Virtualize your OpenStack control plane with Red Hat Virtualization and Red Hat OpenStack Platform 13 by Ramon Acedo Rodriguez, Product Manager, OpenStack

With the release of Red Hat OpenStack Platform 13 (Queens) we’ve added support to Red Hat OpenStack Platform director to deploy the overcloud controllers as virtual machines in a Red Hat Virtualization cluster. This allows you to have your controllers, along with other supporting services such as Red Hat Satellite, Red Hat CloudForms, Red Hat Ansible Tower, DNS servers, monitoring servers, and of course, the undercloud node (which hosts director), all within a Red Hat Virtualization cluster. This can reduce the physical server footprint of your architecture and provide an extra layer of availability.

Read more at https://redhatstackblog.redhat.com/2018/07/10/virtualize-your-openstack-control-plane-with-red-hat-virtualization-and-red-hat-openstack-platform-13/

Red Hat OpenStack Platform: Making innovation accessible for production by Maria Bracho, Principal Product Manager OpenStack

An OpenStack®️-based cloud environment can help you digitally transform to succeed in fast-paced, competitive markets. However, for many organizations, deploying open source software supported only by the community can be intimidating. Red Hat®️ OpenStack Platform combines community-powered innovation with enterprise-grade features and support to help your organization build a production-ready private cloud.

Read more at https://redhatstackblog.redhat.com/2018/07/09/red-hat-openstack-platform-making-innovation-accessible-for-production/

Converting policy.yaml to a list of dictionaries by Adam Young

The policy.yaml file generated from oslo is not very useful for anything other than feeding to oslo-policy to enforce. If you want to use these values for anything else, it would be much more useful to have each rule as a dictionary, and all of the rules in a list. Here is a little bit of awk to help out:

Read more at https://adam.younglogic.com/2018/07/policy-yaml-dictionary/

A Git Style change management for a Database driven app. by Adam Young

The Policy management tool I’m working on really needs revision and change management. Since I’ve spent so much time with Git, it affects my thinking about change management things. So, here is my attempt to lay out my current thinking for implementing a git-like scheme for managing policy rules.

Read more at https://adam.younglogic.com/2018/07/a-git-style-change-management-for-a-database-driven-app/

by Rain Leander at July 11, 2018 02:58 PM

July 10, 2018

Red Hat Stack

Virtualize your OpenStack control plane with Red Hat Virtualization and Red Hat OpenStack Platform 13

With the release of Red Hat OpenStack Platform 13 (Queens) we’ve added support to Red Hat OpenStack Platform director to deploy the overcloud controllers as virtual machines in a Red Hat Virtualization cluster. This allows you to have your controllers, along with other supporting services such as Red Hat Satellite, Red Hat CloudForms, Red Hat Ansible Tower, DNS servers, monitoring servers, and of course, the undercloud node (which hosts director), all within a Red Hat Virtualization cluster. This can reduce the physical server footprint of your architecture and provide an extra layer of availability.

Please note: this is not using Red Hat Virtualization as an OpenStack hypervisor (i.e. the compute service, which is already nicely done with nova via libvirt and KVM) nor is this about hosting the OpenStack control plane on OpenStack compute nodes.

Video courtesy: Rhys Oxenham, Manager, Field & Customer Engagement

Benefits of virtualization

Red Hat Virtualization (RHV) is an open, software-defined platform built on Red Hat Enterprise Linux and the Kernel-based Virtual Machine (KVM) featuring advanced management tools.  RHV gives you a stable foundation for your virtualized OpenStack control plane.

By virtualizing the control plane you gain instant benefits, such as:

  • Dynamic resource allocation to the virtualized controllers: scale up and scale down as required, including CPU and memory hot-add and hot-remove to prevent downtime and allow for increased capacity as the platform grows.
  • Native high availability for Red Hat OpenStack Platform director and the control plane nodes.
  • Additional infrastructure services can be deployed as VMs on the same RHV cluster, minimizing the server footprint in the datacenter and making an efficient use of the physical nodes.
  • Ability to define more complex OpenStack control planes based on composable roles. This capability allows operators to allocate resources to specific components of the control plane, for example, an operator may decide to split out networking services (Neutron) and allocate more resources to them as required. 
  • Maintenance without service interruption: RHV supports VM live migration, which can be used to relocate the OSP control plane VMs to a different hypervisor during their maintenance.
  • Integration with third party and/or custom tools engineered to work specifically with RHV, such as backup solutions.

Benefits of subscription

There are many ways to purchase Red Hat Virtualization, but many Red Hat OpenStack Platform customers already have it since it’s included in our most popular OpenStack subscription bundles, Red Hat Cloud Infrastructure and Red Hat Cloud Suite. If you have purchased OpenStack through either of these, you already own RHV subscriptions!

Logical Architecture

This is how the architecture looks when the overcloud is split between Red Hat Virtualization, which hosts the control plane, and bare metal compute nodes, which run the tenants’ workloads.


Installation workflow

A typical installation workflow looks like this:

Figure: Installation workflow

Preparation of the Cluster/Host networks

In order to use multiple networks (referred to as “network isolation” in OpenStack deployments), each VLAN (Tenant, Internal, Storage, …) will be mapped to a separate logical network and allocated to the hosts’ physical NICs. Full details are in the official documentation.

Preparation of the VMs

The Red Hat OpenStack Platform control plane usually consists of one director node and (at least) three controller nodes. When these VMs are created in RHV, the same requirements we have for these nodes on bare metal apply.

The director VM should have a minimum of 8 cores (or vCPUs), 16 GB of RAM and 100 GB of storage. More information can be found in the official documentation.

The controllers should have at least 32 GB of RAM and 16 vCPUs. While the same amount of resources is required for virtualized controllers, by using RHV we gain the ability to better optimize that resource consumption across the underlying hypervisors.

Red Hat Virtualization Considerations

Red Hat Virtualization needs to be configured with some specific settings to host the VMs for the controllers:

Anti-affinity for the controller VMs

We want to ensure there is only one OpenStack controller per hypervisor so that, in case of a hypervisor failure, the service level disruption is minimized to a single controller. This allows HA to be taken care of by the different levels of high availability mechanisms already built into the system. For this to work we use RHV to configure an affinity group with “soft negative affinity,” effectively giving us “anti-affinity!” Additionally, it provides the flexibility to override this rule in case of system constraints.

VM network configuration

One vNIC per VLAN

In order to use multiple networks (referred to as “network isolation” in OpenStack deployments), each VLAN (Tenant, Internal, Storage, …) will be mapped to a separate virtual NIC (vNIC) in the controller VMs and VLAN “untagging” will be done at the hypervisor (cluster) and VM level.

Full details can be found in the official documentation.


Allow MAC Spoofing

For the virtualized controllers to allow the network traffic in and out correctly, the MAC spoofing filter must be disabled on the networks that are attached to the controller VMs. To do this we set the no_filter option in the vNIC profile used by the director and controller VMs and then restart the VMs, which disables the MAC anti-spoofing filter.

Important Note: If this is not done, DHCP and PXE booting of the VMs from director won’t work.

Implementation in director

Red Hat OpenStack Platform director (TripleO’s downstream release) uses the Ironic Bare Metal provisioning component of OpenStack to deploy the OpenStack components on physical nodes. In order to add support for deploying the controllers on Red Hat Virtualization VMs, we enabled support in Ironic with a new driver named staging-ovirt.

This new driver manages the VMs hosted in RHV in a similar way to how other drivers manage physical nodes through BMCs supported by Ironic, such as iRMC, iDrac, or iLO. For RHV this is done by interacting with the RHV manager directly to trigger power management actions on the VMs.

Enabling the staging-ovirt driver in director

Director needs to enable support for the new driver in Ironic. This is done as you would do it for any other Ironic driver by simply specifying it in the undercloud.conf configuration file:

enabled_hardware_types = ipmi,redfish,ilo,idrac,staging-ovirt

After adding the new entry and running openstack undercloud install we can see the staging-ovirt driver listed in the output:

(undercloud) [stack@undercloud-0 ~]$ openstack baremetal driver list
+---------------------+-----------------------+
| Supported driver(s) | Active host(s)        |
+---------------------+-----------------------+
| idrac               | localhost.localdomain |
| ilo                 | localhost.localdomain |
| ipmi                | localhost.localdomain |
| pxe_drac            | localhost.localdomain |
| pxe_ilo             | localhost.localdomain |
| pxe_ipmitool        | localhost.localdomain |
| redfish             | localhost.localdomain |
| staging-ovirt       | localhost.localdomain |
+---------------------+-----------------------+

Register the RHV-hosted VMs with director

When defining a RHV-hosted node in director’s instackenv.json file we simply set the power management type (pm_type) to the “staging-ovirt” driver, provide the relevant RHV manager host name, and include the username and password for the RHV account that can control power functions for the VMs.

{
    "nodes": [
        {
            "name":"osp13-controller-1",
            "pm_type":"staging-ovirt",
            "mac":[
                "00:1a:4a:16:01:39"
            ],
            "cpu":"2",
            "memory":"4096",
            "disk":"40",
            "arch":"x86_64",
            "pm_user":"admin@internal",
            "pm_password":"secretpassword",
            "pm_addr":"rhvm.lab.redhat.com",
            "pm_vm_name":"osp13-controller-1",
            "capabilities": "profile:control,boot_option:local"
        },
        {
            "name":"osp13-controller-2",
            "pm_type":"staging-ovirt",
            "mac":[
                "00:1a:4a:16:01:3a"
            ],
            "cpu":"2",
            "memory":"4096",
            "disk":"40",
            "arch":"x86_64",
            "pm_user":"admin@internal",
            "pm_password":"secretpassword",
            "pm_addr":"rhvm.lab.redhat.com",
            "pm_vm_name":"osp13-controller-2",
            "capabilities": "profile:control,boot_option:local"
        },
        {
            "name":"osp13-controller-3",
            "pm_type":"staging-ovirt",
            "mac":[
                "00:1a:4a:16:01:3b"
            ],
            "cpu":"2",
            "memory":"4096",
            "disk":"40",
            "arch":"x86_64",
            "pm_user":"admin@internal",
            "pm_password":"secretpassword",
            "pm_addr":"rhvm.lab.redhat.com",
            "pm_vm_name":"osp13-controller-3",
            "capabilities": "profile:control,boot_option:local"
        }
    ]
}

A summary of the relevant parameters required for RHV is as follows (a quick sanity-check sketch follows the list):

  • pm_user: RHV-M username.
  • pm_password: RHV-M password.
  • pm_addr: hostname or IP of the RHV-M server.
  • pm_vm_name: Name of the virtual machine in RHV-M that will serve as the controller.
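
Before registering the nodes, it can help to sanity-check the file. The short Python sketch below is purely illustrative and not part of director or RHV tooling; the required-key list is an assumption based on the parameters described in this post.

#!/usr/bin/env python3
# Illustrative sanity check for an instackenv.json like the one above.
# Not part of director or RHV tooling; the key list is an assumption.
import json

REQUIRED = {"pm_type", "pm_user", "pm_password", "pm_addr", "pm_vm_name", "mac"}

with open("instackenv.json") as f:
    nodes = json.load(f)["nodes"]

for node in nodes:
    name = node.get("name", "<unnamed>")
    if node.get("pm_type") != "staging-ovirt":
        print(f"{name}: pm_type is not staging-ovirt")
    missing = REQUIRED - set(node)
    if missing:
        print(f"{name}: missing keys: {', '.join(sorted(missing))}")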

For more information on Red Hat OpenStack Platform and Red Hat Virtualization contact your local Red Hat office today!

by Ramon Acedo Rodriguez, Product Manager, OpenStack at July 10, 2018 08:31 PM

July 09, 2018

Red Hat Stack

Red Hat OpenStack Platform: Making innovation accessible for production

An OpenStack®️-based cloud environment can help you digitally transform to succeed in fast-paced, competitive markets. However, for many organizations, deploying open source software supported only by the community can be intimidating. Red Hat®️ OpenStack Platform combines community-powered innovation with enterprise-grade features and support to help your organization build a production-ready private cloud.

Through an open source development model, community leadership, and production-grade life-cycle options, Red Hat makes open source software more accessible for production use across industries and organizations of any size and type.

Photo by Omar Albeik on Unsplash

Open source development model

In order for open source technologies to be effective in production, they must provide stability and performance while also delivering the latest features and advances. Our open source development model combines fast-paced, cross-industry community innovation with production-grade hardening, integrations, support, and services. We take an upstream-first approach by contributing all developments back to the upstream community. This makes new features immediately available and helps to drive the interoperability of Red Hat products with upstream releases. Based on community OpenStack releases, Red Hat OpenStack Platform is intensively tested and hardened to meet the rigors of production environments. Ongoing patching, bug fixes, and certification keep your environment up and running.

Community leadership

We know that open source technologies can be of the highest quality and work with communities to deliver robust code. Red Hat is the top code contributor to the OpenStack community. We are responsible for 28% of the code in the Queens release and 18% of the code across all releases. We collaborate with our customers, partners, and industry organizations to identify the features they need to be successful. We then work to add that functionality into OpenStack. Over time, these efforts have resulted in enhancements in OpenStack’s availability, manageability, and performance, as well as industry-specific additions like OpenDaylight support for telecommunications.

Production-grade life-cycle options

The OpenStack community delivers new releases every six months, which can be challenging for many organizations looking to deploy OpenStack-based production environments. We provide stable branch releases of OpenStack that are supported for an enterprise production life cycle—beyond the six-month release cycle of the OpenStack community. With Red Hat OpenStack Platform, we give you two life-cycle options that let you choose when to upgrade and add new features to your cloud environment.

  • Standard release cadence. Upgrade every six to twelve months between standard releases to stay aligned with the latest features as they become available. Standard releases include one year of support.
  • Long-life release cadence. Standardize on long-life releases for up to five years. Long-life releases include three years of support, with the option to extend support for an additional two years with extended life-cycle support (ELS), for up to five years of support total. All new features are included with each long-life release.

Red Hat OpenStack Platform director—an integrated deployment and life-cycle management tool—streamlines upgrades between standard releases. And, the new fast forward upgrade feature in director lets you easily transition between long-life releases, without the need to upgrade to each in-between release. So, if you are currently using Red Hat OpenStack Platform 10, you now have an easy upgrade path to Red Hat OpenStack Platform 13—with fewer interruptions, no need for additional hardware, and simpler implementation of containerized OpenStack services.

Figure: Fast forward upgrade diagram

Learn more

Red Hat OpenStack Platform can help you overcome the challenges of deploying OpenStack into production use. And, if you aren’t sure about how to build your cloud environment, don’t have the time or resources to do so, or just want some help on your cloud journey, we provide a variety of expert services and training.

Learn more about Red Hat OpenStack Platform:

by Maria Bracho, Principal Product Manager OpenStack at July 09, 2018 08:19 PM

July 08, 2018

Adam Young

Converting policy.yaml to a list of dictionaries

The policy.yaml file generated from oslo has the following format:

# Intended scope(s): system
#"identity:update_endpoint_group": "rule:admin_required"

# Delete endpoint group.
# DELETE /v3/OS-EP-FILTER/endpoint_groups/{endpoint_group_id}
# Intended scope(s): system
#"identity:delete_endpoint_group": "rule:admin_required"

This is not very useful for anything other than feeding to oslo-policy to enforce. If you want to use these values for anything else, it would be much more useful to have each rule as a dictionary, and all of the rules in a list. Here is a little bit of awk to help out:

#!/usr/bin/awk -f
BEGIN {apilines=0; print("---")}
/#"/ {
    if (api == 1){
	printf("  ")
    }else{
	printf("- ")
    }
  split ($0,array,"\"")
  print ("rule:", array[2]);
  print ("  check:", array[4]);
  rule=0
}    
/# / {api=1;}
/^$/ {api=0; apilines=0;}
api == 1 && apilines == 0 {print ("- description:" substr($0,2))}
/# GET/  || /# DELETE/ || /# PUT/ || /# POST/ || /# HEAD/ || /# PATCH/ {
     print ("  " $2 ": " $3)
}
api == 1 { apilines = apilines +1 }

I have it saved in mungepolicy.awk. I ran it like this:

cat etc/keystone.policy.yaml.sample | ./mungepolicy.awk > /tmp/keystone.access.yaml

And the output looks like this:

---
- rule: admin_required
  check: role:admin or is_admin:1
- rule: service_role
  check: role:service
- rule: service_or_admin
  check: rule:admin_required or rule:service_role
- rule: owner
  check: user_id:%(user_id)s
- rule: admin_or_owner
  check: rule:admin_required or rule:owner
- rule: token_subject
  check: user_id:%(target.token.user_id)s
- rule: admin_or_token_subject
  check: rule:admin_required or rule:token_subject
- rule: service_admin_or_token_subject
  check: rule:service_or_admin or rule:token_subject
- description: Show application credential details.
  GET: /v3/users/{user_id}/application_credentials/{application_credential_id}
  HEAD: /v3/users/{user_id}/application_credentials/{application_credential_id}
  rule: identity:get_application_credential
  check: rule:admin_or_owner
- description: List application credentials for a user.
  GET: /v3/users/{user_id}/application_credentials
  HEAD: /v3/users/{user_id}/application_credentials
  rule: identity:list_application_credentials
  check: rule:admin_or_owner

Which is valid YAML. It might be a pain to deal with the verbs as separate keys. Ideally, they would be collected into a list, too, but this will work for starters.
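
If that ever becomes a problem, a small post-processing step can fold the verb keys into a list. The Python sketch below is one possible approach, assuming the /tmp/keystone.access.yaml layout shown above; the "methods" key is my own naming, not something the awk script produces.

#!/usr/bin/env python3
# Post-process the YAML produced by mungepolicy.awk so that the HTTP verbs
# for each rule are collected into a single "methods" list. Illustrative only.
import yaml

HTTP_VERBS = {"GET", "PUT", "POST", "DELETE", "HEAD", "PATCH"}

with open("/tmp/keystone.access.yaml") as f:
    rules = yaml.safe_load(f)

for rule in rules:
    methods = []
    for verb in sorted(HTTP_VERBS & set(rule)):
        methods.append({"verb": verb, "path": rule.pop(verb)})
    if methods:
        rule["methods"] = methods

print(yaml.safe_dump(rules, default_flow_style=False))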

by Adam Young at July 08, 2018 03:38 AM

July 06, 2018

Adam Young

A Git Style change management for a Database driven app.

The Policy management tool I’m working on really needs revision and change management.  Since I’ve spent so much time with Git, it affects my thinking about change management things.  So, here is my attempt to lay out my current thinking for implementing a git-like scheme for managing policy rules.

A policy line is composed of two chunks of data.  A Key and a Value.  The keys are in the form

  identity:create_user.

Additionally, the keys are scoped to a specific service (Keystone, Nova, etc).

The value is the check string.  These are of the form

role:admin and project_id=target.project_id

It is the check string that is most important to revision control. This lends itself to an entity diagram like this:

Whether each of these gets its own table remains to be seen.  The interesting part is the rule_name to policy_rule mapping.

Let’s state that the policy_rule table entries are immutable.  If we want to change policy, we add a new entry, and leave the old ones in there.  The new entry will have a new revision value.  For now, let’s assume revisions are integers and are monotonically increasing.  So, when I first upload the Keystone policy.json file, each entry gets a revision ID of 1.  In this example, all check_strings start off as “is_admin:True”.
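
For concreteness, here is a minimal sketch of the two tables described above, using sqlite3 purely for illustration; the column names and types are assumptions, not a settled schema.

# Minimal sketch of the schema described above; sqlite3 is used purely for
# illustration and the column names are assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rule_name (
    id      INTEGER PRIMARY KEY,
    service TEXT NOT NULL,   -- e.g. 'identity'
    name    TEXT NOT NULL    -- e.g. 'identity:create_user'
);
CREATE TABLE policy_rule (
    rule_name_id INTEGER NOT NULL REFERENCES rule_name(id),
    check_string TEXT NOT NULL,     -- e.g. 'role:admin and domain_id:target.domain_id'
    revision     INTEGER NOT NULL,  -- monotonically increasing; rows are never updated
    PRIMARY KEY (rule_name_id, revision)
);
""")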

Now let’s assume I modify the identity:create_user rule.  I’m going to arbitrarily say that the id for this record is 68.  I want to change it to:

role:admin and domain_id:target.domain_id

So we can do some scope checking.  This entry goes into the policy_rule table like so:

 

+--------------+--------------------------------------------+----------+
| rule_name_id | check_string                               | revision |
+--------------+--------------------------------------------+----------+
| 68           | is_admin:True                              | 1        |
| 68           | role:admin and domain_id:target.domain_id  | 2        |
+--------------+--------------------------------------------+----------+

From a storage perspective this is quite nice, but from a “what does my final policy look like” perspective it is a mess.

In order to build the new view, we need SQL along the lines of

select * from policy_rule where revision = ?

Let’s call this line_query and assume that when we call it, the parameter is substituted for the question mark.  We would then need code like this pseudo-code:

doc = dict()
for revision in range(1, max_revision + 1):
    for result in line_query.execute(revision):
        index = result['rule_name_id']
        doc[index] = result['check_string']

 

This would build a dictionary layer by layer through all the revisions.

So far so good, but what happens if we decide to revert, and then go in a different direction? Right now, we have a revision chain like this:

And if we keep going, we have,

But what happens if 4 was a mistake? We need to revert to 3 and create a new branch.

We have two choices. First, we could be destructive and delete all of the lines in revision 4, 5, and 6. This means we can never recreate the state of 6 again.

What if we don’t know that 4 is a mistake? What if we just want to try another route, but come back to 4,5, and 6 in the future?

We want this:

 

But how will we know to take the branch when we create the new doc?

It’s a database! We put it in another table.

+-------------+--------------------+
| revision_id | revision_parent_id |
+-------------+--------------------+
| 2           | 1                  |
| 3           | 2                  |
| 4           | 3                  |
| 5           | 4                  |
| 6           | 5                  |
| 7           | 3                  |
| 8           | 7                  |
| 9           | 8                  |
+-------------+--------------------+

In order to recreate revision 9, we use a stack. Push 9 on the stack, then find the row with revision_id 9 in the table, push the revision_parent_id on the stack, and continue until there are no more rows.  Then, pop each revision_id off the stack and execute the same kind of pseudo code I posted above.

It is a lot.  It is kind of complicated, but it is the type of complicated that Python does well.  However, databases do not do this kind of iterative querying well.  It would take a stored procedure to perform this via a single database query.
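
To make the stack walk concrete, here is roughly what it could look like in Python. The parents dictionary stands in for the revision table above, and line_query.execute() for the “select * from policy_rule where revision = ?” query; both are illustrative, not an existing API.

# Rough sketch of the stack-based reconstruction described above.
def build_doc(revision_id, parents, line_query):
    # Walk up the parent chain, pushing each revision onto a stack.
    stack = []
    current = revision_id
    while current is not None:
        stack.append(current)
        current = parents.get(current)  # None once we reach revision 1

    # Pop the oldest revision first and replay each layer over the doc.
    doc = {}
    while stack:
        revision = stack.pop()
        for result in line_query.execute(revision):
            doc[result['rule_name_id']] = result['check_string']
    return doc

# With the table above, recreating revision 9 replays 1, 2, 3, 7, 8, 9
# and never touches the abandoned 4, 5, 6 branch.
parents = {2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 3, 8: 7, 9: 8}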

Talking through this has encouraged me to take another look at using git as the backing store instead of a relational database.

by Adam Young at July 06, 2018 07:38 PM

July 04, 2018

RDO Blog

Community Blog Round-Up: 04 July

So much happened over the past month that it’s definitely time to set off the fireworks! To start, Steve Hardy shares his tips and tricks for TripleO containerized deployments, then Zane Bitter discusses the ever-expanding OpenStack Foundation, while Maria Bracho introduces us to Red Hat OpenStack Platform’s fast forward upgrades in a step-by-step overview, and so very much more. Obviously, prep the barbecue, it’s time for the fourth of July community blog round-up!

Red Hat OpenStack Platform: Two life-cycle choices to fit your organization by Maria Bracho, Principal Product Manager OpenStack

OpenStack®️ is a powerful platform for building private cloud environments that support modern, digital business operations. However, the OpenStack community’s six-month release cadence can pose challenges for enterprise organizations that want to deploy OpenStack in production. Red Hat can help.

Read more at https://redhatstackblog.redhat.com/2018/07/02/red-hat-openstack-platform-two-life-cycle-choices-to-fit-your-organization/

CPU model configuration for QEMU/KVM on x86 hosts by Daniel Berrange

With the various CPU hardware vulnerabilities reported this year, guest CPU configuration is now a security critical task. This blog post contains content I’ve written that is on its way to become part of the QEMU documentation.

Read more at https://www.berrange.com/posts/2018/06/29/cpu-model-configuration-for-qemu-kvm-on-x86-hosts/

Requirements for an OpenStack Access Control Policy Management Tool by Adam Young

“We need a read only role.”

Read more at https://adam.younglogic.com/2018/06/requirements-for-an-openstack-access-control-policy-management-tool/

Red Hat OpenStack Platform 13 is here! by Rosa Guntrip

Accelerate. Innovate. Empower. In the digital economy, IT organizations can be expected to deliver services anytime, anywhere, and to any device. IT speed, agility, and innovation can be critical to help stay ahead of your competition. Red Hat OpenStack Platform lets you build an on-premise cloud environment designed to accelerate your business, innovate faster, and empower your IT teams.

Read more at https://redhatstackblog.redhat.com/2018/06/27/red-hat-openstack-platform-13-is-here/

Red Hat Certified Cloud Architect – An OpenStack Perspective – Part Two by Chris Janiszewski – Senior OpenStack Solutions Architect – Red Hat Tiger Team

Previously we learned about what the Red Hat Certified Architect certification is and what exams are included in the “OpenStack-focused” version of the certification. This week we want to focus on personal experience and benefits from achieving this milestone.

Read more at https://redhatstackblog.redhat.com/2018/06/24/red-hat-certified-cloud-architect-an-openstack-perspective-part-two/

Red Hat OpenStack Platform fast forward upgrades: A step-by-step overview by Maria Bracho, Principal Product Manager OpenStack

New in Red Hat®️ OpenStack®️ Platform 13, the fast forward upgrade feature lets you easily move between long-life releases, without the need to upgrade to each in-between release. Fast forward upgrades fully containerize Red Hat OpenStack Platform deployment to simplify and speed the upgrade process while reducing interruptions and eliminating the need for additional hardware. Today, we’ll take a look at what the fast forward upgrade process from Red Hat OpenStack Platform 10 to Red Hat OpenStack Platform 13 looks like in practice.

Read more at https://redhatstackblog.redhat.com/2018/06/22/red-hat-openstack-platform-fast-forward-upgrades-a-step-by-step-overview/

Red Hat Certified Cloud Architect – An OpenStack Perspective – Part One by Chris Janiszewski – Senior OpenStack Solutions Architect – Red Hat Tiger Team

The Red Hat Certified Architect (RHCA) is the highest certification provided by Red Hat. To many, it can be looked at as a “holy grail” of sorts in open source software certifications. It’s not easy to get. In order to receive it, you not only need to already be a Red Hat Certified Engineer (RHCE) for Red Hat Enterprise Linux (with the Red Hat Certified System Administrator (RHCSA) as a prerequisite) but also pass additional exams from various technology categories.

Read more at https://redhatstackblog.redhat.com/2018/06/21/red-hat-certified-cloud-architect-an-openstack-perspective-part-one/

Tips on searching ceph-install-workflow.log on TripleO by John

  1. Only look at the logs relevant to the last run

Read more at http://blog.johnlikesopenstack.com/2018/06/tips-on-searching-ceph-install.html

TripleO Ceph Integration on the Road in June by John

The first week of June I went to an upstream TripleO workshop in Brno. The labs we used are at https://github.com/redhat-openstack/tripleo-workshop

Read more at http://blog.johnlikesopenstack.com/2018/06/tripleo-ceph-integration-on-road-in-june.html

The Expanding OpenStack Foundation by Zane Bitter

The OpenStack Foundation has begun the process of becoming an umbrella organisation for open source projects adjacent to but outside of OpenStack itself. However, there is no clear roadmap for the transformation, which has resulted in some confusion. After attending the joint leadership meeting with the Foundation Board of Directors and various Forum sessions that included some members of the board at the (2018) OpenStack Summit in Vancouver, I believe I can help shed some light on the situation. (Of course this is my subjective take on the topic, and I am not speaking for the Technical Committee.)

Read more at https://www.zerobanana.com/archive/2018/06/14#osf-expansion

Configuring a static address for wlan0 on Raspbian Stretch by Lars Kellogg-Stedman

Recent releases of Raspbian have adopted the use of dhcpcd to manage both dynamic and static interface configuration. If you would prefer to use the traditional /etc/network/interfaces mechanism instead, follow these steps.

Read more at https://blog.oddbit.com/2018/06/14/configuring-a-static-address-f/

Configuring collectd plugins with TripleO by mrunge

A way of deploying OpenStack is to use TripleO. This takes the approach of deploying a small OpenStack environment, and then using OpenStack-provided infrastructure and tools to deploy the actual production environment.

Read more at http://www.matthias-runge.de/2018/06/08/tripleo-collectd/

TripleO Containerized deployments, debugging basics by Steve Hardy

Since the Pike release, TripleO has supported deployments with OpenStack services running in containers. Currently we use docker to run images based on those maintained by the Kolla project. We already have some tips and tricks for container deployment debugging in tripleo-docs, but below are some more notes on my typical debug workflows.

Read more at https://hardysteven.blogspot.com/2018/06/tripleo-containerized-deployments.html

by Rain Leander at July 04, 2018 02:00 PM

July 02, 2018

Red Hat Stack

Red Hat OpenStack Platform: Two life-cycle choices to fit your organization

OpenStack®️ is a powerful platform for building private cloud environments that support modern, digital business operations. However, the OpenStack community’s six-month release cadence can pose challenges for enterprise organizations that want to deploy OpenStack in production. Red Hat can help.

Photo by elizabeth lies on Unsplash

Red Hat®️ OpenStack Platform is an intensely tested, hardened, and supported distribution of OpenStack based on community releases. In addition to production-grade features and functionality, it gives you two life-cycle choices to align with the way your organization operates:

  • Standard releases. These releases follow the six-month community release cadence and include one year of support.
  • Long-life releases. Starting with Red Hat OpenStack Platform 10, every third release is a long-life release. These include three years of support, with the option to extend support for an additional two years with extended life-cycle support (ELS), for up to five years of support total.

Why does this matter? Different organizations have different needs when it comes to infrastructure life cycles and management. Some need to implement the latest innovations as soon as they are available, and have the processes in place to continuously upgrade and adapt their IT environment. For others, the ability to standardize and stabilize operations for long durations of time is paramount. These organizations may not need the newest features right away—periodic updates are fine.

Photo by Tristan Colangelo on Unsplash

Red Hat OpenStack Platform life-cycle options accommodate both of these approaches. Organizations that need constant innovation can upgrade to the latest Red Hat OpenStack Platform release every six months to take advantage of new features as they become available. Organizations that prefer to use a given release for a longer time can skip standard releases and simply upgrade between long-life releases every 18 to 60 months.

Here’s a deeper look into each option and why you might choose one over the other.

Standard upgrade path

With this approach, you upgrade every six to twelve months as a new release of Red Hat OpenStack Platform is made available. Red Hat OpenStack Platform director provides upgrade tooling to simplify the upgrade process. As a result, you can adopt the latest features and innovations as soon as possible. This keeps your cloud infrastructure aligned closely with the upstream community releases, so if you’re active in the OpenStack community, you’ll be able to take advantage of your contributions sooner.

This upgrade path typically requires organizations to have processes in place to efficiently manage continuously changing infrastructure. If you have mature, programmatic build and test processes, you’re in good shape.

The standard upgrade path is ideal for organizations involved in science and research, financial services, and other fields that innovate fast and change quickly.

Photo by Jordan Ladikos on Unsplash

 

Long-life upgrade path

With this approach, you upgrade every 18 to 60 months between long-life releases of Red Hat OpenStack Platform, skipping two standard releases at a time. Starting with Red Hat OpenStack Platform 13, the fast forward upgrade feature in director simplifies the upgrade process by fully containerizing Red Hat OpenStack Platform deployment. This minimizes interruptions due to upgrading and eliminates the need for additional hardware to support the upgrade process. As a result, you can use a long-life release, like Red Hat OpenStack Platform 10 or 13, for an extended time to stabilize operations. Based on customer requests and feasibility reviews, select features in later standard releases may be backported to the last long-life release (Full Support phase only), so you can still gain access to some new features between upgrades.

The long-life upgrade path works well for organizations that are more familiar and comfortable with traditional virtualization and may still be adopting a programmatic approach to IT operations.

This path is ideal for organizations that prefer to standardize on infrastructure and don’t necessarily need access to the latest features right away. Organizations involved in telecommunications and other regulated fields often choose the long-life upgrade path.

Wrapping up

With two life-cycle options for Red Hat OpenStack Platform, Red Hat supports you no matter where you are in your cloud journey. If you have questions about which path is best for your organization, contact us and we’ll help you get started.

Learn more about Red Hat OpenStack Platform:

by Maria Bracho, Principal Product Manager OpenStack at July 02, 2018 12:36 PM

June 29, 2018

Daniel Berrange

CPU model configuration for QEMU/KVM on x86 hosts

With the various CPU hardware vulnerabilities reported this year, guest CPU configuration is now a security critical task. This blog post contains content I’ve written that is on its way to become part of the QEMU documentation.

QEMU / KVM virtualization supports two ways to configure CPU models

Host passthrough
This passes the host CPU model features, model, stepping, exactly to the guest. Note that KVM may filter out some host CPU model features if they cannot be supported with virtualization. Live migration is unsafe when this mode is used as libvirt / QEMU cannot guarantee a stable CPU is exposed to the guest across hosts. This is the recommended CPU to use, provided live migration is not required.
Named model
QEMU comes with a number of predefined named CPU models, that typically refer to specific generations of hardware released by Intel and AMD. These allow the guest VMs to have a degree of isolation from the host CPU, allowing greater flexibility in live migrating between hosts with differing hardware.

In both cases, it is possible to optionally add or remove individual CPU features, to alter what is presented to the guest by default.

Libvirt supports a third way to configure CPU models known as “Host model”. This uses the QEMU “Named model” feature, automatically picking a CPU model that is similar to the host CPU, and then adding extra features to approximate the host model as closely as possible. This does not guarantee the CPU family, stepping, etc will precisely match the host CPU, as they would with “Host passthrough”, but gives much of the benefit of passthrough, while making live migration safe.

Recommendations for KVM CPU model configuration on x86 hosts

The information that follows provides recommendations for configuring CPU models on x86 hosts. The goals are to maximise performance, while protecting guest OS against various CPU hardware flaws, and optionally enabling live migration between hosts with heterogeneous CPU models.

Preferred CPU models for Intel x86 hosts

The following CPU models are preferred for use on Intel hosts. Administrators / applications are recommended to use the CPU model that matches the generation of the host CPUs in use. In a deployment with a mixture of host CPU models between machines, if live migration compatibility is required, use the newest CPU model that is compatible across all desired hosts.

Skylake-Server
Skylake-Server-IBRS
Intel Xeon Processor (Skylake, 2016)
Skylake-Client
Skylake-Client-IBRS
Intel Core Processor (Skylake, 2015)
Broadwell
Broadwell-IBRS
Broadwell-noTSX
Broadwell-noTSX-IBRS
Intel Core Processor (Broadwell, 2014)
Haswell
Haswell-IBRS
Haswell-noTSX
Haswell-noTSX-IBRS
Intel Core Processor (Haswell, 2013)
IvyBridge
IvyBridge-IBRS
Intel Xeon E3-12xx v2 (Ivy Bridge, 2012)
SandyBridge
SandyBridge-IBRS
Intel Xeon E312xx (Sandy Bridge, 2011)
Westmere
Westmere-IBRS
Westmere E56xx/L56xx/X56xx (Nehalem-C, 2010)
Nehalem
Nehalem-IBRS
Intel Core i7 9xx (Nehalem Class Core i7, 2008)
Penryn
Intel Core 2 Duo P9xxx (Penryn Class Core 2, 2007)
Conroe
Intel Celeron_4x0 (Conroe/Merom Class Core 2, 2006)

Important CPU features for Intel x86 hosts

The following are important CPU features that should be used on Intel x86 hosts, when available in the host CPU. Some of them require explicit configuration to enable, as they are not included by default in some, or all, of the named CPU models listed above. In general all of these features are included if using “Host passthrough” or “Host model”.

pcid
Recommended to mitigate the cost of the Meltdown (CVE-2017-5754) fix. Included by default in Haswell, Broadwell & Skylake Intel CPU models. Should be explicitly turned on for Westmere, SandyBridge, and IvyBridge Intel CPU models. Note that some desktop/mobile Westmere CPUs cannot support this feature.
spec-ctrl
Required to enable the Spectre (CVE-2017-5753 and CVE-2017-5715) fix, in cases where retpolines are not sufficient. Included by default in Intel CPU models with -IBRS suffix. Must be explicitly turned on for Intel CPU models without -IBRS suffix. Requires the host CPU microcode to support this feature before it can be used for guest CPUs.
ssbd
Required to enable the CVE-2018-3639 fix. Not included by default in any Intel CPU model. Must be explicitly turned on for all Intel CPU models. Requires the host CPU microcode to support this feature before it can be used for guest CPUs.
pdpe1gb
Recommended to allow guest OS to use 1GB size pages. Not included by default in any Intel CPU model. Should be explicitly turned on for all Intel CPU models. Note that not all CPU hardware will support this feature.

Preferred CPU models for AMD x86 hosts

The following CPU models are preferred for use on AMD hosts. Administrators / applications are recommended to use the CPU model that matches the generation of the host CPUs in use. In a deployment with a mixture of host CPU models between machines, if live migration compatibility is required, use the newest CPU model that is compatible across all desired hosts.

EPYC
EPYC-IBPB
AMD EPYC Processor (2017)
Opteron_G5
AMD Opteron 63xx class CPU (2012)
Opteron_G4
AMD Opteron 62xx class CPU (2011)
Opteron_G3
AMD Opteron 23xx (Gen 3 Class Opteron, 2009)
Opteron_G2
AMD Opteron 22xx (Gen 2 Class Opteron, 2006)
Opteron_G1
AMD Opteron 240 (Gen 1 Class Opteron, 2004)

Important CPU features for AMD x86 hosts

The following are important CPU features that should be used on AMD x86 hosts, when available in the host CPU. Some of them require explicit configuration to enable, as they are not included by default in some, or all, of the named CPU models listed above. In general all of these features are included if using “Host passthrough” or “Host model”.

ibpb
Required to enable the Spectre (CVE-2017-5753 and CVE-2017-5715) fix, in cases where retpolines are not sufficient. Included by default in AMD CPU models with -IBPB suffix. Must be explicitly turned on for AMD CPU models without -IBPB suffix. Requires the host CPU microcode to support this feature before it can be used for guest CPUs.
virt-ssbd
Required to enable the CVE-2018-3639 fix. Not included by default in any AMD CPU model. Must be explicitly turned on for all AMD CPU models. This should be provided to guests, even if amd-ssbd is also provided, for maximum guest compatibility. Note for some QEMU / libvirt versions, this must be force enabled when using “Host model”, because this is a virtual feature that doesn’t exist in the physical host CPUs.
amd-ssbd
Required to enable the CVE-2018-3639 fix. Not included by default in any AMD CPU model. Must be explicitly turned on for all AMD CPU models. This provides higher performance than virt-ssbd so should be exposed to guests whenever available in the host. virt-ssbd should nonetheless also be exposed for maximum guest compatibility as some kernels only know about virt-ssbd.
amd-no-ssb
Recommended to indicate the host is not vulnerable to CVE-2018-3639. Not included by default in any AMD CPU model. Future hardware generations of CPU will not be vulnerable to CVE-2018-3639, and thus the guest should be told not to enable its mitigations, by exposing amd-no-ssb. This is mutually exclusive with virt-ssbd and amd-ssbd.
pdpe1gb
Recommended to allow guest OS to use 1GB size pages. Not included by default in any AMD CPU model. Should be explicitly turned on for all AMD CPU models. Note that not all CPU hardware will support this feature.

Default x86 CPU models

The default QEMU CPU models are designed such that they can run on all hosts. If an application does not wish to perform any host compatibility checks before launching guests, the default is guaranteed to work.

The default CPU models will, however, leave the guest OS vulnerable to various CPU hardware flaws, so their use is strongly discouraged. Applications should follow the earlier guidance to setup a better CPU configuration, with host passthrough recommended if live migration is not needed.

qemu32
qemu64
QEMU Virtual CPU version 2.5+ (32 & 64 bit variants). qemu64 is used for x86_64 guests and qemu32 is used for i686 guests, when no -cpu argument is given to QEMU, or no <cpu> is provided in libvirt XML.

Other non-recommended x86 CPUs

The following CPU models are compatible with most AMD and Intel x86 hosts, but their usage is discouraged, as they expose a very limited featureset, which prevents guests having optimal performance.

kvm32
kvm64
Common KVM processor (32 & 64 bit variants). Legacy models just for historical compatibility with ancient QEMU versions.
486
athlon
phenom
coreduo
core2duo
n270
pentium
pentium2
pentium3
Various very old x86 CPU models, mostly predating the introduction of hardware assisted virtualization, that should thus not be required for running virtual machines.
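When deciding which of the models above a given host can actually provide, it helps to ask QEMU and libvirt directly. The commands below are one way to do that (output varies by software version and host hardware):

   $ qemu-system-x86_64 -cpu help      # CPU models known to this QEMU binary
   $ virsh cpu-models x86_64           # CPU models known to libvirt for x86_64
   $ virsh domcapabilities             # includes the expansion libvirt would use for host-model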

Syntax for configuring CPU models

The examples below illustrate the approach to configuring the various CPU models / features in QEMU and libvirt.

QEMU command line

Host passthrough
   $ qemu-system-x86_64 -cpu host

With feature customization:

   $ qemu-system-x86_64 -cpu host,-vmx,...
Named CPU models
   $ qemu-system-x86_64 -cpu Westmere

With feature customization:

   $ qemu-system-x86_64 -cpu Westmere,+pcid,...

Libvirt guest XML

Host passthrough
   <cpu mode='host-passthrough'/>

With feature customization:

   <cpu mode='host-passthrough'>
       <feature name="vmx" policy="disable"/>
       ...
   </cpu>
Host model
   <cpu mode='host-model'/>

With feature customization:

   <cpu mode='host-model'>
       <feature name="vmx" policy="disable"/>
       ...
   </cpu>
Named model
   <cpu mode='custom'>
       <model>Westmere</model>
   </cpu>

With feature customization:

   <cpu mode='custom'>
       <model>Westmere</model>
       <feature name="pcid" policy="require"/>
       ...
   </cpu>

 

by Daniel Berrange at June 29, 2018 12:49 PM

June 28, 2018

Adam Young

Requirements for an OpenStack Access Control Policy Management Tool

“We need a read only role.”

It seems like such a simple requirement.  Users have been requesting a read-only role for several years now.  Why is it so tough to implement?   Because it calls for  modifying access control policy across multiple, disjoint services deployed at innumerable distinct locations.

“We need help in modifying policy to implement our own read only role.”

This one is a little bit more attainable.  We should be able to provide better tools to help people customize their policy.  What should that look like?

We gathered some information at the last summit, and I am going to try and distill it to a requirements document here.

Definitions

  • Verb and Path:  the combination of the HTTP verb and the templated sub path that is used by the mapping engines.  If I were to use Curl to call https://hostname:5000/v3/users/a0123ab6, the verb would be the implicit GET, and the path would be /v3/users/{user_id}.
  • policy key:  the key in the policy.json and policy.yaml file that is used to match the python code to the policy.  For example, the Keystone GET /v3/users/{user_id} verb and path tests against the policy key identity:get_user.
  • API Policy Mapping:  the mapping from Verb and Path to Policy key.

The tool needs to be run from the installer. While that means Tripleo for my team, it should be a tool that can be enlisted into any of the installers.  It should also be able to run for day 2 operations from numerous tools.

It should not be deployed as a standard service, at least not one tied in with the active OpenStack install, as modifying policy is a tricky and potentially destructive and dangerous operation.

Input

Policy files need to be gathered from the various services, but this tool does not need to do that; the variations in how to generate, collect, and distribute policy files are too numerous to solve in a single, focused tool.  Collection and distribution fit more naturally into Ansible playbooks than into a tool for modifying policy.

External API definitions

End users need to be able to test their policy.  While the existing oslo-policy command line can tell whether a token would or would not pass the checks, those checks are done at the policy key level.  All integration is done at the URL level, even if it then passes through libraries or the CLI.  The Verb and URL can be retrieved from network tools or the debug mode of the CLI, and matched against the tuple of (service, verb, template path) to link back to the policy key, and thus the policy rule that oslo-policy will enforce.  Deducing this mapping must be easy.  With this mapping, additional tools can mock a request/response to test whether a given set of auth-data would pass or fail a request.  Thus, the tool should accept a simple format for uploading the mappings of Verb and Path to policy key.
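For reference, the key-level check that exists today looks roughly like the following with the oslo.policy command line tools. This is a sketch: the file paths are placeholders (the access file would hold the token’s auth data), and the exact flags may differ between releases.

oslopolicy-checker --policy /etc/keystone/policy.yaml \
    --access my-token-auth-data.json \
    --rule identity:get_user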

Policy.json

Policy files have several implementations.  The old Policy.json structure provides the least amount of information. Here is a sample:

"context_is_admin": "role:admin",
"default": "role:admin",

"add_image": "",
"delete_image": "",
"get_image": "",
"get_images": "",
"modify_image": "",
"publicize_image": "role:admin",
"copy_from": "",

policy.yaml

The policy-in-code structure provides the most information, including the HTTP Verbs and templated Paths that map to the rules that are the keys in the policy files. The Python code that is used by oslo-policy to generate the sample YAML files uses, but does not expose, all of that data.  Here is an example:

# This policy only checks if the user has access to the requested
# project limits. And this check is performed only after the check
# os_compute_api:limits passes
# GET /limits
# "os_compute_api:os-used-limits": "rule:admin_api"

A secondary tool should expose all this data as YAML, probably as a modification of the oslo-policy CLI.  The management tool should be able to consume this format.  It should also be able to consume a document that maps the policy keys to the Verb and Path, separate from the policy itself.
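As a starting point, the data that policy-in-code already carries can be dumped today with the oslo.policy generators. This is a sketch; the keystone namespace and the output file names are placeholders.

oslopolicy-sample-generator --namespace keystone --output-file keystone.policy.yaml.sample
oslopolicy-policy-generator --namespace keystone --output-file keystone.effective.yaml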

Upgrades

A new version of an OpenStack service will likely have new APIs.  These APIs will not be covered by existing policy.  However, if a site has made major efforts to customize policy in the past, it will not want to lose and redo all of those changes.  Thus, it should be possible to upload a new file indicating the overall API mapping, or just the changes to it, from a previous version.  If an updated policy-in-code format is available, that file should merge in with the existing policy modifications.  The user needs to be able to identify:

  • Any new APIs that require application of the transformations listed below
  • Any changes to base policy that the user has customized and that now conflict with the new assumptions.  The tool user should be able to accept the old version, accept the new version, or produce a manually merged version.

Transformations

End users need to be able to describe the transformations that they need to perform in simple terms.  Here are some that have been identified so far:

  • ensure that all APIs match against some role
  • ensure that APIs that require a role (especially admin) also perform a scope check
  • switch the role used for a given operation or set of operations
  • standardize the meaning of interim rules such as “owner.”
  • Inline an interim rule into the rules that use it
  • Extract an interim rule from all the rules that have a common fragment

Implied Roles

The Implied Roles mechanism provides additional support for writing policy.  The tool should be able to help its users take advantage of implied roles:

  • Make use of implied roles to simplify complex matching rules
  • Make use of implied roles to provide additional granularity for an API
  • Make it possible to expand implied rules in the policy file based on a data model
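For example, a rule stating that admin implies member can be created with the standard client today; a minimal sketch, assuming both roles already exist:

openstack implied role create admin --implied-role member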

Change sets

The operations to transform the rules are complex enough that users will need to be able to roll them forward and back, much like a set of changes to a git repository.

User Interface

While the tool should have a visible user interface, the majority of the business logic should reside in an API that is callable from other systems.  This seems to imply a pattern of a REST API plus a visible UI toolkit.

The User Interface should make working with large sets of rules possible and convenient.  Appropriate information hiding and selection should be coupled with the transformations to select the set of rules to be transformed.

Datastore

The data store for the application should be light enough to run during the install process. For example, SQLite would be preferred over MySQL.

Output

The tool should be able to produce the individual policy files consumed by the APIs.

It is possible to have a deployment where different policy is in place for different endpoints of the same service.  The tools should support endpoint specific overrides.  However, the main assumption is that these will be small changes from the core service definitions.  As such, they should be treated as “service X plus these changes” as opposed to a completely separate set of policy rules.

 

by Adam Young at June 28, 2018 06:57 PM

Red Hat Stack

Red Hat OpenStack Platform 13 is here!

Accelerate. Innovate. Empower.

In the digital economy, IT organizations can be expected to deliver services anytime, anywhere, and to any device. IT speed, agility, and innovation can be critical to help stay ahead of your competition. Red Hat OpenStack Platform lets you build an on-premise cloud environment designed to accelerate your business, innovate faster, and empower your IT teams.


Accelerate. Red Hat OpenStack Platform can help you accelerate IT activities and speed time to market for new products and services. Red Hat OpenStack Platform helps simplify application and service delivery using an automated self-service IT operating model, so you can provide users with more rapid access to resources. Using Red Hat OpenStack Platform, you can build an on-premises cloud architecture that can provide resource elasticity, scalability, and increased efficiency to launch new offerings faster.

Innovate. Red Hat OpenStack Platform enables you to differentiate your business by helping to make new technologies more accessible without sacrificing current assets and operations. Red Hat’s open source development model combines faster-paced, cross-industry community innovation with production-grade hardening, integrations, support, and services. Red Hat OpenStack Platform is designed to provide an open and flexible cloud infrastructure ready for modern, containerized application operations while still supporting the traditional workloads your business relies on.

Empower. Red Hat OpenStack Platform helps your IT organization deliver new services with greater ease. Integrations with Red Hat’s open software stack let you build a more flexible and extensible foundation for modernization and digital operations. A large partner ecosystem helps you customize your environment with third-party products, with greater confidence that they will be interoperable and stable.

With Red Hat OpenStack Platform 13, Red Hat continues to bring together community-powered innovation with the stability, support, and services needed for production deployment. Red Hat OpenStack Platform 13 is a long-life release with up to three years of standard support and an additional, optional two years of extended life-cycle support (ELS). This release includes many features to help you adopt cloud technologies more easily and support digital transformation initiatives.

Fast forward upgrades

With both standard and long-life releases, Red Hat OpenStack Platform lets you choose when to implement new features in your cloud environment:

  • Upgrade every six months and benefit from one year of support on each release.
  • Upgrade every 18 months with long-life releases and benefit from 3 years of support on that release, with an optional ELS totaling up to 5 years of support. Long-life releases include innovations from all previous releases.

Now, with the fast forward upgrade feature, you can skip between long-life releases on an 18-month upgrade cadence. Fast forward upgrades fully containerize Red Hat OpenStack Platform deployment to simplify the process of upgrading between long-life releases. This means that customers who are currently using Red Hat OpenStack Platform 10 have an easier upgrade path to Red Hat OpenStack Platform 13—with fewer interruptions and no need for additional hardware.

Red Hat OpenStack Platform life cycle by version

Containerized OpenStack services

Red Hat OpenStack Platform now supports containerization of all OpenStack services. This means that OpenStack services can be independently managed, scaled, and maintained throughout their life cycle, giving you more control and flexibility. As a result, you can simplify service deployment and upgrades and allocate resources more quickly, efficiently, and at scale.

Red Hat stack integrations

The combination of Red Hat OpenStack Platform with Red Hat OpenShift provides a modern, container-based application development and deployment platform with a scalable hybrid cloud foundation. Kubernetes-based orchestration simplifies application portability across scalable hybrid environments, designed to provide a consistent, more seamless experience for developers, operations, and users.

Red Hat OpenStack Platform 13 delivers several new integrations with Red Hat OpenShift Container Platform:

  • Integration of openshift-ansible into Red Hat OpenStack Platform director eases troubleshooting and deployment.
  • Network integration using the Kuryr OpenStack project unifies network services between the two platforms, designed to eliminate the need for multiple network overlays and reduce performance and interoperability issues.  
  • Load Balancing-as-a-Service with Octavia provides highly available cloud-scale load balancing for traditional or containerized workloads.

Additionally, support for the Open Virtual Networking (OVN) networking stack supplies consistency between Red Hat OpenStack Platform, Red Hat OpenShift, and Red Hat Virtualization.

Security features and compliance focus

Security and compliance are top concerns for organizations deploying clouds. Red Hat OpenStack Platform includes integrated security features to help protect your cloud environment. It encrypts control flows and, optionally, data stores and flows, enhancing the privacy and integrity of your data both at rest and in motion.

Red Hat OpenStack Platform 13 introduces several new, hardened security services designed to help further safeguard enterprise workloads:

  • Programmatic, API-driven secrets management through Barbican
  • Encrypted communications between OpenStack services using Transport Layer Security (TLS) and Secure Sockets Layer (SSL)
  • Cinder volume encryption and Glance image signing and verification

Additionally, Red Hat OpenStack Platform 13 can help your organization meet relevant technical and operational controls found in risk management frameworks globally. Red Hat can help support compliance guidance provided by government standards organizations, including:

  • The Federal Risk and Authorization Management Program (FedRAMP) is a U.S. government-wide program that provides a standardized approach to security assessment, authorization, and continuous monitoring for cloud products and services.
  • Agence nationale de la sécurité des systèmes d’information (ANSSI) is the French national authority for cyber-defense and network and information security (NIS).

An updated security guide is also available to help you when deploying a cloud environment.

Storage and hyperconverged infrastructure options

Red Hat Ceph Storage provides unified, highly scalable, software-defined block, object, and file storage for Red Hat OpenStack Platform deployments and services. Integration between the two enables you to deploy, scale, and manage your storage back end just like your cloud infrastructure. New storage integrations included in Red Hat OpenStack Platform 13 give you more choice and flexibility. With support for the OpenStack Manila project, you can use the CephFS NFS file share as a service to better support applications using file storage. As a result, you can choose the type of storage for each workload, from a unified storage platform.

Red Hat Hyperconverged Infrastructure for Cloud combines Red Hat OpenStack Platform and Red Hat Ceph Storage into a single offering with a common life cycle and support. Both Red Hat OpenStack Platform compute and Red Hat Ceph Storage functions are run on the same host, enabling consolidation and efficiency gains. NFV use cases for Red Hat Hyperconverged Infrastructure for Cloud include:

  • Core datacenters
  • Central office datacenters
  • Edge and remote point of presence (POP) environments
  • Virtual radio access networks (vRAN)
  • Content delivery networks (CDN)

You can also add hyperconverged capabilities to your current Red Hat OpenStack Platform subscriptions using an add-on SKU.

Red Hat Hyperconverged Infrastructure for Cloud use cases

Telecommunications optimizations

Red Hat OpenStack Platform 13 delivers new telecommunications-specific features that allow CSPs to build innovative, cloud-based network infrastructure more easily:

  • OpenDaylight integration lets you connect your OpenStack environment with the OpenDaylight software-defined networking (SDN) controller, giving it greater visibility into and control over OpenStack networking, utilization, and policies.
  • Real-time Kernel-based Virtual Machine (KVM) support designed to deliver ultra-low latency for performance-sensitive environments.
  • Open vSwitch (OVS) offload support (tech preview) lets you implement single root input/output virtualization (SR-IOV) to help reduce the performance impact of virtualization and deliver better performance for high IOPS applications.
Red Hat OpenStack Platform and OpenDaylight cooperation

Learn more

Red Hat OpenStack Platform combines community-powered innovation with enterprise-grade features and support to help your organization build a production-ready private cloud. With it, you can accelerate application and service delivery, innovate faster to differentiate your business, and empower your IT teams to support digital initiatives.

Learn more about Red Hat OpenStack Platform:

by Rosa Guntrip, Senior Principal Product Marketing Manager at June 28, 2018 12:53 AM

June 25, 2018

Red Hat Stack

Red Hat Certified Cloud Architect – An OpenStack Perspective – Part Two

Previously we learned about what the Red Hat Certified Architect certification is and what exams are included in the “OpenStack-focused” version of the certification. This week we want to focus on personal experience and benefits from achieving this milestone.

Let’s be honest, even for the most skilled engineers the path to becoming an RHCA can be quite challenging and even a little bit intimidating!  Not only do the exams test your ability to perform specific tasks based on the certification requirements, but they also test your ability to repurpose that knowledge and combine it with the knowledge of other technologies while solving extremely complex scenarios.  This can make achieving the RHCA even more difficult; however, it also makes achieving the RHCA extremely validating and rewarding.

Photo by Samuel Clara on Unsplash

Many busy professionals decide to prepare for the exams with Red Hat Online Learning (ROLE), which allows students to access the same robust course content and hands-on lab experience delivered in classroom training from the comfort of their own computer and at their own pace. This is made even easier through the Red Hat Learning Subscription (RHLS).

RHLS provides access to the entire Red Hat courseware catalog, including video classrooms, for a single, convenient price per year. This kind of access can help you prepare for all the certifications. We found that before sitting an exam, it was important to be able to perform 100 percent of the respective ROLE lab without referring back to any documentation for help; with RHLS this is much easier to do!  

While documentation and man pages are available during an exam, they should be used as a resource and not a replacement for deep knowledge. Indeed, it’s much better to make sure you know it by heart without needing to look! We also found that applying the comprehensive reviews found at the end of each ROLE course to real world scenarios helped us better understand how what we learned in the course applied to what we would do on a day-to-day basis.  

For example, when taking the Ansible ROLE course DO407, which uses a comprehensive virtual environment and a video classroom version, we were easily able to spawn instances in our own physical OpenStack environment and apply what we had learned in the course to the real world.  By putting the courseware into action in the real world it better allowed us to align the objectives of the course to real-life scenarios, making the knowledge more practical and easier to retain.

What about formal training?

Photo by Nathan Dumlao on Unsplash

We wouldn’t recommend for anyone to just show up at the examination room without taking any formal training. Even if you feel that your level of proficiency in any of these technologies is advanced, keep in mind that Red Hat exams go very deep, covering large portions of the technology. For example, you might be an ‘Ansible Ninja’ writing playbooks for a living. But how often do you work with dynamic inventories or take advantage of delegation, vaults or parallelism? The same applies for any other technology you want to test yourself in, there is a good chance it will cover aspects you are not familiar with.

The value comes from having the combination of skills.  Take the example of an auto mechanic who is great at rebuilding a transmission, but may not know how to operate a manual transmission!  You can’t be an expert at one without knowing a lot about the other.

For us, this is where Red Hat training has been invaluable. With every exam there is a corresponding class provided. These classes not only cover each aspect of the technology (and beyond) that you will be tested on, but also provide self-paced lab modules and access to lab environments. They are usually offered with either a live instructor or via an online option so you can juggle the education activities with your ‘day job’ requirements!

More information about the classes for these exams can be found on the Red Hat Training site. 

How long does it take?

It doesn’t have to take long at all. If you already have an RHCE in Red Hat Enterprise Linux and OpenStack is not a new subject to you, the training will serve as an excellent reminder rather than something that you have to learn from scratch. Some people may even be able to complete all 5 exams in less than a month.

But does everyone want to go that fast? Probably not.

Photo by Estée Janssens on Unsplash

When our customers ask us about what we recommend to achieve these certifications in a realistic timeframe we suggest the Red Hat Learning Subscription to them. As mentioned, it gives you amazing access to Red Hat courseware.

But it is more than that.

The Red Hat Learning Subscription is a program for individuals and organizations that not only provides the educational content to prepare you for the exams (including videos and lab access), but also, in some cases, may include actual exams (and some retakes) at many Red Hat certified facilities. It is valid for one year, which is plenty of time to work through all the courses and exams.

This kind of flexibility can help to shape an individual learning path.

For instance, imagine doing it like this:

With the Red Hat Learning Subscription you could schedule all the exams in advance at two-month intervals. These exams then become your milestones and give you a good, predictable path for studying. You can always reschedule them if something urgent comes up. Sign up for the classes as well, but don’t take them too far ahead of your exam. Then re-take all the self-paced labs a week before your exam, without reading the guided instructions. After that you should be in a position to assess your readiness for the exams and reach the ultimate goal of an RHCA.

Don’t get discouraged if you don’t pass on the first try; it’s not unusual even for subject experts to fail at first. Simply close the knowledge gaps and retake the exam. And with RHLS, you’ve got the access and time to do so!

The benefits of becoming RHCA can be substantial. Outside of gaining open source “street cred”, the most important aspect is, of course, for your career – it’s simple: you can get better at your job.

Photo by Clark Tibbs on Unsplash

And of course, being better at your job can translate to being more competitive in the job market, which can lead to being more efficient in your current role and potentially even bring additional financial compensation!

But becoming an RHCA is so much more. It helps to broaden your horizons. You can learn more ways to tackle real life business problems, including how to become more capable of taking leadership roles through translating problems into technology solutions.

As a proud Red Hat Certified Architect you will have the tools to help make the IT world a better place!

So what are you waiting for … go get it!


Ready to start your certification journey? Get in touch with the friendly Red Hatters at Red Hat Training in your local area today to find all the ways you can master the skills you need to accelerate your career and run your enterprise cloud!


About the authors:

Chris Janiszewski is a Red Hat OpenStack Solutions Architect. He is proud to help his clients validate their business and technical use cases on OpenStack and supporting components like storage, networking or cloud automation and management. He is the father of two little kids and enjoys the majority of his free time playing with them.  When the kids are asleep he gets to put the “geek hat” on and build OpenStack labs to hack crazy use cases!


Ken Holden is a Senior Solution Architect with Red Hat.  He has spent the past 3 years on the OpenStack Tiger Team with the primary responsibility of deploying Red Hat OpenStack Platform Proof-Of-Concept IaaS Clouds for Strategic Enterprise Customers across North America.  Throughout his 20 year career in Enterprise IT, Ken has focussed on Linux, Unix, Storage, Networking, and Security with the past 5 years being primarily focused on Cloud Solutions. Ken has achieved Red Hat Certified Architect status (RHCA 110-009-776) and holds Certified OpenStack Administrator status (COA-1700-0387-0100) with the OpenStack Foundation. Outside of work, Ken spends the majority of his time with his wife and two daughters, but also aspires to be the world’s most OK Guitar Player when time permits!

by Chris Janiszewski - Senior OpenStack Solutions Architect - Red Hat Tiger Team at June 25, 2018 02:54 AM

June 22, 2018

Red Hat Stack

Red Hat OpenStack Platform fast forward upgrades: A step-by-step overview

New in Red Hat®️ OpenStack®️ Platform 13, the fast forward upgrade feature lets you easily move between long-life releases, without the need to upgrade to each in-between release. Fast forward upgrades fully containerize Red Hat OpenStack Platform deployment to simplify and speed the upgrade process while reducing interruptions and eliminating the need for additional hardware. Today, we’ll take a look at what the fast forward upgrade process from Red Hat OpenStack Platform 10 to Red Hat OpenStack Platform 13 looks like in practice.


There are six main steps in the process:

  1. Cloud backup. Back up your existing cloud.
  2. Minor update. Update to the latest minor release.
  3. Undercloud upgrade. Upgrade your undercloud.
  4. Overcloud preparation. Prepare your overcloud.
  5. Overcloud upgrade. Upgrade your overcloud.
  6. Convergence. Converge your environment.

Step 1: Back up your existing cloud

First, you need to back up everything in your existing Red Hat OpenStack Platform 10 cloud, including your undercloud, overcloud, and any supporting services. It’s likely that you already have these procedures in place, but Red Hat also provides comprehensive Ansible playbooks to simplify the fast forward process even more.

Manual backup procedures are likewise supported by Red Hat’s Customer Experience and Engagement (CEE) group.

A typical OpenStack backup process may involve the following steps:

  1. Notify your users.
  2. Purge your databases, including any unnecessary data stored by Heat or other OpenStack services. This will help to streamline the backup and upgrade process.
  3. Run undercloud and overcloud backups. This will preserve an initial backup of the cloud – it may take some time if you don’t have another backup to reference up to this point in time.

By performing a backup before starting the upgrade, you can speed the overall upgrade process by only requiring smaller backups later on.

Step 2: Update to the latest minor release

Photo by Lucas Davies on Unsplash

Next, update your Red Hat OpenStack Platform environment to the latest minor release using the standard minor update processes. This step consolidates all undercloud and overcloud node reboots required for moving to Red Hat OpenStack Platform 13. This simplifies the overall upgrade, as no reboots are needed in later steps. For example, an upgrade from Red Hat OpenStack Platform 10 to the latest, fast forward-ready minor release will update Open vSwitch (OVS) to version 2.9, Red Hat Enterprise Linux to version 7.5, and Red Hat Ceph®️ Storage to version 2.5 in your overcloud. These steps do require node reboots, so you can live-migrate workloads prior to rebooting nodes to avoid downtime.
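Live migration at this point is driven with the normal client commands; for example (a sketch with placeholder names, and the exact syntax depends on your client version):

$ openstack server migrate --live <target-compute-host> <server-name-or-id>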

Step 3: Upgrade your undercloud

In this step, you’ll upgrade Red Hat OpenStack Platform director, known as the undercloud, to the new long-life release. This requires manual rolling updates from Red Hat OpenStack Platform 10 to 11 to 12 to 13, but does not require any reboots, as they were completed in the previous minor update. The same action pattern is repeated for each release: enable the new repository, stop main OpenStack Platform services, upgrade director’s main packages, and upgrade the undercloud. Note that Red Hat OpenStack Platform director will not be able to manage the version 10 overcloud during or after these upgrades.
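Each hop in that rolling update follows the same pattern. The sketch below shows one iteration (10 to 11); the repository names and package set are placeholders drawn from the general pattern, so always follow the release documentation for the authoritative steps:

$ sudo subscription-manager repos --disable=rhel-7-server-openstack-10-rpms \
      --enable=rhel-7-server-openstack-11-rpms
$ sudo systemctl stop 'openstack-*' 'neutron-*' httpd
$ sudo yum -y update python-tripleoclient
$ openstack undercloud upgrade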

Step 4: Prepare your overcloud

Red Hat OpenStack Platform 13 introduces containerized OpenStack services to the long-life release cadence. This step goes through the process to create the container registry to support the deployment of these new services during the fast forward procedure.

Photo by Arnel Hasanovic on Unsplash

The first part of this step is to prepare the container images to accomplish this:

  1. Upload Red Hat OpenStack Platform 13 container images to your cloud environment. These can be stored on the director node or on additional hardware. If you choose to store them on your director node, ensure that the node has enough space available for the images. Note that during this part, your undercloud will be unable to scale your overcloud.

Next, you’ll prepare your overcloud for features introduced in Red Hat OpenStack Platform 11 and 12, including composable networks and roles:

  1. Include new services in any custom roles_data files.
  2. Edit any custom roles_data files to add composable networks (new for Red Hat OpenStack Platform 13) to each role.
  3. Remove deprecated services from any custom roles_data files and update deprecated parameters in custom environment files.

If you have a Red Hat OpenStack Platform director-managed Red Hat Ceph Storage cluster or storage backends, you’ll also need to prepare your storage nodes for new, containerized configuration methods.

  1. Install the ceph-ansible package of playbooks in your undercloud and check that you are using the latest resources and configurations in your storage environment file.
  2. Update custom storage backend environment files to include new parameters and resources for composable services. This applies to NetApp, Dell EMC, and Dell EqualLogic block storage backends using cinder.

Finally, if your undercloud uses SSL/TLS for its Public API, you’ll need to allow your overcloud to access your undercloud’s OpenStack Object Storage (swift) Public API during the upgrade process.

  1. Add your undercloud’s certificate authority to each overcloud node using an Ansible playbook.
  2. Perform one last backup. This is the final opportunity for backups before starting the overcloud upgrade.

Step 5: Upgrade your overcloud

Photo by eberhard grossgasteiger on Unsplash

This step is the core of the fast forward upgrade procedure. Remember that director is unable to manage your overcloud until this step is completed. During this step you’ll upgrade all overcloud roles and services from version 10 to version 13 using a fully managed series of commands. Let’s take a look at the process for each role.
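Before diving into each role, here is a rough sketch of what that managed series of commands looks like from director. The exact sub-commands and options are defined by the Red Hat OpenStack Platform 13 upgrade documentation, so verify them there before use; the bracketed arguments are placeholders.

$ openstack overcloud ffwd-upgrade prepare <original deployment arguments>
$ openstack overcloud ffwd-upgrade run
$ openstack overcloud upgrade run --roles Controller
$ openstack overcloud upgrade run --nodes <compute-node>
$ openstack overcloud ceph-upgrade run <original deployment arguments>
$ openstack overcloud ffwd-upgrade converge <original deployment arguments>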

Controller nodes

First, you’ll upgrade your control plane. This is performed on a single controller node, but does require your entire control plane to be down. Even so, it does not affect currently running workloads. Upgrade the chosen controller node sequentially through Red Hat OpenStack Platform releases to version 13. Once the database on the upgraded controller has been updated, containerized Red Hat OpenStack Platform 13 services can be deployed to all other controllers.

Compute nodes

Next, you’ll upgrade your compute nodes. As with your controller nodes, only OpenStack services are upgraded—not the underlying operating system. Node reboots are not required and workloads are unaffected by the process. The upgrade process is very fast, as it adds containerized services alongside RPM-based services and then simply switches over each service. During the process, however, compute users will not be able to create new instances. Some network services may also be affected.

To get familiar with the process and ensure compatibility with your environment, we recommend starting with a single, low-risk compute node.

Storage (Red Hat Ceph Storage) nodes

Finally, you’ll upgrade your Red Hat Ceph Storage nodes. While this upgrade is slightly different than the controller and compute node upgrades, it is not disruptive to services, and your data plane remains available throughout the procedure. Director uses the ceph-ansible installer, making the upgrade of your storage nodes simpler. It uses a rolling upgrade process to first upgrade your bare-metal services to Ceph 3.0, and then containerizes the Ceph services.

Photo by Steve Johnson on Unsplash

Step 6: Converge your environment

At this point, you’re almost done with the fast forward process. The final step is to converge all components in your new Red Hat OpenStack Platform 13 environment. As mentioned previously, until all overcloud components are upgraded to the same version as your Red Hat OpenStack Platform director, you have only limited overcloud management capabilities. While your workloads are unaffected, you’ll definitely want to regain full control over your environment.

This step finishes the fast forward upgrade process. You’ll update your overcloud stack within your undercloud. This ensures that your undercloud has the current view of your overcloud and resets your overcloud for ongoing operation. Finally, you’ll be able to operate your Red Hat OpenStack Platform environment as normal: add nodes, upgrade components, scale services, and manage everything from director.

Conclusion

Fast forward upgrades simplify the process of moving between long-life releases of Red Hat OpenStack Platform. However, upgrading from Red Hat OpenStack Platform 10 to the containerized architecture of Red Hat OpenStack Platform 13 is still a significant change. As always, Red Hat is ready to help you succeed with detailed documentation, subscription support, and consulting services.

Watch the OpenStack Upgrades Strategy: The Fast Forward Upgrade video from OpenStack Summit Vancouver 2018 to learn more about the fast forward upgrade approach.

Learn more about Red Hat OpenStack Platform:

by Maria Bracho, Principal Product Manager OpenStack at June 22, 2018 09:33 PM

Red Hat Certified Cloud Architect – An OpenStack Perspective – Part One

The Red Hat Certified Architect (RHCA) is the highest certification provided by Red Hat. To many, it can be looked at as a “holy grail” of sorts in open source software certifications. It’s not easy to get. In order to receive it, you not only need to already be a Red Hat Certified Engineer (RHCE) for Red Hat Enterprise Linux (with the Red Hat Certified System Administrator (RHCSA) as a prerequisite), but also pass additional exams from various technology categories.

Photo by Vasily Koloda on Unsplash

There are roughly 20 exams to choose from that qualify towards the RHCA. Each exam is valid for 3 years, so as long as you complete 5 exams within a 3 year period, you will qualify for the RHCA. With that said, you must keep these exams up to date if you don’t want to lose your RHCA status.

An RHCA for OpenStack!

Ok, the subtitle might be misleading – there is no OpenStack-specific RHCA certification! However, you can select exams that will test your knowledge in technologies needed to successfully build and run OpenStack private clouds. We feel the following certifications demonstrate skills that are crucial for OpenStack:

Let’s take a deeper look at each one.

The first two are strictly OpenStack-based. To become a Red Hat Certified System Administrator in Red Hat OpenStack, you need to know how to deploy and operate an OpenStack private cloud. It is also required that you have a good knowledge of Red Hat OpenStack Platform features and how to take advantage of them.

A Red Hat Certified Engineer in Red Hat OpenStack is expected to be able to deploy and work with Red Hat Storage as well as have strong troubleshooting skills, especially around networking. The EX310 exam has recently been refreshed with a strong emphasis on Network Functions Virtualization (NFV) and advanced networking – which can be considered ‘must have’ skills in many OpenStack Telco use cases in the real world.

Since Red Hat OpenStack Platform comes with Red Hat CloudForms, the knowledge of it can be as crucial as OpenStack itself. Some folks even go as far as saying CloudForms is OpenStack’s missing brother. The next certification on the list, the Red Hat Certified Specialist in Hybrid Cloud Management, focuses on managing infrastructure using Red Hat CloudForms.  Where OpenStack focuses on abstracting compute, network and storage, CloudForms takes care of the business side of the house. It manages compliance, policies, chargebacks, service catalogs, integration with public clouds, legacy virtualization, containers, and automation platforms. CloudForms really can do a lot, so you can see why it is essential for certification.

But what about … Ansible?!

Photo by Jess Watters on Unsplash

For workload orchestration in OpenStack you can, of course, natively use Heat. However, if you want to become a truly advanced OpenStack user, you should consider exploring Ansible for these tasks. The biggest advantages of Ansible are its simplicity and flexibility with other platforms (not just OpenStack). It is also popular within DevOps teams for on- and off-premises workload deployments. In fact, Ansible is also a core technology behind the Red Hat OpenStack director, CloudForms, and Red Hat OpenShift Container Platform. It’s literally everywhere in the Red Hat product suites!


One of the reasons for Ansible’s popularity is the amazing functionality it provides through many reusable modules and playbooks. The Red Hat Certified Specialist in Ansible Automation deeply tests your knowledge of writing Ansible playbooks for automation of workload deployments and system operation tasks.

Virtualization of the nation

The last three certifications on this list (the Specialist certifications in Virtualization, Configuration Management, and OpenShift Administration), although not as closely related to OpenStack as the other certifications described here, extend the capability of your OpenStack skill set.

Many OpenStack deployments are complemented by standalone virtualization solutions such as Red Hat Virtualization. This is often useful for workloads not yet ready for a cloud platform. And with CloudForms, Red Hat Virtualization (RHV) and Red Hat OpenStack Platform can both be managed from one place, so having a solid understanding of Red Hat Virtualization can be very beneficial. This is why being a Red Hat Certified Specialist in Virtualization can be so crucial. Being able to run and manage both cloud native workloads and traditional virtualization is essential to your OpenStack skillset.

Photo by 贝莉儿 NG on Unsplash

Puppets and Containers

To round things off, since Red Hat OpenStack Platform utilizes Puppet, we recommend earning the Red Hat Certified Specialist in Configuration Management certification for a true OpenStack-focused RHCA. Through it you demonstrate skills and knowledge in the underlying deployment mechanism allowing for a much deeper understanding and skill set.

Finally, a popular use case for OpenStack is running containerized applications on top of it. Earning the Red Hat Certified Specialist in OpenShift Administration shows you know how to install and manage Red Hat’s enterprise container platform, Red Hat OpenShift Container Platform!

Reach for the stars!

Whether you are already an OpenStack expert or looking to become one, the Red Hat Certified Architect track from Red Hat Certification offers the framework to allow you to prove those skills through an industry-recognized premier certification program. And if you follow our advice here you will not only be perfecting your OpenStack skills, but mastering other highly important supporting technologies including CloudForms, Ansible, Red Hat Virtualization, OpenShift, and Puppet on your journey to the RHCA.

Photo by Greg Rakozy on Unsplash

So what is it like to actually GET these certifications? In the next part of our blog we share our accounts of achieving the RHCA! Check back soon and bookmark so you don’t miss it!


Ready to start your certification journey now? Get in touch with the friendly Red Hatters at Red Hat Training in your local area today to find all the ways you can master the skills you need to accelerate your career and run your enterprise cloud!


About our authors:

Chris Janiszewski is a Red Hat OpenStack Solutions Architect. He is proud to help his clients validate their business and technical use cases on OpenStack and supporting components like storage, networking or cloud automation and management. He is the father of two little kids and enjoys the majority of his free time playing with them.  When the kids are asleep he gets to put the “geek hat” on and build OpenStack labs to hack crazy use cases!



Ken Holden is a Senior Solution Architect with Red Hat. He has spent the past 3 years on the OpenStack Tiger Team with the primary responsibility of deploying Red Hat OpenStack Platform Proof-Of-Concept IaaS Clouds for Strategic Enterprise Customers across North America.  Throughout his 20 year career in Enterprise IT, Ken has focussed on Linux, Unix, Storage, Networking, and Security with the past 5 years being primarily focused on Cloud Solutions. Ken has achieved Red Hat Certified Architect status (RHCA 110-009-776) and holds Certified OpenStack Administrator status (COA-1700-0387-0100) with the OpenStack Foundation. Outside of work, Ken spends the majority of his time with his wife and two daughters, but also aspires to be the world’s most OK Guitar Player when time permits!


 

by Chris Janiszewski - Senior OpenStack Solutions Architect - Red Hat Tiger Team at June 22, 2018 12:27 AM

June 21, 2018

John Likes OpenStack

Tips on searching ceph-install-workflow.log on TripleO

1. Only look at the logs relevant to the last run

/var/log/mistral/ceph-install-workflow.log will contain a concatenation of the ceph-ansible runs. The last N lines of the file will have what you're looking for, so what is N?

Determine how long the file is:


[root@undercloud mistral]# wc -l ceph-install-workflow.log
20287 ceph-install-workflow.log
[root@undercloud mistral]#

Find the lines where previous ansible runs finished.


[root@undercloud mistral]# grep -n failed=0 ceph-install-workflow.log
5425:2018-06-18 23:06:58,901 p=22256 u=mistral | 172.16.0.21 : ok=118 changed=19 unreachable=0 failed=0
5426:2018-06-18 23:06:58,901 p=22256 u=mistral | 172.16.0.23 : ok=81 changed=13 unreachable=0 failed=0
5427:2018-06-18 23:06:58,901 p=22256 u=mistral | 172.16.0.25 : ok=113 changed=18 unreachable=0 failed=0
5428:2018-06-18 23:06:58,901 p=22256 u=mistral | 172.16.0.27 : ok=38 changed=3 unreachable=0 failed=0
5429:2018-06-18 23:06:58,901 p=22256 u=mistral | 172.16.0.28 : ok=77 changed=13 unreachable=0 failed=0
5430:2018-06-18 23:06:58,901 p=22256 u=mistral | 172.16.0.29 : ok=58 changed=7 unreachable=0 failed=0
5431:2018-06-18 23:06:58,901 p=22256 u=mistral | 172.16.0.30 : ok=83 changed=18 unreachable=0 failed=0
5432:2018-06-18 23:06:58,902 p=22256 u=mistral | 172.16.0.31 : ok=110 changed=17 unreachable=0 failed=0
9948:2018-06-20 12:06:38,325 p=11460 u=mistral | 172.16.0.21 : ok=107 changed=12 unreachable=0 failed=0
9949:2018-06-20 12:06:38,326 p=11460 u=mistral | 172.16.0.23 : ok=69 changed=4 unreachable=0 failed=0
9950:2018-06-20 12:06:38,326 p=11460 u=mistral | 172.16.0.25 : ok=102 changed=11 unreachable=0 failed=0
9951:2018-06-20 12:06:38,326 p=11460 u=mistral | 172.16.0.27 : ok=26 changed=0 unreachable=0 failed=0
9952:2018-06-20 12:06:38,326 p=11460 u=mistral | 172.16.0.29 : ok=46 changed=5 unreachable=0 failed=0
9953:2018-06-20 12:06:38,326 p=11460 u=mistral | 172.16.0.30 : ok=70 changed=8 unreachable=0 failed=0
9954:2018-06-20 12:06:38,326 p=11460 u=mistral | 172.16.0.31 : ok=99 changed=10 unreachable=0 failed=0
14927:2018-06-20 23:14:57,881 p=7702 u=mistral | 172.16.0.23 : ok=118 changed=19 unreachable=0 failed=0
14928:2018-06-20 23:14:57,881 p=7702 u=mistral | 172.16.0.27 : ok=110 changed=17 unreachable=0 failed=0
14932:2018-06-20 23:14:57,881 p=7702 u=mistral | 172.16.0.34 : ok=113 changed=18 unreachable=0 failed=0
20255:2018-06-21 09:46:40,571 p=17564 u=mistral | 172.16.0.22 : ok=118 changed=19 unreachable=0 failed=0
20256:2018-06-21 09:46:40,571 p=17564 u=mistral | 172.16.0.26 : ok=134 changed=18 unreachable=0 failed=0
20257:2018-06-21 09:46:40,571 p=17564 u=mistral | 172.16.0.27 : ok=102 changed=14 unreachable=0 failed=0
20258:2018-06-21 09:46:40,571 p=17564 u=mistral | 172.16.0.28 : ok=113 changed=18 unreachable=0 failed=0
20260:2018-06-21 09:46:40,571 p=17564 u=mistral | 172.16.0.34 : ok=110 changed=17 unreachable=0 failed=0
[root@undercloud mistral]#

Subtract the previous run's last summary line number from the current run's last line number:


[root@undercloud mistral]# echo $(( 20260 - 14932))
5328
[root@undercloud mistral]#

Tail that many lines from the end of the file to see only the last run.
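The two steps can also be combined into a single command, using the previous run's last summary line number (14932 in this example) directly:

[root@undercloud mistral]# tail -n $(( $(wc -l < ceph-install-workflow.log) - 14932 )) ceph-install-workflow.log | less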

2. Identify the node(s) where the playbook run failed:

I know the last 100 lines of the relevant run will have failed set to true if there was a failure. Doing a grep for that will also show me the host:


[root@undercloud mistral]# tail -5328 ceph-install-workflow.log | tail -100 | grep failed=1
2018-06-21 09:46:40,571 p=17564 u=mistral | 172.16.0.32 : ok=66 changed=14 unreachable=0 failed=1
[root@undercloud mistral]#

Now that I know the host, I want to see on which task that host failed, so I grep for 'failed:'. Just grepping for failed won't help, as the log will be full of '"failed": false'.

In this case I extract out the failure:


[root@undercloud mistral]# tail -5328 ceph-install-workflow.log | grep 172.16.0.32 | grep failed:
2018-06-21 09:46:06,093 p=17564 u=mistral | failed: [172.16.0.32 -> 172.16.0.22] (item=[{u'rule_name': u'', u'pg_num': 128, u'name': u'metrics'},
{'_ansible_parsed': True, 'stderr_lines': [u"Error ENOENT: unrecognized pool 'metrics'"], u'cmd': [u'docker', u'exec', u'ceph-mon-controller02',
u'ceph', u'--cluster', u'ceph', u'osd', u'pool', u'get', u'metrics', u'size'], u'end': u'2018-06-21 13:46:01.070270', '_ansible_no_log': False,
'_ansible_delegated_vars': {'ansible_delegated_host': u'172.16.0.22', 'ansible_host': u'172.16.0.22'}, '_ansible_item_result': True, u'changed':
True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': False, u'_raw_params': u'docker exec ceph-mon-controller02
ceph --cluster ceph osd pool get metrics size', u'removes': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'stdout': u'', u'start':
u'2018-06-21 13:46:00.729965', u'delta': u'0:00:00.340305', 'item': {u'rule_name': u'', u'pg_num': 128, u'name': u'metrics'}, u'rc': 2, u'msg':
u'non-zero return code', 'stdout_lines': [], 'failed_when_result': False, u'stderr': u"Error ENOENT: unrecognized pool 'metrics'",
'_ansible_ignore_errors': None, u'failed': False}]) => {"changed": false, "cmd": ["docker", "exec", "ceph-mon-controller02", "ceph",
"--cluster", "ceph", "osd", "pool", "create", "metrics", "128", "128", "replicated_rule", "1"], "delta": "0:00:01.421755", "end":
"2018-06-21 13:46:06.390381", "item": [{"name": "metrics", "pg_num": 128, "rule_name": ""}, {"_ansible_delegated_vars":
{"ansible_delegated_host": "172.16.0.22", "ansible_host": "172.16.0.22"}, "_ansible_ignore_errors": null, "_ansible_item_result":
true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": true, "cmd": ["docker", "exec", "ceph-mon-controller02",
"ceph", "--cluster", "ceph", "osd", "pool", "get", "metrics", "size"], "delta": "0:00:00.340305", "end": "2018-06-21 13:46:01.070270",
"failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "docker exec ceph-mon-controller02
ceph --cluster ceph osd pool get metrics size", "_uses_shell": false, "chdir": null, "creates": null, "executable": null,
"removes": null, "stdin": null, "warn": true}}, "item": {"name": "metrics", "pg_num": 128, "rule_name": ""}, "msg":
"non-zero return code", "rc": 2, "start": "2018-06-21 13:46:00.729965", "stderr": "Error ENOENT: unrecognized pool
'metrics'", "stderr_lines": ["Error ENOENT: unrecognized pool 'metrics'"], "stdout": "", "stdout_lines": []}],
"msg": "non-zero return code", "rc": 34, "start": "2018-06-21 13:46:04.968626", "stderr": "Error ERANGE:
pg_num 128 size 3 would mean 768 total pgs, which exceeds max 600 (mon_max_pg_per_osd 200 * num_in_osds 3)",
"stderr_lines": ["Error ERANGE: pg_num 128 size 3 would mean 768 total pgs, which exceeds max 600
(mon_max_pg_per_osd 200 * num_in_osds 3)"], "stdout": "", "stdout_lines": []}
...
[root@undercloud mistral]#

So that's how I quickly find what went wrong in a ceph-ansible run when debugging a TripleO deployment.

3. Extra

You may be wondering what that error is.

There was a ceph-ansible issue where creating pools before the OSDs were running made the deployment fail because of the PG overdose protection check. This is something you can still trip if your PG numbers and OSDs are not aligned correctly (use pgcalc), but it is better to fail a deployment than to put production data on a misconfigured cluster. You could also fail it because of this issue that ceph-ansible rc9 fixed (technically it was fixed in an earlier version, but that version had other bugs, so I recommend rc9).

by John (noreply@blogger.com) at June 21, 2018 03:56 PM

TripleO Ceph Integration on the Road in June

The first week of June I went to an upstream TripleO workshop in Brno. The labs we used are at https://github.com/redhat-openstack/tripleo-workshop

The third week of June I went to a downstream Red Hat OpenStack Platform event in Montreal for those deploying the upcoming version 13 in the field. I covered similar topics with respect to Ceph deployment via TripleO.

by John (noreply@blogger.com) at June 21, 2018 03:39 PM

June 14, 2018

Zane Bitter

The Expanding OpenStack Foundation

The OpenStack Foundation has begun the process of becoming an umbrella organisation for open source projects adjacent to but outside of OpenStack itself. However, there is no clear roadmap for the transformation, which has resulted in some confusion. After attending the joint leadership meeting with the Foundation Board of Directors and various Forum sessions that included some members of the board at the (2018) OpenStack Summit in Vancouver, I believe I can help shed some light on the situation. (Of course this is my subjective take on the topic, and I am not speaking for the Technical Committee.)

In November 2017, the board authorised the Foundation staff to begin incubation of several ‘Strategic Focus Areas’, including piloting projects that fit in those areas. The three focus areas are Container Infrastructure, Edge Computing Infrastructure, and CI/CD Infrastructure. To date, there have been two pilot projects accepted. Eventually, it is planned for each focus area to have its own Technical Committee (or equivalent governance body), holding equal status with the OpenStack TC—there will be no paramount technical governance body for the whole Foundation.

The first pilot project is Kata Containers, which combines container APIs and container-like performance with VM-level isolation. You will not be shocked to learn that it is part of the Container Infrastructure strategic focus.

The other pilot project, in the CI/CD strategic focus, is Zuul. Zuul will already be familiar to OpenStack developers as the CI system developed by and for the OpenStack project. Its governance is moving from the OpenStack TC to the new Strategic Focus Area, in recognition of its general usefulness as a tool that is not in any way specific to OpenStack development.

Thus far there are no pilot projects in the Edge Computing Infrastructure focus area, but nevertheless there is plenty of work going on—including to figure out what Edge Computing is.

If you attended the Summit then you would have heard about Kata, Zuul and Edge Computing, but this is probably the first time you’ve heard the terms ‘incubate’ or ‘pilot’ associated with them. Nor have the steps that come after incubation or piloting been defined. This has opened the door to confusion, not only about the status of the pilot projects but also that of unofficial projects (outside of either OpenStack-proper or any of the Strategic Focus Areas) that are hosted on the same infrastructure provided by the Foundation for OpenStack development. It also heralds the return of what I call the October surprise—a half-baked code dump ‘open sourced’ the week before a Summit—which used to be a cottage industry around the OpenStack community until the TC was able to bed in a set of robust processes for accepting new projects.

Starting out without a lot of preconceived ideas about how things would proceed was the right way to begin, but members of the board recognise that now is the time to give the process some structure. I expect to see more work on this in the near future.

There is also a proposed initiative, dubbed Winterscale, to move governance of the foundation’s infrastructure out from under the OpenStack TC, to reflect its new status as a service provider to the OpenStack project, the other Strategic Focus Areas, and unofficial projects.

by Zane Bitter at June 14, 2018 08:26 PM

Lars Kellogg-Stedman

Configuring a static address for wlan0 on Raspbian Stretch

Recent releases of Raspbian have adopted the use of dhcpcd to manage both dynamic and static interface configuration. If you would prefer to use the traditional /etc/network/interfaces mechanism instead, follow these steps.

  1. First, disable dhcpcd and wpa_supplicant.

    systemctl disable --now dhcpcd wpa_supplicant
    
  2. You will need a wpa_supplicant configuration …

by Lars Kellogg-Stedman at June 14, 2018 04:00 AM

June 08, 2018

Matthias Runge

Configuring collectd plugins with TripleO

A way of deploying OpenStack is to use TripleO. This takes the approach of deploying a small OpenStack environment, and then using OpenStack-provided infrastructure and tools to deploy the actual production environment. This is actually done by an addition to the openstack command line client:

openstack overcloud …

by mrunge at June 08, 2018 06:40 AM

June 07, 2018

RDO Blog

Rocky Test Days Milestone 2: June 14-15

Who’s up for a rematch? Rocky Milestone 2 is here and we’re ready to rumble! Join us on June 14 & 15 (next Thursday and Friday) for an awesome time of taking down bugs and fighting errors in the most recent release. We won’t be pulling any punches.

Want to get in on the action? We’re looking for developers, users, operators, quality engineers, writers, and, yes, YOU. If you’re reading this, we think you’re a champion and we want your help!

Here’s the plan:
We’ll have packages for the following platforms:
* RHEL 7
* CentOS 7

You’ll want a fresh install with the latest updates installed so that there are no hard-to-reproduce interactions with other things.

We’ll be collecting feedback, writing up tickets, filing bugs, and answering questions.

Even if you only have a few hours to spare, we’d love your help taking this new version for a spin to work out any kinks. Not only will this help identify issues early in the development process, but you can be one of the first to cut your teeth on the latest versions of your favorite deployment methods like TripleO, PackStack, and Kolla.

Interested? We’ll be gathering on #rdo (on Freenode IRC) for any associated questions/discussion, and working through the “Does it work?” tests.

As Rocky said, “The world ain’t all sunshine and rainbows,” but with your help, we can keep moving forward and make the RDO world better for those around us. Hope to see you on the 14th & 15th!

by Mary Thengvall at June 07, 2018 02:51 PM

June 04, 2018

Steve Hardy

TripleO Containerized deployments, debugging basics

Since the Pike release, TripleO has supported deployments with OpenStack services running in containers.  Currently we use docker to run images based on those maintained by the Kolla project.

We already have some tips and tricks for container deployment debugging in tripleo-docs, but below are some more notes on my typical debug workflows.

Config generation debugging overview

In the TripleO container architecture, we still use Puppet to generate configuration files and do some bootstrapping, but it is run (inside a container) via the docker-puppet.py script.

Config generation happens at the start of the deployment (step 1), and configuration files are generated for all services (regardless of which step they are started in).

The input file used is /var/lib/docker-puppet/docker-puppet.json, but you can also filter this (e.g via cut/paste or jq as shown below) to enable debugging for specific services - this is helpful when you need to iterate on debugging a config generation issue for just one service.

[root@overcloud-controller-0 docker-puppet]# jq '[.[]|select(.config_volume | contains("heat"))]' /var/lib/docker-puppet/docker-puppet.json | tee /tmp/heat_docker_puppet.json
{
  "puppet_tags": "heat_config,file,concat,file_line",
  "config_volume": "heat_api",
  "step_config": "include ::tripleo::profile::base::heat::api\n",
  "config_image": "192.168.24.1:8787/tripleomaster/centos-binary-heat-api:current-tripleo"
}
{
  "puppet_tags": "heat_config,file,concat,file_line",
  "config_volume": "heat_api_cfn",
  "step_config": "include ::tripleo::profile::base::heat::api_cfn\n",
  "config_image": "192.168.24.1:8787/tripleomaster/centos-binary-heat-api-cfn:current-tripleo"
}
{
  "puppet_tags": "heat_config,file,concat,file_line",
  "config_volume": "heat",
  "step_config": "include ::tripleo::profile::base::heat::engine\n\ninclude ::tripleo::profile::base::database::mysql::client",
  "config_image": "192.168.24.1:8787/tripleomaster/centos-binary-heat-api:current-tripleo"
}

 

Then we can run the config generation, if necessary changing the tags (or puppet modules, which are consumed from the host filesystem e.g /etc/puppet/modules) until the desired output is achieved:


[root@overcloud-controller-0 docker-puppet]# export NET_HOST='true'
[root@overcloud-controller-0 docker-puppet]# export DEBUG='true'
[root@overcloud-controller-0 docker-puppet]# export PROCESS_COUNT=1
[root@overcloud-controller-0 docker-puppet]# export CONFIG=/tmp/heat_docker_puppet.json
[root@overcloud-controller-0 docker-puppet]# python /var/lib/docker-puppet/docker-puppet.py
2018-02-09 16:13:16,978 INFO: 102305 -- Running docker-puppet
2018-02-09 16:13:16,978 DEBUG: 102305 -- CONFIG: /tmp/heat_docker_puppet.json
2018-02-09 16:13:16,978 DEBUG: 102305 -- config_volume heat_api
2018-02-09 16:13:16,978 DEBUG: 102305 -- puppet_tags heat_config,file,concat,file_line
2018-02-09 16:13:16,978 DEBUG: 102305 -- manifest include ::tripleo::profile::base::heat::api
2018-02-09 16:13:16,978 DEBUG: 102305 -- config_image 192.168.24.1:8787/tripleomaster/centos-binary-heat-api:current-tripleo
...

 

When the config generation is completed, configuration files are written out to /var/lib/config-data/heat.

We then compare timestamps against the /var/lib/config-data/heat/heat.*origin_of_time file (touched for each service before we run the config-generating containers), so that only those files modified or created by puppet are copied to /var/lib/config-data/puppet-generated/heat.

Note that we also calculate a checksum for each service (see /var/lib/config-data/puppet-generated/*.md5sum), which means we can detect when the configuration changes - when this happens we need paunch to restart the containers, even though the image did not change.

This checksum is added to the /var/lib/tripleo-config/hashed-docker-container-startup-config-step_*.json files by docker-puppet.py, and these files are later used by paunch to decide if a container should be restarted (see below).
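
To poke at these pieces on disk, here is a quick inspection sketch using the heat service from the examples above (the per-service checksum file name is an assumption based on the glob mentioned earlier, and the origin-of-time marker name is left abbreviated as in the text):

# Per-service checksum calculated after config generation (assumed name, per the glob above)
cat /var/lib/config-data/puppet-generated/heat.md5sum
# The same hash, as injected into the startup configs consumed by paunch
grep TRIPLEO_CONFIG_HASH /var/lib/tripleo-config/hashed-docker-container-startup-config-step_*.json
# Files in the config volume newer than the origin-of-time marker,
# i.e. those created or modified by puppet during this run
find /var/lib/config-data/heat -newer /var/lib/config-data/heat/heat.*origin_of_time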

 

Runtime debugging, paunch 101

Paunch is a tool that orchestrates launching containers for each step, and performing any bootstrapping tasks not handled via docker-puppet.py.

It accepts json input; in this case the /var/lib/tripleo-config/docker-container-startup-config-step_*.json files, which are created based on the enabled services (the content is derived directly from the service templates in tripleo-heat-templates).

These json files are then modified via docker-puppet.py (as mentioned above) to add a TRIPLEO_CONFIG_HASH value to the container environment - these modified files are written with a different name, see /var/lib/tripleo-config/hashed-docker-container-startup-config-step_*.json

Note this environment variable isn't used by the container directly; it is used as a salt to trigger restarting containers when the configuration files in the mounted config volumes have changed.

As in the docker-puppet case it's possible to filter the json file with jq and debug e.g mounted volumes or other configuration changes directly.

It's also possible to test configuration changes by manually modifying /var/lib/config-data/puppet-generated/ then either restarting the container via docker restart, or by modifying TRIPLEO_CONFIG_HASH then re-running paunch.
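
For example, a quick manual iteration on a heat_engine setting might look like the following sketch (the file edited and the use of vi are illustrative; the path follows the puppet-generated volume that is bind-mounted into the container):

# Tweak the puppet-generated config for the service (illustrative edit)
vi /var/lib/config-data/puppet-generated/heat/etc/heat/heat.conf
# Restart the container; with the KOLLA_CONFIG_STRATEGY=COPY_ALWAYS setting shown in the
# startup config below, kolla_start copies the updated config back in on start
docker restart heat_engine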

Note that paunch will kill any containers tagged for a particular step: e.g. with --config-id tripleo_step4 --managed-by tripleo-Controller, any containers started during this step by a previous paunch apply will be killed if they are removed from your json during testing. This is a feature that enables changes to the enabled services when updating your overcloud, but it's worth bearing in mind when testing as described here.


[root@overcloud-controller-0]# cd /var/lib/tripleo-config/
[root@overcloud-controller-0 tripleo-config]# jq '{"heat_engine": .heat_engine}' hashed-docker-container-startup-config-step_4.json | tee /tmp/heat_startup_config.json
{
  "heat_engine": {
    "healthcheck": {
      "test": "/openstack/healthcheck"
    },
    "image": "192.168.24.1:8787/tripleomaster/centos-binary-heat-engine:current-tripleo",
    "environment": [
      "KOLLA_CONFIG_STRATEGY=COPY_ALWAYS",
      "TRIPLEO_CONFIG_HASH=14617e6728f5f919b16c74f1e98d0264"
    ],
    "volumes": [
      "/etc/hosts:/etc/hosts:ro",
      "/etc/localtime:/etc/localtime:ro",
      "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro",
      "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro",
      "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro",
      "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro",
      "/dev/log:/dev/log",
      "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro",
      "/etc/puppet:/etc/puppet:ro",
      "/var/log/containers/heat:/var/log/heat",
      "/var/lib/kolla/config_files/heat_engine.json:/var/lib/kolla/config_files/config.json:ro",
      "/var/lib/config-data/puppet-generated/heat/:/var/lib/kolla/config_files/src:ro"
    ],
    "net": "host",
    "privileged": false,
    "restart": "always"
  }
}
[root@overcloud-controller-0 tripleo-config]# paunch --debug apply --file /tmp/heat_startup_config.json --config-id tripleo_step4 --managed-by tripleo-Controller
stdout: dd60546daddd06753da445fd973e52411d0a9031c8758f4bebc6e094823a8b45

stderr:
[root@overcloud-controller-0 tripleo-config]# docker ps | grep heat
dd60546daddd 192.168.24.1:8787/tripleomaster/centos-binary-heat-engine:current-tripleo "kolla_start" 9 seconds ago Up 9 seconds (health: starting) heat_engine

 

 

Containerized services, logging

There are a couple of ways to access the container logs:

  • On the host filesystem, the container logs are persisted under /var/log/containers/<service>
  • docker logs <container id or name>
It is also often useful to use docker inspect <container id or name> to verify the container configuration, e.g the image in use and the mounted volumes etc.
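
For example, a small sketch against the heat_engine container from the earlier examples, using standard docker --format Go templates:

# Which image is the container running?
docker inspect --format '{{.Config.Image}}' heat_engine
# What is bind-mounted into it?
docker inspect --format '{{json .Mounts}}' heat_engine | jq .
# What environment was it started with (including TRIPLEO_CONFIG_HASH)?
docker inspect --format '{{json .Config.Env}}' heat_engine | jq .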

 

Debugging containers directly

Sometimes logs are not enough to debug problems, and in this case you must interact with the container directly to diagnose the issue.

When a container is not restarting, you can attach a shell to the running container via docker exec:


[root@openstack-controller-0 ~]# docker exec -ti heat_engine /bin/bash
()[heat@openstack-controller-0 /]$ ps ax
PID TTY STAT TIME COMMAND
1 ? Ss 0:00 /usr/local/bin/dumb-init /bin/bash /usr/local/bin/kolla_start
5 ? Ss 1:50 /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat
25 ? S 3:05 /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat
26 ? S 3:06 /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat
27 ? S 3:06 /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat
28 ? S 3:05 /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat
2936 ? Ss 0:00 /bin/bash
2946 ? R+ 0:00 ps ax

 

That's all for today. For more information please refer to tripleo-docs, or feel free to ask questions in #tripleo on Freenode!

by Steve Hardy (noreply@blogger.com) at June 04, 2018 05:09 PM

RDO Blog

Community Blog Round-up: June 4

I’m in a bit of shock that it’s already June… anyone else share that feeling? With summer around the corner for those of us in the northern hemisphere (or Juneuary as we call it in San Francisco), there’s a promise of vacations ahead. Be sure to take us along on your various adventures — sharing about your new favorite hacks, the projects you’re working on, and the conferences you’re traveling to. We love hearing what you’re up to! Speaking of which… here’s what you’ve been blogging about recently:

TripleO deep dive session #13 (Containerized Undercloud) by Carlos Camacho

This is the 13th release of the TripleO “Deep Dive” sessions. Thanks to Dan Prince & Emilien Macchi for this deep dive session about the next step of the TripleO’s Undercloud evolution. In this session, they will explain in detail the movement re-architecting the Undercloud to move towards containers in order to reuse the containerized Overcloud ecosystem.

Read more at https://www.anstack.com/blog/2018/05/31/tripleo-deep-dive-session-13.html

Tracking Quota by Adam Young

This OpenStack Summit marks the third that I have attended where we’ve discussed the algorithms to try and record quota in Keystone but not update it on each resource allocation and free.

Read more at https://adam.younglogic.com/2018/05/tracking-quota/

Don’t rewrite your driver. 80 storage drivers for containers rolled into one! by geguileo

Do you work with containers but your storage doesn’t support your Container Orchestration system? Have you or your company already developed an Openstack/Cinder storage driver and now you have to do it again for containers? Are you having trouble deciding how to balance your engineering force between storage driver development in OpenStack, Containers, Ansible, etc? Then read on, as your life may be about to get better.

Read more at https://gorka.eguileor.com/cinderlib-csi/

Ansible Storage Role: automating your storage solutions by geguileo

Were you in the middle of writing your Ansible playbooks to automate your software provisioning, configuration, and application deployment when you realized you had to manage your storage as well? And it turns out that each of your storage solutions has a completely different Ansible module. Now you have to figure out how each module works to create ad-hoc tasks for each one. What a pain! If this has happened to you, or if you are interested in automating your storage solutions, you may be interested in the new Ansible Storage Role.

Read more at https://gorka.eguileor.com/ansible-role-storage/

Cinderlib: Every storage driver on a single Python library by geguileo

Wouldn’t it be great if we could manage any storage array using a single Python library that provided the right storage management abstraction? Well, this is no longer a beautiful dream, it has become a reality! Keep reading to find out how.

Read more at https://gorka.eguileor.com/cinderlib/

“Ultimate Private Cloud” Demo, Under The Hood! by Steven Hardy, Senior Principal Software Engineer

At the recent Red Hat Summit in San Francisco, and more recently the OpenStack Summit in Vancouver, the OpenStack engineering team worked on some interesting demos for the keynote talks. I’ve been directly involved with the deployment of Red Hat OpenShift Platform on bare metal using the Red Hat OpenStack Platform director deployment/management tool, integrated with openshift-ansible. I’ll give some details of this demo, the upstream TripleO features related to this work, and insight around the potential use-cases.

Read more at https://redhatstackblog.redhat.com/2018/05/22/ultimate-private-cloud-demo-under-the-hood/

Testing Undercloud backup and restore using Ansible by Carlos Camacho

Testing the Undercloud backup and restore It is possible to test how the Undercloud backup and restore should be performed using Ansible.

Read more at https://www.anstack.com/blog/2018/05/18/testing-undercloud-backup-and-restore-using-ansible.html

Your LEGO® Order Has Been Shipped by rainsdance

In preparation for the Red Hat Summit this week and OpenStack Summit in a week, I put together a hardware demo to sit in the RDO booth.

Read more at http://groningenrain.nl/your-lego-order-has-been-shipped/

Introducing GPUs to the CERN Cloud by Konstantinos Samaras-Tsakiris

High-energy physics workloads can benefit from massive parallelism — and as a matter of fact, the domain faces an increasing adoption of deep learning solutions. Take for example the newly-announced TrackML challenge [7], already running in Kaggle! This context motivates CERN to consider GPU provisioning in our OpenStack cloud, as computation accelerators, promising access to powerful GPU computing resources to developers and batch processing alike.

Read more at https://openstack-in-production.blogspot.com/2018/05/introducing-gpus-to-cern-cloud.html

A modern hybrid cloud platform for innovation: Containers on Cloud with Openshift on OpenStack by Stephane Lefrere

Market trends show that due to long application life-cycles and the high cost of change, enterprises will be dealing with a mix of bare-metal, virtualized, and containerized applications for many years to come. This is true even as greenfield investment moves to a more container-focused approach.

Read more at https://redhatstackblog.redhat.com/2018/05/08/containers-on-cloud/

Using a TM1637 LED module with CircuitPython by Lars Kellogg-Stedman

CircuitPython is “an education friendly open source derivative of MicroPython”. MicroPython is a port of Python to microcontroller environments; it can run on boards with very few resources such as the ESP8266. I’ve recently started experimenting with CircuitPython on a Wemos D1 mini, which is a small form-factor ESP8266 board.

Read more at https://blog.oddbit.com/2018/05/03/using-a-tm-led-module-with-cir/

ARA Records Ansible 0.15 has been released by DM Simard

I was recently writing that ARA was open to limited development for the stable release in order to improve the performance for larger scale users.

Read more at https://dmsimard.com/2018/05/03/ara-records-ansible-0.15-has-been-released/

Highlights from the OpenStack Rocky Project Teams Gathering (PTG) in Dublin by Rich Bowen

Last month in Dublin, OpenStack engineers gathered from dozens of countries and companies to discuss the next release of OpenStack. This is always my favorite OpenStack event, because I get to do interviews with the various teams, to talk about what they did in the just-released version (Queens, in this case) and what they have planned for the next one (Rocky).

Read more at https://redhatstackblog.redhat.com/2018/04/26/highlights-from-the-openstack-rocky-project-teams-gathering-ptg-in-dublin/

Red Hat Summit 2018: HCI Lab by John

I will be at Red Hat Summit in SFO on May 8th jointly hosting the lab Deploy a containerized HCI IaaS with OpenStack and Ceph.

Read more at http://blog.johnlikesopenstack.com/2018/04/red-hat-summit-2018-hci-lab.html

by Mary Thengvall at June 04, 2018 02:18 PM

RDO Duck adventures at the Vancouver OpenStack Summit

Last week, I attended the OpenStack Summit in Vancouver, as RDO Ambassador. That was a good experience to connect with the RDO community.

What does an RDO ambassador do?

As an ambassador, your role is to engage the community by answering their questions, helping them to fix their issues, or getting started to contribute. You don’t need to know everything, but it implies, you can at least point people to the right direction.

On Sunday, we started helping with the booth setup, and for me it started with massive T-shirt folding and sorting them by size in laundry baskets. During trade shows, many people just want their swag, so efficient T-shirt distribution means more time to exchange with community members. Thanks to everyone at the booth, it was done quickly.

Massive T-shirt folding with RDO Duck

I also set up the RDO hardware demo. Rain Leander, RDO community liaison, had prepared a shiny new demo using two NUCs and a portable gaming screen, deployed with TripleO; it will be used at future events to demo RDO. Since we had limited time, I had to wait until Monday morning to reinstall everything, as the demo had been borked during the previous event.

RDO Duck busy debugging TripleO deployment

From Monday to Thursday, I did shifts at the booth to advocate for RDO and engage with the community, welcoming people doing their shifts and helping them get set up. The community pod was shared with Carol Chen (ManageIQ) and Leonardo Vaz (Ceph), so we were able to cover many topics.

RDO’s swag was ducks – everyone loves them. We had three colors (Red, Green, Blue!), and I can say they were a bit mischievous 😉

RDO Duck having fun with stickers

Most of the questions we had were related to the RDO/RHOSP relationship, how to contribute, TripleO, Ceph, etc. We also helped a user debug his RDO deployment and tracked down a bug in the CentOS cloud images breaking the qemu-ev repository (the $contentdir variable was set to altarch instead of x86_64).

Demo and Q/A sessions

We also had people who came to demo the cool stuff they made. We had Michael J. Turek, who demoed an Ironic deployment on ppc64le (the shiny new arch supported by RDO!).
mjturek and I

Also T. Nicholle Williams, who showed us how to deploy OpenShift on OpenStack and, as a guest star, her lovely poodle. 🙂
Nicholle and I

And last but not least, David M. Simard, creator of ARA, which was the center of many questions.

David and I

RDO meetup

We had an RDO meetup jointly with the Ceph community at the Portside Pub. It was awesome to chat over drinks and snacks. I noticed that among us there were local stackers not attending the summit, so we reached a wider audience.

RDO meetup as if you were there

Conclusion

That was a great Summit; it was good to connect with fellow stackers (old and new friends!) and share the love around RDO. We had exceptionally good weather in Vancouver, and I must say that the scenery in British Columbia is just breathtaking. I really enjoyed my short stay there.

Thanks to Rain for sending me and to my team for allowing me to go 🙂 Many thanks to Leo, Carol, Jennifer, and Tracy, with whom I had a lot of fun 🙂
The Dream Team packing up

by hguemar at June 04, 2018 01:35 PM

June 01, 2018

Groningen Rain

What We’ve Learned So Far

As I’ve been rebuilding this site. In case you hadn’t noticed. I have. Cause of the hack. And this is what I’ve learned so far… Backups Aren’t Necessary UNTIL THEY ARE If I had been maintaining regular backups (which the hosting company is totally happy to do for just a few euros more per month), …

by K Rain at June 01, 2018 08:00 AM

May 31, 2018

Carlos Camacho

TripleO deep dive session #13 (Containerized Undercloud)

This is the 13th release of the TripleO “Deep Dive” sessions

Thanks to Dan Prince & Emilien Macchi for this deep dive session about the next step of the TripleO’s Undercloud evolution.

In this session, they will explain in detail the movement re-architecting the Undercloud to move towards containers in order to reuse the containerized Overcloud ecosystem.

You can access the presentation or the Etherpad notes.

So please, check the full session content on the TripleO YouTube channel.



Please check the sessions index to have access to all available content.

by Carlos Camacho at May 31, 2018 12:00 AM

May 26, 2018

Adam Young

Tracking Quota

This OpenStack summit marks the third that I have attended where we’ve discussed the algorithms to try and record quota in Keystone but not update it on each resource allocation and free.

We were stumped, again. The process we had planned on using was game-able and thus broken. I was kinda bummed.

Fortunately, I had a long car ride from Vancouver to Seattle and talked it over with Morgan Fainberg.

We also discussed the Pig War. Great piece of history from the region.

By the time we got to the airport the next day, I think we had it solved. Morgan came to the solution first, and I followed, slowly.  Here’s what we think will work.

First, let’s get a 3-level-deep project setup to use for our discussion.

The rule is simple:  even if a quota is subdivided, you still need to check the overall quota of the project  and all the parent projects.

In the example structure above, let’s assume that project A gets a quota of 100 units of some resource: VMs, GB of memory, network ports, hot-dogs, very small rocks, whatever.  I’ll use VMs in this example.

There are a couple of ways this could be further managed.  The simplest is that any resource allocated anywhere else in this tree is counted against this quota.  There are 9 total projects in the tree.  If each allocates 11 VMs, there will be 99 created and counted against the quota.  The next VM created uses up the quota.  The request after that will fail due to lack of available quota.

Let’s say, however, that the users in project C23 are greedy, and allocate all 100 VMs.  The people in C11 are filled with righteous indignation.  They need VMs too.

The admins wipe everything out and we start all over.  They set up a system to fragment the quota by allowing a project to split its quota assignment up and allocate some of it to subordinate projects.

Project A says “I’m going to keep 50 VMs for myself, and allocate 25 each to B1 and B2.”

Project B1 says, “I am going to keep 10 for me, and allocate 5 each to C11, C12, and C13.”  And the B1 tree is happy.

B2 is a manipulative schemer and decides to play around.  B2 allocates his entire quota of 25 to C21.  C21 creates 25 VMs.

B2 now withdraws his quota from C21.  There is no communication with Nova.  The VMs keep running.  He then allocates his entire quota of 25 VMs to C22, and C22 creates 25 VMs.

Nova says “What project is this?  C22?   What is its quota?  25?  All good.”

But in reality, B2 has doubled his quota.  His subordinates have allocated 50 VMs total.  He does this again with project C23, gets up to 75 VMs, and contemplates creating yet another project C24 just to keep up the pattern.  This would allocate more VMs than project A was originally allocated.

The admins notice this and get mad, wipe everything out, and start over again.  This time they’ve made a change.  Whenever they check quota on a project, they will also go and check quota on the parent projects, counting all VMs underneath each parent.  Essentially, they will record that a VM created in project C11 also reduces the original quota on B1 and on A.  In essence, they record a table.  If the user creates a VM in project C11, the following will be recorded and checked for quota.

VM Project
VM1 A
VM1 B1
VM1 C11

 

When a user then creates a VM in C21, the table will extend to this:

VM Project
VM1 A
VM1 B1
VM1 C11
VM2 A
VM2 B2
VM2 C21

In addition, when creating VM2, Nova will check quota and see that, after creation:

  • C21 now has 1 out of 25 allocated
  • B2 now has 1 out of 25 allocated
  • A now has 2 out of 100 allocated

(quota is allocated prior to the creation of the resource to prevent a race condition)

Note that the quota is checked against the original amount, and not the amount reduced by sub-allocating the quota.  If project C21 allocates 24 more VMs, the quota check will show:

  • C21 now has 25 out of 25 allocated
  • B2 now has 25 out of 25 allocated
  • A now has 26 out of 100 allocated

If B2 tries to play games, removes the quota from C21, and gives it to C22, project C21 will be over quota, but Nova will have no way to trap this.  However, the only people this affects are other people within projects B2, C21, C22, and C23.  If C22 attempts to allocate a virtual machine, the quota check will show that B2 has allocated its full quota and cannot create any more.  The quota check will fail.
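
Here is a minimal Python sketch of that parent-chain check (the tree, the quota numbers, and the helper names are illustrative, not Keystone or Nova code):

# Quotas as assigned in the example above; note they are the original
# amounts, not reduced by sub-allocation.
QUOTAS = {"A": 100, "B1": 25, "B2": 25,
          "C11": 5, "C12": 5, "C13": 5,
          "C21": 25, "C22": 25, "C23": 25}
PARENT = {"B1": "A", "B2": "A",
          "C11": "B1", "C12": "B1", "C13": "B1",
          "C21": "B2", "C22": "B2", "C23": "B2"}
USAGE = {name: 0 for name in QUOTAS}

def chain(project):
    """Yield the project and every ancestor up to the root."""
    while project is not None:
        yield project
        project = PARENT.get(project)

def try_allocate(project, amount=1):
    """Record usage against the project and all of its parents, or refuse."""
    # Check the whole chain before recording anything, so a refused request
    # leaves no partial usage behind.
    if any(USAGE[node] + amount > QUOTAS[node] for node in chain(project)):
        return False
    for node in chain(project):
        USAGE[node] += amount
    return True

# C21 can use its full quota of 25...
print(all(try_allocate("C21") for _ in range(25)))   # True
# ...but C22 is then refused, because those 25 already count against B2.
print(try_allocate("C22"))                           # False
print(USAGE["A"], USAGE["B2"], USAGE["C21"])         # 25 25 25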

You might have noticed that the higher-level projects can rob quota from the child projects in this scheme.  For example, if project A allocates 74 more VMs now, project B1 and its children will still have quota allocated to them, but their quota checks will fail because A is full.  This could be mitigated by having 2 checks for project A: total quota (max 100), and directly allocated quota (max 50).

This scheme removes the ability to violate quota by gaming the system.  I promised to write it up so we could continue to try and poke holes in it.

EDIT:  Quota would also have to be allocated per endpoint, or the endpoints will have to communicate with each other to evaluate usage.

 

by Adam Young at May 26, 2018 05:46 AM

May 23, 2018

Gorka Eguileor

Don’t rewrite your driver. 80 storage drivers for containers rolled into one!

Do you work with containers but your storage doesn’t support your Container Orchestration system? Have you or your company already developed an Openstack/Cinder storage driver and now you have to do it again for containers? Are you having trouble deciding how to balance your engineering force between storage driver development in OpenStack, Containers, Ansible, etc? […]

by geguileo at May 23, 2018 12:06 PM