Planet RDO

January 18, 2018

RDO Blog

RDO's infrastructure server metrics are now available

Reposted from dev@lists.rdoproject.org post by David Moreau Simard

We have historically monitored RDO's infrastructure with Sensu, and it has served us well, pre-emptively detecting issues and maximizing our uptime.

At some point, Software Factory grew an implementation of Grafana, InfluxDB and Telegraf in order to monitor the health of the servers, not unlike how upstream's openstack-infra leverages cacti. This implementation was meant to eventually host graphs such as the ones for Zuul and Nodepool upstream.

While there are still details to be ironed out for the Zuul and Nodepool data collection, there was nothing preventing us from deploying Telegraf everywhere just for the general server metrics. It's one standalone package and one configuration file, that's it.

Originally, we had been thinking about feeding the Sensu metric data to InfluxDB… but why even bother if it's there for free in Software Factory? So here we are.

The metrics are now available here. We will use this as a foundation to improve visibility into RDO's infrastructure and make it more "open" and accessible in the future.

We're not getting rid of Sensu, although we may narrow its scope to some of the more complex service and miscellaneous monitoring that we need to be doing. We'll see what time has in store for us.

Let me know if you have any questions!

by Rich Bowen at January 18, 2018 09:12 PM

January 11, 2018

RDO Blog

Summary of rdopkg development in 2017

During 2017, 10 contributors merged 146 commits into rdopkg.

3771 lines of code were added and 1975 lines deleted across 107 files.

54 unit tests were added on top of the existing 32 tests - an increase of 169% to a total of 86 unit tests.

33 scenarios for 5 core rdopkg features were added in new feature tests spanning a total of 228 test steps.

3 minor releases increased version from 0.42 to 0.45.0.

Let's talk about the most significant improvements.

Stabilisation

rdopkg started as a developers' tool, basically a central repository to accumulate RPM packaging automation in a reusable manner. Quickly adding new features was easy, but making sure existing functionality works consistently as code is added and changed proved to be a much greater challenge.

As rdopkg started shifting from a developers' power tool to a module used in other automation systems, inevitable breakages started to become a problem and prompted me to adapt development accordingly. As a first step, I tried to practice Test-Driven Development (TDD), as opposed to writing tests after a breakage to prevent a specific case. Unit tests helped discover and prevent various bugs introduced by new code, but testing complex behaviors was a frustrating experience where most of the development time was spent writing unit tests for cases they weren't meant to cover.

Sounds like using the wrong tool for the job, right? And so I opened a rather urgent rdopkg RFE, test actions in a way that doesn't suck, and started researching what the cool kids use to develop and test Python software without suffering.

Behavior-Driven Development

It would seem that Cucumber started quite a revolution in Behavior-Driven Development (BDD), and I really like Gherkin, the Business Readable, Domain Specific Language that lets you describe software's behaviour without detailing how that behaviour is implemented. Gherkin serves two purposes — documentation and automated tests.

After some more research on Python BDD tools, I liked behave's implementation, documentation and community the most, so I integrated it into rdopkg and started using feature tests. They make it easy to describe and define expected behavior before writing code. New features now start with a feature scenario which can be reviewed before any code is written. Covering existing behavior with feature tests helps ensure it is both preserved and well defined/explained/documented. Big thanks go to Jon Schlueter, who contributed a huge number of initial feature tests for core rdopkg features.

Here is an example of rdopkg fix scenario:

    Scenario: rdopkg fix
        Given a distgit
        When I run rdopkg fix
        When I add description to .spec changelog
        When I run rdopkg --continue
        Then spec file contains new changelog entry with 1 lines
        Then new commit was created
        Then rdopkg state file is not present
        Then last commit message is:
            """
            foo-bar-1.2.3-3

            Changelog:
            - Description of a change
            """

Proper CI/gating

Thanks to Software Factory, Zuul and Gerrit, every rdopkg change now needs to pass the following automatic gate tests before it can be merged:

  • unit tests (python 2, python 3, Fedora, EPEL, CentOS)
  • feature tests (python 2, python 3, Fedora, EPEL, CentOS)
  • integration tests
  • code style check

In other words, master is now significantly harder to break!

Tests are managed as individual tox targets for convenience.

Paying back the Technical Debt

I tried to write rdopkg code with reusability and future extension in mind, yet at one point of development, with a big influx of new features and modifications, rdopkg approached a critical mass of technical debt where it got into a spiral of new functionality breaking existing functionality, and with each fix two new bugs surfaced. This kept happening, so I stopped adding new stuff and focused on ensuring rdopkg keeps doing what people use it for before extending (breaking) it further. This required quite a few core code refactors, proper integration of features that were hacked in on the clock, and leveraging new tools like the Software Factory CI pipeline and behave, described above. But I think it was a success: rdopkg paid back its technical debt in 2017 and is ready to face whatever the community throws at it in the near and far future.

Integration

Join Software Factory project

rdopkg became a part of Software Factory project and found a new home alongside DLRN.

Software Factory is an open source, software development forge with an emphasis on collaboration and ensuring code quality through Continuous Integration (CI). It is inspired by OpenStack's development workflow that has proven to be reliable for fast-changing, interdependent projects driven by large communities. Read more in Introducing Software Factory.

Specifically, rdopkg leverages the following Software Factory features:

The rdopkg repo is still mirrored to GitHub and bugs are kept in the Issues tracker there as well, because GitHub is an accessible, public, open space.

Did I mention you can log in to Software Factory using a GitHub account?

Finally, big thanks to Javier Peña, who paved the way towards Software Factory with DLRN.

Continuous Integration

rdopkg has been using human code reviews for quite some time, and it has proved very useful even though I often +2/+1 my own reviews due to a lack of reviewers. However, people inevitably make mistakes. There are decent unit and feature tests now to detect mistakes, so we fight human error with computing power and automation.

Each review and thus each code change to rdopkg is gated - all unit tests, feature tests, integration tests and code style checks need to pass before human reviewers consider accepting the change.

Instead of setting up machines and testing environments, installing requirements and waiting for tests to pass, this boring process is now automated on supported distributions, and humans can focus on the changes themselves.

Integration with Fedora, EPEL and CentOS

rdopkg is now finally available directly from the Fedora/EPEL repositories, so the install instructions on Fedora 25+ systems boil down to:

dnf install rdopkg

On CentOS 7+, EPEL is needed:

yum install epel-release
yum install rdopkg

Fun fact: to update Fedora rdopkg package, I use rdopkg:

fedpkg clone rdopkg
cd rdopkg
rdopkg new-version -bN
fedpkg mockbuild
# testing
fedpkg push
fedpkg build
fedpkg update

So rdopkg is officially packaging itself while also being packaged by itself.

Please nuke the jruzicka/rdopkg copr if you were using it previously; it is now obsolete.

Documentation

rdopkg documentation was cleaned up, proofread, extended with more details and updated with the latest information and links.

Feature scenarios are now available as man pages thanks to mhu.

Packaging and Distribution

Python 3 compatibility

By popular demand, rdopkg now supports Python 3. There are Python 3 unit tests and a python3-rdopkg RPM package.

Adopt pbr for Versioning

Most of the initial patches rdopkg handled in the very beginning were related to distutils and pbr, the OpenStack packaging meta-library; specifically, making it work on a distribution with integrated package management and old, conservative packages.

Amusingly, pbr was integrated into rdopkg (well, it actually does solve some problems aside from creating new ones), and in order to release the new rdopkg version with pbr on CentOS/EPEL 7, I had to disable the hardcoded pbr>=2.1.0 check on the update of python-pymod2pkg, because an older version of pbr is available from EPEL 7. I removed the check (in two different places) as I have done so many times before, and it works fine.

As a tribute to all the fun I had with pbr and distutils, here is a link to my first nuke bogus requirements patch of 2018.

Aside from being consistent with OpenStack-related projects, rdopkg adopted the strict semantic versioning that pbr uses, which means that releases are always going to have 3 version numbers from now on:

0.45 -> 0.45.0
1.0  -> 1.0.0
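
For reference, adopting pbr typically boils down to a setup.py that defers versioning to git tags and metadata to setup.cfg, along these lines (generic pbr boilerplate, not necessarily rdopkg's exact file):

# setup.py: standard pbr boilerplate; version comes from git tags
import setuptools

setuptools.setup(
    setup_requires=['pbr'],
    pbr=True)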

And More!

Aside from the big changes mentioned above, a large number of new feature tests and numerous not-so-exciting fixes, here is a list of changes that might be worth mentioning:

  • unify rdopkg patch and rdopkg update-patches and use alias
  • rdopkg pkgenv shows more information and better color coding to make it easy to tell the distgit state and branch setup
  • preserve Change-Id when amending a commit
  • allow fully unattended runs of core actions
  • commit messages created by all rdopkg actions are now clearer, more consistent and can be overridden using -H/--commit-header-file
  • better error messages on missing patches in all actions
  • git config can be used to override patches remote, branch, user name and email
  • improved handling of patches_base and patches_ignore including tests
  • improved handling of %changelog
  • improved new/old patches detection
  • improved packaging as suggested in Fedora review
  • improved naming in git and specfile modules
  • properly handle state files
  • linting cleanup and better code style checks
  • python 3 support
  • improve unicode support
  • handle VX.Y.Z tags
  • split bloated utils.cmd into utils.git module
  • merge legacy rdopkg.utils.exception so there is only single module for exceptions now
  • refactor unreasonable default atomic=False affecting action definitions
  • remove legacy rdopkg coprbuild action

Thank you, rdopkg community!

January 11, 2018 06:23 PM

January 10, 2018

RDO Blog

RDO Community Blogposts

If you've missed out on some of the great RDO Community content over the past few weeks while you were on holiday, not to worry. I've gathered the recent blogposts right here for you. Without further ado…

New TripleO quickstart cheatsheet by Carlos Camacho

I have created some cheatsheets for people starting to work on TripleO, mostly to help them to bootstrap a development environment as soon as possible.

Read more at http://anstack.github.io/blog/2018/01/05/tripleo-quickstart-cheatsheet.html

Using Ansible for Fernet Key Rotation on Red Hat OpenStack Platform 11 by Ken Savich, Senior OpenStack Solution Architect

In our first blog post on the topic of Fernet tokens, we explored what they are and why you should think about enabling them in your OpenStack cloud. In our second post, we looked at the method for enabling these.

Read more at https://redhatstackblog.redhat.com/2017/12/20/using-ansible-for-fernet-key-rotation-on-red-hat-openstack-platform-11/

Automating Undercloud backups and a Mistral introduction for creating workbooks, workflows and actions by Carlos Camacho

The goal of this developer documentation is to address the automated process of backing up a TripleO Undercloud and to give developers a complete description of how to integrate Mistral workbooks, workflows and actions into the Python TripleO client.

Read more at http://anstack.github.io/blog/2017/12/18/automating-the-undercloud-backup-and-mistral-workflows-intro.html

Know of other bloggers that we should be including in these round-ups? Point us to the articles on Twitter or IRC and we'll get them added to our regular cadence.

by Mary Thengvall at January 10, 2018 02:34 AM

January 05, 2018

Carlos Camacho

New TripleO quickstart cheatsheet

I have created some cheatsheets for people starting to work on TripleO, mostly to help them to bootstrap a development environment as soon as possible.

The previous versions of this cheatsheet series were used in several community conferences (FOSDEM, DevConf.cz); they are now deprecated, as the way TripleO should be deployed has changed considerably in recent months.

Here you have the latest version:

The source code of these bookmarks is available as usual on GitHub

And this is the code if you want to execute it directly:

# 01 - Create the toor user.
sudo useradd toor
echo "toor:toor" | chpasswd
echo "toor ALL=(root) NOPASSWD:ALL" \
  | sudo tee -a /etc/sudoers.d/toor
sudo chmod 0440 /etc/sudoers.d/toor
su - toor

# 02 - Prepare the hypervisor node.
cd
mkdir .ssh
ssh-keygen -t rsa -N "" -f .ssh/id_rsa
cat .ssh/id_rsa.pub >> .ssh/authorized_keys
sudo bash -c "cat .ssh/id_rsa.pub \
  >> /root/.ssh/authorized_keys"
sudo bash -c "echo '127.0.0.1 127.0.0.2' \
  >> /etc/hosts"
export VIRTHOST=127.0.0.2
sudo yum groupinstall "Virtualization Host" -y
sudo yum install git lvm2 lvm2-devel -y
ssh root@$VIRTHOST uname -a

# 03 - Clone repos and install deps.
git clone \
  https://github.com/openstack/tripleo-quickstart
chmod u+x ./tripleo-quickstart/quickstart.sh
bash ./tripleo-quickstart/quickstart.sh \
  --install-deps
sudo setenforce 0

# 04 - Configure the TripleO deployment with Docker and HA.
export CONFIG=~/deploy-config.yaml
cat > $CONFIG << EOF
overcloud_nodes:
  - name: control_0
    flavor: control
    virtualbmc_port: 6230
  - name: compute_0
    flavor: compute
    virtualbmc_port: 6231
node_count: 2
containerized_overcloud: true
delete_docker_cache: true
enable_pacemaker: true
run_tempest: false
extra_args: >-
  --libvirt-type qemu
  --ntp-server pool.ntp.org
  -e /usr/share/openstack-tripleo-heat-templates/environments/docker.yaml
  -e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml
EOF

# 05 - Deploy TripleO.
export VIRTHOST=127.0.0.2
bash ./tripleo-quickstart/quickstart.sh \
      --clean          \
      --release master \
      --teardown all   \
      --tags all       \
      -e @$CONFIG      \
      $VIRTHOST

Happy TripleOing!!!

by Carlos Camacho at January 05, 2018 12:00 AM

December 21, 2017

Red Hat Stack

Using Ansible for Fernet Key Rotation on Red Hat OpenStack Platform 11

In our first blog post on the topic of Fernet tokens, we explored what they are and why you should think about enabling them in your OpenStack cloud. In our second post, we looked at the method for enabling these.

Fernet tokens in Keystone are fantastic. Enabling these, instead of UUID or PKI tokens, really does make a difference in your cloud’s performance and overall ease of management. I get asked a lot about how to manage keys on your controller cluster when using Fernet. As you may imagine, this could potentially take your cloud down if you do it wrong. Let’s review what Fernet keys are, as well as how to manage them in your Red Hat OpenStack Platform cloud.

Photo by Freddy Marschall on Unsplash

Prerequisites

  • A Red Hat OpenStack Platform 11 director-based deployment
  • One or more controller nodes
  • Git command-line client

What are Fernet Keys?

Fernet keys are used to encrypt and decrypt Fernet tokens in OpenStack’s Keystone API. These keys are stored on each controller node, and must be available to authenticate and validate users of the various OpenStack components in your cloud.

Any given implementation of Keystone can have n keys, based on the max_active_keys setting in /etc/keystone/keystone.conf. This number includes all of the key types listed below.

There are essentially three types of keys:

Primary

Primary keys are used for token generation and validation. You can think of this as the active key in your cloud. Any time a user authenticates, or is validated by an OpenStack API, these are the keys that will be used. There can only be one primary key, and it must exist on all nodes (usually controllers) that are running the keystone API. The primary key is always the highest indexed key.

Secondary

Secondary keys are only used for token validation. These keys are rotated out of primary status, and thus are used to validate tokens that may exist after a new primary key has been created. There can be multiple secondary keys, the oldest of which will be deleted based on your max_active_keys setting after each key rotation.

Staged

These keys are always the lowest indexed keys (0). Whenever keys are rotated, this key is promoted to a primary key at the highest index allowable by max_active_keys. These keys exist to allow you to copy them to all nodes in your cluster before they’re promoted to primary status. This avoids the potential issue where keystone fails to validate a token because the key used to encrypt it does not yet exist in /etc/keystone/fernet-keys.

The following example shows the keys that you’d see in /etc/keystone/fernet-keys, with max_active_keys set to 4.

0 (staged: the next primary key)
1 (primary: token generation & validation)

Upon performing a key rotation, our staged key (0) will become the new primary key (2), while our old primary key (1) will be moved to secondary status (1).

0 (staged: the next primary key)
1 (secondary: token validation)
2 (primary: token generation & validation)

We have three keys here, so yet another key rotation will produce the following result:

0 (staged: the next primary key)
1 (secondary: token validation)
2 (secondary: token validation)
3 (primary: token generation & validation)

Our staged key (0) now becomes our primary key (3). Our old primary key (2) now becomes a secondary key (2), and (1) remains a secondary key.

We now have four keys, the number we've set in max_active_keys. One more rotation would produce the following:

0 (staged: the next primary key)
1 (deleted)
2 (secondary: token validation)
3 (secondary: token validation)
4 (primary: token generation & validation)

Our oldest key, secondary (1), is deleted. Our previously staged key (0) is moved to primary (4) status. A new staged key (0) is created. And finally, our old primary key (3) is moved to secondary status.

If you haven’t noticed this by now, rotating keys will always remove the key with the lowest index, excluding 0 — up to your max_active_keys. Additionally, note that you must be careful to set your max_active_keys configuration setting to something that makes sense, given your token lifetime and how often you plan to rotate your keys.

When to rotate?

Photo by Uroš Jovičić on Unsplash

The answer to this question would probably be different for most organizations. My take on this is simply: if you can do it safely, why not automate it and do it on a regular basis? Your threat model and use-case would normally dictate this or you may need to adhere to certain encryption and key management security controls in a given compliance framework. Whatever the case, I think about regular key rotation as a best-practices security measure. You always want to limit the amount of sensitive data, in this case Fernet tokens, encrypted with a single version of any given encryption key. Rotating your keys on a regular basis creates a smaller exposure surface for your cloud and your users.

How many keys do you need active at one time? This all depends on how often you plan to rotate them, as well as how long your token lifetime is. The answer to this can be expressed in the following equation:

fernet-keys = token-validity(hours) / rotation-time(hours) + 2

Let’s use an example of rotation every 8 hours, with a default token lifetime of 24 hours. This would be

24 hours / 8 hours + 2 = 5
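
As a quick sanity check, here is the same arithmetic in Python (the explicit rounding up for uneven divisions is my own assumption, not part of the formula above):

import math

token_validity_hours = 24   # keystone token lifetime
rotation_period_hours = 8   # how often keys are rotated

keys_needed = math.ceil(token_validity_hours / rotation_period_hours) + 2
print(keys_needed)  # 5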

Five keys on your controllers would ensure that you always have an active set of keys for your cloud. With this in mind, let's look at a way to rotate your keys using Ansible.

Rotating Fernet keys

So you may be wondering, how does one automate this process? You can imagine that this process can be painful and prone to error if done by hand. While you could use the fernet_rotate command to do this on each node manually, why would you?

Let’s look at how to do this with Ansible, Red Hat’s awesome tool for automation. If you’re new to Ansible, please do yourself a favor and check out this quick-start video.

We’ll be using an Ansible role, created by my fellow Red Hatter Juan Antonio Osorio (Ozz), one of the coolest guys I know. This is just one way of doing this. For a Red Hat OpenStack Platform install you should contact Red Hat support to review your options and support implications. And of course, your results may vary so be sure to test out on a non-production install!

Let’s start by logging into your Red Hat OpenStack director node as the stack user, and creating a roles directory in /home/stack:

$ cat << EOF > ~/rotate.yml
- hosts: controller 
  become: true 
  roles: 
    - tripleo-fernet-keys-rotation
EOF

We need to source our stackrc, as we’ll be operating on our controller nodes in the next step

$ source ~/stackrc

Using a dynamic inventory from /usr/bin/tripleo-ansible-inventory, we’ll run this playbook and rotate the keys on our controllers

$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory rotate.yml

Ansible Role Analysis

What happened? Looking at Ansible’s output, you’ll note that several tasks were performed. If you’d like to see these tasks, look no further than /home/stack/roles/tripleo-fernet-keys-rotation/tasks/main.yml:

This task runs a Python script, generate_key_yaml.py, from the role's files directory (~/roles/tripleo-fernet-keys-rotation/files), that creates a new Fernet key:

- name: Generate new key
  script: generate_key_yaml.py
  register: new_key_register
  run_once: true

This task will take the output of the previous task, from stdout, and register it as the new_key.

- name: Set new key fact
  set_fact:
    new_key: "{{ new_key_register.stdout }}"

Next, we find the current primary key index, i.e. the highest-numbered key currently present in /etc/keystone/fernet-keys

- name: Get current primary key index
  shell: ls /etc/keystone/fernet-keys | sort -r | head -1
  register: current_key_index_register

Let’s set the next primary key index

- name: Set next key index fact
  set_fact:
    next_key_index: "{{ current_key_index_register.stdout|int + 1 }}"

Now we’ll move the staged key to the new primary key

- name: Move staged key to new index
  command: mv /etc/keystone/fernet-keys/0 /etc/keystone/fernet-keys/{{ next_key_index }}

Next, let’s set our new_key to the new staged key

- name: Set new key as staged key
  copy:
    content: "{{ new_key }}"
    dest: /etc/keystone/fernet-keys/0
    owner: keystone
    group: keystone
    mode: 0600

Finally, we’ll reload (not restart) httpd on the controller, allowing keystone to load the new keys

- name: Reload httpd
  service:
    name: httpd
    state: reloaded

Scheduling

Now that we have a way to automate rotation of our keys, it’s time to schedule this automation. There are several ways you could do this:

Cron

You could, but why?

Systemd Realtime Timers

Let’s create the systemd service that will run our playbook:

cat << EOF > /etc/systemd/system/fernet-rotate.service
[Unit]
Description=Run an Ansible playbook to rotate fernet keys on the overcloud
[Service]
User=stack
Group=stack
ExecStart=/usr/bin/ansible-playbook \
  -i /usr/bin/tripleo-ansible-inventory /home/stack/rotate.yml
EOF

Now we’ll create a timer with the same name, only with .timer as the suffix, in /etc/systemd/system on the director node:

cat << EOF > /etc/systemd/system/fernet-rotate.timer
[Unit]
Description=Timer to rotate our Overcloud Fernet Keys weekly
[Timer]
OnCalendar=weekly
Persistent=true
[Install]
WantedBy=timers.target
EOF

Ansible Tower

I like how you're thinking! But that's a topic for another day.

Red Hat OpenStack Platform 12

Red Hat OpenStack Platform 12 provides support for key rotation via Mistral. Learn all about Red Hat OpenStack Platform 12 here.

What about logging?

Ansible to the rescue!

Ansible will use the log_path configuration option from /etc/ansible/ansible.cfg, ansible.cfg in the directory of the playbook, or $HOME/.ansible.cfg. You just need to set this and forget it.

So let’s enable this service and timer, and we’re off to the races:

$ sudo systemctl enable fernet-rotate.service
$ sudo systemctl enable fernet-rotate.timer

Credit: Many thanks to Lance Bragstad and Dolph Matthews for the key rotation methodology.

by Ken Savich, Senior OpenStack Solution Architect at December 21, 2017 02:08 AM

December 18, 2017

Carlos Camacho

Automating Undercloud backups and a Mistral introduction for creating workbooks, workflows and actions

The goal of this developer documentation is to address the automated process of backing up a TripleO Undercloud and to give developers a complete description of how to integrate Mistral workbooks, workflows and actions into the Python TripleO client.

This tutorial will be divided into several sections:

  1. Introduction and prerequisites
  2. Undercloud backups
  3. Creating a new OpenStack CLI command in python-tripleoclient (openstack undercloud backup).
  4. Creating Mistral workflows for the new python-tripleoclient CLI command.
  5. Give support for new Mistral environment variables when installing the undercloud.
  6. Show how to test locally the changes in python-tripleoclient and tripleo-common.
  7. Give elevated privileges to specific Mistral actions that need to run with elevated privileges.
  8. Debugging actions
  9. Unit tests
  10. Why all previous sections are related to Upgrades?

1. Introduction and prerequisites

Let’s assume you have a TripleO development environment healthy and working properly. All the commands and customization we are going to run will run in the Undercloud, as usual logged in as the stack user and having sourced the stackrc file.

Then let’s proceed by cloning the repositories we are going to work with in a temporary folder:

mkdir dev-docs
cd dev-docs
git clone https://github.com/openstack/python-tripleoclient
git clone https://github.com/openstack/tripleo-common
git clone https://github.com/openstack/instack-undercloud

  • python-tripleoclient: Will define the OpenStack CLI commands.
  • tripleo-common: Will have the Mistral logic.
  • instack-undercloud: Allows updating and creating Mistral environments to store configuration details needed when executing Mistral workflows.

2. Undercloud backups

Most of the Undercloud backup procedure is available on the official TripleO documentation site.

We will focus on the automation of backing up the resources required to restore the Undercloud in case of a failed upgrade.

  • All MariaDB databases on the undercloud node
  • MariaDB configuration file on undercloud (so we can restore databases accurately)
  • All glance image data in /var/lib/glance/images
  • All swift data in /srv/node
  • All data in the stack user's home directory

To do this we need to be able to:

  • Connect to the database server as root.
  • Dump all databases to file.
  • Create a filesystem backup of several folders (and be able to access folders with restricted access).
  • Upload this backup to a swift container to be able to get it from the TripleO web UI.

3. Creating a new OpenStack CLI command in python-tripleoclient (openstack undercloud backup).

The first action needed is to be able to create a new CLI command for the OpenStack client. In this case, we are going to implement the openstack undercloud backup command.

cd dev-docs
cd python-tripleoclient

Let’s list the files inside this folder:

[stack@undercloud python-tripleoclient]$ ls
AUTHORS           doc                            setup.py
babel.cfg         LICENSE                        test-requirements.txt
bindep.txt        zuul.d                         tools
build             README.rst                     tox.ini
ChangeLog         releasenotes                   tripleoclient
config-generator  requirements.txt               
CONTRIBUTING.rst  setup.cfg

Once inside the python-tripleoclient folder we need to check the following file:

setup.cfg: This file defines all the CLI commands for the Python TripleO client. Specifically, we will need our new command definition at the end of this file:

undercloud_backup = tripleoclient.v1.undercloud_backup:BackupUndercloud

This means that we have a new command defined as undercloud backup that will instantiate the BackupUndercloud class defined in the file tripleoclient/v1/undercloud_backup.py.

For further details related to this class definition please go to the gerrit review.
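
For illustration only, here is a minimal sketch of what such a command class could look like; the option name and helper module match the snippets below, but the real implementation in the gerrit review is more complete:

# tripleoclient/v1/undercloud_backup.py: simplified, illustrative sketch
from osc_lib.command import command

from tripleoclient.workflows import undercloud_backup


class BackupUndercloud(command.Command):
    """Backup the undercloud"""

    def get_parser(self, prog_name):
        parser = super(BackupUndercloud, self).get_parser(prog_name)
        parser.add_argument(
            '--add-files-to-backup',
            action='append',
            default=[],
            help='Additional files to include in the backup')
        return parser

    def take_action(self, parsed_args):
        clients = self.app.client_manager
        files_to_backup = ','.join(
            list(set(parsed_args.add_files_to_backup)))
        workflow_input = {"sources_path": files_to_backup}
        print(undercloud_backup.prepare(clients, workflow_input))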

Now, having our class defined we can call other methods to invoke Mistral in this way:

clients = self.app.client_manager

files_to_backup = ','.join(list(set(parsed_args.add_files_to_backup)))

workflow_input = {
    "sources_path": files_to_backup
}
output = undercloud_backup.prepare(clients, workflow_input)

From there, we call the undercloud_backup.prepare method defined in the file tripleoclient/workflows/undercloud_backup.py, which will call the Mistral workflow:

def prepare(clients, workflow_input):
    workflow_client = clients.workflow_engine
    tripleoclients = clients.tripleoclient
    with tripleoclients.messaging_websocket() as ws:
        execution = base.start_workflow(
            workflow_client,
            'tripleo.undercloud_backup.v1.prepare_environment',
            workflow_input=workflow_input
        )
        for payload in base.wait_for_messages(workflow_client, ws, execution):
            if 'message' in payload:
                return payload['message']

In this case, we create a loop within the tripleoclient and wait until we receive a message from the Mistral workflow tripleo.undercloud_backup.v1.prepare_environment that indicates whether the invoked workflow ended correctly.

4. Creating Mistral workflows for the new python-tripleoclient CLI command.

The next step is to define the tripleo.undercloud_backup.v1.prepare_environment Mistral workflow. All the Mistral workbooks, workflows and actions are defined in the tripleo-common repository.

Let’s go inside tripleo-common

cd dev-docs
cd tripleo-common

And see it’s conent:

[stack@undercloud tripleo-common]$ ls
AUTHORS           doc                README.rst        test-requirements.txt
babel.cfg         HACKING.rst        releasenotes      tools
build             healthcheck        requirements.txt  tox.ini
ChangeLog         heat_docker_agent  scripts           tripleo_common
container-images  image-yaml         setup.cfg         undercloud_heat_plugins
contrib           LICENSE            setup.py          workbooks
CONTRIBUTING.rst  playbooks          sudoers           zuul.d

Again we need to check the following file:

setup.cfg: This file defines all the Mistral actions we can call. Specifically, we will need our new actions at the end of this file:

tripleo.undercloud.get_free_space = tripleo_common.actions.undercloud:GetFreeSpace
tripleo.undercloud.create_backup_dir = tripleo_common.actions.undercloud:CreateBackupDir
tripleo.undercloud.create_database_backup = tripleo_common.actions.undercloud:CreateDatabaseBackup
tripleo.undercloud.create_file_system_backup = tripleo_common.actions.undercloud:CreateFileSystemBackup
tripleo.undercloud.upload_backup_to_swift = tripleo_common.actions.undercloud:UploadUndercloudBackupToSwift

4.1. Action definition

Let’s take the first action to describe it’s definition, tripleo.undercloud.get_free_space = tripleo_common.actions.undercloud:GetFreeSpace

We have defined an action named tripleo.undercloud.get_free_space which will instantiate the GetFreeSpace class defined in the file tripleo_common/actions/undercloud.py.

If we open tripleo_common/actions/undercloud.py we can see the class definition as:

class GetFreeSpace(base.Action):
    """Get the Undercloud free space for the backup.

       The default path to check will be /tmp and the default minimum size will
       be 10240 MB (10GB).
    """

    def __init__(self, min_space=10240):
        self.min_space = min_space

    def run(self, context):
        temp_path = tempfile.gettempdir()
        min_space = self.min_space
        while not os.path.isdir(temp_path):
            head, tail = os.path.split(temp_path)
            temp_path = head
        available_space = (
            (os.statvfs(temp_path).f_frsize * os.statvfs(temp_path).f_bavail) /
            (1024 * 1024))
        if (available_space < min_space):
            msg = "There is no enough space, avail. - %s MB" \
                  % str(available_space)
            return actions.Result(error={'msg': msg})
        else:
            msg = "There is enough space, avail. - %s MB" \
                  % str(available_space)
            return actions.Result(data={'msg': msg})

In this specific case, the class checks whether there is enough space to perform the backup. Later, we will be able to invoke the action as

mistral run-action tripleo.undercloud.get_free_space

or use it in workbooks.

4.2. Workflow definition.

Once we have defined all our new actions, we need to orchestrate them in order to have a fully working Mistral workflow.

All tripleo-common workbooks are defined in the workbooks folder.

In the next example we have a workbook definition with all the actions inside it; in this case, the example shows the first workflow with all the tasks involved.

---
version: '2.0'
name: tripleo.undercloud_backup.v1
description: TripleO Undercloud backup workflows

workflows:

  prepare_environment:
    description: >
      This workflow will prepare the Undercloud to run the database backup
    tags:
      - tripleo-common-managed
    input:
      - queue_name: tripleo
    tasks:
      # Action to know if there is enough available space
      # to run the Undercloud backup
      get_free_space:
        action: tripleo.undercloud.get_free_space
        publish:
            status: <% task().result %>
            free_space: <% task().result %>
        on-success: send_message
        on-error: send_message
        publish-on-error:
          status: FAILED
          message: <% task().result %>


      # Sending a message that the folder to create the backup was
      # created successfully
      send_message:
        action: zaqar.queue_post
        retry: count=5 delay=1
        input:
          queue_name: <% $.queue_name %>
          messages:
            body:
              type: tripleo.undercloud_backup.v1.launch
              payload:
                status: <% $.status %>
                execution: <% execution() %>
                message: <% $.get('message', '') %>
        on-success:
          - fail: <% $.get('status') = "FAILED" %>

The workflow is self-explanatory; the only not-so-clear part might be the last one, as the workflow uses an action to send a message stating whether the workflow ended correctly, passing as the message the output of the previous task, in this case the result of get_free_space.

5. Give support for new Mistral environment variables when installing the undercloud.

Sometimes it is necessary to use additional values inside a Mistral task. For example, if we need to create a dump of a database, we might need credentials other than the Mistral user's for authentication purposes.

Initially, when the Undercloud is installed, a Mistral environment called tripleo.undercloud-config is created. This environment will have all the required configuration details that we can get from Mistral. This is defined in the instack-undercloud repository.

Let’s get into the repository and check the content of the file instack_undercloud/undercloud.py.

This file defines a set of methods to interact with the Undercloud; specifically, the method called _create_mistral_config_environment allows configuring additional environment variables when installing the Undercloud.

For additional testing, you can use the Python snippet available on gist.github.com to call the Mistral client from the Undercloud node.

6. Show how to test locally the changes in python-tripleoclient and tripleo-common.

If it’s needed a local test of a change in python-tripleoclient or tripleo-common, the following procedures allow to test it locally.

For a change in python-tripleoclient, assuming you already have downloaded the change you want to test, execute:

cd python-tripleoclient
sudo rm -Rf /usr/lib/python2.7/site-packages/tripleoclient*
sudo rm -Rf /usr/lib/python2.7/site-packages/python_tripleoclient*
sudo python setup.py clean --all install

For a change in tripleo-common, assuming you already have downloaded the change you want to test, execute:

cd tripleo-common
sudo rm -Rf /usr/lib/python2.7/site-packages/tripleo_common*
sudo python setup.py clean --all install
sudo cp /usr/share/tripleo-common/sudoers /etc/sudoers.d/tripleo-common
# this loads the actions via entrypoints
sudo mistral-db-manage --config-file /etc/mistral/mistral.conf populate
# make sure the new actions got loaded
mistral action-list | grep tripleo
for workbook in workbooks/*.yaml; do
    mistral workbook-create $workbook
done

for workbook in workbooks/*.yaml; do
    mistral workbook-update $workbook
done
sudo systemctl restart openstack-mistral-executor
sudo systemctl restart openstack-mistral-engine

If you want to execute a Mistral action or a Mistral workflow, you can run the following.

Examples of how to test Mistral actions independently:

mistral run-action tripleo.undercloud.get_free_space #Without parameters
mistral run-action tripleo.undercloud.get_free_space '{"path": "/etc/"}' # With parameters
mistral run-action tripleo.undercloud.create_file_system_backup '{"sources_path": "/tmp/asdf.txt,/tmp/asdf", "destination_path": "/tmp/"}'

Examples of how to test a Mistral workflow independently:

mistral execution-create tripleo.undercloud_backup.v1.prepare_environment # No parameters
mistral execution-create tripleo.undercloud_backup.v1.filesystem_backup '{"sources_path": "/tmp/asdf.txt,/tmp/asdf", "destination_path": "/tmp/"}' # With parameters

7. Give elevated privileges to specific Mistral actions that need to run with elevated privileges.

Sometimes it is not possible to execute some restricted actions as the Mistral user; for example, when creating the Undercloud backup we won't be able to access the /home/stack/ folder to create a tarball of it. For these cases it's possible to execute elevated actions as the Mistral user:

This is the content of the sudoers file in the root of the tripleo-common repository at the time of the creation of this guide.

Defaults!/usr/bin/run-validation !requiretty
Defaults:validations !requiretty
Defaults:mistral !requiretty
mistral ALL = (validations) NOPASSWD:SETENV: /usr/bin/run-validation
mistral ALL = NOPASSWD: /usr/bin/chown -h validations\: /tmp/validations_identity_[A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_], \
        /usr/bin/chown validations\: /tmp/validations_identity_[A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_], \
        !/usr/bin/chown /tmp/validations_identity_* *, !/usr/bin/chown /tmp/validations_identity_*..*
mistral ALL = NOPASSWD: /usr/bin/rm -f /tmp/validations_identity_[A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_][A-Za-z0-9_], \
        !/usr/bin/rm /tmp/validations_identity_* *, !/usr/bin/rm /tmp/validations_identity_*..*
mistral ALL = NOPASSWD: /bin/nova-manage cell_v2 discover_hosts *
mistral ALL = NOPASSWD: /usr/bin/tar --ignore-failed-read -C / -cf /tmp/undercloud-backup-*.tar *
mistral ALL = NOPASSWD: /usr/bin/chown mistral. /tmp/undercloud-backup-*/filesystem-*.tar
validations ALL = NOPASSWD: ALL

Here you can grant permissions for specific tasks when executing Mistral workflows from tripleo-common.

8. Debugging actions.

Let’s assume the action is written, added to setup.cfg but not appeared. Firstly, check if action was added by sudo mistral-db-manage populate. Run

mistral action-list -f value -c Name | grep -e '^tripleo.undercloud'

If you don’t see your actions check output of sudo mistral-db-manage populate as

sudo mistral-db-manage populate 2>&1| grep ERROR | less

Output like the following may indicate issues in the code. Simply fix the code.

2018-01-01:00:59.730 7218 ERROR stevedore.extension [-] Could not load 'tripleo.undercloud.get_free_space': unexpected indent (undercloud.py, line 40):   File "/usr/lib/python2.7/site-packages/tripleo_common/actions/undercloud.py", line 40

Execute a single action, or execute a workflow from the workbook, to make sure it works as designed.

9. Unit tests

Writing unit tests is an essential instrument of a software developer. Unit tests are much faster than running the workflow itself. So, let's write unit tests for the action we wrote. Let's add a tripleo_common/tests/actions/test_undercloud.py file with the following content in the tripleo-common repository.

import mock

from tripleo_common.actions import undercloud
from tripleo_common.tests import base


class GetFreeSpaceTest(base.TestCase):
    def setUp(self):
        super(GetFreeSpaceTest, self).setUp()
        self.temp_dir = "/tmp"

    @mock.patch('tempfile.gettempdir')
    @mock.patch("os.path.isdir")
    @mock.patch("os.statvfs")
    def test_run_false(self, mock_statvfs, mock_isdir, mock_gettempdir):
        mock_gettempdir.return_value = self.temp_dir
        mock_isdir.return_value = True
        mock_statvfs.return_value = mock.MagicMock(
            spec_set=['f_frsize', 'f_bavail'],
            f_frsize=4096, f_bavail=1024)
        action = undercloud.GetFreeSpace()
        action_result = action.run(context={})
        mock_gettempdir.assert_called()
        mock_isdir.assert_called()
        mock_statvfs.assert_called()
        self.assertEqual("There is no enough space, avail. - 4 MB",
                         action_result.error['msg'])

    @mock.patch('tempfile.gettempdir')
    @mock.patch("os.path.isdir")
    @mock.patch("os.statvfs")
    def test_run_true(self, mock_statvfs, mock_isdir, mock_gettempdir):
        mock_gettempdir.return_value = self.temp_dir
        mock_isdir.return_value = True
        mock_statvfs.return_value = mock.MagicMock(
            spec_set=['f_frsize', 'f_bavail'],
            f_frsize=4096, f_bavail=10240000)
        action = undercloud.GetFreeSpace()
        action_result = action.run(context={})
        mock_gettempdir.assert_called()
        mock_isdir.assert_called()
        mock_statvfs.assert_called()
        self.assertEqual("There is enough space, avail. - 40000 MB",
                         action_result.data['msg'])

Run

tox -epy27

to see any unit test errors.

10. Why all previous sections are related to Upgrades?

  • Undercloud backups are an important step before running an upgrade.
  • Writing developer docs will help people to create and develop new features.

11. References

  • http://www.dougalmatthews.com/2016/Sep/21/debugging-mistral-in-tripleo/
  • http://blog.johnlikesopenstack.com/2017/06/accessing-mistral-environment-in-cli.html
  • http://hardysteven.blogspot.com.es/2017/03/developing-mistral-workflows-for-tripleo.html

by Carlos Camacho at December 18, 2017 12:00 AM

December 15, 2017

RDO Blog

Blog Round-up

It's time for another round-up of the great content that's circulating our community. But before we jump in, if you know of an OpenStack or RDO-focused blog that isn't featured here, be sure to leave a comment below and we'll add it to the list.

ICYMI, here's what has sparked the community's attention this month, from Ansible to TripleO, emoji-rendering, and more.

TripleO and Ansible (Part 2) by slagle

In my last post, I covered some of the details about using Ansible to deploy with TripleO. If you haven’t read that yet, I suggest starting there: http://blog-slagle.rhcloud.com/?p=355

Read more at http://blog-slagle.rhcloud.com/?p=369

TripleO and Ansible deployment (Part 1) by slagle

In the Queens release of TripleO, you’ll be able to use Ansible to apply the software deployment and configuration of an Overcloud.

Read more at http://blog-slagle.rhcloud.com/?p=355

An Introduction to Fernet tokens in Red Hat OpenStack Platform by Ken Savich, Senior OpenStack Solution Architect

Thank you for joining me to talk about Fernet tokens. In this first of three posts on Fernet tokens, I’d like to go over the definition of OpenStack tokens, the different types and why Fernet tokens should matter to you. This series will conclude with some awesome examples of how to use Red Hat Ansible to manage your Fernet token keys in production.

Read more at https://redhatstackblog.redhat.com/2017/12/07/in-introduction-to-fernet-tokens-in-red-hat-openstack-platform/

Full coverage of libvirt XML schemas achieved in libvirt-go-xml by Daniel Berrange

In recent times I have been aggressively working to expand the coverage of libvirt XML schemas in the libvirt-go-xml project. Today this work has finally come to a conclusion, when I achieved what I believe to be effectively 100% coverage of all of the libvirt XML schemas. More on this later, but first some background on Go and XML…

Read more at https://www.berrange.com/posts/2017/12/07/full-coverage-of-libvirt-xml-schemas-achieved-in-libvirt-go-xml/

Full colour emojis in virtual machine names in Fedora 27 by Daniel Berrange

Quite by chance today I discovered that Fedora 27 can display full colour glyphs for unicode characters that correspond to emojis, when the terminal displaying my mutt mail reader displayed someone’s name with a full colour glyph showing stars:

Read more at https://www.berrange.com/posts/2017/12/01/full-colour-emojis-in-virtual-machine-names-in-fedora-27/

Booting baremetal from a Cinder Volume in TripleO by higginsd

Up until recently in TripleO, booting from a cinder volume was confined to virtual instances, but now, thanks to some recent work in ironic, baremetal instances can also be booted backed by a cinder volume.

Read more at http://goodsquishy.com/2017/11/booting-baremetal-from-a-cinder-volume-in-tripleo/

by Mary Thengvall at December 15, 2017 06:20 AM

December 14, 2017

Red Hat Stack

Red Hat OpenStack Platform 12 Is Here!

We are happy to announce that Red Hat OpenStack Platform 12 is now Generally Available (GA).

This is Red Hat OpenStack Platform’s 10th release and is based on the upstream OpenStack release, Pike.

Red Hat OpenStack Platform 12 is focused on the operational aspects of deploying OpenStack. OpenStack has established itself as a solid technology choice and with this release, we are working hard to further improve the usability aspects and bring OpenStack and operators into harmony.


With operationalization in mind, let's take a quick look at some of the biggest and most exciting features now available.

Containers.

As containers are changing and improving IT operations it only stands to reason that OpenStack operators can also benefit from this important and useful technology concept. In Red Hat OpenStack Platform we have begun the work of containerizing the control plane. This includes some of the main services that run OpenStack, like Nova and Glance, as well as supporting technologies, such as Red Hat Ceph Storage. All these services can be deployed as containerized applications via Red Hat OpenStack Platform’s lifecycle and deployment tool, director.

Photo by frank mckenna on Unsplash

Bringing a containerized control plane to OpenStack is important. Through it we can immediately enhance, among other things, stability and security features through isolation. By design, OpenStack services often have complex, overlapping library dependencies that must be accounted for in every upgrade, rollback, and change. For example, if Glance needs a security patch that affects a library shared by Nova, time must be spent to ensure Nova can survive the change; or even more frustratingly, Nova may need to be updated itself. This makes the change effort and resulting change window and impact, much more challenging. Simply put, it’s an operational headache.

However, when we isolate those dependencies into a container we are able to work with services with much more granularity and separation. An urgent upgrade to Glance can be done alongside Nova without affecting it in any way. With this granularity, operators can more easily quantify and test the changes helping to get them to production more quickly.

We are working closely with our vendors, partners, and customers to move to this containerized approach in a way that is minimally disruptive. Upgrading from a non-containerized control plane to one with most services containerized is fully managed by Red Hat OpenStack Platform director. Indeed, when upgrading from Red Hat OpenStack Platform 11 to Red Hat OpenStack Platform 12 the entire move to containerized services is handled “under the hood” by director. With just a few simple preparatory steps director delivers the biggest change to OpenStack in years direct to your running deployment in an almost invisible, simple to run, upgrade. It’s really cool!

Red Hat Ansible.

Like containers, it’s pretty much impossible to work in operations and not be aware of, or more likely be actively using, Red Hat Ansible. Red Hat Ansible is known to be easier to use for customising and debugging; most operators are more comfortable with it, and it generally provides an overall nicer experience through a straightforward and easy to read format.


Of course, we at Red Hat are excited to include Ansible as a member of our own family. With Red Hat Ansible we are actively integrating this important technology into more and more of our products.

In Red Hat OpenStack Platform 12, Red Hat Ansible takes center stage.

But first, let’s be clear, we have not dropped Heat; there are very real requirements around backward compatibility and operator familiarity that are delivered with the Heat template model.

But we don’t have to compromise because of this requirement. With Ansible we are offering operator and developer access points independent of the Heat templates. We use the same composable services architecture as we had before; the Heat-level flexibility still works the same, we just translate to Ansible under the hood.

Simplistically speaking, before Ansible, our deployments were mostly managed by Heat templates driving Puppet. Now, we use Heat to drive Ansible by default, and then Ansible drives Puppet and other deployment activities as needed. And with the addition of containerized services, we also have positioned Ansible as a key component of the entire container deployment. By adding a thin layer of Ansible, operators can now interact with a deployment in ways they could not previously.

For instance, take the new openstack overcloud config download command. This command allows an operator to generate all the Ansible playbooks being used for a deployment into a local directory for review. And these aren’t mere interpretations of Heat actions, these are the actual, dynamically generated playbooks being run during the deployment. Combine this with Ansible’s cool dynamic inventory feature, which allows an operator to maintain their Ansible inventory file based on a real-time infrastructure query, and you get an incredibly powerful troubleshooting entry point.

Check out this short (1:50) video showing Red Hat Ansible and this new exciting command and concept:

Network composability.

Another major new addition for operators is the extension of the composability concept into networks.

As a reminder, when we speak about composability we are talking about enabling operators to create detailed solutions by giving them basic, simple, defined components from which they can build for their own unique, complex topologies.

With composable networks, operators are no longer only limited to using the predefined networks provided by director. Instead, they can now create additional networks to suit their specific needs. For instance, they might create a network just for NFS filer traffic, or a dedicated SSH network for security reasons.

Photo by Radek Grzybowski on Unsplash

And as expected, composable networks work with composable roles. Operators can create custom roles and apply multiple, custom networks to them as required. The combinations lead to an incredibly powerful way to build complex enterprise network topologies, including an on-ramp to the popular L3 spine-leaf topology.

And to make it even easier to put together we have added automation in director that verifies that resources and Heat templates for each composable network are automatically generated for all roles. Fewer templates to edit can mean less time to deployment!

Telco speed.

Telcos will be excited to know we are now delivering production ready virtualized fast data path technologies. This release includes Open vSwitch 2.7 and the Data Plane Development Kit (DPDK) 16.11 along with improvements to Neutron and Nova allowing for robust virtualized deployments that include support for large MTU sizing (i.e. jumbo frames) and multiple queues per interface. OVS+DPDK is now a viable option alongside SR-IOV and PCI passthrough in offering more choice for fast data in Infrastructure-as-a-Service (IaaS) solutions.

Operators will be pleased to see that these new features can be more easily deployed thanks to new capabilities within Ironic, which store environmental parameters during introspection. These values are then available to the overcloud deployment providing an accurate view of hardware for ideal tuning. Indeed, operators can further reduce the complexity around tuning NFV deployments by allowing director to use the collected values to dynamically derive the correct parameters resulting in truly dynamic, optimized tuning.

Serious about security.


Helping operators, and the companies they work for, focus on delivering business value instead of worrying about their infrastructure is core to Red Hat’s thinking. And one way we make sure everyone sleeps better at night with OpenStack is through a dedicated focus on security.

Starting with Red Hat OpenStack Platform 12 we have more internal services using encryption than in any previous release. This is an important step for OpenStack as a community to help increase adoption in enterprise datacenters, and we are proud to be squarely at the center of that effort. For instance, in this release even more services now feature internal TLS encryption.

Let’s be realistic, though, focusing on security extends beyond just technical implementation. Starting with Red Hat OpenStack Platform 12 we are also releasing a comprehensive security guide, which provides best practices as well as conceptual information on how to make an OpenStack cloud more secure. Our security stance is firmly rooted in meeting global standards from top international agencies such as FedRAMP (USA), ETSI (Europe), and ANSSI (France). With this guide, we are excited to share these efforts with the broader community.

Do you even test?

How many times has someone asked an operations person this question? Too many! “Of course we test,” they will say. And with Red Hat OpenStack Platform 12 we’ve decided to make sure the world knows we do, too.

Through the concept of Distributed Continuous Integration (DCI), we place remote agents on site with customers, partners, and vendors that continuously build our releases at all different stages on all different architectures. By engaging outside resources we are not limited by internal resource restrictions; instead, we gain access to hardware and architecture that could never be tested in any one company’s QA department. With DCI we can fully test our releases to see how they work under an ever-increasing set of environments. We are currently partnered with major industry vendors for this program and are very excited about how it helps us make the entire OpenStack ecosystem better for our customers.

So, do we even test? Oh, you bet we do!

Feel the love!

Photo by grafxart photo on Unsplash

And this is just a small piece of the latest Red Hat OpenStack Platform 12 release. Whether you are looking to try out a new cloud, or thinking about an upgrade, this release brings a level of operational maturity that will really impress!

Now that OpenStack has proven itself an excellent choice for IaaS, it can focus on making itself a loveable one.

Let Red Hat OpenStack Platform 12 reignite the romance between you and your cloud!

Red Hat OpenStack Platform 12 is designated as a “Standard” release with a one-year support window. Click here for more details on the release lifecycle for Red Hat OpenStack Platform.

Find out more about this release at the Red Hat OpenStack Platform Product page. Or visit our vast online documentation.

And if you’re ready to get started now, check out the free 60-day evaluation available on the Red Hat portal.

Looking for even more? Contact your local Red Hat office today.


by August Simonelli, Technical Marketing Manager, Cloud at December 14, 2017 01:49 AM

December 13, 2017

James Slagle

TripleO and Ansible (Part 2)

In my last post, I covered some of the details about using Ansible to deploy
with TripleO. If you haven’t read that yet, I suggest starting there:
http://blog-slagle.rhcloud.com/?p=355

I’ll now cover interacting with Ansible more directly.

When using --config-download as a deployment argument, a Mistral workflow will be enabled that runs ansible-playbook to apply the deployment and configuration data to each node. When the deployment is complete, you can interact with the files that were created by the workflow.

Let’s take a look at how to do that.

You need to have a shell on the Undercloud. Since the files used by the workflow potentially contain sensitive data, they are only readable by the mistral user or group. So either become the root user, or add your interactive shell user account (typically “stack”) to the mistral group:

sudo usermod -a -G mistral stack
# Activate the new group
newgrp mistral

Once the permissions are sorted, change to the mistral working directory for
the config-download workflows:

cd /var/lib/mistral

Within that directory, there will be directories named according to the Mistral
execution uuid. An easy way to find the most recent execution of
config-download is to just cd into the most recently created directory and list
the files in that directory:

cd 2747b55e-a7b7-4036-82f7-62f09c63d671
ls

The following files (or a similar set, as things could change) will exist:

ansible.cfg
ansible.log
ansible-playbook-command.sh
common_deploy_steps_tasks.yaml
Controller
deploy_steps_playbook.yaml
deploy_steps_tasks.yaml
external_deploy_steps_tasks.yaml
external_post_deploy_steps_tasks.yaml
group_vars
ssh_private_key
templates
tripleo-ansible-inventory
update_steps_playbook.yaml
update_steps_tasks.yaml
upgrade_steps_playbook.yaml
upgrade_steps_tasks.yaml

All the files that are needed to re-run ansible-playbook are present. The exact ansible-playbook command is saved in ansible-playbook-command.sh. Let’s take a look at that file:

$ cat ansible-playbook-command.sh
 #!/bin/bash

OS_AUTH_TOKEN="gAAAAABaMD3b3UQziKRzm2jjutrxBbYgqfWSTZWAMXyU5DcTA83Nn28eBVUvr0darSl0LcF3kb-I7OYnMxAp3dBs39ejrINYmsuBmT7ZE3SjYjWqtgivQyYWOHJmgKscl2VuBnWF8Jq-kd3wOHpHQVpJ0ILls35uFPUQvf91ckpr2QsEg67i9Ys"
 OS_AUTH_URL="http://192.168.24.1:5000/v3"
 OS_PROJECT_NAME="admin"
 OS_USERNAME="admin"
 ANSIBLE_CONFIG="/var/lib/mistral/2747b55e-a7b7-4036-82f7-62f09c63d671/ansible.cfg"
 HOME="/var/lib/mistral/2747b55e-a7b7-4036-82f7-62f09c63d671"

ansible-playbook -v /var/lib/mistral/2747b55e-a7b7-4036-82f7-62f09c63d671/deploy_steps_playbook.yaml --user tripleo-admin --become --ssh-extra-args "-o StrictHostKeyChecking=no" --timeout 240 --inventory-file /var/lib/mistral/2747b55e-a7b7-4036-82f7-62f09c63d671/tripleo-ansible-inventory --private-key /var/lib/mistral/2747b55e-a7b7-4036-82f7-62f09c63d671/ssh_private_key $@

You can see how the call to ansible-playbook is reproduced in this script. Also notice that $@ is used to pass any additional arguments directly to ansible-playbook when calling this script, such as --check, --limit, --tags, --start-at-task, etc.

Some of the other files present are:

  • tripleo-ansible-inventory
    • Ansible inventory file containing hosts and vars for all the Overcloud nodes.
  • ansible.log
    • Log file from the last run of ansible-playbook.
  • ansible.cfg
    • Config file used when running ansible-playbook.
  • ansible-playbook-command.sh
    • Executable script that can be used to rerun ansible-playbook.
  • ssh_private_key
    • Private ssh key used to ssh to the Overcloud nodes.

Within the group_vars directory, there is a corresponding file per role. In my
example, I have a Controller role. If we take a look at group_vars/Controller we see it contains:

$ cat group_vars/Controller
Controller_pre_deployments:
- HostsEntryDeployment
- DeployedServerBootstrapDeployment
- UpgradeInitDeployment
- InstanceIdDeployment
- NetworkDeployment
- ControllerUpgradeInitDeployment
- UpdateDeployment
- ControllerDeployment
- SshHostPubKeyDeployment
- ControllerSshKnownHostsDeployment
- ControllerHostsDeployment
- ControllerAllNodesDeployment
- ControllerAllNodesValidationDeployment
- ControllerArtifactsDeploy
- ControllerHostPrepDeployment

Controller_post_deployments: []

The <RoleName>_pre_deployments and <RoleName>_post_deployments variables contain the list of Heat deployment names to run for that role. Suppose we wanted to just rerun a single deployment. That command would be:

$ ./ansible-playbook-command.sh --tags pre_deploy_steps -e Controller_pre_deployments=ControllerArtifactsDeploy -e force=true

That would run just the ControllerArtifactsDeploy deployment. Passing -e force=true is necessary to force the deployment to rerun. Also notice we restrict what tags get run with --tags pre_deploy_steps.

For documentation on what tags are available see:
https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/ansible_config_download.html#tags

Finally, suppose we wanted to just run the 5 deployment steps that are the same for all nodes of a given role. We can use --limit <RoleName>, as the role names are defined as groups in the inventory file. That command would be:

$ ./ansible-playbook-command.sh --tags deploy_steps --limit Controller

I hope this info is helpful. Let me know what you want to see next.


P.S.

Cross posted at: https://blogslagle.wordpress.com/2017/12/13/tripleo-and-ansible-part-2/


by slagle at December 13, 2017 01:12 PM

December 11, 2017

Red Hat Stack

Enabling Keystone’s Fernet Tokens in Red Hat OpenStack Platform

As we learned in part one of this blog post, beginning with the OpenStack Kilo release, a new token provider is now available as an alternative to PKI and UUID. Fernet tokens are essentially an implementation of ephemeral tokens in Keystone. What this means is that tokens are no longer persisted and hence do not need to be replicated across clusters or regions.

“In short, OpenStack’s authentication and authorization metadata is neatly bundled into a MessagePacked payload, which is then encrypted and signed as a Fernet token. OpenStack Kilo’s implementation supports a three-phase key rotation model that requires zero downtime in a clustered environment.” (from: http://dolphm.com/openstack-keystone-fernet-tokens/)

In our previous post, I covered the different types of tokens, the benefits of Fernet and a little bit of the technical details. In this part of our three-part series we provide a method for enabling Fernet tokens on Red Hat OpenStack Platform 10, during both pre and post deployment of the overcloud stack.

Pre-Overcloud Deployment

Official Red Hat documentation for enabling Fernet tokens in the overcloud can be found here:

Deploy Fernet on the Overcloud

Tools

We’ll be using Red Hat OpenStack Platform here, which means we’ll be interacting with the director node and Heat templates. Our primary tool is the keystone-manage command-line utility, provided by the openstack-keystone RPM and used to set up and manage Keystone in the overcloud. We’ll use the director-based deployment of Red Hat OpenStack Platform to enable Fernet pre and/or post deployment.

Photo by Barn Images on Unsplash

Prepare Fernet keys on the undercloud

This procedure will start with preparation of the Fernet keys, which a default  deployment places on each controller in /etc/keystone/fernet-keys. Each controller must have the same keys, as tokens issued on one controller must be able to be validated on all controllers. Stay tuned to part three of this blog for an in-depth explanation of Fernet signing keys.

  1. Source the stackrc file to ensure we are working with the undercloud:
$ source ~/stackrc
  2. From your director, use keystone-manage to generate the Fernet keys as deployment artifacts:
$ sudo keystone-manage fernet_setup \
    --keystone-user keystone \
    --keystone-group keystone
  3. Tar up the keys for upload into a swift container on the undercloud:
$ sudo tar -zcf keystone-fernet-keys.tar.gz /etc/keystone/fernet-keys
  4. Upload the Fernet keys to the undercloud as swift artifacts (we assume your templates exist in ~/templates):
$ upload-swift-artifacts -f keystone-fernet-keys.tar.gz \
    --environment ~/templates/deployment-artifacts.yaml
  5. Verify that your artifact exists in the undercloud:
$ swift list overcloud-artifacts
Keystone-fernet-keys.tar.gz

NOTE: These keys should be secured as they can be used to sign and validate tokens that will have access to your cloud.

  6. Let’s verify that deployment-artifacts.yaml exists in ~/templates (NOTE: your URL detail will differ from what you see here – as this is a uniquely generated temporary URL):
$ cat ~/templates/deployment-artifacts.yaml
# Heat environment to deploy artifacts via Swift Temp URL(s)
parameter_defaults:
  DeployArtifactURLs:
    - 'http://192.0.2.1:8080/v1/AUTH_c9d16242396b4eb1a0f950093fa9464c/overcloud-artifacts/keystone-fernet-keys.tar.gz?temp_url_sig=917bd467e70516581b1db295783205622606e367&temp_url_expires=1520463185'

NOTE: This is the swift URL that your overcloud deployment will use to copy the Fernet keys to your controllers.

  7. Finally, generate the fernet.yaml template to enable Fernet as the default token provider in your overcloud:
$ cat << EOF > ~/templates/fernet.yaml
parameter_defaults:
  controllerExtraConfig:
    keystone::token_provider: 'fernet'
EOF

Deploy and Validate

At this point, you are ready to deploy your overcloud with Fernet enabled as the token provider, and your keys distributed to each controller in /etc/keystone/fernet-keys.

Photo by Glenn Carstens-Peters on Unsplash

NOTE: This is an example deploy command, yours will likely include many more templates. For the purposes of our discussion, it is important that you simply include fernet.yaml as well as deployment-artifacts.yaml.

$ openstack overcloud deploy \
--templates /home/stack/templates \
-e  /home/stack/templates/environments/deployment-artifacts.yaml \
-e /home/stack/templates/environments/fernet.yaml \
--control-scale 3 \
--compute-scale 4 \
--control-flavor control \
--compute-flavor compute \
--ntp-server pool.ntp.org

Testing

Once the deployment is done you should validate that your overcloud is indeed using Fernet tokens instead of the default UUID token provider. From the director node:

$ source ~/overcloudrc
$ openstack token issue
+------------+------------------------------------------+
| Field      | Value                                    |
+------------+------------------------------------------+
| expires    | 2017-03-22 19:16:21+00:00                |
| id | gAAAAABY0r91iYvMFQtGiRRqgMvetAF5spEZPTvEzCpFWr3  |
|    | 1IB8T8L1MRgf4NlOB6JsfFhhdxenSFob_0vEEHLTT6rs3Rw  |
|    | q3-Zm8stCF7sTIlmBVms9CUlwANZOQ4lRMSQ6nTfEPM57kX  |
|    | Xw8GBGouWDz8hqDYAeYQCIHtHDWH5BbVs_yC8ICXBk       |
| project_id | f8adc9dea5884d23a30ccbd486fcf4c6         |
| user_id    | 2f6106cef80741c6ae2bfb3f25d70eee         |
+------------+------------------------------------------+

Note the length of this token in the “id” field. This is a Fernet token.

Enabling Fernet Post Overcloud Deployment

Part of the power of the Red Hat OpenStack Platform director deployment methodology lies in its ability to easily upgrade and change a running overcloud. Features such as Fernet, scaling, and complex service management, can be managed by running a deployment update directly against a running overcloud.

Updating is really straightforward. If you’ve already deployed your overcloud with UUID tokens, you can change them to Fernet by simply following the pre-deploy example above and running the openstack overcloud deploy command again, with the Heat templates mentioned, against your running deployment! This will change your overcloud token default to Fernet. Be sure to deploy with your original deploy command, as any changes there could affect your overcloud. And of course, standard outage windows apply – production changes should be tested and prepared accordingly.

Conclusion

I hope you’ve enjoyed our discussion on enabling Fernet tokens in the overcloud, and that I was able to shed some light on this process. Official documentation on these concepts and on enabling Fernet tokens in the overcloud is also available.

In our final instalment on this topic we’ll look at some of the many methods for rotating your newly enabled Fernet keys on your controller nodes. We’ll be using Red Hat’s awesome IT automation tool, Red Hat Ansible, to do just that.

by Ken Savich, Senior OpenStack Solution Architect at December 11, 2017 08:59 PM

James Slagle

TripleO and Ansible deployment (Part 1)

In the Queens release of TripleO, you’ll be able to use Ansible to apply the
software deployment and configuration of an Overcloud.

Before jumping into some of the technical details, I wanted to cover some
background about how the Ansible integration works along side some of the
existing tools in TripleO.

The Ansible integration goes as far as offering an alternative to the
communication between the existing Heat agent (os-collect-config) and the Heat
API. This alternative is opt-in for Queens, but we are exploring making it the
default behavior for future releases.

The default behavior for Queens (and all prior releases) will still use the
model where each Overcloud node has a daemon agent called os-collect-config
that periodically polls the Heat API for deployment and configuration data.
When Heat provides updated data, the agent applies the deployments, making
changes to the local node such as configuration, service management,
pulling/starting containers, etc.

The Ansible alternative instead uses a “control” node (the Undercloud) running
ansible-playbook with a local inventory file and pushes out all the changes to
each Overcloud node via ssh in the typical Ansible fashion.

Heat is still the primary API, while the parameter and environment files that
get passed to Heat to create an Overcloud stack remain the same regardless of
which method is used.

Heat is also still fully responsible for creating and orchestrating all
OpenStack resources in the services running on the Undercloud (Nova servers,
Neutron networks, etc).

This sequence diagram will hopefully provide a clear picture:
https://slagle.fedorapeople.org/tripleo-ansible-arch.png

Replacing the application and transport layer of the deployment with Ansible
allows us to take advantage of features in Ansible that will hopefully make
deploying and troubleshooting TripleO easier:

  • Running only specific deployments
  • Including/excluding specific nodes or roles from an update
  • More real time sequential output of the deployment
  • More robust error reporting
  • Faster iteration and reproduction of deployments

Using Ansible instead of the Heat agent is easy. Just include 2 extra cli args
in the deployment command:

-e /path/to/templates/environments/config-download-environment.yaml \
--config-download

Once Heat is done creating the stack (which will be much faster than usual), a
separate Mistral workflow will be triggered that runs ansible-playbook to
finish the deployment. The output from ansible-playbook will be streamed to
stdout so you can follow along with the progress.

Here’s a demo showing what a stack update looks like:

(I suggest making the demo full screen, or watch it here: https://slagle.fedorapeople.org/tripleo-ansible-deployment-1.mp4)

Note that we don’t get color output from ansible-playbook since we are
consuming the stdout from a Zaqar queue. However, in my next post I will go
into how to execute ansible-playbook manually, and detail all of the related
files (inventory, playbooks, etc) that are available to interact with manually.

If you want to read ahead, have a look at the official documentation:
https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/ansible_config_download.html


P.S.

The infrastructure that hosts this blog may go away soon. In which case I’m
also cross posting to: https://blogslagle.wordpress.com


by slagle at December 11, 2017 03:14 PM

December 08, 2017

RDO Blog

Gate repositories on Github with Software Factory and Zuul3

Introduction

Software Factory is an easy-to-deploy software development forge. It provides, among other features, code review and continuous integration (CI). The latest Software Factory release features Zuul V3, which provides integration with Github.

In this blog post I will explain how to configure a Software Factory instance, so that you can experiment with gating Github repositories with Zuul.

First we will setup a Github application to define the Software Factory instance as a third party application and we will configure this instance to act as a CI system for Github.

Secondly, we will prepare a Github test repository by:

  • Installing the application on it
  • configuring its master branch protection policy
  • providing Zuul job description files

Finally, we will configure the Software Factory instance to test and gate Pull Requests for this repository, and we will validate this CI by opening a first Pull Request on the test repository.

Note that Zuul V3 is not yet released upstream; however, it is already in production, acting as the CI system of OpenStack.

Pre-requisite

A Software Factory instance is required to execute the instructions given in this blog post. If you need an instance, you can follow the quick deployment guide in this previous article. Make sure the instance has a public IP address and TCP/443 is open so that Github can reach Software Factory via HTTPS.

Application creation and Software Factory configuration

Let's create a Github application named myorg-zuulapp and register it on the instance. To do so, follow this section from Software Factory's documentation.

But make sure to:

  • Replace fqdn in the instructions with the public IP address of your Software Factory instance. Indeed, the default sftests.com hostname won't be resolved by Github.
  • Check "Disable SSL verification" as the Software Factory instance is by default configured with a self-signed certificate.
  • Check "Only on this account" for the question "Where can this Github app be installed".

Configuration of the app part 1 Configuration of the app part 2 Configuration of the app part 3

After adding the github app settings in /etc/software-factory/sfconfig.yaml, run:

sudo sfconfig --enable-insecure-slaves --disable-fqdn-redirection

Finally, make sure Github.com can contact the Software Factory instance by clicking on "Redeliver" in the advanced tab of the application. Having the green tick is the prerequisite for going further. If you cannot get it, the rest of the article cannot be completed successfully.

Configuration of the app part 4

Define Zuul3 specific Github pipelines

On the Software Factory instance, as root, create the file config/zuul.d/gh_pipelines.yaml.

cd /root/config
cat <<EOF > zuul.d/gh_pipelines.yaml
---
- pipeline:
    name: check-github.com
    description: |
      Newly uploaded patchsets enter this pipeline to receive an
      initial +/-1 Verified vote.
    manager: independent
    trigger:
      github.com:
        - event: pull_request
          action:
            - opened
            - changed
            - reopened
        - event: pull_request
          action: comment
          comment: (?i)^\s*recheck\s*$
    start:
      github.com:
        status: 'pending'
        status-url: "https://sftests.com/zuul3/{tenant.name}/status.html"
        comment: false
    success:
      github.com:
        status: 'success'
      sqlreporter:
    failure:
      github.com:
        status: 'failure'
      sqlreporter:

- pipeline:
    name: gate-github.com
    description: |
      Changes that have been approved by core developers are enqueued
      in order in this pipeline, and if they pass tests, will be
      merged.
    success-message: Build succeeded (gate pipeline).
    failure-message: Build failed (gate pipeline).
    manager: dependent
    precedence: high
    require:
      github.com:
        review:
          - permission: write
        status: "myorg-zuulapp[bot]:local/check-github.com:success"
        open: True
        current-patchset: True
    trigger:
      github.com:
        - event: pull_request_review
          action: submitted
          state: approved
        - event: pull_request
          action: status
          status: "myorg-zuulapp[bot]:local/check-github.com:success"
    start:
      github.com:
        status: 'pending'
        status-url: "https://sftests.com/zuul3/{tenant.name}/status.html"
        comment: false
    success:
      github.com:
        status: 'success'
        merge: true
      sqlreporter:
    failure:
      github.com:
        status: 'failure'
      sqlreporter:
EOF
sed -i s/myorg/myorgname/ zuul.d/gh_pipelines.yaml

Make sure to replace "myorgname" by the organization name.

git add -A .
git commit -m"Add github.com pipelines"
git push git+ssh://gerrit/config master

Setup a test repository on Github

Create a repository called ztestrepo, initialize it with an empty README.md.

Install the Github application

Then follow the process below to add the application myorg-zuulapp to ztestrepo.

  1. Visit your application page, e.g.: https://github.com/settings/apps/myorg-zuulapp/installations
  2. Click “Install”
  3. Select ztestrepo to install the application on
  4. Click “Install”

Then you should be redirected to the application setup page. This can be safely ignored for the moment.

Define master branch protection

We will set up the branch protection policy for the master branch of ztestrepo. We want a Pull Request to have at least one code review approval and all CI checks passing before it becomes mergeable.

You will see, later in this article, that the final job run and the merging phase of the Pull Request are ensured by Zuul.

  1. Go to https://github.com/myorg/ztestrepo/settings/branches
  2. Choose the master branch
  3. Check "Protect this branch"
  4. Check "Require pull request reviews before merging"
  5. Check "Dismiss stale pull request approvals when new commits are pushed"
  6. Check "Require status checks to pass before merging"
  7. Click "Save changes"

Attach the application

Add a collaborator

A second account on Github is needed to act as collaborator of the repository ztestrepo. Select one in https://github.com/myorg/ztestrepo/settings/collaboration. This collaborator will act as the PR reviewer later in this article.

Define a Zuul job

Create the file .zuul.yaml at the root of ztestrepo.

git clone https://github.com/myorg/ztestrepo.git
cd ztestrepo
cat <<EOF > .zuul.yaml
---
- job:
    name: myjob-noop
    parent: base
    description: This a noop job
    run: playbooks/noop.yaml
    nodeset:
      nodes:
        - name: test-node
          label: centos-oci

- project:
    name: myorg/ztestrepo
    check-github.com:
      jobs:
        - myjob-noop
    gate-github.com:
      jobs:
        - myjob-noop
EOF
sed -i s/myorg/myorgname/ .zuul.yaml

Make sure to replace "myorgname" by the organization name.

Create playbooks/noop.yaml.

mkdir playbooks
cat <<EOF > playbooks/noop.yaml
- hosts: test-node
  tasks:
    - name: Success
      command: "true"
EOF

Push the changes directly on the master branch of ztestrepo.

git add -A .
git commit -m"Add zuulv3 job definition"
git push origin master

Register the repository on Zuul

At this point, the Software Factory instance is ready to receive events from Github and the Github repository is properly configured. Now we will tell Software Factory to consider events for the repository.

On the Software Factory instance, as root, create the file myorg.yaml.

cd /root/config
cat <<EOF > zuulV3/myorg.yaml
---
- tenant:
    name: 'local'
    source:
      github.com:
        untrusted-projects:
          - myorg/ztestrepo
EOF
sed -i s/myorg/myorgname/ zuulV3/myorg.yaml

Make sure to replace "myorgname" by the organization name.

git add zuulV3/myorg.yaml && git commit -m"Add ztestrepo to zuul" && git push git+ssh://gerrit/config master

Create a Pull Request and see Zuul in action

  1. Create a Pull Request via the Github UI
  2. Wait for the check-github.com pipeline to finish with success

Check test

  3. Ask the collaborator to set his approval on the Pull request

Approval

  4. Wait for Zuul to detect the approval
  5. Wait for the gate-github.com pipeline to finish with success

Gate test

  6. Wait for the Pull Request to be merged by Zuul

Merged

As you can see, after the run of the check job and the reviewer's approval, Zuul has detected that the state of the Pull Request was ready to enter the gating pipeline. During the gate run, Zuul has executed the job against the Pull Request code change rebased on the current master, then made Github merge the Pull Request as the job ended with success.

Other powerful Zuul features such as cross-repository testing or Pull Request dependencies between repositories are supported but beyond the scope of this article. Do not hesitate to refer to the upstream documentation to learn more about Zuul.

Next steps to go further

To learn more about Software Factory please refer to the upstream documentation. You can reach the Software Factory team on IRC freenode channel #softwarefactory or by email at the softwarefactory-dev@redhat.com mailing list.

by fboucher at December 08, 2017 02:13 PM

December 07, 2017

Red Hat Stack

An Introduction to Fernet tokens in Red Hat OpenStack Platform

Thank you for joining me to talk about Fernet tokens. In this first of three posts on Fernet tokens, I’d like to go over the definition of OpenStack tokens, the different types and why Fernet tokens should matter to you. This series will conclude with some awesome examples of how to use Red Hat Ansible to manage your Fernet token keys in production.

First, some definitions …

What is a token? OpenStack tokens are bearer tokens, used to authenticate and validate users and processes in your OpenStack environment. Pretty much any time anything happens in OpenStack a token is involved. The OpenStack Keystone service is the core service that issues and validates tokens. Using these tokens, users and and software clients via API’s authenticate, receive, and finally use that token when requesting operations ranging from creating compute resources to allocating storage. Services like Nova or Ceph then validate that token with Keystone and continue on with or deny the requested operation. The following diagram, shows a simplified version of this dance.

Courtesy of the author

Token Types

Tokens come in several types, referred to as “token providers” in Keystone parlance. These types can be set at deployment time, or changed post deployment. Ultimately, you’ll have to decide what works best for your environment, given your organization’s workload in the cloud.

The following types of tokens exist in Keystone:

UUID (Universal Unique Identifier)

The default token provider in Keystone is UUID. This is a 32-byte bearer token that must be persisted (stored) across controller nodes, along with their associated metadata, in order to be validated.

PKI & PKIZ (public key infrastructure)

This token format is deprecated as of the OpenStack Ocata release, which means it is deprecated in Red Hat OpenStack Platform 11. This format is also persisted across controller nodes. PKI tokens contain catalog information of the user that bears them, and thus can get quite large, depending on how large your cloud is. PKIZ tokens are simply compressed versions of PKI tokens.

Fernet

Fernet tokens (pronounced fehr:NET) are message packed tokens that contain authentication and authorization data. Fernet tokens are signed and encrypted before being handed out to users. Most importantly, however, Fernet tokens are ephemeral. This means they do not need to be persisted across clustered systems in order to successfully be validated.

Fernet was originally a secure messaging format created by Heroku. The OpenStack implementation of this lightweight and more API-friendly format was developed by the OpenStack Keystone core team.
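
To get a feel for the format itself, here is a minimal Python sketch using the generic Fernet implementation from the cryptography library. This is purely illustrative and is not Keystone's token provider: Keystone packs the authentication and authorization metadata into the payload and reads its signing keys from a key repository on the controllers.

from cryptography.fernet import Fernet

# Illustration only: generic Fernet from the "cryptography" library,
# not Keystone's token provider.
key = Fernet.generate_key()       # Keystone reads keys from its key repository instead
f = Fernet(key)

# Encrypt and sign a small payload; Keystone would pack auth/authz metadata here.
token = f.encrypt(b"user-id|project-id|expiry")
print(token)                      # an opaque, signed and encrypted bearer token

# Validation needs only the key, not any server-side token storage.
print(f.decrypt(token))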

The Problem

As you may have guessed by now, the real problem solved by Fernet tokens is one of persistence. Imagine, if you will, the following scenario:

  1. A user logs into Horizon (the OpenStack Dashboard)
  2. User creates a compute instance
  3. User requests persistent storage upon instance creation
  4. User assigns a floating IP to the instance

While this is a simplified scenario, you can clearly see that there are multiple calls to different core components being made. In even the most basic of examples you see at least one authentication, as well as multiple validations along the way. Not only does this require network bandwidth, but when using persistent token providers such as UUID it also requires a lot of storage in Keystone. Additionally, the token table in the database used by Keystone grows as your cloud gets more usage. When using UUID tokens, operators must implement a detailed and comprehensive strategy to prune this table at periodic intervals to avoid real trouble down the line. This becomes even more difficult in a clustered environment.

Photo by Eugenio Mazzone on Unsplash

It’s not only backend components which are affected. In fact, all services that are exposed to users require authentication and authorization. This leads to increased bandwidth and storage usage on one of the most critical core components in OpenStack. If Keystone goes down, your users will know it and you no longer have a cloud in any sense of the word.

Now imagine the impact as you scale your cloud;  the  problems with UUID tokens are dangerously amplified.

Benefits of Fernet tokens

Because Fernet tokens are ephemeral, you have the following immediate benefits:

  • Tokens do not need to be replicated to other instances of Keystone in your controller cluster
  • Storage is not affected, as these tokens are not stored

The end-result offers increased performance overall. This was the design imperative of Fernet tokens, and the OpenStack community has more than delivered.  

Show me the numbers

All of these benefits sound good, but what are the real numbers behind the performance differences between UUID and Fernet? One of the core keystone developers, Dolph Mathews, created a great post about Fernet benchmarks.

Note that these benchmarks are for OpenStack Kilo, so you’ll most likely see even greater performance numbers in newer releases.

The most important benchmarks in Dolph’s post are the ones comparing the various token formats to each other on a globally-distributed Galera cluster. These show the following results using UUID as a baseline:

Token creation performance

Fernet: 50.8 ms (85% faster than UUID); 237.1 (42% faster than UUID)

Token validation performance

Fernet: 5.55 ms (8% faster than UUID); 1957.8 (14% faster than UUID)

As you can see, these numbers are quite remarkable. More informal benchmarks can be found at the CERN OpenStack blog, OpenStack in Production.

Security Implications

Photo by Praveesh Palakeel on Unsplash

One important aspect of using Fernet tokens is security. As these tokens are signed and encrypted, they are inherently more secure than plain text UUID tokens. One really great aspect of this is the fact that you can invalidate a large number of tokens, either during normal operations or during a security incident, by simply changing the keys used to validate them. This requires a key rotation strategy, which I’ll get into in the third part of this series.

While there are security advantages to Fernet tokens, it must be said they are only as secure as the keys that created them. Keystone creates the tokens with a set of keys in your Red Hat OpenStack Platform environment. Using advanced technologies like SELinux, Red Hat Enterprise Linux is a trusted partner in this equation. Remember, the OS matters.

Conclusion

While OpenStack functions just fine with its default UUID token format, I hope that this article shows you some of the benefits of Fernet tokens. I also hope that you find the knowledge you’ve gained here to be useful, once you decide to move forward to implementing them.

In our follow-up blog post in this series, we’ll be looking at how to enable Fernet tokens in your OpenStack environment — both pre and post-deploy. Finally, our last post will show you how to automate key rotation using Red Hat Ansible in a production environment. I hope you’ll join me along the way.

by Ken Savich, Senior OpenStack Solution Architect at December 07, 2017 07:06 PM

Daniel Berrange

Full coverage of libvirt XML schemas achieved in libvirt-go-xml

In recent times I have been aggressively working to expand the coverage of libvirt XML schemas in the libvirt-go-xml project. Today this work has finally come to a conclusion, when I achieved what I believe to be effectively 100% coverage of all of the libvirt XML schemas. More on this later, but first some background on Go and XML….

For those who aren’t familiar with Go, the core library’s encoding/xml module provides a very easy way to consume and produce XML documents in Go code. You simply define a set of struct types and annotate their fields to indicate what elements & attributes each should map to. For example, given the Go structs:

type Person struct {
    XMLName xml.Name `xml:"person"`
    Name string `xml:"name,attr"`
    Age string `xml:"age,attr"` 
    Home *Address `xml:"home"`
    Office *Address `xml:"office"`
} 
type Address struct { 
    Street string `xml:"street"`
    City string `xml:"city"` 
}

You can parse/format XML documents looking like

<person name="Joe Blogs" age="24">
  <home>
    <street>Some where</street><city>London</city>
  </home>
  <office>
    <street>Some where else</street><city>London</city>
  </office>  
</person>

Other programming languages I’ve used required a great deal more work when dealing with XML. For parsing, there’s typically a choice between an XML stream based parser where you have to react to tokens as they’re parsed and stuff them into structs, or a DOM object hierarchy from which you then have to pull data out into your structs. For outputting XML, apps either build up a DOM object hierarchy again, or dynamically format the XML document incrementally. Whichever approach is taken, it generally involves writing a lot of tedious & error prone boilerplate code. In most cases, the Go encoding/xml module eliminates all the boilerplate code, only requiring the data type definitions. This really makes dealing with XML a much more enjoyable experience, because you effectively don’t deal with XML at all! There are some exceptions to this though, as the simple annotations can’t capture every nuance of many XML documents. For example, integer values are always parsed & formatted in base 10, so extra work is needed for base 16. There’s also no concept of unions in Go, or the XML annotations. In these edge cases custom marshaling / unmarshalling methods need to be written. BTW, this approach to XML is also taken for other serialization formats including JSON and YAML too, with one struct field able to have many annotations so it can be serialized to a range of formats.

Back to the point of the blog post, when I first started writing Go code using libvirt it was immediately obvious that everyone using libvirt from Go would end up re-inventing the wheel for XML handling. Thus about 1 year ago, I created the libvirt-go-xml project whose goal is to define a set of structs that can handle documents in every libvirt public XML schema. Initially the level of coverage was fairly light, and over the past year 18 different contributors have sent patches to expand the XML coverage in areas that their respective applications touched. It was clear, however, that taking an incremental approach would mean that libvirt-go-xml is forever trailing what libvirt itself supports. It needed an aggressive push to achieve 100% coverage of the XML schemas, or as near as practically identifiable.

Alongside each set of structs we had also been writing unit tests with a set of structs populated with data, and a corresponding expected XML document. The idea for writing the tests was that the author would copy a snippet of XML from a known good source, and then populate the structs that would generate this XML. In retrospect this was not a scalable approach, because there is an enormous range of XML documents that libvirt supports. A further complexity is that Go doesn’t generate XML documents in the exact same manner. For example, it never generates self-closing tags, instead always outputting a full opening & closing pair. This is semantically equivalent, but makes a plain string comparison of two XML documents impractical in the general case.

Considering the need to expand the XML coverage, and provide a more scalable testing approach, I decided to change approach. The libvirt.git tests/ directory currently contains 2739 XML documents that are used to validate libvirt’s own native XML parsing & formatting code. There is no better data set to use for validating the libvirt-go-xml coverage than this. Thus I decided to apply a round-trip testing methodology. The libvirt-go-xml code would be used to parse the sample XML document from libvirt.git, and then immediately serialize them back into a new XML document. Both the original and new XML documents would then be parsed generically to form a DOM hierarchy which can be compared for equivalence. Any place where documents differ would cause the test to fail and print details of where the problem is. For example:

$ go test -tags xmlroundtrip
--- FAIL: TestRoundTrip (1.01s)
	xml_test.go:384: testdata/libvirt/tests/vircaps2xmldata/vircaps-aarch64-basic.xml: \
            /capabilities[0]/host[0]/topology[0]/cells[0]/cell[0]/pages[0]: \
            element in expected XML missing in actual XML

This shows the filename that failed to correctly roundtrip, and the position within the XML tree that didn’t match. Here the NUMA cell topology has a ‘<pages>‘  element expected but not present in the newly generated XML. Now it was simply a matter of running the roundtrip test over & over & over & over & over & over & over……….& over & over & over, adding structs / fields for each omission that the test identified.

After doing this for some time, libvirt-go-xml now has 586 structs defined containing 1816 fields, and has certified 100% coverage of all libvirt public XML schemas. Of course when I say 100% coverage, this is probably a lie, as I’m blindly assuming that the libvirt.git test suite has 100% coverage of all its own XML schemas. This is certainly a goal, but I’m confident there are cases where libvirt itself is missing test coverage. So if any omissions are identified in libvirt-go-xml, these are likely omissions in libvirt’s own testing.

On top of this, the XML roundtrip test is set to run in the libvirt jenkins and travis CI systems, so as libvirt extends its XML schemas, we’ll get build failures in libvirt-go-xml and thus know to add support there to keep up.

In expanding the coverage of XML schemas, a number of non-trivial changes were made to existing structs defined by libvirt-go-xml. These were mostly in places where we have to handle a union concept defined by libvirt. Typically with libvirt an element will have a “type” attribute, whose value then determines what child elements are permitted. Previously we had been defining a single struct, whose fields represented all possible children across all the permitted type values. This did not scale well and gave the developer no clue what content is valid for each type value. In the new approach, for each distinct type attribute value, we now define a distinct Go struct to hold the contents. This will cause API breakage for apps already using libvirt-go-xml, but on balance it is worth it to get a better structure over the long term. There were also cases where a child XML element previously represented a single value and this was mapped to a scalar struct field. Libvirt then added one or more attributes on this element, meaning the scalar struct field had to turn into a struct field that points to another struct. These kinds of changes are unavoidable in any nice manner, so while we endeavour not to gratuitously change current structs, if the libvirt XML schema gains new content, it might trigger further changes in the libvirt-go-xml structs that are not 100% backwards compatible.

Since we are now tracking libvirt.git XML schemas, going forward we’ll probably add tags in the libvirt-go-xml repo that correspond to each libvirt release. So for app developers we’ll encourage use of Go vendoring to pull in a precise version of libvirt-go-xml instead of blindly tracking master all the time.

by Daniel Berrange at December 07, 2017 02:14 PM

December 01, 2017

Daniel Berrange

Full colour emojis in virtual machine names in Fedora 27

Quite by chance today I discovered that Fedora 27 can display full colour glyphs for unicode characters that correspond to emojis, when the terminal displaying my mutt mail reader displayed someone’s name with a full colour glyph showing stars:

Mutt in GNOME terminal rendering color emojis in sender name

Chatting with David Gilbert on IRC I learnt that this is a new feature in Fedora 27 GNOME, thanks to recent work in the GTK/Pango stack. David then pointed out this works in libvirt, so I thought I would illustrate it.

Virtual machine name with full colour emojis rendered

No special hacks were required to do this, I simply entered the emojis as the virtual machine name when creating it from virt-manager’s wizard

Virtual machine name with full colour emojis rendered

As mentioned previously, GNOME terminal displays colour emojis, so these virtual machine names appear nicely when using virsh and other command line tools

Virtual machine name rendered with full colour emojis in terminal commands

The more observant readers will notice that the command line args have a bug as the snowman in the machine name is incorrectly rendered in the process listing. The actual data in /proc/$PID/cmdline is correct, so something about the “ps” command appears to be mangling it prior to output. It isn’t simply a font problem because other commands besides “ps” render properly, and if you grep the “ps” output for the snowman emoji no results are displayed.

by Daniel Berrange at December 01, 2017 01:28 PM

November 30, 2017

RDO Blog

Open Source Summit, Prague

In October, RDO had a small presence at the Open Source Summit (formerly known as LinuxCon) in Prague, Czechia.

While this event does not traditionally draw a big OpenStack audience, we were treated to a great talk by Monty Taylor on Zuul, and Fatih Degirmenci gave an interesting talk on cross-community CI, in which he discussed the joint work between the OpenStack and OpenDaylight communities to help one another verify cross-project functionality.

centos_fedora

On one of the evenings, members of the Fedora and CentOS community met in a BoF (Birds of a Feather) meeting, to discuss how the projects relate, and how some of the load - including the CI work that RDO does in the CentOS infrastructure - can better be shared between the two projects to reduce duplication of effort.

This event is always a great place to interact with other open source enthusiasts. While, in the past, it was very Linux-centric, the event this year had a rather broader scope, and so drew people from many more communities.

Upcoming Open Source Summits will be held in Japan (June 20-22, 2018), Vancouver (August 29-31, 2018) and Edinburgh (October 22-24, 2018), and we expect to have a presence of some kind at each of these events.

by Rich Bowen at November 30, 2017 09:29 PM

Upcoming changes to test day

TL;DR: A live RDO cloud will be available for testing on the upcoming test day. See http://rdoproject.org/testday/queens/milestone2/ for more info.

The last few test days have been somewhat lackluster, and have not had much participation. We think that there's a number of reasons for this:

  • Deploying OpenStack is hard and boring
  • Not everyone has the necessary hardware to do it anyways
  • Automated testing means that there's not much left for the humans to do

In today's IRC meeting, we were brainstorming about ways to improve participation in test day.

We think that, in addition to testing the new packages, it's a great way for you, the users, to see what's coming in future releases, so that you can start thinking about how you'll use this functionality.

One idea that came out of it is to have a test cloud, running the latest packages, available to you during test day. You can get on there, poke around, break stuff, and help test it, without having to go through the pain of deploying OpenStack.

David has written more about this on his blog.

If you're interested in participating, please sign up.

Please also give some thought to what kinds of test scenarios we should be running, and add those to the test page. Or, respond to this thread with suggestions of what we should be testing.

Details about the upcoming test day may be found on the RDO website.

Thanks!

by Rich Bowen at November 30, 2017 06:47 PM

Getting started with Software Factory and Zuul3

Introduction

Software Factory 2.7 has been recently released. Software Factory is an easy to deploy software development forge that is deployed at review.rdoproject.org and softwarefactory-project.io. Software Factory provides, among other features, code review and continuous integration (CI). This new release features Zuul V3 that is, now, the default CI component of Software Factory.

In this blog post I will explain how to deploy a Software Factory instance for testing purposes in less than 30 minutes and initialize two demo repositories to be tested via Zuul.

Note that Zuul V3 is not yet released upstream; however, it is already in production, acting as the CI system of OpenStack.

Prerequisites

Software Factory requires CentOS 7 as its base Operating System so the commands listed below should be executed on a fresh deployment of CentOS 7.

The default FQDN of a Software Factory deployment is sftests.com. In order to be accessible in your browser, sftests.com must be added to your /etc/hosts with the IP address of your deployment.

Installation

First, let's install the repository of the latest version, then install sf-config, the configuration management tool.

sudo yum install -y https://softwarefactory-project.io/repos/sf-release-2.7.rpm
sudo yum install -y sf-config

Activating extra components

Software Factory has a modular architecture that can be easily defined through a YAML configuration file, located in /etc/software-factory/arch.yaml. By default, only a limited set of components are activated to set up a minimal CI with Zuul V3.

We will now add the hypervisor-oci component to configure a container provider, so that OCI containers can be consumed by Zuul when running CI jobs. In other words, you won't need an OpenStack cloud account to run your first Zuul V3 jobs with this Software Factory instance.

Note that the OCI driver, on which hypervisor-oci relies, while totally functional, is still under review and not yet merged upstream.

echo "      - hypervisor-oci" | sudo tee -a /etc/software-factory/arch.yaml

Starting the services

Finally run sf-config:

sudo sfconfig --enable-insecure-slaves --provision-demo

When the sf-config command finishes you should be able to access the Software Factory web UI by connecting your browser to https://sftests.com. You should then be able to login using the login admin and password userpass (Click on "Toggle login form" to display the built-in authentication).

Triggering a first job on Zuul

The --provision-demo option is a special command to provision two demo Git repositories on Gerrit with two demo jobs.

Let's propose a first change on it:

sudo -i
cd demo-project
touch f1 && git add f1 && git commit -m"Add a test change" && git review

Then you should see the jobs being executed on the ZuulV3 status page.

Zuul buildset

And get the jobs' results on the corresponding Gerrit review page.

Gerrit change

Finally, you should find the links to the generated artifacts and the ARA reports.

ARA report

Next steps to go further

To learn more about Software Factory please refer to the user documentation. You can reach the Software Factory team on IRC freenode channel #softwarefactory or by email at the softwarefactory-dev@redhat.com mailing list.

by fboucher at November 30, 2017 06:47 PM

November 29, 2017

RDO Blog

A summary of Sydney OpenStack Summit docs sessions

Here I'd like to give a summary of the Sydney OpenStack Summit docs sessions that I took part in, and share my comments on them with the broader OpenStack community.

Docs project update

At this session, we discussed a recent major refocus of the Documentation project work and restructuring of the OpenStack official documentation. This included migrating documentation from the core docs suite to project teams who now own most of the content.

We also covered the most important updates from the Documentation planning sessions held at the Denver Project Teams Gathering, including our new retention policy for End-of-Life documentation, which is now being implemented.

This session was recorded; you can watch the recording here:

Docs/i18n project onboarding

This was a session jointly organized with the i18n community. Alex Eng, Stephen Finucane, and yours truly gave three short presentations on translating OpenStack, OpenStack + Sphinx in a tree, and introduction to the docs community, respectively.

As it turned out, the session was not attended by newcomers to the community; instead, community members from various teams and groups joined us for the onboarding, which made it a bit more difficult to find out what the proper focus of the session should be to better accommodate the different needs and expectations of those in the audience. Definitely something to think about for the next Summit.

Installation guides updates and testing

I held this session to identify what are the views of the community on the future of installation guides and testing of installation procedures.

The feedback received was mostly focused on three points:

  • A better feedback mechanism for new users who are the main audience here. One idea is to bring back comments at the bottom of install guides pages.

  • To help users better understand the processes described in instructions and the overall picture, provide more references to conceptual or background information.

  • Generate content from install shell scripts, to help with verification and testing.

The session etherpad with more details can be found here:

Ops guide transition and maintenance

This session was organized by Erik McCormick from the OpenStack Operators community. There is an ongoing effort driven by the Ops community to migrate retired OpenStack Ops docs over to the OpenStack wiki, for easy editing.

We mostly discussed a number of challenges related to maintaining the technical content in wiki, and how to make more vendors interested in the effort.

The session etherpad can be found here:

Documentation and relnotes, what do you miss?

This session was run by Sylvain Bauza and the focus of the discussion was on identifying gaps in content coverage found after the documentation migration.

Again, Ops-focused docs turned out to be a hot topic, as did providing more detailed conceptual information together with the procedural content, and the structuring of release notes. We should also seriously consider (semi-)automating checks for broken links.

You can read more about the discussion points here:

by Petr Kovar at November 29, 2017 07:25 PM

November 28, 2017

RDO Blog

Anomaly Detection in CI logs

Continuous Integration jobs can generate a lot of data and it can take a lot of time to figure out what went wrong when a job fails. This article demonstrates new strategies to assist with failure investigations and to reduce the need to crawl boring log files.

First, I will introduce the challenge of anomaly detection in CI logs. Second, I will present a workflow to automatically extract and report anomalies using a tool called LogReduce. Lastly, I will discuss the current limitations and how more advanced techniques could be used.

Introduction

Finding anomalies in CI logs using simple patterns such as "grep -i error" is not enough because interesting log lines don't necessarily feature obvious anomalous messages such as "error" or "failed". Sometimes you don't even know what you are looking for.

In comparison to regular logs, such as the system logs of a production service, CI logs have a very interesting characteristic: they are reproducible. Thus, it is possible to carefully look for new events that are not present in other job execution logs. This article focuses on this particular characteristic to detect anomalies.

The challenge

For this article, baseline events are defined as the collection of log lines produced by nominal job executions, and target events are defined as the collection of log lines produced by a failed job run.

Searching for anomalous events is challenging because:

  • Events can be noisy: they often include unique features such as timestamps, hostnames or UUIDs.
  • Events can be scattered across many different files.
  • False positive events may appear for various reasons, for example when a new test option has been introduced. However they often share a common semantic with some baseline events.

Moreover, there can be a very high number of events, for example, more than 1 million lines for tripleo jobs. Thus, we can not easily look for each target event not present in baseline events.

OpenStack Infra CRM114

It is worth noting that anomaly detection is already happening live in the openstack-infra operated review system using classify-log.crm, which is based on CRM114 bayesian filters.

However it is currently only used to classify global failures in the context of the elastic-recheck process. The main drawbacks to using this tool are:

  • Events are processed per word without considering complete lines: it only computes the distances of up to a few words.
  • Reports are hard to find for regular users, they would have to go to elastic-recheck uncategorize, and click the crm114 links.
  • It is written in an obscure language

LogReduce

This part presents the techniques I used in LogReduce to overcome the challenges described above.

Reduce noise with tokenization

The first step is to reduce the complexity of the events to simplify further processing. Here is the line processor I used, see the Tokenizer module:

  • Skip known bogus events such as ssh scan: "sshd.+[iI]nvalid user"
  • Remove known words:
    • Hashes, which are hexadecimal words that are 32, 64 or 128 characters long
    • UUID4
    • Date names
    • Random prefixes such as (tmp|req-|qdhcp-)[^\s\/]+
  • Discard every character that is not [a-z_\/]

For example this line:

  2017-06-21 04:37:45,827 INFO [nodepool.builder.UploadWorker.0] Uploading DIB image build 0000000002 from /tmpxvLOTg/fake-image-0000000002.qcow2 to fake-provider

Is reduced to:

  INFO nodepool builder UploadWorker Uploading image build from /fake image fake provider
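
To make this transformation concrete, here is a minimal sketch of such a line processor. It is not the actual Tokenizer module; the regular expressions and the tokenize helper are illustrative assumptions based on the rules listed above.

import re

# Illustrative patterns only, loosely based on the rules above
BOGUS_RE = re.compile(r"sshd.+[iI]nvalid user")
NOISE_RES = [
    re.compile(r"\b[0-9a-f]{32,128}\b"),                           # hexadecimal hashes
    re.compile(r"\b[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}\b"),  # UUIDs
    re.compile(r"(tmp|req-|qdhcp-)[^\s\/]+"),                      # random prefixes
]

def tokenize(line):
    """Return a simplified version of a log line, or '' for bogus lines."""
    if BOGUS_RE.search(line):
        return ""
    for noise in NOISE_RES:
        line = noise.sub(" ", line)
    # Keep only word-like characters, dropping digits, punctuation and dates
    line = re.sub(r"[^a-zA-Z_\/]", " ", line)
    return " ".join(line.split())

print(tokenize("2017-06-21 04:37:45,827 INFO [nodepool.builder.UploadWorker.0] "
               "Uploading DIB image build 0000000002 to fake-provider"))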

Index events in a NearestNeighbors model

The next step is to index baseline events. I used a NearestNeighbors model to query target events' distance from baseline events. This helps remove false-positive events that are similar to known baseline events. The model is fitted with all the baseline events transformed using Term Frequency Inverse Document Frequency (tf-idf). See the SimpleNeighbors model.

import sklearn.feature_extraction.text
import sklearn.neighbors

# Build tf-idf vectors from the (already tokenized) baseline lines
vectorizer = sklearn.feature_extraction.text.TfidfVectorizer(
    analyzer='word', lowercase=False, tokenizer=None,
    preprocessor=None, stop_words=None)
# Brute-force nearest neighbour search using cosine distance
nn = sklearn.neighbors.NearestNeighbors(
    algorithm='brute',
    metric='cosine')
train_vectors = vectorizer.fit_transform(train_data)
nn.fit(train_vectors)

Instead of having a single model per job, I built a model per file type. This requires some pre-processing work to figure out what model to use per file. File names are converted to model names using another Tokenization process to group similar files. See the filename2modelname function.

For example, the following files are grouped like so:

audit.clf: audit/audit.log audit/audit.log.1
merger.clf: zuul/merger.log zuul/merge.log.2017-11-12
journal.clf: undercloud/var/log/journal.log overcloud/var/log/journal.log
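
A simplified version of this grouping could look like the sketch below. It is not the real filename2modelname function; it only illustrates the idea of stripping paths, rotation suffixes and dates so that similar files share a single model.

import os
import re

def filename_to_modelname(path):
    """Hypothetical, simplified stand-in for the real filename2modelname."""
    name = os.path.basename(path)
    name = re.sub(r"[._-]?\d{4}-\d{2}-\d{2}$", "", name)     # drop trailing dates
    name = re.sub(r"\.(log|txt)(\.\d+)?(\.gz)?$", "", name)  # drop extensions / rotation
    name = re.sub(r"\d+", "", name)                          # drop leftover digits
    return name + ".clf"

for f in ["audit/audit.log", "audit/audit.log.1",
          "undercloud/var/log/journal.log", "overcloud/var/log/journal.log"]:
    print(f, "->", filename_to_modelname(f))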

Detect anomalies based on kneighbors distance

Once the NearestNeighbors model is fitted with baseline events, we can repeat the Tokenization and tf-idf transformation on the target events. Then, using the kneighbors query, we compute the distance of each target event to its closest baseline event.

test_vectors = vectorizer.transform(test_data)
# distance of each target line to its closest baseline line
distances, _ = nn.kneighbors(test_vectors, n_neighbors=1)

Using a distance threshold, this technique can effectively detect anomalies in CI logs.
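
Continuing the snippet above (and assuming test_data holds the tokenized target lines), selecting the anomalous lines then amounts to keeping the lines whose distance to their nearest baseline line exceeds the threshold. A rough sketch:

THRESHOLD = 0.2  # cosine distance cut-off, the value used in the worker pseudo-code below

anomalies = [
    (float(dist[0]), line)
    for dist, line in zip(distances, test_data)
    if dist[0] > THRESHOLD
]
for score, line in sorted(anomalies, reverse=True):
    print("%.2f | %s" % (score, line))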

Automatic process

Instead of manually running the tool, I added a server mode that automatically searches and reports anomalies found in failed CI jobs. Here are the different components:

  • listener connects to mqtt/gerrit event-stream/cistatus.tripleo.org and collects all successful and failed jobs.

  • worker processes jobs collected by the listener. For each failed job, it does the following in pseudo-code:

Build model if it doesn't exist or if it is too old:
	For each last 5 success jobs (baseline):
		Fetch logs
	For each baseline file group:
		Tokenize lines
		TF-IDF fit_transform
		Fit file group model
Fetch target logs
For each target file:
	Look for the file group model
	Tokenize lines
	TF-IDF transform
	file group model kneighbors search
	yield lines that have distance > 0.2
Write report
  • publisher processes each report computed by the worker and notifies:
    • IRC channel
    • Review comment
    • Mail alert (e.g. a periodic job which doesn't have an associated review)

Reports example

Here are a couple of examples to illustrate LogReduce reporting.

In this change I broke a service configuration (zuul gerrit port), and logreduce correctly found the anomaly in the service logs (zuul-scheduler can't connect to gerrit): sf-ci-functional-minimal report

In this tripleo-ci-centos-7-scenario001-multinode-oooq-container report, logreduce found 572 anomalies out of 1,078,248 lines. The interesting ones are:

  • Non-obvious new DEBUG statements in /var/log/containers/neutron/neutron-openvswitch-agent.log.txt.
  • New setting of the firewall_driver=openvswitch in neutron was detected in:
    • /var/log/config-data/neutron/etc/neutron/plugins/ml2/ml2_conf.ini.txt
    • /var/log/extra/docker/docker_allinfo.log.txt
  • New usage of cinder-backup was detected across several files such as:
    • /var/log/journal contains new puppet statement
    • /var/log/cluster/corosync.log.txt
    • /var/log/pacemaker/bundles/rabbitmq-bundle-0/rabbitmq/rabbit@centos-7-rax-iad-0000787869.log.txt.gz
    • /etc/puppet/hieradata/service_names.json
    • /etc/sensu/conf.d/client.json.txt
    • pip2-freeze.txt
    • rpm-qa.txt

Caveats and improvements

This part discusses the caveats and limitations of the current implementation and suggests other improvements.

Empty success logs

This method doesn't work when the debug events are only included in the failed logs. To successfully detect anomalies, failure and success logs need to be similar, otherwise all the extra information in failed logs will be considered anomalous.

This situation happens with testr results where success logs only contain 'SUCCESS'.

Building good baseline model

Building a good baseline model with nominal job events is key to anomaly detection. We could use periodic execution (with or without failed runs), or the gate pipeline.

Unfortunately, Zuul currently lacks build reporting and we have to scrape Gerrit comments or status web pages, which is sub-optimal. Hopefully the upcoming zuul-web builds API and zuul-scheduler MQTT reporter will make this task easier to implement.

Machine learning

I am by no means proficient at machine learning. Logreduce happens to be useful as it is now. However here are some other strategies that may be worth investigating.

The model is currently using a word dictionary to build the feature vectors, and this may be improved by using different feature extraction techniques more suited for log line events, such as MinHash and/or Locality Sensitive Hashing.

The NearestNeighbors kneighbors query tends to be slow for large samples, and this may be improved upon by using a Self-Organizing Map, RandomForest or OneClassSVM model.

When line sizes are not homogeneous in a file group, the model doesn't work well. For example, mistral/api.log line sizes vary between 10 and 8000 characters. Using separate models for bins based on line size may be a great improvement, as sketched below.
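
As a purely hypothetical illustration of that idea, reusing the filename_to_modelname sketch from earlier, each line could be routed to a sub-model keyed by both its file group and a coarse length bucket:

def size_bin(line, width=200):
    """Bucket a line by its length (hypothetical helper)."""
    return min(len(line) // width, 10)

def model_key(filename, line):
    # e.g. "api.clf:bin0" for short lines vs "api.clf:bin10" for very long ones
    return "%s:bin%d" % (filename_to_modelname(filename), size_bin(line))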

CI logs analysis is a broad subject on its own, and I suspect someone good at machine learning might be able to find other clever anomaly detection strategies.

Further processing

Detected anomalies could be further processed by:

  • Merging similar anomalies discovered across different files.
  • Looking for known anomalies in a system like elastic-recheck.
  • Reporting new anomalies to elastic-recheck so that affected jobs could be grouped.

Conclusion

CI log analysis is a powerful service to assist failure investigations. The end goal would be to report anomalies instead of exhaustive job logs.

Early results of LogReduce models look promising, and I hope we can set up such services for any CI jobs in the future. Please get in touch by mail or IRC (tristanC on Freenode) if you are interested.

by tristanC at November 28, 2017 06:13 AM

November 27, 2017

RDO Blog

Emilien Macchi talks TripleO at OpenStack Summit

While at OpenStack Summit, I had an opportunity to talk with Emilien Macchi about the work on TripleO in the Pike and Queens projects.

by Rich Bowen at November 27, 2017 09:39 PM

OpenStack 3rd Party CI with Software Factory

Introduction

When developing for an OpenStack project, one of the most important aspects to cover is to ensure proper CI coverage of our code. Each OpenStack project runs a number of CI jobs on each commit to test its validity, so thousands of jobs are run every day in the upstream infrastructure.

In some cases, we will want to set up an external CI system, and make it report as a 3rd Party CI on certain OpenStack projects. This may be because we want to cover specific software/hardware combinations that are not available in the upstream infrastructure, or want to extend test coverage beyond what is feasible upstream, or any other reason you can think of.

While the process to set up a 3rd Party CI is documented, some implementation details are missing. In the RDO Community, we have been using Software Factory to power our 3rd Party CI for OpenStack, and it has worked very reliably over some cycles.

The main advantage of Software Factory is that it integrates all the pieces of the OpenStack CI infrastructure in an easy to consume package, so let's have a look at how to build a 3rd party CI from the ground up.

Requirements

You will need the following:

  • An OpenStack-based cloud, which will be used by Nodepool to create temporary VMs where the CI jobs will run. It is important to make sure that the default security group in the tenant accepts SSH connections from the Software Factory instance.
  • A CentOS 7 system for the Software Factory instance, with at least 8 GB of RAM and 80 GB of disk. It can run on the OpenStack cloud used for Nodepool; just make sure it is running in a separate project.
  • DNS resolution for the Software Factory system.
  • A 3rd Party CI user on review.openstack.org. Follow this guide to configure it.
  • Some previous knowledge on how Gerrit and Zuul work is advisable, as it will help during the configuration process.

Basic Software Factory installation

For a detailed installation walkthrough, refer to the Software Factory documentation. We will highlight here how we set it up on a test VM.

Software installation

On the CentOS 7 instance, run the following commands to install the latest release of Software Factory (2.6 at the time of this article):

$ sudo yum install -y https://softwarefactory-project.io/repos/sf-release-2.6.rpm
$ sudo yum update -y
$ sudo yum install -y sf-config

Define the architecture

Software Factory has several optional components, and can be set up to run them on more than one system. In our setup, we will install the minimum required components for a 3rd party CI system, all in one.

$ sudo vi /etc/software-factory/arch.yaml

Make sure the nodepool-builder role is included. Our file will look like:

---
description: "OpenStack 3rd Party CI deployment"
inventory:
  - name: managesf
    ip: 192.168.122.230
    roles:
      - install-server
      - mysql
      - gateway
      - cauth
      - managesf
      - gitweb
      - gerrit
      - logserver
      - zuul-server
      - zuul-launcher
      - zuul-merger
      - nodepool-launcher
      - nodepool-builder
      - jenkins

In this setup, we are using Jenkins to run our jobs, so we need to create an additional file:

$ sudo vi /etc/software-factory/custom-vars.yaml

And add the following content

nodepool_zuul_launcher_target: False

Note: As an alternative, we could use zuul-launcher to run our jobs and drop Jenkins. In that case, there is no need to create this file. However, later when defining our jobs we will need to use the jobs-zuul directory instead of jobs in the config repo.

Edit Software Factory configuration

$ sudo vi /etc/software-factory/sfconfig.yaml

This file contains all the configuration data used by the sfconfig script. Make sure you set the following values:

  • Password for the default admin user.
authentication:
  admin_password: supersecurepassword
  • The fully qualified domain name for your system.
fqdn: sftests.com
  • The OpenStack cloud configuration required by Nodepool.
nodepool:
  providers:
  - auth_url: http://192.168.1.223:5000/v2.0
    name: microservers
    password: cloudsecurepassword
    project_name: mytestci
    region_name: RegionOne
    regions: []
    username: ciuser
  • The authentication options if you want other users to be able to log into your instance of Software Factory using OAuth providers like GitHub. This is not mandatory for a 3rd party CI. See this part of the documentation for details.

  • If you want to use LetsEncrypt to get a proper SSL certificate, set:

  use_letsencrypt: true

Run the configuration script

You are now ready to complete the configuration and get your basic Software Factory installation running.

$ sudo sfconfig

After the script finishes, just point your browser to https://<your FQDN> and you will see the Software Factory interface.

SF interface

Configure SF to connect to the OpenStack Gerrit

Once we have a basic Software Factory environment running, and our service account set up in review.openstack.org, we just need to connect both together. The process is quite simple:

  • First, make sure the local Zuul user SSH key, found at /var/lib/zuul/.ssh/id_rsa.pub, is added to the service account at review.openstack.org.

  • Then, edit /etc/software-factory/sfconfig.yaml again, and edit the zuul section to look like:

zuul:
  default_log_site: sflogs
  external_logservers: []
  gerrit_connections:
  - name: openstack
    hostname: review.openstack.org
    port: 29418
    puburl: https://review.openstack.org/r/
    username: mythirdpartyciuser
  • Finally, run sfconfig again. Log information will start flowing in /var/log/zuul/server.log, and you will see a connection to review.openstack.org port 29418.

Create a test job

In Software Factory 2.6, a special project named config is automatically created on the internal Gerrit instance. This project holds the user-defined configuration, and changes to the project must go through Gerrit.

Configure images for nodepool

All CI jobs will use a predefined image, created by Nodepool. Before creating any CI job, we need to prepare this image.

  • As a first step, add your SSH public key to the admin user in your Software Factory Gerrit instance.

Add SSH Key

  • Then, clone the config repo on your computer and edit the nodepool configuration file:
$ git clone ssh://admin@sftests.com:29418/config sf-config
$ cd sf-config
$ vi nodepool/nodepool.yaml
  • Define the disk image and assign it to the OpenStack cloud defined previously:
---
diskimages:
  - name: dib-centos-7
    elements:
      - centos-minimal
      - nodepool-minimal
      - simple-init
      - sf-jenkins-worker
      - sf-zuul-worker
    env-vars:
      DIB_CHECKSUM: '1'
      QEMU_IMG_OPTIONS: compat=0.10
      DIB_GRUB_TIMEOUT: '0'

labels:
  - name: dib-centos-7
    image: dib-centos-7
    min-ready: 1
    providers:
      - name: microservers

providers:
  - name: microservers
    cloud: microservers
    clean-floating-ips: true
    image-type: raw
    max-servers: 10
    boot-timeout: 120
    pool: public
    rate: 2.0
    networks:
      - name: private
    images:
      - name: dib-centos-7
        diskimage: dib-centos-7
        username: jenkins
        min-ram: 1024
        name-filter: m1.medium

First, we are defining the diskimage-builder elements that will create our image, named dib-centos-7.

Then, we are assigning that image to our microservers cloud provider, and specifying that we want to have at least 1 VM ready to use.

Finally we define some specific parameters about how Nodepool will use our cloud provider: the internal (private) and external (public) networks, the flavor for the virtual machines to create (m1.medium), how many seconds to wait between operations (2.0 seconds), etc.

  • Now we can submit the change for review:
$ git add nodepool/nodepool.yaml
$ git commit -m "Nodepool configuration"
$ git review
  • In the Software Factory Gerrit interface, we can then check the open change. The config repo has some predefined CI jobs, so you can check if your syntax was correct. Once the CI jobs show a Verified +1 vote, you can approve it (Code Review +2, Workflow +1), and the change will be merged in the repository.

  • After the change is merged in the repository, you can check the logs at /var/log/nodepool and see the image being created, then uploaded to your OpenStack cloud.

Define test job

There is a special project in OpenStack meant to be used to test 3rd Party CIs, openstack-dev/ci-sandbox. We will now define a CI job to "check" any new commit being reviewed there.

  • Assign the nodepool image to the test job
$ vi jobs/projects.yaml

We are going to use a pre-installed job named demo-job. All we have to do is to ensure it uses the image we just created in Nodepool.

- job:
    name: 'demo-job'
    defaults: global
    builders:
      - prepare-workspace
      - shell: |
          cd $ZUUL_PROJECT
          echo "This is a demo job"
    triggers:
      - zuul
    node: dib-centos-7
  • Define a Zuul pipeline and a job for the ci-sandbox project
$ vi zuul/upstream.yaml

We are creating a specific Zuul pipeline for changes coming from the OpenStack Gerrit, and specifying that we want to run a CI job for commits to the ci-sandbox project:

pipelines:
  - name: openstack-check
    description: Newly uploaded patchsets enter this pipeline to receive an initial +/-1 Verified vote from Jenkins.
    manager: IndependentPipelineManager
    source: openstack
    precedence: normal
    require:
      open: True
      current-patchset: True
    trigger:
      openstack:
        - event: patchset-created
        - event: change-restored
        - event: comment-added
          comment: (?i)^(Patch Set [0-9]+:)?( [\w\\+-]*)*(\n\n)?\s*(recheck|reverify)
    success:
      openstack:
        verified: 0
    failure:
      openstack:
        verified: 0

projects:
  - name: openstack-dev/ci-sandbox
    openstack-check:
      - demo-job

Note that we are telling our job not to send a vote for now (verified: 0). We can change that later if we want to make our job voting.

  • Apply configuration change
$ git add zuul/upstream.yaml jobs/projects.yaml
$ git commit -m "Zuul configuration for 3rd Party CI"
$ git review

Once the change is merged, Software Factory's Zuul process will be listening for changes to the ci-sandbox project. Just try creating a change and see if everything works as expected!

Troubleshooting

If something does not work as expected, here are some troubleshooting tips:

Log files

You can find the Zuul log files in /var/log/zuul. Zuul has several components, so start with checking server.log and launcher.log, the log files for the main server and the process that launches CI jobs.

The Nodepool log files are located in /var/log/nodepool. builder.log contains the log from image builds, while nodepool.log has the log for the main process.

Nodepool commands

You can check the status of the virtual machines created by nodepool with:

$ sudo nodepool list

Also, you can check the status of the disk images with:

$ sudo nodepool image-list

Jenkins status

You can see the Jenkins status from the GUI, at https://<your FQDN>/jenkins/, when logged in with the admin user. If no machines show up in the 'Build Executor Status' pane, that means that either Nodepool could not launch a VM, or there was some issue in the connection between Zuul and Jenkins. In that case, check the Jenkins logs at `/var/log/jenkins`, or restart the service if there are errors.

Next steps

For now, we have only run a test job against a test project. The real power comes when you create a proper CI job on a project you are interested in. You should now:

  • Create a file under jobs/ with the JJB definition for your new job.

  • Edit zuul/upstream.yaml to add the project(s) you want your 3rd Party CI system to watch.

by jpena at November 27, 2017 11:58 AM

November 22, 2017

Derek Higgins

Booting baremetal from a Cinder Volume in TripleO

Up until recently in TripleO, booting from a Cinder volume was confined to virtual instances, but now, thanks to some recent work in Ironic, baremetal instances can also be booted backed by a Cinder volume.

Below I’ll go through the process of how to take a CentOS cloud image, prepare and load it into a cinder volume so that it can be used to back the root partition of a baremetal instance.

First, I make a few assumptions:

  1. you have a working ironic in a tripleo overcloud
    – if this isn't something you're familiar with, you'll find some instructions here
    – if you can boot and ssh to a baremetal instance on the provisioning network then you're good to go
  2. You have a working cinder in the TripleO overcloud with enough storage to store the volumes
  3. I've tested TripleO (and OpenStack) using RDO as of 2017-11-14; earlier versions had at least one bug and won't work

 

Baremetal instances in the overcloud traditionally use a config-drive for cloud-init to read config data from. Config-drive isn't supported by Ironic boot from volume, so we need to make sure that the metadata service is available. To do this, if your subnet isn't already attached to one, you need to create a neutron router and attach it to the subnet you'll be booting your baremetal instances on:

 $ neutron router-create r1
 $ neutron router-interface-add r1 provisioning-subnet

Each node defined in Ironic that you would like to use for booting from volume needs to use the cinder storage driver, needs the iscsi_boot capability set, and requires a unique connector id (increment <NUM> for each node):

 $ openstack baremetal node set --property capabilities=iscsi_boot:true --storage-interface cinder <NODEID>
 $ openstack baremetal volume connector create --node <NODEID> --type iqn --connector-id iqn.2010-10.org.openstack.node<NUM>

The last thing you'll need is an image capable of booting from iSCSI. We'll be starting with the CentOS cloud image, but we need to alter it slightly so that it's capable of booting over iSCSI.

1. download the image

 $ curl https://cloud.centos.org/centos/7/images/CentOS-7-x86_64-GenericCloud.qcow2.xz > /tmp/CentOS-7-x86_64-GenericCloud.qcow2.xz
 $ unxz /tmp/CentOS-7-x86_64-GenericCloud.qcow2.xz

2. mount it and change root into the image

 $ mkdir /tmp/mountpoint
 $ guestmount -i -a /tmp/CentOS-7-x86_64-GenericCloud.qcow2 /tmp/mountpoint
 $ chroot /tmp/mountpoint /bin/bash

3. load the dracut iscsi module into the ramdisk

 chroot> mv /etc/resolv.conf /etc/resolv.conf_
 chroot> echo "nameserver 8.8.8.8" > /etc/resolv.conf
 chroot> yum install -y iscsi-initiator-utils
 chroot> mv /etc/resolv.conf_ /etc/resolv.conf
 # Be careful here to update the correct ramdisk (check/boot/grub2/grub.cfg)
 chroot> dracut --force --add "network iscsi" /boot/initramfs-3.10.0-693.5.2.el7.x86_64.img 3.10.0-693.5.2.el7.x86_64

4. enable rd.iscsi.firmware so that dracut gets the iscsi target details from the firmware[1]

The kernel must be booted with rd.iscsi.firmware=1 so that the iscsi target details are read from the firmware (passed to it by ipxe), this needs to be added to the grub config

In the chroot Edit the file /etc/default/grub and add rd.iscsi.firmware=1 to GRUB_CMDLINE_LINUX=…

 

5. leave the chroot, unmount the image and update the grub config

 chroot> exit
 $ guestunmount /tmp/mountpoint
 $ guestfish -a /tmp/CentOS-7-x86_64-GenericCloud.qcow2 -m /dev/sda1 sh "/sbin/grub2-mkconfig -o /boot/grub2/grub.cfg"

You now have an image that is capable of mounting its root disk over iSCSI. Load it into Glance and create a volume from it:

 $ openstack image create --disk-format qcow2 --container-format bare --file /tmp/CentOS-7-x86_64-GenericCloud.qcow2 centos-bfv
 $ openstack volume create --size 10 --image centos-bfv --bootable centos-test-volume

Once the cinder volume has finished creating (wait for it to become "available"), you should be able to boot a baremetal instance from the newly created cinder volume:

 $ openstack server create --flavor baremetal --volume centos-test-volume --key default centos-test
 $ nova list 
 $ ssh centos@192.168.24.49
[centos@centos-test ~]$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 10G 0 disk 
└─sda1 8:1 0 10G 0 part /
vda 253:0 0 80G 0 disk 
[centos@centos-test ~]$ ls -l /dev/disk/by-path/
total 0
lrwxrwxrwx. 1 root root 9 Nov 14 16:59 ip-192.168.24.7:3260-iscsi-iqn.2010-10.org.openstack:volume-e44073e9-0df9-43a0-ad05-9a6c41c80670-lun-0 -> ../../sda
lrwxrwxrwx. 1 root root 10 Nov 14 16:59 ip-192.168.24.7:3260-iscsi-iqn.2010-10.org.openstack:volume-e44073e9-0df9-43a0-ad05-9a6c41c80670-lun-0-part1 -> ../../sda1
lrwxrwxrwx. 1 root root 9 Nov 14 16:58 virtio-pci-0000:00:04.0 -> ../../vda

To see how the cinder volume target information is being passed to the hardware, take a look at the iPXE template for the server in question, e.g.

 $ cat /var/lib/ironic/httpboot/<NODEID>/config
<snip/>
:boot_iscsi
imgfree
set username vRefJtDXrEyfDUetpf9S
set password mD5n2hk4FEvNBGSh
set initiator-iqn iqn.2010-10.org.openstack.node1
sanhook --drive 0x80 iscsi:192.168.24.7::3260:0:iqn.2010-10.org.openstack:volume-e44073e9-0df9-43a0-ad05-9a6c41c80670 || goto fail_iscsi_retry
sanboot --no-describe || goto fail_iscsi_retry
<snip/>

[1] – due to a bug in dracut (now fixed upstream [2]), setting this means that the image can't be used for local boot
[2] – https://github.com/dracutdevs/dracut/pull/298

by higginsd at November 22, 2017 12:23 AM

November 21, 2017

RDO Blog

Recent blog posts

It's been a little while since we've posted a roundup of blogposts around RDO, and you all have been rather prolific in the past month!

Here's what we as a community have been talking about:

Hooroo! Australia bids farewell to incredible OpenStack Summit by August Simonelli, Technical Marketing Manager, Cloud

We have reached the end of another successful and exciting OpenStack Summit. Sydney did not disappoint giving attendees a wonderful show of weather ranging from rain and wind to bright, brilliant sunshine. The running joke was that Sydney was, again, just trying to be like Melbourne. Most locals will get that joke, and hopefully now some of our international visitors do, too!

Read more at http://redhatstackblog.redhat.com/2017/11/16/hooroo-australia-bids-farewell-to-incredible-openstack-summit/

Build your Software Defined Data Center with Red Hat CloudForms and Openstack – part 2 by Michele Naldini

Welcome back, here we will continue with the second part of my post, where we will work with Red Hat Cloudforms. If you remember, in our first post we spoke about Red Hat OpenStack Platform 11 (RHOSP). In addition to the blog article, at the end of this article is also a demo video I created to show to our customers/partners how they can build a fully automated software data center.

Read more at https://developers.redhat.com/blog/2017/11/02/build-software-defined-data-center-red-hat-cloudforms-openstack/

Build your Software Defined Data Center with Red Hat CloudForms and Openstack – part 1 by Michele Naldini

In this blog, I would like to show you how you can create your fully software-defined data center with two amazing Red Hat products: Red Hat OpenStack Platform and Red Hat CloudForms. Because of the length of this article, I have broken this down into two parts.

Read more at https://developers.redhat.com/blog/2017/11/02/build-software-defined-data-center-red-hat-cloudforms-openstack-2/

G’Day OpenStack! by August Simonelli, Technical Marketing Manager, Cloud

In less than one week the OpenStack Summit is coming to Sydney! For those of us in the Australia/New Zealand (ANZ) region this is a very exciting time as we get to showcase our local OpenStack talents and successes. This summit will feature Australia’s largest banks, telcos, and enterprises and show the world how they have adopted, adapted, and succeeded with Open Source software and OpenStack.

Read more at http://redhatstackblog.redhat.com/2017/10/30/gday-openstack/

Restarting your TripleO hypervisor will break cinder volume service thus the overcloud pingtest by Carlos Camacho

I don't usually restart my hypervisor, but today I had to install LVM2 and virsh stopped working, so a restart was required. Once the VMs were up and running, the overcloud pingtest failed as cinder was not able to start.

Read more at http://anstack.github.io/blog/2017/10/30/restarting-your-tripleo-hypervisor-will-break-cinder.html

CERN CentOS Dojo, Part 4 of 4, Geneva by rbowen

On Friday evening, I went downtown Geneva with several of my colleagues and various people that had attended the event.

Read more at http://drbacchus.com/cern-centos-dojo-part-4-of-4-geneva/

CERN CentOS Dojo, part 3 of 4: Friday Dojo by rbowen

On Friday, I attended the CentOS Dojo at CERN, in Meyrin Switzerland.

Read more at http://drbacchus.com/cern-centos-dojo-part-3-of-4-friday-dojo/

CERN Centos Dojo, event report: 2 of 4 – CERN tours by rbowen

The second half of Thursday was where we got to geek out and tour various parts of CERN.

Read more at http://drbacchus.com/cern-centos-dojo-cern-tours/

CERN Centos Dojo 2017, Event Report (1 of 4): Thursday Meeting by rbowen

On Thursday, prior to the main event, a smaller group of CentOS core community got together for some deep-dive discussions around the coming challenges that the project is facing, and constructive ways to address them.

Read more at http://drbacchus.com/cern-centos-dojo-2017-thursday/

CERN Centos Dojo 2017, Event report (0 of 4) by rbowen

For the last few days I’ve been in Geneva for the CentOS dojo at CERN.

Read more at http://drbacchus.com/cern-centos-dojo-2017/

Using Ansible Openstack modules on CentOS 7 by Fabian Arrotin

Suppose that you have a RDO/OpenStack cloud already in place, but you'd want to automate some operations: what can you do? On my side, I already mentioned that I used puppet to deploy initial clouds, but I still prefer Ansible myself when having to launch ad-hoc tasks, or even change configuration[s]. It's particularly true for our CI environment where we run "agentless" so all configuration changes happen through Ansible.

Read more at https://arrfab.net/posts/2017/Oct/11/using-ansible-openstack-modules-on-centos-7/

Using Falcon to cleanup Satellite host records that belong to terminated OSP instances by Simeon Debreceni

In an environment where OpenStack instances are automatically subscribed to Satellite, it is important that Satellite is notified of terminated instances so that it can safely delete its host record. Not doing so will:

Read more at https://developers.redhat.com/blog/2017/10/06/using-falcon-cleanup-satellite-host-records-belong-terminated-osp-instances/

My interview with Cool Python Codes by Julien Danjou

A few days ago, I was contacted by Godson Rapture from Cool Python codes to answer a few questions about what I work on in open source. Godson regularly interviews developers and I invite you to check out his website!

Read more at https://julien.danjou.info/blog/2017/interview-coolpythoncodes

Using Red Hat OpenStack Platform director to deploy co-located Ceph storage – Part Two by Dan Macpherson, Principal Technical Writer

Previously we learned all about the benefits in placing Ceph storage services directly on compute nodes in a co-located fashion. This time, we dive deep into the deployment templates to see how an actual deployment comes together and then test the results!

Read more at http://redhatstackblog.redhat.com/2017/10/04/using-red-hat-openstack-platform-director-to-deploy-co-located-ceph-storage-part-two/

Using Red Hat OpenStack Platform director to deploy co-located Ceph storage – Part One by Dan Macpherson, Principal Technical Writer

An exciting new feature in Red Hat OpenStack Platform 11 is full Red Hat OpenStack Platform director support for deploying Red Hat Ceph storage directly on your overcloud compute nodes. Often called hyperconverged, or HCI (for Hyperconverged Infrastructure), this deployment model places the Red Hat Ceph Storage Object Storage Daemons (OSDs) and storage pools directly on the compute nodes.

Read more at http://redhatstackblog.redhat.com/2017/10/02/using-red-hat-openstack-director-to-deploy-co-located-ceph-storage-part-one/

by Rich Bowen at November 21, 2017 02:55 PM

November 16, 2017

Red Hat Stack

Hooroo! Australia bids farewell to incredible OpenStack Summit

We have reached the end of another successful and exciting OpenStack Summit. Sydney did not disappoint giving attendees a wonderful show of weather ranging from rain and wind to bright, brilliant sunshine. The running joke was that Sydney was, again, just trying to be like Melbourne. Most locals will get that joke, and hopefully now some of our international visitors do, too!

Monty Taylor (Red Hat), Mark Collier (OpenStack Foundation), and Lauren Sell (OpenStack Foundation) open the Sydney Summit. (Photo: Author)

And much like the varied weather, the Summit really reflected the incredible diversity of both technology and community that we in the OpenStack world are so incredibly proud of. With over 2300 attendees from 54 countries, this Summit was noticeably more intimate but no less dynamic. Often having a smaller group of people allows for a more personal experience and increases the opportunities for deep, important interactions.

To my enjoyment I found that, unlike previous Summits, there wasn’t as much of a singularly dominant technological theme. In Boston it was impossible to turn a corner and not bump into a container talk. While containers were still a strong theme here in Sydney, I felt the general impetus moved away from specific technologies and into use cases and solutions. It feels like the OpenStack community has now matured to the point that it’s able to focus less on each specific technology piece and more on the business value those pieces create when working together.

Jonathan Bryce (OpenStack Foundation) (Photo: Author)

It is exciting to see both Red Hat associates and customers following this solution-based thinking with sessions demonstrating the business value that our amazing technology creates. Consider such sessions as SD-WAN – The open source way, where the complex components required for a solution are reviewed, and then live demoed as a complete solution. Truly exceptional. Or perhaps check out an overview of how the many components to an NFV solution come together to form a successful business story in A Telco Story of OpenStack Success.

At this Summit I felt that while the sessions still contained the expected technical content they rarely lost sight of the end goal: that OpenStack is becoming a key, and necessary, component to enabling true enterprise business value from IT systems.

To this end I was also excited to see over 40 sessions from Red Hat associates and our customers covering a wide range of industry solutions and use cases. From telcos to insurance companies, it is really exciting to see both our associates and our customers, especially in Australia and New Zealand, sharing their experiences with our solutions with the world.

Mark McLoughlin, Senior Director of Engineering at Red Hat with Paddy Power Betfair's Steven Armstrong and Thomas Andrew getting ready for a Facebook Live session (Photo: Anna Nathan)

Of course, there were too many sessions to attend in person, and with the wonderfully dynamic and festive air of the Marketplace offering great demos, swag, food, and, most importantly, conversations, I’m grateful for the OpenStack Foundation’s rapid publishing of all session videos. It’s a veritable pirate’s bounty of goodies and I recommend checking it out sooner rather than later on their website.

I was able to attend a few talks from Red Hat customers and associates that really got me thinking and excited. The themes were varied, from the growing world of Edge computing, to virtualizing network operations, to changing company culture; Red Hat and our customers are doing very exciting things.

Digital Transformation

Take for instance Telstra, who are using Red Hat OpenStack Platform as part of a virtual router solution. Two years ago the journey started with a virtualized network component delivered as an internal trial. This took a year to complete and was a big success from both a technological and cultural standpoint. As Senior Technology Specialist Andrew Harris from Telstra pointed out during the Q and A of his session, projects like this are not only about implementing new technology but also about “educating … staff in Linux, OpenStack and IT systems.” It was a great session co-presented with Juniper and Red Hat and really gets into how Telstra are able to deliver key business requirements such as reliability, redundancy, and scale while still meeting strict cost requirements.

 

Of course this type of digital transformation story is not limited to telcos. The use of OpenStack as a catalyst for company change as well as advanced solutions was seen strongly in two sessions from Australia's Insurance Australia Group (IAG).

Eddie Satterly, IAG (Photo: Author)

Product Engineering and DataOps Lead Eddie Satterly recounted the journey IAG took to consolidate data for a better customer experience using open source technologies. IAG uses Red Hat OpenStack Platform as the basis for an internal open source revolution that has not only led to significant cost savings but has even resulted in the IAG team open sourcing some of the tools that made it happen. Check out the full story of how they did it and join TechCrunch reporter Frederic Lardinois, who chats with Eddie about the entire experience. There's also a Facebook live chat Eddie did with Mark McLoughlin, Senior Director of Engineering at Red Hat, that further tells their story.

 

 

Ops!

An area of excitement for those of us with roots in the operational space is the way that OpenStack continues to become easier to install and maintain. The evolution of TripleO, the upstream project for Red Hat OpenStack Platform’s deployment and lifecycle management tool known as director, has really reached a high point in the Pike cycle. With Pike, TripleO has begun utilizing Ansible as the core “engine” for upgrades, container orchestration, and lifecycle management. Check out Senior Principal Software Engineer Steve Hardy’s deep dive into all the cool things TripleO is doing and learn just how excited the new “openstack overcloud config download” command is going to make you, and your Ops team, become.

Steve Hardy (Red Hat) and Jaromir Coufal (Red Hat) (Photo: Author)

And as a quick companion to Steve's talk, don't miss his joint lightning talk with Red Hat Senior Product Manager Jaromir Coufal, lovingly titled OpenStack in Containers: A Deployment Hero's Story of Love and Hate, for an excellent 10 minute intro to the journey of OpenStack, containers, and deployment.

Want more? Don’t miss these sessions …

Storage and OpenStack:

Containers and OpenStack:

Telcos and OpenStack

A great event

Although only 3 days long, this Summit really did pack a sizeable amount of content into that time. Being able to have the OpenStack world come to Sydney and enjoy a bit of Australian culture was really wonderful. Whether we were watching the world-famous Melbourne Cup horse race with a room full of OpenStack developers and operators, or cruising Sydney's famous harbour and talking the merits of cloud storage with the community, it really was a unique and exceptional week.

The Melbourne Cup is about to start! (Photo: Author)

The chance to see colleagues from across the globe, immersed in the technical content and environment they love, supporting and learning alongside customers, vendors, and engineers is incredibly exhilarating. In fact, despite the tiredness at the end of each day, I went to bed each night feeling more and more excited about the next day, week, and year in this wonderful community we call OpenStack!

See you in Vancouver!

Photo: Darin Sorrentino

by August Simonelli, Technical Marketing Manager, Cloud at November 16, 2017 10:56 PM

November 02, 2017

Red Hat Developer Blog

Build your Software Defined Data Center with Red Hat CloudForms and Openstack – part 2

Welcome back, here we will continue with the second part of my post, where we will work with Red Hat Cloudforms. If you remember, in our first post we spoke about Red Hat OpenStack Platform 11 (RHOSP). In addition to the blog article, at the end of this article is also a demo video I created to show to our customers/partners how they can build a fully automated software data center.

Hands-on – Part 2

Well, now we need something that can grant us a single pane of glass to manage our environment.

Something that will manage our hypervisors, private/public clouds, SDN, and PaaS, granting self-service capabilities, compliance and governance, predicting bottlenecks, providing forecast accuracy and chargeback/showback, and deeply analyzing and securing our environment.

Here the product name is Red Hat CloudForms.

Picture 1

I invite you to read our blog http://CloudFormsblog.redhat.com/ and our official documentation https://access.redhat.com/documentation/en/red-hat-CloudForms/ to fully understand how amazing CloudForms is!

Now let’s start with some geek stuff here.

We would like to grant to our end users the availability of the same heat stack but in a self-service fashion.

CloudForms, through the self-service UI, is able to show our users different types of service items (VM provisioning on different providers, heat stack execution, Ansible Playbook as a Service, etc.), combining them in a Service Catalog or a Service Catalog bundle.

In some cases, you would like to present a Self Service Dialog composed of a simple text box, a checkbox, a drop-down list, or more or less whatever you want, to grant your users a simple UI to order their services with a few clicks!

Let me show you in practice what I mean.

You need to download the Red Hat CloudForms appliance (qcow2 format) from the Red Hat customer portal and then import it on KVM.

Remember to set up CloudForms using the appliance_console textual menu and to add a dedicated disk for the VMDB (postgres), as pointed out here. [1]

Please be aware that CloudForms is fully supported on RHV, OpenStack, VMware vSphere, Azure, Amazon EC2… but not on RHEL + KVM, so DON'T use this configuration for a production environment.

A full list of platforms able to host CloudForms appliance is available here. [2]

Let’s start importing our heat stack inside CloudForms from the administrative interface.

From Services -> Orchestration Templates -> Configuration -> Create New Orchestration Template you will be able to create your stack.

Picture 2

A Name, Description, and our stack content are required; then you can click on Add.

Picture 3

Now we have to create our Service Dialog to manage input parameters of our stack.

From Configuration -> Create Service Dialog from Orchestration Template, name your dialog, e.g. stack-3tier-heat-service dialog.

Picture 4

Now, let’s go to Automation -> Automate -> Customization to verify if the service dialog was correctly created.

Picture 5

Click on Configuration -> Edit.

We would like to hide some input parameters because usually your customers/end users are not aware of the architectural details (for instance the stack name or tenant_id or the management/Web Provider network id).

So let’s edit at least these values by unselecting the checkbox Visible and Required and putting a Default Value.

Below is an example of the stack name that will be called “demo-3tier-stack” and will not be shown to the end user.

Picture 6

Repeat the same configuration at least for Stack Prefix, Management Network, and Web Provider Network.

Please be aware that Management Network and Web Provider Network will be attached to our OpenStack External Network so here we need to put the correct network ID.

In our case, from our all-in-one rhosp, we can get this value with this command:

[root@osp ~(keystone_admin)]# openstack network list -f value -c ID -c Name | grep FloatingIpWeb

a18e0aa1-88ab-44d3-b751-ec3dfa703060 FloatingIpWeb

Picture 7

After doing our modification, we’ll see a preview of our Service Dialog.

Picture 8

Cool! Now that we have created our orchestration template and a service dialog let’s create our service catalog going to Services -> Catalog -> Catalogs ->All Catalogs.

Now click on Configuration -> Add a New Catalog.

Picture 9

Picture 10

We have to add the last thing, the service catalog item to be created under our “DEMO” service catalog.

Go to Services -> Catalogs -> Catalog Items and select our “DEMO” Catalog.

Picture 11

Select Orchestration as the Catalog Item Type and fill in the required fields (Display in Catalog is very important).

Picture 12

If you want to restrict the visibility of the service catalog item you can select a tag to be assigned from Policy -> Edit Tags.

In this case, I’ve previously created a user (developer) member of a group (Demo-developers-grp) with a custom role called “Demo-Developers”.

Picture 13

Picture 14

I have granted the custom role Demo-Developers access only to the Services feature, so our developer users will be able to order, see, and manage service items from the self-service catalog.

In addition, I have extended the rbac capabilities of our group assigning a custom tag called “Access” to the user group (Picture 13) and to the service item (Picture 8).

This mapping permits users who are members of the Demo-Developers group to request and manage service items tagged with the same value (note the My Company tag values on the previous images).

Now we can order our service catalog item so let’s switch to Self Service User Interface (SSUI) pointing the url https://[IPADDRESS]/self_service and log in as a developer we’ll see our 3 tier-stack service item.

Picture 15

Click on the service, then select the CloudForms tenant (tenant mapping in this setup is enabled, so CloudForms tenants map exactly to our OpenStack tenants). You can change the input parameters or leave them at their defaults.

Picture 16

Let's proceed to order our service item by clicking on Add to Shopping Cart and then on Order.

Picture 17

I have edited the standard Service Provision method to request approval when the request comes from the Developer group. So, as admin, from the web admin interface, approve the request from Services -> Requests.

Picture 18

After the approval, we can switch back to the Self Service UI where we’ll find under My Services what we have ordered and in a few minutes all the cool stuff created inside OpenStack.

Picture 19

Picture 20

Picture 21

Demo Video

Conclusions

This is just an example of how you can create a functional, fully automated, scalable Software Defined Data Center with the help of Red Hat products and services.

We are more than happy to help you and your organization to reach your business and technical objectives.

This blog highlights part of the job we did for an important transformation project for a big financial customer.

A big THANKS goes to my colleagues Francesco Vollero, Matteo Piccinini, Christian Jung, Nick Catling, Alessandro Arrichiello and Luca Miccini for their support during this project.


[1] https://access.redhat.com/documentation/en-us/red_hat_CloudForms/4.5/html-single/installing_red_hat_CloudForms_on_red_hat_virtualization/#Configuring-CloudForms

[2] https://access.redhat.com/documentation/en-us/red_hat_CloudForms/4.5/html-single/support_matrix/#cfme_hosts

[3] https://access.redhat.com/documentation/en-us/red_hat_CloudForms/4.5/html-single/managing_providers/#adding_an_openstack_infrastructure_provider




The post Build your Software Defined Data Center with Red Hat CloudForms and Openstack – part 2 appeared first on RHD Blog.

by Michele Naldini at November 02, 2017 02:00 PM

Build your Software Defined Data Center with Red Hat CloudForms and Openstack – part 1

In this blog, I would like to show you how you can create your fully software-defined data center with two amazing Red Hat products: Red Hat OpenStack Platform and Red Hat CloudForms. Because of the length of this article, I have broken this down into two parts.

As you probably know, every organization needs to evolve into a tech company, driving its own digital transformation, embracing or enhancing existing processes, evolving people's mindset and soft/hard skills, and of course using new technologies.

Remember we are living in a digital era where if you don’t change yourself and your organization someone will disrupt your business!

So, how can I become disruptive in my business?

Well, speaking from a purely technical perspective a good approach should consider cloud technologies.

These kinds of technologies can be the first brick of your digital transformation strategy because they can deliver both business and technology value.

For instance, you could:

  • Build new services on demand in a fast way
  • Do it in a self-service fashion
  • Reduce manual activities
  • Scale services and workloads
  • Respond to request peaks
  • Guarantee service quality and SLAs to your customers
  • Respond to customer demands quickly
  • Reduce time to market
  • Improve TCO

In my experience as a Solution Architect, I saw many different kinds of approaches when you want to build your Cloud.

For sure, you need to identify your business needs, then evaluate your use cases, build your user stories and then start thinking about tech stuff and their impacts.

Remember, don’t start approaching this kind of project from a technical perspective. There is nothing worse. My suggestion is, start thinking about values you want to deliver to your customers/end users.

Usually, the first technical questions coming to customers’ minds are:

  • What kind of services would I like to offer?
  • What use cases do I need to manage?
  • Do I need a public cloud, a private cloud, or a mix of them (hybrid)?
  • Do I want to use an instance-based environment (IaaS), a container-based one (PaaS for friends), or a SaaS?
  • How complex is it to build and manage a cloud environment?
  • What products will help my organization reach these objectives?

The good news is that Red Hat can help you with amazing solutions, people, and processes.

Hands-on – Part 1

Now let’s start thinking about a cloud environment based on an instance/VM concept.

Here, we are speaking about Red Hat OpenStack Platform 11 (RHOSP).

Now, I don't want to explain in detail all the modules available in RHOSP 11, but, just to give you a brief overview, you can see them in the next figure.

Picture 1

In order to build an SDDC in a fast, fully automated way, in this blog, you’ll see how you can reach the goal using almost all these modules with a focus on the Orchestration Engine HEAT.

Heat is a very powerful orchestration engine that is able to build anything from a single instance to a very complex architecture composed of instances, networks, routers, load balancers, security groups, storage volumes, floating IPs, alarms, and more or less all objects managed by OpenStack.

The only thing you have to do is start to write a Heat Orchestration Template (HOT) written in yaml [1] and ask HEAT to execute it for you.

HEAT will be our driver to build our software-defined data center managing all needed OpenStack components leveraging all your requests.

So, the first thing is to build a HOT template right? Well, let’s start cloning my git repo:

https://gitlab.com/miken/rhosp_heat_stack.git

This heat stack will create a three-tier application with 2 web servers, 2 app servers, 1 db server, some dedicated private network segments, virtual routers to interconnect private segments to floating IPs and to segregate networks, two LBaaS v2 load balancers (one for the FE and one for the APP layer), auto scaling groups, cinder volumes (boot-from-volume), ad hoc security groups, Aodh alarms with scale up/scale down policies, and so on and so forth.

What? Yes, all these stuff 🙂

In this example, web servers will run httpd 2.4, app servers will load just a simple python http server on port 8080 and db server right now is a placeholder.

In the real world, of course, you will take care of installing and configuring your application servers and db servers in an automatic, reproducible, and idempotent way with heat or for instance with an ansible playbook.

In this repo you’ll find:

  • stack-3tier.yaml
    • The main HOT template, which defines the skeleton of our stack (input parameters, resource types, and output).
  • lb-resource-stack-3tier.yaml
    • HOT template to configure our LBaaS v2 (load balancer as a service: namespace-based HAProxy). This file will be retrieved by the main HOT template via http.
  • run.sh
    • Bash script to perform:
      • OpenStack project creation
      • User management
      • Heat stack creation under a pre-configured OpenStack tenant
      • Deletion of the previous points (in case you need it)

As prerequisites to build your environment, you need to prepare:

A laptop or Intel NUC with RHEL 7.X + kvm able to host two virtual machines (1 all-in-one OpenStack vm and 1 CloudForms vm).

I suggest using a server with 32 GB of ram and at least 250 GB of SSD disk.

  • OpenStack 11 all-in-one VM on rhel 7.4  (installed with packstack usable ONLY for test/demo purpose) with:
    • 20 GB of ram
    • 4-8 vcpu ⇒ better 8 🙂
    • 150 GB of disk (to store cinder-volumes as a file)
    • 1 vnic (nat is ok)
    • Pre-configured external shared network available for all projects

openstack network create --share --external --provider-network-type flat \
  --provider-physical-network extent FloatingIpWeb

openstack subnet create --subnet-range 192.168.122.0/24 \
  --allocation-pool start=192.168.122.30,end=192.168.122.50 --no-dhcp \
  --gateway 192.168.122.1 --network FloatingIpWeb \
  --dns-nameserver 192.168.122.1 FloatingIpWebSubnet

    • Rhel 7.4 images loaded on glance and available for your projects/tenants (public).
    • Apache2 image loaded on glance (based on rhel 7.4 + httpd installed and enabled). You’ll have to define a virtual host pointing to your web server Document Root.
    • A dedicated flavor called “x1.xsmall” with 2 GB of ram, 1 vcpu and 10 GB of disk.
  • A new Apache virtual host on the rhosp vm to host our load balancer yaml file.

root@osp conf.d(keystone_admin)]# cat /etc/httpd/conf/ports.conf  | grep 8888

Listen 8888

[root@osp conf.d(keystone_admin)]# cat /etc/httpd/conf.d/heatstack.conf

<VirtualHost *:8888>

   ServerName osp.test.local

   ServerAlias 192.168.122.158

   DocumentRoot /var/www/html/heat-templates/

   ErrorLog /var/log/httpd/heatstack_error.log

   CustomLog /var/log/httpd/heatstack_requests.log combined

</VirtualHost>

  • Iptables firewall rule to permit network traffic to tcp port 8888.

iptables -I INPUT 13 -p tcp -m multiport --dports 8888 -m comment --comment "heat stack 8888 port retrieval" -j ACCEPT

That’s all.

Let’s clone the git repo, doing some modifications and start.

  • Modify stack-3tier.yaml
    • Uncomment tenant_id rows

If you are executing the heat stack through the bash script run.sh, don’t worry. Run.sh will take care of updating the tenant_id parameter.

Otherwise, if you are executing it through CloudForms or manually via heat, please update the tenant_id according to your environment configuration.

    • Modify management_network and web_provider_network to point to your floating IP networks. If you are using just a single external network, you can put the same value here. In a true production environment, you'll probably use more than one external network, with different floating IP pools.
    • Modify str_replace of web_asg (the autoscaling group for our web servers) according to what you want to show on your web landing page.

We’ll see later why I’ve done some little modification using str_replace.  🙂

Let’s source our keystonerc_admin (so as admin) and run our bash script on our rhosp server:

[root@osp ~]# source /root/keystonerc_admin

[root@osp heat-templates(keystonerc_admin)]# bash -x run.sh create

After the automatic creation of the tenant (demo-tenant) you’ll see in the output that heat is creating our resources.

2017-10-05 10:06:14Z [demo-tenant]: CREATE_IN_PROGRESS  Stack CREATE started

2017-10-05 10:06:15Z [demo-tenant.web_network]: CREATE_IN_PROGRESS  state changed

2017-10-05 10:06:16Z [demo-tenant.web_network]: CREATE_COMPLETE  state changed

2017-10-05 10:06:16Z [demo-tenant.boot_volume_db]: CREATE_IN_PROGRESS  state changed

2017-10-05 10:06:17Z [demo-tenant.web_subnet]: CREATE_IN_PROGRESS  state changed

2017-10-05 10:06:18Z [demo-tenant.web_subnet]: CREATE_COMPLETE  state changed

2017-10-05 10:06:18Z [demo-tenant.web-to-provider-router]: CREATE_IN_PROGRESS  state changed

2017-10-05 10:06:19Z [demo-tenant.internal_management_network]: CREATE_IN_PROGRESS  state changed

2017-10-05 10:06:19Z [demo-tenant.internal_management_network]: CREATE_COMPLETE  state changed

2017-10-05 10:06:20Z [demo-tenant.web_sg]: CREATE_IN_PROGRESS  state changed

2017-10-05 10:06:20Z [demo-tenant.web-to-provider-router]: CREATE_COMPLETE  state changed

2017-10-05 10:06:20Z [demo-tenant.web_sg]: CREATE_COMPLETE  state changed

2017-10-05 10:06:21Z [demo-tenant.management_subnet]: CREATE_IN_PROGRESS  state changed

… Output truncated

You can also login to the Horizon dashboard and check the status of the heat stack from Orchestration -> Stack

Picture 2

Clicking on our stack name you will be able to see also all resources managed by heat and their current status (Picture 2).

Picture 3

After 10-12 minutes, your heat stack will be completed. In a production environment, you’ll reach the same goal in 2 minutes! Yes, 2 minutes in order to have a fully automated software-defined data center!

Let's check what our stack has deployed by going to the Network Topology tab.

Cool! Everything was deployed as expected.

Picture 4

Now you are probably wondering what kind of services are managed by this environment.

Let's see if the LBaaS load balancers exposing our services are up and running:

[root@osp ~(keystone_admin)]# neutron lbaas-loadbalancer-list  -c name -c vip_address  -c provisioning_status

Neutron CLI is deprecated and will be removed in the future. Use OpenStack CLI instead.

+------------------------------------------------+--------------+---------------------+
| name                                           | vip_address  | provisioning_status |
+------------------------------------------------+--------------+---------------------+
| demo-3tier-stack-app_loadbalancer-kjkdiehsldkr | 172.16.30.4  | ACTIVE              |
| demo-3tier-stack-web_loadbalancer-ht5wirjwcof3 | 172.16.10.12 | ACTIVE              |
+------------------------------------------------+--------------+---------------------+

Now let's get the floating IP address associated with our web LBaaS.

[root@osp ~(keystone_admin)]# openstack floating ip list -f value -c "Floating IP Address" -c "Fixed IP Address" | grep 172.16.10.12

192.168.122.37 172.16.10.12

So, our external floating IP is 192.168.122.37.
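
You can also verify this from the command line; repeated requests should be answered by different web instances behind the LBaaS (a quick check, assuming the floating IP above is reachable from your machine, and matching the footer string we injected via str_replace):

curl -s http://192.168.122.37/ | grep "I am web"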

Let's see what it is exposing 🙂

Wow, the Red Hat public website, hosted on our OpenStack!

Picture 5

I have cloned our Red Hat company site as a static website, so our app and db servers are not really exposing a service, but of course you can extend this stack by installing on your app/db servers whatever is needed to expose your services.

Now, I want to show you that this website is really running on our instances, so let's scroll the page down to the footer.

Picture 6

Here you can see what instance is presenting you the website:

I am web-eg4q.novalocal created on Thu Oct 5 12:11:33 EDT 2017

Refreshing the page, our LBaaS will round-robin to the other web instances:

I am web-9hhm.novalocal created on Thu Oct 5 12:12:51 EDT 2017

This is why I suggested modifying str_replace according to what you want to show on your web page.

In this case, I've changed the footer to clearly show which server is answering our HTTP requests.

[1]https://docs.openstack.org/heat/latest/template_guide/index.html




The post Build your Software Defined Data Center with Red Hat CloudForms and Openstack – part 1 appeared first on RHD Blog.

by Michele Naldini at November 02, 2017 11:00 AM

October 30, 2017

Red Hat Stack

G’Day OpenStack!

In less than one week the OpenStack Summit is coming to Sydney! For those of us in the Australia/New Zealand (ANZ) region this is a very exciting time as we get to showcase our local OpenStack talents and successes. This summit will feature Australia’s largest banks, telcos, and enterprises and show the world how they have adopted, adapted, and succeeded with Open Source software and OpenStack.

Photo by Frances Gunn on Unsplash

And at Red Hat, we are doubly proud to feature a lineup of local, regional, and global speakers in over 40 exciting sessions. Not only can you stop by and see speakers from Australia, like Brisbane's very own Andrew Hatfield (Red Hat, Practice Lead – Cloud Storage and Big Data), who has two talks discussing everything from CephFS's impact on OpenStack to a joint talk about how OpenStack and Ceph are evolving to integrate with Linux, Docker, and Kubernetes!

Of course, not only are local Red Hat associates telling their own stories, but so too are our ANZ customers. Australia’s own dynamic Telecom, Telstra, has worked closely with Red Hat and Juniper for all kinds of cutting edge NFV work and you can check out a joint talk from Telstra, Juniper, and Red Hat to learn all about it in “The Road to Virtualization: Highlighting The Unique Challenges Faced by Telcos” featuring Red Hat’s Senior Product Manager for Networking Technologies, Anita Tragler alongside Juniper’s Greg Smith and Telstra’s Senior Technology Specialist extraordinaire Andrew Harris.

On Wednesday at 11:00AM, come see how a 160 year old Aussie insurance company, IAG, uses Red Hat OpenStack Platform as the foundation for their Open Source Data Pipeline. IAG is leading a dynamic and disruptive change in their industry and bringing important Open Source tools and process to accelerate innovation and save costs. They were nominated for a Super User award as well for their efforts and we are proud to call them Red Hat customers.

We can’t wait to meet all our mates!

Photo by Ewa Gillen on Unsplash

For many of us ANZ-based associates the opportunity to meet the global OpenStack community in our biggest city is very exciting and one we have been waiting on for years. While we will of course be very busy attending the many sessions, one great place to be sure to meet us all is at Booth B1 in the Marketplace Expo Hall. At the booth we will have live environments and demos showcasing the exciting integration between Red Hat CloudForms, Red Hat OpenStack Platform, Red Hat Ceph Storage, and Red Hat OpenShift Container Platform accompanied by our very best ANZ, APAC, and Global talent. Come and chat with Solution Architects, documentation professionals, engineers, and senior management and find out how we develop products so that they continue to grow and lead the OpenStack and Open Source world.

And of course, there will some very special, Aussie-themed swag for you to pick up. There are a few once-in-a-lifetime items that we think you won’t want to miss and will make a truly special souvenir of your visit to our wonderful country! And of course, the latest edition of the RDO ducks will be on hand – get in fast!

There will also be a fun “Marketplace Mixer” on Monday, November 6th from 5:50PM – 7:30PM where you will find great food, conversation, and surprises in the Expo Hall. Our booth will feature yummy food, expert conversations, and more! And don’t miss the very special Melbourne Cup celebration on Tuesday, November 7th from 2:30 – 3:20. There will be a live stream of “the race that stops the nation,” the Melbourne Cup, direct from Flemington Racecourse in Victoria. Prepare your fascinator and come see the event Australia really does stop for!

Framed photograph of Phar Lap winning the Melbourne Cup, 1930. Credit: Museums Victoria

If you’ve not booked yet you can still save 10% with our exclusive code, REDHAT10.

So, there you go, mate.

World’s best software, world’s best community, world’s best city.

As the (in)famous tourism campaign from 2007 asked in the most Australian of ways: "So where the bloody hell are you?"

Can’t wait to see you in Sydney!

by August Simonelli, Technical Marketing Manager, Cloud at October 30, 2017 10:32 PM

Carlos Camacho

Restarting your TripleO hypervisor will break cinder volume service thus the overcloud pingtest

I don't usually restart my hypervisor, but today I had to install LVM2 and virsh stopped working, so a restart was required. Once the VMs were up and running again, the overcloud pingtest failed because cinder was not able to start.

From your Overcloud controller run:

sudo losetup -f /var/lib/cinder/cinder-volumes
sudo vgdisplay
sudo service openstack-cinder-volume restart

This will make your Overcloud pingtest work again.

by Carlos Camacho at October 30, 2017 12:00 AM

October 26, 2017

RDO Blog

CentOS Dojo @ CERN

Hi,

Alan, Matthias, Rich and I were at CERN last week on Thursday and Friday to attend the CentOS dojo. Rich also wrote a series of blog posts about the dojo.

First day: CentOS SIGs meetup

Thursday was dedicated to a SIGs meeting. I'll give a few highlights, but you can read the notes on the etherpad.

  • We managed to agree on a proposal to allow bot accounts for SIGs, which is one of RDO's current pain points.
  • There was also progress on improving CI for SIG content, like defining a matrix for SIGs depending on each other to trigger tests.
  • Testing against CentOS Extras is also an issue. SIGs were advised to provide automated tests that CentOS QA can run and send feedback on to SIGs (not blocking updates, but still an improvement), thanks to the t_functional framework.
  • Many discussions around the package build workflow (signing, embargoed builds, deprecating content).
  • SIG process: what happens when a chair is MIA? (It happened for the Storage SIG.) That was a very productive and focused session; we even managed not to run over schedule. Defining a proper agenda ahead of time helped.

At the end of the day, we had a tour of the datacenter (to see and touch the nodes that run RDO <3). Then, we visited the ATLAS experiment facility.


Second day: CentOS dojo

Friday was the dojo itself (see the schedule with slides attached!). We had about 100 people registered, with more or less 20 not showing up. It started with Belmiro Moreira's talk about the OpenStack infrastructure at CERN. It is amazing to see that their RDO cloud runs over 279k cores and has been updated to Pike. It was followed by a talk from Hervé Rousseau about CERN's storage facilities and the challenges they are facing (Data Deluge in 2026!). They are big users of Ceph and CephFS.

Afterwards, we had SIG status updates from Storage, Opstools (mrunge) and Cloud (myself). It seems the attendees were happy to discover Opstools in a new light; Matthias had many questions after his talk. For my Cloud SIG talk (slides), I collected many stats to show the vitality of our community. I would like to thank boucher and the Software Factory team for the RepoXplorer project; it was really helpful for the stats. Then I spoke about our contributions to cross-SIG collaboration, including amoralej's proposal for a Ceph build pipeline inspired by ours. And I ended with our own infrastructure, showing off DLRN, WeIRDO, etc. The day ended with a talk from kwizart (RPMFusion maintainer) about CentOS and 3rd-party repositories.

The hallway track was also interesting, as I got to meet the Magnum PTL and the other folks maintaining it at CERN. I finally got feedback that magnum packaging is working fine, and we spoke about adding RDO 3rd-party CI to magnum. We don't ship magnum in OSP, but it is a visible project used by RDO's biggest use case, so helping them set it up is excellent news for RDO.


Conclusion

This was an excellent event, where SIGs were able to focus on solving our current pain points. As a community, RDO does value our collaboration with CentOS to provide a native and rock-solid experience of OpenStack, from the kernel to the API endpoints!

by Haïkel Guémar at October 26, 2017 10:44 AM

October 24, 2017

RDO Blog

Mailing List Changes

You need to be aware of recent changes to our mailing lists

What Happened, and Why?

Since the start of the project we have had one mailing list for both users and developers of the RDO project. Over time, we felt that user questions have been drowned out by the more technical developer-oriented discussion, leaving users/operators out of the conversation.

To this end, we've decided to split the one mailing list - rdo-list@redhat.com - into two new mailing lists - dev@lists.rdoproject.org and users@lists.rdoproject.org

We've also moved the rdo-newsletter@redhat.com list to the new newsletter@lists.rdoproject.org email address.

What you need to do

You need to update your contacts list to reflect this change, and start sending email to the new addresses.

As in any typical open source project, user conversations (questions, discussion, community announcements, and so on) should go to the users list, while developer related discussion should go to the dev list.

If you send email to the old address, you should receive an immediate autoresponse reminding you of the new addresses.

List descriptions and archives are now all at https://lists.rdoproject.org/mailman/listinfo. Please let me know if you see references to the old list information, so we can get it updated.

by Rich Bowen at October 24, 2017 10:42 AM

October 21, 2017

Rich Bowen

CERN CentOS Dojo, part 3 of 4: Friday Dojo

On Friday, I attended the CentOS Dojo at CERN, in Meyrin Switzerland.

CentOS dojos are small(ish) gatherings of CentOS enthusiasts that happen all over the world. Each one has a different focus depending on where it is held and the people that plan and attend it.

You can read more about dojos HERE.

On Friday, we had roughly 60-70 people in attendance, in a great auditorium provided by CERN. We had 97 people registered, and 75% is pretty standard turnout for free-to-register events, so we were very pleased.

You can get a general idea of the size of the crowd in this video:

The full schedule of talks can be seen here: https://indico.cern.ch/event/649159/timetable/#20171020

There was an emphasis on large-scale computing, since that’s what CERN does. And the day started with an overview of the CERN cloud computing cluster. Every time I attend this talk (and I’ve seen it perhaps 6 times now) the numbers are bigger and more impressive.

CERN and Geneva

This time, they reported 279 thousand cores in their cluster. That's a lot. And it's all running RDO. This makes me insanely proud to be a small part of that endeavor.

Other presentations included reports from various SIGs. SIGs are Special Interest Groups within CentOS. This is where the work is done to develop projects on top of CentOS, including packaging, testing, and promotion of those projects. You can read more about the SIGs here: https://wiki.centos.org/SpecialInterestGroup

If you want to see your project distributed in the CentOS distro, a SIG is the way to make this happen. Drop by the centos-devel mailing list to propose a SIG or join an existing one.

The entire day was recorded, so watch this space for the videos and slides from the various presentations.

The CERN folks appeared very pleased with the day, and stated their intention to do the event again on an annual basis, if all works out. These things aren't free to produce, of course (even though we strive to make them always free to attend), so if your organization is interested in sponsoring future dojos, please contact me. I'll also be publishing a blog post over on seven.centos.org in the coming days about what's involved in doing one of these events, in case you'd like to host one at your own facility.

by rbowen at October 21, 2017 11:59 AM

CERN Centos Dojo, event report: 2 of 4 – CERN tours

(This post is the second in a series of four. They are gathered here.)

The second half of Thursday was where we got to geek out and tour various parts of CERN.

I was a physics minor in college, many years ago, and had studied not just CERN, but many of the actual pieces of equipment we got to tour, so this was a great privilege.

We started by touring the data center where the data from all of the various physics experiments is crunched into useful information and discoveries. This was amazing for a number of reasons.

From the professional side, CERN is the largest installation of RDO – the project I work with at work – that we know of. 279 thousand cores running RDO OpenStack.

For those not part of my geek world, that translates into hundreds of thousands of physical computers, arranged in racks, crunching data to unlock the secrets of the universe.

For those that are part of my geek world, you can understand why this was an exciting thing to see in person and walk through.

The full photo album is here, but I want to particularly show a couple of shots:

Visiting CERN

Here we have several members of the RDO and CentOS team standing in front of some of the systems that run RDO.

Visiting CERN

And here we have a photo that only a geek can love – this is the actual computer on which the very first website ran. Yes, boys and girls, that’s Tim Berners-Lee’s desktop computer from the very first days of the World Wide Web. It’s ok to be jealous.

There will also be some video over on my YouTube channel, but I haven’t yet had an opportunity to edit and post that stuff.

Next, we visited the exhibit about the Superconducting Super Collider, also known as the Large Hadron Collider. This was stuff that I studied in college, and have geeked out about for the years since then.

There are pictures from this in the larger album, but I want to point out one particular picture of something that absolutely blew my mind.

Most of the experiments in the LHC involve accelerating sub-atomic particles (mostly protons) to very high speeds – very close to the speed of light – and then crashing them into something. When this happens, bits of it fly off in random directions, and the equipment has to detect those bits and learn things about them – their mass, speed, momentum, and so on.

In the early days, one of the ways that they did this was to build a large chamber and string very fine wires across it, so that when the particles hit those wires it would cause electrical impulses.

Those electrical impulses were captured by these:

CERN visit

Those are individual circuit boards. THOUSANDS of them, each individually hand-soldered. Those are individual resistors, capacitors, and ICs, individually soldered to boards. The amount of work involved – the dedication, time, and attention to detail – is simply staggering. This photo is perhaps 1/1000th of the total number of boards. If you’ve done any hand-soldering or electronic projects, you’ll have a small sense of the scale of this thing. I was absolutely staggered by this device.

Outside on the lawn were several pieces of gigantic equipment that were used in the very early days of particle physics, and this was like having the pages of my college text book there in front of me. I think my colleagues thought I’d lost my mind a little.

College was a long time ago, and most of the stuff I learned has gone away, but I still have the sense of awe of it all. That an idea (let’s smash protons together!) resulted in this stuff – and more than 10,000 people working in one place to make it happen, is really a testament to the power of the human mind. I know some of my colleagues were bored by it all, but I am still reeling a little from being there, and seeing and touching these things. I am so grateful to Tim Bell and Thomas Oulevey for making this astonishing opportunity available to me.

Finally, we visited the ATLAS experiment, where they have turned the control room into a fish tank where you can watch the scientists at work.

CERN visit

What struck me particularly here was that most of the people in the room were so young. I hope they have a sense of the amazing opportunity that they have here. I expect that a lot of these kids will go on to change the world in ways that we haven’t even thought of yet. I am immensely jealous of them.

So, that was the geek chapter of our visit. Please read the rest of the series for the whole story.

by rbowen at October 21, 2017 11:13 AM

CERN Centos Dojo 2017, Event report (0 of 4)

For the last few days I’ve been in Geneva for the CentOS dojo at CERN.

What’s CERN? – http://cern.ch/

What’s a dojo? – https://wiki.centos.org/Events/Dojo/

What’s CentOS? – http://centos.org/

A lot has happened that I want to write about, so I’ll be breaking this into several posts:

(As usual, if you’re attempting to follow along on Facebook, you’ll be missing all of the photos and videos, so you’ll really want to go directly to my blog, at http://drbacchus.com/)

 

by rbowen at October 21, 2017 10:21 AM

October 10, 2017

Arrfab's Blog

Using Ansible Openstack modules on CentOS 7

Suppose that you have an RDO/OpenStack cloud already in place, but you'd want to automate some operations: what can you do? On my side, I already mentioned that I used puppet to deploy the initial clouds, but I still prefer Ansible myself when having to launch ad-hoc tasks, or even change configuration[s]. It's particularly true for our CI environment, where we run "agentless", so all configuration changes happen through Ansible.

The good news is that Ansible has already some modules for Openstack but it has some requirements and a little bit of understanding before being able to use those.

First of all, all the Ansible os_ modules need "shade" on the host included in the play that will be responsible for running the os_ modules. At the time of writing this post, it's not yet available on mirror.centos.org (a review is open, so it will soon be available directly), but you can find the pkg on our CBS builders.

Once installed, a simple os_image task failed right away, despite the fact that auth: was present, and that's due to a simple reason: the Ansible os_ modules still want to use the v2 API, while it now defaults to v3 in the Pike release. There is no way to force Ansible itself to use v3, but as it uses shade behind the scenes, there is a way to force this through os-client-config.

That means that you just have to use a .yaml file (does that sound familiar for Ansible?) that will contain everything you need to know about a specific cloud, and then just declare in Ansible which cloud you're configuring.

That clouds.yaml file can be under $current_directory, ~/.config/openstack or /etc/openstack, so it's up to you to decide where you want to temporarily host it, but I selected /etc/openstack/:

- name: Ensuring we have required pkgs for ansible/openstack
  yum:
    name: python2-shade
    state: installed

- name: Ensuring local directory to hold the os-client-config file
  file:
    path: /etc/openstack
    state: directory
    owner: root
    group: root

- name: Adding clouds.yaml for os-client-config for further actions
  template:
    src: clouds.yaml.j2
    dest: /etc/openstack/clouds.yaml
    owner: root
    group: root
    mode: 0700

Of course, the clouds.yaml file is itself a Jinja2 template distributed by Ansible to the host in the play before using the os_* modules:

clouds:
  {{ cloud_name }}:
    auth:
      username: admin
      project_name: admin
      password: {{ openstack_admin_pass }}
      auth_url: http://{{ openstack_controller }}:5000/v3/
      user_domain_name: default
      project_domain_name: default
    identity_api_version: 3

You just have to adapt it to your needs (see the doc for this), but the interesting part is identity_api_version to force v3.
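
Before using it from Ansible, a quick way to sanity-check the generated clouds.yaml (assuming python-openstackclient is available on the same host; "mycloud" is a placeholder for your cloud_name) is:

openstack --os-cloud mycloud token issue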

Then, you can use all that in a simple way through Ansible tasks, in this case adding users to a project:

- name: Configuring OpenStack user[s]
  os_user:
    cloud: "{{ cloud_name }}"
    default_project: "{{ item.0.name }}"
    domain: "{{ item.0.domain_id }}"
    name: "{{ item.1.login }}"
    email: "{{ item.1.email }}"
    password: "{{ item.1.password }}"           
  with_subelements:
    - "{{ cloud_projects }}"
    - users  
  no_log: True

From a variables point of view, I decided to just have a simple structure to host projects/users/roles/quotas like this:

cloud_projects:
  - name: demo
    description: demo project
    domain_id: default
    quota_cores: 20
    quota_instances: 10
    quota_ram: 40960
    users:
      - login: demo_user
        email: demo@centos.org
        password: Ch@ngeM3
        role: admin # can be _member_ or admin
      - login: demo_user2
        email: demo2@centos.org
        password: Ch@ngeMe2

Now that it works, you can explore all the other os_* modules. I'm already using those to (a small ad-hoc example follows the list):

  • Import cloud images in glance
  • Create networks and subnets in neutron
  • Create projects/users/roles in keystone
  • Change quotas for those projects
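
For example, an ad-hoc run of the os_image module to import a cloud image into glance could look like this (a sketch only; the cloud name, image name and file path are placeholders, and it assumes python2-shade and the clouds.yaml above are present on the machine running it):

ansible localhost -m os_image -a "cloud=mycloud name=CentOS-7-GenericCloud filename=/tmp/CentOS-7-x86_64-GenericCloud.qcow2 disk_format=qcow2 container_format=bare state=present"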

I'm just discovering how powerful those tools are, so I'll probably discover much more interesting things to do with those later.

by Fabian Arrotin at October 10, 2017 10:00 PM

October 09, 2017

RDO Blog

Project Teams Gathering interviews

Several weeks ago I attended the Project Teams Gathering (PTG) in Denver, and conducted a number of interviews with project teams and a few of the PTLs (Project Technical Leads).

These interviews are now all up on the RDO YouTube channel. Please subscribe, as I'll be doing more interviews like this at OpenStack Summit in Sydney, as well as at future events.

I want to draw particular attention to my interview with the Swift crew about how they collaborate across company lines and across timezones. Very inspiring.

Watch all the videos now.

by Rich Bowen at October 09, 2017 06:09 PM

October 06, 2017

Red Hat Developer Blog

Using Falcon to cleanup Satellite host records that belong to terminated OSP instances

Overview

In an environment where OpenStack instances are automatically subscribed to Satellite, it is important that Satellite is notified of terminated instances so that it can safely delete its host record. Not doing so will:

  • Exhaust the available subscriptions, leading to unsubscribed hosts not being able to apply updates and security errata.
  • In the event that an emergency security errata needs to be deployed across the organization, Satellite administrators would be unable to determine if a host was either off or terminated, leading to uncertainty with their security posture.

In smaller environments, where one team is responsible for both OSP and Satellite, it’s possible to have one system administrator do this by using their administrator level access across both systems to determine which host records can be safely deleted in Satellite when the corresponding instance no longer exists.

This approach, however, does not scale as the number of instances launched and terminated daily increases across the environment. Larger environments also lead to different teams being responsible for different software suites, and administrator-level credentials would seldom be granted to a single person.

One approach to solving this problem in an automated manner is to have Satellite periodically poll OpenStack to determine if a given instance’s UUID still exists and should it not, remove the host record.

Some assumptions before we begin:

  • Instances launched are automatically subscribed to Satellite via rhsm.
  • The UUID of the instance is passed to Satellite during the registration process, found by default under the host's virt::uuid fact on Satellite.
  • An instance/VM/physical box that can connect to the Keystone/nova endpoints and that can be polled by Satellite.

Designing the API

Falcon is a remarkably simple Python web API framework that can be quickly deployed with minimal effort. The API was designed to return status codes depending on the status of the instance's UUID being checked using the call http://hostname/check/{uuid}, where the following return codes are used:

200 = instance exists, don’t delete host record

400 = bad UUID request – UUID not formatted correctly

404 = instance does not exist, delete host record

500 = unable to reach OSP
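
Once the API is running, a quick way to exercise it is to look only at the HTTP status code (a sketch; the hostname and UUID below are placeholders):

curl -s -o /dev/null -w "%{http_code}\n" http://hostname/check/3f2504e0-4f89-11d3-9a0c-0305e82c3301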

With the API designed, it’s now a simple coding exercise to have the API:

  1. Check that the UUID provided is valid.
  2. Contact OSP and search for the provided UUID.
  3. Return a status code based on the search result.

Using Keystone v3

The keystoneauth session Python API is used to create a token that is then passed to Nova via the novaclient Python API. The following functions will be used later to query OSP:


from novaclient import client as nova_client
from keystoneclient.v3 import client as keystone_client
from keystoneauth1.identity import v3
from keystoneauth1 import session
from keystoneauth1 import exceptions
import sys

def get_osp_token():
  try:
    auth = v3.Password(user_domain_name='default', username='admin', password='XXXX',
                       auth_url='https://osp.endpoint:35357',
                       project_domain_name='default', project_name='admin')
    sess = session.Session(auth=auth, verify="./cacert.pem")
    return sess

  except exceptions.http.Unauthorized:
    print("Credentials incorrect")

  except exceptions.connection.ConnectFailure:
    print("Unable to reach OSP Server")

  except Exception:
    print("Unexpected error:", sys.exc_info()[0])

Using the token generated by the above get_osp_token() function, the following generate_id_list() function will generate a list of all the instance UUID’s that exist in OSP:


def generate_id_list(token):
  nova = nova_client.Client('2', session=token)
  instance_list = nova.servers.list(detailed=True, search_opts= {'all_tenants': 1,})
  instance_id_list = [instance.id.lower() for instance in instance_list]
  return instance_id_list

The Falcon API

Beginning with Falcon’s simple Learning by Example, we utilize the above functions to create our API:


#instanceapi.py
import falcon

# get_osp_token() and generate_id_list() from above, plus a uuid_valid()
# helper, are assumed to be defined in (or imported into) this module.

class CheckUUID(object):

  def on_get(self, req, resp, uuid):
    if not uuid_valid(uuid):
      resp.status = falcon.HTTP_400
      resp.body = (uuid+' is not a valid UUID that can be parsed\n')
      return

    osptoken = get_osp_token()
    id_list = generate_id_list(osptoken)

    if not id_list:
      resp.status = falcon.HTTP_500
      resp.body = ('Server Down\n')
      return

    uuid = uuid.lower()

    if uuid in id_list:
      resp.status=falcon.HTTP_200
      resp.body =('The UUID '+uuid+' exists in OSP\n')
      return

    # no match found
    resp.status = falcon.HTTP_404
    resp.body = ('The UUID '+uuid+' does not exist in OSP\n')

# main block
app = falcon.API()
check = CheckUUID()
app.add_route('/check/{uuid}', check)

Using Gunicorn to serve the API

With the working code, it’s now a simple matter of deploying the code to a host that can be polled by Satellite.

Using Gunicorn, a simple WSGI server is started which serves the Python code.

# gunicorn --workers=4 --bind=0.0.0.0:80 instanceapi:app

A simple systemd service file will allow the API to start in the event the host is restarted:

#/usr/lib/systemd/system/instanceapi.service

[Unit]
Description=Instance API Frontend
After=network.target

[Service]
Environment="PATH=/root"
WorkingDirectory=/root
ExecStart=/usr/bin/gunicorn instanceapi:app -b 0.0.0.0:80 -w 4 --access-logfile /var/log/access.log

[Install]
WantedBy=multi-user.target
# systemctl enable instanceapi.service; systemctl start instanceapi.service

Checking Satellite Hosts

Satellite can now iterate over its host records and determine whether each one can be safely removed. To get the UUID of a specific host from Satellite, you can use the hammer CLI:

# hammer fact list --search "host=<hostname> and fact=virt::uuid"

The UUID returned by hammer can then be passed to the API, where the return code will indicate whether the instance still exists.
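
Tying it together, a hypothetical cleanup wrapper on the Satellite side could look like this (a sketch only; the hostname and API endpoint are placeholders, and the parsing of hammer's CSV output is approximate and may need adjusting for your Satellite version):

HOSTNAME=myhost.example.com
UUID=$(hammer --csv fact list --search "host=${HOSTNAME} and fact=virt::uuid" | tail -n1 | cut -d, -f3)
CODE=$(curl -s -o /dev/null -w "%{http_code}" "http://hostname/check/${UUID}")
if [ "$CODE" = "404" ]; then
  hammer host delete --name "${HOSTNAME}"
fi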

Conclusion

Without requiring the sharing of administrator credentials between Satellite and OpenStack deployments, it’s now possible with the use of a simple API to determine if an instance on OSP still exists.




The post Using Falcon to cleanup Satellite host records that belong to terminated OSP instances appeared first on RHD Blog.

by Simeon Debreceni at October 06, 2017 04:00 PM

October 05, 2017

Julien Danjou

My interview with Cool Python Codes

A few days ago, I was contacted by Godson Rapture from Cool Python Codes to answer a few questions about what I work on in open source. Godson regularly interviews developers and I invite you to check out his website!

Here's a copy of my original interview. Enjoy!

Good day, Julien Danjou, welcome to Cool Python Codes. Thanks for taking your precious time to be here.

You’re welcome!

Could you kindly tell us about yourself like your full name, hobbies, nationality, education, and experience in programming?

Sure. I’m Julien Danjou, I’m French and live in Paris, France. I studied Computer science for 5 years around 15 years ago, and continued my career in that field since then, specializing in open source projects.

Those last years, I’ve been working as a software engineer at Red Hat. I’ve spent the last 10 years working with the Python programming language. Now I work on the Gnocchi project which is a time series database.

When I’m not coding, I enjoy running half-marathon and playing FPS games.

Can you narrate your first programming experience and what got you to start learning to program?

I started programming around 2001, and my first serious programs were in Perl. I was contributing to a hosting platform for free software named VHFFS. It was a free software project itself, and I enjoyed being able to learn from other more experienced developers and being able to contribute back to it. That’s what got me stuck into that world of open source projects.

Which programming language do you know and which is your favorite?

I know quite a few, I’ve been doing serious programming in Perl, C, Lua, Common Lisp, Emacs Lisp and Python.

Obviously, my favorite is Common Lisp, but I was never able to use it for any serious project, for various reasons. So I spend most of my time hacking with Python, which I really enjoy as it is close to Lisp, in some ways. I see it as a small subset of Lisp.

What inspired you to venture into the world of programming and drove you to learn a handful of programming languages?

It was mostly scratching my own itches when I started. Each time I saw something I wanted to do or a feature I wanted in an existing software, I learned what I needed to get going and get it working.

I studied C and Lua while writing awesome, the window manager that I created 10 years ago and used for a while. I learned Emacs Lisp while writing extensions that I wanted to see in Emacs, etc. It's the best way to start.

What is your blog about?

My blog is a platform where I write about what I work on most of the time. Nowadays, it’s mostly about Python and the main project I contribute to, Gnocchi.

When writing about Gnocchi, I usually try to explain what part of the project I worked on, what new features we achieved, etc.

On Python, I try to share solutions to common problems I encountered or identified while doing e.g. code reviews. Or presenting a new library I created!

Tell us more about your book, The Hacker’s Guide to Python.

It’s a compilation of everything I learned those last years building large Python applications. I spent the last 6 years developing on a large code base with thousands of other developers.

I’ve reviewed tons of code and identified the biggest issues, mistakes, and bad practice that developers tend to have. I decided to compile that in a guide, helping developers that played a bit with Python to learn the stages to get really productive with Python.

OpenStack is the biggest open source project in Python, Can you tell us more about OpenStack?

OpenStack is a cloud computing platform, started 7 years ago now. Its goal is to provide a programmatic platform to manage your infrastructure while being open source and avoiding vendor lock-in.

Who uses OpenStack? Is it for programmers, website owners?

It's used by a lot of different organizations – not really by individuals. It's a big piece of software. You can find it in some famous public cloud providers (Dreamhost, Rackspace…), and also as a private cloud in a lot of different organizations, from Bloomberg to eBay or CERN in Switzerland, a big OpenStack user. Tons of telecom providers also leverage OpenStack for their own internal infrastructure.

Have you participated in any OpenStack conference? What did you speak on if you did?

I’ve attended the last 9 OpenStack summits and a few other OpenStack events around the world. I’ve been engaged in the upstream community for the last 6 years now.

My area of expertise is telemetry, the stack of software that is in charge of collecting and storing metrics from the various OpenStack components. This is what I regularly talk about during those events.

How can one join the OpenStack community?

There’s an entire documentation about that, called the Developer’s Guide. It explains how to setup your environment to send patches, how to join the community using the mailing-lists or IRC.

What makes your book, The Hacker’s Guide to Python stand out from other Python books? Also, who exactly did you write this book for?

I wrote the book that I always wanted to read about Python, but never found. It’s not a book for people that want to learn Python from scratch. It’s a great guide for those who know the language but don’t know the details that experienced developers know and that make the difference. The best practice, the elegant solutions to common problems, etc. That’s why it also includes interviews with prominent Python developers, so they can share their advice on different areas.

How can someone get your book?

I've decided to self-publish my book, so it does not have an editor like you might be used to. The best place to get it is online, where you can pick the format you want, electronic or paper.

What do you mean when you say you hack with Python?

Unfortunately, most people refer to hacking as the activity of some bad guys trying to get access to whatever they're not supposed to see. In the book title, I mean "hacking" as the elegant way of writing code and making things work smoothly, even when you were not expecting to pull it off.

You mentioned earlier that Gnocchi is a time series database. Can you please be more elaborate about Gnocchi? Is there also any documentation about Gnocchi?

So Gnocchi is a project I started a few years ago to store time series at large scale. Time series are basically series of tuples, each composed of a timestamp and a value.

Imagine you wanted to store the temperature of all the rooms of the world at any point of time. You’d need a dedicated database for that with the right data structure. This is what Gnocchi does: it provides this data structure storage at very, very large scale.

The primary use case is infrastructure monitoring, so most people use it to store tons of metrics about their hardware, software, etc. It’s fully documented on its website.

How can a programmer without much experience contribute to open source projects?

The best way to start is to try to fix something that irritates you in some way. It might be a bug, it might be a missing feature. Start small. Don’t try big things first or you could be discouraged.

Never stop.

Also, don't plunge right away in the community and start poking random people or spam them with questions.

Do your homework, and listen to the community for a while to get a sense of how things are going. That can be joining IRC and lurking or following the mailing lists for example.

Big open source communities dedicate programs to helping you become engaged. It might be worth a try. Generic programs like Outreachy or Google Summer of Code are a great way to start if you don't feel confident enough to jump into a community by your own means.

Just out of curiosity, do you write code in French?

Never ever. I think it’s acceptable to write in your language if you are sure that your code will never be open sourced and that your whole team is talking in that language, no matter what – but it’s a ballsy assumption, clearly.

Truth is that if you do open source, English is the standard, so go with it. Be sad if you want, but please be pragmatic.

I’ve seen projects being open sourced by companies where all the code source comments were in Korean. It was impossible for any non-Korean people to get a glance of what the code and the project was doing, so it just failed and disappeared.

How does a team of programmers handle bugs in a large open source project?

I wish there was some magic recipe, but I don’t think it’s the case. What you want is to have a place where your users can feel safe reporting bugs. Include a template so they don’t forget any details: how to reproduce the bugs, what they expected, etc. The worst thing is to have users reporting “That does not work.” with no details. It’s a waste of time.


What tool to use to log all of that really depends on the team size and culture.

Once that works, the actual fixing of bug doesn’t follow any rule. Most developers fix the bug they encounter or the ones that are the most critical for users. Smaller problems might not be fixed for a long time.

Can you tell us about the new book you are working on and when do we expect to get it?

That new book is entitled “Scaling Python” and it provides insight into how to build largely scalable and distributed applications using Python.

It is also based on my experience in building this kind of software during the past years. This book also includes interviews with great Python hackers who work on scalable systems or know a thing or two about writing applications for performance, an important point for scalable applications.

The book is in its final stage now, and it should be out at the beginning of 2018.

How can someone get in contact with you?

I’m reachable at julien@danjou.info by email or via Twitter, @juldanjou.

by Julien Danjou at October 05, 2017 07:39 PM

Red Hat Stack

Using Red Hat OpenStack Platform director to deploy co-located Ceph storage – Part Two

Previously we learned all about the benefits in placing Ceph storage services directly on compute nodes in a co-located fashion. This time, we dive deep into the deployment templates to see how an actual deployment comes together and then test the results!

Enabling Co-Location

This article assumes the director is installed and configured with nodes already registered. The default Heat deployment templates ship an environment file for enabling Pure HCI. This environment file is:

/usr/share/openstack-tripleo-heat-templates/environments/hyperconverged-ceph.yaml

This file does two things:

  1. It redefines the composable service list for the Compute role to include both Compute and Ceph Storage services. The parameter storing this list is ComputeServices.

  2. It enables a port on the Storage Management network for Compute nodes using the OS::TripleO::Compute::Ports::StorageMgmtPort resource. The default network isolation disables this port for standard Compute nodes. For our scenario we must enable this port and its network for the Ceph services to communicate. If you are not using network isolation, you can leave the resource at None to disable the resource.

Updating Network Templates

As mentioned, the Compute nodes need to be attached to the Storage Management network so Red Hat Ceph Storage can access the OSDs on them. This is not usually required in a standard deployment. To ensure the Compute node receives an IP address on the Storage Management network, you need to modify the NIC templates for your  Compute node to include it. As a basic example, the following snippet adds the Storage Management network to the compute node via the OVS bridge supporting multiple VLANs:

    - type: ovs_bridge
      name: br-vlans
      use_dhcp: false
      members:
      - type: interface
        name: nic3
        primary: false
      - type: vlan
        vlan_id:
          get_param: InternalApiNetworkVlanID
        addresses:
        - ip_netmask:
            get_param: InternalApiIpSubnet
      - type: vlan
        vlan_id:
          get_param: StorageNetworkVlanID
        addresses:
        - ip_netmask:
            get_param: StorageIpSubnet
      - type: vlan
        vlan_id:
          get_param: StorageMgmtNetworkVlanID
        addresses:
        - ip_netmask:
            get_param: StorageMgmtIpSubnet
      - type: vlan
        vlan_id:
          get_param: TenantNetworkVlanID
        addresses:
        - ip_netmask:
            get_param: TenantIpSubnet

The third VLAN block, which uses StorageMgmtNetworkVlanID and StorageMgmtIpSubnet, is the additional VLAN interface for the Storage Management network we discussed.

Isolating Resources

We calculate the amount of memory to reserve for the host and Red Hat Ceph Storage services using the formula found in “Reserve CPU and Memory Resources for Compute”. Note that we accommodate for 2 OSDs so that we can potentially scale an extra OSD on the node in the future.

Our total instances:
32GB / (2GB per instance + 0.5GB per instance for host overhead) = ~12 instances

Total host memory to reserve:
(12 instances * 0.5GB overhead) + (2 OSDs * 3GB per OSD) = 12GB or 12000MB

This means our reserved host memory is 12000MB.

We can also define how to isolate the CPU resources in two ways:

  • CPU Allocation Ratio – Estimate the CPU utilization of each instance and set the ratio of instances per CPU while taking into account Ceph service usage. This ensures a certain amount of CPU resources are available for the host and Ceph services. See the ”Reserve CPU and Memory Resources for Compute” documentation for more information on calculating this value.
  • CPU Pinning – Define which CPU cores are reserved for instances and use the remaining CPU cores for the host and Ceph services.

This example uses CPU pinning. We are reserving cores 1-7 and 9-15 of our Compute node for our instances. This leaves cores 0 and 8 (both on the same physical core) for the host and Ceph services. This provides one core for the current Ceph OSD and a second core in case we scale the OSDs. Note that we also need to isolate the host to these two cores. This is shown after deploying the overcloud. 


Using the configuration shown, we create an additional environment file that contains the resource isolation parameters defined above:

parameter_defaults:
 NovaReservedHostMemory: 12000
 NovaVcpuPinSet: ['1-7,9-15']

Our example does not use NUMA pinning because our test hardware does not support multiple NUMA nodes. However, if you want to pin the Ceph OSDs to a specific NUMA node, you can do so by following "Configure Ceph NUMA Pinning".

Deploying the configuration …

This example uses the following environment files in the overcloud deployment:

  • /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml – Enables network isolation for the default roles, including the standard Compute role.
  • /home/stack/templates/network.yaml – Custom file defining network parameters (see Updating Network Templates). This file also sets the OS::TripleO::Compute::Net::SoftwareConfig resource to use our custom NIC template containing the additional Storage Management VLAN we added to the Compute nodes above.
  • /usr/share/openstack-tripleo-heat-templates/environments/hyperconverged-ceph.yaml – Redefines the service list for Compute nodes to include the Ceph OSD service. Also adds a Storage Management port for this role. This file is provided with the director's Heat template collection.
  • /home/stack/templates/hci-resource-isolation.yaml – Custom file with specific settings for resource isolation features such as memory reservation and CPU pinning (see Isolating Resources).

The following command deploys an overcloud with one Controller node and one co-located Compute/Storage node:

$ openstack overcloud deploy \
    --templates /usr/share/openstack-tripleo-heat-templates \
    -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
    -e /home/stack/templates/network.yaml \
    -e /home/stack/templates/storage-environment.yaml \
    -e /usr/share/openstack-tripleo-heat-templates/environments/hyperconverged-ceph.yaml \
    -e /home/stack/templates/hci-resource-isolation.yaml \
    --ntp-server pool.ntp.org

Configuring Host CPU Isolation

As a final step, this scenario requires isolating the host from using the CPU cores reserved for instances. To do this, log into the Compute node and run the following commands:

$ sudo grubby --update-kernel=ALL --args="isolcpus=1,2,3,4,5,6,7,9,10,11,12,13,14,15"
$ sudo grub2-install /dev/sda

This updates the kernel to use the isolcpus parameter, preventing the kernel from using cores reserved for instances. The grub2-install command updates the boot record, which resides on /dev/sda for default locations. If using a custom disk layout for your overcloud nodes, this location might be different.

After setting this parameter, we reboot our Compute node:

$ sudo reboot
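
Once the node is back up, you can quickly confirm the running kernel picked up the isolation settings:

$ grep -o 'isolcpus=[0-9,]*' /proc/cmdline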

Testing

After the Compute node reboots, we can view the hypervisor details to see the isolated resources from the undercloud:

$ source ~/overcloudrc
$ openstack hypervisor show overcloud-compute-0.localdomain -c vcpus
+-------+-------+
| Field | Value |
+-------+-------+
| vcpus | 14    |
+-------+-------+
$ openstack hypervisor show overcloud-compute-0.localdomain -c free_ram_mb
+-------------+-------+
| Field       | Value |
+-------------+-------+
| free_ram_mb | 20543 |
+-------------+-------+

2 of the 16 CPU cores are reserved for the host and Ceph services, and only 20GB out of 32GB is available for the host to use for instances.

So, let’s see if this really worked. To find out, we will run some Browbeat tests against the overcloud. Browbeat is a performance and analysis tool specifically for OpenStack. It allows you to analyse, tune, and automate the entire process.

For our test we have run a set of Browbeat benchmark tests showing the CPU activity for different cores. The following graph displays the activity for a host/Ceph CPU core (Core 0) during one of the tests:

[Graph: CPU activity on host/Ceph core 0 during the Browbeat test]

The green line indicates the system processes and the yellow line indicates the user processes. Notice that the CPU core activity peaks during the beginning and end of the test, which is when the disks for the instances were created and deleted respectively. Also notice the CPU core activity is fairly low as a percentage.

The other available host/Ceph CPU core (Core 8) follows a similar pattern:

[Graph: CPU activity on host/Ceph core 8 during the Browbeat test]

The peak activity for this CPU core occurs during instance creation and during three periods of high instance activity (the Browbeat tests). Also notice the activity percentages are significantly higher than the activity on Core 0.

Finally, the following is an unused CPU core (Core 2) during the same test:

[Graph: CPU activity on unused core 2 during the Browbeat test]

As expected, the unused CPU core shows no activity during the test. However, if we create more instances and exceed the ratio of allowable instances on Core 1, then these instances would use another CPU core, such as Core 2.

These graphs indicate our resource isolation configuration works and the Ceph services will not overlap with our Compute services, and vice versa.

Conclusion

Co-locating storage on compute nodes provides a simple method to consolidate storage and compute resources. This can help when you want to maximize the hardware of each node and consolidate your overcloud. By adding tuning and resource isolation you can allocate dedicated resources to both storage and compute services, preventing both from starving each other of CPU and memory. And by doing this via Red Hat OpenStack Platform director and Red Hat Ceph Storage, you have a solution that is easy to deploy and maintain!

by Dan Macpherson, Principal Technical Writer at October 05, 2017 01:38 AM

October 02, 2017

Red Hat Stack

Using Red Hat OpenStack Platform director to deploy co-located Ceph storage – Part One

An exciting new feature in Red Hat OpenStack Platform 11 is full Red Hat OpenStack Platform director support for deploying Red Hat Ceph storage directly on your overcloud compute nodes. Often called hyperconverged, or HCI (for Hyperconverged Infrastructure), this deployment model places the Red Hat Ceph Storage Object Storage Daemons (OSDs) and storage pools directly on the compute nodes.

Co-locating Red Hat Ceph Storage in this way can significantly reduce both the physical and financial footprint of your deployment without requiring any compromise on storage.


Red Hat OpenStack Platform director is the deployment and lifecycle management tool for Red Hat OpenStack Platform. With director, operators can deploy and manage OpenStack from within the same convenient and powerful lifecycle tool.

There are two primary ways to deploy this type of storage deployment which we currently refer to as pure HCI and mixed HCI.

In this two-part blog series we are going to focus on the Pure HCI scenario demonstrating how to deploy an overcloud with all compute nodes supporting Ceph. We do this using the Red Hat OpenStack Platform director. In this example we also implement resource isolation so that the Compute and Ceph services have their own dedicated resources and do not conflict with each other. We then show the results in action with a set of Browbeat benchmark tests.

But first …

Before we get into the actual deployment, let’s take a look at some of the benefits around co-locating storage and compute resources.

  • Smaller deployment footprint: When you perform the initial deployment, you co-locate more services together on single nodes, which helps simplify the architecture on fewer physical servers.

  • Easier to plan, cheaper to start out: co-location provides a decent option when your resources are limited. For example, instead of using six nodes, three for Compute and three for Ceph Storage, you can just co-locate the storage and use only three nodes.

  • More efficient capacity usage: You can utilize the same hardware resources for both Compute and Ceph services. For example, the Ceph OSDs and the compute services can take advantage of the same CPU, RAM, and solid-state drive (SSD). Many commodity hardware options provide decent resources that can accommodate both services on the same node.

  • Resource isolation: Red Hat addresses the noisy neighbor effect through resource isolation, which you orchestrate through Red Hat OpenStack Platform director.

However, while co-location realizes many benefits there are some considerations to be aware of with this deployment model. Co-location does not necessarily offer reduced latency in storage I/O. This is due to the distributed nature of Ceph storage: storage data is spread across different OSDs, and OSDs will be spread across several hyper-converged nodes. An instance on one node might need to access storage data from OSDs spread across several other nodes.

The Lab

Now that we fully understand the benefits and considerations for using co-located storage, let’s take a look at a deployment scenario to see it in action. 


We have developed a scenario using Red Hat OpenStack Platform 11 that deploys and demonstrates a simple “Pure HCI” environment. Here are the details.

We are using three nodes for simplicity:

  • 1 director node
  • 1 Controller node
  • 1 Compute node (Compute + Ceph)

Each of these nodes has the same specifications:

  • Dell PowerEdge R530
  • Intel Xeon CPU E5-2630 v3 @ 2.40GHz  – This contains 8 cores each with hyper-threading, providing us with a total of 16 cores.
  • 32 GB RAM
  • 278 GB SSD Hard Drive

Of course for production installs you would need a much more detailed architecture; this scenario simply allows us to quickly and easily demonstrate the advantages of co-located storage. 

This scenario follows these resource isolation guidelines:

  • Reserve enough resources for 1 Ceph OSD on the Compute node
  • Reserve enough resources to potentially scale an extra OSD on the same Compute node
  • Plan for instances to use 2GB on average but reserve 0.5GB per instance on the Compute node for overhead.

This scenario uses network isolation using VLANs:

  • Because the default Compute node deployment templates shipped with tripleo-heat-templates do not attach the Storage Management network to Compute nodes, we need to change that. They require a simple modification to accommodate the Storage Management network, which is illustrated later.

Now that we have everything ready, we are set to deploy our hyperconverged solution! But you’ll have to wait for next time for that so check back soon to see the deployment in action in Part Two of the series!



Want to find out how Red Hat can help you plan, implement and run your OpenStack environment? Join Red Hat Architects Dave Costakos and Julio Villarreal Pelegrino in “Don’t fail at scale: How to plan, build, and operate a successful OpenStack cloud” today.

For full details on architecting your own Red Hat OpenStack Platform deployment check out the official Architecture Guide. And for details about Red Hat OpenStack Platform networking see the detailed Networking Guide.

by Dan Macpherson, Principal Technical Writer at October 02, 2017 10:31 PM

RDO Blog

Recent blog posts

Here's what the RDO community has been blogging about recently:

OpenStack 3rd Party CI with Software Factory by jpena

Introduction: When developing for an OpenStack project, one of the most important aspects to cover is to ensure proper CI coverage of our code. Each OpenStack project runs a number of CI jobs on each commit to test its validity, so thousands of jobs are run every day in the upstream infrastructure.

Read more at http://rdoproject.org/blog/2017/09/openstack-3rd-party-ci-with-software-factory/

OpenStack Days UK by Steve Hardy

Yesterday I attended the OpenStack Days UK event, held in London. It was a very good day and there were a number of interesting talks, and it provided a great opportunity to chat with folks about OpenStack. I gave a talk, titled "Deploying OpenStack at scale, with TripleO, Ansible and Containers", where I gave an update of the recent rework in the TripleO project to make more use of Ansible and enable containerized deployments. I'm planning some future blog posts with more detail on this topic, but for now here's a copy of the slide deck I used, also available on github.

Read more at http://hardysteven.blogspot.com/2017/09/openstack-days-uk-yesterday-i-attended.html

OpenStack Client in Queens - Notes from the PTG by jpichon

Here are a couple of notes about the OpenStack Client, taken while dropping in and out of the room during the OpenStack PTG in Denver, a couple of weeks ago.

Read more at http://www.jpichon.net/blog/2017/09/openstack-client-queens-notes-ptg/

Event report: OpenStack PTG by rbowen

Last week I attended the second OpenStack PTG, in Denver. The first one was held in Atlanta back in February.

Read more at http://drbacchus.com/event-report-openstack-ptg/

by Rich Bowen at October 02, 2017 04:48 PM

September 27, 2017

Steve Hardy

OpenStack Days UK

OpenStack Days UK

Yesterday I attended the OpenStack Days UK event, held in London.  It was a very good day and there were a number of interesting talks, and it provided a great opportunity to chat with folks about OpenStack.

I gave a talk, titled "Deploying OpenStack at scale, with TripleO, Ansible and Containers", where I gave an update of the recent rework in the TripleO project to make more use of Ansible and enable containerized deployments.

I'm planning some future blog posts with more detail on this topic, but for now here's a copy of the slide deck I used, also available on github.



by Steve Hardy (noreply@blogger.com) at September 27, 2017 11:18 AM

Julie Pichon

OpenStack Client in Queens - Notes from the PTG

Here are a couple of notes about the OpenStack Client, taken while dropping in and out of the room during the OpenStack PTG in Denver, a couple of weeks ago.

OSC 4

The original plan was to simply get rid of deprecated stuff, change a few names here and there and have few compatibility breaking changes.

However, now shade may adopt the SDK and move some of its contents into it. Then shade would consume the SDK, and OSC would consume it as well. It would be pretty clean and easy to use, but would mean major breaking changes for OSC4. OSC would become a shim layer over osc-lib. The plugin interface is going to change, as the loading time is long - every command requires loading all of the plugins which takes over half of the loading time even though the commands themselves load quickly. (There will be more communication once we understand what the new plugin interface will look like.) OSC4 would rip out global argument processing and adopt os-client-config (breaking change). It would adopt the SDK and stop using the client libraries.

Note that this may all change depending on how the SDK situation evolves.

From the end-user perspective, some option names will change. There is some old cruft left around for compatibility reasons that will disappear (e.g. "ip floating" will be gone, it changed a year ago to "floating ip"). The column output will handle structured data better and some of this is already committed to the osc4 feature branch.

The order of commands will not be changed.

For authentication, the behaviour may change a bit between the CLI and clouds.yaml. os-client-config came along and changed a few things, notably with regard to precedence. The OSC way of doing things will be removed and replaced with OCC.

Best effort will be made not to break scripts. The "configuration show" command shows your current configuration but not where it comes from - it's a bit hard to do because of all the merging of parameters going on.

The conversation continued about auth, how shade uses adapters and may change the SDK to use them as well: would sessions or adapters make the most sense? I had to attend another session and missed the discussion and conclusions.

Command aliases

There was a long discussion around command aliases, as some commands are very long to type (e.g. healthmonitor). It was very clear it's not something OSC wants to get into the business of managing itself (master list of collisions, etc) so it would be up to individual plugins. There could be an individual .osc config file that would do the short to long name mapping, similar to a shell alias. It shouldn't be part of the official plugin (otherwise, "why don't we just use those names to begin with?") but it could be another plugin that sets up alias mappings to the short name or a second set of entry points, or include a "list of shortcuts we found handy" in the documentation. Perhaps there should be a community-wide discussion about this.
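Until something like that lands, a plain shell alias already covers the day-to-day case; the short name below is just an example, not an agreed convention:

# Shorten a long plugin command with an ordinary shell alias (example name)
alias os-hm='openstack loadbalancer healthmonitor'
os-hm list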

Collisions are to be managed by users, not by OSC. Having one master list to manage the initial set of keywords is already an unfortunate compromise.

Filtering and others

It's not possible to do filtering on lists or any kind of complex filtering at the moment. The recommendation, or what people currently do, is to output JSON and pipe it to jq to do what they need. The documentation should be extended to show how to do this.
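As a concrete sketch of that pattern (the field names are just an example and depend on the command):

# Let OSC emit JSON and do the filtering client-side with jq
openstack server list -f json | jq -r '.[] | select(.Status == "ACTIVE") | .Name'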

At the moment filtering varies wildly between APIs and none of them are very expressive, so there isn't a lot OSC can do.

Tagged with: events, openstack

by jpichon at September 27, 2017 08:40 AM

September 26, 2017

Lars Kellogg-Stedman

Some notes on PWM on the Raspberry Pi

I was recently working on a project in which I wanted to drive a simple piezo buzzer attached to a GPIO pin on a Raspberry Pi. I was already using the RPi.GPIO module in my project so that seemed like a logical place to start, but I ran into …

by Lars Kellogg-Stedman at September 26, 2017 04:00 AM

September 21, 2017

Rich Bowen

Event report: OpenStack PTG

Last week I attended the second OpenStack PTG, in Denver. The first one was held in Atlanta back in February.

This is not an event for everyone, and isn’t your standard conference. It’s a working meeting – a developers’ summit at which the next release of the OpenStack software is planned. The website is pretty blunt about who should, and should not, attend. So don’t sign up without knowing what your purpose is there, or you’ll spend a lot of time wondering why you’re there.

I went to do the second installment of my video series, interviewing the various project teams about what they did in the just-released version, and what they anticipate coming in the next release.

The first of these, at the PTG in Atlanta, featured only Red Hat engineers. (Those videos are HERE.) However, after reflection, I decided that it would be best to not limit it, but to expand it to the entire community, focusing on cross-company collaboration, and trying to get as many projects represented as I could.

So, in Denver I asked the various project PTL (project technical leads) to do an interview, or to assemble a group of people from the project to do interviews. I did 22 interviews, and I’m publishing those on the RDO YouTube channel – http://youtube.com/RDOCommunity – just as fast as I can get them edited.

I also hosted an RDO get-together to celebrate the Pike release, and we had just over 60 people show up for that. Thank you all so much for attending! (Photos and video from that coming soon to a blog near you!)

So, watch my YouTube channel, and hopefully by the end of next week I’ll have all of those posted.

I love working with the OpenStack community because they remind me of Open Source in the old days, when developers cared about the project and the community at least as much, and often more, than about the company that happens to pay their paycheck. It’s very inspiring to listen to these brilliant men and women talking about their projects.

by rbowen at September 21, 2017 06:35 PM

September 19, 2017

RDO Blog

Recent blog posts

It's been a few weeks since I did one of these blog wrapups, and there's been a lot of great content by the RDO community recently.

Here's some of what we've been talking about recently:

Project Teams Gathering (PTG) report - Zuul by tristanC

The OpenStack infrastructure team gathered in Denver (September 2017). This article reports some of Zuul's topics that were discussed at the PTG.

Read more at http://rdoproject.org/blog/2017/09/PTG-report-zuul/

Evaluating Total Cost of Ownership of the Identity Management Solution by Dmitri Pal

Increasing Interest in Identity Management: During the last several months I’ve seen a rapid growth of interest in Red Hat’s Identity Management (IdM) solution. This might have been due to different reasons.

Read more at http://rhelblog.redhat.com/2017/09/18/evaluating-total-cost-of-ownership-of-the-identity-management-solution/

Debugging TripleO Ceph-Ansible Deployments by John

Starting in Pike it is possible to use TripleO to deploy Ceph in containers using ceph-ansible. This is a guide to help you if there is a problem. It asks questions, somewhat rhetorically, to help you track down the problem.

Read more at http://blog.johnlikesopenstack.com/2017/09/debug-tripleo-ceph-ansible.html

Make a NUMA-aware VM with virsh by John

Grégory showed me how he uses virsh edit on a VM to add something like the following:

Read more at http://blog.johnlikesopenstack.com/2017/09/make-numa-aware-vm-with-virsh.html

Writing a SELinux policy from the ground up by tristanC

SELinux is a mechanism that implements mandatory access controls in Linux systems. This article shows how to create a SELinux policy that confines a standard service:

Read more at http://rdoproject.org/blog/2017/09/SELinux-policy-from-the-ground-up/

Trick to test external ceph clusters using only tripleo-quickstart by John

TripleO can stand up a Ceph cluster as part of an overcloud. However, if all you have is a tripleo-quickstart env and want to test an overcloud feature which uses an external Ceph cluster, then you can have quickstart stand up two heat stacks, one to make a separate ceph cluster and the other to stand up an overcloud which uses that ceph cluster.

Read more at http://blog.johnlikesopenstack.com/2017/09/trick-to-test-external-ceph-clusters.html

RDO Pike released by Rich Bowen

The RDO community is pleased to announce the general availability of the RDO build for OpenStack Pike for RPM-based distributions, CentOS Linux 7 and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Pike is the 16th release from the OpenStack project, which is the work of more than 2300 contributors from around the world (source).

Read more at http://rdoproject.org/blog/2017/09/rdo-pike-released/

OpenStack Summit Sydney preview: Red Hat to present at more than 40 sessions by Peter Pawelski, Product Marketing Manager, Red Hat OpenStack Platform

The next OpenStack Summit will take place in Sydney, Australia, November 6-8. And despite the fact that the conference will only run three days instead of the usual four, there will be plenty of opportunities to learn about OpenStack from Red Hat’s thought leaders.

Read more at http://redhatstackblog.redhat.com/2017/08/31/openstack-summit-fall2017-preview/

Scheduled snapshots by Tim Bell

While most of the machines on the CERN cloud are configured using Puppet with state stored in external databases or file stores, there are a few machines where this has been difficult, especially for legacy applications. Doing a regular snapshot of these machines would be a way of protecting against failure scenarios such as hypervisor failure or disk corruptions.

Read more at http://openstack-in-production.blogspot.com/2017/08/scheduled-snapshots.html

Ada Lee: OpenStack Security, Barbican, Novajoin, TLS Everywhere in Ocata by Rich Bowen

Ada Lee talks about OpenStack Security, Barbican, Novajoin, and TLS Everywhere in Ocata, at the OpenStack PTG in Atlanta, 2017.

Read more at http://rdoproject.org/blog/2017/08/ada-lee-openstack-security-barbican-novajoin-tls-everywhere-in-ocata/

Octavia Developer Wanted by assafmuller

I’m looking for a Software Engineer to join the Red Hat OpenStack Networking team. I am presently looking to hire in Europe, Israel and US East. The candidate may work from home or from one of the Red Hat offices. The team is globally distributed and comprised of talented, autonomous, empowered and passionate individuals with a healthy work/life balance. The candidate will work on OpenStack Octavia and LBaaS. The candidate will write and review code while working with upstream community members and fellow Red Hatters. If you want to do open source, Red Hat is objectively where it’s at. We have an institutional culture of open source at all levels and this has a ripple effect on your day to day and your career at the company.

Read more at https://assafmuller.com/2017/08/18/octavia-developer-wanted/

by Rich Bowen at September 19, 2017 03:50 PM

Project Teams Gathering (PTG) report - Zuul

The OpenStack infrastructure team gathered in Denver (September 2017). This article reports some of Zuul's topics that were discussed at the PTG.

For your reference, I highlighted some of the new features comming in the Zuul version 3 in this article.

Cutover and jobs migration

The OpenStack community grew a complex set of CI jobs over the past several years, that needs to be migrated. A zuul-migrate script has been created to automate the migration from the Jenkins-Jobs-Builder format to the new Ansible based job definition. The migrated jobs are prefixed with "-legacy" to indicate they still need to be manually refactored to fully benefit from the ZuulV3 features.

The team couldn't finish the migration and disable the current ZuulV2 services at the PTG because the jobs migration took longer than expected. However, a new cutover attempt will occur in the next few weeks.

Ansible devstack job

The devstack job has been completely rewritten to a fully fledged Ansible job. This is a good example of what a job looks like in the new Zuul:

A project that needs a devstack CI job uses a new job definition like this:

- job:
    name: shade-functional-devstack-base
    parent: devstack
    description: |
      Base job for devstack-based functional tests
    pre-run: playbooks/devstack/pre
    run: playbooks/devstack/run
    post-run: playbooks/devstack/post
    required-projects:
      # These jobs will DTRT when shade triggers them, but we want to make
      # sure stable branches of shade never get cloned by other people,
      # since stable branches of shade are, well, not actually things.
      - name: openstack-infra/shade
        override-branch: master
      - name: openstack/heat
      - name: openstack/swift
    roles:
      - zuul: openstack-infra/devstack-gate
    timeout: 9000
    vars:
      devstack_localrc:
        SWIFT_HASH: "1234123412341234"
      devstack_local_conf:
        post-config:
          "$CINDER_CONF":
            DEFAULT:
              osapi_max_limit: 6
      devstack_services:
        ceilometer-acentral: False
        ceilometer-acompute: False
        ceilometer-alarm-evaluator: False
        ceilometer-alarm-notifier: False
        ceilometer-anotification: False
        ceilometer-api: False
        ceilometer-collector: False
        horizon: False
        s-account: True
        s-container: True
        s-object: True
        s-proxy: True
      devstack_plugins:
        heat: https://git.openstack.org/openstack/heat
      shade_environment:
        # Do we really need to set this? It's cargo culted
        PYTHONUNBUFFERED: 'true'
        # Is there a way we can query the localconf variable to get these
        # rather than setting them explicitly?
        SHADE_HAS_DESIGNATE: 0
        SHADE_HAS_HEAT: 1
        SHADE_HAS_MAGNUM: 0
        SHADE_HAS_NEUTRON: 1
        SHADE_HAS_SWIFT: 1
      tox_install_siblings: False
      tox_envlist: functional
      zuul_work_dir: src/git.openstack.org/openstack-infra/shade

This new job definition greatly simplifies the devstack integration tests, and projects now have much more fine-grained control over their integration with the other OpenStack projects.

Dashboard

I have been working on the new zuul-web interfaces to replace the scheduler webapp so that we can scale out the REST endpoints and prevent direct connections to the scheduler. Here is a summary of the new interfaces:

  • /tenants.json : return the list of tenants,
  • /{tenant}/status.json : return the status of the pipelines,
  • /{tenant}/jobs.json : return the list of jobs defined, and
  • /{tenant}/builds.json : return the list of builds from the sql reporter.

Moreover, the new interfaces enable new use cases, for example, users can now:

  • Get the list of available jobs and their description,
  • Check the results of post and periodic jobs, and
  • Dynamically list jobs' results using filters, for example, the last tripleo periodic jobs can be obtained using:
$ curl "${TENANT_URL}/builds.json?project=tripleo&pipeline=periodic" | python -mjson.tool
[
    {
        "change": 0,
        "patchset": 0,
        "id": 16,
        "job_name": "periodic-tripleo-ci-centos-7-ovb-ha-oooq",
        "log_url": "https://logs.openstack.org/periodic-tripleo-ci-centos-7-ovb-ha-oooq/2cde3fd/",
        "pipeline": "periodic",
		...
    },
    ...
]

OpenStack health

The openstack-health service is likely to be modified to better interface with the new Zuul design. It is currently connected to an internal gearman bus to receive job completion events before running the subunit2sql process.

This processing could be rewritten as a post playbook to do the subunit processing as part of the job. Then the data could be pushed to the SQL server with the credentials stored in a Zuul secret.

Roadmap

On the last day, even though most of us were exhausted, we spent some time discussing the roadmap for the upcoming months. While the roadmap is still being defined, here are some highlights:

  • Based on new users' walkthroughs, the documentation will be greatly improved. For example, see this nodepool contribution.
  • Jobs will be able to return structured data to improve the reporting. For example, a pypi publisher may return the published url. Similarly, an rpm-build job may return the repository url.
  • Dashboard web interface and javascript tooling,
  • Admin interface to manually trigger unique build or cancel a buildset,
  • Nodepool quota to improve performances,
  • Cross-source dependencies, for example a github change in Ansible could depend on a gerrit change in shade,
  • More Nodepool drivers such as Kubernetes or AWS, and
  • Fedmsg and mqtt zuul drivers for message bus reporting and trigger sources.

In conclusion, the ZuulV3 efforts were extremely fruitful and this article only covers a few of the design sessions. Once again, we have made great progress and I'm looking forward to further developments. Thank you all for the great team gathering event!

by tristanC at September 19, 2017 12:42 PM

September 18, 2017

RHELblog

Evaluating Total Cost of Ownership of the Identity Management Solution

Increasing Interest in Identity Management

During the last several months I’ve seen a rapid growth of interest in Red Hat’s Identity Management (IdM) solution. This might have been due to different reasons.

  • First of all, IdM has become much more mature and well known. In the past, you would come to a conference and talk about FreeIPA (the community version of IdM) and IdM, and a lot of people in the audience had never heard about it. That is no longer the case. IdM, as a solution, is well known now. There are thousands of deployments all over the world, using both Red Hat supported and community bits. Many projects and open source communities have implemented integration with it as an identity back end. It is no surprise that customers who are looking for a good, cost effective identity management solution are now aware of it and start considering it. This leads to questions, calls, face-to-face meetings and presentations.
  • Another reason is that the IdM/FreeIPA project has been keeping an ear to the ground and was quick to adjust its plans and implement features in response to some of the tightening regulations in different verticals. Let us, for example, consider the government space. Over the last couple of years, the policies became more strict, requiring a robust solution for two-factor authentication using CAC and PIV smart cards. IdM responded by adding support for smart card based authentication, making it easy to achieve compliance with the mentioned regulations.
  • Yet another reason is that more and more customers realize that moving to a modern Identity Management system is going to enable them to more quickly and easily transition into the age of hybrid cloud, taking advantage of both public and on-premises clouds like OpenStack, as well as the world of containers and container management platforms like OpenShift.

Software Costs

One of the main questions people ask when they hear about the IdM solution is: Is Identity Management in Red Hat Enterprise Linux free? It is. Identity Management in Red Hat Enterprise Linux is a component of the platform and not a separately licensable product. What does this mean? This means that you can install IdM on any Red Hat Enterprise Linux server system with a valid subscription and get support from Red Hat.

There are many solutions on the market that build business around identity management services and integration with Active Directory that are not free. They require extra cost and dip into your IT budget.  Red Hat’s IdM solution is different. It is available without extra upfront cost for the software itself.

Total Cost of Ownership

People who have done identity management projects in their lives would support me in the claim that Identity Management should not be viewed as a project. It should be viewed as a program. There can be different phases, but the mindset and budgeting should assume that Identity Management is an ongoing endeavor. And it is actually quite reasonable if you think about it. Identity Management software connects to actual people and workforce dynamics. As the workforce evolves, the Identity Management software reflects the changes: growth, re-orgs, acquisitions and spin-offs. No two identity management implementations are the same. The solution has to adapt to a long list of use cases and be capable of meeting the unique requirements of every deployment. On one hand, the solution has to work all the time, and on the other hand, its limits are constantly stretched.

During my visits, I also help to architect a solution if customers are interested in quick “on the fly” white-boarding suggestions. Such designs need to be taken with a grain of salt, as drive-by architecture usually considers the main technical requirements outlined during the discussion but does not consider hidden challenges and roadblocks that each organization has. So the suggested architecture should be viewed as a very rough draft and something to start thinking about rather than a precise blueprint that can be followed to the letter. After the first conversation it is recommended to read various publicly available materials. Red Hat documentation and man pages are good sources of information, as are the community project wikis for FreeIPA and SSSD. Identity Management documentation is very well maintained and regularly updated to reflect new changes or address reported issues.

In addition to reading documentation one can engage Red Hat professional services to help with a proof-of-concept or production deployment. Those services are priced per engagement. There are different pre-packaged offerings with the predefined results that you can purchase from Red Hat – just get in touch with your sales representative or technical account manager.

No matter what software you choose for your identity management solution, it makes sense to have someone on the vendor side who will be there to help with any issues and challenges you face, to connect you to experts and to reduce your downtime. Red Hat offers multiple tiers of support. One level includes a Technical Account Manager. More about the TAM program can be read here. Since Identity Management should be viewed as an ongoing process and effort, it makes sense to consider a TAM or the equivalent service from your vendor. Is it an extra cost? Yes, but it is independent from the solution you choose. It is just a good risk mitigation strategy that makes your money work for you with the best possible return.

As always your comments and feedback are very welcome.

by Dmitri Pal at September 18, 2017 02:19 PM

September 08, 2017

John Likes OpenStack

Debugging TripleO Ceph-Ansible Deployments

Starting in Pike it is possible to use TripleO to deploy Ceph in containers using ceph-ansible. This is a guide to help you if there is a problem. It asks questions, somewhat rhetorically, to help you track down the problem.

What does this error from openstack overcloud deploy... mean?

If TripleO's new Ceph deployment fails, then you'll see an error like the following:


Stack overcloud CREATE_FAILED

overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution:
resource_type: OS::Mistral::ExternalResource
physical_resource_id: bb9e685c-fbe9-4573-8d74-2c053bc5de0d
status: CREATE_FAILED
status_reason: |
resources.WorkflowTasks_Step2_Execution: ERROR
Heat Stack create failed.

TripleO installs the OS and configures networking and other base services for OpenStack for the nodes during step 1 of its five-step deployment. During step 2, a Heat resource of the new type OS::Mistral::ExternalResource is created, which calls a new Mistral workflow which uses a new Mistral action to call an Ansible playbook. The playbook that is called is site-docker.yml.sample from ceph-ansible. Giulio covers this in more detail in Understanding ceph-ansible in TripleO. The above error message indicates that Heat was able to call Mistral, but that the Mistral workflow failed. So, the next place to look is the Mistral logs on the undercloud to see if the ceph-ansible site-docker.yml playbook ran.

Did the ceph-ansible playbook run?

The most helpful file for debugging TripleO ceph-ansible deployments is:


/var/log/mistral/ceph-install-workflow.log
If it doesn't exist or is empty, then the ceph-ansible playbook run did not happen.

If it does exist, then it's the key to solving the problem! Read it as it will contain the output of the ceph-ansible run which you can use to debug ceph-ansible as you normally would. The ceph-ansible docs should help. Once you think the environment has been changed so that you won't have the problem (details on that below), then re-run the `openstack overcloud deploy ...` command, and after TripleO does its normal checks, it will re-run the playbook. Because ceph-ansible and TripleO are idempotent, this process may be repeated as necessary.

Why didn't the ceph-ansible playbook run?

The following will show the playbook call to ceph-ansible:


cd /var/log/mistral/
grep site-docker.yml.sample executor.log | grep ansible-playbook

If there's an error during the playbook run, then it should look something like this...


2017-09-06 12:13:22.181 20608 ERROR mistral.executors.default_executor Command:
ansible-playbook -v /usr/share/ceph-ansible/site-docker.yml.sample --user tripleo-admin --become ...

If you don't see a playbook call like the above, then the Mistral tasks that set up the environment for a ceph-ansible run failed.

What does Mistral do to prepare the environment to run ceph-ansible?

A copy of the Mistral workbook which prepares the overcloud and undercloud to run ceph-ansible, and then runs it, is in:


/usr/share/tripleo-common/workbooks/ceph-ansible.yaml

The Mistral tasks do the following:

  • Configure the SSH key-pairs so the undercloud can run ansible tasks on the overcloud nodes as the tripleo-admin user
  • Create a temporary fetch directory for ceph-ansible to use to copy configs between overcloud nodes
  • Build a temporary Ansible inventory in a file like /tmp/ansible-mistral-actionSYRh6Q/inventory.yaml
  • Set the Ansible fork count to the number of nodes (but not >100).
  • Run the ceph-ansible site-docker.yml.sample playbook
  • Clean up temporary files

To check the details of the Mistral tasks used by ceph-ansible, extract the workflow's UUID with the following:


WORKFLOW='tripleo.storage.v1.ceph-install'
UUID=$(mistral execution-list | grep $WORKFLOW | awk {'print $2'} | tail -1)

Then use the ID to examine each task:


for TASK_ID in $(mistral task-list $UUID | awk {'print $2'} | egrep -v 'ID|^$'); do
mistral task-get $TASK_ID
mistral task-get-result $TASK_ID | jq . | sed -e 's/\\n/\n/g' -e 's/\\"/"/g'
done

If you really need to update the workbook itself, you can modify a copy and upload it with the following, but please see if your problem can instead be solved by simply overriding the default values in a Heat environment file as per the documentation.


source ~/stackrc
cp /usr/share/tripleo-common/workbooks/ceph-ansible.yaml .
vi ceph-ansible.yaml
mistral workbook-update ceph-ansible.yaml

I already know ceph-ansible; how do I edit the files in group_vars?

Please don't. It will break the TripleO integration. Instead, please use TripleO as usual, and override the default values in a Heat environment file like ceph.yaml, which you then add to your openstack overcloud deploy command with -e, as described in the documentation.
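For instance, a small ceph.yaml overriding a couple of ceph-ansible variables might look like the sketch below; the parameter names follow the TripleO ceph-ansible integration, but treat the values as placeholders and check the documentation for the keys your version supports:

cat > ceph.yaml <<'EOF'
parameter_defaults:
  CephConfigOverrides:
    journal_size: 512
  CephAnsibleDisksConfig:
    devices:
      - /dev/vdb
      - /dev/vdc
    osd_scenario: collocated
EOF
openstack overcloud deploy --templates ... -e ceph.yaml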

What changes does the TripleO ceph-ansible integration make to the files in ceph-ansible's group_vars?

None. Instead, YAQL within tripleo-heat-templates builds a Mistral environment which the ceph-ansible.yaml Mistral workbook may access when it calls ceph-ansible. The workbook then passes those parameters as JSON with the ansible-playbook command's --extra-vars option. To see what parameters were passed using this method, grep the executor.log as above to see the ceph-ansible playbook call. The sample file, site-docker.yml.sample, is called because that file is shipped by ceph-ansible. This allows TripleO to not need to maintain its own ceph-ansible fork.

What does a usual ceph-ansible playbook call look like when run by TripleO?


ansible-playbook -v /usr/share/ceph-ansible/site-docker.yml.sample
--user tripleo-admin
--become
--become-user root
--extra-vars
{"monitor_secret": "***",
"ceph_conf_overrides":
{"global": {"osd_pool_default_pg_num": 32,
"osd_pool_default_size": 1}},
"osd_scenario": "non-collocated",
"fetch_directory": "/tmp/file-mistral-action3_a1Cb",
"user_config": true,
"ceph_docker_image_tag": "tag-build-master-jewel-centos-7",
"ceph_release": "jewel",
"containerized_deployment": true,
"public_network": "192.168.24.0/24",
"copy_admin_key": false,
"journal_collocation": false,
"monitor_interface": "eth0",
"admin_secret": "***",
"raw_journal_devices": ["/dev/vdd", "/dev/vdd"],
"keys": [{"mon_cap": "allow r",
"osd_cap": "allow class-read object_prefix rbd_children, allow rwx pool=volumes, ... ],
"openstack_keys": [{"mon_cap": "allow r", ... ],
"generate_fsid": false,
"osd_objectstore": "filestore",
"monitor_address_block": "192.168.24.0/24",
"ntp_service_enabled": false,
"ceph_docker_image": "ceph/daemon",
"docker": true,
"fsid": "2d87a5e8-8e72-11e7-a223-003da9b9b610",
"journal_size": 256,
"cephfs_metadata": "manila_metadata",
"openstack_config": true,
"ceph_docker_registry": "docker.io",
"pools": [],
"cephfs_data": "manila_data",
"ceph_stable": true,
"devices": ["/dev/vdb", "/dev/vdc"],
"ceph_origin": "distro",
"openstack_pools": [
{"rule_name": "", "pg_num": 32, "name": "volumes"},
{"rule_name": "", "pg_num": 32, "name": "backups"},
{"rule_name": "", "pg_num": 32, "name": "vms"},
{"rule_name": "", "pg_num": 32, "name": "images"},
{"rule_name": "", "pg_num": 32, "name": "metrics"}],
"ip_version": "ipv4",
"ireallymeanit": "yes",
"cluster_network": "192.168.24.0/24",
"cephfs": "cephfs",
"raw_multi_journal": true
}
--forks 6
--ssh-common-args "-o StrictHostKeyChecking=no"
--ssh-extra-args "-o UserKnownHostsFile=/dev/null"
--inventory-file /tmp/ansible-mistral-actiontrguE1/inventory.yaml
--private-key /tmp/ansible-mistral-actiontrguE1/ssh_private_key
--skip-tags package-install,with_pkg

You can get an unformatted version of the above from a grep of /var/log/mistral/executor.log as described above.

How can I re-run only the ceph-ansible playbook?

Careful. This should not be done on a production deployment because if you re-run the Mistral deployment directly after getting the error posted under the first question, then the Heat Stack will not be updated. Thus, Heat will believe the OS::Mistral::ExternalResource resource has status CREATE_FAILED. If you are doing a practice deployment or development, then you can use Mistral's task-rerun. But this only works if the task has failed.

First get the Task ID


WORKFLOW='tripleo.storage.v1.ceph-install'
UUID=$(mistral execution-list | grep $WORKFLOW | awk {'print $2'} | tail -1)
mistral task-list $UUID | grep ERROR
For example:

(undercloud) [stack@undercloud workbooks]$ mistral task-list $UUID | grep ERROR
| 31257437-c877-40f8-872f-2576da89a8ea | ceph_install | tripleo.storage.v1.ceph-install | a5287f5c-f781-40cf-8fce-c56c21c52918 | ERROR | Failed to run action [act... | 2017-09-07 15:31:43 | 2017-09-07 15:31:46 |
(undercloud) [stack@undercloud workbooks]$
Then re-run the task

(undercloud) [stack@undercloud workbooks]$ mistral task-rerun 31257437-c877-40f8-872f-2576da89a8ea
+---------------+--------------------------------------+
| Field | Value |
+---------------+--------------------------------------+
| ID | 31257437-c877-40f8-872f-2576da89a8ea |
| Name | ceph_install |
| Workflow name | tripleo.storage.v1.ceph-install |
| Execution ID | a5287f5c-f781-40cf-8fce-c56c21c52918 |
| State | RUNNING |
| State info | None |
| Created at | 2017-09-07 15:31:43 |
| Updated at | 2017-09-08 16:24:04 |
+---------------+--------------------------------------+
(undercloud) [stack@undercloud workbooks]$

If you run the above and keep the following in another window:


tail -f /var/log/mistral/ceph-install-workflow.log
Then it's just like running `ansible-playbook site-docker.yaml ...` but you don't need to pass all of the --extra-vars because the same Mistral environment built by Heat is available.

by John (noreply@blogger.com) at September 08, 2017 03:38 PM

September 07, 2017

John Likes OpenStack

Make a NUMA-aware VM with virsh

Grégory showed me how he uses `virsh edit` on a VM to add something like the following:


<cpu mode='custom' match='exact' check='partial'>
<model fallback='allow'>SandyBridge</model>
<feature policy='force' name='vmx'/>
<numa>
<cell id='0' cpus='0-1' memory='4096000' unit='KiB'/>
<cell id='1' cpus='2-3' memory='4096000' unit='KiB'/>
</numa>
</cpu>

After that `lstopo` will show NUMA nodes you can use. E.g. if you want to start a process on your VM with `numactl`.


# lstopo-no-graphics
Machine (7999MB total)
NUMANode L#0 (P#0 3999MB)
Package L#0 + L3 L#0 (16MB) + L2 L#0 (4096KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
Package L#1 + L3 L#1 (16MB) + L2 L#1 (4096KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
NUMANode L#1 (P#1 4000MB)
Package L#2 + L3 L#2 (16MB) + L2 L#2 (4096KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
Package L#3 + L3 L#3 (16MB) + L2 L#3 (4096KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
Misc(MemoryModule)
HostBridge L#0
PCI 8086:7010
PCI 1013:00b8
GPU L#0 "card0"
GPU L#1 "controlD64"
3 x { PCI 1af4:1000 }
2 x { PCI 1af4:1001 }
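
With the NUMA topology visible, you can then pin a workload to one of the nodes, for example (hypothetical program name):

# Bind CPU and memory allocation to NUMA node 1 (CPUs 2-3 in the topology above)
numactl --cpunodebind=1 --membind=1 ./my-numa-sensitive-app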

by John (noreply@blogger.com) at September 07, 2017 03:32 PM

September 06, 2017

RDO Blog

Writing a SELinux policy from the ground up

SELinux is a mechanism that implements mandatory access controls in Linux systems. This article shows how to create a SELinux policy that confines a standard service:

  • Limit its network interfaces,
  • Restrict its system access, and
  • Protect its secrets.

Mandatory access control

By default, unconfined processes use discretionary access controls (DAC). A user has all the permissions over its objects, for example the owner of a log file can modify it or make it world readable.

In contrast, mandatory access control (MAC) enables more fine grained controls, for example it can restrict the owner of a log file to only append operations. Moreover, MAC can also be used to reduce the capability of a regular process, for example by denying debugging or networking capabilities.

This is great for system security, but it is also a powerful tool to control and better understand an application. Security policies reduce services' attack surface and describe service system operations in depth.

Policy module files

A SELinux policy is composed of:

  • A type enforcement file (.te): describes the policy type and access control,
  • An interface file (.if): defines functions available to other policies,
  • A file context file (.fc): describes the path labels, and
  • A package spec file (.spec): describes how to build and install the policy.

The packaging is optional but highly recommended since it's a standard method to distribute and install new pieces on a system.

Under the hood, these files are written using macro processors:

  • A policy file (.pp) is generated using: make NAME=targeted -f "/usr/share/selinux/devel/Makefile"
  • An intermediary file (.cil) is generated using: /usr/libexec/selinux/hll/pp

Policy development workflow:

The first step is to get the services running in a confined domain. Then we define new labels to better protect the service. Finally the service is run in permissive mode to collect the access it needs.

As an example, we are going to create a security policy for the scheduler service of the Zuul program.

Confining a Service

To get the basic policy definitions, we use the sepolicy generate command to generate a bootstrap zuul-scheduler policy:

sepolicy generate --init /opt/rh/rh-python35/root/bin/zuul-scheduler

The --init argument tells the command to generate a service policy. Other types of policy could be generated such as user application, inetd daemon or confined administrator.

The .te file contains:

  • A new zuul_scheduler_t domain,
  • A new zuul_scheduler_exec_t file label,
  • A domain transition from systemd to zuul_scheduler_t when the zuul_scheduler_exec_t is executed, and
  • Miscellaneous definitions such as the ability to read localization settings.

The .fc file contains regular expressions to match a file path with a label: /bin/zuul-scheduler is associated with zuul_scheduler_exec_t.

The .if file contains methods (macros) that enable role extension. For example, we could use the zuul_scheduler_admin method to authorize a staff role to administrate the zuul service. We won't use this file because the admin user (root) is unconfined by default and it doesn't need special permission to administrate the service.

To install the zuul-scheduler policy we can run the provided script:

$ sudo ./zuul_scheduler.sh
Building and Loading Policy
+ make -f /usr/share/selinux/devel/Makefile zuul_scheduler.pp
Creating targeted zuul_scheduler.pp policy package
+ /usr/sbin/semodule -i zuul_scheduler.pp

Restarting the service should show (using "ps Zax") that it is now running with the system_u:system_r:zuul_scheduler_t:s0 context instead of the system_u:system_r:unconfined_service_t:s0.

And looking at the audit.log, it should show many "avc: denied" errors because no permissions have yet been defined. Note that the service is running fine because this initial policy defines the zuul_scheduler_t domain as permissive.
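
A quick way to check both is something like the following; the unit name is hypothetical and the grep pattern is just an example:

sudo systemctl restart rh-python35-zuul-scheduler   # hypothetical unit name
ps Zax | grep zuul-scheduler                        # should show system_u:system_r:zuul_scheduler_t:s0
sudo ausearch -m avc -ts recent                     # lists the collected "avc: denied" events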

Before authorizing the service's access, let's define the zuul resources.

Define the service resources

The service is trying to access /etc/opt/rh/rh-python35/zuul and /var/opt/rh/rh-python35/lib/zuul, which inherited the etc_t and var_lib_t labels. Instead of giving zuul_scheduler_t access to etc_t and var_lib_t, we will create new types. Moreover, the zuul-scheduler manages secret keys that we can isolate from its general home directory, and it requires two TCP ports.

In the .fc file, define the new paths:

/var/opt/rh/rh-python35/lib/zuul/keys(/.*)?  gen_context(system_u:object_r:zuul_keys_t,s0)
/etc/opt/rh/rh-python35/zuul(/.*)?           gen_context(system_u:object_r:zuul_conf_t,s0)
/var/opt/rh/rh-python35/lib/zuul(/.*)?       gen_context(system_u:object_r:zuul_var_lib_t,s0)
/var/opt/rh/rh-python35/log/zuul(/.*)?       gen_context(system_u:object_r:zuul_log_t,s0)

In the .te file, declare the new types:

# System files
type zuul_conf_t;
files_type(zuul_conf_t)
type zuul_var_lib_t;
files_type(zuul_var_lib_t)
type zuul_log_t;
logging_log_file(zuul_log_t)

# Secret files
type zuul_keys_t;
files_type(zuul_keys_t)

# Network label
type zuul_gearman_port_t;
corenet_port(zuul_gearman_port_t)
type zuul_webapp_port_t;
corenet_port(zuul_webapp_port_t);

Note that the files_type() macro is important since it provides unconfined access to the new types. Without it, even the admin user could not access the file.

In the .spec file, add the new paths and set up the TCP port labels:

%define relabel_files() \
restorecon -R /var/opt/rh/rh-python35/lib/zuul/keys
...

# In the %post section, add
semanage port -a -t zuul_gearman_port_t -p tcp 4730
semanage port -a -t zuul_webapp_port_t -p tcp 8001

# In the %postun section, add
for port in 4730 8001; do semanage port -d -p tcp $port; done

Rebuild and install the package:

sudo ./zuul_scheduler.sh && sudo rpm -ivh ./noarch/*.rpm

Check that the new types are installed using "ls -Z" and "semanage port -l":

$ ls -Zd /var/opt/rh/rh-python35/lib/zuul/keys/
drwx------. zuul zuul system_u:object_r:zuul_keys_t:s0 /var/opt/rh/rh-python35/lib/zuul/keys/
$ sudo semanage port -l | grep zuul
zuul_gearman_port_t            tcp      4730
zuul_webapp_port_t             tcp      8001

Update the policy

With the service resources now declared, let's restart the service and start using it to collect all the access it needs.

After a while, we can update the policy using "./zuul_scheduler.sh --update", which basically does: "ausearch -m avc --raw | audit2allow -R". This collects all the denied permissions and generates type enforcement rules.
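
Doing the equivalent by hand would look roughly like this; the module name is just an example:

# Collect the recorded denials and build an updated policy module from them
sudo ausearch -m avc --raw | audit2allow -R -M zuul_scheduler_update
sudo semodule -i zuul_scheduler_update.pp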

We can repeat these steps until all the required accesses are collected.

Here's what the resulting zuul-scheduler rules look like:

allow zuul_scheduler_t gerrit_port_t:tcp_socket name_connect;
allow zuul_scheduler_t mysqld_port_t:tcp_socket name_connect;
allow zuul_scheduler_t net_conf_t:file { getattr open read };
allow zuul_scheduler_t proc_t:file { getattr open read };
allow zuul_scheduler_t random_device_t:chr_file { open read };
allow zuul_scheduler_t zookeeper_client_port_t:tcp_socket name_connect;
allow zuul_scheduler_t zuul_conf_t:dir getattr;
allow zuul_scheduler_t zuul_conf_t:file { getattr open read };
allow zuul_scheduler_t zuul_exec_t:file getattr;
allow zuul_scheduler_t zuul_gearman_port_t:tcp_socket { name_bind name_connect };
allow zuul_scheduler_t zuul_keys_t:dir getattr;
allow zuul_scheduler_t zuul_keys_t:file { create getattr open read write };
allow zuul_scheduler_t zuul_log_t:file { append open };
allow zuul_scheduler_t zuul_var_lib_t:dir { add_name create remove_name write };
allow zuul_scheduler_t zuul_var_lib_t:file { create getattr open rename write };
allow zuul_scheduler_t zuul_webapp_port_t:tcp_socket name_bind;

Once the service is no longer being denied permissions, we can remove the "permissive zuul_scheduler_t;" declaration and deploy it in production. To avoid issues, the domain can be set to permissive at first using:

$ sudo semanage permissive -a zuul_scheduler_t

Too long, didn't read

In short, to confine a service:

  • Use sepolicy generate
  • Declare the service's resources
  • Install the policy and restart the service
  • Use audit2allow

Here are some useful documents:

by tristanC at September 06, 2017 05:37 PM

September 05, 2017

John Likes OpenStack

Trick to test external ceph clusters using only tripleo-quickstart

TripleO can stand up a Ceph cluster as part of an overcloud. However, if all you have is a tripleo-quickstart env and want to test an overcloud feature which uses an external Ceph cluster, then you can have quickstart stand up two heat stacks, one to make a separate ceph cluster and the other to stand up an overcloud which uses that ceph cluster.

Deploy stand alone ceph cluster

I use deploy-ceph-only.sh with ceph-only.yaml, based on Giulio's example. I add `--stack ceph` to `openstack overcloud deploy ...` so that the Heat stack is not called "overcloud". You cannot rename a Heat stack.
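
The resulting deployment command looks roughly like this sketch (extra template and environment arguments omitted):

# Stand up the stand-alone ceph cluster as its own Heat stack named "ceph"
openstack overcloud deploy --templates \
  --stack ceph \
  -e ceph-only.yaml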

After deploying the ceph cluster, get the monitor node's IP (CephExternalMonHost), use `ceph auth list` to get the secret key for the client.openstack keyring (CephClientKey), and look at the ceph.conf to get the FSID (CephClusterFSID), so that overcloud-ceph-ansible-external.yaml may be updated accordingly.
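
On the deployed ceph cluster, those values can be pulled out with something like the following sketch, assuming the default /etc/ceph paths:

grep fsid /etc/ceph/ceph.conf            # CephClusterFSID
sudo ceph auth get-key client.openstack  # CephClientKey (the same key `ceph auth list` shows)
# The monitor IP to use for CephExternalMonHost is the monitor node address noted above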

Deploy an overcloud to use external ceph

I use deploy-ext-ceph.sh with overcloud-ceph-ansible-external.yaml. This uses changes in tripleo and ceph-ansible which are unmerged (at the time of writing).

Results


(undercloud) [stack@undercloud ceph-ansible]$ openstack server list
+--------------------------------------+-------------------------+--------+------------------------+----------------+--------------+
| ID | Name | Status | Networks | Image | Flavor |
+--------------------------------------+-------------------------+--------+------------------------+----------------+--------------+
| 28d57de8-8354-43e0-8d4e-46de33ea4672 | overcloud-controller-0 | BUILD | ctlplane=192.168.24.8 | overcloud-full | control |
| 298943dd-b3d2-4302-93fd-c45d8375ff16 | overcloud-novacompute-0 | BUILD | ctlplane=192.168.24.21 | overcloud-full | compute |
| f4d15186-775c-4cab-ae5d-c3fd48ecfccf | ceph-cephstorage-2 | ACTIVE | ctlplane=192.168.24.18 | overcloud-full | ceph-storage |
| 24da4c0f-f945-4489-bdeb-eb9b2cf70bc0 | ceph-cephstorage-0 | ACTIVE | ctlplane=192.168.24.9 | overcloud-full | ceph-storage |
| 248eacd5-e0ae-47b2-a3a9-2b4f3d0dfa6c | ceph-cephstorage-1 | ACTIVE | ctlplane=192.168.24.15 | overcloud-full | ceph-storage |
| 5af9a2ae-3492-4874-b8ab-2de2f8530b60 | ceph-controller-0 | ACTIVE | ctlplane=192.168.24.6 | overcloud-full | control |
+--------------------------------------+-------------------------+--------+------------------------+----------------+--------------+
(undercloud) [stack@undercloud ceph-ansible]$ openstack stack list
+--------------------------------------+------------+----------------------------------+--------------------+----------------------+--------------+
| ID | Stack Name | Project | Stack Status | Creation Time | Updated Time |
+--------------------------------------+------------+----------------------------------+--------------------+----------------------+--------------+
| c016b71d-0c73-468d-bed5-baf26d88ea23 | overcloud | d8e1f76b116f467cbe9e60b6c91c80b3 | CREATE_IN_PROGRESS | 2017-09-05T14:30:02Z | None |
| 91370b74-41bd-4923-bacb-c24d98ca148f | ceph | d8e1f76b116f467cbe9e60b6c91c80b3 | CREATE_COMPLETE | 2017-09-05T14:11:04Z | None |
+--------------------------------------+------------+----------------------------------+--------------------+----------------------+--------------+
(undercloud) [stack@undercloud ceph-ansible]$

I had set up my virtual hardware by running `quickstart.sh -e @myconfigfile.yml` with myconfigfile.yml.

In this scenario I used puppet-ceph to deploy the ceph cluster and ceph-ansible to deploy the ceph-client, which is the reverse of a more popular scenario. All four combinations are possible, though the puppet-ceph method will be deprecated.

by John (noreply@blogger.com) at September 05, 2017 02:44 PM

September 01, 2017

RDO Blog

RDO Pike released

The RDO community is pleased to announce the general availability of the RDO build for OpenStack Pike for RPM-based distributions, CentOS Linux 7 and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Pike is the 16th release from the OpenStack project, which is the work of more than 2300 contributors from around the world (source).

The release is making its way out to the CentOS mirror network, and should be on your favorite mirror site momentarily.

The RDO community project curates, packages, builds, tests and maintains a complete OpenStack component set for RHEL and CentOS Linux and is a member of the CentOS Cloud Infrastructure SIG. The Cloud Infrastructure SIG focuses on delivering a great user experience for CentOS Linux users looking to build and maintain their own on-premise, public or hybrid clouds.

All work on RDO, and on the downstream release, Red Hat OpenStack Platform, is 100% open source, with all code changes going upstream first.

New and Improved

Interesting things in the Pike release include:

Added/Updated packages

The following packages and services were added or updated in this release:

  • Kuryr and Kuryr-kubernetes: an integration between OpenStack and Kubernetes networking.
  • Senlin: a clustering service for OpenStack clouds.
  • Shade: a simple client library for interacting with OpenStack clouds, used by Ansible among others.
  • python-pankoclient: a client library for the event storage and REST API for Ceilometer.
  • python-scciclient: a ServerView Common Command Interface Client Library, for the FUJITSU iRMC S4 - integrated Remote Management Controller.

Other additions include:

Python Libraries

  • os-xenapi
  • ovsdbapp (deps)
  • python-daiquiri (deps)
  • python-deprecation (deps)
  • python-exabgp
  • python-json-logger (deps)
  • python-netmiko (deps)
  • python-os-traits
  • python-paunch
  • python-scciclient
  • python-scrypt (deps)
  • python-sphinxcontrib-actdiag (deps) (pending)
  • python-sphinxcontrib-websupport (deps)
  • python-stestr (deps)
  • python-subunit2sql (deps)
  • python-sushy
  • shade (SDK)
  • update XStatic packages (update)
  • update crudini to 0.9 (deps) (update)
  • upgrade liberasurecode and pyeclib libraries to 1.5.0 (update) (deps)

Tempest Plugins

  • python-barbican-tests-tempest
  • python-keystone-tests-tempest
  • python-kuryr-tests-tempest
  • python-patrole-tests-tempest
  • python-vmware-nsx-tests-tempest
  • python-watcher-tests-tempest

Puppet-Modules

  • puppet-murano
  • puppet-veritas_hyperscale
  • puppet-vitrage

OpenStack Projects

  • kuryr
  • kuryr-kubernetes
  • openstack-glare
  • openstack-panko
  • openstack-senlin

OpenStack Clients

  • mistral-lib
  • python-glareclient
  • python-pankoclient
  • python-senlinclient

Contributors

During the Pike cycle, we started the EasyFix initiative, which has resulted in several new people joining our ranks. These include:

  • Christopher Brown
  • Anthony Chow
  • T. Nicole Williams
  • Ricardo Arguello

But, we wouldn't want to overlook anyone. Thank you to all 172 contributors who participated in producing this release:

Aditya Prakash Vaja, Alan Bishop, Alan Pevec, Alex Schultz, Alexander Stafeyev, Alfredo Moralejo, Andrii Kroshchenko, Anil, Antoni Segura Puimedon, Arie Bregman, Assaf Muller, Ben Nemec, Bernard Cafarelli, Bogdan Dobrelya, Brent Eagles, Brian Haley, Carlos Gonçalves, Chandan Kumar, Christian Schwede, Christopher Brown, Damien Ciabrini, Dan Radez, Daniel Alvarez, Daniel Farrell, Daniel Mellado, David Moreau Simard, Derek Higgins, Doug Hellmann, Dougal Matthews, Edu Alcañiz, Eduardo Gonzalez, Elise Gafford, Emilien Macchi, Eric Harney, Eyal, Feng Pan, Frederic Lepied, Frederic Lepied, Garth Mollett, Gaël Chamoulaud, Giulio Fidente, Gorka Eguileor, Hanxi Liu, Harry Rybacki, Honza Pokorny, Ian Main, Igor Yozhikov, Ihar Hrachyshka, Jakub Libosvar, Jakub Ruzicka, Janki, Jason E. Rist, Jason Joyce, Javier Peña, Jeffrey Zhang, Jeremy Liu, Jiří Stránský, Johan Guldmyr, John Eckersberg, John Fulton, John R. Dennis, Jon Schlueter, Juan Antonio Osorio, Juan Badia Payno, Julie Pichon, Julien Danjou, Karim Boumedhel, Koki Sanagi, Lars Kellogg-Stedman, Lee Yarwood, Leif Madsen, Lon Hohberger, Lucas Alvares Gomes, Luigi Toscano, Luis Tomás, Luke Hinds, Martin André, Martin Kopec, Martin Mágr, Matt Young, Matthias Runge, Michal Pryc, Michele Baldessari, Mike Burns, Mike Fedosin, Mohammed Naser, Oliver Walsh, Parag Nemade, Paul Belanger, Petr Kovar, Pradeep Kilambi, Rabi Mishra, Radomir Dopieralski, Raoul Scarazzini, Ricardo Arguello, Ricardo Noriega, Rob Crittenden, Russell Bryant, Ryan Brady, Ryan Hallisey, Sarath Kumar, Spyros Trigazis, Stephen Finucane, Steve Baker, Steve Gordon, Steven Hardy, Suraj Narwade, Sven Anderson, T. Nichole Williams, Telles Nóbrega, Terry Wilson, Thierry Vignaud, Thomas Hervé, Thomas Morin, Tim Rozet, Tom Barron, Tony Breeds, Tristan Cacqueray, afazekas, danpawlik, dnyanmpawar, hamzy, inarotzk, j-zimnowoda, kamleshp, marios, mdbooth, michaelhenkel, mkolesni, numansiddique, pawarsandeepu, prateek1192, ratailor, shreshtha90, vakwetu, vtas-hyperscale-ci, yrobla, zhangguoqing, Vladislav Odintsov, Xin Wu, XueFengLiu, Yatin Karel, Yedidyah Bar David, adriano petrich, bcrochet, changzhi, diana, djipko, dprince, dtantsur, eggmaster, eglynn, elmiko, flaper87, gpocentek, gregswift, hguemar, jason guiditta, jprovaznik, mangelajo, marcosflobo, morsik, nmagnezi, sahid, sileht, slagle, trown, vkmc, wes hayutin, xbezdick, zaitcev, and zaneb.

Getting Started

There are three ways to get started with RDO.

  • To spin up a proof of concept cloud, quickly, and on limited hardware, try an All-In-One Packstack installation. You can run RDO on a single node to get a feel for how it works.
  • For a production deployment of RDO, use the TripleO Quickstart and you'll be running a production cloud in short order.
  • Finally, if you want to try out OpenStack, but don't have the time or hardware to run it yourself, visit TryStack, where you can use a free public OpenStack instance, running RDO packages, to experiment with the OpenStack management interface and API, launch instances, configure networks, and generally familiarize yourself with OpenStack. (TryStack is not, at this time, running Pike, although it is running RDO.)

Getting Help

The RDO Project participates in a Q&A service at ask.openstack.org, for more developer-oriented content we recommend joining the rdo-list mailing list. Remember to post a brief introduction about yourself and your RDO story. You can also find extensive documentation on the RDO docs site.

The #rdo channel on Freenode IRC is also an excellent place to find help and give help.

We also welcome comments and requests on the CentOS mailing lists and the CentOS and TripleO IRC channels (#centos, #centos-devel, and #tripleo on irc.freenode.net), however we have a more focused audience in the RDO venues.

Getting Involved

To get involved in the OpenStack RPM packaging effort, see the RDO community pages and the CentOS Cloud SIG page. See also the RDO packaging documentation.

Join us in #rdo on the Freenode IRC network, and follow us at @RDOCommunity on Twitter. If you prefer Facebook, we're there too, and also Google+.

by Rich Bowen at September 01, 2017 03:21 PM

August 31, 2017

RDO Blog

Video interviews at the Denver PTG (Sign up now!)

TL;DR: Sign up here for the video interviews at the PTG in Denver next month.

Earlier this year, at the PTG in Atlanta I did video interviews with some of the Red Hat engineering who were there.

You can see these videos on the RDO YouTube channel.

Or you can see the teaser video here:

This year, I'll be expanding that to everyone - not just Red Hat - to emphasize the awesome cooperation and collaboration that happens across projects, and across companies.

If you'll be at the PTG, please consider signing up to talk to me about your project. I'll be conducting interviews starting on Tuesday morning, and you can sign up here

Please see the "planning for your interview" tab of that spreadsheet for the answers to all of your questions about the interviews. Or contact me directly at rbowen AT red hat DOT com if you have more questions.

by Rich Bowen at August 31, 2017 05:29 PM

Red Hat Stack

OpenStack Summit Sydney preview: Red Hat to present at more than 40 sessions

The next OpenStack Summit will take place in Sydney, Australia, November 6-8. And despite the fact that the conference will only run three days instead of the usual four, there will be plenty of opportunities to learn about OpenStack from Red Hat’s thought leaders.

Red Hatters will be presenting or co-presenting at more than 40 breakout sessions, sponsored track sessions, lightning talks, demos, and panel discussions. Just about every OpenStack topic, from various services to NFV solutions to day-2 management to container integration, will be covered.

In addition, as a premier sponsor, we’ll have a large presence in the OpenStack Marketplace. Visit us at booth B1 where you can learn about our products and services, speak to RDO and other community leaders about upstream projects, watch Red Hat product demos from experts, and score some pretty cool swag. There will also be many Red Hat partners with booths throughout the exhibit hall, so you can speak with them about their OpenStack solutions with Red Hat.

Below is a schedule of sessions Red Hatters will be presenting at. Click on each title link to find more information about that session or to add it to your Summit schedule. We’ll add our sponsored track sessions (seven more) in the near future.

If you haven’t registered for OpenStack Summit yet, feel free to use our discount for 10% off of your registration price. Just use the code: REDHAT10.

Hope to see you there!

 

Monday, November 6

Title Presenters Time
Upstream bug triage: the hidden gem? Sylvain Bauza and Stephen Finucane 11:35-12:15
The road to virtualization: highlighting the unique challenges faced by telcos Anita Tragler, Andrew Harris, and Greg Smith (Juniper Networks) 11:35-12:15
Will the real public clouds, please SDK up. OpenStack in the native prog lang of your choice. Monty Taylor, David Flanders (University of Melbourne), and Tobias Rydberg (City Network Hosting AB) 11:35-12:15
OpenStack: zero to hero Keith Tenzer 1:30-1:40
Panel: experiences scaling file storage with CephFS and OpenStack Gregory Farnum, Sage Weil, Patrick Donnelly, and Arne Wiebalck (CERN) 1:30-2:10
Keeping it real (time) Stephen Finucane and Sylvain Bauza 2:15-2:25
Multicloud requirements and implementations: from users, developers, service providers Mark McLoughlin, Jay Pipes (Mirantis), Kurt Garloff (T-Systems International GmbH), Anni Lai (Huawei), and Tim Bell (CERN) 2:20-3:00
How Zanata powers upstream collaboration with OpenStack internationalization Alex Eng, Patrick Huang, and Ian Y. Choi (Fuse) 2:20-3:00
CephFS: now fully awesome (what is the impact of CephFS on the OpenStack cloud) Andrew Hatfield, Ramana Raja, and Victoria Martinez de la Cruz 2:30-2:40
Putting OpenStack on Kubernetes: what tools can we use? Flavio Percoco 4:20-5:00
Achieving zen-like bliss with Glance Erno Kuvaja, Brian Rosmaita (Verizon), and Abhishek Kekane (NTT Data) 5:10-5:50
Migrating your job from Jenkins Job Builder to Ansible Playbooks, a Zuulv3 story Paul Belanger 5:10-5:50
Warp-speed Open vSwitch: turbo-charge VNFs to 100Gbps in next-gen SDN/NFV datacenter Anita Tragler, Ash Bhalgat, and Mark Iskra (Nokia) 5:50-6:00


Tuesday, November 7

Title Presenters Time
ETSI NFV specs’ requirements vs. OpenStack reality Frank Zdarsky and Gergely Csatari (Nokia) 9:00-9:40
Monitoring performance of your OpenStack environment Matthias Runge 9:30-9:40
OpenStack compliance speed and agility: yes, it’s possible Keith Basil and Shawn Wells 9:50-10:30
Operational management: how is it really done, and what should OpenStack do about it Anandeep Pannu 10:20-10:30
Creating NFV-ready containers with kuryr-kubernetes Antoni Segura Puimedon and Kirill Zaitsev (Samsung) 10:50-11:30
Encryption workshop: using encryption to secure your cloud Ade Lee, Juan Osorio Robles, Kaitlin Farr (Johns Hopkins University), and Dave McCowan (Cisco) 10:50-12:20
Neutron-based networking in Kubernetes using Kuryr – a hands-on lab Sudhir Kethamakka, Geetika Batra, and Amol Chobe (JP Morgan Chase) 10:50-12:20
A Telco story of OpenStack success Krzysztof Janiszewski, Darin Sorrentino, and Dimitar Ivanov (TELUS) 1:50-2:30
Turbo-charging OpenStack for NFV workloads Ajay Simha, Vinay Rao, and Ian Wells (Cisco) 3:20-3:30
Windmill 101: Ansible-based deployments for Zuul / Nodepool Paul Belanger and Ricardo Carrillo Cruz 3:20-4:50
Simpler encrypted volume management with Tang Nathaniel McCallum and Ade Lee 3:50-4:00
Deploying multi-container applications with Ansible service broker Eric Dube and Todd Sanders 5:00-5:40

 

Wednesday, November 8

Title Presenters Time
OpenStack: the perfect virtual infrastructure manager (VIM) for a virtual evolved packet core (vEPC) Julio Villarreal Pelegrino and Rimma Iontel 9:00-9:40
Questions to make your storage vendor squirm Gregory Farnum 9:50-10:30
Bringing worlds together: designing and deploying Kubernetes on an OpenStack multi-site environment Roger Lopez and Julio Villarreal Pelegrino 10:20-10:30
DMA (distributed monitoring and analysis): monitoring practice and lifecycle management for Telecom Tomofumi Hayashi, Yuki Kasuya (KDDI) and Toshiaki Takahashi (NEC) 1:50-2:00
Standing up and operating a container service on top of OpenStack using OpenShift Dan McPherson, Ata Turk (MOC), and Robert Baron (Boston University) 1:50-2:30
Why are you not a mentor in the OpenStack community yet? Rodrigo Duarte Sousa, Raildo Mascena, and Telles Nobrega 1:50-2:30
What the heck are DHSS driver modes in OpenStack Manila? Tom Barron, Rodrigo Barbieri, and Goutham Pacha Ravi (NetApp) 1:50-2:30
Adding Cellsv2 to your existing Nova deployment Dan Smith 3:30-4:10
What’s your workflow? Daniel Mellado and David Paterson (Dell) 3:30-4:10
Glance image import is here…now it’s time to start using it! Erno Kuvaja and Brian Rosmaita (Verizon) 4:30-5:10

 

Red Hat’s Sponsored Breakout Track – Level 3, Room 3.3

Deploying OpenStack at scale with TripleO, Ansible, and Containers Steven Hardy Monday, 11:35-12:15
Automated NFV deployment and management with TripleO Arkady Kanevsky (Dell, Inc.), Chris Janiszewski Tuesday, 9:00-9:40
Delivering OpenStack and Ceph in containers Federico Lucifredi, Sebastien Han, Andrew Hatfield Tuesday, 9:50-10:30
Virtualized Central Office Azhar Sayeed Tuesday, 10:50-11:30
OpenStack upgrade strategy Maria Bracho Tuesday, 11:40-12:20
Lessons learned: changing the way we think about customers, with IAG Labs Eddie Satterly (IAG) Tuesday, 1:50-2:30
SDN fundamentals and best practices for NFV in OpenStack, Kubernetes, and containers James Kelly (Juniper Networks) Tuesday, 3:20-4:00
Leveraging the public cloud for integration with your hybrid cloud Mike Bursell Tuesday, 4:10-4:50

 

by Peter Pawelski, Product Marketing Manager, Red Hat OpenStack Platform at August 31, 2017 01:18 PM

August 30, 2017

OpenStack In Production (CERN)

Scheduled snapshots

While most of the machines on the CERN cloud are configured using Puppet with state stored in external databases or file stores, there are a few machines where this has been difficult, especially for legacy applications.

Doing a regular snapshot of these machines would be a way of protecting against failure scenarios such as hypervisor failure or disk corruptions.

This could always be scripted by the project administrator using the standard functions of the openstack client, but that would involve setting up the schedules and credentials outside the cloud, and would require the appropriate skills from the project administrators. Since it is a common request, the CERN cloud investigated how this could be done as part of the standard cloud offering.

The approach that we have taken uses the Mistral project to execute the appropriate workflows at a scheduled time. The CERN cloud is running a mixture of OpenStack Newton and Ocata, but we used the Mistral Pike release in order to have the latest set of fixes, such as those in the cron triggers. With the RDO packages coming out in the same week as the upstream release, this avoided doing an upgrade later.

Mistral has a set of terms which explain the different parts of a workflow (https://docs.openstack.org/mistral/latest/terminology).

The approach needed several steps
  • Mistral tasks to define the steps
  • Mistral workflows to provide the order to perform the steps in
  • Mistral cron triggers to execute the steps on schedule

Mistral Workflows

The Mistral workflows consist of a set of tasks and a process which decides which task to execute next based on different branch criteria such as success of a previous task or the value of some cloud properties.

Workflows can be private to the project, shared or public. By making these scheduled snapshot workflows public, the cloud administrators can improve the tasks incrementally and the cloud projects will receive the latest version of the workflow next time they execute them. With the CERN gitlab based continuous integration environment, the workflows are centrally maintained and then pushed to the cloud when the test suites have completed successfully.

The following Mistral workflows were defined

instance_snapshot

Virtual machines can be snapshotted so that a copy of the virtual machine is saved and can be used for recovery or cloning in the future. The instance_snapshot workflow performs this operation for virtual machines booted either from volume or locally.

The workflow parameters are:

  • instance: The name of the instance to be snapshotted. Mandatory.
  • pattern: The name of the snapshot to store. The text {0} is replaced by the instance name and {1} by the date in the format YYYYMMDDHHMM. Default: {0}_snapshot_{1}
  • max_snapshots: The number of snapshots to keep. Older snapshots are cleaned from the store when new ones are created. Default: 0 (i.e. keep all)
  • wait: Only complete the workflow when the steps have been completed and the snapshot is stored in the image storage. Default: false
  • instance_stop: Shut the instance down before snapshotting and boot it up afterwards. Default: false (i.e. do not stop the instance)
  • to_addr_success: e-mail address to send the report to if the workflow is successful. Default: null (i.e. no mail sent)
  • to_addr_error: e-mail address to send the report to if the workflow failed. Default: null (i.e. no mail sent)

The steps for this workflow are described in the detail in the YAML/YAQL files at https://gitlab.cern.ch/cloud-infrastructure/mistral-workflows.

The operation is very fast with Ceph-based boot-from-volume instances since the snapshot is done within Ceph. It can, however, take up to a minute for locally booted VMs while the hypervisor ensures the complete disk contents are available. The VM is resumed and the locally booted snapshot is then sent to Glance in the background.

The high-level steps are as follows (a simplified Mistral sketch follows the list):

  • Identify the server
  • Stop the instance if requested by instance_stop
  • If the VM is locally booted:
      • Snapshot the instance
      • Clean up the oldest image snapshot if over max_snapshots
  • If the VM is booted from volume:
      • Snapshot the volume
      • Clean up the oldest volume snapshot if over max_snapshots
  • Start the instance if requested by instance_stop
  • If there is an error and to_addr_error is set:
      • Send an e-mail to to_addr_error
  • If there is no error and to_addr_success is set:
      • Send an e-mail to to_addr_success
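As an illustration only (this is a simplified sketch, not the actual CERN workflow; the real definitions live in the gitlab repository referenced above), a Mistral v2 workflow that snapshots a locally booted instance could look roughly like this:

    ---
    version: '2.0'

    simple_instance_snapshot:
      description: Simplified sketch of an instance snapshot workflow
      input:
        - instance                 # name of the instance to snapshot
      tasks:
        find_server:
          # Look up the server by name using the auto-generated nova action
          action: nova.servers_find name=<% $.instance %>
          publish:
            server_id: <% task(find_server).result.id %>
          on-success:
            - snapshot_server
        snapshot_server:
          # Ask Nova to create a Glance image from the running instance
          action: nova.servers_create_image server=<% $.server_id %> image_name=<% $.instance %>_snapshot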

restore_clone_snapshot

For applications which are not highly available, a common configuration is using a LanDB alias to a particular VM. In the event of a failure, the VM can be cloned from a snapshot and the LanDB alias updated to reflect the new endpoint location for the service. This workflow, called restore_clone_snapshot, will also create a volume if the source instance is booted from volume.

The source instance needs to be still defined since information such as the properties, flavor and availability zone are not included in the snapshot and these are propagated by default.

The workflow parameters are:

  • instance: The name of the instance from which the snapshot will be cloned. Mandatory.
  • date: The date of the snapshot to clone (either YYYYMMDD or YYYYMMDDHHMM). Mandatory.
  • pattern: The name of the snapshot to clone. The text {0} is replaced by the instance name and {1} by the date. Default: {0}_snapshot_{1}
  • clone_name: The name of the new instance to be created. Mandatory.
  • avz_name: The availability zone to create the clone in. Default: same as the source instance
  • flavor: The flavour for the cloned instance. Default: same as the source instance
  • meta: The properties to copy to the new instance. Default: all properties are copied from the source[1]
  • wait: Only complete the workflow when the steps have been completed and the cloned VM is running. Default: false
  • to_addr_success: e-mail address to send the report to if the workflow is successful. Default: null (i.e. no mail sent)
  • to_addr_error: e-mail address to send the report to if the workflow failed. Default: null (i.e. no mail sent)

Thus, cloning the machine timbfvlinux143 to timbfvclone143 requires running the workflow with the parameters

{"instance": "timbfvlinux143", "clone_name": "timbfvclone143", "date": "20170830"}
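For reference, and assuming the python-mistralclient OpenStack CLI plugin is installed (the exact subcommand syntax can differ between client releases), the same workflow could be launched from the command line as:

    openstack workflow execution create restore_clone_snapshot \
      '{"instance": "timbfvlinux143", "clone_name": "timbfvclone143", "date": "20170830"}'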

This results in

  • A new volume is created from the snapshot timbfvlinux143_snapshot_20170830
  • A new VM called timbfvclone143 is created, booted from the new volume

An instance clone can be run for VMs which are booted from volume even when the hypervisor is not running. A machine can then be recovered from its current state using the following procedure:

  • Take an instance snapshot of the original machine
  • Create an instance clone from that snapshot (using today's date)
  • If DNS aliases are used, update the alias to point to the new instance name

For Linux guests, the rename of the hostname to the clone name occurs as the machine is booted. In the CERN environment, this took a few minutes to create the new virtual machine and then up to 10 minutes to wait for the DNS refresh.

For Windows guests, it may be necessary to refresh the Active Directory information given the change of hostname.

restore_inplace_snapshot

In the event of an issue such as a bad upgrade, the administrator may wish to roll back to the last snapshot. This can be done using the restore_inplace_snapshot workflow.

This operation works for locally booted machines and maintains the IP and MAC address, but cannot be used if the hypervisor is down. It does not currently work for boot from volume until the revert-to-snapshot feature (available in Pike, https://specs.openstack.org/openstack/cinder-specs/specs/pike/cinder-volume-revert-by-snapshot.html) is in production.

The workflow parameters are:

  • instance: The name of the instance whose contents will be replaced from the snapshot. Mandatory.
  • date: The date of the snapshot to restore from (either YYYYMMDD or YYYYMMDDHHMM). Mandatory.
  • pattern: The name of the snapshot to restore from. The text {0} is replaced by the instance name and {1} by the date. Default: {0}_snapshot_{1}
  • wait: Only complete the workflow when the steps have been completed and the restored VM is running. Default: false
  • to_addr_success: e-mail address to send the report to if the workflow is successful. Default: null (i.e. no mail sent)
  • to_addr_error: e-mail address to send the report to if the workflow failed. Default: null (i.e. no mail sent)





Mistral Cron Triggers

Mistral has another nice feature: it can run a workflow at regular intervals. Compared to standard Unix cron, Mistral cron triggers use Keystone trusts to save the user token when the trigger is enabled, so the execution can run without needing credentials such as a password or a valid Kerberos token.

A cron trigger can be created via Horizon or the CLI with the following parameters:
  • Name: The name of the cron trigger. Example: Nightly Snapshot
  • Workflow ID: The name or UUID of the workflow. Example: instance_snapshot
  • Params: A JSON dictionary of the workflow parameters. Example: {"instance": "timbfvlinux143", "max_snapshots": 5, "to_addr_error": "theadmin@cern.ch"}
  • Pattern: A cron schedule pattern according to http://en.wikipedia.org/wiki/Cron. Example: 0 5 * * * (i.e. run daily at 5 a.m.)

This will then execute the instance snapshot at 5 a.m. each day, sending a mail to theadmin@cern.ch if the snapshot fails. The five most recent snapshots will be kept.
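For reference, the equivalent CLI call would look roughly like this (again assuming python-mistralclient; option names can vary between releases, and the trigger name is just an example):

    openstack cron trigger create nightly_snapshot instance_snapshot \
      '{"instance": "timbfvlinux143", "max_snapshots": 5, "to_addr_error": "theadmin@cern.ch"}' \
      --pattern "0 5 * * *"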

Mistral Executions

When Mistral runs a workflow, it records the steps executed and the timestamps for start and end, along with the results. Each step can be inspected individually as part of debugging and root cause analysis in the event of failures.

The Horizon interface makes it easy to select the failing tasks. Some tasks may be reported as 'error' but have subsequent actions which succeed, so an error step can be a normal part of a successful execution, for example falling back to a default when no value can be found.
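The same information is also available from the command line, for example (assuming python-mistralclient):

    # List recent workflow executions and their state
    openstack workflow execution list
    # Inspect the individual task executions of one workflow execution
    openstack task execution list <execution-id>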


Credits
  • Jose Castro Leon from the CERN IT cloud team implemented the Mistral deployment and the workflows described above.




[1] Except for a CERN specific one called landb-alias for a DNS alias

by Tim Bell (noreply@blogger.com) at August 30, 2017 03:26 PM

August 18, 2017

RDO Blog

Ada Lee: OpenStack Security, Barbican, Novajoin, TLS Everywhere in Ocata

Ada Lee talks about OpenStack Security, Barbican, Novajoin, and TLS Everywhere in Ocata, at the OpenStack PTG in Atlanta, 2017.

by Rich Bowen at August 18, 2017 07:54 PM

Assaf Muller

Octavia Developer Wanted

Edit: The position has been filled, thank you everyone.

I’m looking for a Software Engineer to join the Red Hat OpenStack Networking team. I am presently looking to hire in Europe, Israel and US East. The candidate may work from home or from one of the Red Hat offices. The team is globally distributed and comprised of talented, autonomous, empowered and passionate individuals with a healthy work/life balance. The candidate will work on OpenStack Octavia and LBaaS. The candidate will write and review code while working with upstream community members and fellow Red Hatters. If you want to do open source, Red Hat is objectively where it’s at. We have an institutional culture of open source at all levels and this has a ripple effect on your day to day and your career at the company.

Please email me CVs at assaf@redhat.com.

The ideal candidate is familiar with some or all of the following subjects:

  • Networking knowledge and terms such as L2, L3, ARP and IP
  • Cloud networking knowledge. For example VXLAN tunneling, network namespaces, OVS and haproxy
  • Familiarity with virtualization technology, cloud and infrastructure as a service and OpenStack in particular
  • Python
  • Bonus points for familiarity with Octavia

Responsibilities:

  • Write a bunch o’ code
  • Review code
  • Resolve bugs
  • Draft design documents
  • Implement features
  • Lead and participate in design discussions
  • Attend conferences
  • Improve our testing infrastructure
  • RPM packaging
  • Resolve customer issues

Required skills:

  • Bachelor's degree in Computer Science or equivalent
  • 3 years of significant software development experience
  • Excellent English verbal and written communication skills
  • Comfortable communicating and collaborating with upstream community members outside of the team and company

by assafmuller at August 18, 2017 11:33 AM

August 15, 2017

RDO Blog

Introducing opstools-ansible

Ansible

Ansible is an agentless, declarative configuration management tool. Ansible can be used to install and configure packages on a wide variety of targets. Targets are defined in an inventory file, and Ansible applies predefined actions to them. Actions are defined as playbooks, or sometimes as roles, in the form of YAML files. Details of Ansible can be found here.
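As a generic illustration (not taken from opstools-ansible), a playbook is just a YAML file describing which actions to apply to which inventory group:

    ---
    - hosts: logging_host          # inventory group the play targets
      become: true                 # escalate privileges for package installation
      tasks:
        - name: Ensure the fluentd package is installed
          package:
            name: fluentd
            state: present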

Opstools-ansible

The opstools-ansible project, hosted on GitHub, uses Ansible to configure an environment that provides the opstools services, namely centralized logging and analysis, availability monitoring, and performance monitoring.

One prerequisite to run opstools-ansible is that the servers have to be running CentOS 7 or RHEL 7 (or a compatible distribution).

Inventory file

The servers to be managed are defined in the inventory file; the reference structure in this file defines three high-level host groups:

  • am_hosts
  • pm_hosts
  • logging_host

There are lower-level host groups as well, but the documentation states that they are not tested.
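As a minimal illustration (the hostnames are placeholders), an inventory file assigning one server to each of the documented groups might look like this:

    [am_hosts]
    monitoring.example.com

    [pm_hosts]
    metrics.example.com

    [logging_host]
    logging.example.com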

Configuration File

Once the inventory file is defined, Ansible configuration files can be used to tailor the deployment to individual needs. The README.rst file for opstools-ansible suggests the following as an example:

fluentd_use_ssl: true
fluentd_shared_key: secret
fluentd_ca_cert: |
  -----BEGIN CERTIFICATE-----
  -----END CERTIFICATE-----
fluentd_private_key: |
  -----BEGIN RSA PRIVATE KEY-----
  -----END RSA PRIVATE KEY-----

If there is no Ansible configuration file to tune the system, the default settings/options are applied.

Playbooks and roles

The playbook specifies which packages Ansible installs for the opstools environment, i.e. the centralized logging and analysis, availability monitoring, and performance monitoring services mentioned above.

Besides those packages, the opstools-ansible playbook also applies these additional roles:

  • Firewall – this role manages the firewall rules for the servers.
  • Prereqs – this role checks and installs all the dependency packages such as python-netaddr or libselinux-python … etc. for the successful installation of opstools.
  • Repos - this is a collection of roles for configuring additional package repositories.
  • Chrony – this role installs and configures the NTP client to keep the time on each server in sync.

opstools environment

Once these are done, we can simply apply the following command to create the opstools environment:

    ansible-playbook playbook.yml -e @config.yml

TripleO Integration

TripleO (OpenStack on OpenStack) has the concept of an Undercloud and an Overcloud:

  • Undercloud: for deployment, configuration and management of OpenStack nodes.
  • Overcloud: the actual OpenStack cluster that is consumed by the user.

Red Hat has an in-depth blog post on TripleO, and OpenStack has this document on contributing to and installing TripleO.

When opstools is installed on the TripleO Undercloud, the OpenStack instances running on the Overcloud can be configured to use the opstools services when the Overcloud is deployed. For example:

    openstack overcloud deploy … \
      -e /usr/share/openstack-tripleo-heat-templates/environments/monitoring-environment.yaml \
      -e /usr/share/openstack-tripleo-heat-templates/environments/logging-environment.yaml \
      -e params.yaml

There are only three steps to integrate opstools with TripleO using opstools-ansible. Details of the steps can be found here.

  1. Use opstools-ansible to create the opstools environment at the Undercloud.
  2. Create the params.yaml for TripleO to point the Sensu and Fluentd agents at the opstools hosts (see the sketch after this list).
  3. Deploy with the "openstack overcloud deploy …" command.
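As a rough sketch only (the parameter names below are recalled from the tripleo-heat-templates of that era and should be checked against the referenced documentation; the addresses and password are placeholders), params.yaml might look like:

    parameter_defaults:
      # Sensu client settings: point the overcloud nodes at the opstools monitoring host
      MonitoringRabbitHost: 192.0.2.10
      MonitoringRabbitPassword: sensu_secret
      # Fluentd client settings: forward logs to the opstools logging host
      LoggingServers:
        - host: 192.0.2.11
          port: 24224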

by atc at August 15, 2017 04:02 PM

Daniel Berrange

ANNOUNCE: libosinfo 1.1.0 release

I am happy to announce a new release of libosinfo version 1.1.0 is now available, signed with key DAF3 A6FD B26B 6291 2D0E 8E3F BE86 EBB4 1510 4FDF (4096R). All historical releases are available from the project download page.

Changes in this release include:

  • Force UTF-8 locale for new glib-mkenums
  • Avoid python warnings in example program
  • Misc test suite updates
  • Fix typo in error messages
  • Remove ISO header string padding
  • Disable bogus gcc warning about unsafe loop optimizations
  • Remove reference to fedorahosted.org
  • Don’t hardcode /usr/bin/perl, use /usr/bin/env
  • Support eject-after-install parameter in OsinfoMedia
  • Fix misc warnings in docs
  • Fix error propagation when loading DB
  • Add usb.ids / pci.ids locations for FreeBSD
  • Don’t include private headers in gir/vapi generation

Thanks to everyone who contributed towards this release.

by Daniel Berrange at August 15, 2017 11:09 AM