Planet RDO

November 17, 2023

Lars Kellogg-Stedman

Applying custom configuration to Nginx Gateway Fabric

In this post, we take a look at how to apply custom Nginx configuration directives when you’re using the NGINX Gateway Fabric.

What’s the NGINX Gateway Fabric?

The NGINX Gateway Fabric is an implementation of the Kubernetes Gateway API.

What’s the Gateway API?

The Gateway API is an evolution of the Ingress API; it aims to provide a flexible mechanism for managing north/south network traffic (that is, traffic entering or exiting your Kubernetes cluster), with additional work to support east/west traffic (traffic between pods in your cluster).

What’s this about custom configuration?

I’ve deployed a local development cluster, and I wanted to be able to push images into an image registry hosted on the cluster. This requires (a) running a registry, which is easy, and (b) somehow exposing that registry outside the cluster, which is also easy unless you decide to make it more complex.

In this case, I decided that rather than running an Ingress provider I was going to start familiarizing myself with the Gateway API, so I deployed NGINX Gateway Fabric. My first attempt at pushing an image into the registry looked like this:

$ podman push --tls-verify=false example registry.apps.cluster1.house/example:latest
Getting image source signatures
Copying blob b9fe5313d237 done |
Copying blob cc2447e1835a done |
Copying blob cb8b0886acfb done |
Copying blob c4219a5645ea [===>----------------------------------] 9.3MiB / 80.2MiB | 372.7 MiB/s
Copying blob c6e5c62d1726 done |
Copying blob 9ee7eb11f876 done |
Copying blob f064c46326cb done |
Copying blob 9c45ffa2a02a done |
Copying blob 9a6c9897f309 done |
Copying blob 27a0dbb2828e done |
Error: writing blob: uploading layer chunked: StatusCode: 413, <html>
<head><title>413 Request Entity Too Large<...

Nginx, by default, restricts the maximum size of a request body to 1m, which is to say, 1 megabyte. You can increase (or remove) this limit by setting the client_max_body_size parameter…but how do you do this in the context of a managed deployment like the NGINX Gateway Fabric?

Via the API?

As of this writing, there is no mechanism to apply custom configuration options via the API (although there is ongoing work to provide this, see issue #1258).

What about dropping a config file into conf.d?

My first thought was that I could mount a custom configuration file into /etc/nginx/conf.d, along the lines of:

...
containers:
  - name: nginx
    volumeMounts:
      - name: nginx-extra-conf
        mountPath: /etc/nginx/conf.d/client_max_body_size.conf
        subPath: client_max_body_size
...
volumes:
  - name: nginx-extra-conf
    configMap:
      name: nginx-extra-conf

…but this fails because the Nginx controller explicitly cleans out that directory on startup and is unhappy if it is unable to delete a file.

Replacing nginx.conf

Right now, the solution is to replace /etc/nginx/nginx.conf. This is a relatively simple operation using kustomize to apply a patch to the deployment manifests.

Grab the original configuration

First, we need to retrieve the original nginx.conf:

mkdir configs
podman run --rm --entrypoint cat \
    ghcr.io/nginxinc/nginx-gateway-fabric/nginx:1.0.0 /etc/nginx/nginx.conf > configs/nginx.conf

Modify configs/nginx.conf as necessary; in my case, I added the following line to the http section:

client_max_body_size 0;
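
If you would rather script that edit than make it by hand, a minimal sketch could look like the following. The helper name add_http_directive and the sample configuration are assumptions made for illustration; the sample is a stand-in, not the actual nginx.conf shipped in the NGINX Gateway Fabric image.

```python
import re


def add_http_directive(conf_text: str, directive: str) -> str:
    """Insert `directive` as the first line inside the http block."""
    lines = conf_text.splitlines()
    for index, line in enumerate(lines):
        # Look for the line that opens the http block, e.g. "http {".
        if re.match(r"^\s*http\s*\{\s*$", line):
            lines.insert(index + 1, f"    {directive}")
            return "\n".join(lines) + "\n"
    raise ValueError("no 'http {' line found in configuration")


# Stand-in configuration, used only to demonstrate the helper.
sample = "events {}\n\nhttp {\n    include /etc/nginx/conf.d/*.conf;\n}\n"
patched = add_http_directive(sample, "client_max_body_size 0;")
print(patched)
```

The sketch assumes the opening brace sits on the same line as the http keyword; a configuration formatted differently would need a different match.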

Patch the deployment

We can deploy the stock NGINX Gateway Fabric with a kustomization.yaml file like this:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
commonLabels:
  nginxGatewayVersion: v1.0.0
resources:
  - https://github.com/nginxinc/nginx-gateway-fabric/releases/download/v1.0.0/crds.yaml
  - https://github.com/nginxinc/nginx-gateway-fabric/releases/download/v1.0.0/nginx-gateway.yaml
  - https://raw.githubusercontent.com/nginxinc/nginx-gateway-fabric/v1.0.0/deploy/manifests/service/nodeport.yaml

To patch the Deployment resource, we extend the kustomization.yaml with the following patch:

patches:
  - patch: |
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: nginx-gateway
        namespace: nginx-gateway
      spec:
        template:
          spec:
            containers:
              - name: nginx
                volumeMounts:
                  - mountPath: /etc/nginx/nginx.conf
                    name: nginx-conf-override
                    subPath: nginx.conf
            volumes:
              - name: nginx-conf-override
                configMap:
                  name: nginx-conf-override

And then we add a configMapGenerator to generate the nginx-conf-override ConfigMap:

configMapGenerator:
  - name: nginx-conf-override
    namespace: nginx-gateway
    options:
      disableNameSuffixHash: true
    files:
      - configs/nginx.conf

Now when we deploy from this directory…

kubectl apply -k . --server-side

…the deployment includes our patched nginx.conf and we are able to successfully push images into the cluster registry.


I’ve included the complete kustomization.yaml alongside this post.

November 17, 2023 12:00 AM

October 24, 2023

RDO Blog

RDO Bobcat Released

The RDO community is pleased to announce the general availability of the RDO build for OpenStack 2023.2 Bobcat for RPM-based distributions, CentOS Stream and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Bobcat is the 28th release from the OpenStack project, which is the work of more than 1,000 contributors from around the world.

 

The release is already available for CentOS Stream 9 on the CentOS mirror network in:

The RDO community project curates, packages, builds, tests and maintains a complete OpenStack component set for RHEL and CentOS Stream and is a member of the CentOS Cloud Infrastructure SIG. The Cloud Infrastructure SIG focuses on delivering a great user experience for CentOS users looking to build and maintain their own on-premise, public or hybrid clouds.

All work on RDO and on the downstream release, Red Hat OpenStack Platform, is 100% open source, with all code changes going upstream first.

The highlights of the broader upstream OpenStack project may be read via https://releases.openstack.org/bobcat/highlights.html but here are some highlights:
  • New Cinder driver features were added, notably, QoS support for Fujitsu ETERNUS DX driver, replication-enabled consistency groups support for Pure Storage driver, and Active/Active support for NetApp NFS driver.
  • Glance added support for RBD driver to move images to the trash if they cannot be deleted immediately due to having snapshots.
  • The Neutron service has enabled the new API policies (RBAC) with system scope and default roles by default.
  • The Nova legacy quota driver is now deprecated and a nova-manage limits command is provided in order to migrate the original limits into Keystone. We plan to change the default quota driver to the unified limits driver in an upcoming release. It is recommended that you begin planning and executing a migration to unified limits as soon as possible.

OpenStack Bobcat is not marked as Skip Level Upgrade Release Process or SLURP. According to this model (https://governance.openstack.org/tc/resolutions/20220210-release-cadence-adjustment.html) this means that upgrades will only be supported from the Antelope 2023.1 release.

RDO Bobcat 2023.2 has been built and tested with the recently released Ceph 18.2.0 Reef version (https://docs.ceph.com/en/latest/releases/reef/), which has been published by the CentOS Storage SIG in the official CentOS repositories. Note: Follow the instructions in the RDO documentation (https://www.rdoproject.org/install/install-with-ceph/) to install OpenStack and Ceph services on the same host.

During the Bobcat 2023.2 development cycle, the RDO community has implemented automatic dependency detection at run and build time. We expect that these changes will lead to more accurate dependency chains in OpenStack packages and fewer manual maintenance tasks for community maintainers.

Following upstream retirement, some packages are not present in the RDO Bobcat 2023.2 release:
  • python-networking-odl
  • python-networking-omnipath
  • python-networking-vmware-nsx
  • python-oswin-tests-tempest
  • python-os-xenapi
  • python-patrole
  • python-stackviz
  • python-vmware-nsxlib
  • python-vmware-nsx-tests-tempest

Contributors

During the Bobcat cycle, we saw the following new RDO contributors:

  • Arkady Shtempler
  • Dariusz Smigiel
  • Dave Wilde
  • Fabricio Aguiar
  • Jakub Skunda
  • Joan Francesc Gilabert
  • Maor Blaustein
  • Mohammad Abunemeh
  • Szymon Datko
  • Yadnesh Kulkarni
Welcome to all of you and Thank You So Much for participating!
But we wouldn’t want to overlook anyone. A super massive Thank You to all 47 contributors who participated in producing this release. This list includes commits to rdo-packages, rdo-infra, and rdo-website repositories:
  • Alfredo Moralejo Alonso
  • Amy Marrich 
  • Ananya Banerjee
  • Arkady Shtempler
  • Artom Lifshitz
  • Arx Cruz
  • Bhagyashri Shewale
  • Bohdan Dobrelia
  • Chandan Kumar
  • Daniel Pawlik
  • Dariusz Smigiel
  • Dave Wilde
  • Douglas Viroel
  • Enrique Vallespi Gil
  • Fabricio Aguiar
  • Giulio Fidente
  • Goutham Pacha Ravi
  • Gregory Thiemonge
  • Grzegorz Grasza
  • Ihar Hrachyshka
  • Jakub Skunda
  • Jiří Podivín
  • Jiří Stránský
  • Joan Francesc Gilabert
  • Joel Capitao
  • Karolina Kula
  • Karthik Sundaravel
  • Luca Miccini
  • Lucas Alvares Gomes
  • Luigi Toscano
  • Luis Tomas Bolivar
  • Maor Blaustein
  • Marios Andreou
  • Mathieu Bultel
  • Matthias Runge
  • Mohammad Abunemeh
  • Rodolfo Alonso Hernandez
  • Ronelle Landy
  • Sandeep Yadav
  • Slawomir Kaplonski
  • Soniya29 vyas
  • Szymon Datko
  • Takashi Kajinami
  • Tobias Urdin
  • Tom Weininger
  • Yadnesh Kulkarni
  • Yatin Karel

The Next Release Cycle

At the end of one release, focus shifts immediately to the next release, i.e., Caracal.

Get Started

To spin up a proof of concept cloud, quickly, and on limited hardware, try an All-In-One Packstack installation. You can run RDO on a single node to get a feel for how it works.

Finally, for those that don’t have any hardware or physical resources, there’s the OpenStack Global Passport Program. This is a collaborative effort between OpenStack public cloud providers to let you experience the freedom, performance and interoperability of open source infrastructure. You can quickly and easily gain access to OpenStack infrastructure via trial programs from participating OpenStack public cloud providers around the world.

Get Help

The RDO Project has the users@lists.rdoproject.org mailing list for RDO-specific users and operators. For more developer-oriented content we recommend joining the dev@lists.rdoproject.org mailing list. Remember to post a brief introduction about yourself and your RDO story. The mailing list archives are all available at https://mail.rdoproject.org. You can also find extensive documentation on RDOproject.org.

The #rdo channel on OFTC IRC is also an excellent place to find and give help.

We also welcome comments and requests on the CentOS devel mailing list and the CentOS IRC channels (#centos, #centos-cloud, #centos-devel in Libera.Chat network), however we have a more focused audience within the RDO venues.

Get Involved

To get involved in the OpenStack RPM packaging effort, check out the RDO contribute pages, peruse the CentOS Cloud SIG page, and inhale the RDO packaging documentation.

Join us in #rdo on the OFTC IRC network and follow us on Twitter @RDOCommunity. You can also find us on Facebook and YouTube.

by Amy Marrich at October 24, 2023 01:29 PM

October 06, 2023

Daniel Berrange

Bye Bye BIOS: a tool for when you need to warn users the VM image is EFI only

The x86 platform has been ever so slowly moving towards a world where EFI is used to boot everything, with legacy BIOS put out to pasture. Virtual machines in general have been somewhat behind the cutting edge in this respect though. This has mostly been due to the virtualization and cloud platforms being somewhat slow in enabling use of EFI at all, let alone making it the default. In a great many cases the platforms still default to using BIOS unless explicitly asked to use EFI. With this in mind most of the mainstream distros tend to provide general purpose disk images built such that they can boot under either BIOS or EFI, thus adapting to whatever environment the user deploys them in.

In recent times there is greater interest in the use of TPM sealing and SecureBoot for protecting guest secrets (e.g., LUKS passphrases), the introduction of UKIs as the means to extend the SecureBoot signature to close the initrd/cmdline hole, and the advent of confidential virtualization technology. These all combine to increase the likelihood that a virtual machine image will exclusively target EFI, fully discontinuing support for legacy BIOS.

This presents a bit of a usability trapdoor for people deploying images though, as it has been taken for granted that BIOS boot always works. If one takes an EFI only disk image and attempts to boot it via legacy BIOS, the user is likely to get an entirely blank graphical display and/or serial console, with no obvious hint that EFI is required. Even if the requirement for EFI is documented, it is inevitable that users will make mistakes.

Can we do better than this? Of course we can.

Enter ‘Bye Bye BIOS’ (https://gitlab.com/berrange/byebyebios)

This is a simple command line tool that, when pointed to a disk image, will inject an MBR sector that prints out a message to the user on the primary VGA display and serial port informing them that UEFI is required, then puts the CPUs in a ‘hlt’ loop.

The usage is as follows, with a guest serial port connected to the local terminal:

$ byebyebios test.img
$ qemu-system-x86_64 \
    -blockdev driver=file,filename=test.img,node-name=img \
    -device virtio-blk,drive=img \
    -m 2000 -serial stdio

STOP: Machine was booted from BIOS or UEFI CSM
 _    _         _   _ ___________ _____   ___
| \  | |       | | | |  ___|  ___|_   _| |__ \
|  \ | | ___   | | | | |__ | |_    | |      ) |
| . `  |/ _ \  | | | |  __||  _|   | |     / /
| |\   | (_) | | |_| | |___| |    _| |_   |_|
\_| \_/ \___/   \___/\____/\_|    \___/   (_)

Installation requires UEFI firmware to boot

Meanwhile the graphical console shows the same:

QEMU showing “No UEFI” message when booted from BIOS

The message shown here is a default, but it can be customized by pointing to an alternative message file:

$ echo "Bye Bye BIOS" | figlet -f bubble | unix2dos > msg.txt
$ byebyebios --message msg.txt test.img
$ qemu-system-x86_64 \
    -blockdev driver=file,filename=test.img,node-name=img \
    -device virtio-blk,drive=img \
    -m 2000 -serial stdio

  _   _   _     _   _   _     _   _   _   _
 / \ / \ / \   / \ / \ / \   / \ / \ / \ / \
( B | y | e ) ( B | y | e ) ( B | I | O | S )
 \_/ \_/ \_/   \_/ \_/ \_/   \_/ \_/ \_/ \_/

The code behind this is simplicity itself, just a short piece of x86 asm:

$ cat bootstub.S
# SPDX-License-Identifier: MIT-0

.code16
.global bye_bye_bios

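bye_bye_bios:
  mov $something_important, %si  # SI points at the NUL-terminated message
  mov $0xe, %ah                  # BIOS int 0x10 function 0x0e: teletype output
  mov $0x3f8,%dx                 # I/O port of the first serial port (COM1)

say_a_little_more:
  lodsb                          # load the next message byte into AL, advance SI
  cmp $0, %al
  je this_is_the_end             # stop at the NUL terminator
  int $0x10                      # print AL on the VGA display
  outb %al,%dx                   # write AL to the serial port
  jmp say_a_little_more

this_is_the_end:
  hlt                            # halt the CPU; loop again if it wakes up
  jmp this_is_the_end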

something_important:
# The string message will be appended here at time of install

This is compiled with the GNU assembler to create an i486 ELF object file:

$ as -march i486 -mx86-used-note=no --32 -o bootstub.o bootstub.S

From this ELF object file we have to extract the raw machine code bytes

$ ld -m elf_i386 --oformat binary -e bye_bye_bios -Ttext 0x7c00 -o bootstub.bin bootstub.o

The byebyebios Python tool takes this bootstub.bin, appends the text message and NUL terminator, padding to fill 446 bytes, then adds a dummy partition table and boot signature to fill the whole 512 byte sector.

With the boot stub binary at 21 bytes in size, this leaves 424 bytes available for the message to display to the user, which is ample for the purpose.
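
To make the sector layout concrete, here is a rough sketch of how such a sector could be assembled. This is not the actual byebyebios code: the function build_mbr is hypothetical, the stub is replaced by 21 placeholder bytes, and an all-zero partition table is assumed to stand in for the dummy one.

```python
SECTOR_SIZE = 512
CODE_AREA = 446               # bytes available before the partition table
PARTITION_TABLE = 64          # four 16-byte partition entries
BOOT_SIGNATURE = b"\x55\xaa"  # last two bytes of a bootable sector


def build_mbr(stub: bytes, message: str) -> bytes:
    """Return a 512-byte sector: stub, message, NUL, padding, table, signature."""
    payload = stub + message.encode("ascii") + b"\x00"
    if len(payload) > CODE_AREA:
        raise ValueError(
            f"message too long: {len(payload) - CODE_AREA} bytes over the limit"
        )
    code = payload.ljust(CODE_AREA, b"\x00")
    # Assumption: an all-zero table is used here as the "dummy" partition table.
    table = bytes(PARTITION_TABLE)
    return code + table + BOOT_SIGNATURE


fake_stub = bytes(21)  # placeholder with the same length as bootstub.bin
sector = build_mbr(fake_stub, "UEFI required\r\n")
print(len(sector), CODE_AREA - len(fake_stub) - 1)  # prints "512 424"
```

The second number is the message budget: 446 bytes, minus the 21 byte stub, minus one byte for the NUL terminator.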

In conclusion, if you need to ship an EFI only virtual machine image, do your users a favour and use byebyebios to add a dummy MBR to tell them that the image is EFI only when they inevitably make a mistake and run it under legacy BIOS.

 

by Daniel Berrange at October 06, 2023 01:53 PM

July 27, 2023

Lars Kellogg-Stedman

Processing deeply nested JSON with jq streams

I recently found myself wanting to perform a few transformations on a large OpenAPI schema. In particular, I wanted to take the schema available from the /openapi/v2 endpoint of a Kubernetes server and minimize it by (a) extracting a subset of the definitions and (b) removing all the description attributes.

The first task is relatively easy, since everything of interest exists at the same level in the schema. If I want one or more specific definitions, I can simply ask for those by key. For example, if I want the definition of a DeploymentConfig object, I can run:

jq '.definitions."com.github.openshift.api.apps.v1.DeploymentConfig"' < openapi.json

So simple! And so wrong! Because while that does extract the required definition, that definition is not self-contained: it refers to other definitions via $ref pointers. The real solution would require code that parses the schema, resolves all the $ref pointers, and spits out a fully resolved schema. Fortunately, in this case we can get what we need by asking for schemas matching a few specific prefixes. Using jq, we can match keys against a prefix by:

  • Using the to_entries filter to transform a dictionary into a list of {"key": ..., "value": ...} dictionaries, and then
  • Using select with the startswith function to match specific keys, and finally
  • Reconstructing the data with from_entries

Which looks like:

jq '[.definitions|to_entries[]|select(
  (.key|startswith("com.github.openshift.api.apps.v1.Deployment")) or
  (.key|startswith("io.k8s.apimachinery")) or
  (.key|startswith("io.k8s.api.core"))
)]|from_entries' < openapi.json
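
For readers more comfortable in Python than in jq, the same three steps can be sketched as a dictionary comprehension. The function name select_definitions and the tiny sample schema are made up for illustration.

```python
PREFIXES = (
    "com.github.openshift.api.apps.v1.Deployment",
    "io.k8s.apimachinery",
    "io.k8s.api.core",
)


def select_definitions(schema: dict, prefixes: tuple = PREFIXES) -> dict:
    """Keep only the definitions whose key starts with one of the prefixes."""
    return {
        key: value
        for key, value in schema["definitions"].items()
        # str.startswith accepts a tuple and matches any of its members.
        if key.startswith(prefixes)
    }


# Illustrative input; a real schema would come from /openapi/v2.
schema = {
    "definitions": {
        "com.github.openshift.api.apps.v1.DeploymentConfig": {"type": "object"},
        "io.k8s.api.core.v1.Pod": {"type": "object"},
        "io.k8s.api.batch.v1.Job": {"type": "object"},
    }
}
print(sorted(select_definitions(schema)))
```

Either way, the selection step is the same: keep only the definitions whose key begins with one of the chosen prefixes.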

That works, but results in almost 500KB of output, which seems excessive. We could further reduce the size of the document by removing all the description elements, but here is where things get tricky: description attributes can occur throughout the schema hierarchy, so we can’t use a simple path (...|del(.value.description)) to remove them.

A simple solution is to use sed:

jq ... | sed '/"description"/d'

While normally I would never use sed for processing JSON, that actually works in this case: because we’re first running the JSON document through jq, we can be confident about the formatting of the document being passed through sed, and anywhere the string "description" is contained in the value of an attribute the quotes will be escaped so we would see \"description\".

We could stop here and things would be just fine…but I was looking for a way to perform the same operation in a structured fashion. What I really wanted was an equivalent to xpath’s // operator (e.g., the path //description would find all <description> elements in a document, regardless of how deeply they were nested), but no such equivalent exists in jq. Then I came across the tostream filter, which is really neat: it transforms a JSON document into a sequence of [path, leaf-value] nodes (or [path] to indicate the end of an array or object).

That probably requires an example. The document:

{
  "name": "gizmo",
  "color": "red",
  "count": {
    "local": 1,
    "warehouse": 3
  }
}

When converted into a stream becomes:

[["name"],"gizmo"]
[["color"],"red"]
[["count","local"],1]
[["count","warehouse"],3]
[["count","warehouse"]]
[["count"]]

You can see how each attribute is represented by a tuple. For example, for .count.local, the first element of the tuple is ["count", "local"], representing that path to the value in the document, and the second element is the value itself (1). The “end” of an object is indicated by a 1-tuple ([path]), such as [["count"]] at the end of this example.
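
If it helps to see the idea outside of jq, here is a small Python sketch that produces the same events for the example document. It is an illustration of the stream format as described above, not a reimplementation of jq; the function name tostream is borrowed only for readability.

```python
def tostream(value, path=()):
    """Yield [path, leaf] events, plus a [path] event when a container closes."""
    if isinstance(value, dict) and value:
        items = list(value.items())
    elif isinstance(value, list) and value:
        items = list(enumerate(value))
    else:
        # Scalars and empty containers are leaves.
        yield [list(path), value]
        return
    for key, child in items:
        yield from tostream(child, path + (key,))
    # Closing event: the path of the last child in this container.
    yield [list(path) + [items[-1][0]]]


doc = {"name": "gizmo", "color": "red", "count": {"local": 1, "warehouse": 3}}
for event in tostream(doc):
    print(event)
```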

If we convert the OpenAPI schema to a stream, we’ll end up with nodes for the description attributes that look like this:

[
  [
    "com.github.openshift.api.apps.v1.DeploymentCause",
    "properties",
    "imageTrigger",
    "description"
  ],
  "ImageTrigger contains the image trigger details, if this trigger was fired based on an image change"
]

To match those, we need to look for nodes for which the last element of the first item is description. That is:

...|tostream|select(.[0][-1]=="description")

Of course, we don’t want to select those nodes; we want to delete them:

...|tostream|del(select(.[0][-1]=="description"))

And lastly, we need to feed the result back to the fromstream function to reconstruct the document. Putting all of that together – and populating some required top-level keys so that we end up with a valid OpenAPI schema – looks like this:

jq '
  fromstream(
    {
      "swagger": .swagger,
      "definitions": [
        .definitions|to_entries[]|select(
          (.key|startswith("com.github.openshift.api.apps.v1.Deployment")) or
          (.key|startswith("io.k8s.apimachinery")) or
          (.key|startswith("io.k8s.api.core"))
        )]|from_entries
    }|tostream|del(select(.[0][-1]=="description"))|select(. != null)
  )
'

In my environment, this reduces the size of the resulting file from about 500KB to around 175KB.
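
For comparison, the same cleanup can be sketched in Python as a recursive walk. This is a different technique from the jq stream filter, shown only as a point of reference. The function name drop_descriptions is hypothetical, and the sketch assumes that only string-valued description entries should be removed, so that a schema property that happens to be named description (whose value is an object) survives.

```python
def drop_descriptions(node):
    """Return a copy of node without string-valued "description" entries."""
    if isinstance(node, dict):
        return {
            key: drop_descriptions(value)
            for key, value in node.items()
            # Only drop leaf descriptions; keep objects that share the name.
            if not (key == "description" and isinstance(value, str))
        }
    if isinstance(node, list):
        return [drop_descriptions(item) for item in node]
    return node


# Illustrative fragment, not taken from a real schema.
fragment = {
    "description": "top level text",
    "properties": {
        "imageTrigger": {"description": "nested text", "type": "object"},
        "description": {"type": "string"},
    },
}
print(drop_descriptions(fragment))
```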

July 27, 2023 12:00 AM

July 15, 2023

Lars Kellogg-Stedman

Managing containers with Pytest fixtures

A software fixture “sets up a system for the software testing process by initializing it, thereby satisfying any preconditions the system may have”. They allow us to perform setup and teardown tasks, provide state or set up services required for our tests, and perform other initialization tasks. In this article, we’re going to explore how to use fixtures in Pytest to create and tear down containers as part of a test run.

Pytest Fixtures

Pytest fixtures are created through the use of the fixture decorator. A fixture is accessed by including a function parameter with the fixture name in our test functions. For example, if we define an example fixture:

@pytest.fixture
def example():
    return "hello world"

Then we can write a test function like this:

def test_something(example):
    ...

And it will receive the string “hello world” as the value of the example parameter.

There are a number of built-in fixtures available; for example, the tmp_path fixture provides access to a temporary directory that is unique to each test function. The following function would create a file named myfile in the temporary directory; there is no cleanup code to write, because pytest manages these directories itself and removes old ones automatically (by default it keeps the directories from the three most recent test runs):

def test_something(tmp_path):
    # open in write mode; the default mode is read-only and the file does not exist yet
    with (tmp_path / "myfile").open("w") as fd:
        fd.write('this is a test')

A fixture can declare a scope; the default is the function scope – a new value will be generated for each function. A fixture can also be declared with a scope of class, module, package, or session (where “session” means, effectively, a distinct run of pytest).

Fixtures can be located in the same files as your tests, or they can be placed in a conftest.py file where they can be shared between multiple sets of tests.

Communicating with Docker

In order to manage containers as part of the test process we’re going to need to interact with Docker. While we could call out to the docker CLI from our tests, a more graceful solution is to use the Docker client for Python. That means we’ll need a Docker client instance, so we start with a very simple fixture:

import docker

@pytest.fixture(scope="session")
def docker_client():
 """Return a Docker client"""
 return docker.from_env()

This returns a Docker client initialized using values from the environment (in other words, it behaves very much like the docker cli).

I’ve made this a session scoped fixture (which means we create one Docker client object per pytest run, and every test using this fixture will receive the same object). This makes sense in general because a Docker client is stateless; there isn’t any data we need to reset between tests.

Starting a container, version 1

For the purposes of this article, let’s assume we want to spin up a MariaDB server in a container. From the command line we might run something like this:

docker run -d \
    -e MARIADB_ROOT_PASSWORD=secret \
    -e MARIADB_USER=testuser \
    -e MARIADB_DATABASE=testdb \
    mariadb:10

Looking through the Docker python API documentation, a naïve Python equivalent might look like this:


import docker
import pytest


@pytest.fixture
def mariadb_container(
    docker_client,
):
    """Create a MariaDB container"""
    container = docker_client.containers.run(
        "docker.io/mariadb:11",
        detach=True,
        environment={
            "MARIADB_ROOT_PASSWORD": "secret",
            "MYSQL_PWD": "secret",
            "MARIADB_DATABASE": "testdb",
        },
    )
    return container

This works, but it’s not great. In particular, the container we create will hang around until we remove it manually, since we didn’t arrange to remove the container on completion. Since this is a function scoped fixture, we would end up with one container per test (potentially leading to hundreds of containers running for a large test suite).

Starting a container, version 2

Let’s take care of the biggest problem with the previous implementation and ensure that our containers get cleaned up. We can add cleanup code to a fixture by using a yield fixture; instead of return-ing a value, we yield a value, and any cleanup code after the yield statement runs when the fixture is no longer in scope.

That might look like:


import docker
import pytest


@pytest.fixture
def mariadb_container(
    docker_client,
):
    """Create a MariaDB container"""
    container = docker_client.containers.run(
        "docker.io/mariadb:11",
        detach=True,
        environment={
            "MARIADB_ROOT_PASSWORD": "secret",
            "MYSQL_PWD": "secret",
            "MARIADB_DATABASE": "testdb",
        },
    )
    yield container
    container.remove(force=True)

That’s better, but we’re not out of the woods yet. How would we use this fixture in a test? Maybe we would try something like this:


import mysql.connector


def test_simple_select(mariadb_container):
    # get the address of the mariadb container
    mariadb_container.reload()
    addr = mariadb_container.attrs["NetworkSettings"]["Networks"]["bridge"]["IPAddress"]
    # create a connection object
    conn = mysql.connector.connect(
        host=addr, user="root", password="secret", database="testdb"
    )
    # try a simple select statement
    curs = conn.cursor()
    curs.execute("select 1")
    res = curs.fetchone()
    assert res[0] == 1

First of all, that’s not a great test; there’s too much setup happening in the test that we would have to repeat before every additional test. And more importantly, if you were to try to run that test it would probably fail with:

E mysql.connector.errors.InterfaceError: 2003: Can't connect to MySQL
server on '172.17.0.2:3306' (111 Connection refused)

The problem is that when we start the MariaDB container, MariaDB isn’t ready to handle connections immediately. It takes a couple of seconds after starting the container before the server is ready. Because we haven’t accounted for that in our test, there’s nothing listening when we try to connect.

A step back and a move forward

To resolve the issues in the previous example, let’s first take a step back. For our test, we don’t actually want a container; what we want is the ability to perform SQL queries in our test with a minimal amount of boilerplate. Ideally, our test would look more like this:

def test_simple_select(mariadb_cursor):
    mariadb_cursor.execute('select 1')
    res = mariadb_cursor.fetchone()
    assert res[0] == 1

How do we get there?

Working backwards, we would need a mariadb_cursor fixture:

@pytest.fixture
def mariadb_cursor(...):
 ...

But to get a database cursor, we need a database connection:

@pytest.fixture
def mariadb_connection(...):
 ...

And to create a database connection, we need to know the address of the database server:

@pytest.fixture
def mariadb_host(...):
 ...

Let’s start filling in all those ellipses.

What would the mariadb_host fixture look like? We saw in our earlier test code how to get the address of a Docker container. Much like the situation with the database server, we want to account for the fact that it might take a nonzero amount of time for the container network setup to complete, so we can use a simple loop in which we check for the address and return it if it’s available, otherwise sleep a bit and try again:

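import time


@pytest.fixture
def mariadb_host(mariadb_container):
    while True:
        # refresh the cached container attributes
        mariadb_container.reload()
        try:
            networks = list(
                mariadb_container.attrs["NetworkSettings"]["Networks"].values()
            )
            addr = networks[0]["IPAddress"]
            # the address can be empty while network setup is still in progress
            if addr:
                return addr
        except (KeyError, IndexError):
            pass
        time.sleep(0.5)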

This works by repeatedly refreshing information about the container until we can find an ip address.

Now that we have the address of the database server, we can create a connection:

@pytest.fixture
def mariadb_connection(mariadb_host):
    while True:
        try:
            conn = mysql.connector.connect(
                host=mariadb_host, user="root", password="secret", database="testdb"
            )
            return conn
        except mysql.connector.errors.InterfaceError:
            time.sleep(1)

The logic here is very similar; we keep attempting to establish a connection until we’re successful, at which point we return the connection object.
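
Both loops retry forever, which is fine for a demonstration but can hang a test run if the container never becomes healthy. One possible refinement, sketched here under the assumption that a timeout is acceptable, is a small polling helper. The name wait_for and its parameters are hypothetical and not part of pytest or the Docker client.

```python
import time


def wait_for(probe, timeout=30.0, interval=0.5, retry_on=(Exception,)):
    """Call probe() until it returns something other than None, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = probe()
            if result is not None:
                return result
        except retry_on:
            pass  # not ready yet; fall through to the deadline check
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for a service that needs a few attempts before it is ready.
attempts = []


def flaky_probe():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("not ready")
    return "connected"


print(wait_for(flaky_probe, timeout=5, interval=0.01))  # prints "connected"
```

In the fixtures above, probe would wrap the address lookup or the connect call, and retry_on would name the exception that signals "not ready yet".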

Now that we have a fixture that gives us a functioning database connection, we can use that to acquire a cursor:

from contextlib import closing


@pytest.fixture
def mariadb_cursor(mariadb_connection):
    with closing(mariadb_connection.cursor()) as cursor:
        yield cursor

The closing function from the contextlib module returns a context manager that calls the close method on the given object when leaving the with context; this ensures that the cursor is closed when we’re done with it. We could have accomplished the same thing by writing this instead:

@pytest.fixture
def mariadb_cursor(mariadb_connection):
    cursor = mariadb_connection.cursor()
    yield cursor
    cursor.close()
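
To see the mechanism in isolation, without pytest or a database, the following sketch drives a generator of the same shape by hand. FakeCursor and cursor_source are stand-ins invented for this illustration.

```python
from contextlib import closing


class FakeCursor:
    """Stand-in for a database cursor; only tracks whether close() was called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def cursor_source():
    """Generator with the same shape as the mariadb_cursor fixture."""
    with closing(FakeCursor()) as cursor:
        yield cursor


gen = cursor_source()
cursor = next(gen)      # setup: runs up to the yield
print(cursor.closed)    # False while the "test" is using the cursor
next(gen, None)         # teardown: resumes after the yield
print(cursor.closed)    # True once the with block has exited
```

Pytest does roughly the same thing with a yield fixture: it advances the generator once to obtain the value, then resumes it after the test so that the cleanup code runs.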

Putting all of this together gets us a conftest.py that looks something like:


import pytest
import docker
import time
import mysql.connector
from contextlib import closing


@pytest.fixture(scope="session")
def docker_client():
    """Return a Docker client"""
    return docker.from_env()


@pytest.fixture
def mariadb_container(
    docker_client,
):
    """Create a MariaDB container"""
    container = docker_client.containers.run(
        "docker.io/mariadb:11",
        detach=True,
        environment={
            "MARIADB_ROOT_PASSWORD": "secret",
            "MYSQL_PWD": "secret",
            "MARIADB_DATABASE": "testdb",
        },
    )
    yield container
    container.remove(force=True)


@pytest.fixture
def mariadb_host(mariadb_container):
    while True:
        mariadb_container.reload()
        try:
            networks = list(
                mariadb_container.attrs["NetworkSettings"]["Networks"].values()
            )
            addr = networks[0]["IPAddress"]
            # the address can be empty while network setup is still in progress
            if addr:
                return addr
        except (KeyError, IndexError):
            pass
        time.sleep(0.5)


@pytest.fixture
def mariadb_connection(mariadb_host):
    while True:
        try:
            conn = mysql.connector.connect(
                host=mariadb_host, user="root", password="secret", database="testdb"
            )
            return conn
        except mysql.connector.errors.InterfaceError:
            time.sleep(1)


@pytest.fixture
def mariadb_cursor(mariadb_connection):
    with closing(mariadb_connection.cursor()) as cursor:
        yield cursor

And that allows us to dramatically simplify our test:


def test_simple_select(mariadb_cursor):
    mariadb_cursor.execute("select 1")
    res = mariadb_cursor.fetchone()
    assert res[0] == 1

So we’ve accomplished our goal.

Additional improvements

Things we’re ignoring

In order to keep this post to a reasonable size, we haven’t bothered to create an actual application, which means we haven’t had to worry about things like initializing the database schema. In reality, we would probably handle that in a new or existing fixture.

Replace hardcoded values

While our fixture does the job, we’re using a number of hardcoded values (for the username, the database name, the password, etc). This isn’t inherently bad for a test environment, but it can sometimes mask errors in our code (for example, if we pick values that match default values in our code, we might miss errors that crop up when using non-default values).

We can replace fixed strings with fixtures that produce random values (or values with a random component, if we want something a little more human readable). In the following example, we have a random_string fixture that produces an 8 character random string, and then we use that to produce a password and a database name:

import string
import random


@pytest.fixture
def random_string():
    return "".join(random.choices(string.ascii_letters + string.digits, k=8))


@pytest.fixture
def mariadb_dbpass(random_string):
    return f"secret-{random_string}"


@pytest.fixture
def mariadb_dbname(random_string):
    return f"testdb-{random_string}"

We would incorporate these into our existing fixtures wherever we need the database password or name:

@pytest.fixture
def mariadb_container(
    docker_client,
    random_string,
    mariadb_dbpass,
    mariadb_dbname,
):
    """Create a MariaDB container"""
    container = docker_client.containers.run(
        "docker.io/mariadb:11",
        name=f"mariadb-test-{random_string}",
        detach=True,
        environment={
            "MARIADB_ROOT_PASSWORD": mariadb_dbpass,
            "MYSQL_PWD": mariadb_dbpass,
            "MARIADB_DATABASE": mariadb_dbname,
        },
    )

    yield container

    container.remove(force=True)

(and so forth)

Consider a session scoped container

The fixtures we’ve developed in this post have all been function scoped, which means that we’re creating and tearing down a container for every single test function. This will substantially increase the runtime of our tests. We may want to consider using session scoped fixtures instead; this would bring up a container and use it for all our tests, only cleaning it up at the end of the test run.

The advantage here is that the impact on the test run time is minimal. The disadvantage is that we have to be very careful about the interaction between tests, since we would no longer be starting each test with a clean version of the database.

Keep in mind that in Pytest, a fixture can only reference other fixtures that come from the same or “broader” scope (so, a function scoped fixture can use a session scoped fixture, but the opposite is not true). In particular, that means if we were to make our mariadb_container fixture session-scoped, we would need to make the same change to its dependencies (mariadb_dbname, mariadb_dbpass, etc).
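
To make the scope rule concrete, here is a sketch of how the value-producing fixtures might look once they are session scoped. The make_random_string helper is an addition for illustration (it is not in the original conftest.py); the fixture names match the ones used above.

```python
import random
import string

import pytest


def make_random_string(length=8):
    """Hypothetical plain helper: return a random alphanumeric string."""
    return "".join(random.choices(string.ascii_letters + string.digits, k=length))


@pytest.fixture(scope="session")
def random_string():
    # Session scope: computed once, then shared by every test in the run.
    return make_random_string()


@pytest.fixture(scope="session")
def mariadb_dbpass(random_string):
    # Session scoped, so it may only depend on other session scoped fixtures.
    return f"secret-{random_string}"


@pytest.fixture(scope="session")
def mariadb_dbname(random_string):
    return f"testdb-{random_string}"
```

The mariadb_container fixture (and docker_client, which it depends on) would need the same scope="session" argument. Function scoped fixtures such as mariadb_cursor can stay as they are, because a narrower scope is always allowed to use a broader one.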


You can find a version of conftest.py with these changes here.

July 15, 2023 12:00 AM

April 18, 2023

RDO Blog

RDO Antelope Released

The RDO community is pleased to announce the general availability of the RDO build for OpenStack 2023.1 Antelope for RPM-based distributions, CentOS Stream and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Antelope is the 27th release from the OpenStack project, which is the work of more than 1,000 contributors from around the world.

The release is already available for CentOS Stream 9 on the CentOS mirror network in:

http://mirror.stream.centos.org/SIGs/9-stream/cloud/x86_64/openstack-antelope/

The RDO community project curates, packages, builds, tests and maintains a complete OpenStack component set for RHEL and CentOS Stream and is a member of the CentOS Cloud Infrastructure SIG. The Cloud Infrastructure SIG focuses on delivering a great user experience for CentOS users looking to build and maintain their own on-premise, public or hybrid clouds.

All work on RDO and on the downstream release, Red Hat OpenStack Platform, is 100% open source, with all code changes going upstream first.

The full highlights of the broader upstream OpenStack project may be read via https://releases.openstack.org/antelope/highlights.html, but here are some of them:

  • The continuation of SRBAC and FIPS to make OpenStack a more secure platform across various services, along with additional support in images.
  • Additional drivers and features for Block Storage to support more technologies from vendors such as Dell, Hitachi and NetApp, among others.
  • DNS Zones that can now be shared with other tenants (projects) allowing them to create and manage recordsets within the Zone.
  • Networking port forwarding was added to the dashboard for Floating IPs.
  • Additional networking features to support OVN.
  • Compute now allows PCI devices to be scheduled via the Placement API and power consumption can be managed for dedicated CPUs.
  • Load balancing now allows users to enable cpu-pinning.
  • Community testing of compatibility between non-adjacent upstream versions.

OpenStack Antelope is the first release marked as Skip Level Upgrade Release Process or SLURP. According to this model (https://governance.openstack.org/tc/resolutions/20220210-release-cadence-adjustment.html) this means that upgrades will be supported between these (SLURP) releases, in addition to between adjacent major releases.

TripleO removal in the RDO Antelope release: During the Antelope cycle, the TripleO team communicated its decision to abandon development of the project and deprecate the master branches. In line with that upstream decision, TripleO packages have been removed from the RDO distribution and are not included in the Antelope release.

Contributors

During the Antelope cycle, we saw the following new RDO contributors:

  • Adrian Fusco Arnejo
  • Bhagyashri Shewale
  • Eduardo Olivares
  • Elvira Garcia Ruiz
  • Enrique Vallespí
  • Jason Paroly
  • Juan Badia Payno
  • Karthik Sundaravel
  • Roberto Alfieri
  • Tom Weininger

Welcome to all of you and Thank You So Much for participating! But we wouldn’t want to overlook anyone.

A super massive Thank You to all 52 contributors who participated in producing this release. This list includes commits to rdo-packages, rdo-infra, and rdo-website repositories:

  • Adrian Fusco Arnejo
  • Alan Pevec
  • Alfredo Moralejo Alonso
  • Amol Kahat
  • Amy Marrich
  • Ananya Banerjee
  • Artom Lifshitz
  • Arx Cruz
  • Bhagyashri Shewale
  • Cédric Jeanneret
  • Chandan Kumar
  • Daniel Pawlik
  • Dariusz Smigiel
  • Dmitry Tantsur
  • Douglas Viroel
  • Eduardo Olivares
  • Elvira Garcia Ruiz
  • Emma Foley
  • Eric Harney
  • Enrique Vallespí
  • Fabien Boucher
  • Harald Jensas
  • Jakob Meng
  • Jason Paroly
  • Jesse Pretorius
  • Jiří Podivín
  • Joel Capitao
  • Juan Badia Payno
  • Julia Kreger
  • Karolina Kula
  • Karthik Sundaravel
  • Leif Madsen
  • Luigi Toscano
  • Luis Tomas Bolivar
  • Marios Andreou
  • Martin Kopec
  • Matthias Runge
  • Matthieu Huin
  • Nicolas Hicher
  • Pooja Jadhav
  • Rabi Mishra
  • Riccardo Pittau
  • Roberto Alfieri
  • Ronelle Landy
  • Sandeep Yadav
  • Sean Mooney
  • Slawomir Kaplonski
  • Steve Baker
  • Takashi Kajinami
  • Tobias Urdin
  • Tom Weininger
  • Yatin Karel

The Next Release Cycle

At the end of one release, focus shifts immediately to the next release, i.e., Bobcat.

Get Started

To spin up a proof of concept cloud, quickly, and on limited hardware, try an All-In-One Packstack installation. You can run RDO on a single node to get a feel for how it works.

Finally, for those that don’t have any hardware or physical resources, there’s the OpenStack Global Passport Program. This is a collaborative effort between OpenStack public cloud providers to let you experience the freedom, performance and interoperability of open source infrastructure. You can quickly and easily gain access to OpenStack infrastructure via trial programs from participating OpenStack public cloud providers around the world.

Get Help

The RDO Project has the users@lists.rdoproject.org mailing list for RDO-specific users and operators. For more developer-oriented content we recommend joining the dev@lists.rdoproject.org mailing list. Remember to post a brief introduction about yourself and your RDO story. The mailing list archives are all available at https://mail.rdoproject.org. You can also find extensive documentation on RDOproject.org.

The #rdo channel on OFTC IRC is also an excellent place to find and give help.

We also welcome comments and requests on the CentOS devel mailing list and the CentOS IRC channels (#centos, #centos-cloud, and #centos-devel on the Libera.Chat network); however, we have a more focused audience within the RDO venues.

Get Involved

To get involved in the OpenStack RPM packaging effort, check out the RDO contribute pages, peruse the CentOS Cloud SIG page, and inhale the RDO packaging documentation. Join us in #rdo on the OFTC IRC network and follow us on Twitter @RDOCommunity. You can also find us on Facebook and YouTube.

by Amy Marrich at April 18, 2023 01:44 PM

February 19, 2023

Lars Kellogg-Stedman

NAT between identical networks using VRF

<p>Last week, Oskar Stenberg asked on <a href="https://unix.stackexchange.com/q/735931/4989">Unix &amp; Linux</a> if it were possible to configure connectivity between two networks, both using the same address range, without involving network namespaces. That is, given this high level view of the network&hellip;</p> <p><a href="https://excalidraw.com/#json=uuXRRZ2ybaAXiUvbQVkNO,krx3lsbf12c-tDhuWtRjbg"><img alt="two networks with the same address range connected by a host named &ldquo;middleman&rdquo;" src="https://blog.oddbit.com/post/2023-02-19-vrf-and-nat/the-problem.svg"></a></p> <p>&hellip;can we set things up so that hosts on the &ldquo;inner&rdquo; network can communicate with hosts on the &ldquo;outer&rdquo; network using the range <code>192.168.3.0/24</code>, and similarly for communication in the other direction?</p> <h2 id="setting-up-a-lab">Setting up a lab</h2> <p>When investigating this sort of networking question, I find it easiest to reproduce the topology in a virtual environment so that it&rsquo;s easy to test things out. I generally use <a href="https://mininet.org">Mininet</a> for this, which provides a simple Python API for creating virtual nodes and switches and creating links between them.</p> <p>I created the following network topology for this test:</p> <figure class="center" > <img src="topology-1.svg" alt="virtual network topology diagram" /> </figure> <p>In the rest of this post, I&rsquo;ll be referring to these hostnames.</p> <p>See the bottom of this post for a link to the repository that contains the complete test environment.</p> <h2 id="vrf-in-theory">VRF in theory</h2> <p>VRF stands for &ldquo;Virtual Routing and Forwarding&rdquo;. 
From the <a href="https://en.wikipedia.org/wiki/Virtual_routing_and_forwarding">Wikipedia article on the topic</a>:</p> <blockquote> <p>In IP-based computer networks, virtual routing and forwarding (VRF) is a technology that allows multiple instances of a routing table to co-exist within the same router at the same time. One or more logical or physical interfaces may have a VRF and these VRFs do not share routes therefore the packets are only forwarded between interfaces on the same VRF. VRFs are the TCP/IP layer 3 equivalent of a VLAN. Because the routing instances are independent, the same or overlapping IP addresses can be used without conflicting with each other. Network functionality is improved because network paths can be segmented without requiring multiple routers.<a href="https://blog.oddbit.com/post/2023-02-19-vrf-and-nat/the-problem.svg">1</a></p> </blockquote> <p>In Linux, VRF support is implemented as a <a href="https://docs.kernel.org/networking/vrf.html">special type of network device</a>. 
A VRF device sets up an isolated routing domain; network traffic on devices associated with a VRF will use the routing table associated with that VRF, rather than the main routing table, which permits us to connect multiple networks with overlapping address ranges.</p> <p>We can create new VRF devices with the <code>ip link add</code> command:</p> <pre tabindex="0"><code>ip link add vrf-inner type vrf table 100 </code></pre><p>Running the above command results in the following changes:</p> <ul> <li> <p>It creates a new network device named <code>vrf-inner</code></p> </li> <li> <p>It adds a new route policy rule (if it doesn&rsquo;t already exist) that looks like:</p> <pre tabindex="0"><code>1000: from all lookup [l3mdev-table] </code></pre><p>This causes route lookups to use the appropriate route table for interfaces associated with a VRF.</p> </li> </ul> <p>After creating a VRF device, we can add interfaces to it like this:</p> <pre tabindex="0"><code>ip link set eth0 master vrf-inner </code></pre><p>This associates the given interface with the VRF device, and it moves all routes associated with the interface out of the <code>local</code> and <code>main</code> routing tables and into the VRF-specific routing table.</p> <p>You can see a list of vrf devices by running <code>ip vrf show</code>:</p> <pre tabindex="0"><code># ip vrf show Name Table ----------------------- vrf-inner 100 </code></pre><p>You can see a list of devices associated with a particular VRF with the <code>ip link</code> command:</p> <pre tabindex="0"><code># ip -brief link show master vrf-inner eth0@if448 UP 72:87:af:d3:b5:f9 &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; </code></pre><h2 id="vrf-in-practice">VRF in practice</h2> <p>We&rsquo;re going to create two VRF devices on the <code>middleman</code> host; one associated with the &ldquo;inner&rdquo; network and one associated with the &ldquo;outer&rdquo; network. 
In our virtual network topology, the <code>middleman</code> host has two network interfaces:</p> <ul> <li><code>middleman-eth0</code> is connected to the &ldquo;inner&rdquo; network</li> <li><code>middleman-eth1</code> is connected to the &ldquo;outer&rdquo; network</li> </ul> <p>Both devices have the same address (<code>192.168.2.1</code>):</p> <pre tabindex="0"><code># ip addr show 2: middleman-eth0@if426: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue master vrf-inner state UP group default qlen 1000 link/ether 32:9e:01:2e:78:2f brd ff:ff:ff:ff:ff:ff link-netnsid 0 inet 192.168.2.1/24 brd 192.168.2.255 scope global middleman-eth0 valid_lft forever preferred_lft forever root@mininet-vm:~/unix-735931# ip addr show middleman-eth1 3: middleman-eth1@if427: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue master vrf-outer state UP group default qlen 1000 link/ether 12:be:9a:09:33:93 brd ff:ff:ff:ff:ff:ff link-netnsid 0 inet 192.168.2.1/24 brd 192.168.2.255 scope global middleman-eth1 valid_lft forever preferred_lft forever </code></pre><p>And the main routing table looks like this:</p> <pre tabindex="0"><code># ip route show 192.168.2.0/24 dev middleman-eth1 proto kernel scope link src 192.168.2.1 192.168.2.0/24 dev middleman-eth0 proto kernel scope link src 192.168.2.1 </code></pre><p>If you&rsquo;re at all familiar with Linux network configuration, that probably looks weird. Right now this isn&rsquo;t a particularly functional network configuration, but we can fix that!</p> <p>To create our two VRF devices, we run the following commands:</p> <pre tabindex="0"><code>ip link add vrf-inner type vrf table 100 ip link add vrf-outer type vrf table 200 ip link set vrf-inner up ip link set vrf-outer up </code></pre><p>This associates <code>vrf-inner</code> with route table 100, and <code>vrf-outer</code> with route table 200. 
At this point, tables 100 and 200 are empty:</p> <pre tabindex="0"><code># ip route show table 100 Error: ipv4: FIB table does not exist. Dump terminated # ip route show table 200 Error: ipv4: FIB table does not exist. Dump terminated </code></pre><p>Next, we add our interfaces to the appropriate VRF devices:</p> <pre tabindex="0"><code>ip link set middleman-eth0 master vrf-inner ip link set middleman-eth1 master vrf-outer </code></pre><p>After running these commands, there are no routes left in the main routing table:</p> <pre tabindex="0"><code># ip route show &lt;no output&gt; </code></pre><p>And the routes associated with our two physical interfaces are now contained by the appropriate VRF routing tables. Here&rsquo;s table 100:</p> <pre tabindex="0"><code>root@mininet-vm:~/unix-735931# ip route show table 100 broadcast 192.168.2.0 dev middleman-eth0 proto kernel scope link src 192.168.2.1 192.168.2.0/24 dev middleman-eth0 proto kernel scope link src 192.168.2.1 local 192.168.2.1 dev middleman-eth0 proto kernel scope host src 192.168.2.1 broadcast 192.168.2.255 dev middleman-eth0 proto kernel scope link src 192.168.2.1 </code></pre><p>And table 200:</p> <pre tabindex="0"><code>root@mininet-vm:~/unix-735931# ip route show table 200 broadcast 192.168.2.0 dev middleman-eth1 proto kernel scope link src 192.168.2.1 192.168.2.0/24 dev middleman-eth1 proto kernel scope link src 192.168.2.1 local 192.168.2.1 dev middleman-eth1 proto kernel scope host src 192.168.2.1 broadcast 192.168.2.255 dev middleman-eth1 proto kernel scope link src 192.168.2.1 </code></pre><p>This configuration effectively gives us two isolated networks:</p> <figure class="center" > <img src="topology-2.svg" alt="virtual network topology diagram" /> </figure> <p>We can verify that nodes in the &ldquo;inner&rdquo; and &ldquo;outer&rdquo; networks are now able to communicate with <code>middleman</code>. 
We can reach <code>middleman</code> from <code>innernode0</code>; in this case, we&rsquo;re communicating with interface <code>middleman-eth0</code>:</p> <pre tabindex="0"><code>innernode0# ping -c1 192.168.2.1 PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data. 64 bytes from 192.168.2.1: icmp_seq=1 ttl=64 time=0.126 ms --- 192.168.2.1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.126/0.126/0.126/0.000 ms </code></pre><p>Similarly, we can reach <code>middleman</code> from <code>outernode0</code>, but in this case we&rsquo;re communicating with interface <code>middleman-eth1</code>:</p> <pre tabindex="0"><code>outernode0# ping -c1 192.168.2.1 PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data. 64 bytes from 192.168.2.1: icmp_seq=1 ttl=64 time=1.02 ms --- 192.168.2.1 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 1.020/1.020/1.020/0.000 ms </code></pre><h2 id="configure-routing-on-the-nodes">Configure routing on the nodes</h2> <p>Our goal is to let nodes on one side of the network use the address range <code>192.168.3.0/24</code> to refer to nodes on the other side of the network. Right now, if we were to try to access <code>192.168.3.10</code> from <code>innernode0</code>, the attempt would fail with:</p> <pre tabindex="0"><code>innernode0# ping 192.168.3.10 ping: connect: Network is unreachable </code></pre><p>The &ldquo;network is unreachable&rdquo; message means that <code>innernode0</code> has no idea where to send that request. That&rsquo;s because at the moment, the routing table on each node looks like:</p> <pre tabindex="0"><code>innernode0# ip route 192.168.2.0/24 dev innernode0-eth0 proto kernel scope link src 192.168.2.10 </code></pre><p>There is neither a default gateway nor a network-specific route appropriate for <code>192.168.3.0/24</code> addresses. 
Let&rsquo;s add a network route that will route that address range through <code>middleman</code>:</p> <pre tabindex="0"><code>innernode0# ip route add 192.168.3.0/24 via 192.168.2.1 innernode0# ip route 192.168.2.0/24 dev innernode0-eth0 proto kernel scope link src 192.168.2.10 192.168.3.0/24 via 192.168.2.1 dev innernode0-eth0 </code></pre><p>This same change needs to be made on all the <code>innernode*</code> and <code>outernode*</code> nodes.</p> <p>With the route in place, attempts to reach <code>192.168.3.10</code> from <code>innernode0</code> will still fail, but now they&rsquo;re getting rejected by <code>middleman</code> because <em>it</em> doesn&rsquo;t have any appropriate routes:</p> <pre tabindex="0"><code>innernode0# ping -c1 192.168.3.10 PING 192.168.3.10 (192.168.3.10) 56(84) bytes of data. From 192.168.2.1 icmp_seq=1 Destination Net Unreachable --- 192.168.3.10 ping statistics --- 1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms </code></pre><p>We need to tell <code>middleman</code> what to do with these packets.</p> <h2 id="configure-routing-and-nat-on-middleman">Configure routing and NAT on middleman</h2> <p>In order to achieve our desired connectivity, we need to:</p> <ol> <li>Map the <code>192.168.3.0/24</code> destination address to the equivalent <code>192.168.2.0/24</code> address <em>before</em> the kernel makes a routing decision.</li> <li>Map the <code>192.168.2.0/24</code> source address to the equivalent <code>192.168.3.0/24</code> address <em>after</em> the kernel makes a routing decision (so that replies will go back to the &ldquo;other&rdquo; side).</li> <li>Ensure that the kernel uses the routing table for the <em>target</em> network when making routing decisions for these connections.</li> </ol> <p>We can achieve (1) and (2) using the netfilter <a href="https://www.netfilter.org/documentation/HOWTO/netfilter-extensions-HOWTO-4.html#ss4.4"><code>NETMAP</code></a> extension by adding the following two rules:</p> 
<pre tabindex="0"><code>iptables -t nat -A PREROUTING -d 192.168.3.0/24 -j NETMAP --to 192.168.2.0/24 iptables -t nat -A POSTROUTING -s 192.168.2.0/24 -j NETMAP --to 192.168.3.0/24 </code></pre><p>For incoming traffic destined for the <code>192.168.3.0/24</code> network, this maps the destination address to the matching <code>192.168.2.0/24</code> address. For outgoing traffic with a source address on the <code>192.168.2.0/24</code> network, this maps the source to the equivalent <code>192.168.3.0/24</code> address (so that the recipient sees the traffic as coming from &ldquo;the other side&rdquo;).</p> <p>(For those of you wondering, &ldquo;can we do this using <code>nftables</code> instead?&rdquo;, as of this writing <a href="https://wiki.nftables.org/wiki-nftables/index.php/Supported_features_compared_to_xtables#NETMAP"><code>nftables</code> does not appear to have <code>NETMAP</code> support</a>, so we have to use <code>iptables</code> for this step.)</p> <p>With this change in place, re-trying that <code>ping</code> command on <code>innernode0</code> will apparently succeed:</p> <pre tabindex="0"><code>innernode0# ping -c1 192.168.3.10 PING 192.168.3.10 (192.168.3.10) 56(84) bytes of data. 
64 bytes from 192.168.3.10: icmp_seq=1 ttl=63 time=0.063 ms --- 192.168.3.10 ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 0ms rtt min/avg/max/mdev = 0.063/0.063/0.063/0.000 ms </code></pre><p>However, running <code>tcpdump</code> on <code>middleman</code> will show us that we haven&rsquo;t yet achieved our goal:</p> <pre tabindex="0"><code>12:59:52.899054 middleman-eth0 In IP 192.168.2.10 &gt; 192.168.3.10: ICMP echo request, id 16520, seq 1, length 64 12:59:52.899077 middleman-eth0 Out IP 192.168.3.10 &gt; 192.168.2.10: ICMP echo request, id 16520, seq 1, length 64 12:59:52.899127 middleman-eth0 In IP 192.168.2.10 &gt; 192.168.3.10: ICMP echo reply, id 16520, seq 1, length 64 12:59:52.899130 middleman-eth0 Out IP 192.168.3.10 &gt; 192.168.2.10: ICMP echo reply, id 16520, seq 1, length 64 </code></pre><p>You can see that our packet is coming in on <code>middleman-eth0</code>&hellip;and going right back out the same interface. We have thus far achieved a very complicated loopback interface.</p> <p>The missing piece is some logic to have the kernel use the routing table for the &ldquo;other side&rdquo; when making routing decisions for these packets. 
We&rsquo;re going to do that by:</p> <ol> <li>Tagging packets with a mark that indicates the interface on which they were received</li> <li>Using this mark to select an appropriate routing table</li> </ol> <p>We add the packet mark by adding these rules to the <code>mangle</code> table <code>PREROUTING</code> chain:</p> <pre tabindex="0"><code>iptables -t mangle -A PREROUTING -i middleman-eth0 -d 192.168.3.0/24 -j MARK --set-mark 100 iptables -t mangle -A PREROUTING -i middleman-eth1 -d 192.168.3.0/24 -j MARK --set-mark 200 </code></pre><p>And we utilize that mark in route lookups by adding the following two route policy rules:</p> <pre tabindex="0"><code>ip rule add prio 100 fwmark 100 lookup 200 ip rule add prio 200 fwmark 200 lookup 100 </code></pre><p>It is critical that these rules come before (aka &ldquo;have a higher priority than&rdquo;, aka &ldquo;have a lower number than&rdquo;) the <code>l3mdev</code> rule added when we created the VRF devices.</p> <h2 id="validation-does-it-actually-work">Validation: Does it actually work?</h2> <p>With that last set of changes in place, if we repeat the <code>ping</code> test from <code>innernode0</code> to <code>outernode0</code> and run <code>tcpdump</code> on <code>middleman</code>, we see:</p> <pre tabindex="0"><code>13:05:27.667793 middleman-eth0 In IP 192.168.2.10 &gt; 192.168.3.10: ICMP echo request, id 16556, seq 1, length 64 13:05:27.667816 middleman-eth1 Out IP 192.168.3.10 &gt; 192.168.2.10: ICMP echo request, id 16556, seq 1, length 64 13:05:27.667863 middleman-eth1 In IP 192.168.2.10 &gt; 192.168.3.10: ICMP echo reply, id 16556, seq 1, length 64 13:05:27.667868 middleman-eth0 Out IP 192.168.3.10 &gt; 192.168.2.10: ICMP echo reply, id 16556, seq 1, length 64 </code></pre><p>Now we finally see the desired behavior: the request from <code>innernode0</code> comes in on <code>eth0</code>, goes out on <code>eth1</code> with the addresses appropriately mapped and gets delivered to <code>outernode0</code>. 
The reply from <code>outernode0</code> goes through the process in reverse, and arrives back at <code>innernode0</code>.</p> <h2 id="connection-tracking-or-one-more-thing">Connection tracking (or, &ldquo;One more thing&hellip;&rdquo;)</h2> <p>There is a subtle problem with the configuration we&rsquo;ve implemented so far: the Linux connection tracking mechanism (&quot;<a href="https://arthurchiao.art/blog/conntrack-design-and-implementation/">conntrack</a>&quot;) by default identifies a connection by the 4-tuple <code>(source_address, source_port, destination_address, destination_port)</code>. To understand why this is a problem, assume that we&rsquo;re running a web server on port 80 on all the &ldquo;inner&rdquo; and &ldquo;outer&rdquo; nodes.</p> <p>To connect from <code>innernode0</code> to <code>outernode0</code>, we could use the following command. We&rsquo;re using the <code>--local-port</code> option here because we want to control the source port of our connections:</p> <pre tabindex="0"><code>innernode0# curl --local-port 4000 192.168.3.10 </code></pre><p>To connect from <code>outernode0</code> to <code>innernode0</code>, we would use the same command:</p> <pre tabindex="0"><code>outernode0# curl --local-port 4000 192.168.3.10 </code></pre><p>If we look at the connection tracking table on <code>middleman</code>, we will see a single connection:</p> <pre tabindex="0"><code>middleman# conntrack -L tcp 6 115 TIME_WAIT src=192.168.2.10 dst=192.168.3.10 sport=4000 dport=80 src=192.168.2.10 dst=192.168.3.10 sport=80 dport=4000 [ASSURED] mark=0 use=1 </code></pre><p>This happens because the 4-tuple for our two connections is identical. Conflating connections like this can cause traffic to stop flowing if both connections are active at the same time.</p> <p>We need to provide the connection tracking subsystem with some additional information to uniquely identify these connections. 
We can do this by using the netfilter <code>CT</code> module to assign each connection to a unique conntrack origination &ldquo;zone&rdquo;:</p> <pre tabindex="0"><code>iptables -t raw -A PREROUTING -s 192.168.2.0/24 -i middleman-eth0 -j CT --zone-orig 100 iptables -t raw -A PREROUTING -s 192.168.2.0/24 -i middleman-eth1 -j CT --zone-orig 200 </code></pre><p>What is a &ldquo;zone&rdquo;? From <a href="https://lore.kernel.org/all/4B9158F5.5040205@parallels.com/T/">the patch adding this feature</a>:</p> <blockquote> <p>A zone is simply a numerical identifier associated with a network device that is incorporated into the various hashes and used to distinguish entries in addition to the connection tuples.</p> </blockquote> <p>With these rules in place, if we repeat the test with <code>curl</code> we will see two distinct connections:</p> <pre tabindex="0"><code>middleman# conntrack -L tcp 6 117 TIME_WAIT src=192.168.2.10 dst=192.168.3.10 sport=4000 dport=80 zone-orig=100 src=192.168.2.10 dst=192.168.3.10 sport=80 dport=26148 [ASSURED] mark=0 use=1 tcp 6 115 TIME_WAIT src=192.168.2.10 dst=192.168.3.10 sport=4000 dport=80 zone-orig=200 src=192.168.2.10 dst=192.168.3.10 sport=80 dport=4000 [ASSURED] mark=0 use=1 </code></pre><h2 id="repository-and-demo">Repository and demo</h2> <p>You can find a complete test environment in <a href="https://github.com/larsks/unix-example-735931-1-1-nat">this repository</a>; that includes the mininet topology I mentioned at the beginning of this post as well as shell scripts to implement all the address, route, and netfilter configurations.</p> <p>And here&rsquo;s a video that runs through the steps described in this post:</p> <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;"> <iframe src="https://www.youtube.com/embed/Kws98JNKcxE" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" allowfullscreen title="YouTube Video"></iframe> </div>
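
To make two of the ideas in this post more concrete, here is a small Python model. It is only an illustration, not how the kernel implements either feature: the netmap function mimics the one-to-one address translation performed by the NETMAP rules (swap the network prefix, keep the host bits), and conntrack_key is a hypothetical stand-in showing why two connections with the same 4-tuple collide until a zone is added to the lookup key.

```python
import ipaddress


def netmap(addr, from_net, to_net):
    """Map addr from from_net onto to_net, keeping the host bits unchanged."""
    src = ipaddress.ip_network(from_net)
    dst = ipaddress.ip_network(to_net)
    ip = ipaddress.ip_address(addr)
    if src.prefixlen != dst.prefixlen:
        raise ValueError("networks must have the same prefix length")
    if ip not in src:
        return addr  # the rule does not match, so the address is left alone
    host_bits = int(ip) & int(src.hostmask)
    return str(ipaddress.ip_address(int(dst.network_address) | host_bits))


def conntrack_key(src, sport, dst, dport, zone=None):
    """Simplified lookup key: the 4-tuple, optionally qualified by a zone."""
    four_tuple = (src, sport, dst, dport)
    return four_tuple if zone is None else (zone, four_tuple)


# PREROUTING: destination 192.168.3.10 becomes 192.168.2.10
print(netmap("192.168.3.10", "192.168.3.0/24", "192.168.2.0/24"))
# POSTROUTING: source 192.168.2.10 becomes 192.168.3.10
print(netmap("192.168.2.10", "192.168.2.0/24", "192.168.3.0/24"))

# Both curl commands produce the same 4-tuple as seen by middleman.
from_inner = ("192.168.2.10", 4000, "192.168.3.10", 80)
from_outer = ("192.168.2.10", 4000, "192.168.3.10", 80)

without_zones = {conntrack_key(*from_inner), conntrack_key(*from_outer)}
with_zones = {
    conntrack_key(*from_inner, zone=100),
    conntrack_key(*from_outer, zone=200),
}
print(len(without_zones), len(with_zones))  # prints "1 2"
```

Without zones, the two connections collapse into a single entry; with a per-interface zone in the key they stay distinct, which matches the two conntrack entries shown above.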

February 19, 2023 12:00 AM

February 17, 2023

Lars Kellogg-Stedman

Simple error handling in C

Overview

I was recently working with someone else’s C source and I wanted to add some basic error checking without mucking up the code with a bunch of if statements and calls to perror. I ended up implementing a simple must function that checks the return value of an expression, and exits with an error if the return value is less than 0. You use it like this:

must(fd = open("textfile.txt", O_RDONLY));

Or:

must(close(fd));

In the event that an expression returns an error, the code will exit with a message that shows the file, line, and function in which the error occurred, along with the actual text of the called function and the output of perror:

example.c:24 in main: fd = open("does-not-exist.xt", O_RDONLY): [2]: No such file or directory

To be clear, this is only useful when you’re using functions that conform to standard Unix error reporting conventions, and if you’re happy with “exit with an error message” as the failure handling mechanism.

Implementation

The implementation starts with a macro defined in must.h:


#ifndef _MUST
#define _MUST

#define must(x) _must(__FILE__, __LINE__, __func__, #x, (x))

void _must(const char *fileName, int lineNumber, const char *funcName,
           const char *calledFunction, int err);
#endif

The __FILE__ and __LINE__ symbols are standard predefined macros, and __func__ is a predefined identifier that has been part of the C standard since C99; gcc supports all three, and they are documented here. The expression #x uses the stringizing operator to convert the macro argument into a string.

The above macro transforms a call to must() into a call to the _must() function, which is defined in must.c:


#include "must.h"

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

void _must(const char *fileName, int lineNumber, const char *funcName,
           const char *calledFunction, int err) {
  if (err < 0) {
    char buf[256];
    snprintf(buf, 256, "%s:%d in %s: %s: [%d]", fileName, lineNumber, funcName,
             calledFunction, errno);
    perror(buf);
    exit(1);
  }
}

In this function we check the value of err (which will be the return value of the expression passed as the argument to the must() macro), and if it evaluates to a number less than 0, we use snprintf() to generate a string that we can pass to perror(), and finally call perror() which will print our information string, a colon, and then the error message corresponding to the value of errno.

Example

You can see must() used in practice in the following example program:


#include "must.h"
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main() {
  int fd;
  char buf[1024];

  printf("opening a file that does exist\n");
  must(fd = open("file-that-exists.txt", O_RDONLY));

  while (1) {
    int nb;
    must(nb = read(fd, buf, sizeof(buf)));
    if (!nb)
      break;
    must(write(STDOUT_FILENO, buf, nb));
  }

  must(close(fd));

  printf("opening a file that doesn't exist\n");
  must(fd = open("file-that-does-not-exist.xt", O_RDONLY));
  return 0;
}

Provided the file-that-exists.txt (a) exists and (b) contains the text Hello, world., and that file-that-does-not-exist.txt does not, in fact, exist, running the above code will produce the following output:

opening a file that does exist
Hello, world.
opening a file that doesn't exist
example.c:24 in main: fd = open("file-that-does-not-exist.xt", O_RDONLY): [2]: No such file or directory

February 17, 2023 12:00 AM

February 15, 2023

Lars Kellogg-Stedman

A review of the Garmin Fenix 6(x)

I’ve been using a Garmin Fenix 6x for a couple of weeks and thought it might be interesting to put together a short review.

Is it really a smartwatch?

I think it’s a misnomer to call the Fenix a “smartwatch”. I would call it a highly capable fitness tracker. That’s not a knock on the product; I really like it so far, but pretty much everything it does is centered around either fitness tracking or navigation. If you browse around the “Connect IQ” store, mostly you’ll find (a) watch faces, (b) fitness apps, and (c) navigation apps. It’s not able to control your phone (for the most part; there are some apps available that offer remote camera control and some other limited features); you can’t check your email on it, or send text messages, and you’ll never find a watch version of any major smartphone app.

So if you’re looking for a smartwatch, maybe look elsewhere. But if you’re looking for a great fitness tracker, this just might be your device.

Things I will not talk about

I don’t listen to music when I exercise. If I’m inside, I’m watching a video on a streaming service, and if I’m outside, I want to be able to hear my surroundings. So I won’t be looking at any aspects of music support on the Fenix.

All the data in one place

One of the things I really like about the Fenix is that I now have more of my activity and health data in one place.

As part of my exercise I use a Schwinn IC4 spin bike. Previously, I was using a Fitbit Charge 5, which works fine but meant exercise metrics ended up in multiple places: while I could collect heart rate with the Fitbit, to collect cycling data like cadence, power, etc., I needed to use another app on my phone (I used Wahoo Fitness). Additionally, Fitbit doesn’t support sharing data with Apple Health, so there wasn’t a great way to see a unified view of things.

This has all changed with the Fenix:

  • First and probably most importantly, the Fenix is able to utilize the sensor on the IC4 directly, so cadence/speed/distance data is collected in the same place as heart rate data.

  • Through the magic of the Gymnasticon project, the Fenix is also able to collect power data from the bike.

  • The Fenix is also great at tracking my outside bike rides, and of course providing basic heart rate and time tracking of my strength and PT workouts.

All of this means that Garmin’s tools (both their app and the Garmin Connect website) provide a great unified view of my fitness activities.

Notifications

This is an area in which I think there is a lot of room for improvement.

Like any good connected watch, you can configure your Fenix to receive notifications from your phone. Unfortunately, this is an all-or-nothing configuration; there’s no facility for blocking or selecting notifications from specific apps.

I usually have my phone in do-not-disturb mode, so notifications from Google or the New York Times app don’t interrupt me, but they show up in the notification center when I check for anything interesting. With my Fenix connected, I get interrupted every time something happens.

Having the ability to filter which notifications get sent to the watch would be an incredibly helpful feature.

Battery life

One of the reasons I have the 6x instead of the 6 is the increased battery size that comes along with the bigger watch. While the advertising touts a battery life of “up to 21 days with activity tracking and 24/7 wrist-based heart rate monitoring”, I’ve been seeing battery life closer to 1 week under normal use (which includes probably 10-20 miles of GPS-tracked bike rides a week).

I’ve been using the pulse oximeter at night, but I understand that it can contribute substantially to battery drain; I’ve disabled it for now and I’ll update this post if that turns out to have a significant impact on battery life.

One of the reasons that the Fenix is able to get substantially better battery life than the Apple Watch is that the screen is far, far dimmer. By default, the screen brightness is set to 20%; you can increase that, but you’ll consume substantially more power by doing so. In well lit areas – outdoors, or under office lighting – the display is generally very easy to read even with the backlight low.

Ease of use

It’s a mixed bag.

The basic watch and fitness tracking functionality is easy to use, and I like the fact that it uses physical buttons rather than a touch screen (I’ve spent too much time struggling with touch screens in winter). The phone app itself is relatively easy to use, although the “Activities & Apps” screen has the bad habit of refreshing while you’re trying to use it.

I have found Garmin’s documentation to be very good, and highly search optimized. In most cases, when I’ve wanted to know how to do something on my watch I’ve been able to search for it on Google, and:

  • Garmin’s manual is usually the first result
  • The instructions are on point and clearly written

For example, I wanted to know how to remove an activity from the list of favorite activities, so I searched for garmin remove activity from favorites, which led me directly to this documentation.

This was exactly the information I needed. I’ve had similar success with just about everything I’ve searched for.

The Garmin Connect app and website are both generally easy to use and well organized. There is an emphasis on “social networking” aspects (share your activities! Join a group! Earn badges!) in which I have zero interest, and I wish there was a checkbox to simply disable those parts of the UI.

The place where things really fall over is the “Connect IQ” app store. There are many apps and watch faces there that require some sort of payment, but there’s no centralized payment processing facility, so you end up getting sent to random payment processors all over the place depending on what the software author selected…and price information simply isn’t displayed in the app store at all unless an author happens to include it in the product description.

The UI for configuring custom watch faces is awful; it’s a small step up from someone just throwing a text editor at you and telling you to edit a file. For this reason I’ve mostly stuck with Garmin-produced watch faces (the built-in ones and a few from the app store), which tend to have high visual quality but aren’t very configurable.

Some random technical details

While Garmin doesn’t provide any Linux support at all, you can plug the watch into your Linux system and access the watch filesystem using any MTP client, including Gnome’s GVFS. While this isn’t going to replace your phone app, it does give you reasonably convenient access to activity tracking data (as .fit files).

The Fenix ships with reasonably complete US maps. I haven’t had the chance to assess their coverage of local hiking trails. You can load maps from the OpenStreetMap project, although the process for doing so is annoyingly baroque.

It is easy to load GPX tracks from your favorite hiking website onto the watch using the Garmin Connect website or phone app.

Wrapping up

I’m happy with the watch. It is a substantial upgrade from my Charge 5 in terms of fitness tracking, and aesthetically I like it as much as the Seiko SNJ025 I was previously wearing. It’s not a great smartwatch, but that’s not what I was looking for, and the battery life is much better than actual smart watches from Apple and Samsung.


A digression, in which I yell at All Trails

This isn’t a Garmin or Fenix issue, but I’d like to specially recognize All Trails for making the process of exporting a GPX file to Garmin Connect as painful as possible. You can’t do it at all from the phone app, so the process is something like:

  1. Use the All Trails app to find a hike you like
  2. Decide you want to send it to your watch
  3. Open a browser on your phone, go to https://alltrails.com, and log in (again, even though you were already logged in on the app)
  4. Find the hike again
  5. Download the GPX
  6. Open the downloads folder
  7. Open the GPX file
  8. Click the “share” button
  9. Find the Garmin Connect app

That is…completely ridiculous. The “Share” button in the All Trails app should provide an option to share the GPX version of the route so the above process could be collapsed into a single step. All Trails, why do you hate your users so much?

February 15, 2023 12:00 AM

February 14, 2023

Lars Kellogg-Stedman

Packet, packet, who's got the packet?

In this question, August Vrubel has some C code that sets up a tun interface and then injects a packet, but the packet seemed to disappear into the ether. In this post, I’d like to take a slightly extended look at my answer because I think it’s a great opportunity for learning a bit more about performing network diagnostics.

The original code looked like this:


#include <arpa/inet.h>
#include <fcntl.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static int tunAlloc(void) {
  int fd;
  struct ifreq ifr = {.ifr_name = "tun0", .ifr_flags = IFF_TUN | IFF_NO_PI};

  fd = open("/dev/net/tun", O_RDWR);
  ioctl(fd, TUNSETIFF, (void *)&ifr);
  ioctl(fd, TUNSETOWNER, geteuid());
  return fd;
}

// this is a test
static void bringInterfaceUp(void) {
  int sock;
  struct sockaddr_in addr = {.sin_family = AF_INET};
  struct ifreq ifr = {.ifr_name = "tun0"};

  inet_aton("172.30.0.1", &addr.sin_addr);
  memcpy(&ifr.ifr_addr, &addr, sizeof(struct sockaddr));

  sock = socket(AF_INET, SOCK_DGRAM, 0);
  ioctl(sock, SIOCSIFADDR, &ifr);
  ioctl(sock, SIOCGIFFLAGS, &ifr);
  ifr.ifr_flags |= IFF_UP | IFF_RUNNING;
  ioctl(sock, SIOCSIFFLAGS, &ifr);
  close(sock);
}

static void emitPacket(int tap_fd) {
  unsigned char packet[] = {
      0x45, 0x00, 0x00, 0x3c, 0xd8, 0x6f, 0x40, 0x00, 0x3f, 0x06, 0x08, 0x91,
      172,  30,   0,    1,    192,  168,  255,  8,    0xa2, 0x9a, 0x27, 0x11,
      0x80, 0x0b, 0x63, 0x79, 0x00, 0x00, 0x00, 0x00, 0xa0, 0x02, 0xfa, 0xf0,
      0x89, 0xd8, 0x00, 0x00, 0x02, 0x04, 0x05, 0xb4, 0x04, 0x02, 0x08, 0x0a,
      0x5b, 0x76, 0x5f, 0xd4, 0x00, 0x00, 0x00, 0x00, 0x01, 0x03, 0x03, 0x07};

  write(tap_fd, packet, sizeof(packet));
}

int main() {
  int tap_fd;

  tap_fd = tunAlloc();
  bringInterfaceUp();
  emitPacket(tap_fd);
  close(tap_fd);

  return 0;
}

A problem with the original code is that it creates the interface, sends the packet, and tears down the interface with no delays, making it very difficult to inspect the interface configuration, perform packet captures, or otherwise figure out what’s going on.

In order to resolve those issues, I added some prompts before sending the packet and before tearing down the tun interface (and also some minimal error checking), giving us:


#include <arpa/inet.h>
#include <fcntl.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define must(x) _must(#x, __FILE__, __LINE__, __func__, (x))
void _must(const char *call, const char *filename, int line,
           const char *funcname, int err) {
  char buf[1024];

  snprintf(buf, 1023, "%s (@ %s:%d)", call, filename, line);
  if (err < 0) {
    perror(buf);
    exit(1);
  }
}

static int tunAlloc(void) {
  int fd;
  struct ifreq ifr = {.ifr_name = "tun0", .ifr_flags = IFF_TUN | IFF_NO_PI};

  fd = open("/dev/net/tun", O_RDWR);
  must(ioctl(fd, TUNSETIFF, (void *)&ifr));
  must(ioctl(fd, TUNSETOWNER, geteuid()));
  return fd;
}

static void bringInterfaceUp(void) {
  int sock;
  struct sockaddr_in addr = {.sin_family = AF_INET};
  struct ifreq ifr = {.ifr_name = "tun0"};

  inet_aton("172.30.0.1", &addr.sin_addr);
  memcpy(&ifr.ifr_addr, &addr, sizeof(struct sockaddr));

  sock = socket(AF_INET, SOCK_DGRAM, 0);
  must(ioctl(sock, SIOCSIFADDR, &ifr));
  must(ioctl(sock, SIOCGIFFLAGS, &ifr));
  ifr.ifr_flags |= IFF_UP | IFF_RUNNING;
  must(ioctl(sock, SIOCSIFFLAGS, &ifr));
  close(sock);
}

static void emitPacket(int tap_fd) {
  unsigned char packet[] = {
      0x45, 0x00, 0x00, 0x3c, 0xd8, 0x6f, 0x40, 0x00, 0x3f, 0x06, 0x08, 0x91,
      172,  30,   0,    1,    192,  168,  255,  8,    0xa2, 0x9a, 0x27, 0x11,
      0x80, 0x0b, 0x63, 0x79, 0x00, 0x00, 0x00, 0x00, 0xa0, 0x02, 0xfa, 0xf0,
      0x89, 0xd8, 0x00, 0x00, 0x02, 0x04, 0x05, 0xb4, 0x04, 0x02, 0x08, 0x0a,
      0x5b, 0x76, 0x5f, 0xd4, 0x00, 0x00, 0x00, 0x00, 0x01, 0x03, 0x03, 0x07};

  write(tap_fd, packet, sizeof(packet));
}

void prompt(char *promptString) {
  printf("%s\n", promptString);
  getchar();
}

int main() {
  int tap_fd;

  tap_fd = tunAlloc();
  bringInterfaceUp();
  prompt("interface is up");
  emitPacket(tap_fd);
  prompt("sent packet");
  close(tap_fd);
  printf("all done");

  return 0;
}

We start by compiling the code:

gcc -o sendpacket sendpacket.c

If we try running this as a regular user, it will simply fail (which confirms that at least some of our error handling is working correctly):

$ ./sendpacket
ioctl(fd, TUNSETIFF, (void *)&ifr) (@ sendpacket-pause.c:33): Operation not permitted

We need to run it as root:

$ sudo ./sendpacket
interface is up

The interface is up prompt means that the code has configured the interface but has not yet sent the packet. Let’s take a look at the interface configuration:

$ ip addr show tun0
3390: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 500
 link/none
 inet 172.30.0.1/32 scope global tun0
 valid_lft forever preferred_lft forever
 inet6 fe80::c7ca:fe15:5d5c:2c49/64 scope link stable-privacy
 valid_lft forever preferred_lft forever

The code will emit a TCP SYN packet targeting address 192.168.255.8, port 10001. In another terminal, let’s watch for that on all interfaces. If we start tcpdump and press RETURN at the interface is up prompt, we’ll see something like:

# tcpdump -nn -i any port 10001
22:36:35.336643 tun0 In IP 172.30.0.1.41626 > 192.168.255.8.10001: Flags [S], seq 2148230009, win 64240, options [mss 1460,sackOK,TS val 1534484436 ecr 0,nop,wscale 7], length 0

And indeed, we see the problem that was described: the packet enters the system on tun0, but never goes anywhere else. What’s going on?

Introducing pwru (Packet, Where are you?)

pwru is a nifty utility written by the folks at Cilium that takes advantage of eBPF to attach traces to hundreds of kernel functions to trace packet processing through the Linux kernel. It’s especially useful when packets seem to be getting dropped with no obvious explanation. Let’s see what it can tell us!

A convenient way to run pwru is using their official Docker image. We’ll run it like this, filtering by protocol and destination port so that we only see results relating to the synthesized packet created by the sendpacket.c code:

docker run --privileged --rm -t --pid=host \
 -v /sys/kernel/debug/:/sys/kernel/debug/ \
 cilium/pwru --filter-proto tcp --filter-port 10001

If we run sendpacket while pwru is running, the output looks something like this:

2023/02/15 03:42:33 Per cpu buffer size: 4096 bytes
2023/02/15 03:42:33 Attaching kprobes (via kprobe-multi)...
1469 / 1469 [-----------------------------------------------------------------------------] 100.00% ? p/s
2023/02/15 03:42:33 Attached (ignored 0)
2023/02/15 03:42:33 Listening for events..
SKB CPU PROCESS FUNC
0xffff8ce13e987900 6 [sendpacket-orig] netif_receive_skb
0xffff8ce13e987900 6 [sendpacket-orig] skb_defer_rx_timestamp
0xffff8ce13e987900 6 [sendpacket-orig] __netif_receive_skb
0xffff8ce13e987900 6 [sendpacket-orig] __netif_receive_skb_one_core
0xffff8ce13e987900 6 [sendpacket-orig] ip_rcv
0xffff8ce13e987900 6 [sendpacket-orig] ip_rcv_core
0xffff8ce13e987900 6 [sendpacket-orig] kfree_skb_reason(SKB_DROP_REASON_IP_CSUM)
0xffff8ce13e987900 6 [sendpacket-orig] skb_release_head_state
0xffff8ce13e987900 6 [sendpacket-orig] sock_wfree
0xffff8ce13e987900 6 [sendpacket-orig] skb_release_data
0xffff8ce13e987900 6 [sendpacket-orig] skb_free_head
0xffff8ce13e987900 6 [sendpacket-orig] kfree_skbmem

And now we have a big blinking sign that tells us why the packet is being dropped:

0xffff8ce13e987900 6 [sendpacket-orig] kfree_skb_reason(SKB_DROP_REASON_IP_CSUM)

Fixing the checksum

It looks like the synthesized packet data includes a bad checksum. We could update the code to correctly calculate the checksum…or we could just use Wireshark and have it tell us the correct values. Because this isn’t meant to be an IP networking primer, we’ll just use Wireshark, which gives us the following updated code:

static void emitPacket(int tap_fd) {
 uint8_t packet[] = {
 0x45, 0x00, 0x00, 0x3c, 0xd8, 0x6f, 0x40, 0x00, 0x3f, 0x06, 0xf7, 0x7b,
 172, 30, 0, 1, 192, 168, 255, 8, 0xa2, 0x9a, 0x27, 0x11,
 0x80, 0x0b, 0x63, 0x79, 0x00, 0x00, 0x00, 0x00, 0xa0, 0x02, 0xfa, 0xf0,
 0x78, 0xc3, 0x00, 0x00, 0x02, 0x04, 0x05, 0xb4, 0x04, 0x02, 0x08, 0x0a,
 0x5b, 0x76, 0x5f, 0xd4, 0x00, 0x00, 0x00, 0x00, 0x01, 0x03, 0x03, 0x07,
 };

 write(tap_fd, packet, sizeof(packet));
}

If we repeat our invocation of pwru and run a test with the updated code, we see:

2023/02/15 04:17:29 Per cpu buffer size: 4096 bytes
2023/02/15 04:17:29 Attaching kprobes (via kprobe-multi)...
1469 / 1469 [-----------------------------------------------------------------------------] 100.00% ? p/s
2023/02/15 04:17:29 Attached (ignored 0)
2023/02/15 04:17:29 Listening for events..
SKB CPU PROCESS FUNC
0xffff8cd8a6c5ef00 9 [sendpacket-chec] netif_receive_skb
0xffff8cd8a6c5ef00 9 [sendpacket-chec] skb_defer_rx_timestamp
0xffff8cd8a6c5ef00 9 [sendpacket-chec] __netif_receive_skb
0xffff8cd8a6c5ef00 9 [sendpacket-chec] __netif_receive_skb_one_core
0xffff8cd8a6c5ef00 9 [sendpacket-chec] ip_rcv
0xffff8cd8a6c5ef00 9 [sendpacket-chec] ip_rcv_core
0xffff8cd8a6c5ef00 9 [sendpacket-chec] sock_wfree
0xffff8cd8a6c5ef00 9 [sendpacket-chec] nf_hook_slow
0xffff8cd8a6c5ef00 9 [sendpacket-chec] nf_checksum
0xffff8cd8a6c5ef00 9 [sendpacket-chec] nf_ip_checksum
0xffff8cd8a6c5ef00 9 [sendpacket-chec] __skb_checksum_complete
0xffff8cd8a6c5ef00 9 [sendpacket-chec] tcp_v4_early_demux
0xffff8cd8a6c5ef00 9 [sendpacket-chec] ip_route_input_noref
0xffff8cd8a6c5ef00 9 [sendpacket-chec] ip_route_input_slow
0xffff8cd8a6c5ef00 9 [sendpacket-chec] fib_validate_source
0xffff8cd8a6c5ef00 9 [sendpacket-chec] __fib_validate_source
0xffff8cd8a6c5ef00 9 [sendpacket-chec] ip_handle_martian_source
0xffff8cd8a6c5ef00 9 [sendpacket-chec] kfree_skb_reason(SKB_DROP_REASON_NOT_SPECIFIED)
0xffff8cd8a6c5ef00 9 [sendpacket-chec] skb_release_head_state
0xffff8cd8a6c5ef00 9 [sendpacket-chec] skb_release_data
0xffff8cd8a6c5ef00 9 [sendpacket-chec] skb_free_head
0xffff8cd8a6c5ef00 9 [sendpacket-chec] kfree_skbmem

Dealing with martians

Looking at the above output, we’re no longer seeing the SKB_DROP_REASON_IP_CSUM error; instead, we’re getting dropped by the routing logic:

0xffff8cd8a6c5ef00 9 [sendpacket-chec] fib_validate_source
0xffff8cd8a6c5ef00 9 [sendpacket-chec] __fib_validate_source
0xffff8cd8a6c5ef00 9 [sendpacket-chec] ip_handle_martian_source
0xffff8cd8a6c5ef00 9 [sendpacket-chec] kfree_skb_reason(SKB_DROP_REASON_NOT_SPECIFIED)

Specifically, the packet is being dropped as a “martian source”, which means a packet that has a source address that is invalid for the interface on which it is being received. Unlike the previous error, we can actually get kernel log messages about this problem. If we had the log_martians sysctl enabled for all interfaces:

sysctl -w net.ipv4.conf.all.log_martians=1

Or if we enabled it specifically for tun0 after the interface is created:

sysctl -w net.ipv4.conf.tun0.log_martians=1

We would see the following message logged by the kernel:

Feb 14 12:14:03 madhatter kernel: IPv4: martian source 192.168.255.8 from 172.30.0.1, on dev tun0

We’re seeing this particular error because tun0 is configured with address 172.30.0.1, but it claims to be receiving a packet with the same source address from “somewhere else” on the network. This is a problem because we would never be able to reply to that packet (our replies would get routed to the local host). To deal with this problem, we can either change the source address of the packet, or we can change the IP address assigned to the tun0 interface. Since changing the source address would mean mucking about with checksums again, let’s change the address of tun0:

static void bringInterfaceUp(void) {
 int sock;
 struct sockaddr_in addr = {.sin_family = AF_INET};
 struct ifreq ifr = {.ifr_name = "tun0"};

 inet_aton("172.30.0.10", &addr.sin_addr);
 memcpy(&ifr.ifr_addr, &addr, sizeof(struct sockaddr));

 sock = socket(AF_INET, SOCK_DGRAM, 0);
 must(ioctl(sock, SIOCSIFADDR, &ifr));
 must(ioctl(sock, SIOCGIFFLAGS, &ifr));
 ifr.ifr_flags |= IFF_UP | IFF_RUNNING;
 must(ioctl(sock, SIOCSIFFLAGS, &ifr));
 close(sock);
}

With this change, tun0 now looks like:

3452: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 500
link/none
inet 172.30.0.10/32 scope global tun0
valid_lft forever preferred_lft forever
inet6 fe80::bda3:ddc8:e60e:106b/64 scope link stable-privacy
valid_lft forever preferred_lft forever

And if we repeat our earlier test in which we use tcpdump to watch for our synthesized packet on any interface, we now see the desired behavior:

# tcpdump -nn -i any port 10001
23:37:55.897786 tun0 In IP 172.30.0.1.41626 > 192.168.255.8.10001: Flags [S], seq 2148230009, win 64240, options [mss 1460,sackOK,TS val 1534484436 ecr 0,nop,wscale 7], length 0
23:37:55.897816 eth0 Out IP 172.30.0.1.41626 > 192.168.255.8.10001: Flags [S], seq 2148230009, win 64240, options [mss 1460,sackOK,TS val 1534484436 ecr 0,nop,wscale 7], length 0

The packet is correctly handled by the kernel and sent out to our default gateway.

Finishing up

The final version of the code looks like this:


#include <arpa/inet.h>
#include <fcntl.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define must(x) _must(#x, __FILE__, __LINE__, __func__, (x))
void _must(const char *call, const char *filename, int line,
           const char *funcname, int err) {
  char buf[1024];

  snprintf(buf, 1023, "%s (@ %s:%d)", call, filename, line);
  if (err < 0) {
    perror(buf);
    exit(1);
  }
}

static int tunAlloc(void) {
  int fd;
  struct ifreq ifr = {.ifr_name = "tun0", .ifr_flags = IFF_TUN | IFF_NO_PI};

  fd = open("/dev/net/tun", O_RDWR);
  must(ioctl(fd, TUNSETIFF, (void *)&ifr));
  must(ioctl(fd, TUNSETOWNER, geteuid()));
  return fd;
}

static void bringInterfaceUp(void) {
  int sock;
  struct sockaddr_in addr = {.sin_family = AF_INET};
  struct ifreq ifr = {.ifr_name = "tun0"};

  /* The interface address must differ from the packet's source address. */
  inet_aton("172.30.0.10", &addr.sin_addr);
  memcpy(&ifr.ifr_addr, &addr, sizeof(struct sockaddr));

  sock = socket(AF_INET, SOCK_DGRAM, 0);
  must(ioctl(sock, SIOCSIFADDR, &ifr));
  must(ioctl(sock, SIOCGIFFLAGS, &ifr));
  ifr.ifr_flags |= IFF_UP | IFF_RUNNING;
  must(ioctl(sock, SIOCSIFFLAGS, &ifr));
  close(sock);
}

static void emitPacket(int tap_fd) {
  /* Same packet as before, with corrected IP and TCP checksums. */
  uint8_t packet[] = {
      0x45, 0x00, 0x00, 0x3c, 0xd8, 0x6f, 0x40, 0x00, 0x3f, 0x06, 0xf7, 0x7b,
      172,  30,   0,    1,    192,  168,  255,  8,    0xa2, 0x9a, 0x27, 0x11,
      0x80, 0x0b, 0x63, 0x79, 0x00, 0x00, 0x00, 0x00, 0xa0, 0x02, 0xfa, 0xf0,
      0x78, 0xc3, 0x00, 0x00, 0x02, 0x04, 0x05, 0xb4, 0x04, 0x02, 0x08, 0x0a,
      0x5b, 0x76, 0x5f, 0xd4, 0x00, 0x00, 0x00, 0x00, 0x01, 0x03, 0x03, 0x07,
  };

  write(tap_fd, packet, sizeof(packet));
}

void prompt(char *promptString) {
  printf("%s\n", promptString);
  getchar();
}

int main() {
  int tap_fd;

  tap_fd = tunAlloc();
  bringInterfaceUp();
  prompt("interface is up");
  emitPacket(tap_fd);
  prompt("sent packet");
  close(tap_fd);
  printf("all done");

  return 0;
}

February 14, 2023 12:00 AM

November 16, 2022

Lars Kellogg-Stedman

Setting up an IPv6 VLAN

My internet service provider (FIOS) doesn’t yet (sad face) offer IPv6 capable service, so I’ve set up an IPv6 tunnel using the Hurricane Electric tunnel broker. I want to provide IPv6 connectivity to multiple systems in my house, but not to all systems in my house 1. In order to meet those requirements, I’m going to set up the tunnel on the router, and then expose connectivity over an IPv6-only VLAN. In this post, we’ll walk through the steps necessary to set that up.

Parts of this post are going to be device specific: for example, I’m using a Ubiquiti EdgeRouter X as my Internet router, so the tunnel setup is going to be specific to that device. The section about setting things up on my Linux desktop will be more generally applicable.

There are three major parts to this post:

  • Configure the EdgeRouter

    This shows how to set up an IPv6 tunnel and configure an IPv6-only VLAN on the EdgeRouter.

  • Configure the switch

    This is only necessary due to the specifics of the connection between my desktop and the router; you can probably skip this.

  • Configure the desktop

    This shows how to set up the IPv6 VLAN interface under Linux using nmcli.

What we know

When you set up an IPv6 tunnel with Hurricane Electric, you receive several bits of information. We care in particular about the following (the IPv6 addresses and client IPv4 addresses here have been munged for privacy reasons):

IPv6 Tunnel Endpoints

Server IPv4 Address 209.51.161.14
Server IPv6 Address 2001:470:1236:1212::1/64
Client IPv4 Address 1.2.3.4
Client IPv6 Address 2001:470:1236:1212::2/64

Routed IPv6 Prefixes

Routed /64 2001:470:1237:1212::/64

We’ll refer back to this information as we configure things later on.

Configure the EdgeRouter

Create the tunnel interface

The first step in the process is to create a tunnel interface – that is, an interface that looks like an ordinary network interface, but is in fact encapsulating traffic and sending it to the tunnel broker, where it will be unpacked and sent on its way.

I’ll be creating a SIT tunnel, which is designed to “interconnect isolated IPv6 networks” over an IPv4 connection.

I start by setting the tunnel encapsulation type and assigning an IPv6 address to the tunnel interface. This is the “Client IPv6 Address” from the earlier table:

set interfaces tunnel tun0 encapsulation sit
set interfaces tunnel tun0 address 2001:470:1236:1212::2/64

Next I need to define the local and remote IPv4 endpoints of the tunnel. The remote endpoint is the “Server IPv4” address. The value 0.0.0.0 for the local-ip option means “whichever source address is appropriate for connecting to the given remote address”:

set interfaces tunnel tun0 remote-ip 209.51.161.14
set interfaces tunnel tun0 local-ip 0.0.0.0

Finally, I associate some firewall rulesets with the interface. This is important because, unlike IPv4, as you assign IPv6 addresses to internal devices they will be directly connected to the internet. With no firewall rules in place you would find yourself inadvertently exposing services that previously were “behind” your home router.

set interfaces tunnel tun0 firewall in ipv6-name WANv6_IN
set interfaces tunnel tun0 firewall local ipv6-name WANv6_LOCAL

I’m using the existing WANv6_IN and WANv6_LOCAL rulesets, which by default block all inbound traffic. These correspond to the following ip6tables chains:

root@ubnt:~# ip6tables -S WANv6_IN
-N WANv6_IN
-A WANv6_IN -m comment --comment WANv6_IN-10 -m state --state RELATED,ESTABLISHED -j RETURN
-A WANv6_IN -m comment --comment WANv6_IN-20 -m state --state INVALID -j DROP
-A WANv6_IN -m comment --comment "WANv6_IN-10000 default-action drop" -j LOG --log-prefix "[WANv6_IN-default-D]"
-A WANv6_IN -m comment --comment "WANv6_IN-10000 default-action drop" -j DROP
root@ubnt:~# ip6tables -S WANv6_LOCAL
-N WANv6_LOCAL
-A WANv6_LOCAL -m comment --comment WANv6_LOCAL-10 -m state --state RELATED,ESTABLISHED -j RETURN
-A WANv6_LOCAL -m comment --comment WANv6_LOCAL-20 -m state --state INVALID -j DROP
-A WANv6_LOCAL -p ipv6-icmp -m comment --comment WANv6_LOCAL-30 -j RETURN
-A WANv6_LOCAL -p udp -m comment --comment WANv6_LOCAL-40 -m udp --sport 547 --dport 546 -j RETURN
-A WANv6_LOCAL -m comment --comment "WANv6_LOCAL-10000 default-action drop" -j LOG --log-prefix "[WANv6_LOCAL-default-D]"
-A WANv6_LOCAL -m comment --comment "WANv6_LOCAL-10000 default-action drop" -j DROP

As you can see, both rulesets block all inbound traffic by default unless it is related to an existing outbound connection.

Create a vlan interface

I need to create a network interface on the router that will be the default gateway for my local IPv6-only network. From the tunnel broker, I received the CIDR 2001:470:1237:1212::/64 for local use, so:

  • I’ve decided to split this up into smaller networks (because a /64 has over 18 quintillion available addresses). I’m using /110 networks in this example, which leaves 18 host bits, so I will only ever be able to have 262,144 addresses on each network (note that the decision to use a smaller subnet impacts your choices for address autoconfiguration; see RFC 7421 for the relevant discussion).

  • I’m using the first /110 network for this VLAN, which comprises addresses 2001:470:1237:1212::1 through 2001:470:1237:1212::3:ffff. I’ll use the first address as the router address.

  • I’ve arbitrarily decided to use VLAN id 10 for this purpose.

To create an interface for VLAN id 10 with address 2001:470:1237:1212::1/110, we use the set interfaces ... vif command:

set interfaces switch switch0 vif 10 address 2001:470:1237:1212::1/110

Configure the default IPv6 route

We don’t receive router advertisements over the IPv6 tunnel, which means we need to explicitly configure the IPv6 default route. The default gateway will be the “Server IPv6 Address” we received from the tunnel broker.

set protocols static route6 ::/0 next-hop 2001:470:1236:1212::1

Enable router advertisements

IPv6 systems on our local network will use the neighbor discovery protocol to discover the default gateway for the network. Support for this service is provided by RADVD, and we configure it using the set interfaces ... ipv6 router-advert command:

set interfaces switch switch0 vif 10 ipv6 router-advert send-advert true
set interfaces switch switch0 vif 10 ipv6 router-advert managed-flag true
set interfaces switch switch0 vif 10 ipv6 router-advert prefix ::/110

The managed-flag setting corresponds to the RADVD AdvManagedFlag configuration setting, which instructs clients to use DHCPv6 for address autoconfiguration.

Configure the DHCPv6 service

While in theory it is possible for clients to assign IPv6 addresses without the use of a DHCP server using stateless address autoconfiguration, this requires that we’re using a /64 subnet (see e.g. RFC 7421). There is no such limitation when using DHCPv6.

set service dhcpv6-server shared-network-name VLAN10 subnet 2001:470:1237:1212::/110 address-range start 2001:470:1237:1212::10 stop 2001:470:1237:1212::3:ffff
set service dhcpv6-server shared-network-name VLAN10 subnet 2001:470:1237:1212::/110 name-server 2001:470:1237:1212::1
set service dhcpv6-server shared-network-name VLAN10 subnet 2001:470:1237:1212::/110 domain-search house
set service dhcpv6-server shared-network-name VLAN10 subnet 2001:470:1237:1212::/110 lease-time default 86400

Here I’m largely setting things up to mirror the configuration of the IPv4 DHCP server for the name-server, domain-search, and lease-time settings. I’m letting the DHCPv6 server allocate pretty much the entire network range, with the exception of the addresses below ::10 (the first 16 addresses, since ::10 is hexadecimal).

Commit the changes

After making the above changes they need to be activated:

commit

Verify the configuration

This produces the following interface configuration for tun0:

13: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/sit 0.0.0.0 peer 209.51.161.14
inet6 2001:470:1236:1212::2/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::c0a8:101/64 scope link
valid_lft forever preferred_lft forever
inet6 fe80::6c07:49c7/64 scope link
valid_lft forever preferred_lft forever

And for switch0.10:

ubnt@ubnt:~$ ip addr show switch0.10
14: switch0.10@switch0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 78:8a:20:bb:05:db brd ff:ff:ff:ff:ff:ff
inet6 2001:470:1237:1212::1/110 scope global
valid_lft forever preferred_lft forever
inet6 fe80::7a8a:20ff:febb:5db/64 scope link
valid_lft forever preferred_lft forever

And the following route configuration:

ubnt@ubnt:~$ ip -6 route | grep -v fe80
2001:470:1236:1212::/64 dev tun0 proto kernel metric 256 pref medium
2001:470:1237:1212::/110 dev switch0.10 proto kernel metric 256 pref medium
default via 2001:470:1236:1212::1 dev tun0 proto zebra metric 1024 pref medium

We can confirm things are properly configured by accessing a remote service that reports our ip address:

ubnt@ubnt:~$ curl https://api64.ipify.org
2001:470:1236:1212::2
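The link-local address shown on switch0.10 above is a handy illustration of why stateless autoconfiguration expects a 64-bit host part: the classic scheme (modified EUI-64) builds a 64-bit interface identifier out of the interface's MAC address. The sketch below reproduces that derivation for the router's MAC. The function name is mine, and many hosts now use randomized or stable-privacy identifiers instead, so treat this as a sketch of the classic behavior only.

```python
import ipaddress


def eui64_link_local(mac: str) -> ipaddress.IPv6Address:
    """Derive the modified EUI-64 link-local address for a 48-bit MAC."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    # Flip the universal/local bit of the first octet.
    octets[0] ^= 0x02
    # Insert ff:fe in the middle to stretch 48 bits to 64.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    # Prepend the fe80::/64 link-local prefix (fe80 followed by zeros).
    prefix = bytes.fromhex("fe80") + bytes(6)
    return ipaddress.IPv6Address(prefix + interface_id)


# MAC address of switch0.10, taken from the ip addr output above.
router_mac = "78:8a:20:bb:05:db"
print(eui64_link_local(router_mac))
```

The result matches the fe80:: address that ip addr reports for switch0.10.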

Configure the switch

In my home network, devices in my office connect to a switch, and the switch connects back to the router. I need to configure the switch (an older Netgear M4100-D12G) to pass the VLAN on to the desktop.

Add vlan 10 to the vlan database with name ipv6net0

I start by defining the VLAN in the VLAN database:

vlan database
vlan 10
vlan name 10 ipv6net0
exit

Configure vlan 10 as a tagged member of ports 1-10

Next, I configure the switch to pass VLAN 10 as a tagged VLAN on all switch interfaces:

configure
interface 0/1-0/10
vlan participation include 10
vlan tagging 10
exit
exit

Configure the desktop

With the above configuration in place, traffic on VLAN 10 will arrive on my Linux desktop (which is connected to the switch we configured in the previous step). I can use nmcli, the NetworkManager CLI, to add a VLAN interface (I’m using Fedora 37, which uses NetworkManager to manage network interface configuration; other distributions may have different tooling).

The following command will create a connection named vlan10. Bringing up the connection will create an interface named vlan10, configured to receive traffic on VLAN 10 arriving on eth0:

nmcli con add type vlan con-name vlan10 ifname vlan10 dev eth0 id 10 ipv6.method auto
nmcli con up vlan10

This produces the following interface configuration:

$ ip addr show vlan10
7972: vlan10@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 2c:f0:5d:c9:12:a9 brd ff:ff:ff:ff:ff:ff
inet6 2001:470:1237:1212::2:c19a/128 scope global dynamic noprefixroute
valid_lft 85860sec preferred_lft 53460sec
inet6 fe80::ced8:1750:d67c:2ead/64 scope link noprefixroute
valid_lft forever preferred_lft forever

And the following route configuration:

$ ip -6 route show | grep vlan10
2001:470:1237:1212::2:c19a dev vlan10 proto kernel metric 404 pref medium
2001:470:1237:1212::/110 dev vlan10 proto ra metric 404 pref medium
fe80::/64 dev vlan10 proto kernel metric 1024 pref medium
default via fe80::7a8a:20ff:febb:5db dev vlan10 proto ra metric 404 pref medium

We can confirm things are properly configured by accessing a remote service that reports our ip address:

$ curl https://api64.ipify.org
2001:470:1237:1212::2:c19a

Note that unlike access using IPv4, the address visible here is the address assigned to our local interface. There is no NAT happening at the router.


Cover image by Chris Woodford/explainthatstuff.com, licensed under CC BY-NC-SA 3.0.


  1. Some services (Netflix is a notable example) block access over the IPv6 tunnels because it breaks their geolocation process and prevents them from determining your country of origin. I don’t want to break things for other folks in my house just because I want to play with IPv6. ↩︎

November 16, 2022 12:00 AM

November 14, 2022

RDO Blog

RDO Zed Released

The RDO community is pleased to announce the general availability of the RDO build for OpenStack Zed for RPM-based distributions, CentOS Stream and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Zed is the 26th release from the OpenStack project, which is the work of more than 1,000 contributors from around the world. As with the Upstream release, this release of RDO is dedicated to Ilya Etingof who was an upstream and RDO contributor.

The release is already available for CentOS Stream 9 on the CentOS mirror network in:
http://mirror.stream.centos.org/SIGs/9-stream/cloud/x86_64/openstack-zed/

The RDO community project curates, packages, builds, tests and maintains a complete OpenStack component set for RHEL and CentOS Stream and is a member of the CentOS Cloud Infrastructure SIG. The Cloud Infrastructure SIG focuses on delivering a great user experience for CentOS users looking to build and maintain their own on-premise, public or hybrid clouds.

All work on RDO and on the downstream release, Red Hat OpenStack Platform, is 100% open source, with all code changes going upstream first.

The highlights of the broader upstream OpenStack project may be read via https://releases.openstack.org/zed/highlights.html

TripleO in the RDO Zed release:

Since the Xena development cycle, TripleO follows the Independent release model (https://specs.openstack.org/openstack/tripleo-specs/specs/xena/tripleo-independent-release.html).

For the Zed cycle, the TripleO project will maintain and validate stable Zed branches. As for the rest of the packages, RDO will update and publish the releases created during the maintenance cycle.

Contributors

During the Zed cycle, we saw the following new RDO contributors:

  • Miguel Garcia Cruces
  • Michael Johnson
  • René Ribaud
  • Paras Babbar
  • Maurício Harley
  • Jesse Pretorius
  • Francesco Pantano
  • Carlos Eduardo
  • Arun KV

Welcome to all of you and Thank You So Much for participating!

But we wouldn’t want to overlook anyone. A super massive Thank You to all 57 contributors who participated in producing this release. This list includes commits to rdo-packages, rdo-infra, and redhat-website repositories:

  • Adriano Vieira Petrich
  • Alan Bishop
  • Alan Pevec
  • Alfredo Moralejo Alonso
  • Amol Kahat
  • Amy Marrich
  • Ananya Banerjee
  • Arun KV
  • Arx Cruz
  • Bhagyashri Shewale
  • Carlos Eduardo
  • Chandan Kumar
  • Cédric Jeanneret
  • Daniel Pawlik
  • Dariusz Smigiel
  • Douglas Viroel
  • Emma Foley
  • Eric Harney
  • Fabien Boucher
  • Francesco Pantano
  • Gregory Thiemonge
  • Jakob Meng
  • Jesse Pretorius
  • Jiří Podivín
  • Joel Capitao
  • Jon Schlueter
  • Julia Kreger
  • Karolina Kula
  • Leif Madsen
  • Lon Hohberger
  • Luigi Toscano
  • Marios Andreou
  • Martin Kopec
  • Mathieu Bultel
  • Matthias Runge
  • Maurício Harley
  • Michael Johnson
  • Miguel Garcia Cruces
  • Nate Johnston
  • Nicolas Hicher
  • Paras Babbar
  • Pooja Jadhav
  • Rabi Mishra
  • Rafael Castillo
  • René Ribaud
  • Riccardo Pittau
  • Ronelle Landy
  • Sagi Shnaidman
  • Sandeep Yadav
  • Sean Mooney
  • Shreshtha Joshi
  • Slawomir Kaplonski
  • Steve Baker
  • Takashi Kajinami
  • Tobias Urdin
  • Tristan De Cacqueray
  • Yatin Karel

The Next Release Cycle

At the end of one release, focus shifts immediately to the next release, i.e., Antelope.

Get Started

To spin up a proof of concept cloud, quickly, and on limited hardware, try an All-In-One Packstack installation. You can run RDO on a single node to get a feel for how it works.

Finally, for those that don’t have any hardware or physical resources, there’s the OpenStack Global Passport Program. This is a collaborative effort between OpenStack public cloud providers to let you experience the freedom, performance and interoperability of open source infrastructure. You can quickly and easily gain access to OpenStack infrastructure via trial programs from participating OpenStack public cloud providers around the world.

Get Help

The RDO Project has the users@lists.rdoproject.org mailing list for RDO-specific users and operators. For more developer-oriented content we recommend joining the dev@lists.rdoproject.org mailing list. Remember to post a brief introduction about yourself and your RDO story. The mailing list archives are all available at https://mail.rdoproject.org. You can also find extensive documentation on RDOproject.org.

The #rdo channel on OFTC IRC is also an excellent place to find and give help. We also welcome comments and requests on the CentOS devel mailing list and the CentOS and TripleO IRC channels (#centos, #centos-devel in Libera.Chat network, and #tripleo on OFTC), however we have a more focused audience within the RDO venues.

Get Involved

To get involved in the OpenStack RPM packaging effort, check out the RDO contribute pages, peruse the CentOS Cloud SIG page, and inhale the RDO packaging documentation.

Join us in #rdo and #tripleo on the OFTC IRC network and follow us on Twitter @RDOCommunity. You can also find us on Facebook and YouTube.

by Amy Marrich at November 14, 2022 01:12 PM

November 13, 2022

Lars Kellogg-Stedman

Using KeyOxide

In today’s post, we look at KeyOxide, a service that allows you to cryptographically assert ownership of online resources using your GPG key. Some aspects of the service are less than obvious; in response to some questions I saw on Mastodon, I thought I would put together a short guide to making use of the service.

We’re going to look at the following high-level tasks:

  1. Create a GPG key

  2. Publish the GPG key

  3. Use the GPG key to assert claims on online resources

Step 1: Create a GPG keypair

If you already have a keypair, skip on to “Step 2: Publish your key”.

The first thing you need to do is set up a GPG[1] keypair and publish it to a keyserver (or a WKD endpoint). There are many guides out there that step you through the process (for example, GitHub’s guide on Generating a new GPG key), but if you’re in a hurry and not particularly picky, read on.

This assumes that you’re using a recent version of GPG; at the time of this writing, the current GPG release is 2.3.8, but these instructions seem to work at least with version 2.2.27.

  1. Generate a new keypair using the --quick-gen-key option:

    gpg --batch --quick-gen-key <your email address>
    

    This will use the GPG defaults for the key algorithm and expiration time, both of which vary by version[2]. In the example output below, the key expires two years after creation.

  2. When prompted, enter a secure passphrase.

  3. GPG will create a keypair for you; you can view it after the fact by running:

    gpg -qk <your email address>
    

    You should see something like:

    pub ed25519 2022-11-13 [SC] [expires: 2024-11-12]
    EC03DFAC71DB3205EC19BAB1404E03D044EE706B
    uid [ultimate] testuser@example.com
    sub cv25519 2022-11-13 [E]
    

    In the above output, EC03DFAC71DB3205EC19BAB1404E03D044EE706B is the key fingerprint. We’re going to need this later.

Now you have created a GPG keypair!

Step 2: Publish your key

If you’ve already published your key at https://keys.openpgp.org/ or at a WKD endpoint, skip on to “Step 3: Add a claim”.

In order for KeyOxide to find your GPG key, it needs to be published at a known location. There are two choices:

  • The keys.openpgp.org keyserver
  • A WKD endpoint

In this post, we’re only going to consider the first option.

  1. Export your public key to a file using gpg’s --export option:

    gpg --export -a <your email address> > mykey.asc
    

    This will create a file mykey.asc in your current directory that looks like:

    -----BEGIN PGP PUBLIC KEY BLOCK-----
    [...a bunch of base64 encoded text...]
    -----END PGP PUBLIC KEY BLOCK-----
    
  2. Go to https://keys.openpgp.org/upload.

  3. Select the key export you just created, and select “upload”.

  4. When prompted on the next page, select “Send Verification Email”. Your key won’t be discoverable until you have received and responded to the verification email.

  5. When you receive the email, select the verification link.

Now your key has been published! You can verify this by going to https://keys.openpgp.org/ and searching for your email address.

Step 3: Add a claim

You assert ownership of an online resource through a three-step process:

  1. Mark the online resource with your GPG key fingerprint. How you do this depends on the type of resource you’re claiming; e.g., for GitHub you create a gist with specific content, while for claiming a DNS domain you create a TXT record.

  2. Add a notation to your GPG key with a reference to the claim created in the previous step.

  3. Update your published key.

In this post we’re going to look at two specific examples; for other services, see the “Service providers” section of the KeyOxide documentation.

In order to follow any of the following instructions, you’re going to need to know your key fingerprint. When you show your public key by running gpg -k, your key fingerprint is the long hexadecimal string on the line following the line that starts with pub:

$ gpg -qk testuser@example.com
pub ed25519 2022-11-13 [SC] [expires: 2024-11-12]
EC03DFAC71DB3205EC19BAB1404E03D044EE706B <--- THIS LINE HERE
uid [ultimate] testuser@example.com
sub cv25519 2022-11-13 [E]

Add a claim to your GPG key

This is a set of common instructions that we’ll use every time we need to add a claim to our GPG key.

  1. Edit your GPG key using the --edit-key option:

    gpg --edit-key <your email address>
    

    This will drop you into the GPG interactive key editor.

  2. Select a user id on which to operate using the uid command. If you created your key following the instructions earlier in this post, then you only have a single user id:

    gpg> uid 1
    
  3. Add a notation to the key using the notation command:

    gpg> notation
    
  4. When prompted, enter the notation (the format of the notation depends on the service you’re claiming; see below for details). For example, if we’re asserting a Mastodon identity at hachyderm.io, we would enter:

    Enter the notation: proof@ariadne.id=https://hachyderm.io/@testuser
    
  5. Save your changes with the save command:

    gpg> save
    

Update your published key

After adding a notation to your key locally, you need to publish those changes. One way of doing this is simply following the instructions for initially uploading your public key:

  1. Export the key to a file:

    gpg --export -a <your email address> > mykey.asc
    
  2. Upload your key to https://keys.openpgp.org/upload.

You won’t have to re-verify your key.

Alternately, you can configure gpg so that you can publish your key from the command line. Create or edit $HOME/.gnupg/gpg.conf and add the following line:

keyserver hkps://keys.openpgp.org

Now every time you need to update the published version of your key:

  1. Upload your public key using the --send-keys option along with your key fingerprint, e.g.:

    gpg --send-keys EC03DFAC71DB3205EC19BAB1404E03D044EE706B
    

Claiming a Mastodon identity

  1. On your favorite Mastodon server, go to your profile and select “Edit profile”.

  2. Look for the “Profile metadata” section; this allows you to associate four bits of metadata with your Mastodon profile. Assuming that you still have a slot free, give it a name (it could be anything; I went with “Keyoxide claim”), and for the value enter:

    openpgp4fpr:<your key fingerprint>
    

    E.g., given the gpg -k output shown above, I would enter:

    openpgp4fpr:EC03DFAC71DB3205EC19BAB1404E03D044EE706B
    
  3. Click “Save Changes”

Now, add the claim to your GPG key by adding the notation proof@ariadne.id=https://<your mastodon server>/@<your mastodon username>. I am @larsks@hachyderm.io, so I would enter:

proof@ariadne.id=https://hachyderm.io/@larsks

After adding the claim, update your published key.

Claiming a GitHub identity

  1. Create a new gist (it can be either secret or public).

  2. In your gist, name the file openpgp.md.

  3. Set the content of that file to:

    openpgp4fpr:<your key fingerprint>
    

Now, add the claim to your GPG key by adding the notation proof@ariadne.id=https://gist.github.com/larsks/<gist id>. You can see my claim at https://gist.github.com/larsks/9224f58cf82bdf95ef591a6703eb91c7; the notation I added to my key is:

proof@ariadne.id=https://gist.github.com/larsks/9224f58cf82bdf95ef591a6703eb91c7

After adding the claim, update your published key.
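Both claims follow the same pattern: a marker containing your fingerprint goes on the resource, and a notation pointing at the resource goes on your key. Here is a small sketch of those two strings. The helper names are mine, and the check assumes a 40-hex-digit fingerprint like the ones shown in this post.

```python
import re


def check_fingerprint(fingerprint: str) -> str:
    """Strip spaces and verify this looks like a 40-hex-digit fingerprint."""
    cleaned = fingerprint.replace(" ", "")
    if not re.fullmatch(r"[0-9A-Fa-f]{40}", cleaned):
        raise ValueError(f"not a 40-digit hex fingerprint: {fingerprint!r}")
    return cleaned


def proof_marker(fingerprint: str) -> str:
    """Text to place on the resource you are claiming."""
    return f"openpgp4fpr:{check_fingerprint(fingerprint)}"


def claim_notation(resource_url: str) -> str:
    """Notation to enter at gpg's notation prompt."""
    return f"proof@ariadne.id={resource_url}"


# Example fingerprint and Mastodon URL taken from earlier in this post.
fingerprint = "EC03DFAC71DB3205EC19BAB1404E03D044EE706B"
print(proof_marker(fingerprint))
print(claim_notation("https://hachyderm.io/@testuser"))
```

The first line is what goes into the Mastodon metadata field or the gist; the second is what you enter in the key editor.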

Step 4: View your claims

You’ll note that none of the previous steps required interacting with KeyOxide. That’s because KeyOxide doesn’t actually store any of your data: it just provides a mechanism for visualizing and verifying claims.

You can look up an identity by email address or by GPG key fingerprint.

To look up an identity using an email address:

  1. Go to https://keyoxide.org/<email address>. For example, to find my identity, visit https://keyoxide.org/lars@oddbit.com.

To look up an identity by key fingerprint:

  1. Go to https://keyoxide.org/<fingerprint>. For example, to find my identity, visit https://keyoxide.org/3e70a502bb5255b6bb8e86be362d63a80853d4cf.

  1. The pedantic among you will already be writing to me about how PGP is the standard and GPG is an implementation of that standard, but I’m going to stick with this nomenclature for the sake of simplicity. ↩︎

  2. For some thoughts on key expiration, see this question on the Information Security StackExchange. ↩︎

November 13, 2022 12:00 AM

September 22, 2022

Lars Kellogg-Stedman

Delete GitHub workflow runs using the gh cli

Hello, future me. This is for you next time you want to do this.

When setting up the CI for a project I will sometimes end up with a tremendous clutter of workflow runs. Sometimes they have embarrassing mistakes. Who wants to show that to people? I was trying to figure out how to bulk delete workflow runs from the CLI, and I came up with something that works:

gh run list --json databaseId -q '.[].databaseId' |
xargs -IID gh api \
"repos/$(gh repo view --json nameWithOwner -q .nameWithOwner)/actions/runs/ID" \
-X DELETE

This will delete all (well, up to 20, or whatever you set in --limit) your workflow runs. You can add flags to gh run list to filter runs by workflow or by triggering user.
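If you would rather review what is about to be deleted first, the same idea can be sketched in Python as a dry run that only builds the gh api commands. The function name, repository, and run IDs below are made up for illustration; the API path is the one used in the pipeline above.

```python
def delete_run_commands(repo: str, run_ids: list[int]) -> list[list[str]]:
    """Build one `gh api ... -X DELETE` argument list per workflow run ID."""
    return [
        ["gh", "api", f"repos/{repo}/actions/runs/{run_id}", "-X", "DELETE"]
        for run_id in run_ids
    ]


# Hypothetical repository and run IDs, for illustration only.
commands = delete_run_commands("example-owner/example-repo", [101, 102])
for argv in commands:
    print(" ".join(argv))
```

Each list could then be handed to subprocess.run once you are happy with it.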

September 22, 2022 12:00 AM

September 10, 2022

Lars Kellogg-Stedman

Kubernetes, connection timeouts, and the importance of labels

We are working with an application that produces resource utilization reports for clients of our OpenShift-based cloud environments. The developers working with the application have been reporting mysterious issues concerning connection timeouts between the application and the database (a MariaDB instance). For a long time we had only high-level verbal descriptions of the problem (“I’m seeing a lot of connection timeouts!”) and a variety of unsubstantiated theories (from multiple sources) about the cause. Absent a solid reproducer of the behavior in question, we looked at other aspects of our infrastructure:

  • Networking seemed fine (we weren’t able to find any evidence of interface errors, packet loss, or bandwidth issues)
  • Storage in most of our cloud environments is provided by remote Ceph clusters. In addition to not seeing any evidence of network problems in general, we weren’t able to demonstrate specific problems with our storage, either (we did spot some performance variation between our Ceph clusters that may be worth investigating in the future, but it wasn’t the sort that would cause the problems we’re seeing)
  • My own attempts to reproduce the behavior using mysqlslap did not demonstrate any problems, even though we were driving a far larger number of connections and queries/second in the benchmarks than we were in the application.

What was going on?

I was finally able to get my hands on container images, deployment manifests, and instructions to reproduce the problem this past Friday. After working through some initial errors that weren’t the errors we were looking for (insert Jedi hand gesture here), I was able to see the behavior in practice. In a section of code that makes a number of connections to the database, we were seeing:

Failed to create databases:
Command returned non-zero value '1': ERROR 2003 (HY000): Can't connect to MySQL server on 'mariadb' (110)
#0 /usr/share/xdmod/classes/CCR/DB/MySQLHelper.php(521): CCR\DB\MySQLHelper::staticExecuteCommand(Array)
#1 /usr/share/xdmod/classes/CCR/DB/MySQLHelper.php(332): CCR\DB\MySQLHelper::staticExecuteStatement('mariadb', '3306', 'root', 'pass', NULL, 'SELECT SCHEMA_N...')
#2 /usr/share/xdmod/classes/OpenXdmod/Shared/DatabaseHelper.php(65): CCR\DB\MySQLHelper::databaseExists('mariadb', '3306', 'root', 'pass', 'mod_logger')
#3 /usr/share/xdmod/classes/OpenXdmod/Setup/DatabaseSetupItem.php(39): OpenXdmod\Shared\DatabaseHelper::createDatabases('root', 'pass', Array, Array, Object(OpenXdmod\Setup\Console))
#4 /usr/share/xdmod/classes/OpenXdmod/Setup/DatabaseSetup.php(109): OpenXdmod\Setup\DatabaseSetupItem->createDatabases('root', 'pass', Array, Array)
#5 /usr/share/xdmod/classes/OpenXdmod/Setup/Menu.php(69): OpenXdmod\Setup\DatabaseSetup->handle()
#6 /usr/bin/xdmod-setup(37): OpenXdmod\Setup\Menu->display()
#7 /usr/bin/xdmod-setup(22): main()
#8 {main}

Where 110 is ETIMEDOUT, “Connection timed out”.

The application consists of two Deployment resources, one that manages a MariaDB pod and another that manages the application itself. There are also the usual suspects, such as PersistentVolumeClaims for the database backing store, etc, and a Service to allow the application to access the database.

While looking at this problem, I attempted to look at the logs for the application by running:

kubectl logs deploy/moc-xdmod

But to my surprise, I found myself looking at the logs for the MariaDB container instead…which provided me just about all the information I needed about the problem.

How do Deployments work?

To understand what’s going on, let’s first take a closer look at a Deployment manifest. The basic framework is something like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: example
  name: example
spec:
  selector:
    matchLabels:
      app: example
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: example
        image: docker.io/alpine:latest
        command:
        - sleep
        - inf

There are labels in three places in this manifest:

  1. The Deployment itself has labels in the metadata section.

  2. There are labels in spec.template.metadata that will be applied to Pods spawned by the Deployment.

  3. There are labels in spec.selector which, in the words of the documentation:

    defines how the Deployment finds which Pods to manage

It’s not spelled out explicitly anywhere, but the spec.selector field is also used to identify which pods to attach to when using the Deployment name in a command like kubectl logs: that is, given the above manifest, running kubectl logs deploy/example would look for pods that have label app set to example.

With this in mind, let’s take a look at how our application manifests are being deployed. Like most of our applications, this is deployed using Kustomize. The kustomization.yaml file for the application manifests looked like this:

commonLabels:
  app: xdmod
resources:
  - svc-mariadb.yaml
  - deployment-mariadb.yaml
  - deployment-xdmod.yaml

That commonLabels statement will apply the label app: xdmod to all of the resources managed by the kustomization.yaml file. The Deployments looked like this:

For MariaDB:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mariadb
spec:
  selector:
    matchLabels:
      app: mariadb
  template:
    metadata:
      labels:
        app: mariadb

For the application experiencing connection problems:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: moc-xdmod
spec:
  selector:
    matchLabels:
      app: xdmod
  template:
    metadata:
      labels:
        app: xdmod

The problem here is that when these are processed by kustomize, the app label hardcoded in the manifests will be replaced by the app label defined in the commonLabels section of kustomization.yaml. When we run kustomize build on these manifests, we will have as output:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: xdmod
  name: mariadb
spec:
  selector:
    matchLabels:
      app: xdmod
  template:
    metadata:
      labels:
        app: xdmod
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: xdmod
  name: moc-xdmod
spec:
  selector:
    matchLabels:
      app: xdmod
  template:
    metadata:
      labels:
        app: xdmod

In other words, all of our pods will have the same labels (because the spec.template.metadata.labels section is identical in both Deployments). When I run kubectl logs deploy/moc-xdmod, I’m just getting whatever the first match is for a query that is effectively the same as kubectl get pod -l app=xdmod.
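To make the collision concrete, here is a small Python model of what happens. It is not how kustomize or Kubernetes are implemented; it just mimics the two behaviors described above (common labels overriding a label with the same key, and selectors matching on label equality). The pod names and helper functions are made up for illustration.

```python
def apply_common_labels(labels: dict, common: dict) -> dict:
    """Merge labels the way the kustomize output above shows: common wins."""
    return {**labels, **common}


def select_pods(pods: dict, selector: dict) -> list:
    """Return names of pods whose labels contain every selector pair."""
    return sorted(
        name
        for name, labels in pods.items()
        if all(labels.get(key) == value for key, value in selector.items())
    )


common_labels = {"app": "xdmod"}

# Pod template labels as written in the two Deployment manifests.
pods = {
    "mariadb-pod": apply_common_labels({"app": "mariadb"}, common_labels),
    "moc-xdmod-pod": apply_common_labels({"app": "xdmod"}, common_labels),
}

# The selector of the moc-xdmod Deployment after kustomize has run.
selector = apply_common_labels({"app": "xdmod"}, common_labels)

matched = select_pods(pods, selector)
print(matched)
```

Both pods come back, which is why a command aimed at one Deployment can land on the other Deployment's pod.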

So, that’s what was going on with the kubectl logs command.

How do services work?

A Service manifest in Kubernetes looks something like this:

apiVersion: v1
kind: Service
metadata:
  name: mariadb
spec:
  selector:
    app: mariadb
  ports:
  - protocol: TCP
    port: 3306
    targetPort: 3306

Here, spec.selector has a function very similar to what it had in a Deployment: it selects pods to which the Service will direct traffic. From the documentation, we know that a Service proxy will select a backend either in a round-robin fashion (using the legacy user-space proxy) or in a random fashion (using the iptables proxy) (there is also an IPVS proxy mode, but that’s not available in our environment).

Given what we know from the previous section about Deployments, you can probably see what’s going on here:

  1. There are multiple pods with identical labels that are providing distinct services
  2. For each incoming connection, the service proxy selects a Pod based on the labels in the service’s spec.selector.
  3. With only two pods involved, there’s a 50% chance that traffic targeting our MariaDB instance will in fact be directed to the application pod, which will simply drop the traffic (because it’s not listening on the appropriate port).
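The 50% figure is easy to check with a toy simulation. The sketch below assumes a uniformly random choice between the two matching pods, which is a simplification of the iptables proxy mode, and it ignores retries and everything else a real network does. The function and backend names are mine.

```python
import random


def misrouted_fraction(backends: list, attempts: int, seed: int = 0) -> float:
    """Return the fraction of connections sent to a non-database backend."""
    rng = random.Random(seed)
    misrouted = sum(
        1 for _ in range(attempts) if rng.choice(backends) != "mariadb"
    )
    return misrouted / attempts


# Both pods carry the labels the Service selects on.
backends = ["mariadb", "moc-xdmod"]
failure_rate = misrouted_fraction(backends, attempts=10_000)
print(f"{failure_rate:.1%} of connections reached the wrong pod")
```

With two equally likely backends, roughly half of the simulated connections end up at the application pod rather than the database.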

We can see the impact of this behavior by running a simple loop that attempts to connect to MariaDB and run a query:

while :; do
  _start=$SECONDS
  echo -n "$(date +%T) "
  timeout 10 mysql -h mariadb -uroot -ppass -e 'select 1' > /dev/null && echo -n OKAY || echo -n FAILED
  echo " $(( SECONDS - _start))"
  sleep 1
done

Which outputs:

01:41:30 OKAY 1
01:41:32 OKAY 0
01:41:33 OKAY 1
01:41:35 OKAY 0
01:41:36 OKAY 3
01:41:40 OKAY 1
01:41:42 OKAY 0
01:41:43 OKAY 3
01:41:47 OKAY 3
01:41:51 OKAY 4
01:41:56 OKAY 1
01:41:58 OKAY 1
01:42:00 FAILED 10
01:42:10 OKAY 0
01:42:11 OKAY 0

Here we can see that connection time is highly variable, and we occasionally hit the 10 second timeout imposed by the timeout call.

Solving the problem

In order to resolve this behavior, we want to ensure (a) that Pods managed by a Deployment are uniquely identified by their labels and that (b) spec.selector for both Deployments and Services will only select the appropriate Pods. We can do this with a few simple changes.

It’s useful to apply some labels consistently across all of the resources we generate, so we’ll keep the existing commonLabels section of our kustomization.yaml:

commonLabels:
  app: xdmod

But then in each Deployment we’ll add a component label identifying the specific service, like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mariadb
  labels:
    component: mariadb
spec:
  selector:
    matchLabels:
      component: mariadb
  template:
    metadata:
      labels:
        component: mariadb

When we generate the final manifest with kustomize, we end up with:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: xdmod
    component: mariadb
  name: mariadb
spec:
  selector:
    matchLabels:
      app: xdmod
      component: mariadb
  template:
    metadata:
      labels:
        app: xdmod
        component: mariadb

In the above output, you can see that kustomize has combined the commonLabels definition with the labels configured individually in the manifests. With this change, spec.selector will now select only the pod in which MariaDB is running.

We’ll similarly modify the Service manifest to look like:

apiVersion: v1
kind: Service
metadata:
  name: mariadb
spec:
  selector:
    component: mariadb
  ports:
  - protocol: TCP
    port: 3306
    targetPort: 3306

Resulting in a generated manifest that looks like:

apiVersion: v1
kind: Service
metadata:
  labels:
    app: xdmod
  name: mariadb
spec:
  ports:
  - port: 3306
    protocol: TCP
    targetPort: 3306
  selector:
    app: xdmod
    component: mariadb

Which, as with the Deployment, will now select only the correct pods.
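As a quick check of that claim, here is a toy model of the selector match against the labels each pod ends up with. The pod names are invented, and the component: xdmod value for the application pod is my assumption, since only the MariaDB manifest is shown above.

```python
def matches(labels: dict, selector: dict) -> bool:
    """True when the labels contain every key/value pair in the selector."""
    return all(labels.get(key) == value for key, value in selector.items())


# Pod labels after kustomize adds app: xdmod next to each component label.
# The "xdmod" component value is assumed; only mariadb's manifest is shown.
pods = {
    "mariadb-pod": {"app": "xdmod", "component": "mariadb"},
    "moc-xdmod-pod": {"app": "xdmod", "component": "xdmod"},
}

# Selector from the generated Service manifest.
service_selector = {"app": "xdmod", "component": "mariadb"}

selected = [
    name for name, labels in pods.items() if matches(labels, service_selector)
]
print(selected)
```

Only the MariaDB pod satisfies both label requirements, so the Service has a single backend again.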

With these changes in place, if we re-run the test loop I presented earlier, we see as output:

01:57:27 OKAY 0
01:57:28 OKAY 0
01:57:29 OKAY 0
01:57:30 OKAY 0
01:57:31 OKAY 0
01:57:32 OKAY 0
01:57:33 OKAY 0
01:57:34 OKAY 0
01:57:35 OKAY 0
01:57:36 OKAY 0
01:57:37 OKAY 0
01:57:38 OKAY 0
01:57:39 OKAY 0
01:57:40 OKAY 0

There is no variability in connection time, and there are no timeouts.

September 10, 2022 12:00 AM

June 20, 2022

Lars Kellogg-Stedman

Directing different ports to different containers with Traefik

This post is mostly for myself: I find the Traefik documentation hard to navigate, so having figured this out in response to a question on Stack Overflow, I’m putting it here to help it stick in my head.

The question asks essentially how to perform port-based routing of requests to containers, so that a request for http://example.com goes to one container while a request for http://example.com:9090 goes to a different container.

Creating entrypoints

A default Traefik configuration will already have a listener on port 80, but if we want to accept connections on port 9090 we need to create a new listener: what Traefik calls an entrypoint. We do this using the --entrypoints.<name>.address option. For example, --entrypoints.ep1.address=:80 creates an entrypoint named ep1 on port 80, while --entrypoints.ep2.address=:9090 creates an entrypoint named ep2 on port 9090. Those names are important because we’ll use them for mapping containers to the appropriate listener later on.

This gives us a Traefik configuration that looks something like:

  proxy:
    image: traefik:latest
    command:
      - --api.insecure=true
      - --providers.docker
      - --entrypoints.ep1.address=:80
      - --entrypoints.ep2.address=:9090
    ports:
      - "80:80"
      - "127.0.0.1:8080:8080"
      - "9090:9090"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

We need to publish ports 80 and 9090 on the host in order to accept connections. Port 8080 is by default the Traefik dashboard; in this configuration I have it bound to localhost because I don’t want to provide external access to the dashboard.

Routing services

Now we need to configure our services so that connections on ports 80 and 9090 will get routed to the appropriate containers. We do this using the traefik.http.routers.<name>.entrypoints label. Here’s a simple example:

  app1:
    image: docker.io/alpinelinux/darkhttpd:latest
    labels:
      - traefik.http.routers.app1.entrypoints=ep1
      - traefik.http.routers.app1.rule=Host(`example.com`)

In the above configuration, we’re using the following labels:

  • traefik.http.routers.app1.entrypoints=ep1

    This binds our app1 container to the ep1 entrypoint.

  • traefik.http.routers.app1.rule=Host(`example.com`)

    This matches requests with Host: example.com.

So in combination, these two rules say that any request on port 80 for Host: example.com will be routed to the app1 container.

To get port 9090 routed to a second container, we add:

  app2:
    image: docker.io/alpinelinux/darkhttpd:latest
    labels:
      - traefik.http.routers.app2.rule=Host(`example.com`)
      - traefik.http.routers.app2.entrypoints=ep2

This is the same thing, except we use entrypoint ep2.
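The combined effect of the entrypoint and rule labels can be pictured as a lookup keyed on listening port and Host header. The sketch below is a toy model of that idea, not Traefik's actual matching logic; the table and function names are mine, and the values mirror the configuration above.

```python
from typing import Optional

# Entrypoint name -> listening port, as on the Traefik command line.
ENTRYPOINTS = {"ep1": 80, "ep2": 9090}

# Router name -> (entrypoint, Host() rule value), as in the container labels.
ROUTERS = {
    "app1": ("ep1", "example.com"),
    "app2": ("ep2", "example.com"),
}


def route(port: int, host: str) -> Optional[str]:
    """Return the router that would handle a request, or None."""
    for name, (entrypoint, rule_host) in ROUTERS.items():
        if ENTRYPOINTS[entrypoint] == port and rule_host == host:
            return name
    return None


print(route(80, "example.com"))
print(route(9090, "example.com"))
print(route(9090, "other.example"))
```

The first two requests resolve to app1 and app2 respectively, and a request for a host that no rule mentions matches nothing.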

With everything running, we can watch the logs from docker-compose up and see that a request on port 80:

curl -H 'host: example.com' localhost

Is serviced by app1:

app1_1 | 172.20.0.2 - - [21/Jun/2022:02:44:11 +0000] "GET / HTTP/1.1" 200 354 "" "curl/7.76.1"

And that request on port 9090:

curl -H 'host: example.com' localhost:9090

Is serviced by app2:

app2_1 | 172.20.0.2 - - [21/Jun/2022:02:44:39 +0000] "GET / HTTP/1.1" 200 354 "" "curl/7.76.1"

The complete docker-compose.yaml file from this post looks like:

version: "3"
services:
  proxy:
    image: traefik:latest
    command:
      - --api.insecure=true
      - --providers.docker
      - --entrypoints.ep1.address=:80
      - --entrypoints.ep2.address=:9090
    ports:
      - "80:80"
      - "127.0.0.1:8080:8080"
      - "9090:9090"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
  app1:
    image: docker.io/alpinelinux/darkhttpd:latest
    labels:
      - traefik.http.routers.app1.rule=Host(`example.com`)
      - traefik.http.routers.app1.entrypoints=ep1
  app2:
    image: docker.io/alpinelinux/darkhttpd:latest
    labels:
      - traefik.http.routers.app2.rule=Host(`example.com`)
      - traefik.http.routers.app2.entrypoints=ep2

June 20, 2022 12:00 AM

May 17, 2022

Adam Young

Errors running Keystone pep8

The command to run the formatting tests for the keystone project is:

tox -e pep8

Running this on Fedora 35 failed for me with this error:

ERROR:   pep8: could not install deps [-chttps://releases.openstack.org/constraints/upper/master, -r/opt/stack/keystone/test-requirements.txt, .[ldap,memcache,mongodb]]; v = InvocationError("/opt/stack/keystone/.tox/pep8/bin/python -m pip install -chttps://releases.openstack.org/constraints/upper/master -r/opt/stack/keystone/test-requirements.txt '.[ldap,memcache,mongodb]'", 1)

The actual error in the install gets swallowed up, and it has to do with the fact that the Python dependencies are compiled against native libraries. If I activate the venv and run the command by hand, I can see the first failure. It is also visible in the previous output, just buried a few screens up:

    Error: pg_config executable not found.

A later error was due to the compile step erroring out looking for lber.h:

  In file included from Modules/LDAPObject.c:3:
  Modules/common.h:15:10: fatal error: lber.h: No such file or directory
     15 | #include <lber.h>
        |          ^~~~~~~~
  compilation terminated.
  error: command '/usr/bin/gcc' failed with exit code 1

To get the build to run, I needed to install both libpq-devel and libldap-devel. With those installed, it fails like this:

File "/opt/stack/keystone/.tox/pep8/lib/python3.10/site-packages/pep257.py", line 24, in <module>
    from collections import defaultdict, namedtuple, Set
ImportError: cannot import name 'Set' from 'collections' (/usr/lib64/python3.10/collections/__init__.py)

This appears to be due to the version of python3 on my system (3.10) which is later than supported by upstream openstack. I do have python3.9 installed on my system, and can modify the tox.ini to use it by specifying the basepython version.
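The ImportError itself comes from Python 3.10 dropping the abstract base class aliases (such as Set) from the top-level collections module; they now live only in collections.abc. A minimal sketch of a version-tolerant import follows. It is only an illustration of the cause, not the change that pep257 or its successors actually made:

```python
# Python 3.10 removed the ABC aliases from `collections`;
# the same classes are available from `collections.abc`.
from collections import defaultdict, namedtuple

try:
    from collections.abc import Set
except ImportError:  # only reached on very old interpreters
    from collections import Set

# Both built-in set types are recognised as instances of the Set ABC.
print(issubclass(frozenset, Set))
print(issubclass(set, Set))
```

Pinning basepython to an older interpreter avoids the problem without touching the third-party package.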

 
[testenv:pep8]
basepython = python3.9
deps =

And then I can run tox -e pep8.

by Adam Young at May 17, 2022 10:41 PM

May 04, 2022

Adam Young

Keystone LDAP with Bifrost

I got Keystone in my Bifrost install to talk via LDAP to our FreeIPA server. Here’s what I had to do.

I started with a new install of bifrost, using Keystone and TLS.

./bifrost-cli install --enable-keystone --enable-tls  --network-interface enP4p4s0f0np0 --dhcp-pool 192.168.116.25-192.168.116.75

After making sure that Keystone could work for normal things:

source /opt/stack/bifrost/bin/activate
export OS_CLOUD=bifrost-admin
 openstack user list -f yaml
- ID: 1751a5bb8b4a4f0188069f8cb4f8e333
  Name: admin
- ID: 5942330b4f2c4822a9f2cdf45ad755ed
  Name: ironic
- ID: 43e30ad5bf0349b7b351ca2e86fd1628
  Name: ironic_inspector
- ID: 0c490e9d44204cc18ec1e507f2a07f83
  Name: bifrost_user

I had to install python3-ldap and python3-ldappool.

sudo apt install python3-ldap python3-ldappool

Now create a domain for the LDAP data.

openstack domain create freeipa
...
openstack domain show freeipa -f yaml

description: ''
enabled: true
id: 422608e5c8d8428cb022792b459d30bf
name: freeipa
options: {}
tags: []

Edit /etc/keystone/keystone.conf to support domain-specific backends and back them with file config. When you are done, your identity section should look like this:

[identity]
domain_specific_drivers_enabled=true
domain_config_dir=/etc/keystone/domains
driver = sql

Create the corresponding directory for the new configuration files.

sudo mkdir /etc/keystone/domains/

Add in a configuration file for your LDAP server. Since I called my domain freeipa, I have to name the config file /etc/keystone/domains/keystone.freeipa.conf:

[identity]
driver = ldap

[ldap]
url = ldap://den-admin-01


user_tree_dn = cn=users,cn=accounts,dc=younglogic,dc=com
user_objectclass = person
user_id_attribute = uid
user_name_attribute = uid
user_mail_attribute = mail
user_allow_create = false
user_allow_update = false
user_allow_delete = false
group_tree_dn = cn=groups,cn=accounts,dc=younglogic,dc=com
group_objectclass = groupOfNames
group_id_attribute = cn
group_name_attribute = cn
group_member_attribute = member
group_desc_attribute = description
group_allow_create = false
group_allow_update = false
group_allow_delete = false
user_enabled_attribute = nsAccountLock
user_enabled_default = False
user_enabled_invert = true
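The naming rule for the per-domain file is easy to get wrong, so here is a small sketch that derives the path from the domain name and checks that the [identity] section selects the ldap driver. The helper names (domain_config_path, uses_ldap) are made up for this illustration, and the sample text is a trimmed copy of the file above:

```python
import configparser
from pathlib import PurePosixPath

# Matches domain_config_dir in the [identity] section of keystone.conf.
DOMAIN_CONFIG_DIR = PurePosixPath("/etc/keystone/domains")


def domain_config_path(domain_name: str) -> PurePosixPath:
    """Return the per-domain file name: keystone.<domain name>.conf."""
    return DOMAIN_CONFIG_DIR / f"keystone.{domain_name}.conf"


def uses_ldap(config_text: str) -> bool:
    """Check that the [identity] section selects the ldap driver."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser.get("identity", "driver", fallback="") == "ldap"


# Trimmed copy of the domain configuration shown above.
SAMPLE = """
[identity]
driver = ldap

[ldap]
url = ldap://den-admin-01
user_tree_dn = cn=users,cn=accounts,dc=younglogic,dc=com
"""

print(domain_config_path("freeipa"))
print(uses_ldap(SAMPLE))
```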

To apply the changes, restart the Keystone service:

sudo systemctl restart uwsgi@keystone-public

And test that it worked:

openstack user list -f yaml  --domain freeipa
- ID: b3054e3942f06016f8b9669b068e81fd2950b08c46ccb48032c6c67053e03767
  Name: renee
- ID: d30e7bc818d2f633439d982783a2d145e324e3187c0e67f71d80fbab065d096a
  Name: ann

This same approach can work if you need to add more than one LDAP server to your Keystone deployment.

by Adam Young at May 04, 2022 07:39 PM

April 27, 2022

RDO Blog

RDO Yoga Released

The RDO community is pleased to announce the general availability of the RDO build for OpenStack Yoga for RPM-based distributions, CentOS Stream and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Yoga is the 25th release from the OpenStack project, which is the work of more than 1,000 contributors from around the world.

The release is already available on the CentOS mirror network:

The RDO community project curates, packages, builds, tests and maintains a complete OpenStack component set for RHEL and CentOS Stream and is a member of the CentOS Cloud Infrastructure SIG. The Cloud Infrastructure SIG focuses on delivering a great user experience for CentOS users looking to build and maintain their own on-premise, public or hybrid clouds.

All work on RDO and on the downstream release, Red Hat OpenStack Platform, is 100% open source, with all code changes going upstream first.

Interesting things in the Yoga release include:

  • RDO Yoga is the first RDO version built and tested for CentOS Stream 9.
  • In order to ease transition from CentOS Stream 8, RDO Yoga is also built and tested for CentOS Stream 8. Note that the next release of RDO will be available only for CentOS Stream 9.

The highlights of the broader upstream OpenStack project may be read via https://releases.openstack.org/yoga/highlights.html

TripleO in the RDO Yoga release:

Since the Xena development cycle, TripleO follows the Independent release model and will only maintain branches for selected OpenStack releases. In the case of Yoga, TripleO will not support the Yoga release. For TripleO users in RDO, this means that:

  • RDO Yoga will include packages for TripleO tested at OpenStack Yoga GA time.
  • Those packages will not be updated during the entire Yoga maintenance cycle.
  • RDO will not be able to include patches required to fix bugs in TripleO on RDO Yoga.
  • The lifecycle for the non-TripleO packages will follow the code merged and tested in upstream stable/yoga branches.
  • There will not be any TripleO Yoga container images built/pushed, so interested users will have to do their own container builds when deploying Yoga.

You can find details about this on the RDO Webpage

Contributors
During the Yoga cycle, we saw the following new RDO contributors:

  • Adriano Vieira Petrich
  • Andrea Bolognani
  • Dariusz Smigiel
  • David Vallee Delisle
  • Douglas Viroel
  • Jakob Meng
  • Lucas Alvares Gomes
  • Luis Tomas Bolivar
  • T. Nichole Williams
  • Karolina Kula

Welcome to all of you and Thank You So Much for participating!

But we wouldn’t want to overlook anyone. A super massive Thank You to all 40 contributors who participated in producing this release. This list includes commits to rdo-packages, rdo-infra, and redhat-website repositories:

  • Adriano Vieira Petrich
  • Alan Bishop
  • Alan Pevec
  • Alex Schultz
  • Alfredo Moralejo
  • Amy Marrich (spotz)
  • Andrea Bolognani
  • Chandan Kumar
  • Daniel Alvarez Sanchez
  • Dariusz Smigiel
  • David Vallee Delisle
  • Douglas Viroel
  • Emma Foley
  • Gaël Chamoulaud
  • Gregory Thiemonge
  • Harald
  • Jakob Meng
  • James Slagle
  • Jiri Podivin
  • Joel Capitao
  • Jon Schlueter
  • Julia Kreger
  • Kashyap Chamarthy
  • Lee Yarwood
  • Lon Hohberger
  • Lucas Alvares Gomes
  • Luigi Toscano
  • Luis Tomas Bolivar
  • Martin Kopec
  • mathieu bultel
  • Matthias Runge
  • Riccardo Pittau
  • Sergey
  • Stephen Finucane
  • Steve Baker
  • Takashi Kajinami
  • T. Nichole Williams
  • Tobias Urdin
  • Karolina Kula
  • User otherwiseguy
  • Yatin Karel

The Next Release Cycle
At the end of one release, focus shifts immediately to the next release, i.e., Zed.

Get Started
To spin up a proof of concept cloud, quickly, and on limited hardware, try an All-In-One Packstack installation. You can run RDO on a single node to get a feel for how it works.

Finally, for those that don’t have any hardware or physical resources, there’s the OpenStack Global Passport Program. This is a collaborative effort between OpenStack public cloud providers to let you experience the freedom, performance and interoperability of open source infrastructure. You can quickly and easily gain access to OpenStack infrastructure via trial programs from participating OpenStack public cloud providers around the world.

Get Help
The RDO Project has the users@lists.rdoproject.org mailing list for RDO-specific users and operators. For more developer-oriented content we recommend joining the dev@lists.rdoproject.org mailing list. Remember to post a brief introduction about yourself and your RDO story. The mailing list archives are all available at https://mail.rdoproject.org. You can also find extensive documentation on RDOproject.org.

The #rdo channel on OFTC IRC is also an excellent place to find and give help.

We also welcome comments and requests on the CentOS devel mailing list and the CentOS and TripleO IRC channels (#centos, #centos-devel in Libera Chat network, and #tripleo on OFTC), however we have a more focused audience within the RDO venues.

Get Involved
To get involved in the OpenStack RPM packaging effort, check out the RDO contribute pages, peruse the CentOS Cloud SIG page, and inhale the RDO packaging documentation.

Join us in #rdo and #tripleo on the OFTC IRC network and follow us on Twitter @RDOCommunity. You can also find us on Facebook and YouTube.

by Amy Marrich at April 27, 2022 12:22 PM

April 20, 2022

Adam Young

ipxe.efi for aarch64

To make the AARCH64 iPXE process work using Bifrost, I had to:

git clone https://github.com/ipxe/ipxe.git
cd ipxe/src/
make bin-arm64-efi/snponly.efi ARCH=arm64
sudo cp bin-arm64-efi/snponly.efi /var/lib/tftpboot/ipxe.efi

This works for the Ampere reference implementation servers that use a Mellanox network interface card, which supports (only) snp.

by Adam Young at April 20, 2022 08:31 PM

April 08, 2022

Adam Young

Bifrost Spike on an Ampere AltraMax

For the past week I worked on getting an Ironic standalone to run on an Ampere AltraMax server in our lab. As I recently was able to get a baremetal node to boot, I wanted to record the steps I went through.

Our base operating system for this install is Ubuntu 20.04.

The controller node has 2 Mellanox Technologies MT27710 network cards, each with 2 ports.

I started by following the steps to install with the bifrost-cli. However, there were a few places where the installation assumes an x86_64 architecture, and I hard-swapped them to be AARCH64/ARM64 specific:

$ git diff HEAD
diff --git a/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Debian_family.yml b/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Debian_family.yml
index 18e281b0..277bfc1c 100644
--- a/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Debian_family.yml
+++ b/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Debian_family.yml
@@ -6,8 +6,8 @@ ironic_rootwrap_dir: /usr/local/bin/
 mysql_service_name: mysql
 tftp_service_name: tftpd-hpa
 efi_distro: debian
-grub_efi_binary: /usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed
-shim_efi_binary: /usr/lib/shim/shimx64.efi.signed
+grub_efi_binary: /usr/lib/grub/arm64-efi-signed/grubaa64.efi.signed
+shim_efi_binary: /usr/lib/shim/shimaa64.efi.signed
 required_packages:
   - mariadb-server
   - python3-dev
diff --git a/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Ubuntu.yml b/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Ubuntu.yml
index 7fcbcd46..4d6a1337 100644
--- a/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Ubuntu.yml
+++ b/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Ubuntu.yml
@@ -26,7 +26,7 @@ required_packages:
   - dnsmasq
   - apache2-utils
   - isolinux
-  - grub-efi-amd64-signed
+  - grub-efi-arm64-signed
   - shim-signed
   - dosfstools
 # NOTE(TheJulia): The above entry for dnsmasq must be the last entry in the

The long term approach to these is to make those variables architecture specific.

In order to install, I ran the cli:

./bifrost-cli install --network-interface enP4p4s0f1 --dhcp-pool 192.168.116.100-192.168.116.150 

It took me several tries with -e variables until I realized that it was not going to honor them. I did notice that the heart of the command was the Ansible call, which I ended up running directly:

/opt/stack/bifrost/bin/ansible-playbook   ~/bifrost/playbooks/install.yaml -i ~/bifrost/playbooks/inventory/target -e bifrost_venv_dir=/opt/stack/bifrost -e @/home/ansible/bifrost/baremetal-install-env.json 

You may notice that I added a -e with the baremetal-install-env.json file. That file had been created by the earlier CLI run, and contained the variables specific to my install. I also edited it to trigger the build of the ironic cleaning image.

{
  "create_ipa_image": false,
  "create_image_via_dib": false,
  "install_dib": true,
  "network_interface": "enP4p4s0f1",
  "enable_keystone": false,
  "enable_tls": false,
  "generate_tls": false,
  "noauth_mode": false,
  "enabled_hardware_types": "ipmi,redfish,manual-management",
  "cleaning_disk_erase": false,
  "testing": false,
  "use_cirros": false,
  "use_tinyipa": false,
  "developer_mode": false,
  "enable_prometheus_exporter": false,
  "default_boot_mode": "uefi",
  "include_dhcp_server": true,
  "dhcp_pool_start": "192.168.116.100",
  "dhcp_pool_end": "192.168.116.150",
  "download_ipa": false,
  "create_ipa_image": true
}
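The file above lists create_ipa_image twice, first as false and then as true. How a given tool resolves that depends on its parser; the sketch below only shows what Python’s standard json module does (it keeps the last value) and how a duplicate could be detected. The helper name find_duplicate_keys is made up for this illustration:

```python
import json

# Trimmed copy of the file above: the same key appears twice.
text = '{"create_ipa_image": false, "install_dib": true, "create_ipa_image": true}'

# json.loads keeps the value from the last occurrence of a repeated key.
settings = json.loads(text)
print(settings["create_ipa_image"])


def find_duplicate_keys(raw: str) -> list:
    """Return keys that occur more than once within a JSON object in `raw`."""
    duplicates = []

    def hook(pairs):
        # `pairs` is the ordered list of (key, value) tuples for one object.
        seen = set()
        for key, _ in pairs:
            if key in seen:
                duplicates.append(key)
            seen.add(key)
        return dict(pairs)

    json.loads(raw, object_pairs_hook=hook)
    return duplicates


print(find_duplicate_keys(text))
```

Removing the first entry from the file would make the intent unambiguous regardless of which parser reads it.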

With this in place, I was able to enroll nodes using the Bifrost CLI:

 ~/bifrost/bifrost-cli enroll ~/nodes.json

I prefer this to using my own script. However, my script checks for existence and thus can be run idempotently, unlike this one. Still, I like the file format and will likely script to it in the future.

With this, I was ready to try booting the nodes, but they hung as I reported in an earlier article.

The other place where the deployment is x86_64 specific is the iPXE binary. In a bifrost install on Ubuntu, the binary is called ipxe.efi, and it is placed in /var/lib/tftpboot/ipxe.efi. It is copied from the grub-ipxe package which places it in /boot/ipxe.efi. Although this package is not tagged as an x86_64 architecture (Debian/Ubuntu call it all) the file is architecture specific.

I went through the steps to fetch and install the latest one out of jammy which has an additional file: /boot/ipxe-arm64.efi. However, when I replaced the file /var/lib/tftpboot/ipxe.efi with this one, the baremetal node still failed to boot, although it did get a few steps further in the process.

The issue, as I understand it, is that the binary needs a set of drivers to set up the HTTP request in the network interface cards, and the build in the Ubuntu package did not have that. Instead, I cloned the source git repo and compiled the binary directly. Roughly:

git clone https://github.com/ipxe/ipxe.git
cd ipxe/src
make bin-arm64-efi/snponly.efi  ARCH=arm64

SNP stands for the Simple Network Protocol. I guess this protocol is esoteric enough that Wikipedia has not heard of it.

The header file in the code says this:

  The EFI_SIMPLE_NETWORK_PROTOCOL provides services to initialize a network interface,
  transmit packets, receive packets, and close a network interface.
 

It seems the Mellanox cards support/require SNP. With this file in place, I was able to get the cleaning image to PXE boot.

I call this a spike as it has a lot of corners cut in it that I would not want to maintain in production. We’ll work with the distributions to get a viable version of ipxe.efi produced that can work for an array of servers, including Ampere’s. In the meantime, I need a strategy to handle building our own binary. I also plan on reworking the Bifrost variables to handle ARM64/AARCH64 alongside x86_64; a single server should be able to handle both based on the Architecture flag sent in the initial DHCP request.

Note: I was not able to get the cleaning image to boot, as it had an issue with werkzeug and JSON. However, I had an older build of the IPA kernel and initrd that I used, and the node properly deployed and cleaned.

And yes, I plan on integrating Keystone in the future, too.

by Adam Young at April 08, 2022 11:43 PM

March 25, 2022

Adam Young

Discoverability in API design

There are a handful of questions a user will (implicitly) ask when using your API:

  1. What actions can I do against this endpoint?
  2. How do I find the URLs for those actions?
  3. What information do I need to provide in order to perform this action?
  4. What permission do I need in order to perform this action?

Answering these questions can be automated. The user, and the tools they use, can discover the answers by working with the system. That is what I mean when I use the word “Discoverability.”

We missed some opportunities to answer these questions when we designed the APIs for Keystone OpenStack. I’d like to talk about how to improve on what we did there.

First I’d like to state what not to do.

Don’t make the user read the documentation and code to an external spec.

Never require a user to manually perform an operation that should be automated. Answering every one of those questions can be automated. If you can get it wrong, you will get it wrong. Make it possible to catch errors as early as possible.

Let’s start with the question: “What actions can I do against this endpoint?” In the case of Keystone, the answer would be some of the following:

Create, Read, Update and Delete (CRUD) Users, Groups of Users, Projects, Roles, and Catalog Items such as Services and Endpoints. You can also CRUD relationships between these entities. You can CRUD Entities for Federated Identity. You can CRUD Policy files (historical). Taken in total, you have the tools to make access control decisions for a wide array of services, not just Keystone.

The primary way, however, that people interact with Keystone is to get a token. Let’s start with this use case. To get a token, you make a POST to the $OS_AUTH_URL/v3/auth/tokens/ URL. The data

How would you know this? Only by reading the documentation. If someone handed you the value of their OS_AUTH_URL environment variable, and you looked at it using a web client, what would you get? Really, just the version URL. Assuming you chopped off the V3:

$ curl http://10.76.10.254:35357/
{"versions": {"values": [{"id": "v3.14", "status": "stable", "updated": "2020-04-07T00:00:00Z", "links": [{"rel": "self", "href": "http://10.76.10.254:35357/v3/"}], "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}]}]}}

and the only URL in there is the version URL, which gives you back the same thing.

If you point a web browser at the service, the output is in JSON, even though the web browser told the server that it preferred HTML.

What could this look like? If we look at the API spec for Keystone, we can see that the various entities referred to above have fairly predictable URL forms. However, for this use case, we want a token, so we should, at a minimum, see the path to get to the token. Since this is the V3 API, we should see an entry like this:

{"rel": "auth", "href": "http://10.76.10.254:35357/v3/auth"}], "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}]}

And if we then performed an HTTP GET on http://10.76.10.254:35357/v3/auth we should see a link to:

{"rel": "token", "href": "http://10.76.10.254:35357/v3/auth/token"}], "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}]}
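With links like these in place, a client could walk from the version document to the token URL without hard-coding any path. The sketch below assumes a version document shaped like the curl output earlier, extended with the proposed "auth" link; that link and the find_link helper are hypothetical, since Keystone does not return such an entry today:

```python
# Version document shaped like the curl output above, plus one
# hypothetical "auth" link of the kind proposed in the text.
VERSION_DOC = {
    "versions": {
        "values": [
            {
                "id": "v3.14",
                "status": "stable",
                "links": [
                    {"rel": "self", "href": "http://10.76.10.254:35357/v3/"},
                    {"rel": "auth", "href": "http://10.76.10.254:35357/v3/auth"},
                ],
            }
        ]
    }
}


def find_link(doc: dict, rel: str):
    """Return the href of the first link with the given rel, or None."""
    for version in doc["versions"]["values"]:
        for link in version.get("links", []):
            if link.get("rel") == rel:
                return link["href"]
    return None


print(find_link(VERSION_DOC, "auth"))
```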

Is this 100% of the solution? No. The Keystone API shows its prejudices toward PASSWORD based authentication, a very big antipattern. The password goes in clear text into the middle of the JSON blob posted to this API. We trust in SSL/TLS to secure it over the wire, and have had to erase it from logs and debugging output. This is actually a step backwards from BASIC_AUTH in HTTP. All this aside, there is still no way to tell what you need to put into the body of the token request without reading the documentation... unless you know the magic of JSON-HOME.

Here is what you would need to do to get a list of the top level URLs, excluding all the ones that are templated, and thus require knowing an ID.

curl 10.76.116.63:5000 -H "Accept: application/json-home" | jq '.resources | to_entries | .[] | .value | .href ' | sort -u
  • “/v3/auth/catalog”
  • “/v3/auth/domains”
  • “/v3/auth/OS-FEDERATION/saml2”
  • “/v3/auth/OS-FEDERATION/saml2/ecp”
  • “/v3/auth/projects”
  • “/v3/auth/system”
  • “/v3/auth/tokens”
  • “/v3/auth/tokens/OS-PKI/revoked”
  • “/v3/credentials”
  • “/v3/domains”
  • “/v3/domains/config/default”
  • “/v3/ec2tokens”
  • “/v3/endpoints”
  • “/v3/groups”
  • “/v3/limits”
  • “/v3/limits/model”
  • “/v3/OS-EP-FILTER/endpoint_groups”
  • “/v3/OS-FEDERATION/identity_providers”
  • “/v3/OS-FEDERATION/mappings”
  • “/v3/OS-FEDERATION/saml2/metadata”
  • “/v3/OS-FEDERATION/service_providers”
  • “/v3/OS-OAUTH1/access_token”
  • “/v3/OS-OAUTH1/consumers”
  • “/v3/OS-OAUTH1/request_token”
  • “/v3/OS-REVOKE/events”
  • “/v3/OS-SIMPLE-CERT/ca”
  • “/v3/OS-SIMPLE-CERT/certificates”
  • “/v3/OS-TRUST/trusts”
  • “/v3/policies”
  • “/v3/projects”
  • “/v3/regions”
  • “/v3/registered_limits”
  • “/v3/role_assignments”
  • “/v3/role_inferences”
  • “/v3/roles”
  • “/v3/s3tokens”
  • “/v3/services”
  • “/v3/users”
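The same extraction can be done without jq. The sketch below parses a JSON Home document and keeps only the entries that carry a plain href, dropping templated ones (where the jq pipeline would print null). The sample document is a tiny illustrative fragment with shortened placeholder relation names, not real Keystone output, and top_level_urls is a made-up helper name:

```python
import json

# Illustrative JSON Home fragment; the relation names are shortened
# placeholders rather than the relation URLs Keystone really uses.
SAMPLE = json.dumps({
    "resources": {
        "rel/users": {"href": "/v3/users"},
        "rel/user": {
            "href-template": "/v3/users/{user_id}",
            "href-vars": {"user_id": "param/user_id"},
        },
        "rel/auth_tokens": {"href": "/v3/auth/tokens"},
    }
})


def top_level_urls(json_home_text: str) -> list:
    """Return the sorted, unique, non-templated hrefs."""
    resources = json.loads(json_home_text)["resources"]
    hrefs = {entry["href"] for entry in resources.values() if "href" in entry}
    return sorted(hrefs)


print(top_level_urls(SAMPLE))
```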

This would be the friendly list to return from the /v3 page. Or, if we wanted to streamline it a bit for human consumption, we could put a top level grouping around each of these APIs. A friendlier list would look like this (chopping off the /v3)

  • auth
  • assignment
  • catalog
  • federation
  • identity
  • limits
  • resource
  • assignment
  • policy

There are a couple ways to order the list. Alphabetical order is the simplest for an English speaker if they know what they are looking for. This won’t internationalize, and it won’t guide the user to the use cases that are most common. Thus, I put auth at the top, as that is, by far, the most common use case. The others I have organized based on a quick think-through from most to least common. I could easily be convinced to restructure this a couple different ways.

However, we are starting to trip over some of the other aspects of usability. We have provided the user with way more information than they need, or, indeed, can use at this point. Since none of those operations can be performed unauthenticated, we have led the user astray; we should show them, at this stage, only what they can do in their current state. Thus, the obvious entries would be:

  • /v3/auth/tokens
  • /v3/auth/OS-FEDERATION

These are the only two directions they can go unauthenticated.

Let’s continue on with the old-school version of a token request using the v3/auth/tokens resource, as that is the most common use case. How does a user now request a token? It depends on whether they want to use a password, another token, or multifactor, and whether they want an unscoped token or a scoped token.

None of this information is in the JSON home. You have to read the docs.
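To make the out-of-band knowledge concrete, here is a sketch of the request a client has to assemble for the password method, following the documented Keystone v3 body layout. The function name, the user and project names, and the password are placeholders, and the request is only built, never sent:

```python
import json
import urllib.request


def password_token_request(auth_url, user, password, user_domain,
                           project=None, project_domain=None):
    """Build (but do not send) a POST to <auth_url>/v3/auth/tokens."""
    auth = {
        "identity": {
            "methods": ["password"],
            "password": {
                "user": {
                    "name": user,
                    "domain": {"name": user_domain},
                    "password": password,
                }
            },
        }
    }
    if project is not None:
        # The scope section is only added when a project is requested.
        auth["scope"] = {
            "project": {
                "name": project,
                "domain": {"name": project_domain or user_domain},
            }
        }
    body = json.dumps({"auth": auth}).encode("utf-8")
    return urllib.request.Request(
        auth_url.rstrip("/") + "/v3/auth/tokens",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder credentials, for illustration only.
req = password_token_request("http://10.76.10.254:35357", "admin", "secret",
                             "Default", project="demo")
print(req.get_method(), req.full_url)
```

Every nested key in that body is something the user currently has to learn from the documentation rather than from the service itself.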

If we were using straight HTML to render the response, we would expect a form. Something along the lines of:

There is, as of now, no standard way to put form data into JSON. However, there are numerous standards to choose from. One such standard is the FormData API; another is JSON Schema (https://json-schema.org/). If we look at the API doc, we get a table that specifies the name. Anything that is not a single value is specified as an object, which really means a JSON object: a dictionary that can be deeply nested. We can see the complexity in the above form, where the scope value determines what is meant by the project/domain name field. And these versions don’t allow for IDs to be used instead of the names for users, projects, or domains.
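As a rough idea of what a machine-readable description could look like, here is a sketch of a JSON Schema for the outer structure of the password token request, written as a Python dictionary, together with a deliberately tiny checker that understands only the type, required and properties keywords. Both the schema and the schema_problems helper are assumptions made for illustration; this is not a document served by Keystone, and a real client would use a full JSON Schema validator:

```python
# Hypothetical JSON Schema for the outer layers of a token request.
TOKEN_REQUEST_SCHEMA = {
    "type": "object",
    "required": ["auth"],
    "properties": {
        "auth": {
            "type": "object",
            "required": ["identity"],
            "properties": {
                "identity": {
                    "type": "object",
                    "required": ["methods"],
                    "properties": {"methods": {"type": "array"}},
                },
                "scope": {"type": "object"},
            },
        }
    },
}

_TYPES = {"object": dict, "array": list, "string": str}


def schema_problems(schema: dict, value, path: str = "$") -> list:
    """Report type mismatches and missing required fields.

    Supports only 'type', 'required' and 'properties'.
    """
    expected = _TYPES.get(schema.get("type"))
    if expected is not None and not isinstance(value, expected):
        return [f"{path}: expected {schema['type']}"]
    problems = []
    if isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                problems.append(f"{path}.{name}: missing")
        for name, sub in schema.get("properties", {}).items():
            if name in value:
                problems.extend(schema_problems(sub, value[name], f"{path}.{name}"))
    return problems


print(schema_problems(TOKEN_REQUEST_SCHEMA, {"auth": {}}))
```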

A lot of the custom approach here is dictated by the fact that Keystone does not accept standard authentication. The Password based token request could easily be replaced with BASIC-AUTH. Tokens themselves could be stored as session cookies, with the same timeouts as the token expiration. All of the One-Offs in Keystone make it more difficult to use, and require more application specific knowledge.

Many of these issues were straightened out when we started doing federation. Now, there is still some out-of-band knowledge required to use the Federated API, but this was due to concerns about information leaking that I am going to ignore for now. The approach I am going to describe is basically what is used by any app that allows you to log in from the different cloud providers’ identity sources today.

From the /v3 page, a user should be able to select the identity provider that they want to use. This could require a jump to /v3/FEDERATION and then to /v3/FEDERATION/idp, in order to keep things namespaced, or the list could be expanded in the /v3 page if there is really nothing else that a user can do unauthenticated.

Let us assume a case where there are three companies that all share access to the cloud: Acme, Beta, and Charlie. The JSON response would be the same as the list identity providers API. The interesting part of the result is this one here:

 "protocols": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols"

Let’s say that a given identity provider supports multiple protocols. Here is where the user gets to choose which one they want to use to try and authenticate. An HTTP GET on the link above would return that list. The documentation shows an example of an identity provider that supports saml2. Here is an expanded one that shows the set of protocols a user could expect in a private cloud running FreeIPA and Keycloak, or Active Directory and ADFS.

{
    "links": {
        "next": null,
        "previous": null,
        "self": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols"
    },
    "protocols": [
        {
            "id": "saml2",
            "links": {
                "identity_provider": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME",
                "self": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols/saml2"
            },
            "mapping_id": "xyz234"
        },
        {
            "id": "x509",
            "links": {
                "identity_provider": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME",
                "self": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols/x509"
            },
            "mapping_id": "xyz235"
        },
        {
            "id": "gssapi",
            "links": {
                "identity_provider": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME",
                "self": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols/gssapi"
            },
            "mapping_id": "xyz236"
        },
        {
            "id": "oidc",
            "links": {
                "identity_provider": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME",
                "self": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols/oidc"
            },
            "mapping_id": "xyz237"
        },
        {
            "id": "basic-auth",
            "links": {
                "identity_provider": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME",
                "self": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols/basic-auth"
            },
            "mapping_id": "xyz238"
        }
    ]
}
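A client could use a response like this to pick a mechanism automatically. The sketch below walks a preference list and returns the self link of the first protocol the identity provider offers; the choose_protocol helper and the shortened sample response are assumptions made for illustration:

```python
def choose_protocol(response: dict, preferred: list):
    """Return the 'self' link of the first preferred protocol on offer."""
    offered = {p["id"]: p["links"]["self"] for p in response["protocols"]}
    for protocol_id in preferred:
        if protocol_id in offered:
            return offered[protocol_id]
    return None


BASE = "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME"

# Shortened version of the response above, with two protocols only.
RESPONSE = {
    "protocols": [
        {"id": "saml2", "links": {"self": BASE + "/protocols/saml2"}},
        {"id": "gssapi", "links": {"self": BASE + "/protocols/gssapi"}},
    ]
}

# A command line client on a Kerberos-enabled host might prefer gssapi.
print(choose_protocol(RESPONSE, ["x509", "gssapi", "saml2"]))
```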

Note that this is very similar to the content that a web browser gives back in a 401 response: the set of acceptable authentication mechanisms. I actually prefer this here, as it actually allows the user to select the appropriate mechanism for the use case, which may vary depending on where the user connects from.

Let’s ignore the actual response from the above links and assume that, if the user is unauthenticated, they merely get a link to where they can authenticate: /v3/OS-FEDERATION/identity_providers/{idp_id}/protocols/{protocol_id}/auth. The follow-on link is a GET, not a POST. There is no form data required. The mapping resolves the user’s domain name/ID, so there is no need to provide that information, and the token is a federated unscoped token.

The actual response contains the list of groups that a user belongs to. This is an artifact of the mapping, and it is useful for debugging. However, what the user has at this point is, effectively, an unscoped token. It is passed in the X-Subject-Token header, and not in the session cookie. However, for an HTML based workflow, and, indeed, for sane HTTP workflows against Keystone, a session scoped cookie containing the token would be much more useful.

With an unscoped token, a user can perform some operations against a Keystone server, but those operations are either read-only, operations specific to the user, or administrative actions specific to the Keystone server. For OpenStack, the vast majority of the time the user is going to Keystone to request a scoped token to use on one of the other services. As such, the user probably needs to convert the unscoped token shown above to a token scoped to a project. A very common setup has the user assigned to a single project. Even if they are scoped to multiple, it is unlikely that they are scoped to many. Thus, the obvious next step is to show the user a URL that will allow them to get a token scoped to a specific project.

Keystone does not have such a URL. In Keystone, again you are required to go through /v3/auth/tokens to request a scoped token.

A much friendlier URL scheme would be /v3/auth/projects, which lists the set of projects a user can request a token for, and /v3/auth/project/{id}, which lets a user request a scoped token for that project.

However, even if we had such a URL pattern, we would need to direct the user to that URL. There are two distinct use cases. The first is the case where the user has just authenticated, and in the token response, they need to see the project list URL. A redirect makes the most sense, although the list of projects could also be in the authentication response. However, the user might also be returning to the Keystone server from some other operation, still have the session cookie with the token in it, and start at the discovery page again. In this case, the /v3/ response should show /v3/auth/projects/ in its list.

There is, unfortunately, one case where this would be problematic. With hierarchical projects, a single assignment could allow a user to get a token for many projects. While this is a useful hack in practice, it means that the project list page could get extremely long. This is, unfortunately, also the case with the project list page itself: projects may be nested, but the namespace needs to be flat, and listing projects will list all of them, with only the parent-project ID distinguishing them. Since we do have ways to do path nesting in HTTP, this is a solvable problem. Let’s lump the token request and the project list APIs together. This actually makes a very elegant solution:

Instead of /v3/auth/projects, we put a link off the project page itself back to /v3/auth/tokens, but accepting the project ID as a URL parameter, like this: /v3/auth/tokens?project_id=abc123.

Of course, this means that there is a hidden mechanism now. If a user wants to look at any resource in Keystone, they can do so with an unscoped token, provided they have a role assignment on the project or domain that manages that object.

To this point we have discussed implicit answers to the questions of finding URLs and discovering what actions a user can perform. For the token request, I started discussing how to provide the answer to “What information do I need to provide in order to perform this action?” I think now we can state how to do that: the list page for any collection should either provide an inline form or a link to a form URL. The form provides the information in a format that makes sense for the content type. If the user does not have the permission to create the object, they should not see the form. If the form is on a separate link, a user that cannot create that object should get back a 403 error if they attempt to GET the URL.

If Keystone had been written to return HTML when hit by a browser instead of JSON, all of this navigation would have been painfully obvious. Instead, we subscribed to the point of view that UI was to be done by the Horizon server.

There still remains the last question: “What permission do I need in order to perform this action?” The user only thinks to answer this question when they come across an operation that they cannot perform. I’ll dig deeper into this in the next article.


by Adam Young at March 25, 2022 03:06 AM

March 16, 2022

Adam Young

Generating a clouds.yaml file

Kolla creates an admin.rc file using the environment variables. I want to then use this in a terraform plan, but I’d rather not generate terraform-specific code for the Keystone login data. So, a simple python script converts from env vars to yaml.

#!/usr/bin/python3
import os
import yaml

# Build the clouds.yaml structure from the OS_* environment variables.
clouds = {
    "clouds": {
        "cluster": {
            "auth": {
                "auth_url": os.environ["OS_AUTH_URL"],
                "project_name": os.environ["OS_PROJECT_NAME"],
                "project_domain_name": os.environ["OS_PROJECT_DOMAIN_NAME"],
                "username": os.environ["OS_USERNAME"],
                "user_domain_name": os.environ["OS_USER_DOMAIN_NAME"],
                "password": os.environ["OS_PASSWORD"],
            }
        }
    }
}

# Write the YAML document to stdout so it can be redirected to a file.
print(yaml.dump(clouds))

To use it:

./clouds.py > clouds.yaml

Note that you should have sourced the appropriate environment variables file first, such as:

. /etc/kolla/admin-openrc.sh
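If the file has not been sourced, the script above stops with a KeyError on the first missing variable. A small pre-flight check, sketched below, reports all missing variables at once. The function name missing_variables is hypothetical, and the example URL is made up; the variable names are the ones the script reads.

```python
import os

# The variables the conversion script reads from the environment.
REQUIRED = (
    "OS_AUTH_URL",
    "OS_PROJECT_NAME",
    "OS_PROJECT_DOMAIN_NAME",
    "OS_USERNAME",
    "OS_USER_DOMAIN_NAME",
    "OS_PASSWORD",
)


def missing_variables(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED if not environ.get(name)]


# Example with a deliberately incomplete environment (made-up value).
print(missing_variables({"OS_AUTH_URL": "http://keystone.example:5000"}))
```

Calling missing_variables() with no argument checks the real environment, so it could be run at the top of clouds.py before the dictionary is built.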

by Adam Young at March 16, 2022 11:00 PM

February 13, 2022

Lars Kellogg-Stedman

Udev rules for CH340 serial devices

I like to fiddle with Micropython, particularly on the Wemos D1 Mini, because these are such a neat form factor. Unfortunately, they have a cheap CH340 serial adapter on board, which means that from the perspective of Linux these devices are all functionally identical – there’s no way to identify one device from another. This by itself would be a manageable problem, except that the device names assigned to these devices aren’t constant: depending on the order in which they get plugged in (and the order in which they are detected at boot), a device might be /dev/ttyUSB0 one day and /dev/ttyUSB2 another day.

On more than one occasion, I have accidentally re-flashed the wrong device. Ouch.

A common solution to this problem is to create device names based on the USB topology – that is, assign names based on a device’s position in the USB bus: e.g., when attaching a new USB serial device, expose it at something like /dev/usbserial/<bus>/<device_path>. While that sounds conceptually simple, it took me a while to figure out the correct udev rules.

Looking at the available attributes for a serial device, we see:

# udevadm info -a -n /dev/ttyUSB0
[...]
looking at device '/devices/pci0000:00/0000:00:1c.0/0000:03:00.0/usb3/3-1/3-1.4/3-1.4.3/3-1.4.3:1.0/ttyUSB0/tty/ttyUSB0':
KERNEL=="ttyUSB0"
SUBSYSTEM=="tty"
DRIVER==""
ATTR{power/control}=="auto"
ATTR{power/runtime_active_time}=="0"
ATTR{power/runtime_status}=="unsupported"
ATTR{power/runtime_suspended_time}=="0"
looking at parent device '/devices/pci0000:00/0000:00:1c.0/0000:03:00.0/usb3/3-1/3-1.4/3-1.4.3/3-1.4.3:1.0/ttyUSB0':
KERNELS=="ttyUSB0"
SUBSYSTEMS=="usb-serial"
DRIVERS=="ch341-uart"
ATTRS{port_number}=="0"
ATTRS{power/control}=="auto"
ATTRS{power/runtime_active_time}=="0"
ATTRS{power/runtime_status}=="unsupported"
ATTRS{power/runtime_suspended_time}=="0"
looking at parent device '/devices/pci0000:00/0000:00:1c.0/0000:03:00.0/usb3/3-1/3-1.4/3-1.4.3/3-1.4.3:1.0':
KERNELS=="3-1.4.3:1.0"
SUBSYSTEMS=="usb"
DRIVERS=="ch341"
ATTRS{authorized}=="1"
ATTRS{bAlternateSetting}==" 0"
ATTRS{bInterfaceClass}=="ff"
ATTRS{bInterfaceNumber}=="00"
ATTRS{bInterfaceProtocol}=="02"
ATTRS{bInterfaceSubClass}=="01"
ATTRS{bNumEndpoints}=="03"
ATTRS{supports_autosuspend}=="1"
looking at parent device '/devices/pci0000:00/0000:00:1c.0/0000:03:00.0/usb3/3-1/3-1.4/3-1.4.3':
KERNELS=="3-1.4.3"
SUBSYSTEMS=="usb"
DRIVERS=="usb"
ATTRS{authorized}=="1"
ATTRS{avoid_reset_quirk}=="0"
ATTRS{bConfigurationValue}=="1"
ATTRS{bDeviceClass}=="ff"
ATTRS{bDeviceProtocol}=="00"
ATTRS{bDeviceSubClass}=="00"
ATTRS{bMaxPacketSize0}=="8"
ATTRS{bMaxPower}=="98mA"
ATTRS{bNumConfigurations}=="1"
ATTRS{bNumInterfaces}==" 1"
ATTRS{bcdDevice}=="0262"
ATTRS{bmAttributes}=="80"
ATTRS{busnum}=="3"
ATTRS{configuration}==""
ATTRS{devnum}=="8"
ATTRS{devpath}=="1.4.3"
ATTRS{idProduct}=="7523"
ATTRS{idVendor}=="1a86"
ATTRS{ltm_capable}=="no"
ATTRS{maxchild}=="0"
ATTRS{power/active_duration}=="48902765"
ATTRS{power/autosuspend}=="2"
ATTRS{power/autosuspend_delay_ms}=="2000"
ATTRS{power/connected_duration}=="48902765"
ATTRS{power/control}=="on"
ATTRS{power/level}=="on"
ATTRS{power/persist}=="1"
ATTRS{power/runtime_active_time}=="48902599"
ATTRS{power/runtime_status}=="active"
ATTRS{power/runtime_suspended_time}=="0"
ATTRS{product}=="USB2.0-Serial"
ATTRS{quirks}=="0x0"
ATTRS{removable}=="unknown"
ATTRS{rx_lanes}=="1"
ATTRS{speed}=="12"
ATTRS{tx_lanes}=="1"
ATTRS{urbnum}=="17"
ATTRS{version}==" 1.10"
[...]

In this output, we find that the device itself (at the top) doesn’t have any useful attributes we can use for creating a systematic device name. It’s not until we’ve moved up the device hierarchy to /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/usb3/3-1/3-1.4/3-1.4.3 that we find topology information (in the busnum and devpath attributes). This complicates matters because a udev rule only has access to attributes defined directly on the matching device, so we can’t write something like:

SUBSYSTEM=="usb-serial", SYMLINK+="usbserial/$attr{busnum}/$attr{devpath}"

How do we access the attributes of a parent node in our rule?

The answer is by creating environment variables that preserve the values in which we are interested. I started with this:

SUBSYSTEMS=="usb", ENV{.USB_BUSNUM}="$attr{busnum}", ENV{.USB_DEVPATH}="$attr{devpath}"

Here, my goal was to stash the busnum and devpath attributes in .USB_BUSNUM and .USB_DEVPATH, but this didn’t work: it matches device path /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/usb3/3-1/3-1.4/3-1.4.3/3-1.4.3:1.0, which is:

KERNELS=="3-1.4.3:1.0"
SUBSYSTEMS=="usb"
DRIVERS=="ch341"
ATTRS{authorized}=="1"
ATTRS{bAlternateSetting}==" 0"
ATTRS{bInterfaceClass}=="ff"
ATTRS{bInterfaceNumber}=="00"
ATTRS{bInterfaceProtocol}=="02"
ATTRS{bInterfaceSubClass}=="01"
ATTRS{bNumEndpoints}=="03"
ATTRS{supports_autosuspend}=="1"

We need to match the next device up the chain, so we need to make our match more specific. There are a couple of different options we can pursue; the simplest is probably to take advantage of the fact that the next device up the chain has SUBSYSTEMS=="usb" and DRIVERS=="usb", so we could instead write:

SUBSYSTEMS=="usb", DRIVERS=="usb", ENV{.USB_BUSNUM}="$attr{busnum}", ENV{.USB_DEVPATH}="$attr{devpath}"

Alternately, we could ask for “the first device that has a busnum attribute” like this:

SUBSYSTEMS=="usb", ATTRS{busnum}=="?*", ENV{.USB_BUSNUM}="$attr{busnum}", ENV{.USB_DEVPATH}="$attr{devpath}"

Where (from the udev(7) man page), ? matches any single character and * matches zero or more characters, so this matches any device in which busnum has a non-empty value. We can test this rule out using the udevadm test command:

# udevadm test $(udevadm info --query=path --name=/dev/ttyUSB0)
[...]
.USB_BUSNUM=3
.USB_DEVPATH=1.4.3
[...]

This shows us that our rule is matching and setting up the appropriate variables. We can now use those in a subsequent rule to create the desired symlink:

SUBSYSTEMS=="usb", ATTRS{busnum}=="?*", ENV{.USB_BUSNUM}="$attr{busnum}", ENV{.USB_DEVPATH}="$attr{devpath}"
SUBSYSTEMS=="usb-serial", SYMLINK+="usbserial/$env{.USB_BUSNUM}/$env{.USB_DEVPATH}"

Re-running the test command, we see:

# udevadm test $(udevadm info --query=path --name=/dev/ttyUSB0)
[...]
DEVLINKS=/dev/serial/by-path/pci-0000:03:00.0-usb-0:1.4.3:1.0-port0 /dev/usbserial/3/1.4.3 /dev/serial/by-id/usb-1a86_USB2.0-Serial-if00-port0
[...]

You can see the new symlink in the DEVLINKS value, and looking at /dev/usbserial we can see the expected symlinks:

# tree /dev/usbserial
/dev/usbserial/
└── 3
    ├── 1.1 -> ../../ttyUSB1
    └── 1.4.3 -> ../../ttyUSB0

And there you have it. Now as long as I attach a specific device to the same USB port on my system, it will have the same device node. I’ve updated my tooling to use these paths (/dev/usbserial/3/1.4.3) instead of the kernel names (/dev/ttyUSB0), and it has greatly simplified things.
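The lookup the rules perform (walk up from the tty device until an ancestor exposes busnum and devpath) can also be sketched in Python, which may help when tooling needs to predict the symlink name. This is an illustration, not part of the udev setup: the function names are made up, and the demo runs against a throwaway directory tree rather than the real /sys.

```python
import os
import tempfile


def usb_topology(sys_device_path):
    """Return (busnum, devpath) from the nearest ancestor that has both."""
    path = os.path.realpath(sys_device_path)
    while True:
        busnum_file = os.path.join(path, "busnum")
        devpath_file = os.path.join(path, "devpath")
        if os.path.isfile(busnum_file) and os.path.isfile(devpath_file):
            with open(busnum_file) as b, open(devpath_file) as d:
                return b.read().strip(), d.read().strip()
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root without a match
            return None
        path = parent


def symlink_name(sys_device_path):
    """Build the /dev/usbserial/<bus>/<device_path> name, or None."""
    topology = usb_topology(sys_device_path)
    if topology is None:
        return None
    return "/dev/usbserial/{}/{}".format(*topology)


def make_fake_sysfs(root):
    """Create a throwaway tree loosely shaped like the udevadm output."""
    usb_device = os.path.join(root, "3-1.4.3")
    tty = os.path.join(usb_device, "interface", "ttyUSB0")
    os.makedirs(tty)
    for name, value in (("busnum", "3"), ("devpath", "1.4.3")):
        with open(os.path.join(usb_device, name), "w") as handle:
            handle.write(value + "\n")
    return tty


with tempfile.TemporaryDirectory() as root:
    print(symlink_name(make_fake_sysfs(root)))
```

On a real system the starting point would be the sysfs directory of the tty device, for example the path printed by udevadm info --query=path prefixed with /sys.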

February 13, 2022 12:00 AM

December 06, 2021

Matthias Runge

Debugging OpenStack Metrics

Overview

We'll do the following steps:

  1. check if metrics are getting in
  2. if not, check if ceilometer is running
  3. check if gnocchi is running
  4. query gnocchi for metrics
  5. check alarming
  6. further debugging

Check that metrics are getting in

openstack server list

example:

$ openstack server list  -c ID -c Name -c Status
+--------------------------------------+-------+--------+
| ID                                   | Name  | Status |
+--------------------------------------+-------+--------+
| 7e087c31-7e9c-47f7-a4c4-ebcc20034faa | foo-4 | ACTIVE |
| 977fd250-75d4-4da3-a37c-bc7649047151 | foo-2 | ACTIVE |
| a1182f44-7163-4ad9-89ed-36611f75bac7 | foo-5 | ACTIVE |
| a91951b3-ff4e-46a0-a5fd-20064f02afc9 | foo-3 | ACTIVE |
| f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f | foo-1 | ACTIVE |
+--------------------------------------+-------+--------+

We'll use f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f. That's the uuid of the server foo-1. For convenience, the server uuid and the resource ID used in openstack metric resource are the same.

$ openstack metric resource show f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f
+-----------------------+-------------------------------------------------------------------+
| Field                 | Value                                                             |
+-----------------------+-------------------------------------------------------------------+
| created_by_project_id | 6a180caa64894f04b1d04aeed1cb920e                                  |
| created_by_user_id    | 39d9e30374a74fe8b58dee9e1dcd7382                                  |
| creator               | 39d9e30374a74fe8b58dee9e1dcd7382:6a180caa64894f04b1d04aeed1cb920e |
| ended_at              | None                                                              |
| id                    | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f                              |
| metrics               | cpu: fe84c113-f98d-42ee-84fd-bdf360842bdb                         |
|                       | disk.ephemeral.size: ad79f268-5f56-4ff8-8ece-d1f170621217         |
|                       | disk.root.size: 6e021f8c-ead0-46e4-bd26-59131318e6a2              |
|                       | memory.usage: b768ec46-5e49-4d9a-b00d-004f610c152d                |
|                       | memory: 1a4e720a-2151-4265-96cf-4daf633611b2                      |
|                       | vcpus: 68654bc0-8275-4690-9433-27fe4a3aef9e                       |
| original_resource_id  | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f                              |
| project_id            | 8d077dbea6034e5aa45c0146d1feac5f                                  |
| revision_end          | None                                                              |
| revision_start        | 2021-11-09T10:00:46.241527+00:00                                  |
| started_at            | 2021-11-09T09:29:12.842149+00:00                                  |
| type                  | instance                                                          |
| user_id               | 65bfeb6cc8ec4df4a3d53550f7e99a5a                                  |
+-----------------------+-------------------------------------------------------------------+

This list shows the metrics associated with the instance.

If the metrics show up here, they are getting in and you are done.

Checking if ceilometer is running

$ ssh controller-0 -l root
$ podman ps --format "{{.Names}} {{.Status}}" | grep ceilometer
ceilometer_agent_central Up 2 hours ago
ceilometer_agent_notification Up 2 hours ago

On compute nodes, there should be ceilometer_agent_compute running:

$ podman ps --format "{{.Names}} {{.Status}}" | grep ceilometer
ceilometer_agent_compute Up 2 hours ago

The metrics are sent from ceilometer to a remote defined in /var/lib/config-data/puppet-generated/ceilometer/etc/ceilometer/pipeline.yaml, which may look similar to the following file:

---
sources:
    - name: meter_source
      meters:
          - "*"
      sinks:
          - meter_sink
sinks:
    - name: meter_sink
      publishers:
          - gnocchi://?filter_project=service&archive_policy=ceilometer-high-rate
          - notifier://172.17.1.40:5666/?driver=amqp&topic=metering

In this case, data is sent to both STF and Gnocchi. The next step is to check if there are any errors happening. On controllers and computes, ceilometer logs are found in /var/log/containers/ceilometer/.
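The publisher entries in a pipeline file like the one above are ordinary URLs, so Python's standard library can take them apart to show where each one points. The helper name describe_publisher is made up for this sketch; the two URLs are copied from the example file.

```python
from urllib.parse import parse_qs, urlsplit


def describe_publisher(url):
    """Split a ceilometer publisher URL into scheme, target and options."""
    parts = urlsplit(url)
    # parse_qs returns lists of values; keep the first value of each option.
    options = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,  # None when the URL names no host
        "port": parts.port,      # None when the URL names no port
        "options": options,
    }


publishers = [
    "gnocchi://?filter_project=service&archive_policy=ceilometer-high-rate",
    "notifier://172.17.1.40:5666/?driver=amqp&topic=metering",
]
for url in publishers:
    print(describe_publisher(url))
```

The gnocchi publisher names no host at all, while the notifier publisher carries the address and port of the remote that has to be reachable.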

The agent-notification.log shows logs from publishing data, as well as errors if sending out metrics or logs fails for some reason.

If there are any errors in the log file, it is likely that metrics are not being delivered to the remote.

2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python3.6/site-packages/oslo_messaging/transport.py", line 136, in _send_notification
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging     retry=retry)
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_amqp1.py", line 295, in wrap
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging     return func(self, *args, **kws)
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging   File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_amqp1.py", line 397, in send_notification
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging     raise rc
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging oslo_messaging.exceptions.MessageDeliveryFailure: Notify message sent to <Target topic=event.sample> failed: timed out

In this case, it fails to send messages to the STF instance. The following example shows the gnocchi API not responding or not being accessible:

2021-11-16 10:38:07.707 16 ERROR ceilometer.publisher.gnocchi [-] <html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
 (HTTP 503): gnocchiclient.exceptions.ClientException: <html><body><h1>503 Service Unavailable</h1>

For more gnocchi debugging, see the gnocchi section.
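To get a quick overview of which component is logging errors, the log lines can be counted per logger name. The sketch below assumes the line layout seen in the excerpts above (date, time, process ID, level, logger, message); the function name error_loggers is made up, and the sample lines are shortened copies of the excerpts.

```python
from collections import Counter


def error_loggers(lines):
    """Count ERROR lines per logger name."""
    counts = Counter()
    for line in lines:
        # Expected fields: date, time, pid, level, logger, message
        fields = line.split(None, 5)
        if len(fields) >= 5 and fields[3] == "ERROR":
            counts[fields[4]] += 1
    return counts


sample = [
    "2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging     raise rc",
    "2021-11-16 10:38:07.707 16 ERROR ceilometer.publisher.gnocchi [-] <html>",
    "No server is available to handle this request.",
]
for logger, count in error_loggers(sample).most_common():
    print(logger, count)
```

Continuation lines that do not follow the layout, such as the HTML body of the 503 response, are skipped rather than counted.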

Gnocchi

Gnocchi sits on controller nodes and consists of three separate containers, gnocchi_metricd, gnocchi_statsd, and gnocchi_api. The latter is for the interaction with the outside world, such as ingesting metrics or returning measurements.

Gnocchi metricd is used for re-calculating metrics, downsampling to lower granularity, etc. Gnocchi logfiles are found under /var/log/containers/gnocchi. The gnocchi API is hooked into httpd, so its logfiles are stored under /var/log/containers/httpd/gnocchi-api/. The corresponding files there are either gnocchi_wsgi_access.log or gnocchi_wsgi_error.log.

In the case from above (ceilometer section), where ceilometer could not send metrics to gnocchi, one would also observe log output for the gnocchi API.

Retrieving metrics from Gnocchi

For starters, let's see which resources there are.

openstack server list -c ID -c Name -c Status
+--------------------------------------+-------+--------+
| ID                                   | Name  | Status |
+--------------------------------------+-------+--------+
| 7e087c31-7e9c-47f7-a4c4-ebcc20034faa | foo-4 | ACTIVE |
| 977fd250-75d4-4da3-a37c-bc7649047151 | foo-2 | ACTIVE |
| a1182f44-7163-4ad9-89ed-36611f75bac7 | foo-5 | ACTIVE |
| a91951b3-ff4e-46a0-a5fd-20064f02afc9 | foo-3 | ACTIVE |
| f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f | foo-1 | ACTIVE |
+--------------------------------------+-------+--------+

To show which metrics are stored for the VM foo-1, one would use the following command:

openstack metric resource show f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f --max-width 75
+-----------------------+-------------------------------------------------+
| Field                 | Value                                           |
+-----------------------+-------------------------------------------------+
| created_by_project_id | 6a180caa64894f04b1d04aeed1cb920e                |
| created_by_user_id    | 39d9e30374a74fe8b58dee9e1dcd7382                |
| creator               | 39d9e30374a74fe8b58dee9e1dcd7382:6a180caa64894f |
|                       | 04b1d04aeed1cb920e                              |
| ended_at              | None                                            |
| id                    | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f            |
| metrics               | cpu: fe84c113-f98d-42ee-84fd-bdf360842bdb       |
|                       | disk.ephemeral.size:                            |
|                       | ad79f268-5f56-4ff8-8ece-d1f170621217            |
|                       | disk.root.size:                                 |
|                       | 6e021f8c-ead0-46e4-bd26-59131318e6a2            |
|                       | memory.usage:                                   |
|                       | b768ec46-5e49-4d9a-b00d-004f610c152d            |
|                       | memory: 1a4e720a-2151-4265-96cf-4daf633611b2    |
|                       | vcpus: 68654bc0-8275-4690-9433-27fe4a3aef9e     |
| original_resource_id  | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f            |
| project_id            | 8d077dbea6034e5aa45c0146d1feac5f                |
| revision_end          | None                                            |
| revision_start        | 2021-11-09T10:00:46.241527+00:00                |
| started_at            | 2021-11-09T09:29:12.842149+00:00                |
| type                  | instance                                        |
| user_id               | 65bfeb6cc8ec4df4a3d53550f7e99a5a                |
+-----------------------+-------------------------------------------------+

To view the memory usage between Nov 18 2021 17:00 UTC and 17:05 UTC, one would issue this command:

openstack metric measures show --start 2021-11-18T17:00:00 \
                               --stop 2021-11-18T17:05:00 \
                               --aggregation mean \
                               b768ec46-5e49-4d9a-b00d-004f610c152d

+---------------------------+-------------+-------------+
| timestamp                 | granularity |       value |
+---------------------------+-------------+-------------+
| 2021-11-18T17:00:00+00:00 |      3600.0 | 28.87890625 |
| 2021-11-18T17:00:00+00:00 |        60.0 | 28.87890625 |
| 2021-11-18T17:01:00+00:00 |        60.0 | 28.87890625 |
| 2021-11-18T17:02:00+00:00 |        60.0 | 28.87890625 |
| 2021-11-18T17:03:00+00:00 |        60.0 | 28.87890625 |
| 2021-11-18T17:04:00+00:00 |        60.0 | 28.87890625 |
| 2021-11-18T17:00:14+00:00 |         1.0 | 28.87890625 |
| 2021-11-18T17:00:44+00:00 |         1.0 | 28.87890625 |
| 2021-11-18T17:01:14+00:00 |         1.0 | 28.87890625 |
| 2021-11-18T17:01:44+00:00 |         1.0 | 28.87890625 |
| 2021-11-18T17:02:14+00:00 |         1.0 | 28.87890625 |
| 2021-11-18T17:02:44+00:00 |         1.0 | 28.87890625 |
| 2021-11-18T17:03:14+00:00 |         1.0 | 28.87890625 |
| 2021-11-18T17:03:44+00:00 |         1.0 | 28.87890625 |
| 2021-11-18T17:04:14+00:00 |         1.0 | 28.87890625 |
| 2021-11-18T17:04:44+00:00 |         1.0 | 28.87890625 |
+---------------------------+-------------+-------------+

This shows that the data is available with granularities of 3600, 60 and 1 second. The memory usage does not change over time, which is why the values don't change. Note that if you ask for values with a granularity of 300, the request fails, because no such granularity exists for this metric:

$ openstack metric measures show --start 2021-11-18T17:00:00 \
              --stop 2021-11-18T17:05:00 \
              --aggregation mean \
              --granularity 300 \
              b768ec46-5e49-4d9a-b00d-004f610c152d
Aggregation method 'mean' at granularity '300.0' for metric b768ec46-5e49-4d9a-b00d-004f610c152d does not exist (HTTP 404)

More information about the metric can be listed with:

openstack metric show --resource-id f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f \
              memory.usage \
              --max-width 75

+--------------------------------+----------------------------------------+
| Field                          | Value                                  |
+--------------------------------+----------------------------------------+
| archive_policy/name            | ceilometer-high-rate                   |
| creator                        | 39d9e30374a74fe8b58dee9e1dcd7382:6a180 |
|                                | caa64894f04b1d04aeed1cb920e            |
| id                             | b768ec46-5e49-4d9a-b00d-004f610c152d   |
| name                           | memory.usage                           |
| resource/created_by_project_id | 6a180caa64894f04b1d04aeed1cb920e       |
| resource/created_by_user_id    | 39d9e30374a74fe8b58dee9e1dcd7382       |
| resource/creator               | 39d9e30374a74fe8b58dee9e1dcd7382:6a180 |
|                                | caa64894f04b1d04aeed1cb920e            |
| resource/ended_at              | None                                   |
| resource/id                    | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f   |
| resource/original_resource_id  | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f   |
| resource/project_id            | 8d077dbea6034e5aa45c0146d1feac5f       |
| resource/revision_end          | None                                   |
| resource/revision_start        | 2021-11-09T10:00:46.241527+00:00       |
| resource/started_at            | 2021-11-09T09:29:12.842149+00:00       |
| resource/type                  | instance                               |
| resource/user_id               | 65bfeb6cc8ec4df4a3d53550f7e99a5a       |
| unit                           | MB                                     |
+--------------------------------+----------------------------------------+

This shows that the archive policy in use is ceilometer-high-rate.

openstack metric archive-policy show ceilometer-high-rate --max-width 75
+---------------------+---------------------------------------------------+
| Field               | Value                                             |
+---------------------+---------------------------------------------------+
| aggregation_methods | mean, rate:mean                                   |
| back_window         | 0                                                 |
| definition          | - timespan: 1:00:00, granularity: 0:00:01,        |
|                     | points: 3600                                      |
|                     | - timespan: 1 day, 0:00:00, granularity: 0:01:00, |
|                     | points: 1440                                      |
|                     | - timespan: 365 days, 0:00:00, granularity:       |
|                     | 1:00:00, points: 8760                             |
| name                | ceilometer-high-rate                              |
+---------------------+---------------------------------------------------+

That means that, in this case, the only aggregation methods available for querying the metrics are mean and rate:mean. Other archive policies could also include methods such as min or max.
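The numbers in the archive policy definition are related: the points value of each entry is its timespan divided by its granularity. The short sketch below re-derives the three points values from the output above and shows why a granularity of 300 seconds cannot be queried. The variable and function names are made up for illustration.

```python
from datetime import timedelta

# (timespan, granularity) pairs copied from the ceilometer-high-rate output.
POLICY = [
    (timedelta(hours=1), timedelta(seconds=1)),
    (timedelta(days=1), timedelta(minutes=1)),
    (timedelta(days=365), timedelta(hours=1)),
]


def points(timespan, granularity):
    """Number of data points kept: one per granularity step in the timespan."""
    return timespan // granularity


def available_granularities(policy):
    """Granularities, in seconds, that the policy defines."""
    return [granularity.total_seconds() for _, granularity in policy]


for timespan, granularity in POLICY:
    print(granularity, points(timespan, granularity))

# 300 seconds is not one of the defined granularities.
print(300.0 in available_granularities(POLICY))
```

Only the granularities defined in the policy (1, 60 and 3600 seconds here) exist for the metric, which matches the 404 response shown earlier.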

Alarming

Alarms can be retrieved by issuing

$ openstack alarm list

To create an alarm, for example based on disk.ephemeral.size, one would use something like

openstack alarm create --alarm-action 'log://' \
              --ok-action 'log://' \
              --comparison-operator ge \
              --evaluation-periods 1 \
              --granularity 60 \
              --aggregation-method mean \
              --metric disk.ephemeral.size \
              --resource-id f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f \
              --name ephemeral \
              -t gnocchi_resources_threshold \
              --resource-type instance \
              --threshold 1

+---------------------------+----------------------------------------+
| Field                     | Value                                  |
+---------------------------+----------------------------------------+
| aggregation_method        | mean                                   |
| alarm_actions             | ['log:']                               |
| alarm_id                  | 994a1710-98e8-495f-89b5-f14349575c96   |
| comparison_operator       | ge                                     |
| description               | gnocchi_resources_threshold alarm rule |
| enabled                   | True                                   |
| evaluation_periods        | 1                                      |
| granularity               | 60                                     |
| insufficient_data_actions | []                                     |
| metric                    | disk.ephemeral.size                    |
| name                      | ephemeral                              |
| ok_actions                | ['log:']                               |
| project_id                | 8d077dbea6034e5aa45c0146d1feac5f       |
| repeat_actions            | False                                  |
| resource_id               | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f   |
| resource_type             | instance                               |
| severity                  | low                                    |
| state                     | insufficient data                      |
| state_reason              | Not evaluated yet                      |
| state_timestamp           | 2021-11-22T10:16:15.250720             |
| threshold                 | 1.0                                    |
| time_constraints          | []                                     |
| timestamp                 | 2021-11-22T10:16:15.250720             |
| type                      | gnocchi_resources_threshold            |
| user_id                   | 65bfeb6cc8ec4df4a3d53550f7e99a5a       |
+---------------------------+----------------------------------------+

The state here, insufficient data, indicates that the data gathered or stored is not sufficient to compare against. A state reason is also given, in this case Not evaluated yet, which explains why.

Another valid reason could be No datapoint for granularity 60.
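How the comparison operator, threshold and state fit together can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the alarm evaluator's actual code: the function name evaluate is made up, and the real service has additional rules for when an alarm changes state.

```python
import operator

# Comparison operators accepted by --comparison-operator.
COMPARATORS = {
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
    "ge": operator.ge,
    "gt": operator.gt,
}


def evaluate(values, comparison_operator, threshold):
    """Return a simplified alarm state for the aggregated values.

    values holds one aggregated measure per evaluation period.
    """
    if not values:
        # Nothing to compare against, e.g. no datapoint for the granularity.
        return "insufficient data"
    compare = COMPARATORS[comparison_operator]
    # Simplification: alarm only when every period crosses the threshold.
    if all(compare(value, threshold) for value in values):
        return "alarm"
    return "ok"


print(evaluate([], "ge", 1.0))
print(evaluate([0.0], "ge", 1.0))
print(evaluate([2.0], "ge", 1.0))
```

With the alarm created above (ge, threshold 1, one evaluation period), an empty result for the chosen granularity corresponds to the insufficient data case.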

Further debugging

On OpenStack installations deployed via TripleO (also known as OSP Director), the log files are located on the respective nodes under /var/log/containers/<service_name>/. The config files for the services are stored under /var/lib/config-data/puppet-generated/<service_name> and are mounted into the containers.

by mrunge at December 06, 2021 03:00 PM

October 27, 2021

RDO Blog

RDO Xena Released

The RDO community is pleased to announce the general availability of the RDO build for OpenStack Xena for RPM-based distributions, CentOS Stream and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Xena is the 24th release from the OpenStack project, which is the work of more than 1,000 contributors from around the world.

 

The release is already available on the CentOS mirror network at http://mirror.centos.org/centos/8-stream/cloud/x86_64/openstack-xena/.

The RDO community project curates, packages, builds, tests and maintains a complete OpenStack component set for RHEL and CentOS Stream and is a member of the CentOS Cloud Infrastructure SIG. The Cloud Infrastructure SIG focuses on delivering a great user experience for CentOS users looking to build and maintain their own on-premise, public or hybrid clouds.

All work on RDO and on the downstream release, Red Hat OpenStack Platform, is 100% open source, with all code changes going upstream first.
PLEASE NOTE: RDO Xena provides packages for CentOS Stream 8 only. Please use the Victoria release for CentOS Linux 8 which will reach End Of Life (EOL) on December 31st, 2021 (https://www.centos.org/centos-linux-eol/).

Interesting things in the Xena release include:
  • The python-oslo-limit package has been added to RDO. This is the limit enforcement library which assists with quota calculation. Its aim is to provide support for quota enforcement across all OpenStack services.
  • The glance-tempest-plugin package has been added to RDO. This package provides a set of functional tests to validate Glance using the Tempest framework.
  • TripleO has been moved to an independent release model (see section TripleO in the RDO Xena release).

The highlights of the broader upstream OpenStack project may be read via https://releases.openstack.org/xena/highlights.html

 

TripleO in the RDO Xena release:
In the Xena development cycle, TripleO has moved to an Independent release model (https://specs.openstack.org/openstack/tripleo-specs/specs/xena/tripleo-independent-release.html) and will only maintain branches for selected OpenStack releases. In the case of Xena, TripleO will not support the Xena release. For TripleO users in RDO, this means that:
  • RDO Xena will include packages for TripleO tested at OpenStack Xena GA time.
  • Those packages will not be updated during the entire Xena maintenance cycle.
  • RDO will not be able to include patches required to fix bugs in TripleO on RDO Xena.
  • The lifecycle for the non-TripleO packages will follow the code merged and tested in upstream stable/Xena branches.
  • There will not be any TripleO Xena container images built/pushed, so interested users will have to do their own container builds when deploying Xena.
You can find details about this on the RDO webpage.
Contributors

During the Xena cycle, we saw the following new RDO contributors:

  • Chris Sibbitt
  • Gregory Thiemonge
  • Julia Kreger
  • Leif Madsen
Welcome to all of you and Thank You So Much for participating!
But we wouldn’t want to overlook anyone. A super massive Thank You to all 41 contributors who participated in producing this release. This list includes commits to rdo-packages, rdo-infra, and redhat-website repositories:
  • Alan Bishop
  • Alan Pevec
  • Alex Schultz
  • Alfredo Moralejo
  • Amy Marrich (spotz)
  • Bogdan Dobrelya
  • Chandan Kumar
  • Chris Sibbitt
  • Damien Ciabrini
  • Dmitry Tantsur
  • Eric Harney
  • Gaël Chamoulaud
  • Giulio Fidente
  • Goutham Pacha Ravi
  • Gregory Thiemonge
  • Grzegorz Grasza
  • Harald Jensas
  • James Slagle
  • Javier Peña
  • Jiri Podivin
  • Joel Capitao
  • Jon Schlueter
  • Julia Kreger
  • Lee Yarwood
  • Leif Madsen
  • Luigi Toscano
  • Marios Andreou
  • Mark McClain
  • Martin Kopec
  • Mathieu Bultel
  • Matthias Runge
  • Michele Baldessari
  • Pranali Deore
  • Rabi Mishra
  • Riccardo Pittau
  • Sagi Shnaidman
  • Sławek Kapłoński
  • Steve Baker
  • Takashi Kajinami
  • Wes Hayutin
  • Yatin Karel

 

The Next Release Cycle

At the end of one release, focus shifts immediately to the next release, i.e. Yoga.

Get Started

To spin up a proof of concept cloud, quickly, and on limited hardware, try an All-In-One Packstack installation. You can run RDO on a single node to get a feel for how it works.

Finally, for those that don’t have any hardware or physical resources, there’s the OpenStack Global Passport Program. This is a collaborative effort between OpenStack public cloud providers to let you experience the freedom, performance and interoperability of open source infrastructure. You can quickly and easily gain access to OpenStack infrastructure via trial programs from participating OpenStack public cloud providers around the world.

Get Help

The RDO Project has the users@lists.rdoproject.org mailing list for RDO-specific users and operators. For more developer-oriented content we recommend joining the dev@lists.rdoproject.org mailing list. Remember to post a brief introduction about yourself and your RDO story. The mailing list archives are all available at https://mail.rdoproject.org. You can also find extensive documentation on RDOproject.org.

The #rdo channel on OFTC IRC is also an excellent place to find and give help.

We also welcome comments and requests on the CentOS devel mailing list and the CentOS and TripleO IRC channels (#centos and #centos-devel on the Libera.Chat network, and #tripleo on OFTC); however, we have a more focused audience within the RDO venues.

Get Involved

To get involved in the OpenStack RPM packaging effort, check out the RDO contribute pages, peruse the CentOS Cloud SIG page, and inhale the RDO packaging documentation.

Join us in #rdo and #tripleo on the OFTC IRC network and follow us on Twitter @RDOCommunity. You can also find us on Facebook and YouTube.

by Amy Marrich at October 27, 2021 02:59 PM

September 30, 2021

Adam Young

Legible Error traces from openstack server show

If an OpenStack server (Ironic or Nova) has an error, it shows up in a nested field. That field is hard to read in its normal layout, due to JSON formatting. Using jq to strip the formatting helps a bunch.

The nested field is fault.details.

The -r option strips off the quotes.

[ayoung@ayoung-home scratch]$ openstack server show oracle-server-84-aarch64-vm-small -f json | jq -r '.fault | .details'
Traceback (most recent call last):
  File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance
    block_device_info=block_device_info)
  File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/driver.py", line 3458, in spawn
    block_device_info=block_device_info)
  File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/driver.py", line 3831, in _create_image
    fallback_from_host)
  File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/driver.py", line 3922, in _create_and_inject_local_root
    instance, size, fallback_from_host)
  File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/driver.py", line 9243, in _try_fetch_image_cache
    trusted_certs=instance.trusted_certs)
  File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/imagebackend.py", line 275, in cache
    *args, **kwargs)
  File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/imagebackend.py", line 642, in create_image
    self.verify_base_size(base, size)
  File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/imagebackend.py", line 331, in verify_base_size
    flavor_size=size, image_size=base_size)
nova.exception.FlavorDiskSmallerThanImage: Flavor's disk is too small for requested image. Flavor disk is 21474836480 bytes, image is 34359738368 bytes.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/compute/manager.py", line 2161, in _do_build_and_run_instance
    filter_properties, request_spec)
  File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/compute/manager.py", line 2525, in _build_and_run_instance
    reason=e.format_message())
nova.exception.BuildAbortException: Build of instance 5281b93a-0c3c-4d38-965d-568d79abb530 aborted: Flavor's disk is too small for requested image. Flavor disk is 21474836480 bytes, image is 34359738368 bytes.
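
The same extraction can be done without jq. This is a minimal sketch in Python, assuming the JSON printed by openstack server show -f json carries the fault.details field described above; the fault_details helper name and the sample document are illustrative and not part of any OpenStack tooling.

```python
import json


def fault_details(server_json):
    """Return the fault.details string from `openstack server show -f json` output.

    Returns an empty string when the server has no recorded fault.
    """
    server = json.loads(server_json)
    fault = server.get("fault") or {}
    return fault.get("details", "")


if __name__ == "__main__":
    # Hypothetical, heavily shortened sample; real output has many more fields.
    sample = json.dumps({
        "name": "example-server",
        "fault": {"details": "Traceback (most recent call last):\n  ...\nBuildAbortException"},
    })
    # print() renders the embedded newlines, which is what `jq -r` does as well.
    print(fault_details(sample))
```

Piping the command output into a script built around this function gives the same readable traceback on hosts where jq is not installed.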

by Adam Young at September 30, 2021 08:13 PM

Debugging a Clean Failure in Ironic

My team is running a small OpenStack cluster with responsibility for providing bare metal nodes via Ironic. Currently, we have a handful of nodes that are not usable. They show up as “Cleaning failed.” I’m learning how to debug this process.

Tools

The following ipmitool commands allow us to set the machine to PXE boot, remote power cycle the machine, and view what happens during the boot process.

Power stuff:

ipmitool -H $H -U $U -I lanplus -P $P chassis power status
ipmitool -H $H -U $U -I lanplus -P $P chassis power on
ipmitool -H $H -U $U -I lanplus -P $P chassis power off
ipmitool -H $H -U $U -I lanplus -P $P chassis power cycle

Serial over LAN (SOL)

ipmitool -H $H -U $U -I lanplus -P $P sol activate

PXE Boot

ipmitool -H $H -U $U -I lanplus -P $P chassis bootdev pxe
#Set Boot Device to pxe
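
When these calls need to be scripted, it can help to build the argument list in one place. The following is a rough sketch, assuming the same -H/-U/-I lanplus/-P options used above; the ipmitool_cmd and run_ipmitool helper names are made up for illustration.

```python
import subprocess


def ipmitool_cmd(host, user, password, *action):
    """Build the argv for an ipmitool call over the lanplus interface."""
    return ["ipmitool", "-H", host, "-U", user, "-I", "lanplus", "-P", password, *action]


def run_ipmitool(host, user, password, *action):
    """Run ipmitool and return its stdout; raises CalledProcessError on failure."""
    return subprocess.check_output(ipmitool_cmd(host, user, password, *action), text=True)


if __name__ == "__main__":
    # Placeholder host and credentials; nothing is executed here, the argv is only printed.
    print(ipmitool_cmd("192.0.2.10", "admin", "secret", "chassis", "bootdev", "pxe"))
```

Passing a list to subprocess avoids shell quoting problems if a password contains special characters.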

Conductor Log

To tail the log and only see entries relevant to the UUID of the node I am cleaning:

tail -f /var/log/kolla/ironic/ironic-conductor.log | grep $UUID

OpenStack baremetal node commands

What is the IPMI address for a node?

openstack baremetal node show fab1bcf7-a7fc-4c19-9d1d-fc4dbc4b2281 -f json | jq '.driver_info | .ipmi_address'
"10.76.97.171"

Cleaning Commands

We have a script that prepares the PXE server to accept a cleaning request from a node. It performs the following three actions (don’t do these yet):

 
 openstack baremetal node maintenance unset ${i}
 openstack baremetal node manage ${i}
 openstack baremetal node provide ${i}

Getting ipmi addresses for nodes

To look at the IPMI power status (and confirm that IPMI is set up right for the nodes):

for node in `openstack baremetal node list -f json | jq -r '.[] | select(."Provisioning State"=="clean failed")  | .UUID' ` ; 
do    
echo $node ; 
METAL_IP=`openstack baremetal node show  $node -f json | jq -r  '.driver_info | .ipmi_address' ` ; 
echo $METAL_IP  ; 
ipmitool -I lanplus -H  $METAL_IP  -L ADMINISTRATOR -U admin -R 12 -N 5 -P admin chassis power status   ; 
done 

Yes, I did that all on one line, hence the semicolons.

A couple of other one-liners. This one selects all active nodes and gives you their node ID and IPMI IP address.

for node in `openstack baremetal node list -f json | jq -r '.[] | select(."Provisioning State"=="active")  | .UUID' ` ; do echo $node ;  openstack baremetal node show  $node -f json | jq -r  '.driver_info | .ipmi_address' ;done

And you can swap out active with other values. For example, if you want to see what nodes are in either the error or manageable state:

openstack baremetal node list -f json | jq -r '.[] | select(."Provisioning State"=="error" or ."Provisioning State"=="manageable")  | .UUID'
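
The jq filters above can also be expressed in Python. This is a small sketch, assuming the "UUID" and "Provisioning State" keys that the jq expressions rely on; the uuids_in_states helper and the sample node list are hypothetical.

```python
import json


def uuids_in_states(node_list_json, states):
    """Return the UUIDs of nodes whose provisioning state is one of `states`.

    `node_list_json` is the text printed by `openstack baremetal node list -f json`.
    """
    wanted = set(states)
    nodes = json.loads(node_list_json)
    return [node["UUID"] for node in nodes if node["Provisioning State"] in wanted]


if __name__ == "__main__":
    # Made-up node list with only the two fields this sketch reads.
    sample = json.dumps([
        {"UUID": "node-a", "Provisioning State": "active"},
        {"UUID": "node-b", "Provisioning State": "clean failed"},
        {"UUID": "node-c", "Provisioning State": "error"},
    ])
    print(uuids_in_states(sample, ["error", "clean failed"]))
```

Accepting a collection of states keeps the call the same whether you are looking for one state or several.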

Troubleshooting

PXE outside of openstack

If I want to confirm that a node can PXE boot outside of the OpenStack operations, I track its state on the console. I like to have this running in a dedicated terminal, so in one terminal I open the SOL:

ipmitool -H 10.76.97.176 -U ADMIN -I lanplus -P ADMIN sol activate

and in another, set the machine to PXE boot, then power cycle it:

ipmitool -H 10.76.97.176 -U ADMIN -I lanplus -P ADMIN chassis bootdev pxe
Set Boot Device to pxe
[ayoung@ayoung-home keystone]$ ipmitool -H 10.76.97.176 -U ADMIN -I lanplus -P ADMIN chassis power cycle
Chassis Power Control: Cycle

If the Ironic server is not ready to accept the PXE request, your server will let you know with a message like this one:

>>Checking Media Presence......
>>Media Present......
>>Start PXE over IPv4 on MAC: 1C-34-DA-51-D6-C0.
PXE-E18: Server response timeout.

ERROR: Boot option loading failed

PXE inside of a clean

openstack baremetal node list --provision-state "clean failed"  -f value -c UUID

Produces output like this:

8470e638-0085-470c-9e51-b2ed016569e1
5411e7e8-8113-42d6-a966-8cacd1554039
08c14110-88aa-4e45-b5c1-4054ac49115a
3f5f510c-a313-4e40-943a-366917ec9e44

Clean wait log entries

I’ll track what is going on in the log for a specific node by running tail -f and grepping for the uuid of the node:

tail -f /var/log/kolla/ironic/ironic-conductor.log | grep 5411e7e8-8113-42d6-a966-8cacd1554039

If you run the three commands I showed above, the Ironic server should be prepared for cleaning and will accept the PXE request. I can execute these one at a time and track the state in the conductor log. If I kick off a clean, eventually, I see entries like this in the conductor log (I’m removing the time stamps and request ids for readability):

ERROR ironic.conductor.task_manager [] Node 5411e7e8-8113-42d6-a966-8cacd1554039 moved to provision state "clean failed" from state "clean wait"; target provision state is "available"
INFO ironic.conductor.utils [] Successfully set node 5411e7e8-8113-42d6-a966-8cacd1554039 power state to power off by power off.
INFO ironic.drivers.modules.network.flat [] Removing ports from cleaning network for node 5411e7e8-8113-42d6-a966-8cacd1554039
INFO ironic.common.neutron [] Successfully removed node 5411e7e8-8113-42d6-a966-8cacd1554039 neutron ports.

Manual abort

And I can trigger this manually if a run is taking too long by running:

openstack baremetal node abort  $UUID

Kick off clean process

The command to kick off the clean process is

openstack baremetal node provide $UUID

In the conductor log, that should show messages like this (again, edited for readability)

Node 5411e7e8-8113-42d6-a966-8cacd1554039 moved to provision state "cleaning" from state "manageable"; target provision state is "available"
Adding cleaning network to node 5411e7e8-8113-42d6-a966-8cacd1554039
For node 5411e7e8-8113-42d6-a966-8cacd1554039 in network de931fcc-32a0-468e-8691-ffcb43bf9f2e, successfully created ports (ironic ID: neutron ID): {'94306ff5-5cd4-4fdd-a33e-a0202c34d3d0': 'd9eeb64d-468d-4a9a-82a6-e70d54b73e62'}.
Successfully set node 5411e7e8-8113-42d6-a966-8cacd1554039 power state to power on by rebooting.
Node 5411e7e8-8113-42d6-a966-8cacd1554039 moved to provision state "clean wait" from state "cleaning"; target provision state is "available"

PXE during a clean

At this point, the most interesting thing is to see what is happening on the node. ipmitool sol activate provides a running log. If you are lucky, the PXE process kicks off and a Debian-based kernel should start booting. My company has a specific login set for the machines:

debian login: ampere
Password:
Linux debian 5.10.0-6-arm64 #1 SMP Debian 5.10.28-1 (2021-04-09) aarch64

Debugging on the Node

After this, I use sudo -i to run as root.

$ sudo -i 
...
# ps -ef | grep ironic
root        2369       1  1 14:26 ?        00:00:02 /opt/ironic-python-agent/bin/python3 /usr/local/bin/ironic-python-agent --config-dir /etc/ironic-python-agent.d/

Looking for logs:

ls /var/log/
btmp	ibacm.log  opensm.0x9a039bfffead6720.log  private
chrony	lastlog    opensm.0x9a039bfffead6721.log  wtmp

No ironic log. Is this thing even on the network?

# ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp1s0f0np0:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 98:03:9b:ad:67:20 brd ff:ff:ff:ff:ff:ff
3: enp1s0f1np1:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 98:03:9b:ad:67:21 brd ff:ff:ff:ff:ff:ff
4: enxda90910dd11e:  mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether da:90:91:0d:d1:1e brd ff:ff:ff:ff:ff:ff

Nope. OK, let’s get it on the network:

# dhclient
[  486.508054] mlx5_core 0000:01:00.1 enp1s0f1np1: Link down
[  486.537116] mlx5_core 0000:01:00.1 enp1s0f1np1: Link up
[  489.371586] mlx5_core 0000:01:00.0 enp1s0f0np0: Link down
[  489.394050] IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0f1np1: link becomes ready
[  489.400646] mlx5_core 0000:01:00.0 enp1s0f0np0: Link up
[  489.406226] IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0f0np0: link becomes ready
root@debian:~# [  500.596626] sr 0:0:0:0: [sr0] CDROM not ready.  Make sure there is a disc in the drive.
ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: enp1s0f0np0:  mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 98:03:9b:ad:67:20 brd ff:ff:ff:ff:ff:ff
    inet 192.168.97.178/24 brd 192.168.97.255 scope global dynamic enp1s0f0np0
       valid_lft 86386sec preferred_lft 86386sec
    inet6 fe80::9a03:9bff:fead:6720/64 scope link 
       valid_lft forever preferred_lft forever
3: enp1s0f1np1:  mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 98:03:9b:ad:67:21 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::9a03:9bff:fead:6721/64 scope link 
       valid_lft forever preferred_lft forever
4: enxda90910dd11e:  mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether da:90:91:0d:d1:1e brd ff:ff:ff:ff:ff:ff
    inet6 fe80::d890:91ff:fe0d:d11e/64 scope link 
       valid_lft forever preferred_lft forever

And…quite shortly thereafter in the conductor log:

Agent on node 5411e7e8-8113-42d6-a966-8cacd1554039 returned cleaning command success, moving to next clean step
Node 5411e7e8-8113-42d6-a966-8cacd1554039 moved to provision state "cleaning" from state "clean wait"; target provision state is "available"
Executing cleaning on node 5411e7e8-8113-42d6-a966-8cacd1554039, remaining steps: []
Successfully set node 5411e7e8-8113-42d6-a966-8cacd1554039 power state to power off by power off.
Removing ports from cleaning network for node 5411e7e8-8113-42d6-a966-8cacd1554039
Successfully removed node 5411e7e8-8113-42d6-a966-8cacd1554039 neutron ports.
Node 5411e7e8-8113-42d6-a966-8cacd1554039 cleaning complete
Node 5411e7e8-8113-42d6-a966-8cacd1554039 moved to provision state "available" from state "cleaning"; target provision state is "None"
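
To follow a node through its state changes without reading every log line, the "moved to provision state" messages can be picked out with a regular expression. This is a sketch that assumes the message wording shown in the excerpts above; the TRANSITION pattern and transitions helper are illustrative names, and the real log lines also carry timestamps and request IDs, which the search simply skips over.

```python
import re

# Matches the state-change message shown in the conductor log excerpts.
TRANSITION = re.compile(
    r'Node (?P<node>[0-9a-f-]{36}) moved to provision state "(?P<new>[^"]+)" '
    r'from state "(?P<old>[^"]+)"; target provision state is "(?P<target>[^"]+)"'
)


def transitions(lines):
    """Yield (node, old_state, new_state, target_state) for each state-change line."""
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            yield (match.group("node"), match.group("old"),
                   match.group("new"), match.group("target"))


if __name__ == "__main__":
    log = [
        'Node 5411e7e8-8113-42d6-a966-8cacd1554039 moved to provision state "cleaning" '
        'from state "clean wait"; target provision state is "available"',
        "Node 5411e7e8-8113-42d6-a966-8cacd1554039 cleaning complete",
    ]
    for node, old, new, target in transitions(log):
        print(f"{node}: {old} -> {new} (target: {target})")
```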

Cause of Failure

So, in our case, the issue seems to be that the IPA image does not have DHCP enabled.

by Adam Young at September 30, 2021 04:30 PM

September 27, 2021

John Likes OpenStack

OpenInfra Live Episode 24: OpenStack and Ceph

This Thursday at 14:00 UTC Francesco and I will be in a panel on OpenInfra Live Episode 24: OpenStack and Ceph.

by Unknown (noreply@blogger.com) at September 27, 2021 10:19 PM

September 05, 2021

Lars Kellogg-Stedman

A pair of userscripts for cleaning up Stack Exchange sites

I’ve been a regular visitor to Stack Overflow and other Stack Exchange sites over the years, and while I’ve mostly enjoyed the experience, I’ve been frustrated by the lack of control I have over what questions I see. I’m not really interested in looking at questions that have already been closed, or that have a negative score, but there’s no native facility for filtering questions like this.

I finally spent the time to learn just enough JavaScript to hurt myself, and put together a pair of scripts that let me present the questions the way I want:

sx-hide-questions

The sx-hide-questions script will hide:

  • Questions that are closed
  • Questions that are marked as a duplicate
  • Questions that have a score below 0

Because I wanted it to be obvious that the script was actually doing something, hidden questions don’t just disappear; they fade out.

These behaviors (including the fading) can all be controlled individually by a set of global variables at the top of the script.

sx-reorder-questions

The sx-reorder-questions script will sort questions such that anything that has an answer will be at the bottom, and questions that have not yet been answered appear at the top.
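
The actual userscripts are JavaScript that works on the page's DOM; the logic they apply can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the dictionary fields (closed, duplicate, score, answers) and the visible_questions name do not come from the scripts themselves.

```python
def visible_questions(questions, min_score=0):
    """Drop closed, duplicate, and low-scoring questions, then put unanswered ones first."""
    kept = [
        q for q in questions
        if not q["closed"] and not q["duplicate"] and q["score"] >= min_score
    ]
    # sorted() is stable and False sorts before True, so unanswered questions
    # move to the top while the original order is kept within each group.
    return sorted(kept, key=lambda q: q["answers"] > 0)


if __name__ == "__main__":
    sample = [
        {"title": "answered", "closed": False, "duplicate": False, "score": 3, "answers": 2},
        {"title": "closed", "closed": True, "duplicate": False, "score": 1, "answers": 0},
        {"title": "downvoted", "closed": False, "duplicate": False, "score": -1, "answers": 0},
        {"title": "open", "closed": False, "duplicate": False, "score": 0, "answers": 0},
    ]
    print([q["title"] for q in visible_questions(sample)])
```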

Installation

If you are using the Tampermonkey extension, you should be able to click on the links to the scripts earlier in this post and be taken directly to the installation screen. If you’re not running Tampermonkey, then either (a) install it, or (b) you’re on your own.

You can find both of these scripts in my sx-question-filter repository.

Caveats

These scripts rely on the CSS classes and layout of the Stack Exchange websites. If these change, the scripts will need updating. If you notice that something no longer works as advertised, please feel free to submit a pull request with the necessary corrections!

September 05, 2021 12:00 AM

September 03, 2021

Lars Kellogg-Stedman

Kubernetes External Secrets

At $JOB we maintain the configuration for our OpenShift clusters in a public git repository. Changes in the git repository are applied automatically using ArgoCD and Kustomize. This works great, but the public nature of the repository means we need to find a secure solution for managing secrets (such as passwords and other credentials necessary for authenticating to external services). In particular, we need a solution that permits our public repository to be the source of truth for our cluster configuration, without compromising our credentials.

Rejected options

We initially looked at including secrets directly in the repository through the use of the KSOPS plugin for Kustomize, which uses sops to encrypt secrets with GPG keys. There are some advantages to this arrangement:

  • It doesn’t require any backend service
  • It’s easy to control read access to secrets in the repository by encrypting them to different recipients.

There was one minor disadvantage:

  • We can’t install ArgoCD via the operator because we need a customized image that includes KSOPS, so we have to maintain our own ArgoCD image.

And there was one major problem:

  • Using GPG-encrypted secrets in a git repository makes it effectively impossible to recover from a key compromise.

Once a private key is compromised, anyone with access to that key and the git repository will be able to decrypt data in historical commits, even if we re-encrypt all the data with a new key.

Because of these security implications we decided we would need a different solution (it’s worth noting here that Bitnami Sealed Secrets suffers from effectively the same problem).

Our current solution

We’ve selected a solution that uses the External Secrets project in concert with the AWS SecretsManager service.

Kubernetes external secrets

The External Secrets project allows one to store secrets in an external secrets store, such as AWS SecretsManager, Hashicorp Vault, and others 1. The manifests that get pushed into your OpenShift cluster contain only pointers (called ExternalSecrets) to those secrets; the external secrets controller running on the cluster uses the information contained in the ExternalSecret in combination with stored credentials to fetch the secret from your chosen backend and realize the actual Secret resource. An external secret manifest referring to a secret named mysecret stored in AWS SecretsManager would look something like:

apiVersion: "kubernetes-client.io/v1"
kind: ExternalSecret
metadata:
  name: example-secret
spec:
  backendType: secretsManager
  data:
    - key: mysecret
      name: mysecretvalue

This model means that no encrypted data is ever stored in the git repository, which resolves the main problem we had with the solutions mentioned earlier.

External Secrets can be installed into your Kubernetes environment using Helm, or you can use helm template to generate manifests locally and apply them using Kustomize or some other tool (this is the route we took).

AWS SecretsManager Service

AWS SecretsManager is a service for storing and managing secrets and making them accessible via an API. Using SecretsManager we have very granular control over who can view or modify secrets; this allows us, for example, to create cluster-specific secret readers that can only read secrets intended for a specific cluster (e.g. preventing our development environment from accidentally using production secrets).

SecretsManager provides automatic versioning of secrets to prevent loss of data if you inadvertently change a secret while still requiring the old value.

We can create secrets through the AWS SecretsManager console, or we can use the AWS CLI, which looks something like:

aws secretsmanager create-secret \
--name mysecretname \
--secret-string mysecretvalue

Two great tastes that taste great together

This combination solves a number of our problems:

  • Because we’re not storing actual secrets in the repository, we don’t need to worry about encrypting anything.

  • Because we’re not managing encrypted data, replacing secrets is much easier.

  • There’s a robust mechanism for controlling access to secrets.

  • This solution offers a separation of concern that simply wasn’t possible with the KSOPS model: someone can maintain secrets without having to know anything about Kubernetes manifests, and someone can work on the repository without needing to know any secrets.

Creating external secrets

In its simplest form, an ExternalSecret resource maps values from specific named secrets in the backend to keys in a Secret resource. For example, if we wanted to create a Secret in OpenShift with the username and password for an external service, we could create two separate secrets in SecretsManager. One for the username:

aws secretsmanager create-secret \
--name cluster/cluster1/example-secret-username \
--secret-string foo

And one for the password:

aws secretsmanager create-secret \
--name cluster/cluster1/example-secret-password \
--secret-string bar \
--tags Key=cluster,Value=cluster1

And then create an ExternalSecret manifest like this:

apiVersion: "kubernetes-client.io/v1"
kind: ExternalSecret
metadata:
  name: example-secret
spec:
  backendType: secretsManager
  data:
    - key: cluster/cluster1/example-secret-username
      name: username
    - key: cluster/cluster1/example-secret-password
      name: password

This instructs the External Secrets controller to create an Opaque secret named example-secret from data in AWS SecretsManager. The value of the username key will come from the secret named cluster/cluster1/example-secret-username, and similarly for password. The resulting Secret resource will look something like this:

apiVersion: v1
kind: Secret
metadata:
  name: example-secret
type: Opaque
data:
  password: YmFy
  username: Zm9v
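
The values under data are base64-encoded, as in any Kubernetes Secret. A quick way to confirm that they match what was stored in SecretsManager is to decode them; the decode_secret_data helper below is just an illustrative name.

```python
import base64


def decode_secret_data(data):
    """Decode the base64 values found in the `data` section of a Secret."""
    return {key: base64.b64decode(value).decode() for key, value in data.items()}


if __name__ == "__main__":
    # The two values from the Secret shown above.
    print(decode_secret_data({"password": "YmFy", "username": "Zm9v"}))
```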

Templates for structured data

In the previous example, we created two separate secrets in SecretsManager for storing a username and password. It might be more convenient if we could store both credentials in a single secret. Thanks to the templating support in External Secrets, we can do that!

Let’s redo the previous example, but instead of using two separate secrets, we’ll create a single secret named cluster/cluster1/example-secret in which the secret value is a JSON document containing both the username and password:

aws secretsmanager create-secret \
--name cluster/cluster1/example-secret \
--secret-string '{"username": "foo", "password": "bar"}'

NB: The jo utility is a neat little utility for generating JSON from the command line; using that we could write the above like this…

aws secretsmanager create-secret \
--name cluster/cluster1/example-secret \
--secret-string "$(jo username=foo password=bar)"

…which makes it easier to write JSON without missing a quote, closing bracket, etc.

We can extract these values into the appropriate keys by adding a template section to our ExternalSecret, and using the JSON.parse template function, like this:

apiVersion: "kubernetes-client.io/v1"
kind: ExternalSecret
metadata:
  name: example-secret
  namespace: sandbox
spec:
  backendType: secretsManager
  data:
    - key: cluster/cluster1/example-secret
      name: creds
  template:
    stringData:
      username: "<%= JSON.parse(data.creds).username %>"
      password: "<%= JSON.parse(data.creds).password %>"

The resulting secret will look like:

apiVersion: v1
kind: Secret
metadata:
  name: example-secret
type: Opaque
data:
  creds: eyJ1c2VybmFtZSI6ICJmb28iLCAicGFzc3dvcmQiOiAiYmFyIn0=
  password: YmFy
  username: Zm9v

Notice that in addition to the values created in the template section, the Secret also contains any keys defined in the data section of the ExternalSecret.
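
To make the relationship between the stored JSON document and the generated keys concrete, here is a sketch that reproduces the data section shown above from the secret string. It only mimics the outcome; it is not how the External Secrets controller is implemented, and the render_secret_data name is made up for this example.

```python
import base64
import json


def b64(text):
    """Base64-encode a string the way Secret `data` values are stored."""
    return base64.b64encode(text.encode()).decode()


def render_secret_data(creds):
    """Return the Secret `data` mapping for a JSON credentials document.

    `creds` is kept as-is (mirroring the `data` entry of the ExternalSecret),
    and `username`/`password` are parsed out of it (mirroring the template).
    """
    parsed = json.loads(creds)
    return {
        "creds": b64(creds),
        "password": b64(parsed["password"]),
        "username": b64(parsed["username"]),
    }


if __name__ == "__main__":
    secret_string = '{"username": "foo", "password": "bar"}'
    for key, value in render_secret_data(secret_string).items():
        print(f"{key}: {value}")
```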

Templating can also be used to override the secret type if you want something other than Opaque, add metadata, and otherwise influence the generated Secret.


  1. E.g. Azure Key Vault, Google Secret Manager, Alibaba Cloud KMS Secret Manager, Akeyless ↩︎

September 03, 2021 12:00 AM

August 23, 2021

Lars Kellogg-Stedman

Connecting OpenShift to an External Ceph Cluster

Red Hat’s OpenShift Data Foundation (formerly “OpenShift Container Storage”, or “OCS”) allows you to either (a) automatically set up a Ceph cluster as an application running on your OpenShift cluster, or (b) connect your OpenShift cluster to an externally managed Ceph cluster. While setting up Ceph as an OpenShift application is a relatively polished experience, connecting to an external cluster still has some rough edges.

NB I am not a Ceph expert. If you read this and think I’ve made a mistake with respect to permissions or anything else, please feel free to leave a comment and I will update the article as necessary. In particular, I think it may be possible to further restrict the mgr permissions shown in this article and I’m interested in feedback on that topic.

Installing OCS

Regardless of which option you choose, you start by installing the “OpenShift Container Storage” operator (the name change apparently hasn’t made it to the Operator Hub yet). When you select “external mode”, you will be given the opportunity to download a Python script that you are expected to run on your Ceph cluster. This script will create some Ceph authentication principals and will emit a block of JSON data that gets pasted into the OpenShift UI to configure the external StorageCluster resource.

The script has a single required option, --rbd-data-pool-name, that you use to provide the name of an existing pool. If you run the script with only that option, it will create the following ceph principals and associated capabilities:

  • client.csi-rbd-provisioner

    caps mgr = "allow rw"
    caps mon = "profile rbd"
    caps osd = "profile rbd"
    
  • client.csi-rbd-node

    caps mon = "profile rbd"
    caps osd = "profile rbd"
    
  • client.healthchecker

    caps mgr = "allow command config"
    caps mon = "allow r, allow command quorum_status, allow command version"
    caps osd = "allow rwx pool=default.rgw.meta, allow r pool=.rgw.root, allow rw pool=default.rgw.control, allow rx pool=default.rgw.log, allow x pool=default.rgw.buckets.index"
    

    This account is used to verify the health of the ceph cluster.

If you also provide the --cephfs-filesystem-name option, the script will also create:

  • client.csi-cephfs-provisioner

    caps mgr = "allow rw"
    caps mon = "allow r"
    caps osd = "allow rw tag cephfs metadata=*"
    
  • client.csi-cephfs-node

    caps mds = "allow rw"
    caps mgr = "allow rw"
    caps mon = "allow r"
    caps osd = "allow rw tag cephfs *=*"
    

If you specify --rgw-endpoint, the script will create an RGW user named rgw-admin-ops-user with administrative access to the default RGW pool.

So what’s the problem?

The above principals and permissions are fine if you’ve created an external Ceph cluster explicitly for the purpose of supporting a single OpenShift cluster.

In an environment where a single Ceph cluster is providing storage to multiple OpenShift clusters, and especially in an environment where the Ceph and OpenShift environments are administered by different groups, the process, principals, and permissions create a number of problems.

The first and foremost is that the script provided by OCS both (a) gathers information about the Ceph environment, and (b) makes changes to that environment. If you are installing OCS on OpenShift and want to connect to a Ceph cluster over which you do not have administrative control, you may find yourself stymied when the storage administrators refuse to run your random Python script on the Ceph cluster.

Ideally, the script would be read-only, and instead of making changes to the Ceph cluster it would only validate the cluster configuration, and inform the administrator of what changes were necessary. There should be complete documentation that describes the necessary configuration scripts so that a Ceph cluster can be configured correctly without running any script, and OCS should provide something more granular than “drop a blob of JSON here” for providing the necessary configuration to OpenShift.

The second major problem is that while the script creates several principals, it only allows you to set the name of one of them. The script has a --run-as-user option, which at first sounds promising, but ultimately is of questionable use: it only allows you to set the Ceph principal used for cluster health checks.

There is no provision in the script to create separate principals for each OpenShift cluster.

Lastly, the permissions granted to the principals are too broad. For example, the csi-rbd-node principal has access to all RBD pools on the cluster.

How can we work around it?

If you would like to deploy OCS in an environment where the default behavior of the configuration script is inappropriate, you can work around this problem by:

  • Manually generating the necessary principals (with more appropriate permissions), and

  • Manually generating the JSON data for input into OCS

Create the storage

I’ve adopted the following conventions for naming storage pools and filesystems:

  • All resources are prefixed with the name of the cluster (represented here by ${clustername}).

  • The RBD pool is named ${clustername}-rbd. I create it like this:

    ceph osd pool create ${clustername}-rbd
    ceph osd pool application enable ${clustername}-rbd rbd
    
  • The CephFS filesystem (if required) is named ${clustername}-fs, and I create it like this:

    ceph fs volume create ${clustername}-fs
    

    In addition to the filesystem, this creates two pools:

    • cephfs.${clustername}-fs.meta
    • cephfs.${clustername}-fs.data
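
These conventions are simple enough to capture in a helper, which keeps the names consistent wherever they are needed. This is a sketch of the naming scheme listed above; the storage_names function and its dictionary keys are illustrative.

```python
def storage_names(clustername):
    """Derive the Ceph resource names for one OpenShift cluster."""
    filesystem = f"{clustername}-fs"
    return {
        "rbd_pool": f"{clustername}-rbd",
        "filesystem": filesystem,
        # Pools created as a side effect of `ceph fs volume create`.
        "cephfs_meta_pool": f"cephfs.{filesystem}.meta",
        "cephfs_data_pool": f"cephfs.{filesystem}.data",
    }


if __name__ == "__main__":
    # "cluster1" is an example cluster name.
    for kind, name in storage_names("cluster1").items():
        print(f"{kind}: {name}")
```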

Creating the principals

Assuming that you have followed the same conventions and have an RBD pool named ${clustername}-rbd and a CephFS filesystem named ${clustername}-fs, the following set of ceph auth add commands should create an appropriate set of principals (with access limited to just those resources that belong to the named cluster):

ceph auth add client.healthchecker-${clustername} \
    mgr "allow command config" \
    mon "allow r, allow command quorum_status, allow command version"

ceph auth add client.csi-rbd-provisioner-${clustername} \
    mgr "allow rw" \
    mon "profile rbd" \
    osd "profile rbd pool=${clustername}-rbd"

ceph auth add client.csi-rbd-node-${clustername} \
    mon "profile rbd" \
    osd "profile rbd pool=${clustername}-rbd"

ceph auth add client.csi-cephfs-provisioner-${clustername} \
    mgr "allow rw" \
    mds "allow rw fsname=${clustername}-fs" \
    mon "allow r fsname=${clustername}-fs" \
    osd "allow rw tag cephfs metadata=${clustername}-fs"

ceph auth add client.csi-cephfs-node-${clustername} \
    mgr "allow rw" \
    mds "allow rw fsname=${clustername}-fs" \
    mon "allow r fsname=${clustername}-fs" \
    osd "allow rw tag cephfs data=${clustername}-fs"

Note that I’ve excluded the RGW permissions here; in our OpenShift environments, we typically rely on the object storage interface provided by Noobaa so I haven’t spent time investigating permissions on the RGW side.
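
If you need to create principals for several clusters, the ceph auth add commands above can be generated rather than typed. The following is a minimal sketch, assuming the same capabilities as listed above; the PRINCIPALS table and the auth_commands function are names I made up for illustration, and the script only prints the commands rather than running them.

```python
import shlex

# Capabilities copied from the commands in this post; "{c}" stands in
# for the cluster name.
PRINCIPALS = {
    "healthchecker": {
        "mgr": "allow command config",
        "mon": "allow r, allow command quorum_status, allow command version",
    },
    "csi-rbd-provisioner": {
        "mgr": "allow rw",
        "mon": "profile rbd",
        "osd": "profile rbd pool={c}-rbd",
    },
    "csi-rbd-node": {
        "mon": "profile rbd",
        "osd": "profile rbd pool={c}-rbd",
    },
    "csi-cephfs-provisioner": {
        "mgr": "allow rw",
        "mds": "allow rw fsname={c}-fs",
        "mon": "allow r fsname={c}-fs",
        "osd": "allow rw tag cephfs metadata={c}-fs",
    },
    "csi-cephfs-node": {
        "mgr": "allow rw",
        "mds": "allow rw fsname={c}-fs",
        "mon": "allow r fsname={c}-fs",
        "osd": "allow rw tag cephfs data={c}-fs",
    },
}


def auth_commands(clustername):
    """Return one `ceph auth add` argument list per principal."""
    commands = []
    for name, caps in PRINCIPALS.items():
        argv = ["ceph", "auth", "add", f"client.{name}-{clustername}"]
        for daemon, cap in caps.items():
            argv += [daemon, cap.format(c=clustername)]
        commands.append(argv)
    return commands


if __name__ == "__main__":
    # Print shell-quoted commands for review; nothing is executed.
    for argv in auth_commands("mycluster"):
        print(shlex.join(argv))
```

Printing the commands instead of executing them makes it easy to review the permissions before anything touches the cluster.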

Create the JSON

The final step is to create the JSON blob that you paste into the OCS installation UI. I use the following script which calls ceph -s, ceph mon dump, and ceph auth get-key to get the necessary information from the cluster:

#!/usr/bin/python3
import argparse
import json
import subprocess
from urllib.parse import urlparse

usernames = [
    'healthchecker',
    'csi-rbd-node',
    'csi-rbd-provisioner',
    'csi-cephfs-node',
    'csi-cephfs-provisioner',
]


def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument('--use-cephfs', action='store_true', dest='use_cephfs')
    p.add_argument('--no-use-cephfs', action='store_false', dest='use_cephfs')
    p.add_argument('instance_name')
    p.set_defaults(use_rbd=True, use_cephfs=True)
    return p.parse_args()


def main():
    args = parse_args()
    cluster_status = json.loads(subprocess.check_output(['ceph', '-s', '-f', 'json']))
    mon_status = json.loads(subprocess.check_output(['ceph', 'mon', 'dump', '-f', 'json']))
    users = {}
    for username in usernames:
        key = subprocess.check_output(['ceph', 'auth', 'get-key', 'client.{}-{}'.format(username, args.instance_name)])
        users[username] = {
            'name': 'client.{}-{}'.format(username, args.instance_name),
            'key': key.decode(),
        }
    mon_name = mon_status['mons'][0]['name']
    mon_ip = [
        addr for addr in
        mon_status['mons'][0]['public_addrs']['addrvec']
        if addr['type'] == 'v1'
    ][0]['addr']
    prom_url = urlparse(cluster_status['mgrmap']['services']['prometheus'])
    prom_ip, prom_port = prom_url.netloc.split(':')
    output = [
        {
            "name": "rook-ceph-mon-endpoints",
            "kind": "ConfigMap",
            "data": {
                "data": "{}={}".format(mon_name, mon_ip),
                "maxMonId": "0",
                "mapping": "{}"
            }
        },
        {
            "name": "rook-ceph-mon",
            "kind": "Secret",
            "data": {
                "admin-secret": "admin-secret",
                "fsid": cluster_status['fsid'],
                "mon-secret": "mon-secret"
            }
        },
        {
            "name": "rook-ceph-operator-creds",
            "kind": "Secret",
            "data": {
                "userID": users['healthchecker']['name'],
                "userKey": users['healthchecker']['key'],
            }
        },
        {
            "name": "ceph-rbd",
            "kind": "StorageClass",
            "data": {
                "pool": "{}-rbd".format(args.instance_name),
            }
        },
        {
            "name": "monitoring-endpoint",
            "kind": "CephCluster",
            "data": {
                "MonitoringEndpoint": prom_ip,
                "MonitoringPort": prom_port,
            }
        },
        {
            "name": "rook-csi-rbd-node",
            "kind": "Secret",
            "data": {
                "userID": users['csi-rbd-node']['name'].replace('client.', ''),
                "userKey": users['csi-rbd-node']['key'],
            }
        },
        {
            "name": "rook-csi-rbd-provisioner",
            "kind": "Secret",
            "data": {
                "userID": users['csi-rbd-provisioner']['name'].replace('client.', ''),
                "userKey": users['csi-rbd-provisioner']['key'],
            }
        }
    ]
    if args.use_cephfs:
        output.extend([
            {
                "name": "rook-csi-cephfs-provisioner",
                "kind": "Secret",
                "data": {
                    "adminID": users['csi-cephfs-provisioner']['name'].replace('client.', ''),
                    "adminKey": users['csi-cephfs-provisioner']['key'],
                }
            },
            {
                "name": "rook-csi-cephfs-node",
                "kind": "Secret",
                "data": {
                    "adminID": users['csi-cephfs-node']['name'].replace('client.', ''),
                    "adminKey": users['csi-cephfs-node']['key'],
                }
            },
            {
                "name": "cephfs",
                "kind": "StorageClass",
                "data": {
                    "fsName": "{}-fs".format(args.instance_name),
                    "pool": "cephfs.{}-fs.data".format(args.instance_name),
                }
            }
        ])
    print(json.dumps(output, indent=2))


if __name__ == '__main__':
    main()
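
The two lookups in the script that are easiest to get wrong are picking the monitor's v1 address and splitting the Prometheus URL, so here they are in isolation. This is a sketch that runs without a Ceph cluster: sample_mon is hypothetical data shaped like the fields the script reads, the helper names are mine, and I've used urlparse's hostname and port attributes as a variation on the netloc.split(':') call in the script.

```python
from urllib.parse import urlparse


def first_v1_addr(mon):
    """Return the first legacy (v1) address of one monitor entry."""
    return [
        addr for addr in mon["public_addrs"]["addrvec"]
        if addr["type"] == "v1"
    ][0]["addr"]


def prometheus_endpoint(url):
    """Split a Prometheus service URL into (host, port) strings."""
    parsed = urlparse(url)
    return parsed.hostname, str(parsed.port)


# Hypothetical sample data, shaped like the fields the script reads.
sample_mon = {
    "name": "ceph0",
    "public_addrs": {
        "addrvec": [
            {"type": "v2", "addr": "192.168.122.140:3300"},
            {"type": "v1", "addr": "192.168.122.140:6789"},
        ]
    },
}

print("{}={}".format(sample_mon["name"], first_v1_addr(sample_mon)))
print(prometheus_endpoint("http://192.168.122.140:9283/"))
```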

If you’d prefer a strictly manual process, you can fill in the necessary values yourself. The JSON produced by the above script looks like the following, which is invalid JSON because I’ve used inline comments to mark all the values that you would need to provide:

[
{
"name": "rook-ceph-mon-endpoints",
"kind": "ConfigMap",
"data": {
# The format is <mon_name>=<mon_endpoint>, and you only need to
# provide a single mon address.
"data": "ceph0=192.168.122.140:6789",
"maxMonId": "0",
"mapping": "{}"
}
},
{
"name": "rook-ceph-mon",
"kind": "Secret",
"data": {
# Fill in the fsid of your Ceph cluster.
"fsid": "c9c32c73-dac4-4cc9-8baa-d73b96c135f4",
# Do **not** fill in these values, they are unnecessary. OCS
# does not require admin access to your Ceph cluster.
"admin-secret": "admin-secret",
"mon-secret": "mon-secret"
}
},
{
"name": "rook-ceph-operator-creds",
"kind": "Secret",
"data": {
# Fill in the name and key for your healthchecker principal.
# Note that here, unlike elsewhere in this JSON, you must
# provide the "client." prefix to the principal name.
"userID": "client.healthchecker-mycluster",
"userKey": "<key>"
}
},
{
"name": "ceph-rbd",
"kind": "StorageClass",
"data": {
# Fill in the name of your RBD pool.
"pool": "mycluster-rbd"
}
},
{
"name": "monitoring-endpoint",
"kind": "CephCluster",
"data": {
# Fill in the address and port of the Ceph cluster prometheus
# endpoint.
"MonitoringEndpoint": "192.168.122.140",
"MonitoringPort": "9283"
}
},
{
"name": "rook-csi-rbd-node",
"kind": "Secret",
"data": {
# Fill in the name and key of the csi-rbd-node principal.
"userID": "csi-rbd-node-mycluster",
"userKey": "<key>"
}
},
{
"name": "rook-csi-rbd-provisioner",
"kind": "Secret",
"data": {
# Fill in the name and key of your csi-rbd-provisioner
# principal.
"userID": "csi-rbd-provisioner-mycluster",
"userKey": "<key>"
}
},
{
"name": "rook-csi-cephfs-provisioner",
"kind": "Secret",
"data": {
# Fill in the name and key of your csi-cephfs-provisioner
# principal.
"adminID": "csi-cephfs-provisioner-mycluster",
"adminKey": "<key>"
}
},
{
"name": "rook-csi-cephfs-node",
"kind": "Secret",
"data": {
# Fill in the name and key of your csi-cephfs-node principal.
"adminID": "csi-cephfs-node-mycluster",
"adminKey": "<key>"
}
},
{
"name": "cephfs",
"kind": "StorageClass",
"data": {
# Fill in the name of your CephFS filesystem and the name of the
# associated data pool.
"fsName": "mycluster-fs",
"pool": "cephfs.mycluster-fs.data"
}
}
]
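
If you keep an annotated template like the one above around, you will need to drop the comment lines before pasting the result anywhere that expects real JSON. A minimal sketch of that step follows; strip_comment_lines is a name I made up, the embedded template is a shortened example, and the helper only handles comments that occupy a whole line.

```python
import json


def strip_comment_lines(text):
    """Drop lines whose first non-blank character is '#'."""
    return "\n".join(
        line for line in text.splitlines()
        if not line.lstrip().startswith("#")
    )


# A shortened, annotated template in the same style as above.
annotated = """
[
  {
    "name": "ceph-rbd",
    "kind": "StorageClass",
    "data": {
      # Fill in the name of your RBD pool.
      "pool": "mycluster-rbd"
    }
  }
]
"""

# json.loads raises an error if anything other than valid JSON remains.
data = json.loads(strip_comment_lines(annotated))
print(json.dumps(data, indent=2))
```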

Associated Bugs

I’ve opened several bug reports to see about addressing some of these issues:

  • #1996833 “ceph-external-cluster-details-exporter.py should have a read-only mode”
  • #1996830 “OCS external mode should allow specifying names for all Ceph auth principals”
  • #1996829 “Permissions assigned to ceph auth principals when using external storage are too broad”

August 23, 2021 12:00 AM

July 12, 2021

Website and blog of Jiří Stránský

Introduction to OS Migrate, the OpenStack parallel cloud migration toolbox

OS Migrate is a toolbox for content migration (workloads and more) between OpenStack clouds. Let’s dive into why you’d use it, some of its most notable features, and a bit of how it works.

The Why

Why move cloud content between OpenStacks? Imagine these situations:

  • Old cloud hardware is obsolete, you’re buying new. A new green field deployment will be easier than gradual replacement of hardware in the original cloud.

  • You want to make fundamental changes to your OpenStack deployment, that would be difficult or risky to perform on a cloud which is already providing service to users.

  • You want to upgrade to a new release of OpenStack, but you want to cut down on associated cloud-wide risk, or you can’t schedule cloud-wide control plane downtime.

  • You want to upgrade to a new release of OpenStack, but the cloud users should be given a choice when to stop using the old release and start using the new.

  • A combination of the above.

In such situations, running (at least) two clouds in parallel for a period of time is often the preferable path. And when you run parallel clouds, perhaps with the intention of decommissioning some of them eventually, a tool may come in handy to copy/migrate the content that users have created (virtual networks, routers, security groups, machines, block storage, images etc.) from one cloud to another. This is what OS Migrate is for.

The Pitch

Now we know OS Migrate copies/moves content from one OpenStack to another. But there is more to say. Some of the design decisions that went into OS Migrate should make it a tool of choice:

  • Uses standard OpenStack APIs. You don’t need to install any plugins into your clouds before using OS Migrate, and OS Migrate does not need access to the backends of your cloud (databases etc.).

  • Runnable with tenant privileges. For moving tenant-owned content, OS Migrate only needs tenant credentials (not administrative credentials). This naturally reduces risks associated with the migration.

    If desired, cloud tenants can even use OS Migrate on their own. Cloud admins do not necessarily need to get involved.

    Admin credentials are only needed when the content being migrated requires admin privileges to be created (e.g. public Glance images).

  • Transparent. The metadata of exported content is in human-readable YAML files. You can inspect what has been exported from the source cloud, and tweak it if necessary, before executing the import into the destination cloud.

  • Stateless. There is no database in OS Migrate that could get out of sync with reality. The source of migration information is the human-readable YAML files. ID-to-ID mappings are not kept; entry-point resources are referred to by name.

  • Idempotent. In case of an issue, fix the root cause and re-run, be it export or import. OS Migrate has mechanisms to prevent duplicate exports and duplicate imports.

  • Cherry-pickable. There’s no need to migrate all content with OS Migrate. Only migrate some tenants, or further scope to some of their resource types, or further limit the resource type exports/imports by a list of resource names or regular expression patterns. Use as much or as little of OS Migrate as you need.

  • Implemented as an Ansible collection. When learning to work with OS Migrate, most importantly you’ll be learning to work with Ansible, an automation tool used across the IT industry. If you already know Ansible, you’ll feel right at home with OS Migrate.
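
To make the cherry-picking idea a little more concrete, here is a small sketch of filtering resource names by exact names or regular expressions. It is purely illustrative: select_resources is a hypothetical helper, not part of OS Migrate, and OS Migrate's own filter syntax and matching rules may differ.

```python
import re


def select_resources(names, wanted):
    """Keep the names that match any entry in `wanted`.

    Entries are either exact names (str) or compiled regular
    expressions. Hypothetical helper; not part of OS Migrate.
    """
    selected = []
    for name in names:
        for item in wanted:
            if isinstance(item, str):
                matched = item == name
            else:
                matched = item.fullmatch(name) is not None
            if matched:
                selected.append(name)
                break
    return selected


networks = ["web-net", "db-net", "test-net-1", "test-net-2"]
print(select_resources(networks, ["db-net", re.compile(r"test-.*")]))
```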

The How

If you want to use OS Migrate, the best thing I can do here is point towards the OS Migrate User Documentation. If you just want to get a glimpse for now, read on.

As OS Migrate is an Ansible collection, the main mode of use is setting Ansible variables and running playbooks shipped with the collection.

Should the default playbooks not fit a particular use case, a technically savvy user could also utilize the collection’s roles and modules as building blocks to craft their own playbooks. However, as I wrote above in the point about cherry-picking features, we’ve tried to make the default playbooks quite generically usable.

In OS Migrate we differentiate between two main migration types with respect to what resources we are migrating: pre-workload migration, and workload migration.

Pre-workload migration

Pre-workload migration focuses on content/resources that can be copied to the destination cloud without affecting workloads in the source cloud. It can typically be done with little timing pressure, ahead of migrating workloads. This includes resources like tenant networks, subnets, routers, images, security groups etc.

The content is serialized as editable YAML files to the Migrator host (the machine running the Ansible playbooks), and then resources are created in the destination according to the YAML serializations.

Pre-workload migration data flow

Workload migration

Workload migration focuses on copying VMs and their attached Cinder volumes, and on creating floating IPs for VMs in the destination cloud. The VM migration between clouds is a “cold” migration. VMs first need to be stopped and then they are copied.

With regards to the boot disk of the VM, we support two options: either the destination VM’s boot disk is created from a Glance image, or the source VM’s boot disk snapshot is copied into the destination cloud as a Cinder volume and the destination VM is created as boot-from-volume. There is a migration parameter controlling this behavior on a per-VM basis. Additional Cinder volumes attached to the source VM are copied.

The data path for VMs and volumes is slightly different than in the pre-workload migration. Only metadata gets exported onto the Migrator host. For moving the binary data, special VMs called conversion hosts are deployed, one in the source and one in the destination. This is done for performance reasons, to allow the VMs’ and volumes’ binary data to travel directly from cloud to cloud without going through the (perhaps external) Migrator host as an intermediary.

Workload migration data flow

The Pointers

Now that we have an overview of OS Migrate, let’s finish with some links where more info can be found:

Have a good day!

by Jiří Stránský at July 12, 2021 12:00 AM

May 12, 2021

RDO Blog

RDO Wallaby Released

The RDO community is pleased to announce the general availability of the RDO build for OpenStack Wallaby for RPM-based distributions, CentOS Stream and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Wallaby is the 23rd release from the OpenStack project, which is the work of more than 1,000 contributors from around the world.
The release is already available on the CentOS mirror network at http://mirror.centos.org/centos/8-stream/cloud/x86_64/openstack-wallaby/.
The RDO community project curates, packages, builds, tests and maintains a complete OpenStack component set for RHEL and CentOS Stream and is a member of the CentOS Cloud Infrastructure SIG. The Cloud Infrastructure SIG focuses on delivering a great user experience for CentOS users looking to build and maintain their own on-premise, public or hybrid clouds.
All work on RDO and on the downstream release, Red Hat OpenStack Platform, is 100% open source, with all code changes going upstream first.
PLEASE NOTE: RDO Wallaby provides packages for CentOS Stream 8 and Python 3 only. Please use the Victoria release for CentOS 8. For CentOS 7 and Python 2.7, please use the Train release.
Interesting things in the Wallaby release include:
  • With the Victoria release, source tarballs are validated using the upstream GPG signature. This certifies that the source is identical to what is released upstream and ensures the integrity of the packaged source code.
  • With the Victoria release, openvswitch/ovn are not shipped as part of RDO. Instead RDO relies on builds from the CentOS NFV SIG.
  • Some new packages have been added to RDO during the Victoria release:
    • RBAC support added in multiple projects including Designate, Glance, Horizon, Ironic, and Octavia
    • Glance added support for distributed image import
    • Ironic added deployment and cleaning enhancements including UEFI Partition Image handling, NVMe Secure Erase, per-instance deployment driver interface overrides, deploy time “deploy_steps”, and file injection.
    • Kuryr added nested mode with node VMs running in multiple subnets. To use that functionality, a new option [pod_vif_nested]worker_nodes_subnets is introduced, accepting multiple Subnet IDs.
    • Manila added the ability for operators to set maximum and minimum share sizes as extra specifications on share types.
    • Neutron added a new subnet type, network:routed. IPs on this subnet type can be advertised with BGP over a provider network.
    • TripleO moved network and network port creation out of the Heat stack and into the baremetal provisioning workflow.

Other highlights of the broader upstream OpenStack project may be read via https://releases.openstack.org/wallaby/highlights.html

Contributors

During the Wallaby cycle, we saw the following new RDO contributors:

  • Adriano Petrich
  • Ananya Banerjee
  • Artom Lifshitz
  • Attila Fazekas
  • Brian Haley
  • David J Peacock
  • Jason Joyce
  • Jeremy Freudberg
  • Jiri Podivin
  • Martin Kopec
  • Waleed Mousa
Welcome to all of you and Thank You So Much for participating!
But we wouldn’t want to overlook anyone. A super massive Thank You to all 58 contributors who participated in producing this release. This list includes commits to rdo-packages, rdo-infra, and redhat-website repositories:
  • Adriano Petrich
  • Alex Schultz
  • Alfredo Moralejo
  • Amol Kahat
  • Amy Marrich
  • Ananya Banerjee
  • Artom Lifshitz
  • Arx Cruz
  • Attila Fazekas
  • Bhagyashri Shewale
  • Brian Haley
  • Cédric Jeanneret
  • Chandan Kumar
  • Daniel Pawlik
  • David J Peacock
  • Dmitry Tantsur
  • Emilien Macchi
  • Eric Harney
  • Fabien Boucher
  • Gabriele Cerami
  • Gael Chamoulaud
  • Grzegorz Grasza
  • Harald Jensas
  • Jason Joyce
  • Javier Pena
  • Jeremy Freudberg
  • Jiri Podivin
  • Joel Capitao
  • Kevin Carter
  • Luigi Toscano
  • Marc Dequenes
  • Marios Andreou
  • Martin Kopec
  • Mathieu Bultel
  • Matthias Runge
  • Mike Turek
  • Nicolas Hicher
  • Pete Zaitcev
  • Pooja Jadhav
  • Rabi Mishra
  • Riccardo Pittau
  • Roman Gorshunov
  • Ronelle Landy
  • Sagi Shnaidman
  • Sandeep Yadav
  • Slawek Kaplonski
  • Sorin Sbarnea
  • Steve Baker
  • Takashi Kajinami
  • Tristan Cacqueray
  • Waleed Mousa
  • Wes Hayutin
  • Yatin Karel

The Next Release Cycle

At the end of one release, focus shifts immediately to the next release, i.e., Xena.

Get Started

There are three ways to get started with RDO.

To spin up a proof of concept cloud, quickly, and on limited hardware, try an All-In-One Packstack installation. You can run RDO on a single node to get a feel for how it works.

For a production deployment of RDO, use TripleO and you’ll be running a production cloud in short order.

Finally, for those that don’t have any hardware or physical resources, there’s the OpenStack Global Passport Program. This is a collaborative effort between OpenStack public cloud providers to let you experience the freedom, performance and interoperability of open source infrastructure. You can quickly and easily gain access to OpenStack infrastructure via trial programs from participating OpenStack public cloud providers around the world.

Get Help

The RDO Project has the users@lists.rdoproject.org mailing list for RDO-specific users and operators. For more developer-oriented content we recommend joining the dev@lists.rdoproject.org mailing list. Remember to post a brief introduction about yourself and your RDO story. The mailing list archives are all available at https://mail.rdoproject.org. You can also find extensive documentation on RDOproject.org.

The #rdo channel on Freenode IRC is also an excellent place to find and give help.

We also welcome comments and requests on the CentOS devel mailing list and the CentOS and TripleO IRC channels (#centos, #centos-devel, and #tripleo on irc.freenode.net), however we have a more focused audience within the RDO venues.

Get Involved

To get involved in the OpenStack RPM packaging effort, check out the RDO contribute pages, peruse the CentOS Cloud SIG page, and inhale the RDO packaging documentation.

Join us in #rdo and #tripleo on the Freenode IRC network and follow us on Twitter @RDOCommunity. You can also find us on Facebook and YouTube.

 

by Amy Marrich at May 12, 2021 07:41 PM

April 17, 2021

Lars Kellogg-Stedman

Creating a VXLAN overlay network with Open vSwitch

In this post, we’ll walk through the process of getting virtual machines on two different hosts to communicate over an overlay network created using the support for VXLAN in Open vSwitch (or OVS).

The test environment

For this post, I’ll be working with two systems:

  • node0.ovs.virt at address 192.168.122.107
  • node1.ovs.virt at address 192.168.122.174

These hosts are running CentOS 8, although once we get past the package installs the instructions will be similar for other distributions.

While reading through this post, remember that unless otherwise specified we’re going to be running the indicated commands on both hosts.

Install packages

Before we can get started configuring things we’ll need to install OVS and libvirt. While libvirt is included with the base CentOS distribution, for OVS we’ll need to add both the EPEL repository and a recent CentOS OpenStack repository (OVS is included in the CentOS OpenStack repositories because it is required by OpenStack’s networking service):

yum -y install epel-release centos-release-openstack-victoria

With these additional repositories enabled we can now install the required packages:

yum -y install \
libguestfs-tools-c \
libvirt \
libvirt-daemon-kvm \
openvswitch2.15 \
tcpdump \
virt-install

Enable services

We need to start both the libvirtd and openvswitch services:

systemctl enable --now openvswitch libvirtd

This command will (a) mark the services to start automatically when the system boots and (b) immediately start the services.

Configure libvirt

When libvirt is first installed it doesn’t have any configured storage pools. Let’s create one in the default location, /var/lib/libvirt/images:

virsh pool-define-as default --type dir --target /var/lib/libvirt/images

We need to mark the pool active, and we might as well configure it to activate automatically next time the system boots:

virsh pool-start default
virsh pool-autostart default

Configure Open vSwitch

Create the bridge

With all the prerequisites out of the way we can finally start working with Open vSwitch. Our first task is to create the OVS bridge that will host our VXLAN tunnels. To create a bridge named br0, we run:

ovs-vsctl add-br br0

We can inspect the OVS configuration by running ovs-vsctl show, which should output something like:

cc1e7217-e393-4e21-97c1-92324d47946d
Bridge br0
Port br0
Interface br0
type: internal
ovs_version: "2.15.1"

Let’s not forget to mark the interface “up”:

ip link set br0 up

Create the VXLAN tunnels

Up until this point we’ve been running identical commands on both node0 and node1. In order to create our VXLAN tunnels, we need to provide a remote endpoint for the VXLAN connection, which is going to be “the other host”. On node0, we run:

ovs-vsctl add-port br0 vx_node1 -- set interface vx_node1 \
type=vxlan options:remote_ip=192.168.122.174

This creates a VXLAN interface named vx_node1 (named that way because the remote endpoint is node1). The OVS configuration now looks like:

cc1e7217-e393-4e21-97c1-92324d47946d
Bridge br0
Port vx_node1
Interface vx_node1
type: vxlan
options: {remote_ip="192.168.122.174"}
Port br0
Interface br0
type: internal
ovs_version: "2.15.1"

On node1 we will run:

ovs-vsctl add-port br0 vx_node0 -- set interface vx_node0 \
type=vxlan options:remote_ip=192.168.122.107

Which results in:

58451994-e0d1-4bf1-8f91-7253ddf4c016
Bridge br0
Port br0
Interface br0
type: internal
Port vx_node0
Interface vx_node0
type: vxlan
options: {remote_ip="192.168.122.107"}
ovs_version: "2.15.1"
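
The two add-port commands above are mirror images of each other: each host gets a port named after its peer, pointing at the peer's address. The following sketch generates them from a table of hosts. The function name vxlan_commands is my own, and it only prints commands rather than running them. I've only tested the two-host case described in this post; a mesh of more than two hosts would likely need some form of loop prevention, which is outside the scope of this post.

```python
def vxlan_commands(nodes, bridge="br0"):
    """Map each node name to the ovs-vsctl commands linking it to its peers.

    `nodes` maps a short node name to its underlay IP address.
    """
    commands = {}
    for name in nodes:
        commands[name] = [
            f"ovs-vsctl add-port {bridge} vx_{peer} -- set interface vx_{peer} "
            f"type=vxlan options:remote_ip={peer_ip}"
            for peer, peer_ip in nodes.items()
            if peer != name
        ]
    return commands


if __name__ == "__main__":
    nodes = {"node0": "192.168.122.107", "node1": "192.168.122.174"}
    for name, cmds in vxlan_commands(nodes).items():
        print(f"# on {name}")
        for cmd in cmds:
            print(cmd)
```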

At this point, we have a functional overlay network: anything attached to br0 on either system will appear to share the same layer 2 network. Let’s take advantage of this to connect a pair of virtual machines.

Create virtual machines

Download a base image

We’ll need a base image for our virtual machines. I’m going to use the CentOS 8 Stream image, which we can download to our storage directory like this:

curl -L -o /var/lib/libvirt/images/centos-8-stream.qcow2 \
https://cloud.centos.org/centos/8-stream/x86_64/images/CentOS-Stream-GenericCloud-8-20210210.0.x86_64.qcow2

We need to make sure libvirt is aware of the new image:

virsh pool-refresh default

Lastly, we’ll want to set a root password on the image so that we can log in to our virtual machines:

virt-customize -a /var/lib/libvirt/images/centos-8-stream.qcow2 \
--root-password password:secret

Create the virtual machine

We’re going to create a pair of virtual machines (one on each host). We’ll be creating each vm with two network interfaces:

  • One will be attached to the libvirt default network; this will allow us to ssh into the vm in order to configure things.
  • The second will be attached to the OVS bridge

To create a virtual machine on node0 named vm0.0, run the following command:

virt-install \
-r 3000 \
--network network=default \
--network bridge=br0,virtualport.type=openvswitch \
--os-variant centos8 \
--disk pool=default,size=10,backing_store=centos-8-stream.qcow2,backing_format=qcow2 \
--import \
--noautoconsole \
-n vm0.0

The most interesting option in the above command line is probably the one used to create the virtual disk:

--disk pool=default,size=10,backing_store=centos-8-stream.qcow2,backing_format=qcow2 \

This creates a 10GB “copy-on-write” disk that uses centos-8-stream.qcow2 as a backing store. That means that reads will generally come from the centos-8-stream.qcow2 image, but writes will be stored in the new image. This makes it easy for us to quickly create multiple virtual machines from the same base image.
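
If the copy-on-write idea is new to you, a toy model may help. This sketch has nothing to do with the actual qcow2 format: CowDisk is a made-up class that stores blocks in dictionaries, and it only illustrates that reads fall through to the shared backing image while writes stay private to each disk.

```python
class CowDisk:
    """Toy copy-on-write overlay; blocks are dict entries keyed by number."""

    def __init__(self, backing):
        self.backing = backing  # shared base image, never modified
        self.overlay = {}       # private writes for this disk only

    def read(self, block):
        # Prefer our own copy; fall back to the backing image.
        if block in self.overlay:
            return self.overlay[block]
        return self.backing.get(block, b"\0")

    def write(self, block, data):
        # Writes never touch the backing image.
        self.overlay[block] = data


base = {0: b"boot", 1: b"root"}
vm0, vm1 = CowDisk(base), CowDisk(base)
vm0.write(1, b"changed")
print(vm0.read(1), vm1.read(1), base[1])
```

Only vm0 sees its own write; vm1 and the base image are unchanged, which is why many virtual machines can share one base image.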

On node1 we would run a similar command, although here we’re naming the virtual machine vm1.0:

virt-install \
-r 3000 \
--network network=default \
--network bridge=br0,virtualport.type=openvswitch \
--os-variant centos8 \
--disk pool=default,size=10,backing_store=centos-8-stream.qcow2,backing_format=qcow2 \
--import \
--noautoconsole \
-n vm1.0

Configure networking for vm0.0

On node0, get the address of the new virtual machine on the default network using the virsh domifaddr command:

[root@node0 ~]# virsh domifaddr vm0.0
Name MAC address Protocol Address
-------------------------------------------------------------------------------
vnet2 52:54:00:21:6e:4f ipv4 192.168.124.83/24

Connect to the vm using ssh:

[root@node0 ~]# ssh 192.168.124.83
root@192.168.124.83's password:
Activate the web console with: systemctl enable --now cockpit.socket
Last login: Sat Apr 17 14:08:17 2021 from 192.168.124.1
[root@localhost ~]#

(Recall that the root password is secret.)

Configure interface eth1 with an address. For this post, we’ll use the 10.0.0.0/24 range for our overlay network. To assign this vm the address 10.0.0.10, we can run:

ip addr add 10.0.0.10/24 dev eth1
ip link set eth1 up

Configure networking for vm1.0

We need to repeat the process for vm1.0 on node1:

[root@node1 ~]# virsh domifaddr vm1.0
Name MAC address Protocol Address
-------------------------------------------------------------------------------
vnet0 52:54:00:e9:6e:43 ipv4 192.168.124.69/24

Connect to the vm using ssh:

[root@node1 ~]# ssh 192.168.124.69
root@192.168.124.69's password:
Activate the web console with: systemctl enable --now cockpit.socket
Last login: Sat Apr 17 14:08:17 2021 from 192.168.124.1
[root@localhost ~]#

We’ll use address 10.0.0.11 for this system:

ip addr add 10.0.0.11/24 dev eth1
ip link set eth1 up

Verify connectivity

At this point, our setup is complete. On vm0.0, we can connect to vm1.0 over the overlay network. For example, we can ping the remote host:

[root@localhost ~]# ping -c2 10.0.0.11
PING 10.0.0.11 (10.0.0.11) 56(84) bytes of data.
64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=1.79 ms
64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=0.719 ms
--- 10.0.0.11 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.719/1.252/1.785/0.533 ms

Or connect to it using ssh:

[root@localhost ~]# ssh 10.0.0.11 uptime
root@10.0.0.11's password:
14:21:33 up 1:18, 1 user, load average: 0.00, 0.00, 0.00

Using tcpdump, we can verify that these connections are going over the overlay network. Let’s watch for VXLAN traffic on node1 by running the following command (VXLAN is a UDP protocol running on port 4789):

tcpdump -i eth0 -n port 4789

When we run ping -c2 10.0.0.11 on vm0.0, we see the following:

14:23:50.312574 IP 192.168.122.107.52595 > 192.168.122.174.vxlan: VXLAN, flags [I] (0x08), vni 0
IP 10.0.0.10 > 10.0.0.11: ICMP echo request, id 4915, seq 1, length 64
14:23:50.314896 IP 192.168.122.174.59510 > 192.168.122.107.vxlan: VXLAN, flags [I] (0x08), vni 0
IP 10.0.0.11 > 10.0.0.10: ICMP echo reply, id 4915, seq 1, length 64
14:23:51.314080 IP 192.168.122.107.52595 > 192.168.122.174.vxlan: VXLAN, flags [I] (0x08), vni 0
IP 10.0.0.10 > 10.0.0.11: ICMP echo request, id 4915, seq 2, length 64
14:23:51.314259 IP 192.168.122.174.59510 > 192.168.122.107.vxlan: VXLAN, flags [I] (0x08), vni 0
IP 10.0.0.11 > 10.0.0.10: ICMP echo reply, id 4915, seq 2, length 64

In the output above, we see that each packet in the transaction results in two lines of output from tcpdump:

14:23:50.312574 IP 192.168.122.107.52595 > 192.168.122.174.vxlan: VXLAN, flags [I] (0x08), vni 0
IP 10.0.0.10 > 10.0.0.11: ICMP echo request, id 4915, seq 1, length 64

The first line shows the contents of the VXLAN packet, while the second line shows the data that was encapsulated in the VXLAN packet.
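
The "flags [I] (0x08), vni 0" part of the tcpdump output comes from the 8-byte VXLAN header that sits between the outer UDP header and the encapsulated Ethernet frame. Here is a minimal sketch of that header layout as I understand it from RFC 7348 (one flags byte, 24 reserved bits, a 24-bit VNI, 8 reserved bits); the function names are my own and this is not how OVS itself is implemented.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag reported by tcpdump


def pack_vxlan_header(vni):
    """Build the 8-byte VXLAN header for a given network identifier."""
    # Word 0: flags in the top byte, the remaining 24 bits reserved.
    # Word 1: VNI in the top 24 bits, the bottom 8 bits reserved.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vxlan_header(header):
    """Return (flags, vni) from the first 8 bytes of a VXLAN payload."""
    word0, word1 = struct.unpack("!II", header[:8])
    return word0 >> 24, word1 >> 8


header = pack_vxlan_header(0)
print(header.hex())
print(unpack_vxlan_header(header))
```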

That’s all folks

We’ve achieved our goal: we have two virtual machines on two different hosts communicating over a VXLAN overlay network. If you were to do this “for real”, you would probably want to make a number of changes: for example, the network configuration we’ve applied in many cases will not persist across a reboot; handling persistent network configuration is still very distribution dependent, so I’ve left it out of this post.

April 17, 2021 12:00 AM

March 22, 2021

Matthias Runge

High memory usage with collectd

collectd itself is intended as a lightweight collecting agent for metrics and events. In larger infrastructures, the data is sent over the network to a central point, where it is stored and processed further.

This introduces a potential issue: what happens if the remote endpoint to write data to is not available? The traditional network plugin uses UDP, which is by definition unreliable.

collectd has a queue of values to be written to an output plugin, such as write_http or amqp1. When metrics are due to be written, collectd iterates over that queue and tries to write the data to the endpoint. If writing is successful, the data is removed from the queue. The little word "if" also hints that there is a chance the data doesn't get removed. The question is: what happens then, or what should be done?

There is no easy answer to this. Some people tend to ignore missed metrics, some don't. The way to address this is to cap the queue at a given length and to drop data once that limit is reached. The parameters are WriteQueueLimitHigh and WriteQueueLimitLow. If they are unset, the queue is not limited and will grow until memory is exhausted. For predictability, you should set these two values to the same number. Finding the right value for this parameter requires a bit of experimentation. If values are dropped, you will see that in the log file.
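
To see why equal values make the behaviour predictable, it helps to model how the two limits interact. The sketch below is a simplified model based on my reading of the collectd documentation, not collectd's actual code: below the low limit nothing is dropped, at or above the high limit everything is dropped, and in between the chance of a drop rises linearly. The function name drop_probability is my own.

```python
def drop_probability(queue_length, limit_low, limit_high):
    """Simplified model of the chance that a value is dropped."""
    if queue_length < limit_low:
        return 0.0
    if queue_length >= limit_high:
        return 1.0
    # Between the limits the probability rises linearly.
    return (queue_length - limit_low) / (limit_high - limit_low)


# With equal limits there is a hard cut-off and no random band.
for length in (99, 100):
    print(length, drop_probability(length, 100, 100))

# With different limits, drops become gradually more likely.
for length in (40, 75, 100):
    print(length, drop_probability(length, 50, 100))
```

With both limits set to the same number, the queue either accepts everything or drops everything, which is much easier to reason about than a band of random drops.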

When collectd is configured as part of Red Hat OpenStack Platform, the following config snippet can be used:

parameter_defaults:
    ExtraConfig:
      collectd::write_queue_limit_high: 100
      collectd::write_queue_limit_low: 100

Another parameter can be used to explicitly limit the queue length when the amqp1 plugin is used for sending out data: the SendQueueLimit parameter, which serves the same purpose but can differ from the global WriteQueueLimitHigh and WriteQueueLimitLow.

parameter_defaults:
    ExtraConfig:
        collectd::plugin::amqp1::send_queue_limit: 50

In almost all cases, high memory usage by collectd can be tracked down to a write endpoint that is unavailable, that drops data occasionally, etc.

by mrunge at March 22, 2021 03:00 PM

March 09, 2021

Lars Kellogg-Stedman

Getting started with KSOPS

Kustomize is a tool for assembling Kubernetes manifests from a collection of files. We’re making extensive use of Kustomize in the operate-first project. In order to keep secrets stored in our configuration repositories, we’re using the KSOPS plugin, which enables Kustomize to use sops to encrypt/decrypt files using GPG.

In this post, I’d like to walk through the steps necessary to get everything up and running.

Set up GPG

We encrypt files using GPG, so the first step is making sure that you have a GPG keypair and that your public key is published where other people can find it.

Install GPG

GPG will be pre-installed on most Linux distributions. You can check if it’s installed by running e.g. gpg --version. If it’s not installed, you will need to figure out how to install it for your operating system.

Create a key

Run the following command to create a new GPG keypair:

gpg --full-generate-key

This will step you through a series of prompts. First, select a key type. You can just press <RETURN> for the default:

gpg (GnuPG) 2.2.25; Copyright (C) 2020 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Please select what kind of key you want:
(1) RSA and RSA (default)
(2) DSA and Elgamal
(3) DSA (sign only)
(4) RSA (sign only)
(14) Existing key from card
Your selection?

Next, select a key size. The default is fine:

RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (3072)
Requested keysize is 3072 bits

You will next need to select an expiration date for your key. The default is “key does not expire”, which is a fine choice for our purposes. If you’re interested in understanding this value in more detail, the following articles are worth reading:

Setting an expiration date will require that you periodically update the expiration date (or generate a new key).

Please specify how long the key should be valid.
0 = key does not expire
<n> = key expires in n days
<n>w = key expires in n weeks
<n>m = key expires in n months
<n>y = key expires in n years
Key is valid for? (0)
Key does not expire at all
Is this correct? (y/N) y

Now you will need to enter your identity, which consists of your name, your email address, and a comment (which is generally left blank). Note that you’ll need to enter o for okay to continue from this prompt.

GnuPG needs to construct a user ID to identify your key.
Real name: Your Name
Email address: you@example.com
Comment:
You selected this USER-ID:
"Your Name <you@example.com>"
Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? o

Lastly, you need to enter a passphrase. In most environments, GPG will open a new window asking you for it. After you’ve entered and confirmed the passphrase, you should see your key information on the console:

gpg: key 02E34E3304C8ADEB marked as ultimately trusted
gpg: revocation certificate stored as '/home/lars/tmp/gpgtmp/openpgp-revocs.d/9A4EB5B1F34B3041572937C002E34E3304C8ADEB.rev'
public and secret key created and signed.
pub rsa3072 2021-03-11 [SC]
9A4EB5B1F34B3041572937C002E34E3304C8ADEB
uid Your Name <you@example.com>
sub rsa3072 2021-03-11 [E]

Publish your key

You need to publish your GPG key so that others can find it. You’ll need your key id, which you can get by running gpg -k --fingerprint like this (using your email address rather than mine):

$ gpg -k --fingerprint lars@oddbit.com

The output will look like the following:

pub rsa2048/0x362D63A80853D4CF 2013-06-21 [SC]
Key fingerprint = 3E70 A502 BB52 55B6 BB8E 86BE 362D 63A8 0853 D4CF
uid [ultimate] Lars Kellogg-Stedman <lars@oddbit.com>
uid [ultimate] keybase.io/larsks <larsks@keybase.io>
sub rsa2048/0x042DF6CF74E4B84C 2013-06-21 [S] [expires: 2023-07-01]
sub rsa2048/0x426D9382DFD6A7A9 2013-06-21 [E]
sub rsa2048/0xEE1A8B9F9369CC85 2013-06-21 [A]

Look for the Key fingerprint line; you want the value after the =. Use this to publish your key to keys.openpgp.org:

gpg --keyserver keys.openpgp.org \
  --send-keys '3E70 A502 BB52 55B6 BB8E 86BE 362D 63A8 0853 D4CF'

You will shortly receive an email to the address in your key asking you to approve it. Once you have approved the key, it will be published on https://keys.openpgp.org and people will be able to look it up by address or key id. For example, you can find my public key at https://keys.openpgp.org/vks/v1/by-fingerprint/3E70A502BB5255B6BB8E86BE362D63A80853D4CF.
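
If you want to share a direct link to your key, the lookup URL can be derived from the fingerprint printed by gpg. The following Python sketch strips the spaces from the fingerprint and builds a URL of the same form as the one shown above; the helper name fingerprint_url is made up for this example.

```python
def fingerprint_url(fingerprint: str) -> str:
    """Return the keys.openpgp.org lookup URL for a GPG key fingerprint."""
    # gpg prints the fingerprint in space-separated groups; remove the spaces.
    compact = "".join(fingerprint.split()).upper()
    if len(compact) != 40 or any(c not in "0123456789ABCDEF" for c in compact):
        raise ValueError(f"not a 40-digit hex fingerprint: {fingerprint!r}")
    return f"https://keys.openpgp.org/vks/v1/by-fingerprint/{compact}"


print(fingerprint_url("3E70 A502 BB52 55B6 BB8E 86BE 362D 63A8 0853 D4CF"))
```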

Installing the Tools

In this section, we’ll get all the necessary tools installed on your system in order to interact with a repository using Kustomize and KSOPS.

Install Kustomize

Pre-compiled binaries of Kustomize are published on GitHub. To install the command, navigate to the current release (v4.0.5 as of this writing) and download the appropriate tarball for your system. For example, for an x86-64 Linux environment, you would grab kustomize_v4.0.5_linux_amd64.tar.gz.

The tarball contains a single file. You need to extract this file and place it somewhere in your $PATH. For example, if you use your $HOME/bin directory, you could run:

tar -C ~/bin -xf kustomize_v4.0.5_linux_amd64.tar.gz

Or to install into /usr/local/bin:

sudo tar -C /usr/local/bin -xf kustomize_v4.0.5_linux_amd64.tar.gz

Run kustomize with no arguments to verify the command has been installed correctly.

Install sops

The KSOPS plugin relies on the sops command, so we need to install that first. Binary releases are published on GitHub, and the current release is v3.6.1.

Instead of a tarball, the project publishes the raw binary as well as packages for a couple of different Linux distributions. For consistency with the rest of this post we’re going to grab the raw binary. We can install that into $HOME/bin like this:

curl -L -o ~/bin/sops https://github.com/mozilla/sops/releases/download/v3.6.1/sops-v3.6.1.linux
chmod 755 ~/bin/sops

Install KSOPS

KSOPS is a Kustomize plugin. The kustomize command looks for plugins in subdirectories of $HOME/.config/kustomize/plugin. Directories are named after an API and plugin name. In the case of KSOPS, kustomize will be looking for a plugin named ksops in the $HOME/.config/kustomize/plugin/viaduct.ai/v1/ksops/ directory.

The current release of KSOPS is v2.4.0, which is published as a tarball. We’ll start by downloading ksops_2.4.0_Linux_x86_64.tar.gz, which contains the following files:

LICENSE
README.md
ksops

To extract the ksops command into the plugin directory, you can run:

mkdir -p ~/.config/kustomize/plugin/viaduct.ai/v1/ksops/
tar -C ~/.config/kustomize/plugin/viaduct.ai/v1/ksops -xf ksops_2.4.0_Linux_x86_64.tar.gz ksops
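
To double-check that the plugin landed where kustomize will look for it, the directory layout described above can be expressed as a small Python sketch. The function name plugin_path is hypothetical, and lower-casing the kind for the directory name is my reading of the kustomize plugin convention; for ksops it makes no difference.

```python
from pathlib import Path


def plugin_path(config_home, api_version, kind):
    """Build the path where kustomize looks for an exec plugin.

    Layout: <config_home>/kustomize/plugin/<apiVersion>/<lowercased kind>/<kind>
    """
    return Path(config_home) / "kustomize" / "plugin" / api_version / kind.lower() / kind


path = plugin_path(Path.home() / ".config", "viaduct.ai/v1", "ksops")
print(path, "(found)" if path.is_file() else "(missing)")
```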

Test it out

Let’s create a simple Kustomize project to make sure everything is installed and functioning.

Start by creating a new directory and changing into it:

mkdir kustomize-test
cd kustomize-test

Create a kustomization.yaml file that looks like this:

generators:
- secret-generator.yaml

Put the following content in secret-generator.yaml:

---
apiVersion: viaduct.ai/v1
kind: ksops
metadata:
  name: secret-generator
files:
  - example-secret.enc.yaml

This instructs Kustomize to use the KSOPS plugin to generate content from the file example-secret.enc.yaml.

Configure sops to use your GPG key by default by creating a .sops.yaml (note the leading dot) similar to the following (you’ll need to put your GPG key fingerprint in the right place):

creation_rules:
  - encrypted_regex: "^(users|data|stringData)$"
    pgp: <YOUR KEY FINGERPRINT HERE>

The encrypted_regex line tells sops which attributes in your YAML files should be encrypted. The pgp line is a (comma delimited) list of keys to which data will be encrypted.
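
To see what the encrypted_regex expression selects, here is a short Python sketch that applies the same pattern to the top-level keys of a Secret manifest. It only illustrates the matching; it is not how sops is implemented, and the function name keys_to_encrypt is an assumption for this example.

```python
import re

# The same expression used in .sops.yaml above.
ENCRYPTED_REGEX = re.compile(r"^(users|data|stringData)$")


def keys_to_encrypt(document: dict) -> list:
    """Return the keys of `document` whose values would be encrypted."""
    return [key for key in document if ENCRYPTED_REGEX.search(key)]


secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "example-secret"},
    "type": "Opaque",
    "stringData": {"message": "this is a test"},
}
print(keys_to_encrypt(secret))
```

Because the pattern is anchored with ^ and $, a key such as metadata is left readable even though it contains the substring "data".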

Now, edit the file example-secret.enc.yaml using the sops command. Run:

sops example-secret.enc.yaml

This will open up an editor with some default content. Replace the content with the following:

apiVersion: v1
kind: Secret
metadata:
  name: example-secret
type: Opaque
stringData:
  message: this is a test

Save the file and exit your editor. Now examine the file; you will see that it contains a mix of encrypted and unencrypted content. When encrypted to my public key, it looks like this:

$ cat example-secret.enc.yaml
{
"data": "ENC[AES256_GCM,data:wZvEylsvhfU29nfFW1PbGqyk82x8+Vm/3p2Y89B8a1A26wa5iUTr1hEjDYrQIGQq4rvDyK4Bevxb/PrTzdOoTrYIhaerEWk13g9UrteLoaW0FpfGv9bqk0c12OwTrzS+5qCW2mIlfzQpMH5+7xxeruUXO7w=,iv:H4i1/Znp6WXrMmmP9YVkz+xKOX0XBH7kPFaa36DtTxs=,tag:bZhSzkM74wqayo7McV/VNQ==,type:str]",
"sops": {
"kms": null,
"gcp_kms": null,
"azure_kv": null,
"hc_vault": null,
"lastmodified": "2021-03-12T03:11:46Z",
"mac": "ENC[AES256_GCM,data:2NrsF6iLA3zHeupD314Clg/WyBA8mwCn5SHHI5P9tsOt6472Tevdamv6ARD+xqfrSVWz+Wy4PtWPoeqZrFJwnL/qCR4sdjt/CRzLmcBistUeAnlqoWIwbtMxBqaFg9GxTd7f5q0iHr9QNWGSVV3JMeZZ1jeWyeQohAPpPufsuPQ=,iv:FJvZz8SV+xsy4MC1W9z1Vn0s4Dzw9Gya4v+rSpwZLrw=,tag:pfW8r5856c7qetCNgXMyeA==,type:str]",
"pgp": [
{
"created_at": "2021-03-12T03:11:45Z",
"enc": "-----BEGIN PGP MESSAGE-----\n\nwcBMA0Jtk4Lf1qepAQgAGKwk6zDMPUYbUscky07v/7r3fsws3pTVRMgpEdhTra6x\nDxiMaLnjTKJi9fsB7sQuh/PTGWhXGuHtHg0YBtxRkuZY0Kl6xKXTXGBIBhI/Ahgw\n4BSz/rE7gbz1h6X4EFml3e1NeUTvGntA3HjY0o42YN9uwsi9wvMbiR4OLQfwY1gG\np9/v57KJx5ipEKSgt+81KwzOhuW79ttXd2Tvi9rjuAfvmLBU9q/YKMT8miuNhjet\nktNwXNJNpglHJta431YUhPZ6q41LpgvQPMX4bIZm7i7NuR470njYLQPe7xiGqqeT\nBcuF7KkNXGcDu9/RnIyxK4W5Bo9NEa06TqUGTHLEENLgAeSzHdQdUwx/pLLD6OPa\nv/U34YJU4JngqOGqTuDu4orgwLDg++XysBwVsmFp1t/nHvTkwj57wAuxJ4/It/9l\narvRHlCx6uA05IXukmCTvYMPRV3kY/81B+biHcka7uFUOQA=\n=x+7S\n-----END PGP MESSAGE-----",
"fp": "3E70A502BB5255B6BB8E86BE362D63A80853D4CF"
}
],
"encrypted_regex": "^(users|data|stringData)$",
"version": "3.6.1"
}
}

Finally, attempt to render the project with Kustomize by running:

kustomize build --enable-alpha-plugins

This should produce on stdout the unencrypted content of your secret:

apiVersion: v1
kind: Secret
metadata:
  name: example-secret
type: Opaque
stringData:
  message: this is a test

March 09, 2021 12:00 AM

February 27, 2021

Lars Kellogg-Stedman

Tools for writing about Git

I sometimes find myself writing articles or documentation about git, so I put together a couple of terrible hacks for generating reproducible histories and pretty graphs of those histories.

git synth

The git synth command reads a YAML description of a repository and executes the necessary commands to reproduce that history. It allows you to set the name and email address of the author and committer as well as a static date, so every time you generate the repository you get identical commit ids.

git dot

The git dot command generates a representation of a repository history in the dot language, and uses Graphviz to render those into diagrams.

Putting it together

For example, the following history specification:

- set:
    date: "2021-01-01"
    name: Fake Person
    email: fake@example.com
- branch:
    name: master
    actions:
      - commit:
          message: A
      - commit:
          message: B
      - commit:
          message: C
      - branch:
          name: topic1
          actions:
            - commit:
                message: D
            - commit:
                message: E
            - branch:
                name: topic2
                actions:
                  - commit:
                      message: F
                  - commit:
                      message: G
      - commit:
          message: H

When applied with git synth:

$ git synth -r examplerepo examplerepo.yml

Will generate the following repository:

$ git -C examplerepo log --graph --all --decorate --oneline
* 28f7b38 (HEAD -> master) H
| * 93e1d18 (topic2) G
| * 3ef811d F
| * 973437c (topic1) E
| * 2c0bd1c D
|/
* cabdedf C
* a5cbd99 B
* d98f949 A

We can run this git dot command line:

$ git -C examplerepo dot -m -g branch --rankdir=RL

To produce the following dot description of the history:

digraph git {
  graph [rankdir=RL]
  node [shape=circle]
  {
    node [group=master_commits]
    "28f7b382a5" [label=H tooltip="28f7b382a52ac53f86314e5d608ebafd66de6c44"]
    cabdedff95 [label=C tooltip=cabdedff957f7dec15f365e7c29eaead9930d618]
    a5cbd99954 [label=B tooltip=a5cbd999545aeabc2e102a845aeb0466f01454a2]
    d98f949840 [label=A tooltip=d98f94984057d760066ba0b300ab4930497bcba6]
  }
  {
    node [group=topic1_commits]
    "973437cb00" [label=E tooltip="973437cb007d2a69d6564fd7b30f3e8c347073c2"]
    "2c0bd1c1df" [label=D tooltip="2c0bd1c1dfe9f76cd18b37bb0bc995e449e0094b"]
  }
  {
    node [group=topic2_commits]
    "93e1d18862" [label=G tooltip="93e1d18862102e044a4ec46bb189f5bca9ba0e05"]
    "3ef811d426" [label=F tooltip="3ef811d426c09be792a0ff6564eca82a7bd105a9"]
  }
  {
    node [color=black fontcolor=white group=heads shape=box style=filled]
    master
    topic1
    topic2
  }
  {
    edge [style=dashed]
    topic2 -> "93e1d18862"
    topic1 -> "973437cb00"
    master -> "28f7b382a5"
  }
  a5cbd99954 -> d98f949840
  "3ef811d426" -> "973437cb00"
  "973437cb00" -> "2c0bd1c1df"
  cabdedff95 -> a5cbd99954
  "28f7b382a5" -> cabdedff95
  "2c0bd1c1df" -> cabdedff95
  "93e1d18862" -> "3ef811d426"
}

Running that through the dot utility (dot -Tsvg -o repo.svg repo.dot) results in the following diagram:

Where are these wonders?

Both tools live in my git-snippets repository, which is a motley collection of shell scripts, Python programs, and other utilities for interacting with git.

It’s all undocumented and uninstallable, but if there’s interest in either of these tools I can probably find the time to polish them up a bit.

February 27, 2021 12:00 AM

February 24, 2021

Lars Kellogg-Stedman

File reorganization

This is just a note that I’ve substantially changed how the post sources are organized. I’ve tried to ensure that I preserve all the existing links, but if you spot something missing please feel free to leave a comment on this post.

February 24, 2021 12:00 AM

February 18, 2021

Lars Kellogg-Stedman

Editing a commit message without git rebase

While working on a pull request I will make liberal use of git rebase to clean up a series of commits: squashing typos, re-ordering changes for logical clarity, and so forth. But there are some times when all I want to do is change a commit message somewhere down the stack, and I was wondering if I had any options for doing that without reaching for git rebase.

It turns out the answer is “yes”, as long as you have a linear history.

Let’s assume we have a git history that looks like this:

┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│ 4be811 │ ◀── │ 519636 │ ◀── │ 38f6fe │ ◀── │ 2951ec │ ◀╴╴ │ master │
└────────┘ └────────┘ └────────┘ └────────┘ └────────┘

The corresponding git log looks like:

commit 2951ec3f54205580979d63614ef2751b61102c5d
Author: Alice User <alice@example.com>
Date:   Mon Jan 1 00:00:00 2001 -0500

    Add detailed, high quality documentation

commit 38f6fe61ffd444f601ac01ecafcd524487c83394
Author: Alice User <alice@example.com>
Date:   Mon Jan 1 00:00:00 2001 -0500

    Fixed bug that would erroneously call rm -rf

commit 51963667037ceb79aff8c772a009a5fbe4b8d7d9
Author: Alice User <alice@example.com>
Date:   Mon Jan 1 00:00:00 2001 -0500

    A very interesting change

commit 4be8115640821df1565c421d8ed848bad34666e5
Author: Alice User <alice@example.com>
Date:   Mon Jan 1 00:00:00 2001 -0500

    The beginning of time

Mucking about with objects

We would like to modify the message on commit 519636.

We start by extracting the commit object for that commit using git cat-file:

$ git cat-file -p 519636
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
parent 4be8115640821df1565c421d8ed848bad34666e5
author Alice User <alice@example.com> 978325200 -0500
committer Alice User <alice@example.com> 978325200 -0500

A very interesting change

We want to produce a commit object that is identical except for an updated commit message. That sounds like a job for sed! We can strip the existing message out like this:

git cat-file -p 519636 | sed '/^$/q'

And we can append a new commit message with the power of cat:

git cat-file -p 519636 | sed '/^$/q'; cat <<EOF
A very interesting change
Completely refactor the widget implementation to prevent
a tear in the time/space continuum when given invalid
input.
EOF

This will give us:

tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
parent 4be8115640821df1565c421d8ed848bad34666e5
author Alice User <alice@example.com> 978325200 -0500
committer Alice User <alice@example.com> 978325200 -0500

A very interesting change
Completely refactor the widget implementation to prevent
a tear in the time/space continuum when given invalid
input.

We need to take this modified commit and store it back into the git object database. We do that using the git hash-object command:

(git cat-file -p 519636 | sed '/^$/q'; cat <<EOF) | git hash-object -t commit --stdin -w
A very interesting change
Completely refactor the widget implementation to prevent
a tear in the time/space continuum when given invalid
input.
EOF

The -t commit argument instructs hash-object to create a new commit object. The --stdin argument instructs hash-object to read input from stdin, while the -w argument instructs hash-object to write a new object to the object database, rather than just calculating the hash and printing it for us.
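
The hash that hash-object prints is not magic: git prefixes the object content with a header made of the object type, the content length in bytes, and a NUL byte, then hashes the result. The Python sketch below reproduces that calculation, assuming a repository that uses SHA-1 object ids (the default); the function name git_object_id is made up for this example. Unlike hash-object -w, it only computes the id and stores nothing.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Compute the id git assigns to an object of the given type."""
    # Header: "<type> <size in bytes>" followed by a NUL byte.
    header = f"{obj_type} {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


# The empty tree, which is the tree referenced by the commits in this post.
print(git_object_id("tree", b""))
```

This is why changing even one character of a commit message yields a different commit id, and why every descendant has to be rewritten too.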

This will print the hash of the new object on stdout. We can wrap everything in a $(...) expression to capture the output:

newref=$(
(git cat-file -p 519636 | sed '/^$/q'; cat <<EOF) | git hash-object -t commit --stdin -w
A very interesting change
Completely refactor the widget implementation to prevent
a tear in the time/space continuum when given invalid
input.
EOF
)

At this point we have successfully created a new commit, but it isn’t reachable from anywhere. If we were to run git log at this point, everything would look the same as when we started. We need to walk back up the tree, starting with the immediate descendant of our target commit, replacing parent pointers as we go along.

The first thing we need is a list of revisions from our target commit up to the current HEAD. We can get that with git rev-list:

$ git rev-list 519636..HEAD
2951ec3f54205580979d63614ef2751b61102c5d
38f6fe61ffd444f601ac01ecafcd524487c83394

We’ll process these in reverse order, so first we modify 38f6fe:

oldref=51963667037ceb79aff8c772a009a5fbe4b8d7d9
newref=$(git cat-file -p 38f6fe61ffd444f601ac01ecafcd524487c83394 |
sed "s/parent $oldref/parent $newref/" |
git hash-object -t commit --stdin -w)

And then repeat that for the next commit up the tree:

oldref=38f6fe61ffd444f601ac01ecafcd524487c83394
newref=$(git cat-file -p 2951ec3f54205580979d63614ef2751b61102c5d |
sed "s/parent $oldref/parent $newref/" |
git hash-object -t commit --stdin -w)
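
The re-parenting steps above can also be modelled without touching a real repository. The following Python sketch keeps commit objects in a dictionary, rewrites one message, and then rewrites each descendant so that its parent line points at the new object. It is a toy model: the commits carry no author or committer lines, and the names store, commit, reword, and log are assumptions made for this illustration.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def store(objects, body):
    """Save a commit body in the in-memory store and return its id."""
    data = body.encode()
    oid = hashlib.sha1(b"commit %d\0" % len(data) + data).hexdigest()
    objects[oid] = body
    return oid


def commit(objects, parent, message):
    """Create a simplified commit (no author or committer lines)."""
    lines = [f"tree {EMPTY_TREE}"]
    if parent:
        lines.append(f"parent {parent}")
    return store(objects, "\n".join(lines) + f"\n\n{message}\n")


def reword(objects, chain, target, new_message):
    """Replace the message of `target`; `chain` lists commit ids oldest first."""
    # Keep everything up to the first blank line, like sed '/^$/q'.
    header = objects[target].split("\n\n", 1)[0]
    newref = store(objects, f"{header}\n\n{new_message}\n")
    oldref = target
    # Walk the descendants oldest first, swapping the parent pointer.
    for rev in chain[chain.index(target) + 1:]:
        body = objects[rev].replace(f"parent {oldref}", f"parent {newref}", 1)
        newref = store(objects, body)
        oldref = rev
    return newref


def log(objects, tip):
    """Follow parent pointers from `tip`; return messages, newest first."""
    messages = []
    while tip:
        header, message = objects[tip].split("\n\n", 1)
        messages.append(message.strip())
        parents = [line.split()[1] for line in header.splitlines()
                   if line.startswith("parent ")]
        tip = parents[0] if parents else None
    return messages


objects, chain, parent = {}, [], None
for message in ["The beginning of time", "A very interesting change",
                "Fixed bug", "Add documentation"]:
    parent = commit(objects, parent, message)
    chain.append(parent)

new_tip = reword(objects, chain, chain[1], "A much clearer message")
print(log(objects, new_tip))
```

The original objects are still in the store afterwards, just as the old commits remain in git's object database until they are garbage collected.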

We’ve now replaced all the descendants of the modified commit…but git log would still show us the old history. The last thing we need to do is update the branch pointer to point at the top of the modified tree. We do that using the git update-ref command. Assuming we’re on the master branch, the command would look like this:

git update-ref refs/heads/master $newref

And at this point, running git log shows us our modified commit in all its glory:

commit 365bc25ee1fe365d5d63d2248b77196d95d9573a
Author: Alice User <alice@example.com>
Date:   Mon Jan 1 00:00:00 2001 -0500

    Add detailed, high quality documentation

commit 09d6203a2b64c201dde12af7ef5a349e1ae790d7
Author: Alice User <alice@example.com>
Date:   Mon Jan 1 00:00:00 2001 -0500

    Fixed bug that would erroneously call rm -rf

commit fb01f35c38691eafbf44e9ee86824b594d036ba4
Author: Alice User <alice@example.com>
Date:   Mon Jan 1 00:00:00 2001 -0500

    A very interesting change
    Completely refactor the widget implementation to prevent
    a tear in the time/space continuum when given invalid
    input.

commit 4be8115640821df1565c421d8ed848bad34666e5
Author: Alice User <alice@example.com>
Date:   Mon Jan 1 00:00:00 2001 -0500

    The beginning of time

Giving us a modified history that looks like:

┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│ 4be811 │ ◀── │ fb01f3 │ ◀── │ 09d620 │ ◀── │ 365bc2 │ ◀╴╴ │ master │
└────────┘ └────────┘ └────────┘ └────────┘ └────────┘

Automating the process

Now, that was a lot of manual work. Let’s try to automate the process.

#!/bin/sh

# get the current branch name
branch=$(git rev-parse --symbolic-full-name HEAD)

# get the full commit id of our target commit (this allows us to
# specify the target as a short commit id, or as something like
# `HEAD~3` or `:/interesting`).
oldref=$(git rev-parse "$1")

# generate a replacement commit object, reading the new commit message
# from stdin.
newref=$(
    (git cat-file -p "$oldref" | sed '/^$/q'; cat) |
        git hash-object -t commit --stdin -w
)

# iterate over commits between our target commit and HEAD in
# reverse order, replacing parent pointers with updated commit objects
for rev in $(git rev-list --reverse "${oldref}..HEAD"); do
    newref=$(git cat-file -p "$rev" |
        sed "s/parent $oldref/parent $newref/" |
        git hash-object -t commit --stdin -w)
    oldref=$rev
done

# update the branch pointer to the head of the modified tree
git update-ref "$branch" "$newref"

If we place the above script in editmsg.sh and restore our original revision history, we can run:

sh editmsg.sh :/interesting <<EOF
A very interesting change
Completely refactor the widget implementation to prevent
a tear in the time/space continuum when given invalid
input.
EOF

And end up with a new history identical to the one we created manually:

commit 365bc25ee1fe365d5d63d2248b77196d95d9573a
Author: Alice User <alice@example.com>
Date:   Mon Jan 1 00:00:00 2001 -0500

    Add detailed, high quality documentation

commit 09d6203a2b64c201dde12af7ef5a349e1ae790d7
Author: Alice User <alice@example.com>
Date:   Mon Jan 1 00:00:00 2001 -0500

    Fixed bug that would erroneously call rm -rf

commit fb01f35c38691eafbf44e9ee86824b594d036ba4
Author: Alice User <alice@example.com>
Date:   Mon Jan 1 00:00:00 2001 -0500

    A very interesting change
    Completely refactor the widget implementation to prevent
    a tear in the time/space continuum when given invalid
    input.

commit 4be8115640821df1565c421d8ed848bad34666e5
Author: Alice User <alice@example.com>
Date:   Mon Jan 1 00:00:00 2001 -0500

    The beginning of time

Caveats

The above script is intentionally simple. If you’re interested in doing something like this in practice, you should be aware of the following:

  • The above process works great with a linear history, but will break things if the rewriting process crosses a merge commit.
  • We’re assuming that the given target commit is actually reachable from the current branch.
  • We’re assuming that the given target actually exists.

It’s possible to check for all of these conditions in our script, but I’m leaving that as an exercise for the reader.
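
As a starting point for the first caveat, a merge commit can be recognised by the number of parent lines in its header. The Python sketch below parses the text printed by git cat-file -p and reports whether a commit is a merge. It is written in Python only for the sake of illustration, and the names parent_ids and is_merge are hypothetical; a real check would run this test on every commit between the target and HEAD.

```python
def parent_ids(commit_text: str) -> list:
    """Return the parent ids from the header of `git cat-file -p` output."""
    # The header ends at the first blank line; the message follows it.
    header = commit_text.split("\n\n", 1)[0]
    return [line.split()[1] for line in header.splitlines()
            if line.startswith("parent ")]


def is_merge(commit_text: str) -> bool:
    """A merge commit is one that records more than one parent."""
    return len(parent_ids(commit_text)) > 1


example = """tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
parent 4be8115640821df1565c421d8ed848bad34666e5
author Alice User <alice@example.com> 978325200 -0500
committer Alice User <alice@example.com> 978325200 -0500

A very interesting change
"""
print(parent_ids(example), is_merge(example))
```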

February 18, 2021 12:00 AM

February 10, 2021

Lars Kellogg-Stedman

Object storage with OpenShift Container Storage

OpenShift Container Storage (OCS) from Red Hat deploys Ceph in your OpenShift cluster (or allows you to integrate with an external Ceph cluster). In addition to the file- and block-based volume services provided by Ceph, OCS includes two S3-compatible object storage implementations.

The first option is the Ceph Object Gateway (radosgw), Ceph’s native object storage interface. The second option is called the “Multicloud Object Gateway”, which is in fact a piece of software named Noobaa, a storage abstraction layer that was acquired by Red Hat in 2018. In this article I’d like to demonstrate how to take advantage of these storage options.

What is object storage?

The storage we interact with regularly on our local computers is block storage: data is stored as a collection of blocks on some sort of storage device. Additional layers – such as a filesystem driver – are responsible for assembling those blocks into something useful.

Object storage, on the other hand, manages data as objects: a single unit of data and associated metadata (such as access policies). An object is identified by some sort of unique id. Object storage generally provides an API that is largely independent of the physical storage layer; data may live on a variety of devices attached to a variety of systems, and you don’t need to know any of those details in order to access the data.

The best-known example of an object storage service is Amazon’s S3 (“Simple Storage Service”), first introduced in 2006. The S3 API has become a de facto standard for object storage implementations. The two services we’ll be discussing in this article provide S3-compatible APIs.

Creating buckets

The fundamental unit of object storage is called a “bucket”.

Creating a bucket with OCS works a bit like creating a persistent volume, although instead of starting with a PersistentVolumeClaim you instead start with an ObjectBucketClaim ("OBC"). An OBC looks something like this when using RGW:

apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
 name: example-rgw
spec:
 generateBucketName: example-rgw
 storageClassName: ocs-storagecluster-ceph-rgw

Or like this when using Noobaa (note the different value for storageClassName):

apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
 name: example-noobaa
spec:
 generateBucketName: example-noobaa
 storageClassName: openshift-storage.noobaa.io

With OCS 4.5, your out-of-the-box choices for storageClassName will be ocs-storagecluster-ceph-rgw, if you choose to use Ceph Radosgw, or openshift-storage.noobaa.io, if you choose to use the Noobaa S3 endpoint.

Before we continue, I’m going to go ahead and create these resources in my OpenShift environment. To do so, I’m going to use Kustomize to deploy the resources described in the following kustomization.yml file:

namespace: oddbit-ocs-example

resources:
 - obc-noobaa.yml
 - obc-rgw.yml

Running kustomize build | oc apply -f- from the directory containing this file populates the specified namespace with the two ObjectBucketClaims mentioned above:

$ kustomize build | oc apply -f-
objectbucketclaim.objectbucket.io/example-noobaa created
objectbucketclaim.objectbucket.io/example-rgw created

Verifying that things seem healthy:

$ oc get objectbucketclaim
NAME             STORAGE-CLASS                 PHASE   AGE
example-noobaa   openshift-storage.noobaa.io   Bound   2m59s
example-rgw      ocs-storagecluster-ceph-rgw   Bound   2m59s

Each ObjectBucketClaim will result in OpenShift creating a new ObjectBucket resource (which, like PersistentVolume resources, are not namespaced). The ObjectBucket resource will be named obc-<namespace-name>-<objectbucketclaim-name>.

$ oc get objectbucket obc-oddbit-ocs-example-example-rgw obc-oddbit-ocs-example-example-noobaa
NAME                                    STORAGE-CLASS                 CLAIM-NAMESPACE      CLAIM-NAME       RECLAIM-POLICY   PHASE   AGE
obc-oddbit-ocs-example-example-rgw      ocs-storagecluster-ceph-rgw   oddbit-ocs-example   example-rgw      Delete           Bound   67m
obc-oddbit-ocs-example-example-noobaa   openshift-storage.noobaa.io   oddbit-ocs-example   example-noobaa   Delete           Bound   67m

Each ObjectBucket resource corresponds to a bucket in the selected object storage backend.

Because buckets exist in a flat namespace, the OCS documentation recommends always using generateBucketName in the claim, rather than explicitly setting bucketName, in order to avoid unexpected conflicts. This means that the generated buckets will have a name prefixed by the value in generateBucketName, followed by a random string:

$ oc get objectbucketclaim example-rgw -o jsonpath='{.spec.bucketName}'
example-rgw-425d7193-ae3a-41d9-98e3-9d07b82c9661

$ oc get objectbucketclaim example-noobaa -o jsonpath='{.spec.bucketName}'
example-noobaa-2e087028-b3a4-475b-ae83-a4fa80d9e3ef
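
The generated names above look like the prefix followed by a UUID. As a rough illustration of why such names avoid collisions in a flat namespace, here is a Python sketch that builds names of the same shape. This is an assumption based on the output shown here, not the provisioner's actual code, and the function name generated_bucket_name is made up.

```python
import uuid


def generated_bucket_name(prefix: str) -> str:
    """Return `prefix` followed by a random UUID, e.g. example-rgw-<uuid>."""
    return f"{prefix}-{uuid.uuid4()}"


# Two claims with the same prefix still end up with distinct bucket names.
print(generated_bucket_name("example-rgw"))
print(generated_bucket_name("example-rgw"))
```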

Along with the bucket itself, OpenShift will create a Secret and a ConfigMap resource – named after your OBC – with the metadata necessary to access the bucket.

The Secret contains AWS-style credentials for authenticating to the S3 API:

$ oc get secret example-rgw -o yaml | oc neat
apiVersion: v1
data:
 AWS_ACCESS_KEY_ID: ...
 AWS_SECRET_ACCESS_KEY: ...
kind: Secret
metadata:
 labels:
  bucket-provisioner: openshift-storage.ceph.rook.io-bucket
 name: example-rgw
 namespace: oddbit-ocs-example
type: Opaque

(I’m using the neat filter here to remove extraneous metadata that OpenShift returns when you request a resource.)
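
Keep in mind that the values under data in a Secret are base64-encoded, so they have to be decoded before they can be used as credentials. A minimal Python sketch of that decoding step follows; the credential values are placeholders invented for the example, and the function name decode_secret_data is hypothetical.

```python
import base64


def decode_secret_data(data: dict) -> dict:
    """Decode the base64 values found under `data` in a Secret."""
    return {key: base64.b64decode(value).decode() for key, value in data.items()}


# Placeholder values standing in for what `oc get secret -o json` returns.
encoded = {
    "AWS_ACCESS_KEY_ID": base64.b64encode(b"EXAMPLE-ACCESS-KEY").decode(),
    "AWS_SECRET_ACCESS_KEY": base64.b64encode(b"EXAMPLE-SECRET-KEY").decode(),
}
print(decode_secret_data(encoded))
```

When the Secret is mapped into a pod as environment variables, this decoding is done for you.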

The ConfigMap contains a number of keys that provide you (or your code) with the information necessary to access the bucket. For the RGW bucket:

$ oc get configmap example-rgw -o yaml | oc neat
apiVersion: v1
data:
 BUCKET_HOST: rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc.cluster.local
 BUCKET_NAME: example-rgw-425d7193-ae3a-41d9-98e3-9d07b82c9661
 BUCKET_PORT: "80"
 BUCKET_REGION: us-east-1
kind: ConfigMap
metadata:
 labels:
  bucket-provisioner: openshift-storage.ceph.rook.io-bucket
 name: example-rgw
 namespace: oddbit-ocs-example

And for the Noobaa bucket:

$ oc get configmap example-noobaa -o yaml | oc neat
apiVersion: v1
data:
 BUCKET_HOST: s3.openshift-storage.svc
 BUCKET_NAME: example-noobaa-2e087028-b3a4-475b-ae83-a4fa80d9e3ef
 BUCKET_PORT: "443"
kind: ConfigMap
metadata:
 labels:
  app: noobaa
  bucket-provisioner: openshift-storage.noobaa.io-obc
  noobaa-domain: openshift-storage.noobaa.io
 name: example-noobaa
 namespace: oddbit-ocs-example

Note that BUCKET_HOST contains the internal S3 API endpoint. You won’t be able to reach this from outside the cluster. We’ll tackle that in just a bit.

Accessing a bucket from a pod

The easiest way to expose the credentials in a pod is to map the keys from both the ConfigMap and Secret as environment variables using the envFrom directive, like this:

apiVersion: v1
kind: Pod
metadata:
 name: bucket-example
spec:
 containers:
 - image: myimage
   env:
   - name: AWS_CA_BUNDLE
     value: /run/secrets/kubernetes.io/serviceaccount/service-ca.crt
   envFrom:
   - configMapRef:
      name: example-rgw
   - secretRef:
      name: example-rgw
   [...]

Note that we’re also setting AWS_CA_BUNDLE here, which you’ll need if the internal endpoint referenced by $BUCKET_HOST is using SSL.

Inside the pod, we can run, for example, aws commands as long as we provide an appropriate s3 endpoint. We can inspect the value of BUCKET_PORT to determine if we need http or https:

$ [ "$BUCKET_PORT" = 80 ] && schema=http || schema=https
$ aws s3 --endpoint $schema://$BUCKET_HOST ls
2021-02-10 04:30:31 example-rgw-8710aa46-a47a-4a8b-8edd-7dabb7d55469

Python’s boto3 module can also make use of the same environment variables:

>>> import boto3
>>> import os
>>> bucket_host = os.environ['BUCKET_HOST']
>>> schema = 'http' if os.environ['BUCKET_PORT'] == '80' else 'https'
>>> s3 = boto3.client('s3', endpoint_url=f'{schema}://{bucket_host}')
>>> s3.list_buckets()['Buckets']
[{'Name': 'example-noobaa-...', 'CreationDate': datetime.datetime(...)}]

External connections to S3 endpoints

External access to services in OpenShift is often managed via routes. If you look at the routes available in your openshift-storage namespace, you’ll find the following:

$ oc -n openshift-storage get route
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
noobaa-mgmt noobaa-mgmt-openshift-storage.apps.example.com noobaa-mgmt mgmt-https reencrypt None
s3 s3-openshift-storage.apps.example.com s3 s3-https reencrypt None

The s3 route provides external access to your Noobaa S3 endpoint. You’ll note that in the list above there is no route registered for Radosgw [1]. There is a service registered for Radosgw named rook-ceph-rgw-ocs-storagecluster-cephobjectstore, so we can expose that service to create an external route by running something like:

oc create route edge rgw --service rook-ceph-rgw-ocs-storagecluster-cephobjectstore

This will create a route with “edge” encryption (TLS termination is handled by the default ingress router):

$ oc -n openshift-storage get route
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
noobaa-mgmt noobaa-mgmt-openshift-storage.apps.example.com noobaa-mgmt mgmt-https reencrypt None
rgw rgw-openshift-storage.apps.example.com rook-ceph-rgw-ocs-storagecluster-cephobjectstore http edge None
s3 s3-openshift-storage.apps.example.com s3 s3-https reencrypt None

Accessing a bucket from outside the cluster

Once we know the Route to our S3 endpoint, we can use the information in the Secret and ConfigMap created for us when we provisioned the storage. We just need to replace the BUCKET_HOST with the hostname in the route, and we need to use SSL over port 443 regardless of what BUCKET_PORT tells us.

We can extract the values into variables using something like the following shell script, which takes care of getting the appropriate route from the openshift-storage namespace, base64-decoding the values in the Secret, and replacing the BUCKET_HOST value:

#!/bin/sh

bucket_host=$(oc get configmap "$1" -o json | jq -r .data.BUCKET_HOST)
# BUCKET_HOST looks like <service>.<namespace>.svc
service_name=$(echo "$bucket_host" | cut -f1 -d.)
service_ns=$(echo "$bucket_host" | cut -f2 -d.)

# get the externally visible hostname provided by the route
public_bucket_host=$(
 oc -n "$service_ns" get route -o json |
 jq -r '.items[]|select(.spec.to.name=="'"$service_name"'")|.spec.host'
)

# dump configmap and secret as quoted shell variables, replacing the
# value of BUCKET_HOST in the process.
(
 oc get configmap "$1" -o json |
 jq -r '.data as $data|.data|keys[]|"\(.)=\"\($data[.])\""'
 oc get secret "$1" -o json |
 jq -r '.data as $data|.data|keys[]|"\(.)=\"\($data[.]|@base64d)\""'
) | sed -e 's/^/export /' -e '/BUCKET_HOST/ s/=.*/="'"$public_bucket_host"'"/'

If we call the script getenv.sh and run it like this:

$ sh getenv.sh example-rgw

It will produce output like this:

export BUCKET_HOST="s3-openshift-storage.apps.cnv.massopen.cloud"
export BUCKET_NAME="example-noobaa-2e1bca2f-ff49-431a-99b8-d7d63a8168b0"
export BUCKET_PORT="443"
export BUCKET_REGION=""
export BUCKET_SUBREGION=""
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."

We could accomplish something similar in Python with the following, which shows how to use the OpenShift dynamic client to interact with OpenShift:

import argparse
import base64

import kubernetes
import openshift.dynamic


def parse_args():
    p = argparse.ArgumentParser()
    p.add_argument('-n', '--namespace', required=True)
    p.add_argument('obcname')
    return p.parse_args()


args = parse_args()
k8s_client = kubernetes.config.new_client_from_config()
dyn_client = openshift.dynamic.DynamicClient(k8s_client)

v1_configmap = dyn_client.resources.get(api_version='v1', kind='ConfigMap')
v1_secret = dyn_client.resources.get(api_version='v1', kind='Secret')
v1_service = dyn_client.resources.get(api_version='v1', kind='Service')
v1_route = dyn_client.resources.get(api_version='route.openshift.io/v1', kind='Route')

configmap = v1_configmap.get(name=args.obcname, namespace=args.namespace)
secret = v1_secret.get(name=args.obcname, namespace=args.namespace)

# merge the ConfigMap data with the base64-decoded Secret data
env = dict(configmap.data)
env.update({k: base64.b64decode(v).decode() for k, v in secret.data.items()})

# BUCKET_HOST looks like <service>.<namespace>.svc
svc_name, svc_ns = env['BUCKET_HOST'].split('.')[:2]
routes = v1_route.get(namespace=svc_ns)
for route in routes.items:
    if route.spec.to.name == svc_name:
        break
else:
    raise SystemExit(f'no route found for service {svc_name}')

# external access always uses TLS on port 443
env['BUCKET_PORT'] = 443
env['BUCKET_HOST'] = route['spec']['host']

for k, v in env.items():
    print(f'export {k}="{v}"')

If we run it like this:

python genenv.py -n oddbit-ocs-example example-noobaa

It will produce output largely identical to what we saw above with the shell script.

If we load those variables into the environment:

$ eval $(sh getenv.sh example-rgw)

We can perform the same operations we executed earlier from inside the pod:

$ aws s3 --endpoint https://$BUCKET_HOST ls
2021-02-10 14:34:12 example-rgw-425d7193-ae3a-41d9-98e3-9d07b82c9661

[1] Note that this may have changed in the recent OCS 4.6 release.

February 10, 2021 12:00 AM

February 08, 2021

Lars Kellogg-Stedman

Remediating poor PyPi performance with DevPi

Performance of the primary PyPi service has been so bad lately that it’s become very disruptive. Tasks that used to take a few seconds will now churn along for 15-20 minutes or longer before completing, which is incredibly frustrating.

I first went looking to see if there was a PyPi mirror infrastructure, like we see with CPAN for Perl or CTAN for TeX (and similarly for most Linux distributions). There is apparently no such beast.

I didn’t really want to set up a PyPi mirror locally, since the number of packages I actually use is small vs. the number of packages available. I figured there must be some sort of caching proxy available that would act as a shim between me and PyPi, fetching packages from PyPi and caching them if they weren’t already available locally.

I was previously aware of Artifactory, which I suspected (and confirmed) was capable of this, but while looking around I came across DevPi, which unlike Artifactory is written exclusively for managing Python packages. DevPi itself is hosted on PyPi, and the documentation made things look easy to configure.

After reading through their Quickstart: running a pypi mirror on your laptop documentation, I built a containerized service that would be easy for me to run on my desktop, laptop, work computer, etc. You can find the complete configuration at https://github.com/oddbit-dot-com/docker-devpi-server.

I started with the following Dockerfile (note I’m using podman rather than Docker as my container runtime, but the resulting image will work fine for either environment):

FROM python:3.9
RUN pip install devpi-server devpi-web
WORKDIR /root
VOLUME /root/.devpi
COPY docker-entrypoint.sh /docker-entrypoint.sh
ENTRYPOINT ["sh", "/docker-entrypoint.sh"]
CMD ["devpi-server", "--host", "0.0.0.0"]

This installs both devpi-server, which provides the basic caching for pip install, as well as devpi-web, which provides support for pip search.

To ensure that things are initialized correctly when the container starts up, I’ve set the ENTRYPOINT to the following script:

#!/bin/sh
if ! [ -f /root/.devpi/server ]; then
    devpi-init
fi
exec "$@"

This will run devpi-init if the target directory hasn’t already been initialized.

The repository includes a GitHub workflow that builds a new image on each commit and pushes the result to the oddbit/devpi-server repository on Docker Hub.

Once the image was available on Docker Hub, I created the following systemd unit to run the service locally:

[Service]
Restart=on-failure
ExecStartPre=/usr/bin/rm -f %t/%n-pid
ExecStart=/usr/bin/podman run --replace \
--conmon-pidfile %t/%n-pid --cgroups=no-conmon \
--name %n -d -p 127.0.0.1:3141:3141 \
-v devpi:/root/.devpi oddbit/devpi-server
ExecStopPost=/usr/bin/rm -f %t/%n-pid
PIDFile=%t/%n-pid
Type=forking
[Install]
WantedBy=multi-user.target default.target

There are a couple of items of note in this unit file:

  • The service is exposed only on localhost using -p 127.0.0.1:3141:3141. I don’t want this service exposed on externally visible addresses since I haven’t bothered setting up any sort of authentication.

  • The service mounts a named volume for use by devpi-server via the -v devpi:/root/.devpi command line option.

This unit file gets installed into ~/.config/systemd/user/devpi.service. Running systemctl --user enable --now devpi.service both enables the service to start at boot and actually starts it up immediately.

With the service running, the last thing to do is configure pip to utilize it. The following configuration, placed in ~/.config/pip/pip.conf, does the trick:

[install]
index-url = http://localhost:3141/root/pypi/+simple/
[search]
index = http://localhost:3141/root/pypi/

Now both pip install and pip search hit the local cache instead of the upstream PyPi server, and things are generally much, much faster.
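
If you want to confirm which index URLs a given pip.conf will hand to pip, the file is plain INI and can be read with Python's configparser. This is a minimal sketch rather than anything pip itself provides. The function name index_urls is hypothetical, and it only looks at the two settings shown above.

```python
import configparser


def index_urls(pip_conf_text):
    """Return the install and search index URLs found in pip.conf text."""
    # RawConfigParser avoids '%' interpolation, which URLs can trip over.
    parser = configparser.RawConfigParser()
    parser.read_string(pip_conf_text)
    return {
        "install": parser.get("install", "index-url", fallback=None),
        "search": parser.get("search", "index", fallback=None),
    }
```

Calling it with the contents of ~/.config/pip/pip.conf should return the two localhost:3141 URLs. A value of None means the setting is missing and pip will fall back to its default index.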

For Poetry Users

Poetry respects the pip configuration and will Just Work.

For Pipenv Users

Pipenv does not respect the pip configuration [1, 2], so you will need to set the PIPENV_PYPI_MIRROR environment variable. For example:

export PIPENV_PYPI_MIRROR=http://localhost:3141/root/pypi/+simple/

February 08, 2021 12:00 AM

February 06, 2021

Lars Kellogg-Stedman

symtool: a tool for interacting with your SYM-1

The SYM-1 is a 6502-based single-board computer produced by Synertek Systems Corp in the mid-1970s. I’ve had one floating around in a box for many, many years, and after a recent foray into the world of 6502 assembly language programming I decided to pull it out, dust it off, and see if it still works.

The board I have has a whopping 8KB of memory, and in addition to the standard SUPERMON monitor it has the expansion ROMs for the Synertek BASIC interpreter (yet another Microsoft BASIC) and RAE (the “Resident Assembler Editor”). One interacts with the board either through the onboard hex keypad and six-digit display, or via a serial connection at 4800bps (or lower).

[If you’re interested in Microsoft BASIC, the mist64/msbasic repository on GitHub is a trove of information, containing the source for multiple versions of Microsoft BASIC including the Synertek version.]

Fiddling around with the BASIC interpreter and the onboard assembler was fun, but I wanted to use a real editor for writing source files, assemble them on my Linux system, and then transfer the compiled binary to the SYM-1. The first two tasks are easy; there are lots of editors and there are a variety of 6502 assemblers that will run under Linux. I’m partial to ca65, part of the cc65 project (which is an incredible project that implements a C compiler that cross-compiles C for 6502 processors). But what’s the best way to get compiled code over to the SYM-1?

Symtool

That’s where symtool comes in. Symtool runs on your host and talks to the SUPERMON monitor on the SYM-1 over a serial connection. It allows you to view registers, dump and load memory, fill memory, and execute code.

Configuration

Symtool needs to know to what serial device your SYM-1 is attached. You can specify this using the -d <device> command line option, but this quickly gets old. To save typing, you can instead set the SYMTOOL_DEVICE environment variable:

$ export SYMTOOL_DEVICE=/dev/ttyUSB0
$ symtool load ...
$ symtool dump ...

The baud rate defaults to 4800bps. If for some reason you want to use a slower speed (maybe you’d like to relive the good old days of 300bps modems), you can use the -s command line option or the SYMTOOL_SPEED environment variable.

Loading code into memory

After compiling your code (I’ve included the examples from the SYM-1 Technical Notes in the repository), use the load command to load the code into the memory of the SYM-1:

$ make -C asm
[...]
$ symtool -v load 0x200 asm/countdown.bin
INFO:symtool.symtool:using port /dev/ttyUSB0, speed 4800
INFO:symtool.symtool:connecting to sym1...
INFO:symtool.symtool:connected
INFO:symtool.symtool:loading 214 bytes of data at $200

(Note the -v on the command line there; without that, symtool won’t produce any output unless there’s an error.)

[A note on compiling code: the build logic in the asm/ directory is configured to load code at address 0x200. If you want to load code at a different address, you will need to add the appropriate --start-addr option to LD65FLAGS when building, or modify the linker configuration in sym1.cfg.]

Examining memory

The above command loads the code into memory but doesn’t execute it. We can use the dump command to examine memory. By default, dump produces binary output. We can use that to extract code from the SYM-1 ROM or to verify that the code we just loaded was transferred correctly:

$ symtool dump 0x200 $(wc -c < asm/countdown.bin) -o check.bin
$ sha1sum check.bin asm/countdown.bin
5851c40bed8cc8b2a132163234b68a7fc0e434c0 check.bin
5851c40bed8cc8b2a132163234b68a7fc0e434c0 asm/countdown.bin

We can also produce a hexdump:

$ symtool dump 0x200 $(wc -c < asm/countdown.bin) -h
00000000: 20 86 8B A9 20 85 03 A9 55 8D 7E A6 A9 02 8D 7F ... ...U.~.....
00000010: A6 A9 40 8D 0B AC A9 4E 8D 06 AC A9 C0 8D 0E AC ..@....N........
00000020: A9 00 85 02 A9 20 8D 05 AC 18 58 A9 00 8D 40 A6 ..... ....X...@.
00000030: 8D 41 A6 8D 44 A6 8D 45 A6 A5 04 29 0F 20 73 02 .A..D..E...). s.
00000040: 8D 43 A6 A5 04 4A 4A 4A 4A 20 73 02 8D 42 A6 20 .C...JJJJ s..B.
00000050: 06 89 4C 2B 02 48 8A 48 98 48 AD 0D AC 8D 0D AC ..L+.H.H.H......
00000060: E6 02 A5 02 C9 05 F0 02 50 66 A9 00 85 02 20 78 ........Pf.... x
00000070: 02 50 5D AA BD 29 8C 60 18 A5 04 69 01 18 B8 85 .P]..).`...i....
00000080: 04 C9 FF F0 01 60 A9 7C 8D 41 A6 A9 79 8D 42 A6 .....`.|.A..y.B.
00000090: 8D 43 A6 A9 73 8D 44 A6 A9 00 85 04 20 72 89 20 .C..s.D..... r.
000000A0: 06 89 20 06 89 20 06 89 20 06 89 20 06 89 20 06 .. .. .. .. .. .
000000B0: 89 C6 03 20 06 89 20 06 89 20 06 89 20 06 89 20 ... .. .. .. ..
000000C0: 06 89 20 06 89 A5 03 C9 00 D0 D1 A9 20 85 03 60 .. ......... ..`
000000D0: 68 A8 68 AA 68 40 h.h.h@

Or a disassembly:

$ symtool dump 0x200 $(wc -c < asm/countdown.bin) -d
$0200 20 86 8b JSR $8B86
$0203 a9 20 LDA #$20
$0205 85 03 STA $03
$0207 a9 55 LDA #$55
$0209 8d 7e a6 STA $A67E
$020c a9 02 LDA #$02
$020e 8d 7f a6 STA $A67F
$0211 a9 40 LDA #$40
$0213 8d 0b ac STA $AC0B
$0216 a9 4e LDA #$4E
$0218 8d 06 ac STA $AC06
$021b a9 c0 LDA #$C0
$021d 8d 0e ac STA $AC0E
$0220 a9 00 LDA #$00
$0222 85 02 STA $02
$0224 a9 20 LDA #$20
$0226 8d 05 ac STA $AC05
$0229 18 CLC
$022a 58 CLI
$022b a9 00 LDA #$00
$022d 8d 40 a6 STA $A640
$0230 8d 41 a6 STA $A641
$0233 8d 44 a6 STA $A644
$0236 8d 45 a6 STA $A645
$0239 a5 04 LDA $04
$023b 29 0f AND #$0F
$023d 20 73 02 JSR $0273
$0240 8d 43 a6 STA $A643
$0243 a5 04 LDA $04
$0245 4a LSR
$0246 4a LSR
$0247 4a LSR
$0248 4a LSR
$0249 20 73 02 JSR $0273
$024c 8d 42 a6 STA $A642
$024f 20 06 89 JSR $8906
$0252 4c 2b 02 JMP $022B
$0255 48 PHA
$0256 8a TXA
$0257 48 PHA
$0258 98 TYA
$0259 48 PHA
$025a ad 0d ac LDA $AC0D
$025d 8d 0d ac STA $AC0D
$0260 e6 02 INC $02
$0262 a5 02 LDA $02
$0264 c9 05 CMP #$05
$0266 f0 02 BEQ $02
$0268 50 66 BVC $66
$026a a9 00 LDA #$00
$026c 85 02 STA $02
$026e 20 78 02 JSR $0278
$0271 50 5d BVC $5D
$0273 aa TAX
$0274 bd 29 8c LDA $8C29,X
$0277 60 RTS
$0278 18 CLC
$0279 a5 04 LDA $04
$027b 69 01 ADC #$01
$027d 18 CLC
$027e b8 CLV
$027f 85 04 STA $04
$0281 c9 ff CMP #$FF
$0283 f0 01 BEQ $01
$0285 60 RTS
$0286 a9 7c LDA #$7C
$0288 8d 41 a6 STA $A641
$028b a9 79 LDA #$79
$028d 8d 42 a6 STA $A642
$0290 8d 43 a6 STA $A643
$0293 a9 73 LDA #$73
$0295 8d 44 a6 STA $A644
$0298 a9 00 LDA #$00
$029a 85 04 STA $04
$029c 20 72 89 JSR $8972
$029f 20 06 89 JSR $8906
$02a2 20 06 89 JSR $8906
$02a5 20 06 89 JSR $8906
$02a8 20 06 89 JSR $8906
$02ab 20 06 89 JSR $8906
$02ae 20 06 89 JSR $8906
$02b1 c6 03 DEC $03
$02b3 20 06 89 JSR $8906
$02b6 20 06 89 JSR $8906
$02b9 20 06 89 JSR $8906
$02bc 20 06 89 JSR $8906
$02bf 20 06 89 JSR $8906
$02c2 20 06 89 JSR $8906
$02c5 a5 03 LDA $03
$02c7 c9 00 CMP #$00
$02c9 d0 d1 BNE $D1
$02cb a9 20 LDA #$20
$02cd 85 03 STA $03
$02cf 60 RTS
$02d0 68 PLA
$02d1 a8 TAY
$02d2 68 PLA
$02d3 aa TAX
$02d4 68 PLA
$02d5 40 RTI
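
To make the listing above easier to follow, here is a rough sketch of how such a disassembly can be produced. It is not symtool's implementation: the opcode table is a small hypothetical subset covering only a few instructions from the listing, and the function name is made up. Unlike the output above, which prints the raw operand byte for branches, this sketch resolves relative branches to their target address (a signed offset from the address of the following instruction).

```python
# Hypothetical subset of the 6502 opcode table: just enough
# to decode a few of the instructions in the listing.
OPCODES = {
    0x20: ("JSR", "abs"),
    0x4C: ("JMP", "abs"),
    0x8D: ("STA", "abs"),
    0x85: ("STA", "zp"),
    0xA9: ("LDA", "imm"),
    0x18: ("CLC", "imp"),
    0x58: ("CLI", "imp"),
    0x60: ("RTS", "imp"),
    0xD0: ("BNE", "rel"),
    0xF0: ("BEQ", "rel"),
}

# Number of operand bytes for each addressing mode.
OPERAND_BYTES = {"imp": 0, "imm": 1, "zp": 1, "rel": 1, "abs": 2}


def disassemble(data, base):
    """Yield (address, text) for each instruction in `data`."""
    pos = 0
    while pos < len(data):
        mnemonic, mode = OPCODES[data[pos]]
        size = OPERAND_BYTES[mode]
        operand = data[pos + 1:pos + 1 + size]
        addr = base + pos
        if mode == "imp":
            text = mnemonic
        elif mode == "imm":
            text = f"{mnemonic} #${operand[0]:02X}"
        elif mode == "zp":
            text = f"{mnemonic} ${operand[0]:02X}"
        elif mode == "abs":
            # Absolute addresses are stored low byte first.
            target = operand[0] | (operand[1] << 8)
            text = f"{mnemonic} ${target:04X}"
        else:
            # Relative: signed offset from the next instruction's address.
            offset = operand[0] - 256 if operand[0] >= 0x80 else operand[0]
            target = (addr + 2 + offset) & 0xFFFF
            text = f"{mnemonic} ${target:04X}"
        yield addr, text
        pos += 1 + size
```

For example, the branch at $02c9 (bytes d0 d1) resolves to $029C, which is the JSR $8972 near the top of that routine.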

Executing code

There are two ways to run your code using symtool. If you provide the -g option to the load command, symtool will execute your code as soon as the load has finished:

$ symtool load -g 0x200 asm/countdown.bin

Alternatively, you can use the go command to run code that has already been loaded onto the SYM-1:

$ symtool go 0x200

Examining registers

The registers command allows you to examine the contents of the 6502 registers:

$ symtool registers
s ff (11111111)
f b1 (10110001) +carry -zero -intr -dec -oflow +neg
a 80 (10000000)
x 00 (00000000)
y 50 (01010000)
p b0ac (1011000010101100)
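
The f line is the 6502 status register, and the +/- annotations follow directly from its bits. The sketch below shows one way to derive them; the function name is hypothetical and the short flag names are simply copied from the output above, so symtool's own code may well differ.

```python
# Bit masks in the 6502 status register, labelled with the
# short names that appear in the registers output.
FLAGS = [
    (0x01, "carry"),
    (0x02, "zero"),
    (0x04, "intr"),
    (0x08, "dec"),
    (0x40, "oflow"),
    (0x80, "neg"),
]


def describe_flags(value):
    """Return a string such as '+carry -zero ...' for a status byte."""
    return " ".join(
        ("+" if value & mask else "-") + name for mask, name in FLAGS
    )
```

Passing 0xb1 reproduces the annotation shown for the f register.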

Filling memory

If you want to clear a block of memory, you can use the fill command. For example, to wipe out the code we loaded in the earlier example:

$ symtool fill 0x200 0 $(wc -c < asm/countdown.bin)
$ symtool dump -h 0x200 $(wc -c < asm/countdown.bin)
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[...]

Notes on the code

The symtool repository includes both unit and functional tests. The functional tests require an actual SYM-1 to be attached to your system (with the device name in the SYMTOOL_DEVICE environment variable). The unit tests will run anywhere.

Wrapping up

No lie, this is a pretty niche project. I’m not sure how many people out there own a SYM-1 these days, but this has been fun to work with and if maybe one other person finds it useful, I would consider that a success :).

February 06, 2021 12:00 AM

January 14, 2021

RDO Blog

RDO plans to move to CentOS Stream

What changed with CentOS?
CentOS announced recently that they will focus on CentOS Stream and CentOS Linux 8 will be EOL at the end of 2021.

While CentOS Linux 8 (C8) is a pure rebuild of Red Hat Enterprise Linux (RHEL), CentOS Stream 8 (C8S) tracks just ahead of the current RHEL release. This means that we will have a continuous flow of new packages available before they are included in the next RHEL minor release.

What’s the current situation in RDO?
RDO has been using the latest CentOS Linux 8 to build both the OpenStack packages and the required dependencies since the Train release, for both the official CloudSIG repos and the RDO Trunk (aka DLRN) repos.

In the last few months, we have been running periodic CI jobs to validate RDO Trunk repos built on CentOS Linux 8 along with CentOS Stream 8 to find any potential issues created by OS package updates before they are shipped in CentOS Linux 8. As expected, during these tests we have not found any issues related to the buildroot environment; packages can be used for both C8 and C8S. We did find a few issues related to package updates, which allowed us to propose the required fixes upstream.

What’s our plan for RDO roadmap?
  • RDO Wallaby (ETA is end of April 2021) will be built, tested and released only on CentOS Stream 8.
  • RDO CloudSIG repos for Victoria and Ussuri will be updated and tested for both CentOS Stream and CentOS Linux 8 until end of 2021 and then continue on CentOS Stream.
  • We will create and test new RDO CloudSIG repos for Victoria and Ussuri on CentOS Stream 8.
  • The RDO Trunk repositories (aka DLRN repos) will be built and tested using a CentOS Stream 8 buildroot for all releases currently using CentOS Linux 8 (from Train onward).

How do we plan to implement these changes?
Some implementation details that may be of interest:
  • We will keep building packages just once. We will move buildroots for both DLRN and CloudSIG to use CentOS Stream 8 in the near future.
  • For Ussuri and Victoria CloudSIG repos, while we are supporting both C8 and C8S, we will be utilizing separate CBS tags. This will allow us to have separate repositories, promotion jobs and package versions for each OS.
  • In order to reduce the impact of potential issues and discover issues related to C8S as soon as possible, we will put more focus on periodic jobs on C8S.
  • At a later stage, we will move the CI jobs used to gate changes in distgits to use C8S instead of C8 for all RDO releases where we use CentOS 8.
  • The CentOS/RHEL team has made public their interest in applying a Continuous Delivery approach to CentOS Stream to provide a stable CentOS Stream using gating integration jobs. Our intent is to collaborate with the CentOS team on any initiatives that will help to validate RDO as early in the delivery pipeline as possible and reduce the impact of potential issues.

What’s next?
We plan to start the activities needed to carry out this plan in the coming weeks.

We will continue discussing and sharing the progress during the RDO weekly meetings; feel free to join us if you are interested.

Also, if you have any questions or suggestions related to these changes, don’t hesitate to contact us in the #rdo freenode channel or using the RDO mailing lists.

by amoralej at January 14, 2021 03:17 PM

January 04, 2021

Matthias Runge

Kubernetes on Raspberry Pi 4 on Fedora

Recently, I bought a few Raspberry Pi 4 boards, one with 4 GB and two with 8 GB of RAM. When I bought the first one, there was no option to get more memory. However, I saw this as a game and thought I'd give it a try. I also bought SSDs for these, along with USB3-to-SATA adapters. Before purchasing anything, you may want to take a look at James Archer's page. Unfortunately, there are a couple of adapters on the market that don't work that well.

Deploying Fedora 33 Server

Initially, I followed the description to deploy Fedora 32; it works the same way for Fedora 33 Server (in my case here).

Because ceph requires a partition (or better: a whole disk), I used the traditional setup using partitions and no LVM.

Deploying Kubernetes

git clone https://github.com/kubernetes-sigs/kubespray
cd kubespray

I followed the documentation and created an inventory. For the container runtime, I picked crio, and calico as the network plugin.

Because of an issue, I had to patch roles/download/defaults/main.yml:

diff --git a/roles/download/defaults/main.yml b/roles/download/defaults/main.yml
index a97be5a6..d4abb341 100644
--- a/roles/download/defaults/main.yml
+++ b/roles/download/defaults/main.yml
@@ -64,7 +64,7 @@ quay_image_repo: "quay.io"

 # TODO(mattymo): Move calico versions to roles/network_plugins/calico/defaults
 # after migration to container download
-calico_version: "v3.16.5"
+calico_version: "v3.15.2"
 calico_ctl_version: "{{ calico_version }}"
 calico_cni_version: "{{ calico_version }}"
 calico_policy_version: "{{ calico_version }}"
@@ -520,13 +520,13 @@ etcd_image_tag: "{{ etcd_version }}{%- if image_arch != 'amd64' -%}-{{ image_arc
 flannel_image_repo: "{{ quay_image_repo }}/coreos/flannel"
 flannel_image_tag: "{{ flannel_version }}"
 calico_node_image_repo: "{{ quay_image_repo }}/calico/node"
-calico_node_image_tag: "{{ calico_version }}"
+calico_node_image_tag: "{{ calico_version }}-arm64"
 calico_cni_image_repo: "{{ quay_image_repo }}/calico/cni"
-calico_cni_image_tag: "{{ calico_cni_version }}"
+calico_cni_image_tag: "{{ calico_cni_version }}-arm64"
 calico_policy_image_repo: "{{ quay_image_repo }}/calico/kube-controllers"
-calico_policy_image_tag: "{{ calico_policy_version }}"
+calico_policy_image_tag: "{{ calico_policy_version }}-arm64"
 calico_typha_image_repo: "{{ quay_image_repo }}/calico/typha"
-calico_typha_image_tag: "{{ calico_typha_version }}"
+calico_typha_image_tag: "{{ calico_typha_version }}-arm64"
 pod_infra_image_repo: "{{ kube_image_repo }}/pause"
 pod_infra_image_tag: "{{ pod_infra_version }}"
 install_socat_image_repo: "{{ docker_image_repo }}/xueshanf/install-socat"

Deploy Ceph

Ceph requires a raw partition. Make sure you have an empty partition available.

[root@node1 ~]# lsblk -f
NAME FSTYPE FSVER LABEL UUID                                   FSAVAIL FSUSE% MOUNTPOINT
sda
├─sda1
│    vfat   FAT32 UEFI  7DC7-A592
├─sda2
│    vfat   FAT32       CB75-24A9                               567.9M     1% /boot/efi
├─sda3
│    xfs                cab851cb-1910-453b-ae98-f6a2abc7f0e0    804.7M    23% /boot
├─sda4
│
├─sda5
│    xfs                6618a668-f165-48cc-9441-98f4e2cc0340     27.6G    45% /
└─sda6

In my case, sda4 and sda6 are not formatted. sda4 is very small and will be ignored; sda6 will be used.
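
The same check can be scripted. The sketch below assumes the JSON that lsblk --json --fs prints: a top-level blockdevices list, partitions nested under children, and a null fstype for anything without a filesystem signature. Field names may differ between util-linux versions, and the function name is hypothetical.

```python
import json


def unformatted_partitions(lsblk_json):
    """Return names of partitions that have no filesystem signature."""
    found = []
    for disk in json.loads(lsblk_json)["blockdevices"]:
        for part in disk.get("children", []):
            if part.get("fstype") is None:
                found.append(part["name"])
    return found


# Trimmed-down sample in the assumed lsblk --json --fs layout.
sample = """
{"blockdevices": [
  {"name": "sda", "fstype": null, "children": [
    {"name": "sda3", "fstype": "xfs"},
    {"name": "sda4", "fstype": null},
    {"name": "sda5", "fstype": "xfs"},
    {"name": "sda6", "fstype": null}
  ]}
]}
"""
print(unformatted_partitions(sample))
```

On a real node you would feed it the output of lsblk --json --fs instead of the sample string. A partition without a filesystem signature is not necessarily unused, so double-check before handing it to Ceph.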

Using rook is pretty straightforward:

git clone --single-branch --branch v1.5.4 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
kubectl create -f cluster.yaml

by mrunge at January 04, 2021 10:00 AM

December 18, 2020

Lars Kellogg-Stedman

To sleep or not to sleep?

Let’s say you have a couple of sensors attached to an ESP8266 running MicroPython. You’d like to sample them at different frequencies (say, one every 60 seconds and one every five minutes), and you’d like to do it as efficiently as possible in terms of power consumption. What are your options?

If we don’t care about power efficiency, the simplest solution is probably a loop like this:

import machine
import time
lastrun_1 = 0
lastrun_2 = 0
while True:
    now = time.time()
    if (lastrun_1 == 0) or (now - lastrun_1 >= 60):
        read_sensor_1()
        lastrun_1 = now
    if (lastrun_2 == 0) or (now - lastrun_2 >= 300):
        read_sensor_2()
        lastrun_2 = now
    machine.idle()

If we were only reading a single sensor (or multiple sensors at the same interval), we could drop the loop and just use the ESP8266’s deep sleep mode (assuming we have wired things properly):

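import machine
def deepsleep(duration):
    rtc = machine.RTC()
    # wake the device from deep sleep when the RTC alarm fires
    rtc.irq(trigger=rtc.ALARM0, wake=machine.DEEPSLEEP)
    rtc.alarm(rtc.ALARM0, duration)
    # actually enter deep sleep; duration is in milliseconds
    machine.deepsleep()
read_sensor_1()
deepsleep(60000)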

This will wake up, read the sensor, then sleep for 60 seconds, at which point the device will reboot and repeat the process.

If we want to both use deep sleep and run tasks at different intervals, we can effectively combine the above two methods. This requires a little help from the RTC, which in addition to keeping time also provides us with a small amount of memory (492 bytes when using MicroPython) that will persist across a deepsleep/reset cycle.

The machine.RTC class includes a memory method that provides access to the RTC memory. We can read the memory like this:

import machine
rtc = machine.RTC()
data = rtc.memory()

Note that rtc.memory() will always return a byte string.

We write to it like this:

rtc.memory('somevalue')

Lastly, note that the time maintained by the RTC also persists across a deepsleep/reset cycle, so that if we call time.time() and then deepsleep for 10 seconds, when the module boots back up time.time() will show that 10 seconds have elapsed.

We’re going to implement a solution similar to the loop presented at the beginning of this article in that we will store the time at which each task was last run. Because we need to maintain two different values, and because the RTC memory operates on bytes, we need a way to serialize and deserialize a pair of integers. We could use functions like this:

import json
def store_time(t1, t2):
    rtc.memory(json.dumps([t1, t2]))
def load_time():
    data = rtc.memory()
    if not data:
        return [0, 0]
    try:
        return json.loads(data)
    except ValueError:
        return [0, 0]

The load_time method returns [0, 0] if either (a) the RTC memory was unset or (b) we were unable to decode the value stored in memory (which might happen if you had previously stored something else there).

You don’t have to use json for serializing the data we’re storing in the RTC; you could just as easily use the struct module:

import struct
def store_time(t1, t2):
    rtc.memory(struct.pack('ll', t1, t2))
def load_time():
    data = rtc.memory()
    if not data:
        return [0, 0]
    try:
        return struct.unpack('ll', data)
    except ValueError:
        return [0, 0]
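
If you'd like to try the serialization logic on a desktop before flashing it, something like the following sketch runs under regular Python. FakeRTC is a hypothetical stand-in for machine.RTC that simply keeps the bytes in RAM, and I've swapped the 'll' format for an explicit '<ll' so the packed size is 8 bytes regardless of platform. Checking the length up front also sidesteps the fact that CPython raises struct.error rather than ValueError on bad input.

```python
import struct

FORMAT = "<ll"  # two little-endian 32-bit signed integers, 8 bytes total


class FakeRTC:
    """Hypothetical stand-in for machine.RTC that keeps memory in RAM."""

    def __init__(self):
        self._memory = b""

    def memory(self, data=None):
        # With no argument, return the stored bytes; otherwise store them.
        if data is None:
            return self._memory
        self._memory = bytes(data)


rtc = FakeRTC()


def store_time(t1, t2):
    rtc.memory(struct.pack(FORMAT, t1, t2))


def load_time():
    data = rtc.memory()
    # Anything other than exactly 8 bytes was not written by store_time.
    if len(data) != struct.calcsize(FORMAT):
        return (0, 0)
    return struct.unpack(FORMAT, data)
```

After store_time(60, 300), load_time() returns (60, 300). With empty or unrelated contents in the fake RTC memory it falls back to (0, 0).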

Once we’re able to store and retrieve data from the RTC, the main part of our code ends up looking something like this:

lastrun_1, lastrun_2 = load_time()
now = time.time()
something_happened = False
if lastrun_1 == 0 or (now - lastrun_1 >= 60):
    read_sensor_1()
    lastrun_1 = now
    something_happened = True
if lastrun_2 == 0 or (now - lastrun_2 >= 300):
    read_sensor_2()
    lastrun_2 = now
    something_happened = True
if something_happened:
    store_time(lastrun_1, lastrun_2)
deepsleep(60000)

This code will wake up every 60 seconds. That means it will always run the read_sensor_1 task, and it will run the read_sensor_2 task every five minutes. In between, the ESP8266 will be in deep sleep mode, consuming around 20µA. In order to avoid too many unnecessary writes to RTC memory, we only store values when lastrun_1 or lastrun_2 has changed.
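
To see the resulting schedule without waiting around for real hardware, the wake/sleep cycle can be simulated with a fake clock. This is just a sketch: simulate is a hypothetical helper, the sensor reads are replaced by names appended to a list, and each pass through the loop stands in for one deepsleep/reset cycle. It uses >= comparisons so that a wakeup exactly 60 seconds after the previous reading still counts as due.

```python
def simulate(wakeups, sleep_seconds=60):
    """Return which tasks run on each wakeup, as a list of lists."""
    lastrun_1 = lastrun_2 = 0
    now = 1000  # arbitrary nonzero starting time, in seconds
    history = []
    for _ in range(wakeups):
        ran = []
        if lastrun_1 == 0 or now - lastrun_1 >= 60:
            ran.append("sensor_1")
            lastrun_1 = now
        if lastrun_2 == 0 or now - lastrun_2 >= 300:
            ran.append("sensor_2")
            lastrun_2 = now
        history.append(ran)
        now += sleep_seconds  # stands in for deepsleep plus reboot
    return history


for number, ran in enumerate(simulate(6)):
    print(number, ran)
```

Both tasks run on the first wakeup, only sensor_1 runs on the next four, and both run again on the sixth.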

While developing your code, it can be inconvenient to have the device enter deep sleep mode (because you can’t just ^C to return to the REPL). You can make the deep sleep behavior optional by wrapping everything in a loop, and optionally calling deepsleep at the end of the loop, like this:

lastrun_1, lastrun_2 = load_time()
while True:
    now = time.time()
    something_happened = False
    if lastrun_1 == 0 or (now - lastrun_1 >= 60):
        read_sensor_1()
        lastrun_1 = now
        something_happened = True
    if lastrun_2 == 0 or (now - lastrun_2 >= 300):
        read_sensor_2()
        lastrun_2 = now
        something_happened = True
    if something_happened:
        store_time(lastrun_1, lastrun_2)
    if use_deep_sleep:
        deepsleep(60000)
    else:
        machine.idle()

If the variable use_deep_sleep is True, this code will perform as described in the previous section, waking once every 60 seconds. If use_deep_sleep is False, this will use a busy loop.

December 18, 2020 12:00 AM

December 16, 2020

Adam Young

Moving things around in OpenStack

While reviewing the comments on the Ironic spec for Secure RBAC, I had to ask myself if the “project” construct makes sense for Ironic. I still think it does, but I’ll write this down to see if I can clarify it for me, and maybe for you, too.

Baremetal servers change.  The whole point of Ironic is to control the change of Baremetal servers from inanimate pieces of metal to “really useful engines.”  This needs to happen in a controlled and unsurprising way.

Ironic the server does what it is told. If a new piece of metal starts sending out DHCP requests, Ironic is going to PXE boot it. This is the start of this new piece of metal’s journey of self discovery. At least as far as Ironic is concerned.

But really, someone had to rack and wire said piece of metal.  Likely the person that did this is not the person that is going to run workloads on it in the end.  They might not even work for the same company;  they might be a delivery person from Dell or Supermicro.  So, once they are done with it, they don’t own it any more.

Who does?  Who owns a piece of metal before it is enrolled in the OpenStack baremetal service?

No one.  It does not exist.

Ok, so let’s go back to someone pushing the button, booting our server for the first time, and it doing its PXE boot thing.

Or, we get the MAC address and enter that into the ironic database, so that when it does boot, we know about it.

Either way, Ironic is really the playground monitor, just making sure it plays nice.

What if Ironic is a multi-tenant system? Someone needs to be able to transfer the baremetal server from wherever it lands up front to the people that need to use it.

I suspect that transferring metal from project to project is going to be one of the main use cases after the sun has set on day one.

So, who should be allowed to say what project a piece of baremetal can go to?

Well, in Keystone, we have the idea of hierarchy.  A Project is owned by a domain, and a project can be nested inside another project.

But this information is not passed down to Ironic.  There is no way to get a token for a project that shows its parent information.  But a remote service could query the project hierarchy from Keystone. 

https://docs.openstack.org/api-ref/identity/v3/?expanded=show-project-details-detail#show-project-details

Say I want to transfer a piece of metal from one project to another. Should I have a token for the source project or the remote project? Ok, dumb question, I should definitely have a token for the source project. The smart question is whether I should also have a token for the destination project.

Sure, why not.  Two tokens. One has the “delete” role and one that has the “create” role.

The only problem is that nothing like this exists in Open Stack.  But it should.

We could fake it with hierarchy;  I can pass things up and down the project tree.  But that really does not one bit of good.  People don’t really use the tree like that.  They should.  We built a perfectly nice tree and they ignore it.  Poor, ignored, sad, lonely tree.

Actually, it has no feelings.  Please stop anthropomorphising the tree.

What you could do is create the destination object, kind of a potential piece-of-metal or metal-receiver. This receiver object gets a UUID. You pass this UUID to the “move” API, but you call the move API with a token for the source project. The move is done atomically. Let’s call this thing identified by a UUID a move-request.

The order of operations could be done in reverse. The operator could create the move request on the source, and then pass that to the receiver. This might actually make more sense, as you need to know about the object before you can even think to move it.

Both workflows seem to have merit.

And…this concept seems to be something that OpenStack needs in general. 

In fact, why should the API not be a generic API? I mean, it would have to be per service, but the same API could be used to transfer VMs between projects in Nova and volumes between projects in Cinder. The API would have two verbs: one for creating a new move request, and one for accepting it.

POST /thingy/v3.14/resource?resource_id=abcd&destination=project_id

If this is called with a token, it needs to be scoped. If it is scoped to the project_id in the API, it creates a receiving type request. If it is scoped to the project_id that owns the resource, it is a sending type request. Either way, it returns a URL. Call GET on that URL and you get information about the transfer. Call PATCH on it with the appropriately scoped token, and the resource is transferred. And maybe enough information to prove that you know what you are doing: maybe you have to specify the source and target projects in that PATCH request.
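To make the two-step flow concrete, here is a rough in-memory sketch. Nothing like this exists in OpenStack; the class and method names are hypothetical, a token is reduced to the project it is scoped to, and I am assuming that "appropriately scoped" means the project on the other side of the transfer from the one that created the request.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A transferable thing: a baremetal node, a VM, a volume."""
    resource_id: str
    owner_project: str


@dataclass
class MoveRequest:
    """Hypothetical record created by the POST call."""
    resource_id: str
    source_project: str
    destination_project: str
    created_by: str  # project the creating token was scoped to
    request_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    @property
    def kind(self):
        return "sending" if self.created_by == self.source_project else "receiving"

    @property
    def must_be_accepted_by(self):
        # Assumption: the other side of the transfer confirms it.
        if self.kind == "sending":
            return self.destination_project
        return self.source_project


class TransferService:
    def __init__(self, resources):
        self.resources = {r.resource_id: r for r in resources}
        self.requests = {}

    def create(self, token_project, resource_id, destination):
        """POST: the token must be scoped to the owner or the destination."""
        resource = self.resources[resource_id]
        if token_project not in (resource.owner_project, destination):
            raise PermissionError("token is scoped to an unrelated project")
        request = MoveRequest(
            resource_id, resource.owner_project, destination, token_project
        )
        self.requests[request.request_id] = request
        return request

    def accept(self, token_project, request_id, source, destination):
        """PATCH: the counterpart confirms, naming both projects."""
        request = self.requests[request_id]
        if token_project != request.must_be_accepted_by:
            raise PermissionError("token is not scoped to the accepting project")
        if (source, destination) != (
            request.source_project,
            request.destination_project,
        ):
            raise ValueError("source and destination do not match the request")
        resource = self.resources[request.resource_id]
        if resource.owner_project != request.source_project:
            raise ValueError("resource no longer belongs to the source project")
        resource.owner_project = request.destination_project
        del self.requests[request_id]
        return resource


service = TransferService([Resource("abcd", "project-a")])
move = service.create("project-a", "abcd", "project-b")  # sending type request
service.accept("project-b", move.request_id, "project-a", "project-b")
print(service.resources["abcd"].owner_project)
```

A real service would do the ownership change in a single transaction; here the whole thing is just a dictionary update.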

A foolish consistency is the hobgoblin of little minds.

Edit: OK, this is not a new idea. Cinder went through the same thought process according to Duncan Thomas. The result is this API: https://docs.openstack.org/api-ref/block-storage/v3/index.html#volume-transfer

Which looks like it then morphed to this one:

https://docs.openstack.org/api-ref/block-storage/v3/index.html#volume-transfers-volume-transfers-3-55-or-later


by Adam Young at December 16, 2020 12:49 AM

December 15, 2020

John Likes OpenStack

Running tripleo-ansible molecule locally for dummies

I've had to re-teach myself how to do this so I'm writing my own notes.

Prerequisites:

  1. Get a working undercloud (perhaps from tripleo-lab)
  2. git clone https://git.openstack.org/openstack/tripleo-ansible.git ; cd tripleo-ansible
  3. Determine the test name: ls roles

Once you have your environment ready, run a test with the name from step 3.

./scripts/run-local-test tripleo_derived_parameters

Some tests in CI are configured to use `--skip-tags`. You can do this for your local tests too by setting the appropriate environment variables. For example:

export TRIPLEO_JOB_ANSIBLE_ARGS="--skip-tags run_ceph_ansible,run_uuid_ansible,ceph_client_rsync,clean_fetch_dir"
./scripts/run-local-test tripleo_ceph_run_ansible

This last tip should get added to the docs.

by Unknown (noreply@blogger.com) at December 15, 2020 03:46 PM

December 13, 2020

Lars Kellogg-Stedman

Animating a map of Covid in the Northeast US

<p>I recently put together a short animation showing the spread of Covid throughout the Northeast United States:</p> <div style="position: relative; padding-bottom: 56.25%; height: 0; overflow: hidden;"> <iframe src="https://www.youtube.com/embed/zGN_zEzd_TE" style="position: absolute; top: 0; left: 0; width: 100%; height: 100%; border:0;" allowfullscreen title="YouTube Video"></iframe> </div> <p>I thought it might be interesting to walk through the process I used to create the video. The steps described in this article aren&rsquo;t exactly what I used (I was dealing with data in a <a href="https://postgis.net/">PostGIS</a> database, and in the interests of simplicity I wanted instructions that can be accomplished with just QGIS), but they end up in the same place.</p> <h2 id="data-sources">Data sources</h2> <p>Before creating the map, I had to find appropriate sources of data. I needed three key pieces of information:</p> <ol> <li>State and county outlines</li> <li>Information about population by county</li> <li>Information about Covid cases over time by county</li> </ol> <h3 id="us-census-data">US Census Data</h3> <p>I was able to obtain much of the data from the US Census website, <a href="https://data.census.gov">https://data.census.gov</a>. Here I was able to find both tabular demographic data (population information) and geographic data (state and county cartographic borders):</p> <ul> <li> <p><a href="https://www2.census.gov/programs-surveys/popest/datasets/2010-2019/counties/totals/co-est2019-alldata.csv">Population estimates</a></p> <p>This dataset contains population estimates by county from 2010 through 2019. 
This comes from the US Census &ldquo;<a href="https://www.census.gov/programs-surveys/popest.html">Population Estimates Program</a>&rdquo; (PEP).</p> </li> <li> <p><a href="https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_county_5m.zip">County outlines</a></p> <p>This dataset contains US county outlines provided by the US Census.</p> </li> <li> <p><a href="https://www2.census.gov/geo/tiger/GENZ2018/shp/cb_2018_us_state_5m.zip">State outlines</a></p> <p>This dataset contains US state outlines provided by the US Census.</p> </li> </ul> <p>The tabular data is provided in <a href="https://en.wikipedia.org/wiki/Comma-separated_values">CSV</a> (comma-separated value) format, which is a simple text-only format that can be read by a variety of software (including spreadsheet software such as Excel or Google Sheets).</p> <p>The geographic data is available as both a <a href="https://en.wikipedia.org/wiki/Shapefile">shapefile</a> and as a <a href="https://en.wikipedia.org/wiki/Keyhole_Markup_Language">KML</a> file. A <em>shapefile</em> is a relatively standard format for exchanging geographic data. You generally need some sort of <a href="https://en.wikipedia.org/wiki/Geographic_information_system">GIS software</a> in order to open and manipulate a shapefile (a topic that I will cover later on in this article). KML is another format for sharing geographic data that was developed by Google as part of Google Earth.</p> <h3 id="new-york-times-covid-data">New York Times Covid Data</h3> <p>The New York Times maintains a <a href="https://github.com/nytimes/covid-19-data">Covid dataset</a> (because our government is both unable and unwilling to perform this basic public service) in CSV format that tracks Covid cases and deaths in the United States, broken down both by state and by county.</p> <h2 id="software">Software</h2> <p>In order to build something like this map you need a Geographic Information System (GIS) software package. 
The 800 pound gorilla of GIS software is <a href="https://www.esri.com/en-us/arcgis/about-arcgis/overview">ArcGIS</a>, a capable but pricey commercial package that may cost more than the casual GIS user is willing to pay. Fortunately, there are some free alternatives available.</p> <p>Google&rsquo;s <a href="https://www.google.com/earth/versions/#earth-pro">Google Earth Pro</a> has a different focus from most other GIS software (it is designed more for exploration/educational use than actual GIS work), but it is able to open and display a variety of GIS data formats, including the shapefiles used in this project.</p> <p><a href="https://qgis.org/en/site/">QGIS</a> is a highly capable <a href="https://www.redhat.com/en/topics/open-source/what-is-open-source">open source</a> GIS package, available for free for a variety of platforms including MacOS, Windows, and Linux. This is the software that I used to create the animated map, and the software we&rsquo;ll be working with in the rest of this article.</p> <h2 id="preparing-the-data">Preparing the data</h2> <h3 id="geographic-filtering">Geographic filtering</h3> <p>I was initially planning on creating a map for the entire United States, but I immediately ran into a problem: with over 3,200 counties in the US and upwards of 320 data points per county in the Covid dataset, that was going to result in over 1,000,000 geographic features. On my computer, QGIS wasn&rsquo;t able to handle a dataset of that size. So the first step is limiting the data we&rsquo;re manipulating to something smaller; I chose New York and New England.</p> <p>We start by adding the <code>cb_2018_us_state_5m</code> map to QGIS. 
This gives us all 50 states (and a few territories):</p> <figure class="left" > <img src="states-unfiltered.png" /> </figure> <p>To limit this to our target geography, we can select &ldquo;Filter&hellip;&rdquo; from the layer context menu and apply the following filter:</p> <pre tabindex="0"><code>&#34;NAME&#34; in ( &#39;New York&#39;, &#39;Massachusetts&#39;, &#39;Rhode Island&#39;, &#39;Connecticut&#39;, &#39;New Hampshire&#39;, &#39;Vermont&#39;, &#39;Maine&#39; ) </code></pre><p>This gives us:</p> <figure class="left" > <img src="states-filtered.png" /> </figure> <p>Next, we need to load in the county outlines that cover the same geographic area. We start by adding the <code>cb_2018_us_county_5m</code> dataset to QGIS, which gets us:</p> <figure class="left" > <img src="counties-unfiltered.png" /> </figure> <p>There are several ways we could limit the counties to just those in our target geography. One method is to use the &ldquo;Clip&hellip;&rdquo; feature in the &ldquo;Vector-&gt;Geoprocessing Tools&rdquo; menu. This allows us to &ldquo;clip&rdquo; one vector layer (such as our county outlines) using another layer (our filtered state layer).</p> <p>We select &ldquo;Vector-&gt;Geoprocessing Tools-&gt;Clip&hellip;&rdquo;, and then fill in the resulting dialog as follows:</p> <ul> <li>For &ldquo;Input layer&rdquo;, select <code>cb_2018_us_county_5m</code>.</li> <li>For &ldquo;Overlay layer&rdquo;, select <code>cb_2018_us_state_5m</code>.</li> </ul> <p>Now select the &ldquo;Run&rdquo; button. You should end up with a new layer named <code>Clipped</code>. Hide the original <code>cb_2018_us_county_5m</code> layer, and rename <code>Clipped</code> to <code>cb_2018_us_county_5m_clipped</code>. 
This gives us:</p> <figure class="left" > <img src="counties-clipped.png" /> </figure> <p>Instead of using the &ldquo;Clip&hellip;&rdquo; algorithm, we could have created a <a href="https://docs.qgis.org/3.16/en/docs/user_manual/managing_data_source/create_layers.html#creating-virtual-layers">virtual layer</a> and performed a <a href="http://wiki.gis.com/wiki/index.php/Spatial_Join#:~:text=A%20Spatial%20join%20is%20a,spatially%20to%20other%20feature%20layers.">spatial join</a> between the state and county layers; unfortunately, due to issue <a href="https://github.com/qgis/QGIS/issues/40503">#40503</a>, it&rsquo;s not possible to use virtual layers with this dataset (or really any dataset, if you have numeric data you care about).</p> <h3 id="merging-population-data-with-our-geographic-data">Merging population data with our geographic data</h3> <p>Add the population estimates to our project. Select &ldquo;Layer-&gt;Add Layer-&gt;Add Delimited Text Layer&hellip;&rdquo;, find the <code>co-est2019-alldata.csv</code> dataset and add it to the project. This layer doesn&rsquo;t have any geographic data of its own; we need to associate it with one of our other layers in order to make use of it. We can do this by using a <a href="https://www.qgistutorials.com/en/docs/3/performing_table_joins.html">table join</a>.</p> <p>In order to perform a table join, we need a single field in each layer that corresponds to a field value in the other layer. The counties dataset has a <code>GEOID</code> field that combines the state and county <a href="https://transition.fcc.gov/oet/info/maps/census/fips/fips.txt">FIPS codes</a>, but the population dataset has only individual state and county codes. 
We can create a new <a href="https://docs.qgis.org/3.16/en/docs/user_manual/working_with_vector/attribute_table.html#creating-a-virtual-field">virtual field</a> in the population layer that combines these two values in order to provide an appropriate target field for the table join.</p> <p>Open the attribute table for the population layer, and click on the &ldquo;Open field calculator&rdquo; button (it looks like an abacus). Enter <code>geoid</code> for the field name, select the &ldquo;Create virtual field&rdquo; checkbox, and select &ldquo;Text (string)&rdquo; for the field type. In the &ldquo;Expression&rdquo; field, enter:</p> <pre tabindex="0"><code>lpad(to_string(&#34;STATE&#34;), 2, &#39;0&#39;) || lpad(to_string(&#34;COUNTY&#34;), 3, &#39;0&#39;) </code></pre> <figure class="left" > <img src="create-virtual-field.png" /> </figure> <p>When you return to the attribute table, you will see a new <code>geoid</code> field that contains our desired value. We can now perform the table join.</p> <p>Open the properties for the <code>cb_2018_us_county_5m_clipped</code> layer we created earlier, and select the &ldquo;Joins&rdquo; tab. Click on the &ldquo;+&rdquo; button. For &ldquo;Join layer&rdquo;, select <code>co-est2019-alldata</code>. Select <code>geoid</code> for &ldquo;Join field&rdquo; and <code>GEOID</code> for target field. Lastly, select the &ldquo;Custom field name prefix&rdquo; checkbox and enter <code>pop_</code> in the field, then click &ldquo;OK&rdquo;.</p> <figure class="left" > <img src="county-join-population.png" /> </figure> <p>If you examine the attribute table for the layer, you will see that each county feature is now linked to the appropriate population data for that county.</p> <h3 id="merging-covid-data-with-our-geographic-data">Merging Covid data with our geographic data</h3> <p>This is another table join operation, but the process is going to be a little different. 
The previous process assumes a 1-1 mapping between features in the layers being joined, but the Covid dataset has many data points for each county. We need a solution that will produce the desired 1-many mapping.</p> <p>We can achieve this using the &ldquo;Join attributes by field value&rdquo; action in the &ldquo;Processing&rdquo; toolbox.</p> <p>Start by adding the <code>us-counties.csv</code> file from the NYT covid dataset to the project.</p> <p>Select &ldquo;Toolbox-&gt;Processing&rdquo; to show the Processing toolbox, if it&rsquo;s not already visible. In the &ldquo;Search&rdquo; field, enter &ldquo;join&rdquo;, and then look for &ldquo;Join attributes by field value&rdquo; in the &ldquo;Vector general&rdquo; section.</p> <p>Double click on this to open the input dialog. For &ldquo;Input layer&rdquo;, select <code>cb_2018_us_county_5m_clipped</code>, and &ldquo;Table field&rdquo; select <code>GEOID</code>. For &ldquo;Input layer 2&rdquo;, select <code>us-counties</code>, and for &ldquo;Table field 2&rdquo; select <code>fips</code>. In the &ldquo;Join type&rdquo; menu, select &ldquo;Create separate feature for each matching feature (one-to-many)&rdquo;. Ensure the &ldquo;Discard records which could not be joined&rdquo; is checked. Enter <code>covid_</code> in the &ldquo;Joined field prefix [optional]&rdquo; field (this will cause the fields in the resulting layer to have names like <code>covid_date</code>, <code>covid_cases</code>, etc). Click the &ldquo;Run&rdquo; button to create the new layer.</p> <figure class="left" > <img src="county-join-covid.png" /> </figure> <p>You will end up with a new layer named &ldquo;Joined layer&rdquo;. I suggest renaming this to <code>cb_2018_us_county_5m_covid</code>. If you enable the &ldquo;show feature count&rdquo; checkbox for your layers, you will see that while the <code>cb_2018_us_county_5m_clipped</code> has 129 features, the new <code>cb_2018_us_county_5m_covid</code> layer has over 32,000 features. 
That&rsquo;s because for each county, there are around 320 data points tracking Covid cases (etc) over time.</p> <figure class="left" > <img src="layers-feature-count.png" /> </figure> <h2 id="styling">Styling</h2> <h3 id="creating-outlines">Creating outlines</h3> <p>The only layer on our map that should have filled features will be the covid data layer. We want to configure our other layers to only display outlines.</p> <p>First, arrange the layers in the following order (from top to bottom):</p> <ol> <li>cb_2018_us_state_5m</li> <li>cb_2018_us_county_5m_clipped</li> <li>cb_2018_us_county_5m_covid</li> </ol> <p>The order of the csv layers doesn&rsquo;t matter, and if you still have the original <code>cb_2018_us_county_5m</code> layer in your project it should be hidden.</p> <p>Configure the state layer to display outlines. Right click on the layer and select &ldquo;Properties&rdquo;, then select the &ldquo;Symbology&rdquo; tab. Click on the &ldquo;Simple Fill&rdquo; item at the top, then in the &ldquo;Symbol layer type&rdquo; menu select &ldquo;Simple Line&rdquo;. Set the stroke width to 0.66mm.</p> <p>As long as we&rsquo;re here, let&rsquo;s also enable labels for the state layer. Select the &ldquo;Labels&rdquo; tab, then set the menu at the top to &ldquo;Single Labels&rdquo;. Set the &ldquo;Value&rdquo; field to &ldquo;Name&rdquo;. Click the &ldquo;Apply&rdquo; button to show the labels on the map without closing the window; now adjust the font size (and click &ldquo;Apply&rdquo; again) until things look the way you want. 
To make the labels a bit easier to read, select the &ldquo;Buffer&rdquo; panel, and check the &ldquo;Draw text buffer&rdquo; checkbox.</p> <p>Now do the same thing (except don&rsquo;t enable labels) with the <code>cb_2018_us_county_5m_clipped</code> layer, but set the stroke width to 0.46mm.</p> <p>If you hide the Covid layer, your map should look like this (don&rsquo;t forget to unhide the Covid layer for the next step):</p> <figure class="left" > <img src="map-outlines.png" /> </figure> <h3 id="creating-graduated-colors">Creating graduated colors</h3> <p>Open the properties for the <code>cb_2018_us_county_5m_covid</code> layer, and select the &ldquo;Symbology&rdquo; tab. At the top of the symbology panel is a menu currently set to &ldquo;Single Symbol&rdquo;. Set this to &ldquo;Graduated&rdquo;.</p> <p>Open the expression editor for the &ldquo;Value&rdquo; field, and set it to:</p> <pre tabindex="0"><code>(to_int(&#34;cases&#34;) / &#34;pop_POPESTIMATE2019&#34;) * 1000000 </code></pre><p>Set the &ldquo;Color ramp&rdquo; to &ldquo;Spectral&rdquo;, and then select &ldquo;Invert color ramp&rdquo;.</p> <p>Ensure the &ldquo;Mode&rdquo; menu is set to &ldquo;Equal count (Quantile)&rdquo;, and then set &ldquo;Classes&rdquo; to 15. This will give a set of graduated categories that looks like this:</p> <figure class="left" > <img src="graduated-categories.png" /> </figure> <p>Close the properties window. Your map should look something like this:</p> <figure class="left" > <img src="map-graduated-1.png" /> </figure> <p>That&rsquo;s not very exciting yet, is it? Let&rsquo;s move on to the final section of this article.</p> <h2 id="animating-the-data">Animating the data</h2> <p>For this final step, we need to enable the QGIS <a href="https://plugins.qgis.org/plugins/timemanager/">TimeManager</a> plugin. 
Install the TimeManager plugin if it&rsquo;s not already installed: open the plugin manager (&ldquo;Plugins-&gt;Manage and Install Plugins&hellip;&rdquo;), and ensure both that TimeManager is installed and that it is enabled (the checkbox to the left of the plugin name is checked).</p> <p>Return to the project and open the TimeManager panel: select &ldquo;Plugins-&gt;TimeManager-&gt;Toggle visibility&rdquo;. This will display the following panel below the map:</p> <figure class="left" > <img src="timemanager-panel-initial.png" /> </figure> <p>Make sure that the &ldquo;Time frame size&rdquo; is set to &ldquo;1 days&rdquo;.</p> <p>Click the &ldquo;Settings&rdquo; button to open the TimeManager settings window, then select the &ldquo;Add layer&rdquo; button. In the resulting window, select the <code>cb_2018_us_county_5m_covid</code> layer in the &ldquo;Layer&rdquo; menu, then select the <code>covid_date</code> column in the &ldquo;Start time&rdquo; menu. Leave all other values at their defaults and click &ldquo;OK&rdquo; to return to the TimeManager settings.</p> <figure class="left" > <img src="timemanager-add-layer.png" /> </figure> <p>You will see the layer we just added listed in the &ldquo;Layers&rdquo; list. Look for the &ldquo;Time Format&rdquo; column in this list, which will say &ldquo;TO BE INFERRED&rdquo;. Click in this column and change the value to <code>%Y-%m-%d</code> to match the format of the dates in the <code>covid_date</code> field.</p> <figure class="left" > <img src="timemanager-settings-final.png" /> </figure> <p>You may want to change the &ldquo;Show frame for&rdquo; setting from the default to something like 50 milliseconds. Leave everything else at the defaults and click the &ldquo;OK&rdquo; button.</p> <p>Ensure that the TimeManager is enabled by clicking on the &ldquo;power button&rdquo; in the TimeManager panel. 
TimeManager is enabled when the power button is green.</p> <p>Disabled:</p> <figure class="left" > <img src="timemanager-disabled.png" /> </figure> <p>Enabled:</p> <figure class="left" > <img src="timemanager-enabled.png" /> </figure> <p>Once TimeManager is enabled, you should be able to use the slider to view the map at different times. For example, here&rsquo;s the map in early May:</p> <figure class="left" > <img src="timemanager-early-may.png" /> </figure> <p>And here it is in early November:</p> <figure class="left" > <img src="timemanager-early-november.png" /> </figure> <p>To animate the map, click the play button in the bottom left of the TimeManager panel.</p> <p>You can export the animation to a video using the &ldquo;Export Video&rdquo; button. Assuming that you have <a href="https://ffmpeg.org/">ffmpeg</a> installed, you can select an output directory, select the &ldquo;Video (required ffmpeg &hellip;)&rdquo; button, then click &ldquo;OK&rdquo;. You&rsquo;ll end up with (a) a PNG format image file for each frame and (b) a file named <code>out.mp4</code> containing the exported video.</p> <h2 id="datasets">Datasets</h2> <p>I have made all the data referenced in this post available at <a href="https://github.com/larsks/ne-covid-map">https://github.com/larsks/ne-covid-map</a>.</p>

December 13, 2020 12:00 AM

December 12, 2020

Lars Kellogg-Stedman

Postgres and permissions in OpenShift

Folks running the official postgres image in OpenShift will often encounter a problem when first trying to boot the container. Given a pod description something like this:

    apiVersion: v1
    kind: Pod
    metadata:
      name: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:13
          ports:
            - containerPort: 5432
          volumeMounts:
            - mountPath: /var/lib/postgresql/data
              name: postgres-data
          envFrom:
            - secretRef:
                name: postgres-secret
      volumes:
        - name: postgres-data
          persistentVolumeClaim:
            claimName: postgres-data-pvc

The container will fail to start and the logs will show the following error:

December 12, 2020 12:00 AM

November 18, 2020

Adam Young

Keystone and Cassandra: Parity with SQL

Look back at our Pushing Keystone over the Edge presentation from the OpenStack Summit. Many of the points we make are problems faced by any application trying to scale across multiple datacenters. Cassandra is a database designed to deal with this level of scale. So Cassandra may well be a better choice than MySQL or another RDBMS as a datastore for Keystone. What would it take to enable Cassandra support for Keystone?

Let’s start with the easy part: defining the tables. Let’s look at how we define the Federation back end for SQL. We use SQLAlchemy to handle the migrations: we will need something comparable for Cassandra Query Language (CQL), but we also need to translate the table definitions themselves.

Before we create the tables, we need to create a keyspace. I am going to make separate keyspaces for each of the subsystems in Keystone: Identity, Assignment, Federation, and so on. Here’s the Federation one:

CREATE KEYSPACE keystone_federation WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': '3'}  AND durable_writes = true;

The Identity provider table is defined like this:

    idp_table = sql.Table(
        'identity_provider',
        meta,
        sql.Column('id', sql.String(64), primary_key=True),
        sql.Column('enabled', sql.Boolean, nullable=False),
        sql.Column('description', sql.Text(), nullable=True),
        mysql_engine='InnoDB',
        mysql_charset='utf8')
    idp_table.create(migrate_engine, checkfirst=True)

The comparable CQL to create a table would look like this:

CREATE TABLE identity_provider (id text PRIMARY KEY, enabled boolean, description text);

However, when I describe the schema to view the table definition, I see that there are many tuning and configuration parameters that are defaulted:

CREATE TABLE federation.identity_provider (
    id text PRIMARY KEY,
    description text,
    enabled boolean
) WITH additional_write_policy = '99p'
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND cdc = false
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND default_time_to_live = 0
    AND extensions = {}
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair = 'BLOCKING'
    AND speculative_retry = '99p';

I don’t know Cassandra well enough to say if these are sane defaults to have in production. I do know that someone, somewhere, is going to want to tweak them, and we are going to have to provide a means to do so without battling the upgrade scripts. I suspect we are going to want to use only the short form (what I typed into the CQL prompt) in the migrations, not the form with all of the options. In addition, we might want an IF NOT EXISTS clause on the table creation to allow people to make these changes themselves. Then again, that might make things get out of sync. Hmmm.
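As a rough illustration of keeping the migrations to the short form, a migration helper could build the statement from nothing more than column names and types, leaving every tuning option at whatever the operator has set. This is only a sketch: `create_table_cql` is a made-up helper, not part of Keystone or of any Cassandra library, and the column list mirrors the SQL definition of the identity provider table.

```python
def create_table_cql(table, columns, primary_key, if_not_exists=True):
    """Return a short-form CQL CREATE TABLE statement.

    columns is a list of (name, cql_type) tuples and primary_key is a list
    of column names. No WITH options are emitted, so table tuning stays
    outside the migration.
    """
    names = [name for name, _ in columns]
    missing = [key for key in primary_key if key not in names]
    if missing:
        raise ValueError(f"primary key columns not defined: {missing}")
    column_sql = ", ".join(f"{name} {cql_type}" for name, cql_type in columns)
    guard = "IF NOT EXISTS " if if_not_exists else ""
    return (
        f"CREATE TABLE {guard}{table} "
        f"({column_sql}, PRIMARY KEY ({', '.join(primary_key)}));"
    )


print(
    create_table_cql(
        "identity_provider",
        [("id", "text"), ("enabled", "boolean"), ("description", "text")],
        ["id"],
    )
)
```

The out-of-sync worry still applies: with IF NOT EXISTS, a table that already exists with different columns is silently left alone.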

There are three more entities in this back end:

CREATE TABLE federation_protocol (id text, idp_id text, mapping_id text, PRIMARY KEY (id, idp_id));
CREATE TABLE mapping (id text PRIMARY KEY, rules text);
CREATE TABLE service_provider (auth_url text, id text PRIMARY KEY, enabled boolean, description text, sp_url text, relay_state_prefix text);

One thing that is interesting is that we will not be limiting the ID fields to 32, 64, or 128 characters. There is no performance benefit to doing so in Cassandra, nor is there any way to enforce the length limits. From a Keystone perspective, there is not much value either; we still need to validate the UUIDs in Python code. We could autogenerate the UUIDs in Cassandra, and there might be some benefit to that, but it would diverge from the logic in the Keystone code, and explode the test matrix.

There is only one foreign key in the SQL section; the federation protocol has an idp_id that points to the identity provider table. Cassandra does not enforce foreign keys, so we’ll have to accept this limitation and ensure the integrity is maintained in code. We can do this by looking up the Identity provider before inserting the protocol entry. Since creating a Federated entity is a rare and administrative task, the risk here is vanishingly small. It will be more significant elsewhere.
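The lookup-before-insert check might look something like this. It is a sketch only: the class is an in-memory stand-in for the two tables, not a Keystone driver, and all the names are hypothetical. The lookup and the insert are two separate steps, so an identity provider deleted in between would still leave an orphaned protocol; that is the small risk accepted above.

```python
class FederationStore:
    """In-memory stand-in for the identity_provider and federation_protocol tables."""

    def __init__(self):
        self.identity_providers = {}
        self.protocols = {}  # keyed by (protocol_id, idp_id), like the CQL primary key

    def create_identity_provider(self, idp_id, enabled=True, description=None):
        self.identity_providers[idp_id] = {
            "enabled": enabled,
            "description": description,
        }

    def create_protocol(self, protocol_id, idp_id, mapping_id):
        # Stand-in for the foreign key: refuse to insert a protocol that
        # points at an identity provider we cannot find.
        if idp_id not in self.identity_providers:
            raise LookupError(f"identity provider {idp_id!r} does not exist")
        self.protocols[(protocol_id, idp_id)] = {"mapping_id": mapping_id}


store = FederationStore()
store.create_identity_provider("example-idp")
store.create_protocol("saml2", "example-idp", "example-mapping")
```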

For access to the database, we should probably use Flask-CQLAlchemy. Fortunately, Keystone is already a Flask based project, so this makes the two projects align.

For migration support, it looks like the best option out there is cassandra-migrate.

An effort like this would best be started out of tree, with an expectation that it would be merged in once it had shown a degree of maturity. Thus, I would put it into a namespace that would not conflict with the existing keystone project. The python imports would look like:

from keystone.cassandra import migrations
from keystone.cassandra import identity
from keystone.cassandra import federation

This could go in its own git repo and be separately pip installed for development. The entrypoints would be registered such that the configuration file would have entries like:

[application_credential]
driver = cassandra

Any tuning of the database could be put under a [cassandra] section of the conf file, or tuning for individual sections could be in keys prefixed with cassandra_ in the appropriate sections, such as application_credential as shown above.

It might be interesting to implement a Cassandra token backend and use the default_time_to_live value on the table to control the lifespan and automate the cleanup of the tables. This might provide some performance benefit over the fernet approach, as the token data would be cached. However, the drawbacks due to token invalidation upon change of data would far outweigh the benefits unless the TTL was very short, perhaps 5 minutes.

Just making it work is one thing. In a follow-on article, I’d like to go through what it would take to stretch a cluster from one datacenter to another, and to make sure that the other considerations that we discussed in that presentation are covered.

Feedback?

by Adam Young at November 18, 2020 09:41 PM

November 16, 2020

RDO Blog

RDO Victoria Released

The RDO community is pleased to announce the general availability of the RDO build for OpenStack Victoria for RPM-based distributions, CentOS Linux and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Victoria is the 22nd release from the OpenStack project, which is the work of more than 1,000 contributors from around the world.

The release is already available on the CentOS mirror network at http://mirror.centos.org/centos/8/cloud/x86_64/openstack-victoria/.

The RDO community project curates, packages, builds, tests and maintains a complete OpenStack component set for RHEL and CentOS Linux and is a member of the CentOS Cloud Infrastructure SIG. The Cloud Infrastructure SIG focuses on delivering a great user experience for CentOS Linux users looking to build and maintain their own on-premise, public or hybrid clouds.

All work on RDO and on the downstream release, Red Hat OpenStack Platform, is 100% open source, with all code changes going upstream first.

PLEASE NOTE: RDO Victoria provides packages for CentOS 8 and Python 3 only. Please use the Train release for CentOS 7 and Python 2.7.

Interesting things in the Victoria release include:

  • With the Victoria release, source tarballs are validated using the upstream GPG signature. This certifies that the source is identical to what is released upstream and ensures the integrity of the packaged source code.
  • With the Victoria release, openvswitch/ovn are not shipped as part of RDO. Instead RDO relies on builds from the CentOS NFV SIG.
  • Some new packages have been added to RDO during the Victoria release:
    • ansible-collections-openstack: This package includes OpenStack modules and plugins which are supported by the OpenStack community to help with the management of OpenStack infrastructure.
    • ansible-tripleo-ipa-server: This package contains Ansible for configuring the FreeIPA server for TripleO.
    • python-ibmcclient: This package contains the python library to communicate with HUAWEI iBMC based systems.
    • puppet-powerflex: This package contains the puppet module needed to deploy PowerFlex with TripleO.
  • The following packages have been retired from the RDO OpenStack distribution in the Victoria release:
    • The Congress project, an open policy framework for the cloud, has been retired upstream and from the RDO project in the Victoria release.
    • neutron-fwaas, the Firewall as a Service driver for neutron, is no longer maintained and has been removed from RDO.

Other highlights of the broader upstream OpenStack project may be read via https://releases.openstack.org/victoria/highlights.

Contributors
During the Victoria cycle, we saw the following new RDO contributors:

Amy Marrich (spotz)
Daniel Pawlik
Douglas Mendizábal
Lance Bragstad
Martin Chacon Piza
Paul Leimer
Pooja Jadhav
Qianbiao NG
Rajini Karthik
Sandeep Yadav
Sergii Golovatiuk
Steve Baker

Welcome to all of you and Thank You So Much for participating!

But we wouldn’t want to overlook anyone. A super massive Thank You to all 58 contributors who participated in producing this release. This list includes commits to rdo-packages, rdo-infra, and redhat-website repositories:

Adam Kimball
Ade Lee
Alan Pevec
Alex Schultz
Alfredo Moralejo
Amol Kahat
Amy Marrich (spotz)
Arx Cruz
Bhagyashri Shewale
Bogdan Dobrelya
Cédric Jeanneret
Chandan Kumar
Damien Ciabrini
Daniel Pawlik
Dmitry Tantsur
Douglas Mendizábal
Emilien Macchi
Eric Harney
Francesco Pantano
Gabriele Cerami
Gael Chamoulaud
Gorka Eguileor
Grzegorz Grasza
Harald Jensås
Iury Gregory Melo Ferreira
Jakub Libosvar
Javier Pena
Joel Capitao
Jon Schlueter
Lance Bragstad
Lon Hohberger
Luigi Toscano
Marios Andreou
Martin Chacon Piza
Mathieu Bultel
Matthias Runge
Michele Baldessari
Mike Turek
Nicolas Hicher
Paul Leimer
Pooja Jadhav
Qianbiao.NG
Rabi Mishra
Rafael Folco
Rain Leander
Rajini Karthik
Riccardo Pittau
Ronelle Landy
Sagi Shnaidman
Sandeep Yadav
Sergii Golovatiuk
Slawek Kaplonski
Soniya Vyas
Sorin Sbarnea
Steve Baker
Tobias Urdin
Wes Hayutin
Yatin Karel

The Next Release Cycle
At the end of one release, focus shifts immediately to the next release, i.e., Wallaby.

Get Started
There are three ways to get started with RDO.

To spin up a proof of concept cloud, quickly, and on limited hardware, try an All-In-One Packstack installation. You can run RDO on a single node to get a feel for how it works.

For a production deployment of RDO, use TripleO and you’ll be running a production cloud in short order.

Finally, for those that don’t have any hardware or physical resources, there’s the OpenStack Global Passport Program. This is a collaborative effort between OpenStack public cloud providers to let you experience the freedom, performance and interoperability of open source infrastructure. You can quickly and easily gain access to OpenStack infrastructure via trial programs from participating OpenStack public cloud providers around the world.

Get Help
The RDO Project has the users@lists.rdoproject.org mailing list for RDO-specific users and operators. For more developer-oriented content we recommend joining the dev@lists.rdoproject.org mailing list. Remember to post a brief introduction about yourself and your RDO story. The mailing list archives are all available at https://mail.rdoproject.org. You can also find extensive documentation on RDOproject.org.

The #rdo channel on Freenode IRC is also an excellent place to find and give help.

We also welcome comments and requests on the CentOS devel mailing list and the CentOS and TripleO IRC channels (#centos, #centos-devel, and #tripleo on irc.freenode.net); however, we have a more focused audience within the RDO venues.

Get Involved
To get involved in the OpenStack RPM packaging effort, check out the RDO contribute pages, peruse the CentOS Cloud SIG page, and inhale the RDO packaging documentation.

Join us in #rdo and #tripleo on the Freenode IRC network and follow us on Twitter @RDOCommunity. You can also find us on Facebook and YouTube.

by Amy Marrich at November 16, 2020 02:27 PM

October 22, 2020

Adam Young

Adding Nodes to Ironic

TheJulia was kind enough to update the docs for Ironic to show me how to include IPMI information when creating nodes.

To delete all the old nodes

for UUID in `openstack baremetal node list -f json | jq -r '.[] | .UUID' ` ; do openstack baremetal node delete $UUID; done

nodes definition

I removed the ipmi common data from each definition as there is a password there, and I will set that afterwards on all nodes.

{
  "nodes": [
    {
      "ports": [
        {
          "address": "00:21:9b:93:d0:90"
        }
      ],
      "name": "zygarde",
      "driver": "ipmi",
      "driver_info": {
        "ipmi_address": "192.168.123.10"
      }
    },
    {
      "ports": [
        {
          "address": "00:21:9b:9b:c4:21"
        }
      ],
      "name": "umbreon",
      "driver": "ipmi",
      "driver_info": {
        "ipmi_address": "192.168.123.11"
      }
    },
    {
      "ports": [
        {
          "address": "00:21:9b:98:a3:1f"
        }
      ],
      "name": "zubat",
      "driver": "ipmi",
      "driver_info": {
        "ipmi_address": "192.168.123.12"
      }
    }
  ]
}
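Hand-editing a file like this is where the stray tabs and misplaced braces creep in. As a rough sketch (not something from the Ironic docs), the same definition could be generated from a short inventory list. The INVENTORY list and the function names here are my own invention; the field names are the ones used in the file above.

```python
import json

# Hypothetical inventory: (name, MAC address, IPMI address), copied from the
# example file above.
INVENTORY = [
    ("zygarde", "00:21:9b:93:d0:90", "192.168.123.10"),
    ("umbreon", "00:21:9b:9b:c4:21", "192.168.123.11"),
    ("zubat", "00:21:9b:98:a3:1f", "192.168.123.12"),
]


def build_nodes(inventory):
    """Return a dict in the same shape as the nodes file shown above."""
    return {
        "nodes": [
            {
                "ports": [{"address": mac}],
                "name": name,
                "driver": "ipmi",
                "driver_info": {"ipmi_address": ipmi_address},
            }
            for name, mac, ipmi_address in inventory
        ]
    }


def render_nodes(inventory):
    """Serialize the definition; json.dumps always emits well-formed JSON."""
    return json.dumps(build_nodes(inventory), indent=2)


if __name__ == "__main__":
    print(render_nodes(INVENTORY))
```

Redirecting the output to nodes.ipmi.json should give a file equivalent to the one above, still without the IPMI credentials.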

Create the nodes

openstack baremetal create  ./nodes.ipmi.json 

Check that the nodes are present

$ openstack baremetal node list
+--------------------------------------+---------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name    | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+---------+---------------+-------------+--------------------+-------------+
| 3fa4feae-0d5c-4e38-a012-29258d40651b | zygarde | None          | None        | enroll             | False       |
| 00965ad4-c972-46fa-948a-3ce87aecf5ac | umbreon | None          | None        | enroll             | False       |
| 8702ea0c-aa10-4542-9292-3b464fe72036 | zubat   | None          | None        | enroll             | False       |
+--------------------------------------+---------+---------------+-------------+--------------------+-------------+

Update IPMI common data

for UUID in `openstack baremetal node list -f json | jq -r '.[] | .UUID' ` ; 
do  openstack baremetal node set $UUID --driver-info ipmi_password=`cat ~/ipmi.password`  --driver-info   ipmi_username=admin   ; 
done

EDIT: I had ipmi_user before and it does not work. Needs to be ipmi_username.

Final Check

If I look at the data returned for a node, I can see that the password is not readable:

$ openstack baremetal node show zubat  -f yaml | grep ipmi_password
  ipmi_password: '******'

Power On

for UUID in `openstack baremetal node list -f json | jq -r '.[] | .UUID' ` ; do  openstack baremetal node power on $UUID  ; done

Change “on” to “off” to power off.

by Adam Young at October 22, 2020 03:14 AM

October 15, 2020

Adam Young

Introduction to Ironic

“I can do any thing. I can’t do everything.”

The sheer number of projects and problem domains covered by OpenStack was overwhelming. I never learned several of the other projects under the big tent. One project that is getting relevant to my day job is Ironic, the bare metal provisioning service. Here are my notes from spelunking the code.

The Setting

I want just Ironic. I don’t want Keystone (personal grudge) or Glance or Neutron or Nova.

Ironic will write files to e.g. /var/lib/tftp and /var/www/html/pxe and will not handle DHCP, but can make use of static DHCP configurations.

Ironic is just an API server at this point (a Python-based web service) that manages the above files, and that can also talk to the IPMI ports on my servers to wake them up and perform configurations on them.

I need to provide ISO images to Ironic so it can put them in the right place to boot them.

Developer steps

I checked the code out of git. I am working off the master branch.

I ran tox to ensure the unit tests are all at 100%

I have mysql already installed and running, but with a Keystone Database. I need to make a new one for ironic. The database name, user, and password are all going to be ironic, to keep things simple.

CREATE USER 'ironic'@'localhost' IDENTIFIED BY 'ironic';
create database ironic;
GRANT ALL PRIVILEGES ON ironic.* TO 'ironic'@'localhost';
FLUSH PRIVILEGES;

Note that I did this as the Keystone user. That dude has way too much privilege… good thing this is JUST for DEVELOPMENT. This will be used to follow the steps in the developer's quickstart docs. I also set the mysql URL in the config file to this:

connection = mysql+pymysql://ironic:ironic@localhost/ironic

Then I can run ironic db sync. Let’s see what I got:

mysql ironic --user ironic --password
#....
MariaDB [ironic]> show tables;
+-------------------------------+
| Tables_in_ironic              |
+-------------------------------+
| alembic_version               |
| allocations                   |
| bios_settings                 |
| chassis                       |
| conductor_hardware_interfaces |
| conductors                    |
| deploy_template_steps         |
| deploy_templates              |
| node_tags                     |
| node_traits                   |
| nodes                         |
| portgroups                    |
| ports                         |
| volume_connectors             |
| volume_targets                |
+-------------------------------+
15 rows in set (0.000 sec)

OK, so the first table shows that Ironic uses Alembic to manage migrations. Unlike the SQLAlchemy migrations table, you can’t just query this table to see how many migrations have been performed:

MariaDB [ironic]> select * from alembic_version;
+--------------+
| version_num  |
+--------------+
| cf1a80fdb352 |
+--------------+
1 row in set (0.000 sec)

Running The Services

The script to start the API server is:
ironic-api -d --config-file etc/ironic/ironic.conf.local

Looking in the file requirements.txt, I see that the web framework for Ironic is Pecan:

$ grep pecan requirements.txt 
pecan!=1.0.2,!=1.0.3,!=1.0.4,!=1.2,>=1.0.0 # BSD

This is new to me. On Keystone, we converted from no framework to Flask. I’m guessing that if I look in the chain that starts with ironic-api file, I will see a Pecan launcher for a web application. We can find that file with

$ which ironic-api
/opt/stack/ironic/.tox/py3/bin/ironic-api

Looking in that file, it references ironic.cmd.api, which is the file ironic/cmd/api.py which in turn refers to ironic/common/wsgi_service.py. This in turn refers to ironic/api/app.py from which we can finally see that it imports pecan.

Now I am ready to run the two services. Like most of OpenStack, there is an API server and a “worker” server. In Ironic, this is called the Conductor. This maps fairly well to the Operator pattern in Kubernetes. In this pattern, the user makes changes to the API server via a web VERB on a URL, possibly with a body. These changes represent a desired state. The state change is then performed asynchronously. In OpenStack, the asynchronous communication is performed via a message queue, usually RabbitMQ. The Ironic team has a simpler mechanism used for development: JSON-RPC. This happens to be the same mechanism used in FreeIPA.
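To make the pattern concrete, here is a toy model of it: an "API" function that only records the desired state and returns, and a "conductor" thread that applies the change later. None of these names come from Ironic; an in-process queue stands in for the message queue or JSON-RPC transport, and the whole thing is only a sketch of the idea.

```python
import queue
import threading

# Toy model only: every name below is hypothetical, not an Ironic API.
pending = queue.Queue()   # stands in for RabbitMQ or JSON-RPC
actual_state = {}         # what the "conductor" has actually applied
state_lock = threading.Lock()


def api_set_power(node, target):
    """API side: record the desired state and return immediately."""
    pending.put((node, target))
    return {"node": node, "target_power_state": target}


def conductor():
    """Worker side: apply queued changes until a None sentinel arrives."""
    while True:
        item = pending.get()
        if item is None:
            break
        node, target = item
        with state_lock:
            actual_state[node] = target


def run_demo():
    """Submit two requests, let the worker drain them, return the result."""
    worker = threading.Thread(target=conductor)
    worker.start()
    api_set_power("zubat", "power on")
    api_set_power("umbreon", "power off")
    pending.put(None)  # tell the worker to stop once the queue is drained
    worker.join()
    with state_lock:
        return dict(actual_state)
```

The point of the split is that api_set_power never waits for the hardware; the caller gets an acknowledgement of the desired state and has to ask again later to see whether it has been reached.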

Command Line

OK, once I got the service running, I had to do a little fiddling around to get the command lines to work. There was an old reference to

OS_AUTH_TYPE=token_endpoint

which needed to be replaced with

OS_AUTH_TYPE=none

Both are in the documentation, but only the second one will work.

I can run the following commands:

$ baremetal driver list
+---------------------+----------------+
| Supported driver(s) | Active host(s) |
+---------------------+----------------+
| fake-hardware       | ayoungP40      |
+---------------------+----------------+
$ baremetal node list


curl

Let’s see if I can figure out from curl what APIs those are… There is only one version, and one link, so:

curl http://127.0.0.1:6385 | jq '.versions  | .[] | .links | .[] |  .href'

"http://127.0.0.1:6385/v1/"


Doing curl against that link gives a list of the top level resources:

  • media_types
  • chassis
  • nodes
  • drivers

And I assume that, if I use curl to GET the drivers, I should see the fake driver entry from above:

$ curl "http://127.0.0.1:6385/v1/drivers" | jq '.drivers |.[] |.name'

"fake-hardware"
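For anyone who prefers Python to jq, the two filters above translate fairly directly. This is a minimal sketch using only the standard library; the function names are mine, and the sample documents in the usage note are trimmed to the fields the filters actually touch, so they are not the full API responses.

```python
import json
import urllib.request


def fetch_json(url):
    """GET a URL and decode the JSON body (needs a running API server)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def version_hrefs(root_doc):
    """Equivalent of: jq '.versions | .[] | .links | .[] | .href'"""
    return [
        link["href"]
        for version in root_doc["versions"]
        for link in version["links"]
    ]


def driver_names(drivers_doc):
    """Equivalent of: jq '.drivers | .[] | .name'"""
    return [driver["name"] for driver in drivers_doc["drivers"]]


# Trimmed, illustrative documents containing only the fields used above.
SAMPLE_ROOT = {"versions": [{"links": [{"href": "http://127.0.0.1:6385/v1/"}]}]}
SAMPLE_DRIVERS = {"drivers": [{"name": "fake-hardware"}]}
```

With the API server from this post running, version_hrefs(fetch_json("http://127.0.0.1:6385")) should return the same link that jq printed; against the samples, version_hrefs(SAMPLE_ROOT) and driver_names(SAMPLE_DRIVERS) can be tried without a server.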

OK, that is enough to get started. I am going to try and do the same with the RPMs that we ship with OSP and see what I get there.

But that is a tale for another day.

Thank You

I had a conversation with Julia Kreger, a long time core member of the Ironic project. This helped get me oriented.

by Adam Young at October 15, 2020 07:27 PM

October 05, 2020

Lars Kellogg-Stedman

A note about running gpgv

I found the following error from gpgv to be a little opaque:

gpgv: unknown type of key resource 'trustedkeys.kbx'
gpgv: keyblock resource '/home/lars/.gnupg/trustedkeys.kbx': General error
gpgv: Can't check signature: No public key

It turns out that’s gpg-speak for “your trustedkeys.kbx keyring doesn’t exist”. That took longer to figure out than I care to admit. To get a key from your regular public keyring into your trusted keyring, you can run something like the following:

gpg --export -a lars@oddbit.com |
gpg --no-default-keyring --keyring ~/.gnupg/trustedkeys.kbx --import

After which gpgv works as expected:

$ echo hello world | gpg -s -u lars@oddbit.com | gpgv
gpgv: Signature made Mon 05 Oct 2020 07:44:22 PM EDT
gpgv: using RSA key FDE8364F7FEA3848EF7AD3A6042DF6CF74E4B84C
gpgv: issuer "lars@oddbit.com"
gpgv: Good signature from "Lars Kellogg-Stedman <lars@oddbit.com>"
gpgv: aka "keybase.io/larsks <larsks@keybase.io>"

October 05, 2020 12:00 AM

September 27, 2020

Lars Kellogg-Stedman

Installing metallb on OpenShift with Kustomize

Out of the box, OpenShift (4.x) on bare metal doesn’t come with any integrated load balancer support (when installed in a cloud environment, OpenShift typically makes use of the load balancing features available from the cloud provider). Fortunately, there are third party solutions available that are designed to work in bare metal environments. MetalLB is a popular choice, but requires some minor fiddling to get it to run properly on OpenShift.

If you read through the installation instructions, you will see this note about installation on OpenShift:

To run MetalLB on Openshift, two changes are required: changing the pod UIDs, and granting MetalLB additional networking privileges.

Pods get UIDs automatically assigned based on an OpenShift-managed UID range, so you have to remove the hardcoded unprivileged UID from the MetalLB manifests. You can do this by removing the spec.template.spec.securityContext.runAsUser field from both the controller Deployment and the speaker DaemonSet.

Additionally, you have to grant the speaker DaemonSet elevated privileges, so that it can do the raw networking required to make LoadBalancers work. You can do this with:

The docs here suggest some manual changes you can make, but it’s possible to get everything installed correctly using Kustomize (which makes sense especially given that the MetalLB docs already include instructions on using Kustomize).

A vanilla installation of MetalLB with Kustomize uses a kustomization.yml file that looks like this:

namespace: metallb-system
resources:
- github.com/metallb/metallb//manifests?ref=v0.9.3
- configmap.yml
- secret.yml

(Where configmap.yml and secret.yml are files you create locally containing, respectively, the MetalLB configuration and a secret used to authenticate cluster members.)

Fixing the security context

In order to remove the runAsUser directive from the template securityContext setting, we can use the patchesStrategicMerge feature. In our kustomization.yml file we add:

patches:
  - |-
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: controller
      namespace: metallb-system
    spec:
      template:
        spec:
          securityContext:
            $patch: replace
            runAsNonRoot: true

This instructs kustomize to replace the contents of the securityContext key with the value included in the patch (without the $patch: replace directive, the default behavior is to merge the contents, which in this situation would effectively be a no-op).
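The difference between merging and replacing is easy to see with plain dictionaries. The following is a simplified model of the behavior described above, not kustomize's actual implementation: it only handles maps, and the runAsUser value in the example is an assumption about what the upstream manifest contains.

```python
import copy


def apply_patch(base, patch):
    """Merge patch into a copy of base.

    A map in the patch that carries '$patch': 'replace' replaces the
    corresponding map in base wholesale; any other map is merged key by key.
    """
    result = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and value.get("$patch") == "replace":
            # Drop the directive itself and keep only the replacement content.
            result[key] = {
                k: copy.deepcopy(v) for k, v in value.items() if k != "$patch"
            }
        elif isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_patch(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Assumed upstream content; the exact UID is illustrative.
BASE = {"securityContext": {"runAsNonRoot": True, "runAsUser": 65534}}

MERGED = apply_patch(BASE, {"securityContext": {"runAsNonRoot": True}})
REPLACED = apply_patch(
    BASE, {"securityContext": {"$patch": "replace", "runAsNonRoot": True}}
)
```

Here MERGED still contains runAsUser, which is why the plain merge is effectively a no-op, while REPLACED is left with only runAsNonRoot.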

We can accomplish the same thing using jsonpatch syntax. In this case, we would write:

patches:
  - target:
      kind: Deployment
      name: controller
      namespace: metallb-system
    patch: |-
      - op: remove
        path: /spec/template/spec/securityContext/runAsUser
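For intuition about what the remove operation does, here is a small sketch that walks a JSON Pointer path and deletes the final key. It covers only the remove operation, and it is an illustration rather than the code kustomize runs. The function names and the runAsUser value are assumptions.

```python
def pointer_tokens(path):
    """Split a JSON Pointer (RFC 6901) into unescaped reference tokens."""
    if not path.startswith("/"):
        raise ValueError("expected a pointer starting with '/'")
    # Per RFC 6901, '~1' is decoded to '/' first, then '~0' to '~'.
    return [t.replace("~1", "/").replace("~0", "~") for t in path.split("/")[1:]]


def apply_remove(document, path):
    """Apply a JSON Patch 'remove' operation to document, in place."""
    tokens = pointer_tokens(path)
    parent = document
    for token in tokens[:-1]:
        parent = parent[int(token)] if isinstance(parent, list) else parent[token]
    last = tokens[-1]
    if isinstance(parent, list):
        del parent[int(last)]
    else:
        del parent[last]  # raises KeyError if the target does not exist
    return document


# Assumed shape of the upstream Deployment, trimmed to the relevant keys.
deployment = {
    "spec": {
        "template": {
            "spec": {
                "securityContext": {"runAsNonRoot": True, "runAsUser": 65534}
            }
        }
    }
}
apply_remove(deployment, "/spec/template/spec/securityContext/runAsUser")
```

Unlike the strategic merge patch, this names the exact field to delete, so nothing else under securityContext needs to be restated.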

With either solution, the final output includes a securityContext setting that looks like this:

spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true

Granting elevated privileges

The MetalLB docs suggest running:

oc adm policy add-scc-to-user privileged -n metallb-system -z speaker

But we can configure the same privilege level by setting up an appropriate role binding as part of our Kustomize manifests.

First, we create an allow-privileged cluster role by adding the following manifest in clusterrole.yml:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: allow-privileged
rules:
  - apiGroups:
      - security.openshift.io
    resourceNames:
      - privileged
    resources:
      - securitycontextconstraints
    verbs:
      - use

Then we bind the speaker service account to the allow-privileged role by adding a ClusterRoleBinding in rolebinding.yml:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: metallb-allow-privileged
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: allow-privileged
subjects:
  - kind: ServiceAccount
    name: speaker
    namespace: metallb-system

You will need to add these new manifests to your kustomization.yml, which should now look like:

namespace: metallb-system
resources:
  - github.com/metallb/metallb//manifests?ref=v0.9.3
  - configmap.yml
  - secret.yml
  - clusterrole.yml
  - rolebinding.yml
patches:
  - target:
      kind: Deployment
      name: controller
      namespace: metallb-system
    patch: |-
      - op: remove
        path: /spec/template/spec/securityContext/runAsUser

Conclusion

The changes described here will result in a successful MetalLB deployment into your OpenShift environment.

September 27, 2020 12:00 AM

September 26, 2020

Lars Kellogg-Stedman

Vortex Core Keyboard Review

I’ve had my eye on the Vortex Core keyboard for a few months now, and this past week I finally broke down and bought one (with Cherry MX Brown switches). The Vortex Core is a 40% keyboard, which means it consists primarily of letter keys, a few lonely bits of punctuation, and several modifier keys to activate different layers on the keyboard.

Physical impressions

It’s a really cute keyboard. I’m a big fan of MX brown switches, and this keyboard is really a joy to type on, at least when you’re working primarily with the alpha keys. I’m still figuring out where some of the punctuation is, and with a few exceptions I haven’t yet spent time trying to remap things into more convenient positions.

The keyboard feels solid. I’m a little suspicious of the micro-usb connector; it feels a little wobbly. I wish that it was USB-C and I wish it felt a little more stable.

Here’s a picture of my Core next to my Durgod K320:

Programming

The keyboard first came out in 2017, and if you read reviews that came out around that time you’ll find several complaints around limitations in the keyboard’s programming features, in particular:

  • you can’t map the left and right spacebars differently
  • you can’t remap layer 0
  • you can’t remap the Fn1 key

And so forth. Fortunately, at some point (maybe 2018) Vortexgear released updated firmware that resolves all of the above issues, and introduces a completely new way of programming the keyboard.

Originally, the keyboard was programmed entirely via the keyboard itself: there was a key combination to activate programming mode in each of the three programmable layers, and this allowed you to freely remap keys. Unfortunately, this made it difficult to share layouts, and made extensive remapping rather unwieldy.

The updated firmware ("CORE_MPC") does away with the hardware programming, and instead introduces both a web UI for generating keyboard layouts and a simple mechanism for pushing those layouts to the keyboard that is completely operating system independent (which is nice if you’re a Linux user and are tired of having to spin up a Windows VM just to run someone’s firmware programming tool). With the new firmware, you hold down Fn-d when booting the keyboard and it will present a FAT-format volume to the operating system. Drag your layout to the volume, unmount it, and reboot the keyboard and you’re all set (note that you will still need to spin up that Windows VM one last time in order to install the firmware update).

The Vortexgear keyboard configurator is available at http://www.vortexgear.tw/mpc/index.html, but you’re going to want to use https://tsfreddie.github.io/much-programming-core/ instead, which removes several limitations that are present in the official tool.

Because the new configurator (a) allows you to remap all layers, including layer 0, and (b) allows you to create mappings for the Pn key, you have a lot of flexibility in how you set up your mappings.

How I’ve configured things

I performed some limited remapping of layer 0:

  • I’ve moved the Fn1 key to the right space bar, and turned the original Fn1 key into the quote key. I use that enough in general writing that it’s convenient to be able to access it without using a modifier.

  • I’ve set up a cursor cluster using the Pn key. This gets me the standard WASD keys for arrows, and Q and E for page up and page down.

  • Holding down the Pn key also gets me a numeric keypad on the right side of the keyboard.

Final thoughts

It’s a fun keyboard. I’m not sure it’s going to become my primary keyboard, especially for writing code, but I’m definitely happy with it.

September 26, 2020 12:00 AM

September 25, 2020

Lars Kellogg-Stedman

Building multi-architecture images with GitHub Actions

At work we have a cluster of IBM Power 9 systems running OpenShift. The problem with this environment is that nobody runs Power 9 on their desktop, and Docker Hub only offers automatic build support for the x86 architecture. This means there are no convenient options for building Power 9 Docker images…or so I thought.

It turns out that Docker provides GitHub actions that make the process of producing multi-architecture images quite simple.

The code demonstrated in this post can be found in my hello-flask GitHub repository.

Configuring secrets

There is some information we need to provide to our workflow that we don’t want to hardcode into configuration files, both for reasons of security (we don’t want to expose passwords in the repository) and convenience (we want other people to be able to fork this repository and run the workflow without needing to make any changes to the code).

We can do this by configuring “secrets” in the repository on GitHub. You can configure secrets by visiting the “Secrets” tab in your repository settings (https://github.com/<USERNAME>/<REPOSITORY>/settings/secrets).

For this workflow, we’re going to need two secrets:

  • DOCKER_USERNAME – this is our Docker Hub username; we’ll need this both for authentication and to set the namespace for the images we’re building.

  • DOCKER_PASSWORD – this is our Docker Hub password, used for authentication.

Within a workflow, we can refer to these secrets using syntax like ${{ secrets.DOCKER_USERNAME }} (you’ll see examples of this later on).

Creating a workflow

In the repository containing your Dockerfile, create a .github/workflows directory. This is where we will place the files that configure GitHub actions. In this directory, create a file called build_images.yml (the particular name isn’t important, but it’s nice to make names descriptive).

We’ll first give this workflow a name and configure it to run for pushes on our master branch by adding the following to our build_images.yml file:

---
name: 'build images'
on:
  push:
    branches:
      - master

Setting up jobs

With that boilerplate out of the way, we can start configuring the jobs that will comprise our workflow. Jobs are defined in the jobs section of the configuration file, which is a dictionary that maps job names to their definition. A job can have multiple actions. For this example, we’re going to set up a docker job that will perform the following steps:

  • check out the repository
  • prepare some parameters
  • set up qemu, which is used to provide emulated environments for building on architectures other than that of the host
  • configure the docker builders
  • authenticate to docker hub
  • build and push the images to docker hub

We start by providing a name for our job and configuring the machine on which the jobs will run. In this example, we’re using ubuntu-latest; other options include some other Ubuntu variants, Windows, and MacOS (and you are able to host your own custom builders, but that’s outside the scope of this article).

jobs:
  docker:
    runs-on: ubuntu-latest
    steps:

Checking out the repository

In our first step, we use the standard actions/checkout action to check out the repository:

      - name: Checkout
        uses: actions/checkout@v2

Preparing parameters

The next step is a simple shell script that sets some output parameters we will be able to consume in subsequent steps. A script can set parameters by generating output in the form:

::set-output name=<name>::<value>

In other steps, we can refer to these parameters using the syntax ${{ steps.<step_name>.outputs.<name> }} (e.g. ${{ steps.prep.outputs.tags }}).

We’re going to use this step to set things like the image name (using our DOCKER_USERNAME secret to set the namespace), and to set up several tags for the image:

  • By default, we tag it latest
  • If we’re building from a git tag, use the tag name instead of latest. Note that here we’re assuming that git tags are of the form v1.0, so we strip off that initial v to get a Docker tag that is just the version number.
  • We also tag the image with the short commit id

      - name: Prepare
        id: prep
        run: |
          DOCKER_IMAGE=${{ secrets.DOCKER_USERNAME }}/${GITHUB_REPOSITORY#*/}
          VERSION=latest
          SHORTREF=${GITHUB_SHA::8}
          # If this is git tag, use the tag name as a docker tag
          if [[ $GITHUB_REF == refs/tags/* ]]; then
            VERSION=${GITHUB_REF#refs/tags/v}
          fi
          TAGS="${DOCKER_IMAGE}:${VERSION},${DOCKER_IMAGE}:${SHORTREF}"
          # If the VERSION looks like a version number, assume that
          # this is the most recent version of the image and also
          # tag it 'latest'.
          if [[ $VERSION =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]; then
            TAGS="$TAGS,${DOCKER_IMAGE}:latest"
          fi
          # Set output parameters.
          echo ::set-output name=tags::${TAGS}
          echo ::set-output name=docker_image::${DOCKER_IMAGE}
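The tag logic in that shell script is compact enough that it is worth checking against a few inputs. Here is a rough Python mirror of it that can be run locally. The function name and arguments are mine, standing in for the DOCKER_USERNAME secret and the GITHUB_REPOSITORY, GITHUB_REF and GITHUB_SHA environment variables.

```python
import re

VERSION_RE = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")


def compute_tags(docker_username, github_repository, github_ref, github_sha):
    """Return the comma-separated tag list the Prepare step would emit."""
    # ${GITHUB_REPOSITORY#*/}: drop everything up to the first slash.
    _, slash, name = github_repository.partition("/")
    image = f"{docker_username}/{name if slash else github_repository}"

    version = "latest"
    shortref = github_sha[:8]  # ${GITHUB_SHA::8}

    if github_ref.startswith("refs/tags/"):
        # ${GITHUB_REF#refs/tags/v}: only strips when the tag starts with 'v';
        # otherwise the shell expansion leaves the ref untouched.
        prefix = "refs/tags/v"
        version = github_ref[len(prefix):] if github_ref.startswith(prefix) else github_ref

    tags = [f"{image}:{version}", f"{image}:{shortref}"]

    # Same pattern as the shell regex, anchored at both ends via fullmatch.
    if VERSION_RE.fullmatch(version):
        tags.append(f"{image}:latest")

    return ",".join(tags)
```

For example, compute_tags("larsks", "larsks/hello-flask", "refs/tags/v1.2.3", "0123456789abcdef") yields the 1.2.3, 01234567 and latest tags. Note that the pattern expects three numeric components, so under this reading a tag such as v1.0 would be tagged 1.0 but would not also be tagged latest.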

Set up QEMU

The docker/setup-qemu action installs QEMU static binaries, which are used to run builders for architectures other than the host.

      - name: Set up QEMU
        uses: docker/setup-qemu-action@master
        with:
          platforms: all

Set up Docker builders

The docker/setup-buildx action configures buildx, which is a Docker CLI plugin that provides enhanced build capabilities. This is the infrastructure that the following step will use for actually building images.

      - name: Set up Docker Buildx
        id: buildx
        uses: docker/setup-buildx-action@master

Authenticate to Docker Hub

In order to push images to Docker Hub, we use the docker/login-action action to authenticate. This uses the DOCKER_USERNAME and DOCKER_PASSWORD secrets we created earlier in order to establish credentials for use in subsequent steps.

      - name: Login to DockerHub
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}

Build and push the images

This final step uses the docker/build-push-action action to build the images and push them to Docker Hub using the tags we defined in the prep step. In this example, we’re building images for amd64, arm64, and ppc64le architectures.

      - name: Build
        uses: docker/build-push-action@v2
        with:
          builder: ${{ steps.buildx.outputs.name }}
          context: .
          file: ./Dockerfile
          platforms: linux/amd64,linux/arm64,linux/ppc64le
          push: true
          tags: ${{ steps.prep.outputs.tags }}

The complete workflow

When we put all of the above together, we get:

---
name: 'build images'
on:
  push:
    branches:
      - master
jobs:
  docker:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      - name: Prepare
        id: prep
        run: |
          DOCKER_IMAGE=${{ secrets.DOCKER_USERNAME }}/${GITHUB_REPOSITORY#*/}
          VERSION=latest
          SHORTREF=${GITHUB_SHA::8}
          # If this is git tag, use the tag name as a docker tag
          if [[ $GITHUB_REF == refs/tags/* ]]; then
            VERSION=${GITHUB_REF#refs/tags/v}
          fi
          TAGS="${DOCKER_IMAGE}:${VERSION},${DOCKER_IMAGE}:${SHORTREF}"
          # If the VERSION looks like a version number, assume that
          # this is the most recent version of the image and also
          # tag it 'latest'.
          if [[ $VERSION =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]; then
            TAGS="$TAGS,${DOCKER_IMAGE}:latest"
          fi
          # Set output parameters.
          echo ::set-output name=tags::${TAGS}
          echo ::set-output name=docker_image::${DOCKER_IMAGE}
      - name: Set up QEMU
        uses: docker/setup-qemu-action@master
        with:
          platforms: all
      - name: Set up Docker Buildx
        id: buildx
        uses: docker/setup-buildx-action@master
      - name: Login to DockerHub
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v1
        with:
          username: ${{ secrets.DOCKER_USERNAME }}
          password: ${{ secrets.DOCKER_PASSWORD }}
      - name: Build
        uses: docker/build-push-action@v2
        with:
          builder: ${{ steps.buildx.outputs.name }}
          context: .
          file: ./Dockerfile
          platforms: linux/amd64,linux/arm64,linux/ppc64le
          push: true
          tags: ${{ steps.prep.outputs.tags }}

You can grab the hello-flask repository and try this out yourself. You’ll need to set up the secrets described earlier in this article, but then for each commit to the master branch you will end up with a new image, tagged both as latest and with the short git commit id.

The results

We can use the docker manifest inspect command to inspect the output of the build step. In the output below, you can see the images built for our three target architectures:

$ docker manifest inspect larsks/hello-flask
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
  "manifests": [
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 3261,
      "digest": "sha256:c6bab778a9fd0dc7bf167a5a49281bcd5ebc5e762ceeb06791aff8f0fbd15325",
      "platform": {
        "architecture": "amd64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 3261,
      "digest": "sha256:3c02f36562fcf8718a369a78054750382aba5706e1e9164b76bdc214591024c4",
      "platform": {
        "architecture": "arm64",
        "os": "linux"
      }
    },
    {
      "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "size": 3262,
      "digest": "sha256:192fc9acd658edd6b7f2726f921cba2582fb1101d929800dff7fb53de951dd76",
      "platform": {
        "architecture": "ppc64le",
        "os": "linux"
      }
    }
  ]
}
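If you want to check the result from a script rather than by eye, the manifest list is plain JSON and easy to summarize. A minimal sketch follows; the function name is hypothetical, and the sample is trimmed to the platform fields from the output above.

```python
import json


def list_platforms(manifest_list_json):
    """Return 'os/architecture' strings for each entry in a manifest list."""
    document = json.loads(manifest_list_json)
    return [
        f'{entry["platform"]["os"]}/{entry["platform"]["architecture"]}'
        for entry in document["manifests"]
    ]


# Trimmed version of the output above: only the fields this sketch reads.
SAMPLE = """
{
  "schemaVersion": 2,
  "manifests": [
    {"platform": {"architecture": "amd64", "os": "linux"}},
    {"platform": {"architecture": "arm64", "os": "linux"}},
    {"platform": {"architecture": "ppc64le", "os": "linux"}}
  ]
}
"""

PLATFORMS = list_platforms(SAMPLE)
```

The resulting strings use the same os/architecture form as the platforms setting in the Build step, so the two lists can be compared directly.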

Caveats

This process assumes, of course, that your base image of choice is available for your selected architectures. According to Docker:

Most of the official images on Docker Hub provide a variety of architectures. For example, the busybox image supports amd64, arm32v5, arm32v6, arm32v7, arm64v8, i386, ppc64le, and s390x.

So if you are starting from one of the official images, you’ll probably be in good shape. On the other hand, if you’re attempting to use a community image as a starting point, you might find that it’s only available for a single architecture.

September 25, 2020 12:00 AM