RDO Zed Released
The RDO community is pleased to announce the general availability of the RDO build for OpenStack Zed for RPM-based distributions, CentOS Stream and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Zed is the 26th release from the OpenStack project, which is the work of more than 1,000 contributors from around the world. As with the upstream release, this release of RDO is dedicated to Ilya Etingof, who was an upstream and RDO contributor.
The release is already available for CentOS Stream 9 on the CentOS mirror network in:
http://mirror.stream.centos.org/SIGs/9-stream/cloud/x86_64/openstack-zed/
The RDO community project curates, packages, builds, tests and maintains a complete OpenStack component set for RHEL and CentOS Stream and is a member of the CentOS Cloud Infrastructure SIG. The Cloud Infrastructure SIG focuses on delivering a great user experience for CentOS users looking to build and maintain their own on-premise, public or hybrid clouds.
All work on RDO and on the downstream release, Red Hat OpenStack Platform, is 100% open source, with all code changes going upstream first.
The highlights of the broader upstream OpenStack project may be read via https://releases.openstack.org/zed/highlights.html
TripleO in the RDO Zed release:
Since the Xena development cycle, TripleO follows the Independent release model (https://specs.openstack.org/openstack/tripleo-specs/specs/xena/tripleo-independent-release.html).
For the Zed cycle, the TripleO project will maintain and validate stable Zed branches. As with the rest of the packages, RDO will update and publish the releases created during the maintenance cycle.
Contributors
During the Zed cycle, we saw the following new RDO contributors:
Welcome to all of you and Thank You So Much for participating!
But we wouldn’t want to overlook anyone. A super massive Thank You to all 57 contributors who participated in producing this release. This list includes commits to rdo-packages, rdo-infra, and redhat-website repositories:
The Next Release Cycle
At the end of one release, focus shifts immediately to the next release, i.e. Antelope.
Get Started
To spin up a proof of concept cloud, quickly, and on limited hardware, try an All-In-One Packstack installation. You can run RDO on a single node to get a feel for how it works.
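If you want a concrete starting point, a minimal all-in-one run on a fresh CentOS Stream 9 node looks roughly like the sketch below; the release package name follows the usual RDO naming convention and is an assumption here, so check the Packstack quickstart for the exact steps.
# Minimal all-in-one Packstack sketch (assumed package names, CentOS Stream 9)
sudo dnf install -y centos-release-openstack-zed
sudo dnf install -y openstack-packstack
sudo packstack --allinone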
Finally, for those that don’t have any hardware or physical resources, there’s the OpenStack Global Passport Program. This is a collaborative effort between OpenStack public cloud providers to let you experience the freedom, performance and interoperability of open source infrastructure. You can quickly and easily gain access to OpenStack infrastructure via trial programs from participating OpenStack public cloud providers around the world.
Get Help
The RDO Project has the users@lists.rdoproject.org mailing list for RDO-specific users and operators. For more developer-oriented content, we recommend joining the dev@lists.rdoproject.org mailing list. Remember to post a brief introduction about yourself and your RDO story. The mailing list archives are all available at https://mail.rdoproject.org. You can also find extensive documentation on RDOproject.org.
The #rdo channel on OFTC IRC is also an excellent place to find and give help. We also welcome comments and requests on the CentOS devel mailing list and the CentOS and TripleO IRC channels (#centos, #centos-devel on the Libera.Chat network, and #tripleo on OFTC); however, we have a more focused audience within the RDO venues.
Get Involved
To get involved in the OpenStack RPM packaging effort, check out the RDO contribute pages, peruse the CentOS Cloud SIG page, and inhale the RDO packaging documentation.
Join us in #rdo and #tripleo on the OFTC IRC network and follow us on Twitter @RDOCommunity. You can also find us on Facebook and YouTube.
The command to run the formatting tests for the keystone project is:
tox -e pep8
Running this on Fedora 35 failed for me with this error:
ERROR: pep8: could not install deps [-chttps://releases.openstack.org/constraints/upper/master, -r/opt/stack/keystone/test-requirements.txt, .[ldap,memcache,mongodb]]; v = InvocationError("/opt/stack/keystone/.tox/pep8/bin/python -m pip install -chttps://releases.openstack.org/constraints/upper/master -r/opt/stack/keystone/test-requirements.txt '.[ldap,memcache,mongodb]'", 1)
What gets swallowed up is the actual error in the install, which has to do with the fact that the Python dependencies are compiled against native libraries. If I activate the venv and run the pip command by hand, I can see the first failure directly. It is also in the earlier tox output, just buried a few screens up:
Error: pg_config executable not found.
A later error was due to the compile step failing while looking for lber.h:
In file included from Modules/LDAPObject.c:3:
Modules/common.h:15:10: fatal error: lber.h: No such file or directory
15 | #include <lber.h>
| ^~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
To get the build to run, I needed to install both libpq-devel and libldap-devel. Now it fails like this:
File "/opt/stack/keystone/.tox/pep8/lib/python3.10/site-packages/pep257.py", line 24, in
from collections import defaultdict, namedtuple, Set
ImportError: cannot import name 'Set' from 'collections' (/usr/lib64/python3.10/collections/__init__.py)
This appears to be due to the version of python3 on my system (3.10) which is later than supported by upstream openstack. I do have python3.9 installed on my system, and can modify the tox.ini to use it by specifying the basepython version.
[testenv:pep8]
basepython = python3.9
deps =
And then I can run tox -e pep8.
I got keystone in my Bifrost install to talk via LDAP to our Freeipa server. Here’s what I had to do.
I started with a new install of bifrost, using Keystone and TLS.
./bifrost-cli install --enable-keystone --enable-tls --network-interface enP4p4s0f0np0 --dhcp-pool 192.168.116.25-192.168.116.75
After making sure that Keystone could work for normal things:
source /opt/stack/bifrost/bin/activate
export OS_CLOUD=bifrost-admin
openstack user list -f yaml
- ID: 1751a5bb8b4a4f0188069f8cb4f8e333
Name: admin
- ID: 5942330b4f2c4822a9f2cdf45ad755ed
Name: ironic
- ID: 43e30ad5bf0349b7b351ca2e86fd1628
Name: ironic_inspector
- ID: 0c490e9d44204cc18ec1e507f2a07f83
Name: bifrost_user
I had to install python3-ldap and python3-ldappool.
sudo apt install python3-ldap python3-ldappool
Now create a domain for the LDAP data.
openstack domain create freeipa
...
openstack domain show freeipa -f yaml
description: ''
enabled: true
id: 422608e5c8d8428cb022792b459d30bf
name: freeipa
options: {}
tags: []
Edit /etc/keystone/keystone.conf to support domain-specific backends and back them with file-based config. When you are done, your [identity] section should look like this:
[identity]
domain_specific_drivers_enabled=true
domain_config_dir=/etc/keystone/domains
driver = sql
Create the corresponding directory for the new configuration files.
sudo mkdir /etc/keystone/domains/
Add in a configuration file for your LDAP server. Since I called my domain freeipa, I have to name the config file /etc/keystone/domains/keystone.freeipa.conf:
[identity]
driver = ldap
[ldap]
url = ldap://den-admin-01
user_tree_dn = cn=users,cn=accounts,dc=younglogic,dc=com
user_objectclass = person
user_id_attribute = uid
user_name_attribute = uid
user_mail_attribute = mail
user_allow_create = false
user_allow_update = false
user_allow_delete = false
group_tree_dn = cn=groups,cn=accounts,dc=younglogic,dc=com
group_objectclass = groupOfNames
group_id_attribute = cn
group_name_attribute = cn
group_member_attribute = member
group_desc_attribute = description
group_allow_create = false
group_allow_update = false
group_allow_delete = false
user_enabled_attribute = nsAccountLock
user_enabled_default = False
user_enabled_invert = true
To apply the changes, restart the Keystone uWSGI service:
sudo systemctl restart uwsgi@keystone-public
And test that it worked:
openstack user list -f yaml --domain freeipa
- ID: b3054e3942f06016f8b9669b068e81fd2950b08c46ccb48032c6c67053e03767
Name: renee
- ID: d30e7bc818d2f633439d982783a2d145e324e3187c0e67f71d80fbab065d096a
Name: ann
This same approach can work if you need to add more than one LDAP server to your Keystone deployment.
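As a sketch of what that would look like, a second LDAP-backed domain is just another domain plus another config file of the same shape; the domain name, server URL, and DNs below are placeholders, not values from my deployment.
# Hypothetical second LDAP-backed domain; names, URL, and DNs are placeholders.
openstack domain create example
sudo tee /etc/keystone/domains/keystone.example.conf > /dev/null <<'EOF'
[identity]
driver = ldap

[ldap]
url = ldap://ldap.example.com
user_tree_dn = cn=users,cn=accounts,dc=example,dc=com
group_tree_dn = cn=groups,cn=accounts,dc=example,dc=com
EOF
sudo systemctl restart uwsgi@keystone-public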
The RDO community is pleased to announce the general availability of the RDO build for OpenStack Yoga for RPM-based distributions, CentOS Stream and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Yoga is the 25th release from the OpenStack project, which is the work of more than 1,000 contributors from around the world.
The release is already available on the CentOS mirror network:
The RDO community project curates, packages, builds, tests and maintains a complete OpenStack component set for RHEL and CentOS Stream and is a member of the CentOS Cloud Infrastructure SIG. The Cloud Infrastructure SIG focuses on delivering a great user experience for CentOS users looking to build and maintain their own on-premise, public or hybrid clouds.
All work on RDO and on the downstream release, Red Hat OpenStack Platform, is 100% open source, with all code changes going upstream first.
Interesting things in the Yoga release include:
The highlights of the broader upstream OpenStack project may be read via https://releases.openstack.org/yoga/highlights.html
TripleO in the RDO Yoga release:
Since the Xena development cycle, TripleO follows the Independent release model and will only maintain branches for selected OpenStack releases. In the case of Yoga, TripleO will not support the Yoga release. For TripleO users in RDO, this means that:
You can find details about this on the RDO Webpage
Contributors
During the Yoga cycle, we saw the following new RDO contributors:
Welcome to all of you and Thank You So Much for participating!
But we wouldn’t want to overlook anyone. A super massive Thank You to all 40 contributors who participated in producing this release. This list includes commits to rdo-packages, rdo-infra, and redhat-website repositories:
The Next Release Cycle
At the end of one release, focus shifts immediately to the next release, i.e. Zed.
Get Started
To spin up a proof of concept cloud, quickly, and on limited hardware, try an All-In-One Packstack installation. You can run RDO on a single node to get a feel for how it works.
Finally, for those that don’t have any hardware or physical resources, there’s the OpenStack Global Passport Program. This is a collaborative effort between OpenStack public cloud providers to let you experience the freedom, performance and interoperability of open source infrastructure. You can quickly and easily gain access to OpenStack infrastructure via trial programs from participating OpenStack public cloud providers around the world.
Get Help
The RDO Project has the users@lists.rdoproject.org mailing list for RDO-specific users and operators. For more developer-oriented content, we recommend joining the dev@lists.rdoproject.org mailing list. Remember to post a brief introduction about yourself and your RDO story. The mailing list archives are all available at https://mail.rdoproject.org. You can also find extensive documentation on RDOproject.org.
The #rdo channel on OFTC IRC is also an excellent place to find and give help.
We also welcome comments and requests on the CentOS devel mailing list and the CentOS and TripleO IRC channels (#centos, #centos-devel on the Libera.Chat network, and #tripleo on OFTC); however, we have a more focused audience within the RDO venues.
Get Involved
To get involved in the OpenStack RPM packaging effort, check out the RDO contribute pages, peruse the CentOS Cloud SIG page, and inhale the RDO packaging documentation.
Join us in #rdo and #tripleo on the OFTC IRC network and follow us on Twitter @RDOCommunity. You can also find us on Facebook and YouTube.
To make the AARCH64 iPXE process work using Bifrost, I had to:
git clone https://github.com/ipxe/ipxe.git
cd ipxe/src/
make bin-arm64-efi/snponly.efi ARCH=arm64
sudo cp bin-arm64-efi/snponly.efi /var/lib/tftpboot/ipxe.efi
This works for the Ampere reference implementation servers that use a Mellanox network interface card, which supports (only) SNP.
For the past week I worked on getting an Ironic standalone installation to run on an Ampere AltraMax server in our lab. Since I recently was able to get a baremetal node to boot, I wanted to record the steps I went through.
Our base operating system for this install is Ubuntu 20.04.
The controller node has 2 Mellanox Technologies MT27710 network cards, each with 2 ports apiece.
I started by following the steps to install with the bifrost-cli. However, there were a few places where the installation assumes an x86_64 architecture, and I hard-swapped them to be AARCH64/ARM64 specific:
$ git diff HEAD
diff --git a/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Debian_family.yml b/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Debian_family.yml
index 18e281b0..277bfc1c 100644
--- a/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Debian_family.yml
+++ b/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Debian_family.yml
@@ -6,8 +6,8 @@ ironic_rootwrap_dir: /usr/local/bin/
mysql_service_name: mysql
tftp_service_name: tftpd-hpa
efi_distro: debian
-grub_efi_binary: /usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed
-shim_efi_binary: /usr/lib/shim/shimx64.efi.signed
+grub_efi_binary: /usr/lib/grub/arm64-efi-signed/grubaa64.efi.signed
+shim_efi_binary: /usr/lib/shim/shimaa64.efi.signed
required_packages:
- mariadb-server
- python3-dev
diff --git a/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Ubuntu.yml b/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Ubuntu.yml
index 7fcbcd46..4d6a1337 100644
--- a/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Ubuntu.yml
+++ b/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Ubuntu.yml
@@ -26,7 +26,7 @@ required_packages:
- dnsmasq
- apache2-utils
- isolinux
- - grub-efi-amd64-signed
+ - grub-efi-arm64-signed
- shim-signed
- dosfstools
# NOTE(TheJulia): The above entry for dnsmasq must be the last entry in the
The long-term approach is to make those variables architecture-specific.
In order to install, I ran the cli:
./bifrost-cli install --network-interface enP4p4s0f1 --dhcp-pool 192.168.116.100-192.168.116.150
It took me several tries with -e variables until I realized that it was not going to honor them. I did notice that the heart of the command was the Ansible call, which I ended up running directly:
/opt/stack/bifrost/bin/ansible-playbook ~/bifrost/playbooks/install.yaml -i ~/bifrost/playbooks/inventory/target -e bifrost_venv_dir=/opt/stack/bifrost -e @/home/ansible/bifrost/baremetal-install-env.json
You may notice that I added a -e with the baremetal-install-env.json file. That file had been created by the earlier CLI run and contained the variables specific to my install. I also edited it to trigger the build of the Ironic cleaning image.
{
"create_ipa_image": false,
"create_image_via_dib": false,
"install_dib": true,
"network_interface": "enP4p4s0f1",
"enable_keystone": false,
"enable_tls": false,
"generate_tls": false,
"noauth_mode": false,
"enabled_hardware_types": "ipmi,redfish,manual-management",
"cleaning_disk_erase": false,
"testing": false,
"use_cirros": false,
"use_tinyipa": false,
"developer_mode": false,
"enable_prometheus_exporter": false,
"default_boot_mode": "uefi",
"include_dhcp_server": true,
"dhcp_pool_start": "192.168.116.100",
"dhcp_pool_end": "192.168.116.150",
"download_ipa": false,
"create_ipa_image": true
}
With this in place, I was able to enroll nodes using the Bifrost CLI:
~/bifrost/bifrost-cli enroll ~/nodes.json
I prefer this to using my own script. However, my script checks for existence and thus can be run idempotently, unlike this one. Still, I like the file format and will likely script to it in the future.
With this, I was ready to try booting the nodes, but they hung, as I reported in an earlier article.
The other place where the deployment is x86_64 specific is the iPXE binary. In a Bifrost install on Ubuntu, the binary is called ipxe.efi, and it is placed in /var/lib/tftpboot/ipxe.efi. It is copied from the grub-ipxe package, which places it in /boot/ipxe.efi. Although this package is not tagged with an x86_64 architecture (Debian/Ubuntu mark it "all"), the file is architecture-specific.
I went through the steps to fetch and install the latest one out of jammy, which has an additional file: /boot/ipxe-arm64.efi. However, when I replaced the file /var/lib/tftpboot/ipxe.efi with this one, the baremetal node still failed to boot, although it did get a few steps further in the process.
The issue, as I understand it, is that the binary needs a set of drivers to set up the HTTP request on the network interface cards, and the build in the Ubuntu package did not include them. Instead, I cloned the source git repo and compiled the binary directly. Roughly:
git clone https://github.com/ipxe/ipxe.git
cd ipxe/src
make bin-arm64-efi/snponly.efi ARCH=arm64
SNP stands for the Simple Network Protocol. I guess this protocol is esoteric enough that Wikipedia has not heard of it.
The header file in the code says this:
The EFI_SIMPLE_NETWORK_PROTOCOL provides services to initialize a network interface, transmit packets, receive packets, and close a network interface.
It seems the Mellanox cards support/require SNP. With this file in place, I was able to get the cleaning image to PXE boot.
I call this a spike as it has a lot of corners cut in it that I would not want to maintain in production. We’ll work with the distributions to get a viable version of ipxe.efi produced that can work for an array of servers, including Ampere’s. In the meantime, I need a strategy to handle building our own binary. I also plan on reworking the Bifrost variables to handle ARM64/AARCH64 alongside x86_64; a single server should be able to handle both based on the Architecture flag sent in the initial DHCP request.
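For now, that strategy will probably be a small wrapper around the same commands used above; a rough sketch:
#!/bin/bash
# Rebuild the arm64 iPXE SNP binary and drop it where Bifrost's tftpd expects it.
# These are the same commands used earlier in this post, nothing more.
set -euo pipefail
workdir=$(mktemp -d)
git clone https://github.com/ipxe/ipxe.git "$workdir/ipxe"
cd "$workdir/ipxe/src"
make bin-arm64-efi/snponly.efi ARCH=arm64
sudo cp bin-arm64-efi/snponly.efi /var/lib/tftpboot/ipxe.efi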
Note: I was not able to get the cleaning image to boot, as it had an issue with werkzeug and JSON. However, I had an older build of the IPA kernel and initrd that I used, and the node properly deployed and cleaned.
And yes, I plan on integrating Keystone in the future, too.
There are a handful of questions a user will (implicitly) ask when using your API:
Answering these questions can be automated. The user, and the tools they use, can discover the answers by working with the system. That is what I mean when I use the word “Discoverability.”
We missed some opportunities to answer these questions when we designed the APIs for Keystone OpenStack. I’d like to talk about how to improve on what we did there.
First I’d like to state what not to do.
Don’t make the user read the documentation and code to an external spec.
Never require a user to manually perform an operation that should be automated. Answering every one of those questions can be automated. If you can get it wrong, you will get it wrong. Make it possible to catch errors as early as possible.
Let's start with the question: “What actions can I do against this endpoint?” In the case of Keystone, the answer would be some of the following:
Create, Read, Update and Delete (CRUD) Users, Groups of Users, Projects, Roles, and Catalog Items such as Services and Endpoints. You can also CRUD relationships between these entities. You can CRUD Entities for Federated Identity. You can CRUD Policy files (historical). Taken in total, you have the tools to make access control decisions for a wide array of services, not just Keystone.
The primary way, however, that people interact with Keystone is to get a token. Let’s use this use case to start. To get a token, you make a POST to the $OS_AUTH_URL/v3/auth/tokens/ URL, with the authentication data in the body of the request.
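For reference, a password-scoped request looks roughly like the sketch below; the user, project, and password values are placeholders, and the token comes back in the X-Subject-Token response header.
# Rough sketch of a password-scoped token request; all names are placeholders.
curl -si "$OS_AUTH_URL/v3/auth/tokens" \
  -H "Content-Type: application/json" \
  -d '{
    "auth": {
      "identity": {
        "methods": ["password"],
        "password": {
          "user": {
            "name": "demo",
            "domain": {"name": "Default"},
            "password": "secret"
          }
        }
      },
      "scope": {
        "project": {
          "name": "demo",
          "domain": {"name": "Default"}
        }
      }
    }
  }'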
How would you know this? Only by reading the documentation. If someone handed you the value of their OS_AUTH_URL environment variable, and you looked at it using a web client, what would you get? Really, just the version URL. Assuming you chopped off the V3:
$ curl http://10.76.10.254:35357/
{"versions": {"values": [{"id": "v3.14", "status": "stable", "updated": "2020-04-07T00:00:00Z", "links": [{"rel": "self", "href": "http://10.76.10.254:35357/v3/"}], "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}]}]}}
and the only URL in there is the version URL, which gives you back the same thing.
If you point a web browser at the service, the output is in JSON, even though the web browser told the server that it preferred HTML.
What could this look like? If we look at the API spec for Keystone, we can see that the various entities referred to above have fairly predictable URL forms. However, for this use case, we want a token, so we should, at a minimum, see the path to get to the token. Since this is the V3 API, we should see an entry like this:
{"rel": "auth", "href": "http://10.76.10.254:35357/v3/auth"}], "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}]}
And if we then performed an HTTP GET on http://10.76.10.254:35357/v3/auth, we should see a link to:
{"rel": "token", "href": "http://10.76.10.254:35357/v3/auth/token"}], "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}]}
Is this 100% of the solution? No. The Keystone API shows its prejudices toward password-based authentication, a very big antipattern. The password goes in clear text into the middle of the JSON blob posted to this API. We trust SSL/TLS to secure it over the wire, and we have had to scrub it from logs and debug output. This is actually a step backwards from BASIC_AUTH in HTTP. All this aside, there is still no way to tell what you need to put into the body of the token request without reading the documentation… unless you know the magic of JSON-Home.
Here is what you would need to do to get a list of the top-level URLs, excluding all the ones that are templated and thus require knowing an ID:
curl 10.76.116.63:5000 -H "Accept: application/json-home" | jq '.resources | to_entries | .[] | .value | .href ' | sort -u
This would be the friendly list to return from the /v3 page. Or, if we wanted to streamline it a bit for human consumption, we could put a top-level grouping around each of these APIs. A friendlier list would look like this (chopping off the /v3):
There are a couple ways to order the list. Alphabetical order is the simplest for an English speaker if they know what they are looking for. This won’t internationalize, and it won’t guide the user to the use cases that are most common. Thus, I put auth at the top, as that is, by far, the most common use case. The others I have organized based on a quick think-through from most to least common. I could easily be convinced to restructure this a couple different ways.
However, we are starting to trip over some of the other aspects of usability. We have provided the user with way more information than they need or, indeed, can use at this point. Since none of those operations can be performed unauthenticated, we have led the user astray; we should show them, at this stage, only what they can do in their current state. Thus, the obvious entry to show would be auth.
Let's continue with the old-school version of a token request using the v3/auth/tokens resource, as that is the most common use case. How does a user now request a token? That depends on whether they want to use a password, another token, or multi-factor authentication, and whether they want an unscoped token or a scoped token.
None of this information is in the JSON home. You have to read the docs.
If we were using straight HTML to render the response, we would expect a form for the user to fill in.
There is, as of now, no standard way to put form data into JSON. However, there are numerous standards to choose from. One such standard is the FormData API; another is JSON Schema (https://json-schema.org/). If we look at the API doc, we get a table that specifies the field names. Anything that is not a single value is specified as an object, which really means a JSON object: a dictionary that can be deeply nested. We can see the complexity in such a form, where the scope value determines what is meant by the project/domain name field. And these versions don’t allow for IDs to be used instead of the names for users, projects, or domains.
A lot of the custom approach here is dictated by the fact that Keystone does not accept standard authentication. The password-based token request could easily be replaced with BASIC-AUTH. Tokens themselves could be stored as session cookies, with the same timeouts as the token expiration. All of the one-offs in Keystone make it more difficult to use and require more application-specific knowledge.
Many of these issues were straightened out when we started doing federation. There is still some out-of-band knowledge required to use the federated API, but this was due to concerns about information leaking that I am going to ignore for now. The approach I am going to describe is basically what is used by any app that lets you log in using the identity sources of different cloud providers today.
From the /v3 page, a user should be able to select the identity provider that they want to use. This could require a jump to /v3/FEDERATION and then to /v3/FEDERATION/idp, in order to keep things namespaced, or the list could be expanded in the /v3 page if there is really nothing else that a user can do unauthenticated.
Let us assume a case where there are three companies that all share access to the cloud: Acme, Beta, and Charlie. The JSON response would be the same as the list identity providers API. The interesting part of the result is this one here:
"protocols": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols"
Let's say that a given identity provider supports multiple protocols. Here is where the user gets to choose which one they want to use to try to authenticate. An HTTP GET on the link above would return that list. The documentation shows an example of an identity provider that supports saml2. Here is an expanded one that shows the set of protocols a user could expect in a private cloud running FreeIPA and Keycloak, or Active Directory and ADFS:
{
"links": {
"next": null,
"previous": null,
"self": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols"
},
"protocols": [
{
"id": "saml2",
"links": {
"identity_provider": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME",
"self": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols/saml2"
},
"mapping_id": "xyz234"
},
{
"id": "x509",
"links": {
"identity_provider": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME",
"self": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols/x509"
},
"mapping_id": "xyz235"
},
{
"id": "gssapi",
"links": {
"identity_provider": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME",
"self": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols/gssapi"
},
"mapping_id": "xyz236"
},
{
"id": "oidc",
"links": {
"identity_provider": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME",
"self": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols/oidc"
},
"mapping_id": "xyz237"
},
{
"id": "basic-auth",
"links": {
"identity_provider": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME",
"self": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols/basic-auth"
},
"mapping_id": "xyz238"
}
]
}
Note that this is very similar to the content that a web browser gives back in a 401 response: the set of acceptable authentication mechanisms. I actually prefer this here, as it allows the user to select the appropriate mechanism for the use case, which may vary depending on where the user connects from.
Let's ignore the actual response from the above links and assume that, if the user is unauthenticated, they merely get a link to where they can authenticate: /v3/OS-FEDERATION/identity_providers/{idp_id}/protocols/{protocol_id}/auth. The follow-on link is a GET, not a POST, and no form data is required. The mapping resolves the user's domain name/ID, so there is no need to provide that information, and the token is a federated unscoped token.
The actual response contains the list of groups that a user belongs to. This is an artifact of the mapping, and it is useful for debugging. What the user has at this point is, effectively, an unscoped token. It is passed in the X-Subject-Token header, and not in a session cookie. However, for an HTML-based workflow, and, indeed, for sane HTTP workflows against Keystone, a session-scoped cookie containing the token would be much more useful.
With an unscoped token, a user can perform some operations against a Keystone server, but those operations are either read-only, operations specific to the user, or administrative actions specific to the Keystone server. For OpenStack, the vast majority of the time the user is going to Keystone to request a scoped token to use on one of the other services. As such, the user probably needs to convert the unscoped token shown above to a token scoped to a project. A very common setup has the user assigned to a single project. Even if they are scoped to multiple, it is unlikely that they are scoped to many. Thus, the obvious next step is to show the user a URL that will allow them to get a token scoped to a specific project.
Keystone does not have such a URL. In Keystone, again you are required to go through /v3/auth/tokens to request a scoped token.
A much friendlier URL scheme would be /v3/auth/projects, which lists the set of projects a user can request a token for, and /v3/auth/project/{id}, which lets a user request a scoped token for that project.
However, even if we had such a URL pattern, we would need to direct the user to that URL. There are two distinct use cases. The first is the case where the user has just authenticated, and in the token response, they need to see the project list URL. A redirect makes the most sense, although the list of projects could also be in the authentication response. However, the user might also be returning to the Keystone server from some other operation, still have the session cookie with the token in it, and start at the discovery page again. In this case, the /v3/ response should show /v3/auth/projects/ in its list.
There is, unfortunately, one case where this would be problematic. With hierarchical projects, a single assignment could allow a user to get a token for many projects. While this is a useful hack in practice, it means that the project list page could get extremely long. This is, unfortunately, also the case with the project list page itself: projects may be nested, but the namespace needs to be flat, and listing projects will list all of them; only the parent-project ID distinguishes them. Since we do have ways to do path nesting in HTTP, this is a solvable problem. Let's lump the token request and the project list APIs together. This actually makes for a very elegant solution:
Instead of /v3/auth/projects, we put a link off the project page itself back to /v3/auth/tokens, but accepting the project ID as a URL parameter, like this: /v3/auth/tokens?project_id=abc123.
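Under that hypothetical scheme (this is the proposal, not something Keystone supports today), the exchange might be as simple as:
# Hypothetical request under the proposed URL scheme; abc123 and the token are placeholders.
curl -si "$OS_AUTH_URL/v3/auth/tokens?project_id=abc123" \
  -H "X-Auth-Token: $UNSCOPED_TOKEN"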
Of course, this means that there is a hidden mechanism now. If a user wants to look at any resource in Keystone, they can do so with an unscoped token, provided they have a role assignment on the project or domain that manages that object.
To this point we have discussed implicit answers to the questions of finding URLs and discovering what actions a user can perform. For the token request, I started discussing how to provide the answer to “What information do I need to provide in order to perform this action?” I think now we can state how to do that: the list page for any collection should either provide an inline form or a link to a form URL. The form provides the information in a format that makes sense for the content type. If the user does not have the permission to create the object, they should not see the form. If the form is on a separate link, a user that cannot create that object should get back a 403 error if they attempt to GET the URL.
If Keystone had been written to return HTML when hit by a browser instead of JSON, all of this navigation would have been painfully obvious. Instead, we subscribed to the point of view that UI was to be done by the Horizon server.
There still remains the last question: “What permission do I need in order to perform this action?” The user only thinks to ask this question when they come across an operation that they cannot perform. I’ll dig deeper into this in the next article.
Kolla creates an admin.rc file using environment variables. I want to then use this in a Terraform plan, but I’d rather not generate Terraform-specific code for the Keystone login data. So, a simple Python script converts the environment variables to YAML.
#!/usr/bin/python3
import os
import yaml
clouds = {
"clouds":{
"cluster": {
"auth" : {
"auth_url" : os.environ["OS_AUTH_URL"],
"project_name": os.environ["OS_PROJECT_NAME"],
"project_domain_name": os.environ["OS_PROJECT_DOMAIN_NAME"],
"username": os.environ["OS_USERNAME"],
"user_domain_name": os.environ["OS_USER_DOMAIN_NAME"],
"password": os.environ["OS_PASSWORD"]
}
}
}
}
print (yaml.dump(clouds))
To use it:
./clouds.py > clouds.yaml
Note that you should have sourced the appropriate config environment variables file, such as:
. /etc/kolla/admin-openrc.sh
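With the generated clouds.yaml in place (the openstack client looks in the current directory, ~/.config/openstack/, and /etc/openstack/), the entry can then be selected by name:
# Use the "cluster" entry defined by the script above.
export OS_CLOUD=cluster
openstack server list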
We'll do the following steps:
openstack server list
example:
$ openstack server list -c ID -c Name -c Status
+--------------------------------------+-------+--------+
| ID | Name | Status |
+--------------------------------------+-------+--------+
| 7e087c31-7e9c-47f7-a4c4-ebcc20034faa | foo-4 | ACTIVE |
| 977fd250-75d4-4da3-a37c-bc7649047151 | foo-2 | ACTIVE |
| a1182f44-7163-4ad9-89ed-36611f75bac7 | foo-5 | ACTIVE |
| a91951b3-ff4e-46a0-a5fd-20064f02afc9 | foo-3 | ACTIVE |
| f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f | foo-1 | ACTIVE |
+--------------------------------------+-------+--------+
We'll use f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f. That's the UUID of the server foo-1. For convenience, the server UUID and the resource ID used in openstack metric resource are the same.
$ openstack metric resource show f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f
+-----------------------+-------------------------------------------------------------------+
| Field | Value |
+-----------------------+-------------------------------------------------------------------+
| created_by_project_id | 6a180caa64894f04b1d04aeed1cb920e |
| created_by_user_id | 39d9e30374a74fe8b58dee9e1dcd7382 |
| creator | 39d9e30374a74fe8b58dee9e1dcd7382:6a180caa64894f04b1d04aeed1cb920e |
| ended_at | None |
| id | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f |
| metrics | cpu: fe84c113-f98d-42ee-84fd-bdf360842bdb |
| | disk.ephemeral.size: ad79f268-5f56-4ff8-8ece-d1f170621217 |
| | disk.root.size: 6e021f8c-ead0-46e4-bd26-59131318e6a2 |
| | memory.usage: b768ec46-5e49-4d9a-b00d-004f610c152d |
| | memory: 1a4e720a-2151-4265-96cf-4daf633611b2 |
| | vcpus: 68654bc0-8275-4690-9433-27fe4a3aef9e |
| original_resource_id | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f |
| project_id | 8d077dbea6034e5aa45c0146d1feac5f |
| revision_end | None |
| revision_start | 2021-11-09T10:00:46.241527+00:00 |
| started_at | 2021-11-09T09:29:12.842149+00:00 |
| type | instance |
| user_id | 65bfeb6cc8ec4df4a3d53550f7e99a5a |
+-----------------------+-------------------------------------------------------------------+
This list shows the metrics associated with the instance.
If the metrics you are looking for show up here, you are done. If not, continue by checking the Ceilometer agents on the controller and compute nodes.
$ ssh controller-0 -l root
$ podman ps --format "{{.Names}} {{.Status}}" | grep ceilometer
ceilometer_agent_central Up 2 hours ago
ceilometer_agent_notification Up 2 hours ago
On compute nodes, there should be a ceilometer_agent_compute container running:
$ podman ps --format "{{.Names}} {{.Status}}" | grep ceilometer
ceilometer_agent_compute Up 2 hours ago
The metrics are being sent from Ceilometer to a remote defined in /var/lib/config-data/puppet-generated/ceilometer/etc/ceilometer/pipeline.yaml, which may look similar to the following file:
---
sources:
- name: meter_source
meters:
- "*"
sinks:
- meter_sink
sinks:
- name: meter_sink
publishers:
- gnocchi://?filter_project=service&archive_policy=ceilometer-high-rate
- notifier://172.17.1.40:5666/?driver=amqp&topic=metering
In this case, data is sent to both STF and Gnocchi. The next step is to check if there are any errors happening. On controllers and computes, Ceilometer logs are found in /var/log/containers/ceilometer/. The agent-notification.log shows logs from publishing data, as well as errors if sending out metrics or logs fails for some reason. If there are any errors in the log file, it is likely that metrics are not being delivered to the remote.
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging File "/usr/lib/python3.6/site-packages/oslo_messaging/transport.py", line 136, in _send_notification
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging retry=retry)
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_amqp1.py", line 295, in wrap
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging return func(self, *args, **kws)
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_amqp1.py", line 397, in send_notification
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging raise rc
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging oslo_messaging.exceptions.MessageDeliveryFailure: Notify message sent to <Target topic=event.sample> failed: timed out
In this case, it fails to send messages to the STF instance. The following example shows the Gnocchi API not responding or not being accessible:
2021-11-16 10:38:07.707 16 ERROR ceilometer.publisher.gnocchi [-] <html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
(HTTP 503): gnocchiclient.exceptions.ClientException: <html><body><h1>503 Service Unavailable</h1>
For more gnocchi debugging, see the gnocchi section.
Gnocchi sits on controller nodes and consists of three separate containers, gnocchi_metricd, gnocchi_statsd, and gnocchi_api. The latter is for the interaction with the outside world, such as ingesting metrics or returning measurements.
Gnocchi metricd is used for re-calculating metrics, downsampling for lower granularity, etc. Gnocchi logfiles are found under /var/log/containers/gnocchi, and since the Gnocchi API is hooked into httpd, its logfiles are stored under /var/log/containers/httpd/gnocchi-api/. The corresponding files there are either gnocchi_wsgi_access.log or gnocchi_wsgi_error.log.
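A quick way to check those files for recent errors, together with the Ceilometer publisher log mentioned above (run this on a controller node):
sudo tail -n 50 /var/log/containers/httpd/gnocchi-api/gnocchi_wsgi_error.log
sudo grep -i error /var/log/containers/ceilometer/agent-notification.log | tail -n 20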
In the case from above (ceilometer section), where ceilometer could not send metrics to gnocchi, one would also observe log output for the gnocchi API.
For starters, let's see which resources there are:
openstack server list -c ID -c Name -c Status
+--------------------------------------+-------+--------+
| ID | Name | Status |
+--------------------------------------+-------+--------+
| 7e087c31-7e9c-47f7-a4c4-ebcc20034faa | foo-4 | ACTIVE |
| 977fd250-75d4-4da3-a37c-bc7649047151 | foo-2 | ACTIVE |
| a1182f44-7163-4ad9-89ed-36611f75bac7 | foo-5 | ACTIVE |
| a91951b3-ff4e-46a0-a5fd-20064f02afc9 | foo-3 | ACTIVE |
| f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f | foo-1 | ACTIVE |
+--------------------------------------+-------+--------+
To show which metrics are stored for the VM foo-1, one would use the following command:
openstack metric resource show f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f --max-width 75
+-----------------------+-------------------------------------------------+
| Field | Value |
+-----------------------+-------------------------------------------------+
| created_by_project_id | 6a180caa64894f04b1d04aeed1cb920e |
| created_by_user_id | 39d9e30374a74fe8b58dee9e1dcd7382 |
| creator | 39d9e30374a74fe8b58dee9e1dcd7382:6a180caa64894f |
| | 04b1d04aeed1cb920e |
| ended_at | None |
| id | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f |
| metrics | cpu: fe84c113-f98d-42ee-84fd-bdf360842bdb |
| | disk.ephemeral.size: |
| | ad79f268-5f56-4ff8-8ece-d1f170621217 |
| | disk.root.size: |
| | 6e021f8c-ead0-46e4-bd26-59131318e6a2 |
| | memory.usage: |
| | b768ec46-5e49-4d9a-b00d-004f610c152d |
| | memory: 1a4e720a-2151-4265-96cf-4daf633611b2 |
| | vcpus: 68654bc0-8275-4690-9433-27fe4a3aef9e |
| original_resource_id | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f |
| project_id | 8d077dbea6034e5aa45c0146d1feac5f |
| revision_end | None |
| revision_start | 2021-11-09T10:00:46.241527+00:00 |
| started_at | 2021-11-09T09:29:12.842149+00:00 |
| type | instance |
| user_id | 65bfeb6cc8ec4df4a3d53550f7e99a5a |
+-----------------------+-------------------------------------------------+
To view the memory usage between Nov 18 2021 17:00 UTC and 17:05 UTC, one would issue this command:
openstack metric measures show --start 2021-11-18T17:00:00 \
--stop 2021-11-18T17:05:00 \
--aggregation mean
b768ec46-5e49-4d9a-b00d-004f610c152d
+---------------------------+-------------+-------------+
| timestamp | granularity | value |
+---------------------------+-------------+-------------+
| 2021-11-18T17:00:00+00:00 | 3600.0 | 28.87890625 |
| 2021-11-18T17:00:00+00:00 | 60.0 | 28.87890625 |
| 2021-11-18T17:01:00+00:00 | 60.0 | 28.87890625 |
| 2021-11-18T17:02:00+00:00 | 60.0 | 28.87890625 |
| 2021-11-18T17:03:00+00:00 | 60.0 | 28.87890625 |
| 2021-11-18T17:04:00+00:00 | 60.0 | 28.87890625 |
| 2021-11-18T17:00:14+00:00 | 1.0 | 28.87890625 |
| 2021-11-18T17:00:44+00:00 | 1.0 | 28.87890625 |
| 2021-11-18T17:01:14+00:00 | 1.0 | 28.87890625 |
| 2021-11-18T17:01:44+00:00 | 1.0 | 28.87890625 |
| 2021-11-18T17:02:14+00:00 | 1.0 | 28.87890625 |
| 2021-11-18T17:02:44+00:00 | 1.0 | 28.87890625 |
| 2021-11-18T17:03:14+00:00 | 1.0 | 28.87890625 |
| 2021-11-18T17:03:44+00:00 | 1.0 | 28.87890625 |
| 2021-11-18T17:04:14+00:00 | 1.0 | 28.87890625 |
| 2021-11-18T17:04:44+00:00 | 1.0 | 28.87890625 |
+---------------------------+-------------+-------------+
This shows that the data is available with granularities of 3600, 60, and 1 second. The memory usage does not change over time, which is why the values don't change. Please note that if you ask for values with a granularity of 300, the result will be empty:
$ openstack metric measures show --start 2021-11-18T17:00:00 \
--stop 2021-11-18T17:05:00 \
--aggregation mean \
--granularity 300
b768ec46-5e49-4d9a-b00d-004f610c152d
Aggregation method 'mean' at granularity '300.0' for metric b768ec46-5e49-4d9a-b00d-004f610c152d does not exist (HTTP 404)
More information about the metric can actually be listed by using:
openstack metric show --resource-id f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f \
memory.usage \
--max-width 75
+--------------------------------+----------------------------------------+
| Field | Value |
+--------------------------------+----------------------------------------+
| archive_policy/name | ceilometer-high-rate |
| creator | 39d9e30374a74fe8b58dee9e1dcd7382:6a180 |
| | caa64894f04b1d04aeed1cb920e |
| id | b768ec46-5e49-4d9a-b00d-004f610c152d |
| name | memory.usage |
| resource/created_by_project_id | 6a180caa64894f04b1d04aeed1cb920e |
| resource/created_by_user_id | 39d9e30374a74fe8b58dee9e1dcd7382 |
| resource/creator | 39d9e30374a74fe8b58dee9e1dcd7382:6a180 |
| | caa64894f04b1d04aeed1cb920e |
| resource/ended_at | None |
| resource/id | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f |
| resource/original_resource_id | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f |
| resource/project_id | 8d077dbea6034e5aa45c0146d1feac5f |
| resource/revision_end | None |
| resource/revision_start | 2021-11-09T10:00:46.241527+00:00 |
| resource/started_at | 2021-11-09T09:29:12.842149+00:00 |
| resource/type | instance |
| resource/user_id | 65bfeb6cc8ec4df4a3d53550f7e99a5a |
| unit | MB |
+--------------------------------+----------------------------------------+
This shows that, in this case, the archive policy used is ceilometer-high-rate. Its definition can be shown with:
openstack metric archive-policy show ceilometer-high-rate --max-width 75
+---------------------+---------------------------------------------------+
| Field | Value |
+---------------------+---------------------------------------------------+
| aggregation_methods | mean, rate:mean |
| back_window | 0 |
| definition | - timespan: 1:00:00, granularity: 0:00:01, |
| | points: 3600 |
| | - timespan: 1 day, 0:00:00, granularity: 0:01:00, |
| | points: 1440 |
| | - timespan: 365 days, 0:00:00, granularity: |
| | 1:00:00, points: 8760 |
| name | ceilometer-high-rate |
+---------------------+---------------------------------------------------+
That means that, in this case, the only aggregation methods one can use for querying the metrics are mean and rate:mean. Other archive policies could also include methods such as min or max.
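Since rate:mean is listed as available for this archive policy, the same measures query can be repeated with it, for example:
openstack metric measures show --start 2021-11-18T17:00:00 \
                               --stop 2021-11-18T17:05:00 \
                               --aggregation rate:mean \
                               b768ec46-5e49-4d9a-b00d-004f610c152d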
Alarms can be retrieved by issuing
$ openstack alarm list
To create an alarm, for example based on disk.ephemeral.size, one would use something like
openstack alarm create --alarm-action 'log://' \
--ok-action 'log://' \
--comparison-operator ge \
--evaluation-periods 1 \
--granularity 60 \
--aggregation-method mean \
--metric disk.ephemeral.size \
--resource-id f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f \
--name ephemeral \
-t gnocchi_resources_threshold \
--resource-type instance \
--threshold 1
+---------------------------+----------------------------------------+
| Field | Value |
+---------------------------+----------------------------------------+
| aggregation_method | mean |
| alarm_actions | ['log:'] |
| alarm_id | 994a1710-98e8-495f-89b5-f14349575c96 |
| comparison_operator | ge |
| description | gnocchi_resources_threshold alarm rule |
| enabled | True |
| evaluation_periods | 1 |
| granularity | 60 |
| insufficient_data_actions | [] |
| metric | disk.ephemeral.size |
| name | ephemeral |
| ok_actions | ['log:'] |
| project_id | 8d077dbea6034e5aa45c0146d1feac5f |
| repeat_actions | False |
| resource_id | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f |
| resource_type | instance |
| severity | low |
| state | insufficient data |
| state_reason | Not evaluated yet |
| state_timestamp | 2021-11-22T10:16:15.250720 |
| threshold | 1.0 |
| time_constraints | [] |
| timestamp | 2021-11-22T10:16:15.250720 |
| type | gnocchi_resources_threshold |
| user_id | 65bfeb6cc8ec4df4a3d53550f7e99a5a |
+---------------------------+----------------------------------------+
The state here, insufficient data, states that the data gathered or stored is not sufficient to compare against. There is also a state reason given, in this case Not evaluated yet, which gives an explanation. Another valid reason could be No datapoint for granularity 60.
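Once more data has been gathered, the alarm can be re-checked using the alarm ID from the output above:
openstack alarm show 994a1710-98e8-495f-89b5-f14349575c96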
On OpenStack installations deployed via TripleO, aka OSP Director, the log files are located on the separate nodes under /var/log/containers/{service_name}/. The config files for the services are stored under /var/lib/config-data/puppet-generated/<service_name> and are mounted into the containers.
If an OpenStack server (Ironic or Nova) has an error, it shows up in a nested field. That field is hard to read in its normal layout, due to JSON formatting. Using jq to strip the formatting helps a bunch.
The nested field is fault.details.
The -r option strips off the quotes.
[ayoung@ayoung-home scratch]$ openstack server show oracle-server-84-aarch64-vm-small -f json | jq -r '.fault | .details'
Traceback (most recent call last):
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance
block_device_info=block_device_info)
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/driver.py", line 3458, in spawn
block_device_info=block_device_info)
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/driver.py", line 3831, in _create_image
fallback_from_host)
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/driver.py", line 3922, in _create_and_inject_local_root
instance, size, fallback_from_host)
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/driver.py", line 9243, in _try_fetch_image_cache
trusted_certs=instance.trusted_certs)
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/imagebackend.py", line 275, in cache
*args, **kwargs)
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/imagebackend.py", line 642, in create_image
self.verify_base_size(base, size)
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/imagebackend.py", line 331, in verify_base_size
flavor_size=size, image_size=base_size)
nova.exception.FlavorDiskSmallerThanImage: Flavor's disk is too small for requested image. Flavor disk is 21474836480 bytes, image is 34359738368 bytes.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/compute/manager.py", line 2161, in _do_build_and_run_instance
filter_properties, request_spec)
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/compute/manager.py", line 2525, in _build_and_run_instance
reason=e.format_message())
nova.exception.BuildAbortException: Build of instance 5281b93a-0c3c-4d38-965d-568d79abb530 aborted: Flavor's disk is too small for requested image. Flavor disk is 21474836480 bytes, image is 34359738368 bytes.
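If you only want the short summary rather than the full traceback, the fault object also carries a message field (per the standard Nova fault structure), so something like this should work:
openstack server show oracle-server-84-aarch64-vm-small -f json | jq -r '.fault | .message'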
My team is running a small OpenStack cluster with responsibility for providing bare metal nodes via Ironic. Currently, we have a handful of nodes that are not usable. They show up as “Cleaning failed.” I’m learning how to debug this process.
The following ipmitool commands allow us to set the machine to PXE boot, remotely power cycle the machine, and view what happens during the boot process.
ipmitool -H $H -U $U -I lanplus -P $P chassis power status
ipmitool -H $H -U $U -I lanplus -P $P chassis power on
ipmitool -H $H -U $U -I lanplus -P $P chassis power off
ipmitool -H $H -U $U -I lanplus -P $P chassis power cycle
ipmitool -H $H -U $U -I lanplus -P $P sol activate
ipmitool -H $H -U $U -I lanplus -P $P chassis bootdev pxe
#Set Boot Device to pxe
To tail the log and only see entries relevant to the UUID of the node I am cleaning:
tail -f /var/log/kolla/ironic/ironic-conductor.log | grep $UUID
What is the IPMI address for a node?
openstack baremetal node show fab1bcf7-a7fc-4c19-9d1d-fc4dbc4b2281 -f json | jq '.driver_info | .ipmi_address'
"10.76.97.171"
We have a script that prepares the PXE server to accept a cleaning request from a node. It performs the following three actions (don’t do these yet):
openstack baremetal node maintenance unset ${i}
openstack baremetal node manage ${i}
openstack baremetal node provide ${i}
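Wrapped in the same kind of loop used below, that script boils down to roughly this sketch (again, don't run it until the PXE server is ready):
# Rough sketch of the prep script: run the three steps above for every node
# currently in "clean failed", using the same jq selection as the loops below.
for i in `openstack baremetal node list -f json | jq -r '.[] | select(."Provisioning State"=="clean failed") | .UUID' `;
do
    openstack baremetal node maintenance unset ${i} ;
    openstack baremetal node manage ${i} ;
    openstack baremetal node provide ${i} ;
done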
To look at the IPMI power status (and confirm that IPMI is set up correctly for the nodes):
for node in `openstack baremetal node list -f json | jq -r '.[] | select(."Provisioning State"=="clean failed") | .UUID' ` ;
do
echo $node ;
METAL_IP=`openstack baremetal node show $node -f json | jq -r '.driver_info | .ipmi_address' ` ;
echo $METAL_IP ;
ipmitool -I lanplus -H $METAL_IP -L ADMINISTRATOR -U admin -R 12 -N 5 -P admin chassis power status ;
done
Yes, I did that all on one line, hence the semicolons.
A couple of other one-liners. This one selects all active nodes and gives you their node ID and IPMI IP address:
for node in `openstack baremetal node list -f json | jq -r '.[] | select(."Provisioning State"=="active") | .UUID' ` ; do echo $node ; openstack baremetal node show $node -f json | jq -r '.driver_info | .ipmi_address' ;done
And you can swap out active with other values. For example, if you want to see what nodes are in either the error or manageable states:
openstack baremetal node list -f json | jq -r '.[] | select(."Provisioning State"=="error" or ."Provisioning State"=="manageable") | .UUID'
If I want to ensure I can PXE boot, outside of the OpenStack operations, I can track the state in a console. I like to have this running in a dedicated terminal: open the SOL.
ipmitool -H 10.76.97.176 -U ADMIN -I lanplus -P ADMIN sol activate
And in another terminal, set the machine to PXE boot, then power cycle it:
ipmitool -H 10.76.97.176 -U ADMIN -I lanplus -P ADMIN chassis bootdev pxe
Set Boot Device to pxe
[ayoung@ayoung-home keystone]$ ipmitool -H 10.76.97.176 -U ADMIN -I lanplus -P ADMIN chassis power cycle
Chassis Power Control: Cycle
If the Ironic server is not ready to accept the PXE request, your server will let you know with a message like this one:
>>Checking Media Presence......
>>Media Present......
>>Start PXE over IPv4 on MAC: 1C-34-DA-51-D6-C0.
PXE-E18: Server response timeout.
ERROR: Boot option loading failed
openstack baremetal node list --provision-state "clean failed" -f value -c UUID
Produces output like this:
8470e638-0085-470c-9e51-b2ed016569e1
5411e7e8-8113-42d6-a966-8cacd1554039
08c14110-88aa-4e45-b5c1-4054ac49115a
3f5f510c-a313-4e40-943a-366917ec9e44
I’ll track what is going on in the log for a specific node by running tail -f and grepping for the uuid of the node:
tail -f /var/log/kolla/ironic/ironic-conductor.log | grep 5411e7e8-8113-42d6-a966-8cacd1554039
If you run the three commands I showed above, the Ironic server should be prepared for cleaning and will accept the PXE request. I can execute these one at a time and track the state in the conductor log. If I kick off a clean, eventually, I see entries like this in the conductor log (I’m removing the time stamps and request ids for readability):
ERROR ironic.conductor.task_manager [] Node 5411e7e8-8113-42d6-a966-8cacd1554039 moved to provision state "clean failed" from state "clean wait"; target provision state is "available"
INFO ironic.conductor.utils [] Successfully set node 5411e7e8-8113-42d6-a966-8cacd1554039 power state to power off by power off.
INFO ironic.drivers.modules.network.flat [] Removing ports from cleaning network for node 5411e7e8-8113-42d6-a966-8cacd1554039
INFO ironic.common.neutron [] Successfully removed node 5411e7e8-8113-42d6-a966-8cacd1554039 neutron ports.
And I can trigger this manually if a run is taking too long by running:
openstack baremetal node abort $UUID
The command to kick off the clean process is
openstack baremetal node provide $UUID
In the conductor log, that should show messages like this (again, edited for readability)
Node 5411e7e8-8113-42d6-a966-8cacd1554039 moved to provision state "cleaning" from state "manageable"; target provision state is "available"
Adding cleaning network to node 5411e7e8-8113-42d6-a966-8cacd1554039
For node 5411e7e8-8113-42d6-a966-8cacd1554039 in network de931fcc-32a0-468e-8691-ffcb43bf9f2e, successfully created ports (ironic ID: neutron ID): {'94306ff5-5cd4-4fdd-a33e-a0202c34d3d0': 'd9eeb64d-468d-4a9a-82a6-e70d54b73e62'}.
Successfully set node 5411e7e8-8113-42d6-a966-8cacd1554039 power state to power on by rebooting.
Node 5411e7e8-8113-42d6-a966-8cacd1554039 moved to provision state "clean wait" from state "cleaning"; target provision state is "available"
At this point, the most interesting thing is to see what is happening on the node. ipmitool sol activate provides a running log. If you are lucky, the PXE process kicks off and a Debian-based kernel should start booting. My company has a specific login set for the machines:
debian login: ampere
Password:
Linux debian 5.10.0-6-arm64 #1 SMP Debian 5.10.28-1 (2021-04-09) aarch64
After this, I use sudo -i to run as root.
$ sudo -i
...
# ps -ef | grep ironic
root 2369 1 1 14:26 ? 00:00:02 /opt/ironic-python-agent/bin/python3 /usr/local/bin/ironic-python-agent --config-dir /etc/ironic-python-agent.d/
Looking for logs:
ls /var/log/
btmp ibacm.log opensm.0x9a039bfffead6720.log private
chrony lastlog opensm.0x9a039bfffead6721.log wtmp
No ironic log. Is this thing even on the network?
# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp1s0f0np0: mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 98:03:9b:ad:67:20 brd ff:ff:ff:ff:ff:ff
3: enp1s0f1np1: mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 98:03:9b:ad:67:21 brd ff:ff:ff:ff:ff:ff
4: enxda90910dd11e: mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether da:90:91:0d:d1:1e brd ff:ff:ff:ff:ff:ff
Nope. OK, let’s get it on the network:
# dhclient
[ 486.508054] mlx5_core 0000:01:00.1 enp1s0f1np1: Link down
[ 486.537116] mlx5_core 0000:01:00.1 enp1s0f1np1: Link up
[ 489.371586] mlx5_core 0000:01:00.0 enp1s0f0np0: Link down
[ 489.394050] IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0f1np1: link becomes ready
[ 489.400646] mlx5_core 0000:01:00.0 enp1s0f0np0: Link up
[ 489.406226] IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0f0np0: link becomes ready
root@debian:~# [ 500.596626] sr 0:0:0:0: [sr0] CDROM not ready. Make sure there is a disc in the drive.
ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp1s0f0np0: mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 98:03:9b:ad:67:20 brd ff:ff:ff:ff:ff:ff
inet 192.168.97.178/24 brd 192.168.97.255 scope global dynamic enp1s0f0np0
valid_lft 86386sec preferred_lft 86386sec
inet6 fe80::9a03:9bff:fead:6720/64 scope link
valid_lft forever preferred_lft forever
3: enp1s0f1np1: mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 98:03:9b:ad:67:21 brd ff:ff:ff:ff:ff:ff
inet6 fe80::9a03:9bff:fead:6721/64 scope link
valid_lft forever preferred_lft forever
4: enxda90910dd11e: mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
link/ether da:90:91:0d:d1:1e brd ff:ff:ff:ff:ff:ff
inet6 fe80::d890:91ff:fe0d:d11e/64 scope link
valid_lft forever preferred_lft forever
And…quite shortly thereafter in the conductor log:
Agent on node 5411e7e8-8113-42d6-a966-8cacd1554039 returned cleaning command success, moving to next clean step
Node 5411e7e8-8113-42d6-a966-8cacd1554039 moved to provision state "cleaning" from state "clean wait"; target provision state is "available"
Executing cleaning on node 5411e7e8-8113-42d6-a966-8cacd1554039, remaining steps: []
Successfully set node 5411e7e8-8113-42d6-a966-8cacd1554039 power state to power off by power off.
Removing ports from cleaning network for node 5411e7e8-8113-42d6-a966-8cacd1554039
Successfully removed node 5411e7e8-8113-42d6-a966-8cacd1554039 neutron ports.
Node 5411e7e8-8113-42d6-a966-8cacd1554039 cleaning complete
Node 5411e7e8-8113-42d6-a966-8cacd1554039 moved to provision state "available" from state "cleaning"; target provision state is "None"
So, in our case, the issue seems to be that the IPA image does not have DHCP enabled.
This Thursday at 14:00 UTC Francesco and I will be in a panel on OpenInfra Live Episode 24: OpenStack and Ceph.
by Unknown (noreply@blogger.com) at September 27, 2021 10:19 PM
OS Migrate is a toolbox for content migration (workloads and more) between OpenStack clouds. Let’s dive into why you’d use it, some of its most notable features, and a bit of how it works.
Why move cloud content between OpenStacks? Imagine these situations:
Old cloud hardware is obsolete, you’re buying new. A new green field deployment will be easier than gradual replacement of hardware in the original cloud.
You want to make fundamental changes to your OpenStack deployment, that would be difficult or risky to perform on a cloud which is already providing service to users.
You want to upgrade to a new release of OpenStack, but you want to cut down on associated cloud-wide risk, or you can’t schedule cloud-wide control plane downtime.
You want to upgrade to a new release of OpenStack, but the cloud users should be given a choice when to stop using the old release and start using the new.
A combination of the above.
In such situations, running (at least) two clouds in parallel for a period of time is often the preferable path. And when you run parallel clouds, perhaps with the intention of decommissioning some of them eventually, a tool may come in handy to copy/migrate the content that users have created (virtual networks, routers, security groups, machines, block storage, images etc.) from one cloud to another. This is what OS Migrate is for.
Now we know OS Migrate copies/moves content from one OpenStack to another. But there is more to say. Some of the design decisions that went into OS Migrate should make it a tool of choice:
Uses standard OpenStack APIs. You don’t need to install any plugins into your clouds before using OS Migrate, and OS Migrate does not need access to the backends of your cloud (databases etc.).
Runnable with tenant privileges. For moving tenant-owned content, OS Migrate only needs tenant credentials (not administrative credentials). This naturally reduces risks associated with the migration.
If desired, cloud tenants can even use OS Migrate on their own. Cloud admins do not necessarily need to get involved.
Admin credentials are only needed when the content being migrated requires admin privileges to be created (e.g. public Glance images).
Transparent. The metadata of exported content is in human-readable YAML files. You can inspect what has been exported from the source cloud, and tweak it if necessary, before executing the import into the destination cloud.
Stateless. There is no database in OS Migrate that could get out of sync with reality. The source of migration information is the human-readable YAML files. ID-to-ID mappings are not kept; entry-point resources are referred to by names.
Idempotent. In case of an issue, fix the root cause and re-run, be it export or import. OS Migrate has mechanisms against duplicate exports and duplicate imports.
Cherry-pickable. There’s no need to migrate all content with OS Migrate. Only migrate some tenants, or further scope to some of their resource types, or further limit the resource type exports/imports by a list of resource names or regular expression patterns. Use as much or as little of OS Migrate as you need.
Implemented as an Ansible collection. When learning to work with OS Migrate, most importantly you’ll be learning to work with Ansible, an automation tool used across the IT industry. If you already know Ansible, you’ll feel right at home with OS Migrate.
If you want to use OS Migrate, the best thing I can do here is point towards the OS Migrate User Documentation. If you just want to get a glimpse for now, read on.
As OS Migrate is an Ansible collection, the main mode of use is setting Ansible variables and running playbooks shipped with the collection.
Should the default playbooks not fit a particular use case, a technically savvy user could also utilize the collection’s roles and modules as building blocks to craft their own playbooks. However, as I wrote above in the point about cherry-picking, we’ve tried to make the default playbooks quite generically usable.
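To give a flavor of that, here is a rough sketch of exporting tenant networks; the inventory name, variables file, and playbook reference are illustrative rather than exact, so check the User Documentation for the real invocation:
ansible-playbook -i inventory.yml \
  -e @os-migrate-vars.yml \
  os_migrate.os_migrate.export_networks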
In OS Migrate we differentiate between two main migration types with respect to what resources we are migrating: pre-workload migration, and workload migration.
Pre-workload migration focuses on content/resources that can be copied to the destination cloud without affecting workloads in the source cloud. It can be typically done with little timing pressure, ahead of time before migrating workloads. This includes resources like tenant networks, subnets, routers, images, security groups etc.
The content is serialized as editable YAML files to the Migrator host (the machine running the Ansible playbooks), and then resources are created in the destination according to the YAML serializations.
Workload migration focuses on copying VMs and their attached Cinder volumes, and on creating floating IPs for VMs in the destination cloud. The VM migration between clouds is a “cold” migration. VMs first need to be stopped and then they are copied.
With regards to the boot disk of the VM, we support two options: either the destination VM’s boot disk is created from a Glance image, or the source VM’s boot disk snapshot is copied into the destination cloud as a Cinder volume and the destination VM is created as boot-from-volume. There is a migration parameter controlling this behavior on a per-VM basis. Additional Cinder volumes attached to the source VM are copied.
The data path for VMs and volumes is slightly different than in the pre-workload migration. Only metadata gets exported onto the Migrator host. For moving the binary data, special VMs called conversion hosts are deployed, one in the source and one in the destination. This is done for performance reasons, to allow the VMs’ and volumes’ binary data to travel directly from cloud to cloud without going through the (perhaps external) Migrator host as an intermediary.
Now that we have an overview of OS Migrate, let’s finish with some links where more info can be found:
OS Migrate Documentation is the primary source of information on OS Migrate.
OS Migrate Matrix Channel is monitored by devs for any questions you might have.
Issues on Github is the right place to report any bugs, and you can ask questions there too.
If you want to contribute (code, docs, …), see OS Migrate Developer Documentation.
Have a good day!
RDO Wallaby Released
Contributors
collectd itself is intended as a lightweight agent for collecting metrics and events. In larger infrastructures, the data is sent over the network to a central point, where it is stored and processed further.
This introduces a potential issue: what happens if the remote endpoint that data should be written to is not available? The traditional network plugin uses UDP, which is by definition unreliable.
Collectd has a queue of values to be written to an output plugin, such as write_http or amqp1. When metrics are due to be written, collectd iterates over that queue and tries to write the data to the endpoint. If the write was successful, the data is removed from the queue. The little word if hints that there is a chance the data doesn't get removed. The question is: what happens, or what should be done?
There is no easy answer to this. Some people tend to ignore missed metrics, some don't. The way to address this is to cap the queue at a given length and to remove the oldest data when new data comes in. The parameters are WriteQueueLimitHigh and WriteQueueLimitLow. If they are unset, the queue is not limited and will grow until memory runs out. For predictability, you should set these two values to the same number. Finding the right value requires a bit of experimentation; if values are dropped, you will see that in the log file.
When collectd is configured as part of Red Hat OpenStack Platform, the following config snippet can be used:
parameter_defaults:
  ExtraConfig:
    collectd::write_queue_limit_high: 100
    collectd::write_queue_limit_low: 100
Another parameter can be used to explicitly limit the queue length when the amqp1 plugin is used for sending out data: the SendQueueLimit parameter, which serves the same purpose but can differ from the global WriteQueueLimitHigh and WriteQueueLimitLow.
parameter_defaults:
  ExtraConfig:
    collectd::plugin::amqp1::send_queue_limit: 50
In almost all cases, the issue of collectd using too much memory can be tracked down to a write endpoint not being available, data being dropped occasionally, and so on.
Recently, I bought a couple of Raspberry Pi 4 boards, one with 4 GB and two equipped with 8 GB of RAM. When I bought the first one, there was no option to get more memory. However, I saw this as a game and thought to give it a try. I also bought SSDs for them and USB3-to-SATA adapters. Before purchasing anything, you may want to take a look at James Archer's page. Unfortunately, there are a couple of adapters on the market which don't work that well.
Initially, I followed the description to deploy Fedora 32; it works the same way for Fedora 33 Server (in my case here).
Because Ceph requires a partition (or better: a whole disk), I used the traditional setup with partitions and no LVM.
git clone https://github.com/kubernetes-sigs/kubespray
cd kubespray
I followed the documentation and created an inventory. For the container runtime I picked crio, and calico as the network plugin. Because of an issue, I had to patch roles/download/defaults/main.yml:
diff --git a/roles/download/defaults/main.yml b/roles/download/defaults/main.yml
index a97be5a6..d4abb341 100644
--- a/roles/download/defaults/main.yml
+++ b/roles/download/defaults/main.yml
@@ -64,7 +64,7 @@ quay_image_repo: "quay.io"
# TODO(mattymo): Move calico versions to roles/network_plugins/calico/defaults
# after migration to container download
-calico_version: "v3.16.5"
+calico_version: "v3.15.2"
calico_ctl_version: "{{ calico_version }}"
calico_cni_version: "{{ calico_version }}"
calico_policy_version: "{{ calico_version }}"
@@ -520,13 +520,13 @@ etcd_image_tag: "{{ etcd_version }}{%- if image_arch != 'amd64' -%}-{{ image_arc
flannel_image_repo: "{{ quay_image_repo }}/coreos/flannel"
flannel_image_tag: "{{ flannel_version }}"
calico_node_image_repo: "{{ quay_image_repo }}/calico/node"
-calico_node_image_tag: "{{ calico_version }}"
+calico_node_image_tag: "{{ calico_version }}-arm64"
calico_cni_image_repo: "{{ quay_image_repo }}/calico/cni"
-calico_cni_image_tag: "{{ calico_cni_version }}"
+calico_cni_image_tag: "{{ calico_cni_version }}-arm64"
calico_policy_image_repo: "{{ quay_image_repo }}/calico/kube-controllers"
-calico_policy_image_tag: "{{ calico_policy_version }}"
+calico_policy_image_tag: "{{ calico_policy_version }}-arm64"
calico_typha_image_repo: "{{ quay_image_repo }}/calico/typha"
-calico_typha_image_tag: "{{ calico_typha_version }}"
+calico_typha_image_tag: "{{ calico_typha_version }}-arm64"
pod_infra_image_repo: "{{ kube_image_repo }}/pause"
pod_infra_image_tag: "{{ pod_infra_version }}"
install_socat_image_repo: "{{ docker_image_repo }}/xueshanf/install-socat"
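With the patch applied, the cluster deployment itself is roughly the standard kubespray run; the host IPs below are placeholders for my three Pis, so adjust to taste:
cp -rfp inventory/sample inventory/picluster
declare -a IPS=(192.168.1.101 192.168.1.102 192.168.1.103)
CONFIG_FILE=inventory/picluster/hosts.yaml python3 contrib/inventory_builder/inventory.py ${IPS[@]}
ansible-playbook -i inventory/picluster/hosts.yaml --become --become-user=root cluster.yml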
Ceph requires a raw partition. Make sure you have an empty partition available.
[root@node1 ~]# lsblk -f
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINT
sda
├─sda1
│ vfat FAT32 UEFI 7DC7-A592
├─sda2
│ vfat FAT32 CB75-24A9 567.9M 1% /boot/efi
├─sda3
│ xfs cab851cb-1910-453b-ae98-f6a2abc7f0e0 804.7M 23% /boot
├─sda4
│
├─sda5
│ xfs 6618a668-f165-48cc-9441-98f4e2cc0340 27.6G 45% /
└─sda6
In my case, sda4 and sda6 are not formatted. sda4 is very small and will be ignored; sda6 will be used.
Using Rook is pretty straightforward:
git clone --single-branch --branch v1.5.4 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
kubectl create -f cluster.yaml
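To verify that the cluster comes up, the toolbox shipped in the same examples directory can presumably be used, with something like:
kubectl create -f toolbox.yaml
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status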
While reviewing the comments on the Ironic spec for Secure RBAC, I had to ask myself if the “project” construct makes sense for Ironic. I still think it does, but I’ll write this down to see if I can clarify it for me, and maybe for you, too.
Baremetal servers change. The whole point of Ironic is to control the change of Baremetal servers from inanimate pieces of metal to “really useful engines.” This needs to happen in a controlled and unsurprising way.
Ironic the server does what it is told. If a new piece of metal starts sending out DHCP requests, Ironic is going to PXE boot it. This is the start of this new piece of metal’s journey of self-discovery. At least as far as Ironic is concerned.
But really, someone had to rack and wire said piece of metal. Likely the person that did this is not the person that is going to run workloads on it in the end. They might not even work for the same company; they might be a delivery person from Dell or Supermicro. So, once they are done with it, they don’t own it any more.
Who does? Who owns a piece of metal before it is enrolled in the OpenStack baremetal service?
No one. It does not exist.
OK, so let’s go back to someone pushing the button, booting our server for the first time, and it doing its PXE boot thing.
Or, we get the MAC address and enter that into the ironic database, so that when it does boot, we know about it.
Either way, Ironic is really the playground monitor, just making sure it plays nice.
What if Ironic is a multi-tenant system? Someone needs to be able to transfer the baremetal server from where ever it lands up front to the people that need to use it.
I suspect that transferring metal from project to project is going to be one of the main use cases after the sun has set on day one.
So, who should be allowed to say what project a piece of baremetal can go to?
Well, in Keystone, we have the idea of hierarchy. A Project is owned by a domain, and a project can be nested inside another project.
But this information is not passed down to Ironic. There is no way to get a token for a project that shows its parent information. But a remote service could query the project hierarchy from Keystone.
Say I want to transfer a piece of metal from one project to another. Should I have a token for the source project or the remote project? OK, dumb question, I should definitely have a token for the source project. The smart question is whether I should also have a token for the destination project.
Sure, why not. Two tokens: one that has the “delete” role and one that has the “create” role.
The only problem is that nothing like this exists in OpenStack. But it should.
We could fake it with hierarchy; I can pass things up and down the project tree. But that really does not do one bit of good. People don’t really use the tree like that. They should. We built a perfectly nice tree and they ignore it. Poor, ignored, sad, lonely tree.
Actually, it has no feelings. Please stop anthropomorphising the tree.
What you could do is create the destination object, kind of a potential piece-of-metal or metal-receiver. This receiver object gets a UUID. You pass this UUID to the “move” API. But you call the MOVE API with a token for the source project. The move is done atomically. Let’s call this thing identified by a UUID a move-request.
The order of operations could be done in reverse. The operator could create the move request on the source, and then pass that to the receiver. This might actually make more sense, as you need to know about the object before you can even think to move it.
Both workflows seem to have merit.
And…this concept seems to be something that OpenStack needs in general.
In fact, why should the API not be a generic API? I mean, it would have to be per service, but the same API could be used to transfer VMs between projects in Nova and volumes between projects in Cinder. The API would have two verbs: one for creating a new move request, and one for accepting it.
POST /thingy/v3.14/resource?resource_id=abcd&destination=project_id
If this is called with a token, it needs to be scoped. If it is scoped to the project_id in the API, it creates a receiving-type request. If it is scoped to the project_id that owns the resource, it is a sending-type request. Either way, it returns a URL. Call GET on that URL and you get information about the transfer. Call PATCH on it with the appropriately scoped token, and the resource is transferred. And maybe the PATCH requires enough information to prove that you know what you are doing: maybe you have to specify the source and target projects in that request.
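To make the idea concrete, here is a purely hypothetical sketch of the two calls; none of these paths, parameters, or tokens exist today:
# receiver side: create a move-request, scoped to the destination project
curl -X POST "$ENDPOINT/thingy/v3.14/resource?resource_id=abcd&destination=$DEST_PROJECT_ID" \
     -H "X-Auth-Token: $DEST_PROJECT_TOKEN"
# ...returns a URL such as $ENDPOINT/thingy/v3.14/move-requests/<uuid>

# sender side: accept the transfer with a token scoped to the project that owns the resource
curl -X PATCH "$ENDPOINT/thingy/v3.14/move-requests/<uuid>" \
     -H "X-Auth-Token: $SOURCE_PROJECT_TOKEN" \
     -d '{"source_project": "...", "destination_project": "..."}'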
A foolish consistency is the hobgoblin of little minds.
Edit: OK, this is not a new idea. Cinder went through the same thought process according to Duncan Thomas. The result is this API: https://docs.openstack.org/api-ref/block-storage/v3/index.html#volume-transfer
Which looks like it then morphed to this one:
https://docs.openstack.org/api-ref/block-storage/v3/index.html#volume-transfers-volume-transfers-3-55-or-later
I've had to re-teach myself how to do this so I'm writing my own notes.
Prerequisites:
Once you have your environment ready, run a test with the name from step 3:
./scripts/run-local-test tripleo_derived_parameters
Some tests in CI are configured to use `--skip-tags`. You can do this for your local tests too by setting the appropriate environment variables. For example:
export TRIPLEO_JOB_ANSIBLE_ARGS="--skip-tags run_ceph_ansible,run_uuid_ansible,ceph_client_rsync,clean_fetch_dir"
./scripts/run-local-test tripleo_ceph_run_ansible
by Unknown (noreply@blogger.com) at December 15, 2020 03:46 PM
Look back at our Pushing Keystone over the Edge presentation from the OpenStack Summit. Many of the points we make are problems faced by any application trying to scale across multiple datacenters. Cassandra is a database designed to deal with this level of scale. So Cassandra may well be a better choice than MySQL or other RDBMS as a datastore to Keystone. What would it take to enable Cassandra support for Keystone?
Let’s start with the easy part: defining the tables. Let’s look at how we define the Federation back end for SQL. We use SQLAlchemy to handle the migrations; we will need something comparable for Cassandra Query Language (CQL), but we also need to translate the table definitions themselves.
Before we create the tables, we need to create a keyspace. I am going to make separate keyspaces for each of the subsystems in Keystone: Identity, Assignment, Federation, and so on. Here’s the Federated one:
CREATE KEYSPACE keystone_federation WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': '3'} AND durable_writes = true;
The Identity provider table is defined like this:
idp_table = sql.Table(
    'identity_provider',
    meta,
    sql.Column('id', sql.String(64), primary_key=True),
    sql.Column('enabled', sql.Boolean, nullable=False),
    sql.Column('description', sql.Text(), nullable=True),
    mysql_engine='InnoDB',
    mysql_charset='utf8')

idp_table.create(migrate_engine, checkfirst=True)
The comparable CQL to create a table would look like this:
CREATE TABLE identity_provider (id text PRIMARY KEY , enables boolean , description text);
However, when I describe the schema to view the table definition, we see that there are many tuning and configuration parameters that are defaulted:
CREATE TABLE federation.identity_provider (
id text PRIMARY KEY,
description text,
enables boolean
) WITH additional_write_policy = '99p'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND cdc = false
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND extensions = {}
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'BLOCKING'
AND speculative_retry = '99p';
I don’t know Cassandra well enough to say if these are sane defaults to have in production. I do know that someone, somewhere, is going to want to tweak them, and we are going to have to provide a means to do so without battling the upgrade scripts. I suspect we are going to want to only use the short form (what I typed into the CQL prompt) in the migrations, not the form with all of the options. In addition, we might want an if not exists clause on the table creation to allow people to make these changes themselves. Then again, that might make things get out of sync. Hmmm.
There are three more entities in this back end:
CREATE TABLE federation_protocol (id text, idp_id text, mapping_id text, PRIMARY KEY(id, idp_id) );
cqlsh:federation> CREATE TABLE mapping (id text primary key, rules text, );
CREATE TABLE service_provider ( auth_url text, id text primary key, enabled boolean, description text, sp_url text, RELAY_STATE_PREFIX text);
One thing that is interesting is that we will not be limiting the ID fields to 32, 64, or 128 characters. There is no performance benefit to doing so in Cassandra, nor is there any way to enforce the length limits. From a Keystone perspective, there is not much value either; we still need to validate the UUIDs in Python code. We could autogenerate the UUIDs in Cassandra, and there might be some benefit to that, but it would diverge from the logic in the Keystone code, and explode the test matrix.
There is only one foreign key in the SQL section; the federation protocol has an idp_id that points to the identity provider table. We’ll have to accept this limitation and ensure the integrity is maintained in code. We can do this by looking up the Identity provider before inserting the protocol entry. Since creating a Federated entity is a rare and administrative task, the risk here is vanishingly small. It will be more significant elsewhere.
For access to the database, we should probably use Flask-CQLAlchemy. Fortunately, Keystone is already a Flask-based project, so the two align nicely.
For migration support, it looks like the best option out there is cassandra-migrate.
An effort like this would best be started out of tree, with an expectation that it would be merged in once it had shown a degree of maturity. Thus, I would put it into a namespace that would not conflict with the existing keystone project. The python imports would look like:
from keystone.cassandra import migrations
from keystone.cassandra import identity
from keystone.cassandra import federation
This could go in its own git repo and be separately pip installed for development. The entrypoints would be registered such that the configuration file would have entries like:
[application_credential]
driver = cassandra

Any tuning of the database could be put under a [cassandra] section of the conf file, or tuning for individual sections could be in keys prefixed with cassandra_ in the appropriate sections, such as application_credential as shown above.
It might be interesting to implement a Cassandra token backend and use the default_time_to_live value on the table to control the lifespan and automate the cleanup of the tables. This might provide some performance benefit over the fernet approach, as the token data would be cached. However, the drawbacks due to token invalidation upon change of data would far outweigh the benefits unless the TTL was very short, perhaps 5 minutes.
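As a purely hypothetical illustration of that idea, the keyspace and columns below are invented; the point is only the default_time_to_live table option:
cqlsh -e "CREATE TABLE keystone_token.token (id text PRIMARY KEY, user_id text, payload text) WITH default_time_to_live = 300;"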
Just making it work is one thing. In a follow-on article, I’d like to go through what it would take to stretch a cluster from one datacenter to another, and to make sure that the other considerations we discussed in that presentation are covered.
Feedback?
RDO Victoria Released
The RDO community is pleased to announce the general availability of the RDO build for OpenStack Victoria for RPM-based distributions, CentOS Linux and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Victoria is the 22nd release from the OpenStack project, which is the work of more than 1,000 contributors from around the world.
The release is already available on the CentOS mirror network at http://mirror.centos.org/centos/8/cloud/x86_64/openstack-victoria/.
The RDO community project curates, packages, builds, tests and maintains a complete OpenStack component set for RHEL and CentOS Linux and is a member of the CentOS Cloud Infrastructure SIG. The Cloud Infrastructure SIG focuses on delivering a great user experience for CentOS Linux users looking to build and maintain their own on-premise, public or hybrid clouds.
All work on RDO and on the downstream release, Red Hat OpenStack Platform, is 100% open source, with all code changes going upstream first.
PLEASE NOTE: RDO Victoria provides packages for CentOS 8 and Python 3 only. Please use the Train release for CentOS 7 and Python 2.7.
Interesting things in the Victoria release include:
Other highlights of the broader upstream OpenStack project may be read via https://releases.openstack.org/victoria/highlights.
Contributors
During the Victoria cycle, we saw the following new RDO contributors:
Amy Marrich (spotz)
Daniel Pawlik
Douglas Mendizábal
Lance Bragstad
Martin Chacon Piza
Paul Leimer
Pooja Jadhav
Qianbiao NG
Rajini Karthik
Sandeep Yadav
Sergii Golovatiuk
Steve Baker
Welcome to all of you and Thank You So Much for participating!
But we wouldn’t want to overlook anyone. A super massive Thank You to all 58 contributors who participated in producing this release. This list includes commits to rdo-packages, rdo-infra, and redhat-website repositories:
Adam Kimball
Ade Lee
Alan Pevec
Alex Schultz
Alfredo Moralejo
Amol Kahat
Amy Marrich (spotz)
Arx Cruz
Bhagyashri Shewale
Bogdan Dobrelya
Cédric Jeanneret
Chandan Kumar
Damien Ciabrini
Daniel Pawlik
Dmitry Tantsur
Douglas Mendizábal
Emilien Macchi
Eric Harney
Francesco Pantano
Gabriele Cerami
Gael Chamoulaud
Gorka Eguileor
Grzegorz Grasza
Harald Jensås
Iury Gregory Melo Ferreira
Jakub Libosvar
Javier Pena
Joel Capitao
Jon Schlueter
Lance Bragstad
Lon Hohberger
Luigi Toscano
Marios Andreou
Martin Chacon Piza
Mathieu Bultel
Matthias Runge
Michele Baldessari
Mike Turek
Nicolas Hicher
Paul Leimer
Pooja Jadhav
Qianbiao.NG
Rabi Mishra
Rafael Folco
Rain Leander
Rajini Karthik
Riccardo Pittau
Ronelle Landy
Sagi Shnaidman
Sandeep Yadav
Sergii Golovatiuk
Slawek Kaplonski
Soniya Vyas
Sorin Sbarnea
Steve Baker
Tobias Urdin
Wes Hayutin
Yatin Karel
The Next Release Cycle
At the end of one release, focus shifts immediately to the next release, i.e. Wallaby.
Get Started
There are three ways to get started with RDO.
To spin up a proof of concept cloud, quickly, and on limited hardware, try an All-In-One Packstack installation. You can run RDO on a single node to get a feel for how it works.
For a production deployment of RDO, use TripleO and you’ll be running a production cloud in short order.
Finally, for those that don’t have any hardware or physical resources, there’s the OpenStack Global Passport Program. This is a collaborative effort between OpenStack public cloud providers to let you experience the freedom, performance and interoperability of open source infrastructure. You can quickly and easily gain access to OpenStack infrastructure via trial programs from participating OpenStack public cloud providers around the world.
Get Help
The RDO Project has our users@lists.rdoproject.org for RDO-specific users and operators. For more developer-oriented content we recommend joining the dev@lists.rdoproject.org mailing list. Remember to post a brief introduction about yourself and your RDO story. The mailing lists archives are all available at https://mail.rdoproject.org. You can also find extensive documentation on RDOproject.org.
The #rdo channel on Freenode IRC is also an excellent place to find and give help.
We also welcome comments and requests on the CentOS devel mailing list and the CentOS and TripleO IRC channels (#centos, #centos-devel, and #tripleo on irc.freenode.net), however we have a more focused audience within the RDO venues.
Get Involved
To get involved in the OpenStack RPM packaging effort, check out the RDO contribute pages, peruse the CentOS Cloud SIG page, and inhale the RDO packaging documentation.
Join us in #rdo and #tripleo on the Freenode IRC network and follow us on Twitter @RDOCommunity. You can also find us on Facebook and YouTube.
TheJulia was kind enough to update the docs for Ironic to show me how to include IPMI information when creating nodes.
for UUID in `openstack baremetal node list -f json | jq -r '.[] | .UUID' ` ; do openstack baremetal node delete $UUID; done
I removed the common IPMI data from each definition, as there is a password there, and I will set that afterwards on all nodes.
{
  "nodes": [
    {
      "ports": [
        {
          "address": "00:21:9b:93:d0:90"
        }
      ],
      "name": "zygarde",
      "driver": "ipmi",
      "driver_info": {
        "ipmi_address": "192.168.123.10"
      }
    },
    {
      "ports": [
        {
          "address": "00:21:9b:9b:c4:21"
        }
      ],
      "name": "umbreon",
      "driver": "ipmi",
      "driver_info": {
        "ipmi_address": "192.168.123.11"
      }
    },
    {
      "ports": [
        {
          "address": "00:21:9b:98:a3:1f"
        }
      ],
      "name": "zubat",
      "driver": "ipmi",
      "driver_info": {
        "ipmi_address": "192.168.123.12"
      }
    }
  ]
}
openstack baremetal create ./nodes.ipmi.json
$ openstack baremetal node list
+--------------------------------------+---------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+---------+---------------+-------------+--------------------+-------------+
| 3fa4feae-0d5c-4e38-a012-29258d40651b | zygarde | None | None | enroll | False |
| 00965ad4-c972-46fa-948a-3ce87aecf5ac | umbreon | None | None | enroll | False |
| 8702ea0c-aa10-4542-9292-3b464fe72036 | zubat | None | None | enroll | False |
+--------------------------------------+---------+---------------+-------------+--------------------+-------------+
for UUID in `openstack baremetal node list -f json | jq -r '.[] | .UUID' ` ;
do openstack baremetal node set $UUID --driver-info ipmi_password=`cat ~/ipmi.password` --driver-info ipmi_username=admin ;
done
EDIT: I had ipmi_user before and it does not work. Needs to be ipmi_username.
And if I look in the returned data for the definition, we see the password is not readable:
$ openstack baremetal node show zubat -f yaml | grep ipmi_password
ipmi_password: '******'
for UUID in `openstack baremetal node list -f json | jq -r '.[] | .UUID' ` ; do openstack baremetal node power on $UUID ; done
Change “on” to “off” to power off.
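A quick way to confirm the result is to list just the name and power state columns:
openstack baremetal node list -f value -c Name -c "Power State"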
“I can do any thing. I can’t do everything.”
The sheer number of projects and problem domains covered by OpenStack was overwhelming. I never learned several of the other projects under the big tent. One project that is getting relevant to my day job is Ironic, the bare metal provisioning service. Here are my notes from spelunking the code.
I want just Ironic. I don’t want Keystone (personal grudge) or Glance or Neutron or Nova.
Ironic will write files to e.g. /var/lib/tftp and /var/www/html/pxe, and will not handle DHCP, but can make use of static DHCP configurations.
Ironic is just an API server at this point (a Python-based web service) that manages the above files, and that can also talk to the IPMI ports on my servers to wake them up and perform configurations on them.
I need to provide ISO images to Ironic so it can put them in the right place to boot them.
I checked the code out of git. I am working off the master branch.
I ran tox to ensure the unit tests are all at 100%
I have mysql already installed and running, but with a Keystone database. I need to make a new one for Ironic. The database name, user, and password are all going to be ironic, to keep things simple.
CREATE USER 'ironic'@'localhost' IDENTIFIED BY 'ironic';
create database ironic;
GRANT ALL PRIVILEGES ON ironic.* TO 'ironic'@'localhost';
FLUSH PRIVILEGES;
Note that I did this as the Keystone user. That dude has way too much privilege… good thing this is JUST for DEVELOPMENT. This will be used to follow the steps in the developer quickstart docs. I also set the mysql URL in the config file to this:
connection = mysql+pymysql://ironic:ironic@localhost/ironic
Then I can run ironic db sync. Let’s see what I got:
mysql ironic --user ironic --password
#....
MariaDB [ironic]> show tables;
+-------------------------------+
| Tables_in_ironic |
+-------------------------------+
| alembic_version |
| allocations |
| bios_settings |
| chassis |
| conductor_hardware_interfaces |
| conductors |
| deploy_template_steps |
| deploy_templates |
| node_tags |
| node_traits |
| nodes |
| portgroups |
| ports |
| volume_connectors |
| volume_targets |
+-------------------------------+
15 rows in set (0.000 sec)
OK, so the first table shows that Ironic uses Alembic to manage migrations. Unlike the SQLAlchemy migrations table, you can’t just query this table to see how many migrations have been performed:
MariaDB [ironic]> select * from alembic_version;
+--------------+
| version_num |
+--------------+
| cf1a80fdb352 |
+--------------+
1 row in set (0.000 sec)
The script to start the API server is:
ironic-api -d --config-file etc/ironic/ironic.conf.local
Looking in the file requirements.txt, I see that the Web framework for Ironic is Pecan:
$ grep pecan requirements.txt
pecan!=1.0.2,!=1.0.3,!=1.0.4,!=1.2,>=1.0.0 # BSD
This is new to me. On Keystone, we converted from no framework to Flask. I’m guessing that if I look in the chain that starts with the ironic-api file, I will see a Pecan launcher for a web application. We can find that file with:
$ which ironic-api
/opt/stack/ironic/.tox/py3/bin/ironic-api
Looking in that file, it references ironic.cmd.api, which is the file ironic/cmd/api.py which in turn refers to ironic/common/wsgi_service.py. This in turn refers to ironic/api/app.py from which we can finally see that it imports pecan.
Now I am ready to run the two services. Like most of OpenStack, there is an API server and a “worker” server; in Ironic, this is called the Conductor. This maps fairly well to the Operator pattern in Kubernetes. In that pattern, the user makes changes to the API server via a web VERB on a URL, possibly with a body. These changes represent a desired state. The state change is then performed asynchronously. In OpenStack, the asynchronous communication is performed via a message queue, usually RabbitMQ. The Ironic team has a simpler mechanism used for development: JSON RPC. This happens to be the same mechanism used in FreeIPA.
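Presumably the conductor can be launched the same way as the API server, pointing at the same development config:
ironic-conductor -d --config-file etc/ironic/ironic.conf.local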
OK, once I got the services running, I had to do a little fiddling around to get the command lines to work. There was an old reference to
OS_AUTH_TYPE=token_endpoint
which needed to be replaced with
OS_AUTH_TYPE=none
Both are in the documentation, but only the second one will work.
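For the record, my understanding is that the standalone client also needs to be pointed at the API endpoint, so the environment ends up looking roughly like this:
export OS_AUTH_TYPE=none
export OS_ENDPOINT=http://127.0.0.1:6385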
I can run the following commands:
$ baremetal driver list
+---------------------+----------------+
| Supported driver(s) | Active host(s) |
+---------------------+----------------+
| fake-hardware | ayoungP40 |
+---------------------+----------------+
$ baremetal node list
Let’s see if I can figure out from curl what APIs those are… There is only one version, and one link, so:
curl http://127.0.0.1:6385 | jq '.versions | .[] | .links | .[] | .href'
"http://127.0.0.1:6385/v1/"
Doing curl against that second link gives a list of the top-level resources.
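Assuming jq is available, one quick way to skim them is to list just the top-level keys of the response:
curl -s http://127.0.0.1:6385/v1/ | jq 'keys'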
And I assume that, if I use curl to GET the drivers, I should see the fake driver entry from above:
$ curl "http://127.0.0.1:6385/v1/drivers" | jq '.drivers |.[] |.name'
"fake-hardware"
OK, that is enough to get started. I am going to try and do the same with the RPMs that we ship with OSP and see what I get there.
But that is a tale for another day.
I had a conversation with Julia Kreger, a long-time core member of the Ironic project. This helped get me oriented.
Render changes to tripleo docs:
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3 get-pip.py
pip install tox
Check for syntax errors before wasting CI time:
cd /home/stack/tripleo-docs
tox -e deploy-guide
Run a specific unit test:
tox -e linters
tox -e pep8
cd /home/stack/tripleo-common
tox -e py36 -- tripleo_common.tests.test_inventory.TestInventory.test_get_roles_by_service
cd /home/stack/tripleo-ansible
tox -e py36 -- tripleo_ansible.tests.modules.test_derive_hci_parameters.TestTripleoDeriveHciParameters
by Unknown (noreply@blogger.com) at September 04, 2020 06:31 PM
This is mostly a brain dump for myself for later reference, but may be also useful for others.
As I wrote in an earlier post, collectd is configured on OpenStack TripleO driven deployments by a config file.
parameter_defaults:
  CollectdExtraPlugins:
    - write_http
  ExtraConfig:
    collectd::plugin::write_http::nodes:
      collectd:
        url: collectd1.tld.org
        metrics: true
        header: foobar
The collectd exec plugin comes in handy when launching some third-party script. However, the config may be a bit tricky; for example, to execute /usr/bin/true one would insert:
parameter_defaults:
  CollectdExtraPlugins:
    - exec
  ExtraConfig:
    collectd::plugin::exec::commands:
      healthcheck:
        user: "collectd"
        group: "collectd"
        exec: ["/usr/bin/true",]
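For reference, a script launched by the exec plugin is expected to print PUTVAL lines on stdout; a minimal sketch of such a health-check script (the plugin and type-instance names are illustrative) could look like this:
#!/bin/bash
# collectd exports these variables to exec scripts; fall back to sane defaults when run by hand
HOSTNAME="${COLLECTD_HOSTNAME:-$(hostname -f)}"
INTERVAL="${COLLECTD_INTERVAL:-10}"
while true; do
  # report 1 while the check succeeds, 0 otherwise
  if /usr/bin/true; then VALUE=1; else VALUE=0; fi
  echo "PUTVAL \"${HOSTNAME}/exec-healthcheck/gauge-status\" interval=${INTERVAL} N:${VALUE}"
  sleep "${INTERVAL}"
done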