I was recently working with someone else’s C source and I wanted to add some basic error checking without mucking up the code with a bunch of if statements and calls to perror. I ended up implementing a simple must function that checks the return value of an expression, and exits with an error if the return value is less than 0. You use it like this:
must(fd = open("textfile.txt", O_RDONLY));
Or:
must(close(fd));
In the event that an expression returns an error, the code will exit with a message that shows the file, line, and function in which the error occurred, along with the actual text of the called function and the output of perror:
example.c:24 in main: fd = open("does-not-exist.xt", O_RDONLY): [2]: No such file or directory
To be clear, this is only useful when you’re using functions that conform to standard Unix error reporting conventions, and if you’re happy with “exit with an error message” as the failure handling mechanism.
The implementation starts with a macro defined in must.h:
#ifndef _MUST
#define _MUST
#define must(x) _must(__FILE__, __LINE__, __func__, #x, (x))
void _must(const char *fileName, int lineNumber, const char *funcName,
const char *calledFunction, int err);
#endif
The __FILE__, __LINE__, and __func__ symbols are standard predefined symbols provided by the compiler; they are documented in the GCC manual. The expression #x uses the stringify operator to convert the macro argument into a string.
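For example, assuming a call to must(close(fd)); on line 24 of example.c, the preprocessor expands it into roughly the following (note that __func__ is not expanded by the preprocessor; it is a predefined identifier that evaluates to the enclosing function name, "main" in this case):

_must("example.c", 24, __func__, "close(fd)", (close(fd)));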
The above macro transforms a call to must() into a call to the _must() function, which is defined in must.c:
#include "must.h"
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
void _must(const char *fileName, int lineNumber, const char *funcName,
           const char *calledFunction, int err) {
  if (err < 0) {
    char buf[256];
    snprintf(buf, 256, "%s:%d in %s: %s: [%d]", fileName, lineNumber, funcName,
             calledFunction, errno);
    perror(buf);
    exit(1);
  }
}
In this function we check the value of err (which will be the return value of the expression passed as the argument to the must() macro); if it evaluates to a number less than 0, we use snprintf() to build an information string and pass it to perror(), which prints that string, a colon, and then the error message corresponding to the value of errno.
You can see must() used in practice in the following example program:
#include "must.h"
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int main() {
  int fd;
  char buf[1024];
  printf("opening a file that does exist\n");
  must(fd = open("file-that-exists.txt", O_RDONLY));
  while (1) {
    int nb;
    must(nb = read(fd, buf, sizeof(buf)));
    if (!nb)
      break;
    must(write(STDOUT_FILENO, buf, nb));
  }
  must(close(fd));
  printf("opening a file that doesn't exist\n");
  must(fd = open("file-that-does-not-exist.xt", O_RDONLY));
  return 0;
}
Provided the file-that-exists.txt (a) exists and (b) contains the text Hello, world., and that file-that-does-not-exist.xt does not, in fact, exist, running the above code will produce the following output:
opening a file that does exist
Hello, world.
opening a file that doesn't exist
example.c:24 in main: fd = open("file-that-does-not-exist.xt", O_RDONLY): [2]: No such file or directory
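To try this yourself, assuming the example program is saved as example.c alongside must.c and must.h, building and running it might look like:

gcc -o example example.c must.c
echo "Hello, world." > file-that-exists.txt
./example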
I’ve been using a Garmin Fenix 6x for a couple of weeks and thought it might be interesting to put together a short review.
I think it’s a misnomer to call the Fenix a “smartwatch”. I would call it a highly capable fitness tracker. That’s not a knock on the product; I really like it so far, but pretty much everything it does is centered around either fitness tracking or navigation. If you browse around the “Connect IQ” store, mostly you’ll find (a) watch faces, (b) fitness apps, and (c) navigation apps. It’s not able to control your phone (for the most part; there are some apps available that offer remote camera control and some other limited features); you can’t check your email on it, or send text messages, and you’ll never find a watch version of any major smartphone app.
So if you’re looking for a smartwatch, maybe look elsewhere. But if you’re looking for a great fitness tracker, this just might be your device.
I don’t listen to music when I exercise. If I’m inside, I’m watching a video on a streaming service, and if I’m outside, I want to be able to hear my surroundings. So I won’t be looking at any aspects of music support on the Fenix.
One of the things I really like about the Fenix is that I now have more of my activity and health data in one place.
As part of my exercise I use a Schwinn IC4 spin bike. Previously, I was using a Fitbit Charge 5, which works fine but meant exercise metrics ended up in multiple places: while I could collect heart rate with the Fitbit, to collect cycling data like cadence, power, etc., I needed to use another app on my phone (I used Wahoo Fitness). Additionally, Fitbit doesn’t support sharing data with Apple Health, so there wasn’t a great way to see a unified view of things.
This has all changed with the Fenix:
First and probably most importantly, the Fenix is able to utilize the sensor on the IC4 directly, so cadence/speed/distance data is collected in the same place as heart rate data.
Through the magic of the Gymnasticon project, the Fenix is also able to collect power data from the bike.
The Fenix is also great at tracking my outside bike rides, and of course providing basic heart rate and time tracking of my strength and PT workouts.
All of this means that Garmin’s tools (both their app and the Garmin Connect website) provide a great unified view of my fitness activities.
This is an area in which I think there is a lot of room for improvement.
Like any good connected watch, you can configure your Fenix to receive notifications from your phone. Unfortunately, this is an all-or-nothing configuration; there’s no facility for blocking or selecting notifications from specific apps.
I usually have my phone in do-not-disturb mode, so notifications from Google or the New York Times app don’t interrupt me, but they show up in the notification center when I check for anything interesting. With my Fenix connected, I get interrupted every time something happens.
Having the ability to filter which notifications get sent to the watch would be an incredibly helpful feature.
One of the reasons I have the 6x instead of the 6 is the increased battery size that comes along with the bigger watch. While the advertising touts a battery life of “up to 21 days with activity tracking and 24/7 wrist-based heart rate monitoring”, I’ve been seeing battery life closer to 1 week under normal use (which includes probably 10-20 miles of GPS-tracked bike rides a week).
I’ve been using the pulse oximeter at night, but I understand that can have a substantial contribution to battery drain; I’ve disabled it for now and I’ll update this post if it turns out that has a significant impact on battery life.
One of the reasons that the Fenix is able to get substantially better battery life than the Apple Watch is that the screen is far, far dimmer. By default, the screen brightness is set to 20%; you can increase that, but you’ll consume substantially more power by doing so. In well lit areas – outdoors, or under office lighting – the display is generally very easy to read even with the backlight low.
It’s a mixed bag.
The basic watch and fitness tracking functionality is easy to use, and I like the fact that it uses physical buttons rather than a touch screen (I’ve spent too much time struggling with touch screens in winter). The phone app itself is relatively easy to use, although the “Activities & Apps” screen has the bad habit of refreshing while you’re trying to use it.
I have found Garmin’s documentation to be very good, and highly search optimized. In most cases, when I’ve wanted to know how to do something on my watch, I’ve been able to search for it on Google and land directly on the relevant documentation. For example, I wanted to know how to remove an activity from the list of favorite activities, so I searched for garmin remove activity from favorites, which led me directly to this documentation.
This was exactly the information I needed. I’ve had similar success with just about everything I’ve searched for.
The Garmin Connect app and website are both generally easy to use and well organized. There is an emphasis on “social networking” aspects (share your activities! Join a group! Earn badges!) in which I have zero interest, and I wish there was a checkbox to simply disable those parts of the UI.
The place where things really fall over is the “Connect IQ” app store. There are many apps and watch faces there that require some sort of payment, but there’s no centralized payment processing facility, so you end up getting sent to random payment processors all over the place depending on what the software author selected…and price information simply isn’t displayed in the app store at all unless an author happens to include it in the product description.
The UI for configuring custom watch faces is awful; it’s a small step up from someone just throwing a text editor at you and telling you to edit a file. For this reason I’ve mostly stuck with Garmin-produced watch faces (the built-in ones and a few from the app store), which tend to have high visual quality but aren’t very configurable.
While Garmin doesn’t provide any Linux support at all, you can plug the watch into your Linux system and access the watch filesystem using any MTP client, including Gnome’s GVFS. While this isn’t going to replace your phone app, it does give you reasonably convenient access to activity tracking data (as .fit
files).
The Fenix ships with reasonably complete US maps. I haven’t had the chance to assess their coverage of local hiking trails. You can load maps from the OpenStreetMap project, although the process for doing so is annoyingly baroque.
It is easy to load GPX tracks from your favorite hiking website onto the watch using the Garmin Connect website or phone app.
I’m happy with the watch. It is a substantial upgrade from my Charge 5 in terms of fitness tracking, and aesthetically I like it as much as the Seiko SNJ025 I was previously wearing. It’s not a great smartwatch, but that’s not what I was looking for, and the battery life is much better than actual smart watches from Apple and Samsung.
This isn’t a Garmin or Fenix issue, but I’d like to specially recognize All Trails for making the process of exporting a GPX file to Garmin Connect as painful as possible. You can’t do it at all from the phone app, so the process is something like:
That is…completely ridiculous. The “Share” button in the All Trails app should provide an option to share the GPX version of the route so the above process could be collapsed into a single step. All Trails, why do you hate your users so much?
In this question, August Vrubel has some C code that sets up a tun interface and then injects a packet, but the packet seemed to disappear into the ether. In this post, I’d like to take a slightly extended look at my answer because I think it’s a great opportunity for learning a bit more about performing network diagnostics.
The original code looked like this:
#include <arpa/inet.h>
#include <fcntl.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
static int tunAlloc(void) {
int fd;
struct ifreq ifr = {.ifr_name = "tun0", .ifr_flags = IFF_TUN | IFF_NO_PI};
fd = open("/dev/net/tun", O_RDWR);
ioctl(fd, TUNSETIFF, (void *)&ifr);
ioctl(fd, TUNSETOWNER, geteuid());
return fd;
}
// this is a test
static void bringInterfaceUp(void) {
int sock;
struct sockaddr_in addr = {.sin_family = AF_INET};
struct ifreq ifr = {.ifr_name = "tun0"};
inet_aton("172.30.0.1", &addr.sin_addr);
memcpy(&ifr.ifr_addr, &addr, sizeof(struct sockaddr));
sock = socket(AF_INET, SOCK_DGRAM, 0);
ioctl(sock, SIOCSIFADDR, &ifr);
ioctl(sock, SIOCGIFFLAGS, &ifr);
ifr.ifr_flags |= IFF_UP | IFF_RUNNING;
ioctl(sock, SIOCSIFFLAGS, &ifr);
close(sock);
}
static void emitPacket(int tap_fd) {
unsigned char packet[] = {
0x45, 0x00, 0x00, 0x3c, 0xd8, 0x6f, 0x40, 0x00, 0x3f, 0x06, 0x08, 0x91,
172, 30, 0, 1, 192, 168, 255, 8, 0xa2, 0x9a, 0x27, 0x11,
0x80, 0x0b, 0x63, 0x79, 0x00, 0x00, 0x00, 0x00, 0xa0, 0x02, 0xfa, 0xf0,
0x89, 0xd8, 0x00, 0x00, 0x02, 0x04, 0x05, 0xb4, 0x04, 0x02, 0x08, 0x0a,
0x5b, 0x76, 0x5f, 0xd4, 0x00, 0x00, 0x00, 0x00, 0x01, 0x03, 0x03, 0x07};
write(tap_fd, packet, sizeof(packet));
}
int main() {
int tap_fd;
tap_fd = tunAlloc();
bringInterfaceUp();
emitPacket(tap_fd);
close(tap_fd);
return 0;
}
A problem with the original code is that it creates the interface, sends the packet, and tears down the interface with no delays, making it very difficult to inspect the interface configuration, perform packet captures, or otherwise figure out what’s going on.
In order to resolve those issues, I added some prompts before sending the packet and before tearing down the tun interface (and also some minimal error checking), giving us:
#include <arpa/inet.h>
#include <fcntl.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#define must(x) _must(#x, __FILE__, __LINE__, __func__, (x))
void _must(const char *call, const char *filename, int line,
const char *funcname, int err) {
char buf[1024];
snprintf(buf, 1023, "%s (@ %s:%d)", call, filename, line);
if (err < 0) {
perror(buf);
exit(1);
}
}
static int tunAlloc(void) {
int fd;
struct ifreq ifr = {.ifr_name = "tun0", .ifr_flags = IFF_TUN | IFF_NO_PI};
fd = open("/dev/net/tun", O_RDWR);
must(ioctl(fd, TUNSETIFF, (void *)&ifr));
must(ioctl(fd, TUNSETOWNER, geteuid()));
return fd;
}
static void bringInterfaceUp(void) {
int sock;
struct sockaddr_in addr = {.sin_family = AF_INET};
struct ifreq ifr = {.ifr_name = "tun0"};
inet_aton("172.30.0.1", &addr.sin_addr);
memcpy(&ifr.ifr_addr, &addr, sizeof(struct sockaddr));
sock = socket(AF_INET, SOCK_DGRAM, 0);
must(ioctl(sock, SIOCSIFADDR, &ifr));
must(ioctl(sock, SIOCGIFFLAGS, &ifr));
ifr.ifr_flags |= IFF_UP | IFF_RUNNING;
must(ioctl(sock, SIOCSIFFLAGS, &ifr));
close(sock);
}
static void emitPacket(int tap_fd) {
unsigned char packet[] = {
0x45, 0x00, 0x00, 0x3c, 0xd8, 0x6f, 0x40, 0x00, 0x3f, 0x06, 0x08, 0x91,
172, 30, 0, 1, 192, 168, 255, 8, 0xa2, 0x9a, 0x27, 0x11,
0x80, 0x0b, 0x63, 0x79, 0x00, 0x00, 0x00, 0x00, 0xa0, 0x02, 0xfa, 0xf0,
0x89, 0xd8, 0x00, 0x00, 0x02, 0x04, 0x05, 0xb4, 0x04, 0x02, 0x08, 0x0a,
0x5b, 0x76, 0x5f, 0xd4, 0x00, 0x00, 0x00, 0x00, 0x01, 0x03, 0x03, 0x07};
write(tap_fd, packet, sizeof(packet));
}
void prompt(char *promptString) {
printf("%s\n", promptString);
getchar();
}
int main() {
int tap_fd;
tap_fd = tunAlloc();
bringInterfaceUp();
prompt("interface is up");
emitPacket(tap_fd);
prompt("sent packet");
close(tap_fd);
printf("all done");
return 0;
}
We start by compiling the code:
gcc -o sendpacket sendpacket.c
If we try running this as a regular user, it will simply fail (which confirms that at least some of our error handling is working correctly):
$ ./sendpacket
ioctl(fd, TUNSETIFF, (void *)&ifr) (@ sendpacket-pause.c:33): Operation not permitted
We need to run it as root:
$ sudo ./sendpacket
interface is up
The interface is up prompt means that the code has configured the interface but has not yet sent the packet. Let’s take a look at the interface configuration:
$ ip addr show tun0
3390: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 500
link/none
inet 172.30.0.1/32 scope global tun0
valid_lft forever preferred_lft forever
inet6 fe80::c7ca:fe15:5d5c:2c49/64 scope link stable-privacy
valid_lft forever preferred_lft forever
The code will emit a TCP SYN packet targeting address 192.168.255.8, port 10001. In another terminal, let’s watch for that on all interfaces. If we start tcpdump and press RETURN at the interface is up prompt, we’ll see something like:
# tcpdump -nn -i any port 10001
22:36:35.336643 tun0 In IP 172.30.0.1.41626 > 192.168.255.8.10001: Flags [S], seq 2148230009, win 64240, options [mss 1460,sackOK,TS val 1534484436 ecr 0,nop,wscale 7], length 0
And indeed, we see the problem that was described: the packet enters the system on tun0, but never goes anywhere else. What’s going on?
pwru is a nifty utility written by the folks at Cilium that takes advantage of eBPF to attach traces to hundreds of kernel functions in order to follow packet processing through the Linux kernel. It’s especially useful when packets seem to be getting dropped with no obvious explanation. Let’s see what it can tell us!
A convenient way to run pwru is using their official Docker image. We’ll run it like this, filtering by protocol and destination port so that we only see results relating to the synthesized packet created by the sendpacket.c code:
docker run --privileged --rm -t --pid=host \
-v /sys/kernel/debug/:/sys/kernel/debug/ \
cilium/pwru --filter-proto tcp --filter-port 10001
If we run sendpacket while pwru is running, the output looks something like this:
2023/02/15 03:42:33 Per cpu buffer size: 4096 bytes
2023/02/15 03:42:33 Attaching kprobes (via kprobe-multi)...
1469 / 1469 [-----------------------------------------------------------------------------] 100.00% ? p/s
2023/02/15 03:42:33 Attached (ignored 0)
2023/02/15 03:42:33 Listening for events..
SKB CPU PROCESS FUNC
0xffff8ce13e987900 6 [sendpacket-orig] netif_receive_skb
0xffff8ce13e987900 6 [sendpacket-orig] skb_defer_rx_timestamp
0xffff8ce13e987900 6 [sendpacket-orig] __netif_receive_skb
0xffff8ce13e987900 6 [sendpacket-orig] __netif_receive_skb_one_core
0xffff8ce13e987900 6 [sendpacket-orig] ip_rcv
0xffff8ce13e987900 6 [sendpacket-orig] ip_rcv_core
0xffff8ce13e987900 6 [sendpacket-orig] kfree_skb_reason(SKB_DROP_REASON_IP_CSUM)
0xffff8ce13e987900 6 [sendpacket-orig] skb_release_head_state
0xffff8ce13e987900 6 [sendpacket-orig] sock_wfree
0xffff8ce13e987900 6 [sendpacket-orig] skb_release_data
0xffff8ce13e987900 6 [sendpacket-orig] skb_free_head
0xffff8ce13e987900 6 [sendpacket-orig] kfree_skbmem
And now we have a big blinking sign that tells us why the packet is being dropped:
0xffff8ce13e987900 6 [sendpacket-orig] kfree_skb_reason(SKB_DROP_REASON_IP_CSUM)
It looks like the synthesized packet data includes a bad checksum. We could update the code to correctly calculate the checksum…or we could just use Wireshark and have it tell us the correct values. Because this isn’t meant to be an IP networking primer, we’ll just use Wireshark, which gives us the following updated code:
static void emitPacket(int tap_fd) {
uint16_t cs;
uint8_t packet[] = {
0x45, 0x00, 0x00, 0x3c, 0xd8, 0x6f, 0x40, 0x00, 0x3f, 0x06, 0xf7, 0x7b,
172, 30, 0, 1, 192, 168, 255, 8, 0xa2, 0x9a, 0x27, 0x11,
0x80, 0x0b, 0x63, 0x79, 0x00, 0x00, 0x00, 0x00, 0xa0, 0x02, 0xfa, 0xf0,
0x78, 0xc3, 0x00, 0x00, 0x02, 0x04, 0x05, 0xb4, 0x04, 0x02, 0x08, 0x0a,
0x5b, 0x76, 0x5f, 0xd4, 0x00, 0x00, 0x00, 0x00, 0x01, 0x03, 0x03, 0x07,
};
write(tap_fd, packet, sizeof(packet));
}
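For the curious, computing the IPv4 header checksum by hand isn’t much code. Here is a minimal sketch of the standard RFC 1071 algorithm (not part of the original program): zero the checksum field (bytes 10–11 of the header), run this over the 20-byte header, and store the 16-bit result back in network byte order.

#include <stddef.h>
#include <stdint.h>

/* Sum the header as 16-bit big-endian words, fold the carries back into
 * the low 16 bits, and return the one's complement of the result. */
static uint16_t ip_header_checksum(const uint8_t *hdr, size_t len) {
  uint32_t sum = 0;

  for (size_t i = 0; i + 1 < len; i += 2)
    sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
  if (len & 1)                      /* pad a trailing odd byte with zero */
    sum += (uint32_t)hdr[len - 1] << 8;
  while (sum >> 16)                 /* fold carries */
    sum = (sum & 0xffff) + (sum >> 16);
  return (uint16_t)~sum;
}

/* Example use against the packet[] array above:
 *   packet[10] = packet[11] = 0;
 *   uint16_t cs = ip_header_checksum(packet, 20);
 *   packet[10] = cs >> 8;
 *   packet[11] = cs & 0xff;
 */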
If we repeat our invocation of pwru and run a test with the updated code, we see:
2023/02/15 04:17:29 Per cpu buffer size: 4096 bytes
2023/02/15 04:17:29 Attaching kprobes (via kprobe-multi)...
1469 / 1469 [-----------------------------------------------------------------------------] 100.00% ? p/s
2023/02/15 04:17:29 Attached (ignored 0)
2023/02/15 04:17:29 Listening for events..
SKB CPU PROCESS FUNC
0xffff8cd8a6c5ef00 9 [sendpacket-chec] netif_receive_skb
0xffff8cd8a6c5ef00 9 [sendpacket-chec] skb_defer_rx_timestamp
0xffff8cd8a6c5ef00 9 [sendpacket-chec] __netif_receive_skb
0xffff8cd8a6c5ef00 9 [sendpacket-chec] __netif_receive_skb_one_core
0xffff8cd8a6c5ef00 9 [sendpacket-chec] ip_rcv
0xffff8cd8a6c5ef00 9 [sendpacket-chec] ip_rcv_core
0xffff8cd8a6c5ef00 9 [sendpacket-chec] sock_wfree
0xffff8cd8a6c5ef00 9 [sendpacket-chec] nf_hook_slow
0xffff8cd8a6c5ef00 9 [sendpacket-chec] nf_checksum
0xffff8cd8a6c5ef00 9 [sendpacket-chec] nf_ip_checksum
0xffff8cd8a6c5ef00 9 [sendpacket-chec] __skb_checksum_complete
0xffff8cd8a6c5ef00 9 [sendpacket-chec] tcp_v4_early_demux
0xffff8cd8a6c5ef00 9 [sendpacket-chec] ip_route_input_noref
0xffff8cd8a6c5ef00 9 [sendpacket-chec] ip_route_input_slow
0xffff8cd8a6c5ef00 9 [sendpacket-chec] fib_validate_source
0xffff8cd8a6c5ef00 9 [sendpacket-chec] __fib_validate_source
0xffff8cd8a6c5ef00 9 [sendpacket-chec] ip_handle_martian_source
0xffff8cd8a6c5ef00 9 [sendpacket-chec] kfree_skb_reason(SKB_DROP_REASON_NOT_SPECIFIED)
0xffff8cd8a6c5ef00 9 [sendpacket-chec] skb_release_head_state
0xffff8cd8a6c5ef00 9 [sendpacket-chec] skb_release_data
0xffff8cd8a6c5ef00 9 [sendpacket-chec] skb_free_head
0xffff8cd8a6c5ef00 9 [sendpacket-chec] kfree_skbmem
Looking at the above output, we’re no longer seeing the SKB_DROP_REASON_IP_CSUM error; instead, we’re getting dropped by the routing logic:
0xffff8cd8a6c5ef00 9 [sendpacket-chec] fib_validate_source
0xffff8cd8a6c5ef00 9 [sendpacket-chec] __fib_validate_source
0xffff8cd8a6c5ef00 9 [sendpacket-chec] ip_handle_martian_source
0xffff8cd8a6c5ef00 9 [sendpacket-chec] kfree_skb_reason(SKB_DROP_REASON_NOT_SPECIFIED)
Specifically, the packet is being dropped as a “martian source”, which means a packet that has a source address that is invalid for the interface on which it is being received. Unlike the previous error, we can actually get kernel log messages about this problem. If we had the log_martians sysctl enabled for all interfaces:
sysctl -w net.ipv4.conf.all.log_martians=1
Or if we enabled it specifically for tun0 after the interface is created:
sysctl -w net.ipv4.conf.tun0.log_martians=1
We would see the following message logged by the kernel:
Feb 14 12:14:03 madhatter kernel: IPv4: martian source 192.168.255.8 from 172.30.0.1, on dev tun0
We’re seeing this particular error because tun0 is configured with address 172.30.0.1, but it claims to be receiving a packet with the same source address from “somewhere else” on the network. This is a problem because we would never be able to reply to that packet (our replies would get routed to the local host). To deal with this problem, we can either change the source address of the packet, or we can change the IP address assigned to the tun0 interface. Since changing the source address would mean mucking about with checksums again, let’s change the address of tun0:
static void bringInterfaceUp(void) {
int sock;
struct sockaddr_in addr = {.sin_family = AF_INET};
struct ifreq ifr = {.ifr_name = "tun0"};
inet_aton("172.30.0.10", &addr.sin_addr);
memcpy(&ifr.ifr_addr, &addr, sizeof(struct sockaddr));
sock = socket(AF_INET, SOCK_DGRAM, 0);
must(ioctl(sock, SIOCSIFADDR, &ifr));
must(ioctl(sock, SIOCGIFFLAGS, &ifr));
ifr.ifr_flags |= IFF_UP | IFF_RUNNING;
must(ioctl(sock, SIOCSIFFLAGS, &ifr));
close(sock);
}
With this change, tun0 now looks like:
3452: tun0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 500
link/none
inet 172.30.0.10/32 scope global tun0
valid_lft forever preferred_lft forever
inet6 fe80::bda3:ddc8:e60e:106b/64 scope link stable-privacy
valid_lft forever preferred_lft forever
And if we repeat our earlier test in which we use tcpdump to watch for our synthesized packet on any interface, we now see the desired behavior:
# tcpdump -nn -i any port 10001
23:37:55.897786 tun0 In IP 172.30.0.1.41626 > 192.168.255.8.10001: Flags [S], seq 2148230009, win 64240, options [mss 1460,sackOK,TS val 1534484436 ecr 0,nop,wscale 7], length 0
23:37:55.897816 eth0 Out IP 172.30.0.1.41626 > 192.168.255.8.10001: Flags [S], seq 2148230009, win 64240, options [mss 1460,sackOK,TS val 1534484436 ecr 0,nop,wscale 7], length 0
The packet is correctly handled by the kernel and sent out to our default gateway.
The final version of the code looks like this:
#include <arpa/inet.h>
#include <fcntl.h>
#include <linux/if.h>
#include <linux/if_tun.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#define must(x) _must(#x, __FILE__, __LINE__, __func__, (x))
void _must(const char *call, const char *filename, int line,
const char *funcname, int err) {
char buf[1024];
snprintf(buf, 1023, "%s (@ %s:%d)", call, filename, line);
if (err < 0) {
perror(buf);
exit(1);
}
}
static int tunAlloc(void) {
int fd;
struct ifreq ifr = {.ifr_name = "tun0", .ifr_flags = IFF_TUN | IFF_NO_PI};
fd = open("/dev/net/tun", O_RDWR);
must(ioctl(fd, TUNSETIFF, (void *)&ifr));
must(ioctl(fd, TUNSETOWNER, geteuid()));
return fd;
}
static void bringInterfaceUp(void) {
int sock;
struct sockaddr_in addr = {.sin_family = AF_INET};
struct ifreq ifr = {.ifr_name = "tun0"};
inet_aton("172.30.0.10", &addr.sin_addr);
memcpy(&ifr.ifr_addr, &addr, sizeof(struct sockaddr));
sock = socket(AF_INET, SOCK_DGRAM, 0);
must(ioctl(sock, SIOCSIFADDR, &ifr));
must(ioctl(sock, SIOCGIFFLAGS, &ifr));
ifr.ifr_flags |= IFF_UP | IFF_RUNNING;
must(ioctl(sock, SIOCSIFFLAGS, &ifr));
close(sock);
}
static void emitPacket(int tap_fd) {
uint16_t cs;
uint8_t packet[] = {
0x45, 0x00, 0x00, 0x3c, 0xd8, 0x6f, 0x40, 0x00, 0x3f, 0x06, 0xf7, 0x7b,
172, 30, 0, 1, 192, 168, 255, 8, 0xa2, 0x9a, 0x27, 0x11,
0x80, 0x0b, 0x63, 0x79, 0x00, 0x00, 0x00, 0x00, 0xa0, 0x02, 0xfa, 0xf0,
0x78, 0xc3, 0x00, 0x00, 0x02, 0x04, 0x05, 0xb4, 0x04, 0x02, 0x08, 0x0a,
0x5b, 0x76, 0x5f, 0xd4, 0x00, 0x00, 0x00, 0x00, 0x01, 0x03, 0x03, 0x07,
};
write(tap_fd, packet, sizeof(packet));
}
void prompt(char *promptString) {
printf("%s\n", promptString);
getchar();
}
int main() {
int tap_fd;
tap_fd = tunAlloc();
bringInterfaceUp();
prompt("interface is up");
emitPacket(tap_fd);
prompt("sent packet");
close(tap_fd);
printf("all done");
return 0;
}
My internet service provider (FIOS) doesn’t yet (sad face) offer IPv6-capable service, so I’ve set up an IPv6 tunnel using the Hurricane Electric tunnel broker. I want to provide IPv6 connectivity to multiple systems in my house, but not to all systems in my house. In order to meet those requirements, I’m going to set up the tunnel on the router, and then expose connectivity over an IPv6-only VLAN. In this post, we’ll walk through the steps necessary to set that up.
Parts of this post are going to be device specific: for example, I’m using a Ubiquiti EdgeRouter X as my Internet router, so the tunnel setup is going to be specific to that device. The section about setting things up on my Linux desktop will be more generally applicable.
There are three major parts to this post: configuring the router (setting up the IPv6 tunnel and an IPv6-only VLAN on the EdgeRouter), configuring the switch (only necessary due to the specifics of the connection between my desktop and the router; you can probably skip this), and configuring the Linux desktop (setting up the IPv6 VLAN interface using nmcli).
When you set up an IPv6 tunnel with Hurricane Electric, you receive several bits of information. We care in particular about the following (the IPv6 addresses and client IPv4 addresses here have been munged for privacy reasons):
Server IPv4 Address | 209.51.161.14 |
Server IPv6 Address | 2001:470:1236:1212::1/64 |
Client IPv4 Address | 1.2.3.4 |
Client IPv6 Address | 2001:470:1236:1212::2/64 |
Routed /64 | 2001:470:1237:1212::/64 |
We’ll refer back to this information as we configure things later on.
The first step in the process is to create a tunnel interface – that is, an interface that looks like an ordinary network interface, but is in fact encapsulating traffic and sending it to the tunnel broker, where it will be unpacked and sent on its way.
I’ll be creating a SIT tunnel, which is designed to “interconnect isolated IPv6 networks” over an IPv4 connection.
I start by setting the tunnel encapsulation type and assigning an IPv6 address to the tunnel interface. This is the “Client IPv6 Address” from the earlier table:
set interfaces tunnel tun0 encapsulation sit
set interfaces tunnel tun0 address 2001:470:1236:1212::2/64
Next I need to define the local and remote IPv4 endpoints of the tunnel. The remote endpoint is the “Server IPv4” address. The value 0.0.0.0 for the local-ip option means “whichever source address is appropriate for connecting to the given remote address”:
set interfaces tunnel tun0 remote-ip 209.51.161.14
set interfaces tunnel tun0 local-ip 0.0.0.0
Finally, I associate some firewall rulesets with the interface. This is important because, unlike IPv4, as you assign IPv6 addresses to internal devices they will be directly connected to the internet. With no firewall rules in place you would find yourself inadvertently exposing services that previously were “behind” your home router.
set interfaces tunnel tun0 firewall in ipv6-name WANv6_IN
set interfaces tunnel tun0 firewall local ipv6-name WANv6_LOCAL
I’m using the existing WANv6_IN and WANv6_LOCAL rulesets, which by default block all inbound traffic. These correspond to the following ip6tables chains:
root@ubnt:~# ip6tables -S WANv6_IN
-N WANv6_IN
-A WANv6_IN -m comment --comment WANv6_IN-10 -m state --state RELATED,ESTABLISHED -j RETURN
-A WANv6_IN -m comment --comment WANv6_IN-20 -m state --state INVALID -j DROP
-A WANv6_IN -m comment --comment "WANv6_IN-10000 default-action drop" -j LOG --log-prefix "[WANv6_IN-default-D]"
-A WANv6_IN -m comment --comment "WANv6_IN-10000 default-action drop" -j DROP
root@ubnt:~# ip6tables -S WANv6_LOCAL
-N WANv6_LOCAL
-A WANv6_LOCAL -m comment --comment WANv6_LOCAL-10 -m state --state RELATED,ESTABLISHED -j RETURN
-A WANv6_LOCAL -m comment --comment WANv6_LOCAL-20 -m state --state INVALID -j DROP
-A WANv6_LOCAL -p ipv6-icmp -m comment --comment WANv6_LOCAL-30 -j RETURN
-A WANv6_LOCAL -p udp -m comment --comment WANv6_LOCAL-40 -m udp --sport 547 --dport 546 -j RETURN
-A WANv6_LOCAL -m comment --comment "WANv6_LOCAL-10000 default-action drop" -j LOG --log-prefix "[WANv6_LOCAL-default-D]"
-A WANv6_LOCAL -m comment --comment "WANv6_LOCAL-10000 default-action drop" -j DROP
As you can see, both rulesets block all inbound traffic by default unless it is related to an existing outbound connection.
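If you later want to allow specific inbound traffic (say, SSH to a single internal host), you would add an accept rule to the WANv6_IN ruleset. The following is a sketch only; the rule number, description, and destination address are placeholders rather than part of my actual configuration:

set firewall ipv6-name WANv6_IN rule 30 action accept
set firewall ipv6-name WANv6_IN rule 30 description 'allow inbound ssh to one host'
set firewall ipv6-name WANv6_IN rule 30 protocol tcp
set firewall ipv6-name WANv6_IN rule 30 destination port 22
set firewall ipv6-name WANv6_IN rule 30 destination address 2001:470:1237:1212::100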
I need to create a network interface on the router that will be the default gateway for my local IPv6-only network. From the tunnel broker, I received the CIDR 2001:470:1237:1212::/64 for local use. Rather than dedicating the entire /64 to a single network, I’m carving it up into smaller /110 networks in this example, which means I will only ever be able to have 262,144 addresses on each network (note that the decision to use a smaller subnet impacts your choices for address autoconfiguration; see RFC 7421 for the relevant discussion). I’m using the first /110 network for this VLAN, which comprises addresses 2001:470:1237:1212::1 through 2001:470:1237:1212::3:ffff. I’ll use the first address as the router address.
I’ve arbitrarily decided to use VLAN id 10 for this purpose.
To create an interface for VLAN id 10 with address 2001:470:1237:1212::1/110, we use the set interfaces ... vif command:
set interfaces switch switch0 vif 10 address 2001:470:1237:1212::1/110
We don’t receive router advertisements over the IPv6 tunnel, which means we need to explicitly configure the IPv6 default route. The default gateway will be the “Server IPv6 Address” we received from the tunnel broker.
set protocols static route6 ::/0 next-hop 2001:470:1236:1212::1
IPv6 systems on our local network will use the neighbor discovery protocol to discover the default gateway for the network. Support for this service is provided by RADVD, and we configure it using the set interfaces ... ipv6 router-advert command:
set interfaces switch switch0 vif 10 ipv6 router-advert send-advert true
set interfaces switch switch0 vif 10 ipv6 router-advert managed-flag true
set interfaces switch switch0 vif 10 ipv6 router-advert prefix ::/110
The managed-flag setting corresponds to the RADVD AdvManagedFlag configuration setting, which instructs clients to use DHCPv6 for address autoconfiguration.
While in theory it is possible for clients to assign IPv6 addresses without the use of a DHCP server using stateless address autoconfiguration, this requires that we’re using a /64 subnet (see e.g. RFC 7421). There is no such limitation when using DHCPv6, so the next step is to configure a DHCPv6 server for the VLAN:
set service dhcpv6-server shared-network-name VLAN10 subnet 2001:470:1237:1212::/110 address-range start 2001:470:1237:1212::10 stop 2001:470:1237:1212::3:ffff
set service dhcpv6-server shared-network-name VLAN10 subnet 2001:470:1237:1212::/110 name-server 2001:470:1237:1212::1
set service dhcpv6-server shared-network-name VLAN10 subnet 2001:470:1237:1212::/110 domain-search house
set service dhcpv6-server shared-network-name VLAN10 subnet 2001:470:1237:1212::/110 lease-time default 86400
Here I’m largely setting things up to mirror the configuration of the IPv4 DHCP server for the name-server, domain-search, and lease-time settings. I’m letting the DHCPv6 server allocate pretty much the entire network range, with the exception of the first 10 addresses.
After making the above changes they need to be activated:
commit
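If you also want the new configuration to survive a reboot, follow the commit with a save (standard EdgeOS behavior; commit only changes the running configuration):

save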
This produces the following interface configuration for tun0:
13: tun0@NONE: <POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/sit 0.0.0.0 peer 209.51.161.14
inet6 2001:470:1236:1212::2/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::c0a8:101/64 scope link
valid_lft forever preferred_lft forever
inet6 fe80::6c07:49c7/64 scope link
valid_lft forever preferred_lft forever
And for switch0.10:
ubnt@ubnt:~$ ip addr show switch0.10
14: switch0.10@switch0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 78:8a:20:bb:05:db brd ff:ff:ff:ff:ff:ff
inet6 2001:470:1237:1212::1/110 scope global
valid_lft forever preferred_lft forever
inet6 fe80::7a8a:20ff:febb:5db/64 scope link
valid_lft forever preferred_lft forever
And the following route configuration:
ubnt@ubnt:~$ ip -6 route | grep -v fe80
2001:470:1236:1212::/64 dev tun0 proto kernel metric 256 pref medium
2001:470:1237:1212::/110 dev switch0.10 proto kernel metric 256 pref medium
default via 2001:470:1236:1212::1 dev tun0 proto zebra metric 1024 pref medium
We can confirm things are properly configured by accessing a remote service that reports our ip address:
ubnt@ubnt:~$ curl https://api64.ipify.org
2001:470:1236:1212::2
In my home network, devices in my office connect to a switch, and the switch connects back to the router. I need to configure the switch (an older Netgear M4100-D12G) to pass the VLAN on to the desktop.
I start by defining VLAN 10 (named ipv6net0) in the VLAN database:
vlan database
vlan 10
vlan name 10 ipv6net0
exit
Next, I configure the switch to pass VLAN 10 as a tagged VLAN on all switch interfaces:
configure
interface 0/1-0/10
vlan participation include 10
vlan tagging 10
exit
exit
With the above configuration in place, traffic on VLAN 10 will arrive on my Linux desktop (which is connected to the switch we configured in the previous step). I can use nmcli, the NetworkManager CLI, to add a VLAN interface (I’m using Fedora 37, which uses NetworkManager to manage network interface configuration; other distributions may have different tooling).
The following command will create a connection named vlan10. Bringing up the connection will create an interface named vlan10, configured to receive traffic on VLAN 10 arriving on eth0:
nmcli con add type vlan con-name vlan10 ifname vlan10 dev eth0 id 10 ipv6.method auto
nmcli con up vlan10
This produces the following interface configuration:
$ ip addr show vlan10
7972: vlan10@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 2c:f0:5d:c9:12:a9 brd ff:ff:ff:ff:ff:ff
inet6 2001:470:1237:1212::2:c19a/128 scope global dynamic noprefixroute
valid_lft 85860sec preferred_lft 53460sec
inet6 fe80::ced8:1750:d67c:2ead/64 scope link noprefixroute
valid_lft forever preferred_lft forever
And the following route configuration:
$ ip -6 route show | grep vlan10
2001:470:1237:1212::2:c19a dev vlan10 proto kernel metric 404 pref medium
2001:470:1237:1212::/110 dev vlan10 proto ra metric 404 pref medium
fe80::/64 dev vlan10 proto kernel metric 1024 pref medium
default via fe80::7a8a:20ff:febb:5db dev vlan10 proto ra metric 404 pref medium
We can confirm things are properly configured by accessing a remote service that reports our ip address:
$ curl https://api64.ipify.org
2001:470:1237:1212::2:c19a
Note that unlike access using IPv4, the address visible here is the address assigned to our local interface. There is no NAT happening at the router.
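As one final sanity check from the desktop, you can ping a well-known IPv6 address (2001:4860:4860::8888 is Google’s public DNS resolver; this assumes your ping supports the -6 flag):

ping -6 -c 3 2001:4860:4860::8888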
Cover image by Chris Woodford/explainthatstuff.com, licensed under CC BY-NC-SA 3.0.
Some services (Netflix is a notable example) block access over the IPv6 tunnels because it breaks their geolocation process and prevents them from determining your country of origin. I don’t want to break things for other folks in my house just because I want to play with IPv6. ↩︎
RDO Zed Released
The RDO community is pleased to announce the general availability of the RDO build for OpenStack Zed for RPM-based distributions, CentOS Stream and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Zed is the 26th release from the OpenStack project, which is the work of more than 1,000 contributors from around the world. As with the Upstream release, this release of RDO is dedicated to Ilya Etingof who was an upstream and RDO contributor.
The release is already available for CentOS Stream 9 on the CentOS mirror network in:
http://mirror.stream.centos.org/SIGs/9-stream/cloud/x86_64/openstack-zed/
The RDO community project curates, packages, builds, tests and maintains a complete OpenStack component set for RHEL and CentOS Stream and is a member of the CentOS Cloud Infrastructure SIG. The Cloud Infrastructure SIG focuses on delivering a great user experience for CentOS users looking to build and maintain their own on-premise, public or hybrid clouds.
All work on RDO and on the downstream release, Red Hat OpenStack Platform, is 100% open source, with all code changes going upstream first.
The highlights of the broader upstream OpenStack project may be read via https://releases.openstack.org/zed/highlights.html
TripleO in the RDO Zed release:
Since the Xena development cycle, TripleO follows the Independent release model (https://specs.openstack.org/openstack/tripleo-specs/specs/xena/tripleo-independent-release.html).
For the Zed cycle, the TripleO project will maintain and validate stable Zed branches. As with the rest of the packages, RDO will update and publish the releases created during the maintenance cycle.
Contributors
During the Zed cycle, we saw the following new RDO contributors:
Welcome to all of you and Thank You So Much for participating!
But we wouldn’t want to overlook anyone. A super massive Thank You to all 57 contributors who participated in producing this release. This list includes commits to rdo-packages, rdo-infra, and redhat-website repositories:
The Next Release Cycle
At the end of one release, focus shifts immediately to the next release, i.e., Antelope.
Get Started
To spin up a proof of concept cloud, quickly, and on limited hardware, try an All-In-One Packstack installation. You can run RDO on a single node to get a feel for how it works.
Finally, for those that don’t have any hardware or physical resources, there’s the OpenStack Global Passport Program. This is a collaborative effort between OpenStack public cloud providers to let you experience the freedom, performance and interoperability of open source infrastructure. You can quickly and easily gain access to OpenStack infrastructure via trial programs from participating OpenStack public cloud providers around the world.
Get Help
The RDO Project has our users@lists.rdoproject.org for RDO-specific users and operators. For more developer-oriented content we recommend joining the dev@lists.rdoproject.org mailing list. Remember to post a brief introduction about yourself and your RDO story. The mailing lists archives are all available at https://mail.rdoproject.org. You can also find extensive documentation on RDOproject.org.
The #rdo channel on OFTC IRC is also an excellent place to find and give help. We also welcome comments and requests on the CentOS devel mailing list and the CentOS and TripleO IRC channels (#centos, #centos-devel in Libera.Chat network, and #tripleo on OFTC), however we have a more focused audience within the RDO venues.
Get Involved
To get involved in the OpenStack RPM packaging effort, check out the RDO contribute pages, peruse the CentOS Cloud SIG page, and inhale the RDO packaging documentation.
Join us in #rdo and #tripleo on the OFTC IRC network and follow us on Twitter @RDOCommunity. You can also find us on Facebook and YouTube.
In today’s post, we look at KeyOxide, a service that allows you to cryptographically assert ownership of online resources using your GPG key. Some aspects of the service are less than obvious; in response to some questions I saw on Mastodon, I thought I would put together a short guide to making use of the service.
We’re going to look at the following high-level tasks:
If you already have a keypair, skip on to “Step 2: Publish your key”.
The first thing you need to do is set up a GPG keypair and publish it to a keyserver (or a WKD endpoint). There are many guides out there that step you through the process (for example, GitHub’s guide on Generating a new GPG key), but if you’re in a hurry and not particularly picky, read on.
This assumes that you’re using a recent version of GPG; at the time of this writing, the current GPG release is 2.3.8, but these instructions seem to work at least with version 2.2.27.
Generate a new keypair using the --quick-gen-key option:
gpg --batch --quick-gen-key <your email address>
This will use the GPG defaults for the key algorithm and expiration time (both of which vary by GPG version; the example key shown below ended up with a two-year expiration).
When prompted, enter a secure passphrase.
GPG will create a keypair for you; you can view it after the fact by running:
gpg -qk <your email address>
You should see something like:
pub ed25519 2022-11-13 [SC] [expires: 2024-11-12]
EC03DFAC71DB3205EC19BAB1404E03D044EE706B
uid [ultimate] testuser@example.com
sub cv25519 2022-11-13 [E]
In the above output, EC03DFAC71DB3205EC19BAB1404E03D044EE706B is the key fingerprint. We’re going to need this later.
Now you have created a GPG keypair!
If you’ve already published your key at https://keys.openpgp.org/ or at a WKD endpoint, skip on to “Step 3: Add a claim”.
In order for KeyOxide to find your GPG key, it needs to be published at a known location. There are two choices: the keyserver at https://keys.openpgp.org/, or a WKD endpoint that you host yourself. In this post, we’re only going to consider the first option.
Export your public key to a file using gpg’s --export option:
gpg --export -a <your email address> > mykey.asc
This will create a file mykey.asc in your current directory that looks like:
-----BEGIN PGP PUBLIC KEY BLOCK-----
[...a bunch of base64 encoded text...]
-----END PGP PUBLIC KEY BLOCK-----
Go to https://keys.openpgp.org/upload, select the key export you just created, and select “upload”.
When prompted on the next page, select “Send Verification Email”. Your key won’t discoverable until you have received and responded to the verification email.
When you receive the email, select the verification link.
Now your key has been published! You can verify this by going to https://keys.openpgp.org/ and searching for your email address.
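You can also check from the command line; gpg’s --search-keys option will query a keyserver for your address (shown here with the example address used earlier):

gpg --keyserver hkps://keys.openpgp.org --search-keys testuser@example.com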
You assert ownership of an online resource through a three-step process:
Mark the online resource with your GPG key fingerprint. How you do this depends on the type of resource you’re claiming; e.g., for GitHub you create a gist with specific content, while for claiming a DNS domain you create a TXT record.
Add a notation to your GPG key with a reference to the claim created in the previous step.
Update your published key.
In this post we’re going to look at two specific examples; for other services, see the “Service providers” section of the KeyOxide documentation.
In order to follow any of the following instructions, you’re going to need to know your key fingerprint. When you show your public key by running gpg -k, your key fingerprint is the long hexadecimal string on the line following the line that starts with pub:
$ gpg -qk testuser@example.com
pub ed25519 2022-11-13 [SC] [expires: 2024-11-12]
EC03DFAC71DB3205EC19BAB1404E03D044EE706B <--- THIS LINE HERE
uid [ultimate] testuser@example.com
sub cv25519 2022-11-13 [E]
This is a set of common instructions that we’ll use every time we need to add a claim to our GPG key.
Edit your GPG key using the --edit-key option:
gpg --edit-key <your email address>
This will drop you into the GPG interactive key editor.
Select a user id on which to operate using the uid command. If you created your key following the instructions earlier in this post, then you only have a single user id:
gpg> uid 1
Add an annotation to the key using the notation command:
gpg> notation
When prompted, enter the notation (the format of the notation depends on the service you’re claiming; see below for details). For example, if we’re asserting a Mastodon identity at hachyderm.io, we would enter:
Enter the notation: proof@ariadne.id=https://hachyderm.io/@testuser
Save your changes with the save command:
gpg> save
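Before publishing, you can confirm that the notation was recorded by listing the key with notations shown (using the example address from earlier):

gpg --list-options show-notations -k testuser@example.com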
After adding an annotation to your key locally, you need to publish those changes. One way of doing this is simply following the instructions for initially uploading your public key:
Export the key to a file:
gpg --export -a <your email address> > mykey.asc
Upload your key to https://keys.openpgp.org/upload.
You won’t have to re-verify your key.
Alternately, you can configure gpg so that you can publish your key from the command line. Create or edit $HOME/.gnupg/gpg.conf and add the following line:
keyserver hkps://keys.openpgp.org
Now every time you need to update the published version of your key:
Upload your public key using the --send-keys option along with your key fingerprint, e.g.:
gpg --send-keys EC03DFAC71DB3205EC19BAB1404E03D044EE706B
On your favorite Mastodon server, go to your profile and select “Edit profile”.
Look for the “Profile metadata section”; this allows you to associate four bits of metadata with your Mastodon profile. Assuming that you still have a slot free, give it a name (it could be anything, I went with “Keyoxide claim”), and for the value enter:
openpgp4fpr:<your key fingerprint>
E.g., given the gpg -k output shown above, I would enter:
openpgp4fpr:EC03DFAC71DB3205EC19BAB1404E03D044EE706B
Click “Save Changes”
Now, add the claim to your GPG key by adding the notation proof@ariadne.id=https://<your mastodon server>/@<your mastodon username>. I am @larsks@hachyderm.io, so I would enter:
proof@ariadne.id=https://hachyderm.io/@larsks
After adding the claim, update your published key.
Create a new gist (it can be either secret or public).
In your gist, name the file openpgp.md.
Set the content of that file to:
openpgp4fpr:<your key fingerprint>
Now, add the claim to your GPG key by adding the notation proof@ariadne.id=https://gist.github.com/<your github username>/<gist id>. You can see my claim at https://gist.github.com/larsks/9224f58cf82bdf95ef591a6703eb91c7; the notation I added to my key is:
proof@ariadne.id=https://gist.github.com/larsks/9224f58cf82bdf95ef591a6703eb91c7
After adding the claim, update your published key.
You’ll note that none of the previous steps required interacting with KeyOxide. That’s because KeyOxide doesn’t actually store any of your data: it just provides a mechanism for visualizing and verifying claims.
You can look up an identity by email address or by GPG key fingerprint.
To look up an identity using an email address, visit https://keyoxide.org/<email address>. For example, to find my identity, visit https://keyoxide.org/lars@oddbit.com.
. For example, to find my identity, visit https://keyoxide.org/lars@oddbit.com.To look up an identity by key fingerprint:
https://keyoxide.org/<fingerprint>
. For example, to find my identity, visit https://keyoxide.org/3e70a502bb5255b6bb8e86be362d63a80853d4cf.The pedantic among you will already be writing to me about how PGP is the standard and GPG is an implementation of that standard, but I’m going to stick with this nomenclature for the sake of simplicity. ↩︎
For some thoughts on key expiration, see this question on the Information Security StackExchange. ↩︎
Hello, future me. This is for you next time you want to do this.
When setting up the CI for a project I will sometimes end up with a tremendous clutter of workflow runs. Sometimes they have embarrassing mistakes. Who wants to show that to people? I was trying to figure out how to bulk delete workflow runs from the CLI, and I came up with something that works:
gh run list --json databaseId -q '.[].databaseId' |
xargs -IID gh api \
"repos/$(gh repo view --json nameWithOwner -q .nameWithOwner)/actions/runs/ID" \
-X DELETE
This will delete all (well, up to 20, or whatever you set in --limit) of your workflow runs. You can add flags to gh run list to filter runs by workflow or by triggering user.
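For example, to clean up only the runs of a single workflow while raising the limit, something like the following should work (build.yml here is a placeholder for whatever your workflow file is actually called):

gh run list --workflow build.yml --limit 200 --json databaseId -q '.[].databaseId' |
  xargs -IID gh api \
    "repos/$(gh repo view --json nameWithOwner -q .nameWithOwner)/actions/runs/ID" \
    -X DELETE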
We are working with an application that produces resource utilization reports for clients of our OpenShift-based cloud environments. The developers working with the application have been reporting mysterious issues concerning connection timeouts between the application and the database (a MariaDB instance). For a long time we had only high-level verbal descriptions of the problem (“I’m seeing a lot of connection timeouts!”) and a variety of unsubstantiated theories (from multiple sources) about the cause. Absent a solid reproducer of the behavior in question, we looked at other aspects of our infrastructure.
What was going on?
I was finally able to get my hands on container images, deployment manifests, and instructions to reproduce the problem this past Friday. After working through some initial errors that weren’t the errors we were looking for (insert Jedi hand gesture here), I was able to see the behavior in practice. In a section of code that makes a number of connections to the database, we were seeing:
Failed to create databases:
Command returned non-zero value '1': ERROR 2003 (HY000): Can't connect to MySQL server on 'mariadb' (110)
#0 /usr/share/xdmod/classes/CCR/DB/MySQLHelper.php(521): CCR\DB\MySQLHelper::staticExecuteCommand(Array)
#1 /usr/share/xdmod/classes/CCR/DB/MySQLHelper.php(332): CCR\DB\MySQLHelper::staticExecuteStatement('mariadb', '3306', 'root', 'pass', NULL, 'SELECT SCHEMA_N...')
#2 /usr/share/xdmod/classes/OpenXdmod/Shared/DatabaseHelper.php(65): CCR\DB\MySQLHelper::databaseExists('mariadb', '3306', 'root', 'pass', 'mod_logger')
#3 /usr/share/xdmod/classes/OpenXdmod/Setup/DatabaseSetupItem.php(39): OpenXdmod\Shared\DatabaseHelper::createDatabases('root', 'pass', Array, Array, Object(OpenXdmod\Setup\Console))
#4 /usr/share/xdmod/classes/OpenXdmod/Setup/DatabaseSetup.php(109): OpenXdmod\Setup\DatabaseSetupItem->createDatabases('root', 'pass', Array, Array)
#5 /usr/share/xdmod/classes/OpenXdmod/Setup/Menu.php(69): OpenXdmod\Setup\DatabaseSetup->handle()
#6 /usr/bin/xdmod-setup(37): OpenXdmod\Setup\Menu->display()
#7 /usr/bin/xdmod-setup(22): main()
#8 {main}
Where 110 is ETIMEDOUT, “Connection timed out”.
The application consists of two Deployment resources, one that manages a MariaDB pod and another that manages the application itself. There are also the usual suspects, such as PersistentVolumeClaims for the database backing store, etc, and a Service to allow the application to access the database.
While looking at this problem, I attempted to look at the logs for the application by running:
kubectl logs deploy/moc-xdmod
But to my surprise, I found myself looking at the logs for the MariaDB container instead…which provided me just about all the information I needed about the problem.
To understand what’s going on, let’s first take a closer look at a Deployment manifest. The basic framework is something like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example
spec:
  selector:
    matchLabels:
      app: example
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
        - name: example
          image: docker.io/alpine:latest
          command:
            - sleep
            - inf
There are labels in three places in this manifest:
The Deployment itself has labels in the metadata section.
There are labels in spec.template.metadata that will be applied to Pods spawned by the Deployment.
There are labels in spec.selector which, in the words of the documentation:
defines how the Deployment finds which Pods to manage
It’s not spelled out explicitly anywhere, but the spec.selector field is also used to identify to which pods to attach when using the Deployment name in a command like kubectl logs: that is, given the above manifest, running kubectl logs deploy/example would look for pods that have label app set to example.
With this in mind, let’s take a look at how our application manifests are being deployed. Like most of our applications, this is deployed using Kustomize. The kustomization.yaml file for the application manifests looked like this:
commonLabels:
  app: xdmod
resources:
  - svc-mariadb.yaml
  - deployment-mariadb.yaml
  - deployment-xdmod.yaml
That commonLabels statement will apply the label app: xdmod to all of the resources managed by the kustomization.yaml file. The Deployments looked like this:
For MariaDB:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mariadb
spec:
  selector:
    matchLabels:
      app: mariadb
  template:
    metadata:
      labels:
        app: mariadb
For the application experiencing connection problems:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: moc-xdmod
spec:
  selector:
    matchLabels:
      app: xdmod
  template:
    metadata:
      labels:
        app: xdmod
The problem here is that when these are processed by kustomize, the app label hardcoded in the manifests will be replaced by the app label defined in the commonLabels section of kustomization.yaml. When we run kustomize build on these manifests, we will have as output:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: xdmod
  name: mariadb
spec:
  selector:
    matchLabels:
      app: xdmod
  template:
    metadata:
      labels:
        app: xdmod
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: xdmod
  name: moc-xdmod
spec:
  selector:
    matchLabels:
      app: xdmod
  template:
    metadata:
      labels:
        app: xdmod
In other words, all of our pods will have the same labels (because the spec.template.metadata.labels section is identical in both Deployments). When I run kubectl logs deploy/moc-xdmod, I’m just getting whatever the first match is for a query that is effectively the same as kubectl get pod -l app=xdmod.
So, that’s what was going on with the kubectl logs command.
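A quick way to see this kind of label collision for yourself (using the names from these manifests) is to ask Kubernetes which pods a Deployment’s selector actually matches:

kubectl get deploy moc-xdmod -o jsonpath='{.spec.selector.matchLabels}'; echo
kubectl get pods -l app=xdmod --show-labels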
A Service manifest in Kubernetes looks something like this:
apiVersion: v1
kind: Service
metadata:
  name: mariadb
spec:
  selector:
    app: mariadb
  ports:
    - protocol: TCP
      port: 3306
      targetPort: 3306
Here, spec.selector has a function very similar to what it had in a Deployment: it selects pods to which the Service will direct traffic. From the documentation, we know that a Service proxy will select a backend either in a round-robin fashion (using the legacy user-space proxy) or in a random fashion (using the iptables proxy) (there is also an IPVS proxy mode, but that’s not available in our environment).
Given what we know from the previous section about Deployments, you can probably see what’s going on here: after kustomize applies the common app: xdmod label, the Service’s spec.selector matches both the MariaDB pod and the application pod, so a connection to the mariadb Service is sometimes handed to a pod that isn’t running MariaDB at all, and that connection attempt eventually times out.
.We can see the impact of this behavior by running a simple loop that attempts to connect to MariaDB and run a query:
while :; do
  _start=$SECONDS
  echo -n "$(date +%T) "
  timeout 10 mysql -h mariadb -uroot -ppass -e 'select 1' > /dev/null && echo -n OKAY || echo -n FAILED
  echo " $(( SECONDS - _start))"
  sleep 1
done
Which outputs:
01:41:30 OKAY 1
01:41:32 OKAY 0
01:41:33 OKAY 1
01:41:35 OKAY 0
01:41:36 OKAY 3
01:41:40 OKAY 1
01:41:42 OKAY 0
01:41:43 OKAY 3
01:41:47 OKAY 3
01:41:51 OKAY 4
01:41:56 OKAY 1
01:41:58 OKAY 1
01:42:00 FAILED 10
01:42:10 OKAY 0
01:42:11 OKAY 0
Here we can see that connection time is highly variable, and we occasionally hit the 10 second timeout imposed by the timeout call.
In order to resolve this behavior, we want to ensure (a) that Pods managed by a Deployment are uniquely identified by their labels and (b) that spec.selector for both Deployments and Services will only select the appropriate Pods. We can do this with a few simple changes.
It’s useful to apply some labels consistently across all of the resources we generate, so we’ll keep the existing commonLabels section of our kustomization.yaml:
commonLabels:
  app: xdmod
But then in each Deployment we’ll add a component label identifying the specific service, like this:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mariadb
  labels:
    component: mariadb
spec:
  selector:
    matchLabels:
      component: mariadb
  template:
    metadata:
      labels:
        component: mariadb
When we generate the final manifest with kustomize
, we end up with:
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: xdmod
component: mariadb
name: mariadb
spec:
selector:
matchLabels:
app: xdmod
component: mariadb
template:
metadata:
labels:
app: xdmod
component: mariadb
In the above output, you can see that kustomize
has combined the commonLabels
definition with the labels configured individually in the manifests. With this change, spec.selector
will now select only the pod in which MariaDB is running.
We’ll similarly modify the Service manifest to look like:
apiVersion: v1
kind: Service
metadata:
name: mariadb
spec:
selector:
component: mariadb
ports:
- protocol: TCP
port: 3306
targetPort: 3306
Resulting in a generated manifest that looks like:
apiVersion: v1
kind: Service
metadata:
labels:
app: xdmod
name: mariadb
spec:
ports:
- port: 3306
protocol: TCP
targetPort: 3306
selector:
app: xdmod
component: mariadb
Which, as with the Deployment, will now select only the correct pods.
With these changes in place, if we re-run the test loop I presented earlier, we see as output:
01:57:27 OKAY 0
01:57:28 OKAY 0
01:57:29 OKAY 0
01:57:30 OKAY 0
01:57:31 OKAY 0
01:57:32 OKAY 0
01:57:33 OKAY 0
01:57:34 OKAY 0
01:57:35 OKAY 0
01:57:36 OKAY 0
01:57:37 OKAY 0
01:57:38 OKAY 0
01:57:39 OKAY 0
01:57:40 OKAY 0
There is no variability in connection time, and there are no timeouts.
This post is mostly for myself: I find the Traefik documentation hard to navigate, so having figured this out in response to a question on Stack Overflow, I’m putting it here to help it stick in my head.
The question asks essentially how to perform port-based routing of requests to containers, so that a request for http://example.com
goes to one container while a request for http://example.com:9090
goes to a different container.
A default Traefik configuration will already have a listener on port 80, but if we want to accept connections on port 9090 we need to create a new listener: what Traefik calls an entrypoint. We do this using the --entrypoints.<name>.address
option. For example, --entrypoints.ep1.address=:80
creates an entrypoint named ep1
on port 80, while --entrypoints.ep2.address=:9090
creates an entrypoint named ep2
on port 9090. Those names are important because we’ll use them for mapping containers to the appropriate listener later on.
This gives us a Traefik configuration that looks something like:
proxy:
image: traefik:latest
command:
- --api.insecure=true
- --providers.docker
- --entrypoints.ep1.address=:80
- --entrypoints.ep2.address=:9090
ports:
- "80:80"
- "127.0.0.1:8080:8080"
- "9090:9090"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
We need to publish ports 80
and 9090
on the host in order to accept connections. Port 8080 is by default the Traefik dashboard; in this configuration I have it bound to localhost
because I don’t want to provide external access to the dashboard.
Now we need to configure our services so that connections on ports 80 and 9090 will get routed to the appropriate containers. We do this using the traefik.http.routers.<name>.entrypoints
label. Here’s a simple example:
app1:
image: docker.io/alpinelinux/darkhttpd:latest
labels:
- traefik.http.routers.app1.entrypoints=ep1
- traefik.http.routers.app1.rule=Host(`example.com`)
In the above configuration, we’re using the following labels:
traefik.http.routers.app1.entrypoints=ep1
This binds our app1
container to the ep1
entrypoint.
traefik.http.routers.app1.rule=Host(`example.com`)
This matches requests with Host: example.com
.
So in combination, these two rules say that any request on port 80 for Host: example.com
will be routed to the app1
container.
To get port 9090
routed to a second container, we add:
app2:
image: docker.io/alpinelinux/darkhttpd:latest
labels:
- traefik.http.routers.app2.rule=Host(`example.com`)
- traefik.http.routers.app2.entrypoints=ep2
This is the same thing, except we use entrypoint ep2
.
With everything running, we can watch the logs from docker-compose up
and see that a request on port 80:
curl -H 'host: example.com' localhost
Is serviced by app1
:
app1_1 | 172.20.0.2 - - [21/Jun/2022:02:44:11 +0000] "GET / HTTP/1.1" 200 354 "" "curl/7.76.1"
And that request on port 9090:
curl -H 'host: example.com' localhost:9090
Is serviced by app2
:
app2_1 | 172.20.0.2 - - [21/Jun/2022:02:44:39 +0000] "GET / HTTP/1.1" 200 354 "" "curl/7.76.1"
The complete docker-compose.yaml
file from this post looks like:
version: "3"
services:
proxy:
image: traefik:latest
command:
- --api.insecure=true
- --providers.docker
- --entrypoints.ep1.address=:80
- --entrypoints.ep2.address=:9090
ports:
- "80:80"
- "8080:8080"
- "9090:9090"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
app1:
image: docker.io/alpinelinux/darkhttpd:latest
labels:
- traefik.http.routers.app1.rule=Host(`example.com`)
- traefik.http.routers.app1.entrypoints=ep1
app2:
image: docker.io/alpinelinux/darkhttpd:latest
labels:
- traefik.http.routers.app2.rule=Host(`example.com`)
- traefik.http.routers.app2.entrypoints=ep2
The command to run the formatting tests for the keystone project is:
tox -e pep8
Running this on Fedora 35 failed for me with this error:
ERROR: pep8: could not install deps [-chttps://releases.openstack.org/constraints/upper/master, -r/opt/stack/keystone/test-requirements.txt, .[ldap,memcache,mongodb]]; v = InvocationError("/opt/stack/keystone/.tox/pep8/bin/python -m pip install -chttps://releases.openstack.org/constraints/upper/master -r/opt/stack/keystone/test-requirements.txt '.[ldap,memcache,mongodb]'", 1)
What gets swallowed up is the actual error in the install, which has to do with the fact that the Python dependencies are compiled against native libraries. If I activate the venv and run the pip command by hand, I can see the first failure. It is also in the earlier output, just buried a few screens up:
Error: pg_config executable not found.
A later error was due to the compile step erroring out looking for lber.h:
In file included from Modules/LDAPObject.c:3:
Modules/common.h:15:10: fatal error: lber.h: No such file or directory
15 | #include <lber.h>
| ^~~~~~~~
compilation terminated.
error: command '/usr/bin/gcc' failed with exit code 1
To get the build to run, I needed to install both libpq-devel and libldap-devel, and now it fails like this:
File "/opt/stack/keystone/.tox/pep8/lib/python3.10/site-packages/pep257.py", line 24, in
from collections import defaultdict, namedtuple, Set
ImportError: cannot import name 'Set' from 'collections' (/usr/lib64/python3.10/collections/__init__.py)
This appears to be due to the version of Python 3 on my system (3.10), which is later than what upstream OpenStack supports. I do have Python 3.9 installed on my system, and can modify tox.ini to use it by specifying the basepython version:
[testenv:pep8]
basepython = python3.9
deps =
And then I can run tox -e pep8.
I got Keystone in my Bifrost install to talk via LDAP to our FreeIPA server. Here’s what I had to do.
I started with a new install of bifrost, using Keystone and TLS.
./bifrost-cli install --enable-keystone --enable-tls --network-interface enP4p4s0f0np0 --dhcp-pool 192.168.116.25-192.168.116.75
After making sure that Keystone could work for normal things:
source /opt/stack/bifrost/bin/activate
export OS_CLOUD=bifrost-admin
openstack user list -f yaml
- ID: 1751a5bb8b4a4f0188069f8cb4f8e333
Name: admin
- ID: 5942330b4f2c4822a9f2cdf45ad755ed
Name: ironic
- ID: 43e30ad5bf0349b7b351ca2e86fd1628
Name: ironic_inspector
- ID: 0c490e9d44204cc18ec1e507f2a07f83
Name: bifrost_user
I had to install python3-ldap and python3-ldappool:
sudo apt install python3-ldap python3-ldappool
Now create a domain for the LDAP data.
openstack domain create freeipa
...
openstack domain show freeipa -f yaml
description: ''
enabled: true
id: 422608e5c8d8428cb022792b459d30bf
name: freeipa
options: {}
tags: []
Edit /etc/keystone/keystone.conf to support domain-specific backends, backed by file-based configuration. When you are done, your [identity] section should look like this:
[identity]
domain_specific_drivers_enabled=true
domain_config_dir=/etc/keystone/domains
driver = sql
Create the corresponding directory for the new configuration files.
sudo mkdir /etc/keystone/domains/
Add a configuration file for your LDAP server. Since I called my domain freeipa, I have to name the config file /etc/keystone/domains/keystone.freeipa.conf:
[identity]
driver = ldap
[ldap]
url = ldap://den-admin-01
user_tree_dn = cn=users,cn=accounts,dc=younglogic,dc=com
user_objectclass = person
user_id_attribute = uid
user_name_attribute = uid
user_mail_attribute = mail
user_allow_create = false
user_allow_update = false
user_allow_delete = false
group_tree_dn = cn=groups,cn=accounts,dc=younglogic,dc=com
group_objectclass = groupOfNames
group_id_attribute = cn
group_name_attribute = cn
group_member_attribute = member
group_desc_attribute = description
group_allow_create = false
group_allow_update = false
group_allow_delete = false
user_enabled_attribute = nsAccountLock
user_enabled_default = False
user_enabled_invert = true
To make the changes take effect, restart the Keystone service:
sudo systemctl restart uwsgi@keystone-public
And test that it worked:
openstack user list -f yaml --domain freeipa
- ID: b3054e3942f06016f8b9669b068e81fd2950b08c46ccb48032c6c67053e03767
Name: renee
- ID: d30e7bc818d2f633439d982783a2d145e324e3187c0e67f71d80fbab065d096a
Name: ann
This same approach can work if you need to add more than one LDAP server to your Keystone deployment.
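As a sketch of what that looks like (the acme domain and its config file are made up for illustration), each additional LDAP server gets its own Keystone domain and its own file under /etc/keystone/domains, followed by a restart:
# hypothetical second domain backed by another LDAP server
openstack domain create acme

# its settings go in a file named after the domain,
# with its own [identity] and [ldap] sections
sudo vi /etc/keystone/domains/keystone.acme.conf

sudo systemctl restart uwsgi@keystone-public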
The RDO community is pleased to announce the general availability of the RDO build for OpenStack Yoga for RPM-based distributions, CentOS Stream and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Yoga is the 25th release from the OpenStack project, which is the work of more than 1,000 contributors from around the world.
The release is already available on the CentOS mirror network:
The RDO community project curates, packages, builds, tests and maintains a complete OpenStack component set for RHEL and CentOS Stream and is a member of the CentOS Cloud Infrastructure SIG. The Cloud Infrastructure SIG focuses on delivering a great user experience for CentOS users looking to build and maintain their own on-premise, public or hybrid clouds.
All work on RDO and on the downstream release, Red Hat OpenStack Platform, is 100% open source, with all code changes going upstream first.
Interesting things in the Yoga release include:
The highlights of the broader upstream OpenStack project may be read via https://releases.openstack.org/yoga/highlights.html
TripleO in the RDO Yoga release:
Since the Xena development cycle, TripleO follows the Independent release model and will only maintain branches for selected OpenStack releases. In the case of Yoga, TripleO will not support the Yoga release. For TripleO users in RDO, this means that:
You can find details about this on the RDO Webpage
Contributors
During the Yoga cycle, we saw the following new RDO contributors:
Welcome to all of you and Thank You So Much for participating!
But we wouldn’t want to overlook anyone. A super massive Thank You to all 40 contributors who participated in producing this release. This list includes commits to rdo-packages, rdo-infra, and redhat-website repositories:
The Next Release Cycle
At the end of one release, focus shifts immediately to the next release, i.e. Zed.
Get Started
To quickly spin up a proof-of-concept cloud on limited hardware, try an All-In-One Packstack installation. You can run RDO on a single node to get a feel for how it works.
Finally, for those that don’t have any hardware or physical resources, there’s the OpenStack Global Passport Program. This is a collaborative effort between OpenStack public cloud providers to let you experience the freedom, performance and interoperability of open source infrastructure. You can quickly and easily gain access to OpenStack infrastructure via trial programs from participating OpenStack public cloud providers around the world.
Get Help
The RDO Project has the users@lists.rdoproject.org mailing list for RDO-specific users and operators. For more developer-oriented content we recommend joining the dev@lists.rdoproject.org mailing list. Remember to post a brief introduction about yourself and your RDO story. The mailing list archives are all available at https://mail.rdoproject.org. You can also find extensive documentation on RDOproject.org.
The #rdo channel on OFTC IRC is also an excellent place to find and give help.
We also welcome comments and requests on the CentOS devel mailing list and the CentOS and TripleO IRC channels (#centos, #centos-devel in Libera Chat network, and #tripleo on OFTC), however we have a more focused audience within the RDO venues.
Get Involved
To get involved in the OpenStack RPM packaging effort, check out the RDO contribute pages, peruse the CentOS Cloud SIG page, and inhale the RDO packaging documentation.
Join us in #rdo and #tripleo on the OFTC IRC network and follow us on Twitter @RDOCommunity. You can also find us on Facebook and YouTube.
To make the AARCH64 ipxe process work using bifrost, I had to:
git clone https://github.com/ipxe/ipxe.git
cd ipxe/src/
make bin-arm64-efi/snponly.efi ARCH=arm64
sudo cp bin-arm64-efi/snponly.efi /var/lib/tftpboot/ipxe.efi
This works for the Ampere reference implementation servers that use a Mellanox network interface card, which supports (only) snp.
For the past week I worked on getting an Ironic standalone to run on an Ampere AltraMax server in our lab. As I recently was able to get a baremetal node to boot, I wanted to record the steps I went through.
Our base operating system for this install is Ubuntu 20.04.
The controller node has 2 Mellanox Technologies MT27710 network cards, each with 2 ports apiece.
I started by following the steps to install with the bifrost-cli. However, there were a few places where the installation assumes an x86_64 architecture, and I hard-swapped them to be AARCH64/ARM64 specific:
$ git diff HEAD
diff --git a/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Debian_family.yml b/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Debian_family.yml
index 18e281b0..277bfc1c 100644
--- a/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Debian_family.yml
+++ b/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Debian_family.yml
@@ -6,8 +6,8 @@ ironic_rootwrap_dir: /usr/local/bin/
mysql_service_name: mysql
tftp_service_name: tftpd-hpa
efi_distro: debian
-grub_efi_binary: /usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed
-shim_efi_binary: /usr/lib/shim/shimx64.efi.signed
+grub_efi_binary: /usr/lib/grub/arm64-efi-signed/grubaa64.efi.signed
+shim_efi_binary: /usr/lib/shim/shimaa64.efi.signed
required_packages:
- mariadb-server
- python3-dev
diff --git a/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Ubuntu.yml b/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Ubuntu.yml
index 7fcbcd46..4d6a1337 100644
--- a/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Ubuntu.yml
+++ b/playbooks/roles/bifrost-ironic-install/defaults/required_defaults_Ubuntu.yml
@@ -26,7 +26,7 @@ required_packages:
- dnsmasq
- apache2-utils
- isolinux
- - grub-efi-amd64-signed
+ - grub-efi-arm64-signed
- shim-signed
- dosfstools
# NOTE(TheJulia): The above entry for dnsmasq must be the last entry in the
The long term approach to these is to make those variables architecture specific.
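I have not written that change yet, but a sketch of the idea in Ansible would be to key the binary paths off the architecture fact instead of hard-coding them. The grub_efi_binary and shim_efi_binary names come from the diff above; the lookup dictionaries are my own invention, not actual bifrost variables:
# sketch: pick the signed grub/shim binaries based on the host architecture fact
grub_efi_binary: "{{ grub_efi_binaries[ansible_architecture] }}"
shim_efi_binary: "{{ shim_efi_binaries[ansible_architecture] }}"

grub_efi_binaries:
  x86_64: /usr/lib/grub/x86_64-efi-signed/grubx64.efi.signed
  aarch64: /usr/lib/grub/arm64-efi-signed/grubaa64.efi.signed

shim_efi_binaries:
  x86_64: /usr/lib/shim/shimx64.efi.signed
  aarch64: /usr/lib/shim/shimaa64.efi.signed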
In order to install, I ran the cli:
./bifrost-cli install --network-interface enP4p4s0f1 --dhcp-pool 192.168.116.100-192.168.116.150
It took me several tries with -e variables until I realized that it was not going to honor them. I did notice that the heart of the command was the Ansible call, which I ended up running directly:
/opt/stack/bifrost/bin/ansible-playbook ~/bifrost/playbooks/install.yaml -i ~/bifrost/playbooks/inventory/target -e bifrost_venv_dir=/opt/stack/bifrost -e @/home/ansible/bifrost/baremetal-install-env.json
You may notice that I added a -e with the baremetal-install-env.json file. That file had been created by the earlier CLI run, and contained the variables specific to my install. I also edited it to trigger the build of the Ironic cleaning image.
{
"create_ipa_image": false,
"create_image_via_dib": false,
"install_dib": true,
"network_interface": "enP4p4s0f1",
"enable_keystone": false,
"enable_tls": false,
"generate_tls": false,
"noauth_mode": false,
"enabled_hardware_types": "ipmi,redfish,manual-management",
"cleaning_disk_erase": false,
"testing": false,
"use_cirros": false,
"use_tinyipa": false,
"developer_mode": false,
"enable_prometheus_exporter": false,
"default_boot_mode": "uefi",
"include_dhcp_server": true,
"dhcp_pool_start": "192.168.116.100",
"dhcp_pool_end": "192.168.116.150",
"download_ipa": false,
"create_ipa_image": true
}
With this in place, I was able to enroll nodes using the bifrost CLI:
~/bifrost/bifrost-cli enroll ~/nodes.json
I prefer this to using my own script. However, my script checks for existence and thus can be run idempotently, unlike this one. Still, I like the file format and will likely script to it in the future.
With this, I was ready to try booting the nodes, but they hung, as I reported in an earlier article.
The other place where the deployment is x86_64 specific is the iPXE binary. In a bifrost install on Ubuntu, the binary is called ipxe.efi, and it is placed in /var/lib/tftpboot/ipxe.efi. It is copied from the grub-ipxe package, which places it in /boot/ipxe.efi. Although this package is not tagged with an x86_64 architecture (Debian/Ubuntu mark it as “all”), the file is architecture specific.
I went through the steps to fetch and install the latest one out of jammy which has an additional file: /boot/ipxe-arm64.efi. However, when I replaced the file /var/lib/tftpboot/ipxe.efi with this one, the baremetal node still failed to boot, although it did get a few steps further in the process.
The issue, as I understand it, is that the binary needs a set of drivers to set up the HTTP request in the network interface cards, and the build in the Ubuntu package did not include them. Instead, I cloned the source git repo and compiled the binary directly. Roughly:
git clone https://github.com/ipxe/ipxe.git
cd ipxe/src
make bin-arm64-efi/snponly.efi ARCH=arm64
SNP stands for the Simple Network Protocol. I guess this protocol is esoteric enough that Wikipedia has not heard of it.
The header file in the code says this:
The EFI_SIMPLE_NETWORK_PROTOCOL provides services to initialize a network interface, transmit packets, receive packets, and close a network interface.
It seems the Mellanox cards support/require SNP. With this file in place, I was able to get the cleaning image to PXE boot.
I call this a spike as it has a lot of corners cut in it that I would not want to maintain in production. We’ll work with the distributions to get a viable version of ipxe.efi produced that can work for an array of servers, including Ampere’s. In the meantime, I need a strategy for building our own binary. I also plan on reworking the Bifrost variables to handle ARM64/AARCH64 alongside x86_64; a single server should be able to handle both based on the architecture flag sent in the initial DHCP request.
Note: I was not able to get the cleaning image to boot, as it had an issue with werkzeug and JSON. However, I had an older build of the IPA kernel and initrd that I used, and the node properly deployed and cleaned.
And yes, I plan on integrating Keystone in the future, too.
There are a handful of questions a user will (implicitly) ask when using your API:
What actions can I do against this endpoint?
What URLs do I use to perform those actions?
What information do I need to provide in order to perform an action?
What permission do I need in order to perform this action?
Answering these questions can be automated. The user, and the tools they use, can discover the answers by working with the system. That is what I mean when I use the word “Discoverability.”
We missed some opportunities to answer these questions when we designed the APIs for Keystone OpenStack. I’d like to talk about how to improve on what we did there.
First I’d like to state what not to do.
Don’t make the user read the documentation and code to an external spec.
Never require a user to manually perform an operation that should be automated. Answering every one of those question can be automated. If you can get it wrong, you will get it wrong. Make it possible to catch errors as early as possible.
Let’s start with the question: “What actions can I do against this endpoint?” In the case of Keystone, the answer would be some of the following:
Create, Read, Update and Delete (CRUD) Users, Groups of Users, Projects, Roles, and Catalog Items such as Services and Endpoints. You can also CRUD relationships between these entities. You can CRUD Entities for Federated Identity. You can CRUD Policy files (historical). Taken in total, you have the tools to make access control decisions for a wide array of services, not just Keystone.
The primary way, however, that people interact with Keystone is to get a token. Let’s use this use case to start. To get a token, you make a POST to the $OS_AUTH_URL/v3/auth/tokens/ URL. The data is a JSON document that specifies the authentication method, the credentials, and (optionally) the requested scope.
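For reference, this is roughly what that request looks like for password authentication with a project scope; the user, project, and password values here are placeholders:
curl -si -X POST "$OS_AUTH_URL/v3/auth/tokens" \
  -H "Content-Type: application/json" \
  -d '{
        "auth": {
          "identity": {
            "methods": ["password"],
            "password": {
              "user": {
                "name": "demo",
                "domain": {"name": "Default"},
                "password": "not-a-real-password"
              }
            }
          },
          "scope": {
            "project": {
              "name": "demo",
              "domain": {"name": "Default"}
            }
          }
        }
      }'
The token itself comes back in the X-Subject-Token response header.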
How would you know this? Only by reading the documentation. If someone handed you the value of their OS_AUTH_URL environment variable, and you looked at it using a web client, what would you get? Really, just the version URL. Assuming you chopped off the V3:
$ curl http://10.76.10.254:35357/
{"versions": {"values": [{"id": "v3.14", "status": "stable", "updated": "2020-04-07T00:00:00Z", "links": [{"rel": "self", "href": "http://10.76.10.254:35357/v3/"}], "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}]}]}}
and the only URL in there is the version URL, which gives you back the same thing.
If you point a web browser at the service, the output is in JSON, even though the web browser told the server that it preferred HTML.
What could this look like? If we look at the API spec for Keystone, we can see that the various entities referred to above have fairly predictable URL forms. However, for this use case we want a token, so we should, at a minimum, see the path to get to the token. Since this is the V3 API, we should see an entry like this:
{"rel": "auth", "href": "http://10.76.10.254:35357/v3/auth"}], "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}]}
And if we then performed an HTTP GET on http://10.76.10.254:35357/v3/auth, we should see a link to:
{"rel": "token", "href": "http://10.76.10.254:35357/v3/auth/token"}], "media-types": [{"base": "application/json", "type": "application/vnd.openstack.identity-v3+json"}]}
Is this 100% of the solution? No. The Keystone API shows its prejudice toward password-based authentication, a very big antipattern. The password goes in clear text into the middle of the JSON blob posted to this API. We trust SSL/TLS to secure it over the wire, and we have had to scrub it from logs and debug output. This is actually a step backwards from BASIC_AUTH in HTTP. All this aside, there is still no way to tell what you need to put into the body of the token request without reading the documentation… unless you know the magic of JSON-HOME.
Here is what you would need to do to get a list of the top-level URLs, excluding all the ones that are templated and thus require knowing an ID:
curl 10.76.116.63:5000 -H "Accept: application/json-home" | jq '.resources | to_entries | .[] | .value | .href ' | sort -u
This would be the friendly list to return from the /v3 page. Or, if we wanted to streamline it a bit for human consumption, we could put a top level grouping around each of these APIs. A friendlier list would look like this (chopping off the /v3)
There are a couple ways to order the list. Alphabetical order is the simplest for an English speaker if they know what they are looking for. This won’t internationalize, and it won’t guide the user to the use cases that are most common. Thus, I put auth at the top, as that is, by far, the most common use case. The others I have organized based on a quick think-through from most to least common. I could easily be convinced to restructure this a couple different ways.
However, we are starting to trip over some of the other aspects of usability. We have provided the user with way more information than they need or, indeed, can use at this point. Since none of those operations can be performed unauthenticated, we have led the user astray; we should show them, at this stage, only what they can do in their current state. Thus, the obvious entry to show an unauthenticated user is auth.
Let’s continue with the old-school version of a token request using the v3/auth/tokens resource, as that is the most common use case. How, then, does a user request a token? It depends on whether they want to use a password, another token, or multifactor authentication, and whether they want an unscoped token or a scoped token.
None of this information is in the JSON home. You have to read the docs.
If we were using straight HTML to render the response, we would expect a form: something with fields for the user name, domain, and password, plus a way to select the scope (a project or domain).
There is, as of now, no standard way to put form data into JSON. However, there are numerous standards to choose from. One is the FormData API; another is JSON Schema (https://json-schema.org/). If we look at the API doc, we get a table that specifies each field name. Anything that is not a single value is specified as an object, which really means a JSON object: a dictionary that can be deeply nested. We can see the complexity in the form above, where the scope value determines what is meant by the project/domain name field. And these versions don’t allow IDs to be used instead of names for users, projects, or domains.
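As an illustration only (Keystone does not publish anything like this), a JSON Schema fragment describing just the scope portion of the request might look like:
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "scope",
  "type": "object",
  "properties": {
    "project": {
      "type": "object",
      "properties": {
        "name": {"type": "string"},
        "domain": {
          "type": "object",
          "properties": {"name": {"type": "string"}}
        }
      }
    },
    "domain": {
      "type": "object",
      "properties": {"name": {"type": "string"}}
    }
  }
}
Publishing something like this alongside JSON-HOME would at least let tooling validate a request before sending it.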
A lot of the custom approach here is dictated by the fact that Keystone does not accept standard authentication. The Password based token request could easily be replaced with BASIC-AUTH. Tokens themselves could be stored as session cookies, with the same timeouts as the token expiration. All of the One-Offs in Keystone make it more difficult to use, and require more application specific knowledge.
Many of these issues were straightened out when we started doing federation. There is still some out-of-band knowledge required to use the federated API, but this was due to concerns about information leakage that I am going to ignore for now. The approach I am going to describe is basically what is used by any app that lets you log in with the different cloud providers’ identity sources today.
From the /v3 page, a user should be able to select the identity provider that they want to use. This could require a jump to /v3/FEDERATION and then to /v3/FEDERATION/idp, in order to keep things namespaced, or the list could be expanded in the /v3 page if there is really nothing else that a user can do unauthenticated.
Let us assume a case where there are three companies that all share access to the cloud: Acme, Beta, and Charlie. The JSON response would be the same as the list identity providers API. The interesting part of the result is this one here:
"protocols": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols"
Let’s say that a given identity provider supports multiple protocols. Here is where the user gets to choose which one they want to use to try to authenticate. An HTTP GET on the link above would return that list. The documentation shows an example of an identity provider that supports saml2. Here is an expanded one that shows the set of protocols a user could expect in a private cloud running FreeIPA and Keycloak, or Active Directory and ADFS.
{
"links": {
"next": null,
"previous": null,
"self": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols"
},
"protocols": [
{
"id": "saml2",
"links": {
"identity_provider": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME",
"self": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols/saml2"
},
"mapping_id": "xyz234"
},
{
"id": "x509",
"links": {
"identity_provider": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME",
"self": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols/x509"
},
"mapping_id": "xyz235"
},
{
"id": "gssapi",
"links": {
"identity_provider": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME",
"self": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols/gssapi"
},
"mapping_id": "xyz236"
},
{
"id": "oidc",
"links": {
"identity_provider": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME",
"self": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols/oidc"
},
"mapping_id": "xyz237"
},
{
"id": "basic-auth",
"links": {
"identity_provider": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME",
"self": "http://example.com/identity/v3/OS-FEDERATION/identity_providers/ACME/protocols/basic-auth"
},
"mapping_id": "xyz238"
}
]
}
Note that this is very similar to the content that a web browser gives back in a 401 response: the set of acceptable authentication mechanisms. I actually prefer this here, as it allows the user to select the appropriate mechanism for the use case, which may vary depending on where the user connects from.
Let’s ignore the actual response from the above links and assume that, if the user is unauthenticated, they merely get a link to where they can authenticate: /v3/OS-FEDERATION/identity_providers/{idp_id}/protocols/{protocol_id}/auth. The follow-on link is a GET, not a POST. There is no form data required. The mapping resolves the user’s domain name/ID, so there is no need to provide that information, and the token is a federated unscoped token.
The actual response contains the list of groups that a user belongs to. This is an artifact of the mapping, and it is useful for debugging. However, what the user has at this point is, effectively, an unscoped token. It is passed in the X-Subject-Token header, and not in the session cookie. However, for an HTML based workflow, and, indeed, for sane HTTP workflows against Keystone, a session scoped cookie containing the token would be much more useful.
With an unscoped token, a user can perform some operations against a Keystone server, but those operations are either read-only, operations specific to the user, or administrative actions specific to the Keystone server. For OpenStack, the vast majority of the time the user is going to Keystone to request a scoped token to use on one of the other services. As such, the user probably needs to convert the unscoped token shown above to a token scoped to a project. A very common setup has the user assigned to a single project. Even if they are scoped to multiple, it is unlikely that they are scoped to many. Thus, the obvious next step is to show the user a URL that will allow them to get a token scoped to a specific project.
Keystone does not have such a URL. In Keystone, again you are required to go through /v3/auth/tokens to request a scoped token.
A much friendlier URL scheme would be /v3/auth/projects, which lists the set of projects a user can request a token for, and /v3/auth/project/{id}, which lets a user request a scoped token for that project.
However, even if we had such a URL pattern, we would need to direct the user to that URL. There are two distinct use cases. The first is the case where the user has just authenticated and, in the token response, needs to see the project-list URL. A redirect makes the most sense, although the list of projects could also be included in the authentication response. However, the user might also be returning to the Keystone server from some other operation, still have the session cookie with the token in it, and start at the discovery page again. In this case, the /v3/ response should show /v3/auth/projects/ in its list.
There is, unfortunately, one case where this would be problematic. With hierarchical projects, a single assignment could allow a user to get a token for many projects. While this is a useful hack in practice, it means that the project list page could get extremely long. This is, unfortunately, also the case with the project list page itself; projects may be nested, but the namespace needs to be flat, and listing projects will list all of them; only the parent-project ID distinguishes them. Since we do have ways to do path nesting in HTTP, this is a solvable problem. Let’s lump the token request and the project list APIs together. This actually makes for a very elegant solution:
Instead of /v3/auth/projects we put a link off the project page itself back to /v3/auth/tokens, but accepting the project ID as a URL parameter, like this: /v3/auth/tokens?project_id=abc123.
Of course, this means that there is a hidden mechanism now. If a user wants to look at any resource in Keystone, they can do so with an unscoped token, provided they have a role assignment on the project or domain that manages that object.
To this point we have discussed implicit answers to the questions of finding URLs and discovering what actions a user can perform. For the token request, I started discussing how to provide the answer to “What information do I need to provide in order to perform this action?” I think now we can state how to do that: the list page for any collection should either provide an inline form or a link to a form URL. The form provides the information in a format that makes sense for the content type. If the user does not have permission to create the object, they should not see the form. If the form is on a separate link, a user that cannot create that object should get back a 403 error if they attempt to GET the URL.
If Keystone had been written to return HTML when hit by a browser instead of JSON, all of this navigation would have been painfully obvious. Instead, we subscribed to the point of view that UI was to be done by the Horizon server.
There still remains the last question: “What permission do I need in order to perform this action?” The user only thinks to ask this question when they come across an operation that they cannot perform. I’ll dig deeper into this in the next article.
Kolla creates an admin-openrc.sh file that sets the usual OpenStack environment variables. I want to then use those credentials in a Terraform plan, but I’d rather not generate Terraform-specific code for the Keystone login data. So, a simple Python script converts the environment variables to YAML.
#!/usr/bin/python3
import os
import yaml
clouds = {
"clouds":{
"cluster": {
"auth" : {
"auth_url" : os.environ["OS_AUTH_URL"],
"project_name": os.environ["OS_PROJECT_NAME"],
"project_domain_name": os.environ["OS_PROJECT_DOMAIN_NAME"],
"username": os.environ["OS_USERNAME"],
"user_domain_name": os.environ["OS_USER_DOMAIN_NAME"],
"password": os.environ["OS_PASSWORD"]
}
}
}
}
print (yaml.dump(clouds))
To use it:
./clouds.py > clouds.yaml
Note that you should have sourced the appropriate config environment variables file, such as:
. /etc/kolla/admin-openrc.sh
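The result is a clouds.yaml shaped like this; the values below are placeholders, and yours will come from whatever the openrc file exported:
clouds:
  cluster:
    auth:
      auth_url: http://keystone.example.com:5000
      password: '...'
      project_domain_name: Default
      project_name: admin
      user_domain_name: Default
      username: admin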
I like to fiddle with Micropython, particularly on the Wemos D1 Mini, because these are such a neat form factor. Unfortunately, they have a cheap CH340 serial adapter on board, which means that from the perspective of Linux these devices are all functionally identical – there’s no way to identify one device from another. This by itself would be a manageable problem, except that the device names assigned to these devices aren’t constant: depending on the order in which they get plugged in (and the order in which they are detected at boot), a device might be /dev/ttyUSB0
one day and /dev/ttyUSB2
another day.
On more than one occasion, I have accidentally re-flashed the wrong device. Ouch.
A common solution to this problem is to create device names based on the USB topology – that is, assign names based on a device’s position in the USB bus: e.g., when attaching a new USB serial device, expose it at something like /dev/usbserial/<bus>/<device_path>
. While that sounds conceptually simple, it took me a while to figure out the correct udev rules.
Looking at the available attributes for a serial device, we see:
# udevadm info -a -n /dev/ttyUSB0
[...]
looking at device '/devices/pci0000:00/0000:00:1c.0/0000:03:00.0/usb3/3-1/3-1.4/3-1.4.3/3-1.4.3:1.0/ttyUSB0/tty/ttyUSB0':
KERNEL=="ttyUSB0"
SUBSYSTEM=="tty"
DRIVER==""
ATTR{power/control}=="auto"
ATTR{power/runtime_active_time}=="0"
ATTR{power/runtime_status}=="unsupported"
ATTR{power/runtime_suspended_time}=="0"
looking at parent device '/devices/pci0000:00/0000:00:1c.0/0000:03:00.0/usb3/3-1/3-1.4/3-1.4.3/3-1.4.3:1.0/ttyUSB0':
KERNELS=="ttyUSB0"
SUBSYSTEMS=="usb-serial"
DRIVERS=="ch341-uart"
ATTRS{port_number}=="0"
ATTRS{power/control}=="auto"
ATTRS{power/runtime_active_time}=="0"
ATTRS{power/runtime_status}=="unsupported"
ATTRS{power/runtime_suspended_time}=="0"
looking at parent device '/devices/pci0000:00/0000:00:1c.0/0000:03:00.0/usb3/3-1/3-1.4/3-1.4.3/3-1.4.3:1.0':
KERNELS=="3-1.4.3:1.0"
SUBSYSTEMS=="usb"
DRIVERS=="ch341"
ATTRS{authorized}=="1"
ATTRS{bAlternateSetting}==" 0"
ATTRS{bInterfaceClass}=="ff"
ATTRS{bInterfaceNumber}=="00"
ATTRS{bInterfaceProtocol}=="02"
ATTRS{bInterfaceSubClass}=="01"
ATTRS{bNumEndpoints}=="03"
ATTRS{supports_autosuspend}=="1"
looking at parent device '/devices/pci0000:00/0000:00:1c.0/0000:03:00.0/usb3/3-1/3-1.4/3-1.4.3':
KERNELS=="3-1.4.3"
SUBSYSTEMS=="usb"
DRIVERS=="usb"
ATTRS{authorized}=="1"
ATTRS{avoid_reset_quirk}=="0"
ATTRS{bConfigurationValue}=="1"
ATTRS{bDeviceClass}=="ff"
ATTRS{bDeviceProtocol}=="00"
ATTRS{bDeviceSubClass}=="00"
ATTRS{bMaxPacketSize0}=="8"
ATTRS{bMaxPower}=="98mA"
ATTRS{bNumConfigurations}=="1"
ATTRS{bNumInterfaces}==" 1"
ATTRS{bcdDevice}=="0262"
ATTRS{bmAttributes}=="80"
ATTRS{busnum}=="3"
ATTRS{configuration}==""
ATTRS{devnum}=="8"
ATTRS{devpath}=="1.4.3"
ATTRS{idProduct}=="7523"
ATTRS{idVendor}=="1a86"
ATTRS{ltm_capable}=="no"
ATTRS{maxchild}=="0"
ATTRS{power/active_duration}=="48902765"
ATTRS{power/autosuspend}=="2"
ATTRS{power/autosuspend_delay_ms}=="2000"
ATTRS{power/connected_duration}=="48902765"
ATTRS{power/control}=="on"
ATTRS{power/level}=="on"
ATTRS{power/persist}=="1"
ATTRS{power/runtime_active_time}=="48902599"
ATTRS{power/runtime_status}=="active"
ATTRS{power/runtime_suspended_time}=="0"
ATTRS{product}=="USB2.0-Serial"
ATTRS{quirks}=="0x0"
ATTRS{removable}=="unknown"
ATTRS{rx_lanes}=="1"
ATTRS{speed}=="12"
ATTRS{tx_lanes}=="1"
ATTRS{urbnum}=="17"
ATTRS{version}==" 1.10"
[...]
In this output, we find that the device itself (at the top) doesn’t have any useful attributes we can use for creating a systematic device name. It’s not until we’ve moved up the device hierarchy to /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/usb3/3-1/3-1.4/3-1.4.3
that we find topology information (in the busnum
and devpath
attributes). This complicates matters because a udev rule only has access to attributes defined directly on the matching device, so we can’t write something like:
SUBSYSTEM=="usb-serial", SYMLINK+="usbserial/$attr{busnum}/$attr{devpath}"
How do we access the attributes of a parent node in our rule?
The answer is by creating environment variables that preserve the values in which we are interested. I started with this:
SUBSYSTEMS=="usb", ENV{.USB_BUSNUM}="$attr{busnum}", ENV{.USB_DEVPATH}="$attr{devpath}"
Here, my goal was to stash the busnum
and devpath
attributes in .USB_BUSNUM
and .USB_DEVPATH
, but this didn’t work: it matches device path /devices/pci0000:00/0000:00:1c.0/0000:03:00.0/usb3/3-1/3-1.4/3-1.4.3/3-1.4.3:1.0
, which is:
KERNELS=="3-1.4.3:1.0"
SUBSYSTEMS=="usb"
DRIVERS=="ch341"
ATTRS{authorized}=="1"
ATTRS{bAlternateSetting}==" 0"
ATTRS{bInterfaceClass}=="ff"
ATTRS{bInterfaceNumber}=="00"
ATTRS{bInterfaceProtocol}=="02"
ATTRS{bInterfaceSubClass}=="01"
ATTRS{bNumEndpoints}=="03"
ATTRS{supports_autosuspend}=="1"
We need to match the next device up the chain, so we need to make our match more specific. There are a couple of different options we can pursue; the simplest is probably to take advantage of the fact that the next device up the chain has SUBSYSTEMS=="usb"
and DRIVERS="usb"
, so we could instead write:
SUBSYSTEMS=="usb", DRIVERS=="usb", ENV{.USB_BUSNUM}="$attr{busnum}", ENV{.USB_DEVPATH}="$attr{devpath}"
Alternately, we could ask for “the first device that has a busnum
attribute” like this:
SUBSYSTEMS=="usb", ATTRS{busnum}=="?*", ENV{.USB_BUSNUM}="$attr{busnum}", ENV{.USB_DEVPATH}="$attr{devpath}"
Where (from the udev(7)
man page), ?
matches any single character and *
matches zero or more characters, so this matches any device in which busnum
has a non-empty value. We can test this rule out using the udevadm test
command:
# udevadm test $(udevadm info --query=path --name=/dev/ttyUSB0)
[...]
.USB_BUSNUM=3
.USB_DEVPATH=1.4.3
[...]
This shows us that our rule is matching and setting up the appropriate variables. We can now use those in a subsequent rule to create the desired symlink:
SUBSYSTEMS=="usb", ATTRS{busnum}=="?*", ENV{.USB_BUSNUM}="$attr{busnum}", ENV{.USB_DEVPATH}="$attr{devpath}"
SUBSYSTEMS=="usb-serial", SYMLINK+="usbserial/$env{.USB_BUSNUM}/$env{.USB_DEVPATH}"
Re-running the test command, we see:
# udevadm test $(udevadm info --query=path --name=/dev/ttyUSB0)
[...]
DEVLINKS=/dev/serial/by-path/pci-0000:03:00.0-usb-0:1.4.3:1.0-port0 /dev/usbserial/3/1.4.3 /dev/serial/by-id/usb-1a86_USB2.0-Serial-if00-port0
[...]
You can see the new symlink in the DEVLINKS
value, and looking at /dev/usbserial
we can see the expected symlinks:
# tree /dev/usbserial
/dev/usbserial/
└── 3
├── 1.1 -> ../../ttyUSB1
└── 1.4.3 -> ../../ttyUSB0
And there we have it. Now, as long as I attach a specific device to the same USB port on my system, it will have the same device node. I’ve updated my tooling to use these paths (/dev/usbserial/3/1.4.3
) instead of the kernel names (/dev/ttyUSB0
), and it has greatly simplified things.
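For completeness, the finished rules and the commands to activate them look like this on my system; the rules file name is arbitrary, anything under /etc/udev/rules.d/ with a .rules suffix will do:
# /etc/udev/rules.d/99-usbserial-topology.rules
SUBSYSTEMS=="usb", ATTRS{busnum}=="?*", ENV{.USB_BUSNUM}="$attr{busnum}", ENV{.USB_DEVPATH}="$attr{devpath}"
SUBSYSTEMS=="usb-serial", SYMLINK+="usbserial/$env{.USB_BUSNUM}/$env{.USB_DEVPATH}"
After editing the rules, reload them and re-trigger events so already-attached devices pick up the new symlinks:
udevadm control --reload
udevadm trigger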
We'll start by finding the server we are interested in, using openstack server list. For example:
$ openstack server list -c ID -c Name -c Status
+--------------------------------------+-------+--------+
| ID | Name | Status |
+--------------------------------------+-------+--------+
| 7e087c31-7e9c-47f7-a4c4-ebcc20034faa | foo-4 | ACTIVE |
| 977fd250-75d4-4da3-a37c-bc7649047151 | foo-2 | ACTIVE |
| a1182f44-7163-4ad9-89ed-36611f75bac7 | foo-5 | ACTIVE |
| a91951b3-ff4e-46a0-a5fd-20064f02afc9 | foo-3 | ACTIVE |
| f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f | foo-1 | ACTIVE |
+--------------------------------------+-------+--------+
We'll use f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f
. That's the uuid of the server foo-1
. For convenience, the server uuid and the resource ID used in openstack metric resource
are the same.
$ openstack metric resource show f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f
+-----------------------+-------------------------------------------------------------------+
| Field | Value |
+-----------------------+-------------------------------------------------------------------+
| created_by_project_id | 6a180caa64894f04b1d04aeed1cb920e |
| created_by_user_id | 39d9e30374a74fe8b58dee9e1dcd7382 |
| creator | 39d9e30374a74fe8b58dee9e1dcd7382:6a180caa64894f04b1d04aeed1cb920e |
| ended_at | None |
| id | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f |
| metrics | cpu: fe84c113-f98d-42ee-84fd-bdf360842bdb |
| | disk.ephemeral.size: ad79f268-5f56-4ff8-8ece-d1f170621217 |
| | disk.root.size: 6e021f8c-ead0-46e4-bd26-59131318e6a2 |
| | memory.usage: b768ec46-5e49-4d9a-b00d-004f610c152d |
| | memory: 1a4e720a-2151-4265-96cf-4daf633611b2 |
| | vcpus: 68654bc0-8275-4690-9433-27fe4a3aef9e |
| original_resource_id | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f |
| project_id | 8d077dbea6034e5aa45c0146d1feac5f |
| revision_end | None |
| revision_start | 2021-11-09T10:00:46.241527+00:00 |
| started_at | 2021-11-09T09:29:12.842149+00:00 |
| type | instance |
| user_id | 65bfeb6cc8ec4df4a3d53550f7e99a5a |
+-----------------------+-------------------------------------------------------------------+
This list shows the metrics associated with the instance. If the metrics you expect are there, you are done here. Otherwise, the next step is to check the ceilometer containers; on controller nodes, ceilometer_agent_central and ceilometer_agent_notification should be running:
$ ssh controller-0 -l root
$ podman ps --format "{{.Names}} {{.Status}}" | grep ceilometer
ceilometer_agent_central Up 2 hours ago
ceilometer_agent_notification Up 2 hours ago
On compute nodes, there should be a ceilometer_agent_compute container running:
$ podman ps --format "{{.Names}} {{.Status}}" | grep ceilometer
ceilometer_agent_compute Up 2 hours ago
The metrics are being sent from ceilometer to a remote defined in
/var/lib/config-data/puppet-generated/ceilometer/etc/ceilometer/pipeline.yaml
, which may look similar to the following file
---
sources:
- name: meter_source
meters:
- "*"
sinks:
- meter_sink
sinks:
- name: meter_sink
publishers:
- gnocchi://?filter_project=service&archive_policy=ceilometer-high-rate
- notifier://172.17.1.40:5666/?driver=amqp&topic=metering
In this case, data is sent to both STF and Gnocchi. The next step is to check whether any errors are occurring. On controllers and computes, ceilometer
logs are found in /var/log/containers/ceilometer/
.
The agent-notification.log
shows logs from publishing data, as well as
errors if sending out metrics or logs fails for some reason.
If there are any errors in the log file, it is likely that metrics are not being delivered to the remote.
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging File "/usr/lib/python3.6/site-packages/oslo_messaging/transport.py", line 136, in _send_notification
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging retry=retry)
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_amqp1.py", line 295, in wrap
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging return func(self, *args, **kws)
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging File "/usr/lib/python3.6/site-packages/oslo_messaging/_drivers/impl_amqp1.py", line 397, in send_notification
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging raise rc
2021-11-16 07:01:07.063 16 ERROR oslo_messaging.notify.messaging oslo_messaging.exceptions.MessageDeliveryFailure: Notify message sent to <Target topic=event.sample> failed: timed out
In this case, it fails to send messages to the STF instance. The following example shows the gnocchi API not responding or not being accessible:
2021-11-16 10:38:07.707 16 ERROR ceilometer.publisher.gnocchi [-] <html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
(HTTP 503): gnocchiclient.exceptions.ClientException: <html><body><h1>503 Service Unavailable</h1>
For more gnocchi debugging, see the gnocchi section.
Gnocchi sits on controller nodes and consists of three separate containers, gnocchi_metricd, gnocchi_statsd, and gnocchi_api. The latter is for the interaction with the outside world, such as ingesting metrics or returning measurements.
Gnocchi metricd is used for re-calculating metrics, downsampling to lower
granularity, etc. Gnocchi logfiles are found under /var/log/containers/gnocchi
and the gnocchi API is hooked into httpd, thus the logfiles are
stored under /var/log/containers/httpd/gnocchi-api/
. The corresponding files
there are either gnocchi_wsgi_access.log
or gnocchi_wsgi_error.log
.
In the case from above (ceilometer section), where ceilometer could not send metrics to gnocchi, one would also observe log output for the gnocchi API.
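A quick way to check for that on a controller is to look at the tail of those log files; the grep for 503 assumes the usual Apache access-log format:
tail -n 50 /var/log/containers/httpd/gnocchi-api/gnocchi_wsgi_error.log
grep ' 503 ' /var/log/containers/httpd/gnocchi-api/gnocchi_wsgi_access.log | tail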
For starters, let's see which resources there are.
openstack server list -c ID -c Name -c Status
+--------------------------------------+-------+--------+
| ID | Name | Status |
+--------------------------------------+-------+--------+
| 7e087c31-7e9c-47f7-a4c4-ebcc20034faa | foo-4 | ACTIVE |
| 977fd250-75d4-4da3-a37c-bc7649047151 | foo-2 | ACTIVE |
| a1182f44-7163-4ad9-89ed-36611f75bac7 | foo-5 | ACTIVE |
| a91951b3-ff4e-46a0-a5fd-20064f02afc9 | foo-3 | ACTIVE |
| f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f | foo-1 | ACTIVE |
+--------------------------------------+-------+--------+
To show which metrics are stored for the VM foo-1, one would use the following command:
openstack metric resource show f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f --max-width 75
+-----------------------+-------------------------------------------------+
| Field | Value |
+-----------------------+-------------------------------------------------+
| created_by_project_id | 6a180caa64894f04b1d04aeed1cb920e |
| created_by_user_id | 39d9e30374a74fe8b58dee9e1dcd7382 |
| creator | 39d9e30374a74fe8b58dee9e1dcd7382:6a180caa64894f |
| | 04b1d04aeed1cb920e |
| ended_at | None |
| id | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f |
| metrics | cpu: fe84c113-f98d-42ee-84fd-bdf360842bdb |
| | disk.ephemeral.size: |
| | ad79f268-5f56-4ff8-8ece-d1f170621217 |
| | disk.root.size: |
| | 6e021f8c-ead0-46e4-bd26-59131318e6a2 |
| | memory.usage: |
| | b768ec46-5e49-4d9a-b00d-004f610c152d |
| | memory: 1a4e720a-2151-4265-96cf-4daf633611b2 |
| | vcpus: 68654bc0-8275-4690-9433-27fe4a3aef9e |
| original_resource_id | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f |
| project_id | 8d077dbea6034e5aa45c0146d1feac5f |
| revision_end | None |
| revision_start | 2021-11-09T10:00:46.241527+00:00 |
| started_at | 2021-11-09T09:29:12.842149+00:00 |
| type | instance |
| user_id | 65bfeb6cc8ec4df4a3d53550f7e99a5a |
+-----------------------+-------------------------------------------------+
To view the memory usage between Nov 18 2021 17:00 UTC and 17:05 UTC, one would issue this command:
openstack metric measures show --start 2021-11-18T17:00:00 \
--stop 2021-11-18T17:05:00 \
--aggregation mean
b768ec46-5e49-4d9a-b00d-004f610c152d
+---------------------------+-------------+-------------+
| timestamp | granularity | value |
+---------------------------+-------------+-------------+
| 2021-11-18T17:00:00+00:00 | 3600.0 | 28.87890625 |
| 2021-11-18T17:00:00+00:00 | 60.0 | 28.87890625 |
| 2021-11-18T17:01:00+00:00 | 60.0 | 28.87890625 |
| 2021-11-18T17:02:00+00:00 | 60.0 | 28.87890625 |
| 2021-11-18T17:03:00+00:00 | 60.0 | 28.87890625 |
| 2021-11-18T17:04:00+00:00 | 60.0 | 28.87890625 |
| 2021-11-18T17:00:14+00:00 | 1.0 | 28.87890625 |
| 2021-11-18T17:00:44+00:00 | 1.0 | 28.87890625 |
| 2021-11-18T17:01:14+00:00 | 1.0 | 28.87890625 |
| 2021-11-18T17:01:44+00:00 | 1.0 | 28.87890625 |
| 2021-11-18T17:02:14+00:00 | 1.0 | 28.87890625 |
| 2021-11-18T17:02:44+00:00 | 1.0 | 28.87890625 |
| 2021-11-18T17:03:14+00:00 | 1.0 | 28.87890625 |
| 2021-11-18T17:03:44+00:00 | 1.0 | 28.87890625 |
| 2021-11-18T17:04:14+00:00 | 1.0 | 28.87890625 |
| 2021-11-18T17:04:44+00:00 | 1.0 | 28.87890625 |
+---------------------------+-------------+-------------+
This shows that the data is available at granularities of 3600, 60, and 1 second. The memory usage does not change over time, which is why the values stay the same. Please note that if you ask for values with a granularity of 300, the request fails, because that granularity does not exist in the archive policy:
$ openstack metric measures show --start 2021-11-18T17:00:00 \
--stop 2021-11-18T17:05:00 \
--aggregation mean \
--granularity 300
b768ec46-5e49-4d9a-b00d-004f610c152d
Aggregation method 'mean' at granularity '300.0' for metric b768ec46-5e49-4d9a-b00d-004f610c152d does not exist (HTTP 404)
More information about the metric itself can be listed using:
openstack metric show --resource-id f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f \
memory.usage \
--max-width 75
+--------------------------------+----------------------------------------+
| Field | Value |
+--------------------------------+----------------------------------------+
| archive_policy/name | ceilometer-high-rate |
| creator | 39d9e30374a74fe8b58dee9e1dcd7382:6a180 |
| | caa64894f04b1d04aeed1cb920e |
| id | b768ec46-5e49-4d9a-b00d-004f610c152d |
| name | memory.usage |
| resource/created_by_project_id | 6a180caa64894f04b1d04aeed1cb920e |
| resource/created_by_user_id | 39d9e30374a74fe8b58dee9e1dcd7382 |
| resource/creator | 39d9e30374a74fe8b58dee9e1dcd7382:6a180 |
| | caa64894f04b1d04aeed1cb920e |
| resource/ended_at | None |
| resource/id | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f |
| resource/original_resource_id | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f |
| resource/project_id | 8d077dbea6034e5aa45c0146d1feac5f |
| resource/revision_end | None |
| resource/revision_start | 2021-11-09T10:00:46.241527+00:00 |
| resource/started_at | 2021-11-09T09:29:12.842149+00:00 |
| resource/type | instance |
| resource/user_id | 65bfeb6cc8ec4df4a3d53550f7e99a5a |
| unit | MB |
+--------------------------------+----------------------------------------+
This shows that, in this case, the archive policy used is ceilometer-high-rate. We can inspect it:
openstack metric archive-policy show ceilometer-high-rate --max-width 75
+---------------------+---------------------------------------------------+
| Field | Value |
+---------------------+---------------------------------------------------+
| aggregation_methods | mean, rate:mean |
| back_window | 0 |
| definition | - timespan: 1:00:00, granularity: 0:00:01, |
| | points: 3600 |
| | - timespan: 1 day, 0:00:00, granularity: 0:01:00, |
| | points: 1440 |
| | - timespan: 365 days, 0:00:00, granularity: |
| | 1:00:00, points: 8760 |
| name | ceilometer-high-rate |
+---------------------+---------------------------------------------------+
That means that, in this case, the only aggregation methods available for querying the metrics are mean and rate:mean. Other methods could include min or max.
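Those would only be available if the archive policy included them, so you would have to create (and use) a policy that lists the extra aggregation methods; a sketch, with a made-up name and definition:
openstack metric archive-policy create my-policy \
    -d granularity:1m,timespan:30d \
    -m mean -m min -m max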
Alarms can be retrieved by issuing
$ openstack alarm list
To create an alarm, for example based on disk.ephemeral.size, one would use something like
openstack alarm create --alarm-action 'log://' \
--ok-action 'log://' \
--comparison-operator ge \
--evaluation-periods 1 \
--granularity 60 \
--aggregation-method mean \
--metric disk.ephemeral.size \
--resource-id f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f \
--name ephemeral \
-t gnocchi_resources_threshold \
--resource-type instance \
--threshold 1
+---------------------------+----------------------------------------+
| Field | Value |
+---------------------------+----------------------------------------+
| aggregation_method | mean |
| alarm_actions | ['log:'] |
| alarm_id | 994a1710-98e8-495f-89b5-f14349575c96 |
| comparison_operator | ge |
| description | gnocchi_resources_threshold alarm rule |
| enabled | True |
| evaluation_periods | 1 |
| granularity | 60 |
| insufficient_data_actions | [] |
| metric | disk.ephemeral.size |
| name | ephemeral |
| ok_actions | ['log:'] |
| project_id | 8d077dbea6034e5aa45c0146d1feac5f |
| repeat_actions | False |
| resource_id | f0a62c20-2304-4c8c-aaf5-f5bc9d385b5f |
| resource_type | instance |
| severity | low |
| state | insufficient data |
| state_reason | Not evaluated yet |
| state_timestamp | 2021-11-22T10:16:15.250720 |
| threshold | 1.0 |
| time_constraints | [] |
| timestamp | 2021-11-22T10:16:15.250720 |
| type | gnocchi_resources_threshold |
| user_id | 65bfeb6cc8ec4df4a3d53550f7e99a5a |
+---------------------------+----------------------------------------+
The state here, insufficient data, means that the data gathered or stored is not sufficient to compare against the threshold. There is also a state reason given, in this case Not evaluated yet, which gives an explanation. Another valid reason could be No datapoint for granularity 60.
On OpenStack installations deployed via Tripleo aka OSP Director, the log files are located
on the separate nodes under /var/log/containers/{service_name}/
. The config files for
the services are stored under /var/lib/config-data/puppet-generated/<service_name>
and are mounted into the containers.
If an OpenStack server (Ironic or Nova) has an error, it shows up in a nested field. That field is hard to read in its normal layout, due to JSON formatting. Using jq to strip the formatting helps a bunch.
The nested field is fault.details.
The -r option strips off the quotes.
[ayoung@ayoung-home scratch]$ openstack server show oracle-server-84-aarch64-vm-small -f json | jq -r '.fault | .details'
Traceback (most recent call last):
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/compute/manager.py", line 2437, in _build_and_run_instance
block_device_info=block_device_info)
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/driver.py", line 3458, in spawn
block_device_info=block_device_info)
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/driver.py", line 3831, in _create_image
fallback_from_host)
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/driver.py", line 3922, in _create_and_inject_local_root
instance, size, fallback_from_host)
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/driver.py", line 9243, in _try_fetch_image_cache
trusted_certs=instance.trusted_certs)
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/imagebackend.py", line 275, in cache
*args, **kwargs)
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/imagebackend.py", line 642, in create_image
self.verify_base_size(base, size)
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/virt/libvirt/imagebackend.py", line 331, in verify_base_size
flavor_size=size, image_size=base_size)
nova.exception.FlavorDiskSmallerThanImage: Flavor's disk is too small for requested image. Flavor disk is 21474836480 bytes, image is 34359738368 bytes.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/compute/manager.py", line 2161, in _do_build_and_run_instance
filter_properties, request_spec)
File "/var/lib/kolla/venv/lib/python3.7/site-packages/nova/compute/manager.py", line 2525, in _build_and_run_instance
reason=e.format_message())
nova.exception.BuildAbortException: Build of instance 5281b93a-0c3c-4d38-965d-568d79abb530 aborted: Flavor's disk is too small for requested image. Flavor disk is 21474836480 bytes, image is 34359738368 bytes.
My team is running a small OpenStack cluster with responsibility for providing bare metal nodes via Ironic. Currently, we have a handful of nodes that are not usable. They show up as “Cleaning failed.” I’m learning how to debug this process.
The following ipmitool commands allow us to set the machine to PXE boot, remote power cycle the machine, and view what happens during the boot process.
ipmitool -H $H -U $U -I lanplus -P $P chassis power status
ipmitool -H $H -U $U -I lanplus -P $P chassis power on
ipmitool -H $H -U $U -I lanplus -P $P chassis power off
ipmitool -H $H -U $U -I lanplus -P $P chassis power cycle
ipmitool -H $H -U $U -I lanplus -P $P sol activate
ipmitool -H $H -U $U -I lanplus -P $P chassis bootdev pxe
#Set Boot Device to pxe
To tail the log and only see entries relevant to the UUID of the node I am cleaning:
tail -f /var/log/kolla/ironic/ironic-conductor.log | grep $UUID
What is the IPMI address for a node?
openstack baremetal node show fab1bcf7-a7fc-4c19-9d1d-fc4dbc4b2281 -f json | jq '.driver_info | .ipmi_address'
"10.76.97.171"
We have a script that prepares the PXE server to accept a cleaning request from a node. It performs the following three actions (don’t do these yet):
openstack baremetal node maintenance unset ${i}
openstack baremetal node manage ${i}
openstack baremetal node provide ${i}
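Once the PXE server is prepared, a minimal sketch of applying those three steps to every node currently in the “clean failed” state could look like this (it reuses the jq selection shown below; don’t run it until the PXE server is ready):
for i in `openstack baremetal node list -f json | jq -r '.[] | select(."Provisioning State"=="clean failed") | .UUID'`; do
  openstack baremetal node maintenance unset ${i}
  openstack baremetal node manage ${i}
  openstack baremetal node provide ${i}
done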
To look at the IPMI power status (and confirm that IPMI is set up right for the nodes):
for node in `openstack baremetal node list -f json | jq -r '.[] | select(."Provisioning State"=="clean failed") | .UUID' ` ;
do
echo $node ;
METAL_IP=`openstack baremetal node show $node -f json | jq -r '.driver_info | .ipmi_address' ` ;
echo $METAL_IP ;
ipmitool -I lanplus -H $METAL_IP -L ADMINISTRATOR -U admin -R 12 -N 5 -P admin chassis power status ;
done
Yes, I did that all on one line, hence the semicolons.
A couple of other one-liners. This one selects all active nodes and gives you their node ID and IPMI IP address.
for node in `openstack baremetal node list -f json | jq -r '.[] | select(."Provisioning State"=="active") | .UUID' ` ; do echo $node ; openstack baremetal node show $node -f json | jq -r '.driver_info | .ipmi_address' ;done
And you can swap out active with other values. For example, if you want to see which nodes are in either the error or manageable state:
openstack baremetal node list -f json | jq -r '.[] | select(."Provisioning State"=="error" or ."Provisioning State"=="manageable") | .UUID'
If I want to ensure I can PXE boot, outside of the OpenStack operations, I can track the state in a console. I like to have this running in a dedicated terminal: open the SOL.
ipmitool -H 10.76.97.176 -U ADMIN -I lanplus -P ADMIN sol activate
and in another terminal, set the machine to PXE boot, then power cycle it:
ipmitool -H 10.76.97.176 -U ADMIN -I lanplus -P ADMIN chassis bootdev pxe
Set Boot Device to pxe
[ayoung@ayoung-home keystone]$ ipmitool -H 10.76.97.176 -U ADMIN -I lanplus -P ADMIN chassis power cycle
Chassis Power Control: Cycle
If the Ironic server is not ready to accept the PXE request, your server will let you know with a message like this one:
>>Checking Media Presence......
>>Media Present......
>>Start PXE over IPv4 on MAC: 1C-34-DA-51-D6-C0.
PXE-E18: Server response timeout.
ERROR: Boot option loading failed
To list the UUIDs of the nodes that failed cleaning:
openstack baremetal node list --provision-state "clean failed" -f value -c UUID
This produces output like:
8470e638-0085-470c-9e51-b2ed016569e1
5411e7e8-8113-42d6-a966-8cacd1554039
08c14110-88aa-4e45-b5c1-4054ac49115a
3f5f510c-a313-4e40-943a-366917ec9e44
I’ll track what is going on in the log for a specific node by running tail -f and grepping for the uuid of the node:
tail -f /var/log/kolla/ironic/ironic-conductor.log | grep 5411e7e8-8113-42d6-a966-8cacd1554039
If you run the three commands I showed above, the Ironic server should be prepared for cleaning and will accept the PXE request. I can execute these one at a time and track the state in the conductor log. If I kick off a clean, eventually, I see entries like this in the conductor log (I’m removing the time stamps and request ids for readability):
ERROR ironic.conductor.task_manager [] Node 5411e7e8-8113-42d6-a966-8cacd1554039 moved to provision state "clean failed" from state "clean wait"; target provision state is "available"
INFO ironic.conductor.utils [] Successfully set node 5411e7e8-8113-42d6-a966-8cacd1554039 power state to power off by power off.
INFO ironic.drivers.modules.network.flat [] Removing ports from cleaning network for node 5411e7e8-8113-42d6-a966-8cacd1554039
INFO ironic.common.neutron [] Successfully removed node 5411e7e8-8113-42d6-a966-8cacd1554039 neutron ports.
And I can abort a run manually, if it is taking too long, by running:
openstack baremetal node abort $UUID
The command to kick off the clean process is
openstack baremetal node provide $UUID
In the conductor log, that should show messages like this (again, edited for readability)
Node 5411e7e8-8113-42d6-a966-8cacd1554039 moved to provision state "cleaning" from state "manageable"; target provision state is "available"
Adding cleaning network to node 5411e7e8-8113-42d6-a966-8cacd1554039
For node 5411e7e8-8113-42d6-a966-8cacd1554039 in network de931fcc-32a0-468e-8691-ffcb43bf9f2e, successfully created ports (ironic ID: neutron ID): {'94306ff5-5cd4-4fdd-a33e-a0202c34d3d0': 'd9eeb64d-468d-4a9a-82a6-e70d54b73e62'}.
Successfully set node 5411e7e8-8113-42d6-a966-8cacd1554039 power state to power on by rebooting.
Node 5411e7e8-8113-42d6-a966-8cacd1554039 moved to provision state "clean wait" from state "cleaning"; target provision state is "available"
At this point, the most interesting thing is to see what is happening on the node. ipmitool sol activate provides a running log. If you are lucky, the PXE process kicks off and a Debian-based kernel should start booting. My company has a specific login set for the machines:
debian login: ampere
Password:
Linux debian 5.10.0-6-arm64 #1 SMP Debian 5.10.28-1 (2021-04-09) aarch64
After this, I use sudo -i to run as root.
$ sudo -i
...
# ps -ef | grep ironic
root 2369 1 1 14:26 ? 00:00:02 /opt/ironic-python-agent/bin/python3 /usr/local/bin/ironic-python-agent --config-dir /etc/ironic-python-agent.d/
Looking for logs:
ls /var/log/
btmp ibacm.log opensm.0x9a039bfffead6720.log private
chrony lastlog opensm.0x9a039bfffead6721.log wtmp
No ironic log. Is this thing even on the network?
# ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp1s0f0np0: mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 98:03:9b:ad:67:20 brd ff:ff:ff:ff:ff:ff
3: enp1s0f1np1: mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 98:03:9b:ad:67:21 brd ff:ff:ff:ff:ff:ff
4: enxda90910dd11e: mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether da:90:91:0d:d1:1e brd ff:ff:ff:ff:ff:ff
Nope. OK, let's get it on the network:
# dhclient
[ 486.508054] mlx5_core 0000:01:00.1 enp1s0f1np1: Link down
[ 486.537116] mlx5_core 0000:01:00.1 enp1s0f1np1: Link up
[ 489.371586] mlx5_core 0000:01:00.0 enp1s0f0np0: Link down
[ 489.394050] IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0f1np1: link becomes ready
[ 489.400646] mlx5_core 0000:01:00.0 enp1s0f0np0: Link up
[ 489.406226] IPv6: ADDRCONF(NETDEV_CHANGE): enp1s0f0np0: link becomes ready
root@debian:~# [ 500.596626] sr 0:0:0:0: [sr0] CDROM not ready. Make sure there is a disc in the drive.
ip a
1: lo: mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: enp1s0f0np0: mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 98:03:9b:ad:67:20 brd ff:ff:ff:ff:ff:ff
inet 192.168.97.178/24 brd 192.168.97.255 scope global dynamic enp1s0f0np0
valid_lft 86386sec preferred_lft 86386sec
inet6 fe80::9a03:9bff:fead:6720/64 scope link
valid_lft forever preferred_lft forever
3: enp1s0f1np1: mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 98:03:9b:ad:67:21 brd ff:ff:ff:ff:ff:ff
inet6 fe80::9a03:9bff:fead:6721/64 scope link
valid_lft forever preferred_lft forever
4: enxda90910dd11e: mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
link/ether da:90:91:0d:d1:1e brd ff:ff:ff:ff:ff:ff
inet6 fe80::d890:91ff:fe0d:d11e/64 scope link
valid_lft forever preferred_lft forever
And…quite shortly thereafter in the conductor log:
Agent on node 5411e7e8-8113-42d6-a966-8cacd1554039 returned cleaning command success, moving to next clean step
Node 5411e7e8-8113-42d6-a966-8cacd1554039 moved to provision state "cleaning" from state "clean wait"; target provision state is "available"
Executing cleaning on node 5411e7e8-8113-42d6-a966-8cacd1554039, remaining steps: []
Successfully set node 5411e7e8-8113-42d6-a966-8cacd1554039 power state to power off by power off.
Removing ports from cleaning network for node 5411e7e8-8113-42d6-a966-8cacd1554039
Successfully removed node 5411e7e8-8113-42d6-a966-8cacd1554039 neutron ports.
Node 5411e7e8-8113-42d6-a966-8cacd1554039 cleaning complete
Node 5411e7e8-8113-42d6-a966-8cacd1554039 moved to provision state "available" from state "cleaning"; target provision state is "None"
So, in our case, the issue seems to be that the IPA image does not have DHCP enabled.
This Thursday at 14:00 UTC Francesco and I will be in a panel on OpenInfra Live Episode 24: OpenStack and Ceph.
I’ve been a regular visitor to Stack Overflow and other Stack Exchange sites over the years, and while I’ve mostly enjoyed the experience, I’ve been frustrated by the lack of control I have over what questions I see. I’m not really interested in looking at questions that have already been closed, or that have a negative score, but there’s no native facility for filtering questions like this.
I finally spent the time learning just enough JavaScript to hurt myself and put together a pair of scripts that let me present the questions the way I want:
The sx-hide-questions script will hide questions that have already been closed and questions that have a negative score.
Because I wanted it to be obvious that the script was actually doing something, hidden questions don’t just disappear; they fade out.
These behaviors (including the fading) can all be controlled individually by a set of global variables at the top of the script.
The sx-reorder-questions script will sort questions such that anything that has an answer will be at the bottom, and questions that have not yet been answered appear at the top.
If you are using the Tampermonkey extension, you should be able to click on the links to the scripts earlier in this post and be taken directly to the installation screen. If you’re not running Tampermonkey, then either (a) install it, or (b) you’re on your own.
You can find both of these scripts in my sx-question-filter repository.
These scripts rely on the CSS classes and layout of the Stack Exchange websites. If these change, the scripts will need updating. If you notice that something no longer works as advertised, please feel free to submit a pull request with the necessary corrections!
At $JOB we maintain the configuration for our OpenShift clusters in a public git repository. Changes in the git repository are applied automatically using ArgoCD and Kustomize. This works great, but the public nature of the repository means we need to find a secure solution for managing secrets (such as passwords and other credentials necessary for authenticating to external services). In particular, we need a solution that permits our public repository to be the source of truth for our cluster configuration, without compromising our credentials.
We initially looked at including secrets directly in the repository through the use of the KSOPS plugin for Kustomize, which uses sops to encrypt secrets with GPG keys. There are some advantages to this arrangement:
There were some minor disadvantages:
And there was one major problem:
Once a private key is compromised, anyone with access to that key and the git repository will be able to decrypt data in historical commits, even if we re-encrypt all the data with a new key.
Because of these security implications we decided we would need a different solution (it’s worth noting here that Bitnami Sealed Secrets suffers from effectively the same problem).
We’ve selected a solution that uses the External Secrets project in concert with the AWS SecretsManager service.
The External Secrets project allows one to store secrets in an external secrets store, such as AWS SecretsManager, Hashicorp Vault, and others 1. The manifests that get pushed into your OpenShift cluster contain only pointers (called ExternalSecrets) to those secrets; the external secrets controller running on the cluster uses the information contained in the ExternalSecret in combination with stored credentials to fetch the secret from your chosen backend and realize the actual Secret resource. An external secret manifest referring to a secret named mysecret stored in AWS SecretsManager would look something like:
apiVersion: "kubernetes-client.io/v1"
kind: ExternalSecret
metadata:
name: example-secret
spec:
backendType: secretsManager
data:
- key: mysecret
name: mysecretvalue
This model means that no encrypted data is ever stored in the git repository, which resolves the main problem we had with the solutions mentioned earlier.
External Secrets can be installed into your Kubernetes environment using Helm, or you can use helm template
to generate manifests locally and apply them using Kustomize or some other tool (this is the route we took).
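As a rough sketch of the Helm route (the chart name and repository URL here are assumptions; check the External Secrets project documentation for the current location):
helm repo add external-secrets https://external-secrets.github.io/kubernetes-external-secrets/
helm template kubernetes-external-secrets external-secrets/kubernetes-external-secrets \
    --namespace external-secrets > external-secrets.yaml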
AWS SecretsManager is a service for storing and managing secrets and making them accessible via an API. Using SecretsManager we have very granular control over who can view or modify secrets; this allows us, for example, to create cluster-specific secret readers that can only read secrets intended for a specific cluster (e.g. preventing our development environment from accidentally using production secrets).
SecretsManager provides automatic versioning of secrets to prevent loss of data if you inadvertently change a secret while still requiring the old value.
We can create secrets through the AWS SecretsManager console, or we can use the AWS CLI, which looks something like:
aws secretsmanager create-secret \
--name mysecretname \
--secret-string mysecretvalue
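Retrieving a secret for verification works the same way; for example:
aws secretsmanager get-secret-value \
    --secret-id mysecretname \
    --query SecretString \
    --output text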
This combination solves a number of our problems:
Because we’re not storing actual secrets in the repository, we don’t need to worry about encrypting anything.
Because we’re not managing encrypted data, replacing secrets is much easier.
There’s a robust mechanism for controlling access to secrets.
This solution offers a separation of concern that simply wasn’t possible with the KSOPS model: someone can maintain secrets without having to know anything about Kubernetes manifests, and someone can work on the repository without needing to know any secrets.
In its simplest form, an ExternalSecret resource maps values from specific named secrets in the backend to keys in a Secret resource. For example, if we wanted to create a Secret in OpenShift with the username and password for an external service, we could create two separate secrets in SecretsManager. One for the username:
aws secretsmanager create-secret \
--name cluster/cluster1/example-secret-username \
--secret-string foo
And one for the password:
aws secretsmanager create-secret \
--name cluster/cluster1/example-secret-password \
--secret-string bar \
--tags Key=cluster,Value=cluster1
And then create an ExternalSecret
manifest like this:
apiVersion: "kubernetes-client.io/v1"
kind: ExternalSecret
metadata:
name: example-secret
spec:
backendType: secretsManager
data:
- key: cluster/cluster1/example-secret-username
name: username
- key: cluster/cluster1/example-secret-password
name: password
This instructs the External Secrets controller to create an Opaque
secret named example-secret
from data in AWS SecretsManager. The value of the username
key will come from the secret named cluster/cluster1/example-secret-username
, and similarly for password
. The resulting Secret
resource will look something like this:
apiVersion: v1
kind: Secret
metadata:
name: example-secret
type: Opaque
data:
password: YmFy
username: Zm9v
In the previous example, we created two separate secrets in SecretsManager for storing a username and password. It might be more convenient if we could store both credentials in a single secret. Thanks to the templating support in External Secrets, we can do that!
Let’s redo the previous example, but instead of using two separate secrets, we’ll create a single secret named cluster/cluster1/example-secret
in which the secret value is a JSON document containing both the username and password:
aws secretsmanager create-secret \
--name cluster/cluster1/example-secret \
--secret-string '{"username": "foo", "password": "bar"}'
NB: The jo utility is a neat little tool for generating JSON from the command line; using it we could write the above like this…
aws secretsmanager create-secret \
--name cluster/cluster1/example-secret \
--secret-string $(jo username=foo password=bar)
…which makes it easier to write JSON without missing a quote, closing bracket, etc.
We can extract these values into the appropriate keys by adding a template
section to our ExternalSecret
, and using the JSON.parse
template function, like this:
apiVersion: "kubernetes-client.io/v1"
kind: ExternalSecret
metadata:
name: example-secret
namespace: sandbox
spec:
backendType: secretsManager
data:
- key: cluster/cluster1/example-secret
name: creds
template:
stringData:
username: "<%= JSON.parse(data.creds).username %>"
password: "<%= JSON.parse(data.creds).password %>"
The resulting secret will look like:
apiVersion: v1
kind: Secret
metadata:
name: example-secret
type: Opaque
data:
creds: eyJ1c2VybmFtZSI6ICJmb28iLCAicGFzc3dvcmQiOiAiYmFyIn0=
password: YmFy
username: Zm9v
Notice that in addition to the values created in the template
section, the Secret
also contains any keys defined in the data
section of the ExternalSecret
.
Templating can also be used to override the secret type if you want something other than Opaque
, add metadata, and otherwise influence the generated Secret
.
E.g. Azure Key Vault, Google Secret Manager, Alibaba Cloud KMS Secret Manager, Akeyless ↩︎
Red Hat’s OpenShift Data Foundation (formerly “OpenShift Container Storage”, or “OCS”) allows you to either (a) automatically set up a Ceph cluster as an application running on your OpenShift cluster, or (b) connect your OpenShift cluster to an externally managed Ceph cluster. While setting up Ceph as an OpenShift application is a relatively polished experience, connecting to an external cluster still has some rough edges.
NB I am not a Ceph expert. If you read this and think I’ve made a
mistake with respect to permissions or anything else, please feel free
to leave a comment and I will update the article as necessary. In
particular, I think it may be possible to further restrict the mgr
permissions shown in this article and I’m interested in feedback on
that topic.
Regardless of which option you choose, you start by installing the “OpenShift Container Storage” operator (the name change apparently hasn’t made it to the Operator Hub yet). When you select “external mode”, you will be given the opportunity to download a Python script that you are expected to run on your Ceph cluster. This script will create some Ceph authentication principals and will emit a block of JSON data that gets pasted into the OpenShift UI to configure the external StorageCluster resource.
The script has a single required option, --rbd-data-pool-name
, that
you use to provide the name of an existing pool. If you run the script
with only that option, it will create the following ceph principals
and associated capabilities:
client.csi-rbd-provisioner
caps mgr = "allow rw"
caps mon = "profile rbd"
caps osd = "profile rbd"
client.csi-rbd-node
caps mon = "profile rbd"
caps osd = "profile rbd"
client.healthchecker
caps mgr = "allow command config"
caps mon = "allow r, allow command quorum_status, allow command version"
caps osd = "allow rwx pool=default.rgw.meta, allow r pool=.rgw.root, allow rw pool=default.rgw.control, allow rx pool=default.rgw.log, allow x pool=default.rgw.buckets.index"
This account is used to verify the health of the ceph cluster.
If you also provide the --cephfs-filesystem-name
option, the script
will also create:
client.csi-cephfs-provisioner
caps mgr = "allow rw"
caps mon = "allow r"
caps osd = "allow rw tag cephfs metadata=*"
client.csi-cephfs-node
caps mds = "allow rw"
caps mgr = "allow rw"
caps mon = "allow r"
caps osd = "allow rw tag cephfs *=*"
If you specify --rgw-endpoint
, the script will create a RGW user
named rgw-admin-ops-user
with administrative access to the default
RGW pool.
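For reference, a typical invocation of the downloaded script looks something like the following (the script filename may vary between OCS releases, so treat it as an assumption; the pool, filesystem, and endpoint values are illustrative):
python3 ceph-external-cluster-details-exporter.py \
    --rbd-data-pool-name mypool \
    --cephfs-filesystem-name myfs \
    --rgw-endpoint 192.168.122.10:8080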
The above principals and permissions are fine if you’ve created an external Ceph cluster explicitly for the purpose of supporting a single OpenShift cluster.
In an environment where a single Ceph cluster is providing storage to multiple OpenShift clusters, and especially in an environment where administration of the Ceph and OpenShift environments are managed by different groups, the process, principals, and permissions create a number of problems.
The first and foremost is that the script provided by OCS both (a) gathers information about the Ceph environment, and (b) makes changes to that environment. If you are installing OCS on OpenShift and want to connect to a Ceph cluster over which you do not have administrative control, you may find yourself stymied when the storage administrators refuse to run your random Python script on the Ceph cluster.
Ideally, the script would be read-only, and instead of making changes to the Ceph cluster it would only validate the cluster configuration, and inform the administrator of what changes were necessary. There should be complete documentation that describes the necessary configuration scripts so that a Ceph cluster can be configured correctly without running any script, and OCS should provide something more granular than “drop a blob of JSON here” for providing the necessary configuration to OpenShift.
The second major problem is that while the script creates several
principals, it only allows you to set the name of one of them. The
script has a --run-as-user
option, which at first sounds promising,
but ultimately is of questionable use: it only allows you to set the Ceph principal used for cluster health checks.
There is no provision in the script to create separate principals for each OpenShift cluster.
Lastly, the permissions granted to the principals are too broad. For
example, the csi-rbd-node
principal has access to all RBD pools on
the cluster.
If you would like to deploy OCS in an environment where the default behavior of the configuration script is inappropriate you can work around this problem by:
Manually generating the necessary principals (with more appropriate permissions), and
Manually generating the JSON data for input into OCS
I’ve adopted the following conventions for naming storage pools and filesystems:
All resources are prefixed with the name of the cluster (represented
here by ${clustername}
).
The RBD pool is named ${clustername}-rbd
. I create it like this:
ceph osd pool create ${clustername}-rbd
ceph osd pool application enable ${clustername}-rbd rbd
The CephFS filesystem (if required) is named
${clustername}-fs
, and I create it like this:
ceph fs volume create ${clustername}-fs
In addition to the filesystem, this creates two pools:
cephfs.${clustername}-fs.meta
cephfs.${clustername}-fs.data
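You can confirm that the filesystem and its pools exist with:
ceph fs ls
ceph osd pool ls | grep ${clustername}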
Assuming that you have followed the same conventions and have an RBD
pool named ${clustername}-rbd
and a CephFS filesystem named
${clustername}-fs
, the following set of ceph auth add
commands
should create an appropriate set of principals (with access limited to
just those resources that belong to the named cluster):
ceph auth add client.healthchecker-${clustername} \
mgr "allow command config" \
mon "allow r, allow command quorum_status, allow command version"
ceph auth add client.csi-rbd-provisioner-${clustername} \
mgr "allow rw" \
mon "profile rbd" \
osd "profile rbd pool=${clustername}-rbd"
ceph auth add client.csi-rbd-node-${clustername} \
mon "profile rbd" \
osd "profile rbd pool=${clustername}-rbd"
ceph auth add client.csi-cephfs-provisioner-${clustername} \
mgr "allow rw" \
mds "allow rw fsname=${clustername}-fs" \
mon "allow r fsname=${clustername}-fs" \
osd "allow rw tag cephfs metadata=${clustername}-fs"
ceph auth add client.csi-cephfs-node-${clustername} \
mgr "allow rw" \
mds "allow rw fsname=${clustername}-fs" \
mon "allow r fsname=${clustername}-fs" \
osd "allow rw tag cephfs data=${clustername}-fs"
Note that I’ve excluded the RGW permissions here; in our OpenShift environments, we typically rely on the object storage interface provided by Noobaa so I haven’t spent time investigating permissions on the RGW side.
The final step is to create the JSON blob that you paste into the OCS
installation UI. I use the following script which calls ceph -s
,
ceph mon dump
, and ceph auth get-key
to get the necessary
information from the cluster:
#!/usr/bin/python3
import argparse
import json
import subprocess
from urllib.parse import urlparse
usernames = [
'healthchecker',
'csi-rbd-node',
'csi-rbd-provisioner',
'csi-cephfs-node',
'csi-cephfs-provisioner',
]
def parse_args():
p = argparse.ArgumentParser()
p.add_argument('--use-cephfs', action='store_true', dest='use_cephfs')
p.add_argument('--no-use-cephfs', action='store_false', dest='use_cephfs')
p.add_argument('instance_name')
p.set_defaults(use_rbd=True, use_cephfs=True)
return p.parse_args()
def main():
args = parse_args()
cluster_status = json.loads(subprocess.check_output(['ceph', '-s', '-f', 'json']))
mon_status = json.loads(subprocess.check_output(['ceph', 'mon', 'dump', '-f', 'json']))
users = {}
for username in usernames:
key = subprocess.check_output(['ceph', 'auth', 'get-key', 'client.{}-{}'.format(username, args.instance_name)])
users[username] = {
'name': 'client.{}-{}'.format(username, args.instance_name),
'key': key.decode(),
}
mon_name = mon_status['mons'][0]['name']
mon_ip = [
addr for addr in
mon_status['mons'][0]['public_addrs']['addrvec']
if addr['type'] == 'v1'
][0]['addr']
prom_url = urlparse(cluster_status['mgrmap']['services']['prometheus'])
prom_ip, prom_port = prom_url.netloc.split(':')
output = [
{
"name": "rook-ceph-mon-endpoints",
"kind": "ConfigMap",
"data": {
"data": "{}={}".format(mon_name, mon_ip),
"maxMonId": "0",
"mapping": "{}"
}
},
{
"name": "rook-ceph-mon",
"kind": "Secret",
"data": {
"admin-secret": "admin-secret",
"fsid": cluster_status['fsid'],
"mon-secret": "mon-secret"
}
},
{
"name": "rook-ceph-operator-creds",
"kind": "Secret",
"data": {
"userID": users['healthchecker']['name'],
"userKey": users['healthchecker']['key'],
}
},
{
"name": "ceph-rbd",
"kind": "StorageClass",
"data": {
"pool": "{}-rbd".format(args.instance_name),
}
},
{
"name": "monitoring-endpoint",
"kind": "CephCluster",
"data": {
"MonitoringEndpoint": prom_ip,
"MonitoringPort": prom_port,
}
},
{
"name": "rook-csi-rbd-node",
"kind": "Secret",
"data": {
"userID": users['csi-rbd-node']['name'].replace('client.', ''),
"userKey": users['csi-rbd-node']['key'],
}
},
{
"name": "rook-csi-rbd-provisioner",
"kind": "Secret",
"data": {
"userID": users['csi-rbd-provisioner']['name'].replace('client.', ''),
"userKey": users['csi-rbd-provisioner']['key'],
}
}
]
if args.use_cephfs:
output.extend([
{
"name": "rook-csi-cephfs-provisioner",
"kind": "Secret",
"data": {
"adminID": users['csi-cephfs-provisioner']['name'].replace('client.', ''),
"adminKey": users['csi-cephfs-provisioner']['key'],
}
},
{
"name": "rook-csi-cephfs-node",
"kind": "Secret",
"data": {
"adminID": users['csi-cephfs-node']['name'].replace('client.', ''),
"adminKey": users['csi-cephfs-node']['key'],
}
},
{
"name": "cephfs",
"kind": "StorageClass",
"data": {
"fsName": "{}-fs".format(args.instance_name),
"pool": "cephfs.{}-fs.data".format(args.instance_name),
}
}
])
print(json.dumps(output, indent=2))
if __name__ == '__main__':
main()
If you’d prefer a strictly manual process, you can fill in the necessary values yourself. The JSON produced by the above script looks like the following, which is invalid JSON because I’ve used inline comments to mark all the values you would need to provide:
[
{
"name": "rook-ceph-mon-endpoints",
"kind": "ConfigMap",
"data": {
# The format is <mon_name>=<mon_endpoint>, and you only need to
# provide a single mon address.
"data": "ceph0=192.168.122.140:6789",
"maxMonId": "0",
"mapping": "{}"
}
},
{
"name": "rook-ceph-mon",
"kind": "Secret",
"data": {
# Fill in the fsid of your Ceph cluster.
"fsid": "c9c32c73-dac4-4cc9-8baa-d73b96c135f4",
# Do **not** fill in these values, they are unnecessary. OCS
# does not require admin access to your Ceph cluster.
"admin-secret": "admin-secret",
"mon-secret": "mon-secret"
}
},
{
"name": "rook-ceph-operator-creds",
"kind": "Secret",
"data": {
# Fill in the name and key for your healthchecker principal.
# Note that here, unlike elsewhere in this JSON, you must
# provide the "client." prefix to the principal name.
"userID": "client.healthchecker-mycluster",
"userKey": "<key>"
}
},
{
"name": "ceph-rbd",
"kind": "StorageClass",
"data": {
# Fill in the name of your RBD pool.
"pool": "mycluster-rbd"
}
},
{
"name": "monitoring-endpoint",
"kind": "CephCluster",
"data": {
# Fill in the address and port of the Ceph cluster prometheus
# endpoint.
"MonitoringEndpoint": "192.168.122.140",
"MonitoringPort": "9283"
}
},
{
"name": "rook-csi-rbd-node",
"kind": "Secret",
"data": {
# Fill in the name and key of the csi-rbd-node principal.
"userID": "csi-rbd-node-mycluster",
"userKey": "<key>"
}
},
{
"name": "rook-csi-rbd-provisioner",
"kind": "Secret",
"data": {
# Fill in the name and key of your csi-rbd-provisioner
# principal.
"userID": "csi-rbd-provisioner-mycluster",
"userKey": "<key>"
}
},
{
"name": "rook-csi-cephfs-provisioner",
"kind": "Secret",
"data": {
# Fill in the name and key of your csi-cephfs-provisioner
# principal.
"adminID": "csi-cephfs-provisioner-mycluster",
"adminKey": "<key>"
}
},
{
"name": "rook-csi-cephfs-node",
"kind": "Secret",
"data": {
# Fill in the name and key of your csi-cephfs-node principal.
"adminID": "csi-cephfs-node-mycluster",
"adminKey": "<key>"
}
},
{
"name": "cephfs",
"kind": "StorageClass",
"data": {
# Fill in the name of your CephFS filesystem and the name of the
# associated data pool.
"fsName": "mycluster-fs",
"pool": "cephfs.mycluster-fs.data"
}
}
]
I’ve opened several bug reports to see about addressing some of these issues:
OS Migrate is a toolbox for content migration (workloads and more) between OpenStack clouds. Let’s dive into why you’d use it, some of its most notable features, and a bit of how it works.
Why move cloud content between OpenStacks? Imagine these situations:
Old cloud hardware is obsolete, you’re buying new. A new green field deployment will be easier than gradual replacement of hardware in the original cloud.
You want to make fundamental changes to your OpenStack deployment that would be difficult or risky to perform on a cloud which is already providing service to users.
You want to upgrade to a new release of OpenStack, but you want to cut down on associated cloud-wide risk, or you can’t schedule cloud-wide control plane downtime.
You want to upgrade to a new release of OpenStack, but the cloud users should be given a choice when to stop using the old release and start using the new.
A combination of the above.
In such situations, running (at least) two clouds in parallel for a period of time is often the preferable path. And when you run parallel clouds, perhaps with the intention of decommissioning some of them eventually, a tool may come in handy to copy/migrate the content that users have created (virtual networks, routers, security groups, machines, block storage, images etc.) from one cloud to another. This is what OS Migrate is for.
Now we know OS Migrate copies/moves content from one OpenStack to another. But there is more to say. Some of the design decisions that went into OS Migrate should make it a tool of choice:
Uses standard OpenStack APIs. You don’t need to install any plugins into your clouds before using OS Migrate, and OS Migrate does not need access to the backends of your cloud (databases etc.).
Runnable with tenant privileges. For moving tenant-owned content, OS Migrate only needs tenant credentials (not administrative credentials). This naturally reduces risks associated with the migration.
If desired, cloud tenants can even use OS Migrate on their own. Cloud admins do not necessarily need to get involved.
Admin credentials are only needed when the content being migrated requires admin privileges to be created (e.g. public Glance images).
Transparent. The metadata of exported content is in human-readable YAML files. You can inspect what has been exported from the source cloud, and tweak it if necessary, before executing the import into the destination cloud.
Stateless. There is no database in OS Migrate that could get out of sync with reality. The source of migration information are the human readable YAML files. ID-to-ID mappings are not kept, entry-point resources are referred to by names.
Idempotent. In case of an issue, fix the root cause and re-run, be it export or import. OS Migrate has mechanisms against duplicate exports and duplicate imports.
Cherry-pickable. There’s no need to migrate all content with OS Migrate. Only migrate some tenants, or further scope to some of their resource types, or further limit the resource type exports/imports by a list of resource names or regular expression patterns. Use as much or as little of OS Migrate as you need.
Implemented as an Ansible collection. When learning to work with OS Migrate, most importantly you’ll be learning to work with Ansible, an automation tool used across the IT industry. If you already know Ansible, you’ll feel right at home with OS Migrate.
If you want to use OS Migrate, the best thing I can do here is point towards the OS Migrate User Documentation. If you just want to get a glimpse for now, read on.
As OS Migrate is an Ansible collection, the main mode of use is setting Ansible variables and running playbooks shipped with the collection.
Should the default playbooks not fit a particular use case, a technically savvy user could also utilize the collection’s roles and modules as building blocks to craft their own playbooks. However, as I wrote above in the point about cherry-picking, we’ve tried to make the default playbooks quite generically usable.
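As a minimal sketch of that workflow (the inventory, variable file, and playbook names here are illustrative assumptions; see the user documentation for the real entry points), installing the collection and running one of its export playbooks might look like:
ansible-galaxy collection install os_migrate.os_migrate

ansible-playbook \
    -i inventory.yml \
    -e @migration-params.yml \
    os_migrate.os_migrate.export_networks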
In OS Migrate we differentiate between two main migration types with respect to what resources we are migrating: pre-workload migration, and workload migration.
Pre-workload migration focuses on content/resources that can be copied to the destination cloud without affecting workloads in the source cloud. It can be typically done with little timing pressure, ahead of time before migrating workloads. This includes resources like tenant networks, subnets, routers, images, security groups etc.
The content is serialized as editable YAML files to the Migrator host (the machine running the Ansible playbooks), and then resources are created in the destination according to the YAML serializations.
Workload migration focuses on copying VMs and their attached Cinder volumes, and on creating floating IPs for VMs in the destination cloud. The VM migration between clouds is a “cold” migration. VMs first need to be stopped and then they are copied.
With regards to the boot disk of the VM, we support two options: either the destination VM’s boot disk is created from a Glance image, or the source VM’s boot disk snapshot is copied into the destination cloud as a Cinder volume and the destination VM is created as boot-from-volume. There is a migration parameter controlling this behavior on a per-VM basis. Additional Cinder volumes attached to the source VM are copied.
The data path for VMs and volumes is slightly different than in the pre-workload migration. Only metadata gets exported onto the Migrator host. For moving the binary data, special VMs called conversion hosts are deployed, one in the source and one in the destination. This is done for performance reasons, to allow the VMs’ and volumes’ binary data to travel directly from cloud to cloud without going through the (perhaps external) Migrator host as an intermediary.
Now that we have an overview of OS Migrate, let’s finish with some links where more info can be found:
OS Migrate Documentation is the primary source of information on OS Migrate.
OS Migrate Matrix Channel is monitored by devs for any questions you might have.
Issues on Github is the right place to report any bugs, and you can ask questions there too.
If you want to contribute (code, docs, …), see OS Migrate Developer Documentation.
Have a good day!
In this post, we’ll walk through the process of getting virtual machines on two different hosts to communicate over an overlay network created using the support for VXLAN in Open vSwitch (or OVS).
For this post, I’ll be working with two systems:
node0.ovs.virt at address 192.168.122.107
node1.ovs.virt at address 192.168.122.174
These hosts are running CentOS 8, although once we get past the package installs the instructions will be similar for other distributions.
While reading through this post, remember that unless otherwise specified we’re going to be running the indicated commands on both hosts.
Before we can get started configuring things we’ll need to install OVS
and libvirt. While libvirt
is included with the base CentOS
distribution, for OVS we’ll need to add both the EPEL repository
as well as a recent CentOS OpenStack repository (OVS is included
in the CentOS OpenStack repositories because it is required by
OpenStack’s networking service):
yum -y install epel-release centos-release-openstack-victoria
With these additional repositories enabled we can now install the required packages:
yum -y install \
libguestfs-tools-c \
libvirt \
libvirt-daemon-kvm \
openvswitch2.15 \
tcpdump \
virt-install
We need to start both the libvirtd
and openvswitch
services:
systemctl enable --now openvswitch libvirtd
This command will (a) mark the services to start automatically when the system boots and (b) immediately start the service.
When libvirt
is first installed it doesn’t have any configured
storage pools. Let’s create one in the default location,
/var/lib/libvirt/images
:
virsh pool-define-as default --type dir --target /var/lib/libvirt/images
We need to mark the pool active, and we might as well configure it to activate automatically next time the system boots:
virsh pool-start default
virsh pool-autostart default
With all the prerequisites out of the way we can finally start working
with Open vSwitch. Our first task is to create the OVS bridge that
will host our VXLAN tunnels. To create a bridge named br0
, we run:
ovs-vsctl add-br br0
We can inspect the OVS configuration by running ovs-vsctl show
,
which should output something like:
cc1e7217-e393-4e21-97c1-92324d47946d
Bridge br0
Port br0
Interface br0
type: internal
ovs_version: "2.15.1"
Let’s not forget to mark the interface “up”:
ip link set br0 up
Up until this point we’ve been running identical commands on both
node0
and node1
. In order to create our VXLAN tunnels, we need to
provide a remote endpoint for the VXLAN connection, which is going to
be “the other host”. On node0
, we run:
ovs-vsctl add-port br0 vx_node1 -- set interface vx_node1 \
type=vxlan options:remote_ip=192.168.122.174
This creates a VXLAN interface named vx_node1
(named that way
because the remote endpoint is node1
). The OVS configuration now
looks like:
cc1e7217-e393-4e21-97c1-92324d47946d
Bridge br0
Port vx_node1
Interface vx_node1
type: vxlan
options: {remote_ip="192.168.122.174"}
Port br0
Interface br0
type: internal
ovs_version: "2.15.1"
On node1
we will run:
ovs-vsctl add-port br0 vx_node0 -- set interface vx_node0 \
type=vxlan options:remote_ip=192.168.122.107
Which results in:
58451994-e0d1-4bf1-8f91-7253ddf4c016
Bridge br0
Port br0
Interface br0
type: internal
Port vx_node0
Interface vx_node0
type: vxlan
options: {remote_ip="192.168.122.107"}
ovs_version: "2.15.1"
At this point, we have a functional overlay network: anything attached
to br0
on either system will appear to share the same layer 2
network. Let’s take advantage of this to connect a pair of virtual
machines.
We’ll need a base image for our virtual machines. I’m going to use the CentOS 8 Stream image, which we can download to our storage directory like this:
curl -L -o /var/lib/libvirt/images/centos-8-stream.qcow2 \
https://cloud.centos.org/centos/8-stream/x86_64/images/CentOS-Stream-GenericCloud-8-20210210.0.x86_64.qcow2
We need to make sure libvirt
is aware of the new image:
virsh pool-refresh default
Lastly, we’ll want to set a root password on the image so that we can log in to our virtual machines:
virt-customize -a /var/lib/libvirt/images/centos-8-stream.qcow2 \
--root-password password:secret
We’re going to create a pair of virtual machines (one on each host). We’ll be creating each vm with two network interfaces: one attached to the libvirt default network, which will allow us to ssh into the vm in order to configure things, and one attached to the OVS bridge br0.
To create a virtual machine on node0 named vm0.0, run the following command:
virt-install \
-r 3000 \
--network network=default \
--network bridge=br0,virtualport.type=openvswitch \
--os-variant centos8 \
--disk pool=default,size=10,backing_store=centos-8-stream.qcow2,backing_format=qcow2 \
--import \
--noautoconsole \
-n vm0.0
The most interesting option in the above command line is probably the one used to create the virtual disk:
--disk pool=default,size=10,backing_store=centos-8-stream.qcow2,backing_format=qcow2 \
This creates a 10GB “copy-on-write” disk that uses
centos-8-stream.qcow2
as a backing store. That means that reads will
generally come from the centos-8-stream.qcow2
image, but writes will
be stored in the new image. This makes it easy for us to quickly
create multiple virtual machines from the same base image.
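For illustration, creating such a copy-on-write disk by hand (outside of virt-install) would look roughly like this with qemu-img:
qemu-img create -f qcow2 \
    -b /var/lib/libvirt/images/centos-8-stream.qcow2 -F qcow2 \
    /var/lib/libvirt/images/vm0.0.qcow2 10G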
On node1
we would run a similar command, although here we’re naming
the virtual machine vm1.0
:
virt-install \
-r 3000 \
--network network=default \
--network bridge=br0,virtualport.type=openvswitch \
--os-variant centos8 \
--disk pool=default,size=10,backing_store=centos-8-stream.qcow2,backing_format=qcow2 \
--import \
--noautoconsole \
-n vm1.0
On node0
, get the address of the new virtual machine on the default
network using the virsh domifaddr
command:
[root@node0 ~]# virsh domifaddr vm0.0
Name MAC address Protocol Address
-------------------------------------------------------------------------------
vnet2 52:54:00:21:6e:4f ipv4 192.168.124.83/24
Connect to the vm using ssh
:
[root@node0 ~]# ssh 192.168.124.83
root@192.168.124.83's password:
Activate the web console with: systemctl enable --now cockpit.socket
Last login: Sat Apr 17 14:08:17 2021 from 192.168.124.1
[root@localhost ~]#
(Recall that the root
password is secret
.)
Configure interface eth1
with an address. For this post, we’ll use
the 10.0.0.0/24
range for our overlay network. To assign this vm the
address 10.0.0.10
, we can run:
ip addr add 10.0.0.10/24 dev eth1
ip link set eth1 up
We need to repeat the process for vm1.0
on node1
:
[root@node1 ~]# virsh domifaddr vm1.0
Name MAC address Protocol Address
-------------------------------------------------------------------------------
vnet0 52:54:00:e9:6e:43 ipv4 192.168.124.69/24
Connect to the vm using ssh
:
[root@node1 ~]# ssh 192.168.124.69
root@192.168.124.69's password:
Activate the web console with: systemctl enable --now cockpit.socket
Last login: Sat Apr 17 14:08:17 2021 from 192.168.124.1
[root@localhost ~]#
We’ll use address 10.0.0.11 for this system:
ip addr add 10.0.0.11/24 dev eth1
ip link set eth1 up
At this point, our setup is complete. On vm0.0
, we can connect to
vm1.1
over the overlay network. For example, we can ping the remote
host:
[root@localhost ~]# ping -c2 10.0.0.11
PING 10.0.0.11 (10.0.0.11) 56(84) bytes of data.
64 bytes from 10.0.0.11: icmp_seq=1 ttl=64 time=1.79 ms
64 bytes from 10.0.0.11: icmp_seq=2 ttl=64 time=0.719 ms
--- 10.0.0.11 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1002ms
rtt min/avg/max/mdev = 0.719/1.252/1.785/0.533 ms
Or connect to it using ssh
:
[root@localhost ~]# ssh 10.0.0.11 uptime
root@10.0.0.11's password:
14:21:33 up 1:18, 1 user, load average: 0.00, 0.00, 0.00
Using tcpdump
, we can verify that these connections are going over
the overlay network. Let’s watch for VXLAN traffic on node1
by
running the following command (VXLAN is a UDP protocol running on port
4789)
tcpdump -i eth0 -n port 4789
When we run ping -c2 10.0.0.11
on vm0.0
, we see the following:
14:23:50.312574 IP 192.168.122.107.52595 > 192.168.122.174.vxlan: VXLAN, flags [I] (0x08), vni 0
IP 10.0.0.10 > 10.0.0.11: ICMP echo request, id 4915, seq 1, length 64
14:23:50.314896 IP 192.168.122.174.59510 > 192.168.122.107.vxlan: VXLAN, flags [I] (0x08), vni 0
IP 10.0.0.11 > 10.0.0.10: ICMP echo reply, id 4915, seq 1, length 64
14:23:51.314080 IP 192.168.122.107.52595 > 192.168.122.174.vxlan: VXLAN, flags [I] (0x08), vni 0
IP 10.0.0.10 > 10.0.0.11: ICMP echo request, id 4915, seq 2, length 64
14:23:51.314259 IP 192.168.122.174.59510 > 192.168.122.107.vxlan: VXLAN, flags [I] (0x08), vni 0
IP 10.0.0.11 > 10.0.0.10: ICMP echo reply, id 4915, seq 2, length 64
In the output above, we see that each packet in the transaction
results in two lines of output from tcpdump
:
14:23:50.312574 IP 192.168.122.107.52595 > 192.168.122.174.vxlan: VXLAN, flags [I] (0x08), vni 0
IP 10.0.0.10 > 10.0.0.11: ICMP echo request, id 4915, seq 1, length 64
The first line shows the contents of the VXLAN packet, while the second lines shows the data that was encapsulated in the VXLAN packet.
We’ve achieved our goal: we have two virtual machines on two different hosts communicating over a VXLAN overlay network. If you were to do this “for real”, you would probably want to make a number of changes: for example, the network configuration we’ve applied in many cases will not persist across a reboot; handling persistent network configuration is still very distribution dependent, so I’ve left it out of this post.
collectd itself is intended as a lightweight collecting agent for metrics and events. In larger infrastructures, the data is sent over the network to a central point, where it is stored and processed further.
This introduces a potential issue: what happens if the remote endpoint that data should be written to is not available? The traditional network plugin uses UDP, which is by definition unreliable.
Collectd has a queue of values to be written to an output plugin, such as write_http or amqp1. At the time when metrics should be written, collectd iterates over that queue and tries to write the data to the endpoint. If writing was successful, the data is removed from the queue. The little word if also hints that there is a chance the data doesn't get removed. The question is: what happens then, or what should be done?
There is no easy answer to this. Some people tend to ignore missed metrics, some don't. The way to address this is to cap the queue at a given length and to remove the oldest data when new data comes in. The parameters are WriteQueueLimitHigh and WriteQueueLimitLow. If they are unset, the queue is not limited and will grow until memory is exhausted. For predictability reasons, you should set these two values to the same number. Getting the right value for these parameters requires a bit of experimentation; if values are dropped, you will see that in the log file.
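On a plain collectd installation, that corresponds to setting the two global options in collectd.conf, for example:
WriteQueueLimitHigh 100
WriteQueueLimitLow  100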
When collectd is configured as part of Red Hat OpenStack Platform, the following config snippet can be used:
parameter_defaults:
ExtraConfig:
collectd::write_queue_limit_high: 100
collectd::write_queue_limit_low: 100
Another parameter can be used to explicitly limit the queue length in case the amqp1 plugin is used for sending out data: the SendQueueLimit parameter, which serves the same purpose but can differ from the global WriteQueueLimitHigh and WriteQueueLimitLow.
parameter_defaults:
ExtraConfig:
collectd::plugin::amqp1::send_queue_limit: 50
In almost all cases, the issue of collectd using too much memory can be tracked down to a write endpoint not being available, data being dropped occasionally, etc.
Kustomize is a tool for assembling Kubernetes manifests from a collection of files. We’re making extensive use of Kustomize in the operate-first project. In order to keep secrets stored in our configuration repositories, we’re using the KSOPS plugin, which enables Kustomize to use sops to encrypt/decrypt files using GPG.
In this post, I’d like to walk through the steps necessary to get everything up and running.
We encrypt files using GPG, so the first step is making sure that you have a GPG keypair and that your public key is published where other people can find it.
GPG will be pre-installed on most Linux distributions. You can check
if it’s installed by running e.g. gpg --version
. If it’s not
installed, you will need to figure out how to install it for your
operating system.
Run the following command to create a new GPG keypair:
gpg --full-generate-key
This will step you through a series of prompts. First, select a key
type. You can just press <RETURN>
for the default:
gpg (GnuPG) 2.2.25; Copyright (C) 2020 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Please select what kind of key you want:
(1) RSA and RSA (default)
(2) DSA and Elgamal
(3) DSA (sign only)
(4) RSA (sign only)
(14) Existing key from card
Your selection?
Next, select a key size. The default is fine:
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (3072)
Requested keysize is 3072 bits
You will next need to select an expiration date for your key. The default is “key does not expire”, which is a fine choice for our purposes. If you’re interested in understanding this value in more detail, the following articles are worth reading:
Setting an expiration date will require that you periodically update the expiration date (or generate a new key).
Please specify how long the key should be valid.
0 = key does not expire
<n> = key expires in n days
<n>w = key expires in n weeks
<n>m = key expires in n months
<n>y = key expires in n years
Key is valid for? (0)
Key does not expire at all
Is this correct? (y/N) y
Now you will need to enter your identity, which consists of your name,
your email address, and a comment (which is generally left blank).
Note that you’ll need to enter o
for okay
to continue from this
prompt.
GnuPG needs to construct a user ID to identify your key.
Real name: Your Name
Email address: you@example.com
Comment:
You selected this USER-ID:
"Your Name <you@example.com>"
Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? o
Lastly, you need to enter a password. In most environments, GPG will open a new window asking you for a passphrase. After you’ve entered and confirmed the passphrase, you should see your key information on the console:
gpg: key 02E34E3304C8ADEB marked as ultimately trusted
gpg: revocation certificate stored as '/home/lars/tmp/gpgtmp/openpgp-revocs.d/9A4EB5B1F34B3041572937C002E34E3304C8ADEB.rev'
public and secret key created and signed.
pub rsa3072 2021-03-11 [SC]
9A4EB5B1F34B3041572937C002E34E3304C8ADEB
uid Your Name <you@example.com>
sub rsa3072 2021-03-11 [E]
You need to publish your GPG key so that others can find it. You’ll
need your key id, which you can get by running gpg -k --fingerprint
like this (using your email address rather than mine):
$ gpg -k --fingerprint lars@oddbit.com
The output will look like the following:
pub rsa2048/0x362D63A80853D4CF 2013-06-21 [SC]
Key fingerprint = 3E70 A502 BB52 55B6 BB8E 86BE 362D 63A8 0853 D4CF
uid [ultimate] Lars Kellogg-Stedman <lars@oddbit.com>
uid [ultimate] keybase.io/larsks <larsks@keybase.io>
sub rsa2048/0x042DF6CF74E4B84C 2013-06-21 [S] [expires: 2023-07-01]
sub rsa2048/0x426D9382DFD6A7A9 2013-06-21 [E]
sub rsa2048/0xEE1A8B9F9369CC85 2013-06-21 [A]
Look for the Key fingerprint line; you want the value after the =. Use this to publish your key to keys.openpgp.org:
gpg --keyserver keys.openpgp.org \
--send-keys '3E70 A502 BB52 55B6 BB8E 86BE 362D 63A8 0853 D4CF'
You will shortly receive an email to the address in your key asking you to approve it. Once you have approved the key, it will be published on https://keys.openpgp.org and people will be able to look it up by address or key id. For example, you can find my public key at https://keys.openpgp.org/vks/v1/by-fingerprint/3E70A502BB5255B6BB8E86BE362D63A80853D4CF.
In this section, we’ll get all the necessary tools installed on your system in order to interact with a repository using Kustomize and KSOPS.
Pre-compiled binaries of Kustomize are published on GitHub. To install the command, navigate to the current release (v4.0.5 as of this writing) and download the appropriate tarball for your system. E.g, for an x86-64 Linux environment, you would grab kustomize_v4.0.5_linux_amd64.tar.gz.
The tarball contains a single file. You need to extract this file and place it somewhere in your $PATH. For example, if you use your $HOME/bin directory, you could run:
tar -C ~/bin -xf kustomize_v4.0.5_linux_amd64.tar.gz
Or to install into /usr/local/bin
:
sudo tar -C /usr/local/bin -xf kustomize_v4.0.5_linux_amd64.tar.gz
Run kustomize
with no arguments to verify the command has been
installed correctly.
The KSOPS plugin relies on the sops command, so we need to install that first. Binary releases are published on GitHub, and the current release is v3.6.1.
Instead of a tarball, the project publishes the raw binary as well as
packages for a couple of different Linux distributions. For
consistency with the rest of this post we’re going to grab the raw
binary. We can install that into $HOME/bin
like this:
curl -o ~/bin/sops https://github.com/mozilla/sops/releases/download/v3.6.1/sops-v3.6.1.linux
chmod 755 ~/bin/sops
KSOPS is a Kustomize plugin. The kustomize
command looks for plugins
in subdirectories of $HOME/.config/kustomize/plugin
. Directories are
named after an API and plugin name. In the case of KSOPS, kustomize
will be looking for a plugin named ksops
in the
$HOME/.config/kustomize/plugin/viaduct.ai/v1/ksops/
directory.
The current release of KSOPS is v2.4.0, which is published as a tarball. We’ll start by downloading ksops_2.4.0_Linux_x86_64.tar.gz, which contains the following files:
LICENSE
README.md
ksops
To extract the ksops command into the plugin directory, you can run:
mkdir -p ~/.config/kustomize/plugin/viaduct.ai/v1/ksops/
tar -C ~/.config/kustomize/plugin/viaduct.ai/v1/ksops -xf ksops_2.4.0_Linux_x86_64.tar.gz ksops
Let’s create a simple Kustomize project to make sure everything is installed and functioning.
Start by creating a new directory and changing into it:
mkdir kustomize-test
cd kustomize-test
Create a kustomization.yaml
file that looks like this:
generators:
- secret-generator.yaml
Put the following content in secret-generator.yaml
:
---
apiVersion: viaduct.ai/v1
kind: ksops
metadata:
name: secret-generator
files:
- example-secret.enc.yaml
This instructs Kustomize to use the KSOPS plugin to generate content
from the file example-secret.enc.yaml
.
Configure sops
to use your GPG key by default by creating a
.sops.yaml
(note the leading dot) similar to the following (you’ll
need to put your GPG key fingerprint in the right place):
creation_rules:
- encrypted_regex: "^(users|data|stringData)$"
pgp: <YOUR KEY FINGERPRINT HERE>
The encrypted_regex
line tells sops
which attributes in your YAML
files should be encrypted. The pgp
line is a (comma delimited) list
of keys to which data will be encrypted.
Now, edit the file example-secret.enc.yaml
using the sops
command.
Run:
sops example-secret.enc.yaml
This will open up an editor with some default content. Replace the content with the following:
apiVersion: v1
kind: Secret
metadata:
name: example-secret
type: Opaque
stringData:
message: this is a test
Save the file and exit your editor. Now examine the file; you will see that it contains a mix of encrypted and unencrypted content. When encrypted with my private key, it looks like this:
$ cat example-secret.enc.yaml
{
"data": "ENC[AES256_GCM,data:wZvEylsvhfU29nfFW1PbGqyk82x8+Vm/3p2Y89B8a1A26wa5iUTr1hEjDYrQIGQq4rvDyK4Bevxb/PrTzdOoTrYIhaerEWk13g9UrteLoaW0FpfGv9bqk0c12OwTrzS+5qCW2mIlfzQpMH5+7xxeruUXO7w=,iv:H4i1/Znp6WXrMmmP9YVkz+xKOX0XBH7kPFaa36DtTxs=,tag:bZhSzkM74wqayo7McV/VNQ==,type:str]",
"sops": {
"kms": null,
"gcp_kms": null,
"azure_kv": null,
"hc_vault": null,
"lastmodified": "2021-03-12T03:11:46Z",
"mac": "ENC[AES256_GCM,data:2NrsF6iLA3zHeupD314Clg/WyBA8mwCn5SHHI5P9tsOt6472Tevdamv6ARD+xqfrSVWz+Wy4PtWPoeqZrFJwnL/qCR4sdjt/CRzLmcBistUeAnlqoWIwbtMxBqaFg9GxTd7f5q0iHr9QNWGSVV3JMeZZ1jeWyeQohAPpPufsuPQ=,iv:FJvZz8SV+xsy4MC1W9z1Vn0s4Dzw9Gya4v+rSpwZLrw=,tag:pfW8r5856c7qetCNgXMyeA==,type:str]",
"pgp": [
{
"created_at": "2021-03-12T03:11:45Z",
"enc": "-----BEGIN PGP MESSAGE-----\n\nwcBMA0Jtk4Lf1qepAQgAGKwk6zDMPUYbUscky07v/7r3fsws3pTVRMgpEdhTra6x\nDxiMaLnjTKJi9fsB7sQuh/PTGWhXGuHtHg0YBtxRkuZY0Kl6xKXTXGBIBhI/Ahgw\n4BSz/rE7gbz1h6X4EFml3e1NeUTvGntA3HjY0o42YN9uwsi9wvMbiR4OLQfwY1gG\np9/v57KJx5ipEKSgt+81KwzOhuW79ttXd2Tvi9rjuAfvmLBU9q/YKMT8miuNhjet\nktNwXNJNpglHJta431YUhPZ6q41LpgvQPMX4bIZm7i7NuR470njYLQPe7xiGqqeT\nBcuF7KkNXGcDu9/RnIyxK4W5Bo9NEa06TqUGTHLEENLgAeSzHdQdUwx/pLLD6OPa\nv/U34YJU4JngqOGqTuDu4orgwLDg++XysBwVsmFp1t/nHvTkwj57wAuxJ4/It/9l\narvRHlCx6uA05IXukmCTvYMPRV3kY/81B+biHcka7uFUOQA=\n=x+7S\n-----END PGP MESSAGE-----",
"fp": "3E70A502BB5255B6BB8E86BE362D63A80853D4CF"
}
],
"encrypted_regex": "^(users|data|stringData)$",
"version": "3.6.1"
}
}
Finally, attempt to render the project with Kustomize by running:
kustomize build --enable-alpha-plugins
This should produce on stdout the unencrypted content of your secret:
apiVersion: v1
kind: Secret
metadata:
name: example-secret
type: Opaque
stringData:
message: this is a test
I sometimes find myself writing articles or documentation about git, so I put together a couple of terrible hacks for generating reproducible histories and pretty graphs of those histories.
The git synth
command reads a YAML description of a
repository and executes the necessary commands to reproduce that
history. It allows you to set the name and email address of the author
and committer, as well as a static date, so that every time you generate
the repository you get identical commit ids.
The git dot
command generates a representation of a repository
history in the dot language, and uses Graphviz to render those
into diagrams.
For example, the following history specification:
<!-- include examplerepo.yml -->
When applied with git synth
:
$ git synth -r examplerepo examplerepo.yml
Will generate the following repository:
$ git -C examplerepo log --graph --all --decorate --oneline
* 28f7b38 (HEAD -> master) H
| * 93e1d18 (topic2) G
| * 3ef811d F
| * 973437c (topic1) E
| * 2c0bd1c D
|/
* cabdedf C
* a5cbd99 B
* d98f949 A
We can run this git dot
command line:
$ git -C examplerepo dot -m -g branch --rankdir=RL
To produce the following dot
description of the history:
<!-- include examplerepo.dot -->
Running that through the dot
utility (dot -Tsvg -o repo.svg repo.dot
) results in the following diagram:
<!-- include examplerepo.dot -->
Both tools live in my git-snippets repository, which is a motley
collection of shell scripts, python programs, and other utilities for
interacting with git
.
It’s all undocumented and uninstallable, but if there’s interest in either of these tools I can probably find the time to polish them up a bit.
This is just a note that I’ve substantially changed how the post sources are organized. I’ve tried to ensure that I preserve all the existing links, but if you spot something missing please feel free to leave a comment on this post.
While working on a pull request I will make liberal use of git
rebase to clean up a series of commits: squashing typos,
re-ordering changes for logical clarity, and so forth. But there are
some times when all I want to do is change a commit message somewhere
down the stack, and I was wondering if I had any options for doing
that without reaching for git rebase
.
It turns out the answer is “yes”, as long as you have a linear history.
Let’s assume we have a git history that looks like this:
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│ 4be811 │ ◀── │ 519636 │ ◀── │ 38f6fe │ ◀── │ 2951ec │ ◀╴╴ │ master │
└────────┘ └────────┘ └────────┘ └────────┘ └────────┘
The corresponding git log
looks like:
commit 2951ec3f54205580979d63614ef2751b61102c5d
Author: Alice User <alice@example.com>
Date: Mon Jan 1 00:00:00 2001 -0500
Add detailed, high quality documentation
commit 38f6fe61ffd444f601ac01ecafcd524487c83394
Author: Alice User <alice@example.com>
Date: Mon Jan 1 00:00:00 2001 -0500
Fixed bug that would erroneously call rm -rf
commit 51963667037ceb79aff8c772a009a5fbe4b8d7d9
Author: Alice User <alice@example.com>
Date: Mon Jan 1 00:00:00 2001 -0500
A very interesting change
commit 4be8115640821df1565c421d8ed848bad34666e5
Author: Alice User <alice@example.com>
Date: Mon Jan 1 00:00:00 2001 -0500
The beginning of time
We would like to modify the message on commit 519636
.
We start by extracting the commit
object for that commit using git cat-file
:
$ git cat-file -p 519636
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
parent 4be8115640821df1565c421d8ed848bad34666e5
author Alice User <alice@example.com> 978325200 -0500
committer Alice User <alice@example.com> 978325200 -0500
A very interesting change
We want to produce a commit object that is identical except for an
updated commit message. That sounds like a job for sed
! We can strip
the existing message out like this:
git cat-file -p 519636 | sed '/^$/q'
And we can append a new commit message with the power of cat
:
git cat-file -p 519636 | sed '/^$/q'; cat <<EOF
A very interesting change
Completely refactor the widget implementation to prevent
a tear in the time/space continuum when given invalid
input.
EOF
This will give us:
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
parent 4be8115640821df1565c421d8ed848bad34666e5
author Alice User <alice@example.com> 978325200 -0500
committer Alice User <alice@example.com> 978325200 -0500
A very interesting change
Completely refactor the widget implementation to prevent
a tear in the time/space continuum when given invalid
input.
We need to take this modified commit and store it back into the git
object database. We do that using the git hash-object
command:
(git cat-file -p 519636 | sed '/^$/q'; cat <<EOF) | git hash-object -t commit --stdin -w
A very interesting change
Completely refactor the widget implementation to prevent
a tear in the time/space continuum when given invalid
input.
EOF
The -t commit
argument instructs hash-object
to create a new
commit object. The --stdin
argument instructs hash-object
to read
input from stdin
, while the -w
argument instructs hash-object
to
write a new object to the object database, rather than just
calculating the hash and printing it for us.
This will print the hash of the new object on stdout. We can wrap
everything in a $(...)
expression to capture the output:
newref=$(
(git cat-file -p 519636 | sed '/^$/q'; cat <<EOF) | git hash-object -t commit --stdin -w
A very interesting change
Completely refactor the widget implementation to prevent
a tear in the time/space continuum when given invalid
input.
EOF
)
At this point we have successfully created a new commit, but it isn’t
reachable from anywhere. If we were to run git log
at this point,
everything would look the same as when we started. We need to walk
back up the tree, starting with the immediate descendant of our target
commit, replacing parent pointers as we go along.
The first thing we need is a list of revisions from our target commit
up to the current HEAD
. We can get that with git rev-list
:
$ git rev-list 519636..HEAD
2951ec3f54205580979d63614ef2751b61102c5d
38f6fe61ffd444f601ac01ecafcd524487c83394
We’ll process these in reverse order, so first we modify 38f6fe
:
oldref=51963667037ceb79aff8c772a009a5fbe4b8d7d9
newref=$(git cat-file -p 38f6fe61ffd444f601ac01ecafcd524487c83394 |
sed "s/parent $oldref/parent $newref/" |
git hash-object -t commit --stdin -w)
And then repeat that for the next commit up the tree:
oldref=38f6fe61ffd444f601ac01ecafcd524487c83394
newref=$(git cat-file -p 2951ec3f54205580979d63614ef2751b61102c5d |
sed "s/parent $oldref/parent $newref/" |
git hash-object -t commit --stdin -w)
We’ve now replaced all the descendants of the modified commit…but
git log
would still show us the old history. The last thing we
need to do is update the branch point to point at the top of the
modified tree. We do that using the git update-ref
command. Assuming
we’re on the master
branch, the command would look like this:
git update-ref refs/heads/master $newref
And at this point, running git log
shows us our modified commit in
all its glory:
commit 365bc25ee1fe365d5d63d2248b77196d95d9573a
Author: Alice User <alice@example.com>
Date: Mon Jan 1 00:00:00 2001 -0500
Add detailed, high quality documentation
commit 09d6203a2b64c201dde12af7ef5a349e1ae790d7
Author: Alice User <alice@example.com>
Date: Mon Jan 1 00:00:00 2001 -0500
Fixed bug that would erroneously call rm -rf
commit fb01f35c38691eafbf44e9ee86824b594d036ba4
Author: Alice User <alice@example.com>
Date: Mon Jan 1 00:00:00 2001 -0500
A very interesting change
Completely refactor the widget implementation to prevent
a tear in the time/space continuum when given invalid
input.
commit 4be8115640821df1565c421d8ed848bad34666e5
Author: Alice User <alice@example.com>
Date: Mon Jan 1 00:00:00 2001 -0500
The beginning of time
Giving us a modified history that looks like:
┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│ 4be811 │ ◀── │ fb01f3 │ ◀── │ 09d620 │ ◀── │ 365bc2 │ ◀╴╴ │ master │
└────────┘ └────────┘ └────────┘ └────────┘ └────────┘
Now, that was a lot of manual work. Let’s try to automate the process.
#!/bin/sh
# get the current branch name
branch=$(git rev-parse --symbolic-full-name HEAD)
# get the full commit id of our target commit (this allows us to
# specify the target as a short commit id, or as something like
# `HEAD~3` or `:/interesting`.
oldref=$(git rev-parse "$1")
# generate a replacement commit object, reading the new commit message
# from stdin.
newref=$(
(git cat-file -p $oldref | sed '/^$/q'; cat) | tee newref.txt | git hash-object -t commit --stdin -w
)
# iterate over commits between our target commit and HEAD in
# reverse order, replacing parent pointers with updated commit objects
for rev in $(git rev-list --reverse ${oldref}..HEAD); do
newref=$(git cat-file -p $rev |
sed "s/parent $oldref/parent $newref/" |
git hash-object -t commit --stdin -w)
oldref=$rev
done
# update the branch pointer to the head of the modified tree
git update-ref $branch $newref
If we place the above script in editmsg.sh
and restore our original
revision history, we can run:
sh editmsg.sh :/interesting <<EOF
A very interesting change
Completely refactor the widget implementation to prevent
a tear in the time/space continuum when given invalid
input.
EOF
And end up with a new history identical to the one we created manually:
commit 365bc25ee1fe365d5d63d2248b77196d95d9573a
Author: Alice User <alice@example.com>
Date: Mon Jan 1 00:00:00 2001 -0500
Add detailed, high quality documentation
commit 09d6203a2b64c201dde12af7ef5a349e1ae790d7
Author: Alice User <alice@example.com>
Date: Mon Jan 1 00:00:00 2001 -0500
Fixed bug that would erroneously call rm -rf
commit fb01f35c38691eafbf44e9ee86824b594d036ba4
Author: Alice User <alice@example.com>
Date: Mon Jan 1 00:00:00 2001 -0500
A very interesting change
Completely refactor the widget implementation to prevent
a tear in the time/space continuum when given invalid
input.
commit 4be8115640821df1565c421d8ed848bad34666e5
Author: Alice User <alice@example.com>
Date: Mon Jan 1 00:00:00 2001 -0500
The beginning of time
The above script is intentionally simple. If you’re interested in doing something like this in practice, you should be aware of the following:
It’s possible to check for all of these conditions in our script, but I’m leaving that as an exercise for the reader.
OpenShift Container Storage (OCS) from Red Hat deploys Ceph in your OpenShift cluster (or allows you to integrate with an external Ceph cluster). In addition to the file- and block- based volume services provided by Ceph, OCS includes two S3-api compatible object storage implementations.
The first option is the Ceph Object Gateway (radosgw), Ceph’s native object storage interface. The second option, called the “Multicloud Object Gateway”, is in fact a piece of software named Noobaa, a storage abstraction layer that was acquired by Red Hat in 2018. In this article I’d like to demonstrate how to take advantage of these storage options.
The storage we interact with regularly on our local computers is block storage: data is stored as a collection of blocks on some sort of storage device. Additional layers – such as a filesystem driver – are responsible for assembling those blocks into something useful.
Object storage, on the other hand, manages data as objects: a single unit of data and associated metadata (such as access policies). An object is identified by some sort of unique id. Object storage generally provides an API that is largely independent of the physical storage layer; data may live on a variety of devices attached to a variety of systems, and you don’t need to know any of those details in order to access the data.
The most well known example of an object storage service is Amazon’s S3 (“Simple Storage Service”), first introduced in 2006. The S3 API has become a de-facto standard for object storage implementations. The two services we’ll be discussing in this article provide S3-compatible APIs.
The fundamental unit of object storage is called a “bucket”.
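To make the model concrete, here is a short boto3 sketch; the endpoint, credentials, bucket, and key below are all placeholders, and the point is simply that the same client calls work against any S3-compatible service, whether that’s AWS itself, radosgw, or Noobaa:
import boto3

# All of these values are placeholders; swap in the endpoint and
# credentials for whichever S3-compatible service you're talking to.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example.com",
    aws_access_key_id="EXAMPLE_ACCESS_KEY",
    aws_secret_access_key="EXAMPLE_SECRET_KEY",
)

# Objects are addressed only by bucket and key; the client neither
# knows nor cares where the bytes physically live.
s3.put_object(Bucket="example-bucket", Key="greeting.txt", Body=b"hello")
print(s3.get_object(Bucket="example-bucket", Key="greeting.txt")["Body"].read())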
Creating a bucket with OCS works a bit like creating a persistent
volume, although instead of starting with a PersistentVolumeClaim
you instead start with an ObjectBucketClaim
("OBC
"). An OBC
looks something like this when using RGW:
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
name: example-rgw
spec:
generateBucketName: example-rgw
storageClassName: ocs-storagecluster-ceph-rgw
Or like this when using Noobaa (note the different value for
storageClassName
):
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
name: example-noobaa
spec:
generateBucketName: example-noobaa
storageClassName: openshift-storage.noobaa.io
With OCS 4.5, your out-of-the-box choices for storageClassName
will be
ocs-storagecluster-ceph-rgw
, if you choose to use Ceph Radosgw, or
openshift-storage.noobaa.io
, if you choose to use the Noobaa S3 endpoint.
Before we continue, I’m going to go ahead and create these resources
in my OpenShift environment. To do so, I’m going to use Kustomize
to deploy the resources described in the following kustomization.yml
file:
namespace: oddbit-ocs-example
resources:
- obc-noobaa.yml
- obc-rgw.yml
Running kustomize build | oc apply -f-
from the directory containing
this file populates the specified namespace with the two
ObjectBucketClaims
mentioned above:
$ kustomize build | oc apply -f-
objectbucketclaim.objectbucket.io/example-noobaa created
objectbucketclaim.objectbucket.io/example-rgw created
Verifying that things seem healthy:
$ oc get objectbucketclaim
NAME STORAGE-CLASS PHASE AGE
example-noobaa openshift-storage.noobaa.io Bound 2m59s
example-rgw ocs-storagecluster-ceph-rgw Bound 2m59s
Each ObjectBucketClaim
will result in OpenShift creating a new
ObjectBucket
resource (which, like PersistentVolume
resources, are
not namespaced). The ObjectBucket
resource will be named
obc-<namespace-name>-<objectbucketclaim-name>
.
$ oc get objectbucket obc-oddbit-ocs-example-example-rgw obc-oddbit-ocs-example-example-noobaa
NAME STORAGE-CLASS CLAIM-NAMESPACE CLAIM-NAME RECLAIM-POLICY PHASE AGE
obc-oddbit-ocs-example-example-rgw ocs-storagecluster-ceph-rgw oddbit-ocs-example example-rgw Delete Bound 67m
obc-oddbit-ocs-example-example-noobaa openshift-storage.noobaa.io oddbit-ocs-example example-noobaa Delete Bound 67m
Each ObjectBucket
resource corresponds to a bucket in the selected
object storage backend.
Because buckets exist in a flat namespace, the OCS documentation
recommends always using generateBucketName in the claim, rather than
explicitly setting bucketName, in order to avoid unexpected
conflicts. This means that the generated buckets will have a name
prefixed by the value in generateBucketName, followed by a random string:
$ oc get objectbucketclaim example-rgw -o jsonpath='{.spec.bucketName}'
example-rgw-425d7193-ae3a-41d9-98e3-9d07b82c9661
$ oc get objectbucketclaim example-noobaa -o jsonpath='{.spec.bucketName}'
example-noobaa-2e087028-b3a4-475b-ae83-a4fa80d9e3ef
Along with the bucket itself, OpenShift will create a Secret
and a
ConfigMap
resource – named after your OBC
– with the metadata
necessary to access the bucket.
The Secret
contains AWS-style credentials for authenticating to the
S3 API:
$ oc get secret example-rgw -o yaml | oc neat
apiVersion: v1
data:
AWS_ACCESS_KEY_ID: ...
AWS_SECRET_ACCESS_KEY: ...
kind: Secret
metadata:
labels:
bucket-provisioner: openshift-storage.ceph.rook.io-bucket
name: example-rgw
namespace: oddbit-ocs-example
type: Opaque
(I’m using the neat filter here to remove extraneous metadata that OpenShift returns when you request a resource.)
The ConfigMap
contains a number of keys that provide you (or your code)
with the information necessary to access the bucket. For the RGW
bucket:
$ oc get configmap example-rgw -o yaml | oc neat
apiVersion: v1
data:
BUCKET_HOST: rook-ceph-rgw-ocs-storagecluster-cephobjectstore.openshift-storage.svc.cluster.local
BUCKET_NAME: example-rgw-425d7193-ae3a-41d9-98e3-9d07b82c9661
BUCKET_PORT: "80"
BUCKET_REGION: us-east-1
kind: ConfigMap
metadata:
labels:
bucket-provisioner: openshift-storage.ceph.rook.io-bucket
name: example-rgw
namespace: oddbit-ocs-example
And for the Noobaa bucket:
$ oc get configmap example-noobaa -o yaml | oc neat
apiVersion: v1
data:
BUCKET_HOST: s3.openshift-storage.svc
BUCKET_NAME: example-noobaa-2e087028-b3a4-475b-ae83-a4fa80d9e3ef
BUCKET_PORT: "443"
kind: ConfigMap
metadata:
labels:
app: noobaa
bucket-provisioner: openshift-storage.noobaa.io-obc
noobaa-domain: openshift-storage.noobaa.io
name: example-noobaa
namespace: oddbit-ocs-example
Note that BUCKET_HOST
contains the internal S3 API endpoint. You won’t be
able to reach this from outside the cluster. We’ll tackle that in just a
bit.
The easiest way to expose the credentials in a pod is to map the keys
from both the ConfigMap
and Secret
as environment variables using
the envFrom
directive, like this:
apiVersion: v1
kind: Pod
metadata:
name: bucket-example
spec:
containers:
- image: myimage
env:
- name: AWS_CA_BUNDLE
value: /run/secrets/kubernetes.io/serviceaccount/service-ca.crt
envFrom:
- configMapRef:
name: example-rgw
- secretRef:
name: example-rgw
[...]
Note that we’re also setting AWS_CA_BUNDLE
here, which you’ll need
if the internal endpoint referenced by $BUCKET_HOST
is using SSL.
Inside the pod, we can run, for example, aws
commands as long as we
provide an appropriate s3 endpoint. We can inspect the value of
BUCKET_PORT
to determine if we need http
or https
:
$ [ "$BUCKET_PORT" = 80 ] && schema=http || schema=https
$ aws s3 --endpoint $schema://$BUCKET_HOST ls
2021-02-10 04:30:31 example-rgw-8710aa46-a47a-4a8b-8edd-7dabb7d55469
Python’s boto3
module can also make use of the same environment
variables:
>>> import boto3
>>> import os
>>> bucket_host = os.environ['BUCKET_HOST']
>>> schema = 'http' if os.environ['BUCKET_PORT'] == '80' else 'https'
>>> s3 = boto3.client('s3', endpoint_url=f'{schema}://{bucket_host}')
>>> s3.list_buckets()['Buckets']
[{'Name': 'example-noobaa-...', 'CreationDate': datetime.datetime(...)}]
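Building on that, here is a small sketch (the object key and contents are made up for illustration) showing how a workload in the pod could use the BUCKET_NAME value from the ConfigMap to write an object into the claim’s bucket and list what’s there; the AWS credentials are picked up automatically from the environment variables set by the Secret:
import os
import boto3

# Reuse the endpoint logic from above, plus the bucket name provided
# by the ObjectBucketClaim's ConfigMap.
bucket = os.environ["BUCKET_NAME"]
schema = "http" if os.environ["BUCKET_PORT"] == "80" else "https"
s3 = boto3.client("s3", endpoint_url=f"{schema}://{os.environ['BUCKET_HOST']}")

# The key and body here are arbitrary examples.
s3.put_object(Bucket=bucket, Key="hello.txt", Body=b"hello from inside the pod")

for obj in s3.list_objects_v2(Bucket=bucket).get("Contents", []):
    print(obj["Key"], obj["Size"])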
External access to services in OpenShift is often managed via
routes. If you look at the routes available in your
openshift-storage
namespace, you’ll find the following:
$ oc -n openshift-storage get route
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
noobaa-mgmt noobaa-mgmt-openshift-storage.apps.example.com noobaa-mgmt mgmt-https reencrypt None
s3 s3-openshift-storage.apps.example.com s3 s3-https reencrypt None
The s3
route provides external access to your Noobaa S3 endpoint.
You’ll note that in the list above there is no route registered for
radosgw (see the note at the end of this section). There is a service registered for radosgw named
rook-ceph-rgw-ocs-storagecluster-cephobjectstore
, so we
can expose that service to create an external route by running
something like:
oc create route edge rgw --service rook-ceph-rgw-ocs-storagecluster-cephobjectstore
This will create a route with “edge” encryption (TLS termination is handled by the default ingress router):
$ oc -n openshift-storage get route
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
noobaa-mgmt noobaa-mgmt-openshift-storage.apps.example.com noobaa-mgmt mgmt-https reencrypt None
rgw rgw-openshift-storage.apps.example.com rook-ceph-rgw-ocs-storagecluster-cephobjectstore http edge None
s3 s3-openshift-storage.apps.example.com s3 s3-https reencrypt None
Once we know the Route
to our S3 endpoint, we can use the
information in the Secret
and ConfigMap
created for us when we
provisioned the storage. We just need to replace the BUCKET_HOST
with the hostname in the route, and we need to use SSL over port 443
regardless of what BUCKET_PORT
tells us.
We can extract the values into variables using something like the
following shell script, which takes care of getting the appropriate
route from the openshift-storage
namespace, base64-decoding the values
in the Secret
, and replacing the BUCKET_HOST
value:
#!/bin/sh
bucket_host=$(oc get configmap $1 -o json | jq -r .data.BUCKET_HOST)
service_name=$(cut -f1 -d. <<<$bucket_host)
service_ns=$(cut -f2 -d. <<<$bucket_host)
# get the externally visible hostname provided by the route
public_bucket_host=$(
oc -n $service_ns get route -o json |
jq -r '.items[]|select(.spec.to.name=="'"$service_name"'")|.spec.host'
)
# dump configmap and secret as shell variables, replacing the
# value of BUCKET_HOST in the process.
(
oc get configmap $1 -o json |
jq -r '.data as $data|.data|keys[]|"\(.)=\($data[.])"'
oc get secret $1 -o json |
jq -r '.data as $data|.data|keys[]|"\(.)=\($data[.]|@base64d)"'
) | sed -e 's/^/export /' -e '/BUCKET_HOST/ s/=.*/='$public_bucket_host'/'
If we call the script getenv.sh
and run it like this:
$ sh getenv.sh example-rgw
It will produce output like this:
export BUCKET_HOST="s3-openshift-storage.apps.cnv.massopen.cloud"
export BUCKET_NAME="example-noobaa-2e1bca2f-ff49-431a-99b8-d7d63a8168b0"
export BUCKET_PORT="443"
export BUCKET_REGION=""
export BUCKET_SUBREGION=""
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
We could accomplish something similar in Python with the following, which shows how to use the OpenShift dynamic client to interact with OpenShift:
import argparse
import base64
import kubernetes
import openshift.dynamic
def parse_args():
p = argparse.ArgumentParser()
p.add_argument('-n', '--namespace', required=True)
p.add_argument('obcname')
return p.parse_args()
args = parse_args()
k8s_client = kubernetes.config.new_client_from_config()
dyn_client = openshift.dynamic.DynamicClient(k8s_client)
v1_configmap = dyn_client.resources.get(api_version='v1', kind='ConfigMap')
v1_secret = dyn_client.resources.get(api_version='v1', kind='Secret')
v1_service = dyn_client.resources.get(api_version='v1', kind='Service')
v1_route = dyn_client.resources.get(api_version='route.openshift.io/v1', kind='Route')
configmap = v1_configmap.get(name=args.obcname, namespace=args.namespace)
secret = v1_secret.get(name=args.obcname, namespace=args.namespace)
env = dict(configmap.data)
env.update({k: base64.b64decode(v).decode() for k, v in secret.data.items()})
svc_name, svc_ns = env['BUCKET_HOST'].split('.')[:2]
routes = v1_route.get(namespace=svc_ns)
for route in routes.items:
if route.spec.to.name == svc_name:
break
env['BUCKET_PORT'] = 443
env['BUCKET_HOST'] = route['spec']['host']
for k, v in env.items():
print(f'export {k}="{v}"')
If we run it like this:
python genenv.py -n oddbit-ocs-example example-noobaa
It will produce output largely identical to what we saw above with the shell script.
If we load those variables into the environment:
$ eval $(sh getenv.sh example-rgw)
We can perform the same operations we executed earlier from inside the pod:
$ aws s3 --endpoint https://$BUCKET_HOST ls
2021-02-10 14:34:12 example-rgw-425d7193-ae3a-41d9-98e3-9d07b82c9661
(Note that this may have changed in the recent OCS 4.6 release.)
Performance of the primary PyPi service has been so bad lately that it’s become very disruptive. Tasks that used to take a few seconds will now churn along for 15-20 minutes or longer before completing, which is incredibly frustrating.
I first went looking to see if there was a PyPi mirror infrastructure, like we see with CPAN for Perl or CTAN for Tex (and similarly for most Linux distributions). There is apparently no such beast.
I didn’t really want to set up a PyPi mirror locally, since the number of packages I actually use is small vs. the number of packages available. I figured there must be some sort of caching proxy available that would act as a shim between me and PyPi, fetching packages from PyPi and caching them if they weren’t already available locally.
I was previously aware of Artifactory, which I suspected (and confirmed) was capable of this, but while looking around I came across DevPi, which unlike Artifactory is written exclusively for managing Python packages. DevPi itself is hosted on PyPi, and the documentation made things look easy to configure.
After reading through their Quickstart: running a pypi mirror on your laptop documentation, I built a containerized service that would be easy for me to run on my desktop, laptop, work computer, etc. You can find the complete configuration at https://github.com/oddbit-dot-com/docker-devpi-server.
I started with the following Dockerfile
(note I’m using
podman rather than Docker as my container runtime, but the
resulting image will work fine for either environment):
FROM python:3.9
RUN pip install devpi-server devpi-web
WORKDIR /root
VOLUME /root/.devpi
COPY docker-entrypoint.sh /docker-entrypoint.sh
ENTRYPOINT ["sh", "/docker-entrypoint.sh"]
CMD ["devpi-server", "--host", "0.0.0.0"]
This installs both devpi-server
, which provides the basic caching
for pip install
, as well as devpi-web
, which provides support for
pip search
.
To ensure that things are initialized correctly when the container
start up, I’ve set the ENYTRYPOINT
to the following script:
#!/bin/sh
if ! [ -f /root/.devpi/server ]; then
devpi-init
fi
exec "$@"
This will run devpi-init
if the target directory hasn’t already been
initialized.
The repository includes a GitHub workflow that builds a new image on each commit
and pushes the result to the oddbit/devpi-server
repository on
Docker Hub.
Once the image was available on Docker Hub, I created the following systemd unit to run the service locally:
[Service]
Restart=on-failure
ExecStartPre=/usr/bin/rm -f %t/%n-pid
ExecStart=/usr/bin/podman run --replace \
--conmon-pidfile %t/%n-pid --cgroups=no-conmon \
--name %n -d -p 127.0.0.1:3141:3141 \
-v devpi:/root/.devpi oddbit/devpi-server
ExecStopPost=/usr/bin/rm -f %t/%n-pid
PIDFile=%t/%n-pid
Type=forking
[Install]
WantedBy=multi-user.target default.target
There are a couple of items of note in this unit file:
The service is exposed only on localhost
using -p 127.0.0.1:3141:3141
. I don’t want this service exposed on
externally visible addresses since I haven’t bothered setting up any
sort of authentication.
The service mounts a named volume for use by devpi-server
via the
-v devpi:/root/.devpi
command line option.
This unit file gets installed into
~/.config/systemd/user/devpi.service
. Running systemctl --user enable --now devpi.service
both enables the service to start at boot
and actually starts it up immediately.
With the service running, the last thing to do is configure pip
to
utilize it. The following configuration, placed in
~/.config/pip/pip.conf
, does the trick:
[install]
index-url = http://localhost:3141/root/pypi/+simple/
[search]
index = http://localhost:3141/root/pypi/
Now both pip install
and pip search
hit the local cache instead of
the upstream PyPi server, and things are generally much, much faster.
Poetry respects the pip
configuration and will Just Work.
Pipenv does not respect the pip configuration [1,
2], so you will
need to set the PIPENV_PYPI_MIRROR
environment variable. E.g:
export PIPENV_PYPI_MIRROR=http://localhost:3141/root/pypi/+simple/
The SYM-1 is a 6502-based single-board computer produced by Synertek Systems Corp in the mid 1970’s. I’ve had one floating around in a box for many, many years, and after a recent foray into the world of 6502 assembly language programming I decided to pull it out, dust it off, and see if it still works.
The board I have has a whopping 8KB of memory, and in addition to the standard SUPERMON monitor it has the expansion ROMs for the Synertek BASIC interpreter (yet another Microsoft BASIC) and RAE (the “Resident Assembler Editor”). One interacts with the board either through the onboard hex keypad and six-digit display, or via a serial connection at 4800bps (or lower).
[If you’re interested in Microsoft BASIC, the mist64/msbasic repository on GitHub is a trove of information, containing the source for multiple versions of Microsoft BASIC including the Synertek version.]
Fiddling around with the BASIC interpreter and the onboard assembler was fun, but I wanted to use a real editor for writing source files, assemble them on my Linux system, and then transfer the compiled binary to the SYM-1. The first two tasks are easy; there are lots of editors and there are a variety of 6502 assemblers that will run under Linux. I’m partial to ca65, part of the cc65 project (which is an incredible project that implements a C compiler that cross-compiles C for 6502 processors). But what’s the best way to get compiled code over to the SYM-1?
That’s where symtool comes in. Symtool runs on your host and talks to the SUPERMON monitor on the SYM-1 over a serial connection. It allows you to view registers, dump and load memory, fill memory, and execute code.
Symtool needs to know to what serial device your SYM-1 is attached.
You can specify this using the -d <device>
command line option, but
this quickly gets old. To save typing, you can instead set the
SYMTOOL_DEVICE
environment variable:
$ export SYMTOOL_DEVICE=/dev/ttyUSB0
$ symtool load ...
$ symtool dump ...
The baud rate defaults to 4800bps. If for some reason you want to use
a slower speed (maybe you’d like to relive the good old days of 300bps
modems), you can use the -s
command line option or the
SYMTOOL_SPEED
environment variable.
After compiling your code (I’ve included the examples from the SYM-1
Technical Notes in the repository), use the load
command to
load the code into the memory of the SYM-1:
$ make -C asm
[...]
$ symtool -v load 0x200 asm/countdown.bin
INFO:symtool.symtool:using port /dev/ttyUSB0, speed 4800
INFO:symtool.symtool:connecting to sym1...
INFO:symtool.symtool:connected
INFO:symtool.symtool:loading 214 bytes of data at $200
(Note the -v
on the command line there; without that, symtool
won’t produce any output unless there’s an error.)
[A note on compiling code: the build logic in the asm/
directory is configured to load code at address 0x200
. If you want
to load code at a different address, you will need to add the
appropriate --start-addr
option to LD65FLAGS
when building, or
modify the linker configuration in sym1.cfg
.]
The above command loads the code into memory but doesn’t execute it.
We can use the dump
command to examine memory. By default, dump
produces binary output. We can use that to extract code from the SYM-1
ROM or to verify that the code we just loaded was transferred
correctly:
$ symtool dump 0x200 $(wc -c < asm/countdown.bin) -o check.bin
$ sha1sum check.bin asm/countdown.bin
5851c40bed8cc8b2a132163234b68a7fc0e434c0 check.bin
5851c40bed8cc8b2a132163234b68a7fc0e434c0 asm/countdown.bin
We can also produce a hexdump:
$ symtool dump 0x200 $(wc -c < asm/countdown.bin) -h
00000000: 20 86 8B A9 20 85 03 A9 55 8D 7E A6 A9 02 8D 7F ... ...U.~.....
00000010: A6 A9 40 8D 0B AC A9 4E 8D 06 AC A9 C0 8D 0E AC ..@....N........
00000020: A9 00 85 02 A9 20 8D 05 AC 18 58 A9 00 8D 40 A6 ..... ....X...@.
00000030: 8D 41 A6 8D 44 A6 8D 45 A6 A5 04 29 0F 20 73 02 .A..D..E...). s.
00000040: 8D 43 A6 A5 04 4A 4A 4A 4A 20 73 02 8D 42 A6 20 .C...JJJJ s..B.
00000050: 06 89 4C 2B 02 48 8A 48 98 48 AD 0D AC 8D 0D AC ..L+.H.H.H......
00000060: E6 02 A5 02 C9 05 F0 02 50 66 A9 00 85 02 20 78 ........Pf.... x
00000070: 02 50 5D AA BD 29 8C 60 18 A5 04 69 01 18 B8 85 .P]..).`...i....
00000080: 04 C9 FF F0 01 60 A9 7C 8D 41 A6 A9 79 8D 42 A6 .....`.|.A..y.B.
00000090: 8D 43 A6 A9 73 8D 44 A6 A9 00 85 04 20 72 89 20 .C..s.D..... r.
000000A0: 06 89 20 06 89 20 06 89 20 06 89 20 06 89 20 06 .. .. .. .. .. .
000000B0: 89 C6 03 20 06 89 20 06 89 20 06 89 20 06 89 20 ... .. .. .. ..
000000C0: 06 89 20 06 89 A5 03 C9 00 D0 D1 A9 20 85 03 60 .. ......... ..`
000000D0: 68 A8 68 AA 68 40 h.h.h@
Or a disassembly:
$ symtool dump 0x200 $(wc -c < asm/countdown.bin) -d
$0200 20 86 8b JSR $8B86
$0203 a9 20 LDA #$20
$0205 85 03 STA $03
$0207 a9 55 LDA #$55
$0209 8d 7e a6 STA $A67E
$020c a9 02 LDA #$02
$020e 8d 7f a6 STA $A67F
$0211 a9 40 LDA #$40
$0213 8d 0b ac STA $AC0B
$0216 a9 4e LDA #$4E
$0218 8d 06 ac STA $AC06
$021b a9 c0 LDA #$C0
$021d 8d 0e ac STA $AC0E
$0220 a9 00 LDA #$00
$0222 85 02 STA $02
$0224 a9 20 LDA #$20
$0226 8d 05 ac STA $AC05
$0229 18 CLC
$022a 58 CLI
$022b a9 00 LDA #$00
$022d 8d 40 a6 STA $A640
$0230 8d 41 a6 STA $A641
$0233 8d 44 a6 STA $A644
$0236 8d 45 a6 STA $A645
$0239 a5 04 LDA $04
$023b 29 0f AND #$0F
$023d 20 73 02 JSR $0273
$0240 8d 43 a6 STA $A643
$0243 a5 04 LDA $04
$0245 4a LSR
$0246 4a LSR
$0247 4a LSR
$0248 4a LSR
$0249 20 73 02 JSR $0273
$024c 8d 42 a6 STA $A642
$024f 20 06 89 JSR $8906
$0252 4c 2b 02 JMP $022B
$0255 48 PHA
$0256 8a TXA
$0257 48 PHA
$0258 98 TYA
$0259 48 PHA
$025a ad 0d ac LDA $AC0D
$025d 8d 0d ac STA $AC0D
$0260 e6 02 INC $02
$0262 a5 02 LDA $02
$0264 c9 05 CMP #$05
$0266 f0 02 BEQ $02
$0268 50 66 BVC $66
$026a a9 00 LDA #$00
$026c 85 02 STA $02
$026e 20 78 02 JSR $0278
$0271 50 5d BVC $5D
$0273 aa TAX
$0274 bd 29 8c LDA $8C29,X
$0277 60 RTS
$0278 18 CLC
$0279 a5 04 LDA $04
$027b 69 01 ADC #$01
$027d 18 CLC
$027e b8 CLV
$027f 85 04 STA $04
$0281 c9 ff CMP #$FF
$0283 f0 01 BEQ $01
$0285 60 RTS
$0286 a9 7c LDA #$7C
$0288 8d 41 a6 STA $A641
$028b a9 79 LDA #$79
$028d 8d 42 a6 STA $A642
$0290 8d 43 a6 STA $A643
$0293 a9 73 LDA #$73
$0295 8d 44 a6 STA $A644
$0298 a9 00 LDA #$00
$029a 85 04 STA $04
$029c 20 72 89 JSR $8972
$029f 20 06 89 JSR $8906
$02a2 20 06 89 JSR $8906
$02a5 20 06 89 JSR $8906
$02a8 20 06 89 JSR $8906
$02ab 20 06 89 JSR $8906
$02ae 20 06 89 JSR $8906
$02b1 c6 03 DEC $03
$02b3 20 06 89 JSR $8906
$02b6 20 06 89 JSR $8906
$02b9 20 06 89 JSR $8906
$02bc 20 06 89 JSR $8906
$02bf 20 06 89 JSR $8906
$02c2 20 06 89 JSR $8906
$02c5 a5 03 LDA $03
$02c7 c9 00 CMP #$00
$02c9 d0 d1 bNE $D1
$02cb a9 20 LDA #$20
$02cd 85 03 STA $03
$02cf 60 RTS
$02d0 68 PLA
$02d1 a8 TAY
$02d2 68 PLA
$02d3 aa TAX
$02d4 68 PLA
$02d5 40 RTI
There are two ways to run your code using symtool
. If you provide
the -g
option to the load
command, symtool
will execute your
code as soon as the load has finished:
$ symtool load -g 0x200 asm/countdown.bin
Alternatively, you can use the go
command to run code that has
already been loaded onto the SYM-1:
$ symtool go 0x200
The registers
command allows you to examine the contents of the 6502
registers:
$ symtool registers
s ff (11111111)
f b1 (10110001) +carry -zero -intr -dec -oflow +neg
a 80 (10000000)
x 00 (00000000)
y 50 (01010000)
p b0ac (1011000010101100)
If you want to clear a block of memory, you can use the fill
command. For example, to wipe out the code we loaded in the earlier
example:
$ symtool fill 0x200 0 $(wc -c < asm/countdown.bin)
$ symtool dump -h 0x200 $(wc -c < asm/countdown.bin)
00000000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
[...]
The symtool
repository includes both unit and functional tests. The
functional tests require an actual SYM-1 to be attached to your system
(with the device name in the SYMTOOL_DEVICE
environment variable).
The unit tests will run anywhere.
No lie, this is a pretty niche project. I’m not sure how many people out there own a SYM-1 these days, but this has been fun to work with and if maybe one other person finds it useful, I would consider that a success :).
The symtool repository includes an assembly listing for the monitor.
6502.org hosts just about all the SYM-1 documentation.
Recently, I bought a couple of Raspberry Pi 4s: one with 4 GB and two equipped with 8 GB of RAM. When I bought the first one, there was no option for more memory. However, I saw this as a bit of a game and thought I'd give it a try. I also bought SSDs for these and USB3 to SATA adapters. Before purchasing anything, you may want to take a look at James Archer's page. Unfortunately, there are a couple of adapters on the market which don't work that well.
Initially, I followed the description to deploy Fedora 32; it works the same way for Fedora 33 Server (which is what I'm using here).
Because ceph requires a partition (or better: a whole disk), I used the traditional setup using partitions and no LVM.
git clone https://github.com/kubernetes-sigs/kubespray
cd kubespray
I followed the documentation and created an inventory. For the container
runtime, I picked crio, and calico as the network plugin.
Because of an issue,
I had to patch roles/download/defaults/main.yml
:
diff --git a/roles/download/defaults/main.yml b/roles/download/defaults/main.yml
index a97be5a6..d4abb341 100644
--- a/roles/download/defaults/main.yml
+++ b/roles/download/defaults/main.yml
@@ -64,7 +64,7 @@ quay_image_repo: "quay.io"
# TODO(mattymo): Move calico versions to roles/network_plugins/calico/defaults
# after migration to container download
-calico_version: "v3.16.5"
+calico_version: "v3.15.2"
calico_ctl_version: "{{ calico_version }}"
calico_cni_version: "{{ calico_version }}"
calico_policy_version: "{{ calico_version }}"
@@ -520,13 +520,13 @@ etcd_image_tag: "{{ etcd_version }}{%- if image_arch != 'amd64' -%}-{{ image_arc
flannel_image_repo: "{{ quay_image_repo }}/coreos/flannel"
flannel_image_tag: "{{ flannel_version }}"
calico_node_image_repo: "{{ quay_image_repo }}/calico/node"
-calico_node_image_tag: "{{ calico_version }}"
+calico_node_image_tag: "{{ calico_version }}-arm64"
calico_cni_image_repo: "{{ quay_image_repo }}/calico/cni"
-calico_cni_image_tag: "{{ calico_cni_version }}"
+calico_cni_image_tag: "{{ calico_cni_version }}-arm64"
calico_policy_image_repo: "{{ quay_image_repo }}/calico/kube-controllers"
-calico_policy_image_tag: "{{ calico_policy_version }}"
+calico_policy_image_tag: "{{ calico_policy_version }}-arm64"
calico_typha_image_repo: "{{ quay_image_repo }}/calico/typha"
-calico_typha_image_tag: "{{ calico_typha_version }}"
+calico_typha_image_tag: "{{ calico_typha_version }}-arm64"
pod_infra_image_repo: "{{ kube_image_repo }}/pause"
pod_infra_image_tag: "{{ pod_infra_version }}"
install_socat_image_repo: "{{ docker_image_repo }}/xueshanf/install-socat"
Ceph requires a raw partition. Make sure you have an empty partition available.
[root@node1 ~]# lsblk -f
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINT
sda
├─sda1
│ vfat FAT32 UEFI 7DC7-A592
├─sda2
│ vfat FAT32 CB75-24A9 567.9M 1% /boot/efi
├─sda3
│ xfs cab851cb-1910-453b-ae98-f6a2abc7f0e0 804.7M 23% /boot
├─sda4
│
├─sda5
│ xfs 6618a668-f165-48cc-9441-98f4e2cc0340 27.6G 45% /
└─sda6
In my case, sda4 and sda6 are not formatted. sda4 is very small and will be ignored; sda6 will be used.
Using rook is pretty straightforward:
git clone --single-branch --branch v1.5.4 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
kubectl create -f cluster.yaml
Let’s say you have a couple of sensors attached to an ESP8266 running MicroPython. You’d like to sample them at different frequencies (say, one every 60 seconds and one every five minutes), and you’d like to do it as efficiently as possible in terms of power consumption. What are your options?
If we don’t care about power efficiency, the simplest solution is probably a loop like this:
import machine
import time
lastrun_1 = 0
lastrun_2 = 0
while True:
now = time.time()
if (lastrun_1 == 0) or (now - lastrun_1 >= 60):
read_sensor_1()
lastrun_1 = now
if (lastrun_2 == 0) or (now - lastrun_2 >= 300):
read_sensor_2()
lastrun_2 = now
machine.idle()
If we were only reading a single sensor (or multiple sensors at the same interval), we could drop the loop and just use the ESP8266’s deep sleep mode (assuming we have wired things properly):
import machine
def deepsleep(duration):
rtc = machine.RTC()
rtc.irq(trigger=rtc.ALARM0, wake=machine.DEEPSLEEP)
rtc.alarm(rtc.ALARM0, duration)
read_sensor_1()
deepsleep(60000)
This will wake up, read the sensor, then sleep for 60 seconds, at which point the device will reboot and repeat the process.
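If you need to tell a deep-sleep wake apart from a cold power-on (say, to run one-time setup code only on the very first boot), you can check the reset cause; a minimal sketch:
import machine

# Deep-sleep wake-ups report a distinct reset cause, so first-boot
# initialization can be skipped on subsequent wake-ups.
if machine.reset_cause() == machine.DEEPSLEEP_RESET:
    print("woke from deep sleep")
else:
    print("power on or hard reset; doing first-boot setup")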
If we want to both use deep sleep and run tasks at different intervals, we can effectively combine the above two methods. This requires a little help from the RTC, which in addition to keeping time also provides us with a small amount of memory (492 bytes when using MicroPython) that will persist across a deepsleep/reset cycle.
The machine.RTC
class includes a memory
method that provides
access to the RTC memory. We can read the memory like this:
import machine
rtc = machine.RTC()
bytes = rtc.memory()
Note that rtc.memory()
will always return a byte string.
We write to it like this:
rtc.memory('somevalue')
Lastly, note that the time maintained by the RTC also persists across
a deepsleep/reset cycle, so that if we call time.time()
and then
deepsleep for 10 seconds, when the module boots back up time.time()
will show that 10 seconds have elapsed.
We’re going to implement a solution similar to the loop presented at the beginning of this article in that we will store the time at which at task was last run. Because we need to maintain two different values, and because the RTC memory operates on bytes, we need a way to serialize and deserialize a pair of integers. We could use functions like this:
import json
def store_time(t1, t2):
rtc.memory(json.dumps([t1, t2]))
def load_time():
data = rtc.memory()
if not data:
return [0, 0]
try:
return json.loads(data)
except ValueError:
return [0, 0]
The load_time
method returns [0, 0]
if either (a) the RTC memory
was unset or (b) we were unable to decode the value stored in memory
(which might happen if you had previously stored something else
there).
You don’t have to use json
for serializing the data we’re storing in
the RTC; you could just as easily use the struct
module:
import struct
def store_time(t1, t2):
rtc.memory(struct.pack('ll', t1, t2))
def load_time():
data = rtc.memory()
if not data:
return [0, 0]
try:
return struct.unpack('ll', data)
except ValueError:
return [0, 0]
Once we’re able to store and retrieve data from the RTC, the main part of our code ends up looking something like this:
lastrun_1, lastrun_2 = load_time()
now = time.time()
something_happened = False
if lastrun_1 == 0 or (now - lastrun_1 > 60):
read_sensor_1()
lastrun_1 = now
something_happened = True
if lastrun_2 == 0 or (now - lastrun_2 > 300):
read_sensor_2()
lastrun_2 = now
something_happened = True
if something_happened:
store_time(lastrun_1, lastrun_2)
deepsleep(60000)
This code will wake up every 60 seconds. That means it will always run
the read_sensor_1
task, and it will run the read_sensor_2
task
every five minutes. In between, the ESP8266 will be in deep sleep
mode, consuming around 20µA. In order to avoid too many unnecessary
writes to RTC memory, we only store values when lastrun_1
or
lastrun_2
has changed.
While developing your code, it can be inconvenient to have the device
enter deep sleep mode (because you can’t just ^C
to return to the
REPL). You can make the deep sleep behavior optional by wrapping
everything in a loop, and optionally calling deepsleep
at the end of
the loop, like this:
lastrun_1, lastrun_2 = load_time()
while True:
now = time.time()
something_happened = False
if lastrun_1 == 0 or (now - lastrun_1 > 60):
read_sensor_1()
lastrun_1 = now
something_happened = True
if lastrun_2 == 0 or (now - lastrun_2 > 300):
read_sensor_2()
lastrun_2 = now
something_happened = True
if something_happened:
store_time(lastrun_1, lastrun_2)
if use_deep_sleep:
deepsleep(60000)
else:
machine.idle()
If the variable use_deep_sleep is True, this code will perform as described in the previous section, waking once every 60 seconds. If use_deep_sleep is False, this will use a busy loop.
While reviewing the comments on the Ironic spec for Secure RBAC, I had to ask myself if the “project” construct makes sense for Ironic. I still think it does, but I’ll write this down to see if I can clarify it for me, and maybe for you, too.
Baremetal servers change. The whole point of Ironic is to control the change of Baremetal servers from inanimate pieces of metal to “really useful engines.” This needs to happen in a controlled and unsurprising way.
Ironic the server does what it is told. If a new piece of metal starts sending out DHCP requests, Ironic is going to PXE boot it. This is the start of this new piece of metal’s journey of self discovery. At least as far as Ironic is concerned.
But really, someone had to rack and wire said piece of metal. Likely the person that did this is not the person that is going to run workloads on it in the end. They might not even work for the same company; they might be a delivery person from Dell or Supermicro. So, once they are done with it, they don’t own it any more.
Who does? Who owns a piece of metal before it is enrolled in the OpenStack baremetal service?
No one. It does not exist.
Ok, so lets go back to someone pushing the button, booting our server for the first time, and it doing its PXE boot thing.
Or, we get the MAC address and enter that into the ironic database, so that when it does boot, we know about it.
Either way, Ironic is really the playground monitor, just making sure it plays nice.
What if Ironic is a multi-tenant system? Someone needs to be able to transfer the baremetal server from where ever it lands up front to the people that need to use it.
I suspect that transferring metal from project to project is going to be one of the main use cases after the sun has set on day one.
So, who should be allowed to say what project a piece of baremetal can go to?
Well, in Keystone, we have the idea of hierarchy. A Project is owned by a domain, and a project can be nested inside another project.
But this information is not passed down to Ironic. There is no way to get a token for a project that shows its parent information. But a remote service could query the project hierarchy from Keystone.
Say I want to transfer a piece of metal from one project to another. Should I have a token for the source project or the destination project? OK, dumb question; I should definitely have a token for the source project. The smart question is whether I should also have a token for the destination project.
Sure, why not. Two tokens. One has the “delete” role and one has the “create” role.
The only problem is that nothing like this exists in OpenStack. But it should.
We could fake it with hierarchy; I can pass things up and down the project tree. But that really does not do one bit of good. People don’t really use the tree like that. They should. We built a perfectly nice tree and they ignore it. Poor, ignored, sad, lonely tree.
Actually, it has no feelings. Please stop anthropomorphising the tree.
What you could do is create the destination object, kind of a potential piece-of-metal or metal-receiver. This receiver object gets a UUID. You pass this UUID to the “move” API. But you call the move API with a token for the source project. The move is done atomically. Let’s call this thing identified by a UUID a move-request.
The order of operations could be done in reverse. The operator could create the move request on the source, and then pass that to the receiver. This might actually make more sense, as you need to know about the object before you can even think to move it.
Both workflows seem to have merit.
And…this concept seems to be something that OpenStack needs in general.
In fact, why shouldn’t this be a generic API? I mean, it would have to be per service, but the same API could be used to transfer VMs between projects in Nova and volumes between projects in Cinder. The API would have two verbs: one for creating a new move request, and one for accepting it.
POST /thingy/v3.14/resource?resource_id=abcd&destination=project_id
If this is called with a token, it needs to be scoped. If it is scoped to the project_id in the API, it creates a receiving type request. If it is scoped to the project_id that owns the resource, it is a sending type request. Either way, it returns a URL. Call GET on that URL and you get information about the transfer. Call PATCH on it with the appropriately scoped token, and the resource is transferred. The PATCH might also require enough information to prove that you know what you are doing: maybe you have to specify the source and target projects in that request.
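To make the shape of that API a little more concrete, here is a purely hypothetical sketch of the flow using python-requests; nothing like this endpoint exists today, and the service URL, ids, and tokens are all invented, as is the assumption that the new transfer’s URL comes back in the Location header:
import requests

# Everything here is invented for illustration; no such API exists yet.
SERVICE = "https://ironic.example.com/thingy/v3.14"
RESOURCE_ID = "abcd"
SOURCE_PROJECT_ID = "source-project-id"
SOURCE_PROJECT_TOKEN = "gAAAA...source"
DEST_PROJECT_ID = "dest-project-id"
DEST_PROJECT_TOKEN = "gAAAA...dest"

# 1. The source project creates a move request for one of its resources.
resp = requests.post(
    f"{SERVICE}/resource",
    params={"resource_id": RESOURCE_ID, "destination": DEST_PROJECT_ID},
    headers={"X-Auth-Token": SOURCE_PROJECT_TOKEN},
)
transfer_url = resp.headers["Location"]  # assumed location of the move-request

# 2. Either side can GET the transfer to see its state.
print(requests.get(transfer_url, headers={"X-Auth-Token": DEST_PROJECT_TOKEN}).json())

# 3. The destination project accepts it, naming both projects as proof
#    that it knows what it is accepting.
requests.patch(
    transfer_url,
    headers={"X-Auth-Token": DEST_PROJECT_TOKEN},
    json={"source_project": SOURCE_PROJECT_ID, "destination_project": DEST_PROJECT_ID},
)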
A foolish consistency is the hobgoblin of little minds.
Edit: OK, this is not a new idea. Cinder went through the same thought process according to Duncan Thomas. The result is this API: https://docs.openstack.org/api-ref/block-storage/v3/index.html#volume-transfer
Which looks like it then morphed to this one:
https://docs.openstack.org/api-ref/block-storage/v3/index.html#volume-transfers-volume-transfers-3-55-or-later
I've had to re-teach myself how to do this so I'm writing my own notes.
Prerequisites:
Once you have your environment ready, run a test with the name from step 3.
Some tests in CI are configured to use `--skip-tags`. You can do this for your local tests too by setting the appropriate environment variables. For example:
./scripts/run-local-test tripleo_derived_parameters
export TRIPLEO_JOB_ANSIBLE_ARGS="--skip-tags run_ceph_ansible,run_uuid_ansible,ceph_client_rsync,clean_fetch_dir"
./scripts/run-local-test tripleo_ceph_run_ansible
Look back at our Pushing Keystone over the Edge presentation from the OpenStack Summit. Many of the points we make are problems faced by any application trying to scale across multiple datacenters. Cassandra is a database designed to deal with this level of scale. So Cassandra may well be a better choice than MySQL or other RDBMS as a datastore to Keystone. What would it take to enable Cassandra support for Keystone?
Let’s start with the easy part: defining the tables. Let’s look at how we define the Federation back end for SQL. We use SQL Alchemy to handle the migrations: we will need something comparable for Cassandra Query Language (CQL) but we also need to translate the table definitions themselves.
Before we create the tables, we need to create a keyspace. I am going to make separate keyspaces for each of the subsystems in Keystone: Identity, Assignment, Federation, and so on. Here’s the Federated one:
CREATE KEYSPACE keystone_federation WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': '3'} AND durable_writes = true;
The Identity provider table is defined like this:
idp_table = sql.Table(
'identity_provider',
meta,
sql.Column('id', sql.String(64), primary_key=True),
sql.Column('enabled', sql.Boolean, nullable=False),
sql.Column('description', sql.Text(), nullable=True),
mysql_engine='InnoDB',
mysql_charset='utf8')
idp_table.create(migrate_engine, checkfirst=True)
The comparable CQL to create a table would look like this:
CREATE TABLE identity_provider (id text PRIMARY KEY , enables boolean , description text);
However, when I describe the schema to view the table definition, we see that there are many tuning and configuration parameters that are defaulted:
CREATE TABLE federation.identity_provider (
id text PRIMARY KEY,
description text,
enables boolean
) WITH additional_write_policy = '99p'
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
AND cdc = false
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
AND crc_check_chance = 1.0
AND default_time_to_live = 0
AND extensions = {}
AND gc_grace_seconds = 864000
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair = 'BLOCKING'
AND speculative_retry = '99p';
I don’t know Cassandra well enough to say if these are sane defaults to have in production. I do know that someone, somewhere, is going to want to tweak them, and we are going to have to provide a means to do so without battling the upgrade scripts. I suspect we are going to want to only use the short form (what I typed into the CQL prompt) in the migrations, not the form with all of the options. In addition, we might want an if not exists clause on the table creation to allow people to make these changes themselves. Then again, that might make things get out of sync. Hmmm.
There are three more entities in this back end:
CREATE TABLE federation_protocol (id text, idp_id text, mapping_id text, PRIMARY KEY(id, idp_id) );
cqlsh:federation> CREATE TABLE mapping (id text primary key, rules text, );
CREATE TABLE service_provider ( auth_url text, id text primary key, enabled boolean, description text, sp_url text, RELAY_STATE_PREFIX text);
One thing that is interesting is that we will not be limiting the ID fields to 32, 64, or 128 characters. There is no performance benefit to doing so in Cassandra, nor is there any way to enforce the length limits. From a Keystone perspective, there is not much value either; we still need to validate the UUIDs in Python code. We could autogenerate the UUIDs in Cassandra, and there might be some benefit to that, but it would diverge from the logic in the Keystone code, and explode the test matrix.
There is only one foreign key in the SQL section; the federation protocol has an idp_id that points to the identity provider table. We’ll have to accept this limitation and ensure the integrity is maintained in code. We can do this by looking up the Identity provider before inserting the protocol entry. Since creating a Federated entity is a rare and administrative task, the risk here is vanishingly small. It will be more significant elsewhere.
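As a rough sketch of what “maintained in code” might look like (this is not Keystone code; the hard-coded contact point, keyspace wiring, and exception type are my own placeholders), using the DataStax cassandra-driver against the tables defined above:
from cassandra.cluster import Cluster

# Placeholder wiring; a real driver would get its session from
# configuration rather than a hard-coded contact point.
session = Cluster(["127.0.0.1"]).connect("keystone_federation")


class IdentityProviderNotFound(Exception):
    pass


def create_protocol(protocol_id, idp_id, mapping_id):
    # Emulate the missing foreign key: refuse to insert a protocol
    # whose idp_id does not reference an existing identity provider.
    row = session.execute(
        "SELECT id FROM identity_provider WHERE id = %s", (idp_id,)
    ).one()
    if row is None:
        raise IdentityProviderNotFound(idp_id)
    session.execute(
        "INSERT INTO federation_protocol (id, idp_id, mapping_id) "
        "VALUES (%s, %s, %s)",
        (protocol_id, idp_id, mapping_id),
    )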
For access to the database, we should probably use Flask-CQLAlchemy. Fortunately, Keystone is already a Flask based project, so this makes the two projects align.
For migration support, it looks like the best option out there is cassandra-migrate.
An effort like this would best be started out of tree, with an expectation that it would be merged in once it had shown a degree of maturity. Thus, I would put it into a namespace that would not conflict with the existing keystone project. The python imports would look like:
from keystone.cassandra import migrations
from keystone.cassandra import identity
from keystone.cassandra import federation
This could go in its own git repo and be separately pip installed for development. The entrypoints would be registered such that the configuration file would have entries like:
[application_credential]
driver = cassandra
Any tuning of the database could be put under a [cassandra] section of the conf file, or tuning for individual sections could be in keys prefixed with cassandra_ in the appropriate sections, such as [application_credential] as shown above.
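For the entry point registration, I imagine something along these lines in the out-of-tree package’s setup.py; the entry point group and class path here are my guesses at the wiring, not something that exists today:
from setuptools import setup, find_packages

setup(
    name='keystone-cassandra',
    version='0.0.1',
    packages=find_packages(),
    entry_points={
        # Hypothetical: register the driver so that
        # "driver = cassandra" in the [application_credential]
        # section resolves to our implementation.
        'keystone.application_credential': [
            'cassandra = keystone.cassandra.application_credential:ApplicationCredential',
        ],
    },
)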
It might be interesting to implement a Cassandra token backend and use the default_time_to_live value on the table to control the lifespan and automate the cleanup of the tables. This might provide some performance benefit over the fernet approach, as the token data would be cached. However, the drawbacks due to token invalidation upon change of data would far outweigh the benefits unless the TTL was very short, perhaps 5 minutes.
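As a sketch of that idea (the keyspace, table, and column names are invented here), the write path would not even need to pass a TTL explicitly, because the table-level default_time_to_live does the work:
from cassandra.cluster import Cluster

session = Cluster(['127.0.0.1']).connect('keystone')

# Assumes a table created with something like:
#   CREATE TABLE token (id text PRIMARY KEY, payload text)
#   WITH default_time_to_live = 300;
def save_token(token_id, payload):
    session.execute(
        "INSERT INTO token (id, payload) VALUES (%s, %s)",
        (token_id, payload))
    # Cassandra expires the row on its own once the table TTL
    # (five minutes here) elapses, so no cleanup job is needed.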
Just making it work is one thing. In a follow-on article, I’d like to go through what it would take to stretch a cluster from one datacenter to another, and to make sure that the other considerations that we discussed in that presentation are covered.
Feedback?
RDO Victoria Released
The RDO community is pleased to announce the general availability of the RDO build for OpenStack Victoria for RPM-based distributions, CentOS Linux and Red Hat Enterprise Linux. RDO is suitable for building private, public, and hybrid clouds. Victoria is the 22nd release from the OpenStack project, which is the work of more than 1,000 contributors from around the world.
The release is already available on the CentOS mirror network at http://mirror.centos.org/centos/8/cloud/x86_64/openstack-victoria/.
The RDO community project curates, packages, builds, tests and maintains a complete OpenStack component set for RHEL and CentOS Linux and is a member of the CentOS Cloud Infrastructure SIG. The Cloud Infrastructure SIG focuses on delivering a great user experience for CentOS Linux users looking to build and maintain their own on-premise, public or hybrid clouds.
All work on RDO and on the downstream release, Red Hat OpenStack Platform, is 100% open source, with all code changes going upstream first.
PLEASE NOTE: RDO Victoria provides packages for CentOS 8 and Python 3 only. Please use the Train release for CentOS 7 and Python 2.7.
Interesting things in the Victoria release include:
Other highlights of the broader upstream OpenStack project may be read via https://releases.openstack.org/victoria/highlights.
Contributors
During the Victoria cycle, we saw the following new RDO contributors:
Amy Marrich (spotz)
Daniel Pawlik
Douglas Mendizábal
Lance Bragstad
Martin Chacon Piza
Paul Leimer
Pooja Jadhav
Qianbiao NG
Rajini Karthik
Sandeep Yadav
Sergii Golovatiuk
Steve Baker
Welcome to all of you and Thank You So Much for participating!
But we wouldn’t want to overlook anyone. A super massive Thank You to all 58 contributors who participated in producing this release. This list includes commits to rdo-packages, rdo-infra, and redhat-website repositories:
Adam Kimball
Ade Lee
Alan Pevec
Alex Schultz
Alfredo Moralejo
Amol Kahat
Amy Marrich (spotz)
Arx Cruz
Bhagyashri Shewale
Bogdan Dobrelya
Cédric Jeanneret
Chandan Kumar
Damien Ciabrini
Daniel Pawlik
Dmitry Tantsur
Douglas Mendizábal
Emilien Macchi
Eric Harney
Francesco Pantano
Gabriele Cerami
Gael Chamoulaud
Gorka Eguileor
Grzegorz Grasza
Harald Jensås
Iury Gregory Melo Ferreira
Jakub Libosvar
Javier Pena
Joel Capitao
Jon Schlueter
Lance Bragstad
Lon Hohberger
Luigi Toscano
Marios Andreou
Martin Chacon Piza
Mathieu Bultel
Matthias Runge
Michele Baldessari
Mike Turek
Nicolas Hicher
Paul Leimer
Pooja Jadhav
Qianbiao.NG
Rabi Mishra
Rafael Folco
Rain Leander
Rajini Karthik
Riccardo Pittau
Ronelle Landy
Sagi Shnaidman
Sandeep Yadav
Sergii Golovatiuk
Slawek Kaplonski
Soniya Vyas
Sorin Sbarnea
Steve Baker
Tobias Urdin
Wes Hayutin
Yatin Karel
The Next Release Cycle
At the end of one release, focus shifts immediately to the next release, Wallaby.
Get Started
There are three ways to get started with RDO.
To spin up a proof of concept cloud, quickly, and on limited hardware, try an All-In-One Packstack installation. You can run RDO on a single node to get a feel for how it works.
For a production deployment of RDO, use TripleO and you’ll be running a production cloud in short order.
Finally, for those that don’t have any hardware or physical resources, there’s the OpenStack Global Passport Program. This is a collaborative effort between OpenStack public cloud providers to let you experience the freedom, performance and interoperability of open source infrastructure. You can quickly and easily gain access to OpenStack infrastructure via trial programs from participating OpenStack public cloud providers around the world.
Get Help
The RDO Project has the users@lists.rdoproject.org mailing list for RDO-specific users and operators. For more developer-oriented content, we recommend joining the dev@lists.rdoproject.org mailing list. Remember to post a brief introduction about yourself and your RDO story. The mailing list archives are all available at https://mail.rdoproject.org. You can also find extensive documentation on RDOproject.org.
The #rdo channel on Freenode IRC is also an excellent place to find and give help.
We also welcome comments and requests on the CentOS devel mailing list and the CentOS and TripleO IRC channels (#centos, #centos-devel, and #tripleo on irc.freenode.net), however we have a more focused audience within the RDO venues.
Get Involved
To get involved in the OpenStack RPM packaging effort, check out the RDO contribute pages, peruse the CentOS Cloud SIG page, and inhale the RDO packaging documentation.
Join us in #rdo and #tripleo on the Freenode IRC network and follow us on Twitter @RDOCommunity. You can also find us on Facebook and YouTube.
TheJulia was kind enough to update the docs for Ironic to show me how to include IPMI information when creating nodes.
First, I cleared out the nodes I had registered previously:
for UUID in `openstack baremetal node list -f json | jq -r '.[] | .UUID' ` ; do openstack baremetal node delete $UUID; done
I removed the ipmi common data from each definition as there is a password there, and I will set that afterwards on all nodes.
{
"nodes": [
{
"ports": [
{
"address": "00:21:9b:93:d0:90"
}
],
"name": "zygarde",
"driver": "ipmi",
"driver_info": {
"ipmi_address": "192.168.123.10"
}
},
{
"ports": [
{
"address": "00:21:9b:9b:c4:21"
}
],
"name": "umbreon",
"driver": "ipmi",
"driver_info": {
"ipmi_address": "192.168.123.11"
}
},
{
"ports": [
{
"address": "00:21:9b:98:a3:1f"
}
],
"name": "zubat",
"driver": "ipmi",
"driver_info": {
"ipmi_address": "192.168.123.12"
}
}
]
}
openstack baremetal create ./nodes.ipmi.json
$ openstack baremetal node list
+--------------------------------------+---------+---------------+-------------+--------------------+-------------+
| UUID | Name | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+---------+---------------+-------------+--------------------+-------------+
| 3fa4feae-0d5c-4e38-a012-29258d40651b | zygarde | None | None | enroll | False |
| 00965ad4-c972-46fa-948a-3ce87aecf5ac | umbreon | None | None | enroll | False |
| 8702ea0c-aa10-4542-9292-3b464fe72036 | zubat | None | None | enroll | False |
+--------------------------------------+---------+---------------+-------------+--------------------+-------------+
for UUID in `openstack baremetal node list -f json | jq -r '.[] | .UUID' ` ;
do openstack baremetal node set $UUID --driver-info ipmi_password=`cat ~/ipmi.password` --driver-info ipmi_username=admin ;
done
EDIT: I had ipmi_user before and it does not work. Needs to be ipmi_username.
And if I look in the returned data for the definition, we see the password is not readable:
$ openstack baremetal node show zubat -f yaml | grep ipmi_password
ipmi_password: '******'
for UUID in `openstack baremetal node list -f json | jq -r '.[] | .UUID' ` ; do openstack baremetal node power on $UUID ; done
Change “on” to “off” to power off.
“I can do any thing. I can’t do everything.”
The sheer number of projects and problem domains covered by OpenStack was overwhelming. I never learned several of the other projects under the big tent. One project that is getting relevant to my day job is Ironic, the bare metal provisioning service. Here are my notes from spelunking the code.
I want just Ironic. I don’t want Keystone (personal grudge) or Glance or Neutron or Nova.
Ironic will write files to e.g. /var/lib/tftp and /var/www/html/pxe and will not handle DHCP, but can make use of static DHCP configurations.
Ironic is just an API server at this point (a Python-based web service) that manages the above files, and that can also talk to the IPMI ports on my servers to wake them up and perform configurations on them.
I need to provide ISO images to Ironic so it can put them in the right place for the servers to boot from.
I checked the code out of git. I am working off the master branch.
I ran tox to ensure the unit tests are all at 100%.
I have mysql already installed and running, but with a Keystone Database. I need to make a new one for ironic. The database name, user, and password are all going to be ironic, to keep things simple.
CREATE USER 'ironic'@'localhost' IDENTIFIED BY 'ironic';
create database ironic;
GRANT ALL PRIVILEGES ON ironic.* TO 'ironic'@'localhost';
FLUSH PRIVILEGES;
Note that I did this as the Keystone user. That dude has way too much privilege… good thing this is JUST for DEVELOPMENT. This will be used to follow the steps in the developer quickstart docs. I also set the MySQL URL in the config file to this:
connection = mysql+pymysql://ironic:ironic@localhost/ironic
Then I can run the Ironic db sync. Let’s see what I got:
mysql ironic --user ironic --password
#....
MariaDB [ironic]> show tables;
+-------------------------------+
| Tables_in_ironic |
+-------------------------------+
| alembic_version |
| allocations |
| bios_settings |
| chassis |
| conductor_hardware_interfaces |
| conductors |
| deploy_template_steps |
| deploy_templates |
| node_tags |
| node_traits |
| nodes |
| portgroups |
| ports |
| volume_connectors |
| volume_targets |
+-------------------------------+
15 rows in set (0.000 sec)
OK, so the first table shows that Ironic uses Alembic to manage migrations. Unlike the SQLAlchemy migrations table, you can’t just query this table to see how many migrations have been performed:
MariaDB [ironic]> select * from alembic_version;
+--------------+
| version_num |
+--------------+
| cf1a80fdb352 |
+--------------+
1 row in set (0.000 sec)
The script to start the API server is:
ironic-api -d --config-file etc/ironic/ironic.conf.local
Looking in the file requirements.txt, I see that the web framework for Ironic is Pecan:
$ grep pecan requirements.txt
pecan!=1.0.2,!=1.0.3,!=1.0.4,!=1.2,>=1.0.0 # BSD
This is new to me. On Keystone, we converted from no framework to Flask. I’m guessing that if I look in the chain that starts with the ironic-api file, I will see a Pecan launcher for a web application. We can find that file with:
$ which ironic-api
/opt/stack/ironic/.tox/py3/bin/ironic-api
Looking in that file, it references ironic.cmd.api, which is the file ironic/cmd/api.py which in turn refers to ironic/common/wsgi_service.py. This in turn refers to ironic/api/app.py from which we can finally see that it imports pecan.
Now I am ready to run the two services. Like most of OpenStack, there is an API server and a “worker” server. In Ironic, this is called the Conductor. This maps fairly well to the Operator pattern in Kubernetes. In this pattern, the user makes changes to the API server via a web VERB on a URL, possibly with a body. These changes represent a desired state. The state change is then performed asynchronously. In OpenStack, the asynchronous communication is performed via a message queue, usually RabbitMQ. The Ironic team has a simpler mechanism used for development: JSON RPC. This happens to be the same mechanism used in FreeIPA.
OK, once I got the service running, I had to do a little fiddling around to get the command lines to work. There was an old reference to
OS_AUTH_TYPE=token_endpoint
which needed to be replaced with
OS_AUTH_TYPE=none
Both are in the documentation, but only the second one will work.
I can run the following commands:
$ baremetal driver list
+---------------------+----------------+
| Supported driver(s) | Active host(s) |
+---------------------+----------------+
| fake-hardware | ayoungP40 |
+---------------------+----------------+
$ baremetal node list
Let’s see if I can figure out from curl what APIs those are… There is only one version, and one link, so:
curl http://127.0.0.1:6385 | jq '.versions | .[] | .links | .[] | .href'
"http://127.0.0.1:6385/v1/"
Doing a curl against that link gives a list of the top-level resources:
And I assume that, if I use curl to GET the drivers, I should see the fake driver entry from above:
$ curl "http://127.0.0.1:6385/v1/drivers" | jq '.drivers |.[] |.name'
"fake-hardware"
OK, that is enough to get started. I am going to try and do the same with the RPMs that we ship with OSP and see what I get there.
But that is a tale for another day.
I had a conversation with Julia Kreger, a long-time core member of the Ironic project, which helped get me oriented.
I found the following error from gpgv
to be a little opaque:
gpgv: unknown type of key resource 'trustedkeys.kbx'
gpgv: keyblock resource '/home/lars/.gnupg/trustedkeys.kbx': General error
gpgv: Can't check signature: No public key
It turns out that’s gpg-speak for “your trustedkeys.kbx
keyring doesn’t
exist”. That took longer to figure out than I care to admit. To get a key
from your regular public keyring into your trusted keyring, you can run
something like the following:
gpg --export -a lars@oddbit.com |
gpg --no-default-keyring --keyring ~/.gnupg/trustedkeys.kbx --import
After which gpgv
works as expected:
$ echo hello world | gpg -s -u lars@oddbit.com | gpgv
gpgv: Signature made Mon 05 Oct 2020 07:44:22 PM EDT
gpgv: using RSA key FDE8364F7FEA3848EF7AD3A6042DF6CF74E4B84C
gpgv: issuer "lars@oddbit.com"
gpgv: Good signature from "Lars Kellogg-Stedman <lars@oddbit.com>"
gpgv: aka "keybase.io/larsks <larsks@keybase.io>"
Out of the box, OpenShift (4.x) on bare metal doesn’t come with any integrated load balancer support (when installed in a cloud environment, OpenShift typically makes use of the load balancing features available from the cloud provider). Fortunately, there are third party solutions available that are designed to work in bare metal environments. MetalLB is a popular choice, but requires some minor fiddling to get it to run properly on OpenShift.
If you read through the installation instructions, you will see this note about installation on OpenShift:
To run MetalLB on Openshift, two changes are required: changing the pod UIDs, and granting MetalLB additional networking privileges.
Pods get UIDs automatically assigned based on an OpenShift-managed UID range, so you have to remove the hardcoded unprivileged UID from the MetalLB manifests. You can do this by removing the spec.template.spec.securityContext.runAsUser field from both the controller Deployment and the speaker DaemonSet.
Additionally, you have to grant the speaker DaemonSet elevated privileges, so that it can do the raw networking required to make LoadBalancers work. You can do this with:
The docs here suggest some manual changes you can make, but it’s possible to get everything installed correctly using Kustomize (which makes sense especially given that the MetalLB docs already include instructions on using Kustomize).
A vanilla installation of MetalLB with Kustomize uses a kustomization.yml
file that looks like this:
namespace: metallb-system
resources:
- github.com/metallb/metallb//manifests?ref=v0.9.3
- configmap.yml
- secret.yml
(Where configmap.yml
and secret.yml
are files you create locally
containing, respectively, the MetalLB configuration and a secret used to
authenticate cluster members.)
In order to remove the runAsUser
directive from the template
securityContext
setting, we can use the patchesStrategicMerge
feature. In our kustomization.yml
file we add:
patches:
- |-
apiVersion: apps/v1
kind: Deployment
metadata:
name: controller
namespace: metallb-system
spec:
template:
spec:
securityContext:
$patch: replace
runAsNonRoot: true
This instructs kustomize
to replace the contents of the securityContext
key with the value included in the patch (without the $patch: replace
directive, the default behavior is to merge the contents, which in this
situation would effectively be a no-op).
We can accomplish the same thing using jsonpatch syntax. In this case, we would write:
patches:
- target:
kind: Deployment
name: controller
namespace: metallb-system
patch: |-
- op: remove
path: /spec/template/spec/securityContext/runAsUser
With either solution, the final output includes a securityContext
setting
that looks like this:
spec:
template:
spec:
securityContext:
runAsNonRoot: true
The MetalLB docs suggest running:
oc adm policy add-scc-to-user privileged -n metallb-system -z speaker
But we can configure the same privilege level by setting up an appropriate role binding as part of our Kustomize manifests.
First, we create an allow-privileged
cluster role by adding the following
manifest in clusterrole.yml
:
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: allow-privileged
rules:
- apiGroups:
- security.openshift.io
resourceNames:
- privileged
resources:
- securitycontextconstraints
verbs:
- use
Then we bind the speaker
service account to the allow-privileged
role
by adding a ClusterRoleBinding
in rolebinding.yml
:
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: metallb-allow-privileged
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: allow-privileged
subjects:
- kind: ServiceAccount
name: speaker
namespace: metallb-system
You will need to add these new manifests to your kustomization.yml
, which
should now look like:
namespace: metallb-system
resources:
- github.com/metallb/metallb//manifests?ref=v0.9.3
- configmap.yml
- secret.yml
- clusterrole.yml
- rolebinding.yml
patches:
- target:
kind: Deployment
name: controller
namespace: metallb-system
patch: |-
- op: remove
path: /spec/template/spec/securityContext/runAsUser
The changes described here will result in a successful MetalLB deployment into your OpenShift environment.
I’ve had my eye on the Vortex Core keyboard for a few months now, and this past week I finally broke down and bought one (with Cherry MX Brown switches). The Vortex Core is a 40% keyboard, which means it consists primarily of letter keys, a few lonely bits of punctuation, and several modifier keys to activate different layers on the keyboard.
It’s a really cute keyboard. I’m a big fan of MX brown switches, and this keyboard is really a joy to type on, at least when you’re working primarily with the alpha keys. I’m still figuring out where some of the punctuation is, and with a few exceptions I haven’t yet spent time trying to remap things into more convenient positions.
The keyboard feels solid. I’m a little suspicious of the micro-usb connector; it feels a little wobbly. I wish that it was USB-C and I wish it felt a little more stable.
Here’s a picture of my Core next to my Durgod K320:
The keyboard first came out in 2017, and if you read reviews that came out around that time you’ll find several complaints around limitations in the keyboard’s programming features, in particular:
And so forth. Fortunately, at some point (maybe 2018) Vortexgear released updated firmware that resolves all of the above issues, and introduces a completely new way of programming the keyboard.
Originally, the keyboard was programmed entirely via the keyboard itself: there was a key combination to activate programming mode in each of the three programmable layers, and this allowed you to freely remap keys. Unfortunately, this made it difficult to share layouts, and made extensive remapping rather unwieldy.
The updated firmware ("CORE_MPC
") does away with the hardware
programming, and instead introduces both a web UI for generating keyboard
layouts and a simple mechanism for pushing those layouts to the keyboard that
is completely operating system independent (which is nice if you’re a Linux
user and are tired of having to spin up a Windows VM just to run someone’s
firmware programming tool). With the new firmware, you hold down Fn-d
when
booting the keyboard and it will present a FAT-format volume to the operating
system. Drag your layout to the volume, unmount it, and reboot the keyboard and
you’re all set (note that you will still need to spin up that Windows VM
one last time in order to install the firmware update).
The Vortexgear keyboard configurator is available at http://www.vortexgear.tw/mpc/index.html, but you’re going to want to use https://tsfreddie.github.io/much-programming-core/ instead, which removes several limitations that are present in the official tool.
Because the new configurator (a) allows you to remap all layers, including
layer 0, and (b) allows you to create mappings for the Pn
key, you have a lot
of flexibility in how you set up your mappings.
I performed some limited remapping of layer 0:
I’ve moved the Fn1
key to the right space bar, and turned the original
Fn1
key into the quote key. I use that enough in general writing that
it’s convenient to be able to access it without using a modifier.
I’ve set up a cursor cluster using the Pn
key. This gets me the
standard WASD
keys for arrows, and Q
and E
for page up and page
down.
Holding down the Pn
key also gets me a numeric keypad on the right side
of the keyboard.
It’s a fun keyboard. I’m not sure it’s going to become my primary keyboard, especially for writing code, but I’m definitely happy with it.
At work we have a cluster of IBM Power 9 systems running OpenShift. The problem with this environment is that nobody runs Power 9 on their desktop, and Docker Hub only offers automatic build support for the x86 architecture. This means there are no convenient options for building Power 9 Docker images…or so I thought.
It turns out that Docker provides GitHub actions that make the process of producing multi-architecture images quite simple.
The code demonstrated in this post can be found in my hello-flask GitHub repository.
There is some information we need to provide to our workflow that we don’t want to hardcode into configuration files, both for reasons of security (we don’t want to expose passwords in the repository) and convenience (we want other people to be able to fork this repository and run the workflow without needing to make any changes to the code).
We can do this by configuring “secrets” in the repository on GitHub. You
can configure secrets by visiting the “Secrets” tab in your repository
settings (https://github.com/<USERNAME>/<REPOSITORY>/settings/secrets
).
For this workflow, we’re going to need two secrets:
DOCKER_USERNAME
– this is our Docker Hub username; we’ll need this
both for authentication and to set the namespace for the images we’re
building.
DOCKER_PASSWORD
– this is our Docker Hub password, used for
authentication.
Within a workflow, we can refer to these secrets using syntax like ${{ secrets.DOCKER_USERNAME }}
(you’ll see examples of this later on).
In the repository containing your Dockerfile
, create a
.github/workflows
directory. This is where we will place the files that
configure GitHub actions. In this directory, create a file called
build_images.yml
(the particular name isn’t important, but it’s nice to
make names descriptive).
We’ll first give this workflow a name and configure it to run for pushes on
our master
branch by adding the following to our build_images.yml
file:
---
name: 'build images'
on:
push:
branches:
- master
With that boilerplate out of the way, we can start configuring the jobs
that will comprise our workflow. Jobs are defined in the jobs
section of
the configuration file, which is a dictionary that maps job names to their
definition. A job can have multiple actions. For this example, we’re going
to set up a docker
job that will perform the following steps:
We start by providing a name for our job and configuring the machine on
which the jobs will run. In this example, we’re using ubuntu-latest
;
other options include some other Ubuntu variants, Windows, and MacOS (and
you are able to host your own custom builders, but that’s outside the scope
of this article).
jobs:
docker:
runs-on: ubuntu-latest
steps:
In our first step, we use the standard actions/checkout action to check out the repository:
- name: Checkout
uses: actions/checkout@v2
The next step is a simple shell script that sets some output parameters we will be able to consume in subsequent steps. A script can set parameters by generating output in the form:
::set-output name=<name>::<value>
In other steps, we can refer to these parameters using the syntax
${{ steps.<step_name>.outputs.<name> }} (e.g. ${{ steps.prep.outputs.tags }}).
We’re going to use this step to set things like the image name (using our
DOCKER_USERNAME
secret to set the namespace), and to set up several tags
for the image:
the git tag (with the leading v stripped), when building from a tag, or latest otherwise
the short commit id
latest, when the version looks like a release number
Note that here we’re assuming that git tags are of the form v1.0, so we strip off that initial v to get a Docker tag that is just the version number.
- name: Prepare
id: prep
run: |
DOCKER_IMAGE=${{ secrets.DOCKER_USERNAME }}/${GITHUB_REPOSITORY#*/}
VERSION=latest
SHORTREF=${GITHUB_SHA::8}
# If this is git tag, use the tag name as a docker tag
if [[ $GITHUB_REF == refs/tags/* ]]; then
VERSION=${GITHUB_REF#refs/tags/v}
fi
TAGS="${DOCKER_IMAGE}:${VERSION},${DOCKER_IMAGE}:${SHORTREF}"
# If the VERSION looks like a version number, assume that
# this is the most recent version of the image and also
# tag it 'latest'.
if [[ $VERSION =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]; then
TAGS="$TAGS,${DOCKER_IMAGE}:latest"
fi
# Set output parameters.
echo ::set-output name=tags::${TAGS}
echo ::set-output name=docker_image::${DOCKER_IMAGE}
The docker/setup-qemu action installs QEMU static binaries, which are used to run builders for architectures other than the host.
- name: Set up QEMU
uses: docker/setup-qemu-action@master
with:
platforms: all
The docker/setup-buildx action configures buildx, which is a Docker CLI plugin that provides enhanced build capabilities. This is the infrastructure that the following step will use for actually building images.
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@master
In order to push images to Docker Hub, we use the docker/login-action
action to authenticate. This uses the DOCKER_USERNAME
and
DOCKER_PASSWORD
secrets we created earlier in order to establish
credentials for use in subsequent steps.
- name: Login to DockerHub
if: github.event_name != 'pull_request'
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
This final step uses the docker/build-push-action action to build the images
and push them to Docker Hub using the tags we defined in the prep
step.
In this example, we’re building images for amd64
, arm64
, and ppc64le
architectures.
- name: Build
uses: docker/build-push-action@v2
with:
builder: ${{ steps.buildx.outputs.name }}
context: .
file: ./Dockerfile
platforms: linux/amd64,linux/arm64,linux/ppc64le
push: true
tags: ${{ steps.prep.outputs.tags }}
When we put all of the above together, we get:
---
name: 'build images'
on:
push:
branches:
- master
jobs:
docker:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Prepare
id: prep
run: |
DOCKER_IMAGE=${{ secrets.DOCKER_USERNAME }}/${GITHUB_REPOSITORY#*/}
VERSION=latest
SHORTREF=${GITHUB_SHA::8}
# If this is git tag, use the tag name as a docker tag
if [[ $GITHUB_REF == refs/tags/* ]]; then
VERSION=${GITHUB_REF#refs/tags/v}
fi
TAGS="${DOCKER_IMAGE}:${VERSION},${DOCKER_IMAGE}:${SHORTREF}"
# If the VERSION looks like a version number, assume that
# this is the most recent version of the image and also
# tag it 'latest'.
if [[ $VERSION =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]]; then
TAGS="$TAGS,${DOCKER_IMAGE}:latest"
fi
# Set output parameters.
echo ::set-output name=tags::${TAGS}
echo ::set-output name=docker_image::${DOCKER_IMAGE}
- name: Set up QEMU
uses: docker/setup-qemu-action@master
with:
platforms: all
- name: Set up Docker Buildx
id: buildx
uses: docker/setup-buildx-action@master
- name: Login to DockerHub
if: github.event_name != 'pull_request'
uses: docker/login-action@v1
with:
username: ${{ secrets.DOCKER_USERNAME }}
password: ${{ secrets.DOCKER_PASSWORD }}
- name: Build
uses: docker/build-push-action@v2
with:
builder: ${{ steps.buildx.outputs.name }}
context: .
file: ./Dockerfile
platforms: linux/amd64,linux/arm64,linux/ppc64le
push: true
tags: ${{ steps.prep.outputs.tags }}
You can grab the hello-flask repository and try this out yourself.
You’ll need to set up the secrets described earlier in this article, but
then for each commit to the master
branch you will end up with a new image,
tagged both as latest
and with the short git commit id.
We can use the docker manifest inspect
command to inspect the output of
the build step. In the output below, you can see the images built for our
three target architectures:
$ docker manifest inspect !$
docker manifest inspect larsks/hello-flask
{
"schemaVersion": 2,
"mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
"manifests": [
{
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 3261,
"digest": "sha256:c6bab778a9fd0dc7bf167a5a49281bcd5ebc5e762ceeb06791aff8f0fbd15325",
"platform": {
"architecture": "amd64",
"os": "linux"
}
},
{
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 3261,
"digest": "sha256:3c02f36562fcf8718a369a78054750382aba5706e1e9164b76bdc214591024c4",
"platform": {
"architecture": "arm64",
"os": "linux"
}
},
{
"mediaType": "application/vnd.docker.distribution.manifest.v2+json",
"size": 3262,
"digest": "sha256:192fc9acd658edd6b7f2726f921cba2582fb1101d929800dff7fb53de951dd76",
"platform": {
"architecture": "ppc64le",
"os": "linux"
}
}
]
}
This process assumes, of course, that your base image of choice is available for your selected architectures. According to Docker:
Most of the official images on Docker Hub provide a variety of architectures. For example, the busybox image supports amd64, arm32v5, arm32v6, arm32v7, arm64v8, i386, ppc64le, and s390x.
So if you are starting from one of the official images, you’ll probably be in good shape. On the other hand, if you’re attempting to use a community image as a starting point, you might find that it’s only available for a single architecture.
Render changes to tripleo docs:
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python3 get-pip.py
pip install tox
Check syntax errors before wasting CI time
cd /home/stack/tripleo-docs
tox -e deploy-guide
Run a specific unit test
tox -e linters
tox -e pep8
cd /home/stack/tripleo-common
tox -e py36 -- tripleo_common.tests.test_inventory.TestInventory.test_get_roles_by_service
cd /home/stack/tripleo-ansible
tox -e py36 -- tripleo_ansible.tests.modules.test_derive_hci_parameters.TestTripleoDeriveHciParameters
This is part of a series of posts about my experience working with OpenShift and CNV. In this post, I’ll look at how the recently released CNV 2.4 resolves some issues in managing virtual machines that are attached directly to local layer 2 networks
In an earlier post, I discussed some issues around the management of virtual machine MAC addresses in CNV 2.3: in particular, that virtual machines are assigned a random MAC address not just at creation time but every time they boot. CNV 2.4 (re-)introduces MAC address pools to alleviate these issues. The high level description reads:
The KubeMacPool component provides a MAC address pool service for virtual machine NICs in designated namespaces.
In more specific terms, that means that if you enable MAC address
pools on a namespace, when you create virtual machine network
interfaces they will receive a MAC address from the pool. This is
associated with the VirtualMachine
resource, not the
VirtualMachineInstance
resource, which means that the MAC address
will persist across reboots.
This solves one of the major pain points of using CNV-managed virtual machines attached to host networks.
To enable MAC address pools for a given namespace, set the
mutatevirtualmachines.kubemacpool.io
label to allocate
:
oc label namespace <namespace> mutatevirtualmachines.kubemacpool.io=allocate
This is the second in a series of posts about my experience working with OpenShift and CNV. In this post, I’ll be taking a look at how to expose services on a virtual machine once you’ve got it up and running.
Networking seems to be a weak area for CNV right now. Out of the box, your options for exposing a service on a virtual machine on a public address at a well known port are slim.
We’re hoping to use OpenShift + CNV as an alternative to existing hypervisor platforms, primarily to reduce the number of complex, distributed projects we need to manage. If we can have a single control plane for both containerized and virtualized workloads, it seems like a win for everyone.
In order to support the most common use case for our virtualization platforms, consumers of this service need to be able to:
All of the above should be self service (that is, none of those steps should require opening a support ticket or otherwise require administrative assistance).
There are broadly two major connectivity models available to CNV managed virtual machines:
We’re going to start with the direct attachment model, since this may be familiar to people coming to CNV from other hypervisor platforms.
With a little configuration, it is possible to attach virtual machines directly to an existing layer two network.
When running CNV, you can affect the network configuration of your
OpenShift hosts by creating NodeNetworkConfigurationPolicy
objects. Support for this is provided by nmstate
, which is packaged
with CNV. For details, see “Updating node network configuration” in
the OpenShift documentation.
For example, if we want to create a bridge interface on our nodes to
permit CNV managed virtual machines to attach to the network
associated with interface eth1
, we might submit the following
configuration:
apiVersion: nmstate.io/v1alpha1
kind: NodeNetworkConfigurationPolicy
metadata:
name: br-example-policy
spec:
nodeSelector:
node-role.kubernetes.io/worker: ""
desiredState:
interfaces:
- name: br-example
type: linux-bridge
state: up
ipv4:
dhcp: true
enabled: true
bridge:
options:
stp:
enabled: false
port:
- name: eth1
This would create a Linux bridge device br-example
with interface
eth1
as a member. In order to expose this bridge to virtual
machines, we need to create a NetworkAttachmentDefinition
(which can
be abbreviated as net-attach-def
, but not as nad
for reasons that
may be obvious to English speakers or readers of Urban Dictionary).
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
name: example
namespace: default
spec:
config: >-
{
"name": "example",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "cnv-bridge",
"bridge": "br-example",
"ipam": {}
},
{
"type": "cnv-tuning"
}
]
}
Once you have the above definitions in place, it’s easy to select this network when adding interfaces to a virtual machine. Actually making use of these connections can be a little difficult.
In a situation that may remind you of some issues we had with the
installer, your virtual machine will boot with a randomly
generated MAC address. Under CNV, generated MAC addresses are
associated with VirtualMachineInstance
resources, which represents
currently running virtual machines. Your VirtualMachine
object is
effectively a template used to generate a new VirtualMachineInstance
each time it boots. Because the address is associated with the
instance, you get a new MAC address every time you boot the virtual
machine. That makes it very difficult to associate a static IP address
with your CNV managed virtual machine.
It is possible to manually assign a MAC address to the virtual machine when you create it, but now you have a bevy of new problems:
Anybody who wants to deploy a virtual machine needs to know what a MAC address looks like (you laugh, but this isn’t something people generally have to think about).
You probably need some way to track MAC address allocation to avoid
conflicts when everyone chooses DE:AD:BE:EF:CA:FE
.
Out of the box, your virtual machines can attach to the default pod
network, which is private network that provides masqueraded outbound
access and no direct inbound access. In this situation, your virtual
machine behaves much more like a container from a network perspective,
and you have access to many of the same network primitives available
to pods. You access these mechanisms by creating an OpenShift
Service
resource.
Under OpenShift, a Service
is used to “expose an application running
on a set of Pods
as a network service” (from the Kubernetes
documentation). From the perspective of OpenShift, your
virtual machine is just another application running in a Pod, so we
can use Service resources to expose applications running on your
virtual machine.
In order to manage these options, you’ll want to install the
virtctl
client. You can grab an upstream release from the
kubevirt project, or you can enable the appropriate
repositories and install the kubevirt-virtctl
package.
A NodePort
lets you expose a service on a random port associated
with the ip addresses of your OpenShift nodes. If you have a virtual
machine named test-vm-1
and you want to access the SSH service on
port 22, you can use the virtctl
command like this:
virtctl expose vm test-vm-1 --port=22 --name=myvm-ssh-np --type=NodePort
This will result in Service
that looks like:
$ oc get service myvm-ssh-np
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
myvm-ssh-np NodePort 172.30.4.25 <none> 22:31424/TCP 42s
The CLUSTER-IP
in the above output is a cluster internal IP address
that can be used to connect to your server from other containers or
virtual machines. The 22:31424/TCP
entry tells us that port 31424
on our OpenShift hosts now maps to port 22
in our virtual machine.
You can connect to your virtual machine with an ssh
command line
along the lines of:
ssh -p 31424 someuser@hostname.of.a.node
You can use the hostname of any node in your OpenShift cluster.
This is fine for testing things out, but it doesn’t allow you to expose services on a well known port, and the cluster administrator may be uncomfortable with services like this using the ip addresses of cluster hosts.
It is possible to manually assign an external ip address to an OpenShift service. For example:
virtctl expose vm test-vm-1 --port 22 --name myvm-ssh-ext --external-ip 192.168.185.18
Which results in the following service:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
myvm-ssh-ext ClusterIP 172.30.224.127 192.168.185.18 22/TCP 47s
While this sounds promising at first, there are several caveats:
The practical impact of setting an external ip on a service is to instantiate netfilter rules equivalent to the following:
-d 192.168.185.18/32 -p tcp --dport 22 -j DNAT --to-destination 10.129.2.11:22
If you configure the address 192.168.185.18
on a host interface (or
otherwise arrange for traffic to that address to reach your host),
these rules take care of directing the connection to your virtual
machine.
Historically, OpenShift was designed to run in cloud environments such
as OpenStack, AWS, Google Cloud Engine, and so forth. These platforms
provide integrated load balancer mechanisms that OpenShift was able to
leverage to expose services. Creating a LoadBalancer
service would
instruct the platform to (a) allocate an address, (b) create a load
balancer, and (c) direct traffic from the load balancer to the target
of your service.
We can request a LoadBalancer
using virtctl
like this:
virtctl expose vm test-vm-1 --port=22 --name=myvm-ssh-np --type=LoadBalancer
Unfortunately, OpenShift for baremetal hosts does not include a load
balancer out of the box. This is a shame, because the LoadBalancer
solution hits just about all of our requirements:
It automatically assigns ip addresses from a configured pool, so consumers of the environment don’t need to manage either ip- or MAC-address assignment on their own.
It doesn’t require special privileges or administrator intervention (other than for the initial configuration).
It lets you expose services on ports of your choice, rather than random ports.
There are some solutions out there that will provide an integrated load balancer implementation for your baremetal cluster. I’ve looked at:
I hope we see an integrated LoadBalancer mechanism available for OpenShift on baremetal in a near-future release.
This is the first in a series of posts about my experience working with OpenShift and CNV (“Container Native Virtualization”, a technology that allows you to use OpenShift to manage virtualized workloads in addition to the containerized workloads for which OpenShift is known). In this post, I’ll be taking a look at the installation experience, and in particular at how restrictions in our local environment interacted with the network requirements of the installer.
We’re installing OpenShift on baremetal hosts using the IPI installer. “IPI” stands for “Installer Provisioned Infrastructure”, which means that the OpenShift installer is responsible for provisioning an operating system onto your hardware and managing the system configuration. This is in contrast to UPI (“User Provisioned Infrastructure”), in which you pre-provision the hosts using whatever tools you’re comfortable with and then point the installer and the hardware once things are up and running.
In the environment I’m working with, we had a few restrictions that I suspect are relatively common:
The network we were using as our “baremetal” network (for the purposes of this article you can read that as “public” network) does not have a dynamic pool of leases. There is DHCP, but all addresses are statically assigned.
Both the installer and the Metal3 service use IPMI to manage the power of the OpenShift nodes. Access to our IPMI network requires that a static route exist on the host.
Access to the IPMI network also requires a firewall exception for the host IP address.
When you’re reading through the installer documentation, the above requirements don’t seem problematic at first. Looking at the network requirements, you’ll see that the install calls for static addressing of all the hardware involved in the install:
Reserving IP Addresses for Nodes with the DHCP Server
For the baremetal network, a network administrator must reserve a number of IP addresses, including:
Three virtual IP addresses.
1 IP address for the API endpoint
1 IP address for the wildcard ingress endpoint
1 IP address for the name server
One IP Address for the Provisioning node.
One IP address for each Control Plane (Master) node.
One IP address for each worker node.
The “provisioning node” is the host on which you run the OpenShift installer. What the documentation fails to mention is that the services that manage the install don’t actually run on the provisioning node itself: instead, the installer starts up a “bootstrap virtual machine” on the provisioning node, and manages the install from there.
The bootstrap vm is directly attached to both the baremetal and the provisioning networks. It is created with a random MAC address, and relies on DHCP for configuring the baremetal interface. This means that:
It’s not possible to create a static DHCP lease for it, since you don’t know the MAC address ahead of time.
Since you can’t create a static DHCP lease, you can’t give it a static IP address.
Since you can’t give it a static IP address, you can’t create a firewall exception for access to the IPMI network.
And lastly, since you can’t create a static DHCP lease, you can’t conveniently use DHCP to assign the static route to the IPMI network.
This design decision – the use of a bootstrap vm with a random MAC address and no facility for assigning a static ip address – is what complicated our lives when we first set out to install OpenShift.
I’d like to emphasize that other than the issues discussed in the remainder of this article, the install process has been relatively smooth. We’re able to go from zero to a completely installed OpenShift cluster in just a few hours. There were some documentation issues early on, but I think most of those have already been resolved.
OpenShift uses Ignition for performing host configuration tasks.
If you’re familiar with cloud-init, Ignition is doing something
very similar. One of the first things we tried was passing in a static
network configuration using Ignition. By running
openshift-baremetal-install create ignition-configs
, it’s possible
to modify the ignition configuration passed into the bootstrap vm.
Unfortunately, it turns out that prior to loading the ignition
configuration, the bootstrap vm image will attempt to configure all
system interfaces using DHCP…and if it fails to acquire any
addresses, it just gives up.
In that case, it never gets as far as attempting to apply the ignition configuration, so this option didn’t work out.
It is possible to pass a static ip configuration into the bootstrap vm by modifying the kernel command line parameters. There are several steps involved in creating a custom image:
use virt-edit to modify the grub configuration
This also requires configuring your install-config.yaml
to use the
new image, and finding an appropriate place to host it.
This mechanism does work, but there are a lot of moving parts and in particular it seems like modifying the grub configuration could be a little tricky if the command line in the original image were to change in unexpected ways.
We ended up taking advantage of the fact that while we didn’t know the MAC address ahead of time, we did know the MAC address prefix ahead of time, so we created a small dynamic range (6 addresses) limited to that MAC prefix (which would match pretty much anything started by libvirt, but the only libvirt managed virtual machines attached to this network were OpenShift bootstrap vms). We were able to (a) attach the static route declaration to this small dynamic range, and (b) grant firewall exceptions for these specific addresses. The relevant lines in our dnsmasq configuration look something like:
dhcp-host=52:54:00:*:*:*,set:libvirt,set:ocp
dhcp-range=tag:libvirt,10.1.2.130,10.1.2.135,255.255.255.0
dhcp-option=tag:ocp,option:classless-static-route,10.0.0.0/19,10.1.2.101
It’s not perfect, but it’s working fine.
The baremetal installer should allow the deployer to pass in a
static address configuration for the bootstrap vm using the
install-config.yaml
file. The bootstrap vm should continue to boot
even if it can’t initially configure an interface using DHCP (one
should be able to disable that initial DHCP attempt).