Exposing the Kubernetes Dashboard with an Ingress

With MicroK8s it’s easy to enable the Kubernetes Dashboard by running

microk8s enable dashboard

If you’re running MicroK8s on a local PC or VM, you can access the dashboard with kube-proxy as described in the docs, but if you want to expose it properly then the best way to do this is with an Ingress resource.

Firstly, make sure you’ve got the Ingress addon enabled in your MicroK8s.

microk8s enable ingress

HTTP

The simplest case is to set up a plain HTTP Ingress on port 80 which presents the Dashboard. However this is not recommended as it is insecure.

---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
  name: dashboard
  namespace: kube-system
spec:
  rules:
  - host: <your-external-address>
    http:
      paths:
      - backend:
          serviceName: kubernetes-dashboard
          servicePort: 443
        path: /

HTTPS

For proper security we should serve the Dashboard via HTTPS on port 443. However there are some prerequisites:

  • You need to set up Cert Manager
  • You need to set up Let’s Encrypt as an Issuer so you can provision TLS certificates (included below)
  • You need to use a fully-qualified domain name that matches the common name of your certificate, and it must be in DNS
---
apiVersion: cert-manager.io/v1alpha2
kind: Issuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: youremail@example.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod
       # Enable the HTTP-01 challenge provider
    solvers:
    - http01:
        ingress:
           class: nginx
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/issuer: letsencrypt-prod
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
  name: dashboard
  namespace: kube-system
spec:
  rules:
  - host: dashboard.example.com
    http:
      paths:
      - backend:
          serviceName: kubernetes-dashboard
          servicePort: 443
        path: /
  tls:
  - hosts:
    - dashboard.example.com
    secretName: dashboard-ingress-cert

After applying this manifest, wait for the certificate to be ready:

$ kubectl get certs -n kube-system
NAME                     READY   SECRET                   AGE
dashboard-ingress-cert   True    dashboard-ingress-cert   169m

Building a hyperconverged Kubernetes cluster with MicroK8s and Ceph

This guide explains how to build a highly-available, hyperconverged Kubernetes cluster using MicroK8s, Ceph and MetalLB on commodity hardware or virtual machines. This could be useful for small production deployments, dev/test clusters, or a nerdy toy.

Other guides are available – this one is written from a sysadmin point of view, focusing on stability and ease of maintenance. I prefer to avoid running random scripts or fetching binaries that are then unmanaged and unmanageable. This guide uses package managers and operators wherever possible. I’ve also attempted to explain each step so readers can gain some understanding instead of just copying and pasting the commands. However, this does not absolve you from having a decent background of the components, and it is strongly recommended that you are familiar with kubectl/Kubernetes and Ceph in particular.

The technological landscape moves so fast so these instructions may become outdated quickly. I’ll link to upstream documentation wherever possible so you can check for updated versions.

Finally, this is a fairly simplistic guide that gives you the minimum possible configuration. There are many other components and configurations that you can add, and it also takes no account of security with RBAC etc.

Hardware

There are a few of considerations when choosing your hardware or virtual “hardware” for use as Kubernetes nodes.

  • MicroK8s requires at least 3 nodes to work in HA mode, so we’ll start with 3 VMs
  • While MicroK8s is quite lightweight, by the time you start adding the storage capability you will need a reasonable amount of memory. Recommended minimum spec for this guide is 2 CPUs and 4GB RAM. More is obviously better, depending on your workload.
  • Each VM will need two block devices (disks). One should be partitioned, formatted and used as a normal OS disk, and the other should be left untouched so it can be claimed by Ceph later. The OS disk will also contain cached container images so could get quite large. I’ve allowed 16GB for the OS disk, and Ceph requires a minimum of 10GB for its disk.
  • If running in VirtualBox, place all VMs either in the same NAT network, or bridged to the host network. Ideally have static IPs.
  • If you are running on bare metal, make sure the machines are on the same network, or at least on networks that can talk to each other.

In my case, I used VirtualBoxc and created 3 identical VMs, kube01, kube02 and kube03.

Operating system

This guide focuses on CentOS/Fedora but should be applicable to many distributions with minor tweaks. I have started with a CentOS 8 minimal installation. Fedora Server or Ubuntu Server would also work just as well but you’ll need to tweak some of the commands.

  • Don’t create a swap partition on these machines
  • Make sure ntp is enabled for accurate time
  • Make sure the VMs have static IPs or DHCP reservations, so their IPs won’t change

Snap

Reference: https://snapcraft.io/docs/installing-snap-on-centos

Snap is a package manager that contains MicroK8s. It comes preinstalled on Ubuntu, but if you’re on CentOS, Fedora or others, you’ll need to install it on all your nodes.

sudo dnf -y install epel-release
sudo dnf -y install snapd
sudo systemctl enable --now snapd
sudo ln -s /var/lib/snapd/snap /snap

MicroK8s

Reference: https://microk8s.io/

MicroK8s is a lightweight, pre-packaged Kubernetes distribution which is easy to use and works well for small deployments. It’s a lot more straightforward than following Kubernetes the hard way.

Install

Install MicroK8s 1.19.1 or greater from Snap on all your nodes:

sudo snap install microk8s --classic --channel=latest/edge
microk8s status --wait-ready
echo 'alias kubectl="microk8s kubectl"' >> ~/.bashrc

The first time you run microk8s status, you will be prompted to add your user to the microk8s group. Follow the instructions and log in again.

Enable HA mode

Reference: https://microk8s.io/docs/high-availability

Enable MicroK8s HA mode on all nodes, which allows any of the worker nodes to also behave as a master, instead of just being a worker node. This must be enabled before nodes are joined to the master. On some versions of MicroK8s this is enabled by default. https://microk8s.io/docs/high-availability

microk8s enable ha-cluster

Add firewall rules

Reference: https://microk8s.io/docs/ports

Create firewall rules for your nodes, so they can communicate with each other.

Enable clustering

Reference: https://microk8s.io/docs/clustering

Enable Microk8s clustering, which allows you to add multiple worker nodes to your existing master node

Run this on the first node only:

[jonathan@kube01 ~]$ microk8s add-node
From the node you wish to join to this cluster, run the following:
microk8s join 192.168.0.41:25000/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Then execute the join command on the second node, to join it to the master.

[jonathan@kube02 ~]$ microk8s join 192.168.0.41:25000/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Contacting cluster at 192.168.0.41
Waiting for this node to finish joining the cluster. ..

Repeat for the third node and remember to run the add-node command for each node you add, so they all get a unique token.

Verify that they are correctly joined:

[jonathan@kube01 ~]$ kubectl get nodes
NAME                         STATUS   ROLES    AGE   VERSION
kube01.jonathangazeley.com   Ready    <none>   35h   v1.19.1-34+08a87c75adb55c
kube03.jonathangazeley.com   Ready    <none>   35h   v1.19.1-34+08a87c75adb55c
kube02.jonathangazeley.com   Ready    <none>   35h   v1.19.1-34+08a87c75adb55c

Finally make sure that full HA mode is enabled:

[jonathan@kube01 ~]$ microk8s status
microk8s is running
high-availability: yes
  datastore master nodes: 192.168.0.41:19001 192.168.0.42:19001 192.168.0.43:19001
  datastore standby nodes: none
addons:
  enabled:
...

Addons

Reference: https://microk8s.io/docs/addon-dns

Reference: https://kubernetes.io/docs/reference/access-authn-authz/rbac/

Enable some basic addons across the cluster to provide a usable experience. Run this on any one node.

microk8s enable dns rbac

Check

We’ve already checked that all 3 nodes are up. Now let’s make sure pods are being scheduled on all nodes:

[jonathan@kube01 ~]$ kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                                READY   STATUS              RESTARTS   AGE    IP             NODE                      
kube-system   calico-node-bqqqd                                   1/1     Running             0          112m   192.168.0.41   kube01.jonathangazeley.com
kube-system   calico-node-z4sxd                                   1/1     Running             0          110m   192.168.0.43   kube03.jonathangazeley.com
kube-system   calico-kube-controllers-847c8c99d-4qblz             1/1     Running             0          115m   10.1.58.1      kube01.jonathangazeley.com
kube-system   coredns-86f78bb79c-t2sgt                            1/1     Running             0          109m   10.1.111.65    kube02.jonathangazeley.com
kube-system   calico-node-t5skc                                   1/1     Running             0          111m   192.168.0.42   kube02.jonathangazeley.com

With the cluster in a health and operational state, let’s add the hyperconverged storage. From now on, all steps can be run on kube01.

Ceph

Ceph is a clustered storage engine which can present its storage to Kubernetes as block storage or a filesystem. We will use the Rook operator to manage our Ceph deployment.

Install

Reference: https://rook.io/docs/rook/v1.4/ceph-quickstart.html

These steps are taken verbatim from the official Rook docs. Check the link above to make sure you are using the latest version of Rook.

First we install the Rook operator, which automates the rest of the Ceph installation.

git clone --single-branch --branch release-1.4 https://github.com/rook/rook.git
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f common.yaml
kubectl create -f operator.yaml
kubectl -n rook-ceph get pod

Wait until the rook-ceph-operator pod and the rook-discover pods are all Running. This took a few minutes for me. Then we can create the actual Ceph cluster.

kubectl create -f cluster.yaml
kubectl -n rook-ceph get pod

This command will probably take a while – be patient. The operator creates various pods including canaries, monitors, a manager, and provisioners. There will be periods where it looks like it isn’t doing anything, but don’t be tempted to intervene. You can check what the operator is doing by reading its log:

kubectl -n rook-ceph logs rook-ceph-operator-775d4b6c5f-52r87

Check

Reference: https://rook.io/docs/rook/v1.4/ceph-toolbox.html

Install the Ceph toolbox and connect to it so we can run some checks.

kubectl create -f toolbox.yaml
kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash

OSDs are the individual pieces of storage. Make sure all 3 are available and check the overall health of the cluster.

[root@rook-ceph-tools-6967fc698d-5f4sh /]# ceph status
  cluster:
    id:     e37a9364-b2e4-42ba-a7c0-c7276bc2083d
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,d (age 2m)
    mgr: a(active, since 33s)
    osd: 3 osds: 3 up (since 89s), 3 in (since 89s)
 
  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 45 GiB / 48 GiB avail
    pgs:     1 active+clean
[root@rook-ceph-tools-6967fc698d-5f4sh /]# ceph osd status
ID  HOST                         USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE      
 0  kube03.jonathangazeley.com  1027M  14.9G      0        0       0        0   exists,up  
 1  kube02.jonathangazeley.com  1027M  14.9G      0        0       0        0   exists,up  
 2  kube01.jonathangazeley.com  1027M  14.9G      0        0       0        0   exists,up  

Block storage

Reference: https://rook.io/docs/rook/v1.4/ceph-block.html

Ceph can provide persistent block storage to Kubernetes as a storage class which can be consumed by one pod at any one time.

kubectl create -f csi/rbd/storageclass.yaml

Verify that the block storageclass is available:

[jonathan@kube01 ~]$ kubectl get storageclass
NAME                PROVISIONER                  RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
rook-ceph-block     rook-ceph.rbd.csi.ceph.com   Delete          Immediate           true                   3m53s

Filesystem

Reference: https://rook.io/docs/rook/v1.4/ceph-filesystem.html

Ceph can provide persistent storage which can be consumed across multiple pods simultaneously by providing a filesystem layer.

kubectl create -f filesystem.yaml

Use the toolbox again to verify that there is a metadata service (mds) available:

[root@rook-ceph-tools-6967fc698d-5f4sh /]# ceph status
  cluster:
    id:     e37a9364-b2e4-42ba-a7c0-c7276bc2083d
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,d (age 36m)
    mgr: a(active, since 34m)
    mds: myfs:1 {0=myfs-b=up:active} 1 up:standby-replay
    osd: 3 osds: 3 up (since 35m), 3 in (since 35m)
 
  task status:
    scrub status:
        mds.myfs-a: idle
        mds.myfs-b: idle
 
  data:
    pools:   4 pools, 97 pgs
    objects: 22 objects, 2.2 KiB
    usage:   3.0 GiB used, 45 GiB / 48 GiB avail
    pgs:     97 active+clean
 
  io:
    client:   852 B/s rd, 1 op/s rd, 0 op/s wr

Now we can create a new storageclass based on the filesystem:

kubectl create -f csi/cephfs/storageclass.yaml

Verify the storageclass is present:

[jonathan@kube01 ceph]$ kubectl get storageclass
NAME                        PROVISIONER                     RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
rook-ceph-block (default)   rook-ceph.rbd.csi.ceph.com      Delete          Immediate           true                   49m
rook-cephfs                 rook-ceph.cephfs.csi.ceph.com   Delete          Immediate           true                   34m

Consume

It’s easy to consume the new Ceph storage. Use the storageClassName rook-ceph-block in ReadWriteOnce mode for persistent storage for a single pod, or rook-cephfs in ReadWriteMany mode for persistent storage that can be shared between pods.

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ceph-rbd-pvc
  labels:
spec:
  storageClassName: rook-ceph-block
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-pvc
spec:
  storageClassName: rook-cephfs
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 1Gi

Ingress

Reference: https://microk8s.io/docs/addon-ingress

Probably the simplest way to expose web applications on your cluster is to use an Ingress. This binds to ports 80+443 on all your nodes and listens for http+https requests. It will effectively do name-based virtual hosting, terminate your SSL and will direct your web traffic to a Kubernetes Service with an internal ClusterIP which acts as a simple load balancer. This will require you to set up external round robin DNS to point your A record at all 3 of the node IPs.

microk8s enable ingress
sudo firewall-cmd --permanent --add-service http
sudo firewall-cmd --permanent --add-service https
sudo firewall-cmd --reload

MetalLB

Reference: https://microk8s.io/docs/addon-metallb

If you want to set up more advanced load balancing, consider using MetalLB. It will load balance your Kubernetes Service and present it on a single virtual IP.

Install

MetalLB will prompt you for one or more ranges of IPs that it can use for load-balancing. It should be fine to accept the default suggestion.

[jonathan@kube01 ~]$ microk8s enable metallb
Enabling MetalLB
Enter each IP address range delimited by comma (e.g. '10.64.140.43-10.64.140.49,192.168.0.105-192.168.0.111'): 10.64.140.43-10.64.140.49,192.168.0.105-192.168.0.111

Consume

Once MetalLB is installed and configured, to expose a service externally, simply create it with spec.type set to LoadBalancer, and MetalLB will do the rest.

It’s important to note that in the default config, the vIP will only appear on one of your nodes and that node will act as the entry point for all traffic before it gets load balanced between nodes, so this could be a bottleneck in busy environments.

---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: nginx
  type: LoadBalancer

Summary

You now have a fully-featured Kubernetes cluster with high availability, clustered storage, ingress, and load balancing. The possibilities are endless!

If you spot any mistakes, improvements or versions that need updating, please drop a comment below.

Canon New FD 35-70mm lenses

In the late 1970s and early 1980s, Canon released these two similar lenses as part of their New FD series – both 35-70mm zoom lenses. But what’s the difference between these two lenses, and which is better?

First, let’s cover the similarities. These are both compact zoom lenses with the same focal lengths from the New FD lineup, with bayonet mount instead of the original FD lenses with their silver breech ring. They are both double-touch zoom lenses, with separate rings for zoom and focus.

The key difference between them is their aperture – at first glance the smaller lens has the slightly faster aperture of f/3.5-4.5 while the larger lens has an aperture of f/4.

However, the f/4 version can manage f/4 at all focal lengths while the f/3.5-4.5 version can only manage f/3.5 at 35mm, falling to f/4.5 at 70mm.

There are obvious physical differences, too. The f/4 version is longer and heavier as it seems to have more metal components. The f/3.5-4.5 version feels plasticky in comparison.

Specifications

Let’s have a look at the specs on paper and see what they reveal.

New FD 35-70mm f/4New FD 35-70mm f/3.5-4.5
June 1979MarketedMarch 1983
45,000 yenOriginal Price31,900 yen
8/8Elements/Groups9/8
6Diaphragm blades8
22Minimum aperture22
0.5mClosest Focusing Distance0.5m
0.15×Maximum Magnification0.15×
52mmFilter Diameter52mm
63 × 84.5mmMax Diameter × Length63 × 60.9mm
315gWeight200g

Looking at these specifications is a mixed bag. The optical formula is very similar but has been altered, with the addition of an extra element, which is presumably to improve image quality.

As already mentioned, the new lens weighs less and uses more plastic, which is consistent with camera design in the 1980s. We can see that it was priced lower at launch. But it also has a greater number of diaphragm blades – something which is usually associated with more expensive lenses.

A little bit of context

Let’s take a moment to consider the context that these lenses were marketed in. The f/4 model was marketed in 1979, just one year after the New FD system was launched in 1978. There was a flood of new lenses and improved versions of existing ones, but this 35-70mm f/4 was a new design. Cameras released around this time include the A-1 and AV-1, which both included 50mm prime lenses as their “kit” lens (50mm f/1.4 and 50mm f/2.0 respectively). So this zoom lens was a premium item marketed as an upgrade.

Meanwhile, the f/3.5-4.5 lens was released in 1983 with the T50. The T series were the first Canon cameras to openly embrace the use of plastics, and were much lighter. This lens appeared as the kit lens on the T50 and the T70 the following year.

My verdict

I haven’t done any thorough side-by-side testing of these lenses but I think they are both pretty decent performers.

Going by the rest of the data, the earlier f/4 is the superior lens as it occupied a higher position in the lineup, and the f/3.5-4.5 was built to a budget as a kit lens. It’s a more solid lens and probably has slightly better image quality.

However, I would still pick the f/3.5-4.5 over the f/4 if I needed a small and light lens to go with a small and light camera.

CameraHub

If you like nerding out over camera and lens data, you should check out CameraHub. It’s a public database of camera and lens data that anyone can edit and add to. It’s browseable and searchable but to get you started, here are a few links to cameras and lenses mentioned in this article:

Rethinking database architecture

Originally published 2015-09-02 on the UoB Unix blog

The eduroam wireless network has a reliance on a database for the authorization and accounting parts of AAA (authentication, authorization and accounting – are you who you say you are, what access are you allowed, and what did you do while connected).

When we started dabbling with database-backed AAA in 2007 or so, we used a centrally-provided Oracle database. The volume of AAA traffic was low and high performance was not necessary. However (spoiler alert) demand for wireless connectivity grew and before many months, we were placing more demand on Oracle than it could handle. The latency of our queries was taking sufficiently long that some wireless authentication requests would time out and fail.

First gen – MySQL (2007)

It was clear that we needed a dedicated database platform, and at the time that we asked, the DBAs were not able to provide a suitable platform. We went down the route of implementing our own. We decided to use MySQL as a low-complexity open source database server with a large community. The first iteration of the eduroam database hardware was a single second-hand server that was going spare. It had no resilience but was suitably snappy for our needs.

First gen database

Second gen – MySQL MMM (2011)

Demand continued to grow but more crucially eduroam went from being a beta service that was “not to be relied upon” to being a core service that users routinely used for their teaching, learning and research. Clearly a cobbled-together solution was no longer fit for purpose, so we went about designing a new database platform.

The two key requirements were high query capacity, and high availability, i.e. resilience against the failure of an individual node. At the time, none of the open source database servers had proper clustering – only master-slave replication. We installed a clustering wrapper for MySQL, called MMM (MySQL Multi Master). This gave us a resilient two-node cluster whether either node could be queried for reads and one node was designated the “writer” at any one time. In the event of a node failure, the writer role would be automatically moved around by the supervisor.

Second gen database

Not only did this buy us resilience against hardware faults, for the first time it also allowed us to drop either node out of the cluster for patching and maintenance during the working day without affecting service for users.

The two-node MMM system served us well for several years, until the hardware came to its natural end of life. The size of the dataset had grown and exceeded the size of the servers’ memory (the 8GB that seemed generous in 2011 didn’t really go so far in 2015) meaning that some queries were quite slow. By this time, MMM had been discontinued so we set out to investigate other forms of clustering.

Third gen – MariaDB Galera (2015)

MySQL had been forked into MariaDB which was becoming the default open source database, replacing MySQL while retaining full compatibility. MariaDB came with an integrated clustering driver called Galera which was getting lots of attention online. Even the developer of MMM recommended using MariaDB Galera.

MariaDB Galera has no concept of “master” or “slave” – all the nodes are masters and are considered equal. Read and write queries can be sent to any of the nodes at will. For this reason, it is strongly recommended to have an odd number of nodes, so if a cluster has a conflict or goes split-brain, the nodes will vote on who is the “odd one out”. This node will then be forced to resync.

This approach lends itself naturally to load-balancing. After talking to Netcomms about the options, we placed all three MariaDB Galera nodes behind the F5 load balancer. This allows us to use one single IP address for the whole cluster, and the F5 will direct queries to the most appropriate backend node. We configured a probe so the F5 is aware of the state of the nodes, and will not direct queries to a node that is too busy, out of sync, or offline.

Having three nodes that can be simultaneously queried gives us an unprecedented capacity which allows us to easily meet the demands of eduroam AAA today, with plenty of spare capacity for tomorrow. We are receiving more queries per second than ever before (240 per second, and we are currently in the summer vacation!).

We are required to keep eduroam accounting data for between 3 and 6 months – this means a large dataset. While disk is cheap these days and you can store an awful lot of data, you also need a lot of memory to hold the dataset twice over, for UPDATE operations which require duplicating a table in memory, making changes, merging the two copies back and syncing to disk. The new MariaDB Galera nodes have 192GB memory each while the size of the dataset is about 30GB. That should keep us going… for now.

Service availability monitoring with Nagios and BPI

Originally published  2016-11-21 on the UoB Unix blog

Several times, senior management have asked Team Wireless to provide an uptime figure for eduroam. While we do have an awful lot of monitoring of systems and services, it has never been possible to give a single uptime figure because it needs some detailed knowledge to make sense of the many Nagios checks (currently 2704 of them).

From the point of view of a Bristol user on campus here, there are three services that must be up for eduroam to work: RADIUS authentication, DNS, and DHCP. For the purposes of resilience, the RADIUS service for eduroam is provided by 3 servers, DNS by 2 servers and DHCP by 2 servers. It’s hard to see the overall state of the eduroam service from a glance at which systems and services are currently up in Nagios.

Nagios gives us detailed performance monitoring and graphing for each system and service but has no built-in aggregation tools. I decided to use an addon called Business Process Intelligence (BPI) to do the aggregation. We built this as an RPM for easy deployment, and configured it with Puppet.

BPI lets you define meta-services which consist of other services that are currently in Nagios. I defined a BPI service called RADIUS which contains all three RADIUS servers. Any one RADIUS server must be up for the RADIUS group to be up. I did likewise for DNS and DHCP.

BPI also lets meta-services depend on other groups. To consider eduroam to be up, you need the RADIUS group and the DNS group and the DHCP group to be up. It’s probably easier to see what’s going on with a screenshot of the BPI control panel:

BPI control panel

So far, these BPI meta-services are only visible in the BPI control panel and not in the Nagios interface itself. The BPI project does, however, provide a Nagios plugin check_bpi which allows Nagios to monitor the state of BPI meta-services. As part of that, it will draw you a table of availability data.

eduroam uptime

So now we have a definitive uptime figure to the overall eduroam service. How many nines? An infinite number of them! 😉 (Also, I like the fact that “OK” is split into scheduled and unscheduled uptime…)

This availability report is still only visible to Nagios users though. It’s a few clicks deep in the web interface and provides a lot more information than is actually needed. We need a simpler way of obtaining this information.

So I wrote a script called nagios-report which runs on the same host as Nagios and generates custom availability reports with various options for output formatting. As an example:

$ sudo /usr/bin/nagios-report -h bpi -s eduroam -o uptime -v -d
Total uptime percentage for service eduroam during period lastmonth was 100.000%

This can now be run as a cron job to automagically email availability reports to people. The one we were asked to provide is monthly, so this is our crontab entry to generate it on the first day of each month:

# Puppet Name: eduroam-availability
45 6 1 * * nagios-report -h bpi -s eduroam -t lastmonth -o uptime -v -d

It’s great that our work on resilience has paid off. Just last week (during the time covered by the eduroam uptime table) we experienced a temporary loss of about a third of our VMs, and yet users did not see a single second of downtime. That’s what we’re aiming for.

Unlocking features in your mk5 Mondeo with FORScan

The Ford Mondeo mk5 (from 2015 onwards) has a number of useful features that are disabled in the factory but can be unlocked using free software and a USB cable, without any special knowledge. Here’s how.

You will need a compatible ELM327 cable. There are loads on eBay but it can’t be just any – it has to be one with a manual switch between HS and MS mode. The ones that lack this switch may not be fully compatible. Mine specifically said Designed for Forscan and cost about £15.

The switch wasn’t labelled so initially I had to guess which way was which. On mine, the HS position was towards the label side, so I labelled it with a sticker.

Now you need to download FORScan. There are versions for phones/tablets but to change settings you must download the Windows version. You’ll need the Extended License to change settings but fortunately there is a 2-week free trial. You can install FORScan but don’t activate the trial until you’re ready to use it!

First you need to start the engine and disable the auto stop-start.

Then you need to connect the cable up. The OBD port is under the steering column and has a cap on it.

Load FORScan, click the Connect button at the bottom of the window, and follow the instructions.

If it connects successfully, it will scroll through a list of modules that it has detected. Wait until it finishes, and then click the Configuration & Programming button in the left menu – it’s the one with the chip icon.

In the list of modules, scroll down and select IPC Module configuration. This is the module that controls the instrument cluster. Make sure you choose the one without AS BUILT format. Then click the Play button at the bottom. Flip the HS/MS switch when it tells you to.

Now you’ve got a long list of settings that can be changed. Most of them will require compatible hardware to be installed on the car so don’t be tempted to fiddle unless you know you have that hardware, and be sure you understand every setting that you change. Be sure to make note of anything that you change, so you can put it back if necessary. These are the settings that I changed:

  • Auto Lock
  • Auto Relock
  • Autolocking While The Vehicle Is Moving
  • Digital Speedometer Configuration
  • Fuel Economy Menu
  • Fuel History Menu
  • TPMS Menu (Additional change required in BCM)
  • Tire Pressure Gauge

In every case, I double-clicked the setting, changed Disabled to Enabled and clicked the tick. Bear in mind this doesn’t actually change it on the car – it just prepares a batch of settings to apply in FORScan.

When you’ve changed everything that you want, click Write, review the changes, and FORScan will change the settings on the car. You can apply them one by one if you prefer. The instrument cluster will go dark for a few seconds before reloading. FORScan will tell you to turn the ignition off and on again. On my car, every time it reloaded, the temperature reverted to Fahrenheit so I had to set it back to Celcius.

To complete the tyre pressure settings, click the Stop button to leave the IPC module settings. Now find BCMii Module configuration in the list and click Play on that. Look for TPMS (additional change required in IPC) in the list and set it to Enabled. Click Write. Turn the engine off and on again.

Some of the new features are a bit hidden. You have to enable the digital speedo by clicking the button at the end of the left stalk (which usually controls the lane keeping assist). The fuel history, tyre pressure and lock settings are in the left menu system. Changing the lock settings with FORScan doesn’t actually enable the lock settings, it just adds new items to the in-car menu so you can enable them yourself.

Merging SELinux policies

Originally published 2016-08-01 on the UoB Unix blog

We make extensive use of SELinux on all our systems. We manage SELinux config and policy with the jfryman/selinux Puppet module, which means we store SELinux policies in plain text .te format – the same format that audit2allow generates them in.

One of our SELinux policies that covers permissions for NRPE is a large file. When we generate new rules (e.g. for new Nagios plugins) with audit2allow it’s a tedious process to merge the new rules in by hand and mistakes are easy to make.

So I wrote semerge – a tool to merge SELinux policy files with the ability to mix and match stdin/stdout and reading/writing files.

This example accepts input from audit2allow and merges the new rules into an existing policy:

cat /var/log/audit/audit.log | audit2allow | semerge -i existingpolicy.pp -o existingpolicy.pp

And this example deduplicates and alphabetises an existing policy:

semerge -i existingpolicy.pp -o existingpolicy.pp

There are probably bugs so please do let me know if you find it useful and log an issue if you run into problems.

Fronting legacy services with Kubernetes

There are many benefits to Kubernetes but what’s not discussed so often is how to migrate your services from their legacy hosting to their new home in Kubernetes. Specifically, I’m looking at the case where you have a single server or a single public IP address and you want to run your services on that server with a mixture of legacy hosting and Kubernetes – either permanently or as part of a migration process.

Let’s suppose you are running an application like ownCloud in a standard way, with Apache httpd bound to ports 80 and 443, with port 80 redirecting to port 443 to force HTTPS/SSL. This is how the simplified config might look:

# /etc/httpd/conf.d/owncloud.conf

<VirtualHost *:80>
  ServerName owncloud.example.com

  DocumentRoot "/var/www/html/owncloud"

  # Redirect non-SSL traffic to SSL site
  RewriteEngine On
  RewriteCond %{HTTPS} off
  RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI}

</VirtualHost>

<VirtualHost *:443>
  ServerName owncloud.example.com

  DocumentRoot "/var/www/html/owncloud"

  ## SSL directives
  SSLEngine on
  SSLCertificateFile      /etc/letsencrypt/live/cert.pem
  SSLCertificateKeyFile   /etc/letsencrypt/live/privkey.pem
  SSLCertificateChainFile /etc/letsencrypt/live/chain.pem
  SSLCACertificatePath    /etc/pki/tls/certs

</VirtualHost>

Now suppose you want to add some new services in a one-node Kubernetes solution like MicroK8s. When you add your Ingress resource to start serving your applications, it will complain because it wants to bind to ports 80 and 443, but they are already reserved by your legacy Apache installation.

The neatest solution is to run your legacy application on a high port, without SSL, thus freeing up 80 and 443. Then set up your Kubernetes Ingress and let it bind to 80 and 443, terminate SSL for your legacy application, and proxy onwards to your application without SSL. You’ll be able to add other Kubernetes Service resources on the same Ingress on the same ports with ease – like Apache’s name-based virtual hosting.

Let’s have a look at the revised Apache config for ownCloud. Notice the Listen directive to bind to an arbitrary high port, and the lack of any SSL directives:

# /etc/httpd/conf.d/owncloud.conf

Listen 5678
<VirtualHost *:5678>
  ServerName owncloud.example.com

  DocumentRoot "/var/www/html/owncloud"

</VirtualHost>

Now we must consider how the Kubernetes infrastructure will look. The typical pattern is to use a Service resource to identify where the application is running, and an Ingress resource to expose the Service to the outside world.

Service resources are usually designed to point an applications running inside a Kubernetes cluster, but by setting the type to ExternalName, we can tell Kubernetes that our legacy service is running on localhost. You could consider an ExternalName type Service to be analogous to a DNS CNAME record.

Here’s how we configure it. Note that we don’t yet specify the port:

kind: Service
apiVersion: v1
metadata:
  name: owncloud
spec:
  type: ExternalName
  externalName: localhost

Now that Kubernetes knows it should look on localhost for your legacy ownCloud application, we need to configure the way it will be presented to the outside world. To begin with, we will set up a dumb proxy without SSL. All the relevant bits are in the spec section, which specifies the domain that the app should be served on, and then specifies the Service resource we created earlier, along with the port number.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: owncloud
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: owncloud.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: owncloud
          servicePort: 5678

For bonus points, we can use cert-manager and Let’s Encrypt to add SSL, and fully automate the process of issuing SSL certificates. You will need to configure cert-manager in advance – this is beyond the scope of this blog post but there are good docs online. This revised Ingress config is the same as the one above, but with a few extra lines:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: owncloud
  annotations:
    kubernetes.io/ingress.class: "nginx"
    cert-manager.io/issuer: "letsencrypt-prod"
spec:
  tls:
  - hosts:
    - owncloud.example.com
    secretName: owncloud-tls
  rules:
  - host: owncloud.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: owncloud
          servicePort: 5678

And that’s it! You can verify the config with the kubectl command:

[jonathan@zeus ~]$ kubectl get service
NAME                   TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
default-http-backend   ClusterIP      10.152.183.41   <none>        80/TCP    87d
kubernetes             ClusterIP      10.152.183.1    <none>        443/TCP   98d
owncloud               ExternalName   <none>          localhost     <none>    87d

[jonathan@zeus ~]$ kubectl get ingress
NAME       HOSTS                  ADDRESS     PORTS     AGE
owncloud   owncloud.example.com   127.0.0.1   80, 443   87d

Now your legacy ownCloud service is available at owncloud.example.com but fronted by Kubernetes, leaving you free to install as many other services in Kubernetes as you like without having to worry about port clashes.

M5 traffic incidents

The M5 motorway is notorious for accidents in the summer holidays as it is the major route for the rest of the UK to access the popular holiday regions of Devon and Cornwall, so traffic volume increases considerably when the schools close. On top of this, holidaymakers often tow caravans or trailers, or use roof racks with tents, canoes and other leisure equipment. This too can increase the risk of accidents.

Delays are common, especially around the two August bank holidays, but 2019 seems to be the worst year than anyone can remember – it seems like there is a crash on the M5 every other day. I’ve done a little light research to put some numbers to the congestion.

I gathered my data in a 100% scientific way, by searching on the Bristol Post for references to “M5 traffic” in the month of August 2019. This limits results to incidents that affected Bristol, or journeys to/from Bristol. It probably includes M5 incidents in Somerset, Bristol and Gloucestershire, and probably excludes incidents at the southern end in Devon and at the northern end in the Midlands.

In total, I found 23 incidents on the M5, of which there were 17 collisions. The remaining six incidents were three counts of congestion due to sheer volume of traffic, one severe weather incident, one breakdown, and one fire.

Looking just at the collisions, as we are not even out of August yet, that’s 17 collisions over just 27 days. The hunch that there had been an accident every couple of days turns out to be true – that’s an average one collision every 1.59 days.

Quite a few of the collisions occurred with more than one per day – in total there were 12 days with collisions. I don’t have any data on whether those accidents were related or just coincidental.

I haven’t done any research into the causes of these collisions, but it’s safe to say that human error played a role in these incidents. Even if human error did not cause the initial accident, there were several multi-car pileups where the most likely secondary cause is driving too close, or failing to react in time.

Autonomous cars are on the horizon, and in my opinion they can’t come soon enough – at least for motorway use. Motorways are the easiest type of driving to automate, as motorways are usually closed environments without pedestrians, animals, etc. They are also quite difficult for human drivers, who are easily bored and prone to distraction.

I’d be quite happy for motorways to be reserved for autonomous vehicles in the interests of safety – while still permitting human drivers in towns and on A-roads.

The mystery of the Canon A/L switch

Since the 1960s, Canon SLRs have had their power switch confusingly labelled as A and L. This has persisted through many different generations of camera and confused beginners through the ages. But what do the letters A and L stand for? Why not use On and Off, or a red dot and a white dot, or a tick and a cross?

First let’s have a look at the switches, starting off with the first ever Canon SLRs, the Canonflex series, which didn’t have any shutter lock at all. The photographer simply had to get into the habit of not winding on until they were ready to shoot, or keeping the camera in a case where the button couldn’t be accidentally pressed.

Canonflex RM, 1962

The first Canon SLRs with a power switch were the FL generation of cameras from the 1960s. These have a rotating collar around the shutter release button with two positions – A and L. This was a physical setting as these cameras had no active electronics in them.

Canon FT, 1966

This design was maintained with the introduction of the first generation of FD cameras, the F-series. Some of these cameras had a separate switch on the left hand side to control the light meter. These were labelled On and Off.

Canon FTb, 1973

Breaking with tradition, the next generation of FD cameras, the A-series in the mid 1970s came along with an unmarked switch close to the shutter release, displaying a red dot when switched off. It looks like an LED, but it’s just a red plastic knobble.

Canon AT-1, 1977

The later half of the A-series from the late 1970s started using a sliding lever near the shutter release, once again returning to the same two positions, A and L. On this AE-1 Program, you can see where the lever has scratched the body with use.

Canon AE-1 Program, 1981

The unashamedly electronic T-series (not a compliment) from the mid 1980s saw a change, and it seems Canon couldn’t decide what to do with the power switch. The consumer-level T70 and T80 used a sliding switch on the top of the camera, but let the secret slip by labelling the switch Lock instead of the usual L. The other settings are the self-timer, and two different metering modes.

Canon T70, 1984

The T80 and T90 put the power switch on the back of the camera, using the traditional A and L designations.

Canon T90, 1986

The T60 (which is not a true Canon, being made by Cosina) gave a hint of the future by doing away with a power switch entirely and having the A and L positions on the shutter speed dial.

Canon T60, 1990

The early EOS film cameras of the late 1980s had a rotating knob on the back with A and L modes, plus other modes on some models.

Canon EOS RT, 1989

Apparently the rotating knob idea didn’t work out, as the later EOS film camera series of the 1990s quickly returned to the trend set by the T60, by having an L position on the new command dial – but no A position.

Canon EOS 5, 1992

All EOS digital cameras were equipped with an On/Off switch from the very first model back in 2001. The switch varied in position from the back, to next to the shutter release – but never on the command dial.

Canon EOS 600D, 2011

After this journey spanning five decades of photographic history, are we any closer to knowing what these letters mean? Well, we saw from the T70 that L stands for Lock. But what about A?

Some Canon cameras of the 1970-80s also used A on lenses to designate “auto aperture”, but the Canonflex models of the late 1950s and early 1960s had nothing automatic about them so we can rule out A standing for Auto.

I haven’t been able to find anything online about this, but my theory is that A represents Active or Action, to mean that the camera is ready to shoot. If anyone knows better, please let me know!