Cluster API Provider Linode


PLEASE NOTE: This project is considered ALPHA quality and should NOT be used for production, as it is currently in active development. Use at your own risk. APIs, configuration file formats, and functionality are all subject to change frequently. That said, please try it out in your development and test environments and let us know how it works for you. Contributions welcome! Thanks!


What is Cluster API Provider Linode (CAPL)?

This is a Cluster API implementation for Linode to create, configure, and manage Kubernetes clusters.


Compatibility

Cluster API Versions

CAPL is compatible only with the v1beta1 version of CAPI (v1.x).

Kubernetes Versions

CAPL is able to install and manage the versions of Kubernetes supported by the Cluster API (CAPI) project.


Documentation

Please see our Book for in-depth user and developer documentation.

Topics

This section contains information about enabling and configuring various features for Cluster API Provider Linode.

Getting started with CAPL

Prerequisites

You will need a Linode Personal Access Token (PAT). For more information, please see the Linode Guide.

Setting up your cluster environment variables

Once you have provisioned your PAT, save it in an environment variable along with other required settings:

export LINODE_REGION=us-ord
export LINODE_TOKEN=<your linode PAT>
export LINODE_CONTROL_PLANE_MACHINE_TYPE=g6-standard-2
export LINODE_MACHINE_TYPE=g6-standard-2

Warning

For regions and images that do not yet support Akamai's cloud-init datasource, CAPL will automatically use a StackScript shim to provision the node. If you are using a custom image, ensure the cloud_init flag is set correctly on it.

Warning

By default, clusters are provisioned within a VPC. For regions that do not yet support VPC, use the VPCLess flavor to provision clusters.

Register linode as an infrastructure provider

  1. Add linode as an infrastructure provider in ~/.cluster-api/clusterctl.yaml
    providers:
       - name: linode-linode
         url: https://github.com/linode/cluster-api-provider-linode/releases/latest/infrastructure-components.yaml
         type: InfrastructureProvider
    

Install CAPL on your management cluster

Install CAPL and enable the helm addon provider, which is used by the majority of the CAPL flavors:

clusterctl init --infrastructure linode-linode --addon helm

Deploying your first cluster

Please refer to the default flavor section for creating your first Kubernetes cluster on Linode using Cluster API.

Troubleshooting Guide

This guide covers common issues users might run into when using Cluster API Provider Linode. This list is a work in progress; please feel free to open a PR to add to this guide if you find that useful information is missing.

Examples of common issues

No Linode resources are getting created

This could be due to the LINODE_TOKEN either not being set in your environment or being expired. If it has expired, provision a new token and optionally set the "Expiry" to "Never" (the default expiry is 6 months).
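
As a quick sanity check, you can confirm the token is exported and still accepted by the Linode API. This is only a sketch; any authenticated endpoint works, and /v4/profile is used here purely as an example:

# Fails fast if the variable is unset
echo "${LINODE_TOKEN:?LINODE_TOKEN is not set}" > /dev/null
# An expired or revoked token returns HTTP 401 here
curl -sf -H "Authorization: Bearer ${LINODE_TOKEN}" https://api.linode.com/v4/profile > /dev/null && echo "token OK"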

One or more control plane replicas are missing

Take a look at the KubeadmControlPlane controller logs and look for any potential errors:

kubectl logs deploy/capi-kubeadm-control-plane-controller-manager -n capi-kubeadm-control-plane-system manager

In addition, make sure all pods on the workload cluster are healthy, including pods in the kube-system namespace.

Otherwise, ensure that the linode-ccm is installed on your workload cluster via CAAPH.

Nodes are in NotReady state

Make sure a CNI is installed on the workload cluster and that all the pods on the workload cluster are in a Running state.

If the Cluster is labeled with cni: <cluster-name>-cilium, check that the <cluster-name>-cilium HelmChartProxy is installed in the management cluster and that the HelmChartProxy is in a Ready state:

kubectl get cluster $CLUSTER_NAME --show-labels
kubectl get helmchartproxies

Checking CAPI and CAPL resources

To check the progression of all CAPI and CAPL resources on the management cluster, you can run:

kubectl get cluster-api
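
For a tree view of a single cluster and the readiness of its CAPI/CAPL objects, clusterctl can also be used (a sketch, assuming CLUSTER_NAME is set):

clusterctl describe cluster $CLUSTER_NAME --show-conditions all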

Looking at the CAPL controller logs

To check the CAPL controller logs on the management cluster, run:

kubectl logs deploy/capl-controller-manager -n capl-system manager

Checking cloud-init logs (Debian / Ubuntu)

Cloud-init logs can provide more information on any issues that happened when running the bootstrap script.

Warning

Not all Debian and Ubuntu images available from Linode support cloud-init! Please see the Availability section of the Linode Metadata Service Guide.

You can also see which images have cloud-init support via the linode-cli:

linode-cli images list | grep cloud-init

Please refer to the Troubleshoot Metadata and Cloud-Init section of the Linode Metadata Service Guide.

Overview

This section documents addons for self-managed clusters.

Note

Currently, all addons are installed via Cluster API Addon Provider Helm (CAAPH).

CAAPH is installed by default in the KIND cluster created by make tilt-cluster.

For more information, please refer to the CAAPH Quick Start.

Note

The Linode Cloud Controller Manager and Linode CSI Driver addons require the ClusterResourceSet feature flag to be set on the management cluster.

This feature flag is enabled by default in the KIND cluster created by make tilt-cluster.

For more information, please refer to the ClusterResourceSet page in The Cluster API Book.
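
If the feature flag is not already enabled on your own management cluster, one common way to enable it is to export the corresponding experimental-feature variable before running clusterctl init. This is a sketch, reusing the init command from the getting-started section:

export EXP_CLUSTER_RESOURCE_SET=true
clusterctl init --infrastructure linode-linode --addon helm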

Contents

CNI

In order for pod networking to work properly, a Container Network Interface (CNI) must be installed.

Cilium

Installed by default

To install Cilium on a self-managed cluster, simply apply the cni: <cluster-name>-cilium label on the Cluster resource if not already present.

kubectl label cluster $CLUSTER_NAME cni=$CLUSTER_NAME-cilium --overwrite

Cilium will then be automatically installed via CAAPH into the labeled cluster.
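
To confirm the rollout, you can fetch the workload cluster's kubeconfig and check the Cilium pods. This is a sketch; the k8s-app=cilium label is the upstream Cilium default and is assumed here:

clusterctl get kubeconfig $CLUSTER_NAME > $CLUSTER_NAME.kubeconfig
kubectl --kubeconfig $CLUSTER_NAME.kubeconfig get pods -n kube-system -l k8s-app=cilium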

Enabled Features

By default, Cilium's BGP Control Plane is enabled when using Cilium as the CNI.

CCM

In order for the InternalIP and ExternalIP of the provisioned Nodes to be set correctly, a Cloud Controller Manager (CCM) must be installed.

Linode Cloud Controller Manager

Installed by default

To install the linode-cloud-controller-manager (linode-ccm) on a self-managed cluster, simply apply the ccm: <cluster-name>-linode label on the Cluster resource if not already present.

kubectl label cluster $CLUSTER_NAME ccm=$CLUSTER_NAME-linode --overwrite

The linode-ccm will then be automatically installed via CAAPH into the labeled cluster.
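
Once the CCM is running, the workload cluster's Nodes should report both InternalIP and ExternalIP addresses. A quick check, reusing the workload kubeconfig fetched in the CNI section above:

kubectl --kubeconfig $CLUSTER_NAME.kubeconfig get nodes -o wide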

Container Storage

In order for stateful workloads to create PersistentVolumes (PVs), a storage driver must be installed.

Linode CSI Driver

Installed by default

To install the csi-driver-linode on a self-managed cluster, simply apply the csi: <cluster-name>-linode label on the Cluster resource if not already present.

kubectl label cluster $CLUSTER_NAME csi=$CLUSTER_NAME-linode --overwrite

The csi-driver-linode will then be automatically installed via CAAPH into the labeled cluster.
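
As a smoke test, you can create a PersistentVolumeClaim against a StorageClass shipped with the driver, reusing the workload kubeconfig fetched in the CNI section above. This is a sketch; the linode-block-storage StorageClass name is an assumption based on the upstream csi-driver-linode defaults:

cat <<EOF | kubectl --kubeconfig $CLUSTER_NAME.kubeconfig apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: csi-smoke-test
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: linode-block-storage
EOF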

Flavors

This section contains information about supported flavors in Cluster API Provider Linode.

In clusterctl, infrastructure provider authors can provide different types of cluster templates, referred to as "flavors". You can use the --flavor flag to specify which flavor to use for a cluster, e.g.:

clusterctl generate cluster test-cluster --flavor clusterclass-kubeadm

To use the default flavor, omit the --flavor flag.

See the clusterctl flavors docs for more information.

Default

Specification

Control Plane | CNI    | Default OS   | Installs ClusterClass | IPv4 | IPv6
Kubeadm       | Cilium | Ubuntu 22.04 | No                    | Yes  | No

Prerequisites

Quickstart completed

Usage

  1. Generate cluster yaml
    clusterctl generate cluster test-cluster \
        --kubernetes-version v1.29.1 \
        --infrastructure linode-linode > test-cluster.yaml
    
  2. Apply cluster yaml
    kubectl apply -f test-cluster.yaml
    

Dual-Stack

Specification

Control Plane | CNI    | Default OS   | Installs ClusterClass | IPv4 | IPv6
Kubeadm       | Cilium | Ubuntu 22.04 | No                    | Yes  | Yes

Prerequisites

Quickstart completed

Usage

  1. Generate cluster yaml
    clusterctl generate cluster test-cluster \
        --kubernetes-version v1.29.1 \
        --infrastructure linode-linode \
        --flavor dual-stack > test-cluster.yaml
    
  2. Apply cluster yaml
    kubectl apply -f test-cluster.yaml
    

Etcd-disk

This flavor configures etcd to be on a separate disk from the OS disk. By default, it sizes the disk at 10 GiB and sets quota-backend-bytes to 8589934592 (8 GiB), per the recommendation in the etcd documentation.

Specification

Control Plane | CNI    | Default OS   | Installs ClusterClass | IPv4 | IPv6
Kubeadm       | Cilium | Ubuntu 22.04 | No                    | Yes  | No

Prerequisites

Quickstart completed

Usage

  1. Generate cluster yaml
    clusterctl generate cluster test-cluster \
        --kubernetes-version v1.29.1 \
        --infrastructure linode-linode \
        --flavor etcd-disk > test-cluster.yaml
    
  2. Apply cluster yaml
    kubectl apply -f test-cluster.yaml
    

Kubeadm ClusterClass

Specification

Control Plane | CNI    | Default OS   | Installs ClusterClass | IPv4 | IPv6
Kubeadm       | Cilium | Ubuntu 22.04 | Yes                   | Yes  | No

Prerequisites

Quickstart completed

Usage

Create clusterClass and first cluster

  1. Generate the ClusterClass and cluster manifests
    clusterctl generate cluster test-cluster \
        --kubernetes-version v1.29.1 \
        --infrastructure linode-linode \
        --flavor clusterclass-kubeadm > test-cluster.yaml
    
  2. Apply cluster manifests
    kubectl apply -f test-cluster.yaml
    

(Optional) Create a second cluster using the existing ClusterClass

  1. Generate cluster manifests
    clusterctl generate cluster test-cluster-2 \
        --kubernetes-version v1.29.1 \
        --flavor clusterclass-kubeadm > test-cluster-2.yaml
    
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    metadata:
      labels:
        ccm: test-cluster-2-linode
        cni: test-cluster-2-cilium
        crs: test-cluster-2-crs
      name: test-cluster-2
      namespace: default
    spec:
      clusterNetwork:
        pods:
          cidrBlocks:
          - 10.192.0.0/10
      topology:
        class: kubeadm
        controlPlane:
          replicas: 1
        variables:
        - name: region
          value: us-ord
        - name: controlPlaneMachineType
          value: g6-standard-2
        - name: workerMachineType
          value: g6-standard-2
        version: v1.29.1
        workers:
          machineDeployments:
          - class: default-worker
            name: md-0
            replicas: 1
    
  2. Apply cluster manifests
    kubectl apply -f test-cluster-2.yaml
    

Cluster Autoscaler

This flavor adds auto-scaling via Cluster Autoscaler.

Specification

Control Plane | CNI    | Default OS   | Installs ClusterClass | IPv4 | IPv6
Kubeadm       | Cilium | Ubuntu 22.04 | No                    | Yes  | No

Prerequisites

Quickstart completed

Usage

  1. Set up autoscaling environment variables

    We recommend using Cluster Autoscaler with the Kubernetes control plane ... version for which it was meant.

    -- Releases · kubernetes/autoscaler

    export CLUSTER_AUTOSCALER_VERSION=v1.29.0
    # Optional: If specified, these values must be explicitly quoted!
    export WORKER_MACHINE_MIN='"1"'
    export WORKER_MACHINE_MAX='"10"'
    
  2. Generate cluster yaml

    clusterctl generate cluster test-cluster \
        --kubernetes-version v1.29.1 \
        --infrastructure linode-linode \
        --flavor cluster-autoscaler > test-cluster.yaml
    
  3. Apply cluster yaml

    kubectl apply -f test-cluster.yaml
    

K3s

Specification

Control Plane | CNI    | Default OS   | Installs ClusterClass | IPv4 | IPv6
k3s           | Cilium | Ubuntu 22.04 | No                    | Yes  | No

Prerequisites

  • Quickstart completed
  • Select a k3s Kubernetes version to use for the cluster
  • Install the k3s bootstrap provider into your management cluster
    • Add the following to ~/.cluster-api/clusterctl.yaml for the k3s bootstrap/control plane providers
      providers:
        - name: "k3s"
          url: https://github.com/k3s-io/cluster-api-k3s/releases/latest/bootstrap-components.yaml
          type: "BootstrapProvider"
        - name: "k3s"
          url: https://github.com/k3s-io/cluster-api-k3s/releases/latest/control-plane-components.yaml
          type: "ControlPlaneProvider"
          
      
    • Install the k3s provider into your management cluster
      clusterctl init --bootstrap k3s --control-plane k3s
      

Usage

  1. Generate cluster yaml
    clusterctl generate cluster test-cluster \
        --kubernetes-version v1.29.1+k3s2 \
        --infrastructure linode-linode \
        --flavor k3s > test-k3s-cluster.yaml
    
  2. Apply cluster yaml
    kubectl apply -f test-k3s-cluster.yaml
    

RKE2

This flavor uses RKE2 as the Kubernetes distribution. By default, it configures the cluster with the CIS profile:

Using the generic cis profile will ensure that the cluster passes the CIS benchmark (rke2-cis-1.XX-profile-hardened) associated with the Kubernetes version that RKE2 is running. For example, RKE2 v1.28.XX with the profile: cis will pass the rke2-cis-1.7-profile-hardened in Rancher.

Warning

Until this upstream PR is merged, CIS profile enabling will not work for RKE2 versions >= v1.29.

Specification

Control Plane | CNI    | Default OS   | Installs ClusterClass | IPv4 | IPv6
rke2          | Cilium | Ubuntu 22.04 | No                    | Yes  | No

Prerequisites

Quickstart completed

Usage

  1. Generate cluster yaml
    clusterctl generate cluster test-cluster \
        --kubernetes-version v1.29.1+rke2r1 \
        --infrastructure linode-linode \
        --flavor rke2 > test-rke2-cluster.yaml
    
  2. Apply cluster yaml
    kubectl apply -f test-rke2-cluster.yaml
    

VPCLess

This flavor supports provisioning k8s clusters outside of a VPC. It uses kubeadm to set up the control plane and Cilium with VXLAN for pod networking.

Specification

Control Plane | CNI    | Default OS   | Installs ClusterClass | IPv4 | IPv6
Kubeadm       | Cilium | Ubuntu 22.04 | No                    | Yes  | No

Prerequisites

Quickstart completed

Notes

This flavor is identical to the default flavor with the exception that it provisions k8s clusters without a VPC. Since it runs outside of a VPC, native routing is not supported in this flavor, and VXLAN is used for pod-to-pod communication.

Usage

  1. Generate cluster yaml
    clusterctl generate cluster test-cluster \
        --infrastructure linode-linode \
        --flavor vpcless > test-cluster.yaml
    
  2. Apply cluster yaml
    kubectl apply -f test-cluster.yaml
    

Etcd

This guide covers etcd configuration for the control plane of provisioned CAPL clusters.

Default configuration

The quota-backend-bytes for etcd is set to 8589934592 (8 GiB) per recommendation from the etcd documentation.

By default, etcd is configured to be on the same disk as the root filesystem on control plane nodes. If users prefer etcd to be on a separate disk, see the etcd-disk flavor.

ETCD Backups

By default, etcd is not backed up. To enable backups, users need to choose the etcd-backup-restore flavor.

To begin with, this flavor deploys a Linode OBJ bucket, which serves as the S3-compatible target for storing backups.

Then, when the cluster is provisioned, etcd-backup-restore is deployed as a StatefulSet. The pod needs the bucket details (name, region, endpoint, and access credentials), which are passed via the bucket-details secret that is created when the OBJ bucket is created.

Enabling SSE

Users can also enable SSE (server-side encryption) by passing an SSE AES-256 key as an environment variable. All environment variables on the pod can be controlled during the provisioning process.

Warning

This is currently under development and will be available for use once the upstream PR is merged and an official image is made available.

For example:

export CLUSTER_NAME=test
export OBJ_BUCKET_REGION=us-ord-1
export ETCDBR_IMAGE=docker.io/username/your-custom-image:version
export SSE_KEY=cdQdZ3PrKgm5vmqxeqwQCuAWJ7pPVyHg
clusterctl generate cluster $CLUSTER_NAME \
  --kubernetes-version v1.29.1 \
  --infrastructure linode-linode \
  --flavor etcd-backup-restore \
  | kubectl apply -f -

Backups

CAPL supports performing etcd backups by provisioning an Object Storage bucket and access keys. This feature is not enabled by default and can be configured as an addon.

Warning

Enabling this addon requires enabling Object Storage in the account where the resources will be provisioned. Please refer to the Pricing information in Linode's Object Storage documentation.

Enabling Backups

To enable backups, use the etcd-backup-restore flavor during provisioning:

clusterctl generate cluster $CLUSTER_NAME \
  --kubernetes-version v1.29.1 \
  --infrastructure linode-linode \
  --flavor etcd-backup-restore \
  | kubectl apply -f -

For more fine-grained control and further details about etcd backups, refer to the backups section of the etcd page.

Object Storage

Additionally, CAPL can be used to provision Object Storage buckets and access keys for general purposes by configuring a LinodeObjectStorageBucket resource.

Warning

Using this feature requires enabling Object Storage in the account where the resources will be provisioned. Please refer to the Pricing information in Linode's Object Storage documentation.

Bucket Creation

The following is the minimal required configuration needed to provision an Object Storage bucket and set of access keys.

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: LinodeObjectStorageBucket
metadata:
  name: <unique-bucket-label>
  namespace: <namespace>
spec:
  cluster: <object-storage-region>
  secretType: Opaque

Upon creation of the resource, CAPL will provision a bucket in the specified region, using the .metadata.name as the bucket's label.

Warning

The bucket label must be unique within the region across all accounts. Otherwise, CAPL will populate the resource status fields with errors to show that the operation failed.

Access Keys Creation

CAPL will also create read_write and read_only access keys for the bucket and store the credentials, along with other details about the Linode OBJ bucket, in a secret in the same namespace where the LinodeObjectStorageBucket was created:

apiVersion: v1
kind: Secret
metadata:
  name: <unique-bucket-label>-bucket-details
  namespace: <same-namespace-as-object-storage-bucket>
  ownerReferences:
    - apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
      kind: LinodeObjectStorageBucket
      name: <unique-bucket-label>
      controller: true
      uid: <unique-uid>
data:
  bucket_name: <unique-bucket-label>
  bucket_region: <linode-obj-bucket-region>
  bucket_endpoint: <hostname-to-access-bucket>
  access_key_rw: <base64-encoded-access-key>
  secret_key_rw: <base64-encoded-secret-key>
  access_key_ro: <base64-encoded-access-key>
  secret_key_ro: <base64-encoded-secret-key>

The bucket-details secret is owned and managed by CAPL during the life of the LinodeObjectStorageBucket.
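
The credentials can be read back from the managed secret with kubectl, for example to decode the read-write access key (a sketch using the key names shown above):

kubectl get secret <unique-bucket-label>-bucket-details \
  -o jsonpath='{.data.access_key_rw}' | base64 -d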

Access Keys Rotation

The following configuration with keyGeneration set to a new value (different from .status.lastKeyGeneration) will instruct CAPL to rotate the access keys.

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: LinodeObjectStorageBucket
metadata:
  name: <unique-bucket-label>
  namespace: <namespace>
spec:
  cluster: <object-storage-region>
  secretType: Opaque
  keyGeneration: 1
# status:
#   lastKeyGeneration: 0

Bucket Status

Upon successful provisioning of a bucket and keys, the LinodeObjectStorageBucket resource's status will resemble the following:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: LinodeObjectStorageBucket
metadata:
  name: <unique-bucket-label>
  namespace: <namespace>
spec:
  cluster: <object-storage-region>
  secretType: Opaque
  keyGeneration: 0
status:
  ready: true
  conditions:
    - type: Ready
      status: "True"
      lastTransitionTime: <timestamp>
  hostname: <hostname-for-bucket>
  creationTime: <bucket-creation-timestamp>
  lastKeyGeneration: 0
  keySecretName: <unique-bucket-label>-bucket-details
  accessKeyRefs:
    - <access-key-rw-id>
    - <access-key-ro-id>

Resource Deletion

When deleting a LinodeObjectStorageBucket resource, CAPL will deprovision the access keys and managed secret but retain the underlying bucket to avoid unintended data loss.

Multi-Tenancy

CAPL can manage multi-tenant workload clusters across Linode accounts. Custom resources may reference an optional Secret containing their Linode credentials (i.e. API token) to be used for the deployment of Linode resources (e.g. Linodes, VPCs, NodeBalancers, etc.) associated with the cluster.

The following example shows a basic credentials Secret:

apiVersion: v1
kind: Secret
metadata:
  name: linode-credentials
stringData:
  apiToken: <LINODE_TOKEN>

Warning

The Linode API token data must be put in a key named apiToken!
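
Equivalently, the Secret can be created directly with kubectl, which keeps the token out of manifest files. A sketch, reusing the LINODE_TOKEN environment variable and the required apiToken key:

kubectl create secret generic linode-credentials \
  --from-literal=apiToken=$LINODE_TOKEN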

This Secret may optionally be consumed by one or more custom resource objects:

# Example: LinodeCluster
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: LinodeCluster
metadata:
  name: test-cluster
spec:
  credentialsRef:
    name: linode-credentials
  ...
---
# Example: LinodeVPC
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: LinodeVPC
metadata:
  name: test-vpc
spec:
  credentialsRef:
    name: linode-credentials
  ...
---
# Example: LinodeMachine
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: LinodeMachine
metadata:
  name: test-machine
spec:
  credentialsRef:
    name: linode-credentials
  ...
---
# Example: LinodeObjectStorageBucket
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: LinodeObjectStorageBucket
metadata:
  name: test-bucket
spec:
  credentialsRef:
    name: linode-credentials
  ...

Secrets from other namespaces may be referenced by additionally specifying an optional .spec.credentialsRef.namespace value.
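
For example, a LinodeVPC in one namespace can reference a credentials Secret stored in another. This is a sketch; the namespace names are placeholders:

cat <<EOF | kubectl apply -f -
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: LinodeVPC
metadata:
  name: test-vpc
  namespace: team-a
spec:
  region: us-ord
  credentialsRef:
    name: linode-credentials
    namespace: credentials-store
EOF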

Warning

If .spec.credentialsRef is set for a LinodeCluster, it should also be set for adjacent resources (e.g. LinodeVPC).

LinodeMachine

For LinodeMachines, credentials set on the LinodeMachine object will override any credentials supplied by the owner LinodeCluster. This can allow cross-account deployment of the Linodes for a cluster.

Disks

This section contains information about OS and data disk configuration in Cluster API Provider Linode.

OS Disk

This section describes how to configure the root disk for a provisioned Linode. By default, the OS disk will be dynamically sized to use any space in the Linode plan that is not taken up by data disks.

Setting OS Disk Size

Use the osDisk section to specify the exact size of the OS disk. If this is not set, the OS disk is dynamically sized to the maximum allowed by the Linode plan, with any data disk sizes taken into account.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: LinodeMachineTemplate
metadata:
  name: ${CLUSTER}-control-plane
spec:
  template:
    spec:
      region: us-ord
      type: g6-standard-4
      osDisk:
        size: 100Gi



Setting OS Disk Label

The default label on the root OS disk can be overridden by specifying a label in the osDisk field. The label can only be set if an explicit size is also set, since size is a required field.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: LinodeMachineTemplate
metadata:
  name: ${CLUSTER}-control-plane
  namespace: default
spec:
  template:
    spec:
      image: ""
      region: us-ord
      type: g6-standard-4
      osDisk:
        label: root-disk
        size: 10Gi

Data Disks

This section describes how to specify additional data disks for a Linode instance. These disks can use devices sdb through sdh, for a total of 7 disks.

Warning

There are a couple of caveats when specifying disks for a Linode instance:

  1. The total size of these disks plus the OS disk cannot exceed the Linode instance plan size.
  2. Instance disk configuration is currently immutable via CAPL after the instance is booted.

Warning

Currently, sdb is used by a swap disk, so replacing it with a data disk will slow down Linode creation by up to 90 seconds. This will be resolved when the disk creation refactor is finished in PR #216.

Specify a data disk

A LinodeMachine can be configured with additional data disks, keyed by the device the disk is attached as, with an optional label and a size.

  • size Required field. A resource.Quantity for the size of the disk. The sum of all data disks must not exceed what the Linode plan allows.
  • label Optional field. The label for the disk; defaults to the device name.
  • diskID Optional field used by the controller to track disk IDs; this should not be set unless a disk is created outside CAPL.
  • filesystem Optional field used to specify the filesystem type of the disk to provision; the default is ext4, and valid options are any Linode-supported filesystem.
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: LinodeMachineTemplate
metadata:
  name: ${CLUSTER}-control-plane
spec:
  template:
    spec:
      region: us-ord
      type: g6-standard-4
      dataDisks:
        sdc:
          label: etcd_disk
          size: 16Gi
        sdd:
          label: data_disk
          size: 10Gi

Use a data disk for an explicit etcd data disk

The following configuration can be used to configure a separate disk for etcd data on control plane nodes.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: LinodeMachineTemplate
metadata:
  name: ${CLUSTER}-control-plane
spec:
  template:
    spec:
      region: us-ord
      type: g6-standard-4
      dataDisks:
        sdc:
          label: etcd_disk
          size: 16Gi

---
kind: KubeadmControlPlane
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
metadata:
  name: "${CLUSTER_NAME}-control-plane"
spec:
    diskSetup:
      filesystems:
        - label: etcd_data
          filesystem: ext4
          device: /dev/sdc
    mounts:
      - - LABEL=etcd_data
        - /var/lib/etcd_data

Machine Health Checks

CAPL supports auto-remediation of workload cluster Nodes considered to be unhealthy via MachineHealthChecks.

Enabling Machine Health Checks

While it is possible to manually create and apply a MachineHealthCheck resource into the management cluster, using the self-healing flavor is the quickest way to get started:

clusterctl generate cluster $CLUSTER_NAME \
  --kubernetes-version v1.29.1 \
  --infrastructure linode-linode \
  --flavor self-healing \
  | kubectl apply -f -

This flavor deploys a MachineHealthCheck for the workers and another MachineHealthCheck for the control plane of the cluster. It also configures the remediation strategy of the kubeadm control plane to prevent unnecessary load on the infrastructure provider.
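
You can confirm both checks exist and watch their status from the management cluster (a sketch):

kubectl get machinehealthchecks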

Configuring Machine Health Checks

Refer to the Cluster API documentation for further information on configuring and using MachineHealthChecks.

Auto-scaling

This guide covers auto-scaling for CAPL clusters. The recommended tool for auto-scaling on Cluster API is Cluster Autoscaler.

Flavor

The auto-scaling feature is provided by an add-on as part of the Cluster Autoscaler flavor.

Configuration

By default, the Cluster Autoscaler add-on runs in the management cluster, managing an external workload cluster.

+------------+             +----------+
|    mgmt    |             | workload |
| ---------- | kubeconfig  |          |
| autoscaler +------------>|          |
+------------+             +----------+

A separate Cluster Autoscaler is deployed for each workload cluster, configured to only monitor node groups for the specific namespace and cluster name combination.
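
The Cluster API provider for Cluster Autoscaler discovers node groups via the cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size and cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size annotations on scalable resources, so one way to check what will be autoscaled is to inspect the MachineDeployment annotations on the management cluster (a sketch):

kubectl get machinedeployments -o yaml | yq '.items[].metadata.annotations'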

Role-based Access Control (RBAC)

Management Cluster

Due to constraints with the Kubernetes RBAC system (i.e. roles cannot be subdivided beyond namespace-granularity), the Cluster Autoscaler add-on is deployed on the management cluster to prevent leaking Cluster API data between workload clusters.

Workload Cluster

Currently, the Cluster Autoscaler reuses the ${CLUSTER_NAME}-kubeconfig Secret generated by the bootstrap provider to interact with the workload cluster. The kubeconfig contents must be stored in a key named value. Due to this, all Cluster Autoscaler actions in the workload cluster are performed as the cluster-admin role.

Scale Down

Cluster Autoscaler decreases the size of the cluster when some nodes are consistently unneeded for a significant amount of time. A node is unneeded when it has low utilization and all of its important pods can be moved elsewhere.

By default, Cluster Autoscaler scales down a node after it is marked as unneeded for 10 minutes. This can be adjusted with the --scale-down-unneeded-time setting.

Kubernetes Cloud Controller Manager for Linode (CCM)

The Kubernetes Cloud Controller Manager for Linode is deployed on workload clusters and reconciles Kubernetes Node objects with their backing Linode infrastructure. When scaling down a node group, the Cluster Autoscaler also deletes the Kubernetes Node object on the workload cluster. This step preempts the Node-deletion in Kubernetes triggered by the CCM.

Additional Resources

VPC

This guide covers how VPC is used with CAPL clusters. By default, CAPL clusters are provisioned within a VPC.

Default configuration

Each Linode within a cluster is provisioned with two interfaces:

  1. eth0 (for public and nodebalancer traffic)
  2. eth1 (connected to VPC, for pod-to-pod traffic)

Key facts about VPC network configuration:

  1. VPCs are provisioned with a private subnet 10.0.0.0/8.
  2. All pod-to-pod communication happens over the VPC interface (eth1).
  3. We assign a pod CIDR range of 10.192.0.0/10 for pod-to-pod communication.
  4. By default, Cilium is configured with native routing.
  5. The Kubernetes host-scope IPAM mode is used to assign pod CIDRs to nodes. We run the Linode CCM with the route-controller enabled, which automatically adds/updates routes within the VPC when pod CIDRs are added/updated by Kubernetes. This makes pod-to-pod traffic routable within the VPC.
  6. kube-proxy is disabled by default.

How VPC is provisioned

A VPC is tied to a region. CAPL generates a LinodeVPC manifest which contains the VPC name, region, and subnet information. By default, the VPC name is set to the cluster name but can be overridden by specifying the relevant environment variable.

---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha1
kind: LinodeVPC
metadata:
  name: ${VPC_NAME:=${CLUSTER_NAME}}
  labels:
    cluster.x-k8s.io/cluster-name: ${CLUSTER_NAME}
spec:
  region: ${LINODE_REGION}
  subnets:
    - ipv4: 10.0.0.0/8
      label: default

A reference to the LinodeVPC object is added to the LinodeCluster object, which then uses the specified VPC to provision resources.
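
You can inspect the generated VPC resource from the management cluster, for example by filtering on the cluster-name label shown in the manifest above (a sketch):

kubectl get linodevpcs -l cluster.x-k8s.io/cluster-name=$CLUSTER_NAME -o yaml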

Troubleshooting

If pod-to-pod connectivity is failing

If a pod can't ping pod IPs on a different node, check and make sure the pod CIDRs are added to the ip_ranges of the VPC interface.

curl --header "Authorization: Bearer $LINODE_API_TOKEN" -X GET https://api.linode.com/v4/linode/instances/${LINODEID}/configs | jq .data[0].interfaces[].ip_ranges

Note

The CIDR returned in the output of the above command should match the pod CIDR present in the node's spec: kubectl get node <nodename> -o yaml | yq .spec.podCIDRs

Running cilium connectivity tests

You can also run Cilium connectivity tests to make sure networking works within the VPC. Follow the steps defined in the cilium e2e tests guide to install the cilium binary, set the KUBECONFIG variable, and then run the Cilium connectivity tests.

Firewalling

This guide covers how Cilium can be set up to act as a host firewall on CAPL clusters.

Default Configuration

By default, the following policies are set to audit mode (without any enforcement) on CAPL clusters:

  • Kubeadm cluster allow rules

    Ports     | Use-case                 | Allowed clients
    6443      | API Server Traffic       | World
    2379-2380 | Etcd Traffic             | World
    *         | In Cluster Communication | Intra Cluster Traffic
  • k3s cluster allow rules

    Ports | Use-case                 | Allowed clients
    6443  | API Server Traffic       | World
    *     | In Cluster Communication | Intra Cluster and VPC Traffic
  • RKE2 cluster allow rules

    Ports | Use-case                 | Allowed clients
    6443  | API Server Traffic       | World
    *     | In Cluster Communication | Intra Cluster and VPC Traffic

Enabling Firewall Enforcement

In order to turn the Cilium network policy from audit to enforce mode, set the environment variable FW_AUDIT_ONLY=false when generating the cluster. This disables policy-audit-mode on the Cilium deployment.
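
For example, to generate a cluster with enforcement turned on (a sketch using the default flavor):

export FW_AUDIT_ONLY=false
clusterctl generate cluster $CLUSTER_NAME \
  --kubernetes-version v1.29.1 \
  --infrastructure linode-linode \
  | kubectl apply -f -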

Adding Additional Rules

Additional rules can be added to the default-policy:

apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "default-external-policy"
spec:
  description: "allow cluster intra cluster traffic along api server traffic"
  nodeSelector: {}
  ingress:
    - fromEntities:
        - cluster
    - fromCIDR:
        - 10.0.0.0/8
    - fromEntities:
        - world
      toPorts:
        - ports:
            - port: "22" # added for SSH Access to the nodes
            - port: "6443"

Alternatively, additional rules can be added by creating a new policy:

apiVersion: "cilium.io/v2"
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: "ssh-access-policy"
spec:
  description: "allows ssh access to nodes"
  nodeSelector: {}
  ingress:
    - fromEntities:
        - world
      toPorts:
        - ports:
            - port: "22"

Developing Cluster API Provider Linode

Contents

Setting up

Base requirements

Warning

Ensure you have your LINODE_TOKEN set as outlined in the getting started prerequisites section.

There are no strict requirements, since development dependencies are fetched as needed via the make targets, but we recommend installing Devbox; see the steps below.

Clone the source code

git clone https://github.com/linode/cluster-api-provider-linode
cd cluster-api-provider-linode

Enable git hooks

To enable automatic code validation on code push, execute the following commands:

PATH="$PWD/bin:$PATH" make husky && husky install

If you would like to temporarily disable the git hook, set the SKIP_GIT_PUSH_HOOK value:

SKIP_GIT_PUSH_HOOK=1 git push
  1. Install dependent packages in your project

    devbox install
    

    This will take a while; go grab a drink of water.

  2. Use devbox environment

    devbox shell
    

From this point you can use the devbox shell like a regular shell. The rest of the guide assumes a devbox shell is used, but the make target dependencies will install any missing dependencies if needed when running outside a devbox shell.

Get familiar with basic concepts

This provider is based on the Cluster API project. It's recommended to familiarize yourself with Cluster API resources, concepts, and conventions outlined in the Cluster API Book.

Developing

This repository uses Go Modules to track and vendor dependencies.

To pin a new dependency, run:

go get <repository>@<version>
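
After adding or upgrading a dependency, keep go.mod and go.sum consistent:

go mod tidy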

Code Overview

The code in this repo is organized across the following packages:

  • /api contains the custom resource types managed by CAPL.
  • /cmd contains the main entrypoint for registering controllers and running the controller manager.
  • /controller contains the various controllers that run in CAPL for reconciling the custom resource types.
  • /cloud/scope contains all Kubernetes client interactions scoped to each resource reconciliation loop. Each "scope" object is expected to store both a Kubernetes client and a Linode client.
  • /cloud/services contains all Linode client interactions. Functions defined in this package all expect a "scope" object which contains a Linode client to use.
  • /mock contains gomock clients generated from /cloud/scope/client.go.
  • /util/ contains general-use helper functions used in other packages.
  • /util/reconciler contains helper functions and constants used within the /controller package.

When adding a new controller, it is preferable that controller code only use the Kubernetes and Linode clients via functions defined in /cloud/scope and /cloud/services. This ensures each separate package can be tested in isolation using mock clients.

Using tilt

Note

If you want to create RKE2 and/or K3s clusters, make sure to set the following env vars first:

export INSTALL_RKE2_PROVIDER=true
export INSTALL_K3S_PROVIDER=true

Additionally, if you want to skip the docker build step for CAPL and instead use the latest image from main on Docker Hub, set the following:

export SKIP_DOCKER_BUILD=true

To build a kind cluster and start Tilt, simply run:

make local-deploy

Once your kind management cluster is up and running, you can deploy a workload cluster.

To tear down the tilt-cluster, run

kind delete cluster --name tilt

Deploying a workload cluster

After your kind management cluster is up and running with Tilt, you should be ready to deploy your first cluster.

Generating local cluster templates

For local development, templates should be generated via:

make local-release

This creates infrastructure-local-linode/v0.0.0/ with all the cluster templates:

infrastructure-local-linode/v0.0.0
├── cluster-template-clusterclass-kubeadm.yaml
├── cluster-template-etcd-backup-restore.yaml
├── cluster-template-k3s.yaml
├── cluster-template-rke2.yaml
├── cluster-template.yaml
├── clusterclass-kubeadm.yaml
├── infrastructure-components.yaml
└── metadata.yaml

This can then be used with clusterctl by adding the following to ~/.clusterctl/cluster-api.yaml (assuming the repo exists in the $HOME directory):

providers:
  - name: local-linode
    url: ${HOME}/cluster-api-provider-linode/infrastructure-local-linode/v0.0.0/infrastructure-components.yaml
    type: InfrastructureProvider

Customizing the cluster deployment

Here is a list of required configuration parameters:

## Cluster settings
export CLUSTER_NAME=capl-cluster

## Linode settings
export LINODE_REGION=us-ord
# Multi-tenancy: This may be changed for each cluster to deploy to different Linode accounts.
export LINODE_TOKEN=<your linode PAT>
export LINODE_CONTROL_PLANE_MACHINE_TYPE=g6-standard-2
export LINODE_MACHINE_TYPE=g6-standard-2

Tip

You can also use clusterctl generate to see which variables need to be set:

clusterctl generate cluster $CLUSTER_NAME --infrastructure local-linode:v0.0.0 [--flavor <flavor>] --list-variables

Creating the workload cluster

Using the default flavor

Once you have all the necessary environment variables set, you can deploy a workload cluster with the default flavor:

clusterctl generate cluster $CLUSTER_NAME \
  --kubernetes-version v1.29.1 \
  --infrastructure local-linode:v0.0.0 \
  | kubectl apply -f -

This will provision the cluster within a VPC with the CNI defaulted to Cilium and the linode-ccm installed.
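
Once provisioning finishes, you can pull the workload cluster's kubeconfig and confirm the nodes come up (a sketch):

clusterctl get kubeconfig $CLUSTER_NAME > $CLUSTER_NAME.kubeconfig
kubectl --kubeconfig $CLUSTER_NAME.kubeconfig get nodes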

Using ClusterClass (alpha)

The ClusterClass experimental feature is enabled by default in the KIND management cluster created via make tilt-cluster.

You can use the clusterclass flavor to create a workload cluster as well, assuming the management cluster has the ClusterTopology feature gate set:

clusterctl generate cluster $CLUSTER_NAME \
  --kubernetes-version v1.29.1 \
  --infrastructure local-linode:v0.0.0 \
  --flavor clusterclass-kubeadm \
  | kubectl apply -f -

For any issues, please refer to the troubleshooting guide.

Cleaning up the workload cluster

To delete the cluster, simply run:

kubectl delete cluster $CLUSTER_NAME

Warning

VPCs are not deleted when a cluster is deleted using kubectl. Run kubectl delete linodevpc <vpcname> to clean up the VPC once the cluster is deleted.

For any issues, please refer to the troubleshooting guide.

Automated Testing

E2E Testing

To run E2E locally run:

# Required env vars to run e2e tests
export INSTALL_K3S_PROVIDER=true
export INSTALL_RKE2_PROVIDER=true
export LINODE_REGION=us-sea
export LINODE_CONTROL_PLANE_MACHINE_TYPE=g6-standard-2
export LINODE_MACHINE_TYPE=g6-standard-2

# IMPORTANT: Set linode, k3s, and rke2 providers in this config file.
# Find an example at e2e/gha-clusterctl-config.yaml
export CLUSTERCTL_CONFIG=~/.cluster-api/clusterctl.yaml

make e2etest

This command creates a KIND cluster and executes all the defined tests.

For more details on E2E tests, please refer to E2E Testing.


CAPL Releases

Release Cadence

CAPL currently has no set release cadence.

Bug Fixes

Any significant user-facing bug fix that lands in the main branch should be backported to the current and previous release lines.

Versioning Scheme

CAPL follows the semantic versioning specification.

Example versions:

  • Pre-release: v0.1.1-alpha.1
  • Minor release: v0.1.0
  • Patch release: v0.1.1
  • Major release: v1.0.0

Release Process

Update metadata.yaml (skip for patch releases)

  • Make sure metadata.yaml is up-to-date and contains the new release with the correct Cluster API contract version.
    • If not, open a PR to add it.

Release in GitHub

  • Create a new release.
    • Enter the tag and select "create tag on publish"
    • Make sure to click "Generate Release Notes"
    • Review the generated Release Notes and make any necessary changes.
    • If the tag is a pre-release, make sure to check the "Set as a pre-release" box.

Expected artifacts

  • An infrastructure-components.yaml file containing the resources needed to deploy to Kubernetes
  • A cluster-templates-*.yaml file for each supported flavor
  • A metadata.yaml file which maps release series to the Cluster API contract version

Communication

  1. Announce the release in the Kubernetes Slack on the #linode channel

CAPL Testing

Unit Tests

Executing Tests

In order to run the unit tests, run the following command:

make test

Creating Tests

General unit tests of functions follow the same conventions for testing using Go's testing standard library, along with the testify toolkit for making assertions.

Unit tests that require API clients use mock clients generated using gomock. To simplify the usage of mock clients, this repo also uses an internal library defined in mock/mocktest.

mocktest is usually imported as a dot import along with the mock package:

import (
  "github.com/linode/cluster-api-provider-linode/mock"

  . "github.com/linode/cluster-api-provider-linode/mock/mocktest"
)

Using mocktest involves creating a test suite that specifies the mock clients to be used within each test scope and running the test suite using a DSL for defining test nodes that belong to one or more test paths.

Example

The following is a contrived example using the mock Linode machine client.

Let's say we've written an idempotent function EnsureInstanceNotOffline that 1) gets an instance, or creates it if it doesn't exist, and 2) boots the instance if it's offline. Testing this function would mean we'd need to write test cases for all permutations, i.e.

  • instance exists and is not offline
  • instance exists but is offline, and is able to boot
  • instance exists but is offline, and is not able to boot
  • instance does not exist, and is not able to be created
  • instance does not exist, and is able to be created, and is able to boot
  • instance does not exist, and is able to be created, and is not able to boot

While writing test cases for each scenario, we'd likely find a lot of overlap between them. mocktest provides a DSL for defining each unique test case without needing to spell out all required mock client calls for each case. Here's how we could test EnsureInstanceNotOffline using mocktest:

func TestEnsureInstanceNotOffline(t *testing.T) {
  suite := NewSuite(t, mock.MockLinodeMachineClient{})
  
  suite.Run(
    OneOf(
      Path(
        Call("instance exists and is not offline", func(ctx context.Context, mck Mock) {
          mck.MachineClient.EXPECT().GetInstance(ctx, /* ... */).Return(&linodego.Instance{Status: linodego.InstanceRunning}, nil)
        }),
        Result("success", func(ctx context.Context, mck Mock) {
          inst, err := EnsureInstanceNotOffline(ctx, /* ... */)
          require.NoError(t, err)
          assert.Equal(t, inst.Status, linodego.InstanceRunning)
        }),
      ),
      Path(
        Call("instance does not exist", func(ctx context.Context, mck Mock) {
          mck.MachineClient.EXPECT().GetInstance(ctx, /* ... */).Return(nil, linodego.Error{Code: 404})
        }),
        OneOf(
          Path(Call("able to be created", func(ctx context.Context, mck Mock) {
            mck.MachineClient.EXPECT().CreateInstance(ctx, /* ... */).Return(&linodego.Instance{Status: linodego.InstanceOffline}, nil)
          })),
          Path(
            Call("not able to be created", func(ctx context.Context, mck Mock) {/* ... */}),
            Result("error", func(ctx context.Context, mck Mock) {
              inst, err := EnsureInstanceNotOffline(ctx, /* ... */)
              require.ErrorContains(t, err, "instance was not booted: failed to create instance: reasons...")
              assert.Empty(inst)
            }),
          ),
        ),
      ),
      Path(Call("instance exists but is offline", func(ctx context.Context, mck Mock) {
        mck.MachineClient.EXPECT().GetInstance(ctx, /* ... */).Return(&linodego.Instance{Status: linodego.InstanceOffline}, nil)
      })),
    ),
    OneOf(
      Path(
        Call("able to boot", func(ctx context.Context, mck Mock) {/*  */}),
        Result("success", func(ctx context.Context, mck Mock) {
          inst, err := EnsureInstanceNotOffline(ctx, /* ... */)
          require.NoError(t, err)
          assert.Equal(t, inst.Status, linodego.InstanceBooting)
        }),
      ),
      Path(
        Call("not able to boot", func(ctx context.Context, mck Mock) {/* returns API error */}),
        Result("error", func(ctx context.Context, mck Mock) {
          inst, err := EnsureInstanceNotOffline(/* options */)
          require.ErrorContains(t, err, "instance was not booted: boot failed: reasons...")
          assert.Empty(inst)
        }),
      ),
    ),
  )
}

In this example, the nodes passed into Run are used to describe each permutation of the function being called with different results from the mock Linode machine client.

Nodes

  • Call describes the behavior of method calls by mock clients. A Call node can belong to one or more paths.
  • Result invokes the function with mock clients and tests the output. A Result node terminates each path it belongs to.
  • OneOf is a collection of diverging paths that will be evaluated in separate test cases.
  • Path is a collection of nodes that all belong to the same test path. Each child node of a Path is evaluated in order. Note that Path is only needed for logically grouping and isolating nodes within different test cases in a OneOf node.

Setup, tear down, and event triggers

Setup and tear down nodes can be scheduled before and after each run. suite.BeforeEach receives a func(context.Context, Mock) function that will run before each path is evaluated. Likewise, suite.AfterEach will run after each path is evaluated.

In addition to the path nodes listed in the section above, a special node type Once may be specified to inject a function that will only be evaluated one time across all paths. It can be used to trigger side effects outside of mock client behavior that can impact the output of the function being tested.

Control flow

When Run is called on a test suite, paths are evaluated in parallel using t.Parallel(). Each path will be run with a separate t.Run call, and each test run will be named according to the descriptions specified in each node.

To help with visualizing the paths that will be rendered from nodes, a DescribePaths helper function can be called which returns a slice of strings describing each path. For instance, the following shows the output of DescribePaths on the paths described in the example above:

DescribePaths(/* nodes... */) /* [
  "instance exists and is not offline > success",
  "instance does not exist > not able to be created > error",
  "instance does not exist > able to be created > able to boot > success",
  "instance does not exist > able to be created > not able to boot > error",
  "instance exists but is offline > able to boot > success",
  "instance exists but is offline > not able to boot > error"
] */

Testing controllers

CAPL uses controller-runtime's envtest package which runs an instance of etcd and the Kubernetes API server for testing controllers. The test setup uses ginkgo as its test runner as well as gomega for assertions.

mocktest is also recommended when writing tests for controllers. The following is another contrived example of how to use its controller suite:

var _ = Describe("linode creation", func() {
  // Create a mocktest controller suite.
  suite := NewControllerSuite(GinkgoT(), mock.MockLinodeMachineClient{})

  obj := infrav1alpha1.LinodeMachine{
    ObjectMeta: metav1.ObjectMeta{/* ... */},
    Spec: infrav1alpha1.LinodeMachineSpec{/* ... */},
  }

  suite.Run(
    Once("create resource", func(ctx context.Context, _ Mock) {
      // Use the EnvTest k8sClient to create the resource in the test server
      Expect(k8sClient.Create(ctx, &obj)).To(Succeed())
    }),
    Call("create a linode", func(ctx context.Context, mck Mock) {
      mck.MachineClient.EXPECT().CreateInstance(ctx, gomock.Any(), gomock.Any()).Return(&linodego.Instance{/* ... */}, nil)
    }),
    Result("update the resource status after linode creation", func(ctx context.Context, mck Mock) {
      reconciler := LinodeMachineReconciler{
        // Configure the reconciler to use the mock client for this test path
        LinodeClient: mck.MachineClient,
        // Use a managed recorder for capturing events published during this test
        Recorder: mck.Recorder(),
        // Use a managed logger for capturing logs written during the test
        // Note: This isn't a real struct field in LinodeMachineReconciler. A logger is configured elsewhere.
        Logger: mck.Logger(),
      }

      _, err := reconciler.Reconcile(ctx, reconcile.Request{/* ... */})
      Expect(err).NotTo(HaveOccurred())
      
      // Fetch the updated object in the test server and confirm it was updated
      Expect(k8sClient.Get(ctx, client.ObjectKeyFromObject(&obj), &obj)).To(Succeed())
      Expect(obj.Status.Ready).To(BeTrue())

      // Check for expected events and logs
      Expect(mck.Events()).To(ContainSubstring("Linode created!"))
      Expect(mck.Logs()).To(ContainSubstring("Linode created!"))
    }),
  )
})

E2E Tests

For e2e tests, CAPL uses the Chainsaw project, which leverages kind and tilt to spin up a cluster with the CAPL controllers installed and then uses chainsaw-test.yaml files to drive e2e testing.

All tests live in the e2e folder with a directory structure of e2e/${COMPONENT}/${TEST_NAME}.

Running Tests

In order to run the e2e tests, run the following commands:

# Required env vars to run e2e tests
export INSTALL_K3S_PROVIDER=true
export INSTALL_RKE2_PROVIDER=true
export LINODE_REGION=us-sea
export LINODE_CONTROL_PLANE_MACHINE_TYPE=g6-standard-2
export LINODE_MACHINE_TYPE=g6-standard-2

# IMPORTANT: Set linode, k3s, and rke2 providers in this config file.
# Find an example at e2e/gha-clusterctl-config.yaml
export CLUSTERCTL_CONFIG=~/.cluster-api/clusterctl.yaml

make e2etest

Note: By default, make e2etest runs all the e2e tests defined under the e2e dir.

In order to run a specific test, you need to pass flags to chainsaw by setting the env var E2E_SELECTOR.

Additional settings can be passed to chainsaw by setting the env var E2E_FLAGS.

Example: Only running e2e tests for flavors (default, k3s, rke2)

make e2etest E2E_SELECTOR='flavors' E2E_FLAGS='--assert-timeout 10m0s'

Note: We need to bump the assert timeout up to 10 minutes to allow the cluster to finish building and become available.

There are other selectors you can use to invoke specific tests. Please look at the table below for all the available selectors:

Tests                            | Selector
All Tests                        | all
All Controllers                  | quick
All Flavors (default, k3s, rke2) | flavors
K3S Cluster                      | k3s
RKE2 Cluster                     | rke2
Default (kubeadm) Cluster        | default-cluster
Linode Cluster Controller        | linodecluster
Linode Machine Controller        | linodemachine
Linode Obj Controller            | linodeobj
Linode VPC Controller            | linodevpc

Note: For any flavor e2e tests, please set the required env variables

Adding Tests

  1. Create a new directory under the controller you are testing with the naming scheme of e2e/${COMPONENT}/${TEST_NAME}
  2. Create a minimal chainsaw-test.yaml file in the new test dir
    # yaml-language-server: $schema=https://raw.githubusercontent.com/kyverno/chainsaw/main/.schemas/json/test-chainsaw-v1alpha1.json
    apiVersion: chainsaw.kyverno.io/v1alpha1
    kind: Test
    metadata:
      name: $TEST_NAME
    spec:
      template: true # set to true if you are going to use any chainsaw templating
      steps:
      - name: step-01
        try:
        - apply:
            file: ${resource_name}.yaml
        - assert:
            file: 01-assert.yaml
    
  3. Add any resources to create or assert on in the same directory

Reference

For reference documentation for CAPL API types, please refer to the godocs