summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
-rw-r--r--BINARIES.md680
-rw-r--r--BOOTSTRAP.md744
-rw-r--r--CERTIFICATES.md506
-rw-r--r--ISO-BUILDER.md849
4 files changed, 2779 insertions, 0 deletions
diff --git a/BINARIES.md b/BINARIES.md
new file mode 100644
index 0000000..9e9d21b
--- /dev/null
+++ b/BINARIES.md
@@ -0,0 +1,680 @@
+# Binary Acquisition and Packaging
+
+## Overview
+
+This document specifies exact versions, download sources, and installation paths for all software components in the cluster-from-systemd project.
+
+## Design Decision: Hybrid Package Strategy
+
+**Approach**: Use distro packages where stable, upstream binaries for latest features.
+
+**Rationale**:
+- **Distro packages** (dnf/yum): Base system, containerd, utilities (reliable, security updates)
+- **Upstream binaries**: Kubernetes, etcd (need specific versions, faster release cycle)
+- **Container images**: Kafka, Ceph (easier to version-lock, upstream-supported)
+- **Compiled from source**: Only if unavoidable (minimize complexity)
+
+## Base Distribution
+
+**Choice**: Rocky Linux 9.3 (or Fedora 39)
+
+**Rationale**:
+- RHEL-compatible (enterprise acceptance)
+- systemd 252+ (needed features)
+- SELinux integration
+- Long support lifecycle (Rocky: ~10 years)
+
+**Alternative**: Fedora 39 for bleeding-edge testing, migrate to Rocky for production.
+
+## Component Versions (2025-01-15 snapshot)
+
+### Kubernetes Cluster
+- **Kubernetes**: v1.29.1 (upstream binaries)
+- **etcd**: v3.5.11 (upstream binaries)
+- **containerd**: 1.7.11 (distro package)
+- **runc**: 1.1.10 (distro package)
+- **CNI plugins**: v1.4.0 (upstream binaries)
+- **Calico**: v3.27.0 (manifest/container images)
+
+### Storage (Ceph)
+- **Ceph**: Reef 18.2.1 (distro package from CentOS SIG)
+- **ceph-common**: 18.2.1
+- **ceph-mon**: 18.2.1
+- **ceph-osd**: 18.2.1
+- **ceph-mds**: 18.2.1
+
+### Messaging
+- **Kafka**: 3.6.1 (upstream tarball)
+- **OpenJDK**: 17.0.9 (distro package - required by Kafka)
+- **Mosquitto**: 2.0.18 (distro package)
+
+### DNS
+- **CoreDNS**: v1.11.1 (upstream binary)
+
+### Utilities
+- **yq**: v4.40.5 (YAML processor)
+- **jq**: 1.6 (distro package)
+- **curl**: latest (distro)
+- **openssl**: latest (distro)
+
+## Binary Installation Paths
+
+```
+/usr/local/bin/
+├── kubeadm # Kubernetes cluster bootstrapper
+├── kubectl # Kubernetes CLI
+├── kubelet # Kubernetes node agent
+├── kube-apiserver # Kubernetes API server
+├── kube-controller-manager # Kubernetes controller
+├── kube-scheduler # Kubernetes scheduler
+├── etcd # etcd binary
+├── etcdctl # etcd CLI
+├── coredns # CoreDNS binary
+└── yq # YAML processor
+
+/usr/bin/ (via distro packages)
+├── containerd
+├── ctr
+├── runc
+├── ceph
+├── ceph-mon
+├── ceph-osd
+├── mosquitto
+├── mosquitto_passwd
+├── java (OpenJDK)
+└── jq
+
+/opt/
+├── kafka/ # Kafka installation
+│ ├── bin/
+│ │ ├── kafka-server-start.sh
+│ │ ├── kafka-topics.sh
+│ │ └── ...
+│ ├── config/
+│ │ └── kraft/
+│ └── libs/
+└── cni/ # CNI plugins
+ └── bin/
+ ├── bridge
+ ├── host-local
+ ├── loopback
+ └── ...
+```
+
+## Download Sources and Installation
+
+### 1. Kubernetes Binaries
+
+**Version**: v1.29.1
+**Source**: https://github.com/kubernetes/kubernetes/releases
+**Architecture**: amd64 (x86_64)
+
+```bash
+#!/bin/bash
+# tools/download-kubernetes.sh
+
+KUBE_VERSION="v1.29.1"
+ARCH="amd64"
+BASE_URL="https://dl.k8s.io/release/${KUBE_VERSION}/bin/linux/${ARCH}"
+
+BINARIES=(
+ "kubeadm"
+ "kubectl"
+ "kubelet"
+ "kube-apiserver"
+ "kube-controller-manager"
+ "kube-scheduler"
+ "kube-proxy"
+)
+
+for binary in "${BINARIES[@]}"; do
+ echo "Downloading $binary..."
+ curl -L "${BASE_URL}/${binary}" -o "/usr/local/bin/${binary}"
+ chmod +x "/usr/local/bin/${binary}"
+done
+
+# Verify
+kubelet --version
+kubectl version --client
+```
+
+**Checksums**: Download and verify
+```bash
+curl -L "${BASE_URL}/kubelet.sha256" -o kubelet.sha256
+echo "$(cat kubelet.sha256) /usr/local/bin/kubelet" | sha256sum --check
+```
+
+### 2. etcd
+
+**Version**: v3.5.11
+**Source**: https://github.com/etcd-io/etcd/releases
+
+```bash
+#!/bin/bash
+# tools/download-etcd.sh
+
+ETCD_VERSION="v3.5.11"
+ARCH="amd64"
+TARBALL="etcd-${ETCD_VERSION}-linux-${ARCH}.tar.gz"
+URL="https://github.com/etcd-io/etcd/releases/download/${ETCD_VERSION}/${TARBALL}"
+
+curl -L "$URL" -o "/tmp/${TARBALL}"
+tar xzf "/tmp/${TARBALL}" -C /tmp
+mv /tmp/etcd-${ETCD_VERSION}-linux-${ARCH}/etcd /usr/local/bin/
+mv /tmp/etcd-${ETCD_VERSION}-linux-${ARCH}/etcdctl /usr/local/bin/
+chmod +x /usr/local/bin/etcd /usr/local/bin/etcdctl
+
+# Verify
+etcd --version
+etcdctl version
+```
+
+### 3. containerd + runc (Distro Packages)
+
+**Source**: Rocky Linux 9 AppStream repository
+
+```bash
+#!/bin/bash
+# tools/install-container-runtime.sh
+
+dnf install -y \
+ containerd.io \
+ runc \
+ containernetworking-plugins
+
+# Or from Docker repo for newer version
+dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
+dnf install -y containerd.io
+
+# Configure containerd
+mkdir -p /etc/containerd
+containerd config default > /etc/containerd/config.toml
+
+# Enable systemd cgroup driver (required for Kubernetes)
+sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
+
+systemctl enable containerd
+```
+
+### 4. CNI Plugins
+
+**Version**: v1.4.0
+**Source**: https://github.com/containernetworking/plugins/releases
+
+```bash
+#!/bin/bash
+# tools/download-cni-plugins.sh
+
+CNI_VERSION="v1.4.0"
+ARCH="amd64"
+CNI_TARBALL="cni-plugins-linux-${ARCH}-${CNI_VERSION}.tgz"
+URL="https://github.com/containernetworking/plugins/releases/download/${CNI_VERSION}/${CNI_TARBALL}"
+
+mkdir -p /opt/cni/bin
+curl -L "$URL" -o "/tmp/${CNI_TARBALL}"
+tar xzf "/tmp/${CNI_TARBALL}" -C /opt/cni/bin
+
+chmod +x /opt/cni/bin/*
+ls -lh /opt/cni/bin/
+```
+
+### 5. Calico CNI
+
+**Version**: v3.27.0
+**Source**: https://docs.tigera.io/calico/latest/getting-started/kubernetes/self-managed-onprem/onpremises
+
+**Installation method**: Kubernetes manifest (applied after cluster init)
+
+```bash
+#!/bin/bash
+# tools/install-calico.sh
+# Run AFTER first master is initialized
+
+CALICO_VERSION="v3.27.0"
+kubectl apply -f "https://raw.githubusercontent.com/projectcalico/calico/${CALICO_VERSION}/manifests/calico.yaml"
+```
+
+**Embedded in ISO**: Download manifest at build time, apply at bootstrap.
+
+```bash
+# At ISO build time
+mkdir -p /etc/kubernetes/manifests/calico
+curl -L "https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml" \
+ -o /etc/kubernetes/manifests/calico/calico.yaml
+```
+
+### 6. Ceph (Distro Packages)
+
+**Version**: Reef 18.2.1
+**Source**: CentOS Storage SIG repository
+
+```bash
+#!/bin/bash
+# tools/install-ceph.sh
+
+# Add Ceph repository
+cat > /etc/yum.repos.d/ceph.repo <<EOF
+[ceph]
+name=Ceph packages for \$basearch
+baseurl=https://download.ceph.com/rpm-reef/el9/\$basearch
+enabled=1
+gpgcheck=1
+gpgkey=https://download.ceph.com/keys/release.asc
+
+[ceph-noarch]
+name=Ceph noarch packages
+baseurl=https://download.ceph.com/rpm-reef/el9/noarch
+enabled=1
+gpgcheck=1
+gpgkey=https://download.ceph.com/keys/release.asc
+EOF
+
+# Install Ceph packages
+dnf install -y \
+ ceph-common \
+ ceph-mon \
+ ceph-osd \
+ ceph-mds \
+ ceph-mgr \
+ python3-ceph-argparse \
+ python3-cephfs
+
+# Verify
+ceph --version
+```
+
+### 7. Kafka
+
+**Version**: 3.6.1
+**Source**: https://kafka.apache.org/downloads
+
+```bash
+#!/bin/bash
+# tools/install-kafka.sh
+
+KAFKA_VERSION="3.6.1"
+SCALA_VERSION="2.13"
+TARBALL="kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz"
+URL="https://archive.apache.org/dist/kafka/${KAFKA_VERSION}/${TARBALL}"
+
+# Install Java (required)
+dnf install -y java-17-openjdk-headless
+
+# Download and extract Kafka
+curl -L "$URL" -o "/tmp/${TARBALL}"
+tar xzf "/tmp/${TARBALL}" -C /opt/
+mv "/opt/kafka_${SCALA_VERSION}-${KAFKA_VERSION}" /opt/kafka
+
+# Create symlink for easy updates
+ln -sf /opt/kafka /opt/kafka-current
+
+# Verify
+/opt/kafka/bin/kafka-topics.sh --version
+```
+
+### 8. Mosquitto (Distro Package)
+
+**Version**: 2.0.18
+**Source**: EPEL repository
+
+```bash
+#!/bin/bash
+# tools/install-mosquitto.sh
+
+# Enable EPEL
+dnf install -y epel-release
+
+# Install Mosquitto
+dnf install -y mosquitto mosquitto-clients
+
+# Verify
+mosquitto -h | head -1
+```
+
+### 9. CoreDNS
+
+**Version**: v1.11.1
+**Source**: https://github.com/coredns/coredns/releases
+
+```bash
+#!/bin/bash
+# tools/download-coredns.sh
+
+COREDNS_VERSION="1.11.1"
+ARCH="amd64"
+TARBALL="coredns_${COREDNS_VERSION}_linux_${ARCH}.tgz"
+URL="https://github.com/coredns/coredns/releases/download/v${COREDNS_VERSION}/${TARBALL}"
+
+curl -L "$URL" -o "/tmp/${TARBALL}"
+tar xzf "/tmp/${TARBALL}" -C /tmp
+mv /tmp/coredns /usr/local/bin/
+chmod +x /usr/local/bin/coredns
+
+# Verify
+coredns -version
+```
+
+### 10. Utilities
+
+```bash
+#!/bin/bash
+# tools/install-utilities.sh
+
+# Install from distro repos
+dnf install -y \
+ jq \
+ curl \
+ openssl \
+ iproute \
+ iputils \
+ bind-utils \
+ wget \
+ vim \
+ tmux \
+ htop
+
+# Install yq (YAML processor)
+YQ_VERSION="v4.40.5"
+YQ_BINARY="yq_linux_amd64"
+curl -L "https://github.com/mikefarah/yq/releases/download/${YQ_VERSION}/${YQ_BINARY}" \
+ -o /usr/local/bin/yq
+chmod +x /usr/local/bin/yq
+
+# Verify
+yq --version
+jq --version
+```
+
+## Complete Installation Script
+
+**Script**: `tools/install-all-binaries.sh`
+
+```bash
+#!/bin/bash
+# Master script to install all binaries in correct order
+
+set -euo pipefail
+
+echo "==> Installing base system packages..."
+dnf update -y
+dnf install -y epel-release
+
+echo "==> Installing container runtime..."
+./tools/install-container-runtime.sh
+
+echo "==> Installing Kubernetes binaries..."
+./tools/download-kubernetes.sh
+
+echo "==> Installing etcd..."
+./tools/download-etcd.sh
+
+echo "==> Installing CNI plugins..."
+./tools/download-cni-plugins.sh
+
+echo "==> Installing Ceph..."
+./tools/install-ceph.sh
+
+echo "==> Installing Kafka..."
+./tools/install-kafka.sh
+
+echo "==> Installing Mosquitto..."
+./tools/install-mosquitto.sh
+
+echo "==> Installing CoreDNS..."
+./tools/download-coredns.sh
+
+echo "==> Installing utilities..."
+./tools/install-utilities.sh
+
+echo "==> Downloading Calico manifest..."
+mkdir -p /etc/kubernetes/manifests/calico
+curl -L "https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml" \
+ -o /etc/kubernetes/manifests/calico/calico.yaml
+
+echo "==> Binary installation complete!"
+echo ""
+echo "Installed versions:"
+kubelet --version
+etcd --version
+containerd --version
+ceph --version
+java -version 2>&1 | head -1
+mosquitto -h 2>&1 | head -1
+coredns -version
+```
+
+## Package List for ISO Builder
+
+**File**: `configs/package-list.txt`
+
+```
+# Base system
+@core
+@standard
+kernel
+kernel-modules
+kernel-modules-extra
+systemd
+systemd-networkd
+systemd-resolved
+
+# Networking
+NetworkManager
+iproute
+iputils
+bind-utils
+iptables
+nftables
+
+# Container runtime (from Docker repo)
+containerd.io
+
+# Development tools (for troubleshooting)
+strace
+tcpdump
+lsof
+vim
+tmux
+htop
+
+# Python (for cluster scripts)
+python3
+python3-pip
+python3-pyyaml
+
+# Java (for Kafka)
+java-17-openjdk-headless
+
+# MQTT
+mosquitto
+mosquitto-clients
+
+# Utilities
+jq
+curl
+wget
+openssl
+tar
+gzip
+```
+
+## Dependency Tree
+
+```
+Kubernetes
+├── containerd
+│ └── runc
+├── CNI plugins
+└── etcd
+
+Ceph
+├── python3-ceph-argparse
+└── ceph-common
+
+Kafka
+└── java-17-openjdk-headless
+
+Mosquitto
+└── (no dependencies)
+
+CoreDNS
+└── (no dependencies)
+```
+
+## Version Lock Strategy
+
+**Problem**: Prevent unexpected version changes between builds.
+
+**Solution**: Pin exact versions in scripts, not "latest".
+
+**Update process**:
+1. Test new versions in dev environment
+2. Update version variables in download scripts
+3. Rebuild ISO
+4. Run integration tests
+5. Deploy to production
+
+## Verification Checklist
+
+Run after binary installation:
+
+```bash
+#!/bin/bash
+# tools/verify-binaries.sh
+
+ERRORS=0
+
+check_binary() {
+ local binary=$1
+ local version_flag=${2:---version}
+
+ if ! command -v "$binary" &>/dev/null; then
+ echo "ERROR: Binary not found: $binary"
+ ((ERRORS++))
+ else
+ echo "✓ $binary: $($binary $version_flag 2>&1 | head -1)"
+ fi
+}
+
+echo "==> Verifying installed binaries..."
+
+# Kubernetes
+check_binary kubelet --version
+check_binary kubectl version --client
+check_binary kube-apiserver --version
+check_binary kube-controller-manager --version
+check_binary kube-scheduler --version
+
+# Container runtime
+check_binary containerd --version
+check_binary runc --version
+
+# etcd
+check_binary etcd --version
+check_binary etcdctl version
+
+# Ceph
+check_binary ceph --version
+
+# Kafka
+if [[ -x /opt/kafka/bin/kafka-topics.sh ]]; then
+ echo "✓ Kafka: $(/opt/kafka/bin/kafka-topics.sh --version 2>&1)"
+else
+ echo "ERROR: Kafka not found"
+ ((ERRORS++))
+fi
+
+# Mosquitto
+check_binary mosquitto -h
+
+# CoreDNS
+check_binary coredns -version
+
+# Utilities
+check_binary yq --version
+check_binary jq --version
+
+# CNI plugins
+if [[ -d /opt/cni/bin ]]; then
+ echo "✓ CNI plugins: $(ls /opt/cni/bin | wc -l) binaries installed"
+else
+ echo "ERROR: CNI plugins directory not found"
+ ((ERRORS++))
+fi
+
+if [[ $ERRORS -gt 0 ]]; then
+ echo "==> Verification FAILED with $ERRORS errors"
+ exit 1
+fi
+
+echo "==> Verification PASSED"
+```
+
+## Storage Requirements
+
+**Disk space needed for binaries**:
+
+```
+Kubernetes binaries: ~500 MB
+etcd: ~30 MB
+containerd/runc: ~50 MB
+CNI plugins: ~40 MB
+Ceph packages: ~150 MB
+Kafka: ~100 MB
+Mosquitto: ~5 MB
+CoreDNS: ~50 MB
+Base system: ~2 GB
+-----------------------------------
+Total: ~3 GB
+
+Recommended ISO size: 8 GB (includes configs, logs, temp space)
+```
+
+## Update Strategy
+
+**Minor version updates** (e.g., 1.29.1 → 1.29.2):
+1. Update version variable in download script
+2. Rebuild ISO
+3. Test in staging
+
+**Major version updates** (e.g., 1.29 → 1.30):
+1. Review Kubernetes changelog
+2. Update API versions in manifests
+3. Test upgrade path (rolling update)
+4. Update documentation
+
+## Airgapped Installation
+
+For environments without internet access:
+
+```bash
+#!/bin/bash
+# tools/create-binary-bundle.sh
+# Download all binaries for offline installation
+
+BUNDLE_DIR="./binary-bundle"
+mkdir -p "$BUNDLE_DIR"
+
+# Download all binaries to local directory
+# Modify download scripts to save to $BUNDLE_DIR instead of /usr/local/bin
+
+# Create tarball
+tar czf binary-bundle.tar.gz "$BUNDLE_DIR"
+
+# On target system:
+# tar xzf binary-bundle.tar.gz
+# ./binary-bundle/install-offline.sh
+```
+
+## Next Steps
+
+1. Implement all download scripts in `tools/`
+2. Test binary installation on clean Rocky 9 VM
+3. Create version-lock manifest (SHA256 checksums)
+4. Integrate with ISO builder kickstart
+5. Add to validation workflow (`verify-binaries.sh`)
+6. Document upgrade procedures
+
+---
+
+**P.S.** This is the unglamorous but critical work—getting the right bits in the right place. Using distro packages where possible keeps maintenance sane; upstream binaries for Kubernetes give you version control. The scripts are copy-paste ready. Next doc should be **BOOTSTRAP.md** to define the initialization sequence.
diff --git a/BOOTSTRAP.md b/BOOTSTRAP.md
new file mode 100644
index 0000000..f200322
--- /dev/null
+++ b/BOOTSTRAP.md
@@ -0,0 +1,744 @@
+# Cluster Bootstrap and Initialization
+
+## Overview
+
+This document defines the **first-boot initialization sequence** that transforms individual nodes into a functioning cluster. Bootstrap is the most complex phase because nodes must coordinate without a working cluster.
+
+## The Bootstrap Problem
+
+**Challenge**: Kubernetes requires a control plane to join nodes, but the control plane doesn't exist yet.
+
+**Solution**: Designate one node as **first master**, which initializes the cluster. All other nodes join the existing cluster.
+
+## Bootstrap States
+
+Each node tracks its bootstrap state to avoid re-initialization on reboot.
+
+```
+/var/lib/cluster-state/
+├── bootstrap-state # Current state: uninitialized|bootstrapped|joined
+├── cluster-joined # Timestamp of successful join
+├── node-role # master-first|master-additional|worker
+└── etcd-member-id # etcd member ID (masters only)
+```
+
+## State Transitions
+
+```
+┌─────────────────┐
+│ Uninitialized │ ← Fresh install
+└────────┬────────┘
+ │
+ ▼
+ ┌────────────────────┐
+ │ Is first master? │
+ └─┬────────────────┬─┘
+ │ YES │ NO
+ ▼ ▼
+┌─────────────┐ ┌──────────────┐
+│ Bootstrap │ │ Wait for │
+│ Cluster │ │ First Master │
+└──────┬──────┘ └──────┬───────┘
+ │ │
+ ▼ ▼
+┌─────────────┐ ┌──────────────┐
+│ Bootstrapped│ │ Join Cluster │
+└──────┬──────┘ └──────┬───────┘
+ │ │
+ └────────┬────────┘
+ ▼
+ ┌───────────────┐
+ │ Cluster Ready │
+ └───────────────┘
+```
+
+## Determining First Master
+
+**Logic**: The first master is the **first master node in cluster.yaml** (sorted by node name).
+
+**Script**: `tools/am-i-first-master.sh`
+
+```bash
+#!/bin/bash
+# Determines if this node should bootstrap the cluster
+
+set -euo pipefail
+
+CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}"
+NODE_NAME=$(cat "$CONFIG_DIR/node-identity")
+NODE_CONFIG="$CONFIG_DIR/nodes/$NODE_NAME.yaml"
+CLUSTER_CONFIG="$CONFIG_DIR/cluster.yaml"
+
+# Check if this node has master/control-plane role
+ROLES=$(yq eval '.node.roles[]' "$NODE_CONFIG")
+IS_MASTER=false
+[[ "$ROLES" =~ "master" || "$ROLES" =~ "control-plane" ]] && IS_MASTER=true
+
+if [[ "$IS_MASTER" != "true" ]]; then
+ echo "worker"
+ exit 0
+fi
+
+# Find all master nodes from cluster.yaml
+MASTER_NODES=$(yq eval '.nodes[] | select(.roles[] | contains("master") or contains("control-plane")) | .name' "$CLUSTER_CONFIG" | sort)
+
+# Get the first master
+FIRST_MASTER=$(echo "$MASTER_NODES" | head -1)
+
+if [[ "$NODE_NAME" == "$FIRST_MASTER" ]]; then
+ echo "master-first"
+else
+ echo "master-additional"
+fi
+```
+
+## Bootstrap Sequence
+
+### Phase 1: Network Configuration (All Nodes)
+
+**Runs**: Before cluster-detect.service
+**Script**: `tools/configure-network.sh`
+
+```bash
+#!/bin/bash
+# Configure static IP from cluster.yaml
+
+set -euo pipefail
+
+CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}"
+TEMP_NODE_NAME="${1:-}" # During first boot, identity not yet known
+
+# Detect node identity (simplified version for network setup)
+if [[ -z "$TEMP_NODE_NAME" ]]; then
+ # Try MAC address detection
+ MY_MAC=$(ip link show | awk '/ether/ {print $2}' | head -1)
+
+ for node_file in "$CONFIG_DIR/nodes"/*.yaml; do
+ MAC_ADDRS=$(yq eval '.node.hardware.mac_addresses[]?' "$node_file" 2>/dev/null || echo "")
+ if echo "$MAC_ADDRS" | grep -qi "$MY_MAC"; then
+ TEMP_NODE_NAME=$(yq eval '.node.name' "$node_file")
+ break
+ fi
+ done
+fi
+
+if [[ -z "$TEMP_NODE_NAME" ]]; then
+ echo "ERROR: Cannot determine node identity for network config"
+ exit 1
+fi
+
+NODE_CONFIG="$CONFIG_DIR/nodes/$TEMP_NODE_NAME.yaml"
+NODE_IP=$(yq eval '.node.ip' "$NODE_CONFIG")
+NODE_HOSTNAME=$(yq eval '.node.hostname' "$NODE_CONFIG")
+
+# Get network interface (assume first non-loopback)
+IFACE=$(ip route | grep default | awk '{print $5}' | head -1)
+
+echo "==> Configuring network for $TEMP_NODE_NAME ($NODE_IP)"
+
+# Create NetworkManager connection
+nmcli connection delete "$IFACE" 2>/dev/null || true
+nmcli connection add \
+ type ethernet \
+ con-name "$IFACE" \
+ ifname "$IFACE" \
+ ip4 "$NODE_IP/24" \
+ gw4 "192.168.1.1" \
+ ipv4.dns "8.8.8.8,8.8.4.4" \
+ ipv4.method manual
+
+nmcli connection up "$IFACE"
+
+# Set hostname
+hostnamectl set-hostname "$NODE_HOSTNAME"
+
+# Add to /etc/hosts
+grep -q "$NODE_HOSTNAME" /etc/hosts || \
+ echo "$NODE_IP $NODE_HOSTNAME" >> /etc/hosts
+
+echo "==> Network configured: $NODE_IP ($NODE_HOSTNAME)"
+```
+
+### Phase 2: First Master Bootstrap
+
+**Runs**: After cluster-detect, if first master and uninitialized
+**Script**: `tools/bootstrap-first-master.sh`
+
+```bash
+#!/bin/bash
+# Initialize Kubernetes cluster on first master node
+
+set -euo pipefail
+
+CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}"
+STATE_DIR="/var/lib/cluster-state"
+mkdir -p "$STATE_DIR"
+
+# Check if already bootstrapped
+if [[ -f "$STATE_DIR/bootstrap-state" ]]; then
+ STATE=$(cat "$STATE_DIR/bootstrap-state")
+ if [[ "$STATE" == "bootstrapped" ]]; then
+ echo "==> Cluster already bootstrapped, skipping"
+ exit 0
+ fi
+fi
+
+NODE_NAME=$(cat "$CONFIG_DIR/node-identity")
+NODE_IP=$(yq eval '.node.ip' "$CONFIG_DIR/current-node.yaml")
+CLUSTER_NAME=$(yq eval '.cluster.name' "$CONFIG_DIR/cluster.yaml")
+POD_CIDR=$(yq eval '.cluster.networking.pod_cidr' "$CONFIG_DIR/cluster.yaml")
+SERVICE_CIDR=$(yq eval '.cluster.networking.service_cidr' "$CONFIG_DIR/cluster.yaml")
+
+echo "==> Bootstrapping Kubernetes cluster on first master: $NODE_NAME"
+
+# 1. Initialize etcd cluster
+echo "==> Initializing etcd..."
+mkdir -p /var/lib/etcd
+
+# Create etcd initial cluster list
+MASTER_NODES=$(yq eval '.nodes[] | select(.roles[] | contains("master") or contains("control-plane")) | .name + "=https://" + .ip + ":2380"' "$CONFIG_DIR/cluster.yaml" | tr '\n' ',' | sed 's/,$//')
+
+cat > /tmp/etcd-bootstrap.env <<EOF
+ETCD_NAME=$NODE_NAME
+ETCD_DATA_DIR=/var/lib/etcd
+ETCD_LISTEN_PEER_URLS=https://$NODE_IP:2380
+ETCD_LISTEN_CLIENT_URLS=https://$NODE_IP:2379,https://127.0.0.1:2379
+ETCD_INITIAL_ADVERTISE_PEER_URLS=https://$NODE_IP:2380
+ETCD_ADVERTISE_CLIENT_URLS=https://$NODE_IP:2379
+ETCD_INITIAL_CLUSTER=$MASTER_NODES
+ETCD_INITIAL_CLUSTER_STATE=new
+ETCD_INITIAL_CLUSTER_TOKEN=$CLUSTER_NAME-etcd
+ETCD_CERT_FILE=/etc/kubernetes/pki/etcd/server.crt
+ETCD_KEY_FILE=/etc/kubernetes/pki/etcd/server.key
+ETCD_CLIENT_CERT_AUTH=true
+ETCD_TRUSTED_CA_FILE=/etc/kubernetes/pki/etcd/ca.crt
+ETCD_PEER_CERT_FILE=/etc/kubernetes/pki/etcd/peer.crt
+ETCD_PEER_KEY_FILE=/etc/kubernetes/pki/etcd/peer.key
+ETCD_PEER_CLIENT_CERT_AUTH=true
+ETCD_PEER_TRUSTED_CA_FILE=/etc/kubernetes/pki/etcd/ca.crt
+EOF
+
+# Start etcd
+systemctl start etcd
+sleep 5
+
+# Verify etcd is healthy
+etcdctl --endpoints=https://127.0.0.1:2379 \
+ --cacert=/etc/kubernetes/pki/etcd/ca.crt \
+ --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
+ --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
+ endpoint health
+
+# 2. Start API server
+echo "==> Starting kube-apiserver..."
+systemctl start kube-apiserver
+sleep 10
+
+# Wait for API server to be ready
+for i in {1..30}; do
+ if curl -k https://127.0.0.1:6443/healthz &>/dev/null; then
+ echo "==> API server is ready"
+ break
+ fi
+ echo "Waiting for API server... ($i/30)"
+ sleep 2
+done
+
+# 3. Start controller manager and scheduler
+echo "==> Starting control plane components..."
+systemctl start kube-controller-manager
+systemctl start kube-scheduler
+
+# 4. Create initial kubeconfig for kubectl
+mkdir -p ~/.kube
+cat > ~/.kube/config <<EOF
+apiVersion: v1
+kind: Config
+clusters:
+- cluster:
+ certificate-authority: /etc/kubernetes/pki/ca.crt
+ server: https://127.0.0.1:6443
+ name: $CLUSTER_NAME
+contexts:
+- context:
+ cluster: $CLUSTER_NAME
+ user: kubernetes-admin
+ name: kubernetes-admin@$CLUSTER_NAME
+current-context: kubernetes-admin@$CLUSTER_NAME
+users:
+- name: kubernetes-admin
+ user:
+ client-certificate: /etc/kubernetes/pki/apiserver-kubelet-client.crt
+ client-key: /etc/kubernetes/pki/apiserver-kubelet-client.key
+EOF
+
+# 5. Install Calico CNI
+echo "==> Installing Calico CNI..."
+kubectl apply -f /etc/kubernetes/manifests/calico/calico.yaml
+
+# Wait for Calico pods to be ready
+echo "==> Waiting for Calico to be ready..."
+kubectl wait --for=condition=ready pod -l k8s-app=calico-node -n kube-system --timeout=300s || true
+
+# 6. Start kubelet on this node
+echo "==> Starting kubelet..."
+systemctl start kubelet
+
+# 7. Create bootstrap tokens for worker nodes
+echo "==> Creating bootstrap token..."
+TOKEN=$(openssl rand -hex 3).$(openssl rand -hex 8)
+echo "$TOKEN" > "$STATE_DIR/bootstrap-token"
+
+# Create token secret in Kubernetes
+kubectl create secret generic bootstrap-token-${TOKEN%.*} \
+ --type bootstrap.kubernetes.io/token \
+ --namespace kube-system \
+ --from-literal=token-id=${TOKEN%.*} \
+ --from-literal=token-secret=${TOKEN#*.} \
+ --from-literal=usage-bootstrap-authentication=true \
+ --from-literal=usage-bootstrap-signing=true
+
+# 8. Create join command for workers
+CA_CERT_HASH=$(openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | \
+ openssl rsa -pubin -outform der 2>/dev/null | \
+ openssl dgst -sha256 -hex | sed 's/^.* //')
+
+cat > "$STATE_DIR/worker-join-command" <<EOF
+# Run this on worker nodes to join the cluster
+kubeadm join $NODE_IP:6443 --token $TOKEN --discovery-token-ca-cert-hash sha256:$CA_CERT_HASH
+EOF
+
+echo "==> Bootstrap complete!"
+echo "bootstrap-state: bootstrapped" > "$STATE_DIR/bootstrap-state"
+echo "$NODE_NAME" > "$STATE_DIR/cluster-initialized-by"
+date -Iseconds > "$STATE_DIR/cluster-joined"
+
+# 9. Distribute join info to shared location (optional: NFS, HTTP server, etc.)
+# For now, workers must manually fetch from first master
+mkdir -p /var/www/html/cluster-join
+cp "$STATE_DIR/worker-join-command" /var/www/html/cluster-join/
+cp "$STATE_DIR/bootstrap-token" /var/www/html/cluster-join/
+
+echo "==> Worker join command saved to: $STATE_DIR/worker-join-command"
+```
+
+### Phase 3: Additional Master Join
+
+**Runs**: After cluster-detect, if additional master and first master is ready
+**Script**: `tools/join-additional-master.sh`
+
+```bash
+#!/bin/bash
+# Join additional master nodes to existing etcd cluster
+
+set -euo pipefail
+
+CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}"
+STATE_DIR="/var/lib/cluster-state"
+mkdir -p "$STATE_DIR"
+
+# Check if already joined
+if [[ -f "$STATE_DIR/bootstrap-state" ]]; then
+ STATE=$(cat "$STATE_DIR/bootstrap-state")
+ if [[ "$STATE" == "joined" ]]; then
+ echo "==> Already joined cluster, skipping"
+ exit 0
+ fi
+fi
+
+NODE_NAME=$(cat "$CONFIG_DIR/node-identity")
+NODE_IP=$(yq eval '.node.ip' "$CONFIG_DIR/current-node.yaml")
+CLUSTER_NAME=$(yq eval '.cluster.name' "$CONFIG_DIR/cluster.yaml")
+
+# Get first master IP
+FIRST_MASTER_IP=$(yq eval '.nodes[] | select(.roles[] | contains("master") or contains("control-plane")) | .ip' "$CONFIG_DIR/cluster.yaml" | head -1)
+
+echo "==> Joining as additional master: $NODE_NAME"
+echo "==> First master: $FIRST_MASTER_IP"
+
+# Wait for first master API server to be ready
+echo "==> Waiting for first master API server..."
+for i in {1..60}; do
+ if curl -k https://$FIRST_MASTER_IP:6443/healthz &>/dev/null; then
+ echo "==> First master is ready"
+ break
+ fi
+ echo "Waiting for first master... ($i/60)"
+ sleep 5
+done
+
+# Add this node to etcd cluster
+echo "==> Adding node to etcd cluster..."
+
+# etcd add member command (run on first master via ssh or API)
+# For now, assume etcdctl is available and certs are distributed
+ETCD_ENDPOINTS="https://$FIRST_MASTER_IP:2379"
+
+etcdctl --endpoints=$ETCD_ENDPOINTS \
+ --cacert=/etc/kubernetes/pki/etcd/ca.crt \
+ --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
+ --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
+ member add "$NODE_NAME" --peer-urls="https://$NODE_IP:2380"
+
+# Start etcd in "existing" mode
+cat > /tmp/etcd-join.env <<EOF
+ETCD_NAME=$NODE_NAME
+ETCD_DATA_DIR=/var/lib/etcd
+ETCD_INITIAL_CLUSTER_STATE=existing
+# ... (same as bootstrap but with "existing")
+EOF
+
+systemctl start etcd
+
+# Start control plane components
+systemctl start kube-apiserver
+systemctl start kube-controller-manager
+systemctl start kube-scheduler
+systemctl start kubelet
+
+echo "joined" > "$STATE_DIR/bootstrap-state"
+date -Iseconds > "$STATE_DIR/cluster-joined"
+
+echo "==> Additional master joined successfully"
+```
+
+### Phase 4: Worker Join
+
+**Runs**: After cluster-detect, if worker node
+**Script**: `tools/join-worker.sh`
+
+```bash
+#!/bin/bash
+# Join worker node to Kubernetes cluster
+
+set -euo pipefail
+
+CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}"
+STATE_DIR="/var/lib/cluster-state"
+mkdir -p "$STATE_DIR"
+
+# Check if already joined
+if [[ -f "$STATE_DIR/bootstrap-state" ]]; then
+ STATE=$(cat "$STATE_DIR/bootstrap-state")
+ if [[ "$STATE" == "joined" ]]; then
+ echo "==> Already joined cluster, skipping"
+ exit 0
+ fi
+fi
+
+NODE_NAME=$(cat "$CONFIG_DIR/node-identity")
+
+# Get first master IP
+FIRST_MASTER_IP=$(yq eval '.nodes[] | select(.roles[] | contains("master") or contains("control-plane")) | .ip' "$CONFIG_DIR/cluster.yaml" | head -1)
+
+echo "==> Joining worker node: $NODE_NAME"
+echo "==> API server: https://$FIRST_MASTER_IP:6443"
+
+# Wait for API server
+for i in {1..60}; do
+ if curl -k https://$FIRST_MASTER_IP:6443/healthz &>/dev/null; then
+ echo "==> API server is ready"
+ break
+ fi
+ echo "Waiting for API server... ($i/60)"
+ sleep 5
+done
+
+# Fetch join token from first master (requires setup)
+# Option 1: HTTP endpoint on first master
+# Option 2: Shared NFS mount
+# Option 3: Embedded in cluster.yaml (less secure)
+
+# For now, assume token is in cluster.yaml or fetched via curl
+TOKEN=$(curl -s http://$FIRST_MASTER_IP/cluster-join/bootstrap-token 2>/dev/null || echo "")
+
+if [[ -z "$TOKEN" ]]; then
+ echo "ERROR: Cannot fetch bootstrap token from first master"
+ echo "Manual join required. Run on first master:"
+ echo " kubeadm token create --print-join-command"
+ exit 1
+fi
+
+CA_CERT_HASH=$(openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | \
+ openssl rsa -pubin -outform der 2>/dev/null | \
+ openssl dgst -sha256 -hex | sed 's/^.* //')
+
+# Join cluster (kubeadm-style, but we're using raw kubelet)
+# Create kubelet bootstrap kubeconfig
+cat > /etc/kubernetes/bootstrap-kubelet.conf <<EOF
+apiVersion: v1
+kind: Config
+clusters:
+- cluster:
+ certificate-authority: /etc/kubernetes/pki/ca.crt
+ server: https://$FIRST_MASTER_IP:6443
+ name: kubernetes
+contexts:
+- context:
+ cluster: kubernetes
+ user: kubelet-bootstrap
+ name: kubelet-bootstrap@kubernetes
+current-context: kubelet-bootstrap@kubernetes
+users:
+- name: kubelet-bootstrap
+ user:
+ token: $TOKEN
+EOF
+
+# Start kubelet (will auto-generate client cert via CSR)
+systemctl start kubelet
+
+# Auto-approve CSR (on first master, create script to approve)
+# kubectl certificate approve <csr-name>
+
+echo "joined" > "$STATE_DIR/bootstrap-state"
+date -Iseconds > "$STATE_DIR/cluster-joined"
+
+echo "==> Worker joined successfully"
+echo "==> Check status on master: kubectl get nodes"
+```
+
+### Phase 5: Ceph Cluster Bootstrap
+
+**Runs**: After Kubernetes is ready, on first ceph-mon node
+**Script**: `tools/bootstrap-ceph.sh`
+
+```bash
+#!/bin/bash
+# Bootstrap Ceph cluster on first monitor node
+
+set -euo pipefail
+
+CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}"
+STATE_DIR="/var/lib/cluster-state"
+
+# Check if already initialized
+if [[ -f "$STATE_DIR/ceph-bootstrapped" ]]; then
+ echo "==> Ceph already bootstrapped"
+ exit 0
+fi
+
+NODE_NAME=$(cat "$CONFIG_DIR/node-identity")
+NODE_IP=$(yq eval '.node.ip' "$CONFIG_DIR/current-node.yaml")
+CLUSTER_NAME=$(yq eval '.cluster.name' "$CONFIG_DIR/cluster.yaml")
+
+# Check if this is the first ceph-mon node
+CEPH_MON_NODES=$(yq eval '.nodes[] | select(.roles[] | contains("ceph-mon")) | .name' "$CONFIG_DIR/cluster.yaml" | sort)
+FIRST_MON=$(echo "$CEPH_MON_NODES" | head -1)
+
+if [[ "$NODE_NAME" != "$FIRST_MON" ]]; then
+ echo "==> Not first monitor, skipping Ceph bootstrap"
+ exit 0
+fi
+
+echo "==> Bootstrapping Ceph cluster on first monitor: $NODE_NAME"
+
+# Generate Ceph keys (if not already done)
+/usr/local/bin/generate-ceph-keys.sh
+
+FSID=$(cat /etc/ceph/cluster.fsid)
+
+# Create ceph.conf
+cat > /etc/ceph/ceph.conf <<EOF
+[global]
+fsid = $FSID
+mon initial members = $FIRST_MON
+mon host = $NODE_IP
+public network = 192.168.1.0/24
+cluster network = 192.168.1.0/24
+auth cluster required = cephx
+auth service required = cephx
+auth client required = cephx
+osd pool default size = 3
+osd pool default min size = 2
+EOF
+
+# Create monitor map
+monmaptool --create --add $FIRST_MON $NODE_IP --fsid $FSID /tmp/monmap
+
+# Initialize monitor data directory
+mkdir -p /var/lib/ceph/mon/ceph-$FIRST_MON
+ceph-mon --mkfs -i $FIRST_MON --monmap /tmp/monmap --keyring /etc/ceph/ceph.mon.keyring
+
+# Start monitor
+systemctl start ceph-mon@$FIRST_MON
+
+# Wait for mon to be ready
+sleep 10
+
+# Enable manager module
+ceph mgr module enable dashboard
+
+echo "==> Ceph cluster bootstrapped"
+date -Iseconds > "$STATE_DIR/ceph-bootstrapped"
+```
+
+## Bootstrap Coordination Service
+
+**New systemd unit**: `cluster-bootstrap.service`
+
+```ini
+[Unit]
+Description=Cluster Bootstrap Orchestrator
+After=cluster-detect.service network-online.target
+Wants=network-online.target
+
+[Service]
+Type=oneshot
+RemainAfterExit=yes
+ExecStart=/usr/local/bin/cluster-bootstrap-orchestrator.sh
+
+[Install]
+WantedBy=multi-user.target
+```
+
+**Script**: `tools/cluster-bootstrap-orchestrator.sh`
+
+```bash
+#!/bin/bash
+# Main orchestrator that calls appropriate bootstrap script based on node role
+
+set -euo pipefail
+
+NODE_ROLE=$(/usr/local/bin/am-i-first-master.sh)
+
+case "$NODE_ROLE" in
+ master-first)
+ echo "==> Detected as first master, bootstrapping cluster..."
+ /usr/local/bin/bootstrap-first-master.sh
+ ;;
+ master-additional)
+ echo "==> Detected as additional master, joining cluster..."
+ /usr/local/bin/join-additional-master.sh
+ ;;
+ worker)
+ echo "==> Detected as worker, joining cluster..."
+ /usr/local/bin/join-worker.sh
+ ;;
+ *)
+ echo "ERROR: Unknown node role: $NODE_ROLE"
+ exit 1
+ ;;
+esac
+
+# If node has ceph-mon role, bootstrap Ceph
+ROLES=$(yq eval '.node.roles[]' /etc/cluster-config/current-node.yaml)
+if [[ "$ROLES" =~ "ceph-mon" ]]; then
+ /usr/local/bin/bootstrap-ceph.sh
+fi
+```
+
+## Bootstrap Token Distribution
+
+**Problem**: Workers need the bootstrap token, but first master generates it.
+
+**Solutions**:
+
+### Option 1: HTTP Server on First Master
+```bash
+# On first master, start simple HTTP server
+python3 -m http.server 8080 -d /var/www/html/cluster-join &
+
+# On workers
+curl http://<first-master>:8080/bootstrap-token
+```
+
+### Option 2: Embedded in cluster.yaml (Less Secure)
+```yaml
+# configs/cluster.yaml
+cluster:
+ bootstrap:
+ token: "abcdef.0123456789abcdef" # Pre-generated
+```
+
+### Option 3: External Secret Store
+- First master uploads token to Vault/S3
+- Workers fetch using node identity credentials
+
+## Failure Recovery
+
+### Bootstrap Failed Midway
+
+```bash
+#!/bin/bash
+# tools/reset-bootstrap.sh
+# Clean up partial bootstrap to retry
+
+systemctl stop kubelet kube-apiserver kube-controller-manager kube-scheduler etcd
+rm -rf /var/lib/etcd/*
+rm -rf /var/lib/kubelet/*
+rm -f /var/lib/cluster-state/bootstrap-state
+echo "==> Bootstrap state reset. Reboot to retry."
+```
+
+### Additional Master Can't Join
+
+1. Check etcd cluster health on first master
+2. Verify certificates are correct (CN, SANs)
+3. Check network connectivity (port 2380, 6443)
+4. Review etcd logs: `journalctl -u etcd`
+
+## Testing Bootstrap
+
+### VM Test Scenario
+
+```bash
+# Create 3 VMs: master-01, worker-01, worker-02
+# 1. Boot master-01, wait for bootstrap
+# 2. Verify: kubectl get nodes (should show master-01)
+# 3. Boot worker-01
+# 4. Verify: kubectl get nodes (should show master-01, worker-01)
+# 5. Boot worker-02
+# 6. Verify: kubectl get nodes (should show all 3)
+```
+
+## Monitoring Bootstrap Progress
+
+**Script**: `tools/check-bootstrap-status.sh`
+
+```bash
+#!/bin/bash
+# Check current bootstrap status
+
+STATE_DIR="/var/lib/cluster-state"
+
+if [[ ! -f "$STATE_DIR/bootstrap-state" ]]; then
+ echo "Status: UNINITIALIZED"
+ exit 0
+fi
+
+STATE=$(cat "$STATE_DIR/bootstrap-state")
+echo "Bootstrap State: $STATE"
+
+if [[ -f "$STATE_DIR/cluster-joined" ]]; then
+ echo "Joined At: $(cat "$STATE_DIR/cluster-joined")"
+fi
+
+# Check service status
+echo ""
+echo "Service Status:"
+systemctl is-active etcd 2>/dev/null && echo " etcd: running" || echo " etcd: stopped"
+systemctl is-active kube-apiserver 2>/dev/null && echo " kube-apiserver: running" || echo " kube-apiserver: stopped"
+systemctl is-active kubelet 2>/dev/null && echo " kubelet: running" || echo " kubelet: stopped"
+
+# Check Kubernetes node status
+if command -v kubectl &>/dev/null; then
+ echo ""
+ echo "Kubernetes Nodes:"
+ kubectl get nodes 2>/dev/null || echo " API server not reachable"
+fi
+```
+
+## Next Steps
+
+1. Implement all bootstrap scripts
+2. Test first master initialization
+3. Test worker join
+4. Test multi-master etcd cluster formation
+5. Integrate with ISO builder
+6. Add bootstrap progress logging
+7. Create troubleshooting guide
+
+---
+
+**P.S.** This is where theory meets reality. Bootstrap is the gnarliest part because you're building the control plane with no control plane. The state machine approach (first-master vs additional-master vs worker) keeps it manageable. Token distribution is still a bit hand-wavy—recommend starting with HTTP server approach, upgrade to Vault later.
diff --git a/CERTIFICATES.md b/CERTIFICATES.md
new file mode 100644
index 0000000..26ca71b
--- /dev/null
+++ b/CERTIFICATES.md
@@ -0,0 +1,506 @@
+# Certificate and Key Management
+
+## Overview
+
+This document specifies the complete PKI infrastructure required for the cluster-from-systemd project. All services requiring mutual TLS authentication or encrypted communication need certificates generated before they can start.
+
+## Design Decision: Hybrid Approach
+
+**Strategy**: Pre-generate CA and static certs at ISO build time, generate node-specific certs on first boot.
+
+**Rationale**:
+- **Build-time**: CA certificates, service account keys (shared across cluster)
+- **Boot-time**: Node-specific certs (kubelet, etcd peer certs tied to IP/hostname)
+- **Avoids**: Chicken-and-egg problem with multi-master coordination
+- **Security**: CA private key embedded in ISO (acceptable for isolated clusters; can be removed post-bootstrap for production)
+
+## Certificate Inventory
+
+### Kubernetes PKI (10 certificate pairs + 1 key)
+
+```
+/etc/kubernetes/pki/
+├── ca.crt # Kubernetes CA certificate
+├── ca.key # Kubernetes CA private key
+├── apiserver.crt # API server certificate
+├── apiserver.key # API server private key
+├── apiserver-kubelet-client.crt # API server → kubelet client cert
+├── apiserver-kubelet-client.key
+├── front-proxy-ca.crt # Front proxy CA
+├── front-proxy-ca.key
+├── front-proxy-client.crt # Front proxy client
+├── front-proxy-client.key
+├── sa.pub # Service account public key
+├── sa.key # Service account private key (RSA)
+└── etcd/
+ ├── ca.crt # etcd CA certificate
+ ├── ca.key # etcd CA private key
+ ├── server.crt # etcd server certificate (PER NODE)
+ ├── server.key # etcd server private key (PER NODE)
+ ├── peer.crt # etcd peer certificate (PER NODE)
+ ├── peer.key # etcd peer private key (PER NODE)
+ ├── healthcheck-client.crt # etcd health check client
+ └── healthcheck-client.key
+```
+
+**Per-node certificates** (generated at boot):
+```
+/var/lib/kubelet/pki/
+├── kubelet.crt # Node-specific kubelet server cert
+├── kubelet.key # Node-specific kubelet private key
+└── kubelet-client-current.pem # Kubelet client cert (CSR auto-rotation)
+```
+
+### Ceph Authentication (4 keyrings)
+
+```
+/etc/ceph/
+├── ceph.conf # Cluster configuration
+├── ceph.client.admin.keyring # Admin keyring
+├── ceph.mon.keyring # Monitor keyring
+├── ceph.bootstrap-osd.keyring # OSD bootstrap keyring
+└── ceph.bootstrap-mds.keyring # MDS bootstrap keyring
+```
+
+### MQTT Authentication (1 password file)
+
+```
+/etc/mosquitto/
+└── passwd # Password file (mosquitto_passwd format)
+```
+
+## Certificate Generation Scripts
+
+### 1. Build-Time: Generate Cluster-Wide Certs
+
+**Script**: `tools/generate-build-certs.sh`
+
+```bash
+#!/bin/bash
+# Generates all CA certificates and shared keys at ISO build time
+# Output: certs/ directory to be embedded in ISO
+
+set -euo pipefail
+
+CERT_DIR="${1:-./certs}"
+mkdir -p "$CERT_DIR"/{kubernetes/pki/etcd,ceph,mqtt}
+
+echo "==> Generating Kubernetes CA certificates..."
+
+# 1. Kubernetes CA
+openssl genrsa -out "$CERT_DIR/kubernetes/pki/ca.key" 2048
+openssl req -x509 -new -nodes -key "$CERT_DIR/kubernetes/pki/ca.key" \
+ -subj "/CN=kubernetes-ca" \
+ -days 3650 -out "$CERT_DIR/kubernetes/pki/ca.crt"
+
+# 2. Front Proxy CA
+openssl genrsa -out "$CERT_DIR/kubernetes/pki/front-proxy-ca.key" 2048
+openssl req -x509 -new -nodes -key "$CERT_DIR/kubernetes/pki/front-proxy-ca.key" \
+ -subj "/CN=kubernetes-front-proxy-ca" \
+ -days 3650 -out "$CERT_DIR/kubernetes/pki/front-proxy-ca.crt"
+
+# 3. etcd CA
+openssl genrsa -out "$CERT_DIR/kubernetes/pki/etcd/ca.key" 2048
+openssl req -x509 -new -nodes -key "$CERT_DIR/kubernetes/pki/etcd/ca.key" \
+ -subj "/CN=etcd-ca" \
+ -days 3650 -out "$CERT_DIR/kubernetes/pki/etcd/ca.crt"
+
+# 4. Service Account Key Pair (RSA for signing JWTs)
+openssl genrsa -out "$CERT_DIR/kubernetes/pki/sa.key" 2048
+openssl rsa -in "$CERT_DIR/kubernetes/pki/sa.key" \
+ -pubout -out "$CERT_DIR/kubernetes/pki/sa.pub"
+
+# 5. Front Proxy Client (static - same for all API servers)
+openssl genrsa -out "$CERT_DIR/kubernetes/pki/front-proxy-client.key" 2048
+openssl req -new -key "$CERT_DIR/kubernetes/pki/front-proxy-client.key" \
+ -subj "/CN=front-proxy-client" \
+ -out "$CERT_DIR/kubernetes/pki/front-proxy-client.csr"
+openssl x509 -req -in "$CERT_DIR/kubernetes/pki/front-proxy-client.csr" \
+ -CA "$CERT_DIR/kubernetes/pki/front-proxy-ca.crt" \
+ -CAkey "$CERT_DIR/kubernetes/pki/front-proxy-ca.key" \
+ -CAcreateserial -days 3650 \
+ -out "$CERT_DIR/kubernetes/pki/front-proxy-client.crt"
+
+echo "==> Generating Ceph keyrings..."
+
+# Ceph uses cephx authentication (symmetric keys)
+# These will be generated using ceph-authtool at first-boot of first monitor
+# For now, create placeholder directory
+touch "$CERT_DIR/ceph/.placeholder"
+
+echo "==> Generating MQTT password file..."
+
+# Default admin user (change password post-install!)
+# Password: "admin" (hashed)
+mosquitto_passwd -c -b "$CERT_DIR/mqtt/passwd" admin admin 2>/dev/null || {
+ echo "WARNING: mosquitto_passwd not found. Creating empty file."
+ touch "$CERT_DIR/mqtt/passwd"
+}
+
+echo "==> Build-time certificates generated in $CERT_DIR"
+ls -lhR "$CERT_DIR"
+```
+
+### 2. First Boot (Master): Generate Node-Specific Certs
+
+**Script**: `tools/generate-node-certs.sh`
+
+```bash
+#!/bin/bash
+# Generates node-specific certificates on first boot
+# Run by cluster-detect.service after node identity established
+
+set -euo pipefail
+
+CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}"
+NODE_NAME=$(cat "$CONFIG_DIR/node-identity")
+NODE_CONFIG="$CONFIG_DIR/nodes/$NODE_NAME.yaml"
+
+# Extract node IP from current-node.yaml
+NODE_IP=$(yq eval '.node.ip' "$CONFIG_DIR/current-node.yaml")
+CLUSTER_NAME=$(yq eval '.cluster.name' "$CONFIG_DIR/cluster.yaml")
+SERVICE_CIDR=$(yq eval '.cluster.networking.service_cidr' "$CONFIG_DIR/cluster.yaml")
+
+# Kubernetes API VIP (first IP in service CIDR)
+KUBERNETES_SVC_IP=$(echo "$SERVICE_CIDR" | awk -F'[./]' '{print $1"."$2"."$3".1"}')
+
+PKI_DIR="/etc/kubernetes/pki"
+ETCD_PKI_DIR="$PKI_DIR/etcd"
+
+echo "==> Generating certificates for node: $NODE_NAME ($NODE_IP)"
+
+# Check if node is a master
+ROLES=$(yq eval '.node.roles[]' "$NODE_CONFIG")
+IS_MASTER=false
+[[ "$ROLES" =~ "master" || "$ROLES" =~ "control-plane" ]] && IS_MASTER=true
+
+if [[ "$IS_MASTER" == "true" ]]; then
+ echo "==> Generating API server certificate..."
+
+ # Create OpenSSL config for SAN
+ cat > /tmp/apiserver-csr.conf <<EOF
+[req]
+req_extensions = v3_req
+distinguished_name = req_distinguished_name
+[req_distinguished_name]
+[v3_req]
+basicConstraints = CA:FALSE
+keyUsage = nonRepudiation, digitalSignature, keyEncipherment
+extendedKeyUsage = serverAuth
+subjectAltName = @alt_names
+[alt_names]
+DNS.1 = kubernetes
+DNS.2 = kubernetes.default
+DNS.3 = kubernetes.default.svc
+DNS.4 = kubernetes.default.svc.cluster.local
+DNS.5 = $NODE_NAME
+DNS.6 = $CLUSTER_NAME
+IP.1 = $KUBERNETES_SVC_IP
+IP.2 = $NODE_IP
+IP.3 = 127.0.0.1
+EOF
+
+ openssl genrsa -out "$PKI_DIR/apiserver.key" 2048
+ openssl req -new -key "$PKI_DIR/apiserver.key" \
+ -subj "/CN=kube-apiserver" \
+ -config /tmp/apiserver-csr.conf \
+ -out "$PKI_DIR/apiserver.csr"
+ openssl x509 -req -in "$PKI_DIR/apiserver.csr" \
+ -CA "$PKI_DIR/ca.crt" -CAkey "$PKI_DIR/ca.key" \
+ -CAcreateserial -days 3650 \
+ -extensions v3_req -extfile /tmp/apiserver-csr.conf \
+ -out "$PKI_DIR/apiserver.crt"
+
+ echo "==> Generating API server kubelet client certificate..."
+ openssl genrsa -out "$PKI_DIR/apiserver-kubelet-client.key" 2048
+ openssl req -new -key "$PKI_DIR/apiserver-kubelet-client.key" \
+ -subj "/CN=kube-apiserver-kubelet-client/O=system:masters" \
+ -out "$PKI_DIR/apiserver-kubelet-client.csr"
+ openssl x509 -req -in "$PKI_DIR/apiserver-kubelet-client.csr" \
+ -CA "$PKI_DIR/ca.crt" -CAkey "$PKI_DIR/ca.key" \
+ -CAcreateserial -days 3650 \
+ -out "$PKI_DIR/apiserver-kubelet-client.crt"
+
+ echo "==> Generating etcd server certificate..."
+ cat > /tmp/etcd-server-csr.conf <<EOF
+[req]
+req_extensions = v3_req
+distinguished_name = req_distinguished_name
+[req_distinguished_name]
+[v3_req]
+basicConstraints = CA:FALSE
+keyUsage = nonRepudiation, digitalSignature, keyEncipherment
+extendedKeyUsage = serverAuth, clientAuth
+subjectAltName = @alt_names
+[alt_names]
+DNS.1 = $NODE_NAME
+DNS.2 = localhost
+IP.1 = $NODE_IP
+IP.2 = 127.0.0.1
+EOF
+
+ openssl genrsa -out "$ETCD_PKI_DIR/server.key" 2048
+ openssl req -new -key "$ETCD_PKI_DIR/server.key" \
+ -subj "/CN=$NODE_NAME" \
+ -config /tmp/etcd-server-csr.conf \
+ -out "$ETCD_PKI_DIR/server.csr"
+ openssl x509 -req -in "$ETCD_PKI_DIR/server.csr" \
+ -CA "$ETCD_PKI_DIR/ca.crt" -CAkey "$ETCD_PKI_DIR/ca.key" \
+ -CAcreateserial -days 3650 \
+ -extensions v3_req -extfile /tmp/etcd-server-csr.conf \
+ -out "$ETCD_PKI_DIR/server.crt"
+
+ echo "==> Generating etcd peer certificate..."
+ openssl genrsa -out "$ETCD_PKI_DIR/peer.key" 2048
+ openssl req -new -key "$ETCD_PKI_DIR/peer.key" \
+ -subj "/CN=$NODE_NAME" \
+ -config /tmp/etcd-server-csr.conf \
+ -out "$ETCD_PKI_DIR/peer.csr"
+ openssl x509 -req -in "$ETCD_PKI_DIR/peer.csr" \
+ -CA "$ETCD_PKI_DIR/ca.crt" -CAkey "$ETCD_PKI_DIR/ca.key" \
+ -CAcreateserial -days 3650 \
+ -extensions v3_req -extfile /tmp/etcd-server-csr.conf \
+ -out "$ETCD_PKI_DIR/peer.crt"
+
+ echo "==> Generating etcd healthcheck client certificate..."
+ openssl genrsa -out "$ETCD_PKI_DIR/healthcheck-client.key" 2048
+ openssl req -new -key "$ETCD_PKI_DIR/healthcheck-client.key" \
+ -subj "/CN=kube-etcd-healthcheck-client/O=system:masters" \
+ -out "$ETCD_PKI_DIR/healthcheck-client.csr"
+ openssl x509 -req -in "$ETCD_PKI_DIR/healthcheck-client.csr" \
+ -CA "$ETCD_PKI_DIR/ca.crt" -CAkey "$ETCD_PKI_DIR/ca.key" \
+ -CAcreateserial -days 3650 \
+ -out "$ETCD_PKI_DIR/healthcheck-client.crt"
+
+ # Set permissions
+ chmod 600 "$PKI_DIR"/*.key "$ETCD_PKI_DIR"/*.key
+fi
+
+echo "==> Generating kubelet certificate..."
+mkdir -p /var/lib/kubelet/pki
+
+cat > /tmp/kubelet-csr.conf <<EOF
+[req]
+req_extensions = v3_req
+distinguished_name = req_distinguished_name
+[req_distinguished_name]
+[v3_req]
+basicConstraints = CA:FALSE
+keyUsage = nonRepudiation, digitalSignature, keyEncipherment
+extendedKeyUsage = serverAuth
+subjectAltName = @alt_names
+[alt_names]
+DNS.1 = $NODE_NAME
+IP.1 = $NODE_IP
+EOF
+
+openssl genrsa -out /var/lib/kubelet/pki/kubelet.key 2048
+openssl req -new -key /var/lib/kubelet/pki/kubelet.key \
+ -subj "/CN=system:node:$NODE_NAME/O=system:nodes" \
+ -config /tmp/kubelet-csr.conf \
+ -out /var/lib/kubelet/pki/kubelet.csr
+openssl x509 -req -in /var/lib/kubelet/pki/kubelet.csr \
+ -CA "$PKI_DIR/ca.crt" -CAkey "$PKI_DIR/ca.key" \
+ -CAcreateserial -days 3650 \
+ -extensions v3_req -extfile /tmp/kubelet-csr.conf \
+ -out /var/lib/kubelet/pki/kubelet.crt
+
+chmod 600 /var/lib/kubelet/pki/kubelet.key
+
+echo "==> Node certificates generated successfully"
+```
+
+### 3. First Boot (First Master): Generate Ceph Keys
+
+**Script**: `tools/generate-ceph-keys.sh`
+
+```bash
+#!/bin/bash
+# Generates Ceph authentication keys on first monitor node
+# Only run on the FIRST ceph-mon node
+
+set -euo pipefail
+
+CEPH_DIR="/etc/ceph"
+mkdir -p "$CEPH_DIR"
+
+echo "==> Generating Ceph cluster FSID..."
+FSID=$(uuidgen)
+echo "$FSID" > "$CEPH_DIR/cluster.fsid"
+
+echo "==> Generating Ceph monitor keyring..."
+ceph-authtool --create-keyring "$CEPH_DIR/ceph.mon.keyring" \
+ --gen-key -n mon. --cap mon 'allow *'
+
+echo "==> Generating Ceph admin keyring..."
+ceph-authtool --create-keyring "$CEPH_DIR/ceph.client.admin.keyring" \
+ --gen-key -n client.admin \
+ --cap mon 'allow *' \
+ --cap osd 'allow *' \
+ --cap mds 'allow *' \
+ --cap mgr 'allow *'
+
+echo "==> Generating OSD bootstrap keyring..."
+ceph-authtool --create-keyring "$CEPH_DIR/ceph.bootstrap-osd.keyring" \
+ --gen-key -n client.bootstrap-osd \
+ --cap mon 'allow profile bootstrap-osd' \
+ --cap mgr 'allow r'
+
+echo "==> Generating MDS bootstrap keyring..."
+ceph-authtool --create-keyring "$CEPH_DIR/ceph.bootstrap-mds.keyring" \
+ --gen-key -n client.bootstrap-mds \
+ --cap mon 'allow profile bootstrap-mds' \
+ --cap mgr 'allow r'
+
+# Import admin keyring into mon keyring
+ceph-authtool "$CEPH_DIR/ceph.mon.keyring" --import-keyring "$CEPH_DIR/ceph.client.admin.keyring"
+
+chmod 600 "$CEPH_DIR"/*.keyring
+
+echo "==> Ceph keys generated. FSID: $FSID"
+echo "==> These keys must be distributed to all ceph nodes!"
+```
+
+## Integration with Boot Flow
+
+### Modified Boot Sequence
+
+```
+1. cluster-detect.service
+ ├─> cluster-detect.sh (identify node)
+ └─> generate-node-certs.sh (NEW - generate node-specific certs)
+
+2. generate-environment-files.sh
+ (reads certs, creates .env files with cert paths)
+
+3. cluster-activate-roles.sh
+ (enables targets based on roles)
+
+4. Services start
+ (now have valid certificates)
+```
+
+### Updated cluster-detect.service
+
+```ini
+[Unit]
+Description=Cluster Node Detection and Certificate Generation
+DefaultDependencies=no
+Before=network-pre.target
+Wants=network-pre.target
+
+[Service]
+Type=oneshot
+RemainAfterExit=yes
+ExecStart=/usr/local/bin/cluster-detect.sh
+ExecStart=/usr/local/bin/generate-node-certs.sh
+ExecStart=/usr/local/bin/generate-environment-files.sh
+ExecStart=/usr/local/bin/cluster-activate-roles.sh
+
+[Install]
+WantedBy=multi-user.target
+```
+
+## Certificate Distribution for Multi-Master
+
+**Problem**: Masters 2+ need the CA private keys to sign their own certs.
+
+**Solutions** (pick one):
+
+### Option A: Embed CA keys in ISO (simple, less secure)
+- All CA private keys baked into ISO
+- Each master generates its own node certs using shared CA
+- **Remove CA keys after cluster init** via post-bootstrap script
+
+### Option B: Fetch from first master (secure, complex)
+- First master generates CA, serves via HTTPS
+- Additional masters fetch CA bundle during bootstrap
+- Requires authentication (shared secret in cluster.yaml)
+
+### Option C: External secret store (production)
+- Build-time: Upload CA to Vault/AWS Secrets Manager
+- Boot-time: Fetch CA using node identity proof
+- Requires external dependency
+
+**Recommendation**: Start with Option A, migrate to Option C for production.
+
+## Security Hardening Checklist
+
+- [ ] Set file permissions: 600 for `.key`, 644 for `.crt`
+- [ ] Remove CA private keys after all masters bootstrapped
+- [ ] Rotate service account keys annually
+- [ ] Enable Kubernetes certificate rotation (kubelet `--rotate-certificates`)
+- [ ] Monitor certificate expiry (90 days before expiration)
+- [ ] Store root CA offline (not in ISO for production)
+- [ ] Use hardware security module (HSM) for CA keys (production)
+- [ ] Implement certificate revocation list (CRL)
+
+## Validation Script
+
+**Script**: `tools/validate-certs.sh`
+
+```bash
+#!/bin/bash
+# Validates all certificates before service startup
+
+set -euo pipefail
+
+PKI_DIR="/etc/kubernetes/pki"
+ERRORS=0
+
+check_cert() {
+ local cert=$1
+ local key=$2
+
+ if [[ ! -f "$cert" ]]; then
+ echo "ERROR: Certificate not found: $cert"
+ ((ERRORS++))
+ return
+ fi
+
+ if [[ ! -f "$key" ]]; then
+ echo "ERROR: Private key not found: $key"
+ ((ERRORS++))
+ return
+ fi
+
+ # Check expiry
+ if ! openssl x509 -in "$cert" -noout -checkend 86400; then
+ echo "WARNING: Certificate expires within 24 hours: $cert"
+ fi
+
+ # Verify key matches cert
+ cert_modulus=$(openssl x509 -in "$cert" -noout -modulus)
+ key_modulus=$(openssl rsa -in "$key" -noout -modulus)
+
+ if [[ "$cert_modulus" != "$key_modulus" ]]; then
+ echo "ERROR: Certificate and key mismatch: $cert / $key"
+ ((ERRORS++))
+ fi
+}
+
+echo "==> Validating Kubernetes certificates..."
+check_cert "$PKI_DIR/ca.crt" "$PKI_DIR/ca.key"
+check_cert "$PKI_DIR/apiserver.crt" "$PKI_DIR/apiserver.key"
+
+if [[ $ERRORS -gt 0 ]]; then
+ echo "==> Validation FAILED with $ERRORS errors"
+ exit 1
+fi
+
+echo "==> Validation PASSED"
+```
+
+## Next Steps
+
+1. Implement `tools/generate-build-certs.sh`
+2. Implement `tools/generate-node-certs.sh`
+3. Implement `tools/generate-ceph-keys.sh`
+4. Update `cluster-detect.service` to call cert generation
+5. Update service units to validate certs in `ExecStartPre=`
+6. Test certificate generation in VM
+7. Document CA key removal procedure for production
+
+---
+
+**P.S.** This gives you a working PKI. It's not production-grade (CA keys in ISO is a compromise), but it unblocks development. You can swap to Vault/external PKI later without changing the service units—they just read from `/etc/kubernetes/pki/`.
diff --git a/ISO-BUILDER.md b/ISO-BUILDER.md
new file mode 100644
index 0000000..a1c5ebd
--- /dev/null
+++ b/ISO-BUILDER.md
@@ -0,0 +1,849 @@
+# ISO Builder
+
+## Overview
+
+This document specifies how to build a **bootable ISO image** that contains the base OS, all binaries, configuration files, systemd units, and scripts required for cluster-from-systemd.
+
+## Design Decision: Kickstart + Live Image
+
+**Approach**: Create a Rocky Linux 9 live ISO with embedded kickstart for automated installation.
+
+**Rationale**:
+- **Kickstart**: Automates installation (partitioning, package selection, post-install scripts)
+- **Live CD tooling**: Well-established RHEL ecosystem tools (lorax, livecd-tools)
+- **Single ISO**: All nodes boot from the same image
+- **Reproducible**: Version-controlled build process
+
+**Alternative approaches considered**:
+- **cloud-init/ignition**: Better for cloud, but requires external metadata service
+- **PXE boot**: Network dependency, more complex infrastructure
+- **Container-based build (mkosi)**: Bleeding edge, less RHEL-native
+
+## Build System Requirements
+
+**Host OS**: Fedora 39 or Rocky Linux 9 (build environment)
+
+**Required packages**:
+```bash
+dnf install -y \
+ lorax \
+ anaconda \
+ pykickstart \
+ createrepo_c \
+ genisoimage \
+ isomd5sum \
+ syslinux \
+ git
+```
+
+**Disk space**: ~30 GB (for build artifacts, repos, and final ISO)
+
+## Directory Structure
+
+```
+iso-build/
+├── kickstart/
+│ └── cluster-node.ks # Main kickstart file
+├── repo/
+│ ├── binaries/ # Downloaded binaries (k8s, etcd, etc.)
+│ ├── rpms/ # Custom RPM packages
+│ └── repodata/ # RPM repository metadata
+├── overlay/
+│ ├── etc/
+│ │ ├── cluster-config/ # Cluster YAML configs
+│ │ │ ├── cluster.yaml
+│ │ │ ├── services/
+│ │ │ └── nodes/
+│ │ ├── systemd/system/ # Systemd units
+│ │ └── kubernetes/
+│ │ ├── pki/ # Pre-generated certificates
+│ │ └── manifests/
+│ ├── usr/local/bin/ # Cluster scripts
+│ └── var/www/html/ # Join info distribution
+├── output/
+│ └── cluster-node.iso # Final bootable ISO
+└── build.sh # Main build script
+```
+
+## Kickstart File
+
+**File**: `iso-build/kickstart/cluster-node.ks`
+
+```kickstart
+#version=RHEL9
+
+# Install mode (not upgrade)
+install
+
+# Use CDROM installation media
+cdrom
+
+# Text mode installation (no GUI)
+text
+
+# System language
+lang en_US.UTF-8
+
+# Keyboard layouts
+keyboard us
+
+# Network configuration (temporary DHCP, will be reconfigured on boot)
+network --bootproto=dhcp --device=link --activate
+
+# Root password (change this!)
+rootpw --plaintext cluster-password
+
+# System authorization
+authselect select sssd
+
+# SELinux mode
+selinux --enforcing
+
+# Firewall configuration
+firewall --enabled --service=ssh
+
+# System timezone
+timezone America/New_York --utc
+
+# Disk partitioning
+# WARNING: This will ERASE the entire disk!
+ignoredisk --only-use=sda
+clearpart --all --initlabel --drives=sda
+part /boot --fstype=xfs --size=1024
+part / --fstype=xfs --size=20480 --grow
+part swap --fstype=swap --size=4096
+
+# Bootloader
+bootloader --location=mbr --boot-drive=sda
+
+# Reboot after installation
+reboot
+
+# Package selection
+%packages
+@core
+@standard
+kernel
+systemd
+systemd-networkd
+NetworkManager
+vim
+tmux
+htop
+curl
+wget
+jq
+python3
+python3-pip
+python3-pyyaml
+openssl
+iproute
+iputils
+bind-utils
+iptables
+nftables
+%end
+
+# Pre-installation script
+%pre --log=/tmp/ks-pre.log
+#!/bin/bash
+echo "==> Starting pre-installation"
+# Could add disk detection logic here
+%end
+
+# Post-installation script
+%post --log=/root/ks-post.log --erroronfail
+#!/bin/bash
+set -x
+
+echo "==> Starting post-installation configuration"
+
+# 1. Copy cluster configuration files
+echo "==> Installing cluster configuration..."
+mkdir -p /etc/cluster-config/{services,nodes,environment}
+cp -r /mnt/install/repo/overlay/etc/cluster-config/* /etc/cluster-config/
+
+# 2. Copy systemd units
+echo "==> Installing systemd units..."
+cp /mnt/install/repo/overlay/etc/systemd/system/* /etc/systemd/system/
+systemctl daemon-reload
+
+# 3. Install cluster scripts
+echo "==> Installing cluster scripts..."
+cp /mnt/install/repo/overlay/usr/local/bin/* /usr/local/bin/
+chmod +x /usr/local/bin/*.sh
+
+# 4. Install binaries
+echo "==> Installing Kubernetes binaries..."
+BINARIES="kubelet kubectl kubeadm kube-apiserver kube-controller-manager kube-scheduler"
+for binary in $BINARIES; do
+ if [[ -f /mnt/install/repo/binaries/$binary ]]; then
+ cp /mnt/install/repo/binaries/$binary /usr/local/bin/
+ chmod +x /usr/local/bin/$binary
+ fi
+done
+
+echo "==> Installing etcd..."
+if [[ -f /mnt/install/repo/binaries/etcd ]]; then
+ cp /mnt/install/repo/binaries/etcd /usr/local/bin/
+ cp /mnt/install/repo/binaries/etcdctl /usr/local/bin/
+ chmod +x /usr/local/bin/etcd /usr/local/bin/etcdctl
+fi
+
+echo "==> Installing CoreDNS..."
+if [[ -f /mnt/install/repo/binaries/coredns ]]; then
+ cp /mnt/install/repo/binaries/coredns /usr/local/bin/
+ chmod +x /usr/local/bin/coredns
+fi
+
+echo "==> Installing CNI plugins..."
+if [[ -d /mnt/install/repo/binaries/cni ]]; then
+ mkdir -p /opt/cni/bin
+ cp -r /mnt/install/repo/binaries/cni/* /opt/cni/bin/
+ chmod +x /opt/cni/bin/*
+fi
+
+echo "==> Installing Kafka..."
+if [[ -d /mnt/install/repo/binaries/kafka ]]; then
+ mkdir -p /opt
+ cp -r /mnt/install/repo/binaries/kafka /opt/
+ ln -sf /opt/kafka /opt/kafka-current
+fi
+
+# 5. Install certificates
+echo "==> Installing certificates..."
+mkdir -p /etc/kubernetes/pki/etcd
+if [[ -d /mnt/install/repo/overlay/etc/kubernetes/pki ]]; then
+ cp -r /mnt/install/repo/overlay/etc/kubernetes/pki/* /etc/kubernetes/pki/
+ chmod 600 /etc/kubernetes/pki/*.key
+ chmod 600 /etc/kubernetes/pki/etcd/*.key
+fi
+
+# 6. Create data directories
+echo "==> Creating data directories..."
+mkdir -p /var/lib/{kubelet,etcd,kafka,ceph/{mon,osd},mosquitto}
+mkdir -p /var/lib/cluster-state
+
+# 7. Install container runtime (containerd)
+echo "==> Installing containerd..."
+dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
+dnf install -y containerd.io runc
+
+mkdir -p /etc/containerd
+containerd config default > /etc/containerd/config.toml
+sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
+
+systemctl enable containerd
+
+# 8. Install Ceph packages
+echo "==> Installing Ceph packages..."
+cat > /etc/yum.repos.d/ceph.repo <<'CEPH_REPO'
+[ceph]
+name=Ceph packages for $basearch
+baseurl=https://download.ceph.com/rpm-reef/el9/$basearch
+enabled=1
+gpgcheck=1
+gpgkey=https://download.ceph.com/keys/release.asc
+
+[ceph-noarch]
+name=Ceph noarch packages
+baseurl=https://download.ceph.com/rpm-reef/el9/noarch
+enabled=1
+gpgcheck=1
+gpgkey=https://download.ceph.com/keys/release.asc
+CEPH_REPO
+
+dnf install -y ceph-common ceph-mon ceph-osd ceph-mds ceph-mgr
+
+# 9. Install Mosquitto
+echo "==> Installing Mosquitto..."
+dnf install -y epel-release
+dnf install -y mosquitto mosquitto-clients
+
+# 10. Install Java (for Kafka)
+echo "==> Installing Java..."
+dnf install -y java-17-openjdk-headless
+
+# 11. Enable cluster-detect service
+echo "==> Enabling cluster services..."
+systemctl enable cluster-detect.service
+systemctl enable cluster-bootstrap.service
+
+# 12. Disable unwanted services
+systemctl disable firewalld
+systemctl mask firewalld
+
+# 13. Configure sysctl for Kubernetes
+cat >> /etc/sysctl.d/99-kubernetes.conf <<'SYSCTL'
+net.bridge.bridge-nf-call-iptables = 1
+net.bridge.bridge-nf-call-ip6tables = 1
+net.ipv4.ip_forward = 1
+SYSCTL
+
+# 14. Load required kernel modules
+cat > /etc/modules-load.d/kubernetes.conf <<'MODULES'
+overlay
+br_netfilter
+MODULES
+
+# 15. Install yq (YAML processor)
+echo "==> Installing yq..."
+YQ_VERSION="v4.40.5"
+curl -L "https://github.com/mikefarah/yq/releases/download/${YQ_VERSION}/yq_linux_amd64" \
+ -o /usr/local/bin/yq
+chmod +x /usr/local/bin/yq
+
+# 16. Create version info
+cat > /etc/cluster-version <<VERSION
+CLUSTER_ISO_VERSION=0.1.0
+BUILD_DATE=$(date -Iseconds)
+KUBERNETES_VERSION=1.29.1
+CEPH_VERSION=18.2.1
+KAFKA_VERSION=3.6.1
+VERSION
+
+echo "==> Post-installation complete!"
+echo "==> System will reboot and detect node identity on first boot"
+
+%end
+```
+
+## Build Script
+
+**File**: `iso-build/build.sh`
+
+```bash
+#!/bin/bash
+# Main ISO build script
+
+set -euo pipefail
+
+BUILD_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+OUTPUT_DIR="$BUILD_DIR/output"
+REPO_DIR="$BUILD_DIR/repo"
+OVERLAY_DIR="$BUILD_DIR/overlay"
+KICKSTART_FILE="$BUILD_DIR/kickstart/cluster-node.ks"
+
+ISO_NAME="cluster-node.iso"
+ISO_LABEL="ClusterNode"
+ISO_VERSION="0.1.0"
+
+echo "==> Cluster-from-SystemD ISO Builder v$ISO_VERSION"
+echo "==> Build directory: $BUILD_DIR"
+
+# Check prerequisites
+command -v lorax >/dev/null || { echo "ERROR: lorax not found. Install with: dnf install lorax"; exit 1; }
+command -v createrepo_c >/dev/null || { echo "ERROR: createrepo_c not found"; exit 1; }
+
+# Create directories
+mkdir -p "$OUTPUT_DIR" "$REPO_DIR"/{binaries,rpms} "$OVERLAY_DIR"
+
+# Step 1: Download binaries
+echo ""
+echo "==> Step 1: Downloading binaries..."
+bash "$BUILD_DIR/../tools/download-kubernetes.sh" "$REPO_DIR/binaries"
+bash "$BUILD_DIR/../tools/download-etcd.sh" "$REPO_DIR/binaries"
+bash "$BUILD_DIR/../tools/download-cni-plugins.sh" "$REPO_DIR/binaries"
+bash "$BUILD_DIR/../tools/download-coredns.sh" "$REPO_DIR/binaries"
+bash "$BUILD_DIR/../tools/install-kafka.sh" "$REPO_DIR/binaries"
+
+# Step 2: Copy overlay files
+echo ""
+echo "==> Step 2: Copying overlay files..."
+
+# Copy configs
+mkdir -p "$OVERLAY_DIR/etc/cluster-config"
+cp -r "$BUILD_DIR/../configs"/* "$OVERLAY_DIR/etc/cluster-config/"
+
+# Copy systemd units
+mkdir -p "$OVERLAY_DIR/etc/systemd/system"
+cp "$BUILD_DIR/../systemd"/*.service "$OVERLAY_DIR/etc/systemd/system/"
+cp "$BUILD_DIR/../systemd"/*.target "$OVERLAY_DIR/etc/systemd/system/"
+
+# Copy scripts
+mkdir -p "$OVERLAY_DIR/usr/local/bin"
+cp "$BUILD_DIR/../tools"/*.sh "$OVERLAY_DIR/usr/local/bin/"
+cp "$BUILD_DIR/../tools"/*.py "$OVERLAY_DIR/usr/local/bin/"
+
+# Step 3: Generate certificates
+echo ""
+echo "==> Step 3: Generating certificates..."
+bash "$BUILD_DIR/../tools/generate-build-certs.sh" "$OVERLAY_DIR/etc/kubernetes/pki"
+
+# Step 4: Create local repository
+echo ""
+echo "==> Step 4: Creating local repository..."
+createrepo_c "$REPO_DIR/rpms"
+
+# Step 5: Build ISO using lorax
+echo ""
+echo "==> Step 5: Building ISO with lorax..."
+
+# Download Rocky Linux boot files
+ROCKY_VERSION="9.3"
+ROCKY_BOOT_ISO="Rocky-${ROCKY_VERSION}-x86_64-boot.iso"
+ROCKY_URL="https://download.rockylinux.org/pub/rocky/${ROCKY_VERSION}/isos/x86_64/${ROCKY_BOOT_ISO}"
+
+if [[ ! -f "$REPO_DIR/$ROCKY_BOOT_ISO" ]]; then
+ echo "==> Downloading Rocky Linux boot ISO..."
+ curl -L "$ROCKY_URL" -o "$REPO_DIR/$ROCKY_BOOT_ISO"
+fi
+
+# Mount boot ISO to extract vmlinuz and initrd
+MOUNT_DIR="/tmp/rocky-mount-$$"
+mkdir -p "$MOUNT_DIR"
+mount -o loop "$REPO_DIR/$ROCKY_BOOT_ISO" "$MOUNT_DIR"
+
+# Create working directory for ISO build
+ISO_WORK_DIR="/tmp/iso-build-$$"
+mkdir -p "$ISO_WORK_DIR"/{isolinux,images,LiveOS}
+
+# Copy boot files
+cp "$MOUNT_DIR/isolinux/vmlinuz" "$ISO_WORK_DIR/isolinux/"
+cp "$MOUNT_DIR/isolinux/initrd.img" "$ISO_WORK_DIR/isolinux/"
+cp "$MOUNT_DIR/isolinux/isolinux.bin" "$ISO_WORK_DIR/isolinux/"
+cp "$MOUNT_DIR/isolinux/ldlinux.c32" "$ISO_WORK_DIR/isolinux/"
+cp "$MOUNT_DIR/isolinux/libcom32.c32" "$ISO_WORK_DIR/isolinux/"
+cp "$MOUNT_DIR/isolinux/libutil.c32" "$ISO_WORK_DIR/isolinux/"
+
+umount "$MOUNT_DIR"
+rmdir "$MOUNT_DIR"
+
+# Create isolinux.cfg with kickstart
+cat > "$ISO_WORK_DIR/isolinux/isolinux.cfg" <<'ISOLINUX'
+default vesamenu.c32
+timeout 100
+
+display boot.msg
+
+label install
+ menu label ^Install Cluster Node
+ kernel vmlinuz
+ append initrd=initrd.img inst.stage2=hd:LABEL=ClusterNode inst.ks=hd:LABEL=ClusterNode:/ks.cfg
+
+label rescue
+ menu label ^Rescue installed system
+ kernel vmlinuz
+ append initrd=initrd.img inst.stage2=hd:LABEL=ClusterNode rescue
+ISOLINUX
+
+# Copy kickstart file to ISO root
+cp "$KICKSTART_FILE" "$ISO_WORK_DIR/ks.cfg"
+
+# Copy overlay and repo to ISO
+cp -r "$OVERLAY_DIR" "$ISO_WORK_DIR/overlay"
+cp -r "$REPO_DIR" "$ISO_WORK_DIR/repo"
+
+# Create ISO
+echo "==> Creating bootable ISO..."
+genisoimage \
+ -o "$OUTPUT_DIR/$ISO_NAME" \
+ -b isolinux/isolinux.bin \
+ -c isolinux/boot.cat \
+ -no-emul-boot \
+ -boot-load-size 4 \
+ -boot-info-table \
+ -J -R -v \
+ -V "$ISO_LABEL" \
+ "$ISO_WORK_DIR"
+
+# Make ISO bootable
+isohybrid "$OUTPUT_DIR/$ISO_NAME"
+
+# Add MD5 checksum
+implantisomd5 "$OUTPUT_DIR/$ISO_NAME"
+
+# Cleanup
+rm -rf "$ISO_WORK_DIR"
+
+echo ""
+echo "==> ISO build complete!"
+echo "==> Output: $OUTPUT_DIR/$ISO_NAME"
+echo "==> Size: $(du -h "$OUTPUT_DIR/$ISO_NAME" | cut -f1)"
+echo ""
+echo "Test with:"
+echo " qemu-system-x86_64 -cdrom $OUTPUT_DIR/$ISO_NAME -m 4096 -smp 2"
+```
+
+## Simplified Build Script (Alternative: mkosi)
+
+For a more modern, declarative approach:
+
+**File**: `iso-build/mkosi.conf`
+
+```ini
+[Output]
+ImageId=cluster-node
+ImageVersion=0.1.0
+Format=disk
+
+[Distribution]
+Distribution=rocky
+Release=9
+
+[Content]
+Packages=
+ systemd
+ kubernetes-kubeadm
+ etcd
+ vim
+ tmux
+
+ExtraTree=/path/to/overlay/
+
+[Host]
+QemuHeadless=yes
+```
+
+**Note**: mkosi is cleaner but less mature for RHEL-based distros. Recommend starting with lorax/kickstart.
+
+## Modified Download Scripts for Build
+
+Update download scripts to accept output directory:
+
+**Example**: `tools/download-kubernetes.sh`
+
+```bash
+#!/bin/bash
+# Modified to accept output directory for ISO build
+
+KUBE_VERSION="v1.29.1"
+ARCH="amd64"
+BASE_URL="https://dl.k8s.io/release/${KUBE_VERSION}/bin/linux/${ARCH}"
+OUTPUT_DIR="${1:-/usr/local/bin}"
+
+mkdir -p "$OUTPUT_DIR"
+
+BINARIES=(
+ "kubeadm"
+ "kubectl"
+ "kubelet"
+ "kube-apiserver"
+ "kube-controller-manager"
+ "kube-scheduler"
+)
+
+for binary in "${BINARIES[@]}"; do
+ echo "Downloading $binary to $OUTPUT_DIR..."
+ curl -L "${BASE_URL}/${binary}" -o "$OUTPUT_DIR/${binary}"
+ chmod +x "$OUTPUT_DIR/${binary}"
+done
+
+echo "==> Kubernetes binaries downloaded to $OUTPUT_DIR"
+```
+
+## Testing the ISO
+
+### QEMU Testing
+
+```bash
+#!/bin/bash
+# tools/test-iso-qemu.sh
+
+ISO_PATH="iso-build/output/cluster-node.iso"
+DISK_IMG="test-disk.qcow2"
+
+# Create virtual disk
+qemu-img create -f qcow2 "$DISK_IMG" 40G
+
+# Boot from ISO
+qemu-system-x86_64 \
+ -cdrom "$ISO_PATH" \
+ -hda "$DISK_IMG" \
+ -m 4096 \
+ -smp 2 \
+ -boot d \
+ -netdev user,id=net0 \
+ -device virtio-net-pci,netdev=net0 \
+ -serial stdio \
+ -display none
+
+# After installation, boot from disk
+# qemu-system-x86_64 -hda "$DISK_IMG" -m 4096 -smp 2 -boot c
+```
+
+### VirtualBox Testing
+
+```bash
+# Create VM
+VBoxManage createvm --name "cluster-test" --ostype RedHat_64 --register
+
+# Configure VM
+VBoxManage modifyvm "cluster-test" \
+ --memory 4096 \
+ --cpus 2 \
+ --nic1 bridged \
+ --bridgeadapter1 eth0
+
+# Create disk
+VBoxManage createhd --filename cluster-test.vdi --size 40960
+
+# Attach storage
+VBoxManage storagectl "cluster-test" --name "SATA" --add sata --controller IntelAhci
+VBoxManage storageattach "cluster-test" --storagectl "SATA" --port 0 --device 0 --type hdd --medium cluster-test.vdi
+VBoxManage storageattach "cluster-test" --storagectl "SATA" --port 1 --device 0 --type dvddrive --medium cluster-node.iso
+
+# Start VM
+VBoxManage startvm "cluster-test"
+```
+
+### Multi-Node Test
+
+```bash
+#!/bin/bash
+# tools/test-multi-node.sh
+# Create 3 VMs for full cluster test
+
+for i in 1 2 3; do
+ DISK="test-node-${i}.qcow2"
+ qemu-img create -f qcow2 "$DISK" 40G
+
+ qemu-system-x86_64 \
+ -name "node-${i}" \
+ -cdrom iso-build/output/cluster-node.iso \
+ -hda "$DISK" \
+ -m 4096 \
+ -smp 2 \
+ -boot d \
+ -netdev tap,id=net0,ifname=tap${i},script=no \
+ -device virtio-net-pci,netdev=net0,mac=52:54:00:12:34:1${i} \
+ -daemonize \
+ -vnc :${i}
+
+ echo "Node $i started on VNC port 590${i}"
+done
+```
+
+## Build Optimization
+
+### Caching Downloaded Files
+
+```bash
+# Use a persistent cache directory
+CACHE_DIR="$HOME/.cache/cluster-iso-build"
+mkdir -p "$CACHE_DIR"/{binaries,rpms}
+
+# In build.sh, check cache before downloading
+if [[ ! -f "$CACHE_DIR/binaries/kubelet" ]]; then
+ bash tools/download-kubernetes.sh "$CACHE_DIR/binaries"
+fi
+
+cp -r "$CACHE_DIR/binaries"/* "$REPO_DIR/binaries/"
+```
+
+### Parallel Downloads
+
+```bash
+# Download binaries in parallel
+bash tools/download-kubernetes.sh "$REPO_DIR/binaries" &
+bash tools/download-etcd.sh "$REPO_DIR/binaries" &
+bash tools/download-coredns.sh "$REPO_DIR/binaries" &
+wait
+
+echo "==> All downloads complete"
+```
+
+## Reproducible Builds
+
+### Version Lock File
+
+**File**: `iso-build/versions.lock`
+
+```yaml
+kubernetes:
+ version: "1.29.1"
+ sha256:
+ kubelet: "abc123..."
+ kubectl: "def456..."
+
+etcd:
+ version: "3.5.11"
+ sha256: "789ghi..."
+
+ceph:
+ version: "18.2.1"
+
+kafka:
+ version: "3.6.1"
+ scala_version: "2.13"
+
+base_iso:
+ name: "Rocky-9.3-x86_64-boot.iso"
+ sha256: "..."
+```
+
+### Checksum Verification
+
+```bash
+#!/bin/bash
+# tools/verify-downloads.sh
+
+set -euo pipefail
+
+VERSIONS_FILE="iso-build/versions.lock"
+BINARIES_DIR="$1"
+
+while read -r binary expected_sha; do
+ actual_sha=$(sha256sum "$BINARIES_DIR/$binary" | awk '{print $1}')
+
+ if [[ "$actual_sha" != "$expected_sha" ]]; then
+ echo "ERROR: Checksum mismatch for $binary"
+ echo " Expected: $expected_sha"
+ echo " Actual: $actual_sha"
+ exit 1
+ fi
+
+ echo "✓ $binary: checksum OK"
+done < <(yq eval '.kubernetes.sha256 | to_entries | .[] | .key + " " + .value' "$VERSIONS_FILE")
+```
+
+## CI/CD Integration
+
+### GitHub Actions Example
+
+```yaml
+# .github/workflows/build-iso.yml
+name: Build ISO
+
+on:
+ push:
+ tags:
+ - 'v*'
+
+jobs:
+ build:
+ runs-on: ubuntu-latest
+ container:
+ image: rockylinux:9
+
+ steps:
+ - name: Checkout code
+ uses: actions/checkout@v3
+
+ - name: Install build dependencies
+ run: |
+ dnf install -y lorax createrepo_c genisoimage isomd5sum
+
+ - name: Run build
+ run: |
+ bash iso-build/build.sh
+
+ - name: Upload ISO artifact
+ uses: actions/upload-artifact@v3
+ with:
+ name: cluster-node-iso
+ path: iso-build/output/cluster-node.iso
+
+ - name: Create release
+ uses: softprops/action-gh-release@v1
+ with:
+ files: iso-build/output/cluster-node.iso
+```
+
+## Troubleshooting
+
+### Build Fails: "lorax command not found"
+
+```bash
+# On Fedora
+dnf install lorax anaconda-tui
+
+# On Rocky Linux
+dnf install epel-release
+dnf install lorax
+```
+
+### ISO Won't Boot: "Missing operating system"
+
+```bash
+# Ensure isohybrid was run
+isohybrid output/cluster-node.iso
+
+# Verify boot sector
+file output/cluster-node.iso
+# Should show: "DOS/MBR boot sector"
+```
+
+### Kickstart Fails During Post-Install
+
+```bash
+# Boot ISO in rescue mode
+# Check logs
+cat /tmp/ks-post.log
+cat /tmp/anaconda.log
+
+# Common issues:
+# - Missing files in overlay
+# - Incorrect paths in kickstart
+# - Network not available during post-install
+```
+
+## Next Steps
+
+1. Create `iso-build/` directory structure
+2. Implement build.sh script
+3. Test ISO build on clean Fedora VM
+4. Test installation in QEMU
+5. Verify cluster-detect runs on first boot
+6. Test multi-node cluster formation
+7. Document customization procedures
+
+## Customization Guide
+
+### Adding Custom Packages
+
+Edit kickstart file:
+```kickstart
+%packages
+@core
+my-custom-package
+%end
+```
+
+### Changing Root Password
+
+In kickstart:
+```kickstart
+rootpw --iscrypted $6$rounds=4096$salt$hash
+```
+
+Generate hash:
+```bash
+python3 -c 'import crypt; print(crypt.crypt("your-password", crypt.mkmeth(crypt.METHOD_SHA512)))'
+```
+
+### Embedding Custom Files
+
+Add to overlay directory:
+```bash
+mkdir -p overlay/etc/my-app/
+cp my-config.yaml overlay/etc/my-app/
+```
+
+## Maintenance
+
+### Updating Base OS
+
+```bash
+# Update ROCKY_VERSION in build.sh
+ROCKY_VERSION="9.4" # When Rocky 9.4 is released
+
+# Re-download boot ISO
+rm repo/Rocky-*.iso
+bash iso-build/build.sh
+```
+
+### Updating Kubernetes Version
+
+```bash
+# Edit versions.lock
+kubernetes:
+ version: "1.30.0"
+
+# Re-run build
+bash iso-build/build.sh
+```
+
+---
+
+**P.S.** This is the glue that makes everything real—from pile of configs to bootable artifact. Kickstart is ugly but battle-tested; lorax is RHEL-native. The build script is bash-heavy but transparent. You can see exactly what's happening, which beats "magic container build tool" when debugging at 2 AM. Test in QEMU first, VirtualBox second, bare metal third.