diff options
Diffstat (limited to 'BOOTSTRAP.md')
| -rw-r--r-- | BOOTSTRAP.md | 744 |
1 files changed, 744 insertions, 0 deletions
diff --git a/BOOTSTRAP.md b/BOOTSTRAP.md new file mode 100644 index 0000000..f200322 --- /dev/null +++ b/BOOTSTRAP.md @@ -0,0 +1,744 @@ +# Cluster Bootstrap and Initialization + +## Overview + +This document defines the **first-boot initialization sequence** that transforms individual nodes into a functioning cluster. Bootstrap is the most complex phase because nodes must coordinate without a working cluster. + +## The Bootstrap Problem + +**Challenge**: Kubernetes requires a control plane to join nodes, but the control plane doesn't exist yet. + +**Solution**: Designate one node as **first master**, which initializes the cluster. All other nodes join the existing cluster. + +## Bootstrap States + +Each node tracks its bootstrap state to avoid re-initialization on reboot. + +``` +/var/lib/cluster-state/ +├── bootstrap-state # Current state: uninitialized|bootstrapped|joined +├── cluster-joined # Timestamp of successful join +├── node-role # master-first|master-additional|worker +└── etcd-member-id # etcd member ID (masters only) +``` + +## State Transitions + +``` +┌─────────────────┐ +│ Uninitialized │ ← Fresh install +└────────┬────────┘ + │ + ▼ + ┌────────────────────┐ + │ Is first master? │ + └─┬────────────────┬─┘ + │ YES │ NO + ▼ ▼ +┌─────────────┐ ┌──────────────┐ +│ Bootstrap │ │ Wait for │ +│ Cluster │ │ First Master │ +└──────┬──────┘ └──────┬───────┘ + │ │ + ▼ ▼ +┌─────────────┐ ┌──────────────┐ +│ Bootstrapped│ │ Join Cluster │ +└──────┬──────┘ └──────┬───────┘ + │ │ + └────────┬────────┘ + ▼ + ┌───────────────┐ + │ Cluster Ready │ + └───────────────┘ +``` + +## Determining First Master + +**Logic**: The first master is the **first master node in cluster.yaml** (sorted by node name). + +**Script**: `tools/am-i-first-master.sh` + +```bash +#!/bin/bash +# Determines if this node should bootstrap the cluster + +set -euo pipefail + +CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}" +NODE_NAME=$(cat "$CONFIG_DIR/node-identity") +NODE_CONFIG="$CONFIG_DIR/nodes/$NODE_NAME.yaml" +CLUSTER_CONFIG="$CONFIG_DIR/cluster.yaml" + +# Check if this node has master/control-plane role +ROLES=$(yq eval '.node.roles[]' "$NODE_CONFIG") +IS_MASTER=false +[[ "$ROLES" =~ "master" || "$ROLES" =~ "control-plane" ]] && IS_MASTER=true + +if [[ "$IS_MASTER" != "true" ]]; then + echo "worker" + exit 0 +fi + +# Find all master nodes from cluster.yaml +MASTER_NODES=$(yq eval '.nodes[] | select(.roles[] | contains("master") or contains("control-plane")) | .name' "$CLUSTER_CONFIG" | sort) + +# Get the first master +FIRST_MASTER=$(echo "$MASTER_NODES" | head -1) + +if [[ "$NODE_NAME" == "$FIRST_MASTER" ]]; then + echo "master-first" +else + echo "master-additional" +fi +``` + +## Bootstrap Sequence + +### Phase 1: Network Configuration (All Nodes) + +**Runs**: Before cluster-detect.service +**Script**: `tools/configure-network.sh` + +```bash +#!/bin/bash +# Configure static IP from cluster.yaml + +set -euo pipefail + +CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}" +TEMP_NODE_NAME="${1:-}" # During first boot, identity not yet known + +# Detect node identity (simplified version for network setup) +if [[ -z "$TEMP_NODE_NAME" ]]; then + # Try MAC address detection + MY_MAC=$(ip link show | awk '/ether/ {print $2}' | head -1) + + for node_file in "$CONFIG_DIR/nodes"/*.yaml; do + MAC_ADDRS=$(yq eval '.node.hardware.mac_addresses[]?' "$node_file" 2>/dev/null || echo "") + if echo "$MAC_ADDRS" | grep -qi "$MY_MAC"; then + TEMP_NODE_NAME=$(yq eval '.node.name' "$node_file") + break + fi + done +fi + +if [[ -z "$TEMP_NODE_NAME" ]]; then + echo "ERROR: Cannot determine node identity for network config" + exit 1 +fi + +NODE_CONFIG="$CONFIG_DIR/nodes/$TEMP_NODE_NAME.yaml" +NODE_IP=$(yq eval '.node.ip' "$NODE_CONFIG") +NODE_HOSTNAME=$(yq eval '.node.hostname' "$NODE_CONFIG") + +# Get network interface (assume first non-loopback) +IFACE=$(ip route | grep default | awk '{print $5}' | head -1) + +echo "==> Configuring network for $TEMP_NODE_NAME ($NODE_IP)" + +# Create NetworkManager connection +nmcli connection delete "$IFACE" 2>/dev/null || true +nmcli connection add \ + type ethernet \ + con-name "$IFACE" \ + ifname "$IFACE" \ + ip4 "$NODE_IP/24" \ + gw4 "192.168.1.1" \ + ipv4.dns "8.8.8.8,8.8.4.4" \ + ipv4.method manual + +nmcli connection up "$IFACE" + +# Set hostname +hostnamectl set-hostname "$NODE_HOSTNAME" + +# Add to /etc/hosts +grep -q "$NODE_HOSTNAME" /etc/hosts || \ + echo "$NODE_IP $NODE_HOSTNAME" >> /etc/hosts + +echo "==> Network configured: $NODE_IP ($NODE_HOSTNAME)" +``` + +### Phase 2: First Master Bootstrap + +**Runs**: After cluster-detect, if first master and uninitialized +**Script**: `tools/bootstrap-first-master.sh` + +```bash +#!/bin/bash +# Initialize Kubernetes cluster on first master node + +set -euo pipefail + +CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}" +STATE_DIR="/var/lib/cluster-state" +mkdir -p "$STATE_DIR" + +# Check if already bootstrapped +if [[ -f "$STATE_DIR/bootstrap-state" ]]; then + STATE=$(cat "$STATE_DIR/bootstrap-state") + if [[ "$STATE" == "bootstrapped" ]]; then + echo "==> Cluster already bootstrapped, skipping" + exit 0 + fi +fi + +NODE_NAME=$(cat "$CONFIG_DIR/node-identity") +NODE_IP=$(yq eval '.node.ip' "$CONFIG_DIR/current-node.yaml") +CLUSTER_NAME=$(yq eval '.cluster.name' "$CONFIG_DIR/cluster.yaml") +POD_CIDR=$(yq eval '.cluster.networking.pod_cidr' "$CONFIG_DIR/cluster.yaml") +SERVICE_CIDR=$(yq eval '.cluster.networking.service_cidr' "$CONFIG_DIR/cluster.yaml") + +echo "==> Bootstrapping Kubernetes cluster on first master: $NODE_NAME" + +# 1. Initialize etcd cluster +echo "==> Initializing etcd..." +mkdir -p /var/lib/etcd + +# Create etcd initial cluster list +MASTER_NODES=$(yq eval '.nodes[] | select(.roles[] | contains("master") or contains("control-plane")) | .name + "=https://" + .ip + ":2380"' "$CONFIG_DIR/cluster.yaml" | tr '\n' ',' | sed 's/,$//') + +cat > /tmp/etcd-bootstrap.env <<EOF +ETCD_NAME=$NODE_NAME +ETCD_DATA_DIR=/var/lib/etcd +ETCD_LISTEN_PEER_URLS=https://$NODE_IP:2380 +ETCD_LISTEN_CLIENT_URLS=https://$NODE_IP:2379,https://127.0.0.1:2379 +ETCD_INITIAL_ADVERTISE_PEER_URLS=https://$NODE_IP:2380 +ETCD_ADVERTISE_CLIENT_URLS=https://$NODE_IP:2379 +ETCD_INITIAL_CLUSTER=$MASTER_NODES +ETCD_INITIAL_CLUSTER_STATE=new +ETCD_INITIAL_CLUSTER_TOKEN=$CLUSTER_NAME-etcd +ETCD_CERT_FILE=/etc/kubernetes/pki/etcd/server.crt +ETCD_KEY_FILE=/etc/kubernetes/pki/etcd/server.key +ETCD_CLIENT_CERT_AUTH=true +ETCD_TRUSTED_CA_FILE=/etc/kubernetes/pki/etcd/ca.crt +ETCD_PEER_CERT_FILE=/etc/kubernetes/pki/etcd/peer.crt +ETCD_PEER_KEY_FILE=/etc/kubernetes/pki/etcd/peer.key +ETCD_PEER_CLIENT_CERT_AUTH=true +ETCD_PEER_TRUSTED_CA_FILE=/etc/kubernetes/pki/etcd/ca.crt +EOF + +# Start etcd +systemctl start etcd +sleep 5 + +# Verify etcd is healthy +etcdctl --endpoints=https://127.0.0.1:2379 \ + --cacert=/etc/kubernetes/pki/etcd/ca.crt \ + --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \ + --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \ + endpoint health + +# 2. Start API server +echo "==> Starting kube-apiserver..." +systemctl start kube-apiserver +sleep 10 + +# Wait for API server to be ready +for i in {1..30}; do + if curl -k https://127.0.0.1:6443/healthz &>/dev/null; then + echo "==> API server is ready" + break + fi + echo "Waiting for API server... ($i/30)" + sleep 2 +done + +# 3. Start controller manager and scheduler +echo "==> Starting control plane components..." +systemctl start kube-controller-manager +systemctl start kube-scheduler + +# 4. Create initial kubeconfig for kubectl +mkdir -p ~/.kube +cat > ~/.kube/config <<EOF +apiVersion: v1 +kind: Config +clusters: +- cluster: + certificate-authority: /etc/kubernetes/pki/ca.crt + server: https://127.0.0.1:6443 + name: $CLUSTER_NAME +contexts: +- context: + cluster: $CLUSTER_NAME + user: kubernetes-admin + name: kubernetes-admin@$CLUSTER_NAME +current-context: kubernetes-admin@$CLUSTER_NAME +users: +- name: kubernetes-admin + user: + client-certificate: /etc/kubernetes/pki/apiserver-kubelet-client.crt + client-key: /etc/kubernetes/pki/apiserver-kubelet-client.key +EOF + +# 5. Install Calico CNI +echo "==> Installing Calico CNI..." +kubectl apply -f /etc/kubernetes/manifests/calico/calico.yaml + +# Wait for Calico pods to be ready +echo "==> Waiting for Calico to be ready..." +kubectl wait --for=condition=ready pod -l k8s-app=calico-node -n kube-system --timeout=300s || true + +# 6. Start kubelet on this node +echo "==> Starting kubelet..." +systemctl start kubelet + +# 7. Create bootstrap tokens for worker nodes +echo "==> Creating bootstrap token..." +TOKEN=$(openssl rand -hex 3).$(openssl rand -hex 8) +echo "$TOKEN" > "$STATE_DIR/bootstrap-token" + +# Create token secret in Kubernetes +kubectl create secret generic bootstrap-token-${TOKEN%.*} \ + --type bootstrap.kubernetes.io/token \ + --namespace kube-system \ + --from-literal=token-id=${TOKEN%.*} \ + --from-literal=token-secret=${TOKEN#*.} \ + --from-literal=usage-bootstrap-authentication=true \ + --from-literal=usage-bootstrap-signing=true + +# 8. Create join command for workers +CA_CERT_HASH=$(openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | \ + openssl rsa -pubin -outform der 2>/dev/null | \ + openssl dgst -sha256 -hex | sed 's/^.* //') + +cat > "$STATE_DIR/worker-join-command" <<EOF +# Run this on worker nodes to join the cluster +kubeadm join $NODE_IP:6443 --token $TOKEN --discovery-token-ca-cert-hash sha256:$CA_CERT_HASH +EOF + +echo "==> Bootstrap complete!" +echo "bootstrap-state: bootstrapped" > "$STATE_DIR/bootstrap-state" +echo "$NODE_NAME" > "$STATE_DIR/cluster-initialized-by" +date -Iseconds > "$STATE_DIR/cluster-joined" + +# 9. Distribute join info to shared location (optional: NFS, HTTP server, etc.) +# For now, workers must manually fetch from first master +mkdir -p /var/www/html/cluster-join +cp "$STATE_DIR/worker-join-command" /var/www/html/cluster-join/ +cp "$STATE_DIR/bootstrap-token" /var/www/html/cluster-join/ + +echo "==> Worker join command saved to: $STATE_DIR/worker-join-command" +``` + +### Phase 3: Additional Master Join + +**Runs**: After cluster-detect, if additional master and first master is ready +**Script**: `tools/join-additional-master.sh` + +```bash +#!/bin/bash +# Join additional master nodes to existing etcd cluster + +set -euo pipefail + +CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}" +STATE_DIR="/var/lib/cluster-state" +mkdir -p "$STATE_DIR" + +# Check if already joined +if [[ -f "$STATE_DIR/bootstrap-state" ]]; then + STATE=$(cat "$STATE_DIR/bootstrap-state") + if [[ "$STATE" == "joined" ]]; then + echo "==> Already joined cluster, skipping" + exit 0 + fi +fi + +NODE_NAME=$(cat "$CONFIG_DIR/node-identity") +NODE_IP=$(yq eval '.node.ip' "$CONFIG_DIR/current-node.yaml") +CLUSTER_NAME=$(yq eval '.cluster.name' "$CONFIG_DIR/cluster.yaml") + +# Get first master IP +FIRST_MASTER_IP=$(yq eval '.nodes[] | select(.roles[] | contains("master") or contains("control-plane")) | .ip' "$CONFIG_DIR/cluster.yaml" | head -1) + +echo "==> Joining as additional master: $NODE_NAME" +echo "==> First master: $FIRST_MASTER_IP" + +# Wait for first master API server to be ready +echo "==> Waiting for first master API server..." +for i in {1..60}; do + if curl -k https://$FIRST_MASTER_IP:6443/healthz &>/dev/null; then + echo "==> First master is ready" + break + fi + echo "Waiting for first master... ($i/60)" + sleep 5 +done + +# Add this node to etcd cluster +echo "==> Adding node to etcd cluster..." + +# etcd add member command (run on first master via ssh or API) +# For now, assume etcdctl is available and certs are distributed +ETCD_ENDPOINTS="https://$FIRST_MASTER_IP:2379" + +etcdctl --endpoints=$ETCD_ENDPOINTS \ + --cacert=/etc/kubernetes/pki/etcd/ca.crt \ + --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \ + --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \ + member add "$NODE_NAME" --peer-urls="https://$NODE_IP:2380" + +# Start etcd in "existing" mode +cat > /tmp/etcd-join.env <<EOF +ETCD_NAME=$NODE_NAME +ETCD_DATA_DIR=/var/lib/etcd +ETCD_INITIAL_CLUSTER_STATE=existing +# ... (same as bootstrap but with "existing") +EOF + +systemctl start etcd + +# Start control plane components +systemctl start kube-apiserver +systemctl start kube-controller-manager +systemctl start kube-scheduler +systemctl start kubelet + +echo "joined" > "$STATE_DIR/bootstrap-state" +date -Iseconds > "$STATE_DIR/cluster-joined" + +echo "==> Additional master joined successfully" +``` + +### Phase 4: Worker Join + +**Runs**: After cluster-detect, if worker node +**Script**: `tools/join-worker.sh` + +```bash +#!/bin/bash +# Join worker node to Kubernetes cluster + +set -euo pipefail + +CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}" +STATE_DIR="/var/lib/cluster-state" +mkdir -p "$STATE_DIR" + +# Check if already joined +if [[ -f "$STATE_DIR/bootstrap-state" ]]; then + STATE=$(cat "$STATE_DIR/bootstrap-state") + if [[ "$STATE" == "joined" ]]; then + echo "==> Already joined cluster, skipping" + exit 0 + fi +fi + +NODE_NAME=$(cat "$CONFIG_DIR/node-identity") + +# Get first master IP +FIRST_MASTER_IP=$(yq eval '.nodes[] | select(.roles[] | contains("master") or contains("control-plane")) | .ip' "$CONFIG_DIR/cluster.yaml" | head -1) + +echo "==> Joining worker node: $NODE_NAME" +echo "==> API server: https://$FIRST_MASTER_IP:6443" + +# Wait for API server +for i in {1..60}; do + if curl -k https://$FIRST_MASTER_IP:6443/healthz &>/dev/null; then + echo "==> API server is ready" + break + fi + echo "Waiting for API server... ($i/60)" + sleep 5 +done + +# Fetch join token from first master (requires setup) +# Option 1: HTTP endpoint on first master +# Option 2: Shared NFS mount +# Option 3: Embedded in cluster.yaml (less secure) + +# For now, assume token is in cluster.yaml or fetched via curl +TOKEN=$(curl -s http://$FIRST_MASTER_IP/cluster-join/bootstrap-token 2>/dev/null || echo "") + +if [[ -z "$TOKEN" ]]; then + echo "ERROR: Cannot fetch bootstrap token from first master" + echo "Manual join required. Run on first master:" + echo " kubeadm token create --print-join-command" + exit 1 +fi + +CA_CERT_HASH=$(openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | \ + openssl rsa -pubin -outform der 2>/dev/null | \ + openssl dgst -sha256 -hex | sed 's/^.* //') + +# Join cluster (kubeadm-style, but we're using raw kubelet) +# Create kubelet bootstrap kubeconfig +cat > /etc/kubernetes/bootstrap-kubelet.conf <<EOF +apiVersion: v1 +kind: Config +clusters: +- cluster: + certificate-authority: /etc/kubernetes/pki/ca.crt + server: https://$FIRST_MASTER_IP:6443 + name: kubernetes +contexts: +- context: + cluster: kubernetes + user: kubelet-bootstrap + name: kubelet-bootstrap@kubernetes +current-context: kubelet-bootstrap@kubernetes +users: +- name: kubelet-bootstrap + user: + token: $TOKEN +EOF + +# Start kubelet (will auto-generate client cert via CSR) +systemctl start kubelet + +# Auto-approve CSR (on first master, create script to approve) +# kubectl certificate approve <csr-name> + +echo "joined" > "$STATE_DIR/bootstrap-state" +date -Iseconds > "$STATE_DIR/cluster-joined" + +echo "==> Worker joined successfully" +echo "==> Check status on master: kubectl get nodes" +``` + +### Phase 5: Ceph Cluster Bootstrap + +**Runs**: After Kubernetes is ready, on first ceph-mon node +**Script**: `tools/bootstrap-ceph.sh` + +```bash +#!/bin/bash +# Bootstrap Ceph cluster on first monitor node + +set -euo pipefail + +CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}" +STATE_DIR="/var/lib/cluster-state" + +# Check if already initialized +if [[ -f "$STATE_DIR/ceph-bootstrapped" ]]; then + echo "==> Ceph already bootstrapped" + exit 0 +fi + +NODE_NAME=$(cat "$CONFIG_DIR/node-identity") +NODE_IP=$(yq eval '.node.ip' "$CONFIG_DIR/current-node.yaml") +CLUSTER_NAME=$(yq eval '.cluster.name' "$CONFIG_DIR/cluster.yaml") + +# Check if this is the first ceph-mon node +CEPH_MON_NODES=$(yq eval '.nodes[] | select(.roles[] | contains("ceph-mon")) | .name' "$CONFIG_DIR/cluster.yaml" | sort) +FIRST_MON=$(echo "$CEPH_MON_NODES" | head -1) + +if [[ "$NODE_NAME" != "$FIRST_MON" ]]; then + echo "==> Not first monitor, skipping Ceph bootstrap" + exit 0 +fi + +echo "==> Bootstrapping Ceph cluster on first monitor: $NODE_NAME" + +# Generate Ceph keys (if not already done) +/usr/local/bin/generate-ceph-keys.sh + +FSID=$(cat /etc/ceph/cluster.fsid) + +# Create ceph.conf +cat > /etc/ceph/ceph.conf <<EOF +[global] +fsid = $FSID +mon initial members = $FIRST_MON +mon host = $NODE_IP +public network = 192.168.1.0/24 +cluster network = 192.168.1.0/24 +auth cluster required = cephx +auth service required = cephx +auth client required = cephx +osd pool default size = 3 +osd pool default min size = 2 +EOF + +# Create monitor map +monmaptool --create --add $FIRST_MON $NODE_IP --fsid $FSID /tmp/monmap + +# Initialize monitor data directory +mkdir -p /var/lib/ceph/mon/ceph-$FIRST_MON +ceph-mon --mkfs -i $FIRST_MON --monmap /tmp/monmap --keyring /etc/ceph/ceph.mon.keyring + +# Start monitor +systemctl start ceph-mon@$FIRST_MON + +# Wait for mon to be ready +sleep 10 + +# Enable manager module +ceph mgr module enable dashboard + +echo "==> Ceph cluster bootstrapped" +date -Iseconds > "$STATE_DIR/ceph-bootstrapped" +``` + +## Bootstrap Coordination Service + +**New systemd unit**: `cluster-bootstrap.service` + +```ini +[Unit] +Description=Cluster Bootstrap Orchestrator +After=cluster-detect.service network-online.target +Wants=network-online.target + +[Service] +Type=oneshot +RemainAfterExit=yes +ExecStart=/usr/local/bin/cluster-bootstrap-orchestrator.sh + +[Install] +WantedBy=multi-user.target +``` + +**Script**: `tools/cluster-bootstrap-orchestrator.sh` + +```bash +#!/bin/bash +# Main orchestrator that calls appropriate bootstrap script based on node role + +set -euo pipefail + +NODE_ROLE=$(/usr/local/bin/am-i-first-master.sh) + +case "$NODE_ROLE" in + master-first) + echo "==> Detected as first master, bootstrapping cluster..." + /usr/local/bin/bootstrap-first-master.sh + ;; + master-additional) + echo "==> Detected as additional master, joining cluster..." + /usr/local/bin/join-additional-master.sh + ;; + worker) + echo "==> Detected as worker, joining cluster..." + /usr/local/bin/join-worker.sh + ;; + *) + echo "ERROR: Unknown node role: $NODE_ROLE" + exit 1 + ;; +esac + +# If node has ceph-mon role, bootstrap Ceph +ROLES=$(yq eval '.node.roles[]' /etc/cluster-config/current-node.yaml) +if [[ "$ROLES" =~ "ceph-mon" ]]; then + /usr/local/bin/bootstrap-ceph.sh +fi +``` + +## Bootstrap Token Distribution + +**Problem**: Workers need the bootstrap token, but first master generates it. + +**Solutions**: + +### Option 1: HTTP Server on First Master +```bash +# On first master, start simple HTTP server +python3 -m http.server 8080 -d /var/www/html/cluster-join & + +# On workers +curl http://<first-master>:8080/bootstrap-token +``` + +### Option 2: Embedded in cluster.yaml (Less Secure) +```yaml +# configs/cluster.yaml +cluster: + bootstrap: + token: "abcdef.0123456789abcdef" # Pre-generated +``` + +### Option 3: External Secret Store +- First master uploads token to Vault/S3 +- Workers fetch using node identity credentials + +## Failure Recovery + +### Bootstrap Failed Midway + +```bash +#!/bin/bash +# tools/reset-bootstrap.sh +# Clean up partial bootstrap to retry + +systemctl stop kubelet kube-apiserver kube-controller-manager kube-scheduler etcd +rm -rf /var/lib/etcd/* +rm -rf /var/lib/kubelet/* +rm -f /var/lib/cluster-state/bootstrap-state +echo "==> Bootstrap state reset. Reboot to retry." +``` + +### Additional Master Can't Join + +1. Check etcd cluster health on first master +2. Verify certificates are correct (CN, SANs) +3. Check network connectivity (port 2380, 6443) +4. Review etcd logs: `journalctl -u etcd` + +## Testing Bootstrap + +### VM Test Scenario + +```bash +# Create 3 VMs: master-01, worker-01, worker-02 +# 1. Boot master-01, wait for bootstrap +# 2. Verify: kubectl get nodes (should show master-01) +# 3. Boot worker-01 +# 4. Verify: kubectl get nodes (should show master-01, worker-01) +# 5. Boot worker-02 +# 6. Verify: kubectl get nodes (should show all 3) +``` + +## Monitoring Bootstrap Progress + +**Script**: `tools/check-bootstrap-status.sh` + +```bash +#!/bin/bash +# Check current bootstrap status + +STATE_DIR="/var/lib/cluster-state" + +if [[ ! -f "$STATE_DIR/bootstrap-state" ]]; then + echo "Status: UNINITIALIZED" + exit 0 +fi + +STATE=$(cat "$STATE_DIR/bootstrap-state") +echo "Bootstrap State: $STATE" + +if [[ -f "$STATE_DIR/cluster-joined" ]]; then + echo "Joined At: $(cat "$STATE_DIR/cluster-joined")" +fi + +# Check service status +echo "" +echo "Service Status:" +systemctl is-active etcd 2>/dev/null && echo " etcd: running" || echo " etcd: stopped" +systemctl is-active kube-apiserver 2>/dev/null && echo " kube-apiserver: running" || echo " kube-apiserver: stopped" +systemctl is-active kubelet 2>/dev/null && echo " kubelet: running" || echo " kubelet: stopped" + +# Check Kubernetes node status +if command -v kubectl &>/dev/null; then + echo "" + echo "Kubernetes Nodes:" + kubectl get nodes 2>/dev/null || echo " API server not reachable" +fi +``` + +## Next Steps + +1. Implement all bootstrap scripts +2. Test first master initialization +3. Test worker join +4. Test multi-master etcd cluster formation +5. Integrate with ISO builder +6. Add bootstrap progress logging +7. Create troubleshooting guide + +--- + +**P.S.** This is where theory meets reality. Bootstrap is the gnarliest part because you're building the control plane with no control plane. The state machine approach (first-master vs additional-master vs worker) keeps it manageable. Token distribution is still a bit hand-wavy—recommend starting with HTTP server approach, upgrade to Vault later. |
