# Cluster Bootstrap and Initialization

## Overview

This document defines the **first-boot initialization sequence** that transforms individual nodes into a functioning cluster. Bootstrap is the most complex phase because nodes must coordinate without a working cluster.

## The Bootstrap Problem

**Challenge**: Kubernetes requires a control plane to join nodes, but the control plane doesn't exist yet.

**Solution**: Designate one node as the **first master**, which initializes the cluster. All other nodes then join the cluster it created.

## Bootstrap States

Each node tracks its bootstrap state to avoid re-initialization on reboot.

```
/var/lib/cluster-state/
├── bootstrap-state         # Current state: uninitialized|bootstrapped|joined
├── cluster-joined          # Timestamp of successful join
├── node-role              # master-first|master-additional|worker
└── etcd-member-id         # etcd member ID (masters only)
```
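
The state file can be read and written through two small helpers. This is a minimal sketch, not part of the original tooling: the function names and the `STATE_DIR` environment override are assumptions made for illustration. Writing to a temporary file and renaming it into place means a reboot in the middle of a write should not leave a truncated state file behind.

```bash
#!/bin/bash
# Hypothetical helpers; the function names and STATE_DIR override are assumptions.
STATE_DIR="${STATE_DIR:-/var/lib/cluster-state}"

# Print the current state, or "uninitialized" when no state file exists yet.
read_bootstrap_state() {
  if [[ -f "$STATE_DIR/bootstrap-state" ]]; then
    cat "$STATE_DIR/bootstrap-state"
  else
    echo "uninitialized"
  fi
}

# Persist a state atomically: write a temp file, then rename it into place.
write_bootstrap_state() {
  local state="$1"
  case "$state" in
    uninitialized|bootstrapped|joined) ;;
    *) echo "ERROR: invalid state: $state" >&2; return 1 ;;
  esac
  mkdir -p "$STATE_DIR"
  local tmp
  tmp=$(mktemp "$STATE_DIR/.bootstrap-state.XXXXXX")
  echo "$state" > "$tmp"
  mv "$tmp" "$STATE_DIR/bootstrap-state"
}
```

Rejecting unknown values at write time keeps the file limited to the three states listed above, so later string comparisons against it stay reliable.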

## State Transitions

```
┌─────────────────┐
│ Uninitialized   │  ← Fresh install
└────────┬────────┘
         │
         ▼
    ┌────────────────────┐
    │ Is first master?   │
    └─┬────────────────┬─┘
      │ YES            │ NO
      ▼                ▼
┌─────────────┐   ┌──────────────┐
│ Bootstrap   │   │ Wait for     │
│ Cluster     │   │ First Master │
└──────┬──────┘   └──────┬───────┘
       │                 │
       ▼                 ▼
┌─────────────┐   ┌──────────────┐
│ Bootstrapped│   │ Join Cluster │
└──────┬──────┘   └──────┬───────┘
       │                 │
       └────────┬────────┘
                ▼
        ┌───────────────┐
        │ Cluster Ready │
        └───────────────┘
```
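
The diagram can be expressed as a pure function from (role, state) to the next action. This is a sketch under stated assumptions: the function name and the action labels (`bootstrap-cluster`, `wait-then-join`, `none`) are hypothetical and only mirror the boxes in the diagram.

```bash
#!/bin/bash
# Hypothetical mapping of the state diagram; action names are illustrative only.
# Usage: next_bootstrap_action ROLE STATE
next_bootstrap_action() {
  local role="$1" state="$2"

  # A node that already finished bootstrap or join has nothing left to do.
  case "$state" in
    bootstrapped|joined) echo "none"; return 0 ;;
    uninitialized) ;;
    *) echo "ERROR: unknown state: $state" >&2; return 1 ;;
  esac

  # Uninitialized nodes branch on the "Is first master?" decision.
  case "$role" in
    master-first) echo "bootstrap-cluster" ;;
    master-additional|worker) echo "wait-then-join" ;;
    *) echo "ERROR: unknown role: $role" >&2; return 1 ;;
  esac
}
```

Keeping the decision separate from the side effects makes the reboot case easy to reason about: a node in `bootstrapped` or `joined` always maps to `none`.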

## Determining First Master

**Logic**: The first master is the master (or control-plane) node in `cluster.yaml` whose **name sorts first**. The order of entries in the file does not matter.

**Script**: `tools/am-i-first-master.sh`

```bash
#!/bin/bash
# Determines if this node should bootstrap the cluster

set -euo pipefail

CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}"
NODE_NAME=$(cat "$CONFIG_DIR/node-identity")
NODE_CONFIG="$CONFIG_DIR/nodes/$NODE_NAME.yaml"
CLUSTER_CONFIG="$CONFIG_DIR/cluster.yaml"

# Check if this node has master/control-plane role
ROLES=$(yq eval '.node.roles[]' "$NODE_CONFIG")
IS_MASTER=false
[[ "$ROLES" =~ "master" || "$ROLES" =~ "control-plane" ]] && IS_MASTER=true

if [[ "$IS_MASTER" != "true" ]]; then
  echo "worker"
  exit 0
fi

# Find all master nodes from cluster.yaml
MASTER_NODES=$(yq eval '.nodes[] | select(.roles[] | contains("master") or contains("control-plane")) | .name' "$CLUSTER_CONFIG" | sort)

# Get the first master
FIRST_MASTER=$(echo "$MASTER_NODES" | head -1)

if [[ "$NODE_NAME" == "$FIRST_MASTER" ]]; then
  echo "master-first"
else
  echo "master-additional"
fi
```
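
Every node has to reach the same answer, and `sort` orders by the current locale. The sketch below pins the collation with `LC_ALL=C`; the function name `first_master_of` is an assumption, and it only takes a newline-separated list of names on stdin so it can be tried without `yq`.

```bash
#!/bin/bash
# Hypothetical helper: pick the first master from a list of names on stdin.
# LC_ALL=C forces plain byte-order comparison, so the result does not depend
# on the locale configured on each node.
first_master_of() {
  LC_ALL=C sort -u | head -n 1
}
```

The comparison is lexicographic, not numeric: `master-10` sorts before `master-2`. Zero-padded names such as `master-02` avoid that surprise.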

## Bootstrap Sequence

### Phase 1: Network Configuration (All Nodes)

**Runs**: Before cluster-detect.service
**Script**: `tools/configure-network.sh`

```bash
#!/bin/bash
# Configure static IP from cluster.yaml

set -euo pipefail

CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}"
TEMP_NODE_NAME="${1:-}"  # During first boot, identity not yet known

# Detect node identity (simplified version for network setup)
if [[ -z "$TEMP_NODE_NAME" ]]; then
  # Try MAC address detection
  MY_MAC=$(ip link show | awk '/ether/ {print $2}' | head -1)

  for node_file in "$CONFIG_DIR/nodes"/*.yaml; do
    MAC_ADDRS=$(yq eval '.node.hardware.mac_addresses[]?' "$node_file" 2>/dev/null || echo "")
    if [[ -n "$MY_MAC" ]] && echo "$MAC_ADDRS" | grep -qiF "$MY_MAC"; then
      TEMP_NODE_NAME=$(yq eval '.node.name' "$node_file")
      break
    fi
  done
fi

if [[ -z "$TEMP_NODE_NAME" ]]; then
  echo "ERROR: Cannot determine node identity for network config"
  exit 1
fi

NODE_CONFIG="$CONFIG_DIR/nodes/$TEMP_NODE_NAME.yaml"
NODE_IP=$(yq eval '.node.ip' "$NODE_CONFIG")
NODE_HOSTNAME=$(yq eval '.node.hostname' "$NODE_CONFIG")

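# Get network interface (the one that carries the current default route)
IFACE=$(ip route | awk '/^default/ {print $5; exit}')
if [[ -z "$IFACE" ]]; then
  echo "ERROR: No default route found; cannot pick a network interface"
  exit 1
fi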

echo "==> Configuring network for $TEMP_NODE_NAME ($NODE_IP)"

# Create NetworkManager connection
nmcli connection delete "$IFACE" 2>/dev/null || true
nmcli connection add \
  type ethernet \
  con-name "$IFACE" \
  ifname "$IFACE" \
  ip4 "$NODE_IP/24" \
  gw4 "192.168.1.1" \
  ipv4.dns "8.8.8.8,8.8.4.4" \
  ipv4.method manual

nmcli connection up "$IFACE"

# Set hostname
hostnamectl set-hostname "$NODE_HOSTNAME"

# Add to /etc/hosts
grep -q "$NODE_HOSTNAME" /etc/hosts || \
  echo "$NODE_IP $NODE_HOSTNAME" >> /etc/hosts

echo "==> Network configured: $NODE_IP ($NODE_HOSTNAME)"
```
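
MAC detection relies on a case-insensitive substring match, which can match too much when the detected MAC is empty or the config uses a different separator. An exact comparison on a normalized form is one alternative. This is a sketch: `normalize_mac` and `mac_matches` are hypothetical names, and it assumes bash 4 or newer for the `${var,,}` lowercase expansion.

```bash
#!/bin/bash
# Hypothetical helpers for comparing MAC addresses exactly.

# Strip ':' and '-' separators and lowercase the result (bash 4+).
normalize_mac() {
  local mac="${1//[-:]/}"
  echo "${mac,,}"
}

# Succeed only if the first MAC is non-empty and equals the second one
# after normalization.
mac_matches() {
  [[ -n "$1" && "$(normalize_mac "$1")" == "$(normalize_mac "$2")" ]]
}
```

With this, `mac_matches "$MY_MAC" "$candidate"` could be called once per entry of `.node.hardware.mac_addresses` in place of the `grep` test.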

### Phase 2: First Master Bootstrap

**Runs**: After cluster-detect, if first master and uninitialized
**Script**: `tools/bootstrap-first-master.sh`

```bash
#!/bin/bash
# Initialize Kubernetes cluster on first master node

set -euo pipefail

CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}"
STATE_DIR="/var/lib/cluster-state"
mkdir -p "$STATE_DIR"

# Check if already bootstrapped
if [[ -f "$STATE_DIR/bootstrap-state" ]]; then
  STATE=$(cat "$STATE_DIR/bootstrap-state")
  if [[ "$STATE" == "bootstrapped" ]]; then
    echo "==> Cluster already bootstrapped, skipping"
    exit 0
  fi
fi

NODE_NAME=$(cat "$CONFIG_DIR/node-identity")
NODE_IP=$(yq eval '.node.ip' "$CONFIG_DIR/current-node.yaml")
CLUSTER_NAME=$(yq eval '.cluster.name' "$CONFIG_DIR/cluster.yaml")
POD_CIDR=$(yq eval '.cluster.networking.pod_cidr' "$CONFIG_DIR/cluster.yaml")
SERVICE_CIDR=$(yq eval '.cluster.networking.service_cidr' "$CONFIG_DIR/cluster.yaml")

echo "==> Bootstrapping Kubernetes cluster on first master: $NODE_NAME"

# 1. Initialize etcd cluster
echo "==> Initializing etcd..."
mkdir -p /var/lib/etcd

# Create etcd initial cluster list. Only this node is listed: with
# ETCD_INITIAL_CLUSTER_STATE=new, etcd waits for a quorum of every listed
# member, and additional masters are added later via "etcdctl member add".
INITIAL_CLUSTER="$NODE_NAME=https://$NODE_IP:2380"

cat > /tmp/etcd-bootstrap.env <<EOF
ETCD_NAME=$NODE_NAME
ETCD_DATA_DIR=/var/lib/etcd
ETCD_LISTEN_PEER_URLS=https://$NODE_IP:2380
ETCD_LISTEN_CLIENT_URLS=https://$NODE_IP:2379,https://127.0.0.1:2379
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://$NODE_IP:2380
ETCD_ADVERTISE_CLIENT_URLS=https://$NODE_IP:2379
ETCD_INITIAL_CLUSTER=$INITIAL_CLUSTER
ETCD_INITIAL_CLUSTER_STATE=new
ETCD_INITIAL_CLUSTER_TOKEN=$CLUSTER_NAME-etcd
ETCD_CERT_FILE=/etc/kubernetes/pki/etcd/server.crt
ETCD_KEY_FILE=/etc/kubernetes/pki/etcd/server.key
ETCD_CLIENT_CERT_AUTH=true
ETCD_TRUSTED_CA_FILE=/etc/kubernetes/pki/etcd/ca.crt
ETCD_PEER_CERT_FILE=/etc/kubernetes/pki/etcd/peer.crt
ETCD_PEER_KEY_FILE=/etc/kubernetes/pki/etcd/peer.key
ETCD_PEER_CLIENT_CERT_AUTH=true
ETCD_PEER_TRUSTED_CA_FILE=/etc/kubernetes/pki/etcd/ca.crt
EOF

# Start etcd
systemctl start etcd
sleep 5

# Verify etcd is healthy
etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint health

# 2. Start API server
echo "==> Starting kube-apiserver..."
systemctl start kube-apiserver
sleep 10

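# Wait for API server to be ready (-f makes curl fail on HTTP errors)
API_READY=false
for i in {1..30}; do
  if curl -ksf https://127.0.0.1:6443/healthz &>/dev/null; then
    echo "==> API server is ready"
    API_READY=true
    break
  fi
  echo "Waiting for API server... ($i/30)"
  sleep 2
done
if [[ "$API_READY" != "true" ]]; then
  echo "ERROR: API server did not become ready in time"
  exit 1
fi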

# 3. Start controller manager and scheduler
echo "==> Starting control plane components..."
systemctl start kube-controller-manager
systemctl start kube-scheduler

# 4. Create initial kubeconfig for kubectl
mkdir -p ~/.kube
cat > ~/.kube/config <<EOF
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /etc/kubernetes/pki/ca.crt
    server: https://127.0.0.1:6443
  name: $CLUSTER_NAME
contexts:
- context:
    cluster: $CLUSTER_NAME
    user: kubernetes-admin
  name: kubernetes-admin@$CLUSTER_NAME
current-context: kubernetes-admin@$CLUSTER_NAME
users:
- name: kubernetes-admin
  user:
    client-certificate: /etc/kubernetes/pki/apiserver-kubelet-client.crt
    client-key: /etc/kubernetes/pki/apiserver-kubelet-client.key
EOF

# 5. Start kubelet on this node (Calico pods cannot be scheduled until a node registers)
echo "==> Starting kubelet..."
systemctl start kubelet

# 6. Install Calico CNI
echo "==> Installing Calico CNI..."
kubectl apply -f /etc/kubernetes/manifests/calico/calico.yaml

# Wait for Calico pods to be ready
echo "==> Waiting for Calico to be ready..."
kubectl wait --for=condition=ready pod -l k8s-app=calico-node -n kube-system --timeout=300s || true

# 7. Create bootstrap tokens for worker nodes
echo "==> Creating bootstrap token..."
TOKEN=$(openssl rand -hex 3).$(openssl rand -hex 8)
echo "$TOKEN" > "$STATE_DIR/bootstrap-token"

# Create token secret in Kubernetes
kubectl create secret generic bootstrap-token-${TOKEN%.*} \
  --type bootstrap.kubernetes.io/token \
  --namespace kube-system \
  --from-literal=token-id=${TOKEN%.*} \
  --from-literal=token-secret=${TOKEN#*.} \
  --from-literal=usage-bootstrap-authentication=true \
  --from-literal=usage-bootstrap-signing=true

# 8. Create join command for workers
CA_CERT_HASH=$(openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | \
  openssl rsa -pubin -outform der 2>/dev/null | \
  openssl dgst -sha256 -hex | sed 's/^.* //')

cat > "$STATE_DIR/worker-join-command" <<EOF
# Run this on worker nodes to join the cluster
kubeadm join $NODE_IP:6443 --token $TOKEN --discovery-token-ca-cert-hash sha256:$CA_CERT_HASH
EOF

echo "==> Bootstrap complete!"
# Write only the state name; the idempotency check above compares against "bootstrapped"
echo "bootstrapped" > "$STATE_DIR/bootstrap-state"
echo "$NODE_NAME" > "$STATE_DIR/cluster-initialized-by"
date -Iseconds > "$STATE_DIR/cluster-joined"

# 9. Distribute join info to shared location (optional: NFS, HTTP server, etc.)
# For now, workers must manually fetch from first master
mkdir -p /var/www/html/cluster-join
cp "$STATE_DIR/worker-join-command" /var/www/html/cluster-join/
cp "$STATE_DIR/bootstrap-token" /var/www/html/cluster-join/

echo "==> Worker join command saved to: $STATE_DIR/worker-join-command"
```
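
The polling loops in these scripts share one shape: retry a command a fixed number of times, then give up. A small helper can capture that and return a non-zero status on timeout so that `set -e` stops the bootstrap. This is a sketch; the name `wait_for` and its argument order are assumptions.

```bash
#!/bin/bash
# Hypothetical retry helper.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt failed.
wait_for() {
  local attempts="$1" delay="$2"
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}

# Example (not executed here):
#   wait_for 30 2 curl -ksf https://127.0.0.1:6443/healthz >/dev/null \
#     || { echo "ERROR: API server not ready"; exit 1; }
```

Passing the command as arguments keeps the helper generic, so the same function could cover the etcd health check and the remote API server checks on joining nodes.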

### Phase 3: Additional Master Join

**Runs**: After cluster-detect, if additional master and first master is ready
**Script**: `tools/join-additional-master.sh`

```bash
#!/bin/bash
# Join additional master nodes to existing etcd cluster

set -euo pipefail

CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}"
STATE_DIR="/var/lib/cluster-state"
mkdir -p "$STATE_DIR"

# Check if already joined
if [[ -f "$STATE_DIR/bootstrap-state" ]]; then
  STATE=$(cat "$STATE_DIR/bootstrap-state")
  if [[ "$STATE" == "joined" ]]; then
    echo "==> Already joined cluster, skipping"
    exit 0
  fi
fi

NODE_NAME=$(cat "$CONFIG_DIR/node-identity")
NODE_IP=$(yq eval '.node.ip' "$CONFIG_DIR/current-node.yaml")
CLUSTER_NAME=$(yq eval '.cluster.name' "$CONFIG_DIR/cluster.yaml")

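# Get first master IP (same ordering as am-i-first-master.sh: sorted by node name)
FIRST_MASTER_IP=$(yq eval '.nodes[] | select(.roles[] | contains("master") or contains("control-plane")) | .name + " " + .ip' "$CONFIG_DIR/cluster.yaml" | sort | head -1 | awk '{print $2}')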

echo "==> Joining as additional master: $NODE_NAME"
echo "==> First master: $FIRST_MASTER_IP"

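# Wait for first master API server to be ready
echo "==> Waiting for first master API server..."
MASTER_READY=false
for i in {1..60}; do
  if curl -ksf "https://$FIRST_MASTER_IP:6443/healthz" &>/dev/null; then
    echo "==> First master is ready"
    MASTER_READY=true
    break
  fi
  echo "Waiting for first master... ($i/60)"
  sleep 5
done
if [[ "$MASTER_READY" != "true" ]]; then
  echo "ERROR: First master API server did not become ready in time"
  exit 1
fi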

# Add this node to etcd cluster
echo "==> Adding node to etcd cluster..."

# etcd add member command (run on first master via ssh or API)
# For now, assume etcdctl is available and certs are distributed
ETCD_ENDPOINTS="https://$FIRST_MASTER_IP:2379"

etcdctl --endpoints=$ETCD_ENDPOINTS \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  member add "$NODE_NAME" --peer-urls="https://$NODE_IP:2380"

# Start etcd in "existing" mode
cat > /tmp/etcd-join.env <<EOF
ETCD_NAME=$NODE_NAME
ETCD_DATA_DIR=/var/lib/etcd
ETCD_INITIAL_CLUSTER_STATE=existing
# ... (same as bootstrap but with "existing")
EOF

systemctl start etcd

# Start control plane components
systemctl start kube-apiserver
systemctl start kube-controller-manager
systemctl start kube-scheduler
systemctl start kubelet

echo "joined" > "$STATE_DIR/bootstrap-state"
date -Iseconds > "$STATE_DIR/cluster-joined"

echo "==> Additional master joined successfully"
```
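
When a member joins an existing etcd cluster, its `ETCD_INITIAL_CLUSTER` value has to list the members that already exist plus the new one; `etcdctl member add` prints a suitable value on success. If the list is instead derived from configuration, it can be assembled as below. This is a sketch only: `build_initial_cluster` is a hypothetical helper, and the input format (one `name ip` pair per line) is an assumption.

```bash
#!/bin/bash
# Hypothetical helper: read "name ip" pairs on stdin and print them as a
# comma-separated etcd initial-cluster string using the peer port 2380.
build_initial_cluster() {
  awk 'NF == 2 { printf "%s%s=https://%s:2380", sep, $1, $2; sep = "," }
       END     { print "" }'
}

# Example (not executed here): existing member plus the joining node.
#   printf '%s\n' "master-01 $FIRST_MASTER_IP" "$NODE_NAME $NODE_IP" \
#     | build_initial_cluster
```

Lines that do not contain exactly two fields are skipped, so a trailing blank line in the input does not produce a malformed entry.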

### Phase 4: Worker Join

**Runs**: After cluster-detect, if worker node
**Script**: `tools/join-worker.sh`

```bash
#!/bin/bash
# Join worker node to Kubernetes cluster

set -euo pipefail

CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}"
STATE_DIR="/var/lib/cluster-state"
mkdir -p "$STATE_DIR"

# Check if already joined
if [[ -f "$STATE_DIR/bootstrap-state" ]]; then
  STATE=$(cat "$STATE_DIR/bootstrap-state")
  if [[ "$STATE" == "joined" ]]; then
    echo "==> Already joined cluster, skipping"
    exit 0
  fi
fi

NODE_NAME=$(cat "$CONFIG_DIR/node-identity")

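# Get first master IP (same ordering as am-i-first-master.sh: sorted by node name)
FIRST_MASTER_IP=$(yq eval '.nodes[] | select(.roles[] | contains("master") or contains("control-plane")) | .name + " " + .ip' "$CONFIG_DIR/cluster.yaml" | sort | head -1 | awk '{print $2}')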

echo "==> Joining worker node: $NODE_NAME"
echo "==> API server: https://$FIRST_MASTER_IP:6443"

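# Wait for API server (-f makes curl fail on HTTP errors)
API_READY=false
for i in {1..60}; do
  if curl -ksf "https://$FIRST_MASTER_IP:6443/healthz" &>/dev/null; then
    echo "==> API server is ready"
    API_READY=true
    break
  fi
  echo "Waiting for API server... ($i/60)"
  sleep 5
done
if [[ "$API_READY" != "true" ]]; then
  echo "ERROR: API server did not become ready in time"
  exit 1
fi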

# Fetch join token from first master (requires setup)
# Option 1: HTTP endpoint on first master
# Option 2: Shared NFS mount
# Option 3: Embedded in cluster.yaml (less secure)

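# For now, fetch the token over HTTP (assumes a web server on the first master
# serves /var/www/html). -f keeps an HTTP error page from being used as the token.
TOKEN=$(curl -sf "http://$FIRST_MASTER_IP/cluster-join/bootstrap-token" 2>/dev/null || echo "")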

if [[ -z "$TOKEN" ]]; then
  echo "ERROR: Cannot fetch bootstrap token from first master"
  echo "Manual join required. Run on first master:"
  echo "  kubeadm token create --print-join-command"
  exit 1
fi

CA_CERT_HASH=$(openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | \
  openssl rsa -pubin -outform der 2>/dev/null | \
  openssl dgst -sha256 -hex | sed 's/^.* //')

# Join cluster (kubeadm-style, but we're using raw kubelet)
# Create kubelet bootstrap kubeconfig
cat > /etc/kubernetes/bootstrap-kubelet.conf <<EOF
apiVersion: v1
kind: Config
clusters:
- cluster:
    certificate-authority: /etc/kubernetes/pki/ca.crt
    server: https://$FIRST_MASTER_IP:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubelet-bootstrap
  name: kubelet-bootstrap@kubernetes
current-context: kubelet-bootstrap@kubernetes
users:
- name: kubelet-bootstrap
  user:
    token: $TOKEN
EOF

# Start kubelet (will auto-generate client cert via CSR)
systemctl start kubelet

# Auto-approve CSR (on first master, create script to approve)
# kubectl certificate approve <csr-name>

echo "joined" > "$STATE_DIR/bootstrap-state"
date -Iseconds > "$STATE_DIR/cluster-joined"

echo "==> Worker joined successfully"
echo "==> Check status on master: kubectl get nodes"
```

### Phase 5: Ceph Cluster Bootstrap

**Runs**: After Kubernetes is ready, on first ceph-mon node
**Script**: `tools/bootstrap-ceph.sh`

```bash
#!/bin/bash
# Bootstrap Ceph cluster on first monitor node

set -euo pipefail

CONFIG_DIR="${CONFIG_DIR:-/etc/cluster-config}"
STATE_DIR="/var/lib/cluster-state"

# Check if already initialized
if [[ -f "$STATE_DIR/ceph-bootstrapped" ]]; then
  echo "==> Ceph already bootstrapped"
  exit 0
fi

NODE_NAME=$(cat "$CONFIG_DIR/node-identity")
NODE_IP=$(yq eval '.node.ip' "$CONFIG_DIR/current-node.yaml")
CLUSTER_NAME=$(yq eval '.cluster.name' "$CONFIG_DIR/cluster.yaml")

# Check if this is the first ceph-mon node
CEPH_MON_NODES=$(yq eval '.nodes[] | select(.roles[] | contains("ceph-mon")) | .name' "$CONFIG_DIR/cluster.yaml" | sort)
FIRST_MON=$(echo "$CEPH_MON_NODES" | head -1)

if [[ "$NODE_NAME" != "$FIRST_MON" ]]; then
  echo "==> Not first monitor, skipping Ceph bootstrap"
  exit 0
fi

echo "==> Bootstrapping Ceph cluster on first monitor: $NODE_NAME"

# Generate Ceph keys (if not already done)
/usr/local/bin/generate-ceph-keys.sh

FSID=$(cat /etc/ceph/cluster.fsid)

# Create ceph.conf
cat > /etc/ceph/ceph.conf <<EOF
[global]
fsid = $FSID
mon initial members = $FIRST_MON
mon host = $NODE_IP
public network = 192.168.1.0/24
cluster network = 192.168.1.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd pool default size = 3
osd pool default min size = 2
EOF

# Create monitor map
monmaptool --create --add $FIRST_MON $NODE_IP --fsid $FSID /tmp/monmap

# Initialize monitor data directory
mkdir -p /var/lib/ceph/mon/ceph-$FIRST_MON
ceph-mon --mkfs -i $FIRST_MON --monmap /tmp/monmap --keyring /etc/ceph/ceph.mon.keyring

# Start monitor
systemctl start ceph-mon@$FIRST_MON

# Wait for mon to be ready
sleep 10

# Enable manager module
ceph mgr module enable dashboard

echo "==> Ceph cluster bootstrapped"
date -Iseconds > "$STATE_DIR/ceph-bootstrapped"
```

## Bootstrap Coordination Service

**New systemd unit**: `cluster-bootstrap.service`

```ini
[Unit]
Description=Cluster Bootstrap Orchestrator
After=cluster-detect.service network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/local/bin/cluster-bootstrap-orchestrator.sh

[Install]
WantedBy=multi-user.target
```

**Script**: `tools/cluster-bootstrap-orchestrator.sh`

```bash
#!/bin/bash
# Main orchestrator that calls appropriate bootstrap script based on node role

set -euo pipefail

NODE_ROLE=$(/usr/local/bin/am-i-first-master.sh)

case "$NODE_ROLE" in
  master-first)
    echo "==> Detected as first master, bootstrapping cluster..."
    /usr/local/bin/bootstrap-first-master.sh
    ;;
  master-additional)
    echo "==> Detected as additional master, joining cluster..."
    /usr/local/bin/join-additional-master.sh
    ;;
  worker)
    echo "==> Detected as worker, joining cluster..."
    /usr/local/bin/join-worker.sh
    ;;
  *)
    echo "ERROR: Unknown node role: $NODE_ROLE"
    exit 1
    ;;
esac

# If node has ceph-mon role, bootstrap Ceph
ROLES=$(yq eval '.node.roles[]' /etc/cluster-config/current-node.yaml)
if [[ "$ROLES" =~ "ceph-mon" ]]; then
  /usr/local/bin/bootstrap-ceph.sh
fi
```
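
The role checks in these scripts use `=~`, which is a substring match: a role named `ceph-monitoring` would also satisfy a test for `ceph-mon`. An exact, whole-line match is a stricter alternative. This is a sketch; `has_role` is a hypothetical helper that reads one role per line on stdin, as produced by `yq eval '.node.roles[]'`.

```bash
#!/bin/bash
# Hypothetical helper: succeed only if stdin contains a line that is exactly
# the given role. -x matches whole lines, -F disables regex interpretation.
has_role() {
  grep -qxF -- "$1"
}

# Example (not executed here):
#   if yq eval '.node.roles[]' /etc/cluster-config/current-node.yaml | has_role ceph-mon; then
#     /usr/local/bin/bootstrap-ceph.sh
#   fi
```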

## Bootstrap Token Distribution

**Problem**: Workers need the bootstrap token, but only the first master has it, because it generates the token during bootstrap.

**Solutions**:

### Option 1: HTTP Server on First Master
```bash
# On first master, start simple HTTP server
python3 -m http.server 8080 -d /var/www/html/cluster-join &

# On workers
curl http://<first-master>:8080/bootstrap-token
```

### Option 2: Embedded in cluster.yaml (Less Secure)
```yaml
# configs/cluster.yaml
cluster:
  bootstrap:
    token: "abcdef.0123456789abcdef"  # Pre-generated
```

### Option 3: External Secret Store
- First master uploads token to Vault/S3
- Workers fetch using node identity credentials
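
Whichever option is used, a worker can sanity-check what it received before writing it into a kubeconfig. Kubernetes bootstrap tokens have the form `[a-z0-9]{6}.[a-z0-9]{16}`, so a fetched error page or an empty response is easy to reject. This is a sketch; the function name is an assumption.

```bash
#!/bin/bash
# Hypothetical validator for the Kubernetes bootstrap token format:
# a 6-character token ID, a dot, and a 16-character token secret.
is_valid_bootstrap_token() {
  [[ "$1" =~ ^[a-z0-9]{6}\.[a-z0-9]{16}$ ]]
}

# Example (not executed here):
#   is_valid_bootstrap_token "$TOKEN" || { echo "ERROR: malformed token"; exit 1; }
```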

## Failure Recovery

### Bootstrap Failed Midway

```bash
#!/bin/bash
# tools/reset-bootstrap.sh
# Clean up partial bootstrap to retry

systemctl stop kubelet kube-apiserver kube-controller-manager kube-scheduler etcd
rm -rf /var/lib/etcd/*
rm -rf /var/lib/kubelet/*
rm -f /var/lib/cluster-state/bootstrap-state
echo "==> Bootstrap state reset. Reboot to retry."
```

### Additional Master Can't Join

1. Check etcd cluster health on first master
2. Verify certificates are correct (CN, SANs)
3. Check network connectivity (port 2380, 6443)
4. Review etcd logs: `journalctl -u etcd`
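
The connectivity check in step 3 can be done without extra tools by using bash's `/dev/tcp` redirection. This is a sketch: `check_port` is a hypothetical helper, and it assumes a bash build with network redirections enabled and the coreutils `timeout` command.

```bash
#!/bin/bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens
# within 3 seconds. Host and port are passed to the inner shell as $0 and $1.
check_port() {
  local host="$1" port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example (not executed here), run from the joining master:
#   check_port "$FIRST_MASTER_IP" 2380 || echo "etcd peer port unreachable"
#   check_port "$FIRST_MASTER_IP" 6443 || echo "API server port unreachable"
```

A successful connection only shows that the port is reachable. It does not validate TLS certificates, so step 2 still has to be checked separately.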

## Testing Bootstrap

### VM Test Scenario

```bash
# Create 3 VMs: master-01, worker-01, worker-02
# 1. Boot master-01, wait for bootstrap
# 2. Verify: kubectl get nodes (should show master-01)
# 3. Boot worker-01
# 4. Verify: kubectl get nodes (should show master-01, worker-01)
# 5. Boot worker-02
# 6. Verify: kubectl get nodes (should show all 3)
```
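
The verification steps can be automated by counting nodes whose status is exactly `Ready`. This is a sketch: `count_ready_nodes` is a hypothetical helper that parses the output of `kubectl get nodes --no-headers`, where the second column is the node status.

```bash
#!/bin/bash
# Hypothetical helper: count lines on stdin whose second column is "Ready".
# Nodes reported as "NotReady" or "Ready,SchedulingDisabled" are not counted.
count_ready_nodes() {
  awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Example (not executed here): expect all three nodes after the last boot.
#   [[ "$(kubectl get nodes --no-headers | count_ready_nodes)" -eq 3 ]]
```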

## Monitoring Bootstrap Progress

**Script**: `tools/check-bootstrap-status.sh`

```bash
#!/bin/bash
# Check current bootstrap status

STATE_DIR="/var/lib/cluster-state"

if [[ ! -f "$STATE_DIR/bootstrap-state" ]]; then
  echo "Status: UNINITIALIZED"
  exit 0
fi

STATE=$(cat "$STATE_DIR/bootstrap-state")
echo "Bootstrap State: $STATE"

if [[ -f "$STATE_DIR/cluster-joined" ]]; then
  echo "Joined At: $(cat "$STATE_DIR/cluster-joined")"
fi

# Check service status
echo ""
echo "Service Status:"
# --quiet suppresses systemctl's own "active"/"inactive" output
systemctl is-active --quiet etcd 2>/dev/null && echo "  etcd: running" || echo "  etcd: stopped"
systemctl is-active --quiet kube-apiserver 2>/dev/null && echo "  kube-apiserver: running" || echo "  kube-apiserver: stopped"
systemctl is-active --quiet kubelet 2>/dev/null && echo "  kubelet: running" || echo "  kubelet: stopped"

# Check Kubernetes node status
if command -v kubectl &>/dev/null; then
  echo ""
  echo "Kubernetes Nodes:"
  kubectl get nodes 2>/dev/null || echo "  API server not reachable"
fi
```

## Next Steps

1. Implement all bootstrap scripts
2. Test first master initialization
3. Test worker join
4. Test multi-master etcd cluster formation
5. Integrate with ISO builder
6. Add bootstrap progress logging
7. Create troubleshooting guide

---

**P.S.** This is where theory meets reality. Bootstrap is the gnarliest part because you're building the control plane with no control plane. The state machine approach (first-master vs additional-master vs worker) keeps it manageable. Token distribution is still a bit hand-wavy; I recommend starting with the HTTP server approach and upgrading to Vault later.