# Implementation Overview

## Summary

This project is designed to produce a single bootable ISO. A node booted from it detects its own identity and configures itself as a member of a Kubernetes cluster with integrated distributed services (Ceph, Kafka, MQTT, DNS). All services are managed directly by systemd.

## Architecture

### Boot Flow

```
┌─────────────────────────────────────────────────────────┐
│ 1. System Boots from ISO                                │
└────────────────┬────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────┐
│ 2. cluster-detect.service (Very Early Boot)             │
│    - Runs cluster-detect.sh                             │
│    - Detects node identity (MAC/IP/hostname)            │
│    - Creates /etc/cluster-config/current-node.yaml      │
│    - Writes /etc/cluster-config/node-identity           │
└────────────────┬────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────┐
│ 3. Environment File Generation                          │
│    - Runs generate-environment-files.sh                 │
│    - Creates /etc/cluster-config/environment/*.env      │
│    - Extracts node IP, cluster settings, etc.           │
└────────────────┬────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────┐
│ 4. Role Activation                                      │
│    - Runs cluster-activate-roles.sh                     │
│    - Maps roles to systemd targets                      │
│    - Enables and starts targets                         │
└────────────────┬────────────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────────────┐
│ 5. Service Startup (Dependency Order)                   │
│    - containerd.service                                 │
│    - etcd.service (masters only)                        │
│    - kube-apiserver.service (masters only)              │
│    - kube-controller-manager.service (masters only)     │
│    - kube-scheduler.service (masters only)              │
│    - kubelet.service (all nodes)                        │
│    - kafka.service (kafka nodes)                        │
│    - ceph-mon@.service (ceph-mon nodes)                 │
│    - ceph-osd@.service (ceph-osd nodes)                 │
│    - mosquitto.service (mqtt nodes)                     │
│    - coredns.service (dns nodes)                        │
└─────────────────────────────────────────────────────────┘
```
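The identity detection in step 2 can be sketched as a lookup of locally observed identifiers against the node list. This is a minimal illustration, not the logic of `cluster-detect.sh` itself: the field names (`name`, `mac`, `ip`, `hostname`), the MAC-then-IP-then-hostname priority, and the sample addresses are all assumptions.

```python
from typing import Optional

def detect_node(nodes, mac: Optional[str] = None, ip: Optional[str] = None,
                hostname: Optional[str] = None) -> Optional[str]:
    """Return the name of the first node matching MAC, then IP, then hostname."""
    # Assumed priority: a MAC match wins over an IP match, which wins over hostname.
    probes = [("mac", mac), ("ip", ip), ("hostname", hostname)]
    for field, observed in probes:
        if not observed:
            continue  # this identifier was not available on the local machine
        for node in nodes:
            declared = node.get(field)
            # Compare case-insensitively so MAC addresses match in either case.
            if declared and str(declared).lower() == observed.lower():
                return node["name"]
    return None  # no node entry matched any identifier

# Hypothetical node entries; the real schema lives in configs/nodes/*.yaml.
NODES = [
    {"name": "master-01", "mac": "52:54:00:AA:BB:01", "ip": "10.0.0.10", "hostname": "master-01"},
    {"name": "worker-01", "mac": "52:54:00:aa:bb:02", "ip": "10.0.0.20", "hostname": "worker-01"},
]

if __name__ == "__main__":
    print(detect_node(NODES, mac="52:54:00:aa:bb:01"))
```

Returning `None` rather than raising leaves it to the caller to decide what an unidentified node should do, for example halting before any role is activated.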

## Components

### Configuration Files (configs/)

#### cluster.yaml
- Defines entire cluster topology
- Lists all nodes with IPs, hostnames, roles
- Specifies enabled services
- Network configuration (pod CIDR, service CIDR)

#### services/*.yaml (5 files)
- kubernetes.yaml - K8s component configuration
- ceph.yaml - Ceph storage settings
- kafka.yaml - Kafka broker configuration
- mqtt.yaml - MQTT broker settings
- dns.yaml - CoreDNS configuration

#### nodes/*.yaml (5 files)
- master-01.yaml - Control plane node
- worker-01.yaml - Worker node
- worker-02.yaml - Worker + Ceph OSD
- kafka-01.yaml - Worker + Kafka broker
- storage-01.yaml - Worker + Ceph mon + OSD

### Systemd Units (systemd/)

#### Services (11 files)
1. **containerd.service** - Container runtime for Kubernetes
2. **kubelet.service** - Kubernetes node agent
3. **kube-apiserver.service** - Kubernetes API server
4. **kube-controller-manager.service** - K8s controller manager
5. **kube-scheduler.service** - K8s scheduler
6. **etcd.service** - Key-value store for K8s
7. **kafka.service** - Kafka broker (KRaft mode)
8. **ceph-mon@.service** - Ceph monitor (template)
9. **ceph-osd@.service** - Ceph OSD (template)
10. **mosquitto.service** - MQTT broker
11. **coredns.service** - DNS server

#### Targets (7 files)
- **kubernetes-master.target** - Pulls in K8s control plane services
- **kubernetes-worker.target** - Pulls in kubelet
- **kafka.target** - Pulls in Kafka broker
- **ceph-mon.target** - Pulls in Ceph monitor
- **ceph-osd.target** - Pulls in Ceph OSD
- **mqtt.target** - Pulls in Mosquitto
- **dns.target** - Pulls in CoreDNS

#### Special Service
- **cluster-detect.service** - Runs very early to detect node identity

### Tools (tools/)

#### Core Scripts (12 files)

**Detection & Activation:**
1. **cluster-detect.sh** - Node identity detection (MAC/IP/hostname)
2. **cluster-activate-roles.sh** - Map roles to systemd targets
3. **generate-environment-files.sh** - Create env files for services

**Service Configuration Generators:**
4. **kubelet-config-generator.sh** - Generate kubelet config.yaml
5. **kube-apiserver-config-generator.sh** - Pre-start checks for API server
6. **etcd-config-generator.sh** - Initialize etcd data directory
7. **kafka-config-generator.sh** - Generate Kafka server.properties
8. **ceph-mon-init.sh** - Initialize Ceph monitor
9. **ceph-osd-init.sh** - Initialize Ceph OSD
10. **mosquitto-config-generator.sh** - Generate mosquitto.conf
11. **coredns-config-generator.sh** - Generate CoreDNS Corefile

**Validation:**
12. **validate-config.py** - Validate cluster configuration before build
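The kind of checks a pre-build validator might perform can be sketched as follows. This is not the actual `validate-config.py`: it assumes the cluster topology has already been parsed into a dictionary with a `nodes` list whose entries carry `name`, `ip`, and `roles` keys, and the specific rules (unique names and IPs, known roles, at least one master) are illustrative choices.

```python
import ipaddress

# Role names taken from the role-to-target table in this document.
KNOWN_ROLES = {
    "master", "control-plane", "worker", "kafka-broker",
    "ceph-mon", "ceph-osd", "mqtt-broker", "dns-server",
}

def validate_cluster(cluster):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    seen_names, seen_ips = set(), set()
    nodes = cluster.get("nodes", [])
    for node in nodes:
        name = node.get("name", "<unnamed>")
        if name in seen_names:
            errors.append(f"duplicate node name: {name}")
        seen_names.add(name)

        ip = str(node.get("ip", ""))
        try:
            ipaddress.ip_address(ip)  # accepts IPv4 and IPv6 literals
        except ValueError:
            errors.append(f"{name}: invalid IP address {ip!r}")
        else:
            if ip in seen_ips:
                errors.append(f"{name}: duplicate IP address {ip}")
            seen_ips.add(ip)

        for role in node.get("roles", []):
            if role not in KNOWN_ROLES:
                errors.append(f"{name}: unknown role {role!r}")

    # Assumed rule: a cluster needs at least one control plane node.
    has_master = any(
        role in ("master", "control-plane")
        for node in nodes
        for role in node.get("roles", [])
    )
    if not has_master:
        errors.append("no node has the master / control-plane role")
    return errors
```

Collecting every problem into a list, instead of stopping at the first one, lets a user fix the whole configuration in a single pass before building.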

## Role-to-Target Mapping

| Role | Systemd Target | Services Started |
|------|----------------|------------------|
| master / control-plane | kubernetes-master.target | kubelet, kube-apiserver, kube-controller-manager, kube-scheduler, etcd |
| worker | kubernetes-worker.target | kubelet |
| kafka-broker | kafka.target | kafka |
| ceph-mon | ceph-mon.target | ceph-mon@node |
| ceph-osd | ceph-osd.target | ceph-osd@X (per device) |
| mqtt-broker | mqtt.target | mosquitto |
| dns-server | dns.target | coredns |
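The table above can be expressed as a small lookup. The sketch below mirrors the mapping in Python for illustration only; the project's own implementation is the shell script `cluster-activate-roles.sh`, and the function names here are hypothetical.

```python
# Role-to-target mapping copied from the table above.
ROLE_TARGETS = {
    "master": "kubernetes-master.target",
    "control-plane": "kubernetes-master.target",
    "worker": "kubernetes-worker.target",
    "kafka-broker": "kafka.target",
    "ceph-mon": "ceph-mon.target",
    "ceph-osd": "ceph-osd.target",
    "mqtt-broker": "mqtt.target",
    "dns-server": "dns.target",
}

def targets_for_roles(roles):
    """Map role names to systemd targets, preserving order and dropping duplicates."""
    targets = []
    for role in roles:
        if role not in ROLE_TARGETS:
            raise ValueError(f"unknown role: {role!r}")
        target = ROLE_TARGETS[role]
        if target not in targets:  # "master" and "control-plane" share one target
            targets.append(target)
    return targets

def activation_commands(roles):
    """Build the systemctl invocations that would enable and start each target."""
    # The commands are only constructed here, not executed.
    return [["systemctl", "enable", "--now", t] for t in targets_for_roles(roles)]

if __name__ == "__main__":
    for cmd in activation_commands(["worker", "ceph-mon", "ceph-osd"]):
        print(" ".join(cmd))
```

Rejecting unknown roles up front keeps a typo in a node file from silently producing a node with no services.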

## File Locations (On Installed System)

### Configuration
```
/etc/cluster-config/
├── cluster.yaml                    # Full cluster topology
├── current-node.yaml               # Symlink to this node's config
├── node-identity                   # This node's name
├── services/                       # Service configs
│   ├── kubernetes.yaml
│   ├── ceph.yaml
│   ├── kafka.yaml
│   ├── mqtt.yaml
│   └── dns.yaml
├── nodes/                          # All node configs
│   ├── master-01.yaml
│   ├── worker-01.yaml
│   └── ...
└── environment/                    # Generated env files
    ├── kubelet.env
    ├── kube-apiserver.env
    ├── kafka.env
    └── ...
```

### Scripts
```
/usr/local/bin/
├── cluster-detect.sh
├── cluster-activate-roles.sh
├── generate-environment-files.sh
├── kubelet-config-generator.sh
├── kafka-config-generator.sh
└── ...
```

### Systemd Units
```
/etc/systemd/system/
├── cluster-detect.service
├── containerd.service
├── kubelet.service
├── kube-apiserver.service
├── kubernetes-master.target
├── kafka.service
└── ...
```

### Data Directories
```
/var/lib/
├── kubelet/                        # Kubelet data and configs
├── etcd/                          # etcd data
├── kafka/                         # Kafka logs and data
├── ceph/                          # Ceph data
│   ├── mon/
│   └── osd/
└── mosquitto/                     # MQTT persistence
```

## Configuration Generation Process

1. **Build time**: The user edits the files under `configs/`
2. **Validation**: `validate-config.py` checks the configuration before the build
3. **ISO creation**: All configs are embedded into the ISO (future work)
4. **First boot**: `cluster-detect.sh` identifies the node
5. **Environment generation**: `generate-environment-files.sh` creates the `.env` files
6. **Service startup**: Each service's `ExecStartPre` runs its config generator
7. **Runtime**: Services read the generated configs
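The environment generation step can be illustrated with a small renderer that turns node and cluster settings into `KEY=value` lines of the kind a systemd `EnvironmentFile=` would read. This is a sketch under assumptions: the variable names (`NODE_NAME`, `NODE_IP`, `POD_CIDR`, `SERVICE_CIDR`), the input keys, and the sample values are hypothetical and are not taken from `generate-environment-files.sh`.

```python
import re

# Conservative variable-name pattern: letters, digits, underscore, no leading digit.
_KEY_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def render_env(values):
    """Render a mapping as sorted KEY=value lines, one per variable."""
    lines = []
    for key in sorted(values):
        if not _KEY_RE.fullmatch(key):
            raise ValueError(f"invalid variable name: {key!r}")
        value = str(values[key])
        if "\n" in value:
            raise ValueError(f"{key}: value must be a single line")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"

def kubelet_env(node, cluster):
    """Pick the settings a kubelet env file might need (hypothetical key names)."""
    return {
        "NODE_NAME": node["name"],
        "NODE_IP": node["ip"],
        "POD_CIDR": cluster["pod_cidr"],
        "SERVICE_CIDR": cluster["service_cidr"],
    }

if __name__ == "__main__":
    node = {"name": "worker-01", "ip": "10.0.0.20"}
    cluster = {"pod_cidr": "10.244.0.0/16", "service_cidr": "10.96.0.0/12"}
    print(render_env(kubelet_env(node, cluster)), end="")
```

Sorting the keys makes the generated file deterministic, so regenerating it on every boot does not produce spurious differences.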

## Security Considerations

### PKI/Certificates
- **Kubernetes**: Requires CA, API server, kubelet, etcd certs
- **Ceph**: Requires cephx authentication keys
- **MQTT**: Password file and ACLs

**TODO**: Certificate generation not yet implemented

### Service Hardening
All services use systemd security features:
- `NoNewPrivileges=true`
- `ProtectHome=true`
- `ProtectSystem=strict` or `ProtectSystem=full`
- `PrivateTmp=true`
- Limited capabilities (where applicable)
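One way to check that a unit actually carries these directives is to parse its `[Service]` section and report what is missing. The sketch below is an illustrative audit helper, not part of the project; it relies on Python's `configparser`, which handles simple unit files but is not a full systemd parser, and the function name `missing_hardening` is hypothetical.

```python
import configparser

# Values systemd accepts as boolean "true".
_TRUE = {"true", "yes", "1", "on"}

# Directive -> accepted values, following the list above.
REQUIRED = {
    "NoNewPrivileges": _TRUE,
    "ProtectHome": _TRUE,
    "ProtectSystem": {"strict", "full"},
    "PrivateTmp": _TRUE,
}

def missing_hardening(unit_text):
    """Return the hardening directives that are absent or set to another value."""
    parser = configparser.ConfigParser(
        strict=False,        # tolerate repeated keys such as multiple ExecStartPre=
        interpolation=None,  # keep '%' specifiers like %i literal
        delimiters=("=",),
    )
    parser.optionxform = str  # systemd directive names are case-sensitive
    parser.read_string(unit_text)
    service = parser["Service"] if parser.has_section("Service") else {}
    problems = []
    for key, allowed in REQUIRED.items():
        value = service.get(key)
        if value is None or value.strip().lower() not in allowed:
            problems.append(key)
    return problems

# A made-up unit used only to exercise the helper.
EXAMPLE_UNIT = """\
[Unit]
Description=demo

[Service]
ExecStart=/usr/bin/true
NoNewPrivileges=true
ProtectHome=true
ProtectSystem=strict
PrivateTmp=true
"""

if __name__ == "__main__":
    print(missing_hardening(EXAMPLE_UNIT))
```

A check like this could run alongside configuration validation so that a newly added unit without sandboxing is caught before the image is built.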

## Next Steps

### Critical Path to Working System
1. **Certificate/Key Generation**
   - Script to generate Kubernetes PKI
   - Script to generate Ceph keys
   - MQTT password management

2. **Network Configuration**
   - Static IP assignment
   - Network interface configuration
   - Calico CNI installation

3. **Cluster Bootstrapping**
   - First master initialization
   - Join tokens for workers
   - Multi-master etcd cluster setup
   - Ceph cluster initialization

4. **ISO Builder**
   - Take configs/ + base OS → bootable ISO
   - Integrate kickstart/cloud-init
   - Embed all scripts and systemd units

### Nice to Have
- Monitoring (Prometheus/Grafana)
- Logging (Loki/journald)
- Update mechanism
- Rollback support
- Interactive TUI for node selection
- Web dashboard for cluster status

## Testing Strategy

### Unit Testing
- Validate each config generator script
- Test role-to-target mapping
- Verify YAML parsing

### Integration Testing
- Boot test in VMs
- Multi-node cluster formation
- Service startup ordering
- Failure recovery

### End-to-End Testing
- Full cluster deployment
- Workload deployment
- Storage provisioning
- Message broker connectivity

## Known Limitations

1. **Certificate generation not implemented** - Manual PKI setup required
2. **Single master only** - Multi-master etcd cluster needs work
3. **No network config** - Assumes static IPs or DHCP reservations
4. **Ceph bootstrap incomplete** - Mon/OSD initialization stubs only
5. **No update mechanism** - Fresh install only
6. **No secrets management** - Passwords and keys in plain text

## Project Statistics

- **Configuration files**: 11 (1 cluster + 5 services + 5 nodes)
- **Systemd units**: 19 (11 services + 7 targets + 1 cluster-detect)
- **Scripts**: 12 tools
- **Total files**: 42+
- **Lines of code**: ~2,500 (estimated)

## References

- [Kubernetes Documentation](https://kubernetes.io/docs/)
- [Ceph Documentation](https://docs.ceph.com/)
- [Kafka Documentation](https://kafka.apache.org/documentation/)
- [systemd Documentation](https://systemd.io/)
- [CoreDNS Documentation](https://coredns.io/)