author: grothedev <grothedev@gmail.com> 2025-10-26 19:10:59 -0400
committer: grothedev <grothedev@gmail.com> 2025-10-26 19:10:59 -0400
commit: 3dadb3aa1920f25a7f6d4b4775a83cabdbd8275b
tree: c27772a438203706fc3c212184268bfbb2ebf6b5 /IMPLEMENTATION.md
first commit. almost all claude. now time to review
Diffstat (limited to 'IMPLEMENTATION.md')
-rw-r--r-- IMPLEMENTATION.md 304
1 files changed, 304 insertions, 0 deletions
diff --git a/IMPLEMENTATION.md b/IMPLEMENTATION.md
new file mode 100644
index 0000000..7985246
--- /dev/null
+++ b/IMPLEMENTATION.md
@@ -0,0 +1,304 @@
+# Implementation Overview
+
+## Summary
+
+This project targets a single bootable ISO. A node booted from it detects its own identity and configures itself as a member of a Kubernetes cluster with integrated distributed services (Ceph, Kafka, MQTT, DNS). All services are managed directly by systemd. The ISO builder itself is still future work (see Next Steps).
+
+## Architecture
+
+### Boot Flow
+
+```
+┌─────────────────────────────────────────────────────────┐
+│  1. System Boots from ISO                               │
+└────────────────┬────────────────────────────────────────┘
+                 │
+                 ▼
+┌─────────────────────────────────────────────────────────┐
+│  2. cluster-detect.service (Very Early Boot)            │
+│     - Runs cluster-detect.sh                            │
+│     - Detects node identity (MAC/IP/hostname)           │
+│     - Creates /etc/cluster-config/current-node.yaml     │
+│     - Writes /etc/cluster-config/node-identity          │
+└────────────────┬────────────────────────────────────────┘
+                 │
+                 ▼
+┌─────────────────────────────────────────────────────────┐
+│  3. Environment File Generation                         │
+│     - Runs generate-environment-files.sh                │
+│     - Creates /etc/cluster-config/environment/*.env     │
+│     - Extracts node IP, cluster settings, etc.          │
+└────────────────┬────────────────────────────────────────┘
+                 │
+                 ▼
+┌─────────────────────────────────────────────────────────┐
+│  4. Role Activation                                     │
+│     - Runs cluster-activate-roles.sh                    │
+│     - Maps roles to systemd targets                     │
+│     - Enables and starts targets                        │
+└────────────────┬────────────────────────────────────────┘
+                 │
+                 ▼
+┌─────────────────────────────────────────────────────────┐
+│  5. Service Startup (Dependency Order)                  │
+│     - containerd.service                                │
+│     - etcd.service (masters only)                       │
+│     - kube-apiserver.service (masters only)             │
+│     - kube-controller-manager.service (masters only)    │
+│     - kube-scheduler.service (masters only)             │
+│     - kubelet.service (all nodes)                       │
+│     - kafka.service (kafka nodes)                       │
+│     - ceph-mon@.service (ceph-mon nodes)                │
+│     - ceph-osd@.service (ceph-osd nodes)                │
+│     - mosquitto.service (mqtt nodes)                    │
+│     - coredns.service (dns nodes)                       │
+└─────────────────────────────────────────────────────────┘
+```
+
+## Components
+
+### Configuration Files (configs/)
+
+#### cluster.yaml
+- Defines the entire cluster topology
+- Lists all nodes with their IPs, hostnames, and roles
+- Specifies the enabled services
+- Sets the network configuration (pod CIDR, service CIDR)
+
+#### services/*.yaml (5 files)
+- kubernetes.yaml - K8s component configuration
+- ceph.yaml - Ceph storage settings
+- kafka.yaml - Kafka broker configuration
+- mqtt.yaml - MQTT broker settings
+- dns.yaml - CoreDNS configuration
+
+#### nodes/*.yaml (5 files)
+- master-01.yaml - Control plane node
+- worker-01.yaml - Worker node
+- worker-02.yaml - Worker + Ceph OSD
+- kafka-01.yaml - Worker + Kafka broker
+- storage-01.yaml - Worker + Ceph mon + OSD
+
+### Systemd Units (systemd/)
+
+#### Services (11 files)
+1. **containerd.service** - Container runtime for Kubernetes
+2. **kubelet.service** - Kubernetes node agent
+3. **kube-apiserver.service** - Kubernetes API server
+4. **kube-controller-manager.service** - K8s controller manager
+5. **kube-scheduler.service** - K8s scheduler
+6. **etcd.service** - Key-value store for K8s
+7. **kafka.service** - Kafka broker (KRaft mode)
+8. **ceph-mon@.service** - Ceph monitor (template)
+9. **ceph-osd@.service** - Ceph OSD (template)
+10. **mosquitto.service** - MQTT broker
+11. **coredns.service** - DNS server
+
+#### Targets (7 files)
+- **kubernetes-master.target** - Pulls in K8s control plane services
+- **kubernetes-worker.target** - Pulls in kubelet
+- **kafka.target** - Pulls in Kafka broker
+- **ceph-mon.target** - Pulls in Ceph monitor
+- **ceph-osd.target** - Pulls in Ceph OSD
+- **mqtt.target** - Pulls in Mosquitto
+- **dns.target** - Pulls in CoreDNS
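+
+A target pulls in its services through dependency directives such as `Wants=`. A minimal sketch of how `kubernetes-master.target` could be written follows; the description and the directive values are assumptions for illustration, not a copy of the unit shipped in `systemd/`.
+
+```ini
+# kubernetes-master.target (hypothetical sketch)
+[Unit]
+Description=Kubernetes control plane role
+# Starting the target also starts every unit listed here.
+Wants=etcd.service kube-apiserver.service
+Wants=kube-controller-manager.service kube-scheduler.service kubelet.service
+
+[Install]
+WantedBy=multi-user.target
+```
+
+`Wants=` only pulls units in; the start order between the services would still come from `After=`/`Before=` in the individual service units.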
+
+#### Special Service
+- **cluster-detect.service** - Runs very early to detect node identity
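+
+One way to run a script this early is a oneshot unit ordered before `sysinit.target`. The fragment below is a sketch under that assumption; the ordering targets and the description are illustrative and may differ from the actual `cluster-detect.service`.
+
+```ini
+# cluster-detect.service (hypothetical sketch)
+[Unit]
+Description=Detect cluster node identity
+# Drop the implicit After=sysinit.target/basic.target ordering so the unit can run in early boot.
+DefaultDependencies=no
+After=local-fs.target
+Before=sysinit.target shutdown.target
+Conflicts=shutdown.target
+
+[Service]
+Type=oneshot
+# Keep the unit "active" after the script exits so later units can order against it.
+RemainAfterExit=yes
+ExecStart=/usr/local/bin/cluster-detect.sh
+
+[Install]
+WantedBy=sysinit.target
+```
+
+If detection depends on an already assigned IP address rather than a MAC address or hostname, the unit would have to run later, after the network is configured.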
+
+### Tools (tools/)
+
+#### Core Scripts (12 files)
+
+**Detection & Activation:**
+1. **cluster-detect.sh** - Node identity detection (MAC/IP/hostname)
+2. **cluster-activate-roles.sh** - Map roles to systemd targets
+3. **generate-environment-files.sh** - Create env files for services
+
+**Service Configuration Generators:**
+4. **kubelet-config-generator.sh** - Generate kubelet config.yaml
+5. **kube-apiserver-config-generator.sh** - Pre-start checks for API server
+6. **etcd-config-generator.sh** - Initialize etcd data directory
+7. **kafka-config-generator.sh** - Generate Kafka server.properties
+8. **ceph-mon-init.sh** - Initialize Ceph monitor
+9. **ceph-osd-init.sh** - Initialize Ceph OSD
+10. **mosquitto-config-generator.sh** - Generate mosquitto.conf
+11. **coredns-config-generator.sh** - Generate CoreDNS Corefile
+
+**Validation:**
+12. **validate-config.py** - Validate cluster configuration before build
+
+## Role-to-Target Mapping
+
+| Role | Systemd Target | Services Started |
+|------|----------------|------------------|
+| master / control-plane | kubernetes-master.target | kubelet, kube-apiserver, kube-controller-manager, kube-scheduler, etcd |
+| worker | kubernetes-worker.target | kubelet |
+| kafka-broker | kafka.target | kafka |
+| ceph-mon | ceph-mon.target | ceph-mon@node |
+| ceph-osd | ceph-osd.target | ceph-osd@X (per device) |
+| mqtt-broker | mqtt.target | mosquitto |
+| dns-server | dns.target | coredns |
+
+## File Locations (On Installed System)
+
+### Configuration
+```
+/etc/cluster-config/
+├── cluster.yaml           # Full cluster topology
+├── current-node.yaml      # Symlink to this node's config
+├── node-identity          # This node's name
+├── services/              # Service configs
+│   ├── kubernetes.yaml
+│   ├── ceph.yaml
+│   ├── kafka.yaml
+│   ├── mqtt.yaml
+│   └── dns.yaml
+├── nodes/                 # All node configs
+│   ├── master-01.yaml
+│   ├── worker-01.yaml
+│   └── ...
+└── environment/           # Generated env files
+    ├── kubelet.env
+    ├── kube-apiserver.env
+    ├── kafka.env
+    └── ...
+```
+
+### Scripts
+```
+/usr/local/bin/
+├── cluster-detect.sh
+├── cluster-activate-roles.sh
+├── generate-environment-files.sh
+├── kubelet-config-generator.sh
+├── kafka-config-generator.sh
+└── ...
+```
+
+### Systemd Units
+```
+/etc/systemd/system/
+├── cluster-detect.service
+├── containerd.service
+├── kubelet.service
+├── kube-apiserver.service
+├── kubernetes-master.target
+├── kafka.service
+└── ...
+```
+
+### Data Directories
+```
+/var/lib/
+├── kubelet/      # Kubelet data and configs
+├── etcd/         # etcd data
+├── kafka/        # Kafka logs and data
+├── ceph/         # Ceph data
+│   ├── mon/
+│   └── osd/
+└── mosquitto/    # MQTT persistence
+```
+
+## Configuration Generation Process
+
+1. **Build time**: The user edits the `configs/` directory
+2. **Validation**: `validate-config.py` checks the cluster configuration before the build
+3. **ISO creation**: All configs are embedded into the ISO (future work)
+4. **First boot**: `cluster-detect.sh` identifies the node
+5. **Environment generation**: `generate-environment-files.sh` creates the `.env` files
+6. **Role activation**: `cluster-activate-roles.sh` maps the node's roles to systemd targets and starts them
+7. **Service startup**: Each service's `ExecStartPre` runs its config generator
+8. **Runtime**: Services read from the generated configs
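+
+The service-startup step relies on two standard systemd directives: `EnvironmentFile=` loads the generated `.env` file, and `ExecStartPre=` runs the config generator before the main process starts. The fragment below sketches this for Kafka. The Kafka install path (`/opt/kafka`) and the location of `server.properties` are assumptions, as are the ordering directives; only the script and env file names come from this document.
+
+```ini
+# kafka.service (hypothetical sketch)
+[Unit]
+Description=Apache Kafka broker (KRaft mode)
+Wants=network-online.target
+After=network-online.target cluster-detect.service
+
+[Service]
+# Values written by generate-environment-files.sh
+EnvironmentFile=/etc/cluster-config/environment/kafka.env
+# Regenerate server.properties before every start
+ExecStartPre=/usr/local/bin/kafka-config-generator.sh
+# Assumed install and config paths
+ExecStart=/opt/kafka/bin/kafka-server-start.sh /etc/kafka/server.properties
+Restart=on-failure
+```
+
+If the `ExecStartPre=` command exits with a non-zero status, systemd does not run `ExecStart=`, so a generator that fails keeps the service from starting with a stale or missing config.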
+
+## Security Considerations
+
+### PKI/Certificates
+- **Kubernetes**: Requires CA, API server, kubelet, etcd certs
+- **Ceph**: Requires cephx authentication keys
+- **MQTT**: Requires a password file and ACLs
+
+**TODO**: Certificate generation not yet implemented
+
+### Service Hardening
+All services use systemd security features:
+- `NoNewPrivileges=true`
+- `ProtectHome=true`
+- `ProtectSystem=strict` or `ProtectSystem=full`, depending on the service
+- `PrivateTmp=true`
+- Limited capabilities (where applicable)
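+
+As an illustration, these options could be combined in a `[Service]` section as follows. This is a sketch for a DNS server that binds port 53, not the contents of the actual `coredns.service`; the capability set is an assumption.
+
+```ini
+# coredns.service hardening (hypothetical sketch)
+[Service]
+NoNewPrivileges=true
+ProtectHome=true
+# strict mounts the whole file system read-only, except /dev, /proc and /sys
+ProtectSystem=strict
+PrivateTmp=true
+# Keep only the capability needed to bind ports below 1024
+CapabilityBoundingSet=CAP_NET_BIND_SERVICE
+```
+
+A service that writes state under `ProtectSystem=strict` also needs its data directory opened explicitly, for example `ReadWritePaths=/var/lib/mosquitto` for the MQTT broker.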
+
+## Next Steps
+
+### Critical Path to Working System
+1. **Certificate/Key Generation**
+   - Script to generate Kubernetes PKI
+   - Script to generate Ceph keys
+   - MQTT password management
+
+2. **Network Configuration**
+   - Static IP assignment
+   - Network interface configuration
+   - Calico CNI installation
+
+3. **Cluster Bootstrapping**
+   - First master initialization
+   - Join tokens for workers
+   - Multi-master etcd cluster setup
+   - Ceph cluster initialization
+
+4. **ISO Builder**
+   - Take configs/ + base OS → bootable ISO
+   - Integrate kickstart/cloud-init
+   - Embed all scripts and systemd units
+
+### Nice to Have
+- Monitoring (Prometheus/Grafana)
+- Logging (Loki/journald)
+- Update mechanism
+- Rollback support
+- Interactive TUI for node selection
+- Web dashboard for cluster status
+
+## Testing Strategy
+
+### Unit Testing
+- Validate each config generator script
+- Test role-to-target mapping
+- Verify YAML parsing
+
+### Integration Testing
+- Boot test in VMs
+- Multi-node cluster formation
+- Service startup ordering
+- Failure recovery
+
+### End-to-End Testing
+- Full cluster deployment
+- Workload deployment
+- Storage provisioning
+- Message broker connectivity
+
+## Known Limitations
+
+1. **Certificate generation not implemented** - Manual PKI setup required
+2. **Single master only** - Multi-master etcd cluster needs work
+3. **No network config** - Assumes static IPs or DHCP reservations
+4. **Ceph bootstrap incomplete** - Mon/OSD initialization stubs only
+5. **No update mechanism** - Fresh install only
+6. **No secrets management** - Passwords and keys are stored in plain text
+
+## Project Statistics
+
+- **Configuration files**: 11 (1 cluster + 5 services + 5 nodes)
+- **Systemd units**: 19 (11 services + 7 targets + 1 cluster-detect)
+- **Scripts**: 12 tools
+- **Total files**: 42+
+- **Lines of code**: ~2,500 (estimated)
+
+## References
+
+- [Kubernetes Documentation](https://kubernetes.io/docs/)
+- [Ceph Documentation](https://docs.ceph.com/)
+- [Kafka Documentation](https://kafka.apache.org/documentation/)
+- [systemd Documentation](https://systemd.io/)
+- [CoreDNS Documentation](https://coredns.io/)