# Configuration Schema Design

## Overview

The configuration system uses YAML files organized in a hierarchical structure. Configurations are split between:

- **Cluster-level config**: Global settings, network topology, service defaults
- **Node-level config**: Per-node settings, roles, and service overrides

## Directory Structure

```
configs/
├── cluster.yaml          # Cluster-wide configuration
├── services/             # Service-specific configurations
│   ├── kubernetes.yaml
│   ├── ceph.yaml
│   ├── kafka.yaml
│   ├── mqtt.yaml
│   └── dns.yaml
└── nodes/                # Per-node configurations
    ├── master-01.yaml
    ├── worker-01.yaml
    ├── kafka-01.yaml
    └── ...
```

## Cluster Configuration (cluster.yaml)

```yaml
cluster:
  name: "production-cluster"
  domain: "cluster.local"

  network:
    pod_cidr: "10.244.0.0/16"
    service_cidr: "10.96.0.0/12"
    dns_servers:
      - "10.96.0.10"

  nodes:
    # List of all nodes in the cluster
    - name: "master-01"
      hostname: "master-01.cluster.local"
      ip: "192.168.1.10"
      roles: ["master", "control-plane"]
    - name: "worker-01"
      hostname: "worker-01.cluster.local"
      ip: "192.168.1.20"
      roles: ["worker"]
    - name: "kafka-01"
      hostname: "kafka-01.cluster.local"
      ip: "192.168.1.30"
      roles: ["worker", "kafka-broker"]
    - name: "ceph-01"
      hostname: "ceph-01.cluster.local"
      ip: "192.168.1.40"
      roles: ["worker", "ceph-osd", "ceph-mon"]

  services:
    # Which services are enabled cluster-wide
    enabled:
      - kubernetes
      - ceph
      - kafka
      - mqtt
      - dns
```

## Node Configuration (nodes/{node-name}.yaml)

```yaml
node:
  name: "master-01"
  roles:
    - "master"
    - "control-plane"

  # Node-specific overrides
  hostname: "master-01.cluster.local"
  ip: "192.168.1.10"

  # Hardware/resource hints
  resources:
    cpu_cores: 8
    memory_gb: 32
    storage_gb: 500

  # Services to run on this node
  services:
    kubernetes:
      enabled: true
      type: "master"
      components:
        - "kube-apiserver"
        - "kube-controller-manager"
        - "kube-scheduler"
        - "etcd"
    ceph:
      enabled: false
    kafka:
      enabled: false
    mqtt:
      enabled: false
    dns:
      enabled: true
      type: "coredns"
```

## Service Configuration (services/kubernetes.yaml)

```yaml
service:
  name: "kubernetes"
  version: "1.28"

  # Service-specific configuration
  config:
    api_server:
      port: 6443
      bind_address: "0.0.0.0"
    kubelet:
      cgroup_driver: "systemd"
      container_runtime: "containerd"
    network_plugin: "calico"
    feature_gates:
      - "EphemeralContainers=true"

  # Systemd unit configuration
  systemd:
    unit_file: "kubelet.service"
    wants:
      - "containerd.service"
    after:
      - "containerd.service"
      - "network-online.target"
```

## Role Definitions

### Predefined Roles

- **master**: Kubernetes control plane node
- **worker**: Kubernetes worker node
- **kafka-broker**: Kafka message broker
- **kafka-controller**: Kafka controller (KRaft mode)
- **ceph-mon**: Ceph monitor daemon
- **ceph-osd**: Ceph object storage daemon
- **ceph-mds**: Ceph metadata server
- **mqtt-broker**: MQTT message broker
- **dns-server**: DNS server

### Custom Roles

Users can define custom roles by creating role definition files in `roles/` directory.

## Configuration Validation Rules

1. Each node must have at least one role
2. At least one node must have the "master" role
3. Service configurations must match enabled services
4. IP addresses must be unique across nodes
5. Node names must be valid DNS names
6. Required service dependencies must be met

## Single-ISO Deployment Model

This system uses a **single bootable ISO** that can be installed on any node in the cluster. Node identity is detected automatically at first boot.

### ISO Contents

The ISO contains configurations for the **entire cluster**:

```
/etc/cluster-config/
├── cluster.yaml          # Full cluster topology (all nodes)
├── services/             # All service configs
│   ├── kubernetes.yaml
│   ├── ceph.yaml
│   ├── kafka.yaml
│   ├── mqtt.yaml
│   └── dns.yaml
└── nodes/                # Configs for every node in cluster
    ├── master-01.yaml
    ├── worker-01.yaml
    ├── kafka-01.yaml
    ├── storage-01.yaml
    └── ...
```

### Boot-time Configuration Resolution (First Boot)

1. **System boots** from the ISO
2. **Very early in boot**: `cluster-detect.service` starts (before other services)
3. **Node detection** (`cluster-detect.sh`):
   - Try to identify node by **MAC address** (compare against `hardware.mac_addresses` in node configs)
   - Fallback to **IP address** detection (if static IP or DHCP reservation)
   - Fallback to **hostname** detection
   - Final fallback: **Interactive prompt** on console asking user to select node identity
4. **Once identified**:
   - Create symlink: `/etc/cluster-config/current-node.yaml` → `/etc/cluster-config/nodes/{detected-node}.yaml`
   - Write `/etc/cluster-config/node-identity` with node name
5. **Role activation** (`cluster-activate-roles.sh`):
   - Read roles from `current-node.yaml`
   - Map roles to systemd targets:
     - `master` → `kubernetes-master.target`
     - `worker` → `kubernetes-worker.target`
     - `kafka-broker` → `kafka.target`
     - `ceph-osd` → `ceph-osd.target`
     - etc.
   - Enable and start appropriate targets
6. **Service startup**:
   - Systemd targets pull in their service units
   - Services read configs from `/etc/cluster-config/services/` and `/etc/cluster-config/current-node.yaml`
   - Services start in dependency order

### Normal Boot (Subsequent Boots)

1. System boots
2. `cluster-detect.service` runs but finds existing `/etc/cluster-config/node-identity`
3. Skips detection, proceeds to activate saved roles
4. Services start normally based on persisted systemd target enablement

## Implementation Status

- ✅ Configuration schema defined
- ✅ Configuration validator tool (`tools/validate-config.py`)
- ✅ Node detection script (`tools/cluster-detect.sh`)
- ✅ Role activation script (`tools/cluster-activate-roles.sh`)
- ✅ Environment file generator (`tools/generate-environment-files.sh`)
- ✅ Systemd service units and targets (19 units total)
- ✅ Service unit files (containerd, kubelet, kube-apiserver, etcd, kafka, ceph, mqtt, coredns)
- ✅ Service configuration generators (8 scripts)
- ⏳ Certificate/key generation (Kubernetes PKI, Ceph keys)
- ⏳ Network configuration on boot
- ⏳ ISO builder tool
- ⏳ Cluster bootstrapping (multi-master, join tokens)

See [IMPLEMENTATION.md](IMPLEMENTATION.md) for complete architecture overview.
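
To make rules 1, 2, 4, and 5 from "Configuration Validation Rules" concrete, here is a minimal Python sketch of how they could be checked. It is an illustration only and is not taken from `tools/validate-config.py`. The function name `validate_nodes` is hypothetical, the input is assumed to be the already-parsed list of node entries from `cluster.yaml`, and "valid DNS name" is assumed to mean a single lowercase RFC 1123 label.

```python
import ipaddress
import re

# Assumption: a node name is one RFC 1123 label (lowercase letters, digits,
# hyphens; no leading or trailing hyphen; at most 63 characters).
DNS_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def validate_nodes(nodes):
    """Return a list of error strings for a list of node dicts (hypothetical helper)."""
    errors = []
    seen_ips = {}  # parsed IP address -> name of the first node that used it

    for node in nodes:
        name = node.get("name", "")
        roles = node.get("roles") or []

        # Rule 5: node names must be valid DNS names.
        if not DNS_LABEL.fullmatch(name):
            errors.append(f"node {name!r}: name is not a valid DNS label")

        # Rule 1: each node must have at least one role.
        if not roles:
            errors.append(f"node {name!r}: no roles assigned")

        # Rule 4: IP addresses must be unique across nodes (and parseable).
        raw_ip = node.get("ip", "")
        try:
            ip = ipaddress.ip_address(raw_ip)
        except ValueError:
            errors.append(f"node {name!r}: invalid IP address {raw_ip!r}")
        else:
            if ip in seen_ips:
                errors.append(
                    f"node {name!r}: IP {ip} already used by {seen_ips[ip]!r}"
                )
            else:
                seen_ips[ip] = name

    # Rule 2: at least one node must have the "master" role.
    if not any("master" in (n.get("roles") or []) for n in nodes):
        errors.append('no node has the "master" role')

    return errors


if __name__ == "__main__":
    example = [
        {"name": "master-01", "ip": "192.168.1.10", "roles": ["master", "control-plane"]},
        {"name": "worker-01", "ip": "192.168.1.10", "roles": []},
    ]
    for message in validate_nodes(example):
        print(message)
```

Collecting every violation in a list, rather than raising on the first one, lets a single run report all problems in a configuration set. Rules 3 and 6 depend on the service files and on dependency metadata this document does not specify, so they are left out of the sketch.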