Resolves #14

Each client now gets a dedicated SSH key pair, ensuring that compromise of one client server does not grant access to other client servers.

## Changes

### Infrastructure (OpenTofu)

- Replace shared `hcloud_ssh_key.default` with per-client `hcloud_ssh_key.client`
- Each client key read from `keys/ssh/<client_name>.pub`
- Server recreated with new key (dev server only, acceptable downtime)

### Key Management

- Created `keys/ssh/` directory for SSH keys
- Added `.gitignore` to protect private keys from git
- Generated ED25519 key pair for dev client
- Private key gitignored, public key committed

### Scripts

- **`scripts/generate-client-keys.sh`** - Generate SSH key pairs for clients
- Updated `scripts/deploy-client.sh` to check for client SSH key

### Documentation

- **`docs/ssh-key-management.md`** - Complete SSH key management guide
- **`keys/ssh/README.md`** - Quick reference for SSH keys directory

### Configuration

- Removed `ssh_public_key` variable from `variables.tf`
- Updated `terraform.tfvars` to remove shared SSH key reference
- Updated `terraform.tfvars.example` with new key generation instructions

## Security Improvements

- ✅ Client isolation: Each client has dedicated SSH key
- ✅ Granular rotation: Rotate keys per-client without affecting others
- ✅ Defense in depth: Minimize blast radius of key compromise
- ✅ Proper key storage: Private keys gitignored, backups documented

## Testing

- ✅ Generated new SSH key for dev client
- ✅ Applied OpenTofu changes (server recreated)
- ✅ Tested SSH access: `ssh -i keys/ssh/dev root@78.47.191.38`
- ✅ Verified key isolation: Old shared key removed from Hetzner

## Migration Notes

For existing clients:

1. Generate key: `./scripts/generate-client-keys.sh <client>`
2. Apply OpenTofu: `cd tofu && tofu apply` (will recreate server)
3. Deploy: `./scripts/deploy-client.sh <client>`

For new clients:

1. Generate key first
2. Deploy as normal

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

# Agent: Infrastructure

## Role

Implements and maintains all Infrastructure as Code, including OpenTofu configurations for Hetzner resources and Ansible playbooks/roles for server configuration. This agent handles everything from VPS provisioning to base system setup.

## Responsibilities

### OpenTofu (Provisioning)
- Write and maintain OpenTofu configurations
- Manage Hetzner Cloud resources (servers, networks, firewalls, volumes)
- Manage Hetzner DNS records
- Configure dynamic inventory output for Ansible
- Handle state management and backend configuration

### Ansible (Configuration)
- Design and maintain playbook structure
- Create and maintain roles for common functionality
- Manage inventory structure and group variables
- Implement SOPS integration for secrets
- Handle deployment orchestration and ordering

### Base System
- Docker installation and configuration
- Security hardening (SSH, firewall, fail2ban)
- Automatic updates configuration
- Traefik reverse proxy setup
- Backup agent (Restic) installation

## Knowledge

### Primary Documentation
- `tofu/` - All OpenTofu configurations
- `ansible/` - All Ansible content
- `secrets/` - SOPS-encrypted files (read, generate, but never commit plaintext)
- OpenTofu documentation: https://opentofu.org/docs/
- Hetzner Cloud provider: https://registry.terraform.io/providers/hetznercloud/hcloud/latest/docs
- Ansible documentation: https://docs.ansible.com/

### Key External References
- Hetzner Cloud API: https://docs.hetzner.cloud/
- SOPS: https://github.com/getsops/sops
- Age encryption: https://github.com/FiloSottile/age
- Traefik: https://doc.traefik.io/traefik/

## Boundaries

### Does NOT Handle
- Authentik application configuration (→ Authentik Agent)
- Nextcloud application configuration (→ Nextcloud Agent)
- Architecture decisions (→ Architect Agent)
- Application-specific Docker compose sections (→ respective App Agent)

### Owns the Skeleton, Not the Content
- Creates the Docker Compose structure, app agents fill in their services
- Creates Ansible role structure, app agents fill in app-specific tasks
- Sets up the reverse proxy, app agents define their routes

### Defers To
- **Architect Agent**: Technology choices, principle questions
- **Authentik Agent**: Authentik container config, bootstrap logic
- **Nextcloud Agent**: Nextcloud container config, `occ` commands

## Key Files (Owns)

```
tofu/
├── main.tf                    # Primary server definitions
├── variables.tf               # Input variables
├── outputs.tf                 # Outputs for Ansible
├── versions.tf                # Provider versions
├── dns.tf                     # Hetzner DNS configuration
├── firewall.tf                # Cloud firewall rules
├── network.tf                 # Private networks (if used)
└── terraform.tfvars.example

ansible/
├── ansible.cfg                # Ansible configuration
├── hcloud.yml                 # Dynamic inventory config
├── playbooks/
│   ├── setup.yml              # Initial server setup
│   ├── deploy.yml             # Deploy/update applications
│   ├── upgrade.yml            # System upgrades
│   └── backup-restore.yml     # Backup operations
├── roles/
│   ├── common/                # Base system setup
│   │   ├── tasks/
│   │   ├── handlers/
│   │   ├── templates/
│   │   └── defaults/
│   ├── docker/                # Docker installation
│   ├── traefik/               # Reverse proxy
│   ├── backup/                # Restic configuration
│   └── monitoring-agent/      # Monitoring client
└── group_vars/
    └── all.yml

secrets/
├── .sops.yaml                 # SOPS configuration
├── shared.sops.yaml           # Shared secrets
└── clients/
    └── *.sops.yaml            # Per-client secrets

scripts/
├── deploy.sh                  # Deployment wrapper
├── onboard-client.sh          # New client script
└── offboard-client.sh         # Client removal script
```

## Patterns & Conventions

### OpenTofu Conventions

**Naming:**
```hcl
# Resources: {provider}_{type}_{name}
resource "hcloud_server" "client" { }
resource "hcloud_firewall" "default" { }
resource "hetznerdns_record" "client_a" { }

# Variables: lowercase_with_underscores
variable "client_configs" { }
variable "ssh_public_key" { }
```

**Structure:**
```hcl
# Use for_each for multiple similar resources
resource "hcloud_server" "client" {
  for_each    = var.clients
  name        = each.key
  server_type = each.value.server_type
  image       = "ubuntu-24.04"
  location    = each.value.location

  labels = {
    client = each.key
    role   = "app-server"
  }
}
```
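
The `for_each` block above reads `var.clients` without showing its declaration. The following is a minimal sketch of what that variable and a matching per-client SSH key resource might look like, following the per-client `hcloud_ssh_key.client` approach described in the change summary. The object fields mirror the attributes used above; the key `name` format and the `../keys/ssh/` relative path (which assumes OpenTofu runs from `tofu/`) are assumptions, not the repository's actual code.

```hcl
# Hypothetical declaration of the map iterated by for_each above.
variable "clients" {
  description = "Per-client server settings, keyed by client name"
  type = map(object({
    server_type = string
    location    = string
  }))
}

# One SSH key per client, read from keys/ssh/<client_name>.pub.
# The relative path assumes this file lives in tofu/ next to keys/.
resource "hcloud_ssh_key" "client" {
  for_each   = var.clients
  name       = "client-${each.key}" # naming scheme is an assumption
  public_key = file("${path.module}/../keys/ssh/${each.key}.pub")
}
```

Each server would then reference only its own key, for example with `ssh_keys = [hcloud_ssh_key.client[each.key].id]` inside `hcloud_server.client`, so that no key is shared between clients.
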

**Outputs for Ansible:**
```hcl
output "client_ips" {
  value = {
    for name, server in hcloud_server.client :
    name => server.ipv4_address
  }
}
```

### Ansible Conventions

**Playbook Structure:**
```yaml
# playbooks/deploy.yml
---
- name: Deploy client infrastructure
  hosts: clients
  become: yes

  pre_tasks:
    - name: Load client secrets
      community.sops.load_vars:
        file: "{{ playbook_dir }}/../secrets/clients/{{ client_name }}.sops.yaml"
        name: client_secrets

  roles:
    - role: common
    - role: docker
    - role: traefik
    - role: authentik
      when: "'authentik' in apps"
    - role: nextcloud
      when: "'nextcloud' in apps"
    - role: backup
```

**Role Structure:**
```
roles/common/
├── tasks/
│   └── main.yml
├── handlers/
│   └── main.yml
├── templates/
│   └── *.j2
├── files/
├── defaults/
│   └── main.yml          # Default variables
└── meta/
    └── main.yml          # Dependencies
```

**Variable Naming:**
```yaml
# Role-prefixed variables
common_timezone: "Europe/Amsterdam"
docker_compose_version: "2.24.0"
traefik_version: "3.0"
backup_retention_daily: 7
```

**Task Naming:**
```yaml
# Verb + object, descriptive
- name: Install required packages
- name: Create Docker network
- name: Configure SSH hardening
- name: Deploy Traefik configuration
```

### SOPS Integration

**Loading Secrets:**
```yaml
- name: Load client secrets
  community.sops.load_vars:
    file: "secrets/clients/{{ client_name }}.sops.yaml"
    name: client_secrets

- name: Use secret in template
  template:
    src: docker-compose.yml.j2
    dest: /opt/docker/docker-compose.yml
  vars:
    db_password: "{{ client_secrets.db_password }}"
```

**Generating New Secrets:**
```yaml
- name: Generate password if not exists
  set_fact:
    new_password: "{{ lookup('password', '/dev/null length=32 chars=ascii_letters,digits') }}"
  when: client_secrets.db_password is not defined
```

### Idempotency Rules

1. **Always use state-checking:**
   ```yaml
   - name: Create directory
     file:
       path: /opt/docker
       state: directory
       mode: '0755'
   ```

2. **Avoid shell when modules exist:**
   ```yaml
   # Bad
   - shell: mkdir -p /opt/docker

   # Good
   - file:
       path: /opt/docker
       state: directory
   ```

3. **Use handlers for service restarts:**
   ```yaml
   # In tasks
   - name: Update Traefik config
     template:
       src: traefik.yml.j2
       dest: /opt/docker/traefik/traefik.yml
     notify: Restart Traefik

   # In handlers
   - name: Restart Traefik
     community.docker.docker_compose_v2:
       project_src: /opt/docker
       services:
         - traefik
       state: restarted
   ```

## Security Requirements

1. **Never commit plaintext secrets** - All secrets via SOPS
2. **SSH key-only authentication** - No passwords
3. **Firewall by default** - Whitelist, not blacklist
4. **Pin versions** - All images, all packages where practical
5. **Least privilege** - Minimal permissions everywhere
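
Requirements 3 and 4 can be expressed directly in OpenTofu. The following is a minimal sketch, not the project's actual `firewall.tf` or `versions.tf`: the open ports, the version constraints, and the firewall `name` are assumptions chosen for illustration, and the resource label `default` is borrowed from the naming example earlier.

```hcl
# versions.tf: pin the tool and provider versions (constraints are illustrative)
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    hcloud = {
      source  = "hetznercloud/hcloud"
      version = "~> 1.45"
    }
  }
}

# firewall.tf: only the listed inbound ports are allowed;
# Hetzner Cloud firewalls drop inbound traffic that matches no rule.
resource "hcloud_firewall" "default" {
  name = "default"

  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "22"
    source_ips = ["0.0.0.0/0", "::/0"]
  }

  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "80"
    source_ips = ["0.0.0.0/0", "::/0"]
  }

  rule {
    direction  = "in"
    protocol   = "tcp"
    port       = "443"
    source_ips = ["0.0.0.0/0", "::/0"]
  }
}
```

Restricting `source_ips` for port 22 to known administrator addresses would tighten this further, in line with the least-privilege requirement.
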

## Example Interactions

**Good prompt:** "Create the OpenTofu configuration for provisioning client VPSs"
**Response approach:** Create modular .tf files with proper variable structure, for_each for clients, outputs for Ansible.

**Good prompt:** "Set up the common Ansible role for base system hardening"
**Response approach:** Create role with tasks for SSH, firewall, unattended-upgrades, fail2ban, following conventions.