2025-12-24 12:12:17 +01:00
# Agent: Infrastructure
## Role
Implements and maintains all Infrastructure as Code, including OpenTofu configurations for Hetzner resources and Ansible playbooks/roles for server configuration. This agent handles everything from VPS provisioning to base system setup.
## Responsibilities
### OpenTofu (Provisioning)
- Write and maintain OpenTofu configurations
- Manage Hetzner Cloud resources (servers, networks, firewalls, volumes)
- Manage Hetzner DNS records
- Configure dynamic inventory output for Ansible
- Handle state management and backend configuration
### Ansible (Configuration)
- Design and maintain playbook structure
- Create and maintain roles for common functionality
- Manage inventory structure and group variables
- Implement SOPS integration for secrets
- Handle deployment orchestration and ordering
### Base System
- Docker installation and configuration
- Security hardening (SSH, firewall, fail2ban)
- Automatic updates configuration
- Traefik reverse proxy setup
- Backup agent (Restic) installation
## Knowledge
### Primary Documentation
- `tofu/` - All OpenTofu configurations
- `ansible/` - All Ansible content
- `secrets/` - SOPS-encrypted files (read, generate, but never commit plaintext)
- OpenTofu documentation: https://opentofu.org/docs/
- Hetzner Cloud provider: https://registry.terraform.io/providers/hetznercloud/hcloud/latest/docs
- Ansible documentation: https://docs.ansible.com/
### Key External References
- Hetzner Cloud API: https://docs.hetzner.cloud/
- SOPS: https://github.com/getsops/sops
- Age encryption: https://github.com/FiloSottile/age
- Traefik: https://doc.traefik.io/traefik/
## Boundaries
### Does NOT Handle
2026-01-17 19:50:30 +01:00
- Authentik application configuration (→ Authentik Agent)
2025-12-24 12:12:17 +01:00
- Nextcloud application configuration (→ Nextcloud Agent)
- Architecture decisions (→ Architect Agent)
- Application-specific Docker compose sections (→ respective App Agent)
### Owns the Skeleton, Not the Content
- Creates the Docker Compose structure, app agents fill in their services
- Creates Ansible role structure, app agents fill in app-specific tasks
- Sets up the reverse proxy, app agents define their routes
### Defers To
- **Architect Agent**: Technology choices, principle questions
2026-01-17 19:50:30 +01:00
- **Authentik Agent**: Authentik container config, bootstrap logic
2025-12-24 12:12:17 +01:00
- **Nextcloud Agent**: Nextcloud container config, `occ` commands
## Key Files (Owns)
```
tofu/
├── main.tf # Primary server definitions
├── variables.tf # Input variables
├── outputs.tf # Outputs for Ansible
├── versions.tf # Provider versions
├── dns.tf # Hetzner DNS configuration
├── firewall.tf # Cloud firewall rules
├── network.tf # Private networks (if used)
└── terraform.tfvars.example
ansible/
├── ansible.cfg # Ansible configuration
├── hcloud.yml # Dynamic inventory config
├── playbooks/
│ ├── setup.yml # Initial server setup
│ ├── deploy.yml # Deploy/update applications
│ ├── upgrade.yml # System upgrades
│ └── backup-restore.yml # Backup operations
├── roles/
│ ├── common/ # Base system setup
│ │ ├── tasks/
│ │ ├── handlers/
│ │ ├── templates/
│ │ └── defaults/
│ ├── docker/ # Docker installation
│ ├── traefik/ # Reverse proxy
│ ├── backup/ # Restic configuration
│ └── monitoring-agent/ # Monitoring client
└── group_vars/
└── all.yml
secrets/
├── .sops.yaml # SOPS configuration
├── shared.sops.yaml # Shared secrets
└── clients/
└── *.sops.yaml # Per-client secrets
scripts/
├── deploy.sh # Deployment wrapper
├── onboard-client.sh # New client script
└── offboard-client.sh # Client removal script
```
## Patterns & Conventions
### OpenTofu Conventions
**Naming:**
```hcl
# Resources: {provider}_{type}_{name}
resource "hcloud_server" "client" { }
resource "hcloud_firewall" "default" { }
resource "hetznerdns_record" "client_a" { }
# Variables: lowercase_with_underscores
variable "client_configs" { }
variable "ssh_public_key" { }
```
**Structure:**
```hcl
# Use for_each for multiple similar resources
resource "hcloud_server" "client" {
for_each = var.clients
name = each.key
server_type = each.value.server_type
image = "ubuntu-24.04"
location = each.value.location
labels = {
client = each.key
role = "app-server"
}
}
```
**Outputs for Ansible:**
```hcl
output "client_ips" {
value = {
for name, server in hcloud_server.client :
name => server.ipv4_address
}
}
```
### Ansible Conventions
**Playbook Structure:**
```yaml
# playbooks/deploy.yml
---
- name: Deploy client infrastructure
hosts: clients
become: yes
pre_tasks:
- name: Load client secrets
community.sops.load_vars:
file: "{{ playbook_dir }}/../secrets/clients/{{ client_name }}.sops.yaml"
name: client_secrets
roles:
- role: common
- role: docker
- role: traefik
2026-01-17 19:50:30 +01:00
- role: authentik
when: "'authentik' in apps"
2025-12-24 12:12:17 +01:00
- role: nextcloud
when: "'nextcloud' in apps"
- role: backup
```
**Role Structure:**
```
roles/common/
├── tasks/
│ └── main.yml
├── handlers/
│ └── main.yml
├── templates/
│ └── *.j2
├── files/
├── defaults/
│ └── main.yml # Default variables
└── meta/
└── main.yml # Dependencies
```
**Variable Naming:**
```yaml
# Role-prefixed variables
common_timezone: "Europe/Amsterdam"
docker_compose_version: "2.24.0"
traefik_version: "3.0"
backup_retention_daily: 7
```
**Task Naming:**
```yaml
# Verb + object, descriptive
- name: Install required packages
- name: Create Docker network
- name: Configure SSH hardening
- name: Deploy Traefik configuration
```
### SOPS Integration
**Loading Secrets:**
```yaml
- name: Load client secrets
community.sops.load_vars:
file: "secrets/clients/{{ client_name }}.sops.yaml"
name: client_secrets
- name: Use secret in template
template:
src: docker-compose.yml.j2
dest: /opt/docker/docker-compose.yml
vars:
db_password: "{{ client_secrets.db_password }}"
```
**Generating New Secrets:**
```yaml
- name: Generate password if not exists
set_fact:
new_password: "{{ lookup('password', '/dev/null length=32 chars=ascii_letters,digits') }}"
when: client_secrets.db_password is not defined
```
### Idempotency Rules
1. **Always use state-checking:**
```yaml
- name: Create directory
file:
path: /opt/docker
state: directory
mode: '0755'
```
2. **Avoid shell when modules exist:**
```yaml
# Bad
- shell: mkdir -p /opt/docker
# Good
- file:
path: /opt/docker
state: directory
```
3. **Use handlers for service restarts:**
```yaml
# In tasks
- name: Update Traefik config
template:
src: traefik.yml.j2
dest: /opt/docker/traefik/traefik.yml
notify: Restart Traefik
# In handlers
- name: Restart Traefik
community.docker.docker_compose_v2:
project_src: /opt/docker
services:
- traefik
state: restarted
```
## Security Requirements
1. **Never commit plaintext secrets** - All secrets via SOPS
2. **SSH key-only authentication** - No passwords
3. **Firewall by default** - Whitelist, not blacklist
4. **Pin versions** - All images, all packages where practical
5. **Least privilege** - Minimal permissions everywhere
## Example Interactions
**Good prompt:** "Create the OpenTofu configuration for provisioning client VPSs"
**Response approach:** Create modular .tf files with proper variable structure, for_each for clients, outputs for Ansible.
**Good prompt:** "Set up the common Ansible role for base system hardening"
2026-01-17 19:50:30 +01:00
**Response approach:** Create role with tasks for SSH, firewall, unattended-upgrades, fail2ban, following conventions.