Commit graph

15 commits

Author SHA1 Message Date
Pieter
b6c9fa666d chore: Post-workshop state - January 23rd, 2026
This commit captures the infrastructure state immediately following
the "Post-Tyranny Tech" workshop on January 23rd, 2026.

Infrastructure Status:
- 13 client servers deployed (white, valk, zwaan, specht, das, uil, vos,
  haas, wolf, ree, mees, mus, mol, kikker)
- Services: Authentik SSO, Nextcloud, Collabora Office, Traefik
- Private network architecture with edge NAT gateway
- OIDC integration between Authentik and Nextcloud
- Automated recovery flows and invitation system
- Container update monitoring with Diun
- Uptime monitoring with Uptime Kuma

Changes include:
- Multiple new client host configurations
- Network architecture improvements (private IPs + NAT)
- DNS management automation
- Container update notifications
- Email configuration via Mailgun
- SSH key generation for all clients
- Encrypted secrets for all deployments
- Health check and diagnostic scripts

Known Issues to Address:
- Nextcloud version pinned to v30 (should use 'latest' or v32)
- Zitadel references in templates (migrated to Authentik but templates not updated)
- Traefik dynamic config has obsolete static routes

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-23 20:36:31 +01:00
Pieter
55fd2be9e5 feat: Add DNS configuration and Docker improvements
Common role improvements:
- Add systemd-resolved DNS configuration (Google + Cloudflare)
- Ensures reliable DNS resolution for private network servers
- Flush handlers immediately to apply DNS before other tasks

Docker role improvements:
- Enhanced Docker daemon configuration
- Better support for private network deployments

Scripts:
- Update add-client-to-terraform.sh for new architecture

These changes ensure private network clients can resolve DNS and
access internet via NAT gateway.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-20 19:06:32 +01:00
Pieter
9a3afa325b feat: Configure status.vrije.cloud and auto-monitor integration
Updates to Uptime Kuma monitoring setup:

DNS Configuration:
- Added DNS A record for status.vrije.cloud -> 94.130.231.155
- Updated Uptime Kuma container to use status.vrije.cloud domain
- HTTPS access via nginx-proxy with Let's Encrypt SSL

Automated Monitor Management:
- Created scripts/add-client-to-monitoring.sh
- Created scripts/remove-client-from-monitoring.sh
- Integrated monitoring into deploy-client.sh (step 5/5)
- Integrated monitoring into destroy-client.sh (step 0/7)
- Deployment now prompts to add monitors after success
- Destruction now prompts to remove monitors before deletion

Email Notification Setup:
- Created docs/uptime-kuma-email-setup.md with complete guide
- SMTP configuration using smtp.strato.com
- Credentials: server@postxsociety.org
- Alerts sent to mail@postxsociety.org

Documentation:
- Updated docs/monitoring.md with new domain
- Added email setup reference
- Replaced all URLs to use status.vrije.cloud

Benefits:
 Friendly domain instead of IP address
 HTTPS access with auto-SSL
 Automated monitoring reminders on deploy/destroy
 Complete email notification guide
 Streamlined workflow for monitor management

Note: Monitor creation/deletion currently manual (API automation planned)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-18 18:55:33 +01:00
Pieter
e04efa1cb1 feat: Move Hetzner API token to SOPS encrypted secrets
Resolves #20

Changes:
- Add hcloud_token to secrets/shared.sops.yaml (encrypted with Age)
- Create scripts/load-secrets-env.sh to automatically load token from SOPS
- Update all management scripts to auto-load token if not set
- Remove plaintext tokens from tofu/terraform.tfvars
- Update documentation in README.md, scripts/README.md, and SECURITY-NOTE-tokens.md

Benefits:
 Token encrypted at rest
 Can be safely backed up to cloud storage
 Consistent with other secrets management
 Automatic loading - no manual token management needed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-18 18:17:15 +01:00
Pieter
f795920f24 🚀 GREEN CLIENT DEPLOYMENT + CRITICAL SECURITY FIXES
═══════════════════════════════════════════════════════════════
 COMPLETED: Green Client Deployment (green.vrije.cloud)
═══════════════════════════════════════════════════════════════

Services deployed and operational:
- Traefik (reverse proxy with SSL)
- Authentik SSO (auth.green.vrije.cloud)
- Nextcloud (nextcloud.green.vrije.cloud)
- Collabora Office (online document editing)
- PostgreSQL databases (Authentik + Nextcloud)
- Redis (caching + file locking)

═══════════════════════════════════════════════════════════════
🔐 CRITICAL SECURITY FIX: Unique Passwords Per Client
═══════════════════════════════════════════════════════════════

PROBLEM FIXED:
All clients were using IDENTICAL passwords from template (critical vulnerability).
If one server compromised, all servers compromised.

SOLUTION IMPLEMENTED:
 Auto-generate unique passwords per client
 Store securely in SOPS-encrypted files
 Easy retrieval with get-passwords.sh script

NEW SCRIPTS:
- scripts/generate-passwords.sh - Auto-generate unique 43-char passwords
- scripts/get-passwords.sh      - Retrieve client credentials from SOPS

UPDATED SCRIPTS:
- scripts/deploy-client.sh - Now auto-calls password generator

PASSWORD CHANGES:
- dev.sops.yaml   - Regenerated with unique passwords
- green.sops.yaml - Created with unique passwords

SECURITY PROPERTIES:
- 43-character passwords (258 bits entropy)
- Cryptographically secure (openssl rand -base64 32)
- Unique across all clients
- Stored encrypted with SOPS + age

═══════════════════════════════════════════════════════════════
🛠️  BUG FIX: Nextcloud Volume Mounting
═══════════════════════════════════════════════════════════════

PROBLEM FIXED:
Volume detection was looking for "nextcloud-data-{client}" in device ID,
but Hetzner volumes use numeric IDs (scsi-0HC_Volume_104429514).

SOLUTION:
Simplified detection to find first Hetzner volume (works for all clients):
  ls -1 /dev/disk/by-id/scsi-0HC_Volume_* | head -1

FIXED FILE:
- ansible/roles/nextcloud/tasks/mount-volume.yml:15

═══════════════════════════════════════════════════════════════
🐛 BUG FIX: Authentik Invitation Task Safety
═══════════════════════════════════════════════════════════════

PROBLEM FIXED:
invitation.yml task crashed when accessing undefined variable attribute
(enrollment_blueprint_result.rc when API not ready).

SOLUTION:
Added safety checks before accessing variable attributes:
  {{ 'In Progress' if (var is defined and var.rc is defined) else 'Complete' }}

FIXED FILE:
- ansible/roles/authentik/tasks/invitation.yml:91

═══════════════════════════════════════════════════════════════
📝 OTHER CHANGES
═══════════════════════════════════════════════════════════════

GITIGNORE:
- Added *.md (except README.md) to exclude deployment reports

GREEN CLIENT FILES:
- keys/ssh/green.pub - SSH public key for green server
- secrets/clients/green.sops.yaml - Encrypted secrets with unique passwords

═══════════════════════════════════════════════════════════════
 IMPACT: All Future Deployments Now Secure & Reliable
═══════════════════════════════════════════════════════════════

FUTURE DEPLOYMENTS:
-  Automatically get unique passwords
-  Volume mounting works reliably
-  Ansible tasks handle API delays gracefully
-  No manual intervention required

DEPLOYMENT TIME: ~15 minutes (fully automated)
AUTOMATION RATE: 95%

═══════════════════════════════════════════════════════════════

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-18 17:06:04 +01:00
Pieter
62977285ad feat: Automate OpenTofu terraform.tfvars management
Add automation to streamline client onboarding by managing terraform.tfvars:

New Script:
- scripts/add-client-to-terraform.sh: Add clients to OpenTofu config
  - Interactive and non-interactive modes
  - Configurable server type, location, volume size
  - Validates client names
  - Detects existing entries
  - Shows configuration preview before applying
  - Clear next-steps guidance

Updated Scripts:
- scripts/deploy-client.sh: Check for terraform.tfvars entry
  - Detects missing clients
  - Prompts to add automatically
  - Calls add-client-to-terraform.sh if user confirms
  - Fails gracefully with instructions if declined

- scripts/rebuild-client.sh: Validate terraform.tfvars
  - Ensures client exists before rebuild
  - Clear error if missing
  - Directs to deploy-client.sh for new clients

Benefits:
 Eliminates manual terraform.tfvars editing
 Reduces human error in configuration
 Consistent client configuration structure
 Guided workflow with clear prompts
 Validation prevents common mistakes

Test Results (blue client):
-  SSH key auto-generation (working)
-  Secrets template creation (working)
-  Terraform.tfvars automation (working)
- ⏸️ Full deployment test (in progress)

Usage:
```bash
# Standalone
./scripts/add-client-to-terraform.sh myclient

# With options
./scripts/add-client-to-terraform.sh myclient \
  --server-type=cx22 \
  --location=fsn1 \
  --volume-size=100

# Non-interactive (for scripts)
./scripts/add-client-to-terraform.sh myclient \
  --volume-size=50 \
  --non-interactive

# Integrated (automatic prompt)
./scripts/deploy-client.sh myclient
# → Detects missing terraform.tfvars entry
# → Offers to add automatically
```

This increases deployment automation from ~60% to ~85%,
leaving only security-sensitive steps (secrets editing, infrastructure approval) as manual.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-17 21:34:05 +01:00
Pieter
9eb6f2028a feat: Use Hetzner Volumes for Nextcloud data storage (issue #18)
Implement persistent block storage for Nextcloud user data, separating application and data layers:

OpenTofu Changes:
- tofu/volumes.tf: Create and attach Hetzner Volumes per client
  - Configurable size per client (default 100 GB for dev)
  - ext4 formatted, attached but not auto-mounted
- tofu/variables.tf: Add nextcloud_volume_size to client config
- tofu/terraform.tfvars: Set volume size for dev client (100 GB ~€5.40/mo)

Ansible Changes:
- ansible/roles/nextcloud/tasks/mount-volume.yml: New mount tasks
  - Detect volume device automatically
  - Format if needed, mount at /mnt/nextcloud-data
  - Add to fstab for persistence
  - Set correct permissions for www-data
- ansible/roles/nextcloud/tasks/main.yml: Include volume mounting
- ansible/roles/nextcloud/templates/docker-compose.nextcloud.yml.j2:
  - Use host mount /mnt/nextcloud-data/data instead of Docker volume
  - Keep app code in Docker volume (nextcloud-app)
  - User data now on Hetzner Volume

Scripts:
- scripts/resize-client-volume.sh: Online volume resizing
  - Resize via Hetzner API
  - Expand filesystem automatically
  - Show cost impact
  - Verify new size

Documentation:
- docs/storage-architecture.md: Complete storage guide
  - Architecture diagrams
  - Volume specifications
  - Sizing guidelines
  - Operations procedures
  - Performance considerations
  - Troubleshooting guide

- docs/volume-migration.md: Step-by-step migration
  - Safe migration from Docker volumes
  - Rollback procedures
  - Verification checklist
  - Timeline estimates

Benefits:
 Data independent from server instance
 Resize storage without rebuilding server
 Easy data migration between servers
 Better separation of concerns (app vs data)
 Simplified backup strategy
 Cost-optimized (pay for what you use)

Volume Pricing:
- 50 GB: ~€2.70/month
- 100 GB: ~€5.40/month
- 250 GB: ~€13.50/month
- Resizable online, no downtime

Note: Existing clients require manual migration
Follow docs/volume-migration.md for safe migration procedure

Closes #18

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-17 21:07:48 +01:00
Pieter
0c4d536246 feat: Add version tracking and maintenance monitoring (issue #15)
Complete implementation of automatic version tracking and drift detection:

New Scripts:
- scripts/collect-client-versions.sh: Query deployed versions from Docker
  - Connects via Ansible to running servers
  - Extracts versions from container images
  - Updates registry automatically

- scripts/check-client-versions.sh: Compare versions across clients
  - Multiple formats: table (colorized), CSV, JSON
  - Filter by outdated versions
  - Highlights drift with color coding

- scripts/detect-version-drift.sh: Identify version differences
  - Detects clients with outdated versions
  - Threshold-based staleness detection (default 30 days)
  - Actionable recommendations
  - Exit code 1 if drift detected (CI/monitoring friendly)

Updated Scripts:
- scripts/deploy-client.sh: Auto-collect versions after deployment
- scripts/rebuild-client.sh: Auto-collect versions after rebuild

Documentation:
- docs/maintenance-tracking.md: Complete maintenance guide
  - Version management workflows
  - Security update procedures
  - Monitoring integration examples
  - Troubleshooting guide

Features:
 Automatic version collection from deployed servers
 Multi-client version comparison reports
 Version drift detection with recommendations
 Integration with deployment workflows
 Export to CSV/JSON for external tools
 Canary-first update workflow support

Usage Examples:
```bash
# Collect versions
./scripts/collect-client-versions.sh dev

# Compare all clients
./scripts/check-client-versions.sh

# Detect drift
./scripts/detect-version-drift.sh

# Export for monitoring
./scripts/check-client-versions.sh --format=json
```

Closes #15

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-17 20:53:15 +01:00
Pieter
bf4659f662 feat: Implement client registry system (issue #12)
Add comprehensive client registry for tracking all deployed infrastructure:

Registry System:
- Single source of truth in clients/registry.yml
- Tracks status, server specs, versions, maintenance history
- Supports canary deployment workflow
- Automatic updates via deployment scripts

New Scripts:
- scripts/list-clients.sh: List/filter clients (table/json/csv/summary)
- scripts/client-status.sh: Detailed client info with health checks
- scripts/update-registry.sh: Manual registry updates

Updated Scripts:
- scripts/deploy-client.sh: Auto-updates registry on deploy
- scripts/rebuild-client.sh: Auto-updates registry on rebuild
- scripts/destroy-client.sh: Marks clients as destroyed

Documentation:
- docs/client-registry.md: Complete registry reference
- clients/README.md: Quick start guide

Status tracking: pending → deployed → maintenance → destroyed
Role support: canary (dev) and production clients

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-17 20:24:53 +01:00
Pieter
ac4187d041 feat: Automate SSH key and secrets generation in deployment scripts
Simplify client deployment workflow by automating SSH key generation and
secrets file creation. No more manual preparation steps!

## Changes

### Deploy Script Automation
**`scripts/deploy-client.sh`**:
- Auto-generates SSH key pair if missing (calls generate-client-keys.sh)
- Auto-creates secrets file from template if missing
- Opens SOPS editor for user to customize secrets
- Continues with deployment after setup complete

### Rebuild Script Automation
**`scripts/rebuild-client.sh`**:
- Same automation as deploy script
- Ensures SSH key and secrets exist before rebuild

### Documentation Updates
- **`README.md`** - Updated quick start workflow
- **`scripts/README.md`** - Updated script descriptions and examples

## Workflow: Before vs After

### Before (Manual)
```bash
# 1. Generate SSH key
./scripts/generate-client-keys.sh newclient

# 2. Create secrets file
cp secrets/clients/template.sops.yaml secrets/clients/newclient.sops.yaml
sops secrets/clients/newclient.sops.yaml

# 3. Add to terraform.tfvars
vim tofu/terraform.tfvars

# 4. Deploy
./scripts/deploy-client.sh newclient
```

### After (Automated)
```bash
# 1. Add to terraform.tfvars
vim tofu/terraform.tfvars

# 2. Deploy (everything else is automatic!)
./scripts/deploy-client.sh newclient
# Script automatically:
# - Generates SSH key if missing
# - Creates secrets file from template if missing
# - Opens editor for you to customize
# - Continues with deployment
```

## Benefits

 **Fewer manual steps**: 4 steps → 2 steps
 **Less error-prone**: Can't forget to generate SSH key
 **Better UX**: Script guides you through setup
 **Still flexible**: Can pre-create SSH key/secrets if desired
 **Idempotent**: Won't regenerate if already exists

## Backward Compatible

Existing workflows still work:
- If SSH key already exists, script uses it
- If secrets file already exists, script uses it
- Can still use generate-client-keys.sh manually if preferred

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-17 20:04:29 +01:00
Pieter
071ed083f7 feat: Implement per-client SSH key isolation
Resolves #14

Each client now gets a dedicated SSH key pair, ensuring that compromise
of one client server does not grant access to other client servers.

## Changes

### Infrastructure (OpenTofu)
- Replace shared `hcloud_ssh_key.default` with per-client `hcloud_ssh_key.client`
- Each client key read from `keys/ssh/<client_name>.pub`
- Server recreated with new key (dev server only, acceptable downtime)

### Key Management
- Created `keys/ssh/` directory for SSH keys
- Added `.gitignore` to protect private keys from git
- Generated ED25519 key pair for dev client
- Private key gitignored, public key committed

### Scripts
- **`scripts/generate-client-keys.sh`** - Generate SSH key pairs for clients
- Updated `scripts/deploy-client.sh` to check for client SSH key

### Documentation
- **`docs/ssh-key-management.md`** - Complete SSH key management guide
- **`keys/ssh/README.md`** - Quick reference for SSH keys directory

### Configuration
- Removed `ssh_public_key` variable from `variables.tf`
- Updated `terraform.tfvars` to remove shared SSH key reference
- Updated `terraform.tfvars.example` with new key generation instructions

## Security Improvements

 Client isolation: Each client has dedicated SSH key
 Granular rotation: Rotate keys per-client without affecting others
 Defense in depth: Minimize blast radius of key compromise
 Proper key storage: Private keys gitignored, backups documented

## Testing

-  Generated new SSH key for dev client
-  Applied OpenTofu changes (server recreated)
-  Tested SSH access: `ssh -i keys/ssh/dev root@78.47.191.38`
-  Verified key isolation: Old shared key removed from Hetzner

## Migration Notes

For existing clients:
1. Generate key: `./scripts/generate-client-keys.sh <client>`
2. Apply OpenTofu: `cd tofu && tofu apply` (will recreate server)
3. Deploy: `./scripts/deploy-client.sh <client>`

For new clients:
1. Generate key first
2. Deploy as normal

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-17 19:50:30 +01:00
Pieter
9571782382 fix: Restore Mailgun SMTP and Nextcloud OIDC integration
Fixes three critical regressions from previous deployment:

1. **Mailgun SMTP Credentials**
   - Added mailgun_api_key to secrets/shared.sops.yaml
   - Updated deploy.yml to load and merge shared secrets
   - Mailgun credentials now created automatically per client

2. **Nextcloud OIDC Integration**
   - OIDC provider creation now works (was timing issue)
   - "Login with Authentik" button restored on Nextcloud login

3. **Infrastructure Deployment**
   - Fixed deploy-client.sh to create full infrastructure (DNS + server)
   - Removed -target flag that caused incomplete deployments

Changes:
- ansible/playbooks/deploy.yml: Load shared secrets and merge into client_secrets
- secrets/shared.sops.yaml: Add Mailgun API key for all clients
- secrets/clients/dev.sops.yaml: Add dev client configuration
- scripts/deploy-client.sh: Apply full infrastructure without -target flag

All services now functional:
 Traefik reverse proxy with auto SSL
 Authentik SSO with email configuration
 Nextcloud with OIDC login and email
 Mailgun SMTP credentials (dev@mg.vrije.cloud)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-14 16:04:00 +01:00
Pieter
30b3b394a6 fix: Resolve Authentik email delivery issues
Fixed email FROM address formatting that was breaking Django's email parser.
The display name contained an '@' symbol which violated RFC 5322 format.

Changes:
- Fix Authentik email FROM address (remove @ from display name)
- Add Mailgun SMTP credential cleanup on server destruction
- Fix Mailgun delete task to use EU API endpoint
- Add cleanup playbook for graceful resource removal

This ensures:
✓ Recovery emails work immediately on new deployments
✓ SMTP credentials are automatically cleaned up when destroying servers
✓ Email configuration works correctly across all environments

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-13 09:52:23 +01:00
Pieter
bb41dbbbe3 security: Remove test script with exposed Authentik API token
GitGuardian detected high-entropy secret in test-oidc-provider.py.
This was a development/testing script with hardcoded credentials.

Actions taken:
1. Removed test-oidc-provider.py from repository
2. Token will be rotated in separate commit
3. Production deployment uses proper Ansible role with SOPS-encrypted secrets

The exposed token was only used for test environment and will be
rotated immediately.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-08 18:01:45 +01:00
Pieter
a5fe631717 feat: Complete Authentik SSO integration with automated OIDC setup
## Changes

### Identity Provider (Authentik)
-  Deployed Authentik 2025.10.3 as identity provider
-  Configured automatic bootstrap with admin account (akadmin)
-  Fixed OIDC provider creation with correct redirect_uris format
-  Added automated OAuth2/OIDC provider configuration for Nextcloud
-  API-driven provider setup eliminates manual configuration

### Nextcloud Configuration
-  Fixed reverse proxy header configuration (trusted_proxies)
-  Added missing database indices (fs_storage_path_prefix)
-  Ran mimetype migrations for proper file type handling
-  Verified PHP upload limits (16GB upload_max_filesize)
-  Configured OIDC integration with Authentik
-  "Login with Authentik" button auto-configured

### Automation Scripts
-  Added deploy-client.sh for automated client deployment
-  Added rebuild-client.sh for infrastructure rebuild
-  Added destroy-client.sh for cleanup
-  Full deployment now takes ~10-15 minutes end-to-end

### Documentation
-  Updated README with automated deployment instructions
-  Added SSO automation workflow documentation
-  Added automation status tracking
-  Updated project reference with Authentik details

### Technical Fixes
- Fixed Authentik API redirect_uris format (requires list of dicts with matching_mode)
- Fixed Nextcloud OIDC command (user_oidc:provider not user_oidc:provider:add)
- Fixed file lookup in Ansible (changed to slurp for remote files)
- Updated Traefik to v3.6 for Docker API 1.44 compatibility
- Improved error handling in app installation tasks

## Security
- All credentials stored in SOPS-encrypted secrets
- Trusted proxy configuration prevents IP spoofing
- Bootstrap tokens auto-generated and secured

## Result
Fully automated SSO deployment - no manual configuration required!

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-08 16:56:19 +01:00