Commit graph

20 commits

Author SHA1 Message Date
Pieter
6af727f665 fix: YAML syntax error in stage verification task 2026-01-23 21:36:30 +01:00
Pieter
fb90d77dbc feat: Add improved Nextcloud upgrade playbook (v2)
Complete rewrite of the upgrade playbook based on lessons learned
from the kikker upgrade. The v2 playbook is fully idempotent and
handles all edge cases properly.

Key improvements over v1:
1. **Idempotency** - Can be safely re-run after failures
2. **Smart version detection** - Reads actual running version, not just docker-compose.yml
3. **Stage skipping** - Automatically skips completed upgrade stages
4. **Better maintenance mode handling** - Properly enables/disables at right times
5. **Backup reuse** - Skips backup if already exists from previous run
6. **Dynamic upgrade path** - Only runs needed stages based on current version
7. **Clear status messages** - Shows what's happening at each step
8. **Proper error handling** - Fails gracefully with helpful messages

Files:
- playbooks/260123-upgrade-nextcloud-v2.yml (main playbook)
- playbooks/260123-upgrade-nextcloud-stage-v2.yml (stage tasks)

Testing:
- v1 playbook partially tested on kikker (manual intervention required)
- v2 playbook ready for full end-to-end testing

Usage:
  cd ansible/
  HCLOUD_TOKEN="..." ansible-playbook -i hcloud.yml \
    playbooks/260123-upgrade-nextcloud-v2.yml --limit <server> \
    --private-key "../keys/ssh/<server>"

The playbook will:
- Detect current version (v30/v31/v32)
- Skip stages already completed
- Create backup only if needed
- Upgrade through required stages
- Re-enable critical apps
- Update to 'latest' tag

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-23 21:25:44 +01:00
Pieter
7e91e0e9de fix: Correct docker_compose_v2 pull parameter syntax 2026-01-23 21:13:49 +01:00
Pieter
c56ba5d567 fix: Restart containers after backup before upgrade stages 2026-01-23 21:03:13 +01:00
Pieter
14256bcbce feat: Add Nextcloud major version upgrade playbook (v30→v32)
Created: 2026-01-23

Add automated playbook to safely upgrade Nextcloud from v30 (EOL) to v32
through staged upgrades, respecting Nextcloud's no-version-skip policy.

Features:
- Pre-upgrade validation (version, disk space, maintenance mode)
- Automatic full backup (database + volumes)
- Staged upgrades: v30 → v31 → v32
- Per-stage app disabling/enabling
- Database migrations (indices, bigint conversion)
- Post-upgrade validation and system checks
- Rollback instructions in case of failure
- Updates docker-compose.yml to 'latest' tag after success

Files:
- playbooks/260123-upgrade-nextcloud.yml (main playbook)
- playbooks/260123-upgrade-nextcloud-stage.yml (stage tasks)

Usage:
  cd ansible/
  HCLOUD_TOKEN="..." ansible-playbook -i hcloud.yml \
    playbooks/260123-upgrade-nextcloud.yml --limit kikker

Safety:
- Creates timestamped backup before any changes
- Stops containers during volume backup
- Verifies version after each stage
- Provides rollback commands in output

Ready to upgrade kikker from v30.0.17 to v32.x

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-23 20:58:25 +01:00
Pieter
b6c9fa666d chore: Post-workshop state - January 23rd, 2026
This commit captures the infrastructure state immediately following
the "Post-Tyranny Tech" workshop on January 23rd, 2026.

Infrastructure Status:
- 13 client servers deployed (white, valk, zwaan, specht, das, uil, vos,
  haas, wolf, ree, mees, mus, mol, kikker)
- Services: Authentik SSO, Nextcloud, Collabora Office, Traefik
- Private network architecture with edge NAT gateway
- OIDC integration between Authentik and Nextcloud
- Automated recovery flows and invitation system
- Container update monitoring with Diun
- Uptime monitoring with Uptime Kuma

Changes include:
- Multiple new client host configurations
- Network architecture improvements (private IPs + NAT)
- DNS management automation
- Container update notifications
- Email configuration via Mailgun
- SSH key generation for all clients
- Encrypted secrets for all deployments
- Health check and diagnostic scripts

Known Issues to Address:
- Nextcloud version pinned to v30 (should use 'latest' or v32)
- Zitadel references in templates (migrated to Authentik but templates not updated)
- Traefik dynamic config has obsolete static routes

🤖 Generated with Claude Code (https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-23 20:36:31 +01:00
Pieter
79635eeece feat: Add private network architecture with NAT gateway
Enable deployment of client servers without public IPs using private
network (10.0.0.0/16) with NAT gateway via edge server.

## Infrastructure Changes:

### Terraform (tofu/):
- **network.tf**: Define private network and subnet (10.0.0.0/24)
  - NAT gateway route through edge server
  - Firewall rules for client servers

- **main.tf**: Support private-only servers
  - Optional public_ip_enabled flag per client
  - Dynamic network block for private IP assignment
  - User-data templates for public vs private servers

- **user-data-*.yml**: Cloud-init templates
  - Private servers: Configure default route via NAT gateway
  - Public servers: Standard configuration

- **dns.tf**: Update DNS to support edge routing
  - Client domains point to edge server IP
  - Wildcard DNS for subdomains

- **variables.tf**: Add private_ip and public_ip_enabled options

### Ansible:
- **deploy.yml**: Add diun and kuma roles to deployment

## Benefits:
- Cost savings: No public IP needed for each client
- Scalability: No public IP exhaustion limits
- Security: Clients not directly exposed to internet
- Centralized SSL: All TLS termination at edge

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-20 19:06:19 +01:00
Pieter
13685eb454 feat: Add infrastructure roles for multi-tenant architecture
Add new Ansible roles and configuration for the edge proxy and
private network architecture:

## New Roles:
- **edge-traefik**: Edge reverse proxy that routes to private clients
  - Dynamic routing configuration for multiple clients
  - SSL termination at the edge
  - Routes traffic to private IPs (10.0.0.x)

- **nat-gateway**: NAT/gateway configuration for edge server
  - IP forwarding and masquerading
  - Allows private network clients to access internet
  - iptables rules for Docker integration

- **diun**: Docker Image Update Notifier
  - Monitors containers for available updates
  - Email notifications via Mailgun
  - Per-client configuration

- **kuma**: Uptime monitoring integration
  - Registers HTTP monitors for client services
  - Automated monitor creation via API
  - Checks Authentik, Nextcloud, Collabora endpoints

## New Playbooks:
- **setup-edge.yml**: Configure edge server with proxy and NAT

## Configuration:
- **host_vars**: Per-client Ansible configuration (valk, white)
  - SSH bastion configuration for private IPs
  - Client-specific secrets file references

This enables the scalable multi-tenant architecture where:
- Edge server has public IP and routes traffic
- Client servers use private IPs only (cost savings)
- All traffic flows through edge proxy with SSL termination

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-20 19:05:51 +01:00
Pieter
dc14b12688 Remove automated recovery flow configuration
Automated recovery flow setup via blueprints was too complex and
unreliable. Recovery flows (password reset via email) must now be
configured manually in Authentik admin UI.

Changes:
- Removed recovery-flow.yaml blueprint
- Removed configure_recovery_flow.py script
- Removed update-recovery-flow.yml playbook
- Updated flows.yml to remove recovery references
- Updated custom-flows.yaml to remove brand recovery flow config
- Updated comments to reflect manual recovery flow requirement

Automated configuration still includes:
- Enrollment flow with invitation support
- 2FA/MFA enforcement
- OIDC provider for Nextcloud
- Email configuration via SMTP

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-17 09:57:07 +01:00
Pieter
fcc5b7bca2 feat: Add password recovery flow with email notifications
ACHIEVEMENT: Password recovery via email is now fully working! 🎉

Implemented a complete password recovery flow that:
- Asks users for their email address
- Sends a recovery link via Mailgun SMTP
- Allows users to set a new password
- Expires recovery links after 30 minutes

Flow stages:
1. Identification stage - collects user email
2. Email stage - sends recovery link
3. Prompt stage - collects new password
4. User write stage - updates password

Features:
✓ Email sent via Mailgun (noreply@mg.vrije.cloud)
✓ 30-minute token expiry for security
✓ Set as default recovery flow in brand
✓ Clean, user-friendly interface
✓ Password confirmation required

Users can access recovery at:
https://auth.dev.vrije.cloud/if/flow/default-recovery-flow/

Files added:
- recovery-flow.yaml - Blueprint defining the complete flow
- update-recovery-flow.yml - Deployment playbook

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-15 13:36:43 +01:00
Pieter
918a43e820 feat: Add playbook to update enrollment flow and fix brand default
ACHIEVEMENT: Invitation-only enrollment flow is now fully working! 🎉

This commit adds a utility playbook that was used to successfully deploy
the updated enrollment-flow.yaml blueprint to the running dev server.

The key fix was adding the tenant configuration to set the enrollment flow
as the default in the Authentik brand, ensuring invitations created in the
UI automatically use the correct flow.

Changes:
- Added update-enrollment-flow.yml playbook for deploying flow updates
- Successfully deployed and verified on dev server
- Invitation URLs now work correctly with the format:
  https://auth.dev.vrije.cloud/if/flow/default-enrollment-flow/?itoken=<token>

Features confirmed working:
✓ Invitation-only registration (no public signup)
✓ Correct flow is set as brand default
✓ Email notifications via Mailgun SMTP
✓ 2FA enforcement configured
✓ Password recovery flow configured

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-15 13:29:26 +01:00
Pieter
9571782382 fix: Restore Mailgun SMTP and Nextcloud OIDC integration
Fixes three critical regressions from previous deployment:

1. **Mailgun SMTP Credentials**
   - Added mailgun_api_key to secrets/shared.sops.yaml
   - Updated deploy.yml to load and merge shared secrets
   - Mailgun credentials now created automatically per client

2. **Nextcloud OIDC Integration**
   - OIDC provider creation now works (was timing issue)
   - "Login with Authentik" button restored on Nextcloud login

3. **Infrastructure Deployment**
   - Fixed deploy-client.sh to create full infrastructure (DNS + server)
   - Removed -target flag that caused incomplete deployments

Changes:
- ansible/playbooks/deploy.yml: Load shared secrets and merge into client_secrets
- secrets/shared.sops.yaml: Add Mailgun API key for all clients
- secrets/clients/dev.sops.yaml: Add dev client configuration
- scripts/deploy-client.sh: Apply full infrastructure without -target flag

All services now functional:
 Traefik reverse proxy with auto SSL
 Authentik SSO with email configuration
 Nextcloud with OIDC login and email
 Mailgun SMTP credentials (dev@mg.vrije.cloud)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-14 16:04:00 +01:00
Pieter
c1c690c565 feat: Add complete email configuration automation
This commit adds comprehensive email configuration for both Authentik
and Nextcloud, integrated with Mailgun SMTP credentials.

Features Added:
- Mailgun role integration in deploy.yml playbook
- Authentik email configuration display task
- Nextcloud SMTP configuration with admin email setup
- Infrastructure prerequisite checking in deploy playbook

Changes:
- deploy.yml: Added Mailgun role and base infrastructure check
- authentik/tasks/email.yml: Display email configuration status
- authentik/tasks/main.yml: Include email task when credentials exist
- nextcloud/tasks/email.yml: Configure SMTP and admin email
- nextcloud/tasks/main.yml: Include email task when credentials exist

This ensures:
✓ Mailgun SMTP credentials are created/loaded automatically
✓ Authentik email works via docker-compose environment variables
✓ Nextcloud SMTP is configured via occ commands
✓ Admin email address is set automatically
✓ Email works immediately on new deployments

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-13 10:39:26 +01:00
Pieter
30b3b394a6 fix: Resolve Authentik email delivery issues
Fixed email FROM address formatting that was breaking Django's email parser.
The display name contained an '@' symbol which violated RFC 5322 format.

Changes:
- Fix Authentik email FROM address (remove @ from display name)
- Add Mailgun SMTP credential cleanup on server destruction
- Fix Mailgun delete task to use EU API endpoint
- Add cleanup playbook for graceful resource removal

This ensures:
✓ Recovery emails work immediately on new deployments
✓ SMTP credentials are automatically cleaned up when destroying servers
✓ Email configuration works correctly across all environments

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-13 09:52:23 +01:00
Pieter
a5fe631717 feat: Complete Authentik SSO integration with automated OIDC setup
## Changes

### Identity Provider (Authentik)
-  Deployed Authentik 2025.10.3 as identity provider
-  Configured automatic bootstrap with admin account (akadmin)
-  Fixed OIDC provider creation with correct redirect_uris format
-  Added automated OAuth2/OIDC provider configuration for Nextcloud
-  API-driven provider setup eliminates manual configuration

### Nextcloud Configuration
-  Fixed reverse proxy header configuration (trusted_proxies)
-  Added missing database indices (fs_storage_path_prefix)
-  Ran mimetype migrations for proper file type handling
-  Verified PHP upload limits (16GB upload_max_filesize)
-  Configured OIDC integration with Authentik
-  "Login with Authentik" button auto-configured

### Automation Scripts
-  Added deploy-client.sh for automated client deployment
-  Added rebuild-client.sh for infrastructure rebuild
-  Added destroy-client.sh for cleanup
-  Full deployment now takes ~10-15 minutes end-to-end

### Documentation
-  Updated README with automated deployment instructions
-  Added SSO automation workflow documentation
-  Added automation status tracking
-  Updated project reference with Authentik details

### Technical Fixes
- Fixed Authentik API redirect_uris format (requires list of dicts with matching_mode)
- Fixed Nextcloud OIDC command (user_oidc:provider not user_oidc:provider:add)
- Fixed file lookup in Ansible (changed to slurp for remote files)
- Updated Traefik to v3.6 for Docker API 1.44 compatibility
- Improved error handling in app installation tasks

## Security
- All credentials stored in SOPS-encrypted secrets
- Trusted proxy configuration prevents IP spoofing
- Bootstrap tokens auto-generated and secured

## Result
Fully automated SSO deployment - no manual configuration required!

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-08 16:56:19 +01:00
Pieter
20856f7f18 Add Authentik identity provider to architecture
Added Authentik as the identity provider for SSO authentication:

Why Authentik:
- MIT license (truly open source, most permissive)
- Simple Docker Compose deployment (no manual wizards)
- Lightweight Python-based architecture
- Comprehensive protocol support (SAML, OAuth2/OIDC, LDAP, RADIUS)
- No Redis required as of v2025.10 (all caching in PostgreSQL)
- Active development and strong community

Implementation:
- Created complete Authentik Ansible role
- Docker Compose with server + worker architecture
- PostgreSQL 16 database backend
- Traefik integration with Let's Encrypt SSL
- Bootstrap tasks for initial setup guidance
- Health checks and proper service dependencies

Architecture decisions updated:
- Documented comparison: Authentik vs Zitadel vs Keycloak
- Explained Zitadel removal (FirstInstance bugs)
- Added deployment example and configuration notes

Next steps:
- Update documentation (PROJECT_REFERENCE.md, README.md)
- Create Authentik agent configuration
- Add secrets template
- Test deployment on test server

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-07 11:23:13 +01:00
Pieter
b951d9542e Remove Zitadel from project completely
Removed Zitadel identity provider due to:
- Critical bugs with FirstInstance initialization in v2.63.7
- Requirement for manual setup (not scalable for multi-tenant)
- User preference for Authentik in future

Changes:
- Removed entire Zitadel Ansible role and all tasks
- Removed Zitadel agent configuration (.claude/agents/zitadel.md)
- Updated deploy.yml playbook (removed Zitadel role)
- Updated architecture decisions document
- Updated PROJECT_REFERENCE.md (removed Zitadel sections)
- Updated README.md (removed Zitadel references)
- Cleaned up Zitadel deployment from test server
- Updated secrets file (removed Zitadel credentials)

Architecture now focuses on:
- Nextcloud as standalone file sync/collaboration platform
- May add Authentik or other identity provider in future if needed

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-07 11:10:32 +01:00
Pieter
93ce586b94 Deploy Nextcloud file sync/share with automated installation (#4)
This commit implements a complete Nextcloud deployment with PostgreSQL, Redis,
automated installation, and preparation for OIDC/SSO integration with Zitadel.

## Nextcloud Deployment

### New Ansible Role (ansible/roles/nextcloud/)
- Complete Nextcloud v30 deployment with Docker Compose
- PostgreSQL 16 backend with persistent volumes
- Redis 7 for caching and file locking
- Automated installation via Docker environment variables
- Post-installation configuration via occ commands

### Features Implemented
- **Database**: PostgreSQL with proper credentials and persistence
- **Caching**: Redis for memory caching and file locking
- **HTTPS**: Traefik integration with Let's Encrypt SSL
- **Security**: Proper security headers and HSTS
- **WebDAV**: CalDAV/CardDAV redirect middleware
- **Configuration**: Automated trusted domain, reverse proxy, and Redis setup
- **OIDC Preparation**: user_oidc app installed and enabled

### Traefik Updates
- Added Nextcloud routing to dynamic.yml (static file-based config)
- Configured CalDAV/CardDAV redirect middleware
- Added Nextcloud-specific security headers

### Configuration Tasks
- Automated trusted domain configuration for nextcloud.test.vrije.cloud
- Reverse proxy overwrite settings (protocol, host, CLI URL)
- Redis cache and locking configuration
- Default phone region (NL)
- Background jobs via cron

## Deployment Status

 Successfully deployed and tested:
- Nextcloud: https://nextcloud.test.vrije.cloud/
- Admin login working
- PostgreSQL database initialized
- Redis caching operational
- HTTPS with Let's Encrypt SSL
- user_oidc app installed (ready for Zitadel integration)

## Next Steps

To complete OIDC/SSO integration:
1. Create OIDC application in Zitadel console
2. Use redirect URI: https://nextcloud.test.vrije.cloud/apps/user_oidc/code
3. Configure provider in Nextcloud with Zitadel credentials

Partially addresses #4

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-01-06 09:30:54 +01:00
Pieter van Boheemen
054e0e1e87
Deploy Zitadel identity provider with DNS automation (#3) (#8)
This commit implements a complete Zitadel identity provider deployment
with automated DNS management using vrije.cloud domain.

## Infrastructure Changes

### DNS Management
- Migrated from deprecated hetznerdns provider to modern hcloud provider v1.57+
- Automated DNS record creation for client subdomains (test.vrije.cloud)
- Automated wildcard DNS for service subdomains (*.test.vrije.cloud)
- Supports both IPv4 (A) and IPv6 (AAAA) records

### Zitadel Deployment
- Added complete Zitadel role with PostgreSQL 16 database
- Configured Zitadel v2.63.7 with proper external domain settings
- Implemented first instance setup with admin user creation
- Set up database connection with proper user and admin credentials
- Configured email verification bypass for first admin user

### Traefik Updates
- Upgraded from v3.0 to v3.2 for better Docker API compatibility
- Added manual routing configuration in dynamic.yml for Zitadel
- Configured HTTP/2 Cleartext (h2c) backend for Zitadel service
- Added Zitadel-specific security headers middleware
- Fixed Docker API version compatibility issues

### Secrets Management
- Added Zitadel credentials to test client secrets
- Generated proper 32-character masterkey (Zitadel requirement)
- Created admin password with symbol complexity requirement
- Added zitadel_domain configuration

## Deployment Details

Test environment now accessible at:
- Server: test.vrije.cloud (78.47.191.38)
- Zitadel: https://zitadel.test.vrije.cloud/
- Admin user: admin@test.zitadel.test.vrije.cloud

Successfully tested:
- HTTPS with Let's Encrypt SSL certificate
- Admin login with 2FA setup
- First instance initialization

Fixes #3

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Pieter <pieter@kolabnow.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-01-05 16:40:37 +01:00
Pieter
4e72ddf4ef Complete Ansible base configuration (#2)
Completed Issue #2: Ansible Base Configuration

All objectives met:
-  Hetzner Cloud dynamic inventory (hcloud plugin)
-  Common role (SSH hardening, UFW firewall, fail2ban, auto-updates)
-  Docker role (Docker Engine + Compose + networks)
-  Traefik role (reverse proxy with Let's Encrypt SSL)
-  Setup playbook (orchestrates all base roles)
-  Successfully tested on live test server (91.99.210.204)

Additional improvements:
- Fixed ansible.cfg for Ansible 2.20+ compatibility
- Updated ADR dates to 2025
- All roles follow Infrastructure Agent patterns

Test Results:
- SSH hardening applied (key-only auth)
- UFW firewall active (ports 22, 80, 443)
- Fail2ban protecting SSH
- Automatic security updates enabled
- Docker running with traefik network
- Traefik deployed and ready for SSL

Files added:
- ansible/playbooks/setup.yml
- ansible/roles/docker/* (complete)
- ansible/roles/traefik/* (complete)

Closes #2

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-27 14:13:15 +01:00