Complete implementation of automatic version tracking and drift detection: New Scripts: - scripts/collect-client-versions.sh: Query deployed versions from Docker - Connects via Ansible to running servers - Extracts versions from container images - Updates registry automatically - scripts/check-client-versions.sh: Compare versions across clients - Multiple formats: table (colorized), CSV, JSON - Filter by outdated versions - Highlights drift with color coding - scripts/detect-version-drift.sh: Identify version differences - Detects clients with outdated versions - Threshold-based staleness detection (default 30 days) - Actionable recommendations - Exit code 1 if drift detected (CI/monitoring friendly) Updated Scripts: - scripts/deploy-client.sh: Auto-collect versions after deployment - scripts/rebuild-client.sh: Auto-collect versions after rebuild Documentation: - docs/maintenance-tracking.md: Complete maintenance guide - Version management workflows - Security update procedures - Monitoring integration examples - Troubleshooting guide Features: ✅ Automatic version collection from deployed servers ✅ Multi-client version comparison reports ✅ Version drift detection with recommendations ✅ Integration with deployment workflows ✅ Export to CSV/JSON for external tools ✅ Canary-first update workflow support Usage Examples: ```bash # Collect versions ./scripts/collect-client-versions.sh dev # Compare all clients ./scripts/check-client-versions.sh # Detect drift ./scripts/detect-version-drift.sh # Export for monitoring ./scripts/check-client-versions.sh --format=json ``` Closes #15 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
12 KiB
Maintenance and Version Tracking
Comprehensive guide to tracking software versions, maintenance history, and detecting version drift across all deployed clients.
Overview
The infrastructure tracks:
- Software versions - Authentik, Nextcloud, Traefik, Ubuntu
- Maintenance dates - Last update, security patches, OS updates
- Version drift - Clients running different versions
- Update history - Audit trail of changes
All version and maintenance data is stored in clients/registry.yml.
Registry Structure
Each client tracks versions and maintenance:
clients:
myclient:
versions:
authentik: "2025.10.3"
nextcloud: "30.0.17"
traefik: "v3.0"
ubuntu: "24.04"
maintenance:
last_full_update: 2026-01-17
last_security_patch: 2026-01-17
last_os_update: 2026-01-17
last_backup_verified: null
Version Management Scripts
Collect Client Versions
Query actual deployed versions from a running server:
# Collect versions from dev client
./scripts/collect-client-versions.sh dev
This script:
- Connects to the server via Ansible
- Queries Docker container image tags
- Queries Ubuntu OS version
- Updates the registry automatically
Output:
Collecting versions for client: dev
Querying deployed versions...
Collecting Docker container versions...
✓ Versions collected
Collected versions:
Authentik: 2025.10.3
Nextcloud: 30.0.17
Traefik: v3.0
Ubuntu: 24.04
✓ Registry updated
Requirements:
- Server must be deployed and reachable
HCLOUD_TOKENenvironment variable set- Ansible configured with dynamic inventory
Check All Client Versions
Compare versions across all clients:
# Default: Table format with color coding
./scripts/check-client-versions.sh
# Export as CSV
./scripts/check-client-versions.sh --format=csv
# Export as JSON
./scripts/check-client-versions.sh --format=json
# Show only clients with outdated versions
./scripts/check-client-versions.sh --outdated
Table output:
═══════════════════════════════════════════════════════════════════════════════
CLIENT VERSION REPORT
═══════════════════════════════════════════════════════════════════════════════
CLIENT STATUS AUTHENTIK NEXTCLOUD TRAEFIK UBUNTU
──────────────────────────────────────────────────────────────────────────────────────────────
dev deployed 2025.10.3 30.0.17 v3.0 24.04
client1 deployed 2025.10.2 30.0.16 v3.0 24.04
Latest versions:
Authentik: 2025.10.3
Nextcloud: 30.0.17
Traefik: v3.0
Ubuntu: 24.04
Note: Red indicates outdated version
CSV output:
client,status,authentik,nextcloud,traefik,ubuntu,last_update,outdated
dev,deployed,2025.10.3,30.0.17,v3.0,24.04,2026-01-17,no
client1,deployed,2025.10.2,30.0.16,v3.0,24.04,2026-01-10,yes
JSON output:
{
"latest_versions": {
"authentik": "2025.10.3",
"nextcloud": "30.0.17",
"traefik": "v3.0",
"ubuntu": "24.04"
},
"clients": [
{
"name": "dev",
"status": "deployed",
"versions": {
"authentik": "2025.10.3",
"nextcloud": "30.0.17",
"traefik": "v3.0",
"ubuntu": "24.04"
},
"last_update": "2026-01-17",
"outdated": false
}
]
}
Detect Version Drift
Identify clients with outdated versions:
# Default: Check all deployed clients
./scripts/detect-version-drift.sh
# Check clients not updated in 30+ days
./scripts/detect-version-drift.sh --threshold=30
# Check specific application only
./scripts/detect-version-drift.sh --app=authentik
# Summary output for monitoring
./scripts/detect-version-drift.sh --format=summary
Output when drift detected:
⚠ VERSION DRIFT DETECTED
Clients with outdated versions:
• client1
Authentik: 2025.10.2 → 2025.10.3
Nextcloud: 30.0.16 → 30.0.17
• client2
Last update: 2025-12-15 (>30 days ago)
Recommended actions:
1. Test updates on canary server first:
./scripts/rebuild-client.sh dev
2. Verify canary health:
./scripts/client-status.sh dev
3. Update outdated clients:
./scripts/rebuild-client.sh client1
./scripts/rebuild-client.sh client2
Exit codes:
0- No drift detected (all clients up to date)1- Drift detected (action needed)2- Error (script failure)
Summary format (useful for monitoring):
Status: DRIFT DETECTED
Drift: Yes
Clients checked: 5
Clients with outdated versions: 2
Clients not updated in 30 days: 1
Affected clients: client1 client2
Automatic Version Collection
Version collection is automatically performed after deployments:
On New Deployment
./scripts/deploy-client.sh myclient:
- Provisions infrastructure
- Deploys applications
- Updates registry with server info
- Collects and records versions ← Automatic
On Rebuild
./scripts/rebuild-client.sh myclient:
- Destroys old infrastructure
- Provisions new infrastructure
- Deploys applications
- Updates registry
- Collects and records versions ← Automatic
If automatic collection fails (server not ready, network issue):
⚠ Could not collect versions automatically
Run manually later: ./scripts/collect-client-versions.sh myclient
Maintenance Workflows
Security Update Workflow
-
Check current state
./scripts/check-client-versions.sh -
Update canary first (dev server)
./scripts/rebuild-client.sh dev -
Verify canary
# Check health ./scripts/client-status.sh dev # Verify versions updated ./scripts/collect-client-versions.sh dev -
Detect drift (identify outdated clients)
./scripts/detect-version-drift.sh -
Roll out to production
# Update each client ./scripts/rebuild-client.sh client1 ./scripts/rebuild-client.sh client2 # Or batch update (be careful!) for client in $(./scripts/list-clients.sh --role=production --format=csv | tail -n +2 | cut -d, -f1); do ./scripts/rebuild-client.sh "$client" sleep 300 # Wait 5 minutes between updates done -
Verify all updated
./scripts/detect-version-drift.sh
Monthly Maintenance Check
Run these checks monthly:
# 1. Version report
./scripts/check-client-versions.sh > reports/versions-$(date +%Y-%m).txt
# 2. Drift detection
./scripts/detect-version-drift.sh --threshold=30
# 3. Client health
for client in $(./scripts/list-clients.sh --status=deployed --format=csv | tail -n +2 | cut -d, -f1); do
./scripts/client-status.sh "$client"
done
Update Maintenance Dates
Deployment scripts automatically update last_full_update. For other maintenance:
# After security patches (OS level)
yq eval -i ".clients.myclient.maintenance.last_security_patch = \"$(date +%Y-%m-%d)\"" clients/registry.yml
# After OS updates
yq eval -i ".clients.myclient.maintenance.last_os_update = \"$(date +%Y-%m-%d)\"" clients/registry.yml
# After backup verification
yq eval -i ".clients.myclient.maintenance.last_backup_verified = \"$(date +%Y-%m-%d)\"" clients/registry.yml
# Commit changes
git add clients/registry.yml
git commit -m "chore: Update maintenance dates"
git push
Integration with Monitoring
Continuous Drift Detection
Set up a cron job or CI pipeline:
#!/bin/bash
# check-drift.sh - Run daily
cd /path/to/infrastructure
# Check for drift
if ! ./scripts/detect-version-drift.sh --format=summary; then
# Send alert (Slack, email, etc.)
./scripts/detect-version-drift.sh | mail -s "Version Drift Detected" ops@example.com
fi
Export for External Tools
# Export version data as JSON for monitoring tools
./scripts/check-client-versions.sh --format=json > /var/monitoring/client-versions.json
# Export drift status
./scripts/detect-version-drift.sh --format=summary > /var/monitoring/drift-status.txt
Prometheus Metrics
Convert to Prometheus format:
#!/bin/bash
# export-metrics.sh
# Count clients by drift status
total=$(./scripts/list-clients.sh --status=deployed --format=csv | tail -n +2 | wc -l)
outdated=$(./scripts/check-client-versions.sh --format=csv --outdated | tail -n +2 | wc -l)
uptodate=$((total - outdated))
echo "# HELP clients_total Total number of deployed clients"
echo "# TYPE clients_total gauge"
echo "clients_total $total"
echo "# HELP clients_outdated Number of clients with outdated versions"
echo "# TYPE clients_outdated gauge"
echo "clients_outdated $outdated"
echo "# HELP clients_uptodate Number of clients with latest versions"
echo "# TYPE clients_uptodate gauge"
echo "clients_uptodate $uptodate"
Version Pinning
To prevent automatic updates, pin versions in Ansible roles:
# roles/authentik/defaults/main.yml
authentik_version: "2025.10.3" # Pinned version
# To update:
# 1. Change pinned version
# 2. Update canary: ./scripts/rebuild-client.sh dev
# 3. Verify and roll out
Troubleshooting
Version Collection Fails
Problem: collect-client-versions.sh cannot reach server
Solutions:
-
Check server is deployed and running:
./scripts/client-status.sh myclient -
Verify HCLOUD_TOKEN is set:
echo $HCLOUD_TOKEN -
Test Ansible connectivity:
cd ansible ansible -i hcloud.yml myclient -m ping -
Check Docker containers are running:
ansible -i hcloud.yml myclient -m shell -a "docker ps"
Incorrect Version Reported
Problem: Registry shows wrong version
Solutions:
-
Re-collect versions manually:
./scripts/collect-client-versions.sh myclient -
Verify Docker images:
ansible -i hcloud.yml myclient -m shell -a "docker images" -
Check container inspect:
ansible -i hcloud.yml myclient -m shell -a "docker inspect authentik-server | jq '.[0].Config.Image'"
Version Drift False Positives
Problem: Drift detected for canary with intentionally different version
Solution: Use --app filter to check specific applications:
# Check only production-critical apps
./scripts/detect-version-drift.sh --app=authentik
./scripts/detect-version-drift.sh --app=nextcloud
Best Practices
-
Always test on canary first
- Update
devclient before production - Verify health before wider rollout
- Update
-
Stagger production updates
- Don't update all clients simultaneously
- Wait 5-10 minutes between updates
- Monitor each update for issues
-
Track maintenance in registry
- Keep
last_full_updatecurrent - Record
last_security_patchdates - Document backup verification
- Keep
-
Regular drift checks
- Run weekly:
detect-version-drift.sh - Address drift within 7 days
- Maintain version consistency
- Run weekly:
-
Document version changes
- Add notes to registry when pinning versions
- Commit registry changes with descriptive messages
- Track major version upgrades separately
-
Automate reporting
- Export weekly version reports
- Alert on drift detection
- Dashboard for version overview
Related Documentation
- Client Registry - Registry system overview
- Deployment Guide - Deployment procedures
- SSH Key Management - Security and access