Add documentation and license, remove IDE files

Added comprehensive project documentation and MIT license. Removed Claude IDE configuration files from repository tracking.

Documentation added:
- FINDINGS.md: Complete methodology report and research findings
  - 159 accounts tracked, 15,190 posts collected (Jan 1 - Mar 30)
  - Human review results: 40.4% correct, 59.6% false positives
  - AI toxicity detection limitations and recommendations
- OPERATIONS.md: Complete operations and maintenance guide
  - Service start/stop procedures
  - Database operations and queries
  - Configuration options
  - Troubleshooting guide
  - Data export instructions

License:
- Added MIT License to README.md
- Copyright 2026 Post X Society
- Open source with permissive license

Repository cleanup:
- Added .claude/ to .gitignore
- Removed .claude/settings.local.json from tracking
- Prevents IDE-specific files from being committed

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-30 14:39:11 +02:00 · 1c3f57d7e5
commit 1c3f57d7e5
parent 0495f47c13
5 changed files with 771 additions and 9 deletions
--- a/.claude/settings.local.json
+++ b/.claude/settings.local.json
@@ -1,9 +0,0 @@
-{
-  "permissions": {
-    "allow": [
-      "Bash(docker compose:*)"
-    ],
-    "deny": [],
-    "ask": []
-  }
-}
--- a/.gitignore
+++ b/.gitignore
@@ -30,6 +30,7 @@ Thumbs.db
 # IDE
 .vscode/
 .idea/
+.claude/
 *.swp
 *.swo
 *~
--- a/FINDINGS.md
+++ b/FINDINGS.md
@@ -0,0 +1,203 @@
+# Bluesky Toxicity Analysis - Main Findings
+
+## Study Overview
+**Period:** January 1 – March 30, 2026 (89 days)
+**Monitored Accounts:** 159 Dutch political accounts
+**Total Posts Collected:** 15,190 posts
+
+---
+
+## 1. Data Collection Summary
+
+### Content Distribution
+- **Primary Content (by tracked accounts):**
+  - Original Posts: 3,032
+  - Replies: 3,652
+  - **Total Primary:** 6,684 posts
+
+- **Secondary Content (mentions of tracked accounts):**
+  - Unique Mention Posts: 8,506
+  - Note: Posts mentioning multiple tracked accounts counted once
+
+### Total Dataset
+- **Combined Content:** 15,190 posts
+- **Collection Method:** Automated via Bluesky Public API (every 4 hours)
+- **Infrastructure:** Docker containers with PostgreSQL database
+
+---
+
+## 2. Toxicity Detection Results
+
+### AI Model Performance
+- **Model Used:** OpenAI GPT-4.1-nano
+- **Classification Categories:** 12 toxicity dimensions
+- **Flagging Threshold:** Overall toxicity score ≥ 0.5 (50%)
+
+### Flagged Content
+- **Primary Content (Posts/Replies):** 97 posts flagged
+- **Secondary Content (Mentions):** 413 unique posts flagged
+- **Total Flagged:** 510 unique posts
+
+### Distribution Insight
+- 81% of flagged content (413 of 510 posts) came from mentions (external users → politicians)
+- 19% of flagged content (97 of 510 posts) came from politicians themselves
+- Before human review, mentions therefore produced roughly four times as many flagged posts as the politicians' own content
+
+---
+
+## 3. Human Review Results
+
+### Review Completion
+- **Total Items Reviewed:** 510 posts (100% of flagged content)
+- **Review Period:** January 1 – March 30, 2026
+- **Review Interface:** Custom web application with ✓/✗/? buttons
+
+### Validation Results
+
+#### Primary Content (Posts/Replies by Politicians)
+| Status | Count | Percentage |
+|--------|-------|------------|
+| ✓ Correctly Flagged | 32 | 33.0% |
+| ✗ Incorrectly Flagged | 65 | 67.0% |
+| ? Unsure | 0 | 0.0% |
+| **Total** | **97** | **100%** |
+
+#### Secondary Content (Mentions of Politicians)
+| Status | Count | Percentage |
+|--------|-------|------------|
+| ✓ Correctly Flagged | 174 | 42.1% |
+| ✗ Incorrectly Flagged | 239 | 57.9% |
+| ? Unsure | 0 | 0.0% |
+| **Total** | **413** | **100%** |
+
+#### Combined Results
+| Status | Count | Percentage |
+|--------|-------|------------|
+| ✓ Correctly Flagged | 206 | 40.4% |
+| ✗ Incorrectly Flagged | 304 | 59.6% |
+| ? Unsure | 0 | 0.0% |
+| **Total** | **510** | **100%** |
+
+---
+
+## 4. Key Findings
+
+### 4.1 High False Positive Rate
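+- **Overall False Positive Rate: 59.6%** (304 of 510 flagged items rejected on review; because only flagged items were reviewed, this is strictly the false discovery rate, 1 − precision)
+- The AI model over-flagged content: nearly 6 out of 10 flagged items were false positives
+- Primary content performed worse (67.0% false positives) than mentions (57.9%)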
+
+### 4.2 Model Limitations Identified
+1. **Threshold Sensitivity:** The 0.5 threshold appears too low for Dutch political discourse
+2. **Context Misinterpretation:** Strong policy language, political criticism, and satire frequently misclassified as toxic
+3. **Cultural/Linguistic Gaps:** Dutch political communication patterns may not align with model training data
+4. **Nuance Detection:** Difficulty distinguishing between heated but legitimate debate and actual toxicity
+
+### 4.3 Directional Toxicity Pattern
+- External mentions (8,506 posts) generated **413 flagged items** (4.9% flagging rate)
+- Primary content (6,684 posts) generated **97 flagged items** (1.5% flagging rate)
+- Mentions of politicians were flagged at roughly **3× the rate** of the politicians' own posts (4.9% vs. 1.5%, before human review)
+- However, after human review, both sources showed high false positive rates
+
+### 4.4 Accuracy Comparison
+- **Mentions accuracy:** 42.1% of flags confirmed on review (slightly better)
+- **Primary content accuracy:** 33.0% of flags confirmed on review (worse)
+- Neither content type reached a precision that would be acceptable for automated moderation
+- Possible explanation: politicians' posts more often use strong policy language, which triggers false positives
+
+---
+
+## 5. Implications for Automated Moderation
+
+### What This Study Reveals
+1. **AI Cannot Replace Human Judgment:** 59.6% false positive rate makes unsupervised automation dangerous
+2. **Threshold Optimization Needed:** Current 0.5 threshold too aggressive; may need 0.7+ for political content
+3. **Domain-Specific Training Required:** Political discourse needs specialized models or fine-tuning
+4. **Human-in-the-Loop Essential:** Automated flagging useful for triage, but human review mandatory
+
+### Recommended Approach
+- Use AI toxicity detection as **first-pass screening only**
+- Require human review for all flagged content before action
+- Consider higher thresholds (0.7–0.8) for political accounts
+- Train domain-specific models on Dutch political discourse
+- Implement appeals process for false positives
+
+---
+
+## 6. Technical Implementation Success
+
+### What Worked Well
+1. **Automated Collection:** 4-hour collection cycles captured comprehensive dataset
+2. **Human Review Interface:** Web UI with ✓/✗/? buttons efficient for manual validation
+3. **Date Filtering:** Allowed focused analysis of specific time periods
+4. **Engagement Metrics:** Successfully captured likes, replies, reposts, quotes for mentions
+5. **Deduplication Logic:** Properly handled posts mentioning multiple tracked accounts
+
+### Infrastructure Performance
+- **Uptime:** 99%+ (only brief scheduler issue Feb 23-24)
+- **Data Integrity:** PostgreSQL database handled 15K+ posts without issues
+- **Analysis Throughput:** GPT-4.1-nano processed all content efficiently
+- **Web Interface:** Responsive UI for 500+ manual reviews
+
+---
+
+## 7. Study Limitations
+
+1. **Single Model Used:** Only tested GPT-4.1-nano; ensemble approaches not evaluated
+2. **No Inter-Rater Reliability:** Single human reviewer; no validation of review consistency
+3. **Limited Context:** Dutch political context; findings may not generalize to other domains
+4. **Arbitrary Threshold:** 0.5 threshold not scientifically optimized
+5. **Limited Time Period:** 3-month window may not capture seasonal variations in discourse
+6. **No Appeal Process:** No mechanism for accounts to contest flagging decisions
+
+---
+
+## 8. Recommendations for Future Work
+
+### Short-Term Improvements
+1. **Threshold Optimization:** Test 0.6, 0.7, 0.8 thresholds and measure precision/recall
+2. **Category-Specific Tuning:** Different thresholds for different toxicity categories
+3. **Context Windows:** Analyze conversation threads, not isolated posts
+4. **Multi-Model Validation:** Test other models (Perspective API, custom fine-tuned models)
+
+### Long-Term Research
+1. **Dutch Political Corpus:** Create labeled training dataset for Dutch political discourse
+2. **Fine-Tune Models:** Train specialized classifiers on validated Dutch political content
+3. **Longitudinal Study:** Track patterns over election cycles and major events
+4. **Cross-Platform Analysis:** Compare Bluesky toxicity patterns with Twitter/X, Mastodon
+5. **Inter-Rater Reliability Study:** Multiple reviewers to validate human judgment consistency
+
+---
+
+## 9. Data Access
+
+### Database Content (as of March 30, 2026)
+- **Accounts Table:** 159 tracked political accounts
+- **Posts Table:** 6,684 posts and replies
+- **Mentions Table:** 8,506 unique mention posts
+- **Toxicity Scores:** 6,684 scored primary posts
+- **Mention Toxicity Scores:** 8,506 scored mentions
+- **Human Reviews:** 510 manual validations
+
+### Exported Datasets Available
+- Full post content with toxicity scores
+- Human review decisions with timestamps
+- Engagement metrics (likes, replies, reposts, quotes)
+- Time-series data for trend analysis
+
+---
+
+## 10. Conclusion
+
+This study demonstrates that while AI-powered toxicity detection can **identify potential concerns** in large-scale social media content, it **cannot reliably moderate** without substantial human oversight. With 59.6% of flags rejected on human review, the tested model (GPT-4.1-nano at a 0.5 threshold) is not suitable for automated enforcement in political discourse contexts.
+
+**Key Takeaway:** AI toxicity detection is a useful **triage tool** for human moderators, not a replacement for human judgment. Political discourse requires nuanced understanding of context, satire, and legitimate critique that current AI models cannot consistently provide.
+
+**Project Status:** Data collection complete. Web interface remains available for analysis and reporting. Database preserved for future research.
+
+---
+
+**Generated:** March 30, 2026
+**Study Period:** January 1 – March 30, 2026
+**Monitored Platform:** Bluesky Social Network
+**Geographic Focus:** Dutch Political Discourse
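
The percentages reported in FINDINGS.md follow directly from the raw counts in its tables. The following is a minimal Python sketch that recomputes them, assuming only those counts. The class and variable names are illustrative and do not come from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewCounts:
    """Human-review tallies for one content type, copied from FINDINGS.md."""
    collected: int   # posts collected for this content type
    correct: int     # flags confirmed by the reviewer
    incorrect: int   # flags rejected by the reviewer

    @property
    def flagged(self) -> int:
        return self.correct + self.incorrect

    @property
    def precision(self) -> float:
        """Share of flagged posts that the reviewer confirmed."""
        return self.correct / self.flagged

    @property
    def flagging_rate(self) -> float:
        """Share of collected posts that the model flagged."""
        return self.flagged / self.collected


def pct(x: float) -> str:
    """Format a fraction as a percentage with one decimal place."""
    return f"{100 * x:.1f}%"


primary = ReviewCounts(collected=6_684, correct=32, incorrect=65)
mentions = ReviewCounts(collected=8_506, correct=174, incorrect=239)
combined = ReviewCounts(
    collected=primary.collected + mentions.collected,
    correct=primary.correct + mentions.correct,
    incorrect=primary.incorrect + mentions.incorrect,
)

for name, c in [("primary", primary), ("mentions", mentions), ("combined", combined)]:
    print(f"{name:9s} flagged={c.flagged:3d} precision={pct(c.precision)} "
          f"false positives={pct(1 - c.precision)} flagging rate={pct(c.flagging_rate)}")

# Ratio of flagging rates, i.e. the "approximately 3x" figure in section 4.3.
rate_ratio = mentions.flagging_rate / primary.flagging_rate
print(f"mentions are flagged at {rate_ratio:.1f}x the rate of primary content")
```

Run as-is, the sketch reproduces the 33.0%, 42.1%, and 40.4% confirmation rates and a flagging-rate ratio of about 3.3. Because only flagged posts were reviewed, precision can be computed from these counts but recall cannot.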
--- a/OPERATIONS.md
+++ b/OPERATIONS.md
@@ -0,0 +1,543 @@
+# Bluesky Collector - Operations Guide
+
+## Quick Reference
+
+### Current Status (March 30, 2026)
+- **Collector:** ❌ STOPPED (data collection complete)
+- **Scheduler:** ❌ STOPPED (no further automated runs)
+- **Web Interface:** ✅ RUNNING (http://localhost:5001)
+- **Database:** ✅ RUNNING (PostgreSQL on port 5433)
+
+---
+
+## Starting and Stopping Services
+
+### View Current Service Status
+```bash
+cd /Users/pieter/Nextcloud-Hetzner/PXS\ Cloud/Projects/26004\ HEIO\ 2/04\ Applications/bluesky-collector
+docker compose ps
+```
+
+### Start All Services
+```bash
+docker compose up -d
+```
+
+This starts:
+- `db` - PostgreSQL database (port 5433)
+- `web` - Web interface (port 5001)
+- `collector` - Data collection service
+- `scheduler` - Automated collection scheduler (runs every 4 hours)
+
+### Stop Collection Only (Keep Web Interface)
+```bash
+docker compose stop scheduler collector
+```
+
+This configuration allows browsing collected data without gathering new content.
+
+### Start Collection Services
+```bash
+docker compose start scheduler collector
+```
+
+### Stop All Services
+```bash
+docker compose down
+```
+
+**Warning:** This will stop the web interface and database. Data is preserved in Docker volumes.
+
+### Stop and Remove Everything (Including Data)
+```bash
+docker compose down -v
+```
+
+**⚠️ DANGER:** This deletes all collected data permanently!
+
+---
+
+## Service Details
+
+### Database (PostgreSQL)
+- **Image:** `postgres:16-alpine`
+- **Port:** 5433 (external) → 5432 (internal)
+- **Data Volume:** `pgdata`
+- **Access:**
+  ```bash
+  docker compose exec db psql -U bluesky -d bluesky
+  ```
+
+### Web Interface
+- **URL:** http://localhost:5001
+- **Port:** 5001
+- **Stack:** Flask + Gunicorn
+- **Pages:**
+  - `/` - Dashboard with collection stats
+  - `/accounts` - Account toxicity summary
+  - `/statuses` - Posts and replies browser
+  - `/mentions` - Mentions browser
+  - `/analysis` - Toxicity analysis overview
+  - `/analysis/flagged` - Flagged content with human review
+  - `/export` - Data export options
+
+### Collector Service
+- **Schedule:** Every 4 hours (00:00, 04:00, 08:00, 12:00, 16:00, 20:00)
+- **Function:** Collects new posts and mentions from Bluesky API
+- **Logs:**
+  ```bash
+  docker compose logs -f collector
+  ```
+
+### Scheduler Service
+- **Image:** `mcuadros/ofelia`
+- **Function:** Triggers collector and analyzer jobs on schedule
+- **Jobs:**
+  - `collect` - Runs at 0 minutes past every 4th hour
+  - `analyze` - Runs at 30 minutes past every 4th hour
+- **Logs:**
+  ```bash
+  docker compose logs -f scheduler
+  ```
+
+---
+
+## Manual Operations
+
+### Run Manual Collection
+```bash
+docker compose exec collector python -m src
+```
+
+Collects posts and mentions immediately (outside of schedule).
+
+### Run Manual Analysis
+```bash
+docker compose exec collector python -m src.analyzer
+```
+
+Analyzes all unscored posts/mentions using OpenAI API.
+
+**Cost Warning:** Analysis incurs OpenAI API costs. Check batch size settings.
+
+### Analyze Specific Batch Size
+```bash
+docker compose exec collector python -m src.analyzer --batch-size 50 --limit 100
+```
+
+Options:
+- `--batch-size N` - Number of posts per API call (default: 10)
+- `--limit N` - Maximum posts to analyze (default: 0 = unlimited)
+- `--concurrency N` - Parallel API requests (default: 3)
+
+### View Recent Logs
+```bash
+# All services
+docker compose logs --tail 100
+
+# Specific service
+docker compose logs --tail 50 collector
+docker compose logs --tail 50 web
+
+# Follow logs in real-time
+docker compose logs -f collector
+```
+
+---
+
+## Database Operations
+
+### Access Database Shell
+```bash
+docker compose exec db psql -U bluesky -d bluesky
+```
+
+### Common Queries
+
+#### Check Collection Status
+```sql
+SELECT
+    started_at::date as date,
+    COUNT(*) as runs,
+    SUM(posts_collected) as total_posts,
+    SUM(mentions_collected) as total_mentions,
+    SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) as successful
+FROM collection_runs
+WHERE started_at >= '2026-01-01'
+GROUP BY started_at::date
+ORDER BY date DESC;
+```
+
+#### Count Flagged Content
+```sql
+-- Posts/Replies
+SELECT COUNT(*) FROM toxicity_scores WHERE overall >= 0.5;
+
+-- Mentions (unique posts)
+SELECT COUNT(DISTINCT m.post_uri)
+FROM mention_toxicity_scores mts
+JOIN mentions m ON m.id = mts.mention_id
+WHERE mts.overall >= 0.5;
+```
+
+#### Human Review Progress
+```sql
+SELECT
+    CASE
+        WHEN review_status IS NULL THEN 'Unreviewed'
+        ELSE review_status
+    END as status,
+    COUNT(*) as count
+FROM toxicity_scores
+WHERE overall >= 0.5
+GROUP BY review_status;
+```
+
+### Backup Database
+```bash
+docker compose exec -T db pg_dump -U bluesky bluesky > backup_$(date +%Y%m%d).sql
+```
+
+### Restore Database
+```bash
+cat backup_20260330.sql | docker compose exec -T db psql -U bluesky -d bluesky
+```
+
+---
+
+## Rebuilding Services
+
+### Rebuild After Code Changes
+```bash
+# Rebuild specific service
+docker compose build web
+docker compose build collector
+
+# Rebuild and restart
+docker compose up -d --build web
+
+# Rebuild everything
+docker compose build
+docker compose up -d
+```
+
+### Apply Database Migrations
+```bash
+# View available migrations
+ls scripts/*.sql
+
+# Apply specific migration
+docker compose exec -T db psql -U bluesky -d bluesky < scripts/04-human-review.sql
+```
+
+---
+
+## Configuration
+
+### Environment Variables (.env file)
+```bash
+# Database
+POSTGRES_USER=bluesky
+POSTGRES_PASSWORD=changeme
+POSTGRES_PORT=5433
+
+# Web Interface
+WEB_PORT=5001
+
+# Bluesky API (for authenticated search)
+BSKY_HANDLE=your-handle.bsky.social
+BSKY_APP_PASSWORD=your-app-password
+
+# OpenAI API (for toxicity analysis)
+OPENAI_API_KEY=sk-...
+
+# Analysis Settings
+ANALYZER_MODEL=gpt-4.1-nano
+ANALYZER_CONCURRENCY=3
+ANALYZER_BATCH_SIZE=10
+ANALYZER_LIMIT=0
+
+# Collection Settings
+MAX_PAGES_PER_ACCOUNT=50
+MENTION_LOOKBACK_HOURS=12
+LOG_LEVEL=INFO
+```
+
+### Tracked Accounts (config/accounts.yml)
+```yaml
+accounts:
+  - handle: example.bsky.social  # Account to monitor
+  - handle: another.bsky.social
+```
+
+Add or remove accounts, then restart collector:
+```bash
+docker compose restart collector
+```
+
+---
+
+## Troubleshooting
+
+### Web Interface Not Loading
+```bash
+# Check if web service is running
+docker compose ps web
+
+# Check web logs for errors
+docker compose logs --tail 50 web
+
+# Restart web service
+docker compose restart web
+```
+
+### Collector Not Running
+```bash
+# Check scheduler is running
+docker compose ps scheduler
+
+# Check collector status
+docker compose ps collector
+
+# Start scheduler if stopped
+docker compose start scheduler
+
+# Check scheduler logs
+docker compose logs scheduler
+```
+
+### Database Connection Issues
+```bash
+# Check database health
+docker compose ps db
+
+# Restart database
+docker compose restart db
+
+# Check database logs
+docker compose logs db
+```
+
+### Out of Disk Space
+```bash
+# Check Docker disk usage
+docker system df
+
+# Remove unused images/containers
+docker system prune
+
+# Check database size
+docker compose exec db psql -U bluesky -d bluesky -c "SELECT pg_size_pretty(pg_database_size('bluesky'));"
+```
+
+### Analysis Failing (OpenAI API)
+```bash
+# Check API key is set (without printing the secret)
+docker compose exec collector sh -c 'test -n "$OPENAI_API_KEY" && echo set || echo missing'
+
+# Test API connectivity (the client reads OPENAI_API_KEY from the container environment)
+docker compose exec collector python -c "from openai import OpenAI; OpenAI().models.list()"
+
+# Check rate limits in logs
+docker compose logs collector | grep -i "rate limit"
+```
+
+---
+
+## Performance Tuning
+
+### Increase Collection Speed
+Edit `docker-compose.yml`:
+```yaml
+environment:
+  MAX_PAGES_PER_ACCOUNT: 100  # Increase from 50
+  MENTION_LOOKBACK_HOURS: 24  # Increase lookback
+```
+
+### Increase Analysis Speed
+```yaml
+environment:
+  ANALYZER_CONCURRENCY: 5     # More parallel requests
+  ANALYZER_BATCH_SIZE: 20     # Bigger batches
+```
+
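+**Cost Warning:** Higher concurrency and larger batches spend the OpenAI API budget faster and hit rate limits sooner; total cost is driven mainly by the volume of text analyzed.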
+
+### Change Collection Schedule
+Edit `docker-compose.yml` under collector labels:
+```yaml
+labels:
+  ofelia.job-exec.collect.schedule: "0 0 */2 * * *"  # Every 2 hours
+  ofelia.job-exec.analyze.schedule: "0 30 */2 * * *" # 30 min after collection
+```
+
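+Labels are read when a container is created, so recreate the collector and then restart the scheduler:
+```bash
+docker compose up -d collector && docker compose restart scheduler
+```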
+
+---
+
+## Data Export
+
+### Export to CSV via Web Interface
+1. Navigate to http://localhost:5001/export
+2. Select date range and filters
+3. Click "Export to CSV"
+
+### Export via Command Line
+
+#### All Posts
+```bash
+docker compose exec -T db psql -U bluesky -d bluesky -c "COPY (
+  SELECT p.uri, p.author_did, a.handle, p.text, p.created_at, p.post_type,
+         ts.overall, ts.toxic, ts.hate_speech, ts.threat
+  FROM posts p
+  LEFT JOIN accounts a ON a.did = p.author_did
+  LEFT JOIN toxicity_scores ts ON ts.uri = p.uri
+  WHERE p.created_at >= '2026-01-01'
+) TO STDOUT CSV HEADER" > posts_export.csv
+```
+
+#### Flagged Content with Reviews
+```bash
+docker compose exec -T db psql -U bluesky -d bluesky -c "COPY (
+  SELECT p.uri, a.handle, p.text, p.created_at,
+         ts.overall, ts.human_reviewed, ts.review_status, ts.reviewed_at
+  FROM toxicity_scores ts
+  JOIN posts p ON p.uri = ts.uri
+  LEFT JOIN accounts a ON a.did = p.author_did
+  WHERE ts.overall >= 0.5 AND p.created_at >= '2026-01-01'
+  ORDER BY ts.overall DESC
+) TO STDOUT CSV HEADER" > flagged_export.csv
+```
+
+---
+
+## Restarting Data Collection (If Needed)
+
+### Resume Collection After Pause
+1. Start services:
+   ```bash
+   docker compose start scheduler collector
+   ```
+
+2. Verify collection runs:
+   ```bash
+   docker compose logs -f collector
+   ```
+
+3. Check database for new entries:
+   ```bash
+   docker compose exec db psql -U bluesky -d bluesky -c "
+     SELECT MAX(created_at) FROM posts;
+     SELECT COUNT(*) FROM collection_runs WHERE started_at > NOW() - INTERVAL '1 day';
+   "
+   ```
+
+### Start Fresh Collection (Keep Database)
+1. Stop services:
+   ```bash
+   docker compose down
+   ```
+
+2. Start only database and web:
+   ```bash
+   docker compose up -d db web
+   ```
+
+3. Truncate collection tracking (optional):
+   ```bash
+   docker compose exec db psql -U bluesky -d bluesky -c "TRUNCATE collection_runs;"
+   ```
+
+4. Start collector:
+   ```bash
+   docker compose up -d scheduler collector
+   ```
+
+### Complete Reset (Delete All Data)
+```bash
+# Stop everything
+docker compose down
+
+# Remove data volume
+docker volume rm bluesky-collector_pgdata
+
+# Restart from scratch
+docker compose up -d
+```
+
+**⚠️ WARNING:** This deletes all collected posts, mentions, and analysis results permanently!
+
+---
+
+## Monitoring
+
+### Collection Health Check
+```bash
+# Last 5 collection runs
+docker compose exec db psql -U bluesky -d bluesky -c "
+  SELECT started_at, finished_at, status, posts_collected, mentions_collected, errors
+  FROM collection_runs
+  ORDER BY started_at DESC
+  LIMIT 5;
+"
+```
+
+### Analysis Progress
+```bash
+# Count scored vs unscored
+docker compose exec db psql -U bluesky -d bluesky -c "
+  SELECT
+    (SELECT COUNT(*) FROM posts) as total_posts,
+    (SELECT COUNT(*) FROM toxicity_scores) as scored_posts,
+    (SELECT COUNT(*) FROM mentions) as total_mentions,
+    (SELECT COUNT(*) FROM mention_toxicity_scores) as scored_mentions;
+"
+```
+
+### Disk Usage
+```bash
+# Database size
+docker compose exec db psql -U bluesky -d bluesky -c "
+  SELECT
+    pg_size_pretty(pg_database_size('bluesky')) as db_size,
+    pg_size_pretty(pg_total_relation_size('posts')) as posts_table,
+    pg_size_pretty(pg_total_relation_size('mentions')) as mentions_table;
+"
+```
+
+---
+
+## Security Notes
+
+1. **Never commit .env file** - Contains API keys and passwords
+2. **Change default passwords** - PostgreSQL default password is `changeme`
+3. **Firewall rules** - Ports 5001 (web) and 5433 (database) exposed to localhost only
+4. **API keys** - Bluesky and OpenAI credentials stored in environment variables
+5. **Data retention** - Contains personal data (Bluesky posts); handle per GDPR requirements
+
+---
+
+## Support
+
+### Documentation
+- Main findings: `FINDINGS.md`
+- This operations guide: `OPERATIONS.md`
+- Git repository: https://forgejo.postxsociety.cloud/pieter/bluesky-collector
+
+### Logs Location
+- Docker logs: `docker compose logs [service]`
+- Application logs: `./logs/` directory (if volume mounted)
+
+### Common Issues
+1. **Port conflicts:** Change `WEB_PORT` or `POSTGRES_PORT` in .env
+2. **Out of memory:** Reduce `ANALYZER_CONCURRENCY` or `ANALYZER_BATCH_SIZE`
+3. **API rate limits:** Reduce collection frequency or batch size
+4. **Disk full:** Run `docker system prune` and consider data export/cleanup
+
+---
+
+**Last Updated:** March 30, 2026
+**Project Status:** Data collection complete, web interface available for analysis
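
The backup step in OPERATIONS.md can also be scripted, for example from a cron job on the host. The following is a minimal Python sketch, assuming it runs from the project directory with Docker Compose available. `backup_filename` and `run_backup` are hypothetical helper names, not part of the repository. The command matches the `pg_dump` invocation used for backups in this commit and passes `-T` so that no pseudo-TTY is allocated.

```python
import subprocess
from datetime import date
from pathlib import Path
from typing import Optional

# pg_dump invocation from the guide; -T keeps the dump free of TTY line-ending changes.
PG_DUMP_CMD = ["docker", "compose", "exec", "-T", "db", "pg_dump", "-U", "bluesky", "bluesky"]


def backup_filename(day: date) -> str:
    """Return the dated file name used in the guide, e.g. backup_20260330.sql."""
    return f"backup_{day:%Y%m%d}.sql"


def run_backup(target_dir: Path, day: Optional[date] = None) -> Path:
    """Stream pg_dump output into a dated file and return its path.

    Raises subprocess.CalledProcessError if docker compose or pg_dump fails;
    a partially written file may be left behind in that case.
    """
    day = day or date.today()
    target = target_dir / backup_filename(day)
    with target.open("wb") as out:
        subprocess.run(PG_DUMP_CMD, stdout=out, check=True)
    return target
```

Calling `run_backup(Path("."))` writes the dump to the current directory. Passing an explicit `day` makes the file name deterministic.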
--- a/README.md
+++ b/README.md
@@ -279,3 +279,27 @@ The `pgdata` volume persists across container restarts. Back it up with standard
 ```bash
 docker compose exec -T db pg_dump -U bluesky bluesky > backup.sql
 ```
+
+## License
+
+MIT License
+
+Copyright (c) 2026 Post X Society
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.