# Mastodon Collector

Collects posts, replies, and mentions from a list of Mastodon accounts and stores them in PostgreSQL. Includes a web UI for account management and data browsing, plus JSON/CSV APIs for your analysis pipeline.

## Quick Start

```bash
# 1. Add accounts to monitor
echo "@user@mastodon.social" >> accounts.txt

# 2. Start everything
docker compose up -d

# 3. Open the dashboard
open http://localhost:8585
```

## Architecture

| Service       | Description                              | Port |
|---------------|------------------------------------------|------|
| **db**        | PostgreSQL 16                            | 5432 |
| **web**       | Flask dashboard (Gunicorn)               | 8585 |
| **collector** | Background service, polls every 4 hours  | —    |

## Adding Accounts

Two methods:

1. **Text file** — edit `accounts.txt`, one handle per line (`@user@instance.social`). Picked up on the next collection cycle.
2. **Web UI** — go to http://localhost:8585/accounts and use the form.

## Configuration

Edit `.env` to customize:

```
POSTGRES_PASSWORD=collector_secret    # Change for production
FLASK_SECRET_KEY=change-me-in-production
POLL_INTERVAL_SECONDS=14400           # Default: 4 hours (14400s)
```

## API Endpoints

For plugging into your analysis pipeline:

| Endpoint            | Description                     |
|---------------------|---------------------------------|
| `GET /api/stats`    | Overview stats (counts by type) |
| `GET /api/statuses` | Paginated statuses as JSON      |
| `GET /export`       | Download all statuses as CSV    |

### `/api/statuses` parameters

- `page` — page number (default: 1)
- `per_page` — results per page (default: 100, max: 500)
- `account_id` — filter by internal account ID
- `type` — filter by status type: `post`, `reply`, `mention`, `reblog`
- `since` — ISO datetime; only return statuses after this time

## Database Schema

Main tables:

- `monitored_accounts` — accounts being tracked
- `statuses` — collected posts with plain-text + HTML content
- `mentions` — who was @-mentioned in each status
- `media_attachments` — images/videos attached to statuses
- `tags` — hashtags used
- `collection_logs` — audit trail of each collection run

Each status stores `raw_json` with the full Mastodon API response for future analysis needs.

## Moving to a Server

```bash
# Copy the project
scp -r mastodon-collector/ user@server:~/

# On the server
cd mastodon-collector
# Edit .env with production secrets
docker compose up -d
```

## Stopping

```bash
docker compose down      # Stop services, keep data
docker compose down -v   # Stop services AND delete database
```
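## Example: Pulling Statuses into Python

As a starting point for an analysis pipeline, the paginated `/api/statuses` endpoint described above can be walked page by page with the standard library alone. This is a minimal sketch: the query parameters (`page`, `per_page`, `type`, `since`, `account_id`) come from the table above, but the assumption that each page decodes to a plain JSON list of status objects is mine; if your build wraps results in an envelope, unwrap it before iterating.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8585"  # web dashboard port from docker compose


def build_statuses_url(base, page=1, per_page=100, **filters):
    """Build a GET /api/statuses URL from the documented query
    parameters (type, since, account_id). Empty filters are dropped."""
    params = {"page": page, "per_page": per_page}
    params.update({k: v for k, v in filters.items() if v})
    return f"{base}/api/statuses?{urllib.parse.urlencode(params)}"


def fetch_all_statuses(base=BASE_URL, **filters):
    """Yield statuses one page at a time, stopping at the first empty page.

    Assumption: each page decodes to a JSON list of status objects.
    """
    page = 1
    while True:
        url = build_statuses_url(base, page=page, **filters)
        with urllib.request.urlopen(url) as resp:
            batch = json.load(resp)
        if not batch:
            break
        yield from batch
        page += 1
```

For example, `list(fetch_all_statuses(type="reply", since="2024-01-01T00:00:00"))` collects every reply stored since the start of 2024 in one list.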