mastodon-collector/README.md
Pieter 72dbf0d2b6 Initial commit: Mastodon collector application
Add Flask-based application for collecting and archiving Mastodon posts from configured accounts.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-09 08:05:54 +01:00

# Mastodon Collector
Collects posts, replies, and mentions from a list of Mastodon accounts and stores them in PostgreSQL. Includes a web UI for account management and data browsing, plus JSON/CSV APIs for your analysis pipeline.
## Quick Start
```bash
# 1. Add accounts to monitor
echo "@user@mastodon.social" >> accounts.txt
# 2. Start everything
docker compose up -d
# 3. Open the dashboard (`open` is macOS-only; on Linux use xdg-open or a browser)
open http://localhost:8585
```
## Architecture
| Service | Description | Port |
|---------------|------------------------------------------------|-------|
| **db** | PostgreSQL 16 | 5432 |
| **web** | Flask dashboard (Gunicorn) | 8585 |
| **collector** | Background service, polls every 4 hours | — |
## Adding Accounts
There are two ways to add accounts:
1. **Text file**: edit `accounts.txt`, one handle per line (`@user@instance.social`). Changes are picked up on the next collection cycle.
2. **Web UI**: go to http://localhost:8585/accounts and use the form.
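
To check `accounts.txt` before a collection cycle picks it up, a small validator can help. The sketch below is an illustration only: the regular expression and the `parse_accounts` helper are assumptions and are not taken from the collector's code, which may accept a different set of handles.

```python
import re

# Assumed pattern for "@user@instance.social"; the collector's own
# validation rules may be stricter or looser than this.
HANDLE_RE = re.compile(r"^@([A-Za-z0-9_]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})$")


def parse_accounts(text):
    """Return (username, instance) pairs for each handle in text."""
    accounts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = HANDLE_RE.match(line)
        if match is None:
            raise ValueError(f"not a valid handle: {line!r}")
        accounts.append((match.group(1), match.group(2)))
    return accounts


sample = "@user@mastodon.social\n\n@other@instance.social\n"
print(parse_accounts(sample))
```

Running it against the real file is a matter of passing `open("accounts.txt").read()` to `parse_accounts`.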
## Configuration
Edit `.env` to customize:
```
POSTGRES_PASSWORD=collector_secret # Change for production
FLASK_SECRET_KEY=change-me-in-production
POLL_INTERVAL_SECONDS=14400 # Default: 4 hours (14400s)
```
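
As an illustration of how a service might consume `POLL_INTERVAL_SECONDS`, here is a minimal sketch. The `poll_interval_seconds` function is hypothetical and the collector's actual parsing is not shown in this README; only the variable name and the 14400-second default come from the settings above.

```python
import os

DEFAULT_POLL_INTERVAL = 14400  # 4 hours, the default listed above


def poll_interval_seconds(env=os.environ):
    """Read POLL_INTERVAL_SECONDS, falling back to the 4-hour default."""
    raw = env.get("POLL_INTERVAL_SECONDS", "").strip()
    if not raw:
        return DEFAULT_POLL_INTERVAL
    value = int(raw)  # raises ValueError on non-numeric input
    if value <= 0:
        raise ValueError("POLL_INTERVAL_SECONDS must be positive")
    return value


print(poll_interval_seconds({"POLL_INTERVAL_SECONDS": "3600"}))
print(poll_interval_seconds({}))
```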
## API Endpoints
For plugging into your analysis pipeline:
| Endpoint | Description |
|-----------------------|--------------------------------------|
| `GET /api/stats` | Overview stats (counts by type) |
| `GET /api/statuses` | Paginated statuses as JSON |
| `GET /export` | Download all statuses as CSV |
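
A pipeline could load the CSV export along these lines. This is a sketch under assumptions: it supposes the export is UTF-8 with a header row, and it makes no claim about which columns the file contains. `parse_export` and `download_export` are illustrative names, not part of the application.

```python
import csv
import io
import urllib.request

BASE_URL = "http://localhost:8585"  # dashboard address from Quick Start


def parse_export(csv_text):
    """Turn CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def download_export(base_url=BASE_URL):
    """Fetch /export and parse it; assumes the response is UTF-8 CSV."""
    with urllib.request.urlopen(f"{base_url}/export") as response:
        return parse_export(response.read().decode("utf-8"))


# With the stack running:
#   rows = download_export()
#   print(len(rows), "statuses exported")
```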
### `/api/statuses` parameters
- `page` — page number (default: 1)
- `per_page` — results per page (default: 100, max: 500)
- `account_id` — filter by internal account ID
- `type` — filter by status type: `post`, `reply`, `mention`, `reblog`
- `since` — ISO datetime, only return statuses after this time
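
The parameters above can be combined into a request URL as sketched below. The helper names (`statuses_url`, `fetch_statuses`) are hypothetical, and the client-side checks simply mirror the limits listed above. The shape of the JSON response is not documented in this README, so the sketch returns it unparsed beyond `json.load`.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

BASE_URL = "http://localhost:8585"  # dashboard address from Quick Start
STATUS_TYPES = {"post", "reply", "mention", "reblog"}


def statuses_url(page=1, per_page=100, account_id=None,
                 status_type=None, since=None, base_url=BASE_URL):
    """Build a /api/statuses URL from the documented parameters."""
    if not 1 <= per_page <= 500:
        raise ValueError("per_page must be between 1 and 500")
    params = {"page": page, "per_page": per_page}
    if account_id is not None:
        params["account_id"] = account_id
    if status_type is not None:
        if status_type not in STATUS_TYPES:
            raise ValueError(f"unknown status type: {status_type!r}")
        params["type"] = status_type
    if since is not None:
        params["since"] = since.isoformat()  # ISO datetime, URL-encoded below
    return f"{base_url}/api/statuses?{urllib.parse.urlencode(params)}"


def fetch_statuses(**filters):
    """Fetch one page and return the decoded JSON as-is."""
    with urllib.request.urlopen(statuses_url(**filters)) as response:
        return json.load(response)


print(statuses_url(page=2, status_type="reply",
                   since=datetime(2026, 2, 1, tzinfo=timezone.utc)))
```

To walk through all results, call `fetch_statuses(page=n)` with increasing `n`; when to stop depends on the response format, which you would need to check against the running service.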
## Database Schema
Main tables:
- `monitored_accounts` — accounts being tracked
- `statuses` — collected posts with plain text + HTML content
- `mentions` — who was @-mentioned in each status
- `media_attachments` — images/videos attached to statuses
- `tags` — hashtags used
- `collection_logs` — audit trail of each collection run

Each status also stores the full Mastodon API response in `raw_json` for future analysis needs.
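
As an example of what `raw_json` makes possible, the sketch below pulls a few fields out of one stored response. It assumes `raw_json` holds an unmodified Mastodon `Status` entity as a JSON string; `tags`, `mentions`, and `media_attachments` are field names from the Mastodon API, while `summarize_raw_status` and the sample payload are made up for illustration.

```python
import json


def summarize_raw_status(raw_json):
    """Extract hashtags, mentioned accounts, and media types from a status."""
    status = json.loads(raw_json)
    return {
        "id": status["id"],
        "tags": [tag["name"] for tag in status.get("tags", [])],
        "mentions": [m["acct"] for m in status.get("mentions", [])],
        "media_types": [a["type"] for a in status.get("media_attachments", [])],
    }


# Trimmed, made-up payload; real responses carry many more fields.
sample = json.dumps({
    "id": "1",
    "tags": [{"name": "python", "url": "https://example.social/tags/python"}],
    "mentions": [{"id": "2", "username": "user", "acct": "user@example.social",
                  "url": "https://example.social/@user"}],
    "media_attachments": [],
})
print(summarize_raw_status(sample))
```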
## Moving to a Server
```bash
# Copy the project
scp -r mastodon-collector/ user@server:~/
# On the server
cd mastodon-collector
# Edit .env with production secrets
docker compose up -d
```
## Stopping
```bash
docker compose down # Stop services, keep data
docker compose down -v  # Stop services AND delete the database volume (all collected data)
```