# Mastodon Collector

Collects posts, replies, and mentions from a list of Mastodon accounts and stores them in PostgreSQL. Includes a web UI for account management and data browsing, plus JSON/CSV APIs for your analysis pipeline.
|
## Quick Start

```bash
# 1. Add accounts to monitor
echo "@user@mastodon.social" >> accounts.txt

# 2. Start everything
docker compose up -d

# 3. Open the dashboard (`open` is macOS-only; on Linux use `xdg-open`)
open http://localhost:8585
```
|
## Architecture

| Service       | Description                                                          | Port |
|---------------|----------------------------------------------------------------------|------|
| **db**        | PostgreSQL 16                                                        | 5432 |
| **web**       | Flask dashboard (Gunicorn)                                           | 8585 |
| **collector** | Background service; polls every 4 hours by default (`POLL_INTERVAL_SECONDS`) | —    |
|
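Because `docker compose up -d` returns as soon as the containers start, the dashboard may not answer right away. A small polling loop can wait for the **web** service before you open it. This is a sketch: the `wait_for_web` name, the 60-second default timeout, and using `/api/stats` as a readiness probe are assumptions, not part of the project.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll the web service until it answers or a timeout expires.
# Using /api/stats as the readiness probe is an assumption.
wait_for_web() {
  local url="${1:-http://localhost:8585/api/stats}"
  local timeout="${2:-60}" waited=0
  # -f: treat HTTP errors as failures, -s: silent, -o /dev/null: discard the body
  until curl -fs -o /dev/null "$url"; do
    if (( waited >= timeout )); then
      echo "web service not ready after ${timeout}s: $url" >&2
      return 1
    fi
    sleep 2
    waited=$(( waited + 2 ))
  done
  echo "ready: $url"
}
```

For example: `docker compose up -d && wait_for_web && open http://localhost:8585`.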
|
## Adding Accounts

Two methods:

1. **Text file** — edit `accounts.txt`, one handle per line (`@user@instance.social`). Changes are picked up on the next collection cycle.
2. **Web UI** — go to http://localhost:8585/accounts and use the form.
|
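When handles are added to `accounts.txt` by script rather than by hand, a small guard keeps the file clean. The sketch below is a hypothetical helper, not part of the project: it assumes handles take the `@user@instance` form shown above, rejects anything else, and skips handles that are already listed. The pattern is deliberately simple and only accepts letters, digits, and underscores in the user part.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append a handle to accounts.txt (or another file),
# skipping malformed handles and duplicates.
add_account() {
  local handle="$1" file="${2:-accounts.txt}"
  # Accept only the @user@instance form used in accounts.txt
  if [[ ! "$handle" =~ ^@[A-Za-z0-9_]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ ]]; then
    echo "invalid handle: $handle" >&2
    return 1
  fi
  touch "$file"
  # -q quiet, -x whole-line match, -F fixed string (no regex)
  if grep -qxF -- "$handle" "$file"; then
    echo "already listed: $handle" >&2
    return 0
  fi
  # If the file does not end in a newline, add one so handles stay on separate lines
  if [[ -s "$file" && -n "$(tail -c1 "$file")" ]]; then
    echo >> "$file"
  fi
  printf '%s\n' "$handle" >> "$file"
}
```

For example, `add_account @user@mastodon.social` appends to `./accounts.txt`; pass a second argument to target another file.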
|
## Configuration

Edit `.env` to customize:

```
POSTGRES_PASSWORD=collector_secret # Change for production
FLASK_SECRET_KEY=change-me-in-production
POLL_INTERVAL_SECONDS=14400 # Default: 4 hours (14400s)
```
|
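For production, the two secrets can be replaced with random values instead of the defaults above. This sketch hex-encodes random bytes read with `od` from `/dev/urandom`; `gen_secret` and `set_env_var` are hypothetical helpers. If the **db** service uses the official `postgres` image (an assumption), `POSTGRES_PASSWORD` is only applied when the database is first initialized, so set it before the first `docker compose up`.

```bash
#!/usr/bin/env bash
# Sketch: write random secrets into .env. Both helpers are hypothetical.

# 32 random bytes, hex-encoded (64 characters); -v stops od from collapsing repeated lines
gen_secret() {
  od -An -v -tx1 -N32 /dev/urandom | tr -d ' \n'
}

# Set KEY=VALUE in an env file (default .env), rewriting it through a temp file
set_env_var() {
  local key="$1" value="$2" file="${3:-.env}" tmp
  touch "$file"
  tmp="$(mktemp)"
  # Replace an existing KEY= line, or append KEY=VALUE if the key is absent
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}
```

For example, `set_env_var POSTGRES_PASSWORD "$(gen_secret)"` and `set_env_var FLASK_SECRET_KEY "$(gen_secret)"` update `.env` in place, and `set_env_var POLL_INTERVAL_SECONDS $(( 6 * 60 * 60 ))` would switch to a 6-hour cycle (21600 s). A replaced line loses any trailing comment.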
|
## API Endpoints

For plugging into your analysis pipeline:

| Endpoint            | Description                     |
|---------------------|---------------------------------|
| `GET /api/stats`    | Overview stats (counts by type) |
| `GET /api/statuses` | Paginated statuses as JSON      |
| `GET /export`       | Download all statuses as CSV    |
|
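From a shell, `curl` is enough to pull data from these endpoints. A minimal sketch, assuming the **web** service listens on the default port 8585; `BASE_URL`, the function names, and the default output file name are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: thin curl wrappers around the collector's HTTP endpoints.
# BASE_URL and the function names are illustrative assumptions.
BASE_URL="${BASE_URL:-http://localhost:8585}"

# -f: fail on HTTP errors, -sS: quiet but still report errors
get_stats()    { curl -fsS "$BASE_URL/api/stats"; }
get_statuses() { curl -fsS "$BASE_URL/api/statuses"; }

# Save the full CSV export (the default file name is arbitrary)
export_csv()   { curl -fsS -o "${1:-statuses.csv}" "$BASE_URL/export"; }
```

`get_stats` prints the overview JSON to stdout; `export_csv statuses.csv` saves the CSV export to a file.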
|
### `/api/statuses` parameters

- `page` — page number (default: 1)
- `per_page` — results per page (default: 100, max: 500)
- `account_id` — filter by internal account ID
- `type` — filter by status type: `post`, `reply`, `mention`, `reblog`
- `since` — ISO 8601 datetime; return only statuses after this time
|
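The parameters combine as an ordinary query string. With `curl`, `-G` plus `--data-urlencode` builds that string and URL-encodes each value, which matters for `since` timestamps that carry a `+` offset. The `query_statuses` helper and the example values are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: query /api/statuses with any combination of the documented parameters.
BASE_URL="${BASE_URL:-http://localhost:8585}"

# Usage: query_statuses key=value [key=value ...]
query_statuses() {
  local args=() pair
  for pair in "$@"; do
    # curl URL-encodes the part after "="; -G appends the pairs as a query string
    args+=(--data-urlencode "$pair")
  done
  curl -fsS -G "${args[@]}" "$BASE_URL/api/statuses"
}
```

For example, `query_statuses type=reply per_page=500 page=2 since=2024-05-01T00:00:00 > replies.json` requests the second page of replies after 1 May 2024 (how a `since` value without a UTC offset is interpreted depends on the server).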
|
## Database Schema

Main tables:

- `monitored_accounts` — accounts being tracked
- `statuses` — collected statuses, with both plain-text and HTML content
- `mentions` — who was @-mentioned in each status
- `media_attachments` — images/videos attached to statuses
- `tags` — hashtags used in statuses
- `collection_logs` — audit trail of each collection run

Each status also stores the full Mastodon API response in `raw_json`, so data not broken out into columns stays available for future analysis.
|
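Because `raw_json` keeps the whole API response, fields without a dedicated column can still be queried. This sketch runs `psql` inside the **db** container; it assumes `raw_json` is a `json` or `jsonb` column and uses `DB_USER` and `DB_NAME` as placeholders for the credentials in your compose setup. `url` and `favourites_count` are standard attributes of a Mastodon status.

```bash
#!/usr/bin/env bash
# Sketch: run SQL against the collector database through the db service.
# DB_USER and DB_NAME are placeholders; take the real values from your compose setup.
DB_USER="${DB_USER:-postgres}"
DB_NAME="${DB_NAME:-postgres}"

# -T: no TTY, so this also works in scripts and pipes
run_sql() {
  docker compose exec -T db psql -U "$DB_USER" -d "$DB_NAME" -c "$1"
}

# Ten most-favourited statuses, read from the stored API response
# (assumes raw_json is json/jsonb; ->> extracts a field as text)
TOP_STATUSES_SQL="
SELECT raw_json->>'url'                     AS url,
       (raw_json->>'favourites_count')::int AS favourites
FROM statuses
ORDER BY favourites DESC NULLS LAST
LIMIT 10;"
```

Run it with `run_sql "$TOP_STATUSES_SQL"`.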
|
## Moving to a Server

```bash
# Copy the project
scp -r mastodon-collector/ user@server:~/

# On the server
cd mastodon-collector
# Edit .env with production secrets (POSTGRES_PASSWORD, FLASK_SECRET_KEY)
docker compose up -d
```
|
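`scp` copies the project files, but if the database lives in a Docker volume (as the `down -v` note under Stopping suggests), the collected data stays behind. One way to move it is a dump on the old machine and a restore on the new one. A sketch, with `DB_USER` and `DB_NAME` as placeholders for your compose credentials and `dump_db`/`restore_db` as hypothetical helper names:

```bash
#!/usr/bin/env bash
# Sketch: move collected data between hosts with pg_dump/psql inside the db container.
# DB_USER and DB_NAME are placeholders; use the values from your compose setup.
DB_USER="${DB_USER:-postgres}"
DB_NAME="${DB_NAME:-postgres}"

# Old machine: plain-SQL dump. --clean --if-exists emits DROP ... IF EXISTS
# statements, so the dump can be replayed over tables that already exist.
dump_db() {
  docker compose exec -T db \
    pg_dump -U "$DB_USER" -d "$DB_NAME" --clean --if-exists > "${1:-collector.sql}"
}

# New machine: replay the dump; -T lets the file be piped into the container
restore_db() {
  docker compose exec -T db psql -U "$DB_USER" -d "$DB_NAME" < "${1:-collector.sql}"
}
```

Copy `collector.sql` along with the project, start the stack on the server, and consider `docker compose stop collector` before `restore_db` so no collection run writes during the restore.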
|
## Stopping

```bash
docker compose down      # Stop services, keep data
docker compose down -v   # Stop services AND delete the database (removes volumes)
```