Initial commit: Bluesky collector with toxicity analysis

- Bluesky post collector with mention tracking
- PostgreSQL database for storage
- OpenAI-based toxicity analysis
- Web UI for viewing and analyzing posts
- Docker compose setup for deployment

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Pieter 2026-02-08 13:54:36 +01:00
commit b1fd78e0c1
46 changed files with 7324 additions and 0 deletions

7
.dockerignore Normal file

@ -0,0 +1,7 @@
__pycache__
*.pyc
.env
logs/
.git
.gitignore
README.md

27
.env.example Normal file

@ -0,0 +1,27 @@
# PostgreSQL
POSTGRES_USER=bluesky
POSTGRES_PASSWORD=changeme
POSTGRES_PORT=5433
# Collector settings
LOG_LEVEL=INFO
MAX_PAGES_PER_ACCOUNT=50
MENTION_LOOKBACK_HOURS=12
# Bluesky authentication (required for mention search)
# Create an app password at: Settings → App Passwords
BSKY_HANDLE=
BSKY_APP_PASSWORD=
# Toxicity Analyzer (OpenAI)
# Get a key at: https://platform.openai.com/api-keys
OPENAI_API_KEY=
ANALYZER_MODEL=gpt-4.1-nano
ANALYZER_CONCURRENCY=3
ANALYZER_BATCH_SIZE=10
# Web UI
WEB_PORT=5001
# Scheduling is controlled by ofelia cron in docker-compose.yml
# Default: every 4 hours ("0 0 */4 * * *")

38
.gitignore vendored Normal file

@ -0,0 +1,38 @@
# Environment variables and secrets
.env
.env.local
*.key
*.pem
# Python
__pycache__/
*.pyc
*.pyo
*.pyd
.Python
*.so
*.egg
*.egg-info/
dist/
build/
.pytest_cache/
.coverage
htmlcov/
# Logs and data
logs/
*.log
# OS files
.DS_Store
Thumbs.db
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# Docker volumes (if any)
postgres_data/

12
Dockerfile Normal file

@ -0,0 +1,12 @@
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY src/ ./src/
# Keep container alive for ofelia job-exec scheduling.
# To run manually: docker compose exec collector python -m src.collector
CMD ["tail", "-f", "/dev/null"]

281
README.md Normal file

@ -0,0 +1,281 @@
# Bluesky Account Monitor
Collects posts, replies, and mentions for a list of Bluesky accounts, runs AI-powered toxicity analysis across 12 categories, and presents results on a web dashboard. Everything runs in Docker.
## Architecture
```
                 ┌────────────────────────────────────────────┐
                 │               Docker Compose               │
                 │                                            │
accounts.yml ───▶│  collector ──▶ PostgreSQL ◀── web (Flask)  │
                 │      │              ▲                      │
                 │      ▼              │                      │
                 │  analyzer ──────────┘                      │
                 │      │                                     │
                 │      ▼                                     │
                 │  OpenAI API                                │
                 │                                            │
                 │  scheduler (Ofelia) ── cron triggers       │
                 └────────────────────────────────────────────┘
```
Four services:
- **db** — PostgreSQL 16 (Alpine), stores all data
- **collector** — Python async service that fetches posts and mentions from Bluesky
- **scheduler** — [Ofelia](https://github.com/mcuadros/ofelia) cron that triggers collection (every 4h) and analysis (every 4h + 30min offset)
- **web** — Flask + Gunicorn dashboard on port 5001
## Quick Start
```bash
# 1. Copy and edit your environment config
cp .env.example .env
# Fill in: BSKY_HANDLE, BSKY_APP_PASSWORD, OPENAI_API_KEY
# 2. Add your target accounts to config/accounts.yml
# 3. Start everything
docker compose up -d
# 4. Run the toxicity schema migration
docker compose exec -T db psql -U bluesky -d bluesky < scripts/02-toxicity.sql
# 5. Trigger an immediate first collection
docker compose exec collector python -m src
# 6. Run a test toxicity analysis (100 posts)
docker compose exec -e ANALYZER_LIMIT=100 collector python -m src.analyzer
# 7. Open the dashboard (`open` is macOS; use `xdg-open` on Linux)
open http://localhost:5001
```
## Collection
### What Gets Collected
| Source | API Endpoint | Stored In |
|--------|-------------|-----------|
| User's own posts & replies | `getAuthorFeed` (public) | `posts` table |
| Posts mentioning a user | `searchPosts` (requires auth) | `mentions` table |
All records include a `raw_json` JSONB column with the full API response for future-proof analysis.
### How It Works
- **Scheduled polling** via Ofelia — runs every 4 hours by default
- **Incremental collection** — only fetches posts newer than the last run
- **Rate limit aware** — reads API response headers and sleeps when approaching limits
- **Deduplication** — posts are upserted by URI; engagement counts are refreshed on re-encounters
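The deduplication step can be illustrated with a small, self-contained sketch. It is not the collector's actual query: an in-memory SQLite database stands in for PostgreSQL, the table is reduced to three columns, and `upsert_post` is a hypothetical helper. Both databases accept the `INSERT ... ON CONFLICT ... DO UPDATE` form used here (SQLite uses `?` placeholders where asyncpg would use `$1`).

```python
import sqlite3

# In-memory SQLite as a stand-in for PostgreSQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (uri TEXT PRIMARY KEY, text TEXT, like_count INTEGER DEFAULT 0)"
)

# Insert a new post, or refresh the engagement count if the URI already exists.
UPSERT = """
INSERT INTO posts (uri, text, like_count)
VALUES (?, ?, ?)
ON CONFLICT (uri) DO UPDATE SET like_count = excluded.like_count
"""


def upsert_post(uri: str, text: str, like_count: int) -> None:
    """Hypothetical helper: store a post once, keyed by its URI."""
    conn.execute(UPSERT, (uri, text, like_count))


# The same post seen on two collection runs, with more likes the second time.
upsert_post("at://did:plc:example/app.bsky.feed.post/1", "hello", 2)
upsert_post("at://did:plc:example/app.bsky.feed.post/1", "hello", 7)

rows = conn.execute("SELECT uri, like_count FROM posts").fetchall()
```

After both calls the table still holds a single row for that URI, carrying the like count from the later encounter.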
## Toxicity Analysis
The analyzer classifies every post and mention using OpenAI's GPT-4.1-nano, scoring content on 12 categories from 0.0 (absent) to 1.0 (extreme):
| Category | What it detects |
|----------|----------------|
| `toxic` | Rude, disrespectful, or aggressive language |
| `threat` | Violence, harm, intimidation, calls to action |
| `hate_speech` | Targeting any protected characteristic |
| `racism` | Race/ethnicity-based hostility |
| `antisemitism` | Anti-Jewish hate, conspiracy theories, coded language |
| `islamophobia` | Anti-Muslim hate, "omvolking" narratives |
| `sexism` | Gender-based discrimination or harassment |
| `homophobia` | Anti-LGBTQ+ rhetoric |
| `insult` | Personal attacks, name-calling |
| `dehumanization` | Comparing people to animals, vermin, disease |
| `extremism` | Far-right/left rhetoric, Nazi glorification, Great Replacement |
| `ableism` | Disability-targeting language, mental health slurs |
The prompt is tuned for Dutch political discourse, recognizing coded terms like "gelukszoekers", "kutmarokkanen", "landverrader", "linkse ratten", etc. Political disagreement and criticism are not scored as toxic — only genuine hostility, hate, and threats.
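As a rough illustration of how the per-category scores relate to flagging, here is a minimal sketch. It assumes the overall score is the highest category score and that a post is flagged when that value is strictly above the threshold; both rules, and the helper names `overall_score` and `is_flagged`, are assumptions for this example and may differ from the actual classifier.

```python
# The 12 categories from the table above.
CATEGORIES = (
    "toxic", "threat", "hate_speech", "racism", "antisemitism", "islamophobia",
    "sexism", "homophobia", "insult", "dehumanization", "extremism", "ableism",
)


def overall_score(scores: dict[str, float]) -> float:
    """Hypothetical aggregate: the highest score across all categories."""
    return max(scores.get(category, 0.0) for category in CATEGORIES)


def is_flagged(scores: dict[str, float], threshold: float = 0.5) -> bool:
    """Flag a post when its aggregate score is strictly above the threshold."""
    return overall_score(scores) > threshold


mild = {"insult": 0.2, "toxic": 0.3}
hostile = {"insult": 0.7, "dehumanization": 0.6}
flags = (is_flagged(mild), is_flagged(hostile))
```

Under these assumptions `mild` stays unflagged and `hostile` is flagged at the default threshold of 0.5.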
### Batch Processing
Posts are sent to the API in batches (default 10 per call) to minimize cost and API calls. The ~500-token system prompt is sent once per batch instead of once per post, cutting input token cost by ~60%.
| | 1 post/call | 10 posts/call (default) |
|---|---|---|
| API calls for 60K posts | 60,000 | 6,000 |
| Estimated cost | ~$5.10 | ~$2.40 |
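The input-token saving from batching can be worked through with a few lines of arithmetic. This is a sketch under stated assumptions: the ~500-token system prompt comes from the text above, while `tokens_per_post=250` is an illustrative value, not a measured one, and `input_tokens` is a hypothetical helper.

```python
def input_tokens(
    n_posts: int,
    batch_size: int,
    prompt_tokens: int = 500,    # system prompt size quoted above
    tokens_per_post: int = 250,  # illustrative assumption, not measured
) -> int:
    """Total input tokens when the system prompt is sent once per API call."""
    n_calls = -(-n_posts // batch_size)  # ceiling division
    return n_calls * prompt_tokens + n_posts * tokens_per_post


single = input_tokens(60_000, batch_size=1)    # 60,000 calls
batched = input_tokens(60_000, batch_size=10)  # 6,000 calls
saving = 1 - batched / single
```

With these numbers the batched run needs 18 million input tokens instead of 45 million, a 60% reduction. Shorter posts push the saving higher, because the system prompt then makes up a larger share of each unbatched call.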
### Running the Analyzer
```bash
# Test run (100 posts)
docker compose exec -e ANALYZER_LIMIT=100 collector python -m src.analyzer
# Full run (all unscored posts)
docker compose exec collector python -m src.analyzer
# Check logs
docker compose logs collector | grep analyzer
cat logs/analyzer.log
```
The scheduled cron runs the analyzer automatically every 4 hours (30 minutes after each collection), so new posts are scored without manual intervention.
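The two Ofelia schedules in `docker-compose.yml` are `0 0 */4 * * *` for collection and `0 30 */4 * * *` for analysis; the leading field is seconds. The sketch below only enumerates the resulting clock times for one day. `daily_run_times` is a hypothetical helper, and the times are in whatever time zone the scheduler container uses.

```python
def daily_run_times(minute: int, every_hours: int = 4) -> list[str]:
    """Clock times at which an 'every N hours, at this minute' job fires in one day."""
    return [f"{hour:02d}:{minute:02d}" for hour in range(0, 24, every_hours)]


collect_times = daily_run_times(minute=0)   # schedule "0 0 */4 * * *"
analyze_times = daily_run_times(minute=30)  # schedule "0 30 */4 * * *"
```

Each analysis run therefore starts 30 minutes after the collection run of the same hour, six times a day.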
## Web Dashboard
Access at `http://localhost:5001` (or your configured `WEB_PORT`).
### Pages
- **Dashboard** — Overview of collection runs, account count, post/mention totals
- **Accounts** — List of tracked accounts with post counts and last activity
- **Statuses** — Browse all collected posts with filters and search
- **Mentions** — Browse mentions of tracked accounts
- **Analysis** — Toxicity overview: trend charts, category breakdown, recent analysis runs
- **Flagged Content** — Posts scoring above the flag threshold (default 0.5), filterable by category and type
- **Account Toxicity** — Per-account toxicity breakdown with comparative charts
- **Export** — Download data as CSV
## Configuration
### accounts.yml
```yaml
accounts:
- handle: alice.bsky.social
- handle: bob.bsky.social
- handle: some-org.bsky.social
```
### Environment Variables
#### Collection
| Variable | Default | Description |
|----------|---------|-------------|
| `POSTGRES_USER` | `bluesky` | Database user |
| `POSTGRES_PASSWORD` | `changeme` | Database password |
| `POSTGRES_PORT` | `5432` | Exposed PostgreSQL port |
| `LOG_LEVEL` | `INFO` | Python log level |
| `MAX_PAGES_PER_ACCOUNT` | `50` | Max API pages per account per run (50 pages = 5000 posts) |
| `MENTION_LOOKBACK_HOURS` | `12` | How far back to search mentions on first run |
| `BSKY_HANDLE` | — | Your Bluesky handle (required for mention search) |
| `BSKY_APP_PASSWORD` | — | App password from Settings > App Passwords |
#### Toxicity Analysis
| Variable | Default | Description |
|----------|---------|-------------|
| `OPENAI_API_KEY` | — | OpenAI API key (required) |
| `ANALYZER_MODEL` | `gpt-4.1-nano` | OpenAI model for classification |
| `ANALYZER_CONCURRENCY` | `3` | Max concurrent API calls (batches in flight) |
| `ANALYZER_BATCH_SIZE` | `10` | Posts per API call |
| `ANALYZER_LIMIT` | `0` | Max posts to process per run (0 = all) |
| `ANALYZER_FLAG_THRESHOLD` | `0.5` | Score above which a post is flagged |
#### Web UI
| Variable | Default | Description |
|----------|---------|-------------|
| `WEB_PORT` | `5001` | Exposed web dashboard port |
## Database Schema
### Key Tables
- **`accounts`** — Tracked accounts (DID, handle, collection timestamps)
- **`posts`** — Posts from tracked accounts (text, timestamps, engagement counts, post type, raw JSON)
- **`mentions`** — Posts from anyone that mention a tracked account
- **`collection_runs`** — Audit trail of each collection run (timing, counts, errors)
- **`collection_state`** — Per-account bookmarks for incremental collection
- **`toxicity_scores`** — Per-post scores across all 12 categories + overall + flagged
- **`mention_toxicity_scores`** — Same structure for mentions
- **`analysis_runs`** — Audit trail of analyzer runs (timing, counts, cost, errors)
### Useful Queries
```sql
-- Recent posts by a specific account
SELECT created_at, post_type, text, like_count
FROM posts
WHERE author_did = (SELECT did FROM accounts WHERE handle = 'alice.bsky.social')
ORDER BY created_at DESC LIMIT 20;
-- All mentions of a tracked account
SELECT m.post_text, m.post_created_at, m.mentioning_did
FROM mentions m
JOIN accounts a ON a.did = m.mentioned_did
WHERE a.handle = 'alice.bsky.social'
ORDER BY m.post_created_at DESC;
-- Most toxic posts (overall score)
SELECT p.text, t.overall, t.toxic, t.threat, t.hate_speech, t.racism
FROM toxicity_scores t
JOIN posts p ON p.uri = t.uri
WHERE t.flagged = true
ORDER BY t.overall DESC LIMIT 20;
-- Toxicity by account
SELECT a.handle, avg(t.overall) AS avg_toxicity, count(*) AS scored_posts
FROM toxicity_scores t
JOIN posts p ON p.uri = t.uri
JOIN accounts a ON a.did = p.author_did
GROUP BY a.handle
ORDER BY avg_toxicity DESC;
-- Analysis run history
SELECT id, started_at, status, posts_scored, mentions_scored, cost_usd
FROM analysis_runs ORDER BY started_at DESC LIMIT 10;
-- Collection run history
SELECT id, started_at, status, posts_collected, mentions_collected, duration_secs
FROM collection_runs ORDER BY started_at DESC LIMIT 10;
```
## Operations
### Manual Runs
```bash
# Collect posts
docker compose exec collector python -m src
# Run toxicity analysis
docker compose exec collector python -m src.analyzer
```
### Monitoring
```bash
# Follow logs
docker compose logs -f collector
# Quick data counts
docker compose exec -T db psql -U bluesky -d bluesky -c \
"SELECT (SELECT count(*) FROM posts) AS posts, (SELECT count(*) FROM mentions) AS mentions, (SELECT count(*) FROM toxicity_scores) AS scored;"
# Last analysis run
docker compose exec -T db psql -U bluesky -d bluesky -c \
"SELECT id, started_at, status, posts_scored, mentions_scored, cost_usd FROM analysis_runs ORDER BY started_at DESC LIMIT 5;"
```
### Rebuilding After Code Changes
```bash
docker compose build collector web
docker compose up -d
```
### Add/Remove Accounts
Edit `config/accounts.yml` — changes take effect on the next scheduled or manual run. Removed accounts are marked inactive but their data is preserved.
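Since the `accounts` table has a unique index on `handle`, it can be worth checking the list for duplicates after editing it. A minimal sketch, assuming the handles have already been loaded into a plain list (for example with PyYAML); `duplicate_handles` is a hypothetical helper, not part of the collector.

```python
from collections import Counter


def duplicate_handles(handles: list[str]) -> list[str]:
    """Return handles that appear more than once, compared case-insensitively."""
    counts = Counter(handle.strip().lower() for handle in handles)
    return sorted(handle for handle, n in counts.items() if n > 1)


handles = ["alice.bsky.social", "bob.bsky.social", "Alice.bsky.social"]
dupes = duplicate_handles(handles)
```

An empty result means every handle is listed exactly once.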
### First Run / Backfill
The first run pages back up to `MAX_PAGES_PER_ACCOUNT` pages per account (default 50 pages, about 5,000 posts). For a deeper backfill, temporarily increase this value for a single run:
```bash
docker compose exec -e MAX_PAGES_PER_ACCOUNT=200 collector python -m src
```
### Backup
The `pgdata` volume persists across container restarts. Back it up with standard PostgreSQL tools:
```bash
docker compose exec -T db pg_dump -U bluesky bluesky > backup.sql
```

171
config/accounts.yml Normal file

@ -0,0 +1,171 @@
# Bluesky accounts to track
# Source: Bluesky.ods (Dutch politicians, parties, parodies)
# Total accounts: see below
accounts:
# ── Politicians ────────────────────────────────────────
- handle: franstimmermans.groenlinkspvda.nl # Frans Timmermans (GroenLinks-PvdA)
- handle: lisawesterveld.bsky.social # Lisa Westerveld (GroenLinks-PvdA)
- handle: laurensdassen.voltnederland.org # Laurens Dassen (Volt)
- handle: femkehalsema.bsky.social # Femke Halsema (GroenLinks)
- handle: estherouwehand.bsky.social # Esther Ouwehand (Partij voor de Dieren)
- handle: laurabromet.bsky.social # Laura Bromet (GroenLinks-PvdA)
- handle: henri.cda.nl # Henri Bontenbal (CDA)
- handle: sylvanasimons.bsky.social # Sylvana Simons (BIJ1)
- handle: jesseklaver.groenlinkspvda.nl # Jesse Klaver (GroenLinks-PvdA)
- handle: robjetten.bsky.social # Rob Jetten (D66)
- handle: derk.cda.nl # Derk Boswijk (CDA)
- handle: marjoleinmoorman.bsky.social # Marjolein Moorman (GroenLinks-PvdA)
- handle: mariekekoekkoek.voltnederland.org # Marieke Koekkoek (Volt)
- handle: esmahlahlah.bsky.social # Esmah Lahlah (GroenLinks-PvdA)
- handle: habtamudehoop.bsky.social # Habtamu de Hoop (GroenLinks-PvdA)
- handle: jimmydijk.sp.nl # Jimmy Dijk (SP)
- handle: ineskostic.bsky.social # Ines Kostic (Partij voor de Dieren)
- handle: suzannekroger.bsky.social # Suzanne Kroger (GroenLinks-PvdA)
- handle: tomvanderlee.bsky.social # Tom van der Lee (GroenLinks-PvdA)
- handle: barbarakathmann.bsky.social # Barbara Kathmann (GroenLinks-PvdA)
- handle: fweisglas.bsky.social # Frans Weisglas (VVD)
- handle: rubenbrekelmans.bsky.social # Ruben Brekelmans (VVD)
- handle: katipiri.bsky.social # Kati Piri (GroenLinks-PvdA)
- handle: jpaternotte.bsky.social # Jan Paternotte (D66)
- handle: omtzigt.bsky.social # Pieter Omtzigt (NSC)
- handle: erikpjverweij.bsky.social # Erik Verweij (VVD)
- handle: kimvsparrentak.bsky.social # Kim van Sparrentak (GroenLinks-PvdA)
- handle: mirjambikker.bsky.social # Mirjam Bikker (ChristenUnie)
- handle: daniellehirsch.bsky.social # Danielle Hirsch (GroenLinks-PvdA)
- handle: lucstultiens.bsky.social # Luc Stultiens (GroenLinks-PvdA)
- handle: annakrijger.bsky.social # Anna Krijger (Partij voor de Dieren)
- handle: christinepvdd.bsky.social # Christine Teunissen (Partij voor de Dieren)
- handle: annemarijke.bsky.social # Anne-Marijke Podt (D66)
- handle: ticiaverveer.bsky.social # Ticia Verveer (VVD)
- handle: jelgerg.bsky.social # Jelger Groeneveld (D66)
- handle: annejessicalise.bsky.social # An-Jes Oudshoorn (D66)
- handle: bartgroothuis.bsky.social # Bart Groothuis (VVD)
- handle: sneller.bsky.social # Joost Sneller (D66)
- handle: pietergrinwis.bsky.social # Pieter Grinwis (ChristenUnie)
- handle: julianbushoff.bsky.social # Julian Bushoff (GroenLinks-PvdA)
- handle: daanroovers.bsky.social # Daan Roovers (GroenLinks-PvdA)
- handle: dijkhoff.bsky.social # Klaas Dijkhoff (VVD)
- handle: patijn.bsky.social # Mariette Patijn (GroenLinks-PvdA)
- handle: mikaltseggai.bsky.social # Mikal Tseggai (GroenLinks-PvdA)
- handle: momohandis.bsky.social # Mo Mohandis (GroenLinks-PvdA)
- handle: javijlbrief.bsky.social # Hans Vijlbrief (D66)
- handle: maritmaij.bsky.social # Marit Maij (GroenLinks-PvdA)
- handle: robertdenhaag.bsky.social # Robert van Asten (D66)
- handle: ilanarooderkerk.bsky.social # Ilana Rooderkerk (D66)
- handle: casparvandenberg.bsky.social # Caspar van den Berg (Independent)
- handle: metmarleen.bsky.social # Marleen Haage (GroenLinks-PvdA)
- handle: gerbrandy.bsky.social # Gerben-Jan Gerbrandy (D66)
- handle: nvanvroonhoven.bsky.social # Nicolien van Vroonhoven (NSC)
- handle: janschoonis.bsky.social # Jan Schoonis (D66)
- handle: lisaginneken.bsky.social # Lisa van Ginneken (D66)
- handle: leoniegerritsen.bsky.social # Leonie Gerritsen (Partij voor de Dieren)
- handle: maikel.lukkezen.name # Maikel Lukkezen (D66)
- handle: paulblom.nl # Paul Blom (Partij voor de Dieren)
- handle: gertjansegers.bsky.social # Gert-Jan Segers (ChristenUnie)
- handle: fatihyaabdi.bsky.social # Fatihya Abdi (GroenLinks-PvdA)
- handle: sandrabeckerman.bsky.social # Sandra Beckerman (SP)
- handle: johannesprakken.bsky.social # Johannes Prakken (D66)
- handle: raquelgarciaher.bsky.social # Raquel Garcia Hermida-vdWalle (D66)
- handle: ericholterhues.bsky.social # Eric Holterhues (ChristenUnie)
- handle: arendkisteman.bsky.social # Arend Kisteman (VVD)
- handle: jessesixdijkstra.bsky.social # Jesse Six Dijkstra (NSC)
- handle: mariannethieme.bsky.social # Marianne Thieme (Partij voor de Dieren)
- handle: andrepoortman.bsky.social # Andre Poortman (CDA)
- handle: wiekepaulusma.bsky.social # Wieke Paulusma (D66)
- handle: svanoosterhout.bsky.social # Sjoukje van Oosterhout (GroenLinks-PvdA)
- handle: danielle-jansen.bsky.social # Danielle Jansen (NSC)
- handle: hinddekker.bsky.social # Hind Dekker-Abdulaziz (D66)
- handle: dogukanergin.bsky.social # Dogukan Ergin (DENK)
- handle: gabyperingopie.voltnederland.org # Gaby Perin-Gopie (Volt)
- handle: lisavliegenthart.bsky.social # Lisa Vliegenthart (GroenLinks-PvdA)
- handle: tiemenjan.bsky.social # Tiemen Jan van Dijk (VVD)
- handle: jerkesetz.bsky.social # Jerke Setz (ChristenUnie)
- handle: songulmutluer.bsky.social # Songul Mutluer (GroenLinks-PvdA)
- handle: faridazarkan.bsky.social # Farid Azarkan (DENK)
- handle: dirkgotink.bsky.social # Dirk Gotink (NSC)
- handle: liesvanaelst.bsky.social # Lies van Aelst (SP)
- handle: martijnbuijsse.bsky.social # Martijn Buijsse (VVD)
- handle: sandra-alberts.bsky.social # Sandra Alberts (VVD)
- handle: rikvanwijk.bsky.social # Rik van Wijk (D66)
- handle: evertbob.bsky.social # Evert Bobeldijk (D66)
- handle: frankwiertz.bsky.social # Frank Wiertz (D66)
- handle: andrewvanesch.bsky.social # Andrew van Esch (D66)
- handle: jantinezwinkels.bsky.social # Jantine Zwinkels (CDA)
- handle: kuneburgers.bsky.social # Kune Burgers (VVD)
- handle: pepijnpi.bsky.social # Pepijn Pi Van de Venne (D66)
- handle: chris10govaert.bsky.social # Christine Govaert (BBB)
- handle: marietbosman62.bsky.social # Mariet Bosman (BBB)
- handle: natalienauta.bsky.social # Natalie Nauta (BBB)
- handle: nielsoosterom.bsky.social # Niels Oosterom (BBB)
- handle: stephanvanbaarle.bsky.social # Stephan van Baarle (DENK)
- handle: ananninga.bsky.social # Annabel Nanninga (JA21)
- handle: djhvandijk.bsky.social # Diederik van Dijk (SGP)
- handle: carladikfaber.bsky.social # Carla Dik-Faber (ChristenUnie)
- handle: willemrutjens.bsky.social # Willem Rutjens (JA21)
- handle: petraverdonk.bsky.social # Petra Verdonk (GroenLinks-PvdA)
- handle: rnbarker.bsky.social # Robert Barker (Partij voor de Dieren)
- handle: bertnederveen.bsky.social # Bert Nederveen (ChristenUnie)
- handle: kirstenalblas.bsky.social # Kirsten Alblas (ChristenUnie)
- handle: benbloem.bsky.social # Ben Bloem (ChristenUnie)
# ── Party & Organization Accounts ─────────────────────
- handle: partijvoordedieren.nl # Partij voor de Dieren (Partij voor de Dieren)
- handle: d66.nl # D66 (D66)
- handle: vvdonline.bsky.social # VVD (VVD)
- handle: christenunie.bsky.social # ChristenUnie (ChristenUnie)
- handle: spnederland.bsky.social # SP (SP)
- handle: cda.nl # CDA (CDA)
- handle: pvdddenhaag.bsky.social # PvdD Den Haag (Partij voor de Dieren)
- handle: partijnsc.bsky.social # NSC (NSC)
- handle: pvdd-eerstekamer.bsky.social # PvdD Eerste Kamer (Partij voor de Dieren)
- handle: pvddeindhoven.bsky.social # PvdD Eindhoven (Partij voor de Dieren)
- handle: vvdeuropa.bsky.social # VVD Europa (VVD)
- handle: nieuwevvders.bsky.social # Nieuwe VVD'ers (VVD)
- handle: pvddfryslan.bsky.social # PvdD Fryslan (Partij voor de Dieren)
- handle: perspectief.bsky.social # PerspectieF (CU youth wing) (ChristenUnie)
- handle: pvddamsterdam.bsky.social # PvdD Amsterdam (Partij voor de Dieren)
- handle: ngpfoundation.bsky.social # NGP Foundation (PvdD) (Partij voor de Dieren)
- handle: vvd-overijssel.bsky.social # VVD Overijssel (VVD)
- handle: almeresp.bsky.social # SP Almere (SP)
- handle: d66vught.bsky.social # D66 Vught (D66)
- handle: boerburgerbeweging.bsky.social # BoerBurgerBeweging (BBB)
- handle: d66voorschoten.bsky.social # D66 Voorschoten (D66)
- handle: sp033.bsky.social # SP Amersfoort (SP)
- handle: d66onderwijs.bsky.social # D66 Onderwijs & Wetenschap (D66)
- handle: sp-eerstekamer.bsky.social # SP Eerste Kamerfractie (SP)
- handle: nsc-limburg.bsky.social # NSC Limburg (NSC)
- handle: cda-rotterdam.bsky.social # CDA Rotterdam (CDA)
- handle: vvdzaltbommel.bsky.social # VVD Zaltbommel (VVD)
- handle: forumvdemocratie.bsky.social # FvD (unofficial) (FvD)
- handle: spamsterdam.bsky.social # SP Amsterdam (SP)
- handle: d66leudal.bsky.social # D66 Leudal (D66)
- handle: bbbzeeland.bsky.social # BBB Zeeland (BBB)
- handle: d66brabant.bsky.social # D66 Brabant (D66)
- handle: cda-sd.bsky.social # CDA Schouwen-Duiveland (CDA)
- handle: spflevoland.bsky.social # SP Flevoland (SP)
- handle: d66hoogeveen.bsky.social # D66 Hoogeveen (D66)
- handle: vvdcastricum.bsky.social # VVD Castricum (VVD)
- handle: groenlinks-pvda.bsky.social # GroenLinks-PvdA (GroenLinks-PvdA)
- handle: voltnederland.org # Volt Nederland (Volt)
- handle: social.bij1.org # BIJ1 (BIJ1)
- handle: juisteantwoord.bsky.social # JA21 (JA21)
- handle: sgpnieuws.bsky.social # SGP (SGP)
- handle: charge-volt.bsky.social # Charge (Volt wetensch.) (Volt)
- handle: spgelderland.bsky.social # SP Gelderland (SP)
- handle: spnoordholland.bsky.social # SP Noord-Holland (SP)
- handle: ja21overijssel.bsky.social # JA21 Overijssel (JA21)
- handle: groenlinkspvda072.bsky.social # GL-PvdA Alkmaar (GroenLinks-PvdA)
- handle: groenlinksassen.nl # GL Assen (GroenLinks-PvdA)
- handle: denhaagbij1.bsky.social # Den Haag BIJ1 (BIJ1)
- handle: volthouten.bsky.social # Volt Houten (Volt)
# ── Parodies & Impersonations ─────────────────────────
- handle: pvvfaber.bsky.social # Marjolein Faber (parody)
- handle: geertwiiders.bsky.social # Geert Wiiders (parody)
- handle: partijvddieren.bsky.social # PvdD (impersonation)
- handle: forumvdemocratie.bsky.social # FvD (unofficial)
- handle: nwsoccontract.bsky.social # NSC (name squatter)
- handle: pieter-omtzigt.bsky.social # Pieter Omtzigt (2nd)
- handle: dilans-geweten.bsky.social # Stelt Dilan landsbelang al boven partij?

72
docker-compose.yml Normal file

@ -0,0 +1,72 @@
services:
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: bluesky
      POSTGRES_USER: ${POSTGRES_USER:-bluesky}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-changeme}
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./scripts/init.sql:/docker-entrypoint-initdb.d/01-init.sql
    ports:
      - "${POSTGRES_PORT:-5432}:5432"
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${POSTGRES_USER:-bluesky}"]
      interval: 5s
      retries: 5
    restart: unless-stopped

  collector:
    build: .
    depends_on:
      db:
        condition: service_healthy
    environment:
      DATABASE_URL: postgresql://${POSTGRES_USER:-bluesky}:${POSTGRES_PASSWORD:-changeme}@db:5432/bluesky
      BSKY_PUBLIC_API: https://public.api.bsky.app
      LOG_LEVEL: ${LOG_LEVEL:-INFO}
      MAX_PAGES_PER_ACCOUNT: ${MAX_PAGES_PER_ACCOUNT:-50}
      MENTION_LOOKBACK_HOURS: ${MENTION_LOOKBACK_HOURS:-12}
      BSKY_HANDLE: ${BSKY_HANDLE:-}
      BSKY_APP_PASSWORD: ${BSKY_APP_PASSWORD:-}
      OPENAI_API_KEY: ${OPENAI_API_KEY:-}
      ANALYZER_MODEL: ${ANALYZER_MODEL:-gpt-4.1-nano}
      ANALYZER_CONCURRENCY: ${ANALYZER_CONCURRENCY:-3}
      ANALYZER_BATCH_SIZE: ${ANALYZER_BATCH_SIZE:-10}
      ANALYZER_LIMIT: ${ANALYZER_LIMIT:-0}
    volumes:
      - ./config:/app/config:ro
      - ./logs:/app/logs
      - ./scripts:/app/scripts:ro
    labels:
      ofelia.enabled: "true"
      ofelia.job-exec.collect.schedule: "0 0 */4 * * *"
      ofelia.job-exec.collect.command: "python -m src"
      ofelia.job-exec.analyze.schedule: "0 30 */4 * * *"
      ofelia.job-exec.analyze.command: "python -m src.analyzer"
    restart: unless-stopped

  scheduler:
    image: mcuadros/ofelia:latest
    depends_on:
      - collector
    command: daemon --docker
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    restart: unless-stopped

  web:
    build:
      context: .
      dockerfile: web.Dockerfile
    depends_on:
      db:
        condition: service_healthy
    environment:
      DATABASE_URL: postgresql://${POSTGRES_USER:-bluesky}:${POSTGRES_PASSWORD:-changeme}@db:5432/bluesky
    ports:
      - "${WEB_PORT:-5001}:5001"
    restart: unless-stopped

volumes:
  pgdata:

3
requirements-web.txt Normal file

@ -0,0 +1,3 @@
flask>=3.0
gunicorn>=22.0
psycopg2-binary>=2.9

6
requirements.txt Normal file

@ -0,0 +1,6 @@
atproto>=0.0.55
asyncpg>=0.30.0
pyyaml>=6.0
tenacity>=8.2
httpx>=0.27.0
openai>=1.60.0

65
scripts/02-toxicity.sql Normal file

@ -0,0 +1,65 @@
-- Toxicity Analysis Schema
-- Stores per-post and per-mention toxicity scores from LLM classification.
-- Toxicity scores for posts (from tracked accounts' feeds)
CREATE TABLE IF NOT EXISTS toxicity_scores (
uri TEXT PRIMARY KEY REFERENCES posts(uri) ON DELETE CASCADE,
overall REAL NOT NULL,
toxic REAL NOT NULL DEFAULT 0,
threat REAL NOT NULL DEFAULT 0,
hate_speech REAL NOT NULL DEFAULT 0,
racism REAL NOT NULL DEFAULT 0,
antisemitism REAL NOT NULL DEFAULT 0,
islamophobia REAL NOT NULL DEFAULT 0,
sexism REAL NOT NULL DEFAULT 0,
homophobia REAL NOT NULL DEFAULT 0,
insult REAL NOT NULL DEFAULT 0,
dehumanization REAL NOT NULL DEFAULT 0,
extremism REAL NOT NULL DEFAULT 0,
ableism REAL NOT NULL DEFAULT 0,
flagged BOOLEAN NOT NULL DEFAULT false,
model TEXT NOT NULL DEFAULT 'gpt-4.1-nano',
scored_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_tox_flagged ON toxicity_scores (flagged) WHERE flagged = true;
CREATE INDEX IF NOT EXISTS idx_tox_overall ON toxicity_scores (overall DESC);
CREATE INDEX IF NOT EXISTS idx_tox_scored ON toxicity_scores (scored_at DESC);
-- Toxicity scores for mentions (posts about tracked accounts)
CREATE TABLE IF NOT EXISTS mention_toxicity_scores (
mention_id BIGINT PRIMARY KEY REFERENCES mentions(id) ON DELETE CASCADE,
overall REAL NOT NULL,
toxic REAL NOT NULL DEFAULT 0,
threat REAL NOT NULL DEFAULT 0,
hate_speech REAL NOT NULL DEFAULT 0,
racism REAL NOT NULL DEFAULT 0,
antisemitism REAL NOT NULL DEFAULT 0,
islamophobia REAL NOT NULL DEFAULT 0,
sexism REAL NOT NULL DEFAULT 0,
homophobia REAL NOT NULL DEFAULT 0,
insult REAL NOT NULL DEFAULT 0,
dehumanization REAL NOT NULL DEFAULT 0,
extremism REAL NOT NULL DEFAULT 0,
ableism REAL NOT NULL DEFAULT 0,
flagged BOOLEAN NOT NULL DEFAULT false,
model TEXT NOT NULL DEFAULT 'gpt-4.1-nano',
scored_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_mtox_flagged ON mention_toxicity_scores (flagged) WHERE flagged = true;
CREATE INDEX IF NOT EXISTS idx_mtox_overall ON mention_toxicity_scores (overall DESC);
-- Analysis run audit trail
CREATE TABLE IF NOT EXISTS analysis_runs (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
started_at TIMESTAMPTZ NOT NULL DEFAULT now(),
finished_at TIMESTAMPTZ,
status TEXT NOT NULL DEFAULT 'running', -- running | completed | failed | partial
posts_scored INTEGER NOT NULL DEFAULT 0,
mentions_scored INTEGER NOT NULL DEFAULT 0,
errors INTEGER NOT NULL DEFAULT 0,
model TEXT NOT NULL,
cost_usd NUMERIC(10,6) DEFAULT 0,
duration_secs NUMERIC
);

86
scripts/init.sql Normal file

@ -0,0 +1,86 @@
-- Bluesky Collector Schema
-- Tracks accounts, their posts/replies, and mentions from other users.
-- Tracked accounts
CREATE TABLE accounts (
did TEXT PRIMARY KEY,
handle TEXT NOT NULL,
display_name TEXT,
added_at TIMESTAMPTZ NOT NULL DEFAULT now(),
last_feed_collected TIMESTAMPTZ,
last_mention_collected TIMESTAMPTZ,
active BOOLEAN NOT NULL DEFAULT true
);
CREATE UNIQUE INDEX idx_accounts_handle ON accounts (handle);
-- Collected posts (from tracked accounts' feeds)
CREATE TABLE posts (
uri TEXT PRIMARY KEY,
cid TEXT NOT NULL,
author_did TEXT NOT NULL,
text TEXT,
created_at TIMESTAMPTZ,
indexed_at TIMESTAMPTZ,
collected_at TIMESTAMPTZ NOT NULL DEFAULT now(),
reply_parent TEXT,
reply_root TEXT,
post_type TEXT NOT NULL DEFAULT 'post', -- post | reply | repost
has_media BOOLEAN DEFAULT false,
has_embed BOOLEAN DEFAULT false,
like_count INTEGER DEFAULT 0,
reply_count INTEGER DEFAULT 0,
repost_count INTEGER DEFAULT 0,
quote_count INTEGER DEFAULT 0,
langs TEXT[],
raw_json JSONB NOT NULL
);
CREATE INDEX idx_posts_author ON posts (author_did);
CREATE INDEX idx_posts_created ON posts (created_at DESC);
CREATE INDEX idx_posts_type ON posts (post_type);
CREATE INDEX idx_posts_collected ON posts (collected_at DESC);
CREATE INDEX idx_posts_reply_root ON posts (reply_root) WHERE reply_root IS NOT NULL;
-- Mentions: posts from *anyone* that mention a tracked account
CREATE TABLE mentions (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
post_uri TEXT NOT NULL,
mentioned_did TEXT NOT NULL,
mentioning_did TEXT,
post_text TEXT,
post_created_at TIMESTAMPTZ,
collected_at TIMESTAMPTZ NOT NULL DEFAULT now(),
raw_json JSONB NOT NULL,
UNIQUE (post_uri, mentioned_did)
);
CREATE INDEX idx_mentions_mentioned ON mentions (mentioned_did);
CREATE INDEX idx_mentions_created ON mentions (post_created_at DESC);
-- Collection run audit trail
CREATE TABLE collection_runs (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
started_at TIMESTAMPTZ NOT NULL DEFAULT now(),
finished_at TIMESTAMPTZ,
status TEXT NOT NULL DEFAULT 'running', -- running | completed | failed | partial
accounts_total INTEGER NOT NULL DEFAULT 0,
accounts_done INTEGER NOT NULL DEFAULT 0,
posts_collected INTEGER NOT NULL DEFAULT 0,
mentions_collected INTEGER NOT NULL DEFAULT 0,
errors JSONB DEFAULT '[]'::jsonb,
duration_secs NUMERIC
);
-- Per-account collection bookmark (survives restarts)
CREATE TABLE collection_state (
account_did TEXT NOT NULL,
collection_type TEXT NOT NULL, -- feed | mentions
last_post_at TIMESTAMPTZ,
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
PRIMARY KEY (account_did, collection_type)
);

0
src/__init__.py Normal file

4
src/__main__.py Normal file

@ -0,0 +1,4 @@
"""Allow running as: python -m src"""
from src.collector import main
main()

0
src/analyzer/__init__.py Normal file

5
src/analyzer/__main__.py Normal file

@ -0,0 +1,5 @@
"""Allow running as: python -m src.analyzer"""
from src.analyzer.analyzer import main
main()

282
src/analyzer/analyzer.py Normal file

@ -0,0 +1,282 @@
"""Main toxicity analysis orchestrator.
Runs as a one-shot batch process: fetches unscored posts and mentions,
classifies them in batches with GPT-4.1-nano, and stores scores in PostgreSQL.
Usage:
python -m src.analyzer
ANALYZER_LIMIT=100 python -m src.analyzer # test on 100 posts
"""
from __future__ import annotations
import asyncio
import logging
import os
import sys
import time
from .classifier import ToxicityClassifier
from .config import AnalyzerConfig
from .db import AnalyzerDB
logger = logging.getLogger("analyzer")
def make_batches(items: list, batch_size: int) -> list[list]:
    """Split a flat list into sublists of at most batch_size."""
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]
async def classify_posts(
    classifier: ToxicityClassifier,
    db: AnalyzerDB,
    posts: list[dict],
    config: AnalyzerConfig,
) -> tuple[int, int, float]:
    """Classify posts in batches, with concurrency control.

    Returns (scored_count, error_count, cost_usd).
    """
    semaphore = asyncio.Semaphore(config.concurrency)
    scored = 0
    errors = 0
    total_input_tokens = 0
    total_output_tokens = 0

    batches = make_batches(posts, config.batch_size)
    logger.info(" Split %d posts into %d batches of ≤%d",
                len(posts), len(batches), config.batch_size)

    async def process_batch(batch: list[dict]) -> None:
        nonlocal scored, errors, total_input_tokens, total_output_tokens
        async with semaphore:
            texts = [p["text"] for p in batch]
            try:
                results = await classifier.classify_batch(texts)
                for post, scores in zip(batch, results):
                    try:
                        score_dict = scores.to_dict()
                        flagged = scores.is_flagged(config.flag_threshold)
                        await db.store_post_score(
                            uri=post["uri"],
                            scores=score_dict,
                            flagged=flagged,
                            model=config.model,
                        )
                        total_input_tokens += scores.input_tokens
                        total_output_tokens += scores.output_tokens
                        scored += 1
                    except Exception:
                        errors += 1
                        logger.exception(
                            "Failed to store score for post %s", post["uri"][:80]
                        )
                if scored % 100 < config.batch_size:
                    logger.info(" Posts scored: %d / %d", scored, len(posts))
            except Exception:
                # Whole batch failed (API error after retries); count all as errors
                errors += len(batch)
                logger.exception(
                    "Failed to classify batch of %d posts", len(batch)
                )

    tasks = [process_batch(b) for b in batches]
    await asyncio.gather(*tasks)

    cost = (
        total_input_tokens * config.input_cost_per_m / 1_000_000
        + total_output_tokens * config.output_cost_per_m / 1_000_000
    )
    return scored, errors, cost
async def classify_mentions(
classifier: ToxicityClassifier,
db: AnalyzerDB,
mentions: list[dict],
config: AnalyzerConfig,
) -> tuple[int, int, float]:
"""Classify mentions in batches, with concurrency control.
Returns (scored_count, error_count, cost_usd).
"""
semaphore = asyncio.Semaphore(config.concurrency)
scored = 0
errors = 0
total_input_tokens = 0
total_output_tokens = 0
batches = make_batches(mentions, config.batch_size)
logger.info(" Split %d mentions into %d batches of ≤%d",
len(mentions), len(batches), config.batch_size)
async def process_batch(batch: list[dict]) -> None:
nonlocal scored, errors, total_input_tokens, total_output_tokens
async with semaphore:
texts = [m["post_text"] for m in batch]
try:
results = await classifier.classify_batch(texts)
for mention, scores in zip(batch, results):
try:
score_dict = scores.to_dict()
flagged = scores.is_flagged(config.flag_threshold)
await db.store_mention_score(
mention_id=mention["id"],
scores=score_dict,
flagged=flagged,
model=config.model,
)
total_input_tokens += scores.input_tokens
total_output_tokens += scores.output_tokens
scored += 1
except Exception:
errors += 1
logger.exception(
"Failed to store score for mention %d", mention["id"]
)
if scored % 100 < config.batch_size:
logger.info(" Mentions scored: %d / %d", scored, len(mentions))
except Exception:
errors += len(batch)
logger.exception(
"Failed to classify batch of %d mentions", len(batch)
)
tasks = [process_batch(b) for b in batches]
await asyncio.gather(*tasks)
cost = (
total_input_tokens * config.input_cost_per_m / 1_000_000
+ total_output_tokens * config.output_cost_per_m / 1_000_000
)
return scored, errors, cost
async def run() -> None:
config = AnalyzerConfig.from_env()
logging.basicConfig(
level=getattr(logging, config.log_level.upper(), logging.INFO),
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
handlers=[logging.StreamHandler(sys.stdout)],
)
# Also log to file
log_dir = "/app/logs"
if os.path.isdir(log_dir):
fh = logging.FileHandler(os.path.join(log_dir, "analyzer.log"))
fh.setFormatter(
logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s")
)
logging.getLogger().addHandler(fh)
logger.info("=" * 60)
logger.info("Toxicity Analyzer starting (model: %s, concurrency: %d, batch_size: %d)",
config.model, config.concurrency, config.batch_size)
db = AnalyzerDB(config.database_url)
classifier = ToxicityClassifier(
api_key=config.openai_api_key,
model=config.model,
)
try:
await db.connect()
await db.apply_migration()
# Start analysis run
run_id = await db.start_analysis_run(model=config.model)
start_time = time.time()
# Fetch unscored items
limit = config.limit if config.limit > 0 else 0
posts = await db.get_unscored_posts(limit=limit)
mentions = await db.get_unscored_mentions(limit=limit)
logger.info("Found %d unscored posts, %d unscored mentions",
len(posts), len(mentions))
if not posts and not mentions:
logger.info("Nothing to score — exiting.")
await db.finish_analysis_run(
run_id, status="completed",
posts_scored=0, mentions_scored=0, errors=0, cost_usd=0.0,
)
return
total_cost = 0.0
total_errors = 0
# Phase 1: Classify posts
if posts:
logger.info("Phase 1: Classifying %d posts in batches of %d...",
len(posts), config.batch_size)
p_scored, p_errors, p_cost = await classify_posts(
classifier, db, posts, config,
)
logger.info(" Posts done: %d scored, %d errors, $%.4f",
p_scored, p_errors, p_cost)
total_cost += p_cost
total_errors += p_errors
else:
p_scored = 0
# Phase 2: Classify mentions
if mentions:
logger.info("Phase 2: Classifying %d mentions in batches of %d...",
len(mentions), config.batch_size)
m_scored, m_errors, m_cost = await classify_mentions(
classifier, db, mentions, config,
)
logger.info(" Mentions done: %d scored, %d errors, $%.4f",
m_scored, m_errors, m_cost)
total_cost += m_cost
total_errors += m_errors
else:
m_scored = 0
# Finalize run
duration = time.time() - start_time
status = "completed" if total_errors == 0 else "partial"
await db.finish_analysis_run(
run_id,
status=status,
posts_scored=p_scored,
mentions_scored=m_scored,
errors=total_errors,
cost_usd=total_cost,
)
logger.info("=" * 60)
logger.info("Analysis complete — status: %s", status)
logger.info(" Posts scored: %d, Mentions scored: %d, Errors: %d",
p_scored, m_scored, total_errors)
logger.info(" Estimated cost: $%.4f", total_cost)
logger.info(" Duration: %.1f seconds", duration)
except Exception:
logger.exception("Analyzer crashed")
raise
finally:
await classifier.close()
await db.close()
def main() -> None:
asyncio.run(run())
if __name__ == "__main__":
main()
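
The batching, concurrency limit, and cost arithmetic used by classify_posts and classify_mentions can be tried in isolation. The following is a minimal sketch, not the analyzer itself: `fake_classify` and `score_all` are hypothetical names, the classifier is replaced by a stub that makes no network call, and the per-post token counts are made-up numbers. The prices are the per-million-token defaults from AnalyzerConfig.

```python
import asyncio


def make_batches(items: list, batch_size: int) -> list[list]:
    """Split a flat list into sublists of at most batch_size."""
    return [items[i : i + batch_size] for i in range(0, len(items), batch_size)]


async def fake_classify(texts: list[str]) -> list[dict]:
    """Hypothetical stand-in for the real classifier (no API call)."""
    await asyncio.sleep(0)  # yield to the event loop, as a real request would
    # Assumed token usage per post, for illustration only.
    return [{"input_tokens": 100, "output_tokens": 20} for _ in texts]


async def score_all(texts: list[str], batch_size: int, concurrency: int) -> tuple[int, float]:
    semaphore = asyncio.Semaphore(concurrency)
    scored = 0
    input_tokens = 0
    output_tokens = 0

    async def process(batch: list[str]) -> None:
        nonlocal scored, input_tokens, output_tokens
        async with semaphore:  # at most `concurrency` batches in flight
            for result in await fake_classify(batch):
                input_tokens += result["input_tokens"]
                output_tokens += result["output_tokens"]
                scored += 1

    await asyncio.gather(*(process(b) for b in make_batches(texts, batch_size)))

    # USD per one million tokens (0.10 input, 0.40 output).
    cost = input_tokens * 0.10 / 1_000_000 + output_tokens * 0.40 / 1_000_000
    return scored, cost


scored, cost = asyncio.run(
    score_all([f"post {i}" for i in range(25)], batch_size=10, concurrency=3)
)
print(scored, f"${cost:.5f}")
```

With 25 posts and a batch size of 10 this produces three batches (10, 10, 5). Because all tasks are created up front and only gated by the semaphore, a very large backlog creates one task per batch; that is usually acceptable for a scheduled job but worth knowing.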

287
src/analyzer/classifier.py Normal file

@ -0,0 +1,287 @@
"""OpenAI-powered toxicity classifier.
Sends batches of posts to GPT-4.1-nano for multi-category toxicity scoring.
Returns one ToxicityScores object per post, with each category scored from 0.0 to 1.0.
"""
from __future__ import annotations
import asyncio
import json
import logging
import random
from dataclasses import dataclass
from openai import AsyncOpenAI, RateLimitError, APITimeoutError, APIConnectionError
logger = logging.getLogger("analyzer.classifier")
# The 12 toxicity categories we score
CATEGORIES = [
"toxic",
"threat",
"hate_speech",
"racism",
"antisemitism",
"islamophobia",
"sexism",
"homophobia",
"insult",
"dehumanization",
"extremism",
"ableism",
]
# System prompt for batch classification — multiple numbered posts per call
SYSTEM_PROMPT = """\
You are a toxicity classifier for Dutch and English social media posts about politics.
You will receive one or more numbered posts. Score EACH post on every category from 0.0 (none) to 1.0 (extreme).
Respond ONLY with a JSON object mapping post numbers (as strings) to their scores. No other text.
Categories:
- toxic: rude, disrespectful, or aggressive language
- threat: threats of violence, harm, intimidation, or calls to action against a person
- hate_speech: targeting people based on any protected characteristic (race, religion, gender, sexual orientation, disability, nationality)
- racism: specifically targeting race or ethnicity (e.g. anti-Black, anti-Asian, anti-Moroccan sentiment, "Zwarte Piet" debates when derogatory)
- antisemitism: targeting Jewish people, Holocaust denial or minimization, Jewish conspiracy theories, coded language like "globalists", "Rothschilds", triple parentheses
- islamophobia: anti-Muslim hate, mosque opposition framed as hate, "Islam is not a religion" rhetoric, "takeover/omvolking" narratives, halal/hijab targeting
- sexism: gender-based discrimination, harassment, misogyny, or misandry
- homophobia: targeting sexual orientation or gender identity, anti-LGBTQ+ rhetoric
- insult: personal attacks, name-calling, belittling
- dehumanization: comparing people to animals, vermin, disease, parasites, or other dehumanizing language
- extremism: far-right or far-left extremist rhetoric, Nazi symbolism or glorification, white supremacist language, Great Replacement theory ("omvolkingstheorie"), calls for political violence, fascist/authoritarian glorification
- ableism: targeting people with disabilities, using mental health conditions as insults (e.g. "gestoord", "autist" as slur, "mongool")
Important context:
- Many posts are in Dutch. Handle Dutch slang, insults, and coded political language.
- Dutch-specific coded terms: "gelukszoekers", "kutmarokkanen", "omvolking", "landverrader", "volksverrader", "linkse ratten", "wappie", "tokkie"; score these appropriately based on context.
- Political disagreement and criticism are NOT toxic; only score actual hostility, hate, or threats.
- Satire and parody accounts may use irony; consider context but still score the literal content.
- A score of 0.0 means the category is completely absent. A score of 1.0 means extreme/explicit.
- Most posts will score 0.0 on most categories. Only flag genuine toxicity.
Example for 2 posts:
{"1":{"toxic":0.0,"threat":0.0,"hate_speech":0.0,"racism":0.0,"antisemitism":0.0,"islamophobia":0.0,"sexism":0.0,"homophobia":0.0,"insult":0.0,"dehumanization":0.0,"extremism":0.0,"ableism":0.0},"2":{"toxic":0.3,"threat":0.0,"hate_speech":0.0,"racism":0.0,"antisemitism":0.0,"islamophobia":0.0,"sexism":0.0,"homophobia":0.0,"insult":0.2,"dehumanization":0.0,"extremism":0.0,"ableism":0.0}}"""
@dataclass
class ToxicityScores:
"""Classification result for a single post."""
toxic: float = 0.0
threat: float = 0.0
hate_speech: float = 0.0
racism: float = 0.0
antisemitism: float = 0.0
islamophobia: float = 0.0
sexism: float = 0.0
homophobia: float = 0.0
insult: float = 0.0
dehumanization: float = 0.0
extremism: float = 0.0
ableism: float = 0.0
@property
def overall(self) -> float:
"""Overall toxicity = max of all categories."""
return max(
self.toxic,
self.threat,
self.hate_speech,
self.racism,
self.antisemitism,
self.islamophobia,
self.sexism,
self.homophobia,
self.insult,
self.dehumanization,
self.extremism,
self.ableism,
)
def is_flagged(self, threshold: float = 0.5) -> bool:
return self.overall >= threshold
def to_dict(self) -> dict:
return {
"toxic": self.toxic,
"threat": self.threat,
"hate_speech": self.hate_speech,
"racism": self.racism,
"antisemitism": self.antisemitism,
"islamophobia": self.islamophobia,
"sexism": self.sexism,
"homophobia": self.homophobia,
"insult": self.insult,
"dehumanization": self.dehumanization,
"extremism": self.extremism,
"ableism": self.ableism,
"overall": self.overall,
}
# Approximate token counts for cost tracking
input_tokens: int = 0
output_tokens: int = 0
def parse_scores(raw: str | dict) -> ToxicityScores:
"""Parse the JSON scores for a single post into ToxicityScores."""
try:
data = json.loads(raw) if isinstance(raw, str) else raw
except json.JSONDecodeError:
logger.warning("Failed to parse JSON response: %s", str(raw)[:200])
return ToxicityScores()
def clamp(val) -> float:
try:
f = float(val)
return max(0.0, min(1.0, f))
except (TypeError, ValueError):
return 0.0
return ToxicityScores(
toxic=clamp(data.get("toxic")),
threat=clamp(data.get("threat")),
hate_speech=clamp(data.get("hate_speech")),
racism=clamp(data.get("racism")),
antisemitism=clamp(data.get("antisemitism")),
islamophobia=clamp(data.get("islamophobia")),
sexism=clamp(data.get("sexism")),
homophobia=clamp(data.get("homophobia")),
insult=clamp(data.get("insult")),
dehumanization=clamp(data.get("dehumanization")),
extremism=clamp(data.get("extremism")),
ableism=clamp(data.get("ableism")),
)
def parse_batch_response(raw: str, batch_size: int) -> list[ToxicityScores]:
"""Parse a batched JSON response into a list of ToxicityScores.
Expected format: {"1": {...scores...}, "2": {...scores...}, ...}
Returns a list of ToxicityScores in the same order as the input batch.
"""
try:
data = json.loads(raw)
except json.JSONDecodeError:
logger.warning("Failed to parse batch JSON: %s", raw[:300])
return [ToxicityScores() for _ in range(batch_size)]
results = []
for i in range(1, batch_size + 1):
key = str(i)
if key in data and isinstance(data[key], dict):
results.append(parse_scores(data[key]))
else:
logger.warning("Missing scores for post %d in batch response", i)
results.append(ToxicityScores())
return results
class ToxicityClassifier:
"""Async OpenAI-based toxicity classifier with batch support."""
def __init__(self, api_key: str, model: str = "gpt-4.1-nano"):
self.client = AsyncOpenAI(api_key=api_key)
self.model = model
async def classify_batch(
self, texts: list[str], max_retries: int = 5
) -> list[ToxicityScores]:
"""Classify multiple posts in a single API call.
Args:
texts: List of post texts to classify (1 to batch_size items).
max_retries: Number of retries on rate limit / transient errors.
Returns:
List of ToxicityScores, one per input text, in the same order.
"""
if not texts:
return []
# The batch size is simply the number of texts passed in (a single text is a batch of one)
batch_size = len(texts)
# Build the numbered user message
parts = []
for i, text in enumerate(texts, 1):
# Truncate very long posts
t = text.strip() if text else ""
if len(t) > 2000:
t = t[:2000]
if not t:
t = "(empty)"
parts.append(f"[{i}] {t}")
user_message = "\n\n".join(parts)
# Scale max_tokens by batch size.
# Each post's JSON scores ≈ 60 tokens compact, but the model often
# outputs formatted JSON (whitespace/newlines) which can double that.
# Use a generous budget to avoid truncation.
max_tokens = max(300, batch_size * 200)
last_err = None
for attempt in range(max_retries):
try:
response = await self.client.chat.completions.create(
model=self.model,
temperature=0,
max_tokens=max_tokens,
response_format={"type": "json_object"},
messages=[
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": user_message},
],
)
content = response.choices[0].message.content or "{}"
results = parse_batch_response(content, batch_size)
# Distribute token usage evenly for cost tracking (integer division, so the
# summed total can be slightly lower than the usage reported by the API)
if response.usage:
per_post_input = response.usage.prompt_tokens // batch_size
per_post_output = response.usage.completion_tokens // batch_size
for scores in results:
scores.input_tokens = per_post_input
scores.output_tokens = per_post_output
return results
except RateLimitError as e:
last_err = e
wait = min(2 ** attempt + random.uniform(0.5, 1.5), 30)
logger.debug(
"Rate limited (attempt %d/%d), waiting %.1fs",
attempt + 1, max_retries, wait,
)
await asyncio.sleep(wait)
except (APITimeoutError, APIConnectionError) as e:
last_err = e
wait = 2 ** attempt + random.uniform(0, 1)
logger.debug(
"Transient error (attempt %d/%d), retrying in %.1fs: %s",
attempt + 1, max_retries, wait, e,
)
await asyncio.sleep(wait)
except Exception:
logger.exception(
"Batch classification API call failed (%d posts)", batch_size
)
raise
# All retries exhausted
logger.error("Retries exhausted (rate limit or transient errors) for batch of %d posts", batch_size)
raise last_err
async def classify(self, text: str, max_retries: int = 5) -> ToxicityScores:
"""Classify a single post (convenience wrapper around classify_batch)."""
results = await self.classify_batch([text], max_retries=max_retries)
return results[0]
async def close(self):
await self.client.close()
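
The numbered-prompt round trip is the part of the classifier that is easiest to get subtly wrong, so here is a small standalone sketch of it. It mirrors the logic of classify_batch, parse_batch_response, and the clamp helper, but it is an illustration under assumptions: the category list is shortened to three entries, results are plain dicts rather than ToxicityScores, and `build_user_message` and `parse_batch` are hypothetical names that do not exist in the module.

```python
import json

# Shortened category list, for the example only.
CATEGORIES = ["toxic", "threat", "insult"]


def build_user_message(texts: list, max_chars: int = 2000) -> str:
    """Number each post as [1], [2], ... so the reply can be mapped back."""
    parts = []
    for i, text in enumerate(texts, 1):
        # Strip, truncate long posts, and replace blanks with a placeholder.
        t = (text or "").strip()[:max_chars] or "(empty)"
        parts.append(f"[{i}] {t}")
    return "\n\n".join(parts)


def clamp(val) -> float:
    """Coerce a model-supplied value into the range 0.0 to 1.0."""
    try:
        return max(0.0, min(1.0, float(val)))
    except (TypeError, ValueError):
        return 0.0


def parse_batch(raw: str, batch_size: int) -> list:
    """Return one score dict per input post, in input order."""
    zero = {c: 0.0 for c in CATEGORIES}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Unparseable or unexpected reply: every post gets all-zero scores.
        return [dict(zero) for _ in range(batch_size)]

    results = []
    for i in range(1, batch_size + 1):
        entry = data.get(str(i))
        if isinstance(entry, dict):
            results.append({c: clamp(entry.get(c)) for c in CATEGORIES})
        else:
            results.append(dict(zero))  # post missing from the reply
    return results


message = build_user_message(["Hallo wereld", "   ", None])
reply = '{"1": {"toxic": 0.2, "threat": "n/a", "insult": 1.7}, "3": {"toxic": 0.9}}'
scores = parse_batch(reply, batch_size=3)
print(message)
print(scores)
```

One consequence of this design is that a missing or malformed entry is indistinguishable from a genuinely clean post once it is stored, since both end up as all zeros. Whether that trade-off is acceptable depends on how the scores are used downstream.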

44
src/analyzer/config.py Normal file

@ -0,0 +1,44 @@
"""Analyzer configuration loaded from environment variables."""
from __future__ import annotations
import os
from dataclasses import dataclass
@dataclass
class AnalyzerConfig:
database_url: str
openai_api_key: str
model: str = "gpt-4.1-nano"
concurrency: int = 3 # concurrent API calls (batches in flight)
batch_size: int = 10 # posts per API call
limit: int = 0 # 0 = no limit (process all unscored)
flag_threshold: float = 0.5
log_level: str = "INFO"
# Cost tracking (per 1M tokens)
input_cost_per_m: float = 0.10
output_cost_per_m: float = 0.40
@classmethod
def from_env(cls) -> AnalyzerConfig:
api_key = os.environ.get("OPENAI_API_KEY", "")
if not api_key:
raise ValueError(
"OPENAI_API_KEY environment variable is required. "
"Get one at https://platform.openai.com/api-keys"
)
return cls(
database_url=os.environ.get(
"DATABASE_URL",
"postgresql://bluesky:changeme@db:5432/bluesky",
),
openai_api_key=api_key,
model=os.environ.get("ANALYZER_MODEL", "gpt-4.1-nano"),
concurrency=int(os.environ.get("ANALYZER_CONCURRENCY", "3")),
batch_size=int(os.environ.get("ANALYZER_BATCH_SIZE", "10")),
limit=int(os.environ.get("ANALYZER_LIMIT", "0")),
flag_threshold=float(os.environ.get("ANALYZER_FLAG_THRESHOLD", "0.5")),
log_level=os.environ.get("LOG_LEVEL", "INFO"),
)
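
AnalyzerConfig.from_env converts numeric settings with bare int() and float() calls, so a typo such as ANALYZER_BATCH_SIZE=ten surfaces as a ValueError that does not name the variable. The sketch below shows one possible way to make that failure more readable. `env_int` is a hypothetical helper, not part of this commit, and the minimum-value check is an added assumption.

```python
import os


def env_int(name: str, default: int, minimum: int = 0) -> int:
    """Read an integer setting from the environment (hypothetical helper)."""
    raw = os.environ.get(name, "")
    if raw == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        # Name the offending variable instead of surfacing a bare int() error.
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if value < minimum:
        raise ValueError(f"{name} must be >= {minimum}, got {value}")
    return value


os.environ["ANALYZER_BATCH_SIZE"] = "10"
os.environ["ANALYZER_CONCURRENCY"] = "three"

batch_size = env_int("ANALYZER_BATCH_SIZE", default=10, minimum=1)
try:
    env_int("ANALYZER_CONCURRENCY", default=3, minimum=1)
    message = ""
except ValueError as exc:
    message = str(exc)
print(batch_size, message)
```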

201
src/analyzer/db.py Normal file

@ -0,0 +1,201 @@
"""Async database layer for the toxicity analyzer.
Handles fetching unscored posts/mentions and storing classification results.
"""
from __future__ import annotations
import logging
from datetime import datetime, timezone
from pathlib import Path
import asyncpg
logger = logging.getLogger("analyzer.db")
MIGRATION_FILE = Path(__file__).parent.parent.parent / "scripts" / "02-toxicity.sql"
class AnalyzerDB:
"""Async PostgreSQL operations for the analyzer."""
def __init__(self, dsn: str):
self._dsn = dsn
self._pool: asyncpg.Pool | None = None
async def connect(self) -> None:
self._pool = await asyncpg.create_pool(self._dsn, min_size=2, max_size=10)
logger.info("Database connected")
async def close(self) -> None:
if self._pool:
await self._pool.close()
async def apply_migration(self) -> None:
"""Apply the toxicity schema migration if tables don't exist."""
async with self._pool.acquire() as conn:
# Check if toxicity_scores table exists
exists = await conn.fetchval("""
SELECT EXISTS (
SELECT FROM information_schema.tables
WHERE table_name = 'toxicity_scores'
)
""")
if not exists:
logger.info("Applying toxicity schema migration...")
sql = MIGRATION_FILE.read_text()
await conn.execute(sql)
logger.info("Migration applied successfully")
else:
logger.debug("Toxicity tables already exist")
# ── Fetch unscored items ─────────────────────────────────────────────
async def get_unscored_posts(self, limit: int = 0) -> list[dict]:
"""Get posts that haven't been scored yet.
Skips reposts (no text) and posts with empty text.
"""
query = """
SELECT p.uri, p.text, p.post_type, p.author_did
FROM posts p
LEFT JOIN toxicity_scores ts ON ts.uri = p.uri
WHERE ts.uri IS NULL
AND p.post_type != 'repost'
AND p.text IS NOT NULL
AND p.text != ''
ORDER BY p.created_at DESC
"""
if limit > 0:
query += f" LIMIT {limit}"
async with self._pool.acquire() as conn:
rows = await conn.fetch(query)
return [dict(r) for r in rows]
async def get_unscored_mentions(self, limit: int = 0) -> list[dict]:
"""Get mentions that haven't been scored yet."""
query = """
SELECT m.id, m.post_text, m.mentioned_did, m.mentioning_did
FROM mentions m
LEFT JOIN mention_toxicity_scores mts ON mts.mention_id = m.id
WHERE mts.mention_id IS NULL
AND m.post_text IS NOT NULL
AND m.post_text != ''
ORDER BY m.post_created_at DESC
"""
if limit > 0:
query += f" LIMIT {limit}"
async with self._pool.acquire() as conn:
rows = await conn.fetch(query)
return [dict(r) for r in rows]
# ── Store scores ─────────────────────────────────────────────────────
async def store_post_score(
self,
uri: str,
scores: dict,
flagged: bool,
model: str,
) -> None:
"""Insert a toxicity score for a post."""
async with self._pool.acquire() as conn:
await conn.execute("""
INSERT INTO toxicity_scores
(uri, overall, toxic, threat, hate_speech, racism,
antisemitism, islamophobia,
sexism, homophobia, insult, dehumanization,
extremism, ableism, flagged, model)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16)
ON CONFLICT (uri) DO NOTHING
""",
uri,
scores["overall"],
scores["toxic"],
scores["threat"],
scores["hate_speech"],
scores["racism"],
scores["antisemitism"],
scores["islamophobia"],
scores["sexism"],
scores["homophobia"],
scores["insult"],
scores["dehumanization"],
scores["extremism"],
scores["ableism"],
flagged,
model,
)
async def store_mention_score(
self,
mention_id: int,
scores: dict,
flagged: bool,
model: str,
) -> None:
"""Insert a toxicity score for a mention."""
async with self._pool.acquire() as conn:
await conn.execute("""
INSERT INTO mention_toxicity_scores
(mention_id, overall, toxic, threat, hate_speech, racism,
antisemitism, islamophobia,
sexism, homophobia, insult, dehumanization,
extremism, ableism, flagged, model)
VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16)
ON CONFLICT (mention_id) DO NOTHING
""",
mention_id,
scores["overall"],
scores["toxic"],
scores["threat"],
scores["hate_speech"],
scores["racism"],
scores["antisemitism"],
scores["islamophobia"],
scores["sexism"],
scores["homophobia"],
scores["insult"],
scores["dehumanization"],
scores["extremism"],
scores["ableism"],
flagged,
model,
)
# ── Analysis run tracking ────────────────────────────────────────────
async def start_analysis_run(self, model: str) -> int:
"""Create a new analysis run record. Returns run ID."""
async with self._pool.acquire() as conn:
return await conn.fetchval("""
INSERT INTO analysis_runs (model) VALUES ($1)
RETURNING id
""", model)
async def finish_analysis_run(
self,
run_id: int,
status: str,
posts_scored: int,
mentions_scored: int,
errors: int,
cost_usd: float,
) -> None:
"""Finalize an analysis run with results."""
async with self._pool.acquire() as conn:
await conn.execute("""
UPDATE analysis_runs
SET finished_at = now(),
status = $2,
posts_scored = $3,
mentions_scored = $4,
errors = $5,
cost_usd = $6,
duration_secs = EXTRACT(EPOCH FROM (now() - started_at))
WHERE id = $1
""",
run_id, status, posts_scored, mentions_scored, errors, cost_usd,
)
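
Two SQL patterns carry most of the weight in this module: the LEFT JOIN ... IS NULL anti-join that finds unscored rows, and ON CONFLICT ... DO NOTHING, which makes storing a score idempotent. The sketch below reproduces both against an in-memory SQLite database so it can run without PostgreSQL. This is a substitution for illustration only: the real code uses asyncpg with $1-style parameters, the tables here have only two columns each, and SQLite 3.24 or newer is assumed for the ON CONFLICT clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (uri TEXT PRIMARY KEY, text TEXT);
    CREATE TABLE toxicity_scores (uri TEXT PRIMARY KEY, overall REAL);
    INSERT INTO posts VALUES
        ('at://a/1', 'hello'), ('at://a/2', 'world'), ('at://a/3', '');
    INSERT INTO toxicity_scores VALUES ('at://a/1', 0.1);
    """
)

# Anti-join: keep posts that have no matching row in toxicity_scores.
UNSCORED = """
    SELECT p.uri
    FROM posts p
    LEFT JOIN toxicity_scores ts ON ts.uri = p.uri
    WHERE ts.uri IS NULL
      AND p.text IS NOT NULL
      AND p.text != ''
"""


def unscored_uris() -> list:
    return [row[0] for row in conn.execute(UNSCORED)]


def store_score(uri: str, overall: float) -> None:
    # A second insert for the same uri is silently skipped.
    conn.execute(
        "INSERT INTO toxicity_scores (uri, overall) VALUES (?, ?) "
        "ON CONFLICT (uri) DO NOTHING",
        (uri, overall),
    )


before = unscored_uris()
store_score("at://a/2", 0.7)
store_score("at://a/2", 0.9)  # ignored: the first score is kept
after = unscored_uris()
kept = conn.execute(
    "SELECT overall FROM toxicity_scores WHERE uri = 'at://a/2'"
).fetchone()[0]
print(before, after, kept)
```

Because the conflict clause discards later writes, re-scoring a post with a newer model would require deleting the old row or switching to DO UPDATE; the current schema treats the first score as final.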

301
src/bluesky_client.py Normal file

@ -0,0 +1,301 @@
"""Bluesky AT Protocol API client with rate limiting and retry logic.
Uses httpx directly against the public API for feeds (no auth needed),
and an authenticated session via a PDS for searchPosts (which requires auth).
"""
from __future__ import annotations
import asyncio
import logging
import time
from datetime import datetime
import httpx
from tenacity import (
retry,
retry_if_exception_type,
stop_after_attempt,
wait_exponential,
)
from .models import Mention, Post
logger = logging.getLogger(__name__)
# Bluesky embed type prefixes for detecting media/embeds
_IMAGE_TYPES = {"app.bsky.embed.images#view", "app.bsky.embed.images"}
_VIDEO_TYPES = {"app.bsky.embed.video#view", "app.bsky.embed.video"}
_MEDIA_TYPES = _IMAGE_TYPES | _VIDEO_TYPES
class RateLimiter:
"""Tracks rate limit state from API response headers."""
def __init__(self):
self._remaining: int = 3000
self._reset_at: float = 0.0
self._lock = asyncio.Lock()
def update(self, headers: httpx.Headers) -> None:
remaining = headers.get("ratelimit-remaining")
reset = headers.get("ratelimit-reset")
if remaining is not None:
self._remaining = int(remaining)
if reset is not None:
self._reset_at = float(reset)
async def wait_if_needed(self) -> None:
async with self._lock:
if self._remaining <= 20:
sleep_for = max(0, self._reset_at - time.time()) + 1.0
logger.warning(
"Rate limit nearly exhausted (%d remaining). "
"Sleeping %.1f seconds until reset.",
self._remaining,
sleep_for,
)
await asyncio.sleep(sleep_for)
class BlueskyClient:
"""Async HTTP client for the Bluesky API.
Uses the public AppView for feeds and an authenticated PDS session
for searchPosts (which returns 403 on the public API).
"""
def __init__(self, base_url: str = "https://public.api.bsky.app"):
self._base = base_url.rstrip("/")
self._http = httpx.AsyncClient(timeout=30.0)
self._rate = RateLimiter()
# Authenticated session state (for searchPosts)
self._auth_token: str | None = None
self._auth_pds: str | None = None # e.g. https://bsky.social
async def login(self, handle: str, app_password: str) -> None:
"""Create an authenticated session for search requests."""
resp = await self._http.post(
"https://bsky.social/xrpc/com.atproto.server.createSession",
json={"identifier": handle, "password": app_password},
)
resp.raise_for_status()
data = resp.json()
self._auth_token = data["accessJwt"]
# Use the PDS from the DID doc if available, otherwise default
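# `or` guards against a null didDoc or an empty service list, which would raise on [0]
services = (data.get("didDoc") or {}).get("service") or [{}]
self._auth_pds = services[0].get("serviceEndpoint", "https://bsky.social")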
logger.info("Authenticated as %s (PDS: %s)", handle, self._auth_pds)
async def close(self) -> None:
await self._http.aclose()
# ── Low-level request helpers ───────────────────────────────────────
@retry(
stop=stop_after_attempt(4),
wait=wait_exponential(multiplier=1, min=2, max=30),
retry=retry_if_exception_type(
(httpx.ConnectError, httpx.ReadTimeout, httpx.ConnectTimeout)
),
reraise=True,
)
async def _get(self, endpoint: str, params: dict) -> dict:
"""Make an unauthenticated GET request (public API)."""
await self._rate.wait_if_needed()
url = f"{self._base}/xrpc/{endpoint}"
resp = await self._http.get(url, params={k: v for k, v in params.items() if v is not None})
self._rate.update(resp.headers)
if resp.status_code == 429:
reset = resp.headers.get("ratelimit-reset")
sleep_for = max(0, float(reset) - time.time()) + 1.0 if reset else 30.0
logger.warning("HTTP 429 — sleeping %.1f seconds", sleep_for)
await asyncio.sleep(sleep_for)
raise httpx.ReadTimeout("Rate limited, retrying")
resp.raise_for_status()
return resp.json()
@retry(
stop=stop_after_attempt(4),
wait=wait_exponential(multiplier=1, min=2, max=30),
retry=retry_if_exception_type(
(httpx.ConnectError, httpx.ReadTimeout, httpx.ConnectTimeout)
),
reraise=True,
)
async def _get_auth(self, endpoint: str, params: dict) -> dict:
"""Make an authenticated GET request via the user's PDS."""
await self._rate.wait_if_needed()
base = self._auth_pds or self._base
url = f"{base}/xrpc/{endpoint}"
headers = {"Authorization": f"Bearer {self._auth_token}"} if self._auth_token else {}
resp = await self._http.get(
url,
params={k: v for k, v in params.items() if v is not None},
headers=headers,
)
self._rate.update(resp.headers)
if resp.status_code == 429:
reset = resp.headers.get("ratelimit-reset")
sleep_for = max(0, float(reset) - time.time()) + 1.0 if reset else 30.0
logger.warning("HTTP 429 (auth) — sleeping %.1f seconds", sleep_for)
await asyncio.sleep(sleep_for)
raise httpx.ReadTimeout("Rate limited, retrying")
resp.raise_for_status()
return resp.json()
# ── Handle resolution ───────────────────────────────────────────────
async def resolve_handle(self, handle: str) -> str | None:
"""Resolve a Bluesky handle to a DID. Returns None on failure."""
try:
data = await self._get(
"com.atproto.identity.resolveHandle", {"handle": handle}
)
did = data.get("did")
logger.debug("Resolved %s -> %s", handle, did)
return did
except Exception:
logger.exception("Failed to resolve handle: %s", handle)
return None
# ── Author feed ─────────────────────────────────────────────────────
async def get_author_feed_page(
self,
actor: str,
cursor: str | None = None,
limit: int = 100,
filter_type: str = "posts_with_replies",
) -> tuple[list[dict], str | None]:
"""Fetch one page of an author's feed.
Returns (list_of_raw_feed_items, next_cursor).
"""
data = await self._get(
"app.bsky.feed.getAuthorFeed",
{
"actor": actor,
"cursor": cursor,
"limit": limit,
"filter": filter_type,
},
)
return data.get("feed", []), data.get("cursor")
# ── Mention search ──────────────────────────────────────────────────
async def search_mentions_page(
self,
handle: str,
since: str | None = None,
cursor: str | None = None,
limit: int = 100,
) -> tuple[list[dict], str | None]:
"""Search for posts mentioning a handle.
Returns (list_of_raw_post_objects, next_cursor).
Uses authenticated PDS endpoint if available (public API 403s on search).
"""
getter = self._get_auth if self._auth_token else self._get
search_params = {
"q": "*",
"mentions": handle,
"since": since,
"sort": "latest",
"cursor": cursor,
"limit": limit,
}
try:
data = await getter("app.bsky.feed.searchPosts", search_params)
return data.get("posts", []), data.get("cursor")
except httpx.HTTPStatusError as e:
if e.response.status_code not in (400, 403):
raise
logger.warning(
"Mention search failed for %s (HTTP %d) — skipping",
handle, e.response.status_code,
)
return [], None
# ── Mapping helpers ─────────────────────────────────────────────────────
def _parse_dt(s: str | None) -> datetime | None:
"""Parse an ISO datetime string into a timezone-aware datetime."""
if not s:
return None
try:
# Handle the Z suffix and various ISO formats
s = s.replace("Z", "+00:00")
return datetime.fromisoformat(s)
except (ValueError, TypeError):
return None
def _detect_embed_type(embed: dict | None) -> tuple[bool, bool]:
"""Return (has_media, has_embed) from an embed object."""
if not embed:
return False, False
etype = embed.get("$type", "")
has_media = etype in _MEDIA_TYPES
has_embed = bool(embed) # any embed counts
return has_media, has_embed
def map_feed_item_to_post(item: dict) -> Post:
"""Map a raw getAuthorFeed item to a Post model."""
post_view = item.get("post", {})
record = post_view.get("record", {})
reply_ref = record.get("reply")
reason = item.get("reason")
# Determine post type
if reason and reason.get("$type") == "app.bsky.feed.defs#reasonRepost":
post_type = "repost"
elif reply_ref is not None:
post_type = "reply"
else:
post_type = "post"
has_media, has_embed = _detect_embed_type(post_view.get("embed"))
return Post(
uri=post_view.get("uri", ""),
cid=post_view.get("cid", ""),
author_did=post_view.get("author", {}).get("did", ""),
text=record.get("text"),
created_at=_parse_dt(record.get("createdAt")),
indexed_at=_parse_dt(post_view.get("indexedAt")),
reply_parent=reply_ref.get("parent", {}).get("uri") if reply_ref else None,
reply_root=reply_ref.get("root", {}).get("uri") if reply_ref else None,
post_type=post_type,
has_media=has_media,
has_embed=has_embed,
like_count=post_view.get("likeCount", 0) or 0,
reply_count=post_view.get("replyCount", 0) or 0,
repost_count=post_view.get("repostCount", 0) or 0,
quote_count=post_view.get("quoteCount", 0) or 0,
langs=record.get("langs"),
raw_json=item,
)
def map_search_post_to_mention(post_data: dict, mentioned_did: str) -> Mention:
"""Map a raw searchPosts result to a Mention model."""
record = post_data.get("record", {})
return Mention(
post_uri=post_data.get("uri", ""),
mentioned_did=mentioned_did,
mentioning_did=post_data.get("author", {}).get("did"),
post_text=record.get("text"),
post_created_at=_parse_dt(record.get("createdAt")),
raw_json=post_data,
)
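
The client decides how long to pause from the ratelimit-remaining and ratelimit-reset response headers. The arithmetic is simple but easy to misread, so here it is as a pure function that can be tested without any HTTP traffic. This is a sketch under assumptions: `seconds_until_reset` is a hypothetical name, it folds the RateLimiter threshold check and the 30-second fallback from the 429 branch into one place, and it treats ratelimit-reset as a Unix timestamp in seconds, as the code above does.

```python
def seconds_until_reset(headers: dict, now: float, threshold: int = 20) -> float:
    """Return how long to pause before the next request (0.0 if not needed)."""
    remaining = headers.get("ratelimit-remaining")
    reset = headers.get("ratelimit-reset")

    # Plenty of budget left, or the server sent no rate-limit information.
    if remaining is None or int(remaining) > threshold:
        return 0.0

    # No reset time available: fall back to a fixed pause.
    if reset is None:
        return 30.0

    # Wait until the reset time, plus one second of slack; never negative.
    return max(0.0, float(reset) - now) + 1.0


now = 1_000.0
print(seconds_until_reset({"ratelimit-remaining": "500", "ratelimit-reset": "1060"}, now))
print(seconds_until_reset({"ratelimit-remaining": "5", "ratelimit-reset": "1060"}, now))
print(seconds_until_reset({"ratelimit-remaining": "5", "ratelimit-reset": "900"}, now))
```

Passing `now` in explicitly, rather than calling time.time() inside the function, is what keeps the calculation deterministic in tests.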

293
src/collector.py Normal file

@ -0,0 +1,293 @@
"""Main collector orchestrator.
Runs as a one-shot process: resolves accounts, collects feeds and mentions,
then exits. Designed to be triggered on a schedule by ofelia or cron.
Usage:
python -m src.collector
"""
from __future__ import annotations
import asyncio
import logging
import os
import sys
from datetime import datetime, timedelta, timezone
from .bluesky_client import (
BlueskyClient,
map_feed_item_to_post,
map_search_post_to_mention,
)
from .config import CollectorConfig, load_accounts
from .db import Database
from .models import Account
logger = logging.getLogger("collector")
# ── Feed collection ─────────────────────────────────────────────────────
async def collect_feed(
client: BlueskyClient,
db: Database,
account: Account,
max_pages: int,
) -> int:
"""Collect posts from an account's feed. Returns number of posts stored."""
state = await db.get_collection_state(account.did, "feed")
cutoff = state.last_post_at if state else None
all_posts = []
cursor = None
pages = 0
while pages < max_pages:
items, next_cursor = await client.get_author_feed_page(
actor=account.did, cursor=cursor, limit=100
)
if not items:
break
posts = [map_feed_item_to_post(item) for item in items]
# Check if we've reached posts older than our cutoff.
# We still upsert everything (to refresh engagement counts),
# but stop paginating once we pass the cutoff.
hit_old = False
if cutoff:
for p in posts:
if p.created_at and p.created_at <= cutoff:
hit_old = True
break
await db.upsert_posts(posts)
all_posts.extend(posts)
if hit_old or not next_cursor:
break
cursor = next_cursor
pages += 1
# Save the newest timestamp for next incremental run
dated = [p.created_at for p in all_posts if p.created_at]
if dated:
newest = max(dated)
await db.save_collection_state(account.did, "feed", last_post_at=newest)
await db.update_account_last_feed(account.did)
logger.info(
" Feed: %d posts collected (%d pages) for %s",
len(all_posts),
pages + 1,
account.handle,
)
return len(all_posts)
# ── Mention collection ──────────────────────────────────────────────────
async def collect_mentions(
client: BlueskyClient,
db: Database,
account: Account,
max_pages: int,
lookback_hours: int,
) -> int:
"""Search for posts mentioning this account. Returns number stored."""
state = await db.get_collection_state(account.did, "mentions")
if state and state.last_post_at:
since = state.last_post_at.isoformat()
else:
since = (datetime.now(timezone.utc) - timedelta(hours=lookback_hours)).isoformat()
all_mentions = []
cursor = None
pages = 0
while pages < max_pages:
posts, next_cursor = await client.search_mentions_page(
handle=account.handle, since=since, cursor=cursor, limit=100
)
if not posts:
break
mentions = [map_search_post_to_mention(p, account.did) for p in posts]
await db.upsert_mentions(mentions)
all_mentions.extend(mentions)
pages += 1  # count every page that returned results
if not next_cursor:
break
cursor = next_cursor
# Save newest mention timestamp
dated = [m.post_created_at for m in all_mentions if m.post_created_at]
if dated:
newest = max(dated)
await db.save_collection_state(account.did, "mentions", last_post_at=newest)
await db.update_account_last_mention(account.did)
logger.info(
" Mentions: %d found (%d pages) for %s",
len(all_mentions),
pages,
account.handle,
)
return len(all_mentions)
# ── Main orchestrator ───────────────────────────────────────────────────
async def run() -> None:
config = CollectorConfig.from_env()
logging.basicConfig(
level=getattr(logging, config.log_level.upper(), logging.INFO),
format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
handlers=[
logging.StreamHandler(sys.stdout),
],
)
# Also log to file if /app/logs exists
log_dir = "/app/logs"
if os.path.isdir(log_dir):
fh = logging.FileHandler(os.path.join(log_dir, "collector.log"))
fh.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(name)s: %(message)s"))
logging.getLogger().addHandler(fh)
logger.info("=" * 60)
logger.info("Bluesky Collector starting")
# Load handles from YAML
handles = load_accounts(config.accounts_file)
if not handles:
logger.error("No accounts found in %s — nothing to do.", config.accounts_file)
return
logger.info("Loaded %d handles from config", len(handles))
db = Database(config.database_url)
client = BlueskyClient(config.bsky_api_base)
try:
await db.connect()
# Authenticate if credentials are provided (needed for searchPosts)
if config.bsky_handle and config.bsky_app_password:
try:
await client.login(config.bsky_handle, config.bsky_app_password)
except Exception:
logger.exception(
"Authentication failed — mention search will be limited"
)
else:
logger.info(
"No BSKY_HANDLE/BSKY_APP_PASSWORD set — "
"mention search may be limited (403 on public API)"
)
# Phase 1: Resolve handles and sync to DB
logger.info("Phase 1: Resolving handles...")
accounts: list[Account] = []
for handle in handles:
did = await client.resolve_handle(handle)
if did:
acct = Account(did=did, handle=handle)
await db.upsert_account(acct)
accounts.append(acct)
else:
logger.warning("Skipping unresolvable handle: %s", handle)
if not accounts:
logger.error("No accounts could be resolved — aborting.")
return
# Mark accounts removed from config as inactive
await db.deactivate_removed_accounts({a.did for a in accounts})
# Start collection run
run_id = await db.start_run(accounts_total=len(accounts))
total_posts = 0
total_mentions = 0
accounts_done = 0
errors: list[dict] = []
# Phase 2 & 3: Collect feed + mentions for each account
logger.info("Collecting feeds and mentions for %d accounts...", len(accounts))
for acct in accounts:
# Feed
try:
n = await collect_feed(client, db, acct, config.max_pages_per_account)
total_posts += n
except Exception as e:
logger.exception("Feed collection failed for %s", acct.handle)
errors.append({"account": acct.handle, "phase": "feed", "error": str(e)})
# Mentions
try:
n = await collect_mentions(
client,
db,
acct,
config.max_pages_per_account,
config.mention_lookback_hours,
)
total_mentions += n
except Exception as e:
logger.exception("Mention collection failed for %s", acct.handle)
errors.append({"account": acct.handle, "phase": "mentions", "error": str(e)})
accounts_done += 1
# Update run record
await db.update_run_progress(
run_id,
accounts_done=accounts_done,
posts_collected=total_posts,
mentions_collected=total_mentions,
)
status = "completed" if not errors else "partial"
await db.finish_run(run_id, status=status, errors=errors)
# Summary
stats = await db.get_stats()
logger.info("=" * 60)
logger.info("Run complete — status: %s", status)
logger.info(
" This run: %d posts, %d mentions, %d errors",
total_posts,
total_mentions,
len(errors),
)
logger.info(
" Database totals: %d accounts, %d posts, %d mentions, %d runs",
stats["accounts"],
stats["posts"],
stats["mentions"],
stats["runs"],
)
except Exception:
logger.exception("Collector crashed")
raise
finally:
await client.close()
await db.close()
def main() -> None:
asyncio.run(run())
if __name__ == "__main__":
main()
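The incremental strategy used by `collect_feed` (keep paging until a page contains a post at or before the saved cutoff, the cursor runs out, or the page budget is spent) can be sketched in isolation. This is a minimal synchronous sketch under stated assumptions, not the collector's own code: `FakePost`, `paginate_until_cutoff`, and `make_fake_feed` are hypothetical names, and the in-memory pages stand in for the Bluesky API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class FakePost:
    """Hypothetical stand-in for the collector's Post model."""
    uri: str
    created_at: Optional[datetime]


# A page is (posts, next_cursor); next_cursor is None on the last page.
Page = tuple[list[FakePost], Optional[str]]


def paginate_until_cutoff(
    fetch_page: Callable[[Optional[str]], Page],
    cutoff: Optional[datetime],
    max_pages: int,
) -> tuple[list[FakePost], int]:
    """Return (posts, pages_fetched), stopping once the cutoff is reached."""
    collected: list[FakePost] = []
    cursor: Optional[str] = None
    pages = 0
    while pages < max_pages:
        posts, next_cursor = fetch_page(cursor)
        if not posts:
            break
        pages += 1
        # The whole page is kept, mirroring the "upsert everything" choice.
        collected.extend(posts)
        hit_old = cutoff is not None and any(
            p.created_at is not None and p.created_at <= cutoff for p in posts
        )
        if hit_old or not next_cursor:
            break
        cursor = next_cursor
    return collected, pages


def make_fake_feed() -> Callable[[Optional[str]], Page]:
    """Three newest-first pages of two posts each (illustrative data only)."""
    def ts(day: int) -> datetime:
        return datetime(2026, 2, day, tzinfo=timezone.utc)

    pages = {
        None: ([FakePost("p6", ts(6)), FakePost("p5", ts(5))], "c1"),
        "c1": ([FakePost("p4", ts(4)), FakePost("p3", ts(3))], "c2"),
        "c2": ([FakePost("p2", ts(2)), FakePost("p1", ts(1))], None),
    }
    return lambda cursor: pages[cursor]
```

With a cutoff of 4 February the sketch stops after the second page, but still returns that whole page so engagement counts on already-known posts would be refreshed.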

46
src/config.py Normal file

@ -0,0 +1,46 @@
"""Configuration loader: reads environment variables and accounts YAML."""
from __future__ import annotations
import os
from dataclasses import dataclass
from pathlib import Path
import yaml
@dataclass
class CollectorConfig:
database_url: str
bsky_api_base: str
accounts_file: str
log_level: str = "INFO"
max_pages_per_account: int = 50
mention_lookback_hours: int = 12
feed_page_limit: int = 100 # Bluesky API max per page
bsky_handle: str | None = None # for authenticated search
bsky_app_password: str | None = None # for authenticated search
@classmethod
def from_env(cls) -> CollectorConfig:
return cls(
database_url=os.environ["DATABASE_URL"],
bsky_api_base=os.getenv("BSKY_PUBLIC_API", "https://public.api.bsky.app"),
accounts_file=os.getenv("ACCOUNTS_FILE", "/app/config/accounts.yml"),
log_level=os.getenv("LOG_LEVEL", "INFO"),
max_pages_per_account=int(os.getenv("MAX_PAGES_PER_ACCOUNT", "50")),
mention_lookback_hours=int(os.getenv("MENTION_LOOKBACK_HOURS", "12")),
bsky_handle=os.getenv("BSKY_HANDLE"),
bsky_app_password=os.getenv("BSKY_APP_PASSWORD"),
)
def load_accounts(path: str) -> list[str]:
"""Load the list of Bluesky handles from a YAML file.
Returns a list of handle strings (e.g. ['alice.bsky.social', 'bob.bsky.social']).
"""
data = yaml.safe_load(Path(path).read_text())
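# An empty file, a non-mapping document, or an empty `accounts:` key all mean "no accounts".
if not isinstance(data, dict) or not data.get("accounts"):
return []
return [entry["handle"] for entry in data["accounts"] if "handle" in entry]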
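`from_env` converts numeric settings with a bare `int(...)`, so a typo such as `MAX_PAGES_PER_ACCOUNT=fifty` surfaces as a `ValueError` that does not name the variable. One possible way to make such failures easier to diagnose is sketched below. The helper `env_int` and its `minimum` parameter are hypothetical and not part of this commit.

```python
import os
from typing import Mapping, Optional


def env_int(
    name: str,
    default: int,
    env: Optional[Mapping[str, str]] = None,
    minimum: int = 1,
) -> int:
    """Read an integer setting, naming the variable in any error raised."""
    source = os.environ if env is None else env
    raw = source.get(name)
    # Unset or blank values fall back to the default, like os.getenv(name, default).
    if raw is None or raw.strip() == "":
        return default
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if value < minimum:
        raise ValueError(f"{name} must be >= {minimum}, got {value}")
    return value
```

Passing `env` explicitly keeps the helper testable without touching the real process environment.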

265
src/db.py Normal file

@ -0,0 +1,265 @@
"""Async PostgreSQL database layer using asyncpg."""
from __future__ import annotations
import json
import logging
from datetime import datetime, timezone
from typing import Any
import asyncpg
from .models import Account, CollectionState, Mention, Post
logger = logging.getLogger(__name__)
class Database:
def __init__(self, dsn: str):
self._dsn = dsn
self._pool: asyncpg.Pool | None = None
async def connect(self) -> None:
self._pool = await asyncpg.create_pool(self._dsn, min_size=2, max_size=5)
logger.info("Database connection pool created")
async def close(self) -> None:
if self._pool:
await self._pool.close()
logger.info("Database connection pool closed")
# ── Account operations ──────────────────────────────────────────────
async def upsert_account(self, account: Account) -> None:
await self._pool.execute(
"""
INSERT INTO accounts (did, handle, display_name)
VALUES ($1, $2, $3)
ON CONFLICT (did) DO UPDATE SET
handle = EXCLUDED.handle,
display_name = EXCLUDED.display_name
""",
account.did,
account.handle,
account.display_name,
)
async def deactivate_removed_accounts(self, active_dids: set[str]) -> None:
"""Set active=false for accounts no longer in the config."""
if not active_dids:
return
await self._pool.execute(
"""
UPDATE accounts SET active = false
WHERE did != ALL($1::text[]) AND active = true
""",
list(active_dids),
)
async def update_account_last_feed(self, did: str) -> None:
await self._pool.execute(
"UPDATE accounts SET last_feed_collected = now() WHERE did = $1", did
)
async def update_account_last_mention(self, did: str) -> None:
await self._pool.execute(
"UPDATE accounts SET last_mention_collected = now() WHERE did = $1", did
)
# ── Post operations ─────────────────────────────────────────────────
async def upsert_posts(self, posts: list[Post]) -> int:
"""Batch upsert posts. Returns the number of rows affected."""
if not posts:
return 0
count = 0
async with self._pool.acquire() as conn:
async with conn.transaction():
for p in posts:
await conn.execute(
"""
INSERT INTO posts (
uri, cid, author_did, text, created_at, indexed_at,
reply_parent, reply_root, post_type,
has_media, has_embed,
like_count, reply_count, repost_count, quote_count,
langs, raw_json
) VALUES (
$1, $2, $3, $4, $5, $6,
$7, $8, $9,
$10, $11,
$12, $13, $14, $15,
$16, $17
)
ON CONFLICT (uri) DO UPDATE SET
cid = EXCLUDED.cid,
like_count = EXCLUDED.like_count,
reply_count = EXCLUDED.reply_count,
repost_count = EXCLUDED.repost_count,
quote_count = EXCLUDED.quote_count,
collected_at = now()
""",
p.uri,
p.cid,
p.author_did,
p.text,
p.created_at,
p.indexed_at,
p.reply_parent,
p.reply_root,
p.post_type,
p.has_media,
p.has_embed,
p.like_count,
p.reply_count,
p.repost_count,
p.quote_count,
p.langs,
json.dumps(p.raw_json),
)
# Each statement inserts or updates exactly one row, so count one per post.
count += 1
return count
# ── Mention operations ──────────────────────────────────────────────
async def upsert_mentions(self, mentions: list[Mention]) -> int:
if not mentions:
return 0
count = 0
async with self._pool.acquire() as conn:
async with conn.transaction():
for m in mentions:
result = await conn.execute(
"""
INSERT INTO mentions (
post_uri, mentioned_did, mentioning_did,
post_text, post_created_at, raw_json
) VALUES ($1, $2, $3, $4, $5, $6)
ON CONFLICT (post_uri, mentioned_did) DO NOTHING
""",
m.post_uri,
m.mentioned_did,
m.mentioning_did,
m.post_text,
m.post_created_at,
json.dumps(m.raw_json),
)
if "INSERT 0 1" in result:
count += 1
return count
# ── Collection state ────────────────────────────────────────────────
async def get_collection_state(
self, account_did: str, collection_type: str
) -> CollectionState | None:
row = await self._pool.fetchrow(
"""
SELECT account_did, collection_type, last_post_at
FROM collection_state
WHERE account_did = $1 AND collection_type = $2
""",
account_did,
collection_type,
)
if not row:
return None
return CollectionState(
account_did=row["account_did"],
collection_type=row["collection_type"],
last_post_at=row["last_post_at"],
)
async def save_collection_state(
self, account_did: str, collection_type: str, last_post_at: datetime | None
) -> None:
await self._pool.execute(
"""
INSERT INTO collection_state (account_did, collection_type, last_post_at, updated_at)
VALUES ($1, $2, $3, now())
ON CONFLICT (account_did, collection_type) DO UPDATE SET
last_post_at = EXCLUDED.last_post_at,
updated_at = now()
""",
account_did,
collection_type,
last_post_at,
)
# ── Collection run tracking ─────────────────────────────────────────
async def start_run(self, accounts_total: int) -> int:
row = await self._pool.fetchrow(
"""
INSERT INTO collection_runs (accounts_total)
VALUES ($1)
RETURNING id
""",
accounts_total,
)
return row["id"]
async def update_run_progress(
self,
run_id: int,
*,
accounts_done: int | None = None,
posts_collected: int | None = None,
mentions_collected: int | None = None,
) -> None:
parts = []
args: list[Any] = []
idx = 1
if accounts_done is not None:
idx += 1
parts.append(f"accounts_done = ${idx}")
args.append(accounts_done)
if posts_collected is not None:
idx += 1
parts.append(f"posts_collected = ${idx}")
args.append(posts_collected)
if mentions_collected is not None:
idx += 1
parts.append(f"mentions_collected = ${idx}")
args.append(mentions_collected)
if not parts:
return
sql = f"UPDATE collection_runs SET {', '.join(parts)} WHERE id = $1"
await self._pool.execute(sql, run_id, *args)
async def finish_run(
self, run_id: int, status: str, errors: list[dict] | None = None
) -> None:
await self._pool.execute(
"""
UPDATE collection_runs SET
finished_at = now(),
status = $2,
errors = $3,
duration_secs = EXTRACT(EPOCH FROM (now() - started_at))
WHERE id = $1
""",
run_id,
status,
json.dumps(errors or []),
)
# ── Stats (useful for verification) ─────────────────────────────────
async def get_stats(self) -> dict[str, int]:
row = await self._pool.fetchrow(
"""
SELECT
(SELECT count(*) FROM accounts WHERE active) AS accounts,
(SELECT count(*) FROM posts) AS posts,
(SELECT count(*) FROM mentions) AS mentions,
(SELECT count(*) FROM collection_runs) AS runs
"""
)
return dict(row)
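The placeholder numbering in `update_run_progress` is easy to misread: `$1` is reserved for the run id, so the first optional column is bound to `$2`. The pure function below is a sketch of the same idea that can be checked without a database. `build_run_update` is a hypothetical name and is not part of `src/db.py`.

```python
from typing import Any, Optional


def build_run_update(
    accounts_done: Optional[int] = None,
    posts_collected: Optional[int] = None,
    mentions_collected: Optional[int] = None,
) -> tuple[Optional[str], list[Any]]:
    """Return (sql, args) for the columns that were supplied, or (None, [])."""
    fields = {
        "accounts_done": accounts_done,
        "posts_collected": posts_collected,
        "mentions_collected": mentions_collected,
    }
    parts: list[str] = []
    args: list[Any] = []
    for column, value in fields.items():
        if value is None:
            continue
        args.append(value)
        # $1 is the run id, so the n-th supplied value is bound to $(n + 1).
        parts.append(f"{column} = ${len(args) + 1}")
    if not parts:
        return None, []
    sql = f"UPDATE collection_runs SET {', '.join(parts)} WHERE id = $1"
    return sql, args
```

The caller would then pass the run id first, followed by `*args`, matching how the method above calls `execute`.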

56
src/models.py Normal file

@ -0,0 +1,56 @@
"""Data models mirroring the PostgreSQL schema."""
from __future__ import annotations
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any
@dataclass
class Account:
did: str
handle: str
display_name: str | None = None
added_at: datetime | None = None
last_feed_collected: datetime | None = None
last_mention_collected: datetime | None = None
active: bool = True
@dataclass
class Post:
uri: str
cid: str
author_did: str
text: str | None
created_at: datetime | None
indexed_at: datetime | None
reply_parent: str | None
reply_root: str | None
post_type: str # "post", "reply", "repost"
has_media: bool
has_embed: bool
like_count: int
reply_count: int
repost_count: int
quote_count: int
langs: list[str] | None
raw_json: dict[str, Any] = field(default_factory=dict)
@dataclass
class Mention:
post_uri: str
mentioned_did: str
mentioning_did: str | None
post_text: str | None
post_created_at: datetime | None
raw_json: dict[str, Any] = field(default_factory=dict)
@dataclass
class CollectionState:
account_did: str
collection_type: str # "feed" or "mentions"
last_post_at: datetime | None = None

0
src/web/__init__.py Normal file

67
src/web/app.py Normal file

@ -0,0 +1,67 @@
"""Flask application factory for the Bluesky Collector web UI."""
from __future__ import annotations
import os
from flask import Flask
from . import db as webdb
from .helpers import (
bsky_post_url,
encode_uri,
format_dt,
format_number,
time_ago,
truncate,
)
def create_app() -> Flask:
app = Flask(
__name__,
template_folder="templates",
)
# Development fallback only; set SECRET_KEY in the environment for real deployments.
app.secret_key = os.environ.get("SECRET_KEY", "bluesky-collector-dev-key")
app.config["DATABASE_URL"] = os.environ.get(
"DATABASE_URL",
"postgresql://bluesky:changeme@db:5432/bluesky",
)
# Initialize database pool
webdb.init_pool(app.config["DATABASE_URL"])
# Register Jinja2 globals/filters
app.jinja_env.filters["format_dt"] = format_dt
app.jinja_env.filters["time_ago"] = time_ago
app.jinja_env.filters["truncate_text"] = truncate
app.jinja_env.filters["format_number"] = format_number
app.jinja_env.globals["encode_uri"] = encode_uri
app.jinja_env.globals["bsky_post_url"] = bsky_post_url
# Register blueprints
from .routes.dashboard import bp as dashboard_bp
from .routes.accounts import bp as accounts_bp
from .routes.statuses import bp as statuses_bp
from .routes.mentions import bp as mentions_bp
from .routes.export import bp as export_bp
from .routes.analysis import bp as analysis_bp
app.register_blueprint(dashboard_bp)
app.register_blueprint(accounts_bp)
app.register_blueprint(statuses_bp)
app.register_blueprint(mentions_bp)
app.register_blueprint(export_bp)
app.register_blueprint(analysis_bp)
# Teardown
@app.teardown_appcontext
def close_db(exc):
pass # Pool is long-lived, closed at shutdown
import atexit
atexit.register(webdb.close_pool)
return app

669
src/web/db.py Normal file

@ -0,0 +1,669 @@
"""Synchronous PostgreSQL query layer for the Flask web UI.
Uses psycopg2 with a simple connection pool. All functions return
dicts or lists of dicts for easy template rendering.
"""
from __future__ import annotations
import os
from contextlib import contextmanager
import psycopg2
import psycopg2.extras
import psycopg2.pool
_pool: psycopg2.pool.ThreadedConnectionPool | None = None
def init_pool(dsn: str | None = None, minconn: int = 1, maxconn: int = 5) -> None:
"""Initialize the connection pool. Called once at app startup."""
global _pool
dsn = dsn or os.environ["DATABASE_URL"]
_pool = psycopg2.pool.ThreadedConnectionPool(minconn, maxconn, dsn)
def close_pool() -> None:
global _pool
if _pool:
_pool.closeall()
_pool = None
@contextmanager
def get_cursor():
"""Yield a dict cursor from the pool, auto-returning the connection."""
conn = _pool.getconn()
try:
with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
yield cur
conn.commit()
except Exception:
conn.rollback()
raise
finally:
_pool.putconn(conn)
# ── Dashboard ────────────────────────────────────────────────────────────
def get_dashboard_stats() -> dict:
with get_cursor() as cur:
cur.execute("""
SELECT
(SELECT count(*) FROM accounts WHERE active) AS accounts,
(SELECT count(*) FROM posts) AS posts,
(SELECT count(*) FROM mentions) AS mentions,
(SELECT count(*) FROM collection_runs) AS runs
""")
return dict(cur.fetchone())
def get_recent_runs(limit: int = 10) -> list[dict]:
with get_cursor() as cur:
cur.execute("""
SELECT id, started_at, finished_at, status,
accounts_total, accounts_done,
posts_collected, mentions_collected,
errors, duration_secs
FROM collection_runs
ORDER BY started_at DESC
LIMIT %s
""", (limit,))
return [dict(r) for r in cur.fetchall()]
# ── Accounts ─────────────────────────────────────────────────────────────
ACCOUNT_SORT_COLS = {
"handle": "a.handle",
"posts": "post_count",
"mentions": "mention_count",
"last_feed": "a.last_feed_collected",
"last_mention": "a.last_mention_collected",
}
def get_accounts(
search: str | None = None,
sort: str = "handle",
direction: str = "asc",
limit: int = 50,
offset: int = 0,
) -> tuple[list[dict], int]:
"""Return (accounts_list, total_count)."""
sort_col = ACCOUNT_SORT_COLS.get(sort, "a.handle")
dir_sql = "DESC" if direction == "desc" else "ASC"
where = "WHERE a.active = true"
params: list = []
if search:
where += " AND a.handle ILIKE %s"
params.append(f"%{search}%")
with get_cursor() as cur:
# Total count
cur.execute(f"SELECT count(*) AS cnt FROM accounts a {where}", params)
total = cur.fetchone()["cnt"]
# Paginated results with counts
cur.execute(f"""
SELECT a.did, a.handle, a.display_name, a.active,
a.last_feed_collected, a.last_mention_collected, a.added_at,
(SELECT count(*) FROM posts p WHERE p.author_did = a.did) AS post_count,
(SELECT count(*) FROM mentions m WHERE m.mentioned_did = a.did) AS mention_count
FROM accounts a
{where}
ORDER BY {sort_col} {dir_sql} NULLS LAST
LIMIT %s OFFSET %s
""", params + [limit, offset])
rows = [dict(r) for r in cur.fetchall()]
return rows, total
def get_account_by_did(did: str) -> dict | None:
with get_cursor() as cur:
cur.execute("""
SELECT a.did, a.handle, a.display_name, a.active,
a.last_feed_collected, a.last_mention_collected, a.added_at,
(SELECT count(*) FROM posts p WHERE p.author_did = a.did) AS post_count,
(SELECT count(*) FROM mentions m WHERE m.mentioned_did = a.did) AS mention_count
FROM accounts a
WHERE a.did = %s
""", (did,))
row = cur.fetchone()
return dict(row) if row else None
# ── Posts / Statuses ─────────────────────────────────────────────────────
POST_SORT_COLS = {
"created": "p.created_at",
"likes": "p.like_count",
"replies": "p.reply_count",
"reposts": "p.repost_count",
}
def get_posts(
account_did: str | None = None,
post_type: str | None = None,
search: str | None = None,
sort: str = "created",
direction: str = "desc",
limit: int = 50,
offset: int = 0,
) -> tuple[list[dict], int]:
"""Return (posts_list, total_count)."""
sort_col = POST_SORT_COLS.get(sort, "p.created_at")
dir_sql = "DESC" if direction == "desc" else "ASC"
conditions = []
params: list = []
if account_did:
conditions.append("p.author_did = %s")
params.append(account_did)
if post_type:
conditions.append("p.post_type = %s")
params.append(post_type)
if search:
conditions.append("p.text ILIKE %s")
params.append(f"%{search}%")
where = ("WHERE " + " AND ".join(conditions)) if conditions else ""
with get_cursor() as cur:
cur.execute(f"SELECT count(*) AS cnt FROM posts p {where}", params)
total = cur.fetchone()["cnt"]
cur.execute(f"""
SELECT p.uri, p.cid, p.author_did, p.text, p.post_type,
p.created_at, p.indexed_at, p.collected_at,
p.reply_parent, p.reply_root,
p.has_media, p.has_embed,
p.like_count, p.reply_count, p.repost_count, p.quote_count,
p.langs,
a.handle AS author_handle
FROM posts p
LEFT JOIN accounts a ON a.did = p.author_did
{where}
ORDER BY {sort_col} {dir_sql} NULLS LAST
LIMIT %s OFFSET %s
""", params + [limit, offset])
rows = [dict(r) for r in cur.fetchall()]
return rows, total
def get_post_by_uri(uri: str) -> dict | None:
with get_cursor() as cur:
cur.execute("""
SELECT p.uri, p.cid, p.author_did, p.text, p.post_type,
p.created_at, p.indexed_at, p.collected_at,
p.reply_parent, p.reply_root,
p.has_media, p.has_embed,
p.like_count, p.reply_count, p.repost_count, p.quote_count,
p.langs, p.raw_json,
a.handle AS author_handle
FROM posts p
LEFT JOIN accounts a ON a.did = p.author_did
WHERE p.uri = %s
""", (uri,))
row = cur.fetchone()
return dict(row) if row else None
def get_replies_to(uri: str, limit: int = 50) -> list[dict]:
"""Get posts that are replies to the given URI."""
with get_cursor() as cur:
cur.execute("""
SELECT p.uri, p.author_did, p.text, p.post_type,
p.created_at, p.like_count, p.reply_count, p.repost_count,
a.handle AS author_handle
FROM posts p
LEFT JOIN accounts a ON a.did = p.author_did
WHERE p.reply_parent = %s
ORDER BY p.created_at ASC
LIMIT %s
""", (uri, limit))
return [dict(r) for r in cur.fetchall()]
# ── Mentions ─────────────────────────────────────────────────────────────
def get_mentions(
mentioned_did: str | None = None,
search: str | None = None,
limit: int = 50,
offset: int = 0,
) -> tuple[list[dict], int]:
conditions = []
params: list = []
if mentioned_did:
conditions.append("m.mentioned_did = %s")
params.append(mentioned_did)
if search:
conditions.append("m.post_text ILIKE %s")
params.append(f"%{search}%")
where = ("WHERE " + " AND ".join(conditions)) if conditions else ""
with get_cursor() as cur:
cur.execute(f"SELECT count(*) AS cnt FROM mentions m {where}", params)
total = cur.fetchone()["cnt"]
cur.execute(f"""
SELECT m.id, m.post_uri, m.mentioned_did, m.mentioning_did,
m.post_text, m.post_created_at, m.collected_at,
a.handle AS mentioned_handle
FROM mentions m
LEFT JOIN accounts a ON a.did = m.mentioned_did
{where}
ORDER BY m.post_created_at DESC NULLS LAST
LIMIT %s OFFSET %s
""", params + [limit, offset])
rows = [dict(r) for r in cur.fetchall()]
return rows, total
# ── Export helpers ────────────────────────────────────────────────────────
def iter_posts_csv(
account_did: str | None = None,
since: str | None = None,
until: str | None = None,
):
"""Generator yielding post rows as dicts for CSV export."""
conditions = []
params: list = []
if account_did:
conditions.append("p.author_did = %s")
params.append(account_did)
if since:
conditions.append("p.created_at >= %s")
params.append(since)
if until:
conditions.append("p.created_at <= %s")
params.append(until)
where = ("WHERE " + " AND ".join(conditions)) if conditions else ""
with get_cursor() as cur:
cur.execute(f"""
SELECT p.uri, p.author_did, a.handle AS author_handle,
p.text, p.post_type, p.created_at,
p.like_count, p.reply_count, p.repost_count, p.quote_count,
p.has_media, p.has_embed, p.reply_parent, p.reply_root
FROM posts p
LEFT JOIN accounts a ON a.did = p.author_did
{where}
ORDER BY p.created_at DESC
""", params)
for row in cur:
yield dict(row)
def iter_mentions_csv(
mentioned_did: str | None = None,
since: str | None = None,
until: str | None = None,
):
"""Generator yielding mention rows as dicts for CSV export."""
conditions = []
params: list = []
if mentioned_did:
conditions.append("m.mentioned_did = %s")
params.append(mentioned_did)
if since:
conditions.append("m.post_created_at >= %s")
params.append(since)
if until:
conditions.append("m.post_created_at <= %s")
params.append(until)
where = ("WHERE " + " AND ".join(conditions)) if conditions else ""
with get_cursor() as cur:
cur.execute(f"""
SELECT m.post_uri, m.mentioned_did, a.handle AS mentioned_handle,
m.mentioning_did, m.post_text, m.post_created_at
FROM mentions m
LEFT JOIN accounts a ON a.did = m.mentioned_did
{where}
ORDER BY m.post_created_at DESC
""", params)
for row in cur:
yield dict(row)
def get_accounts_for_select() -> list[dict]:
"""Get a simple list of active accounts for dropdown selectors."""
with get_cursor() as cur:
cur.execute("""
SELECT did, handle FROM accounts
WHERE active = true
ORDER BY handle
""")
return [dict(r) for r in cur.fetchall()]
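The query helpers above all follow one pattern: each optional filter appends a condition fragment and its bound value in lockstep, so user input never reaches the SQL text itself. A minimal sketch of that pattern as a reusable function follows. `build_where` is a hypothetical helper, not something defined in this module, and `did:plc:example` is a made-up identifier.

```python
from typing import Any, Optional


def build_where(filters: list[tuple[str, Optional[Any]]]) -> tuple[str, list[Any]]:
    """Turn (condition, value) pairs into a WHERE clause plus its parameters.

    Pairs whose value is None or an empty string are skipped, so the
    placeholders and the parameter list always stay the same length.
    """
    conditions: list[str] = []
    params: list[Any] = []
    for condition, value in filters:
        if value is None or value == "":
            continue
        conditions.append(condition)
        params.append(value)
    where = ("WHERE " + " AND ".join(conditions)) if conditions else ""
    return where, params


where, params = build_where(
    [
        ("p.author_did = %s", "did:plc:example"),
        ("p.created_at >= %s", None),
        ("p.created_at <= %s", "2026-02-01"),
    ]
)
```

Only the condition fragments, which are fixed strings in the source, end up in the SQL; the values travel separately to `cursor.execute`.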
# ── Analysis queries ─────────────────────────────────────────────────────
TOXICITY_CATEGORIES = [
"toxic", "threat", "hate_speech", "racism",
"antisemitism", "islamophobia", "sexism", "homophobia",
"insult", "dehumanization", "extremism", "ableism",
]
def _check_toxicity_tables() -> bool:
"""Check if the toxicity tables exist (migration applied)."""
with get_cursor() as cur:
cur.execute("""
SELECT EXISTS (
SELECT FROM information_schema.tables
WHERE table_name = 'toxicity_scores'
)
""")
return cur.fetchone()["exists"]
def get_analysis_stats() -> dict:
"""Get overview stats for the analysis dashboard."""
if not _check_toxicity_tables():
return {
"total_scored_posts": 0, "total_scored_mentions": 0,
"flagged_posts": 0, "flagged_mentions": 0,
"avg_toxicity_posts": 0, "avg_toxicity_mentions": 0,
"total_posts": 0, "total_mentions": 0,
}
with get_cursor() as cur:
cur.execute("""
SELECT
(SELECT count(*) FROM toxicity_scores) AS total_scored_posts,
(SELECT count(*) FROM mention_toxicity_scores) AS total_scored_mentions,
(SELECT count(*) FROM toxicity_scores WHERE flagged) AS flagged_posts,
(SELECT count(*) FROM mention_toxicity_scores WHERE flagged) AS flagged_mentions,
(SELECT coalesce(avg(overall), 0) FROM toxicity_scores) AS avg_toxicity_posts,
(SELECT coalesce(avg(overall), 0) FROM mention_toxicity_scores) AS avg_toxicity_mentions,
(SELECT count(*) FROM posts WHERE post_type != 'repost' AND text IS NOT NULL AND text != '') AS total_posts,
(SELECT count(*) FROM mentions WHERE post_text IS NOT NULL AND post_text != '') AS total_mentions
""")
return dict(cur.fetchone())
def get_toxicity_trend(weeks: int = 12) -> list[dict]:
"""Get weekly average toxicity scores for trend chart.
Returns rows with: week, avg_post_toxicity, avg_mention_toxicity,
flagged_posts, flagged_mentions, post_count, mention_count.
"""
if not _check_toxicity_tables():
return []
with get_cursor() as cur:
cur.execute("""
WITH weeks AS (
SELECT generate_series(
date_trunc('week', now() - make_interval(weeks => %s)),
date_trunc('week', now()),
'1 week'::interval
) AS week_start
),
post_stats AS (
SELECT date_trunc('week', p.created_at) AS week_start,
avg(ts.overall) AS avg_tox,
count(*) FILTER (WHERE ts.flagged) AS flagged_count,
count(*) AS total
FROM toxicity_scores ts
JOIN posts p ON p.uri = ts.uri
WHERE p.created_at >= now() - make_interval(weeks => %s)
GROUP BY 1
),
mention_stats AS (
SELECT date_trunc('week', m.post_created_at) AS week_start,
avg(mts.overall) AS avg_tox,
count(*) FILTER (WHERE mts.flagged) AS flagged_count,
count(*) AS total
FROM mention_toxicity_scores mts
JOIN mentions m ON m.id = mts.mention_id
WHERE m.post_created_at >= now() - make_interval(weeks => %s)
GROUP BY 1
)
SELECT w.week_start AS week,
coalesce(ps.avg_tox, 0) AS avg_post_toxicity,
coalesce(ms.avg_tox, 0) AS avg_mention_toxicity,
coalesce(ps.flagged_count, 0) AS flagged_posts,
coalesce(ms.flagged_count, 0) AS flagged_mentions,
coalesce(ps.total, 0) AS post_count,
coalesce(ms.total, 0) AS mention_count
FROM weeks w
LEFT JOIN post_stats ps ON ps.week_start = w.week_start
LEFT JOIN mention_stats ms ON ms.week_start = w.week_start
ORDER BY w.week_start
""", (weeks, weeks, weeks))
return [dict(r) for r in cur.fetchall()]
def get_category_averages() -> dict:
"""Get average score for each toxicity category across all scored items."""
if not _check_toxicity_tables():
return {cat: 0.0 for cat in TOXICITY_CATEGORIES}
cols = ", ".join(f"coalesce(avg({cat}), 0) AS {cat}" for cat in TOXICITY_CATEGORIES)
with get_cursor() as cur:
cur.execute(f"""
SELECT {cols}
FROM (
SELECT {', '.join(TOXICITY_CATEGORIES)} FROM toxicity_scores
UNION ALL
SELECT {', '.join(TOXICITY_CATEGORIES)} FROM mention_toxicity_scores
) combined
""")
return dict(cur.fetchone())
def get_recent_analysis_runs(limit: int = 5) -> list[dict]:
"""Get the latest analysis runs."""
if not _check_toxicity_tables():
return []
with get_cursor() as cur:
cur.execute("""
SELECT id, started_at, finished_at, status, model,
posts_scored, mentions_scored, errors,
cost_usd, duration_secs
FROM analysis_runs
ORDER BY started_at DESC
LIMIT %s
""", (limit,))
return [dict(r) for r in cur.fetchall()]
def get_flagged_content(
content_type: str | None = None,
category: str | None = None,
account_did: str | None = None,
threshold: float = 0.5,
limit: int = 50,
offset: int = 0,
) -> tuple[list[dict], int]:
"""Get flagged posts and mentions combined.
Returns (items, total_count). Each item has:
item_type ('post', 'reply', or 'mention'), text, author info, scores.
"""
if not _check_toxicity_tables():
return [], 0
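# Build the UNION query. The category name is interpolated only after it has
# been checked against TOXICITY_CATEGORIES; the threshold is always bound as
# a query parameter rather than formatted into the SQL text.
post_conditions = "WHERE ts.overall >= %s"
mention_conditions = "WHERE mts.overall >= %s"
params_posts: list = [threshold]
params_mentions: list = [threshold]
if category and category in TOXICITY_CATEGORIES:
post_conditions += f" AND ts.{category} >= %s"
params_posts.append(threshold)
mention_conditions += f" AND mts.{category} >= %s"
params_mentions.append(threshold)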
if account_did:
post_conditions += " AND p.author_did = %s"
params_posts.append(account_did)
mention_conditions += " AND m.mentioned_did = %s"
params_mentions.append(account_did)
type_filter_post = ""
type_filter_mention = ""
if content_type == "mention":
type_filter_post = "AND false" # exclude posts
elif content_type in ("post", "reply"):
type_filter_mention = "AND false" # exclude mentions
if content_type == "post":
post_conditions += " AND p.post_type = 'post'"
elif content_type == "reply":
post_conditions += " AND p.post_type = 'reply'"
with get_cursor() as cur:
# Count
cur.execute(f"""
SELECT count(*) AS cnt FROM (
SELECT 1 FROM toxicity_scores ts
JOIN posts p ON p.uri = ts.uri
{post_conditions} {type_filter_post}
UNION ALL
SELECT 1 FROM mention_toxicity_scores mts
JOIN mentions m ON m.id = mts.mention_id
{mention_conditions} {type_filter_mention}
) sub
""", params_posts + params_mentions)
total = cur.fetchone()["cnt"]
# Paginated results
cur.execute(f"""
SELECT * FROM (
SELECT
'post' AS source_type,
p.post_type AS item_type,
p.uri AS item_id,
p.text,
p.author_did,
a.handle AS author_handle,
NULL::text AS mentioned_did,
NULL::text AS mentioned_handle,
p.created_at,
ts.overall, ts.toxic, ts.threat, ts.hate_speech,
ts.racism, ts.antisemitism, ts.islamophobia,
ts.sexism, ts.homophobia, ts.insult, ts.dehumanization,
ts.extremism, ts.ableism
FROM toxicity_scores ts
JOIN posts p ON p.uri = ts.uri
LEFT JOIN accounts a ON a.did = p.author_did
{post_conditions} {type_filter_post}
UNION ALL
SELECT
'mention' AS source_type,
'mention' AS item_type,
m.post_uri AS item_id,
m.post_text AS text,
m.mentioning_did AS author_did,
NULL AS author_handle,
m.mentioned_did,
ma.handle AS mentioned_handle,
m.post_created_at AS created_at,
mts.overall, mts.toxic, mts.threat, mts.hate_speech,
mts.racism, mts.antisemitism, mts.islamophobia,
mts.sexism, mts.homophobia, mts.insult, mts.dehumanization,
mts.extremism, mts.ableism
FROM mention_toxicity_scores mts
JOIN mentions m ON m.id = mts.mention_id
LEFT JOIN accounts ma ON ma.did = m.mentioned_did
{mention_conditions} {type_filter_mention}
) combined
ORDER BY overall DESC, created_at DESC
LIMIT %s OFFSET %s
""", params_posts + params_mentions + [limit, offset])
rows = [dict(r) for r in cur.fetchall()]
# Determine top category for each row
for row in rows:
# Treat NULL scores as 0 so max() never compares None with a float.
top_cat = max(TOXICITY_CATEGORIES, key=lambda c: row.get(c) or 0)
row["top_category"] = top_cat
row["top_score"] = row.get(top_cat) or 0
return rows, total


def get_account_toxicity_summary(
    sort: str = "mention_tox",
    direction: str = "desc",
    limit: int = 50,
    offset: int = 0,
) -> tuple[list[dict], int]:
    """Get per-account toxicity summary.

    Returns (accounts, total).
    """
    if not _check_toxicity_tables():
        return [], 0

    sort_cols = {
        "handle": "a.handle",
        "post_tox": "avg_post_tox",
        "mention_tox": "avg_mention_tox",
        "flagged_posts": "flagged_posts",
        "flagged_mentions": "flagged_mentions",
    }
    sort_col = sort_cols.get(sort, "avg_mention_tox")
    dir_sql = "DESC" if direction == "desc" else "ASC"

    with get_cursor() as cur:
        cur.execute("SELECT count(*) AS cnt FROM accounts WHERE active")
        total = cur.fetchone()["cnt"]

        cur.execute(f"""
            SELECT
                a.did, a.handle, a.display_name,
                coalesce(post_agg.avg_tox, 0) AS avg_post_tox,
                coalesce(post_agg.flagged, 0) AS flagged_posts,
                coalesce(post_agg.total, 0) AS scored_posts,
                coalesce(mention_agg.avg_tox, 0) AS avg_mention_tox,
                coalesce(mention_agg.flagged, 0) AS flagged_mentions,
                coalesce(mention_agg.total, 0) AS scored_mentions
            FROM accounts a
            LEFT JOIN (
                SELECT p.author_did,
                       avg(ts.overall) AS avg_tox,
                       count(*) FILTER (WHERE ts.flagged) AS flagged,
                       count(*) AS total
                FROM toxicity_scores ts
                JOIN posts p ON p.uri = ts.uri
                GROUP BY p.author_did
            ) post_agg ON post_agg.author_did = a.did
            LEFT JOIN (
                SELECT m.mentioned_did,
                       avg(mts.overall) AS avg_tox,
                       count(*) FILTER (WHERE mts.flagged) AS flagged,
                       count(*) AS total
                FROM mention_toxicity_scores mts
                JOIN mentions m ON m.id = mts.mention_id
                GROUP BY m.mentioned_did
            ) mention_agg ON mention_agg.mentioned_did = a.did
            WHERE a.active = true
            ORDER BY {sort_col} {dir_sql} NULLS LAST
            LIMIT %s OFFSET %s
        """, (limit, offset))
        rows = [dict(r) for r in cur.fetchall()]

    return rows, total
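
The sort handling in `get_account_toxicity_summary` interpolates a column name into the SQL text, which is only safe because the name comes from a fixed mapping and never directly from the request. A minimal sketch of that pattern in isolation follows. `build_order_by` and `SORT_COLUMNS` are hypothetical names used for illustration and are not part of this commit.

```python
# Hypothetical helper, not part of the commit: maps user-facing sort keys to
# SQL fragments so that request input is never interpolated into the query.
SORT_COLUMNS = {
    "handle": "a.handle",
    "post_tox": "avg_post_tox",
    "mention_tox": "avg_mention_tox",
    "flagged_posts": "flagged_posts",
    "flagged_mentions": "flagged_mentions",
}


def build_order_by(sort: str, direction: str, default: str = "mention_tox") -> str:
    """Return an ORDER BY clause assembled only from whitelisted fragments."""
    # Unknown sort keys fall back to the default column
    column = SORT_COLUMNS.get(sort, SORT_COLUMNS[default])
    # Anything other than "desc" is treated as ascending
    dir_sql = "DESC" if direction == "desc" else "ASC"
    return f"ORDER BY {column} {dir_sql} NULLS LAST"
```

Values such as `limit` and `offset` still go through query parameters (`%s`), since only identifiers and keywords need the whitelist treatment.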

109
src/web/helpers.py Normal file

@ -0,0 +1,109 @@
"""Utility functions for the web UI: URI encoding, date formatting, link building."""
from __future__ import annotations
import base64
import re
from datetime import datetime, timezone
# ── URI encoding for route parameters ────────────────────────────────────
# AT URIs look like: at://did:plc:xxx/app.bsky.feed.post/rkey
# They contain / and : which break URL routing, so we base64url-encode them.
def encode_uri(uri: str) -> str:
"""Base64url-encode an AT URI for use in URL paths."""
return base64.urlsafe_b64encode(uri.encode()).decode().rstrip("=")
def decode_uri(encoded: str) -> str:
"""Decode a base64url-encoded AT URI."""
# Add back padding
padding = 4 - len(encoded) % 4
if padding != 4:
encoded += "=" * padding
return base64.urlsafe_b64decode(encoded.encode()).decode()


# ── Bluesky link construction ────────────────────────────────────────────

_AT_URI_RE = re.compile(r"at://([^/]+)/app\.bsky\.feed\.post/(.+)")


def bsky_post_url(uri: str, handle: str | None = None) -> str | None:
    """Build a bsky.app URL from an AT URI.

    If handle is provided, uses it for a nicer URL. Otherwise uses the DID.
    Returns None if the URI doesn't match expected format.
    """
    m = _AT_URI_RE.match(uri)
    if not m:
        return None
    did_or_handle = handle or m.group(1)
    rkey = m.group(2)
    return f"https://bsky.app/profile/{did_or_handle}/post/{rkey}"


def extract_rkey(uri: str) -> str | None:
    """Extract the record key from an AT URI."""
    m = _AT_URI_RE.match(uri)
    return m.group(2) if m else None


# ── Date/time formatting ─────────────────────────────────────────────────


def format_dt(dt: datetime | None, fmt: str = "%Y-%m-%d %H:%M") -> str:
    """Format a datetime for display; returns an em dash placeholder if None."""
    if dt is None:
        return "\u2014"
    return dt.strftime(fmt)


def time_ago(dt: datetime | None) -> str:
    """Return a human-readable 'time ago' string (em dash placeholder if None)."""
    if dt is None:
        return "\u2014"
    now = datetime.now(timezone.utc)
    # Naive datetimes are assumed to be UTC
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    delta = now - dt
    seconds = int(delta.total_seconds())
    if seconds < 60:
        return "just now"
    elif seconds < 3600:
        m = seconds // 60
        return f"{m}m ago"
    elif seconds < 86400:
        h = seconds // 3600
        return f"{h}h ago"
    elif seconds < 604800:
        d = seconds // 86400
        return f"{d}d ago"
    else:
        return dt.strftime("%b %d, %Y")


# ── Text helpers ─────────────────────────────────────────────────────────


def truncate(text: str | None, length: int = 200) -> str:
    """Truncate text to a max length, adding ellipsis if needed."""
    if not text:
        return ""
    if len(text) <= length:
        return text
    return text[:length].rsplit(" ", 1)[0] + "\u2026"


def format_number(n: int | None) -> str:
    """Format a number with K/M suffixes for display."""
    if n is None:
        return "0"
    if n >= 1_000_000:
        return f"{n / 1_000_000:.1f}M"
    if n >= 1_000:
        return f"{n / 1_000:.1f}K"
    return str(n)
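
The URI helpers in this file strip the `=` padding on encode and restore it on decode. The sketch below restates both functions in a self-contained form to show the round trip. The DID and record key in the sample URI are made up, and the padding expression is an equivalent shorthand, not the exact code from the file.

```python
import base64


def encode_uri(uri: str) -> str:
    """Base64url-encode and drop the trailing '=' padding."""
    return base64.urlsafe_b64encode(uri.encode()).decode().rstrip("=")


def decode_uri(encoded: str) -> str:
    """Restore the padding to a multiple of 4 characters, then decode."""
    encoded += "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(encoded.encode()).decode()


# Sample AT URI with a made-up DID and record key
uri = "at://did:plc:abc123/app.bsky.feed.post/3kxyz"
token = encode_uri(uri)
restored = decode_uri(token)
```

The token uses only the base64url alphabet (`A-Z`, `a-z`, `0-9`, `-`, `_`), so it is safe as a single URL path segment.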


@ -0,0 +1,53 @@
"""Flask blueprint for the accounts listing page."""
from __future__ import annotations
from flask import Blueprint, render_template, request
from ..db import get_accounts
bp = Blueprint("accounts", __name__, url_prefix="/accounts")
@bp.route("/")
def index():
"""List tracked accounts with search, sorting, and pagination."""
# Query parameters
search = request.args.get("search", "").strip() or None
sort = request.args.get("sort", "handle")
direction = request.args.get("dir", "asc")
page = max(1, request.args.get("page", 1, type=int))
per_page = 50
# Validate sort column
allowed_sorts = {"handle", "posts", "mentions", "last_feed", "last_mention"}
if sort not in allowed_sorts:
sort = "handle"
# Validate direction
if direction not in ("asc", "desc"):
direction = "asc"
accounts, total = get_accounts(
search=search,
sort=sort,
direction=direction,
limit=per_page,
offset=(page - 1) * per_page,
)
total_pages = max(1, (total + per_page - 1) // per_page)
# Clamp page to valid range
if page > total_pages:
page = total_pages
return render_template(
"accounts.html",
accounts=accounts,
total=total,
page=page,
total_pages=total_pages,
search=search or "",
sort=sort,
direction=direction,
)
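
The route above computes `total_pages` with ceiling division and clamps `page` only after the query has run, so a request for an out-of-range page shows the last page number with an empty result set. A minimal sketch of the same arithmetic with the clamp applied before the offset is computed is shown here. `paginate` is a hypothetical helper, and it assumes the total row count is known before the page query runs, which is not how `get_accounts` works in this commit.

```python
def paginate(total: int, page: int, per_page: int = 50) -> tuple[int, int, int]:
    """Return (page, total_pages, offset) with page clamped to the valid range."""
    # Ceiling division; an empty result set still has one (empty) page
    total_pages = max(1, (total + per_page - 1) // per_page)
    # Clamp before computing the offset so the query targets a real page
    page = min(max(1, page), total_pages)
    return page, total_pages, (page - 1) * per_page
```

For example, `paginate(101, 99)` yields page 3 of 3 with offset 100, where the unclamped offset would have been 4900.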

134
src/web/routes/analysis.py Normal file

@ -0,0 +1,134 @@
"""Analysis dashboard routes: toxicity overview, flagged content, account breakdown."""
from __future__ import annotations
import json
from flask import Blueprint, render_template, request
from ..db import (
TOXICITY_CATEGORIES,
get_account_toxicity_summary,
get_accounts_for_select,
get_analysis_stats,
get_category_averages,
get_flagged_content,
get_recent_analysis_runs,
get_toxicity_trend,
)
bp = Blueprint("analysis", __name__, url_prefix="/analysis")
@bp.route("/")
def index():
stats = get_analysis_stats()
trend = get_toxicity_trend(weeks=12)
categories = get_category_averages()
runs = get_recent_analysis_runs(limit=5)
# Prepare chart data as JSON for Chart.js
trend_json = json.dumps([
{
"week": r["week"].strftime("%Y-%m-%d") if r["week"] else "",
"avg_post_toxicity": round(float(r["avg_post_toxicity"]), 4),
"avg_mention_toxicity": round(float(r["avg_mention_toxicity"]), 4),
"flagged_posts": int(r["flagged_posts"]),
"flagged_mentions": int(r["flagged_mentions"]),
}
for r in trend
])
categories_json = json.dumps({
k: round(float(v), 4) for k, v in categories.items()
})
return render_template(
"analysis.html",
stats=stats,
trend_json=trend_json,
categories_json=categories_json,
categories=TOXICITY_CATEGORIES,
runs=runs,
)
@bp.route("/flagged")
def flagged():
content_type = request.args.get("type") or None
category = request.args.get("category") or None
account_did = request.args.get("account") or None
threshold = request.args.get("threshold", 0.5, type=float)
page = max(1, request.args.get("page", 1, type=int))
per_page = 50
items, total = get_flagged_content(
content_type=content_type,
category=category,
account_did=account_did,
threshold=threshold,
limit=per_page,
offset=(page - 1) * per_page,
)
total_pages = max(1, (total + per_page - 1) // per_page)
accounts = get_accounts_for_select()
return render_template(
"flagged.html",
items=items,
total=total,
page=page,
total_pages=total_pages,
accounts=accounts,
categories=TOXICITY_CATEGORIES,
content_type=content_type or "",
category=category or "",
account_did=account_did or "",
threshold=threshold,
)
@bp.route("/accounts")
def accounts():
sort = request.args.get("sort", "mention_tox")
direction = request.args.get("dir", "desc")
page = max(1, request.args.get("page", 1, type=int))
per_page = 50
# Validate
valid_sorts = {"handle", "post_tox", "mention_tox", "flagged_posts", "flagged_mentions"}
if sort not in valid_sorts:
sort = "mention_tox"
if direction not in ("asc", "desc"):
direction = "desc"
rows, total = get_account_toxicity_summary(
sort=sort, direction=direction,
limit=per_page, offset=(page - 1) * per_page,
)
total_pages = max(1, (total + per_page - 1) // per_page)
# Top 20 most-targeted for bar chart
top_targeted, _ = get_account_toxicity_summary(
sort="mention_tox", direction="desc", limit=20, offset=0,
)
top_targeted_json = json.dumps([
{
"handle": r["handle"],
"avg_mention_tox": round(float(r["avg_mention_tox"]), 4),
"flagged_mentions": int(r["flagged_mentions"]),
}
for r in top_targeted
if float(r["avg_mention_tox"]) > 0
])
return render_template(
"account_toxicity.html",
accounts=rows,
total=total,
page=page,
total_pages=total_pages,
sort=sort,
direction=direction,
top_targeted_json=top_targeted_json,
)
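
These routes serialize chart data with `json.dumps`, and the templates print it inside a `<script>` element with `| safe`. Plain `json.dumps` does not escape `<`, so a handle or other string containing `</script>` would end the script block early. A minimal sketch of an escaping step follows. `json_for_script` is a hypothetical helper, not part of this commit. Flask's built-in `tojson` template filter is intended for the same purpose and may be the simpler choice if the route passes the raw list instead of a pre-serialized string.

```python
import json


def json_for_script(value) -> str:
    """Serialize to JSON and escape characters that could break out of <script>."""
    return (
        json.dumps(value)
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


# A deliberately hostile sample value
sample = [{"handle": "</script><b>x", "avg_mention_tox": 0.5}]
payload = json_for_script(sample)
```

The escaped text is still valid JSON and valid JavaScript, because `\u003c` inside a string literal decodes back to `<` when parsed.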


@ -0,0 +1,12 @@
from flask import Blueprint, render_template

from ..db import get_dashboard_stats, get_recent_runs

bp = Blueprint("dashboard", __name__)


@bp.route("/")
def index():
    stats = get_dashboard_stats()
    runs = get_recent_runs(limit=10)
    return render_template("dashboard.html", stats=stats, runs=runs)

91
src/web/routes/export.py Normal file

@ -0,0 +1,91 @@
"""Export routes: CSV download for posts and mentions."""
from __future__ import annotations
import csv
import io
from datetime import datetime, timezone
from flask import Blueprint, Response, render_template, request, stream_with_context
from ..db import get_accounts_for_select, iter_mentions_csv, iter_posts_csv
bp = Blueprint("export", __name__, url_prefix="/export")
@bp.route("/")
def index():
accounts = get_accounts_for_select()
return render_template("export.html", accounts=accounts)
@bp.route("/posts.csv")
def posts_csv():
account_did = request.args.get("account") or None
since = request.args.get("since") or None
until = request.args.get("until") or None
def generate():
output = io.StringIO()
writer = csv.writer(output)
# Header row
header = [
"uri", "author_did", "author_handle", "text", "post_type",
"created_at", "like_count", "reply_count", "repost_count",
"quote_count", "has_media", "has_embed", "reply_parent", "reply_root",
]
writer.writerow(header)
yield output.getvalue()
output.seek(0)
output.truncate(0)
for row in iter_posts_csv(account_did=account_did, since=since, until=until):
writer.writerow([row.get(col, "") for col in header])
yield output.getvalue()
output.seek(0)
output.truncate(0)
timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
filename = f"bluesky_posts_{timestamp}.csv"
return Response(
stream_with_context(generate()),
mimetype="text/csv",
headers={"Content-Disposition": f"attachment; filename={filename}"},
)
@bp.route("/mentions.csv")
def mentions_csv():
mentioned_did = request.args.get("account") or None
since = request.args.get("since") or None
until = request.args.get("until") or None
def generate():
output = io.StringIO()
writer = csv.writer(output)
header = [
"post_uri", "mentioned_did", "mentioned_handle",
"mentioning_did", "post_text", "post_created_at",
]
writer.writerow(header)
yield output.getvalue()
output.seek(0)
output.truncate(0)
for row in iter_mentions_csv(mentioned_did=mentioned_did, since=since, until=until):
writer.writerow([row.get(col, "") for col in header])
yield output.getvalue()
output.seek(0)
output.truncate(0)
timestamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
filename = f"bluesky_mentions_{timestamp}.csv"
return Response(
stream_with_context(generate()),
mimetype="text/csv",
headers={"Content-Disposition": f"attachment; filename={filename}"},
)
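
Both export routes repeat the same streaming pattern: write one row into an in-memory buffer, yield its text, then reset the buffer so memory use stays constant however many rows are exported. A minimal sketch of that pattern factored into one generator is shown here. `stream_csv` is a hypothetical helper that is not part of this commit, and the sample row is made up.

```python
import csv
import io
from typing import Iterable, Iterator


def stream_csv(header: list[str], rows: Iterable[dict]) -> Iterator[str]:
    """Yield CSV text one row at a time, reusing a single in-memory buffer."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)

    def flush() -> str:
        # Take what was just written, then reset the buffer for the next row
        chunk = buffer.getvalue()
        buffer.seek(0)
        buffer.truncate(0)
        return chunk

    writer.writerow(header)
    yield flush()
    for row in rows:
        # Missing columns become empty fields
        writer.writerow([row.get(col, "") for col in header])
        yield flush()


# Made-up sample row containing a comma and quotes
chunks = list(stream_csv(["uri", "text"], [{"uri": "at://x", "text": 'say "hi", ok'}]))
```

With a helper like this, each route would reduce to building its header list and passing the database iterator to `stream_with_context(stream_csv(header, rows))`.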


@ -0,0 +1,26 @@
from flask import Blueprint, render_template, request

from ..db import get_mentions, get_accounts_for_select

bp = Blueprint("mentions", __name__, url_prefix="/mentions")


@bp.route("/")
def index():
    mentioned_did = request.args.get("account") or None
    search = request.args.get("search", "").strip() or None
    page = max(1, request.args.get("page", 1, type=int))
    per_page = 50

    mentions, total = get_mentions(
        mentioned_did=mentioned_did, search=search,
        limit=per_page, offset=(page - 1) * per_page,
    )
    total_pages = max(1, (total + per_page - 1) // per_page)
    accounts = get_accounts_for_select()

    return render_template(
        "mentions.html",
        mentions=mentions, total=total,
        page=page, total_pages=total_pages,
        accounts=accounts,
        mentioned_did=mentioned_did or "", search=search or "",
    )


@ -0,0 +1,45 @@
from flask import Blueprint, render_template, request, abort

from ..db import get_posts, get_post_by_uri, get_replies_to, get_accounts_for_select
from ..helpers import decode_uri

bp = Blueprint("statuses", __name__, url_prefix="/statuses")


@bp.route("/")
def index():
    account_did = request.args.get("account") or None
    post_type = request.args.get("type") or None
    search = request.args.get("search", "").strip() or None
    sort = request.args.get("sort", "created")
    direction = request.args.get("dir", "desc")
    page = max(1, request.args.get("page", 1, type=int))
    per_page = 50

    posts, total = get_posts(
        account_did=account_did, post_type=post_type, search=search,
        sort=sort, direction=direction,
        limit=per_page, offset=(page - 1) * per_page,
    )
    total_pages = max(1, (total + per_page - 1) // per_page)
    accounts = get_accounts_for_select()

    return render_template(
        "statuses.html",
        posts=posts, total=total,
        page=page, total_pages=total_pages,
        accounts=accounts,
        account_did=account_did or "", post_type=post_type or "",
        search=search or "", sort=sort, direction=direction,
    )
@bp.route("/<encoded_uri>")
def detail(encoded_uri):
uri = decode_uri(encoded_uri)
post = get_post_by_uri(uri)
if not post:
abort(404)
replies = get_replies_to(uri)
# Get parent post if this is a reply
parent = None
if post.get("reply_parent"):
parent = get_post_by_uri(post["reply_parent"])
return render_template("status_detail.html", post=post, replies=replies, parent=parent)


@ -0,0 +1,621 @@
{% extends "base.html" %}
{% block title %}Account Toxicity Analysis{% endblock %}
{% macro sort_header(col, label) %}
{% set new_dir = 'desc' if (sort == col and direction == 'asc') else 'asc' %}
<a href="{{ url_for('analysis.accounts', sort=col, dir=new_dir, page=1) }}" class="sort-link{% if sort == col %} active{% endif %}">
{{ label }}
{% if sort == col %}
<span class="sort-arrow">{% if direction == 'asc' %}&#9650;{% else %}&#9660;{% endif %}</span>
{% endif %}
</a>
{% endmacro %}
{% block content %}
<div class="account-toxicity-container">
<!-- Page Header -->
<div class="page-header">
<div>
<h1>Account Toxicity Analysis</h1>
<p class="page-subtitle">Toxicity metrics across monitored accounts</p>
</div>
</div>
<!-- Chart Section -->
<div class="chart-section">
<div class="chart-card">
<h2 class="chart-title">Most Targeted Accounts</h2>
<p class="chart-subtitle">Average mention toxicity for top 20 most-targeted accounts</p>
<div class="chart-container">
<canvas id="toxicity-chart"></canvas>
</div>
</div>
</div>
<!-- Table Section -->
<div class="table-section">
<h2 class="section-title">Account Details</h2>
{% if accounts %}
<div class="table-wrapper">
<table class="accounts-table">
<thead>
<tr>
<th>{{ sort_header('handle', 'Account') }}</th>
<th>{{ sort_header('post_tox', 'Avg Post Toxicity') }}</th>
<th>{{ sort_header('flagged_posts', 'Flagged Posts') }}</th>
<th>{{ sort_header('mention_tox', 'Avg Mention Toxicity') }}</th>
<th>{{ sort_header('flagged_mentions', 'Flagged Mentions') }}</th>
</tr>
</thead>
<tbody>
{% for account in accounts %}
<tr class="account-row">
<!-- Account Name -->
<td class="col-account">
<div class="account-info">
<a href="https://bsky.app/profile/{{ account.handle }}" target="_blank" rel="noopener" class="account-handle">
@{{ account.handle }}
</a>
{% if account.display_name %}
<div class="account-display-name">{{ account.display_name }}</div>
{% endif %}
</div>
</td>
<!-- Avg Post Toxicity -->
<td class="col-score">
<div class="score-bar-container">
{% set post_pct = (account.avg_post_tox * 100) | int %}
{% if account.avg_post_tox < 0.3 %}
{% set bar_class = 'score-bar-low' %}
{% elif account.avg_post_tox < 0.6 %}
{% set bar_class = 'score-bar-medium' %}
{% else %}
{% set bar_class = 'score-bar-high' %}
{% endif %}
<div class="score-bar {{ bar_class }}" style="width: {{ post_pct }}%"></div>
<span class="score-number">{{ "%.2f" | format(account.avg_post_tox) }}</span>
</div>
</td>
<!-- Flagged Posts Count -->
<td class="col-count">
<span class="count-badge">
{{ account.flagged_posts | format_number }}
<span class="count-total">/ {{ account.scored_posts | format_number }}</span>
</span>
</td>
<!-- Avg Mention Toxicity -->
<td class="col-score">
<div class="score-bar-container">
{% set mention_pct = (account.avg_mention_tox * 100) | int %}
{% if account.avg_mention_tox < 0.3 %}
{% set bar_class = 'score-bar-low' %}
{% elif account.avg_mention_tox < 0.6 %}
{% set bar_class = 'score-bar-medium' %}
{% else %}
{% set bar_class = 'score-bar-high' %}
{% endif %}
<div class="score-bar {{ bar_class }}" style="width: {{ mention_pct }}%"></div>
<span class="score-number">{{ "%.2f" | format(account.avg_mention_tox) }}</span>
</div>
</td>
<!-- Flagged Mentions Count -->
<td class="col-count">
<span class="count-badge">
{{ account.flagged_mentions | format_number }}
<span class="count-total">/ {{ account.scored_mentions | format_number }}</span>
</span>
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
<!-- Pagination -->
{% if total_pages > 1 %}
<div class="pagination">
{% if page > 1 %}
<a href="{{ url_for('analysis.accounts', page=1, sort=sort, dir=direction) }}" class="btn-pagination">First</a>
<a href="{{ url_for('analysis.accounts', page=page-1, sort=sort, dir=direction) }}" class="btn-pagination">Previous</a>
{% endif %}
<span class="pagination-info">Page {{ page }} of {{ total_pages }}</span>
{% if page < total_pages %}
<a href="{{ url_for('analysis.accounts', page=page+1, sort=sort, dir=direction) }}" class="btn-pagination">Next</a>
<a href="{{ url_for('analysis.accounts', page=total_pages, sort=sort, dir=direction) }}" class="btn-pagination">Last</a>
{% endif %}
</div>
{% endif %}
{% else %}
<!-- Empty State -->
<div class="empty-state">
<p class="empty-icon"></p>
<p class="empty-text">No accounts found</p>
<p class="empty-subtext">Start monitoring accounts to see toxicity analysis</p>
</div>
{% endif %}
</div>
</div>
{% endblock %}
{% block extra_css %}
<style>
:root {
--dark-bg: #1a1a2e;
--dark-card: #16213e;
--dark-nav: #0f3460;
--dark-text: #e0e0e0;
--accent-primary: #00b4d8;
--tox-low: #2ecc71;
--tox-medium: #f39c12;
--tox-high: #e74c3c;
}
.account-toxicity-container {
padding: 2rem;
max-width: 1400px;
margin: 0 auto;
}
/* Page Header */
.page-header {
margin-bottom: 2.5rem;
}
.page-header h1 {
font-size: 2rem;
font-weight: 700;
color: var(--dark-text);
margin: 0 0 0.5rem 0;
}
.page-subtitle {
font-size: 1rem;
color: rgba(255, 255, 255, 0.6);
margin: 0;
}
/* Chart Section */
.chart-section {
margin-bottom: 3rem;
}
.chart-card {
background: var(--dark-card);
border: 1px solid rgba(255, 255, 255, 0.1);
border-radius: 0.5rem;
padding: 2rem;
}
.chart-title {
font-size: 1.3rem;
font-weight: 600;
color: var(--dark-text);
margin: 0 0 0.5rem 0;
}
.chart-subtitle {
font-size: 0.9rem;
color: rgba(255, 255, 255, 0.5);
margin: 0 0 1.5rem 0;
}
.chart-container {
position: relative;
height: 400px;
width: 100%;
}
/* Table Section */
.table-section {
margin-top: 2rem;
}
.section-title {
font-size: 1.3rem;
font-weight: 600;
color: var(--dark-text);
margin: 0 0 1.5rem 0;
}
/* Table Wrapper */
.table-wrapper {
background: var(--dark-card);
border: 1px solid rgba(255, 255, 255, 0.1);
border-radius: 0.5rem;
overflow-x: auto;
margin-bottom: 2rem;
}
/* Table Styles */
.accounts-table {
width: 100%;
border-collapse: collapse;
font-size: 0.9rem;
}
.accounts-table thead {
background: rgba(0, 0, 0, 0.3);
border-bottom: 2px solid rgba(255, 255, 255, 0.1);
}
.accounts-table th {
padding: 1rem;
text-align: left;
font-weight: 600;
color: var(--dark-text);
white-space: nowrap;
}
.accounts-table td {
padding: 1rem;
border-bottom: 1px solid rgba(255, 255, 255, 0.05);
color: var(--dark-text);
}
.accounts-table tbody tr:hover {
background: rgba(0, 180, 216, 0.05);
}
/* Column Styles */
.col-account {
min-width: 250px;
}
.col-score {
width: 180px;
}
.col-count {
width: 140px;
text-align: center;
}
/* Account Info */
.account-info {
display: flex;
flex-direction: column;
gap: 0.25rem;
}
.account-handle {
color: var(--accent-primary);
text-decoration: none;
font-weight: 600;
transition: color 0.2s ease;
}
.account-handle:hover {
color: rgba(0, 180, 216, 0.8);
text-decoration: underline;
}
.account-display-name {
font-size: 0.85rem;
color: rgba(255, 255, 255, 0.6);
}
/* Sort Header */
.sort-link {
color: var(--dark-text);
text-decoration: none;
display: inline-flex;
align-items: center;
gap: 0.5rem;
transition: color 0.2s ease;
cursor: pointer;
}
.sort-link:hover {
color: var(--accent-primary);
}
.sort-link.active {
color: var(--accent-primary);
font-weight: 700;
}
.sort-arrow {
font-size: 0.75rem;
display: inline-block;
}
/* Score Bar */
.score-bar-container {
position: relative;
height: 30px;
background: rgba(255, 255, 255, 0.05);
border-radius: 0.25rem;
overflow: hidden;
display: flex;
align-items: center;
padding: 0 0.5rem;
}
.score-bar {
position: absolute;
height: 100%;
left: 0;
top: 0;
transition: width 0.3s ease;
}
.score-bar-low {
background: linear-gradient(90deg, rgba(46, 204, 113, 0.3), rgba(46, 204, 113, 0.5));
}
.score-bar-medium {
background: linear-gradient(90deg, rgba(243, 156, 18, 0.3), rgba(243, 156, 18, 0.5));
}
.score-bar-high {
background: linear-gradient(90deg, rgba(231, 76, 60, 0.3), rgba(231, 76, 60, 0.5));
}
.score-number {
position: relative;
z-index: 1;
font-weight: 600;
color: var(--dark-text);
font-size: 0.85rem;
}
/* Count Badge */
.count-badge {
display: inline-flex;
align-items: center;
gap: 0.25rem;
background: rgba(0, 180, 216, 0.1);
color: var(--accent-primary);
padding: 0.35rem 0.75rem;
border-radius: 0.25rem;
font-weight: 600;
font-size: 0.85rem;
}
.count-total {
color: rgba(255, 255, 255, 0.5);
font-weight: 400;
}
/* Empty State */
.empty-state {
text-align: center;
padding: 4rem 2rem;
background: var(--dark-card);
border: 2px dashed rgba(255, 255, 255, 0.2);
border-radius: 0.5rem;
}
.empty-icon {
font-size: 3rem;
color: rgba(255, 255, 255, 0.2);
margin: 0 0 1rem 0;
}
.empty-text {
font-size: 1.2rem;
font-weight: 600;
color: var(--dark-text);
margin: 0 0 0.5rem 0;
}
.empty-subtext {
color: rgba(255, 255, 255, 0.5);
margin: 0;
}
/* Pagination */
.pagination {
display: flex;
justify-content: center;
align-items: center;
gap: 1rem;
margin-top: 2rem;
}
.pagination-info {
color: var(--dark-text);
font-weight: 500;
min-width: 150px;
text-align: center;
}
.btn-pagination {
background: var(--dark-card);
color: var(--accent-primary);
border: 1px solid var(--accent-primary);
padding: 0.5rem 1rem;
border-radius: 0.375rem;
text-decoration: none;
font-weight: 600;
font-size: 0.9rem;
transition: all 0.2s ease;
cursor: pointer;
}
.btn-pagination:hover {
background: var(--accent-primary);
color: var(--dark-bg);
}
.btn-pagination:active {
transform: scale(0.98);
}
/* Chart.js Custom Styling */
canvas {
max-height: 400px;
}
/* Responsive */
@media (max-width: 1024px) {
.col-account {
min-width: 200px;
}
.col-score {
width: 150px;
}
.col-count {
width: 120px;
}
}
@media (max-width: 768px) {
.account-toxicity-container {
padding: 1rem;
}
.page-header h1 {
font-size: 1.5rem;
}
.chart-container {
height: 300px;
}
.chart-card {
padding: 1.5rem;
}
.accounts-table {
font-size: 0.8rem;
}
.accounts-table th,
.accounts-table td {
padding: 0.75rem 0.5rem;
}
.col-account {
min-width: 160px;
}
.col-score,
.col-count {
width: auto;
min-width: 100px;
}
.section-title {
font-size: 1.1rem;
}
}
@media (max-width: 480px) {
.page-header h1 {
font-size: 1.2rem;
}
.accounts-table {
font-size: 0.75rem;
}
.accounts-table th,
.accounts-table td {
padding: 0.5rem;
}
.pagination {
flex-wrap: wrap;
gap: 0.5rem;
}
.pagination-info {
width: 100%;
order: 3;
}
}
</style>
<script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/4.4.7/chart.umd.min.js"></script>
<script>
document.addEventListener('DOMContentLoaded', function() {
const chartData = {{ top_targeted_json | safe }};
if (chartData && chartData.length > 0) {
const labels = chartData.map(item => '@' + item.handle);
const scores = chartData.map(item => item.avg_mention_tox);
const flagged = chartData.map(item => item.flagged_mentions);
const ctx = document.getElementById('toxicity-chart');
if (ctx) {
new Chart(ctx, {
type: 'bar',
data: {
labels: labels,
datasets: [{
label: 'Avg Mention Toxicity',
data: scores,
backgroundColor: scores.map(score => {
if (score < 0.3) return 'rgba(46, 204, 113, 0.7)';
if (score < 0.6) return 'rgba(243, 156, 18, 0.7)';
return 'rgba(231, 76, 60, 0.7)';
}),
borderColor: scores.map(score => {
if (score < 0.3) return 'rgba(46, 204, 113, 1)';
if (score < 0.6) return 'rgba(243, 156, 18, 1)';
return 'rgba(231, 76, 60, 1)';
}),
borderWidth: 1,
borderRadius: 4
}]
},
options: {
indexAxis: 'y',
responsive: true,
maintainAspectRatio: false,
plugins: {
legend: {
display: false
},
tooltip: {
callbacks: {
afterLabel: function(context) {
return 'Flagged: ' + flagged[context.dataIndex];
}
}
}
},
scales: {
x: {
beginAtZero: true,
max: 1,
ticks: {
color: 'rgba(224, 224, 224, 0.7)',
callback: function(value) {
return (value * 100).toFixed(0) + '%';
}
},
grid: {
color: 'rgba(255, 255, 255, 0.1)'
},
border: {
display: false
}
},
y: {
ticks: {
color: 'rgba(224, 224, 224, 0.9)'
},
grid: {
display: false
},
border: {
display: false
}
}
}
}
});
}
}
});
</script>
{% endblock %}
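
This template applies the same toxicity bands in three places: the two score-bar cells and the Chart.js colour callbacks. In each, a score below 0.3 is low, a score below 0.6 is medium, and anything else is high. A minimal sketch of the banding as one function, which could be registered as a template filter, is shown here. `toxicity_level` is a hypothetical name and is not part of this commit.

```python
def toxicity_level(score: float) -> str:
    """Map a 0..1 toxicity score to the low/medium/high bands used by the UI."""
    if score < 0.3:
        return "low"
    if score < 0.6:
        return "medium"
    return "high"
```

The returned string matches the suffix of the existing CSS classes (`score-bar-low`, `score-bar-medium`, `score-bar-high`), so a template could build the class name from it directly.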


@ -0,0 +1,349 @@
{% extends "base.html" %}
{% block title %}Accounts{% endblock %}
{% block content %}
<div class="page-header">
<h1>Tracked Accounts</h1>
<span class="badge">{{ total | format_number }} total</span>
</div>
{# ── Search bar ──────────────────────────────────────────────────────── #}
<form method="get" action="{{ url_for('accounts.index') }}" class="search-form">
{# Preserve current sort parameters #}
<input type="hidden" name="sort" value="{{ sort }}">
<input type="hidden" name="dir" value="{{ direction }}">
<div class="search-row">
<input
type="text"
name="search"
value="{{ search }}"
placeholder="Search by handle..."
class="search-input"
aria-label="Search accounts"
>
<button type="submit" class="btn btn-primary">Search</button>
{% if search %}
<a href="{{ url_for('accounts.index', sort=sort, dir=direction) }}" class="btn btn-secondary">Clear</a>
{% endif %}
</div>
</form>
{# ── Sortable column header macro ────────────────────────────────────── #}
{% macro sort_header(col, label) %}
{% set new_dir = 'desc' if (sort == col and direction == 'asc') else 'asc' %}
<a href="{{ url_for('accounts.index', search=search, sort=col, dir=new_dir, page=1) }}" class="sort-link{% if sort == col %} active{% endif %}">
{{ label }}
{% if sort == col %}
<span class="sort-arrow">{% if direction == 'asc' %}&#9650;{% else %}&#9660;{% endif %}</span>
{% endif %}
</a>
{% endmacro %}
{# ── Accounts table ──────────────────────────────────────────────────── #}
{% if accounts %}
<div class="table-wrap">
<table class="data-table">
<thead>
<tr>
<th>{{ sort_header('handle', 'Handle') }}</th>
<th class="num">{{ sort_header('posts', 'Posts') }}</th>
<th class="num">{{ sort_header('mentions', 'Mentions') }}</th>
<th>{{ sort_header('last_feed', 'Last Feed') }}</th>
<th>{{ sort_header('last_mention', 'Last Mention') }}</th>
</tr>
</thead>
<tbody>
{% for acct in accounts %}
<tr>
<td class="handle-cell">
<a href="https://bsky.app/profile/{{ acct.handle }}" target="_blank" rel="noopener" class="handle-link">
@{{ acct.handle }}
</a>
{% if acct.display_name %}
<span class="display-name">{{ acct.display_name | truncate_text }}</span>
{% endif %}
</td>
<td class="num">
<a href="{{ url_for('statuses.index', account=acct.did) }}" class="count-link">
{{ acct.post_count | format_number }}
</a>
</td>
<td class="num">
<a href="{{ url_for('mentions.index', account=acct.did) }}" class="count-link">
{{ acct.mention_count | format_number }}
</a>
</td>
<td title="{{ acct.last_feed_collected | format_dt }}">
{{ acct.last_feed_collected | time_ago }}
</td>
<td title="{{ acct.last_mention_collected | format_dt }}">
{{ acct.last_mention_collected | time_ago }}
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
{% else %}
<div class="empty-state">
{% if search %}
<p>No accounts found matching "{{ search }}".</p>
{% else %}
<p>No tracked accounts yet.</p>
{% endif %}
</div>
{% endif %}
{# ── Pagination ──────────────────────────────────────────────────────── #}
{% if total_pages > 1 %}
<nav class="pagination" aria-label="Page navigation">
{# Previous button #}
{% if page > 1 %}
<a href="{{ url_for('accounts.index', search=search, sort=sort, dir=direction, page=page - 1) }}" class="page-link">&laquo; Previous</a>
{% else %}
<span class="page-link disabled">&laquo; Previous</span>
{% endif %}
{# Page numbers #}
{% set start_page = [1, page - 2] | max %}
{% set end_page = [total_pages, page + 2] | min %}
{% if start_page > 1 %}
<a href="{{ url_for('accounts.index', search=search, sort=sort, dir=direction, page=1) }}" class="page-link">1</a>
{% if start_page > 2 %}
<span class="page-ellipsis">&hellip;</span>
{% endif %}
{% endif %}
{% for p in range(start_page, end_page + 1) %}
{% if p == page %}
<span class="page-link current">{{ p }}</span>
{% else %}
<a href="{{ url_for('accounts.index', search=search, sort=sort, dir=direction, page=p) }}" class="page-link">{{ p }}</a>
{% endif %}
{% endfor %}
{% if end_page < total_pages %}
{% if end_page < total_pages - 1 %}
<span class="page-ellipsis">&hellip;</span>
{% endif %}
<a href="{{ url_for('accounts.index', search=search, sort=sort, dir=direction, page=total_pages) }}" class="page-link">{{ total_pages }}</a>
{% endif %}
{# Next button #}
{% if page < total_pages %}
<a href="{{ url_for('accounts.index', search=search, sort=sort, dir=direction, page=page + 1) }}" class="page-link">Next &raquo;</a>
{% else %}
<span class="page-link disabled">Next &raquo;</span>
{% endif %}
</nav>
{% endif %}
{% endblock %}
{% block extra_css %}
<style>
.page-header {
display: flex;
align-items: center;
gap: 1rem;
margin-bottom: 1.5rem;
}
.page-header h1 {
margin: 0;
font-size: 1.5rem;
}
.badge {
background: #0f3460;
color: #00b4d8;
padding: 0.25rem 0.75rem;
border-radius: 1rem;
font-size: 0.85rem;
}
/* Search */
.search-form {
margin-bottom: 1.5rem;
}
.search-row {
display: flex;
gap: 0.5rem;
align-items: center;
}
.search-input {
flex: 1;
max-width: 400px;
padding: 0.5rem 0.75rem;
border: 1px solid #2a2a4a;
border-radius: 0.375rem;
background: #1a1a2e;
color: #e0e0e0;
font-size: 0.95rem;
}
.search-input::placeholder {
color: #666;
}
.search-input:focus {
outline: none;
border-color: #00b4d8;
box-shadow: 0 0 0 2px rgba(0, 180, 216, 0.2);
}
.btn {
padding: 0.5rem 1rem;
border: none;
border-radius: 0.375rem;
cursor: pointer;
font-size: 0.9rem;
text-decoration: none;
display: inline-block;
}
.btn-primary {
background: #00b4d8;
color: #1a1a2e;
font-weight: 600;
}
.btn-primary:hover {
background: #0096b7;
}
.btn-secondary {
background: #2a2a4a;
color: #e0e0e0;
}
.btn-secondary:hover {
background: #3a3a5a;
}
/* Table */
.table-wrap {
overflow-x: auto;
border-radius: 0.5rem;
background: #16213e;
border: 1px solid #2a2a4a;
}
.data-table {
width: 100%;
border-collapse: collapse;
font-size: 0.9rem;
}
.data-table thead {
background: #0f3460;
}
.data-table th {
padding: 0.75rem 1rem;
text-align: left;
font-weight: 600;
white-space: nowrap;
color: #e0e0e0;
}
.data-table th.num {
text-align: right;
}
.data-table td {
padding: 0.6rem 1rem;
border-top: 1px solid #2a2a4a;
color: #e0e0e0;
}
.data-table td.num {
text-align: right;
}
.data-table tbody tr:hover {
background: rgba(0, 180, 216, 0.05);
}
/* Sort links */
.sort-link {
color: #e0e0e0;
text-decoration: none;
white-space: nowrap;
}
.sort-link:hover {
color: #00b4d8;
}
.sort-link.active {
color: #00b4d8;
}
.sort-arrow {
font-size: 0.7rem;
margin-left: 0.25rem;
}
/* Handle cell */
.handle-cell {
display: flex;
flex-direction: column;
gap: 0.15rem;
}
.handle-link {
color: #00b4d8;
text-decoration: none;
font-weight: 500;
}
.handle-link:hover {
text-decoration: underline;
}
.display-name {
font-size: 0.8rem;
color: #888;
}
/* Count links */
.count-link {
color: #e0e0e0;
text-decoration: none;
}
.count-link:hover {
color: #00b4d8;
text-decoration: underline;
}
/* Empty state */
.empty-state {
text-align: center;
padding: 3rem 1rem;
color: #888;
background: #16213e;
border-radius: 0.5rem;
border: 1px solid #2a2a4a;
}
/* Pagination */
.pagination {
display: flex;
justify-content: center;
align-items: center;
gap: 0.25rem;
margin-top: 1.5rem;
flex-wrap: wrap;
}
.page-link {
padding: 0.4rem 0.75rem;
border-radius: 0.375rem;
background: #16213e;
color: #e0e0e0;
text-decoration: none;
font-size: 0.9rem;
border: 1px solid #2a2a4a;
transition: background 0.15s, color 0.15s;
}
.page-link:hover:not(.disabled):not(.current) {
background: #0f3460;
color: #00b4d8;
border-color: #00b4d8;
}
.page-link.current {
background: #00b4d8;
color: #1a1a2e;
font-weight: 600;
border-color: #00b4d8;
}
.page-link.disabled {
color: #555;
cursor: default;
opacity: 0.5;
}
.page-ellipsis {
padding: 0.4rem 0.25rem;
color: #888;
}
</style>
{% endblock %}


@ -0,0 +1,704 @@
{% extends "base.html" %}
{% block title %}Toxicity Analysis Dashboard{% endblock %}
{% block extra_css %}
<style>
/* Color scheme */
:root {
--bg-primary: #1a1a2e;
--bg-secondary: #16213e;
--nav-bg: #0f3460;
--text-primary: #e0e0e0;
--text-secondary: #b0b0b0;
--accent: #00b4d8;
--danger: #e74c3c;
--warning: #f39c12;
--success: #27ae60;
--category-1: #00b4d8;
--category-2: #e67e22;
--category-3: #9b59b6;
--category-4: #1abc9c;
--category-5: #e74c3c;
--category-6: #f39c12;
--category-7: #3498db;
--category-8: #2ecc71;
}
/* Layout */
.page-header {
margin-bottom: 2rem;
}
.page-header h1 {
font-size: 2.5rem;
font-weight: 700;
color: var(--text-primary);
margin-bottom: 0.5rem;
}
.page-header .subtitle {
font-size: 1rem;
color: var(--text-secondary);
}
/* Grid layout */
.stats-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
gap: 1.5rem;
margin-bottom: 2rem;
}
/* Stat cards */
.stat-card {
background-color: var(--bg-secondary);
border-radius: 8px;
padding: 1.5rem;
border-left: 4px solid var(--accent);
display: flex;
flex-direction: column;
justify-content: space-between;
}
.stat-card.danger {
border-left-color: var(--danger);
}
.stat-card.warning {
border-left-color: var(--warning);
}
.stat-card-label {
font-size: 0.875rem;
color: var(--text-secondary);
text-transform: uppercase;
letter-spacing: 0.5px;
margin-bottom: 0.75rem;
}
.stat-card-value {
font-size: 1.75rem;
font-weight: 700;
color: var(--text-primary);
margin-bottom: 0.5rem;
}
.stat-card-detail {
font-size: 0.875rem;
color: var(--text-secondary);
}
/* Percentage bar */
.percentage-bar {
width: 100%;
height: 8px;
background-color: rgba(224, 224, 224, 0.1);
border-radius: 4px;
overflow: hidden;
margin-top: 0.75rem;
}
.percentage-bar-fill {
height: 100%;
background-color: var(--accent);
border-radius: 4px;
transition: width 0.3s ease;
}
.stat-card.danger .percentage-bar-fill {
background-color: var(--danger);
}
/* Cards */
.card {
background-color: var(--bg-secondary);
border-radius: 8px;
padding: 1.5rem;
margin-bottom: 2rem;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.3);
}
.card-title {
font-size: 1.25rem;
font-weight: 600;
color: var(--text-primary);
margin-bottom: 1.5rem;
border-bottom: 2px solid rgba(0, 180, 216, 0.2);
padding-bottom: 0.75rem;
}
/* Chart containers */
.chart-container {
position: relative;
height: 400px;
width: 100%;
}
.chart-container.horizontal {
height: 300px;
}
/* Links section */
.quick-links {
display: flex;
gap: 1rem;
flex-wrap: wrap;
}
.quick-link {
display: inline-flex;
align-items: center;
padding: 0.75rem 1.25rem;
background-color: rgba(0, 180, 216, 0.1);
border: 1px solid var(--accent);
border-radius: 6px;
color: var(--accent);
text-decoration: none;
font-weight: 500;
transition: all 0.3s ease;
}
.quick-link:hover {
background-color: var(--accent);
color: var(--bg-primary);
}
.quick-link::after {
content: " →";
margin-left: 0.5rem;
}
/* Runs table */
.runs-table {
width: 100%;
border-collapse: collapse;
}
.runs-table thead {
background-color: rgba(0, 180, 216, 0.1);
border-bottom: 2px solid var(--accent);
}
.runs-table th {
padding: 1rem;
text-align: left;
font-weight: 600;
color: var(--text-primary);
font-size: 0.875rem;
text-transform: uppercase;
letter-spacing: 0.5px;
}
.runs-table td {
padding: 0.75rem 1rem;
border-bottom: 1px solid rgba(224, 224, 224, 0.1);
color: var(--text-secondary);
font-size: 0.875rem;
}
.runs-table tbody tr:hover {
background-color: rgba(0, 180, 216, 0.05);
}
/* Status badge */
.status-badge {
display: inline-block;
padding: 0.35rem 0.75rem;
border-radius: 4px;
font-size: 0.75rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.5px;
}
.status-badge.completed {
background-color: rgba(39, 174, 96, 0.2);
color: var(--success);
}
.status-badge.in-progress {
background-color: rgba(243, 156, 18, 0.2);
color: var(--warning);
}
.status-badge.failed {
background-color: rgba(231, 76, 60, 0.2);
color: var(--danger);
}
/* Responsive */
@media (max-width: 768px) {
.page-header h1 {
font-size: 1.75rem;
}
.stats-grid {
grid-template-columns: 1fr;
}
.chart-container {
height: 300px;
}
.quick-links {
flex-direction: column;
}
.quick-link {
width: 100%;
justify-content: center;
}
.runs-table th,
.runs-table td {
padding: 0.5rem 0.75rem;
font-size: 0.75rem;
}
}
/* Empty state */
.empty-state {
text-align: center;
padding: 2rem;
color: var(--text-secondary);
}
.empty-state p {
margin: 0;
}
/* Number formatting */
.number-highlight {
color: var(--accent);
font-weight: 600;
}
.percentage-highlight {
color: var(--warning);
font-weight: 600;
}
.percentage-highlight.high {
color: var(--danger);
}
</style>
{% endblock %}
{% block content %}
<div class="page-header">
<h1>Toxicity Analysis</h1>
<p class="subtitle">
{{ stats.total_scored_posts | format_number }} / {{ stats.total_posts | format_number }} posts scored
<span style="margin: 0 0.5rem;"></span>
{{ stats.total_scored_mentions | format_number }} / {{ stats.total_mentions | format_number }} mentions scored
</p>
</div>
<!-- Stats Grid -->
<div class="stats-grid">
<!-- Total Scored Card -->
<div class="stat-card">
<div>
<div class="stat-card-label">Total Scored</div>
<div class="stat-card-value">{{ (stats.total_scored_posts + stats.total_scored_mentions) | format_number }}</div>
</div>
<div class="stat-card-detail">
{{ stats.total_scored_posts | format_number }} posts
<span style="color: var(--text-secondary); margin: 0 0.25rem;">+</span>
{{ stats.total_scored_mentions | format_number }} mentions
</div>
</div>
<!-- Flagged Posts Card -->
<div class="stat-card {% if stats.flagged_posts > 0 and (stats.flagged_posts / (stats.total_scored_posts or 1)) > 0.05 %}danger{% endif %}">
<div>
<div class="stat-card-label">Flagged Posts</div>
<div class="stat-card-value">{{ stats.flagged_posts | format_number }}</div>
</div>
<div class="stat-card-detail">
<span class="{% if stats.flagged_posts > 0 and (stats.flagged_posts / (stats.total_scored_posts or 1)) > 0.05 %}percentage-highlight high{% else %}percentage-highlight{% endif %}">
{{ "%.2f" | format(100.0 * stats.flagged_posts / (stats.total_scored_posts or 1)) }}%
</span>
of scored posts
</div>
<div class="percentage-bar">
<div class="percentage-bar-fill" style="width: {{ 100.0 * stats.flagged_posts / (stats.total_scored_posts or 1) }}%"></div>
</div>
</div>
<!-- Flagged Mentions Card -->
<div class="stat-card">
<div>
<div class="stat-card-label">Flagged Mentions</div>
<div class="stat-card-value">{{ stats.flagged_mentions | format_number }}</div>
</div>
<div class="stat-card-detail">
<span class="percentage-highlight">
{{ "%.2f" | format(100.0 * stats.flagged_mentions / (stats.total_scored_mentions or 1)) }}%
</span>
of scored mentions
</div>
<div class="percentage-bar">
<div class="percentage-bar-fill" style="width: {{ 100.0 * stats.flagged_mentions / (stats.total_scored_mentions or 1) }}%"></div>
</div>
</div>
<!-- Avg Toxicity Card -->
<div class="stat-card">
<div>
<div class="stat-card-label">Average Toxicity</div>
<div class="stat-card-value">{{ "%.1f" | format(100.0 * (((stats.avg_toxicity_posts or 0) + (stats.avg_toxicity_mentions or 0)) / 2.0)) }}%</div>
</div>
<div class="stat-card-detail">
Posts: {{ "%.2f" | format(100.0 * (stats.avg_toxicity_posts or 0)) }}%
<span style="color: var(--text-secondary); margin: 0 0.25rem;"></span>
Mentions: {{ "%.2f" | format(100.0 * (stats.avg_toxicity_mentions or 0)) }}%
</div>
<div class="percentage-bar">
<div class="percentage-bar-fill" style="width: {{ 100.0 * (((stats.avg_toxicity_posts or 0) + (stats.avg_toxicity_mentions or 0)) / 2.0) }}%"></div>
</div>
</div>
</div>
<!-- Trend Chart -->
<div class="card">
<div class="card-title">Toxicity Trends Over Time</div>
<div class="chart-container">
<canvas id="trendChart"></canvas>
</div>
</div>
<!-- Category Breakdown -->
<div class="card">
<div class="card-title">Toxicity by Category</div>
<div class="chart-container horizontal">
<canvas id="categoriesChart"></canvas>
</div>
</div>
<!-- Recent Analysis Runs -->
<div class="card">
<div class="card-title">Recent Analysis Runs</div>
{% if runs %}
<div style="overflow-x: auto;">
<table class="runs-table">
<thead>
<tr>
<th>Started</th>
<th>Duration</th>
<th>Posts Scored</th>
<th>Mentions Scored</th>
<th>Errors</th>
<th>Cost</th>
<th>Status</th>
</tr>
</thead>
<tbody>
{% for run in runs[:5] %}
<tr>
<td>{{ run.started_at | time_ago }}</td>
<td>{% if run.duration_secs is not none %}{{ "%.0f" | format(run.duration_secs | float) }}s{% else %}—{% endif %}</td>
<td>{{ run.posts_scored | format_number }}</td>
<td>{{ run.mentions_scored | format_number }}</td>
<td>{{ run.errors }}</td>
<td>${{ "%.4f" | format(run.cost_usd | default(0) | float) }}</td>
<td>
<span class="status-badge {% if run.status == 'completed' %}completed{% elif run.status == 'in_progress' %}in-progress{% elif run.status == 'failed' %}failed{% endif %}">
{{ run.status }}
</span>
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
{% else %}
<div class="empty-state">
<p>No analysis runs yet. Start a new analysis to see results here.</p>
</div>
{% endif %}
</div>
<!-- Quick Links -->
<div style="margin-top: 2rem;">
<div class="quick-links">
<a href="{{ url_for('analysis.flagged') }}" class="quick-link">View Flagged Content</a>
<a href="{{ url_for('analysis.accounts') }}" class="quick-link">Account Breakdown</a>
</div>
</div>
<!-- Chart.js Script -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/Chart.js/4.4.7/chart.umd.min.js"></script>
<script>
// Chart color scheme
const chartColors = {
accent: '#00b4d8',
orange: '#e67e22',
gridLine: '#2a2a3e',
text: '#e0e0e0',
categories: [
'#00b4d8', // 1
'#e67e22', // 2
'#9b59b6', // 3
'#1abc9c', // 4
'#e74c3c', // 5
'#f39c12', // 6
'#3498db', // 7
'#2ecc71' // 8
]
};
// Trend Chart
{% if trend_json %}
const trendData = {{ trend_json | safe }};
const trendCtx = document.getElementById('trendChart').getContext('2d');
const trendChart = new Chart(trendCtx, {
type: 'line',
data: {
labels: trendData.map(d => d.week),
datasets: [
{
label: 'Avg Post Toxicity',
data: trendData.map(d => d.avg_post_toxicity),
borderColor: chartColors.accent,
backgroundColor: 'rgba(0, 180, 216, 0.05)',
borderWidth: 2,
tension: 0.4,
fill: true,
yAxisID: 'y',
pointBackgroundColor: chartColors.accent,
pointBorderColor: '#1a1a2e',
pointBorderWidth: 2,
pointRadius: 5,
pointHoverRadius: 7
},
{
label: 'Avg Mention Toxicity',
data: trendData.map(d => d.avg_mention_toxicity),
borderColor: chartColors.orange,
backgroundColor: 'rgba(230, 126, 34, 0.05)',
borderWidth: 2,
tension: 0.4,
fill: true,
yAxisID: 'y',
pointBackgroundColor: chartColors.orange,
pointBorderColor: '#1a1a2e',
pointBorderWidth: 2,
pointRadius: 5,
pointHoverRadius: 7
},
{
label: 'Flagged Posts',
data: trendData.map(d => d.flagged_posts),
type: 'bar',
borderColor: 'rgba(231, 76, 60, 0.5)',
backgroundColor: 'rgba(231, 76, 60, 0.2)',
yAxisID: 'y1',
borderWidth: 1,
barThickness: 8,
categoryPercentage: 0.8,
maxBarThickness: 15
},
{
label: 'Flagged Mentions',
data: trendData.map(d => d.flagged_mentions),
type: 'bar',
borderColor: 'rgba(243, 156, 18, 0.5)',
backgroundColor: 'rgba(243, 156, 18, 0.2)',
yAxisID: 'y1',
borderWidth: 1,
barThickness: 8,
categoryPercentage: 0.8,
maxBarThickness: 15
}
]
},
options: {
responsive: true,
maintainAspectRatio: false,
interaction: {
mode: 'index',
intersect: false
},
plugins: {
legend: {
display: true,
labels: {
color: chartColors.text,
usePointStyle: true,
padding: 15,
font: {
size: 12
}
}
},
tooltip: {
backgroundColor: '#0f3460',
titleColor: chartColors.text,
bodyColor: chartColors.text,
borderColor: chartColors.accent,
borderWidth: 1,
padding: 10,
displayColors: true
}
},
scales: {
x: {
grid: {
color: chartColors.gridLine
},
border: {
display: false // Chart.js 4: hides the axis line (formerly grid.drawBorder)
},
ticks: {
color: chartColors.text,
font: {
size: 11
}
}
},
y: {
type: 'linear',
display: true,
position: 'left',
min: 0,
max: 1,
ticks: {
color: chartColors.text,
font: {
size: 11
},
callback: function(value) {
return (value * 100).toFixed(0) + '%';
}
},
grid: {
color: chartColors.gridLine
},
border: {
display: false // Chart.js 4: hides the axis line (formerly grid.drawBorder)
},
title: {
display: true,
text: 'Toxicity Score',
color: chartColors.text,
font: {
size: 12,
weight: 'bold'
}
}
},
y1: {
type: 'linear',
display: true,
position: 'right',
grid: {
drawOnChartArea: false
},
ticks: {
color: chartColors.text,
font: {
size: 11
}
},
title: {
display: true,
text: 'Flagged Count',
color: chartColors.text,
font: {
size: 12,
weight: 'bold'
}
}
}
}
}
});
{% endif %}
// Category Chart
{% if categories_json %}
const categoriesData = {{ categories_json | safe }};
const categoryNames = {{ categories | tojson | safe }};
const categoriesCtx = document.getElementById('categoriesChart').getContext('2d');
const categoriesChart = new Chart(categoriesCtx, {
type: 'bar',
data: {
labels: categoryNames,
datasets: [
{
label: 'Average Toxicity Score',
data: categoryNames.map(cat => categoriesData[cat] || 0),
backgroundColor: chartColors.categories,
borderColor: chartColors.categories,
borderWidth: 1.5,
borderRadius: 4
}
]
},
options: {
indexAxis: 'y',
responsive: true,
maintainAspectRatio: false,
plugins: {
legend: {
display: false
},
tooltip: {
backgroundColor: '#0f3460',
titleColor: chartColors.text,
bodyColor: chartColors.text,
borderColor: chartColors.accent,
borderWidth: 1,
padding: 10,
callbacks: {
label: function(context) {
return 'Score: ' + (context.parsed.x * 100).toFixed(2) + '%';
}
}
}
},
scales: {
x: {
min: 0,
max: 1,
grid: {
color: chartColors.gridLine
},
border: {
display: false // Chart.js 4: hides the axis line (formerly grid.drawBorder)
},
ticks: {
color: chartColors.text,
font: {
size: 11
},
callback: function(value) {
return (value * 100).toFixed(0) + '%';
}
},
title: {
display: true,
text: 'Average Toxicity',
color: chartColors.text,
font: {
size: 12,
weight: 'bold'
}
}
},
y: {
grid: {
drawOnChartArea: false
},
border: {
display: false // Chart.js 4: hides the axis line (formerly grid.drawBorder)
},
ticks: {
color: chartColors.text,
font: {
size: 11
}
}
}
}
}
});
{% endif %}
</script>
{% endblock %}

688
src/web/templates/base.html Normal file

@ -0,0 +1,688 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>{% block title %}Dashboard{% endblock %} - Bluesky Collector</title>
<style>
/* ===== Reset & Base ===== */
*, *::before, *::after {
box-sizing: border-box;
margin: 0;
padding: 0;
}
html {
font-size: 15px;
-webkit-font-smoothing: antialiased;
-moz-osx-font-smoothing: grayscale;
}
body {
font-family: -apple-system, BlinkMacSystemFont, "Segoe UI", Roboto, Oxygen,
Ubuntu, Cantarell, "Helvetica Neue", Arial, sans-serif;
background-color: #1a1a2e;
color: #e0e0e0;
line-height: 1.6;
min-height: 100vh;
display: flex;
flex-direction: column;
}
a {
color: #00b4d8;
text-decoration: none;
transition: color 0.2s ease;
}
a:hover {
color: #48cae4;
}
/* ===== Navigation ===== */
.navbar {
background-color: #0f3460;
padding: 0 2rem;
display: flex;
align-items: center;
justify-content: space-between;
height: 60px;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.3);
position: sticky;
top: 0;
z-index: 1000;
}
.navbar-brand {
display: flex;
align-items: center;
gap: 0.5rem;
font-size: 1.25rem;
font-weight: 700;
color: #ffffff;
letter-spacing: 0.02em;
}
.navbar-brand .brand-icon {
font-size: 1.4rem;
}
.navbar-nav {
display: flex;
list-style: none;
gap: 0.25rem;
align-items: center;
}
.navbar-nav a {
display: block;
padding: 0.5rem 1rem;
color: #b0c4de;
border-radius: 6px;
font-size: 0.9rem;
font-weight: 500;
transition: background-color 0.2s ease, color 0.2s ease;
}
.navbar-nav a:hover {
background-color: rgba(0, 180, 216, 0.15);
color: #e0e0e0;
}
.navbar-nav a.active {
background-color: rgba(0, 180, 216, 0.2);
color: #00b4d8;
}
/* ===== Main Content ===== */
.main-content {
flex: 1;
padding: 2rem;
max-width: 1280px;
width: 100%;
margin: 0 auto;
}
.page-header {
margin-bottom: 1.75rem;
}
.page-header h1 {
font-size: 1.6rem;
font-weight: 700;
color: #ffffff;
}
.page-header p {
color: #8899aa;
margin-top: 0.25rem;
font-size: 0.95rem;
}
/* ===== Cards ===== */
.card {
background-color: #16213e;
border-radius: 10px;
padding: 1.5rem;
box-shadow: 0 2px 12px rgba(0, 0, 0, 0.25);
border: 1px solid rgba(255, 255, 255, 0.04);
transition: transform 0.15s ease, box-shadow 0.15s ease;
}
.card:hover {
transform: translateY(-2px);
box-shadow: 0 6px 20px rgba(0, 0, 0, 0.35);
}
.card-title {
font-size: 0.85rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.06em;
color: #8899aa;
margin-bottom: 0.5rem;
}
.card-value {
font-size: 2rem;
font-weight: 700;
color: #ffffff;
line-height: 1.2;
}
.card-footer-text {
font-size: 0.8rem;
color: #667788;
margin-top: 0.5rem;
}
/* Stat cards grid */
.stats-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
gap: 1.25rem;
margin-bottom: 2rem;
}
.stat-card {
text-align: center;
padding: 1.75rem 1.5rem;
}
.stat-card .card-value {
font-size: 2.25rem;
}
/* ===== Tables ===== */
.table-wrapper {
background-color: #16213e;
border-radius: 10px;
overflow: hidden;
box-shadow: 0 2px 12px rgba(0, 0, 0, 0.25);
border: 1px solid rgba(255, 255, 255, 0.04);
}
.table-header {
padding: 1.25rem 1.5rem;
border-bottom: 1px solid rgba(255, 255, 255, 0.06);
}
.table-header h2 {
font-size: 1.1rem;
font-weight: 600;
color: #ffffff;
}
table {
width: 100%;
border-collapse: collapse;
}
thead th {
background-color: rgba(0, 0, 0, 0.15);
padding: 0.75rem 1rem;
text-align: left;
font-size: 0.8rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.05em;
color: #8899aa;
border-bottom: 1px solid rgba(255, 255, 255, 0.06);
white-space: nowrap;
}
tbody tr {
border-bottom: 1px solid rgba(255, 255, 255, 0.03);
transition: background-color 0.15s ease;
}
tbody tr:nth-child(even) {
background-color: rgba(0, 0, 0, 0.08);
}
tbody tr:hover {
background-color: rgba(0, 180, 216, 0.06);
}
tbody td {
padding: 0.75rem 1rem;
font-size: 0.9rem;
color: #d0d0d0;
vertical-align: top;
}
.table-empty {
text-align: center;
padding: 2.5rem 1rem;
color: #667788;
font-size: 0.95rem;
}
/* Alignment helpers */
.text-right {
text-align: right;
}
.text-center {
text-align: center;
}
/* ===== Badges ===== */
.badge {
display: inline-block;
padding: 0.2rem 0.65rem;
border-radius: 50px;
font-size: 0.75rem;
font-weight: 600;
letter-spacing: 0.03em;
text-transform: capitalize;
line-height: 1.4;
}
/* Post type badges */
.badge-post {
background-color: rgba(0, 180, 216, 0.15);
color: #00b4d8;
}
.badge-reply {
background-color: rgba(155, 89, 182, 0.15);
color: #9b59b6;
}
.badge-repost {
background-color: rgba(230, 126, 34, 0.15);
color: #e67e22;
}
/* Status badges */
.badge-completed {
background-color: rgba(39, 174, 96, 0.15);
color: #27ae60;
}
.badge-partial {
background-color: rgba(243, 156, 18, 0.15);
color: #f39c12;
}
.badge-running {
background-color: rgba(52, 152, 219, 0.15);
color: #3498db;
}
.badge-failed {
background-color: rgba(231, 76, 60, 0.15);
color: #e74c3c;
}
/* ===== Pagination ===== */
.pagination {
display: flex;
justify-content: center;
align-items: center;
gap: 0.35rem;
margin-top: 1.5rem;
padding: 1rem 0;
flex-wrap: wrap;
}
.pagination a,
.pagination span {
display: inline-flex;
align-items: center;
justify-content: center;
min-width: 36px;
height: 36px;
padding: 0 0.5rem;
border-radius: 8px;
font-size: 0.85rem;
font-weight: 500;
color: #b0c4de;
background-color: #16213e;
border: 1px solid rgba(255, 255, 255, 0.06);
transition: background-color 0.2s ease, color 0.2s ease, border-color 0.2s ease;
}
.pagination a:hover {
background-color: rgba(0, 180, 216, 0.12);
border-color: rgba(0, 180, 216, 0.3);
color: #00b4d8;
}
.pagination .active {
background-color: #00b4d8;
color: #ffffff;
border-color: #00b4d8;
font-weight: 700;
}
.pagination .disabled {
opacity: 0.35;
pointer-events: none;
}
.pagination .ellipsis {
background: none;
border: none;
color: #667788;
cursor: default;
}
/* ===== Buttons ===== */
.btn {
display: inline-flex;
align-items: center;
gap: 0.4rem;
padding: 0.55rem 1.15rem;
border-radius: 8px;
font-size: 0.85rem;
font-weight: 600;
border: none;
cursor: pointer;
text-decoration: none;
transition: background-color 0.2s ease, transform 0.1s ease;
}
.btn:active {
transform: scale(0.97);
}
.btn:hover {
text-decoration: none;
}
.btn-primary {
background-color: #00b4d8;
color: #ffffff;
}
.btn-primary:hover {
background-color: #0096b7;
color: #ffffff;
}
.btn-secondary {
background-color: rgba(255, 255, 255, 0.08);
color: #e0e0e0;
}
.btn-secondary:hover {
background-color: rgba(255, 255, 255, 0.14);
}
.btn-danger {
background-color: rgba(231, 76, 60, 0.15);
color: #e74c3c;
}
.btn-danger:hover {
background-color: rgba(231, 76, 60, 0.25);
}
/* ===== Filter Bar ===== */
.filter-bar {
display: flex;
flex-wrap: wrap;
gap: 0.75rem;
align-items: center;
margin-bottom: 1.25rem;
padding: 1rem 1.25rem;
background-color: #16213e;
border-radius: 10px;
border: 1px solid rgba(255, 255, 255, 0.04);
}
.filter-bar select,
.filter-bar input[type="text"] {
background-color: #1a1a2e;
color: #e0e0e0;
border: 1px solid rgba(255, 255, 255, 0.12);
border-radius: 6px;
padding: 0.5rem 0.75rem;
font-size: 0.85rem;
transition: border-color 0.2s ease;
}
.filter-bar select:focus,
.filter-bar input[type="text"]:focus {
outline: none;
border-color: #00b4d8;
box-shadow: 0 0 0 2px rgba(0, 180, 216, 0.15);
}
/* ===== Utility ===== */
.mono {
font-family: "SF Mono", "Fira Code", "Fira Mono", Menlo, Consolas,
"DejaVu Sans Mono", monospace;
font-size: 0.85em;
}
.text-muted {
color: #667788;
}
.text-accent {
color: #00b4d8;
}
.mt-1 { margin-top: 0.5rem; }
.mt-2 { margin-top: 1rem; }
.mt-3 { margin-top: 1.5rem; }
.mb-1 { margin-bottom: 0.5rem; }
.mb-2 { margin-bottom: 1rem; }
.mb-3 { margin-bottom: 1.5rem; }
.text-preview {
max-width: 400px;
overflow: hidden;
text-overflow: ellipsis;
}
.result-count {
font-size: 0.85rem;
color: #8899aa;
margin-bottom: 0.75rem;
}
.back-link {
display: inline-block;
margin-bottom: 1rem;
font-size: 0.9rem;
}
.post-text {
white-space: pre-wrap;
word-break: break-word;
line-height: 1.7;
margin: 1rem 0;
}
.reply-card {
border-left: 3px solid #9b59b6;
padding-left: 1rem;
margin-bottom: 0.75rem;
}
h1, h2, h3 {
color: #ffffff;
}
/* ===== Details / Collapsible ===== */
details {
margin-top: 1rem;
}
details summary {
cursor: pointer;
color: #00b4d8;
font-size: 0.9rem;
margin-bottom: 0.5rem;
}
details pre {
background-color: #1a1a2e;
padding: 1rem;
border-radius: 8px;
overflow-x: auto;
font-size: 0.8rem;
color: #c0c0c0;
max-height: 500px;
overflow-y: auto;
}
/* ===== Flash Messages ===== */
.flash-messages {
margin-bottom: 1.5rem;
}
.flash {
padding: 0.85rem 1.25rem;
border-radius: 8px;
font-size: 0.9rem;
margin-bottom: 0.5rem;
}
.flash-success {
background-color: rgba(39, 174, 96, 0.12);
color: #27ae60;
border: 1px solid rgba(39, 174, 96, 0.2);
}
.flash-error {
background-color: rgba(231, 76, 60, 0.12);
color: #e74c3c;
border: 1px solid rgba(231, 76, 60, 0.2);
}
.flash-info {
background-color: rgba(0, 180, 216, 0.12);
color: #00b4d8;
border: 1px solid rgba(0, 180, 216, 0.2);
}
/* ===== Stats Row (inline) ===== */
.stats-row {
display: flex;
gap: 1.5rem;
flex-wrap: wrap;
margin: 1rem 0;
}
.stat-item {
display: flex;
align-items: center;
gap: 0.35rem;
font-size: 0.9rem;
color: #8899aa;
}
.stat-item strong {
color: #e0e0e0;
}
/* ===== Footer ===== */
.footer {
text-align: center;
padding: 1.5rem 2rem;
color: #4a5568;
font-size: 0.8rem;
border-top: 1px solid rgba(255, 255, 255, 0.04);
margin-top: auto;
}
.footer span {
color: #667788;
}
/* ===== Responsive ===== */
@media (max-width: 768px) {
.navbar {
padding: 0 1rem;
flex-wrap: wrap;
height: auto;
padding-top: 0.75rem;
padding-bottom: 0.75rem;
gap: 0.5rem;
}
.navbar-nav {
gap: 0.15rem;
flex-wrap: wrap;
}
.navbar-nav a {
padding: 0.4rem 0.7rem;
font-size: 0.82rem;
}
.main-content {
padding: 1.25rem;
}
.stats-grid {
grid-template-columns: repeat(2, 1fr);
gap: 0.75rem;
}
.stat-card .card-value {
font-size: 1.6rem;
}
table {
font-size: 0.82rem;
}
thead th,
tbody td {
padding: 0.55rem 0.65rem;
}
}
@media (max-width: 480px) {
.stats-grid {
grid-template-columns: 1fr;
}
}
</style>
{% block extra_css %}{% endblock %}
</head>
<body>
<nav class="navbar">
<div class="navbar-brand">
<span class="brand-icon">&#129419;</span>
Bluesky Collector
</div>
<ul class="navbar-nav">
<li>
<a href="/" class="{% if request.path == '/' %}active{% endif %}">
Dashboard
</a>
</li>
<li>
<a href="/accounts" class="{% if request.path.startswith('/accounts') %}active{% endif %}">
Accounts
</a>
</li>
<li>
<a href="/statuses" class="{% if request.path.startswith('/statuses') %}active{% endif %}">
Statuses
</a>
</li>
<li>
<a href="/mentions" class="{% if request.path.startswith('/mentions') %}active{% endif %}">
Mentions
</a>
</li>
<li>
<a href="/analysis" class="{% if request.path.startswith('/analysis') %}active{% endif %}">
Analysis
</a>
</li>
<li>
<a href="/export" class="{% if request.path.startswith('/export') %}active{% endif %}">
Export
</a>
</li>
</ul>
</nav>
<main class="main-content">
{% with messages = get_flashed_messages(with_categories=true) %}
{% if messages %}
<div class="flash-messages">
{% for category, message in messages %}
<div class="flash flash-{{ category }}">{{ message }}</div>
{% endfor %}
</div>
{% endif %}
{% endwith %}
{% block content %}{% endblock %}
</main>
<footer class="footer">
<span>Bluesky Collector</span>
</footer>
</body>
</html>


@ -0,0 +1,93 @@
{% extends "base.html" %}
{% block title %}Dashboard{% endblock %}
{% block content %}
<div class="page-header">
<h1>Dashboard</h1>
<p>Overview of your Bluesky data collection</p>
</div>
<!-- Stat Cards -->
<div class="stats-grid">
<div class="card stat-card">
<div class="card-title">Accounts</div>
<div class="card-value">{{ stats.accounts | format_number }}</div>
<div class="card-footer-text">Tracked accounts</div>
</div>
<div class="card stat-card">
<div class="card-title">Posts</div>
<div class="card-value">{{ stats.posts | format_number }}</div>
<div class="card-footer-text">Collected posts</div>
</div>
<div class="card stat-card">
<div class="card-title">Mentions</div>
<div class="card-value">{{ stats.mentions | format_number }}</div>
<div class="card-footer-text">Detected mentions</div>
</div>
<div class="card stat-card">
<div class="card-title">Collection Runs</div>
<div class="card-value">{{ stats.runs | format_number }}</div>
<div class="card-footer-text">Total runs</div>
</div>
</div>
<!-- Recent Collection Runs -->
<div class="table-wrapper">
<div class="table-header">
<h2>Recent Collection Runs</h2>
</div>
<table>
<thead>
<tr>
<th>Started</th>
<th>Duration</th>
<th>Status</th>
<th class="text-right">Accounts</th>
<th class="text-right">Posts</th>
<th class="text-right">Mentions</th>
<th class="text-right">Errors</th>
</tr>
</thead>
<tbody>
{% for run in runs %}
<tr>
<td class="mono">{{ run.started_at | format_dt }}</td>
<td>
{% if run.duration_secs is not none %}
{% set minutes = (run.duration_secs // 60) | int %}
{% set seconds = (run.duration_secs % 60) | int %}
{% if minutes > 0 %}
{{ minutes }}m {{ seconds }}s
{% else %}
{{ seconds }}s
{% endif %}
{% else %}
&mdash;
{% endif %}
</td>
<td>
<span class="badge badge-{{ run.status }}">{{ run.status }}</span>
</td>
<td class="text-right">{{ run.accounts_done | format_number }}</td>
<td class="text-right">{{ run.posts_collected | format_number }}</td>
<td class="text-right">{{ run.mentions_collected | format_number }}</td>
<td class="text-right">
{% if run.errors %}
<span class="text-accent">{{ run.errors | length }}</span>
{% else %}
0
{% endif %}
</td>
</tr>
{% else %}
<tr>
<td colspan="7" class="table-empty">
No collection runs yet. Start a collection to see results here.
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
{% endblock %}


@ -0,0 +1,114 @@
{% extends "base.html" %}
{% block title %}Export{% endblock %}
{% block content %}
<div class="page-header">
<h1>Export Data</h1>
<p class="text-muted">Download posts and mentions as CSV files for analysis.</p>
</div>
<div class="export-grid">
<!-- Posts Export -->
<div class="card">
<h2>Export Posts</h2>
<p class="text-muted">Download all collected posts from tracked accounts.</p>
<form action="{{ url_for('export.posts_csv') }}" method="get" class="export-form">
<div class="form-group">
<label for="post-account">Filter by Account</label>
<select id="post-account" name="account" class="form-control">
<option value="">All accounts</option>
{% for a in accounts %}
<option value="{{ a.did }}">{{ a.handle }}</option>
{% endfor %}
</select>
</div>
<div class="form-row">
<div class="form-group">
<label for="post-since">From date</label>
<input type="date" id="post-since" name="since" class="form-control">
</div>
<div class="form-group">
<label for="post-until">To date</label>
<input type="date" id="post-until" name="until" class="form-control">
</div>
</div>
<button type="submit" class="btn btn-primary">
Download Posts CSV
</button>
</form>
</div>
<!-- Mentions Export -->
<div class="card">
<h2>Export Mentions</h2>
<p class="text-muted">Download all collected mentions of tracked accounts.</p>
<form action="{{ url_for('export.mentions_csv') }}" method="get" class="export-form">
<div class="form-group">
<label for="mention-account">Filter by Mentioned Account</label>
<select id="mention-account" name="account" class="form-control">
<option value="">All accounts</option>
{% for a in accounts %}
<option value="{{ a.did }}">{{ a.handle }}</option>
{% endfor %}
</select>
</div>
<div class="form-row">
<div class="form-group">
<label for="mention-since">From date</label>
<input type="date" id="mention-since" name="since" class="form-control">
</div>
<div class="form-group">
<label for="mention-until">To date</label>
<input type="date" id="mention-until" name="until" class="form-control">
</div>
</div>
<button type="submit" class="btn btn-primary">
Download Mentions CSV
</button>
</form>
</div>
</div>
<style>
.export-grid {
display: grid;
grid-template-columns: repeat(auto-fit, minmax(400px, 1fr));
gap: 1.5rem;
margin-top: 1.5rem;
}
.export-form {
margin-top: 1rem;
}
.form-group {
margin-bottom: 1rem;
}
.form-group label {
display: block;
margin-bottom: 0.4rem;
color: #b0b0b0;
font-size: 0.9rem;
}
.form-control {
width: 100%;
padding: 0.5rem 0.75rem;
background: #1a1a2e;
border: 1px solid #2a2a4a;
border-radius: 6px;
color: #e0e0e0;
font-size: 0.95rem;
}
.form-control:focus {
border-color: #00b4d8;
outline: none;
}
.form-row {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 1rem;
}
.export-form .btn {
margin-top: 0.5rem;
width: 100%;
}
</style>
{% endblock %}


@ -0,0 +1,594 @@
{% extends "base.html" %}
{% block title %}Flagged Content{% endblock %}
{% block content %}
<div class="flagged-container">
<!-- Page Header -->
<div class="page-header">
<h1>Flagged Content</h1>
<span class="total-badge">{{ total | format_number }}</span>
</div>
<!-- Filter Bar -->
<div class="filter-bar">
<form method="get" action="{{ url_for('analysis.flagged') }}" class="filter-form">
<div class="filter-group">
<label for="content-type">Type:</label>
<select id="content-type" name="content_type" class="filter-select">
<option value="">All Types</option>
<option value="post" {% if content_type == 'post' %}selected{% endif %}>Post</option>
<option value="reply" {% if content_type == 'reply' %}selected{% endif %}>Reply</option>
<option value="mention" {% if content_type == 'mention' %}selected{% endif %}>Mention</option>
</select>
</div>
<div class="filter-group">
<label for="category">Category:</label>
<select id="category" name="category" class="filter-select">
<option value="">All Categories</option>
{% for cat in categories %}
<option value="{{ cat }}" {% if category == cat %}selected{% endif %}>{{ cat }}</option>
{% endfor %}
</select>
</div>
<div class="filter-group">
<label for="account-did">Account:</label>
<select id="account-did" name="account_did" class="filter-select">
<option value="">All Accounts</option>
{% for acc in accounts %}
<option value="{{ acc.did }}" {% if account_did == acc.did %}selected{% endif %}>{{ acc.handle }}</option>
{% endfor %}
</select>
</div>
<div class="filter-group">
<label for="threshold">Threshold:</label>
<input type="number" id="threshold" name="threshold" min="0.0" max="1.0" step="0.1" value="{{ threshold if threshold is defined and threshold is not none else 0.5 }}" class="filter-input" placeholder="0.5">
</div>
<button type="submit" class="btn-apply">Apply Filters</button>
</form>
</div>
<!-- Content Table -->
{% if items %}
<div class="table-wrapper">
<table class="flagged-table">
<thead>
<tr>
<th>Type</th>
<th>Author</th>
<th>Content</th>
<th>Score</th>
<th>Category</th>
<th>Created</th>
</tr>
</thead>
<tbody>
{% for item in items %}
<tr class="item-row">
<!-- Type Badge -->
<td class="col-type">
<span class="badge badge-{{ item.item_type }}">
{% if item.item_type == 'post' %}
Post
{% elif item.item_type == 'reply' %}
Reply
{% elif item.item_type == 'mention' %}
Mention
{% endif %}
</span>
</td>
<!-- Author -->
<td class="col-author">
{% if item.author_handle %}
<a href="https://bsky.app/profile/{{ item.author_handle }}" target="_blank" rel="noopener" class="author-link">
@{{ item.author_handle }}
</a>
{% else %}
<span class="author-did" title="{{ item.author_did }}">{{ item.author_did[:30] }}…</span>
{% endif %}
{% if item.item_type == 'mention' and item.mentioned_handle %}
<span class="mention-arrow">&rarr;</span>
<a href="https://bsky.app/profile/{{ item.mentioned_handle }}" target="_blank" rel="noopener" class="author-link">
@{{ item.mentioned_handle }}
</a>
{% endif %}
</td>
<!-- Content Text -->
<td class="col-text">
{% if item.source_type == 'post' %}
<a href="{{ url_for('statuses.detail', encoded_uri=encode_uri(item.item_id)) }}" class="content-link">
{{ item.text | truncate_text(200) }}
</a>
{% else %}
<span class="content-text">{{ item.text | truncate_text(200) }}</span>
{% endif %}
</td>
<!-- Score with Bar -->
<td class="col-score">
<div class="score-bar-container">
{% set score_pct = (item.overall * 100) | int %}
{% if item.overall < 0.3 %}
{% set bar_class = 'score-bar-low' %}
{% elif item.overall < 0.6 %}
{% set bar_class = 'score-bar-medium' %}
{% else %}
{% set bar_class = 'score-bar-high' %}
{% endif %}
<div class="score-bar {{ bar_class }}" style="width: {{ score_pct }}%"></div>
<span class="score-number">{{ "%.2f" | format(item.overall) }}</span>
</div>
</td>
<!-- Top Category -->
<td class="col-category">
{% if item.top_category %}
<span class="badge badge-category">{{ item.top_category }}</span>
{% else %}
<span class="text-muted"></span>
{% endif %}
</td>
<!-- Created Time -->
<td class="col-created">
<span class="time-ago" title="{{ item.created_at }}">
{{ item.created_at | time_ago }}
</span>
</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
<!-- Pagination -->
{% if total_pages > 1 %}
<div class="pagination">
{% if page > 1 %}
<a href="{{ url_for('analysis.flagged', page=1, content_type=content_type, category=category, account_did=account_did, threshold=threshold) }}" class="btn-pagination">First</a>
<a href="{{ url_for('analysis.flagged', page=page-1, content_type=content_type, category=category, account_did=account_did, threshold=threshold) }}" class="btn-pagination">Previous</a>
{% endif %}
<span class="pagination-info">Page {{ page }} of {{ total_pages }}</span>
{% if page < total_pages %}
<a href="{{ url_for('analysis.flagged', page=page+1, content_type=content_type, category=category, account_did=account_did, threshold=threshold) }}" class="btn-pagination">Next</a>
<a href="{{ url_for('analysis.flagged', page=total_pages, content_type=content_type, category=category, account_did=account_did, threshold=threshold) }}" class="btn-pagination">Last</a>
{% endif %}
</div>
{% endif %}
{% else %}
<!-- Empty State -->
<div class="empty-state">
<p class="empty-icon"></p>
<p class="empty-text">No flagged content found</p>
<p class="empty-subtext">Try adjusting your filters or threshold</p>
</div>
{% endif %}
</div>
{% endblock %}
{% block extra_css %}
<style>
:root {
--dark-bg: #1a1a2e;
--dark-card: #16213e;
--dark-nav: #0f3460;
--dark-text: #e0e0e0;
--accent-primary: #00b4d8;
--badge-post: #00b4d8;
--badge-reply: #9b59b6;
--badge-mention: #2ecc71;
--tox-low: #2ecc71;
--tox-medium: #f39c12;
--tox-high: #e74c3c;
}
.flagged-container {
padding: 2rem;
max-width: 1400px;
margin: 0 auto;
}
/* Page Header */
.page-header {
display: flex;
align-items: center;
gap: 1rem;
margin-bottom: 2rem;
}
.page-header h1 {
font-size: 2rem;
font-weight: 700;
color: var(--dark-text);
margin: 0;
}
.total-badge {
background: var(--accent-primary);
color: var(--dark-bg);
font-weight: 600;
padding: 0.5rem 1rem;
border-radius: 2rem;
font-size: 0.9rem;
}
/* Filter Bar */
.filter-bar {
background: var(--dark-card);
border: 1px solid rgba(255, 255, 255, 0.1);
border-radius: 0.5rem;
padding: 1.5rem;
margin-bottom: 2rem;
}
.filter-form {
display: flex;
flex-wrap: wrap;
gap: 1rem;
align-items: flex-end;
}
.filter-group {
display: flex;
flex-direction: column;
gap: 0.5rem;
flex: 1;
min-width: 150px;
}
.filter-group label {
font-size: 0.9rem;
font-weight: 600;
color: var(--dark-text);
}
.filter-select,
.filter-input {
background: var(--dark-bg);
border: 1px solid rgba(255, 255, 255, 0.2);
border-radius: 0.375rem;
color: var(--dark-text);
padding: 0.625rem;
font-size: 0.9rem;
font-family: inherit;
}
.filter-select:hover,
.filter-input:hover {
border-color: rgba(255, 255, 255, 0.3);
}
.filter-select:focus,
.filter-input:focus {
outline: none;
border-color: var(--accent-primary);
background: var(--dark-bg);
color: var(--dark-text);
}
.btn-apply {
background: var(--accent-primary);
color: var(--dark-bg);
border: none;
border-radius: 0.375rem;
padding: 0.625rem 1.25rem;
font-weight: 600;
cursor: pointer;
font-size: 0.9rem;
transition: all 0.2s ease;
}
.btn-apply:hover {
opacity: 0.9;
transform: translateY(-1px);
}
.btn-apply:active {
transform: translateY(0);
}
/* Table Wrapper */
.table-wrapper {
background: var(--dark-card);
border: 1px solid rgba(255, 255, 255, 0.1);
border-radius: 0.5rem;
overflow-x: auto;
margin-bottom: 2rem;
}
/* Table Styles */
.flagged-table {
width: 100%;
border-collapse: collapse;
font-size: 0.9rem;
}
.flagged-table thead {
background: rgba(0, 0, 0, 0.3);
border-bottom: 2px solid rgba(255, 255, 255, 0.1);
}
.flagged-table th {
padding: 1rem;
text-align: left;
font-weight: 600;
color: var(--dark-text);
white-space: nowrap;
}
.flagged-table td {
padding: 1rem;
border-bottom: 1px solid rgba(255, 255, 255, 0.05);
color: var(--dark-text);
}
.flagged-table tbody tr:hover {
background: rgba(0, 180, 216, 0.05);
}
/* Column Styles */
.col-type {
width: 90px;
}
.col-author {
width: 200px;
}
.col-text {
min-width: 300px;
}
.col-score {
width: 150px;
}
.col-category {
width: 140px;
}
.col-created {
width: 120px;
}
/* Badges */
.badge {
display: inline-block;
padding: 0.35rem 0.75rem;
border-radius: 0.25rem;
font-size: 0.8rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.5px;
}
.badge-post {
background: rgba(0, 180, 216, 0.2);
color: var(--badge-post);
}
.badge-reply {
background: rgba(155, 89, 182, 0.2);
color: var(--badge-reply);
}
.badge-mention {
background: rgba(46, 204, 113, 0.2);
color: var(--badge-mention);
}
.badge-category {
background: rgba(0, 180, 216, 0.15);
color: var(--accent-primary);
}
/* Author Links */
.author-link {
color: var(--accent-primary);
text-decoration: none;
transition: color 0.2s ease;
}
.author-link:hover {
color: rgba(0, 180, 216, 0.8);
text-decoration: underline;
}
.mention-arrow {
color: rgba(255, 255, 255, 0.3);
margin: 0 0.5rem;
}
/* Content Link */
.content-link {
color: var(--dark-text);
text-decoration: none;
transition: color 0.2s ease;
}
.content-link:hover {
color: var(--accent-primary);
}
.content-text {
color: rgba(255, 255, 255, 0.7);
}
/* Score Bar */
.score-bar-container {
position: relative;
height: 30px;
background: rgba(255, 255, 255, 0.05);
border-radius: 0.25rem;
overflow: hidden;
display: flex;
align-items: center;
padding: 0 0.5rem;
}
.score-bar {
position: absolute;
height: 100%;
left: 0;
top: 0;
transition: width 0.3s ease;
}
.score-bar-low {
background: linear-gradient(90deg, rgba(46, 204, 113, 0.3), rgba(46, 204, 113, 0.5));
}
.score-bar-medium {
background: linear-gradient(90deg, rgba(243, 156, 18, 0.3), rgba(243, 156, 18, 0.5));
}
.score-bar-high {
background: linear-gradient(90deg, rgba(231, 76, 60, 0.3), rgba(231, 76, 60, 0.5));
}
.score-number {
position: relative;
z-index: 1;
font-weight: 600;
color: var(--dark-text);
font-size: 0.85rem;
}
/* Time Ago */
.time-ago {
color: rgba(255, 255, 255, 0.6);
font-size: 0.85rem;
cursor: help;
}
.time-ago:hover {
color: var(--dark-text);
}
.text-muted {
color: rgba(255, 255, 255, 0.3);
}
/* Empty State */
.empty-state {
text-align: center;
padding: 4rem 2rem;
background: var(--dark-card);
border: 2px dashed rgba(255, 255, 255, 0.2);
border-radius: 0.5rem;
}
.empty-icon {
font-size: 3rem;
color: rgba(255, 255, 255, 0.2);
margin: 0 0 1rem 0;
}
.empty-text {
font-size: 1.2rem;
font-weight: 600;
color: var(--dark-text);
margin: 0 0 0.5rem 0;
}
.empty-subtext {
color: rgba(255, 255, 255, 0.5);
margin: 0;
}
/* Pagination */
.pagination {
display: flex;
justify-content: center;
align-items: center;
gap: 1rem;
margin-top: 2rem;
}
.pagination-info {
color: var(--dark-text);
font-weight: 500;
min-width: 150px;
text-align: center;
}
.btn-pagination {
background: var(--dark-card);
color: var(--accent-primary);
border: 1px solid var(--accent-primary);
padding: 0.5rem 1rem;
border-radius: 0.375rem;
text-decoration: none;
font-weight: 600;
font-size: 0.9rem;
transition: all 0.2s ease;
cursor: pointer;
}
.btn-pagination:hover {
background: var(--accent-primary);
color: var(--dark-bg);
}
.btn-pagination:active {
transform: scale(0.98);
}
/* Responsive */
@media (max-width: 1024px) {
.filter-form {
flex-direction: column;
}
.filter-group {
width: 100%;
}
.col-text {
min-width: 250px;
}
}
@media (max-width: 768px) {
.flagged-container {
padding: 1rem;
}
.page-header {
flex-direction: column;
align-items: flex-start;
}
.page-header h1 {
font-size: 1.5rem;
}
.table-wrapper {
font-size: 0.8rem;
}
.flagged-table th,
.flagged-table td {
padding: 0.75rem 0.5rem;
}
.col-author,
.col-text {
min-width: 180px;
}
.col-created {
width: 100px;
}
}
</style>
{% endblock %}
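
The score column in the template above maps `item.overall` to a bar width and one of three CSS classes, with cut-offs at 0.3 and 0.6. A minimal Python sketch of the same mapping follows; the function name `score_bar` is hypothetical and is not part of this commit.

```python
def score_bar(overall: float) -> tuple:
    """Return (css_class, width_percent) for a toxicity score in [0, 1].

    Hypothetical helper mirroring the template logic; not part of the repo.
    """
    # The template uses `(item.overall * 100) | int`, which truncates.
    width = int(overall * 100)

    if overall < 0.3:
        css_class = "score-bar-low"
    elif overall < 0.6:
        css_class = "score-bar-medium"
    else:
        css_class = "score-bar-high"
    return css_class, width
```

Computing the class in the view rather than the template would make the thresholds testable in one place, if that is a concern.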

@ -0,0 +1,116 @@
{% extends "base.html" %}
{% block title %}Mentions{% endblock %}
{% block content %}
<div class="page-header">
<h1>Mentions</h1>
<p>Track when monitored accounts are mentioned by other users.</p>
</div>
{# ── Filter Bar ──────────────────────────────────────────────────── #}
<form class="filter-bar" method="get" action="/mentions">
<select name="account">
<option value="">All Accounts</option>
{% for acct in accounts %}
<option value="{{ acct.did }}" {% if mentioned_did == acct.did %}selected{% endif %}>
@{{ acct.handle }}
</option>
{% endfor %}
</select>
<input type="text" name="search" placeholder="Search text..." value="{{ search }}" style="min-width: 200px;">
<button type="submit" class="btn btn-primary">Apply</button>
</form>
<div class="result-count">{{ total | format_number }} mentions found</div>
{# ── Mentions Table ──────────────────────────────────────────────── #}
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Mentioned Account</th>
<th>Mentioning User</th>
<th>Text</th>
<th>Created</th>
</tr>
</thead>
<tbody>
{% for mention in mentions %}
<tr>
<td>
{% if mention.mentioned_handle %}
<a href="https://bsky.app/profile/{{ mention.mentioned_handle }}" target="_blank" rel="noopener">
@{{ mention.mentioned_handle }}
</a>
{% else %}
<span class="text-muted mono">{{ mention.mentioned_did[:25] }}...</span>
{% endif %}
</td>
<td>
<span class="text-muted mono" title="{{ mention.mentioning_did }}">
{{ mention.mentioning_did[:30] }}...
</span>
</td>
<td class="text-preview">
{% if mention.post_uri %}
<a href="/statuses/{{ encode_uri(mention.post_uri) }}">
{{ mention.post_text | truncate_text(200) }}
</a>
{% else %}
{{ mention.post_text | truncate_text(200) }}
{% endif %}
</td>
<td title="{{ mention.post_created_at | format_dt }}">
{{ mention.post_created_at | time_ago }}
</td>
</tr>
{% else %}
<tr>
<td colspan="4" class="table-empty">No mentions found.</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
{# ── Pagination ──────────────────────────────────────────────────── #}
{% if total_pages > 1 %}
<div class="pagination">
{% if page > 1 %}
<a href="?account={{ (mentioned_did or '') | urlencode }}&amp;search={{ (search or '') | urlencode }}&amp;page={{ page - 1 }}">Prev</a>
{% else %}
<span class="disabled">Prev</span>
{% endif %}
{% set start_page = [1, page - 3] | max %}
{% set end_page = [total_pages, page + 3] | min %}
{% if start_page > 1 %}
<a href="?account={{ (mentioned_did or '') | urlencode }}&amp;search={{ (search or '') | urlencode }}&amp;page=1">1</a>
{% if start_page > 2 %}<span class="ellipsis">...</span>{% endif %}
{% endif %}
{% for p in range(start_page, end_page + 1) %}
{% if p == page %}
<span class="active">{{ p }}</span>
{% else %}
<a href="?account={{ (mentioned_did or '') | urlencode }}&amp;search={{ (search or '') | urlencode }}&amp;page={{ p }}">{{ p }}</a>
{% endif %}
{% endfor %}
{% if end_page < total_pages %}
{% if end_page < total_pages - 1 %}<span class="ellipsis">...</span>{% endif %}
<a href="?account={{ (mentioned_did or '') | urlencode }}&amp;search={{ (search or '') | urlencode }}&amp;page={{ total_pages }}">{{ total_pages }}</a>
{% endif %}
{% if page < total_pages %}
<a href="?account={{ (mentioned_did or '') | urlencode }}&amp;search={{ (search or '') | urlencode }}&amp;page={{ page + 1 }}">Next</a>
{% else %}
<span class="disabled">Next</span>
{% endif %}
</div>
{% endif %}
{% endblock %}
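
The pagination block above shows a window of up to three pages on either side of the current page, plus the first and last page with an ellipsis where pages are skipped. The same logic can be sketched in Python as follows; `page_window` is a hypothetical name used only for illustration.

```python
def page_window(page: int, total_pages: int, radius: int = 3) -> list:
    """Return the entries of the pagination bar, in display order.

    Integers are page numbers; the string "..." marks skipped pages.
    Hypothetical helper mirroring the template; not part of the repo.
    """
    start = max(1, page - radius)
    end = min(total_pages, page + radius)

    items = []
    if start > 1:
        items.append(1)            # always link the first page
        if start > 2:
            items.append("...")    # gap between page 1 and the window
    items.extend(range(start, end + 1))
    if end < total_pages:
        if end < total_pages - 1:
            items.append("...")    # gap between the window and the last page
        items.append(total_pages)  # always link the last page
    return items
```

For example, `page_window(10, 20)` yields `[1, "...", 7, 8, 9, 10, 11, 12, 13, "...", 20]`.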

@ -0,0 +1,134 @@
{% extends "base.html" %}
{% block title %}Status Detail{% endblock %}
{% block content %}
<a href="/statuses" class="back-link">&larr; Back to statuses</a>
{# ── Main Post Card ──────────────────────────────────────────────── #}
<div class="card mb-3">
<div style="display: flex; align-items: center; gap: 0.75rem; margin-bottom: 0.75rem;">
{% if post.author_handle %}
<a href="https://bsky.app/profile/{{ post.author_handle }}" target="_blank" rel="noopener"
style="font-weight: 600; font-size: 1.05rem;">
@{{ post.author_handle }}
</a>
{% else %}
<span class="text-muted mono">{{ post.author_did }}</span>
{% endif %}
{% if post.post_type == 'post' %}
<span class="badge badge-post">Post</span>
{% elif post.post_type == 'reply' %}
<span class="badge badge-reply">Reply</span>
{% elif post.post_type == 'repost' %}
<span class="badge badge-repost">Repost</span>
{% else %}
<span class="badge">{{ post.post_type }}</span>
{% endif %}
</div>
<div class="text-muted" style="font-size: 0.85rem;">
{{ post.created_at | format_dt }} ({{ post.created_at | time_ago }})
</div>
{# ── Reply context ───────────────────────────────────────────── #}
{% if post.reply_parent %}
<div style="margin: 0.75rem 0; padding: 0.6rem 0.85rem; background: rgba(155,89,182,0.08);
border-radius: 6px; border-left: 3px solid #9b59b6; font-size: 0.85rem;">
In reply to:
{% if parent %}
<a href="/statuses/{{ encode_uri(parent.uri) }}">
{% if parent.author_handle %}@{{ parent.author_handle }}{% else %}{{ parent.author_did[:30] }}...{% endif %}
&mdash; {{ parent.text | truncate_text(120) }}
</a>
{% else %}
<span class="text-muted mono">{{ post.reply_parent }}</span>
{% endif %}
</div>
{% endif %}
{# ── Full post text ──────────────────────────────────────────── #}
<div class="post-text">{{ post.text }}</div>
{# ── Engagement stats ────────────────────────────────────────── #}
<div class="stats-row">
<div class="stat-item">
<span>Likes:</span>
<strong>{{ post.like_count | format_number }}</strong>
</div>
<div class="stat-item">
<span>Replies:</span>
<strong>{{ post.reply_count | format_number }}</strong>
</div>
<div class="stat-item">
<span>Reposts:</span>
<strong>{{ post.repost_count | format_number }}</strong>
</div>
<div class="stat-item">
<span>Quotes:</span>
<strong>{{ post.quote_count | format_number }}</strong>
</div>
</div>
{# ── Additional metadata ─────────────────────────────────────── #}
<div class="text-muted mt-1" style="font-size: 0.85rem;">
{% if post.has_media %}<span style="margin-right: 0.75rem;">Has media</span>{% endif %}
{% if post.has_embed %}<span style="margin-right: 0.75rem;">Has embed</span>{% endif %}
{% if post.langs %}<span style="margin-right: 0.75rem;">Language: {{ post.langs }}</span>{% endif %}
</div>
{# ── External link ───────────────────────────────────────────── #}
{% set bsky_url = bsky_post_url(post.uri, post.author_handle) %}
{% if bsky_url %}
<div class="mt-2">
<a href="{{ bsky_url }}" target="_blank" rel="noopener" class="btn btn-primary">
View on Bluesky &rarr;
</a>
</div>
{% endif %}
<div class="text-muted mt-2" style="font-size: 0.8rem;">
Indexed: {{ post.indexed_at | format_dt }} &middot;
Collected: {{ post.collected_at | format_dt }}
</div>
</div>
{# ── Replies Section ─────────────────────────────────────────────── #}
{% if replies %}
<h2 class="mb-2">Replies ({{ replies | length }})</h2>
{% for reply in replies %}
<div class="card reply-card">
<div style="display: flex; align-items: center; gap: 0.5rem; margin-bottom: 0.4rem;">
{% if reply.author_handle %}
<a href="https://bsky.app/profile/{{ reply.author_handle }}" target="_blank" rel="noopener"
style="font-weight: 600; font-size: 0.9rem;">
@{{ reply.author_handle }}
</a>
{% else %}
<span class="text-muted mono" style="font-size: 0.85rem;">{{ reply.author_did[:30] }}...</span>
{% endif %}
<span class="text-muted" style="font-size: 0.85rem;">{{ reply.created_at | time_ago }}</span>
</div>
<div style="margin-bottom: 0.35rem;">
<a href="/statuses/{{ encode_uri(reply.uri) }}">
{{ reply.text | truncate_text(300) }}
</a>
</div>
<div class="text-muted" style="font-size: 0.8rem;">
Likes: {{ reply.like_count | format_number }} &middot;
Replies: {{ reply.reply_count | format_number }} &middot;
Reposts: {{ reply.repost_count | format_number }}
</div>
</div>
{% endfor %}
{% endif %}
{# ── Raw JSON ────────────────────────────────────────────────────── #}
{% if post.raw_json %}
<details>
<summary>Raw JSON</summary>
<pre>{{ post.raw_json | tojson(indent=2) }}</pre>
</details>
{% endif %}
{% endblock %}

@ -0,0 +1,143 @@
{% extends "base.html" %}
{% block title %}Statuses{% endblock %}
{% block content %}
<div class="page-header">
<h1>Statuses</h1>
<p>Browse and search collected posts, replies, and reposts.</p>
</div>
{# ── Filter Bar ──────────────────────────────────────────────────── #}
<form class="filter-bar" method="get" action="/statuses">
<select name="account">
<option value="">All Accounts</option>
{% for acct in accounts %}
<option value="{{ acct.did }}" {% if account_did == acct.did %}selected{% endif %}>
@{{ acct.handle }}
</option>
{% endfor %}
</select>
<select name="type">
<option value="">All Types</option>
<option value="post" {% if post_type == 'post' %}selected{% endif %}>Post</option>
<option value="reply" {% if post_type == 'reply' %}selected{% endif %}>Reply</option>
<option value="repost" {% if post_type == 'repost' %}selected{% endif %}>Repost</option>
</select>
<input type="text" name="search" placeholder="Search text..." value="{{ search }}" style="min-width: 200px;">
<select name="sort">
<option value="created" {% if sort == 'created' %}selected{% endif %}>Created</option>
<option value="likes" {% if sort == 'likes' %}selected{% endif %}>Likes</option>
<option value="replies" {% if sort == 'replies' %}selected{% endif %}>Replies</option>
<option value="reposts" {% if sort == 'reposts' %}selected{% endif %}>Reposts</option>
</select>
<select name="dir">
<option value="desc" {% if direction == 'desc' %}selected{% endif %}>Desc</option>
<option value="asc" {% if direction == 'asc' %}selected{% endif %}>Asc</option>
</select>
<button type="submit" class="btn btn-primary">Apply</button>
</form>
<div class="result-count">{{ total | format_number }} statuses found</div>
{# ── Posts Table ─────────────────────────────────────────────────── #}
<div class="table-wrapper">
<table>
<thead>
<tr>
<th>Author</th>
<th>Text</th>
<th>Type</th>
<th>Created</th>
<th class="text-right">Likes</th>
<th class="text-right">Replies</th>
<th class="text-right">Reposts</th>
</tr>
</thead>
<tbody>
{% for post in posts %}
<tr>
<td>
{% if post.author_handle %}
<a href="https://bsky.app/profile/{{ post.author_handle }}" target="_blank" rel="noopener">
@{{ post.author_handle }}
</a>
{% else %}
<span class="text-muted mono">{{ post.author_did[:20] }}...</span>
{% endif %}
</td>
<td class="text-preview">
<a href="/statuses/{{ encode_uri(post.uri) }}">
{{ post.text | truncate_text(200) }}
</a>
</td>
<td>
{% if post.post_type == 'post' %}
<span class="badge badge-post">Post</span>
{% elif post.post_type == 'reply' %}
<span class="badge badge-reply">Reply</span>
{% elif post.post_type == 'repost' %}
<span class="badge badge-repost">Repost</span>
{% else %}
<span class="badge">{{ post.post_type }}</span>
{% endif %}
</td>
<td title="{{ post.created_at | format_dt }}">
{{ post.created_at | time_ago }}
</td>
<td class="text-right">{{ post.like_count | format_number }}</td>
<td class="text-right">{{ post.reply_count | format_number }}</td>
<td class="text-right">{{ post.repost_count | format_number }}</td>
</tr>
{% else %}
<tr>
<td colspan="7" class="table-empty">No statuses found.</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
{# ── Pagination ──────────────────────────────────────────────────── #}
{% if total_pages > 1 %}
<div class="pagination">
{% if page > 1 %}
<a href="?account={{ (account_did or '') | urlencode }}&amp;type={{ (post_type or '') | urlencode }}&amp;search={{ (search or '') | urlencode }}&amp;sort={{ sort }}&amp;dir={{ direction }}&amp;page={{ page - 1 }}">Prev</a>
{% else %}
<span class="disabled">Prev</span>
{% endif %}
{% set start_page = [1, page - 3] | max %}
{% set end_page = [total_pages, page + 3] | min %}
{% if start_page > 1 %}
<a href="?account={{ (account_did or '') | urlencode }}&amp;type={{ (post_type or '') | urlencode }}&amp;search={{ (search or '') | urlencode }}&amp;sort={{ sort }}&amp;dir={{ direction }}&amp;page=1">1</a>
{% if start_page > 2 %}<span class="ellipsis">...</span>{% endif %}
{% endif %}
{% for p in range(start_page, end_page + 1) %}
{% if p == page %}
<span class="active">{{ p }}</span>
{% else %}
<a href="?account={{ (account_did or '') | urlencode }}&amp;type={{ (post_type or '') | urlencode }}&amp;search={{ (search or '') | urlencode }}&amp;sort={{ sort }}&amp;dir={{ direction }}&amp;page={{ p }}">{{ p }}</a>
{% endif %}
{% endfor %}
{% if end_page < total_pages %}
{% if end_page < total_pages - 1 %}<span class="ellipsis">...</span>{% endif %}
<a href="?account={{ (account_did or '') | urlencode }}&amp;type={{ (post_type or '') | urlencode }}&amp;search={{ (search or '') | urlencode }}&amp;sort={{ sort }}&amp;dir={{ direction }}&amp;page={{ total_pages }}">{{ total_pages }}</a>
{% endif %}
{% if page < total_pages %}
<a href="?account={{ (account_did or '') | urlencode }}&amp;type={{ (post_type or '') | urlencode }}&amp;search={{ (search or '') | urlencode }}&amp;sort={{ sort }}&amp;dir={{ direction }}&amp;page={{ page + 1 }}">Next</a>
{% else %}
<span class="disabled">Next</span>
{% endif %}
</div>
{% endif %}
{% endblock %}
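
The pagination links above assemble their query string by hand from the active filters. As a rough sketch, the same string could be built in the view with the standard library, which percent-encodes values such as a search term containing `&` or a DID containing `:`. The helper name `page_query` is an assumption and does not appear in this commit.

```python
from urllib.parse import urlencode


def page_query(page: int, **filters) -> str:
    """Build a "?key=value&..." query string for a pagination link.

    Hypothetical helper; empty or missing filters are dropped so that
    unset values never appear in the URL as the literal text "None".
    """
    params = {k: v for k, v in filters.items() if v not in (None, "")}
    params["page"] = page
    # urlencode percent-encodes reserved characters and turns spaces into "+".
    return "?" + urlencode(params)
```

A call such as `page_query(2, account="did:plc:abc", search="a&b c")` would produce `?account=did%3Aplc%3Aabc&search=a%26b+c&page=2`.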

10
web.Dockerfile Normal file
@ -0,0 +1,10 @@
FROM python:3.12-slim
WORKDIR /app
COPY requirements-web.txt .
RUN pip install --no-cache-dir -r requirements-web.txt
COPY src/ ./src/
CMD ["gunicorn", "-b", "0.0.0.0:5001", "-w", "2", "src.web.app:create_app()"]