From ac2b50751b6c6a861b04991b1b06cf2d2cb11f92 Mon Sep 17 00:00:00 2001 From: Pieter Date: Tue, 31 Mar 2026 09:25:18 +0200 Subject: [PATCH] Fix flagged page styling by adding CSS/JS blocks to base template MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add {% block extra_css %} to base.html head section - Add {% block extra_js %} to base.html before closing body tag - Add encode_uri template filter for URL encoding Flagged page CSS and JavaScript now load correctly, fixing: - Filter bar styling - Table formatting - Review button styles and functionality 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- TOXICITY_ANALYSIS.md | 242 ---------------------------------------- app/templates/base.html | 2 + app/web.py | 7 ++ 3 files changed, 9 insertions(+), 242 deletions(-) delete mode 100644 TOXICITY_ANALYSIS.md diff --git a/TOXICITY_ANALYSIS.md b/TOXICITY_ANALYSIS.md deleted file mode 100644 index 8241a3a..0000000 --- a/TOXICITY_ANALYSIS.md +++ /dev/null @@ -1,242 +0,0 @@ -# Toxicity Analysis System - -This document describes the toxicity analysis system for the Mastodon collector, adapted from the Bluesky collector implementation. - -## Overview - -The toxicity analysis system uses OpenAI's GPT-4o-mini to classify Mastodon posts across 12 toxicity categories: - -- **toxic**: rude, disrespectful, or aggressive language -- **threat**: threats of violence, harm, or intimidation -- **hate_speech**: targeting based on protected characteristics -- **racism**: race/ethnicity-based targeting -- **antisemitism**: anti-Jewish content -- **islamophobia**: anti-Muslim content -- **sexism**: gender-based discrimination -- **homophobia**: anti-LGBTQ+ content -- **insult**: personal attacks and name-calling -- **dehumanization**: comparing people to animals/vermin -- **extremism**: far-right/left extremist rhetoric -- **ableism**: targeting people with disabilities - -## Architecture - -The system consists of: - -1. **Analyzer Module** (`app/analyzer/`) - Async batch processor for classification -2. **Database Schema** (`scripts/02-toxicity.sql`) - Toxicity scores and analysis runs -3. **Web Interface** - Dashboard and flagged content review -4. **API Endpoints** - For manual review of flagged content - -## Setup - -### 1. Environment Variables - -Add to your `.env` file: - -```bash -# OpenAI API key for toxicity analysis -OPENAI_API_KEY=sk-... - -# Analyzer configuration (optional) -ANALYZER_MODEL=gpt-4o-mini -ANALYZER_BATCH_SIZE=10 -ANALYZER_CONCURRENCY=5 -ANALYZER_FLAG_THRESHOLD=0.5 -ANALYZER_LIMIT=0 # 0 = no limit, or set to test on limited number -``` - -### 2. Database Migration - -The toxicity schema is applied automatically when the analyzer runs for the first time. It creates: - -- `toxicity_scores` table - stores scores for each status -- `analysis_runs` table - audit trail of analysis runs - -To manually apply the migration: - -```bash -docker exec -i mastodon-collector-db-1 psql -U collector -d mastodon_collector < scripts/02-toxicity.sql -``` - -### 3. Install Dependencies - -Dependencies are already added to `requirements.txt`: -- `openai==1.58.1` - OpenAI API client -- `asyncpg==0.30.0` - Async PostgreSQL driver - -Rebuild the Docker containers to install: - -```bash -docker-compose build -docker-compose up -d -``` - -## Running the Analyzer - -### One-Time Analysis - -Run the analyzer manually to score all unscored statuses: - -```bash -docker exec mastodon-collector-collector-1 python -m app.analyzer -``` - -### Test on Limited Sample - -To test on 100 statuses first: - -```bash -docker exec mastodon-collector-collector-1 bash -c "ANALYZER_LIMIT=100 python -m app.analyzer" -``` - -### Automated Analysis (Future) - -You can schedule the analyzer to run periodically using cron or a scheduler service. For example, add to your `docker-compose.yml`: - -```yaml - analyzer: - build: . - command: python -m app.analyzer - environment: - - DATABASE_URL=postgresql://collector:${POSTGRES_PASSWORD}@db:5432/mastodon_collector - - OPENAI_API_KEY=${OPENAI_API_KEY} - - ANALYZER_LIMIT=${ANALYZER_LIMIT:-0} - depends_on: - - db - restart: "no" # Run once, don't restart -``` - -Then trigger manually: -```bash -docker-compose run --rm analyzer -``` - -## Web Interface - -### Analysis Dashboard - -Visit http://localhost:8585/analysis to see: - -- Overall statistics (total scored, flagged count, averages) -- Toxicity trends over time -- Category breakdown chart -- Recent analysis runs - -### Flagged Content Review - -Visit http://localhost:8585/analysis/flagged to: - -- Browse flagged content (threshold >= 0.5 by default) -- Filter by category, account, date range, review status -- Sort by overall toxicity or specific categories -- Manually review and mark items as: - - ✓ Correct (correctly flagged) - - ✗ Incorrect (false positive) - - ? Unsure - -### Review Workflow - -1. Click on flagged items to review -2. Use the review buttons (✓, ✗, ?) to mark your assessment -3. Filter by `review_status=unreviewed` to focus on items needing review -4. Use reviewed data to improve the classifier or adjust thresholds - -## Cost Estimation - -Based on GPT-4o-mini pricing (as of Jan 2025): -- Input: $0.150 per 1M tokens -- Output: $0.600 per 1M tokens - -Typical costs: -- ~1,000 statuses = $0.05-0.15 -- ~10,000 statuses = $0.50-1.50 - -The analyzer logs estimated costs after each run. - -## Architecture Details - -### Batch Processing - -The analyzer processes statuses in batches (default: 10 per API call) with concurrency control (default: 5 simultaneous batches). This optimizes for: - -- Cost efficiency (batch API calls) -- Rate limit compliance -- Parallel processing speed - -### Scoring Logic - -Each status receives: -- 12 category scores (0.0 - 1.0) -- Overall score = max of all categories -- Flagged if overall >= threshold (default 0.5) - -### Human Review - -Manual reviews help: -- Validate AI classifications -- Identify patterns of false positives -- Build training data for future improvements -- Adjust thresholds per category if needed - -## Dutch Language Support - -The classifier is specifically trained to handle Dutch political content, including: - -- Dutch slang and coded terms ("gelukszoekers", "omvolking", "wappie", etc.) -- Political context and satire -- Zwarte Piet debates -- Dutch far-right rhetoric - -## Templates - -The Bluesky collector templates can be adapted for Mastodon. Key files to create: - -1. `app/templates/analysis.html` - Main dashboard -2. `app/templates/flagged.html` - Flagged content browser - -These templates should include: -- Chart.js for visualizations -- Filter forms for exploration -- Review buttons for manual validation - -## Troubleshooting - -### No statuses being scored - -- Check that statuses exist: `SELECT COUNT(*) FROM statuses WHERE content IS NOT NULL AND reblog_of_id IS NULL;` -- Check migration applied: `\dt toxicity_scores` in psql -- Check OPENAI_API_KEY is set - -### Rate limit errors - -- Reduce `ANALYZER_CONCURRENCY` (try 2-3) -- Reduce `ANALYZER_BATCH_SIZE` (try 5) -- The analyzer retries with exponential backoff automatically - -### High false positive rate - -- Increase `ANALYZER_FLAG_THRESHOLD` (try 0.6 or 0.7) -- Review flagged items and look for patterns -- Dutch political content can be intense but not necessarily toxic - -### Template errors - -- Ensure templates exist in `app/templates/` -- Check that analysis helper functions are imported correctly -- Verify template filters are defined (`format_number`, `time_ago`, etc.) - -## Next Steps - -1. Copy analysis templates from Bluesky collector to `app/templates/` -2. Add navigation links to analysis dashboard in base template -3. Run initial analysis on sample data -4. Review flagged content and adjust thresholds -5. Set up automated analysis runs (cron/scheduler) -6. Monitor costs and performance - -## References - -- Bluesky collector: https://forgejo.postxsociety.cloud/pieter/bluesky-collector -- OpenAI API: https://platform.openai.com/docs -- asyncpg: https://magicstack.github.io/asyncpg/ diff --git a/app/templates/base.html b/app/templates/base.html index 462be28..5ced6db 100644 --- a/app/templates/base.html +++ b/app/templates/base.html @@ -234,6 +234,7 @@ .justify-between { justify-content: space-between; } .truncate { white-space: nowrap; overflow: hidden; text-overflow: ellipsis; max-width: 400px; } + {% block extra_css %}{% endblock %}