Fix flagged page styling by adding CSS/JS blocks to base template

- Add {% block extra_css %} to base.html head section
- Add {% block extra_js %} to base.html before closing body tag
- Add encode_uri template filter for URL encoding

Flagged page CSS and JavaScript now load correctly, fixing:
- Filter bar styling
- Table formatting
- Review button styles and functionality

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-31 09:25:18 +02:00 · ac2b50751b
commit ac2b50751b
parent c177725ebc
3 changed files with 9 additions and 242 deletions
--- a/TOXICITY_ANALYSIS.md
+++ b/TOXICITY_ANALYSIS.md
@@ -1,242 +0,0 @@
 # Toxicity Analysis System
 This document describes the toxicity analysis system for the Mastodon collector, adapted from the Bluesky collector implementation.
 ## Overview
 The toxicity analysis system uses OpenAI's GPT-4o-mini to classify Mastodon posts across 12 toxicity categories:
 - **toxic**: rude, disrespectful, or aggressive language
 - **threat**: threats of violence, harm, or intimidation
 - **hate_speech**: targeting based on protected characteristics
 - **racism**: race/ethnicity-based targeting
 - **antisemitism**: anti-Jewish content
 - **islamophobia**: anti-Muslim content
 - **sexism**: gender-based discrimination
 - **homophobia**: anti-LGBTQ+ content
 - **insult**: personal attacks and name-calling
 - **dehumanization**: comparing people to animals/vermin
 - **extremism**: far-right or far-left extremist rhetoric
 - **ableism**: targeting people with disabilities
 ## Architecture
 The system consists of:
 1. **Analyzer Module** (`app/analyzer/`) - Async batch processor for classification
 2. **Database Schema** (`scripts/02-toxicity.sql`) - Toxicity scores and analysis runs
 3. **Web Interface** - Dashboard and flagged content review
 4. **API Endpoints** - For manual review of flagged content
 ## Setup
 ### 1. Environment Variables
 Add to your `.env` file:
 ```bash
 # OpenAI API key for toxicity analysis
 OPENAI_API_KEY=sk-...
 # Analyzer configuration (optional)
 ANALYZER_MODEL=gpt-4o-mini
 ANALYZER_BATCH_SIZE=10
 ANALYZER_CONCURRENCY=5
 ANALYZER_FLAG_THRESHOLD=0.5
 ANALYZER_LIMIT=0  # 0 = no limit; set a positive number to test on a limited sample
 ```
 ### 2. Database Migration
 The toxicity schema is applied automatically when the analyzer runs for the first time. It creates:
 - `toxicity_scores` table - stores scores for each status
 - `analysis_runs` table - audit trail of analysis runs
 To manually apply the migration:
 ```bash
 docker exec -i mastodon-collector-db-1 psql -U collector -d mastodon_collector < scripts/02-toxicity.sql
 ```
 ### 3. Install Dependencies
 Dependencies are already added to `requirements.txt`:
 - `openai==1.58.1` - OpenAI API client
 - `asyncpg==0.30.0` - Async PostgreSQL driver
 Rebuild the Docker containers to install:
 ```bash
 docker-compose build
 docker-compose up -d
 ```
 ## Running the Analyzer
 ### One-Time Analysis
 Run the analyzer manually to score all unscored statuses:
 ```bash
 docker exec mastodon-collector-collector-1 python -m app.analyzer
 ```
 ### Test on Limited Sample
 To test on 100 statuses first:
 ```bash
 docker exec mastodon-collector-collector-1 bash -c "ANALYZER_LIMIT=100 python -m app.analyzer"
 ```
 ### Automated Analysis (Future)
 You can schedule the analyzer to run periodically using cron or a scheduler service. For example, add to your `docker-compose.yml`:
 ```yaml
  analyzer:
    build: .
    command: python -m app.analyzer
    environment:
      - DATABASE_URL=postgresql://collector:${POSTGRES_PASSWORD}@db:5432/mastodon_collector
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - ANALYZER_LIMIT=${ANALYZER_LIMIT:-0}
    depends_on:
      - db
    restart: "no"  # Run once, don't restart
 ```
 Then trigger manually:
 ```bash
 docker-compose run --rm analyzer
 ```
 ## Web Interface
 ### Analysis Dashboard
 Visit http://localhost:8585/analysis to see:
 - Overall statistics (total scored, flagged count, averages)
 - Toxicity trends over time
 - Category breakdown chart
 - Recent analysis runs
 ### Flagged Content Review
 Visit http://localhost:8585/analysis/flagged to:
 - Browse flagged content (overall score >= 0.5 by default)
 - Filter by category, account, date range, review status
 - Sort by overall toxicity or specific categories
 - Manually review and mark items as:
  - ✓ Correct (correctly flagged)
  - ✗ Incorrect (false positive)
  - ? Unsure
 ### Review Workflow
 1. Click on flagged items to review
 2. Use the review buttons (✓, ✗, ?) to mark your assessment
 3. Filter by `review_status=unreviewed` to focus on items needing review
 4. Use reviewed data to improve the classifier or adjust thresholds
 ## Cost Estimation
 Based on GPT-4o-mini pricing (as of Jan 2025):
 - Input: $0.150 per 1M tokens
 - Output: $0.600 per 1M tokens
 Typical costs:
 - ~1,000 statuses = $0.05-0.15
 - ~10,000 statuses = $0.50-1.50
 The analyzer logs estimated costs after each run.
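
The arithmetic behind these estimates can be sketched as follows. The per-million-token prices are the ones listed above; the per-status token counts (400 input, 60 output) are hypothetical values chosen only for illustration, and `estimate_cost` is not a function from the analyzer itself.

```python
INPUT_PRICE_PER_MILLION = 0.150   # USD per 1M input tokens, as listed above
OUTPUT_PRICE_PER_MILLION = 0.600  # USD per 1M output tokens, as listed above


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a run from its total token counts."""
    return (
        input_tokens * INPUT_PRICE_PER_MILLION
        + output_tokens * OUTPUT_PRICE_PER_MILLION
    ) / 1_000_000


# Assumption: about 400 input and 60 output tokens per status (hypothetical).
statuses = 1_000
cost = estimate_cost(statuses * 400, statuses * 60)
print(f"Estimated cost for {statuses} statuses: ${cost:.3f}")
```

Under these assumed token counts the estimate lands inside the $0.05-0.15 range quoted for roughly 1,000 statuses; actual usage depends on post length and prompt size.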
 ## Architecture Details
 ### Batch Processing
 The analyzer processes statuses in batches (default: 10 per API call) with concurrency control (default: 5 simultaneous batches). This optimizes for:
 - Cost efficiency (batch API calls)
 - Rate limit compliance
 - Parallel processing speed
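
A minimal sketch of this batching pattern, assuming an `asyncio.Semaphore` is used to cap the number of in-flight batches. The names `chunk`, `process_in_batches` and `fake_classify` are hypothetical; the real analyzer module may be structured differently, and `fake_classify` merely stands in for the API call.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence


def chunk(items: Sequence[str], size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most `size` elements."""
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


async def process_in_batches(
    items: Sequence[str],
    classify_batch: Callable[[List[str]], Awaitable[List[float]]],
    batch_size: int = 10,   # mirrors the ANALYZER_BATCH_SIZE default
    concurrency: int = 5,   # mirrors the ANALYZER_CONCURRENCY default
) -> List[float]:
    """Classify all items, running at most `concurrency` batches at once."""
    semaphore = asyncio.Semaphore(concurrency)

    async def run_one(batch: List[str]) -> List[float]:
        async with semaphore:  # wait here if too many batches are in flight
            return await classify_batch(batch)

    # gather() returns results in the same order as the batches were created
    results = await asyncio.gather(*(run_one(b) for b in chunk(items, batch_size)))
    return [score for batch_result in results for score in batch_result]


async def fake_classify(batch: List[str]) -> List[float]:
    """Stand-in for the real API call: returns one dummy score per status."""
    await asyncio.sleep(0)
    return [0.0 for _ in batch]


scores = asyncio.run(process_in_batches([f"status {i}" for i in range(25)], fake_classify))
print(len(scores))
```

With 25 items and the default batch size, this issues three calls (10, 10 and 5 statuses) instead of 25 individual ones.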
 ### Scoring Logic
 Each status receives:
 - 12 category scores (0.0 - 1.0)
 - Overall score = max of all categories
 - Flagged if overall >= threshold (default 0.5)
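
The scoring rule above can be sketched in a few lines. `summarize_scores` is a hypothetical helper written for illustration, not the analyzer's actual function; only the max rule and the 0.5 default come from this document.

```python
from typing import Dict

FLAG_THRESHOLD = 0.5  # mirrors the documented ANALYZER_FLAG_THRESHOLD default


def summarize_scores(
    category_scores: Dict[str, float],
    threshold: float = FLAG_THRESHOLD,
) -> dict:
    """Return the overall score (max across categories) and the flag decision."""
    if not category_scores:
        raise ValueError("at least one category score is required")
    overall = max(category_scores.values())
    return {"overall": overall, "flagged": overall >= threshold}


example = summarize_scores({"toxic": 0.2, "insult": 0.7, "threat": 0.05})
print(example)  # {'overall': 0.7, 'flagged': True}
```

Because the overall score is a maximum rather than an average, a single high category is enough to flag a status.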
 ### Human Review
 Manual reviews help:
 - Validate AI classifications
 - Identify patterns of false positives
 - Build training data for future improvements
 - Adjust thresholds per category if needed
 ## Dutch Language Support
 The classifier is set up to handle Dutch political content, including:
 - Dutch slang and coded terms ("gelukszoekers", "omvolking", "wappie", etc.)
 - Political context and satire
 - Zwarte Piet debates
 - Dutch far-right rhetoric
 ## Templates
 The Bluesky collector templates can be adapted for Mastodon. Key files to create:
 1. `app/templates/analysis.html` - Main dashboard
 2. `app/templates/flagged.html` - Flagged content browser
 These templates should include:
 - Chart.js for visualizations
 - Filter forms for exploration
 - Review buttons for manual validation
 ## Troubleshooting
 ### No statuses being scored
 - Check that statuses exist: `SELECT COUNT(*) FROM statuses WHERE content IS NOT NULL AND reblog_of_id IS NULL;`
 - Check migration applied: `\dt toxicity_scores` in psql
 - Check OPENAI_API_KEY is set
 ### Rate limit errors
 - Reduce `ANALYZER_CONCURRENCY` (try 2-3)
 - Reduce `ANALYZER_BATCH_SIZE` (try 5)
 - The analyzer retries with exponential backoff automatically
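
For reference, retrying with exponential backoff generally looks like the sketch below. This is an illustration of the technique, not the analyzer's own retry code: `retry_with_backoff` and its parameters are hypothetical, and it catches any `Exception` for simplicity, whereas a real implementation would more likely catch only the client's rate-limit error.

```python
import asyncio
import random


async def retry_with_backoff(
    call,
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep=asyncio.sleep,  # injectable so the wait can be skipped in tests
):
    """Await call(); on failure wait base_delay * 2**attempt (capped) and retry."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% random jitter so parallel batches do not retry in lockstep
            await sleep(delay + random.uniform(0, delay * 0.1))
```

Lowering `ANALYZER_CONCURRENCY` reduces how often such retries are needed in the first place, which is why it is the first setting to try.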
 ### High false positive rate
 - Increase `ANALYZER_FLAG_THRESHOLD` (try 0.6 or 0.7)
 - Review flagged items and look for patterns
 - Dutch political content can be intense but not necessarily toxic
 ### Template errors
 - Ensure templates exist in `app/templates/`
 - Check that analysis helper functions are imported correctly
 - Verify template filters are defined (`format_number`, `time_ago`, etc.)
 ## Next Steps
 1. Copy analysis templates from Bluesky collector to `app/templates/`
 2. Add navigation links to analysis dashboard in base template
 3. Run initial analysis on sample data
 4. Review flagged content and adjust thresholds
 5. Set up automated analysis runs (cron/scheduler)
 6. Monitor costs and performance
 ## References
 - Bluesky collector: https://forgejo.postxsociety.cloud/pieter/bluesky-collector
 - OpenAI API: https://platform.openai.com/docs
 - asyncpg: https://magicstack.github.io/asyncpg/
--- a/app/templates/base.html
+++ b/app/templates/base.html
@@ -234,6 +234,7 @@
        .justify-between { justify-content: space-between; }
        .truncate { white-space: nowrap; overflow: hidden; text-overflow: ellipsis; max-width: 400px; }
    </style>
    {% block extra_css %}{% endblock %}
 </head>
 <body>
    <nav>
@@ -259,5 +260,6 @@
            {% block content %}{% endblock %}
        </div>
    </main>
    {% block extra_js %}{% endblock %}
 </body>
 </html>
--- a/app/web.py
+++ b/app/web.py
@@ -75,6 +75,13 @@ def truncate_text(text, length=200):
    return text[:length] + "..."
@app.template_filter('encode_uri')
 def encode_uri(uri):
    """URL encode a URI for use in query parameters."""
    from urllib.parse import quote
    return quote(str(uri), safe='')
 # Initialize database on startup
 with app.app_context():
    init_db()
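
To illustrate what the new `encode_uri` filter does, here is a standalone sketch of the same `quote(..., safe='')` call outside Flask. The status URI is a made-up example; how the flagged page actually uses the filter is not shown in this diff.

```python
from urllib.parse import quote, unquote


def encode_uri(uri):
    """Percent-encode a value so it fits inside a single query parameter."""
    return quote(str(uri), safe="")


# Hypothetical status URI, used only for illustration
status_uri = "https://example.social/users/alice/statuses/1?x=1&y=2"
encoded = encode_uri(status_uri)
print(encoded)
# https%3A%2F%2Fexample.social%2Fusers%2Falice%2Fstatuses%2F1%3Fx%3D1%26y%3D2

# Decoding restores the original value
print(unquote(encoded) == status_uri)  # True
```

Passing `safe=''` matters here: by default `quote` leaves `/` unescaped, which is fine for path segments but would leave a full URI only partly encoded when it is embedded as a query parameter value.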