Fix flagged page styling by adding CSS/JS blocks to base template
- Add {% block extra_css %} to base.html head section
- Add {% block extra_js %} to base.html before closing body tag
- Add encode_uri template filter for URL encoding
Flagged page CSS and JavaScript now load correctly, fixing:
- Filter bar styling
- Table formatting
- Review button styles and functionality
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <noreply@anthropic.com>
This commit is contained in:
parent
c177725ebc
commit
ac2b50751b
3 changed files with 9 additions and 242 deletions
|
|
@ -1,242 +0,0 @@
|
||||||
# Toxicity Analysis System
|
|
||||||
|
|
||||||
This document describes the toxicity analysis system for the Mastodon collector, adapted from the Bluesky collector implementation.
|
|
||||||
|
|
||||||
## Overview
|
|
||||||
|
|
||||||
The toxicity analysis system uses OpenAI's GPT-4o-mini to classify Mastodon posts across 12 toxicity categories:
|
|
||||||
|
|
||||||
- **toxic**: rude, disrespectful, or aggressive language
|
|
||||||
- **threat**: threats of violence, harm, or intimidation
|
|
||||||
- **hate_speech**: targeting based on protected characteristics
|
|
||||||
- **racism**: race/ethnicity-based targeting
|
|
||||||
- **antisemitism**: anti-Jewish content
|
|
||||||
- **islamophobia**: anti-Muslim content
|
|
||||||
- **sexism**: gender-based discrimination
|
|
||||||
- **homophobia**: anti-LGBTQ+ content
|
|
||||||
- **insult**: personal attacks and name-calling
|
|
||||||
- **dehumanization**: comparing people to animals/vermin
|
|
||||||
- **extremism**: far-right/left extremist rhetoric
|
|
||||||
- **ableism**: targeting people with disabilities
|
|
||||||
|
|
||||||
## Architecture
|
|
||||||
|
|
||||||
The system consists of:
|
|
||||||
|
|
||||||
1. **Analyzer Module** (`app/analyzer/`) - Async batch processor for classification
|
|
||||||
2. **Database Schema** (`scripts/02-toxicity.sql`) - Toxicity scores and analysis runs
|
|
||||||
3. **Web Interface** - Dashboard and flagged content review
|
|
||||||
4. **API Endpoints** - For manual review of flagged content
|
|
||||||
|
|
||||||
## Setup
|
|
||||||
|
|
||||||
### 1. Environment Variables
|
|
||||||
|
|
||||||
Add to your `.env` file:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
# OpenAI API key for toxicity analysis
|
|
||||||
OPENAI_API_KEY=sk-...
|
|
||||||
|
|
||||||
# Analyzer configuration (optional)
|
|
||||||
ANALYZER_MODEL=gpt-4o-mini
|
|
||||||
ANALYZER_BATCH_SIZE=10
|
|
||||||
ANALYZER_CONCURRENCY=5
|
|
||||||
ANALYZER_FLAG_THRESHOLD=0.5
|
|
||||||
ANALYZER_LIMIT=0 # 0 = no limit, or set to test on limited number
|
|
||||||
```
|
|
||||||
|
|
||||||
### 2. Database Migration
|
|
||||||
|
|
||||||
The toxicity schema is applied automatically when the analyzer runs for the first time. It creates:
|
|
||||||
|
|
||||||
- `toxicity_scores` table - stores scores for each status
|
|
||||||
- `analysis_runs` table - audit trail of analysis runs
|
|
||||||
|
|
||||||
To manually apply the migration:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
docker exec -i mastodon-collector-db-1 psql -U collector -d mastodon_collector < scripts/02-toxicity.sql
|
|
||||||
```
|
|
||||||
|
|
||||||
### 3. Install Dependencies
|
|
||||||
|
|
||||||
Dependencies are already added to `requirements.txt`:
|
|
||||||
- `openai==1.58.1` - OpenAI API client
|
|
||||||
- `asyncpg==0.30.0` - Async PostgreSQL driver
|
|
||||||
|
|
||||||
Rebuild the Docker containers to install:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
docker-compose build
|
|
||||||
docker-compose up -d
|
|
||||||
```
|
|
||||||
|
|
||||||
## Running the Analyzer
|
|
||||||
|
|
||||||
### One-Time Analysis
|
|
||||||
|
|
||||||
Run the analyzer manually to score all unscored statuses:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
docker exec mastodon-collector-collector-1 python -m app.analyzer
|
|
||||||
```
|
|
||||||
|
|
||||||
### Test on Limited Sample
|
|
||||||
|
|
||||||
To test on 100 statuses first:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
docker exec mastodon-collector-collector-1 bash -c "ANALYZER_LIMIT=100 python -m app.analyzer"
|
|
||||||
```
|
|
||||||
|
|
||||||
### Automated Analysis (Future)
|
|
||||||
|
|
||||||
You can schedule the analyzer to run periodically using cron or a scheduler service. For example, add to your `docker-compose.yml`:
|
|
||||||
|
|
||||||
```yaml
|
|
||||||
analyzer:
|
|
||||||
build: .
|
|
||||||
command: python -m app.analyzer
|
|
||||||
environment:
|
|
||||||
- DATABASE_URL=postgresql://collector:${POSTGRES_PASSWORD}@db:5432/mastodon_collector
|
|
||||||
- OPENAI_API_KEY=${OPENAI_API_KEY}
|
|
||||||
- ANALYZER_LIMIT=${ANALYZER_LIMIT:-0}
|
|
||||||
depends_on:
|
|
||||||
- db
|
|
||||||
restart: "no" # Run once, don't restart
|
|
||||||
```
|
|
||||||
|
|
||||||
Then trigger manually:
|
|
||||||
```bash
|
|
||||||
docker-compose run --rm analyzer
|
|
||||||
```
|
|
||||||
|
|
||||||
## Web Interface
|
|
||||||
|
|
||||||
### Analysis Dashboard
|
|
||||||
|
|
||||||
Visit http://localhost:8585/analysis to see:
|
|
||||||
|
|
||||||
- Overall statistics (total scored, flagged count, averages)
|
|
||||||
- Toxicity trends over time
|
|
||||||
- Category breakdown chart
|
|
||||||
- Recent analysis runs
|
|
||||||
|
|
||||||
### Flagged Content Review
|
|
||||||
|
|
||||||
Visit http://localhost:8585/analysis/flagged to:
|
|
||||||
|
|
||||||
- Browse flagged content (threshold >= 0.5 by default)
|
|
||||||
- Filter by category, account, date range, review status
|
|
||||||
- Sort by overall toxicity or specific categories
|
|
||||||
- Manually review and mark items as:
|
|
||||||
- ✓ Correct (correctly flagged)
|
|
||||||
- ✗ Incorrect (false positive)
|
|
||||||
- ? Unsure
|
|
||||||
|
|
||||||
### Review Workflow
|
|
||||||
|
|
||||||
1. Click on flagged items to review
|
|
||||||
2. Use the review buttons (✓, ✗, ?) to mark your assessment
|
|
||||||
3. Filter by `review_status=unreviewed` to focus on items needing review
|
|
||||||
4. Use reviewed data to improve the classifier or adjust thresholds
|
|
||||||
|
|
||||||
## Cost Estimation
|
|
||||||
|
|
||||||
Based on GPT-4o-mini pricing (as of Jan 2025):
|
|
||||||
- Input: $0.150 per 1M tokens
|
|
||||||
- Output: $0.600 per 1M tokens
|
|
||||||
|
|
||||||
Typical costs:
|
|
||||||
- ~1,000 statuses = $0.05-0.15
|
|
||||||
- ~10,000 statuses = $0.50-1.50
|
|
||||||
|
|
||||||
The analyzer logs estimated costs after each run.
|
|
||||||
|
|
||||||
## Architecture Details
|
|
||||||
|
|
||||||
### Batch Processing
|
|
||||||
|
|
||||||
The analyzer processes statuses in batches (default: 10 per API call) with concurrency control (default: 5 simultaneous batches). This optimizes for:
|
|
||||||
|
|
||||||
- Cost efficiency (batch API calls)
|
|
||||||
- Rate limit compliance
|
|
||||||
- Parallel processing speed
|
|
||||||
|
|
||||||
### Scoring Logic
|
|
||||||
|
|
||||||
Each status receives:
|
|
||||||
- 12 category scores (0.0 - 1.0)
|
|
||||||
- Overall score = max of all categories
|
|
||||||
- Flagged if overall >= threshold (default 0.5)
|
|
||||||
|
|
||||||
### Human Review
|
|
||||||
|
|
||||||
Manual reviews help:
|
|
||||||
- Validate AI classifications
|
|
||||||
- Identify patterns of false positives
|
|
||||||
- Build training data for future improvements
|
|
||||||
- Adjust thresholds per category if needed
|
|
||||||
|
|
||||||
## Dutch Language Support
|
|
||||||
|
|
||||||
The classifier is specifically trained to handle Dutch political content, including:
|
|
||||||
|
|
||||||
- Dutch slang and coded terms ("gelukszoekers", "omvolking", "wappie", etc.)
|
|
||||||
- Political context and satire
|
|
||||||
- Zwarte Piet debates
|
|
||||||
- Dutch far-right rhetoric
|
|
||||||
|
|
||||||
## Templates
|
|
||||||
|
|
||||||
The Bluesky collector templates can be adapted for Mastodon. Key files to create:
|
|
||||||
|
|
||||||
1. `app/templates/analysis.html` - Main dashboard
|
|
||||||
2. `app/templates/flagged.html` - Flagged content browser
|
|
||||||
|
|
||||||
These templates should include:
|
|
||||||
- Chart.js for visualizations
|
|
||||||
- Filter forms for exploration
|
|
||||||
- Review buttons for manual validation
|
|
||||||
|
|
||||||
## Troubleshooting
|
|
||||||
|
|
||||||
### No statuses being scored
|
|
||||||
|
|
||||||
- Check that statuses exist: `SELECT COUNT(*) FROM statuses WHERE content IS NOT NULL AND reblog_of_id IS NULL;`
|
|
||||||
- Check migration applied: `\dt toxicity_scores` in psql
|
|
||||||
- Check OPENAI_API_KEY is set
|
|
||||||
|
|
||||||
### Rate limit errors
|
|
||||||
|
|
||||||
- Reduce `ANALYZER_CONCURRENCY` (try 2-3)
|
|
||||||
- Reduce `ANALYZER_BATCH_SIZE` (try 5)
|
|
||||||
- The analyzer retries with exponential backoff automatically
|
|
||||||
|
|
||||||
### High false positive rate
|
|
||||||
|
|
||||||
- Increase `ANALYZER_FLAG_THRESHOLD` (try 0.6 or 0.7)
|
|
||||||
- Review flagged items and look for patterns
|
|
||||||
- Dutch political content can be intense but not necessarily toxic
|
|
||||||
|
|
||||||
### Template errors
|
|
||||||
|
|
||||||
- Ensure templates exist in `app/templates/`
|
|
||||||
- Check that analysis helper functions are imported correctly
|
|
||||||
- Verify template filters are defined (`format_number`, `time_ago`, etc.)
|
|
||||||
|
|
||||||
## Next Steps
|
|
||||||
|
|
||||||
1. Copy analysis templates from Bluesky collector to `app/templates/`
|
|
||||||
2. Add navigation links to analysis dashboard in base template
|
|
||||||
3. Run initial analysis on sample data
|
|
||||||
4. Review flagged content and adjust thresholds
|
|
||||||
5. Set up automated analysis runs (cron/scheduler)
|
|
||||||
6. Monitor costs and performance
|
|
||||||
|
|
||||||
## References
|
|
||||||
|
|
||||||
- Bluesky collector: https://forgejo.postxsociety.cloud/pieter/bluesky-collector
|
|
||||||
- OpenAI API: https://platform.openai.com/docs
|
|
||||||
- asyncpg: https://magicstack.github.io/asyncpg/
|
|
||||||
|
|
@ -234,6 +234,7 @@
|
||||||
.justify-between { justify-content: space-between; }
|
.justify-between { justify-content: space-between; }
|
||||||
.truncate { white-space: nowrap; overflow: hidden; text-overflow: ellipsis; max-width: 400px; }
|
.truncate { white-space: nowrap; overflow: hidden; text-overflow: ellipsis; max-width: 400px; }
|
||||||
</style>
|
</style>
|
||||||
|
{% block extra_css %}{% endblock %}
|
||||||
</head>
|
</head>
|
||||||
<body>
|
<body>
|
||||||
<nav>
|
<nav>
|
||||||
|
|
@ -259,5 +260,6 @@
|
||||||
{% block content %}{% endblock %}
|
{% block content %}{% endblock %}
|
||||||
</div>
|
</div>
|
||||||
</main>
|
</main>
|
||||||
|
{% block extra_js %}{% endblock %}
|
||||||
</body>
|
</body>
|
||||||
</html>
|
</html>
|
||||||
|
|
|
||||||
|
|
@ -75,6 +75,13 @@ def truncate_text(text, length=200):
|
||||||
return text[:length] + "..."
|
return text[:length] + "..."
|
||||||
|
|
||||||
|
|
||||||
|
@app.template_filter('encode_uri')
|
||||||
|
def encode_uri(uri):
|
||||||
|
"""URL encode a URI for use in query parameters."""
|
||||||
|
from urllib.parse import quote
|
||||||
|
return quote(str(uri), safe='')
|
||||||
|
|
||||||
|
|
||||||
# Initialize database on startup
|
# Initialize database on startup
|
||||||
with app.app_context():
|
with app.app_context():
|
||||||
init_db()
|
init_db()
|
||||||
|
|
|
||||||
Loading…
Add table
Reference in a new issue