Add toxicity analysis system for Mastodon statuses

Implements comprehensive toxicity analysis following the Bluesky collector architecture:

- Analyzer module with async batch processing using GPT-4o-mini
- Database schema for toxicity scores and analysis run tracking
- 12 toxicity categories (toxic, threat, hate_speech, racism, antisemitism, islamophobia, sexism, homophobia, insult, dehumanization, extremism, ableism); see the illustrative sketch below
- Web interface routes for analysis dashboard and flagged content review
- Manual review API endpoint for human validation
- Analysis helper functions for database queries
- Dutch language support with coded political term recognition

Usage:

    docker exec mastodon-collector-collector-1 python -m app.analyzer

See TOXICITY_ANALYSIS.md for full documentation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-30 14:43:35 +02:00
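The sketch below is illustrative only, not the analyzer implementation: the category names and the default flag threshold of 0.5 come from this commit, while the flagging rule (flag when any category score reaches the threshold) and the helper function itself are assumptions.

CATEGORIES = [
    "toxic", "threat", "hate_speech", "racism", "antisemitism", "islamophobia",
    "sexism", "homophobia", "insult", "dehumanization", "extremism", "ableism",
]

def is_flagged(scores: dict[str, float], flag_threshold: float = 0.5) -> bool:
    """Return True when any category score reaches the flag threshold (assumed rule)."""
    return any(scores.get(name, 0.0) >= flag_threshold for name in CATEGORIES)

# Example: a status scored {"insult": 0.8} and low elsewhere is flagged at the default 0.5.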
"""Configuration for the toxicity analyzer."""

from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass
class AnalyzerConfig:
    """Configuration for the toxicity analyzer."""

    database_url: str
    llm_api_key: str
    model: str = "gpt-4o-mini"
    batch_size: int = 10
    concurrency: int = 5
    flag_threshold: float = 0.5
    limit: int = 0  # 0 = no limit
    log_level: str = "INFO"

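    # Note (assumption about how the analyzer uses these fields): with the
    # defaults above, up to concurrency * batch_size = 5 * 10 = 50 statuses
    # would be in flight at once during a run.
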
    # Pricing (as of 2025, per million tokens)
    input_cost_per_m: float = 0.150  # $0.150 per 1M input tokens
    output_cost_per_m: float = 0.600  # $0.600 per 1M output tokens
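    # Worked example (illustrative, with assumed token counts): analyzing
    # 10,000 statuses at roughly 400 input and 150 output tokens each costs
    #   input:  10_000 * 400 / 1_000_000 * 0.150 = $0.60
    #   output: 10_000 * 150 / 1_000_000 * 0.600 = $0.90
    # i.e. about $1.50 in total at the rates above.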

    @classmethod
    def from_env(cls) -> AnalyzerConfig:
        """Load configuration from environment variables."""
        return cls(
            database_url=os.environ["DATABASE_URL"],
            llm_api_key=os.environ["LLM_API_KEY"],
            model=os.getenv("ANALYZER_MODEL", "gpt-4o-mini"),
            batch_size=int(os.getenv("ANALYZER_BATCH_SIZE", "10")),
            concurrency=int(os.getenv("ANALYZER_CONCURRENCY", "5")),
            flag_threshold=float(os.getenv("ANALYZER_FLAG_THRESHOLD", "0.5")),
            limit=int(os.getenv("ANALYZER_LIMIT", "0")),
            log_level=os.getenv("ANALYZER_LOG_LEVEL", "INFO"),
        )
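A minimal usage sketch, not part of the module: DATABASE_URL and LLM_API_KEY are required (from_env raises KeyError if either is missing), and every other field falls back to the defaults above. The placeholder values are hypothetical.

import os

os.environ.setdefault("DATABASE_URL", "postgresql://collector@db/mastodon")  # placeholder
os.environ.setdefault("LLM_API_KEY", "sk-placeholder")                        # placeholder

config = AnalyzerConfig.from_env()
print(config.model)           # "gpt-4o-mini" unless ANALYZER_MODEL is set
print(config.flag_threshold)  # 0.5 unless ANALYZER_FLAG_THRESHOLD is set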