Add toxicity analysis system for Mastodon statuses

Implements comprehensive toxicity analysis following the Bluesky collector architecture:

- Analyzer module with async batch processing using GPT-4o-mini
- Database schema for toxicity scores and analysis run tracking
- 12 toxicity categories (toxic, threat, hate_speech, racism, antisemitism, islamophobia, sexism, homophobia, insult, dehumanization, extremism, ableism); see the illustrative sketch below
- Web interface routes for analysis dashboard and flagged content review
- Manual review API endpoint for human validation
- Analysis helper functions for database queries
- Dutch language support with coded political term recognition

Usage:

    docker exec mastodon-collector-collector-1 python -m app.analyzer

See TOXICITY_ANALYSIS.md for full documentation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-30 14:43:35 +02:00
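The sketch below is illustrative only, not the analyzer implementation: the category names and the default flag threshold of 0.5 come from this commit, while the flagging rule (flag when any category score reaches the threshold) and the helper function itself are assumptions.

CATEGORIES = [
    "toxic", "threat", "hate_speech", "racism", "antisemitism", "islamophobia",
    "sexism", "homophobia", "insult", "dehumanization", "extremism", "ableism",
]

def is_flagged(scores: dict[str, float], flag_threshold: float = 0.5) -> bool:
    """Return True when any category score reaches the flag threshold (assumed rule)."""
    return any(scores.get(name, 0.0) >= flag_threshold for name in CATEGORIES)

# Example: a status scored {"insult": 0.8} and low elsewhere is flagged at the default 0.5.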
"""Configuration for the toxicity analyzer."""

from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass
class AnalyzerConfig:
    """Configuration for the toxicity analyzer."""

    database_url: str
    llm_api_key: str
    model: str = "gpt-4o-mini"
    batch_size: int = 10
    concurrency: int = 5
    flag_threshold: float = 0.5
    limit: int = 0  # 0 = no limit
    log_level: str = "INFO"

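    # Note (assumption about how the analyzer uses these fields): with the
    # defaults above, up to concurrency * batch_size = 5 * 10 = 50 statuses
    # would be in flight at once during a run.
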
    # Pricing (as of 2025, per million tokens)
    input_cost_per_m: float = 0.150  # $0.150 per 1M input tokens
    output_cost_per_m: float = 0.600  # $0.600 per 1M output tokens
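    # Worked example (illustrative, with assumed token counts): analyzing
    # 10,000 statuses at roughly 400 input and 150 output tokens each costs
    #   input:  10_000 * 400 / 1_000_000 * 0.150 = $0.60
    #   output: 10_000 * 150 / 1_000_000 * 0.600 = $0.90
    # i.e. about $1.50 in total at the rates above.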

    @classmethod
    def from_env(cls) -> AnalyzerConfig:
        """Load configuration from environment variables."""
        return cls(
            database_url=os.environ["DATABASE_URL"],
            llm_api_key=os.environ["LLM_API_KEY"],
            model=os.getenv("ANALYZER_MODEL", "gpt-4o-mini"),
            batch_size=int(os.getenv("ANALYZER_BATCH_SIZE", "10")),
            concurrency=int(os.getenv("ANALYZER_CONCURRENCY", "5")),
            flag_threshold=float(os.getenv("ANALYZER_FLAG_THRESHOLD", "0.5")),
            limit=int(os.getenv("ANALYZER_LIMIT", "0")),
            log_level=os.getenv("ANALYZER_LOG_LEVEL", "INFO"),
        )
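A minimal usage sketch, not part of the module: DATABASE_URL and LLM_API_KEY are required (from_env raises KeyError if either is missing), and every other field falls back to the defaults above. The placeholder values are hypothetical.

import os

os.environ.setdefault("DATABASE_URL", "postgresql://collector@db/mastodon")  # placeholder
os.environ.setdefault("LLM_API_KEY", "sk-placeholder")                        # placeholder

config = AnalyzerConfig.from_env()
print(config.model)           # "gpt-4o-mini" unless ANALYZER_MODEL is set
print(config.flag_threshold)  # 0.5 unless ANALYZER_FLAG_THRESHOLD is set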