Update README with toxicity analysis features and add MIT license

- Document toxicity analysis capabilities and features
- Add configuration for OPENAI_API_KEY
- Include instructions for running analysis
- Add cost estimation and database schema info
- Link to ANALYSIS_REPORT.md for research findings
- Add MIT License

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-03-31 17:55:03 +02:00
commit 870a0710b5
parent 2faf6c660b
2 changed files with 74 additions and 1 deletions
--- a/21
+++ b/21
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Pieter Steenman
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # Mastodon Collector

-Collects posts, replies, and mentions from a list of Mastodon accounts and stores them in PostgreSQL. Includes a web UI for account management and data browsing, plus JSON/CSV APIs for your analysis pipeline.
+Collects posts, replies, and mentions from a list of Mastodon accounts and stores them in PostgreSQL. Includes automated toxicity analysis using OpenAI GPT-4o-mini; a web UI for account management, data browsing, and manual review of flagged content; and JSON/CSV APIs for your analysis pipeline.

 ## Quick Start

@@ -38,8 +38,47 @@ Edit `.env` to customize:
 POSTGRES_PASSWORD=collector_secret      # Change for production
 FLASK_SECRET_KEY=change-me-in-production
 POLL_INTERVAL_SECONDS=14400             # Default: 4 hours (14400s)
+OPENAI_API_KEY=sk-...                   # Required for toxicity analysis
 ```

+## Toxicity Analysis
+
+The system includes automated toxicity detection and manual review capabilities:
+
+### Features
+
+- **Automated Classification**: Uses OpenAI GPT-4o-mini to analyze posts across 12 toxicity dimensions:
+  - General toxicity, threats, hate speech
+  - Racism, antisemitism, islamophobia
+  - Sexism, homophobia, ableism
+  - Insults, dehumanization, extremism
+- **Flagging System**: Posts with overall toxicity ≥ 0.5 are automatically flagged for review
+- **Manual Review Interface**: Web dashboard at `/analysis/flagged` for human validation
+- **Analysis Dashboard**: Statistics, trends, and category breakdowns at `/analysis`
+
+### Running Analysis
+
+```bash
+# Analyze all unscored statuses (run inside collector container)
+docker exec mastodon-collector-collector-1 bash -c "python -m app.analyzer"
+
+# Limit to first 100 statuses for testing
+docker exec mastodon-collector-collector-1 bash -c "ANALYZER_LIMIT=100 python -m app.analyzer"
+```
+
+### Analysis Database Schema
+
+Additional tables for toxicity analysis:
+
+- `toxicity_scores` — toxicity scores per status (12 categories + overall)
+- `analysis_runs` — audit trail of analysis runs with costs and duration
+
+### Cost Estimation
+
+- Batch processing: ~10 posts per API call
+- Estimated cost: ~$0.12 per 1,000 posts analyzed
+- Example: 16,906 posts ≈ $1.95
+
 ## API Endpoints

 For plugging into your analysis pipeline:
@@ -89,3 +128,16 @@ docker compose up -d
 docker compose down          # Stop services, keep data
 docker compose down -v       # Stop services AND delete database
 ```
+
+## Research & Reporting
+
+See [ANALYSIS_REPORT.md](ANALYSIS_REPORT.md) for a complete methodology report including:
+- Data collection statistics
+- Toxicity analysis methodology
+- Manual review results and findings
+- False positive analysis
+- Limitations and considerations
+
+## License
+
+MIT License. See the [LICENSE](LICENSE) file for details.