# Bluesky Toxicity Analysis - Main Findings
## Study Overview

- Period: January 1 – March 30, 2026 (89 days)
- Monitored Accounts: 159 Dutch political accounts
- Total Posts Collected: 15,190 posts
## 1. Data Collection Summary

### Content Distribution

- Primary Content (by tracked accounts):
  - Original Posts: 3,032
  - Replies: 3,652
  - Total Primary: 6,684 posts
- Secondary Content (mentions of tracked accounts):
  - Unique Mention Posts: 8,506
  - Note: posts mentioning multiple tracked accounts are counted once
### Total Dataset

- Combined Content: 15,190 posts
- Collection Method: automated via the Bluesky public API (every 4 hours)
- Infrastructure: Docker containers with a PostgreSQL database
## 2. Toxicity Detection Results

### AI Model Performance

- Model Used: OpenAI GPT-4.1-nano
- Classification Categories: 12 toxicity dimensions
- Flagging Threshold: overall toxicity score ≥ 0.5 (50%)
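A minimal sketch of the flagging decision. The report does not specify how the 12 dimension scores combine into the overall score; taking the maximum is one common conservative choice and is an assumption here, as are the dimension names.

```python
FLAG_THRESHOLD = 0.5  # posts at or above this overall score are queued for human review

def should_flag(dimension_scores: dict[str, float],
                threshold: float = FLAG_THRESHOLD) -> bool:
    """Flag a post when its overall toxicity score reaches the threshold.

    Assumption: the overall score is the maximum across the per-dimension
    scores; the study may aggregate differently (e.g. a model-supplied
    overall score).
    """
    overall = max(dimension_scores.values())
    return overall >= threshold

scores = {"insult": 0.62, "threat": 0.10, "profanity": 0.33}  # illustrative values
assert should_flag(scores)           # 0.62 >= 0.5, so the post is flagged
assert not should_flag(scores, 0.7)  # a higher threshold clears the same post
```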
### Flagged Content

- Primary Content (posts/replies): 97 posts flagged
- Secondary Content (mentions): 413 unique posts flagged
- Total Flagged: 510 unique posts
### Distribution Insight

- 81% of flagged content came from mentions (external users → politicians)
- 19% of flagged content came from the politicians themselves
- External users directed significantly more toxic content toward politicians than politicians produced
## 3. Human Review Results

### Review Completion

- Total Items Reviewed: 510 posts (100% of flagged content)
- Review Period: January 1 – March 30, 2026
- Review Interface: custom web application with ✓/✗/? buttons
### Validation Results

#### Primary Content (Posts/Replies by Politicians)
| Status | Count | Percentage |
|---|---|---|
| ✓ Correctly Flagged | 32 | 33.0% |
| ✗ Incorrectly Flagged | 65 | 67.0% |
| ? Unsure | 0 | 0.0% |
| Total | 97 | 100% |
#### Secondary Content (Mentions of Politicians)
| Status | Count | Percentage |
|---|---|---|
| ✓ Correctly Flagged | 174 | 42.1% |
| ✗ Incorrectly Flagged | 239 | 57.9% |
| ? Unsure | 0 | 0.0% |
| Total | 413 | 100% |
#### Combined Results
| Status | Count | Percentage |
|---|---|---|
| ✓ Correctly Flagged | 206 | 40.4% |
| ✗ Incorrectly Flagged | 304 | 59.6% |
| ? Unsure | 0 | 0.0% |
| Total | 510 | 100% |
## 4. Key Findings

### 4.1 High False Positive Rate

- Overall false positive rate: 59.6%
- The AI model over-flagged content: nearly 6 out of 10 flagged items were false positives
- Primary content performed worse (67.0% false positives) than mentions (57.9%)
### 4.2 Model Limitations Identified
- Threshold Sensitivity: The 0.5 threshold appears too low for Dutch political discourse
- Context Misinterpretation: Strong policy language, political criticism, and satire frequently misclassified as toxic
- Cultural/Linguistic Gaps: Dutch political communication patterns may not align with model training data
- Nuance Detection: Difficulty distinguishing between heated but legitimate debate and actual toxicity
### 4.3 Directional Toxicity Pattern
- External mentions (8,506 posts) generated 413 flagged items (4.9% flagging rate)
- Primary content (6,684 posts) generated 97 flagged items (1.5% flagging rate)
- Politicians receive approximately 3× more toxic content than they produce (by flagging rate)
- However, after human review, both sources showed high false positive rates
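The rates above follow directly from the counts in Sections 1 and 2:

```python
mention_rate = 413 / 8_506   # flagged mentions per mention post collected
primary_rate = 97 / 6_684    # flagged posts per politician post collected

print(f"mentions: {mention_rate:.1%}, primary: {primary_rate:.1%}, "
      f"ratio: {mention_rate / primary_rate:.1f}x")
# mentions: 4.9%, primary: 1.5%, ratio: 3.3x
```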
### 4.4 Accuracy Comparison
- Mentions accuracy: 42.1%
- Primary content accuracy: 33.0%
- Neither content type achieved acceptable accuracy for automated moderation
- Possible explanation: Politicians' language more frequently uses strong policy terms that trigger false positives
## 5. Implications for Automated Moderation

### What This Study Reveals
- AI Cannot Replace Human Judgment: 59.6% false positive rate makes unsupervised automation dangerous
- Threshold Optimization Needed: Current 0.5 threshold too aggressive; may need 0.7+ for political content
- Domain-Specific Training Required: Political discourse needs specialized models or fine-tuning
- Human-in-the-Loop Essential: Automated flagging useful for triage, but human review mandatory
### Recommended Approach
- Use AI toxicity detection as first-pass screening only
- Require human review for all flagged content before action
- Consider higher thresholds (0.7–0.8) for political accounts
- Train domain-specific models on Dutch political discourse
- Implement appeals process for false positives
## 6. Technical Implementation Success

### What Worked Well
- Automated Collection: 4-hour collection cycles captured comprehensive dataset
- Human Review Interface: Web UI with ✓/✗/? buttons efficient for manual validation
- Date Filtering: Allowed focused analysis of specific time periods
- Engagement Metrics: Successfully captured likes, replies, reposts, quotes for mentions
- Deduplication Logic: Properly handled posts mentioning multiple tracked accounts
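The deduplication rule ("posts mentioning multiple tracked accounts counted once") amounts to keeping one row per post URI. A sketch, with illustrative field names (the actual schema is not shown in this report):

```python
def dedupe_mentions(mention_hits: list[dict]) -> list[dict]:
    """Collapse per-account mention hits so each post counts once.

    A post tagging several tracked politicians appears once per politician
    in the raw search results; the study's mention count (8,506) keeps one
    row per unique post URI. Field names here are assumptions.
    """
    seen: set[str] = set()
    unique = []
    for hit in mention_hits:
        if hit["uri"] not in seen:
            seen.add(hit["uri"])
            unique.append(hit)
    return unique

hits = [
    {"uri": "at://did:plc:a/app.bsky.feed.post/1", "mentions": "alice.example"},
    {"uri": "at://did:plc:a/app.bsky.feed.post/1", "mentions": "bob.example"},
    {"uri": "at://did:plc:b/app.bsky.feed.post/2", "mentions": "alice.example"},
]
assert len(dedupe_mentions(hits)) == 2  # the double-mention post counts once
```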
### Infrastructure Performance

- Uptime: 99%+ (only a brief scheduler issue on February 23–24)
- Data Integrity: PostgreSQL database handled 15K+ posts without issues
- Analysis Throughput: GPT-4.1-nano processed all content efficiently
- Web Interface: Responsive UI for 500+ manual reviews
## 7. Study Limitations
- Single Model Used: Only tested GPT-4.1-nano; ensemble approaches not evaluated
- No Inter-Rater Reliability: Single human reviewer; no validation of review consistency
- Limited Context: Dutch political context; findings may not generalize to other domains
- Arbitrary Threshold: 0.5 threshold not scientifically optimized
- Limited Time Period: 3-month window may not capture seasonal variations in discourse
- No Appeal Process: No mechanism for accounts to contest flagging decisions
## 8. Recommendations for Future Work

### Short-Term Improvements
- Threshold Optimization: Test 0.6, 0.7, 0.8 thresholds and measure precision/recall
- Category-Specific Tuning: Different thresholds for different toxicity categories
- Context Windows: Analyze conversation threads, not isolated posts
- Multi-Model Validation: Test other models (Perspective API, custom fine-tuned models)
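The threshold-optimization step could be sketched as below, given a sample of (model score, human label) pairs. Note a caveat: in this study only items scoring ≥ 0.5 were human-reviewed, so recall at lower thresholds cannot be measured from the study data alone; the sweep assumes a fully labeled sample. The example data is synthetic.

```python
def sweep_thresholds(items: list[tuple[float, bool]],
                     thresholds=(0.5, 0.6, 0.7, 0.8)) -> dict:
    """Compute (precision, recall) per threshold from (score, is_toxic) pairs."""
    results = {}
    for t in thresholds:
        tp = sum(1 for score, toxic in items if score >= t and toxic)
        fp = sum(1 for score, toxic in items if score >= t and not toxic)
        fn = sum(1 for score, toxic in items if score < t and toxic)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results[t] = (precision, recall)
    return results

# Synthetic scored-and-labeled items, not study data:
items = [(0.55, False), (0.65, True), (0.72, False), (0.9, True), (0.3, False)]
print(sweep_thresholds(items))
```

Raising the threshold trades recall for precision; the sweep makes that trade-off measurable instead of guessed.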
### Long-Term Research
- Dutch Political Corpus: Create labeled training dataset for Dutch political discourse
- Fine-Tune Models: Train specialized classifiers on validated Dutch political content
- Longitudinal Study: Track patterns over election cycles and major events
- Cross-Platform Analysis: Compare Bluesky toxicity patterns with Twitter/X, Mastodon
- Inter-Rater Reliability Study: Multiple reviewers to validate human judgment consistency
## 9. Data Access

### Database Content (as of March 30, 2026)
- Accounts Table: 159 tracked political accounts
- Posts Table: 6,684 posts and replies
- Mentions Table: 8,506 unique mention posts
- Toxicity Scores: 6,684 scored primary posts
- Mention Toxicity Scores: 8,506 scored mentions
- Human Reviews: 510 manual validations
### Exported Datasets Available
- Full post content with toxicity scores
- Human review decisions with timestamps
- Engagement metrics (likes, replies, reposts, quotes)
- Time-series data for trend analysis
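An export of this shape joins flagged posts with their review decisions. A minimal sketch using in-memory rows; the real export reads PostgreSQL, and every table and field name here is illustrative, not the study's schema.

```python
import csv
import io

def export_reviewed(flagged: list[dict], reviews: list[dict]) -> str:
    """Join flagged posts with human review decisions into CSV text.

    `flagged` and `reviews` stand in for the posts/mentions tables and the
    human-review table; field names are assumptions for illustration.
    """
    decisions = {r["post_uri"]: r for r in reviews}
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["uri", "toxicity", "decision", "reviewed_at"])
    writer.writeheader()
    for post in flagged:
        review = decisions.get(post["uri"], {})  # unreviewed posts get empty fields
        writer.writerow({
            "uri": post["uri"],
            "toxicity": post["toxicity"],
            "decision": review.get("decision", ""),
            "reviewed_at": review.get("reviewed_at", ""),
        })
    return out.getvalue()
```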
## 10. Conclusion
This study demonstrates that while AI-powered toxicity detection can identify potential concerns in large-scale social media content, it cannot reliably moderate without substantial human oversight. The 59.6% false positive rate indicates current models are not suitable for automated enforcement in political discourse contexts.
Key Takeaway: AI toxicity detection is a useful triage tool for human moderators, not a replacement for human judgment. Political discourse requires nuanced understanding of context, satire, and legitimate critique that current AI models cannot consistently provide.
Project Status: Data collection complete. Web interface remains available for analysis and reporting. Database preserved for future research.
- Generated: March 30, 2026
- Study Period: January 1 – March 30, 2026
- Monitored Platform: Bluesky Social Network
- Geographic Focus: Dutch Political Discourse