# Bluesky Toxicity Analysis - Main Findings

## Study Overview

**Period:** January 1 – March 30, 2026 (89 days)
**Monitored Accounts:** 159 Dutch political accounts
**Total Posts Collected:** 15,190 posts

---

## 1. Data Collection Summary

### Content Distribution

- **Primary Content (by tracked accounts):**
  - Original Posts: 3,032
  - Replies: 3,652
  - **Total Primary:** 6,684 posts
- **Secondary Content (mentions of tracked accounts):**
  - Unique Mention Posts: 8,506
  - Note: Posts mentioning multiple tracked accounts counted once

### Total Dataset

- **Combined Content:** 15,190 posts
- **Collection Method:** Automated via Bluesky Public API (every 4 hours)
- **Infrastructure:** Docker containers with PostgreSQL database

---

## 2. Toxicity Detection Results

### AI Model Performance

- **Model Used:** OpenAI GPT-4.1-nano
- **Classification Categories:** 12 toxicity dimensions
- **Flagging Threshold:** Overall toxicity score ≥ 0.5 (50%)

### Flagged Content

- **Primary Content (Posts/Replies):** 97 posts flagged
- **Secondary Content (Mentions):** 413 unique posts flagged
- **Total Flagged:** 510 unique posts

### Distribution Insight

- 81% of flagged content came from mentions (external users → politicians)
- 19% of flagged content came from politicians themselves
- External users directed significantly more toxic content toward politicians than politicians produced

---

## 3. Human Review Results

### Review Completion

- **Total Items Reviewed:** 510 posts (100% of flagged content)
- **Review Period:** January 1 – March 30, 2026
- **Review Interface:** Custom web application with ✓/✗/? buttons

### Validation Results

#### Primary Content (Posts/Replies by Politicians)

| Status | Count | Percentage |
|--------|-------|------------|
| ✓ Correctly Flagged | 32 | 33.0% |
| ✗ Incorrectly Flagged | 65 | 67.0% |
| ? Unsure | 0 | 0.0% |
| **Total** | **97** | **100%** |

#### Secondary Content (Mentions of Politicians)

| Status | Count | Percentage |
|--------|-------|------------|
| ✓ Correctly Flagged | 174 | 42.1% |
| ✗ Incorrectly Flagged | 239 | 57.9% |
| ? Unsure | 0 | 0.0% |
| **Total** | **413** | **100%** |

#### Combined Results

| Status | Count | Percentage |
|--------|-------|------------|
| ✓ Correctly Flagged | 206 | 40.4% |
| ✗ Incorrectly Flagged | 304 | 59.6% |
| ? Unsure | 0 | 0.0% |
| **Total** | **510** | **100%** |

---

## 4. Key Findings

### 4.1 High False Positive Rate

- **Overall False Positive Rate: 59.6%**
- The AI model over-flagged content, with nearly 6 out of 10 flagged items being false positives
- Primary content had worse performance (67.0% false positives) than mentions (57.9%)

### 4.2 Model Limitations Identified

1. **Threshold Sensitivity:** The 0.5 threshold appears too low for Dutch political discourse
2. **Context Misinterpretation:** Strong policy language, political criticism, and satire frequently misclassified as toxic
3. **Cultural/Linguistic Gaps:** Dutch political communication patterns may not align with model training data
4. **Nuance Detection:** Difficulty distinguishing between heated but legitimate debate and actual toxicity

### 4.3 Directional Toxicity Pattern

- External mentions (8,506 posts) generated **413 flagged items** (4.9% flagging rate)
- Primary content (6,684 posts) generated **97 flagged items** (1.5% flagging rate)
- Politicians receive approximately **3× more toxic content** than they produce (by flagging rate)
- However, after human review, both sources showed high false positive rates

### 4.4 Accuracy Comparison

- **Mentions accuracy:** 42.1% (slightly better)
- **Primary content accuracy:** 33.0% (worse)
- Neither content type achieved acceptable accuracy for automated moderation
- Possible explanation: Politicians' language more frequently uses strong policy terms that trigger false positives

---

## 5. Implications for Automated Moderation

### What This Study Reveals

1. **AI Cannot Replace Human Judgment:** 59.6% false positive rate makes unsupervised automation dangerous
2. **Threshold Optimization Needed:** Current 0.5 threshold too aggressive; may need 0.7+ for political content
3. **Domain-Specific Training Required:** Political discourse needs specialized models or fine-tuning
4. **Human-in-the-Loop Essential:** Automated flagging useful for triage, but human review mandatory

### Recommended Approach

- Use AI toxicity detection as **first-pass screening only**
- Require human review for all flagged content before action
- Consider higher thresholds (0.7–0.8) for political accounts
- Train domain-specific models on Dutch political discourse
- Implement appeals process for false positives

---

## 6. Technical Implementation Success

### What Worked Well

1. **Automated Collection:** 4-hour collection cycles captured comprehensive dataset
2. **Human Review Interface:** Web UI with ✓/✗/? buttons efficient for manual validation
3. **Date Filtering:** Allowed focused analysis of specific time periods
4. **Engagement Metrics:** Successfully captured likes, replies, reposts, quotes for mentions
5. **Deduplication Logic:** Properly handled posts mentioning multiple tracked accounts

### Infrastructure Performance

- **Uptime:** 99%+ (only brief scheduler issue February 23–24)
- **Data Integrity:** PostgreSQL database handled 15K+ posts without issues
- **Analysis Throughput:** GPT-4.1-nano processed all content efficiently
- **Web Interface:** Responsive UI for 500+ manual reviews

---

## 7. Study Limitations

1. **Single Model Used:** Only tested GPT-4.1-nano; ensemble approaches not evaluated
2. **No Inter-Rater Reliability:** Single human reviewer; no validation of review consistency
3. **Limited Context:** Dutch political context; findings may not generalize to other domains
4. **Arbitrary Threshold:** 0.5 threshold not scientifically optimized
5. **Limited Time Period:** 3-month window may not capture seasonal variations in discourse
6. **No Appeal Process:** No mechanism for accounts to contest flagging decisions

---

## 8. Recommendations for Future Work

### Short-Term Improvements

1. **Threshold Optimization:** Test 0.6, 0.7, 0.8 thresholds and measure precision/recall
2. **Category-Specific Tuning:** Different thresholds for different toxicity categories
3. **Context Windows:** Analyze conversation threads, not isolated posts
4. **Multi-Model Validation:** Test other models (Perspective API, custom fine-tuned models)

### Long-Term Research

1. **Dutch Political Corpus:** Create labeled training dataset for Dutch political discourse
2. **Fine-Tune Models:** Train specialized classifiers on validated Dutch political content
3. **Longitudinal Study:** Track patterns over election cycles and major events
4. **Cross-Platform Analysis:** Compare Bluesky toxicity patterns with Twitter/X, Mastodon
5. **Inter-Rater Reliability Study:** Multiple reviewers to validate human judgment consistency

---

## 9. Data Access

### Database Content (as of March 30, 2026)

- **Accounts Table:** 159 tracked political accounts
- **Posts Table:** 6,684 posts and replies
- **Mentions Table:** 8,506 unique mention posts
- **Toxicity Scores:** 6,684 scored primary posts
- **Mention Toxicity Scores:** 8,506 scored mentions
- **Human Reviews:** 510 manual validations

### Exported Datasets Available

- Full post content with toxicity scores
- Human review decisions with timestamps
- Engagement metrics (likes, replies, reposts, quotes)
- Time-series data for trend analysis

---

## 10. Conclusion

This study demonstrates that while AI-powered toxicity detection can **identify potential concerns** in large-scale social media content, it **cannot reliably moderate** without substantial human oversight. The 59.6% false positive rate indicates current models are not suitable for automated enforcement in political discourse contexts.
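The headline accuracy figures quoted in this report follow directly from the Section 3 review counts. As a minimal sketch of the arithmetic (all counts taken from this report; the helper name is illustrative, not part of the study's codebase):

```python
# Reproduce the report's precision / false-positive figures from the
# Section 3 human-review counts (correctly vs. incorrectly flagged).

REVIEW_COUNTS = {
    # content type: (correctly flagged, incorrectly flagged)
    "primary":  (32, 65),    # posts/replies by politicians
    "mentions": (174, 239),  # posts mentioning politicians
}

def precision(correct: int, incorrect: int) -> float:
    """Share of flagged items that human review confirmed as toxic."""
    return correct / (correct + incorrect)

for name, (ok, fp) in REVIEW_COUNTS.items():
    p = precision(ok, fp)
    print(f"{name}: precision {p:.1%}, false positives {1 - p:.1%}")

total_ok = sum(ok for ok, _ in REVIEW_COUNTS.values())   # 206
total_fp = sum(fp for _, fp in REVIEW_COUNTS.values())   # 304
print(f"combined: precision {precision(total_ok, total_fp):.1%}")  # 40.4%
```

Running this yields the 33.0% / 42.1% / 40.4% precision figures, and the combined 59.6% false positive rate, reported above.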
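Section 8 recommends testing higher thresholds (0.6, 0.7, 0.8) against precision and recall. One way such a sweep could be scored against the human review labels is sketched below; the `(score, is_toxic)` pairs are invented for illustration, since the study's raw per-post scores are not reproduced in this report:

```python
# Sketch of a threshold sweep: for each candidate threshold, flag items
# whose model toxicity score meets it, then measure precision and recall
# against human labels. The data below is made up for illustration.

SCORED = [  # (model toxicity score, human judged toxic?)
    (0.92, True), (0.81, True), (0.74, False), (0.66, True),
    (0.58, False), (0.55, False), (0.52, False), (0.45, True),
]

def sweep(items, thresholds):
    """Return {threshold: (precision, recall)} over labeled items."""
    truly_toxic = sum(1 for _, toxic in items if toxic)
    results = {}
    for t in thresholds:
        flagged = [toxic for score, toxic in items if score >= t]
        tp = sum(flagged)  # flagged items the human reviewer agreed with
        precision = tp / len(flagged) if flagged else 0.0
        recall = tp / truly_toxic if truly_toxic else 0.0
        results[t] = (precision, recall)
    return results

for t, (p, r) in sweep(SCORED, [0.5, 0.6, 0.7, 0.8]).items():
    print(f"threshold {t:.1f}: precision {p:.0%}, recall {r:.0%}")
```

On this toy data, raising the threshold trades recall for precision, which is the trade-off the study's 0.5-threshold results suggest needs re-balancing for political content.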
**Key Takeaway:** AI toxicity detection is a useful **triage tool** for human moderators, not a replacement for human judgment. Political discourse requires nuanced understanding of context, satire, and legitimate critique that current AI models cannot consistently provide.

**Project Status:** Data collection complete. Web interface remains available for analysis and reporting. Database preserved for future research.

---

**Generated:** March 30, 2026
**Study Period:** January 1 – March 30, 2026
**Monitored Platform:** Bluesky Social Network
**Geographic Focus:** Dutch Political Discourse