# Bluesky Toxicity Analysis - Main Findings

## Study Overview

**Period:** January 1 – March 30, 2026 (89 days)

**Monitored Accounts:** 159 Dutch political accounts

**Total Posts Collected:** 15,190 posts

---
## 1. Data Collection Summary

### Content Distribution

- **Primary Content (by tracked accounts):**
  - Original Posts: 3,032
  - Replies: 3,652
  - **Total Primary:** 6,684 posts
- **Secondary Content (mentions of tracked accounts):**
  - Unique Mention Posts: 8,506
  - Note: posts mentioning multiple tracked accounts are counted once

### Total Dataset

- **Combined Content:** 15,190 posts
- **Collection Method:** automated collection via the Bluesky Public API (every 4 hours)
- **Infrastructure:** Docker containers with a PostgreSQL database

---
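The deduplication rule noted above (a post mentioning several tracked accounts is counted once) can be sketched as follows. The record layout, with a post URI and the mentioned handle, is an assumption for illustration, not the study's actual schema.

```python
# Collapse (post_uri, mentioned_handle) rows to one entry per post,
# so a post mentioning several tracked accounts is counted once.
# The row format is a hypothetical stand-in for the real schema.

def dedupe_mentions(mention_rows):
    """Return the unique post URIs, preserving first-seen order."""
    seen = set()
    unique_posts = []
    for row in mention_rows:
        if row["uri"] not in seen:
            seen.add(row["uri"])
            unique_posts.append(row["uri"])
    return unique_posts

rows = [
    {"uri": "at://did:plc:abc/app.bsky.feed.post/1", "mentions": "@politician-a"},
    {"uri": "at://did:plc:abc/app.bsky.feed.post/1", "mentions": "@politician-b"},
    {"uri": "at://did:plc:xyz/app.bsky.feed.post/2", "mentions": "@politician-a"},
]
print(len(dedupe_mentions(rows)))  # the two-mention post counts once -> 2
```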
## 2. Toxicity Detection Results

### AI Model Performance

- **Model Used:** OpenAI GPT-4.1-nano
- **Classification Categories:** 12 toxicity dimensions
- **Flagging Threshold:** overall toxicity score ≥ 0.5 (50%)

### Flagged Content

- **Primary Content (Posts/Replies):** 97 posts flagged
- **Secondary Content (Mentions):** 413 unique posts flagged
- **Total Flagged:** 510 unique posts

### Distribution Insight

- 81% of flagged content came from mentions (external users → politicians)
- 19% of flagged content came from the politicians themselves
- External users directed substantially more toxic content toward politicians than politicians produced

---
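A minimal sketch of the flagging rule: a post is flagged when its overall toxicity score reaches the 0.5 threshold. The dimension names and the use of the maximum dimension score as the "overall" score are illustrative assumptions; the report states only that the model scored content across 12 dimensions.

```python
# Flag a post when its overall toxicity score meets the 0.5 threshold.
# ASSUMPTION: overall score = max over the per-dimension scores; the
# dimension names below are invented for illustration.

FLAG_THRESHOLD = 0.5

def is_flagged(scores: dict) -> bool:
    """Return True when the post's overall toxicity meets the threshold."""
    overall = max(scores.values())
    return overall >= FLAG_THRESHOLD

post_scores = {"insult": 0.62, "threat": 0.10, "profanity": 0.35}
print(is_flagged(post_scores))  # 0.62 >= 0.5 -> True
```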
## 3. Human Review Results

### Review Completion

- **Total Items Reviewed:** 510 posts (100% of flagged content)
- **Review Period:** January 1 – March 30, 2026
- **Review Interface:** custom web application with ✓/✗/? buttons

### Validation Results

#### Primary Content (Posts/Replies by Politicians)

| Status | Count | Percentage |
|--------|-------|------------|
| ✓ Correctly Flagged | 32 | 33.0% |
| ✗ Incorrectly Flagged | 65 | 67.0% |
| ? Unsure | 0 | 0.0% |
| **Total** | **97** | **100%** |

#### Secondary Content (Mentions of Politicians)

| Status | Count | Percentage |
|--------|-------|------------|
| ✓ Correctly Flagged | 174 | 42.1% |
| ✗ Incorrectly Flagged | 239 | 57.9% |
| ? Unsure | 0 | 0.0% |
| **Total** | **413** | **100%** |

#### Combined Results

| Status | Count | Percentage |
|--------|-------|------------|
| ✓ Correctly Flagged | 206 | 40.4% |
| ✗ Incorrectly Flagged | 304 | 59.6% |
| ? Unsure | 0 | 0.0% |
| **Total** | **510** | **100%** |

---
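The tables above reduce to a precision calculation: of the posts the model flagged, what fraction did the human reviewer confirm as toxic? The counts are taken directly from the validation tables.

```python
# Precision of the model's flags against the human review verdicts,
# using the counts reported in the validation tables above.

def precision(confirmed: int, rejected: int) -> float:
    """Fraction of flagged items the human review upheld."""
    return confirmed / (confirmed + rejected)

primary = precision(32, 65)     # posts/replies by politicians
mentions = precision(174, 239)  # mentions of politicians
combined = precision(206, 304)  # all flagged content
print(f"{primary:.1%} {mentions:.1%} {combined:.1%}")  # 33.0% 42.1% 40.4%
```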
## 4. Key Findings

### 4.1 High False Positive Rate

- **Overall False Positive Rate: 59.6%**
- The AI model over-flagged content: nearly 6 out of 10 flagged items were false positives
- Primary content performed worse (67.0% false positives) than mentions (57.9%)

### 4.2 Model Limitations Identified

1. **Threshold Sensitivity:** the 0.5 threshold appears too low for Dutch political discourse
2. **Context Misinterpretation:** strong policy language, political criticism, and satire were frequently misclassified as toxic
3. **Cultural/Linguistic Gaps:** Dutch political communication patterns may not align with the model's training data
4. **Nuance Detection:** difficulty distinguishing heated but legitimate debate from actual toxicity

### 4.3 Directional Toxicity Pattern

- External mentions (8,506 posts) generated **413 flagged items** (4.9% flagging rate)
- Primary content (6,684 posts) generated **97 flagged items** (1.5% flagging rate)
- By flagging rate, politicians receive approximately **3× more toxic content** than they produce
- However, after human review, both sources showed high false positive rates

### 4.4 Accuracy Comparison

- **Mentions accuracy:** 42.1% (slightly better)
- **Primary content accuracy:** 33.0% (worse)
- Neither content type achieved acceptable accuracy for automated moderation
- Possible explanation: politicians' language more frequently uses strong policy terms that trigger false positives

---
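The directional comparison in 4.3 is a simple rate calculation: flagged items divided by posts collected, for each direction of communication, using the report's own counts.

```python
# Flagging rates for the two directions of communication, from the
# counts reported in section 4.3.

mention_rate = 413 / 8506   # toxic content aimed at politicians
primary_rate = 97 / 6684    # toxic content from politicians

print(f"{mention_rate:.1%}")                  # 4.9%
print(f"{primary_rate:.1%}")                  # 1.5%
print(f"{mention_rate / primary_rate:.1f}x")  # about 3.3x
```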
## 5. Implications for Automated Moderation

### What This Study Reveals

1. **AI Cannot Replace Human Judgment:** a 59.6% false positive rate makes unsupervised automation dangerous
2. **Threshold Optimization Needed:** the current 0.5 threshold is too aggressive; political content may need 0.7 or higher
3. **Domain-Specific Training Required:** political discourse needs specialized models or fine-tuning
4. **Human-in-the-Loop Essential:** automated flagging is useful for triage, but human review remains mandatory

### Recommended Approach

- Use AI toxicity detection as **first-pass screening only**
- Require human review of all flagged content before action
- Consider higher thresholds (0.7–0.8) for political accounts
- Train domain-specific models on Dutch political discourse
- Implement an appeals process for false positives

---
## 6. Technical Implementation Success

### What Worked Well

1. **Automated Collection:** 4-hour collection cycles captured a comprehensive dataset
2. **Human Review Interface:** the web UI with ✓/✗/? buttons made manual validation efficient
3. **Date Filtering:** allowed focused analysis of specific time periods
4. **Engagement Metrics:** successfully captured likes, replies, reposts, and quotes for mentions
5. **Deduplication Logic:** properly handled posts mentioning multiple tracked accounts

### Infrastructure Performance

- **Uptime:** 99%+ (only a brief scheduler issue on February 23–24)
- **Data Integrity:** the PostgreSQL database handled 15K+ posts without issues
- **Analysis Throughput:** GPT-4.1-nano processed all content efficiently
- **Web Interface:** responsive UI for 500+ manual reviews

---
## 7. Study Limitations

1. **Single Model Used:** only GPT-4.1-nano was tested; ensemble approaches were not evaluated
2. **No Inter-Rater Reliability:** a single human reviewer; no validation of review consistency
3. **Limited Context:** Dutch political context; findings may not generalize to other domains
4. **Arbitrary Threshold:** the 0.5 threshold was not scientifically optimized
5. **Limited Time Period:** a 3-month window may not capture seasonal variations in discourse
6. **No Appeal Process:** no mechanism for accounts to contest flagging decisions

---
## 8. Recommendations for Future Work

### Short-Term Improvements

1. **Threshold Optimization:** test 0.6, 0.7, and 0.8 thresholds and measure precision/recall
2. **Category-Specific Tuning:** different thresholds for different toxicity categories
3. **Context Windows:** analyze conversation threads, not isolated posts
4. **Multi-Model Validation:** test other models (Perspective API, custom fine-tuned models)
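The threshold sweep proposed above can be sketched as follows, on hypothetical data: each item pairs a model score with the human reviewer's verdict. The scores below are invented for illustration; the study's per-item scores are not reproduced in this report.

```python
# Sweep candidate thresholds and measure precision/recall of the rule
# "flag when score >= threshold" against human review labels.
# The (score, human_says_toxic) pairs are hypothetical.

def precision_recall(items, threshold):
    """Precision and recall of threshold-based flagging vs. human labels."""
    tp = sum(1 for score, toxic in items if score >= threshold and toxic)
    fp = sum(1 for score, toxic in items if score >= threshold and not toxic)
    fn = sum(1 for score, toxic in items if score < threshold and toxic)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

items = [(0.55, False), (0.60, True), (0.72, False), (0.81, True), (0.90, True)]
for t in (0.5, 0.6, 0.7, 0.8):
    p, r = precision_recall(items, t)
    print(f"threshold {t}: precision={p:.2f} recall={r:.2f}")
```

On this toy data, raising the threshold from 0.5 to 0.8 trades recall for precision, which is the measurement the recommendation calls for on the real dataset.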
### Long-Term Research

1. **Dutch Political Corpus:** create a labeled training dataset for Dutch political discourse
2. **Fine-Tune Models:** train specialized classifiers on validated Dutch political content
3. **Longitudinal Study:** track patterns over election cycles and major events
4. **Cross-Platform Analysis:** compare Bluesky toxicity patterns with Twitter/X and Mastodon
5. **Inter-Rater Reliability Study:** multiple reviewers to validate the consistency of human judgments

---
## 9. Data Access

### Database Content (as of March 30, 2026)

- **Accounts Table:** 159 tracked political accounts
- **Posts Table:** 6,684 posts and replies
- **Mentions Table:** 8,506 unique mention posts
- **Toxicity Scores:** 6,684 scored primary posts
- **Mention Toxicity Scores:** 8,506 scored mentions
- **Human Reviews:** 510 manual validations

### Exported Datasets Available

- Full post content with toxicity scores
- Human review decisions with timestamps
- Engagement metrics (likes, replies, reposts, quotes)
- Time-series data for trend analysis

---
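An illustrative query against tables like those listed above, run here on an in-memory SQLite stand-in for the project's PostgreSQL database. The table and column names are assumptions for the sketch; the real schema is not reproduced in this report.

```python
# Join flagged toxicity scores with human review verdicts, on an
# in-memory SQLite stand-in. Schema and sample rows are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE toxicity_scores (post_uri TEXT PRIMARY KEY, overall REAL);
CREATE TABLE human_reviews (post_uri TEXT, verdict TEXT);
INSERT INTO toxicity_scores VALUES
    ('at://p/1', 0.62), ('at://p/2', 0.55), ('at://p/3', 0.30);
INSERT INTO human_reviews VALUES
    ('at://p/1', 'confirmed'), ('at://p/2', 'rejected');
""")

# Flagged posts (score >= 0.5) with the human review decision attached
rows = con.execute("""
    SELECT s.post_uri, s.overall, r.verdict
    FROM toxicity_scores s
    JOIN human_reviews r ON r.post_uri = s.post_uri
    WHERE s.overall >= 0.5
    ORDER BY s.overall DESC
""").fetchall()
print(rows)  # [('at://p/1', 0.62, 'confirmed'), ('at://p/2', 0.55, 'rejected')]
```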
## 10. Conclusion

This study demonstrates that while AI-powered toxicity detection can **identify potential concerns** in large-scale social media content, it **cannot reliably moderate** without substantial human oversight. The 59.6% false positive rate indicates that current models are not suitable for automated enforcement in political discourse contexts.

**Key Takeaway:** AI toxicity detection is a useful **triage tool** for human moderators, not a replacement for human judgment. Political discourse requires a nuanced understanding of context, satire, and legitimate critique that current AI models cannot consistently provide.

**Project Status:** Data collection complete. The web interface remains available for analysis and reporting. The database is preserved for future research.

---

**Generated:** March 30, 2026

**Study Period:** January 1 – March 30, 2026

**Monitored Platform:** Bluesky Social Network

**Geographic Focus:** Dutch Political Discourse