bluesky-collector/FINDINGS.md
Pieter 1c3f57d7e5 Add documentation and license, remove IDE files
Added comprehensive project documentation and MIT license. Removed Claude
IDE configuration files from repository tracking.

2026-03-30 14:39:11 +02:00


# Bluesky Toxicity Analysis - Main Findings
## Study Overview
**Period:** January 1 – March 30, 2026 (89 days)
**Monitored Accounts:** 159 Dutch political accounts
**Total Posts Collected:** 15,190 posts
---
## 1. Data Collection Summary
### Content Distribution
- **Primary Content (by tracked accounts):**
  - Original Posts: 3,032
  - Replies: 3,652
  - **Total Primary:** 6,684 posts
- **Secondary Content (mentions of tracked accounts):**
  - Unique Mention Posts: 8,506
  - Note: posts mentioning multiple tracked accounts are counted once
### Total Dataset
- **Combined Content:** 15,190 posts
- **Collection Method:** Automated via Bluesky Public API (every 4 hours)
- **Infrastructure:** Docker containers with PostgreSQL database
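The collector's source is not reproduced in this report; as a rough illustration, paging through one tracked account's posts and replies via Bluesky's public XRPC endpoint (`app.bsky.feed.getAuthorFeed`) could look like the sketch below. The `collect_feed` helper and its injectable `fetch_page` parameter are illustrative, not the project's actual code.

```python
import json
import urllib.parse
import urllib.request

PUBLIC_API = "https://public.api.bsky.app/xrpc"

def fetch_author_feed(actor, cursor=None, limit=50):
    """Fetch one page of an account's posts/replies from the public API."""
    params = {"actor": actor, "limit": str(limit)}
    if cursor:
        params["cursor"] = cursor
    url = f"{PUBLIC_API}/app.bsky.feed.getAuthorFeed?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)

def collect_feed(actor, fetch_page=fetch_author_feed):
    """Page through the feed until the API stops returning a cursor."""
    posts, cursor = [], None
    while True:
        page = fetch_page(actor, cursor=cursor)
        posts.extend(item["post"] for item in page.get("feed", []))
        cursor = page.get("cursor")
        if cursor is None:
            return posts
```

Running this on a 4-hour schedule (as the study did) only requires passing a `cursor` checkpoint between runs instead of always paging from the top.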
---
## 2. Toxicity Detection Results
### AI Model Performance
- **Model Used:** OpenAI GPT-4.1-nano
- **Classification Categories:** 12 toxicity dimensions
- **Flagging Threshold:** Overall toxicity score ≥ 0.5 (50%)
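The report does not show the flagging code. A minimal sketch of the threshold rule, assuming the classifier returns an explicit `"overall"` score alongside the 12 category scores (the actual aggregation rule is not documented here):

```python
THRESHOLD = 0.5  # overall-toxicity flagging threshold used in the study

def should_flag(scores, threshold=THRESHOLD):
    """Flag a post when its overall toxicity score meets the threshold.

    `scores` is assumed to map category names (plus "overall") to values
    in [0, 1]; this shape is illustrative, not the project's schema.
    """
    return scores["overall"] >= threshold

example = {"overall": 0.62, "insult": 0.71, "threat": 0.05}
```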
### Flagged Content
- **Primary Content (Posts/Replies):** 97 posts flagged
- **Secondary Content (Mentions):** 413 unique posts flagged
- **Total Flagged:** 510 unique posts
### Distribution Insight
- 81% of flagged content came from mentions (external users → politicians)
- 19% of flagged content came from politicians themselves
- External users directed significantly more toxic content toward politicians than politicians produced
---
## 3. Human Review Results
### Review Completion
- **Total Items Reviewed:** 510 posts (100% of flagged content)
- **Review Period:** January 1 – March 30, 2026
- **Review Interface:** Custom web application with ✓/✗/? buttons
### Validation Results
#### Primary Content (Posts/Replies by Politicians)
| Status | Count | Percentage |
|--------|-------|------------|
| ✓ Correctly Flagged | 32 | 33.0% |
| ✗ Incorrectly Flagged | 65 | 67.0% |
| ? Unsure | 0 | 0.0% |
| **Total** | **97** | **100%** |
#### Secondary Content (Mentions of Politicians)
| Status | Count | Percentage |
|--------|-------|------------|
| ✓ Correctly Flagged | 174 | 42.1% |
| ✗ Incorrectly Flagged | 239 | 57.9% |
| ? Unsure | 0 | 0.0% |
| **Total** | **413** | **100%** |
#### Combined Results
| Status | Count | Percentage |
|--------|-------|------------|
| ✓ Correctly Flagged | 206 | 40.4% |
| ✗ Incorrectly Flagged | 304 | 59.6% |
| ? Unsure | 0 | 0.0% |
| **Total** | **510** | **100%** |
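The percentages in the tables above are the precision of the AI flags under human review (true positives divided by all reviewed flags), and can be reproduced directly from the counts:

```python
def flagging_precision(correct_flags, false_positives):
    """Precision of AI flagging as validated by human review."""
    return correct_flags / (correct_flags + false_positives)

# Counts from the validation tables above
primary  = flagging_precision(32, 65)     # posts/replies by politicians, ≈ 0.330
mentions = flagging_precision(174, 239)   # mentions of politicians, ≈ 0.421
combined = flagging_precision(206, 304)   # all flagged content, ≈ 0.404
```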
---
## 4. Key Findings
### 4.1 High False Positive Rate
- **Overall False Positive Rate: 59.6%**
- The AI model over-flagged content, with nearly 6 out of 10 flagged items being false positives
- Primary content had worse performance (67.0% false positives) than mentions (57.9%)
### 4.2 Model Limitations Identified
1. **Threshold Sensitivity:** The 0.5 threshold appears too low for Dutch political discourse
2. **Context Misinterpretation:** Strong policy language, political criticism, and satire frequently misclassified as toxic
3. **Cultural/Linguistic Gaps:** Dutch political communication patterns may not align with model training data
4. **Nuance Detection:** Difficulty distinguishing between heated but legitimate debate and actual toxicity
### 4.3 Directional Toxicity Pattern
- External mentions (8,506 posts) generated **413 flagged items** (4.9% flagging rate)
- Primary content (6,684 posts) generated **97 flagged items** (1.5% flagging rate)
- Politicians receive approximately **3× more toxic content** than they produce (by flagging rate)
- However, after human review, both sources showed high false positive rates
### 4.4 Accuracy Comparison
- **Mentions accuracy:** 42.1% (slightly better)
- **Primary content accuracy:** 33.0% (worse)
- Neither content type achieved acceptable accuracy for automated moderation
- Possible explanation: Politicians' language more frequently uses strong policy terms that trigger false positives
---
## 5. Implications for Automated Moderation
### What This Study Reveals
1. **AI Cannot Replace Human Judgment:** 59.6% false positive rate makes unsupervised automation dangerous
2. **Threshold Optimization Needed:** Current 0.5 threshold too aggressive; may need 0.7+ for political content
3. **Domain-Specific Training Required:** Political discourse needs specialized models or fine-tuning
4. **Human-in-the-Loop Essential:** Automated flagging useful for triage, but human review mandatory
### Recommended Approach
- Use AI toxicity detection as **first-pass screening only**
- Require human review for all flagged content before action
- Consider higher thresholds (0.7–0.8) for political accounts
- Train domain-specific models on Dutch political discourse
- Implement appeals process for false positives
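The recommended workflow, sketched as a minimal triage queue: AI scores only route content to humans, and action is taken only after human confirmation. The class and field names are illustrative, not part of the project.

```python
from dataclasses import dataclass, field

@dataclass
class TriageQueue:
    """First-pass screening: AI flags go to humans, nothing is automated.

    The 0.7 default reflects the recommendation above for political
    accounts, stricter than the study's 0.5 threshold.
    """
    threshold: float = 0.7
    pending: list = field(default_factory=list)
    actioned: list = field(default_factory=list)

    def screen(self, post_id, score):
        if score >= self.threshold:
            self.pending.append(post_id)   # human review required

    def review(self, post_id, is_toxic):
        self.pending.remove(post_id)
        if is_toxic:
            self.actioned.append(post_id)  # only after human confirmation
```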
---
## 6. Technical Implementation Success
### What Worked Well
1. **Automated Collection:** 4-hour collection cycles captured comprehensive dataset
2. **Human Review Interface:** Web UI with ✓/✗/? buttons efficient for manual validation
3. **Date Filtering:** Allowed focused analysis of specific time periods
4. **Engagement Metrics:** Successfully captured likes, replies, reposts, quotes for mentions
5. **Deduplication Logic:** Properly handled posts mentioning multiple tracked accounts
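The deduplication step above can be sketched as a collapse on the post's AT URI: a post that mentions several tracked accounts appears once per account in the raw collection but should count once in the dataset. Field names here are illustrative, not the project's actual schema.

```python
def deduplicate_mentions(mention_rows):
    """Keep one row per post URI, preserving the first occurrence."""
    unique = {}
    for row in mention_rows:
        unique.setdefault(row["uri"], row)
    return list(unique.values())

rows = [
    {"uri": "at://did:plc:a/app.bsky.feed.post/1", "mentions": "pol_a"},
    {"uri": "at://did:plc:a/app.bsky.feed.post/1", "mentions": "pol_b"},
    {"uri": "at://did:plc:b/app.bsky.feed.post/2", "mentions": "pol_a"},
]
```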
### Infrastructure Performance
- **Uptime:** 99%+ (only brief scheduler issue Feb 23-24)
- **Data Integrity:** PostgreSQL database handled 15K+ posts without issues
- **Analysis Throughput:** GPT-4.1-nano processed all content efficiently
- **Web Interface:** Responsive UI for 500+ manual reviews
---
## 7. Study Limitations
1. **Single Model Used:** Only tested GPT-4.1-nano; ensemble approaches not evaluated
2. **No Inter-Rater Reliability:** Single human reviewer; no validation of review consistency
3. **Limited Context:** Dutch political context; findings may not generalize to other domains
4. **Arbitrary Threshold:** 0.5 threshold not scientifically optimized
5. **Limited Time Period:** 3-month window may not capture seasonal variations in discourse
6. **No Appeal Process:** No mechanism for accounts to contest flagging decisions
---
## 8. Recommendations for Future Work
### Short-Term Improvements
1. **Threshold Optimization:** Test 0.6, 0.7, 0.8 thresholds and measure precision/recall
2. **Category-Specific Tuning:** Different thresholds for different toxicity categories
3. **Context Windows:** Analyze conversation threads, not isolated posts
4. **Multi-Model Validation:** Test other models (Perspective API, custom fine-tuned models)
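The threshold sweep in item 1 can be estimated from the data already collected, with one caveat: only precision is measurable this way, because content scored below 0.5 was never reviewed, so recall requires labeling a random sample as well. A sketch over hypothetical `(score, human_says_toxic)` pairs:

```python
def sweep_thresholds(scored_reviews, thresholds=(0.5, 0.6, 0.7, 0.8)):
    """Estimate precision at stricter thresholds from reviewed flags.

    `scored_reviews` holds (overall_score, is_toxic) pairs for items the
    human reviewer judged. Returns {threshold: precision or None}.
    """
    results = {}
    for t in thresholds:
        kept = [(s, ok) for s, ok in scored_reviews if s >= t]
        correct = sum(1 for _, ok in kept if ok)
        results[t] = correct / len(kept) if kept else None
    return results
```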
### Long-Term Research
1. **Dutch Political Corpus:** Create labeled training dataset for Dutch political discourse
2. **Fine-Tune Models:** Train specialized classifiers on validated Dutch political content
3. **Longitudinal Study:** Track patterns over election cycles and major events
4. **Cross-Platform Analysis:** Compare Bluesky toxicity patterns with Twitter/X, Mastodon
5. **Inter-Rater Reliability Study:** Multiple reviewers to validate human judgment consistency
---
## 9. Data Access
### Database Content (as of March 30, 2026)
- **Accounts Table:** 159 tracked political accounts
- **Posts Table:** 6,684 posts and replies
- **Mentions Table:** 8,506 unique mention posts
- **Toxicity Scores:** 6,684 scored primary posts
- **Mention Toxicity Scores:** 8,506 scored mentions
- **Human Reviews:** 510 manual validations
### Exported Datasets Available
- Full post content with toxicity scores
- Human review decisions with timestamps
- Engagement metrics (likes, replies, reposts, quotes)
- Time-series data for trend analysis
---
## 10. Conclusion
This study demonstrates that while AI-powered toxicity detection can **identify potential concerns** in large-scale social media content, it **cannot reliably moderate** without substantial human oversight. The 59.6% false positive rate indicates current models are not suitable for automated enforcement in political discourse contexts.
**Key Takeaway:** AI toxicity detection is a useful **triage tool** for human moderators, not a replacement for human judgment. Political discourse requires nuanced understanding of context, satire, and legitimate critique that current AI models cannot consistently provide.
**Project Status:** Data collection complete. Web interface remains available for analysis and reporting. Database preserved for future research.
---
**Generated:** March 30, 2026
**Study Period:** January 1 – March 30, 2026
**Monitored Platform:** Bluesky Social Network
**Geographic Focus:** Dutch Political Discourse