Bluesky Toxicity Analysis - Main Findings

Study Overview

Period: January 1 – March 30, 2026 (89 days)
Monitored Accounts: 159 Dutch political accounts
Total Posts Collected: 15,190 posts


1. Data Collection Summary

Content Distribution

  • Primary Content (by tracked accounts):

    • Original Posts: 3,032
    • Replies: 3,652
    • Total Primary: 6,684 posts
  • Secondary Content (mentions of tracked accounts):

    • Unique Mention Posts: 8,506
    • Note: Posts mentioning multiple tracked accounts counted once
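
The deduplication rule above (a post mentioning several tracked accounts is counted once) can be sketched with a set keyed on the post's unique URI. This is a minimal illustration, not the project's actual code; the row shape and field names are assumptions.

```python
def dedupe_mentions(mention_rows):
    """Collapse mention rows so a post that mentions several tracked
    accounts is counted exactly once, keyed on its unique post URI.

    `mention_rows` is a list of (post_uri, mentioned_handle) pairs;
    these names are illustrative, not the project's actual schema."""
    seen = set()
    unique_posts = []
    for post_uri, _handle in mention_rows:
        if post_uri not in seen:
            seen.add(post_uri)
            unique_posts.append(post_uri)
    return unique_posts

rows = [
    ("at://did:plc:a/post/1", "@pol1"),
    ("at://did:plc:a/post/1", "@pol2"),  # same post, second mention
    ("at://did:plc:b/post/2", "@pol1"),
]
print(len(dedupe_mentions(rows)))  # 2 unique mention posts
```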

Total Dataset

  • Combined Content: 15,190 posts
  • Collection Method: Automated via Bluesky Public API (every 4 hours)
  • Infrastructure: Docker containers with PostgreSQL database
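
The 4-hour collection cycle could be driven by a loop as simple as the sketch below. The real service runs in Docker with its own scheduler; `fetch_batch` and `store_batch` are hypothetical stand-ins for the Bluesky API client and the PostgreSQL layer.

```python
import time

COLLECT_INTERVAL = 4 * 60 * 60  # 4 hours, matching the study's cycle


def run_collector(fetch_batch, store_batch, cycles=None):
    """Minimal polling loop: fetch new posts for all tracked accounts,
    persist them, then sleep until the next 4-hour cycle.
    `cycles=None` runs forever; a number limits iterations (for tests)."""
    n = 0
    while cycles is None or n < cycles:
        posts = fetch_batch()
        store_batch(posts)
        n += 1
        if cycles is None or n < cycles:
            time.sleep(COLLECT_INTERVAL)
```

In production the same effect is more robustly achieved with cron or a container-level scheduler, which survives process restarts.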

2. Toxicity Detection Results

AI Model Performance

  • Model Used: OpenAI GPT-4.1-nano
  • Classification Categories: 12 toxicity dimensions
  • Flagging Threshold: Overall toxicity score ≥ 0.5 (50%)
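
The flagging rule reduces to a threshold check on the overall score. A minimal sketch, assuming the model returns a dict of 0–1 scores per dimension (the exact names of the 12 categories are not reproduced here):

```python
FLAG_THRESHOLD = 0.5  # the study's overall-toxicity cutoff


def should_flag(scores: dict) -> bool:
    """Apply the study's flagging rule: a post is flagged when its
    overall toxicity score meets or exceeds the 0.5 threshold.
    `scores` maps category names to 0-1 scores; key names are assumed."""
    return scores.get("overall", 0.0) >= FLAG_THRESHOLD


print(should_flag({"overall": 0.62, "insult": 0.40}))  # True
print(should_flag({"overall": 0.31}))                  # False
```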

Flagged Content

  • Primary Content (Posts/Replies): 97 posts flagged
  • Secondary Content (Mentions): 413 unique posts flagged
  • Total Flagged: 510 unique posts

Distribution Insight

  • 81% of flagged content came from mentions (external users → politicians)
  • 19% of flagged content came from politicians themselves
  • External users directed significantly more toxic content toward politicians than politicians produced

3. Human Review Results

Review Completion

  • Total Items Reviewed: 510 posts (100% of flagged content)
  • Review Period: January 1 – March 30, 2026
  • Review Interface: Custom web application with ✓/✗/? buttons

Validation Results

Primary Content (Posts/Replies by Politicians)

| Status | Count | Percentage |
|---|---|---|
| ✓ Correctly Flagged | 32 | 33.0% |
| ✗ Incorrectly Flagged | 65 | 67.0% |
| ? Unsure | 0 | 0.0% |
| Total | 97 | 100% |

Secondary Content (Mentions of Politicians)

| Status | Count | Percentage |
|---|---|---|
| ✓ Correctly Flagged | 174 | 42.1% |
| ✗ Incorrectly Flagged | 239 | 57.9% |
| ? Unsure | 0 | 0.0% |
| Total | 413 | 100% |

Combined Results

| Status | Count | Percentage |
|---|---|---|
| ✓ Correctly Flagged | 206 | 40.4% |
| ✗ Incorrectly Flagged | 304 | 59.6% |
| ? Unsure | 0 | 0.0% |
| Total | 510 | 100% |
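
The percentages above are the flagger's precision: confirmed flags divided by total flags. They can be recomputed directly from the table counts:

```python
def precision(correct, total):
    """Share of flagged items a human reviewer confirmed as toxic."""
    return correct / total


# Counts taken directly from the review tables above.
primary = precision(32, 97)     # politicians' posts/replies
mentions = precision(174, 413)  # mentions of politicians
combined = precision(206, 510)

print(f"{primary:.1%} {mentions:.1%} {combined:.1%}")  # 33.0% 42.1% 40.4%
```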

4. Key Findings

4.1 High False Positive Rate

  • Overall False Positive Rate: 59.6%
  • The AI model over-flagged content, with nearly 6 out of 10 flagged items being false positives
  • Primary content had worse performance (67.0% false positives) than mentions (57.9%)

4.2 Model Limitations Identified

  1. Threshold Sensitivity: The 0.5 threshold appears too low for Dutch political discourse
  2. Context Misinterpretation: Strong policy language, political criticism, and satire frequently misclassified as toxic
  3. Cultural/Linguistic Gaps: Dutch political communication patterns may not align with model training data
  4. Nuance Detection: Difficulty distinguishing between heated but legitimate debate and actual toxicity

4.3 Directional Toxicity Pattern

  • External mentions (8,506 posts) generated 413 flagged items (4.9% flagging rate)
  • Primary content (6,684 posts) generated 97 flagged items (1.5% flagging rate)
  • Politicians receive approximately 3× more toxic content than they produce (by flagging rate)
  • However, after human review, both sources showed high false positive rates
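
The flagging rates and the roughly 3× ratio quoted above follow directly from the counts in sections 1 and 2:

```python
def flag_rate(flagged, total):
    """Fraction of collected posts the model flagged as toxic."""
    return flagged / total


mention_rate = flag_rate(413, 8_506)  # external users → politicians
primary_rate = flag_rate(97, 6_684)   # politicians' own content

print(f"{mention_rate:.1%}")                  # 4.9%
print(f"{primary_rate:.1%}")                  # 1.5%
print(f"{mention_rate / primary_rate:.1f}x")  # ~3.3x
```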

4.4 Accuracy Comparison

  • Mentions accuracy: 42.1% (slightly better)
  • Primary content accuracy: 33.0% (worse)
  • Neither content type achieved acceptable accuracy for automated moderation
  • Possible explanation: Politicians' language more frequently uses strong policy terms that trigger false positives

5. Implications for Automated Moderation

What This Study Reveals

  1. AI Cannot Replace Human Judgment: 59.6% false positive rate makes unsupervised automation dangerous
  2. Threshold Optimization Needed: Current 0.5 threshold too aggressive; may need 0.7+ for political content
  3. Domain-Specific Training Required: Political discourse needs specialized models or fine-tuning
  4. Human-in-the-Loop Essential: Automated flagging useful for triage, but human review mandatory
Recommended Practices

  • Use AI toxicity detection as first-pass screening only
  • Require human review for all flagged content before action
  • Consider higher thresholds (0.7–0.8) for political accounts
  • Train domain-specific models on Dutch political discourse
  • Implement an appeals process for false positives
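
A human-in-the-loop pipeline of this kind can be sketched as a triage step that only routes posts into a review queue, never actions them automatically. The threshold value and field names here are illustrative assumptions.

```python
from collections import deque

REVIEW_THRESHOLD = 0.7  # higher cutoff suggested above for political content


def triage(posts, review_queue):
    """First-pass screening only: AI scores route posts into a human
    review queue; nothing is actioned automatically. `posts` is a list
    of dicts with a 'toxicity' score (illustrative field name)."""
    for post in posts:
        if post["toxicity"] >= REVIEW_THRESHOLD:
            review_queue.append(post)  # a human decides ✓ / ✗ / ?
    return review_queue


queue = triage(
    [{"id": 1, "toxicity": 0.85}, {"id": 2, "toxicity": 0.55}],
    deque(),
)
print([p["id"] for p in queue])  # [1]
```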

6. Technical Implementation Success

What Worked Well

  1. Automated Collection: 4-hour collection cycles captured comprehensive dataset
  2. Human Review Interface: Web UI with ✓/✗/? buttons efficient for manual validation
  3. Date Filtering: Allowed focused analysis of specific time periods
  4. Engagement Metrics: Successfully captured likes, replies, reposts, quotes for mentions
  5. Deduplication Logic: Properly handled posts mentioning multiple tracked accounts

Infrastructure Performance

  • Uptime: 99%+ (only a brief scheduler issue on Feb 23–24)
  • Data Integrity: PostgreSQL database handled 15K+ posts without issues
  • Analysis Throughput: GPT-4.1-nano processed all content efficiently
  • Web Interface: Responsive UI for 500+ manual reviews

7. Study Limitations

  1. Single Model Used: Only tested GPT-4.1-nano; ensemble approaches not evaluated
  2. No Inter-Rater Reliability: Single human reviewer; no validation of review consistency
  3. Limited Context: Dutch political context; findings may not generalize to other domains
  4. Arbitrary Threshold: 0.5 threshold not scientifically optimized
  5. Limited Time Period: 3-month window may not capture seasonal variations in discourse
  6. No Appeal Process: No mechanism for accounts to contest flagging decisions

8. Recommendations for Future Work

Short-Term Improvements

  1. Threshold Optimization: Test 0.6, 0.7, 0.8 thresholds and measure precision/recall
  2. Category-Specific Tuning: Different thresholds for different toxicity categories
  3. Context Windows: Analyze conversation threads, not isolated posts
  4. Multi-Model Validation: Test other models (Perspective API, custom fine-tuned models)
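
The threshold-optimization step could be run against the 510 human-reviewed posts with a simple precision/recall sweep. A minimal sketch with toy data (the real input would be the reviewed scores and labels):

```python
def sweep_thresholds(items, thresholds=(0.5, 0.6, 0.7, 0.8)):
    """For each candidate threshold, compute precision and recall
    against human labels. `items` is a list of (model_score, is_toxic)
    pairs, e.g. drawn from the 510 reviewed posts."""
    results = {}
    for t in thresholds:
        tp = sum(1 for s, toxic in items if s >= t and toxic)
        fp = sum(1 for s, toxic in items if s >= t and not toxic)
        fn = sum(1 for s, toxic in items if s < t and toxic)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        results[t] = (precision, recall)
    return results


# Toy example: (model_score, human_label) pairs.
items = [(0.9, True), (0.6, False), (0.55, True), (0.3, False)]
for t, (p, r) in sweep_thresholds(items).items():
    print(f"threshold {t}: precision {p:.2f}, recall {r:.2f}")
```

Raising the threshold trades recall for precision; the sweep makes that trade-off explicit so a cutoff can be chosen deliberately rather than defaulting to 0.5.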

Long-Term Research

  1. Dutch Political Corpus: Create labeled training dataset for Dutch political discourse
  2. Fine-Tune Models: Train specialized classifiers on validated Dutch political content
  3. Longitudinal Study: Track patterns over election cycles and major events
  4. Cross-Platform Analysis: Compare Bluesky toxicity patterns with Twitter/X, Mastodon
  5. Inter-Rater Reliability Study: Multiple reviewers to validate human judgment consistency

9. Data Access

Database Content (as of March 30, 2026)

  • Accounts Table: 159 tracked political accounts
  • Posts Table: 6,684 posts and replies
  • Mentions Table: 8,506 unique mention posts
  • Toxicity Scores: 6,684 scored primary posts
  • Mention Toxicity Scores: 8,506 scored mentions
  • Human Reviews: 510 manual validations

Exported Datasets Available

  • Full post content with toxicity scores
  • Human review decisions with timestamps
  • Engagement metrics (likes, replies, reposts, quotes)
  • Time-series data for trend analysis
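
An export of the review decisions might look like the sketch below, using Python's standard `csv` module. The column names and example row are illustrative; the project's real export schema may differ.

```python
import csv
import io


def export_reviews(rows, fh):
    """Write human review decisions to CSV. Column names are
    illustrative assumptions, not the project's actual schema."""
    writer = csv.DictWriter(
        fh, fieldnames=["post_uri", "toxicity", "review", "reviewed_at"]
    )
    writer.writeheader()
    writer.writerows(rows)


buf = io.StringIO()
export_reviews(
    [{"post_uri": "at://example/post/1", "toxicity": 0.72,
      "review": "correct", "reviewed_at": "2026-02-10T09:14:00Z"}],
    buf,
)
print(buf.getvalue().splitlines()[0])  # post_uri,toxicity,review,reviewed_at
```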

10. Conclusion

This study demonstrates that while AI-powered toxicity detection can identify potential concerns in large-scale social media content, it cannot reliably moderate without substantial human oversight. The 59.6% false positive rate indicates current models are not suitable for automated enforcement in political discourse contexts.

Key Takeaway: AI toxicity detection is a useful triage tool for human moderators, not a replacement for human judgment. Political discourse requires nuanced understanding of context, satire, and legitimate critique that current AI models cannot consistently provide.

Project Status: Data collection complete. Web interface remains available for analysis and reporting. Database preserved for future research.


Generated: March 30, 2026
Study Period: January 1 – March 30, 2026
Monitored Platform: Bluesky Social Network
Geographic Focus: Dutch Political Discourse