X Bookmarks to Raindrop.io Converter

📋 Project Overview

This project converts Twitter/X bookmarks exported as JSON into a CSV format compatible with Raindrop.io, using OpenAI's GPT-3.5-turbo for intelligent title generation and folder/tag reclassification.

🎯 Goal

Transform a JSON file containing Twitter bookmarks (with manually added folder and tag data) into a clean, organized CSV that can be imported into Raindrop.io with:

Intelligent title generation (concise one-liners)
Smart folder and tag reclassification
Proper nested folder structure under "X/"
Clean tag taxonomy (removing low-usage tags)

📁 Project Structure

X Bookmarks to Raindrop/
├── README.md                           # This file
├── twitter_bookmarks.json              # Original export (Twitter Bookmark Exporter)
├── twitter_bookmarks_tagged_full.json  # Manually enhanced with folders/tags (1,778 bookmarks)
├── APIKEY.txt                          # OpenAI API key (user-provided)
├── folders_list.txt                    # Extracted folder list for AI prompts
├── tags_list.txt                       # Extracted tag list for AI prompts
├── raindrop_format.csv                 # Final output (all bookmarks)
├── raindrop_cleaned.csv                # Cleaned output (tags with ≥5 uses)
├── setup_openai.py                     # OpenAI API key setup utility
├── analyze_folders_tags.py             # Extract folders/tags from JSON
├── reclassify_with_openai.py           # Early OpenAI processing script
├── analyze_tags.py                     # Analyze tag usage in CSV
└── clean_tags.py                       # Remove low-usage tags

🔧 Dependencies

pip install openai

Environment Variables

OPENAI_API_KEY: Your OpenAI API key (set via setup_openai.py or manually)

📊 Complete Data Flow

twitter_bookmarks.json (original export from Twitter Bookmark Exporter)
    ↓
[MANUAL STEP: User added "folder" and "tags" fields to each bookmark]
    ↓
twitter_bookmarks_tagged_full.json (manually enhanced with folders/tags)
    ↓
analyze_folders_tags.py → folders_list.txt, tags_list.txt
    ↓
reclassify_with_openai.py → raindrop_reclassified.csv (early attempt)
    ↓
[Multiple iterations and format fixes]
    ↓
create_raindrop_format.py → raindrop_format.csv (final format)
    ↓
clean_tags.py → raindrop_cleaned.csv (recommended for import)

🚀 Quick Start

1. Setup OpenAI API Key

python3 setup_openai.py

Or manually set: export OPENAI_API_KEY="your-key-here"

2. Prepare Enhanced JSON (Manual Step)

Start with twitter_bookmarks.json (original export)
Manually add "folder" and "tags" fields to each bookmark
Save as twitter_bookmarks_tagged_full.json

3. Extract Folder/Tag Lists

python3 analyze_folders_tags.py

4. Generate Raindrop CSV

python3 create_raindrop_format.py

5. Clean Tags (Recommended)

python3 clean_tags.py

6. Import to Raindrop.io

Upload raindrop_cleaned.csv to Raindrop.io

📋 Scripts Documentation

Core Scripts

`create_raindrop_format.py` 🎯

Purpose: Main conversion script that processes all bookmarks with OpenAI

Generates concise titles (max 100 chars) using GPT-3.5-turbo
Reclassifies folders and tags based on content analysis
Outputs CSV in exact Raindrop.io export format
Processes all 1,778 bookmarks with rate limiting

Key Features:

Single OpenAI API call per bookmark (cost-optimized)
Text cleaning for CSV compatibility
ISO 8601 timestamp format
Nested folders under "X/"
Adds "twitter" tag to all entries

`clean_tags.py` 🧹

Purpose: Removes tags with less than 5 uses to create cleaner taxonomy

Reduces unique tags from 1,247 to 203 (84% reduction)
Maintains meaningful tags only
Provides detailed analysis of tag usage

`reclassify_with_openai.py` 🔄

Purpose: Early version of OpenAI processing script

Predecessor to create_raindrop_format.py
Combined title generation and reclassification
Used separate API calls (less cost-efficient)
Generated intermediate CSV outputs

Utility Scripts

`setup_openai.py` 🔑

Purpose: Secure OpenAI API key setup

Prompts for API key securely (no echo)
Sets environment variable
Validates key format

`analyze_folders_tags.py` 📊

Purpose: Extract unique folders and tags from source JSON

Creates folders_list.txt and tags_list.txt
Used as reference lists for OpenAI reclassification
Provides usage statistics

`analyze_tags.py` 📈

Purpose: Analyze tag distribution in generated CSV

Counts total vs unique tags
Shows most popular tags
Helps understand Raindrop.io import statistics

🔍 Key Features

OpenAI Integration

Model: GPT-3.5-turbo
Single API Call: Combines title generation + classification
Rate Limiting: 0.2s delay between calls
Cost Optimization: ~$5-10 for 1,778 bookmarks

Raindrop.io Compatibility

Exact Format Match: Based on actual Raindrop export
Required Fields: id, title, note, excerpt, url, folder, tags, created, cover, highlights, favorite
Folder Structure: All nested under "X/" (e.g., "X/ai", "X/devtools")
Tag Format: Comma-separated, includes "twitter" tag

Data Processing

Text Cleaning: Removes problematic newlines and quotes
Timestamp Conversion: ISO 8601 format (2025-07-19T18:34:39.000Z)
Tag Optimization: Removes tags with <5 uses
Content Preservation: Full text in excerpt field

📊 Results

Final Statistics

Bookmarks: 1,778
Folders: 17 (nested under "X/")
Tags (before cleaning): 1,247 unique, 2,669 total uses
Tags (after cleaning): 203 unique, 7,393 total uses
Average tags per bookmark: 4.2

Top Tags (After Cleaning)

twitter: 1,778 uses
devtools: 840 uses
ai: 588 uses
opensource: 328 uses
GPT: 207 uses

🔧 OpenAI Prompt Strategy

The AI uses a sophisticated prompt that:

Analyzes full text content and URL
References predefined folder and tag lists
Suggests most appropriate folder from existing options
Selects relevant tags from existing list + new important ones
Generates concise, descriptive titles

🚨 Troubleshooting

Common Issues

Raindrop.io Import Problems

Solution: Use raindrop_cleaned.csv (follows exact export format)
Cause: CSV formatting, newlines in fields, wrong headers

OpenAI API Issues

Rate Limits: Script includes 0.2s delays
Invalid Key: Use setup_openai.py or check APIKEY.txt
Cost Control: Test with small subset first

Tag Count Confusion

Raindrop.io reports total tag uses, not unique tags
Use analyze_tags.py to understand the breakdown

File Issues

Large Files: JSON is ~8MB, CSV is ~2MB
Encoding: All files use UTF-8
Line Endings: Handled by Python CSV writer

💡 Lessons Learned

Single API Call: Combining title + classification saves ~50% on API costs
Exact Format Matching: Raindrop.io is strict about CSV format
Tag Cleanup: Essential for usable taxonomy (1,247 → 203 tags)
Text Cleaning: Critical for CSV compatibility
Rate Limiting: Prevents API throttling

🔄 Process Evolution

Initial: Started with twitter_bookmarks.json (no folders/tags)
Manual Enhancement: User added folder and tags fields → twitter_bookmarks_tagged_full.json
V1: json_to_raindrop_csv.py - Basic conversion with timestamp fixes
V2: Added OpenAI title generation
V3: reclassify_with_openai.py - Added folder/tag reclassification (separate API calls)
V4: Optimized to single API call per bookmark
V5: Multiple CSV format attempts to fix Raindrop.io compatibility
V6: create_raindrop_format.py - Final format matching Raindrop export structure
Final: clean_tags.py - Tag cleanup for better taxonomy

📝 Manual Steps Required

Export bookmarks using Twitter Bookmark Exporter → twitter_bookmarks.json
Manually add "folder" and "tags" fields to each bookmark → twitter_bookmarks_tagged_full.json
Obtain OpenAI API key
Place key in APIKEY.txt or set environment variable
Run scripts in sequence (analyze → generate → clean)
Import final CSV to Raindrop.io

🎯 Future Improvements

Batch API calls for better efficiency
Support for other bookmark sources
Custom tag taxonomy rules
Automated import via Raindrop.io API
Progress bars for long operations

📞 Notes for Future Self/Agents

The user prefers Python3 over python
OpenAI API key was provided in APIKEY.txt due to terminal paste issues
Tag cleaning with min 5 uses was crucial for usability
Raindrop.io is very strict about CSV format - use exact export structure
User values cost optimization (single API call approach)
All folders should be nested under "X/"
Always add "twitter" tag to all entries

Last Updated: January 2025
Total Processing Time: ~15 minutes for 1,778 bookmarks
Estimated API Cost: $5-10 USD

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
.gitignore		.gitignore
README.md		README.md
analyze_folders_tags.py		analyze_folders_tags.py
analyze_tags.py		analyze_tags.py
clean_tags.py		clean_tags.py
create_clean_csv.py		create_clean_csv.py
create_raindrop_json.py		create_raindrop_json.py
folders_list.txt		folders_list.txt
json_to_raindrop_csv.py		json_to_raindrop_csv.py
raindrop_bookmarks.json		raindrop_bookmarks.json
reclassify_with_openai.py		reclassify_with_openai.py
setup_openai.py		setup_openai.py
tags_list.txt		tags_list.txt
twitter_bookmarks.json		twitter_bookmarks.json
twitter_bookmarks_tagged_full.json		twitter_bookmarks_tagged_full.json

rmichelena/x_to_raindrop

Folders and files

Latest commit

History

Repository files navigation