
Slack Channel Exporter

A Python tool that exports all messages and thread replies from a Slack channel and saves them in both JSONL and CSV formats. Useful for data analysis, archiving, or migration projects.

🚀 Features

  • Complete Channel Export: Fetches all messages from a specified Slack channel
  • Thread Support: Exports all thread replies for each parent message
  • Dual Format Output: Automatically saves data in both JSONL and CSV formats
  • Environment-Based Security: Uses .env files for secure credential management
  • Rate Limit Handling: Built-in delays to respect Slack API rate limits
  • Error Handling: Robust error handling for API failures and network issues

📁 Project Structure

slack-scraper/
├── data/                   # Exported files (gitignored)
│   ├── <CHANNEL_ID>.jsonl  # Main channel messages
│   ├── <CHANNEL_ID>.csv    # CSV version of messages
│   ├── <CHANNEL_ID>_threads.jsonl  # Thread replies
│   └── <CHANNEL_ID>_threads.csv    # CSV version of threads
├── scripts/
│   └── slack_channel_export.py     # Main export script
├── requirements.txt        # Python dependencies
├── .env.example           # Environment variables template
├── .gitignore            # Git ignore rules
└── README.md             # This file

🛠️ Prerequisites

Before using this tool, you need:

  1. Slack App with Bot Token:

    • Create a Slack app at https://api.slack.com/apps
    • Add a bot user to your app
    • Install the app to your workspace
    • Copy the Bot User OAuth Token (starts with xoxb-)
  2. Required Bot Scopes:

    • channels:history - Read channel messages
    • groups:history - Read private channel messages (if needed)
    • im:history - Read direct messages (if needed)
    • mpim:history - Read group DMs (if needed)
  3. Channel ID:

    • Right-click on the channel in Slack
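    • Select "Copy link" and take the channel ID (e.g. C1234567890) from the /archives/ part of the URL
    • Or list channels with the conversations.list API method (this needs the channels:read scope, plus groups:read for private channels)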
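If you copied a channel link, the ID can be pulled out with a small regular expression. This is a minimal sketch, not part of the project; the helper name `channel_id_from_link` is hypothetical, and it assumes links of the form `https://<workspace>.slack.com/archives/<CHANNEL_ID>` and IDs that start with C, G, or D followed by uppercase letters and digits.

```python
import re
from typing import Optional

# Assumption: copied links look like https://<workspace>.slack.com/archives/<ID>[/p<ts>]
# and IDs typically start with C (channels), G (legacy private/group) or D (DMs).
_ARCHIVES_RE = re.compile(r"/archives/([CDG][A-Z0-9]+)")


def channel_id_from_link(link: str) -> Optional[str]:
    """Hypothetical helper: return the channel ID embedded in a Slack link, or None."""
    match = _ARCHIVES_RE.search(link)
    return match.group(1) if match else None


print(channel_id_from_link("https://example.slack.com/archives/C1234567890"))  # → C1234567890
```

Message permalinks (which end in `/p<digits>`) work too, because the pattern stops at the next slash.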

⚙️ Installation

  1. Clone the repository:

    git clone https://github.com/Oscarski/slack-scraper.git
    cd slack-scraper
  2. Install Python dependencies:

    pip install -r requirements.txt
  3. Set up environment variables:

    cp .env.example .env
  4. Edit .env with your credentials:

    SLACK_TOKEN=xoxb-your-actual-bot-token-here
    CHANNEL_ID=your-channel-id-here

🚀 Usage

Quick Start

Run the export script:

python scripts/slack_channel_export.py

This will:

  • Fetch all messages from your specified channel
  • Export thread replies for each parent message
  • Save results in both JSONL and CSV formats
  • Create files in the data/ directory
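The steps above boil down to cursor pagination over the channel history, then one paginated replies call per thread parent. The following is a minimal sketch of that flow under stated assumptions, not the script's actual code: `paginate` and `export_channel` are hypothetical names, the API calls are injected as plain functions so the logic can run without Slack, and the 1-second delay mirrors the rate-limiting note further down.

```python
import time
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# A "page" is the decoded JSON body of one Slack Web API response.
FetchPage = Callable[[Optional[str]], Dict]


def paginate(fetch_page: FetchPage, delay: float = 1.0,
             sleep: Callable[[float], None] = time.sleep) -> Iterator[Dict]:
    """Yield messages from every page, following next_cursor until it is empty."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        yield from page.get("messages", [])
        cursor = (page.get("response_metadata") or {}).get("next_cursor")
        if not cursor:  # an empty or missing cursor means this was the last page
            return
        sleep(delay)  # pause between consecutive API calls


def export_channel(fetch_history: FetchPage,
                   fetch_replies: Callable[[str, Optional[str]], Dict],
                   delay: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep,
                   ) -> Tuple[List[Dict], List[Dict]]:
    """Return (channel messages, thread replies with the parents filtered out)."""
    messages = list(paginate(fetch_history, delay, sleep))
    replies: List[Dict] = []
    for parent in messages:
        # Assumption: a thread parent has a reply_count and thread_ts == ts.
        if parent.get("reply_count", 0) and parent.get("thread_ts") == parent.get("ts"):
            sleep(delay)
            ts = parent["ts"]
            for reply in paginate(lambda c: fetch_replies(ts, c), delay, sleep):
                if reply.get("ts") != ts:  # the replies listing also contains the parent
                    replies.append(reply)
    return messages, replies


# Hypothetical wiring with slack_sdk (not taken from slack_channel_export.py):
# from slack_sdk import WebClient
# client = WebClient(token=SLACK_TOKEN)
# messages, replies = export_channel(
#     lambda c: client.conversations_history(channel=CHANNEL_ID, cursor=c, limit=200).data,
#     lambda ts, c: client.conversations_replies(channel=CHANNEL_ID, ts=ts, cursor=c, limit=200).data,
# )
```

Injecting `sleep` makes the delays easy to disable or record in tests; in real use the default `time.sleep` applies.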

Output Files

The script generates four files in the data/ directory:

File                        Description             Format
<CHANNEL_ID>.jsonl          Main channel messages   JSON Lines
<CHANNEL_ID>.csv            Main channel messages   CSV
<CHANNEL_ID>_threads.jsonl  Thread replies only     JSON Lines
<CHANNEL_ID>_threads.csv    Thread replies only     CSV

Example Output

JSONL format (one JSON object per line):

{"type":"message","user":"U1234567890","text":"Hello world!","ts":"1234567890.123456","thread_ts":"1234567890.123456"}
{"type":"message","user":"U0987654321","text":"This is a reply","ts":"1234567891.123456","thread_ts":"1234567890.123456"}
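Because each line is a standalone JSON object, the JSONL files are easy to load back and regroup by thread. A minimal sketch using only the standard library; `load_jsonl` and `group_by_thread` are hypothetical helper names, and timestamps are compared as `Decimal` so Slack's `seconds.microseconds` strings sort exactly.

```python
import json
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List


def load_jsonl(path: str) -> List[dict]:
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def group_by_thread(messages: List[dict]) -> Dict[str, List[dict]]:
    """Map each thread_ts to its messages, oldest first."""
    threads: Dict[str, List[dict]] = defaultdict(list)
    for msg in messages:
        if "thread_ts" in msg:
            threads[msg["thread_ts"]].append(msg)
    for thread in threads.values():
        thread.sort(key=lambda m: Decimal(m["ts"]))  # exact ordering of "ts" strings
    return dict(threads)


# Example (path assumes a channel ID of C1234567890):
# threads = group_by_thread(load_jsonl("data/C1234567890_threads.jsonl"))
```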

CSV format (comma-separated values):

type,user,text,ts,thread_ts
message,U1234567890,"Hello world!",1234567890.123456,1234567890.123456
message,U0987654321,"This is a reply",1234567891.123456,1234567890.123456
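Slack messages do not all share the same keys (replies add `thread_ts`, some messages carry `reactions`, `files`, or `blocks`), so a CSV writer has to settle on a column set. One possible approach, sketched here as an assumption rather than the script's actual behavior: use the union of all keys in first-seen order, leave missing cells empty, and store nested values as JSON strings. `write_csv` is a hypothetical name.

```python
import csv
import json
from typing import List


def write_csv(messages: List[dict], path: str) -> None:
    """Write messages to CSV, using the union of their keys as the columns."""
    fieldnames: List[str] = []
    for msg in messages:
        for key in msg:
            if key not in fieldnames:  # keep first-seen column order
                fieldnames.append(key)
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")  # missing keys -> ""
        writer.writeheader()
        for msg in messages:
            # Nested values (lists/dicts such as reactions or blocks) become JSON strings.
            writer.writerow({k: json.dumps(v) if isinstance(v, (dict, list)) else v
                             for k, v in msg.items()})
```

`csv.DictWriter` quotes only fields that need it (for example text containing commas), which keeps the file readable in spreadsheet tools.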

🔧 Configuration

Environment Variables

Variable     Description            Example
SLACK_TOKEN  Your Slack bot token   xoxb-1234567890-abcdef...
CHANNEL_ID   Target channel ID      C1234567890
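A quick way to fail early on a misconfigured `.env` is to validate both variables before calling the API. This is a hedged sketch: `load_config` is a hypothetical helper, python-dotenv's `load_dotenv()` is used only if the package is installed, and the channel-ID pattern is an assumption based on the ID formats shown above.

```python
import os
import re
from typing import Mapping, Optional, Tuple

try:  # python-dotenv is optional in this sketch
    from dotenv import load_dotenv
except ImportError:
    load_dotenv = None


def load_config(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (SLACK_TOKEN, CHANNEL_ID) or raise ValueError with a readable message."""
    if env is None:
        if load_dotenv is not None:
            load_dotenv()  # copies .env entries into os.environ; existing variables win
        env = os.environ
    token = env.get("SLACK_TOKEN", "").strip()
    channel = env.get("CHANNEL_ID", "").strip()
    if not token.startswith("xoxb-"):
        raise ValueError("SLACK_TOKEN is missing or is not a bot token (xoxb-...)")
    # Assumption: channel IDs are C/G/D followed by uppercase letters and digits.
    if not re.fullmatch(r"[CDG][A-Z0-9]+", channel):
        raise ValueError(f"CHANNEL_ID {channel!r} does not look like a Slack channel ID")
    return token, channel
```

Passing an explicit mapping (as in tests) skips the `.env` lookup entirely.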

Rate Limiting

The script includes built-in rate limiting to respect Slack's API limits:

  • 1-second delay between API calls
  • Automatic handling of rate limit errors
  • Configurable limits in the script

🔒 Security Best Practices

  • Use environment variables for sensitive data
  • Never commit .env files to version control
  • Use bot tokens instead of user tokens
  • Limit bot permissions to only what's needed
  • Don't hardcode tokens in your code
  • Don't share tokens in public repositories

🐛 Troubleshooting

Common Issues

"invalid_auth" error:

  • Check that your SLACK_TOKEN is correct and starts with xoxb-
  • Verify the bot has the required scopes
  • Ensure the bot is installed in your workspace

"channel_not_found" error:

  • Verify the CHANNEL_ID is correct
  • Check that the bot has access to the channel
  • For private channels, ensure the bot was added

Rate limiting errors:

  • The script handles this automatically, but you can increase delays if needed
  • Check the rate-limit tier listed in Slack's API documentation for the conversations.history and conversations.replies methods

Debug Mode

Add debug prints to see what's happening:

# In slack_channel_export.py, add:
print(f"Token: {SLACK_TOKEN[:10]}...")  # Shows first 10 chars
print(f"Channel: {CHANNEL_ID}")
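As an alternative to print statements, the standard `logging` module can show the same information, and setting the level to DEBUG also surfaces slack_sdk's own debug logs (the library logs through `logging`). A small sketch; `mask_token` is a hypothetical helper that reveals only the `xoxb-` prefix instead of the first 10 characters.

```python
import logging


def mask_token(token: str, visible: int = 5) -> str:
    """Show only the first `visible` characters of a secret."""
    if not token:
        return "<unset>"
    return token[:visible] + "*" * max(len(token) - visible, 0)


logging.basicConfig(level=logging.DEBUG)  # DEBUG also enables library debug output
log = logging.getLogger("slack_channel_export")
log.debug("Token: %s", mask_token("xoxb-1234567890-abcdef"))
```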

📊 Data Analysis

The exported data can be used for:

  • Message analytics and engagement metrics
  • Content analysis and sentiment analysis
  • User activity patterns and participation rates
  • Thread analysis and conversation flows
  • Data migration to other platforms
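As a starting point for activity metrics, the `user` and `ts` fields are enough to count messages per user and per day. A minimal sketch (`activity_summary` is a hypothetical name); `ts` is treated as Unix epoch seconds and bucketed by UTC date, and messages without a `user` field (for example some bot posts) are counted under a placeholder key.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Dict, List


def activity_summary(messages: List[dict]) -> Dict[str, Counter]:
    """Count messages per user and per UTC calendar day."""
    by_user = Counter(m.get("user", "<no user>") for m in messages)
    by_day = Counter(
        datetime.fromtimestamp(float(m["ts"]), tz=timezone.utc).date().isoformat()
        for m in messages if "ts" in m
    )
    return {"by_user": by_user, "by_day": by_day}


# Example (path assumes a channel ID of C1234567890):
# import json
# with open("data/C1234567890.jsonl", encoding="utf-8") as fh:
#     summary = activity_summary([json.loads(line) for line in fh if line.strip()])
# print(summary["by_user"].most_common(10))
```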

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👨‍💻 Author

Oskar Kościanski

🙏 Acknowledgments

  • Built with slack-sdk
  • Uses python-dotenv for environment management
  • Inspired by the need for better Slack data export tools

⚠️ Important: Always respect Slack's Terms of Service and API usage guidelines when using this tool.
