This project processes conversation data exported from ChatGPT (in JSON format) and converts it into individual Markdown files. The application is designed to work with modern ChatGPT export formats and organizes the output by default into a structured folder hierarchy based on year and month. Each Markdown file is prefixed with a date-time stamp in the format YYMMDD-HHMM
followed by the sanitized conversation title, making it easy to track when each conversation occurred.
You can export your ChatGPT data by following the official OpenAI documentation:
How do I export my ChatGPT history and data?
This repository starts with a single commit to provide a clean, focused version for public sharing.
- Python 3.x (tested with Python 3.10+)
- Command line/terminal access
- Clone or download this repository to your local machine.
- Create your local configuration file:
cp config/config.py config/config_loc.py
- Edit
config/config_loc.py
to set your specific paths:INPUT_FOLDER = "/path/to/your/input/folder" ARCHIVE_DIR = "/path/to/your/output/folder"
- No additional libraries are required since the script uses only the Python Standard Library.
To run the script with the default settings (which use the folder hierarchy):
python main.py
This will:
- Process the conversations JSON file from the configured INPUT_FOLDER
- Output Markdown files to ARCHIVE_DIR organized by year and month
- Create/update a CSV index file (
_index.csv
) in the output directory - Skip unchanged conversations based on content hash comparison
If you prefer to have all Markdown files output into a single folder without the date-based hierarchy, run:
python main.py --no-folders
If you haven't created config/config_loc.py
, the script will use default relative paths:
- Input:
./input/conversations.json
- Output:
./output/
This allows immediate testing with sample data placed in an input
folder.
By default, the output will be organized as follows:
output/
├── _index.csv # CSV index of all conversations
├── 2023/
│ ├── 230300/ # Month folder (YYMM00)
│ │ ├── 230215-1030 Conversation Title.md
│ │ ├── 230220-1405 Another Conversation.md
├── 2024/
├── 240100/
│ ├── 240105-0915 Yet Another Chat.md
With the --no-folders
option, all Markdown files will be placed directly in the output directory.
main.py
: The main script that orchestrates the conversion processparser.py
: Handles parsing the conversation structure from JSONchat_processor.py
: Contains functions for analyzing conversations and metadatautils.py
: Utility functions for file handling and CSV operationsconfig/
: Configuration files (createconfig_loc.py
for your paths)README.md
: This documentation file
- Converts ChatGPT conversations to Markdown with proper formatting
- Detects and properly formats web search results and code execution outputs
- Identifies voice conversations and includes audio duration information
- Maintains a CSV index with metadata about each conversation
- Prevents duplicate processing by tracking content hashes
- Provides summary statistics about conversation types and content
When you run the script, it provides an analysis summary in the terminal that looks like this:
=== Analysis Summary ===
Total Role Counts:
system: 745
user: 17475
tool: 3126
assistant: 20947
Total Tool Types:
unknown: 19
Total Assistant JSON Instances: 0
This summary provides:
- Role Counts: How many messages from each participant type (system, user, assistant, tool)
- Tool Types: Counts of different tool types used in conversations
- "unknown" tool types are those not explicitly handled by the script (currently only "web_search" and "code_interpreter" have specific handlers)
- Assistant JSON Instances: Count of JSON data found in assistant messages
The script uses content hashing to avoid reprocessing unchanged conversations. When a conversation is unchanged, you'll see output like:
Skipped: /path/to/output/2023/230100/230131-1731 Example Conversation.md (unchanged)
This efficiency feature helps when repeatedly processing large export files.
This project is open source and available under the MIT License.
This script is a modified version of the gavi/chatgpt-markdown repository. Proper attribution is given to the original author, and modifications have been made to support configurable input/output directories and to implement a folder hierarchy based on year and month.
For more details, please refer to the LICENSE file.