CLI & Python tool to serialize Git repos into structured text with code, structure, and filtering options.

marutilai/repo-serializer

Repo Serializer

Repo Serializer is a Python utility that serializes a local Git repository into a single structured text file. The output captures the directory structure (as an ASCII tree), file names, and the contents of source files, giving a comprehensive snapshot of a repository for code review, documentation, or use with large language models (LLMs).
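
To give a feel for what an ASCII directory structure looks like, here is a minimal sketch of one way to render a tree. It is not Repo Serializer's actual implementation: the `ascii_tree` helper, the connector characters, and the rule of skipping dot-prefixed entries are all assumptions made for illustration.

```python
import os


def ascii_tree(root, prefix=""):
    """Return the lines of an ASCII tree for `root` (hypothetical helper)."""
    # Assumption: hidden entries (names starting with ".") are left out.
    entries = sorted(e for e in os.listdir(root) if not e.startswith("."))
    lines = []
    for i, name in enumerate(entries):
        last = i == len(entries) - 1
        # The last entry at each level gets a closing connector.
        lines.append(prefix + ("`-- " if last else "|-- ") + name)
        path = os.path.join(root, name)
        if os.path.isdir(path):
            # Children are indented; a vertical bar continues open levels.
            child_prefix = prefix + ("    " if last else "|   ")
            lines.extend(ascii_tree(path, child_prefix))
    return lines
```

Calling `print("\n".join(ascii_tree("/path/to/repository")))` would print one line per file or directory, with nesting shown by indentation.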

Installation

# Install from PyPI
pip install repo-serializer

# For development
pip install "repo-serializer[dev]"

Usage

Command Line

# Basic usage
repo-serializer /path/to/repository

# Specify output file
repo-serializer /path/to/repository -o output.txt

# Copy to clipboard in addition to saving to file
repo-serializer /path/to/repository -c

# Structure-only mode: output the directory structure and filenames, without file contents
repo-serializer /path/to/repository -s

# Skip specific directories (can be used multiple times or as a comma-separated list)
repo-serializer /path/to/repository --skip-dir build,dist
repo-serializer /path/to/repository --skip-dir build --skip-dir dist

# Only include Python files (.py, .ipynb)
repo-serializer /path/to/repository --python

# Only include JavaScript/TypeScript files
repo-serializer /path/to/repository --javascript

# Combine with other options
repo-serializer /path/to/repository --python -s -c  # Python files, structure only, copy to clipboard

# Extract AI/LLM prompts from the repository
repo-serializer /path/to/repository -p
repo-serializer /path/to/repository --prompt -o prompts.txt
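
The two `--skip-dir` forms shown above (repeated flag or comma-separated list) can be accepted by a single option. The sketch below shows one common way to do that with `argparse`; it is an illustration under assumptions, not the tool's actual argument parser, and `parse_skip_dirs` is a hypothetical name.

```python
import argparse


def parse_skip_dirs(argv):
    """Collect --skip-dir values given repeatedly or comma-separated."""
    parser = argparse.ArgumentParser(prog="repo-serializer-sketch")
    parser.add_argument("repo_path")
    # action="append" gathers every occurrence of the flag into a list.
    parser.add_argument("--skip-dir", action="append", default=[], dest="skip_dir")
    args = parser.parse_args(argv)

    dirs = []
    for value in args.skip_dir:
        # Each occurrence may itself hold several comma-separated names.
        dirs.extend(part.strip() for part in value.split(",") if part.strip())
    return dirs
```

Both `["repo", "--skip-dir", "build,dist"]` and `["repo", "--skip-dir", "build", "--skip-dir", "dist"]` yield the same list of directory names.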

Python API

from repo_serializer import serialize

# Serialize a repository, skipping specific directories
serialize("/path/to/repository", "output.txt", skip_dirs=["build", "dist"])

Features

  • Directory Structure: Clearly visualize repository structure in ASCII format.
  • Structure-Only Mode: Option to output only the directory structure and filenames without file contents.
  • File Filtering: Excludes common binary files, cache directories, hidden files, and irrelevant artifacts to keep outputs concise and focused.
  • Smart Content Handling:
    • Parses Jupyter notebooks to extract markdown and code cells with sample outputs
    • Limits CSV files to first 5 lines
    • Truncates large text files after 1000 lines
    • Handles non-UTF-8 and binary files gracefully
  • Extensive Filtering: Skips common configuration files, build artifacts, test directories, and more.
  • Clipboard Integration: Option to copy output directly to clipboard.
  • Prompt Extraction: Extract AI/LLM prompts from various sources:
    • Inline strings in Python/JavaScript code near LLM API calls (OpenAI, Anthropic, etc.)
    • Standalone prompt files (.prompt.txt, .prompt.md, etc.)
    • YAML/JSON configuration files with prompt definitions
    • Jupyter notebooks containing prompts
    • Markdown files in prompt-related directories
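
As an illustration of the line limits listed under Smart Content Handling (5 lines for CSV files, 1000 lines for other text files), the following sketch shows how such truncation could work. The `preview_lines` function and the wording of the truncation marker are assumptions; the real tool may mark truncated content differently.

```python
def preview_lines(text, filename, csv_limit=5, text_limit=1000):
    """Return the lines of `text`, cut off at a per-file-type limit."""
    # CSV files get the short limit; everything else gets the long one.
    limit = csv_limit if filename.lower().endswith(".csv") else text_limit
    lines = text.splitlines()
    if len(lines) <= limit:
        return lines
    omitted = len(lines) - limit
    # Assumption: a marker line notes how much content was dropped.
    return lines[:limit] + [f"... [{omitted} more lines truncated]"]
```

For example, an 8-line CSV comes back as its first 5 lines followed by one marker line, while a 1000-line text file is returned unchanged.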

Example

# Create a serialized snapshot of your project
repo-serializer /Users/example_user/projects/my_repo -o repo_snapshot.txt

Prompt Extraction

The prompt extraction feature (-p or --prompt) helps you analyze and audit AI/LLM prompts used throughout your codebase. This is particularly useful for:

  • Reviewing prompt consistency and quality
  • Refactoring duplicate prompts
  • Creating prompt libraries
  • Documenting AI behavior
  • Security auditing of prompts

What Gets Extracted

  1. Inline Prompts in Code

    • Strings near OpenAI, Anthropic, and other LLM API calls
    • Multi-line strings assigned to prompt-related variables
    • Template literals in JavaScript/TypeScript
  2. Standalone Prompt Files

    • Files with extensions like .prompt.txt, .prompt.md
    • Files in prompts/ or similar directories
    • YAML/JSON files with prompt configurations
  3. Configuration Files

    • YAML/JSON files with keys like prompt, system_prompt, instructions
    • Nested prompt definitions in configuration objects
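
The configuration-file case can be pictured as a recursive walk over parsed data that looks for prompt-like keys. The sketch below uses JSON from the standard library and the three key names mentioned above; `find_prompts`, `PROMPT_KEYS`, and the dotted-path format are hypothetical and not taken from the tool's source.

```python
import json

# Key names taken from the list above; the real tool may check more.
PROMPT_KEYS = {"prompt", "system_prompt", "instructions"}


def find_prompts(node, path=""):
    """Return (path, text) pairs for prompt-like string values in `node`."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else key
            if key in PROMPT_KEYS and isinstance(value, str):
                found.append((child, value))
            else:
                # Keep descending so nested definitions are found too.
                found.extend(find_prompts(value, child))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            found.extend(find_prompts(item, f"{path}[{i}]"))
    return found


config = json.loads(
    '{"agent": {"system_prompt": "You are a code reviewer.",'
    ' "tools": [{"instructions": "Be brief."}]}, "name": "demo"}'
)
prompts = find_prompts(config)
```

Here `prompts` holds one entry per matching key, each paired with a dotted path such as `agent.system_prompt` that shows where the prompt sits in the configuration.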

Example Output

Found 5 prompts in the repository:
================================================================================

File: src/ai_agent.py
--------------------------------------------------------------------------------

Line 15 (inline_string) - Near LLM API call:

You are an expert Python developer. Your task is to help users write clean, efficient, and well-documented Python code. Follow these guidelines:

  1. Always use proper error handling
  2. Write comprehensive docstrings
  3. Follow PEP 8 style guidelines

File: config/prompts.yaml
--------------------------------------------------------------------------------

Line 3 (config_file) - Key: system:

You are a code reviewer. Analyze the provided code for:

  • Security vulnerabilities
  • Performance issues
  • Code quality and maintainability

Contributing

Pull requests and improvements are welcome! Please ensure your contributions are clearly documented and tested.

Development

Quick Testing

For quick testing during development:

# Install in development mode
pip install -e .

# Now any changes to the source code take effect immediately
repo-serializer /path/to/test/repo -o test_output.txt

Full Test Suite

Run the test script:

./dev/test_dev.py

This will:

  1. Install the package in development mode
  2. Run multiple test scenarios
  3. Generate test outputs for review
