the-provost/combiner


Codebase Consolidator

Overview

This Bash script helps prepare large codebases for AI analysis by consolidating files into a single text file. It's specifically designed to optimize token usage when feeding code into AI assistants (like Claude, GPT, etc.) for codebase analysis and questions.

Purpose

When working with AI assistants on large code projects, token limitations can prevent uploading the entire codebase. This tool solves that problem by:

  • Including all text-based files with their content (code, markdown, config files, etc.)
  • Only listing media file paths without their binary content
  • Creating a single, well-formatted text file that provides the AI with a complete view of your codebase structure

Use Cases

Code Analysis & Documentation

  • Preparing complete codebase snapshots for AI-assisted code review with assistants like Claude or ChatGPT
  • Generating documentation snapshots of entire projects
  • Code auditing and analysis across multiple files
  • Creating backups of text-based project files

Content Management

  • Consolidating documentation spread across multiple files
  • Creating searchable archives of text content
  • Merging configuration files for analysis
  • Collecting logs or data files for processing

Development Workflows

  • Preparing codebases for LLM-assisted refactoring or debugging
  • Creating training datasets from code repositories
  • Generating comprehensive project overviews for new team members
  • Archiving project states at specific milestones

Data Processing

  • Aggregating CSV, JSON, or other structured data files
  • Collecting configuration files for batch processing
  • Merging scattered text files into single documents

Installation

Make it globally available (Recommended)

  1. Create a local bin directory and move the script:

    mkdir -p ~/bin
    mv combiner.sh ~/bin/combiner
    chmod +x ~/bin/combiner
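  2. Add ~/bin to your PATH if it is not already there. The commands below assume Bash; zsh users (zsh is the default shell on recent macOS) should use ~/.zshrc instead: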

    echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc
    source ~/.bashrc
  3. Now you can use combiner from anywhere:

    combiner /path/to/your/project

Alternative: System-wide installation

sudo mv combiner.sh /usr/local/bin/combiner
sudo chmod +x /usr/local/bin/combiner

Alternative: Shell alias

echo "alias combiner='/path/to/your/combiner.sh'" >> ~/.bashrc
source ~/.bashrc
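Running step 2 or the shell-alias command more than once appends the same line to ~/.bashrc each time. A minimal sketch of a guard, using a hypothetical ensure_line helper (it is not part of combiner), that appends a line only when that exact line is not already present:

```shell
#!/usr/bin/env bash
# ensure_line: hypothetical helper (not part of combiner) that appends a line
# to a file only when that exact line is not already present.
ensure_line() {
  local file="$1" line="$2"
  touch "$file"
  # -x: match the whole line, -F: treat the pattern as a fixed string
  if ! grep -qxF -- "$line" "$file"; then
    # Start on a fresh line if the file does not end with a newline.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
      printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
  fi
}
```

Usage: ensure_line ~/.bashrc 'export PATH="$HOME/bin:$PATH"' followed by source ~/.bashrc. Afterwards, command -v combiner should print the installed path.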

Usage

combiner <directory>

Example:

combiner ~/projects/my-web-app

How It Works

  1. Recursively traverses the specified directory
  2. Automatically excludes common directories that waste tokens:
    • Node.js: node_modules, .npm, debug logs
    • Python: __pycache__, venv, build, dist, *.egg-info
    • Laravel/PHP: vendor, storage/logs, bootstrap/cache
    • General: .git, .vscode, .idea, cache folders, build artifacts
  3. For code and text files: Includes both file path and complete content
  4. For media files (gif, jpg, jpeg, png, psd, svg, eps): Includes only the file path to save tokens
  5. Creates a single output file (combined_files.txt) containing only your project's own files, with dependencies and build artifacts left out
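The steps above can be sketched with find. This is a minimal illustration, not the code in combiner.sh: the function name consolidate_sketch, the "=== path ===" separator, and the exact exclusion list are assumptions chosen to mirror the list above. In this sketch every file outside the media list is read as text, so other binary files would be copied verbatim.

```shell
#!/usr/bin/env bash
# Minimal sketch of the traversal described above. NOT the code in combiner.sh:
# consolidate_sketch, the "=== path ===" separator and the exclusion list are
# illustrative assumptions.
set -euo pipefail

consolidate_sketch() {
  local root="$1" out="$2" file lower
  : > "$out"  # create or truncate the output file

  # Step 2: prune excluded directories; print every other regular file.
  find "$root" \
    \( -type d \( -name node_modules -o -name .npm -o -name __pycache__ \
                 -o -name venv -o -name build -o -name dist -o -name '*.egg-info' \
                 -o -name vendor -o -name .git -o -name .vscode -o -name .idea \
                 -o -path '*/storage/logs' -o -path '*/bootstrap/cache' \) -prune \) \
    -o -type f -print |
    sort |
    while IFS= read -r file; do  # paths containing newlines are not handled
      # Never read the output file into itself if it lives under $root.
      if [ "$file" -ef "$out" ]; then continue; fi
      lower=$(printf '%s' "$file" | tr '[:upper:]' '[:lower:]')
      case "$lower" in
        *.gif|*.jpg|*.jpeg|*.png|*.psd|*.svg|*.eps)
          # Step 4: media files contribute their path only.
          printf '=== %s (media, content omitted) ===\n\n' "$file" >> "$out" ;;
        *)
          # Step 3: all other files contribute their path plus full content.
          printf '=== %s ===\n' "$file" >> "$out"
          cat -- "$file" >> "$out"
          printf '\n\n' >> "$out" ;;
      esac
    done
}
```

Usage: consolidate_sketch ~/projects/my-web-app /tmp/combined_files.txt. Pruning with find means excluded directories are never descended into, which keeps the traversal fast even when node_modules or vendor is very large.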

Best Practices for AI Analysis

  • Run this script on your project's root directory
  • The script automatically filters out dependencies and build artifacts for optimal token usage
  • Upload the resulting combined_files.txt to your AI assistant
  • Ask specific questions about your code structure, implementation details, or potential improvements
  • Reference specific files or components in your questions
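Before uploading, it can help to check roughly how large the output is. The sketch below assumes the common rule of thumb of about four characters per token for English text; real tokenizers, and source code in particular, can deviate noticeably, so treat the result as an order-of-magnitude figure. estimate_tokens is a hypothetical helper, not part of combiner.

```shell
#!/usr/bin/env bash
# estimate_tokens: hypothetical helper (not part of combiner).
# Assumes ~4 characters per token; an order-of-magnitude estimate only.
estimate_tokens() {
  local bytes
  bytes=$(wc -c < "$1")         # byte count (equals characters for ASCII text)
  echo $(( (bytes + 3) / 4 ))   # divide by 4, rounding up
}
```

Usage: estimate_tokens combined_files.txt. If the estimate exceeds your assistant's context window, consider running combiner on a subdirectory instead.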

Example Workflow

  1. Run: combiner ~/projects/my-web-app
  2. Upload the generated combined_files.txt to your AI assistant
  3. Ask: "Can you explain how the authentication flow works across the codebase?"

Contributing

Contributions are welcome! Please:

  • Fork the repository
  • Create a feature branch
  • Submit a pull request with a clear description
  • Ensure your contributions are compatible with GPL v3

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

What this means:

  • ✅ You can use, modify, and distribute this software
  • ✅ You can use it commercially
  • ⚠️ Derivative works you distribute must also be licensed under GPL v3
  • ⚠️ You must include the license and copyright notice
  • ⚠️ You must disclose the source code of any distributed modifications

Notes

  • The script automatically excludes dependency folders (node_modules, vendor, etc.) and build artifacts to reduce token usage
  • For very large projects, you might want to run this on specific subdirectories
  • The script preserves file paths, making it easy for the AI to understand project structure
  • Binary and media files are just listed by path to prevent token waste on non-textual content
  • Output includes a summary showing how many directories were skipped for transparency
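A skipped-directory count like the one in the summary can be approximated with find's -prune. This sketch uses a hypothetical count_skipped_dirs helper and an abbreviated exclusion list; combiner's own list and summary format may differ. Because pruned directories are not descended into, a node_modules nested inside another node_modules is counted only once.

```shell
#!/usr/bin/env bash
# count_skipped_dirs: hypothetical helper, not part of combiner.
# Counts top-most directories matching an abbreviated, assumed exclusion list.
count_skipped_dirs() {
  find "$1" -type d \
    \( -name node_modules -o -name vendor -o -name .git -o -name __pycache__ \
       -o -name venv -o -name dist -o -name build \) \
    -prune -print | wc -l | tr -d ' '
}
```

Usage: count_skipped_dirs ~/projects/my-web-app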

About

Codebase Combiner to get a single text file of your whole codebase
