Commit 96b3a8e

ODAncona and SChattot authored
Major Refactor and New Features: Modularization, Enhanced CLI, and Logging (#20)
* added whitelist and blacklist
* added unit test
* update rules
* fail
* unit tests
* strong unit tests
* update main
* compilation
* update code
* extracted filter
* filter test implemented
* glob approach
* filter in progress
* filter test pass
* test filter good
* split into modules
* include ok
* logger
* docstring
* fixed the exclude pattern empty bug
* unit test done
* git test
* code cleaning
* README Updated
* cleanup
* debug cleaned
* template support
* improve api
* test template
* git test
* clean the code
* handling error properly
* cleaned
* documentation
* semver
* ver 2.0.0

---------

Co-authored-by: SChattot <stephane.chattot@heig-vd.ch>
1 parent ea2c0a9 commit 96b3a8e

13 files changed (+1426 -417 lines changed)

Cargo.toml

Lines changed: 13 additions & 2 deletions
@@ -1,7 +1,7 @@
 [package]
 name = "code2prompt"
-version = "1.1.0"
-authors = ["Mufeed VH <mufeed@lyminal.space>"]
+version = "2.0.0"
+authors = ["Mufeed VH <mufeed@lyminal.space>","Olivier D'Ancona <olivier.dancona@master.hes-so.ch>"]
 description = "A command-line (CLI) tool to generate an LLM prompt from codebases of any size, fast."
 keywords = ["code", "prompt", "llm", "gpt", "ai"]
 categories = ["command-line-utilities", "development-tools"]
@@ -17,6 +17,7 @@ edition = "2021"
 name = "code2prompt"
 test = false
 bench = false
+path = "src/main.rs"

 [dependencies]
 clap = { version = "4.0", features = ["derive"] }
@@ -32,6 +33,10 @@ anyhow = "1.0.80"
 inquire = "0.7.1"
 regex = "1.10.3"
 git2 = { version = "0.18.2", default_features = false, features = [ "https", "vendored-libgit2", "vendored-openssl" ] }
+glob = "0.3.1"
+once_cell = "1.19.0"
+log = "0.4"
+env_logger = "0.11.3"
 arboard = "3.4.0"

 [profile.release]
@@ -44,3 +49,9 @@ section = "utility"
 assets = [
     ["target/release/code2prompt", "/usr/bin/", "755"],
 ]
+
+[dev-dependencies]
+tempfile = "3.3"
+assert_cmd = "2.0"
+predicates = "2.0"
+env_logger = "0.11.3"

README.md

Lines changed: 72 additions & 93 deletions
@@ -1,19 +1,4 @@
-# `code2prompt`
-
-A CLI tool to convert your codebase into a single LLM prompt with source tree, prompt templating, and token counting.
-
----
-
-You can run this tool on the entire directory and it would generate a well-formatted Markdown prompt detailing the source tree structure, and all the code. You can then upload this document to either GPT or Claude models with higher context windows and ask it to:
-
-- Rewrite the code to another language.
-- Find bugs/security vulnerabilities.
-- Document the code.
-- Implement new features.
-
-You can customize the prompt template to achieve any of the desired use cases. It essentially traverses a codebase and creates a prompt with all source files combined. In short, it automates copy-pasting multiple source files into your prompt and formatting them along with letting you know how many tokens your code consumes.
-
-> I initially wrote this for personal use to utilize Claude 3.0's 200K context window and it has proven to be pretty useful so I decided to open-source it!
+# code2prompt

 [![crates.io](https://img.shields.io/crates/v/code2prompt.svg#cache1)](https://crates.io/crates/code2prompt)
 [![LICENSE](https://img.shields.io/github/license/mufeedvh/code2prompt.svg#cache1)](https://github.com/mufeedvh/code2prompt/blob/master/LICENSE)
@@ -22,125 +7,137 @@ You can customize the prompt template to achieve any of the desired use cases. I
 <a href="https://github.com/mufeedvh/code2prompt"><img src=".assets/code2prompt-screenshot.png" alt="code2prompt"></a>
 </h1>

+`code2prompt` is a command-line tool (CLI) that converts your codebase into a single LLM prompt with a source tree, prompt templating, and token counting.
+
 ## Table of Contents

-* [Features](#features)
-* [Installation](#installation)
-* [Usage](#usage)
-* [Templates](#templates)
-* [User Defined Variables](#user-defined-variables)
-* [Build From Source](#build-from-source)
-* [Contribution](#contribution)
-* [License](#license)
-* [Support The Author](#liked-the-project)
+- [Features](#features)
+- [Installation](#installation)
+- [Usage](#usage)
+- [Templates](#templates)
+- [User Defined Variables](#user-defined-variables)
+- [Tokenizers](#tokenizers)
+- [Build From Source](#build-from-source)
+- [Contribution](#contribution)
+- [License](#license)
+- [Support The Author](#support-the-author)

 ## Features

+You can run this tool on the entire directory and it would generate a well-formatted Markdown prompt detailing the source tree structure, and all the code. You can then upload this document to either GPT or Claude models with higher context windows and ask it to:
+
 - Quickly generate LLM prompts from codebases of any size.
 - Customize prompt generation with Handlebars templates. (See the [default template](src/default_template.hbs))
-- Follows `.gitignore`.
-- Filter and exclude files by extension.
-- Display the token count of the generated prompt. (See [Tokenizers](#Tokenizers) for more details)
-- Optionally include the Git diff output (staged files) in the generated prompt.
-- Copy the generated prompt to the clipboard on generation.
+- Respects `.gitignore`.
+- Filter and exclude files using glob patterns.
+- Display the token count of the generated prompt. (See [Tokenizers](#tokenizers) for more details)
+- Optionally include Git diff output (staged files) in the generated prompt.
+- Automatically copy the generated prompt to the clipboard.
 - Save the generated prompt to an output file.
 - Exclude files and folders by name or path.
 - Add line numbers to source code blocks.

+You can customize the prompt template to achieve any of the desired use cases. It essentially traverses a codebase and creates a prompt with all source files combined. In short, it automates copy-pasting multiple source files into your prompt and formatting them along with letting you know how many tokens your code consumes.
+
 ## Installation

+### Latest Release
+
 Download the latest binary for your OS from [Releases](https://github.com/mufeedvh/code2prompt/releases) OR install with `cargo`:

-```
+```sh
 cargo install code2prompt
 ```

 For unpublished builds:

-```
+```sh
 cargo install --git https://github.com/mufeedvh/code2prompt
 ```

-## Usage
-
-
-Generate a prompt from a codebase directory:
+### Prerequisites

-```
-code2prompt path/to/codebase
-```
+For building `code2prompt` from source, you need to have these tools installed:

-Use a custom Handlebars template file:
+- [Git](https://git-scm.org/downloads)
+- [Rust](https://rust-lang.org/tools/install)
+- Cargo (Automatically installed when installing Rust)

-```
-code2prompt path/to/codebase -t path/to/template.hbs
+```sh
+git clone https://github.com/mufeedvh/code2prompt.git
+cd code2prompt/
+cargo build --release
 ```

-Filter files by extension:
+The first command clones the `code2prompt` repository to your local machine. The next two commands change into the `code2prompt` directory and build it in release mode.

-```
-code2prompt path/to/codebase -f rs,toml
-```
+## Usage

-Exclude files by extension:
+Generate a prompt from a codebase directory:

-```
-code2prompt path/to/codebase -e txt,md
+```sh
+code2prompt path/to/codebase
 ```

-Exclude files by name:
+Use a custom Handlebars template file:

-```
-code2prompt path/to/codebase --exclude-files "file1.txt,file2.txt"
+```sh
+code2prompt path/to/codebase -t path/to/template.hbs
 ```

-Exclude files by folder/directory path:
+Filter files using glob patterns:

-```
-code2prompt path/to/codebase --exclude-folders "tests,docs"
+```sh
+code2prompt path/to/codebase --include="*.rs,*.toml"
 ```

-Use relative paths instead of absolute paths:
+Exclude files using glob patterns:

-```
-code2prompt path/to/codebase --relative-paths
+```sh
+code2prompt path/to/codebase --exclude="*.txt,*.md"
 ```

-Display token count of the generated prompt:
+Display the token count of the generated prompt:

-```
+```sh
 code2prompt path/to/codebase --tokens
 ```

-Specify tokenizer for token count:
+Specify a tokenizer for token count:

+```sh
+code2prompt path/to/codebase --tokens --encoding=p50k
 ```
-code2prompt path/to/codebase --tokens --encoding p50k
-```
-
-Supported tokenizers: `cl100k`, `p50k`, `p50k_edit`, `r50k_base`.

+Supported tokenizers: `cl100k`, `p50k`, `p50k_edit`, `r50k_bas`.
 > [!NOTE]
-> See [Tokenizers](#Tokenizers) for more details.
+> See [Tokenizers](#tokenizers) for more details.

 Save the generated prompt to an output file:

-```
-code2prompt path/to/codebase -o output.txt
+```sh
+code2prompt path/to/codebase --output=output.txt
 ```

-Generate git commit message (for staged files):
+Generate a Git commit message (for staged files):

-```
-code2prompt path/to/codebase --diff -t "templates/write-git-commit.hbs"
+```sh
+code2prompt path/to/codebase --diff -t templates/write-git-commit.hbs
 ```

 Add line numbers to source code blocks:

-```
+```sh
 code2prompt path/to/codebase --line-number
 ```

+- Rewrite the code to another language.
+- Find bugs/security vulnerabilities.
+- Document the code.
+- Implement new features.
+
+> I initially wrote this for personal use to utilize Claude 3.0's 200K context window and it has proven to be pretty useful so I decided to open-source it!
+
 ## Templates

 `code2prompt` comes with a set of built-in templates for common use cases. You can find them in the [`templates`](templates) directory.
@@ -153,7 +150,7 @@ Use this template to generate prompts for documenting the code. It will add docu

 Use this template to generate prompts for finding potential security vulnerabilities in the codebase. It will look for common security issues and provide recommendations on how to fix or mitigate them.

-### [`clean-up-code.hbs`](templates/clean-up-code.hbs)
+### [`clean-up-code.hbs`](templates/clean-up-code.hbs)

 Use this template to generate prompts for cleaning up and improving the code quality. It will look for opportunities to improve readability, adherence to best practices, efficiency, error handling, and more.

@@ -175,7 +172,7 @@ Use this template to generate prompts for improving the performance of the codeb

 You can use these templates by passing the `-t` flag followed by the path to the template file. For example:

-```
+```sh
 code2prompt path/to/codebase -t templates/document-the-code.hbs
 ```

@@ -206,24 +203,6 @@ For more context on the different tokenizers, see the [OpenAI Cookbook](https://

 `code2prompt` makes it easy to generate prompts for LLMs from your codebase. It traverses the directory, builds a tree structure, and collects information about each file. You can customize the prompt generation using Handlebars templates. The generated prompt is automatically copied to your clipboard and can also be saved to an output file. `code2prompt` helps streamline the process of creating LLM prompts for code analysis, generation, and other tasks.

-## Build From Source
-
-### Prerequisites
-
-For building `code2prompt` from source, you need to have these tools installed:
-
-* [Git](https://git-scm.org/downloads)
-* [Rust](https://rust-lang.org/tools/install)
-* Cargo (Automatically installed when installing Rust)
-
-```
-$ git clone https://github.com/mufeedvh/code2prompt.git
-$ cd code2prompt/
-$ cargo build --release
-```
-
-The first command clones the `code2prompt` repository to your local machine. The next two commands change into the `code2prompt` directory and build it in release mode.
-
 ## Contribution

 Ways to contribute:
@@ -240,4 +219,4 @@ Licensed under the MIT License, see <a href="https://github.com/mufeedvh/code2pr

 ## Liked the project?

-If you liked the project and found it useful, please give it a :star: and consider supporting the author!
+If you liked the project and found it useful, please give it a :star: and consider supporting the authors!

src/filter.rs

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+//! This module contains the logic for filtering files based on include and exclude patterns.
+
+use colored::*;
+use glob::Pattern;
+use log::{debug, error};
+use std::fs;
+use std::path::Path;
+
+/// Determines whether a file should be included based on include and exclude patterns.
+///
+/// # Arguments
+///
+/// * `path` - The path to the file to be checked.
+/// * `include_patterns` - A slice of strings representing the include patterns.
+/// * `exclude_patterns` - A slice of strings representing the exclude patterns.
+/// * `include_priority` - A boolean indicating whether to give priority to include patterns if both include and exclude patterns match.
+///
+/// # Returns
+///
+/// * `bool` - `true` if the file should be included, `false` otherwise.
+pub fn should_include_file(
+    path: &Path,
+    include_patterns: &[String],
+    exclude_patterns: &[String],
+    include_priority: bool,
+) -> bool {
+    // ~~~ Clean path ~~~
+    let canonical_path = match fs::canonicalize(path) {
+        Ok(path) => path,
+        Err(e) => {
+            error!("Failed to canonicalize path: {}", e);
+            return false;
+        }
+    };
+    let path_str = canonical_path.to_str().unwrap();
+
+    // ~~~ Check glob patterns ~~~
+    let included = include_patterns
+        .iter()
+        .any(|pattern| Pattern::new(pattern).unwrap().matches(path_str));
+    let excluded = exclude_patterns
+        .iter()
+        .any(|pattern| Pattern::new(pattern).unwrap().matches(path_str));
+
+    // ~~~ Decision ~~~
+    let result = match (included, excluded) {
+        (true, true) => include_priority, // If both include and exclude patterns match, use the include_priority flag
+        (true, false) => true,            // If the path is included and not excluded, include it
+        (false, true) => false,           // If the path is excluded, exclude it
+        (false, false) => include_patterns.is_empty(), // If no include patterns are provided, include everything
+    };
+
+    debug!(
+        "Checking path: {:?}, {}: {}, {}: {}, decision: {}",
+        path_str,
+        "included".bold().green(),
+        included,
+        "excluded".bold().red(),
+        excluded,
+        result
+    );
+    result
+}
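
Review note: the `match (included, excluded)` table in `should_include_file` can be read in isolation from the glob matching. The sketch below is a hypothetical, standard-library-only helper (the name `decide` and the `no_include_patterns` parameter are illustrative and not part of this commit) that mirrors the four arms so the edge cases are easy to check. It assumes `included` is always `false` when no include patterns were given, which follows from calling `any` on an empty iterator.

```rust
/// Hypothetical stand-in for the decision step of `should_include_file`.
/// `no_include_patterns` plays the role of `include_patterns.is_empty()`.
fn decide(
    included: bool,
    excluded: bool,
    include_priority: bool,
    no_include_patterns: bool,
) -> bool {
    match (included, excluded) {
        // Both lists match: the priority flag breaks the tie.
        (true, true) => include_priority,
        // Matched an include pattern only.
        (true, false) => true,
        // Matched an exclude pattern only.
        (false, true) => false,
        // Matched nothing: keep the file only when no include list was given.
        (false, false) => no_include_patterns,
    }
}

fn main() {
    // No patterns at all: every file is kept.
    assert!(decide(false, false, false, true));
    // Include list given, file matches none of it: dropped.
    assert!(!decide(false, false, false, false));
    // Exclude-only configuration, file matches an exclude pattern: dropped.
    assert!(!decide(false, true, false, true));
    // Conflict between the two lists: resolved by `include_priority`.
    assert!(decide(true, true, true, false));
    assert!(!decide(true, true, false, false));
    println!("decision table behaves as described");
}
```

Read this way, an exclude-only invocation keeps everything that does not match an exclude pattern, and `include_priority` only matters for paths matched by both lists.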
