
Commit b032d5b

Update README.md

1 parent 8ba6bc4 commit b032d5b

File tree: 1 file changed

README.md

Lines changed: 70 additions & 5 deletions
@@ -23,6 +23,11 @@ One can use `CEDARScript` to concisely and unambiguously represent code modifica

IDEs can store the local history of files in CEDARScript format, and this can also be used for searches.

### Other Ideas to Explore
- Code review systems for automated, in-depth code assessments
- Automated code documentation and explanation tools
- ...

## Key Features:

- **SQL-like syntax** for intuitive code querying and manipulation;
@@ -32,14 +37,14 @@ IDEs can store the local history of files in CEDARScript format, and this can al
- Avoids wasted time and tokens on failed search/replace operations caused by misplaced spaces, indentations or typos;
- **High-level abstractions** for complex refactoring operations via refactoring languages (currently supports Rope syntax);
- **[Relative indentation](grammar.js#L301-L366)** for easily maintaining proper code structure;
- Allows fetching or modifying targeted parts of code;
- **Locations in code**: Doesn't use line numbers. Instead, offers [more resilient alternatives](grammar.js#L241-L297), like:
  - **[Line](grammar.js#L243-L246)** markers. Ex:
    - `LINE "if name == 'some name':"`
  - **[Identifier](grammar.js#L248-L251)** markers (`VARIABLE`, `FUNCTION`, `CLASS`). Ex:
    - `FUNCTION 'my_function'`
- **Language-agnostic design** for versatile code analysis
- **[Code analysis operations](grammar.js#L192-L219)** return results in XML format for easier parsing and processing by LLM (Large Language Model) systems.

## Examples

@@ -82,8 +87,67 @@ FROM ONBOARDING

# Future Work

1. Create a browser extension that allows web-chat interfaces of Large Language Models to tackle larger file changes;
2. Select a model to fine-tune so that it natively understands `CEDARScript`;
3. Provide language extensions that will improve how LLMs interact with other resource types;
4. Explore using it as an **LLM-Tool Interface**.

## CEDARScript Browser Extension for LLM Web Interfaces

<details>
<summary>As Large Language Models (LLMs) become increasingly accessible through web-based chat interfaces, there's a growing need to enhance their ability to handle larger codebases and complex file changes. We propose developing a browser extension that leverages CEDARScript to bridge this gap.</summary>

- **Seamless Integration**: The extension would integrate with popular LLM web interfaces (e.g., ChatGPT, Claude, Gemini) by leveraging [llm-context.py](https://github.com/cyberchitta/llm-context.py), allowing users to work with larger files and codebases directly within these platforms.

- **CEDARScript Translation**: The changes proposed by the LLM would be concisely expressed as `CEDARScript` commands, enabling more efficient token usage.

- **Local File System Access**: The extension could securely access the user's local file system, allowing for direct manipulation of code files based on `CEDARScript` instructions generated by the LLM.

- **Diff Visualization**: Changes proposed by the LLM would be presented as standard diffs _or_ as `CEDARScript` code, allowing users to review and approve modifications before applying them to their codebase.

- **Context Preservation**: The extension would maintain context across chat sessions, enabling long-running refactoring tasks that span multiple interactions.

This browser extension would expand the capabilities of web-based LLM interfaces, allowing developers to leverage these powerful AI tools for more substantial code modification and analysis tasks. By using CEDARScript as an intermediary language, the extension would ensure efficient and accurate communication between the user, the LLM, and the local codebase. A rough sketch of the review-and-apply flow appears right after this section.

</details>
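
The review-and-apply flow described above can be illustrated with a minimal, hypothetical Python sketch. The `apply_cedarscript_commands` helper is a placeholder (this project does not define such an API yet); only the diff preview via Python's standard `difflib` is concrete:

```python
import difflib
from pathlib import Path

def preview_changes(path: str, proposed_text: str) -> str:
    """Render a unified diff between the file on disk and the proposed text,
    so the user can review changes before anything is written back."""
    original = Path(path).read_text().splitlines(keepends=True)
    proposed = proposed_text.splitlines(keepends=True)
    return "".join(difflib.unified_diff(
        original, proposed, fromfile=f"a/{path}", tofile=f"b/{path}"))

def apply_cedarscript_commands(source: str, commands: str) -> str:
    """Hypothetical stand-in for a CEDARScript runtime: it would parse the
    commands emitted by the LLM and return the transformed source text."""
    raise NotImplementedError("wire this to an actual CEDARScript executor")

def review_and_apply(path: str, llm_commands: str) -> None:
    # Flow: LLM chat -> CEDARScript commands -> local preview -> approval -> write.
    source = Path(path).read_text()
    proposed = apply_cedarscript_commands(source, llm_commands)
    print(preview_changes(path, proposed))           # "Diff Visualization"
    if input("Apply these changes? [y/N] ").lower() == "y":
        Path(path).write_text(proposed)              # "Local File System Access"
```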

## Fine-tuning a Model for Native CEDARScript Understanding

<details>

<summary>This initiative could enhance the efficiency and effectiveness of AI-assisted code analysis and transformation.</summary>

### Why Fine-tune?

1. **Improved Accuracy**: A fine-tuned model will have a deeper understanding of CEDARScript syntax and semantics, leading to more accurate code analysis and generation.
2. **Efficiency**: Native understanding of CEDARScript will reduce the need for extensive prompting.
3. **Consistency**: A model trained specifically on CEDARScript will produce more consistent and idiomatic output, adhering closely to the language's conventions and best practices.
4. **Extended Capabilities**: Fine-tuning could enable the model to perform more complex CEDARScript operations and understand nuanced aspects of the language that general-purpose models might miss.

### Approach

1. **Model Selection**: We will evaluate various state-of-the-art language models to determine the most suitable base model for fine-tuning. Factors such as model size, pre-training data, and architectural features will be considered.
2. **Dataset Creation**: A comprehensive dataset of CEDARScript examples, covering a wide range of use cases and complexities, will be created. This dataset will include both CEDARScript commands and their corresponding natural language descriptions or intentions (a sketch of one such training pair appears after this section).
3. **Fine-tuning Process**: The selected model will undergo fine-tuning using the created dataset. We'll experiment with different fine-tuning techniques, depending on the resources available and the desired outcome.
4. **Evaluation**: The fine-tuned model will be rigorously tested on a held-out test set to assess its performance in understanding and generating CEDARScript. Metrics such as accuracy, fluency, and task completion will be used.
5. **Iterative Improvement**: Based on the evaluation results, we'll iteratively refine the fine-tuning process, potentially adjusting the dataset, fine-tuning parameters, or even the base model selection.

</details>
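
To make the approach above more concrete, here is a minimal sketch of what a single training pair and a fine-tuning job submission could look like, using the OpenAI fine-tuning API linked under "See Also". The file name, model choice, and the placeholder assistant content are illustrative assumptions only:

```python
import json
from openai import OpenAI

# Step 2 (dataset creation): one chat-format training example per JSONL line.
training_pair = {
    "messages": [
        {"role": "system", "content": "You translate change requests into CEDARScript."},
        {"role": "user", "content": "Rename function 'calc' to 'calculate' in main.py."},
        # Illustrative placeholder -- real examples would contain the actual
        # CEDARScript command(s) taken from the curated dataset.
        {"role": "assistant", "content": "<CEDARScript command for the requested rename>"},
    ]
}
with open("cedarscript_train.jsonl", "w") as f:
    f.write(json.dumps(training_pair) + "\n")

# Step 3 (fine-tuning): upload the dataset and start a job.
client = OpenAI()
uploaded = client.files.create(file=open("cedarscript_train.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=uploaded.id, model="gpt-4o-mini-2024-07-18")
print(job.id)
```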

## LLM-Tool Interface

<details>

<summary>As Large Language Models continue to evolve and find applications in various real-world scenarios, there's a growing need for standardized ways for LLMs to interact with external tools and APIs. We envision `CEDARScript` as a potential solution to this challenge.</summary>

- **Standardized Tool Interaction**: `CEDARScript` could serve as an intermediary language between LLMs and various tools, providing a consistent, SQL-like syntax for expressing tool usage intentions.
- **Tool-Agnostic Commands**: By defining a set of generic commands that map to common tool functionalities, `CEDARScript` could simplify the process of integrating new tools and APIs.
- **Complex Tool Pipelines**: The language's SQL-like structure could allow for easy chaining of multiple tool operations, enabling more complex workflows.
- **Abstraction of API Complexity**: CEDARScript could hide the underlying complexity of diverse tool APIs behind a simpler, unified interface.

This approach could potentially enhance LLMs' ability to leverage external tools and capabilities, making it easier to deploy them in diverse real-world applications. Future work could explore the feasibility and implementation of this concept, aiming to create a more seamless integration between LLMs and the tools they use to interact with the world. A purely hypothetical sketch of such a dispatcher appears after this section.

</details>
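
For illustration only, a toy dispatcher along these lines might look as follows. The `CALL <tool> WITH <json args>` command shape and the registered tools are invented for this sketch and are not part of CEDARScript today:

```python
import json
from typing import Callable, Dict

# Registry of tools the LLM is allowed to invoke (stand-in implementations).
TOOLS: Dict[str, Callable[..., str]] = {
    "web_search": lambda query: f"results for {query!r}",
    "run_tests": lambda path=".": f"ran tests under {path}",
}

def dispatch(command: str) -> str:
    """Parse a command like 'CALL web_search WITH {"query": "..."}'
    and route it to the registered tool."""
    _, tool_name, _, raw_args = command.split(maxsplit=3)
    return TOOLS[tool_name](**json.loads(raw_args))

print(dispatch('CALL web_search WITH {"query": "CEDARScript grammar"}'))
```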

# Related

@@ -95,7 +159,8 @@ FROM ONBOARDING

# See Also
1. [OpenAI Fine-tuning](https://platform.openai.com/docs/guides/fine-tuning/common-use-cases)
2. [llm-context.py](https://github.com/cyberchitta/llm-context.py)

# Unrelated

1. [Cedar Policy Language](https://www.cedarpolicy.com/) (`CEDARScript` is _not_ a policy language)
