Commit b73614f

Move sections to the proposals folder
1 parent 54f41e5 commit b73614f

7 files changed

+189
-201
lines changed

README.md

Lines changed: 0 additions & 201 deletions
@@ -14,8 +14,6 @@
1414
- [Supported Languages](#supported-languages)
1515
- [How can CEDARScript be used](#how-can-cedarscript-be-used)
1616
- [Examples](#examples)
17-
- [Planned Features](#planned-features)
18-
- [Future Enhancements](#future-enhancements)
1917
- [Proposals](#proposals)
2018
- [Related](#related)
2119

@@ -230,207 +228,8 @@ UPDATE FILE "app/main.py" REPLACE FUNCTION "calculate_total" WITH ED '''
230228
''';
231229
```
232230

233-
234-
235231
There are [many more examples](test/corpus) to look at...
236232

237-
# Planned Features
238-
239-
## Onboarding Capabilities
240-
241-
This capability is designed to help developers, AI assistants, and other tools quickly gain a comprehensive understanding of a project's structure, conventions, and context.
242-
243-
### Key Onboarding Features
244-
<details>
245-
1. **Convention Discovery**:
246-
CEDARScript can automatically extract coding conventions from designated files like `CONVENTIONS.md`:
247-
248-
```sql
249-
SELECT CONVENTIONS
250-
FROM ONBOARDING;
251-
```
252-
253-
2. **Context Retrieval**:
254-
Quickly access project context from files like `.context.md` or `.contextdocs.md`:
255-
256-
```sql
257-
SELECT CONTEXT
258-
FROM ONBOARDING;
259-
```
260-
261-
3. **Comprehensive Project Overview**:
262-
Gather all essential project information in one query:
263-
264-
```sql
265-
SELECT *
266-
FROM ONBOARDING;
267-
```
268-
</details>
269-
270-
# Future Work
271-
272-
## Future Enhancements
273-
274-
<details>
275-
<summary>Ideas to explore:</summary>
276-
277-
- Automatic generation of project structure visualizations
278-
- Integration with version control history for context-aware onboarding
279-
- Customizable onboarding queries for specific project needs
280-
281-
282-
1. [Tree-Sitter query language](https://cycode.com/blog/tips-for-using-tree-sitter-queries/) integration, which could open up many possibilities;
283-
2. [Comby](https://github.com/comby-tools/comby) notation for an alternate syntax to express refactorings on code or data formats;
284-
3. Create a browser extension that allows web-chat interfaces of Large Language Models to tackle larger file changes;
285-
4. Select a model to fine-tune so that it natively understands `CEDARScript`;
286-
5. Provide language extensions that will improve how LLMs interact with other resource types;
287-
6. Explore using it as an **LLM-Tool Interface**;
288-
289-
## Tree-Sitter Query Language Integration
290-
291-
This could open up many possibilities, like:
292-
293-
### Advanced Code Analysis: provide statistics about functions in the project
294-
295-
```sql
296-
QUERY LANGUAGE 'tree-sitter'
297-
FROM PROJECT
298-
PATTERN '''
299-
(function_definition
300-
name: (identifier) @func_name
301-
parameters: (parameters) @params
302-
body: (block
303-
(return_statement) @return_stmt))
304-
'''
305-
WITH ANALYSIS
306-
COUNT @func_name AS "Total Functions"
307-
AVERAGE (LENGTH @params) AS "Avg Parameters"
308-
PERCENTAGE (IS_PRESENT @return_stmt) AS "Functions with Return";
309-
```
310-
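The statistics requested by the query above can be approximated for a single Python source string with the standard-library `ast` module. This is a swapped-in technique, not Tree-sitter and not a CEDARScript implementation; the function name `function_stats` and the returned dictionary layout are assumptions made for illustration.

```python
import ast


def function_stats(source):
    """Summarise the functions in one Python source string.

    Mirrors the three figures in the query: total functions, average
    parameter count, and the share of functions whose body directly
    contains a return statement.
    """
    functions = [
        node
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    if not functions:
        return {"Total Functions": 0, "Avg Parameters": 0.0, "Functions with Return": 0.0}

    # Count positional-only, regular, and keyword-only parameters (not *args/**kwargs).
    param_counts = [
        len(f.args.posonlyargs) + len(f.args.args) + len(f.args.kwonlyargs)
        for f in functions
    ]
    # Only a return that is a direct child of the function body counts,
    # matching the shape of the pattern in the query.
    with_return = sum(
        any(isinstance(stmt, ast.Return) for stmt in f.body) for f in functions
    )
    return {
        "Total Functions": len(functions),
        "Avg Parameters": sum(param_counts) / len(functions),
        "Functions with Return": 100.0 * with_return / len(functions),
    }
```

Unlike the proposed query, this sketch handles one language and one file at a time; covering a whole project would mean walking the file tree and merging the per-file results.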
311-
### Concisely modify all methods
312-
313-
Find all classes and their methods in Python files, then insert a print statement after each method definition:
314-
315-
```sql
316-
QUERY LANGUAGE 'tree-sitter'
317-
FROM PROJECT
318-
PATTERN '''
319-
(class_definition
320-
name: (identifier) @class_name
321-
body: (block
322-
(function_definition
323-
name: (identifier) @method_name)))
324-
'''
325-
WITH ACTIONS
326-
INSERT AFTER @method_name
327-
CONTENT '''
328-
@0: print("Method called:", @method_name)
329-
''';
330-
```
331-
332-
### Cross-language refactoring: replace all calls to "deprecated_function" across Python, JavaScript, and TypeScript files.
333-
334-
```sql
335-
QUERY LANGUAGE 'tree-sitter'
336-
FROM PROJECT
337-
LANGUAGES ["python", "javascript", "typescript"]
338-
PATTERN '''
339-
(call_expression
340-
function: (identifier) @func_name
341-
(#eq? @func_name "deprecated_function"))
342-
'''
343-
WITH ACTIONS
344-
REPLACE @func_name
345-
WITH CONTENT "new_function";
346-
```
347-
348-
### Custom Linting Rules:
349-
We can define project-specific linting rules using Tree-sitter queries:
350-
351-
```sql
352-
QUERY LANGUAGE 'tree-sitter'
353-
FROM PROJECT
354-
PATTERN '''
355-
(import_statement
356-
(dotted_name) @import_name
357-
(#match? @import_name "^(os|sys)$"))
358-
'''
359-
WITH LINT
360-
SEVERITY "WARNING"
361-
MESSAGE "Direct import of system modules discouraged. Use custom wrappers instead.";
362-
```
363-
364-
## Comby Notation
365-
366-
To replace 'failUnlessEqual' with 'assertEqual':
367-
```sql
368-
UPDATE PROJECT
369-
REFACTOR LANGUAGE "comby"
370-
WITH PATTERN '''
371-
comby 'failUnlessEqual(:[a],:[b])' 'assertEqual(:[a],:[b])' example.py
372-
'''
373-
```
374-
375-
## CEDARScript Browser Extension for LLM Web Interfaces
376-
377-
<details>
378-
<summary>As Large Language Models (LLMs) become increasingly accessible through web-based chat interfaces, there's a growing need to enhance their ability to handle larger codebases and complex file changes. We propose developing a browser extension that leverages CEDARScript to bridge this gap.</summary>
379-
380-
- **Seamless Integration**: The extension would integrate with popular LLM web interfaces (e.g., ChatGPT, Claude, Gemini) by leveraging [llm-context.py](https://github.com/cyberchitta/llm-context.py), allowing users to work with larger files and codebases directly within these platforms.
381-
382-
- **CEDARScript Translation**: The changes proposed by the LLM would be concisely expressed as `CEDARScript` commands, enabling more efficient token usage.
383-
384-
- **Local File System Access**: The extension could securely access the user's local file system, allowing for direct manipulation of code files based on `CEDARScript` instructions generated by the LLM.
385-
386-
- **Diff Visualization**: Changes proposed by the LLM would be presented as standard diffs _or_ as `CEDARScript` code, allowing users to review and approve modifications before applying them to their codebase.
387-
388-
- **Context Preservation**: The extension would maintain context across chat sessions, enabling long-running refactoring tasks that span multiple interactions.
389-
390-
This browser extension would expand the capabilities of web-based LLM interfaces, allowing developers to leverage these powerful AI tools for more substantial code modification and analysis tasks. By using CEDARScript as an intermediary language, the extension would ensure efficient and accurate communication between the user, the LLM, and the local codebase.
391-
392-
</details>
393-
394-
## Fine-tuning a Model for Native CEDARScript Understanding
395-
396-
<details>
397-
398-
<summary>This initiative could enhance the efficiency and effectiveness of AI-assisted code analysis and transformation.</summary>
399-
400-
### Why Fine-tune?
401-
402-
1. **Improved Accuracy**: A fine-tuned model will have a deeper understanding of CEDARScript syntax and semantics, leading to more accurate code analysis and generation.
403-
2. **Efficiency**: Native understanding of CEDARScript will reduce the need for extensive prompting.
404-
3. **Consistency**: A model trained specifically on CEDARScript will produce more consistent and idiomatic output, adhering closely to the language's conventions and best practices.
405-
4. **Extended Capabilities**: Fine-tuning could enable the model to perform more complex CEDARScript operations and understand nuanced aspects of the language that general-purpose models might miss.
406-
407-
### Approach
408-
409-
1. **Model Selection**: We will evaluate various state-of-the-art language models to determine the most suitable base model for fine-tuning. Factors such as model size, pre-training data, and architectural features will be considered.
410-
2. **Dataset Creation**: A comprehensive dataset of CEDARScript examples, covering a wide range of use cases and complexities, will be created. This dataset will include both CEDARScript commands and their corresponding natural language descriptions or intentions.
411-
3. **Fine-tuning Process**: The selected model will undergo fine-tuning using the created dataset. We'll experiment with different fine-tuning techniques, depending on the resources available and the desired outcome.
412-
4. **Evaluation**: The fine-tuned model will be rigorously tested on a held-out test set to assess its performance in understanding and generating CEDARScript. Metrics such as accuracy, fluency, and task completion will be used.
413-
5. **Iterative Improvement**: Based on the evaluation results, we'll iteratively refine the fine-tuning process, potentially adjusting the dataset, fine-tuning parameters, or even the base model selection.
414-
415-
</details>
416-
417-
## LLM-Tool Interface
418-
419-
<details>
420-
421-
<summary>As Large Language Models continue to evolve and find applications in various real-world scenarios, there's a growing need for standardized ways for LLMs to interact with external tools and APIs. We envision `CEDARScript` as a potential solution to this challenge.</summary>
422-
423-
- **Standardized Tool Interaction**: `CEDARScript` could serve as an intermediary language between LLMs and various tools, providing a consistent, SQL-like syntax for expressing tool usage intentions.
424-
- **Tool-Agnostic Commands**: By defining a set of generic commands that map to common tool functionalities, `CEDARScript` could simplify the process of integrating new tools and APIs.
425-
- **Complex Tool Pipelines**: The language's SQL-like structure could allow for easy chaining of multiple tool operations, enabling more complex workflows.
426-
- **Abstraction of API Complexity**: CEDARScript could hide the underlying complexity of diverse tool APIs behind a simpler, unified interface.
427-
428-
This approach could potentially enhance LLMs' ability to leverage external tools and capabilities, making it easier to deploy them in diverse real-world applications. Future work could explore the feasibility and implementation of this concept, aiming to create a more seamless integration between LLMs and the tools they use to interact with the world.
429-
430-
</details>
431-
432-
</details>
433-
434233
# Proposals
435234
See [current proposals](proposals/)
436235

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
# LLM-Tool Interface
2+
3+
As Large Language Models continue to evolve and find applications in various real-world scenarios, there's a growing need for standardized ways for LLMs to interact with external tools and APIs. We envision `CEDARScript` as a potential solution to this challenge.
4+
5+
- **Standardized Tool Interaction**: `CEDARScript` could serve as an intermediary language between LLMs and various tools, providing a consistent, SQL-like syntax for expressing tool usage intentions.
6+
- **Tool-Agnostic Commands**: By defining a set of generic commands that map to common tool functionalities, `CEDARScript` could simplify the process of integrating new tools and APIs.
7+
- **Complex Tool Pipelines**: The language's SQL-like structure could allow for easy chaining of multiple tool operations, enabling more complex workflows.
8+
- **Abstraction of API Complexity**: CEDARScript could hide the underlying complexity of diverse tool APIs behind a simpler, unified interface.
9+
10+
This approach could potentially enhance LLMs' ability to leverage external tools and capabilities,
11+
making it easier to deploy them in diverse real-world applications. Future work could explore the feasibility and
12+
implementation of this concept, aiming to create a more seamless integration between LLMs and the tools they use
13+
to interact with the world.
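
The tool-agnostic command and pipeline ideas above could be sketched roughly as follows. The proposal does not define any CEDARScript syntax for tools, so this sketch skips parsing entirely and works on tool names that are assumed to be already extracted; `TOOLS`, `tool`, `run_pipeline`, and the two sample tools are all hypothetical.

```python
# Hypothetical registry: maps a generic command name to a Python callable.
TOOLS = {}


def tool(name):
    """Decorator that registers a callable under a tool-agnostic name."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("upper")
def upper(text):
    # Stand-in for a real external tool.
    return text.upper()


@tool("word_count")
def word_count(text):
    # Stand-in for a second tool whose input is the previous tool's output.
    return len(text.split())


def run_pipeline(steps, value):
    """Feed `value` through each named tool in order and return the final result."""
    for name in steps:
        value = TOOLS[name](value)
    return value
```

Adding a tool means registering one more callable; the caller only ever sees the names, which is the kind of abstraction the proposal is after.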

proposals/README.md

Lines changed: 35 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,35 @@
1+
# Proposals
2+
3+
## Current Proposals
4+
1. [Onboarding Capabilities](onboarding-features)
5+
2. [Browser Extension](browser-extension)
6+
3. [Capture External Command Output](capture-external-command-output)
7+
4. [LLM Tool Interface](LLM%20Tool%20Interface)
8+
5. [Model Fine-Tuning](model-fine-tuning)
9+
6. [Tree-Sitter Query Language Integration](tree-sitter-query-language)
10+
11+
## Ideas to explore
12+
13+
- Automatic generation of project structure visualizations
14+
- Integration with version control history for context-aware onboarding
15+
- Customizable onboarding queries for specific project needs
16+
17+
18+
1. [Tree-Sitter query language](https://cycode.com/blog/tips-for-using-tree-sitter-queries/) integration, which could open up many possibilities;
19+
2. [Comby](https://github.com/comby-tools/comby) notation for an alternate syntax to express refactorings on code or data formats;
20+
3. Create a browser extension that allows web-chat interfaces of Large Language Models to tackle larger file changes;
21+
4. Select a model to fine-tune so that it natively understands `CEDARScript`;
22+
5. Provide language extensions that will improve how LLMs interact with other resource types;
23+
6. Explore using it as an **LLM-Tool Interface**;
24+
25+
## Comby Notation
26+
27+
To replace 'failUnlessEqual' with 'assertEqual':
28+
```sql
29+
UPDATE PROJECT
30+
REFACTOR LANGUAGE "comby"
31+
WITH PATTERN '''
32+
comby 'failUnlessEqual(:[a],:[b])' 'assertEqual(:[a],:[b])' example.py
33+
'''
34+
```
35+
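For comparison, the same rename can be approximated in plain Python with the standard-library `re` module. This is not Comby and not a CEDARScript implementation: it assumes neither argument contains commas or parentheses, whereas Comby's `:[a]` holes also match nested, balanced delimiters. The names `PATTERN` and `rewrite` are made up for this sketch.

```python
import re

# Approximates the Comby template 'failUnlessEqual(:[a],:[b])'.
# Assumption: arguments contain no commas or parentheses.
PATTERN = re.compile(r"failUnlessEqual\(([^(),]*),([^(),]*)\)")


def rewrite(source):
    """Replace failUnlessEqual(a, b) calls with assertEqual(a, b)."""
    return PATTERN.sub(r"assertEqual(\1,\2)", source)
```

Calls with nested arguments, such as `failUnlessEqual(f(x), 1)`, are left untouched by this regex, which is exactly the gap a structural matcher like Comby is meant to close.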

proposals/browser-extension/README.md

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
# CEDARScript Browser Extension for LLM Web Interfaces
2+
3+
As Large Language Models (LLMs) become increasingly accessible through web-based chat interfaces,
4+
there's a growing need to enhance their ability to handle larger codebases and complex file changes.
5+
We propose developing a browser extension that leverages CEDARScript to bridge this gap.
6+
7+
- **Seamless Integration**: The extension would integrate with popular LLM web interfaces (e.g., ChatGPT, Claude, Gemini) by leveraging [llm-context.py](https://github.com/cyberchitta/llm-context.py), allowing users to work with larger files and codebases directly within these platforms.
8+
9+
- **CEDARScript Translation**: The changes proposed by the LLM would be concisely expressed as `CEDARScript` commands, enabling more efficient token usage.
10+
11+
- **Local File System Access**: The extension could securely access the user's local file system, allowing for direct manipulation of code files based on `CEDARScript` instructions generated by the LLM.
12+
13+
- **Diff Visualization**: Changes proposed by the LLM would be presented as standard diffs _or_ as `CEDARScript` code, allowing users to review and approve modifications before applying them to their codebase.
14+
15+
- **Context Preservation**: The extension would maintain context across chat sessions, enabling long-running refactoring tasks that span multiple interactions.
16+
17+
This browser extension would expand the capabilities of web-based LLM interfaces, allowing developers to leverage these powerful AI tools for more substantial code modification and analysis tasks. By using CEDARScript as an intermediary language, the extension would ensure efficient and accurate communication between the user, the LLM, and the local codebase.
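
The diff-visualization step described above could be prototyped with Python's standard `difflib` module. This is only a sketch of the review step, not the extension itself (which would run in a browser); the function name `render_diff` and the `a/` and `b/` path prefixes are assumptions.

```python
import difflib


def render_diff(original, proposed, path):
    """Return a unified diff between the current file text and the proposed text."""
    lines = difflib.unified_diff(
        original.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)
```

An empty result means the proposal changes nothing, so a reviewer would only be prompted when there is something to approve.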

proposals/model-fine-tuning/README.md

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,19 @@
1+
# Fine-tuning a Model for Native CEDARScript Understanding
2+
3+
This initiative could enhance the efficiency and effectiveness of AI-assisted code analysis and transformation.
4+
5+
## Why Fine-tune?
6+
7+
1. **Improved Accuracy**: A fine-tuned model will have a deeper understanding of CEDARScript syntax and semantics, leading to more accurate code analysis and generation.
8+
2. **Efficiency**: Native understanding of CEDARScript will reduce the need for extensive prompting.
9+
3. **Consistency**: A model trained specifically on CEDARScript will produce more consistent and idiomatic output, adhering closely to the language's conventions and best practices.
10+
4. **Extended Capabilities**: Fine-tuning could enable the model to perform more complex CEDARScript operations and understand nuanced aspects of the language that general-purpose models might miss.
11+
12+
## Approach
13+
14+
1. **Model Selection**: We will evaluate various state-of-the-art language models to determine the most suitable base model for fine-tuning. Factors such as model size, pre-training data, and architectural features will be considered.
15+
2. **Dataset Creation**: A comprehensive dataset of CEDARScript examples, covering a wide range of use cases and complexities, will be created. This dataset will include both CEDARScript commands and their corresponding natural language descriptions or intentions.
16+
3. **Fine-tuning Process**: The selected model will undergo fine-tuning using the created dataset. We'll experiment with different fine-tuning techniques, depending on the resources available and the desired outcome.
17+
4. **Evaluation**: The fine-tuned model will be rigorously tested on a held-out test set to assess its performance in understanding and generating CEDARScript. Metrics such as accuracy, fluency, and task completion will be used.
18+
5. **Iterative Improvement**: Based on the evaluation results, we'll iteratively refine the fine-tuning process, potentially adjusting the dataset, fine-tuning parameters, or even the base model selection.
19+
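The dataset-creation and evaluation steps above could start from something like the following sketch. The record fields (`intent`, `cedarscript`), the JSONL layout, the 20% held-out fraction, and exact-match scoring are all assumptions chosen for illustration; the proposal does not fix a format or a metric definition.

```python
import json
import random

# Hypothetical records pairing a natural-language intent with a CEDARScript command.
EXAMPLES = [
    {"intent": "List the project's coding conventions",
     "cedarscript": "SELECT CONVENTIONS\nFROM ONBOARDING;"},
    {"intent": "Show the project context",
     "cedarscript": "SELECT CONTEXT\nFROM ONBOARDING;"},
    {"intent": "Give me the full onboarding overview",
     "cedarscript": "SELECT *\nFROM ONBOARDING;"},
]


def split_examples(examples, held_out_fraction=0.2, seed=0):
    """Shuffle deterministically and return (train, held_out) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    held_out_size = round(len(shuffled) * held_out_fraction)
    cut = len(shuffled) - held_out_size
    return shuffled[:cut], shuffled[cut:]


def to_jsonl(examples):
    """Serialise records as one JSON object per line."""
    return "\n".join(json.dumps(example) for example in examples)


def exact_match_accuracy(predictions, references):
    """Fraction of predictions equal to the reference, ignoring outer whitespace."""
    if not references:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)
```

Exact match is a deliberately strict stand-in for the "accuracy" metric; a real evaluation would probably also check whether a generated command parses and produces the intended edit.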
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
# Onboarding Capabilities
2+
3+
This capability is designed to help developers, AI assistants, and other tools quickly gain a comprehensive understanding
4+
of a project's structure, conventions, and context.
5+
6+
## Key Onboarding Features
7+
8+
1. **Convention Discovery**:
9+
CEDARScript can automatically extract coding conventions from designated files like `CONVENTIONS.md`:
10+
11+
```sql
12+
SELECT CONVENTIONS
13+
FROM ONBOARDING;
14+
```
15+
16+
2. **Context Retrieval**:
17+
Quickly access project context from files like `.context.md` or `.contextdocs.md`:
18+
19+
```sql
20+
SELECT CONTEXT
21+
FROM ONBOARDING;
22+
```
23+
24+
3. **Comprehensive Project Overview**:
25+
Gather all essential project information in one query:
26+
27+
```sql
28+
SELECT *
29+
FROM ONBOARDING;
30+
```
31+
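One way the three onboarding queries above might be resolved is sketched below. It assumes each `ONBOARDING` column maps to the files named in the proposal and that `SELECT *` returns every column; the mapping `ONBOARDING_SOURCES` and the function `select_from_onboarding` are hypothetical names, not part of CEDARScript.

```python
from pathlib import Path

# Hypothetical mapping from ONBOARDING columns to the files named in the proposal.
ONBOARDING_SOURCES = {
    "CONVENTIONS": ["CONVENTIONS.md"],
    "CONTEXT": [".context.md", ".contextdocs.md"],
}


def select_from_onboarding(root, column="*"):
    """Return {column: text} for one column, or for every column when column is "*"."""
    columns = list(ONBOARDING_SOURCES) if column == "*" else [column]
    result = {}
    for name in columns:
        parts = []
        for filename in ONBOARDING_SOURCES[name]:
            path = Path(root) / filename
            # Missing files are skipped, so a column can come back empty.
            if path.is_file():
                parts.append(path.read_text(encoding="utf-8"))
        result[name] = "\n".join(parts)
    return result
```

Returning an empty string for a column with no backing file keeps `SELECT *` usable on projects that only provide some of the onboarding files.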