Description
Full name
Abhishek Jain
University status
No
University name
No response
University program
No response
Expected graduation
No response
Short biography
TODO
Timezone
Indian Standard Time (IST), UTC + 5:30 hours
Contact details
email: jainabhishekjain007@gmail.com, github: @abhishekblue, gitter: abhishekblue:gitter.im
Platform
Windows
Editor
I use VS Code, as it is the most popular choice of code editor. In the past I have used Notepad++ and a little bit of PyCharm, but I found VS Code the most comfortable. Its lightweight design and built-in features make it efficient for development, especially when using keyboard shortcuts for productivity.
Programming experience
I was always interested in programming but never did it consistently. Due to a different career trajectory, I was never able to pursue programming as a career, but over the last year I have started learning it properly. I started with Python and Django and practiced by building very basic web apps. After that, I switched to JavaScript for full-stack web development. I have learned the MERN stack, along with Postgres for databases.
I built an e-commerce clone, a social media clone, and a todo app for practice.
I also built a very basic digital wallet app that uses database transactions to simulate money transfers.
Then I started contributing to stdlib and have been exploring this repo since February.
Apart from this, I am interested in AI/ML and have tried my hand at ComfyUI for image generation.
JavaScript experience
I started learning JavaScript last year after purchasing a course from a YouTuber, which also introduced me to open source. I learned the MERN stack along with Postgres. I have used Hono and Vite for developing web apps, have basic knowledge of TypeScript, and have created a todo application, a digital wallet clone with JWT authentication, and a Medium-like application (mostly focusing on CRUD functionality).
Apart from this, my JavaScript experience comes from the contributions I have made, and will continue to make, to stdlib.
What I like most about JavaScript is that it is used everywhere; even when I was learning Python and Django for web development, JavaScript was still required for the frontend. At first, I thought shifting from Python to JavaScript would be difficult, since Python is such an easy language for beginners, but that was not the case. In my opinion, there are even more tutorials available for JavaScript than for Python.
Also, I don’t have any particular dislikes about JavaScript. However, managing async code can sometimes be tricky, and debugging issues related to callbacks or promises can take time. But overall, I enjoy working with JavaScript because of its flexibility and the fact that it allows me to build both frontend and backend applications.
Node.js experience
I have experience building full-stack web apps with the help of Node.js. I have done backend development using Express.js and worked with API requests to fetch and send data for building CRUD-based apps. I have also worked with NoSQL databases like MongoDB for storing and managing data. Additionally, since I work with JavaScript, I frequently use npm for installing packages and managing dependencies.
C/Fortran experience
I don’t have much experience with C or Fortran. I learned basic C in school long ago, but I haven’t used it much since then. I’m open to learning if needed.
Interest in stdlib
I was searching through last year's GSoC organizations to start contributing to open source. I shortlisted a few but decided to go all in with stdlib (it has been either stdlib or nothing since the beginning of my GSoC journey). The first reason was the number of good-first issues, as well as how well maintained and properly labelled all the issues are, with no spam issues (which some repositories have, and which I think is confusing for a first-time contributor).
There is also the 'stdlib' name itself: I believe everyone who starts their programming journey in school with C has heard of it.
Then I made my first-ever open-source contribution here and decided to stick with stdlib. Also, the types of issues we work on can really strengthen the basics and core principles of JavaScript.
Also, as stdlib is a scientific and mathematical computing library, and I was decent at and very interested in maths during my school days, it aligns with my interests and learning path. I want to explore AI/ML, which relies on these kinds of libraries, so pursuing this interest felt inevitable.
Version control
Yes
Contributions to stdlib
Open PRs
- feat: add stats/incr/nanmminmax
- feat: add stats/incr/nanhmean
- feat: add stats/incr/nangmean
- feat: add stats/incr/nanewmean
- feat: add nanminmax package to handle NaN values
- feat(stats/incr): add nanmin package to handle NaN values
- feat: add assert/has-same-constructor
Merged PRs
stdlib showcase
TODO
Goals
The goal of this project is to enhance stdlib-bot by introducing LLM-powered automation for reviewing PRs and generating code fixes. The project is divided into two phases:
Phase 1 (First 4 weeks) – Improve PR review by implementing a bot that suggests comments using RAG (Retrieval-Augmented Generation).
Phase 2 (Remaining weeks) – Extend the bot to generate code fixes, apply them iteratively, and improve accuracy with fine-tuning.
By the end of the project, the bot will automatically analyze CI failures, retrieve similar past fixes, suggest code changes, and refine itself through continuous learning.
Why this project?
At first, I was torn between the test runner project ( #7 ) and this project idea. But then, during a weekly office meeting, I came to know that the maintainers have full-time jobs and maintain stdlib by going the extra mile in their free time.
Also, while contributing to the project, most of the errors I encountered were related to linting and styling issues.
Keeping both of these problems in mind, I decided to go with this project: once fully implemented, the bot should be able to resolve basic linting and styling errors and comment on PRs from other contributors that contain similar errors, which will save time in maintaining the project and make things easier for all the maintainers. Also, as the tech industry is embracing LLMs and my interest is in AI, this project caters to that as well, and given the AI trend in the industry, it will help my career too.
Qualifications
I have a good grasp of JavaScript and backend development, and contributing to stdlib has given me working knowledge of stdlib's styling and linting rules. For this project, I have also created a demo bot that comments on issues, which helped me get familiar with GitHub APIs and workflows.
As for fine-tuning and RAG, I have watched some crash-course tutorials on YouTube that will help me during the implementation.
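To give an idea of what the demo bot looks like, here is a minimal sketch of its core: a Node.js script, run from a GitHub Action with `GITHUB_TOKEN` available, that posts a comment on an issue via Octokit. The repository name and issue number are placeholders, not the actual demo.

```js
// Minimal sketch of an issue-commenting bot (owner, repo, and issue number are placeholders).
// Intended to run inside a GitHub Action with GITHUB_TOKEN exposed as an environment variable.
const { Octokit } = require( '@octokit/rest' );

const octokit = new Octokit({ 'auth': process.env.GITHUB_TOKEN });

async function main() {
    await octokit.rest.issues.createComment({
        'owner': 'abhishekblue',      // placeholder: a personal sandbox account
        'repo': 'stdlib-bot-demo',    // placeholder repository name
        'issue_number': 1,            // issue to comment on
        'body': 'Hello from the demo bot!'
    });
}

main();
```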
Prior art
This has not been implemented in this library yet. There are other tools, such as GitHub Copilot or Cursor, that perform code correction, but none of them has this kind of GitHub workflow integration.
Commitment
I plan to dedicate 30 hrs/week to this project during GSoC. This will give me enough time to learn and try out the technologies I need for the project (in case I don't already know how something works). I also don't have any other commitments, so I will be flexible with my schedule and time commitment. Before the official coding period, I will spend time understanding stdlib-bot's existing workflows and contributing to relevant issues.
Schedule
Assuming a 12-week schedule, this project is divided into two phases:
- Phase 1: Implement a bot that only makes comments based on past data using fine-tuning and RAG.
- Phase 2: Extend the bot's functionality to create PRs and generate code using the same approach as Phase 1.
This structure keeps the project easier to work on and the tasks well segregated. Also, achieving accurate code generation according to stdlib standards is much more difficult than comment generation.
Phase 1: PR Review Comment Bot (Weeks 1-4)
Goal for Phase 1
Implement an LLM-powered bot that retrieves past PR review comments and suggests relevant feedback on new PRs. The commenting data will mostly be about linting checks, formatting suggestions (as per the stdlib style guide), and minor fixes.
Community Bonding Period
- Finalize `stdlib-bot` enhancement design.
- Analyze and extend stdlib-bot’s GitHub Action workflows.
- Research RAG, vector search & LLMs.
- Set up a vector database for experimentation.
Week 1
- Collect comment data from PRs (10-20).
- Create vector embeddings for the data.
- Implement RAG with vector search.
- Generate review comments using RAG and prompt engineering.
Goal: A working prototype that gives code fix suggestions on PRs using RAG and an LLM.
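A minimal sketch of this retrieval flow, assuming an OpenAI-style embeddings and chat API and a plain in-memory cosine-similarity search as a stand-in for a real vector database; the model names and the `pastComments` corpus are placeholders, not decided choices.

```js
const OpenAI = require( 'openai' );

const client = new OpenAI(); // assumes OPENAI_API_KEY in the environment

// Embed a piece of text (model name is a placeholder):
async function embed( text ) {
    const res = await client.embeddings.create({
        'model': 'text-embedding-3-small',
        'input': text
    });
    return res.data[ 0 ].embedding;
}

// Cosine similarity between two embedding vectors:
function cosine( a, b ) {
    let dot = 0;
    let na = 0;
    let nb = 0;
    for ( let i = 0; i < a.length; i++ ) {
        dot += a[ i ] * b[ i ];
        na += a[ i ] * a[ i ];
        nb += b[ i ] * b[ i ];
    }
    return dot / ( Math.sqrt( na ) * Math.sqrt( nb ) );
}

// Retrieve the k most similar past review comments and ask the LLM to draft a new one:
async function suggestComment( diffHunk, pastComments, k ) {
    const q = await embed( diffHunk );
    const scored = [];
    for ( const c of pastComments ) {
        scored.push({ 'comment': c, 'score': cosine( q, await embed( c ) ) });
    }
    scored.sort( ( x, y ) => y.score - x.score );
    const context = scored.slice( 0, k ).map( ( s ) => s.comment ).join( '\n---\n' );

    const res = await client.chat.completions.create({
        'model': 'gpt-4o-mini', // placeholder model
        'messages': [
            { 'role': 'system', 'content': 'You are a stdlib PR reviewer. Follow the stdlib style guide.' },
            { 'role': 'user', 'content': `Similar past review comments:\n${context}\n\nNew diff:\n${diffHunk}\n\nSuggest a review comment, or reply "LGTM" if nothing applies.` }
        ]
    });
    return res.choices[ 0 ].message.content;
}
```

In practice, the corpus embeddings would be computed once and stored in the vector database set up during the bonding period; they are recomputed inline here only to keep the sketch short.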
Week 2
- Extract PR comments and CI failure data for the past 2-3 years.
- Clean the data (by removing comments like "LGTM", "Thank you", etc.) and format it as JSON.
- Link CI data with comments.
Goal: Have a cleaned and linked dataset ready.
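A sketch of the extraction and cleaning step, using Octokit pagination to pull review comments and a simple filter for trivial comments; the PR number, the output file name, and the filter list are assumptions for illustration.

```js
const fs = require( 'fs' );
const { Octokit } = require( '@octokit/rest' );

const octokit = new Octokit({ 'auth': process.env.GITHUB_TOKEN });

const TRIVIAL = [ 'lgtm', 'thank you', 'thanks', '+1' ];

async function collectReviewComments( owner, repo, pull ) {
    // Pull every review comment on the PR (handles pagination):
    const comments = await octokit.paginate( octokit.rest.pulls.listReviewComments, {
        'owner': owner,
        'repo': repo,
        'pull_number': pull,
        'per_page': 100
    });

    // Drop trivial comments and keep only the fields needed for training/retrieval:
    return comments
        .filter( ( c ) => !TRIVIAL.includes( c.body.trim().toLowerCase() ) )
        .map( ( c ) => ({
            'pr': pull,
            'path': c.path,
            'diff_hunk': c.diff_hunk,
            'comment': c.body
        }) );
}

async function main() {
    const out = await collectReviewComments( 'stdlib-js', 'stdlib', 1234 ); // placeholder PR number
    fs.writeFileSync( 'review-comments.json', JSON.stringify( out, null, 2 ) );
}

main();
```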
Week 3
- Update vector embeddings for this dataset.
- Fine-tune LLM on this data.
- Test the bot in a sandboxed environment.
- Compare RAG vs. fine-tuned responses for different scenarios.
Goal: Have a fine-tuned commenting bot ready with test results.
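One possible shape for the fine-tuning dataset, assuming an OpenAI-style chat-format JSONL file built from the cleaned (diff hunk, review comment) pairs produced in Week 2; the file names are placeholders.

```js
const fs = require( 'fs' );

// Convert cleaned (diff hunk, comment) pairs into chat-format JSONL training examples:
const pairs = JSON.parse( fs.readFileSync( 'review-comments.json', 'utf8' ) );

const lines = pairs.map( ( p ) => JSON.stringify({
    'messages': [
        { 'role': 'system', 'content': 'You are a stdlib PR reviewer. Follow the stdlib style guide.' },
        { 'role': 'user', 'content': `File: ${p.path}\nDiff:\n${p.diff_hunk}` },
        { 'role': 'assistant', 'content': p.comment }
    ]
}) );

fs.writeFileSync( 'finetune.jsonl', lines.join( '\n' ) );
```

The same pairs can double as the retrieval corpus when comparing RAG against the fine-tuned model later in the week.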
Week 4
- Test fine-tuned bot on real PRs.
- Improve prompt engineering to refine results.
- Implement tests to check the accuracy of comments.
- If accuracy is low, re-evaluate and refine approaches from Weeks 2 and 3 in Week 5; otherwise, move to Phase 2.
Goal: Ensure the bot provides accurate and useful comments consistently.
Week 5 (Fallback/Refinement Week)
- Analyze incorrect cases.
- Improve or recheck data cleaning and structuring.
- Check RAG settings.
- Adjust model parameters or try alternative fine-tuning strategies.
Goal: Finalize and implement the bot.
Phase 2: PR Creating and Code Generating Bot (Weeks 6-12)
Goal for Phase 2
Extend the bot's functionality so that it can start suggesting and making code corrections and can create PRs for basic linting and style-related issues.
Week 6 (Midterm)
- Midterm evaluation submission, summarizing:
- Progress in Phase 1 (Commenting Bot).
- Results of fine-tuning and RAG comparison.
- Challenges faced and improvements made.
- Plan for Phase 2 (Code Fixing Bot).
- Expand and restructure CI failure data (from Week 2).
- Extract failure types and link them with fixes.
- Identify which commits in a PR actually fixed the CI failures by comparing changes before and after the fix.
- Store structured data and create embeddings for this data.
- Testing & Validating Fixes
- Retrieve past fixes using RAG.
- Identify fixing commits within PRs using CI failure data.
- Apply and validate the fix by running CI tests.
- Collect data on successful fixes (to be used for fine-tuning).
Goal: Submit the midterm evaluation and finalize data for fine-tuning using RAG.
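A sketch of how a "fixing commit" could be identified within a PR: walk the PR's commits in order and return the first commit whose check runs all pass immediately after a commit whose check runs failed. This is a simplification; real CI data (e.g. workflow run logs) may need a richer query.

```js
const { Octokit } = require( '@octokit/rest' );

const octokit = new Octokit({ 'auth': process.env.GITHUB_TOKEN });

// Returns `true` if every check run on a commit concluded successfully:
async function checksPassed( owner, repo, sha ) {
    const res = await octokit.rest.checks.listForRef({
        'owner': owner,
        'repo': repo,
        'ref': sha
    });
    return res.data.check_runs.every( ( run ) => run.conclusion === 'success' );
}

// Find the first commit in a PR that turned failing checks into passing checks:
async function findFixingCommit( owner, repo, pull ) {
    const commits = await octokit.paginate( octokit.rest.pulls.listCommits, {
        'owner': owner,
        'repo': repo,
        'pull_number': pull,
        'per_page': 100
    });
    let previousFailed = false;
    for ( const c of commits ) {
        const passed = await checksPassed( owner, repo, c.sha );
        if ( previousFailed && passed ) {
            return c.sha; // candidate fixing commit
        }
        previousFailed = !passed;
    }
    return null;
}
```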
Week 7
- Fine-tune LLM for code generation using data from last week.
- Deploy & validate in a sandbox repository.
- Test code fixes in the sandbox repo.
- Generate code suggestions and apply them automatically.
- Improve Fixing Strategies
- Analyze cases where the bot fails to generate a valid fix.
- Adjust prompt engineering or retrain the model based on errors.
Goal: Deploy the code-fixing bot in a sandbox environment and improve its accuracy.
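A sketch of the "apply the fix and open a PR" step in the sandbox repository, assuming the generated fix arrives as the new contents of a single file; the branch name, commit message, and owner/repo values are placeholders.

```js
const { Octokit } = require( '@octokit/rest' );

const octokit = new Octokit({ 'auth': process.env.GITHUB_TOKEN });

async function openFixPR( owner, repo, base, path, fixedContent ) {
    // Create a branch off the base branch:
    const ref = await octokit.rest.git.getRef({ 'owner': owner, 'repo': repo, 'ref': `heads/${base}` });
    const branch = `bot/lint-fix-${Date.now()}`;
    await octokit.rest.git.createRef({
        'owner': owner,
        'repo': repo,
        'ref': `refs/heads/${branch}`,
        'sha': ref.data.object.sha
    });

    // Commit the fixed file onto the new branch:
    const file = await octokit.rest.repos.getContent({ 'owner': owner, 'repo': repo, 'path': path, 'ref': branch });
    await octokit.rest.repos.createOrUpdateFileContents({
        'owner': owner,
        'repo': repo,
        'path': path,
        'branch': branch,
        'message': `fix: resolve lint errors in ${path}`,
        'content': Buffer.from( fixedContent ).toString( 'base64' ),
        'sha': file.data.sha
    });

    // Open the PR so CI can validate the fix:
    return octokit.rest.pulls.create({
        'owner': owner,
        'repo': repo,
        'title': `fix: lint errors in ${path}`,
        'head': branch,
        'base': base,
        'body': 'Automated fix generated by stdlib-bot (sandbox test).'
    });
}
```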
Week 8
- Compare bot-generated fixes with contributors' fixes in past PRs.
- Implement methods to compare both fixes (regex or Levenshtein distance).
- Check if CI checks are passing on bot-generated code.
- Collect data on good fixes.
- Implement `llms.txt` to enhance compatibility with stdlib contributor guidelines.
Goal: Compare bot fixes to past contributor fixes, improve accuracy, and add `llms.txt`.
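A small sketch of the Levenshtein-based comparison mentioned above: the classic dynamic-programming edit distance, normalized into a similarity ratio between a bot-generated fix and the contributor's actual fix.

```js
// Classic dynamic-programming Levenshtein distance:
function levenshtein( a, b ) {
    const dp = [];
    for ( let i = 0; i <= a.length; i++ ) {
        dp.push( [ i ] );
    }
    for ( let j = 1; j <= b.length; j++ ) {
        dp[ 0 ][ j ] = j;
    }
    for ( let i = 1; i <= a.length; i++ ) {
        for ( let j = 1; j <= b.length; j++ ) {
            const cost = ( a[ i-1 ] === b[ j-1 ] ) ? 0 : 1;
            dp[ i ][ j ] = Math.min( dp[ i-1 ][ j ] + 1, dp[ i ][ j-1 ] + 1, dp[ i-1 ][ j-1 ] + cost );
        }
    }
    return dp[ a.length ][ b.length ];
}

// Similarity ratio in [0, 1]; 1 means the bot fix matches the contributor fix exactly:
function similarity( botFix, contributorFix ) {
    const d = levenshtein( botFix, contributorFix );
    const len = Math.max( botFix.length, contributorFix.length ) || 1;
    return 1 - ( d / len );
}
```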
Week 9
- Deploy the bot on real PRs and monitor its performance.
- Gather feedback and improve upon it.
- Adjust retrieval, prompts, or `llms.txt` accordingly.
- Implement Iterative Fix Mechanism
- If the first generated fix fails, the bot generates a new commit based on CI failure.
- Set max attempts.
- Collect failure logs, label data, and fine-tune on this.
Goal: Deploy the bot on real PRs, test iterative fixes, and start improving based on feedback.
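A sketch of the iterative fix mechanism, with `generateFix`, `pushFixCommit`, and `waitForChecks` as hypothetical helpers (the first wraps the RAG + LLM call, the second pushes a commit to the bot's fix branch, and the third polls CI for that commit); `MAX_ATTEMPTS` is the cap mentioned above.

```js
const MAX_ATTEMPTS = 3; // cap on automated retry commits (placeholder value)

// `helpers` bundles the hypothetical generateFix(), pushFixCommit(), and waitForChecks() functions.
async function iterativeFix( pr, failureLog, helpers ) {
    const { generateFix, pushFixCommit, waitForChecks } = helpers;
    let log = failureLog;
    for ( let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++ ) {
        const fix = await generateFix( pr, log );   // RAG + LLM generated patch
        const sha = await pushFixCommit( pr, fix ); // new commit on the fix branch
        const result = await waitForChecks( sha );  // wait for CI to re-run
        if ( result.passed ) {
            return { 'success': true, 'attempts': attempt };
        }
        log = result.log; // feed the new failure log into the next attempt
    }
    // Collect the final failure log for later labeling and fine-tuning:
    return { 'success': false, 'attempts': MAX_ATTEMPTS, 'log': log };
}
```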
Week 10
- Conduct more real-world testing across multiple PRs.
- Refine the iterative fix mechanism.
- Ensure smooth automation in PR workflows.
- Start documentation.
Goal: Expand real-world testing, refine automation, and begin documentation.
Week 11
- Optimize and improve performance using methods like:
- Smaller embedding models for retrieval.
- Indexing & caching for efficiency.
- Using a smaller LLM where applicable.
- Complete documentation.
Goal: Improve bot speed and efficiency while completing documentation.
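One small example of the caching idea: memoizing embeddings by content hash so repeated diff hunks or comments are only embedded once; `embed()` here stands for the same placeholder embedding helper as in the Week 1 sketch.

```js
const crypto = require( 'crypto' );

const cache = new Map(); // content hash -> embedding vector

async function cachedEmbed( text, embed ) {
    const key = crypto.createHash( 'sha256' ).update( text ).digest( 'hex' );
    if ( cache.has( key ) ) {
        return cache.get( key );
    }
    const vec = await embed( text ); // placeholder embedding call
    cache.set( key, vec );
    return vec;
}
```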
Week 12
- Finalize testing, documentation, and submit final work.
- Complete any miscellaneous tasks.
Notes:
- The community bonding period is a 3-week period built into GSoC to help you get to know the project community and participate in project discussion. This is an opportunity for you to set up your local development environment, learn how the project's source control works, refine your project plan, read any necessary documentation, and otherwise prepare to execute on your project proposal.
- Usually, even week 1 deliverables include some code.
- By week 6, you need enough done at this point for your mentor to evaluate your progress and pass you. Usually, you want to be a bit more than halfway done.
- By week 11, you may want to "code freeze" and focus on completing any tests and/or documentation.
- During the final week, you'll be submitting your project.
Related issues
No response
Checklist
- I have read and understood the Code of Conduct.
- I have read and understood the application materials found in this repository.
- I understand that plagiarism will not be tolerated, and I have authored this application in my own words.
- I have read and understood the patch requirement which is necessary for my application to be considered for acceptance.
- I have read and understood the stdlib showcase requirement which is necessary for my application to be considered for acceptance.
- The issue name begins with `[RFC]:` and succinctly describes your proposal.
- I understand that, in order to apply to be a GSoC contributor, I must submit my final application to https://summerofcode.withgoogle.com/ before the submission deadline.