- **Goal:** In this project, we aim to improve LLM-based code completion for functional programming languages, specifically Haskell.
- **Models:** We consider two pre-trained decoder-only models: UniXcoder and CodeGPT.
- **Data:** `github-code-haskell-function` (HuggingFace)
- **Benchmarking:** A customized version of HumanEval for Haskell; see `/humaneval-hs`.
Clone the repository:

```bash
git clone https://github.com/<repository author/repository name>.git
```
Install the dependencies:

- Either using `pip` and the provided `requirements.txt` file within the repository: `pip install -r requirements.txt`
- Or using Poetry:

  ```bash
  curl -sSL https://install.python-poetry.org | python3 -
  poetry env use python3.8
  poetry install
  ```
  Note: for details, please refer to `pyproject.toml`. In case of missing dependencies, use `poetry add <name>`.
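As an optional sanity check, you can confirm that Poetry resolved the expected interpreter; this assumes you ran `poetry env use python3.8` as shown above:

```bash
# Should print Python 3.8.x
poetry run python --version
```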
Any code surrounding the considered models (i.e. UniXcoder and CodeGPT) can be found in `/models`. Within this folder, there are subfolders for:

- `/finetuning`: contains the code and scripts to fine-tune both models (see `blue-finetune.sh` for our final fine-tuning script)
- `/inference`: contains the code to run inference on both models
- `/evaluation`: contains the code for the evaluation of both models

Furthermore, this folder contains `create_model_inputs.py` for splitting the data for the models into train and test sets. A sketch of the typical workflow is shown below.
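As a rough orientation, an end-to-end run might look like the following. The paths assume `blue-finetune.sh` lives in `/models/finetuning`, and the bare invocations are illustrative only; the arguments each script accepts are documented in its comments:

```bash
# 1. Split the dataset into train and test sets for the models
#    (see the comments in the script for its arguments)
python models/create_model_inputs.py

# 2. Fine-tune UniXcoder and CodeGPT with the final fine-tuning script
bash models/finetuning/blue-finetune.sh

# 3. Run inference and evaluation using the code in
#    models/inference and models/evaluation, respectively
```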
The HumanEval code can be found in `/humaneval-hs`. Each problem has been manually annotated with splits for the manual evaluation. The annotation of the results can be found in `/models/evaluation/annotated`.
If you want to manually evaluate your own inference results, you can generate an Excel sheet for annotation using `excelify.py` in `/models/evaluation` and plot the results of your annotations using `plotify.py` in the same folder. If you are repeating the experiment and are curious about overlapping results, you can use `overlapify.py` in `/overlap-check`; place the old and new files in the `/old` and `/new` folders respectively, with the same file names. A sketch of these invocations follows below.
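For concreteness, a hypothetical sequence of calls (the scripts' actual arguments are documented in their comments; the bare invocations below are a sketch only):

```bash
# Generate an Excel sheet for manually annotating your inference results
python models/evaluation/excelify.py

# Plot the results of your annotations
python models/evaluation/plotify.py

# Check overlap between a repeated run and the original: place the old
# and new result files (same file names) in overlap-check/old and
# overlap-check/new, then run
python overlap-check/overlapify.py
```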
Each file is documented with comments that explain the code and, where applicable, the arguments that can be passed to the script.
For the main results, please refer to our submitted paper. Additional details can be found in `/appendix/appendix.pdf`. Note: for large documents, GitHub limits how many PDF pages are displayed at once; click the **More Pages** button at the bottom of the scroll view repeatedly until you reach the end (page 23).