Merge Conflict Resolution: Classification or Generation?

MergeGen is a generation-based approach to merge conflict resolution. It first builds a structural representation of each merge conflict and then uses an encoder-decoder generative model to generate the resolution. This repository contains our replication package.

Environment

The file packages.txt lists all the packages required for replication. To create a conda environment and install them, run the following commands:

$ conda create -n ase-merge python=3.8

$ conda activate ase-merge

$ python -m pip install -r packages.txt
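After installation, it can be useful to confirm that everything listed in packages.txt is actually present in the active environment. The following is a minimal sketch, not part of the replication package: it assumes packages.txt uses the usual pip requirements format (one `name==version` style entry per line), and the helper names `requirement_names` and `missing_packages` are hypothetical.

```python
import re
from importlib import metadata
from typing import Iterable, List


def requirement_names(lines: Iterable[str]) -> List[str]:
    """Extract distribution names from pip-style requirement lines.

    Assumption: one requirement per line, e.g. "torch==1.12.0".
    Comments, blank lines, and pip options (lines starting with "-") are skipped.
    """
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_packages(requirement_names(open("packages.txt")))` returns an empty list when every listed package is installed. The check only looks at package names and does not compare versions.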

Dataset

Before running the model, the dataset must be pre-processed. Because pre-processing is time-consuming, we use Python's subprocess module to prepare the dataset in parallel. In our experiment, at most 100 processes run simultaneously, and each subprocess handles 1,000 samples of the dataset. The raw data comes from previous work [1] and is available in their Zenodo repository. Download the raw data and put it in the RAW_DATA folder, then run the following command to obtain the processed dataset:

$ python runtotal_dataset.py dataset_parallel.py

The file runtotal_dataset.py starts the parent process, which assigns the sub-tasks to the subprocesses so that the dataset is prepared in parallel. The file dataset_parallel.py is the program that each subprocess executes to prepare its part of the dataset. The processed split dataset is stored in the PROCESSED folder.
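The parent/worker pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the code in runtotal_dataset.py: the limits of 100 processes and 1,000 samples come from the text, while the function names and the idea of passing a start and end index to the worker on the command line are hypothetical.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

MAX_PROCESSES = 100  # maximum number of subprocesses running at once (from the text)
CHUNK_SIZE = 1000    # number of samples handled by each subprocess (from the text)


def chunk_ranges(total: int, chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int]]:
    """Split [0, total) into consecutive half-open ranges of at most chunk_size."""
    return [(start, min(start + chunk_size, total))
            for start in range(0, total, chunk_size)]


def build_commands(worker_script: str, total: int) -> List[List[str]]:
    """Build one command per chunk.

    Assumption: the worker accepts a start and end index as arguments.
    The real interface of dataset_parallel.py may differ.
    """
    return [[sys.executable, worker_script, str(start), str(end)]
            for start, end in chunk_ranges(total)]


def run_in_parallel(commands: Sequence[Sequence[str]],
                    max_processes: int = MAX_PROCESSES) -> List[int]:
    """Run the commands as subprocesses with at most max_processes alive at once.

    Returns the exit codes in the same order as the commands.
    """
    exit_codes = [0] * len(commands)
    running = {}  # command index -> Popen handle, in launch order
    next_index = 0
    while next_index < len(commands) or running:
        # Launch new subprocesses while there are free slots.
        while next_index < len(commands) and len(running) < max_processes:
            running[next_index] = subprocess.Popen(list(commands[next_index]))
            next_index += 1
        # Wait for the oldest running subprocess to free its slot.
        index = next(iter(running))
        exit_codes[index] = running.pop(index).wait()
    return exit_codes
```

Waiting on the oldest subprocess keeps the sketch short. A slot is only reused once that particular subprocess exits, so a scheduler that polls all running subprocesses would keep the pool fuller when chunk durations vary.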

Model

Our model is built upon CodeT5, and it is defined in the run_mergegen.py file. To fine-tune the model, execute the following command:

$ python run_mergegen.py train

The resulting model will be saved as best_model.pt, and a record of the training process will be stored in the OUTPUT folder.

To evaluate the model on the test set, use the following command:

$ python run_mergegen.py test

The generated resolutions and a record of the testing process will be stored in the OUTPUT folder.

The first time you train or test the model, dataset.py is executed to combine the split dataset into the final prepared dataset.
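A combine-once step of this kind can be sketched as follows. This is only an illustration of the idea and not the contents of dataset.py: it assumes each split is a pickle file holding a list of samples, and the file pattern `shard_*.pkl` and the function name `combine_shards` are hypothetical.

```python
import pickle
from pathlib import Path
from typing import Any, List


def combine_shards(shard_dir: Path, combined_path: Path,
                   pattern: str = "shard_*.pkl") -> List[Any]:
    """Merge the split dataset files into one dataset and cache the result.

    Assumption: every file matching pattern contains a pickled list of samples.
    """
    if combined_path.exists():
        # Later runs reuse the combined file instead of merging again.
        with combined_path.open("rb") as f:
            return pickle.load(f)

    samples: List[Any] = []
    # Files are merged in lexicographic order of their names.
    for shard in sorted(shard_dir.glob(pattern)):
        with shard.open("rb") as f:
            samples.extend(pickle.load(f))

    with combined_path.open("wb") as f:
        pickle.dump(samples, f)
    return samples
```

Because the combined file is written on the first call, later training and testing runs skip the merge. Deleting the combined file forces it to be rebuilt from the split files.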

Output

The OUTPUT directory holds the conflict resolutions that were produced by MergeGen.

Reference

[1] A. Svyatkovskiy, S. Fakhoury, N. Ghorbani, et al. Program merge conflict resolution via neural transformers. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2022, pp. 822-833.

Repository: DJjjjhao/ase-merge
