SpecFix is a tool for automatically repairing ambiguous natural language requirements to improve code generation by large language models (LLMs). It works by:

- Analyzing the distribution of programs induced by a given requirement.
- Measuring and reducing semantic entropy, which captures how many distinct interpretations (clusters of semantically equivalent programs) the requirement allows.
- Ensuring example consistency, a novel metric that quantifies how well sampled programs satisfy the clarifying examples attached to the requirement (a sketch of both metrics follows this list).
- Performing contrastive specification inference, which takes the repaired (or clustered) programs and iteratively refines the original text so that the most desirable interpretations are prioritized.
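As a rough illustration of these two metrics (not the repository's actual implementation in `cluster.py`/`evaluator.py`), the sketch below computes Shannon entropy over clusters of semantically equivalent programs and the fraction of sampled programs that reproduce the clarifying examples. The function names and the `run_example` helper are assumptions made for the example.

```python
import math
from collections import Counter

def semantic_entropy(cluster_labels):
    """Shannon entropy (in bits) over clusters of semantically equivalent programs.

    cluster_labels holds one label per sampled program; programs sharing a label
    were judged semantically equivalent. 0 bits means every sample agrees on a
    single interpretation; higher values mean more competing interpretations.
    """
    counts = Counter(cluster_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def example_consistency(programs, examples, run_example):
    """Fraction of (program, clarifying example) pairs the sampled programs satisfy.

    run_example(program, example_input) is a hypothetical helper that executes a
    sampled program on an example input and returns its output.
    """
    checks = [run_example(prog, inp) == expected
              for prog in programs
              for inp, expected in examples]
    return sum(checks) / len(checks) if checks else 1.0
```

For instance, 20 samples that split into clusters of size 12, 6, and 2 give an entropy of about 1.30 bits, while an unambiguous requirement whose samples all fall into one cluster gives 0 bits.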
The repository is structured as follows:
```
specfix/
├── main.py                   # Main script to run the tool
├── cluster.py                # Functions for clustering generated programs
├── evaluator.py              # Evaluation functions for repair and measurement
├── model.py                  # Model functions for interacting with LLMs
├── prompt.py                 # Prompts for each task
├── solution_transformer.py   # Functions for transforming generated programs
├── testers.py                # Test functions for detecting ambiguity
├── utils.py                  # Utility functions for various tasks
├── datasets/                 # Datasets (HumanEval+ and MBPP+)
├── Results/                  # Directory to save results
├── experiment_results/       # Directory for our experiment results
├── requirements.txt          # Python package dependencies
└── README.md                 # Documentation for the tool
```
- To install SpecFix, create a virtual environment and install the required packages:
```
python -m venv specfix-venv
source specfix-venv/bin/activate  # On Windows use `specfix-venv\Scripts\activate`
pip install -r requirements.txt
```
- Set the LLM API key as an environment variable:
```
export LLM_API_KEY="your_llm_api_key"
```
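Optionally, you can confirm the key is visible to Python before launching the tool (this check is not part of SpecFix; the variable name simply matches the export above):

```python
import os

# Optional sanity check: fail fast if the key from the export above is missing.
if not os.environ.get("LLM_API_KEY"):
    raise SystemExit("LLM_API_KEY is not set; export it before running SpecFix.")
```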
- Run the tool:
```
cd specfix
python main.py -d <dataset_name> -p <path_to_dataset> -c <clustering_sample_size> -e <evaluation_sample_size> -k <pass@k_value> -m <model_name> -t <temperature>
```
- The results will be saved in the `Results` directory. The directory structure will be as follows:

```
Results/model_name/dataset_name/
├── humaneval-{timestamp}.jsonl
└── mbpp-{timestamp}.jsonl
```
The jsonl files contain the following fields (a sketch for reading them programmatically follows the list):

- `original_requirement`: The original requirement text.
- `repaired_requirement`: The repaired requirement text.
- `original_clusters`: The clusters of programs generated from the original requirement.
- `repaired_clusters`: The clusters of programs generated from the repaired requirement.
- `results`:
  - `original_passk`: The pass@k value for the original requirement.
  - `original_avg_pass_rate`: The average pass rate for the original requirement.
  - `original_nzpassk`: The number of non-zero pass@k values for the original requirement.
  - `original_majority_passk`: The majority-vote pass@k value for the original requirement.
  - `original_entropy`: The semantic entropy of the original requirement.
  - `repaired_passk`: The pass@k value for the repaired requirement.
  - `repaired_avg_pass_rate`: The average pass rate for the repaired requirement.
  - `repaired_nzpassk`: The number of non-zero pass@k values for the repaired requirement.
  - `repaired_majority_passk`: The majority-vote pass@k value for the repaired requirement.
  - `repaired_entropy`: The semantic entropy of the repaired requirement.
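As a minimal sketch of reading these files, the snippet below loads one result file and prints how pass@k and semantic entropy change after repair. The file path is hypothetical, and the snippet assumes the metric fields are nested under `results` as listed above.

```python
import json

# Hypothetical path; substitute your model name, dataset, and run timestamp.
path = "Results/gpt-4o/humaneval/humaneval-20250101-120000.jsonl"

with open(path) as f:
    records = [json.loads(line) for line in f if line.strip()]

for rec in records:
    res = rec["results"]  # assumes the nesting described above
    print(f"pass@k: {res['original_passk']} -> {res['repaired_passk']}, "
          f"entropy: {res['original_entropy']:.3f} -> {res['repaired_entropy']:.3f}")
```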
To run the tool on the HumanEval+ dataset with 20 samples for clustering and pass@1 evaluation over 10 samples, using the `gpt-4o` model with a temperature of 0.7, use the following command:

```
python main.py -d humaneval -p path/to/humaneval+.jsonl -c 20 -e 10 -k 1 -m gpt-4o -t 0.7
```