This repository contains the artifacts of the paper “Fact-Aligned and Template-Constrained Static Analyzer Rule Enhancement with LLMs”.

RuleRefiner is a multi-stage LLM framework that eliminates false alarms in static analyzers by refining detection rules through dynamic profiling, differential fault localization, and constrained LLM modifications. It achieves an 80.28% success rate on 218 Semgrep issues (1.34x-2.45x over baselines) while producing expert-level rules.

Requirements:

- Python 3.10+
- pip 22.0+

```bash
# Install dependencies
pip install -r requirements.txt
```

We use the LLM API service from the Bailian platform:

```bash
export DASHSCOPE_API_KEY=XXXXXXXXXXXXXXXX
```

For more details, or to support LLMs other than those used in our paper, please refer to deepseek.py:

```bash
cat deepseek.py
```

Run the pipeline test; it should complete without raising an exception:

```bash
python3 semgrep_pipeline_test.py
```

To refine a single defective rule, use rr.py:

```bash
python rr.py --input_file example.json --k 5
```

The input file format is as follows:

```json
{
  "id": "avoid-pyyaml-load",
  "rule": "rules:\n- id: avoid-pyyaml-load\n  languages:\n  - python\n  message: |\n    Avoid using `load()`. `PyYAML.load` can create arbitrary Python\n    objects. A malicious actor could exploit this to run arbitrary\n    code. Use `safe_load()` instead.\n  fix-regex:\n    regex: load\n    replacement: safe_load\n    count: 1\n  severity: ERROR\n  patterns:\n  - pattern-inside: |\n      import yaml\n      ...\n  - pattern-not-inside: |\n      $YAML = ruamel.yaml.YAML(...)\n      ...\n      $YAML.load(...)\n  - pattern: yaml.load(...)\n",
  "rule_path": "avoid-pyyaml-load.yaml",
  "test_path": "avoid-pyyaml-load.py",
  "splited_testsuite_b": [
    "import yaml\n\n#ruleid:avoid-pyyaml-load\nyaml.load(\"!!python/object/new:os.system [echo EXPLOIT!]\")",
    "import yaml\n\ndef thing(**kwargs):\n    #ruleid:avoid-pyyaml-load\n    yaml.load(\"!!python/object/new:os.system [echo EXPLOIT!]\", **kwargs)",
    "import yaml\n\ndef check_ruamel_yaml():\n    from ruamel.yaml import YAML\n    yaml = YAML(typ=\"rt\")\n    # ok:avoid-pyyaml-load\n    yaml.load(\"thing.yaml\")",
    "import yaml\n\ndef this_is_ok(stream):\n    #ok:avoid-pyyaml-load\n    return yaml.load(stream, Loader=yaml.CSafeLoader)"
  ],
  "actual": [
    true,
    true,
    false,
    true
  ],
  "expected": [
    true,
    true,
    false,
    false
  ],
  "index": 4
}
```

To refine multiple rules, for example to run an experiment, use semgrep_pipeline.py:

```bash
python semgrep_pipeline.py [OPTIONS]
```

This runs the Semgrep rule refinement pipeline with configurable modes and models. It executes the pipeline k times (pass@k evaluation), aggregates the results, and calculates the success rate.

| Argument | Type | Default | Choices | Description | 
|---|---|---|---|---|
| --mode | str | full | naive,cot,fewshot,localization,template,full | Pipeline execution mode. naive: basic LLM refinement; cot: Chain-of-Thought prompting; fewshot: few-shot learning; localization: fault localization only; template: template-guided refinement; full: full RuleRefiner pipeline |
| --prompt_file | str | results/semgrep_prompts.jsonl | - | Output file for generated LLM prompts | 
| --result_file | str | results/semgrep_result.jsonl | - | Output file for raw LLM responses | 
| --verify_file | str | results/semgrep_verify.jsonl | - | Output file for verification results | 
| --temperature | float | 0.0 | 0.0-1.0 | LLM sampling temperature (0=deterministic) | 
| --model | str | deepseek-v3 | deepseek-v3,qwen-plus | LLM backend to use | 
| --k | int | 1 | 1-10 | Number of iterations for pass@k evaluation | 
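
The options in the table above map naturally onto an `argparse` parser. The following is a minimal sketch reconstructed from the table alone; it is not the actual code in semgrep_pipeline.py, and the `build_parser` and `bounded` helper names, as well as the strict range checks, are assumptions made for illustration.

```python
import argparse

MODES = ["naive", "cot", "fewshot", "localization", "template", "full"]
MODELS = ["deepseek-v3", "qwen-plus"]


def bounded(cast, low, high):
    """Return an argparse type that casts a value and checks low <= value <= high."""
    def parse(text):
        value = cast(text)
        if not low <= value <= high:
            raise argparse.ArgumentTypeError(f"{text} is outside [{low}, {high}]")
        return value
    return parse


def build_parser():
    # Hypothetical helper: flag names, defaults, and choices are copied from the table.
    parser = argparse.ArgumentParser(description="Semgrep rule refinement pipeline")
    parser.add_argument("--mode", type=str, default="full", choices=MODES)
    parser.add_argument("--prompt_file", type=str, default="results/semgrep_prompts.jsonl")
    parser.add_argument("--result_file", type=str, default="results/semgrep_result.jsonl")
    parser.add_argument("--verify_file", type=str, default="results/semgrep_verify.jsonl")
    # Assumption: the documented ranges (0.0-1.0 and 1-10) are enforced at parse time.
    parser.add_argument("--temperature", type=bounded(float, 0.0, 1.0), default=0.0)
    parser.add_argument("--model", type=str, default="deepseek-v3", choices=MODELS)
    parser.add_argument("--k", type=bounded(int, 1, 10), default=1)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--mode", "fewshot", "--k", "3"])
    print(args.mode, args.k, args.model)
```

Passing a value outside the documented choices or ranges, such as `--k 11`, makes this sketch exit with a usage error instead of starting a run.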

Basic execution with default parameters:

```bash
python semgrep_pipeline.py
```

Full pipeline with the Qwen model:

```bash
python semgrep_pipeline.py \
  --mode full \
  --model qwen-plus \
  --temperature 0.3 \
  --k 5
```

Run localization-only mode:

```bash
python semgrep_pipeline.py \
  --mode localization \
  --prompt_file results/localization_prompts.jsonl
```

Evaluate few-shot learning with pass@3:

```bash
python semgrep_pipeline.py \
  --mode fewshot \
  --k 3 \
  --result_file results/fewshot_results.jsonl
```

Output files:

| File Pattern | Contents |
|---|---|
| results/semgrep_prompts.jsonl.{i} | Generated prompts for each rule | 
| results/semgrep_result.jsonl.{i} | Raw LLM responses | 
| results/semgrep_verify.jsonl.{i} | Verification results with success status | 
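
To illustrate how per-run verification files could be combined into a pass@k success rate, here is a minimal sketch. It assumes that a rule counts as solved when at least one run succeeds, and that each line of a verify file is a JSON object with an `id` field and a boolean `success` field. Both the aggregation rule and the field names are assumptions; check the files produced by your own run for the actual schema.

```python
import json
from collections import defaultdict
from pathlib import Path


def pass_at_k(verify_files):
    """Fraction of rules that succeed in at least one of the given runs."""
    solved = defaultdict(bool)
    for path in verify_files:
        with open(path, encoding="utf-8") as handle:
            for line in handle:
                if not line.strip():
                    continue  # skip blank lines
                record = json.loads(line)
                # Assumed schema: {"id": <rule id>, "success": <bool>}.
                solved[record["id"]] |= bool(record["success"])
    return sum(solved.values()) / len(solved) if solved else 0.0


if __name__ == "__main__":
    # Collect every results/semgrep_verify.jsonl.{i} file that exists.
    base = Path("results/semgrep_verify.jsonl")
    files = sorted(base.parent.glob(base.name + ".*"))
    print(f"pass@{len(files)} success rate: {pass_at_k(files):.2%}")
```

Globbing for the suffixed files avoids guessing whether the run index `{i}` starts at 0 or 1.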

```
├── dataset/                   # Semgrep datasets
├── examples/                  # Example rule files and test cases
├── experimental/              # Experimental predicate graph translation for CodeQL and PMD
├── results/                   # Output files and evaluation results
├── scripts/                   # useful scripts
├── venv/                      # Virtual environment (excluded from version control)
│
├── semgrep2nx.py               # Semgrep to Predicate Graph conversion & dynamic profiling
├── graph.py                    # Graph analysis
├── semgrep_locate.py           # Fault localization implementation
├── semgrep_template.py         # template generator
├── semgrep_prompt.py           # prompt generation
├── semgrep_verify.py           # Rule validation
├── semgrep_pipeline.py         # pipeline
├── rr.py                       # entry of RuleRefiner
├── semgrep_syntax.py           # Rule syntax fixer
└── testcase.py                 # Test case
│
├── models/                    # LLM integration modules
│   ├── deepseek.py            # DeepSeek model interface
│   ├── doubao.py              # Doubao model interface
│   └── qwen.py                # Qwen model interface
│
├── semgrep_output_parser.py    # Output parsing utilities
├── utils.py                    # Common utilities
└── para.py                     # Parallel processing config
...
```

This repository was originally created for our paper "Fact-Aligned and Template-Constrained Static Analyzer Rule Enhancement with LLMs" (ASE 2025, to appear).