ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning

Official codebase for the paper "ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning".

🏆 IJCAI 2025 Travel Planning Challenge (TPC@IJCAI)

We are proud to announce that ChinaTravel has been selected as the official benchmark for the Travel Planning Challenge (TPC) @ IJCAI 2025!

Official Competition Website: https://chinatravel-competition.github.io/IJCAI2025/

Participants are invited to develop novel agents that can tackle real-world travel planning scenarios under complex constraints. This competition will showcase state-of-the-art approaches in language agent research.

📝 ChangeLog

2025.06

Fixed error collection in the commonsense evaluation code.
Fixed the pure-neuro agent's pipeline.
Fixed load_datasets from Hugging Face.
Updated exception handling in syntax verification.

2025.05

Updated logs for the latest version.
Provided the evaluation code for the TPC.

2025.04

Added local data loader. Users can now load custom queries locally. When specifying non-default splits_name values (e.g., "abc") for "run_exp.py", the system will automatically load corresponding files from evaluation/default_splits/abc.txt, where the TXT file contains the target query filenames.
Added a detailed constraint classification. See the Evaluation README for details.
Introduced the LLM-modulo baseline: an LLM-modulo pipeline with a ground-truth symbolic verifier, based on the methodology of the paper "Robust Planning with Compound LLM Architectures: An LLM-Modulo Approach" (codebase: https://github.com/Atharva-Gundawar/LLM-Modulo-prompts).
Added support for local LLM inference with Qwen3-8B/4B.
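The local data loader entry can be tried out with a small helper that writes a custom split file. This is a minimal sketch, assuming the split file lists one query filename per line (the changelog does not spell out the exact format) and that base_dir is the directory containing evaluation/. The helper name write_split and the example filenames are hypothetical and not part of the codebase.

```python
from pathlib import Path


def write_split(base_dir, split_name, query_filenames):
    """Write evaluation/default_splits/<split_name>.txt and return its path.

    Assumption: the loader expects one query filename per line.
    """
    split_dir = Path(base_dir) / "evaluation" / "default_splits"
    split_dir.mkdir(parents=True, exist_ok=True)
    split_path = split_dir / f"{split_name}.txt"
    split_path.write_text("\n".join(query_filenames) + "\n", encoding="utf-8")
    return split_path
```

With such a file in place, the split name (here "abc") would be passed to run_exp.py in place of a default split.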

🚀 Quick Start

⚙️ Setup

Create a conda environment and install dependencies:

conda create -n chinatravel python=3.9  
conda activate chinatravel  
pip install -r requirements.txt

Download the database and unzip it to the "chinatravel/environment/" directory.

Download Links: Google Drive, NJU Drive

Download the open-source LLMs (optional).

bash download_llm.sh

Download the tokenizers.

wget https://cdn.deepseek.com/api-docs/deepseek_v3_tokenizer.zip -P chinatravel/local_llm/
unzip chinatravel/local_llm/deepseek_v3_tokenizer.zip -d chinatravel/local_llm/

▶️ Running

We support deepseek (the official DeepSeek API), gpt-4o (chatgpt-4o-latest), glm4-plus, and local inference with Qwen (Qwen3-8B), llama, mistral (Mistral-7B-Instruct-v0.3), etc.

export OPENAI_API_KEY=""

python run_exp.py --splits easy --agent LLMNeSy --llm deepseek --oracle_translation
python run_exp.py --splits medium --agent LLMNeSy --llm deepseek --oracle_translation
python run_exp.py --splits human --agent LLMNeSy --llm deepseek --oracle_translation

python run_exp.py --splits human --agent LLMNeSy --llm Qwen3-8B --oracle_translation


python run_exp.py --splits human --agent LLMNeSy --llm deepseek 
python run_exp.py --splits human --agent LLMNeSy --llm Qwen3-8B 


python run_exp.py --splits human --agent LLM-modulo --llm deepseek --refine_steps 10 --oracle_translation
python run_exp.py --splits human --agent LLM-modulo --llm Qwen3-8B --refine_steps 10 --oracle_translation
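When sweeping over several splits or models, the commands above can be generated programmatically. The following is a minimal sketch that uses only the flags shown above; the helper name build_run_command is hypothetical and not part of the codebase.

```python
import shlex


def build_run_command(split, agent, llm, oracle_translation=False, refine_steps=None):
    """Assemble a run_exp.py invocation as an argument list."""
    cmd = ["python", "run_exp.py", "--splits", split, "--agent", agent, "--llm", llm]
    if refine_steps is not None:
        cmd += ["--refine_steps", str(refine_steps)]
    if oracle_translation:
        cmd.append("--oracle_translation")
    return cmd


if __name__ == "__main__":
    # Print the three LLMNeSy runs with oracle translation, one per split.
    for split in ("easy", "medium", "human"):
        print(shlex.join(build_run_command(split, "LLMNeSy", "deepseek", oracle_translation=True)))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues.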

Note:

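The --oracle_translation flag enables access to the annotated ground truth, including:

hard_logic_py: executable verification DSL code
hard_logic_nl: the corresponding constraint descriptions in natural language

Example annotation structure (the two hard_logic_nl entries read "the total budget is 1800 yuan" and "take taxis for inner-city transport"):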

{
  "hard_logic_py": [
    "
    total_cost=0 
    for activity in allactivities(plan):
        total_cost+=activity_cost(activity)
        total_cost += innercity_transport_cost(activity_transports(activity))
    result=(total_cost<=1800)
    ", 
    "
    innercity_transport_set=set()
    for activity in allactivities(plan):
        if activity_transports(activity)!=[]:              
            innercity_transport_set.add(innercity_transport_type(activity_transports(activity)))
    result=(innercity_transport_set<={'taxi'})
    "
  ], 
  "hard_logic_nl": ["总预算为1800元", "市内交通选择taxi"]
}

The LLM-modulo method requires the --oracle_translation mode for its symbolic refinement process.
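To illustrate how a hard_logic_py entry can act as an executable check, the sketch below runs a budget snippet in the same style against a toy plan and reads back the boolean it binds to result. The helper functions here are hypothetical stand-ins with the same names as the DSL functions; the real implementations ship with the ChinaTravel codebase. The toy plan layout and the check_constraint function are likewise assumptions made for this example.

```python
import textwrap


# Hypothetical stand-ins for the DSL helpers (assumed plan layout, not the real one).
def allactivities(plan):
    return [act for day in plan["itinerary"] for act in day["activities"]]


def activity_cost(activity):
    return activity.get("cost", 0)


def activity_transports(activity):
    return activity.get("transports", [])


def innercity_transport_cost(transports):
    return sum(leg.get("cost", 0) for leg in transports)


DSL_HELPERS = {
    "allactivities": allactivities,
    "activity_cost": activity_cost,
    "activity_transports": activity_transports,
    "innercity_transport_cost": innercity_transport_cost,
}


def check_constraint(snippet, plan):
    """Execute one constraint snippet and return the value it binds to `result`."""
    namespace = dict(DSL_HELPERS, plan=plan)
    exec(textwrap.dedent(snippet), namespace)  # only run trusted annotations
    return bool(namespace["result"])


budget_snippet = """
    total_cost=0
    for activity in allactivities(plan):
        total_cost+=activity_cost(activity)
        total_cost += innercity_transport_cost(activity_transports(activity))
    result=(total_cost<=1800)
"""

toy_plan = {
    "itinerary": [
        {"activities": [
            {"cost": 300, "transports": [{"cost": 40}]},
            {"cost": 120, "transports": []},
        ]}
    ]
}
```

Calling check_constraint(budget_snippet, toy_plan) evaluates the budget constraint on the toy plan. Because the snippet is run with exec, this pattern is only appropriate for annotations from a trusted source.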

📊 Evaluation

python eval_exp.py --splits human --method LLMNeSy_deepseek_oracletranslation
python eval_exp.py --splits human --method LLMNeSy_deepseek
python eval_exp.py --splits human --method LLM-modulo_deepseek_10steps_oracletranslation
python eval_exp.py --splits human --method LLM-modulo_Qwen3-8B_10steps_oracletranslation
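The --method values above appear to combine the agent name, the LLM name, the number of refinement steps, and the oracle translation setting. The sketch below reproduces that pattern as inferred from these four examples only; the authoritative naming logic lives in the experiment scripts and may include other components. The function name method_name is hypothetical.

```python
def method_name(agent, llm, refine_steps=None, oracle_translation=False):
    """Build a --method string following the pattern seen in the examples."""
    parts = [agent, llm]
    if refine_steps is not None:
        parts.append(f"{refine_steps}steps")
    if oracle_translation:
        parts.append("oracletranslation")
    return "_".join(parts)
```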

For TPC@IJCAI2025, the evaluation code is provided in eval_tpc.py. Run it as follows:

python eval_tpc.py --splits tpc_phase1 --method YOUR_METHOD_NAME

📚 Docs

Environment Constraints

🛠️ Advanced Development

1. Develop Your Own Agent Algorithm

To develop your own agent algorithm, you need to inherit the BaseAgent class from chinatravel/agent/base.py and add the logic for your algorithm to the init_agent function in chinatravel/agent/load_model.py. We provide an empty agent example named TPCAgent.

Steps:

Inherit the BaseAgent class: Create a new Python file in the chinatravel/agent directory and define your own agent class, inheriting from BaseAgent.

from .base import BaseAgent

class YourAgent(BaseAgent):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Initialization logic

    def act(self, observation):
        # Implement the agent's decision-making logic
        pass

Add code to the init_agent function: Open the chinatravel/agent/load_model.py file and add support for your new agent in the init_agent function.

def init_agent(kwargs):
    # ... existing code ...
    elif kwargs["method"] == "YourMethodName":
        agent = YourAgent(
            **kwargs
        )
    # ... existing code ...
    return agent
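The two steps can be tried end to end without the repository. The following is a self-contained sketch: BaseAgent here is a stand-in so the example runs on its own (the real class in chinatravel/agent/base.py may take different arguments and expose other methods), EchoAgent is a hypothetical agent, and the dictionary lookup is an alternative to the elif chain shown above, not the codebase's actual dispatch.

```python
class BaseAgent:
    """Stand-in for chinatravel/agent/base.py so this sketch runs on its own."""

    def __init__(self, **kwargs):
        self.method = kwargs.get("method")
        self.llm = kwargs.get("llm")

    def act(self, observation):
        raise NotImplementedError


class EchoAgent(BaseAgent):
    """Hypothetical agent that returns an empty plan tagged with the query."""

    def act(self, observation):
        return {"query": observation, "itinerary": []}


# Map each method name to its agent class instead of extending an elif chain.
AGENT_REGISTRY = {"EchoAgent": EchoAgent}


def init_agent(kwargs):
    """Look up the agent class by kwargs["method"] and instantiate it."""
    try:
        agent_cls = AGENT_REGISTRY[kwargs["method"]]
    except KeyError:
        raise ValueError(f"Unknown agent method: {kwargs.get('method')!r}") from None
    return agent_cls(**kwargs)
```

A registry keeps the dispatch in one place, so adding an agent means adding one dictionary entry rather than another branch.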

2. Develop Your Own Local LLM

To develop your own local large language model (LLM), you need to inherit the AbstractLLM class from chinatravel/agent/llms.py and add the corresponding local LLM inference code in llms.py. We provide an empty LLM example named TPCLLM.

Steps:

Inherit the AbstractLLM class: Define your own LLM class in the chinatravel/agent/llms.py file, inheriting from AbstractLLM.

class YourLLM(AbstractLLM):
    def __init__(self):
        super().__init__()
        # Initialization logic
        self.name = "YourLLMName"

    def _get_response(self, messages, one_line, json_mode):
        # Implement the response logic of the LLM
        response = "Your LLM response"
        if json_mode:
            # Handle JSON mode
            pass
        elif one_line:
            # Handle one-line mode
            response = response.split("\n")[0]
        return response

Add code to the init_llm function: Open the chinatravel/agent/load_model.py file and add support for your new LLM in the init_llm function.

def init_llm(kwargs):
    # ... existing code ...
    elif llm_name == "YourLLMName":
        llm = YourLLM()
    # ... existing code ...
    return llm
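As an illustration of the json_mode and one_line branches, here is a self-contained sketch of an LLM wrapper that returns a fixed reply. AbstractLLM below is a stand-in so the example runs on its own; the real base class in chinatravel/agent/llms.py may differ, and the __call__ signature is an assumption. CannedLLM and its JSON extraction strategy are hypothetical, chosen only to show one way to handle the two modes.

```python
import json


class AbstractLLM:
    """Stand-in for the base class in chinatravel/agent/llms.py."""

    def __init__(self):
        self.name = "abstract"

    def __call__(self, messages, one_line=True, json_mode=False):
        # Assumed entry point: delegate to the subclass hook.
        return self._get_response(messages, one_line, json_mode)


class CannedLLM(AbstractLLM):
    """Hypothetical LLM that always answers with a fixed reply."""

    def __init__(self, reply):
        super().__init__()
        self.name = "CannedLLM"
        self.reply = reply

    def _get_response(self, messages, one_line, json_mode):
        response = self.reply
        if json_mode:
            # Keep only the outermost {...} span and check that it parses.
            start, end = response.find("{"), response.rfind("}")
            if start == -1 or end < start:
                raise ValueError("no JSON object found in response")
            candidate = response[start:end + 1]
            json.loads(candidate)  # raises json.JSONDecodeError if malformed
            return candidate
        if one_line:
            return response.strip().split("\n")[0]
        return response
```

Validating the JSON inside the wrapper surfaces malformed model output at the call site rather than later in the agent pipeline.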

3. Run Your Code Using Experiment Scripts

After completing the above development, you can use the experiment scripts to run your code.

Example of running:

python run_tpc.py --splits easy --agent TPCAgent --llm TPCLLM
python run_exp.py --splits easy --agent YourMethodName --llm YourLLMName

The results will be saved in the results/YourMethodName_YourLLMName_xxx directory, e.g., results/TPCAgent_TPCLLM.

✉️ Contact

If you have any problems, please contact Jie-Jing Shao, Bo-Wen Zhang, Xiao-Wen Yang.

📌 Citation

If our paper or related resources prove valuable to your research, we kindly ask for citation.

@misc{shao2024chinatravelrealworldbenchmarklanguage,
      title={ChinaTravel: A Real-World Benchmark for Language Agents in Chinese Travel Planning}, 
      author={Jie-Jing Shao and Xiao-Wen Yang and Bo-Wen Zhang and Baizhi Chen and Wen-Da Wei and Guohao Cai and Zhenhua Dong and Lan-Zhe Guo and Yu-feng Li},
      year={2024},
      eprint={2412.13682},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2412.13682}, 
}
