Trajectory modeling, which includes research on trajectory data pattern mining and future prediction, has widespread applications in areas such as life services, urban transportation, and public administration. Numerous methods have been proposed to address specific problems within trajectory modeling. However, the heterogeneity of data and the diversity of trajectory tasks make effective and reliable trajectory modeling an important yet highly challenging endeavor, even for domain experts. In this paper, we propose TrajAgent, an agent framework powered by large language models (LLMs), designed to facilitate robust and efficient trajectory modeling through automated modeling. The framework leverages and optimizes diverse specialized models to effectively address various trajectory modeling tasks across different datasets. In TrajAgent, we first develop UniEnv, an execution environment with a unified data and model interface, to support the execution and training of various models. Building on UniEnv, we introduce an agentic workflow designed for automatic trajectory modeling across various trajectory tasks and datasets. Furthermore, we introduce a collaborative learning schema between LLM-based agents and small specialized models to effectively enhance the performance of the whole framework.
- End-to-End Automation: Data Preprocessing → Model Selection → Data Augmentation → Parameter Optimization → Model Training → Result Analysis
- LLM-Driven Decision Making: Intelligent decision mechanism based on Large Language Models for automatic optimal strategy selection
- Multi-Task Support: Supports various trajectory modeling tasks including trajectory prediction, user linkage, anomaly detection, and more
- Multi-Dataset Compatibility: Compatible with mainstream trajectory datasets such as Foursquare, Gowalla, Brightkite, etc.
- Modular Design: Independent and testable modules supporting both standalone and combined usage
| Specific Task | Supported Models | Dataset Type |
|---|---|---|
| Next Location Prediction | DeepMove, RNN, FPMC, GETNext, LLM-ZS | checkin |
| Travel Time Estimation | DeepTTE, DutyTTE, MulTTTE | gps |
| Trajectory Recovery | TrajBERT | gps |
| Map Matching | DeepMM, GraphMM | map |
| Trajectory User Linkage | DPLink, MainTUL, S2TUL | checkin |
| Trajectory Generation | ActSTD, DSTPP | checkin |
| Trajectory Representation | CACSR | checkin |
| Trajectory Anomaly Detection | GMVSAE | gps |
| Intention Prediction | LIMP | checkin |
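Each row can be run through the pipeline entry point shown in the usage section below. The example that follows is a hypothetical sketch: it assumes the `--base_model` option (listed later among the parameter options) accepts the model names from this table.

```bash
# Hypothetical example: pairing a task with one of its supported models.
# Flag names follow the basic usage section below; exact accepted values may differ.
python plan_agent_run.py \
    --task="Travel_Time_Estimation" \
    --source="chengdu" \
    --target="standard" \
    --base_model="DeepTTE" \
    --gpu_id=0
```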
- Python 3.7+
- CUDA 10.1+ (recommended)
- Multiple conda environments (see environment/ directory)
The project includes multiple pre-configured conda environments. Install as needed:
# View available environments
ls environment/
# Install environments (examples)
conda env create -f environment/libcity_py39_torch231_cu121.txt
conda env create -f environment/STAN_py37_cu101_torch171.txt
# ... other environments

After installing the environments, modify the environment paths in the shell scripts:
# Modify .sh files in base_model/ directory
# Replace BASE_ENV_PATH="your_base_env_path" with actual path
# Example:
BASE_ENV_PATH="/path/to/your/conda/envs"

export PYTHONPATH="/path/to/TrajAgent"
export OPENAI_API_KEY="your_openai_key"
export DEEPINFRA_API_KEY="your_deepinfra_key" # optional
export SiliconFlow_API_KEY="your_siliconflow_key" # optional
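If you have many runner scripts to update, the `BASE_ENV_PATH` placeholder mentioned above can be replaced in one pass. A minimal sketch, assuming the runners live under `UniEnv/base_model/` and still contain the placeholder value:

```bash
# Replace the placeholder in every shell runner (originals are kept as *.bak).
sed -i.bak 's|BASE_ENV_PATH="your_base_env_path"|BASE_ENV_PATH="/path/to/your/conda/envs"|' UniEnv/base_model/*.sh
```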
TrajAgent/
├── plan_agent_run.py          # End-to-end pipeline entry
├── da_agent_run.py            # Data augmentation entry
├── fm_agent_run.py            # Data formatting/conversion entry
├── op_agent_run.py            # Result analysis/summary entry
├── param_agent_run.py         # Parameter optimization entry
├── preprocess/                # Data preprocessing
│   ├── traj_preprocess.py
│   └── traj_preprocess_gps.py
├── data_augmentation/         # Data augmentation + utils
│   ├── da_agent.py
│   └── utils/
│       ├── base_llm.py
│       ├── llm_da_utils.py
│       ├── distribution_sampler.py
│       └── prompts.py
├── model_selection/
│   └── utils/
│       └── utils.py
├── param_optimize/
│   ├── pa_agent.py
│   └── utils/
│       └── utils.py
├── result_optimize/
│   └── optimize_agent.py
├── UniEnv/                    # Unified environment for external models
│   ├── base_model/            # Shell runners for each model
│   ├── model_lib/             # Third-party model code
│   └── etc/
│       ├── settings.py
│       ├── da-config.yaml
│       └── model_config/
├── environment/               # Environment specs (txt)
├── data/                      # Datasets and outputs
│   ├── input_format/
│   ├── model_output/
│   └── aux/
├── nl_input_parser.py         # Natural-language argument parser (rule + LLM fallback)
└── README_EN.md
# Basic usage
python plan_agent_run.py \
--task="Next_Location_Prediction" \
--source="foursquare" \
--target="standard" \
--city="London" \
--gpu_id=0
# Parameter description
# --task: Task type (see supported tasks list)
# --source: Source dataset name
# --target: Target data format (recommend using standard)
# --city: City name (required for certain datasets)
# --gpu_id: GPU device ID

TrajAgent also supports natural-language instructions with a rule-first parser and an LLM fallback. The parser first extracts parameters via rules; if required fields are missing, it invokes the LLM to complete them. Set your API key if you want the LLM fallback.
# Optional (enable LLM fallback):
export OPENAI_API_KEY=your_key
# Optional model control (defaults to gpt-4o-mini if not set):
export LLM_MODEL=gpt-4o-mini
# Example: run Next Location Prediction on agentmove (London), GPU 1, 10 epochs, 5 steps
python plan_agent_run.py \
--query "I'm looking to figure out which points of interest users are likely to visit next in London"Parameter semantics and supported options (also available via nl_input_parser.explain_all_options()):
- task: Map_Matching, Trajectory_Generation, Trajectory_Representation, Trajectory_Recovery, Next_Location_Prediction, Trajectory_User_Linkage, Travel_Time_Estimation, Trajectory_Anomaly_Detection
- source: foursquare, gowalla, brightkite, agentmove, Earthquake, tencent, chengdu
- target: foursquare, gowalla, brightkite, agentmove, standard
- city (required when source=agentmove): CapeTown, London, Moscow, Mumbai, Nairobi, NewYork, Paris, SanFrancisco, SaoPaulo, Sydney, Tokyo, Unknown
- other: gpu_id, base_model, trial_num, max_step, max_epoch, memory_length
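The options above can also be listed programmatically via the `explain_all_options()` helper referenced earlier; a minimal sketch (the output format is whatever the parser returns):

```bash
# Print the full list of supported parameter values from the parser.
python -c "import nl_input_parser; print(nl_input_parser.explain_all_options())"
```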
python traj_preprocess.py \
--city="London" \
--dataset="foursquare" \
--model="DeepMove"python da_agent_run.py \
--task="Next_Location_Prediction" \
--dataset="foursquare" \
--model="DeepMove" \
--city="London" \
--gpu_id=0 \
  --pa_da

python fm_agent_run.py \
--task="Next_Location_Prediction" \
--dataset="foursquare" \
--city="London" \
  --gpu_id=0

python op_agent_run.py \
--task="Next_Location_Prediction" \
--dataset="foursquare" \
--city="London"Place raw data in the data/input_format/ directory:
data/input_format/
├── foursquare/
│   ├── source1.csv    # POI data
│   └── source2.csv    # Check-in data
├── gowalla/
│   └── source1.csv
└── ...
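A quick way to confirm the layout before running the pipeline; a minimal sketch assuming the foursquare filenames shown above:

```bash
# Sanity check that the expected raw files are in place (layout above).
ls data/input_format/foursquare/source1.csv data/input_format/foursquare/source2.csv
```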
| Dataset | Type | Description |
|---|---|---|
| foursquare | checkin | Foursquare check-in data |
| gowalla | checkin | Gowalla social network data |
| brightkite | checkin | Brightkite location data |
| agentmove | checkin | Synthetic trajectory data |
| tencent | map | Tencent map data |
| chengdu | gps | Chengdu taxi trajectory data |
| porto | gps | Porto taxi trajectory data |
| earthquake | time_series | Earthquake time series data |
Evaluation datasets for natural-language command parsing are generated by the scripts under evaluate_userQuery/ (e.g., gen_task_plan.py, eval_task_plan.py). The generated data are stored at:
/data/evaluate
Notes:
- `gen_task_plan.py` produces task-plan JSON files (e.g., `task_plan_6.json`).
- Paths are absolute; ensure the target directory exists and has write permission.
- You can customize prompts/data inside `evaluate_userQuery/` before generation.
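A minimal sketch of the generation-then-evaluation flow, assuming both scripts run without additional arguments (check the script headers for any required options):

```bash
# Generate task-plan evaluation data, then evaluate the parser against it.
python evaluate_userQuery/gen_task_plan.py
python evaluate_userQuery/eval_task_plan.py
```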
The city parameter is implemented based on AgentMove (AgentMove: A Large Language Model based Agentic Framework for Zero-Shot Next Location Prediction), which divides worldwide Foursquare check-in data by city. Supported cities include:
AgentMove City Datasets:
- `CapeTown` - Cape Town
- `London` - London
- `Moscow` - Moscow
- `Mumbai` - Mumbai
- `Nairobi` - Nairobi
- `NewYork` - New York
- `Paris` - Paris
- `SanFrancisco` - San Francisco
- `SaoPaulo` - São Paulo
- `Sydney` - Sydney
- `Tokyo` - Tokyo
Usage Rules:
- When `data_type` is `agentmove`, you can set the `city` parameter to specify the city dataset for training (see the example after this list).
- When `data_type` is any other type, `city` can be set to `None` or `Unknown`.
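For example, assuming the `agentmove` source and the `--city` flag from the basic usage section above, a city-specific run might look like:

```bash
# Hypothetical example: next location prediction on the AgentMove Tokyo split.
python plan_agent_run.py \
    --task="Next_Location_Prediction" \
    --source="agentmove" \
    --target="standard" \
    --city="Tokyo" \
    --gpu_id=0
```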
- All Datasets: Download Link
The model library contains implementations of various trajectory modeling algorithms. You can download the complete model library from:
- Complete Model Library: Download Link
- LibCity: GitHub
Modify UniEnv/etc/da-config.yaml to configure data augmentation parameters:
# Position sampling method
pos_sample_method: ["uniform"]
# Item sampling method
item_sample_method: ["memorybased"]
# Time position
pos: ["time"]
# Operations requiring item sampling
need_item_sampling: ["insert", "replace", "Ti-insert", "Ti-replace"]

Model configuration files are located in the UniEnv/etc/model_config/ directory and can be adjusted as needed.
# Configure new augmentation strategies in da-config.yaml
augment_operation: ["Ti-crop", "Ti-insert", "Ti-mask", "Ti-reorder"]

- Add model code under `UniEnv/model_lib/`
- Add execution scripts under `UniEnv/base_model/` (a hypothetical runner template is sketched below)
- Update the model configuration in `UniEnv/etc/settings.py`
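A minimal sketch of a runner script for a newly added model; the file name, environment name, and entry point are placeholders, and only the `BASE_ENV_PATH` convention is taken from the existing runners:

```bash
#!/bin/bash
# UniEnv/base_model/run_MyModel.sh -- hypothetical template, adapt to the real model.
BASE_ENV_PATH="your_base_env_path"                  # same placeholder used by the existing .sh runners
PYTHON="${BASE_ENV_PATH}/my_model_env/bin/python"   # conda env matching the model's requirements
cd "$(dirname "$0")/../model_lib/MyModel"           # model code added under UniEnv/model_lib/
"${PYTHON}" train.py "$@"                           # forward arguments passed by TrajAgent
```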
# Modify data_augmentation/utils/base_llm.py
LLMWrapper(
    temperature=0,
    model_name="llama3-70b",
    max_tokens=6000,
    model_kwargs={"stop": "\n"},
    platform="DeepInfra"
)

- Adjust the `memory_length` parameter to control the memory length
- Use `max_step` to control the number of reflection rounds
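Both values can also be passed on the command line, assuming the flag names match the parameter list above (`memory_length`, `max_step`):

```bash
# Hypothetical example: shorter memory, five reflection rounds.
python plan_agent_run.py \
    --task="Next_Location_Prediction" \
    --source="foursquare" \
    --target="standard" \
    --city="London" \
    --gpu_id=0 \
    --memory_length=5 \
    --max_step=5
```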
- Adjust `max_epoch` based on dataset characteristics
- Use appropriate `batch_size` and `learning_rate` values
- Multi-GPU training is supported
- Data augmentation operations support parallel execution
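The `max_epoch`, `batch_size`, and `learning_rate` settings above typically live in the per-model configuration files under `UniEnv/etc/model_config/` mentioned earlier. The snippet below is a hypothetical sketch, not the actual schema, so check the config file of the model you are running:

```yaml
# Hypothetical model-config snippet -- key names are illustrative only.
max_epoch: 50        # reduce for small datasets, increase for large ones
batch_size: 64       # lower this if you hit GPU out-of-memory errors
learning_rate: 0.001 # tune together with batch_size
```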
Q: A model script cannot find the conda environment.
A: Ensure that BASE_ENV_PATH in all .sh files has been replaced with the correct conda environment path.

Q: The dataset cannot be found.
A: Check the data path configuration in UniEnv/etc/settings.py to ensure it points to the correct data directory.

Q: Training runs out of GPU memory.
A: Reduce batch_size or use a smaller model, and adjust the gpu_memory_utilization parameter.

Q: LLM API calls fail.
A: Check the API key configuration and network connection, and ensure you have sufficient API quota.
If you find this work helpful, please cite our paper:
@article{du2024trajagent,
title={TrajAgent: An LLM-Agent Framework for Trajectory Modeling via Large-and-Small Model Collaboration},
author={Du, Yuwei and Feng, Jie and Zhao, Jie and Yuan, Jian and Li, Yong},
journal={arXiv preprint arXiv:2410.20445},
year={2024}
}

For questions, please contact us through:
- Submit an Issue
- Send an email to hiimingwei@gmail.com
Note: Please ensure all environment variables and paths are correctly configured before use, and adjust relevant parameters according to specific needs.

