Natural Language Interface for SEDAR based on a multi-agent LLM system.
This repository contains the code for our CIKM 2025 paper on a multi-agent natural language interface (NLI) for semantic data lakes. The system enables users to interact with the SEDAR data lake platform using plain language, making advanced data management, discovery, and analytics accessible to non-technical users.
Our approach integrates large language models (LLMs) in a modular, multi-agent architecture. By combining retrieval-augmented generation (RAG) and dynamic tool-calling, the system translates user queries into structured API calls to execute complex workflows over the data lake.
Main features:
- Multi-agent orchestration for complex query decomposition and execution
- Retrieval-augmented generation for relevant API/tool selection
- Automatic tool-calling for seamless backend integration
- Evaluation framework for correctness and robustness
- We performed finetuning on a dedicated dataset specifically tailored for this system.
The repository includes code, datasets, and evaluation scripts.
- main.py: Entry point for the system without chainlit chat.
- chainlit_chat: Entry point for the system with the Chainlit interface.
- agent_graph/: Multi-agent orchestration logic.
- agents/: Agent implementations.
- models/: Model configuration and management.
- sedarapi/: API integration with the SEDAR data lake.
- prompts/: Prompt templates and compression logic.
- tools/: Tool definitions for agent actions.
- utils/: Utility functions.
- evaluation/: Contains all evaluation code.
- evaluation/evaluation.py: Script for running quantitative evaluation.
- evaluation/queries_similarity/: Semantic variations of queries.
- finetuning/: All code for finetuning.
- finetuning/dataset.jsonl: Main dataset in ShareGPT format.
- finetuning/data/: Sample datasets used in the datalake for finetuning.
- finetuning/langsmith_chat_loader.py: Loads LLM runs from LangSmith and creates datasets for finetuning.
- data/: Sample data used for evaluation (e.g., CSVs, JSON files).