hsnr-data-science/SEDAR-NLI

SEDAR-NLI

Natural Language Interface for SEDAR based on a multi-agent LLM system.

This repository contains the code for our CIKM 2025 paper on a multi-agent natural language interface (NLI) for semantic data lakes. The system enables users to interact with the SEDAR data lake platform using plain language, making advanced data management, discovery, and analytics accessible to non-technical users.

Our approach integrates large language models (LLMs) in a modular, multi-agent architecture. By combining retrieval-augmented generation (RAG) and dynamic tool-calling, the system translates user queries into structured API calls to execute complex workflows over the data lake.
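The retrieval and tool-calling steps described above can be pictured with a small sketch. This is not the repository's implementation. The tool catalogue, the function names (`retrieve_tools`, `build_call`), and the bag-of-words similarity are all assumptions chosen for illustration. A real system would likely use embedding-based retrieval and let the LLM fill in the arguments.

```python
import math
import re
from collections import Counter

# Hypothetical tool catalogue: names and descriptions are illustrative only
# and are not taken from the SEDAR API.
TOOLS = {
    "list_datasets": "list all datasets stored in a workspace",
    "search_datasets": "search datasets by keyword or semantic tag",
    "run_profiling": "compute profiling statistics for a dataset",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_tools(query, k=2):
    """Return the k tool names whose descriptions best match the query."""
    q = Counter(tokenize(query))
    ranked = sorted(
        TOOLS,
        key=lambda name: cosine(q, Counter(tokenize(TOOLS[name]))),
        reverse=True,
    )
    return ranked[:k]


def build_call(tool_name, arguments):
    """Package the chosen tool and its arguments as a structured call."""
    return {"tool": tool_name, "arguments": arguments}


candidates = retrieve_tools("search for datasets by keyword", k=1)
call = build_call(candidates[0], {"keyword": "sales"})
```

Retrieving only a few candidate tools keeps the prompt short. The model then chooses among those candidates instead of the full API surface.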

Main features:

  • Multi-agent orchestration for complex query decomposition and execution
  • Retrieval-augmented generation for relevant API/tool selection
  • Automatic tool-calling for seamless backend integration
  • Evaluation framework for correctness and robustness
  • Fine-tuning on a dedicated dataset tailored to this system
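As a rough picture of the multi-agent orchestration feature, the following sketch passes a shared state through a planner and an executor. It is a sketch under stated assumptions and does not mirror the code in agent_graph/ or agents/. The `State` class, the `planner` and `executor` functions, and the rule of splitting the query on " then " are hypothetical stand-ins for LLM-driven agents.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Shared state handed from one agent to the next (hypothetical)."""
    query: str
    plan: list = field(default_factory=list)
    results: list = field(default_factory=list)


def planner(state):
    # Stand-in for an LLM planner: decompose the query into ordered steps
    # by splitting on the word "then".
    state.plan = [s.strip() for s in state.query.split(" then ") if s.strip()]
    return state


def executor(state):
    # Stand-in for tool-calling agents: record each step as completed.
    for step in state.plan:
        state.results.append(f"done: {step}")
    return state


def run(query):
    """Run the agents in a fixed order over a fresh state."""
    state = State(query=query)
    for agent in (planner, executor):
        state = agent(state)
    return state


final = run("find sales datasets then profile the first one")
```

Keeping the plan and the results in one state object makes each agent a plain function from state to state, so agents can be reordered or replaced without changing the others.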

The repository includes code, datasets, and evaluation scripts.

Repository Structure

Main Code

  • main.py: Entry point for running the system without the Chainlit chat interface.
  • chainlit_chat: Entry point for the system with the Chainlit interface.
  • agent_graph/: Multi-agent orchestration logic.
  • agents/: Agent implementations.
  • models/: Model configuration and management.
  • sedarapi/: API integration with the SEDAR data lake.
  • prompts/: Prompt templates and compression logic.
  • tools/: Tool definitions for agent actions.
  • utils/: Utility functions.

Evaluation

Finetuning

Data

  • data/: Sample data used for evaluation (e.g., CSVs, JSON files).
