🤖 AI Data Scientist

A hierarchical multi-agent system that mimics the job duties of a data scientist, built with LangGraph and Vanna.ai.

Medium article: AI Data Scientist: An Attempt to Replace My Own Job

Features

Coder Agent: Generates and executes Python code, returning results.
Data Analyst Agent: Answers questions about data using text-to-SQL.
Slides Generator Agent: Creates PowerPoint presentations.

The agents are managed by a supervisor to fulfill user requests, with short-term memory persistence.
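
The supervisor pattern can be illustrated with a minimal plain-Python sketch that shows only the control flow. Everything here is an assumption made for illustration: the three worker functions are stubs, the keyword-based `route` rule stands in for whatever decision logic the real supervisor uses, and the `history` list is a toy substitute for short-term memory. The project itself builds this on LangGraph and may route quite differently.

```python
from typing import Callable, Dict, List


# Hypothetical stand-ins for the three agents; each takes a request string.
def coder(request: str) -> str:
    return f"[coder] would generate and run Python for: {request}"


def data_analyst(request: str) -> str:
    return f"[data_analyst] would answer via text-to-SQL: {request}"


def slides_generator(request: str) -> str:
    return f"[slides_generator] would build a deck for: {request}"


WORKERS: Dict[str, Callable[[str], str]] = {
    "coder": coder,
    "data_analyst": data_analyst,
    "slides_generator": slides_generator,
}


def route(request: str) -> str:
    """Toy keyword router; a real supervisor would typically ask an LLM."""
    text = request.lower()
    if "slide" in text or "presentation" in text:
        return "slides_generator"
    if "plot" in text or "code" in text:
        return "coder"
    return "data_analyst"


class Supervisor:
    """Picks a worker per request and remembers what happened this session."""

    def __init__(self) -> None:
        self.history: List[dict] = []  # toy short-term memory

    def handle(self, request: str) -> str:
        worker = route(request)
        result = WORKERS[worker](request)
        self.history.append(
            {"request": request, "worker": worker, "result": result}
        )
        return result
```

Calling `Supervisor().handle("Make slides summarizing sales")` dispatches to the slides stub and records the exchange in `history`, which later requests could consult for context.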

Visualizing the system:

Usage

Setup

Clone the repository

```
git clone https://github.com/cintiaching/ai-data-scientist
cd ai-data-scientist
```

Install dependencies using uv
```
uv sync
source .venv/bin/activate
```
Create a .env file from .env.example and set the required environment variables.
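
A quick way to confirm that nothing was left unset is to compare the two files. The sketch below uses only the standard library and assumes both files contain simple `KEY=VALUE` lines; the helper names `parse_env` and `missing_keys` are hypothetical and are not part of this repository.

```python
from pathlib import Path
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(example_text: str, env_text: str) -> List[str]:
    """Return keys listed in the example that are absent or empty in .env."""
    example = parse_env(example_text)
    actual = parse_env(env_text)
    return [key for key in example if not actual.get(key)]


if __name__ == "__main__":
    example_path, env_path = Path(".env.example"), Path(".env")
    if example_path.exists() and env_path.exists():
        missing = missing_keys(example_path.read_text(), env_path.read_text())
        print("Missing or empty:", missing or "none")
```

Run it from the repository root after editing .env; any key it reports still needs a value.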

Ingest Data

This project uses an SQLite database to store data. By default, it pulls data from dataceo/sales-and-customer-data on Kaggle. The data is downloaded automatically from Kaggle Hub when you run ingest_data.py. You can modify the script to ingest data of your choice.
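
If you want to ingest your own data, the core step is loading rows into an SQLite table. The following is a minimal sketch of that step using only the standard library. It is not the code in ingest_data.py: the `ingest_csv` helper, the `sales` table name, and the sample rows are made up for illustration, every column is stored as TEXT for simplicity, and the Kaggle download is left out entirely.

```python
import csv
import io
import sqlite3
from typing import TextIO


def ingest_csv(conn: sqlite3.Connection, table: str, csv_file: TextIO) -> int:
    """Create `table` from the CSV header and insert every row as TEXT."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    # Recreate the table so repeated runs do not duplicate rows.
    conn.execute(f'DROP TABLE IF EXISTS "{table}"')
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    rows = list(reader)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return len(rows)


# Tiny in-memory demo with made-up rows (not the Kaggle data).
sample = io.StringIO("customer_id,amount\nC1,10.5\nC2,20.0\n")
conn = sqlite3.connect(":memory:")
inserted = ingest_csv(conn, "sales", sample)
total = conn.execute(
    "SELECT SUM(CAST(amount AS REAL)) FROM sales"
).fetchone()[0]
```

For a real file, open it with `open(path, newline="")` and pass a connection to an on-disk database instead of `:memory:`.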

Training

Because the project uses Vanna.ai, the agent must be trained before it can understand your data, much as a data scientist first learns about a dataset. For more details, see the Vanna.ai Training Documentation. Modify train.py to incorporate your domain knowledge and use case.
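
One common starting point for training is the database schema itself. The sketch below reads the `CREATE` statements that SQLite keeps in `sqlite_master` and hands each one to a training callable. It does not reproduce train.py: `collect_ddl` and `train_on_schema` are hypothetical helpers, and the demo uses a recording stub in place of a configured Vanna instance. With a real instance `vn`, you would pass `vn.train`, which accepts a `ddl` keyword argument among others.

```python
import sqlite3
from typing import Callable, List


def collect_ddl(conn: sqlite3.Connection) -> List[str]:
    """Return the CREATE statements SQLite stores for its schema objects."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def train_on_schema(
    conn: sqlite3.Connection, train: Callable[..., object]
) -> int:
    """Call train(ddl=...) once per schema statement; return how many."""
    ddl_statements = collect_ddl(conn)
    for ddl in ddl_statements:
        train(ddl=ddl)
    return len(ddl_statements)


# Demo with a recording stub instead of a real Vanna instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer_id TEXT, amount REAL)")
seen: List[dict] = []
count = train_on_schema(conn, lambda **kwargs: seen.append(kwargs))
```

Schema alone rarely captures business meaning, so domain notes and example question and SQL pairs are still worth adding on top of it.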

LLM Setup

Navigate to agents/llm.py to configure the LLM settings.
Ensure the necessary environment variables are set for LLM configuration.
Model Recommendation: Use a highly capable LLM for code generation, since the Coder Agent depends on it to produce working Python. For options, see the Chatbot Arena Benchmark.

Entry script

To run the entry script:

```
python main.py
```

To start the local Streamlit UI, run:

```
python -m streamlit run app.py
```
