Tabular Data Bot

This is an experiment in doing RAG over tabular data. We follow "Eval Driven Development": we start with an evaluation dataset and focus on improving the agent's scores on those evals.

Process

Here is the process we followed to iterate on the bot.

  1. Data Collection - We start by cleaning the data: we remove bad rows and normalize the movie data. Then we add very simple semantic search functionality by computing an embeddings column and an embedding_norm column; precomputing the norms significantly speeds up the similarity search.
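
The norm trick can be sketched as follows. This is a minimal illustration with numpy and pandas, using assumed column names (`embeddings`, `embedding_norm`) rather than the repo's actual code.

```python
import numpy as np
import pandas as pd

def add_embedding_norm(df: pd.DataFrame) -> pd.DataFrame:
    """Precompute the L2 norm of each row's embedding once, so cosine
    similarity at query time reduces to a dot product and a divide."""
    df["embedding_norm"] = df["embeddings"].apply(np.linalg.norm)
    return df

def semantic_search(df: pd.DataFrame, query_embedding: np.ndarray, k: int = 5) -> pd.DataFrame:
    """Rank rows by cosine similarity to the query embedding."""
    matrix = np.stack(df["embeddings"].to_list())          # (n_rows, dim)
    query_norm = np.linalg.norm(query_embedding)
    sims = (matrix @ query_embedding) / (df["embedding_norm"].to_numpy() * query_norm)
    return df.assign(score=sims).nlargest(k, "score")
```

Because the norms are computed once at load time, each query costs only one matrix-vector product and an elementwise divide.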

  2. Build the Agent - We build a super basic agent that can invoke a single tool for searching movies. We use LangGraph to orchestrate the agent in the notebook before moving it to the langgraph.py file.
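
Conceptually, the loop that LangGraph orchestrates for us looks roughly like the plain-Python sketch below. `call_llm` and `search_movies` are hypothetical stand-ins for the chat model and the movie-search tool defined in langgraph.py, and the message format is illustrative, not LangGraph's actual state schema.

```python
def run_agent(question, call_llm, search_movies, max_steps=5):
    """Single-tool agent loop: the model either calls the search tool
    (and sees its results on the next turn) or answers directly."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        if reply.get("tool_call"):
            # Model asked to search: run the tool and feed results back.
            messages.append({"role": "assistant", "tool_call": reply["tool_call"]})
            results = search_movies(**reply["tool_call"]["args"])
            messages.append({"role": "tool", "content": str(results)})
        else:
            # Model produced a final answer.
            return reply["content"]
    raise RuntimeError("agent did not answer within max_steps")
```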

  3. Build the Ground Truth Dataset - We come up with example questions that we expect users to ask. We then run the agent against these questions with LangSmith to store the traces, and add the traces to a LangSmith Annotation Queue, which makes it easy for us to manually correct the answers before adding them to a dataset. I went through and corrected the answers for all of the examples by hand before adding them to a LangSmith Dataset.
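
The hand-corrected examples end up as question/answer pairs. The shape below is purely illustrative; the field names are my own, not LangSmith's dataset schema.

```python
# Illustrative shape of the hand-corrected ground truth examples.
ground_truth = [
    {
        "question": "Who directed Titanic?",
        "reference_answer": "Titanic (1997) was directed by James Cameron.",
    },
    {
        "question": "What year was The Matrix released?",
        "reference_answer": "The Matrix was released in 1999.",
    },
]
```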

  4. Evaluation - We build an LLM-as-a-Judge that, for each example, takes the correct answer from our ground truth dataset and the answer the agent produced, and decides whether the agent's answer is correct. Because we have already defined what correct answers look like, this lets us score the agent's performance automatically.
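
A minimal sketch of such a judge, assuming a hypothetical `call_llm` function that wraps whatever chat model you use:

```python
JUDGE_PROMPT = """You are grading an AI agent's answer against a reference.
Question: {question}
Reference answer: {reference}
Agent answer: {answer}
Reply with exactly CORRECT or INCORRECT."""

def judge(example, agent_answer, call_llm):
    """Return True if the judge model deems the agent's answer correct.
    `example` is a ground-truth record with question/reference_answer keys."""
    verdict = call_llm(JUDGE_PROMPT.format(
        question=example["question"],
        reference=example["reference_answer"],
        answer=agent_answer,
    ))
    return verdict.strip().upper().startswith("CORRECT")
```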

  5. Iterate - Now we have a baseline of how the agent performs and everything we need to start iterating. At this point, we run the evals and check which examples failed. We decide, based on expected value, which features to add and in what order, implement them, and run the evals again to see how much the results have improved.
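
Summarizing a run reduces to computing the pass rate and collecting the failures to triage for the next iteration, e.g.:

```python
def summarize_run(results):
    """Given (example, passed) pairs from an eval run, return the overall
    pass rate and the list of failed examples to prioritize next."""
    failed = [example for example, passed in results if not passed]
    score = 1 - len(failed) / len(results) if results else 0.0
    return score, failed
```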

  6. Expand the Dataset - Once the score reaches a level we are happy with, our job becomes to bring it back down by expanding the dataset: we add new examples that the agent cannot yet solve correctly, then repeat the process from step 3.

About

An eval driven AI agent pointed at tabular movie data
