This project demonstrates a framework for evaluating Large Language Models (LLMs) on chemical engineering question-and-answer (Q&A) tasks using LangSmith.
- Q&A data is loaded from an Excel file containing chemical engineering question-answer pairs
- The data is converted to a Hugging Face Dataset (see the loading sketch after this list)
- Various LLMs are evaluated, including open-source models and API-based services
- A separate "judge" model (e.g., GPT-4o) is used for evaluation
Custom metrics, each scored on a scale of 1-5 by the judge model (a scoring sketch follows this list):
- Completeness
- Relevance
- Conciseness
- Confidence
- Factuality
- Judgment
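A hedged sketch of how a judge model might produce these scores, assuming a LangChain ChatOpenAI judge that replies with JSON; the prompt wording, function name, and parsing below are illustrative and not the notebook's exact factor_evaluator() logic:

```python
# Illustrative judge-scoring helper; prompt and output parsing are assumptions.
import json
from langchain_openai import ChatOpenAI

METRICS = ["Completeness", "Relevance", "Conciseness",
           "Confidence", "Factuality", "Judgment"]

judge = ChatOpenAI(model="gpt-4o", temperature=0)

def score_answer(question: str, reference: str, candidate: str) -> dict:
    """Ask the judge to rate the candidate answer 1-5 on each metric."""
    prompt = (
        "You are grading an answer to a chemical engineering question.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Candidate answer: {candidate}\n"
        f"Rate the candidate 1-5 on each of: {', '.join(METRICS)}. "
        "Reply with a JSON object mapping each metric to an integer."
    )
    reply = judge.invoke(prompt)
    # Assumes the judge returns bare JSON; real code may need more robust parsing.
    return json.loads(reply.content)   # e.g. {"Completeness": 4, ...}
```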
- Creates datasets and runs evaluations using the LangSmith platform (a dataset-creation sketch follows this list)
- Generates distribution plots for each evaluation metric
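A sketch of the dataset-creation step against the LangSmith SDK; the dataset name and field names are assumptions, and qa_dataset is the Hugging Face Dataset loaded in the earlier sketch:

```python
# Push the Q&A pairs to LangSmith; dataset name and field names are assumptions.
from langsmith import Client

client = Client()  # reads the LangSmith API key from the environment

ls_dataset = client.create_dataset(
    dataset_name="cheme-qa-eval",
    description="Chemical engineering Q&A pairs",
)
client.create_examples(
    inputs=[{"question": q} for q in qa_dataset["question"]],
    outputs=[{"answer": a} for a in qa_dataset["answer"]],
    dataset_id=ls_dataset.id,
)
```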
- get_model(): Initializes the specified LLM
- predict(): Generates responses from the LLM for given questions
- factor_evaluator(): Evaluates LLM responses using the judge model
- plot_figures_metrics(): Creates visualizations of evaluation results
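A hedged sketch of what plot_figures_metrics() might look like, assuming the judge scores have been collected into a dict mapping each metric to a list of 1-5 values; the real notebook's implementation may differ:

```python
# Illustrative plotting helper: one score-distribution histogram per metric.
# The scores layout (metric name -> list of 1-5 integers) is an assumption.
import matplotlib.pyplot as plt

def plot_figures_metrics(scores: dict) -> None:
    fig, axes = plt.subplots(2, 3, figsize=(12, 6))
    for ax, (metric, values) in zip(axes.flat, scores.items()):
        ax.hist(values, bins=range(1, 7), align="left", rwidth=0.8)
        ax.set_title(metric)
        ax.set_xlabel("Score (1-5)")
        ax.set_ylabel("Count")
    fig.tight_layout()
    plt.show()
```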
- Set up environment variables (API keys, etc.)
- Specify the input dataset and models to evaluate
- Run the evaluation loop (sketched after this list), which:
  - Generates responses from each model
  - Evaluates responses using the judge model
  - Logs results to LangSmith
- Analyze results through the LangSmith dashboard and the generated plots
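A hedged sketch of how this loop could be wired up with the LangSmith SDK's evaluate() helper. It assumes the notebook's get_model() and predict() helpers (signatures assumed), the score_answer() judge sketch above, and the dataset name used earlier; the model list is hypothetical:

```python
# Illustrative evaluation loop; model names, dataset name, and wiring are assumptions.
import os
from langsmith import evaluate

os.environ.setdefault("LANGSMITH_API_KEY", "...")  # plus OPENAI_API_KEY, etc.

def make_target(model):
    """Wrap a model so evaluate() can call it on each dataset example."""
    def target(inputs: dict) -> dict:
        # predict() is the notebook's helper; its signature is assumed here.
        return {"answer": predict(model, inputs["question"])}
    return target

def judged_metrics(run, example) -> dict:
    """Score the model's answer with the judge and log one result per metric."""
    scores = score_answer(
        example.inputs["question"],
        example.outputs["answer"],
        run.outputs["answer"],
    )
    return {"results": [{"key": k, "score": v} for k, v in scores.items()]}

for model_name in ["gpt-4o-mini", "some-open-source-model"]:  # hypothetical list
    model = get_model(model_name)  # notebook helper
    evaluate(
        make_target(model),
        data="cheme-qa-eval",      # LangSmith dataset created earlier (assumed name)
        evaluators=[judged_metrics],
        experiment_prefix=f"cheme-eval-{model_name}",
    )
```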
- The notebook is designed to work with various compute environments (local, Google Colab, RunPod, etc.)
- It includes options for CPU and GPU acceleration (see the device check after this list)
- Evaluation can be resource-intensive, especially for larger models and datasets
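For example, a standard PyTorch check can be used to select between CPU and GPU execution before loading an open-source model:

```python
# Standard PyTorch device check for choosing CPU or GPU execution.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running evaluation on: {device}")
```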
This framework allows for systematic comparison of LLM performance on domain-specific tasks, providing insights into model capabilities and areas for improvement.