gg8_langchain_demo

gg8_langchain_demo is a demonstration project that shows how to integrate GridGain/Apache Ignite with LangChain using the custom langchain_gridgain package. It provides examples of using GridGain as the backend for various LangChain components, built around a laptop recommendation system.

Features

GridGain-based key-value store, chat history, LLM cache, and document loader
Document Loader (GridGain) for managing the laptop reviews
Key-Value Store (GridGain) for managing the laptop specs
Custom Retriever that combines data from the Document Loader and the Key-Value Store to populate the vector store and to assist with retrieval
Vector Store (GridGain) for efficient similarity search over both reviews and specs
LLM Cache (GridGain) for caching LLM responses to exactly matching queries
Semantic LLM Cache (GridGain) for caching LLM responses to similar user queries
OpenAI-based language model and embeddings
Conversational AI system for laptop recommendations
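The Custom Retriever's job of joining the two GridGain sources can be pictured with a small, stdlib-only sketch. It assumes, hypothetically, that each review carries a laptop_id that matches a key in the specs store; the Review and CombinedEntry types and the combine_for_indexing helper are illustrative and are not taken from custom_retriever.py.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    # Hypothetical shape of a review record returned by the document loader
    laptop_id: str
    text: str


@dataclass
class CombinedEntry:
    laptop_id: str
    text: str
    metadata: dict = field(default_factory=dict)


def combine_for_indexing(reviews: list, specs: dict) -> list:
    """Attach each laptop's specs to its reviews so both land in one vector-store entry."""
    entries = []
    for review in reviews:
        spec = specs.get(review.laptop_id, {})
        # Render specs as sorted "key: value" pairs so they are embedded with the review text
        spec_text = ", ".join(f"{k}: {v}" for k, v in sorted(spec.items()))
        text = f"Review: {review.text}"
        if spec_text:
            text += f"\nSpecs: {spec_text}"
        # Keep the specs as metadata too, so they can be filtered or shown later
        entries.append(CombinedEntry(review.laptop_id, text, {"laptop_id": review.laptop_id, **spec}))
    return entries


reviews = [Review("lp1", "Great battery life."), Review("lp2", "Runs hot under load.")]
specs = {"lp1": {"cpu": "i7", "ram_gb": 16}}
for entry in combine_for_indexing(reviews, specs):
    print(entry.text)
```

Reviews without a matching spec entry are still indexed, just without the specs line; a real retriever might instead skip or flag them.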

The system adapts its behavior to the command-line arguments it is given, so the retrieval and storage mechanisms can be configured flexibly.

Architecture & Project Structure

System Architecture

The above diagram illustrates the architecture and flow of the laptop recommendation system. Here's a breakdown of the components and their interactions:

User: Initiates the process by inputting a query.
Main: The central component that orchestrates the entire process.
DataPopulator: Responsible for populating the data stores with laptop information.
OpenAIEmbeddings: Generates embeddings for laptop reviews, specs, and user queries.
CustomRetriever: Retrieves relevant documents based on the user's query.
Vector Store: Stores and searches vector representations of laptop data (FAISS or GridGain-based).
RetrievalChain: Combines retrieved documents and conversation history to generate a context for the LLM.
OpenAI LLM: The language model that generates responses based on the provided context.
GridGain-based stores: The document loader, key-value store, chat history, and LLM caches that hold the reviews, specs, conversation turns, and cached LLM responses.
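The RetrievalChain step, where retrieved documents and conversation history are merged into one context for the LLM, can be pictured with a stdlib-only sketch. It is a simplified illustration rather than LangChain's retrieval chain: the build_prompt name, the prompt wording, and the (role, text) history format are all assumptions.

```python
def build_prompt(question: str, documents: list, history: list) -> str:
    """Assemble a single LLM prompt from retrieved documents and prior (role, text) turns."""
    # Number the retrieved documents so the model can refer to them
    context = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents, start=1))
    # Replay earlier turns as "role: text" lines; empty when history is disabled
    turns = "\n".join(f"{role}: {text}" for role, text in history)
    sections = [
        "You are a laptop recommendation assistant. Answer using only the context.",
        f"Context:\n{context}" if context else "Context:\n(none)",
    ]
    if turns:
        sections.append(f"Conversation so far:\n{turns}")
    sections.append(f"user: {question}")
    return "\n\n".join(sections)


prompt = build_prompt(
    "Which laptop has the best battery?",
    ["Review: Great battery life.\nSpecs: cpu: i7, ram_gb: 16"],
    [("user", "I need something light."), ("assistant", "How much RAM do you need?")],
)
print(prompt)
```

With chat history disabled, passing an empty history list simply omits the conversation section.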

Project Files

main.py: Main script that sets up and runs the laptop recommendation system.
custom_retriever.py: Implements the CustomRetriever class for FAISS-based retrieval.
retriever_instantiator.py: Instantiates and initializes the retriever.
utils.py: Contains utility functions for initializing various components.
data_generator.py: Generates sample laptop data.
csv_data_generator.py: Creates a CSV file with vector embeddings for GridGain's vector store.
data_loader.py: Loads data into the document loader, the specs key-value store, and the vector store.

Prerequisites

Python 3.11.7
- You can use pyenv to manage multiple Python versions (optional):
  1. Install pyenv and the pyenv-virtualenv plugin: brew install pyenv pyenv-virtualenv (or use your system's package manager)
  2. Install Python 3.11.7, then create and activate the environment:
```
pyenv install 3.11.7
pyenv virtualenv 3.11.7 langchain-env
source $HOME/.pyenv/versions/langchain-env/bin/activate
```
- Alternatively, install Python 3.11.7 directly.
A running GridGain Enterprise or Ultimate Edition cluster, version 8.9.17 or later (see the release notes)
- Make sure your license includes access to the vector search feature.
- If you see an error mentioning gridgain-vector-query or saying that vector search is not enabled, enable the module by moving libs/optional/gridgain-vector-query into the libs/ folder and restarting the node.
OpenAI API key
1. Visit https://platform.openai.com/signup
2. Create an account (or sign in if you already have one)
3. Once logged in, go to the Settings->API section: https://platform.openai.com/settings/organization/api-keys
4. Click "Create new secret key"
5. Add your billing information (required for API access)
6. Copy and save your API key immediately after creating it; you won't be able to view it again after leaving the page
7. If you see quota errors from OpenAI, purchase OpenAI credits

Installation

Clone this repository:

```
git clone https://github.com/GridGain-Demos/gg8_langchain_demo.git
cd gg8_langchain_demo
```

Install all the required dependencies:

```
pip install langchain-gridgain==1.0.2 langchain==0.3.21 langchain-community~=0.3.20 langchain-openai==0.2.12
```

Usage

The demo has a single entry point, main.py, which runs the laptop recommendation bot:

```
cd src
python main.py [--load_data true|false] [--use_history true|false] [--use_semantic_llm_cache true|false] [--use_api_key YOUR_OPENAI_KEY]
```

Arguments:

--load_data: Whether to load the pre-created data in the src/data folder
--use_history: Whether to use chat history (default: false). When false, the LLM cache for exact user queries is switched on instead.
--use_semantic_llm_cache: Whether to use the semantic LLM cache, which caches LLM responses for similar user queries (default: false). When true, chat history is disabled.
--use_api_key: Your OpenAI API key (if not provided, you'll be prompted for it)
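The interplay between these flags can be sketched with argparse. This is one plausible reading of the rules listed above, not the actual code in main.py: the str_to_bool and resolve_mode helpers are hypothetical, and the default for --load_data is an assumption because the README does not state it.

```python
import argparse
from typing import Optional, Tuple


def str_to_bool(value: str) -> bool:
    """Convert a command-line 'true'/'false' string (any case) to a bool."""
    lowered = value.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Laptop recommendation bot (sketch)")
    # Default for --load_data is assumed; the other two defaults follow the README
    parser.add_argument("--load_data", type=str_to_bool, default=False)
    parser.add_argument("--use_history", type=str_to_bool, default=False)
    parser.add_argument("--use_semantic_llm_cache", type=str_to_bool, default=False)
    parser.add_argument("--use_api_key", default=None, help="prompted for if omitted")
    return parser


def resolve_mode(args: argparse.Namespace) -> Tuple[bool, Optional[str]]:
    """Return (history_enabled, cache_kind) following the flag rules above."""
    # Turning on the semantic cache disables chat history
    history = args.use_history and not args.use_semantic_llm_cache
    if args.use_semantic_llm_cache:
        return history, "semantic"
    # Without history, the exact-match LLM cache is switched on
    return history, (None if history else "exact")


args = build_parser().parse_args(["--use_history", "true"])
print(resolve_mode(args))
```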

Initialization

The following components are initialized in the utils.py file:

OpenAI Embeddings:

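```python
import os
from langchain_openai import OpenAIEmbeddings

def initialize_embeddings_model(api_key):
    # LangChain's OpenAI classes read the key from this environment variable
    os.environ["OPENAI_API_KEY"] = api_key
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    return embeddings
```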

OpenAI LLM:

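```python
import os
from langchain_openai import OpenAI

def initialize_opneai_llm(api_key):
    os.environ["OPENAI_API_KEY"] = api_key
    # OpenAI() is LangChain's completion-model wrapper, using its default model
    llm = OpenAI()
    return llm
```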

GridGain Document Loader:

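```python
# GridGainDocumentLoader is provided by the langchain_gridgain package;
# client is the GridGain client connection passed in by the caller
def initialize_doc_loader(client):
    doc_loader = GridGainDocumentLoader(
        cache_name="review_cache",  # cache holding the laptop reviews
        client=client,
        create_cache_if_not_exists=True  # create the cache on first run
    )
    return doc_loader
```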

GridGain Key-Value Store:

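```python
# GridGainStore is provided by the langchain_gridgain package
def initialize_keyvalue_store(client):
    key_value_store = GridGainStore(
        cache_name="laptop_specs",  # cache holding the laptop specs
        client=client
    )
    return key_value_store
```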
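LangChain key-value stores expose batched mset/mget calls. Assuming GridGainStore follows that BaseStore interface, reading and writing specs follows the call pattern of the dict-backed stand-in below. InMemorySpecStore is hypothetical and only mirrors the method names so the pattern can run without a cluster; the "laptop:<id>" key scheme and JSON string values are illustrative choices.

```python
import json
from typing import Iterator, List, Optional, Sequence, Tuple


class InMemorySpecStore:
    """Dict-backed stand-in mirroring LangChain's BaseStore method names."""

    def __init__(self) -> None:
        self._data = {}

    def mset(self, key_value_pairs: Sequence[Tuple[str, str]]) -> None:
        for key, value in key_value_pairs:
            self._data[key] = value

    def mget(self, keys: Sequence[str]) -> List[Optional[str]]:
        # Missing keys come back as None, keeping results aligned with the input keys
        return [self._data.get(key) for key in keys]

    def mdelete(self, keys: Sequence[str]) -> None:
        for key in keys:
            self._data.pop(key, None)

    def yield_keys(self, prefix: Optional[str] = None) -> Iterator[str]:
        for key in list(self._data):
            if prefix is None or key.startswith(prefix):
                yield key


store = InMemorySpecStore()
# Specs serialized as JSON strings under an illustrative "laptop:<id>" key
store.mset([("laptop:lp1", json.dumps({"cpu": "i7", "ram_gb": 16}))])
raw = store.mget(["laptop:lp1", "laptop:missing"])
specs = [json.loads(v) if v is not None else None for v in raw]
print(specs)
```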

GridGain Chat History:

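```python
def initialize_chathistory_store(client):
    # GridGainChatMessageHistory is provided by the langchain_gridgain package;
    # messages in the "chat_history" cache are scoped to this session id
    chat_history = GridGainChatMessageHistory(
        session_id="user_session",
        cache_name="chat_history",
        client=client
    )
    return chat_history
```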
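Chat histories in LangChain are scoped by a session identifier, which is why session_id is passed above. The stdlib stand-in below illustrates that idea: a shared dict plays the role of the chat_history cache, and each session sees only its own messages. SessionChatHistory is hypothetical, not the GridGain class; its method names (add_user_message, add_ai_message, messages, clear) mirror LangChain's chat-history interface, but messages are plain (role, text) tuples rather than LangChain message objects.

```python
from collections import defaultdict

# Shared backing store standing in for the "chat_history" cache, keyed by session id
_BACKING_CACHE = defaultdict(list)


class SessionChatHistory:
    """Hypothetical session-scoped history mirroring LangChain's method names."""

    def __init__(self, session_id: str) -> None:
        self.session_id = session_id

    @property
    def messages(self) -> list:
        # Return a copy so callers cannot mutate the stored history directly
        return list(_BACKING_CACHE[self.session_id])

    def add_user_message(self, text: str) -> None:
        _BACKING_CACHE[self.session_id].append(("human", text))

    def add_ai_message(self, text: str) -> None:
        _BACKING_CACHE[self.session_id].append(("ai", text))

    def clear(self) -> None:
        _BACKING_CACHE.pop(self.session_id, None)


history = SessionChatHistory("user_session")
history.add_user_message("I need a light laptop.")
history.add_ai_message("How much RAM do you need?")
other = SessionChatHistory("another_session")
print(len(history.messages), len(other.messages))
```

Two objects created with the same session_id read the same conversation, so a fixed id such as "user_session" means every caller shares one history.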

GridGain LLM Cache:

```python
def initialize_llm_cache(client):
    # Exact-match cache for LLM responses (GridGainCache, from langchain_gridgain)
    llm_cache = GridGainCache(
        cache_name="llm_cache",
        client=client
    )
    return llm_cache
```
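An exact-match cache only hits when the prompt string is identical. The stand-in below follows the lookup/update shape of LangChain's BaseCache, where entries are keyed by the prompt together with an llm_string describing the model configuration. ExactMatchCache is an illustrative sketch that stores plain strings instead of LangChain Generation objects; it is not GridGainCache itself, and the llm_string value is made up.

```python
from typing import Optional


class ExactMatchCache:
    """Illustrative exact-match cache with BaseCache-style lookup/update methods."""

    def __init__(self) -> None:
        self._entries = {}

    def lookup(self, prompt: str, llm_string: str) -> Optional[list]:
        # Only an identical prompt (and identical model config) produces a hit
        return self._entries.get((prompt, llm_string))

    def update(self, prompt: str, llm_string: str, return_val: list) -> None:
        self._entries[(prompt, llm_string)] = return_val

    def clear(self) -> None:
        self._entries.clear()


cache = ExactMatchCache()
model = "openai:temperature=0.7"  # hypothetical llm_string
cache.update("Best laptop for gaming?", model, ["Consider the lp1 model."])
print(cache.lookup("Best laptop for gaming?", model))  # hit
print(cache.lookup("best laptop for gaming?", model))  # miss: different case
```

In LangChain, a cache object like this is typically registered globally with set_llm_cache from langchain_core.globals.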

GridGain Semantic LLM Cache:

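```python
# GridGainCache and GridGainSemanticCache are provided by the langchain_gridgain package
def initialize_semantic_llm_cache(client, embedding) -> GridGainSemanticCache:
    # Exact-match cache passed to the semantic cache
    llm_cache = GridGainCache(
        cache_name="llm_cache",
        client=client
    )
    semantic_cache = GridGainSemanticCache(
        llm_cache=llm_cache,
        cache_name="semantic_llm_cache",
        client=client,
        embedding=embedding,  # embeds user queries for similarity comparison
        similarity_threshold=0.85  # minimum similarity for a cached response to be reused
    )
    return semantic_cache
```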
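The similarity_threshold=0.85 setting means a new query reuses a cached answer only if it is close enough to a previously seen query. The sketch below assumes cosine similarity as the measure and uses tiny hand-written vectors in place of real embeddings; GridGainSemanticCache may compute or compare similarity differently, so ToySemanticCache and its vectors are purely illustrative.

```python
import math
from typing import Optional


def cosine_similarity(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToySemanticCache:
    def __init__(self, similarity_threshold: float = 0.85) -> None:
        self.similarity_threshold = similarity_threshold
        self._entries = []  # (query_vector, answer) pairs

    def update(self, query_vector: list, answer: str) -> None:
        self._entries.append((query_vector, answer))

    def lookup(self, query_vector: list) -> Optional[str]:
        # Find the most similar stored query; accept it only at or above the threshold
        best_score, best_answer = -1.0, None
        for vector, answer in self._entries:
            score = cosine_similarity(query_vector, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.similarity_threshold else None


cache = ToySemanticCache(similarity_threshold=0.85)
cache.update([1.0, 0.0, 0.2], "Try the lp1 for gaming.")
print(cache.lookup([0.9, 0.1, 0.25]))  # close paraphrase: hit
print(cache.lookup([0.0, 1.0, 0.0]))   # unrelated query: miss
```

Raising the threshold makes reuse stricter (fewer stale answers, more LLM calls); lowering it does the opposite.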

GridGain Vector Store:

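```python
def initialize_vector_store(client, embedding_model):
    # GridGainVectorStore is provided by langchain_gridgain; texts are embedded with embedding_model
    vector_store = GridGainVectorStore(
        cache_name="vector_cache",
        client=client,
        embedding=embedding_model,
    )
    return vector_store
```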
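A vector store answers a query by embedding it and returning the stored entries with the highest similarity. The brute-force version below shows that idea with toy 2-D vectors; it is a conceptual sketch, not how GridGainVectorStore indexes or scores data internally. In LangChain, the corresponding call on a vector store is similarity_search(query, k=...), which takes query text and embeds it with the configured embedding model.

```python
import heapq
import math


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list, index: dict, k: int = 2) -> list:
    """Return the k (doc_id, score) pairs most similar to query_vec, best first."""
    scored = ((doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Hypothetical document ids and vectors standing in for embedded reviews and specs
index = {
    "lp1-review": [0.9, 0.1],
    "lp2-review": [0.1, 0.9],
    "lp3-specs": [0.7, 0.7],
}
for doc_id, score in top_k([1.0, 0.0], index, k=2):
    print(doc_id, round(score, 3))
```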
