How to synchronize databases and documents to vector databases #23972
Replies: 2 comments
-
To synchronize changes across databases, documents, and a vector database for AI applications such as those running in RAG mode, you can combine Redis pub/sub for real-time change notifications, FastAPI for serving server-sent events (SSE), and a vector database like JaguarDB.
Here is a sample implementation:

```python
from fastapi import FastAPI
from sse_starlette.sse import EventSourceResponse
import redis
import json
import threading
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores.jaguar import Jaguar
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

app = FastAPI()
r = redis.Redis()

# Initialize the Jaguar vector store
url = "http://192.168.5.88:8080/fwww/"
pod = "vdb"
store = "langchain_rag_store"
vector_index = "v"
vector_type = "cosine_fraction_float"
vector_dimension = 1536
embeddings = OpenAIEmbeddings()
vectorstore = Jaguar(pod, store, vector_index, vector_type, vector_dimension, url, embeddings)
vectorstore.login()
vectorstore.create(metadata="category char(16)", text_size=4096)

# Endpoint to publish changes to Redis
@app.post("/update")
async def update_data(data: dict):
    r.publish('data_changes', json.dumps(data))
    return {"status": "Data published"}

# Endpoint to stream real-time updates to clients via SSE
@app.get("/events")
async def events():
    def event_generator():
        pubsub = r.pubsub()
        pubsub.subscribe('data_changes')
        for message in pubsub.listen():
            if message['type'] == 'message':
                yield {
                    "event": "update",
                    "data": message['data'].decode('utf-8')
                }
    return EventSourceResponse(event_generator())

# Handle a data change: load, split, and index the changed document
def handle_data_change(data):
    loader = TextLoader(data['file_path'])
    documents = loader.load()
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=300)
    docs = text_splitter.split_documents(documents)
    vectorstore.add_documents(docs)

# Subscribe to the Redis channel and apply each change to the vector store
def subscribe_to_changes():
    pubsub = r.pubsub()
    pubsub.subscribe('data_changes')
    for message in pubsub.listen():
        if message['type'] == 'message':
            data = json.loads(message['data'])
            handle_data_change(data)

# Run the subscription in a background daemon thread
threading.Thread(target=subscribe_to_changes, daemon=True).start()
```

This code sets up a FastAPI application with endpoints to publish data changes and to stream real-time updates over SSE. It also includes a function that handles each data change by updating the vector store accordingly. The subscription to Redis changes runs in a background thread so that synchronization happens in real time. The LangChain integration is demonstrated using the TextLoader, CharacterTextSplitter, OpenAIEmbeddings, and Jaguar components.
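To exercise the flow end to end, a client only needs to POST a change notification and, optionally, consume the SSE feed. Below is a minimal sketch, assuming the app above is running on localhost:8000; the file path is a placeholder for whichever document actually changed:

```python
import requests

# Notify the service that a document changed; the background subscriber
# re-loads, re-splits, and re-indexes it into the Jaguar store.
resp = requests.post(
    "http://localhost:8000/update",
    json={"file_path": "/tmp/changed_doc.txt"},  # placeholder path
)
print(resp.json())  # {"status": "Data published"}

# Optionally stream the SSE feed to observe update events as they arrive.
with requests.get("http://localhost:8000/events", stream=True) as events:
    for line in events.iter_lines():
        if line:
            print(line.decode("utf-8"))
```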
-
You can use our open source project https://github.com/adiom-data/dsync/ to do the initial indexing and real-time change data capture to sync your database to the vector store. We have implemented initial support for Weaviate and also have Qdrant in the works.
-
Description
I need a mechanism to synchronize changes in databases and documents to a vector database: when records in a database or the contents of documents change, those changes should be propagated to the vector database. The purpose of this synchronization is to support AI applications, such as applications running in RAG mode.
The specific requirements are as follows:
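One building block worth noting for this requirement: LangChain's indexing API is designed for exactly this kind of synchronization. It records what has been indexed in a record manager, so re-running a sync upserts changed documents and deletes stale chunks instead of accumulating duplicates. Here is a minimal sketch, assuming a SQLite-backed record manager, an already-initialized LangChain vector store bound to `vectorstore`, and a placeholder file path:

```python
from langchain.indexes import SQLRecordManager, index
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

# Record manager that remembers which document chunks have been indexed;
# the namespace and SQLite URL here are illustrative choices.
record_manager = SQLRecordManager(
    "vectorstore/rag_sync", db_url="sqlite:///record_manager_cache.sql"
)
record_manager.create_schema()

# Load and split the changed document (placeholder path).
docs = CharacterTextSplitter(chunk_size=1000, chunk_overlap=300).split_documents(
    TextLoader("/tmp/changed_doc.txt").load()
)

# "incremental" cleanup deletes the previous chunks of any re-indexed source,
# so updated records replace their old vectors rather than duplicating them.
result = index(
    docs, record_manager, vectorstore, cleanup="incremental", source_id_key="source"
)
print(result)  # counts of added / updated / skipped / deleted documents
```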
System Info
no information