Multi-user Chatbot in Production - How do I load a dataframe into memory once and read it later from the LangChain process? #27700
Replies: 1 comment 2 replies
-
To load a dataframe into memory once for a multi-user chatbot with LangChain, store it in a module-level (global) variable or another shared in-memory structure. The dataframe is then loaded a single time at startup and served to every user without re-reading it from the source.

Here's a modified version of the relevant part of your code:

```python
import pandas as pd
from datasets import load_dataset
from langchain.agents import AgentExecutor, create_openai_functions_agent
from langchain_experimental.tools import PythonAstREPLTool
from pydantic import BaseModel

# Load the dataset once and store it in a module-level variable
global_df = None

def load_data():
    global global_df
    if global_df is None:
        # Load your dataset here
        data = load_dataset("MongoDB/embedded_movies")
        df = pd.DataFrame(data["train"])
        df = df[df["fullplot"].notna()]  # Filter out records with null 'fullplot'
        df.rename(columns={"plot_embedding": "embedding"}, inplace=True)  # Rename column
        global_df = df

# Call this function once at the start of your application
load_data()

# Now you can use `global_df` in your application without reloading it
def get_data():
    return global_df

# Example usage in your application
df = get_data()

# Use `df` in your agent setup
# (llm, prompt, and PythonInputs come from your existing code)
repl = PythonAstREPLTool(
    locals={"df": df},
    name="python_repl",
    description=(
        "A Python shell. Use this to execute python commands. "
        "Input should be a valid python command. When using this tool, "
        "sometimes output is abbreviated - make sure it does not look "
        "abbreviated before using it in your answer."
    ),
    args_schema=PythonInputs,
)
tools = [repl]

agent = create_openai_functions_agent(llm=llm, prompt=prompt, tools=tools)

# Note: AgentExecutor does not accept keyword arguments such as
# allow_dangerous_code, allow_dangerous_requests, agent_executor_kwargs,
# tune_prompt, or metadata_prompt from your original snippet; pass
# handle_parsing_errors directly and wire custom prompts elsewhere.
agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    max_iterations=40,
    early_stopping_method="force",
    handle_parsing_errors=True,
    verbose=False,
)

# Typing for playground inputs
class AgentInputs(BaseModel):
    input: str

agent_executor = agent_executor.with_types(input_type=AgentInputs)
```

By using a global variable, the dataframe is loaded once at startup and every subsequent request reads it from memory instead of reloading it from the source.
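If the chatbot serves users from several threads (for example a web framework's worker pool), two first requests can race to load the data. A lock around the lazy load guards against that. This is a minimal sketch, not the code from the thread: `load_source_data` and `get_dataframe` are hypothetical names, and the loader is a stand-in for the real `load_dataset("MongoDB/embedded_movies")` call above:

```python
import threading

import pandas as pd

_df_lock = threading.Lock()
_cached_df = None

def load_source_data() -> pd.DataFrame:
    # Stand-in for the real load_dataset("MongoDB/embedded_movies") call
    return pd.DataFrame(
        {"fullplot": ["A heist goes wrong."], "embedding": [[0.1, 0.2]]}
    )

def get_dataframe() -> pd.DataFrame:
    """Lazily load the dataframe once; safe under concurrent first requests."""
    global _cached_df
    if _cached_df is None:          # fast path: already loaded, no lock needed
        with _df_lock:
            if _cached_df is None:  # double-check inside the lock
                _cached_df = load_source_data()
    return _cached_df
```

Every caller then gets the same in-memory object, so `get_dataframe()` can back the `locals={"df": ...}` of the REPL tool without the source ever being re-queried.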
-
We created a chatbot and extended its use to multiple users. However, during concurrent user interactions and queries, we noticed that the process repeatedly reloads the dataframe. The question is: how can we keep the dataframe in memory to avoid querying it multiple times with LangChain? My intention is to read the data once and keep it in memory so that LangChain can access it from there. How can I change the code to achieve this? Please help. Below is the sample code I am using.