Parameter control of chained llm and retriever from Gradio interface #30836
Description

Hi folks! I need to know how to change the parameters of a chained llm and retriever from a Gradio interface. The llm is loaded like this:

```python
TEMPERATURE = 0.6
RELEVANCE_SCORE = 0.7

llm = LlamaCpp(
    model_path=model_path,
    n_gpu_layers=-1,
    stop=["Question:", "Answer:"],
    n_ctx=8192,
    temperature=TEMPERATURE,
    verbose=False,
)
```

and chained with the retriever like this:

```python
chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=index.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={'score_threshold': RELEVANCE_SCORE, 'k': 2},
    ),
    chain_type_kwargs={"prompt": QUESTION_PROMPT},
    chain_type="stuff",
    return_source_documents=True
)
```

The chat function for Gradio is:

```python
def chat(message, history, REL_SCORE, TEMP):
    response = chain.invoke(message)
    for i, source in enumerate(response["source_documents"], 1):
        print(f"\nindex: {i}----------------------------------------------------")
        print(f"{source.page_content}")
        print("---------------------------------------------------------------")
    response_result = response["result"]
    yield response_result
```

The Gradio interface is as below:

```python
demo = gr.ChatInterface(fn=chat,
    title='RAG-Chat',
    type='messages',
    additional_inputs=[
        gr.Slider(0.6, 1.0, value=RELEVANCE_SCORE, step=0.01, label="rel_score", visible=True),
        gr.Slider(0.0, 1.0, value=TEMPERATURE, step=0.01, label="Temperature", visible=True),
    ]
)
demo.queue().launch()
```

Now no error is given, but the parameters are not changed. How is it possible to change these parameters from Gradio's `additional_inputs`?

Environment: Python 3.10.16 on conda for Windows 11 Pro.

Thanks in advance.
@SwHaraday, the problem seems to be inside your `chat` function: the `chain` is built once at start-up with the initial `TEMPERATURE` and `RELEVANCE_SCORE`, so the slider values that Gradio passes in as `REL_SCORE` and `TEMP` are never applied. Rebuild the retriever and chain from the current slider values inside the function:

```python
def chat(message, history, REL_SCORE, TEMP):
    # apply the current slider values before answering
    llm.temperature = TEMP
    updated_retriever = index.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={'score_threshold': REL_SCORE, 'k': 2},
    )
    # rebuild the chain with the updated llm and retriever
    updated_chain = RetrievalQA.from_chain_type(
        llm=llm,
        retriever=updated_retriever,
        chain_type_kwargs={"prompt": QUESTION_PROMPT},
        chain_type="stuff",
        return_source_documents=True
    )
    response = updated_chain.invoke({"query": message})
    for i, source in enumerate(response["source_documents"], 1):
        print(f"\nIndex: {i} ----------------------------------------------------")
        print(f"{source.page_content}")
        print("---------------------------------------------------------------")
    response_result = response["result"]
    yield response_result
```

Additionally, it is worth noting that while defining the `index`, an embedding setup along these lines was assumed:

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
index = FAISS.from_documents(docs, embedding_model)
```

In this code, it's important to note that the
@SwHaraday, there were a few small bugs in my previous comment, so disregard it and consider this one instead. I have updated the code. After defining your `index` variable as your vector store and setting the `model_path` and `QUESTION_PROMPT` variables, running the code below will likely resolve the issue of the two mentioned hyperparameters not updating within your interface. What has been done is that the `llm` and `retriever` are initialized outside the `chat` function. Inside the `chat` function, the temperature of the `llm` is set first, followed by the score threshold of the `retriever`. Finally, the chain is created based on this updated `llm` and `retriever`, followed by the invocation.
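The code this comment refers to is not shown above. As a rough, untested sketch of the described approach, assuming the same `model_path`, `QUESTION_PROMPT`, FAISS `index`, and Gradio setup as in the question, it might look something like this:

```python
import gradio as gr
from langchain.chains import RetrievalQA
from langchain_community.llms import LlamaCpp

TEMPERATURE = 0.6
RELEVANCE_SCORE = 0.7

# llm and retriever are created once, outside the chat function.
# model_path, QUESTION_PROMPT and the FAISS `index` are assumed to be
# defined as in the question.
llm = LlamaCpp(
    model_path=model_path,
    n_gpu_layers=-1,
    stop=["Question:", "Answer:"],
    n_ctx=8192,
    temperature=TEMPERATURE,
    verbose=False,
)
retriever = index.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={'score_threshold': RELEVANCE_SCORE, 'k': 2},
)

def chat(message, history, REL_SCORE, TEMP):
    # 1. apply the current slider values
    llm.temperature = TEMP
    retriever.search_kwargs['score_threshold'] = REL_SCORE
    # 2. build the chain from the updated llm and retriever, then invoke it
    chain = RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever,
        chain_type_kwargs={"prompt": QUESTION_PROMPT},
        chain_type="stuff",
        return_source_documents=True,
    )
    response = chain.invoke({"query": message})
    yield response["result"]

demo = gr.ChatInterface(
    fn=chat,
    title='RAG-Chat',
    type='messages',
    additional_inputs=[
        gr.Slider(0.6, 1.0, value=RELEVANCE_SCORE, step=0.01, label="rel_score"),
        gr.Slider(0.0, 1.0, value=TEMPERATURE, step=0.01, label="Temperature"),
    ],
)
demo.queue().launch()
```

Mutating `llm.temperature` and `retriever.search_kwargs` in place and rebuilding only the lightweight `RetrievalQA` wrapper on each request keeps the model loaded once, so changing the sliders does not reload the LlamaCpp weights.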