Description
Operating System
Windows
Version Information
There are many logs reporting 500s coming from the following two URLs:
https://meta-llama-3-1-405b-instruct-czz.eastus2.models.ai.azure.com/chat/completions
https://cohere-command-r-plus-uiawv.eastus2.models.ai.azure.com/chat/completions
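For reference, the 500s can also be checked outside LangChain with a direct REST call against one of the serverless endpoints (a minimal sketch; the payload shape assumes the serverless chat/completions schema, and <endpoint-key> is a placeholder, not a real key):

import requests

# Minimal direct check against one of the failing serverless endpoints,
# bypassing LangChain entirely, to see whether the 500 reproduces at the
# HTTP level and what error detail the response body carries.
url = "https://cohere-command-r-plus-uiawv.eastus2.models.ai.azure.com/chat/completions"
headers = {
    "Authorization": "Bearer <endpoint-key>",  # placeholder, not a real key
    "Content-Type": "application/json",
}
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Say hello"},
    ],
    "max_tokens": 64,
}
resp = requests.post(url, headers=headers, json=payload)
print(resp.status_code)
print(resp.text)  # server-side error detail, if any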
Code snippet:

from langchain.chains import LLMChain
from langchain_core.output_parsers import StrOutputParser
from langchain.memory import ConversationBufferMemory
from langchain.prompts import (
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    MessagesPlaceholder,
)
from langchain.schema import SystemMessage
from langchain_community.chat_models.azureml_endpoint import (
    AzureMLChatOnlineEndpoint,
    AzureMLEndpointApiType,
    CustomOpenAIChatContentFormatter,  # updated formatter
)

token = get_token()  # user-defined helper that returns the endpoint API key

# Previously tried: "https://apimdevcloudeng.azure-api.net/mlstudio/chat/completions"
chat_model = AzureMLChatOnlineEndpoint(
    # endpoint_url="https://Cohere-command-r-plus-uiawv.eastus2.models.ai.azure.com/chat/completions",
    endpoint_url="https://apimdevcloudeng.azure-api.net/v1/chat/completions",
    endpoint_api_type=AzureMLEndpointApiType.serverless,
    endpoint_api_key=token,
    content_formatter=CustomOpenAIChatContentFormatter(),
    model_kwargs={"model": "mist"},
    # params={"model": "mist"}  # also tried passing the model via params
)

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    ("user", "Question: {question}"),
])

chat_llm_chain = LLMChain(
    llm=chat_model,
    prompt=prompt,
    verbose=True,
)

output_parser = StrOutputParser()
chain = prompt | chat_model | output_parser

question = "What are the differences between Azure Machine Learning and Azure AI services?"
response = chain.invoke({"question": question})
print(response)
GitHub repo link:
How can I consolidate three models behind one serverless endpoint and route calls to each of the three models?
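What I am trying to achieve looks roughly like this (a minimal sketch, assuming a hypothetical APIM route that dispatches on the "model" value sent through model_kwargs; the model names below are placeholders):

from langchain_community.chat_models.azureml_endpoint import (
    AzureMLChatOnlineEndpoint,
    AzureMLEndpointApiType,
    CustomOpenAIChatContentFormatter,
)

GATEWAY_URL = "https://apimdevcloudeng.azure-api.net/v1/chat/completions"  # single APIM front door

def make_chat_model(model_name: str) -> AzureMLChatOnlineEndpoint:
    # One client per model, all pointing at the same gateway URL; the gateway
    # is assumed to route on the "model" field to the right serverless endpoint.
    return AzureMLChatOnlineEndpoint(
        endpoint_url=GATEWAY_URL,
        endpoint_api_type=AzureMLEndpointApiType.serverless,
        endpoint_api_key=get_token(),  # user-defined helper, as above
        content_formatter=CustomOpenAIChatContentFormatter(),
        model_kwargs={"model": model_name},
    )

# Placeholder model identifiers; the real values depend on the APIM routing rules.
models = {name: make_chat_model(name) for name in ("llama", "cohere", "mist")}

With this shape, all three models would sit behind one gateway URL and the caller would pick the model per request.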
Steps to reproduce
Run the same code snippet as shown under Description above.
Expected behavior
Return completion results.
Actual behavior
500 errors
Additional information
No response