What are your takes on open-source models' capability to implement the pipeline logic you have come up with? #8449
Replies: 5 comments 1 reply
-
🤖 Hi, thank you for your detailed question and for taking the time to read through the LlamaIndex documentation. Your insights and questions are very valuable.

You're correct in your understanding that the LlamaIndex pipeline logic assumes certain capabilities from the Large Language Models (LLMs): logical decomposition of queries, evaluation of answer quality, and functioning as a Main-Agent/Document-Agent, among others.

As for your question about the feasibility of implementing these capabilities with open-source LLMs, it's a complex issue. Open-source LLMs have shown potential for implementing the pipeline logic in LlamaIndex, but their performance can be variable and sometimes unreliable, requiring prompt engineering or fine-tuning to improve:

- 'llama2-chat-7b 4bit' (Hugging Face) performs well with Basic Query Engines but is usually unreliable with the Router Query Engine, SubQuestion Query Engine, Text2SQL, and Pydantic Programs, and sometimes unreliable with Data Agents. The model tends to be quite chatty, which makes parsing structured outputs difficult; fine-tuning and prompt engineering are likely required for better performance on structured outputs.
- 'Mistral-7B-instruct-v0.1 4bit' (Hugging Face) shows similar performance: reliable with Basic Query Engines, usually unreliable with the Router Query Engine and SubQuestion Query Engine, and sometimes unreliable with Text2SQL, Pydantic Programs, and Data Agents. It seems slightly more reliable for structured outputs than Llama2, and with some prompt engineering it may perform better.
- 'zephyr-7b-alpha' (Hugging Face) appears more reliable than other open-source models of this size. It performs well with Basic Query Engines, the Router Query Engine, SubQuestion Query Engine, and Text2SQL, and is reliable with Pydantic Programs, though sometimes unreliable as a Data Agent. It still hallucinates a bit, especially as an agent.

In conclusion, while open-source LLMs can be used to implement the pipeline logic in LlamaIndex, their performance is variable and may require additional engineering or fine-tuning. They offer the advantage of customizability and transparency, which can be beneficial in certain applications. As for the specific features of LlamaIndex that are most dependent on LLM capability, these include the main interfaces for interacting with different LLMs.

I hope this answers your question. If you have any further queries, feel free to ask.

Best,
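For readers who want to try one of the models listed above themselves, a minimal sketch of wiring an open-source Hugging Face model into a basic query engine might look like the following. This assumes the pre-0.10 LlamaIndex import layout, a local GPU, and a `./data` directory of documents; these are placeholders, not the configuration the findings above were produced with.

```python
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms import HuggingFaceLLM

# Load one of the 7B models discussed above from Hugging Face.
llm = HuggingFaceLLM(
    model_name="HuggingFaceH4/zephyr-7b-alpha",
    tokenizer_name="HuggingFaceH4/zephyr-7b-alpha",
    context_window=3900,
    max_new_tokens=256,
    device_map="auto",
)

# Use a local embedding model so nothing falls back to OpenAI.
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")

documents = SimpleDirectoryReader("./data").load_data()  # placeholder path
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# "Basic Query Engine": the configuration reported as reliable for all
# three models in the compatibility notes above.
query_engine = index.as_query_engine()
print(query_engine.query("What does the document conclude?"))
```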
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
-
Hey @timothylimyl, thanks for the thoughtful note! I think your intuition/understanding is spot on. The more advanced features in LlamaIndex definitely assume a level of LLM capability that many open-source LLMs lack right now. We've started more explicitly tracking which LLMs can power which features in LlamaIndex here: https://docs.llamaindex.ai/en/stable/core_modules/model_modules/llms/root.html#llm
-
Hi @Disiok, thanks for the link; that is what I was looking for. However, it lacks details on the verification process that was carried out. Can the team at LlamaIndex share how this was done? It would be very useful to have the pipeline code, since the verification must be grounded in some examples; knowing which examples the team used would let the community verify further examples. I want to test whether these models can serve as the back-end engine for the various tasks (router, subquestion, text2sql, etc.), but it would be great to see what the team has already done, so I don't duplicate your effort and can be an add-on resource. Thank you for your attention to this matter.
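For concreteness, a hypothetical sketch of the kind of check described here, using the router task as an example. This is not the team's actual verification harness; it assumes a `service_context` carrying the open-source LLM and two prebuilt query engines (`vector_query_engine`, `summary_query_engine`) from earlier setup.

```python
from llama_index.query_engine import RouterQueryEngine
from llama_index.selectors.llm_selectors import LLMSingleSelector
from llama_index.tools import QueryEngineTool, ToolMetadata

tools = [
    QueryEngineTool(
        query_engine=vector_query_engine,   # assumed: a basic vector engine
        metadata=ToolMetadata(
            name="facts",
            description="Answers specific factual questions about the corpus.",
        ),
    ),
    QueryEngineTool(
        query_engine=summary_query_engine,  # assumed: a summarization engine
        metadata=ToolMetadata(
            name="summary",
            description="Produces summaries of the corpus.",
        ),
    ),
]

# The selector asks the LLM to choose a tool and emit structured output;
# this is exactly where small open-source models tend to fail
# (unparseable selector output, or the wrong tool chosen).
router = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(service_context=service_context),
    query_engine_tools=tools,
)
response = router.query("Give me a one-paragraph summary of the corpus.")
```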
-
Pardon my previous query: I just found out that the notebook is linked in one of the columns of the compatibility table. I will look through the notebooks to see what there is to add on.
-
Seeing how prompt engineering guidance varies between providers (Anthropic Claude prefers XML and only in 2.1 gained support for a system prompt), it does feel like it would be beneficial to have a clear code construct for per-model prompt overrides. OpenAI feels first class, with the most use and testing. Supporting overrides for other models would differ depending on whether the model has a direct provider in the code base, like Anthropic, or comes through a loader for many highly varied models, like Bedrock (Claude, Titan, Llama2), HuggingFace, and LlamaCpp, where you can't assume one prompt tuning style across them; for those loaders, another abstraction level would be useful to map a prompt tuning style to particular models.

It would be great to have a clean method for interested community members working heavily with particular models to create these prompt tuning packs and mappings that others can leverage, instead of everyone making their own solo overrides (in various styles) that never get contributed back to the core. Supporting a prompt tuning pack + model mapper for each of these task use cases would be an awesome goal: https://docs.llamaindex.ai/en/stable/module_guides/models/llms.html
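To make the proposal concrete, a rough sketch of what such a prompt tuning pack + model mapper could look like. To be clear, nothing like this exists in LlamaIndex today; only `PromptTemplate` and `update_prompts()` are real APIs, and the prompt keys follow the convention returned by `query_engine.get_prompts()`. The pack names and mapper function are hypothetical.

```python
from llama_index.prompts import PromptTemplate

# Claude-flavored QA template: XML-tagged context, per Anthropic's guidance.
ANTHROPIC_QA = PromptTemplate(
    "<context>\n{context_str}\n</context>\n\n"
    "Using only the context above, answer the question.\n"
    "Question: {query_str}\nAnswer: "
)

# Hypothetical registry: model identifier -> {prompt key -> template}.
# Keys match the names returned by query_engine.get_prompts().
PROMPT_PACKS = {
    "anthropic.claude-v2": {
        "response_synthesizer:text_qa_template": ANTHROPIC_QA,
    },
}

def apply_prompt_pack(query_engine, model_name: str) -> None:
    """Apply a model's prompt overrides to an engine, if a pack is registered."""
    overrides = PROMPT_PACKS.get(model_name)
    if overrides:
        query_engine.update_prompts(overrides)
```

Under this shape, a community contribution would just be a `PROMPT_PACKS` entry per model family, rather than everyone carrying private, unshared overrides.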
-
Question
Hi,
I really enjoyed reading through the entire llama index documentation to get some ideas on how you are going about implementing RAG.
It seems to me that a lot of the software logic implemented here requires the LLM to operate at what I would call an OS level.
By an OS-level LLM, I mean that the LLM is treated as the central agent capable of dynamic planning and reasoning, which I believe is currently only available from closed-source LLMs, as it is an emergent property of size. Thus, I am curious whether you have tested the overall pipeline with an open-source LLM and replicated your RAG pipeline successfully.
For example, there are a few requirements that you have assumed the LLM to have (a minimal sketch follows the list):
a. Logical decomposition of the main query into precise sub-queries for search and retrieval
b. Evaluation of answer quality given the query, to decide whether to re-query
c. Using the LLM as a Main-Agent/Document-Agent, including your idea of a Router
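As an illustration of requirement (a), here is a minimal sketch of the sub-question path in LlamaIndex, assuming a `tools` list and a `service_context` carrying the open-source LLM already exist (pre-0.10 import path):

```python
# Requirement (a) in practice: the LLM must emit well-formed structured
# sub-questions, which small open-source models often fail to format.
from llama_index.query_engine import SubQuestionQueryEngine

sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=tools,         # assumed: one QueryEngineTool per source
    service_context=service_context,  # assumed: carries the open-source LLM
)
response = sub_question_engine.query(
    "Compare the revenue growth of company A and company B."
)
```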
A few that I think are within the realm of "doable" for open-source:
a. Summarisation of chunk descriptions (efforts to disambiguate)
b. Normal QA (answers are found in the context or in the compiled answers)
and an extra questionable feature:
a. Answering based off the context provided without hallucinating. If you are just prompting it with "do not provide an answer if the answer cannot be found in the context provided", it is highly probable your LLM will still answer, especially if the context has some sort of relevance. This is something that GPT-3.5 has problems with too.
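For reference, the prompt-only mitigation described in (a) looks roughly like the sketch below in LlamaIndex terms (the prompt key follows `query_engine.get_prompts()`, and `query_engine` is assumed to exist from earlier setup). As noted, this is exactly the approach that tends to fail: small models often answer anyway when the context is merely related.

```python
# Prompt-only grounding: the fragile approach discussed above.
from llama_index.prompts import PromptTemplate

strict_qa = PromptTemplate(
    "Context:\n{context_str}\n\n"
    "Answer using ONLY the context above. If the answer cannot be found in "
    "the context, reply exactly: 'I cannot answer from the given context.'\n"
    "Question: {query_str}\nAnswer: "
)
query_engine.update_prompts(
    {"response_synthesizer:text_qa_template": strict_qa}
)
```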
I foresee a lot of pitfalls/points of failure with open-source models, especially if most of the pipeline logic is just prompting your way through and hoping the LLM can handle it. Don't get me wrong, I understand the quick progress wins you get with OpenAI, and it really helps you test your ideas immediately.
What are your takes on open-source models' capability to implement the pipeline logic you have come up with? Hope this thread can be a space for discussion in the area of open-source LLMs.