Delete spec prompt

Lin-jun-xiang · Lin-jun-xiang · commit 9c2e39c013ca · 2023-07-09T23:18:57.000+08:00
diff --git a/README.md b/README.md
@@ -22,7 +22,11 @@
 
 ### Introduction
 
-* Easily build an AI model using Langchain and Streamlit.
+* Project Purpose:
+    * Build a powerful "LLM" model using langchain and streamlit, **enabling your LLM model to do what ChatGPT can't**:
+      * **Connect with external data** by using PDF documents as an example, allowing the LLM model to understand the uploaded files through RetrievalQA techniques.
+      * Integrate LLM with other tools to achieve **internet connectivity**. For instance, using Serp API as an example, leverage the Langchain framework to enable querying the model for **current issues** (i.e., **Google search engine**).
+      * Integrate LLM with the **LLM Math model**, enabling accurate **mathematical calculations**.
 
 * This project consists of three main components:
     * [`DataConnection`](../model/data_connection.py): Allows LLM to communicate with external data, i.e., read PDF files and perform text segmentation for large PDFs to avoid exceeding OPENAI's 4000-token limit.
@@ -34,17 +38,16 @@
 
 
 * `docGPT` is developed based on **Langchain** and **Streamlit**.
-    * `Langchain`: LangChain is a framework for **developing applications supported by language models**. It supports the following applications:
-        1. Connecting LLM models with external data sources.
-        2. Allowing interaction with LLM models.
-    * `Streamlit`: Streamlit enables fast and free deployment of Python applications.
-
 
 ---
 
 ### What's LangChain?
 
-For an introduction to LangChain, it is recommended to refer to the official documentation or the GitHub [repository](https://github.com/hwchase17/langchain).
+* LangChain is a framework for developing applications powered by language models. It supports the following applications:
+    1. Connecting LLM models with external data sources.
+    2. Enabling interactions with LLM models.
+
+* For an introduction to LangChain, it is recommended to refer to the official documentation or the GitHub [repository](https://github.com/hwchase17/langchain).
 
 **Questions that ChatGPT cannot answer can be handled by Langchain!**
 
diff --git a/README.zh-TW.md b/README.zh-TW.md
@@ -18,31 +18,31 @@
 
 ---
 
-
 ### Introduction
 
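-* Easily build an AI model using langchain and streamlit
-
+* Project purpose:
+    * Easily build a powerful "LLM" model using langchain and streamlit, **enabling your LLM model to do what ChatGPT cannot**:
+      * **Connect with external data**: this project uses **PDF documents** as an example, letting the LLM understand your uploaded files through RetrievalQA
+      * Integrate the LLM with other tools to achieve **internet connectivity**: this project uses Serp API as an example and, through the Langchain framework, lets you ask the model about **current issues** (i.e., the **Google search engine**)
+      * Integrate the LLM with the **LLM Math model** so that the model can perform **mathematical calculations** accurately
 * The design of this project consists of three main components:
     * [`DataConnection`](../model/data_connection.py): Lets the LLM communicate with external data, i.e., reads PDF files and splits the text of large PDFs to avoid exceeding OPENAI's 4000-token limit
     * [`docGPT`](../docGPT/): This component is the core that lets the model understand the PDF content, including embedding the PDF text as vectors and building langchain's retrievalQA model. For details, see the [reference](https://python.langchain.com/docs/modules/chains/popular/vector_db_qa)
     * [`agent`](../agent/agent.py): Manages the tools used by the model and **automatically decides** which tool to use based on the user's question. The tools include
         * `SerpAI`: When the user's question is a "**current issue**", this tool can run a **Google search**
         * `llm_math_chain`: When the user's question is a "**mathematical calculation**", this tool can perform the calculation
         * `docGPT`: When the user asks about the content of the PDF document, this tool can answer (this tool is also built through retrievalQA)
-
-
 * `docGPT` is developed based on **langchain** and **streamlit**
-    * `langchain`: LangChain is a **framework for developing applications powered by language models**. It supports the following applications
-        1. Connecting LLM models with external data sources
-        2. Allowing interaction with LLM models
-    * `streamlit`: streamlit lets you deploy your own Python applications **quickly and for free**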
 
 ---
 
 ### What's LangChain?
 
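-For an introduction to langchain, see the official documentation or the [GitHub source project](https://github.com/hwchase17/langchain)
+* LangChain is a **framework for developing applications powered by language models**. It supports the following applications
+        1. Connecting LLM models with external data sources
+        2. Allowing interaction with LLM models
+* For an introduction to langchain, see the official documentation or the [GitHub source project](https://github.com/hwchase17/langchain)
+
 
 **Questions that ChatGPT cannot answer can be handled by Langchain!**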
 
diff --git a/agent/agent.py b/agent/agent.py
@@ -47,7 +47,6 @@ def create_doc_chat(self, docGPT) -> Tool:
             func=docGPT.run,
             description="""
             useful for when you need to answer questions from the context of PDF,
-            especially ask the specification of display.
             """
         )
         return tool
diff --git a/app.py b/app.py
@@ -89,7 +89,7 @@ def load_api_key() -> None:
         if temp_file_path:
             os.remove(temp_file_path)
 
-        docGPT, docGPT_spec, calculate_tool, search_tool = None, None, None, None
+        docGPT, calculate_tool, search_tool = None, None, None
 
         try:
             agent_ = AgentHelper()
@@ -99,14 +99,8 @@ def load_api_key() -> None:
             )
             docGPT_tool = agent_.create_doc_chat(docGPT)
 
-            docGPT_spec = DocGPT(docs=docs)
-            docGPT_spec.create_qa_chain(
-                chain_type='refine',
-            )
-            docGPT_spec_tool = agent_.create_doc_chat(docGPT_spec)
         except Exception as e:
-            print(e)
-            pass
+            st.write(e)
 
         try:
             search_tool = agent_.get_searp_chain
@@ -117,12 +111,12 @@ def load_api_key() -> None:
             calculate_tool = agent_.get_calculate_chain
 
             tools = [
-                docGPT_tool, docGPT_spec_tool,
-                calculate_tool, search_tool
+                docGPT_tool,
+                search_tool
             ]
             agent_.initialize(tools)
         except Exception as e:
-            print(e)
+            st.write(e)
 
 
 if not st.session_state['openai_api_key']:
@@ -139,10 +133,12 @@ def load_api_key() -> None:
 
 @lru_cache(maxsize=20)
 async def get_response(query: str):
-    if agent_ and query and query != '':
-        response = agent_.query(query)
-        return response
-
+    try:
+        if agent_.agent_ is not None:
+            response = agent_.query(query)
+            return response
+    except Exception as e:
+        pass
 
 query = st.text_input(
     "#### Question:",
@@ -153,7 +149,7 @@ async def get_response(query: str):
 user_container = st.container()
 
 with user_container:
-    if query:
+    if query and query != '':
         response = asyncio.run(get_response(query))
         st.session_state.query.append(query)
         st.session_state.response.append(response) 
diff --git a/docGPT/docGPT.py b/docGPT/docGPT.py
@@ -9,6 +9,7 @@
 from langchain.memory import ConversationBufferMemory
 from langchain.prompts import PromptTemplate
 from langchain.vectorstores import Chroma
+from langchain.chat_models import ChatOpenAI
 
 
 openai.api_key = os.getenv('OPENAI_API_KEY')
@@ -44,7 +45,7 @@ def __init__(
     @property
     def create_qa_chain(self) -> RetrievalQA:
         qa_chain = RetrievalQA.from_chain_type(
-            llm=OpenAI(temperature=0),
+            llm=self.llm,
             chain_type=self.chain_type,
             retriever=self.retriever,
             chain_type_kwargs=self.chain_type_kwargs
@@ -61,12 +62,6 @@ def __init__(
     ) -> None:
         super().__init__(chain_type, retriever, llm)
 
-    def _get_chat_history(self, inputs) -> str:
-        res = []
-        for human, ai in inputs:
-            res.append(f"Human:{human}\nAI:{ai}")
-        return "\n".join(res)
-
     @property
     def create_qa_chain(self) -> ConversationalRetrievalChain:
         # TODO: cannot use conversation qa chain
@@ -75,11 +70,10 @@ def create_qa_chain(self) -> ConversationalRetrievalChain:
             return_messages=True
         )
         qa_chain = ConversationalRetrievalChain.from_llm(
-            llm=OpenAI(temperature=0),
+            llm=self.llm,
             chain_type=self.chain_type,
             retriever=self.retriever,
-            memory=memory,
-            get_chat_history=self._get_chat_history
+            memory=memory
         )
         return qa_chain    
 
@@ -88,11 +82,16 @@ class DocGPT:
     def __init__(self, docs):
         self.docs = docs
         self.qa_chain = None
+        self.llm = ChatOpenAI(
+            temperature=0.2,
+            max_tokens=2000,
+            model_name='gpt-3.5-turbo'
+        )
 
         self.prompt_template = """
-        Cite each reference using [Page Number] notation (every result has this number at the beginning).
-        Only answer what is asked. The answer should be short and concise. Answer step-by-step.
+        Only answer what is asked. Answer step-by-step.
         If the content has sections, please summarize them in order and present them in a bulleted format.
+        Utilize line breaks for better readability.
         For example, sequentially summarize the introduction, methods, results, and so on.
 
         {context}
@@ -154,14 +153,14 @@ def create_qa_chain(
             self.qa_chain = RChain(
                 chain_type=chain_type,
                 retriever=retriever,
-                llm=OpenAI(temperature=0),
+                llm=self.llm,
                 chain_type_kwargs=chain_type_kwargs
             ).create_qa_chain
         else:
             self.qa_chain = CRChain(
                 chain_type=chain_type,
                 retriever=retriever,
-                llm=OpenAI(temperature=0)
+                llm=self.llm
             ).create_qa_chain
 
     def run(self, query: str) -> str:
diff --git a/requirements.txt b/requirements.txt
@@ -1,5 +1,7 @@
-langchain==0.0.224
+langchain==0.0.228
 openai==0.27.8
-streamlit==1.24.0
+streamlit==1.24.1
 streamlit_chat==0.1.1
 pymupdf
+chromadb
+tiktoken
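
Review note on `get_response` in app.py: the function is declared `async` and wrapped in `@lru_cache`, so the cache most likely stores the coroutine object returned by the call rather than the awaited answer. Repeating the same query would then hand `asyncio.run` a coroutine that has already finished. The sketch below reproduces that behaviour in isolation and shows one possible alternative that caches the finished result. `answer`, `CALLS`, `cached_coroutine`, `second_call_outcome`, and `get_response_sync` are hypothetical names for illustration; `answer` stands in for `agent_.query(query)` and none of this is the project's actual code.

```python
import asyncio
from functools import lru_cache

CALLS = {"count": 0}  # hypothetical counter, only here to make cache hits visible


async def answer(query: str) -> str:
    """Hypothetical stand-in for agent_.query(query)."""
    CALLS["count"] += 1
    return query.upper()


@lru_cache(maxsize=20)
async def cached_coroutine(query: str) -> str:
    # Same decorator order as get_response in app.py: the cache stores the
    # coroutine object produced by the call, not the awaited result.
    return await answer(query)


def second_call_outcome(query: str) -> str:
    """Run the same query twice and describe what the second run does."""
    asyncio.run(cached_coroutine(query))
    try:
        asyncio.run(cached_coroutine(query))
    except RuntimeError as exc:
        return f"RuntimeError: {exc}"
    return "ok"


@lru_cache(maxsize=20)
def get_response_sync(query: str) -> str:
    # Caching a plain function stores the finished string, so a repeated
    # query is served from the cache without running the coroutine again.
    return asyncio.run(answer(query))
```

With this layout, `second_call_outcome("hello")` reports a `RuntimeError` from reusing the cached coroutine, while calling `get_response_sync` twice with the same query runs `answer` only once. Whether a synchronous wrapper suits the Streamlit app is a design choice for the author; dropping `async` from `get_response` altogether would be another option.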
