Update README

Lin-jun-xiang · Lin-jun-xiang · commit d037e2254c62 · 2023-07-04T14:14:11.000+08:00
diff --git a/.github/workflows/trans.yml b/.github/workflows/trans.yml
diff --git a/READM.zh-TW.md b/READM.zh-TW.md
@@ -1,18 +1,24 @@
 # docGPT
 
-(English)[README.md] | (中文版)[README.zh-TW.md]
-
+[English](./README.md) | [中文版](./README.zh-TW.md)
 
 - Table of Contents
     - [Introduction](#introduction)
     - [What's LangChain?](#whats-langchain)
-      - [Questions that ChatGPT cannot answer are handed over to LangChain for implementation!](#questions-that-chatgpt-cannot-answer-are-handed-over-to-langchain-for-implementation)
     - [How to Use docGPT?](#how-to-use-docgpt)
-    - [Why Use docGPT?](#why-use-docgpt)
     - [How to develope a docGPT with streamlit?](#how-to-develope-a-docgpt-with-streamlit)
 
+
+* Main development software and packages:
+    * `Python 3.8.6`
+    * `Langchain 0.0.218`
+    * `Streamlit 1.22.0`
+
+* Using this tool requires at least an `openai_api_key`. To learn how to obtain a key, visit this [link](https://platform.openai.com/)
+
 ---
 
+
 ### Introduction
 
 * Easily build an AI model using langchain and streamlit
@@ -32,19 +38,11 @@
         2. Allowing interaction with LLM models
     * `streamlit`: streamlit lets you deploy your own Python application **quickly and for free**
 
-* Main development software and packages:
-    * `Python 3.8.6`
-    * `Langchain 0.0.218`
-    * `Streamlit 1.22.0`
-
-* Using this tool requires at least an `openai_api_key`. To learn how to obtain a key, visit this [link](https://platform.openai.com/)
-
+---
 
 ### What's LangChain?
 
-* For an introduction to langchain, see the official documentation or the [GitHub source repository](https://github.com/hwchase17/langchain)
-
-#### Questions that ChatGPT cannot answer are handed over to LangChain for implementation!
+For an introduction to langchain, see the official documentation or the [GitHub source repository](https://github.com/hwchase17/langchain)
 
 **Questions that ChatGPT cannot answer can be handed over to Langchain!**
 
@@ -71,6 +69,8 @@
 
 With langchain, we can create our own chatgpt model, which can be general-purpose or tailored for **enterprise and commercial** use!
 
+---
+
 ### How to Use docGPT?
 
 * Visit the [application](https://docgpt-app.streamlit.app/)
@@ -84,16 +84,7 @@
 
 ![RGB_cleanup](https://github.com/Lin-jun-xiang/docGPT-streamlit/blob/main/img/docGPT.gif?raw=true)
 
-
-### Why Use docGPT?
-
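-* The `docGPT` developed in this project has the following features:
-  * Upload a PDF
-  * Chat back and forth with GPT to quickly learn the PDF's content
-  * Summarize documents
-  * Includes **"math-llm"** for questions involving **mathematical calculations** (which chatgpt cannot answer)
-  * Includes **"google-search"** for **Google searches** (questions that chatgpt cannot answer)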
-
+---
 
 ### How to develope a docGPT with streamlit?
 
@@ -104,12 +95,14 @@
 There are two methods:
 
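 * Local development
-    1. `pip install -r requirements.txt`: Install the packages required for development
-    2. `streamlit run ./app.py`: Start the service from the project root directory
-    3. Start exploring!
+    * `pip install -r requirements.txt`: Install the packages required for development
+    * `streamlit run ./app.py`: Start the service from the project root directory
+    * Start exploring!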
 
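 * Use Streamlit Community Cloud to deploy, manage, and share applications for free
-    1. Put your application in a public GitHub repository (and make sure it has a `requirements.txt`!)
-    2. Log in to [share.streamlit.io](https://share.streamlit.io/)
-    3. Click "Deploy an app", then paste your GitHub URL
-    4. Complete the deployment of your [application](https://docgpt-app.streamlit.app//)
+    * Put your application in a public GitHub repository (and make sure it has a `requirements.txt`!)
+    * Log in to [share.streamlit.io](https://share.streamlit.io/)
+    * Click "Deploy an app", then paste your GitHub URL
+    * Complete the deployment of your [application](https://docgpt-app.streamlit.app//)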
+
+<a href="#top">Back to top</a>
diff --git a/README.md b/README.md
@@ -0,0 +1,107 @@
+# docGPT
+
+[English](./README.md) | [中文版](./README.zh-TW.md)
+
+- Table of Contents
+    - [Introduction](#introduction)
+    - [What's LangChain?](#whats-langchain)
+    - [How to Use docGPT?](#how-to-use-docgpt)
+    - [How to Develop a docGPT with Streamlit?](#how-to-develop-a-docgpt-with-streamlit)
+
+
+* Main Development Software and Packages:
+    * `Python 3.8.6`
+    * `Langchain 0.0.218`
+    * `Streamlit 1.22.0`
+
+* Using this tool requires at least the `openai_api_key`. You can visit the [link](https://platform.openai.com/) to learn how to obtain the key.
+
+
+---
+
+
+### Introduction
+
+* Easily build an AI model using Langchain and Streamlit.
+
+* This project consists of three main components:
+    * [`DataConnection`](../model/data_connection.py): Allows the LLM to communicate with external data, i.e., it reads PDF files and splits large PDFs into text segments to avoid exceeding OpenAI's 4000-token limit.
+    * [`docGPT`](../docGPT/): This component enables the model to understand the content of PDFs. It includes embedding PDF text and building a retrievalQA model using Langchain. For more details, please refer to the [documentation](https://python.langchain.com/docs/modules/chains/popular/vector_db_qa).
+    * [`agent`](../agent/agent.py): Responsible for managing the tools used by the model and automatically determining which tool to use based on the user's question. The tools include:
+        * `SerpAPI`: Used for "**current questions**" by performing a **Google search**.
+        * `llm_math_chain`: Used for "**mathematical calculations**" by performing mathematical computations.
+        * `docGPT`: Used for answering questions about the content of PDF documents. (This tool is built using retrievalQA)
+
+
+* `docGPT` is developed based on **Langchain** and **Streamlit**.
+    * `Langchain`: LangChain is a framework for **developing applications powered by language models**. It supports the following use cases:
+        1. Connecting LLM models with external data sources.
+        2. Allowing interaction with LLM models.
+    * `Streamlit`: Streamlit enables fast and free deployment of Python applications.
+
+
+---
+
+### What's LangChain?
+
+For an introduction to LangChain, it is recommended to refer to the official documentation or the GitHub [repository](https://github.com/hwchase17/langchain).
+
+**Questions that ChatGPT cannot answer can be handled by Langchain!**
+
+Here, the author briefly introduces the differences between Langchain and ChatGPT. You will be amazed by this open-source project called Langchain through the following example!
+
+> Imagine a scenario where ChatGPT cannot answer mathematical questions or questions about events after 2020 (e.g., "Who is the president in 2023?").
+>
+> * For mathematical questions: In addition to the OpenAI model, there is a specialized tool called math-llm that handles mathematical questions.
+> * For current questions: We can use Google search.
+>
+> Therefore, to design a powerful and versatile AI model, we need to include three tools: "chatgpt", "math-llm", and "Google search".
+>
+> If the user's question involves mathematical calculations, we use the math-llm tool to handle and answer it.
+>
+> In the non-AI era, we would use `if...else...` to decide which tool to use based on the user's question. However, Langchain provides a more flexible and powerful way to handle this.
+> In the AI era, we want users to directly ask their questions without having to pre-select the question type! In Langchain, there is a concept called "agent" that allows us to:
+
+* Provide tools for the agent to manage, such as `tools = ['chatgpt', 'math-llm', 'google-search']`.
+* Include chains designed using Langchain, such as using the `retrievalQA chain` to create a question-answering model based on document content, and append this chain to the tools managed by the agent.
+* **Allow the agent to determine which tool to use based on the user's question** (fully automated and AI-driven).
+
+With Langchain, we can create our own ChatGPT model that can be general-purpose or tailored for specific industries and commercial use!
+
+---
+
+### How to Use docGPT?
+
+* Visit the [application](https://docgpt-app.streamlit.app/).
+
+* Enter your API keys:
+    * `OpenAI API Key`: Required.
+    * `SERPAPI API Key`: Optional. If you want to ask questions about content not appearing in the PDF document, you need this key.
+
+* Upload a PDF file from your local machine.
+* Start asking questions!
+
+![docGPT](https://github.com/Lin-jun-xiang/docGPT-streamlit/blob/main/img/docGPT.gif?raw=true)
+
+---
+
+### How to Develop a docGPT with Streamlit?
+
+A step-by-step tutorial to quickly build your own ChatGPT!
+
+First, clone the repository using `git clone https://github.com/Lin-jun-xiang/docGPT-streamlit.git`.
+
+There are two methods:
+
+* Local development:
+    * `pip install -r requirements.txt`: Install the packages required for development.
+    * `streamlit run ./app.py`: Start the service in the project's root directory.
+    * Start exploring!
+
+* Use Streamlit Community Cloud for free deployment, management, and sharing of applications:
+    * Put your application in a public GitHub repository (make sure it has a `requirements.txt`!).
+    * Log in to [share.streamlit.io](https://share.streamlit.io/).
+    * Click "Deploy an App" and paste your GitHub URL.
+    * Complete the deployment of your [application](https://docgpt-app.streamlit.app/).
+
+<a href="#top">Back to top</a>
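
Note on the agent concept described in the README above: the idea of giving an agent a set of tools and letting it pick one per question can be sketched in plain Python. This is a minimal illustration, not the project's code and not LangChain's API. The `Tool` class, `make_agent`, and `stub_chooser` are hypothetical names, the tools only echo their input, and the chooser is a hand-written placeholder for the decision that an LLM would make in a real agent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    """A named capability the agent can call (hypothetical structure)."""
    name: str
    description: str
    run: Callable[[str], str]


def make_agent(tools: List[Tool],
               choose: Callable[[str, Dict[str, str]], str]) -> Callable[[str], str]:
    """Build an agent that sends each question to exactly one tool.

    `choose` stands in for the LLM: it receives the question and the
    tool descriptions and returns the name of the tool to use.
    """
    registry = {tool.name: tool for tool in tools}
    descriptions = {tool.name: tool.description for tool in tools}

    def agent(question: str) -> str:
        name = choose(question, descriptions)
        if name not in registry:
            raise KeyError(f"unknown tool: {name}")
        return registry[name].run(question)

    return agent


# Stub tools: real ones would call a search API, a math chain, or retrievalQA.
tools = [
    Tool("google-search", "questions about current events", lambda q: f"[search] {q}"),
    Tool("math-llm", "mathematical calculations", lambda q: f"[math] {q}"),
    Tool("docGPT", "questions about the uploaded PDF", lambda q: f"[pdf] {q}"),
]


def stub_chooser(question: str, descriptions: Dict[str, str]) -> str:
    """Placeholder for the LLM's decision; a real agent would prompt the model
    with the descriptions instead of matching keywords."""
    text = question.lower()
    if any(op in text for op in "+*/="):
        return "math-llm"
    if "pdf" in text or "document" in text:
        return "docGPT"
    return "google-search"


agent = make_agent(tools, stub_chooser)
print(agent("What is 12 * 7?"))
print(agent("Summarize the PDF."))
```

The caller never names a tool. Swapping `stub_chooser` for a function that asks a language model to pick from `descriptions` would give the automated selection the README describes, without changing `make_agent`.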