Lin-jun-xiang
diff --git a/README.md b/README.md
Lines changed: 59 additions & 61 deletions
@@ -18,127 +18,125 @@
 
 If you like this project, please give it a ⭐`Star` to support the developers~
 
-### What's new in version3?
+### ✨What's new in version3?
 
-* We introduced the usage of `gpt4free`, **allowing users to use the application for free without entering any API key or making payments.**
+* Integrates `gpt4free`, **allowing users to use docGPT for free without entering API keys or making payments**.
 
-* If you want to use the `gpt4free` free model, you need to select a `Provider` (default is `g4f.provider.ChatgptAi`). For more information about [`gpt4free`](https://github.com/xtekky/gpt4free), please refer to the source project.
+- If you choose to use the `gpt4free` model, you only need to select the `Provider` (default is `g4f.provider.ChatgptAi`). For more details about `gpt4free`, refer to the [source project](https://github.com/xtekky/gpt4free).
 
-* Version2
-  * Utilizes the **`openai` model**
-  * To use this tool, you need to have at least the `openai_api_key`. You can obtain the key by visiting the [link](https://platform.openai.com/)
-  * If you have a `serpapi_key`, the AI model can answer questions and implement Google search functionality
+- Version 2:
+  - Uses the **`openai` model**.
+  - Requires an `openai_api_key`. You can obtain this key from the [link](https://platform.openai.com/).
+  - If you have a `serpapi_key`, AI responses can include Google search results.
 
-* Version3
-  * Retains all the features of Version2
-  * Adds the **`gpt4free` model**, enabling users to use it **completely for free**
-  * Users can choose between `gpt4free` or `openai` as the model, with differences as follows:
-    * `gpt4free`: Achieves free access to openai through reverse engineering, although it's less stable
-    * `openai`: Stable access to the `openai` model by providing an API key
+- Version 3:
+  - Retains all the features of Version 2.
+  - Introduces the **`gpt4free` model**, enabling completely free usage.
+  - Users can choose between `gpt4free` and `openai` models:
+    - `gpt4free`: Allows free access to OpenAI models through reverse engineering, but stability might be compromised.
+    - `openai`: Offers stable access by using an API key.
 
 <p align="center">
 <img src="img/2023-08-29-13-39-00.png" width="70%">
 </p>
 
-
 ---
 
-### Introduction
+### 📚Introduction
 
 * Project Purpose:
-    * Build a powerful "LLM" model using langchain and streamlit, **enabling your LLM model to do what ChatGPT can't**:
-      * **Connect with external data** by using PDF documents as an example, allowing the LLM model to understand the uploaded files through RetrievalQA techniques.
-      * Integrate LLM with other tools to achieve **internet connectivity**. For instance, using Serp API as an example, leverage the Langchain framework to enable querying the model for **current issues** (i.e., **Google search engine**).
-      * Integrate LLM with the **LLM Math model**, enabling accurate **mathematical calculations**.
+    * The purpose of this project is to create a powerful "LLM" model using LangChain and Streamlit. This model aims to **surpass the capabilities of ChatGPT** by:
+      * **Connecting with external data**, such as PDF documents, using RetrievalQA techniques so the model can understand uploaded files.
+      * Integrating the LLM with other tools to achieve **internet connectivity**, for example using **Serp API** to query current topics through **Google search**.
+      * Integrating with the **LLM Math** model for accurate mathematical computations.
 
 * This project consists of three main components:
-    * [`DataConnection`](../model/data_connection.py): Allows LLM to communicate with external data, i.e., read PDF files and perform text segmentation for large PDFs to avoid exceeding OPENAI's 4000-token limit.
-    * [`docGPT`](../docGPT/): This component enables the model to understand the content of PDFs. It includes embedding PDF text and building a retrievalQA model using Langchain. For more details, please refer to the [documentation](https://python.langchain.com/docs/modules/chains/popular/vector_db_qa).
-    * [`agent`](../agent/agent.py): Responsible for managing the tools used by the model and automatically determining which tool to use based on the user's question. The tools include:
-        * `SerpAI`: Used for "**current questions**" by performing a **Google search**.
-        * `llm_math_chain`: Used for "**mathematical calculations**" by performing mathematical computations.
-        * `docGPT`: Used for answering questions about the content of PDF documents. (This tool is built using retrievalQA)
-
+    * [`DataConnection`](../model/data_connection.py): Facilitates communication between LLM and external data, like reading PDF files. It also includes splitting large PDFs to avoid OpenAI's 4096 token limitation.
+    * [`docGPT`](../docGPT/): The core element that helps the model understand PDF content. It involves embedding the PDF text as vectors and building a retrievalQA model with LangChain. For more details, refer to the [documentation](https://python.langchain.com/docs/modules/chains/popular/vector_db_qa).
+    * [`agent`](../agent/agent.py): Manages tools used by the model and automatically decides which tool to use based on user queries. Tools include:
+        * `SerpAI`: Used for questions about current events, answered through Google search.
+        * `llm_math_chain`: Used for mathematical calculations.
+        * `docGPT`: Used for answering queries related to PDF document content, designed using retrievalQA.
 
 * `docGPT` is developed based on **Langchain** and **Streamlit**.
 
 ---
 
-### What's LangChain?
+### 🦜️What's LangChain?
 
 * LangChain is a framework for developing applications powered by language models. It supports the following applications:
     1. Connecting LLM models with external data sources.
-    2. Enabling interactions with LLM models.
+    2. Interactive communication with LLM models.
 
-* For an introduction to LangChain, it is recommended to refer to the official documentation or the GitHub [repository](https://github.com/hwchase17/langchain).
+* For more details about LangChain, refer to the official documentation or the [GitHub repository](https://github.com/hwchase17/langchain).
 
-**Questions that ChatGPT cannot answer can be handled by Langchain!**
+**For questions that ChatGPT can't answer, turn to LangChain!**
 
-Here, the author briefly introduces the differences between Langchain and ChatGPT. You will be amazed by this open-source project called Langchain through the following example!
+LangChain fills in the gaps left by ChatGPT. Through the following example, you can understand the power of LangChain:
 
-> Imagine a scenario where ChatGPT cannot answer mathematical questions or questions about events beyond 2020 (e.g., "Who will be the president in 2023?").
+> In cases where ChatGPT can't solve mathematical problems or answer questions about events after 2020 (e.g., "Who is the president in 2023?"):
 >
-> * For mathematical questions: In addition to the OpenAI model, there is a specialized tool called math-llm that handles mathematical questions.
-> * For current questions: We can use Google search.
+> * For mathematical problems: There's a math-LLM model dedicated to handling math queries.
+> * For current events: You can use Google search.
 >
-> Therefore, to design a powerful and versatile AI model, we need to include three tools: "chatgpt", "math-llm", and "Google search".
+> To create a comprehensive AI model, we need to combine "ChatGPT," "math-LLM," and "Google search" tools.
 >
-> If the user's question involves mathematical calculations, we use the math-llm tool to handle and answer it.
+> In the non-AI era, we used `if...else...` to categorize user queries and had users select the question type through the UI.
 >
-> In the non-AI era, we would use `if...else...` to decide which tool to use based on the user's question. However, Langchain provides a more flexible and powerful way to handle this.
-> In the AI era, we want users to directly ask their questions without having to pre-select the question type! In Langchain, there is a concept called "agent" that allows us to:
+> In the AI era, users should be able to directly ask questions without preselecting the question type. With LangChain's agent:
+>  * We provide tools to the agent, e.g., `tools = ['chatgpt', 'math-llm', 'google-search']`.
+>  * Tools can include chains designed using LangChain, such as using a retrievalQA chain to answer questions from documents.
+>  * **The agent automatically decides which tool to use based on user queries** (fully automated).
 
-* Provide tools for the agent to manage, such as `tools = ['chatgpt', 'math-llm', 'google-search']`.
-* Include chains designed using Langchain, such as using the `retrievalQA chain` to create a question-answering model based on document content, and append this chain to the tools managed by the agent.
-* **Allow the agent to determine which tool to use based on the user's question** (fully automated and AI-driven).
+Through LangChain, you can create a universal AI model or tailor it for business applications.
 
-With Langchain, we can create our own ChatGPT model that can be general-purpose or tailored for specific industries and commercial use!
 
 ---
 
-### How to Use docGPT?
+### 🚩How to Use docGPT?
+
+1. 🎬Visit the [application](https://docgpt-app.streamlit.app/).
 
-* Visit the [application](https://docgpt-app.streamlit.app/).
+2. 🔑Enter your `API_KEY` (optional in Version 3, as you can use the `gpt4free` free model):
+   - `OpenAI API KEY`: Ensure you have available usage.
+   - `SERPAPI API KEY`: Required if you want to query content not present in the PDF.
 
-* Enter your API keys: (This step is optional in version V3, you can choose to skip it and use the `gpt4free` free model)
-    * `OpenAI API Key`: Make sure you still have usage left
-    * `SERPAPI API Key`: Optional. If you want to ask questions about content not appearing in the PDF document, you need this key.
+3. 📁Upload a PDF file from local storage.
 
-* Upload a PDF file from your local machine.
-* Start asking questions!
+4. 🚀Start asking questions!
 
 ![docGPT](https://github.com/Lin-jun-xiang/docGPT-streamlit/blob/main/img/docGPT.gif?raw=true)
 
 ---
 
-### How to Develop a docGPT with Streamlit?
+### 🧠How to Develop a docGPT with Streamlit?
 
 A step-by-step tutorial to quickly build your own chatGPT!
 
 First, clone the repository using `git clone https://github.com/Lin-jun-xiang/docGPT-streamlit.git`.
 
 There are two methods:
 
-* Local development:
+* **Local development**:
     * `pip install -r requirements.txt`: Download the required packages for development.
     * `streamlit run ./app.py`: Start the service in the project's root directory.
     * Start exploring!
 
-* Use Streamlit Community Cloud for free deployment, management, and sharing of applications:
-    * Put your application in a public GitHub repository (make sure it has a `requirements.txt`!).
-    * Log in to [share.streamlit.io](https://share.streamlit.io/).
-    * Click "Deploy an App" and paste your GitHub URL.
-    * Complete the deployment of your [application](https://docgpt-app.streamlit.app/).
+* Use **Streamlit Community Cloud** for free deployment, management, and sharing of applications:
+   - Place your application in a public GitHub repository (ensure you have `requirements.txt`).
+   - Log in to [share.streamlit.io](https://share.streamlit.io/).
+   - Click "Deploy an App," then paste your GitHub URL.
+   - Complete deployment and share your [application](https://docgpt-app.streamlit.app/).
 
 ---
 
-### Advanced - How to build a better model in langchain
+### 💬Advanced - How to build a better model in LangChain
 
-Using Langchain to build docGPT, you can pay attention to the following details that can make your model more powerful:
+To build a powerful docGPT model in LangChain, consider these tips to enhance performance:
 
 1. **Language Model**
 
-    Choosing the right LLM Model can save you time and effort. For example, you can choose OpenAI's `gpt-3.5-turbo` (default is `text-davinci-003`):
+    Select an appropriate LLM, such as OpenAI's `gpt-3.5-turbo`. Experiment with different models to find the best fit for your use case.
 
     ```python
     # ./docGPT/docGPT.py
@@ -155,7 +153,7 @@ Using Langchain to build docGPT, you can pay attention to the following details
 
 2. **PDF Loader**
 
-    There are various PDF text loaders available in Python, each with its own advantages and disadvantages. Here are three loaders the authors have used:
+    Choose a suitable PDF loader. Consider using `PyMuPDF` for fast text extraction and `PDFPlumber` for extracting text from tables.
 
     ([official Langchain documentation](https://python.langchain.com/docs/modules/data_connection/document_loaders/how_to/pdf))
 
@@ -169,7 +167,7 @@ Using Langchain to build docGPT, you can pay attention to the following details
 
 3. **Tracking Token Usage**
 
-    This doesn't make the model more powerful, but it allows you to track the token usage and OpenAI API key consumption during the QA Chain process.
+    Implement token usage tracking with callbacks in LangChain to monitor token usage and OpenAI API consumption during the QA chain process.
 
     When using `chain.run`, you can try using the [method](https://python.langchain.com/docs/modules/model_io/models/llms/how_to/token_usage_tracking) provided by Langchain to track token usage here: