Commit 3cf023e

Refactor README
1 parent d29b967 commit 3cf023e

2 files changed: +109, -113 lines

README.md

Lines changed: 59 additions & 61 deletions
```diff
@@ -18,127 +18,125 @@
 
 If you like this project, please give it a ⭐`Star` to support the developers~
 
-### What's new in version3?
+### What's new in version3?
 
-* We introduced the usage of `gpt4free`, **allowing users to use the application for free without entering any API key or making payments.**
+* Introduction of integration with `gpt4free`, **allowing users to use docGPT for free without needing to input API keys or make payments**.
 
-* If you want to use the `gpt4free` free model, you need to select a `Provider` (default is `g4f.provider.ChatgptAi`). For more information about [`gpt4free`](https://github.com/xtekky/gpt4free), please refer to the source project.
+- If you choose to use the `gpt4free` model, you only need to select the `Provider` (default is `g4f.provider.ChatgptAi`). For more details about `gpt4free`, refer to the [source project](https://github.com/xtekky/gpt4free).
 
-* Version2
-* Utilizes the **`openai` model**
-* To use this tool, you need to have at least the `openai_api_key`. You can obtain the key by visiting the [link](https://platform.openai.com/)
-* If you have a `serpapi_key`, the AI model can answer questions and implement Google search functionality
+- Version 2:
+- Uses the **`openai` model**.
+- Requires an `openai_api_key`. You can obtain this key from the [link](https://platform.openai.com/).
+- If you have a `serpapi_key`, AI responses can include Google search results.
 
-* Version3
-* Retains all the features of Version2
-* Adds the **`gpt4free` model**, enabling users to use it **completely for free**
-* Users can choose between `gpt4free` or `openai` as the model, with differences as follows:
-* `gpt4free`: Achieves free access to openai through reverse engineering, although it's less stable
-* `openai`: Stable access to the `openai` model by providing an API key
+- Version 3:
+- Retains all the features of Version 2.
+- Introduces the **`gpt4free` model**, enabling completely free usage.
+- Users can choose between `gpt4free` and `openai` models:
+- `gpt4free`: Allows free access to OpenAI models through reverse engineering, but stability might be compromised.
+- `openai`: Offers stable access by using an API key.
 
 <p align="center">
 <img src="img/2023-08-29-13-39-00.png" width="70%">
 </p>
 
-
 ---
```
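The Version 2 and Version 3 notes in this hunk describe two ways to run the model. One is `openai`, which needs an `openai_api_key`. The other is `gpt4free`, which needs only a `Provider` (default `g4f.provider.ChatgptAi`). An optional `serpapi_key` adds Google search results. The following is a minimal sketch of how such settings could be validated. `BackendConfig` and `validate` are hypothetical names rather than docGPT's code, and nothing here calls OpenAI, gpt4free or SerpAPI.

```python
from dataclasses import dataclass
from typing import List, Optional

# Default gpt4free provider quoted in the README.
DEFAULT_G4F_PROVIDER = "g4f.provider.ChatgptAi"


@dataclass
class BackendConfig:
    """Hypothetical settings object; not docGPT's actual configuration class."""
    model: str                            # "openai" or "gpt4free"
    openai_api_key: Optional[str] = None  # required only for the openai model
    serpapi_key: Optional[str] = None     # optional, enables Google search
    provider: str = DEFAULT_G4F_PROVIDER  # used only for the gpt4free model


def validate(config: BackendConfig) -> List[str]:
    """Return the capabilities a config enables, or raise ValueError if it is unusable."""
    if config.model == "openai":
        if not config.openai_api_key:
            raise ValueError("the openai model requires an openai_api_key")
        capabilities = ["openai"]
    elif config.model == "gpt4free":
        # No key needed: only the provider selection matters.
        capabilities = [f"gpt4free:{config.provider}"]
    else:
        raise ValueError(f"unknown model: {config.model!r}")
    if config.serpapi_key:
        capabilities.append("google-search")
    return capabilities
```

For example, `validate(BackendConfig(model="gpt4free"))` returns `["gpt4free:g4f.provider.ChatgptAi"]`, while `validate(BackendConfig(model="openai"))` raises `ValueError` because no key was supplied.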
```diff
 
-### Introduction
+### 📚Introduction
 
 * Project Purpose:
-* Build a powerful "LLM" model using langchain and streamlit, **enabling your LLM model to do what ChatGPT can't**:
-* **Connect with external data** by using PDF documents as an example, allowing the LLM model to understand the uploaded files through RetrievalQA techniques.
-* Integrate LLM with other tools to achieve **internet connectivity**. For instance, using Serp API as an example, leverage the Langchain framework to enable querying the model for **current issues** (i.e., **Google search engine**).
-* Integrate LLM with the **LLM Math model**, enabling accurate **mathematical calculations**.
+* The purpose of this project is to create a powerful "LLM" model using LangChain and Streamlit. This model aims to **surpass the capabilities of ChatGPT** by enabling:
+* **Connect with external data**, such as PDF documents, through RetrievalQA techniques for the model to understand uploaded files.
+* Integrate LLM with other tools to achieve **internet connectivity**. exemplified by using **Serp API** for querying modern topics similar to **Google search**.
+* Integration with **LLM Math** model for accurate mathematical computations.
```
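RetrievalQA lets the model answer from uploaded files in two steps. It first retrieves the passages most relevant to the question, then passes only those passages to the LLM. The sketch below illustrates the retrieval step under a loud assumption: word-count vectors with cosine similarity stand in for the OpenAI embeddings and vector store that docGPT builds through LangChain. `retrieve` and `build_prompt` are hypothetical helpers, and no LLM is called.

```python
import math
import re
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: List[str], k: int = 2) -> List[str]:
    """Retrieval step of RetrievalQA: the k chunks most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(chunks, key=lambda chunk: cosine(q, bag_of_words(chunk)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: List[str]) -> str:
    """'Stuff' the retrieved chunks into the prompt an LLM would answer from."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"
```

With the chunks `"The invoice total is 420 dollars."`, `"Our office is located in Taipei."` and `"Payment is due within 30 days of the invoice date."`, the question `"What is the invoice total?"` ranks the first chunk highest. Real embeddings would also match paraphrases that share no words with the question, which this stand-in cannot do.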
```diff
 
 * This project consists of three main components:
-* [`DataConnection`](../model/data_connection.py): Allows LLM to communicate with external data, i.e., read PDF files and perform text segmentation for large PDFs to avoid exceeding OPENAI's 4000-token limit.
-* [`docGPT`](../docGPT/): This component enables the model to understand the content of PDFs. It includes embedding PDF text and building a retrievalQA model using Langchain. For more details, please refer to the [documentation](https://python.langchain.com/docs/modules/chains/popular/vector_db_qa).
-* [`agent`](../agent/agent.py): Responsible for managing the tools used by the model and automatically determining which tool to use based on the user's question. The tools include:
-* `SerpAI`: Used for "**current questions**" by performing a **Google search**.
-* `llm_math_chain`: Used for "**mathematical calculations**" by performing mathematical computations.
-* `docGPT`: Used for answering questions about the content of PDF documents. (This tool is built using retrievalQA)
-
+* [`DataConnection`](../model/data_connection.py): Facilitates communication between LLM and external data, like reading PDF files. It also includes splitting large PDFs to avoid OpenAI's 4096 token limitation.
+* [`docGPT`](../docGPT/): The core element that helps the model understand PDF content. It involves embedding PDF text vectors and creating LangChain's retrievalQA model. For more details, refer to the [documentation](https://python.langchain.com/docs/modules/chains/popular/vector_db_qa).
+* [`agent`](../agent/agent.py): Manages tools used by the model and automatically decides which tool to use based on user queries. Tools include:
+* `SerpAI`: Used for modern topics, enabling Google search functionality.
+* `llm_math_chain`: Used for mathematical calculations.
+* `docGPT`: Used for answering queries related to PDF document content, designed using retrievalQA.
 
 * `docGPT` is developed based on **Langchain** and **Streamlit**.
 
 ---
```
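`DataConnection` splits large PDFs so that the retrieved chunks, the question and the answer together fit in the model's context window. This hunk changes the quoted limit from 4000 to 4096 tokens. For example, stuffing three 1,000-token chunks into one prompt uses 3,000 of the 4,096 tokens, which leaves 1,096 for the question and the answer.

The following is a minimal splitter with overlap. It assumes that one whitespace-separated word counts as one token, whereas real OpenAI tokenizers usually produce more tokens than words, so leave headroom. `split_into_chunks` is a hypothetical name, and LangChain's own text splitters work differently.

```python
from typing import List


def split_into_chunks(text: str, max_tokens: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into overlapping windows of at most max_tokens 'tokens'.

    Assumption: one whitespace-separated word counts as one token. Real
    OpenAI tokenizers usually yield more tokens than words, so keep headroom.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("need 0 <= overlap < max_tokens")
    words = text.split()
    step = max_tokens - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

The overlap repeats the last few words of each chunk at the start of the next one. As a result, a sentence cut at a chunk boundary still appears whole in at least one chunk whenever it is shorter than the overlap.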
```diff
 
-### What's LangChain?
+### 🦜️What's LangChain?
 
 * LangChain is a framework for developing applications powered by language models. It supports the following applications:
 1. Connecting LLM models with external data sources.
-2. Enabling interactions with LLM models.
+2. Interactive communication with LLM models.
 
-* For an introduction to LangChain, it is recommended to refer to the official documentation or the GitHub [repository](https://github.com/hwchase17/langchain).
+* For more details about LangChain, refer to the [official documentation](https://github.com/hwchase17/langchain).
 
-**Questions that ChatGPT cannot answer can be handled by Langchain!**
+**For questions that ChatGPT can't answer, turn to LangChain!**
 
-Here, the author briefly introduces the differences between Langchain and ChatGPT. You will be amazed by this open-source project called Langchain through the following example!
+LangChain fills in the gaps left by ChatGPT. Through the following example, you can understand the power of LangChain:
 
-> Imagine a scenario where ChatGPT cannot answer mathematical questions or questions about events beyond 2020 (e.g., "Who will be the president in 2023?").
+> In cases where ChatGPT can't solve mathematical problems or answer questions about events after 2020 (e.g., "Who is the president in 2023?"):
 >
-> * For mathematical questions: In addition to the OpenAI model, there is a specialized tool called math-llm that handles mathematical questions.
-> * For current questions: We can use Google search.
+> * For mathematical problems: There's a math-LLM model dedicated to handling math queries.
+> * For modern topics: You can use Google search.
 >
-> Therefore, to design a powerful and versatile AI model, we need to include three tools: "chatgpt", "math-llm", and "Google search".
+> To create a comprehensive AI model, we need to combine "ChatGPT," "math-LLM," and "Google search" tools.
 >
-> If the user's question involves mathematical calculations, we use the math-llm tool to handle and answer it.
+> In the non-AI era, we used `if...else...` to categorize user queries and had users select the question type through UI.
 >
-> In the non-AI era, we would use `if...else...` to decide which tool to use based on the user's question. However, Langchain provides a more flexible and powerful way to handle this.
-> In the AI era, we want users to directly ask their questions without having to pre-select the question type! In Langchain, there is a concept called "agent" that allows us to:
+> In the AI era, users should be able to directly ask questions without preselecting the question type. With LangChain's agent:
+> * We provide tools to the agent, e.g., `tools = ['chatgpt', 'math-llm', 'google-search']`.
+> * Tools can include chains designed using LangChain, such as using a retrievalQA chain to answer questions from documents.
+> * **The agent automatically decides which tool to use based on user queries** (fully automated).
 
-* Provide tools for the agent to manage, such as `tools = ['chatgpt', 'math-llm', 'google-search']`.
-* Include chains designed using Langchain, such as using the `retrievalQA chain` to create a question-answering model based on document content, and append this chain to the tools managed by the agent.
-* **Allow the agent to determine which tool to use based on the user's question** (fully automated and AI-driven).
+Through LangChain, you can create a universal AI model or tailor it for business applications.
 
-With Langchain, we can create our own ChatGPT model that can be general-purpose or tailored for specific industries and commercial use!
 
 ---
```
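The LangChain section contrasts an `if...else...` dispatcher with an agent that picks among `tools = ['chatgpt', 'math-llm', 'google-search']`. The sketch below shows only the non-agent baseline:

- Keyword rules choose the tool.
- The `math-llm` slot is replaced by a small `ast`-based calculator.
- The search and chat tools are placeholders that make no network calls.

All names are hypothetical. In docGPT the routing decision is made by a LangChain agent, not by rules like these.

```python
import ast
import operator
import re

# Arithmetic operators the toy calculator accepts; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}


def _eval_node(node):
    """Evaluate a parsed arithmetic expression without calling eval()."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError("unsupported expression")


def math_tool(query: str):
    """Stand-in for 'math-llm': keep only arithmetic characters, then evaluate."""
    expression = re.sub(r"[^0-9+\-*/(). ]", "", query).strip()
    return _eval_node(ast.parse(expression, mode="eval").body)


def search_tool(query: str) -> str:
    return f"[Google search for: {query}]"  # placeholder: no network call


def chat_tool(query: str) -> str:
    return f"[chat model answer to: {query}]"  # placeholder: no API call


TOOLS = {"math-llm": math_tool, "google-search": search_tool, "chatgpt": chat_tool}


def route(query: str) -> str:
    """'Non-AI era' if...else dispatch; a LangChain agent lets the LLM make this choice."""
    if re.search(r"\d\s*[-+*/]\s*\(?\d", query):
        return "math-llm"
    if re.search(r"\b(today|latest|current|20\d\d)\b", query.lower()):
        return "google-search"
    return "chatgpt"


def answer(query: str):
    return TOOLS[route(query)](query)
```

Here `answer("What is 12 * (3 + 4)?")` is routed to the calculator and returns `84`, and `"Who is the president in 2023?"` is routed to the search placeholder. The rules are brittle, though. A document question such as `"Why do 1/2 of my PDFs fail to load?"` is also sent to `math-llm`. This is the weakness that handing the choice to the agent's LLM is meant to remove.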
```diff
 
-### How to Use docGPT?
+### 🚩How to Use docGPT?
+
+1. 🎬Visit the [application](https://docgpt-app.streamlit.app/).
 
-* Visit the [application](https://docgpt-app.streamlit.app/).
+2. 🔑Enter your `API_KEY` (optional in Version 3, as you can use the `gpt4free` free model):
+- `OpenAI API KEY`: Ensure you have available usage.
+- `SERPAPI API KEY`: Required if you want to query content not present in the PDF.
 
-* Enter your API keys: (This step is optional in version V3, you can choose to skip it and use the `gpt4free` free model)
-* `OpenAI API Key`: Make sure you still have usage left
-* `SERPAPI API Key`: Optional. If you want to ask questions about content not appearing in the PDF document, you need this key.
+3. 📁Upload a PDF file from local storage.
 
-* Upload a PDF file from your local machine.
-* Start asking questions!
+4. 🚀Start asking questions!
 
 ![docGPT](https://github.com/Lin-jun-xiang/docGPT-streamlit/blob/main/img/docGPT.gif?raw=true)
 
 ---
```
```diff
 
-### How to Develop a docGPT with Streamlit?
+### 🧠How to Develop a docGPT with Streamlit?
 
 A step-by-step tutorial to quickly build your own chatGPT!
 
 First, clone the repository using `git clone https://github.com/Lin-jun-xiang/docGPT-streamlit.git`.
 
 There are two methods:
 
-* Local development:
+* **Local development**:
 * `pip install -r requirements.txt`: Download the required packages for development.
 * `streamlit run ./app.py`: Start the service in the project's root directory.
 * Start exploring!
 
-* Use Streamlit Community Cloud for free deployment, management, and sharing of applications:
-* Put your application in a public GitHub repository (make sure it has a `requirements.txt`!).
-* Log in to [share.streamlit.io](https://share.streamlit.io/).
-* Click "Deploy an App" and paste your GitHub URL.
-* Complete the deployment of your [application](https://docgpt-app.streamlit.app/).
+* Use Streamlit Community **Cloud for free** deployment, management, and sharing of applications:
+- Place your application in a public GitHub repository (ensure you have `requirements.txt`).
+- Log in to [share.streamlit.io](https://share.streamlit.io/).
+- Click "Deploy an App," then paste your GitHub URL.
+- Complete deployment and share your [application](https://docgpt-app.streamlit.app//).
 
 ---
```
````diff
 
-### Advanced - How to build a better model in langchain
+### 💬Advanced - How to build a better model in langchain
 
-Using Langchain to build docGPT, you can pay attention to the following details that can make your model more powerful:
+To build a powerful docGPT model in LangChain, consider these tips to enhance performance:
 
 1. **Language Model**
 
-Choosing the right LLM Model can save you time and effort. For example, you can choose OpenAI's `gpt-3.5-turbo` (default is `text-davinci-003`):
+Select an appropriate LLM model, such as OpenAI's `gpt-3.5-turbo` or other models. Experiment with different models to find the best fit for your use case.
 
 ```python
 # ./docGPT/docGPT.py
````
```diff
@@ -155,7 +153,7 @@ Using Langchain to build docGPT, you can pay attention to the following details
 
 2. **PDF Loader**
 
-There are various PDF text loaders available in Python, each with its own advantages and disadvantages. Here are three loaders the authors have used:
+Choose a suitable PDF loader. Consider using `PyMuPDF` for fast text extraction and `PDFPlumber` for extracting text from tables.
 
 ([official Langchain documentation](https://python.langchain.com/docs/modules/data_connection/document_loaders/how_to/pdf))
 
```
```diff
@@ -169,7 +167,7 @@ Using Langchain to build docGPT, you can pay attention to the following details
 
 3. **Tracking Token Usage**
 
-This doesn't make the model more powerful, but it allows you to track the token usage and OpenAI API key consumption during the QA Chain process.
+Implement token usage tracking with callbacks in LangChain to monitor token and API key usage during the QA chain process.
 
 When using `chain.run`, you can try using the [method](https://python.langchain.com/docs/modules/model_io/models/llms/how_to/token_usage_tracking) provided by Langchain to track token usage here:
 
```