README.md: 59 additions & 61 deletions
@@ -18,127 +18,125 @@
18
18
19
19
If you like this project, please give it a ⭐`Star` to support the developers~
20
20
21
-
### What's new in version3?
21
+
### ✨What's new in version3?
22
22
23
-
* We introduced the usage of `gpt4free`, **allowing users to use the application for free without entering any API key or making payments.**
23
+
* Integration with `gpt4free`, **allowing users to use docGPT for free without entering API keys or making payments**.
24
24
25
-
* If you want to use the `gpt4free` free model, you need to select a `Provider` (default is `g4f.provider.ChatgptAi`). For more information about [`gpt4free`](https://github.com/xtekky/gpt4free), please refer to the source project.
25
+
- If you choose to use the `gpt4free` model, you only need to select the `Provider` (default is `g4f.provider.ChatgptAi`). For more details about `gpt4free`, refer to the [source project](https://github.com/xtekky/gpt4free).
26
26
27
-
* Version2
28
-
* Utilizes the **`openai` model**
29
-
* To use this tool, you need to have at least the `openai_api_key`. You can obtain the key by visiting the [link](https://platform.openai.com/)
30
-
* If you have a `serpapi_key`, the AI model can answer questions and implement Google search functionality
27
+
- Version 2:
28
+
- Uses the **`openai` model**.
29
+
- Requires an `openai_api_key`. You can obtain this key from the [OpenAI platform](https://platform.openai.com/).
30
+
- If you have a `serpapi_key`, AI responses can include Google search results.
31
31
32
-
* Version3
33
-
* Retains all the features of Version2
34
-
* Adds the **`gpt4free` model**, enabling users to use it **completely for free**
35
-
* Users can choose between `gpt4free` or `openai` as the model, with differences as follows:
36
-
* `gpt4free`: Achieves free access to openai through reverse engineering, although it's less stable
37
-
* `openai`: Stable access to the `openai` model by providing an API key
32
+
- Version 3:
33
+
- Retains all the features of Version 2.
34
+
- Introduces the **`gpt4free` model**, enabling completely free usage.
35
+
- Users can choose between the `gpt4free` and `openai` models:
36
+
- `gpt4free`: Allows free access to OpenAI models through reverse engineering, but it is less stable.
37
+
- `openai`: Offers stable access by using an API key.
38
38
39
39
<p align="center">
40
40
<img src="img/2023-08-29-13-39-00.png" width="70%">
41
41
</p>
42
42
43
-
44
43
---
45
44
46
-
### Introduction
45
+
### 📚Introduction
47
46
48
47
* Project Purpose:
49
-
* Build a powerful "LLM" model using langchain and streamlit, **enabling your LLM model to do what ChatGPT can't**:
50
-
* **Connect with external data** by using PDF documents as an example, allowing the LLM model to understand the uploaded files through RetrievalQA techniques.
51
-
* Integrate LLM with other tools to achieve **internet connectivity**. For instance, using Serp API as an example, leverage the Langchain framework to enable querying the model for **current issues** (i.e., **Google search engine**).
52
-
* Integrate LLM with the **LLM Math model**, enabling accurate **mathematical calculations**.
48
+
* The purpose of this project is to create a powerful "LLM" model using LangChain and Streamlit. This model aims to **surpass the capabilities of ChatGPT** through:
49
+
* **Connection with external data**, such as PDF documents, using RetrievalQA techniques so the model can understand uploaded files.
50
+
* Integration of the LLM with other tools to achieve **internet connectivity**, for example using **Serp API** to query current topics through **Google search**.
51
+
* Integration with the **LLM Math** model for accurate mathematical computations.
53
52
54
53
* This project consists of three main components:
55
-
* [`DataConnection`](../model/data_connection.py): Allows LLM to communicate with external data, i.e., read PDF files and perform text segmentation for large PDFs to avoid exceeding OPENAI's 4000-token limit.
56
-
* [`docGPT`](../docGPT/): This component enables the model to understand the content of PDFs. It includes embedding PDF text and building a retrievalQA model using Langchain. For more details, please refer to the [documentation](https://python.langchain.com/docs/modules/chains/popular/vector_db_qa).
57
-
* [`agent`](../agent/agent.py): Responsible for managing the tools used by the model and automatically determining which tool to use based on the user's question. The tools include:
58
-
* `SerpAI`: Used for "**current questions**" by performing a **Google search**.
59
-
* `llm_math_chain`: Used for "**mathematical calculations**" by performing mathematical computations.
60
-
* `docGPT`: Used for answering questions about the content of PDF documents. (This tool is built using retrievalQA)
61
-
54
+
* [`DataConnection`](../model/data_connection.py): Handles communication between the LLM and external data, such as reading PDF files. It also splits large PDFs into smaller pieces to stay within OpenAI's 4096-token limit.
55
+
* [`docGPT`](../docGPT/): The core component that helps the model understand PDF content. It embeds the PDF text as vectors and builds a retrievalQA model with LangChain. For more details, refer to the [documentation](https://python.langchain.com/docs/modules/chains/popular/vector_db_qa).
56
+
* [`agent`](../agent/agent.py): Manages the tools used by the model and automatically decides which tool to use based on user queries. Tools include:
57
+
* `SerpAI`: Used for questions about current topics, answered through Google search.
58
+
* `llm_math_chain`: Used for mathematical calculations.
59
+
* `docGPT`: Used for answering queries about PDF document content. This tool is built using retrievalQA.
62
60
63
61
* `docGPT` is developed based on **Langchain** and **Streamlit**.
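
The splitting step described for `DataConnection` can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `split_text`, its parameters, and the use of character counts as a rough proxy for tokens are all assumptions made for this example.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into overlapping chunks of at most chunk_size characters.

    Characters are only a rough stand-in for tokens (assumption); a real
    pipeline would count tokens with the model's tokenizer.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each new chunk advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for chunk in split_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

The overlap keeps a little shared context between neighbouring chunks, so a sentence cut at a chunk boundary still appears whole in one of them.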
64
62
65
63
---
66
64
67
-
### What's LangChain?
65
+
### 🦜️What's LangChain?
68
66
69
67
* LangChain is a framework for developing applications powered by language models. It supports the following applications:
70
68
1. Connecting LLM models with external data sources.
71
-
2. Enabling interactions with LLM models.
69
+
2. Interactive communication with LLM models.
72
70
73
-
* For an introduction to LangChain, it is recommended to refer to the official documentation or the GitHub [repository](https://github.com/hwchase17/langchain).
71
+
* For more details about LangChain, refer to the official documentation or the [GitHub repository](https://github.com/hwchase17/langchain).
74
72
75
-
**Questions that ChatGPT cannot answer can be handled by Langchain!**
73
+
**For questions that ChatGPT can't answer, turn to LangChain!**
76
74
77
-
Here, the author briefly introduces the differences between Langchain and ChatGPT. You will be amazed by this open-source project called Langchain through the following example!
75
+
LangChain fills in the gaps left by ChatGPT. Through the following example, you can understand the power of LangChain:
78
76
79
-
> Imagine a scenario where ChatGPT cannot answer mathematical questions or questions about events beyond 2020 (e.g., "Who will be the president in 2023?").
77
+
> In cases where ChatGPT can't solve mathematical problems or answer questions about events after 2020 (e.g., "Who is the president in 2023?"):
80
78
>
81
-
> * For mathematical questions: In addition to the OpenAI model, there is a specialized tool called math-llm that handles mathematical questions.
82
-
> * For current questions: We can use Google search.
79
+
> * For mathematical problems: There's a math-LLM model dedicated to handling math queries.
80
+
> * For current topics: You can use Google search.
83
81
>
84
-
> Therefore, to design a powerful and versatile AI model, we need to include three tools: "chatgpt", "math-llm", and "Google search".
82
+
> To create a comprehensive AI model, we need to combine "ChatGPT," "math-LLM," and "Google search" tools.
85
83
>
86
-
> If the user's question involves mathematical calculations, we use the math-llm tool to handle and answer it.
84
+
> In the non-AI era, we used `if...else...` to categorize user queries and had users select the question type through the UI.
87
85
>
88
-
> In the non-AI era, we would use `if...else...` to decide which tool to use based on the user's question. However, Langchain provides a more flexible and powerful way to handle this.
89
-
> In the AI era, we want users to directly ask their questions without having to pre-select the question type! In Langchain, there is a concept called "agent" that allows us to:
86
+
> In the AI era, users should be able to directly ask questions without preselecting the question type. With LangChain's agent:
87
+
> * We provide tools to the agent, e.g., `tools = ['chatgpt', 'math-llm', 'google-search']`.
88
+
> * Tools can include chains designed using LangChain, such as using a retrievalQA chain to answer questions from documents.
89
+
> * **The agent automatically decides which tool to use based on user queries** (fully automated).
90
90
91
-
* Provide tools for the agent to manage, such as `tools = ['chatgpt', 'math-llm', 'google-search']`.
92
-
* Include chains designed using Langchain, such as using the `retrievalQA chain` to create a question-answering model based on document content, and append this chain to the tools managed by the agent.
93
-
* **Allow the agent to determine which tool to use based on the user's question** (fully automated and AI-driven).
91
+
Through LangChain, you can create a universal AI model or tailor it for business applications.
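
The agent idea can be illustrated with a small self-contained sketch. It does not use LangChain's API: `Tool`, `keyword_selector`, `run_agent`, and the tool descriptions are hypothetical names chosen for this example, and the keyword-overlap selector is only a stand-in for the LLM that would read the tool descriptions and pick one.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Tool:
    name: str
    description: str            # what the selector reads to decide
    run: Callable[[str], str]   # stub: a real tool would call a model or an API


def _words(text: str) -> set:
    """Lowercase alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_selector(question: str, tools: List[Tool]) -> Tool:
    """Stand-in for the LLM: choose the tool whose description shares the most words."""
    asked = _words(question)
    return max(tools, key=lambda t: len(asked & _words(t.description)))


def run_agent(question: str, tools: List[Tool],
              select: Callable[[str, List[Tool]], Tool] = keyword_selector) -> str:
    """Pick a tool for the question and run it; the user never selects a question type."""
    return select(question, tools).run(question)


TOOLS = [
    Tool("math-llm", "calculate arithmetic math sum product numbers",
         lambda q: f"[math-llm] {q}"),
    Tool("google-search", "search current events news today latest president",
         lambda q: f"[google-search] {q}"),
    Tool("docGPT", "answer questions about the uploaded pdf document",
         lambda q: f"[docGPT] {q}"),
]

if __name__ == "__main__":
    print(run_agent("Please calculate the sum of 17 and 25", TOOLS))
    print(run_agent("Who is the president today?", TOOLS))
```

Because the selector is passed in as a parameter, the keyword heuristic could be swapped for a function that asks an LLM to choose, without changing `run_agent` or the tool list.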
94
92
95
-
With Langchain, we can create our own ChatGPT model that can be general-purpose or tailored for specific industries and commercial use!
96
93
97
94
---
98
95
99
-
### How to Use docGPT?
96
+
### 🚩How to Use docGPT?
97
+
98
+
1. 🎬Visit the [application](https://docgpt-app.streamlit.app/).
100
99
101
-
* Visit the [application](https://docgpt-app.streamlit.app/).
100
+
2. 🔑Enter your `API_KEY` (optional in Version 3, as you can use the `gpt4free` free model):
101
+
- `OpenAI API KEY`: Ensure you still have usage quota available.
102
+
- `SERPAPI API KEY`: Required if you want to query content not present in the PDF.
102
103
103
-
* Enter your API keys: (This step is optional in version V3, you can choose to skip it and use the `gpt4free` free model)
104
-
* `OpenAI API Key`: Make sure you still have usage left
105
-
* `SERPAPI API Key`: Optional. If you want to ask questions about content not appearing in the PDF document, you need this key.
A step-by-step tutorial to quickly build your own chatGPT!
117
115
118
116
First, clone the repository using `git clone https://github.com/Lin-jun-xiang/docGPT-streamlit.git`.
119
117
120
118
There are two methods:
121
119
122
-
* Local development:
120
+
* **Local development**:
123
121
* `pip install -r requirements.txt`: Install the required packages for development.
124
122
* `streamlit run ./app.py`: Start the service in the project's root directory.
125
123
* Start exploring!
126
124
127
-
* Use Streamlit Community Cloud for free deployment, management, and sharing of applications:
128
-
* Put your application in a public GitHub repository (make sure it has a `requirements.txt`!).
129
-
* Log in to [share.streamlit.io](https://share.streamlit.io/).
130
-
* Click "Deploy an App" and paste your GitHub URL.
131
-
* Complete the deployment of your [application](https://docgpt-app.streamlit.app/).
125
+
* Use **Streamlit Community Cloud** for free deployment, management, and sharing of applications:
126
+
- Place your application in a public GitHub repository (ensure you have `requirements.txt`).
127
+
- Log in to [share.streamlit.io](https://share.streamlit.io/).
128
+
- Click "Deploy an App," then paste your GitHub URL.
129
+
- Complete deployment and share your [application](https://docgpt-app.streamlit.app/).
132
130
133
131
---
134
132
135
-
### Advanced - How to build a better model in langchain
133
+
### 💬Advanced - How to build a better model in langchain
136
134
137
-
Using Langchain to build docGPT, you can pay attention to the following details that can make your model more powerful:
135
+
To build a powerful docGPT model in LangChain, consider these tips to enhance performance:
138
136
139
137
1. **Language Model**
140
138
141
-
Choosing the right LLM Model can save you time and effort. For example, you can choose OpenAI's `gpt-3.5-turbo` (default is `text-davinci-003`):
139
+
Select an appropriate LLM, such as OpenAI's `gpt-3.5-turbo`, and experiment with different models to find the best fit for your use case.
142
140
143
141
```python
144
142
# ./docGPT/docGPT.py
@@ -155,7 +153,7 @@ Using Langchain to build docGPT, you can pay attention to the following details
155
153
156
154
2. **PDF Loader**
157
155
158
-
There are various PDF text loaders available in Python, each with its own advantages and disadvantages. Here are three loaders the authors have used:
156
+
Choose a suitable PDF loader. Consider using `PyMuPDF` for fast text extraction and `PDFPlumber` for extracting text from tables.
@@ -169,7 +167,7 @@ Using Langchain to build docGPT, you can pay attention to the following details
169
167
170
168
3. **Tracking Token Usage**
171
169
172
-
This doesn't make the model more powerful, but it allows you to track the token usage and OpenAI API key consumption during the QA Chain process.
170
+
Implement token usage tracking with callbacks in LangChain to monitor token and API key usage during the QA chain process.
173
171
174
172
When using `chain.run`, you can try using the [method](https://python.langchain.com/docs/modules/model_io/models/llms/how_to/token_usage_tracking) provided by Langchain to track token usage here: