Commit 63c0dd4

Fix: xxx_tool is not defined
2 parents 8162d1a + a275726 commit 63c0dd4

4 files changed: +152 -7 lines changed

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2023 JunXiang
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md

Lines changed: 63 additions & 1 deletion
@@ -7,7 +7,7 @@
 - [What's LangChain?](#whats-langchain)
 - [How to Use docGPT?](#how-to-use-docgpt)
 - [How to Develop a docGPT with Streamlit?](#how-to-develop-a-docgpt-with-streamlit)
-
+- [Advanced - How to build a better model in langchain](#advanced---how-to-build-a-better-model-in-langchain)

 * Main Development Software and Packages:
 * `Python 3.8.6`
@@ -107,5 +107,67 @@ There are two methods:
 * Click "Deploy an App" and paste your GitHub URL.
 * Complete the deployment of your [application](https://docgpt-app.streamlit.app/).

+---
+
+### Advanced - How to build a better model in langchain
+
+When building docGPT with Langchain, the following details can make your model more powerful:
+
+1. **Language Model**
+
+Choosing the right LLM can save you time and effort. For example, you can choose OpenAI's `gpt-3.5-turbo` (the default is `text-davinci-003`):
+
+```python
+# ./docGPT/docGPT.py
+from langchain.chat_models import ChatOpenAI
+llm = ChatOpenAI(
+    temperature=0.2,
+    max_tokens=2000,
+    model_name='gpt-3.5-turbo'
+)
+```
+
+Please note that there is no best or worst model. You need to try several models to find the one that best suits your use case. For more OpenAI models, see the [documentation](https://platform.openai.com/docs/models).
+
+(Some models support up to 16,000 tokens!)
+
+2. **PDF Loader**
+
+There are various PDF text loaders available in Python, each with its own advantages and disadvantages. Here are three loaders the author has used:
+
+([official Langchain documentation](https://python.langchain.com/docs/modules/data_connection/document_loaders/how_to/pdf))
+
+* `PyPDF`: Simple and easy to use.
+* `PyMuPDF`: Reads the document very **quickly** and provides additional metadata such as page numbers and document dates.
+* `PDFPlumber`: Can **extract text within tables**. Like `PyMuPDF`, it provides metadata, but it takes longer to parse.
+
+If your document contains multiple tables and the important information is within those tables, try `PDFPlumber`; it may give you surprisingly good results!
+
+Please do not overlook this detail: if the text is not parsed correctly from the document, even the most powerful LLM is useless!
+
+3. **Tracking Token Usage**
+
+This doesn't make the model more powerful, but it lets you track token usage and OpenAI API cost during the QA Chain process.
+
+When using `chain.run`, you can track token usage with the [method](https://python.langchain.com/docs/modules/model_io/models/llms/how_to/token_usage_tracking) provided by Langchain:
+
+```python
+from langchain.callbacks import get_openai_callback
+
+with get_openai_callback() as callback:
+    response = self.qa_chain.run(query)
+
+print(callback)
+
+# Result of print
+"""
+chain...
+...
+> Finished chain.
+Total Tokens: 1506
+Prompt Tokens: 1350
+Completion Tokens: 156
+Total Cost (USD): $0.03012
+"""
+```
+
 <a href="#top">Back to top</a>

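The figures in the sample callback output above can be checked by hand. The sketch below assumes a flat rate of $0.02 per 1,000 tokens; that rate is inferred from the printed numbers (it matches `text-davinci-003`'s listed price at the time) and is not stated in the README. `estimate_cost` is a hypothetical helper, not part of Langchain.

```python
from decimal import Decimal


def estimate_cost(total_tokens: int, usd_per_1k_tokens: str = "0.02") -> Decimal:
    """Estimate cost from a token count at a flat per-1K-token rate.

    The default rate is an assumption inferred from the sample output.
    """
    return Decimal(total_tokens) * Decimal(usd_per_1k_tokens) / Decimal(1000)


# Token counts copied from the sample output.
prompt_tokens, completion_tokens = 1350, 156
total_tokens = prompt_tokens + completion_tokens
cost = estimate_cost(total_tokens)

print(total_tokens)  # 1506
print(cost)          # 0.03012
```

If the reported cost does not match the rate of the model you configured, it may be worth checking which model the chain actually called.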

README.zh-TW.md

Lines changed: 66 additions & 4 deletions
@@ -7,7 +7,7 @@
 - [What's LangChain?](#whats-langchain)
 - [How to Use docGPT?](#how-to-use-docgpt)
 - [How to develope a docGPT with streamlit?](#how-to-develope-a-docgpt-with-streamlit)
-
+- [Advanced - How to build a better model in langchain](#advanced---how-to-build-a-better-model-in-langchain)

 * Main Development Software and Packages:
 * `Python 3.8.6`
@@ -26,7 +26,7 @@
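 * Integrate the LLM with other tools to gain **internet access**. This project uses the Serp API as an example: through the Langchain framework, you can ask the model about **current events** (that is, via the **Google search engine**)
 * Integrate the LLM with the **LLM Math model** so that the model can perform **mathematical calculations** accurately
 * The design of this project has three main elements:
-* [`DataConnection`](../model/data_connection.py): Lets the LLM communicate with external data, that is, it reads PDF files and splits large PDFs into chunks to avoid exceeding OpenAI's 4000-token limit
+* [`DataConnection`](../model/data_connection.py): Lets the LLM communicate with external data, that is, it reads PDF files and splits large PDFs into chunks to avoid exceeding OpenAI's 4096-token limit
 * [`docGPT`](../docGPT/): The core element that lets the model understand the PDF content, including embedding the PDF text as vectors and building Langchain's retrievalQA model. For details, see the [documentation](https://python.langchain.com/docs/modules/chains/popular/vector_db_qa)
 * [`agent`](../agent/agent.py): Manages the tools the model uses and **automatically decides** which tool to use based on the user's question. The tools include
 * `SerpAI`: When the user's question concerns **current events**, this tool can perform a **Google search**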
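The text-splitting idea behind `DataConnection` can be sketched in plain Python. This is a minimal illustration, not the project's implementation: `split_text` is a hypothetical helper, and `chars_per_token=4` is a rough heuristic standing in for a real tokenizer.

```python
from typing import List


def split_text(text: str, max_tokens: int = 4096, chars_per_token: int = 4) -> List[str]:
    """Split text on blank lines into chunks under an approximate token budget.

    The characters-per-token ratio is an assumption, not a real token count.
    """
    max_chars = max_tokens * chars_per_token
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Hard-split any single paragraph that is too long on its own.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = "\n\n".join(["a" * 30] * 5)
small_chunks = split_text(sample, max_tokens=10, chars_per_token=4)
print(len(small_chunks))
```

Splitting on paragraph boundaries first keeps related sentences together; the hard split only applies when one paragraph alone exceeds the budget.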
@@ -47,7 +47,7 @@
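
 **Questions ChatGPT can't answer? Leave them to Langchain!**

-Here the author briefly introduces the differences between langchain and chatgpt. Once you understand the following example, you (informal 你) will be amazed by the langchain open-source project!
+Here the author briefly introduces the differences between langchain and chatgpt. Once you understand the following example, you (formal 您) will be amazed by the langchain open-source project!

 > Imagine that chatgpt cannot answer math questions or questions about events after 2020 (for example, who is your country's president in 2023?)
 >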
@@ -89,7 +89,7 @@

 ### How to develope a docGPT with streamlit?

-A step-by-step tutorial to help you (informal 你) quickly build your own chatGPT!
+A step-by-step tutorial to help you (formal 您) quickly build your own chatGPT!

 First, run `git clone https://github.com/Lin-jun-xiang/docGPT-streamlit.git`

@@ -106,4 +106,66 @@
 * Click "Deploy an App" and paste your GitHub URL
 * Complete the deployment of your [application](https://docgpt-app.streamlit.app//)

+---
+
+### Advanced - How to build a better model in langchain
+
+When building docGPT with Langchain, the following details can make your model more powerful:
+
+1. **Language Model**
+
+Choosing the right LLM can save you time and effort. For example, you can choose OpenAI's `gpt-3.5-turbo` (the default is `text-davinci-003`):
+
+```python
+# ./docGPT/docGPT.py
+from langchain.chat_models import ChatOpenAI
+llm = ChatOpenAI(
+    temperature=0.2,
+    max_tokens=2000,
+    model_name='gpt-3.5-turbo'
+)
+```
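+
+Please note that there is no best or worst model. You need to try several models to find the one that best suits your use case. For more OpenAI models, see the [documentation](https://platform.openai.com/docs/models)
+
+(Some models support up to 16,000 tokens!)
+
+2. **PDF Loader**
+
+There are many loaders for parsing PDF text in Python, each with its own advantages and disadvantages. Here are three loaders the author has used
+
+([official Langchain documentation](https://python.langchain.com/docs/modules/data_connection/document_loaders/how_to/pdf)):
+
+* `PyPDF`: Simple and easy to use
+* `PyMuPDF`: Reads documents **very quickly**; besides parsing text, it also provides metadata such as page numbers and document dates.
+* `PDFPlumber`: Can extract **text inside tables**. Its usage is similar to `PyMuPDF` and it also provides metadata, but parsing takes longer.
+
+If your document has multiple tables and the important information is in those tables, try `PDFPlumber`; it may give you surprisingly good results!
+Please do not overlook this detail: if the text is not parsed correctly from the document, even the most powerful LLM is useless!
+
+3. **Tracking Token Usage**
+
+This does not make the model more powerful, but it lets you see clearly how many tokens, and how much of your OpenAI API quota, you use during the QA Chain process.
+
+When you use `chain.run`, you can try the [method](https://python.langchain.com/docs/modules/model_io/models/llms/how_to/token_usage_tracking) provided by langchain:
+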
+```python
+from langchain.callbacks import get_openai_callback
+
+with get_openai_callback() as callback:
+    response = self.qa_chain.run(query)
+
+print(callback)
+
+# Result of print
+"""
+chain...
+...
+> Finished chain.
+Total Tokens: 1506
+Prompt Tokens: 1350
+Completion Tokens: 156
+Total Cost (USD): $0.03012
+"""
+```
+
 <a href="#top">Back to top</a>

app.py

Lines changed: 2 additions & 2 deletions
@@ -88,7 +88,7 @@ def load_api_key() -> None:
 if temp_file_path:
     os.remove(temp_file_path)

-docGPT_tool, calculate_tool, search_tool = None, None, None
+docGPT_tool, calculate_tool, search_tool, llm_tool = None, None, None, None

 try:
     agent_ = AgentHelper()
@@ -117,7 +117,7 @@ def load_api_key() -> None:
     ]
     agent_.initialize(tools)
 except Exception as e:
-    st.write(e)
+    pass


 if not st.session_state['openai_api_key']:
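Why the extra `llm_tool` default matters can be shown in isolation. The sketch below is not code from `app.py`: `build_tools` is a hypothetical stand-in for whatever the `try` block does, and it always fails so that the assignment never happens. Without a default, the later reference raises a `NameError` (the "xxx_tool is not defined" error from the commit title); with a default, the name is simply `None`.

```python
def build_tools(fail: bool):
    """Hypothetical stand-in for the tool setup inside the try block."""
    if fail:
        raise RuntimeError("tool setup failed")
    return "docGPT", "calculate", "search", "llm"


def without_default() -> str:
    try:
        docGPT_tool, calculate_tool, search_tool, llm_tool = build_tools(fail=True)
    except Exception:
        pass
    try:
        return repr(llm_tool)
    except NameError:
        # UnboundLocalError (raised inside functions) subclasses NameError.
        return "not defined"


def with_default() -> str:
    # Mirrors the fix: every tool name gets a default before the try block.
    docGPT_tool, calculate_tool, search_tool, llm_tool = None, None, None, None
    try:
        docGPT_tool, calculate_tool, search_tool, llm_tool = build_tools(fail=True)
    except Exception:
        pass
    return repr(llm_tool)


print(without_default())  # not defined
print(with_default())     # None
```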
