Commit 9746aa2

Commit message: init (0 parents)

13 files changed (+745, -0 lines)


.gitignore

Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
.chroma/
data/
External_Data_Pipeline/
PDF/Omren
config.py
main.py
note.md

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/#use-with-ide
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
628 KB
Binary file not shown.

README.md

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
## docGPT

Main development tools:

* `Python`
* `Langchain`
* `Streamlit`

To use this tool you need at least an `openai_api_key`; to learn how to obtain a key, visit [this link](https://platform.openai.com/).

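## Introduction

* `docGPT` is built on langchain and streamlit.
* `langchain`: LangChain is a **framework for developing applications powered by language models**. It enables applications that:
    1. connect an LLM to external data sources
    2. let an LLM interact with its environment
* `streamlit`: Streamlit lets you deploy your own Python applications quickly and for free (it is commonly used to deploy AI apps).
* How it works: langchain combines several models into one application:
    * a **PDF question-answering tool** built on langchain's retrievalQA
    * a **math calculation tool** built on math-llm
    * a **search tool** built on google-search
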
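In a typical retrievalQA setup, the chain first finds the PDF passages most relevant to the question and then passes them to the LLM as context. The sketch below illustrates only that retrieval step, using a deliberately simple stand-in: it scores word chunks by word overlap with the question instead of using embeddings and a vector store. The helpers `chunk_text`, `retrieve`, and `build_prompt` are hypothetical and are not part of langchain.

```python
import re
from typing import List, Set


def chunk_text(text: str, chunk_size: int = 40) -> List[str]:
    """Split text into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def _tokens(text: str) -> Set[str]:
    # Lowercased alphanumeric words: a crude stand-in for an embedding vector
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: List[str], k: int = 1) -> List[str]:
    """Return the k chunks that share the most words with the question."""
    q = _tokens(question)
    # sorted() stays stable with reverse=True, so ties keep document order
    ranked = sorted(chunks, key=lambda c: len(q & _tokens(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: List[str]) -> str:
    # The retrieved chunks become the context the LLM is asked to answer from
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


if __name__ == "__main__":
    pdf_text = (
        "The display panel supports a resolution of 1920x1080. "
        "The battery lasts about ten hours under normal use."
    )
    chunks = chunk_text(pdf_text, chunk_size=8)
    top = retrieve("What resolution does the display support?", chunks)
    print(build_prompt("What resolution does the display support?", top))
```

A real retriever replaces word overlap with vector similarity, which also matches paraphrases (for example "support" vs. "supports"), but the flow of chunk, rank, and stuff into the prompt is the same.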
## LangChain

* For an introduction to langchain, see the official documentation and the [source project on GitHub](https://github.com/hwchase17/langchain).

#### Questions ChatGPT cannot answer? Hand them over to LangChain!

Here, the author briefly explains how langchain differs from ChatGPT. Once you follow the example below, you may well be amazed by this open-source project!

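> As things stand, ChatGPT cannot reliably answer math questions or questions about events after its training cutoff (for example, "Who is your country's president in 2023?").
>
> * Math questions: besides OpenAI's models, there is math-llm, which specializes in math problems.
> * Current events: these can be answered with a Google search.
>
> So to build a powerful, general-purpose AI model, we need to combine three tools: **"chatgpt", "math-llm", and "google search"**.
>
> If the user's question is a math calculation, we use the math-llm tool to solve and answer it.
>
> Before AI, we would use `if...else...` logic to decide which type a user's question belongs to, which meant the user interface had to let users pick the question type (the UI would have a selection field).
>
> In the AI era, however, we want users to ask directly, without choosing a question type first!
> LangChain has the concept of an agent, which lets us:
>
> * give it tools to manage, for example `tools = ['chatgpt', 'math-llm', 'google-search']`
> * include chains built with langchain as tools too; for example, use a `retrievalQA chain` to build a tool that answers questions from a document's content, then append that chain to the tools the agent manages
> * **let the agent analyze the user's question and decide on its own which tool to use** (fully automated, AI-driven)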
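The agent in `agent/agent.py` (`AgentType.ZERO_SHOT_REACT_DESCRIPTION`) lets the LLM choose a tool by reading each tool's name and description. The sketch below imitates that decision without any LLM so the idea is easy to run: a hypothetical `choose_tool` function scores tools by word overlap between the question and each description. `SimpleTool` only mirrors the `name`/`func`/`description` fields of langchain's `Tool` and is not the real class; the keyword scoring is a stand-in for the model's reasoning.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Set


@dataclass
class SimpleTool:
    """Mirrors the name/func/description fields of a langchain Tool (illustration only)."""
    name: str
    func: Callable[[str], str]
    description: str


def _words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def choose_tool(question: str, tools: List[SimpleTool]) -> Optional[SimpleTool]:
    """Pick the tool whose description shares the most words with the question.

    Keyword stand-in for the LLM's choice; returns None when no tool matches.
    """
    q = _words(question)
    best = max(tools, key=lambda t: len(q & _words(t.description)), default=None)
    if best is None or not q & _words(best.description):
        return None
    return best


def run(question: str, tools: List[SimpleTool]) -> str:
    tool = choose_tool(question, tools)
    if tool is None:
        # No tool fits, so the model would answer on its own
        return "LLM answers directly: " + question
    return tool.func(question)


# Placeholder tools: each just echoes which tool handled the question
TOOLS = [
    SimpleTool("Calculator", lambda q: "Calculator: " + q,
               "useful for math questions: calculate, add, multiply, sum"),
    SimpleTool("Search", lambda q: "Search: " + q,
               "useful for questions about current events, news, today, latest"),
    SimpleTool("DocumentGPT", lambda q: "DocumentGPT: " + q,
               "useful for questions about the uploaded PDF document"),
]

if __name__ == "__main__":
    for q in ["Please calculate 12 * 7", "What is the latest news today?",
              "What does the PDF document say about warranty?"]:
        print(run(q, TOOLS))
```

The user never selects a question type: routing depends only on the tool descriptions, which is why those description strings matter so much in the real agent.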

With langchain, we can build our own ChatGPT-style model, either a general-purpose one or one tailored for **enterprise and commercial** use!

## How to Use?

* Enter your API keys:
    * `OpenAI API KEY`: required
    * `SERPAPI API KEY`: optional; you need this key if you want to ask about content that **does not appear in the PDF document**
* Upload a PDF file from your local machine
* Start asking questions!

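Because the SerpAPI key is optional, an app built on `AgentHelper` could register the Search tool only when that key is present. The sketch below shows one way to make that decision from the environment. `select_tool_names` is a hypothetical helper; the tool names simply reuse `DocumentGPT`, `Calculator`, and `Search` from `agent/agent.py`, and it assumes the OpenAI key is mandatory because the agent itself and the DocumentGPT and Calculator tools all call the OpenAI LLM.

```python
from typing import List, Mapping


def select_tool_names(env: Mapping[str, str]) -> List[str]:
    """Decide which agent tools can be enabled from the available API keys."""
    if not env.get("OPENAI_API_KEY"):
        # Without an OpenAI key neither the agent nor its LLM-backed tools can run
        raise ValueError("OPENAI_API_KEY is required")
    names = ["DocumentGPT", "Calculator"]
    if env.get("SERPAPI_API_KEY"):
        # Google search through SerpAPI is only available with its own key
        names.append("Search")
    return names
```

For example, calling `select_tool_names(os.environ)` returns `['DocumentGPT', 'Calculator']` when only the OpenAI key is set, and adds `'Search'` once `SERPAPI_API_KEY` is also set.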
![docGPT demo](https://github.com/Lin-jun-xiang/docGPT-streamlit/blob/main/img/docGPT.gif?raw=true)

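## Why Use?

* The `docGPT` developed in this project offers the following features:
    * Upload a PDF
    * Hold a back-and-forth conversation with GPT to learn the PDF's content quickly
    * Summarize the document
    * Built-in **"math-llm"** for **math calculation** questions (which ChatGPT cannot answer)
    * Built-in **"google-search"** for **Google searches** (which ChatGPT cannot answer)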

agent/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
from .agent import AgentHelper

agent/agent.py

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
import os
from typing import List, Optional

import openai
from langchain import LLMMathChain, SerpAPIWrapper
from langchain.agents import AgentType, Tool, initialize_agent
from langchain.callbacks import get_openai_callback
from langchain.llms import OpenAI


openai.api_key = os.getenv('OPENAI_API_KEY')
# SerpAPIWrapper reads SERPAPI_API_KEY from the environment by itself, so the key
# does not need to be copied back into os.environ (which fails when it is unset).


class AgentHelper:
    """Wrap a LangChain agent so docGPT can delegate questions to extra tools."""
    def __init__(self) -> None:
        self.llm = OpenAI(temperature=0)
        self.agent_ = None
        self.tools: List[Tool] = []

    @property
    def get_calculate_chain(self) -> Tool:
        # LLMMathChain has the LLM write a math expression, then evaluates it
        llm_math_chain = LLMMathChain.from_llm(llm=self.llm, verbose=True)
        tool = Tool(
            name='Calculator',
            func=llm_math_chain.run,
            description='useful for when you need to answer questions about math'
        )
        return tool

    @property
    def get_searp_chain(self) -> Tool:
        # Google search through SerpAPI (requires SERPAPI_API_KEY)
        search = SerpAPIWrapper()
        tool = Tool(
            name='Search',
            func=search.run,
            description='useful for when you need to answer questions about current events'
        )
        return tool

    def create_doc_chat(self, docGPT) -> Tool:
        """Wrap a docGPT chain (any object with a .run method) as an agent tool."""
        tool = Tool(
            name='DocumentGPT',
            func=docGPT.run,
            description="""
            useful for when you need to answer questions from the context of the PDF,
            especially questions about display specifications.
            """
        )
        return tool

    def initialize(self, tools: List[Tool]) -> None:
        # Keep only real Tool objects, then build a zero-shot ReAct agent over them
        for tool in tools:
            if isinstance(tool, Tool):
                self.tools.append(tool)

        self.agent_ = initialize_agent(
            self.tools,
            self.llm,
            agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
            verbose=True
        )

    def query(self, query: str) -> Optional[str]:
        if self.agent_ is None:
            raise RuntimeError('Call initialize() before query().')
        response = None
        with get_openai_callback() as callback:
            response = self.agent_.run(query)
            # Print token usage and estimated cost for this query
            print(callback)
        return response
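The `Calculator` tool relies on `LLMMathChain`, which asks the LLM to rewrite a word problem as a math expression and then computes that expression in code instead of trusting the model's own arithmetic. The sketch below imitates only that evaluation step with the standard-library `ast` module. `safe_eval` is a hypothetical helper, not langchain's implementation; it deliberately accepts only numbers, `+ - * / **`, and unary signs.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Whitelisted operators; any other syntax in the expression is rejected
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def safe_eval(expression: str) -> Number:
    """Evaluate a plain arithmetic expression without calling eval()."""
    tree = ast.parse(expression, mode="eval")

    def _eval(node: ast.AST) -> Number:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # bool is a subclass of int, so exclude True/False explicitly
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {ast.dump(node)}")

    return _eval(tree)
```

For example, if the LLM turns "What is 37593 times 67?" into `37593 * 67`, then `safe_eval("37593 * 67")` returns `2518731`. Walking a whitelist of AST nodes keeps function calls and attribute access out, which matters because the expression text comes from a model rather than a trusted source.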
