Commit 7fd30f9

Add CSV loader
1 parent cd0c70b commit 7fd30f9

4 files changed: +34 -25 lines changed

README.md

Lines changed: 5 additions & 4 deletions
@@ -5,8 +5,9 @@
 
 [English](./README.md) | [中文版](./README.zh-TW.md)
 
-Welcome to the `docGPT` User Guide. This guide will take you through the features and usage of docGPT, and walk you through building your own application.
+Free `docGPT` allows you to chat with your documents (`.pdf`, `.docx`, `.csv`), without the need for any keys or fees.
 
+Additionally, you can deploy the app anywhere based on the document.
 
 - Table of Contents
 - [Introduction](#introduction)
@@ -26,7 +27,7 @@ If you like this project, please give it a ⭐`Star` to support the developers~
 
 ### 📚Introduction
 
-* Upload a Document link from your local device (`.pdf` or `.docx`) and query `docGPT` about the content of the Document. For example, you can ask GPT to summarize an article.
+* Upload a Document link from your local device (`.pdf`, `.docx`, `.csv`) and query `docGPT` about the content of the Document. For example, you can ask GPT to summarize an article.
 
 * Provide two models:
 * `gpt4free`
@@ -46,7 +47,7 @@ If you like this project, please give it a ⭐`Star` to support the developers~
 
 - **`gpt4free` Integration**: Everyone can use `docGPT` for **free** without needing an OpenAI API key.
 - **Support docx, pdf file**: Users can upload PDF or Word file.
-- **Direct Document URL Input**: Users can input Document `URL` links for parsing without uploading `.pdf` or `.docx` files.
+- **Direct Document URL Input**: Users can input Document `URL` links for parsing without uploading `.pdf`, `.docx` or `.csv` files.
 - **Langchain Agent**: Enables AI to answer current questions and achieve Google search-like functionality.
 - **User-Friendly Environment**: Easy-to-use interface for simple operations.
 
@@ -92,7 +93,7 @@ Through LangChain, you can create a universal AI model or tailor it for business
 - `SERPAPI API KEY`: Required if you want to query content not present in the Document.
 
 3. 📁Upload a Document file (choose one method)
-* Method 1: Browse and upload your own `.pdf` or `.docx` file from your local machine.
+* Method 1: Browse and upload your own `.pdf`, `.docx` or `.csv` file from your local machine.
 * Method 2: Enter the Document `URL` link directly.
 
 4. 🚀Start asking questions!

README.zh-TW.md

Lines changed: 6 additions & 4 deletions
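@@ -4,7 +4,9 @@
 
 [English](./README.md) | [中文版](./README.zh-TW.md)
 
-Welcome to the `docGPT` User Guide. This guide will take you through the features and usage of `docGPT` and let you build an application of your own.
+Free `docGPT` allows you to chat with your documents (`.pdf`, `.docx`, `.csv`) without any keys or fees.
+
+In addition, you can follow this document to deploy the app anywhere.
 
 - Table of Contents
 - [Introduction](#introduction)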
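@@ -24,7 +26,7 @@
 
 ### 📚Introduction
 
-* Upload a Document link from your local device (`.pdf` or `.docx`) and ask `docGPT` about the content of the Document. For example, you can ask GPT to summarize an article.
+* Upload a Document link from your local device (`.pdf`, `.docx`, `.csv`) and ask `docGPT` about the content of the Document. For example, you can ask GPT to summarize an article.
 * Two models are available:
 * `gpt4free`
 * **Completely free: "allows users to use the application without entering an API key or paying"**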
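@@ -43,7 +45,7 @@
 
 - **`gpt4free` Integration**: Anyone can use GPT4 for free without entering an OpenAI API key.
 - **Support for docx and pdf files**: PDF or Word files can be uploaded.
-- **Direct Document URL Input**: Users can enter a Document URL for parsing without uploading a local `.pdf` or `.docx` file.
+- **Direct Document URL Input**: Users can enter a Document URL for parsing without uploading a local `.pdf`, `.docx` or `.csv` file.
 - **Langchain Agent**: The AI can answer questions about current events, providing Google search-like functionality.
 - **Easy-to-Use Environment**: A friendly interface with simple operation.
 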
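@@ -90,7 +92,7 @@ LangChain fills the gaps left by ChatGPT. Through the following example, you can
 * `SERPAPI API KEY`: Required if you want to query content not present in the Document.
 
 3. 📁Upload a Document file from your local device (choose one method)
-* Method 1: Browse and upload your own `.pdf` or `.docx` file from your local machine
+* Method 1: Browse and upload your own `.pdf`, `.docx` or `.csv` file from your local machine
 * Method 2: Enter the Document URL link
 
 4. 🚀Start asking questions!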

app.py

Lines changed: 6 additions & 6 deletions
@@ -40,7 +40,6 @@ def theme() -> None:
     st.image('./static/img/repos_logo.png', width=250)
 
     with st.sidebar:
-
         with st.expander(':orange[How to use?]'):
             st.markdown(
                 """
@@ -113,22 +112,22 @@ def load_api_key() -> None:
 
 
 def upload_and_process_document() -> list:
-    st.write('#### Upload a Document file')
+    st.write('#### Upload a Document file (PDF, DOCX, CSV)')
     browse, url_link = st.tabs(
         ['Drag and drop file (Browse files)', 'Enter document URL link']
     )
     with browse:
         upload_file = st.file_uploader(
-            'Browse file (.pdf, .docx)',
-            type=['pdf', 'docx'],
+            'Browse file (.pdf, .docx, .csv)',
+            type=['pdf', 'docx', 'csv'],
             label_visibility='hidden'
         )
         filetype = os.path.splitext(upload_file.name)[1].lower() if upload_file else None
         upload_file = upload_file.read() if upload_file else None
 
     with url_link:
         doc_url = st.text_input(
-            "Enter document URL Link (.pdf, .docx)",
+            "Enter document URL Link (.pdf, .docx, .csv)",
             placeholder='https://www.xxx/uploads/file.pdf',
             label_visibility='hidden'
         )
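
The upload branch in this hunk derives `filetype` from the uploaded file's name and then replaces `upload_file` with its raw bytes. A minimal sketch of that step in isolation follows. It assumes a hypothetical helper `read_upload` (not part of this repository) and uses an `io.BytesIO` object with a `name` attribute as a stand-in for the object Streamlit returns.

```python
import io
import os
from typing import Optional, Tuple

# Extensions accepted by the uploader after this commit.
ALLOWED_TYPES = ('.pdf', '.docx', '.csv')


def read_upload(upload_file) -> Tuple[Optional[str], Optional[bytes]]:
    """Return (lowercase extension, raw bytes), or (None, None) if unusable."""
    if upload_file is None:
        return None, None
    # Same expression as in the diff: take the extension and lowercase it.
    filetype = os.path.splitext(upload_file.name)[1].lower()
    if filetype not in ALLOWED_TYPES:
        return None, None
    return filetype, upload_file.read()


# Stand-in for an uploaded file: a byte buffer that carries a file name.
fake = io.BytesIO(b'name,score\nAda,90\n')
fake.name = 'Grades.CSV'
print(read_upload(fake))
```

Lowercasing the extension means a file named `Grades.CSV` is treated the same as `grades.csv` by the later `filetype == '.csv'` comparison.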
@@ -167,7 +166,8 @@ def get_response(query: str) -> str:
             '(Click the "Show Available Providers" button in sidebar)\n'
             '2. If you are using openai model, '
             'try to re-pass openai api key.\n'
-            '3. Or you did not pass the PDF file successfully.'
+            '3. Or you did not pass the file successfully.\n'
+            '4. Try to Refresh the page (F5).'
         )
     except Exception as e:
         app_logger.info(f'{__file__}: {e}')

model/data_connection.py

Lines changed: 17 additions & 11 deletions
@@ -3,7 +3,7 @@
 
 import requests
 import streamlit as st
-from langchain.document_loaders import Docx2txtLoader, PyMuPDFLoader
+from langchain.document_loaders import CSVLoader, Docx2txtLoader, PyMuPDFLoader
 from langchain.text_splitter import RecursiveCharacterTextSplitter
 
 
@@ -22,18 +22,24 @@ def get_files(path: str, filetype: str = '.pdf') -> Iterator[str]:
     def load_documents(
         file: str,
         filetype: str = '.pdf'
-    ) -> Union[Docx2txtLoader, PyMuPDFLoader]:
-        """Loading PDF or Docx"""
-        if filetype == '.pdf':
-            loader = PyMuPDFLoader(file)
-        elif filetype == '.docx':
-            loader = Docx2txtLoader(file)
+    ) -> Union[CSVLoader, Docx2txtLoader, PyMuPDFLoader]:
+        """Loading PDF, Docx, CSV"""
+        try:
+            if filetype == '.pdf':
+                loader = PyMuPDFLoader(file)
+            elif filetype == '.docx':
+                loader = Docx2txtLoader(file)
+            elif filetype == '.csv':
+                loader = CSVLoader(file, encoding='utf-8')
 
-        return loader.load()
+            return loader.load()
+        except Exception as e:
+            print(f'\033[31m{e}')
+            return []
 
     @staticmethod
     def split_documents(
-        document: Union[Docx2txtLoader, PyMuPDFLoader],
+        document: Union[CSVLoader, Docx2txtLoader, PyMuPDFLoader],
         chunk_size: int=2000,
         chunk_overlap: int=0
     ) -> list:
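
The new `.csv` branch hands the file to LangChain's `CSVLoader`, which yields one document per CSV row rather than one per page. The sketch below is a rough standard-library approximation of that behaviour, assuming each row is rendered as `column: value` lines. `Doc` and `csv_to_docs` are hypothetical names used for illustration only; they are not part of LangChain or of this repository.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain document."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


def csv_to_docs(text: str, source: str = 'memory') -> List[Doc]:
    """Turn each CSV data row into one Doc made of 'column: value' lines."""
    docs = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        content = '\n'.join(f'{key}: {value}' for key, value in row.items())
        docs.append(Doc(content, {'source': source, 'row': i}))
    return docs


docs = csv_to_docs('name,score\nAda,90\nLin,85\n')
print(len(docs))
print(docs[0].page_content)
```

With the default `chunk_size` of 2000 shown in `split_documents`, a short row like these would typically stay in a single chunk.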
@@ -53,6 +59,6 @@ def crawl_file(url: str) -> str:
                 '.pdf' in filetype or '.docx' in filetype):
                 return response.content, filetype
             else:
-                st.warning('Url cannot parse to PDF or DOCX')
+                st.warning('Url cannot parse correctly.')
         except:
-            st.warning('Url cannot parse to PDF or DOCX')
+            st.warning('Url cannot parse correctly.')
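
The unchanged context line of this hunk still tests only for `.pdf` and `.docx`, so on the evidence of this diff a `.csv` URL would fall through to the warning. One way the check could be widened is sketched below with the standard library. `SUPPORTED` and `url_filetype` are hypothetical names, and the real `crawl_file` may derive `filetype` differently, since that code lies outside the hunk.

```python
import os
from typing import Optional
from urllib.parse import urlparse

# Assumed allow-list, mirroring the types accepted by the file uploader.
SUPPORTED = ('.pdf', '.docx', '.csv')


def url_filetype(url: str) -> Optional[str]:
    """Return the lowercase extension of the URL path if supported, else None."""
    path = urlparse(url).path  # drops the query string and fragment
    ext = os.path.splitext(path)[1].lower()
    return ext if ext in SUPPORTED else None


print(url_filetype('https://www.xxx/uploads/file.pdf'))
print(url_filetype('https://www.xxx/uploads/data.CSV?download=1'))
print(url_filetype('https://www.xxx/index.html'))
```

Comparing the parsed extension against an allow-list avoids the substring match `'.pdf' in filetype`, which would also accept a name such as `file.pdf.exe`.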
