
Commit ca06de7

add txt loader
1 parent 7dcb0e8 commit ca06de7


4 files changed, +31 -21 lines changed


README.md

Lines changed: 5 additions & 5 deletions
@@ -5,7 +5,7 @@

 [English](./README.md) | [中文版](./README.zh-TW.md)

-Free `docGPT` allows you to chat with your documents (`.pdf`, `.docx`, `.csv`), without the need for any keys or fees.
+Free `docGPT` allows you to chat with your documents (`.pdf`, `.docx`, `.csv`, `.txt`), without the need for any keys or fees.

 Additionally, you can deploy the app anywhere based on the document.

@@ -27,7 +27,7 @@ If you like this project, please give it a ⭐`Star` to support the developers~

 ### 📚Introduction

-* Upload a Document link from your local device (`.pdf`, `.docx`, `.csv`) and query `docGPT` about the content of the Document. For example, you can ask GPT to summarize an article.
+* Upload a Document link from your local device (`.pdf`, `.docx`, `.csv`, `.txt`) and query `docGPT` about the content of the Document. For example, you can ask GPT to summarize an article.

 * Provide two models:
 * `gpt4free`
@@ -46,8 +46,8 @@ If you like this project, please give it a ⭐`Star` to support the developers~
 ### 🧨Features

 - **`gpt4free` Integration**: Everyone can use `docGPT` for **free** without needing an OpenAI API key.
-- **Support docx, pdf file**: Users can upload PDF or Word file.
-- **Direct Document URL Input**: Users can input Document `URL` links for parsing without uploading `.pdf`, `.docx` or `.csv` files.
+- **Support docx, pdf, csv, txt file**: Users can upload PDF, Word, CSV, txt file.
+- **Direct Document URL Input**: Users can input Document `URL` links for parsing without uploading document files(see the demo).
 - **Langchain Agent**: Enables AI to answer current questions and achieve Google search-like functionality.
 - **User-Friendly Environment**: Easy-to-use interface for simple operations.

@@ -93,7 +93,7 @@ Through LangChain, you can create a universal AI model or tailor it for business
 - `SERPAPI API KEY`: Required if you want to query content not present in the Document.

 3. 📁Upload a Document file (choose one method)
-* Method 1: Browse and upload your own `.pdf`, `.docx` or `.csv` file from your local machine.
+* Method 1: Browse and upload your own `.pdf`, `.docx`, `.csv`, `.txt` file from your local machine.
 * Method 2: Enter the Document `URL` link directly.

 4. 🚀Start asking questions!

README.zh-TW.md

Lines changed: 5 additions & 5 deletions
@@ -4,7 +4,7 @@

 [English](./README.md) | [中文版](./README.zh-TW.md)

-The free `docGPT` lets you chat with your documents (`.pdf`, `.docx`, `.csv`) without any keys or fees.
+The free `docGPT` lets you chat with your documents (`.pdf`, `.docx`, `.csv`, `.txt`) without any keys or fees.

 In addition, you can follow this document to deploy the program anywhere.

@@ -26,7 +26,7 @@
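
 ### 📚Introduction

-* Upload a local Document link (`.pdf`, `.docx`, `.csv`) and ask `docGPT` about the Document's content. For example, you can ask GPT to help summarize an article.
+* Upload a local Document link (`.pdf`, `.docx`, `.csv`, `.txt`) and ask `docGPT` about the Document's content. For example, you can ask GPT to help summarize an article.
 * Two model options are provided:
 * `gpt4free`
 * **Completely free, "allowing users to use the application without entering an API key or paying"**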
@@ -44,8 +44,8 @@
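 ### 🧨Features

 - **`gpt4free` integration**: anyone can use GPT4 for free without entering an OpenAI API key.
-- **Supports docx, pdf files**: you can upload PDF or Word files
-- **Direct Document URL input**: users can enter a Document URL directly for parsing, without uploading `.pdf`, `.docx` or `.csv` files from their local machine
+- **Supports docx, pdf, csv, txt files**: you can upload PDF, Word, CSV, txt
+- **Direct Document URL input**: users can enter a Document URL directly for parsing, without uploading files from their local machine (as shown in the demo below)
 - **Langchain Agent**: the AI can answer current questions, providing Google-search-like functionality.
 - **Simple operating environment**: a friendly interface that is easy to use
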
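@@ -92,7 +92,7 @@ LangChain fills the gaps in ChatGPT. Through the following examples, you can
 * `SERPAPI API KEY`: required if you want to query content that does not exist in the Document.

 3. 📁Upload a local Document file (choose one method)
-* Method 1: browse and upload your own `.pdf`, `.docx` or `.csv` from your local machine
+* Method 1: browse and upload your own `.pdf`, `.docx`, `.csv` or `.txt` from your local machine
 * Method 2: enter the Document URL link

 4. 🚀Start asking questions!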

app.py

Lines changed: 10 additions & 8 deletions
@@ -39,12 +39,14 @@ def theme() -> None:
 with st.expander(':orange[How to use?]'):
 st.markdown(
 """
-1. Enter your API keys: (You can choose to skip it and use the `gpt4free` free model)
+1. Enter your API keys: (You can use the `gpt4free` free model **without API keys**)
 * `OpenAI API Key`: Make sure you still have usage left
 * `SERPAPI API Key`: Optional. If you want to ask questions about content not appearing in the PDF document, you need this key.
-2. Upload a Document file (choose one method):
-* method1: Browse and upload your own `.pdf or .docx` file from your local machine.
-* method2: Enter the PDF or DOCX `URL` link directly.
+2. **Upload a Document** file (choose one method):
+* method1: Browse and upload your own document file from your local machine.
+* method2: Enter the document URL link directly.
+
+(**support documents**: `.pdf`, `.docx`, `.csv`, `.txt`)
 3. Start asking questions!
 4. More details.(https://github.com/Lin-jun-xiang/docGPT-streamlit)
 5. If you have any questions, feel free to leave comments and engage in discussions.(https://github.com/Lin-jun-xiang/docGPT-streamlit/issues)
@@ -108,22 +110,22 @@ def load_api_key() -> None:


 def upload_and_process_document() -> list:
-st.write('#### Upload a Document file (PDF, DOCX, CSV)')
+st.write('#### Upload a Document file')
 browse, url_link = st.tabs(
 ['Drag and drop file (Browse files)', 'Enter document URL link']
 )
 with browse:
 upload_file = st.file_uploader(
-'Browse file (.pdf, .docx, .csv)',
-type=['pdf', 'docx', 'csv'],
+'Browse file (.pdf, .docx, .csv, `.txt`)',
+type=['pdf', 'docx', 'csv', 'txt'],
 label_visibility='hidden'
 )
 filetype = os.path.splitext(upload_file.name)[1].lower() if upload_file else None
 upload_file = upload_file.read() if upload_file else None

 with url_link:
 doc_url = st.text_input(
-"Enter document URL Link (.pdf, .docx, .csv)",
+"Enter document URL Link (.pdf, .docx, .csv, .txt)",
 placeholder='https://www.xxx/uploads/file.pdf',
 label_visibility='hidden'
 )
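The `upload_and_process_document` hunk derives `filetype` from the uploaded file's name with `os.path.splitext(...)[1].lower()` and limits the file dialog to `type=['pdf', 'docx', 'csv', 'txt']`. A minimal sketch of applying the same extension check to both an uploaded filename and a document URL follows; `detect_filetype` and `SUPPORTED_EXTENSIONS` are hypothetical names that are not part of this commit, and dropping the URL's query string with `urllib.parse.urlparse` is an assumption about what the links look like.

```python
import os
from typing import Optional
from urllib.parse import urlparse

# Hypothetical constant mirroring type=['pdf', 'docx', 'csv', 'txt'] after this commit.
SUPPORTED_EXTENSIONS = {'.pdf', '.docx', '.csv', '.txt'}


def detect_filetype(name_or_url: str) -> Optional[str]:
    """Hypothetical helper: return the lowercase extension if supported, else None.

    For URLs, only the path component is inspected, so a query string
    such as '?download=1' does not hide the extension.
    """
    path = urlparse(name_or_url).path if '://' in name_or_url else name_or_url
    ext = os.path.splitext(path)[1].lower()
    return ext if ext in SUPPORTED_EXTENSIONS else None


if __name__ == '__main__':
    print(detect_filetype('notes.TXT'))                                      # .txt
    print(detect_filetype('https://www.xxx/uploads/file.pdf?download=1'))    # .pdf
    print(detect_filetype('slides.pptx'))                                    # None
```

Returning `None` for unsupported names matches how the app already uses `None` as the "no file" value for `filetype`.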

model/data_connection.py

Lines changed: 11 additions & 3 deletions
@@ -3,7 +3,12 @@

 import requests
 import streamlit as st
-from langchain.document_loaders import CSVLoader, Docx2txtLoader, PyMuPDFLoader
+from langchain.document_loaders import (
+CSVLoader,
+Docx2txtLoader,
+PyMuPDFLoader,
+TextLoader,
+)
 from langchain.text_splitter import RecursiveCharacterTextSplitter


@@ -22,7 +27,7 @@ def get_files(path: str, filetype: str = '.pdf') -> Iterator[str]:
 def load_documents(
 file: str,
 filetype: str = '.pdf'
-) -> Union[CSVLoader, Docx2txtLoader, PyMuPDFLoader]:
+) -> Union[CSVLoader, Docx2txtLoader, PyMuPDFLoader, TextLoader]:
 """Loading PDF, Docx, CSV"""
 try:
 if filetype == '.pdf':
@@ -31,15 +36,18 @@ def load_documents(
 loader = Docx2txtLoader(file)
 elif filetype == '.csv':
 loader = CSVLoader(file, encoding='utf-8')
+elif filetype == '.txt':
+loader = TextLoader(file, encoding='utf-8')

 return loader.load()
+
 except Exception as e:
 print(f'\033[31m{e}')
 return []

 @staticmethod
 def split_documents(
-document: Union[CSVLoader, Docx2txtLoader, PyMuPDFLoader],
+document: Union[CSVLoader, Docx2txtLoader, PyMuPDFLoader, TextLoader],
 chunk_size: int=2000,
 chunk_overlap: int=0
 ) -> list:
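In `load_documents`, an extension that matches none of the `if`/`elif` branches leaves `loader` unbound, so `loader.load()` raises `UnboundLocalError`, which the broad `except` prints before returning `[]`. One way to make the supported types explicit is a lookup table from extension to loader. The sketch below is hedged: to stay runnable without LangChain, `PlainTextLoader` and `Doc` are hypothetical stand-ins for `TextLoader` and LangChain's `Document`, and raising `ValueError` for an unsupported type is a design choice of this sketch, not what the commit does.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for LangChain's Document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class PlainTextLoader:
    """Hypothetical stand-in for TextLoader: the whole file becomes one Doc."""

    def __init__(self, file_path: str, encoding: str = 'utf-8') -> None:
        self.file_path = file_path
        self.encoding = encoding

    def load(self) -> List[Doc]:
        text = Path(self.file_path).read_text(encoding=self.encoding)
        return [Doc(page_content=text, metadata={'source': self.file_path})]


# Extension -> loader factory (hypothetical; only '.txt' is wired up here).
LOADERS: Dict[str, Callable[[str], PlainTextLoader]] = {
    '.txt': lambda path: PlainTextLoader(path, encoding='utf-8'),
}


def load_documents(file: str, filetype: str) -> List[Doc]:
    """Pick a loader by extension; unsupported types fail with a clear error."""
    factory = LOADERS.get(filetype.lower())
    if factory is None:
        raise ValueError(f'Unsupported file type: {filetype!r}')
    return factory(file).load()


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / 'notes.txt'
        path.write_text('héllo docGPT', encoding='utf-8')  # non-ASCII needs utf-8
        docs = load_documents(str(path), '.txt')
        print(docs[0].page_content)
```

In the app, the table would map `.pdf`, `.docx`, `.csv` and `.txt` to `PyMuPDFLoader`, `Docx2txtLoader`, `CSVLoader` and `TextLoader`, so supporting another format means adding one entry instead of another `elif`.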
