Add CSV loader
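
For readers unfamiliar with row-oriented CSV loading, the sketch below shows the general idea using only the standard library. It is a stand-in, not LangChain's `CSVLoader` implementation: the helper name `csv_rows_to_texts` and the sample data are hypothetical, and the "one text block per row, formatted as `column: value` lines" layout is an assumption about how such loaders typically present rows to a text splitter.

```python
import csv
import io
from typing import List


def csv_rows_to_texts(raw: bytes, encoding: str = 'utf-8') -> List[str]:
    """Hypothetical helper: turn each CSV row into one 'column: value' text block."""
    # Decode the uploaded bytes; newline='' lets the csv module handle line endings.
    reader = csv.DictReader(io.StringIO(raw.decode(encoding), newline=''))
    return [
        '\n'.join(f'{column}: {value}' for column, value in row.items())
        for row in reader
    ]


# Made-up sample data, used only to illustrate the output shape.
sample = b'name,price\napple,3\npear,5\n'
texts = csv_rows_to_texts(sample)
print(texts[0])
```

Keeping each row as its own block means a question about one record can be answered from a small, self-contained chunk rather than from the whole file.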

Lin-jun-xiang · Lin-jun-xiang · commit 7fd30f9f37b9 · 2023-09-28T14:44:45.000+08:00
diff --git a/README.md b/README.md
@@ -5,8 +5,9 @@
 
 [English](./README.md) | [中文版](./README.zh-TW.md)
 
-Welcome to the `docGPT` User Guide. This guide will take you through the features and usage of docGPT, and walk you through building your own application.
+`docGPT` lets you chat with your documents (`.pdf`, `.docx`, `.csv`) for free, with no keys or fees required.
 
+Additionally, you can deploy the app anywhere by following this guide.
 
 - Table of Contents
     - [Introduction](#introduction)
@@ -26,7 +27,7 @@ If you like this project, please give it a ⭐`Star` to support the developers~
 
 ### 📚Introduction
 
-* Upload a Document link from your local device (`.pdf` or `.docx`) and query `docGPT` about the content of the Document. For example, you can ask GPT to summarize an article.
+* Upload a document from your local device (`.pdf`, `.docx`, `.csv`) and ask `docGPT` about its content. For example, you can ask GPT to summarize an article.
 
 * Provide two models:
   * `gpt4free`
@@ -46,7 +47,7 @@ If you like this project, please give it a ⭐`Star` to support the developers~
 
 - **`gpt4free` Integration**: Everyone can use `docGPT` for **free** without needing an OpenAI API key.
-- **Support docx, pdf file**: Users can upload PDF or Word file.
+- **Support docx, pdf, csv files**: Users can upload PDF, Word, or CSV files.
-- **Direct Document URL Input**: Users can input Document `URL` links for parsing without uploading `.pdf` or `.docx` files.
+- **Direct Document URL Input**: Users can input Document `URL` links for parsing without uploading `.pdf`, `.docx` or `.csv` files.
 - **Langchain Agent**: Enables AI to answer current questions and achieve Google search-like functionality.
 - **User-Friendly Environment**: Easy-to-use interface for simple operations.
 
@@ -92,7 +93,7 @@ Through LangChain, you can create a universal AI model or tailor it for business
    - `SERPAPI API KEY`: Required if you want to query content not present in the Document.
 
 3. 📁Upload a Document file (choose one method)
-    * Method 1: Browse and upload your own `.pdf` or `.docx` file from your local machine.
+    * Method 1: Browse and upload your own `.pdf`, `.docx` or `.csv` file from your local machine.
     * Method 2: Enter the Document `URL` link directly.
 
 4. 🚀Start asking questions!
diff --git a/README.zh-TW.md b/README.zh-TW.md
@@ -4,7 +4,9 @@
 
 [English](./README.md) | [中文版](./README.zh-TW.md)
 
-歡迎來到 `docGPT` 使用指南。本指南將帶您深入了解 `docGPT` 的功能和用法，並讓您親自搭建一個屬於自己的應用程式。
+免費的`docGPT`允許您與您的文件 (`.pdf`, `.docx`, `.csv`) 進行對話，無需任何金鑰或費用。
+
+此外，您也可以根據該文件操作，將程序部署在任何地方。
 
 - 目錄
     - [Introduction](#introduction)
@@ -24,7 +26,7 @@
 
 ### 📚Introduction
 
-* 上傳來自本地的 Document 連結 (`.pdf` or `.docx`)，並且向 `docGPT` 詢問有關 Document 內容。例如: 您可以請 GPT 幫忙總結文章
+* 上傳來自本地的 Document 連結 (`.pdf`, `.docx`, `.csv`)，並且向 `docGPT` 詢問有關 Document 內容。例如: 您可以請 GPT 幫忙總結文章
 * 提供兩種模型選擇:
   * `gpt4free`
     * **完全免費，"允許使用者在無需輸入 API 金鑰或付款的情況下使用該應用程序"**
@@ -43,7 +45,7 @@
 
 - **`gpt4free` 整合**：任何人都可以免費使用 GPT4，無需輸入 OpenAI API 金鑰。
 - **支援 docx, pdf 檔案**: 可以上傳 PDF or Word 檔
-- **直接輸入 Document 網址**：使用者可以直接輸入 Document 網址進行解析，無需從本地上傳 `.pdf` or `.docx` 檔案。
+- **直接輸入 Document 網址**：使用者可以直接輸入 Document 網址進行解析，無需從本地上傳 `.pdf`, `.docx` or `.csv` 檔案。
 - **Langchain Agent**：AI 能夠回答當前問題，實現類似 Google 搜尋功能。
 - **簡易操作環境**：友善的界面，操作簡便
 
@@ -90,7 +92,7 @@ LangChain 填補了 ChatGPT 的不足之處。通過以下示例，您可以理
     * `SERPAPI API KEY`: 如果您要查詢 Document 中不存在的內容，則需要使用此金鑰。
 
 3. 📁上傳來自本地的 Document 檔案 (選擇一個方法)
-    * 方法一: 從本地機瀏覽並上傳自己的 `.pdf` or `.docx` 檔
+    * 方法一: 從本地機瀏覽並上傳自己的 `.pdf`, `.docx` or `.csv` 檔
     * 方法二: 輸入 Document URL 連結
 
 4. 🚀開始提問 ! 
diff --git a/app.py b/app.py
@@ -40,7 +40,6 @@ def theme() -> None:
     st.image('./static/img/repos_logo.png', width=250)
 
     with st.sidebar:
-
         with st.expander(':orange[How to use?]'):
             st.markdown(
                 """
@@ -113,22 +112,22 @@ def load_api_key() -> None:
 
 
 def upload_and_process_document() -> list:
-    st.write('#### Upload a Document file')
+    st.write('#### Upload a Document file (PDF, DOCX, CSV)')
     browse, url_link = st.tabs(
         ['Drag and drop file (Browse files)', 'Enter document URL link']
     )
     with browse:
         upload_file = st.file_uploader(
-            'Browse file (.pdf, .docx)',
-            type=['pdf', 'docx'],
+            'Browse file (.pdf, .docx, .csv)',
+            type=['pdf', 'docx', 'csv'],
             label_visibility='hidden'
         )
         filetype = os.path.splitext(upload_file.name)[1].lower() if upload_file else None
         upload_file = upload_file.read() if upload_file else None
 
     with url_link:
         doc_url = st.text_input(
-            "Enter document URL Link (.pdf, .docx)",
+            "Enter document URL Link (.pdf, .docx, .csv)",
             placeholder='https://www.xxx/uploads/file.pdf',
             label_visibility='hidden'
         )
@@ -167,7 +166,8 @@ def get_response(query: str) -> str:
             '(Click the "Show Available Providers" button in sidebar)\n'
             '2. If you are using openai model, '
             'try to re-pass openai api key.\n'
-            '3. Or you did not pass the PDF file successfully.'
+            '3. Or the file was not uploaded successfully.\n'
+            '4. Try refreshing the page (F5).'
         )
     except Exception as e:
         app_logger.info(f'{__file__}: {e}')
diff --git a/model/data_connection.py b/model/data_connection.py
@@ -3,7 +3,7 @@
 
 import requests
 import streamlit as st
-from langchain.document_loaders import Docx2txtLoader, PyMuPDFLoader
+from langchain.document_loaders import CSVLoader, Docx2txtLoader, PyMuPDFLoader
 from langchain.text_splitter import RecursiveCharacterTextSplitter
 
 
@@ -22,18 +22,24 @@ def get_files(path: str, filetype: str = '.pdf') -> Iterator[str]:
     def load_documents(
         file: str,
         filetype: str = '.pdf'
-    ) -> Union[Docx2txtLoader, PyMuPDFLoader]:
-        """Loading PDF or Docx"""
-        if filetype == '.pdf':
-            loader = PyMuPDFLoader(file)
-        elif filetype == '.docx':
-            loader = Docx2txtLoader(file)
+    ) -> list:
+        """Load a PDF, DOCX, or CSV file and return its documents."""
+        try:
+            if filetype == '.pdf':
+                loader = PyMuPDFLoader(file)
+            elif filetype == '.docx':
+                loader = Docx2txtLoader(file)
+            elif filetype == '.csv':
+                loader = CSVLoader(file, encoding='utf-8')
 
-        return loader.load()
+            return loader.load()
+        except Exception as e:
+            print(f'\033[31m{e}\033[0m')
+            return []
 
     @staticmethod
     def split_documents(
-        document: Union[Docx2txtLoader, PyMuPDFLoader],
+        document: Union[CSVLoader, Docx2txtLoader, PyMuPDFLoader],
         chunk_size: int=2000,
         chunk_overlap: int=0
     ) -> list:
@@ -53,6 +59,6 @@ def crawl_file(url: str) -> str:
-                '.pdf' in filetype or '.docx' in filetype):
+                '.pdf' in filetype or '.docx' in filetype or '.csv' in filetype):
                 return response.content, filetype
             else:
-                st.warning('Url cannot parse to PDF or DOCX')
+                st.warning('The URL could not be parsed correctly.')
         except:
-            st.warning('Url cannot parse to PDF or DOCX')
+            st.warning('The URL could not be parsed correctly.')
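
One way to keep the URL path in step with the upload widget is to derive the extension from the URL and check it against a single list of supported types. The following is a minimal sketch under that assumption. `SUPPORTED_FILETYPES` and `filetype_from_url` are hypothetical names that do not appear in this commit, and the sketch inspects only the URL path, not response headers.

```python
import os
from urllib.parse import urlparse

# Hypothetical single source of truth for the extensions the app accepts.
SUPPORTED_FILETYPES = ('.pdf', '.docx', '.csv')


def filetype_from_url(url: str) -> str:
    """Return the lowercase extension of the URL path, or raise if unsupported."""
    # urlparse strips the query string and fragment from the path.
    path = urlparse(url).path
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_FILETYPES:
        raise ValueError(f'Unsupported file type: {ext or "(none)"}')
    return ext


print(filetype_from_url('https://www.xxx/uploads/file.pdf'))
```

With a shared tuple like this, adding a new format would mean touching one constant rather than several string checks.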