Add DuckDB integration with example usage and tests #1193

Status: Open, 2 commits to merge into base branch main
15 changes: 15 additions & 0 deletions docs/components/model_catalog.md
@@ -195,6 +195,21 @@ ModelCatalog().register_open_chat_model("my_open_chat_model2",
model_type="chat")
```

## DuckDB Integration

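DuckDB is an in-process SQL OLAP database management system designed for analytical workloads. Recent releases can also rank rows by vector similarity (for example with array distance functions or the `vss` extension), which makes DuckDB a candidate store for Retrieval-Augmented Generation (RAG) workflows. The `DuckDBIntegration` class wraps table creation, inserts, and SQL queries; it does not provide a vector-search helper of its own.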

### Features
- Runs in-process with no separate server, so it works on a local machine.
- Uses vectorized execution for analytical queries.
- Can back similarity search for RAG workflows when embeddings are stored alongside the text.
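
To make the last bullet concrete: similarity search ranks stored rows by how close their embedding vectors are to a query vector. The sketch below does this by brute force in plain Python, so it runs without DuckDB installed. The three-dimensional vectors and the `cosine_similarity` and `rank_by_cosine` helpers are made up for illustration and are not part of `llmware` or of this PR. In DuckDB the same ranking would presumably be written in SQL as an `ORDER BY` over a distance function.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)

def rank_by_cosine(query_vec, rows, top_k=2):
    """Score every (id, content, embedding) row and return the top_k
    as (score, id, content) tuples, best match first."""
    scored = [
        (cosine_similarity(query_vec, embedding), doc_id, content)
        for doc_id, content, embedding in rows
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]

# Toy embeddings (hypothetical values, not produced by a real model).
rows = [
    (1, "in-process SQL OLAP database", [1.0, 0.0, 0.0]),
    (2, "vectorized execution", [0.0, 1.0, 0.0]),
    (3, "similarity search using vectors", [0.0, 0.6, 0.8]),
]

top = rank_by_cosine([0.0, 1.0, 1.0], rows, top_k=2)
for score, doc_id, content in top:
    print(doc_id, round(score, 3), content)
```

A brute-force scan like this is linear in the number of rows; an index such as the one offered by DuckDB's `vss` extension is the usual way to avoid scoring every row.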

### Example Usage
Refer to the example script `examples/Models/duckdb_rag_example.py` for a demonstration of how to use DuckDB with `llmware`.

### Testing
Tests for DuckDB integration can be found in `tests/models/test_duckdb_integration.py`.


Need help or have questions?
============================
28 changes: 28 additions & 0 deletions examples/Models/duckdb_rag_example.py
@@ -0,0 +1,28 @@
from llmware.duckdb_integration import DuckDBIntegration

def main():
    # Initialize DuckDB
    db = DuckDBIntegration(db_path='example.duckdb')

    # Create a table for documents
    db.create_table('documents', 'id INTEGER, content TEXT')

    # Insert example data
    documents = [
        (1, 'DuckDB is an in-process SQL OLAP database management system.'),
        (2, 'It is designed for analytical workloads and supports vectorized execution.'),
        (3, 'DuckDB recently added support for similarity search using vectors.')
    ]
    db.insert_data('documents', documents)

    # Query the table
    results = db.query('SELECT * FROM documents;')
    print('Documents in the database:')
    for row in results:
        print(row)

    # Close the connection
    db.close()

if __name__ == '__main__':
    main()
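
One thing the script above leaves open: `db.close()` is skipped if any earlier call raises. A minimal sketch of one way to guarantee cleanup uses `contextlib.closing` from the standard library, which calls `close()` when the `with` block exits, whether normally or through an exception. `StandInDB` is a hypothetical placeholder so the snippet runs without DuckDB installed; the assumption is that `DuckDBIntegration` could be substituted because it also exposes `close()`.

```python
from contextlib import closing

class StandInDB:
    """Hypothetical stand-in exposing the same close() method as DuckDBIntegration."""

    def __init__(self):
        self.closed = False

    def query(self, sql):
        # Simulate a failure in the middle of the script.
        raise RuntimeError("simulated failure while running: " + sql)

    def close(self):
        self.closed = True

db = StandInDB()
error_message = None
try:
    with closing(db) as conn:
        conn.query("SELECT * FROM documents;")
except RuntimeError as exc:
    error_message = str(exc)

# close() ran even though query() raised.
print(db.closed, error_message)
```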
44 changes: 44 additions & 0 deletions llmware/duckdb_integration.py
@@ -0,0 +1,44 @@
import duckdb

class DuckDBIntegration:
    def __init__(self, db_path: str = ':memory:'):
        """Initialize the DuckDB connection.

        Args:
            db_path (str): Path to the DuckDB database file. Defaults to in-memory.
        """
        self.connection = duckdb.connect(database=db_path, read_only=False)

    def create_table(self, table_name: str, schema: str):
        """Create a table in the DuckDB database.

        Args:
            table_name (str): Name of the table to create.
            schema (str): Schema definition for the table.
        """
        self.connection.execute(f"CREATE TABLE IF NOT EXISTS {table_name} ({schema});")

    def insert_data(self, table_name: str, data: list):
        """Insert data into a DuckDB table.

        Args:
            table_name (str): Name of the table to insert data into.
            data (list): List of tuples representing rows to insert.
        """
        placeholders = ', '.join(['?'] * len(data[0]))
        self.connection.executemany(f"INSERT INTO {table_name} VALUES ({placeholders});", data)

    def query(self, sql: str):
        """Execute a query on the DuckDB database.

        Args:
            sql (str): SQL query to execute.

        Returns:
            list: Query results.
        """
        return self.connection.execute(sql).fetchall()

    def close(self):
        """Close the DuckDB connection."""
        self.connection.close()
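
Two edge cases in the class above are worth noting: `create_table` and `insert_data` interpolate `table_name` directly into the SQL string, and `insert_data` reads `data[0]`, which raises `IndexError` on an empty list. The sketch below shows one possible guard. `quote_identifier` and `build_insert_sql` are hypothetical helpers, not part of this PR or of the `duckdb` package, and the identifier rule (letters, digits, underscore) is an assumption that is deliberately stricter than what DuckDB itself accepts.

```python
import re

# Assumed rule: plain identifiers only (stricter than DuckDB requires).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def quote_identifier(name: str) -> str:
    """Return the name wrapped in double quotes, or raise if it looks unsafe."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return f'"{name}"'

def build_insert_sql(table_name, rows):
    """Return a parameterized INSERT statement, or None when rows is empty."""
    if not rows:
        return None  # nothing to insert, so no statement is needed
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of columns")
    placeholders = ", ".join(["?"] * width)
    return f"INSERT INTO {quote_identifier(table_name)} VALUES ({placeholders});"

sql = build_insert_sql("documents", [(1, "a"), (2, "b")])
print(sql)
```

With helpers like these, `insert_data` could return early when the statement is `None` and otherwise pass the statement and rows to `executemany` as it does now. The row values already go through `?` placeholders in the original code, so only the table name needs this extra check.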
1 change: 1 addition & 0 deletions llmware/requirements.txt
@@ -7,5 +7,6 @@ huggingface-hub>=0.19.4
tokenizers>=0.15.0
boto3>=1.24.53
colorama==0.4.6
duckdb>=0.7.1


29 changes: 29 additions & 0 deletions tests/models/test_duckdb_integration.py
@@ -0,0 +1,29 @@
""" Test for DuckDB integration in llmware"""
from llmware.duckdb_integration import DuckDBIntegration

def test_duckdb_integration():
    # Initialize DuckDB in memory
    db = DuckDBIntegration()

    # Create a table
    db.create_table('test_table', 'id INTEGER, name TEXT')

    # Insert data
    data = [
        (1, 'Alice'),
        (2, 'Bob'),
        (3, 'Charlie')
    ]
    db.insert_data('test_table', data)

    # Query the data
    results = db.query('SELECT * FROM test_table;')

    # Validate the results
    assert len(results) == 3
    assert results[0] == (1, 'Alice')
    assert results[1] == (2, 'Bob')
    assert results[2] == (3, 'Charlie')

    # Close the connection
    db.close()