Commit cb50905

simonprickett and amotl authored
Add llama index example (#691)
* Updated .gitignore.
* Added venv/
* Pylint fixes.
* Minor wording update.
* Additional files for llama-index example.
* Update topic/machine-learning/llama-index/README.md

  Looks good to me.

  Co-authored-by: Andreas Motl <andreas.motl@crate.io>
* Encloses code in if main block.

---------

Co-authored-by: Andreas Motl <andreas.motl@crate.io>
1 parent fc49187 commit cb50905

5 files changed: +289 -0 lines changed

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -9,4 +9,8 @@ __pycache__
 coverage.xml
 mlruns/
 archive/
+venv/
 logs.log
+*.tmp
+*.swp
+*.bak
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
# Connecting CrateDB Data to an LLM with LlamaIndex and Azure OpenAI

This folder contains the codebase for [this tutorial](https://community.cratedb.com/t/how-to-connect-your-cratedb-data-to-llm-with-llamaindex-and-azure-openai/1612) on the CrateDB community forum. You should read the tutorial for instructions on how to set up the components that you need on Azure, and use this README for setting up CrateDB and the Python code.

This has been tested using:

* Python 3.12.2
* macOS Sequoia 15.0.1
* CrateDB 5.8.3 running in CrateDB Cloud on AWS Europe (Ireland)

## Database Setup

You will need a CrateDB Cloud database: sign up [here](https://console.cratedb.cloud/) and use the free "CRFREE" tier.

Make a note of the hostname, username and password for your database. You'll need those when configuring the environment file later.

Create a table in CrateDB:

```sql
CREATE TABLE IF NOT EXISTS time_series_data (
    timestamp TIMESTAMP,
    value DOUBLE,
    location STRING,
    sensor_id INT
);
```

Add some sample data:

```sql
INSERT INTO time_series_data (timestamp, value, location, sensor_id)
VALUES
    ('2023-09-14T00:00:00', 10.5, 'Sensor A', 1),
    ('2023-09-14T01:00:00', 15.2, 'Sensor A', 1),
    ('2023-09-14T02:00:00', 18.9, 'Sensor A', 1),
    ('2023-09-14T03:00:00', 12.7, 'Sensor B', 2),
    ('2023-09-14T04:00:00', 17.3, 'Sensor B', 2),
    ('2023-09-14T05:00:00', 20.1, 'Sensor B', 2),
    ('2023-09-14T06:00:00', 22.5, 'Sensor A', 1),
    ('2023-09-14T07:00:00', 18.3, 'Sensor A', 1),
    ('2023-09-14T08:00:00', 16.8, 'Sensor A', 1),
    ('2023-09-14T09:00:00', 14.6, 'Sensor B', 2),
    ('2023-09-14T10:00:00', 13.2, 'Sensor B', 2),
    ('2023-09-14T11:00:00', 11.7, 'Sensor B', 2);
```
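
As a quick sanity check on the sample rows, the average for sensor 1 can be computed without a database. The sketch below is not part of the commit: it copies the `value` and `sensor_id` columns from the `INSERT` statement into a Python list, and the helper name `average_for_sensor` is a hypothetical choice. The result should agree, up to floating-point rounding, with the figure shown in the README's expected output.

```python
from statistics import mean

# Sample rows copied from the INSERT statement: (value, sensor_id).
rows = [
    (10.5, 1), (15.2, 1), (18.9, 1),
    (12.7, 2), (17.3, 2), (20.1, 2),
    (22.5, 1), (18.3, 1), (16.8, 1),
    (14.6, 2), (13.2, 2), (11.7, 2),
]


def average_for_sensor(sensor_id):
    """Mean of `value` over the rows that belong to one sensor."""
    return mean(value for value, sid in rows if sid == sensor_id)


sensor_1_average = average_for_sensor(1)
print(round(sensor_1_average, 4))
```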

## Python Project Setup

Create and activate a virtual environment:

```
python3 -m venv .venv
source .venv/bin/activate
```

Install the dependencies:

```bash
pip install -r requirements.txt
```

## Configure your Environment

To configure your environment, copy the provided [`env.example`](./env.example) file to a new file named `.env`, then open it with a text editor.

Set the values in the file as follows:

```
OPENAI_API_KEY=<Your key from Azure>
OPENAI_API_TYPE=azure
OPENAI_AZURE_ENDPOINT=https://<Your endpoint from Azure e.g. myendpoint.openai.azure.com>
OPENAI_AZURE_API_VERSION=2024-08-01-preview
LLM_INSTANCE=<The name of your Chat GPT 3.5 turbo instance from Azure>
EMBEDDING_MODEL_INSTANCE=<The name of your Text Embedding Ada 2.0 instance from Azure>
CRATEDB_URL="crate://<Database user name>:<Database password>@<Database host>:4200/?ssl=true"
CRATEDB_TABLE_NAME=time_series_data
```
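
The `CRATEDB_URL` value embeds the user name and password in a URL, so characters such as `@` or `/` in a password generally need percent-encoding before SQLAlchemy can parse the connection string. A minimal sketch of assembling the value, assuming a hypothetical helper `build_cratedb_url` and placeholder credentials and host name that do not appear in the commit:

```python
from urllib.parse import quote


def build_cratedb_url(user, password, host, port=4200, ssl=True):
    """Assemble a crate:// URL, percent-encoding the credentials."""
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    query = "?ssl=true" if ssl else ""
    return f"crate://{credentials}@{host}:{port}/{query}"


# Placeholder values for illustration only; substitute your own.
url = build_cratedb_url("admin", "p@ss/word", "example.cratedb.net")
print(url)
```

The printed string can then be pasted into `.env` as the `CRATEDB_URL` value.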

Save your changes.

## Run the Code

Run the code like so:

```bash
python main.py
```

Here's the expected output:

```
Creating SQLAlchemy engine...
Connecting to CrateDB...
Creating SQLDatabase instance...
Creating QueryEngine...
Running query...
> Source (Doc id: b2b0afac-6fb6-4674-bc80-69941a8c10a5): [(17.033333333333335,)]
Query was: What is the average value for sensor 1?
Answer was: The average value for sensor 1 is 17.033333333333335.
{
    'b2b0afac-6fb6-4674-bc80-69941a8c10a5': {
        'sql_query': 'SELECT AVG(value) FROM time_series_data WHERE sensor_id = 1',
        'result': [
            (17.033333333333335,)
        ],
        'col_keys': [
            'avg(value)'
        ]
    },
    'sql_query': 'SELECT AVG(value) FROM time_series_data WHERE sensor_id = 1',
    'result': [
        (17.033333333333335,)
    ],
    'col_keys': [
        'avg(value)'
    ]
}
```
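
The last item in the expected output is a plain dictionary, so the generated SQL and the result rows can be read from it directly. The sketch below is an illustration only: it copies the dictionary literal from the expected output rather than calling the query engine, and a real run would produce a different document id.

```python
# Literal copied from the expected output; the document id varies per run.
metadata = {
    "b2b0afac-6fb6-4674-bc80-69941a8c10a5": {
        "sql_query": "SELECT AVG(value) FROM time_series_data WHERE sensor_id = 1",
        "result": [(17.033333333333335,)],
        "col_keys": ["avg(value)"],
    },
    "sql_query": "SELECT AVG(value) FROM time_series_data WHERE sensor_id = 1",
    "result": [(17.033333333333335,)],
    "col_keys": ["avg(value)"],
}

# The SQL statement that the LLM generated for the question.
generated_sql = metadata["sql_query"]

# Pair each column name with the value from the first result row.
first_row = dict(zip(metadata["col_keys"], metadata["result"][0]))

print(generated_sql)
print(first_row)
```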
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
OPENAI_API_KEY=TODO
OPENAI_API_TYPE=azure
OPENAI_AZURE_ENDPOINT=https://TODO.openai.azure.com
OPENAI_AZURE_API_VERSION=2024-08-01-preview
LLM_INSTANCE=TODO
EMBEDDING_MODEL_INSTANCE=TODO
CRATEDB_URL="crate://USER:PASSWORD@HOST:4200/?ssl=true"
CRATEDB_TABLE_NAME=time_series_data
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
""" Example code using Azure Open AI and llama-index. """

import os
import openai
import sqlalchemy as sa

from dotenv import load_dotenv
from langchain_openai import AzureOpenAIEmbeddings
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.langchain import LangchainEmbedding
from llama_index.core.utilities.sql_wrapper import SQLDatabase
from llama_index.core.query_engine import NLSQLTableQueryEngine
from llama_index.core import Settings

if __name__ == "__main__":
    load_dotenv()

    openai.api_type = os.getenv("OPENAI_API_TYPE")
    openai.azure_endpoint = os.getenv("OPENAI_AZURE_ENDPOINT")
    openai.api_version = os.getenv("OPENAI_AZURE_API_VERSION")
    openai.api_key = os.getenv("OPENAI_API_KEY")

    llm = AzureOpenAI(
        engine=os.getenv("LLM_INSTANCE"),
        azure_endpoint=os.getenv("OPENAI_AZURE_ENDPOINT"),
        api_key = os.getenv("OPENAI_API_KEY"),
        api_version = os.getenv("OPENAI_AZURE_API_VERSION"),
        temperature=0.0
    )

    Settings.llm = llm
    Settings.embed_model = LangchainEmbedding(
        AzureOpenAIEmbeddings(
            azure_endpoint=os.getenv("OPENAI_AZURE_ENDPOINT"),
            model=os.getenv("EMBEDDING_MODEL_INSTANCE")
        )
    )

    print("Creating SQLAlchemy engine...")
    engine_crate = sa.create_engine(os.getenv("CRATEDB_URL"))
    print("Connecting to CrateDB...")
    engine_crate.connect()
    print("Creating SQLDatabase instance...")
    sql_database = SQLDatabase(engine_crate, include_tables=[os.getenv("CRATEDB_TABLE_NAME")])
    print("Creating QueryEngine...")
    query_engine = NLSQLTableQueryEngine(
        sql_database=sql_database,
        tables=[os.getenv("CRATEDB_TABLE_NAME")],
        llm = llm
    )

    print("Running query...")

    QUERY_STR = "What is the average value for sensor 1?"
    answer = query_engine.query(QUERY_STR)
    print(answer.get_formatted_sources())
    print("Query was:", QUERY_STR)
    print("Answer was:", answer)
    print(answer.metadata)
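
The script reads every setting with `os.getenv`, which returns `None` for an unset variable instead of raising, so a missing entry in `.env` may only surface later as a less obvious error. A minimal sketch of checking the settings up front, assuming a hypothetical helper `missing_settings` that is not part of the commit; the variable names are the ones listed in `env.example`:

```python
import os

# Setting names taken from env.example.
REQUIRED_SETTINGS = [
    "OPENAI_API_KEY",
    "OPENAI_API_TYPE",
    "OPENAI_AZURE_ENDPOINT",
    "OPENAI_AZURE_API_VERSION",
    "LLM_INSTANCE",
    "EMBEDDING_MODEL_INSTANCE",
    "CRATEDB_URL",
    "CRATEDB_TABLE_NAME",
]


def missing_settings(env=os.environ):
    """Return the required setting names that are unset or empty in `env`."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


# Report anything that is missing from the current process environment.
print("Missing settings:", missing_settings())
```

Calling such a check right after `load_dotenv()` would let the script stop with a clear message before any connection is attempted.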
Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
aiohappyeyeballs==2.4.3
aiohttp==3.10.10
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.6.2.post1
attrs==24.2.0
azure-core==1.31.0
azure-identity==1.19.0
beautifulsoup4==4.12.3
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.4.0
click==8.1.7
crate==1.0.0.dev1
cryptography==43.0.3
dataclasses-json==0.6.7
Deprecated==1.2.14
dirtyjson==1.0.8
distro==1.9.0
frozenlist==1.4.1
fsspec==2024.10.0
geojson==3.1.0
greenlet==3.1.1
h11==0.14.0
httpcore==1.0.6
httpx==0.27.2
idna==3.10
jiter==0.6.1
joblib==1.4.2
jsonpatch==1.33
jsonpointer==3.0.0
langchain==0.3.4
langchain-community==0.3.3
langchain-core==0.3.12
langchain-openai==0.2.3
langchain-text-splitters==0.3.0
langsmith==0.1.136
llama-cloud==0.1.4
llama-index==0.11.19
llama-index-agent-openai==0.3.4
llama-index-cli==0.3.1
llama-index-core==0.11.19
llama-index-embeddings-langchain==0.2.1
llama-index-embeddings-openai==0.2.5
llama-index-indices-managed-llama-cloud==0.4.0
llama-index-legacy==0.9.48.post3
llama-index-llms-azure-openai==0.2.2
llama-index-llms-langchain==0.4.2
llama-index-llms-openai==0.2.15
llama-index-multi-modal-llms-openai==0.2.2
llama-index-program-openai==0.2.0
llama-index-question-gen-openai==0.2.0
llama-index-readers-file==0.2.2
llama-index-readers-llama-parse==0.3.0
llama-parse==0.5.10
marshmallow==3.23.0
msal==1.31.0
msal-extensions==1.2.0
multidict==6.1.0
mypy-extensions==1.0.0
nest-asyncio==1.6.0
networkx==3.4.2
nltk==3.9.1
numpy==1.26.4
openai==1.52.0
orjson==3.10.9
packaging==24.1
pandas==2.2.3
pillow==11.0.0
portalocker==2.10.1
propcache==0.2.0
pycparser==2.22
pydantic==2.9.2
pydantic-settings==2.6.0
pydantic_core==2.23.4
PyJWT==2.9.0
pypdf==4.3.1
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.2
PyYAML==6.0.2
regex==2024.9.11
requests==2.32.3
requests-toolbelt==1.0.0
six==1.16.0
sniffio==1.3.1
soupsieve==2.6
SQLAlchemy==2.0.36
sqlalchemy-cratedb==0.40.0
striprtf==0.0.26
tenacity==8.5.0
tiktoken==0.8.0
tqdm==4.66.5
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2024.2
urllib3==2.2.3
verlib2==0.2.0
wrapt==1.16.0
yarl==1.16.0
