Simple example for an agentic flow (Current Bengaluru talk) #287
base: master
Conversation
Found a typo in one of the SQL scripts.
simple-agentic-flow-flink-sql/1-create-connections-with-confluent-cli.md
## Pinecone

```bash
confluent flink connection create pinecone-connection --environment your-confluent-environment-name \
```
Suggested change:

```diff
-confluent flink connection create pinecone-connection --environment your-confluent-environment-name \
+confluent flink connection create pinecone-connection --environment your-confluent-environment-id \
```
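For context, a fuller version of that command might look like the sketch below; the endpoint and API key are placeholders, and the exact flag set should be verified against `confluent flink connection create --help`:

```bash
confluent flink connection create pinecone-connection \
  --cloud aws \
  --region us-east-1 \
  --type pinecone \
  --endpoint https://your-index-host.svc.pinecone.io \
  --api-key YOUR_PINECONE_API_KEY \
  --environment your-confluent-environment-id
```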
## Populating Pinecone with Vector Data

Example values used in this demo can be found in `documentation-sample.json`.
Add instructions / a link for getting started with Pinecone (link to https://app.pinecone.io/, copy your endpoint and API key, create an index with an embedding that matches what we use in OpenAI).

Add instructions so users can go from this JSON sample to embeddings -- this is where I got stuck.
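A minimal sketch of such a script, assuming the `openai` and `pinecone` Python packages, an existing index sized for `text-embedding-3-small` (1536 dimensions), and that `documentation-sample.json` holds a list of objects with `id` and `text` fields; the field names and index name here are hypothetical:

```python
import json

from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI(api_key="YOUR_OPENAI_API_KEY")  # placeholder
pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")         # placeholder
index = pc.Index("simple-agentic-rag")                 # hypothetical index name

# Assumes documentation-sample.json is a list of {"id": ..., "text": ...}
# objects; adjust the field names to the actual file structure.
with open("documentation-sample.json") as f:
    docs = json.load(f)

for doc in docs:
    # Embed with the same model the Flink SQL side uses, so query-time
    # vectors and stored vectors live in the same space.
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=doc["text"],
    ).data[0].embedding

    index.upsert(vectors=[{
        "id": doc["id"],
        "values": embedding,
        "metadata": {"text": doc["text"]},
    }])
```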
I linked to the article by Diptiman on this topic! I honestly didn't want to go into all the depth and nuances; there are many ways to load data into a vector store. The value of this repo is to have the set of SQL commands that I show in the slides, and Pinecone is outside of the scope. This was meant to be accompanying material for the Bengaluru talk, so people have some code outside of the slides.
## Setting Up Connections Using Confluent CLI

Once external sources like **Pinecone** and **OpenAI** are configured, use the **Confluent CLI** to establish secure connections. Refer to [`1-create-connections-with-confluent-cli.md`](1-create-connections-with-confluent-cli.md) for examples.
There are a lot of platform and data-seeding prereqs to get through, and IMO this doesn't provide enough guidance. I'd suggest a section for platform setup that walks through the signup links plus any specifics you have to handle for the demo to work (e.g., I assume the Pinecone vector type is critical?). It doesn't have to hand-hold with screenshots, but it should give someone what they need to set the table. I tried to get there on my own and gave up after not following how to populate Pinecone from the sample JSON.

- Confluent Cloud: signup link, plus maybe use the quickstart to create a Kafka cluster and compute pool:

  ```bash
  confluent flink quickstart \
    --name simple_agentic_rag \
    --max-cfu 10 \
    --region us-east-1 \
    --cloud aws
  ```

- Pinecone: signup link, copy endpoint and API key, and which embedding type to pick when you create an index (see the sketch after this list).
- OpenAI: signup link, billing, API key creation.
- Atlas: same deal.
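For the Pinecone bullet, a sketch of the index creation with the Python client; the index name and region are placeholders, and the key constraint is that the dimension matches the embedding model (1536 for `text-embedding-3-small`):

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_PINECONE_API_KEY")  # placeholder

# text-embedding-3-small produces 1536-dimensional vectors; the index
# dimension must match, or upserts and queries will fail.
pc.create_index(
    name="simple-agentic-rag",  # hypothetical index name
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
```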
Yes, my approach to this repo was deliberately not a full tutorial with step-by-step guidance, but a brief list of instructions together with all the code snippets used.
The talk is the main delivery, and this is an accompanying repo, so that people don't have to type code from the slides.
Or maybe this repo isn't a good place for it?
I see what you're aiming for... it's currently at a bit of a crossroads, because I expected to be able to recreate the demo fairly happily, and I don't think that's going to happen if someone tries to. My suggestion is either:

a) make it clear that this is really ideal for reading along with the talk, or for stealing bits of it for people looking to do similar things. It's not a folder where readers should expect end-to-end recreation (in under an hour). As I reviewed it, that's what I expected and attempted, and I think others who find this will do the same.

b) add enough instructions to make it an end-to-end journey. It doesn't have to be super hand-holdy with screenshots. E.g., the Pinecone signup would be a sentence like "Create a Pinecone account and create an index configured for OpenAI's text-embedding-3-small model". For JSON to Pinecone, add a Python script / snippet that a techie would be able to take and massage to work with their index by plugging in their API key, etc.

I like (b) for all GitHub examples, since I think devs will have that expectation unless you set expectations clearly by making changes along the lines of (a)... but then IMO (a) winds up being closer to a blog in terms of audience / expectations.
```bash
confluent flink connection create openai-connection-vector-embeddings \
  --environment your-confluent-environment-name \
```
Suggested change:

```diff
-  --environment your-confluent-environment-name \
+  --environment your-confluent-environment-id \
```
oh, good catch!
```sql
(
    conversation_id string NOT NULL,
    customer_id string NOT NULL,
    cusomer_message string NOT NULL,
```
Not going to flag them all, but there are 20 or so cases in this PR to change `cusomer` to `customer`.
Suggested change:

```diff
-    cusomer_message string NOT NULL,
+    customer_message string NOT NULL,
```
```sql
    conversation_id STRING NOT NULL,
    customer_id STRING NOT NULL,
    cusomer_message String NOT NULL,
    chatbot_response String
```
Total nit, but I suggest all-caps data types throughout this PR: `BIGINT` and `STRING` everywhere.
good point, replaced
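For reference, with both fixes applied (the typo and the upper-case types), the column list from the excerpts above would read something like this sketch; the table name is illustrative:

```sql
-- Table name is illustrative; the columns come from the excerpts above.
CREATE TABLE customer_message_with_response (
    conversation_id STRING NOT NULL,
    customer_id STRING NOT NULL,
    customer_message STRING NOT NULL,
    chatbot_response STRING
);
```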
```sql
SELECT * FROM customer_message WHERE customer_id = 'customer_3'

---------------------------------- CALL TO EMBEDDING API ------------------------------------------
        INSERT INTO customer_message_and_embedding
```
Is the indented `INSERT` throughout intentional? It looks kinda weird to me. Suggest lining up `INSERT` and `SELECT` throughout.
Suggested change:

```diff
-        INSERT INTO customer_message_and_embedding
+INSERT INTO customer_message_and_embedding
```
fixed :)
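For reference, the aligned form puts `INSERT` and `SELECT` at the same level. The sketch below assumes Confluent Cloud Flink's `ML_PREDICT` table function with a registered embedding model; the model name, column list, and output column `embedding` are assumptions, since the repo's actual SELECT body isn't shown in this excerpt:

```sql
-- CALL TO EMBEDDING API (sketch): INSERT and SELECT start in the same column.
-- 'openai_embedding_model' and the `embedding` output column are placeholders
-- that would come from the CREATE MODEL definition.
INSERT INTO customer_message_and_embedding
SELECT conversation_id, customer_id, customer_message, embedding
FROM customer_message,
     LATERAL TABLE(ML_PREDICT('openai_embedding_model', customer_message));
```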
```sql
DROP TABLE customer_message
```
If you use the Flink quickstart plugin, then you can just delete the environment and the Flink API key. Make it a little easier on people to only have to run a couple of commands.
Agree, removed that file
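A sketch of that teardown, assuming the quickstart created a dedicated environment; both IDs below are placeholders:

```bash
# Delete the quickstart-created environment (this removes the Kafka cluster
# and compute pool inside it), then delete the Flink API key.
confluent environment delete env-123456
confluent api-key delete ABCDEF123456
```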
LGTM