Status: Open
Labels: enhancement (New feature or request)
Description
Problem
Currently, students need to set up both a GOOGLE_API_KEY and an OPENAI_API_KEY to run the topic modeling process:
- GOOGLE_API_KEY for Gemini 2.0 Flash (used by Kura for summarization)
- OPENAI_API_KEY for Text-Embedding-3-Small (used for clustering) and for classification work
This creates unnecessary friction for students who may:
- Not have access to both API providers
- Face budget constraints with multiple paid services
- Experience setup complexity that detracts from the learning objectives
Proposed Solution
Option 1: Standardize on OpenAI (Recommended)
- Use OpenAI's models for both embeddings and summarization
- Replace Gemini 2.0 Flash with GPT-4o-mini or GPT-4o for summarization
- Keep OpenAI Text-Embedding-3-Small for clustering
- Only require OPENAI_API_KEY
Option 2: Standardize on Google
- Use Google's text embeddings for clustering
- Keep Gemini 2.0 Flash for summarization
- Only require GOOGLE_API_KEY
Option 3: Provide fallback options
- Allow either API key to be provided
- Automatically select appropriate models based on available keys
- Provide clear documentation on which option to choose
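The key-based selection in Option 3 could look roughly like the sketch below. It is a minimal, hypothetical helper, not existing code in this repo: the names ModelConfig and select_models are illustrative, the model identifiers are assumptions (in particular text-embedding-004 as the Google embedding model), and how the chosen models get passed into Kura is not shown because that interface is not specified in this issue. OpenAI is preferred when both keys are set, matching the recommendation in Option 1, and a clear error is raised when neither key is present.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelConfig:
    """A single-provider set of models for summarization and embeddings."""
    provider: str
    summary_model: str
    embedding_model: str


# Assumed model identifiers; check each provider's current model list.
OPENAI_CONFIG = ModelConfig("openai", "gpt-4o-mini", "text-embedding-3-small")
GOOGLE_CONFIG = ModelConfig("google", "gemini-2.0-flash", "text-embedding-004")


def select_models(env: Optional[Mapping[str, str]] = None) -> ModelConfig:
    """Pick one provider's models based on which API key is set.

    Empty values count as missing. OpenAI wins when both keys are present.
    """
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return OPENAI_CONFIG
    if env.get("GOOGLE_API_KEY"):
        return GOOGLE_CONFIG
    raise RuntimeError(
        "No API key found. Set OPENAI_API_KEY (recommended) or GOOGLE_API_KEY "
        "before running the notebooks."
    )
```

A notebook prerequisite cell could then call `config = select_models()` once and read `config.summary_model` and `config.embedding_model` wherever the model names are currently hardcoded, so students only ever need one key.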
Impact
This change would:
- ✅ Reduce setup friction for students
- ✅ Lower cost barriers (students only need one API provider)
- ✅ Simplify the getting started experience
- ✅ Maintain the same learning outcomes
Current Usage
From the README:
"You will need a GOOGLE_API_KEY or an OPENAI_API_KEY for running this topic modelling process. We're using the OpenAI Text-Embedding-3-Small embeddings for clustering and the Gemini-2.0-flash models for summarisation (used by kura)."
The documentation says "or", but the implementation actually requires both keys for the full workflow.
Files to Update
- README.md: update installation instructions
- Notebooks 1, 2, 3: update prerequisite sections
- Code that initializes Kura with custom summary models
- Any hardcoded model references
Priority
High - This affects the student experience for the AI Engineering Summit workshop.