Simplify API key requirements for students - avoid requiring both OpenAI and Google API keys #3

@jxnl

Problem

Currently, students need to set up both a GOOGLE_API_KEY and an OPENAI_API_KEY to run the topic modeling process:

  • GOOGLE_API_KEY for Gemini 2.0 Flash (used by Kura for summarization)
  • OPENAI_API_KEY for Text-Embedding-3-Small embeddings (used for clustering) and for classification work

This creates unnecessary friction for students who may:

  1. Not have access to both API providers
  2. Face budget constraints with multiple paid services
  3. Experience setup complexity that detracts from the learning objectives

Proposed Solution

Option 1: Standardize on OpenAI (Recommended)

  • Use OpenAI's models for both embeddings and summarization
  • Replace Gemini 2.0 Flash with GPT-4o-mini or GPT-4o for summarization
  • Keep OpenAI Text-Embedding-3-Small for clustering
  • Only require OPENAI_API_KEY

Option 2: Standardize on Google

  • Use Google's text embeddings for clustering
  • Keep Gemini 2.0 Flash for summarization
  • Only require GOOGLE_API_KEY

Option 3: Provide fallback options

  • Allow either API key to be provided
  • Automatically select appropriate models based on available keys
  • Provide clear documentation on which option to choose
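A minimal sketch of how Option 3 could pick models from whichever key is present. The `ModelChoice` type, the `select_models` helper, the preference for OpenAI when both keys are set, and the Google embedding model name are all assumptions made for illustration; none of them come from the repository or from Kura.

```python
import os
from typing import Mapping, NamedTuple


class ModelChoice(NamedTuple):
    provider: str
    summary_model: str
    embedding_model: str


# Model identifiers are assumptions; check each provider's current model list.
OPENAI_CHOICE = ModelChoice("openai", "gpt-4o-mini", "text-embedding-3-small")
GOOGLE_CHOICE = ModelChoice("google", "gemini-2.0-flash", "text-embedding-004")


def select_models(env: Mapping[str, str] = os.environ) -> ModelChoice:
    """Pick a single provider for both summarization and embeddings."""
    # Assumed preference order: OpenAI first, matching the recommended Option 1.
    if env.get("OPENAI_API_KEY"):
        return OPENAI_CHOICE
    if env.get("GOOGLE_API_KEY"):
        return GOOGLE_CHOICE
    raise RuntimeError(
        "Set either OPENAI_API_KEY or GOOGLE_API_KEY before running the notebooks."
    )
```

A notebook could call `select_models()` once at the top and pass the returned model names to whatever code constructs the summarization and embedding clients, so students only ever see one clear error when no key is set.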

Impact

This change would:

  • ✅ Reduce setup friction for students
  • ✅ Lower cost barriers (students only need one API provider)
  • ✅ Simplify the getting started experience
  • ✅ Maintain the same learning outcomes

Current Usage

From the README:

You will need a GOOGLE_API_KEY or an OPENAI_API_KEY for running this topic modelling process. We're using the OpenAI Text-Embedding-3-Small embeddings for clustering and the Gemini-2.0-flash models for summarisation (used by kura).

The README says "or", but the full workflow actually requires both keys.

Files to Update

  • README.md - Update installation instructions
  • Notebooks 1, 2, 3 - Update prerequisite sections
  • Code that initializes Kura with custom summary models
  • Any hardcoded model references
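To locate the hardcoded model references listed above, a small scan along these lines might help. The regex covers only the model names mentioned in this issue, and the helper names (`find_model_references`, `scan_repo`) and the file extensions searched are assumptions, not existing tooling in the repository.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

# Patterns are assumptions based on the models named in this issue; extend as needed.
MODEL_PATTERN = re.compile(
    r"gemini-[\w.\-]+|gpt-4o(?:-mini)?|text-embedding-3-small",
    re.IGNORECASE,
)


def find_model_references(text: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, matched_model_name) for each hit in text."""
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in MODEL_PATTERN.finditer(line):
            yield line_number, match.group(0)


def scan_repo(root: str = ".") -> None:
    """Print hits found in Markdown, Python, and notebook files under root."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in {".md", ".py", ".ipynb"}:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for line_number, name in find_model_references(text):
                print(f"{path}:{line_number}: {name}")
```

Running `scan_repo(".")` from the repository root would print one `path:line: model` entry per match, which gives a rough checklist for the files above.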

Priority

High - This affects the student experience for the AI Engineering Summit workshop.

Labels

enhancement (New feature or request)
