cd-slash/speek

speek

Text-to-speech CLI using Google's Gemini API with configurable voices and styles.

Features

  • Multiple Voices: Choose from five Gemini TTS voices
  • Configurable Speech Styles: Control tone and style with custom prompts
  • Temperature Control: Adjust creativity/consistency (0.0-2.0)
  • Persistent Config: Settings saved in ~/.speek/config.json
  • Interactive Setup: Guided configuration on first use
  • Multiple Input Methods: Pass text as an argument or pipe it in via stdin

Installation

Using pnpm dlx (Recommended)

pnpm dlx speek "Hello world"

Global Installation

npm install -g speek
# or
pnpm install -g speek

Prerequisites

  1. SoX (for audio playback)

    # macOS
    brew install sox
    
    # Ubuntu/Debian
    sudo apt-get install sox
  2. Gemini API key (speek prompts for it on first use; run speek --setup to enter it again)

Usage

Basic Usage

# Direct text
speek "Hello world"

# From stdin
echo "Hello world" | speek

# From file
cat file.txt | speek
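
The two input paths above can be pictured as a small selection step. This is a minimal sketch, not speek's actual code: `resolveText` is a hypothetical helper, and giving arguments priority over piped input is an assumption.

```typescript
// Hypothetical helper: choose the text to speak from CLI arguments or piped stdin.
// Assumption: a text argument takes priority over piped input.
function resolveText(args: string[], pipedInput: string | null): string | null {
  // Skip flags such as --setup, --config, and --help.
  const words = args.filter((arg) => arg.indexOf("--") !== 0);
  const fromArgs = words.join(" ").trim();
  if (fromArgs !== "") {
    return fromArgs;
  }
  // Fall back to whatever was piped in, minus the trailing newline.
  const fromStdin = (pipedInput ?? "").trim();
  return fromStdin !== "" ? fromStdin : null;
}
```

A `null` result means there is nothing to speak, which is where a CLI would typically print its help text.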

Configuration

# Initial setup (prompted automatically if no API key)
speek --setup

# Show current config
speek --config

# Help
speek --help

Voice Options

  • Aoede - Calm, warm female voice (default)
  • Charon - Deep, authoritative male voice
  • Fenrir - Energetic, youthful male voice
  • Kore - Clear, professional female voice
  • Puck - Playful, animated voice

Speech Style Examples

  • "Speak naturally and clearly" (default)
  • "Speak with enthusiasm and energy"
  • "Speak in a calm, soothing tone"
  • "Speak like a professional news anchor"
  • "Speak with a friendly, conversational tone"
  • Custom styles supported

Configuration File

Settings are automatically saved to ~/.speek/config.json:

{
  "voiceName": "Aoede",
  "speechStyle": "You are a helpful assistant. Speak naturally and clearly.",
  "temperature": 1
}
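
A hand-edited config file can end up with a missing key or an out-of-range temperature. The sketch below shows one way to merge a parsed config.json over the defaults shown above and clamp the temperature to the documented 0.0-2.0 range. `SpeekConfig` and `normalizeConfig` are hypothetical names; how speek itself validates its config is not documented here.

```typescript
// Hypothetical shape mirroring the three keys shown in config.json.
interface SpeekConfig {
  voiceName: string;
  speechStyle: string;
  temperature: number;
}

// Voice names listed under "Voice Options".
const VOICES: string[] = ["Aoede", "Charon", "Fenrir", "Kore", "Puck"];

// Defaults taken from the example config file.
const DEFAULTS: SpeekConfig = {
  voiceName: "Aoede",
  speechStyle: "You are a helpful assistant. Speak naturally and clearly.",
  temperature: 1,
};

// Merge a parsed (possibly partial or malformed) config over the defaults.
function normalizeConfig(raw: unknown): SpeekConfig {
  const input = (typeof raw === "object" && raw !== null ? raw : {}) as {
    voiceName?: unknown;
    speechStyle?: unknown;
    temperature?: unknown;
  };
  const voice = input.voiceName;
  const style = input.speechStyle;
  const temp = input.temperature;

  // Unknown voices fall back to the default voice.
  const voiceName =
    typeof voice === "string" && VOICES.indexOf(voice) !== -1 ? voice : DEFAULTS.voiceName;
  // Empty or non-string styles fall back to the default style.
  const speechStyle =
    typeof style === "string" && style.trim() !== "" ? style : DEFAULTS.speechStyle;
  // Non-numeric temperatures fall back to the default; numeric ones are clamped to 0.0-2.0.
  const rawTemp = typeof temp === "number" && isFinite(temp) ? temp : DEFAULTS.temperature;
  const temperature = Math.min(2, Math.max(0, rawTemp));

  return { voiceName, speechStyle, temperature };
}
```

For example, `normalizeConfig({ voiceName: "Kore", temperature: 3.5 })` keeps the voice, clamps the temperature to 2, and fills in the default speech style.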

Development

# Clone and install
git clone <repo>
cd speek
pnpm install

# Development
pnpm dev "Hello world"

# Build
pnpm build

# Run the built CLI
pnpm start "Hello world"

API Reference

The tool uses Google's Gemini 2.5 Flash TTS model via the REST API with:

  • 24kHz, 16-bit, mono raw audio output
  • Configurable voice selection
  • System prompt injection for style control
  • Temperature-based creativity control
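
To make the list above concrete, here is a rough sketch of the JSON body such a request might carry. The field names follow Google's published speech-generation request format for `generateContent` as best understood here. `buildTtsRequest` is a hypothetical helper, and prepending the style prompt to the spoken text is an assumption about how "system prompt injection" works, not a description of speek's source.

```typescript
// Request body shape for a Gemini text-to-speech generateContent call.
interface TtsRequest {
  contents: { parts: { text: string }[] }[];
  generationConfig: {
    temperature: number;
    responseModalities: string[];
    speechConfig: {
      voiceConfig: { prebuiltVoiceConfig: { voiceName: string } };
    };
  };
}

// Hypothetical helper: assemble the body from the three config values plus the text.
function buildTtsRequest(
  text: string,
  voiceName: string,
  speechStyle: string,
  temperature: number
): TtsRequest {
  return {
    // Assumption: the style prompt is placed ahead of the text to be spoken.
    contents: [{ parts: [{ text: `${speechStyle}\n\n${text}` }] }],
    generationConfig: {
      temperature,
      // Ask for audio instead of a text reply.
      responseModalities: ["AUDIO"],
      speechConfig: {
        voiceConfig: { prebuiltVoiceConfig: { voiceName } },
      },
    },
  };
}
```

The response is expected to carry the audio as base64-encoded raw PCM, which is why a player such as SoX has to be told the sample rate, bit depth, and channel count explicitly.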

Troubleshooting

Common Issues

  1. "play command not found"

    • play is SoX's playback command. Install SoX: brew install sox (macOS) or sudo apt-get install sox (Ubuntu/Debian)
  2. "GEMINI_API_KEY not set"

    • Run speek --setup to configure your Gemini API key
  3. Audio playback fails

    • Ensure SoX is properly installed
    • Check audio output device is working
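
When playback fails, it can help to rule out SoX by saving the audio as a WAV file and opening it in another player. The sketch below wraps raw PCM in a standard 44-byte WAV header, using the 24kHz, 16-bit, mono format from the API Reference section. `pcmToWav` is a hypothetical helper, and little-endian sample order is an assumption.

```typescript
// Hypothetical helper: prepend a 44-byte RIFF/WAVE header to raw PCM samples.
// Assumption: samples are little-endian signed 16-bit, as WAV expects.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate: number = 24000,
  channels: number = 1,
  bitsPerSample: number = 16
): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign; // bytes per second of audio
  const out = new Uint8Array(44 + pcm.length);
  const view = new DataView(out.buffer);

  // Write a four-character ASCII chunk tag at the given offset.
  const writeTag = (offset: number, tag: string): void => {
    for (let i = 0; i < tag.length; i++) {
      out[offset + i] = tag.charCodeAt(i);
    }
  };

  writeTag(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeTag(8, "WAVE");
  writeTag(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeTag(36, "data");
  view.setUint32(40, pcm.length, true); // size of the sample data
  out.set(pcm, 44);
  return out;
}
```

Writing the returned bytes to a .wav file (for example with Node's `fs.writeFileSync`) produces a file most audio players can open. If that file plays correctly, the problem is more likely in the SoX setup than in the audio data.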

Debugging

# Check config
speek --config

# Verify dependencies
speek "test" # Will check dependencies automatically

License

MIT
