Whisper Assistant is an extension for Visual Studio Code that transcribes your spoken words into text within the VS Code and Cursor editors. This hands-free approach to coding lets you focus on your ideas instead of your typing.
✨ Features:
- Cross-platform audio recording with SoX (default) or custom recording commands
- Multiple API options: Local Docker, OpenAI, or Groq
- Configurable recording tools (ffmpeg, arecord, etc.) for advanced users
- Optimized for integration with AI coding assistants like Cursor
Whisper Assistant can also be integrated with other powerful AI tools, such as ChatGPT (GPT-4) or Cursor, to create a dynamic, AI-driven development environment.
By default, Whisper Assistant runs Whisper AI on your local machine, offering free voice transcription. It uses Whisper's base model, which balances accuracy and performance; support for other models is planned.
Alternatively, you can use the OpenAI API or the Groq API for remote transcription. Note: This requires an API key.
For more details about Whisper, visit the Whisper OpenAI GitHub page.
To install and set up Whisper Assistant, follow these steps:
- Install a recording tool: Whisper Assistant uses SoX by default for microphone recording, but you can also configure a custom recording command using alternatives like ffmpeg.
- macOS: Using the Homebrew package manager:
brew install sox
- Windows: Using the Chocolatey package manager:
choco install sox.portable
Note for Windows Users: Some users have reported issues with newer SoX versions not recognizing the default audio device. If you encounter this, installing version 14.4.1 specifically might resolve the problem:
choco install sox.portable --version=14.4.1
- Ubuntu/Debian:
sudo apt install sox
- Other Linux distributions: Use your package manager (e.g., yum install sox or pacman -S sox)
Linux users experiencing audio cutoff issues with SoX can use ffmpeg instead:
- Ubuntu/Debian:
sudo apt install ffmpeg
- MacOS:
brew install ffmpeg
- Windows:
choco install ffmpeg
After installation, configure the custom recording command in VS Code settings (see Custom Recording Commands section below).
- Install Docker to enable the local Whisper model, or use the OpenAI API or Groq API for remote transcription.
- If using local transcription, follow the instructions in the Local Development with Faster Whisper section.
- If using remote transcription, follow the instructions in the Multiple API Options section.
- Install the Whisper Assistant extension into Visual Studio Code or Cursor.
- Initialization: Upon loading Visual Studio Code, the extension verifies that SoX (or your custom recording command, if configured) is installed correctly. If any issues are detected, an error message will be displayed.
Once initialization is complete, a microphone icon will appear in the bottom right status bar.
- Starting the Recording: Activate the extension by clicking the microphone icon in the status bar or by pressing Command+M (Mac) or Control+M (Windows). You can record for as long as you like, but longer recordings take longer to transcribe. The recording time is displayed in the status bar.
- Stopping the Recording: Stop the recording using the same shortcut (Command+M or Control+M). The extension icon in the status bar will change to a loading icon, and a progress message will indicate that the transcription is underway.
- Transcription: Once the transcription is complete, the text will be saved to the clipboard. This allows you to use the transcription in any program, not just within Visual Studio Code. If an editor is active, the transcription will be pasted there automatically.
Tip: A good microphone will improve transcription accuracy, although it is not a requirement.
Tip: For an optimal experience, consider using the Cursor.so application to call GPT-4 directly for code instructions. This lets you use your voice to instruct GPT to refactor your code, write unit tests, and implement other improvements.
Whisper Assistant uses SoX by default, but you can configure a custom recording command if you prefer alternatives like ffmpeg or need to work around platform-specific issues.
- Linux users experiencing audio cutoff: Some Linux distributions have issues with SoX cutting off the last few seconds of recordings
- Advanced users: Want to use specific audio settings or recording tools
- Specific microphone requirements: Need to target a particular audio device
- Open VS Code settings (Cmd/Ctrl + ,)
- Search for "Whisper Assistant"
- Find "Custom Recording Command"
- Enter your command with the $AUDIO_FILE placeholder
Important: Your command MUST include $AUDIO_FILE where the output file should be saved.
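The placeholder rule can be pictured with the minimal POSIX sh sketch below. expand_recording_command is a hypothetical helper, not part of the extension, and it assumes the extension performs a plain textual replacement of every literal $AUDIO_FILE token with the output path; the real implementation may differ.

```shell
#!/bin/sh
# Hypothetical helper (not the extension's code): replace every literal
# $AUDIO_FILE token in a recording-command template with an output path.
# Usage: expand_recording_command TEMPLATE OUTPUT_PATH
expand_recording_command() {
  template=$1
  out_file=$2
  case $template in
    *'$AUDIO_FILE'*) ;;   # placeholder present: OK
    *)
      echo 'error: recording command must include $AUDIO_FILE' >&2
      return 1
      ;;
  esac
  result=''
  rest=$template
  # Copy the text before each placeholder, then the path, until none remain.
  while :; do
    case $rest in
      *'$AUDIO_FILE'*)
        result=$result${rest%%'$AUDIO_FILE'*}$out_file
        rest=${rest#*'$AUDIO_FILE'}
        ;;
      *)
        result=$result$rest
        break
        ;;
    esac
  done
  printf '%s\n' "$result"
}
```

For example, expand_recording_command 'arecord -f S16_LE -c 1 -r 16000 $AUDIO_FILE' /tmp/rec.wav prints arecord -f S16_LE -c 1 -r 16000 /tmp/rec.wav. Because the path is inserted unquoted, this sketch assumes the output path contains no spaces.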
macOS (ffmpeg):
ffmpeg -f avfoundation -i :1 -ac 1 -ar 16000 -sample_fmt s16 $AUDIO_FILE
Note: Replace :1 with the appropriate device number from ffmpeg -f avfoundation -list_devices true -i ""
Linux with PulseAudio (ffmpeg):
ffmpeg -f pulse -i default -ac 1 -ar 16000 -sample_fmt s16 $AUDIO_FILE
Linux with ALSA (ffmpeg):
ffmpeg -f alsa -i default -ac 1 -ar 16000 -sample_fmt s16 $AUDIO_FILE
Windows (ffmpeg):
ffmpeg -f dshow -i audio="Microphone" -ac 1 -ar 16000 -sample_fmt s16 $AUDIO_FILE
Note: Replace Microphone with your device name as listed by ffmpeg -list_devices true -f dshow -i dummy
Linux with arecord:
arecord -f S16_LE -c 1 -r 16000 $AUDIO_FILE
Any platform with custom settings:
sox -t pulseaudio default -c 1 -r 16000 $AUDIO_FILE gain -3
- Command validation error: Ensure your command includes $AUDIO_FILE
- No audio recorded: Check your audio device permissions and microphone access
- Command not found: Verify the recording tool (ffmpeg, arecord, etc.) is installed and in your PATH
- Still experiencing cutoffs: Try adjusting buffer settings or switching recording tools
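The "Command not found" case can be checked before you save a command in settings. This is a hypothetical preflight sketch, not the extension's own validation, and it assumes the recording tool is the first space-separated word of the command.

```shell
#!/bin/sh
# Hypothetical preflight check: is the recording tool (the first word of
# the command) installed and on PATH?
# Usage: check_recording_tool COMMAND
check_recording_tool() {
  tool=${1%% *}   # e.g. "ffmpeg" from "ffmpeg -f pulse -i default ..."
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found: $(command -v "$tool")"
  else
    echo "Command not found: $tool (install it or add it to your PATH)" >&2
    return 1
  fi
}
```

For example, check_recording_tool 'ffmpeg -f pulse -i default -ac 1 -ar 16000 -sample_fmt s16 $AUDIO_FILE' reports where ffmpeg is installed, or fails with a message if it is missing.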
To list available audio devices:
macOS (ffmpeg):
ffmpeg -f avfoundation -list_devices true -i ""
Linux (PulseAudio):
pactl list sources short
Linux (ALSA):
arecord -l
Windows (ffmpeg):
ffmpeg -list_devices true -f dshow -i dummy
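The listing commands above can be selected automatically with a small POSIX sh sketch. device_list_command is a hypothetical helper; it assumes uname -s reports Darwin on macOS, Linux on Linux, and a MINGW, MSYS, or CYGWIN prefix in Windows shells such as Git Bash, and it prefers pactl whenever it is installed.

```shell
#!/bin/sh
# Hypothetical helper: print the device-listing command for an OS name as
# reported by `uname -s` (defaults to the current machine).
device_list_command() {
  os=${1:-$(uname -s)}
  case $os in
    Darwin)
      echo 'ffmpeg -f avfoundation -list_devices true -i ""'
      ;;
    Linux)
      # Prefer PulseAudio's pactl when available, otherwise fall back to ALSA.
      if command -v pactl >/dev/null 2>&1; then
        echo 'pactl list sources short'
      else
        echo 'arecord -l'
      fi
      ;;
    MINGW*|MSYS*|CYGWIN*)
      # Git Bash, MSYS2, and Cygwin on Windows.
      echo 'ffmpeg -list_devices true -f dshow -i dummy'
      ;;
    *)
      echo "unsupported OS: $os" >&2
      return 1
      ;;
  esac
}
```

Running eval "$(device_list_command)" would then list the devices on the current machine.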
To enhance your development experience with Cursor.so and Whisper Assistant, follow these steps:
- Start the recording: Press Command+M (Mac) or Control+M (Windows).
- Speak your instructions clearly.
- Stop the recording: Press Command+M (Mac) or Control+M (Windows). Note: This starts the transcription process.
- Open the Cursor dialog: Press Command+K or Command+L. Important: Do this before the transcription completes.
- The transcribed text will automatically populate the Cursor dialog. Here, you can edit the text or add files/docs, then press Enter to execute the GPT query.
By integrating Cursor.so with Whisper Assistant, you can provide extensive instructions without the need for typing, significantly enhancing your development workflow.
Whisper Assistant has been tested and supports:
- macOS: Full support with SoX (default) and ffmpeg (custom)
- Windows: Full support with SoX (default) and ffmpeg (custom)
- Linux: Full support with SoX (default) and ffmpeg (custom) - Note: Some distributions may experience audio cutoff issues with SoX, for which ffmpeg is recommended
If you encounter any platform-specific issues, please consider using the custom recording command feature or report the issue on our GitHub repository.
This extension supports using a local Faster Whisper model through Docker. This provides fast local transcription and doesn't require a real API key.
To get started with local transcription, use our Docker image:
docker run -d -p 4444:4444 --name whisper-assistant martinopensky/whisper-assistant:latest
Then configure VSCode:
- Open VSCode settings (File > Preferences > Settings)
- Search for "Whisper Assistant"
- Set "Api Provider" to "localhost"
- Set "Api Key" to any non-empty string (e.g., "localhost-dummy-key")
That's it! You can now use the extension with your local Whisper server.
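If you start the server from a script, a small wrapper can reuse the existing container instead of failing on a name conflict. This is a hypothetical helper that assumes the container name whisper-assistant and the image used above; it relies on docker container inspect exiting with a non-zero status when the container does not exist.

```shell
#!/bin/sh
# Hypothetical helper: start the existing whisper-assistant container, or
# create it if it does not exist yet. Extra arguments (e.g. --gpus all or
# --memory=4g) are passed to `docker run` only when a new container is created.
ensure_whisper_server() {
  name=whisper-assistant
  image=martinopensky/whisper-assistant:latest
  # `docker container inspect` exits non-zero when no such container exists.
  if docker container inspect "$name" >/dev/null 2>&1; then
    docker start "$name" >/dev/null   # succeeds even if already running
  else
    docker run -d -p 4444:4444 --name "$name" "$@" "$image"
  fi
}
```

For example, ensure_whisper_server --gpus all creates a GPU-enabled container on the first run and simply restarts it afterwards.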
If you're experiencing memory issues, you can limit the container's memory:
docker run -d -p 4444:4444 --memory=4g --name whisper-assistant martinopensky/whisper-assistant:latest
If you have a CUDA-capable GPU:
docker run -d -p 4444:4444 --gpus all --name whisper-assistant martinopensky/whisper-assistant:latest
# Stop the server
docker stop whisper-assistant
# Start the server
docker start whisper-assistant
# Remove the container
docker rm whisper-assistant
# View logs
docker logs whisper-assistant
# Update to latest version
docker pull martinopensky/whisper-assistant:latest
docker stop whisper-assistant
docker rm whisper-assistant
docker run -d -p 4444:4444 --name whisper-assistant martinopensky/whisper-assistant:latest
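The update steps can be wrapped in one function so the container keeps its name (and any extra flags) after updating. This is a hypothetical sketch built from the commands above; errors from docker stop and docker rm are ignored on the assumption that the container may already be stopped or removed.

```shell
#!/bin/sh
# Hypothetical helper: pull the latest image and recreate the container.
# Extra arguments (e.g. --memory=4g or --gpus all) are forwarded to `docker run`.
update_whisper_server() {
  name=whisper-assistant
  image=martinopensky/whisper-assistant:latest
  docker pull "$image" || return 1
  # Ignore errors if the container is already stopped or missing.
  docker stop "$name" >/dev/null 2>&1
  docker rm "$name" >/dev/null 2>&1
  docker run -d -p 4444:4444 --name "$name" "$@" "$image"
}
```

Pass the same flags you originally used, for example update_whisper_server --memory=4g, because docker rm discards the old container's settings.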
- Check if the server is running:
curl http://localhost:4444/v1/health
- Common issues:
- First startup delay: The model is downloaded on first use, which may take a few minutes
- Memory issues: Try using the --memory=4g flag as shown above
- Port conflicts: If port 4444 is in use, you can map to a different port:
docker run -d -p 5000:4444 martinopensky/whisper-assistant:latest
Then update the custom endpoint in VSCode settings to http://localhost:5000
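Because the first startup downloads the model, a script may need to wait for the health check to pass. This sketch assumes /v1/health answers with an HTTP 2xx status once the server is ready (curl -f treats anything else as a failure); wait_for_whisper_server is a hypothetical helper and its 300-second default is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: poll the health endpoint until it answers with an
# HTTP success status, or give up after roughly LIMIT seconds (default 300).
# Usage: wait_for_whisper_server [BASE_URL] [LIMIT]
wait_for_whisper_server() {
  base_url=${1:-http://localhost:4444}
  limit=${2:-300}
  elapsed=0
  # -f makes curl exit non-zero on HTTP error statuses; output is discarded.
  until curl -fsS --max-time 5 "$base_url/v1/health" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$limit" ]; then
      echo "whisper server at $base_url not healthy after ${limit}s" >&2
      return 1
    fi
    sleep 2
    elapsed=$((elapsed + 2))
  done
  echo "whisper server at $base_url is ready"
}
```

With a remapped port, call wait_for_whisper_server http://localhost:5000.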
If you want to customize the server, you can build from our Dockerfile:
- Get the Dockerfile from our repository
- Build the image:
docker build -t whisper-assistant-local .
- Run the container:
docker run -d -p 4444:4444 whisper-assistant-local
Whisper Assistant offers three ways to transcribe your audio:
- Local Docker Server (Default): Run Whisper locally using our Docker container for privacy and no remote API costs
- OpenAI Cloud API: A powerful cloud option using OpenAI's Whisper-1 model for fast, accurate transcription (requires API key)
- Groq Cloud API: A powerful cloud option using Groq's Whisper Large v3 Turbo model for fast, accurate transcription (requires API key)
- Open VSCode settings (File > Preferences > Settings)
- Search for "Whisper Assistant"
- Set "Api Provider" to one of: localhost (default), openai, or groq
- Enter your API key:
- For localhost: Any non-empty string (e.g., "localhost-dummy-key")
- For OpenAI: Get your key from OpenAI's console
- For Groq: Get your key from Groq's console
When using localhost (default), you can customize the endpoint URL in settings if you're running the Docker container on a different port or host.
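To try a provider outside the editor, the sketch below uploads a recording with curl. The OpenAI and Groq URLs and model names (whisper-1, whisper-large-v3-turbo) are those providers' public transcription endpoints; that the local Docker server exposes the same OpenAI-compatible /v1/audio/transcriptions route, and which model value it accepts, are assumptions. provider_config and transcribe are hypothetical helpers, not part of the extension.

```shell
#!/bin/sh
# Hypothetical helper: print "<endpoint URL> <model>" for a provider name.
# The optional second argument overrides the localhost base URL.
provider_config() {
  case $1 in
    localhost)
      # Assumption: the Docker server mirrors OpenAI's route; model is a placeholder.
      printf '%s %s\n' "${2:-http://localhost:4444}/v1/audio/transcriptions" whisper-1
      ;;
    openai)
      printf '%s %s\n' https://api.openai.com/v1/audio/transcriptions whisper-1
      ;;
    groq)
      printf '%s %s\n' https://api.groq.com/openai/v1/audio/transcriptions whisper-large-v3-turbo
      ;;
    *)
      echo "unknown provider: $1 (expected localhost, openai, or groq)" >&2
      return 1
      ;;
  esac
}

# Upload an audio file as multipart form data; the response is JSON
# containing a "text" field.
# Usage: transcribe PROVIDER API_KEY AUDIO_FILE [LOCAL_BASE_URL]
transcribe() {
  config=$(provider_config "$1" "$4") || return 1
  url=${config% *}    # everything before the space
  model=${config#* }  # everything after the space
  curl -fsS "$url" \
    -H "Authorization: Bearer $2" \
    -F "file=@$3" \
    -F "model=$model"
}
```

For example, transcribe groq "$GROQ_API_KEY" recording.wav uses Groq, while transcribe localhost localhost-dummy-key recording.wav http://localhost:5000 targets a local container on a remapped port.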