Whisper To Input

Whisper To Input, also known by its Mandarin name 輕聲細語輸入法, is an Android keyboard that performs speech-to-text (STT/ASR) with OpenAI Whisper and inputs the recognized text. It supports English, Chinese, Japanese, and more, including mixed languages and Taiwanese.

The STT/ASR backend service can be the OpenAI API, Whisper ASR Webservice, or NVIDIA NIM (based on NVIDIA Riva).

Installation

  1. Download the APK file from the latest release to your phone.

  2. Locate the APK file on your phone and click it. Click "Install" to install the app.

  3. An "Unsafe app blocked" warning will pop up. Click "More details" and then click "Install anyway". Click "Open" to open the app.

  4. Allow the app to record audio and send notifications. These permissions are required for the app to work properly. If you accidentally denied the permissions, you must go to the app settings page to allow them.

  5. Go to the app settings page and enter your configuration. You have three choices: use the official OpenAI API with your API key, self-host a Whisper ASR Webservice, or self-host NVIDIA NIM. For more information, see the Services section.

    Some example configurations:

    • OpenAI API:
      Speech to Text Backend:  OpenAI API
      Endpoint:                https://api.openai.com/v1/audio/transcriptions
      API Key:                 sk-...xxxx
      Model:                   whisper-1
      Language Code:
      
    • Whisper ASR Webservice:
      Speech to Text Backend:  Whisper ASR Webservice
      Endpoint:                http://<SERVER_IP>:9000/asr
      API Key:
      Model:
      Language Code:
      
    • NVIDIA NIM:
      Speech to Text Backend:  NVIDIA NIM
      Endpoint:                http://<SERVER_IP>:9000/v1/audio/transcriptions
      API Key:
      Model:
      Language Code:           multi
      
  6. Go to the system settings page and enable the app keyboard. This process may vary depending on your Android version and phone model. The following screenshots were taken on an Asus Zenfone 8 running Android 13.

  7. Open any app that requires text input, such as a browser, and click the input box. Select the app keyboard by clicking the bottom-right button and choosing Whisper Input.

    Please note that the keyboard GUI will look slightly different from the screenshots due to the fixes in #50.

  8. Click the microphone button to start recording. After you finish speaking, click the microphone button again. The recognized text will be inputted into the text box.

Keyboard Usage

  • Microphone Key in the center: Click to start recording, click again to stop recording, and input the recognized text.
  • Cancel Key in the bottom left (Only visible when recording): Click to cancel the current recording.
  • Backspace Key in the upper right: Delete the previous character. If you press and hold this key, it will keep deleting characters until you release it.
  • Enter Key in the bottom right: Input a newline character. If you press this while recording, it will stop recording and input the recognized text with a trailing newline.
  • Settings Key in the upper left: Open the app settings page.
  • Switch Key in the upper left: Switch to the previous input method. Note that if there is no previous input method, this key does nothing.

Services

Any one of the following services can be used as the STT/ASR backend.

OpenAI API

Requires an OpenAI API key.

See the documentation for more info.
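As a quick sanity check outside the app, the same endpoint can be exercised with curl. This is a sketch assuming an `OPENAI_API_KEY` environment variable and a local WAV file named `sample-0.wav` (such as the sample audio downloaded in the NVIDIA NIM section below); the endpoint and `whisper-1` model name match the configuration example above:

```shell
# Transcribe a local WAV file via the OpenAI transcription endpoint.
# Assumes OPENAI_API_KEY is set and sample-0.wav exists in the
# current directory.
curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -F file=@./sample-0.wav \
  -F model=whisper-1
```

If this returns a JSON transcription, the same endpoint, key, and model should work when entered in the app settings.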

Whisper ASR Webservice

The most commonly used open-source self-hosted Whisper service. Requires a self-hosted server.

Whisper ASR Webservice can be set up as described in #13.
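For reference, a minimal self-hosting sketch with Docker, based on the Whisper ASR Webservice project's image and environment variables (the `base` model is an assumption; pick a model size that fits your hardware):

```shell
# Serve Whisper ASR Webservice on port 9000, matching the
# http://<SERVER_IP>:9000/asr endpoint in the configuration example.
docker run -d -p 9000:9000 \
  -e ASR_MODEL=base \
  -e ASR_ENGINE=openai_whisper \
  onerahmet/openai-whisper-asr-webservice:latest
```

GPU deployment and other engines are covered in the project's own documentation; see #13 for the setup used with this app.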

NVIDIA NIM (Self-hosted)

NVIDIA's optimized Whisper model using TensorRT-LLM. Requires a self-hosted server.

Use the openai/whisper-large-v3 NIM by following the deployment guide. After generating an NGC API key, run:

export NGC_API_KEY=<PASTE_API_KEY_HERE>
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"

docker run -it --rm --name=riva-asr \
   --runtime=nvidia \
   --gpus '"device=0"' \
   --shm-size=8GB \
   -e NGC_API_KEY \
   -e NIM_HTTP_API_PORT=9000 \
   -e NIM_GRPC_API_PORT=50051 \
   -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
   -u $(id -u) \
   -p 9000:9000 \
   -p 50051:50051 \
   -e NIM_TAGS_SELECTOR=name=whisper-large-v3 \
   nvcr.io/nim/nvidia/riva-asr:1.3.0

and wait for a while. It should show the following when launched successfully:

INFO:uvicorn.error:Uvicorn running on http://0.0.0.0:9000 (Press CTRL+C to quit)

Perform a health check:

curl -X 'GET' 'http://localhost:9000/v1/health/ready'

This should show:

{"ready":true}

Download a sample audio:

# MP3 will not work, use wav instead.
wget https://github.com/audio-samples/audio-samples.github.io/raw/refs/heads/master/samples/wav/ted_speakers/BillGates/sample-0.wav

Then test it:

curl --request POST \
  --url http://localhost:9000/v1/audio/transcriptions \
  --header 'Content-Type: multipart/form-data' \
  --form file=@./sample-0.wav \
  --form language=multi \
  --form response_format=text

This should show:

"A cramp is no small danger on a swim. "

The following Python client is used in the official guide, but isn't required. We still include these instructions here for reference:

sudo apt-get install python3-pip
pip install nvidia-riva-client
git clone https://github.com/nvidia-riva/python-clients.git

and

python3 python-clients/scripts/asr/transcribe_file_offline.py --server 0.0.0.0:50051 --input-file ./sample-0.wav --language-code multi

will show something like:

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "a cramp is no small danger on a swim ",
          "confidence": 0.0,
          "words": [],
          "languageCode": []
        }
      ],
      "channelTag": 1,
      "audioProcessed": 2.84625
    }
  ],
  "id": {
    "value": "dc98a7d8-487a-4825-bf3c-6f9621914246"
  }
}
Final transcript: a cramp is no small danger on a swim 

and

python3 python-clients/scripts/asr/transcribe_file_offline.py --list-models

will show:

Available ASR models
{'en,zh,de,es,ru,ko,fr,ja,pt,tr,pl,ca,nl,ar,sv,it,id,hi,fi,vi,he,uk,el,ms,cs,ro,da,hu,ta,no,th,ur,hr,bg,lt,la,mi,ml,cy,sk,te,fa,lv,bn,sr,az,sl,kn,et,mk,br,eu,is,hy,ne,mn,bs,kk,sq,sw,gl,mr,pa,si,km,sn,yo,so,af,oc,ka,be,tg,sd,gu,am,yi,lo,uz,fo,ht,ps,tk,nn,mt,sa,lb,my,bo,tl,mg,as,tt,haw,ln,ha,ba,jw,su,yue,multi': [{'model': ['whisper-large-v3-multi-asr-offline-asr-bls-ensemble']}]}

Debugging

All current builds on the release page are debug builds. To view the logs, enable USB debugging, connect your phone to a PC, and use adb logcat. If you have a local Android Studio installation, use its bundled ADB tool; otherwise, consider installing a minimal standalone ADB such as Minimal ADB and Fastboot.

Below are some useful adb logcat commands:

adb devices       # list connected devices
adb logcat *:E    # errors only
adb logcat *:W    # warnings and above
adb logcat *:I    # info and above
adb logcat *:D    # debug and above
adb logcat *:V    # verbose (everything)

See the adb doc for more info.
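When the raw log is too noisy, piping logcat through grep helps isolate the app's messages. The `whisper` pattern below is an assumption about the log tag; adjust it after inspecting the raw output:

```shell
# Show only log lines that mention "whisper", case-insensitively.
adb logcat | grep -i whisper

# Combine with a priority filter: warnings and above only.
adb logcat *:W | grep -i whisper
```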

Permission Description

  • RECORD_AUDIO: Required for the app to record audio for voice input.
  • POST_NOTIFICATIONS: Required for the app to show error messages in the background if any error occurs.

FAQ and Known Issues

  • Sometimes the keyboard will silently fail; please see issue #17 for further information.
  • Taiwanese (or Hokkien) transcription seems to work quite well, although not officially declared (thanks @ijsun for discovering this). To enable Taiwanese transcription, leave the Language Code empty in the settings page.

Please open an issue if you have any questions.

Developer Notes

To build and release a newer version, open Android Studio and follow the steps below:

  1. Bump the version code and version name.
  2. Retrieve signing keystore and its password from Johnson.
  3. Menu: Build > Generate Signed App Bundle / APK...
  4. Select APK, fill in the signing keystore path and password, and select Debug in the next step to build.
  5. Rename the file whisper-to-input/android/app/build/outputs/apk/debug/app-debug.apk accordingly.
  6. Create Git tag and release on GitHub.

See the official document for more information.

License

This repository is licensed under the GPLv3 license. For more information, please refer to the LICENSE file.

Main Contributors: Yan-Bin Diau (@tigerpaws01), Johnson Sun (@j3soon), Ying-Chou Sun (@ijsun)

For a complete list of contributors to the code of this repository, please visit the contributor list.
