A powerful text-to-speech (TTS) extension for the Zathura PDF viewer, designed to provide a seamless and feature-rich audio reading experience.
This plugin allows you to listen to your PDF documents directly within Zathura, with advanced controls for playback, voice customization, and navigation. It integrates with modern, high-quality TTS engines to deliver clear and natural-sounding speech.
This repository contains both the TTS plugin and a modified version of Zathura with utility plugin support:
zathura-liberated/
├── zathura-tts/ # TTS plugin implementation
├── zathura/ # Modified Zathura with utility plugin support (submodule)
├── zathura-pdf-poppler/ # PDF plugin (submodule)
└── docs/ # Comprehensive documentation
Why this structure? We needed to extend Zathura's core to support utility plugins, not just document plugins. Keeping everything in one monorepo simplifies development and installation while we prove the concept. Once utility plugins are accepted upstream, the project could be split into separate repositories.
- Multiple TTS Engines: Supports Piper-TTS (high-quality neural voices), Speech Dispatcher (system integration), and espeak-ng (reliable fallback).
- Playback Control: Play, pause, and stop audio narration with simple keyboard shortcuts.
- Navigation: Skip forward and backward by sentence or paragraph.
- Voice Customization: Adjust reading speed and select from available voices.
- Visual Feedback: Highlights the text currently being read.
- Continuous Reading: Automatically proceeds to the next page.
- Special Content Handling: Announces tables, lists, and other non-standard content.
(demo.gif)
This TTS plugin requires a modified version of Zathura with utility plugin support. The standard Zathura from package managers will not work.
Three options to get the required Zathura:
- Use the bundled version (included as a submodule):
  git clone --recursive https://github.com/ubuntupunk/zathura-liberated.git
  cd zathura-liberated
  # The submodule automatically points to our fork with utility plugin support
  # Build and install modified Zathura (see Installation section)
- Apply our patch to upstream Zathura:
  git clone https://github.com/pwmt/zathura.git
  cd zathura
  git apply ../0001-Add-utility-plugin-support-and-TTS-API-functions.patch
  # Build and install
- Use our fork directly:
  git clone https://github.com/ubuntupunk/zathura.git
  cd zathura
  git checkout feature/utility-plugin-support
  # Build and install
- Modified Zathura: With utility plugin support (see above)
- girara-gtk3: Version 0.4.0 or higher
- GLib: Version 2.50 or higher
- GTK+ 3: Version 3.22 or higher
- Speech Dispatcher (optional, for system TTS)
- GLib Version: Tested with GLib 2.74.6 (Debian 12). Newer GLib versions may need adjustments
- Upstream Sync: Our Zathura fork may not be fully synchronized with the latest upstream development
- Build Dependencies: Ensure you have compatible versions of girara-gtk3, GTK+3, and related libraries
- Feature Branch: Our modifications are in the feature/utility-plugin-support branch of the fork
- Python: Version 3.8 or higher
- piper-tts: Version 1.2.0 or higher
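The minimum library versions listed above can be checked before building. The following is a minimal sketch, assuming `pkg-config` is installed and that the libraries are registered under the module names `girara-gtk3`, `glib-2.0`, and `gtk+-3.0`. The helper names `version_ge` and `check_dep` are hypothetical, and `sort -V` requires GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helpers for checking the minimum versions listed above.

# version_ge A B: succeed if version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_dep MODULE MIN: look up MODULE with pkg-config and compare it to MIN.
check_dep() {
    found=$(pkg-config --modversion "$1" 2>/dev/null) || {
        echo "$1: not found"
        return 1
    }
    if version_ge "$found" "$2"; then
        echo "$1: $found (ok, need >= $2)"
    else
        echo "$1: $found (too old, need >= $2)"
        return 1
    fi
}
```

For example, `check_dep glib-2.0 2.50`, `check_dep gtk+-3.0 3.22`, and `check_dep girara-gtk3 0.4.0` cover the three library requirements.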
Since this plugin requires a modified version of Zathura, you must build and install it first:
# Clone with submodules
git clone --recursive https://github.com/ubuntupunk/zathura-liberated.git
cd zathura-liberated
# Build and install modified Zathura
cd zathura
meson setup builddir
meson compile -C builddir
sudo meson install -C builddir
# Build and install PDF plugin
cd ../zathura-pdf-poppler
meson setup builddir
meson compile -C builddir
sudo meson install -C builddir
cd ..
# Example for Debian/Ubuntu
sudo apt install libgirara-dev libgtk-3-dev libglib2.0-dev speech-dispatcher
# Example for Arch Linux
sudo pacman -S girara gtk3 glib2 speech-dispatcher
For the best audio quality, install the Piper-TTS Python package.
pip install piper-tts
Build and install the TTS plugin:
# From the zathura-liberated directory
cd zathura-tts
meson setup builddir
meson compile -C builddir
sudo meson install -C builddir
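The three build steps above repeat the same Meson commands, so they can be wrapped in a loop. This is a sketch only, assuming it is run from the zathura-liberated checkout with the system dependencies already installed; `run`, `build_all`, and the `DRY_RUN` variable are hypothetical names, not part of the project.

```shell
#!/bin/sh
# Hypothetical helper: build and install each component in dependency order.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_all() {
    # Modified Zathura must be installed before the plugins that depend on it.
    for dir in zathura zathura-pdf-poppler zathura-tts; do
        run meson setup "$dir/builddir" "$dir" || return 1
        run meson compile -C "$dir/builddir" || return 1
        run sudo meson install -C "$dir/builddir" || return 1
    done
}
```

Running `DRY_RUN=1 build_all` first shows the nine commands without touching the system.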
- Download Piper Voices: Piper requires voice models to function. You can download high-quality voices from the Piper Voices repository on Hugging Face.
  Each voice consists of a .onnx file and a .onnx.json file.
  Example: To download the en_US-lessac-medium voice:
  wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx
  wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/lessac/medium/en_US-lessac-medium.onnx.json
  You should place these voice files in a directory where the plugin can find them, such as ~/.local/share/zathura-tts/voices/. You can configure the voice path in your zathurarc.
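The download step can be scripted so that both files land in the suggested voices directory. The sketch below assumes that other voices follow the same URL layout as the en_US-lessac-medium example (language/locale/speaker/quality), which is inferred from that single URL. `voice_url`, `fetch_voice`, and `VOICE_DIR` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helpers for fetching a Piper voice. The URL layout is inferred
# from the single example above and is an assumption for other voices.

VOICE_BASE="https://huggingface.co/rhasspy/piper-voices/resolve/main"
VOICE_DIR="${VOICE_DIR:-$HOME/.local/share/zathura-tts/voices}"

# voice_url NAME: print the URL prefix (without extension) for a voice
# named like en_US-lessac-medium (locale-speaker-quality).
voice_url() {
    name=$1
    locale=${name%%-*}     # en_US
    rest=${name#*-}        # lessac-medium
    speaker=${rest%-*}     # lessac
    quality=${rest##*-}    # medium
    lang=${locale%%_*}     # en
    printf '%s/%s/%s/%s/%s/%s' \
        "$VOICE_BASE" "$lang" "$locale" "$speaker" "$quality" "$name"
}

# fetch_voice NAME: download the model and its JSON config into VOICE_DIR.
fetch_voice() {
    mkdir -p "$VOICE_DIR" || return 1
    url=$(voice_url "$1")
    for ext in onnx onnx.json; do
        wget -P "$VOICE_DIR" "$url.$ext" || return 1
    done
}
```

With these defined, `fetch_voice en_US-lessac-medium` downloads the same two files as the wget commands above.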
Once installed, the TTS functionality can be controlled with the following keyboard shortcuts in Zathura:
Shortcut | Action |
---|---|
Ctrl+T | Toggle TTS on/off |
Ctrl+Space | Pause/Resume reading |
Ctrl+Right | Skip to the next sentence |
Ctrl+Left | Go to the previous sentence |
Ctrl+Shift+T | Open TTS settings |
The plugin can be configured by editing the zathurarc file or through the TTS settings dialog (Ctrl+Shift+T).
Available options include:
- tts_engine: piper, speech-dispatcher, or espeak (default: piper).
- tts_voice: Specify a voice for the selected engine.
- tts_speed: Reading speed multiplier (0.5x to 3.0x).
- tts_highlight_color: Color for highlighting spoken text.
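As an illustration of the options above, the following sketch prints zathurarc lines after validating the engine name and speed range. It assumes the plugin reads its options through Zathura's usual `set <option> <value>` syntax, which the text above does not confirm. `tts_rc_fragment` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print zathurarc lines for the options listed above.
# Assumes the plugin uses Zathura's "set <option> <value>" syntax.

# tts_rc_fragment ENGINE SPEED [VOICE]
tts_rc_fragment() {
    case $1 in
        piper|speech-dispatcher|espeak) ;;
        *) echo "unknown engine: $1" >&2; return 1 ;;
    esac
    # The documented speed range is 0.5x to 3.0x.
    awk -v s="$2" 'BEGIN {
        ok = (s ~ /^[0-9]+(\.[0-9]+)?$/ && s + 0 >= 0.5 && s + 0 <= 3.0)
        exit !ok
    }' || {
        echo "speed out of range: $2" >&2
        return 1
    }
    echo "set tts_engine $1"
    echo "set tts_speed $2"
    if [ -n "${3:-}" ]; then
        echo "set tts_voice $3"
    fi
}
```

The output of `tts_rc_fragment piper 1.5 en_US-lessac-medium` could then be appended to `~/.config/zathura/zathurarc`. The exact form the plugin expects for `tts_voice` is an assumption here.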
The plugin selects the best available TTS engine in the following order:
- Piper-TTS: A high-quality, fast, and local neural text-to-speech system. It offers the most natural-sounding voices and is the recommended engine.
- Speech Dispatcher: A common interface for system-level TTS services on Linux. It provides access to a wide range of voices that may already be installed on your system.
- espeak-ng: A compact and reliable software speech synthesizer. It serves as a fallback and works on most systems without extra configuration.
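The fallback order above can be mirrored from the command line to see which engine is likely to be picked on a given system. This is a sketch under assumptions: it maps the engines onto their usual command-line tools (`piper`, `spd-say`, `espeak-ng`), and the plugin's actual detection logic may differ. `first_available` and `pick_tts_command` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical sketch of the fallback order: print the first command from the
# argument list that exists on PATH. The plugin's real detection may differ.

first_available() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            return 0
        fi
    done
    return 1
}

# Documented order mapped onto the usual CLI tools (an assumption):
# piper (Piper-TTS), spd-say (Speech Dispatcher), espeak-ng.
pick_tts_command() {
    first_available piper spd-say espeak-ng
}
```

Calling `pick_tts_command` prints the first of the three tools found, or returns a non-zero status if none is installed.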
For development, you can build the plugin without installing it system-wide.
meson setup build
meson compile -C build
To run tests:
meson test -C build
- Enhanced handling of scientific and mathematical notation.
- Support for more TTS engines and platforms.
- Improved UI for settings and voice selection.
- End-to-end testing with various document types.
Contributions are welcome! Please feel free to submit a pull request or open an issue for bugs, feature requests, or suggestions.
This project is licensed under the Zlib license. See the LICENSE file for details.