langkit

Status: alpha

Langkit is an all-in-one tool designed to facilitate language learning from native media content. It uses a collection of diverse features to transform movies, TV shows, etc. into easily ‘digestible’ material.

It supports automatic subtitle detection, bulk/recursive directory processing, seamless resumption of previously interrupted processing runs, and fallback across multiple native (reference) languages.

Features

  • Subs2cards: Make Anki cards from subtitle timecodes, like subs2srs
  • Dubtitles¹: Make a subtitle file of the dubs using speech-to-text
  • Voice enhancing: Make voices louder than the music & effects
  • Condensed audio: Generate an abridged audio file containing only the dialogue from the media, for passive immersion (see the explanation video linked below)
  • Subtitle romanization²
  • Subtitle tokenization: Separate words with spaces for languages that don't use spaces
  • Selective transliteration: Selectively transliterate subtitles based on logogram frequency. Currently only Japanese kanji are supported: kanji with a frequency rank below the user-defined threshold and with regular readings are preserved, while the others are converted to hiragana.

¹ A 'dubtitle' file is a subtitle file that matches the dubbing lines exactly. It is needed because dub translations and subtitle translations differ, as explained here

² For the list of languages supported by the transliteration feature, see here

Important

Some features require an API key because certain processing tasks, such as speech-to-text and audio enhancement, are outsourced to an external provider like Replicate. These companies offer cloud-based machine learning models that handle complex tasks remotely, allowing Langkit to leverage these models without requiring local computation.
The cost of running a few processing tasks using these models is typically very low or free.

Warning

⚠️ About Feature Combinations: ⚠️
Langkit provides numerous features, some of which may overlap or influence each other's behavior, creating a complex network of conditional interactions. Although relatively extensive testing has been conducted, the multitude of possible combinations means that certain specific scenarios will still contain bugs or unexpected behavior, especially when using less common or more intricate feature combinations. Users are encouraged to report any issues encountered, either with the Debug Report exported from the Settings panel or with the Crash Report.

Langkit within Anki

Langkit can run as a standalone application, but it can now also run directly inside Anki. This offers better performance, so it is the recommended way to use Langkit.

Anki Addon available here

tldr cli

𝗕𝗮𝘀𝗶𝗰 𝘀𝘂𝗯𝘀𝟮𝘀𝗿𝘀 𝗳𝘂𝗻𝗰𝘁𝗶𝗼𝗻𝗮𝗹𝗶𝘁𝘆
$ langkit subs2cards media.mp4 media.th.srt media.en.srt

𝗕𝘂𝗹𝗸 𝗽𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴 𝘄𝗶𝘁𝗵 𝗮𝘂𝘁𝗼𝗺𝗮𝘁𝗶𝗰 𝘀𝘂𝗯𝘁𝗶𝘁𝗹𝗲 𝘀𝗲𝗹𝗲𝗰𝘁𝗶𝗼𝗻 (𝘩𝘦𝘳𝘦: 𝘭𝘦𝘢𝘳𝘯 𝘣𝘳𝘢𝘻𝘪𝘭𝘪𝘢𝘯 𝘱𝘰𝘳𝘵𝘶𝘨𝘦𝘴𝘦 𝘧𝘳𝘰𝘮 𝘤𝘢𝘯𝘵𝘰𝘯𝘦𝘴𝘦 𝘰𝘳 𝘵𝘳𝘢𝘥𝘪𝘵𝘪𝘰𝘯𝘢𝘭 𝘤𝘩𝘪𝘯𝘦𝘴𝘦)
$ langkit subs2cards media.mp4 -l "pt-BR,yue,zh-Hant"

𝗦𝘂𝗯𝘁𝗶𝘁𝗹𝗲 𝘁𝗿𝗮𝗻𝘀𝗹𝗶𝘁𝗲𝗿𝗮𝘁𝗶𝗼𝗻 (+𝘁𝗼𝗸𝗲𝗻𝗶𝘇𝗮𝘁𝗶𝗼𝗻 𝗶𝗳 𝗻𝗲𝗰𝗲𝘀𝘀𝗮𝗿𝘆)
$ langkit translit media.ja.srt

𝗠𝗮𝗸𝗲 𝗮𝗻 𝗮𝘂𝗱𝗶𝗼𝘁𝗿𝗮𝗰𝗸 𝘄𝗶𝘁𝗵 𝗲𝗻𝗵𝗮𝗻𝗰𝗲𝗱/𝗮𝗺𝗽𝗹𝗶𝗳𝗶𝗲𝗱 𝘃𝗼𝗶𝗰𝗲𝘀 𝗳𝗿𝗼𝗺 𝘁𝗵𝗲 𝟮𝗻𝗱 𝗮𝘂𝗱𝗶𝗼𝘁𝗿𝗮𝗰𝗸 𝗼𝗳 𝘁𝗵𝗲 𝗺𝗲𝗱𝗶𝗮 (𝘙𝘦𝘱𝘭𝘪𝘤𝘢𝘵𝘦 𝘈𝘗𝘐 𝘵𝘰𝘬𝘦𝘯 𝘯𝘦𝘦𝘥𝘦𝘥)
$ langkit enhance media.mp4 -a 2 --sep demucs

𝗠𝗮𝗸𝗲 𝗱𝘂𝗯𝘁𝗶𝘁𝗹𝗲𝘀 𝘂𝘀𝗶𝗻𝗴 𝗦𝗽𝗲𝗲𝗰𝗵-𝘁𝗼-𝗧𝗲𝘅𝘁 𝗼𝗻 𝘁𝗵𝗲 𝘁𝗶𝗺𝗲𝗰𝗼𝗱𝗲𝘀 𝗼𝗳 𝗽𝗿𝗼𝘃𝗶𝗱𝗲𝗱 𝘀𝘂𝗯𝘁𝗶𝘁𝗹𝗲 𝗳𝗶𝗹𝗲 (𝘙𝘦𝘱𝘭𝘪𝘤𝘢𝘵𝘦 𝘈𝘗𝘐 𝘵𝘰𝘬𝘦𝘯 𝘯𝘦𝘦𝘥𝘦𝘥)
$ langkit subs2dubs --stt whisper media.mp4 (media.th.srt) -l "th"

𝗖𝗼𝗺𝗯𝗶𝗻𝗲 𝗮𝗹𝗹 𝗼𝗳 𝘁𝗵𝗲 𝗮𝗯𝗼𝘃𝗲 𝗶𝗻 𝗼𝗻𝗲 𝗰𝗼𝗺𝗺𝗮𝗻𝗱
$ langkit subs2cards /path/to/media/dir/  -l "th,en" --stt whisper --sep demucs --translit

Warning

The focus of my recent work has been the GUI; therefore the CLI has been tested much less and is much more unstable at this point. Some features are not yet supported on the CLI.

Features in detail...

Subs2cards

Subs2cards converts your favorite TV shows and movies directly into Anki flashcards by extracting dialogues, images, and audio clips based on subtitle timecodes. It's ideal for sentence mining and context-aware word memorization.

Extra features compared to subs2srs

  • Default encoding to OPUS / AVIF: Use modern codecs to save storage.
  • Parallelization / multi-threading by default: By default, all available CPU cores are used. You can reduce CPU usage by passing a --workers value lower than the default (see the example after this list).
  • Bulk / recursive directory processing: Triggered by passing a directory instead of a single media file. The target and native languages must be set using -l, see the tldr section.
  • Seamless resumption of previously interrupted runs
  • Dubtitles as the source of truth for subtitle lines (when both features are selected together)
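
A minimal sketch of combining bulk directory processing with a reduced worker count (the directory path and worker count are only illustrative):
$ langkit subs2cards /path/to/shows/ -l "th,en" --workers 4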

Condensed Audio

Langkit can make an audio file containing all the dialogue snippets from the audio track.
This is meant to be used for passive listening.
👉 Explanations and context here: Optimizing Passive Immersion: Condensed Audio - YouTube

Additionally, a summary of the episode/media can be generated by an AI (LLM) in your native language and embedded in the audio file to refresh your memory before listening and help you understand the condensed audio's content. OpenRouter serves a number of LLMs that can do this for free (Deepseek, Llama, Qwen...).

Voice Enhancing

Boosts clarity of speech in audio tracks by amplifying voices while reducing background music and effects. Ideal for learners who struggle to distinguish words clearly; particularly useful for tonal languages or for languages with dense or unfamiliar phonetic patterns.

This feature works by merging the original audio track (with negative gain) with an audio track containing only the voices (with additional gain). Obtaining the isolated voice track requires running one of these deep-learning audio separation tools in the cloud:

Each entry lists the name (to be passed with --sep), the quality of the separated vocals, the price, the license/type, and notes:

  • demucs (alias: de): good vocals · very cheap ($0.063/run) · MIT license · Recommended.
  • demucs_ft (alias: ft): good vocals · cheap ($0.252/run) · MIT license · Fine-tuned version: "take 4 times more time but might be a bit better". I couldn't hear any difference with the original in my test.
  • spleeter (alias: sp): rather poor vocals · very, very cheap ($0.00027/run) · MIT license
  • elevenlabs (aliases: 11, el): good vocals · very, very expensive ($1/MINUTE) · proprietary · Not supported on the GUI, and not fully supported on the CLI due to limitations of their API (mp3 only), which desyncs the processed audio from the original. Does more processing than the others: noises are entirely eliminated, but it distorts the soundstage to put the voice in the center, which might feel a bit uncanny in an enhanced track.
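
As a rough illustration of the mixing step only (not the exact command Langkit runs; gain values and file names are made up for the example), the enhanced track is conceptually similar to:
$ ffmpeg -i original_track.flac -i vocals_only.flac -filter_complex "[0:a]volume=-6dB[bg];[1:a]volume=6dB[vox];[bg][vox]amix=inputs=2:duration=first[out]" -map "[out]" enhanced_track.flac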

Dubtitles

Creates accurate subtitle files specifically synchronized with dubbed audio tracks using speech-to-text. This addresses the common mismatch between subtitle translations and audio dubs, ensuring the text closely follows the spoken dialogue.

Important

Don't rely on the aggregate Word Error Rate (WER) across all languages; check the WER for your specific target language!

Each entry lists the provider, the model name (to be passed with --stt), the number of languages supported (WER < 50%), the WER across all languages (lower is better), the per-language WER, the price, and the type:

  • OpenAI gpt-4o-transcribe: 57 languages · 8.9% WER · per-language WER here · $6/1000 min · closed
  • OpenAI gpt-4o-mini-transcribe: 57 languages · 13.9% WER · per-language WER here · $3/1000 min · closed
  • ElevenLabs scribe: 99 languages · 7.7% WER · per-language WER here · $6.67/1000 min · closed
  • Replicate whisper (alias: wh): 57 languages · 10.3% WER · per-language WER here · $1.1/1000 min · MIT
  • Replicate insanely-fast-whisper (alias: fast): 57? languages · 16.2% WER · per-language WER n/a · $0.0071/run · MIT
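
For example, to make dubtitles with the ElevenLabs model (the model name is taken from the table above; the media file and language are illustrative):
$ langkit subs2dubs --stt scribe media.mp4 -l "th"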

Subtitle romanization

Converts subtitles into a romanized version that is as phonetically accurate as possible.

The list of languages supported by the transliteration feature is here
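
An illustrative example for Japanese (the exact output formatting may differ):

こんにちは、世界 → konnichiwa, sekai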

Subtitle tokenization

Separates words with spaces for languages that don't use spaces.

The list of languages supported by the tokenization feature is here
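
An illustrative example for Japanese (the actual segmentation depends on the tokenizer):

日本語は難しいです → 日本語 は 難しい です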

Selective (Kanji) Transliteration

Automatically transliterates Japanese subtitles from kanji into hiragana based on user-defined frequency thresholds and phonetic regularity. This feature helps Japanese learners focus on common kanji by selectively converting rarer or irregular characters into easier-to-read hiragana, facilitating incremental kanji learning and smoother immersion into native content.

The frequency list comes from the 6th edition of "Remembering the Kanji" by James W. Heisig and covers the 3000 most frequent kanji.
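
An illustrative example (the threshold and words are chosen only for illustration): with a low threshold, a word written with common kanji such as 学校 would be left untouched, while a word containing a rarer kanji, such as 憂鬱, would be rendered as ゆううつ.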

FAQ

On Windows I get a blue popup that says "Windows protected your PC" when trying to run Langkit.

When running for the first time, Windows may show "Windows protected your PC":

  1. Click "More info"
  2. Click "Run anyway"

This is normal for unsigned software that isn't widely used and should only happen once.

For a more technical explanation: this warning is triggered by Windows Defender SmartScreen, a security feature that protects against unknown applications. Langkit is flagged because it is not a widely used application: it has not been seen by Microsoft's telemetry systems and therefore has no history of being safe.

How do I get these API keys?

API keys are only visible once during creation.

Replicate

  • Navigation: Click your username (top left) → API Tokens → Create token
  • URL: https://replicate.com/account/api-tokens
  • Limited free credits for new users. After free credits, pay-as-you-go starting at $0.000100/second for CPU.

OpenRouter

  • Navigation: Login → Keys → Create API Key → Name Key → Create
  • URL: https://openrouter.ai
  • Small free allowance for testing. Several models offer free variants marked with :free.

OpenAI

  • Navigation: Dashboard → API Keys (left menu under "Organization") → Create new secret key → Name key → Generate
  • URL: https://platform.openai.com
  • No free credits for new accounts. Phone verification required. Must purchase credits before API usage.

Google AI

  • Navigation: Dashboard (top right) → Create an API key
  • URL: https://aistudio.google.com
  • Generous free tier. No credit card required for free access, but rate limited.

ElevenLabs

Why isn't there the possibility to run the speech-to-text or voice separation locally?

Because I only have a 10-year-old Pentium CPU with a graphics chipset.

Why is the executable/binary so heavy?

The official Docker + Docker Compose libraries and their dependencies make up most of the size of the executable.

Download

See Releases

Requirements

The locations of the FFmpeg and MediaInfo binaries can be provided via a flag, through $PATH, or by placing them in a "bin" directory in the folder where langkit is.

Using static FFmpeg builds guarantees that you have up-to-date codecs. If you don't use a well-maintained bleeding-edge distro or brew, use the dev builds. You can check your distro here.
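
A minimal sketch of the $PATH option, assuming static builds were unpacked under /opt (the paths are illustrative):
$ export PATH="/opt/ffmpeg-static/bin:/opt/mediainfo/bin:$PATH"
$ langkit subs2cards media.mp4 media.th.srt media.en.srt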

API Keys

Certain features, like voice enhancement and speech-to-text, require API keys from external cloud services. You can provide these keys using the GUI (recommended) or via environment variables for CLI-only usage.

Method 1: GUI Settings Panel (Recommended)

The easiest way to configure your keys is through the Settings panel in the Langkit application. Simply paste your keys into the corresponding fields and click "Save Changes".

Important

Keys entered in the GUI are stored in a plain-text (unencrypted) configuration file on your system. While convenient, this is less secure than using the environment variable method below, and the file should be deleted if you are using a public computer.
The configuration file is located at:

  • Windows: %APPDATA%\langkit\config.yaml (use Notepad++ to open it)
  • Linux: ~/.config/langkit/config.yaml
  • macOS: ~/Library/Application Support/langkit/config.yaml

Method 2: Environment Variables (no API key persistence)

If you use the CLI or prefer not to store keys in a file, you can use environment variables. Set them in your shell (or in your shell's config file) before running Langkit, as shown in the example after the table below.

Service → environment variable:

  • Replicate: REPLICATE_API_KEY
  • ElevenLabs: ELEVENLABS_API_KEY
  • OpenAI: OPENAI_API_KEY
  • OpenRouter: OPENROUTER_API_KEY
  • Google AI: GOOGLE_API_KEY
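
For example, in a POSIX shell (the key value is a placeholder):
$ export REPLICATE_API_KEY="your-replicate-token"
$ langkit enhance media.mp4 -a 2 --sep demucs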

How Keys are Handled (Precedence and Saving)

  1. Environment variables always take precedence. If an environment variable is set, its value will be used for processing, even if a different key is saved in the GUI's configuration file.
  2. The GUI is designed to protect your keys. When you open the settings panel, it will load and display keys from your environment variables. However, to avoid writing secrets from your environment to disk, it will only save a key to the configuration file if you explicitly paste a new value into an API key field in the GUI.
  3. The CLI does not write to the configuration file. It will read and use keys from environment variables or the config file but will never save them.
  4. Exported crash/debug reports are sanitized. They are guaranteed not to leak any API keys.
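
Because environment variables take precedence, you can also scope a key to a single run without persisting it anywhere (an illustrative sketch; the key value is a placeholder):
$ REPLICATE_API_KEY="your-replicate-token" langkit subs2dubs --stt whisper media.mp4 -l "th"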

Output

(section may be outdated)

Before you can import the deck into Anki, though, you must add a new Note Type that includes some or all of the fields below on the front and/or back of each card. The columns in the generated .tsv file are as follows:

  1. Sound: extracted audio as a [sound] tag for Anki
  2. Time: subtitle start time code as a string
  3. Source: base name of the subtitle source file
  4. Image: selected image frame as an <img> tag
  5. ForeignCurr: current text in the foreign subtitles file
  6. NativeCurr: current text in the native subtitles file
  7. ForeignPrev: previous text in the foreign subtitles file
  8. NativePrev: previous text in the native subtitles file
  9. ForeignNext: next text in the foreign subtitles file
  10. NativeNext: next text in the native subtitles file
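
A hypothetical example row (fields shown separated by "|" for readability; the actual file is tab-separated, and the file names and text are invented for illustration):

[sound:show_ep01_0042.opus] | 00:03:17.250 | show_ep01 | <img src="show_ep01_0042.avif"> | สวัสดีครับ | Hello. | (previous foreign line) | (previous native line) | (next foreign line) | (next native line)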

When you review the created deck for the first time, you should go quickly through the entire deck at once. During this first pass, your goal should be to identify those cards which you can understand almost perfectly, if not for the odd piece of unknown vocabulary or grammar; all other cards which are either too hard or too easy should be deleted in this pass. Any cards which remain in the imported deck after mining should be refined and moved into your regular deck for studying the language on a daily basis.

Build & Development

See DEV.md

Acknowledgements

  • Special thanks to Matt vs Japan for his excellent video essays on language acquisition.

Linguistic Tools & Data

  • The core subs2cards functionality was first pioneered by cb4960 with the original subs2srs project.
  • Langkit began as a direct fork of Bunkai by ustuehler, which reimplemented subs2srs in Go.
  • Japanese morphological analysis is provided by the ichiran project.
  • Indic scripts transliteration relies on the comprehensive Aksharamukha script converter.
  • Thai transliteration is made possible by the go-rod library for browser automation and the thai2english website.

Technical

This project stands on the shoulders of giants and would not be possible without numerous open-source projects' contributions:

  • Containerized linguistic analysis is managed with Docker Compose.
  • Essential media processing depends on the indispensable FFmpeg and MediaInfo tools.
  • The graphical user interface is:
  • Shout out to the excellent pyglossary dictionary file conversion tool, which inspired me to create a log viewer inside the GUI as well.

License

All new contributions from commit d540bd4 onward are licensed under GPL-3.0.

Support the project
