Warning
V2 introduces breaking changes from V1. See the new arguments below.
This Python script extracts both text and image-based subtitles from media files and saves them as .ass, .srt, or .vtt subtitle files. Additionally, it includes a customizable post-processor that standardizes the styling of .ass subtitles while retaining their original positioning.
In summary:
run script --> extract subtitle files --> postprocess --> output
The extraction process first uses ffprobe to identify all available subtitle streams. Once they are identified, the ffmpeg command is executed to extract the desired subtitle streams into the specified formats.
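The probe-then-extract flow described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names and the `<name>.<index>.<fmt>` output naming are assumptions made for the example, and it only covers text-based streams.

```python
import json
import subprocess
from pathlib import Path


def build_probe_cmd(media: Path) -> list[str]:
    """ffprobe command that lists only the subtitle streams, as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "s",
        "-show_entries", "stream=index,codec_name:stream_tags=language",
        "-of", "json",
        str(media),
    ]


def parse_streams(probe_json: str) -> list[dict]:
    """Reduce ffprobe's JSON output to index, codec and language per stream."""
    streams = json.loads(probe_json).get("streams", [])
    return [
        {
            "index": s["index"],
            "codec": s.get("codec_name", "unknown"),
            # Streams without a language tag are reported as "und" here.
            "language": s.get("tags", {}).get("language", "und"),
        }
        for s in streams
    ]


def build_extract_cmd(media: Path, stream_index: int, fmt: str) -> list[str]:
    """ffmpeg command that writes one subtitle stream next to the media file.

    The output naming scheme is hypothetical, chosen only for this sketch.
    """
    out = media.with_name(f"{media.stem}.{stream_index}.{fmt}")
    return ["ffmpeg", "-y", "-i", str(media), "-map", f"0:{stream_index}", str(out)]


def extract_all(media: Path, fmt: str = "srt") -> None:
    """Probe the file, then run one ffmpeg call per subtitle stream."""
    probe = subprocess.run(
        build_probe_cmd(media), capture_output=True, text=True, check=True
    )
    for stream in parse_streams(probe.stdout):
        subprocess.run(build_extract_cmd(media, stream["index"], fmt), check=True)
```

Keeping command construction separate from execution makes the logic easy to check without ffmpeg installed; only `extract_all` touches the binaries.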
For image-based subtitles, the subtitle stream is converted into .sup format, and OCR is performed using pgsrip, which uses tesseract-ocr to transcribe the subtitles into a .srt file. The .srt file is then converted into the desired formats.
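As a rough sketch of the first half of that path, the snippet below decides whether a stream needs the OCR route and builds the ffmpeg command that copies it into a .sup file. It assumes the stream is PGS (`hdmv_pgs_subtitle`, the codec name ffprobe reports); the helper names and output naming are hypothetical, and the pgsrip OCR step itself is left out.

```python
from pathlib import Path

# Codec name ffprobe reports for PGS (Blu-ray) bitmap subtitles.
PGS_CODEC = "hdmv_pgs_subtitle"


def needs_ocr(codec_name: str) -> bool:
    """True when the stream is a PGS bitmap stream that must go through OCR."""
    return codec_name == PGS_CODEC


def build_sup_cmd(media: Path, stream_index: int) -> list[str]:
    """ffmpeg command that copies a PGS stream, without re-encoding, to .sup."""
    out = media.with_name(f"{media.stem}.{stream_index}.sup")
    return [
        "ffmpeg", "-y", "-i", str(media),
        "-map", f"0:{stream_index}",
        "-c:s", "copy",  # stream copy: the bitmap data is left untouched
        str(out),
    ]
```

The resulting .sup file is what an OCR tool such as pgsrip would then turn into a .srt file.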
Once the extraction is complete, the tool runs a postprocessor that edits the subtitle files using the pysubs2 package. It executes the default workflow defined in postprocess.yaml (or a custom-defined workflow).
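To illustrate what "standardizing the styling while retaining positioning" can mean for an .ass file, here is a small standard-library sketch. It is not the tool's postprocessor (which uses pysubs2 and a YAML workflow); the function `restyle_ass` is a hypothetical stand-in that rewrites fields on the `Style:` lines and leaves every `Dialogue:` line, including inline tags such as `\pos`, untouched.

```python
def restyle_ass(text: str, overrides: dict[str, str]) -> str:
    """Apply field overrides to every Style: line of an .ass document.

    Only lines in the styles section are rewritten; events (and therefore
    any positioning tags inside the dialogue text) are passed through as-is.
    """
    fields: list[str] = []
    in_styles = False
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            # Section header: track whether we are inside the styles section.
            in_styles = stripped.lower() in ("[v4+ styles]", "[v4 styles]")
        elif in_styles and stripped.startswith("Format:"):
            # The Format line names the columns used by the Style lines.
            fields = [f.strip() for f in stripped[len("Format:"):].split(",")]
        elif in_styles and stripped.startswith("Style:") and fields:
            values = [v.strip() for v in stripped[len("Style:"):].split(",")]
            for name, value in overrides.items():
                if name in fields and fields.index(name) < len(values):
                    values[fields.index(name)] = value
            line = "Style: " + ",".join(values)
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, `restyle_ass(text, {"Fontname": "Noto Sans", "Fontsize": "24"})` changes the font of every style while a line such as `Dialogue: ...,{\pos(320,50)}Hello` keeps its position.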
Usage via docker run
docker run -u 1000:1000 -v /PATH/TO/MEDIA/DIR:/media -it --rm ghcr.io/klementng/subtitle-extract:latest /media [options]
Used for watching a directory for changes
- see docker-compose.yml for a sample configuration
Show options
usage: main.py [-h] [--log-level LOG_LEVEL] [--log-file LOG_FILE] [--app-watch] [--app-scan-interval APP_SCAN_INTERVAL] [--app-enabled-extractor] [--no-app-enabled-extractor]
[--app-enabled-postprocessor] [--no-app-enabled-postprocessor] [--extractor-exclude-enable] [--extractor-exclude-file EXTRACTOR_EXCLUDE_FILE]
[--extractor-exclude-append] [--extractor-extract-bitmap] [--extractor-config-overwrite] [--no-extractor-config-overwrite]
[--extractor-config-desired-formats EXTRACTOR_CONFIG_DESIRED_FORMATS [EXTRACTOR_CONFIG_DESIRED_FORMATS ...]]
[--extractor-config-languages EXTRACTOR_CONFIG_LANGUAGES [EXTRACTOR_CONFIG_LANGUAGES ...]]
[--extractor-config-unknown-language-as EXTRACTOR_CONFIG_UNKNOWN_LANGUAGE_AS] [--postprocessor-exclude-enable]
[--postprocessor-exclude-file POSTPROCESSOR_EXCLUDE_FILE] [--postprocessor-exclude-append]
[--postprocessor-config-workflow-file POSTPROCESSOR_CONFIG_WORKFLOW_FILE]
path
Application configuration
positional arguments:
path Path to media file/folder
options:
-h, --help show this help message and exit
--log-level LOG_LEVEL
Logging level (default: INFO)
--log-file LOG_FILE Path to log file (default: None)
--app-watch Enable app watch mode (default: false)
--app-scan-interval APP_SCAN_INTERVAL
App scan interval in mins (default: 0), 0=disabled
--app-enabled-extractor
Enable extractor (default: true)
--no-app-enabled-extractor
Disable extractor
--app-enabled-postprocessor
Enable postprocessor (default: true)
--no-app-enabled-postprocessor
Disable postprocessor
--extractor-exclude-enable
Enable extractor exclude (default: false)
--extractor-exclude-file EXTRACTOR_EXCLUDE_FILE
Extractor exclude file path (default: ./extracted.txt)
--extractor-exclude-append
Append to extractor exclude file (default: false)
--extractor-extract-bitmap
Extract bitmap (default: false)
--extractor-config-overwrite
Overwrite existing subtitle file during extraction (default: False)
--extractor-config-desired-formats EXTRACTOR_CONFIG_DESIRED_FORMATS [EXTRACTOR_CONFIG_DESIRED_FORMATS ...]
List of desired formats (default: srt ass)
--extractor-config-languages EXTRACTOR_CONFIG_LANGUAGES [EXTRACTOR_CONFIG_LANGUAGES ...]
List of languages (default: all)
--extractor-config-unknown-language-as EXTRACTOR_CONFIG_UNKNOWN_LANGUAGE_AS
Unknown language fallback (default: eng)
--postprocessor-exclude-enable
Postprocessor exclude enable (default: False)
--postprocessor-exclude-file POSTPROCESSOR_EXCLUDE_FILE
Postprocessor exclude file path (default: ./postprocessed.txt)
--postprocessor-exclude-append
Append to postprocessor exclude file (default: false)
--postprocessor-config-workflow-file POSTPROCESSOR_CONFIG_WORKFLOW_FILE
Postprocessor workflow file (default: postprocess.yaml)
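When driving the script from another program, the argument list can be assembled as in the sketch below. The helper `build_cli_args` is hypothetical; only the flag names come from the help text above. The path is placed first, as in the docker run example, because the format and language options accept multiple values and would otherwise consume a trailing path as one more value.

```python
from typing import Optional, Sequence


def build_cli_args(
    path: str,
    *,
    watch: bool = False,
    extract_bitmap: bool = False,
    formats: Optional[Sequence[str]] = None,
    languages: Optional[Sequence[str]] = None,
) -> list[str]:
    """Build an argument list for main.py from a few common settings."""
    args = [path]  # positional path goes first, before any multi-value option
    if watch:
        args.append("--app-watch")
    if extract_bitmap:
        args.append("--extractor-extract-bitmap")
    if formats:
        args += ["--extractor-config-desired-formats", *formats]
    if languages:
        args += ["--extractor-config-languages", *languages]
    return args
```

Options left at their defaults are simply omitted, so the script's own defaults (for example `srt ass` for the formats) still apply.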
To change the styling of the .ass subtitle files, edit the postprocess.yaml file. To add custom actions, bind-mount or replace the file at /app/postprocessing/user_actions.py