bpwhelan/GameSentenceMiner

gamesentenceminer

GSM - An Immersion toolkit for Games.

An application designed to assist with language learning through games.

Short Demo (Watch this first): https://www.youtube.com/watch?v=FeFBL7py6HY

Installation: https://www.youtube.com/watch?v=sVL9omRbGc4

Discord: https://discord.gg/yP8Qse6bb8

Anki Card Enhancement

GSM significantly enhances your Anki cards with rich contextual information:

  • Automated Audio Capture: Automatically records the voice line associated with the text.

    • Automatic Trim: Simple math around the time the text event arrived, combined with a "Voice Activity Detection" (VAD) library, gives neatly cut audio.
    • Manual Trim: If the automatic voice line trim is not perfect, you can open the audio in an external program for trimming.
  • Screenshot: Captures a screenshot of the game at the moment the voice line is spoken.

  • Multi-Line: You can capture multiple lines at once, with sentence audio, using GSM's very own Texthooker.

  • AI Translation: Integrates AI to provide quick translations of the captured sentence. Custom prompts are also supported. (Optional, bring your own key.)

Game Example (Has Audio)

Sekiro.mp4

VN Example (Has Audio)

Tanetsumi.mp4

OCR

GSM runs a fork of OwOCR to provide accurate text capture from games that do not have a hook. Here are some improvements GSM makes on stock OwOCR:

  • Easier Setup: With GSM's managed Python install, setup is only a matter of clicking a few buttons.

  • Exclusion Zones: Instead of choosing an area to OCR, you can choose an area to exclude from OCR. Useful if you have a static interface in your game and text appears randomly throughout.

  • Two-Pass OCR: To cut down on API calls and keep output clean, GSM features a "Two-Pass" OCR system. A local OCR runs constantly, and when the text on screen stabilizes, GSM runs a second, more accurate scan whose result is sent to the clipboard/WebSocket.

  • Consistent Audio Timing: With the two-pass system, we can still get accurate audio recorded and into Anki without the use of crazy offsets or hacks.

  • More Language Support: Stock OwOCR is hard-coded to Japanese, while in GSM you can use a variety of languages.
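
The two-pass idea described above can be sketched roughly as follows. This is a minimal illustration, not GSM's actual code: `fast_ocr`, `accurate_ocr`, and the `stable_frames` threshold are hypothetical names chosen for this example, and the real stabilization rule may differ.

```python
from typing import Callable, Iterable, Iterator


def two_pass_ocr(
    frames: Iterable[object],
    fast_ocr: Callable[[object], str],
    accurate_ocr: Callable[[object], str],
    stable_frames: int = 3,
) -> Iterator[str]:
    """Yield one accurate OCR result each time the on-screen text settles.

    fast_ocr and accurate_ocr are hypothetical stand-ins for a cheap local
    engine and a slower, more accurate one; they are not GSM functions.
    """
    last_text = ""  # text the fast pass saw on the previous frame
    streak = 0      # consecutive frames that produced last_text
    emitted = ""    # last stable text already sent downstream

    for frame in frames:
        text = fast_ocr(frame).strip()
        if text == last_text:
            streak += 1
        else:
            last_text, streak = text, 1

        # Run the expensive pass only when non-empty text has been stable
        # for exactly stable_frames frames and has not been sent already.
        if text and streak == stable_frames and text != emitted:
            emitted = text
            yield accurate_ocr(frame)
```

With this shape, the expensive engine is called once per settled line rather than once per frame, which is where the saving in API calls would come from.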

mgs_ocr.mp4

Overlay

GSM also features an overlay that allows for on-screen Yomitan lookups. Whenever the overlay is on, it scans the screen ONCE each time a text event from any source comes into GSM. You can then hover over the actual characters in-game for Yomitan lookups and mining.

https://youtu.be/m1MweBsHbwI
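
One way to picture the hover step is a hit test against per-character bounding boxes from the screen scan. The sketch below assumes the scan yields one box per character in reading order; `CharBox` and both functions are hypothetical names for illustration and do not describe GSM's or Yomitan's internals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CharBox:
    char: str
    x: int       # left edge in screen pixels
    y: int       # top edge in screen pixels
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        """True if the point (px, py) lies inside this box."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def char_index_under_cursor(boxes: List[CharBox], px: int, py: int) -> Optional[int]:
    """Return the index of the character box under the cursor, if any."""
    for i, box in enumerate(boxes):
        if box.contains(px, py):
            return i
    return None


def text_from_cursor(boxes: List[CharBox], px: int, py: int) -> str:
    """Return the text starting at the hovered character (empty if none)."""
    i = char_index_under_cursor(boxes, px, py)
    return "" if i is None else "".join(b.char for b in boxes[i:])
```

Returning the text from the hovered character onward is an assumption made here so that a dictionary lookup could match the longest word starting at that position.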


Stats

GSM has a statistics page, currently with 32 graphs chock full of pretty data.

stats

The stats are not just pretty.

They are designed to help you grow.

Set goals and see exactly what daily tasks you need to do to achieve them:

stats

See all the Kanji you've read in whatever order you want:

stats

And click on them to see every sentence you've read with that Kanji:

stats

Use Anki? Find Kanji you read a lot but that aren't in Anki yet:

stats

Clean up your data any way you want with advanced tools:

stats

These statistics are meant to help you answer questions:

  • What can I play to maximise both fun and learning?
  • Do I read better in the evening, or in the mornings?
  • Am I progressing in this language?
  • How long should I immerse to reach my goals?

Basic Requirements

Documentation

For help with installation, setup, and other information, please visit the project's Wiki.

FAQ

How Does It Work?

This is a common question, and understanding this process will help clarify any issues you might encounter while using GSM.

  1. The beginning of the voice line is marked by a text event. This usually comes from Textractor, Agent, or another texthooker. GSM can listen for a clipboard copy and/or to a WebSocket server (configurable in GSM).

  2. The end of the voice line is detected using a Voice Activity Detection (VAD) library running locally. (Example)

In essence, GSM relies on accurately timed text events to capture the corresponding audio.
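
The two steps above can be sketched as a small calculation. This is a simplified illustration under stated assumptions, not GSM's trimming code: it assumes the VAD step has already produced a list of speech segments, and `SpeechSegment`, `trim_window`, `max_gap`, and `padding` are hypothetical names and parameters.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SpeechSegment:
    start: float  # seconds from the start of the recording
    end: float


def trim_window(
    recording_start: float,
    text_event_time: float,
    speech: List[SpeechSegment],
    max_gap: float = 1.0,
    padding: float = 0.25,
) -> Optional[Tuple[float, float]]:
    """Return (start, end) offsets in seconds for the voice line, or None."""
    # Step 1: the text event marks where the line begins in the recording.
    start = max(0.0, text_event_time - recording_start)

    # Step 2: walk the VAD segments after that point and stop at the first
    # silence longer than max_gap; the last segment reached ends the line.
    end = None
    for seg in sorted(speech, key=lambda s: s.start):
        if seg.end <= start:
            continue  # speech that finished before the text appeared
        if end is not None and seg.start - end > max_gap:
            break
        end = seg.end

    if end is None:
        return None  # no speech detected after the text event
    return start, end + padding
```

The sketch also shows why a poorly timed hook hurts: if `text_event_time` arrives late, `start` lands inside the voice line and the beginning is cut off, no matter how good the VAD result is.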

GSM provides settings to accommodate less-than-ideal hooks. However, if you experience significant audio inconsistencies, they likely stem from a poorly timed hook, loud background music, or other external factors, rather than GSM itself. The core audio trimming logic has been stable and effective for many users across various games.

Contact

If you encounter issues, please ask for help in my Discord or create an issue here.

Acknowledgements

Donations

If you've found this or any of my other projects helpful, please consider supporting my work through GitHub Sponsors or Ko-fi.