This repository demonstrates a complete workflow for decoding Baudot-Murray/CCIR 476 encoded teleprinter tape from microfilm images using computer vision techniques. The tape handled here uses 7-bit characters, which matches CCIR 476, a 7-bit code derived from the 5-bit Baudot-Murray code (ITA2). The two names are used together in this repository, although strictly speaking they are not the same code.
The main processing pipeline is implemented in code_extraction.ipynb and consists of the following steps:
- Loads the microfilm image from 'Data/Baudot-Murray-example_cropped.jpg'
- Converts to grayscale and applies binary thresholding
- Uses probabilistic Hough line detection to identify the tape structure
- Creates a convex hull around detected lines to isolate the code area
- Crops the image to focus on the encoded data
The system recognizes that Baudot-Murray tape uses a specific grid structure:
- 26 rows of encoded characters
- 16 columns total, with only 14 used (2 groups of 7, separated by 2 spacing columns)
- Each cell encodes one bit (hole = 1, no hole = 0)
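As an illustration of this layout, the pixel bounds of each cell can be derived from the size of the cropped image. The sketch below assumes the grid spans the whole cropped image evenly and that the two spacing columns sit in the middle (column indices 7 and 8); the names `cell_bounds`, `LEFT_COLS`, and `RIGHT_COLS` are hypothetical.

```python
ROWS = 26
COLS = 16

# Assumption: columns 7 and 8 are the two spacing columns, so the
# 7-bit groups occupy columns 0-6 and 9-15.
LEFT_COLS = list(range(0, 7))
RIGHT_COLS = list(range(9, 16))
USED_COLS = LEFT_COLS + RIGHT_COLS


def cell_bounds(height, width, row, col, rows=ROWS, cols=COLS):
    """Return (top, bottom, left, right) pixel bounds of one grid cell."""
    # Integer arithmetic keeps adjacent cells contiguous even when the
    # image size is not an exact multiple of the grid size.
    top = row * height // rows
    bottom = (row + 1) * height // rows
    left = col * width // cols
    right = (col + 1) * width // cols
    return top, bottom, left, right
```

For a 260 x 160 pixel crop, every cell is 10 x 10 pixels, and `cell_bounds(260, 160, 25, 15)` gives the bottom-right cell.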
For each grid cell, the notebook:
- Samples the pixel values in that region
- Classifies the cell as 1 or 0 by comparing its average brightness with a threshold of 177 (on a 0-255 scale)
- Appends the bit to the binary representation of its character row
Each row is then decoded:
- Splits the data into two 7-bit sections (left and right groups)
- Flips the bit order to match standard Baudot-Murray encoding
- Uses the lookup table in 'Data/binary_to_ascii.json' to convert binary patterns to readable characters
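The sampling and decoding steps above might look like the sketch below. Several details are assumptions rather than facts about the notebook: that a hole appears brighter than the threshold, that "flipping" means reversing each 7-bit group separately, that the spacing columns are indices 7 and 8, and that the lookup table maps 7-character bit strings such as `"0000011"` to characters. The function names are hypothetical.

```python
import numpy as np

BRIGHTNESS_THRESHOLD = 177  # value from the description above (0-255 scale)


def read_row_bits(gray, row, rows=26, cols=16):
    """Return the 16 bits (as ints) of one tape row of a grayscale image."""
    height, width = gray.shape
    top, bottom = row * height // rows, (row + 1) * height // rows
    bits = []
    for col in range(cols):
        left, right = col * width // cols, (col + 1) * width // cols
        cell = gray[top:bottom, left:right]
        # Assumption: a punched hole is brighter than the threshold.
        bits.append(1 if cell.mean() > BRIGHTNESS_THRESHOLD else 0)
    return bits


def decode_row(bits, table):
    """Split 16 bits into two 7-bit groups, reverse each, and look them up."""
    # Assumption: columns 7 and 8 are spacing and carry no data.
    groups = (bits[0:7], bits[9:16])
    chars = []
    for group in groups:
        key = "".join(str(b) for b in reversed(group))
        chars.append(table.get(key, "?"))  # "?" marks an unknown pattern
    return "".join(chars)
```

If the JSON file has the assumed shape, the table could be loaded with `json.load` and each of the 26 rows decoded with `decode_row(read_row_bits(gray, row), table)`. Unknown patterns are kept as `?` rather than raising, which makes misread cells easy to spot in the output.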
Key features:
- Automated tape detection: Locates and crops the tape area using Hough line detection
- Grid-based sampling: Samples each bit position using calculated grid coordinates
- Visual feedback: Displays the processing steps, including detected lines, grid overlay, and extracted binary values
- Complete character set: Supports the full Baudot-Murray character set, including letters, numbers, and control characters
Install the dependencies:
    pip install -r requirements.txt
Run the code_extraction.ipynb notebook cell by cell to see the complete processing pipeline. The notebook will:
- Load and preprocess the microfilm image
- Detect and extract the tape area
- Apply grid-based character recognition
- Display the decoded text output
The repository includes:
- Data/Baudot-Murray-example_cropped.jpg - A cropped sample of teleprinter tape from microfilm
- Data/binary_to_ascii.json - Complete Baudot-Murray character encoding table
- Data/output.png - Example output visualization
This workflow can be adapted for other historical teleprinter tape formats and provides a foundation for digitizing archived communication records.
The project is licensed under the MIT License, allowing free use, modification, and distribution.