Baudot-Murray/CCIR 476 decoder

This repository demonstrates a complete workflow for decoding Baudot-Murray/CCIR 476 teleprinter tape from microfilm images using computer vision techniques. CCIR 476 is a 7-bit character encoding derived from the 5-bit Baudot-Murray code (ITA2) and used in radio teleprinter systems.

How it works

The main processing pipeline is implemented in code_extraction.ipynb and consists of the following steps:

1. Image Preprocessing

Loads the microfilm image from 'Data/Baudot-Murray-example_cropped.jpg'
Converts to grayscale and applies binary thresholding
Uses probabilistic Hough line detection to identify the tape structure
Creates a convex hull around detected lines to isolate the code area
Crops the image to focus on the encoded data
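The thresholding, hull, and crop steps can be sketched without any image library. This is a minimal illustration, not the notebook's code: it assumes the image is a list of rows of grayscale values and that detected line segments arrive as (x1, y1, x2, y2) tuples, the shape a probabilistic Hough transform typically returns. The function names and the default threshold of 127 are hypothetical, and the crop uses the axis-aligned bounding box of the hull rather than a polygon mask.

```python
def binarize(gray, threshold=127):
    """Return a 0/255 image: pixels brighter than `threshold` become 255."""
    return [[255 if px > threshold else 0 for px in row] for row in gray]


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in boundary order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def crop_to_segments(image, segments):
    """Crop `image` to the bounding box of the hull around all segment endpoints."""
    points = [(x, y) for x1, y1, x2, y2 in segments for x, y in ((x1, y1), (x2, y2))]
    hull = convex_hull(points)
    xs = [x for x, _ in hull]
    ys = [y for _, y in hull]
    return [row[min(xs):max(xs) + 1] for row in image[min(ys):max(ys) + 1]]
```

With an image library, the same idea would usually be expressed through that library's own thresholding, Hough, and hull routines; the pure-Python version only shows the geometry involved.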

2. Grid-Based Character Recognition

The notebook assumes that the Baudot-Murray/CCIR 476 tape follows a fixed grid layout:

26 rows of encoded characters
16 columns total, with only 14 used (2 groups of 7, separated by 2 spacing columns)
Each cell represents a binary bit (hole = 1, no hole = 0)
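The grid layout above translates directly into cell coordinates. The sketch below is one possible way to compute them, assuming the cropped image is divided evenly into 26 by 16 cells and that the two spacing columns are the middle ones (indices 7 and 8). The names `cell_box`, `LEFT_COLUMNS`, and `RIGHT_COLUMNS` are hypothetical and not taken from the notebook.

```python
ROWS, COLS = 26, 16
# Assumption: the two unused spacing columns sit in the middle (indices 7 and 8).
LEFT_COLUMNS = list(range(0, 7))
RIGHT_COLUMNS = list(range(9, 16))


def cell_box(row, col, width, height, rows=ROWS, cols=COLS):
    """Return (x0, y0, x1, y1) pixel bounds of one grid cell, end-exclusive."""
    x0 = col * width // cols
    x1 = (col + 1) * width // cols
    y0 = row * height // rows
    y1 = (row + 1) * height // rows
    return x0, y0, x1, y1
```

Integer division on the scaled index, rather than a fixed cell size, keeps the cells tiling the image exactly even when the width or height is not a multiple of the grid size.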

3. Binary Extraction

For each grid cell:

Samples the pixel values in that region
Converts to binary based on average brightness (threshold at 177/255)
Builds a binary representation of each character row
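As a rough illustration of this step, the following sketch averages the pixels of each cell and compares the result with the 177/255 threshold. It assumes a grayscale image stored as a list of rows, cell boxes given as end-exclusive (x0, y0, x1, y1) tuples, and that a punched hole appears brighter than the threshold; the polarity may be the opposite on some scans. The function names are hypothetical.

```python
BRIGHTNESS_THRESHOLD = 177  # value quoted in the text, on a 0-255 scale


def cell_mean(image, box):
    """Average grayscale value inside box = (x0, y0, x1, y1), end-exclusive."""
    x0, y0, x1, y1 = box
    pixels = [px for row in image[y0:y1] for px in row[x0:x1]]
    return sum(pixels) / len(pixels)


def row_to_bits(image, boxes, threshold=BRIGHTNESS_THRESHOLD):
    """Turn the cells of one tape row into a string of '0'/'1' characters.

    Assumption: a punched hole appears brighter than the threshold.
    """
    return "".join("1" if cell_mean(image, b) > threshold else "0" for b in boxes)
```

Averaging over the whole cell makes the decision tolerant of small grid misalignments, at the cost of being sensitive to the threshold when holes only partly fill a cell.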

4. Character Decoding

Splits the data into two 7-bit sections (left and right groups)
Reverses the bit order within each group to match standard Baudot-Murray encoding
Uses the lookup table in 'Data/binary_to_ascii.json' to convert binary patterns to readable characters
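The decoding steps can be sketched as follows. This is an illustration under several assumptions: each row arrives as a 16-character bit string, the spacing columns are the two middle ones, the left character of a row is read before the right one, and the lookup table maps 7-character bit strings to characters. The table below contains made-up placeholder entries, not real CCIR 476 assignments, and `decode_row` is a hypothetical name.

```python
def decode_row(bits, table, unknown="?"):
    """Decode one 16-bit tape row into its left and right characters."""
    if len(bits) != 16:
        raise ValueError("expected 16 bits per row")
    left, right = bits[:7], bits[9:]  # assumption: columns 7-8 are spacing
    # Reverse each group before lookup, mirroring the bit flip described above.
    return [table.get(group[::-1], unknown) for group in (left, right)]


# Placeholder table with made-up entries, NOT real CCIR 476 assignments.
# The actual mapping would come from json.load() on Data/binary_to_ascii.json,
# assuming that file maps 7-character bit strings to characters.
demo_table = {"0001111": "X", "1111000": "Y"}

rows = ["1111000" "00" "0001111", "0000000" "00" "1111000"]
text = "".join(ch for row in rows for ch in decode_row(row, demo_table))
print(text)  # XY?X
```

Patterns missing from the table are rendered as "?" here so that a misread cell shows up in the output instead of aborting the whole decode.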

Key Features

Automated tape detection: Uses Hough line detection to automatically locate and crop the tape area
Grid-based sampling: Precisely samples each bit position using calculated grid coordinates
Visual feedback: Displays the processing steps including detected lines, grid overlay, and extracted binary values
Complete character set: Supports the full Baudot-Murray character set including letters, numbers, and control characters

Usage

Install the dependencies:

pip install -r requirements.txt

Run the code_extraction.ipynb notebook cell by cell to see the complete processing pipeline. The notebook will:

Load and preprocess the microfilm image
Detect and extract the tape area
Apply grid-based character recognition
Display the decoded text output

Sample Data

The repository includes:

Data/Baudot-Murray-example_cropped.jpg - A cropped sample of teleprinter tape from microfilm
Data/binary_to_ascii.json - Complete Baudot-Murray character encoding table
Data/output.png - Example output visualization

This workflow can be adapted for other historical teleprinter tape formats and provides a foundation for digitizing archived communication records.

License

The project is licensed under the MIT License, allowing free use, modification, and distribution.

gpizzorno/baudot-murray-CCIR476-demo
