Demo workflow for digitally decoding Baudot-Murray/CCIR 476 encoded teleprinter tape from microfilm images using computer vision techniques.

gpizzorno/baudot-murray-CCIR476-demo

Baudot-Murray/CCIR 476 decoder

This repository demonstrates a complete workflow for digitally decoding Baudot-Murray/CCIR 476 encoded teleprinter tape from microfilm images using computer vision techniques. CCIR 476 is a 7-bit character encoding derived from the 5-bit Baudot-Murray code (ITA2) and used in radio teleprinter systems; the tape decoded here carries 7 bits per character.

How it works

The main processing pipeline is implemented in code_extraction.ipynb and consists of the following steps:

1. Image Preprocessing

  • Loads the microfilm image from 'Data/Baudot-Murray-example_cropped.jpg'
  • Converts to grayscale and applies binary thresholding
  • Uses probabilistic Hough line detection to identify the tape structure
  • Creates a convex hull around detected lines to isolate the code area
  • Crops the image to focus on the encoded data

2. Grid-Based Character Recognition

The system recognizes that Baudot-Murray tape uses a specific grid structure:

  • 26 rows of encoded characters
  • 16 columns total, with only 14 used (2 groups of 7, separated by 2 spacing columns)
  • Each cell represents a binary bit (hole = 1, no hole = 0)
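As a rough sketch of how such a grid could be turned into pixel coordinates, the snippet below divides a cropped image evenly into 26 × 16 cells. The helper `grid_cells` is hypothetical, the even spacing is an assumption, and so is the placement of the two spacing columns at indices 7 and 8 (the text only says they separate the two groups).

```python
ROWS = 26  # character rows on the tape
COLS = 16  # total columns, including the 2 spacing columns

# Assumption: the spacing columns are the two middle ones (indices 7 and 8).
LEFT_COLS = range(0, 7)
RIGHT_COLS = range(9, 16)


def grid_cells(width, height, rows=ROWS, cols=COLS):
    """Return {(row, col): (x0, y0, x1, y1)} for an evenly spaced grid."""
    cells = {}
    for r in range(rows):
        y0 = r * height // rows
        y1 = (r + 1) * height // rows
        for c in range(cols):
            x0 = c * width // cols
            x1 = (c + 1) * width // cols
            cells[(r, c)] = (x0, y0, x1, y1)
    return cells


cells = grid_cells(width=320, height=520)
print(cells[(0, 0)])    # (0, 0, 20, 20)
print(cells[(25, 15)])  # (300, 500, 320, 520)
```

Integer division keeps the boxes contiguous even when the image size is not an exact multiple of the grid dimensions.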

3. Binary Extraction

For each grid cell:

  • Samples the pixel values in that region
  • Converts to binary based on average brightness (threshold at 177/255)
  • Builds a binary representation of each character row
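The per-cell sampling can be illustrated with a small pure-Python sketch. It assumes the image is a 2D list of grayscale values (0 to 255) and that a punched hole appears brighter than the threshold; on a negative or inverted scan the comparison would need to be flipped. The names `cell_bit` and `row_bits` are made up for this example.

```python
THRESHOLD = 177  # brightness cut-off on a 0-255 scale


def cell_bit(image, box, threshold=THRESHOLD):
    """Return 1 if the mean brightness inside box exceeds threshold, else 0."""
    x0, y0, x1, y1 = box
    pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    mean = sum(pixels) / len(pixels)
    return 1 if mean > threshold else 0


def row_bits(image, boxes):
    """Concatenate the bits of all cells in one row into a string."""
    return "".join(str(cell_bit(image, box)) for box in boxes)


# Tiny synthetic image: one row of four 2x2 cells (bright, dark, dark, bright).
image = [
    [255, 255, 0, 0, 10, 10, 200, 200],
    [255, 255, 0, 0, 10, 10, 200, 200],
]
boxes = [(0, 0, 2, 2), (2, 0, 4, 2), (4, 0, 6, 2), (6, 0, 8, 2)]
print(row_bits(image, boxes))  # 1001
```

Averaging over the whole cell rather than reading a single pixel makes the decision less sensitive to film grain and small grid misalignments.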

4. Character Decoding

  • Splits the data into two 7-bit sections (left and right groups)
  • Flips the bit order to match standard Baudot-Murray encoding
  • Uses the lookup table in 'Data/binary_to_ascii.json' to convert binary patterns to readable characters
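A minimal sketch of the decoding step is shown below. It assumes each tape row arrives as a 16-character bit string with the spacing columns at indices 7 and 8, and that the lookup table maps 7-character bit strings to characters. The two table entries are hypothetical placeholders, not real CCIR 476 code points, and the actual layout of 'Data/binary_to_ascii.json' may differ.

```python
def decode_row(bits16, table):
    """Decode one 16-bit tape row into two characters."""
    # Take the left and right 7-bit groups, skipping spacing columns 7 and 8.
    left, right = bits16[0:7], bits16[9:16]
    chars = []
    for group in (left, right):
        key = group[::-1]                  # flip the bit order
        chars.append(table.get(key, "?"))  # "?" marks an unknown pattern
    return "".join(chars)


# Hypothetical two-entry table for illustration only.
table = {"0001111": "A", "1010101": "B"}

row = "1111000" + "00" + "1010101"
print(decode_row(row, table))  # AB
```

If the JSON file does use bit strings as keys, the table could be loaded with `json.load` and passed to `decode_row` unchanged.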

Key Features

  • Automated tape detection: Uses Hough line detection to automatically locate and crop the tape area
  • Grid-based sampling: Precisely samples each bit position using calculated grid coordinates
  • Visual feedback: Displays the processing steps including detected lines, grid overlay, and extracted binary values
  • Complete character set: Supports the full Baudot-Murray character set including letters, numbers, and control characters

Usage

Install the dependencies:

pip install -r requirements.txt

Run the code_extraction.ipynb notebook cell by cell to see the complete processing pipeline. The notebook will:

  1. Load and preprocess the microfilm image
  2. Detect and extract the tape area
  3. Apply grid-based character recognition
  4. Display the decoded text output

Sample Data

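The repository includes:

  • Data/Baudot-Murray-example_cropped.jpg: the sample microfilm image used by the notebook
  • Data/binary_to_ascii.json: the lookup table that maps binary patterns to characters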

This workflow can be adapted for other historical teleprinter tape formats and provides a foundation for digitizing archived communication records.

License

The project is licensed under the MIT License, allowing free use, modification, and distribution.
