Two-column TOC in a scanned/OCR'd PDF: which tool is best for extracting the TOC text?

Hello, can you suggest the best tool for extracting the TOC text from a two-column TOC? The PDF is scanned and OCR'd.

The problem with OCR.space is that it does not read the text column by column (first column, then second column). Instead it reads left to right across the whole page, so text from the two columns is interleaved and ends up in the wrong place.


For example, this is the extraction result from OCR.space (Chapter Six is in column 2 of the TOC, yet the tool has read it on line 1):

Contents
Number  Chapter Six: Units..............:.......48  
Length, mass, capacity  
Chapter One: Types Of and time.... .... 
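
To make the reading-order problem concrete, here is a minimal sketch of the column-first ordering I am after. It assumes the OCR tool can export each word with its position on the page, and that the gutter between the two columns sits at the middle of the page. The `Word` tuple layout, the function name `read_in_column_order`, and the `line_tol` parameter are all hypothetical and not taken from any particular tool.

```python
from typing import List, Tuple

# Hypothetical word record: (x_left, y_top, text) in page units.
Word = Tuple[float, float, str]


def read_in_column_order(
    words: List[Word], page_width: float, line_tol: float = 5.0
) -> List[str]:
    """Return text lines with the whole left column first, then the right."""
    split_x = page_width / 2  # assumption: the gutter is at the page midpoint
    columns: Tuple[List[Word], List[Word]] = ([], [])
    for word in words:
        columns[0 if word[0] < split_x else 1].append(word)

    lines: List[str] = []
    for column in columns:
        # Top to bottom, then left to right within the column.
        column.sort(key=lambda w: (w[1], w[0]))
        current: List[Word] = []
        for word in column:
            # Start a new line when the word sits clearly below the current one.
            if current and abs(word[1] - current[0][1]) > line_tol:
                lines.append(" ".join(w[2] for w in sorted(current)))
                current = []
            current.append(word)
        if current:
            lines.append(" ".join(w[2] for w in sorted(current)))
    return lines


# Made-up coordinates that mimic the interleaving shown above.
sample = [
    (320, 100, "Chapter"), (50, 100, "Number"), (380, 101, "Six"),
    (50, 120, "Chapter"), (110, 121, "One"),
]
print(read_in_column_order(sample, page_width=600))
```

A fixed midpoint split is the simplest choice. A TOC with unequal column widths would need the gutter position detected or passed in instead.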

The problem with Tabular is that I could not find any template for a two-column TOC. As a new user I tried to create my own template, and it did a very average job (e.g. it did not recognise the end of a sentence and kept the leading ..... before the page number). I also could not find any automated scripts for the Sublime Text editor that handle the typical TOC text clean-up issues. The output from my template was:

Nuntber,
Chapter One: Types of,
number ........................................... 2,
Squares and square roots .................,2
Cubes and cube roots .......................,2
Multiples .......................................,4
Prime factorisation ..........................,6
Chapter Two: Using numbers .....1 0,

Tabular is better than OCR.space in that the text is in the correct order, but it still takes a lot of manipulation in Sublime Text to get the TOC text file into the layout required to auto-create TOC bookmarks in the PDF (i.e. using one of the apps pdftk or jpdfbookmarks).
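
The clean-up I currently do by hand could probably be scripted. Below is a rough sketch, not a tested workflow: it strips the dot leaders and stray commas, joins a wrapped title with its continuation line, repairs page numbers split by OCR (such as `1 0`), and prints the entries in the `BookmarkBegin` / `BookmarkTitle` / `BookmarkLevel` / `BookmarkPageNumber` layout that pdftk reads. The helper names `parse_toc` and `to_pdftk`, and the rule that lines starting with "Chapter" are level 1 and everything else is level 2, are my own assumptions.

```python
import re
from typing import List, Tuple

# Title, then dot leaders (possibly mixed with stray , or :), then the page number.
ENTRY = re.compile(
    r"^(?P<title>.*?)[\s.,:]*\.{2,}[\s.,:]*(?P<page>\d[\d ]*?)\s*,?\s*$"
)

Entry = Tuple[int, str, int]  # (level, title, page)


def parse_toc(lines: List[str]) -> List[Entry]:
    entries: List[Entry] = []
    pending = ""  # text of a wrapped title still waiting for its page number
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("Chapter"):
            # Assumption: a chapter always starts a new entry, so any pending
            # pageless text (e.g. a section heading) is dropped here.
            pending = ""
        match = ENTRY.match(line)
        if match is None:
            pending = f"{pending} {line.rstrip(',')}".strip()
            continue
        title = f"{pending} {match.group('title')}".strip()
        pending = ""
        page = int(match.group("page").replace(" ", ""))  # "1 0" -> 10
        level = 1 if title.startswith("Chapter") else 2   # assumed hierarchy
        entries.append((level, title, page))
    return entries


def to_pdftk(entries: List[Entry]) -> str:
    blocks = [
        "BookmarkBegin\n"
        f"BookmarkTitle: {title}\n"
        f"BookmarkLevel: {level}\n"
        f"BookmarkPageNumber: {page}"
        for level, title, page in entries
    ]
    return "\n".join(blocks) + "\n"


toc_lines = [
    "Chapter One: Types of,",
    "number ........................................... 2,",
    "Squares and square roots .................,2",
    "Chapter Two: Using numbers .....1 0,",
]
print(to_pdftk(parse_toc(toc_lines)))
```

The printed text could be saved to a file and applied with pdftk's `update_info` operation. Printed page numbers rarely match PDF page numbers in a scanned book, so an offset would likely need to be added to each page first.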

Tabular currently offers no way to ask questions or get help. On GitHub the Issues tab is not showing.
