CRF for Punctuation Restoration

Conditional Random Fields model for a punctuation restoration task.

This repo contains utilities for training a Conditional Random Fields (CRF) model that restores punctuation to unpunctuated streams of text.

For example, the input "this is my input sentence" becomes "This is my input sentence."
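To train on this kind of pair, punctuated text has to be turned into unpunctuated tokens plus one label per token. The sketch below is a minimal illustration, assuming a hypothetical label scheme in which each token is labelled with the punctuation mark that follows it ("O" for none). The names make_training_pair and PUNCT_LABELS are hypothetical, not taken from this repo, and the sketch only covers punctuation, not the capitalisation shown in the example.

```python
import re

# Hypothetical label set: the mark that follows each token, or "O" for none.
# The repo's actual label names and preprocessing may differ.
PUNCT_LABELS = {".": "PERIOD", ",": "COMMA", "?": "QUESTION", "!": "EXCLAMATION"}


def make_training_pair(text: str) -> tuple[list[str], list[str]]:
    """Split punctuated text into lowercased tokens and per-token labels."""
    tokens: list[str] = []
    labels: list[str] = []
    # Each match is a word optionally followed by a single punctuation mark.
    for word, punct in re.findall(r"([\w']+)([.,?!]?)", text):
        tokens.append(word.lower())
        labels.append(PUNCT_LABELS.get(punct, "O"))
    return tokens, labels


tokens, labels = make_training_pair("This is my input sentence.")
print(tokens)  # ['this', 'is', 'my', 'input', 'sentence']
print(labels)  # ['O', 'O', 'O', 'O', 'PERIOD']
```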

The model is based on the work of Lui, M. and Wang, L. (2013), "Recovering Casing and Punctuation using Conditional Random Fields".

The task is framed as multi-class token classification: a class is predicted for each word in a sequence of words, and the CRF chooses the labels for the whole sequence jointly rather than one word at a time.
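Labelling the sequence jointly is what separates a linear-chain CRF from independent per-word classification: at prediction time the best label sequence is usually found with the Viterbi algorithm, which combines per-token scores with label-to-label transition scores. The sketch below is a generic Viterbi decoder with hand-picked toy scores (not learned weights); CRF libraries such as CRFsuite perform this step internally, and nothing here is taken from this repo's code. The two-label set and score values are hypothetical.

```python
LABELS = ["O", "PERIOD"]  # hypothetical two-label set for illustration


def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence under a linear-chain model.

    emissions[t][y]: score of label y at position t (in a CRF, a weighted
        sum of that token's features).
    transitions[p][y]: score of label p being followed by label y.
    """
    n = len(emissions)
    # best[t][y]: best total score of any label path ending in y at position t.
    best = [dict(emissions[0])]
    # back[t][y]: the previous label on that best path.
    back = [{}]
    for t in range(1, n):
        best.append({})
        back.append({})
        for y in labels:
            prev = max(labels, key=lambda p: best[t - 1][p] + transitions[p][y])
            best[t][y] = best[t - 1][prev] + transitions[prev][y] + emissions[t][y]
            back[t][y] = prev
    # Start from the best final label and follow the back-pointers.
    path = [max(labels, key=lambda y: best[n - 1][y])]
    for t in range(n - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Toy per-token scores for a three-word sequence.
emissions = [
    {"O": 2.0, "PERIOD": 0.5},
    {"O": 0.4, "PERIOD": 1.0},
    {"O": 0.2, "PERIOD": 1.2},
]
# Toy transition scores that discourage two PERIOD labels in a row.
transitions = {
    "O": {"O": 0.5, "PERIOD": 0.0},
    "PERIOD": {"O": 0.0, "PERIOD": -2.0},
}
greedy = [max(scores, key=scores.get) for scores in emissions]
print(greedy)  # ['O', 'PERIOD', 'PERIOD']
print(viterbi(emissions, transitions, LABELS))  # ['O', 'O', 'PERIOD']
```

Picking the best label per word in isolation yields two consecutive PERIOD labels; the joint decoder trades a slightly worse per-word score for a better-scoring sequence overall.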

The CRF model uses the word, part-of-speech (POS) tag, chunk tag, and named-entity (NE) tag of the current word and of the two words on either side of it (i.e. a five-word context window).
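One way to express this five-word window is as one feature dictionary per token, the list-of-dicts input format accepted by CRFsuite wrappers such as sklearn-crfsuite. The sketch below assumes each sentence has already been tagged into (word, POS, chunk, NE) tuples; the function names, feature keys, and the BOS/EOS padding markers are hypothetical, and whether this repo uses this exact layout or library is an assumption.

```python
def token_features(sent, i, window=2):
    """Feature dict for token i using the tokens up to `window` positions away.

    sent is a list of (word, pos_tag, chunk_tag, ne_tag) tuples.
    """
    features = {"bias": 1.0}
    for offset in range(-window, window + 1):
        j = i + offset
        key = f"{offset:+d}"  # e.g. "-2", "+0", "+1"
        if 0 <= j < len(sent):
            word, pos, chunk, ne = sent[j]
            features[f"{key}:word"] = word.lower()
            features[f"{key}:pos"] = pos
            features[f"{key}:chunk"] = chunk
            features[f"{key}:ne"] = ne
        else:
            # Positions before the start or past the end of the sequence.
            features[f"{key}:pad"] = "BOS" if j < 0 else "EOS"
    return features


def sent_features(sent):
    """One feature dict per token, as CRFsuite-style trainers expect."""
    return [token_features(sent, i) for i in range(len(sent))]


# Hand-written, illustrative tags (not output from any particular tagger).
sent = [
    ("this", "DT", "B-NP", "O"),
    ("is", "VBZ", "B-VP", "O"),
    ("my", "PRP$", "B-NP", "O"),
    ("input", "NN", "I-NP", "O"),
    ("sentence", "NN", "I-NP", "O"),
]
X = sent_features(sent)
print(X[0]["+0:word"], X[0]["-1:pad"], X[0]["+2:pos"])  # this BOS PRP$
```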

Getting started (Local)

1. Clone the repository (linux/osx)

git clone https://github.com/anthonyhughes/crf-punctuation-restoration.git
cd crf-punctuation-restoration
virtualenv env
source env/bin/activate
pip install -r requirements.txt

Getting started (Colab)

1. Clone the repository

Recommended method for Google Colab notebooks:

!git clone https://github.com/anthonyhughes/crf-punctuation-restoration.git
%cd crf-punctuation-restoration
!pip install -r requirements.txt

How to use

Example usage for the operations covered below is also included in the example notebook: crf_punc_restorer_example.ipynb.

Training

Example usage:

python train.py

Run inference on a list of documents

python inference.py
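Once a label has been predicted for every token, the punctuated text can be rebuilt by attaching each predicted mark to its token. The sketch below assumes the same hypothetical label names as above and uses hand-written "predictions" rather than real model output; the capitalise-after-sentence-end rule is a simple heuristic stand-in, not the casing model described by Lui and Wang, and restore is a hypothetical name.

```python
# Hypothetical label-to-mark mapping; the repo's actual labels may differ.
LABEL_TO_PUNCT = {"PERIOD": ".", "COMMA": ",", "QUESTION": "?", "EXCLAMATION": "!"}
SENTENCE_END = {".", "?", "!"}


def restore(tokens, labels):
    """Attach predicted punctuation to tokens and capitalise sentence starts."""
    out = []
    capitalise = True  # the first token of a document starts a sentence
    for token, label in zip(tokens, labels):
        word = token.capitalize() if capitalise else token
        mark = LABEL_TO_PUNCT.get(label, "")
        out.append(word + mark)
        # Heuristic: capitalise the next token only after a sentence-final mark.
        capitalise = mark in SENTENCE_END
    return " ".join(out)


docs = [
    ["this", "is", "my", "input", "sentence"],
    ["hello", "world", "how", "are", "you"],
]
# Hand-written labels standing in for model predictions.
predicted = [
    ["O", "O", "O", "O", "PERIOD"],
    ["COMMA", "EXCLAMATION", "O", "O", "QUESTION"],
]
for tokens, labels in zip(docs, predicted):
    print(restore(tokens, labels))
# This is my input sentence.
# Hello, world! How are you?
```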

References

Lui, M. and Wang, L. (2013). "Recovering Casing and Punctuation using Conditional Random Fields". Available: https://aclanthology.org/U13-1020.pdf.
