s21-team11-project

ESBN Team

Convolution Neural Network with Trained Word2Vec Embeddings for Part-Of-Speech Tagging of English and Latin

This project builds part-of-speech (POS) taggers for English and Latin. Each tagger is a Convolutional Neural Network (CNN) implemented with Keras on TensorFlow: an Embedding layer initialized with pretrained word embeddings, dropout layers to reduce overfitting, dense layers with ReLU activations, and a dense softmax output layer. The English model was trained on the Natural Language Toolkit's (NLTK) Brown, Treebank, and CoNLL-2000 corpora, with an embedding matrix built from the Word2Vec vectors pretrained on the Google News corpus. The Latin model was trained on the Universal Dependencies LLCT, ITTB, and PROIEL treebanks, with Word2Vec continuous skip-gram vectors pretrained on the Latin CoNLL17 corpus.

Input sentences to both models were padded with zeros so that all inputs have a uniform length. Padded positions count toward raw accuracy and inflate it, so we also use an existing masked accuracy function, which ignores padded positions, to better gauge the models' performance. The English model reached an accuracy of 98.57% and a masked accuracy of 96.54%. The Latin model reached an accuracy of 98.01% and a masked accuracy of 93.62%.
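To illustrate why the two numbers differ, here is a minimal sketch in plain Python. It is not the masked accuracy function used in the notebooks. It assumes that tag id 0 is reserved for padding and that sentences are padded on the right, and the names `PAD_ID`, `pad_sequence`, `accuracy`, and `masked_accuracy` are hypothetical.

```python
PAD_ID = 0  # assumption: tag id 0 is reserved for padding


def pad_sequence(ids, max_len, pad_id=PAD_ID):
    """Right-pad (or truncate) a list of integer ids to exactly max_len."""
    return ids[:max_len] + [pad_id] * max(0, max_len - len(ids))


def accuracy(y_true, y_pred):
    """Token accuracy over every position, padding included."""
    pairs = [(t, p)
             for row_t, row_p in zip(y_true, y_pred)
             for t, p in zip(row_t, row_p)]
    return sum(t == p for t, p in pairs) / len(pairs)


def masked_accuracy(y_true, y_pred, pad_id=PAD_ID):
    """Token accuracy over real tokens only (positions where the true tag is not padding)."""
    pairs = [(t, p)
             for row_t, row_p in zip(y_true, y_pred)
             for t, p in zip(row_t, row_p)
             if t != pad_id]
    if not pairs:
        return 0.0
    return sum(t == p for t, p in pairs) / len(pairs)


# Two toy sentences of 3 and 2 tags, padded to length 6.
y_true = [pad_sequence([3, 1, 2], 6), pad_sequence([4, 2], 6)]
# Hypothetical predictions: every padded position is right, two real tags are wrong.
y_pred = [[3, 1, 5, 0, 0, 0], [4, 1, 0, 0, 0, 0]]

plain = accuracy(y_true, y_pred)          # 10 of 12 positions match
masked = masked_accuracy(y_true, y_pred)  # 3 of 5 real tokens match
print(f"accuracy={plain:.4f}  masked_accuracy={masked:.4f}")
```

In this toy case the seven padded positions are all counted as correct, so the plain accuracy is noticeably higher than the masked accuracy, which is the same effect seen in the reported results.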

Demo

Download the file Demo.ipynb and open it using Jupyter
Follow the directions in Demo.ipynb

Jupyter Help

Both JupyterLab and Jupyter Notebook can be installed from https://jupyter.org/

For a quick introduction to Jupyter, check out this step-by-step tutorial: https://realpython.com/jupyter-notebook-introduction/

For more detailed help with JupyterLab or Jupyter Notebook, check out the documentation here: https://jupyter.org/documentation

