Python version: 3.6
Libraries:
- Gensim
- NLTK
- NumPy
- Pandas (with xlrd for Excel file loading)
- scikit-learn (sklearn)
Data and references:
- SOCC (SFU Opinion and Comments Corpus): https://github.com/sfu-discourse-lab/SOCC
- scikit-learn documentation
Note: the NLTK WordNet lemmatizer does not determine the part-of-speech (POS) tag of each token by itself, so I only used its default noun lemmatization.
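
For reference, here is a minimal sketch of one common workaround, which this project does not use: tag the tokens first with `nltk.pos_tag`, then pass a WordNet POS letter to `WordNetLemmatizer.lemmatize`. The helper names `penn_to_wordnet` and `lemmatize_with_pos` are hypothetical. The sketch also assumes the NLTK `wordnet` and POS tagger data packages have already been downloaded.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (as returned by nltk.pos_tag) to a WordNet POS letter.

    Hypothetical helper, not part of NLTK. Anything that is not an
    adjective, verb, or adverb falls back to noun ("n"), which is also
    the lemmatizer's default.
    """
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # noun (default)


def lemmatize_with_pos(tokens):
    """Lemmatize a list of tokens using the POS tag predicted for each one.

    Hypothetical helper. Requires NLTK plus its WordNet and POS tagger data.
    """
    # Imported lazily so the tag mapping above works without NLTK installed.
    from nltk import pos_tag
    from nltk.stem import WordNetLemmatizer

    lemmatizer = WordNetLemmatizer()
    return [
        lemmatizer.lemmatize(token, pos=penn_to_wordnet(tag))
        for token, tag in pos_tag(tokens)
    ]
```

With the default noun setting, a verb form such as "running" is left unchanged. Passing the verb tag lets the lemmatizer reduce it to its base form, at the cost of an extra tagging step per sentence.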