This implementation contains:

- a recurrent neural network (LSTM)
- a word-level English language model
- a character-level Chinese language model
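Both language models are built on an LSTM, which updates a cell state and a hidden state once per input token. The pure-Python sketch below shows one step of a standard LSTM cell (no peepholes) acting on the concatenation `[x; h_prev]`. It is an illustration, not the TensorFlow/TensorLayer code this project uses. The gate order `(i, f, g, o)` and the absence of a forget-gate bias are choices made for this sketch and may differ from TensorFlow's LSTM cell. The names `lstm_step` and `matvec` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W has 4*H rows, each of length len(x) + H, applied to [x; h_prev].
    b has 4*H entries. Gate order (i, f, g, o) is an assumption of this sketch.
    """
    H = len(h_prev)
    z = [s + bi for s, bi in zip(matvec(W, x + h_prev), b)]
    i = [sigmoid(v) for v in z[0:H]]            # input gate
    f = [sigmoid(v) for v in z[H:2 * H]]        # forget gate
    g = [math.tanh(v) for v in z[2 * H:3 * H]]  # candidate cell values
    o = [sigmoid(v) for v in z[3 * H:4 * H]]    # output gate
    # New cell state: keep part of the old state, add part of the candidate.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed cell state.
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c
```

With all-zero weights every sigmoid gate is 0.5 and the candidate is 0, so the cell state simply halves; in a trained model the weights let the forget gate decide how much context to carry across tokens.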
Requirements:

- Python 2.7
- TensorFlow 0.12
- TensorLayer (`sudo pip install tensorlayer`)
To train a model on the PTB dataset:
python train.py --data_path=ptb_data
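During training the corpus is read as one long sequence of token ids and fed to the LSTM in fixed-size windows. The sketch below assumes the PTB-style layout used by the TensorFlow PTB tutorial and by TensorLayer's `tl.iterate.ptb_iterator`: the ids are cut into `batch_size` contiguous rows, and each step yields `num_steps` tokens per row as input together with the same window shifted one token ahead as targets. The function name `ptb_batches` is hypothetical, and `train.py` may batch differently.

```python
def ptb_batches(ids, batch_size, num_steps):
    """Yield (x, y) pairs of shape [batch_size][num_steps]; y is x shifted one token ahead."""
    batch_len = len(ids) // batch_size
    # Lay the corpus out as batch_size contiguous rows, dropping the leftover tail.
    rows = [ids[i * batch_len:(i + 1) * batch_len] for i in range(batch_size)]
    # One token is reserved so every input window has a next-token target.
    epoch_size = (batch_len - 1) // num_steps
    if epoch_size == 0:
        raise ValueError("epoch_size == 0, decrease batch_size or num_steps")
    for step in range(epoch_size):
        start = step * num_steps
        x = [row[start:start + num_steps] for row in rows]
        y = [row[start + 1:start + num_steps + 1] for row in rows]
        yield x, y
```

For example, 20 ids with `batch_size=2` and `num_steps=3` give rows `0..9` and `10..19` and three batches; the first is `x = [[0, 1, 2], [10, 11, 12]]`, `y = [[1, 2, 3], [11, 12, 13]]`. Keeping rows contiguous lets the LSTM state be carried from one batch to the next.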
To train a model on a Chinese dataset:
python train.py --data_path=zh_data
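The two models differ mainly in what counts as a token: words for English (PTB) and single characters for Chinese. The sketch below illustrates both tokenizations plus a frequency-ordered vocabulary. It assumes the convention of the TensorFlow PTB tutorial reader (an `<eos>` marker at each line end, ids sorted by descending count with ties broken alphabetically); the function names are hypothetical and the actual preprocessing in `train.py` may differ.

```python
from collections import Counter


def word_tokens(text):
    # Word level: split each line on whitespace and mark the line end with <eos>.
    tokens = []
    for line in text.splitlines():
        tokens.extend(line.split())
        tokens.append("<eos>")
    return tokens


def char_tokens(text):
    # Character level: every non-whitespace character is its own token.
    tokens = []
    for line in text.splitlines():
        tokens.extend(ch for ch in line if not ch.isspace())
        tokens.append("<eos>")
    return tokens


def build_vocab(tokens):
    """Map each token to an id: most frequent first, ties broken alphabetically."""
    counts = Counter(tokens)
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    return {tok: i for i, tok in enumerate(ordered)}


def to_ids(tokens, vocab):
    return [vocab[t] for t in tokens]
```

For `"the cat sat\nthe dog sat"` the vocabulary is `<eos>: 0, sat: 1, the: 2, cat: 3, dog: 4`, while `char_tokens("我爱北京")` yields `['我', '爱', '北', '京', '<eos>']`. Character-level modelling keeps the Chinese vocabulary small and avoids the need for a word segmenter.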
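There are three supported model configurations (small, medium, and large). The table lists the number of training epochs and the resulting word-level perplexity on the PTB training, validation, and test sets (lower is better):
config | epochs | train perplexity | valid perplexity | test perplexity |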
---|---|---|---|---|
small | 13 | 37.99 | 121.39 | 115.91 |
medium | 39 | 48.45 | 86.16 | 82.07 |
large | 55 | 37.87 | 82.62 | 78.29 |
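Perplexity is the exponential of the average negative log-likelihood (in nats) that the model assigns to each target token, so a test perplexity of 78.29 means the model is, on average, about as uncertain as a uniform choice among roughly 78 words. A minimal sketch of the computation, either from per-token probabilities or from per-batch mean losses of equal-sized batches; both function names are hypothetical.

```python
import math


def perplexity(target_probs):
    """exp(mean negative log-likelihood) over the probabilities of the true next tokens."""
    nll = -sum(math.log(p) for p in target_probs)
    return math.exp(nll / len(target_probs))


def perplexity_from_losses(mean_losses):
    """Same quantity from per-batch mean cross-entropy losses (nats), assuming equal-sized batches."""
    return math.exp(sum(mean_losses) / len(mean_losses))
```

A model that assigns probability 1/4 to every correct token has perplexity 4, and a model that is uniform over a 10,000-word vocabulary has perplexity 10,000.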
To use a Chinese dataset, put it in the `zh_data` directory and split it into `train.txt`, `valid.txt`, and `test.txt`.
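If the Chinese corpus comes as a single UTF-8 text file, a small helper can produce the three files. The sketch below is hypothetical: the name `split_corpus`, the 80/10/10 ratio, and the contiguous (unshuffled) line split are assumptions, not requirements of `train.py`. It uses `io.open` so it runs under both Python 2.7 and Python 3.

```python
import io
import os


def split_corpus(src_path, out_dir, valid_frac=0.1, test_frac=0.1):
    """Split a UTF-8 text file by lines into train.txt, valid.txt and test.txt, keeping order."""
    with io.open(src_path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    n = len(lines)
    n_valid = int(n * valid_frac)
    n_test = int(n * test_frac)
    n_train = n - n_valid - n_test
    parts = {
        "train.txt": lines[:n_train],
        "valid.txt": lines[n_train:n_train + n_valid],
        "test.txt": lines[n_train + n_valid:],
    }
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    for name, part in parts.items():
        with io.open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
            f.write(u"\n".join(part) + (u"\n" if part else u""))
    # Return the number of lines written to each file.
    return {name: len(part) for name, part in parts.items()}
```

A contiguous split keeps each part a coherent stretch of text, which suits language modelling; shuffling lines first would mix contexts across the three sets.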
fudannlp16