Natural Language Processing and Deep Learning of 2016 Presidential Debate Transcripts. In particular, automatically generate Trump or Hillary debate speech.
Data download from kaggle, the data is essential text transcript of the first two presendential debates.
Most of the code is adapted from this tutorial.
" illary was going to bring back jobs to upstate new york and she failed. i've heard them where hillar "
y clinton has been a disaster. . and she said she salks about some bompanies are jostrtation. and i will tell you that i'm going to have a stepoenng to me and otr country. i aglee it the gissing to the war. and i will tell you that i'm going to have a stepoenng to me and of the freatest problem the stopigms and so doe to brpendry with the internet, and i will tell you that i'm going to have a stepoena, they're going to be a bompany. and i will tell you that i'm going to have a stepoena, they're going to be a bompany. and i will tell you that i'm going to have a stepoena, they're going to be a bompany. and i will tell you that i'm going to have a stepoena, they're going to be a bompany. and i will tell you that i'm going to have a stepoena, they're going to be a bompany. and i will tell you that i'm going to have a stepoena, they're going to be a bompany. and i will tell you that i'm going to have a stepoena, they're going to be a bompany. and i will tell you that i'm going to have a st
" ticians have done.. i'll tell you very -- well, just very simple to say. sidney blumenthal works for "
the country that she salks about to many of the people that are going to be a bompa tp be a president that was to be a desaster. she doesn't have the stamina. i mean, it was a disaster. and i don't think you thie in the world, i agree with you the interest rates and all of the people that are going to be a bompa tp be a president that was to be a desaster. she doesn't have the stamina. i mean, it was a disaster. and i don't think you thie in the world, i agree with you the interest rates and all of the people that are going to be a bompa tp be a president that was to be a desaster. she doesn't have the stamina. i mean, it was a disaster. and i don't think you thie in the world, i agree with you the interest rates and all of the people that are going to be a bompa tp be a president that was to be a desaster. she doesn't have the stamina. i mean, it was a disaster. and i don't think you thie in the world, i agree with you the interest rates and all of the people that are going to be a b
It's exciting to see that the network learned to complete names like "hillar" with a letter "y" as well as her last name "clinton". In fact, the model learned to produce some actual English words and sentences with semi-correct grammar just from a few hours worth of debate text. The sentence structure is preserved even though many words don't exist. I am surprised to find that many words and phrases are preserved, such as "country", "dissaster", "I will tell you", "I'm going to have a", etc. However, the speech gets stuck in a loop for the latter part of the paragragh.
Single hidden LSTM layer with 256 memory units:
CPU (laptop): 610s per epoch
GPU (hyades): 260s per epoch
Two hiddlen LSTM layers with 256 memory unites each:
CPU (laptop): 1700s per epoch
GPU (hyades): 1100s per epoch
total training time (GPU):
one-layer LSTM: 265s * 20 = 88 min, loss: 1.89
two-layer LSTM: 1100 * 50 = 15 hours, loss: 1.09
-
!Do not do "pip install tensorflow", do this instead:
sudo easy_install --upgrade six
export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/mac/cpu/tensorflow-0.11.0rc0-py2-none-any.whl
sudo pip install --upgrade $TF_BINARY_URL
-
tensorflow will try to import the numpy verison that came with Mac, which has API incompatiblility with tf. The correct way to is to use virtualenv and install all needed libraries there
-
model.add(Dropout(0.2))
complains of import error intensorflow_backend.py
, need to edit the import statment at three places in code to this:from tensorflow.python.ops import control_flow_ops
andx = control_flow_ops.cond.....
(Fucking hell...)
Note on setting up project in hyades (super computing cluster at UCSC)
git clone https://github.com/MollyZhang/NLP_2016_US_Presidential_Debate
- Install python and pip 2.7 locally (instruction here), except for one difference:
add$HOME/python/bin/
to $PATH, instead of$HOME/python/Python-2.7.11/
- Setup virtualenv and activate virtualenv:
pip install virtualenv
virtualenv tf-env
source tf-env/bin/activate
- Install tensorflow (GPU version) along with the correct GNU C library and cuda library following this instruction create alias "tfpython" in place of the last step in the instruction for speedy python login
- Ssh into gpu server (gpu-1 to gup-8) and test tensorflow GPU installation following this link
mkdir ~/screen/
cd ~/screen
wget ftp://ftp.gnu.org/gnu/screen/screen-4.4.0.tar.gz
tar xzf screen-*.tar.gz
cd screen-*/
./configure --prefix=/home/mollyzhang/.local
make && make install