Note: the results in the paper were reported on Karpathy's split (http://cs.stanford.edu/people/karpathy/deepimagesent/), whereas this repo does not use that split. Keep this in mind when comparing numbers.