No space or unknown token in dataset dictionary, leading to training data corruption #1

grappli · 2020-08-07T13:41:05Z

A major issue here is that the dataset dictionary has no entry for the space character or for unknown tokens, so neither is encoded as described in the paper. As a result, all spaces are removed from the inputs. This is probably the cause of the reduced accuracy compared to the paper.
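
For illustration, here is a minimal sketch of a character-level dictionary that reserves an entry for the space and another for unknown characters. The names `UNK`, `build_vocab`, `encode` and `decode` are hypothetical, as is the choice of `<unk>` as the placeholder; none of this is taken from the repository or the paper, and the actual token layout there may differ.

```python
# Hypothetical sketch: names and token choices are assumptions,
# not the repository's or the paper's actual code.
UNK = "<unk>"


def build_vocab(corpus):
    """Map each character in the corpus to an id, always including the space."""
    chars = sorted(set("".join(corpus)) | {" "})
    vocab = {UNK: 0}  # id 0 is reserved for characters not seen in the corpus
    for ch in chars:
        vocab[ch] = len(vocab)
    return vocab


def encode(text, vocab):
    """Encode every character; unseen characters fall back to the UNK id."""
    return [vocab.get(ch, vocab[UNK]) for ch in text]


def decode(ids, vocab):
    """Invert the mapping so an encoded sentence can be inspected."""
    inverse = {i: ch for ch, i in vocab.items()}
    return "".join(inverse[i] for i in ids)


vocab = build_vocab(["the cat", "a dog"])
ids = encode("the dog", vocab)
print(decode(ids, vocab))
```

With a dictionary like this, encoding keeps one id per input character, so the encoded length matches the sentence length and spaces survive a round trip.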

grappli · 2020-08-08T08:32:36Z

The first letter of each sentence is also cut off.
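
Both symptoms can be spotted by comparing a raw sentence with what comes out of preprocessing. The sketch below is a rough heuristic under assumed inputs: `diagnose` is a hypothetical helper, and `processed` stands for whatever text the dataset pipeline actually produces, which is not shown in this thread.

```python
# Hypothetical helper: a heuristic check, not part of the repository.
def diagnose(original, processed):
    """Return a list of corruption symptoms found in the processed text."""
    problems = []
    if processed == original:
        return problems
    # Symptom 1: the original had spaces but none survived.
    if " " in original and " " not in processed:
        problems.append("spaces removed")
    # Symptom 2: the processed text no longer starts with the original first character.
    # This misses cases where the first two characters are identical.
    if original and not processed.startswith(original[0]):
        problems.append("first character missing")
    return problems


print(diagnose("the cat", "hecat"))
```

Running a check like this over a handful of training sentences would show whether either problem is still present after a fix.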
