v0.3.1: Improve efficiency by avoiding explicit whitespace rows


github-actions released this 28 May 10:29


release-v0.3.1

133efcb

Previous versions used explicit zeroed rows corresponding to whitespace tokens in spaCy. This required duplication and a number of assignments into the transformer output, which was inefficient.

Instead, whitespace tokens are now treated as not aligning to any wordpiece tokens. If you access doc._.trf_data[i] where i is the index of a whitespace token, you'll receive an empty array of shape (0, n), where n is the output dimension. Thinc's pooling operations handle this case, so models that consume the trf_data don't need any changes.
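To make the (0, n) case concrete, here is a minimal NumPy sketch of the idea. It is not the spacy-transformers or Thinc implementation: the helper names token_rows and mean_pool, the lengths list and the toy data are all hypothetical, and returning a zero vector for an empty slice is an assumption about how a pooling step might treat tokens with no aligned wordpieces.

```python
import numpy as np


def token_rows(wordpiece_output, lengths, i):
    """Return the wordpiece rows aligned to token i (hypothetical helper)."""
    start = sum(lengths[:i])
    return wordpiece_output[start:start + lengths[i]]


def mean_pool(wordpiece_output, lengths):
    """Mean-pool wordpiece rows per token; empty slices stay as zero vectors."""
    n = wordpiece_output.shape[1]
    pooled = np.zeros((len(lengths), n), dtype=wordpiece_output.dtype)
    start = 0
    for i, length in enumerate(lengths):
        if length > 0:
            pooled[i] = wordpiece_output[start:start + length].mean(axis=0)
        # Tokens with no aligned wordpieces (e.g. whitespace) are skipped,
        # so their pooled row remains all zeros (an assumption of this sketch).
        start += length
    return pooled


# Toy data: three wordpiece rows with output dimension n = 4.
wordpiece_output = np.arange(12, dtype="float32").reshape(3, 4)
# Three tokens; the middle one is whitespace and aligns to zero wordpieces.
lengths = [2, 0, 1]

whitespace_rows = token_rows(wordpiece_output, lengths, 1)
pooled = mean_pool(wordpiece_output, lengths)
print(whitespace_rows.shape)
print(pooled.shape)
```

In this sketch no zeroed row is ever written into the transformer output itself; the whitespace token simply has length 0 in the alignment, and only the pooled result needs a row for it.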
