- Query-Key-Value version of the cross-attention layer
- add multi (homograph, homonym) trainer and dataset
- fix bug: we were hashing the placeholder symbol _ as the solution word
- Changes to multi transformer and to its dataset
- implement masked attention but leave it disabled (it performed worse)
What's Changed
- rename dirty to lexicon by @thewh1teagle in #2
New Contributors
- @thewh1teagle made their first contribution in #2
Full Changelog: v0.1.7...v0.1.8