RePlay 0.19.0 Release notes
- Highlights
- Backwards Incompatible Changes
- New Features
- Improvements
- Bug fixes
Highlights
In this release, we have added ScalableCrossEntropyLoss and ConsecutiveDuplicatesFilter. This release also brings a lot of improvements and bug fixes; see the respective sections!
Backwards Incompatible Changes
This release contains changes that are not backward compatible with previous versions of RePlay. We changed the architecture of the Bert4Rec model to speed it up, so in this release you will not be able to load the weights of a model trained with a previous version.
New Features
ScalableCrossEntropyLoss for SasRec model
We added ScalableCrossEntropyLoss, a new approximation of CrossEntropyLoss aimed at solving the problem of insufficient GPU memory when training on large item catalogs. The reference article is available at https://arxiv.org/pdf/2409.18721.
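To see why the full loss becomes a memory problem, it helps to estimate the size of the dense logits tensor that a standard cross-entropy over the whole catalog has to materialize. The sketch below is only back-of-the-envelope arithmetic: the helper name and all sizes are hypothetical, and it assumes 4-byte floats and one score per (sequence position, item) pair.

```python
def logits_memory_gib(batch_size, seq_len, n_items, bytes_per_value=4):
    """Size in GiB of a dense (batch_size, seq_len, n_items) logits tensor."""
    return batch_size * seq_len * n_items * bytes_per_value / 2**30


# Hypothetical training setup, chosen only for illustration.
full_catalog = logits_memory_gib(batch_size=128, seq_len=200, n_items=1_000_000)

# Same setup if only 10,000 catalog scores per position were materialized.
reduced = logits_memory_gib(batch_size=128, seq_len=200, n_items=10_000)

print(f"full catalog: {full_catalog:.1f} GiB, reduced: {reduced:.2f} GiB")
```

With these assumed sizes the full tensor needs roughly 95 GiB, before gradients are counted. Materializing fewer scores is the general idea behind approximate losses; how ScalableCrossEntropyLoss actually selects them is described in the referenced article and is not reproduced here.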
ConsecutiveDuplicatesFilter
We added a new filter, ConsecutiveDuplicatesFilter, that removes consecutive duplicate interactions from sequential datasets.
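The idea of dropping consecutive duplicates can be sketched in plain Python. This is not the RePlay implementation or its API: the function name, the tuple layout (user, item, timestamp) and the choice to keep the first interaction of each run are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def drop_consecutive_duplicates(interactions):
    """Keep the first interaction of every run of identical items per user.

    `interactions` is an iterable of (user_id, item_id, timestamp) tuples.
    """
    # Order each user's history by time so that "consecutive" is well defined.
    ordered = sorted(interactions, key=lambda row: (row[0], row[2]))
    result = []
    for _, user_rows in groupby(ordered, key=itemgetter(0)):
        previous_item = object()  # sentinel that equals no real item
        for row in user_rows:
            if row[1] != previous_item:
                result.append(row)
            previous_item = row[1]
    return result


log = [
    (1, "a", 1), (1, "a", 2), (1, "b", 3), (1, "a", 4),
    (2, "c", 1), (2, "c", 2),
]
filtered = drop_consecutive_duplicates(log)
```

Note that user 1's later return to item "a" is kept, because it does not directly follow another "a"; only repeats within an uninterrupted run are removed.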
Improvements
SequenceEncodingRule speedup on PySpark
We accelerated the transform() method of SequenceEncodingRule when applying it to PySpark dataframes.
Updating the maximum supported version of PyTorch
We raised the maximum supported version of PyTorch, so it is now possible to install RePlay with PyTorch < 3.0.0.
Speedup of sequential models
Firstly, we replaced the self-made LayerNorm and GELU layers in Bert4Rec with PyTorch built-in implementations. Secondly, we added a CE_restricted loss for Bert4Rec that works like CrossEntropyLoss, but uses some features of the Bert4Rec architecture to speed up calculations (sparsification: the masks are limited to the tokens that will be predicted). Thirdly, we replaced some computationally inefficient operations with faster analogues in SasRec and Bert4Rec.
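One way to read the sparsification idea is that catalog scores are only needed at the positions that will actually be predicted. The sketch below illustrates that reading in plain Python rather than PyTorch, to stay dependency-free. It is not the CE_restricted implementation: all function names and toy values are hypothetical.

```python
import math


def score_all_items(hidden, item_embeddings):
    """Dot product of one hidden state with every item embedding."""
    return [sum(h * e for h, e in zip(hidden, emb)) for emb in item_embeddings]


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single position."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_norm - logits[target]


def dense_loss(hiddens, labels, mask, item_embeddings):
    # Score every position, then discard the ones that are not predicted.
    losses = [cross_entropy(score_all_items(h, item_embeddings), y)
              for h, y in zip(hiddens, labels)]
    kept = [loss for loss, m in zip(losses, mask) if m]
    return sum(kept) / len(kept), len(losses)


def restricted_loss(hiddens, labels, mask, item_embeddings):
    # Select the predicted positions first, then score only those.
    selected = [(h, y) for h, y, m in zip(hiddens, labels, mask) if m]
    losses = [cross_entropy(score_all_items(h, item_embeddings), y)
              for h, y in selected]
    return sum(losses) / len(losses), len(losses)


# Toy data: 4 positions, 3 catalog items, 2 positions marked for prediction.
item_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hiddens = [[0.5, -0.2], [0.1, 0.9], [-0.3, 0.4], [0.7, 0.7]]
labels = [0, 2, 0, 1]  # labels at unmasked positions are placeholders
mask = [False, True, False, True]

dense_value, dense_scored = dense_loss(hiddens, labels, mask, item_embeddings)
restricted_value, restricted_scored = restricted_loss(
    hiddens, labels, mask, item_embeddings
)
```

Both functions return the same loss value, but the restricted variant scores only the two masked positions instead of all four. In a real model the saving grows with the catalog size and with the share of positions that are not predicted.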
Bug fixes
Fix error with accessing object fields in TensorSchema
We fixed an issue where a sequential model could not be trained when Hydra and MLflow were installed alongside RePlay. It was caused by accessing object fields using wrong names in TensorSchema.
Fix unexpected type casts in LabelEncodingRule with Pandas
We detected unexpected type casts in the transform() method when using Pandas dataframes with LabelEncodingRule and fixed this behaviour.
Fix bugs in Surprisal metric calculation
We fixed incorrect Surprisal behavior with cold items on Polars and missing users on Pandas.