v0.19.0

OnlyDeniko released this 26 May 11:40

RePlay 0.19.0 Release notes

  • Highlights
  • Backwards Incompatible Changes
  • New Features
  • Improvements
  • Bug fixes

Highlights

In this release, we have added ScalableCrossEntropyLoss and ConsecutiveDuplicatesFilter. This release brings a lot of improvements and bug fixes - see the respective sections!

Backwards Incompatible Changes

This release contains changes that are not backward compatible with previous versions of RePlay. We changed the architecture of the Bert4Rec model to speed it up, so Bert4Rec weights trained with previous versions cannot be loaded in this release.

New Features

ScalableCrossEntropyLoss for SasRec model

We added ScalableCrossEntropyLoss, a new approximation of CrossEntropyLoss aimed at solving the problem of running out of GPU memory when training on large item catalogs. The reference article can be found at https://arxiv.org/pdf/2409.18721.
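To see why the full cross-entropy becomes a memory problem on large catalogs, it helps to estimate the size of the logits tensor alone. The sketch below is plain arithmetic, not RePlay code; the function name and the batch size, sequence length and catalog size are hypothetical values chosen for illustration, and 4 bytes per value assumes float32.

```python
def full_logits_bytes(batch_size, seq_len, num_items, bytes_per_value=4):
    """Bytes needed to hold one logit per (sequence position, catalog item)."""
    return batch_size * seq_len * num_items * bytes_per_value


# Hypothetical training setup, chosen only for illustration.
size = full_logits_bytes(batch_size=128, seq_len=200, num_items=1_000_000)
print(f"{size / 1e9:.1f} GB")  # prints "102.4 GB"
```

Under these assumptions the logits alone would not fit on a single GPU, which is the situation an approximate loss is meant to avoid.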

ConsecutiveDuplicatesFilter

We added a new filter, ConsecutiveDuplicatesFilter, that removes consecutive duplicate interactions from sequential datasets.
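The idea of dropping consecutive duplicates can be sketched in plain Python. This is an illustration of the technique rather than the RePlay implementation: the function name and the tuple layout are hypothetical, and keeping the first interaction of each run (rather than the last) is an assumption.

```python
from itertools import groupby


def drop_consecutive_duplicates(interactions):
    """Keep the first interaction of every run of identical (user, item) pairs.

    `interactions` is a list of (user_id, item_id, timestamp) tuples.
    """
    # Sort by user, then by time, so that each user's sequence is contiguous.
    ordered = sorted(interactions, key=lambda row: (row[0], row[2]))
    # groupby merges only adjacent rows with the same key, so an item that
    # reappears later in the same user's sequence is kept.
    runs = groupby(ordered, key=lambda row: (row[0], row[1]))
    return [next(run) for _, run in runs]


log = [
    ("u1", "a", 1), ("u1", "a", 2), ("u1", "b", 3), ("u1", "a", 4),
    ("u2", "a", 1), ("u2", "a", 2),
]
cleaned = drop_consecutive_duplicates(log)
# cleaned == [("u1", "a", 1), ("u1", "b", 3), ("u1", "a", 4), ("u2", "a", 1)]
```

Note that the second occurrence of item "a" for user "u1" at timestamp 4 survives, because it is separated from the first run by item "b".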

Improvements

SequenceEncodingRule speedup on PySpark

We accelerated the transform() method of SequenceEncodingRule when it is applied to PySpark dataframes.

Updating the maximum supported version of PyTorch

We raised the maximum supported version of PyTorch, so RePlay can now be installed with PyTorch < 3.0.0.

Speedup of sequential models

Firstly, we replaced the self-made LayerNorm and GELU layers in Bert4Rec with PyTorch built-in implementations. Secondly, we added the CE_restricted loss for Bert4Rec, which works like CrossEntropyLoss but uses some features of the Bert4Rec architecture to speed up calculations (sparsification: the masks are limited to the tokens that will be predicted). Thirdly, we replaced some computationally inefficient operations with faster analogues in SasRec and Bert4Rec.
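The intuition behind restricting the loss to the tokens that will be predicted can be shown with a small pure-Python sketch. It is not the CE_restricted implementation from RePlay; all names and numbers are hypothetical. It only demonstrates that selecting the masked positions before projecting onto the item catalog gives the same loss as projecting every position and masking afterwards, while doing fewer projections.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one logit vector against one target index."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[target]


def project(hidden, item_embeddings):
    """Score one hidden state against every item embedding (dot products)."""
    return [sum(h * e for h, e in zip(hidden, emb)) for emb in item_embeddings]


def full_loss(hidden_states, item_embeddings, targets, mask):
    # Project every position, then discard the losses at unmasked positions.
    losses = [cross_entropy(project(h, item_embeddings), t)
              for h, t in zip(hidden_states, targets)]
    kept = [loss for loss, m in zip(losses, mask) if m]
    return sum(kept) / len(kept)


def restricted_loss(hidden_states, item_embeddings, targets, mask):
    # Select the masked positions first, so only they are projected.
    selected = [(h, t) for h, t, m in zip(hidden_states, targets, mask) if m]
    losses = [cross_entropy(project(h, item_embeddings), t) for h, t in selected]
    return sum(losses) / len(losses)


# Toy data: 4 sequence positions, hidden size 2, catalog of 3 items.
hidden_states = [[0.1, 0.2], [0.5, -0.3], [0.0, 1.0], [-0.7, 0.4]]
item_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
targets = [0, 2, 1, 0]
mask = [False, True, False, True]  # only these positions are predicted

full = full_loss(hidden_states, item_embeddings, targets, mask)
restricted = restricted_loss(hidden_states, item_embeddings, targets, mask)
```

Here the restricted variant projects 2 positions instead of 4; with realistic masking rates and catalog sizes the saving grows accordingly.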

Bug fixes

Fix error with accessing object fields in TensorSchema

We fixed an issue that made it impossible to train a sequential model when Hydra and MLflow are installed alongside RePlay. It was caused by TensorSchema accessing object fields under the wrong names.

Fix unexpected type casts in LabelEncodingRule with Pandas

We detected unexpected type casts in the transform() method of LabelEncodingRule when it is applied to Pandas dataframes and fixed this behaviour.

Fix bugs in Surprisal metric calculation

We fixed incorrect Surprisal behavior with cold items on Polars and missing users on Pandas.