
Conversation

@sararb (Contributor) commented Jun 27, 2023

Fixes # (issue)

Goals ⚽

  • Add support for padding, transforming, and masking sequential input data in the MM PyTorch backend
  • The implemented transform classes should:
    - Support multiple targets
    - Be usable for training, evaluation, and inference
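As context for the padding goal, here is a minimal sketch of right-padding a batch of variable-length sequential features to a common length; the helper name and the use of plain tensors are illustrative assumptions, not the Merlin Models API:

```python
import torch
import torch.nn.functional as F

def pad_to_length(seq: torch.Tensor, max_length: int, pad_value: int = 0) -> torch.Tensor:
    """Right-pad (or truncate) a 1-D sequence tensor to exactly max_length."""
    seq = seq[:max_length]
    return F.pad(seq, (0, max_length - seq.size(0)), value=pad_value)

# Two variable-length item-id sequences padded into one dense batch.
batch = [torch.tensor([1, 2, 3]), torch.tensor([4, 5])]
padded = torch.stack([pad_to_length(s, max_length=4) for s in batch])
# padded is [[1, 2, 3, 0], [4, 5, 0, 0]]
```

The padded batch can then be fed to any module that expects dense tensors, with the pad value (here 0) marking positions to ignore.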

Implementation Details 🚧

  • Implement TabularBatchPadding to pad a group of sequential inputs
  • Implement TabularPredictNext for generating targets of causal next item prediction
  • Implement TabularPredictLast for generating targets of last item prediction
  • Implement TabularPredictRandom for generating targets of predicting one random item, truncating the sequence so that the random item is in the last position
  • Implement TabularMaskRandom for the masked language modeling (MLM) training strategy
  • Implement TabularMaskLast for masking the last item in the sequence, generally used to evaluate models trained with MLM
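To illustrate the intent of the target-generating transforms above, here is a hedged sketch of the underlying tensor operations on a batch of item-id sequences. The function names are mine, not the PR's; the real classes operate on tabular batches and support multiple targets:

```python
import torch

def predict_next(ids: torch.Tensor):
    """Causal next-item prediction: each position's target is the following item."""
    return ids[:, :-1], ids[:, 1:]

def predict_last(ids: torch.Tensor):
    """Last-item prediction: the final item of each sequence is the target."""
    return ids[:, :-1], ids[:, -1]

def mask_random(ids: torch.Tensor, prob: float = 0.2):
    """MLM-style masking: a boolean mask marks randomly chosen target positions."""
    mask = torch.rand(ids.shape) < prob
    return ids, mask

ids = torch.tensor([[10, 11, 12, 13]])
inputs, targets = predict_next(ids)
# inputs is [[10, 11, 12]], targets is [[11, 12, 13]]
```

The key design point is that these are batch-level transforms: they can run inside the training loop, so the same padded batch can yield causal targets during training and last-item targets during evaluation.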

Testing Details 🔍

  • Defined tests for padding and the different sequence transformations
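The actual tests live in the PR; as a flavor of what such a test might check (the test name and shapes are hypothetical), a consistency check for the next-item transform could look like:

```python
import torch

def test_next_item_targets_are_shifted_inputs():
    ids = torch.arange(12).reshape(3, 4)
    inputs, targets = ids[:, :-1], ids[:, 1:]
    assert inputs.shape == targets.shape == (3, 3)
    # Every target equals the input one step later in the original sequence.
    assert torch.equal(inputs[:, 1:], targets[:, :-1])
```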

@github-actions

Documentation preview

https://nvidia-merlin.github.io/models/review/pr-1161

MASK_PREFIX = "__mask"


class TabularBatchPadding(nn.Module):
Contributor


Should we rename to TabularPadding?

@sararb changed the title from "first version of sequence transforms applied to Batch input" to "MM Pytorch API: TabularTransform for input tabular sequence" on Jun 28, 2023
MASK_PREFIX = "__mask"


class TabularPadding(nn.Module):
Contributor


I think it might be better to break padding out into a smaller PR first, and then do masking afterwards.



3 participants