Dictionary encode a field in a dataset (string -> int) #47293
shner-elmo started this conversation in General
Replies: 0 comments
I want to dictionary-encode a field/column in a Parquet dataset (string -> int) to save memory.
Basically:
    unique_values = field.unique()
    mapping = {val: i for i, val in enumerate(unique_values)}
What I have been struggling with is creating a field/column in the dataset that contains these integers.
The question is: how can I perform the dictionary lookup row-wise without writing a UDF or a Python for loop?
I was trying this:
and also:
Is there a way to do the map lookup natively (at C speed) instead of using Python for loops?