Problem regarding doing a vector search on a Lance Dataset generated using ray.data.Dataset.write_lance #4426
Hi, I'm currently using Lance to test its vector search ability and just ran into a problem. Hopefully I can get some help here. I am using a dataset of my own, a large collection of essays and articles, which consists of the following columns:
The steps are as follows:
The first 2 steps completed successfully and when I was doing the 3rd step I got the following exception:
I tried to enforce the schema when writing the Lance dataset with Ray, as follows:
but I got the following exception:
Version of Ray: 2.48.0
Replies: 1 comment 1 reply
Arrow makes a distinction between a fixed-size-list and a variable-size-list. When you are reading from your Parquet files you are probably reading into a variable-size-list column. You can cast the column to a fixed size list (with dimension 4096).
When you tried enforcing the schema with
pa.field("abstract_embedding", pa.list_(pa.float32()))
you were still specifying a variable-size list. To create a fixed-size-list field, you'll want
pa.field("abstract_embedding", pa.list_(pa.float32(), 4096))