Problem regarding doing a vector search on a Lance Dataset generated using ray.data.Dataset.write_lance #4426

Answered by westonpace
king4face asked this question in Q&A
Arrow distinguishes between a fixed-size list and a variable-size list. When you read from your Parquet files, the column is probably being read as a variable-size list. You can cast the column to a fixed-size list (with dimension 4096).

When you tried enforcing the schema with pa.field("abstract_embedding", pa.list_(pa.float32())), you were still specifying a variable-size list. To create a fixed-size-list field, you'll want pa.field("abstract_embedding", pa.list_(pa.float32(), 4096)).

Answer selected by king4face