SparseVector indices question #78
Replies: 3 comments
-
@Anush008 - for your review/reply
-
Hey.
I think indices that large are very unlikely, unless an ultra-jumbo corpus is fed in at once.
-
We actually see indices wrapping around when we generate sparse vectors from some of our source data. The main issue is that the client does not adhere to the Qdrant documentation, which gives a 4G maximum for sparse vector indices.
-
The Qdrant documentation says that sparse vector indices are unsigned integers, currently limited to the u32 datatype range (maximum 4294967295).
https://qdrant.tech/documentation/concepts/vectors/#sparse-vectors
In the Java client, the indices are a list of java.lang.Integer, which in Java can only hold a signed 32-bit int and is therefore limited to +/- 2G.
public static VectorInput vectorInput(List<Float> vector, List<Integer> indices) {
    return VectorInput.newBuilder()
        .setSparse(SparseVector.newBuilder().addAllValues(vector).addAllIndices(indices).build())
        .build();
}
This will be a problem whenever BM25 produces an index over 2G in the Java code. Isn't this a bug in the Java client? Shouldn't the underlying type be java.lang.Long, which is 64-bit, so that it can safely hold indices up to 4G?
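For illustration, here is a minimal sketch of how a u32 index above 2G can still travel in a Java int. It assumes the indices field is a protobuf uint32, which protobuf's Java mapping represents as a signed int with the top bit stored in the sign bit; that assumption should be checked against the client's .proto definition. The class and the helper names toWireInt and fromWireInt are hypothetical and not part of the Qdrant client.

```java
final class UnsignedIndexSketch {
    // Hypothetical helper: keep the 32-bit pattern of a u32 index (0..4294967295)
    // in a Java int. Values above Integer.MAX_VALUE come out negative.
    static int toWireInt(long index) {
        if (index < 0 || index > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("index out of u32 range: " + index);
        }
        return (int) index; // narrowing cast keeps the low 32 bits
    }

    // Hypothetical helper: read the same bits back as an unsigned value.
    static long fromWireInt(int wire) {
        return Integer.toUnsignedLong(wire);
    }

    public static void main(String[] args) {
        long index = 3_000_000_000L;             // above 2G, below 4G
        int wire = toWireInt(index);
        System.out.println(wire);                // prints -1294967296
        System.out.println(fromWireInt(wire));   // prints 3000000000
    }
}
```

Under that assumption, the negative int carries the same 32 bits as the unsigned index, so the wrap-around seen in Java is a display issue on the client side rather than lost data. Code that compares or sorts such indices in Java would still need Integer.compareUnsigned or a conversion to long.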