Add support to read and assign ID for Document object directly from a CSV column in CSVLoader
#31049
muhammadyaseen
announced in
Ideas
Feature request
In the CSVLoader, we currently have no way to specify an id_column in the constructor. I would like to add this functionality to the class.
Motivation
In many applications, a natural document ID is already available in the CSV data, e.g. experiment_id, conversation_thread, internal_document_identifier, etc. It would be nice to use the same ID consistently in the vector store and in the system where the data originates.
Proposal (If applicable)
Add an optional argument id_column: str | None to the constructor. If it is set, read the value of that column for each row and assign it as the ID of the created Document object. The error handling and checks this requires will also be added, e.g. for the case where the column is not present or where it contains duplicate values.
I will be happy to contribute to this.
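To make the proposal concrete, here is a minimal, standalone sketch of the intended behavior. It is not the CSVLoader implementation: the Document dataclass is a simplified stand-in, load_csv is a hypothetical helper, and the choices to raise ValueError and to keep the ID column in the page content are assumptions for illustration only.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Iterator, Optional, TextIO


@dataclass
class Document:
    """Simplified stand-in for a document object (assumption, not the real class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)
    id: Optional[str] = None


def load_csv(csvfile: TextIO, id_column: Optional[str] = None) -> Iterator[Document]:
    """Hypothetical helper: yield one Document per CSV row.

    If id_column is given, its value becomes the Document's id.
    """
    reader = csv.DictReader(csvfile)

    # Check 1: the requested ID column must exist in the header.
    if id_column is not None and id_column not in (reader.fieldnames or []):
        raise ValueError(f"id_column '{id_column}' not found in CSV header")

    seen_ids = set()
    for row_number, row in enumerate(reader):
        doc_id = None
        if id_column is not None:
            doc_id = row[id_column]
            # Check 2: IDs must be unique across rows.
            if doc_id in seen_ids:
                raise ValueError(f"duplicate ID '{doc_id}' at row {row_number}")
            seen_ids.add(doc_id)

        # One "column: value" line per field, ID column included.
        content = "\n".join(f"{key}: {value}" for key, value in row.items())
        yield Document(page_content=content, metadata={"row": row_number}, id=doc_id)


if __name__ == "__main__":
    sample = "experiment_id,note\ne1,first run\ne2,second run\n"
    for doc in load_csv(io.StringIO(sample), id_column="experiment_id"):
        print(doc.id, "|", doc.metadata)
```

Because load_csv is a generator, both checks fire when the result is iterated rather than when the function is called. Leaving id_column unset keeps id as None for every row, so existing callers would see no change in behavior.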