Description of the feature request:
Currently, the gemini-embedding-001 endpoint returns a single aggregated vector representing the entire text input. This request is to add a feature, likely through an API parameter such as output_representation="token" or return_token_embeddings=True, that allows users to retrieve the sequence of embeddings for each individual token in the input text.
The desired output would be the un-pooled last hidden state of the model: a list or tensor of vectors, one per input token. This is standard functionality in many open-source transformer models and is essential for tasks requiring token-level analysis.
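To make the proposal concrete, the request and response could look roughly like the following. This is only a sketch: neither parameter name exists in the current API (both are the suggestions above), and the response shape and values are illustrative placeholders.

```python
# Hypothetical request: the return_token_embeddings flag is the proposed
# (not existing) parameter from this feature request.
request = {
    "model": "gemini-embedding-001",
    "content": "The bank raised interest rates.",
    "return_token_embeddings": True,  # proposed flag, hypothetical
}

# Illustrative response: one vector per input token (the un-pooled last
# hidden state) instead of a single pooled vector. Tokenization and
# dimensionality are placeholders, not real model output.
response = {
    "tokens": ["The", "bank", "raised", "interest", "rates", "."],
    "token_embeddings": [[0.0] * 8 for _ in range(6)],  # 6 tokens x 8 dims
}

# The key invariant: one embedding per token, all of equal dimension.
assert len(response["token_embeddings"]) == len(response["tokens"])
assert all(len(v) == 8 for v in response["token_embeddings"])
```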
What problem are you trying to solve with this feature?
The primary problem this feature solves is the inability to perform fine-grained, token-level NLP tasks; the current sentence-level embedding is unsuitable for them.
My specific use case is Word Sense Disambiguation (WSD):
To differentiate between meanings of a word like "bank" (a financial institution vs. a river bank), I need to extract the contextual vector for the specific token "bank" within each sentence. By clustering these token-specific vectors, I can automatically identify the different senses of the word. The current aggregated sentence embedding makes this impossible, as the word's token-specific representation is lost in pooling.
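With token-level vectors available, the clustering step described above could be sketched as follows. The embeddings here are tiny hand-made stand-ins (a real model would produce high-dimensional contextual vectors from the requested API), and the similarity threshold of 0.8 is an arbitrary choice for the demo.

```python
import math

# Stand-in contextual vectors for the token "bank" in four sentences.
# In practice these would be the per-token embeddings returned by the
# requested feature; here they are hand-made so the two senses separate.
bank_vectors = {
    "I deposited cash at the bank.":   [0.9, 0.1],
    "The bank approved my loan.":      [0.8, 0.2],
    "We fished from the river bank.":  [0.1, 0.9],
    "Erosion wore away the bank.":     [0.2, 0.8],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Greedy single-pass clustering: start a new sense cluster whenever a
# vector is dissimilar (cosine < 0.8) to every existing cluster centroid.
clusters = []  # each: {"sum": elementwise vector sum, "members": sentences}
for sentence, vec in bank_vectors.items():
    for c in clusters:
        centroid = [s / len(c["members"]) for s in c["sum"]]
        if cosine(centroid, vec) >= 0.8:
            c["sum"] = [s + v for s, v in zip(c["sum"], vec)]
            c["members"].append(sentence)
            break
    else:
        clusters.append({"sum": list(vec), "members": [sentence]})

senses = [c["members"] for c in clusters]
# With these stand-in vectors, the two financial sentences and the two
# river sentences fall into separate clusters.
```

The same pattern extends to any ambiguous word: collect the target token's vector across many sentences, then cluster; each cluster approximates one sense.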
Any other information you'd like to share?
No response