Description
This feature aims to provide users with immediate, real-time feedback on the estimated token usage of their current message, including any integrated GitHub repository content (see #23). This transparency is crucial for helping users manage LLM costs, adhere to context window limits, and optimize their prompts before sending, especially when incorporating large amounts of external data.
1. Location and Visibility
- The token estimation display will be rendered as a clear, non-intrusive element directly above the main chat input text area in the chat view.
- It should be visible whenever the user is actively typing or when the input area contains text or identified `@git:` mentions.
- Display Format: `Estimated Tokens: XXXX / YYYY`
  - `XXXX` represents the current estimated token count.
  - `YYYY` represents the maximum context window of the currently selected LLM model.
- Visual Cues:
  - The display should change color or include a warning icon as the `XXXX` value approaches `YYYY` (e.g., yellow for 80% usage, red for 95%+ usage or over limit); a minimal sketch of this threshold logic follows this list.
  - A small tooltip on hover should clarify: "Token count is an estimate based on a common LLM tokenizer. Actual usage may vary slightly depending on the selected model."
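A minimal sketch of the threshold logic behind those visual cues, assuming the 80%/95% cut-offs above (the function name and return values are illustrative, not part of the spec):

```python
def usage_level(estimated_tokens: int, context_window: int) -> str:
    """Map token usage to a display state: 'ok', 'warn' (>= 80%), or 'alert' (>= 95% or over)."""
    ratio = estimated_tokens / context_window
    if ratio >= 0.95:
        return "alert"  # red: at or over the limit
    if ratio >= 0.80:
        return "warn"   # yellow: approaching the limit
    return "ok"

print(usage_level(3300, 4096))  # ~81% of the window -> 'warn'
```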
2. Calculation Logic
The token estimation will be performed in the backend, based on the text intended to be sent to the LLM for the current turn. The chat history's token count is already known, so only the new text and file content need to be estimated.
- Content to Tokenize: The estimated token count will primarily cover:
  - The user's current text input in the chat message area.
  - The aggregated textual content of all `@git:` mentions present in the current input (resolved from attached GitHub Nodes or Global Repositories).
- Tokenizer:
  - The estimation will use the `tiktoken` library with the `cl100k_base` encoding. This encoding is widely used by popular models such as OpenAI's GPT-3.5 and GPT-4, providing a reliable baseline estimate (see the sketch at the end of this section).
- Backend Support for GitHub Content:
  - To enable accurate real-time estimation of `@git:`-mentioned content, the backend must provide a mechanism for the frontend to retrieve the textual content of mentioned GitHub files on demand.
  - This could be a dedicated API endpoint (e.g., `/api/estimate-context`) that takes a list of `@git:` mentions (repo alias, file path) and returns their concatenated, LLM-ready content. The frontend would then tokenize this concatenated string along with the user's input (a sketch of such an endpoint also follows below).
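A minimal sketch of the baseline estimation with `tiktoken`, assuming a Python backend (the `estimate_tokens` helper name is illustrative, not part of the spec):

```python
import tiktoken

# cl100k_base is the encoding used by GPT-3.5 / GPT-4, per the tokenizer choice above.
_ENCODING = tiktoken.get_encoding("cl100k_base")

def estimate_tokens(text: str) -> int:
    """Return the estimated token count for a piece of text."""
    return len(_ENCODING.encode(text))

# Example: user input plus resolved @git: content, concatenated into one estimate.
draft = "Summarize the attached build script.\n" + "echo building...\n" * 20
print(estimate_tokens(draft))
```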
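And a hedged sketch of what the `/api/estimate-context` endpoint could look like, here written with FastAPI; the request/response shape, the field names, and the in-memory store standing in for the real repository backend are all assumptions:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GitMention(BaseModel):
    repo_alias: str  # alias of the attached GitHub Node or Global Repository
    file_path: str   # path of the mentioned file within that repo

class EstimateContextRequest(BaseModel):
    mentions: list[GitMention]

class EstimateContextResponse(BaseModel):
    content: str  # concatenated, LLM-ready content of all mentions

# Hypothetical stand-in for the real repository store.
_FAKE_STORE = {("demo", "README.md"): "# Demo\nHello world\n"}

@app.post("/api/estimate-context", response_model=EstimateContextResponse)
def estimate_context(req: EstimateContextRequest) -> EstimateContextResponse:
    # Concatenate each mentioned file; the frontend then tokenizes the result
    # together with the user's current input.
    parts = [_FAKE_STORE.get((m.repo_alias, m.file_path), "") for m in req.mentions]
    return EstimateContextResponse(content="\n\n".join(parts))
```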
3. Update Mechanism (Performance & Responsiveness)
To ensure a smooth user experience and prevent excessive re-calculations on every keystroke:
- Debouncing: The token count update should be debounced, meaning the calculation only triggers after the user has stopped typing for a short period (e.g., 300-500 milliseconds); see the sketch after this list.
- Throttling (Alternative/Addition): Alternatively, or in addition to debouncing, updates could be throttled so they occur only after a certain number of words have been typed (e.g., every 5 words), though debouncing is usually sufficient for text input.
- Initial Load: The token count should be calculated and displayed immediately when the chat component mounts or when the input area first receives focus/content.
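As a concept-only sketch (the production version would live in the frontend; the `Debounced` wrapper below is hypothetical), debouncing boils down to resetting a timer on every call so that only the last call within the quiet window fires:

```python
import threading

class Debounced:
    """Run `fn` only once `delay` seconds have passed with no further calls."""

    def __init__(self, fn, delay: float = 0.4):  # 400 ms, within the 300-500 ms range above
        self.fn = fn
        self.delay = delay
        self._timer = None

    def __call__(self, *args, **kwargs):
        # Cancel any pending run; each keystroke restarts the countdown.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self.delay, self.fn, args, kwargs)
        self._timer.start()

# Usage: wrap the token-count update so it fires 0.4 s after typing stops.
# (len(text) // 4 is only a crude characters-per-token placeholder here.)
update_count = Debounced(lambda text: print("tokens ~", len(text) // 4))
for chunk in ("Hello", "Hello wor", "Hello world"):
    update_count(chunk)  # only the final call triggers the update
```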