Token Usage for Ollama #19422
I am working on a project using LLMs with LangChain and Ollama and found a way to track the output statistics that Ollama reports. The idea is to create a class that extends BaseCallbackHandler from LangChain and implements the on_llm_end() method. on_llm_end() receives a parameter named response (a LangChain LLMResult object) that carries the statistics from Ollama. Printing the LLMResult object shows results like the following (a similar output, with the statistics as JSON key-value pairs, is shown here, from Langfuse).
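For reference, here is a rough sketch of the kind of fields Ollama reports in generation_info. The field names follow Ollama's generate API; the values below are invented purely for illustration, and the exact set of fields may vary with the Ollama version:

```python
# Illustrative only: field names follow Ollama's generate API,
# the values are made up for this example.
generation_info = {
    'model': 'llama2:7b-chat-q4_0',
    'done': True,
    'total_duration': 4_935_886_000,      # ns, includes model load time
    'load_duration': 534_986_000,         # ns
    'prompt_eval_count': 26,              # tokens in the prompt
    'prompt_eval_duration': 107_345_000,  # ns
    'eval_count': 298,                    # tokens generated
    'eval_duration': 4_289_432_000,       # ns
}
```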
These statistics can be accessed through the generation_info of the entries in the generations attribute of the LLMResult object. A minimal working example is below; I use a deque to collect the statistics for convenient access.

```python
from langchain_community.llms import Ollama

llm = Ollama(model='llama2:7b-chat-q4_0')

# define a callback for collecting token usage
from langchain_core.callbacks.base import BaseCallbackHandler
from langchain_core.outputs.llm_result import LLMResult
from collections import deque


class TokenUsageCallbackHandler(BaseCallbackHandler):
    def __init__(self, deque: deque = None):
        super().__init__()
        self.deque = deque

    def on_llm_end(self, response: LLMResult, **kwargs) -> None:
        print('Response in callback')
        print(response)
        print()
        generation = response.generations[0][0]
        gen_info = generation.generation_info
        # get token usage (prompt tokens + generated tokens)
        token_usage = gen_info.get('prompt_eval_count', 0) + gen_info.get('eval_count', 0)
        # get the time cost (local machine); instead of total_duration we sum
        # prompt_eval_duration and eval_duration to exclude the load duration
        # (e.g. the time spent loading the model onto the GPU)
        time_costed = gen_info.get('prompt_eval_duration', 1e-10) + gen_info.get('eval_duration', 1e-10)  # in ns; small fallback value when the fields are missing
        # create an object to store the token usage and time cost
        token_usage_obj = {
            'token_usage': token_usage,
            'time_costed': time_costed,
        }
        # append the object to the deque
        self.deque.append(token_usage_obj)


common_deque = deque()
chain_config = {
    "callbacks": [TokenUsageCallbackHandler(common_deque)],
}

# example: calling the llm directly
response = llm.invoke("Hello, how are you?", config=chain_config)
token_usage_obj = common_deque.popleft()
print(response)
print(token_usage_obj)

# example: calling a chain object
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [("human", "Explain {concept} to me.")]
)
chain = prompt | llm
response = chain.invoke({"concept": "Large Language Models"}, config=chain_config)
# get the token usage object from the deque
token_usage_obj = common_deque.popleft()
print(response)
print(token_usage_obj)
```

Running both examples prints the responses together with the token-usage objects collected by the callback.
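Since the durations reported by Ollama are in nanoseconds, the collected entries can also be turned into a rough throughput figure. This is just a minimal sketch on top of the token_usage_obj dicts produced by the handler above:

```python
# Illustrative helper: convert a collected entry into tokens per second.
# Ollama reports durations in nanoseconds.
def tokens_per_second(token_usage_obj: dict) -> float:
    seconds = token_usage_obj['time_costed'] / 1e9  # ns -> s
    return token_usage_obj['token_usage'] / seconds

# e.g. summarise everything collected so far:
# throughputs = [tokens_per_second(obj) for obj in common_deque]
```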
A list of callback events is given in the official Langchain docs, and the API reference of BaseCallbackHandler is here. I hope it helps. (edit: enable the python syntax highlighting)
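One more note: if you prefer not to pass config on every call, the handler can also be attached when constructing the LLM, since LangChain language models accept a callbacks argument. A small sketch, reusing the TokenUsageCallbackHandler from above:

```python
# Attach the handler at construction time so it applies to every call,
# without passing config on each invoke.
common_deque = deque()
llm = Ollama(
    model='llama2:7b-chat-q4_0',
    callbacks=[TokenUsageCallbackHandler(common_deque)],
)
response = llm.invoke("Hello, how are you?")
print(common_deque.popleft())
```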
Feature request
The current API seems not to allow keeping track of token usage while using Ollama.
ref: https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/llms/ollama.py
Motivation
Token usage is important for comparing LLM efficiency.
Proposal (If applicable)
No response