Replies: 2 comments
-
🤖 Hello @anujmehta, I'm here to assist you with your query. I'm an AI bot designed to help with bug resolution, answer questions, and guide you in becoming a contributor while you wait for a human maintainer.

Regarding your question about training a large language model (LLM) on a large custom library for code generation without running out of tokens when using gpt-4-32k: you can use the streaming feature provided by the LangChain framework. This feature lets you receive generated tokens incrementally, which can be useful when dealing with large amounts of data. Here is a sample code snippet:

```python
from langchain.llms import OpenAI

# Initialize the OpenAI model with streaming enabled
openai = OpenAI(model_name="gpt-4-32k", streaming=True)

# Define your prompt
prompt = "Your prompt here"

# Generate tokens in a streaming manner; the public stream() method
# yields string chunks as they arrive
for chunk in openai.stream(prompt):
    print(chunk, end="")
```

In this code, the `stream` method yields output tokens as they are generated rather than waiting for the full completion. Please note that the `streaming=True` parameter must be set when the model is initialized.

As for training LLMs on custom libraries, I couldn't find a direct answer within the LangChain repository. However, the streaming feature should theoretically allow you to work with a large custom library by feeding the library to the model in chunks.

I hope this helps! If you have any more questions or need further clarification, feel free to ask.

Best,

Sources

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
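To make the "feed the library in chunks" idea above concrete, here is a minimal sketch using LangChain's `RecursiveCharacterTextSplitter`. The file path, chunk sizes, and prompt wording are illustrative assumptions, not values from this thread; note that `chunk_size` and `chunk_overlap` are measured in characters, not tokens.

```python
from langchain.llms import OpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Hypothetical path to the custom library's source; replace with your own.
library_source = open("my_custom_library.py").read()

# Split the source into overlapping pieces small enough to fit in a prompt.
# The sizes here are assumptions, in characters.
splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
chunks = splitter.split_text(library_source)

llm = OpenAI(model_name="gpt-4-32k", streaming=True)

# Send each chunk in its own prompt so no single request exceeds the limit.
for i, chunk in enumerate(chunks):
    prompt = f"Here is part {i + 1} of my library:\n{chunk}\n\nSummarize its public API."
    for token in llm.stream(prompt):
        print(token, end="")
```

Note that this does not fine-tune or train the model; it only works around the context limit by processing the library piecewise.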
-
Hello, I'm also stuck on this type of challenge. Have you found any good ideas?
-
I wanted to train an LLM on a custom library with tons of functions. The intent is to be able to generate code using an LLM trained on this custom library. As the library is pretty huge, I run out of tokens when adding it to the context of the LLM (I am using gpt-4-32k).
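A quick way to confirm that the library really exceeds the 32k-token context window is to count tokens before building the prompt. Below is a minimal sketch using OpenAI's `tiktoken` package; the file path is a placeholder assumption.

```python
import tiktoken

# gpt-4-32k resolves to the cl100k_base encoding.
enc = tiktoken.encoding_for_model("gpt-4-32k")

# Hypothetical path to the custom library's source.
source = open("my_custom_library.py").read()

n_tokens = len(enc.encode(source))
print(f"{n_tokens} tokens (gpt-4-32k context limit: 32768)")
if n_tokens > 32768:
    print("Too large for one prompt; split the library or send only relevant parts.")
```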