fix: QwQ and R1 finetune format #294
Conversation
Force-pushed f1d23cc to db6cf83
Force-pushed 66d5077 to 36f4f21
Force-pushed 36f4f21 to bc6aa1e
"Reasoning - Learn intermediate thinking and final response", | ||
], | ||
]} | ||
select_options={data_strategy_select_options} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to fix: changing the model can alter the available data strategies, but the selected data_strategy isn't updated accordingly, leaving the dropdown with an invalid or empty selection.
Not sure whether it's better to filter the set of options or to disable the unavailable ones. Probably the former.
Instead of the reactive statement data_strategy_select_options = get_data_strategies_supported(base_model_id), you can do $: update_data_strategies_supported(base_model_id) and set both values inside that function, e.g. data_strategy = data_strategy_select_options[0]. Still reactive that way, but they are updated together.
Still pending. Should make updating data_strategy_select_options always check that data_strategy is still valid, and reset to the first option if not. Let me know if you want a code sample; Svelte reactivity can be a bit strange.
The UI should be ready now:
- fix: the selected model now propagates correctly to narrow down the set of possible strategies
- fix: show an explicit (R1 compatible) option in the training strategy dropdown if the selected model is a JSON download, except for toolcall datasets or Vertex
- fix: if the selected dataset has no thinking filter on, bump the warning up to an error and block submission
Force-pushed bc6aa1e to b35706b
WIP review. Only part way through, but sending these over so you have them.
```
@@ -48,6 +45,10 @@ class FinetuneProviderModel(BaseModel):
    name: str
    id: str
    data_strategies_supported: list[FinetuneDataStrategy] = [
```
Pydantic has funny default value syntax we should use: https://docs.pydantic.dev/latest/concepts/fields/#default-values
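For reference, a minimal sketch of that pattern - using default_factory so the mutable list default isn't shared across instances. The field and enum names come from the diff above; the exact default strategies here are illustrative:

```python
from enum import Enum

from pydantic import BaseModel, Field


class FinetuneDataStrategy(str, Enum):
    final_only = "final_only"
    final_and_intermediate = "final_and_intermediate"


class FinetuneProviderModel(BaseModel):
    name: str
    id: str
    # default_factory builds a fresh list per instance instead of
    # sharing one mutable default across all models
    data_strategies_supported: list[FinetuneDataStrategy] = Field(
        default_factory=lambda: [
            FinetuneDataStrategy.final_only,
            FinetuneDataStrategy.final_and_intermediate,
        ]
    )
```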
Fixed
```
request.data_strategy
== FinetuneDataStrategy.final_and_intermediate_r1_compatible
):
    # TODO: check that the base model supports R1 style thinking
```
Just leaving a marker here to complete the TODO
Added a validator on the CreateFinetuneRequest pydantic model for this; and also refactored to share the same logic with the finetune_providers endpoint for setting the compatible data strategies on each model
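Roughly what that validator could look like - a sketch, assuming the request carries a base_model_id field and reusing the data_strategies_from_finetune_id helper discussed below, not the verbatim PR code:

```python
from pydantic import BaseModel, model_validator

# Assumes FinetuneDataStrategy and data_strategies_from_finetune_id are
# importable from the shared module discussed in this thread.


class CreateFinetuneRequest(BaseModel):
    base_model_id: str  # assumed field name
    data_strategy: FinetuneDataStrategy

    @model_validator(mode="after")
    def validate_data_strategy(self) -> "CreateFinetuneRequest":
        # Reject strategies the selected base model cannot train with,
        # using the same logic as the finetune_providers endpoint.
        supported = data_strategies_from_finetune_id(self.base_model_id)
        if self.data_strategy not in supported:
            raise ValueError(
                f"Data strategy '{self.data_strategy}' is not supported by "
                f"model '{self.base_model_id}'"
            )
        return self
```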
```
@@ -326,6 +345,9 @@ async def download_dataset_jsonl(
    status_code=400,
    detail=f"Data strategy '{data_strategy}' not found",
)

# TODO: handle data_strategy == FinetuneDataStrategy.final_and_intermediate_r1_compatible
```
Marker to complete the TODO
In the end, no branching needs to be done in this one so I removed the TODO, because we let users download whichever data strategy they want when it is a file download (relevant discussion: #294 (comment))
```
r1_must_not_include = ["distill"]
if any(substring in name.lower() for substring in r1_must_include) and not any(
    substring in name.lower() for substring in r1_must_not_include
):
```
We should add tests for all the changes in this file.
In particular a parameterized test of the qwq/R1 models Fireworks has today. List here: https://docs.getkiln.ai/docs/models-and-ai-providers#additional-fine-tuneable-models
```
final_and_intermediate:
    "Reasoning - Learn intermediate thinking and final response",
final_and_intermediate_r1_compatible:
    "Reasoning (R1 compatible) - Learn intermediate thinking and final response",
```
We should leave the "R1 Compatible" out of the UI. 99% of users will get confused. It's an internal concern they don't need to think about.
All QwQ/R1 fine-tunes have to be trained as R1 reasoning, and you shouldn't be able to select any alternative. So when you pick one of those models we should:
- not show the dropdown offering a choice, just render "Reasoning - Learn intermediate thinking and final response" in a non-editable way (might already be covered by how you are setting the select options)
- maybe add the subtitle "The selected base model is a reasoning model, so you must train for reasoning."
- show an error if their selected dataset isn't filtered to reasoning training data. This isn't a valid training setup. Maybe also raise an error deeper in the stack for API users.
Note: check out the new fancy_select option for fancy UI dropdowns, which might help (optional).
I agree, but there is an edge case with the file download case (e.g. Download: HuggingFace chat template (JSONL) and the likes). The user needs to explicitly express which export format they want, which could conceivably be any one of the 3 formats (final only, final and intermediate, final and intermediate R1 style) depending on what they are planning on doing with it.
Should we abstract away R1 compatible in the case of a specific model, but expose it as an explicit R1 Compatible option if the user has selected one of the Download: XYZ options as the model?
Yup that makes sense.
- Tuning selected mode: picking an R1 model forces the R1-specific format
- Download: show all 3 options, i.e. "Reasoning (multi-message) - Learn intermediate thinking and final response" and "Reasoning (R1 format) - Learn intermediate thinking and final response"
> - Tuning selected mode: picking an R1 model forces the R1-specific format
> - Download: show all 3 options

Updated to do the above
```
bind:value={data_strategy}
/>
{#if data_strategy === "final_and_intermediate" && !selecting_thinking_dataset}
{#if (data_strategy === "final_and_intermediate" || data_strategy === "final_and_intermediate_r1_compatible") && !selecting_thinking_dataset}
```
I think we bump this to an error for R1 style models, and block continuing. Training one of those without thinking data will just be a disaster.
Added in this commit: defa1ec
```
def supports_cot(self) -> bool:
    return (
        self.thinking_instructions is not None
        and self.thinking is not None
        and self.thinking_final_answer_prompt is not None
    ) or (
        self.thinking_r1_style
```
I think this case becomes a raised error.
A normal model can be tuned for optional thinking, but a model that's natively thinking can't really (we could make it blank, but it would be a disaster, and more likely we're just hiding an important bug in the training data). Your call whether that happens here, or, if we still return a bool here, whether we raise a level up.
Also: we should make thinking_instructions optional for R1 style models. They always think, regardless of instructions.
I see - so for example, if we have an R1 model but no thinking, supports_cot would return False, and the caller incorrectly assumes this is just a normal model that doesn't support thinking. It then trains it without any thinking - and no error pops up anywhere.
Minimal change then could be to simply return True if this is an R1 model:
```python
def supports_cot(self) -> bool:
    return (
        self.thinking_instructions is not None
        and self.thinking is not None
        and self.thinking_final_answer_prompt is not None
    ) or self.thinking_r1_style
```
And then raise an error downstream where we try to build it if the thinking is missing:
```python
def serialize_r1_style_message(thinking: str | None, final_output: str):
    if thinking is None or len(thinking.strip()) == 0:
        raise ValueError("Thinking data is required for R1 style")
    return f"<think>{thinking}</think>{final_output}"
```
However, at that point supports_cot would be mixing two unrelated concepts - checking for the presence of valid thinking data vs. checking for model family - which feels misleading.
Possibly better then to do the following:
- move the thinking validation into build_training_data (still keep the error in the R1 serialization as a last line of defense)
- avoid calling supports_cot() at all for R1 models - and raise an error if it gets called on R1 data; and so explicitly move the burden of handling R1 branching to the caller
Pushed new changes to do what I suggested above:
- build_training_data now clearly branches differently for R1 models and for COT; it also raises an error if no thinking, and makes thinking_instructions optional for R1
- still have serialize_r1_style_message raise an exception if there is no thinking
- supports_cot now cannot be called on R1 models; and so I clearly separated the two cases in each data formatter (see the sketch after this list)
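A minimal sketch of that supports_cot guard, assuming the field names from the diff above; the exact wording and placement in the PR may differ:

```python
def supports_cot(self) -> bool:
    # R1-style data takes a separate branch in build_training_data;
    # COT checks don't apply to models that always think.
    if self.thinking_r1_style:
        raise ValueError(
            "supports_cot() should not be called on R1-style training data"
        )
    return (
        self.thinking_instructions is not None
        and self.thinking is not None
        and self.thinking_final_answer_prompt is not None
    )
```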
Also wondering what we want to do with the thinking_instructions in the R1 case:
- Simply replace the thinking with the thinking_instructions? (i.e. <think>custom thinking instructions here</think>final output)
- In the case where thinking is missing, should we throw, or should we still just replace thinking with the provided instruction?
Existing changes are good (first 3 bullets)
Note: thinking_instructions != thinking. thinking_instructions would be something like "Think step by step" so it should be at the end of the user message. I think the way you have it is good: we don't include thinking instructions in a fine-tune of a thinking model.
For COT it makes sense to fine-tune with it. The source data required the prompt to generate the COT, and we need a system message in COT tuning. Here we "think" automatically, so just learn from the thinking not the prompt.
```
@@ -82,6 +87,11 @@ def build_training_data(
    thinking_final_answer_prompt = None
    parent_task = task_run.parent_task()

    include_cot = (
```
Somewhat duplicative of thinking_compatible_data_strategies. Define thinking_compatible_data_strategies where we define FinetuneDataStrategy?
Extracted this into a constant (THINKING_DATA_STRATEGIES)
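Illustratively, the constant would live next to the enum, something like this (a sketch; the enum values are the ones already used in this PR):

```python
from enum import Enum


class FinetuneDataStrategy(str, Enum):
    final_only = "final_only"
    final_and_intermediate = "final_and_intermediate"
    final_and_intermediate_r1_compatible = "final_and_intermediate_r1_compatible"


# Data strategies that include intermediate thinking in the training data
THINKING_DATA_STRATEGIES = [
    FinetuneDataStrategy.final_and_intermediate,
    FinetuneDataStrategy.final_and_intermediate_r1_compatible,
]
```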
```
)


def serialize_r1_style_message(thinking: str | None, final_output: str):
    if thinking is None or len(thinking) == 0:
```
Maybe this is where we raise the error?
Raising a ValueError from here now if empty / missing thinking.
Also now raising one level up in build_training_data if the model is R1, to align with where other similar validation is being done for the COT case
```
name: str,
) -> list[FinetuneDataStrategy]:
    r1_must_include = ["r1", "qwq"]
    r1_must_not_include = ["distill"]
```
My mistake, but distill models should be treated like normal QwQ/R1 models. They have the same formatting.
Fixed now (it was my mistake; and I thought it was genius for a minute, then realized it wasn't)
Not to fight you for credit, but I texted you "contains qwq or r1 and does not contain distill" 😆
model selected is a file download
Force-pushed 8e290cc to be33978
Force-pushed 199e2b1 to 6a1f7d1
TODOs left:
- The tests for the logic that determines which models support which data strategies (7c7c637) are being done against a local fixture list of fake models, but possibly should add some against the real
Some small changes, mostly strings. Let me know if you want example code for the Svelte one
```
assert (
    infer_data_strategies_for_model(mock_available_models, model_id, provider)
    == expected_data_strategies
)
```
I'd add a parameterized test with expected real values from the API. Just on data_strategies_from_finetune_id so the parameter is short and sweet (rough sketch below the list).
Expect R1:
- qwq-32b
- DeepSeek R1 (Fast) (deepseek-r1)
- DeepSeek R1 (Basic) (deepseek-r1-basic)
- deepseek-r1-distill-llama-70b
- Deepseek R1 Distill Llama 8B (deepseek-r1-distill-llama-8b)
- Deepseek R1 Distill Qwen 14B (deepseek-r1-distill-qwen-14b)
- Deepseek R1 Distill Qwen 1.5B (deepseek-r1-distill-qwen-1p5b)
- deepseek-r1-distill-qwen-32b
- Deepseek R1 Distill Qwen 7B (deepseek-r1-distill-qwen-7b)
Some ones we expect false:
- DeepSeek V3 (deepseek-v3)
- Deepseek V3 03-24 (deepseek-v3-0324)
And some new Qwen 3 (TBD, see convo above):
- Qwen 3, 30B-A3B version (qwen3-30b-a3b)
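A rough sketch of that test, using a few of the IDs above - the expected lists are assumptions based on this thread (R1-family IDs force the R1 strategy, qwen3 allows both; see the convo below):

```python
import pytest

# Assumes data_strategies_from_finetune_id and FinetuneDataStrategy are
# importable from the module under test.

R1 = FinetuneDataStrategy.final_and_intermediate_r1_compatible


@pytest.mark.parametrize(
    "finetune_id,expected",
    [
        ("qwq-32b", [R1]),
        ("deepseek-r1", [R1]),
        ("deepseek-r1-distill-llama-70b", [R1]),
        (
            "deepseek-v3",
            [
                FinetuneDataStrategy.final_only,
                FinetuneDataStrategy.final_and_intermediate,
            ],
        ),
        ("qwen3-30b-a3b", [R1, FinetuneDataStrategy.final_only]),
    ],
)
def test_data_strategies_from_finetune_id(finetune_id, expected):
    assert data_strategies_from_finetune_id(finetune_id) == expected
```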
I've added a test (including the qwen3 case) in commit 86c8512
Note: the validation targets the model's id, not its name. Checking the name would require fetching the name from Fireworks.ai, which would then require an async Pydantic validator (which does not seem standard / supported) and extra wiring to attach data strategies per model; or alternatively some caching of the Fireworks.ai response, which may come with other problems.
Since we only need this for Fireworks.ai currently - and their IDs always include the necessary keywords - validating on id should hold well enough until we move to the remote JSON model registry
```
def data_strategies_from_finetune_id(
    provider_finetune_id: str,
) -> list[FinetuneDataStrategy]:
    r1_must_include = ["r1", "qwq"]
```
And Qwen 3 already requires a change 😀
We should add "qwen3", and qwen3 should return final_only, final_and_intermediate_r1_compatible (since it can do both thinking and non-thinking with /think and /no_think directives)
Let's maybe split this into another PR?
I added a pattern match in data_strategies_from_finetune_id to target qwen3 and only allow ['final_and_intermediate_r1_compatible', 'final_only'] - just to be able to test for the allowed strategies in a parameterized test, but have not implemented any other logic for qwen3 besides that
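Putting the thread together, the matching logic presumably ends up along these lines - a sketch under the assumptions above (distills treated as R1, qwen3 allowing both formats), not the verbatim PR code:

```python
def data_strategies_from_finetune_id(
    provider_finetune_id: str,
) -> list[FinetuneDataStrategy]:
    model_id = provider_finetune_id.lower()
    # Qwen 3 can be tuned with or without thinking (/think and /no_think)
    if "qwen3" in model_id:
        return [
            FinetuneDataStrategy.final_and_intermediate_r1_compatible,
            FinetuneDataStrategy.final_only,
        ]
    # QwQ / R1 models (including distills) always think: force the R1 format
    if any(keyword in model_id for keyword in ["r1", "qwq"]):
        return [FinetuneDataStrategy.final_and_intermediate_r1_compatible]
    # Everything else trains as a normal (optionally-COT) model
    return [
        FinetuneDataStrategy.final_only,
        FinetuneDataStrategy.final_and_intermediate,
    ]
```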
```
if data_strategy == FinetuneDataStrategy.final_and_intermediate_r1_compatible:
    if not task_run.has_thinking_training_data() or not thinking:
        raise ValueError(
            "Thinking data is required for R1 style. Please save a reasoning or chain of thought output."
```
"Thinking data is required when fine-tuning thinking models (R1, QwQ, etc). Please ensure your fine-tuning dataset contains reasoning or chain of thought output for every entry."
Updated
```
)


def serialize_r1_style_message(thinking: str | None, final_output: str):
    if thinking is None or len(thinking.strip()) == 0:
        raise ValueError("Thinking data is required for R1 style")
```
"Thinking data is required when fine-tuning thinking models (R1, QwQ, etc). Please ensure your fine-tuning dataset contains reasoning or chain of thought output for every entry."
Updated
```
if thinking is None or len(thinking.strip()) == 0:
    raise ValueError("Thinking data is required for R1 style")

return f"<think>{thinking}</think>{final_output}"
```
f"<think>\n{thinking}\n</think>\n\n{final_output}"
seems more standard
Updated to add the newlines - also updated tests accordingly
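Combined with the error-message suggestion above, the updated helper presumably looks roughly like this (a sketch assembled from the snippets in this thread):

```python
def serialize_r1_style_message(thinking: str | None, final_output: str) -> str:
    # R1-style training rows must contain non-empty reasoning
    if thinking is None or len(thinking.strip()) == 0:
        raise ValueError(
            "Thinking data is required when fine-tuning thinking models "
            "(R1, QwQ, etc). Please ensure your fine-tuning dataset contains "
            "reasoning or chain of thought output for every entry."
        )
    # Newline-delimited <think> block, per the review suggestion above
    return f"<think>\n{thinking}\n</think>\n\n{final_output}"
```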
```
thinking=training_data.thinking,
final_output=training_data.final_output,
),
"tool_calls": tool_calls,
```
Is this valid? Should it be 2 assistant messages? I can look up but don't want to if you already have.
Was unsure as well - checked by making some calls to Fireworks and OpenAI:
- it seems that the model always responds with either a toolcall or a message, not both simultaneously (even if SDK typing may allow for it); so this format is most probably wrong
- seems like we can get a reasoning summary alongside a toolcall with OpenAI's reasoning models at least (see below for an example)
- seems like Fireworks does not support toolcalls on the reasoning models they have

Trying to figure out what we get with LiteLLM now.
Here's an example request / response for 2.:
Endpoint: POST https://api.openai.com/v1/responses
Request:
```json
{
"model": "o4-mini",
"reasoning": {
"effort": "low",
"summary": "auto"
},
"input": [
{
"role": "developer",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "If I move 100km outside of Taichung heading east, should I need an umbrella tomorrow?"
}
],
"tools": [
{
"type": "function",
"name": "get_weather",
"description": "Get current temperature for a given location.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and country e.g. Bogotá, Colombia"
}
},
"required": [
"location"
],
"additionalProperties": false
}
}
]
}
```
Response:
```json
{
"id": "resp_xxx",
"object": "response",
"created_at": 1746039896,
"status": "completed",
"error": null,
"incomplete_details": null,
"instructions": null,
"max_output_tokens": null,
"model": "o4-mini-2025-04-16",
"output": [
{
"id": "rs_xxxxx",
"type": "reasoning",
"summary": [
{
"type": "summary_text",
"text": "**Determining weather for a travel location**\n\nThe user is moving 100km east of Taichung, which puts them in the Pacific Ocean. I wonder if I should check the nearest city for weather instead. Should I use coordinates or simply go with a city like Hualien, which is to the east of Taichung? Hualien is about 140km away, so maybe there’s a closer option like Yuli, but I’ll stick with Hualien for simplicity. I’ll check the weather there and suggest bringing an umbrella!"
}
]
},
{
"id": "fc_xxx",
"type": "function_call",
"status": "completed",
"arguments": "{\"location\":\"Hualien, Taiwan\"}",
"call_id": "call_xxx",
"name": "get_weather"
}
],
"parallel_tool_calls": true,
"previous_response_id": null,
"reasoning": {
"effort": "low",
"summary": "detailed"
},
"service_tier": "default",
"store": true,
"temperature": 1.0,
"text": {
"format": {
"type": "text"
}
},
"tool_choice": "auto",
"tools": [
{
"type": "function",
"description": "Get current temperature for a given location.",
"name": "get_weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City and country e.g. Bogotá, Colombia"
}
},
"required": [
"location"
],
"additionalProperties": false
},
"strict": true
}
],
"top_p": 1.0,
"truncation": "disabled",
"usage": {
"input_tokens": 83,
"input_tokens_details": {
"cached_tokens": 0
},
"output_tokens": 149,
"output_tokens_details": {
"reasoning_tokens": 128
},
"total_tokens": 232
},
"user": null,
"metadata": {}
}
```
Disabled the R1 compatible strategy for JSON toolcall downloads (also for Vertex); on both frontend and backend
"content": serialize_r1_style_message( | ||
thinking=training_data.thinking, | ||
final_output=training_data.final_output, | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same question
Disabled the R1 compatible strategy for JSON toolcall downloads (also for Vertex); on both frontend and backend
```
@@ -305,7 +421,28 @@ def generate_vertex_gemini(
        }
    ]

    if training_data.supports_cot():
        if training_data.thinking_r1_style:
            contents.extend(
```
Maybe raise an error here? No Gemini model supports R1 style thinking.
The thinking there was that Vertex AI seems to support custom models - I've never used it so I'm not exactly sure it works the way I understand it, but for example these two suggest it is a thing:
- https://medium.com/@development_52384/deepseek-on-vertex-ai-endpoints-f5ebaa0f60a3
- https://github.com/GoogleCloudPlatform/generative-ai/blob/main/open-models/use-cases/vertex_ai_deepseek_smolagents.ipynb
That being said, we don't seem to support custom models deployed to Vertex at this point; so probably indeed best to just raise for now
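So the Vertex generator would presumably just guard up front - a minimal sketch, assuming the training_data fields shown in the diff (the TrainingData type name is hypothetical):

```python
def generate_vertex_gemini(training_data: "TrainingData") -> dict:
    # No Gemini model supports R1-style thinking, so reject early rather
    # than emitting a format Vertex can't train on.
    if training_data.thinking_r1_style:
        raise ValueError(
            "R1-style thinking is not supported for Vertex Gemini fine-tuning"
        )
    ...  # build the Gemini contents payload as before
```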
```
@@ -397,7 +527,7 @@ def dump_to_file(

    generator = FORMAT_GENERATORS[format_type]

    include_cot = data_strategy == FinetuneDataStrategy.final_and_intermediate
    include_cot = data_strategy in THINKING_DATA_STRATEGIES
```
We're only using this for title now? Maybe better titles would be:
- final_and_intermediate_r1: "thinking"
- final_and_intermediate: "chain_of_thought"
- anything else: "final_only"
I think we can change the string values; I have not found places where this would be problematic - we use them in the frontend, but they are typed according to the API schema so will fail typecheck if not updated at the same time.
Also agree changing this might decrease confusion - as R1 vs COT are different concepts rather than just variants of formatting.
Would suggest making the change in a separate PR (I can do it)
```
@@ -108,6 +108,11 @@
    }

    function build_available_model_select(models: FinetuneProvider[]) {
        for (const model of models) {
            for (const provider of model.models) {
                console.log(model.id, provider.name, provider.id)
```
should remove
This was already removed - possibly not reviewing the latest commits?
Was reviewing commit by commit since it's gotten large. Looks good now!
```
@@ -580,7 +579,7 @@
    </div>
{:else}
    <FormContainer
        {submit_visible}
        submit_visible={submit_visible && !config_is_error}
```
A tag based dataset might be correctly filtered to thinking, and not have selecting_thinking_dataset==true.
Let's leave this as a warning (you have that below), and not disable submit. We'll catch the issue at runtime.
Turned it back into a warning instead of an error, and no longer blocking submit
```
@@ -502,6 +503,25 @@
    : "Reasoning - Learn intermediate thinking and final response",
}

console.info({
```
remove
Removed
```
{#if data_strategy === "final_and_intermediate_r1_compatible" && !selecting_thinking_dataset}
    <Warning
        warning_color="error"
        warning_message="You are training a model of the R1 family for inference-time thinking, but are not using a dataset filtered to samples with reasoning or chain-of-thought training data. This is strongly not recommended, as it may lead to poor performance. We suggest creating a new dataset with a thinking filter."
```
"You are training a "thinking" model, but did not explicitly select a dataset filtered to samples with reasoning or chain-of-thought training data. If any of your training samples are missing reasoning data, it will error. If your data contains reasoning, you can ignore this warning."
Updated the string (and turned back into a warning instead of an error)
What does this PR do?
QwQ and R1 models expect reasoning serialized inside <think></think> tags. The fix involves the following changes:
- a new FinetuneDataStrategy (final_and_intermediate_r1_compatible)
- changes to GET /api/finetune_providers; and changes to the UI to narrow down the Model Type / Training Strategy options
optionsRelated Issues
#287
Contributor License Agreement
I, @leonardmq, confirm that I have read and agree to the Contributors License Agreement.
Checklists