fix: QwQ and R1 finetune format #294

Merged: 20 commits from leonard/fix-qwq-fine-tune-format into Kiln-AI:main on May 1, 2025

Conversation

leonardmq (Collaborator):

What does this PR do?

  • fix: the current fine-tuning format for QwQ and R1 models is not suitable; we currently pass the reasoning string via a multi-message prompt, but it should be included in the same assistant message, wrapped in <think></think> tags (see the sketch after the change list below)

The fix involves the following changes:

  • add a new FinetuneDataStrategy (final_and_intermediate_r1_compatible)
  • expose the data strategies supported by each model in GET /api/finetune_providers, and update the UI to narrow down the Model Type / Training Strategy options
  • update the dataset formatting generators to format the messages according to the data strategy
  • use the matching data strategy at inference time on the fine-tuned model
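
For illustration, here is a rough sketch of the two formats (message contents are placeholders, and the multi-message shape is simplified rather than the exact generator output):

    # Before (final_and_intermediate): reasoning passed via a multi-message exchange
    multi_message = [
        {"role": "system", "content": "<system prompt>"},
        {"role": "user", "content": "<input, plus thinking instructions>"},
        {"role": "assistant", "content": "<reasoning>"},
        {"role": "user", "content": "<thinking final answer prompt>"},
        {"role": "assistant", "content": "<final output>"},
    ]

    # After (final_and_intermediate_r1_compatible): reasoning inlined into the same
    # assistant message, wrapped in <think></think> tags
    r1_compatible = [
        {"role": "system", "content": "<system prompt>"},
        {"role": "user", "content": "<input>"},
        {"role": "assistant", "content": "<think>\n<reasoning>\n</think>\n\n<final output>"},
    ]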

Related Issues

#287

Contributor License Agreement

I, @leonardmq, confirm that I have read and agree to the Contributors License Agreement.

Checklists

  • Tests have been run locally and passed
  • New tests have been added to any work in /lib

@leonardmq force-pushed the leonard/fix-qwq-fine-tune-format branch from f1d23cc to db6cf83 on April 24, 2025 18:22
@leonardmq marked this pull request as draft on April 24, 2025 18:28
@leonardmq force-pushed the leonard/fix-qwq-fine-tune-format branch 7 times, most recently from 66d5077 to 36f4f21, on April 25, 2025 10:39
@leonardmq force-pushed the leonard/fix-qwq-fine-tune-format branch from 36f4f21 to bc6aa1e on April 25, 2025 10:42
"Reasoning - Learn intermediate thinking and final response",
],
]}
select_options={data_strategy_select_options}
leonardmq (Collaborator, Author) commented Apr 25, 2025:

Need to fix: changing the model can alter the available data strategies, but the selected data_strategy isn’t updated accordingly, leaving the dropdown with an invalid or empty selection.

Not sure which is better: filtering the set of options, or disabling the unavailable ones. Probably the former.

scosman (Collaborator):

Instead of the reactive data_strategy_select_options = get_data_strategies_supported(base_model_id),

you can do $: update_data_strategies_supported(base_model_id) and, inside that function, set both values, including data_strategy = data_strategy_select_options[0]. Still reactive that way, but they are updated together.

scosman (Collaborator):

Still pending. Updating data_strategy_select_options should always check that data_strategy is still valid, and reset it to the first option if not. Let me know if you want a code sample; Svelte reactivity can be a bit strange.

leonardmq (Collaborator, Author):

The UI should be ready now:

  • fix: the selected model now propagates correctly to narrow down the set of possible strategies
  • fix: show an explicit (R1 compatible) option in the training strategy dropdown if the selected model is a JSON download, except for toolcall datasets or Vertex
  • fix: if the selected dataset has no thinking filter, escalate to an error and block submission

@leonardmq force-pushed the leonard/fix-qwq-fine-tune-format branch from bc6aa1e to b35706b on April 25, 2025 10:46
@leonardmq changed the title from "WIP: fix: QwQ / R1 fine-tuning" to "WIP: fix: QwQ and R1 finetune format" on Apr 25, 2025
scosman (Collaborator) left a comment:

WIP review. Only part way through, but sending these over so you have them.

@@ -48,6 +45,10 @@ class FinetuneProviderModel(BaseModel):

name: str
id: str
data_strategies_supported: list[FinetuneDataStrategy] = [
scosman (Collaborator):

Pydantic has funny default value syntax we should use: https://docs.pydantic.dev/latest/concepts/fields/#default-values
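
For reference, a minimal sketch of that syntax applied to this field (the default list contents here are illustrative, not the actual default):

    from pydantic import BaseModel, Field

    # FinetuneDataStrategy comes from the project (import path omitted)

    class FinetuneProviderModel(BaseModel):
        name: str
        id: str
        # default_factory gives each instance its own list instead of a shared mutable default
        data_strategies_supported: list[FinetuneDataStrategy] = Field(
            default_factory=lambda: [FinetuneDataStrategy.final_only]
        )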

leonardmq (Collaborator, Author):

Fixed

request.data_strategy
== FinetuneDataStrategy.final_and_intermediate_r1_compatible
):
# TODO: check that the base model supports R1 style thinking
scosman (Collaborator):

Just leaving a marker here to complete the TODO

leonardmq (Collaborator, Author) commented Apr 27, 2025:

Added a validator on the CreateFinetuneRequest pydantic model for this (roughly sketched below), and refactored so the same logic is shared with the finetune_providers endpoint for setting the compatible data strategies on each model
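
The shape of that validator, roughly (a sketch; field names and the shared helper are assumptions from this thread, not the verbatim implementation):

    from pydantic import BaseModel, model_validator

    # FinetuneDataStrategy and data_strategies_from_finetune_id come from the project
    # (import paths omitted)

    class CreateFinetuneRequest(BaseModel):
        base_model_id: str
        data_strategy: FinetuneDataStrategy

        @model_validator(mode="after")
        def validate_data_strategy_supported(self) -> "CreateFinetuneRequest":
            # Same logic the finetune_providers endpoint uses to list compatible strategies
            supported = data_strategies_from_finetune_id(self.base_model_id)
            if self.data_strategy not in supported:
                raise ValueError(
                    f"Data strategy '{self.data_strategy}' is not supported by model '{self.base_model_id}'"
                )
            return self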

@@ -326,6 +345,9 @@ async def download_dataset_jsonl(
status_code=400,
detail=f"Data strategy '{data_strategy}' not found",
)

# TODO: handle data_strategy == FinetuneDataStrategy.final_and_intermediate_r1_compatible
scosman (Collaborator):

Marker to complete the TODO

leonardmq (Collaborator, Author):

In the end, no branching was needed here, so I removed the TODO: we let users download whichever data strategy they want when it is a file download (relevant discussion: #294 (comment))

r1_must_not_include = ["distill"]
if any(substring in name.lower() for substring in r1_must_include) and not any(
substring in name.lower() for substring in r1_must_not_include
):
scosman (Collaborator):

We should add tests for all the changes in this file.

In particular a parameterized test of the qwq/R1 models Fireworks has today. List here: https://docs.getkiln.ai/docs/models-and-ai-providers#additional-fine-tuneable-models

final_and_intermediate:
"Reasoning - Learn intermediate thinking and final response",
final_and_intermediate_r1_compatible:
"Reasoning (R1 compatible) - Learn intermediate thinking and final response",
scosman (Collaborator):

We should leave "R1 Compatible" out of the UI. 99% of users will get confused. It's an internal concern they don't need to think about.

All QwQ/R1 fine-tunes have to be trained as R1 reasoning, and you shouldn't be able to select any alternative. So when you pick one of those models we should:

  • not show the dropdown offering a choice, just render "Reasoning - Learn intermediate thinking and final response" in a non-editable way (might already be covered by how you are setting the select options)
  • maybe add a subtitle: "The selected base-model is a reasoning model, so you must train for reasoning."
  • Show an error if their selected dataset isn't filtered to reasoning training data. This isn't a valid training setup. Maybe also raise an error deeper in the stack for API users.

Note: check out the new fancy_select option for fancy UI dropdowns, which might help (optional).

leonardmq (Collaborator, Author):

I agree, but there is an edge case with file downloads (e.g. Download: HuggingFace chat template (JSONL) and the like). The user needs to explicitly express which export format they want, which could conceivably be any one of the 3 formats (final only, final and intermediate, final and intermediate R1 style) depending on what they plan to do with it.

Should we abstract away R1 compatible when a specific model is selected, but expose it as an explicit R1 Compatible option when the user has selected one of the Download: XYZ options as the model?

scosman (Collaborator):

Yup that makes sense.

  • Tuning mode selected: picking an R1 model forces the R1-specific format
  • Download: Show all 3 options, e.g. "Reasoning (multi-message) - Learn intermediate thinking and final response" and "Reasoning (R1 format) - Learn intermediate thinking and final response"

leonardmq (Collaborator, Author):

  • Tuning mode selected: picking an R1 model forces the R1-specific format
  • Download: Show all 3 options, e.g. "Reasoning (multi-message) - Learn intermediate thinking and final response" and "Reasoning (R1 format) - Learn intermediate thinking and final response"

Updated to do the above

bind:value={data_strategy}
/>
{#if data_strategy === "final_and_intermediate" && !selecting_thinking_dataset}
{#if (data_strategy === "final_and_intermediate" || data_strategy === "final_and_intermediate_r1_compatible") && !selecting_thinking_dataset}
scosman (Collaborator):

I think we bump this to an error for R1 style models, and block continuing. Training one of those without thinking data will just be a disaster.

leonardmq (Collaborator, Author):

Added in this commit: defa1ec


def supports_cot(self) -> bool:
return (
self.thinking_instructions is not None
and self.thinking is not None
and self.thinking_final_answer_prompt is not None
) or (
self.thinking_r1_style
scosman (Collaborator):

I think this case becomes a raised error.

A normal model can be tuned for optional thinking, but a model that's natively thinking can't really (we could make it blank, but it would be a disaster, and more likely we'd just be hiding an important bug in the training data). Your call whether that happens here, or whether we still return a bool here and raise a level up.

Also: we should make thinking_instructions optional for R1 style models. They always think, regardless of instructions.

leonardmq (Collaborator, Author) commented Apr 27, 2025:

I see - so for example, if we have an R1 model but no thinking, supports_cot would return False, and the caller incorrectly assumes this is just a normal model that doesn't support thinking. It then trains it without any thinking - and no error pops up anywhere.

Minimal change then could be to simply return True if this is an R1 model:

    def supports_cot(self) -> bool:
        return (
            self.thinking_instructions is not None
            and self.thinking is not None
            and self.thinking_final_answer_prompt is not None
        ) or self.thinking_r1_style

And then raise an error downstream where we try to build it if the thinking is missing:

def serialize_r1_style_message(thinking: str | None, final_output: str):
    if thinking is None or len(thinking.strip()) == 0:
        raise ValueError("Thinking data is required for R1 style")

    return f"<think>{thinking}</think>{final_output}"

However, at that point supports_cot would be mixing two unrelated concepts - checking for the presence of valid thinking data vs. checking for model family - which feels misleading.

Possibly better then to do the following:

  • move the thinking validation into build_training_data (still keep the error in the R1 serialization error just as a last defense)
  • avoid calling supports_cot() at all for R1 models - and raise an error if it gets called on R1 data; and so explicitly move the burden of handling R1 branching to the caller

leonardmq (Collaborator, Author) commented Apr 27, 2025:

Pushed new changes to do what I suggested above:

  • build_training_data now clearly branches differently for R1 models and for COT (roughly sketched after this list); it also raises an error if there is no thinking, and makes thinking_instructions optional for R1
  • still have serialize_r1_style_message raise an exception if there is no thinking
  • supports_cot now cannot be called on R1 models; and so I clearly separated the two cases in each data formatter
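
Roughly, the branching in build_training_data now looks like this (a simplified sketch, not the verbatim implementation):

    # Inside build_training_data (simplified)
    if data_strategy == FinetuneDataStrategy.final_and_intermediate_r1_compatible:
        # R1 path: thinking is mandatory; thinking_instructions is intentionally ignored
        if thinking is None or len(thinking.strip()) == 0:
            raise ValueError("Thinking data is required for R1 style")
        # the message itself is built later via serialize_r1_style_message(...)
    elif data_strategy == FinetuneDataStrategy.final_and_intermediate:
        # COT path: requires thinking, thinking_instructions and thinking_final_answer_prompt
        ...
    else:
        # final_only: train on the final output alone
        ...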

Also wondering what we want to do with the thinking_instructions in the R1 case:

  1. Simply replace the thinking with the thinking_instructions? (i.e. <think>custom thinking instructions here</think>final output)
  2. In the case where thinking is missing, should we throw, or should we still just replace thinking with the provided instruction?

scosman (Collaborator):

Existing changes are good (first 3 bullets)

Note: thinking_instructions != thinking. thinking_instructions would be something like "Think step by step" so it should be at the end of the user message. I think the way you have it is good: we don't include thinking instructions in a fine-tune of a thinking model.

For COT it makes sense to fine-tune with it. The source data required the prompt to generate the COT, and we need a system message in COT tuning. Here we "think" automatically, so just learn from the thinking not the prompt.

@@ -82,6 +87,11 @@ def build_training_data(
thinking_final_answer_prompt = None
parent_task = task_run.parent_task()

include_cot = (
scosman (Collaborator):

Somewhat duplicative of thinking_compatible_data_strategies

Define thinking_compatible_data_strategies where we define FinetuneDataStrategy?

leonardmq (Collaborator, Author):

Extracted this into a constant (THINKING_DATA_STRATEGIES)
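
Presumably along these lines (a sketch; the exact definition lives alongside FinetuneDataStrategy):

    # Data strategies that train on intermediate thinking, not just the final output
    THINKING_DATA_STRATEGIES: list[FinetuneDataStrategy] = [
        FinetuneDataStrategy.final_and_intermediate,
        FinetuneDataStrategy.final_and_intermediate_r1_compatible,
    ]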

)


def serialize_r1_style_message(thinking: str | None, final_output: str):
if thinking is None or len(thinking) == 0:
scosman (Collaborator):

Maybe this is where we raise the error?

leonardmq (Collaborator, Author):

Raising a ValueError from here now if empty / missing thinking.

Also now raising one level up in build_training_data if the model is R1 to align with where other similar validation is being done for the COT case

name: str,
) -> list[FinetuneDataStrategy]:
r1_must_include = ["r1", "qwq"]
r1_must_not_include = ["distill"]
scosman (Collaborator):

My mistake, but distill models should be treated like normal QwQ/R1 models. They have the same formatting.

leonardmq (Collaborator, Author):

Fixed now (it was my mistake; and I thought it was genius for a minute, then realized it wasn't)

scosman (Collaborator):

Not to fight you for credit, but I texted you "contains qwq or r1 and does not contain distill" 😆

@leonardmq force-pushed the leonard/fix-qwq-fine-tune-format branch from 8e290cc to be33978 on April 27, 2025 17:36
@leonardmq force-pushed the leonard/fix-qwq-fine-tune-format branch from 199e2b1 to 6a1f7d1 on April 27, 2025 19:50
leonardmq (Collaborator, Author) commented Apr 27, 2025:

TODOs left:

  • more tests for the build_training_data branches and the finetune creation endpoint
  • handle thinking_instructions for R1 models; it currently does nothing
  • UI fixes: will get back to these once the backend is finished
    • Do not show the explicit R1 label in the data strategy dropdown, except for dataset downloads (as opposed to models)
    • Fix the awkwardness where changing the model leaves the dropdown with a selected option that no longer exists

The tests for the logic that determines which models support which data strategies (7c7c637) run against a local fixture list of fake models. We should possibly add some against the real built_in_models list as well.

scosman (Collaborator) left a comment:

Some small changes, mostly strings. Let me know if you want example code for the Svelte one.


assert (
infer_data_strategies_for_model(mock_available_models, model_id, provider)
== expected_data_strategies
)
scosman (Collaborator):

I'd add a parameterized test with expected real values from the API. Just on data_strategies_from_finetune_id so the parameter is short and sweet.

Expect R1:

  • qwq-32b
  • DeepSeek R1 (Fast) (deepseek-r1)
  • DeepSeek R1 (Basic) (deepseek-r1-basic)
  • deepseek-r1-distill-llama-70b
  • Deepseek R1 Distill Llama 8B (deepseek-r1-distill-llama-8b)
  • Deepseek R1 Distill Qwen 14B (deepseek-r1-distill-qwen-14b)
  • Deepseek R1 Distill Qwen 1.5B (deepseek-r1-distill-qwen-1p5b)
  • deepseek-r1-distill-qwen-32b
  • Deepseek R1 Distill Qwen 7B (deepseek-r1-distill-qwen-7b)

Some ones we expect false:

  • DeepSeek V3 (deepseek-v3)
  • Deepseek V3 03-24 (deepseek-v3-0324)

And some new Qwen 3 (TBD, see convo above):

  • Qwen 3, 30B-A3B version (qwen3-30b-a3b)

leonardmq (Collaborator, Author):

I’ve added a test (including the qwen3 case) in commit 86c8512

Note: the validation targets the model’s id, not its name. Checking the name would require fetching it from Fireworks.ai, which would in turn require an async Pydantic validator (which does not seem standard / supported) and extra wiring to attach data strategies per model; or alternatively some caching of the Fireworks.ai response, which may come with other problems.

Since we only need this for Fireworks.ai currently - and their IDs always include the necessary keywords - validating on id should hold up well enough until we move to the remote JSON model registry.
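
The shape of such a test might look like this (a sketch; ids and expected strategies taken from the discussion above, compared as sets since the order is an implementation detail):

    import pytest

    # FinetuneDataStrategy and data_strategies_from_finetune_id come from the project
    # (import paths omitted)

    @pytest.mark.parametrize(
        "finetune_id,expected",
        [
            # QwQ/R1 models, including distills: R1-style thinking only
            ("qwq-32b", {FinetuneDataStrategy.final_and_intermediate_r1_compatible}),
            ("deepseek-r1-distill-llama-70b", {FinetuneDataStrategy.final_and_intermediate_r1_compatible}),
            # Non-thinking models: the pre-existing strategies
            ("deepseek-v3", {FinetuneDataStrategy.final_only, FinetuneDataStrategy.final_and_intermediate}),
            # Qwen 3: thinking is optional (/think, /no_think)
            ("qwen3-30b-a3b", {FinetuneDataStrategy.final_only, FinetuneDataStrategy.final_and_intermediate_r1_compatible}),
        ],
    )
    def test_data_strategies_from_finetune_id(finetune_id, expected):
        assert set(data_strategies_from_finetune_id(finetune_id)) == expected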

def data_strategies_from_finetune_id(
provider_finetune_id: str,
) -> list[FinetuneDataStrategy]:
r1_must_include = ["r1", "qwq"]
scosman (Collaborator):

And Qwen 3 already requires a change 😀

We should add "qwen3", and qwen3 should return final_only, final_and_intermediate_r1_compatible (since it can do both thinking and non-thinking with the /think and /no_think directives)

Let's maybe split this into another PR?

leonardmq (Collaborator, Author):

I added a pattern match in data_strategies_from_finetune_id to target qwen3 and only allow ['final_and_intermediate_r1_compatible', 'final_only'] - just to be able to test the allowed strategies in a parameterized test (see the sketch below). I have not implemented any other logic for qwen3 besides that.
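
Roughly, the resulting matching logic (a sketch, not the verbatim implementation):

    def data_strategies_from_finetune_id(
        provider_finetune_id: str,
    ) -> list[FinetuneDataStrategy]:
        model_id = provider_finetune_id.lower()
        # Qwen 3 can be tuned with or without thinking (/think, /no_think)
        if "qwen3" in model_id:
            return [
                FinetuneDataStrategy.final_and_intermediate_r1_compatible,
                FinetuneDataStrategy.final_only,
            ]
        # QwQ/R1 models, including distills, must use R1-style thinking
        if any(keyword in model_id for keyword in ["r1", "qwq"]):
            return [FinetuneDataStrategy.final_and_intermediate_r1_compatible]
        # Everything else keeps the pre-existing strategies
        return [
            FinetuneDataStrategy.final_only,
            FinetuneDataStrategy.final_and_intermediate,
        ]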

if data_strategy == FinetuneDataStrategy.final_and_intermediate_r1_compatible:
if not task_run.has_thinking_training_data() or not thinking:
raise ValueError(
"Thinking data is required for R1 style. Please save a reasoning or chain of thought output."
scosman (Collaborator):

"Thinking data is required when fine-tuning thinking models (R1, QwQ, etc). Please ensure your fine-tuning dataset contains reasoning or chain of thought output for every entry."

leonardmq (Collaborator, Author):

Updated

)


def serialize_r1_style_message(thinking: str | None, final_output: str):
if thinking is None or len(thinking.strip()) == 0:
raise ValueError("Thinking data is required for R1 style")
scosman (Collaborator):

"Thinking data is required when fine-tuning thinking models (R1, QwQ, etc). Please ensure your fine-tuning dataset contains reasoning or chain of thought output for every entry."

leonardmq (Collaborator, Author):

Updated



if thinking is None or len(thinking.strip()) == 0:
raise ValueError("Thinking data is required for R1 style")

return f"<think>{thinking}</think>{final_output}"
scosman (Collaborator):

f"<think>\n{thinking}\n</think>\n\n{final_output}" seems more standard

leonardmq (Collaborator, Author):

Updated to add the newlines - also updated tests accordingly
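
So the serializer ends up roughly as follows (assembled from the snippets in this thread, using the error string suggested above):

    def serialize_r1_style_message(thinking: str | None, final_output: str) -> str:
        if thinking is None or len(thinking.strip()) == 0:
            raise ValueError(
                "Thinking data is required when fine-tuning thinking models (R1, QwQ, etc). "
                "Please ensure your fine-tuning dataset contains reasoning or chain of "
                "thought output for every entry."
            )
        return f"<think>\n{thinking}\n</think>\n\n{final_output}"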

thinking=training_data.thinking,
final_output=training_data.final_output,
),
"tool_calls": tool_calls,
scosman (Collaborator):

Is this valid? Should it be 2 assistant messages? I can look up but don't want to if you already have.

leonardmq (Collaborator, Author) commented Apr 30, 2025:

Was unsure as well - checked by making some calls to Fireworks and OpenAI:

  1. it seems that the model always responds with either a toolcall or a message; not both simultaneously (even if SDK typing may allow for it); so this format is most probably wrong
  2. seems like we can get a reasoning summary alongside a toolcall with OpenAI's reasoning models at least (see below for an example)
  3. seems like Fireworks does not support toolcalls on the reasoning models they have

Trying to figure out what we get with LiteLLM now.

Here's an example request / response for 2.:

Endpoint: POST https://api.openai.com/v1/responses

Request:

{
    "model": "o4-mini",
    "reasoning": {
        "effort": "low",
        "summary": "auto"
    },
    "input": [
        {
            "role": "developer",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "If I move 100km outside of Taichung heading east, should I need an umbrella tomorrow?"
        }
    ],
    "tools": [
        {
            "type": "function",
            "name": "get_weather",
            "description": "Get current temperature for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country e.g. Bogotá, Colombia"
                    }
                },
                "required": [
                    "location"
                ],
                "additionalProperties": false
            }
        }
    ]
}

Response:

{
    "id": "resp_xxx",
    "object": "response",
    "created_at": 1746039896,
    "status": "completed",
    "error": null,
    "incomplete_details": null,
    "instructions": null,
    "max_output_tokens": null,
    "model": "o4-mini-2025-04-16",
    "output": [
        {
            "id": "rs_xxxxx",
            "type": "reasoning",
            "summary": [
                {
                    "type": "summary_text",
                    "text": "**Determining weather for a travel location**\n\nThe user is moving 100km east of Taichung, which puts them in the Pacific Ocean. I wonder if I should check the nearest city for weather instead. Should I use coordinates or simply go with a city like Hualien, which is to the east of Taichung? Hualien is about 140km away, so maybe there’s a closer option like Yuli, but I’ll stick with Hualien for simplicity. I’ll check the weather there and suggest bringing an umbrella!"
                }
            ]
        },
        {
            "id": "fc_xxx",
            "type": "function_call",
            "status": "completed",
            "arguments": "{\"location\":\"Hualien, Taiwan\"}",
            "call_id": "call_xxx",
            "name": "get_weather"
        }
    ],
    "parallel_tool_calls": true,
    "previous_response_id": null,
    "reasoning": {
        "effort": "low",
        "summary": "detailed"
    },
    "service_tier": "default",
    "store": true,
    "temperature": 1.0,
    "text": {
        "format": {
            "type": "text"
        }
    },
    "tool_choice": "auto",
    "tools": [
        {
            "type": "function",
            "description": "Get current temperature for a given location.",
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country e.g. Bogotá, Colombia"
                    }
                },
                "required": [
                    "location"
                ],
                "additionalProperties": false
            },
            "strict": true
        }
    ],
    "top_p": 1.0,
    "truncation": "disabled",
    "usage": {
        "input_tokens": 83,
        "input_tokens_details": {
            "cached_tokens": 0
        },
        "output_tokens": 149,
        "output_tokens_details": {
            "reasoning_tokens": 128
        },
        "total_tokens": 232
    },
    "user": null,
    "metadata": {}
}

leonardmq (Collaborator, Author):

Disabled the R1 compatible strategy for JSON toolcall downloads (also for Vertex); on both frontend and backend

"content": serialize_r1_style_message(
thinking=training_data.thinking,
final_output=training_data.final_output,
),
scosman (Collaborator):

same question

leonardmq (Collaborator, Author):

Disabled the R1 compatible strategy for JSON toolcall downloads (also for Vertex); on both frontend and backend

@@ -305,7 +421,28 @@ def generate_vertex_gemini(
}
]

if training_data.supports_cot():
if training_data.thinking_r1_style:
contents.extend(
scosman (Collaborator):

Maybe raise an error here? No gemini model supports R1 style thinking.

leonardmq (Collaborator, Author):

The thinking there was that Vertex AI seems to support custom models - I've never used it so I'm not exactly sure it works the way I understand, but a couple of references suggest it is a thing.

That being said, we don't seem to support custom models deployed to Vertex at this point; so probably indeed best to just raise for now

@@ -397,7 +527,7 @@ def dump_to_file(

generator = FORMAT_GENERATORS[format_type]

include_cot = data_strategy == FinetuneDataStrategy.final_and_intermediate
include_cot = data_strategy in THINKING_DATA_STRATEGIES
scosman (Collaborator):

We're only using this for title now? Maybe better titles would be:

  • final_and_intermediate_r1: "thinking"
  • final_and_intermediate: "chain_of_thought"
  • anything else: "final_only"
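
For example (a sketch of the suggested mapping; the helper and constant names are assumptions):

    # Title suffix per data strategy for downloaded dataset files
    TITLE_BY_STRATEGY = {
        FinetuneDataStrategy.final_and_intermediate_r1_compatible: "thinking",
        FinetuneDataStrategy.final_and_intermediate: "chain_of_thought",
    }

    def title_for_strategy(data_strategy: FinetuneDataStrategy) -> str:
        return TITLE_BY_STRATEGY.get(data_strategy, "final_only")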

leonardmq (Collaborator, Author):

I think we can change the string values; I have not found places where this would be problematic - we use them in the frontend, but they are typed according to the API schema so will fail typecheck if not updated at the same time.

Also agree changing this might decrease confusion - as R1 vs COT are different concepts rather than just variants of formatting.

Would suggest making the change in a separate PR (I can do it)

@@ -108,6 +108,11 @@
}

function build_available_model_select(models: FinetuneProvider[]) {
for (const model of models) {
for (const provider of model.models) {
console.log(model.id, provider.name, provider.id)
scosman (Collaborator):

should remove

leonardmq (Collaborator, Author):

This was already removed - possibly not reviewing the latest commits?

scosman (Collaborator):

was reviewing commit by commit since it's gotten large. Looks good now!

@@ -580,7 +579,7 @@
</div>
{:else}
<FormContainer
{submit_visible}
submit_visible={submit_visible && !config_is_error}
scosman (Collaborator) commented May 1, 2025:

A tag based dataset might be correctly filtered to thinking, and not have selecting_thinking_dataset==true.

Let's leave this as a warning (you have that below), and not disable submit. We'll catch the issue at runtime.

leonardmq (Collaborator, Author) commented May 1, 2025:

Turned it back into a warning instead of an error; it no longer blocks submit

@@ -502,6 +503,25 @@
: "Reasoning - Learn intermediate thinking and final response",
}

console.info({
scosman (Collaborator):

remove

leonardmq (Collaborator, Author):

Removed

{#if data_strategy === "final_and_intermediate_r1_compatible" && !selecting_thinking_dataset}
<Warning
warning_color="error"
warning_message="You are training a model of the R1 family for inference-time thinking, but are not using a dataset filtered to samples with reasoning or chain-of-thought training data. This is strongly not recommended, as it may lead to poor performance. We suggest creating a new dataset with a thinking filter."
scosman (Collaborator):

"You are training a "thinking" model, but did not explicitly select a dataset filtered to samples with reasoning or chain-of-thought training data. If any of your training samples are missing reasoning data, it will error. If your data contains reasoning, you can ignore this warning."

leonardmq (Collaborator, Author):

Updated the string (and turned it back into a warning instead of an error)

@leonardmq changed the title from "WIP: fix: QwQ and R1 finetune format" to "fix: QwQ and R1 finetune format" on May 1, 2025
@leonardmq marked this pull request as ready for review on May 1, 2025 16:13
@scosman merged commit 4d25808 into Kiln-AI:main on May 1, 2025 (2 checks passed)
@leonardmq mentioned this pull request on May 1, 2025