fix: QwQ and R1 finetune format #294

Merged: 20 commits from leonard/fix-qwq-fine-tune-format into Kiln-AI:main on May 1, 2025

Conversation

leonardmq (Collaborator):

What does this PR do?

  • fix: the current fine-tuning format for QwQ and R1 models is not suitable; we currently pass the reasoning string via a multi-message prompt, but it should be included in the same assistant message, wrapped in <think></think> tags (see the sketch after the change list below)

The fix involves the following changes:

  • add a new FinetuneDataStrategy (final_and_intermediate_r1_compatible)
  • expose the data strategies supported by each model in GET /api/finetune_providers, and update the UI to narrow down the Model Type / Training Strategy options
  • update the dataset formatting generators to format the messages according to the data strategy
  • use the matching data strategy at inference time on the fine-tuned model
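
For illustration, here is a rough sketch of the two formats (message contents are placeholders, and the multi-message shape is simplified rather than the exact generator output):

    # Before (final_and_intermediate): reasoning passed via a multi-message exchange
    multi_message = [
        {"role": "system", "content": "<system prompt>"},
        {"role": "user", "content": "<input, plus thinking instructions>"},
        {"role": "assistant", "content": "<reasoning>"},
        {"role": "user", "content": "<thinking final answer prompt>"},
        {"role": "assistant", "content": "<final output>"},
    ]

    # After (final_and_intermediate_r1_compatible): reasoning inlined into the same
    # assistant message, wrapped in <think></think> tags
    r1_compatible = [
        {"role": "system", "content": "<system prompt>"},
        {"role": "user", "content": "<input>"},
        {"role": "assistant", "content": "<think>\n<reasoning>\n</think>\n\n<final output>"},
    ]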

Related Issues

#287

Contributor License Agreement

I, @leonardmq, confirm that I have read and agree to the Contributors License Agreement.

Checklists

  • Tests have been run locally and passed
  • New tests have been added to any work in /lib

@leonardmq force-pushed the leonard/fix-qwq-fine-tune-format branch from f1d23cc to db6cf83 on April 24, 2025 18:22
@leonardmq marked this pull request as draft on April 24, 2025 18:28
@leonardmq force-pushed the leonard/fix-qwq-fine-tune-format branch 7 times, most recently from 66d5077 to 36f4f21, on April 25, 2025 10:39
@leonardmq force-pushed the leonard/fix-qwq-fine-tune-format branch from 36f4f21 to bc6aa1e on April 25, 2025 10:42
"Reasoning - Learn intermediate thinking and final response",
],
]}
select_options={data_strategy_select_options}
leonardmq (Collaborator, Author) commented Apr 25, 2025:

Need to fix: changing the model can alter the available data strategies, but the selected data_strategy isn’t updated accordingly, leaving the dropdown with an invalid or empty selection.

Not sure which is better: filtering the set of options, or disabling the unavailable ones. Probably the former.

scosman (Collaborator):

Instead of the reactive data_strategy_select_options = get_data_strategies_supported(base_model_id),

you can do $: update_data_strategies_supported(base_model_id) and, inside that function, set both values, including data_strategy = data_strategy_select_options[0]. Still reactive that way, but they are updated together.

scosman (Collaborator):

Still pending. Updating data_strategy_select_options should always check that data_strategy is still valid, and reset it to the first option if not. Let me know if you want a code sample; Svelte reactivity can be a bit strange.

leonardmq (Collaborator, Author):

The UI should be ready now:

  • fix: the selected model now propagates correctly to narrow down the set of possible strategies
  • fix: show an explicit (R1 compatible) option in the training strategy dropdown if the selected model is a JSON download, except for toolcall datasets or Vertex
  • fix: if the selected dataset has no thinking filter, escalate to an error and block submission

@leonardmq force-pushed the leonard/fix-qwq-fine-tune-format branch from bc6aa1e to b35706b on April 25, 2025 10:46
@leonardmq changed the title from "WIP: fix: QwQ / R1 fine-tuning" to "WIP: fix: QwQ and R1 finetune format" on Apr 25, 2025
scosman (Collaborator) left a comment:

WIP review. Only part way through, but sending these over so you have them.

@@ -48,6 +45,10 @@ class FinetuneProviderModel(BaseModel):

name: str
id: str
data_strategies_supported: list[FinetuneDataStrategy] = [
scosman (Collaborator):

Pydantic has funny default value syntax we should use: https://docs.pydantic.dev/latest/concepts/fields/#default-values
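
For reference, a minimal sketch of that syntax applied to this field (the default list contents here are illustrative, not the actual default):

    from pydantic import BaseModel, Field

    # FinetuneDataStrategy comes from the project (import path omitted)

    class FinetuneProviderModel(BaseModel):
        name: str
        id: str
        # default_factory gives each instance its own list instead of a shared mutable default
        data_strategies_supported: list[FinetuneDataStrategy] = Field(
            default_factory=lambda: [FinetuneDataStrategy.final_only]
        )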

leonardmq (Collaborator, Author):

Fixed

request.data_strategy
== FinetuneDataStrategy.final_and_intermediate_r1_compatible
):
# TODO: check that the base model supports R1 style thinking
scosman (Collaborator):

Just leaving a marker here to complete the TODO

leonardmq (Collaborator, Author) commented Apr 27, 2025:

Added a validator on the CreateFinetuneRequest pydantic model for this (roughly sketched below), and refactored so the same logic is shared with the finetune_providers endpoint for setting the compatible data strategies on each model
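
The shape of that validator, roughly (a sketch; field names and the shared helper are assumptions from this thread, not the verbatim implementation):

    from pydantic import BaseModel, model_validator

    # FinetuneDataStrategy and data_strategies_from_finetune_id come from the project
    # (import paths omitted)

    class CreateFinetuneRequest(BaseModel):
        base_model_id: str
        data_strategy: FinetuneDataStrategy

        @model_validator(mode="after")
        def validate_data_strategy_supported(self) -> "CreateFinetuneRequest":
            # Same logic the finetune_providers endpoint uses to list compatible strategies
            supported = data_strategies_from_finetune_id(self.base_model_id)
            if self.data_strategy not in supported:
                raise ValueError(
                    f"Data strategy '{self.data_strategy}' is not supported by model '{self.base_model_id}'"
                )
            return self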

@@ -326,6 +345,9 @@ async def download_dataset_jsonl(
status_code=400,
detail=f"Data strategy '{data_strategy}' not found",
)

# TODO: handle data_strategy == FinetuneDataStrategy.final_and_intermediate_r1_compatible
scosman (Collaborator):

Marker to complete the TODO

leonardmq (Collaborator, Author):

In the end, no branching was needed here, so I removed the TODO: we let users download whichever data strategy they want when it is a file download (relevant discussion: #294 (comment))

r1_must_not_include = ["distill"]
if any(substring in name.lower() for substring in r1_must_include) and not any(
substring in name.lower() for substring in r1_must_not_include
):
scosman (Collaborator):

We should add tests for all the changes in this file.

In particular a parameterized test of the qwq/R1 models Fireworks has today. List here: https://docs.getkiln.ai/docs/models-and-ai-providers#additional-fine-tuneable-models

final_and_intermediate:
"Reasoning - Learn intermediate thinking and final response",
final_and_intermediate_r1_compatible:
"Reasoning (R1 compatible) - Learn intermediate thinking and final response",
scosman (Collaborator):

We should leave "R1 Compatible" out of the UI. 99% of users will get confused. It's an internal concern they don't need to think about.

All QwQ/R1 fine-tunes have to be trained as R1 reasoning, and you shouldn't be able to select any alternative. So when you pick one of those models we should:

  • not show the dropdown offering a choice, just render "Reasoning - Learn intermediate thinking and final response" in a non-editable way (might already be covered by how you are setting the select options)
  • maybe add a subtitle: "The selected base-model is a reasoning model, so you must train for reasoning."
  • Show an error if their selected dataset isn't filtered to reasoning training data. This isn't a valid training setup. Maybe also raise an error deeper in the stack for API users.

Note: check out the new fancy_select option for fancy UI dropdowns, which might help (optional).

leonardmq (Collaborator, Author):

I agree, but there is an edge case with file downloads (e.g. Download: HuggingFace chat template (JSONL) and the like). The user needs to explicitly express which export format they want, which could conceivably be any one of the 3 formats (final only, final and intermediate, final and intermediate R1 style) depending on what they plan to do with it.

Should we abstract away R1 compatible when a specific model is selected, but expose it as an explicit R1 Compatible option when the user has selected one of the Download: XYZ options as the model?

scosman (Collaborator):

Yup that makes sense.

  • Tuning mode selected: picking an R1 model forces the R1-specific format
  • Download: Show all 3 options, e.g. "Reasoning (multi-message) - Learn intermediate thinking and final response" and "Reasoning (R1 format) - Learn intermediate thinking and final response"

leonardmq (Collaborator, Author):

  • Tuning mode selected: picking an R1 model forces the R1-specific format
  • Download: Show all 3 options, e.g. "Reasoning (multi-message) - Learn intermediate thinking and final response" and "Reasoning (R1 format) - Learn intermediate thinking and final response"

Updated to do the above

bind:value={data_strategy}
/>
{#if data_strategy === "final_and_intermediate" && !selecting_thinking_dataset}
{#if (data_strategy === "final_and_intermediate" || data_strategy === "final_and_intermediate_r1_compatible") && !selecting_thinking_dataset}
scosman (Collaborator):

I think we bump this to an error for R1 style models, and block continuing. Training one of those without thinking data will just be a disaster.

leonardmq (Collaborator, Author):

Added in this commit: defa1ec


def supports_cot(self) -> bool:
return (
self.thinking_instructions is not None
and self.thinking is not None
and self.thinking_final_answer_prompt is not None
) or (
self.thinking_r1_style
scosman (Collaborator):

I think this case becomes a raised error.

A normal model can be tuned for optional thinking, but a model that's natively thinking can't really (we could make it blank, but it would be a disaster, and more likely we'd just be hiding an important bug in the training data). Your call whether that happens here, or whether we still return a bool here and raise a level up.

Also: we should make thinking_instructions optional for R1 style models. They always think, regardless of instructions.

leonardmq (Collaborator, Author) commented Apr 27, 2025:

I see - so for example, if we have an R1 model but no thinking, supports_cot would return False, and the caller incorrectly assumes this is just a normal model that doesn't support thinking. It then trains it without any thinking - and no error pops up anywhere.

Minimal change then could be to simply return True if this is an R1 model:

    def supports_cot(self) -> bool:
        return (
            self.thinking_instructions is not None
            and self.thinking is not None
            and self.thinking_final_answer_prompt is not None
        ) or self.thinking_r1_style

And then raise an error downstream where we try to build it if the thinking is missing:

def serialize_r1_style_message(thinking: str | None, final_output: str):
    if thinking is None or len(thinking.strip()) == 0:
        raise ValueError("Thinking data is required for R1 style")

    return f"<think>{thinking}</think>{final_output}"

However, at that point supports_cot would be mixing two unrelated concepts - checking for the presence of valid thinking data vs. checking for model family - which feels misleading.

Possibly better then to do the following:

  • move the thinking validation into build_training_data (still keep the error in the R1 serialization error just as a last defense)
  • avoid calling supports_cot() at all for R1 models - and raise an error if it gets called on R1 data; and so explicitly move the burden of handling R1 branching to the caller

leonardmq (Collaborator, Author) commented Apr 27, 2025:

Pushed new changes to do what I suggested above:

  • build_training_data now clearly branches differently for R1 models and for COT (roughly sketched after this list); it also raises an error if there is no thinking, and makes thinking_instructions optional for R1
  • still have serialize_r1_style_message raise an exception if there is no thinking
  • supports_cot now cannot be called on R1 models; and so I clearly separated the two cases in each data formatter
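
Roughly, the branching in build_training_data now looks like this (a simplified sketch, not the verbatim implementation):

    # Inside build_training_data (simplified)
    if data_strategy == FinetuneDataStrategy.final_and_intermediate_r1_compatible:
        # R1 path: thinking is mandatory; thinking_instructions is intentionally ignored
        if thinking is None or len(thinking.strip()) == 0:
            raise ValueError("Thinking data is required for R1 style")
        # the message itself is built later via serialize_r1_style_message(...)
    elif data_strategy == FinetuneDataStrategy.final_and_intermediate:
        # COT path: requires thinking, thinking_instructions and thinking_final_answer_prompt
        ...
    else:
        # final_only: train on the final output alone
        ...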

Also wondering what we want to do with the thinking_instructions in the R1 case:

  1. Simply replace the thinking with the thinking_instructions? (i.e. <think>custom thinking instructions here</think>final output)
  2. In the case where thinking is missing, should we throw, or should we still just replace thinking with the provided instruction?

scosman (Collaborator):

Existing changes are good (first 3 bullets)

Note: thinking_instructions != thinking. thinking_instructions would be something like "Think step by step" so it should be at the end of the user message. I think the way you have it is good: we don't include thinking instructions in a fine-tune of a thinking model.

For COT it makes sense to fine-tune with it. The source data required the prompt to generate the COT, and we need a system message in COT tuning. Here we "think" automatically, so just learn from the thinking not the prompt.

@@ -82,6 +87,11 @@ def build_training_data(
thinking_final_answer_prompt = None
parent_task = task_run.parent_task()

include_cot = (
scosman (Collaborator):

Somewhat duplicative of thinking_compatible_data_strategies

Define thinking_compatible_data_strategies where we define FinetuneDataStrategy?

leonardmq (Collaborator, Author):

Extracted this into a constant (THINKING_DATA_STRATEGIES)
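
Presumably along these lines (a sketch; the exact definition lives alongside FinetuneDataStrategy):

    # Data strategies that train on intermediate thinking, not just the final output
    THINKING_DATA_STRATEGIES: list[FinetuneDataStrategy] = [
        FinetuneDataStrategy.final_and_intermediate,
        FinetuneDataStrategy.final_and_intermediate_r1_compatible,
    ]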

)


def serialize_r1_style_message(thinking: str | None, final_output: str):
if thinking is None or len(thinking) == 0:
scosman (Collaborator):

Maybe this is where we raise the error?

leonardmq (Collaborator, Author):

Raising a ValueError from here now if empty / missing thinking.

Also now raising one level up in build_training_data if the model is R1 to align with where other similar validation is being done for the COT case

name: str,
) -> list[FinetuneDataStrategy]:
r1_must_include = ["r1", "qwq"]
r1_must_not_include = ["distill"]
scosman (Collaborator):

My mistake, but distill models should be treated like normal QwQ/R1 models. They have the same formatting.

leonardmq (Collaborator, Author):

Fixed now (it was my mistake; and I thought it was genius for a minute, then realized it wasn't)

scosman (Collaborator):

Not to fight you for credit, but I texted you "contains qwq or r1 and does not contain distill" 😆

@leonardmq force-pushed the leonard/fix-qwq-fine-tune-format branch from 8e290cc to be33978 on April 27, 2025 17:36
@leonardmq force-pushed the leonard/fix-qwq-fine-tune-format branch from 199e2b1 to 6a1f7d1 on April 27, 2025 19:50
leonardmq (Collaborator, Author) commented Apr 27, 2025:

TODOs left:

  • more tests for the build_training_data branches and the finetune creation endpoint
  • handle thinking_instructions for R1 models; it currently does nothing
  • UI fixes: will get back to these once the backend is finished
    • Do not show the explicit R1 label in the data strategy dropdown, except for dataset downloads (as opposed to models)
    • Fix the awkwardness where changing the model leaves the dropdown with a selected option that no longer exists

The tests for the logic that determines which models support which data strategies (7c7c637) run against a local fixture list of fake models. We should possibly add some against the real built_in_models list as well.

scosman (Collaborator) left a comment:

Some small changes, mostly strings. Let me know if you want example code for the Svelte one.


assert (
infer_data_strategies_for_model(mock_available_models, model_id, provider)
== expected_data_strategies
)
scosman (Collaborator):

I'd add a parameterized test with expected real values from the API. Just on data_strategies_from_finetune_id so the parameter is short and sweet.

Expect R1:

  • qwq-32b
  • DeepSeek R1 (Fast) (deepseek-r1)
  • DeepSeek R1 (Basic) (deepseek-r1-basic)
  • deepseek-r1-distill-llama-70b
  • Deepseek R1 Distill Llama 8B (deepseek-r1-distill-llama-8b)
  • Deepseek R1 Distill Qwen 14B (deepseek-r1-distill-qwen-14b)
  • Deepseek R1 Distill Qwen 1.5B (deepseek-r1-distill-qwen-1p5b)
  • deepseek-r1-distill-qwen-32b
  • Deepseek R1 Distill Qwen 7B (deepseek-r1-distill-qwen-7b)

Some ones we expect false:

  • DeepSeek V3 (deepseek-v3)
  • Deepseek V3 03-24 (deepseek-v3-0324)

And some new Qwen 3 (TBD, see convo above):

  • Qwen 3, 30B-A3B version (qwen3-30b-a3b)

leonardmq (Collaborator, Author):

I’ve added a test (including the qwen3 case) in commit 86c8512

Note: the validation targets the model’s id, not its name. Checking the name would require fetching it from Fireworks.ai, which would in turn require an async Pydantic validator (which does not seem standard / supported) and extra wiring to attach data strategies per model; or alternatively some caching of the Fireworks.ai response, which may come with other problems.

Since we only need this for Fireworks.ai currently - and their IDs always include the necessary keywords - validating on id should hold up well enough until we move to the remote JSON model registry.
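
The shape of such a test might look like this (a sketch; ids and expected strategies taken from the discussion above, compared as sets since the order is an implementation detail):

    import pytest

    # FinetuneDataStrategy and data_strategies_from_finetune_id come from the project
    # (import paths omitted)

    @pytest.mark.parametrize(
        "finetune_id,expected",
        [
            # QwQ/R1 models, including distills: R1-style thinking only
            ("qwq-32b", {FinetuneDataStrategy.final_and_intermediate_r1_compatible}),
            ("deepseek-r1-distill-llama-70b", {FinetuneDataStrategy.final_and_intermediate_r1_compatible}),
            # Non-thinking models: the pre-existing strategies
            ("deepseek-v3", {FinetuneDataStrategy.final_only, FinetuneDataStrategy.final_and_intermediate}),
            # Qwen 3: thinking is optional (/think, /no_think)
            ("qwen3-30b-a3b", {FinetuneDataStrategy.final_only, FinetuneDataStrategy.final_and_intermediate_r1_compatible}),
        ],
    )
    def test_data_strategies_from_finetune_id(finetune_id, expected):
        assert set(data_strategies_from_finetune_id(finetune_id)) == expected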

def data_strategies_from_finetune_id(
provider_finetune_id: str,
) -> list[FinetuneDataStrategy]:
r1_must_include = ["r1", "qwq"]
scosman (Collaborator):

And Qwen 3 already requires a change 😀

We should add "qwen3", and qwen3 should return final_only, final_and_intermediate_r1_compatible (since it can do both thinking and non-thinking with the /think and /no_think directives)

Let's maybe split this into another PR?

leonardmq (Collaborator, Author):

I added a pattern match in data_strategies_from_finetune_id to target qwen3 and only allow ['final_and_intermediate_r1_compatible', 'final_only'] - just to be able to test the allowed strategies in a parameterized test (see the sketch below). I have not implemented any other logic for qwen3 besides that.
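
Roughly, the resulting matching logic (a sketch, not the verbatim implementation):

    def data_strategies_from_finetune_id(
        provider_finetune_id: str,
    ) -> list[FinetuneDataStrategy]:
        model_id = provider_finetune_id.lower()
        # Qwen 3 can be tuned with or without thinking (/think, /no_think)
        if "qwen3" in model_id:
            return [
                FinetuneDataStrategy.final_and_intermediate_r1_compatible,
                FinetuneDataStrategy.final_only,
            ]
        # QwQ/R1 models, including distills, must use R1-style thinking
        if any(keyword in model_id for keyword in ["r1", "qwq"]):
            return [FinetuneDataStrategy.final_and_intermediate_r1_compatible]
        # Everything else keeps the pre-existing strategies
        return [
            FinetuneDataStrategy.final_only,
            FinetuneDataStrategy.final_and_intermediate,
        ]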

if data_strategy == FinetuneDataStrategy.final_and_intermediate_r1_compatible:
if not task_run.has_thinking_training_data() or not thinking:
raise ValueError(
"Thinking data is required for R1 style. Please save a reasoning or chain of thought output."
scosman (Collaborator):

"Thinking data is required when fine-tuning thinking models (R1, QwQ, etc). Please ensure your fine-tuning dataset contains reasoning or chain of thought output for every entry."

leonardmq (Collaborator, Author):

Updated

)


def serialize_r1_style_message(thinking: str | None, final_output: str):
if thinking is None or len(thinking.strip()) == 0:
raise ValueError("Thinking data is required for R1 style")
scosman (Collaborator):

"Thinking data is required when fine-tuning thinking models (R1, QwQ, etc). Please ensure your fine-tuning dataset contains reasoning or chain of thought output for every entry."

leonardmq (Collaborator, Author):

Updated



if thinking is None or len(thinking.strip()) == 0:
raise ValueError("Thinking data is required for R1 style")

return f"<think>{thinking}</think>{final_output}"
scosman (Collaborator):

f"<think>\n{thinking}\n</think>\n\n{final_output}" seems more standard

leonardmq (Collaborator, Author):

Updated to add the newlines - also updated tests accordingly
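
So the serializer ends up roughly as follows (assembled from the snippets in this thread, using the error string suggested above):

    def serialize_r1_style_message(thinking: str | None, final_output: str) -> str:
        if thinking is None or len(thinking.strip()) == 0:
            raise ValueError(
                "Thinking data is required when fine-tuning thinking models (R1, QwQ, etc). "
                "Please ensure your fine-tuning dataset contains reasoning or chain of "
                "thought output for every entry."
            )
        return f"<think>\n{thinking}\n</think>\n\n{final_output}"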

thinking=training_data.thinking,
final_output=training_data.final_output,
),
"tool_calls": tool_calls,
scosman (Collaborator):

Is this valid? Should it be 2 assistant messages? I can look up but don't want to if you already have.

leonardmq (Collaborator, Author) commented Apr 30, 2025:

Was unsure as well - checked by making some calls to Fireworks and OpenAI:

  1. it seems that the model always responds with either a toolcall or a message; not both simultaneously (even if SDK typing may allow for it); so this format is most probably wrong
  2. seems like we can get a reasoning summary alongside a toolcall with OpenAI's reasoning models at least (see below for an example)
  3. seems like Fireworks does not support toolcalls on the reasoning models they have

Trying to figure out what we get with LiteLLM now.

Here's an example request / response for 2.:

Endpoint: POST https://api.openai.com/v1/responses

Request:

{
    "model": "o4-mini",
    "reasoning": {
        "effort": "low",
        "summary": "auto"
    },
    "input": [
        {
            "role": "developer",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": "If I move 100km outside of Taichung heading east, should I need an umbrella tomorrow?"
        }
    ],
    "tools": [
        {
            "type": "function",
            "name": "get_weather",
            "description": "Get current temperature for a given location.",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country e.g. Bogotá, Colombia"
                    }
                },
                "required": [
                    "location"
                ],
                "additionalProperties": false
            }
        }
    ]
}

Response:

{
    "id": "resp_xxx",
    "object": "response",
    "created_at": 1746039896,
    "status": "completed",
    "error": null,
    "incomplete_details": null,
    "instructions": null,
    "max_output_tokens": null,
    "model": "o4-mini-2025-04-16",
    "output": [
        {
            "id": "rs_xxxxx",
            "type": "reasoning",
            "summary": [
                {
                    "type": "summary_text",
                    "text": "**Determining weather for a travel location**\n\nThe user is moving 100km east of Taichung, which puts them in the Pacific Ocean. I wonder if I should check the nearest city for weather instead. Should I use coordinates or simply go with a city like Hualien, which is to the east of Taichung? Hualien is about 140km away, so maybe there’s a closer option like Yuli, but I’ll stick with Hualien for simplicity. I’ll check the weather there and suggest bringing an umbrella!"
                }
            ]
        },
        {
            "id": "fc_xxx",
            "type": "function_call",
            "status": "completed",
            "arguments": "{\"location\":\"Hualien, Taiwan\"}",
            "call_id": "call_xxx",
            "name": "get_weather"
        }
    ],
    "parallel_tool_calls": true,
    "previous_response_id": null,
    "reasoning": {
        "effort": "low",
        "summary": "detailed"
    },
    "service_tier": "default",
    "store": true,
    "temperature": 1.0,
    "text": {
        "format": {
            "type": "text"
        }
    },
    "tool_choice": "auto",
    "tools": [
        {
            "type": "function",
            "description": "Get current temperature for a given location.",
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and country e.g. Bogotá, Colombia"
                    }
                },
                "required": [
                    "location"
                ],
                "additionalProperties": false
            },
            "strict": true
        }
    ],
    "top_p": 1.0,
    "truncation": "disabled",
    "usage": {
        "input_tokens": 83,
        "input_tokens_details": {
            "cached_tokens": 0
        },
        "output_tokens": 149,
        "output_tokens_details": {
            "reasoning_tokens": 128
        },
        "total_tokens": 232
    },
    "user": null,
    "metadata": {}
}

leonardmq (Collaborator, Author):

Disabled the R1 compatible strategy for JSON toolcall downloads (also for Vertex); on both frontend and backend

"content": serialize_r1_style_message(
thinking=training_data.thinking,
final_output=training_data.final_output,
),
scosman (Collaborator):

same question

leonardmq (Collaborator, Author):

Disabled the R1 compatible strategy for JSON toolcall downloads (also for Vertex); on both frontend and backend

@@ -305,7 +421,28 @@ def generate_vertex_gemini(
}
]

if training_data.supports_cot():
if training_data.thinking_r1_style:
contents.extend(
scosman (Collaborator):

Maybe raise an error here? No gemini model supports R1 style thinking.

leonardmq (Collaborator, Author):

The thinking there was that Vertex AI seems to support custom models - I've never used it so I'm not exactly sure it works the way I understand, but a couple of references suggest it is a thing.

That being said, we don't seem to support custom models deployed to Vertex at this point; so probably indeed best to just raise for now

@@ -397,7 +527,7 @@ def dump_to_file(

generator = FORMAT_GENERATORS[format_type]

include_cot = data_strategy == FinetuneDataStrategy.final_and_intermediate
include_cot = data_strategy in THINKING_DATA_STRATEGIES
scosman (Collaborator):

We're only using this for title now? Maybe better titles would be:

  • final_and_intermediate_r1: "thinking"
  • final_and_intermediate: "chain_of_thought"
  • anything else: "final_only"
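
For example (a sketch of the suggested mapping; the helper and constant names are assumptions):

    # Title suffix per data strategy for downloaded dataset files
    TITLE_BY_STRATEGY = {
        FinetuneDataStrategy.final_and_intermediate_r1_compatible: "thinking",
        FinetuneDataStrategy.final_and_intermediate: "chain_of_thought",
    }

    def title_for_strategy(data_strategy: FinetuneDataStrategy) -> str:
        return TITLE_BY_STRATEGY.get(data_strategy, "final_only")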

leonardmq (Collaborator, Author):

I think we can change the string values; I have not found places where this would be problematic - we use them in the frontend, but they are typed according to the API schema so will fail typecheck if not updated at the same time.

Also agree changing this might decrease confusion - as R1 vs COT are different concepts rather than just variants of formatting.

Would suggest making the change in a separate PR (I can do it)

@@ -108,6 +108,11 @@
}

function build_available_model_select(models: FinetuneProvider[]) {
for (const model of models) {
for (const provider of model.models) {
console.log(model.id, provider.name, provider.id)
scosman (Collaborator):

should remove

leonardmq (Collaborator, Author):

This was already removed - possibly not reviewing the latest commits?

scosman (Collaborator):

was reviewing commit by commit since it's gotten large. Looks good now!

@@ -580,7 +579,7 @@
</div>
{:else}
<FormContainer
{submit_visible}
submit_visible={submit_visible && !config_is_error}
scosman (Collaborator) commented May 1, 2025:

A tag based dataset might be correctly filtered to thinking, and not have selecting_thinking_dataset==true.

Let's leave this as a warning (you have that below), and not disable submit. We'll catch the issue at runtime.

leonardmq (Collaborator, Author) commented May 1, 2025:

Turned it back into a warning instead of an error; it no longer blocks submit

@@ -502,6 +503,25 @@
: "Reasoning - Learn intermediate thinking and final response",
}

console.info({
scosman (Collaborator):

remove

leonardmq (Collaborator, Author):

Removed

{#if data_strategy === "final_and_intermediate_r1_compatible" && !selecting_thinking_dataset}
<Warning
warning_color="error"
warning_message="You are training a model of the R1 family for inference-time thinking, but are not using a dataset filtered to samples with reasoning or chain-of-thought training data. This is strongly not recommended, as it may lead to poor performance. We suggest creating a new dataset with a thinking filter."
scosman (Collaborator):

"You are training a "thinking" model, but did not explicitly select a dataset filtered to samples with reasoning or chain-of-thought training data. If any of your training samples are missing reasoning data, it will error. If your data contains reasoning, you can ignore this warning."

leonardmq (Collaborator, Author):

Updated the string (and turned it back into a warning instead of an error)

@leonardmq changed the title from "WIP: fix: QwQ and R1 finetune format" to "fix: QwQ and R1 finetune format" on May 1, 2025
@leonardmq marked this pull request as ready for review on May 1, 2025 16:13
@scosman merged commit 4d25808 into Kiln-AI:main on May 1, 2025 (2 checks passed)
@leonardmq mentioned this pull request on May 1, 2025