chainofthought

Here are 28 public repositories matching this topic...

Fine-tuned Qwen3-1.7B on the open-source PHYBench dataset using memory-light 4-bit LoRA adapters and Group Relative Policy Optimization (GRPO) with Chain-of-Thought prompts. A custom reward (0.2 for a boxed-answer format, 0.3 for a reasoning trace of at least 6 lines, 0.5 for SymPy-verified correctness) lifts mean reward from 0.19 to 0.54 in 550 steps.

  • Updated May 18, 2025
  • Jupyter Notebook
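
The three-part reward described above could be sketched roughly as follows. This is an illustrative guess, not the repository's code: the function names (`extract_boxed`, `answers_match`, `reward`), the choice to count non-empty lines as the trace, and the plain-string fallback when SymPy is not installed are all assumptions. Only the weights (0.2, 0.3, 0.5) and the 6-line threshold come from the description.

```python
# Weights and threshold taken from the repository description.
WEIGHTS = {"format": 0.2, "trace": 0.3, "correct": 0.5}
MIN_TRACE_LINES = 6


def extract_boxed(text):
    """Return the content of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth, out = 1, []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out).strip()
        out.append(ch)
    return None  # braces never balanced


def answers_match(predicted, reference):
    """Assumed check: symbolic equality via SymPy, else string equality."""
    try:
        import sympy
        # Note: sympify evaluates its input; sandbox untrusted model output.
        diff = sympy.simplify(sympy.sympify(predicted) - sympy.sympify(reference))
        return diff == 0
    except ImportError:
        # Hypothetical fallback so the sketch runs without SymPy installed.
        return predicted.replace(" ", "") == reference.replace(" ", "")
    except Exception:
        return False  # unparsable answers earn no correctness credit


def reward(completion, reference):
    """Sum the format, trace-length, and correctness components."""
    boxed = extract_boxed(completion)
    score = 0.0
    if boxed is not None:
        score += WEIGHTS["format"]
    # Assumption: the trace is every non-empty line of the completion.
    trace_lines = [ln for ln in completion.splitlines() if ln.strip()]
    if len(trace_lines) >= MIN_TRACE_LINES:
        score += WEIGHTS["trace"]
    if boxed is not None and answers_match(boxed, reference):
        score += WEIGHTS["correct"]
    return round(score, 2)
```

Under these assumptions a completion earns partial credit for format and trace length even when the final answer is wrong, which is presumably why the mean reward can rise before accuracy does.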
