In a previous article, we explored how to apply a state-of-the-art recipe for supervised fine-tuning (SFT), following the approach proposed by AI2 to train the TULU 3 models.
However, instead of performing full fine-tuning, we opted for LoRA (Low-Rank Adaptation), significantly reducing GPU requirements from clusters of H100 nodes to a single RTX 4090.
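To give a concrete picture of what that looks like in code, here is a minimal LoRA sketch using Hugging Face PEFT. The rank, alpha, dropout, and target modules are illustrative placeholders, not the exact values used in the previous article.

```python
# Minimal LoRA sketch with Hugging Face PEFT. The rank, alpha, dropout, and
# target modules below are illustrative assumptions, not the recipe's values.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",       # base model from the previous article
    torch_dtype=torch.bfloat16,
)

lora_config = LoraConfig(
    r=16,                            # low-rank dimension (assumed)
    lora_alpha=32,                   # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

# Only the small adapter matrices are trained; keeping the base weights frozen
# is what makes the run fit on a single 24 GB consumer GPU.
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```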
The original TULU 3 recipe was designed, tested, and evaluated specifically for Llama 3.1 models, which are now nearly a year old. Since then, base language models have significantly advanced. For example, Qwen3 8B Base clearly outperforms Llama 3.1 8B in multiple benchmarks.
Can we effectively apply the same TULU 3 recipe, using the same datasets and hyperparameters, to post-train Qwen3 Base models?
In other words, can we train a TULU-style Qwen3 model? Let’s call this variant Qwulu 3.
Before starting this work, my intuition was that the approach should transfer well. The datasets used to train TULU 3 are of high quality, and a stronger base model like Qwen3 should be able to learn from them even more effectively. The assumption about reusing the same hyperparameters is bolder: while Qwen3 and Llama 3.1 have similar “Llama-like” architectures, it wasn’t obvious that hyperparameters would transfer cleanly between them, especially across implementations with subtle differences.
In this article, we put these assumptions to the test. We’ll walk through how to train Qwen3 Base models using the TULU 3 supervised fine-tuning recipe, again leveraging LoRA and a single 24GB GPU (RTX 4090). We’ll analyze the model’s learning curves and observe how well Qwen3 adapts. As you’ll see, Qwen3 trains effectively under this setup and significantly closes the performance gap between Qwen3 8B Base and the fully post-trained, official Qwen3 8B.
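To make the setup concrete before diving into the results, here is a minimal sketch of what such a run could look like with TRL's SFTTrainer, PEFT, and the publicly released allenai/tulu-3-sft-mixture dataset. The hyperparameters, the reliance on the tokenizer's built-in chat template, and the output directory name are assumptions for illustration only; the notebook linked below contains the actual configuration.

```python
# Sketch of LoRA SFT on Qwen3 8B Base with the TULU 3 SFT mixture.
# Hyperparameters are placeholders, not the recipe's exact values.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model_id = "Qwen/Qwen3-8B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# TULU 3 SFT mixture released by AI2 on the Hugging Face Hub.
dataset = load_dataset("allenai/tulu-3-sft-mixture", split="train")

peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

training_args = SFTConfig(
    output_dir="qwulu3-8b-sft-lora",   # hypothetical output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,    # emulate a larger effective batch size
    learning_rate=1e-4,                # placeholder LoRA learning rate
    num_train_epochs=2,
    lr_scheduler_type="linear",
    logging_steps=10,
    bf16=True,
    gradient_checkpointing=True,       # trade compute for memory on 24 GB
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,        # assumes the tokenizer's chat template
    peft_config=peft_config,
)
trainer.train()
```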
The SFT recipe applied to Qwen3 can be tested in this notebook:
In a follow-up article, we’ll explore whether these results can be further improved using reinforcement learning, specifically with GRPO, again using only LoRA. The goal is to determine whether we can approach the performance of the official Qwen3 8B model, confirming that high-quality, cost-effective custom post-training of Qwen3 is possible even with limited resources.