--- base_model: Qwen/Qwen3-4B-Instruct-2507 datasets: - u-10bei/dbbench_sft_dataset_react_v4 - u-10bei/sft_alfworld_trajectory_dataset_v5 language: - en license: apache-2.0 library_name: peft pipeline_tag: text-generation tags: - lora - peft - unsloth - agent - tool-use - agentbench - alfworld - dbbench --- # qwen3-4b-agentbench-dbalf-lora This repository provides a **LoRA adapter** fine-tuned from **Qwen/Qwen3-4B-Instruct-2507** using **LoRA + Unsloth** for **AgentBench-style multi-turn agent trajectories**. This repository contains **LoRA adapter weights only**. The base model must be loaded separately. ## Training Objective This adapter is trained to improve **multi-turn agent task performance** on: - **DBBench** (database operation / SQL generation trajectories) - **ALFWorld** (household task trajectories) Loss is applied to **all assistant turns** in the trajectory, enabling the model to learn: - environment observation - action selection - tool use / operation formatting - recovery from intermediate errors ## Training Data - DBBench dataset: `u-10bei/dbbench_sft_dataset_react_v4` - ALFWorld dataset: `u-10bei/sft_alfworld_trajectory_dataset_v5` - Mixing ratio (pre-merge target): **DB:ALF = 1:0** ### DB Oversampling (category-aware) Enabled: **False** DB category weights used during training-data preparation: - counting: 1 - comparison: 1 - ranking: 1 - select: 1 - insert: 1 - update: 1 - other: 1 ## Training Configuration - Base model: Qwen/Qwen3-4B-Instruct-2507 - Method: LoRA (full precision base) - Max sequence length: 2048 - Epochs: 1.2 - Learning rate: 3e-06 - LoRA: r=32, alpha=64, dropout=0.0 - Per-device train batch size: 2 - Gradient accumulation: 4 ## Usage ```python from transformers import AutoModelForCausalLM, AutoTokenizer from peft import PeftModel import torch base = "Qwen/Qwen3-4B-Instruct-2507" adapter = "AF0815/agentbench" tokenizer = AutoTokenizer.from_pretrained(base) model = AutoModelForCausalLM.from_pretrained( base, torch_dtype=torch.float16, device_map="auto", ) model = PeftModel.from_pretrained(model, adapter) ``` ## Notes - This repository is intended for **adapter-only** distribution. - Please ensure compliance with the **base model license/terms** in addition to this repository's license. - If you publish evaluation results, it is recommended to report: - AgentBench task split / seeds - DBBench / ALFWorld mix ratio - DB oversampling settings - decoding settings ## Sources & Terms (IMPORTANT) Training data: - u-10bei/dbbench_sft_dataset_react_v4 - u-10bei/sft_alfworld_trajectory_dataset_v5 Dataset license / terms: - Please follow the original license and terms of each dataset repository. - This adapter repository license (**apache-2.0**) applies to the adapter files in this repository.