Text Generation
PEFT
Safetensors
English
qwen3
lora
unsloth
agent
tool-use
agentbench
alfworld
dbbench
conversational
File size: 2,811 Bytes
fd778b3
 
 
d928909
3327c76
fd778b3
 
 
 
 
 
 
d928909
 
fd778b3
 
d928909
fd778b3
 
 
 
d928909
fd778b3
d928909
 
fd778b3
 
 
 
 
 
d928909
fd778b3
d928909
 
 
 
 
 
 
 
 
 
 
 
 
 
8ada862
d928909
 
79fe4e9
d928909
 
 
348e1a8
bdf7c69
03ff6d6
d928909
5fd2d89
d928909
348e1a8
fd778b3
 
 
 
 
46a04e4
5fd2d89
8ada862
5fd2d89
d928909
 
fd778b3
 
 
 
 
 
 
 
 
4f4b30a
fd778b3
 
 
 
 
 
 
 
 
 
d928909
 
 
 
 
 
 
 
 
 
fd778b3
 
d928909
 
 
fd778b3
d928909
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/dbbench_sft_dataset_react_v4
- u-10bei/sft_alfworld_trajectory_dataset_v5
language:
- en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- peft
- unsloth
- agent
- tool-use
- agentbench
- alfworld
- dbbench
---

# qwen3-4b-agentbench-dbalf-lora

This repository provides a **LoRA adapter** fine-tuned from **Qwen/Qwen3-4B-Instruct-2507**
using **LoRA + Unsloth** for **AgentBench-style multi-turn agent trajectories**.

This repository contains **LoRA adapter weights only**.
The base model must be loaded separately.

## Training Objective

This adapter is trained to improve **multi-turn agent task performance** on:

- **DBBench** (database operation / SQL generation trajectories)
- **ALFWorld** (household task trajectories)

Loss is applied to **all assistant turns** in the trajectory, enabling the model to learn:

- environment observation
- action selection
- tool use / operation formatting
- recovery from intermediate errors

## Training Data

- DBBench dataset: `u-10bei/dbbench_sft_dataset_react_v4`
- ALFWorld dataset: `u-10bei/sft_alfworld_trajectory_dataset_v5`
- Mixing ratio (pre-merge target): **DB:ALF = 1:0**

### DB Oversampling (category-aware)
Enabled: **False**

DB category weights used during training-data preparation:

- counting: 1
- comparison: 1
- ranking: 1
- select: 1
- insert: 1
- update: 1
- other: 1

## Training Configuration

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (full precision base)
- Max sequence length: 2048
- Epochs: 1.2
- Learning rate: 3e-06
- LoRA: r=32, alpha=64, dropout=0.0
- Per-device train batch size: 2
- Gradient accumulation: 4

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "AF0815/agentbench"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
```

## Notes

- This repository is intended for **adapter-only** distribution.
- Please ensure compliance with the **base model license/terms** in addition to this repository's license.
- If you publish evaluation results, it is recommended to report:
  - AgentBench task split / seeds
  - DBBench / ALFWorld mix ratio
  - DB oversampling settings
  - decoding settings

## Sources & Terms (IMPORTANT)

Training data:
- u-10bei/dbbench_sft_dataset_react_v4
- u-10bei/sft_alfworld_trajectory_dataset_v5

Dataset license / terms:
- Please follow the original license and terms of each dataset repository.
- This adapter repository license (**apache-2.0**) applies to the adapter files in this repository.