Upload merged Qwen3-4B-Instruct-2507 model (auto-generated README)

79fe4e9 verified 3 months ago

2.81 kB

	---
	base_model: Qwen/Qwen3-4B-Instruct-2507
	datasets:
	- u-10bei/dbbench_sft_dataset_react_v4
	- u-10bei/sft_alfworld_trajectory_dataset_v5
	language:
	- en
	license: apache-2.0
	library_name: peft
	pipeline_tag: text-generation
	tags:
	- lora
	- peft
	- unsloth
	- agent
	- tool-use
	- agentbench
	- alfworld
	- dbbench
	---

	# qwen3-4b-agentbench-dbalf-lora

	This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507
	using LoRA + Unsloth for AgentBench-style multi-turn agent trajectories.

	This repository contains LoRA adapter weights only.
	The base model must be loaded separately.

	## Training Objective

	This adapter is trained to improve multi-turn agent task performance on:

	- DBBench (database operation / SQL generation trajectories)
	- ALFWorld (household task trajectories)

	Loss is applied to all assistant turns in the trajectory, enabling the model to learn:

	- environment observation
	- action selection
	- tool use / operation formatting
	- recovery from intermediate errors

	## Training Data

	- DBBench dataset: `u-10bei/dbbench_sft_dataset_react_v4`
	- ALFWorld dataset: `u-10bei/sft_alfworld_trajectory_dataset_v5`
	- Mixing ratio (pre-merge target): DB:ALF = 1:0

	### DB Oversampling (category-aware)
	Enabled: False

	DB category weights used during training-data preparation:

	- counting: 1
	- comparison: 1
	- ranking: 1
	- select: 1
	- insert: 1
	- update: 1
	- other: 1

	## Training Configuration

	- Base model: Qwen/Qwen3-4B-Instruct-2507
	- Method: LoRA (full precision base)
	- Max sequence length: 2048
	- Epochs: 1.2
	- Learning rate: 3e-06
	- LoRA: r=32, alpha=64, dropout=0.0
	- Per-device train batch size: 2
	- Gradient accumulation: 4

	## Usage

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from peft import PeftModel
	import torch

	base = "Qwen/Qwen3-4B-Instruct-2507"
	adapter = "AF0815/agentbench"

	tokenizer = AutoTokenizer.from_pretrained(base)
	model = AutoModelForCausalLM.from_pretrained(
	base,
	torch_dtype=torch.float16,
	device_map="auto",
	)
	model = PeftModel.from_pretrained(model, adapter)
	```

	## Notes

	- This repository is intended for adapter-only distribution.
	- Please ensure compliance with the base model license/terms in addition to this repository's license.
	- If you publish evaluation results, it is recommended to report:
	- AgentBench task split / seeds
	- DBBench / ALFWorld mix ratio
	- DB oversampling settings
	- decoding settings

	## Sources & Terms (IMPORTANT)

	Training data:
	- u-10bei/dbbench_sft_dataset_react_v4
	- u-10bei/sft_alfworld_trajectory_dataset_v5

	Dataset license / terms:
	- Please follow the original license and terms of each dataset repository.
	- This adapter repository license (apache-2.0) applies to the adapter files in this repository.