Safetensors
qwen3_5

Model Summary

UnifiedReward-Flex-qwen35-4b is a unified personalized reward model for vision generation that couples reward modeling with flexible and context-adaptive reasoning!!

[2026/05/18] 🔥🔥 We updated the model weights and enhanced the training data to mitigate the position bias issue!! The model weights for other sizes will also be updated soon.

For further details, please refer to the following resources:

vLLM Server Deployment

export VLLM_DISABLE_FLASHINFER_GDN_PREFILL=1
export TOKENIZERS_PARALLELISM=false
vllm serve CodeGoat24/UnifiedReward-Flex-qwen35-4b \
 --host localhost \
 --port 8080 \
 --trust-remote-code \
 --served-model-name UnifiedReward \
 --gpu-memory-utilization 0.95 \
 --mm-encoder-tp-mode data \
 --mm-processor-cache-type shm \
 --enable-prefix-caching \
 --tensor-parallel-size 8 \
 --default-chat-template-kwargs '{"enable_thinking": false}'

The inference code is provided here.

Citation

@article{unifiedreward-flex,
  title={Unified Personalized Reward Model for Vision Generation},
  author={Wang, Yibin and Zang, Yuhang and Han, Feng and Bu, Jiazi and Zhou, Yujie and Jin, Cheng and Wang, Jiaqi},
  journal={arXiv preprint arXiv:2602.02380},
  year={2026}
}
Downloads last month
52
Safetensors
Model size
5B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for CodeGoat24/UnifiedReward-Flex-qwen35-4b

Finetuned
Qwen/Qwen3.5-4B
Finetuned
(1)
this model
Quantizations
2 models

Dataset used to train CodeGoat24/UnifiedReward-Flex-qwen35-4b

Collection including CodeGoat24/UnifiedReward-Flex-qwen35-4b

Paper for CodeGoat24/UnifiedReward-Flex-qwen35-4b