CodeGoat24/UnifiedReward-Flex-SFT-90K
Viewer • Updated • 1.39M • 567 • 3
UnifiedReward-Flex-qwen35-4b is a unified personalized reward model for vision generation that couples reward modeling with flexible and context-adaptive reasoning!!
[2026/05/18] 🔥🔥 We updated the model weights and enhanced the training data to mitigate the position bias issue!! The model weights for other sizes will also be updated soon.
For further details, please refer to the following resources:
export VLLM_DISABLE_FLASHINFER_GDN_PREFILL=1
export TOKENIZERS_PARALLELISM=false
vllm serve CodeGoat24/UnifiedReward-Flex-qwen35-4b \
--host localhost \
--port 8080 \
--trust-remote-code \
--served-model-name UnifiedReward \
--gpu-memory-utilization 0.95 \
--mm-encoder-tp-mode data \
--mm-processor-cache-type shm \
--enable-prefix-caching \
--tensor-parallel-size 8 \
--default-chat-template-kwargs '{"enable_thinking": false}'
The inference code is provided here.
@article{unifiedreward-flex,
title={Unified Personalized Reward Model for Vision Generation},
author={Wang, Yibin and Zang, Yuhang and Han, Feng and Bu, Jiazi and Zhou, Yujie and Jin, Cheng and Wang, Jiaqi},
journal={arXiv preprint arXiv:2602.02380},
year={2026}
}
Base model
Qwen/Qwen3.5-4B-Base