Skywork-R1V4

1. Model Introduction

Skywork-R1V4 is a 30B (A3B) multimodal agent that unifies:

Multimodal task planning
Active image manipulation (“thinking with images”)
Deep multimodal search (text × image)
Interleaved tool-grounded reasoning

Skywork-R1V4 is trained purely via supervised finetuning on < 30k high-quality, execution-consistent trajectories.

At inference time, the model exhibits emergent long-horizon reasoning, executing 10+ tool calls across visual operations and web search to solve complex real-world tasks.

Skywork-R1V4 achieves state-of-the-art performance on multimodal search benchmarks:

MMSearch: 66.1
FVQA: 67.2
Beats Gemini 2.5 Flash on all 11 comparable metrics

2. Feature

🔍 “Thinking With Images”

Skywork-R1V4 actively manipulates images through:

    • Multi-stage cropping
    • Local detail extraction
    • Region attention
    • Visual clue refinement

🔄 Interleaved Reasoning

The model alternates between:

    1. Visual reasoning
    2. Image operation
    3. Web search
    4. Cross-evidence verification

3. Links

Model Center: https://platform.skyworkmodel.ai/#/model-center
API Documentation (R1V4): https://docs.skyworkmodel.ai/r1v4/api-reference/completions.html

4. Citation

If you use Skywork-R1V4 in your research, please cite:

@misc{zhang2025skyworkr1v4agenticmultimodalintelligence,
      title={Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch}, 
      author={Yifan Zhang and Liang Hu and Haofeng Sun and Peiyu Wang and Yichen Wei and Shukang Yin and Jiangbo Pei and Wei Shen and Peng Xia and Yi Peng and Tianyidan Xie and Eric Li and Yang Liu and Xuchen Song and Yahui Zhou},
      year={2025},
      eprint={2512.02395},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.02395}, 
}

@misc{peng2025skyworkr1vpioneeringmultimodal,
      title={Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought}, 
      author={Yi Peng and Peiyu Wang and Xiaokun Wang and Yichen Wei and Jiangbo Pei and Weijie Qiu and Ai Jian and Yunzhuo Hao and Jiachun Pan and Tianyidan Xie and Li Ge and Rongxian Zhuang and Xuchen Song and Yang Liu and Yahui Zhou},
      year={2025},
      eprint={2504.05599},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.05599}, 
}

Downloads last month: -; Downloads are not tracked for this model. How to track

Inference Providers NEW

Image-Text-to-Text

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collection including Skywork/R1V4

Skywork-R1V4

Collection

Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch • 4 items • Updated Dec 9, 2025 • 7

Papers for Skywork/R1V4

Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch

Paper • 2512.02395 • Published Dec 2, 2025 • 51

Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought

Paper • 2504.05599 • Published Apr 8, 2025 • 86