How to use ByteDance/XVerse with Diffusers:
```shell
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("ByteDance/XVerse", dtype=torch.bfloat16, device_map="cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```
---
license: apache-2.0
base_model:
- black-forest-labs/FLUX.1-dev
pipeline_tag: text-to-image
tags:
- LoRA
- personalization
- multi-subject
---
# XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation

This repository contains the official model of the paper [XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation](https://arxiv.org/abs/2506.21416).

<p align="center">
  <a href="https://arxiv.org/abs/2506.21416">
    <img alt="Build" src="https://img.shields.io/badge/arXiv%20paper-2506.21416-b31b1b.svg">
  </a>
  <a href="https://bytedance.github.io/XVerse/">
    <img alt="Project Page" src="https://img.shields.io/badge/Project-Page-blue">
  </a>
  <a href="https://github.com/bytedance/XVerse">
    <img alt="Github" src="https://img.shields.io/badge/GitHub-Code-darkgreen.svg?logo=github">
  </a>
  <a href="https://huggingface.co/ByteDance/XVerse">
    <img alt="Build" src="https://img.shields.io/badge/🤗-HF%20Model-yellow">
  </a>
</p>
## Introduction

**XVerse** introduces a novel approach to multi-subject image synthesis, offering **precise and independent control over individual subjects** without disrupting the overall image latents or features. We achieve this by transforming reference images into offsets for token-specific text-stream modulation.

This innovation enables high-fidelity, editable image generation where you can robustly control both **individual subject characteristics** (identity) and their **semantic attributes**. XVerse significantly enhances capabilities for personalized and complex scene generation.
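
To make the idea of token-specific modulation offsets concrete, here is a toy sketch in plain Python. It is not the XVerse implementation. It assumes an adaLN-style modulation of the form `x * (1 + scale) + shift`, which the description above does not spell out, and the function `modulate`, the `offsets` dictionary, and all numbers are hypothetical. The point it illustrates is that an offset attached to one text token changes only that token, while every other token keeps the shared modulation.

```python
# Toy illustration only: NOT the XVerse implementation.
# Assumption: adaLN-style modulation, x * (1 + scale) + shift, where a
# reference image contributes extra (d_shift, d_scale) offsets for the
# text tokens that name its subject.

def modulate(tokens, shift, scale, offsets=None):
    """Apply shared shift/scale to every token, plus per-token offsets.

    tokens  : list of feature vectors (one list of floats per text token)
    shift   : shared shift vector
    scale   : shared scale vector
    offsets : dict {token_index: (d_shift, d_scale)}; a hypothetical
              stand-in for offsets derived from a reference image
    """
    offsets = offsets or {}
    zeros = [0.0] * len(shift)
    out = []
    for i, tok in enumerate(tokens):
        # Tokens without an entry fall back to a zero offset.
        d_shift, d_scale = offsets.get(i, (zeros, zeros))
        out.append([
            x * (1.0 + s + ds) + b + db
            for x, s, ds, b, db in zip(tok, scale, d_scale, shift, d_shift)
        ])
    return out


# Prompt "a dog and a cat": pretend token 1 is "dog" and token 4 is "cat".
tokens = [[1.0, 2.0] for _ in range(5)]
shift, scale = [0.0, 0.0], [0.0, 0.0]

# Only the "dog" token receives an offset; all other tokens are unchanged.
dog_offset = {1: ([0.5, 0.5], [1.0, 0.0])}
result = modulate(tokens, shift, scale, dog_offset)
```

In this sketch, adding a second entry for token 4 would steer the "cat" token independently of the "dog" token, which is the kind of per-subject control the paragraph above describes.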
## How to Use

See https://github.com/bytedance/XVerse for usage instructions.

Send questions or comments about the model to https://github.com/bytedance/XVerse/issues.
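
If you want a local copy of the files in this model repository before following the GitHub instructions, one possible approach is sketched below. It assumes the `huggingface_hub` package is installed. The helper name `download_checkpoints` and the `local_dir` default are hypothetical, and the GitHub code may expect the files in a different location, so check its README for the actual layout.

```python
# Hypothetical helper: assumes `huggingface_hub` is installed and that the
# files are fetched from the Hub repository named on this model card.
REPO_ID = "ByteDance/XVerse"


def download_checkpoints(local_dir="./checkpoints/XVerse"):
    """Download all files of the model repo and return the local path.

    `local_dir` is an illustrative default, not a path required by XVerse.
    """
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Calling `download_checkpoints()` fetches the repository contents over the network and returns the directory they were written to.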
## Citation

If XVerse is helpful, please ⭐ the repo.

If you find this project useful for your research, please consider citing our paper:

```bibtex
@article{chen2025xverse,
  title={XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation},
  author={Chen, Bowen and Zhao, Mengyi and Sun, Haomiao and Chen, Li and Wang, Xu and Du, Kang and Wu, Xinglong},
  journal={arXiv preprint arXiv:2506.21416},
  year={2025}
}
```