---
license: apache-2.0
language:
- en
pipeline_tag: text-to-image
---
# BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation (Glyph-ByT5-v3)

<a href="https://arxiv.org/abs/2503.20672"><img src="https://img.shields.io/badge/Paper-arXiv-red?style=for-the-badge" height=22.5></a>
<a href="https://github.com/1230young/bizgen"><img src="https://img.shields.io/badge/GitHub-Code-success?style=for-the-badge&logo=GitHub" height=22.5></a>
<a href="https://bizgen-msra.github.io"><img src="https://img.shields.io/badge/Project-Page-blue?style=for-the-badge" height=22.5></a>

<table>
<tr>
<td><img src="assets/teaser_info.png" alt="teaser example 0" width="1200"/></td>
</tr>
<tr>
<td><img src="assets/teaser_slide.png" alt="teaser example 1" width="1200"/></td>
</tr>
</table>

## Abstract

<p>
Recently, state-of-the-art text-to-image generation models, such as Flux and Ideogram 2.0, have made
significant progress in sentence-level visual text rendering. In this paper, we focus on the more
challenging scenario of article-level visual text rendering and address a novel task: generating
high-quality business content, including infographics and slides, from user-provided article-level
descriptive prompts and ultra-dense layouts. The fundamental challenges are twofold: significantly
longer context lengths and the scarcity of high-quality business content data.
</p>

<p>
In contrast to most previous works, which focus on a limited number of sub-regions and sentence-level
prompts, business content demands precise adherence to ultra-dense layouts with tens or even hundreds of
sub-regions, which is far more challenging. We make two key technical contributions: (i) the construction
of a scalable, high-quality business content dataset, Infographics-650K, equipped with
ultra-dense layouts and prompts and built by implementing a layer-wise retrieval-augmented infographic generation
scheme; and (ii) a layout-guided cross attention scheme, which injects tens of region-wise prompts into
a set of cropped region latent spaces according to the ultra-dense layouts and refines each sub-region
flexibly during inference using a layout-conditional CFG.
</p>

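The two inference-time mechanisms named above can be illustrated with a small NumPy sketch. This is not the BizGen implementation (the GitHub repository has that); it only shows the general pattern under our own assumptions. The helper names are hypothetical; we assume an SDXL-style VAE that downsamples images by 8, express region cropping as an attention mask over the full latent rather than literally cropping it, let every latent token also attend to a global prompt, and model layout-conditional CFG as a per-region guidance-scale map inside the usual classifier-free guidance formula.

```python
import numpy as np

VAE_SCALE = 8  # assumption: SDXL-style VAE with 8x spatial downsampling


def box_to_latent_mask(box, latent_h, latent_w, scale=VAE_SCALE):
    """Flattened boolean mask of the latent tokens covered by a pixel box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = (int(round(v / scale)) for v in box)
    mask = np.zeros((latent_h, latent_w), dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask.reshape(-1)


def masked_softmax(scores, allowed):
    """Softmax over the last axis that gives zero weight where `allowed` is False."""
    scores = np.where(allowed, scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


def layout_cross_attention(q, global_kv, region_kvs, region_masks):
    """Cross attention in which each latent token sees the global prompt plus
    only the prompts of the regions that contain it (hypothetical helper).

    q: (N, d) latent queries; global_kv and each entry of region_kvs are
    (keys, values) pairs of shape (L, d); region_masks: list of (N,) bool masks.
    """
    keys, values = [global_kv[0]], [global_kv[1]]
    # The global prompt is visible to every latent token, so no row is fully masked.
    allowed = [np.ones((q.shape[0], global_kv[0].shape[0]), dtype=bool)]
    for (k, v), m in zip(region_kvs, region_masks):
        keys.append(k)
        values.append(v)
        # Region prompt tokens are visible only to latent tokens inside the region.
        allowed.append(np.repeat(m[:, None], k.shape[0], axis=1))
    K, V = np.concatenate(keys), np.concatenate(values)
    scores = q @ K.T / np.sqrt(q.shape[-1])
    return masked_softmax(scores, np.concatenate(allowed, axis=1)) @ V


def layout_conditional_cfg(eps_uncond, eps_cond, region_masks, region_scales, default_scale):
    """Classifier-free guidance with a spatial guidance-scale map (hypothetical helper).

    eps_*: (C, H, W) noise predictions; region_masks: list of (H, W) bool masks.
    Where regions overlap, the later region's scale wins.
    """
    w = np.full(eps_cond.shape[-2:], default_scale, dtype=eps_cond.dtype)
    for m, s in zip(region_masks, region_scales):
        w[m] = s
    return eps_uncond + w[None] * (eps_cond - eps_uncond)
```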
<p>
We demonstrate that our system achieves strong results compared with previous SOTA systems such as Flux and SD3
on our BizEval prompt set. Additionally, we conduct thorough ablation experiments to verify the
effectiveness of each component. We hope that Infographics-650K and BizEval will encourage
the broader community to advance business content generation.
</p>

## Model Description

The ByT5 text encoder is fine-tuned from [Glyph-ByT5-v2](https://arxiv.org/abs/2406.10208), which supports accurate visual text rendering in ten languages.

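ByT5 reads raw UTF-8 bytes instead of a learned subword vocabulary, so text in any script maps onto the same small input vocabulary; this is plausibly part of why a byte-level encoder suits multilingual glyph rendering. The sketch below reproduces the id scheme documented for the Hugging Face `ByT5Tokenizer` (ids 0, 1 and 2 are reserved for pad, end-of-sequence and unknown, so byte `b` becomes `b + 3`, and an end-of-sequence id is appended). It covers tokenization only; the helper name `byt5_ids` is our own, and no Glyph-ByT5 weights are involved.

```python
# Mirrors the Hugging Face ByT5Tokenizer id scheme:
# 0 = <pad>, 1 = </s>, 2 = <unk>, and each UTF-8 byte b -> id b + 3.
NUM_SPECIAL_TOKENS = 3
EOS_ID = 1


def byt5_ids(text: str) -> list[int]:
    """Return ByT5-style input ids for `text`, ending with the EOS id (hypothetical helper)."""
    return [b + NUM_SPECIAL_TOKENS for b in text.encode("utf-8")] + [EOS_ID]


print(byt5_ids("Hi"))        # [75, 108, 1]
print(len(byt5_ids("你好")))  # 7: two 3-byte CJK characters plus EOS
```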
The [SPO](https://huggingface.co/SPO-Diffusion-Models) model replaces the original sdxl-base-1.0 to improve aesthetics. The [lora/infographic](https://huggingface.co/PYY2001/BizGen/tree/main/lora/infographic) and [lora/slides](https://huggingface.co/PYY2001/BizGen/tree/main/lora/slides) weights are tuned on our infographics and slides datasets, respectively.

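The `lora/infographic` and `lora/slides` folders hold low-rank adapters. In the standard LoRA formulation a frozen weight `W` is updated as `W + (alpha / r) * B @ A`, where `A` and `B` are the trained rank-`r` factors, so switching between infographics and slides amounts to applying a different pair of factors to the same base weights. The NumPy sketch below shows only that generic update; the adapter dictionary, shapes, rank, and `alpha` value are illustrative assumptions, not the layers or hyperparameters of the released checkpoints.

```python
import numpy as np


def merge_lora(W, A, B, alpha):
    """Return W + (alpha / r) * B @ A for LoRA factors A: (r, in) and B: (out, r)."""
    r = A.shape[0]
    return W + (alpha / r) * (B @ A)


def apply_adapter(W, adapters, name, alpha=8.0):
    """Merge the named adapter into a new array; the base weight W is left untouched."""
    A, B = adapters[name]
    return merge_lora(W, A, B, alpha)


rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))  # frozen base weight (hypothetical shape)
adapters = {  # hypothetical rank-2 adapters, one per domain
    "infographic": (rng.normal(size=(2, 6)), rng.normal(size=(4, 2))),
    "slides": (rng.normal(size=(2, 6)), rng.normal(size=(4, 2))),
}
W_info = apply_adapter(W, adapters, "infographic")
W_slides = apply_adapter(W, adapters, "slides")
print(np.linalg.matrix_rank(W_info - W))  # at most 2, the adapter rank
```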
Follow the instructions in our [GitHub repository](https://github.com/1230young/bizgen) to organize and run the model.

## Citation

If you find our work or codebase useful, please consider giving us a star and citing our work.

```
@misc{peng2025bizgenadvancingarticlelevelvisual,
      title={BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation},
      author={Yuyang Peng and Shishi Xiao and Keming Wu and Qisheng Liao and Bohan Chen and Kevin Lin and Danqing Huang and Ji Li and Yuhui Yuan},
      year={2025},
      eprint={2503.20672},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.20672},
}
```
```
@article{liu2024glyphv2,
  title={Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering},
  author={Liu, Zeyu and Liang, Weicong and Zhao, Yiming and Chen, Bohan and Li, Ji and Yuan, Yuhui},
  journal={arXiv preprint arXiv:2406.10208},
  year={2024}
}
```
```
@article{liu2024glyph,
  title={Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering},
  author={Liu, Zeyu and Liang, Weicong and Liang, Zhanhao and Luo, Chong and Li, Ji and Huang, Gao and Yuan, Yuhui},
  journal={arXiv preprint arXiv:2403.09622},
  year={2024}
}
```