PP-DocLayoutV3 ONNX for PaddleOCR-VL-ROCm
This repository hosts the verified PP-DocLayoutV3 ONNX layout model used by the open-source project AIwork4me/PaddleOCR-VL-ROCm.
中文说明
本仓库提供已经验证过的 PP-DocLayoutV3-onnx 模型文件,供 PaddleOCR-VL-ROCm 直接下载使用。
用户不需要再安装 Paddle、Paddle2ONNX,也不需要自己从 Paddle 模型导出 ONNX。克隆开源项目后,只需运行下载脚本即可准备 layout 模型。
Files
inference.onnx: PP-DocLayoutV3 ONNX layout detection model.inference.yml: model configuration used by the ONNXRuntime pipeline.
Verified checksums:
| File | SHA256 |
|---|---|
inference.onnx |
BC307C102A52A10EEDF20F36A03DF384B8EB2224BEB2E5E716C581901A8F0B61 |
inference.yml |
506FCFAC13B3B546AE40D7886B44126420F392ADB694E3F8BB6A6286A1F90FDC |
Open-Source Project
Recommended runtime project:
https://github.com/AIwork4me/PaddleOCR-VL-ROCm
PaddleOCR-VL-ROCm is a lightweight No-Paddle inference implementation for PaddleOCR-VL-style document parsing:
- Layout detection runs with ONNXRuntime and this
PP-DocLayoutV3-onnxmodel. - Visual language recognition is served by an OpenAI-compatible ROCm endpoint, such as vLLM or llama.cpp server.
- The project exposes both CLI and Python APIs.
- Outputs are saved as PaddleOCR-VL-style JSON and Markdown.
- The code repository is open source and uses the MIT license.
Why This Helps Users
This model repository removes the most painful setup step for users.
Before this model repository, users often had to:
- Install Paddle or PaddleX dependencies.
- Install and configure Paddle2ONNX.
- Export PP-DocLayoutV3 by themselves.
- Debug model file names, model config files, and ONNXRuntime input compatibility.
With this repository, users can directly download the verified ONNX model used by PaddleOCR-VL-ROCm:
pip install -e .[download]
python scripts/download_ppdoclayoutv3_onnx.py
The script downloads from this Hugging Face repository by default and prepares:
models/PP-DocLayoutV3-onnx/
inference.onnx
inference.yml
This gives users a simpler path:
- No PaddlePaddle runtime is required for inference.
- No Paddle2ONNX conversion is required.
- No large model files are stored in the GitHub repo.
- The same verified model artifact is shared by all users.
- The GitHub repo stays small, clean, and easy to clone.
- ROCm acceleration can be handled by the VLM server while layout remains portable through ONNXRuntime.
Validation Result
The ONNXRuntime layout path used by PaddleOCR-VL-ROCm has been validated against the Paddle native pipeline on 1355 images.
| Item | Result |
|---|---|
| Full-run success | 1355 / 1355 |
| Payload alignment | 1355 / 1355 |
| Layout, crop, request order, request payload | Strictly aligned |
This means the open-source runtime can use this ONNX layout model as a practical replacement for the Paddle layout stage in the validated inference path.
Quick Start With PaddleOCR-VL-ROCm
git clone https://github.com/AIwork4me/PaddleOCR-VL-ROCm.git
cd PaddleOCR-VL-ROCm
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e .[download]
python scripts/download_ppdoclayoutv3_onnx.py
Then run inference with your OpenAI-compatible ROCm VLM endpoint:
paddleocr-vl-rocm `
--input examples/input/handwrite_ch_demo.png `
--output outputs/smoke `
--layout-model models/PP-DocLayoutV3-onnx `
--server-url http://127.0.0.1:8000/v1 `
--api-model-name PaddleOCR-VL-1.5-0.9B `
--vlm-backend vllm-server
Expected output files:
outputs/smoke/handwrite_ch_demo_res.json
outputs/smoke/handwrite_ch_demo.md
Python API Example
from paddleocr_vl_rocm import PaddleOCRVLROCm
pipeline = PaddleOCRVLROCm(
layout_model_dir="models/PP-DocLayoutV3-onnx",
vlm_server_url="http://127.0.0.1:8000/v1",
api_model_name="PaddleOCR-VL-1.5-0.9B",
)
result = pipeline.predict("examples/input/handwrite_ch_demo.png")
result.save_to_json("outputs")
result.save_to_markdown("outputs", pretty=False)
Scope
This repository only contains the layout model files for the ONNXRuntime stage. It does not include PaddleOCR-VL VLM weights. For the complete inference pipeline, use AIwork4me/PaddleOCR-VL-ROCm together with a ROCm-backed OpenAI-compatible VLM service.
中文摘要
这个 Hugging Face 仓库的作用是给 PaddleOCR-VL-ROCm 提供可直接下载的、已验证的 PP-DocLayoutV3-onnx layout 模型。用户克隆 GitHub 项目后,只需要运行下载脚本即可准备模型,不需要安装 Paddle2ONNX,也不需要自己转换模型。
开源项目地址:AIwork4me/PaddleOCR-VL-ROCm
主要好处:
- 降低安装门槛。
- 避免 Paddle2ONNX 转换差异。
- GitHub 仓库保持轻量,不提交大模型。
- ONNXRuntime 负责 layout,ROCm/vLLM 或 llama.cpp 负责 VLM 推理。
- 已在 1355 张图片上完成验证,full-run success 和 payload alignment 均为
1355 / 1355。