Instructions for using ai4colonoscopy/ColonR1 with libraries, inference providers, notebooks, and local apps. Follow the sections below to get started.
- Libraries
- Transformers
How to use ai4colonoscopy/ColonR1 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="ai4colonoscopy/ColonR1")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
```

```python
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("ai4colonoscopy/ColonR1")
model = AutoModelForImageTextToText.from_pretrained("ai4colonoscopy/ColonR1")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
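Since ColonR1 is trained to reason with `<think>...</think>` and `<answer>...</answer>` tags (see the Quick start below), you can adapt the generic prompt accordingly. Below is a minimal sketch reusing the `pipe` object above; the example image URL is a stand-in for your own colonoscopy frame, and we assume the pipeline forwards `max_new_tokens` to `generate`:

```python
# Hedged sketch: ColonR1-style prompting through the pipeline defined above.
# Replace the example URL with a path/URL to your own colonoscopy image.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": (
                "Does the image contain a polyp? Answer me with Yes or No.\n"
                "Enclose your reasoning in <think>...</think> tags and the final "
                "answer in <answer>...</answer> tags."
            )},
        ],
    },
]
result = pipe(text=messages, max_new_tokens=512)  # assumes generation kwargs are forwarded
print(result[0]["generated_text"])
```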
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use ai4colonoscopy/ColonR1 with vLLM:
Install from pip and serve the model:
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "ai4colonoscopy/ColonR1"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ai4colonoscopy/ColonR1",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```
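The same endpoint can also be called from Python. Here is a minimal sketch, assuming the `openai` package is installed and the vLLM server started above is running on localhost:8000:

```python
# Minimal OpenAI-compatible client sketch for the vLLM server started above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM ignores the key
response = client.chat.completions.create(
    model="ai4colonoscopy/ColonR1",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```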
Use Docker

```sh
docker model run hf.co/ai4colonoscopy/ColonR1
```
- SGLang
How to use ai4colonoscopy/ColonR1 with SGLang:
Install from pip and serve the model:
```sh
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "ai4colonoscopy/ColonR1" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ai4colonoscopy/ColonR1",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```
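Since the SGLang server is OpenAI-compatible, responses can also be streamed token by token. A minimal sketch, assuming the `openai` package is installed and the server above is running on port 30000:

```python
# Streaming chat completion against the SGLang server started above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")  # key is unused locally
stream = client.chat.completions.create(
    model="ai4colonoscopy/ColonR1",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}},
        ],
    }],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta of the generated text.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```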
Use Docker images

```sh
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "ai4colonoscopy/ColonR1" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ai4colonoscopy/ColonR1",
    "messages": [
      {
        "role": "user",
        "content": [
          { "type": "text", "text": "Describe this image in one sentence." },
          { "type": "image_url", "image_url": { "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg" } }
        ]
      }
    ]
  }'
```

- Docker Model Runner
How to use ai4colonoscopy/ColonR1 with Docker Model Runner:
```sh
docker model run hf.co/ai4colonoscopy/ColonR1
```
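Docker Model Runner also exposes an OpenAI-compatible endpoint. The sketch below is an assumption-heavy example: the host/port (`localhost:12434`) and path correspond to a default setup with TCP access enabled, and may differ on your machine; check the Docker Model Runner documentation.

```python
# Hedged sketch: querying Docker Model Runner's OpenAI-compatible API.
# The base_url below is an assumption for a default TCP-enabled setup.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="hf.co/ai4colonoscopy/ColonR1",
    messages=[{"role": "user", "content": "Describe a typical colonoscopy finding in one sentence."}],
)
print(resp.choices[0].message.content)
```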
The first R1-styled model (ColonR1) tailored for reasoning in colonoscopy tasks
Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning

For more details, please refer to our project page: https://github.com/ai4colonoscopy/Colon-X
Figure 1: Details of our colonoscopy-specific reasoning model, ColonR1.
Quick start
Below is a code snippet to help you quickly try out the ColonR1 model using Hugging Face Transformers. For convenience, we manually combined some configuration and code files into this checkpoint. Please note that this is only a quick-start example; we recommend using the source code in our repository to explore further.
Before running the snippet, install the following minimal dependencies.
```sh
conda create -n quickstart python=3.10
conda activate quickstart
pip install torch transformers accelerate pillow
```

Then you can run `python ColonR1/quickstart.py`, as shown in the following code.

```python
import torch
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration
from PIL import Image
import warnings
import os

warnings.filterwarnings('ignore')

device = "cuda" if torch.cuda.is_available() else "cpu"

MODEL_PATH = "ai4colonoscopy/ColonR1"
IMAGE_PATH = "assets/example.jpg"
Question = "Does the image contain a polyp? Answer me with Yes or No."

print(f"[Info] Loading model from {MODEL_PATH}...")
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto"
)
model.eval()

processor = AutoProcessor.from_pretrained(MODEL_PATH)

if not os.path.exists(IMAGE_PATH):
    raise FileNotFoundError(f"Image not found at {IMAGE_PATH}. Please provide a valid image path.")
image = Image.open(IMAGE_PATH).convert("RGB")

TASK_SUFFIX = (
    "Your task: 1. First, Think through the question step by step, enclose your reasoning process "
    "in <think>...</think> tags. 2. Then provide the correct answer inside <answer>...</answer> tags. "
    "3. No extra information or text outside of these tags."
)
final_question = f"{Question}\n{TASK_SUFFIX}"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": IMAGE_PATH},
            {"type": "text", "text": final_question},
        ],
    }
]

print("[Info] Processing inputs...")
text_prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(
    text=[text_prompt],
    images=[image],
    padding=True,
    return_tensors="pt",
).to(device)

print("[Info] Generating response...")
with torch.no_grad():
    generated_ids = model.generate(
        **inputs,
        max_new_tokens=1024,
        do_sample=False
    )

generated_ids_trimmed = generated_ids[:, inputs.input_ids.shape[1]:]
output_text = processor.batch_decode(generated_ids_trimmed, skip_special_tokens=True)[0]
print(output_text)
```
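As the prompt above requests, the model emits its reasoning inside `<think>...</think>` and its final answer inside `<answer>...</answer>`. Below is a minimal sketch for splitting the two from the `output_text` produced above; the `parse_colonr1` helper is hypothetical, not part of the released code:

```python
import re

def parse_colonr1(output_text: str):
    """Hypothetical helper: split ColonR1 output into reasoning and answer parts."""
    think = re.search(r"<think>(.*?)</think>", output_text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", output_text, re.DOTALL)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )

reasoning, answer = parse_colonr1(output_text)
print("Reasoning:", reasoning)
print("Answer:", answer)
```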
Reference
Feel free to cite if you find the Colon-X Project useful for your work:
```bibtex
@article{ji2025colonx,
  title={Colon-X: Advancing Intelligent Colonoscopy from Multimodal Understanding to Clinical Reasoning},
  author={Ji, Ge-Peng and Liu, Jingyi and Fan, Deng-Ping and Barnes, Nick},
  journal={arXiv preprint arXiv:2512.03667},
  year={2025}
}
```
License
This project utilizes certain datasets and checkpoints that are subject to their respective original licenses. Users must comply with all terms and conditions of these original licenses. The content of this project itself is licensed under the Apache License 2.0.