Gemma 4 E4B — Opus Reasoning + Claude Code | GGUF
GGUF version of our Opus 4.6 reasoning model. Ollama ✅ LM Studio ✅ llama.cpp ✅ Reasoning baked in — no adapter needed.
Built by RavenX AI · GGUF converted from MLX source
What is this?
This is the GGUF version of gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-mlx-4bit — Gemma 4 E4B with Opus 4.6 reasoning and Claude Code LoRA fused directly into the weights.
No adapter needed, no extra config — just load and run with Claude-style <think> reasoning baked in.
Looking for the Apple Silicon MLX version? → MLX 4-bit model (optimized for Metal GPU)
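Because the reasoning arrives inline as <think>…</think> tags, client code usually separates it from the final answer before display. A minimal sketch (the tag format is the only assumption; `split_reasoning` is an illustrative helper, not part of the model or any library):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate <think>...</think> reasoning from the final answer."""
    thoughts = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return "\n".join(t.strip() for t in thoughts), answer

reasoning, answer = split_reasoning(
    "<think>RSA rests on factoring.</think>Breaking RSA requires factoring large semiprimes."
)
```

This lets a UI show or hide the chain of thought independently of the answer.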
Available Quantizations
| Quantization | Size | Use case |
|---|---|---|
| Q4_K_M | 2.7 GB | Recommended — best balance of quality and speed |
| Q5_K_M | 3.1 GB | Higher quality, slightly more RAM |
| Q8_0 | 4.5 GB | Near-lossless, needs more RAM |
| F16 | 8.3 GB | Full-precision GGUF |
Sizes will be updated once conversion is complete.
🦙 Ollama — One Command
```
ollama run hf.co/deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF
```
With a custom system prompt
Create a Modelfile:
```
FROM hf.co/deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF
SYSTEM "You are a helpful assistant with tool-use capabilities. Think through problems step by step using <think> tags."
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
```

Then build and run it:

```
ollama create ravenx-gemma4 -f Modelfile
ollama run ravenx-gemma4
```
OpenAI-compatible API
```
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hf.co/deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF",
    "messages": [{"role": "user", "content": "Explain why RSA encryption is hard to break."}]
  }'
```
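The same endpoint works from any OpenAI-compatible client. A standard-library sketch (endpoint and model name taken from the curl example above; `build_chat_request` is an illustrative helper, and the final call assumes the Ollama server is running locally):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"
MODEL = "hf.co/deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF"

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for the local Ollama server."""
    payload = {"model": MODEL, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    with urllib.request.urlopen(build_chat_request("Explain RSA.")) as resp:
        print(json.loads(resp.read())["choices"][0]["message"]["content"])
```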
💻 LM Studio
- Open LM Studio
- Search for deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF
- Download any quantization (Q4_K_M recommended)
- Load and chat — reasoning is baked in
🔧 llama.cpp
CLI
```
llama-cli \
  -hf deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF \
  -p "Explain why RSA encryption is hard to break." \
  -n 1024
```
Server (OpenAI-compatible)
```
llama-server \
  -hf deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF \
  --port 8080
```

```
# Use with any OpenAI client
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
What's different from the base Gemma 4 E4B
| | Base Gemma 4 E4B | This model |
|---|---|---|
| <think> tag reasoning | ❌ | ✅ baked in |
| Claude-style structured answers | ❌ | ✅ |
| Tool-use patterns | ❌ | ✅ |
| Requires adapter | — | ❌ no adapter needed |
| Vision support | ✅ | ✅ |
| Ollama / LM Studio / llama.cpp | ✅ | ✅ |
🧪 Live Demos — Try It Now
| Space | What to try |
|---|---|
| 🔥 Agentic Tool Calling Demo | Live agentic loop — tool calling, <think> reasoning, calculator, web search |
| 🐳 OpenClaw Sandbox Demo | OpenClaw-style orchestration, Docker runtime, sandbox/approval modes |
🧩 Agent Stack Compatibility
This model is built to sit inside a real agent stack, not just a chat box.
| Layer | Role |
|---|---|
| Gemma 4 E4B Opus Reasoning + Claude Code | Reasoning + tool-use baked into weights |
| Gemini CLI | Coding agent + tool orchestration |
| OpenHarness | Harness runtime, tool loop, swarm, hooks, memory |
| OpenClaw | Orchestration, sessions, skills, messaging |
| Hermes skill | Agent behavior for concise, terminal-first execution |
→ Gemini CLI fork · TurboQuant-MLX · RavenX Inference Harness
How it was made
Training data
| Source | Examples |
|---|---|
| Crownelius/Opus-4.6-Reasoning-2100x-formatted | 2,054 |
| Claude Code tool-use patterns | 140 files |
| Total | 2,163 |
Training
Base: deadbydawn101/gemma-4-E4B-mlx-4bit
Method: SFT completions-only (mlx_vlm.lora)
Rank: 8 · Alpha: 16 · LR: 1e-5 · Iters: 1,000
Hardware: Apple M4 Max 128GB · Peak mem: 7.876 GB
Final loss: ~3.5e-7
Fusion + GGUF Conversion
- All 378 LoRA pairs merged via weight arithmetic into base weights
- De-quantized from MLX 4-bit to FP16
- Converted to GGUF using llama.cpp/convert_hf_to_gguf.py
- Quantized to multiple GGUF formats using llama-quantize
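The fusion step above is plain weight arithmetic: each LoRA pair (A, B) is scaled and added into its base matrix, W' = W + (alpha/rank) · B·A. A schematic NumPy sketch (`fuse_lora` and the shapes are illustrative; alpha = 16 and rank = 8 come from the training config above):

```python
import numpy as np

def fuse_lora(W: np.ndarray, A: np.ndarray, B: np.ndarray,
              alpha: float = 16.0, rank: int = 8) -> np.ndarray:
    """Merge one LoRA pair into a base weight: W' = W + (alpha/rank) * B @ A."""
    return W + (alpha / rank) * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 32, 8
W = rng.standard_normal((d_out, d_in))
A = rng.standard_normal((r, d_in))   # down-projection
B = np.zeros((d_out, r))             # B starts at zero, so this adapter is a no-op
assert np.allclose(fuse_lora(W, A, B), W)
```

After fusing all pairs this way, the adapter file is no longer needed at inference time, which is why the GGUF ships standalone.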
Related Models
| Model | Format | Size | Notes |
|---|---|---|---|
| MLX 4-bit (source) | MLX | ~10.5 GB | Apple Silicon optimized, Metal GPU |
| This model (GGUF) | GGUF | varies | Ollama, LM Studio, llama.cpp |
| Base model (4-bit) | MLX | 4.86 GB | Base model (use with adapter) |
| LoRA adapter only | Safetensors | 658 MB | Adapter-only |
| 2B abliterated | MLX | 3.34 GB | 2B abliterated |
| 21B MoE REAP | MLX | 12 GB | 21B MoE REAP |
Model tree for deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF
- Base model: google/gemma-4-E4B-it
- Quantized from: deadbydawn101/gemma-4-E4B-mlx-4bit