Gemma 4 E4B — Opus Reasoning + Claude Code | GGUF

GGUF version of our Opus 4.6 reasoning model. Runs on Ollama ✅, LM Studio ✅, and llama.cpp ✅. Reasoning baked in — no adapter needed.

Built by RavenX AI · GGUF converted from MLX source

What is this?

This is the GGUF version of gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-mlx-4bit — Gemma 4 E4B with Opus 4.6 reasoning and Claude Code LoRA fused directly into the weights.

No adapter needed, no extra config — just load and run with Claude-style <think> reasoning baked in.
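
Because replies interleave a `<think>` block with the final answer, a caller will often want to separate the two. A minimal sketch in Python (the helper name and the sample reply are illustrative, not part of this model card):

```python
import re

def split_reasoning(text: str):
    """Separate <think>...</think> reasoning traces from the final answer."""
    thoughts = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return [t.strip() for t in thoughts], answer

# Hypothetical model reply, for illustration only
reply = "<think>RSA relies on factoring being hard.</think>Breaking RSA requires factoring large semiprimes."
thoughts, answer = split_reasoning(reply)
print(thoughts[0])  # the reasoning trace
print(answer)       # the user-facing answer
```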

Looking for the Apple Silicon MLX version? See the MLX 4-bit model (optimized for Metal GPU).


Available Quantizations

| Quantization | Size | Use case |
|---|---|---|
| Q4_K_M | 2.7 GB | Recommended — best balance of quality and speed |
| Q5_K_M | 3.1 GB | Higher quality, slightly more RAM |
| Q8_0 | 4.5 GB | Near-lossless, needs more RAM |
| F16 | 8.3 GB | Full-precision GGUF |

Sizes are approximate; they will be updated once conversion is complete.
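
To pull a single quantization rather than the whole repo, the `huggingface-cli download` filter flags can be used. The filename glob below is an assumption; the actual GGUF filenames in the repo may differ:

```shell
pip install -U "huggingface_hub[cli]"

# Download only the Q4_K_M file (glob pattern is illustrative)
huggingface-cli download deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF \
  --include "*Q4_K_M*" \
  --local-dir ./gemma4-gguf
```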


🦙 Ollama — One Command

ollama run hf.co/deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF

With a custom system prompt

Create a Modelfile:

FROM hf.co/deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF

SYSTEM "You are a helpful assistant with tool-use capabilities. Think through problems step by step using <think> tags."

PARAMETER temperature 0.7
PARAMETER num_ctx 8192

Then build and run:

ollama create ravenx-gemma4 -f Modelfile
ollama run ravenx-gemma4

OpenAI-compatible API

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hf.co/deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF",
    "messages": [{"role": "user", "content": "Explain why RSA encryption is hard to break."}]
  }'
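
The same request can be built from Python with only the standard library. A sketch assuming Ollama's default port; the helper name is illustrative, and the final `urlopen` call is left commented since it needs a running server:

```python
import json
import urllib.request

MODEL = "hf.co/deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF"

def chat_request(prompt: str, base_url: str = "http://localhost:11434/v1"):
    """Build an OpenAI-style chat completion request for the local Ollama API."""
    payload = {"model": MODEL, "messages": [{"role": "user", "content": prompt}]}
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    return req, payload

req, payload = chat_request("Explain why RSA encryption is hard to break.")
# urllib.request.urlopen(req) would send it once `ollama serve` is running.
```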

💻 LM Studio

  1. Open LM Studio
  2. Search for deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF
  3. Download any quantization (Q4_K_M recommended)
  4. Load and chat — reasoning is baked in

🔧 llama.cpp

CLI

llama-cli \
  -hf deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF \
  -p "Explain why RSA encryption is hard to break." \
  -n 1024

Server (OpenAI-compatible)

llama-server \
  -hf deadbydawn101/gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF \
  --port 8080

# Use with any OpenAI client
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-E4B-Agentic-Opus-Reasoning-GeminiCLI-GGUF",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

What's different from the base Gemma 4

Compared with the base Gemma 4 E4B, this model adds:

- `<think>` tag reasoning ✅ baked into the weights
- Claude-style structured answers
- Tool-use patterns
- No adapter needed ❌ (the base model requires the LoRA adapter)
- Vision support
- Runs on Ollama / LM Studio / llama.cpp

🧪 Live Demos — Try It Now

| Space | What to try |
|---|---|
| 🔥 Agentic Tool Calling Demo | Live agentic loop — tool calling, `<think>` reasoning, calculator, web search |
| 🐳 OpenClaw Sandbox Demo | OpenClaw-style orchestration, Docker runtime, sandbox/approval modes |

🧩 Agent Stack Compatibility

This model is built to sit inside a real agent stack, not just a chat box.

| Layer | Role |
|---|---|
| Gemma 4 E4B Opus Reasoning + Claude Code | Reasoning + tool-use baked into weights |
| Gemini CLI | Coding agent + tool orchestration |
| OpenHarness | Harness runtime, tool loop, swarm, hooks, memory |
| OpenClaw | Orchestration, sessions, skills, messaging |
| Hermes skill | Agent behavior for concise, terminal-first execution |

Gemini CLI fork · TurboQuant-MLX · RavenX Inference Harness


How it was made

Training data

| Source | Examples |
|---|---|
| Crownelius/Opus-4.6-Reasoning-2100x-formatted | 2,054 |
| Claude Code tool-use patterns | 140 files |
| Total | 2,163 |

Training

Base:      deadbydawn101/gemma-4-E4B-mlx-4bit
Method:    SFT completions-only (mlx_vlm.lora)
Rank:      8 · Alpha: 16 · LR: 1e-5 · Iters: 1,000
Hardware:  Apple M4 Max 128GB · Peak mem: 7.876 GB

Final loss: ~3.5e-7

Fusion + GGUF Conversion

  1. All 378 LoRA pairs merged via weight arithmetic into base weights
  2. De-quantized from MLX 4-bit to FP16
  3. Converted to GGUF using llama.cpp/convert_hf_to_gguf.py
  4. Quantized to multiple GGUF formats using llama-quantize
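
Steps 3–4 above can be sketched with the standard llama.cpp tooling. Input and output paths here are hypothetical:

```shell
# Convert the fused FP16 checkpoint to a GGUF file
python llama.cpp/convert_hf_to_gguf.py ./fused-fp16-model \
  --outfile gemma4-f16.gguf --outtype f16

# Quantize the FP16 GGUF into the released formats
./llama.cpp/build/bin/llama-quantize gemma4-f16.gguf gemma4-Q4_K_M.gguf Q4_K_M
./llama.cpp/build/bin/llama-quantize gemma4-f16.gguf gemma4-Q5_K_M.gguf Q5_K_M
./llama.cpp/build/bin/llama-quantize gemma4-f16.gguf gemma4-Q8_0.gguf Q8_0
```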

Related Models

| Model | Format | Size | Notes |
|---|---|---|---|
| MLX 4-bit (source) | MLX | ~10.5 GB | Apple Silicon optimized, Metal GPU |
| This model (GGUF) | GGUF | varies | Ollama, LM Studio, llama.cpp |
| Base model (4-bit) | MLX | 4.86 GB | Use with adapter |
| LoRA adapter only | Safetensors | 658 MB | Adapter-only |
| 2B abliterated | MLX | 3.34 GB | 2B abliterated variant |
| 21B MoE REAP | MLX | 12 GB | 21B MoE REAP variant |

License

Gemma Terms of Use


Built with 🖤 by RavenX AI · TurboQuant-MLX · Gemini CLI
Model size: 8B params · Architecture: gemma4