Hugging Face – Posts

All HF Hub posts

posted an update 2 days ago

Post

10130

We are excited to announce Sipp.sh: a high-performance library for running AI inference locally and in the cloud through a unified API.

We began to realize that an LLM isn't just a chat interface for information retrieval. It can be integrated directly into web, game, or productivity apps to handle continuous monitoring and decision-making. It can act as a sort of "second brain," the silent hand that guides and helps a user without them even realizing it. We see this as the next frontier of UX design, but this is only possible if developers have access to low-cost, zero-latency compute and absolute data privacy.

That's why we created Sipp. It’s an opinionated library that lets developers integrate local AI into any application, giving them the superpowers to completely rethink user experiences across the web, games, and desktop.

To achieve this, we built an entirely new stack in Rust and C++, working alongside the llama.cpp project. Through our work, we were able to contribute back to that community to help upgrade the GGML WebGPU backend. This deep optimization is what enables our fast, responsive decode speeds directly in the browser. Sipp ships as a zero-dependency library for desktop and web, achieving a 3x to 5x speedup in token decode compared to popular alternatives.

We are already seeing some incredible use cases emerge from this, from continuous monitoring using local vision to the dynamic generation of game elements in a real-time wizard vs. wizard game.

The best part? It's fully open-source!

We see this as the start of a dialogue about what the future of user interaction is going to look like, and we built Sipp to lay the foundation for that exciting future. Check out the live demos on our site, run your own benchmarks, or come hang out with us in our Discord.

Website: https://www.sipp.sh/
GitHub: https://github.com/noumena-labs/Sipp

1 reply

danielhanchen

posted an update 4 days ago

Post

2628

1-bit GLM-5.2 GGUF vs. Claude 4.8 Opus vs. GPT-5.5

We gave 3 models the same prompt and compared one-shot outputs.

The 1-bit GLM-5.2 GGUF ran locally on a Mac Studio M3 Ultra with 256GB RAM at ~21.6 tok/s.

Which output do you like best?
GGUF: unsloth/GLM-5.2-GGUF

3 replies

ST-x-Tony

posted an update 1 day ago

Post

1801

Hello AI Community! 👋

We are thrilled to announce the release of **NRS_QWEN_MYTHOS_1M**, a high-performance reasoning model built on the powerful **Qwen 3.5 9B** base. At **SKT AI LABS**, we’ve applied our proprietary **Neural Reasoning System (NRS)** to push the boundaries of what a 9B model can do.

🔥 **Why this model is a Game-Changer:**

✅ **100x High Reasoning Capacity:** Deep logical thinking and complex problem-solving via NRS Boosting.
✅ **1 Million Token Context:** Handle massive codebases, long documents, and multi-turn agentic tasks with ease (YaRN Scaling).
✅ **Advanced Thinking Mode:** Native tags for step-by-step Chain-of-Thought reasoning.
✅ **Tool-Use Ready:** Optimized for Python execution and Web Search with self-correction.
✅ **Blazing Fast:** Efficient 9B architecture that runs smoothly on consumer hardware (RTX 3090/4090).

🛠️ **Technical Highlights:**
* **Base:** Qwen 3.5 9B
* **Tuning:** NRS-specific tuning on high-quality samples.
* **License:** NRS DOCS

Whether you are a developer building coding agents, a researcher dealing with long-context data, or just someone who loves deep reasoning, this model is built for you.

👇 **Try it now on Hugging Face:**
SKT-NRS/NRS_QWEN_MYTHOS_1M

ST-x-Tony

posted an update 3 days ago

Post

10324

Hello everyone,

We are excited to share that SKT-NRS is now live on Hugging Face.
We’ve developed a Neural Reasoning System (NRS) designed to enhance the capabilities of foundation models — giving them stronger reasoning, improved performance, and more reliable outputs across a wide range of tasks.

Our goal is to bring meaningful quality improvements to both new and existing models. You’ll start seeing boosted versions of various models released here soon, each refined with our NRS approach.

**What to Expect** ❤️‍🩹

* Regular releases of Neural Reasoning-enhanced models
* Clear focus on better reasoning and overall model quality
* Ongoing improvements based on community feedback

If you’d like to stay updated, feel free to follow this space — we’ll be posting the first boosted models very soon.

**Community Requests**

Have a specific model you’d like us to work on? Looking for improvements on an existing model, or have any other requests?
We’re happy to hear from you. Please share your suggestions here:

## Community Requests → SKT-NRS/README#1

**Thank you for your support! We look forward to building better models together.**

10 replies

ovi054

posted an update about 4 hours ago

Post

Qwen3-14B Manim Expert LoRA

For the "Build Small Hackathon", I built a Gradio app that turns any concept into a Manim explainer video.

This is powered by Qwen3-14B plus a Manim LoRA that I trained on a synthetic 10k dataset I generated.

👉 Try it now:
build-small-hackathon/anim-vid-ai

kanaria007

posted an update about 21 hours ago

Post

✅ Article highlight: Structural Abstraction Stack: From Raw Perception to Reusable Jumps (art-60-183, v0.1)

TL;DR:
This article argues that abstraction is not summary polish.

Once embodied systems parse, regulate, react, and act with receipts, they still need a way to learn reusable structure from real episodes. 183 defines that stack: extract invariant relation form, neutralize local semantics, preserve evaluative caution, and register only bounded jump anchors.

Read:
kanaria007/agi-structural-intelligence-protocols

Why it matters:
• prevents pattern learning from becoming a hidden heuristic library
• keeps abstractions downstream of parsed, receipted episodes
• preserves contradiction, missingness, fit limits, and failure modes
• separates structural abstraction from surface analogy
• makes reusable jumps bounded, reviewable, and revisable

What’s inside:
• candidate records from observation, reflex, actuation, posture, and failure traces
• structural abstraction records for invariant relation form
• semantic maps that keep source terms and provenance visible
• evaluative profiles for fit, non-fit, failure modes, and sandbox-first caution
• jump registration objects with thresholds, constraints, review hooks, and revision triggers
• rejection and reentry receipts for patterns that stay local, sandbox-only, quarantined, or blocked

Key idea:
Do not say:

“the system generalized from prior cases.”

Say:

“this pattern came from these parsed episodes, preserved this relation form, generalized these terms without erasing provenance, carried these fit and failure conditions, and registered only this bounded jump anchor.”

Abstraction is not a clever sentence.

It is governed reuse.

TravisMuhlestein

posted an update 1 day ago

Post

The conversation around AI agents is evolving.

We're moving beyond model capabilities and toward the infrastructure needed for agents to work together.

Over the past few weeks we've seen meaningful momentum around the foundational building blocks of the emerging agentic web.

Agent Name Service (ANS) is addressing identity and trust.
Agentic Resource Discovery (ARD) is helping standardize how agents discover resources and capabilities.

Together, these efforts represent something bigger than individual projects.

They point toward an ecosystem built on open, interoperable infrastructure rather than isolated implementations.

As builders, we'll likely spend the next few years solving challenges around identity, discovery, trust, interoperability, and governance—not just model performance.

It will be interesting to see how these efforts evolve—and where the community chooses to collaborate next.

Learn more:

🔗 Linux Foundation ANS: https://www.linuxfoundation.org/press/linux-foundation-announces-intent-to-launch-agent-name-service-to-establish-trusted-identity-infrastructure-for-ai-agents

🔗 Agentic Resource Discovery: https://developers.googleblog.com/announcing-the-agentic-resource-discovery-specification/

mmhamdy

posted an update 2 days ago

Post

128

It has been more than a decade now since the knowledge distillation paper came out.

Knowledge Distillation (KD) is one of my favorite topics, but I have to confess that I'm not a huge fan of the term because I find it confusing (or at least, it has become so over time).

The idea behind KD is not novel; it was there almost a decade before the paper came out (and arguably even a decade before that, back to 1990-91). But this paper is the one that clicked, the one that made the topic much more popular and introduced it to a broader audience.

First, the timing and the authors played a big role: we have Geoffrey Hinton, Oriol Vinyals, and Jeff Dean here. And second, Geoffrey Hinton is really good at idea branding: Model compression?! No, no, no! Let's call it "Knowledge Distillation" and use evocative terms such as "Dark Knowledge" to describe what is being transferred.

It's a great name, but as time has passed, the term has become a bit of a relic. KD is no longer solely about compression (KD used to be introduced as a method for model compression, but now model compression is just one application of KD). The other issue is that the word "distillation" implies some sort of potency, that the student is somehow more powerful than the teacher, which is not the case (though many counterarguments could be made: for example, the student can be more powerful than another model trained with no teacher).

Nevertheless, the paper is incredibly well-written, short, and fun to read. It's one of the few papers that I have read several times. Check it out, and maybe share your thoughts on the topic with us here!
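For readers who have not seen the objective written out, here is a minimal sketch of the soft-target idea: both teacher and student logits are softened with a temperature, and the student is penalized for diverging from the teacher's softened distribution. It uses only the standard library; the function names, the example logits, and the choice of KL divergence (which differs from soft-target cross-entropy only by a term that does not depend on the student) are illustrative assumptions, not code from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Illustrative logits (made up for this example).
teacher = [6.0, 2.0, 1.0]
student = [3.0, 2.5, 0.5]

hard = softmax(teacher, temperature=1.0)  # nearly one-hot
soft = softmax(teacher, temperature=4.0)  # non-top classes become visible
loss = distillation_loss(student, teacher)
```

Raising the temperature is what exposes the relative probabilities of the non-top classes, which is the information the "dark knowledge" phrase refers to.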

If you had to choose another name for Knowledge Distillation, what would it be?

2 replies

PeetPedro

posted an update 2 days ago

Post

106

hey, I'm doing some experimenting, looping around 🙂
---
**kompress-v6** *shipped* — trained on Claude Code agent patterns (bash output, file reads, stack traces, search results, JSON tool responses). 3k synthetic pairs + 2k existing, fine-tuned from v4, $0.20 on vast.ai.

Results:
heretic exact_pct 0.962 (v4: 0.967),
keep_rate 0.854 (v4: 0.823),
override delta 0.
Model got more conservative — higher keep_rate on structured technical content.
Real proxy:
v4 compressed 9.5%,
v6 compressed 4.2% on the same session.
Less aggressive, fewer must-keep tokens dropped on paths and identifiers.

Interesting failure: self-labeling with v4+override collapsed mk_in_ref to 0.652.
TokenExpiredError splits into Token+Expired+Error — subtokens that don't individually match the must-keep regex, so the force-keep never fires. Generator references (mk_in_ref=1.0 by construction) ended up being better labels than v4's compressed output for agent data.
Fix for next run: slide a 2-3 subtoken window instead of checking individual subtokens. Would let self-labeling work on agent content and potentially produce a more compression-aggressive v7.
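The windowed check described above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the must-keep regex (CamelCase identifiers with two or more humps), the SentencePiece-style `▁` word-start marker, and the function name are hypothetical and are not taken from the kompress code.

```python
import re

# Hypothetical must-keep pattern: CamelCase identifiers with 2+ humps.
MUST_KEEP = re.compile(r"[A-Z][a-z]+(?:[A-Z][a-z]+)+")
WORD_START = "\u2581"  # assumed SentencePiece-style word-start marker

def force_keep_mask(subtokens, pattern=MUST_KEEP, max_window=3):
    """Flag subtokens inside any window of up to max_window adjacent
    subtokens whose concatenation fully matches the pattern."""
    keep = [False] * len(subtokens)
    for size in range(1, max_window + 1):
        for start in range(len(subtokens) - size + 1):
            window = subtokens[start:start + size]
            # Do not let a window run into the next word.
            if any(tok.startswith(WORD_START) for tok in window[1:]):
                continue
            joined = "".join(window).lstrip(WORD_START)
            if pattern.fullmatch(joined):
                for i in range(start, start + size):
                    keep[i] = True
    return keep

subtokens = ["\u2581raise", "\u2581Token", "Expired", "Error", "\u2581now"]
per_subtoken = force_keep_mask(subtokens, max_window=1)  # old behaviour
windowed = force_keep_mask(subtokens, max_window=3)      # proposed fix
```

With `max_window=1` no subtoken matches on its own, which reproduces the failure; with a window of 3 the three pieces of `TokenExpiredError` are all flagged while the neighbouring words are left alone.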

Models on HF:
- PeetPedro/kompress-v6
- PeetPedro/kompress-v4
- PeetPedro/kompress-v3
Write-up: https://pocoo.vaked.dev/posts/2026-06-25-kompress-v6-agent-distribution

ManniX-ITA

posted an update 2 days ago

Post

112

---
🚀 Gemma-4-A4B 98e v7-coder cohort — loop-fixed re-release. Two 20.8B MoE coders (4B-active), fresh-map prunes of Gemma 4 26B-A4B, 30/128 experts dropped per layer. The headline isn't a benchmark: the agentic loop is gone at the weights, not papered over by the sampler.

🔧 How: at prune time we force-keep the 46 agentic_eog experts a loop-protection signal flags as load-bearing for clean multi-turn termination (+ shared-FFN α=1.2). Result: 0 loops across 48 seeds on every published tier.
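As a rough illustration of force-keeping at prune time, the sketch below drops the lowest-scoring experts in a layer while excluding a protected set from consideration. The per-expert importance scores, the function name, and the toy numbers are all hypothetical; the actual scoring and selection used for this release are not described here.

```python
def experts_to_drop(scores, n_drop, force_keep):
    """Pick the n_drop lowest-scoring experts, never touching force_keep.

    scores:     hypothetical per-expert importance scores for one layer
    force_keep: set of expert indices that must survive pruning
    """
    candidates = [e for e in range(len(scores)) if e not in force_keep]
    if n_drop > len(candidates):
        raise ValueError("not enough unprotected experts to drop")
    candidates.sort(key=lambda e: scores[e])
    return sorted(candidates[:n_drop])

# Toy layer with 6 experts; expert 3 scores lowest but is protected.
scores = [0.9, 0.1, 0.4, 0.05, 0.7, 0.2]
unprotected = experts_to_drop(scores, n_drop=2, force_keep=set())
protected = experts_to_drop(scores, n_drop=2, force_keep={3})
```

In the toy layer, protecting expert 3 shifts the prune onto the next-weakest unprotected expert, which is the trade-off a force-keep list makes.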

📊 Q6_K · llama.cpp · greedy · same host (from summary.json):

⚖️ v7-coder (fkbroad code3/lcb2) — balanced coder: LCB-med-55 98.18, HumanEval 98.17, HE+ 92.07, AIME 80.0, MATH-500 95.0, GSM8K 91, IFEval 92, MultiPL-E 89.7, ARC 92.2.

⚡ v7-coderx (code4/lcb3) — code-maximal: all-hard LCB-77 85.71 (cohort-best; 128e 79.22, v7-coder 84.42), HE+ 93.29, GSM8K 93, MATH-500 95.0, AIME 76.67. Whole budget on code.

🎯 Both land near GPQA ~51 — graduate science is the budget axis, neither is a science model. Pick v7-coder for the broad LCB-medium + HumanEval lead; v7-coderx for the all-hard slice and HE+.

🧪 The harness we used to prove the fix is now an omk tool: agentic-loop-harness replays a frozen agentic conversation across a sampler×seed matrix and reports a fail-rate per chat-template, so you can isolate a loop to one variable. Model-agnostic — any OpenAI-compatible server. The version we shared with Google: google/gemma-4-12B-it#41
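The matrix idea can be sketched in a few lines. This is not the agentic-loop-harness interface: `fail_rates`, `run_once`, and the template and sampler labels are hypothetical names, and the stand-in `fake_run_once` replaces what would really be a replay of the frozen conversation against an OpenAI-compatible server.

```python
from collections import defaultdict
from itertools import product

def fail_rates(run_once, templates, samplers, seeds):
    """Return {template: fraction of (sampler, seed) runs that looped}.

    run_once(template, sampler, seed) is assumed to return True when
    the replayed conversation ends in a loop.
    """
    samplers, seeds = list(samplers), list(seeds)
    fails = defaultdict(int)
    total = len(samplers) * len(seeds)
    for template, sampler, seed in product(templates, samplers, seeds):
        if run_once(template, sampler, seed):
            fails[template] += 1
    return {t: fails[t] / total for t in templates}

# Stand-in for a real replay: the "broken" template loops on even seeds.
def fake_run_once(template, sampler, seed):
    return template == "broken" and seed % 2 == 0

rates = fail_rates(
    fake_run_once,
    templates=["fixed", "broken"],
    samplers=["greedy", "temp0.7"],
    seeds=range(4),
)
```

Holding the conversation fixed and varying one axis at a time is what lets a non-zero fail-rate be attributed to a single template, sampler, or seed.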

📦 Each ships bf16 · GGUF (+ CD-* + imatrix + mmproj vision) · NVFP4A16 (~13 GB) · Ollama.
🔗 ManniX-ITA/gemma-4-A4B-98e-v7-coder-it (+ -it-GGUF, -NVFP4A16) · https://ollama.com/mannix/gemma4-98e-v7-coder
🔗 ManniX-ITA/gemma-4-A4B-98e-v7-coderx-it (+ -it-GGUF, -NVFP4A16) · https://ollama.com/mannix/gemma4-98e-v7-coderx
🔧 https://github.com/mann1x/omnimergekit/tree/main/tools/agentic-loop-harness
