Follow-up to the M1..M5 study above: re-ran M4 against the updated PR #682 head ("turbo" branch by @Tusm11, HEAD 8d989f6) with rebalanced hyperparams (equal weights w=1/1, density 0.7, closer to the PR's worked example). Same AttnLRP signal as M4-orig, same sources.
▶ Qwen3.5-4B-M4-v2-ex-LRP-turbo
ManniX-ITA/Qwen3.5-4B-M4-v2-ex-LRP-turbo
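For anyone who wants to replicate the setup, here is a minimal sketch of what the rebalanced config could look like. The merge_method key ("ex_lrp"), the base model id, and the donor names are placeholders I made up for illustration; the real keys come from PR #682. Only the equal weights (1/1) and density 0.7 are taken from the run described above.

```python
# Hypothetical sketch of the rebalanced M4-v2 merge config (not the actual PR #682 schema).
import yaml

config = {
    "merge_method": "ex_lrp",          # placeholder key for the PR #682 method
    "base_model": "Qwen/Qwen3.5-4B",   # placeholder base model id
    "models": [
        {"model": "donor-model-A",     # placeholder donor checkpoints
         "parameters": {"weight": 1.0, "density": 0.7}},
        {"model": "donor-model-B",
         "parameters": {"weight": 1.0, "density": 0.7}},
    ],
    "dtype": "bfloat16",
}

with open("m4_v2_ex_lrp_turbo.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
# Merge with the standard CLI:  mergekit-yaml m4_v2_ex_lrp_turbo.yaml ./out
```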
Q6_K quants, HumanEval (HE) / MBPP pass@1, with M4-v2 inserted:
- M1 Vanilla DARE-TIES → 51.22 / 47.00
- M2 OMv2 (no signal) → 52.44 / 49.40
- M3 OMv2 + Fisher → 57.93 🥇 / 48.80
- M4 ex-LRP (PR #682 orig) → 51.22 / 49.40
- M4-v2 ex-LRP (PR #682 turbo) → 55.49 / 52.20 🥇
- M5 OMv2 + LRP → 53.05 / 51.40
Δ M4-v2 vs M4-orig: +4.27 pp HE, +2.80 pp MBPP. M4-v2 takes the MBPP medal of the whole study (overtakes M5) while staying competitive on HumanEval. The turbo code path + rebalanced hyperparams clearly beat the original PR head on this configuration.
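For context on the numbers: both columns are pass@1 in the usual HumanEval/MBPP sense. A small self-contained snippet with the standard unbiased pass@k estimator (Chen et al., 2021) and the deltas quoted above; the helper name is mine, not from any particular eval harness.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples
    drawn from n generations (c of them correct) passes the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k=1 this reduces to c/n, the fraction of generations that pass.
assert abs(pass_at_k(10, 3, 1) - 0.3) < 1e-9

# Deltas quoted above, M4-v2 (turbo) minus M4-orig, in percentage points.
he_delta   = 55.49 - 51.22   # +4.27 pp on HumanEval
mbpp_delta = 52.20 - 49.40   # +2.80 pp on MBPP
print(f"HE Δ = {he_delta:+.2f} pp, MBPP Δ = {mbpp_delta:+.2f} pp")
```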
Findings refresh: Fisher still leads HE; ex-LRP (turbo) now leads MBPP, narrowly ahead of OMv2+LRP. Both LRP variants land within 1 pp on MBPP, a strong signal that LRP-driven sparsification is doing real work for code-gen on small Qwen merges.
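To unpack what I mean by LRP-driven sparsification: instead of dropping task-vector entries at random like vanilla DARE, the relevance signal ranks entries and only the top-density fraction survives before rescaling. A rough PyTorch sketch of that idea; this is my paraphrase of the concept, not the actual PR #682 implementation.

```python
import torch

def relevance_sparsify(task_vector: torch.Tensor,
                       relevance: torch.Tensor,
                       density: float = 0.7) -> torch.Tensor:
    """Keep the top-`density` fraction of delta weights ranked by an
    AttnLRP-style relevance score, zero the rest, and rescale so the
    merged delta keeps roughly the scale of the dense task vector.
    Conceptual sketch only, not the mergekit PR #682 code."""
    flat_r = relevance.abs().flatten()
    k = max(1, int(density * flat_r.numel()))
    threshold = torch.topk(flat_r, k, largest=True).values.min()
    mask = (relevance.abs() >= threshold).to(task_vector.dtype)
    # DARE-style rescaling of the surviving entries.
    return task_vector * mask / density

# Toy usage: delta between a fine-tuned and a base weight matrix, scored by
# a per-parameter relevance map coming from an LRP backward pass.
delta = torch.randn(4, 4)
relevance = torch.rand(4, 4)
sparse_delta = relevance_sparsify(delta, relevance, density=0.7)
```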
Big thanks to @Tusm11 for the supercharged ex-LRP turbo head: multimodal support + Iron-Man stabilization + in-place math are a real upgrade. Posted the full results + the 6 patches needed to run it against Qwen3_5ForConditionalGeneration on the PR thread:
https://github.com/arcee-ai/mergekit/pull/682