15 6 27

ManniX PRO

ManniX-ITA

https://github.com/mann1x

mann1x

AI & ML interests

None yet

Recent Activity

repliedto their post 31 minutes ago

🚀 Two releases this week pushing merge methodology forward. ▶ Qwen3.6-27B-Omnimerge-v4-MLP https://huggingface.co/ManniX-ITA/Qwen3.6-27B-Omnimerge-v4 Same-base DARE-TIES merge of Qwen3.6-27B + 3 fine-tunes (rico03 Claude distill, Esper3.1, kai-os Opus reasoning anchor) via my Omnimerge_v2 method (OBIM-lite + DAREx-q + EMR election). Hit a Qwen3.6-specific fragility: hyperparams that work flawlessly on 3.5 produced 80% unclosed-<think> on 3.6, collapsing pass@1 to ~20%. Per-tensor delta forensics localized the failure to mlp.{gate,up,down}_proj in layers 27–52. Fix: MLP-passthrough surgery — copy MLPs verbatim from base, keep merged attn + linear_attn. Leak → 0%. Q6_K results (vs Qwen3.6 base / vs Omnimerge-v2 on Qwen3.5): • HumanEval: 84.76% (= base, +5.49 pp vs v2) • MBPP corrected: 73.40% (+15.80 pp vs base, ≈ v2) • GPQA Diamond: ~84.75% partial 192/198 (+15.5 pp vs v2) ▶ Qwen3.5-4B Importance-Signal Study (M1..M5) Controlled 5-way comparison: same Qwen3.5-4B base, same 2 fine-tunes (Jackrong Claude-4.5 distill + Crow Opus-4.6 distill), only the importance signal driving DARE-TIES sparsification varies. Q6_K HE / MBPP pass@1: • M1 Vanilla DARE-TIES → 51.22 / 47.00 • M2 OMv2 (no signal) → 52.44 / 49.40 • M3 OMv2 + Fisher → 57.93 🥇 / 48.80 • M4 mergekit ex-LRP (PR #682) → 51.22 / 49.40 • M5 OMv2 + LRP → 53.05 / 51.40 🥇 Findings: Fisher wins HE (+4.88 pp over vanilla), LRP wins MBPP (+2.60 pp). Both signals + Omnimerge_v2 recipe beat vanilla. To make multimodal-LM ex-LRP work end-to-end against Qwen3_5ForConditionalGeneration, I filed 5 patches against arcee-ai/mergekit PR #682 + 1 against rachtibat/lxt. All five Mx checkpoints + Fisher/LRP signal safetensors + reproducer scripts published.

updated a model about 1 hour ago

ManniX-ITA/Qwen3.5-4B-M4-v2-ex-LRP-turbo

published a model about 2 hours ago

ManniX-ITA/Qwen3.5-4B-M4-v2-ex-LRP-turbo

View all activity

Organizations

None yet

Posts 2

Post

829

🚀 Two releases this week pushing merge methodology forward.

▶ Qwen3.6-27B-Omnimerge-v4-MLP
ManniX-ITA/Qwen3.6-27B-Omnimerge-v4

Same-base DARE-TIES merge of Qwen3.6-27B + 3 fine-tunes (rico03 Claude distill, Esper3.1, kai-os Opus reasoning anchor) via my Omnimerge_v2 method (OBIM-lite + DAREx-q + EMR election).

Hit a Qwen3.6-specific fragility: hyperparams that work flawlessly on 3.5 produced 80% unclosed-<think> on 3.6, collapsing pass@1 to ~20%. Per-tensor delta forensics localized the failure to mlp.{gate,up,down}_proj in
layers 27–52. Fix: MLP-passthrough surgery — copy MLPs verbatim from base, keep merged attn + linear_attn. Leak → 0%.

Q6_K results (vs Qwen3.6 base / vs Omnimerge-v2 on Qwen3.5):
• HumanEval: 84.76% (= base, +5.49 pp vs v2)
• MBPP corrected: 73.40% (+15.80 pp vs base, ≈ v2)
• GPQA Diamond: ~84.75% partial 192/198 (+15.5 pp vs v2)

▶ Qwen3.5-4B Importance-Signal Study (M1..M5)

Controlled 5-way comparison: same Qwen3.5-4B base, same 2 fine-tunes (Jackrong Claude-4.5 distill + Crow Opus-4.6 distill), only the importance signal driving DARE-TIES sparsification varies.

Q6_K HE / MBPP pass@1:
• M1 Vanilla DARE-TIES → 51.22 / 47.00
• M2 OMv2 (no signal) → 52.44 / 49.40
• M3 OMv2 + Fisher → 57.93 🥇 / 48.80
• M4 mergekit ex-LRP (PR #682) → 51.22 / 49.40
• M5 OMv2 + LRP → 53.05 / 51.40 🥇

Findings: Fisher wins HE (+4.88 pp over vanilla), LRP wins MBPP (+2.60 pp). Both signals + Omnimerge_v2 recipe beat vanilla. To make multimodal-LM ex-LRP work end-to-end against Qwen3_5ForConditionalGeneration, I filed
5 patches against arcee-ai/mergekit PR #682 + 1 against rachtibat/lxt.

All five Mx checkpoints + Fisher/LRP signal safetensors + reproducer scripts published.

Post

174

Two custom releases — both unusual takes on common problems, on a single RTX 3090 + a vast.ai pod.

🔹 ManniX-ITA/Qwen3.5-27B-Omnimerge-v2

3-source weight-space merge over Qwen3.5-27B combining OBIM-lite magnitude masking + DAREx rescaling + EMR election (sign from consensus, amplitude from max-abs across sources). GPU-accelerated, ~35× over CPU.

Sources: Claude-4.6-Opus-distill (0.40), Esper3.1 code (0.35), Gemini-3.1-Pro-distill (0.25). density 0.53, DAREx q 0.75.

Q6_K vs best source:
• GPQA Diamond: 53.03 → 69.19 (+16.16 pp)
• MBPP pass@1: 71.20 → 74.60 (+3.40)
• HumanEval pass@1: 76.22 → 79.27 (+3.05)

vs Omnimerge v1 (vanilla DARE-TIES): +8.08 pp GPQA, +2.80 MBPP. Amplitude-from-max + sign-from-consensus is what unlocked the GPQA jump.

🔹 ManniX-ITA/gemma-4-A4B-98e-v3-it

Gemma 4 26B-A4B pruned 128 → 98 experts/layer (-23.4% MoE capacity, -5.2B params), zero GPQA degradation.

GPQA Diamond:
• 128e reference: 75.25%
• 98e v3 (this): 75.25% — +0.00 pp despite -23.4% capacity, -5.2B params
• 109e v3 (older): 71.72% — -3.53 pp

The win over 109e v3 came from changing the importance map: aggregate per-expert contribution across math/logic/code/science/creative via 128-token teacher-force, instead of GPQA-specific per-question top-16 (which overfitted). Result: more experts dropped, quality preserved.

Findings worth flagging:
• Experts NOT topic-specialized — 28/32 overlap math/creative top-32.
• Expert weight cosine ≈ 0.05 max → merging destroys the model. Dropping is the only viable structural compression here.
• Contribution Gini ≈ 0.38 → ~75 experts/layer carry 80% of signal.

Eval: lm-eval gpqa_diamond_cot_zeroshot, llama-server --reasoning-format deepseek --reasoning-budget 8192, Gemma 4 official sampling. Feedback welcome.

View all Posts

models 27

datasets 1

ManniX-ITA/osync-code

Viewer • Updated Jan 12 • 1 • 9