Two releases this week pushing merge methodology forward.
▶ Qwen3.6-27B-Omnimerge-v4-MLP
ManniX-ITA/Qwen3.6-27B-Omnimerge-v4
Same-base DARE-TIES merge of Qwen3.6-27B + 3 fine-tunes (rico03 Claude distill, Esper3.1, kai-os Opus reasoning anchor) via my Omnimerge_v2 method (OBIM-lite + DAREx-q + EMR election).
Hit a Qwen3.6-specific fragility: hyperparams that work flawlessly on 3.5 produced an 80% unclosed-<think> rate on 3.6, collapsing pass@1 to ~20%. Per-tensor delta forensics localized the failure to mlp.{gate,up,down}_proj in layers 27–52. Fix: MLP-passthrough surgery, i.e. copy MLPs verbatim from base and keep merged attn + linear_attn. Unclosed-<think> leak → 0%.
Q6_K results (vs Qwen3.6 base / vs Omnimerge-v2 on Qwen3.5):
• HumanEval: 84.76% (= base, +5.49 pp vs v2)
• MBPP corrected: 73.40% (+15.80 pp vs base, ≈ v2)
• GPQA Diamond: ~84.75% partial 192/198 (+15.5 pp vs v2)
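A minimal sketch of the two steps described above (per-tensor delta forensics, then MLP passthrough), not the actual surgery script. It assumes state dicts are plain name-to-flat-list mappings and that tensor names follow a `layers.N.mlp.{gate,up,down}_proj.weight` pattern; the function names, the `layers` argument, and the name pattern are all hypothetical, and a real run would operate on safetensors shards rather than Python lists.

```python
import math
import re

# Hypothetical tensor-name pattern; real checkpoints may name tensors differently.
MLP_PATTERN = re.compile(r"layers\.(\d+)\.mlp\.(gate|up|down)_proj\.weight$")


def relative_delta(base, merged):
    """Return ||merged - base|| / ||base|| for one tensor given as flat lists."""
    diff = math.sqrt(sum((m - b) ** 2 for b, m in zip(base, merged)))
    norm = math.sqrt(sum(b * b for b in base))
    if norm == 0.0:
        return float("inf") if diff > 0.0 else 0.0
    return diff / norm


def delta_report(base_sd, merged_sd):
    """Per-tensor relative delta, largest first, to make outlier tensors visible."""
    rows = [(name, relative_delta(base_sd[name], merged_sd[name])) for name in merged_sd]
    return sorted(rows, key=lambda row: row[1], reverse=True)


def mlp_passthrough(base_sd, merged_sd, layers=None):
    """Copy MLP projection tensors from base; keep every other merged tensor.

    layers: optional set of layer indices to restore; None restores all MLPs.
    """
    out = {}
    for name, tensor in merged_sd.items():
        match = MLP_PATTERN.search(name)
        if match and (layers is None or int(match.group(1)) in layers):
            out[name] = list(base_sd[name])   # verbatim copy from base
        else:
            out[name] = list(tensor)          # keep merged attn / linear_attn / etc.
    return out
```

Sorting the report by relative delta is one simple way to see whether a family of tensors (here the MLP projections) stands out from the rest before deciding what to restore.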
▶ Qwen3.5-4B Importance-Signal Study (M1..M5)
Controlled 5-way comparison: same Qwen3.5-4B base, same 2 fine-tunes (Jackrong Claude-4.5 distill + Crow Opus-4.6 distill), only the importance signal driving DARE-TIES sparsification varies.
Q6_K HE / MBPP pass@1:
• M1 Vanilla DARE-TIES → 51.22 / 47.00
• M2 OMv2 (no signal) → 52.44 / 49.40
• M3 OMv2 + Fisher → 57.93 (best HE) / 48.80
• M4 mergekit ex-LRP (PR #682) → 51.22 / 49.40
• M5 OMv2 + LRP → 53.05 / 51.40 (best MBPP)
Findings: Fisher wins HE (+6.71 pp over vanilla M1, +4.88 pp over LRP M5), LRP wins MBPP (+4.40 pp over vanilla M1, +2.60 pp over Fisher M3). Both signals + Omnimerge_v2 recipe beat vanilla on both benchmarks. To make multimodal-LM ex-LRP work end-to-end against Qwen3_5ForConditionalGeneration, I filed 5 patches against arcee-ai/mergekit PR #682 + 1 against rachtibat/lxt.
All five Mx checkpoints + Fisher/LRP signal safetensors + reproducer scripts published.
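For readers unfamiliar with the M1 baseline, here is a rough sketch of vanilla DARE-TIES on flat lists of floats: random drop-and-rescale of each task delta (DARE), then per-entry sign election and averaging of the agreeing deltas (TIES-style). It is not the Omnimerge_v2 recipe and not mergekit's implementation; the function names and defaults are hypothetical. How the Fisher or LRP signal replaces the uniform random drop is not specified in this post, so it is left out.

```python
import random


def dare_drop(delta, drop_p, rng):
    """DARE step: zero each entry with probability drop_p, rescale survivors by 1/(1-drop_p)."""
    scale = 1.0 / (1.0 - drop_p)  # requires drop_p < 1
    return [d * scale if rng.random() >= drop_p else 0.0 for d in delta]


def ties_merge(deltas):
    """TIES-style step: elect a sign per entry, then average the deltas that agree with it."""
    merged = []
    for column in zip(*deltas):
        sign = 1.0 if sum(column) >= 0 else -1.0           # elected sign for this entry
        agreeing = [v for v in column if v * sign > 0]     # non-zero deltas with that sign
        merged.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)
    return merged


def dare_ties(base, finetunes, drop_p=0.5, seed=0):
    """Merge fine-tunes sharing one base: base + TIES(DARE(finetune - base))."""
    rng = random.Random(seed)
    deltas = [
        dare_drop([f - b for f, b in zip(ft, base)], drop_p, rng)
        for ft in finetunes
    ]
    return [b + d for b, d in zip(base, ties_merge(deltas))]
```

The rescale by 1/(1-drop_p) keeps the expected value of each delta unchanged after the random drop, which is why fairly aggressive sparsification can still preserve a fine-tune's behavior.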