Follow-up to the M1..M5 study above: re-ran M4 against the updated PR #682 head ("turbo" branch by @Tusm11, HEAD 8d989f6) with rebalanced hyperparams (equal weights w=1/1, density 0.7, closer to the PR's worked example). Same AttnLRP signal as M4-orig, same sources.
▶ Qwen3.5-4B-M4-v2-ex-LRP-turbo
ManniX-ITA/Qwen3.5-4B-M4-v2-ex-LRP-turbo
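For anyone who wants to replicate the setup, here is a minimal sketch of what the rebalanced config could look like. The merge_method key ("ex_lrp"), the base model id, and the donor names are placeholders I made up for illustration; the real keys come from PR #682. Only the equal weights (1/1) and density 0.7 are taken from the run described above.

```python
# Hypothetical sketch of the rebalanced M4-v2 merge config (not the actual PR #682 schema).
import yaml

config = {
    "merge_method": "ex_lrp",          # placeholder key for the PR #682 method
    "base_model": "Qwen/Qwen3.5-4B",   # placeholder base model id
    "models": [
        {"model": "donor-model-A",     # placeholder donor checkpoints
         "parameters": {"weight": 1.0, "density": 0.7}},
        {"model": "donor-model-B",
         "parameters": {"weight": 1.0, "density": 0.7}},
    ],
    "dtype": "bfloat16",
}

with open("m4_v2_ex_lrp_turbo.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
# Merge with the standard CLI:  mergekit-yaml m4_v2_ex_lrp_turbo.yaml ./out
```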
Q6_K quants, HumanEval (HE) / MBPP pass@1, with M4-v2 inserted:
- M1 Vanilla DARE-TIES → 51.22 / 47.00
- M2 OMv2 (no signal) → 52.44 / 49.40
- M3 OMv2 + Fisher → 57.93 🥇 / 48.80
- M4 ex-LRP (PR #682 orig) → 51.22 / 49.40
- M4-v2 ex-LRP (PR #682 turbo) → 55.49 / 52.20 🥇
- M5 OMv2 + LRP → 53.05 / 51.40
Δ M4-v2 vs M4-orig: +4.27 pp HE, +2.80 pp MBPP. M4-v2 takes the MBPP medal of the whole study (overtakes M5) while staying competitive on HumanEval. The turbo code path + rebalanced hyperparams clearly beat the original PR head on this configuration.
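For context on the numbers: both columns are pass@1 in the usual HumanEval/MBPP sense. A small self-contained snippet with the standard unbiased pass@k estimator (Chen et al., 2021) and the deltas quoted above; the helper name is mine, not from any particular eval harness.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k samples
    drawn from n generations (c of them correct) passes the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With k=1 this reduces to c/n, the fraction of generations that pass.
assert abs(pass_at_k(10, 3, 1) - 0.3) < 1e-9

# Deltas quoted above, M4-v2 (turbo) minus M4-orig, in percentage points.
he_delta   = 55.49 - 51.22   # +4.27 pp on HumanEval
mbpp_delta = 52.20 - 49.40   # +2.80 pp on MBPP
print(f"HE Δ = {he_delta:+.2f} pp, MBPP Δ = {mbpp_delta:+.2f} pp")
```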
Findings refresh: Fisher still leads HE; ex-LRP (turbo) now leads MBPP, narrowly ahead of OMv2+LRP. Both LRP variants land within 1 pp on MBPP, a strong signal that LRP-driven sparsification is doing real work for code-gen on small Qwen merges.
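To unpack what I mean by LRP-driven sparsification: instead of dropping task-vector entries at random like vanilla DARE, the relevance signal ranks entries and only the top-density fraction survives before rescaling. A rough PyTorch sketch of that idea; this is my paraphrase of the concept, not the actual PR #682 implementation.

```python
import torch

def relevance_sparsify(task_vector: torch.Tensor,
                       relevance: torch.Tensor,
                       density: float = 0.7) -> torch.Tensor:
    """Keep the top-`density` fraction of delta weights ranked by an
    AttnLRP-style relevance score, zero the rest, and rescale so the
    merged delta keeps roughly the scale of the dense task vector.
    Conceptual sketch only, not the mergekit PR #682 code."""
    flat_r = relevance.abs().flatten()
    k = max(1, int(density * flat_r.numel()))
    threshold = torch.topk(flat_r, k, largest=True).values.min()
    mask = (relevance.abs() >= threshold).to(task_vector.dtype)
    # DARE-style rescaling of the surviving entries.
    return task_vector * mask / density

# Toy usage: delta between a fine-tuned and a base weight matrix, scored by
# a per-parameter relevance map coming from an LRP backward pass.
delta = torch.randn(4, 4)
relevance = torch.rand(4, 4)
sparse_delta = relevance_sparsify(delta, relevance, density=0.7)
```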
Big thanks to @Tusm11 for the supercharged ex-LRP turbo head: multimodal support + Iron-Man stabilization + in-place math are a real upgrade. Posted the full results + the 6 patches needed to run it against Qwen3_5ForConditionalGeneration on the PR thread:
https://github.com/arcee-ai/mergekit/pull/682