flute_kernels
CUDA matmul kernels for LUT-quantized LLMs, packaged for the
kernels library.
Upstream: hanguo97/flute (Han Guo et al., Apache-2.0).
Use
import torch
from kernels import get_kernel
flute = get_kernel("galqiwi/flute_kernels", version=1)
# qgemm: y = x · dequant(Q, table) · s
y = flute.qgemm(x, Q, scales, table, table2, workspace,
                num_bits, group_size, template_id, num_sms)
# fused HadaCore rotation + qgemm (HIGGS path)
y = flute.qgemm_hadamard(x, Q, scales, table, table2, workspace,
                         num_bits, group_size, hadamard_size,
                         template_id, num_sms)
# stand-alone Hadamard transform (HadaCore, fp16/bf16, pow-2 dim ≤ 32768)
y = flute.hadamard_transform(x, inplace=False)
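For orientation, here is a minimal plain-Python sketch of the math behind the calls above. It runs without a GPU and does not call the kernels. It rests on several assumptions that this README does not confirm: the codes are held unpacked as a K x N grid (the real kernels consume a packed layout produced by `flute.utils.pack`), there is one scale per group of `group_size` rows and per output column, and the Hadamard transform is left unnormalised. The names `dequantize`, `qgemm_reference`, and `hadamard_reference` are hypothetical and are not part of `flute_kernels`.

```python
def dequantize(q_idx, table, scales, group_size):
    """Map integer codes to table values and apply per-group scales.

    q_idx:  K x N nested list of integer codes (unpacked; assumed layout)
    table:  list of 2**num_bits float values (the lookup table)
    scales: (K // group_size) x N nested list (assumed layout)
    """
    n_cols = len(q_idx[0])
    return [
        [table[q_idx[k][n]] * scales[k // group_size][n] for n in range(n_cols)]
        for k in range(len(q_idx))
    ]


def qgemm_reference(x, q_idx, table, scales, group_size):
    """Compute y = x @ dequantize(...), with x an M x K nested list."""
    w = dequantize(q_idx, table, scales, group_size)
    n_cols = len(w[0])
    return [
        [sum(row[k] * w[k][n] for k in range(len(w))) for n in range(n_cols)]
        for row in x
    ]


def hadamard_reference(v):
    """Unnormalised fast Walsh-Hadamard transform of a power-of-two-length list."""
    out = list(v)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


# Tiny example: 2-bit codes, K = 4, N = 2, two groups of size 2.
table = [-1.0, -0.5, 0.5, 1.0]
q_idx = [[0, 3], [1, 2], [2, 1], [3, 0]]
scales = [[2.0, 1.0], [1.0, 4.0]]
x = [[1.0, 1.0, 1.0, 1.0]]

y = qgemm_reference(x, q_idx, table, scales, group_size=2)
print(y)                                 # [[-1.5, -4.5]]
print(hadamard_reference([1, 2, 3, 4]))  # [10, -2, -4, 0]
```

A reference like this is mainly useful for checking kernel output on small shapes. The scaling convention that `flute.hadamard_transform` actually uses is not stated here, so compare up to a constant factor.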
Load-time helpers
flute.utils.pack(W, num_bits, template_ids, num_sms)
flute.utils.make_qmap2_from_qmap(qmap)
flute.utils.get_workspace_streamk(device)
flute.utils.get_template_config(num_bits, template_id, num_sms)
flute.utils.get_template_ids(num_bits)
flute.utils.is_template_supported(M, N, K, num_bits, template_id, num_sms)
flute.utils.get_device_num_sms(device)
flute.TEMPLATE_CONFIGS # the pre-tuned config dict
Attribution
The CUDA code is adapted from hanguo97/flute (Apache-2.0). The HadaCore kernel is borrowed from pytorch-labs/applied-ai. Built against NVIDIA CUTLASS v3.5 (BSD-3-Clause); upstream FLUTE pins v3.4.1, but the CuTe API is stable across 3.x.