LTZ E1 (mini)

A ModernBERT-based masked language model pretrained on Luxembourgish, following the Ettin recipe (https://huggingface.co/jhu-clsp/ettin-encoder-68m).

Model Details

  • Architecture: ModernBERT (encoder)
  • Size: mini (68.5M parameters, F32 weights)
  • Vocabulary: 50,368 tokens (BPE, GPTNeoXTokenizerFast)
  • Context length: 1,024 tokens
  • Language: Luxembourgish (lb/ltz)
  • License: CC BY-SA 4.0
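
Because the context length is 1,024 tokens, longer texts need to be truncated or split before encoding. The following is a minimal sketch and not part of the original model card: `split_into_windows` is a hypothetical helper, and the 64-token overlap is an arbitrary choice. It operates on a plain list of token ids, so it makes no assumption about how the ids were produced.

```python
from typing import List

MAX_CONTEXT = 1024  # context length stated in Model Details


def split_into_windows(token_ids: List[int], window: int = MAX_CONTEXT,
                       overlap: int = 64) -> List[List[int]]:
    """Split a token-id list into overlapping windows of at most `window` ids.

    Hypothetical helper for illustration; `overlap` is an arbitrary default.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    if not token_ids:
        return []
    step = window - overlap  # how far each new window advances
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # this window already reaches the end of the input
    return windows
```

If special tokens are added back to each chunk, reduce `window` by the number of special tokens the tokenizer adds. For simple truncation without chunking, calling the tokenizer with `truncation=True, max_length=1024` is usually enough.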

Usage

Requires transformers>=4.48.0.

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("instilux/ltz-e1-mini")
model = AutoModelForMaskedLM.from_pretrained("instilux/ltz-e1-mini")

inputs = tokenizer("Wéi spéit [MASK] et?", return_tensors="pt")
# Column indices of every [MASK] token in the single input sequence
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]

with torch.no_grad():
    outputs = model(**inputs)

# Top-5 candidates per mask position; scores are raw logits, not probabilities
top_tokens = outputs.logits[0, mask_pos].topk(5)
# Print the candidates for the first [MASK]
for token_id, score in zip(top_tokens.indices[0], top_tokens.values[0]):
    token = tokenizer.decode(token_id)
    print(f"{token:15s} {score:.3f}")
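
As an alternative to the manual loop, the same query can probably be run through the `fill-mask` pipeline in transformers, which returns softmax probabilities instead of raw logits. This is a sketch under assumptions, not something the model card itself documents: `fill_mask` and `format_predictions` are hypothetical helper names, and the pipeline is assumed to work with this checkpoint the same way it does with other masked language models.

```python
from typing import Dict, List


def format_predictions(predictions: List[Dict]) -> List[str]:
    """Turn fill-mask pipeline results into aligned 'token  probability' lines."""
    return [f"{p['token_str'].strip():15s} {p['score']:.3f}" for p in predictions]


def fill_mask(text_with_mask: str, top_k: int = 5) -> List[Dict]:
    """Run the fill-mask pipeline; downloads the model on first use."""
    # Imported lazily so format_predictions can be used without transformers.
    from transformers import pipeline

    fill = pipeline("fill-mask", model="instilux/ltz-e1-mini")
    return fill(text_with_mask, top_k=top_k)


# Example call (requires network access to fetch the model):
#   for line in format_predictions(fill_mask("Wéi spéit [MASK] et?")):
#       print(line)
```

For an input with a single mask the pipeline returns one list of dictionaries; with several masks it returns one list per mask, so `format_predictions` would need to be applied to each inner list.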

Tokenizer Notes

The tokenizer is BPE-based (GPTNeoXTokenizerFast) with BERT-style special tokens ([CLS], [SEP], [MASK], [PAD]). A [CLS] token is prepended automatically (add_bos_token: true).
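
The automatic [CLS] prefix can be checked by encoding a short string and inspecting the result. The snippet below is a minimal sketch: `starts_with_cls` and `inspect_encoding` are hypothetical helpers, and nothing is assumed about which other special tokens, if any, are added at the end of the sequence.

```python
from typing import List, Sequence


def starts_with_cls(input_ids: Sequence[int], cls_token_id: int) -> bool:
    """Return True if the encoded sequence begins with the [CLS] id."""
    return len(input_ids) > 0 and input_ids[0] == cls_token_id


def inspect_encoding(text: str) -> List[str]:
    """Encode `text` and return its tokens; downloads the tokenizer on first use."""
    # Imported lazily so starts_with_cls can be used without transformers.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("instilux/ltz-e1-mini")
    ids = tokenizer(text)["input_ids"]
    print("starts with [CLS]:", starts_with_cls(ids, tokenizer.cls_token_id))
    return tokenizer.convert_ids_to_tokens(ids)


# Example call (requires network access to fetch the tokenizer):
#   print(inspect_encoding("Moien"))
```

Printing the token list shows exactly which special tokens surround the text, which is worth checking before concatenating sequences by hand.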

Citation

If you use this model in your work, please cite the following paper (preprint; accepted to ACL 2026 Findings):

@misc{plum2026ltzglueluxembourgishgenerallanguage,
  title={ltzGLUE: Luxembourgish General Language Understanding Evaluation},
  author={Alistair Plum and Felicia Körner and Anne-Marie Lutgen and Laura Bernardy and Fred Philippy and Emilia Milano and Nils Rehlinger and Cédric Lothritz and Tharindu Ranasinghe and Barbara Plank and Christoph Purschke},
  year={2026},
  eprint={2604.17976},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2604.17976},
}
