🧩 Model Weights for Towards Atoms of Large Language Models


This repository contains the model weights associated with the paper:

👉 Towards Atoms of Large Language Models

Specifically, it provides the weights of threshold-activated sparse autoencoders (TSAEs) trained on activations across layers of Gemma 2 2B, using the CounterFact dataset.

Description

The paper introduces Atom Theory to define and identify the fundamental representational units (FRUs) of large language models, termed atoms. Using threshold-activated sparse autoencoders (TSAEs) and a non-Euclidean metric called the atomic inner product (AIP), the authors identify units with near-perfect faithfulness and stability across layers of Gemma 2 2B.
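To make the term "threshold-activated" concrete, here is a minimal, generic sketch of an autoencoder forward pass whose latent units pass through only when they exceed a per-unit threshold. It is an illustration under assumptions, not the paper's formulation: the gating rule, the parameter names (`W_enc`, `b_enc`, `theta`, `W_dec`, `b_dec`), and the helper functions are all hypothetical, and the atomic inner product is not modeled. The actual TSAE definition lives in the paper and the main codebase.

```python
# Generic threshold-gated autoencoder forward pass in pure Python.
# ASSUMPTION: the gating rule and all parameter names below are illustrative;
# they are not taken from the paper or its released code.

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def threshold_activation(pre, theta):
    """Keep each pre-activation only if it exceeds its own threshold."""
    return [p if p > t else 0.0 for p, t in zip(pre, theta)]

def tsae_forward(x, W_enc, b_enc, theta, W_dec, b_dec):
    """Encode x into a sparse code z, then decode z back into x_hat."""
    pre = [p + b for p, b in zip(matvec(W_enc, x), b_enc)]
    z = threshold_activation(pre, theta)   # sparse latent code
    x_hat = [r + b for r, b in zip(matvec(W_dec, z), b_dec)]
    return z, x_hat
```

In this sketch, sparsity comes from the thresholds alone: any latent unit whose pre-activation does not exceed `theta` contributes nothing to the reconstruction.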

Usage

Note that only the model weights are included in this repository. For the complete implementation, including training scripts, data preprocessing, and evaluation pipelines, please refer to the main codebase:

👉 https://github.com/ChenhuiHu/towards_atoms
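Since this repository ships weights only, a typical first step is to fetch them locally. The sketch below assumes the `huggingface_hub` package is installed and that the Hub repository id is `Ericccccc/towards_atoms`. The weight file extensions, the local directory name, and the helper names `pick_weight_files` and `fetch_weights` are assumptions made for illustration. The repository's actual file layout is not documented here, so check it after downloading.

```python
# Download the repository snapshot and list files that look like weights.
# ASSUMPTIONS: the suffix list and the local directory name are guesses;
# the helper names are hypothetical and not part of the authors' codebase.

REPO_ID = "Ericccccc/towards_atoms"
WEIGHT_SUFFIXES = (".pt", ".pth", ".bin", ".safetensors")

def pick_weight_files(filenames, suffixes=WEIGHT_SUFFIXES):
    """Return the sorted subset of filenames ending in a weight-like suffix."""
    return sorted(name for name in filenames if name.endswith(suffixes))

def fetch_weights(local_dir="towards_atoms_weights"):
    """Download the whole repo and return (root path, candidate weight files)."""
    from pathlib import Path
    from huggingface_hub import snapshot_download  # imported lazily; needs network

    root = Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))
    names = [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]
    return root, pick_weight_files(names)
```

Calling `fetch_weights()` returns the download directory and the candidate weight files. Loading them into a TSAE requires the model classes from the main codebase linked above.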

Citation

@article{hu2025towards,
  title={Towards Atoms of Large Language Models},
  author={Hu, Chenhui and Cao, Pengfei and Chen, Yubo and Liu, Kang and Zhao, Jun},
  journal={arXiv preprint arXiv:2509.20784},
  year={2025}
}