🧩 Model Weights for Towards Atoms of Large Language Models
This repository contains the model weights associated with the paper:
👉 Towards Atoms of Large Language Models
Specifically, it provides the weights of threshold-activated sparse autoencoders (TSAEs) trained on activations from the layers of Gemma 2 2B, computed on the CounterFact dataset.
Description
The paper introduces Atom Theory to define and identify the fundamental representational units (FRUs) of large language models, which it terms atoms. Using threshold-activated sparse autoencoders (TSAEs) and a non-Euclidean metric called the atomic inner product (AIP), the authors identify units with near-perfect faithfulness and stability across the layers of Gemma 2 2B.
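This card does not describe the TSAE architecture in detail. As a rough illustration only, the sketch below assumes a JumpReLU-style encoder: each feature's pre-activation is kept unchanged when it exceeds a per-feature threshold `theta` and is zeroed otherwise, and a linear decoder maps the sparse code back to the activation space. The class name `ThresholdSAE`, the parameter names, and the weight shapes are hypothetical and are not taken from the paper; for the authors' actual formulation and training objective, refer to the main codebase.

```python
import numpy as np


class ThresholdSAE:
    """Hypothetical threshold-activated sparse autoencoder (illustrative sketch).

    Assumption: JumpReLU-style activation with a per-feature threshold theta.
    This is not the authors' implementation.
    """

    def __init__(self, W_enc, b_enc, W_dec, b_dec, theta):
        self.W_enc = np.asarray(W_enc, dtype=float)  # (d_model, n_features)
        self.b_enc = np.asarray(b_enc, dtype=float)  # (n_features,)
        self.W_dec = np.asarray(W_dec, dtype=float)  # (n_features, d_model)
        self.b_dec = np.asarray(b_dec, dtype=float)  # (d_model,)
        self.theta = np.asarray(theta, dtype=float)  # (n_features,)

    def encode(self, x):
        # Affine map from model activations to feature pre-activations.
        pre = np.asarray(x, dtype=float) @ self.W_enc + self.b_enc
        # Threshold activation: keep values above theta, zero out the rest.
        return np.where(pre > self.theta, pre, 0.0)

    def decode(self, z):
        # Linear reconstruction of the activation from the sparse code.
        return z @ self.W_dec + self.b_dec

    def __call__(self, x):
        z = self.encode(x)
        return z, self.decode(z)


# Toy usage with d_model = 2 and 3 features (made-up weights).
sae = ThresholdSAE(
    W_enc=[[1, 0, 1], [0, 1, 1]],
    b_enc=[0, 0, 0],
    W_dec=[[1, 0], [0, 1], [0, 0]],
    b_dec=[0, 0],
    theta=[0.5, 0.5, 0.5],
)
z, x_hat = sae(np.array([1.0, 0.2]))  # feature 1 falls below its threshold
```

Because `encode` uses matrix products, the same code also accepts a batch of activations with shape `(batch, d_model)`.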
Usage
Note that only the model weights are included in this repository. For the complete implementation, including training scripts, data preprocessing, and evaluation pipelines, please refer to the main codebase:
👉 https://github.com/ChenhuiHu/towards_atoms
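A minimal sketch for fetching the weights locally, assuming the `huggingface_hub` package is installed and that the repository ID is `Ericccccc/towards_atoms` (as listed in the model tree below). The card does not say how the weights are serialized, so the helper only lists files with common weight extensions; the function names and the `WEIGHT_SUFFIXES` set are hypothetical, and the actual loading code lives in the main codebase.

```python
from pathlib import Path

# Hypothetical set of extensions: the serialization format is not documented here.
WEIGHT_SUFFIXES = {".pt", ".pth", ".bin", ".safetensors", ".npz", ".npy"}


def download_weights(repo_id="Ericccccc/towards_atoms", local_dir=None):
    """Download all files in the repository and return the local directory."""
    # Imported lazily so list_weight_files works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=repo_id, local_dir=local_dir))


def list_weight_files(root):
    """Return weight-like files under root as sorted paths relative to root."""
    root = Path(root)
    return sorted(
        p.relative_to(root)
        for p in root.rglob("*")
        if p.is_file() and p.suffix in WEIGHT_SUFFIXES
    )
```

For example, `list_weight_files(download_weights())` shows which files the repository actually contains before you pick a loader from the main codebase.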
Citation
@article{hu2025towards,
  title={Towards Atoms of Large Language Models},
  author={Hu, Chenhui and Cao, Pengfei and Chen, Yubo and Liu, Kang and Zhao, Jun},
  journal={arXiv preprint arXiv:2509.20784},
  year={2025}
}
Model tree for Ericccccc/towards_atoms
Base model: google/gemma-2-2b