Overview

Style representations trained with the approach described in "Unsupervised Style Representation Learning for AI-Text Detection via Paraphrase Inversion". These representations are trained in an unsupervised manner, that is, without any authorship labels. We have found them to perform well on machine-text detection in particular, and they also show some transfer to authorship verification (see below).

We expect to release more performant versions of LUSR in the future. We'll link all such versions here.

Few-Shot Machine-Text Detection

The following table shows machine-text detection performance on the M4 dataset, using the same setup as "Few-Shot Detection of Machine-Generated Text using Style Representations".

Zero-Shot Approaches	AUROC(1)
Binoculars	69
FastDetectGPT	65
Rank	50
LogRank	50
LRR	50
Revise-Detect	60
DNA-GPT	51
Supervised Classifiers
Rank	50
Longformer	58
RADAR	50
RemoDetect	64

Few-Shot Approaches	AUROC (k=1)	AUROC (k=5)
LUAR CRUD	60	87
LUAR Multi-LLM	61	88
LUAR Multidomain	60	89
CISR	58	84
ProtoNet	61	87
SBERT	52	62
LUSR	69	96
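The few-shot numbers above depend on how a detector scores a query document against k examples from the target LLM. The sketch below is a minimal illustration under our own assumptions, not the procedure from either paper. It scores a query by the cosine similarity between its embedding and the mean of the k support embeddings, then computes AUROC as the probability that a machine-written query outscores a human-written one. The tables appear to report AUROC × 100, so 50 corresponds to chance. The function names `few_shot_score` and `auroc` and the toy vectors are hypothetical. Real embeddings would come from the LUSR encoder, whose loading API is not described here.

```python
import math
from typing import Sequence

Vector = Sequence[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

def few_shot_score(query: Vector, support: Sequence[Vector]) -> float:
    """Hypothetical scorer (assumption): similarity of the query embedding to the
    centroid of the k support embeddings from the target LLM."""
    if not support:
        raise ValueError("support set must contain at least one embedding")
    k, dim = len(support), len(support[0])
    centroid = [sum(v[i] for v in support) / k for i in range(dim)]
    return cosine(query, centroid)

def auroc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative; ties count 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Toy 3-d vectors standing in for style representations (not real LUSR outputs).
support = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1]]  # k=2 documents from the target LLM
machine_queries = [[0.95, 0.05, 0.05], [0.8, 0.2, 0.0]]
human_queries = [[0.0, 1.0, 0.2], [0.1, 0.3, 1.0]]

pos = [few_shot_score(q, support) for q in machine_queries]
neg = [few_shot_score(q, support) for q in human_queries]
print(round(100 * auroc(pos, neg)))  # 100 on this toy data; 50 would be chance
```

Averaging the support embeddings into a centroid is one simple choice. A scorer could equally take the maximum similarity over the support set, and the two can behave differently when the k examples are stylistically diverse.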

Authorship Verification

AUROC is averaged across PAN13/14/15/20/21 authorship verification tasks. The comparison includes supervised and unsupervised style representations, with LUSR evaluated without any training on authorship labels.
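One plausible way to produce such an average is sketched below, under assumptions not stated in this section. Each document pair is scored by the cosine similarity of the two style embeddings. AUROC is then computed per dataset, with same-author pairs as positives, and the unweighted mean is taken over datasets. Whether the reported average is unweighted is an assumption. The task names, vectors, and the helper `mean_verification_auroc` are toy placeholders rather than PAN data or released code.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# (embedding of document A, embedding of document B, same_author label)
Pair = Tuple[Sequence[float], Sequence[float], bool]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def auroc(scores: List[float], labels: List[bool]) -> float:
    """Pairwise-ranking AUROC; ties between a positive and a negative count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("each task needs both same-author and different-author pairs")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_verification_auroc(tasks: Dict[str, List[Pair]]) -> float:
    """Unweighted mean of per-task AUROC (assumed averaging scheme)."""
    per_task = []
    for pairs in tasks.values():
        scores = [cosine(a, b) for a, b, _ in pairs]
        labels = [same for _, _, same in pairs]
        per_task.append(auroc(scores, labels))
    return mean(per_task)

# Toy tasks: in the first, the same-author pair is more similar (AUROC 1.0);
# in the second, the different-author pair is more similar (AUROC 0.0).
toy_tasks = {
    "toy_task_a": [([1.0, 0.0], [1.0, 0.1], True), ([1.0, 0.0], [0.0, 1.0], False)],
    "toy_task_b": [([0.0, 1.0], [0.3, 1.0], True), ([0.0, 1.0], [0.1, 1.0], False)],
}
print(mean_verification_auroc(toy_tasks))  # 0.5
```

Because cosine similarity needs no trained threshold or classifier head, this setup lets a representation trained without authorship labels be evaluated on verification directly.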