Overview
This model provides style representations trained with the approach described in "Unsupervised Style Representation Learning for AI-Text Detection via Paraphrase Inversion". The representations are learned in an unsupervised manner, that is, without any authorship labels. We have found them to be particularly performant for machine-text detection, and they also show some transfer to authorship verification (see below).
We expect to release more performant versions of LUSR in the future. We'll link all such versions here.
Few-Shot Machine-Text Detection
The following tables report machine-text detection performance (AUROC) on the M4 dataset, using the same setup as Few-Shot Detection of Machine-Generated Text using Style Representations. For the few-shot approaches, k is the number of few-shot examples.
| Zero-Shot Approaches | AUROC(1) |
|---|---|
| Binoculars | 69 |
| FastDetectGPT | 65 |
| Rank | 50 |
| LogRank | 50 |
| LRR | 50 |
| Revise-Detect | 60 |
| DNA-GPT | 51 |
| Supervised Classifiers | |
| Rank | 50 |
| Longformer | 58 |
| RADAR | 50 |
| RemoDetect | 64 |

| Few-Shot Approaches | k=1 | k=5 |
|---|---|---|
| LUAR CRUD | 60 | 87 |
| LUAR Multi-LLM | 61 | 88 |
| LUAR Multidomain | 60 | 89 |
| CISR | 58 | 84 |
| ProtoNet | 61 | 87 |
| SBERT | 52 | 62 |
| LUSR | 69 | 96 |
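One way to picture the few-shot setting is to score each query document by how stylistically close it is to the k available machine-text examples. The sketch below is a minimal illustration under stated assumptions, not the evaluation code behind the table: it assumes the style embeddings have already been computed (obtaining them from this model is not shown), and averaging the k support embeddings into a single prototype is our assumption, since the cited setup may aggregate the support set differently. The helper names (`prototype`, `few_shot_scores`) and the toy vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def prototype(support: List[Vector]) -> List[float]:
    """Average the k support embeddings element-wise (assumed aggregation)."""
    k = len(support)
    return [sum(column) / k for column in zip(*support)]


def few_shot_scores(support: List[Vector], queries: List[Vector]) -> List[float]:
    """Hypothetical helper: score each query by cosine similarity to the prototype
    built from k machine-text examples; higher means stylistically closer."""
    proto = prototype(support)
    return [cosine(q, proto) for q in queries]


# Toy 2-d "embeddings"; real style embeddings are much higher-dimensional.
support = [[1.0, 0.0], [0.9, 0.1]]  # k = 2 machine-text examples
queries = [[1.0, 0.0], [0.0, 1.0]]  # one machine-like query, one human-like query
print(few_shot_scores(support, queries))
```

With k=1 the prototype is simply the single support embedding. The resulting scores, paired with human/machine labels for the queries, are what an AUROC computation would consume.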
Authorship Verification
AUROC is averaged across the PAN13/14/15/20/21 authorship verification tasks. The comparison covers both supervised and unsupervised style representations; LUSR is evaluated without any training on authorship labels.
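In authorship verification, each trial pairs two documents, and a natural score is the similarity of their style embeddings. AUROC then measures how often same-author pairs score above different-author pairs. The sketch below is illustrative only: it assumes the pair scores are already computed and that "averaged" means an unweighted mean of per-dataset AUROCs. Both are assumptions, and the task names and scores are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def auroc(scores: List[float], labels: List[int]) -> float:
    """AUROC as the probability that a random positive (label 1) outscores a
    random negative (label 0); ties count as half a win. O(P*N), fine for a sketch."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_auroc(tasks: Dict[str, Tuple[List[float], List[int]]]) -> float:
    """Unweighted mean of per-task AUROC (assumed averaging scheme)."""
    return mean(auroc(scores, labels) for scores, labels in tasks.values())


# Hypothetical pair scores (e.g. cosine similarity of the two documents'
# style embeddings); label 1 = same author, 0 = different authors.
tasks = {
    "task_a": ([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]),  # perfectly separated
    "task_b": ([0.6, 0.4, 0.5, 0.1], [1, 1, 0, 0]),  # one mis-ordered pair
}
print(mean_auroc(tasks))  # → 0.875
```

The tables report AUROC scaled by 100, so 0.875 here would correspond to an entry of 87.5.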
| Model | AUROC |
|---|---|
| Supervised | |
| LUAR | 78 |
| MSR | 73 |
| CISR | 70 |
| StyleDistance | 73 |
| Unsupervised | |
| LUSR | 74 |
Model ID: rrivera1849/LUSR
Base model: FacebookAI/roberta-base