la_senter
Latin sentence segmentation model for LatinCy.
| Feature | Description |
|---|---|
| Name | la_senter |
| Version | 3.9.2 |
| spaCy | >=3.8.0,<3.9.0 |
| Default Pipeline | senter, token_fix |
| Components | senter, token_fix |
| Sources | UD_Latin-Perseus, UD_Latin-PROIEL, UD_Latin-ITTB, UD_Latin-LLCT, UD_Latin-UDante |
| License | MIT |
| Author | Patrick J. Burns |
Install
pip install https://huggingface.co/latincy/la_senter/resolve/main/la_senter-3.9.2-py3-none-any.whl
Usage
import spacy
nlp = spacy.load("la_senter")
doc = nlp("Gallia est omnis divisa in partes tres. Quarum unam incolunt Belgae.")
for sent in doc.sents:
    print(sent.text)
# Gallia est omnis divisa in partes tres.
# Quarum unam incolunt Belgae.
doc = nlp("Iphicles, frater Herculis, magna voce exclamavit; sed Hercules ipse, fortissimus puer, haudquaquam territus est.")
for sent in doc.sents:
    print(sent.text)
# Iphicles, frater Herculis, magna voce exclamavit;
# sed Hercules ipse, fortissimus puer, haudquaquam territus est.
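When sentences need to be mapped back onto the source text (for example, to align them with an edition), character offsets are more useful than bare strings. The following is a minimal sketch and not part of la_senter: `sentence_offsets` is a hypothetical helper that relies only on the `start_char`, `end_char`, and `text` attributes of spaCy spans, so it should accept `doc.sents` from any `Doc`.

```python
from typing import Iterable, List, Tuple


def sentence_offsets(sents: Iterable) -> List[Tuple[int, int, str]]:
    """Return (start_char, end_char, text) for each sentence span.

    `sents` is expected to be `doc.sents` from a spaCy Doc, but any
    iterable of objects exposing these three attributes will work.
    """
    return [(s.start_char, s.end_char, s.text) for s in sents]
```

With the model installed, `sentence_offsets(nlp(text).sents)` returns tuples for which `text[start:end]` equals the sentence text.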
What's new in 3.9.2
This release adds token_fix, a rule-based component that runs after
senter to repair sentence boundaries around punctuation that the statistical
model mis-splits. It is bundled with the model and registered automatically as
a spaCy factory, so spacy.load("la_senter") activates it with no extra imports.
- Parentheticals: a `!` or `?` inside `(...)` or `[...]` no longer ends the sentence, so `Verba (mirabile dictu!) sunt.` stays one sentence.
- Dash asides: the same for em/en/hyphen-delimited asides, so `Nostraque — me miseram! — timui.` stays one sentence.
- Orphaned closing quotes: a sentence-final closing quote is kept with its sentence rather than split onto the next, so `"Ita est." "Dixit."` breaks after the first closing quote, not before it.
3.9.1
Internal development version that first bundled the fixers but was never published
to HuggingFace; superseded by 3.9.2 (which corrects the component registration
and aligns the version with the latincy-pipelines releases).
What's new in 3.9.0
- Case-insensitive segmentation: correctly handles lowercased and mixed-case input
- Sentence splitting on semicolons and colons
- Bracketed reference handling: `[2] O tempora...` is treated as one sentence
Accuracy
| Type | Score |
|---|---|
| SENTS_F | 99.71 |
| SENTS_P | 99.66 |
| SENTS_R | 99.76 |
Evaluated on the held-out test split of the combined UD treebanks.
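For reference, the F-score is the harmonic mean of precision and recall. The sketch below assumes that sentences are scored as exact-match `(start, end)` spans. The helper names are illustrative, and any spans passed to `prf` would be the caller's own data, not the evaluation set.

```python
from typing import Set, Tuple

Span = Tuple[int, int]


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Precision, recall, and F-score over exact-match sentence spans."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)


# The reported precision and recall are consistent with the reported F-score.
print(round(f_score(99.66, 99.76), 2))  # 99.71
```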
Intended use
Sentence segmentation of well-punctuated Latin text from digital editions, corpora, and scholarly sources. Not designed for punctuation-free text (scriptura continua).
Training
Trained on five Universal Dependencies Latin treebanks using spaCy's senter component.