Tokenizer vocabulary
#28
by DjTobalito - opened
Hi,
I am using XLM-RoBERTa for multilingual classification with success, and I am trying to understand the tokenizer a bit better.
Naively, I expected common short words in the languages present in the dataset to appear as keys in the tokenizer.vocab dictionary.
But for French, for example, the word "oui" ("yes" in French) is not in the tokenizer.vocab dictionary.
Am I misunderstanding the tokenizer.vocab dictionary?
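One thing worth checking is the form of the key. The XLM-RoBERTa tokenizer is SentencePiece-based, and SentencePiece marks the start of a word with the character "▁" (U+2581), so a whole word is typically stored with that prefix rather than as the bare string. Below is a minimal sketch of such a check. The helper names `lookup_forms` and `inspect_word` are hypothetical, and the toy vocabulary is made up for illustration; whether "▁oui" is actually in the real vocabulary is something to verify by running the check against the loaded tokenizer.

```python
# SentencePiece marks the start of a word with U+2581 ("▁"), so a whole word
# is typically stored as "▁word" rather than "word".
WORD_START = "\u2581"


def lookup_forms(vocab, word):
    """Report whether `word` is a key in `vocab`, bare or with the word-start marker."""
    return {form: form in vocab for form in (word, WORD_START + word)}


def inspect_word(tokenizer, word):
    """Hypothetical helper: check a word against a loaded Hugging Face tokenizer."""
    return {
        # get_vocab() returns the token -> id mapping of the tokenizer.
        "forms": lookup_forms(tokenizer.get_vocab(), word),
        # tokenize() shows the pieces the word is actually split into.
        "pieces": tokenizer.tokenize(word),
    }


# Toy vocabulary for illustration only; these entries are made up and are
# NOT taken from the real xlm-roberta-base vocabulary.
toy_vocab = {"\u2581oui": 0, "\u2581non": 1, "s": 2}
result = lookup_forms(toy_vocab, "oui")
print(result)
```

To run the same check on the real model, load the tokenizer with `AutoTokenizer.from_pretrained("FacebookAI/xlm-roberta-base")` and pass it to `inspect_word(tokenizer, "oui")`. Comparing the reported forms with the pieces returned by `tokenize` should show whether the word is stored as a single marked piece or split into several subwords.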