3.36 GB
21 files
Updated 6 days ago
NameSize
data
urls
.gitattributes2.31 kB
xet
.gitignore654 Bytes
xet
Arabic-OpenHermes-2.5.png2.84 MB
xet
LICENSE.md19.9 kB
xet
OpenHermes-AR-full-fixed.parquet1.67 GB
xet
README.md4.9 kB
xet
dolma.py4.84 kB
xet
README.md

Dataset Card for "Arabic-OpenHermes-2.5"

Original Dataset Card of Arabic-OpenHermes-2.5 by 2A2I

Dataset Sources & Infos

Overview

Arabic-OpenHermes-2.5 is a carefully curated dataset extracted / translated from the OpenHermes-2.5 collection provided by teknium.

Purpose

Arabic-OpenHermes-2.5 streamlines Arabic language research and applications by offering a high quality text resource in the conversational style to help better alignement of the Arabic Base LLMs, saving time and effort for researchers, technologists, and linguists in Arabic NLP/AI projects.

  • Enjoy using Arabic-OpenHermes-2.5 dataset directly for your Arabic applications and research! 😀

Usage

This dataset serves as an essential tool for those venturing into Arabic language projects, spanning from academic research to commercial applications. By presenting a source of Arabic text, Arabic-OpenHermes-2.5 empowers users to plunge directly into model finetuning, analysis, and application development, eliminating the initial challenges of synthetic data creation.

Use with HuggingFace

To load this dataset with Datasets, you'll need to install the datasets library with pip install datasets --upgrade and then use the following code:

from datasets import load_dataset

dataset = load_dataset("2A2I/Arabic-OpenHermes-2.5")

Contribution and Collaborative Engagement

Find 'Arabic-OpenHermes-2.5' on the Hugging Face Hub at 2A2I/Arabic-OpenHermes-2.5, where community contributions are welcomed. Users are invited to share feedback and propose enhancements.

Support and Collaborate

We are dedicated to cultivating an inclusive and encouraging space for Arabic AI and NLP research. For assistance, collaboration opportunities, or inquiries related to the dataset, please connect with us through the Hugging Face Hub's discussion section or contact us via 2A2I Contact Email.


Original Dataset Card of OpenHermes-2.5 by teknium

Original Dataset Card of OpenHermes by teknium

Dataset Summary

The Open Hermes 2/2.5 and Nous Hermes 2 models have recently achieved noteworthy progress in state-of-the-art language models (LLMs). These advancements are rooted in the innovative utilization of large-scale training data, specifically tailored for language modeling tasks.

For further information, please visit teknium/OpenHermes-2.5.

We hope the Arabic-OpenHermes-2.5 dataset serves your needs well and propels your Arabic NLP endeavors to new heights!

Citation

@misc{OpenHermes 2.5,
  title = {OpenHermes 2.5: An Open Dataset of Synthetic Data for Generalist LLM Assistants},
  author = {Teknium},
  year = {2023},
  publisher = {HuggingFace},
  url = {https://huggingface.co/datasets/teknium/OpenHermes-2.5}
}
@misc{Arabic OpenHermes 2.5,
  title = {Arabic OpenHermes 2.5: An Arabic version of Synthetic Data for Generalist Arabic LLM Assistants},
  author = {Marwa El Kamil, Mohammed Machrouh},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/datasets/2A2I/Arabic-OpenHermes-2.5}
}
Total size
3.36 GB
Files
21
Last updated
May 26
Pre-warmed CDN
US EU US EU

Contributors