VibeVoice
bezzam/VibeVoice-1.5B • Text-to-Speech • 3B • Updated Feb 16 • 98 • 1
bezzam/VibeVoice-7B • Text-to-Speech • 9B • Updated May 5 • 316
bezzam/VibeVoice-AcousticTokenizer • Feature Extraction • 0.7B • Updated Feb 5 • 10
bezzam/VibeVoice-SemanticTokenizer • Feature Extraction • 0.3B • Updated Dec 3, 2025 • 5
Neural codecs
facebook/encodec_48khz • Feature Extraction • 19.1M • Updated Sep 6, 2023 • 8.75k • 35
facebook/encodec_32khz • Feature Extraction • 59M • Updated Sep 4, 2023 • 50.9k • 18
facebook/encodec_24khz • Feature Extraction • 23.3M • Updated Jul 25, 2023 • 42.4k • 54
descript/dac_44khz • Feature Extraction • 76.6M • Updated Oct 11, 2024 • 87.5k • 11
Running • Open ASR Leaderboard configuration for Qwen3 ASR 🎙 • Normalize text to a clean, consistent format
Running • Open ASR Leaderboard configuration for Omnilingual ASR models 🎙 • Normalize raw text into a clean, standardized format