TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification Paper • 2604.14531 • Published 5 days ago • 6
NVIDIA Nemotron v3 Collection Open, Production-ready Enterprise Models • 15 items • Updated about 5 hours ago • 270
Gemma 4 Collection Gemma 4 is Google's new model family, including E2B, E4B, 26B-A4B, and 31B. • 28 items • Updated 4 days ago • 150
Article Welcome Gemma 4: Frontier multimodal intelligence on device 19 days ago • 867
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation Paper • 2603.19220 • Published Mar 19 • 66
Article Introducing Cohere-transcribe: state-of-the-art speech recognition 25 days ago • 37
The Y-Combinator for LLMs: Solving Long-Context Rot with λ-Calculus Paper • 2603.20105 • Published Mar 20 • 37
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Paper • 2602.10604 • Published Feb 11 • 195
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test Paper • 2503.01840 • Published Mar 3, 2025 • 9
Article Exploring New Frontiers of LLMs: Adaptive Dual-Search Distillation (ADS) and the 30B Model Open Beta Mar 1 • 2
Quantized Qwen3.5 Collection Verified models. Compatible with Transformers v5.3 and vLLM v0.16.1rc1 (nightly). Under evaluation. • 9 items • Updated Mar 12 • 9
pplx-embed Collection Diffusion-Pretrained Dense and Contextual Embeddings • 7 items • Updated Feb 26 • 96
Claude 4.5 Opus Collection Distilled models and datasets for Claude 4.5 Opus. • 12 items • Updated 9 days ago • 32