sunmaxim (sun)

upvoted a collection about 1 year ago

InternVL3

Collection

33 items • Updated Mar 2 • 84

upvoted 2 articles about 1 year ago

Article

Reasoning Datasets Competition

bespokelabs

•

Apr 9, 2025

• 38

Article

Welcome Llama 4 Maverick & Scout on Hugging Face

burtenshaw, reach-vb, pcuenq, clem, rajatarya, jsulz, lysandre

•

Apr 5, 2025

• 149

upvoted a paper about 1 year ago

Scalable-Softmax Is Superior for Attention

Paper • 2501.19399 • Published Jan 31, 2025 • 25

upvoted an article about 1 year ago

Article

Training Large Language Models with Interpreter Feedback using WebAssembly

axolotl-ai-co

•

Apr 3, 2025

• 14

upvoted a paper about 1 year ago

Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond

Paper • 2503.10460 • Published Mar 13, 2025 • 30

upvoted a collection about 1 year ago

OLMo 2

Collection

Artifacts for the OLMo 2 release. • 35 items • Updated Mar 3 • 155

upvoted 3 articles about 1 year ago

Article

Open R1: Update #3

open-r1

•

Mar 11, 2025

• 297

Article

Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2

RQlee, ArthurZ, achikundu, lwtr, rganti, mayank-mishra

•

Aug 21, 2024

• 41

Article

Dual-Stream Parallelism (DualPipe): Better Without Dual Streams

ufotalent

•

Feb 28, 2025

• 7

upvoted a paper about 1 year ago

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

Paper • 2502.14768 • Published Feb 20, 2025 • 47

upvoted 3 papers over 1 year ago

upvoted an article over 1 year ago

Article

Open-source DeepResearch – Freeing our search agents

m-ric, albertvillanova, merve, thomwolf, clefourrier

•

Feb 4, 2025

• 1.32k

upvoted a collection over 1 year ago

high-quality Chinese training datasets

Collection

A suite of high-quality Chinese datasets for pretraining, fine-tuning, or preference alignment, along with the models trained on them. • 13 items • Updated May 22, 2025 • 24

upvoted an article over 1 year ago

Article

What is test-time compute and how to scale it?

Kseniase

•

Feb 6, 2025

• 120

upvoted 3 papers over 1 year ago

Large Language Monkeys: Scaling Inference Compute with Repeated Sampling

Paper • 2407.21787 • Published Jul 31, 2024 • 14

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 379

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

Paper • 2411.16489 • Published Nov 25, 2024 • 45

The Curse of Depth in Large Language Models

Cautious Optimizers: Improving Training with One Line of Code

On Teacher Hacking in Language Model Distillation
