E Sanchez

esanchez43

AI & ML interests

None yet

Recent Activity

liked a model 1 day ago

tencent/Hy-MT2-1.8B

liked a dataset 6 days ago

introvoyz041/OptimalSpectroscopicMeasurementDesign

Organizations

None yet

upvoted a paper 1 day ago

DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards

Paper • 2605.21467 • Published 4 days ago • 174

upvoted a paper 16 days ago

OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

Paper • 2605.05185 • Published 18 days ago • 99

upvoted 4 papers about 1 month ago

UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards

Paper • 2604.14967 • Published Apr 16 • 15

RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time

Paper • 2604.11626 • Published Apr 13 • 102

SkillClaw: Let Skills Evolve Collectively with Agentic Evolver

Paper • 2604.08377 • Published Apr 9 • 291

Faithful GRPO: Improving Visual Spatial Reasoning in Multimodal Language Models via Constrained Policy Optimization

Paper • 2604.08476 • Published Apr 9 • 8

upvoted 2 papers about 2 months ago

GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning

Paper • 2604.02721 • Published Apr 3 • 629

CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence

Paper • 2603.28032 • Published Mar 30 • 342

upvoted 2 papers 2 months ago

Demystifing Video Reasoning

Paper • 2603.16870 • Published Mar 17 • 371

Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning

Paper • 2603.04597 • Published Mar 4 • 210

upvoted 3 papers 3 months ago

Heterogeneous Agent Collaborative Reinforcement Learning

Paper • 2603.02604 • Published Mar 3 • 195

VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training

Paper • 2602.10693 • Published Feb 11 • 220

A Very Big Video Reasoning Suite

Paper • 2602.20159 • Published Feb 23 • 523