arxiv:2602.08498

Characterizing, Evaluating, and Optimizing Complex Reasoning

Published on Jun 3

Abstract

Large Reasoning Models benefit from a unified framework that evaluates complex reasoning traces using DAG-based methods and reward modeling to improve task performance.

Large Reasoning Models (LRMs) increasingly rely on reasoning traces with complex internal structures. However, existing work lacks a unified answer to three fundamental questions: (1) what defines high-quality reasoning, (2) how to reliably evaluate long, implicitly structured reasoning traces, and (3) how to use such evaluation signals to optimize reasoning. To address these questions, we provide a unified perspective. (1) We introduce the ME^2 principle, which characterizes reasoning quality at the macro and micro levels in terms of efficiency and effectiveness. (2) Building on this principle, we model reasoning traces as directed acyclic graphs (DAGs) and develop a DAG-based pairwise evaluation method that captures complex reasoning structures. (3) Using this method, we construct the TRM-Preference dataset and train a Thinking Reward Model (TRM) to evaluate reasoning quality at scale. Experiments show that thinking rewards serve as an effective optimization signal. At test time, selecting better reasoning leads to better outcomes (up to 19.3% gain), and during RL training, thinking rewards improve both reasoning and task performance (up to 3.9% gain) across diverse tasks. Code and data are available at https://github.com/Simplified-Reasoning/TRM.
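
To make the ideas of a DAG-shaped reasoning trace, pairwise comparison, and test-time selection concrete, here is a minimal Python sketch. It is not the paper's method: the `ReasoningTrace` class, the `toy_score` function (the fraction of steps the final answer actually depends on), and the example traces are all hypothetical stand-ins chosen for illustration. In practice, a learned reward model such as TRM would take the place of `toy_score`.

```python
from dataclasses import dataclass


@dataclass
class ReasoningTrace:
    """Hypothetical container: a reasoning trace viewed as a DAG of steps."""
    steps: dict   # step id -> step text
    edges: list   # (premise id, conclusion id) pairs
    answer: str   # id of the step that states the final answer


def contributing_steps(trace):
    """Return the ids of all steps the answer depends on, answer included."""
    parents = {}
    for src, dst in trace.edges:
        parents.setdefault(dst, []).append(src)
    seen = set()
    stack = [trace.answer]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, []))
    return seen


def toy_score(trace):
    """Assumed stand-in for a learned reward: share of steps that feed the answer."""
    return len(contributing_steps(trace)) / len(trace.steps)


def prefer(a, b, score=toy_score):
    """Pairwise comparison: return the higher-scoring trace (ties go to `a`)."""
    return a if score(a) >= score(b) else b


def best_of_n(traces, score=toy_score):
    """Test-time selection: keep the candidate trace with the highest score."""
    return max(traces, key=score)


# Every step in this trace leads to the answer.
direct = ReasoningTrace(
    steps={"s1": "restate problem", "s2": "derive equation", "s3": "solve"},
    edges=[("s1", "s2"), ("s2", "s3")],
    answer="s3",
)

# Step s3 here is a dead end that the answer never uses.
wasteful = ReasoningTrace(
    steps={"s1": "restate problem", "s2": "derive equation",
           "s3": "explore unrelated case", "s4": "solve"},
    edges=[("s1", "s2"), ("s1", "s3"), ("s2", "s4")],
    answer="s4",
)

chosen = best_of_n([wasteful, direct])
print(toy_score(direct), toy_score(wasteful), chosen is direct)
```

The scoring function is passed in as a parameter so that the selection logic stays the same whether the score comes from a toy heuristic like this one or from a trained model.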


