Publications | Sunghwan Kim

2025

Memory

MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents

arXiv preprint

arXiv

Interaction

ToolHaystack: Stress-Testing Tool-Augmented Language Models in Realistic Long-Term Interactions

Beong-woo Kwak, Minju Kim, Dongha Lim, Hyungjoo Chae, Dongjin Kang, Sunghwan Kim, Dongil Yang, and Jinyoung Yeo

EMNLP 2025 Findings

arXiv

Interaction

LLM Meets Scene Graph: Can Large Language Models Understand and Generate Scene Graphs? A Benchmark and Empirical Study

Dongil Yang, Minjin Kim, Sunghwan Kim, Beong-woo Kwak, Minjun Park, Jinseok Hong, Woontack Woo, and Jinyoung Yeo

arXiv

Memory

Embodied Agents Meet Personalization: Exploring Memory Utilization for Personalized Assistance

Taeyoon Kwon^†, Dongwook Choi^†, Sunghwan Kim, Hyojun Kim, Seungjun Moon, Beong-woo Kwak, Kuan-Hao Huang, and Jinyoung Yeo

arXiv preprint

arXiv

Reward Model

Web-Shepherd: Advancing PRMs for Reinforcing Web Agents

Hyungjoo Chae^†, Sunghwan Kim^†, Junhee Cho^†, Seungone Kim, Seungjun Moon, Gyeom Hwangbo, Dongha Lim, Minjin Kim, Yeonjun Hwang, Minju Gwak, Dongwook Choi, Minseok Kang, Gwanhoon Im, ByeongUng Cho, Hyojun Kim, Jun Hee Han, Taeyoon Kwon, Minju Kim, Beong-woo Kwak, Dongjin Kang, and 1 more author

NeurIPS 2025 (Spotlight)

arXiv

Reward Model

Rethinking Reward Model Evaluation Through the Lens of Reward Overoptimization

Sunghwan Kim^†, Dongjin Kang^†, Taeyoon Kwon, Hyungjoo Chae, Dongha Lee, and Jinyoung Yeo

ACL 2025 (Oral)

arXiv

Interaction

Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation

Hyungjoo Chae, Namyoung Kim, Kai Tzu-iunn Ong, Minju Gwak, Gwanwoo Song, Jihoon Kim, Sunghwan Kim, Dongha Lee, and Jinyoung Yeo

ICLR 2025

arXiv Code

2024

Reward Model

Evaluating Robustness of Reward Models for Mathematical Reasoning

Sunghwan Kim^†, Dongjin Kang^†, Taeyoon Kwon, Hyungjoo Chae, Jungsoo Won, Dongha Lee, and Jinyoung Yeo

arXiv preprint

arXiv Code

Dialogue

Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory

Suyeon Lee^†, Sunghwan Kim^†, Minju Kim^†, Dongjin Kang, Dongil Yang, Harim Kim, Minseok Kang, Dayi Jung, Min Hee Kim, Seungbeen Lee, Kyoung-Mee Chung, Youngjae Yu, Dongha Lee, and Jinyoung Yeo

EMNLP 2024 Findings

arXiv Code

Reasoning

Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models

Hyungjoo Chae, Yeonghyeon Kim, Seungone Kim, Kai Tzu-iunn Ong, Beong-woo Kwak, Moohyeon Kim, Sunghwan Kim, Taeyoon Kwon, Jiwan Chung, Youngjae Yu, and Jinyoung Yeo

EMNLP 2024

arXiv Code

Dialogue

Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation

Dongjin Kang^†, Sunghwan Kim^†, Taeyoon Kwon, Seungjun Moon, Hyunsouk Cho, Youngjae Yu, Dongha Lee, and Jinyoung Yeo

ACL 2024

🏆 Outstanding Paper Award 🏆

Abs arXiv

Emotional Support Conversation (ESC) is a task aimed at alleviating individuals’ emotional distress through daily conversation. Given its inherent complexity and non-intuitive nature, the ESConv dataset incorporates support strategies to facilitate the generation of appropriate responses. Recently, despite the remarkable conversational ability of large language models (LLMs), previous studies have suggested that they often struggle with providing useful emotional support. Hence, this work initially analyzes the results of LLMs on ESConv, revealing challenges in selecting the correct strategy and a notable preference for a specific strategy. Motivated by these findings, we explore the impact of the inherent preference in LLMs on providing emotional support, and we observe that exhibiting a high preference for specific strategies hinders effective emotional support and degrades robustness in predicting the appropriate strategy. Moreover, we conduct a methodological study to offer insights into the necessary approaches for LLMs to serve as proficient emotional supporters. Our findings emphasize that (1) low preference for specific strategies hinders the progress of emotional support, (2) external assistance helps reduce preference bias, and (3) existing LLMs alone cannot become good emotional supporters. These insights suggest promising avenues for future research to enhance the emotional intelligence of LLMs.