Wei Fu

I’m currently a fifth-year PhD student in computer science at IIIS, Tsinghua University (expected graduation: June 2026). I am very fortunate to be advised by Professor Yi Wu. Prior to my PhD studies, I received my BEng degree from the Department of Electrical Engineering in 2021.

My research interests lie at the intersection of reinforcement learning (RL) and distributed systems. Currently, I focus on developing AReaL, the fastest and easiest way to scale up agentic RL training for LLMs and VLMs. The project has accumulated 2.5k GitHub stars as of September 2025.

I’d describe myself more as a programmer than a researcher.

I enjoy coding, cooking, listening to pop music, and working out at the gym.

News

Aug 09, 2025	Introducing ASearcher — a 32B search agent trained with AReaL that exhibits extreme long-horizon agentic execution capability. It achieves SOTA performance compared with other open-source models of the same size. Check out our open-sourced code, dataset, and checkpoints!
Aug 01, 2025	We refactored the legacy AReaL code into AReaL-lite, a lightweight, algorithm-first codebase that prioritizes a better development experience for AI researchers. Check out the quickstart guide to begin your journey with AReaL-lite!
Jun 23, 2024	Introducing ReaLHF, a highly efficient system for RLHF training of LLMs. It can achieve up to a 10x training speedup over existing open-source systems! Check out our open-sourced code and the documentation to get started with ReaL quickly! 🚀

Selected Publications

ASearcher

Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL

Jiaxuan Gao, Wei Fu, Minyang Xie, Shusheng Xu, Chuyi He, Zhiyu Mei, Banghua Zhu, and Yi Wu

Aug 2025

PDF Code Website
AReaL

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

Wei Fu, Jiaxuan Gao, Xujie Shen, Chen Zhu, Zhiyu Mei, Chuyi He, Shusheng Xu, Guo Wei, Jun Mei, Jiashu Wang, and 3 more authors

May 2025

PDF Code Website
ReaLHF

ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation

Zhiyu Mei, Wei Fu, Kaiwei Li, Guangju Wang, Huanchen Zhang, and Yi Wu

In MLSys 2025 (*: Equal Contribution), May 2025

PDF Code Website
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, and Yi Wu

ICML (Oral), Jul 2024

arXiv
SRL

SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores

Zhiyu Mei*, Wei Fu*, Guangju Wang, Huanchen Zhang, and Yi Wu

ICLR (*: Equal Contribution), May 2024

PDF Code
Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning

Wei Fu, Chao Yu, Zelai Xu, Jiaqi Yang, and Yi Wu

ICML, Jul 2022

arXiv Code Website