CV
Contact Information
Name: Wei Fu
Email: fuwth17 AT gmail DOT com
Location: Beijing, CN
GitHub: garrett4wade
Education
Ph.D., Institute for Interdisciplinary Information Sciences (IIIS)
Tsinghua University, Beijing, China
Advisor: Yi Wu
Research Direction: RL, MARL, Distributed Systems
2021.9 – 2026.7
B.E., Department of Electronic Engineering
Tsinghua University, Beijing, China
Advisor: Yi Wu
Thesis: Distributed Systems for Multi-Agent Games
2017.9 – 2021.7
High School
Yaohua High School, Tianjin, China
2014.9 – 2017.7
Publications
2025
Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
Jiaxuan Gao, Wei Fu, Minyang Xie, Shusheng Xu, Chuyi He, Zhiyu Mei, Banghua Zhu, Yi Wu
arXiv preprint
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
Wei Fu, Jiaxuan Gao, Xujie Shen, Chen Zhu, Zhiyu Mei, Chuyi He, Shusheng Xu, Guo Wei, Jun Mei, Jiashu Wang, Tongkai Yang, Binhang Yuan, Yi Wu
arXiv preprint
How Far Are We from Optimal Reasoning Efficiency?
Jiaxuan Gao, Shu Yan, Qixin Tan, Lu Yang, Shusheng Xu, Wei Fu, Zhiyu Mei, Kaifeng Lyu, Yi Wu
arXiv preprint
ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation
Zhiyu Mei*, Wei Fu*, Kaiwei Li, Guangju Wang, Huanchen Zhang, Yi Wu
MLSys 2025 (*: Equal Contribution)
2024
On Designing Effective RL Reward at Training Time for LLM Reasoning
Jiaxuan Gao, Shusheng Xu, Wenjie Ye, Weilin Liu, Chuyi He, Wei Fu, Zhiyu Mei, Guangju Wang, Yi Wu
arXiv preprint
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu
ICML 2024 (Oral, top 1.5%)
SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores
Zhiyu Mei*, Wei Fu*, Guangju Wang, Huanchen Zhang, Yi Wu
ICLR 2024 (*: Equal Contribution); ES-FoMo Workshop Oral
Learning Agile Bipedal Motions on a Quadrupedal Robot
Yunfei Li, Jinhan Li, Wei Fu, Yi Wu
ICRA 2024 (EXPO Best Paper Finalist)
2023
Iteratively Learn Diverse Strategies with State Distance Information
Wei Fu, Weihua Du, Jingwei Li, Sunli Chen, Jingzhao Zhang, Yi Wu
NeurIPS 2023
2022
Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning
Wei Fu, Chao Yu, Zelai Xu, Jiaqi Yang, Yi Wu
ICML 2022 Spotlight Talk
Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization
Zihan Zhou*, Wei Fu*, Bingliang Zhang, Yi Wu
ICLR 2022 (*: Equal Contribution)
2021
Unlocking the Potential of MAPPO with Asynchronous Optimization
Wei Fu, Chao Yu, Yunfei Li, Yi Wu
CICAI 2021 Oral
Experience
Research Intern
Ant Research, RL Lab
We built ASearcher, a search agent trained with AReaL. The agent performs extremely long-horizon search, exceeding 40 tool-call turns and 150k output tokens during training. With a simple agent design and no external LLMs, ASearcher achieves Avg@4 scores of 51.1 on xBench and 58.7 on GAIA, surpassing existing open-source 32B agents.
2025.06 – 2025.08
Research Intern
Ant Research, RL Lab
I led the AReaL project, an asynchronous RL system for training reasoning and agentic LLMs. It offers a simple interface for customizing agentic workflows while cutting overall training time by 2.5× compared to the best open-source synchronous systems.
2024.12 – 2025.06
Research Intern
Shanghai Qi Zhi Institute
We designed ReaL, an efficient distributed system for LLM RLHF. It achieves an average 26% throughput improvement over heuristic approaches based on Megatron-LM and at least 2× higher throughput than DeepSpeed-Chat.
2023.5 – 2024.10
Research Intern
Shanghai Qi Zhi Institute
We implemented SRL, a general-purpose game-centric RL system that can scale to over 10k CPU cores and 96 GPUs. We were the first to reproduce OpenAI's results in the Hide-And-Seek game.
2021.10 – 2023.5
Research Intern
ByteDance
We implemented a Ray-based system (not RLlib) to train an agent to play Unity-based FPS games. While the baseline required several days of training, our system finished within approximately 8 hours.
2020.6 – 2020.9
Awards
ByteDance Scholarship
2024
Scholarships at Tsinghua University
2017 – 2024
Service
Reviewer
NeurIPS/ICML/ICLR
2022 – 2024