Publications

2025

  1. ASearcher
    Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL
    Jiaxuan Gao, Wei Fu, Minyang Xie, Shusheng Xu, Chuyi He, Zhiyu Mei, Banghua Zhu, and Yi Wu
    Aug 2025
  2. How Far Are We from Optimal Reasoning Efficiency?
    Jiaxuan Gao, Shu Yan, Qixin Tan, Lu Yang, Shusheng Xu, Wei Fu, Zhiyu Mei, Kaifeng Lyu, and Yi Wu
    Aug 2025
  3. AReaL
    AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
    Wei Fu, Jiaxuan Gao, Xujie Shen, Chen Zhu, Zhiyu Mei, Chuyi He, Shusheng Xu, Guo Wei, Jun Mei, Jiashu Wang, and 3 more authors
    May 2025
  4. ReaLHF
    ReaL: Efficient RLHF Training of Large Language Models with Parameter Reallocation
    Zhiyu Mei, Wei Fu, Kaiwei Li, Guangju Wang, Huanchen Zhang, and Yi Wu
    In MLSys 2025 (*: Equal Contribution), May 2025

2024

  1. On Designing Effective RL Reward at Training Time for LLM Reasoning
    Jiaxuan Gao, Shusheng Xu, Wenjie Ye, Weilin Liu, Chuyi He, Wei Fu, Zhiyu Mei, Guangju Wang, and Yi Wu
    May 2024
  2. Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
    Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, and Yi Wu
    ICML (Oral), Jul 2024
  3. SRL
    SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores
    Zhiyu Mei*, Wei Fu*, Guangju Wang, Huanchen Zhang, and Yi Wu
    ICLR (*: Equal Contribution), May 2024
  4. Learning Agile Bipedal Motions on a Quadrupedal Robot
    Yunfei Li, Jinhan Li, Wei Fu, and Yi Wu
    ICRA, May 2024

2023

  1. SIPO
    Iteratively Learn Diverse Strategies with State Distance Information
    Wei Fu, Weihua Du*, Jingwei Li*, Sunli Chen, Jingzhao Zhang, and Yi Wu
    NeurIPS (*: Equal Contribution), Dec 2023

2022

  1. Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning
    Wei Fu, Chao Yu, Zelai Xu, Jiaqi Yang, and Yi Wu
    ICML, Jul 2022
  2. RSPO
    Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization
    Zihan Zhou*, Wei Fu*, Bingliang Zhang, and Yi Wu
    ICLR (*: Equal Contribution), Apr 2022

2021

  1. Unlocking the Potential of MAPPO with Asynchronous Optimization
    Wei Fu, Chao Yu, Yunfei Li, and Yi Wu
    In CICAI (Oral), Jun 2021