2024

ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation
Zhiyu Mei*, Wei Fu*, Kaiwei Li, Guangju Wang, Huanchen Zhang, and Yi Wu
Preprint, Jul 2024 (*: Equal Contribution)

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weilin Liu, Zhiyu Mei, Guangju Wang, Chao Yu, and Yi Wu
ICML (Oral), Jul 2024

SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores
Zhiyu Mei*, Wei Fu*, Guangju Wang, Huanchen Zhang, and Yi Wu
ICLR, May 2024 (*: Equal Contribution)

Learning Agile Bipedal Motions on a Quadrupedal Robot
Yunfei Li, Jinhan Li, Wei Fu, and Yi Wu
ICRA, May 2024

2023

Iteratively Learn Diverse Strategies with State Distance Information (SIPO)
Wei Fu, Weihua Du*, Jingwei Li*, Sunli Chen, Jingzhao Zhang, and Yi Wu
NeurIPS, Dec 2023 (*: Equal Contribution)

2022

Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning
Wei Fu, Chao Yu, Zelai Xu, Jiaqi Yang, and Yi Wu
ICML, Jul 2022

Continuously Discovering Novel Strategies via Reward-Switching Policy Optimization (RSPO)
Zihan Zhou*, Wei Fu*, Bingliang Zhang, and Yi Wu
ICLR, Apr 2022 (*: Equal Contribution)

2021

Unlocking the Potential of MAPPO with Asynchronous Optimization
Wei Fu, Chao Yu, Yunfei Li, and Yi Wu
CICAI (Oral), Jun 2021