See my Google Scholar profile for more details.
The Crucial Role of Samplers in Online Direct Preference Optimization
Ruizhe Shi*, Runlong Zhou*, Simon S. Du
Mathematics of Modern Machine Learning (NeurIPS M3L workshop) 2024
Decoding-Time Language Model Alignment with Multiple Objectives
Ruizhe Shi, Yifang Chen, Yushi Hu, Alisa Liu, Hannaneh Hajishirzi, Noah A. Smith, Simon S. Du
Conference on Neural Information Processing Systems (NeurIPS) 2024
Rethinking Transformers in Solving POMDPs
Chenhao Lu, Ruizhe Shi*, Yuyao Liu*, Kaizhe Hu, Simon S. Du, Huazhe Xu
International Conference on Machine Learning (ICML) 2024
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Ruizhe Shi*, Yuyao Liu*, Yanjie Ze, Simon S. Du, Huazhe Xu
International Conference on Learning Representations (ICLR) 2024
H-InDex: Visual Reinforcement Learning with Hand-Informed Representations for Dexterous Manipulation
Yanjie Ze, Yuyao Liu*, Ruizhe Shi*, Jiaxin Qin, Zhecheng Yuan, Jiashun Wang, Huazhe Xu
Conference on Neural Information Processing Systems (NeurIPS) 2023