(For reference only) Logit mixing and RLHF paper reading [slide]; Decoding-Time Language Model Alignment with Multiple Objectives [slide]; Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning [slide]