Talks
These slides are for reference purposes only.
The crucial role of samplers in online direct preference optimization
[slide]
Logit mixing and RLHF paper reading
[slide]
Decoding-time language model alignment with multiple objectives
[slide]
Unleashing the power of pre-trained language models for offline reinforcement learning
[slide]