Ruizhe Shi

We shell time from the nuts and teach it to walk: and time returns to the shell.

Talks

(for reference only)

  • Logit mixing and RLHF paper reading

    [slide]

  • Decoding-Time Language Model Alignment with Multiple Objectives

    [slide]

  • Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning

    [slide]