Direct Preference Optimization - Your Language Model is Secretly a Reward Model