Direct Preference Optimization - Your Language Model is Secretly a Reward Model
