ORPO: Monolithic Preference Optimization without Reference Model (Paper Explained)

Yannic Kilcher

Paper: https://arxiv.org/abs/2403.07691

Abstract:
While recent preference alignment algorithms for language models have demonstrated promising results, supervised fine-tuning (SFT) remains imperative for achieving successful convergence. In this paper, we study the crucial role of SFT within the context of preference alignment, emphasizing that a minor penalty for the disfavored generation style is sufficient for preference-aligned SFT. Building on this foundation, we introduce a straightforward and innovative reference model-free monolithic odds ratio preference optimization algorithm, ORPO, eliminating the necessity for an additional preference alignment phase. We demonstrate, both empirically and theoretically, that the odds ratio is a sensible choice for contrasting favored and disfavored styles during SFT across the diverse sizes from 125M to 7B. Specifically, fine-tuning Phi-2 (2.7B), Llama-2 (7B), and Mistral (7B) with ORPO on the UltraFeedback alone surpasses the performance of state-of-the-art language models with more than 7B and 13B parameters: achieving up to 12.20% on AlpacaEval 2.0 (Figure 1), 66.19% on IFEval (instruction-level loose, Table 6), and 7.32 in MT-Bench (Figure 12). We release code and model checkpoints for Mistral-ORPO-α (7B) and Mistral-ORPO-β (7B).
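To make the "odds ratio during SFT" idea concrete, here is a minimal sketch of a per-example ORPO-style loss in plain Python. It assumes the formulation as we understand it from the paper: P(y|x) is the length-normalized likelihood exp(mean per-token log-prob), odds(y|x) = P(y|x) / (1 - P(y|x)), the odds-ratio term is L_OR = -log σ(log(odds(y_w|x) / odds(y_l|x))) for the chosen response y_w and rejected response y_l, and the total loss is L_SFT + λ·L_OR. The function names and the default λ = 0.1 are illustrative assumptions; check the exact definitions against the paper before relying on them.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp).

    avg_logp is assumed to be the mean per-token log-probability of a
    response, so it must be strictly negative (p in (0, 1)).
    """
    # log(1 - p) computed stably as log(-expm1(log p))
    return avg_logp - math.log(-math.expm1(avg_logp))


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def orpo_loss(chosen_token_logps, rejected_token_logps, lam=0.1):
    """Sketch of an ORPO-style loss for a single preference pair.

    chosen_token_logps / rejected_token_logps: per-token log-probs
    log P(y_t | x, y_<t) under the model being fine-tuned (no reference
    model is involved). lam is the weight on the odds-ratio term
    (illustrative default, an assumption).
    """
    avg_w = sum(chosen_token_logps) / len(chosen_token_logps)
    avg_l = sum(rejected_token_logps) / len(rejected_token_logps)
    # SFT term: mean negative log-likelihood of the chosen response
    l_sft = -avg_w
    # Odds-ratio term: penalize the rejected style relative to the chosen one
    l_or = -log_sigmoid(log_odds(avg_w) - log_odds(avg_l))
    return l_sft + lam * l_or
```

For example, `orpo_loss([-0.5], [-2.0], lam=1.0)` comes out smaller than `orpo_loss([-0.5], [-0.5], lam=1.0)`: the SFT term is identical, but the odds-ratio term shrinks once the chosen response is more likely than the rejected one. Only the fine-tuned model's own log-probabilities appear, which is what "reference model-free" refers to in the abstract.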

Authors: Jiwoo Hong, Noah Lee, James Thorne

Links:
Homepage: https://ykilcher.com
Merch: https://ykilcher.com/merch
YouTube: / yannickilcher
Twitter: / ykilcher
Discord: https://ykilcher.com/discord
LinkedIn: / ykilcher

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar: https://www.subscribestar.com/yannick...
Patreon: / yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n