Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, Douwe Kiela

Alignment methods like RLHF and DPO expect feedback in the form of preferences (e.g., output A is better than output B for input X). Collecting such preferences from human annotators quickly becomes expensive, and the resulting data is often conflicting. Kahneman-Tversky Optimization (KTO) matches or exceeds state-of-the-art DPO performance without using preference data at all: it only needs a binary signal of whether a given output is desirable or undesirable for a given input. This makes KTO far easier to use in the real world, where preferences are scarce and expensive to collect.
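
The core idea, loosely in code: rather than comparing two outputs for the same prompt, KTO assigns each single output a prospect-theoretic value relative to a reference point, pushing desirable outputs above it and undesirable ones below it. The sketch below is illustrative only and is not the released implementation: the function name, the simplified batch-mean estimate of the reference point `z0`, and the default hyperparameters (`beta`, `lambda_d`, `lambda_u`) are assumptions made here for clarity; see the linked code for the actual training loop.

```python
import torch

def kto_loss(policy_logps, ref_logps, is_desirable,
             beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """Minimal sketch of a KTO-style objective (not the official code).

    policy_logps / ref_logps: summed log-probabilities of each completion
        under the policy and the frozen reference model, shape (batch,).
    is_desirable: boolean tensor marking completions labeled as desirable.
    """
    # Implied reward: how much more likely the policy makes the completion
    # than the reference model does.
    rewards = policy_logps - ref_logps

    # Reference point z0: a detached, batch-level estimate of the divergence
    # between policy and reference (simplified here to the clamped batch mean).
    z0 = rewards.detach().mean().clamp(min=0)

    # Prospect-theoretic value: gains and losses are weighted asymmetrically
    # (lambda_d vs. lambda_u) around the reference point.
    values = torch.where(
        is_desirable,
        lambda_d * torch.sigmoid(beta * (rewards - z0)),
        lambda_u * torch.sigmoid(beta * (z0 - rewards)),
    )
    weights = torch.where(
        is_desirable,
        torch.full_like(values, lambda_d),
        torch.full_like(values, lambda_u),
    )
    # Minimizing (lambda_y - v) pushes desirable completions above the
    # reference point and undesirable ones below it.
    return (weights - values).mean()
```

A toy call such as `kto_loss(torch.tensor([-5.0, -7.0]), torch.tensor([-6.0, -6.5]), torch.tensor([True, False]))` returns a scalar loss; note that each example carries only a single completion and a binary label, not a chosen/rejected pair.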

[Spotlight] International Conference on Machine Learning (ICML), 2024.

Paper | Leaderboard | Blog | Code | Model Checkpoints