Reinforcement Learning from Human Feedback (RLHF) is an effective approach for aligning language models to human preferences. Central to RLHF is learning a reward function for scoring human preferences. Two main approaches for learning a reward model are 1) training an explicit reward model as in RLHF, and 2) using an implicit reward learned from preference data through methods such as Direct Preference Optimization (DPO). Prior work has shown that the implicit reward model of DPO can approximate a trained reward model, but it is unclear to what extent DPO can generalize to distribution shifts, an issue that may arise due to limited preference data or the changing language of the trained model. We address this question by evaluating the accuracy at distinguishing preferred and rejected answers using both DPO and RLHF rewards. Our findings indicate that DPO's implicit reward performs comparably to RLHF rewards on in-distribution data, but severely under-performs RLHF reward models out of distribution. Across five out-of-domain settings, DPO has a mean drop in accuracy of 3% and a maximum drop of 7%, highlighting the shortcomings of DPO's implicit reward model for preference optimization. These findings indicate that DPO's implicit reward model has limited generalization ability and substantiate the integration of an explicit reward model in iterative DPO approaches.
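For reference, a minimal sketch of the quantities compared above, assuming the standard DPO formulation with policy \pi_\theta, reference model \pi_{\mathrm{ref}}, and preference pairs (y_w, y_l) of preferred and rejected answers:

r_\theta(x, y) \;=\; \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\mathrm{Acc} \;=\; \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\!\left[\, r\!\left(x_i, y_w^{(i)}\right) > r\!\left(x_i, y_l^{(i)}\right) \right],

where r is either DPO's implicit reward r_\theta or an explicit RLHF reward model, and Acc is the preference accuracy used to compare the two on in-distribution and out-of-domain data.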