MR-FlowDPO: Multi-Reward Direct Preference Optimization for Flow-Matching Text-to-Music Generation

Alon Ziv1,2*, Sanyuan Chen1, Andros Tjandra1, Yossi Adi1,2, Wei-Ning Hsu1, Bowen Shi1


1FAIR Team, Meta MSL

2The Hebrew University of Jerusalem

[paper] [code]

Abstract


A key challenge in music generation models is their lack of direct alignment with human preferences, as music evaluation is inherently subjective and varies widely across individuals. We introduce MR-FlowDPO, a novel approach that uses Direct Preference Optimization (DPO) with multiple musical rewards to enhance flow-matching-based music generation models, a major class of modern music generative models. The musical rewards are crafted to assess music quality across three key dimensions: text alignment, audio production quality, and musicality, with a scalable off-the-shelf model used for each reward prediction. We employ these rewards in two ways: by constructing preference data for DPO and by integrating the rewards into text prompting. To address the ambiguity in musicality evaluation, we propose a novel scoring mechanism that leverages semantic self-supervised representations and significantly improves the rhythmic stability of generated music. We conduct an extensive evaluation using a variety of music-specific objective metrics as well as a human study. Results show that MR-FlowDPO significantly enhances overall music generation quality and is consistently preferred over highly competitive baselines in terms of audio quality, text alignment, and musicality. Our code is publicly available.
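As a rough illustration of how DPO can be applied to a flow-matching model, the sketch below follows the Diffusion-DPO style objective, in which the implicit reward is how much the trainable model reduces its regression error relative to a frozen reference. This is an assumption made for illustration and not a confirmed description of the MR-FlowDPO training loss. The function name `flow_dpo_loss`, the default `beta`, and the use of precomputed per-sample flow-matching errors are all hypothetical.

```python
import math


def flow_dpo_loss(err_policy_win, err_ref_win, err_policy_lose, err_ref_lose, beta=1.0):
    """Diffusion-DPO style loss on per-sample flow-matching errors (illustrative only).

    Each err_* is assumed to be a squared error ||v(x_t, t) - (x_1 - x_0)||^2
    for one sample, computed with the trainable policy or the frozen reference.
    """
    # Error gap of the policy relative to the reference on each sample;
    # a negative gap means the policy fits that sample better than the reference.
    win_gap = err_policy_win - err_ref_win
    lose_gap = err_policy_lose - err_ref_lose
    # -log(sigmoid(-z)) equals softplus(z), with z = beta * (win_gap - lose_gap).
    z = beta * (win_gap - lose_gap)
    # Numerically stable softplus.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))
```

When the policy equals the reference, both gaps are zero and the loss is log 2. The loss falls below that value once the policy fits the preferred sample better than the rejected one, relative to the reference.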

MRSD data pairs - Semantic Consistency

Reference model: MelodyFlow-1B

Text prompt | Positive sample | Positive score | Negative sample | Negative score
Hard driving, 90s style alternative rock. | [audio] | 0.40 | [audio] | 0.21
Light, rhythmic music with a touch of sadness, but evoke positive emotions. Suitable for fast frames of a movie/tv/web/games etc. | [audio] | 0.38 | [audio] | 0.24

MRSD data pairs - Production Quality

Reference model: MelodyFlow-1B

Text prompt | Positive sample | Positive score | Negative sample | Negative score
Happy swinging track, finger snaps and bending organ.Perfect for a video tutorial, cooking vlog or as a soundtrack in your podcast.Funny vibes and light-hearted mood. | [audio] | 8.23 | [audio] | 6.58
Electronic Music Track for Video Editing or Slide show and Presentationsuseful for Technology, informative, science, Travel and other themes | [audio] | 8.49 | [audio] | 6.79

MRSD data pairs - Text Alignment

Reference model: MelodyFlow-1B

Text prompt | Positive sample | Positive score | Negative sample | Negative score
The Power of Synths combined with originality that creates epic atmosphere, going into a battle with strengthened spirit kind of music. | [audio] | 0.45 | [audio] | 0.20
Calm & dark pop punk inspired sound for all types of podcasts, youtube, action content & more. | [audio] | 0.43 | [audio] | 0.17
This soundtrack makes a fun, kids, mysterious mood. This is just melody with Halloween mood. Spooky and quirky background instrumental music, full of eerie cartoon atmosphere and creepy Halloween fun. Witches, ghosts, monsters, scarecrows, pumpkins, zombies, vampires. Instruments used: chamber orchestra, harpsichord, theremin, celeste, and percussion set. Perfect for Halloween themes, cartoons, animation, games, kids, children's media, comics, creepy slide shows | [audio] | 0.50 | [audio] | 0.25
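Each data pair above couples a higher-scoring and a lower-scoring sample for the same prompt on a single reward axis. A minimal sketch of how such pairs could be selected from scored candidates is given below. The selection rule (best versus worst candidate), the `min_margin` threshold, and the names `Candidate` and `build_pair` are assumptions for illustration; this page does not state how the pairs were actually chosen.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    sample_id: str  # hypothetical identifier for one generated clip
    score: float    # reward on a single axis, e.g. text alignment


def build_pair(
    candidates: list[Candidate], min_margin: float
) -> Optional[tuple[Candidate, Candidate]]:
    """Return (positive, negative) for one prompt, or None if the score gap is too small."""
    if len(candidates) < 2:
        return None
    positive = max(candidates, key=lambda c: c.score)
    negative = min(candidates, key=lambda c: c.score)
    # Skip prompts whose candidates are too close to express a clear preference.
    if positive.score - negative.score < min_margin:
        return None
    return positive, negative
```

Running this once per reward axis would give one preference set for each of the three dimensions shown above. Requiring a margin is one simple way to avoid training on near-ties.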

MR-FlowDPO-1B vs. Reference

Text prompt | MelodyFlow-1B (Reference) | MR-FlowDPO-1B
An upbeat, soulful hip hop track with organic production, guitars and percussion. | [audio] | [audio]
Hard driving, 90s style alternative rock. | [audio] | [audio]
A twangy funk tune with a touch of 80s and a break beat finish, drums, bass, guitar and keys. | [audio] | [audio]
Fusion of World, Indian, Middle Eastern genre's with Dystonpian bellydance, glitch, pop vibe. Uptempo, very percussion centric with traditional Indian acoustic instruments masterfully blended with synth electronic effects. | [audio] | [audio]

MR-FlowDPO-400M vs. Reference

Text prompt | Flow-400M (Reference) | MR-FlowDPO-400M
Chill, disco comedic, video game, jolly - electronic programming, jazzy hits. Jazz, soul, dance, house hybrid. | [audio] | [audio]
60's Inspired Acoustic Pop | [audio] | [audio]
Philosophical and abstract golden age hip hop track to relieve stress | [audio] | [audio]
Inspiring indie guitar UK Drill drum style instrumental | [audio] | [audio]

MR-FlowDPO-400M vs. MusicGen

Text prompt | MusicGen-Medium | MR-FlowDPO-400M
This music clip is an percussionary instrument. The tempo is medium fast with steady bass drum, energetic snare drum beat and cymbal rides. This music is a youthful, punchy, energetic, simple drumming style. | [audio] | [audio]
This is an opening theme for a TV series. It is an instrumental piece. The main theme is being played by a loud brass section. There is a groovy synth bass line playing. The rhythmic background consists of a strong electronic drum beat. The atmosphere is energetic. This piece could be used in lifting samples for beat-making. | [audio] | [audio]
This is an instrumental progressive rock music piece. There is an electric guitar playing complex tunes and chords with a pitch shifting effect. There is a psychedelic feel to this track. Parts of this recording could be used in an advertisement jingle. | [audio] | [audio]

MR-FlowDPO-400M vs. AudioLDM2

Text prompt | AudioLDM2 | MR-FlowDPO-400M
This music is instrumental. The tempo is medium fast with a melodious keyboard harmony, steady drumming, groovy bass, synthesiser arrangements , electronically articulated sounds and tambourine beats . The melody is harmonious, pleasant, uncomplicated and well layered. This music is Synth Pop. | [audio] | [audio]
The performer is snapping his fingers in rhythm with the upbeat japanese music playing in the background. The song is a j-pop song and features vibrant rhythmic synth activity and has a general dance feel to it. It's a live recording. | [audio] | [audio]
The song is an instrumental. The song is medium tempo with a steady drumming rhythm, cymbals crashing, piano accompaniment and a xylophone playing a cool melody. The song is emotional and passionate. The song is an ad jingle for a technology solutions company. | [audio] | [audio]