Preference Alignment with Flow Matching
Authors: Minu Kim, Yongsik Lee, Sehyeok Kang, Jihwan Oh, Song Chong, Se-Young Yun
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results indicate the practical effectiveness of our method, offering a new direction in aligning a pre-trained model to preference. |
| Researcher Affiliation | Academia | Minu Kim¹, Yongsik Lee¹, Sehyeok Kang¹, Jihwan Oh¹, Song Chong¹, Se-Young Yun¹; ¹KAIST AI. {minu.kim, dldydtlr93, kangsehyeok0329, ericoh929, songchong, yunseyoung}@kaist.ac.kr |
| Pseudocode | Yes | Detailed algorithm can be found in Algorithm 1. ... Algorithm 1: PFM: Preference Flow Matching |
| Open Source Code | Yes | Our code is available at https://github.com/jadehaus/preference-flow-matching. |
| Open Datasets | Yes | We first evaluate PFM on a conditional image generation task using the MNIST dataset [LeCun et al., 1998]. ... We train a preference flow on randomly selected pairs of movie reviews y+, y− from the IMDB dataset [Maas et al., 2011]. ... we employ the D4RL [Fu et al., 2020] benchmark to assess the performance of PFM in reinforcement learning tasks. |
| Dataset Splits | No | The paper does not explicitly provide details about validation dataset splits or a specific validation methodology. |
| Hardware Specification | Yes | All experiments were conducted on a single Nvidia Titan RTX GPU and a single i9-10850K CPU core for each run. |
| Software Dependencies | No | The paper mentions software components such as DCGAN, LeNet, a T5-based autoencoder, a GPT-2 SFT model, PPO, and behavior cloning, and implicitly relies on common frameworks such as PyTorch, but it does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We utilize a pre-trained DCGAN [Radford et al., 2015] generator as πref and collect sample pairs from πref(·|x) conditioned on the digit labels x ∈ {0, ..., 9}. To construct preference datasets, we assign preferences to sample pairs according to the softmax probabilities of the labels from a LeNet [LeCun et al., 1998]. ... we adopt the pre-trained sentiment classifier as the preference annotator. ... For our PFM framework to be applied to variable-length inputs, we employ a T5-based autoencoder to work with fixed-sized embeddings. ... we search KL regularization coefficient β from 0.01 to 100 and adopt the best one. ... The preference datasets consist of 1,000 pairs of preferred and rejected segments and their context for each offline dataset, with the segment length 10. |
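
The pair-construction step quoted in the Experiment Setup row can be illustrated with a short sketch. The names below (`make_preference_pairs`, `ref_generator`, `classifier`, and their call signatures) are hypothetical and chosen only for illustration; the sketch assumes a pre-trained conditional generator standing in for πref and a classifier whose softmax probabilities act as the preference annotator, as the quoted setup describes. Whether the winner is picked deterministically or sampled in proportion to the scores is not specified in the quote, so the argmax choice here is an assumption.

```python
import torch

def make_preference_pairs(ref_generator, classifier, digit, n_pairs, z_dim=100):
    """Hypothetical sketch: sample pairs from a reference generator and
    annotate preference via the classifier's softmax probability of `digit`."""
    preferred, rejected = [], []
    for _ in range(n_pairs):
        # Draw two candidate samples from the reference policy pi_ref(. | x = digit).
        z = torch.randn(2, z_dim)
        y = ref_generator(z, labels=torch.tensor([digit, digit]))  # assumed signature
        # Score each candidate by the softmax probability of the conditioning digit.
        probs = torch.softmax(classifier(y), dim=-1)[:, digit]
        win = int(probs.argmax())
        preferred.append(y[win])
        rejected.append(y[1 - win])
    return torch.stack(preferred), torch.stack(rejected)
```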
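Algorithm 1 referenced in the Pseudocode row is not reproduced in the quoted text, so the following is only a minimal sketch of what one training step of a "preference flow" could look like under standard conditional flow matching: interpolate between a rejected sample y− and a preferred sample y+ at a random time t, and regress a learned vector field onto the displacement y+ − y−. The `velocity_net` module and its call signature are assumptions, not the authors' implementation.

```python
import torch

def pfm_training_step(velocity_net, y_neg, y_pos, optimizer):
    """Hedged sketch of one flow-matching step on a batch of preference pairs.

    y_neg, y_pos: (B, D) tensors of rejected / preferred embeddings.
    velocity_net: network mapping (y_t, t) -> predicted velocity (assumed signature).
    """
    batch = y_pos.size(0)
    t = torch.rand(batch, 1)                 # random time in [0, 1]
    y_t = (1.0 - t) * y_neg + t * y_pos      # straight-line interpolant between the pair
    target = y_pos - y_neg                   # constant velocity along that path
    pred = velocity_net(y_t, t)
    loss = torch.mean((pred - target) ** 2)  # flow-matching regression loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```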