Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control
Authors: Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness of the approach is demonstrated on two applications: 1) generating novel musical melodies, and 2) computational molecular generation. For both problems, we show that the proposed method improves the desired properties and structure of the generated sequences, while maintaining information learned from data. |
| Researcher Affiliation | Collaboration | 1Google Brain, Mountain View, USA 2Massachusetts Institute of Technology, Cambridge, USA 3University of Cambridge, Cambridge, UK 4Max Planck Institute for Intelligent Systems, Stuttgart, Germany 5Université de Montréal, Montréal, Canada. |
| Pseudocode | No | Not found. |
| Open Source Code | Yes | The code for Sequence Tutor, including a checkpointed version of the trained melody RNN, is available at [redacted for anonymous submission]. |
| Open Datasets | Yes | To train the model, we begin by extracting monophonic melodies from a corpus of 30,000 MIDI songs and encoding them as one-hot sequences of notes. More information about both the note encoding and the reward metrics is available in the supplementary material. (A generic one-hot note encoding is sketched below the table.) |
| Dataset Splits | No | The trained RNN eventually obtained a validation accuracy of 92% and a log perplexity score of .2536. |
| Hardware Specification | No | Not found. |
| Software Dependencies | No | Optimization was performed with Adam (Kingma & Ba, 2014)... To optimize for these metrics... we constructed a reward function that incentivizes validity, log P, SA, and QED using an open-source library called RDkit (http://www.rdkit.org/). (A hedged RDKit reward sketch follows the table.) |
| Experiment Setup | Yes | Optimization was performed with Adam (Kingma & Ba, 2014), a batch size of 128, an initial learning rate of .5, and a stepwise learning rate decay of 0.85 every 1000 steps. Gradients were clipped to ensure the L2 norm was less than 5, and weight regularization was applied with β = 2.5 × 10⁻⁵. The Sequence Tutor model was trained using a similar configuration to the one above, except with a batch size of 32 and a reward discount factor of γ = .5. The Target-Q-network's weights θ⁻ were gradually updated towards those of the Q-network (θ) according to the formula (1 − η)θ⁻ + ηθ, where η = .01 is the Target-Q-network update rate. For this experiment, we also made use of prioritized experience replay (Schaul et al., 2015) to allow the model to more frequently learn from relatively rare valid samples. A value of c = 2.85 led to a higher yield of valid molecules with high metrics, but still encouraged the diversity of generated samples. (These hyperparameters are restated in the hyperparameter sketch below the table.) |
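The open-datasets row describes encoding extracted monophonic melodies as one-hot sequences of notes. The snippet below is a minimal, generic illustration of that encoding; the function name and the vocabulary size are assumptions, since the paper's exact note encoding is only specified in its supplementary material.

```python
import numpy as np

def one_hot_melody(note_ids, vocab_size):
    """Encode a melody (a sequence of integer note indices) as one-hot vectors.

    `vocab_size` is the size of the note vocabulary; this is an assumption here,
    as the paper's precise encoding is described in its supplementary material.
    """
    seq = np.zeros((len(note_ids), vocab_size), dtype=np.float32)
    seq[np.arange(len(note_ids)), note_ids] = 1.0
    return seq
```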
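The software-dependencies row mentions a reward built from validity, logP, SA, and QED computed with RDKit. The sketch below shows how such a reward term could be assembled from a generated SMILES string; the combination weights, the penalty for invalid molecules, and the handling of the SA score (RDKit ships `sascorer.py` under its Contrib directory, omitted here) are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of an RDKit-based reward for generated SMILES strings.
# Only the validity/logP/QED/SA ingredients come from the paper's description;
# the weighting below is illustrative.
from rdkit import Chem
from rdkit.Chem import Crippen, QED

def molecule_reward(smiles: str) -> float:
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return -1.0                  # invalid SMILES: penalize to incentivize validity
    log_p = Crippen.MolLogP(mol)     # octanol-water partition coefficient
    qed = QED.qed(mol)               # drug-likeness score in [0, 1]
    # SA score would come from RDKit's Contrib/SA_Score/sascorer.py, e.g.
    # sa = sascorer.calculateScore(mol); set to 0 here to keep the sketch self-contained.
    sa = 0.0
    return 1.0 + 0.5 * log_p + qed - 0.1 * sa   # assumed weighting of the terms
```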
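The experiment-setup row specifies the optimizer schedule and the soft target-network update (1 − η)θ⁻ + ηθ. The sketch below restates those hyperparameters and the update rule in framework-agnostic code; only the numerical values come from the quoted setup, while the names and helper functions are illustrative.

```python
# Hyperparameters quoted in the experiment setup; the names are illustrative.
BATCH_SIZE_RNN = 128        # pre-training batch size
BATCH_SIZE_TUTOR = 32       # Sequence Tutor fine-tuning batch size
LEARNING_RATE = 0.5         # initial Adam learning rate
LR_DECAY = 0.85             # stepwise decay factor
LR_DECAY_STEPS = 1000       # decay applied every 1000 steps
GRAD_CLIP_NORM = 5.0        # clip gradients to L2 norm <= 5
WEIGHT_REG_BETA = 2.5e-5    # weight regularization coefficient
GAMMA = 0.5                 # reward discount factor
ETA = 0.01                  # Target-Q-network update rate

def decayed_learning_rate(step, lr=LEARNING_RATE,
                          decay=LR_DECAY, decay_steps=LR_DECAY_STEPS):
    """Stepwise schedule: multiply the learning rate by `decay` every `decay_steps`."""
    return lr * decay ** (step // decay_steps)

def soft_update(target_weights, online_weights, eta=ETA):
    """Move target-network weights toward the online Q-network:
    theta_target <- (1 - eta) * theta_target + eta * theta_online."""
    return [(1.0 - eta) * t + eta * o
            for t, o in zip(target_weights, online_weights)]
```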