Program Synthesis for Character Level Language Modeling

Authors: Pavol Bielik, Veselin Raychev, Martin Vechev

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally compare our DSL-based probabilistic model with state-of-the-art neural network models on two popular datasets: the Linux Kernel dataset (Karpathy et al., 2015) and the Hutter Prize Wikipedia dataset (Hutter, 2012). Our experiments show that the precision of our model is comparable to that of neural networks while sharing a number of advantages with n-gram models such as fast query time and the capability to quickly add and remove training data samples.
Researcher Affiliation | Academia | Pavol Bielik, Veselin Raychev & Martin Vechev, Department of Computer Science, ETH Zürich, Switzerland, {pavol.bielik,veselin.raychev,martin.vechev}@inf.ethz.ch
Pseudocode | No | The paper provides a syntax definition for the TChar language in Figure 1, but no formal pseudocode blocks or algorithms for the synthesis procedures.
Open Source Code | No | We provide an interactive visualization of the program and its performance on the Linux Kernel dataset online at: www.srl.inf.ethz.ch/charmodel.html
Open Datasets | Yes | For our experiments we use two diverse datasets: a natural language one and a structured text (source code) one. Both were previously used to evaluate character-level language models: the Linux Kernel dataset (Karpathy et al., 2015) and the Hutter Prize Wikipedia dataset (Hutter, 2012).
Dataset Splits | Yes | For both datasets we use the first 80% for training, next 10% for validation and final 10% as a test set. (A split sketch follows the table.)
Hardware Specification | Yes | All of our experiments were performed on a machine with Intel(R) Xeon(R) CPU E5-2690 with 14 cores.
Software Dependencies | No | To train our models we follow the experimental set-up and use the implementation of Karpathy et al. (2015). We initialize all parameters uniformly in range [-0.08, 0.08], use mini-batch stochastic gradient descent with batch size 50 and RMSProp (Dauphin et al., 2015) per-parameter adaptive update with base learning rate 2 × 10^-3 and decay 0.95.
Experiment Setup | Yes | To train our models we follow the experimental set-up and use the implementation of Karpathy et al. (2015). We initialize all parameters uniformly in range [-0.08, 0.08], use mini-batch stochastic gradient descent with batch size 50 and RMSProp (Dauphin et al., 2015) per-parameter adaptive update with base learning rate 2 × 10^-3 and decay 0.95. Further, the network is unrolled 100 time steps and we do not use dropout. Finally, the network is trained for 50 epochs (with early stopping based on a validation set) and the learning rate is decayed after 10 epochs by multiplying it with a factor of 0.95 each additional epoch. (A training-configuration sketch follows the table.)
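
As referenced under Dataset Splits, the following is a minimal Python sketch of the contiguous 80%/10%/10% split; the paper does not specify an implementation, and the file name below is a placeholder, not one used by the authors.

def split_dataset(text: str):
    """Return (train, validation, test) as the first 80%, next 10%, and final 10% of the text."""
    n = len(text)
    train_end = int(n * 0.8)
    valid_end = int(n * 0.9)
    return text[:train_end], text[train_end:valid_end], text[valid_end:]

# Placeholder file name; substitute the Linux Kernel or Wikipedia corpus file.
with open("corpus.txt", encoding="utf-8") as f:
    train, valid, test = split_dataset(f.read())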
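
As referenced under Experiment Setup, the sketch below only illustrates the quoted hyperparameters. The original baselines use the Torch/Lua implementation of Karpathy et al. (2015); this is a PyTorch-style approximation, and the model size and vocabulary are assumed values, not taken from the paper.

import torch
import torch.nn as nn

vocab_size, hidden_size = 100, 512  # assumed values, not from the paper
model = nn.LSTM(vocab_size, hidden_size, num_layers=2, batch_first=True)

# Initialize all parameters uniformly in the range [-0.08, 0.08].
for p in model.parameters():
    nn.init.uniform_(p, -0.08, 0.08)

# RMSProp per-parameter adaptive update: base learning rate 2e-3, decay 0.95.
optimizer = torch.optim.RMSprop(model.parameters(), lr=2e-3, alpha=0.95)

seq_len, batch_size, num_epochs = 100, 50, 50  # unroll 100 steps, batch size 50, 50 epochs

for epoch in range(num_epochs):
    if epoch >= 10:
        # After 10 epochs, decay the learning rate by a factor of 0.95 each additional epoch.
        for group in optimizer.param_groups:
            group["lr"] *= 0.95
    # Truncated backpropagation over mini-batches of 100-step sequences, no dropout,
    # and early stopping on the validation set are omitted from this sketch.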