Program Synthesis for Character Level Language Modeling

Authors: Pavol Bielik, Veselin Raychev, Martin Vechev

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We experimentally compare our DSL-based probabilistic model with state-of-the-art neural network models on two popular datasets: the Linux Kernel dataset (Karpathy et al., 2015) and the Hutter Prize Wikipedia dataset (Hutter, 2012). Our experiments show that the precision of our model is comparable to that of neural networks while sharing a number of advantages with n-gram models such as fast query time and the capability to quickly add and remove training data samples.
Researcher Affiliation | Academia | Pavol Bielik, Veselin Raychev & Martin Vechev, Department of Computer Science, ETH Zürich, Switzerland, {pavol.bielik,veselin.raychev,martin.vechev}@inf.ethz.ch
Pseudocode | No | The paper provides a syntax definition for the TChar language in Figure 1, but no formal pseudocode blocks or algorithms for the synthesis procedures.
Open Source Code | No | We provide an interactive visualization of the program and its performance on the Linux Kernel dataset online at: www.srl.inf.ethz.ch/charmodel.html
Open Datasets | Yes | For our experiments we use two diverse datasets: a natural language one and a structured text (source code) one. Both were previously used to evaluate character-level language models: the Linux Kernel dataset (Karpathy et al., 2015) and the Hutter Prize Wikipedia dataset (Hutter, 2012).
Dataset Splits | Yes | For both datasets we use the first 80% for training, next 10% for validation and final 10% as a test set. (A split sketch follows the table.)
Hardware Specification | Yes | All of our experiments were performed on a machine with Intel(R) Xeon(R) CPU E5-2690 with 14 cores.
Software Dependencies | No | To train our models we follow the experimental set-up and use the implementation of Karpathy et al. (2015). We initialize all parameters uniformly in range [-0.08, 0.08], use mini-batch stochastic gradient descent with batch size 50 and RMSProp (Dauphin et al., 2015) per-parameter adaptive update with base learning rate 2 × 10^-3 and decay 0.95.
Experiment Setup | Yes | To train our models we follow the experimental set-up and use the implementation of Karpathy et al. (2015). We initialize all parameters uniformly in range [-0.08, 0.08], use mini-batch stochastic gradient descent with batch size 50 and RMSProp (Dauphin et al., 2015) per-parameter adaptive update with base learning rate 2 × 10^-3 and decay 0.95. Further, the network is unrolled 100 time steps and we do not use dropout. Finally, the network is trained for 50 epochs (with early stopping based on a validation set) and the learning rate is decayed after 10 epochs by multiplying it with a factor of 0.95 each additional epoch. (A training-configuration sketch follows the table.)
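
As referenced under Dataset Splits, the following is a minimal Python sketch of the contiguous 80%/10%/10% split; the paper does not specify an implementation, and the file name below is a placeholder, not one used by the authors.

def split_dataset(text: str):
    """Return (train, validation, test) as the first 80%, next 10%, and final 10% of the text."""
    n = len(text)
    train_end = int(n * 0.8)
    valid_end = int(n * 0.9)
    return text[:train_end], text[train_end:valid_end], text[valid_end:]

# Placeholder file name; substitute the Linux Kernel or Wikipedia corpus file.
with open("corpus.txt", encoding="utf-8") as f:
    train, valid, test = split_dataset(f.read())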
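
As referenced under Experiment Setup, the sketch below only illustrates the quoted hyperparameters. The original baselines use the Torch/Lua implementation of Karpathy et al. (2015); this is a PyTorch-style approximation, and the model size and vocabulary are assumed values, not taken from the paper.

import torch
import torch.nn as nn

vocab_size, hidden_size = 100, 512  # assumed values, not from the paper
model = nn.LSTM(vocab_size, hidden_size, num_layers=2, batch_first=True)

# Initialize all parameters uniformly in the range [-0.08, 0.08].
for p in model.parameters():
    nn.init.uniform_(p, -0.08, 0.08)

# RMSProp per-parameter adaptive update: base learning rate 2e-3, decay 0.95.
optimizer = torch.optim.RMSprop(model.parameters(), lr=2e-3, alpha=0.95)

seq_len, batch_size, num_epochs = 100, 50, 50  # unroll 100 steps, batch size 50, 50 epochs

for epoch in range(num_epochs):
    if epoch >= 10:
        # After 10 epochs, decay the learning rate by a factor of 0.95 each additional epoch.
        for group in optimizer.param_groups:
            group["lr"] *= 0.95
    # Truncated backpropagation over mini-batches of 100-step sequences, no dropout,
    # and early stopping on the validation set are omitted from this sketch.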