Program Synthesis for Character Level Language Modeling
Authors: Pavol Bielik, Veselin Raychev, Martin Vechev
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally compare our DSL-based probabilistic model with state-of-the-art neural network models on two popular datasets: the Linux Kernel dataset (Karpathy et al., 2015) and the Hutter Prize Wikipedia dataset (Hutter, 2012). Our experiments show that the precision of our model is comparable to that of neural networks while sharing a number of advantages with n-gram models such as fast query time and the capability to quickly add and remove training data samples. |
| Researcher Affiliation | Academia | Pavol Bielik, Veselin Raychev & Martin Vechev, Department of Computer Science, ETH Zürich, Switzerland {pavol.bielik,veselin.raychev,martin.vechev}@inf.ethz.ch |
| Pseudocode | No | The paper provides a syntax definition for the TChar language in Figure 1, but no formal pseudocode blocks or algorithms for the synthesis procedures. |
| Open Source Code | No | We provide an interactive visualization of the program and its performance on the Linux Kernel dataset online at: www.srl.inf.ethz.ch/charmodel.html |
| Open Datasets | Yes | For our experiments we use two diverse datasets: a natural language one and a structured text (source code) one. Both were previously used to evaluate character-level language models: the Linux Kernel dataset (Karpathy et al., 2015) and the Hutter Prize Wikipedia dataset (Hutter, 2012). |
| Dataset Splits | Yes | For both datasets we use the first 80% for training, next 10% for validation and final 10% as a test set. |
| Hardware Specification | Yes | All of our experiments were performed on a machine with Intel(R) Xeon(R) CPU E5-2690 with 14 cores. |
| Software Dependencies | No | To train our models we follow the experimental set-up and use the implementation of Karpathy et al. (2015). We initialize all parameters uniformly in range [-0.08, 0.08], use mini-batch stochastic gradient descent with batch size 50 and RMSProp (Dauphin et al., 2015) per-parameter adaptive update with base learning rate 2 × 10⁻³ and decay 0.95. |
| Experiment Setup | Yes | To train our models we follow the experimental set-up and use the implementation of Karpathy et al. (2015). We initialize all parameters uniformly in range [-0.08, 0.08], use mini-batch stochastic gradient descent with batch size 50 and RMSProp (Dauphin et al., 2015) per-parameter adaptive update with base learning rate 2 × 10⁻³ and decay 0.95. Further, the network is unrolled 100 time steps and we do not use dropout. Finally, the network is trained for 50 epochs (with early stopping based on a validation set) and the learning rate is decayed after 10 epochs by multiplying it with a factor of 0.95 each additional epoch. |
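The dataset-splits row above reports a contiguous 80%/10%/10% split of each corpus. The snippet below is a minimal sketch of such a split; it is not taken from the paper, and the corpus file name is a placeholder.

```python
# Minimal sketch (not from the paper) of the contiguous 80/10/10 character-level
# split reported above; the corpus file name is a placeholder.
def split_corpus(text: str):
    n = len(text)
    train = text[: int(0.8 * n)]               # first 80% for training
    valid = text[int(0.8 * n): int(0.9 * n)]   # next 10% for validation
    test = text[int(0.9 * n):]                 # final 10% as the test set
    return train, valid, test

with open("linux_kernel.txt", encoding="utf-8") as f:  # placeholder path
    train, valid, test = split_corpus(f.read())
```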
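The experiment-setup row describes the training schedule for the neural baselines, which follows the Torch implementation of Karpathy et al. (2015). The sketch below restates that schedule in PyTorch purely for illustration: the `CharLSTM` model, the hidden size, the vocabulary size, and the `make_batches` iterator are assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

VOCAB, HIDDEN, SEQ_LEN, BATCH, EPOCHS = 101, 512, 100, 50, 50  # VOCAB/HIDDEN are illustrative

class CharLSTM(nn.Module):
    """Small character-level LSTM used only to illustrate the schedule."""
    def __init__(self, vocab: int, hidden: int):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)  # no dropout, as reported
        self.out = nn.Linear(hidden, vocab)

    def forward(self, x):
        h, _ = self.lstm(self.embed(x))
        return self.out(h)

def make_batches(batch, seq_len, n_batches=2):
    """Placeholder iterator yielding random (input, next-character) index pairs;
    a real loader would slice contiguous windows from the training text."""
    for _ in range(n_batches):
        x = torch.randint(0, VOCAB, (batch, seq_len))
        yield x, x.roll(-1, dims=1)

model = CharLSTM(VOCAB, HIDDEN)
for p in model.parameters():                 # uniform initialization in [-0.08, 0.08]
    nn.init.uniform_(p, -0.08, 0.08)

# RMSProp with base learning rate 2e-3 and decay 0.95; mini-batches of 50
# sequences, each unrolled 100 time steps.
optimizer = torch.optim.RMSprop(model.parameters(), lr=2e-3, alpha=0.95)

for epoch in range(EPOCHS):                  # 50 epochs, with early stopping in the paper
    for x, y in make_batches(BATCH, SEQ_LEN):
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(
            model(x).reshape(-1, VOCAB), y.reshape(-1))
        loss.backward()
        optimizer.step()
    if epoch >= 10:                          # after 10 epochs, multiply lr by 0.95 each epoch
        for g in optimizer.param_groups:
            g["lr"] *= 0.95
```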