Understanding Robust Generalization in Learning Regular Languages
Authors: Soham Dan, Osbert Bastani, Dan Roth
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments, we implement the compositional strategy via an auxiliary task where the goal is to predict the intermediate states visited by the DFA when parsing a string. Our empirical results support our hypothesis, showing that auxiliary tasks can enable robust generalization (a hedged model sketch follows the table). |
| Researcher Affiliation | Academia | 1Department of Computer and Information Science, University of Pennsylvania. |
| Pseudocode | No | The paper describes algorithms and methods but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing open-source code or provide links to a code repository. |
| Open Datasets | No | The paper describes a process for synthetically generating data using edge Markov chains based on DFAs (e.g., 'We construct an edge Markov chain eMC_id to generate training examples...'), but it does not use or provide concrete access information for a publicly available, pre-existing dataset (a hedged generation sketch follows the table). |
| Dataset Splits | Yes | We use N^+_train = 1600 positive and N^-_train = 1600 negative train examples, N^+_dev = N^-_dev = 200 dev examples, and use N^+_test = 2000 positive and N^-_test = 2000 negative examples for each of the i.d. and o.o.d. test sets. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'RNN with LSTM cells' and 'stochastic gradient descent (SGD)' but does not specify any software platforms (e.g., PyTorch, TensorFlow) or library versions with specific version numbers. |
| Experiment Setup | Yes | We use an RNN with LSTM cells, with an embedding dimension of 50 and a hidden layer with dimension 50, optimized using stochastic gradient descent (SGD) with a learning rate of 0.01 (a hedged configuration sketch follows the table). |
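
The Research Type row describes the paper's compositional strategy as an auxiliary task that predicts the intermediate DFA states visited while parsing a string. Below is a minimal model sketch of that idea, assuming PyTorch (the paper names no framework; see the Software Dependencies row): a shared LSTM encoder feeds two heads, one tagging the DFA state after each character and one classifying accept/reject from the final hidden state. The class name, head names, and overall structure are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class DFAAuxiliaryTagger(nn.Module):
    """Illustrative two-headed LSTM: a per-character head for the auxiliary
    DFA-state prediction task and a final-state head for accept/reject."""

    def __init__(self, vocab_size, num_states, embed_dim=50, hidden_dim=50):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.state_head = nn.Linear(hidden_dim, num_states)  # auxiliary: DFA state per step
        self.label_head = nn.Linear(hidden_dim, 2)            # main: accept / reject

    def forward(self, tokens):
        # tokens: (batch, seq_len) integer-encoded characters
        h, _ = self.lstm(self.embed(tokens))
        state_logits = self.state_head(h)          # (batch, seq_len, num_states)
        label_logits = self.label_head(h[:, -1])   # classify from the last hidden state
        return state_logits, label_logits
```

Training would typically combine a sequence-tagging loss on the state logits with a classification loss on the label logits; the exact weighting is not specified in the quoted text.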
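
The Open Datasets row notes that examples are generated synthetically from edge Markov chains built on DFAs rather than drawn from a released corpus. The following is a hedged sketch of such a sampler under simple assumptions (a dict-based DFA and an explicit per-state stop choice); the function name, data layout, and stopping rule are illustrative, not taken from the paper.

```python
import random

def sample_example(start, transition, accepting, edge_probs, max_len=30):
    """Draw one string by walking the DFA according to edge probabilities.

    transition: dict mapping (state, symbol) -> next state
    edge_probs: dict mapping state -> {symbol: probability}; a None key is
                used here as an illustrative stop decision (an assumption).
    Returns the string, the visited states (auxiliary-task targets), and
    whether the final state is accepting (positive vs. negative example).
    """
    state, chars, visited = start, [], [start]
    for _ in range(max_len):
        symbols = list(edge_probs[state])
        weights = [edge_probs[state][s] for s in symbols]
        symbol = random.choices(symbols, weights=weights)[0]
        if symbol is None:
            break
        chars.append(symbol)
        state = transition[(state, symbol)]
        visited.append(state)
    return "".join(chars), visited, state in accepting
```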
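
The Experiment Setup row fixes the reported hyperparameters: embedding dimension 50, hidden dimension 50, and SGD with learning rate 0.01. Below is a minimal training-step sketch under those settings, again assuming PyTorch and reusing the DFAAuxiliaryTagger sketch above; the vocabulary size, number of DFA states, and equal loss weighting are placeholder assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Hedged configuration sketch: embedding dim 50, hidden dim 50, SGD at lr 0.01,
# reusing the DFAAuxiliaryTagger class defined in the first sketch.
# vocab_size and num_states below are placeholder assumptions.
model = DFAAuxiliaryTagger(vocab_size=12, num_states=8)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
label_loss = nn.CrossEntropyLoss()
state_loss = nn.CrossEntropyLoss()

def training_step(tokens, states, labels):
    """One SGD step; equal weighting of the main and auxiliary losses is an
    assumption, not stated in the quoted text."""
    optimizer.zero_grad()
    state_logits, label_logits = model(tokens)
    loss = label_loss(label_logits, labels) + \
           state_loss(state_logits.flatten(0, 1), states.flatten())
    loss.backward()
    optimizer.step()
    return loss.item()
```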