It’s Not What Machines Can Learn, It’s What We Cannot Teach

Authors: Gal Yehuda, Moshe Gabel, Assaf Schuster

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "We empirically explore a case study, Conjunctive Query Containment, and show how common data generation techniques generate biased datasets that lead practitioners to over-estimate model accuracy. Our results suggest that machine learning approaches that require training on a dense uniform sampling from the target distribution cannot be used to solve computationally hard problems, the reason being the difficulty of generating sufficiently large and unbiased training sets. [...] Figure 3 shows average performance during training, measured on the AUG test set: a balanced test set of 500K instances generated by applying the data augmentation procedure to a new seed set (Section 2.3). The average final accuracy after 15 million samples is 94.2% (SD 0.6%)."
Researcher Affiliation: Academia. "1Department of Computer Science, Technion - Israel Institute of Technology, Israel. 2Department of Computer Science, University of Toronto, Canada. Correspondence to: Gal Yehuda <ygal@cs.technion.ac.il>, Moshe Gabel <mgabel@cs.toronto.ca>."
Pseudocode: No. The paper describes the model architecture and data generation process with descriptive text and a diagram (Figure 2), but it does not provide any structured pseudocode or algorithm blocks.
Open Source Code: No. The paper does not include any explicit statement about releasing source code for the methodology, nor does it provide a link to a code repository.
Open Datasets: No. The paper states: "We address class imbalance by sampling (p, q) from a special distribution µ such that Pr[p ⊆ q] ≈ 0.5, yet both positive and negative instances have the same structure (size, number of variables, etc.). We first generate a small seed set of query pairs by sampling from µ and labeling them using a deterministic theorem prover. We then use data augmentation to generate large training sets, a common approach for this problem (Selsam et al., 2018)." While the paper describes how the dataset was generated and references a theorem prover, it does not provide concrete access information (link, DOI, repository) for the generated dataset itself.
Dataset Splits: No. The paper states, "learning rate was set to 0.00105 by tuning on a separate validation set," but it gives no details on the size or proportion of this validation set relative to the training data, nor does it describe the splitting methodology precisely enough to allow reproduction.
Hardware Specification: Yes. "We used a 3.3GHz Intel i9-7900X machine with two Nvidia GeForce GTX 1080 Ti GPUs."
Software Dependencies: No. The paper mentions the Adam optimizer, LSTMs with ReLU activations, and the Vampire theorem prover, but it does not specify version numbers for any of these software components or libraries.
Experiment Setup: Yes. "We trained 5 models using the Adam optimizer (Kingma & Ba, 2014), with binary cross entropy loss. We set the dimensionality of the LSTM output space to w = 256, and learning rate was set to 0.00105 by tuning on a separate validation set. Adam's hyperparameter β1 was set to 0.9 and β2 was set to 0.999. We train each model for 150 steps: in each step we generate 100K query pairs and train with mini-batch size of 500."
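The quoted hyperparameters (β1 = 0.9, β2 = 0.999, learning rate 0.00105) plug into the standard Adam update rule of Kingma & Ba (2014). The paper's training code is not released, so the following is only a plain-Python sketch of that update rule for a single scalar parameter, not the authors' implementation:

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.00105, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter (Kingma & Ba, 2014)."""
    m = beta1 * m + (1 - beta1) * grad          # biased first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # biased second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(x) = x^2 starting from x = 5.0.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t)  # gradient of x^2 is 2x
```

Because Adam's effective step size is roughly scale-invariant in the gradient, the iterate moves toward zero by about `lr` per step once the moment estimates warm up, which is why such a small learning rate still makes steady progress.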
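The augmentation procedure itself (Section 2.3 of the paper) is not detailed in this excerpt. Purely as an illustration of label-preserving augmentation for query containment, the hypothetical sketch below grows a labeled seed set by consistently renaming variables, which leaves containment unchanged for Boolean conjunctive queries (assuming no distinguished head variables). The atom-list query representation here is an assumption for the sketch, not the paper's encoding:

```python
import random

def rename_variables(query, rng):
    """Apply one consistent random renaming of a query's variables.

    A query is a list of atoms (predicate, (variables, ...)).  Containment
    between Boolean conjunctive queries is invariant under a bijective
    renaming of either query's variables, so this preserves the label."""
    variables = sorted({v for _, args in query for v in args})
    shuffled = variables[:]
    rng.shuffle(shuffled)
    mapping = dict(zip(variables, shuffled))
    return [(pred, tuple(mapping[v] for v in args)) for pred, args in query]

def augment(seed_pairs, n, rng):
    """Grow a labeled seed set of (p, q, label) triples to n instances."""
    out = list(seed_pairs)
    while len(out) < n:
        p, q, label = rng.choice(seed_pairs)
        out.append((rename_variables(p, rng), rename_variables(q, rng), label))
    return out
```

Each augmented instance keeps the seed's size, predicate names, and label, matching the quoted claim that positive and negative instances share the same structure.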