Glass-Box Program Synthesis: A Machine Learning Approach
Authors: Konstantina Christakopoulou, Adam Kalai
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments. We show that in practice, it is possible to synthesize solutions for a number of problems of interest, such as the GCD, that would be prohibitively slow with a naive approach. Also, we show that our Glass-Box Program Synthesis system (GlassPS) can improve itself even in a domain-agnostic framework, where a union of grammars from various domains is considered, although better results can be achieved with domain-specific grammars. Setup. We use as an evaluation metric the fraction of test problems successfully solved. The sets of practice/train and test problems do not change throughout the experiment. |
| Researcher Affiliation | Collaboration | Konstantina Christakopoulou, University of Minnesota, Twin Cities (christa@cs.umn.edu); Adam Tauman Kalai, Microsoft Research, New England (Adam.Kalai@microsoft.com) |
| Pseudocode | Yes | Algorithmic Procedure. To summarize our framework, a formal description of the procedure is given in Algorithm 1. Our system operates over iterations j = 1, ..., T. At round j, GlassPS calls the SOLVE module to attempt to solve the test problems T1, ..., Tn, which are synthesized from P. In SOLVE, the system generates its own practice problems P1, ..., Pm from the CFG P, using uniform probabilities. To solve these generated practice problems, the system calls the TRAIN module. The goal of TRAIN is to (a) construct a training dataset of (problem, solution) pairs, so that (b) the parameters Θ of the model can be learned. To achieve (a), the system uses the parameters of the previous round's logistic-regression model to guide how the program trees of the solution programs are built (SEARCH). Specifically, for every node of the under-construction solution program tree, features are extracted via the FEATURIZE module (discussed in the previous section), which creates the bag-of-words features φ for the problem and context. Given these features, the current learner's model predicts the probability of every rule of the solution CFG S (inside the LR-predict module). These inferred probabilities are used as per-rule weights in a weighted sampling of which rule comes next in the candidate program tree; this is how the learning guides the search over programs. Via this procedure, a set of candidate solution programs is constructed for every problem. Each candidate program Si is scored by the corresponding scoring program P(Si). The programs that successfully solve the respective glass-box problems, i.e., those with the maximum score (shortest length breaking ties), are used to construct a ((problem, context), rule) training dataset, calling the FEATURIZE routine again for the φ representation of (problem, context). This training dataset is then used to learn a new logistic-regression model, which is used to find solutions for the test and challenge problems and which subsequently guides the search over candidate solution programs in round j + 1 (see the Python sketch of Algorithm 1 after this table). Algorithm 1: Learning algorithm for glass-box synthesis. procedure SOLVE(m, T1, ..., Tn) {learn & solve}: for i = 1 to m do Pi ← GENRANDOMPRACTICE(); θ ← TRAIN(P1, P2, ..., Pm); for i = 1 to n do Si ← SEARCH(Ti, θ); return S1, S2, ..., Sn. procedure TRAIN(P1, ..., Pm) {fit θ}: θ ← θ0; for j = 1 to num-epochs do: for i = 1 to m do Si ← SEARCH(Pi, θ); featurize and train LR on all nodes in {(Pi, Si)} for i = 1, ..., m; update θ; return θ. procedure SEARCH(P, θ) {solve a problem}: for i = 1 to num-candidates do Si ← NODE(P, root, θ); return the Si with greatest score P(Si). procedure NODE(P, c, θ) {node from problem, context}: φ ← FEATURIZE(P, c); rule ri ← weighted-sample(LR-predictθ(φ)); children ← []; for j = 1 to number-of-rule-children(ri) do children.append(NODE(P, (ri, j), θ)); return new node with rule ri and children. |
| Open Source Code | No | No explicit statement or link providing access to the open-source code for the described methodology. |
| Open Datasets | No | No specific link, DOI, or formal citation with authors/year is provided for the datasets used. The paper states: 'We considered 1,000 practice/train problems and 100 test problems.' |
| Dataset Splits | Yes | The sets of practice/train and test problems do not change throughout the experiment. ... We considered 1,000 practice/train problems and 100 test problems. We progressively performed four training rounds, as more rounds did not seem to improve learning. ... For the second setup of the all-domain experiment, we show the results in Figure 4(d), which includes the string domain, making its results higher than those in Figure 4(a-c). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) are mentioned for running the experiments. |
| Software Dependencies | No | The paper mentions 'python eval' and refers to Python objects, but does not provide specific version numbers for Python or any libraries. |
| Experiment Setup | Yes | Programs were limited to at most 20 nodes. For generating solutions, instead of generating programs independently at random (which results in a vast majority of duplicates), we generate the programs that are the most probable according to the learned parameters, using the search algorithm of Menon et al. (2013), which does not produce duplicates. Since problem generation is not a bottleneck, problems are generated more simply by sampling uniformly from the grammar P and then removing the duplicates. ... A timeout of 1 second was used per evaluation (see the timeout sketch after this table). ... Each round of training was run for 45 minutes. |
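
The pseudocode cell above compresses Algorithm 1 into prose; the following is a minimal, runnable Python sketch of its SOLVE/TRAIN/SEARCH/NODE loop. Everything concrete here is an assumption for illustration, not the authors' implementation: a toy four-rule expression grammar stands in for both the problem CFG P and the solution CFG S, the glass-box scorer simply compares a candidate against a target expression on ten inputs, and scikit-learn's `LogisticRegression` stands in for the paper's logistic-regression learner (the paper does not name a library, and its real search enumerates most-probable programs via Menon et al. (2013) rather than sampling).

```python
# Toy sketch of Algorithm 1 (SOLVE / TRAIN / SEARCH / NODE). The grammar,
# featurizer, and scorer are illustrative assumptions, not the paper's code.
import random
from sklearn.linear_model import LogisticRegression

RULES = {"x": 0, "1": 0, "add": 2, "mul": 2}   # toy solution CFG S: rule -> arity
TOKENS = list(RULES)                            # bag-of-words vocabulary
MAX_NODES = 20                                  # size cap used in the paper

def featurize(problem_text, context):
    """phi: token counts of the glass-box problem text, plus the context,
    i.e. the parent rule and child index of the node being expanded."""
    parent, child = context
    return ([problem_text.count(t) for t in TOKENS]
            + [1.0 if parent == r else 0.0 for r in RULES] + [float(child)])

def node(problem_text, context, model, budget):
    """NODE: LR predicts per-rule probabilities for (problem, context);
    we weighted-sample the rule, then recurse to build its children."""
    fits = [r for r in RULES if RULES[r] < budget]       # respect size budget
    if model is None:
        weights = [1.0] * len(fits)                      # uniform pre-training
    else:
        p = dict(zip(model.classes_,
                     model.predict_proba([featurize(problem_text, context)])[0]))
        weights = [p.get(r, 1e-3) for r in fits]         # smooth unseen rules
    rule = random.choices(fits, weights=weights)[0]
    kids, used = [], 1
    for j in range(RULES[rule]):                         # reserve room for siblings
        child, n = node(problem_text, (rule, j), model,
                        budget - used - (RULES[rule] - 1 - j))
        kids.append(child); used += n
    return (rule, kids), used

def search(problem_text, target, model, num_candidates=200):
    """SEARCH: sample candidates; return the best-scoring, shortest on ties."""
    cands = [node(problem_text, ("root", 0), model, MAX_NODES)
             for _ in range(num_candidates)]
    return max(cands, key=lambda c: (score(c[0], target), -c[1]))

def evaluate(tree, x):
    rule, kids = tree
    if rule == "x": return x
    if rule == "1": return 1
    a, b = evaluate(kids[0], x), evaluate(kids[1], x)
    return a + b if rule == "add" else a * b

def score(candidate, target):
    """Glass-box scorer P(S): fraction of inputs where S matches the target."""
    return sum(evaluate(candidate, x) == evaluate(target, x)
               for x in range(10)) / 10.0

def to_text(tree):
    rule, kids = tree
    return rule if not kids else rule + "(" + ",".join(map(to_text, kids)) + ")"

def train(practice, epochs=3):
    """TRAIN: search with the current model, featurize every node of the
    solved programs' trees, and refit the logistic regression."""
    model = None
    for _ in range(epochs):
        X, y = [], []
        for target in practice:
            text = to_text(target)              # glass-box: the solver reads it
            best, _ = search(text, target, model)
            if score(best, target) == 1.0:      # keep only true solutions
                stack = [(best, ("root", 0))]
                while stack:
                    (rule, kids), ctx = stack.pop()
                    X.append(featurize(text, ctx)); y.append(rule)
                    stack += [(k, (rule, j)) for j, k in enumerate(kids)]
        if len(set(y)) > 1:
            model = LogisticRegression(max_iter=1000).fit(X, y)
    return model

# SOLVE: self-generate practice problems uniformly from the grammar, train,
# then attempt a fresh test problem with the learned model.
random.seed(0)
practice = [node("", ("root", 0), None, 8)[0] for _ in range(50)]
model = train(practice)
test = node("", ("root", 0), None, 8)[0]
best, _ = search(to_text(test), test, model)
print("target:", to_text(test), "-> found:", to_text(best),
      "score:", score(best, test))
```

The mechanism this sketch tries to preserve is the feedback loop the paper describes: nodes from solved practice problems become ((problem, context), rule) training rows, and the refit model reweights the sampling of rules in subsequent program trees.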
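The 1-second per-evaluation timeout in the setup cell implies candidates are executed defensively; the paper mentions scoring via Python eval but does not say how the cutoff is enforced. The sketch below uses `signal.alarm`, one Unix-only way to impose such a timeout; it is an assumption, not the authors' mechanism, and it cannot interrupt long-running C-level operations.

```python
# Hedged sketch of a 1-second evaluation timeout around Python eval.
import signal

class EvalTimeout(Exception):
    pass

def eval_with_timeout(src, env, seconds=1):
    """Evaluate a candidate expression, aborting after `seconds` of wall clock."""
    def on_alarm(signum, frame):
        raise EvalTimeout()
    old = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(seconds)                       # schedule SIGALRM
    try:
        return eval(src, {"__builtins__": {}}, env)   # restricted eval
    except EvalTimeout:
        return None                             # treat a timed-out candidate as failed
    finally:
        signal.alarm(0)                         # cancel the alarm
        signal.signal(signal.SIGALRM, old)      # restore the previous handler

print(eval_with_timeout("x + 1", {"x": 41}))    # -> 42
```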