reproducibilityindex.ai

Latent Programmer: Discrete Latent Codes for Program Synthesis

Authors: Joey Hong, David Dohan, Rishabh Singh, Charles Sutton, Manzil Zaheer

ICML 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate the LP on two domains, demonstrating that it yields an improvement in accuracy, especially on longer programs for which search is most difficult.
Researcher Affiliation	Industry	Google Research, Mountain View, CA, USA.
Pseudocode	Yes	Algorithm 1: Program synthesis using two-level search
Open Source Code	No	The paper does not provide a direct statement about open-sourcing the code for the described methodology, nor does it provide a link to a code repository.
Open Datasets	Yes	The dataset consists of 111K Python examples, each pairing a docstring with its corresponding code snippet, collected from GitHub (Wan et al., 2018).
Dataset Splits	No	The paper mentions training on 'roughly 25M tasks' and evaluating on '1K held-out ones' but does not explicitly specify a validation set or its size/percentage for reproducibility.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts) used for running the experiments.
Software Dependencies	No	The paper mentions using Python for code generation and refers to Transformers, but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup	Yes	All models have an embedding size of 128 and hidden size of 512, and the attention layers consist of 3 stacked layers with 4 heads each. For the LP model, we used a latent compression factor ℓ = 2 and vocabulary size K = 40.
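The Pseudocode row cites Algorithm 1, a two-level search. Because the Open Source Code row finds no released code, the Python sketch below is only a rough illustration of the general shape of such a search under our own assumptions: a beam of latent sequences is decoded first, then a smaller program beam is decoded conditioned on each latent sequence, and candidates are checked against the task's input/output examples. The scorer interfaces (`latent_scorer`, `program_scorer`), the fixed sequence lengths, the even split of the program beam across latents, and ranking by summed log-probability are all hypothetical choices, not the paper's implementation.

```python
import math
from functools import partial
from typing import Callable, Dict, List, Optional, Tuple

Seq = Tuple[str, ...]
# Hypothetical interfaces: a scorer maps a prefix to next-token log-probabilities.
LatentScorer = Callable[[Seq], Dict[str, float]]
ProgramScorer = Callable[[Seq, Seq], Dict[str, float]]  # (latents, prefix)


def beam_search(scorer: Callable[[Seq], Dict[str, float]],
                length: int, beam_size: int) -> List[Tuple[Seq, float]]:
    """Fixed-length beam search; returns (sequence, log-prob) pairs, best first."""
    beams: List[Tuple[Seq, float]] = [((), 0.0)]
    for _ in range(length):
        candidates = [(seq + (tok,), score + lp)
                      for seq, score in beams
                      for tok, lp in scorer(seq).items()]
        # Keep only the beam_size highest-scoring extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams


def two_level_search(latent_scorer: LatentScorer, program_scorer: ProgramScorer,
                     latent_len: int, program_len: int,
                     latent_beam: int, program_beam: int,
                     is_consistent: Callable[[Seq], bool]) -> Optional[Seq]:
    """Search latents first, then programs per latent; return the best consistent program."""
    latent_beams = beam_search(latent_scorer, latent_len, latent_beam)
    # Assumption: split the total program budget evenly across latent sequences.
    per_latent = max(1, program_beam // latent_beam)
    candidates: List[Tuple[Seq, float]] = []
    for latents, latent_lp in latent_beams:
        # Condition the (hypothetical) program decoder on this latent sequence.
        decoder = partial(program_scorer, latents)
        for program, program_lp in beam_search(decoder, program_len, per_latent):
            candidates.append((program, latent_lp + program_lp))
    # Assumption: rank all candidates by combined log-probability.
    candidates.sort(key=lambda c: c[1], reverse=True)
    for program, _ in candidates:
        if is_consistent(program):  # e.g. run the program on the I/O examples
            return program
    return None


# Toy scorers with made-up probabilities, for illustration only.
def toy_latent_scorer(prefix: Seq) -> Dict[str, float]:
    return {"z0": math.log(0.4), "z1": math.log(0.6)}


def toy_program_scorer(latents: Seq, prefix: Seq) -> Dict[str, float]:
    # Latent "z1" favours token "x"; latent "z0" favours token "y".
    fav, other = ("x", "y") if latents[0] == "z1" else ("y", "x")
    return {fav: math.log(0.7), other: math.log(0.3)}


if __name__ == "__main__":
    found = two_level_search(toy_latent_scorer, toy_program_scorer,
                             latent_len=1, program_len=2,
                             latent_beam=2, program_beam=4,
                             is_consistent=lambda p: p == ("y", "y"))
    print(found)  # ('y', 'y')
```

In the toy run, the highest-scoring program overall is ("x", "x") under latent "z1", but only ("y", "y"), reached through the less likely latent "z0", satisfies the consistency check. This is the kind of case where keeping several latent sequences in the beam, rather than committing to the single best one, can pay off.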
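The Experiment Setup row reports the model dimensions and the two LP-specific settings, ℓ and K. A small Python sketch can record them in one place for a reproduction attempt. The class and field names are hypothetical, and the `latent_length` helper assumes that a compression factor ℓ shortens the latent sequence by a factor of 2^ℓ relative to the program (so ℓ = 2 gives a 4x shorter sequence); that reading should be checked against the paper before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LPExperimentConfig:
    """Hyperparameters as listed in the Experiment Setup row (field names are hypothetical)."""
    embedding_size: int = 128
    hidden_size: int = 512
    num_attention_layers: int = 3   # stacked attention layers
    num_attention_heads: int = 4    # heads per layer
    latent_compression: int = 2     # ℓ, latent compression factor
    latent_vocab_size: int = 40     # K, number of discrete latent codes

    def __post_init__(self) -> None:
        # Reject settings that cannot describe a valid discrete bottleneck.
        if self.latent_compression < 0 or self.latent_vocab_size < 1:
            raise ValueError("need latent_compression >= 0 and latent_vocab_size >= 1")

    def latent_length(self, program_length: int) -> int:
        """Assumed latent sequence length: ceil(program_length / 2**ℓ)."""
        factor = 2 ** self.latent_compression
        return -(-program_length // factor)  # integer ceiling division


cfg = LPExperimentConfig()
print(cfg.latent_length(10))  # 3
```

Freezing the dataclass keeps the reported configuration from being changed accidentally during a run. Settings the row does not report, such as learning rate or beam sizes, are deliberately left out rather than guessed.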