Decoupling Gradient-Like Learning Rules from Representations
Authors: Philip Thomas, Christoph Dann, Emma Brunskill
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1 shows the results of applying this algorithm to a fixed data set using various k. Notice that different choices of how to represent normal distributions result in wildly different outcomes. A poor choice can result in a sequence of normal distributions that takes a circuitous path to the maximum likelihood distribution and produces poorly scaled updates. As a result, a poor choice of how to represent normal distributions can result in the likelihood of the model increasing slowly (notice that, using σ₄, the model failed to approach the maximum log-likelihood model). ... Figure 2: Reproduction of Figure 1 using natural gradient descent algorithms and with the legend suppressed. |
| Researcher Affiliation | Academia | ¹University of Massachusetts Amherst, ²Carnegie Mellon University, ³Stanford University. Correspondence to: Philip S. Thomas <pthomas@cs.umass.edu>. |
| Pseudocode | No | The paper defines algorithms and updates using mathematical equations but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | No | The paper states, "We generated a data set containing 100,000 samples from N(3, 9)" for Figure 1, indicating a custom-generated dataset. Figure 2 uses data estimated from samples, but no concrete access information (link, DOI, formal citation to a public dataset) is provided for any publicly available or open dataset. |
| Dataset Splits | No | The paper mentions generating or estimating data for figures, but does not provide specific details on train/validation/test splits (e.g., percentages, sample counts, or references to standard splits). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names with versions) needed to replicate the experiments. |
| Experiment Setup | Yes | Figure 1: "using gradient descent, θᵢ₊₁ ← θᵢ + α∇L(θᵢ) ... for 200,000 iterations and starting from N(2, 4)" and "step size α := .001/n". Figure 2: "Each plot uses a fixed step size for all k, but step sizes vary between plots." and "The Fisher information matrix was estimated from 1,000 samples of x." and "using only 100 samples", "using just 5 samples". Hedged code sketches of both setups follow this table. |
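
The Figure 1 setup quoted above is concrete enough to sketch. Below is a minimal reproduction attempt in Python/NumPy, assuming gradient ascent on the total log-likelihood with the quoted data size, starting point, iteration count, and step size. The two σ-parameterizations shown (σ = θ₂ and σ = exp(θ₂)) are illustrative stand-ins of our own choosing; the paper's exact σ₁...σ₄ forms are not quoted here.

```python
# A minimal sketch of the Figure 1 setup as quoted above: 100,000 samples from
# N(3, 9), gradient ascent on the total log-likelihood for 200,000 iterations,
# starting from N(2, 4), step size alpha = .001/n. The two parameterizations of
# sigma below are assumptions, not the paper's sigma_1 ... sigma_4.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(loc=3.0, scale=3.0, size=n)   # N(3, 9): variance 9, so std 3
Sx, Sxx = x.sum(), np.sum(x ** 2)            # sufficient statistics
alpha = 0.001 / n                            # step size from the quoted setup
iters = 200_000

def run_gd(sigma_of, dsigma_of, theta2_init):
    """Gradient ascent on the total log-likelihood of N(mu, sigma(theta2)^2)."""
    mu, theta2 = 2.0, theta2_init            # start from N(2, 4): mu=2, sigma=2
    for _ in range(iters):
        sigma = sigma_of(theta2)
        sq = Sxx - 2.0 * mu * Sx + n * mu * mu   # sum of (x - mu)^2
        dmu = (Sx - n * mu) / sigma ** 2         # dL/dmu
        dsigma = sq / sigma ** 3 - n / sigma     # dL/dsigma
        mu += alpha * dmu
        theta2 += alpha * dsigma * dsigma_of(theta2)  # chain rule through sigma
    return mu, sigma_of(theta2)

# Parameterization A (assumed): sigma = theta2, starting at sigma = 2.
print(run_gd(lambda t: t, lambda t: 1.0, 2.0))
# Parameterization B (assumed): sigma = exp(theta2), starting at exp(theta2) = 2.
print(run_gd(np.exp, np.exp, np.log(2.0)))
```

Running both calls with identical data and step size shows the representation-dependence the paper describes: the two parameterizations trace different paths toward N(3, 9).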
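Figure 2's natural-gradient variant can be sketched the same way. The update below preconditions the gradient by the inverse Fisher information matrix, estimated from 1,000 model samples as the quoted setup states; the step size and iteration count are assumptions, since the paper notes that "step sizes vary between plots", and the (μ, σ) parameterization is likewise our own choice.

```python
# A sketch (assumed implementation) of the Figure 2 variant: natural gradient
# ascent, with the Fisher information matrix estimated from 1,000 samples.
# Reuses rng, x, and numpy from the previous sketch.
def score(xs, mu, sigma):
    """Per-sample gradient of log N(x; mu, sigma^2) w.r.t. (mu, sigma)."""
    d = xs - mu
    return np.stack([d / sigma ** 2, d ** 2 / sigma ** 3 - 1.0 / sigma], axis=1)

def run_natural_gd(alpha=0.05, iters=2_000):  # alpha and iters are assumptions
    mu, sigma = 2.0, 2.0                      # start from N(2, 4)
    for _ in range(iters):
        # Estimate F(theta) = E[score score^T] from 1,000 model samples,
        # matching the quoted "estimated from 1,000 samples of x".
        xs = rng.normal(mu, sigma, size=1_000)
        s = score(xs, mu, sigma)
        F = s.T @ s / len(xs)
        # Precondition the mean data log-likelihood gradient by F^{-1}.
        g = score(x, mu, sigma).mean(axis=0)
        step = alpha * np.linalg.solve(F, g)
        mu += step[0]
        sigma += step[1]
    return mu, sigma

print(run_natural_gd())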