Decoupling Gradient-Like Learning Rules from Representations
Authors: Philip Thomas, Christoph Dann, Emma Brunskill
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Figure 1 shows the results of applying this algorithm to a fixed data set using various k. Notice that different choices of how to represent normal distributions result in wildly different outcomes. A poor choice can result in a sequence of normal distributions that takes a circuitous path to the maximum likelihood distribution and produces poorly scaled updates. As a result, a poor choice of how to represent normal distributions can result in the likelihood of the model increasing slowly (notice that, using σ₄, the model failed to approach the maximum log-likelihood model). ... Figure 2: Reproduction of Figure 1 using natural gradient descent algorithms and with the legend suppressed. |
| Researcher Affiliation | Academia | ¹University of Massachusetts Amherst, ²Carnegie Mellon University, ³Stanford University. Correspondence to: Philip S. Thomas <pthomas@cs.umass.edu>. |
| Pseudocode | No | The paper defines algorithms and updates using mathematical equations but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | No | The paper states, "We generated a data set containing 100,000 samples from N(3, 9)" for Figure 1, indicating a custom-generated dataset. Figure 2 uses data estimated from samples, but no concrete access information (link, DOI, formal citation to a public dataset) is provided for any publicly available or open dataset. |
| Dataset Splits | No | The paper mentions generating or estimating data for figures, but does not provide specific details on train/validation/test splits (e.g., percentages, sample counts, or references to standard splits). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names with versions) needed to replicate the experiments. |
| Experiment Setup | Yes | Figure 1: "using gradient descent, θᵢ₊₁ ← θᵢ + α∇L(θᵢ) ... for 200,000 iterations and starting from N(2, 4)" and "step size α := .001/n". Figure 2: "Each plot uses a fixed step size for all k, but step sizes vary between plots." and "The Fisher information matrix was estimated from 1,000 samples of x." and "using only 100 samples", "using just 5 samples". Hedged code sketches of both setups follow this table. |
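
The Figure 1 setup quoted above is concrete enough to sketch. Below is a minimal reproduction attempt in Python/NumPy, assuming gradient ascent on the total log-likelihood with the quoted data size, starting point, iteration count, and step size. The two σ-parameterizations shown (σ = θ₂ and σ = exp(θ₂)) are illustrative stand-ins of our own choosing; the paper's exact σ₁...σ₄ forms are not quoted here.

```python
# A minimal sketch of the Figure 1 setup as quoted above: 100,000 samples from
# N(3, 9), gradient ascent on the total log-likelihood for 200,000 iterations,
# starting from N(2, 4), step size alpha = .001/n. The two parameterizations of
# sigma below are assumptions, not the paper's sigma_1 ... sigma_4.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(loc=3.0, scale=3.0, size=n)   # N(3, 9): variance 9, so std 3
Sx, Sxx = x.sum(), np.sum(x ** 2)            # sufficient statistics
alpha = 0.001 / n                            # step size from the quoted setup
iters = 200_000

def run_gd(sigma_of, dsigma_of, theta2_init):
    """Gradient ascent on the total log-likelihood of N(mu, sigma(theta2)^2)."""
    mu, theta2 = 2.0, theta2_init            # start from N(2, 4): mu=2, sigma=2
    for _ in range(iters):
        sigma = sigma_of(theta2)
        sq = Sxx - 2.0 * mu * Sx + n * mu * mu   # sum of (x - mu)^2
        dmu = (Sx - n * mu) / sigma ** 2         # dL/dmu
        dsigma = sq / sigma ** 3 - n / sigma     # dL/dsigma
        mu += alpha * dmu
        theta2 += alpha * dsigma * dsigma_of(theta2)  # chain rule through sigma
    return mu, sigma_of(theta2)

# Parameterization A (assumed): sigma = theta2, starting at sigma = 2.
print(run_gd(lambda t: t, lambda t: 1.0, 2.0))
# Parameterization B (assumed): sigma = exp(theta2), starting at exp(theta2) = 2.
print(run_gd(np.exp, np.exp, np.log(2.0)))
```

Running both calls with identical data and step size shows the representation-dependence the paper describes: the two parameterizations trace different paths toward N(3, 9).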
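Figure 2's natural-gradient variant can be sketched the same way. The update below preconditions the gradient by the inverse Fisher information matrix, estimated from 1,000 model samples as the quoted setup states; the step size and iteration count are assumptions, since the paper notes that "step sizes vary between plots", and the (μ, σ) parameterization is likewise our own choice.

```python
# A sketch (assumed implementation) of the Figure 2 variant: natural gradient
# ascent, with the Fisher information matrix estimated from 1,000 samples.
# Reuses rng, x, and numpy from the previous sketch.
def score(xs, mu, sigma):
    """Per-sample gradient of log N(x; mu, sigma^2) w.r.t. (mu, sigma)."""
    d = xs - mu
    return np.stack([d / sigma ** 2, d ** 2 / sigma ** 3 - 1.0 / sigma], axis=1)

def run_natural_gd(alpha=0.05, iters=2_000):  # alpha and iters are assumptions
    mu, sigma = 2.0, 2.0                      # start from N(2, 4)
    for _ in range(iters):
        # Estimate F(theta) = E[score score^T] from 1,000 model samples,
        # matching the quoted "estimated from 1,000 samples of x".
        xs = rng.normal(mu, sigma, size=1_000)
        s = score(xs, mu, sigma)
        F = s.T @ s / len(xs)
        # Precondition the mean data log-likelihood gradient by F^{-1}.
        g = score(x, mu, sigma).mean(axis=0)
        step = alpha * np.linalg.solve(F, g)
        mu += step[0]
        sigma += step[1]
    return mu, sigma

print(run_natural_gd())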