Maximum Likelihood Estimation for Learning Populations of Parameters

Authors: Ramya Korlakai Vinayak, Weihao Kong, Gregory Valiant, Sham Kakade

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5. Numerical Experiments: Recall that the MLE (Equation 2) is a convex optimization problem, $\hat{P}_{mle} \in \arg\max_{Q \in \mathcal{D}} \sum_{s=0}^{t} h^{obs}_s \log E_Q[h_s]$, where $\mathcal{D}$ is the set of all distributions on [0, 1]. We discretize the interval [0, 1] into a uniform grid of width 1/m. Note that as long as the error due to discretization, O(1/m), is smaller than the expected error in earth mover's distance (EMD), we will not be losing much numerically. Unless otherwise specified, we use a grid length of m = 1000. The discretized set can then be written as $\hat{\mathcal{D}}_m := \{q \in \mathbb{R}^{m+1} : q \geq 0, \mathbf{1}^\top q = 1\}$. We then solve the MLE, which is convex on this discrete convex set, using cvx (Grant & Boyd, 2014; 2008) for MATLAB.
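The paper solves this convex program with cvx for MATLAB. As an illustrative alternative (not the authors' code), the discretized MLE over a binomial mixture can be approximated in pure NumPy with the classic multiplicative EM fixed-point update for mixing-distribution MLEs; the function name, grid size, and iteration count below are assumptions of this sketch.

```python
import numpy as np
from scipy.stats import binom

def mle_em(h_obs, t, m=100, iters=2000):
    """Discretized MLE for the population distribution over [0, 1],
    approximated by an EM-style multiplicative fixed-point iteration
    (a standard alternative to a generic convex solver; illustrative only).

    h_obs[s] = observed fraction of individuals with s successes in t trials.
    Returns the grid and a weight vector q over the grid of width 1/m.
    """
    grid = np.linspace(0.0, 1.0, m + 1)
    # A[s, j] = P(s successes | bias grid[j]) under Binomial(t, grid[j])
    A = binom.pmf(np.arange(t + 1)[:, None], t, grid[None, :])
    q = np.full(m + 1, 1.0 / (m + 1))      # uniform initial guess on the grid
    for _ in range(iters):
        mix = A @ q                        # model fingerprint E_Q[h_s]
        q = q * (A.T @ (h_obs / mix))      # multiplicative EM update
    return grid, q
```

Each update multiplies q elementwise by the gradient-like ratio A.T @ (h_obs / mix), which preserves nonnegativity and the sum-to-one constraint, so no explicit projection onto the simplex is needed.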
Researcher Affiliation | Academia | 1 Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle; 2 Department of Computer Science, Stanford University, Stanford. Correspondence to: Ramya Korlakai Vinayak <ramya@cs.washington.edu>.
Pseudocode | No | The paper describes mathematical derivations and theoretical proofs but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | We ran the MLE on two real datasets used in (Tian et al., 2017): (1) a dataset on political leanings of counties in the US, with data on whether a county leaned Democratic or Republican for N = 3116 counties in t = 8 presidential elections from 1976 to 2004; (2) a dataset of flight delays with N = 25,156 flights.
Dataset Splits | No | The paper focuses on estimating a population distribution from observed data rather than training a predictive model on explicit dataset splits. It reports the number of individuals (N) and observations per individual (t), but gives no training, validation, or test splits in the conventional machine learning sense.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using cvx and MATLAB for solving the MLE, but it does not specify version numbers for these software components.
Experiment Setup | Yes | We use a grid length of m = 1000. With population size N = 10^6, we vary t from 2 to 12. For t = 10, we vary the population size N from 10 to 10^8 in multiples of 10. For N = 10^6, we vary the number of tosses t from 2 to 10 in steps of two, and then t = [50, 100, 500, 1000], to illustrate the performance of the MLE as t varies widely.
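As a hypothetical sketch of this setup (the function name and the choice of a uniform bias distribution are illustrative, not from the paper), one can simulate N coins tossed t times each and form the observed fingerprint h_obs that the MLE consumes:

```python
import numpy as np

def simulate_fingerprint(N, t, rng):
    """Draw one bias per individual, toss each coin t times, and return
    h_obs, where h_obs[s] = fraction of individuals with s heads (length t + 1)."""
    p = rng.uniform(0.0, 1.0, N)    # illustrative choice: biases uniform on [0, 1]
    heads = rng.binomial(t, p)      # t independent tosses per individual
    return np.bincount(heads, minlength=t + 1) / N

rng = np.random.default_rng(0)
h_obs = simulate_fingerprint(10**6, 10, rng)   # N = 10^6, t = 10, as in the paper's sweep
```

With biases drawn uniformly on [0, 1], the marginal distribution of the head count is uniform over {0, ..., t} (a Beta-binomial with a uniform prior), so each entry of h_obs concentrates near 1/(t + 1) for large N.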