Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

State-by-state Minimax Adaptive Estimation for Nonparametric Hidden {M}arkov Models

Authors: Luc Lehéricy

JMLR 2018 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this paper, we introduce a new estimator for the emission densities of a nonparametric hidden Markov model. It is adaptive and minimax with respect to each state s regularity as opposed to globally minimax estimators, which adapt to the worst regularity among the emission densities. Our method is based on Goldenshluger and Lepski s methodology. It is computationally eﬃcient and only requires a family of preliminary estimators, without any restriction on the type of estimators considered. We present two such estimators that allow to reach minimax rates up to a logarithmic term: a spectral estimator and a least squares estimator. We show how to calibrate it in practice and assess its performance on simulations and on real data.
Researcher Affiliation	Academia	Luc Leh ericy EMAIL Laboratoire de Math ematiques d Orsay Univ. Paris-Sud, CNRS, Universit e Paris-Saclay 91405 Orsay, France
Pseudocode	Yes	Algorithm 1: Spectral estimation of the emission densities of a HMM (short version) Algorithm 2: Least squares estimation of the emission densities of a HMM Algorithm 3: Spectral estimation of the emission densities of a HMM (full version)
Open Source Code	No	The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described. It mentions a third-party R implementation, but not the authors' own code release.
Open Datasets	Yes	In Section 5, we apply our algorithm to two sets of GPS tracks. The ﬁrst data set contains trajectories of artisanal ﬁshers from Madagascar... The second data set contains GPS positions of Peruvian seabird, recorded with 1 second time steps... We consider the seabird data from Bertrand et al. (2015) and we focus on the tracks named cormorant d in this paper.
Dataset Splits	Yes	We use 10-fold cross validation, that is we split the sequence into 10 segments of same size I1, . . . , I10. In order to avoid interferences between samples, we prune the ends of each segment, so that the observations in each segment can be considered independent. In practice, we take a gap of 30 observations between two segments.
Hardware Specification	No	The paper extensively discusses computational complexity and time taken for different parts of the algorithms (e.g., 'a few minutes' or 'a couple of seconds' for certain operations), but it does not specify any particular hardware components like CPU models, GPU models, or memory specifications used for these computations or experiments.
Software Dependencies	No	The paper mentions 'a R implementation of the spectral algorithm' but does not specify the version of R or any other software packages used for the experiments.
Experiment Setup	Yes	We take Y = [0, 1] equipped with the Lebesgue measure. We choose the approximation spaces spanned by a trigonometric basis: PM := Span(ϕ1, . . . , ϕM) with ϕ1(x) = 1 ϕ2m(x) = 2 cos(2πmx) ϕ2m+1(x) = 2 sin(2πmx) for all x [0, 1] and m N . We will consider a hidden Markov model with K = 3 hidden states and the following parameters: Transition matrix ... Emission densities ... We generate n observations and run the spectral algorithm in order to obtain estimators for the models PM with M_min M M_max, m = 20 and r = 2 log(n)+2 log(M) , where M_min = 3 and M_max = 300. Finally, we use the state-by-state selection method to choose the ﬁnal estimator for each emission density. ... We made 300 simulations, 20 per value of n, with n taking values in {5 104, 7 104, 1 105, 1.5 105, 2.2 105, 3.5 105, 5 105, 7 105, 1 106, 1.5 106, 2.2 106, 3.5 106, 5 106, 7 106, 1 107}.