Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Non-Asymptotic Guarantees for Robust Statistical Learning under Infinite Variance Assumption

Authors: Lihu Xu, Fang Yao, Qiuran Yao, Huiming Zhang

JMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Simulations and real data analysis demonstrate the robustness of log-truncated estimations over standard estimations. ... Section 5 includes simulations and real data analysis, which evaluate the effectiveness of the proposed log-truncated estimation for some regressions discussed in Section 3.
Researcher Affiliation | Academia | Lihu Xu EMAIL, Department of Mathematics, University of Macau, Taipa, Macau, China ... Fang Yao EMAIL, Department of Probability & Statistics; Center for Statistical Science, Peking University, Beijing, China ... Qiuran Yao EMAIL, Department of Mathematics, University of Macau, Taipa, Macau, China ... Huiming Zhang EMAIL, Institute of Artificial Intelligence, Beihang University, Beijing, China ...
Pseudocode | Yes | Let us consider a regularized optimization with a given penalty function Ω(θ): θ̂_n(α, ρ) := arg min_{θ∈Θ} { R̂_{ψλ,l,α}(θ) + Ω(θ) } (27) ... In practice, this optimization problem is solved by stochastic gradient descent (SGD) as follows: θ_{t+1} = θ_t − r_t ∂/∂θ { ψλ[α l(Y_{i_t}, X_{i_t}, θ_t)] + Ω(θ_t) }, t = 0, 1, 2, ...
Open Source Code | No | The paper does not provide an explicit statement about the release of source code for the methodology described, nor does it include a link to a code repository. It mentions using third-party libraries like PyTorch, but not its own implementation code.
Open Datasets | Yes | Empirical studies on the Boston housing and MNIST datasets show that the proposed robust DNN regression models perform well. ... We use the Boston housing dataset provided by the Python library Scikit-Learn to learn the log-truncated standard and deep LAD models. ... We use the handwritten digits database MNIST to learn a 6-layer elastic-net-penalized DNN LAD model...
Dataset Splits | Yes | Boston housing dataset: In our experiment, n1 = 339 samples are randomly selected for training and validation, and the remaining n2 = 167 samples form the testing set. ... We use 4/5 of the training samples to train the log-truncated standard LAD model (DNN model), then select the optimal parameters on the remaining 1/5 of the training set. ... MNIST database: We randomly split the 70000 images into three groups: a validation set (10000 images), a training set (50000 images), and a testing set (10000 images).
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, memory) used to run the experiments. It mentions using the SGD and Adam optimizers, but not the underlying hardware.
Software Dependencies | No | The paper mentions the 'Adam optimization algorithm in PyTorch' but does not specify version numbers for PyTorch or other software dependencies, which are necessary for reproducibility.
Experiment Setup | Yes | For ℓ2-regularization, we employ the five-fold cross-validation (CV) method to find the optimal parameter pair (α, ρ). ... For the elastic-net-regularized DNN model, we select the optimal parameters (α, β, γ). ... When {ξ_i}_{i=1}^n are Pareto noises, we choose β = 1.5 for τ ∈ {1.6, 1.8} and β = 2.0 for τ ∈ {2.01, 4.01, 6.01}. ... We use the Adam optimization algorithm in PyTorch with a batch size of n/4 in each case. ... The batch size is 64. ... We consider ten values of β ∈ (1, 2], namely β = 1.1, 1.2, ..., 2.0.
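The SGD update quoted in the Pseudocode row can be sketched as below. The concrete influence function `psi_prime` (derivative of a Catoni-style log-truncation ψλ(z) = log(1 + λz + (λz)²/2)/λ) and the LAD loss l(y, x, θ) = |y − x·θ| are illustrative assumptions for this sketch, not necessarily the paper's exact definitions:

```python
import numpy as np

def psi_prime(z, lam):
    # Derivative of the illustrative log-truncation
    # psi_lam(z) = log(1 + lam*z + (lam*z)**2 / 2) / lam (assumption).
    u = lam * z
    return (1 + u) / (1 + u + 0.5 * u**2)

def sgd_step(theta, x_i, y_i, alpha, lam, rho, lr):
    """One update theta_{t+1} = theta_t - r_t * d/dtheta { psi_lam[alpha*l] + Omega },
    with LAD loss l = |y - x.theta| and ridge penalty Omega(theta) = rho*||theta||^2."""
    resid = y_i - x_i @ theta
    loss = abs(resid)
    grad_l = -np.sign(resid) * x_i                     # subgradient of |y - x.theta|
    grad = psi_prime(alpha * loss, lam) * alpha * grad_l + 2 * rho * theta
    return theta - lr * grad

# Toy run with heavy-tailed (Cauchy) noise, where the truncation helps.
rng = np.random.default_rng(0)
theta = np.zeros(3)
true_theta = np.array([1.0, -2.0, 0.5])
for t in range(2000):
    x = rng.normal(size=3)
    y = x @ true_theta + 0.1 * rng.standard_cauchy()
    theta = sgd_step(theta, x, y, alpha=1.0, lam=0.5, rho=1e-4,
                     lr=0.05 / (1 + 0.01 * t))        # decaying step size r_t
```

Note how `psi_prime` tends to 0 for large losses, so a single heavy-tailed outlier cannot dominate an update step.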
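The splits quoted in the Dataset Splits row can be reproduced with plain index permutations; a minimal sketch (506 Boston housing samples and 70000 MNIST images, with the seed chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(42)

# Boston housing: 506 samples -> 339 for training+validation, 167 for testing.
idx = rng.permutation(506)
trainval, test = idx[:339], idx[339:]
n_train = int(len(trainval) * 4 / 5)       # 4/5 of 339 -> 271 training samples
train, val = trainval[:n_train], trainval[n_train:]   # remaining 68 for validation

# MNIST: 70000 images -> 50000 training / 10000 validation / 10000 testing.
mnist_idx = rng.permutation(70000)
m_train = mnist_idx[:50000]
m_val = mnist_idx[50000:60000]
m_test = mnist_idx[60000:]
```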
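The five-fold CV search over (α, ρ) described in the Experiment Setup row can be sketched generically as below. The grid values, the toy ridge fit (which ignores α; in the paper α scales the loss inside ψλ), and the mean-absolute-error score are illustrative assumptions:

```python
import numpy as np
from itertools import product

def five_fold_cv(X, y, fit, score, alphas, rhos, n_folds=5, seed=0):
    """Grid-search (alpha, rho) by five-fold cross validation; returns the pair
    with the lowest average validation error."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    best_pair, best_err = None, np.inf
    for alpha, rho in product(alphas, rhos):
        errs = []
        for k in range(n_folds):
            tr = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            model = fit(X[tr], y[tr], alpha, rho)
            errs.append(score(model, X[folds[k]], y[folds[k]]))
        if np.mean(errs) < best_err:
            best_err, best_pair = np.mean(errs), (alpha, rho)
    return best_pair

# Toy usage: ridge closed form as the fit, mean absolute error as the score.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=100)
ridge = lambda X, y, a, r: np.linalg.solve(X.T @ X + r * np.eye(X.shape[1]), X.T @ y)
mae = lambda th, X, y: np.mean(np.abs(y - X @ th))
best = five_fold_cv(X, y, ridge, mae, alphas=[1.0], rhos=[0.01, 0.1, 1.0])
```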