Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

A replica analysis of under-bagging

Authors: Takashi Takahashi

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental To check the validity of Claim 3, which is the most important result for characterizing the generalization performance, we briefly compare the result of the numerical solution of the self-consistent equations with numerical experiments on finite-size systems. We also compare the values of (q, m, v, B) with (N^{-1} Σ_i E_c[ŵ_i(c, D)]^2, N^{-1} Σ_i E_c[ŵ_i(c, D)], N^{-1} Σ_i V_c[ŵ_i(c, D)], B̂(c, D)) obtained by numerical experiments with finite N, that is, the right-hand side of the equations in Claim 1 but with finite N. ...Figure 1 shows the comparison between the distribution in Claim 3 and the empirical distribution obtained by experiments with a single realization of the training data D of finite size N = 2^{13}. ...Figure 2 shows the comparison of (q, m, v, B), obtained as the solution of the self-consistent equations, and (N^{-1} Σ_i E_c[ŵ_i(c, D)]^2, N^{-1} Σ_i E_c[ŵ_i(c, D)], N^{-1} Σ_i V_c[ŵ_i(c, D)], B̂(c, D)). ...Figure 9 shows the results of the experiment. The behavior of the F-measure and the weighting coefficients is similar to that observed for Gaussian mixture data in Section 4.1.
Researcher Affiliation Academia Takashi Takahashi, Institute for Physics of Intelligence, The University of Tokyo, EMAIL
Pseudocode No The paper describes methods and derivations using mathematical equations and prose but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code No The paper mentions that 'In experiments, the scikit-learn package is used (Pedregosa et al., 2011).' and 'Nelder-Mead method in Optim.jl package (Mogensen & Riseth, 2018).' These refer to third-party tools, not the authors' own source code for the methodology described in the paper. There is no explicit statement about releasing their own code or a link to a repository.
Open Datasets Yes For this, we performed binary classification with the logistic regression using the classes T-shirt/top and Shirt. Specifically, we chose the T-shirt/top class as a positive class and extracted M+ = 50 data points. Then, the size of the Shirt class, which was specified as negative, was varied from M− = 51 to 3000 to check the behavior of the F-measure and the weighting coefficients. The weighting coefficients were optimized using validation data with 400 images, and the F-measure was computed using test data with 1600 images.
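The quoted setup (a small positive class of M+ = 50, a much larger negative class, class weights tuned on validation data, and F-measure scoring) can be sketched as follows. This is a minimal illustration, not the authors' code: synthetic Gaussian features stand in for the T-shirt/top and Shirt images, and the feature dimension and candidate weight grid are assumptions.

```python
# Hedged sketch of the quoted experiment: imbalanced binary logistic
# regression with validation-tuned class weights, scored by F-measure.
# Synthetic Gaussian blobs replace the image data (an assumption).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
d = 20                      # placeholder feature dimension
M_pos, M_neg = 50, 3000     # minority (positive) and majority (negative) sizes

def sample(n, mean):
    """Draw n points from an isotropic Gaussian centered at `mean`."""
    return rng.normal(mean, 1.0, size=(n, d))

# Imbalanced training set, plus balanced validation (400) and test (1600) sets,
# matching the split sizes quoted above.
X = np.vstack([sample(M_pos, +0.3), sample(M_neg, -0.3)])
y = np.concatenate([np.ones(M_pos), np.zeros(M_neg)])
X_val = np.vstack([sample(200, +0.3), sample(200, -0.3)])
y_val = np.concatenate([np.ones(200), np.zeros(200)])
X_test = np.vstack([sample(800, +0.3), sample(800, -0.3)])
y_test = np.concatenate([np.ones(800), np.zeros(800)])

# Tune the positive-class weight on validation F-measure; the candidate grid
# (naive 1.0, inverse-frequency, and two intermediate values) is illustrative.
best_cw = max(
    ({1: w, 0: 1.0} for w in [1.0, 10.0, M_neg / M_pos, 100.0]),
    key=lambda cw: f1_score(
        y_val,
        LogisticRegression(class_weight=cw, max_iter=1000).fit(X, y).predict(X_val),
    ),
)
clf = LogisticRegression(class_weight=best_cw, max_iter=1000).fit(X, y)
test_f1 = f1_score(y_test, clf.predict(X_test))
```

The inverse-frequency weight M_neg / M_pos plays the role of the "naive coefficients" the quotes refer to; the validation step mirrors how the paper selects the optimal coefficients.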
Dataset Splits Yes Figure 9 shows the results of the experiment. The behavior of the F-measure and the weighting coefficients is similar to that observed for Gaussian mixture data in Section 4.1. The performance of SW with the naive coefficients drops rapidly with the number of excess majority samples. However, the performance of SW with the optimal coefficients is similar to that of UB. Moreover, the optimal weighting coefficients for the majority class are several orders of magnitude smaller than the naive coefficients, although the statistical fluctuation is large.
Hardware Specification No The paper mentions using the scikit-learn package for experiments but does not specify any hardware details such as GPU/CPU models, memory, or cloud computing platforms.
Software Dependencies No In experiments, the scikit-learn package is used (Pedregosa et al., 2011). ... using the Nelder-Mead method in Optim.jl package (Mogensen & Riseth, 2018). These packages are mentioned by name, but no specific version numbers are given for replication.
Experiment Setup Yes For simplicity, we fix the regularization parameter as λ = 10^{-4}, and the variance of the noise as σ^2 = 0.75^2. In all cases, they are in good agreement, demonstrating the validity of Claim 3 and Assumption 1. ... For simplicity, we fix the size of the overall training data as (M+ + M−)/N = 1/2. ... To take the average over c, 128 independent realizations are used. ... The regularization parameter is set to λ = 10^{-3}.
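Under-bagging (UB), the method the quoted experiments compare against, can be sketched under similar assumptions: each base learner is fit on all minority samples plus an equal-size random subsample of the majority class, and predictions are averaged over the resamplings c (the quote above uses 128 realizations). The synthetic data and the mapping from the paper's λ to scikit-learn's C = 1/(n·λ) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of under-bagging: balance each resample by subsampling the
# majority class down to the minority size, fit a regularized logistic
# regression per resample, and average the predicted probabilities.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
d, M_pos, M_neg = 20, 50, 1000
X = np.vstack([rng.normal(+0.3, 1.0, (M_pos, d)),
               rng.normal(-0.3, 1.0, (M_neg, d))])
y = np.concatenate([np.ones(M_pos), np.zeros(M_neg)])

def under_bagging_predict(X_train, y_train, X_new, n_resamples=128, lam=1e-3):
    """Average predicted positive-class probability over balanced resamples."""
    pos = np.flatnonzero(y_train == 1)
    neg = np.flatnonzero(y_train == 0)
    probs = np.zeros(len(X_new))
    for _ in range(n_resamples):  # average over realizations of the resampling c
        idx = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])
        clf = LogisticRegression(C=1.0 / (len(idx) * lam), max_iter=1000)
        clf.fit(X_train[idx], y_train[idx])
        probs += clf.predict_proba(X_new)[:, 1]
    return probs / n_resamples

# Score a few known-positive points; the ensemble should rate them above 1/2.
p = under_bagging_predict(X, y, X[:5])
```

Averaging over resamplings is what the replica analysis characterizes in the large-N limit; the quoted value of 128 realizations is reused here as the ensemble size.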