Long-Tailed Learning Requires Feature Learning

Authors: Thomas Laurent, James von Brecht, Xavier Bresson

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 6, we investigate empirically a few questions that we couldn't resolve analytically. In particular, our error bounds are restricted to the case in which a nearest neighbor classification rule is applied on top of the features; we provide empirical evidence in this last section that replacing the nearest neighbor classifier by a linear classifier leads to very minimal improvement. (A hedged sketch of this nearest-neighbor-versus-linear comparison is given below the table.)
Researcher Affiliation | Academia | 1 Loyola Marymount University, tlaurent@lmu.edu; 2 National University of Singapore, xaviercs@nus.edu.sg
Pseudocode | No | The paper describes the neural network architecture textually and provides a diagram (Figure 2), but it does not include any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Codes are available at https://github.com/xbresson/Long_Tailed_Learning_Requires_Feature_Learning.
Open Datasets | No | The paper uses a custom-designed data model (described in Section 2) to generate synthetic data for its experiments. It does not use or provide access information for a pre-existing public dataset.
Dataset Splits | No | The paper specifies the generation of "A training set containing R · nspl sentences" and "A test set containing 10,000 unfamiliar sentences" but does not mention a separate validation set or provide details about how the data is split for validation.
Hardware Specification | No | The paper states "Constructing each of these Gram matrices takes a few days on CPU" but does not specify any particular CPU model, GPU model, memory, or other detailed hardware specifications used for running the experiments.
Software Dependencies | No | The paper mentions using the "SVC function of Scikit-learn (Pedregosa et al., 2011), which itself relies on the LIBSVM library (Chang & Lin, 2011)". While it names the software, it does not provide the specific version numbers of Scikit-learn or LIBSVM used in the experiments. (A hedged SVC usage sketch is given below the table.)
Experiment Setup | Yes | MLP 1: d_in = 150, d_hidden = 500, d_out = 10; MLP 2: d_in = 90, d_hidden = 2000, d_out = 1000... The learning rate is set to 0.01 (constant learning rate), and the batch size to 100... We chose C = 1... the parameter γ involved in the definition of the kernel was set to γ = 0.25 when n ∈ {1, 2} and to γ = 0.1 when n ∈ {3, 4, 5}. (A hedged training-setup sketch is given below the table.)
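
As a reading aid for the nearest-neighbor-versus-linear comparison quoted in the Research Type row, here is a minimal sketch of that kind of evaluation in Scikit-learn. It is not the paper's code: the feature arrays are random placeholders standing in for features extracted from the paper's data model, and LogisticRegression is only one possible choice of linear classifier.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Placeholder feature arrays; shapes, dimensions, and labels are illustrative only.
rng = np.random.default_rng(0)
feats_train, y_train = rng.normal(size=(1000, 90)), rng.integers(0, 10, size=1000)
feats_test, y_test = rng.normal(size=(200, 90)), rng.integers(0, 10, size=200)

# Nearest neighbor rule, as used in the paper's error bounds ...
nn = KNeighborsClassifier(n_neighbors=1).fit(feats_train, y_train)
# ... versus a linear classifier trained on the same features (stand-in choice).
lin = LogisticRegression(max_iter=1000).fit(feats_train, y_train)

print("1-NN accuracy:   ", nn.score(feats_test, y_test))
print("linear accuracy: ", lin.score(feats_test, y_test))
```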
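
The Software Dependencies row names the SVC function of Scikit-learn (backed by LIBSVM), but the report cannot pin down how it was invoked. The following is a minimal sketch, assuming the γ-parametrized kernel is evaluated offline into Gram matrices (consistent with the Hardware Specification row) and passed to SVC as a precomputed kernel with C = 1; the placeholder data and the RBF stand-in kernel are not taken from the paper.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Placeholder feature vectors; in the paper the Gram matrices come from its own
# kernel and reportedly take a few days to build on CPU, which is not reproduced here.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(500, 90)), rng.integers(0, 10, size=500)
X_test, y_test = rng.normal(size=(100, 90)), rng.integers(0, 10, size=100)

# gamma = 0.25 (or 0.1) and C = 1 are the values quoted in the Experiment Setup row;
# an RBF Gram matrix is used purely as a stand-in for the paper's kernel.
gamma = 0.25
gram_train = rbf_kernel(X_train, X_train, gamma=gamma)
gram_test = rbf_kernel(X_test, X_train, gamma=gamma)

clf = SVC(C=1.0, kernel="precomputed")
clf.fit(gram_train, y_train)
print("test accuracy:", clf.score(gram_test, y_test))
```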
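
The Experiment Setup row gives the MLP layer widths, a constant learning rate of 0.01, and a batch size of 100, but no framework-level details. The sketch below (PyTorch) shows one way those numbers could be wired together; the single hidden layer, ReLU activation, plain SGD optimizer, cross-entropy loss, and epoch count are assumptions, not statements from the paper.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def make_mlp(d_in, d_hidden, d_out):
    # Single hidden layer with ReLU: an assumption, the quoted setup only gives widths.
    return nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_out))

mlp1 = make_mlp(150, 500, 10)     # MLP 1: d_in = 150, d_hidden = 500, d_out = 10
mlp2 = make_mlp(90, 2000, 1000)   # MLP 2: d_in = 90, d_hidden = 2000, d_out = 1000

# Placeholder data standing in for the synthetic sentences of the paper's data model.
x = torch.randn(1000, 150)
y = torch.randint(0, 10, (1000,))
loader = DataLoader(TensorDataset(x, y), batch_size=100, shuffle=True)  # batch size 100

# Constant learning rate 0.01 as reported; plain SGD and cross-entropy are assumptions.
opt = torch.optim.SGD(mlp1.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(5):            # number of epochs is a placeholder
    for xb, yb in loader:
        opt.zero_grad()
        loss_fn(mlp1(xb), yb).backward()
        opt.step()
```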