Federated Learning with Only Positive Labels

Authors: Felix Yu, Ankit Singh Rawat, Aditya Menon, Sanjiv Kumar

ICML 2020

Reproducibility Variable Result LLM Response
Research Type Experimental We empirically evaluate the proposed FedAwS method on benchmark image classification and extreme multi-class classification datasets. In all experiments, both the class embeddings wc and instance embeddings gθ(x) are ℓ2-normalized, as we found this slightly improves model quality. We compare the following methods in our experiments. Baseline-1: Training with only positive squared hinge loss. As expected, we observe very low precision values because the model quickly collapses to a trivial solution. Baseline-2: Training with only positive squared hinge loss with the class embeddings fixed. This is a simple way of preventing the class embeddings from collapsing into a single point. FedAwS: Our method with stochastic negative mining (cf. Section 4.2). Softmax: An oracle method of regular training with the softmax cross-entropy loss function that has access to both positive and negative labels. ... 6.1. Experiments on CIFAR: We first present results on the CIFAR-10 and CIFAR-100 datasets. We trained ResNets (He et al., 2016a;b) with different numbers of layers as the underlying model. Specifically, we train ResNet-8 and ResNet-32 for CIFAR-10, and train ResNet-32 and ResNet-56 for CIFAR-100 with the larger number of classes. From Table 1, we see that on both CIFAR-10 and CIFAR-100, FedAwS almost matches or comes very close to the performance of the oracle method which has access to all labels.
Researcher Affiliation Industry Felix X. Yu, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar (Google Research, New York). Correspondence to: Felix X. Yu <felixyu@google.com>, Ankit Singh Rawat <ankitsrawat@google.com>.
Pseudocode Yes Algorithm 1 Federated averaging with spreadout (FedAwS)
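Algorithm 1's server-side step applies a spreadout regularizer, with stochastic negative mining, to push class embeddings apart after averaging the client updates. The NumPy sketch below illustrates one such update under stated assumptions: the distance is Euclidean between unit-norm class embeddings, the per-class loss is sum over the k mined classes of max(0, ν − ‖wc − wc'‖)², and the margin `nu` and step size `lr` are illustrative values, not taken from the paper.

```python
import numpy as np

def spreadout_step(W, k=10, nu=0.9, lr=0.05):
    """One server-side spreadout update on the class-embedding matrix.

    W  : (C, d) matrix of L2-normalized class embeddings.
    k  : number of most-confusable ("top confusing") classes mined per class.
    nu : separation margin (assumed value).

    For each class c, only its k nearest classes are penalized
    (stochastic negative mining); the gradient of
    max(0, nu - ||w_c - w_c'||)^2 w.r.t. w_c is derived by hand.
    """
    C, d = W.shape
    grad = np.zeros_like(W)
    diffs = W[:, None, :] - W[None, :, :]      # (C, C, d) pairwise differences
    dists = np.linalg.norm(diffs, axis=-1)     # (C, C) pairwise distances
    np.fill_diagonal(dists, np.inf)            # exclude self-pairs
    for c in range(C):
        nearest = np.argsort(dists[c])[:k]     # mine the k closest classes
        for c2 in nearest:
            viol = nu - dists[c, c2]
            if viol > 0:                       # pair is inside the margin
                grad[c] += -2.0 * viol * diffs[c, c2] / dists[c, c2]
    W = W - lr * grad                          # gradient-descent step
    # Re-normalize, matching the paper's L2-normalized class embeddings.
    return W / np.linalg.norm(W, axis=1, keepdims=True)
```

A single step moves near-collapsed class embeddings apart while leaving well-separated ones (outside the margin) untouched.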
Open Source Code No The paper does not contain any explicit statements about the release of source code or links to a code repository.
Open Datasets Yes We empirically evaluate the proposed FedAwS method on benchmark image classification and extreme multi-class classification datasets. We first present results on the CIFAR-10 and CIFAR-100 datasets. ... We test the proposed approach on standard extreme multilabel classification datasets (Varma, 2018). These datasets have a large number of classes, and therefore are good representatives of the applications of federated learning with only positive labels. Similar to Reddi et al. (2019), because these datasets are multi-label, we uniformly sample positive labels to obtain datasets corresponding to multi-class classification problems. The datasets and their statistics are summarized in Table 2.
Dataset Splits No The paper mentions 'Train Points' and 'Test Points' in Table 2 for the datasets used, but it does not explicitly specify separate validation dataset splits with percentages or sample counts for reproducibility.
Hardware Specification No The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies No The paper mentions optimizers like 'SGD' and 'Adagrad' and model architectures like 'Res Nets', but it does not specify any software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup Yes For FedAwS, we use the squared hinge loss with cosine distance to define R̂pos(Si) at the clients (cf. Algorithm 1): ℓpos(f(x), y) = (max{0, 0.9 − gθ(x)⊤wy})². ... We use a simple embedding-based classification model wherein an instance x ∈ ℝd, a high-dimensional sparse vector, is first embedded into ℝ512 using a linear embedding lookup followed by averaging. The vector is then passed through a three-layer neural network with layer sizes 1024, 1024 and 512, respectively. The first two layers in the network apply a ReLU activation function. The output of the network is then normalized to obtain instance embeddings with unit ℓ2-norm. Each class is represented as a 512-dimensional normalized vector. ... SGD with a large learning rate is used to optimize the embedding layers, and Adagrad is used to update other model parameters. In each round, we randomly select 4K clients associated with 4K labels. ... There are two meta-parameters in the proposed method: the learning rate multiplier of the spreadout loss λ (cf. Algorithm 1), and the number of top confusing labels considered in each round, k (cf. (8)). To make a fair comparison with other methods which do not have these meta-parameters, in all of our other experiments in Table 3, we simply use k = 10 and λ = 10.
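The client-side loss quoted above is a squared hinge on cosine similarity: since both embeddings are ℓ2-normalized, the inner product gθ(x)⊤wy is the cosine, and the loss is (max{0, 0.9 − gθ(x)⊤wy})². A minimal NumPy sketch, with function and variable names of my own choosing:

```python
import numpy as np

def positive_squared_hinge(g_x, w_y, margin=0.9):
    """Positive squared hinge loss with cosine distance.

    g_x : L2-normalized instance embedding g_theta(x), shape (d,)
    w_y : L2-normalized embedding of the positive class y, shape (d,)

    With unit-norm inputs the dot product equals the cosine similarity,
    so the loss vanishes once similarity exceeds the 0.9 margin.
    """
    cos_sim = float(np.dot(g_x, w_y))
    return max(0.0, margin - cos_sim) ** 2
```

For example, an instance embedding identical to its class embedding incurs zero loss, while an orthogonal pair incurs (0.9)² = 0.81.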