Learning to Learn Gradient Aggregation by Gradient Descent
Authors: Jinlong Ji, Xuhui Chen, Qianlong Wang, Lixing Yu, Pan Li
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the performance of the proposed RNN aggregator and RNN aggregator with additional loss information (ARNN aggregator). |
| Researcher Affiliation | Academia | Jinlong Ji1, Xuhui Chen1,2, Qianlong Wang1, Lixing Yu1 and Pan Li1; 1Case Western Reserve University, 2Kent State University; {jxj405, qxw204, lxy257, pxl288}@case.edu, xchen2@kent.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code (e.g., a specific repository link or an explicit code release statement). |
| Open Datasets | Yes | We conduct experiments on two image classification tasks: handwritten digits classification on MNIST dataset and object recognition on CIFAR-10 dataset. |
| Dataset Splits | No | We pick the best aggregator according to its performance on validation data and report its performance. The paper mentions "validation data" but does not specify the dataset split sizes or methodology for training/validation/test splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | In all experiments, both the RNN aggregators and ARNN aggregators use two-layer LSTMs (or GRUs) with 20 hidden units in each layer and they are trained with BPTT. The optimization process is performed by using ADAM. The paper names these software components but provides no library names or version numbers, so the software environment cannot be reproduced exactly. |
| Experiment Setup | Yes | In all experiments, both the RNN aggregators and ARNN aggregators use two-layer LSTMs (or GRUs) with 20 hidden units in each layer, and they are trained with BPTT. The optimization process is performed using ADAM. In addition, we use early stopping to avoid overfitting while training the aggregators. After some fixed number of learning steps, we freeze the aggregator's parameters and evaluate its performance. We pick the best aggregator according to its performance on validation data and report its performance. In the MNIST task, the base network in each worker is a multilayer perceptron (MLP) with one hidden layer of 20 units using a sigmoid activation function. Meanwhile, in the CIFAR-10 task, each worker holds a model consisting of two convolutional layers with max pooling followed by a fully-connected layer. A hedged code sketch of this setup is given after the table. |
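
The paper releases no code, so the sketch below is only a minimal PyTorch rendering of the setup quoted in the table. Only the hidden-layer width of 20 units, the sigmoid activation in the MNIST MLP, the two-convolutional-layers-with-max-pooling-plus-fully-connected structure for CIFAR-10, and the two-layer, 20-hidden-unit LSTM aggregator come from the quoted text; everything else (class names, channel counts, kernel sizes, the ReLU activations in the CNN, and the coordinatewise input convention for the aggregator) is an assumption made for illustration.

```python
# Hypothetical sketch only: the paper does not release code, and these
# module names, kernel sizes, and channel counts are not from the paper.
import torch
import torch.nn as nn


class MNISTBase(nn.Module):
    """MLP base network: one hidden layer of 20 units with sigmoid activation."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 20),  # hidden width of 20 is stated in the paper
            nn.Sigmoid(),
            nn.Linear(20, 10),
        )

    def forward(self, x):
        return self.net(x)


class CIFARBase(nn.Module):
    """CNN base network: two conv layers with max pooling, then a fully-connected
    layer. Channel counts, kernel sizes, and ReLUs are assumptions; only the
    overall layer structure is stated in the paper."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, 10)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


class RNNAggregator(nn.Module):
    """Learned gradient aggregator: a two-layer LSTM with 20 hidden units per layer.
    Assumed convention: it is applied coordinatewise, taking the workers' gradients
    for each parameter coordinate and emitting one aggregated update per coordinate.
    For the ARNN variant the input would also carry loss information
    (e.g. input_size = num_workers + 1) -- again an assumption."""
    def __init__(self, num_workers, hidden_size=20):
        super().__init__()
        self.rnn = nn.LSTM(input_size=num_workers, hidden_size=hidden_size, num_layers=2)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, worker_grads, state=None):
        # worker_grads: (1, num_coordinates, num_workers) -- one aggregation round per time step
        h, state = self.rnn(worker_grads, state)
        return self.out(h), state


# Per the quoted setup, the aggregator itself would then be meta-trained with ADAM
# and truncated BPTT, with early stopping on validation performance, e.g.:
# meta_optimizer = torch.optim.Adam(RNNAggregator(num_workers=4).parameters())
```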