2Direction: Theoretically Faster Distributed Training with Bidirectional Communication Compression

Authors: Alexander Tyurin, Peter Richtarik

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, our theoretical findings are corroborated by experimental evidence.
Researcher Affiliation | Academia | Alexander Tyurin, KAUST, Saudi Arabia, alexandertiurin@gmail.com; Peter Richtárik, KAUST, Saudi Arabia, richtarik@gmail.com
Pseudocode | Yes | Algorithm 1 (2Direction: A Fast Gradient Method Supporting Bidirectional Compression)
Open Source Code | No | The paper does not provide any explicit statement about releasing code, nor does it include a link to a code repository.
Open Datasets | Yes | The experiments were implemented in Python 3.7.9. The distributed environment was emulated on machines with Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz. In each plot, we show the relation between the total number of coordinates transmitted to and from the server and the function values. The parameters of the algorithms are taken as suggested by the corresponding theory, except for the stepsizes, which we fine-tune from the set {2^i | i ∈ [−20, 20]}. For 2Direction, we use parameters from Theorem 5.2 and fine-tune the stepsize L. We solve the logistic regression problem f_i(x_1, ..., x_c) := (1/m) Σ_{j=1}^{m} [ −log(exp(⟨a_{ij}, x_{y_{ij}}⟩)) + log( Σ_{y=1}^{c} exp(⟨a_{ij}, x_y⟩) ) ], where x_1, ..., x_c ∈ R^d, c is the number of unique labels, a_{ij} ∈ R^d is the feature vector of a sample on the i-th worker, y_{ij} is the corresponding label, and m is the number of samples located on the i-th worker. The RandK compressor is used to compress information from the workers to the server, and the TopK compressor is used to compress information from the server to the workers. The performance of the algorithms is compared on the CIFAR10 (Krizhevsky et al., 2009) (# of features = 3072, # of samples = 50,000) and real-sim (# of features = 20,958, # of samples = 72,309) datasets. (Sketches of this objective and of the RandK/TopK compressors are given after the table below.)
Dataset Splits | No | The paper mentions the use of the CIFAR10 and real-sim datasets but does not specify any training, validation, or test split percentages or sample counts.
Hardware Specification | Yes | The distributed environment was emulated on machines with Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz.
Software Dependencies | Yes | The experiments were implemented in Python 3.7.9.
Experiment Setup | Yes | The parameters of the algorithms are taken as suggested by the corresponding theory, except for the stepsizes, which we fine-tune from the set {2^i | i ∈ [−20, 20]}. For 2Direction, we use parameters from Theorem 5.2 and fine-tune the stepsize L.
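
To make the quoted objective concrete, below is a minimal NumPy sketch of the per-worker loss under the standard softmax cross-entropy reading of the (partly garbled) formula above, i.e. f_i(x_1, ..., x_c) = (1/m) Σ_j [ −⟨a_{ij}, x_{y_{ij}}⟩ + log Σ_y exp(⟨a_{ij}, x_y⟩) ]. The function name local_loss and the arrays X, A, y are hypothetical names introduced here for illustration; they do not come from the paper.

```python
import numpy as np

def local_loss(X: np.ndarray, A: np.ndarray, y: np.ndarray) -> float:
    """Per-worker loss f_i under the softmax cross-entropy reading.

    X : (c, d) array of per-class parameter vectors x_1, ..., x_c
    A : (m, d) array of local feature vectors a_{i1}, ..., a_{im}
    y : (m,)   array of integer labels in {0, ..., c-1}
    """
    logits = A @ X.T                             # (m, c): inner products <a_ij, x_y>
    zmax = logits.max(axis=1, keepdims=True)     # stabilise the log-sum-exp
    lse = zmax[:, 0] + np.log(np.exp(logits - zmax).sum(axis=1))
    m = A.shape[0]
    return float(np.mean(lse - logits[np.arange(m), y]))

# Tiny sanity check on random data (not the paper's datasets):
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))      # m = 5 samples, d = 8 features
y = rng.integers(0, 3, size=5)       # c = 3 classes
X = np.zeros((3, 8))
print(local_loss(X, A, y))           # prints log(3) ~= 1.0986 at X = 0
```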
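
Similarly, here is a minimal sketch of the two sparsifiers named in the setup, assuming their standard definitions: RandK keeps k uniformly random coordinates rescaled by d/k (unbiased), while TopK keeps the k largest-magnitude coordinates without rescaling (biased). The value k = 64, the seed, and the function names are illustrative only.

```python
import numpy as np

def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """RandK: keep k uniformly random coordinates, scaled by d/k (unbiased)."""
    d = x.size
    out = np.zeros_like(x)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = (d / k) * x[idx]
    return out

def top_k(x: np.ndarray, k: int) -> np.ndarray:
    """TopK: keep the k largest-magnitude coordinates (biased, no scaling)."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

# Illustrative usage mirroring the quoted setup: RandK on the
# worker-to-server message, TopK on the server-to-worker message.
rng = np.random.default_rng(0)
g = rng.standard_normal(3072)       # a CIFAR10-sized vector (3072 coordinates)
g_up = rand_k(g, k=64, rng=rng)     # workers -> server
g_down = top_k(g, k=64)             # server -> workers
```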