Universal Sample Coding
Authors: Szymon Kobus, Tze-Yang Tung, Deniz Gündüz
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To corroborate the claims about universal sample coding made in Theorem 5.1, we compare the KL-divergence and total communication cost between theory and numerical evaluations for different numbers of samples and dimensions. Each experiment is repeated 1000 times, where a probability distribution $P$ is sampled from a $k$-dimensional Dirichlet distribution with concentration parameters $\alpha \in \mathbb{R}^k$, $\alpha_i = 1$. Then, $n$ samples are communicated using universal sample coding (Algorithm 1). We consider ordered random coding (Theis and Yosri, 2022), shown in Appendix C, as the underlying sample communication method (line 7, Algorithm 1). In Figure 3, we observe that the measured KL-divergence $D_{KL}(P \Vert \hat{Q})$ between the true probability $P$ and the estimate $\hat{Q}$ from the estimator in Lemma 5.2 follows the predicted values across a range of communicated samples. In Figure 4, we fix the number of samples to $n = 128$ but vary the dimension $k$, and show that the KL-divergence lies below the bound from Lemma 5.2. In Figure 5, we plot the total communication cost for various numbers of samples, which is contained between the asymptotic upper and lower bounds, $V_k(c) \log(n)$ (Theorem 5.1) and $\frac{k-1}{2} \log(n)$ (Theorem 5.3), respectively. To show the efficacy of universal sample coding in practice, we first apply it to the Federated Probabilistic Mask Training (FedPM) algorithm proposed in Isik et al. (2024). (A minimal simulation sketch of this evaluation is given below the table.) |
| Researcher Affiliation | Collaboration | Szymon Kobus, Dep. of Electrical and Electronic Engineering, Imperial College London, szymon.kobus17@imperial.ac.uk; Tze-Yang Tung, Not Diamond, tze-yang@notdiamond.ai; Deniz Gündüz, Department of Electrical and Electronic Engineering, Imperial College London, d.gunduz@imperial.ac.uk |
| Pseudocode | Yes | Algorithm 1 Universal Sample Coding. Input: $P$, $n \geq 1$. Parameter: $c > 0$. 1: Send sample $X_1 \sim P$. 2: Let $i = 2$, $G(1) = 1$. 3: while $i \leq \log_{1+c}(n) + 1$ do 4: Let $G(i) = (1 + c)^{i-1}$ 5: Let $Q_i = \hat{Q}_i(X^{G(i-1)})$ be the probability estimated with all previous samples (estimator from Lemma 5.2). 6: Let $g(i) = G(i) - G(i-1)$. 7: Jointly send $g(i)$ samples $X_{G(i-1)+1}^{G(i)} \sim P^{g(i)}$ using $Q_i^{g(i)}$ (Eqn. (10)). 8: $i = i + 1$ 9: end while (A runnable sketch of this loop is given below the table.) |
| Open Source Code | No | Code implementing Universal Sample Coding, as well as the large language model experiments, is released. Code for federated learning is based on a yet-unreleased codebase shared by the authors of Isik et al. (2024). We are in ongoing discussions with the authors on the best way to release this code, but it will not be publicly available at the time of publication. |
| Open Datasets | Yes | For the experiments, we follow the same setup as Isik et al. (2024) using their CONV-6 architecture on classification of CIFAR-10 images, where the data is distributed among 10 clients. |
| Dataset Splits | No | The paper discusses training and testing, but it does not explicitly specify the use of a validation set or provide details on how data was split for validation purposes (e.g., percentages, sample counts). |
| Hardware Specification | Yes | The FL training was conducted on two Nvidia RTX 3090 GPUs, each with 25 GB of memory, with each experiment taking approximately 12 hours. The experiments were performed on 4 Nvidia RTX A6000 GPUs with 48 GB of memory and PyTorch version 2.1, totalling 10 hours of wall-clock-time. |
| Software Dependencies | No | The paper mentions "PyTorch version 2.1" but does not provide specific version numbers for other software dependencies or libraries used in the experiments to ensure full reproducibility. |
| Experiment Setup | Yes | For the experiments, we follow the same setup as Isik et al. (2024) using their CONV-6 architecture on classification of CIFAR-10 images, where the data is distributed among 10 clients. A challenging scenario considered in that work is of congested clients, where only a fraction of clients contribute in each learning round. The results of our experiments are summarized in Table 2. The FL training was conducted on two Nvidia RTX 3090 GPUs, each with 25 GB of memory, with each experiment taking approximately 12 hours. The training, including preliminary experiments, took 30 days of GPU-time. ... To incorporate the prior $\theta$ into the proposed coding scheme, we use a Bayesian estimator. Although the mask is binary, we describe a general case with $k$ possible outcomes. Let $\omega \in \mathcal{P}_k$ be the prior probability, and $\omega$ be a random variable distributed according to the Dirichlet distribution with parameters $\alpha \in \mathbb{R}^k$, where for $j \in \{0, \ldots, k-1\}$: $\alpha_j = \mu \omega_j + \sum_{l=1}^{G} \mathbb{1}\{l\text{-th sample} = j\}$, where $\mathbb{1}\{\cdot\}$ is the indicator function, $G$ is the number of samples communicated so far, and $\mu \in \mathbb{R}_+$ is a hyperparameter controlling the reliance on the prior. (A sketch of this estimator is given below the table.) |
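
The following is a minimal sketch of the Dirichlet/KL-divergence evaluation described in the Research Type row. It is not the paper's released code: the estimator of Lemma 5.2 is stood in for by a simple add-1/2 smoothing rule, and samples are drawn directly from $P$ rather than communicated with ordered random coding, so only the estimation error is simulated, not the bit cost.

```python
# Sketch of the repeated Dirichlet / KL-divergence evaluation (1000 trials).
# Assumptions (not from the paper): add-1/2 smoothing stands in for the
# estimator of Lemma 5.2, and samples are drawn directly from P rather than
# being communicated via ordered random coding.
import numpy as np

rng = np.random.default_rng(0)

def kl_divergence(p, q):
    """D_KL(P || Q) in nats, assuming q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def estimate_q(counts):
    """Stand-in estimator: add-1/2 smoothing of the empirical counts."""
    return (counts + 0.5) / (counts.sum() + 0.5 * len(counts))

def run_trial(k=8, n=128):
    p = rng.dirichlet(np.ones(k))          # P ~ Dirichlet(1, ..., 1)
    samples = rng.choice(k, size=n, p=p)   # n samples "communicated"
    counts = np.bincount(samples, minlength=k)
    return kl_divergence(p, estimate_q(counts))

kls = [run_trial() for _ in range(1000)]   # repeat 1000 times as in the paper
print(f"mean D_KL(P || Q_hat) over 1000 trials: {np.mean(kls):.4f} nats")
```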
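
Below is a hedged sketch of the control flow of Algorithm 1 from the Pseudocode row. The joint transmission in line 7 (ordered random coding, Eqn. (10)) is abstracted behind a hypothetical `send_samples` placeholder, the Lemma 5.2 estimator is again replaced by add-1/2 smoothing, and rounding $G(i)$ to an integer is an assumption not present in the quoted pseudocode.

```python
# Sketch of Algorithm 1 (Universal Sample Coding): geometric batch schedule
# G(i) = (1+c)^(i-1), with each batch encoded against the running estimate.
# send_samples() is a hypothetical placeholder for the channel-simulation step.
import numpy as np

rng = np.random.default_rng(0)

def send_samples(p, q, m):
    """Placeholder for jointly sending m samples of P using proposal Q^m."""
    return rng.choice(len(p), size=m, p=p)

def universal_sample_coding(p, n, c=1.0):
    k = len(p)
    counts = np.zeros(k)
    counts[rng.choice(k, p=p)] += 1          # line 1: send X_1 ~ P
    i, G_prev = 2, 1                         # line 2: i = 2, G(1) = 1
    while (1 + c) ** (i - 1) <= n:           # line 3: i <= log_{1+c}(n) + 1
        G_i = min(round((1 + c) ** (i - 1)), n)  # line 4 (integer rounding assumed)
        q_i = (counts + 0.5) / (counts.sum() + 0.5 * k)  # line 5 (stand-in estimator)
        g_i = G_i - G_prev                   # line 6
        if g_i > 0:
            batch = send_samples(p, q_i, g_i)    # line 7
            counts += np.bincount(batch, minlength=k)
        G_prev, i = G_i, i + 1               # line 8
    return counts

p = rng.dirichlet(np.ones(8))
print(universal_sample_coding(p, n=128, c=1.0))   # counts sum to n
```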
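
Finally, a short sketch of the Bayesian estimator quoted in the Experiment Setup row: Dirichlet parameters $\alpha_j = \mu\omega_j + {}$ (count of outcome $j$ among the $G$ samples sent so far). Using the Dirichlet mean $\alpha_j / \sum_j \alpha_j$ as the resulting estimate, as well as the example prior $\omega$ and value of $\mu$ below, are illustrative assumptions rather than details quoted from the paper.

```python
# Sketch of the prior-informed Dirichlet estimator from the Experiment Setup row.
# alpha_j = mu * omega_j + (count of symbol j so far); the Dirichlet mean is
# used here as the estimate (an assumption). omega and mu are illustrative.
import numpy as np

def bayesian_estimate(omega, sample_counts, mu=4.0):
    """Estimate under Dirichlet(alpha) with alpha_j = mu*omega_j + counts_j."""
    alpha = mu * np.asarray(omega, dtype=float) + np.asarray(sample_counts, dtype=float)
    return alpha / alpha.sum()

# Binary-mask example (k = 2), as in the FedPM experiments: prior probability
# 0.7 for bit value 1, with 3 zeros and 5 ones observed among G = 8 samples.
print(bayesian_estimate(omega=[0.3, 0.7], sample_counts=[3, 5], mu=4.0))
```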