Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Distributed Gibbs: A Linear-Space Sampling-Based DCOP Algorithm
Authors: Duc Thien Nguyen, William Yeoh, Hoong Chuin Lau, Roie Zivan
JAIR 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results show that our algorithms can find solutions that are better than DUCT, run faster than DUCT, and solve some large problems that DUCT failed to solve due to memory limitations. The structure of this article is as follows: In Section 2, we provide the background for DCOPs. In Section 3, we provide a brief overview of the centralized Gibbs algorithm, which we will extend, and the Distributed UCT (DUCT) algorithm. We then describe the two variants of the Distributed Gibbs algorithm in Section 4 and present the experimental results in Section 5 before concluding in Section 6. |
| Researcher Affiliation | Academia | Duc Thien Nguyen EMAIL School of Information Systems Singapore Management University 80 Stamford Rd., Singapore 178902 William Yeoh EMAIL Department of Computer Science and Engineering Washington University in St. Louis 1 Brookings Dr., St. Louis, MO 63130, USA Hoong Chuin Lau EMAIL School of Information Systems Singapore Management University 80 Stamford Rd., Singapore 178902 Roie Zivan EMAIL Department of Industrial Engineering and Management Ben-Gurion University of the Negev P.O.B. 653 Beer-Sheva, 8410501 Israel |
| Pseudocode | Yes | Algorithm 1: Gibbs(z1, …, zn) — 1: for i = 1 to n do 2: z_i^0 ← Initialize(z_i); 3: for t = 1 to T do 4: for i = 1 to n do 5: z_i^t ← Sample(P(z_i | z_1^t, …, z_{i−1}^t, z_{i+1}^{t−1}, …, z_n^{t−1})) |
| Open Source Code | No | We use publicly-available implementations of MGM, DUCT, and DPOP, which are all implemented on the FRODO framework (Léauté, Ottens, & Szymanek, 2009). As DUCT was designed to solve a minimization problem, the code provided by the authors allows it to solve maximization problems by pre-processing the input files to flip the signs of the utilities (e.g., from positive to negative) and adding a positive constant to them so that the smallest utility is 0. We run our experiments on a quad-core Intel Xeon 2.40 GHz E5620 CPU with 2GB of memory per run. |
| Open Datasets | No | We used the random graph coloring problem generator provided in the FRODO framework (Léauté et al., 2009) to generate our problems. We varied the size of the problem by increasing the number of agents |X| from 18 to 29, the graph density p1 from 0.2 to 0.8 and the domain size |Di| of each agent xi from 5 to 20, and we chose the constraint utilities uniformly from the range (0, 10) at random if the neighboring agents have different values and 0 if they have the same value. Figure 2 shows our results, where we varied the number of agents |X| in Figures 2(a), 2(b), and 2(c), the domain size |Di| in Figure 2(d), the density p1 in Figures 2(e) and 2(f), and the DUCT parameters δ and ϵ in Figure 2(g). DPOP ran out of memory for problems with 20 agents and above, and DUCT ran out of memory for problems with domain sizes 19 and 20. Overall, DPOP found better solutions (when it did not run out of memory) than PD-Gibbs, which found better solutions than SD-Gibbs. Both versions of D-Gibbs generally found better solutions than MGM and DUCT. The difference is clearer in Figure 2(d). PD-Gibbs found better solutions than SD-Gibbs because it was able to explore more of the search space due to its parallel sampling operations. Interestingly, Rand was quite competitive: the solutions that it found have qualities that are very similar to those found by MGM. Figure 2(b) shows the simulated runtimes of DUCT and DPOP. We omit results from the other incomplete algorithms as we let them run for as long as DUCT in all our experiments. DPOP is faster than DUCT when the problems are small, and vice versa when the problems are large. The reason is that DUCT requires a reasonably large number of samples to have the necessary confidence to terminate. Thus, when the problems are small, the necessary computation for all the samples is larger than solving the problem exactly with DPOP. As the problems become larger, the difference decreases. 
As the amount of memory used by SD- and PD-Gibbs grows with the height of their pseudo-trees, we compare their memory consumption by varying the number of agents in Figure 2(c) and the density in Figure 2(f). We did not vary the domain size for this experiment because the domain size does not affect the height of the pseudo-tree. The results show that both versions of Gibbs require more memory as the number of agents or the density of the problem increases. The reason is that the height of the pseudo-tree increases with these two parameters. Additionally, in both cases, SD-Gibbs requires at least five times less memory than PD-Gibbs, showing that while PD-Gibbs is able to find better solutions, it comes at a cost of increased memory consumption. Finally, the δ and ϵ values of DUCT correspond to its error tolerance. As those values decrease, its runtime will increase. Therefore, one can interpret Figure 2(g) as a graph that plots the quality of solutions found with increasing runtimes, since we let all algorithms except DPOP run for as long as DUCT. Not surprisingly, DUCT finds better solutions with increasing runtimes. However, interestingly, the quality of solutions found by SD-Gibbs, PD-Gibbs, MGM, and Rand remained relatively unchanged despite being given more runtime, which means that they converged to their solutions very early on. On average, both SD- and PD-Gibbs found better solutions faster (when δ = ϵ = 0.1) than DUCT (when δ = ϵ = 0.01). However, it is important to note that DUCT provides quality guarantees on the solutions it finds while SD- and PD-Gibbs do not. We use the same sensor network coordination problem as Nguyen et al. (2012). The sensors are arranged in a grid and each sensor can move along its 2D plane or stay stationary. Thus, the values of each sensor correspond to discretized directions of movements of that sensor. For example, if a sensor has 5 possible values, then it can move in the four cardinal directions or stay stationary. 
Additionally, sensors are constrained with all of their neighboring sensors. We varied the size of the problem by increasing the number of sensors |X| in the grid, where they are arranged in a square from 3 × 3 (i.e., |X| = 9) to 10 × 10 (i.e., |X| = 100). We also varied the domain size |Di| of each agent xi from 5 to 19, and we chose the constraint utilities uniformly from the range [0, 10] at random. Figure 3 shows our results, where we varied the number of agents |X| in Figures 3(a) and 3(b), the domain size |Di| in Figure 3(c), and the DUCT parameters δ and ϵ in Figure 3(d). DPOP ran out of memory for problems with configurations larger than 5 × 5. However, we omitted it from Figure 3(a) so that the figure is more easily readable. DUCT ran out of memory for problems with configurations larger than 8 × 8. The trends in these graphs are consistent with the trends for the previous graph coloring problems except that the quality of solutions found by SD- and PD-Gibbs are very similar to each other. The reason is that these problems are inherently simpler than the graph coloring problems due to the grid structure of the sensor network. The fact that both SD- and PD-Gibbs found solutions with qualities that are very close to optimal (see Figure 3(c)) asserts the simplicity of this class of problems. In these sensor network problems, each agent is constrained with exactly its four neighboring sensors. Therefore, there is a locality to these interactions and backedges only constrain two agents that are at most 4 hops away on the regular edges of the pseudo-tree (i.e., the longest backedge is between an agent and its great-great-grandparent on the pseudo-tree). In contrast, there is no such structure in graph coloring problems and backedges may constrain two agents that are |X| − 1 hops away on the regular edges of the pseudo-tree (as is the case if the constraint graph is a loop). We use the same radar coordination problem as Fioretto, Yeoh, and Pontelli (2016). 
The problem models a set of radars that collect real-time data on the location and importance of atmospheric phenomena. Each phenomenon is characterized by size and weight (i.e., importance). Radars have limited sensing ranges, which determine their scanning regions. The goal is to find a radar configuration that maximizes the utility associated with the scanned phenomena. The radars are arranged in a grid like in the sensor network problems and they can scan their four cardinal directions. Phenomena are randomly generated across the grid until the underlying constraint graph is connected. Each phenomenon must be scanned by at least p radars, where the value of p is randomly sampled from the range [1, 4]. |
| Dataset Splits | No | The paper mentions averaging results over "50 instances" but does not describe training/test/validation dataset splits. The problem types (graph coloring, sensor networks, radar coordination) are often solved directly rather than through machine learning models requiring explicit data splits. |
| Hardware Specification | Yes | We run our experiments on a quad-core Intel Xeon 2.40 GHz E5620 CPU with 2GB of memory per run. |
| Software Dependencies | No | We use publicly-available implementations of MGM, DUCT, and DPOP, which are all implemented on the FRODO framework (Léauté, Ottens, & Szymanek, 2009). |
| Experiment Setup | Yes | For all problems, we set the DUCT parameters δ = ϵ = 0.05, similar to the settings used in the original article (Ottens et al., 2012) unless mentioned otherwise. We also let MGM and both versions of D-Gibbs run for as long as DUCT did for fair comparisons. Each data point is averaged over 50 instances. In the DCOP literature, the utilities of hard constraints are commonly set to −∞ so that the utility of an infeasible solution is −∞. If we do so, the probability for an agent of either version of D-Gibbs to take on an infeasible value is e^{−∞} = 0 (see Equation 21). While this result may appear to be desirable at first glance, it, unfortunately, creates the following problem: The solution space of the problem will be segmented into islands of feasible solutions that have non-zero probabilities, and these islands are separated by infeasible solutions that have zero probabilities. Consequently, it is not possible for agents to probabilistically jump from one island to another, and the space of possible solutions explored is restricted to the initial island that the agents started on (i.e., the island that contains the initial randomly assigned solution). Therefore, to overcome this limitation, we have to set the utilities of hard constraints to a finite negative value Uhard. Further, to better distinguish the soft constraints from the hard ones, we scale the utilities of soft constraints by multiplying them with a scaling factor C. As these two hyper-parameters will affect the quality of solutions found, we empirically evaluate both versions of D-Gibbs on a small set of problem instances to find the best-performing hyper-parameters before using them on the larger sets of experiments. We tabulate the results for SD-Gibbs in Table 2. Results for PD-Gibbs are omitted as they were very similar. 
From the hyper-parameters that were swept, we chose to set Uhard = −1 as the utility of hard constraints and C = 10 as the scaling factor for the utilities of soft constraints for our experiments below. The reason is that this combination of hyper-parameters resulted in the largest fraction of feasible solutions found as well as the smallest average number of constraint violations. |
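The Algorithm 1 excerpt quoted in the Pseudocode row is standard Gibbs sampling: initialize every variable, then repeatedly resample each one from its conditional given the current values of the others. The sketch below is a minimal illustration of that loop, not the authors' implementation; the toy two-variable "prefer to disagree" utility function, the temperature-free exp weighting, and all names are assumptions made for the example.

```python
import math
import random

def gibbs(domains, utility, T, rng):
    """Generic Gibbs sampler following the Algorithm 1 loop structure.

    domains : list of lists; domains[i] is the domain of variable z_i
    utility : function mapping a full assignment (list) to a real utility
    T       : number of sweeps over all variables
    Returns the assignment after T sweeps.
    """
    # Steps 1-2: initialize each variable randomly
    z = [rng.choice(d) for d in domains]
    # Steps 3-5: resample each variable from its conditional distribution,
    # P(z_i | z_{-i}) taken proportional to exp(utility of the joint assignment)
    for _ in range(T):
        for i, d in enumerate(domains):
            weights = []
            for v in d:
                z[i] = v
                weights.append(math.exp(utility(z)))
            total = sum(weights)
            z[i] = rng.choices(d, weights=[w / total for w in weights])[0]
    return z

# Toy problem in the spirit of graph coloring: two variables that
# gain utility 10 when they take different values, 0 otherwise.
def u(z):
    return 10.0 if z[0] != z[1] else 0.0

rng = random.Random(0)
sample = gibbs([[0, 1], [0, 1]], u, T=50, rng=rng)
```

With this strongly peaked utility, the chain almost always ends in a disagreeing (high-utility) assignment, which is the behavior the paper exploits for DCOP solution sampling.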
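The "islands" problem described in the Experiment Setup row can be seen numerically: under exp-proportional sampling, a hard-constraint utility of −∞ gives an infeasible value probability exactly e^{−∞} = 0, while a finite penalty Uhard keeps it small but positive, so the sampler can still cross between feasible regions. The sketch below assumes the reported Uhard = −1 and C = 10 and uses invented soft utilities purely for illustration.

```python
import math

def sampling_probs(utilities):
    """Probabilities proportional to exp(u) for each candidate value."""
    weights = [math.exp(u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Three candidate values for one agent: two feasible ones with (invented)
# soft utilities 0.3 and 0.7, scaled by C = 10, plus one infeasible value.
C = 10.0
soft = [0.3 * C, 0.7 * C]

# Hard-constraint utility -inf: the infeasible value can never be sampled,
# since math.exp(float("-inf")) is exactly 0.0.
p_inf = sampling_probs(soft + [float("-inf")])

# Finite penalty Uhard = -1: the infeasible value stays reachable with
# a small but non-zero probability, so islands remain connected.
p_fin = sampling_probs(soft + [-1.0])
```

The design trade-off mirrors the paper's discussion: a larger C widens the gap between soft and hard utilities, while a less negative Uhard makes infeasible "bridge" states easier to traverse.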