A Variant of Concurrent Constraint Programming on GPU

Authors: Pierre Talbot, Frédéric G. Pinel, Pascal Bouvry (pp. 3830-3839)

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that TURBO obtains slightly better performance when compared to GECODE (Schulte, Tack, and Lagerkvist 2020), a parallel CPU-based solver. TURBO is only a proof of concept and several key components of modern constraint solvers are missing, in particular global constraints and constraint learning. Nevertheless, constraint decomposition of global constraints is an efficient approach that can be immediately used in TURBO (Narodytska 2011; Bessiere et al. 2009; Schutt et al. 2009). Still, it is clear that many improvements will be required to make TURBO a competitive solver. Evaluation: We evaluate TURBO on the RCPSP problem introduced above.
Researcher Affiliation | Academia | Pierre Talbot, Frédéric Pinel, Pascal Bouvry, Interdisciplinary Centre for Security, Reliability and Trust (SnT), University of Luxembourg, Esch-sur-Alzette L-4243, Luxembourg. pierre.talbot@uni.lu, frederic.pinel@uni.lu, pascal.bouvry@uni.lu
Pseudocode | Yes | We define the propagation loop as a function void propagation(int tid, int stride, VStore& s, Vector<Propagator*>& props) where tid is a unique thread identifier, stride is the number of threads in the block, s is a store of interval variables and props is the array of propagators (which are PCCP processes):

  __shared__ bool has_changed[3] = {true};
  for(int i = 1; !s.is_top() && has_changed[(i-1)%3]; ++i) {
    for(int t = tid; t < props.size(); t += stride) {
      if(props[t]->propagate(s))
        has_changed[i%3] = true;
    }
    has_changed[(i+1)%3] = false;
    __syncthreads();
  }
Open Source Code | Yes | Replicate: the version of TURBO used in this paper is available at https://github.com/ptal/turbo/tree/aaai2022.
Open Datasets | Yes | We experiment on 2 data sets: Patterson (Patterson 1984) (timeout of 5 minutes) which has instances with various numbers of tasks and resources, and j30 from PSPSLIB (Kolisch and Sprecher 1997) (timeout of 30 seconds) with 30 tasks and 4 resources.
Dataset Splits | No | The paper mentions using two datasets for experiments, Patterson and j30, but does not provide specific details on how these datasets were split into training, validation, and test sets. It does not specify percentages, sample counts, or refer to standard splits.
Hardware Specification | Yes | To illustrate our explanations, we take the example of the laptop GPU NVIDIA QUADRO RTX 5000 MAX-Q, which we later use to perform our experiments. It has 48 streaming multiprocessors (SMs), each with 64KB of L1 cache and 64 cores, for a total of 48 * 64 = 3072 cores. We compare TURBO with the well-known constraint solver GECODE 6.2.0 (Schulte, Tack, and Lagerkvist 2020) in parallel mode on a processor i7-10750@2.60GHz with 6 cores and 12 threads.
Software Dependencies | Yes | We compare TURBO with the well-known constraint solver GECODE 6.2.0 (Schulte, Tack, and Lagerkvist 2020) in parallel mode on a processor i7-10750@2.60GHz with 6 cores and 12 threads.
Experiment Setup | Yes | To illustrate our explanations, we take the example of the laptop GPU NVIDIA QUADRO RTX 5000 MAX-Q... When programming in CUDA, we group parallel processes in blocks that are executed on a single SM, and have their own intra-block synchronization primitives... we have identified that multiplying the number of SMs by 4 (to obtain 192 blocks) and the number of cores per SM by 4 (to obtain 256 threads per block) yields the best efficiency on this particular GPU model. The constraint model and search strategy are the same for both solvers.