DistrictNet: Decision-aware learning for geographical districting
Authors: Cheikh Ahmed, Alexandre Forel, Axel Parmentier, Thibaut Vidal
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our approach outperforms existing methods as it can significantly reduce costs on real-world cities. |
| Researcher Affiliation | Academia | Cheikh Ahmed (Polytechnique Montreal, cheikh-abdallahi.ahmed@polymtl.ca); Alexandre Forel (Polytechnique Montreal, alexandre.forel@polymtl.ca); Axel Parmentier (CERMICS, École des Ponts, axel.parmentier@enpc.fr); Thibaut Vidal (Polytechnique Montreal, thibaut.vidal@polymtl.ca) |
| Pseudocode | Yes | Algorithm 1 Training DISTRICTNET, Algorithm 2 Iterated local search, Algorithm 3 Local search: LS, Algorithm 4 Modified Kruskal, Algorithm 5 Greedy merging, Algorithm 6 Repair, Algorithm 7 Training instance generation, Algorithm 8 Sampling a connected subgraph of size N |
| Open Source Code | Yes | The code to reproduce all experiments presented in this paper is publicly available at https://github.com/cheikh025/DistrictNet under an MIT license. |
| Open Datasets | Yes | Our instances include all the real-world cities in the United Kingdom used by Ferraz et al. (2024), and we extend this set with additional cities in France. Hence, our test set contains seven real-world cities in the United Kingdom and France (Bristol, Leeds, London, Lyon, Manchester, Marseilles, and Paris), which contain between 120 and 983 BUs. Our goal is to provide high-quality solutions for several values of the target district size t. This target size sets the bounds $(\underline{d}, \overline{d})$ on the district size to $t \pm 20\%$ and the target number of districts as $k = \lceil N/t \rceil$. [...] To assemble a large and diverse training set, we generate new cities by perturbing real-world ones. First, we read the geographical data of 27 real-world cities in England (excluding the ones from the test set). From these initial cities, we generate n = 100 random connected subgraphs of size N = 30 BUs and sample the population of each BU according to a normal distribution $\mathcal{N}(8\,000, 2\,000)$ truncated between 5 000 and 20 000. For each instance $x_i$, we compute its optimal solution for the target size t = 3 by fully enumerating the possible districting solutions and evaluating their costs. This procedure generates our training set of n = 100 instances and associated solutions. [...] The geographical data, including the boundaries of each city and its BUs, can be accessed at Uber Movement: https://movement.uber.com/. Additionally, census data is available from the Office for National Statistics (ONS) website. For the French cities, geographical data including the boundaries were obtained from Opendatasoft; population data for these regions was sourced from the National Institute of Statistics and Economic Studies (INSEE). [A Julia sketch of this instance-generation procedure is given below the table.] |
| Dataset Splits | No | We perform a small experiment to investigate the in-distribution generalization ability of the different methods. That is, we study the setting where the training and test instances are sampled from the same distribution. We generate 200 instances following the procedure described in Appendix B.2 and split them into two sets of 100 instances. The first set is used to train all the methods and the second is used to evaluate them out-of-sample. |
| Hardware Specification | Yes | All experiments are run on a computing grid. Each experiment is run on two cores of an AMD Rome 7532 with 2.4 GHz and is allocated 16 GB RAM. |
| Software Dependencies | No | Our experiments are implemented in Julia (Bezanson et al., 2017), except for the district evaluation methods, which are taken from Ferraz et al. (2024) and implemented in C++. All experiments are run on a computing grid. [...] Each TSP is solved using the Lin-Kernighan-Helsgaun (LKH) algorithm (Lin and Kernighan, 1973; Helsgaun, 2017). We use the publicly available implementation from http://akira.ruc.dk/~keld/research/LKH-3/. |
| Experiment Setup | Yes | Architecture. The GNN of DISTRICTNET uses three graph convolution layers, each with a hidden size of 64 and LeakyReLU activation functions. The DNN section uses three dense layers: two layers mapping 64 inputs to 64 outputs and then one layer with an output of size 32. All three layers use LeakyReLU activations. Finally, the final layer converts the latent 32-dimensional vector into a one-dimensional output. For PREDGNN, we maintain the structure proposed by Ferraz et al. (2024), with the exception that we replace the Structure2vec layers with GraphConv layers. The update rule for the GraphConv layers is $x_i^{t+1} = \mathrm{ReLU}\big(W_1 x_i^t + \sum_{j \in N(i)} W_2 x_j^t\big)$, where $W_1$ and $W_2$ are the weight matrices and $N(i)$ is the set of neighbors of node i. In contrast, the Structure2vec update rule applied in the original model is $x_i^{t+1} = \mathrm{ReLU}\big(W_1 x_i^t + W_2 \sum_{j \in N(i)} x_j^t\big)$. This is a minor modification and should not alter its performance, as also stated by Ferraz et al. (2024). The architecture of PREDGNN thus consists of four GraphConv layers. Each layer has a hidden size of 64 units and uses ReLU activation functions. The final layer produces an output of size 1028. We then aggregate these node embeddings to form a global graph embedding. This global embedding is initially processed by a feedforward layer, resulting in a 100-dimensional vector. A second and final layer reduces this to a single output, representing the cost of the district. Hyperparameters. The only hyperparameter of BD and FIG is the number of samples used in the Monte Carlo approximation of the average distance to requests. We use 100 scenarios in all our experiments. PREDGNN uses a batch size of |B| = 64 and a learning rate of 1e-4, and trains the model for $10^4$ epochs. We further set a time limit of 24 hours for training and stop the process early if no significant change of the loss (i.e., greater than 1e-4) is observed in the last 1 000 epochs. DISTRICTNET uses a batch size of |B| = 1 and a learning rate with an initial value of 1e-3, a decay rate of 0.9 applied every 10 epochs, and a minimum rate of 1e-4. The model is trained for 100 epochs. The target CMST solution in Equation (5) is constructed using a single observation of our random constructor (Kruskal with random weights). The perturbation Z is set to a multivariate standard Gaussian and we use M = 20 samples to approximate the expected gradient in Equation (8). The randomized target constructor uses 1 000 samples. [Julia sketches of the GraphConv update and of the training schedule are given below the table.] |
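To make the quoted instance-generation procedure (Open Datasets row) concrete, here is a minimal Julia sketch of its two sampling steps: growing a random connected subgraph of N = 30 BUs and drawing truncated-normal populations. The frontier-growing strategy and all function names are our assumptions; the paper's Algorithm 8 may expand the subgraph differently.

```julia
using Graphs, Distributions, Random

# Grow a connected subgraph of `n` nodes from a random seed by repeatedly
# absorbing a uniformly chosen frontier node (one plausible reading of
# Algorithm 8; the paper's exact expansion rule may differ).
function sample_connected_subgraph(g::SimpleGraph, n::Int; rng=Random.default_rng())
    seed = rand(rng, 1:nv(g))
    selected = Set([seed])
    frontier = Set(neighbors(g, seed))
    while length(selected) < n && !isempty(frontier)
        v = rand(rng, collect(frontier))
        push!(selected, v)
        delete!(frontier, v)
        union!(frontier, setdiff(Set(neighbors(g, v)), selected))
    end
    sub, _ = induced_subgraph(g, collect(selected))
    return sub
end

# Population of each BU: Normal(8000, 2000) truncated to [5000, 20000].
const POPULATION_DIST = truncated(Normal(8_000, 2_000), 5_000, 20_000)
sample_populations(n; rng=Random.default_rng()) = rand(rng, POPULATION_DIST, n)
```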
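The GraphConv update rule quoted in the Experiment Setup row also admits a one-line dense-matrix form. The sketch below is ours and uses plain matrices rather than a GNN library; X stores node features column-wise and A is the binary adjacency matrix.

```julia
# x_i^{t+1} = ReLU(W1 x_i^t + sum_{j in N(i)} W2 x_j^t), in matrix form.
# Column i of X * A is exactly sum_{j in N(i)} x_j^t, so the neighbor
# aggregation reduces to a single matrix product.
relu(x) = max.(x, 0)

graphconv(X, A, W1, W2) = relu(W1 * X .+ W2 * (X * A))
```

Note that because $W_2$ acts linearly, $\sum_{j \in N(i)} W_2 x_j^t = W_2 \sum_{j \in N(i)} x_j^t$, so the quoted Structure2vec rule computes the same update; this is consistent with the paper's remark that the substitution is a minor modification that should not alter performance.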
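Finally, two small pieces of the quoted hyperparameter description can be sketched compactly: the DISTRICTNET learning-rate schedule (initial 1e-3, decayed by 0.9 every 10 epochs, floored at 1e-4) and the M = 20-sample Monte Carlo average over standard-Gaussian perturbations. The `grad_for` callback is a hypothetical stand-in; the per-sample gradient of Equation (8) is not reproduced in this report.

```julia
using Random, Statistics

# Step-decay schedule as described: 1e-3 * 0.9^(epoch ÷ 10), never below 1e-4.
learning_rate(epoch::Int) = max(1e-3 * 0.9^(epoch ÷ 10), 1e-4)

# Average a per-sample gradient over M standard-Gaussian perturbations Z.
# `grad_for` is hypothetical: it stands in for the gradient of Equation (8)
# at a perturbed parameter vector, which depends on the perturbed CMST solver.
function mc_gradient(theta::Vector{Float64}, grad_for; M::Int=20,
                     rng=Random.default_rng())
    return mean(grad_for(theta .+ randn(rng, length(theta))) for _ in 1:M)
end
```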