Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Deep Policies for Online Bipartite Matching: A Reinforcement Learning Approach

Authors: Mohammad Ali Alomrani, Reza Moravej, Elias Boutros Khalil

TMLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental We present an end-to-end Reinforcement Learning framework for deriving better matching policies based on trial-and-error on historical data. We devise a set of neural network architectures, design feature representations, and empirically evaluate them across two online matching problems: Edge-Weighted Online Bipartite Matching and Online Submodular Bipartite Matching. We show that most of the learning approaches perform consistently better than classical baseline algorithms on four synthetic and real-world datasets.
Researcher Affiliation Academia Mohammad Ali Alomrani (EMAIL), Department of Electrical & Computer Engineering, University of Toronto; Reza Moravej (EMAIL), Department of Mechanical & Industrial Engineering, University of Toronto; Elias B. Khalil (EMAIL), Department of Mechanical & Industrial Engineering, SCALE AI Research Chair in Data-Driven Algorithms for Modern Supply Chains, University of Toronto
Pseudocode Yes Algorithm 1 greedy-rt; Algorithm 2 greedy-t; Algorithm 3 Graph Generation
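The report cites the paper's pseudocode (Algorithms 1-3) without reproducing it. As an illustration only, a threshold-greedy matching policy in the spirit of greedy-t might be sketched as follows; the function name, data layout, tie-breaking, and threshold handling are assumptions here, not the paper's Algorithm 2:

```python
def greedy_t(online_neighbors, threshold):
    """Threshold greedy for edge-weighted online bipartite matching (sketch):
    match each arriving online vertex to its highest-weight free offline
    neighbor, but only when that weight meets the threshold."""
    matched = set()   # offline vertices already used
    matching = []     # accepted (online_idx, offline_idx, weight) triples
    for v, edges in enumerate(online_neighbors):  # edges: {offline_idx: weight}
        free = {u: w for u, w in edges.items() if u not in matched}
        if not free:
            continue  # skip the arrival if no free neighbor exists
        u, w = max(free.items(), key=lambda kv: kv[1])
        if w >= threshold:
            matched.add(u)
            matching.append((v, u, w))
    return matching
```

greedy-rt differs in drawing the threshold at random; the deterministic variant above is the simpler baseline shape.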
Open Source Code Yes Our code is publicly available at https://github.com/lyeskhalil/CORL.
Open Datasets Yes We train and test our models across two synthetically generated datasets from the Erdos-Renyi (ER) (Erdos & Renyi, 1960) and Barabasi-Albert (BA) (Albert & Barabási, 2002) graph families. In addition, we use two datasets generated from real-world base graphs. The gMission base graph (Chen et al., 2014) comes from crowdsourcing data for assigning workers to tasks. We also use MovieLens (Harper & Konstan, 2015), which is derived from data on users' ratings of movies based on Dickerson et al. (2019).
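Both cited graph families have standard NetworkX generators, which the paper also uses for its synthetic data. A minimal sketch, with sizes and parameters as illustrative placeholders rather than the paper's settings:

```python
import networkx as nx

U, V, p = 10, 60, 0.15  # offline side, online side, edge probability (placeholders)

# Erdos-Renyi bipartite graph: each (u, v) pair is an edge independently
# with probability p.
er = nx.bipartite.random_graph(U, V, p, seed=0)

# Barabasi-Albert preferential-attachment graph. Note this generator is not
# bipartite by construction; the paper derives bipartite instances from such
# base graphs (its Algorithm 3), a step not reproduced in this sketch.
ba = nx.barabasi_albert_graph(U + V, 3, seed=0)
```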
Dataset Splits Yes We tune 4 training hyperparameters for each RL model using a held-out validation set of size 1000. ... We train our models for 300 epochs on datasets of 20000 instances
Hardware Specification Yes Training often takes less than 6 hours on an NVIDIA V100 GPU.
Software Dependencies Yes All environments are implemented in PyTorch (Paszke et al., 2019). We use NetworkX (Hagberg et al., 2008) to generate synthetic graphs and find optimal solutions for E-OBM problems. Optimal solutions for OSBM problems are found using Gurobi (Gurobi Optimization, LLC, 2021); see Appendix D. PyTorch Geometric (Fey & Lenssen, 2019) is used for handling graphs during training and implementing graph encoders.
Experiment Setup Yes We train our models for 300 epochs on datasets of 20000 instances using the Adam optimizer (Kingma & Ba, 2015). We use a batch size of 200 except for MovieLens, where batch size 100 is used on graphs bigger than 10x60 due to memory constraints. ... The ff and ff-hist models have 3 hidden layers with 100 neurons each. inv-ff and inv-ff-hist have 2 hidden layers of size 100. The gnn-hist's decoder feed-forward neural network has 2 hidden layers of size 200 and the encoder uses embedding dimension 30 with one embedding layer. All models use the ReLU non-linearity.
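The reported architecture for the ff model (3 hidden layers of 100 ReLU units, Adam, batch size 200) can be sketched directly in PyTorch. The input and output dimensions below are placeholders, and the learning rate is one of the tuned hyperparameters the report does not state:

```python
import torch
import torch.nn as nn

in_dim, out_dim = 16, 1  # placeholder feature/output sizes, not the paper's

# ff-style policy network: 3 hidden layers of 100 units with ReLU, per the setup.
ff = nn.Sequential(
    nn.Linear(in_dim, 100), nn.ReLU(),
    nn.Linear(100, 100), nn.ReLU(),
    nn.Linear(100, 100), nn.ReLU(),
    nn.Linear(100, out_dim),
)
optimizer = torch.optim.Adam(ff.parameters())  # lr left at default; tuned in the paper

x = torch.randn(200, in_dim)  # batch size 200, as in the reported setup
scores = ff(x)
```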