Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Deep Policies for Online Bipartite Matching: A Reinforcement Learning Approach
Authors: Mohammad Ali Alomrani, Reza Moravej, Elias Boutros Khalil
TMLR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present an end-to-end Reinforcement Learning framework for deriving better matching policies based on trial-and-error on historical data. We devise a set of neural network architectures, design feature representations, and empirically evaluate them across two online matching problems: Edge-Weighted Online Bipartite Matching and Online Submodular Bipartite Matching. We show that most of the learning approaches perform consistently better than classical baseline algorithms on four synthetic and real-world datasets. |
| Researcher Affiliation | Academia | Mohammad Ali Alomrani EMAIL Department of Electrical & Computer Engineering University of Toronto Reza Moravej EMAIL Department of Mechanical & Industrial Engineering University of Toronto Elias B. Khalil EMAIL Department of Mechanical & Industrial Engineering SCALE AI Research Chair in Data-Driven Algorithms for Modern Supply Chains University of Toronto |
| Pseudocode | Yes | Algorithm 1 greedy-rt; Algorithm 2 greedy-t; Algorithm 3 Graph Generation |
| Open Source Code | Yes | Our code is publicly available at https://github.com/lyeskhalil/CORL. |
| Open Datasets | Yes | We train and test our models across two synthetically generated datasets from the Erdos-Renyi (ER) (Erdos & Renyi, 1960) and Barabasi-Albert (BA) (Albert & Barabรกsi, 2002) graph families. In addition, we use two datasets generated from real-world base graphs. The g Mission base graph (Chen et al., 2014) comes from crowdsourcing data for assigning workers to tasks. We also use Movie Lens (Harper & Konstan, 2015), which is derived from data on users ratings of movies based on Dickerson et al. (2019). |
| Dataset Splits | Yes | We tune 4 training hyperparameters for each RL model using a held-out validation set of size 1000. ... We train our models for 300 epochs on datasets of 20000 instances |
| Hardware Specification | Yes | Training often takes less than 6 hours on a NVIDIA v100 GPU. |
| Software Dependencies | Yes | All environments are implemented in Pytorch (Paszke et al., 2019). We use Network X (Hagberg et al., 2008) to generate synthetic graphs and find optimal solutions for E-OBM problems. Optimal solutions for OSBM problems are found using Gurobi (Gurobi Optimization, LLC, 2021); see Appendix D. Pytorch Geometric (Fey & Lenssen, 2019) is used for handling graphs during training and implementing graph encoders. |
| Experiment Setup | Yes | We train our models for 300 epochs on datasets of 20000 instances using the Adam optimizer (Kingma & Ba, 2015). We use a batch size of 200 except for Movie Lens, where batch size 100 is used on graphs bigger than 10x60 due to memory constraints. ... The ff and ff-hist models have 3 hidden layers with 100 neurons each. inv-ff and inv-ff-hist have 2 hidden layers of size 100. The gnn-hist s decoder feed-forward neural network has 2 hidden layers of size 200 and the encoder uses embedding dimension 30 with one embedding layer. All models use the Re LU non-linearity. |