Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Deep Policies for Online Bipartite Matching: A Reinforcement Learning Approach

Authors: Mohammad Ali Alomrani, Reza Moravej, Elias Boutros Khalil

TMLR 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present an end-to-end Reinforcement Learning framework for deriving better matching policies based on trial-and-error on historical data. We devise a set of neural network architectures, design feature representations, and empirically evaluate them across two online matching problems: Edge-Weighted Online Bipartite Matching and Online Submodular Bipartite Matching. We show that most of the learning approaches perform consistently better than classical baseline algorithms on four synthetic and real-world datasets.
Researcher Affiliation | Academia | Mohammad Ali Alomrani EMAIL Department of Electrical & Computer Engineering University of Toronto; Reza Moravej EMAIL Department of Mechanical & Industrial Engineering University of Toronto; Elias B. Khalil EMAIL Department of Mechanical & Industrial Engineering SCALE AI Research Chair in Data-Driven Algorithms for Modern Supply Chains University of Toronto
Pseudocode | Yes | Algorithm 1 greedy-rt; Algorithm 2 greedy-t; Algorithm 3 Graph Generation
Open Source Code | Yes | Our code is publicly available at https://github.com/lyeskhalil/CORL.
Open Datasets | Yes | We train and test our models across two synthetically generated datasets from the Erdos-Renyi (ER) (Erdos & Renyi, 1960) and Barabasi-Albert (BA) (Albert & Barabási, 2002) graph families. In addition, we use two datasets generated from real-world base graphs. The gMission base graph (Chen et al., 2014) comes from crowdsourcing data for assigning workers to tasks. We also use MovieLens (Harper & Konstan, 2015), which is derived from data on users' ratings of movies based on Dickerson et al. (2019).
Dataset Splits | Yes | We tune 4 training hyperparameters for each RL model using a held-out validation set of size 1000. ... We train our models for 300 epochs on datasets of 20000 instances
Hardware Specification | Yes | Training often takes less than 6 hours on a NVIDIA v100 GPU.
Software Dependencies | Yes | All environments are implemented in Pytorch (Paszke et al., 2019). We use NetworkX (Hagberg et al., 2008) to generate synthetic graphs and find optimal solutions for E-OBM problems. Optimal solutions for OSBM problems are found using Gurobi (Gurobi Optimization, LLC, 2021); see Appendix D. Pytorch Geometric (Fey & Lenssen, 2019) is used for handling graphs during training and implementing graph encoders.
Experiment Setup | Yes | We train our models for 300 epochs on datasets of 20000 instances using the Adam optimizer (Kingma & Ba, 2015). We use a batch size of 200 except for MovieLens, where batch size 100 is used on graphs bigger than 10x60 due to memory constraints. ... The ff and ff-hist models have 3 hidden layers with 100 neurons each. inv-ff and inv-ff-hist have 2 hidden layers of size 100. The gnn-hist's decoder feed-forward neural network has 2 hidden layers of size 200 and the encoder uses embedding dimension 30 with one embedding layer. All models use the ReLU non-linearity.
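
The training figures quoted in the Experiment Setup row imply a fixed number of batches per epoch and of optimizer updates overall. The following is a minimal sketch of that arithmetic, not the authors' code: the names `TrainingSetup`, `HIDDEN_LAYERS`, `default`, and `movielens_large` are hypothetical, and it assumes one optimizer step per batch with any final partial batch kept. Input and output layer sizes are not stated in the row, so only hidden-layer widths are recorded.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class TrainingSetup:
    """Training values quoted in the Experiment Setup row."""

    epochs: int = 300
    train_instances: int = 20000
    batch_size: int = 200  # 100 is quoted for MovieLens graphs bigger than 10x60

    def batches_per_epoch(self) -> int:
        # Assumption: a final partial batch, if any, is kept rather than dropped.
        return ceil(self.train_instances / self.batch_size)

    def total_updates(self) -> int:
        # Assumption: exactly one optimizer step per batch.
        return self.epochs * self.batches_per_epoch()


# Hidden-layer widths as quoted in the row. Input and output sizes are not
# given there, so they are deliberately left out of this sketch.
HIDDEN_LAYERS = {
    "ff": [100, 100, 100],
    "ff-hist": [100, 100, 100],
    "inv-ff": [100, 100],
    "inv-ff-hist": [100, 100],
    "gnn-hist (decoder)": [200, 200],
}

default = TrainingSetup()
movielens_large = TrainingSetup(batch_size=100)

print(default.batches_per_epoch(), default.total_updates())
print(movielens_large.batches_per_epoch(), movielens_large.total_updates())
```

Under these assumptions, the default setting gives 100 batches per epoch and 30000 updates in total, while the smaller MovieLens batch size gives 200 batches per epoch and 60000 updates.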