Similarity-based cooperative equilibrium

Authors: Caspar Oesterheld, Johannes Treutlein, Roger B. Grosse, Vincent Conitzer, Jakob Foerster

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also demonstrate experimentally that cooperation can be learned using simple ML methods.
Researcher Affiliation | Collaboration | FOCAL, Carnegie Mellon University; CHAI, UC Berkeley; Anthropic; Vector Institute; University of Toronto; Institute for Ethics in AI, University of Oxford; FLAIR, University of Oxford
Pseudocode | Yes | Algorithm 1: Alternating best response learning
Open Source Code | No | The paper does not explicitly state that the source code for the described methodology is publicly released, nor does it include a link to a code repository.
Open Datasets | No | The paper describes a custom-generated experimental environment and data (the High-Dimensional Prisoner's Dilemma, including the construction of fC, fD, µ, and the noisy diff mapping) but does not provide a public link, DOI, repository name, or citation for a publicly available dataset.
Dataset Splits | No | The paper describes the construction of its experimental environment and its training procedures but does not provide dataset split information (e.g., percentages, sample counts, or citations to predefined splits) for training, validation, or test data.
Hardware Specification | Yes | To do so, we ran the experiment for a single random seed on an AMD Ryzen 7 PRO 4750U, a laptop CPU launched in 2020. (Note that we used the CPU, not the GPU (CUDA).)
Software Dependencies | No | The paper mentions software dependencies such as PyTorch, functorch, and Weights & Biases, but does not provide specific version numbers for these components.
Experiment Setup | Yes | Throughout our experiments, our policies πθ are represented by neural networks with three fully connected hidden layers of dimensions 100, 50 and 50 with biases and LeakyReLU activation functions. In our experiments, we maximized this by running Adam for 100 steps... The learning rate is 0.02. We ran alternating best response training for T = 1000 turns... we run gradient descent for T = 1000 steps... The learning rate γ is sampled uniformly from [0, 0.00003] in each step.
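
The Pseudocode and Experiment Setup rows together describe a policy network (three hidden layers of 100, 50, and 50 units with LeakyReLU) trained by alternating best response: each player in turn runs 100 Adam steps at learning rate 0.02 against the other's frozen policy, for T = 1000 turns. The sketch below is a minimal, hypothetical PyTorch illustration of that setup, not the authors' code: the observation dimension, the toy payoff matrix, and the best-response objective are placeholders, and the paper's actual High-Dimensional Prisoner's Dilemma construction (fC, fD, µ, noisy diff mapping) and similarity-based observation structure are omitted.

import torch
import torch.nn as nn

OBS_DIM = 40     # placeholder: observation size, not taken from the paper
N_ACTIONS = 2    # cooperate / defect

# Toy prisoner's-dilemma payoff for the acting player (rows: own action, cols: opponent action).
# Placeholder values, not the paper's payoffs.
PD_PAYOFF = torch.tensor([[3.0, 0.0],
                          [4.0, 1.0]])

def make_policy():
    # Three fully connected hidden layers (100, 50, 50) with biases and LeakyReLU,
    # matching the sizes quoted in the Experiment Setup row.
    return nn.Sequential(
        nn.Linear(OBS_DIM, 100), nn.LeakyReLU(),
        nn.Linear(100, 50), nn.LeakyReLU(),
        nn.Linear(50, 50), nn.LeakyReLU(),
        nn.Linear(50, N_ACTIONS),
    )

def expected_payoff(my_probs, their_probs):
    # Expected payoff under independent mixed strategies: p^T A q, batched over observations.
    return torch.einsum("bi,ij,bj->b", my_probs, PD_PAYOFF, their_probs)

def best_response_step(policy, frozen_opponent, payoff_fn, obs, steps=100, lr=0.02):
    # Approximate best response: run Adam for a fixed number of steps against a
    # frozen opponent (the quoted setup reports 100 Adam steps at learning rate 0.02).
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        my_probs = torch.softmax(policy(obs), dim=-1)
        their_probs = torch.softmax(frozen_opponent(obs), dim=-1).detach()
        loss = -payoff_fn(my_probs, their_probs).mean()  # maximize own expected payoff
        loss.backward()
        opt.step()

def alternating_best_response(payoff_fn, obs, turns=1000):
    # Alternating best-response learning: the two players take turns
    # best-responding to each other's current frozen policy for T turns.
    players = [make_policy(), make_policy()]
    for turn in range(turns):
        i = turn % 2
        best_response_step(players[i], players[1 - i], payoff_fn, obs)
    return players

A quick smoke run of this sketch (with far fewer turns than the T = 1000 reported in the paper) might look like:

obs = torch.randn(128, OBS_DIM)  # placeholder batch of observations
players = alternating_best_response(expected_payoff, obs, turns=10)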