Similarity-based cooperative equilibrium
Authors: Caspar Oesterheld, Johannes Treutlein, Roger B. Grosse, Vincent Conitzer, Jakob Foerster
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also demonstrate experimentally that cooperation can be learned using simple ML methods. |
| Researcher Affiliation | Collaboration | 1 FOCAL, Carnegie Mellon University; 2 CHAI, UC Berkeley; 3 Anthropic; 4 Vector Institute; 5 University of Toronto; 6 Institute for Ethics in AI, University of Oxford; 7 FLAIR, University of Oxford |
| Pseudocode | Yes | Algorithm 1: Alternating best response learning (a hedged code sketch follows the table). |
| Open Source Code | No | The paper neither states that source code for the described methodology is publicly released nor links to a code repository. |
| Open Datasets | No | The paper describes a custom-generated experimental environment and data (High-Dimensional Prisoner's Dilemma; construction of f_C, f_D, µ, and the noisy diff mapping) but does not provide a public link, DOI, repository name, or citation for a publicly available dataset. |
| Dataset Splits | No | The paper describes the construction of its experimental environment and training procedures but does not provide specific dataset split information (e.g., percentages, sample counts, or citations to predefined splits) for training, validation, or testing data. |
| Hardware Specification | Yes | To do so, we ran the experiment for a single random seed on an AMD Ryzen 7 PRO 4750U, a laptop CPU launched in 2020. (Note that we used the CPU, not a GPU/CUDA.) |
| Software Dependencies | No | The paper mentions software dependencies like 'pytorch', 'functorch', and 'Weights and Biases', but it does not provide specific version numbers for these components. |
| Experiment Setup | Yes | Throughout our experiments, our policies πθ are represented by neural networks with three fully connected hidden layers of dimensions 100, 50 and 50 with biases and LeakyReLU activation functions. In our experiments, we maximized this by running Adam for 100 steps... The learning rate is 0.02. We ran alternating best response training for T = 1000 turns... we run gradient descent for T = 1000 steps... The learning rate γ is sampled uniformly from [0, γ̄] in each step, where γ̄ = 0.00003. (Hedged sketches of both procedures follow the table.) |
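
The quoted pseudocode title and setup details are enough to reconstruct the training loop in outline. Below is a minimal PyTorch sketch of "Algorithm 1: Alternating best response learning", not the authors' released code (none is available): the `Policy` layer sizes and the Adam settings (100 steps, learning rate 0.02, T = 1000 turns) come from the quotes above, while `PD`, `expected_payoff`, and `obs_sampler` are toy stand-ins for the paper's High-Dimensional Prisoner's Dilemma.

```python
import torch
import torch.nn as nn

# Toy Prisoner's Dilemma payoff matrix for the row player:
# rows = own action (C, D), columns = opponent action (C, D).
PD = torch.tensor([[-1.0, -3.0],
                   [ 0.0, -2.0]])


class Policy(nn.Module):
    """Three fully connected hidden layers of sizes 100, 50, 50 with
    biases and LeakyReLU activations, per the Experiment Setup row."""
    def __init__(self, obs_dim: int, n_actions: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 100), nn.LeakyReLU(),
            nn.Linear(100, 50), nn.LeakyReLU(),
            nn.Linear(50, 50), nn.LeakyReLU(),
            nn.Linear(50, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)


def expected_payoff(policy_i, policy_j, obs):
    # Toy stand-in payoff: logits -> cooperate/defect probabilities,
    # scored against the PD matrix. The paper's high-dimensional PD
    # (f_C, f_D, µ, noisy diff mapping) is more involved.
    p_i = torch.softmax(policy_i(obs), dim=-1)
    p_j = torch.softmax(policy_j(obs), dim=-1)
    return torch.einsum("bi,ij,bj->b", p_i, PD, p_j).mean()


def alternating_best_response(pi_1, pi_2, obs_sampler,
                              turns=1000, inner_steps=100, lr=0.02):
    """Each turn, one player best-responds (100 Adam steps, lr 0.02,
    per the quoted setup) while the other is frozen; then roles swap."""
    players = [pi_1, pi_2]
    for t in range(turns):
        learner, opponent = players[t % 2], players[(t + 1) % 2]
        opt = torch.optim.Adam(learner.parameters(), lr=lr)
        for _ in range(inner_steps):
            opt.zero_grad()
            loss = -expected_payoff(learner, opponent, obs_sampler())
            loss.backward()
            opt.step()
    return pi_1, pi_2


# Example usage with random 8-dimensional observations (shortened to 2 turns):
pi_1, pi_2 = Policy(8), Policy(8)
alternating_best_response(pi_1, pi_2, lambda: torch.randn(64, 8), turns=2)
```

Note that in the paper, policies additionally condition on information about the opponent's parameters (via the noisy diff mapping); the toy payoff above omits that coupling for brevity.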
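
The gradient-descent schedule in the last quote (T = 1000 steps, learning rate γ redrawn uniformly from [0, γ̄] with γ̄ = 0.00003 at each step) can be sketched as follows. The `loss_fn` argument and the example quadratic are illustrative assumptions, not the paper's objective.

```python
import torch


def gradient_descent_random_lr(params, loss_fn, steps=1000, gamma_bar=3e-5):
    """Plain gradient descent where the step size gamma is redrawn
    uniformly from [0, gamma_bar] at every step."""
    for _ in range(steps):
        loss = loss_fn(params)
        grads = torch.autograd.grad(loss, params)
        gamma = torch.rand(()).item() * gamma_bar  # gamma ~ U[0, gamma_bar]
        with torch.no_grad():
            for p, g in zip(params, grads):
                p -= gamma * g
    return params


# Example: minimize a simple quadratic in two parameter tensors.
params = [torch.randn(3, requires_grad=True), torch.randn(3, requires_grad=True)]
gradient_descent_random_lr(params, lambda ps: sum((p ** 2).sum() for p in ps))
```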