Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Self-Supervision is All You Need for Solving Rubik’s Cube
Authors: Kyo Takano
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When tested on Rubik s Cube, 15 Puzzle, and 7 7 Lights Out, our method outperformed the previous state-of-the-art method Deep Cube A, improving the trade-off between solution optimality and computational cost, despite significantly less training data. Furthermore, we investigate the scaling law of our Rubik s Cube solver with respect to model size and training data volume. |
| Researcher Affiliation | Industry | Kyo Takano EMAIL |
| Pseudocode | Yes | Algorithm 1 The proposed training algorithm. G: Goal state M: Set of moves B: Number of scrambles per batch N: Number of moves per scramble fθ: DNN parameterized by θ Ensure: fθ: Trained DNN while not converged do T for b = 1 to B do S G for n = 1 to N do Sample m random(M) Update S S m Add (S, m) to T Update fθ using T |
| Open Source Code | Yes | Our code is available at github.com/kyo-takano/efficientcube |
| Open Datasets | Yes | We utilize the publicly available Deep Cube A dataset on Code Ocean2, which comprises 1, 000 Rubik s Cube test cases. |
| Dataset Splits | Yes | We utilize the publicly available Deep Cube A dataset on Code Ocean2, which comprises 1, 000 Rubik s Cube test cases. |
| Hardware Specification | Yes | We used a single GPU (NVIDIA T4), whereas Agostinelli et al. (2019) employed four GPUs (NVIDIA TITAN V). |
| Software Dependencies | No | The paper mentions 'Adam optimizer (Kingma & Ba, 2014)' but does not provide specific version numbers for any software frameworks (e.g., PyTorch, TensorFlow), programming languages (e.g., Python), or other libraries. |
| Experiment Setup | Yes | The DNN parameters are updated using Adam optimizer (Kingma & Ba, 2014) with an initial learning rate of 10 3. In this experiment, we set the maximum scramble length to 26, which is God s Number for Rubik s Cube in the quarter-turn metric (Kunkle & Cooperman, 2007). Additionally, the training process excludes clearly redundant or self-canceling permutations, such as R following R , when generating random scrambles. We train the DNN for 2, 000, 000 steps with a batch size of 26, 000 (26 moves per scramble 1, 000 scrambles per batch), which is equivalent to 2 billion solutions. ... We set a beam width of k and only pass the top k candidate paths to the next depth i + 1, sorted by their cumulative product of probabilities. ... To investigate the trade-off between the number of nodes expanded and the optimality of solutions, we systematically increase the beam width k from 20 to 218 in powers of 2. |