Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Self-Supervision is All You Need for Solving Rubik’s Cube

Authors: Kyo Takano

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	When tested on Rubik s Cube, 15 Puzzle, and 7 7 Lights Out, our method outperformed the previous state-of-the-art method Deep Cube A, improving the trade-off between solution optimality and computational cost, despite significantly less training data. Furthermore, we investigate the scaling law of our Rubik s Cube solver with respect to model size and training data volume.
Researcher Affiliation	Industry	Kyo Takano EMAIL
Pseudocode	Yes	Algorithm 1 The proposed training algorithm. G: Goal state M: Set of moves B: Number of scrambles per batch N: Number of moves per scramble fθ: DNN parameterized by θ Ensure: fθ: Trained DNN while not converged do T for b = 1 to B do S G for n = 1 to N do Sample m random(M) Update S S m Add (S, m) to T Update fθ using T
Open Source Code	Yes	Our code is available at github.com/kyo-takano/efficientcube
Open Datasets	Yes	We utilize the publicly available Deep Cube A dataset on Code Ocean2, which comprises 1, 000 Rubik s Cube test cases.
Dataset Splits	Yes	We utilize the publicly available Deep Cube A dataset on Code Ocean2, which comprises 1, 000 Rubik s Cube test cases.
Hardware Specification	Yes	We used a single GPU (NVIDIA T4), whereas Agostinelli et al. (2019) employed four GPUs (NVIDIA TITAN V).
Software Dependencies	No	The paper mentions 'Adam optimizer (Kingma & Ba, 2014)' but does not provide specific version numbers for any software frameworks (e.g., PyTorch, TensorFlow), programming languages (e.g., Python), or other libraries.
Experiment Setup	Yes	The DNN parameters are updated using Adam optimizer (Kingma & Ba, 2014) with an initial learning rate of 10 3. In this experiment, we set the maximum scramble length to 26, which is God s Number for Rubik s Cube in the quarter-turn metric (Kunkle & Cooperman, 2007). Additionally, the training process excludes clearly redundant or self-canceling permutations, such as R following R , when generating random scrambles. We train the DNN for 2, 000, 000 steps with a batch size of 26, 000 (26 moves per scramble 1, 000 scrambles per batch), which is equivalent to 2 billion solutions. ... We set a beam width of k and only pass the top k candidate paths to the next depth i + 1, sorted by their cumulative product of probabilities. ... To investigate the trade-off between the number of nodes expanded and the optimality of solutions, we systematically increase the beam width k from 20 to 218 in powers of 2.