MarioNette: Self-Supervised Sprite Learning

Authors: Dmitriy Smirnov, Michaël Gharbi, Matthew Fisher, Vitor Guizilini, Alexei A. Efros, Justin M. Solomon

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We evaluate our self-supervised decomposition on several real (non-synthetic) datasets, compare to related work, and conduct an ablation study." |
| Researcher Affiliation | Collaboration | Dmitriy Smirnov (MIT), Michaël Gharbi (Adobe Research), Matthew Fisher (Adobe Research), Vitor Guizilini (Toyota Research Institute), Alexei A. Efros (UC Berkeley), Justin Solomon (MIT) |
| Pseudocode | No | The paper describes the method using diagrams and prose; no explicit pseudocode or algorithm blocks are provided. |
| Open Source Code | No | No explicit statement about the release of source code or a link to a code repository was found in the paper. |
| Open Datasets | Yes | "We train on Fighting Hero (one level, 5,330 frames), Nintendo Super Mario Bros. (one level, 2,220 frames), and ATARI Space Invaders (5,000 frames)." Additionally, the paper evaluates on a synthetically generated sprite-based game from [10]. |
| Dataset Splits | No | The paper states the total number of frames used for training each game (e.g., "5,330 frames", "2,220 frames", "5,000 frames") but does not provide explicit train/validation/test splits with percentages or sample counts. |
| Hardware Specification | Yes | "We use the AdamW [39] optimizer on a GeForce GTX 1080 GPU, with batch size 4 and learning rate 0.0001, except for the background module (learning rate 0.001 when used)." |
| Software Dependencies | No | The paper mentions various components and optimizers (e.g., AdamW, Layer Normalization, Group Normalization) but does not provide specific version numbers for the software libraries or frameworks used in the implementation. |
| Experiment Setup | Yes | "We set λsparse = 0.005 and train for 200,000 steps (~20 hours) with λβ = 0.002 and finetune for 10,000 steps with λβ = 0.1. We use the AdamW [39] optimizer on a GeForce GTX 1080 GPU, with batch size 4 and learning rate 0.0001, except for the background module (learning rate 0.001 when used). Unless otherwise specified, we set latent dimension to d = 128 and patch size to k = 32." |
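The reported experiment setup can be collected into a single configuration sketch. This is a hedged illustration: the key names below are my own invention for readability, only the values are taken from the paper's text.

```python
# Hedged sketch of the training configuration reported in the paper.
# Key names are illustrative (not from the paper); values come from the text.
train_config = {
    "optimizer": "AdamW",              # AdamW [39]
    "batch_size": 4,
    "learning_rate": 1e-4,
    "background_learning_rate": 1e-3,  # only when the background module is used
    "lambda_sparse": 0.005,
    "latent_dim": 128,                 # d = 128
    "patch_size": 32,                  # k = 32 (pixels)
    "train_steps": 200_000,            # with lambda_beta = 0.002 (~20 hours on a GTX 1080)
    "finetune_steps": 10_000,          # with lambda_beta = 0.1
}
```

Such a consolidated view makes it easier to see which hyperparameters a reimplementation would need to reproduce, and which (e.g., library versions) the paper leaves unspecified.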