Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Smooth and Flexible Camera Movement Synthesis via Temporal Masked Generative Modeling
Authors: Chenghao Xu, Guangtao Lyu, Jiexi Yan, Muli Yang, Cheng Deng
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental evaluations demonstrate the effectiveness of our TemMEGA, highlighting its superiority in both online and offline camera movement synthesis. |
| Researcher Affiliation | Academia | Chenghao Xu1, Guangtao Lyu1, Jiexi Yan2, Muli Yang3, Cheng Deng1; 1 School of Electronic Engineering, Xidian University, Xi'an, Shaanxi, China; 2 School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China; 3 Institute for Infocomm Research (I2R), A*STAR, Singapore; EMAIL, EMAIL |
| Pseudocode | No | The paper describes the model architecture and training objectives in detail but does not include a dedicated pseudocode block or algorithm figure. |
| Open Source Code | No | At this stage, we do not plan to release the code due to ongoing related research. However, we provide detailed implementation and training settings in the paper to ensure reproducibility. |
| Open Datasets | Yes | In this work, we use DCM [37], a dataset consisting of 108 pieces of animator-designed paired dance-camera-music data including camera keyframe information. |
| Dataset Splits | Yes | To ensure the fairness of the experiment, we follow the previous works and re-use the train and test splits provided by the original dataset. |
| Hardware Specification | Yes | We train our models on 4 NVIDIA A6000 48 GB with a batch size of 512. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify software versions for libraries or frameworks like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | We train our models on 4 NVIDIA A6000 48 GB with a batch size of 512. Discrete Camera Tokenizer (DCT) architecture incorporates residual blocks within its encoder and decoder components, featuring a spatial downscaling factor of 4, which consists of 4 quantization layers, each covering a codebook comprising 2048 vectors of 32-dimensional entities. The quantization dropout ratio is set to 0.2. For Consecutive Memory Encoder (CME), we use two transformer encoder blocks to compress long-term memories with 2 layers, and we enhance the short-term memories with two transformer decoder blocks with 4 layers. We set L_l, L_s, and K to 256, 32, and 8. The number of masked transformer blocks, heads, and dimensions is set to 6, 8, and 512 in the Temporal Conditional Masked Transformer (CMT). We train the models by Adam optimizer [20] with the same hyperparameters (learning rate, β1, and β2 are set as 0.002, 0, and 0.99, respectively) as previous works. |
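
Since the code is not released, the settings quoted in the Experiment Setup row can be gathered into one configuration object for anyone attempting a reimplementation. The following is a minimal sketch, not the authors' code: all class and field names are hypothetical, only the numeric values come from the quoted text, and the per-GPU batch size assumes that 512 is a global batch split evenly across the 4 GPUs, which the paper excerpt does not confirm.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TokenizerConfig:
    """Discrete Camera Tokenizer settings (field names are hypothetical)."""
    downscale_factor: int = 4        # downscaling factor reported in the paper
    num_quantizers: int = 4          # number of quantization layers
    codebook_size: int = 2048        # vectors per codebook
    code_dim: int = 32               # dimensionality of each codebook vector
    quantize_dropout: float = 0.2    # quantization dropout ratio


@dataclass(frozen=True)
class MemoryEncoderConfig:
    """Consecutive Memory Encoder settings (field names are hypothetical)."""
    long_term_length: int = 256      # L_l
    short_term_length: int = 32      # L_s
    k: int = 8                       # K
    long_term_encoder_layers: int = 2
    short_term_decoder_layers: int = 4


@dataclass(frozen=True)
class MaskedTransformerConfig:
    """Masked transformer settings (field names are hypothetical)."""
    num_blocks: int = 6
    num_heads: int = 8
    model_dim: int = 512


@dataclass(frozen=True)
class OptimConfig:
    """Adam and batching settings as reported."""
    learning_rate: float = 0.002
    beta1: float = 0.0
    beta2: float = 0.99
    batch_size: int = 512
    num_gpus: int = 4


@dataclass(frozen=True)
class ExperimentConfig:
    tokenizer: TokenizerConfig = field(default_factory=TokenizerConfig)
    memory: MemoryEncoderConfig = field(default_factory=MemoryEncoderConfig)
    transformer: MaskedTransformerConfig = field(default_factory=MaskedTransformerConfig)
    optim: OptimConfig = field(default_factory=OptimConfig)


def per_gpu_batch_size(optim: OptimConfig) -> int:
    """Assumption: the reported batch size is global and split evenly per GPU."""
    if optim.batch_size % optim.num_gpus != 0:
        raise ValueError("batch size is not divisible by the number of GPUs")
    return optim.batch_size // optim.num_gpus


def head_dim(cfg: MaskedTransformerConfig) -> int:
    """Per-head width under standard multi-head attention (an assumption)."""
    if cfg.model_dim % cfg.num_heads != 0:
        raise ValueError("model_dim must be divisible by num_heads")
    return cfg.model_dim // cfg.num_heads


config = ExperimentConfig()
```

Under these assumptions, `per_gpu_batch_size(config.optim)` gives 128 samples per GPU and `head_dim(config.transformer)` gives 64. Settings the row does not report, such as the number of training epochs or framework versions, are deliberately left out rather than guessed.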