ChatCam: Empowering Camera Control through Conversational AI

Authors: Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments, including comparisons to state-of-the-art approaches and user studies, demonstrate our approach's ability to interpret and execute complex instructions for camera operation, showing promising applications in real-world production settings.
Researcher Affiliation | Academia | Xinhang Liu (HKUST), Yu-Wing Tai (Dartmouth College), Chi-Keung Tang (HKUST)
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | We will release the codebase upon paper acceptance.
Open Datasets | Yes | We tested our method on scenes from a series of datasets suitable for 3D reconstruction with radiance field representations, including: (i) mip-NeRF 360 [6], a real dataset with indoor and outdoor scenes; (ii) OMMO [50], a real dataset with large-scale outdoor scenes; (iii) Hypersim [61], a synthetic dataset for indoor scenes; (iv) Mannequin Challenge [44], a real dataset for human-centric scenes.
Dataset Splits | No | For each scene, we reconstructed using all available images without train-test splitting. The paper does not provide explicit training/validation/test dataset splits, nor does it specify how the 1000 manually constructed trajectories were split for CineGPT training.
Hardware Specification | Yes | We implement our approach using PyTorch [56] and conduct all the training and inference on a single NVIDIA RTX 4090 GPU with 24 GB RAM.
Software Dependencies | No | The paper mentions software like PyTorch, Adam optimizer, CLIP, GPT-4, and 3DGS, but it does not provide specific version numbers for these dependencies.
Experiment Setup | Yes | The trajectory tokenizer has a codebook with K = 256 latent embedding vectors, each with dimension d = 256. The temporal downsampling rate of the trajectory encoder is l = 4. Our cross-modal transformer decoder consists of 24 layers, with attention mechanisms employing an inner dimensionality of 64. The remaining sub-layers and embeddings have a dimensionality of 256. We train CineGPT using the Adam optimizer [38] with an initial learning rate of 0.0001. ... The learning rate of anchor refinement is 0.002.
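
To make the quoted configuration concrete, below is a minimal PyTorch sketch. The hyperparameter values (K = 256, d = 256, l = 4, 24 decoder layers, 64-dim attention heads, 256-dim model width, learning rate 0.0001) are taken from the paper's setup; everything else, including the module names `TrajectoryTokenizer`, the per-pose input dimension, the stride-2 convolutional encoder, and the feed-forward width, is an illustrative assumption, since the codebase has not been released.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the paper's experiment setup.
K, D = 256, 256        # codebook size and latent embedding dimension
N_LAYERS = 24          # depth of the cross-modal transformer decoder
HEAD_DIM = 64          # inner dimensionality of the attention mechanism
MODEL_DIM = 256        # dimensionality of remaining sub-layers and embeddings

class TrajectoryTokenizer(nn.Module):
    """VQ-style tokenizer (hypothetical structure): two stride-2 conv blocks
    realize the paper's temporal downsampling rate l = 4, then each latent
    vector is mapped to the index of its nearest codebook entry."""
    def __init__(self, in_dim=7):  # per-pose input size is an assumption
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(in_dim, MODEL_DIM, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(MODEL_DIM, D, kernel_size=4, stride=2, padding=1),
        )
        self.codebook = nn.Embedding(K, D)

    def forward(self, traj):                          # traj: (B, in_dim, T)
        z = self.encoder(traj).permute(0, 2, 1)       # (B, T // 4, D)
        dists = torch.cdist(z, self.codebook.weight)  # (B, T // 4, K)
        return dists.argmin(dim=-1)                   # discrete trajectory tokens

# Cross-modal decoder: 24 layers, 256-dim model width, 64-dim attention heads.
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(
        d_model=MODEL_DIM,
        nhead=MODEL_DIM // HEAD_DIM,    # 4 heads of inner dimension 64
        dim_feedforward=4 * MODEL_DIM,  # assumption; not stated in the paper
        batch_first=True,
    ),
    num_layers=N_LAYERS,
)

tokenizer = TrajectoryTokenizer()
optimizer = torch.optim.Adam(
    list(tokenizer.parameters()) + list(decoder.parameters()),
    lr=1e-4,  # initial learning rate from the paper
)
```

Note that a trainable VQ codebook would additionally need a straight-through estimator and commitment loss, which the quoted text does not specify, and the anchor refinement stage (learning rate 0.002) belongs to the 3DGS reconstruction side, which this sketch does not cover.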