Goal Conditioned Reinforcement Learning for Photo Finishing Tuning

Authors: Jiarui Wu, Yujin Wang, Lingen Li, Fan Zhang, Tianfan Xue

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct detailed experiments on photo finishing tuning and photo stylization tuning tasks, demonstrating the advantages of our method. Project website: https://openimaginglab.github.io/RLPixTuner/.
Researcher Affiliation | Collaboration | Jiarui Wu (1,2), Yujin Wang (1), Lingen Li (1,2), Fan Zhang (1), Tianfan Xue (2); 1: Shanghai AI Laboratory, 2: The Chinese University of Hong Kong
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | Project website: https://openimaginglab.github.io/RLPixTuner/ (the website states 'Code and data will be released soon.'). The NeurIPS Paper Checklist also states 'Codes and datasets will be made publicly available upon acceptance.'
Open Datasets | Yes | We use the MIT-Adobe FiveK Dataset [3], a renowned resource in the field of photo retouching, which comprises 5,000 photographs captured using DSLR cameras by various photographers.
Dataset Splits | Yes | For our study, we selected 4,500 images to serve as the training dataset, with the remaining 500 images designated as the validation dataset. (See the split sketch after the table.)
Hardware Specification | Yes | To evaluate the efficiency of our approach, we conducted speed testing experiments on a system equipped with an AMD EPYC 7402 (48C) @ 2.8 GHz CPU, 8 NVIDIA RTX 4090 GPUs with 24 GB of RAM each, 512 GB of memory, and running CentOS 7.9.
Software Dependencies | No | The paper mentions using the TD3 algorithm [5] and the Adam optimizer [17]. It also states, 'We implement the CMAES method [28] based on open-source framework [1].' Reference [1] is 'pymoo: Multi-objective optimization in Python.' However, specific version numbers are not provided for `pymoo`, for other libraries such as PyTorch or TensorFlow, or for CUDA and Python. (See the CMA-ES usage sketch after the table.)
Experiment Setup | Yes | During policy inference, the input and goal images are resized to a resolution of 64×64. ... We train our policy using the standard TD3 algorithm [5] and set the termination of our RL policy to trigger when the episode length reaches the maximum threshold (10 steps), ensuring efficiency. ... We set σ = 0.1 for a trade-off between exploration and exploitation. ... In this implementation, we set ϵ1 = 0.2, which is twice the value of ϵ. ... We set the EMA update rate ρ = 0.99 in all our experiments. Optimization uses the Adam optimizer [17] with (β1, β2) = (0.9, 0.999). We set the learning rate to 1e-4 for the policy (action) network and 2e-4 for the Q network. We set the batch size to 64 for both the photo finishing tuning and photo stylization tuning experiments. We set the discount factor γ = 0.9.
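
For convenience, the hyperparameters quoted in the Experiment Setup row can be collected into a single configuration block. This is a sketch only: the field names are our own, and only the values come from the paper.

```python
from dataclasses import dataclass


@dataclass
class TD3Config:
    # Values quoted in the Experiment Setup row; field names are illustrative.
    image_size: int = 64            # input and goal images resized to 64x64 for policy inference
    max_episode_steps: int = 10     # episode terminates after at most 10 steps
    exploration_sigma: float = 0.1  # exploration noise scale
    epsilon_1: float = 0.2          # stated to be twice the value of epsilon
    ema_rate: float = 0.99          # rho, EMA update rate
    adam_betas: tuple = (0.9, 0.999)
    policy_lr: float = 1e-4         # action (policy) network learning rate
    q_lr: float = 2e-4              # Q (value) network learning rate
    batch_size: int = 64
    gamma: float = 0.9              # discount factor
```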
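
To make the 4,500/500 partition in the Dataset Splits row concrete, a minimal sketch is given below; the image identifiers and the random seed are assumptions, since the paper does not state how the split was drawn.

```python
import random

# Hypothetical identifiers for the 5,000 MIT-Adobe FiveK images.
image_ids = [f"fivek_{i:04d}" for i in range(1, 5001)]

random.seed(0)           # seed chosen for illustration only
random.shuffle(image_ids)

train_ids = image_ids[:4500]  # 4,500 training images
val_ids = image_ids[4500:]    # remaining 500 validation images
assert len(train_ids) == 4500 and len(val_ids) == 500
```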
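
Regarding the Software Dependencies row, a CMA-ES baseline built on pymoo [1] could look roughly like the sketch below. The problem definition, bounds, and termination are assumptions for illustration; the paper's actual objective renders a photo through the finishing pipeline and compares it to the goal image.

```python
import numpy as np
from pymoo.core.problem import Problem
from pymoo.algorithms.soo.nonconvex.cmaes import CMAES
from pymoo.optimize import minimize


class PhotoTuningProblem(Problem):
    """Hypothetical stand-in for the photo finishing objective: minimize the
    distance between the rendered image and the goal image as a function of
    the pipeline parameters."""

    def __init__(self, n_params=10):
        super().__init__(n_var=n_params, n_obj=1, xl=-1.0, xu=1.0)

    def _evaluate(self, x, out, *args, **kwargs):
        # Placeholder objective; the real one renders and compares images.
        out["F"] = np.sum(x ** 2, axis=1)


result = minimize(PhotoTuningProblem(), CMAES(), ("n_gen", 50), seed=1, verbose=False)
print(result.X, result.F)
```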