Goal Conditioned Reinforcement Learning for Photo Finishing Tuning
Authors: Jiarui Wu, Yujin Wang, Lingen Li, Fan Zhang, Tianfan Xue
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct detailed experiments on photo finishing tuning and photo stylization tuning tasks, demonstrating the advantages of our method. Project website: https://openimaginglab.github.io/RLPixTuner/. |
| Researcher Affiliation | Collaboration | Jiarui Wu (1,2), Yujin Wang (1), Lingen Li (1,2), Fan Zhang (1), Tianfan Xue (2); (1) Shanghai AI Laboratory, (2) The Chinese University of Hong Kong |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Project website: https://openimaginglab.github.io/RLPixTuner/. (This website states 'Code and data will be released soon.') The NeurIPS Paper Checklist also states 'Codes and datasets will be made publicly available upon acceptance.' |
| Open Datasets | Yes | We use the MIT-Adobe FiveK Dataset [3], a renowned resource in the field of photo retouching, which comprises 5,000 photographs captured using DSLR cameras by various photographers. |
| Dataset Splits | Yes | For our study, we selected 4,500 images to serve as the training dataset, with the remaining 500 images designated as the validation dataset. (An illustrative split sketch follows the table.) |
| Hardware Specification | Yes | To evaluate the efficiency of our approach, we conducted speed-testing experiments on a system equipped with an AMD EPYC 7402 (48C) @ 2.8 GHz CPU, 8 NVIDIA RTX 4090 GPUs with 24 GB of VRAM each, 512 GB of system memory, running CentOS 7.9. |
| Software Dependencies | No | The paper mentions using the TD3 algorithm [5] and the Adam optimizer [17]. It also states, 'We implement the CMAES method [28] based on open-source framework [1],' where reference [1] is 'pymoo: Multi-objective optimization in Python.' However, specific version numbers for `pymoo` or for other dependencies such as PyTorch/TensorFlow, CUDA, or Python are not provided. (An illustrative pymoo CMA-ES call appears after the table.) |
| Experiment Setup | Yes | During policy inference, the input and goal images are resized to a resolution of 64 × 64. ... We train our policy using the standard TD3 algorithm [5] and set the termination of our RL policy to trigger when the episode length reaches the maximum threshold (10 steps), ensuring efficiency. ... We set σ = 0.1 as a trade-off between exploration and exploitation. ... In this implementation, we set ϵ1 = 0.2, which is twice the value of ϵ. ... We set the EMA update rate ρ = 0.99 in all our experiments. Optimization uses the Adam optimizer [17] with (β1, β2) = (0.9, 0.999). The learning rates are 1e-4 for the policy (action) network and 2e-4 for the Q network. We set the batch size to 64 for both the photo finishing tuning and photo stylization tuning experiments. We set the discount factor γ = 0.9. (A hedged configuration sketch collecting these hyperparameters follows the table.) |
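
The Dataset Splits row quotes a 4,500 / 500 train/validation partition of the 5,000 MIT-Adobe FiveK images, but the selection procedure is not specified. The snippet below is a purely illustrative sketch of such a split; the image IDs and the fixed seed are assumptions, not details from the paper.

```python
import random

# Hypothetical FiveK-style image IDs; the paper does not list which images
# went into which split, so this is illustration only.
fivek_ids = [f"a{i:04d}" for i in range(1, 5001)]

rng = random.Random(0)          # assumed seed, only to make the sketch reproducible
rng.shuffle(fivek_ids)

train_ids = fivek_ids[:4500]    # 4,500 training images (as reported)
val_ids = fivek_ids[4500:]      # remaining 500 validation images

assert len(train_ids) == 4500 and len(val_ids) == 500
```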
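For the Software Dependencies row: the paper says its CMA-ES baseline is built on pymoo but gives no version. The sketch below shows a minimal pymoo CMA-ES run, assuming the module layout of pymoo ≥ 0.6 and using a toy objective; it does not reproduce the paper's photo-tuning objective.

```python
import numpy as np
from pymoo.algorithms.soo.nonconvex.cmaes import CMAES
from pymoo.problems import get_problem
from pymoo.optimize import minimize

# Toy single-objective problem standing in for the (unreleased) tuning objective.
problem = get_problem("sphere")

# CMA-ES started from a random point in the search space.
algorithm = CMAES(x0=np.random.random(problem.n_var))

res = minimize(problem, algorithm, seed=1, verbose=False)
print("best X:", res.X, "best F:", res.F)
```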
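The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. The sketch below is framework-agnostic plain Python; the class and function names are hypothetical, and the soft-update and exploration-noise forms are the standard TD3 ones, assumed rather than taken from the (unreleased) code.

```python
from dataclasses import dataclass
import numpy as np


@dataclass
class TuningConfig:
    """Hyperparameters quoted in the paper; field names are illustrative."""
    input_resolution: int = 64        # input/goal images resized to 64 x 64
    max_episode_steps: int = 10       # RL episode terminates at this length
    exploration_sigma: float = 0.1    # std of exploration noise
    epsilon1: float = 0.2             # quoted as twice the value of epsilon
    ema_rate: float = 0.99            # rho, EMA rate for target-network updates
    adam_betas: tuple = (0.9, 0.999)
    actor_lr: float = 1e-4            # policy (action) network learning rate
    critic_lr: float = 2e-4           # Q network learning rate
    batch_size: int = 64
    gamma: float = 0.9                # discount factor


def ema_update(target, online, rho=0.99):
    """Soft target-network update, assuming rho is the retention rate:
    theta_target <- rho * theta_target + (1 - rho) * theta_online."""
    return [rho * t + (1.0 - rho) * o for t, o in zip(target, online)]


def exploratory_action(action, sigma=0.1, low=-1.0, high=1.0, rng=None):
    """Perturb the deterministic policy output with clipped Gaussian noise."""
    rng = rng or np.random.default_rng()
    noise = rng.normal(0.0, sigma, size=np.shape(action))
    return np.clip(np.asarray(action) + noise, low, high)
```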