SocioDojo: Building Lifelong Analytical Agents with Real-world Text and Time Series

Authors: Junyan Cheng, Peter Chin

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform experiments and ablation studies to explore the factors that impact performance. The results show that our proposed method achieves improvements of 32.4% and 30.4% compared to the state-of-the-art method in the two experimental settings. [...] We also perform experiments and ablation studies to explore factors that impact performance. [...] 5 EXPERIMENT. In Section 5.1, we present our experimental setup. We then evaluate our proposed H&P prompting and compare it with other state-of-the-art prompting techniques in Section 5.2. Finally, we discuss the results of the ablation studies in Section 5.3.
Researcher Affiliation | Academia | Junyan Cheng, Thayer School of Engineering, Dartmouth College, Hanover, NH 03755, USA, jc.th@dartmouth.edu; Peter Chin, Thayer School of Engineering, Dartmouth College, Hanover, NH 03755, USA, pc@dartmouth.edu
Pseudocode | No | The paper describes the agent's architecture and prompting process in detail but does not include any sections explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | Our code and data are available at https://github.com/chengjunyan1/SocioDojo.
Open Datasets | Yes | Our code and data are available at https://github.com/chengjunyan1/SocioDojo. [...] SocioDojo uses three components (Information Sources, Time Series, and Knowledge Base & Tools) based on 30 GB of high-quality real-world data that we have collected [...] We collect time series on a variety of topics for a comprehensive probe of the world state, including financial data from Yahoo Finance, economic time series from the St. Louis Federal Reserve Economic Data Database (FRED) with a popularity rating of more than 50%, Google Trends of free trending keywords from Exploding Topics, a society trend tracking service, political polls from FiveThirtyEight, a famous political analysis website, and public opinion poll trackers from YouGov, an online survey platform.
Dataset Splits | No | The paper describes a lifelong learning environment where agents are 'constantly updated with the latest messages' and evaluated based on performance 'over time' (e.g., 'The game begins on 2021-10-01 and ends on 2023-08-01'). This indicates a continuous, time-based evaluation rather than traditional, fixed training/validation/test dataset splits.
Hardware Specification | No | The paper does not specify the hardware (e.g., specific GPU or CPU models, memory, or cloud instances) used to run the experiments.
Software Dependencies | No | The paper mentions 'ChromaDB' and 'Instructor-XL (Su et al., 2023)' as components and 'GPT-3.5-Turbo series' and 'GPT-4' as foundation models. However, it does not provide specific version numbers for these software dependencies (e.g., ChromaDB vX.Y.Z or Instructor-XL vA.B).
Experiment Setup | Yes | In our experiment, we set this number [max steps for analyst] to 4. [...] It can call one of the query interfaces to find resources at each step. The query is handled iteratively through a multi-round dialog response loop with a max step of 3 in our experiment. [...] It initiates a multi-round dialog action loop with a max step of 5 in our experiment when an analysis report is received. [...] All methods use AAA architecture, one news channel as the information source, GPT-3.5-Turbo series as foundation models with a low temperature=0.2 for a more deterministic experiment result. [...] We chose a bound of 5, as it is possible to achieve a return of 5 times for a single asset without leverage in around 2 years, which is the time span of SocioDojo. [...] Therefore, we selected overnight rates of FRD: 0.10, WEB: 0.05, and a bound of 5 as our experimental setting.
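
The settings quoted in the Experiment Setup and Dataset Splits rows can be gathered into one configuration object, which may help anyone trying to reproduce the runs. The Python sketch below is only an illustration: the class name, field names, and the run_bounded_loop helper are hypothetical and are not taken from the SocioDojo repository. Only the numeric values and dates come from the quoted text.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class ExperimentSetup:
    """Values quoted in the table above; all field names are hypothetical."""
    analyst_max_steps: int = 4          # "max steps for analyst"
    query_response_max_steps: int = 3   # multi-round dialog response loop
    action_loop_max_steps: int = 5      # multi-round dialog action loop
    temperature: float = 0.2            # GPT-3.5-Turbo sampling temperature
    return_bound: float = 5.0           # "a bound of 5"
    overnight_rates: dict = field(
        default_factory=lambda: {"FRD": 0.10, "WEB": 0.05}
    )
    game_start: date = date(2021, 10, 1)
    game_end: date = date(2023, 8, 1)

    def game_days(self) -> int:
        """Length of the evaluation window in days."""
        return (self.game_end - self.game_start).days


def run_bounded_loop(step, max_steps):
    """Call step(i) for i = 0..max_steps-1 and stop at the first non-None result.

    Returns (result, rounds_used). This is a generic stand-in for a capped
    multi-round dialog loop, not the paper's implementation.
    """
    for i in range(max_steps):
        result = step(i)
        if result is not None:
            return result, i + 1
    return None, max_steps


setup = ExperimentSetup()
```

With these values, setup.game_days() covers a little under two years, which matches the "around 2 years" time span cited as the reason for the bound of 5.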