AgentStudio

Python 3.11 · Code style: black · License: AGPL v3 · pre-commit

AgentStudio is an integrated solution featuring in-depth benchmark suites, realistic environments, and comprehensive toolkits. We open-source everything to promote research towards generalist computer agents of the future. The paper, leaderboard, benchmark suites, and documentation for the environments and toolkits can be found on our project page.

For detailed comparisons with existing work, please refer to our paper.

Install

Please see docs/install.md for detailed instructions. We plan to provide a packaged release for out-of-the-box usage.

Three Offline Benchmark Suites

We curated three static datasets for benchmarking GUI grounding, success detection, and learning from videos, respectively. Please see evals/README.md for the scripts that reproduce the benchmark results in our paper.
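
To make the grounding benchmark concrete, below is a minimal sketch of how a GUI grounding prediction could be scored: a predicted click counts as correct if it lands inside the ground-truth bounding box. The JSONL schema, field names, and function names here are illustrative assumptions, not AgentStudio's actual API; the real evaluation scripts are documented in evals/README.md.

import json

# Hypothetical scoring sketch for GUI grounding predictions. The file
# format (JSONL with "x", "y", and "bbox" fields) is an assumption for
# illustration; see evals/README.md for the actual scripts.

def click_in_bbox(x: float, y: float, bbox: list[float]) -> bool:
    """Return True if the predicted click (x, y) lands inside the
    ground-truth bounding box [left, top, right, bottom]."""
    left, top, right, bottom = bbox
    return left <= x <= right and top <= y <= bottom

def grounding_accuracy(pred_path: str, gt_path: str) -> float:
    """Fraction of predicted clicks that hit their target element."""
    with open(pred_path) as f:
        preds = [json.loads(line) for line in f]
    with open(gt_path) as f:
        gts = [json.loads(line) for line in f]
    hits = sum(
        click_in_bbox(p["x"], p["y"], g["bbox"])
        for p, g in zip(preds, gts)
    )
    return hits / len(gts)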

Customize Online Benchmarks in Real Environments

AgentStudio also provides cross-platform real-world environments with generic, human-like observation and action spaces. We offer a set of example tasks for benchmarking computer agents in the wild, along with several auto-evaluators that enable benchmarking without human evaluation. The implementation is straightforward and flexible, supporting custom tasks as well as human evaluation. Please find more details in docs/online_benchmark.md.
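
As a rough illustration of how a custom task might be specified, the sketch below defines a task with an instruction and an automatic success check. All field names and the evaluator type are assumptions made for illustration, not AgentStudio's actual schema; docs/online_benchmark.md describes the real task format.

# Hypothetical custom task definition; field names are illustrative
# assumptions. See docs/online_benchmark.md for the actual format.
custom_task = {
    "task_id": "rename_desktop_file",
    "instruction": "Rename the file 'draft.txt' on the desktop to 'final.txt'.",
    "max_steps": 15,
    # An auto-evaluator checks the resulting environment state, so the
    # episode can be scored without human judgment.
    "evaluator": {
        "type": "file_exists",
        "path": "~/Desktop/final.txt",
    },
}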

Record GUI Data and Trajectories

The real-world environments also facilitate scalable data collection across different operating systems. AgentStudio offers two data collection pipelines, one for single-step GUI grounding data and one for task-completion trajectories, each supporting both local recording (assuming a dual-screen setup) and remote recording (via VNC). Please refer to docs/annotate_ground_ui.md and docs/annotate_trajectory.md for detailed instructions.

(Demo: recording single-step GUI grounding data on macOS.)
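
For a sense of what the single-step grounding recorder might capture, here is a hypothetical record pairing a screenshot with an instruction and the annotated action; the field names are assumptions for illustration, and docs/annotate_ground_ui.md describes the actual recording format.

# Hypothetical single-step GUI grounding record; field names are
# illustrative assumptions. See docs/annotate_ground_ui.md for the
# actual format.
sample = {
    "screenshot": "records/000001.png",  # captured frame
    "instruction": "Click the 'Save' button in the toolbar.",
    "action": {"type": "mouse_click", "x": 812, "y": 64},
    "platform": "macOS",  # recording also works on Windows and Linux
}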

Contributing

We continue to expand the collection of environments, tasks, and data over time. Contributions and feedback on how to make this a better tool are more than welcome. Please check out CONTRIBUTING.md to learn how to get involved.

Acknowledgement

We would like to thank the following projects for their inspiration and contributions to the open-source community: Open Interpreter, WebArena, Cradle, Synapse, SeeClick, ScreenAgent, etc.

Citation

If you find AgentStudio useful, please cite our paper:

@article{zheng2024agentstudio,
  title={AgentStudio: A Toolkit for Building General Virtual Agents},
  author={Longtao Zheng and Zhiyuan Huang and Zhenghai Xue and Xinrun Wang and Bo An and Shuicheng Yan},
  journal={arXiv preprint arXiv:2403.17918},
  year={2024}
}