AgentStudio

Python 3.11 · Code style: black · License: AGPL v3 · pre-commit

AgentStudio is an integrated solution featuring in-depth benchmark suites, realistic environments, and comprehensive toolkits. We open-source everything to promote research towards generalist computer agents of the future. The paper, leaderboard, benchmark suites, and documentation for the environments and toolkits can be found on our project page.

For detailed comparisons with existing work, please refer to our paper.

Install

Please see docs/install.md for detailed instructions. We plan to provide a packaged release for out-of-the-box usage.

Three Offline Benchmark Suites

We curated three static datasets for benchmarking GUI grounding, success detection, and learning from videos, respectively. Please see evals/README.md for the scripts that reproduce the benchmark results in our paper.
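
To make the grounding benchmark concrete, below is a minimal sketch of how a GUI grounding prediction could be scored: a predicted click counts as correct if it lands inside the ground-truth bounding box. The JSONL schema, field names, and function names here are illustrative assumptions, not AgentStudio's actual API; the real evaluation scripts are documented in evals/README.md.

import json

# Hypothetical scoring sketch for GUI grounding predictions. The file
# format (JSONL with "x", "y", and "bbox" fields) is an assumption for
# illustration; see evals/README.md for the actual scripts.

def click_in_bbox(x: float, y: float, bbox: list[float]) -> bool:
    """Return True if the predicted click (x, y) lands inside the
    ground-truth bounding box [left, top, right, bottom]."""
    left, top, right, bottom = bbox
    return left <= x <= right and top <= y <= bottom

def grounding_accuracy(pred_path: str, gt_path: str) -> float:
    """Fraction of predicted clicks that hit their target element."""
    with open(pred_path) as f:
        preds = [json.loads(line) for line in f]
    with open(gt_path) as f:
        gts = [json.loads(line) for line in f]
    hits = sum(
        click_in_bbox(p["x"], p["y"], g["bbox"])
        for p, g in zip(preds, gts)
    )
    return hits / len(gts)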

Customize Online Benchmarks in Real Environments

AgentStudio also provides cross-platform real-world environments with generic, human-like observation and action spaces. We offer a set of example tasks for benchmarking computer agents in the wild, along with several auto-evaluators that enable benchmarking without human evaluation. The implementation is straightforward and flexible, supporting custom tasks as well as human evaluation. Please find more details in docs/online_benchmark.md.
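
As a rough illustration of how a custom task might be specified, the sketch below defines a task with an instruction and an automatic success check. All field names and the evaluator type are assumptions made for illustration, not AgentStudio's actual schema; docs/online_benchmark.md describes the real task format.

# Hypothetical custom task definition; field names are illustrative
# assumptions. See docs/online_benchmark.md for the actual format.
custom_task = {
    "task_id": "rename_desktop_file",
    "instruction": "Rename the file 'draft.txt' on the desktop to 'final.txt'.",
    "max_steps": 15,
    # An auto-evaluator checks the resulting environment state, so the
    # episode can be scored without human judgment.
    "evaluator": {
        "type": "file_exists",
        "path": "~/Desktop/final.txt",
    },
}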

Record GUI Data and Trajectories

The real-world environments also facilitate scalable data collection across different operating systems. AgentStudio offers two data collection pipelines, one for single-step GUI grounding data and one for task-completion trajectories, each supporting both local recording (assuming a dual-screen setup) and remote recording (via VNC). Please refer to docs/annotate_ground_ui.md and docs/annotate_trajectory.md for detailed instructions.

(Demo: recording single-step GUI grounding data on macOS.)
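
For a sense of what the single-step grounding recorder might capture, here is a hypothetical record pairing a screenshot with an instruction and the annotated action; the field names are assumptions for illustration, and docs/annotate_ground_ui.md describes the actual recording format.

# Hypothetical single-step GUI grounding record; field names are
# illustrative assumptions. See docs/annotate_ground_ui.md for the
# actual format.
sample = {
    "screenshot": "records/000001.png",  # captured frame
    "instruction": "Click the 'Save' button in the toolbar.",
    "action": {"type": "mouse_click", "x": 812, "y": 64},
    "platform": "macOS",  # recording also works on Windows and Linux
}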

Contributing

We continue to expand the collection of environments, tasks, and data over time. Contributions and feedback on how to make this a better tool are more than welcome. Please check out CONTRIBUTING.md to learn how to get involved.

Acknowledgement

We would like to thank the following projects for their inspiration and contributions to the open-source community: Open Interpreter, WebArena, Cradle, Synapse, SeeClick, ScreenAgent, etc.

Citation

If you find AgentStudio useful, please cite our paper:

@article{zheng2024agentstudio,
  title={AgentStudio: A Toolkit for Building General Virtual Agents},
  author={Longtao Zheng and Zhiyuan Huang and Zhenghai Xue and Xinrun Wang and Bo An and Shuicheng Yan},
  journal={arXiv preprint arXiv:2403.17918},
  year={2024}
}