(by UI-TARS) - A fast, lightweight MCP server that empowers LLMs with browser automation via Puppeteer’s structured accessibility data, featuring optional vision mode for complex visual understanding and flexible, cross-platform configuration.
TARS is a multimodal AI Agent stack that currently ships two projects: Agent TARS and UI-TARS-desktop.
Agent TARS | UI-TARS-desktop |
---|---|
Agent TARS is a general multimodal AI Agent stack that brings the power of GUI Agent and Vision into your terminal, computer, browser, and product. It primarily ships with a CLI and a Web UI. It aims to provide a workflow closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with various real-world MCP tools. | UI-TARS Desktop is a desktop application that provides a native GUI Agent based on the UI-TARS model. It primarily ships local and remote computer operators as well as browser operators. |
Agent TARS is a general multimodal AI Agent stack that brings the power of GUI Agent and Vision into your terminal, computer, browser, and product.
It primarily ships with a CLI and a Web UI.
It aims to provide a workflow closer to human-like task completion through cutting-edge multimodal LLMs and seamless integration with various real-world MCP tools.
Instruction: Please help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline
https://github.com/user-attachments/assets/772b0eef-aef7-4ab9-8cb0-9611820539d8
Booking Hotel | Generate Chart with extra MCP Servers |
---|---|
Instruction: I am in Los Angeles from September 1st to September 6th, with a budget of $5,000. Please help me book a Ritz-Carlton hotel closest to the airport on booking.com and compile a transportation guide for me | Instruction: Draw me a chart of Hangzhou's weather for one month |
For more use cases, please check out #842.
# Launch with `npx`
npx @agent-tars/cli@latest
# Install globally (requires Node.js >= 22)
npm install @agent-tars/cli@latest -g
# Run with your preferred model provider
agent-tars --provider volcengine --model doubao-1-5-thinking-vision-pro-250428 --apiKey your-api-key
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey your-api-key
Visit the comprehensive Quick Start guide for detailed setup instructions.
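The global install above requires Node.js >= 22. As a minimal sketch, a pre-install check could look like the following. The helper name `node_major_ok` is hypothetical and not part of the Agent TARS CLI, and the sketch assumes version strings of the form `v22.3.0`, as printed by `node --version`.

```shell
#!/bin/sh
# Hypothetical helper (not part of @agent-tars/cli): succeeds when the
# given Node.js version string has a major version of at least 22.
node_major_ok() {
  version="${1#v}"        # strip the leading "v" from e.g. "v22.3.0"
  major="${version%%.*}"  # keep everything before the first dot
  # A non-numeric major makes the test fail, which is treated as "too old".
  [ "$major" -ge 22 ] 2>/dev/null
}

# Usage: check the installed Node.js before installing the CLI.
if command -v node >/dev/null 2>&1 && node_major_ok "$(node --version)"; then
  echo "Node.js $(node --version) is new enough; you can run: npm install @agent-tars/cli@latest -g"
else
  echo "Node.js >= 22 is required; please upgrade before installing." >&2
fi
```

The check only reads the major version, which is all the stated requirement depends on; it does not install or modify anything.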
🌟 Explore Agent TARS Universe 🌟
UI-TARS Desktop is a native GUI agent driven by UI-TARS and the Seed-1.5-VL/1.6 series models, available on your local computer and in a remote VM sandbox in the cloud.
📑 Paper | 🤗 Hugging Face Models | 🫨 Discord | 🤖 ModelScope
🖥️ Desktop Application | 👓 Midscene (use in browser)
Instruction | Local Operator | Remote Operator |
---|---|---|
Please help me open the autosave feature of VS Code and delay AutoSave operations for 500 milliseconds in the VS Code setting. | | |
Could you help me check the latest open issue of the UI-TARS-Desktop project on GitHub? | | |
See Quick Start
See CONTRIBUTING.md.
This project is licensed under the Apache License 2.0.
If you find our paper and code useful in your research, please consider giving the project a star :star: and citing it :pencil:
@article{qin2025ui,
title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
journal={arXiv preprint arXiv:2501.12326},
year={2025}
}