Description
Summary
The UITARS action space defines left_double and right_single, but convert_to_computer_actions only handles double_click and right_click.
Since the model emits the former names, these actions are silently dropped.
When this happens, the fallback logic in convert_uitars_messages_to_litellm appends a list of raw strings to content, which later causes huggingfacelocal_adapter._convert_messages to crash with 'str' object has no attribute 'get'.
Steps to Reproduce
- Update the test prompt in tests/agent_loop_testing/agent_test.py:
  message = "Open Safari browser by double click"
- Run:
  uv run tests/agent_loop_testing/agent_test.py --model "huggingface-local/ByteDance-Seed/UI-TARS-1.5-7B"
Expected Behavior
- left_double should correctly translate into a double-click action.
- right_single should translate into a right-click action.
- Message conversion should never introduce raw strings into content, ensuring downstream adapters can parse messages correctly.
Actual Behavior
Actions are ignored, resulting in no computer events being executed.
During the next conversion pass, huggingfacelocal_adapter._convert_messages crashes because content contains raw strings instead of structured message objects.
🔥Error Logs
🤖 Testing CUA Agent: huggingface-local//home/liangxiao/NAS/ByteDance-Seed/UI-TARS-1.5-7B
==================================================
✅ CUA Agent created
✅ Mock computer ready
🚀 Running agent...
Iteration 1:
Agent:
Unknown output type
Iteration 2:
(No output from agent)
Debug - Full result: {'output': [], 'usage': Usage(completion_tokens=0, prompt_tokens=0, total_tokens=0)}
Iteration 3:
(No output from agent)
Debug - Full result: {'output': [], 'usage': Usage(completion_tokens=0, prompt_tokens=0, total_tokens=0)}
❌ Test failed: litellm.APIConnectionError: 'str' object has no attribute 'get'
Traceback (most recent call last):
File "huggingfacelocal_adapter.py", line 70, in _convert_messages
if item.get("type") == "text":
^^^^^^^^
AttributeError: 'str' object has no attribute 'get'
LiteLLM Retried: 3 times
Root Cause
- convert_to_computer_actions does not map UITARS-native actions (left_double, right_single) to the expected internal action names.
- When the model emits these actions, they are skipped, resulting in no structured action blocks.
- The fallback in convert_uitars_messages_to_litellm appends current_assistant_content as a list of raw strings, breaking the message schema and causing _convert_messages to fail.
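The crash can be illustrated in isolation. The snippet below is a minimal sketch, not the adapter's actual code: extract_text is a hypothetical stand-in that only mirrors the item.get("type") check shown in the traceback, and the sample message contents are made up for illustration.

```python
def extract_text(content):
    """Hypothetical stand-in for the loop in _convert_messages.

    It assumes every item in `content` is a dict with a "type" key,
    which is the same assumption the traceback shows being violated.
    """
    texts = []
    for item in content:
        if item.get("type") == "text":
            texts.append(item.get("text", ""))
    return texts


# Structured content: each item is a dict, so .get() works.
good_content = [{"type": "text", "text": "Thought: open Safari"}]

# Content as produced by the fallback: a list of raw strings.
bad_content = ["Thought: open Safari"]

print(extract_text(good_content))

try:
    extract_text(bad_content)
except AttributeError as exc:
    # Same failure mode as in the error log above.
    print(f"AttributeError: {exc}")
```

Any consumer that treats content items as dicts fails the same way, which is why the schema has to be preserved at the point where the message is built.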
Fix
- In convert_to_computer_actions: map left_double and right_single to double-click/right-click.
- In convert_uitars_messages_to_litellm: wrap trailing current_assistant_content into a text block: {"role": "assistant", "content": [{"type": "text", "text": "\n".join(...)}]} to keep formats consistent.
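A rough sketch of both changes follows. It assumes the existing functions can be adapted along these lines. UITARS_ACTION_ALIASES, normalize_action_type, and flush_assistant_content are hypothetical names introduced here for illustration and do not come from uitars.py. Only the action names and the message shape are taken from this report.

```python
# Hypothetical alias table: UITARS-native action names mapped to the
# names that convert_to_computer_actions is reported to handle already.
UITARS_ACTION_ALIASES = {
    "left_double": "double_click",
    "right_single": "right_click",
}


def normalize_action_type(action_type):
    """Return the internal action name, leaving unknown names unchanged."""
    return UITARS_ACTION_ALIASES.get(action_type, action_type)


def flush_assistant_content(messages, current_assistant_content):
    """Append trailing assistant text as one structured text block.

    `current_assistant_content` is assumed to be a list of strings,
    as described in the root cause.
    """
    if current_assistant_content:
        messages.append(
            {
                "role": "assistant",
                "content": [
                    {"type": "text", "text": "\n".join(current_assistant_content)}
                ],
            }
        )
    return messages


print(normalize_action_type("left_double"))
print(flush_assistant_content([], ["Thought: open Safari", "Action: left_double"]))
```

Normalizing the name before the existing dispatch leaves the double_click and right_click branches untouched, and joining the strings into a single text block means downstream adapters only ever see dict items in content.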
References
libs/python/agent/agent/loops/uitars.py (action conversion + message conversion)