
feat: add GetFile, GetFileList, ClearVirtualDisplays agent OS commands #277

Open
mlikasam-askui wants to merge 5 commits into main from feat/agent-os-getfile-getfilelist-clear-virtual-displays


@mlikasam-askui
Contributor

Jira: SOLENG-363, SOLENG-362

mlikasam-askui and others added 5 commits May 13, 2026 08:46
Wire up three new commands on AskUiControllerClient via _send_command:

- get_file_list(path): returns the list of file paths at the given
  directory on the device under automation.
- get_file(path): decodes the base64 payload returned by the controller
  and dispatches to Image.Image when the bytes parse as an image, or to
  str when they decode cleanly as UTF-8 text. Anything else raises a
  DesktopAgentOsError, matching the server contract that always
  base64-encodes the raw file bytes.
- clear_virtual_displays(): removes all virtual displays, leaving only
  real ones active.
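The get_file dispatch described in the commit message (base64-decode, try to parse an image, fall back to UTF-8 text, otherwise raise) can be sketched as below. This is a minimal sketch assuming Pillow is installed; `decode_file_payload` and the local `DesktopAgentOsError` class are hypothetical stand-ins, not the PR's actual code.

```python
from __future__ import annotations

import base64
import binascii
import io

from PIL import Image


class DesktopAgentOsError(Exception):
    """Hypothetical stand-in for the project's error type named in the PR."""


def decode_file_payload(base64_data: str) -> Image.Image | str:
    """Decode a base64 file payload into a PIL image or UTF-8 text (sketch)."""
    # The server contract always base64-encodes the raw file bytes.
    try:
        raw = base64.b64decode(base64_data, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise DesktopAgentOsError("payload is not valid base64") from exc

    # 1) Try to interpret the bytes as an image. load() forces a full decode,
    #    so bogus or truncated data fails here. Pillow's UnidentifiedImageError
    #    is a subclass of OSError, so catching OSError covers both cases.
    try:
        image = Image.open(io.BytesIO(raw))
        image.load()
        return image
    except OSError:
        pass

    # 2) Fall back to UTF-8 text; anything else is an error.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise DesktopAgentOsError("file is neither an image nor UTF-8 text") from exc
```

Calling `load()` makes Pillow decode the pixel data immediately, so a payload that only starts like an image but is truncated typically fails in the image branch instead of surfacing later in the reporter.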

```python
class RemoveVirtualDisplaysCommand(BaseModel):
    name: Literal['RemoveVirtualDisplays'] = 'RemoveVirtualDisplays'
    parameters: List[Any] = []
```
Contributor

Can we specify a concrete type here instead of using `Any`?
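One way to tighten the model is sketched below, assuming Pydantic is installed and assuming (hypothetically) that the controller's parameter list carries strings; the actual element type expected by the controller is not stated in the PR. `default_factory=list` makes the empty default explicit.

```python
from typing import List, Literal

from pydantic import BaseModel, Field


class RemoveVirtualDisplaysCommand(BaseModel):
    """Sketch: same command, with a concrete parameter type instead of Any."""

    name: Literal["RemoveVirtualDisplays"] = "RemoveVirtualDisplays"
    # Assumption: parameters are strings; this command sends none by default.
    parameters: List[str] = Field(default_factory=list)
```

With a concrete element type, Pydantic rejects payloads such as `parameters=[{"x": 1}]` at construction time instead of forwarding them to the controller.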

```python
self._start_session()
self._start_execution()
self.set_display(self._display)
if self._settings.clean_virtual_displays:
```
Contributor

Shall we add an INFO log message here?
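A minimal sketch of the suggested INFO log, using the standard `logging` module. The function `maybe_clear_virtual_displays`, the `client` argument, and the logger name are hypothetical; the sketch only assumes the client exposes the `clear_virtual_displays()` method described in the commit message.

```python
import logging

logger = logging.getLogger("askui.agent_os.sketch")  # hypothetical logger name


def maybe_clear_virtual_displays(client, clean_virtual_displays: bool) -> None:
    """Clear virtual displays when the setting is enabled, logging at INFO first."""
    if clean_virtual_displays:
        # Log before acting so the reason for the display change shows up in the run log.
        logger.info(
            "clean_virtual_displays is enabled; removing all virtual displays "
            "so only real displays remain active"
        )
        client.clear_virtual_displays()
```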


```python
@staticmethod
def _decode_file_payload(base64_data: str) -> Image.Image | str:
    try:
```
Contributor

On one hand, this avoids the need for a "magic" set of file endings (extensions) that we convert; on the other, it might be inefficient. I'm not sure how much time/compute this consumes for huge text files that are clearly not images.
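A possible middle ground between a hard-coded extension list and handing every payload to the image parser is to sniff the leading bytes for well-known image signatures first. This stdlib-only sketch inspects at most the first 12 bytes, so the check itself costs the same for a 1 KB or a 1 GB file (the base64 decode of the full payload still happens regardless). `looks_like_image` is a hypothetical helper, and the signature list covers common formats only.

```python
from __future__ import annotations

# Leading-byte signatures of common image formats.
_SIGNATURES: tuple[bytes, ...] = (
    b"\x89PNG\r\n\x1a\n",   # PNG
    b"\xff\xd8\xff",         # JPEG
    b"GIF87a", b"GIF89a",    # GIF
    b"BM",                   # BMP (short prefix, so false positives are possible)
    b"II*\x00", b"MM\x00*",  # TIFF, little- and big-endian
    b"\x00\x00\x01\x00",     # ICO
)


def looks_like_image(raw: bytes) -> bool:
    """Cheap pre-filter: True if the bytes start like a known image format."""
    if raw.startswith(_SIGNATURES):
        return True
    # WebP is a RIFF container: "RIFF" <4-byte size> "WEBP".
    return raw[:4] == b"RIFF" and raw[8:12] == b"WEBP"
```

A decoder would then call Pillow only when `looks_like_image(raw)` is true and go straight to UTF-8 decoding otherwise. Pillow still validates the candidates, so a false positive such as a text file starting with "BM" just falls through to the text branch; the cost of the pre-filter is that image formats missing from the list would no longer be recognized.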

```python
detail = f"image ({decoded.format}, {decoded.size[0]}x{decoded.size[1]})"
self._reporter.add_message(
    "AgentOS", f"get_file({path}) -> {detail}", decoded
)
```
Contributor

Why not add the PIL image here as well?

Contributor Author

`decoded` (the third argument to `add_message`) is already the PIL image.

```python
        "controller is using the real display"
    )
)
self.set_display(1)
```
Contributor

Will this become a problem if, in the act command, I set removing virtual displays to true but set the display ID to 2?

```python
    },
    agent_os=agent_os,
    required_tags=[ToolTags.SCALED_AGENT_OS.value],
)
```
Contributor

Please add the `self.is_cacheable = True` flag to `__init__`.

```python
    },
    agent_os=agent_os,
)
```

Contributor

Please add the `self.is_cacheable = True` flag to `__init__`.

@philipph-askui
Contributor

What happens if I run the current controller on macOS and the agent tries to call read_file? I assume it will crash. Can we add a warning to the two new tools that they are currently only available on Windows and will cause the execution to crash on macOS/Linux?

Should we also update the docs and add a hint that, e.g., the read_file operation is currently only available on Windows?

@mlikasam-askui
Contributor Author

> What happens if I have the current controller on mac and the agent tries to call the read_file, I assume it will crash? Can we add a warning to the 2 new tools that they are currently only available under windows and that they will cause the execution to crash under mac/linux?
>
> should we need to update the docs and add a hint, that e.g. the read_file operation is only available on windows currently?

It won’t crash. The agent will receive an error indicating that the controller currently in use does not support the requested command.

