GUI-Odyssey

GUI Odyssey is an extensive dataset for training and evaluating cross-application navigation agents on mobile devices. It contains 7,735 episodes collected from six types of mobile devices, covering six cross-application task types, 201 apps, and 1.4 K app combinations. Each episode records an episode ID, device info, task info, the total step count, and detailed step records. Four split strategies (random, task, device, and app) support evaluating agents under different conditions. The dataset is released under the Creative Commons Attribution 4.0 International License.

Source: huggingface
Created: Jun 13, 2024
Updated: Jun 24, 2024

Dataset Card - GUI Odyssey

Introduction

GUI Odyssey is a comprehensive dataset for training and evaluating cross‑application navigation agents. The dataset contains 7,735 episodes from six mobile devices, covering six types of cross‑application tasks, 201 apps, and 1.4 K app combinations.

Data Structure

Data Fields

Each annotation field is described below:

  • episode_id (str): Unique identifier for the episode.
  • device_info (dict): Detailed information about the virtual device that collected the episode.
    • product (str): Name of the simulator product.
    • release_version (str): Android API level of the simulator.
    • sdk_version (str): SDK version used for the simulator.
    • h (int): Device screen height.
    • w (int): Device screen width.
    • device_name (str): Name of the virtual device: Pixel Fold, Pixel Tablet, Pixel 8 Pro, Pixel 7 Pro, Medium Phone, or Small Phone.
  • task_info (dict): Detailed information about the task that generated the episode.
    • category (str): Task category: Multi_Apps, Web_Shopping, General_Tool, Information_Management, Media_Entertainment, or Social_Sharing.
    • app (list[str]): Applications used for the task.
    • meta_task (str): Template of the task, e.g., "Search for the next {} and set a reminder."
    • task (str): Specific task created by filling the meta task, e.g., "Search for the next New York Fashion Week and set a reminder."
    • instruction (str): Detailed and paraphrased version of the task, possibly mentioning specific tools or apps.
  • step_length (int): Total number of steps in the episode.
  • steps (list[dict]): Each individual step in the episode, with the following fields:
    • step (int): Zero‑based step number indicating its position in the sequence.
    • screenshot (str): Screenshot of the current screen for the step.
    • action (str): Action taken at the step; values include CLICK, SCROLL, LONG_PRESS, TYPE, COMPLETE, IMPOSSIBLE, HOME, BACK.
    • info (Union[str, list[list]]): Detailed information required to perform the action. All coordinates are normalized to the [0, 1000] range.
      • For CLICK, info contains the click coordinates (x, y) or special keys KEY_HOME, KEY_BACK, KEY_RECENT.
      • For LONG_PRESS, info contains the long‑press coordinates (x, y).
      • For SCROLL, info contains start (x1, y1) and end (x2, y2) coordinates.
      • For all other actions, info is an empty string ("").
    • ps (str): Additional details or context based on the action value.
      • For COMPLETE or IMPOSSIBLE, may contain annotator comments on why the task was completed or impossible.
      • For SCROLL, contains the full scroll trajectory.
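
As a rough illustration of the fields above, the following Python sketch maps the normalized coordinates in info back to pixels using the w and h values from device_info. The helper names (to_pixels, describe_step), the sample records, and the assumption that coordinate-valued info is stored as a list of [x, y] pairs are all hypothetical; check the actual annotation files before relying on this layout.

```python
from typing import List, Sequence, Tuple, Union

NORM_MAX = 1000  # per the field description, coordinates lie in [0, 1000]


def to_pixels(point: Sequence[float], width: int, height: int) -> Tuple[int, int]:
    """Map one normalized (x, y) pair to pixel coordinates."""
    x, y = point
    return round(x * width / NORM_MAX), round(y * height / NORM_MAX)


def describe_step(step: dict, device_info: dict) -> str:
    """Return a short readable summary of one step record."""
    action = step["action"]
    info: Union[str, List[List[float]]] = step["info"]
    if isinstance(info, str):
        # Either a special key such as KEY_HOME or the empty string.
        return f"{action} {info}".strip()
    # Assumption: coordinate info is a list of [x, y] pairs.
    width, height = device_info["w"], device_info["h"]
    pixels = [to_pixels(p, width, height) for p in info]
    return f"{action} {pixels}"


# Hypothetical records, made up for illustration only.
device = {"w": 1080, "h": 2400}
steps = [
    {"step": 0, "action": "CLICK", "info": [[500, 250]]},
    {"step": 1, "action": "SCROLL", "info": [[500, 800], [500, 200]]},
    {"step": 2, "action": "CLICK", "info": "KEY_HOME"},
    {"step": 3, "action": "COMPLETE", "info": ""},
]
for s in steps:
    print(describe_step(s, device))
```

Keeping the conversion in one small function makes it easy to adjust if the stored coordinate layout turns out to differ from the assumption made here.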

Data Splits

The GUI Odyssey dataset can be split in four ways to evaluate in-domain and out-of-domain performance:

  • random_split: Randomly divide the dataset into training and testing sets with a 3:1 ratio.
  • task_split: Sample meta‑tasks proportionally from six categories. Tasks in the test set differ significantly from those in the training set.
  • device_split: Use episodes annotated on the foldable phone (Pixel Fold) as the test set; this device differs markedly from the others, such as smartphones and tablets.
  • app_split: Split based on applications. Applications in the test set differ significantly from those in the training set.
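
To make the first and third strategies concrete, here is a minimal Python sketch of a 3:1 random split and a device-based split. It is not the authors' procedure: the function names, the fixed seed, and the toy episodes are assumptions for illustration, and the task and app splits are omitted because the text does not say how their test sets were selected.

```python
import random
from typing import Dict, Iterable, List


def random_split(episode_ids: Iterable[str], train_ratio: float = 0.75,
                 seed: int = 0) -> Dict[str, List[str]]:
    """Shuffle episode IDs and cut them 3:1 into train and test."""
    ids = sorted(episode_ids)          # sort first so the result is reproducible
    random.Random(seed).shuffle(ids)   # seeded shuffle (seed is an assumption)
    cut = round(len(ids) * train_ratio)
    return {"train": ids[:cut], "test": ids[cut:]}


def device_split(episodes: Iterable[dict],
                 test_device: str = "Pixel Fold") -> Dict[str, List[str]]:
    """Put every episode recorded on test_device into the test set."""
    split: Dict[str, List[str]] = {"train": [], "test": []}
    for ep in episodes:
        is_test = ep["device_info"]["device_name"] == test_device
        split["test" if is_test else "train"].append(ep["episode_id"])
    return split


# Toy episodes, made up for illustration only.
toy = [
    {"episode_id": f"ep{i}", "device_info": {"device_name": name}}
    for i, name in enumerate(
        ["Pixel Fold", "Pixel 8 Pro", "Pixel Tablet", "Pixel Fold",
         "Small Phone", "Medium Phone", "Pixel 7 Pro", "Pixel 8 Pro"]
    )
]
by_random = random_split(ep["episode_id"] for ep in toy)
by_device = device_split(toy)
print(len(by_random["train"]), len(by_random["test"]))
print(by_device["test"])
```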

Each split corresponds to a JSON file with the following fields:

  • train (list[str]): List of annotation file names for the training set, equivalent to episode_id.
  • test (list[str]): List of annotation file names for the test set, equivalent to episode_id.
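
A split file with this shape could be read along the following lines. This is a sketch under stated assumptions: the file name random_split.json, the helper load_split, and the demo IDs are hypothetical, and the overlap check is an added sanity test rather than something the dataset card specifies.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Tuple


def load_split(path) -> Tuple[List[str], List[str]]:
    """Read a split JSON file and return its (train, test) lists."""
    with open(path, encoding="utf-8") as f:
        split = json.load(f)
    train, test = split["train"], split["test"]
    overlap = set(train) & set(test)
    if overlap:
        # Sanity check: no episode should be in both sets.
        raise ValueError(f"{len(overlap)} entries appear in both train and test")
    return train, test


# Demo with a temporary file; the contents are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "random_split.json"  # hypothetical file name
    demo_path.write_text(
        json.dumps({"train": ["ep_a", "ep_b", "ep_c"], "test": ["ep_d"]}),
        encoding="utf-8",
    )
    train_ids, test_ids = load_split(demo_path)

print(len(train_ids), len(test_ids))
```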

License Information

The dataset is licensed under the Creative Commons Attribution 4.0 International License.

Disclaimer

The dataset is intended for research purposes only. We strongly oppose any harmful use of the data or technology.

Citation

@misc{lu2024gui,
  title={GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices},
  author={Quanfeng Lu and Wenqi Shao and Zitao Liu and Fanqing Meng and Boxuan Li and Botong Chen and Siyuan Huang and Kaipeng Zhang and Yu Qiao and Ping Luo},
  year={2024},
  eprint={2406.08451},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
