GUI-Odyssey

GUI Odyssey is an extensive dataset for training and evaluating cross‑application navigation agents on mobile devices. It contains 7,735 episodes collected from six types of mobile devices, covering six cross‑application task types, 201 apps, and 1.4 K app combinations. The data structure includes episode ID, device info, task info, total step count, and detailed step records. The dataset supports multiple split strategies—random, task, device, and app splits—to assess agent performance under various conditions. It is released under the Creative Commons Attribution 4.0 International License.

Updated 6/24/2024

huggingface

Description

Dataset Card - GUI Odyssey

Introduction

GUI Odyssey is a comprehensive dataset for training and evaluating cross‑application navigation agents. The dataset contains 7,735 episodes from six mobile devices, covering six types of cross‑application tasks, 201 apps, and 1.4 K app combinations.

Data Structure

Data Fields

Each annotation field is described below:

episode_id (str): Unique identifier for the episode.
device_info (dict): Detailed information about the virtual device that collected the episode.
- product (str): Name of the simulator product.
- release_version (str): Android API level of the simulator.
- sdk_version (str): SDK version used for the simulator.
- h (int): Device screen height.
- w (int): Device screen width.
- device_name (str): Name of the virtual device; one of Pixel Fold, Pixel Tablet, Pixel 8 Pro, Pixel 7 Pro, Medium Phone, Small Phone.
task_info (dict): Detailed information about the task that generated the episode.
- category (str): Task category; one of Multi_Apps, Web_Shopping, General_Tool, Information_Management, Media_Entertainment, Social_Sharing.
- app (list[str]): Applications used for the task.
- meta_task (str): Template of the task, e.g., "Search for the next {} and set a reminder."
- task (str): Specific task created by filling the meta task, e.g., "Search for the next New York Fashion Week and set a reminder."
- instruction (str): Detailed and paraphrased version of the task, possibly mentioning specific tools or apps.
step_length (int): Total number of steps in the episode.
steps (list[dict]): Each individual step in the episode, with the following fields:
- step (int): Zero‑based step number indicating its position in the sequence.
- screenshot (str): Screenshot of the current screen for the step.
- action (str): Action taken at the step, includes CLICK, SCROLL, LONG_PRESS, TYPE, COMPLETE, IMPOSSIBLE, HOME, BACK.
- info (Union[str, list[list]]): Detailed information required to perform the action. All coordinates are normalized to the [0, 1000] range.
  - For CLICK, info contains the click coordinates (x, y) or special keys KEY_HOME, KEY_BACK, KEY_RECENT.
  - For LONG_PRESS, info contains the long‑press coordinates (x, y).
  - For SCROLL, info contains start (x1, y1) and end (x2, y2) coordinates.
  - For other values, info is empty ("").
- ps (str): Additional details or context based on the action value.
  - For COMPLETE or IMPOSSIBLE, may contain annotator comments on why the task was completed or impossible.
  - For SCROLL, contains the full scroll trajectory.
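
As a minimal sketch of how the info field might be consumed, the snippet below converts the normalized [0, 1000] coordinates of a step into pixel positions using the w and h values from device_info. The helper name to_pixels, the sample values, and the assumption that coordinates arrive as a nested list such as [[x, y]] (suggested by the list[list] type, but not confirmed here) are illustrative; check a real annotation file before relying on the exact layout.

```python
from typing import Union

NORM_MAX = 1000  # coordinates are normalized to the [0, 1000] range


def to_pixels(info: Union[str, list], width: int, height: int) -> list:
    """Convert normalized (x, y) pairs in `info` to pixel coordinates.

    String values (special keys such as KEY_HOME, or "") carry no
    coordinates, so an empty list is returned for them.
    """
    if isinstance(info, str):
        return []
    return [
        (round(x * width / NORM_MAX), round(y * height / NORM_MAX))
        for x, y in info
    ]


# Hypothetical step records, shaped after the field descriptions above.
device_info = {"w": 1080, "h": 2400}
click_step = {"step": 0, "action": "CLICK", "info": [[500, 250]]}
scroll_step = {"step": 1, "action": "SCROLL", "info": [[500, 800], [500, 200]]}
home_step = {"step": 2, "action": "CLICK", "info": "KEY_HOME"}

click_px = to_pixels(click_step["info"], device_info["w"], device_info["h"])
scroll_px = to_pixels(scroll_step["info"], device_info["w"], device_info["h"])
home_px = to_pixels(home_step["info"], device_info["w"], device_info["h"])
```

Because the coordinates are normalized, the same step record can be replayed on any of the six device sizes by passing that device's w and h.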

Data Splits

The GUI Odyssey dataset can be split in four ways to evaluate in‑domain and out‑of‑domain performance:

random_split: Randomly divide the dataset into training and testing sets with a 3:1 ratio.
task_split: Sample meta‑tasks proportionally from six categories. Tasks in the test set differ significantly from those in the training set.
device_split: Use the episodes annotated on the Fold Phone (listed as Pixel Fold under device_name) as the test set. This device differs markedly from the other smartphones and tablets.
app_split: Split based on applications. Applications in the test set differ significantly from those in the training set.
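
The random and device strategies can be approximated in a few lines. This is only an illustrative sketch: the released split files remain the authoritative partition, and the function names, the fixed seed, the toy episode list, and the exact device_name string used for the foldable device are assumptions made here. Each episode is assumed to be a dict carrying episode_id and device_info.device_name as described under Data Fields.

```python
import random


def random_split(episode_ids, train_parts=3, test_parts=1, seed=0):
    """Shuffle ids and cut them at a train:test ratio (3:1 by default)."""
    ids = list(episode_ids)
    random.Random(seed).shuffle(ids)
    cut = len(ids) * train_parts // (train_parts + test_parts)
    return {"train": ids[:cut], "test": ids[cut:]}


def device_split(episodes, test_device="Pixel Fold"):
    """Hold out every episode recorded on `test_device` as the test set."""
    split = {"train": [], "test": []}
    for ep in episodes:
        is_test = ep["device_info"]["device_name"] == test_device
        split["test" if is_test else "train"].append(ep["episode_id"])
    return split


# Toy episodes with made-up ids, only to exercise the two helpers.
devices = ["Pixel Fold", "Pixel Tablet", "Pixel 8 Pro", "Pixel 7 Pro"]
episodes = [
    {"episode_id": f"ep{i:03d}", "device_info": {"device_name": devices[i % 4]}}
    for i in range(8)
]

by_random = random_split(ep["episode_id"] for ep in episodes)
by_device = device_split(episodes)
```

The task and app splits are not sketched because the card does not say how the "significantly different" tasks and applications were chosen.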

Each split corresponds to a JSON file with the following fields:

train (list[str]): List of annotation file names for the training set, equivalent to episode_id.
test (list[str]): List of annotation file names for the test set, equivalent to episode_id.
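
Reading a split file could look roughly like the sketch below. The file name random_split.json, the annotations directory, the ".json" suffix handling, and the helper names are all assumptions for illustration; only the train and test keys come from the description above.

```python
import json
import tempfile
from pathlib import Path


def load_split(split_path):
    """Read a split JSON file and return its train and test name lists."""
    with open(split_path, encoding="utf-8") as f:
        split = json.load(f)
    return split["train"], split["test"]


def annotation_paths(names, annotation_dir):
    """Map annotation file names to paths under `annotation_dir`.

    Appending ".json" when it is missing is an assumption about naming.
    """
    root = Path(annotation_dir)
    return [root / (n if n.endswith(".json") else f"{n}.json") for n in names]


# Demo with a throwaway split file; real file names and ids will differ.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "random_split.json"
    demo.write_text(
        json.dumps({"train": ["ep001", "ep002", "ep003"], "test": ["ep004"]}),
        encoding="utf-8",
    )
    train_names, test_names = load_split(demo)
    train_files = annotation_paths(train_names, Path(tmp) / "annotations")
```

Keeping the split as a list of names rather than copies of the annotations means the same episode files can serve all four split strategies.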

License Information

The dataset is licensed under the Creative Commons Attribution 4.0 International License.

Disclaimer

The dataset is intended for research purposes only. We strongly oppose any harmful use of the data or technology.

Citation

@misc{lu2024gui,
  title={GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices},
  author={Quanfeng Lu and Wenqi Shao and Zitao Liu and Fanqing Meng and Boxuan Li and Botong Chen and Siyuan Huang and Kaipeng Zhang and Yu Qiao and Ping Luo},
  year={2024},
  eprint={2406.08451},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

Topics

Mobile Device Navigation

Application Interaction Analysis

Source

Organization: huggingface

Created: 6/13/2024
