GUI-Odyssey
GUI Odyssey is an extensive dataset for training and evaluating cross‑application navigation agents on mobile devices. It contains 7,735 episodes collected from six types of mobile devices, covering six cross‑application task types, 201 apps, and 1.4K app combinations. Each episode record includes an episode ID, device info, task info, total step count, and detailed step records. The dataset supports four split strategies (random, task, device, and app splits) to assess agent performance under various conditions. It is released under the Creative Commons Attribution 4.0 International License.
Description
Dataset Card - GUI Odyssey
Introduction
GUI Odyssey is a comprehensive dataset for training and evaluating cross‑application navigation agents. The dataset contains 7,735 episodes from six mobile devices, covering six types of cross‑application tasks, 201 apps, and 1.4 K app combinations.
Data Structure
Data Fields
Each annotation field is described below:
- episode_id (str): Unique identifier for the episode.
- device_info (dict): Detailed information about the virtual device that collected the episode.
  - product (str): Name of the simulator product.
  - release_version (str): Android API level of the simulator.
  - sdk_version (str): SDK version used for the simulator.
  - h (int): Device screen height.
  - w (int): Device screen width.
  - device_name (str): Name of the virtual device, one of Pixel Fold, Pixel Tablet, Pixel 8 Pro, Pixel 7 Pro, Medium Phone, Small Phone.
- task_info (dict): Detailed information about the task that generated the episode.
  - category (str): Task category, one of Multi_Apps, Web_Shopping, General_Tool, Information_Management, Media_Entertainment, Social_Sharing.
  - app (list[str]): Applications used for the task.
  - meta_task (str): Template of the task, e.g., "Search for the next {} and set a reminder."
  - task (str): Specific task created by filling the meta task, e.g., "Search for the next New York Fashion Week and set a reminder."
  - instruction (str): Detailed and paraphrased version of the task, possibly mentioning specific tools or apps.
- step_length (int): Total number of steps in the episode.
- steps (list[dict]): Each individual step in the episode, with the following fields:
  - step (int): Zero‑based step number indicating its position in the sequence.
  - screenshot (str): Screenshot of the current screen for the step.
  - action (str): Action taken at the step, one of CLICK, SCROLL, LONG_PRESS, TYPE, COMPLETE, IMPOSSIBLE, HOME, BACK.
  - info (Union[str, list[list]]): Detailed information required to perform the action. All coordinates are normalized to the [0, 1000] range.
    - For CLICK, info contains the click coordinates (x, y) or one of the special keys KEY_HOME, KEY_BACK, KEY_RECENT.
    - For LONG_PRESS, info contains the long‑press coordinates (x, y).
    - For SCROLL, info contains the start (x1, y1) and end (x2, y2) coordinates.
    - For other values, info is empty ("").
  - ps (str): Additional details or context based on the action value.
    - For COMPLETE or IMPOSSIBLE, may contain annotator comments on why the task was completed or impossible.
    - For SCROLL, contains the full scroll trajectory.
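As an illustration of the fields above, the following minimal Python sketch converts the normalized [0, 1000] coordinates in info back to pixel positions using the h and w values from device_info. It assumes that coordinate-valued info is stored as a list of [x, y] pairs (one pair for CLICK and LONG_PRESS, two for SCROLL); the card only states the type as list[list], so this nesting is an assumption. The sample episode and the helper names to_pixels and describe_step are made up for the example and are not part of the dataset.

```python
NORM_MAX = 1000  # coordinates are normalized to the [0, 1000] range


def to_pixels(point, width, height):
    """Map one normalized (x, y) pair to pixel coordinates."""
    x, y = point
    return round(x * width / NORM_MAX), round(y * height / NORM_MAX)


def describe_step(step, width, height):
    """Return a short text description of one step record."""
    action, info = step["action"], step["info"]
    if isinstance(info, str):
        # Either empty ("") or a special key such as KEY_HOME.
        return f"{action} {info}".strip()
    # Assumed layout: a list of [x, y] pairs.
    points = [to_pixels(p, width, height) for p in info]
    return f"{action} " + " -> ".join(f"({x}, {y})" for x, y in points)


# Hypothetical episode, trimmed to the fields used here.
episode = {
    "episode_id": "example_episode",
    "device_info": {"h": 2000, "w": 1000},
    "step_length": 3,
    "steps": [
        {"step": 0, "action": "CLICK", "info": [[500, 250]], "ps": ""},
        {"step": 1, "action": "SCROLL", "info": [[500, 800], [500, 200]], "ps": ""},
        {"step": 2, "action": "COMPLETE", "info": "", "ps": ""},
    ],
}

width = episode["device_info"]["w"]
height = episode["device_info"]["h"]
descriptions = [describe_step(s, width, height) for s in episode["steps"]]
for line in descriptions:
    print(line)
```

If the real annotation files nest info differently, only the branch marked "Assumed layout" should need to change.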
Data Splits
The GUI Odyssey dataset can be split in four ways to evaluate in‑domain and out‑of‑domain performance:
- random_split: Randomly divide the dataset into training and testing sets with a 3:1 ratio.
- task_split: Sample meta‑tasks proportionally from six categories. Tasks in the test set differ significantly from those in the training set.
- device_split: Use the episodes annotated on the Fold Phone as the test set; this device differs markedly from the others, such as smartphones and tablets.
- app_split: Split based on applications. Applications in the test set differ significantly from those in the training set.
Each split corresponds to a JSON file with the following fields:
- train (list[str]): List of annotation file names for the training set, equivalent to episode_id.
- test (list[str]): List of annotation file names for the test set, equivalent to episode_id.
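A split file with these two fields could be consumed roughly as follows. This is a sketch only: the JSON text is a made-up stand-in for a real split file, the episode IDs are hypothetical, and the helper names load_split and select_episodes are illustrative rather than part of any released tooling.

```python
import json


def load_split(text):
    """Parse a split JSON document into (train_ids, test_ids) sets."""
    split = json.loads(text)
    train_ids, test_ids = set(split["train"]), set(split["test"])
    overlap = train_ids & test_ids
    if overlap:
        # A valid split should not place an episode in both sets.
        raise ValueError(f"episodes in both train and test: {sorted(overlap)}")
    return train_ids, test_ids


def select_episodes(episodes, wanted_ids):
    """Keep only the episodes whose episode_id is in wanted_ids."""
    return [ep for ep in episodes if ep["episode_id"] in wanted_ids]


# Hypothetical split file content and episode records.
split_text = json.dumps({"train": ["ep_a", "ep_b", "ep_c"], "test": ["ep_d"]})
episodes = [{"episode_id": name} for name in ["ep_a", "ep_b", "ep_c", "ep_d"]]

train_ids, test_ids = load_split(split_text)
train_set = select_episodes(episodes, train_ids)
test_set = select_episodes(episodes, test_ids)
print(len(train_set), len(test_set))
```

For a file on disk, the text would come from reading the chosen split file before calling load_split.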
License Information
The dataset is licensed under the Creative Commons Attribution 4.0 International License.
Disclaimer
The dataset is intended for research purposes only. We strongly oppose any harmful use of the data or technology.
Citation
@misc{lu2024gui,
  title={GUI Odyssey: A Comprehensive Dataset for Cross‑App GUI Navigation on Mobile Devices},
  author={Quanfeng Lu and Wenqi Shao and Zitao Liu and Fanqing Meng and Boxuan Li and Botong Chen and Siyuan Huang and Kaipeng Zhang and Yu Qiao and Ping Luo},
  year={2024},
  eprint={2406.08451},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
Source
Organization: huggingface
Created: 6/13/2024