ActPlan‑1K is a multimodal planning benchmark jointly created by the Hong Kong University of Science and Technology and the University of California, San Diego. It evaluates the procedural planning abilities of vision‑language models in household activities. The dataset comprises 153 activities and 1,187 instances, each pairing a natural‑language task description with multiple environment images rendered in the iGibson2 simulator. Instances were created by combining ChatGPT with iGibson2: BDDL activity definitions were converted into natural‑language descriptions, and matching environment images were collected. ActPlan‑1K is primarily used to assess vision‑language models' program planning capabilities in multimodal settings, with particular attention to household activities and counterfactual scenarios.
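To make the instance structure concrete, here is a minimal sketch of how one benchmark instance could be modeled. The class and field names are illustrative assumptions, not the dataset's actual release schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ActPlanInstance:
    """Hypothetical schema for one ActPlan-1K instance (field names are assumptions)."""
    activity: str                  # one of the 153 household activities
    task_description: str          # natural-language description derived from a BDDL definition
    image_paths: List[str]         # environment images rendered by the iGibson2 simulator
    counterfactual: bool = False   # whether this is a counterfactual variant of the activity

# Invented example values, for illustration only:
inst = ActPlanInstance(
    activity="packing_lunch",
    task_description="Pack the sandwich and the apple into the lunch box on the counter.",
    image_paths=["scene_view_0.png", "scene_view_1.png"],
)
print(inst.activity, len(inst.image_paths), inst.counterfactual)
```

A model being evaluated would receive the task description together with the images and produce a step-by-step plan; the `counterfactual` flag marks the scenario variants the benchmark uses to probe reasoning beyond the canonical activity.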