Intentonomy
Intentonomy is a dataset of 14,455 images created jointly by Cornell University and Facebook AI to understand and analyze human intent behind social‑media images. The images span everyday scenarios and are manually annotated with 28 intent categories using a psychology‑based taxonomy. Labels were collected via a novel “purpose game” on Amazon Mechanical Turk. The dataset supports tasks such as fake‑news detection and improving vision systems’ understanding of human intent.
Description
Dataset Overview
- Name: Intentonomy
- Content: 14 K images manually labeled with 28 intent categories, organized in a hierarchical structure by psychology experts.
- Download: See DATA.md.
Annotation Method
- Method: “Purpose Game” approach, gathering intent annotations via Amazon Mechanical Turk.
- Details: See Appendix C of the paper.
Research Content
Relationship Between Image Content and Human Intent
- Goal: Explore the subtle link between visual content and intent.
- Findings:
  - Different intent categories rely on distinct objects and scenes for recognition.
  - For categories with large intra‑class variation, visual content offers limited performance gains.
  - Focusing on relevant objects and scene categories positively influences intent recognition.
Intent Recognition Baselines
- Framework: Introduces weakly‑supervised localization and auxiliary label modeling to narrow the gap between human and machine image understanding.
- Implementation: The localization loss is provided in `loc_loss.py`; download the image masks and set `MASK_ROOT` accordingly.
- Dependencies: Requires `cv2` and `pycocotools`.
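The repository's actual objective lives in `loc_loss.py`; as a rough illustration of what a weakly‑supervised localization loss does, the sketch below penalizes attention mass that falls outside an annotated mask. The function name, array shapes, and normalization here are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def localization_loss(attention, mask, eps=1e-8):
    """Illustrative localization loss (hypothetical, not loc_loss.py):
    the fraction of the model's attention that falls outside the
    annotated object/scene mask.

    attention: (H, W) non-negative attention map from the visual model.
    mask:      (H, W) binary mask, 1 = intent-relevant region.
    Returns a scalar in [0, 1]; 0 means all attention is inside the mask.
    """
    attention = attention / (attention.sum() + eps)  # normalize to a distribution
    return float((attention * (1.0 - mask)).sum())

# Toy example: attention concentrated inside the mask yields a low loss.
att = np.zeros((4, 4)); att[1:3, 1:3] = 1.0
msk = np.zeros((4, 4)); msk[1:3, 1:3] = 1.0
print(localization_loss(att, msk))  # ~0.0: all attention inside the mask
```

Minimizing such a term nudges the backbone toward the objects and scene regions that the study found informative for intent.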
Intent Category Sub‑division
- Basis:
  - Content Dependency: Object‑dependent (O‑classes), Context‑dependent (C‑classes), and Others.
  - Difficulty: “Easy”, “Medium”, and “Hard”, based on the performance gap between visual models and random chance.
- Details: See Appendix A of the paper.
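The difficulty split can be pictured as a simple bucketing of the model‑vs‑chance gap. The thresholds below are illustrative placeholders, not the paper's criteria (see Appendix A for those):

```python
def assign_difficulty(model_f1, chance_f1, easy_gap=0.25, hard_gap=0.05):
    """Bucket an intent class by the gap between model F1 and chance F1.

    easy_gap and hard_gap are hypothetical thresholds for illustration;
    the paper's actual cut-offs are given in Appendix A.
    """
    gap = model_f1 - chance_f1
    if gap >= easy_gap:
        return "Easy"
    if gap >= hard_gap:
        return "Medium"
    return "Hard"

print(assign_difficulty(0.55, 0.10))  # large gap -> "Easy"
print(assign_difficulty(0.12, 0.10))  # near-chance -> "Hard"
```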
Baseline Results
Validation Set
| Model | Macro F1 | Micro F1 | Samples F1 |
|---|---|---|---|
| VISUAL | 23.03 ± 0.79 | 31.36 ± 1.16 | 29.91 ± 1.73 |
| VISUAL + $L_{loc}$ | 24.42 ± 0.95 | 32.87 ± 1.13 | 32.46 ± 1.18 |
| VISUAL + $L_{loc}$ + HT | 25.07 ± 0.52 | 32.94 ± 1.16 | 33.61 ± 0.92 |
Test Set
| Model | Macro F1 | Micro F1 | Samples F1 |
|---|---|---|---|
| VISUAL | 22.77 ± 0.59 | 30.23 ± 0.73 | 28.45 ± 1.71 |
| VISUAL + $L_{loc}$ | 24.37 ± 0.65 | 32.07 ± 0.84 | 30.91 ± 1.27 |
| VISUAL + $L_{loc}$ + HT | 23.98 ± 0.85 | 31.28 ± 0.36 | 31.39 ± 0.78 |
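The three columns in these tables are the standard multi‑label F1 averages: Macro averages F1 over intent classes, Micro pools true/false positives globally, and Samples averages F1 over images. A minimal sketch with scikit‑learn (toy data, not Intentonomy predictions):

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy multi-label ground truth and predictions:
# 4 images x 3 intent classes, binary indicator format.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1]])

macro = f1_score(y_true, y_pred, average="macro")      # mean F1 over classes
micro = f1_score(y_true, y_pred, average="micro")      # global TP/FP/FN counts
samples = f1_score(y_true, y_pred, average="samples")  # mean F1 over images
print(macro, micro, samples)
```

Macro F1 is lowest when rare intent classes are missed, which is why it trails Micro and Samples F1 in the tables above.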
Validation Sub‑division (by Content Dependency)
| Model | Object | Context | Other |
|---|---|---|---|
| VISUAL | 25.58 ± 2.51 | 30.16 ± 2.97 | 21.34 ± 0.74 |
| VISUAL + $L_{loc}$ | 28.15 ± 1.94 | 28.62 ± 2.13 | 22.60 ± 1.40 |
| VISUAL + $L_{loc}$ + HT | 29.66 ± 2.19 | 32.48 ± 1.34 | 22.61 ± 0.48 |
Validation Sub‑division (by Difficulty)
| Model | Easy | Medium | Hard |
|---|---|---|---|
| VISUAL | 54.64 ± 2.54 | 24.92 ± 1.18 | 10.71 ± 1.33 |
| VISUAL + $L_{loc}$ | 57.10 ± 1.84 | 25.68 ± 1.24 | 12.72 ± 2.31 |
| VISUAL + $L_{loc}$ + HT | 58.86 ± 2.56 | 26.30 ± 1.42 | 13.11 ± 2.15 |
Citation
@inproceedings{jia2021intentonomy,
title={Intentonomy: a Dataset and Study towards Human Intent Understanding},
author={Jia, Menglin and Wu, Zuxuan and Reiter, Austin and Cardie, Claire and Belongie, Serge and Lim, Ser-Nam},
booktitle={CVPR},
year={2021}
}
Source
Organization: arXiv
Created: 11/11/2020