Dataset asset · Open Source Community · Computer Vision · Social Media Analysis
Intentonomy
Intentonomy is a dataset of 14,455 images created jointly by Cornell University and Facebook AI to understand and analyze human intent behind social‑media images. The images span everyday scenarios and are manually annotated with 28 intent categories using a psychology‑based taxonomy. Labels were collected via a novel “purpose game” on Amazon Mechanical Turk. The dataset supports tasks such as fake‑news detection and improving vision systems’ understanding of human intent.
Source: arXiv
Created: Nov 11, 2020
Updated: Mar 28, 2021
Overview
Dataset description and usage context
Intentonomy Dataset Overview
Dataset Introduction
- Name: Intentonomy
- Content: 14,455 images manually labeled with 28 intent categories, organized into a hierarchy designed by psychology experts.
Dataset Download
- Download: See DATA.md.
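Once the files are in hand, a minimal loading sketch might look like the following. The file name and JSON field names used here ("annotations", "intent_ids") are assumptions made purely for illustration; the actual layout is documented in DATA.md.

```python
# Minimal sketch: load the Intentonomy annotation file and tally how often
# each intent label occurs. The file name and JSON fields ("annotations",
# "intent_ids") are assumptions for illustration; see DATA.md for the real
# schema.
import json
from collections import Counter

with open("intentonomy_train.json") as f:   # hypothetical file name
    data = json.load(f)

label_counts = Counter()
for ann in data["annotations"]:             # assumed top-level key
    for intent_id in ann["intent_ids"]:     # assumed multi-label field
        label_counts[intent_id] += 1

print(f"{len(data['annotations'])} annotated images")
print("5 most frequent intent ids:", label_counts.most_common(5))
```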
Annotation Method
- Method: a “purpose game” that collects intent annotations from crowd workers on Amazon Mechanical Turk.
- Details: See Appendix C of the paper.
Research Content
Relationship Between Image Content and Human Intent
- Goal: Explore the subtle link between visual content and intent.
- Findings:
- Different intent categories rely on distinct objects and scenes for recognition.
- For categories with large intra‑class variation, visual content offers limited performance gains.
- Attending to the relevant objects and scene categories improves intent recognition.
Intent Recognition Baselines
- Framework: introduces weakly‑supervised localization and auxiliary label modeling (the “HT” setting in the result tables below) to narrow the gap between human and machine image understanding.
- Implementation: the localization loss is provided in `loc_loss.py`; download the image masks and set `MASK_ROOT` accordingly.
- Dependencies: requires `cv2` and `pycocotools`.
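The repository's actual implementation lives in `loc_loss.py`; below is only a hedged sketch of the general idea behind such a loss, assuming a per-image class-activation map and a binary relevance mask (e.g., loaded from `MASK_ROOT`). The formulation here is illustrative, not the paper's exact loss.

```python
# Hedged sketch of a weakly-supervised localization loss: push the model's
# class-activation map (CAM) to concentrate on regions marked relevant by a
# binary mask. This illustrates the general idea only; the exact formulation
# used by the paper is in loc_loss.py.
import torch

def localization_loss(cam: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """cam:  (B, H, W) unnormalized activation map
    mask: (B, H, W) binary mask, 1 = intent-relevant region"""
    attn = torch.sigmoid(cam)
    outside = (attn * (1.0 - mask)).sum(dim=(1, 2))  # attention mass off-mask
    total = attn.sum(dim=(1, 2)).clamp_min(1e-6)     # avoid division by zero
    return (outside / total).mean()                  # off-mask fraction, batch mean

# Toy usage with random tensors:
cam = torch.randn(4, 14, 14)
mask = (torch.rand(4, 14, 14) > 0.5).float()
print(localization_loss(cam, mask))
```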
Intent Category Sub‑division
- Basis:
- Content Dependency: Object‑dependent (O‑classes), Context‑dependent (C‑classes), and Others.
- Difficulty: classified as “Easy”, “Medium”, or “Hard” based on the gap between visual-model performance and random chance (see the sketch after this list).
- Details: See Appendix A of the paper.
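As a concrete, purely illustrative reading of the difficulty rule, the sketch below bins a category by the gap between its visual-model F1 and a chance baseline. The thresholds are invented for this sketch; the paper's actual criterion is described in Appendix A.

```python
# Illustrative only: bin an intent category into Easy / Medium / Hard from
# the gap between a visual model's per-class F1 and a chance baseline.
# The thresholds below are invented; the paper's criterion is in Appendix A.
def difficulty(model_f1: float, chance_f1: float) -> str:
    gap = model_f1 - chance_f1
    if gap > 0.30:        # hypothetical threshold
        return "Easy"
    if gap > 0.10:        # hypothetical threshold
        return "Medium"
    return "Hard"

print(difficulty(model_f1=0.55, chance_f1=0.08))  # -> "Easy"
```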
Baseline Results
Validation Set
| Model | Macro F1 | Micro F1 | Samples F1 |
|---|---|---|---|
| VISUAL | 23.03 ± 0.79 | 31.36 ± 1.16 | 29.91 ± 1.73 |
| VISUAL + $L_{loc}$ | 24.42 ± 0.95 | 32.87 ± 1.13 | 32.46 ± 1.18 |
| VISUAL + $L_{loc}$ + HT | 25.07 ± 0.52 | 32.94 ± 1.16 | 33.61 ± 0.92 |
Test Set
| Model | Macro F1 | Micro F1 | Samples F1 |
|---|---|---|---|
| VISUAL | 22.77 ± 0.59 | 30.23 ± 0.73 | 28.45 ± 1.71 |
| VISUAL + $L_{loc}$ | 24.37 ± 0.65 | 32.07 ± 0.84 | 30.91 ± 1.27 |
| VISUAL + $L_{loc}$ + HT | 23.98 ± 0.85 | 31.28 ± 0.36 | 31.39 ± 0.78 |
Validation Sub‑division (by Content Dependency)
| Model | Object | Context | Other |
|---|---|---|---|
| VISUAL | 25.58 ± 2.51 | 30.16 ± 2.97 | 21.34 ± 0.74 |
| VISUAL + $L_{loc}$ | 28.15 ± 1.94 | 28.62 ± 2.13 | 22.60 ± 1.40 |
| VISUAL + $L_{loc}$ + HT | 29.66 ± 2.19 | 32.48 ± 1.34 | 22.61 ± 0.48 |
Validation Sub‑division (by Difficulty)
| Model | Easy | Medium | Hard |
|---|---|---|---|
| VISUAL | 54.64 ± 2.54 | 24.92 ± 1.18 | 10.71 ± 1.33 |
| VISUAL + $L_{loc}$ | 57.10 ± 1.84 | 25.68 ± 1.24 | 12.72 ± 2.31 |
| VISUAL + $L_{loc}$ + HT | 58.86 ± 2.56 | 26.30 ± 1.42 | 13.11 ± 2.15 |
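The three metric columns appear to correspond to the standard multi-label F1 averaging modes, which scikit-learn exposes directly. A minimal sketch on toy data follows, assuming only that the paper uses the standard definitions over binary indicator matrices for the 28 categories.

```python
# Compute Macro / Micro / Samples F1 on toy multi-label data, matching the
# three metric columns above. y_true and y_pred are binary indicator
# matrices of shape (num_images, 28); the numbers printed are random,
# not a reproduction of the tables.
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(100, 28))
y_pred = rng.integers(0, 2, size=(100, 28))

for avg in ("macro", "micro", "samples"):
    score = f1_score(y_true, y_pred, average=avg, zero_division=0)
    print(f"{avg:>8} F1: {100 * score:.2f}")
```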
Citation
@inproceedings{jia2021intentonomy,
title={Intentonomy: a Dataset and Study towards Human Intent Understanding},
author={Jia, Menglin and Wu, Zuxuan and Reiter, Austin and Cardie, Claire and Belongie, Serge and Lim, Ser-Nam},
booktitle={CVPR},
year={2021}
}