Dataset asset · Open Source Community · Computer Vision · Social Media Analysis
Intentonomy
Intentonomy is a dataset of 14,455 images created jointly by Cornell University and Facebook AI to understand and analyze human intent behind social‑media images. The images span everyday scenarios and are manually annotated with 28 intent categories using a psychology‑based taxonomy. Labels were collected via a novel “purpose game” on Amazon Mechanical Turk. The dataset supports tasks such as fake‑news detection and improving vision systems’ understanding of human intent.
Source: arXiv
Created: Nov 11, 2020
Updated: Mar 28, 2021
Overview
Dataset description and usage context
Intentonomy Dataset Overview
Dataset Introduction
- Name: Intentonomy
- Content: 14,455 images manually labeled with 28 intent categories, organized into a hierarchy designed by psychology experts.
Dataset Download
- Download: See DATA.md.
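Once the files are in hand, a minimal loading sketch might look like the following. The file name and JSON field names used here ("annotations", "intent_ids") are assumptions made purely for illustration; the actual layout is documented in DATA.md.

```python
# Minimal sketch: load the Intentonomy annotation file and tally how often
# each intent label occurs. The file name and JSON fields ("annotations",
# "intent_ids") are assumptions for illustration; see DATA.md for the real
# schema.
import json
from collections import Counter

with open("intentonomy_train.json") as f:   # hypothetical file name
    data = json.load(f)

label_counts = Counter()
for ann in data["annotations"]:             # assumed top-level key
    for intent_id in ann["intent_ids"]:     # assumed multi-label field
        label_counts[intent_id] += 1

print(f"{len(data['annotations'])} annotated images")
print("5 most frequent intent ids:", label_counts.most_common(5))
```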
Annotation Method
- Method: a “purpose game” that collects intent annotations from crowd workers on Amazon Mechanical Turk.
- Details: See Appendix C of the paper.
Research Content
Relationship Between Image Content and Human Intent
- Goal: Explore the subtle link between visual content and intent.
- Findings:
- Different intent categories rely on distinct objects and scenes for recognition.
- For categories with large intra‑class variation, visual content offers limited performance gains.
- Attending to the relevant objects and scene categories improves intent recognition.
Intent Recognition Baselines
- Framework: introduces weakly‑supervised localization and auxiliary label modeling (the “HT” setting in the result tables below) to narrow the gap between human and machine image understanding.
- Implementation: the localization loss is provided in `loc_loss.py`; download the image masks and set `MASK_ROOT` accordingly.
- Dependencies: requires `cv2` and `pycocotools`.
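The repository's actual implementation lives in `loc_loss.py`; below is only a hedged sketch of the general idea behind such a loss, assuming a per-image class-activation map and a binary relevance mask (e.g., loaded from `MASK_ROOT`). The formulation here is illustrative, not the paper's exact loss.

```python
# Hedged sketch of a weakly-supervised localization loss: push the model's
# class-activation map (CAM) to concentrate on regions marked relevant by a
# binary mask. This illustrates the general idea only; the exact formulation
# used by the paper is in loc_loss.py.
import torch

def localization_loss(cam: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """cam:  (B, H, W) unnormalized activation map
    mask: (B, H, W) binary mask, 1 = intent-relevant region"""
    attn = torch.sigmoid(cam)
    outside = (attn * (1.0 - mask)).sum(dim=(1, 2))  # attention mass off-mask
    total = attn.sum(dim=(1, 2)).clamp_min(1e-6)     # avoid division by zero
    return (outside / total).mean()                  # off-mask fraction, batch mean

# Toy usage with random tensors:
cam = torch.randn(4, 14, 14)
mask = (torch.rand(4, 14, 14) > 0.5).float()
print(localization_loss(cam, mask))
```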
Intent Category Sub‑division
- Basis:
- Content Dependency: Object‑dependent (O‑classes), Context‑dependent (C‑classes), and Others.
- Difficulty: classified as “Easy”, “Medium”, or “Hard” based on the gap between visual-model performance and random chance (see the sketch after this list).
- Details: See Appendix A of the paper.
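As a concrete, purely illustrative reading of the difficulty rule, the sketch below bins a category by the gap between its visual-model F1 and a chance baseline. The thresholds are invented for this sketch; the paper's actual criterion is described in Appendix A.

```python
# Illustrative only: bin an intent category into Easy / Medium / Hard from
# the gap between a visual model's per-class F1 and a chance baseline.
# The thresholds below are invented; the paper's criterion is in Appendix A.
def difficulty(model_f1: float, chance_f1: float) -> str:
    gap = model_f1 - chance_f1
    if gap > 0.30:        # hypothetical threshold
        return "Easy"
    if gap > 0.10:        # hypothetical threshold
        return "Medium"
    return "Hard"

print(difficulty(model_f1=0.55, chance_f1=0.08))  # -> "Easy"
```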
Baseline Results
Validation Set
| Model | Macro F1 | Micro F1 | Samples F1 |
|---|---|---|---|
| VISUAL | 23.03 ± 0.79 | 31.36 ± 1.16 | 29.91 ± 1.73 |
| VISUAL + $L_{loc}$ | 24.42 ± 0.95 | 32.87 ± 1.13 | 32.46 ± 1.18 |
| VISUAL + $L_{loc}$ + HT | 25.07 ± 0.52 | 32.94 ± 1.16 | 33.61 ± 0.92 |
Test Set
| Model | Macro F1 | Micro F1 | Samples F1 |
|---|---|---|---|
| VISUAL | 22.77 ± 0.59 | 30.23 ± 0.73 | 28.45 ± 1.71 |
| VISUAL + $L_{loc}$ | 24.37 ± 0.65 | 32.07 ± 0.84 | 30.91 ± 1.27 |
| VISUAL + $L_{loc}$ + HT | 23.98 ± 0.85 | 31.28 ± 0.36 | 31.39 ± 0.78 |
Validation Sub‑division (by Content Dependency)
| Model | Object | Context | Other |
|---|---|---|---|
| VISUAL | 25.58 ± 2.51 | 30.16 ± 2.97 | 21.34 ± 0.74 |
| VISUAL + $L_{loc}$ | 28.15 ± 1.94 | 28.62 ± 2.13 | 22.60 ± 1.40 |
| VISUAL + $L_{loc}$ + HT | 29.66 ± 2.19 | 32.48 ± 1.34 | 22.61 ± 0.48 |
Validation Sub‑division (by Difficulty)
| Model | Easy | Medium | Hard |
|---|---|---|---|
| VISUAL | 54.64 ± 2.54 | 24.92 ± 1.18 | 10.71 ± 1.33 |
| VISUAL + $L_{loc}$ | 57.10 ± 1.84 | 25.68 ± 1.24 | 12.72 ± 2.31 |
| VISUAL + $L_{loc}$ + HT | 58.86 ± 2.56 | 26.30 ± 1.42 | 13.11 ± 2.15 |
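The three metric columns appear to correspond to the standard multi-label F1 averaging modes, which scikit-learn exposes directly. A minimal sketch on toy data follows, assuming only that the paper uses the standard definitions over binary indicator matrices for the 28 categories.

```python
# Compute Macro / Micro / Samples F1 on toy multi-label data, matching the
# three metric columns above. y_true and y_pred are binary indicator
# matrices of shape (num_images, 28); the numbers printed are random,
# not a reproduction of the tables.
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(100, 28))
y_pred = rng.integers(0, 2, size=(100, 28))

for avg in ("macro", "micro", "samples"):
    score = f1_score(y_true, y_pred, average=avg, zero_division=0)
    print(f"{avg:>8} F1: {100 * score:.2f}")
```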
Citation
@inproceedings{jia2021intentonomy,
title={Intentonomy: a Dataset and Study towards Human Intent Understanding},
author={Jia, Menglin and Wu, Zuxuan and Reiter, Austin and Cardie, Claire and Belongie, Serge and Lim, Ser-Nam},
booktitle={CVPR},
year={2021}
}