Dataset asset · Open Source Community · Computer Vision · Image Captioning

whyen-wang/coco_captions

COCO is a large-scale dataset for object detection, segmentation, and captioning; this captioning version is used primarily for image-to-text tasks. It provides English captions, with several textual descriptions associated with each image. The card does not document how the dataset was created or annotated, nor its societal impact.

Source: hugging_face
Created: Nov 28, 2025
Updated: Jul 14, 2024

Dataset Card: COCO Captions

Dataset Description

Dataset Overview

COCO (Common Objects in Context) is a large-scale dataset for object detection, segmentation, and caption generation. This version pairs each image with its English captions.

Supported Tasks and Leaderboards

  • Image to Text
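For image-to-text work, a split can be pulled from the Hugging Face Hub with the `datasets` library. The sketch below is a minimal example, not an official loader: `load_coco_captions` and `KNOWN_SPLITS` are hypothetical names, and it assumes the repository exposes its splits under the lowercase names `train` and `validation`, matching the split table in this card.

```python
from typing import Any

# Assumption: the Hub repository uses lowercase split names for the
# "Train" and "Validation" columns listed in the card's split table.
KNOWN_SPLITS = ("train", "validation")


def load_coco_captions(split: str = "validation") -> Any:
    """Load one split of whyen-wang/coco_captions from the Hugging Face Hub.

    Returns a datasets.Dataset whose rows carry "image" and "captions" fields.
    """
    if split not in KNOWN_SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {KNOWN_SPLITS}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("whyen-wang/coco_captions", split=split)
```

With network access, `ds = load_coco_captions("validation")` followed by `ds[0]["captions"]` should return the caption list for the first image. Depending on how the repository is packaged and which `datasets` version you run, extra arguments to `load_dataset` may be required.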

Language

  • English (en)

Dataset Structure

Data Instances

An example data instance is shown below:

{
    "image": PIL.Image(mode="RGB"),
    "captions": [
        "Closeup of bins of food that include broccoli and bread.",
        "A meal is presented in brightly colored plastic trays.",
        "there are containers filled with different kinds of foods",
        "Colorful dishes holding meat, vegetables, fruit, and bread.",
        "A bunch of trays that have different food."
    ]
}

Data Fields

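  • Image (image): the image, decoded as a PIL.Image object (RGB mode in the example above)
  • Captions (captions): a list of English caption strings describing the image (five in the example above)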
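Because each record holds one image and a list of captions, training an image-to-text model often starts by expanding every record into one (image, caption) pair per caption. The sketch below assumes records shaped like the data instance above; `iter_image_caption_pairs` is a hypothetical helper, not part of the dataset or of the `datasets` library, and a placeholder string stands in for the PIL image so the example runs without Pillow.

```python
from typing import Any, Iterator


def iter_image_caption_pairs(record: dict[str, Any]) -> Iterator[tuple[Any, str]]:
    """Yield one (image, caption) pair for each caption attached to a record."""
    image = record["image"]
    for caption in record["captions"]:
        # Captions vary in casing and punctuation; only surrounding
        # whitespace is stripped here, nothing else is normalized.
        yield image, caption.strip()


# Hypothetical record mirroring the card's example instance.
# A placeholder string stands in for the PIL.Image object.
example = {
    "image": "<PIL image placeholder>",
    "captions": [
        "Closeup of bins of food that include broccoli and bread.",
        "A meal is presented in brightly colored plastic trays.",
        "there are containers filled with different kinds of foods",
        "Colorful dishes holding meat, vegetables, fruit, and bread.",
        "A bunch of trays that have different food.",
    ],
}

pairs = list(iter_image_caption_pairs(example))
print(len(pairs))  # 5
```

Yielding pairs lazily keeps memory flat when the helper is mapped over the full training split.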

Data Splits

Split      Train      Validation
Default    118,287    5,000
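As a quick check on the split table, the share of examples in each split can be computed directly. This assumes each row is one image with its list of captions, as in the data instance above; `SPLITS` and `split_fractions` are illustrative names.

```python
# Split sizes copied from the card's "Data Splits" table (one image per row).
SPLITS = {"train": 118_287, "validation": 5_000}


def split_fractions(splits: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total number of examples."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


total_images = sum(SPLITS.values())  # 123,287
fractions = split_fractions(SPLITS)
for name, frac in fractions.items():
    print(f"{name}: {frac:.2%}")  # train: 95.94%, validation: 4.06%
```

The validation split is therefore only about 4% of the images, which is worth keeping in mind when comparing validation scores across runs.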

Dataset Creation

Rationale

[More information to be added]

Source Data

Initial Collection and Normalization

[More information to be added]

Source Language Producers

[More information to be added]

Annotation

Annotation Process

[More information to be added]

Annotators

[More information to be added]

Personal and Sensitive Information

[More information to be added]

Considerations for Using the Data

Societal Impact

[More information to be added]

Discussion of Bias

[More information to be added]

Other Known Limitations

[More information to be added]

Additional Information

Curators

[More information to be added]

License

Creative Commons Attribution 4.0 License

Citation Information

@article{cocodataset,
  author    = {Tsung{-}Yi Lin and Michael Maire and Serge J. Belongie and Lubomir D. Bourdev and Ross B. Girshick and James Hays and Pietro Perona and Deva Ramanan and Piotr Doll{\'{a}}r and C. Lawrence Zitnick},
  title     = {Microsoft {COCO:} Common Objects in Context},
  journal   = {CoRR},
  volume    = {abs/1405.0312},
  year      = {2014},
  url       = {http://arxiv.org/abs/1405.0312},
  archivePrefix = {arXiv},
  eprint    = {1405.0312},
  timestamp = {Mon, 13 Aug 2018 16:48:13 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/LinMBHPRDZ14},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Contributions

Thanks to @github-whyen-wang for adding this dataset.