whyen-wang/coco_captions

COCO is a large-scale dataset for object detection, segmentation, and captioning; this version is intended for image-to-text tasks. It provides English captions, with multiple captions per image. The dataset card does not document dataset creation, annotation processes, or societal impact.

Updated 7/14/2024

hugging_face

Dataset Card: COCO Captions

Dataset Description

Dataset Overview

COCO Captions is a large-scale dataset for object detection, segmentation, and caption generation.

Supported Tasks and Leaderboards

Image to Text

Language

English (en)

Dataset Structure

Data Instances

An example data instance is shown below:

{
    "image": PIL.Image(mode="RGB"),
    "captions": [
        "Closeup of bins of food that include broccoli and bread.",
        "A meal is presented in brightly colored plastic trays.",
        "there are containers filled with different kinds of foods",
        "Colorful dishes holding meat, vegetables, fruit, and bread.",
        "A bunch of trays that have different food."
    ]
}

Data Fields

Image (image): a PIL.Image object in RGB mode
Captions (captions): a list of English caption strings describing the image (five in the example above)
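Because each instance pairs one image with several captions, many training pipelines first expand instances into one (image, caption) pair per caption. The following is a minimal sketch of that step, not part of the original card. The helper names `iter_caption_pairs` and `load_split` are hypothetical. The repository id is taken from the card title, while the split name `"train"` and the use of the Hugging Face `datasets` library are assumptions; depending on the installed `datasets` version, loading may need additional arguments.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def iter_caption_pairs(records: Iterable[Dict[str, Any]]) -> Iterator[Tuple[Any, str]]:
    """Yield one (image, caption) pair for every caption in every record."""
    for record in records:
        image = record["image"]
        for caption in record["captions"]:
            # Strip stray whitespace; the caption text is otherwise left untouched.
            yield image, caption.strip()


def load_split(split: str = "train"):
    """Hypothetical loader: assumes the `datasets` library and network access."""
    from datasets import load_dataset  # third-party, imported lazily on purpose

    # Repository id from the card title; the split name is an assumption.
    return load_dataset("whyen-wang/coco_captions", split=split)


# Stand-in record with the same two fields as the example instance in the card.
sample = {
    "image": object(),  # placeholder for a PIL.Image in RGB mode
    "captions": [
        "Closeup of bins of food that include broccoli and bread.",
        "A bunch of trays that have different food.",
    ],
}

pairs = list(iter_caption_pairs([sample]))
print(len(pairs))  # one pair per caption in the sample record
```

With real data, `iter_caption_pairs(load_split("train"))` would stream pairs in the same way, provided the assumptions above hold.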

Data Splits

Split	Train	Validation
Default	118,287	5,000

Dataset Creation

Rationale

[More information to be added]

Source Data

Initial Collection and Normalization

[More information to be added]

Source Language Producers

[More information to be added]

Annotation

Annotation Process

[More information to be added]

Annotators

[More information to be added]

Personal and Sensitive Information

[More information to be added]

Considerations for Using the Data

Societal Impact

[More information to be added]

Discussion of Bias

[More information to be added]

Other Known Limitations

[More information to be added]

Additional Information

Curators

[More information to be added]

License

Creative Commons Attribution 4.0 License

Citation Information

@article{cocodataset,
  author    = {Tsung{-}Yi Lin and Michael Maire and Serge J. Belongie and Lubomir D. Bourdev and Ross B. Girshick and James Hays and Pietro Perona and Deva Ramanan and Piotr Doll{\'a}r and C. Lawrence Zitnick},
  title     = {Microsoft {COCO:} Common Objects in Context},
  journal   = {CoRR},
  volume    = {abs/1405.0312},
  year      = {2014},
  url       = {http://arxiv.org/abs/1405.0312},
  archivePrefix = {arXiv},
  eprint    = {1405.0312},
  timestamp = {Mon, 13 Aug 2018 16:48:13 +0200},
  biburl    = {https://dblp.org/rec/bib/journals/corr/LinMBHPRDZ14},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Contributions

Thanks to @github-whyen-wang for adding this dataset.

Topics

Image Captioning

Computer Vision

Source

Organization: hugging_face

Created: Unknown
