
MME-RealWorld-lite-lmms-eval

MME‑RealWorld is a benchmark dataset for multimodal large language models (MLLMs), containing 13,366 high‑resolution images and 29,429 manually annotated question‑answer pairs covering 43 tasks across five real‑world scenarios. It addresses the limitations of existing benchmarks in practical settings, offering large scale, high‑quality annotation, and challenging tasks. A Chinese version (MME‑RealWorld‑CN) with 5,917 QA pairs is also provided.

Updated 11/14/2024
huggingface

Description

MME‑RealWorld‑lite‑lmms‑eval Dataset Overview

Dataset Information

Features

  • bytes: string
  • path: string
  • index: integer
  • question: string
  • multi‑choice options: list of strings
  • answer: string
  • category: string
  • l2‑category: string

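The feature list above maps naturally onto a record type. The following is a minimal sketch, assuming the hyphenated feature names ("multi-choice options", "l2-category") are normalized to underscores; the `MMESample` class and the example values are hypothetical, not part of the dataset itself.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MMESample:
    # Field names mirror the feature list above (hyphens -> underscores).
    bytes: str                        # serialized image bytes
    path: str                         # image file path
    index: int                        # sample index
    question: str
    multi_choice_options: List[str]   # the "multi-choice options" feature
    answer: str                       # ground-truth option letter, e.g. "B"
    category: str
    l2_category: str                  # the "l2-category" feature

# Hypothetical example record for illustration only.
sample = MMESample(
    bytes="", path="img_0001.jpg", index=0,
    question="How many cars are visible?",
    multi_choice_options=["(A) 10", "(B) 133", "(C) 42", "(D) 7"],
    answer="B", category="Monitoring", l2_category="counting",
)
print(sample.answer)
```

In practice one would load such records via the Hugging Face `datasets` library (`load_dataset` with the repository ID) rather than constructing them by hand.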
Data Split

  • train: 1,919 samples, 1,990,753,320 bytes

Size

  • Download size: 1,880,779,075 bytes
  • Dataset size: 1,990,753,320 bytes
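For readability, the raw byte counts above can be converted to GiB with a quick helper (a trivial sketch, not part of any dataset tooling):

```python
# Convert the byte counts listed above into GiB (2**30 bytes).
def to_gib(n_bytes: int) -> float:
    return round(n_bytes / 2**30, 2)

print(to_gib(1_880_779_075))  # download size -> 1.75
print(to_gib(1_990_753_320))  # dataset size  -> 1.85
```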

Configuration

  • default: data files located at data/train-*

Dataset Details

Characteristics

  1. Scale: 29,429 manually annotated QA pairs collected from 32 volunteers, covering 43 sub‑tasks in 5 real‑world scenarios—currently the largest fully human‑annotated benchmark.
  2. Quality:
    • Resolution: Average image resolution of 2000 × 1500 px, the highest among competitors.
    • Annotation: All annotations are performed and cross‑checked by a professional team to ensure quality.
  3. Task Difficulty and Real‑World Relevance: Even the best models achieve below 60% accuracy. Many tasks are far harder than those in traditional benchmarks, e.g., counting 133 cars in a surveillance video or identifying a small object on a 5,000 × 5,000 px remote‑sensing map.
  4. MME‑RealWorld‑CN: For Chinese scenarios, 5,917 QA pairs were collected from Chinese volunteers, addressing translation‑induced issues.
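Since every question is multiple-choice with a ground-truth option letter in the `answer` field, evaluation reduces to extracting a choice letter from the model's free-form output and comparing it to the gold label. The sketch below is one common heuristic for this, offered as an assumption; it is not the official lmms-eval scoring code.

```python
import re
from typing import List, Optional

def extract_choice(prediction: str) -> Optional[str]:
    """Pull the first standalone option letter (A-E) from model output."""
    m = re.search(r"\b([A-E])\b", prediction)
    return m.group(1) if m else None

def accuracy(predictions: List[str], answers: List[str]) -> float:
    """Fraction of predictions whose extracted letter matches the gold answer."""
    correct = sum(extract_choice(p) == a for p, a in zip(predictions, answers))
    return correct / len(answers)

# Hypothetical model outputs and gold labels, for illustration only.
preds = ["The answer is (B).", "C", "I think D is correct."]
golds = ["B", "C", "A"]
print(accuracy(preds, golds))  # 2 of 3 match
```

Real harnesses add fallbacks (matching the option text when no letter is found), but exact letter matching is the core of the metric.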


Topics

Multimodal Large Language Models
Benchmarking

Source

Organization: huggingface

Created: 11/13/2024
