
MME-RealWorld-lite-lmms-eval

MME‑RealWorld is a benchmark dataset for multimodal large language models (MLLMs), containing 13,366 high‑resolution images and 29,429 manually annotated question‑answer pairs covering 43 tasks across five real‑world scenarios. It addresses the limitations of existing benchmarks in practical settings, offering large scale, high‑quality annotation, and challenging tasks. A Chinese version (MME‑RealWorld‑CN) with 5,917 QA pairs is also provided.

Updated 11/14/2024
huggingface

Description

MME‑RealWorld‑lite‑lmms‑eval Dataset Overview

Dataset Information

Features

  • bytes: string
  • path: string
  • index: integer
  • question: string
  • multi‑choice options: list of strings
  • answer: string
  • category: string
  • l2‑category: string

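The feature list above maps naturally onto a record type. The following is a minimal sketch, assuming the hyphenated feature names ("multi-choice options", "l2-category") are normalized to underscores; the `MMESample` class and the example values are hypothetical, not part of the dataset itself.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MMESample:
    # Field names mirror the feature list above (hyphens -> underscores).
    bytes: str                        # serialized image bytes
    path: str                         # image file path
    index: int                        # sample index
    question: str
    multi_choice_options: List[str]   # the "multi-choice options" feature
    answer: str                       # ground-truth option letter, e.g. "B"
    category: str
    l2_category: str                  # the "l2-category" feature

# Hypothetical example record for illustration only.
sample = MMESample(
    bytes="", path="img_0001.jpg", index=0,
    question="How many cars are visible?",
    multi_choice_options=["(A) 10", "(B) 133", "(C) 42", "(D) 7"],
    answer="B", category="Monitoring", l2_category="counting",
)
print(sample.answer)
```

In practice one would load such records via the Hugging Face `datasets` library (`load_dataset` with the repository ID) rather than constructing them by hand.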
Data Split

  • train: 1,919 samples, 1,990,753,320 bytes

Size

  • Download size: 1,880,779,075 bytes
  • Dataset size: 1,990,753,320 bytes
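For readability, the raw byte counts above can be converted to GiB with a quick helper (a trivial sketch, not part of any dataset tooling):

```python
# Convert the byte counts listed above into GiB (2**30 bytes).
def to_gib(n_bytes: int) -> float:
    return round(n_bytes / 2**30, 2)

print(to_gib(1_880_779_075))  # download size -> 1.75
print(to_gib(1_990_753_320))  # dataset size  -> 1.85
```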

Configuration

  • default: data files located at data/train-*

Dataset Details

Characteristics

  1. Scale: 29,429 manually annotated QA pairs collected from 32 volunteers, covering 43 sub‑tasks in 5 real‑world scenarios—currently the largest fully human‑annotated benchmark.
  2. Quality:
    • Resolution: Average image resolution of 2000 × 1500 px, the highest among competitors.
    • Annotation: All annotations are performed and cross‑checked by a professional team to ensure quality.
  3. Task Difficulty and Real‑World Relevance: Even the best models achieve below 60% accuracy. Many tasks are far harder than those in traditional benchmarks, e.g., counting 133 cars in a surveillance video or identifying a small object on a 5,000 × 5,000 px remote‑sensing map.
  4. MME‑RealWorld‑CN: For Chinese scenarios, 5,917 QA pairs were collected from Chinese volunteers, addressing translation‑induced issues.
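Since every question is multiple-choice with a ground-truth option letter in the `answer` field, evaluation reduces to extracting a choice letter from the model's free-form output and comparing it to the gold label. The sketch below is one common heuristic for this, offered as an assumption; it is not the official lmms-eval scoring code.

```python
import re
from typing import List, Optional

def extract_choice(prediction: str) -> Optional[str]:
    """Pull the first standalone option letter (A-E) from model output."""
    m = re.search(r"\b([A-E])\b", prediction)
    return m.group(1) if m else None

def accuracy(predictions: List[str], answers: List[str]) -> float:
    """Fraction of predictions whose extracted letter matches the gold answer."""
    correct = sum(extract_choice(p) == a for p, a in zip(predictions, answers))
    return correct / len(answers)

# Hypothetical model outputs and gold labels, for illustration only.
preds = ["The answer is (B).", "C", "I think D is correct."]
golds = ["B", "C", "A"]
print(accuracy(preds, golds))  # 2 of 3 match
```

Real harnesses add fallbacks (matching the option text when no letter is found), but exact letter matching is the core of the metric.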


Topics

Multimodal Large Language Models
Benchmarking

Source

Organization: huggingface

Created: 11/13/2024
