BUAADreamer/llava-med-zh-instruct-60k

This Chinese dataset is translated from llava‑med, built using the Qwen1.5‑14B‑Chat model, and contains 60 k medical visual instruction data points. Features include a messages‑and‑images structure: messages consist of role and content fields; images are sequences. The dataset provides a training split of 56,649 samples (size: 6.66 GB) with a download size of 6.57 GB. Task categories are visual question answering and image‑to‑text. The language is Chinese, tags involve medical and biology, and the scale lies between 10 K and 100 K.

Updated 5/21/2024

hugging_face

Description

Dataset Overview

Basic Information

License: Apache‑2.0
Language: Chinese
Tags: Medical, Biology, llama‑factory
Size Category: 10K < size < 100K

Dataset Content

Features:
- messages:
  - role: string
  - content: string
- images: image sequences

Dataset Split

Training set:
- Sample count: 56,649
- Data size: 6,664,412,158.42 bytes
- Download size: 6,567,484,534 bytes

Task Types

Visual Question Answering
Image‑to‑Text

Configuration

Default configuration:
- Data files:
  - Split: train
  - Path: data/train-*

AI studio

Generate PPTs instantly with Nano Banana Pro.

Generate PPT Now

Access Dataset

Please login to view download links and access full dataset details.

Topics

Medical Visual Question Answering

Image to Text

Source

Organization: hugging_face

Created: Unknown

Power Your Data Analysis with Premium AI Models

Supporting GPT-5, Claude-4, DeepSeek v3, Gemini and more.

Enjoy a free trial and save 20%+ compared to official pricing.

Check Prices →