InternVL-Chat-V1-2-SFT-Data
This dataset provides supervised fine-tuning (SFT) data for visual question answering and general question answering, in both English and Chinese. It is organized into multiple configurations, such as ai2d_train_12k and chartqa_train_18k. Each configuration corresponds to a different type of training data and is backed by its own JSONL file.
Source: Hugging Face
Created: Aug 8, 2024
Updated: Aug 8, 2024
Dataset Overview
License
- Apache 2.0
Task Categories
- Visual Question Answering
- Question Answering
Languages
- English
- Chinese
Configurations
- ai2d_train_12k
  - Data files:
    - Split: train
    - Path: opensource/ai2d_train_12k.jsonl
- chartqa_train_18k
  - Data files:
    - Split: train
    - Path: opensource/chartqa_train_18k.jsonl
- docvqa_train_10k
  - Data files:
    - Split: train
    - Path: opensource/docvqa_train_10k.jsonl
- dvqa_train_200k.jsonl
  - Data files:
    - Split: train
    - Path: opensource/dvqa_train_200k.jsonl
- geoqa+.jsonl
  - Data files:
    - Split: train
    - Path: opensource/geoqa+.jsonl
- llava_instruct_150k_zh.jsonl
  - Data files:
    - Split: train
    - Path: opensource/llava_instruct_150k_zh.jsonl
- sharegpt4v_instruct_gpt4-vision_cap100k.jsonl
  - Data files:
    - Split: train
    - Path: opensource/sharegpt4v_instruct_gpt4-vision_cap100k.jsonl
- sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k.jsonl
  - Data files:
    - Split: train
    - Path: opensource/sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k.jsonl
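As a minimal sketch of how each configuration maps to a file, the Python snippet below records every config's split and path from the configuration list above and streams records from a local JSONL file using only the standard library. It assumes you have a local copy of the repository laid out with the `opensource/` paths shown. The names `CONFIGS`, `config_path`, and `iter_records` are hypothetical helpers, not part of any published loader. Because the card does not describe the record schema, each line is returned as a plain dict.

```python
import json
from pathlib import Path
from typing import Iterator

# Hypothetical lookup table: config name -> (split, relative path),
# transcribed from the configuration list on this page.
CONFIGS = {
    "ai2d_train_12k": ("train", "opensource/ai2d_train_12k.jsonl"),
    "chartqa_train_18k": ("train", "opensource/chartqa_train_18k.jsonl"),
    "docvqa_train_10k": ("train", "opensource/docvqa_train_10k.jsonl"),
    "dvqa_train_200k.jsonl": ("train", "opensource/dvqa_train_200k.jsonl"),
    "geoqa+.jsonl": ("train", "opensource/geoqa+.jsonl"),
    "llava_instruct_150k_zh.jsonl": ("train", "opensource/llava_instruct_150k_zh.jsonl"),
    "sharegpt4v_instruct_gpt4-vision_cap100k.jsonl": (
        "train",
        "opensource/sharegpt4v_instruct_gpt4-vision_cap100k.jsonl",
    ),
    "sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k.jsonl": (
        "train",
        "opensource/sharegpt4v_mix665k_cap23k_coco-ap9k_lcs3k_sam9k_div2k.jsonl",
    ),
}


def config_path(root: Path, config: str) -> Path:
    """Resolve a config name to its JSONL file under a local dataset root (assumed layout)."""
    _split, rel_path = CONFIGS[config]
    return root / rel_path


def iter_records(path: Path) -> Iterator[dict]:
    """Yield one JSON object per non-empty line; no record schema is assumed."""
    # UTF-8 matters here: some files (e.g. llava_instruct_150k_zh) contain Chinese text.
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Reading line by line keeps memory use flat even for the larger files. If you load through the Hugging Face `datasets` library instead, you pass the config name as the `name` argument of `load_dataset`, together with the dataset's Hub repository id, which this page does not state.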