Multi-P2A
Multi-P2A is a benchmark dataset created by the Institute of Computing Technology, Chinese Academy of Sciences, for evaluating the privacy protection capabilities of large vision-language models (LVLMs). It covers 26 categories of personal privacy, 15 categories of commercial secrets, and 18 categories of state secrets, for a total of 31,962 samples. Samples are built from existing datasets and social media platforms and are posed as visual question answering (VQA) tasks, with the stated aim of high quality and diversity. The benchmark is mainly used for privacy risk assessment: it helps developers and researchers identify and mitigate potential privacy leaks in LVLMs during training and inference.
Description
Multi-P2A: A Multi-perspective Benchmark on Privacy Assessment for Large Vision-Language Models
Dataset Overview
Multi-P2A assesses the privacy protection capabilities of large vision-language models (LVLMs) from two perspectives: privacy awareness and privacy leakage.
Dataset Content
- Privacy Awareness: Evaluates the model's ability to recognize privacy-sensitive inputs, covering images, queries, and risky flows of private information across various scenarios.
- Privacy Leakage: Assesses the risk that model outputs leak private information, in three forms: (1) extracting private information from images (perceptual leakage), (2) inferring private information from images (inferential leakage), and (3) leaking sensitive information from training data (memory leakage).
Dataset Scale
- Total Samples: 31,962
- Privacy Categories:
  - Personal privacy: 26 types
  - Commercial secrets: 15 types
  - State secrets: 18 types
Tasks and Distribution
- Privacy Image Recognition: 3,202 samples
- Privacy Question Detection: 14,184 samples
- Privacy Information Flow Evaluation: 392 samples
- Perceptual Leakage: 2,232 samples
- Inferential Leakage: 2,682 samples
- Memory Leakage: 3,798 samples
- Non‑Sensitive Questions: 5,472 samples
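The per-task counts above can be checked against the reported total with a short script. This is a minimal sketch, not part of the dataset's tooling: the dictionary name `TASK_COUNTS` and the constant `REPORTED_TOTAL` are illustrative, and the numbers are copied by hand from this page.

```python
# Sample counts per task, copied from the "Tasks and Distribution" list.
# TASK_COUNTS and REPORTED_TOTAL are illustrative names, not dataset fields.
TASK_COUNTS = {
    "Privacy Image Recognition": 3202,
    "Privacy Question Detection": 14184,
    "Privacy Information Flow Evaluation": 392,
    "Perceptual Leakage": 2232,
    "Inferential Leakage": 2682,
    "Memory Leakage": 3798,
    "Non-Sensitive Questions": 5472,
}
REPORTED_TOTAL = 31962  # "Total Samples" from the "Dataset Scale" section

total = sum(TASK_COUNTS.values())
shares = {task: count / total for task, count in TASK_COUNTS.items()}

# Print tasks from largest to smallest with their share of the benchmark.
for task, count in sorted(TASK_COUNTS.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{task:<38} {count:>6} {shares[task]:>7.1%}")

print(f"{'Sum of tasks':<38} {total:>6}")
assert total == REPORTED_TOTAL, "per-task counts do not match the reported total"
```

The seven counts sum to 31,962, which matches the reported total. Privacy Question Detection alone accounts for a little under half of the samples, which is worth keeping in mind if scores are averaged over samples rather than over tasks.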
Source
Organization: arXiv
Created: 12/27/2024