Multi-P2A

Multi-P2A is a comprehensive benchmark dataset created by the Institute of Computing Technology, Chinese Academy of Sciences, to evaluate the privacy protection capabilities of large vision‑language models (LVLMs). It covers 26 categories of personal privacy, 15 categories of commercial secrets, and 18 categories of state secrets, for a total of 31,962 samples. The samples are drawn from existing datasets and social media platforms and framed as visual question answering (VQA) tasks, a design intended to ensure quality and diversity. Its main application is privacy risk assessment: it helps developers and researchers identify and mitigate potential privacy leaks in LVLMs during training and inference.

Updated 12/27/2024

Description

Multi-P2A: A Multi-perspective Benchmark on Privacy Assessment for Large Vision-Language Models

Dataset Overview

Multi-P2A is a comprehensive benchmark for assessing privacy protection capabilities of large vision‑language models (LVLMs).

Dataset Content

  • Privacy Awareness: Evaluates the model's ability to recognize privacy‑sensitive inputs, namely images, queries, and risky privacy information flows across various scenarios.
  • Privacy Leakage: Assesses the risk that model outputs leak private information, in three categories: (1) extracting private information from images, (2) inferring private information from images, (3) leaking sensitive information from training data.

Dataset Scale

  • Total Samples: 31,962
  • Privacy Categories:
    • Personal privacy: 26 types
    • Commercial secrets: 15 types
    • State secrets: 18 types

Tasks and Distribution

  • Privacy Image Recognition: 3,202 samples
  • Privacy Question Detection: 14,184 samples
  • Privacy Information Flow Evaluation: 392 samples
  • Perceptual Leakage: 2,232 samples
  • Inferential Leakage: 2,682 samples
  • Memory Leakage: 3,798 samples
  • Non‑Sensitive Questions: 5,472 samples
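
The per-task counts above can be checked against the reported total with a short script. The following is a minimal sketch, not part of the dataset's tooling: the names `TASK_COUNTS`, `DIMENSIONS`, `task_shares`, and `dimension_totals` are hypothetical, and the grouping of tasks under "Privacy Awareness" and "Privacy Leakage" is an assumption inferred from the task names rather than something the listing states explicitly.

```python
# Sample counts per task, copied from the "Tasks and Distribution" list.
TASK_COUNTS = {
    "Privacy Image Recognition": 3202,
    "Privacy Question Detection": 14184,
    "Privacy Information Flow Evaluation": 392,
    "Perceptual Leakage": 2232,
    "Inferential Leakage": 2682,
    "Memory Leakage": 3798,
    "Non-Sensitive Questions": 5472,
}

# ASSUMPTION: this task-to-dimension grouping is inferred from the task
# names; the listing does not spell it out.
DIMENSIONS = {
    "Privacy Awareness": [
        "Privacy Image Recognition",
        "Privacy Question Detection",
        "Privacy Information Flow Evaluation",
    ],
    "Privacy Leakage": [
        "Perceptual Leakage",
        "Inferential Leakage",
        "Memory Leakage",
    ],
}

# Total sample count reported under "Dataset Scale".
REPORTED_TOTAL = 31962


def task_shares(counts):
    """Return each task's fraction of the overall sample count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def dimension_totals(counts, dimensions):
    """Sum the task counts that fall under each evaluation dimension."""
    return {
        dim: sum(counts[task] for task in tasks)
        for dim, tasks in dimensions.items()
    }


if __name__ == "__main__":
    # The seven task counts should add up to the reported total.
    assert sum(TASK_COUNTS.values()) == REPORTED_TOTAL

    for name, share in task_shares(TASK_COUNTS).items():
        print(f"{name}: {share:.1%}")

    for dim, total in dimension_totals(TASK_COUNTS, DIMENSIONS).items():
        print(f"{dim}: {total} samples")
```

The seven task counts add up to the reported 31,962 samples. Under the assumed grouping, the three awareness tasks account for 17,778 samples and the three leakage tasks for 8,712, with the 5,472 non-sensitive questions counted separately.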

Topics

Privacy Protection
Vision-Language Models

Source

Organization: arXiv

Created: 12/27/2024