damlab/human_hiv_ppi
This dataset is extracted from the NCBI‑maintained Human‑HIV Interaction dataset and contains over 16,000 HIV‑human protein interaction pairs. Fields include HIV protein product, HIV protein name, interaction type, human protein product, human protein name, reference list, description, HIV protein sequence, and human protein sequence. The dataset was created to train models that identify proteins interacting with HIV. It was manually curated by experts from the published literature, which may bias it toward well‑studied proteins and known interactions.
Description
Dataset Overview
Dataset Summary
This dataset is derived from the NCBI‑maintained Human‑HIV Interaction dataset and contains over 16,000 HIV‑human protein interaction pairs. Protein sequence information was retrieved from the NCBI Protein database and added to the dataset. The original data can be downloaded from the NCBI FTP site; the data curation strategy is described in the NAR research paper.
Dataset Structure
Data Instances
Fields include: hiv_protein_product, hiv_protein_name, interaction_type, human_protein_product, human_protein_name, reference_list, description, hiv_protein_sequence, human_protein_sequence.
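The field list above can be sketched as a typed record, which makes it easy to catch a missing column early. This is a minimal sketch, not part of the dataset's tooling: the class name `HivHumanInteraction`, the helper `parse_record`, and the field types are assumptions (the card does not state types, so `reference_list` is left untyped), and the example values are placeholders rather than real data.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass(frozen=True)
class HivHumanInteraction:
    """One interaction pair; field names follow the dataset card."""
    hiv_protein_product: str
    hiv_protein_name: str
    interaction_type: str
    human_protein_product: str
    human_protein_name: str
    reference_list: Any  # assumption: the card does not state the type
    description: str
    hiv_protein_sequence: str
    human_protein_sequence: str


# The nine column names, in the order listed on the card.
EXPECTED_FIELDS = tuple(f.name for f in fields(HivHumanInteraction))


def parse_record(row: dict) -> HivHumanInteraction:
    """Build a typed record from a dict, failing loudly on missing fields."""
    missing = [name for name in EXPECTED_FIELDS if name not in row]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    return HivHumanInteraction(**{name: row[name] for name in EXPECTED_FIELDS})


# Placeholder row for illustration only; these are not real dataset values.
example_row = {name: f"<{name}>" for name in EXPECTED_FIELDS}
record = parse_record(example_row)
print(record.interaction_type)
```

Extra keys in a row are ignored, so the same helper works if a loader adds index or metadata columns.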
Data split: none.
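Because no split is provided, users have to create their own. One possible approach, shown here as a sketch under stated assumptions rather than a recommended protocol, is to assign each record to a split by hashing its `human_protein_name`, so that every pair involving the same human protein lands in the same split. The function names, the 20% default test fraction, and the choice of grouping key are all assumptions; records are assumed to be dicts keyed by the field names listed above.

```python
import hashlib


def assign_split(human_protein_name: str, test_fraction: float = 0.2) -> str:
    """Deterministically map a human protein name to 'train' or 'test'."""
    digest = hashlib.sha256(human_protein_name.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    return "test" if bucket < test_fraction * 2**64 else "train"


def split_records(records, test_fraction: float = 0.2) -> dict:
    """Group records into splits keyed by the hash of human_protein_name."""
    splits = {"train": [], "test": []}
    for record in records:
        name = record["human_protein_name"]
        splits[assign_split(name, test_fraction)].append(record)
    return splits
```

Hashing keeps the assignment stable across runs and machines without storing a seed. The realized test share only approximates `test_fraction`, since whole groups of records move together.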
Dataset Creation
Purpose: to train models for identifying proteins that interact with HIV.
Initial collection and standardization: the data was downloaded from NCBI and curated on 2022‑04‑04. The underlying NCBI database was last updated in 2016, so interactions published after that update are not covered.
Considerations When Using the Data
Bias discussion: the protein interaction dataset was manually curated by experts from the published scientific literature, which biases it toward well‑studied proteins and known interactions. The dataset contains no negative examples, that is, no protein pairs recorded as non‑interacting.
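Since the dataset holds only positive pairs, training a binary classifier requires negatives from elsewhere. A common workaround, sketched here as an assumption and not something the dataset authors prescribe, is to sample HIV-human pairs that do not appear in the dataset and treat them as putative negatives. An unobserved pair is not evidence of non-interaction, so these labels are noisy, and the literature bias noted above carries over. The function name and the protein identifiers in the example are hypothetical.

```python
import random


def sample_putative_negatives(positive_pairs, n_samples: int, seed: int = 0):
    """Sample (hiv, human) pairs that are absent from the positive set.

    Unobserved pairs are only *putative* negatives: absence from the
    curated literature does not prove the proteins do not interact.
    """
    positives = set(positive_pairs)
    hiv_proteins = sorted({hiv for hiv, _ in positives})
    human_proteins = sorted({human for _, human in positives})

    # Number of distinct pairs that are not already positives.
    max_possible = len(hiv_proteins) * len(human_proteins) - len(positives)
    if n_samples > max_possible:
        raise ValueError(
            f"requested {n_samples} negatives, only {max_possible} available"
        )

    rng = random.Random(seed)  # local RNG keeps the sampling reproducible
    negatives = set()
    while len(negatives) < n_samples:
        pair = (rng.choice(hiv_proteins), rng.choice(human_proteins))
        if pair not in positives:
            negatives.add(pair)
    return sorted(negatives)


# Hypothetical identifiers for illustration only.
positives = [("hiv_1", "human_a"), ("hiv_1", "human_b"), ("hiv_2", "human_a")]
print(sample_putative_negatives(positives, n_samples=1))
```

Rejection sampling is simple but slows down when most possible pairs are already positives; in that case, enumerating the unobserved pairs and sampling from that list directly would be a better fit.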
Additional Information
- Curator: Will Dampier
- Citation: TBD
Source
Organization: hugging_face
Created: Unknown