
Atlas.Y Dataset

The Atlas.Y dataset comprises two main components: a signal peptide dataset and a linker dataset. The signal peptide dataset is intended to facilitate research on protein subcellular localization and transport, while the linker dataset is used to study linkers between signal peptides and target proteins, aiding the design and optimization of fusion proteins.

Updated 9/26/2024
github

Description


Dataset Overview

The Atlas.Y dataset is a collection for studying protein subcellular localization and transport. It consists of a signal peptide dataset and a linker dataset. It is released under the Creative Commons Attribution‑NonCommercial 4.0 International (CC BY‑NC 4.0) license, which permits non‑commercial sharing and adaptation with appropriate attribution. For commercial use, contact tongji_china2019@163.com to request permission.

Signal Peptide Dataset

  • Design Purpose: Facilitates research on protein subcellular localization and transport.
  • Source: Derived from the dataset used to train the DeepLoc 2.1 deep‑learning model by Marius Thrane Ødum et al.
  • Selection Criteria: Only eukaryotic proteins are included. Their signal peptides are extracted, classified, and assigned unique identifiers for efficient querying.
  • Applicable Domains: Bioinformatics research, protein design, and cell‑biology experiments, especially subcellular localization prediction.
  • File: Signal_Peptide.csv
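Each signal peptide carries a unique identifier, so the natural way to query the file is through a lookup table keyed on that identifier. The Python sketch below assumes only that Signal_Peptide.csv has a header row. The column names passed in, such as a hypothetical "ID" or "Class" column, are placeholders, because this listing does not document the file's schema.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List

Row = Dict[str, str]

def read_rows(lines: Iterable[str]) -> List[Row]:
    """Parse CSV lines (e.g. an open file handle) into dicts keyed by the header row."""
    return list(csv.DictReader(lines))

def index_by_id(rows: List[Row], id_column: str) -> Dict[str, Row]:
    """Map each identifier to its row, so a lookup is a single dict access."""
    index: Dict[str, Row] = {}
    for row in rows:
        key = row[id_column].strip()
        if key in index:
            # Identifiers are meant to be unique; a repeat suggests the wrong column was chosen.
            raise ValueError(f"duplicate identifier: {key!r}")
        index[key] = row
    return index

def count_by_class(rows: List[Row], class_column: str) -> Counter:
    """Tally how many signal peptides carry each class label."""
    return Counter(row[class_column].strip() for row in rows)
```

To use it, open the file with `newline=""` as the csv module recommends and pass the handle to `read_rows`. Then substitute the real identifier and class column names from the header.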

Linker Dataset

  • Design Purpose: Supports investigation of linkers between signal peptides and target proteins, assisting the design and optimization of fusion proteins, particularly in subcellular localization and transport studies.
  • Data Classification: Divided into a classical linker table and a natural linker table.

Classical Linker Table

  • Content: Linkers that have been extensively reviewed and classified in the literature, categorized as rigid or flexible.
  • Applicable Domains: Protein design, molecular biology, synthetic biology engineering projects.
  • File: Classical_Linker.csv
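Because the classical linkers are labelled as rigid or flexible, a common query is to pull out every linker of one class, possibly under a length limit. The sketch below assumes hypothetical "Type" and "Sequence" columns. It also expands the (unit)n repeat shorthand that the literature often uses for classical linkers, such as (GGGGS)3 or A(EAAAK)2A. Whether Classical_Linker.csv uses that notation is an assumption you should check against the file.

```python
import re
from typing import Dict, Iterable, List, Optional

Row = Dict[str, str]

# Matches the "(UNIT)n" repeat shorthand with a numeric count, e.g. "(GGGGS)3".
_REPEAT = re.compile(r"\(([A-Z]+)\)(\d+)")

def expand_repeats(notation: str) -> str:
    """Expand every (UNIT)n group into UNIT written out n times; plain sequences pass through."""
    return _REPEAT.sub(lambda m: m.group(1) * int(m.group(2)), notation)

def select_linkers(
    rows: Iterable[Row],
    category: str,
    category_column: str = "Type",      # hypothetical column name
    sequence_column: str = "Sequence",  # hypothetical column name
    max_length: Optional[int] = None,
) -> List[str]:
    """Return expanded sequences whose category matches case-insensitively."""
    wanted = category.strip().lower()
    selected: List[str] = []
    for row in rows:
        if row[category_column].strip().lower() != wanted:
            continue
        # Expand shorthand first so the length cap counts residues, not characters.
        sequence = expand_repeats(row[sequence_column].strip())
        if max_length is None or len(sequence) <= max_length:
            selected.append(sequence)
    return selected
```

Case-insensitive matching tolerates "Flexible" and "flexible" appearing side by side. Expanding before the length check keeps `max_length` meaningful whether or not the table uses shorthand.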

Natural Linker Table

  • Content: Short peptides extracted from natural protein sequences without artificial optimization.
  • Generation Method: Produced by removing the signal peptide and conserved domains from each source protein sequence, following the method of the 2021 Sun Yat‑sen University iGEM team.
  • Source: Utilizes protein sequences from the DeepLoc 2.1 dataset, with conserved domains identified using NCBI's Conserved Domain Database (CDD) and the batch CD‑Search tool.
  • File: Natural_Linker.csv
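In effect, the generation method works by masking residues. Everything covered by the signal peptide or by a conserved domain is discarded, and the unmasked stretches that remain are the candidate natural linkers. The sketch below illustrates that idea and is not the iGEM team's code. It makes three assumptions: the signal peptide occupies the first `signal_peptide_len` residues, domain boundaries are 1-based inclusive (start, end) pairs like the From and To columns of a CD-Search hit table, and the `min_len`/`max_len` filters are hypothetical knobs.

```python
from typing import List, Optional, Sequence, Tuple

Segment = Tuple[int, int, str]  # (1-based start, 1-based inclusive end, residues)

def leftover_segments(
    sequence: str,
    signal_peptide_len: int,
    domains: Sequence[Tuple[int, int]],
    min_len: int = 1,
    max_len: Optional[int] = None,
) -> List[Segment]:
    """Mask the signal peptide and conserved domains; return the unmasked stretches."""
    n = len(sequence)
    masked = [False] * n
    # Signal peptide: residues 1..signal_peptide_len.
    for i in range(min(signal_peptide_len, n)):
        masked[i] = True
    # Conserved domains: 1-based inclusive intervals; overlaps are harmless.
    for start, end in domains:
        if start < 1 or end < start:
            raise ValueError(f"bad domain interval: {(start, end)}")
        for i in range(start - 1, min(end, n)):
            masked[i] = True
    segments: List[Segment] = []
    i = 0
    while i < n:
        if masked[i]:
            i += 1
            continue
        j = i
        while j < n and not masked[j]:
            j += 1
        length = j - i
        if length >= min_len and (max_len is None or length <= max_len):
            segments.append((i + 1, j, sequence[i:j]))
        i = j
    return segments
```

The sketch keeps every leftover stretch, including any C‑terminal tail. To keep only stretches lying between two domains, filter the returned coordinates against the domain intervals.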

Application Areas

The dataset serves protein engineering, molecular design, functional studies of signal peptides, and bioinformatics analyses. The two linker tables in particular give researchers a ready resource for querying and reusing linker sequences.



Topics

Biology
Protein Research

Source

Organization: github

Created: 9/26/2024
