Publicly Available Bladder Cancer Dataset
The project aims to evaluate publicly available single‑cell sequencing data from the SRA database using the `scanpy` library. It covers the full data‑analysis workflow from discovery and retrieval to final evaluation with `scanpy`.
Description
Dataset Overview
Dataset Description
The dataset is named "Single Cell Sequencing Data Analysis with Scanpy". It draws on publicly available single‑cell sequencing data from the Sequence Read Archive (SRA) and analyzes them with the `scanpy` library, from data discovery and retrieval through final evaluation.
Data Sources
Data are sourced from the SRA database, a public repository maintained by the National Center for Biotechnology Information (NCBI) that contains a large collection of sequencing data.
Data Analysis Workflow
- Data Discovery and Retrieval: Use Entrez Direct to search the SRA database with keywords such as "human bladder cancer samples", then download the matching runs with the SRA Toolkit.
- Quality Control: Perform quality control on downloaded FASTQ files using FastQC.
- Read Processing and Alignment: Align single‑cell RNA reads and quantify transcripts with Cell Ranger.
- Technical Artifact Removal: Remove background noise caused by extracellular RNA fragments using CellBender.
- Data Analysis: Use Scanpy to load and preprocess the data, predict doublets, normalize, reduce dimensionality with PCA, and integrate the dataset.
Tools and Libraries
- Entrez Direct: Searches the SRA database and retrieves run metadata.
- SRA Toolkit: Downloads SRA data.
- FastQC: Performs quality control.
- Cell Ranger: Handles read processing and alignment.
- CellBender: Removes technical artifacts.
- Scanpy: Analyzes single‑cell sequencing data.
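To show how the command-line tools above fit together, the sketch below only assembles the commands for each step; it does not run them. The function name `build_commands`, the directory names, and all argument values are hypothetical, and the flags shown are a common subset that may need adjusting for the installed tool versions (recent Cell Ranger releases, for example, require additional options).

```python
import shlex


def build_commands(query, run_accession, sample_id, transcriptome,
                   fastq_dir="fastq", qc_dir="fastqc_reports"):
    """Return the command for each upstream step (nothing is executed here)."""
    raw_h5 = f"{sample_id}/outs/raw_feature_bc_matrix.h5"
    return {
        # Entrez Direct: search SRA and list the matching runs as a table.
        "search": f"esearch -db sra -query {shlex.quote(query)} | efetch -format runinfo",
        # SRA Toolkit: fetch the run, then convert it to FASTQ files.
        "download": ["prefetch", run_accession],
        "convert": ["fasterq-dump", "--split-files", "--outdir", fastq_dir, run_accession],
        # FastQC: write quality reports for the converted reads.
        "qc": ["fastqc", "--outdir", qc_dir,
               f"{fastq_dir}/{run_accession}_1.fastq",
               f"{fastq_dir}/{run_accession}_2.fastq"],
        # Cell Ranger: align and count. The FASTQ files usually have to be
        # renamed to the naming scheme Cell Ranger expects before this step.
        "count": ["cellranger", "count", f"--id={sample_id}",
                  f"--transcriptome={transcriptome}",
                  f"--fastqs={fastq_dir}", f"--sample={sample_id}"],
        # CellBender: remove ambient RNA from the unfiltered matrix.
        "denoise": ["cellbender", "remove-background",
                    "--input", raw_h5,
                    "--output", f"{sample_id}_cellbender.h5"],
    }
```

Each list can be passed to `subprocess.run(cmd, check=True)`; the `search` entry is a shell pipeline and would need `shell=True` or two chained processes.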
Data Formats
Raw data are primarily in FASTQ format. After processing with Cell Ranger, the output includes the `filtered_feature_bc_matrix` and `raw_feature_bc_matrix` directories, each containing files such as `matrix.mtx`, `features.tsv`, and `barcodes.tsv`.
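To make the layout of these three files concrete, here is a small standard-library reader. It is a sketch under assumptions: the function name `read_mex_dir` is hypothetical, the files are assumed to be either plain text or gzip-compressed with a `.gz` suffix, and counts are assumed to be integers. In practice `scanpy.read_10x_mtx` loads such a directory directly; the point here is only to show how rows, columns, and values relate.

```python
import gzip
from pathlib import Path


def _open_text(directory, name):
    """Open `name` from `directory`, preferring a gzip-compressed `name.gz` copy."""
    zipped = directory / (name + ".gz")
    if zipped.exists():
        return gzip.open(zipped, "rt")
    return open(directory / name, "rt")


def read_mex_dir(directory):
    """Read barcodes, features, and sparse counts from a matrix directory."""
    directory = Path(directory)

    # One cell barcode per line; these label the matrix columns.
    with _open_text(directory, "barcodes.tsv") as fh:
        barcodes = [line.strip() for line in fh if line.strip()]

    # One tab-separated feature per line; these label the matrix rows.
    with _open_text(directory, "features.tsv") as fh:
        features = [line.rstrip("\n").split("\t") for line in fh if line.strip()]

    # Matrix Market coordinate format: comment lines start with '%', then a
    # size line "rows cols entries", then 1-based "row col value" triplets.
    counts = {}
    with _open_text(directory, "matrix.mtx") as fh:
        lines = (line for line in fh if line.strip() and not line.startswith("%"))
        n_rows, n_cols, _n_entries = map(int, next(lines).split())
        for line in lines:
            row, col, value = line.split()
            counts[(int(row) - 1, int(col) - 1)] = int(value)

    if (n_rows, n_cols) != (len(features), len(barcodes)):
        raise ValueError("matrix dimensions do not match features/barcodes")
    return barcodes, features, counts
```

The returned dictionary maps `(feature_index, barcode_index)` to a count, so `counts[(0, 0)]` is the count of the first feature in the first cell.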
Data Applications
The dataset is suitable for single‑cell sequencing analysis, especially in studies of human bladder cancer samples, and can be used for gene expression analysis, cell‑type identification, and other biological investigations.
Source
Organization: github
Created: 7/28/2023